astaroth

Author	SHA1	Message	Date
jpekkila	53695d66a3	Benchmarking now also prints out percentiles	2019-11-26 16:26:31 +02:00
jpekkila	0b0ccd697a	Added some explicit casts in get_neighbor (MPI) to fix warnings raised when compiling with older gcc	2019-11-20 10:18:10 +02:00
Johannes Pekkila	8894b7c7d6	Added a function for getting pid of a neighboring process when decomposing in 3D	2019-10-23 19:26:35 +02:00
Johannes Pekkila	474bdf185d	Cleaned up the MPI solution for 3D decomp test	2019-10-23 12:33:46 +02:00
Johannes Pekkila	1d81333ff7	More concurrent kernels and MPI comm	2019-10-23 12:07:23 +02:00
Johannes Pekkila	04867334e7	Full integration step with MPI comms	2019-10-22 19:59:15 +02:00
Johannes Pekkila	870cd91b5f	Added the final MPI solution for the benchmark tests: RDMA is now used and I don't think we can go much faster with the current decomposition scheme. To get better scaling, we would probably have to change to 3D decomposition instead of using the current simple 1D decomp	2019-10-22 19:28:35 +02:00
jpekkila	3d7ad7c8f2	Code cleanup	2019-10-22 15:38:34 +03:00
jpekkila	64221c218d	Made some warnings go away	2019-10-22 15:03:55 +03:00
Johannes Pekkila	e4a7cdcf1d	Added functions for packing and unpacking data on the device	2019-10-22 13:48:47 +02:00
Johannes Pekkila	915e1c7c14	Trying to overlap MPI communication with computation of boundary conditions. However, NVIDIA seemed to forget one important detail in the documentation for CUDA-aware MPI: it looks like CUDA streams are not supported with CUDA-aware MPI communication. So in the end the fastest solution might be to use old-school gpu->cpu->cpu->gpu MPI communication after all	2019-10-21 15:50:53 +02:00
jpekkila	f120343110	Bugfix: peer access was not disabled when Node was destroyed, leading to cudaErrorPeerAccessAlreadyEnabled error when creating new Nodes	2019-10-21 16:23:24 +03:00
Johannes Pekkila	7b475b6dee	Better MPI synchronization	2019-10-18 11:50:22 +02:00
Johannes Pekkila	155d369888	MPI communication now 10x faster	2019-10-17 22:39:57 +02:00
jpekkila	26bbfa089d	Better multi-node communication: fire and forget.	2019-10-17 18:17:37 +03:00
jpekkila	3d852e5082	Added timing to the MPI benchmark	2019-10-17 17:43:54 +03:00
jpekkila	e0a631d81a	Added the hires timer to utils	2019-10-17 17:43:34 +03:00
jpekkila	588a94c772	Added more MPI stuff. Now multi-node GPU-GPU communication with GPUDirect RDMA should work. Also device memory is now allocated in unified memory by default as this makes MPI communication simpler if RDMA is not supported. This does not affect Astaroth any other way since different devices use different portions of the memory space and we continue managing memory transfers manually.	2019-10-17 16:09:05 +03:00
jpekkila	0e88d6c339	Marked some internal functions static	2019-10-17 14:41:44 +03:00
jpekkila	7390d53f79	Added missing extern Cs to verification.h	2019-10-17 14:41:13 +03:00
jpekkila	f1e988ba6a	Added stuff for the device layer for testing GPU-GPU MPI. This is a quick and dirty solution which is primarily meant for benchmarking/verification. Figuring out what the MPI interface should look like is more challenging and is not the priority right now	2019-10-17 14:40:53 +03:00
jpekkila	bb9e65a741	AC_DEFAULT_CONFIG now propagated to projects that link to astaroth utils	2019-10-17 13:05:17 +03:00
jpekkila	65a2d47ef7	Made grid.cu (multi-node) to compile without errors. Not used though.	2019-10-17 13:03:42 +03:00
jpekkila	ef94ab5b96	A small update to ctest	2019-10-17 13:02:41 +03:00
jpekkila	0865f0499b	Various improvements to the MPI-GPU implementation, but linking MPI libraries with both the host C-project and the core library seems to be a major pain. Currently the communication is done via gpu->cpu->cpu->gpu.	2019-10-15 19:32:16 +03:00
jpekkila	113be456d6	Undeprecated the wrong function in commit `b693c8a`	2019-10-15 18:11:07 +03:00
jpekkila	1ca089c163	New cmake option: MPI_ENABLED. Enables MPI functions on the device layer	2019-10-15 17:57:53 +03:00
jpekkila	0d02faa5f5	Working base for gathering, distributing and communicating halos with MPI	2019-10-15 17:39:26 +03:00
jpekkila	b11ef143eb	Moved a debug print further to reduce clutter	2019-10-15 17:38:29 +03:00
jpekkila	fd9dc7ca98	Added periodic boundconds to utils	2019-10-15 17:37:57 +03:00
jpekkila	ff1ad37047	Some small improvements to the utils library	2019-10-15 17:00:58 +03:00
jpekkila	46ad9da8c8	Pulled some stuff from the mpi branch	2019-10-15 17:00:44 +03:00
jpekkila	4ae9c74d9d	Added a function for randomizing vertex buffers (useful for testing)	2019-10-15 16:13:11 +03:00
jpekkila	37171689c8	Formatting	2019-10-15 16:12:44 +03:00
jpekkila	b693c8adb4	Undeprecated acDeviceLoadMesh and acDeviceStoreMesh, these are actually very nice to have	2019-10-15 16:12:31 +03:00
jpekkila	8d86ac6f9e	Started preparing the MPI version for benchmarks and added a solve-independent version of the verification functions to the utils library	2019-10-15 15:54:15 +03:00
jpekkila	08188f3f5b	is_valid is now consistently overloaded (parameter passed as a reference). Older CUDA compilers complained about this.	2019-10-14 21:18:21 +03:00
jpekkila	08f155cbec	Finetuning some error checks	2019-10-07 20:40:32 +03:00
jpekkila	5d4f47c3d2	Added overloads for vector in-place addition and subtraction	2019-10-07 19:40:54 +03:00
jpekkila	ba49e7e400	Replaced deprecated DCONST_INT calls with overloaded DCONST()	2019-10-07 19:40:27 +03:00
jpekkila	ee4ff730f6	Deprecated inv_dsx and friends from utils/config_loader.c since those are not defined in the case where the user does not include stdderiv.h	2019-10-07 17:01:21 +03:00
jpekkila	66cfcefb34	More error checks	2019-10-07 17:00:23 +03:00
jpekkila	0e1d1b9fb4	Some optimizations for DSL compilation. Also a new feature: Inplace addition and subtraction += and -= are now allowed	2019-10-07 16:33:24 +03:00
jpekkila	f7c079be2a	Removed everything unnecessary from integration.cuh. Now all derivatives etc are available in a standard library header (acc/stdlib/stdderiv.h)	2019-10-07 15:47:33 +03:00
Miikka Vaisala	f8e82d41af	Can now set the endtime for simulation, instead of step number.	2019-10-02 15:09:26 +08:00
Miikka Vaisala	79fe634a84	Tested and works. Now it is possible to continue the simulation with a specific file number. Next task: move I/O to src/utils/	2019-10-02 14:30:13 +08:00
Miikka Vaisala	0dbbcd22d5	Tested and works. We can now continue simulation from the chosen snapshot number.	2019-10-02 14:09:47 +08:00
Miikka Vaisala	1b0e9803b0	Compiles and runs again. Now to actual testing. Can we read and continue from an old file?	2019-10-02 13:52:38 +08:00
Miikka Vaisala	54d89f7a46	In principle should read a specified old run. Still needs testing and compilation.	2019-10-02 11:37:51 +08:00
Miikka Vaisala	d5b6f3b48e	Drafted read_mesh() to read existing binary data at a specific step number.	2019-10-02 11:16:30 +08:00

338 Commits