jpekkila
01ad141d90
Added comments and a short overview of the MPI implementation
2020-05-28 17:05:12 +03:00
jpekkila
f1138b04ac
Cleaned up the MPI implementation and removed all older implementations (also removed the MPI window implementation, which might be handy in the future when CUDA-aware support is introduced). If the removed code is needed later, here are some keywords to help find this commit: MPI_window, sendrecv, bidirectional, unidirectional transfer, real-time pinning, a0s, b0s.
2020-05-28 16:42:50 +03:00
jpekkila
0d62f56e27
Tried an alternative approach to comm (it was worse than the current solution) and rewrote the current best solution for clarity (now easier to read)
2020-05-28 15:31:43 +03:00
jpekkila
f97005a75d
Added WIP version of the new bidirectional comm scheme
2020-05-27 19:09:32 +03:00
jpekkila
afe5b973ca
Added multiplication operator for int3
2020-05-27 19:08:39 +03:00
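A componentwise multiplication operator for CUDA's int3, as added in the commit above, can be sketched as follows (a minimal illustration; the exact form in Astaroth may differ):

    #include <cuda_runtime.h> // int3, make_int3

    static __host__ __device__ inline int3
    operator*(const int3& a, const int3& b)
    {
        // Componentwise product, handy e.g. for scaling per-process grid
        // dimensions by the process decomposition.
        return make_int3(a.x * b.x, a.y * b.y, a.z * b.z);
    }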
jpekkila
7e59ea0eff
MPI: corners are no longer communicated. Slight performance impact (14 ms vs 15 ms). Tests still pass with 8 GPUs.
2020-05-26 19:00:14 +03:00
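The halo exchange without corners can be pictured as follows: of the 26 neighboring segments, only the 6 faces and 12 edges are exchanged, while the 8 corners (all three offsets nonzero) are skipped. A rough sketch with a hypothetical exchange_segment helper:

    for (int dz = -1; dz <= 1; ++dz) {
        for (int dy = -1; dy <= 1; ++dy) {
            for (int dx = -1; dx <= 1; ++dx) {
                if (dx == 0 && dy == 0 && dz == 0)
                    continue; // the local domain itself
                if (abs(dx) + abs(dy) + abs(dz) == 3)
                    continue; // corner segment: no longer communicated
                exchange_segment(make_int3(dx, dy, dz)); // hypothetical helper
            }
        }
    }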
jpekkila
c93b3265e6
Made comm streams high prio
2020-04-22 17:03:53 +03:00
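Creating the communication streams with elevated priority might look roughly like this (a sketch; the stream name is illustrative):

    int least_priority, greatest_priority;
    cudaDeviceGetStreamPriorityRange(&least_priority, &greatest_priority);

    cudaStream_t comm_stream;
    // Lower numerical values mean higher priority; give the packing/unpacking
    // stream the greatest priority so halo traffic is not starved by the
    // bulk integration kernels.
    cudaStreamCreateWithPriority(&comm_stream, cudaStreamNonBlocking, greatest_priority);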
jpekkila
22e01b7f1d
Rewrote partitioning code
2020-04-19 23:23:23 +03:00
jpekkila
4dd825f574
Proper decomposition when using Morton order to partition the computational domain
2020-04-19 22:50:26 +03:00
jpekkila
ffb274e16f
Linking dynamic CUDA library instead of static (less prone to breaking since Astaroth does not have to be rebuilt when CUDA is updated)
2020-04-19 22:33:01 +03:00
jpekkila
8c210b3292
3D decomposition is now done using Morton order instead of linear indexing
2020-04-19 22:31:57 +03:00
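Morton-order partitioning maps a process id to 3D block coordinates by de-interleaving its bits, so ranks that are adjacent along the curve tend to own spatially nearby blocks. A minimal sketch (names and bit width are illustrative, not Astaroth's exact code):

    #include <stdint.h>
    #include <cuda_runtime.h> // int3, make_int3

    static int3
    morton3D(const uint64_t pid)
    {
        int x = 0, y = 0, z = 0;
        // De-interleave: bit 3*b of pid goes to x, bit 3*b+1 to y, bit 3*b+2 to z.
        for (int bit = 0; bit < 21; ++bit) {
            x |= (int)((pid >> (3 * bit + 0)) & 1) << bit;
            y |= (int)((pid >> (3 * bit + 1)) & 1) << bit;
            z |= (int)((pid >> (3 * bit + 2)) & 1) << bit;
        }
        return make_int3(x, y, z);
    }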
jpekkila
d6e74ee270
Added missing files
2020-04-09 19:24:55 +03:00
jpekkila
fb41741d74
Improvements to samples
2020-04-07 17:58:47 +03:00
jpekkila
427a3ac5d8
Rewrote the previous implementation; it now fully works (verified) and gives the speedup we want. Communication latency is now completely hidden on at least two nodes (8 GPUs). Scaling looks very promising.
2020-04-06 17:28:02 +03:00
jpekkila
37f1c841a3
Added functions for pinning memory that is sent over the network. TODO pack to and from pinned memory selectively (currently P2P results are overwritten with data in pinned memory)
2020-04-06 14:09:12 +03:00
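One way to pin a host buffer that is sent over the network is cudaHostRegister, which page-locks existing memory so the driver and MPI can transfer it without an extra staging copy (a sketch, not necessarily the exact functions added here; double stands in for the configured real type):

    double* buffer = (double*)malloc(count * sizeof(double));
    cudaHostRegister(buffer, count * sizeof(double), cudaHostRegisterDefault);

    // ... pack halo data into the buffer and hand it to MPI_Isend/MPI_Irecv ...

    cudaHostUnregister(buffer);
    free(buffer);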
jpekkila
fe14ae4665
Added an alternative MPI implementation which uses one-sided communication
2020-04-02 17:59:53 +03:00
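The rough shape of one-sided communication: each rank exposes its receive buffer in an MPI window and neighbors write into it with MPI_Put inside a fence epoch (buffer and rank names are illustrative):

    MPI_Win win;
    MPI_Win_create(recv_buffer, count * sizeof(double), sizeof(double),
                   MPI_INFO_NULL, MPI_COMM_WORLD, &win);

    MPI_Win_fence(0, win);
    MPI_Put(send_buffer, count, MPI_DOUBLE,
            neighbor_rank, 0, count, MPI_DOUBLE, win);
    MPI_Win_fence(0, win); // the data is now visible on the target rank

    MPI_Win_free(&win);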
Johannes Pekkila
742dcc2697
Optimized MPI synchronization a bit
2020-03-31 12:36:25 +02:00
jpekkila
24e65ab02d
Set decompositions for some nprocs by hand
2020-03-30 18:13:50 +03:00
jpekkila
9c5011d275
Renamed t to terr to avoid naming conflicts
2020-03-30 17:41:09 +03:00
jpekkila
cc64968b9e
GPUDirect was off, re-enabled
2020-03-26 18:24:42 +02:00
jpekkila
28792770f2
Better overlap between computation and communication when the inner integration is launched first
2020-03-26 18:00:01 +02:00
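The launch order giving the better overlap can be sketched as below; the helper names are hypothetical, but the idea is that the inner (halo-independent) integration keeps the GPU busy while halos are packed, exchanged and unpacked on the high-priority communication stream:

    integrate_inner(compute_stream);      // no halo dependency, launched first

    pack_halos(comm_stream);              // high-priority stream
    exchange_halos_mpi(comm_stream);      // MPI transfers of the packed buffers
    unpack_halos(comm_stream);
    cudaStreamSynchronize(comm_stream);

    integrate_outer(compute_stream);      // boundary region, needs fresh halos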
jpekkila
4c82e3c563
Removed old debug error check
2020-03-26 17:59:29 +02:00
jpekkila
08f567619a
Removed old unused functions for MPI integration and comm
2020-03-26 15:04:57 +02:00
jpekkila
ed7cf3f540
Added a production-ready interface for doing multi-node runs with Astaroth with MPI
2020-03-26 15:02:37 +02:00
jpekkila
dad84b361f
Renamed Grid structure to GridDims structure to avoid confusion with MPI Grids used in device.cc
2020-03-26 15:01:33 +02:00
jpekkila
db120c129e
The model solver now computes any built-in parameters automatically instead of relying on the user to supply them (inv_dsx etc.)
2020-03-26 14:59:07 +02:00
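Computing a derived built-in parameter from the primary ones amounts to something like the following (illustrative; config is a hypothetical AcMeshInfo instance and the handles follow the AC_* naming used elsewhere in Astaroth):

    // 1 / dsx is filled in automatically from dsx instead of being loaded by the user
    config.real_params[AC_inv_dsx] = (AcReal)1.0 / config.real_params[AC_dsx];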
jpekkila
fbd4b9a385
Made the MPI flag global instead of just core
2020-03-26 14:57:22 +02:00
jpekkila
e1bec4459b
Removed an unused variable
2020-03-25 13:54:43 +02:00
jpekkila
e36ee7e2d6
AC_multigpu_offset tested to work on at least 2 nodes and 8 GPUs. Forcing should now work with MPI
2020-03-25 13:51:00 +02:00
jpekkila
672137f7f1
WIP further MPI optimizations
2020-03-24 19:02:58 +02:00
jpekkila
ef63813679
Explicit check that critical parameters like inv_dsx are properly initialized before calling integration
2020-03-24 17:01:24 +02:00
jpekkila
8c362b44f0
Added more warnings in case some of the model solver parameters are not initialized
2020-03-24 16:56:30 +02:00
jpekkila
d520835c42
Added integration to MPI comm, now completes a full integration step. Works at least on 2 nodes
2020-03-24 16:55:38 +02:00
jpekkila
7b39a6bb1d
AC_multigpu_offset is now calculated with MPI. Should now work with forcing, but not tested
2020-02-03 15:45:23 +02:00
jpekkila
50af620a7b
More accurate timing when benchmarking MPI. Also made GPU-GPU communication the default. The current version of UCX is bugged; one must export 'UCX_MEMTYPE_CACHE=n' to work around memory errors when doing GPU-GPU comm
2020-02-03 15:27:36 +02:00
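The workaround can be exported in the job script, or set programmatically before MPI_Init as sketched here (overwrite = 0 keeps any value the user has already exported):

    #include <stdlib.h>

    setenv("UCX_MEMTYPE_CACHE", "n", 0); // work around the UCX memtype cache bug
    MPI_Init(&argc, &argv);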
jpekkila
89f4d08b6c
Fixed a possible out-of-bounds access in error checking when NUM_*_PARAMS is 0
2020-01-28 18:43:03 +02:00
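A guard of roughly this shape avoids the out-of-bounds access when the parameter count is zero (names are illustrative, not the exact code):

    if (param < 0 || param >= NUM_INT_PARAMS) { // NUM_INT_PARAMS may be 0
        fprintf(stderr, "Invalid int parameter handle %d\n", (int)param);
        return AC_FAILURE;
    }
    printf("Loading %s\n", intparam_names[param]); // safe to index now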
jpekkila
67f2fcc88d
Setting inv_dsx etc explicitly is no longer required as they are set to default values in acc/stdlib/stdderiv.h
2020-01-28 18:22:27 +02:00
jpekkila
0ccd4e3dbc
Major improvement: uniforms can now be set to default values. The syntax is the same as for setting any other values, e.g. 'uniform Scalar a = 1; uniform Scalar b = 0.5 * a;'. Undefined uniforms are still allowed, but in this case the user should load a proper value into them at runtime. Default uniform values can be overwritten by calling any of the uniform loader functions (like acDeviceLoadScalarUniform). Also improved error checking: there are now explicit warnings if the user tries to load an invalid value into a device constant.
2020-01-28 18:17:31 +02:00
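Overriding a default uniform at runtime then looks roughly like this (the loader function is named in the commit above; the exact signature and the parameter handle are assumptions, check astaroth.h):

    // 'uniform Scalar cdt = 0.4;' in the DSL provides the default;
    // the host can still override it:
    acDeviceLoadScalarUniform(device, STREAM_DEFAULT, AC_cdt, (AcReal)0.3);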
jpekkila
6dfe3ed4d6
Added out-of-the-box support for MPI (though not enabled by default). Previously the user had to pass mpicxx explicitly as the cmake compiler in order to compile MPI code, but this was bad practice and it's better to let cmake handle the include and compilation flags.
2020-01-28 15:59:20 +02:00
jpekkila
5444c84cff
Formatting
2020-01-27 18:24:46 +02:00
jpekkila
927d4d31a5
Enabled CXX 11 support for CUDA code (required)
2020-01-27 17:04:52 +02:00
jpekkila
e751ee991b
Math operators are now using consistent precision throughout the project
2020-01-27 17:04:14 +02:00
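Consistent precision here means avoiding accidental promotion to double when AcReal is configured as float: plain literals such as 0.5 are doubles, so operators and call sites keep the arithmetic in AcReal throughout. A minimal sketch:

    static __host__ __device__ inline AcReal3
    operator*(const AcReal a, const AcReal3& b)
    {
        return (AcReal3){a * b.x, a * b.y, a * b.z};
    }

    // Call sites use AcReal literals, e.g. AcReal(0.5) * v rather than 0.5 * v,
    // so the result stays in the configured precision.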
jpekkila
2bc3f9fedd
Including the header when compiling Core seems to be unnecessary since we already include it earlier
2020-01-24 07:31:51 +02:00
jpekkila
f8cd571323
Now CMake and compilation flags are functionally equivalent with the current master branch, not taking into account the deprecated flags. Also various small improvements to building.
Deprecated flags:
* BUILD_DEBUG. This was redundant since CMake already has such a flag. The build type can now be switched by passing -DCMAKE_BUILD_TYPE=<Release|Debug|RelWithDebugInfo|...> to cmake. See the CMake documentation on CMAKE_BUILD_TYPE for all available options.
* BUILD_UTILS. The utility library is now always built along with the core library. We can reintroduce this flag if needed when the library grows larger. Currently the MPI functions depend on Utils, and without the flag we don't have to worry about whether Utils is available.
* BUILD_RT_VISUALIZATION. RT visualization has been dormant for a while and I'm not even sure if it works any more. Eventually the RT library should be generalized and moved to Utils at some point. Disabled the build flag for the time being.
2020-01-24 07:00:49 +02:00
jpekkila
a5b5e418d4
Moved all headers used throughout the library to src/common
2020-01-23 20:06:47 +02:00
jpekkila
78fbcc090d
Reordered src/core for a better division into host and device code (this is more likely to work when compiling with mpicxx). Disabled separate compilation of CUDA kernels as it complicates compilation and is a source of many cmake/cuda bugs. As a downside, GPU code takes longer to compile.
2020-01-23 20:06:20 +02:00
jpekkila
96389e9da6
Modified standalone includes to function with new astaroth headers
2020-01-23 20:03:25 +02:00
jpekkila
3adb0242a4
src/utils is now a real library. Includable with the astaroth_utils.h header and linkable with libastaroth_utils.a. The purpose of Astaroth Utils is to function as a generic utility library in contrast to Astaroth Standalone which is essentially hardcoded only for MHD.
2020-01-23 20:02:38 +02:00
jpekkila
5de163e8d1
Added commented-out pragma unrolls as a reminder of how packing could be improved, though at the moment unrolling actually makes performance much worse, for reasons unknown.
2020-01-22 19:27:45 +02:00
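The packing kernel in question has roughly this shape, with the commented-out unroll sitting above the per-field loop (layout and names are illustrative; AcReal is Astaroth's real type):

    __global__ void
    kernel_pack_halo(AcReal* const* in, AcReal* out, const int num_fields,
                     const int3 dims, const int3 offset, const int3 mm)
    {
        const int idx        = threadIdx.x + blockIdx.x * blockDim.x;
        const int block_size = dims.x * dims.y * dims.z;
        if (idx >= block_size)
            return;

        // Map the linear packing index to 3D coordinates within the halo segment
        const int i   = idx % dims.x;
        const int j   = (idx / dims.x) % dims.y;
        const int k   = idx / (dims.x * dims.y);
        const int src = (i + offset.x) + (j + offset.y) * mm.x + (k + offset.z) * mm.x * mm.y;

        // #pragma unroll  // reminder: unrolling currently makes performance much worse
        for (int field = 0; field < num_fields; ++field)
            out[field * block_size + idx] = in[field][src];
    }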
jpekkila
41f8e9aebb
Removed an old inefficient function for MPI comm
2020-01-22 19:26:33 +02:00