astaroth

Author	SHA1	Message	Date
jpekkila	7b39a6bb1d	AC_multigpu_offset is now calculated with MPI. Should now work with forcing, but not tested	2020-02-03 15:45:23 +02:00
jpekkila	50af620a7b	More accurate timing when benchmarking MPI. Also made GPU-GPU communication the default. Current version of UCX is bugged, must export 'UCX_MEMTYPE_CACHE=n' to work around memory errors when doing GPU-GPU comm	2020-02-03 15:27:36 +02:00
jpekkila	89f4d08b6c	Fixed a possible out-of-bounds access in error checking when NUM_*_PARAMS is 0	2020-01-28 18:43:03 +02:00
jpekkila	67f2fcc88d	Setting inv_dsx etc explicitly is no longer required as they are set to default values in acc/stdlib/stdderiv.h	2020-01-28 18:22:27 +02:00
jpekkila	0ccd4e3dbc	Major improvement: uniforms can now be set to default values. The syntax is the same as for setting any other values, f.ex. 'uniform Scalar a = 1; uniform Scalar b = 0.5 * a;'. Undefined uniforms are still allowed, but in this case the user should load a proper value into it during runtime. Default uniform values can be overwritten by calling any of the uniform loader functions (like acDeviceLoadScalarUniform). Also improved error checking. Now there are explicit warnings if the user tries to load an invalid value into a device constant.	2020-01-28 18:17:31 +02:00
jpekkila	6dfe3ed4d6	Added out-of-the-box support for MPI (though not enabled by default). Previously the user had to pass mpicxx explicitly as the cmake compiler in order to compile MPI code, but this was bad practice and it's better to let cmake handle the include and compilation flags.	2020-01-28 15:59:20 +02:00
jpekkila	5444c84cff	Formatting	2020-01-27 18:24:46 +02:00
jpekkila	927d4d31a5	Enabled CXX 11 support for CUDA code (required)	2020-01-27 17:04:52 +02:00
jpekkila	e751ee991b	Math operators are now using consistent precision throughout the project	2020-01-27 17:04:14 +02:00
jpekkila	2bc3f9fedd	Including when compiling Core seems to be unnecessary since we already include earlier	2020-01-24 07:31:51 +02:00
jpekkila	f8cd571323	Now CMake and compilation flags are functionally equivalent with the current master branch, not taking into account the deprecated flags. Also various small improvements to building. Deprecated flags: * BUILD_DEBUG. This was redundant since CMake also has such a flag. The build type can now be switched by passing -DCMAKE_BUILD_TYPE=<Release|Debug|RelWithDebugInfo|...> to cmake. See CMake documentation on CMAKE_BUILD_TYPE on all av * BUILD_UTILS. The utility library is now always built along the core library. We can reintroduce this flag if needed when the library grows larger. Currently MPI functions depend on Utils and without the flag we don't have to worr * BUILD_RT_VISUALIZATION. RT visualization has been dormant for a while and I'm not even sure if it works any more. Eventually the RT library should be generalized and moved to Utils at some point. Disabled the build flag for the t	2020-01-24 07:00:49 +02:00
jpekkila	a5b5e418d4	Moved all headers used throughout the library to src/common	2020-01-23 20:06:47 +02:00
jpekkila	78fbcc090d	Reordered src/core to have a better division into host and device code (this is more likely to work when compiling with mpicxx). Disabled separate compilation of CUDA kernels as this complicates compilation and is a source of many cmake/cuda bugs. As a downside, GPU code takes longer to compile.	2020-01-23 20:06:20 +02:00
jpekkila	96389e9da6	Modified standalone includes to function with new astaroth headers	2020-01-23 20:03:25 +02:00
jpekkila	3adb0242a4	src/utils is now a real library. Includable with the astaroth_utils.h header and linkable with libastaroth_utils.a. The purpose of Astaroth Utils is to function as a generic utility library in contrast to Astaroth Standalone which is essentially hardcoded only for MHD.	2020-01-23 20:02:38 +02:00
jpekkila	5de163e8d1	Added commented out pragma unrolls to remind how packing could be improved. Though at the moment unrolls actually make the performance much worse, reasons unknown.	2020-01-22 19:27:45 +02:00
jpekkila	41f8e9aebb	Removed an old inefficient function for MPI comm	2020-01-22 19:26:33 +02:00
jpekkila	caacf2b33c	Removed --restrict flag from CUDA compilation for safety	2020-01-22 19:25:26 +02:00
jpekkila	354cf81777	MPI_Request was saved to address pointing to local memory, fixed	2020-01-20 19:15:20 +02:00
jpekkila	54d91e7eeb	Removed debug synchronization from packing.cu	2020-01-20 18:58:06 +02:00
jpekkila	993bfc4533	Better concurrency and some simplifications (MPI).	2020-01-20 18:45:24 +02:00
jpekkila	765ce9a573	Some concurrency optimizations for 3D blocking	2020-01-20 17:08:23 +02:00
jpekkila	6d4f696e60	Initial implementation for parallel compute + communication	2020-01-20 16:21:19 +02:00
jpekkila	3625e9db5f	Added timing to acDeviceRunMPITest()	2020-01-20 14:54:26 +02:00
jpekkila	d034cadfac	Updated copyright years	2020-01-17 15:34:10 +02:00
jpekkila	88a4a1718d	More cleanup	2020-01-17 15:27:02 +02:00
jpekkila	ff6a7155e5	Added a simplified and cleaned up 3D decomp MPI implementation. Tested to work at least up to 2x2x2 nodes.	2020-01-17 15:22:23 +02:00
jpekkila	975a15f7f4	Removed all MPI-related code in preparation of a rewrite of the MPI stuff	2020-01-17 14:22:11 +02:00
jpekkila	9264b7515a	Working 3D decomp, unoptimized	2020-01-16 21:47:05 +02:00
jpekkila	29b38d3b89	MPI distribute and gather were incorrect, fixed. Now tested to work with 1,2, and 4 GPUs.	2020-01-16 19:12:32 +02:00
jpekkila	d7f56eeb67	Boundary conditions for 3D decomposition with MPI now working on a single node.	2020-01-16 16:34:33 +02:00
jpekkila	50bf8b7148	MPI communication of corners via CPU OK	2020-01-16 15:17:57 +02:00
jpekkila	c76c2afd5e	Merge branch 'master' into 3d-decomposition-2020-01	2020-01-16 13:21:59 +02:00
jpekkila	23efcb413f	Introduced a sample directory and moved all non-library-components from src to there	2020-01-15 16:24:38 +02:00
Miikka Vaisala	185b33980f	Forcing function bug correction.	2020-01-14 13:58:11 +08:00
jpekkila	5e1500fe97	Happy new year! :)	2020-01-13 21:38:07 +02:00
jpekkila	92a6a1bdec	Added more professional run flags to ./ac_run	2020-01-13 15:35:01 +02:00
jpekkila	794e4393c3	Added a new function for the legacy Astaroth layer: acGetNode(). This function returns a Node, which can be used to access acNode layer functions	2020-01-13 11:33:15 +02:00
jpekkila	1d315732e0	Giving up on 3D decomposition with CUDA-aware MPI. The MPI implementation on Puhti seems to be painfully bugged, the device pointers are not tracked properly in some cases (f.ex. if there's an array of structures which contain CUDA pointers). Going to implement 3D decomp the traditional way for now (communicating via the CPU). It's easy to switch to CUDA-aware MPI once Mellanox/NVIDIA/CSC have fixed their software.	2020-01-07 21:06:27 +02:00
jpekkila	299ff5cb67	All fields are now packed to simplify communication	2020-01-07 21:01:22 +02:00
jpekkila	5d60791f13	Current 3D decomp method still too complicated. Starting again from scratch.	2020-01-07 14:40:51 +02:00
jpekkila	eaee81bf06	Merge branch 'master' into 3d-decomposition-2020-01	2020-01-07 14:25:06 +02:00
jpekkila	f0208c66a6	Now compiles also for P100 by default (was removed accidentally in earlier commits)	2020-01-07 10:29:44 +00:00
jpekkila	1dbcc469fc	Allocations for packed data (MPI)	2020-01-05 18:57:14 +02:00
jpekkila	bee930b151	Merge branch 'master' into 3d-decomposition-2020-01	2020-01-05 16:48:26 +02:00
jpekkila	be7946c2af	Added the multiplication operator for int3 structures	2020-01-05 16:47:28 +02:00
jpekkila	51b48a5a36	Some intermediate MPI changes	2020-01-05 16:46:40 +02:00
jpekkila	d6c81c89fb	This 3D blocking approach is getting too complicated, removed code and trying again	2019-12-28 16:38:10 +02:00
jpekkila	e86b082c98	MPI transfer for the first corner with 3D blocking now complete. Disabled/enabled some error checking for development	2019-12-27 13:43:22 +02:00
jpekkila	bd0cc3ee20	There was some kind of mismatch between CUDA and MPI (UCX) libraries when linking with cudart. Switching to provided by cmake fixed the issue.	2019-12-27 13:41:18 +02:00

435 Commits