jpekkila
9c5011d275
Renamed t to terr to avoid naming conflicts
2020-03-30 17:41:09 +03:00
jpekkila
cc64968b9e
GPUDirect was off, re-enabled
2020-03-26 18:24:42 +02:00
jpekkila
28792770f2
Better overlap of computation and communication when the inner integration is launched first
2020-03-26 18:00:01 +02:00
jpekkila
4c82e3c563
Removed old debug error check
2020-03-26 17:59:29 +02:00
jpekkila
08f567619a
Removed old unused functions for MPI integration and comm
2020-03-26 15:04:57 +02:00
jpekkila
ed7cf3f540
Added a production-ready interface for doing multi-node runs with Astaroth with MPI
2020-03-26 15:02:37 +02:00
jpekkila
dad84b361f
Renamed Grid structure to GridDims structure to avoid confusion with MPI Grids used in device.cc
2020-03-26 15:01:33 +02:00
jpekkila
db120c129e
Modelsolver now computes any built-in parameters automatically instead of relying on the user to supply them (inv_dsx etc.)
2020-03-26 14:59:07 +02:00
jpekkila
fbd4b9a385
Made the MPI flag global instead of just core
2020-03-26 14:57:22 +02:00
jpekkila
e1bec4459b
Removed an unused variable
2020-03-25 13:54:43 +02:00
jpekkila
e36ee7e2d6
AC_multigpu_offset tested to work on at least 2 nodes and 8 GPUs. Forcing should now work with MPI
2020-03-25 13:51:00 +02:00
jpekkila
672137f7f1
WIP further MPI optimizations
2020-03-24 19:02:58 +02:00
jpekkila
ef63813679
Explicit check that critical parameters like inv_dsx are properly initialized before calling integration
2020-03-24 17:01:24 +02:00
jpekkila
8c362b44f0
Added more warnings in case some of the model solver parameters are not initialized
2020-03-24 16:56:30 +02:00
jpekkila
d520835c42
Added integration to MPI comm, now completes a full integration step. Works at least on 2 nodes
2020-03-24 16:55:38 +02:00
jpekkila
7b39a6bb1d
AC_multigpu_offset is now calculated with MPI. Should now work with forcing, but not tested
2020-02-03 15:45:23 +02:00
jpekkila
50af620a7b
More accurate timing when benchmarking MPI. Also made GPU-GPU communication the default. The current version of UCX is bugged: one must export 'UCX_MEMTYPE_CACHE=n' to work around memory errors when doing GPU-GPU comm
2020-02-03 15:27:36 +02:00
jpekkila
89f4d08b6c
Fixed a possible out-of-bounds access in error checking when NUM_*_PARAMS is 0
2020-01-28 18:43:03 +02:00
jpekkila
67f2fcc88d
Setting inv_dsx etc explicitly is no longer required as they are set to default values in acc/stdlib/stdderiv.h
2020-01-28 18:22:27 +02:00
jpekkila
0ccd4e3dbc
Major improvement: uniforms can now be set to default values. The syntax is the same as for setting any other values, e.g. 'uniform Scalar a = 1; uniform Scalar b = 0.5 * a;'. Undefined uniforms are still allowed, but in this case the user should load a proper value into them at runtime. Default uniform values can be overwritten by calling any of the uniform loader functions (like acDeviceLoadScalarUniform). Also improved error checking: there are now explicit warnings if the user tries to load an invalid value into a device constant.
2020-01-28 18:17:31 +02:00
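The behaviour described in this commit (defaults, runtime override, warning on invalid values) can be modelled on the host with a small sketch. Everything here is hypothetical: `UniformTable`, `load`, `is_set`, and `make_defaults` are illustrative names and do not reproduce acDeviceLoadScalarUniform or any other Astaroth function. Treating "invalid" as "non-finite" is also an assumption.

```cpp
#include <cmath>
#include <cstdio>
#include <map>
#include <string>

// Hypothetical stand-in for a table of device constants (not Astaroth's API).
struct UniformTable {
    std::map<std::string, double> values;

    // Loads a value, overwriting any default. Non-finite values are rejected
    // with a warning (assumption: "invalid" means NaN or infinity).
    bool load(const std::string& name, double value)
    {
        if (!std::isfinite(value)) {
            std::fprintf(stderr, "Warning: invalid value for uniform '%s'\n",
                         name.c_str());
            return false;
        }
        values[name] = value;
        return true;
    }

    // A uniform without a default stays unset until the user loads a value.
    bool is_set(const std::string& name) const
    {
        return values.count(name) != 0;
    }
};

// Mirrors 'uniform Scalar a = 1; uniform Scalar b = 0.5 * a;' from the
// commit message: b's default is computed from a's default.
inline UniformTable make_defaults()
{
    UniformTable table;
    table.load("a", 1.0);
    table.load("b", 0.5 * table.values.at("a"));
    return table;
}
```

A caller would build the table with `make_defaults()`, then call `load` only for the uniforms whose defaults need replacing, checking `is_set` for any uniform that was declared without a default.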
jpekkila
6dfe3ed4d6
Added out-of-the-box support for MPI (though not enabled by default). Previously the user had to pass mpicxx explicitly as the cmake compiler in order to compile MPI code, but this was bad practice and it's better to let cmake handle the include and compilation flags.
2020-01-28 15:59:20 +02:00
jpekkila
5444c84cff
Formatting
2020-01-27 18:24:46 +02:00
jpekkila
927d4d31a5
Enabled CXX 11 support for CUDA code (required)
2020-01-27 17:04:52 +02:00
jpekkila
e751ee991b
Math operators are now using consistent precision throughout the project
2020-01-27 17:04:14 +02:00
jpekkila
2bc3f9fedd
Including when compiling Core seems to be unnecessary since we already include earlier
2020-01-24 07:31:51 +02:00
jpekkila
f8cd571323
Now CMake and compilation flags are functionally equivalent with the current master branch, not taking into account the deprecated flags. Also various small improvements to building.
...
Deprecated flags:
* BUILD_DEBUG. This was redundant since CMake also has such a flag. The build type can now be switched by passing -DCMAKE_BUILD_TYPE=<Release|Debug|RelWithDebInfo|...> to cmake. See CMake documentation on CMAKE_BUILD_TYPE on all av
* BUILD_UTILS. The utility library is now always built along the core library. We can reintroduce this flag if needed when the library grows larger. Currently MPI functions depend on Utils and without the flag we don't have to worr
* BUILD_RT_VISUALIZATION. RT visualization has been dormant for a while and I'm not even sure if it works any more. Eventually the RT library should be generalized and moved to Utils at some point. Disabled the build flag for the t
2020-01-24 07:00:49 +02:00
jpekkila
a5b5e418d4
Moved all headers used throughout the library to src/common
2020-01-23 20:06:47 +02:00
jpekkila
78fbcc090d
Reordered src/core for a cleaner division into host and device code (this is more likely to work when compiling with mpicxx). Disabled separate compilation of CUDA kernels as it complicates compilation and is a source of many cmake/cuda bugs. As a downside, GPU code takes longer to compile.
2020-01-23 20:06:20 +02:00
jpekkila
96389e9da6
Modified standalone includes to function with new astaroth headers
2020-01-23 20:03:25 +02:00
jpekkila
3adb0242a4
src/utils is now a real library. It can be included with the astaroth_utils.h header and linked with libastaroth_utils.a. The purpose of Astaroth Utils is to function as a generic utility library, in contrast to Astaroth Standalone, which is essentially hardcoded only for MHD.
2020-01-23 20:02:38 +02:00
jpekkila
5de163e8d1
Added commented-out pragma unrolls as a reminder of how packing could be improved. At the moment, however, the unrolls make performance much worse, for unknown reasons.
2020-01-22 19:27:45 +02:00
jpekkila
41f8e9aebb
Removed an old inefficient function for MPI comm
2020-01-22 19:26:33 +02:00
jpekkila
caacf2b33c
Removed --restrict flag from CUDA compilation for safety
2020-01-22 19:25:26 +02:00
jpekkila
354cf81777
MPI_Request was saved to an address pointing to local memory, fixed
2020-01-20 19:15:20 +02:00
jpekkila
54d91e7eeb
Removed debug synchronization from packing.cu
2020-01-20 18:58:06 +02:00
jpekkila
993bfc4533
Better concurrency and some simplifications (MPI).
2020-01-20 18:45:24 +02:00
jpekkila
765ce9a573
Some concurrency optimizations for 3D blocking
2020-01-20 17:08:23 +02:00
jpekkila
6d4f696e60
Initial implementation for parallel compute + communication
2020-01-20 16:21:19 +02:00
jpekkila
3625e9db5f
Added timing to acDeviceRunMPITest()
2020-01-20 14:54:26 +02:00
jpekkila
d034cadfac
Updated copyright years
2020-01-17 15:34:10 +02:00
jpekkila
88a4a1718d
More cleanup
2020-01-17 15:27:02 +02:00
jpekkila
ff6a7155e5
Added a simplified and cleaned up 3D decomp MPI implementation. Tested to work at least up to 2x2x2 nodes.
2020-01-17 15:22:23 +02:00
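For readers unfamiliar with 3D decomposition, the index bookkeeping behind a 2x2x2 layout can be sketched as below. This is a minimal illustration under stated assumptions: it uses a plain x-fastest linear mapping with periodic wrap-around, and `Int3`, `rank_to_coords`, `wrap`, and `coords_to_rank` are hypothetical names. The mapping actually used in this commit is not stated in the log and may differ.

```cpp
// Hypothetical 3D integer triple (not Astaroth's type).
struct Int3 {
    int x, y, z;
};

// Maps a linear rank to coordinates in a dims.x * dims.y * dims.z process
// grid, with x varying fastest (assumption).
inline Int3 rank_to_coords(int rank, Int3 dims)
{
    return Int3{rank % dims.x, (rank / dims.x) % dims.y,
                rank / (dims.x * dims.y)};
}

// Wraps i into [0, n), also for negative i (periodic boundaries).
inline int wrap(int i, int n)
{
    return ((i % n) + n) % n;
}

// Inverse mapping. Coordinates outside the grid wrap around, so this can be
// used directly to find the rank of a neighbour such as {c.x + 1, c.y, c.z}.
inline int coords_to_rank(Int3 c, Int3 dims)
{
    const int x = wrap(c.x, dims.x);
    const int y = wrap(c.y, dims.y);
    const int z = wrap(c.z, dims.z);
    return x + y * dims.x + z * dims.x * dims.y;
}
```

With these two functions each process can compute the ranks of its face, edge, and corner neighbours by offsetting its own coordinates by -1, 0, or +1 in each direction.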
jpekkila
975a15f7f4
Removed all MPI-related code in preparation of a rewrite of the MPI stuff
2020-01-17 14:22:11 +02:00
jpekkila
9264b7515a
Working 3D decomp, unoptimized
2020-01-16 21:47:05 +02:00
jpekkila
29b38d3b89
MPI distribute and gather were incorrect, fixed. Now tested to work with 1, 2, and 4 GPUs.
2020-01-16 19:12:32 +02:00
jpekkila
d7f56eeb67
Boundary conditions for 3D decomposition with MPI now working on a single node.
2020-01-16 16:34:33 +02:00
jpekkila
50bf8b7148
MPI communication of corners via CPU OK
2020-01-16 15:17:57 +02:00
jpekkila
c76c2afd5e
Merge branch 'master' into 3d-decomposition-2020-01
2020-01-16 13:21:59 +02:00
jpekkila
23efcb413f
Introduced a sample directory and moved all non-library-components from src to there
2020-01-15 16:24:38 +02:00
Miikka Vaisala
185b33980f
Forcing function bug correction.
2020-01-14 13:58:11 +08:00