Commit Graph

523 Commits

Author SHA1 Message Date
jpekkila 78fbcc090d Reordered src/core for a better division between host and device code (this is more likely to work when compiling with mpicxx). Disabled separate compilation of CUDA kernels, as it complicates compilation and is a source of many CMake/CUDA bugs. As a downside, GPU code takes longer to compile. 2020-01-23 20:06:20 +02:00
jpekkila 96389e9da6 Modified standalone includes to function with new astaroth headers 2020-01-23 20:03:25 +02:00
jpekkila 3adb0242a4 src/utils is now a real library, includable with the astaroth_utils.h header and linkable as libastaroth_utils.a. The purpose of Astaroth Utils is to serve as a generic utility library, in contrast to Astaroth Standalone, which is essentially hardcoded for MHD. 2020-01-23 20:02:38 +02:00
jpekkila 5de163e8d1 Added commented-out pragma unrolls as a reminder of how packing could be improved, though at the moment the unrolls actually make performance much worse, for reasons unknown. 2020-01-22 19:27:45 +02:00
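For context, a minimal sketch of the kind of packing kernel the commented-out unrolls refer to; `AcReal`, `NUM_VTXBUF_HANDLES`, and the kernel itself are illustrative stand-ins, not the actual Astaroth code:

```cuda
// Sketch only: stand-ins for Astaroth's types and constants.
typedef double AcReal;
#define NUM_VTXBUF_HANDLES (8)

__global__ void
pack(const int3 start, const int3 dims, const int3 mesh_dims,
     AcReal* const* vtxbufs, AcReal* packed)
{
    const int idx   = blockIdx.x * blockDim.x + threadIdx.x;
    const int count = dims.x * dims.y * dims.z;
    if (idx >= count)
        return;

    // Unflatten idx into coordinates inside the block being packed
    const int i = start.x + idx % dims.x;
    const int j = start.y + (idx / dims.x) % dims.y;
    const int k = start.z + idx / (dims.x * dims.y);
    const size_t src = i + j * (size_t)mesh_dims.x +
                       k * (size_t)mesh_dims.x * mesh_dims.y;

    // #pragma unroll // left disabled: unrolling degraded performance here
    for (int w = 0; w < NUM_VTXBUF_HANDLES; ++w)
        packed[w * count + idx] = vtxbufs[w][src];
}
```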
jpekkila 41f8e9aebb Removed an old inefficient function for MPI comm 2020-01-22 19:26:33 +02:00
jpekkila caacf2b33c Removed --restrict flag from CUDA compilation for safety 2020-01-22 19:25:26 +02:00
jpekkila 354cf81777 MPI_Request was saved to an address pointing to local memory; fixed 2020-01-20 19:15:20 +02:00
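The bug class being fixed here is a classic MPI lifetime error: a request stored in a stack variable dangles once the posting function returns, so a later `MPI_Wait` reads garbage. A sketch with illustrative function names:

```c
#include <mpi.h>

static MPI_Request* saved_req;

/* Broken: the request lives in the local stack frame. */
void
post_recv_broken(double* buf, int count, int src)
{
    MPI_Request req; /* local memory! */
    MPI_Irecv(buf, count, MPI_DOUBLE, src, 0, MPI_COMM_WORLD, &req);
    saved_req = &req; /* dangling as soon as this function returns */
}

/* Fixed: the caller owns storage that outlives the communication. */
void
post_recv_fixed(double* buf, int count, int src, MPI_Request* req)
{
    MPI_Irecv(buf, count, MPI_DOUBLE, src, 0, MPI_COMM_WORLD, req);
}
```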
jpekkila 54d91e7eeb Removed debug synchronization from packing.cu 2020-01-20 18:58:06 +02:00
jpekkila 993bfc4533 Better concurrency and some simplifications (MPI). 2020-01-20 18:45:24 +02:00
jpekkila 765ce9a573 Some concurrency optimizations for 3D blocking 2020-01-20 17:08:23 +02:00
jpekkila 6d4f696e60 Initial implementation for parallel compute + communication 2020-01-20 16:21:19 +02:00
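The idea behind overlapping compute and communication: inner grid points depend only on local data, so their integration can run while halos are in flight; only the boundary shell must wait. A hypothetical skeleton (all names, stub kernels, and launch dimensions are illustrative; halos are staged through the host, as in the earlier commits):

```cuda
#include <mpi.h>

__global__ void integrate_inner() { /* stub: inner-domain stencil */ }
__global__ void integrate_outer() { /* stub: boundary-shell stencil */ }

void
step_overlapped(cudaStream_t compute, cudaStream_t comm,
                double* d_send, double* d_recv, double* h_send, double* h_recv,
                const int count, const int up, const int down)
{
    // 1) Start integrating the inner domain immediately.
    integrate_inner<<<256, 256, 0, compute>>>();

    // 2) Concurrently, exchange halos (staged through the host).
    cudaMemcpyAsync(h_send, d_send, count * sizeof(double),
                    cudaMemcpyDeviceToHost, comm);
    cudaStreamSynchronize(comm);

    MPI_Request reqs[2];
    MPI_Irecv(h_recv, count, MPI_DOUBLE, down, 0, MPI_COMM_WORLD, &reqs[0]);
    MPI_Isend(h_send, count, MPI_DOUBLE, up, 0, MPI_COMM_WORLD, &reqs[1]);
    MPI_Waitall(2, reqs, MPI_STATUSES_IGNORE);

    cudaMemcpyAsync(d_recv, h_recv, count * sizeof(double),
                    cudaMemcpyHostToDevice, comm);

    // 3) The boundary shell needs fresh halos: run it after the exchange.
    cudaStreamSynchronize(comm);
    integrate_outer<<<256, 256, 0, compute>>>();
}
```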
jpekkila 3625e9db5f Added timing to acDeviceRunMPITest() 2020-01-20 14:54:26 +02:00
jpekkila d034cadfac Updated copyright years 2020-01-17 15:34:10 +02:00
jpekkila 88a4a1718d More cleanup 2020-01-17 15:27:02 +02:00
jpekkila ff6a7155e5 Added a simplified and cleaned up 3D decomp MPI implementation. Tested to work at least up to 2x2x2 nodes. 2020-01-17 15:22:23 +02:00
jpekkila 975a15f7f4 Removed all MPI-related code in preparation of a rewrite of the MPI stuff 2020-01-17 14:22:11 +02:00
jpekkila 9264b7515a Working 3D decomp, unoptimized 2020-01-16 21:47:05 +02:00
jpekkila 29b38d3b89 MPI distribute and gather were incorrect; fixed. Now tested to work with 1, 2, and 4 GPUs. 2020-01-16 19:12:32 +02:00
jpekkila d7f56eeb67 Boundary conditions for 3D decomposition with MPI now working on a single node. 2020-01-16 16:34:33 +02:00
jpekkila 50bf8b7148 MPI communication of corners via CPU OK 2020-01-16 15:17:57 +02:00
jpekkila c76c2afd5e Merge branch 'master' into 3d-decomposition-2020-01 2020-01-16 13:21:59 +02:00
jpekkila 23efcb413f Introduced a sample directory and moved all non-library-components from src to there 2020-01-15 16:24:38 +02:00
Miikka Vaisala 185b33980f Forcing function bug correction. 2020-01-14 13:58:11 +08:00
jpekkila 5e1500fe97 Happy new year! :) 2020-01-13 21:38:07 +02:00
jpekkila 92a6a1bdec Added more professional run flags to ./ac_run 2020-01-13 15:35:01 +02:00
jpekkila 794e4393c3 Added a new function for the legacy Astaroth layer: acGetNode(). This function returns a Node, which can be used to access acNode-layer functions 2020-01-13 11:33:15 +02:00
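Assumed usage of the new accessor: `acGetNode()` comes from this commit, but the surrounding calls merely follow Astaroth's naming conventions and their exact signatures here are guesses.

```c
#include "astaroth.h"

int
main(void)
{
    AcMeshInfo info = {0};
    /* ... fill in mesh dimensions ... */
    acInit(info);

    Node node = acGetNode();                   /* hand out the legacy layer's Node */
    acNodeSynchronizeStream(node, STREAM_ALL); /* e.g. an acNode* call */

    acQuit();
    return 0;
}
```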
jpekkila 1d315732e0 Giving up on 3D decomposition with CUDA-aware MPI. The MPI implementation on Puhti seems to be painfully bugged: device pointers are not tracked properly in some cases (e.g. if there's an array of structures containing CUDA pointers). Going to implement the 3D decomposition the traditional way for now (communicating via the CPU). It's easy to switch to CUDA-aware MPI once Mellanox/NVIDIA/CSC have fixed their software. 2020-01-07 21:06:27 +02:00
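With CUDA-aware MPI one would pass device pointers straight to `MPI_Isend`/`MPI_Irecv`; the traditional fallback described here never hands MPI a device pointer and instead stages through (ideally pinned) host buffers. A sketch with illustrative names:

```cuda
#include <mpi.h>

void
exchange_via_host(const double* d_send, double* d_recv, double* h_send,
                  double* h_recv, const int count, const int peer)
{
    // Device -> host staging buffer
    cudaMemcpy(h_send, d_send, count * sizeof(double), cudaMemcpyDeviceToHost);

    // Plain host-to-host MPI; no device pointers involved
    MPI_Sendrecv(h_send, count, MPI_DOUBLE, peer, 0,
                 h_recv, count, MPI_DOUBLE, peer, 0,
                 MPI_COMM_WORLD, MPI_STATUS_IGNORE);

    // Host staging buffer -> device
    cudaMemcpy(d_recv, h_recv, count * sizeof(double), cudaMemcpyHostToDevice);
}
```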
jpekkila 299ff5cb67 All fields are now packed to simplify communication 2020-01-07 21:01:22 +02:00
jpekkila 5d60791f13 Current 3D decomp method still too complicated. Starting again from scratch. 2020-01-07 14:40:51 +02:00
jpekkila eaee81bf06 Merge branch 'master' into 3d-decomposition-2020-01 2020-01-07 14:25:06 +02:00
jpekkila f0208c66a6 Now compiles also for P100 by default (was removed accidentally in earlier commits) 2020-01-07 10:29:44 +00:00
jpekkila 1dbcc469fc Allocations for packed data (MPI) 2020-01-05 18:57:14 +02:00
jpekkila bee930b151 Merge branch 'master' into 3d-decomposition-2020-01 2020-01-05 16:48:26 +02:00
jpekkila be7946c2af Added the multiplication operator for int3 structures 2020-01-05 16:47:28 +02:00
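The plausible form of the added operator is a component-wise product, which is the natural reading for decomposition arithmetic:

```cuda
#include <cuda_runtime.h> // int3, make_int3

static __host__ __device__ inline int3
operator*(const int3& a, const int3& b)
{
    // Component-wise product
    return make_int3(a.x * b.x, a.y * b.y, a.z * b.z);
}
```

This kind of operator is handy when, say, converting a process's 3D grid coordinates into a mesh offset: `const int3 offset = pid3d * subgrid_dims;` (names hypothetical).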
jpekkila 51b48a5a36 Some intermediate MPI changes 2020-01-05 16:46:40 +02:00
jpekkila d6c81c89fb This 3D blocking approach is getting too complicated; removed the code and trying again 2019-12-28 16:38:10 +02:00
jpekkila e86b082c98 MPI transfer for the first corner with 3D blocking now complete. Disabled/enabled some error checking for development 2019-12-27 13:43:22 +02:00
jpekkila bd0cc3ee20 There was some kind of mismatch between the CUDA and MPI (UCX) libraries when linking with cudart. Switching to the cudart provided by CMake fixed the issue. 2019-12-27 13:41:18 +02:00
jpekkila 6b5910f7df Added allocations for the packed buffers 2019-12-21 19:00:35 +02:00
jpekkila 57a1f3e30c Added a generic pack/unpack function 2019-12-21 16:20:40 +02:00
jpekkila e4f7214b3a benchmark.cc edited online with Bitbucket 2019-12-21 11:26:54 +00:00
jpekkila 3ecd47fe8b Merge branch 'master' into 3d-decomposition-2020-01 2019-12-21 13:22:45 +02:00
jpekkila 35b56029cf Build failed with single precision; added the correct casts to modelsolver.c 2019-12-21 13:21:56 +02:00
jpekkila 4d873caf38 Changed the utils CMakeLists.txt to modern CMake style 2019-12-21 13:16:08 +02:00
jpekkila bad64f5307 Started the 3D decomposition branch. Four tasks: 1) determine how to distribute the work given n processes, 2) distribute and gather the mesh to/from these processes, 3) create packing/unpacking functions, and 4) transfer packed data blocks between neighbors. Tasks 1 and 2 done with this commit. 2019-12-21 12:37:01 +02:00
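A sketch of what task 1 amounts to, assuming the global grid `nn` is divisible by the process-grid dimensions (names are illustrative; `MPI_Dims_create` picks a balanced 3D grid for the given process count):

```c
#include <mpi.h>

void
decompose(const int nprocs, const int pid, const int nn[3],
          int local_dims[3], int offset[3])
{
    int p[3] = {0, 0, 0}; // zeros let MPI choose all three dimensions
    MPI_Dims_create(nprocs, 3, p);

    // Map the linear rank to 3D process-grid coordinates (x fastest)
    const int px = pid % p[0];
    const int py = (pid / p[0]) % p[1];
    const int pz = pid / (p[0] * p[1]);

    // Each rank owns an (nn/p)-sized block starting at its offset
    for (int i = 0; i < 3; ++i)
        local_dims[i] = nn[i] / p[i];

    offset[0] = px * local_dims[0];
    offset[1] = py * local_dims[1];
    offset[2] = pz * local_dims[2];
}
```

Task 2 then reduces to shipping each rank the subarray at its offset (for instance with `MPI_Scatterv`, or point-to-point sends per vertex buffer) and reversing the mapping on gather.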
jpekkila ecff5c3041 Added some final changes to benchmarking 2019-12-15 21:47:41 +02:00
jpekkila 8bd81db63c Added CPU parallelization to make CPU integration and boundconds faster 2019-12-14 15:45:42 +02:00
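The likely shape of this CPU-side speedup is a parallel-for over the outer loops of the host integration and boundary-condition routines; the snippet below (periodic x-boundaries with OpenMP) is an assumption-laden illustration, not the actual modelsolver.c code:

```c
#include <omp.h>

void
periodic_boundconds_x(double* f, const int mx, const int my, const int mz,
                      const int ng)
{
    // Parallelize over the two outer loops; iterations are independent
    #pragma omp parallel for collapse(2)
    for (int k = 0; k < mz; ++k)
        for (int j = 0; j < my; ++j)
            for (int i = 0; i < ng; ++i) {
                const size_t row = (size_t)j * mx + (size_t)k * mx * my;
                f[row + i]           = f[row + mx - 2 * ng + i]; // left ghosts
                f[row + mx - ng + i] = f[row + ng + i];          // right ghosts
            }
}
```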
jpekkila ff35d78509 Rewrote the MPI benchmark-verification function 2019-12-14 15:26:19 +02:00
jpekkila f0e77181df Benchmark finetuning 2019-12-14 14:52:06 +02:00
jpekkila b8a997b0ab Added code for doing a proper verification run with MPI. Passes nicely with full MHD + upwinding when using the new utility functions introduced in the previous commits. Note: forcing is not enabled in the utility library by default. 2019-12-14 07:37:59 +02:00