420 Commits

Author SHA1 Message Date
jpekkila
5de163e8d1 Added commented-out pragma unrolls as a reminder of how packing could be improved. At the moment, though, the unrolls actually make performance much worse, for reasons unknown. 2020-01-22 19:27:45 +02:00
jpekkila
41f8e9aebb Removed an old inefficient function for MPI comm 2020-01-22 19:26:33 +02:00
jpekkila
caacf2b33c Removed --restrict flag from CUDA compilation for safety 2020-01-22 19:25:26 +02:00
jpekkila
354cf81777 MPI_Request was saved to an address pointing to local memory, fixed 2020-01-20 19:15:20 +02:00
jpekkila
54d91e7eeb Removed debug synchronization from packing.cu 2020-01-20 18:58:06 +02:00
jpekkila
993bfc4533 Better concurrency and some simplifications (MPI). 2020-01-20 18:45:24 +02:00
jpekkila
765ce9a573 Some concurrency optimizations for 3D blocking 2020-01-20 17:08:23 +02:00
jpekkila
6d4f696e60 Initial implementation for parallel compute + communication 2020-01-20 16:21:19 +02:00
jpekkila
3625e9db5f Added timing to acDeviceRunMPITest() 2020-01-20 14:54:26 +02:00
jpekkila
d034cadfac Updated copyright years 2020-01-17 15:34:10 +02:00
jpekkila
88a4a1718d More cleanup 2020-01-17 15:27:02 +02:00
jpekkila
ff6a7155e5 Added a simplified and cleaned up 3D decomp MPI implementation. Tested to work at least up to 2x2x2 nodes. 2020-01-17 15:22:23 +02:00
jpekkila
975a15f7f4 Removed all MPI-related code in preparation of a rewrite of the MPI stuff 2020-01-17 14:22:11 +02:00
jpekkila
9264b7515a Working 3D decomp, unoptimized 2020-01-16 21:47:05 +02:00
jpekkila
29b38d3b89 MPI distribute and gather were incorrect, fixed. Now tested to work with 1, 2, and 4 GPUs. 2020-01-16 19:12:32 +02:00
jpekkila
d7f56eeb67 Boundary conditions for 3D decomposition with MPI now working on a single node. 2020-01-16 16:34:33 +02:00
jpekkila
50bf8b7148 MPI communication of corners via CPU OK 2020-01-16 15:17:57 +02:00
jpekkila
c76c2afd5e Merge branch 'master' into 3d-decomposition-2020-01 2020-01-16 13:21:59 +02:00
jpekkila
23efcb413f Introduced a sample directory and moved all non-library-components from src to there 2020-01-15 16:24:38 +02:00
Miikka Vaisala
185b33980f Forcing function bug correction. 2020-01-14 13:58:11 +08:00
jpekkila
5e1500fe97 Happy new year! :) 2020-01-13 21:38:07 +02:00
jpekkila
92a6a1bdec Added more professional run flags to ./ac_run 2020-01-13 15:35:01 +02:00
jpekkila
794e4393c3 Added a new function for the legacy Astaroth layer: acGetNode(). This function returns a Node, which can be used to access acNode layer functions 2020-01-13 11:33:15 +02:00
jpekkila
1d315732e0 Giving up on 3D decomposition with CUDA-aware MPI. The MPI implementation on Puhti seems to be painfully bugged: the device pointers are not tracked properly in some cases (e.g. if there's an array of structures which contain CUDA pointers). Going to implement 3D decomp the traditional way for now (communicating via the CPU). It's easy to switch to CUDA-aware MPI once Mellanox/NVIDIA/CSC have fixed their software. 2020-01-07 21:06:27 +02:00
jpekkila
299ff5cb67 All fields are now packed to simplify communication 2020-01-07 21:01:22 +02:00
jpekkila
5d60791f13 Current 3D decomp method still too complicated. Starting again from scratch. 2020-01-07 14:40:51 +02:00
jpekkila
eaee81bf06 Merge branch 'master' into 3d-decomposition-2020-01 2020-01-07 14:25:06 +02:00
jpekkila
f0208c66a6 Now compiles also for P100 by default (was removed accidentally in earlier commits) 2020-01-07 10:29:44 +00:00
jpekkila
1dbcc469fc Allocations for packed data (MPI) 2020-01-05 18:57:14 +02:00
jpekkila
bee930b151 Merge branch 'master' into 3d-decomposition-2020-01 2020-01-05 16:48:26 +02:00
jpekkila
be7946c2af Added the multiplication operator for int3 structures 2020-01-05 16:47:28 +02:00
jpekkila
51b48a5a36 Some intermediate MPI changes 2020-01-05 16:46:40 +02:00
jpekkila
d6c81c89fb This 3D blocking approach is getting too complicated, removed code and trying again 2019-12-28 16:38:10 +02:00
jpekkila
e86b082c98 MPI transfer for the first corner with 3D blocking now complete. Disabled/enabled some error checking for development 2019-12-27 13:43:22 +02:00
jpekkila
bd0cc3ee20 There was some kind of mismatch between CUDA and MPI (UCX) libraries when linking with cudart. Switching to provided by cmake fixed the issue. 2019-12-27 13:41:18 +02:00
jpekkila
6b5910f7df Added allocations for the packed buffers 2019-12-21 19:00:35 +02:00
jpekkila
57a1f3e30c Added a generic pack/unpack function 2019-12-21 16:20:40 +02:00
jpekkila
e4f7214b3a benchmark.cc edited online with Bitbucket 2019-12-21 11:26:54 +00:00
jpekkila
3ecd47fe8b Merge branch 'master' into 3d-decomposition-2020-01 2019-12-21 13:22:45 +02:00
jpekkila
35b56029cf Build failed with single-precision, added the correct casts to modelsolver.c 2019-12-21 13:21:56 +02:00
jpekkila
4d873caf38 Changed utils CMakeLists.txt to modern cmake style 2019-12-21 13:16:08 +02:00
jpekkila
bad64f5307 Started the 3D decomposition branch. Four tasks: 1) Determine how to distribute the work given n processes 2) Distribute and gather the mesh to/from these processes 3) Create packing/unpacking functions and 4) Transfer packed data blocks between neighbors. Tasks 1 and 2 done with this commit. 2019-12-21 12:37:01 +02:00
jpekkila
ecff5c3041 Added some final changes to benchmarking 2019-12-15 21:47:41 +02:00
jpekkila
8bd81db63c Added CPU parallelization to make CPU integration and boundconds faster 2019-12-14 15:45:42 +02:00
jpekkila
ff35d78509 Rewrote the MPI benchmark-verification function 2019-12-14 15:26:19 +02:00
jpekkila
f0e77181df Benchmark finetuning 2019-12-14 14:52:06 +02:00
jpekkila
b8a997b0ab Added code for doing a proper verification run with MPI. Passes nicely with full MHD + upwinding when using the new utility stuff introduced in the previous commits. Note: forcing is not enabled in the utility library by default. 2019-12-14 07:37:59 +02:00
jpekkila
277905aafb Added a model integrator to the utility library (written in pure C). Requires support for AVX vector instructions. 2019-12-14 07:34:33 +02:00
jpekkila
22a3105068 Finished the latest version of autotesting (utility library). Uses ulps to determine the acceptable error instead of the relative error used previously 2019-12-14 07:27:11 +02:00
jpekkila
5ec2f6ad75 Better wording in config_loader.c 2019-12-14 07:23:25 +02:00