jpekkila
|
5de163e8d1
|
Added commented-out pragma unrolls as a reminder of how packing could be improved. At the moment, though, the unrolls actually make performance much worse, for reasons unknown.
|
2020-01-22 19:27:45 +02:00 |
|
jpekkila
|
41f8e9aebb
|
Removed an old inefficient function for MPI comm
|
2020-01-22 19:26:33 +02:00 |
|
jpekkila
|
caacf2b33c
|
Removed --restrict flag from CUDA compilation for safety
|
2020-01-22 19:25:26 +02:00 |
|
jpekkila
|
354cf81777
|
MPI_Request was saved to an address pointing to local memory, fixed
|
2020-01-20 19:15:20 +02:00 |
|
jpekkila
|
54d91e7eeb
|
Removed debug synchronization from packing.cu
|
2020-01-20 18:58:06 +02:00 |
|
jpekkila
|
993bfc4533
|
Better concurrency and some simplifications (MPI).
|
2020-01-20 18:45:24 +02:00 |
|
jpekkila
|
765ce9a573
|
Some concurrency optimizations for 3D blocking
|
2020-01-20 17:08:23 +02:00 |
|
jpekkila
|
6d4f696e60
|
Initial implementation for parallel compute + communication
|
2020-01-20 16:21:19 +02:00 |
|
jpekkila
|
3625e9db5f
|
Added timing to acDeviceRunMPITest()
|
2020-01-20 14:54:26 +02:00 |
|
jpekkila
|
d034cadfac
|
Updated copyright years
|
2020-01-17 15:34:10 +02:00 |
|
jpekkila
|
88a4a1718d
|
More cleanup
|
2020-01-17 15:27:02 +02:00 |
|
jpekkila
|
ff6a7155e5
|
Added a simplified and cleaned up 3D decomp MPI implementation. Tested to work at least up to 2x2x2 nodes.
|
2020-01-17 15:22:23 +02:00 |
|
jpekkila
|
975a15f7f4
|
Removed all MPI-related code in preparation of a rewrite of the MPI stuff
|
2020-01-17 14:22:11 +02:00 |
|
jpekkila
|
9264b7515a
|
Working 3D decomp, unoptimized
|
2020-01-16 21:47:05 +02:00 |
|
jpekkila
|
29b38d3b89
|
MPI distribute and gather were incorrect, fixed. Now tested to work with 1, 2, and 4 GPUs.
|
2020-01-16 19:12:32 +02:00 |
|
jpekkila
|
d7f56eeb67
|
Boundary conditions for 3D decomposition with MPI now working on a single node.
|
2020-01-16 16:34:33 +02:00 |
|
jpekkila
|
50bf8b7148
|
MPI communication of corners via CPU OK
|
2020-01-16 15:17:57 +02:00 |
|
jpekkila
|
c76c2afd5e
|
Merge branch 'master' into 3d-decomposition-2020-01
|
2020-01-16 13:21:59 +02:00 |
|
jpekkila
|
23efcb413f
|
Introduced a sample directory and moved all non-library-components from src to there
|
2020-01-15 16:24:38 +02:00 |
|
Miikka Vaisala
|
185b33980f
|
Forcing function bug correction.
|
2020-01-14 13:58:11 +08:00 |
|
jpekkila
|
5e1500fe97
|
Happy new year! :)
|
2020-01-13 21:38:07 +02:00 |
|
jpekkila
|
92a6a1bdec
|
Added more professional run flags to ./ac_run
|
2020-01-13 15:35:01 +02:00 |
|
jpekkila
|
794e4393c3
|
Added a new function for the legacy Astaroth layer: acGetNode(). This function returns a Node, which can be used to access acNode layer functions.
|
2020-01-13 11:33:15 +02:00 |
|
jpekkila
|
1d315732e0
|
Giving up on 3D decomposition with CUDA-aware MPI. The MPI implementation on Puhti seems to be painfully bugged: the device pointers are not tracked properly in some cases (e.g. if there's an array of structures which contain CUDA pointers). Going to implement 3D decomp the traditional way for now (communicating via the CPU). It's easy to switch to CUDA-aware MPI once Mellanox/NVIDIA/CSC have fixed their software.
|
2020-01-07 21:06:27 +02:00 |
|
jpekkila
|
299ff5cb67
|
All fields are now packed to simplify communication
|
2020-01-07 21:01:22 +02:00 |
|
jpekkila
|
5d60791f13
|
Current 3D decomp method still too complicated. Starting again from scratch.
|
2020-01-07 14:40:51 +02:00 |
|
jpekkila
|
eaee81bf06
|
Merge branch 'master' into 3d-decomposition-2020-01
|
2020-01-07 14:25:06 +02:00 |
|
jpekkila
|
f0208c66a6
|
Now also compiles for P100 by default (this was removed accidentally in earlier commits)
|
2020-01-07 10:29:44 +00:00 |
|
jpekkila
|
1dbcc469fc
|
Allocations for packed data (MPI)
|
2020-01-05 18:57:14 +02:00 |
|
jpekkila
|
bee930b151
|
Merge branch 'master' into 3d-decomposition-2020-01
|
2020-01-05 16:48:26 +02:00 |
|
jpekkila
|
be7946c2af
|
Added the multiplication operator for int3 structures
|
2020-01-05 16:47:28 +02:00 |
|
jpekkila
|
51b48a5a36
|
Some intermediate MPI changes
|
2020-01-05 16:46:40 +02:00 |
|
jpekkila
|
d6c81c89fb
|
This 3D blocking approach is getting too complicated, removed code and trying again
|
2019-12-28 16:38:10 +02:00 |
|
jpekkila
|
e86b082c98
|
MPI transfer for the first corner with 3D blocking now complete. Disabled/enabled some error checking for development
|
2019-12-27 13:43:22 +02:00 |
|
jpekkila
|
bd0cc3ee20
|
There was some kind of mismatch between CUDA and MPI (UCX) libraries when linking with cudart. Switching to provided by cmake fixed the issue.
|
2019-12-27 13:41:18 +02:00 |
|
jpekkila
|
6b5910f7df
|
Added allocations for the packed buffers
|
2019-12-21 19:00:35 +02:00 |
|
jpekkila
|
57a1f3e30c
|
Added a generic pack/unpack function
|
2019-12-21 16:20:40 +02:00 |
|
jpekkila
|
e4f7214b3a
|
benchmark.cc edited online with Bitbucket
|
2019-12-21 11:26:54 +00:00 |
|
jpekkila
|
3ecd47fe8b
|
Merge branch 'master' into 3d-decomposition-2020-01
|
2019-12-21 13:22:45 +02:00 |
|
jpekkila
|
35b56029cf
|
Build failed with single-precision, added the correct casts to modelsolver.c
|
2019-12-21 13:21:56 +02:00 |
|
jpekkila
|
4d873caf38
|
Changed utils CMakeLists.txt to modern CMake style
|
2019-12-21 13:16:08 +02:00 |
|
jpekkila
|
bad64f5307
|
Started the 3D decomposition branch. Four tasks: 1) Determine how to distribute the work given n processes 2) Distribute and gather the mesh to/from these processes 3) Create packing/unpacking functions and 4) Transfer packed data blocks between neighbors. Tasks 1 and 2 done with this commit.
|
2019-12-21 12:37:01 +02:00 |
|
jpekkila
|
ecff5c3041
|
Added some final changes to benchmarking
|
2019-12-15 21:47:41 +02:00 |
|
jpekkila
|
8bd81db63c
|
Added CPU parallelization to make CPU integration and boundconds faster
|
2019-12-14 15:45:42 +02:00 |
|
jpekkila
|
ff35d78509
|
Rewrote the MPI benchmark-verification function
|
2019-12-14 15:26:19 +02:00 |
|
jpekkila
|
f0e77181df
|
Benchmark finetuning
|
2019-12-14 14:52:06 +02:00 |
|
jpekkila
|
b8a997b0ab
|
Added code for doing a proper verification run with MPI. Passes nicely with full MHD + upwinding when using the new utility stuff introduced in the previous commits. Note: forcing is not enabled in the utility library by default.
|
2019-12-14 07:37:59 +02:00 |
|
jpekkila
|
277905aafb
|
Added a model integrator to the utility library (written in pure C). Requires support for AVX vector instructions.
|
2019-12-14 07:34:33 +02:00 |
|
jpekkila
|
22a3105068
|
Finished the latest version of autotesting (utility library). Uses ulps to determine the acceptable error instead of the relative error used previously
|
2019-12-14 07:27:11 +02:00 |
|
jpekkila
|
5ec2f6ad75
|
Better wording in config_loader.c
|
2019-12-14 07:23:25 +02:00 |
|