Miikka Vaisala
|
543c565e5d
|
Dummy at the moment, but now the boundary condition kernel caller can see what vertex buffer name is in use.
|
2020-11-23 15:43:52 +08:00 |
|
Miikka Vaisala
|
efd3cc40cd
|
Compiles without the API function call.
|
2020-11-20 14:11:14 +08:00 |
|
Miikka Vaisala
|
cb15668f2d
|
Figuring out compilations.
|
2020-11-20 11:58:15 +08:00 |
|
Miikka Vaisala
|
cf6b21e3ab
|
Created a separate acGridIntegrateNonperiodic() from acGridIntegrate(). This is to avoid some potential issues with an upcoming merge by Oskar.
|
2020-10-06 17:49:36 +08:00 |
|
Miikka Vaisala
|
711cc4d350
|
Moving code in wrong place.
|
2020-09-29 16:36:43 +08:00 |
|
Miikka Vaisala
|
9386129f1b
|
Cleaning and improving the boundary condition draft.
|
2020-09-29 16:31:16 +08:00 |
|
Miikka Vaisala
|
2f85cbba1a
|
Last-minute modification before the meeting.
|
2020-09-21 15:54:33 +08:00 |
|
Miikka Vaisala
|
1868031f1e
|
Working on marking the active edges.
|
2020-09-18 17:25:34 +08:00 |
|
Miikka Vaisala
|
f736aa1cd1
|
Attempting to make kernels go where they should.
|
2020-09-18 16:55:36 +08:00 |
|
Miikka Vaisala
|
67aa87731b
|
Crafting code. Attempting to figure out the MPI domain decomposition etc.
|
2020-09-18 15:25:14 +08:00 |
|
Miikka Vaisala
|
94d1d053bc
|
ReduceVecScal calls added. For Alfven speeds.
|
2020-09-11 15:54:53 +08:00 |
|
jpekkila
|
4052120f52
|
dt wasn't propagated properly to all GPUs when computing integration steps, fixed
|
2020-08-24 18:33:54 +03:00 |
|
jpekkila
|
8199ff914f
|
Added acGridLoadScalarUniform and acGridLoadVectorUniform functions for loading specific device constants with MPI
|
2020-08-24 17:19:02 +03:00 |
|
jpekkila
|
d966afe830
|
Added VERBOSE CMake option and made various prints optional to clean the output. VERBOSE is off by default; pass cmake -DVERBOSE=ON to re-enable various non-critical warnings and status prints (important warnings are still visible regardless of the flag).
|
2020-08-21 21:19:42 +03:00 |
|
jpekkila
|
56273433fe
|
Fixed inconsistency in the acGridLoad parameter order
|
2020-08-21 14:40:11 +03:00 |
|
jpekkila
|
a85b8b8cd1
|
MPI: corners are now transferred by default because 1) with those Astaroth works with any symmetric stencil and 2) corners have very small impact on performance. Also disabled resetting the device so that one could potentially assign many subgrids to one GPU (e.g. in AMR)
|
2020-08-19 12:05:20 +03:00 |
|
jpekkila
|
3afab77533
|
Removed astaroth_utils from astaroth_core dependencies
|
2020-07-29 19:58:21 +03:00 |
|
jpekkila
|
003c202e8c
|
Pulled useful changes from the benchmark branch. GPUDirect RDMA (unpinned) is now the default for MPI communication.
|
2020-07-29 16:39:24 +03:00 |
|
jpekkila
|
0a19192004
|
Auto-optimization was not on for all GPUs when using MPI. May have to rerun all benchmarks for the MPI paper.
|
2020-06-25 19:53:39 +03:00 |
|
jpekkila
|
ba0bfd65b4
|
Merged the new reduction functions manually
|
2020-06-24 16:10:27 +03:00 |
|
jpekkila
|
0d1c5b3911
|
Autoformatted
|
2020-06-24 15:56:30 +03:00 |
|
jpekkila
|
88f99c12e4
|
Fixed #fi -> #endif
|
2020-06-24 15:20:43 +03:00 |
|
jpekkila
|
f04e347c45
|
Cleanup before merging to the master merge candidate branch
|
2020-06-24 15:13:15 +03:00 |
|
jpekkila
|
0e4b39d6d7
|
Added a toggle for using pinned memory
|
2020-06-11 11:28:52 +03:00 |
|
jpekkila
|
1cdb9e2ce7
|
Added missing synchronization to the end of the new integration function
|
2020-06-10 12:32:56 +03:00 |
|
jpekkila
|
fa422cf457
|
Added a better-pipelined version of the acGridIntegrate and a switch for toggling the transfer of corners
|
2020-06-10 02:16:23 +03:00 |
|
jpekkila
|
9840b817d0
|
Added the (hopefully final) basic test case used for the benchmarks
|
2020-06-07 21:59:33 +03:00 |
|
jpekkila
|
17a4f31451
|
Added the latest setup used for benchmarks
|
2020-06-04 20:47:03 +03:00 |
|
jpekkila
|
f97ed9e513
|
For reason X git decided to remove integration from the most critical part of the program when merging. Luckily we have autotests.
|
2020-05-30 20:59:39 +03:00 |
|
jpekkila
|
176ceae313
|
Fixed various compilation warnings
|
2020-05-30 20:23:53 +03:00 |
|
jpekkila
|
4748e48c7d
|
Spelling fixes
|
2020-05-28 17:10:17 +03:00 |
|
jpekkila
|
01ad141d90
|
Added comments and a short overview of the MPI implementation
|
2020-05-28 17:05:12 +03:00 |
|
jpekkila
|
f1138b04ac
|
Cleaned up the MPI implementation, removed all older implementations (removed also MPI window implementation which might be handy in the future when CUDA-aware support is introduced). If the removed stuff is needed later, here are some keywords to help find this commit: MPI_window, sendrecv, bidirectional, unidirectional transfer, real-time pinning, a0s, b0s.
|
2020-05-28 16:42:50 +03:00 |
|
jpekkila
|
0d62f56e27
|
Tried an alternative approach to comm (was worse than the current solution) and rewrote the current best solution (now easier to read)
|
2020-05-28 15:31:43 +03:00 |
|
jpekkila
|
f97005a75d
|
Added WIP version of the new bidirectional comm scheme
|
2020-05-27 19:09:32 +03:00 |
|
jpekkila
|
7e59ea0eff
|
MPI: corners are no longer communicated. Slight performance impact (14 ms vs 15 ms). Tests still pass with 8 GPUs.
|
2020-05-26 19:00:14 +03:00 |
|
jpekkila
|
c93b3265e6
|
Made comm streams high prio
|
2020-04-22 17:03:53 +03:00 |
|
jpekkila
|
22e01b7f1d
|
Rewrote partitioning code
|
2020-04-19 23:23:23 +03:00 |
|
jpekkila
|
4dd825f574
|
Proper decomposition when using Morton order to partition the computational domain
|
2020-04-19 22:50:26 +03:00 |
|
jpekkila
|
8c210b3292
|
3D decomposition is now done using Morton order instead of linear indexing
|
2020-04-19 22:31:57 +03:00 |
|
jpekkila
|
fb41741d74
|
Improvements to samples
|
2020-04-07 17:58:47 +03:00 |
|
jpekkila
|
427a3ac5d8
|
Rewrote the previous implementation, now fully works (verified) and gives the speedup we want. Communication latency is now completely hidden on at least two nodes (8 GPUs). Scaling looks very promising.
|
2020-04-06 17:28:02 +03:00 |
|
jpekkila
|
37f1c841a3
|
Added functions for pinning memory that is sent over the network. TODO pack to and from pinned memory selectively (currently P2P results are overwritten with data in pinned memory)
|
2020-04-06 14:09:12 +03:00 |
|
jpekkila
|
fe14ae4665
|
Added an alternative MPI implementation which uses one-sided communication
|
2020-04-02 17:59:53 +03:00 |
|
Johannes Pekkila
|
742dcc2697
|
Optimized MPI synchronization a bit
|
2020-03-31 12:36:25 +02:00 |
|
jpekkila
|
24e65ab02d
|
Set decompositions for some nprocs by hand
|
2020-03-30 18:13:50 +03:00 |
|
jpekkila
|
cc64968b9e
|
GPUDirect was off, re-enabled
|
2020-03-26 18:24:42 +02:00 |
|
jpekkila
|
28792770f2
|
Better overlap of computation and comm. when inner integration is launched first
|
2020-03-26 18:00:01 +02:00 |
|
jpekkila
|
08f567619a
|
Removed old unused functions for MPI integration and comm
|
2020-03-26 15:04:57 +02:00 |
|
jpekkila
|
ed7cf3f540
|
Added a production-ready interface for doing multi-node runs with Astaroth with MPI
|
2020-03-26 15:02:37 +02:00 |
|