jpekkila
9840b817d0
Added the (hopefully final) basic test case used for the benchmarks
2020-06-07 21:59:33 +03:00
Oskar Lappi
cd49db68d7
No barrier benchmark
2020-06-07 15:50:49 +03:00
Oskar Lappi
53b48bb8ce
MPI_Allreduce -> MPI_Reduce for MPI reductions + benchmark batch script
...
Slightly ugly because this changes the benchmark behaviour somewhat
However, we now have a way to run batch benchmarks from one script; no need to generate new ones
2020-06-06 22:56:05 +03:00
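A minimal sketch of the collective change described in the commit above, using a generic scalar maximum reduction rather than the actual Astaroth code: MPI_Allreduce leaves the result on every rank, while MPI_Reduce delivers it only to the root, which is enough when only rank 0 reports the value.

    #include <mpi.h>

    void reduce_scalar_max(double local, double* result, MPI_Comm comm)
    {
        // Old behaviour: every rank receives the reduced value.
        // MPI_Allreduce(&local, result, 1, MPI_DOUBLE, MPI_MAX, comm);

        // New behaviour: only the root (rank 0) receives the reduced value.
        MPI_Reduce(&local, result, 1, MPI_DOUBLE, MPI_MAX, 0, comm);
    }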
Oskar Lappi
eb05e02793
Added vector reductions to mpi reduction benchmarks
2020-06-06 19:25:30 +03:00
Oskar Lappi
666f01a23d
Benchmarking program for scalar MPI reductions, and a non-batch script for running benchmarks
...
- New program mpi_reduce_bench
- runs testcases defined in source
- writes all benchmark results to a CSV file, tagging the test case and benchmark run
- takes optional argument for benchmark tag, default benchmark tag is a timestamp
- New script mpibench.sh
- runs mpi_reduce_bench with the defined parameters:
- number of tasks
- number of nodes
- the benchmark tag for mpi_reduce_bench, default tag is the current git HEAD short hash
2020-06-05 19:48:40 +03:00
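A rough sketch of what one benchmark case in such a program could look like, assuming each case times a scalar MPI_Reduce and appends a tagged row to a CSV file; the file name, column layout, and function name here are hypothetical, not the actual mpi_reduce_bench sources.

    #include <mpi.h>
    #include <cstdio>

    // Hypothetical benchmark case: time num_iters scalar reductions and append
    // one tagged CSV row so separate runs and cases can be told apart later.
    void benchmark_reduce(const char* tag, const char* case_name, int num_iters,
                          MPI_Comm comm)
    {
        int rank, nprocs;
        MPI_Comm_rank(comm, &rank);
        MPI_Comm_size(comm, &nprocs);

        double local = (double)rank, result = 0.0;

        MPI_Barrier(comm);
        const double start = MPI_Wtime();
        for (int i = 0; i < num_iters; ++i)
            MPI_Reduce(&local, &result, 1, MPI_DOUBLE, MPI_SUM, 0, comm);
        const double seconds_per_iter = (MPI_Wtime() - start) / num_iters;

        if (rank == 0) {
            FILE* fp = fopen("mpi_reduce_bench.csv", "a"); // hypothetical output file
            if (fp) {
                fprintf(fp, "%s,%s,%d,%d,%g\n", tag, case_name, nprocs, num_iters,
                        seconds_per_iter);
                fclose(fp);
            }
        }
    }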
jpekkila
17a4f31451
Added the latest setup used for benchmarks
2020-06-04 20:47:03 +03:00
Oskar Lappi
9e5fd40838
Changes after code review by Johannes, and clang-format
2020-06-04 18:50:22 +03:00
Oskar Lappi
f7d8de75d2
Reduction test pipeline added to mpitest, Error struct changed: new label field
...
- CHANGED: Error struct has a new label field for labeling an error
- The label is what is printed to screen
- vtxbuf name lookup moved out of printErrorToScreen/print_error_to_screen
- NEW: acScalReductionTestCase and acVecReductionTestCase
- Define new test cases by adding them to a list in samples/mpitest/main.cc:main
- Minor style change in verification.c to make all Verification functions similar
  and fit on one screen
2020-06-04 15:10:35 +03:00
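As a rough illustration of the data-structure change above, a hypothetical layout of the Error struct and of a scalar-reduction test case; the stand-in typedefs and field names are assumptions, not the actual Astaroth definitions.

    // Stand-ins for Astaroth's own handle/enum types (sketch only).
    typedef int VertexBufferHandle;
    typedef int ReductionType;

    // Hypothetical Error struct after the change: the label travels with the
    // error, so the printing code no longer needs a separate vtxbuf-name lookup.
    typedef struct {
        const char* label;   // NEW: what is printed to screen
        long double error;   // hypothetical error measure
        bool is_acceptable;
    } Error;

    // Hypothetical scalar-reduction test case; new cases would be appended to
    // the list in samples/mpitest/main.cc:main.
    typedef struct {
        const char* label;
        VertexBufferHandle vtxbuf;
        ReductionType rtype;
    } acScalReductionTestCase;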
jpekkila
226de32651
Added model solution for reductions and functions for automated testing
2020-06-03 13:37:00 +03:00
Oskar Lappi
34793d4e8b
Changes after code review with Johannes
2020-06-03 12:44:43 +03:00
Oskar Lappi
899d679518
Draft of MPI-based reductions acGridReduceScal, acGridReduceVec
...
- Calls acDeviceReduceScal/Vec first
- Both functions then perform the same MPI-reduction (MPI_Allreduce)
- Not tested
2020-06-02 21:59:30 +03:00
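A generic sketch of the two-stage reduction scheme described above, with stand-in names rather than the actual Astaroth API: each rank first reduces its own subdomain on the GPU, then the per-rank partial results are combined with MPI_Allreduce so every rank ends up with the same global value.

    #include <mpi.h>

    // device_reduce_max stands in for the device-side reduction
    // (acDeviceReduceScal/Vec in the commit message).
    double grid_reduce_max_sketch(double (*device_reduce_max)(void), MPI_Comm comm)
    {
        const double local = device_reduce_max(); // per-rank partial result (GPU)
        double global      = 0.0;
        // The MPI_Op must match the reduction type; MPI_MAX shown as an example.
        MPI_Allreduce(&local, &global, 1, MPI_DOUBLE, MPI_MAX, comm);
        return global;
    }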
jpekkila
0d80834619
Disabled forcing and upwinding for performance tests. Set default grid size to 512^3. Set default CMake params so that benchmarks can be reproduced out of the box.
2020-06-02 14:09:00 +03:00
jpekkila
a753ca92f2
Made CMake handle MPI linking. Potentially a bad idea (usually better to use the mpicc and mpicxx wrappers)
2020-05-30 22:02:39 +03:00
jpekkila
f97ed9e513
For some reason git decided to remove integration from the most critical part of the program when merging. Luckily we have autotests.
2020-05-30 20:59:39 +03:00
jpekkila
9cafe88d13
Merge branch 'mpi-to-master-merge-candidate-2020-06-01' of https://bitbucket.org/jpekkila/astaroth into mpi-to-master-merge-candidate-2020-06-01
2020-05-30 20:25:48 +03:00
jpekkila
176ceae313
Fixed various compilation warnings
2020-05-30 20:23:53 +03:00
jpekkila
2ddeef22ac
bitbucket-pipelines.yml edited online with Bitbucket
2020-05-30 16:58:45 +00:00
jpekkila
f929b21ac0
bitbucket-pipelines.yml edited online with Bitbucket
2020-05-30 16:52:26 +00:00
jpekkila
95275df3f2
bitbucket-pipelines.yml edited online with Bitbucket
2020-05-30 16:48:39 +00:00
jpekkila
c24996fdb3
Added the official Kitware PPA for pulling the latest CMake when doing automated builds.
2020-05-30 16:45:08 +00:00
jpekkila
b719306266
Upped the required CMake version. This may be an issue on older machines. Instead of making the user compile CMake themselves in this case, we could maybe add CMake as a submodule. In any case, supporting older CMake versions is not really an option because CUDA support with those is so bad that it requires adding dirty hacks to the clean CMake files we have now.
2020-05-30 19:36:32 +03:00
jpekkila
e05338c128
Merged the newest MPI changes
2020-05-30 19:18:46 +03:00
jpekkila
555bf8b252
Reverted the default settings to same as on master for easier merge
2020-05-30 19:06:21 +03:00
jpekkila
4748e48c7d
Spelling fixes
2020-05-28 17:10:17 +03:00
jpekkila
01ad141d90
Added comments and a short overview of the MPI implementation
2020-05-28 17:05:12 +03:00
jpekkila
f1138b04ac
Cleaned up the MPI implementation and removed all older implementations (also removed the MPI window implementation, which might be handy in the future when CUDA-aware support is introduced). If the removed stuff is needed later, here are some keywords to help find this commit: MPI_window, sendrecv, bidirectional, unidirectional transfer, real-time pinning, a0s, b0s.
2020-05-28 16:42:50 +03:00
jpekkila
0d62f56e27
Tried an alternative approach to comm (it was worse than the current solution) and rewrote the current best solution for clarity (now easier to read)
2020-05-28 15:31:43 +03:00
jpekkila
f97005a75d
Added a WIP version of the new bidirectional comm scheme
2020-05-27 19:09:32 +03:00
jpekkila
afe5b973ca
Added multiplication operator for int3
2020-05-27 19:08:39 +03:00
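The likely shape of the added operator, assuming a component-wise product of CUDA int3s; the actual definition in the Astaroth headers may differ.

    #include <vector_types.h> // CUDA's int3

    static inline int3 operator*(const int3& a, const int3& b)
    {
        int3 c;
        c.x = a.x * b.x;
        c.y = a.y * b.y;
        c.z = a.z * b.z;
        return c;
    }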
jpekkila
7e59ea0eff
MPI: corners are no longer communicated. Slight performance impact (14 ms vs 15 ms). Tests still pass with 8 GPUs.
2020-05-26 19:00:14 +03:00
jpekkila
ec59cdb973
Some formatting and unimportant changes to samples
2020-05-26 18:57:46 +03:00
jpekkila
c93b3265e6
Made comm streams high prio
2020-04-22 17:03:53 +03:00
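A minimal sketch of what high-priority comm streams presumably means here, using the CUDA runtime's stream-priority API; the helper name is hypothetical.

    #include <cuda_runtime.h>

    cudaStream_t create_high_priority_stream(void)
    {
        // Note: the "greatest" priority is the numerically lowest value.
        int least, greatest;
        cudaDeviceGetStreamPriorityRange(&least, &greatest);

        // Work queued on this stream is scheduled ahead of lower-priority
        // streams, helping the packing/unpacking kernels used for
        // communication start promptly while compute kernels are queued.
        cudaStream_t stream;
        cudaStreamCreateWithPriority(&stream, cudaStreamNonBlocking, greatest);
        return stream;
    }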
jpekkila
22e01b7f1d
Rewrote partitioning code
2020-04-19 23:23:23 +03:00
jpekkila
4dd825f574
Proper decomposition when using Morton order to partition the computational domain
2020-04-19 22:50:26 +03:00
jpekkila
ffb274e16f
Linking the CUDA library dynamically instead of statically (less prone to breaking since Astaroth does not have to be rebuilt when CUDA is updated)
2020-04-19 22:33:01 +03:00
jpekkila
8c210b3292
3D decomposition is now done using Morton order instead of linear indexing
2020-04-19 22:31:57 +03:00
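For reference, a generic 3D Morton (Z-order) decode, which is the standard way such a decomposition maps a linear process index to (x, y, z) block coordinates so that nearby ranks are nearby in space; this is an illustrative sketch, not Astaroth's exact routine.

    #include <cstdint>

    // Keep every third bit of `code`, packing them into the low bits.
    static uint64_t compact_bits_3d(uint64_t code)
    {
        uint64_t x = 0;
        for (int bit = 0; bit < 21; ++bit)
            x |= ((code >> (3 * bit)) & 1ull) << bit;
        return x;
    }

    // Map a linear process index (Morton code) to 3D decomposition coordinates.
    static void morton_decode_3d(uint64_t code, uint64_t* x, uint64_t* y, uint64_t* z)
    {
        *x = compact_bits_3d(code);
        *y = compact_bits_3d(code >> 1);
        *z = compact_bits_3d(code >> 2);
    }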
jpekkila
9cd5909f5a
BWtest now calculates aggregate bandwidths per process instead of assuming that all neighbor communication can be done in parallel (within a node one can have parallel P2P connections to all neighbors and an insane total bandwidth, but this is not the case over the network, where we seem to have only one bidirectional socket)
2020-04-09 20:28:04 +03:00
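A sketch of the measurement change described above: report a per-process aggregate, i.e. all bytes a process moves to its neighbors divided by the wall time of the whole exchange, rather than assuming the per-neighbor transfers overlap perfectly. The helper is generic, not the actual BWtest code.

    #include <cstddef>

    // Aggregate bandwidth of one process in GiB/s.
    double aggregate_bandwidth_GiBps(size_t bytes_per_neighbor, int num_neighbors,
                                     double exchange_seconds)
    {
        const double total_bytes = (double)bytes_per_neighbor * num_neighbors;
        return total_bytes / exchange_seconds / (1024.0 * 1024.0 * 1024.0);
    }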
jpekkila
d4a84fb887
Added a PCIe bandwidth test
2020-04-09 20:04:54 +03:00
jpekkila
d6e74ee270
Added missing files
2020-04-09 19:24:55 +03:00
jpekkila
ed8a0bf7e6
Added bwtest and the benchmark script to CMakeLists
2020-04-07 18:35:12 +03:00
jpekkila
fb41741d74
Improvements to samples
2020-04-07 17:58:47 +03:00
jpekkila
427a3ac5d8
Rewrote the previous implementation; it now fully works (verified) and gives the speedup we want. Communication latency is now completely hidden on at least two nodes (8 GPUs). Scaling looks very promising.
2020-04-06 17:28:02 +03:00
jpekkila
37f1c841a3
Added functions for pinning memory that is sent over the network. TODO: pack to and from pinned memory selectively (currently P2P results are overwritten with data in pinned memory)
2020-04-06 14:09:12 +03:00
jpekkila
cc9d3f1b9c
Found a workaround that gives good inter- and intra-node performance. The HPC-X MPI implementation does not know how to do P2P comm with pinned arrays (should be 80 GiB/s, measured 10 GiB/s) and internode comm is super slow without pinned arrays (should be 40 GiB/s, measured < 1 GiB/s). Made a proof-of-concept communicator that pins arrays that are sent to or received from another node.
2020-04-05 20:15:32 +03:00
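A simplified sketch of the described workaround, assuming host staging buffers and the CUDA runtime's cudaHostRegister; the real communicator presumably decides per neighbor which buffers to pin.

    #include <cuda_runtime.h>
    #include <cstdlib>

    // Buffers exchanged with a remote node are page-locked so the MPI library
    // can use its fast path; intra-node (P2P) buffers are left unpinned.
    void* alloc_halo_buffer(size_t bytes, bool remote_neighbor)
    {
        void* buf = malloc(bytes);
        if (buf && remote_neighbor)
            cudaHostRegister(buf, bytes, cudaHostRegisterDefault);
        return buf;
    }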
jpekkila
88e53dfa21
Added a little program for testing the bandwidths of different MPI comm styles on n nodes and processes
2020-04-05 17:09:57 +03:00
jpekkila
fe14ae4665
Added an alternative MPI implementation which uses one-sided communication
2020-04-02 17:59:53 +03:00
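A generic sketch of the one-sided style referred to above: each rank exposes its receive buffer as an MPI window and neighbors write halo data into it with MPI_Put inside a fence epoch. This illustrates the MPI mechanism only, not the actual Astaroth implementation.

    #include <mpi.h>

    void exchange_one_sided(double* recv_buf, int count, const double* send_buf,
                            int neighbor_rank, MPI_Comm comm)
    {
        MPI_Win win;
        MPI_Win_create(recv_buf, count * sizeof(double), sizeof(double),
                       MPI_INFO_NULL, comm, &win);

        MPI_Win_fence(0, win);  // open the access epoch
        MPI_Put(send_buf, count, MPI_DOUBLE, neighbor_rank, 0, count, MPI_DOUBLE, win);
        MPI_Win_fence(0, win);  // close the epoch; remote recv_buf is now filled

        MPI_Win_free(&win);
    }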
jpekkila
d6d5920553
Pulled improvements to device.cc from the benchmark branch to master
2020-03-31 14:23:36 +03:00
Johannes Pekkila
9b6d927cf1
It might be better to benchmark MPI codes without synchronization because of the overhead of timing individual steps
2020-03-31 12:37:54 +02:00
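A sketch of the timing approach the commit above hints at: synchronize once before and once after a loop of steps instead of timing (and syncing) every step, so the measurement does not include per-step barrier/sync overhead. The helper is generic, not the actual benchmark code.

    #include <mpi.h>

    // Returns the average wall time per step, measured around the whole loop.
    double time_whole_loop(void (*step)(void), int num_iters, MPI_Comm comm)
    {
        MPI_Barrier(comm);                  // single sync before timing
        const double start = MPI_Wtime();
        for (int i = 0; i < num_iters; ++i)
            step();                         // no per-step barriers or device syncs
        MPI_Barrier(comm);                  // single sync after timing
        return (MPI_Wtime() - start) / num_iters;
    }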
Johannes Pekkila
742dcc2697
Optimized MPI synchronization a bit
2020-03-31 12:36:25 +02:00
jpekkila
24e65ab02d
Set decompositions for some nprocs by hand
2020-03-30 18:13:50 +03:00