15785f6c44 Revert "Updated README.md that MPI runs are production-ready."
jpekkila
2020-08-19 12:51:22 +03:00
0a26112a93 Updated README.md that MPI runs are production-ready.
jpekkila
2020-08-19 09:48:39 +00:00
a85b8b8cd1 MPI: corners are now transferred by default because 1) with those Astaroth works with any symmetric stencil and 2) corners have a very small impact on performance. Also disabled resetting the device so that one could potentially assign many subgrids to one GPU (e.g. in AMR)
jpekkila
2020-08-19 12:05:20 +03:00
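A minimal sketch of the uniform halo exchange this commit describes, where faces, edges and corners all go through the same neighbor loop so any symmetric stencil is supported. The communicator setup, buffer packing and counts are illustrative placeholders, not Astaroth's actual acGrid internals:

    /* Exchange all 26 halo segments, corners included. */
    #include <mpi.h>

    void
    exchange_all_halo_segments(MPI_Comm cart_comm, double* send_buf[27],
                               double* recv_buf[27], const int count[27])
    {
        int dims[3], periods[3], coords[3];
        MPI_Cart_get(cart_comm, 3, dims, periods, coords);

        MPI_Request reqs[2 * 26];
        int nreqs = 0;

        for (int dz = -1; dz <= 1; ++dz)
            for (int dy = -1; dy <= 1; ++dy)
                for (int dx = -1; dx <= 1; ++dx) {
                    if (!dx && !dy && !dz)
                        continue; /* skip self */

                    const int i = (dz + 1) * 9 + (dy + 1) * 3 + (dx + 1);
                    const int ncoords[3] = {
                        (coords[0] + dx + dims[0]) % dims[0],
                        (coords[1] + dy + dims[1]) % dims[1],
                        (coords[2] + dz + dims[2]) % dims[2],
                    };
                    int neighbor;
                    MPI_Cart_rank(cart_comm, ncoords, &neighbor);

                    /* Corners take the same path as faces and edges; their
                     * payload is tiny, so the performance cost is small. */
                    MPI_Irecv(recv_buf[i], count[i], MPI_DOUBLE, neighbor, i,
                              cart_comm, &reqs[nreqs++]);
                    MPI_Isend(send_buf[i], count[i], MPI_DOUBLE, neighbor,
                              26 - i, cart_comm, &reqs[nreqs++]);
                }
        MPI_Waitall(nreqs, reqs, MPI_STATUSES_IGNORE);
    }

Because the corner payloads are tiny compared to the face segments, including them costs little, which matches the observation in the commit.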
7f7b0b89ea Fetched improvements to benchmarks from the mpi-paper-benchmarks branch
jpekkila
2020-08-19 12:03:15 +03:00
eb86ce09f4 I have inspected the branch and it compiles and functions fine on my system. NOTE: I could not test MPI at the moment, but that should not prevent the merge.
jpekkila
2020-08-19 06:54:34 +00:00
0872695c48 Updated API_specification_and_user_manual.md with info on the acGrid layer
jpekkila
2020-07-30 14:38:12 +00:00
5185a4d471 README.md edited online with Bitbucket
jpekkila
2020-07-30 13:58:11 +00:00
fca615defb Removed an old unused file
jpekkila
2020-07-29 20:01:11 +03:00
3afab77533 Removed astaroth_utils from astaroth_core dependencies
jpekkila
2020-07-29 19:58:21 +03:00
a5d6fb4303 Host flags were not propagated to the CUDA compiler; fixed
jpekkila
2020-07-29 19:34:28 +03:00
8fb271bbf3 Upped CMake version to 3.18 and cleaned up CUDA architecture selection
jpekkila
2020-07-29 18:45:10 +03:00
5e04a61cd2 README.md edited online with Bitbucket
jpekkila
2020-07-29 15:43:58 +00:00
bb821df686 README.md edited online with Bitbucket
jpekkila
2020-07-29 15:19:05 +00:00
cd888be9ec README.md edited online with Bitbucket
jpekkila
2020-07-29 15:17:37 +00:00
31db032f43 Upped the version number
jpekkila
2020-07-29 17:05:07 +03:00
770173a55d Limited automated build time to 5 minutes.
jpekkila
2020-07-29 13:55:22 +00:00
003c202e8c Pulled useful changes from the benchmark branch. GPUDirect RDMA (unpinned) is now the default for MPI communication.
jpekkila
2020-07-29 16:39:24 +03:00
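For context on the GPUDirect RDMA default mentioned above: with a CUDA-aware MPI, cudaMalloc'd device buffers can be passed straight to MPI calls, with no explicit staging through (pinned) host memory. A hedged sketch with illustrative names, not Astaroth's API:

    #include <mpi.h>

    /* d_send and d_recv are device pointers (cudaMalloc); a CUDA-aware MPI
     * can move them directly between GPUs and NICs. */
    void
    send_halo_direct(const double* d_send, double* d_recv, int count, int peer,
                     MPI_Comm comm)
    {
        MPI_Request reqs[2];
        MPI_Irecv(d_recv, count, MPI_DOUBLE, peer, 0, comm, &reqs[0]);
        MPI_Isend(d_send, count, MPI_DOUBLE, peer, 0, comm, &reqs[1]);
        MPI_Waitall(2, reqs, MPI_STATUSES_IGNORE);
    }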
6cab3586cf The generated Fortran header is now consistent with Fortran conventions. Also cleaned up the C version of the header.
jpekkila
2020-06-29 01:06:30 +03:00
d0ca1f8195 Reduction types are now generated with acc instead of being explicitly declared in astaroth.h
jpekkila
2020-06-28 18:16:19 +03:00
852fae17cf Added a function for getting the GPU count from Fortran
jpekkila
2020-06-28 18:15:40 +03:00
50fb54f1aa Added more warnings since it's easy to make off-by-one mistakes when dealing with Fortran-C interop
jpekkila
2020-06-28 18:14:54 +03:00
e764725564 acUpdateBuiltinParams now recalculates AC_inv_dsx and others if necessary
jpekkila
2020-06-26 09:54:17 +03:00
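A minimal sketch of the derived-parameter update described above: the inverse grid spacings are recomputed from the spacings so they cannot go stale. The struct and field names are simplified stand-ins for Astaroth's AC_dsx / AC_inv_dsx parameters:

    typedef struct {
        double dsx, dsy, dsz;             /* grid spacings */
        double inv_dsx, inv_dsy, inv_dsz; /* derived: 1 / spacing */
    } MeshParams;

    static void
    update_builtin_params(MeshParams* p)
    {
        /* Recalculate the derived values whenever the spacings may have changed */
        p->inv_dsx = 1.0 / p->dsx;
        p->inv_dsy = 1.0 / p->dsy;
        p->inv_dsz = 1.0 / p->dsz;
    }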
6f59890a3f Added loading and storing functions to the Fortran interface
jpekkila
2020-06-26 09:52:33 +03:00
39c7fc6c6f Streams are now generated with acc
jpekkila
2020-06-25 20:40:02 +03:00
7e71e32359 Fortran does not seem to really support arrays of pointers; it is better to modify the interface function to take the f array as an input and use it in C to construct a proper AcMesh
jpekkila
2020-06-25 20:21:16 +03:00
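A sketch of the interop pattern the commit describes: because Fortran cannot conveniently pass an array of pointers, the C-side wrapper receives one flat array and slices it into per-field pointers. The struct, field count and layout are simplified stand-ins for Astaroth's AcMesh:

    #include <stddef.h>

    #define NUM_FIELDS 8 /* illustrative field count */

    typedef struct {
        double* field[NUM_FIELDS]; /* one pointer per vertex buffer */
    } Mesh;

    /* Callable from Fortran via bind(C): farray is one contiguous block
     * holding NUM_FIELDS fields of nx*ny*nz values each. */
    void
    mesh_from_fortran_array(double* farray, size_t nx, size_t ny, size_t nz,
                            Mesh* mesh)
    {
        const size_t field_size = nx * ny * nz;
        for (int i = 0; i < NUM_FIELDS; ++i)
            mesh->field[i] = farray + (size_t)i * field_size; /* slice, no copy */
    }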
1b50374cdb Added the rest of the basic functions required for running simulations with the Fortran interface
jpekkila
2020-06-25 20:09:35 +03:00
0a19192004 Auto-optimization was not enabled for all GPUs when using MPI. May have to rerun all benchmarks for the MPI paper.
jpekkila
2020-06-25 19:53:39 +03:00
172ffc34dc Was missing another Fortran file; added
jpekkila
2020-06-25 06:44:27 +03:00
264abddefb bitbucket-pipelines.yml edited online with Bitbucket
jpekkila
2020-06-25 03:41:23 +00:00
f11c5b84fb Forgot the actual interface in the previous commits; here it is
jpekkila
2020-06-25 06:36:00 +03:00
c44c3d02b4 Added a sample for testing the Fortran interface
jpekkila
2020-06-25 06:35:13 +03:00
fbb8d7c7c6 Added a minimal Fortran interface to Astaroth
jpekkila
2020-06-25 06:34:16 +03:00
70ecacee7c Reverted the default build options to what they were before merging (again)
jpekkila
2020-06-24 17:04:35 +03:00
196edac46d Added proper casts to modelsolver.c
jpekkila
2020-06-24 17:03:54 +03:00
c0c337610b Added mpi_reduce_bench to samples
jpekkila
2020-06-24 16:42:39 +03:00
fab620eb0d Reordered the reduction autotests and made it so that the exact same mesh is used for both the model and the candidates, instead of the unclean integrated one
jpekkila
2020-06-24 16:34:50 +03:00
ba0bfd65b4 Merged the new reduction functions manually
jpekkila
2020-06-24 16:10:27 +03:00
ff1a601f85 Merged mpi-to-master-merge-candidate-2020-06-01 here
jpekkila
2020-06-24 16:08:14 +03:00
3c3b2a1885 Reverted the default settings to what they were before the merge. Note: LFORCING (1) is potentially not tested properly; TODO: recheck.
jpekkila
2020-06-24 15:35:19 +03:00
f04e347c45 Cleanup before merging to the master merge candidate branch
jpekkila
2020-06-24 15:13:15 +03:00
0e4b39d6d7 Added a toggle for using pinned memory
jpekkila
2020-06-11 11:28:52 +03:00
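A sketch of what such a pinned-memory toggle can look like: the same allocation site either page-locks the host buffer (enabling faster, asynchronous transfers) or falls back to pageable memory. The USE_PINNED flag and helper are illustrative, not Astaroth's actual switch:

    #include <cuda_runtime.h>
    #include <stdlib.h>

    #ifndef USE_PINNED
    #define USE_PINNED 1
    #endif

    static double*
    alloc_host_buffer(size_t count)
    {
    #if USE_PINNED
        double* ptr = NULL;
        /* Page-locked allocation: allows faster cudaMemcpyAsync transfers */
        if (cudaMallocHost((void**)&ptr, count * sizeof(double)) != cudaSuccess)
            return NULL;
        return ptr;
    #else
        return (double*)malloc(count * sizeof(double));
    #endif
    }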
0030db01f3 Automatic calculation of nodes based on processes
Oskar Lappi
2020-06-10 16:51:35 +03:00
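The node count presumably follows from the process count by rounding up against the processes available per node; a tiny sketch under that assumption:

    /* e.g. 10 processes on nodes with 4 GPUs each -> 3 nodes */
    static int
    nodes_for_processes(int nprocs, int procs_per_node)
    {
        return (nprocs + procs_per_node - 1) / procs_per_node; /* ceiling division */
    }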
1cdb9e2ce7 Added missing synchronization to the end of the new integration function
jpekkila
2020-06-10 12:32:56 +03:00
fa422cf457 Added a better-pipelined version of acGridIntegrate and a switch for toggling the transfer of corners
jpekkila
2020-06-10 02:16:23 +03:00
c7f23eb50c Added partition argument to mpibench script
Oskar Lappi
2020-06-09 14:07:37 +03:00
9840b817d0 Added the (hopefully final) basic test case used for the benchmarks
jpekkila
2020-06-07 21:59:33 +03:00
cd49db68d7 No-barrier benchmark
Oskar Lappi
2020-06-07 15:50:49 +03:00
53b48bb8ce MPI_Allreduce -> MPI_Reduce for MPI reductions + benchmark batch script
Oskar Lappi
2020-06-06 22:53:08 +03:00
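A sketch of the change named in this commit: when only the root rank needs the result, MPI_Reduce avoids broadcasting it back to every rank, unlike MPI_Allreduce. The local partial value is a placeholder:

    #include <mpi.h>

    double
    reduce_max_to_root(double local_max, MPI_Comm comm)
    {
        double global_max = 0.0;
        /* Only rank 0 receives the result; other ranks keep 0.0 */
        MPI_Reduce(&local_max, &global_max, 1, MPI_DOUBLE, MPI_MAX, 0, comm);
        return global_max;
    }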
eb05e02793 Added vector reductions to the MPI reduction benchmarks
Oskar Lappi
2020-06-06 19:22:05 +03:00
666f01a23d Benchmarking program for scalar MPI reductions, and a non-batch script for running benchmarks
  - New program mpi_reduce_bench
    - runs test cases defined in source
    - writes all benchmark results to a CSV file, tagging the test case and benchmark run
    - takes an optional argument for the benchmark tag; the default tag is a timestamp
  - New script mpibench.sh
    - runs mpi_reduce_bench with defined parameters:
      - number of tasks
      - number of nodes
      - the benchmark tag for mpi_reduce_bench; the default tag is the current git HEAD short hash
Oskar Lappi
2020-06-05 19:48:40 +03:00
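A hedged sketch of the kind of measurement loop described above: time one reduction and append a tagged CSV row. The file name, column layout and labels are assumptions, not the actual mpi_reduce_bench output format:

    #include <mpi.h>
    #include <stdio.h>
    #include <stdlib.h>

    void
    benchmark_reduce(const char* tag, const char* testcase, int count, MPI_Comm comm)
    {
        double* src = calloc(count, sizeof(double));
        double* dst = calloc(count, sizeof(double));

        MPI_Barrier(comm);
        const double t0 = MPI_Wtime();
        MPI_Reduce(src, dst, count, MPI_DOUBLE, MPI_SUM, 0, comm);
        const double t1 = MPI_Wtime();

        int rank;
        MPI_Comm_rank(comm, &rank);
        if (rank == 0) {
            FILE* fp = fopen("mpi_reduce_bench.csv", "a");
            if (fp) {
                fprintf(fp, "%s,%s,%d,%g\n", tag, testcase, count, t1 - t0);
                fclose(fp);
            }
        }
        free(src);
        free(dst);
    }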
17a4f31451 Added the latest setup used for benchmarks
jpekkila
2020-06-04 20:47:03 +03:00
9e5fd40838 Changes after code review by Johannes, and clang-format
Oskar Lappi
2020-06-04 18:47:31 +03:00
f7d8de75d2 Reduction test pipeline added to mpitest, Error struct changed: new label field
Oskar Lappi
2020-06-04 13:42:34 +03:00
226de32651 Added model solution for reductions and functions for automated testing
jpekkila
2020-06-03 13:37:00 +03:00
34793d4e8b Changes after code review with Johannes
Oskar Lappi
2020-06-03 12:44:43 +03:00
899d679518 Draft of MPI-based reductions acGridReduceScal, acGridReduceVec
Oskar Lappi
2020-06-02 21:30:53 +03:00
0d80834619 Disabled forcing and upwinding for performance tests. Set the default grid size to 512^3. Set default CMake params so that benchmarks can be reproduced out of the box.
jpekkila
2020-06-02 14:08:34 +03:00
a753ca92f2 Made CMake handle MPI linking. Potentially a bad idea (it is usually better to use the mpicc and mpicxx wrappers)
jpekkila
2020-05-30 22:02:39 +03:00
f97ed9e513 For reason X git decided to remove integration from the most critical part of the program when merging. Luckily we have autotests.
jpekkila
2020-05-30 20:59:39 +03:00
176ceae313 Fixed various compilation warnings
jpekkila
2020-05-30 20:23:53 +03:00
2ddeef22ac bitbucket-pipelines.yml edited online with Bitbucket
jpekkila
2020-05-30 16:58:45 +00:00
f929b21ac0 bitbucket-pipelines.yml edited online with Bitbucket
jpekkila
2020-05-30 16:52:26 +00:00
95275df3f2 bitbucket-pipelines.yml edited online with Bitbucket
jpekkila
2020-05-30 16:48:39 +00:00
c24996fdb3 Added the official Kitware PPA for pulling the latest CMake when doing automated builds.
jpekkila
2020-05-30 16:45:08 +00:00
b719306266 Upped the required CMake version. This may be an issue on older machines. Instead of making the user compile CMake themselves in this case, we could maybe add CMake as a submodule. In any case, supporting older CMake versions is not really an option because their CUDA support is so bad that it requires adding dirty hacks to the clean CMake files we have now.
jpekkila
2020-05-30 19:36:32 +03:00
e05338c128 Merged the newest MPI changes
jpekkila
2020-05-30 19:18:46 +03:00
555bf8b252 Reverted the default settings to the same as on master for an easier merge
jpekkila
2020-05-30 19:06:21 +03:00
01ad141d90 Added comments and a short overview of the MPI implementation
jpekkila
2020-05-28 17:05:12 +03:00
f1138b04ac Cleaned up the MPI implementation and removed all older implementations (including the MPI window implementation, which might be handy in the future when CUDA-aware support is introduced). If the removed code is needed later, here are some keywords to help find this commit: MPI_window, sendrecv, bidirectional, unidirectional transfer, real-time pinning, a0s, b0s.
jpekkila
2020-05-28 16:42:50 +03:00
0d62f56e27 Tried an alternative approach to comm (it was worse than the current solution) and rewrote the current best solution (now easier to read)
jpekkila
2020-05-28 15:31:43 +03:00
f97005a75d Added WIP version of the new bidirectional comm scheme
jpekkila
2020-05-27 19:09:32 +03:00
afe5b973ca Added multiplication operator for int3
jpekkila
2020-05-27 19:08:39 +03:00
7e59ea0eff MPI: corners are no longer communicated. Slight performance impact (14 ms vs 15 ms). Tests still pass with 8 GPUs.
jpekkila
2020-05-26 19:00:14 +03:00
ec59cdb973 Some formatting and unimportant changes to samples
jpekkila
2020-05-26 18:57:46 +03:00
c93b3265e6 Made comm streams high priority
jpekkila
2020-04-22 17:03:53 +03:00
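A sketch of the stream setup this commit refers to: communication streams get a higher CUDA priority than compute streams so packing/unpacking kernels are not starved by the bulk integration kernels. Variable names are illustrative:

    #include <cuda_runtime.h>

    void
    create_streams(cudaStream_t* comm_stream, cudaStream_t* compute_stream)
    {
        int least, greatest;
        /* greatest is the numerically smallest value, i.e. the highest priority */
        cudaDeviceGetStreamPriorityRange(&least, &greatest);

        cudaStreamCreateWithPriority(comm_stream, cudaStreamNonBlocking, greatest);
        cudaStreamCreateWithPriority(compute_stream, cudaStreamNonBlocking, least);
    }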
4dd825f574 Proper decomposition when using Morton order to partition the computational domain
jpekkila
2020-04-19 22:50:26 +03:00
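For reference, a generic Morton (Z-order) mapping of the kind used for the decomposition: the bits of the subgrid coordinates are interleaved, so nearby ranks tend to own nearby subgrids. This is an illustration, not Astaroth's exact mapping:

    #include <stdint.h>

    typedef struct { uint32_t x, y, z; } Int3;

    /* Interleave the low 10 bits of x, y and z into a Morton index */
    static uint64_t
    morton_encode(Int3 p)
    {
        uint64_t code = 0;
        for (int bit = 0; bit < 10; ++bit) {
            code |= (uint64_t)((p.x >> bit) & 1) << (3 * bit + 0);
            code |= (uint64_t)((p.y >> bit) & 1) << (3 * bit + 1);
            code |= (uint64_t)((p.z >> bit) & 1) << (3 * bit + 2);
        }
        return code;
    }

    /* Inverse mapping: which subgrid coordinates does process `pid` own? */
    static Int3
    morton_decode(uint64_t pid)
    {
        Int3 p = {0, 0, 0};
        for (int bit = 0; bit < 10; ++bit) {
            p.x |= (uint32_t)((pid >> (3 * bit + 0)) & 1) << bit;
            p.y |= (uint32_t)((pid >> (3 * bit + 1)) & 1) << bit;
            p.z |= (uint32_t)((pid >> (3 * bit + 2)) & 1) << bit;
        }
        return p;
    }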
ffb274e16f Linking the dynamic CUDA library instead of the static one (less prone to breaking, since Astaroth does not have to be rebuilt when CUDA is updated)
jpekkila
2020-04-19 22:33:01 +03:00
8c210b3292 3D decomposition is now done using Morton order instead of linear indexing
jpekkila
2020-04-19 22:31:57 +03:00
9cd5909f5a BWtest now calculates aggregate bandwidths per process instead of assuming that all neighbor communication can be done in parallel (within a node one can have parallel P2P connections to all neighbors and an insanely high total bandwidth, but this is not the case over the network, where we seem to have only one bidirectional socket)
jpekkila
2020-04-09 20:28:04 +03:00
d4a84fb887 Added a PCIe bandwidth test
jpekkila
2020-04-09 20:04:54 +03:00
ed8a0bf7e6 Added bwtest and the benchmark script to CMakeLists
jpekkila
2020-04-07 18:35:12 +03:00
fb41741d74 Improvements to samples
jpekkila
2020-04-07 17:58:47 +03:00
427a3ac5d8 Rewrote the previous implementation; it now fully works (verified) and gives the speedup we want. Communication latency is now completely hidden on at least two nodes (8 GPUs). Scaling looks very promising.
jpekkila
2020-04-06 17:28:02 +03:00
37f1c841a3 Added functions for pinning memory that is sent over the network. TODO: pack to and from pinned memory selectively (currently P2P results are overwritten with data in pinned memory)
jpekkila
2020-04-06 14:09:12 +03:00
cc9d3f1b9c Found a workaround that gives good inter- and intra-node performance. The HPC-X MPI implementation does not know how to do P2P comm with pinned arrays (should be 80 GiB/s, measured 10 GiB/s), and internode comm is super slow without pinned arrays (should be 40 GiB/s, measured < 1 GiB/s). Made a proof-of-concept communicator that pins arrays that are sent to or received from another node.
jpekkila
2020-04-05 20:15:32 +03:00
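A generic illustration of the workaround described in the two commits above: buffers exchanged with ranks on other nodes are page-locked before communication, while intra-node peers keep the unpinned P2P path. Deciding which peers are remote (e.g. via MPI_Comm_split_type with MPI_COMM_TYPE_SHARED) is left to the caller; only the pinning call itself is real CUDA API:

    #include <cuda_runtime.h>
    #include <stdbool.h>
    #include <stddef.h>

    void
    prepare_comm_buffer(void* buf, size_t bytes, bool peer_is_remote)
    {
        if (peer_is_remote) {
            /* Pin the existing allocation so the network stack can DMA it */
            cudaHostRegister(buf, bytes, cudaHostRegisterDefault);
        }
        /* Intra-node peers: leave the buffer as-is and use P2P transfers */
    }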
88e53dfa21 Added a little program for testing the bandwidths of different MPI comm styles on n nodes and processes
jpekkila
2020-04-05 17:09:57 +03:00
fe14ae4665 Added an alternative MPI implementation which uses one-sided communication
jpekkila
2020-04-02 17:59:53 +03:00
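A sketch of the one-sided style this commit refers to: each rank exposes its halo receive buffer as an MPI window and neighbors write into it with MPI_Put between fences. Buffer layout, offsets and the peer are placeholders:

    #include <mpi.h>

    void
    one_sided_halo_exchange(double* recv_buf, int recv_count, const double* send_buf,
                            int send_count, int peer, MPI_Aint peer_offset,
                            MPI_Comm comm)
    {
        MPI_Win win;
        MPI_Win_create(recv_buf, (MPI_Aint)recv_count * sizeof(double),
                       sizeof(double), MPI_INFO_NULL, comm, &win);

        MPI_Win_fence(0, win);
        /* Write our halo directly into the peer's exposed receive buffer */
        MPI_Put(send_buf, send_count, MPI_DOUBLE, peer, peer_offset, send_count,
                MPI_DOUBLE, win);
        MPI_Win_fence(0, win);

        MPI_Win_free(&win);
    }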
d6d5920553 Pulled improvements to device.cc from the benchmark branch to master
jpekkila
2020-03-31 14:23:36 +03:00
9b6d927cf1 It might be better to benchmark MPI codes without synchronization because of the overhead of timing individual steps
Johannes Pekkila
2020-03-31 12:37:54 +02:00
742dcc2697 Optimized MPI synchronization a bit
Johannes Pekkila
2020-03-31 12:36:25 +02:00
24e65ab02d Set decompositions for some nprocs by hand
jpekkila
2020-03-30 18:13:50 +03:00