15785f6c44 Revert "Updated README.md that MPI runs are production-ready."
jpekkila
2020-08-19 12:51:22 +03:00
0a26112a93 Updated README.md that MPI runs are production-ready.
jpekkila
2020-08-19 09:48:39 +00:00
a85b8b8cd1 MPI: corners are now transferred by default because 1) with those Astaroth works with any symmetric stencil and 2) corners have a very small impact on performance. Also disabled resetting the device so that one could potentially assign many subgrids to one GPU (e.g. in AMR)
jpekkila
2020-08-19 12:05:20 +03:00
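A minimal sketch of the uniform halo exchange this commit describes, where faces, edges and corners all go through the same neighbor loop so any symmetric stencil is supported. The communicator setup, buffer packing and counts are illustrative placeholders, not Astaroth's actual acGrid internals:

    /* Exchange all 26 halo segments, corners included. */
    #include <mpi.h>

    void
    exchange_all_halo_segments(MPI_Comm cart_comm, double* send_buf[27],
                               double* recv_buf[27], const int count[27])
    {
        int dims[3], periods[3], coords[3];
        MPI_Cart_get(cart_comm, 3, dims, periods, coords);

        MPI_Request reqs[2 * 26];
        int nreqs = 0;

        for (int dz = -1; dz <= 1; ++dz)
            for (int dy = -1; dy <= 1; ++dy)
                for (int dx = -1; dx <= 1; ++dx) {
                    if (!dx && !dy && !dz)
                        continue; /* skip self */

                    const int i = (dz + 1) * 9 + (dy + 1) * 3 + (dx + 1);
                    const int ncoords[3] = {
                        (coords[0] + dx + dims[0]) % dims[0],
                        (coords[1] + dy + dims[1]) % dims[1],
                        (coords[2] + dz + dims[2]) % dims[2],
                    };
                    int neighbor;
                    MPI_Cart_rank(cart_comm, ncoords, &neighbor);

                    /* Corners take the same path as faces and edges; their
                     * payload is tiny, so the performance cost is small. */
                    MPI_Irecv(recv_buf[i], count[i], MPI_DOUBLE, neighbor, i,
                              cart_comm, &reqs[nreqs++]);
                    MPI_Isend(send_buf[i], count[i], MPI_DOUBLE, neighbor,
                              26 - i, cart_comm, &reqs[nreqs++]);
                }
        MPI_Waitall(nreqs, reqs, MPI_STATUSES_IGNORE);
    }

Because the corner payloads are tiny compared to the face segments, including them costs little, which matches the observation in the commit.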
7f7b0b89ea Fetched improvements to benchmarks from the mpi-paper-benchmarks branch
jpekkila
2020-08-19 12:03:15 +03:00
eb86ce09f4 I have inspected the branch and it compiles and functions fine on my system. NOTE: I could not test MPI at the moment, but that should not prevent the merge.
jpekkila
2020-08-19 06:54:34 +00:00
0872695c48 Updated API_specification_and_user_manual.md with info on the acGrid layer
jpekkila
2020-07-30 14:38:12 +00:00
5185a4d471 README.md edited online with Bitbucket
jpekkila
2020-07-30 13:58:11 +00:00
fca615defb Removed an old unused file
jpekkila
2020-07-29 20:01:11 +03:00
3afab77533 Removed astaroth_utils from astaroth_core dependencies
jpekkila
2020-07-29 19:58:21 +03:00
a5d6fb4303 Host flags were not propagated to the CUDA compiler; fixed
jpekkila
2020-07-29 19:34:28 +03:00
8fb271bbf3 Upped CMake version to 3.18 and cleaned up CUDA architecture selection
jpekkila
2020-07-29 18:45:10 +03:00
5e04a61cd2 README.md edited online with Bitbucket
jpekkila
2020-07-29 15:43:58 +00:00
bb821df686 README.md edited online with Bitbucket
jpekkila
2020-07-29 15:19:05 +00:00
cd888be9ec README.md edited online with Bitbucket
jpekkila
2020-07-29 15:17:37 +00:00
31db032f43 Upped the version number
jpekkila
2020-07-29 17:05:07 +03:00
770173a55d Limited automated build time to 5 minutes.
jpekkila
2020-07-29 13:55:22 +00:00
003c202e8c Pulled useful changes from the benchmark branch. GPUDirect RDMA (unpinned) is now the default for MPI communication.
jpekkila
2020-07-29 16:39:24 +03:00
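For context on the GPUDirect RDMA default mentioned above: with a CUDA-aware MPI, cudaMalloc'd device buffers can be passed straight to MPI calls, with no explicit staging through (pinned) host memory. A hedged sketch with illustrative names, not Astaroth's API:

    #include <mpi.h>

    /* d_send and d_recv are device pointers (cudaMalloc); a CUDA-aware MPI
     * can move them directly between GPUs and NICs. */
    void
    send_halo_direct(const double* d_send, double* d_recv, int count, int peer,
                     MPI_Comm comm)
    {
        MPI_Request reqs[2];
        MPI_Irecv(d_recv, count, MPI_DOUBLE, peer, 0, comm, &reqs[0]);
        MPI_Isend(d_send, count, MPI_DOUBLE, peer, 0, comm, &reqs[1]);
        MPI_Waitall(2, reqs, MPI_STATUSES_IGNORE);
    }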
6cab3586cf The generated Fortran header is now consistent with Fortran conventions. Also cleaned up the C version of the header.
jpekkila
2020-06-29 01:06:30 +03:00
d0ca1f8195 Reduction types are now generated with acc instead of being explicitly declared in astaroth.h
jpekkila
2020-06-28 18:16:19 +03:00
852fae17cf Added a function for getting the GPU count from Fortran
jpekkila
2020-06-28 18:15:40 +03:00
50fb54f1aa Added more warnings since it's easy to make off-by-one mistakes when dealing with Fortran-C interop
jpekkila
2020-06-28 18:14:54 +03:00
e764725564 acUpdateBuiltinParams now recalculates AC_inv_dsx and others if necessary
jpekkila
2020-06-26 09:54:17 +03:00
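A minimal sketch of the derived-parameter update described above: the inverse grid spacings are recomputed from the spacings so they cannot go stale. The struct and field names are simplified stand-ins for Astaroth's AC_dsx / AC_inv_dsx parameters:

    typedef struct {
        double dsx, dsy, dsz;             /* grid spacings */
        double inv_dsx, inv_dsy, inv_dsz; /* derived: 1 / spacing */
    } MeshParams;

    static void
    update_builtin_params(MeshParams* p)
    {
        /* Recalculate the derived values whenever the spacings may have changed */
        p->inv_dsx = 1.0 / p->dsx;
        p->inv_dsy = 1.0 / p->dsy;
        p->inv_dsz = 1.0 / p->dsz;
    }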
6f59890a3f Added loading and storing functions to the Fortran interface
jpekkila
2020-06-26 09:52:33 +03:00
39c7fc6c6f Streams are now generated with acc
jpekkila
2020-06-25 20:40:02 +03:00
7e71e32359 Fortran does not seem to really support arrays of pointers; it is better to modify the interface function to take the f array as an input and use it in C to construct a proper AcMesh
jpekkila
2020-06-25 20:21:16 +03:00
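A sketch of the interop pattern the commit describes: because Fortran cannot conveniently pass an array of pointers, the C-side wrapper receives one flat array and slices it into per-field pointers. The struct, field count and layout are simplified stand-ins for Astaroth's AcMesh:

    #include <stddef.h>

    #define NUM_FIELDS 8 /* illustrative field count */

    typedef struct {
        double* field[NUM_FIELDS]; /* one pointer per vertex buffer */
    } Mesh;

    /* Callable from Fortran via bind(C): farray is one contiguous block
     * holding NUM_FIELDS fields of nx*ny*nz values each. */
    void
    mesh_from_fortran_array(double* farray, size_t nx, size_t ny, size_t nz,
                            Mesh* mesh)
    {
        const size_t field_size = nx * ny * nz;
        for (int i = 0; i < NUM_FIELDS; ++i)
            mesh->field[i] = farray + (size_t)i * field_size; /* slice, no copy */
    }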
1b50374cdb Added the rest of the basic functions required for running simulations with the Fortran interface
jpekkila
2020-06-25 20:09:35 +03:00
0a19192004 Auto-optimization was not enabled for all GPUs when using MPI. May have to rerun all benchmarks for the MPI paper.
jpekkila
2020-06-25 19:53:39 +03:00
172ffc34dc Was missing another Fortran file; added
jpekkila
2020-06-25 06:44:27 +03:00
264abddefb bitbucket-pipelines.yml edited online with Bitbucket
jpekkila
2020-06-25 03:41:23 +00:00
f11c5b84fb Forgot the actual interface in the previous commits; here it is
jpekkila
2020-06-25 06:36:00 +03:00
c44c3d02b4 Added a sample for testing the Fortran interface
jpekkila
2020-06-25 06:35:13 +03:00
fbb8d7c7c6 Added a minimal Fortran interface to Astaroth
jpekkila
2020-06-25 06:34:16 +03:00
70ecacee7c Reverted the default build options to what they were before merging (again)
jpekkila
2020-06-24 17:04:35 +03:00
196edac46d Added proper casts to modelsolver.c
jpekkila
2020-06-24 17:03:54 +03:00
c0c337610b Added mpi_reduce_bench to samples
jpekkila
2020-06-24 16:42:39 +03:00
fab620eb0d Reordered the reduction autotests and made it so that the exact same mesh is used for both the model and the candidates, instead of the unclean integrated one
jpekkila
2020-06-24 16:34:50 +03:00
ba0bfd65b4 Merged the new reduction functions manually
jpekkila
2020-06-24 16:10:27 +03:00
ff1a601f85 Merged mpi-to-master-merge-candidate-2020-06-01 here
jpekkila
2020-06-24 16:08:14 +03:00
3c3b2a1885 Reverted the default settings to what they were before the merge. Note: LFORCING (1) is potentially not tested properly; TODO: recheck.
jpekkila
2020-06-24 15:35:19 +03:00
f04e347c45 Cleanup before merging to the master merge candidate branch
jpekkila
2020-06-24 15:13:15 +03:00
0e4b39d6d7 Added a toggle for using pinned memory
jpekkila
2020-06-11 11:28:52 +03:00
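A sketch of what such a pinned-memory toggle can look like: the same allocation site either page-locks the host buffer (enabling faster, asynchronous transfers) or falls back to pageable memory. The USE_PINNED flag and helper are illustrative, not Astaroth's actual switch:

    #include <cuda_runtime.h>
    #include <stdlib.h>

    #ifndef USE_PINNED
    #define USE_PINNED 1
    #endif

    static double*
    alloc_host_buffer(size_t count)
    {
    #if USE_PINNED
        double* ptr = NULL;
        /* Page-locked allocation: allows faster cudaMemcpyAsync transfers */
        if (cudaMallocHost((void**)&ptr, count * sizeof(double)) != cudaSuccess)
            return NULL;
        return ptr;
    #else
        return (double*)malloc(count * sizeof(double));
    #endif
    }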
0030db01f3 Automatic calculation of nodes based on processes
Oskar Lappi
2020-06-10 16:51:35 +03:00
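The node count presumably follows from the process count by rounding up against the processes available per node; a tiny sketch under that assumption:

    /* e.g. 10 processes on nodes with 4 GPUs each -> 3 nodes */
    static int
    nodes_for_processes(int nprocs, int procs_per_node)
    {
        return (nprocs + procs_per_node - 1) / procs_per_node; /* ceiling division */
    }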
1cdb9e2ce7 Added missing synchronization to the end of the new integration function
jpekkila
2020-06-10 12:32:56 +03:00
fa422cf457 Added a better-pipelined version of acGridIntegrate and a switch for toggling the transfer of corners
jpekkila
2020-06-10 02:16:23 +03:00
c7f23eb50c Added partition argument to mpibench script
Oskar Lappi
2020-06-09 14:07:37 +03:00
9840b817d0 Added the (hopefully final) basic test case used for the benchmarks
jpekkila
2020-06-07 21:59:33 +03:00
cd49db68d7 No-barrier benchmark
Oskar Lappi
2020-06-07 15:50:49 +03:00
53b48bb8ce MPI_Allreduce -> MPI_Reduce for MPI reductions + benchmark batch script
Oskar Lappi
2020-06-06 22:53:08 +03:00
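A sketch of the change named in this commit: when only the root rank needs the result, MPI_Reduce avoids broadcasting it back to every rank, unlike MPI_Allreduce. The local partial value is a placeholder:

    #include <mpi.h>

    double
    reduce_max_to_root(double local_max, MPI_Comm comm)
    {
        double global_max = 0.0;
        /* Only rank 0 receives the result; other ranks keep 0.0 */
        MPI_Reduce(&local_max, &global_max, 1, MPI_DOUBLE, MPI_MAX, 0, comm);
        return global_max;
    }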
eb05e02793 Added vector reductions to the MPI reduction benchmarks
Oskar Lappi
2020-06-06 19:22:05 +03:00
666f01a23d Benchmarking program for scalar MPI reductions, and a non-batch script for running benchmarks
  - New program mpi_reduce_bench
    - runs test cases defined in source
    - writes all benchmark results to a CSV file, tagging the test case and benchmark run
    - takes an optional argument for the benchmark tag; the default tag is a timestamp
  - New script mpibench.sh
    - runs mpi_reduce_bench with defined parameters:
      - number of tasks
      - number of nodes
      - the benchmark tag for mpi_reduce_bench; the default tag is the current git HEAD short hash
Oskar Lappi
2020-06-05 19:48:40 +03:00
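A hedged sketch of the kind of measurement loop described above: time one reduction and append a tagged CSV row. The file name, column layout and labels are assumptions, not the actual mpi_reduce_bench output format:

    #include <mpi.h>
    #include <stdio.h>
    #include <stdlib.h>

    void
    benchmark_reduce(const char* tag, const char* testcase, int count, MPI_Comm comm)
    {
        double* src = calloc(count, sizeof(double));
        double* dst = calloc(count, sizeof(double));

        MPI_Barrier(comm);
        const double t0 = MPI_Wtime();
        MPI_Reduce(src, dst, count, MPI_DOUBLE, MPI_SUM, 0, comm);
        const double t1 = MPI_Wtime();

        int rank;
        MPI_Comm_rank(comm, &rank);
        if (rank == 0) {
            FILE* fp = fopen("mpi_reduce_bench.csv", "a");
            if (fp) {
                fprintf(fp, "%s,%s,%d,%g\n", tag, testcase, count, t1 - t0);
                fclose(fp);
            }
        }
        free(src);
        free(dst);
    }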
17a4f31451 Added the latest setup used for benchmarks
jpekkila
2020-06-04 20:47:03 +03:00
9e5fd40838 Changes after code review by Johannes, and clang-format
Oskar Lappi
2020-06-04 18:47:31 +03:00
f7d8de75d2 Reduction test pipeline added to mpitest, Error struct changed: new label field
Oskar Lappi
2020-06-04 13:42:34 +03:00
226de32651 Added model solution for reductions and functions for automated testing
jpekkila
2020-06-03 13:37:00 +03:00
34793d4e8b Changes after code review with Johannes
Oskar Lappi
2020-06-03 12:44:43 +03:00
899d679518 Draft of MPI-based reductions acGridReduceScal, acGridReduceVec
Oskar Lappi
2020-06-02 21:30:53 +03:00
0d80834619 Disabled forcing and upwinding for performance tests. Set the default grid size to 512^3. Set default CMake params so that benchmarks can be reproduced out of the box.
jpekkila
2020-06-02 14:08:34 +03:00
a753ca92f2 Made CMake handle MPI linking. Potentially a bad idea (it is usually better to use the mpicc and mpicxx wrappers)
jpekkila
2020-05-30 22:02:39 +03:00
f97ed9e513 For reason X git decided to remove integration from the most critical part of the program when merging. Luckily we have autotests.
jpekkila
2020-05-30 20:59:39 +03:00
176ceae313 Fixed various compilation warnings
jpekkila
2020-05-30 20:23:53 +03:00
2ddeef22ac bitbucket-pipelines.yml edited online with Bitbucket
jpekkila
2020-05-30 16:58:45 +00:00
f929b21ac0 bitbucket-pipelines.yml edited online with Bitbucket
jpekkila
2020-05-30 16:52:26 +00:00
95275df3f2 bitbucket-pipelines.yml edited online with Bitbucket
jpekkila
2020-05-30 16:48:39 +00:00
c24996fdb3 Added the official Kitware PPA for pulling the latest CMake when doing automated builds.
jpekkila
2020-05-30 16:45:08 +00:00
b719306266 Upped the required CMake version. This may be an issue on older machines. Instead of making the user compile CMake themselves in this case, we could maybe add CMake as a submodule. In any case, supporting older CMake versions is not really an option because their CUDA support is so bad that it requires adding dirty hacks to the clean CMake files we have now.
jpekkila
2020-05-30 19:36:32 +03:00
e05338c128 Merged the newest MPI changes
jpekkila
2020-05-30 19:18:46 +03:00
555bf8b252 Reverted the default settings to the same as on master for an easier merge
jpekkila
2020-05-30 19:06:21 +03:00
01ad141d90 Added comments and a short overview of the MPI implementation
jpekkila
2020-05-28 17:05:12 +03:00
f1138b04ac Cleaned up the MPI implementation and removed all older implementations (including the MPI window implementation, which might be handy in the future when CUDA-aware support is introduced). If the removed code is needed later, here are some keywords to help find this commit: MPI_window, sendrecv, bidirectional, unidirectional transfer, real-time pinning, a0s, b0s.
jpekkila
2020-05-28 16:42:50 +03:00
0d62f56e27 Tried an alternative approach to comm (it was worse than the current solution) and rewrote the current best solution (now easier to read)
jpekkila
2020-05-28 15:31:43 +03:00
f97005a75d Added WIP version of the new bidirectional comm scheme
jpekkila
2020-05-27 19:09:32 +03:00
afe5b973ca Added multiplication operator for int3
jpekkila
2020-05-27 19:08:39 +03:00
7e59ea0eff MPI: corners are no longer communicated. Slight performance impact (14 ms vs 15 ms). Tests still pass with 8 GPUs.
jpekkila
2020-05-26 19:00:14 +03:00
ec59cdb973 Some formatting and unimportant changes to samples
jpekkila
2020-05-26 18:57:46 +03:00
c93b3265e6 Made comm streams high priority
jpekkila
2020-04-22 17:03:53 +03:00
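A sketch of the stream setup this commit refers to: communication streams get a higher CUDA priority than compute streams so packing/unpacking kernels are not starved by the bulk integration kernels. Variable names are illustrative:

    #include <cuda_runtime.h>

    void
    create_streams(cudaStream_t* comm_stream, cudaStream_t* compute_stream)
    {
        int least, greatest;
        /* greatest is the numerically smallest value, i.e. the highest priority */
        cudaDeviceGetStreamPriorityRange(&least, &greatest);

        cudaStreamCreateWithPriority(comm_stream, cudaStreamNonBlocking, greatest);
        cudaStreamCreateWithPriority(compute_stream, cudaStreamNonBlocking, least);
    }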
4dd825f574 Proper decomposition when using Morton order to partition the computational domain
jpekkila
2020-04-19 22:50:26 +03:00
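For reference, a generic Morton (Z-order) mapping of the kind used for the decomposition: the bits of the subgrid coordinates are interleaved, so nearby ranks tend to own nearby subgrids. This is an illustration, not Astaroth's exact mapping:

    #include <stdint.h>

    typedef struct { uint32_t x, y, z; } Int3;

    /* Interleave the low 10 bits of x, y and z into a Morton index */
    static uint64_t
    morton_encode(Int3 p)
    {
        uint64_t code = 0;
        for (int bit = 0; bit < 10; ++bit) {
            code |= (uint64_t)((p.x >> bit) & 1) << (3 * bit + 0);
            code |= (uint64_t)((p.y >> bit) & 1) << (3 * bit + 1);
            code |= (uint64_t)((p.z >> bit) & 1) << (3 * bit + 2);
        }
        return code;
    }

    /* Inverse mapping: which subgrid coordinates does process `pid` own? */
    static Int3
    morton_decode(uint64_t pid)
    {
        Int3 p = {0, 0, 0};
        for (int bit = 0; bit < 10; ++bit) {
            p.x |= (uint32_t)((pid >> (3 * bit + 0)) & 1) << bit;
            p.y |= (uint32_t)((pid >> (3 * bit + 1)) & 1) << bit;
            p.z |= (uint32_t)((pid >> (3 * bit + 2)) & 1) << bit;
        }
        return p;
    }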
ffb274e16f Linking the dynamic CUDA library instead of the static one (less prone to breaking, since Astaroth does not have to be rebuilt when CUDA is updated)
jpekkila
2020-04-19 22:33:01 +03:00
8c210b3292 3D decomposition is now done using Morton order instead of linear indexing
jpekkila
2020-04-19 22:31:57 +03:00
9cd5909f5a BWtest now calculates aggregate bandwidths per process instead of assuming that all neighbor communication can be done in parallel (within a node one can have parallel P2P connections to all neighbors and an insanely high total bandwidth, but this is not the case over the network, where we seem to have only one bidirectional socket)
jpekkila
2020-04-09 20:28:04 +03:00
d4a84fb887 Added a PCIe bandwidth test
jpekkila
2020-04-09 20:04:54 +03:00
ed8a0bf7e6 Added bwtest and the benchmark script to CMakeLists
jpekkila
2020-04-07 18:35:12 +03:00
fb41741d74 Improvements to samples
jpekkila
2020-04-07 17:58:47 +03:00
427a3ac5d8 Rewrote the previous implementation; it now fully works (verified) and gives the speedup we want. Communication latency is now completely hidden on at least two nodes (8 GPUs). Scaling looks very promising.
jpekkila
2020-04-06 17:28:02 +03:00
37f1c841a3 Added functions for pinning memory that is sent over the network. TODO: pack to and from pinned memory selectively (currently P2P results are overwritten with data in pinned memory)
jpekkila
2020-04-06 14:09:12 +03:00
cc9d3f1b9c Found a workaround that gives good inter- and intra-node performance. The HPC-X MPI implementation does not know how to do P2P comm with pinned arrays (should be 80 GiB/s, measured 10 GiB/s), and internode comm is super slow without pinned arrays (should be 40 GiB/s, measured < 1 GiB/s). Made a proof-of-concept communicator that pins arrays that are sent to or received from another node.
jpekkila
2020-04-05 20:15:32 +03:00
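A generic illustration of the workaround described in the two commits above: buffers exchanged with ranks on other nodes are page-locked before communication, while intra-node peers keep the unpinned P2P path. Deciding which peers are remote (e.g. via MPI_Comm_split_type with MPI_COMM_TYPE_SHARED) is left to the caller; only the pinning call itself is real CUDA API:

    #include <cuda_runtime.h>
    #include <stdbool.h>
    #include <stddef.h>

    void
    prepare_comm_buffer(void* buf, size_t bytes, bool peer_is_remote)
    {
        if (peer_is_remote) {
            /* Pin the existing allocation so the network stack can DMA it */
            cudaHostRegister(buf, bytes, cudaHostRegisterDefault);
        }
        /* Intra-node peers: leave the buffer as-is and use P2P transfers */
    }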
88e53dfa21 Added a little program for testing the bandwidths of different MPI comm styles on n nodes and processes
jpekkila
2020-04-05 17:09:57 +03:00
fe14ae4665 Added an alternative MPI implementation which uses one-sided communication
jpekkila
2020-04-02 17:59:53 +03:00
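A sketch of the one-sided style this commit refers to: each rank exposes its halo receive buffer as an MPI window and neighbors write into it with MPI_Put between fences. Buffer layout, offsets and the peer are placeholders:

    #include <mpi.h>

    void
    one_sided_halo_exchange(double* recv_buf, int recv_count, const double* send_buf,
                            int send_count, int peer, MPI_Aint peer_offset,
                            MPI_Comm comm)
    {
        MPI_Win win;
        MPI_Win_create(recv_buf, (MPI_Aint)recv_count * sizeof(double),
                       sizeof(double), MPI_INFO_NULL, comm, &win);

        MPI_Win_fence(0, win);
        /* Write our halo directly into the peer's exposed receive buffer */
        MPI_Put(send_buf, send_count, MPI_DOUBLE, peer, peer_offset, send_count,
                MPI_DOUBLE, win);
        MPI_Win_fence(0, win);

        MPI_Win_free(&win);
    }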
d6d5920553 Pulled improvements to device.cc from the benchmark branch to master
jpekkila
2020-03-31 14:23:36 +03:00
9b6d927cf1 It might be better to benchmark MPI codes without synchronization because of the overhead of timing individual steps
Johannes Pekkila
2020-03-31 12:37:54 +02:00
742dcc2697 Optimized MPI synchronization a bit
Johannes Pekkila
2020-03-31 12:36:25 +02:00
24e65ab02d Set decompositions for some nprocs by hand
jpekkila
2020-03-30 18:13:50 +03:00