Commit Graph

904 Commits

Author SHA1 Message Date
jpekkila f97ed9e513 For reason X git decided to remove integration from the most critical part of the program when merging. Luckily we have autotests. 2020-05-30 20:59:39 +03:00
jpekkila 9cafe88d13 Merge branch 'mpi-to-master-merge-candidate-2020-06-01' of https://bitbucket.org/jpekkila/astaroth into mpi-to-master-merge-candidate-2020-06-01 2020-05-30 20:25:48 +03:00
jpekkila 176ceae313 Fixed various compilation warnings 2020-05-30 20:23:53 +03:00
jpekkila 2ddeef22ac bitbucket-pipelines.yml edited online with Bitbucket 2020-05-30 16:58:45 +00:00
jpekkila f929b21ac0 bitbucket-pipelines.yml edited online with Bitbucket 2020-05-30 16:52:26 +00:00
jpekkila 95275df3f2 bitbucket-pipelines.yml edited online with Bitbucket 2020-05-30 16:48:39 +00:00
jpekkila c24996fdb3 Added the official Kitware PPA for pulling the latest CMake when doing automated builds. 2020-05-30 16:45:08 +00:00
jpekkila b719306266 Upped the required CMake version. This may be an issue on older machines. Instead of making the user compile CMake themselves in that case, we could maybe add CMake as a submodule. In any case, supporting older CMake versions is not really an option because their CUDA support is so poor that it requires adding dirty hacks to the clean CMake files we have now. 2020-05-30 19:36:32 +03:00
jpekkila e05338c128 Merged the newest MPI changes 2020-05-30 19:18:46 +03:00
jpekkila 555bf8b252 Reverted the default settings to same as on master for easier merge 2020-05-30 19:06:21 +03:00
jpekkila 4748e48c7d Spelling fixes 2020-05-28 17:10:17 +03:00
jpekkila 01ad141d90 Added comments and a short overview of the MPI implementation 2020-05-28 17:05:12 +03:00
jpekkila f1138b04ac Cleaned up the MPI implementation and removed all older implementations (also removed the MPI window implementation, which might be handy in the future when CUDA-aware support is introduced). If the removed stuff is needed later, here are some keywords to help find this commit: MPI_window, sendrecv, bidirectional, unidirectional transfer, real-time pinning, a0s, b0s. 2020-05-28 16:42:50 +03:00
jpekkila 0d62f56e27 Tried an alternative approach to comm (it was worse than the current solution) and rewrote the current best solution (now easier to read) 2020-05-28 15:31:43 +03:00
jpekkila f97005a75d Added WIP version of the new bidirectional comm scheme 2020-05-27 19:09:32 +03:00
jpekkila afe5b973ca Added multiplication operator for int3 2020-05-27 19:08:39 +03:00
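
The commit above adds a componentwise product for CUDA's built-in int3 vector type. A minimal sketch of what such an operator typically looks like, assuming a free __host__ __device__ overload; the actual definition in Astaroth may differ:

```cpp
// Hedged sketch: componentwise multiplication for CUDA's built-in int3.
// The qualifiers and exact location of the real operator may differ.
#include <cuda_runtime.h> // int3, make_int3

static __host__ __device__ inline int3
operator*(const int3& a, const int3& b)
{
    return make_int3(a.x * b.x, a.y * b.y, a.z * b.z);
}
```
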
jpekkila 7e59ea0eff MPI: corners are no longer communicated. Slight performance impact (14 ms vs 15 ms). Tests still pass with 8 GPUs. 2020-05-26 19:00:14 +03:00
jpekkila ec59cdb973 Some formatting and unimportant changes to samples 2020-05-26 18:57:46 +03:00
jpekkila c93b3265e6 Made comm streams high prio 2020-04-22 17:03:53 +03:00
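
Making the communication streams high priority presumably relies on CUDA's stream priority API; a minimal sketch under that assumption (the function name is illustrative, not Astaroth's actual code):

```cpp
// Hedged sketch: create a high-priority, non-blocking stream so packing and
// halo-copy kernels can run ahead of the bulk integration kernels.
#include <cuda_runtime.h>

static cudaStream_t
make_high_priority_stream(void)
{
    int least, greatest; // "greatest" priority is the numerically smallest value
    cudaDeviceGetStreamPriorityRange(&least, &greatest);

    cudaStream_t stream;
    cudaStreamCreateWithPriority(&stream, cudaStreamNonBlocking, greatest);
    return stream;
}
```
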
jpekkila 22e01b7f1d Rewrote partitioning code 2020-04-19 23:23:23 +03:00
jpekkila 4dd825f574 Proper decomposition when using Morton order to partition the computational domain 2020-04-19 22:50:26 +03:00
jpekkila ffb274e16f Linking dynamic CUDA library instead of static (less prone to breaking since Astaroth does not have to be rebuilt when CUDA is updated) 2020-04-19 22:33:01 +03:00
jpekkila 8c210b3292 3D decomposition is now done using Morton order instead of linear indexing 2020-04-19 22:31:57 +03:00
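
Morton (Z-order) indexing interleaves the bits of the x, y, and z block coordinates, so ranks that are close in index are also close in space. A hedged sketch of decoding a process index into 3D block coordinates; the names are illustrative rather than Astaroth's actual API:

```cpp
// Hedged sketch: decode a Morton (Z-order) index into 3D block coordinates,
// replacing a linear i + j*nx + k*nx*ny mapping. Illustrative names only.
#include <cstdint>

struct Int3 {
    int x, y, z;
};

static Int3
morton3d_decode(const uint64_t pid)
{
    Int3 p = {0, 0, 0};
    for (int bit = 0; bit < 21; ++bit) { // 21 bits per axis fit in 64 bits
        p.x |= (int)((pid >> (3 * bit + 0)) & 1) << bit;
        p.y |= (int)((pid >> (3 * bit + 1)) & 1) << bit;
        p.z |= (int)((pid >> (3 * bit + 2)) & 1) << bit;
    }
    return p;
}
```
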
jpekkila 9cd5909f5a BWtest now calculates aggregate bandwidths per process instead of assuming that all neighbor communication can be done in parallel (within a node one can have parallel P2P connections to all neighbors and an insane total bandwidth, but this is not the case over the network, where we seem to have only one bidirectional socket) 2020-04-09 20:28:04 +03:00
jpekkila d4a84fb887 Added a PCIe bandwidth test 2020-04-09 20:04:54 +03:00
jpekkila d6e74ee270 Added missing files 2020-04-09 19:24:55 +03:00
jpekkila ed8a0bf7e6 Added bwtest and benchmarkscript to CMakeLists 2020-04-07 18:35:12 +03:00
jpekkila fb41741d74 Improvements to samples 2020-04-07 17:58:47 +03:00
jpekkila 427a3ac5d8 Rewrote the previous implementation, now fully works (verified) and gives the speedup we want. Communication latency is now completely hidden on at least two nodes (8 GPUs). Scaling looks very promising. 2020-04-06 17:28:02 +03:00
jpekkila 37f1c841a3 Added functions for pinning memory that is sent over the network. TODO pack to and from pinned memory selectively (currently P2P results are overwritten with data in pinned memory) 2020-04-06 14:09:12 +03:00
jpekkila cc9d3f1b9c Found a workaround that gives good inter- and intra-node performance. The HPC-X MPI implementation does not know how to do P2P comm with pinned arrays (should be 80 GiB/s, measured 10 GiB/s) and inter-node comm is super slow without pinned arrays (should be 40 GiB/s, measured < 1 GiB/s). Made a proof-of-concept communicator that pins arrays that are sent to or received from another node. 2020-04-05 20:15:32 +03:00
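
The workaround described above amounts to page-locking the host staging buffers that cross the node boundary so the interconnect can DMA directly from them. A minimal sketch assuming cudaHostRegister plus plain non-blocking MPI; the names and the surrounding communicator logic are illustrative:

```cpp
// Hedged sketch: pin an existing host buffer before posting an inter-node
// MPI send. Intra-node P2P transfers would bypass this path. Illustrative only.
#include <cuda_runtime.h>
#include <mpi.h>

static void
isend_pinned(double* buf, const size_t count, const int dst, const int tag,
             MPI_Comm comm, MPI_Request* req)
{
    // Page-lock the allocation so the network stack can DMA from it directly.
    cudaHostRegister(buf, count * sizeof(double), cudaHostRegisterDefault);
    MPI_Isend(buf, (int)count, MPI_DOUBLE, dst, tag, comm, req);
    // Call cudaHostUnregister(buf) after MPI_Wait on the request completes.
}
```
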
jpekkila 88e53dfa21 Added a little program for testing the bandwidths of different MPI comm styles on n nodes and processes 2020-04-05 17:09:57 +03:00
jpekkila fe14ae4665 Added an alternative MPI implementation which uses one-sided communication 2020-04-02 17:59:53 +03:00
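
One-sided MPI communication exposes a receive buffer as a window and lets the neighbor write into it directly with MPI_Put. A hedged sketch of the general pattern; the actual (since removed) implementation in Astaroth may have looked different:

```cpp
// Hedged sketch: fence-synchronized one-sided halo transfer. Illustrative only.
#include <mpi.h>

static void
put_halo(double* recv_buf, double* send_buf, const int count,
         const int neighbor, MPI_Comm comm)
{
    MPI_Win win;
    MPI_Win_create(recv_buf, count * sizeof(double), sizeof(double),
                   MPI_INFO_NULL, comm, &win);

    MPI_Win_fence(0, win);                        // open the exposure epoch
    MPI_Put(send_buf, count, MPI_DOUBLE,          // origin buffer
            neighbor, 0, count, MPI_DOUBLE, win); // target rank, displacement
    MPI_Win_fence(0, win);                        // complete all transfers

    MPI_Win_free(&win);
}
```
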
jpekkila d6d5920553 Pulled improvements to device.cc from the benchmark branch to master 2020-03-31 14:23:36 +03:00
Johannes Pekkila 9b6d927cf1 It might be better to benchmark MPI codes without synchronization because of the overhead of timing individual steps 2020-03-31 12:37:54 +02:00
Johannes Pekkila 742dcc2697 Optimized MPI synchronization a bit 2020-03-31 12:36:25 +02:00
jpekkila 24e65ab02d Set decompositions for some nprocs by hand 2020-03-30 18:13:50 +03:00
jpekkila 9065381b2a Added the configuration used for benchmarking (not to be merged to master) 2020-03-30 18:01:35 +03:00
jpekkila 850b37e8c8 Added a switch for generating strong and weak scaling results 2020-03-30 17:56:12 +03:00
jpekkila d4eb3e0d35 Benchmarks are now written into a csv-file 2020-03-30 17:41:42 +03:00
jpekkila 9c5011d275 Renamed t to terr to avoid naming conflicts 2020-03-30 17:41:09 +03:00
jpekkila 864699360f Better-looking autoformat 2020-03-30 17:40:38 +03:00
jpekkila af531c1f96 Added a sample for benchmarking 2020-03-30 17:22:41 +03:00
jpekkila cc64968b9e GPUDirect was off, re-enabled 2020-03-26 18:24:42 +02:00
jpekkila 28792770f2 Better overlap between computation and communication when the inner integration is launched first 2020-03-26 18:00:01 +02:00
jpekkila 4c82e3c563 Removed old debug error check 2020-03-26 17:59:29 +02:00
jpekkila 5a898b8e95 mpitest now gives a warning instead of a compilation failure if MPI is not enabled 2020-03-26 15:31:29 +02:00
jpekkila 08f567619a Removed old unused functions for MPI integration and comm 2020-03-26 15:04:57 +02:00
jpekkila 329a71d299 Added an example of how to run the code with MPI 2020-03-26 15:02:55 +02:00
jpekkila ed7cf3f540 Added a production-ready interface for doing multi-node runs with Astaroth with MPI 2020-03-26 15:02:37 +02:00