jpekkila | ffb274e16f | Linking dynamic CUDA library instead of static (less prone to breaking since Astaroth does not have to be rebuilt when CUDA is updated) | 2020-04-19 22:33:01 +03:00
jpekkila | 8c210b3292 | 3D decomposition is now done using Morton order instead of linear indexing | 2020-04-19 22:31:57 +03:00
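For context, a minimal sketch of Morton (Z-order) indexing for a 3D decomposition; the helper names are illustrative, not Astaroth's actual code. Interleaving the bits of the (i, j, k) process coordinates yields an index along a space-filling curve, so ranks that are close in index are also close in space:

    #include <stdint.h>

    /* Spread the low 21 bits of x two zero bits apart (the standard 3D Morton trick). */
    static uint64_t spread_bits(uint64_t x)
    {
        x &= 0x1fffff;
        x = (x | (x << 32)) & 0x001f00000000ffffULL;
        x = (x | (x << 16)) & 0x001f0000ff0000ffULL;
        x = (x | (x << 8))  & 0x100f00f00f00f00fULL;
        x = (x | (x << 4))  & 0x10c30c30c30c30c3ULL;
        x = (x | (x << 2))  & 0x1249249249249249ULL;
        return x;
    }

    /* Morton index of process coordinates (i, j, k). */
    static uint64_t morton3D(const uint64_t i, const uint64_t j, const uint64_t k)
    {
        return spread_bits(i) | (spread_bits(j) << 1) | (spread_bits(k) << 2);
    }

Compared with linear indexing (rank = i + j*nx + k*nx*ny), consecutive Morton ranks form compact blocks, which tends to keep halo-exchange partners on the same node.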
jpekkila | 9cd5909f5a | BWtest now calculates aggregate bandwidths per process instead of assuming that all neighbor communication can be done in parallel (within a node one can have parallel P2P connections to all neighbors and an insane total bandwidth, but over the network we seem to have only one bidirectional socket) | 2020-04-09 20:28:04 +03:00
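A sketch of what measuring aggregate per-process bandwidth can look like (the buffers, neighbor list, and sizes are hypothetical setup): all neighbor transfers are timed together, so contention on the shared network socket shows up in the result instead of being averaged away:

    /* send_buf, recv_buf, neighbors, num_neighbors, msg_bytes, num_iters:
       hypothetical setup done elsewhere. Assumes MPI_Init has been called. */
    MPI_Request reqs[2 * 26];               /* up to 26 neighbors in 3D */
    const double t0 = MPI_Wtime();
    for (int iter = 0; iter < num_iters; ++iter) {
        for (int n = 0; n < num_neighbors; ++n) {
            MPI_Irecv(recv_buf[n], msg_bytes, MPI_BYTE, neighbors[n], 0,
                      MPI_COMM_WORLD, &reqs[2 * n + 0]);
            MPI_Isend(send_buf[n], msg_bytes, MPI_BYTE, neighbors[n], 0,
                      MPI_COMM_WORLD, &reqs[2 * n + 1]);
        }
        MPI_Waitall(2 * num_neighbors, reqs, MPI_STATUSES_IGNORE);
    }
    const double elapsed = MPI_Wtime() - t0;
    /* Bytes moved in both directions by this process, per second */
    const double bytes_per_s = 2.0 * num_iters * num_neighbors * msg_bytes / elapsed;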
jpekkila | d4a84fb887 | Added a PCIe bandwidth test | 2020-04-09 20:04:54 +03:00
jpekkila | d6e74ee270 | Added missing files | 2020-04-09 19:24:55 +03:00
jpekkila | ed8a0bf7e6 | Added bwtest and benchmarkscript to CMakeLists | 2020-04-07 18:35:12 +03:00
jpekkila | fb41741d74 | Improvements to samples | 2020-04-07 17:58:47 +03:00
jpekkila | 427a3ac5d8 | Rewrote the previous implementation; it now fully works (verified) and gives the speedup we want. Communication latency is now completely hidden on at least two nodes (8 GPUs). Scaling looks very promising. | 2020-04-06 17:28:02 +03:00
jpekkila | 37f1c841a3 | Added functions for pinning memory that is sent over the network. TODO: pack to and from pinned memory selectively (currently P2P results are overwritten with data in pinned memory) | 2020-04-06 14:09:12 +03:00
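A minimal sketch of what pinning an existing host buffer involves, assuming CUDA's runtime API (the wrapper names below are illustrative, not the actual Astaroth interface):

    #include <cuda_runtime.h>
    #include <stddef.h>

    /* Page-lock an existing allocation so the NIC/driver can DMA it directly. */
    static int pin_host_buffer(void* buf, const size_t bytes)
    {
        return cudaHostRegister(buf, bytes, cudaHostRegisterDefault) == cudaSuccess;
    }

    static int unpin_host_buffer(void* buf)
    {
        return cudaHostUnregister(buf) == cudaSuccess;
    }

Pinning is expensive, so it pays to register the communication buffers once at setup rather than once per exchange.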
jpekkila | cc9d3f1b9c | Found a workaround that gives good inter- and intra-node performance. The HPC-X MPI implementation does not know how to do P2P comm with pinned arrays (should be 80 GiB/s, measured 10 GiB/s), and inter-node comm is super slow without pinned arrays (should be 40 GiB/s, measured < 1 GiB/s). Made a proof-of-concept communicator that pins arrays that are sent to or received from another node. | 2020-04-05 20:15:32 +03:00
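The node-locality test behind a workaround like this can be done portably with MPI; a sketch, where neighbor_rank is a hypothetical world rank being classified:

    /* Ranks sharing a node land in the same shared-memory communicator. */
    MPI_Comm node_comm;
    MPI_Comm_split_type(MPI_COMM_WORLD, MPI_COMM_TYPE_SHARED, 0,
                        MPI_INFO_NULL, &node_comm);

    MPI_Group world_group, node_group;
    MPI_Comm_group(MPI_COMM_WORLD, &world_group);
    MPI_Comm_group(node_comm, &node_group);

    /* Translate the neighbor's world rank into the node communicator.
       MPI_UNDEFINED => the neighbor lives on another node => use the
       pinned staging path; otherwise take the fast intra-node P2P path. */
    int node_rank;
    MPI_Group_translate_ranks(world_group, 1, &neighbor_rank, node_group, &node_rank);
    const int on_same_node = (node_rank != MPI_UNDEFINED);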
jpekkila | 88e53dfa21 | Added a little program for testing the bandwidths of different MPI comm styles on n nodes and processes | 2020-04-05 17:09:57 +03:00
jpekkila | fe14ae4665 | Added an alternative MPI implementation which uses one-sided communication | 2020-04-02 17:59:53 +03:00
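For reference, a minimal one-sided halo exchange sketch with fence synchronization (buffer names hypothetical): each process exposes its receive buffer in a window and neighbors write into it with MPI_Put instead of matched send/recv pairs:

    MPI_Win win;
    MPI_Win_create(recv_buf, halo_bytes, /*disp_unit=*/1,
                   MPI_INFO_NULL, MPI_COMM_WORLD, &win);

    MPI_Win_fence(0, win);                /* open the access epoch */
    MPI_Put(send_buf, halo_bytes, MPI_BYTE,
            neighbor_rank, /*target_disp=*/0,
            halo_bytes, MPI_BYTE, win);
    MPI_Win_fence(0, win);                /* transfers complete after the closing fence */

    MPI_Win_free(&win);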
Johannes Pekkila | 9b6d927cf1 | It might be better to benchmark MPI codes without per-step synchronization because of the overhead of timing individual steps | 2020-03-31 12:37:54 +02:00
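A sketch of the difference, with integrate_step() standing in for one hypothetical integration substep: synchronizing and reading the clock once around the whole loop amortizes the overhead that per-step timing would add:

    static double benchmark_avg_step_time(const int num_steps)
    {
        MPI_Barrier(MPI_COMM_WORLD);        /* one sync before the region */
        const double t0 = MPI_Wtime();
        for (int i = 0; i < num_steps; ++i)
            integrate_step();               /* hypothetical: kernel + comm */
        MPI_Barrier(MPI_COMM_WORLD);        /* one sync after */
        return (MPI_Wtime() - t0) / num_steps;
    }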
Johannes Pekkila | 742dcc2697 | Optimized MPI synchronization a bit | 2020-03-31 12:36:25 +02:00
jpekkila | 24e65ab02d | Set decompositions for some nprocs by hand | 2020-03-30 18:13:50 +03:00
jpekkila | 9065381b2a | Added the configuration used for benchmarking (not to be merged to master) | 2020-03-30 18:01:35 +03:00
jpekkila | 850b37e8c8 | Added a switch for generating strong and weak scaling results | 2020-03-30 17:56:12 +03:00
jpekkila | d4eb3e0d35 | Benchmarks are now written into a CSV file | 2020-03-30 17:41:42 +03:00
jpekkila | 9c5011d275 | Renamed t to terr to avoid naming conflicts | 2020-03-30 17:41:09 +03:00
jpekkila | 864699360f | Better-looking autoformat | 2020-03-30 17:40:38 +03:00
jpekkila | af531c1f96 | Added a sample for benchmarking | 2020-03-30 17:22:41 +03:00
jpekkila | cc64968b9e | GPUDirect was off; re-enabled it | 2020-03-26 18:24:42 +02:00
jpekkila | 28792770f2 | Better overlap of computation and communication when the inner integration is launched first | 2020-03-26 18:00:01 +02:00
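A sketch of that launch order (kernel and helper names hypothetical, the pattern is what matters): the inner domain needs no halo data, so launching it first lets its computation hide the communication latency:

    integrate_inner<<<bpg_inner, tpb, 0, compute_stream>>>(vba);  /* needs no halos */
    exchange_halos_async(&comm_reqs);      /* pack + MPI_Isend/Irecv on the side */
    wait_halos(&comm_reqs);                /* host blocks on MPI while the GPU computes */
    integrate_outer<<<bpg_outer, tpb, 0, compute_stream>>>(vba);  /* boundary region */
    cudaStreamSynchronize(compute_stream);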
jpekkila | 4c82e3c563 | Removed an old debug error check | 2020-03-26 17:59:29 +02:00
jpekkila | 5a898b8e95 | mpitest now gives a warning instead of a compilation failure if MPI is not enabled | 2020-03-26 15:31:29 +02:00
jpekkila | 08f567619a | Removed old unused functions for MPI integration and comm | 2020-03-26 15:04:57 +02:00
jpekkila | 329a71d299 | Added an example of how to run the code with MPI | 2020-03-26 15:02:55 +02:00
jpekkila | ed7cf3f540 | Added a production-ready interface for doing multi-node runs with Astaroth via MPI | 2020-03-26 15:02:37 +02:00
jpekkila | dad84b361f | Renamed the Grid structure to GridDims to avoid confusion with the MPI grids used in device.cc | 2020-03-26 15:01:33 +02:00
jpekkila | db120c129e | The model solver now computes any built-in parameters (inv_dsx etc.) automatically instead of relying on the user to supply them | 2020-03-26 14:59:07 +02:00
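A sketch of what computing such built-in parameters looks like (struct and field names hypothetical): the user supplies the grid spacings, and everything derived from them is filled in automatically:

    typedef struct {
        double dsx, dsy, dsz;             /* user-supplied grid spacings */
        double inv_dsx, inv_dsy, inv_dsz; /* derived, no longer user-supplied */
    } SolverParams;

    static void update_builtin_params(SolverParams* p)
    {
        p->inv_dsx = 1.0 / p->dsx;
        p->inv_dsy = 1.0 / p->dsy;
        p->inv_dsz = 1.0 / p->dsz;
    }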
jpekkila | fbd4b9a385 | Made the MPI flag global instead of core-only | 2020-03-26 14:57:22 +02:00
jpekkila | e1bec4459b | Removed an unused variable | 2020-03-25 13:54:43 +02:00
jpekkila | ce81df00e3 | Merge branch 'master' of https://bitbucket.org/jpekkila/astaroth | 2020-03-25 13:51:07 +02:00
jpekkila | e36ee7e2d6 | AC_multigpu_offset tested to work on at least 2 nodes and 8 GPUs. Forcing should now work with MPI | 2020-03-25 13:51:00 +02:00
jpekkila | 0254628016 | Updated the API specification. The DSL syntax allows only C++-style casting. | 2020-03-25 11:28:30 +00:00
jpekkila | 672137f7f1 | WIP: further MPI optimizations | 2020-03-24 19:02:58 +02:00
jpekkila | ef63813679 | Explicit check that critical parameters like inv_dsx are properly initialized before calling integration | 2020-03-24 17:01:24 +02:00
jpekkila | 8c362b44f0 | Added more warnings in case some of the model solver parameters are not initialized | 2020-03-24 16:56:30 +02:00
jpekkila | d520835c42 | Added integration to the MPI comm; a full integration step now completes. Works on at least 2 nodes | 2020-03-24 16:55:38 +02:00
jpekkila | 37d6ad18d3 | Fixed formatting in the API specification file | 2020-03-04 15:09:23 +02:00
jpekkila | 13b9b39c0d | Renamed sink_particle.md to .txt to avoid it showing up in the documentation | 2020-02-28 14:44:51 +02:00
jpekkila | daa895d2fc | Fixed an issue that prevented Ninja from being used as an alternative build system to Make. There's no significant performance benefit to using Ninja, though. Build times: 29-32 s (Make) and 27-28 s (Ninja) | 2020-02-10 14:37:48 +02:00
jpekkila | 7b39a6bb1d | AC_multigpu_offset is now calculated with MPI. Should now work with forcing, but not tested | 2020-02-03 15:45:23 +02:00
jpekkila | 50af620a7b | More accurate timing when benchmarking MPI. Also made GPU-GPU communication the default. The current version of UCX is bugged; one must export 'UCX_MEMTYPE_CACHE=n' to work around memory errors when doing GPU-GPU comm | 2020-02-03 15:27:36 +02:00
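The export belongs in the job script; as a sketch, the same workaround could also be applied from inside the program, assuming it runs before MPI_Init so that UCX sees the variable (whether this is honored depends on the MPI/UCX versions, so the job-script export the commit describes is the safer route):

    #include <stdlib.h>
    #include <mpi.h>

    int main(int argc, char** argv)
    {
        /* Disable UCX's memory-type cache to work around the GPU-GPU
           comm memory errors; must be set before MPI_Init. */
        setenv("UCX_MEMTYPE_CACHE", "n", /*overwrite=*/1);
        MPI_Init(&argc, &argv);
        /* ... */
        MPI_Finalize();
        return 0;
    }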
jpekkila | 459d39a411 | README.md edited online with Bitbucket | 2020-01-28 17:10:52 +00:00
jpekkila | ade8b10e8f | bitbucket-pipelines.yml edited online with Bitbucket. Removed an unnecessary compiler flag. | 2020-01-28 17:09:35 +00:00
jpekkila | 17c935ce19 | Added padding to the param name buffers so that they have NUM_*_PARAMS+1 elements. This should satisfy some strict compilation checks. | 2020-01-28 18:53:09 +02:00
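A sketch of why the padding helps (array and entry names hypothetical): when a build declares no parameters of some type, NUM_*_PARAMS is 0 and a zero-length array is invalid C, so the +1 keeps the array well-formed and doubles as a sentinel slot:

    /* NUM_REAL_PARAMS may be 0; the extra element keeps strict compilers
       happy and gives error-reporting code a safe slot for bad indices. */
    static const char* real_param_names[NUM_REAL_PARAMS + 1] = {
        /* ...one string per declared parameter, e.g. "AC_dsx"... */
        "<invalid param>" /* padding / sentinel entry */
    };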
jpekkila | 89f4d08b6c | Fixed a possible out-of-bounds access in error checking when NUM_*_PARAMS is 0 | 2020-01-28 18:43:03 +02:00
jpekkila | 7685d8a830 | Astaroth 2.2 update complete. | 2020-01-28 18:28:38 +02:00
jpekkila | 67f2fcc88d | Setting inv_dsx etc. explicitly is no longer required, as they are set to default values in acc/stdlib/stdderiv.h | 2020-01-28 18:22:27 +02:00