Commit Graph

779 Commits

Author SHA1 Message Date
jpekkila d4a84fb887 Added a PCIe bandwidth test 2020-04-09 20:04:54 +03:00
jpekkila d6e74ee270 Added missing files 2020-04-09 19:24:55 +03:00
jpekkila ed8a0bf7e6 Added bwtest and benchmarkscript to CMakeLists 2020-04-07 18:35:12 +03:00
jpekkila fb41741d74 Improvements to samples 2020-04-07 17:58:47 +03:00
jpekkila 427a3ac5d8 Rewrote the previous implementation, now fully works (verified) and gives the speedup we want. Communication latency is now completely hidden on at least two nodes (8 GPUs). Scaling looks very promising. 2020-04-06 17:28:02 +03:00
jpekkila 37f1c841a3 Added functions for pinning memory that is sent over the network. TODO pack to and from pinned memory selectively (currently P2P results are overwritten with data in pinned memory) 2020-04-06 14:09:12 +03:00
jpekkila cc9d3f1b9c Found a workaround that gives good inter- and intra-node performance. The HPC-X MPI implementation does not know how to do P2P comm with pinned arrays (should be 80 GiB/s, measured 10 GiB/s) and inter-node comm is super slow without pinned arrays (should be 40 GiB/s, measured < 1 GiB/s). Made a proof-of-concept communicator that pins arrays that are sent to or received from another node. 2020-04-05 20:15:32 +03:00
jpekkila 88e53dfa21 Added a little program for testing the bandwidths of different MPI comm styles on n nodes and processes 2020-04-05 17:09:57 +03:00
jpekkila fe14ae4665 Added an alternative MPI implementation which uses one-sided communication 2020-04-02 17:59:53 +03:00
Johannes Pekkila 9b6d927cf1 It might be better to benchmark MPI codes without synchronization because of the overhead of timing individual steps 2020-03-31 12:37:54 +02:00
Johannes Pekkila 742dcc2697 Optimized MPI synchronization a bit 2020-03-31 12:36:25 +02:00
jpekkila 24e65ab02d Set decompositions for some nprocs by hand 2020-03-30 18:13:50 +03:00
jpekkila 9065381b2a Added the configuration used for benchmarking (not to be merged to master) 2020-03-30 18:01:35 +03:00
jpekkila 850b37e8c8 Added a switch for generating strong and weak scaling results 2020-03-30 17:56:12 +03:00
jpekkila d4eb3e0d35 Benchmarks are now written into a csv-file 2020-03-30 17:41:42 +03:00
jpekkila 9c5011d275 Renamed t to terr to avoid naming conflicts 2020-03-30 17:41:09 +03:00
jpekkila 864699360f Better-looking autoformat 2020-03-30 17:40:38 +03:00
jpekkila af531c1f96 Added a sample for benchmarking 2020-03-30 17:22:41 +03:00
jpekkila cc64968b9e GPUDirect was off, re-enabled 2020-03-26 18:24:42 +02:00
jpekkila 28792770f2 Better overlap between computation and communication when the inner integration is launched first 2020-03-26 18:00:01 +02:00
jpekkila 4c82e3c563 Removed old debug error check 2020-03-26 17:59:29 +02:00
jpekkila 5a898b8e95 mpitest now gives a warning instead of a compilation failure if MPI is not enabled 2020-03-26 15:31:29 +02:00
jpekkila 08f567619a Removed old unused functions for MPI integration and comm 2020-03-26 15:04:57 +02:00
jpekkila 329a71d299 Added an example of how to run the code with MPI 2020-03-26 15:02:55 +02:00
jpekkila ed7cf3f540 Added a production-ready interface for doing multi-node runs with Astaroth with MPI 2020-03-26 15:02:37 +02:00
jpekkila dad84b361f Renamed Grid structure to GridDims structure to avoid confusion with MPI Grids used in device.cc 2020-03-26 15:01:33 +02:00
jpekkila db120c129e Modelsolver now computes any built-in parameters automatically instead of relying on the user to supply them (inv_dsx etc.) 2020-03-26 14:59:07 +02:00
jpekkila fbd4b9a385 Made the MPI flag global instead of just core 2020-03-26 14:57:22 +02:00
jpekkila e1bec4459b Removed an unused variable 2020-03-25 13:54:43 +02:00
jpekkila ce81df00e3 Merge branch 'master' of https://bitbucket.org/jpekkila/astaroth 2020-03-25 13:51:07 +02:00
jpekkila e36ee7e2d6 AC_multigpu_offset tested to work on at least 2 nodes and 8 GPUs. Forcing should now work with MPI 2020-03-25 13:51:00 +02:00
jpekkila 0254628016 Updated API specification. The DSL syntax allows only C++-style casting. 2020-03-25 11:28:30 +00:00
jpekkila 672137f7f1 WIP further MPI optimizations 2020-03-24 19:02:58 +02:00
jpekkila ef63813679 Explicit check that critical parameters like inv_dsx are properly initialized before calling integration 2020-03-24 17:01:24 +02:00
jpekkila 8c362b44f0 Added more warnings in case some of the model solver parameters are not initialized 2020-03-24 16:56:30 +02:00
jpekkila d520835c42 Added integration to MPI comm, now completes a full integration step. Works at least on 2 nodes 2020-03-24 16:55:38 +02:00
jpekkila 37d6ad18d3 Fixed formatting in the API specification file 2020-03-04 15:09:23 +02:00
jpekkila 13b9b39c0d Renamed sink_particle.md to .txt to avoid it showing up in the documentation 2020-02-28 14:44:51 +02:00
jpekkila daa895d2fc Fixed an issue that prevented Ninja from being used as an alternative build system to Make. There's no significant performance benefit to using Ninja though. Build times: 29-32 s (Make) and 27-28 s (Ninja) 2020-02-10 14:37:48 +02:00
jpekkila 7b39a6bb1d AC_multigpu_offset is now calculated with MPI. Should now work with forcing, but not tested 2020-02-03 15:45:23 +02:00
jpekkila 50af620a7b More accurate timing when benchmarking MPI. Also made GPU-GPU communication the default. The current version of UCX is bugged; must export 'UCX_MEMTYPE_CACHE=n' to work around memory errors when doing GPU-GPU comm 2020-02-03 15:27:36 +02:00
jpekkila 459d39a411 README.md edited online with Bitbucket 2020-01-28 17:10:52 +00:00
jpekkila ade8b10e8f bitbucket-pipelines.yml edited online with Bitbucket. Removed an unnecessary compiler flag. 2020-01-28 17:09:35 +00:00
jpekkila 17c935ce19 Added padding to param name buffers to make them have NUM_*_PARAMS+1 elements. This should satisfy some strict compilation checks. 2020-01-28 18:53:09 +02:00
jpekkila 89f4d08b6c Fixed a possible out-of-bounds access in error checking when NUM_*_PARAMS is 0 2020-01-28 18:43:03 +02:00
jpekkila 7685d8a830 Astaroth 2.2 update complete. 2020-01-28 18:28:38 +02:00
jpekkila 67f2fcc88d Setting inv_dsx etc explicitly is no longer required as they are set to default values in acc/stdlib/stdderiv.h 2020-01-28 18:22:27 +02:00
jpekkila 0ccd4e3dbc Major improvement: uniforms can now be set to default values. The syntax is the same as for setting any other values, e.g., 'uniform Scalar a = 1; uniform Scalar b = 0.5 * a;'. Undefined uniforms are still allowed, but in this case the user should load a proper value into them during runtime. Default uniform values can be overwritten by calling any of the uniform loader functions (like acDeviceLoadScalarUniform). Also improved error checking: now there are explicit warnings if the user tries to load an invalid value into a device constant. 2020-01-28 18:17:31 +02:00
jpekkila 6dfe3ed4d6 Added out-of-the-box support for MPI (though not enabled by default). Previously the user had to pass mpicxx explicitly as the cmake compiler in order to compile MPI code, but this was bad practice and it's better to let cmake handle the include and compilation flags. 2020-01-28 15:59:20 +02:00
jpekkila 85d4de24e3 Recompilation is now properly triggered when acc sources or the ac standard library are modified 2020-01-28 14:12:25 +02:00