astaroth

Author	SHA1	Message	Date
jpekkila	fb41741d74	Improvements to samples	2020-04-07 17:58:47 +03:00
jpekkila	427a3ac5d8	Rewrote the previous implementation, now fully works (verified) and gives the speedup we want. Communication latency is now completely hidden on at least two nodes (8 GPUs). Scaling looks very promising.	2020-04-06 17:28:02 +03:00
jpekkila	37f1c841a3	Added functions for pinning memory that is sent over the network. TODO pack to and from pinned memory selectively (currently P2P results are overwritten with data in pinned memory)	2020-04-06 14:09:12 +03:00
jpekkila	cc9d3f1b9c	Found a workaround that gives good inter and intra-node performance. HPC-X MPI implementation does not know how to do p2p comm with pinned arrays (should be 80 GiB/s, measured 10 GiB/s) and internode comm is super slow without pinned arrays (should be 40 GiB/s, measured < 1 GiB/s). Made a proof of concept communicator that pins arrays that are sent or received from another node.	2020-04-05 20:15:32 +03:00
jpekkila	88e53dfa21	Added a little program for testing the bandwidths of different MPI comm styles on n nodes and processes	2020-04-05 17:09:57 +03:00
jpekkila	fe14ae4665	Added an alternative MPI implementation which uses one-sided communication	2020-04-02 17:59:53 +03:00
jpekkila	d6d5920553	Pulled improvements to device.cc from the benchmark branch to master	2020-03-31 14:23:36 +03:00
Johannes Pekkila	9b6d927cf1	It might be better to benchmark MPI codes without synchronization because of overhead of timing individual steps	2020-03-31 12:37:54 +02:00
Johannes Pekkila	742dcc2697	Optimized MPI synchronization a bit	2020-03-31 12:36:25 +02:00
jpekkila	24e65ab02d	Set decompositions for some nprocs by hand	2020-03-30 18:13:50 +03:00
jpekkila	9065381b2a	Added the configuration used for benchmarking (not to be merged to master)	2020-03-30 18:01:35 +03:00
jpekkila	850b37e8c8	Added a switch for generating strong and weak scaling results	2020-03-30 17:56:12 +03:00
jpekkila	d4eb3e0d35	Benchmarks are now written into a csv-file	2020-03-30 17:41:42 +03:00
jpekkila	9c5011d275	Renamed t to terr to avoid naming conflicts	2020-03-30 17:41:09 +03:00
jpekkila	864699360f	Better-looking autoformat	2020-03-30 17:40:38 +03:00
jpekkila	af531c1f96	Added a sample for benchmarking	2020-03-30 17:22:41 +03:00
jpekkila	cc64968b9e	GPUDirect was off, re-enabled	2020-03-26 18:24:42 +02:00
jpekkila	28792770f2	Better overlap with computation and comm. when inner integration is launched first	2020-03-26 18:00:01 +02:00
jpekkila	4c82e3c563	Removed old debug error check	2020-03-26 17:59:29 +02:00
jpekkila	5a898b8e95	mpitest now gives a warning instead of a compilation failure if MPI is not enabled	2020-03-26 15:31:29 +02:00
jpekkila	08f567619a	Removed old unused functions for MPI integration and comm	2020-03-26 15:04:57 +02:00
jpekkila	329a71d299	Added an example how to run the code with MPI	2020-03-26 15:02:55 +02:00
jpekkila	ed7cf3f540	Added a production-ready interface for doing multi-node runs with Astaroth with MPI	2020-03-26 15:02:37 +02:00
jpekkila	dad84b361f	Renamed Grid structure to GridDims structure to avoid confusion with MPI Grids used in device.cc	2020-03-26 15:01:33 +02:00
jpekkila	db120c129e	Modelsolver computes now any built-in parameters automatically instead of relying on the user to supply them (inv_dsx etc)	2020-03-26 14:59:07 +02:00
jpekkila	fbd4b9a385	Made the MPI flag global instead of just core	2020-03-26 14:57:22 +02:00
jpekkila	e1bec4459b	Removed an unused variable	2020-03-25 13:54:43 +02:00
jpekkila	ce81df00e3	Merge branch 'master' of https://bitbucket.org/jpekkila/astaroth	2020-03-25 13:51:07 +02:00
jpekkila	e36ee7e2d6	AC_multigpu_offset tested to work on at least 2 nodes and 8 GPUs. Forcing should now work with MPI	2020-03-25 13:51:00 +02:00
jpekkila	0254628016	Updated API specification. The DSL syntax allows only C++-style casting.	2020-03-25 11:28:30 +00:00
jpekkila	672137f7f1	WIP further MPI optimizations	2020-03-24 19:02:58 +02:00
jpekkila	ef63813679	Explicit check that critical parameters like inv_dsx are properly initialized before calling integration	2020-03-24 17:01:24 +02:00
jpekkila	8c362b44f0	Added more warning in case some of the model solver parameters are not initialized	2020-03-24 16:56:30 +02:00
jpekkila	d520835c42	Added integration to MPI comm, now completes a full integration step. Works at least on 2 nodes	2020-03-24 16:55:38 +02:00
jpekkila	37d6ad18d3	Fixed formatting in the API specification file	2020-03-04 15:09:23 +02:00
jpekkila	13b9b39c0d	Renamed sink_particle.md to .txt to avoid it showing up in the documentation	2020-02-28 14:44:51 +02:00
jpekkila	daa895d2fc	Fixed an issue that prevented Ninja being used as an alternative build system to Make. There's no significant performance benefit to using Ninja though. Build times: 29-32 s (Make) and 27-28 s (Ninja)	2020-02-10 14:37:48 +02:00
jpekkila	7b39a6bb1d	AC_multigpu_offset is now calculated with MPI. Should now work with forcing, but not tested	2020-02-03 15:45:23 +02:00
jpekkila	50af620a7b	More accurate timing when benchmarking MPI. Also made GPU-GPU communication the default. Current version of UCX is bugged, must export 'UCX_MEMTYPE_CACHE=n' to work around memory errors when doing GPU-GPU comm	2020-02-03 15:27:36 +02:00
jpekkila	459d39a411	README.md edited online with Bitbucket	2020-01-28 17:10:52 +00:00
jpekkila	ade8b10e8f	bitbucket-pipelines.yml edited online with Bitbucket. Removed an unnecessary compiler flag.	2020-01-28 17:09:35 +00:00
jpekkila	17c935ce19	Added padding to param name buffers to make them have NUM_*_PARAMS+1 elements. This should satisfy some strict compilation checks.	2020-01-28 18:53:09 +02:00
jpekkila	89f4d08b6c	Fixed a possible out-of-bounds access in error checking when NUM_*_PARAMS is 0	2020-01-28 18:43:03 +02:00
jpekkila	7685d8a830	Astaroth 2.2 update complete.	2020-01-28 18:28:38 +02:00
jpekkila	67f2fcc88d	Setting inv_dsx etc explicitly is no longer required as they are set to default values in acc/stdlib/stdderiv.h	2020-01-28 18:22:27 +02:00
jpekkila	0ccd4e3dbc	Major improvement: uniforms can now be set to default values. The syntax is the same as for setting any other values, e.g. 'uniform Scalar a = 1; uniform Scalar b = 0.5 * a;'. Undefined uniforms are still allowed, but in this case the user should load a proper value into it during runtime. Default uniform values can be overwritten by calling any of the uniform loader functions (like acDeviceLoadScalarUniform). Also improved error checking. Now there are explicit warnings if the user tries to load an invalid value into a device constant.	2020-01-28 18:17:31 +02:00
jpekkila	6dfe3ed4d6	Added out-of-the-box support for MPI (though not enabled by default). Previously the user had to pass mpicxx explicitly as the cmake compiler in order to compile MPI code, but this was bad practice and it's better to let cmake handle the include and compilation flags.	2020-01-28 15:59:20 +02:00
jpekkila	85d4de24e3	Recompilation is now properly triggered when acc sources or the ac standard library are modified	2020-01-28 14:12:25 +02:00
jpekkila	07dd9ff024	Updated documentation with the changes made for Astaroth 2.2	2020-01-28 13:36:51 +02:00
jpekkila	5444c84cff	Formatting	2020-01-27 18:24:46 +02:00

927 Commits