Commit Graph

967 Commits

Author SHA1 Message Date
jpekkila
9840b817d0 Added the (hopefully final) basic test case used for the benchmarks 2020-06-07 21:59:33 +03:00
Oskar Lappi
cd49db68d7 No-barrier benchmark 2020-06-07 15:50:49 +03:00
Oskar Lappi
53b48bb8ce MPI_Allreduce -> MPI_Reduce for MPI reductions + benchmark batch script
Slightly ugly because this changes the benchmark behaviour a little.
However, we can now run batch benchmarks from one script; no need to generate new ones. A sketch of the reduction change follows this entry.
2020-06-06 22:56:05 +03:00
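For reference, a minimal standalone sketch of the switch this commit describes: moving a scalar sum from MPI_Allreduce (every rank gets the result) to MPI_Reduce (only the root gets it), which is why the benchmark behaviour shifts slightly. The double sum and buffer names are illustrative, not Astaroth's actual reduction code.

```c
/* Sketch: scalar reduction with MPI_Reduce instead of MPI_Allreduce.
 * Only rank 0 holds the final value afterwards. */
#include <mpi.h>
#include <stdio.h>

int
main(int argc, char** argv)
{
    MPI_Init(&argc, &argv);

    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    double local  = (double)rank; /* stand-in for a per-process partial result */
    double global = 0.0;

    /* Before: every rank receives the reduced value. */
    /* MPI_Allreduce(&local, &global, 1, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD); */

    /* After: only the root (rank 0) receives the reduced value. */
    MPI_Reduce(&local, &global, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);

    if (rank == 0)
        printf("sum = %g\n", global);

    MPI_Finalize();
    return 0;
}
```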
Oskar Lappi
eb05e02793 Added vector reductions to MPI reduction benchmarks 2020-06-06 19:25:30 +03:00
Oskar Lappi
666f01a23d Benchmarking program for scalar MPI reductions, and a non-batch script for running benchmarks
- New program mpi_reduce_bench (a sketch of such a driver follows this entry)
  - runs test cases defined in the source
  - writes all benchmark results to a CSV file, tagging the test case and benchmark run
  - takes an optional argument for the benchmark tag; the default benchmark tag is a timestamp
- New script mpibench.sh
  - runs mpi_reduce_bench with defined parameters:
    - number of tasks
    - number of nodes
    - the benchmark tag for mpi_reduce_bench; the default tag is the current git HEAD short hash
2020-06-05 19:48:40 +03:00
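A minimal sketch of what a driver in the spirit of mpi_reduce_bench could look like: time a reduction test case, then append a tagged row to a CSV file, with the tag defaulting to a timestamp. The file name, CSV columns, iteration count, and test-case name are assumptions for illustration, not the real program.

```c
/* Sketch of a CSV-writing MPI reduction benchmark driver (illustrative). */
#include <mpi.h>
#include <stdio.h>
#include <time.h>

int
main(int argc, char** argv)
{
    MPI_Init(&argc, &argv);

    int rank, nprocs;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

    /* Optional benchmark tag; default to a timestamp. */
    char tag[64];
    if (argc > 1)
        snprintf(tag, sizeof(tag), "%s", argv[1]);
    else
        snprintf(tag, sizeof(tag), "%ld", (long)time(NULL));

    const int num_iters = 100;
    double    local = 1.0, global = 0.0;

    MPI_Barrier(MPI_COMM_WORLD);
    const double start = MPI_Wtime();
    for (int i = 0; i < num_iters; ++i)
        MPI_Reduce(&local, &global, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);
    const double elapsed = MPI_Wtime() - start;

    if (rank == 0) {
        FILE* fp = fopen("mpi_reduce_bench.csv", "a");
        if (fp) {
            /* Columns: tag, test case, nprocs, average seconds per reduction. */
            fprintf(fp, "%s,scal_sum,%d,%g\n", tag, nprocs, elapsed / num_iters);
            fclose(fp);
        }
    }

    MPI_Finalize();
    return 0;
}
```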
jpekkila
17a4f31451 Added the latest setup used for benchmarks 2020-06-04 20:47:03 +03:00
Oskar Lappi
9e5fd40838 Changes after code review by Johannes, and clang-format 2020-06-04 18:50:22 +03:00
Oskar Lappi
f7d8de75d2 Reduction test pipeline added to mpitest; Error struct changed: new label field
- CHANGED: the Error struct has a new label field for labeling an error (see the sketch after this entry)
  - The label is what is printed to screen
  - vtxbuf name lookup moved out of printErrorToScreen/print_error_to_screen
- NEW: acScalReductionTestCase and acVecReductionTestCase
  - Define new test cases by adding them to a list in samples/mpitest/main.cc:main
- Minor style change in verification.c to make all Verification functions similar
  and fit on one screen
2020-06-04 15:10:35 +03:00
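A hedged sketch of the structures this commit describes: an Error carrying a label that is printed directly (so the vtxbuf name lookup can live outside print_error_to_screen) and a labelled scalar-reduction test case. All field names and types here are guesses for illustration; the real definitions are in Astaroth's verification code and samples/mpitest/main.cc.

```c
/* Illustrative guesses only; not the actual Astaroth definitions. */
#include <stdbool.h>
#include <stdio.h>

typedef struct {
    const char* label;      /* what gets printed to screen */
    double      abs_error;
    double      rel_error;
    bool        acceptable;
} Error;

typedef struct {
    const char* label;      /* e.g. the reduction or buffer name */
    int         rtype;      /* stand-in for a reduction-type enum */
} acScalReductionTestCase;

static void
print_error_to_screen(const Error err)
{
    /* The caller has already resolved the label (e.g. a vtxbuf name). */
    printf("%-16s abs %-12g rel %-12g %s\n", err.label, err.abs_error,
           err.rel_error, err.acceptable ? "OK" : "FAIL");
}

int
main(void)
{
    const Error err = {.label      = "example_field",
                       .abs_error  = 1e-12,
                       .rel_error  = 1e-9,
                       .acceptable = true};
    print_error_to_screen(err);
    return 0;
}
```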
jpekkila
226de32651 Added model solution for reductions and functions for automated testing 2020-06-03 13:37:00 +03:00
Oskar Lappi
34793d4e8b Changes after code review with Johannes 2020-06-03 12:44:43 +03:00
Oskar Lappi
899d679518 Draft of MPI-based reductions: acGridReduceScal, acGridReduceVec
- Both functions call acDeviceReduceScal/Vec first
- Both functions then perform the same MPI reduction (MPI_Allreduce); see the sketch after this entry
- Not tested
2020-06-02 21:59:30 +03:00
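A sketch of the two-stage pattern described above, assuming a double-precision scalar maximum: each rank reduces its own subdomain on the device first, then the partial results are combined across ranks with MPI_Allreduce so every rank holds the final value. The names and signatures are placeholders, not the real acGridReduceScal/acDeviceReduceScal API.

```c
/* Sketch: device-local reduce followed by an MPI_Allreduce across ranks. */
#include <mpi.h>
#include <stdio.h>

typedef double AcReal;

/* Stand-in for the per-device reduction (acDeviceReduceScal in spirit):
 * here it just returns a dummy per-rank value. */
static AcReal
device_reduce_scal_max(int rank)
{
    return (AcReal)rank;
}

/* Two-stage grid reduction: device-local reduce, then MPI_Allreduce so
 * that every rank holds the same global result. */
static AcReal
grid_reduce_scal_max(int rank)
{
    AcReal local  = device_reduce_scal_max(rank);
    AcReal global = local;
    /* MPI_DOUBLE assumes AcReal is double precision. */
    MPI_Allreduce(&local, &global, 1, MPI_DOUBLE, MPI_MAX, MPI_COMM_WORLD);
    return global;
}

int
main(int argc, char** argv)
{
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    const AcReal max = grid_reduce_scal_max(rank);
    if (rank == 0)
        printf("global max = %g\n", max);

    MPI_Finalize();
    return 0;
}
```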
jpekkila
0d80834619 Disabled forcing and upwinding for performance tests. Set the default grid size to 512^3. Set default CMake parameters so that benchmarks can be reproduced out of the box. 2020-06-02 14:09:00 +03:00
jpekkila
a753ca92f2 Made CMake handle MPI linking. Potentially a bad idea (it is usually better to use the mpicc and mpicxx wrappers) 2020-05-30 22:02:39 +03:00
jpekkila
f97ed9e513 For reason X git decided to remove integration from the most critical part of the program when merging. Luckily we have autotests. 2020-05-30 20:59:39 +03:00
jpekkila
9cafe88d13 Merge branch 'mpi-to-master-merge-candidate-2020-06-01' of https://bitbucket.org/jpekkila/astaroth into mpi-to-master-merge-candidate-2020-06-01 2020-05-30 20:25:48 +03:00
jpekkila
176ceae313 Fixed various compilation warnings 2020-05-30 20:23:53 +03:00
jpekkila
2ddeef22ac bitbucket-pipelines.yml edited online with Bitbucket 2020-05-30 16:58:45 +00:00
jpekkila
f929b21ac0 bitbucket-pipelines.yml edited online with Bitbucket 2020-05-30 16:52:26 +00:00
jpekkila
95275df3f2 bitbucket-pipelines.yml edited online with Bitbucket 2020-05-30 16:48:39 +00:00
jpekkila
c24996fdb3 Added the official Kitware PPA for pulling the latest CMake when doing automated builds. 2020-05-30 16:45:08 +00:00
jpekkila
b719306266 Upped the required CMake version. This may be an issue on older machines. Instead of making the user compile CMake themselves in that case, we could maybe add CMake as a submodule. In any case, supporting older CMake versions is not really an option because their CUDA support is so bad and requires adding dirty hacks to the clean CMake files we have now. 2020-05-30 19:36:32 +03:00
jpekkila
e05338c128 Merged the newest MPI changes 2020-05-30 19:18:46 +03:00
jpekkila
555bf8b252 Reverted the default settings to same as on master for easier merge 2020-05-30 19:06:21 +03:00
jpekkila
4748e48c7d Spelling fixes 2020-05-28 17:10:17 +03:00
jpekkila
01ad141d90 Added comments and a short overview of the MPI implementation 2020-05-28 17:05:12 +03:00
jpekkila
f1138b04ac Cleaned up the MPI implementation and removed all older implementations (also removed the MPI window implementation, which might be handy in the future when CUDA-aware support is introduced). If the removed stuff is needed later, here are some keywords to help find this commit: MPI_window, sendrecv, bidirectional, unidirectional transfer, real-time pinning, a0s, b0s. 2020-05-28 16:42:50 +03:00
jpekkila
0d62f56e27 Tried an alternative approach to comm (it was worse than the current solution) and rewrote the current best solution (now easier to read) 2020-05-28 15:31:43 +03:00
jpekkila
f97005a75d Added WIP version of the new bidirectional comm scheme 2020-05-27 19:09:32 +03:00
jpekkila
afe5b973ca Added multiplication operator for int3 2020-05-27 19:08:39 +03:00
jpekkila
7e59ea0eff MPI: corners are no longer communicated. Slight performance impact (14 ms vs 15 ms). Tests still pass with 8 GPUs. 2020-05-26 19:00:14 +03:00
jpekkila
ec59cdb973 Some formatting and unimportant changes to samples 2020-05-26 18:57:46 +03:00
jpekkila
c93b3265e6 Made comm streams high prio 2020-04-22 17:03:53 +03:00
jpekkila
22e01b7f1d Rewrote partitioning code 2020-04-19 23:23:23 +03:00
jpekkila
4dd825f574 Proper decomposition when using Morton order to partition the computational domain 2020-04-19 22:50:26 +03:00
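For context on this and commit 8c210b3292 below, a small sketch of 3D Morton (Z-order) indexing: the bits of the block coordinates are interleaved so that processes with nearby indices get nearby blocks, instead of a plain linear index. The 10-bits-per-axis limit and helper names are illustrative, not Astaroth's implementation.

```c
/* Sketch: 3D Morton (Z-order) encode/decode for domain decomposition. */
#include <stdint.h>
#include <stdio.h>

/* Spread the low 10 bits of x so that bit b lands at position 3*b. */
static uint64_t
spread_bits(uint64_t x)
{
    uint64_t out = 0;
    for (int b = 0; b < 10; ++b)
        out |= ((x >> b) & 1ull) << (3 * b);
    return out;
}

/* Inverse of spread_bits: gather every third bit into a dense integer. */
static uint64_t
gather_bits(uint64_t x)
{
    uint64_t out = 0;
    for (int b = 0; b < 10; ++b)
        out |= ((x >> (3 * b)) & 1ull) << b;
    return out;
}

/* Interleave the bits of (i, j, k) into a single Morton index. */
static uint64_t
morton3d(uint64_t i, uint64_t j, uint64_t k)
{
    return spread_bits(i) | (spread_bits(j) << 1) | (spread_bits(k) << 2);
}

int
main(void)
{
    /* Map process indices 0..7 to 3D block coordinates (a 2x2x2 grid). */
    for (uint64_t pid = 0; pid < 8; ++pid) {
        const uint64_t i = gather_bits(pid);
        const uint64_t j = gather_bits(pid >> 1);
        const uint64_t k = gather_bits(pid >> 2);
        printf("pid %llu -> block (%llu, %llu, %llu), morton %llu\n",
               (unsigned long long)pid, (unsigned long long)i,
               (unsigned long long)j, (unsigned long long)k,
               (unsigned long long)morton3d(i, j, k));
    }
    return 0;
}
```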
jpekkila
ffb274e16f Linking the dynamic CUDA library instead of the static one (less prone to breaking, since Astaroth does not have to be rebuilt when CUDA is updated) 2020-04-19 22:33:01 +03:00
jpekkila
8c210b3292 3D decomposition is now done using Morton order instead of linear indexing 2020-04-19 22:31:57 +03:00
jpekkila
9cd5909f5a BWtest now calculates aggregate bandwidths per process instead of assuming that all neighbor communication can be done in parallel (within a node one can have parallel P2P connections to all neighbors and an insane total bandwidth, but this is not the case over the network, where we seem to have only one bidirectional socket) 2020-04-09 20:28:04 +03:00
jpekkila
d4a84fb887 Added a PCIe bandwidth test 2020-04-09 20:04:54 +03:00
jpekkila
d6e74ee270 Added missing files 2020-04-09 19:24:55 +03:00
jpekkila
ed8a0bf7e6 Added bwtest and benchmarkscript to CMakeLists 2020-04-07 18:35:12 +03:00
jpekkila
fb41741d74 Improvements to samples 2020-04-07 17:58:47 +03:00
jpekkila
427a3ac5d8 Rewrote the previous implementation; it now fully works (verified) and gives the speedup we want. Communication latency is now completely hidden on at least two nodes (8 GPUs). Scaling looks very promising. 2020-04-06 17:28:02 +03:00
jpekkila
37f1c841a3 Added functions for pinning memory that is sent over the network. TODO pack to and from pinned memory selectively (currently P2P results are overwritten with data in pinned memory) 2020-04-06 14:09:12 +03:00
jpekkila
cc9d3f1b9c Found a workaround that gives good inter- and intra-node performance. The HPC-X MPI implementation does not know how to do P2P comm with pinned arrays (should be 80 GiB/s, measured 10 GiB/s), and internode comm is super slow without pinned arrays (should be 40 GiB/s, measured < 1 GiB/s). Made a proof-of-concept communicator that pins arrays that are sent to or received from another node. 2020-04-05 20:15:32 +03:00
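A minimal sketch of the workaround described in this commit: buffers that cross a node boundary are backed by page-locked (pinned) host memory before being handed to MPI, while intra-node peers keep ordinary buffers. The CUDA runtime and MPI calls are standard; the surrounding structure, names, and the hard-coded "peer on another node" assumption are illustrative.

```c
/* Sketch: pin halo buffers that are exchanged with another node. */
#include <cuda_runtime.h>
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

/* Allocate a halo buffer; use page-locked (pinned) host memory when the
 * peer lives on another node, since internode transfers were measured to
 * be very slow with unpinned buffers. */
static double*
halo_buffer_alloc(size_t count, int peer_is_on_other_node)
{
    double* data = NULL;
    if (peer_is_on_other_node)
        cudaMallocHost((void**)&data, count * sizeof(double)); /* pinned */
    else
        data = malloc(count * sizeof(double)); /* ordinary pageable memory */
    return data;
}

int
main(int argc, char** argv)
{
    MPI_Init(&argc, &argv);

    int rank, nprocs;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

    if (nprocs < 2) {
        if (rank == 0)
            fprintf(stderr, "run with at least 2 ranks\n");
        MPI_Finalize();
        return EXIT_FAILURE;
    }

    /* Assume here that the peer is on another node; a real communicator
     * would decide this from the process-to-node mapping. */
    const size_t count = 1 << 20;
    double*      buf   = halo_buffer_alloc(count, 1);

    /* Contents left uninitialized; this only illustrates the transfer path. */
    if (rank == 0)
        MPI_Send(buf, (int)count, MPI_DOUBLE, 1, 0, MPI_COMM_WORLD);
    else if (rank == 1)
        MPI_Recv(buf, (int)count, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);

    cudaFreeHost(buf); /* pinned buffers are freed with cudaFreeHost */
    MPI_Finalize();
    return 0;
}
```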
jpekkila
88e53dfa21 Added a little program for testing the bandwidths of different MPI comm styles on n nodes and processes 2020-04-05 17:09:57 +03:00
jpekkila
fe14ae4665 Added an alternative MPI implementation which uses one-sided communication 2020-04-02 17:59:53 +03:00
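A minimal sketch of one-sided communication with an MPI window, the style this alternative implementation used (it was later removed in f1138b04ac): each rank exposes its receive buffer as a window and the peer writes into it with MPI_Put between fences. The ring-style peer choice, buffer sizes, and fence synchronization are assumptions for illustration.

```c
/* Sketch: one-sided exchange via an MPI RMA window. */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

int
main(int argc, char** argv)
{
    MPI_Init(&argc, &argv);

    int rank, nprocs;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

    const int count    = 1024;
    double*   recv_buf = calloc(count, sizeof(double));
    double*   send_buf = malloc(count * sizeof(double));
    for (int i = 0; i < count; ++i)
        send_buf[i] = rank;

    /* Expose the receive buffer as an RMA window. */
    MPI_Win win;
    MPI_Win_create(recv_buf, count * sizeof(double), sizeof(double),
                   MPI_INFO_NULL, MPI_COMM_WORLD, &win);

    /* Each rank writes its data directly into the next rank's window. */
    const int peer = (rank + 1) % nprocs;
    MPI_Win_fence(0, win);
    MPI_Put(send_buf, count, MPI_DOUBLE, peer, 0, count, MPI_DOUBLE, win);
    MPI_Win_fence(0, win);

    if (rank == 0)
        printf("rank 0 received data from rank %d: %g\n",
               (nprocs - 1) % nprocs, recv_buf[0]);

    MPI_Win_free(&win);
    free(recv_buf);
    free(send_buf);
    MPI_Finalize();
    return 0;
}
```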
jpekkila
d6d5920553 Pulled improvements to device.cc from the benchmark branch to master 2020-03-31 14:23:36 +03:00
Johannes Pekkila
9b6d927cf1 It might be better to benchmark MPI codes without synchronization because of the overhead of timing individual steps 2020-03-31 12:37:54 +02:00
Johannes Pekkila
742dcc2697 Optimized MPI synchronization a bit 2020-03-31 12:36:25 +02:00
jpekkila
24e65ab02d Set decompositions for some nprocs by hand 2020-03-30 18:13:50 +03:00