jpekkila
764e4dda69
Streamlined verification
2020-08-21 20:11:25 +03:00
jpekkila
56273433fe
Fixed inconsistency in the acGridLoad parameter order
2020-08-21 14:40:11 +03:00
jpekkila
b0cfceab98
Merged with master
2020-08-19 16:16:06 +03:00
jpekkila
46cfa9cd37
Now using MPI C bindings instead of the (deprecated?) C++ bindings due to compilation issues on some machines (error: cast between incompatible function types, ompi_mpi_cxx_op_intercept)
2020-08-19 15:50:16 +03:00
jpekkila
e051d72091
Moved standalone from src to samples
2020-08-19 13:35:49 +03:00
jpekkila
7f7b0b89ea
Fetched improvements to benchmarks from the mpi-paper-benchmarks branch
2020-08-19 12:03:15 +03:00
jpekkila
fca615defb
Removed an old unused file
2020-07-29 20:01:11 +03:00
jpekkila
003c202e8c
Pulled useful changes from the benchmark branch. GPUDirect RDMA (unpinned) is now the default for MPI communication.
2020-07-29 16:39:24 +03:00
jpekkila
c44c3d02b4
Added a sample for testing the Fortran interface
2020-06-25 06:35:13 +03:00
jpekkila
fab620eb0d
Reordered reduction autotests and made it so that the exact same mesh is used for both the model and candidates instead of the unclean integrated one
2020-06-24 16:34:50 +03:00
jpekkila
ff1a601f85
Merged mpi-to-master-merge-candidate-2020-06-01 here
2020-06-24 16:08:14 +03:00
jpekkila
0d1c5b3911
Autoformatted
2020-06-24 15:56:30 +03:00
jpekkila
f04e347c45
Cleanup before merging to the master merge candidate branch
2020-06-24 15:13:15 +03:00
Oskar Lappi
0030db01f3
Automatic calculation of nodes based on processes
2020-06-10 16:51:35 +03:00
Oskar Lappi
c7f23eb50c
Added partition argument to mpibench script
2020-06-09 14:07:37 +03:00
jpekkila
9840b817d0
Added the (hopefully final) basic test case used for the benchmarks
2020-06-07 21:59:33 +03:00
Oskar Lappi
53b48bb8ce
MPI_Allreduce -> MPI_Reduce for MPI reductions + benchmark batch script
...
Slightly ugly because this changes the benchmark behaviour slightly.
However, we now have a way to run batch benchmarks from one script, so there is no need to generate new ones.
2020-06-06 22:56:05 +03:00
Oskar Lappi
eb05e02793
Added vector reductions to mpi reduction benchmarks
2020-06-06 19:25:30 +03:00
Oskar Lappi
666f01a23d
Benchmarking program for scalar mpi reductions, and nonbatch script for running benchmarks
...
- New program mpi_reduce_bench
  - runs testcases defined in source
  - writes all benchmark results to a csv file, tags the testcase and benchmark run
  - takes optional argument for benchmark tag, default benchmark tag is a timestamp
- New script mpibench.sh
  - runs the mpi_reduce_bench with defined parameters:
    - number of tasks
    - number of nodes
    - the benchmark tag for mpi_reduce_bench, default tag is the current git HEAD short hash
2020-06-05 19:48:40 +03:00
jpekkila
17a4f31451
Added the latest setup used for benchmarks
2020-06-04 20:47:03 +03:00
Oskar Lappi
9e5fd40838
Changes after code review by Johannes, and clang-format
2020-06-04 18:50:22 +03:00
Oskar Lappi
f7d8de75d2
Reduction test pipeline added to mpitest, Error struct changed: new label field
...
- CHANGED: Error struct has a new label field for labeling an error
  - The label is what is printed to screen
  - vtxbuf name lookup moved out of printErrorToScreen/print_error_to_screen
- NEW: acScalReductionTestCase and acVecReductionTestCase
  - Define new test cases by adding them to a list in samples/mpitest/main.cc:main
- Minor style change in verification.c to make all Verification functions similar and fit one screen
2020-06-04 15:10:35 +03:00
jpekkila
226de32651
Added model solution for reductions and functions for automated testing
2020-06-03 13:37:00 +03:00
jpekkila
176ceae313
Fixed various compilation warnings
2020-05-30 20:23:53 +03:00
jpekkila
ec59cdb973
Some formatting and unimportant changes to samples
2020-05-26 18:57:46 +03:00
jpekkila
9cd5909f5a
BWtest now calculates aggregate bandwidths per process instead of assuming that all neighbor communication can be done in parallel. (Within a node one can have parallel P2P connections to all neighbors, which gives a very high total bandwidth, but this is not the case over the network, where we seem to have only one bidirectional socket.)
2020-04-09 20:28:04 +03:00
jpekkila
d4a84fb887
Added a PCIe bandwidth test
2020-04-09 20:04:54 +03:00
jpekkila
d6e74ee270
Added missing files
2020-04-09 19:24:55 +03:00
jpekkila
fb41741d74
Improvements to samples
2020-04-07 17:58:47 +03:00
jpekkila
cc9d3f1b9c
Found a workaround that gives good inter- and intra-node performance. The HPC-X MPI implementation does not know how to do p2p comm with pinned arrays (should be 80 GiB/s, measured 10 GiB/s) and internode comm is super slow without pinned arrays (should be 40 GiB/s, measured < 1 GiB/s). Made a proof-of-concept communicator that pins arrays that are sent to or received from another node.
2020-04-05 20:15:32 +03:00
jpekkila
88e53dfa21
Added a little program for testing the bandwidths of different MPI comm styles on n nodes and processes
2020-04-05 17:09:57 +03:00
Johannes Pekkila
9b6d927cf1
It might be better to benchmark MPI codes without synchronization because of the overhead of timing individual steps
2020-03-31 12:37:54 +02:00
jpekkila
850b37e8c8
Added a switch for generating strong and weak scaling results
2020-03-30 17:56:12 +03:00
jpekkila
d4eb3e0d35
Benchmarks are now written into a csv-file
2020-03-30 17:41:42 +03:00
jpekkila
af531c1f96
Added a sample for benchmarking
2020-03-30 17:22:41 +03:00
jpekkila
5a898b8e95
mpitest now gives a warning instead of a compilation failure if MPI is not enabled
2020-03-26 15:31:29 +02:00
jpekkila
329a71d299
Added an example of how to run the code with MPI
2020-03-26 15:02:55 +02:00
jpekkila
67f2fcc88d
Setting inv_dsx etc explicitly is no longer required as they are set to default values in acc/stdlib/stdderiv.h
2020-01-28 18:22:27 +02:00
jpekkila
0ccd4e3dbc
Major improvement: uniforms can now be set to default values. The syntax is the same as for setting any other values, e.g. 'uniform Scalar a = 1; uniform Scalar b = 0.5 * a;'. Undefined uniforms are still allowed, but in this case the user should load a proper value into them at runtime. Default uniform values can be overwritten by calling any of the uniform loader functions (like acDeviceLoadScalarUniform). Also improved error checking: there are now explicit warnings if the user tries to load an invalid value into a device constant.
2020-01-28 18:17:31 +02:00
jpekkila
fdd829b888
Cleaned up samples and removed old unused stuff. Simplified CMake files.
2020-01-23 20:00:19 +02:00
jpekkila
f77ab8a809
Removed unnecessary README and incorrect building instructions for mpitest
2020-01-16 14:49:07 +02:00
jpekkila
65d9274eaa
Updated samples to have consistent naming
2020-01-15 16:56:02 +02:00
jpekkila
efa95147f3
Renamed exampleproject -> cpptest
2020-01-15 16:25:27 +02:00
jpekkila
23efcb413f
Introduced a sample directory and moved all non-library-components from src to there
2020-01-15 16:24:38 +02:00