cwpearson/astaroth
9cd5909f5a88a01e646a76973875464ad94686ee
astaroth/samples
jpekkila
9cd5909f5a
BWtest now calculates aggregate bandwidth per process instead of assuming that all neighbor communication can run in parallel. (Within a node, parallel P2P connections to all neighbors are possible and give a very high total bandwidth, but this is not the case over the network, where there appears to be only one bidirectional socket.)
2020-04-09 20:28:04 +03:00
benchmark
Improvements to samples
2020-04-07 17:58:47 +03:00
bwtest
BWtest now calculates aggregate bandwidth per process instead of assuming that all neighbor communication can run in parallel. (Within a node, parallel P2P connections to all neighbors are possible and give a very high total bandwidth, but this is not the case over the network, where there appears to be only one bidirectional socket.)
2020-04-09 20:28:04 +03:00
cpptest
Setting inv_dsx etc. explicitly is no longer required, as they are set to default values in acc/stdlib/stdderiv.h
2020-01-28 18:22:27 +02:00
ctest
Major improvement: uniforms can now be set to default values. The syntax is the same as for setting any other value, e.g. 'uniform Scalar a = 1; uniform Scalar b = 0.5 * a;'. Undefined uniforms are still allowed, but in that case the user should load a proper value at runtime. Default uniform values can be overwritten by calling any of the uniform loader functions (like acDeviceLoadScalarUniform). Error checking is also improved: there are now explicit warnings if the user tries to load an invalid value into a device constant.
2020-01-28 18:17:31 +02:00
genbenchmarkscripts
Added missing files
2020-04-09 19:24:55 +03:00
mpitest
Found a workaround that gives good inter- and intra-node performance. The HPC-X MPI implementation does not know how to do P2P communication with pinned arrays (should be 80 GiB/s, measured 10 GiB/s), and inter-node communication is very slow without pinned arrays (should be 40 GiB/s, measured < 1 GiB/s). Made a proof-of-concept communicator that pins arrays that are sent to or received from another node.
2020-04-05 20:15:32 +03:00
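
The bwtest entry above contrasts two ways of reporting bandwidth. The sketch below is not taken from the bwtest source. It only illustrates the arithmetic, with hypothetical function names and made-up timings: an optimistic estimate that assumes every neighbor link runs concurrently at the single-link rate, versus an aggregate figure computed from all bytes a process moved and the wall-clock time of the whole exchange.

```python
GIB = 2**30  # bytes in one GiB


def parallel_link_estimate(bytes_per_neighbor, single_transfer_seconds):
    """Optimistic estimate in bytes/s: assumes all neighbor transfers overlap
    perfectly, so each link delivers its bytes in the single-transfer time."""
    if single_transfer_seconds <= 0:
        raise ValueError("single_transfer_seconds must be positive")
    return sum(b / single_transfer_seconds for b in bytes_per_neighbor)


def aggregate_bandwidth(bytes_per_neighbor, elapsed_seconds):
    """Aggregate bandwidth of one process in bytes/s: every byte it moved,
    divided by the measured wall-clock time of the whole exchange."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    return sum(bytes_per_neighbor) / elapsed_seconds


# Hypothetical numbers: 4 neighbors, 1 GiB each, 0.125 s per single transfer.
messages = [GIB] * 4

# If the links really ran in parallel, the exchange would take 0.125 s.
optimistic = parallel_link_estimate(messages, 0.125)

# If the transfers serialize over one socket, the exchange takes 4 * 0.125 s.
measured = aggregate_bandwidth(messages, 0.5)

print(f"parallel-link estimate: {optimistic / GIB} GiB/s")
print(f"aggregate per process:  {measured / GIB} GiB/s")
```

With these made-up timings the optimistic estimate is four times the aggregate figure, which is the kind of overstatement the commit message describes for network communication.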
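
The mpitest entry above describes pinning only the arrays that are exchanged with another node. As a rough sketch of that selection step, and not the repository's communicator, the hypothetical helper below takes a rank-to-node mapping and returns the neighbors whose buffers would need pinning. The function name and the mapping are assumptions. In a real MPI program the node membership could be obtained, for example, with `MPI_Comm_split_type` and `MPI_COMM_TYPE_SHARED`.

```python
def neighbors_needing_pinned_buffers(my_rank, neighbor_ranks, node_of_rank):
    """Return the neighbor ranks that live on a different node than my_rank.

    node_of_rank is a hypothetical mapping {rank: node id}. Buffers exchanged
    with the returned ranks would be pinned, while buffers exchanged with
    ranks on the same node would be left unpinned.
    """
    my_node = node_of_rank[my_rank]
    return [r for r in neighbor_ranks if node_of_rank[r] != my_node]


# Hypothetical layout: 8 ranks, 4 per node.
node_of_rank = {rank: rank // 4 for rank in range(8)}

# Rank 3 talks to ranks 2, 4 and 7. Only 4 and 7 are on the other node.
remote = neighbors_needing_pinned_buffers(3, [2, 4, 7], node_of_rank)
print(remote)
```

Keeping the decision per neighbor rather than per process matches the trade-off in the commit message: intra-node transfers were measured as slow with pinned arrays, while inter-node transfers were slow without them.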