Miikka Vaisala
|
efd3cc40cd
|
Compiles without the API function call.
|
2020-11-20 14:11:14 +08:00 |
|
Miikka Vaisala
|
f736aa1cd1
|
Attempting to make kernels go where they should.
|
2020-09-18 16:55:36 +08:00 |
|
Miikka Vaisala
|
6c0cb5e88f
|
Diagnostic kernel addition to calculate vAfven.
|
2020-09-11 16:22:18 +08:00 |
|
jpekkila
|
f1138b04ac
|
Cleaned up the MPI implementation and removed all older implementations (including the MPI window implementation, which might be handy in the future when CUDA-aware support is introduced). If the removed code is needed later, here are some keywords to help find this commit: MPI_window, sendrecv, bidirectional, unidirectional transfer, real-time pinning, a0s, b0s.
|
2020-05-28 16:42:50 +03:00 |
|
jpekkila
|
0d62f56e27
|
Tried an alternative approach to comm (was worse than the current solution) and rewrote the current best solution (now easier to read)
|
2020-05-28 15:31:43 +03:00 |
|
jpekkila
|
fb41741d74
|
Improvements to samples
|
2020-04-07 17:58:47 +03:00 |
|
jpekkila
|
427a3ac5d8
|
Rewrote the previous implementation, now fully works (verified) and gives the speedup we want. Communication latency is now completely hidden on at least two nodes (8 GPUs). Scaling looks very promising.
|
2020-04-06 17:28:02 +03:00 |
|
jpekkila
|
37f1c841a3
|
Added functions for pinning memory that is sent over the network. TODO pack to and from pinned memory selectively (currently P2P results are overwritten with data in pinned memory)
|
2020-04-06 14:09:12 +03:00 |
|
jpekkila
|
fe14ae4665
|
Added an alternative MPI implementation which uses one-sided communication
|
2020-04-02 17:59:53 +03:00 |
|
jpekkila
|
0ccd4e3dbc
|
Major improvement: uniforms can now be set to default values. The syntax is the same as for setting any other values, e.g. 'uniform Scalar a = 1; uniform Scalar b = 0.5 * a;'. Undefined uniforms are still allowed, but in this case the user should load a proper value into them at runtime. Default uniform values can be overwritten by calling any of the uniform loader functions (like acDeviceLoadScalarUniform). Also improved error checking: there are now explicit warnings if the user tries to load an invalid value into a device constant.
|
2020-01-28 18:17:31 +02:00 |
|
jpekkila
|
78fbcc090d
|
Reordered src/core to divide host and device code more cleanly (this is more likely to work when compiling with mpicxx). Disabled separate compilation of CUDA kernels, as it complicates compilation and is a source of many cmake/cuda bugs. As a downside, GPU code takes longer to compile.
|
2020-01-23 20:06:20 +02:00 |
|