Commit Graph

15 Commits

Author SHA1 Message Date
Miikka Vaisala
11eddabbd6 Merge branch 'master' into alt_bcond_2020_09 2020-11-23 15:47:46 +08:00
Miikka Vaisala
543c565e5d Dummy at the moment, but now the boundary condition kernel caller can see what vertex buffer name is in use. 2020-11-23 15:43:52 +08:00
Miikka Vaisala
efd3cc40cd Compiles without the API function call. 2020-11-20 14:11:14 +08:00
jpekkila
00b7b537ce Modifications for master merge: reverted CMakeLists.txt to the original, disabled mixed precision by default 2020-11-02 10:58:18 +02:00
jpekkila
0a2827593c Added very experimental implementation for mixed precision. Comm is done with f32 and comp with f64. 2020-10-28 12:56:34 +02:00
Miikka Vaisala
f736aa1cd1 Attempting to make kernels go where they should. 2020-09-18 16:55:36 +08:00
Miikka Vaisala
6c0cb5e88f Diagnostic kernel addition to calculate vAlfven. 2020-09-11 16:22:18 +08:00
jpekkila
f1138b04ac Cleaned up the MPI implementation, removed all older implementations (removed also MPI window implementation which might be handy in the future when CUDA-aware support is introduced). If the removed stuff is needed later, here are some keywords to help find this commit: MPI_window, sendrecv, bidirectional, unidirectional transfer, real-time pinning, a0s, b0s. 2020-05-28 16:42:50 +03:00
jpekkila
0d62f56e27 Tried an alternative approach to comm (was worse than the current solution) and rewrote the current best solution (now easier to read) 2020-05-28 15:31:43 +03:00
jpekkila
fb41741d74 Improvements to samples 2020-04-07 17:58:47 +03:00
jpekkila
427a3ac5d8 Rewrote the previous implementation, now fully works (verified) and gives the speedup we want. Communication latency is now completely hidden on at least two nodes (8 GPUs). Scaling looks very promising. 2020-04-06 17:28:02 +03:00
jpekkila
37f1c841a3 Added functions for pinning memory that is sent over the network. TODO pack to and from pinned memory selectively (currently P2P results are overwritten with data in pinned memory) 2020-04-06 14:09:12 +03:00
jpekkila
fe14ae4665 Added an alternative MPI implementation which uses one-sided communication 2020-04-02 17:59:53 +03:00
jpekkila
0ccd4e3dbc Major improvement: uniforms can now be set to default values. The syntax is the same as for setting any other values, e.g. 'uniform Scalar a = 1; uniform Scalar b = 0.5 * a;'. Undefined uniforms are still allowed, but in this case the user should load a proper value into them at runtime. Default uniform values can be overwritten by calling any of the uniform loader functions (like acDeviceLoadScalarUniform). Also improved error checking: there are now explicit warnings if the user tries to load an invalid value into a device constant. 2020-01-28 18:17:31 +02:00
jpekkila
78fbcc090d Reordered src/core to have better division to host and device code (this is more likely to work when compiling with mpicxx). Disabled separate compilation of CUDA kernels as this complicates compilation and is a source of many cmake/cuda bugs. As a downside, GPU code takes longer to compile. 2020-01-23 20:06:20 +02:00