jpekkila
|
f7bd84af46
|
Added macros for getting int3 and AcReal3 device constants from within kernels (and DSL).
|
2019-07-31 17:07:02 +08:00 |
|
jpekkila
|
323d4e3b31
|
Replaced all calls to AC_VTXBUF_IDX with acVertexBufferIdx etc. in all files
|
2019-07-23 14:37:28 +03:00 |
|
jpekkila
|
fee03b7149
|
Moved some device limits used only during auto-optimization from astaroth.h to device.cu
|
2019-07-22 19:54:46 +03:00 |
|
jpekkila
|
a950be99f2
|
Streams now created with priority (all streams have the same priority by default)
|
2019-07-22 13:04:04 +03:00 |
|
jpekkila
|
78aba6428e
|
Updated the copyright years throughout the project
|
2019-07-16 14:28:32 +03:00 |
|
jpekkila
|
b08d5b26f5
|
cudaMemcpyToSymbol -> cudaMemcpyToSymbolAsync
|
2019-07-10 15:05:57 +03:00 |
|
jpekkila
|
976bf05c8d
|
Wrong scope for num_iterations in the last commit, fixed
|
2019-07-10 14:37:32 +03:00 |
|
jpekkila
|
866ec8a192
|
Removed some old hack I used for benchmarking a while back
|
2019-07-10 14:34:05 +03:00 |
|
jpekkila
|
d0b95c39b6
|
Disabled writing out unnecessary files when auto-optimizing the code
|
2019-07-09 18:51:04 +03:00 |
|
jpekkila
|
10a98b01a9
|
Experimental change: now the integration function is automatically optimized during acInit
|
2019-07-09 14:46:24 +03:00 |
|
jpekkila
|
a086821e7c
|
Added a function acAutoOptimize to the interface and removed rk3_step_async in kernels.cuh (moved into rkStep)
|
2019-07-09 14:21:22 +03:00 |
|
jpekkila
|
deebe570da
|
Merge branch 'master' into multigpu_optimization_2019-07-05
|
2019-07-08 16:11:24 +03:00 |
|
Miikka Vaisala
|
6ba15c3a7c
|
props.totalConstMem and props.sharedMemPerBlock cause an assembler error
while compiling on the TIARA gp cluster. Therefore commented out.
|
2019-07-08 11:00:12 +08:00 |
|
jpekkila
|
5fdfdeca9e
|
Multi-GPU optimizations: removed some unnecessary synchronization and divided the calculation of boundary conditions into local and global steps.
|
2019-07-05 18:21:44 +03:00 |
|
jpekkila
|
d87eb36f5a
|
Formatting: brackets around a for loop for consistency
|
2019-07-05 15:26:19 +03:00 |
|
jpekkila
|
a3ca6cf132
|
Added skeletons for packing parts of the ghost zones into buffers to speed up data transfers
|
2019-07-01 13:56:05 +03:00 |
|
jpekkila
|
8864266042
|
Autoformatted all CUDA/C/C++ code
|
2019-06-18 16:42:56 +03:00 |
|
jpekkila
|
4ca4dbefdf
|
Added the machinery for implementing forcing with the DSL on multiple GPUs and a simple model solution
|
2019-06-18 16:13:32 +03:00 |
|
jpekkila
|
57e2e48fb0
|
Added functions for loading device constants. Also introduced a new int3 constant that can be used to determine the global vertex index inside kernels
|
2019-06-18 14:11:55 +03:00 |
|
jpekkila
|
c9f26d6e58
|
Cleanup
|
2019-06-17 20:44:37 +03:00 |
|
jpekkila
|
ce6f453bc5
|
Rewrote reductions, now much simpler than before
|
2019-06-17 20:38:28 +03:00 |
|
jpekkila
|
5e6cc9b8cc
|
Changed names of some parameters to better ones
|
2019-06-17 18:18:00 +03:00 |
|
jpekkila
|
59086b3e79
|
Added multi-GPU reductions. Tested to work with 1-2 GPUs with power of two grid dimensions. Requires more testing in special cases (when using exotic grid dimensions and a large number of GPUs)
|
2019-06-17 14:45:41 +03:00 |
|
jpekkila
|
0e48766a68
|
Added Astaroth 2.0
|
2019-06-14 14:19:07 +03:00 |
|