jpekkila
|
5be775dbff
|
Various intermediate changes
|
2019-07-31 17:48:48 +03:00 |
|
jpekkila
|
efd9d54fef
|
Stashing WIP changes (interface revision) s.t. I can continue work on a different machine
|
2019-07-30 14:34:44 +03:00 |
|
jpekkila
|
1ceb6739ae
|
Merge branch 'master' into node_device_interface_revision_07-23
|
2019-07-30 14:31:33 +03:00 |
|
jpekkila
|
62100b1140
|
Merge branch 'master' of https://bitbucket.org/jpekkila/astaroth
|
2019-07-30 14:28:25 +03:00 |
|
jpekkila
|
69deef66fe
|
Added sum reduction. NOTE: Scalar sum does not pass the automated test but vector sum does. I couldn't see anything wrong with the code itself and I strongly suspect that the failures are caused by loss of precision due to summing a huge amount of numbers of different magnitudes. However I'm not yet completely sure. Something like the Kahan summation algorithm might be useful if the errors are really caused by fp arithmetic.
|
2019-07-30 14:28:18 +03:00 |
|
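The Kahan summation idea mentioned in the commit above can be sketched as follows. This is a minimal illustration in plain C++, not code from the Astaroth sources: the names `naive_sum`, `kahan_sum` and `kahan_example` are hypothetical, and the actual reduction runs on the GPU. It only shows how a running compensation term recovers the low-order bits that a plain sum drops when adding numbers of very different magnitudes.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Plain left-to-right summation: low-order bits of small terms are lost
// once the running sum becomes much larger than the terms being added.
template <typename T>
T naive_sum(const std::vector<T>& values)
{
    T sum = T(0);
    for (const T& v : values)
        sum += v;
    return sum;
}

// Kahan (compensated) summation: `c` carries the rounding error of the
// previous addition and is subtracted from the next term.
// Note: flags such as -ffast-math may let the compiler optimize the
// compensation away, so this assumes strict floating-point semantics.
template <typename T>
T kahan_sum(const std::vector<T>& values)
{
    T sum = T(0);
    T c   = T(0); // running compensation for lost low-order bits
    for (const T& v : values) {
        const T y = v - c;   // apply the correction to the next term
        const T t = sum + y; // low-order bits of y may be lost here
        c   = (t - sum) - y; // recover what was lost (with opposite sign)
        sum = t;
    }
    return sum;
}

// Hypothetical example: one large value followed by many small ones.
// The exact result is 1 + 1e6 * 1e-8 = 1.01.
void kahan_example()
{
    std::vector<float> values(1000001, 1e-8f);
    values[0] = 1.0f;

    const float naive = naive_sum(values);
    const float kahan = kahan_sum(values);

    // The compensated sum should be closer to the exact value.
    assert(std::fabs(kahan - 1.01f) < std::fabs(naive - 1.01f));
}
```

If the scalar sum test failures really are caused by floating-point round-off, a compensated sum like this (or a pairwise/tree reduction, which GPU reductions often already approximate) would be expected to reduce the error; if the failures persist, the cause is more likely elsewhere.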
jpekkila
|
fdc1e7333c
|
Added macros for getting int3 and AcReal3 device constants from within kernels (and DSL).
|
2019-07-30 09:10:06 +00:00 |
|
jpekkila
|
c9fafe41e5
|
Tidied the CMakeLists, moved stuff to more logical places and added comments. Also tested that ALTER_CONF=ON still works
|
2019-07-26 15:12:55 +03:00 |
|
jpekkila
|
818893a0ea
|
Fixed stray comma in CUDA_ARCH_FLAGS
|
2019-07-26 14:10:17 +03:00 |
|
jpekkila
|
f322bc8b37
|
Rewrote all CMakeLists. Now much cleaner and there's a clear separation during compilation between the core and standalone modules.
|
2019-07-23 20:50:37 +03:00 |
|
jpekkila
|
b65454d523
|
Stashed some testing files used to make sure that the library can also be used from pure C projects (better compatibility). These changes will never go to master as-is.
|
2019-07-23 18:24:47 +03:00 |
|
jpekkila
|
323d4e3b31
|
Replaced all calls to AC_VTXBUF_IDX with acVertexBufferIdx etc. in all files
|
2019-07-23 14:37:28 +03:00 |
|
jpekkila
|
fee03b7149
|
Moved some device limits used only during auto-optimization from astaroth.h to device.cu
|
2019-07-22 19:54:46 +03:00 |
|
jpekkila
|
f74df5339f
|
Cleaned up the include directory: removed all unnecessary stuff and moved common definitions to a separate file
|
2019-07-22 19:46:45 +03:00 |
|
jpekkila
|
01a013f3bc
|
Added WARNCHK_CUDA_ALWAYS to errchk.h
|
2019-07-22 13:05:08 +03:00 |
|
jpekkila
|
a950be99f2
|
Streams now created with priority (all streams have the same priority by default)
|
2019-07-22 13:04:04 +03:00 |
|
jpekkila
|
168b3c4d8b
|
Peer access to neighboring GPUs is now enabled during initialization
|
2019-07-22 13:02:19 +03:00 |
|
jpekkila
|
0db61dd411
|
Disabled the project-wide maxrregcount flag by default since it is only beneficial for resource-heavy kernels. The maximum register count should be defined per kernel instead if needed.
|
2019-07-22 12:58:28 +03:00 |
|
jpekkila
|
78aba6428e
|
Updated the copyright years throughout the project
|
2019-07-16 14:28:32 +03:00 |
|
jpekkila
|
93fc121f5c
|
Introduced versions of the asynchronous functions which take a stream as a parameter
|
2019-07-10 15:49:21 +03:00 |
|
jpekkila
|
bd98eaf9f7
|
Added a stream to loadDeviceConstant call.
|
2019-07-10 15:29:54 +03:00 |
|
jpekkila
|
b08d5b26f5
|
cudaMemcpyToSymbol -> cudaMemcpyToSymbolAsync
|
2019-07-10 15:05:57 +03:00 |
|
jpekkila
|
976bf05c8d
|
Wrong scope for num_iterations in the last commit, fixed
|
2019-07-10 14:37:32 +03:00 |
|
jpekkila
|
866ec8a192
|
Removed some old hack I used for benchmarking a while back
|
2019-07-10 14:34:05 +03:00 |
|
jpekkila
|
d0b95c39b6
|
Disabled writing out unnecessary files when auto-optimizing the code
|
2019-07-09 18:51:04 +03:00 |
|
jpekkila
|
0bda016e17
|
Reviewed the Astaroth interface. There is now a clear distinction between synchronous and asynchronous functions. For basic usage, we provide a set of functions that are always safe to call (acIntegrate, acLoad, etc.), but which are therefore quite restricted: for example, the whole mesh must be loaded at once and computations cannot be executed concurrently on multiple GPUs. For more advanced users we provide asynchronous functions (such as acLoadWithOffset). Since we cannot know how the asynchronous functions are called (for example, when the integration step has been fully completed and the halos of neighboring subgrids can be safely communicated between GPUs), the responsibility for synchronization must be left to the user. The existing implementations currently use only the basic "safe" set of functions (except in renderer.cc), so the existing functionality has not been changed by these latest commits. Autotests also pass.
|
2019-07-09 18:42:00 +03:00 |
|
jpekkila
|
1251f61570
|
Removed a stray acBoundcondStep() in acStore where it definitely shouldn't be. Removed code duplication: acBoundcondStep now uses the new acLocalBoundcondStep and acGlobalBoundcondStep functions.
|
2019-07-09 17:08:18 +03:00 |
|
jpekkila
|
10a98b01a9
|
Experimental change: now the integration function is automatically optimized during acInit
|
2019-07-09 14:46:24 +03:00 |
|
jpekkila
|
a086821e7c
|
Added a function acAutoOptimize to the interface and removed rk3_step_async in kernels.cuh (moved into rkStep)
|
2019-07-09 14:21:22 +03:00 |
|
jpekkila
|
84d96de42b
|
Merge branch 'master' into multigpu_optimization_2019-07-05
|
2019-07-09 13:40:33 +03:00 |
|
jpekkila
|
508d15b578
|
Switched from math.h to cmath in math_utils.h. The old-school C math functions are bugged/not overloaded properly in GCC < 6.0 when compiling C++.
|
2019-07-09 13:37:08 +03:00 |
|
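The benefit of the `<cmath>` overloads referred to in the commit above can be illustrated with a small sketch. The helper `magnitude` and the function `cmath_example` are hypothetical and do not come from math_utils.h; the example only relies on the standard guarantee that `std::sqrt` is overloaded for `float`, `double` and `long double`.

```cpp
#include <cassert>
#include <cmath>
#include <type_traits>

// Hypothetical helper (not from the Astaroth sources): length of a 2D vector.
template <typename T>
T magnitude(const T x, const T y)
{
    // std::sqrt from <cmath> is overloaded per floating-point type,
    // so the computation stays in the precision of T.
    return std::sqrt(x * x + y * y);
}

// The float overload returns float instead of promoting to double.
static_assert(std::is_same<decltype(std::sqrt(1.0f)), float>::value,
              "std::sqrt(float) should return float");

void cmath_example()
{
    assert(magnitude(3.0f, 4.0f) == 5.0f); // single precision
    assert(magnitude(3.0, 4.0) == 5.0);    // double precision
}
```

Calling the `std::`-qualified functions explicitly avoids depending on which unqualified C functions a particular compiler version happens to expose in the global namespace.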
jpekkila
|
deebe570da
|
Merge branch 'master' into multigpu_optimization_2019-07-05
|
2019-07-08 16:11:24 +03:00 |
|
Miikka Vaisala
|
6ba15c3a7c
|
props.totalConstMem and props.sharedMemPerBlock cause an assembler error
while compiling on the TIARA gp cluster. Therefore commented out.
|
2019-07-08 11:00:12 +08:00 |
|
jpekkila
|
5fdfdeca9e
|
Multi-GPU optimizations: removed some unnecessary synchronization and divided the calculation of boundary conditions to local and global steps.
|
2019-07-05 18:21:44 +03:00 |
|
jpekkila
|
f1066a2c11
|
Added preliminary pragmas for dispatching commands simultaneously to multiple GPUs (commented out)
|
2019-07-05 17:16:12 +03:00 |
|
jpekkila
|
2092adc0f6
|
Preparations for multi-GPU optimizations
|
2019-07-05 15:44:30 +03:00 |
|
jpekkila
|
ce8fe53f91
|
Moved explanations and comments to the beginning of astaroth.cu. No code changes.
|
2019-07-05 15:39:52 +03:00 |
|
jpekkila
|
d87eb36f5a
|
Formatting: brackets around a for loop for consistency
|
2019-07-05 15:26:19 +03:00 |
|
jpekkila
|
224b91b83a
|
Added more control for synchronizing streams and halos among the GPUs
|
2019-07-05 15:17:20 +03:00 |
|
jpekkila
|
332f1a4f40
|
Reordered some of the functions in astaroth.cu and introduced acExchangeHalos() for synchronizing the part of the grid that is independent from the chosen boundary conditions between subgrids.
|
2019-07-05 15:01:51 +03:00 |
|
jpekkila
|
d1a93b7d4e
|
acIntegrateStepWithOffset corrected and confirmed to work on 1-4 GPUs
|
2019-07-04 16:58:24 +03:00 |
|
jpekkila
|
01437411b6
|
Comment
|
2019-07-04 16:39:20 +03:00 |
|
jpekkila
|
91f119e8dd
|
Deprecated the old implementation of acIntegrateStep. acIntegrateStep now calls acIntegrateStepWithOffset instead of device.cuh functions.
|
2019-07-04 16:37:55 +03:00 |
|
jpekkila
|
5049dadc1c
|
Implemented acIntegrateStepWithOffset
|
2019-07-04 16:31:16 +03:00 |
|
jpekkila
|
a53e0a170d
|
Overloaded max/min for int3 and removed old comments
|
2019-07-04 16:24:08 +03:00 |
|
jpekkila
|
e1d545b0eb
|
Code readability and cleanup (remembered that int3 has + and - operators defined in math_utils.h)
|
2019-07-04 16:16:49 +03:00 |
|
jpekkila
|
30254d9abb
|
Removed a redundant and old gridIdxx function which I thought I had already removed a long time ago.
|
2019-07-04 16:10:29 +03:00 |
|
jpekkila
|
b3a0b10a86
|
Removed old comments
|
2019-07-04 16:02:13 +03:00 |
|
jpekkila
|
0884c4bf38
|
Moved the definition of acForcingVec to host_forcing.cc since it depends on user parameters that may not be defined in all projects
|
2019-07-04 15:28:18 +03:00 |
|
jpekkila
|
7abb959828
|
Overhaul to the user-defined parameters done: All logical switches, parameters and vertex buffer handles are now defined in a single header file (the default location is acc/mhd_solver/stencil_defines.h). This header is used when preprocessing the DSL sources and is linked to the include/ directory when calling scripts/compile_acc.sh. astaroth.h is now used for configuring internal stuff only and should not be modified by users
|
2019-07-03 19:01:16 +03:00 |
|
jpekkila
|
6907d74ea3
|
Suppressed an unused variable warning for globalVertexIdx
|
2019-07-03 18:46:17 +03:00 |
|