Commit Graph

138 Commits

Author SHA1 Message Date
jpekkila
69deef66fe Added sum reduction. NOTE: Scalar sum does not pass the automated test but vector sum does. I couldn't see anything wrong with the code itself and I strongly suspect that the failures are caused by loss of precision due to summing a huge amount of numbers of different magnitudes. However I'm not yet completely sure. Something like the Kahan summation algorithm might be useful if the errors are really caused by fp arithmetic. 2019-07-30 14:28:18 +03:00
jpekkila
fdc1e7333c Added macros for getting int3 and AcReal3 device constants from within kernels (and DSL). 2019-07-30 09:10:06 +00:00
jpekkila
c9fafe41e5 Tidied the CMakeLists, moved stuff to more logical places and added comments. Also tested that ALTER_CONF=ON still works 2019-07-26 15:12:55 +03:00
jpekkila
818893a0ea Fixed stray comma in CUDA_ARCH_FLAGS 2019-07-26 14:10:17 +03:00
jpekkila
f322bc8b37 Rewrote all CMakeLists. Now much cleaner and there's a clear separation during compilation between the core and standalone modules. 2019-07-23 20:50:37 +03:00
jpekkila
b65454d523 Stashed some testing files used to make sure that the library can also be used from pure C projects (better compatibility). These changes will never go to master as-is. 2019-07-23 18:24:47 +03:00
jpekkila
323d4e3b31 Replaced all calls to AC_VTXBUF_IDX with acVertexBufferIdx etc. in all files 2019-07-23 14:37:28 +03:00
jpekkila
fee03b7149 Moved some device limits used only during auto-optimization from astaroth.h to device.cu 2019-07-22 19:54:46 +03:00
jpekkila
f74df5339f Cleaned up the include directory: removed all unnecessary stuff and moved common definitions to a separate file 2019-07-22 19:46:45 +03:00
jpekkila
01a013f3bc Added WARNCHK_CUDA_ALWAYS to errchk.h 2019-07-22 13:05:08 +03:00
jpekkila
a950be99f2 Streams now created with priority (all streams have the same priority by default) 2019-07-22 13:04:04 +03:00
jpekkila
168b3c4d8b Peer access to neighboring GPUs is now enabled during initialization 2019-07-22 13:02:19 +03:00
jpekkila
0db61dd411 Disabled the project-wide maxrregcount flag by default since it is only beneficial for resource-heavy kernels. The maximum register count should be defined per kernel instead if needed. 2019-07-22 12:58:28 +03:00
jpekkila
78aba6428e Updated the copyright years throughout the project 2019-07-16 14:28:32 +03:00
jpekkila
93fc121f5c Introduced versions of the asynchronous functions which take a stream as a parameter 2019-07-10 15:49:21 +03:00
jpekkila
bd98eaf9f7 Added a stream to loadDeviceConstant call. 2019-07-10 15:29:54 +03:00
jpekkila
b08d5b26f5 cudaMemcpyToSymbol -> cudaMemcpyToSymbolAsync 2019-07-10 15:05:57 +03:00
jpekkila
976bf05c8d Wrong scope for num_iterations in the last commit, fixed 2019-07-10 14:37:32 +03:00
jpekkila
866ec8a192 Removed some old hack I used for benchmarking a while back 2019-07-10 14:34:05 +03:00
jpekkila
d0b95c39b6 Disabled writing out unnecessary files when auto-optimizing the code 2019-07-09 18:51:04 +03:00
jpekkila
0bda016e17 Reviewed the Astaroth interface. Now there's a clear distinction between synchronous and asynchronous functions. For basic usage, we provide a set of functions that are always safe to call (acIntegrate, acLoad, etc.), but because of this, they must be quite restricted in the sense that, for example, the whole mesh must be loaded at once and computations cannot be executed concurrently on multiple GPUs. For more advanced users we provide asynchronous functions (such as acLoadWithOffset). Since we cannot know how the asynchronous functions are called (for example, when the integration step has been fully completed and the halos of neighboring subgrids can be safely communicated between GPUs), the responsibility of synchronization must be left to the user. In the existing implementations we currently use only the basic "safe" set of functions (except in renderer.cc), so the existing functionality has not been changed with these latest commits. Autotests also pass. 2019-07-09 18:42:00 +03:00
jpekkila
1251f61570 Removed a stray acBoundcondStep() in acStore where it definitely shouldn't be. Removed code duplication: acBoundcondStep now uses the new acLocalBoundcondStep and acGlobalBoundcondStep functions. 2019-07-09 17:08:18 +03:00
jpekkila
10a98b01a9 Experimental change: now the integration function is automatically optimized during acInit 2019-07-09 14:46:24 +03:00
jpekkila
a086821e7c Added a function acAutoOptimize to the interface and removed rk3_step_async in kernels.cuh (moved into rkStep) 2019-07-09 14:21:22 +03:00
jpekkila
84d96de42b Merge branch 'master' into multigpu_optimization_2019-07-05 2019-07-09 13:40:33 +03:00
jpekkila
508d15b578 Switched from math.h to cmath in math_utils.h. The old-school C math functions are bugged/not overloaded properly in GCC < 6.0 when compiling C++. 2019-07-09 13:37:08 +03:00
jpekkila
deebe570da Merge branch 'master' into multigpu_optimization_2019-07-05 2019-07-08 16:11:24 +03:00
Miikka Vaisala
6ba15c3a7c props.totalConstMem and props.sharedMemPerBlock cause assembler error while compiling on TIARA gp cluster. Therefore commented out. 2019-07-08 11:00:12 +08:00
jpekkila
5fdfdeca9e Multi-GPU optimizations: removed some unnecessary synchronization and divided the calculation of boundary conditions to local and global steps. 2019-07-05 18:21:44 +03:00
jpekkila
f1066a2c11 Added preliminary pragmas for dispatching commands simultaneously to multiple GPUs (commented out) 2019-07-05 17:16:12 +03:00
jpekkila
2092adc0f6 Preparations for multi-GPU optimizations 2019-07-05 15:44:30 +03:00
jpekkila
ce8fe53f91 Moved explanations and comments to the beginning of astaroth.cu. No code changes. 2019-07-05 15:39:52 +03:00
jpekkila
d87eb36f5a Formatting: brackets around a for loop for consistency 2019-07-05 15:26:19 +03:00
jpekkila
224b91b83a Added more control for synchronizing streams and halos among the GPUs 2019-07-05 15:17:20 +03:00
jpekkila
332f1a4f40 Reordered some of the functions in astaroth.cu and introduced acExchangeHalos() for synchronizing the part of the grid that is independent from the chosen boundary conditions between subgrids. 2019-07-05 15:01:51 +03:00
jpekkila
d1a93b7d4e acIntegrateStepWithOffset corrected and confirmed to work on 1-4 GPUs 2019-07-04 16:58:24 +03:00
jpekkila
01437411b6 Comment 2019-07-04 16:39:20 +03:00
jpekkila
91f119e8dd Deprecated the old implementation of acIntegrateStep. acIntegrateStep now calls acIntegrateStepWithOffset instead of device.cuh functions. 2019-07-04 16:37:55 +03:00
jpekkila
5049dadc1c Implemented acIntegrateStepWithOffset 2019-07-04 16:31:16 +03:00
jpekkila
a53e0a170d Overloaded max/min for int3 and removed old comments 2019-07-04 16:24:08 +03:00
jpekkila
e1d545b0eb Code readability and cleanup (remembered that int3 has + and - operators defined in math_utils.h) 2019-07-04 16:16:49 +03:00
jpekkila
30254d9abb Removed a redundant and old gridIdxx function which I thought I already removed a long time ago. 2019-07-04 16:10:29 +03:00
jpekkila
b3a0b10a86 Removed old comments 2019-07-04 16:02:13 +03:00
jpekkila
0884c4bf38 Moved the definition of acForcingVec to host_forcing.cc since it depends on user parameters that may not be defined in all projects 2019-07-04 15:28:18 +03:00
jpekkila
7abb959828 Overhaul to the user-defined parameters done: All logical switches, parameters and vertex buffer handles are now defined in a single header file (the default location is acc/mhd_solver/stencil_defines.h). This header is used when preprocessing the DSL sources and is linked to the include/ directory when calling scripts/compile_acc.sh. astaroth.h is now used for configuring internal stuff only and should not be modified by users 2019-07-03 19:01:16 +03:00
jpekkila
6907d74ea3 Suppressed an unused variable warning for globalVertexIdx 2019-07-03 18:46:17 +03:00
jpekkila
7d6255ba14 Suppressed unused variable warnings in kernels.cuh 2019-07-03 18:12:48 +03:00
jpekkila
81a09501b8 Removed deprecated LNT0 and LNRHO0 defines, now the actual configuration parameters are used (AC_lnrho0 and AC_lnT0). Also accidental autoformatting again: there seem to be stray spaces before linebreaks in some files which get automatically removed by my text editor 2019-07-03 17:23:37 +03:00
jpekkila
8ed947ce98 Removed deprecated sinusoidal forcing from kernels.cuh 2019-07-03 17:13:45 +03:00
jpekkila
d54ccc1da8 Deprecated a block of old code that was used a long time ago for testing forcing 2019-07-03 17:10:01 +03:00