jpekkila
|
bd98eaf9f7
|
Added a stream to loadDeviceConstant call.
|
2019-07-10 15:29:54 +03:00 |
|
jpekkila
|
b08d5b26f5
|
cudaMemcpyToSymbol -> cudaMemcpyToSymbolAsync
|
2019-07-10 15:05:57 +03:00 |
|
jpekkila
|
976bf05c8d
|
Wrong scope for num_iterations in the last commit, fixed
|
2019-07-10 14:37:32 +03:00 |
|
jpekkila
|
866ec8a192
|
Removed some old hack I used for benchmarking a while back
|
2019-07-10 14:34:05 +03:00 |
|
jpekkila
|
d0b95c39b6
|
Disabled writing out unnecessary files when auto-optimizing the code
|
2019-07-09 18:51:04 +03:00 |
|
jpekkila
|
0bda016e17
|
Reviewed the Astaroth interface. There is now a clear distinction between synchronous and asynchronous functions. For basic usage, we provide a set of functions that are always safe to call (acIntegrate, acLoad, etc.), but for that reason they must be quite restricted: e.g. the whole mesh must be loaded at once and computations cannot be executed concurrently on multiple GPUs. For more advanced users we provide asynchronous functions (such as acLoadWithOffset). Since we cannot know how the asynchronous functions are called (for example, when the integration step has been fully completed and the halos of neighboring subgrids can be safely communicated between GPUs), the responsibility for synchronization must be left to the user. The existing implementations currently use only the basic "safe" set of functions (except in renderer.cc), so the existing functionality has not been changed by these latest commits. Autotests also pass.
|
2019-07-09 18:42:00 +03:00 |
|
jpekkila
|
1251f61570
|
Removed a stray acBoundcondStep() in acStore where it definitely shouldn't be. Removed code duplication: acBoundcondStep now uses the new acLocalBoundcondStep and acGlobalBoundcondStep functions.
|
2019-07-09 17:08:18 +03:00 |
|
jpekkila
|
10a98b01a9
|
Experimental change: now the integration function is automatically optimized during acInit
|
2019-07-09 14:46:24 +03:00 |
|
jpekkila
|
a086821e7c
|
Added a function acAutoOptimize to the interface and removed rk3_step_async in kernels.cuh (moved into rkStep)
|
2019-07-09 14:21:22 +03:00 |
|
jpekkila
|
84d96de42b
|
Merge branch 'master' into multigpu_optimization_2019-07-05
|
2019-07-09 13:40:33 +03:00 |
|
jpekkila
|
508d15b578
|
Switched from math.h to cmath in math_utils.h. The old-school C math functions are bugged/not overloaded properly in GCC < 6.0 when compiling C++.
|
2019-07-09 13:37:08 +03:00 |
|
jpekkila
|
deebe570da
|
Merge branch 'master' into multigpu_optimization_2019-07-05
|
2019-07-08 16:11:24 +03:00 |
|
Miikka Vaisala
|
6ba15c3a7c
|
props.totalConstMem and props.sharedMemPerBlock cause an assembler error
while compiling on the TIARA gp cluster. Therefore commented out.
|
2019-07-08 11:00:12 +08:00 |
|
jpekkila
|
5fdfdeca9e
|
Multi-GPU optimizations: removed some unnecessary synchronization and divided the calculation of boundary conditions to local and global steps.
|
2019-07-05 18:21:44 +03:00 |
|
jpekkila
|
f1066a2c11
|
Added preliminary pragmas for dispatching commands simultaneously to multiple GPUs (commented out)
|
2019-07-05 17:16:12 +03:00 |
|
jpekkila
|
2092adc0f6
|
Preparations for multi-GPU optimizations
|
2019-07-05 15:44:30 +03:00 |
|
jpekkila
|
ce8fe53f91
|
Moved explanations and comments to the beginning of astaroth.cu. No code changes.
|
2019-07-05 15:39:52 +03:00 |
|
jpekkila
|
d87eb36f5a
|
Formatting: brackets around a for loop for consistency
|
2019-07-05 15:26:19 +03:00 |
|
jpekkila
|
224b91b83a
|
Added more control for synchronizing streams and halos among the GPUs
|
2019-07-05 15:17:20 +03:00 |
|
jpekkila
|
332f1a4f40
|
Reordered some of the functions in astaroth.cu and introduced acExchangeHalos() for synchronizing the part of the grid that is independent from the chosen boundary conditions between subgrids.
|
2019-07-05 15:01:51 +03:00 |
|
jpekkila
|
d1a93b7d4e
|
acIntegrateStepWithOffset corrected and confirmed to work on 1-4 GPUs
|
2019-07-04 16:58:24 +03:00 |
|
jpekkila
|
01437411b6
|
Comment
|
2019-07-04 16:39:20 +03:00 |
|
jpekkila
|
91f119e8dd
|
Deprecated the old implementation of acIntegrateStep. acIntegrateStep now calls acIntegrateStepWithOffset instead of device.cuh functions.
|
2019-07-04 16:37:55 +03:00 |
|
jpekkila
|
5049dadc1c
|
Implemented acIntegrateStepWithOffset
|
2019-07-04 16:31:16 +03:00 |
|
jpekkila
|
a53e0a170d
|
Overloaded max/min for int3 and removed old comments
|
2019-07-04 16:24:08 +03:00 |
|
jpekkila
|
e1d545b0eb
|
Code readability and cleanup (remembered that int3 has + and - operators defined in math_utils.h)
|
2019-07-04 16:16:49 +03:00 |
|
jpekkila
|
30254d9abb
|
Removed a redundant and old gridIdxx function which I thought I had already removed a long time ago.
|
2019-07-04 16:10:29 +03:00 |
|
jpekkila
|
b3a0b10a86
|
Removed old comments
|
2019-07-04 16:02:13 +03:00 |
|
jpekkila
|
0884c4bf38
|
Moved the definition of acForcingVec to host_forcing.cc since it depends on user parameters that may not be defined in all projects
|
2019-07-04 15:28:18 +03:00 |
|
jpekkila
|
7abb959828
|
Overhaul to the user-defined parameters done: All logical switches, parameters and vertex buffer handles are now defined in a single header file (the default location is acc/mhd_solver/stencil_defines.h). This header is used when preprocessing the DSL sources and is linked to the include/ directory when calling scripts/compile_acc.sh. astaroth.h is now used for configuring internal stuff only and should not be modified by users
|
2019-07-03 19:01:16 +03:00 |
|
jpekkila
|
6907d74ea3
|
Suppressed an unused variable warning for globalVertexIdx
|
2019-07-03 18:46:17 +03:00 |
|
jpekkila
|
7d6255ba14
|
Suppressed unused variable warnings in kernels.cuh
|
2019-07-03 18:12:48 +03:00 |
|
jpekkila
|
81a09501b8
|
Removed deprecated LNT0 and LNRHO0 defines; now the actual configuration parameters are used (AC_lnrho0 and AC_lnT0). Also accidental autoformatting again: there seem to be stray spaces before linebreaks in some files, which get automatically removed by my text editor
|
2019-07-03 17:23:37 +03:00 |
|
jpekkila
|
8ed947ce98
|
Removed deprecated sinusoidal forcing from kernels.cuh
|
2019-07-03 17:13:45 +03:00 |
|
jpekkila
|
d54ccc1da8
|
Deprecated a block of old code that was used a long time ago for testing forcing
|
2019-07-03 17:10:01 +03:00 |
|
jpekkila
|
08e9a32cb1
|
Added a comment about acForcingVec
|
2019-07-03 16:37:16 +03:00 |
|
jpekkila
|
d4d2680f40
|
Added a new generic function to the interface (astaroth.h) for loading arbitrary device constants. Also (unintended) autoformatting.
|
2019-07-03 16:19:25 +03:00 |
|
Miikka Vaisala
|
03689709df
|
Merge branch 'master' into forcing
|
2019-07-02 16:43:10 +08:00 |
|
jpekkila
|
a3ca6cf132
|
Added skeletons for packing parts of the ghost zones into buffers to speed up data transfers
|
2019-07-01 13:56:05 +03:00 |
|
Miikka Vaisala
|
9f0be0d9ff
|
Solved the forcing function boundary problem.
|
2019-07-01 11:06:42 +08:00 |
|
jpekkila
|
0c63d55fd7
|
Worked around a compiler bug in CUDA 9.1, which caused the error 'Internal Compiler Error (codegen): "there was an error in verifying the lgenfe output!"'. Apparently the compiler got confused by the overloaded is_valid() if the input parameter was not passed as a reference in both cases.
|
2019-06-29 10:49:15 +03:00 |
|
jpekkila
|
7e40889245
|
Grid and subgrid dimensions are now only printed if VERBOSE_PRINTING == 1
|
2019-06-27 12:54:36 +03:00 |
|
Miikka Vaisala
|
d30b866a21
|
Merge branch 'master' into forcing
Now I need to test what works...
Conflicts:
acc/mhd_solver/stencil_process.sps
|
2019-06-27 11:22:31 +08:00 |
|
jpekkila
|
401172bb74
|
Formatting
|
2019-06-26 19:43:37 +03:00 |
|
jpekkila
|
ee075e6741
|
Set the default number of devices to 0 (this is updated at acInit())
|
2019-06-26 19:42:49 +03:00 |
|
jpekkila
|
cda17c9b08
|
VERBOSE_PRINTING flag is now globally used in the whole program and should be used to suppress development/debugging-related printing. Also added comments to the new interface function acCheckDeviceAvailability and made it free from side effects.
|
2019-06-26 18:50:15 +03:00 |
|
Matthias Rheinhardt
|
0bc8b7e827
|
MR: VTXBUF_DENSITY -> VTXBUF_LNRHO, minor
|
2019-06-26 17:14:24 +03:00 |
|
Matthias Rheinhardt
|
522da0041f
|
MR: new name for GetDevice
|
2019-06-26 16:53:56 +03:00 |
|
jpekkila
|
6bfc5f04f7
|
Added tighter bounds for gcc and nvcc versions. There was a bit of a chicken-and-egg issue: we need gcc 6.0 in order to get bug 48891 (see gcc bugzilla) fixed, but CUDA < 9 supports gcc only up to 5.3. This is not a perfect solution: e.g. Ubuntu 16.04 ships with gcc 5.4 with the fix backported from later versions, so in practice that would also work, but it is not accepted anymore.
|
2019-06-26 13:33:03 +03:00 |
|
Miikka Vaisala
|
be0e46c814
|
Can now move forcing vector information from the host to the device.
The next step is to generate random waves on the CPU with a chosen degree of helicity etc.
|
2019-06-26 17:41:39 +08:00 |
|