jpekkila
6b53eb31ef
Errors with forcing now down from 3 to 1 after switching from fast & inaccurate trig functions to more accurate ones
2019-08-06 19:29:40 +03:00
jpekkila
d7e26e8f21
Added forcing from stencil_process.sps to autotests. 3 Tests fail.
2019-08-06 19:15:28 +03:00
jpekkila
0e0ace3970
Pure hydro now works with autotests
2019-08-06 18:07:29 +03:00
jpekkila
5870081645
Split kernels.cuh into bounconds.cuh, integration.cuh and reductions.cuh
2019-08-06 17:50:41 +03:00
jpekkila
405fa4d6d6
Moved old kernels to kernels/deprecated
2019-08-06 17:46:52 +03:00
jpekkila
e4d9898f35
Added improvements to autotest.cc
2019-08-06 17:40:27 +03:00
jpekkila
f3de2fa03c
Made globalVertexIdx available during preprocessing. NOTE: potentially dangerous. globalVertexIdx should never be used for reading data from the vertex buffers.
2019-08-05 15:03:02 +03:00
jpekkila
62100b1140
Merge branch 'master' of https://bitbucket.org/jpekkila/astaroth
2019-07-30 14:28:25 +03:00
jpekkila
69deef66fe
Added sum reduction. NOTE: Scalar sum does not pass the automated test but vector sum does. I couldn't see anything wrong with the code itself and I strongly suspect that the failures are caused by loss of precision due to summing a huge amount of numbers of different magnitudes. However I'm not yet completely sure. Something like the Kahan summation algorithm might be useful if the errors are really caused by fp arithmetic.
2019-07-30 14:28:18 +03:00
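The Kahan summation mentioned in the commit message above can be sketched as follows. This is a minimal host-side C++ illustration, not the project's reduction kernel: the names `naive_sum`, `kahan_sum` and `relative_error` are hypothetical, and the sketch assumes the compiler does not reassociate floating-point operations (for example, no `-ffast-math`), since that would optimize the compensation term away.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Plain left-to-right accumulation. Once the running sum is much larger
// than the addends, low-order bits of each addend are rounded away.
float naive_sum(const std::vector<float>& values)
{
    float sum = 0.0f;
    for (float v : values)
        sum += v;
    return sum;
}

// Kahan (compensated) summation: carries a running estimate of the
// rounding error and feeds it back into the next addition.
float kahan_sum(const std::vector<float>& values)
{
    float sum          = 0.0f;
    float compensation = 0.0f; // low-order bits lost so far
    for (float v : values) {
        const float y = v - compensation; // corrected addend
        const float t = sum + y;          // low-order bits of y may be lost here
        compensation  = (t - sum) - y;    // recover what was lost
        sum           = t;
    }
    return sum;
}

// Helper for comparing a result against a higher-precision reference.
double relative_error(double approx, double exact)
{
    assert(exact != 0.0);
    return std::fabs(approx - exact) / std::fabs(exact);
}
```

Comparing both functions against a double-precision reference on the same input is one way to check whether a failing test is caused by accumulated rounding error rather than by a logic error.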
jpekkila
fdc1e7333c
Added macros for getting int3 and AcReal3 device constants from within kernels (and DSL).
2019-07-30 09:10:06 +00:00
jpekkila
323d4e3b31
Replaced all calls to AC_VTXBUF_IDX with acVertexBufferIdx etc. in all files
2019-07-23 14:37:28 +03:00
jpekkila
fee03b7149
Moved some device limits used only during auto-optimization from astaroth.h to device.cu
2019-07-22 19:54:46 +03:00
jpekkila
85883dbc38
NUM_INT_PARAM_TYPES is now NUM_INT_PARAMS etc, replaced these throughout the project
2019-07-22 19:53:45 +03:00
jpekkila
074eae0bae
Added definitions of AC_GEN_STR and AC_GEN_ID to host_memory.h and .cc since they are no longer available from astaroth.h
2019-07-22 19:49:29 +03:00
jpekkila
f74df5339f
Cleaned up the include directory: removed all unnecessary stuff and moved common definitions to a separate file
2019-07-22 19:46:45 +03:00
jpekkila
84af939e5d
The default benchmark is now more suitable for timing multi-GPU performance
2019-07-22 13:08:33 +03:00
jpekkila
01a013f3bc
Added WARNCHK_CUDA_ALWAYS to errchk.h
2019-07-22 13:05:08 +03:00
jpekkila
a950be99f2
Streams now created with priority (all streams have the same priority by default)
2019-07-22 13:04:04 +03:00
jpekkila
168b3c4d8b
Peer access to neighboring GPUs is now enabled during initialization
2019-07-22 13:02:19 +03:00
jpekkila
0db61dd411
Disabled the project-wide maxrregcount flag by default since it is only beneficial for resource-heavy kernels. The maximum register count should be defined per kernel instead if needed.
2019-07-22 12:58:28 +03:00
Miikka Vaisala
074fb26df9
Added TODO_SINK comments.
The comments were written to map out what essential parts are needed for
resolving a system with gravitating sink particles. No changes to the code
itself.
2019-07-17 14:05:48 +08:00
jpekkila
78aba6428e
Updated the copyright years throughout the project
2019-07-16 14:28:32 +03:00
jpekkila
93fc121f5c
Introduced versions of the asynchronous functions which take a stream as a parameter
2019-07-10 15:49:21 +03:00
jpekkila
bd98eaf9f7
Added a stream to loadDeviceConstant call.
2019-07-10 15:29:54 +03:00
jpekkila
b08d5b26f5
cudaMemcpyToSymbol -> cudaMemcpyToSymbolAsync
2019-07-10 15:05:57 +03:00
jpekkila
976bf05c8d
Wrong scope for num_iterations in the last commit, fixed
2019-07-10 14:37:32 +03:00
jpekkila
866ec8a192
Removed some old hack I used for benchmarking a while back
2019-07-10 14:34:05 +03:00
jpekkila
e14e19774d
Added a synchronization to benchmark.cc that is now required when calling acIntegrateStep
2019-07-09 19:03:45 +03:00
jpekkila
8cc9281045
Double versions of some sqrt, cos and sin were used in model_rk3.cc instead of the long double versions, fixed.
2019-07-09 19:03:15 +03:00
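The kind of precision slip described in the commit above can be illustrated with a short sketch. It is not taken from model_rk3.cc: `ModelScalar`, `sqrt_via_double` and `sqrt_full_width` are hypothetical names, and the sketch assumes only that the `std::` overloads from `<cmath>` are used. On platforms where `long double` has the same width as `double`, both variants give identical results.

```cpp
#include <cassert>
#include <cmath>
#include <type_traits>

// Hypothetical alias for the scalar type of a high-precision reference model.
using ModelScalar = long double;

// The std:: overloads keep the argument's type, so no precision is dropped.
static_assert(std::is_same<decltype(std::sqrt(ModelScalar(2))), long double>::value,
              "std::sqrt(long double) should return long double");
static_assert(std::is_same<decltype(std::cos(ModelScalar(2))), long double>::value,
              "std::cos(long double) should return long double");
static_assert(std::is_same<decltype(std::sin(ModelScalar(2))), long double>::value,
              "std::sin(long double) should return long double");

// Narrowing variant: the cast selects the double overload, so any extra
// precision carried by x is discarded before the square root is taken.
ModelScalar sqrt_via_double(ModelScalar x)
{
    assert(x >= 0.0L);
    return std::sqrt(static_cast<double>(x));
}

// Full-width variant: the long double overload is selected.
ModelScalar sqrt_full_width(ModelScalar x)
{
    assert(x >= 0.0L);
    return std::sqrt(x);
}
```

Squaring each result and comparing it with the input shows how much accuracy the narrowing variant gives up on a given platform.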
jpekkila
e6c770cbee
Added a synchronization after acLoadDeviceConstant since it is now stated to be asynchronous
2019-07-09 19:00:08 +03:00
jpekkila
d0b95c39b6
Disabled writing out unnecessary files when auto-optimizing the code
2019-07-09 18:51:04 +03:00
jpekkila
0bda016e17
Reviewed the Astaroth interface. Now there's a clear distinction between synchronous and asynchronous functions. For basic usage, we provide a set of functions that are always safe to call (acIntegrate, acLoad, etc.), but because of this they must be quite restricted: for example, the whole mesh must be loaded at once and computations cannot be executed concurrently on multiple GPUs. For more advanced users we provide asynchronous functions (such as acLoadWithOffset). Since we cannot know how the asynchronous functions are called (for example, when the integration step has been fully completed and the halos of neighboring subgrids can be safely communicated between GPUs), the responsibility of synchronization must be left to the user. In the existing implementations we currently use only the basic "safe" set of functions (except in renderer.cc), so the existing functionality has not been changed with these latest commits. Autotests also pass.
2019-07-09 18:42:00 +03:00
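The split between "safe" synchronous calls and asynchronous calls that leave synchronization to the caller can be sketched with a toy model. Everything here is hypothetical and is not Astaroth's API: `ToyDevice` stands in for a GPU, and queued work is simply deferred until `synchronize()` is called, which makes the ordering rule visible without any real concurrency.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical toy model, not Astaroth's API: queued work only becomes
// visible once the caller synchronizes.
class ToyDevice {
public:
    explicit ToyDevice(std::size_t size) : mesh_(size, 0.0) {}

    // Asynchronous flavor: records the copy and returns immediately.
    // The caller must keep `src` alive and call synchronize() before
    // reading the mesh.
    void load_with_offset_async(const std::vector<double>& src, std::size_t offset)
    {
        assert(offset + src.size() <= mesh_.size());
        pending_.push_back([this, &src, offset] {
            for (std::size_t i = 0; i < src.size(); ++i)
                mesh_[offset + i] = src[i];
        });
    }

    // Runs all queued work in submission order.
    void synchronize()
    {
        for (auto& task : pending_)
            task();
        pending_.clear();
    }

    // Synchronous flavor: takes the whole mesh at once and is complete
    // when it returns, so it is always safe to call.
    void load(const std::vector<double>& src)
    {
        assert(src.size() == mesh_.size());
        load_with_offset_async(src, 0);
        synchronize();
    }

    const std::vector<double>& mesh() const { return mesh_; }

private:
    std::vector<double> mesh_;
    std::vector<std::function<void()>> pending_;
};
```

In this model, reading `mesh()` between `load_with_offset_async` and `synchronize` returns stale data, which mirrors why the asynchronous interface puts the burden of synchronization on the user.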
jpekkila
1251f61570
Removed a stray acBoundcondStep() in acStore where it definitely shouldn't be. Removed code duplication: acBoundcondStep now uses the new acLocalBoundcondStep and acGlobalBoundcondStep functions.
2019-07-09 17:08:18 +03:00
jpekkila
10a98b01a9
Experimental change: now the integration function is automatically optimized during acInit
2019-07-09 14:46:24 +03:00
jpekkila
a086821e7c
Added a function acAutoOptimize to the interface and removed rk3_step_async in kernels.cuh (moved into rkStep)
2019-07-09 14:21:22 +03:00
jpekkila
84d96de42b
Merge branch 'master' into multigpu_optimization_2019-07-05
2019-07-09 13:40:33 +03:00
jpekkila
508d15b578
Switched from math.h to cmath in math_utils.h. The old-school C math functions are bugged/not overloaded properly in GCC < 6.0 when compiling C++.
2019-07-09 13:37:08 +03:00
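As a rough illustration of why the `std::` functions from `<cmath>` are the safer choice in C++, the sketch below checks that they are overloaded per floating-point type. It does not reproduce the specific GCC problem named in the commit, and `norm2` is a hypothetical helper rather than anything from math_utils.h.

```cpp
#include <cassert>
#include <cmath>
#include <type_traits>

// <cmath> provides std:: overloads for each floating-point type, so a
// float argument stays float instead of being routed through double
// (or, for abs, through the integer version from the C library).
static_assert(std::is_same<decltype(std::abs(-0.5f)), float>::value,
              "std::abs(float) should return float");
static_assert(std::is_same<decltype(std::abs(-0.5)), double>::value,
              "std::abs(double) should return double");
static_assert(std::is_same<decltype(std::sqrt(2.0f)), float>::value,
              "std::sqrt(float) should return float");

// Generic helper that works for float, double and long double because
// overload resolution picks the matching std::sqrt.
template <typename Real>
Real norm2(Real x, Real y)
{
    static_assert(std::is_floating_point<Real>::value,
                  "norm2 expects a floating-point type");
    assert(std::isfinite(x) && std::isfinite(y));
    return std::sqrt(x * x + y * y);
}
```

Calling `norm2(3.0f, 4.0f)` yields a `float`, while `norm2(3.0, 4.0)` yields a `double`, with no casts at the call site.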
jpekkila
deebe570da
Merge branch 'master' into multigpu_optimization_2019-07-05
2019-07-08 16:11:24 +03:00
jpekkila
eda2f6543b
Created a new ForcingParams structure and some functions for generating and transferring the forcing parameters to the host/device
2019-07-08 15:43:37 +03:00
Miikka Vaisala
f9be905703
Corrected a unit conversion issue from forcing.
Now noticing these because of switching to gcc 8.
2019-07-08 16:43:37 +08:00
Miikka Vaisala
6ba15c3a7c
props.totalConstMem and props.sharedMemPerBlock cause an assembler error
while compiling on the TIARA gp cluster. Therefore commented out.
2019-07-08 11:00:12 +08:00
jpekkila
5fdfdeca9e
Multi-GPU optimizations: removed some unnecessary synchronization and divided the calculation of boundary conditions to local and global steps.
2019-07-05 18:21:44 +03:00
jpekkila
f1066a2c11
Added preliminary pragmas for dispatching commands simultaneously to multiple GPUs (commented out)
2019-07-05 17:16:12 +03:00
jpekkila
2092adc0f6
Preparations for multi-GPU optimizations
2019-07-05 15:44:30 +03:00
jpekkila
ce8fe53f91
Moved explanations and comments to the beginning of astaroth.cu. No code changes.
2019-07-05 15:39:52 +03:00
jpekkila
d87eb36f5a
Formatting: brackets around a for loop for consistency
2019-07-05 15:26:19 +03:00
jpekkila
224b91b83a
Added more control for synchronizing streams and halos among the GPUs
2019-07-05 15:17:20 +03:00
jpekkila
332f1a4f40
Reordered some of the functions in astaroth.cu and introduced acExchangeHalos() for synchronizing the part of the grid that is independent from the chosen boundary conditions between subgrids.
2019-07-05 15:01:51 +03:00
jpekkila
c71711ec36
Disabled real-time visualization by default. SDL2 is no longer a dependency when building with the default flags.
2019-07-04 22:30:26 +03:00
jpekkila
ad7a497eef
Added a comment about timestepping and autoformat
2019-07-04 17:25:54 +03:00