jpekkila
|
5232d987c1
|
Added acStoreWithOffset to the revised interface
|
2019-08-05 16:18:22 +03:00 |
|
jpekkila
|
567ad61465
|
Multinode MPI implementation should be done later in its own branch. The focus of this branch is to revise the node and device layers. Commented out references to the Grid layer.
|
2019-08-02 13:54:54 +03:00 |
|
jpekkila
|
2b6bf10ae6
|
Dummy implementation of the Grid interface
|
2019-08-01 18:37:36 +03:00 |
|
jpekkila
|
5be775dbff
|
Various intermediate changes
|
2019-07-31 17:48:48 +03:00 |
|
jpekkila
|
efd9d54fef
|
Stashing WIP changes (interface revision) s.t. I can continue work on a different machine
|
2019-07-30 14:34:44 +03:00 |
|
jpekkila
|
1ceb6739ae
|
Merge branch 'master' into node_device_interface_revision_07-23
|
2019-07-30 14:31:33 +03:00 |
|
jpekkila
|
69deef66fe
|
Added sum reduction. NOTE: Scalar sum does not pass the automated test but vector sum does. I couldn't see anything wrong with the code itself, and I strongly suspect that the failures are caused by loss of precision from summing a huge number of values of different magnitudes. However, I'm not yet completely sure. Something like the Kahan summation algorithm might be useful if the errors really are caused by fp arithmetic.
|
2019-07-30 14:28:18 +03:00 |
|
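The precision-loss hypothesis in the sum-reduction commit can be illustrated with a small host-side sketch. This is not Astaroth code: `naive_sum`, `kahan_sum` and `demo_precision_loss` are hypothetical names, and the input is chosen so that every small term is below half an ulp of the running total. The naive single-precision loop then never changes, while Kahan's compensated summation carries the lost low-order bits forward. Compensated summation depends on strict IEEE evaluation order, so it must not be compiled with -ffast-math or other reassociating flags.

```cpp
#include <cassert>
#include <vector>

// Naive left-to-right summation in single precision. Every addition rounds the
// running total, so a term smaller than half an ulp of the total is lost.
float naive_sum(const std::vector<float>& values)
{
    float sum = 0.0f;
    for (float v : values)
        sum += v;
    return sum;
}

// Kahan (compensated) summation: `c` holds the low-order bits that were rounded
// away in the previous step and feeds them back into the next term.
float kahan_sum(const std::vector<float>& values)
{
    float sum = 0.0f;
    float c   = 0.0f; // running compensation
    for (float v : values) {
        const float y = v - c;   // apply the correction from the previous step
        const float t = sum + y; // low-order bits of y may be lost here
        c   = (t - sum) - y;     // recover them (algebraically zero, not in fp)
        sum = t;
    }
    return sum;
}

// 1.0f followed by one million copies of 1e-8f; the exact sum is about 1.01.
// Each 1e-8f is below half an ulp of 1.0f (about 6e-8), so the naive loop
// never moves away from 1.0f.
void demo_precision_loss()
{
    std::vector<float> v(1000001, 1e-8f);
    v[0] = 1.0f;
    assert(naive_sum(v) == 1.0f);
    const float k = kahan_sum(v);
    assert(k > 1.0099f && k < 1.0101f);
}
```

Whether the failing scalar test is really caused by this effect is not established by the commit; the sketch only demonstrates the mechanism the message suspects.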
jpekkila
|
b65454d523
|
Stashed some testing files used to make sure that the library can also be used from pure C projects (better compatibility). These changes will never go to master as-is.
|
2019-07-23 18:24:47 +03:00 |
|
jpekkila
|
323d4e3b31
|
Replaced all calls to AC_VTXBUF_IDX with acVertexBufferIdx, etc., in all files
|
2019-07-23 14:37:28 +03:00 |
|
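Replacing an indexing macro with a function is a common cleanup: the function is type-checked and evaluates each argument once. A minimal sketch, assuming a layout in which the x index varies fastest; `MeshDims` and `vertex_buffer_idx` are hypothetical stand-ins, and the actual signature of acVertexBufferIdx may differ.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical mesh dimensions (including ghost zones); names are illustrative.
struct MeshDims {
    int mx, my, mz;
};

// Maps a 3D cell index to a linear offset in a vertex buffer, assuming the
// x index varies fastest. Unlike a macro, arguments such as i + 1 need no
// extra parentheses, and out-of-range indices are caught in debug builds.
inline std::size_t vertex_buffer_idx(int i, int j, int k, const MeshDims& d)
{
    assert(i >= 0 && i < d.mx && j >= 0 && j < d.my && k >= 0 && k < d.mz);
    return static_cast<std::size_t>(i) +
           static_cast<std::size_t>(j) * d.mx +
           static_cast<std::size_t>(k) * d.mx * d.my;
}
```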
jpekkila
|
f74df5339f
|
Cleaned up the include directory: removed all unnecessary stuff and moved common definitions to a separate file
|
2019-07-22 19:46:45 +03:00 |
|
jpekkila
|
168b3c4d8b
|
Peer access to neighboring GPUs is now enabled during initialization
|
2019-07-22 13:02:19 +03:00 |
|
jpekkila
|
78aba6428e
|
Updated the copyright years throughout the project
|
2019-07-16 14:28:32 +03:00 |
|
jpekkila
|
93fc121f5c
|
Introduced versions of the asynchronous functions which take a stream as a parameter
|
2019-07-10 15:49:21 +03:00 |
|
jpekkila
|
bd98eaf9f7
|
Added a stream to loadDeviceConstant call.
|
2019-07-10 15:29:54 +03:00 |
|
jpekkila
|
0bda016e17
|
Reviewed the Astaroth interface. Now there's a clear distinction between synchronous and asynchronous functions. For basic usage, we provide a set of functions that are always safe to call (acIntegrate, acLoad, etc.) but which, because of this, must be quite restricted: for example, the whole mesh must be loaded at once and computations cannot be executed concurrently on multiple GPUs. For more advanced users, we provide asynchronous functions (such as acLoadWithOffset). Since we cannot know how the asynchronous functions are called (for example, when the integration step has been fully completed and the halos of neighboring subgrids can be safely communicated between GPUs), the responsibility for synchronization must be left to the user. The existing implementations currently use only the basic "safe" set of functions (except in renderer.cc), so the existing functionality has not been changed by these latest commits. Autotests also pass.
|
2019-07-09 18:42:00 +03:00 |
|
jpekkila
|
1251f61570
|
Removed a stray acBoundcondStep() in acStore where it definitely shouldn't be. Removed code duplication: acBoundcondStep now uses the new acLocalBoundcondStep and acGlobalBoundcondStep functions.
|
2019-07-09 17:08:18 +03:00 |
|
jpekkila
|
a086821e7c
|
Added a function acAutoOptimize to the interface and removed rk3_step_async in kernels.cuh (moved into rkStep)
|
2019-07-09 14:21:22 +03:00 |
|
jpekkila
|
5fdfdeca9e
|
Multi-GPU optimizations: removed some unnecessary synchronization and divided the calculation of boundary conditions to local and global steps.
|
2019-07-05 18:21:44 +03:00 |
|
jpekkila
|
f1066a2c11
|
Added preliminary pragmas for dispatching commands simultaneously to multiple GPUs (commented out)
|
2019-07-05 17:16:12 +03:00 |
|
jpekkila
|
2092adc0f6
|
Preparations for multi-GPU optimizations
|
2019-07-05 15:44:30 +03:00 |
|
jpekkila
|
ce8fe53f91
|
Moved explanations and comments to the beginning of astaroth.cu. No code changes.
|
2019-07-05 15:39:52 +03:00 |
|
jpekkila
|
224b91b83a
|
Added more control for synchronizing streams and halos among the GPUs
|
2019-07-05 15:17:20 +03:00 |
|
jpekkila
|
332f1a4f40
|
Reordered some of the functions in astaroth.cu and introduced acExchangeHalos() for synchronizing, between subgrids, the part of the grid that is independent of the chosen boundary conditions.
|
2019-07-05 15:01:51 +03:00 |
|
jpekkila
|
d1a93b7d4e
|
acIntegrateStepWithOffset corrected and confirmed to work on 1-4 GPUs
|
2019-07-04 16:58:24 +03:00 |
|
jpekkila
|
01437411b6
|
Comment
|
2019-07-04 16:39:20 +03:00 |
|
jpekkila
|
91f119e8dd
|
Deprecated the old implementation of acIntegrateStep. acIntegrateStep now calls acIntegrateStepWithOffset instead of device.cuh functions.
|
2019-07-04 16:37:55 +03:00 |
|
jpekkila
|
5049dadc1c
|
Implemented acIntegrateStepWithOffset
|
2019-07-04 16:31:16 +03:00 |
|
jpekkila
|
a53e0a170d
|
Overloaded max/min for int3 and removed old comments
|
2019-07-04 16:24:08 +03:00 |
|
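Componentwise max/min on int3 is convenient for clamping index ranges. A minimal sketch in plain C++, assuming a stand-in `int3` struct (CUDA's own int3 comes from vector_types.h) and a hypothetical `clamp` helper; it is not the overload set from math_utils.h.

```cpp
#include <cassert>

// Stand-in for CUDA's built-in int3 so that the sketch compiles without the
// CUDA toolkit.
struct int3 {
    int x, y, z;
};

// Componentwise maximum.
static inline int3 max(const int3& a, const int3& b)
{
    return int3{a.x > b.x ? a.x : b.x,
                a.y > b.y ? a.y : b.y,
                a.z > b.z ? a.z : b.z};
}

// Componentwise minimum.
static inline int3 min(const int3& a, const int3& b)
{
    return int3{a.x < b.x ? a.x : b.x,
                a.y < b.y ? a.y : b.y,
                a.z < b.z ? a.z : b.z};
}

// Hypothetical helper: clamps each component of v into [lo, hi].
static inline int3 clamp(const int3& v, const int3& lo, const int3& hi)
{
    assert(lo.x <= hi.x && lo.y <= hi.y && lo.z <= hi.z);
    return max(lo, min(v, hi));
}
```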
jpekkila
|
e1d545b0eb
|
Code readability and cleanup (remembered that int3 has + and - operators defined in math_utils.h)
|
2019-07-04 16:16:49 +03:00 |
|
jpekkila
|
30254d9abb
|
Removed a redundant and old gridIdxx function which I thought I had already removed a long time ago.
|
2019-07-04 16:10:29 +03:00 |
|
jpekkila
|
0884c4bf38
|
Moved the definition of acForcingVec to host_forcing.cc since it depends on user parameters that may not be defined in all projects
|
2019-07-04 15:28:18 +03:00 |
|
jpekkila
|
7abb959828
|
Overhaul to the user-defined parameters done: All logical switches, parameters and vertex buffer handles are now defined in a single header file (the default location is acc/mhd_solver/stencil_defines.h). This header is used when preprocessing the DSL sources and is linked to the include/ directory when calling scripts/compile_acc.sh. astaroth.h is now used for configuring internal stuff only and should not be modified by users
|
2019-07-03 19:01:16 +03:00 |
|
jpekkila
|
08e9a32cb1
|
Added a comment about acForcingVec
|
2019-07-03 16:37:16 +03:00 |
|
jpekkila
|
d4d2680f40
|
Added a new generic function to the interface (astaroth.h) for loading arbitrary device constants. Also (unintended) autoformatting.
|
2019-07-03 16:19:25 +03:00 |
|
Miikka Vaisala
|
03689709df
|
Merge branch 'master' into forcing
|
2019-07-02 16:43:10 +08:00 |
|
Miikka Vaisala
|
9f0be0d9ff
|
Solved the forcing function boundary problem.
|
2019-07-01 11:06:42 +08:00 |
|
jpekkila
|
7e40889245
|
Grid and subgrid dimensions are now only printed if VERBOSE_PRINTING == 1
|
2019-06-27 12:54:36 +03:00 |
|
Miikka Vaisala
|
d30b866a21
|
Merge branch 'master' into forcing
Now I need to test what works...
Conflicts:
acc/mhd_solver/stencil_process.sps
|
2019-06-27 11:22:31 +08:00 |
|
jpekkila
|
401172bb74
|
Formatting
|
2019-06-26 19:43:37 +03:00 |
|
jpekkila
|
ee075e6741
|
Set the default number of devices to 0 (this is updated in acInit())
|
2019-06-26 19:42:49 +03:00 |
|
jpekkila
|
cda17c9b08
|
VERBOSE_PRINTING flag is now globally used in the whole program and should be used to suppress development/debugging-related printing. Also added comments to the new interface function acCheckDeviceAvailability and made it free from side effects.
|
2019-06-26 18:50:15 +03:00 |
|
Matthias Rheinhardt
|
0bc8b7e827
|
MR: VTXBUF_DENSITY -> VTXBUF_LNRHO, minor
|
2019-06-26 17:14:24 +03:00 |
|
Matthias Rheinhardt
|
522da0041f
|
MR: new name for GetDevice
|
2019-06-26 16:53:56 +03:00 |
|
Miikka Vaisala
|
be0e46c814
|
Can now move forcing vector information from the host to the device.
Next step is to generate random waves on the CPU with a chosen degree of helicity, etc.
|
2019-06-26 17:41:39 +08:00 |
|
Miikka Vaisala
|
231a8aa06e
|
Trying to figure out how to upload values to GPU.
|
2019-06-26 15:23:46 +08:00 |
|
jpekkila
|
2310186c71
|
Added a skeleton function for updating an arbitrary block inside the computational domain instead of the whole mesh
|
2019-06-19 19:43:46 +03:00 |
|
jpekkila
|
2eacb98246
|
Now acBoundcondStep is applied after acIntegrate to ensure that the whole grid visible to the host, including boundaries, is always up to date
|
2019-06-19 14:29:07 +03:00 |
|
jpekkila
|
8864266042
|
Autoformatted all CUDA/C/C++ code
|
2019-06-18 16:42:56 +03:00 |
|
jpekkila
|
4ca4dbefdf
|
Added the machinery for implementing forcing with the DSL on multiple GPUs and a simple model solution
|
2019-06-18 16:13:32 +03:00 |
|
jpekkila
|
59086b3e79
|
Added multi-GPU reductions. Tested to work with 1-2 GPUs with power-of-two grid dimensions. Requires more testing in special cases (exotic grid dimensions and a large number of GPUs).
|
2019-06-17 14:45:41 +03:00 |
|
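A hedged sketch of how per-device partial results might be combined on the host. `PartialReduction`, `combine` and `rms` are hypothetical, and using max, min and root-mean-square as example reduction types is an assumption. The point it illustrates: max and min combine directly, but an RMS must be rebuilt from the total sum of squares and the total cell count rather than by averaging per-device RMS values, which only agrees when every subgrid has the same size.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical per-device partial result; not an Astaroth type.
struct PartialReduction {
    double max;
    double min;
    double sum_sq;     // sum of squares over the device's subgrid
    std::size_t count; // number of cells in the device's subgrid
};

// Combines the partials from all devices. Max and min combine directly; the
// sums of squares and cell counts are accumulated so that the RMS stays
// correct even if the subgrids differ in size.
PartialReduction combine(const std::vector<PartialReduction>& parts)
{
    assert(!parts.empty());
    PartialReduction total = parts[0];
    for (std::size_t i = 1; i < parts.size(); ++i) {
        total.max = std::max(total.max, parts[i].max);
        total.min = std::min(total.min, parts[i].min);
        total.sum_sq += parts[i].sum_sq;
        total.count += parts[i].count;
    }
    return total;
}

// Root-mean-square over all cells of all devices.
double rms(const PartialReduction& r)
{
    assert(r.count > 0);
    return std::sqrt(r.sum_sq / static_cast<double>(r.count));
}
```

For example, two devices holding 2 cells of value 1 and 6 cells of value 3 give an RMS of sqrt(56/8) = sqrt(7) ≈ 2.646, whereas averaging the per-device RMS values would give 2.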