Commit Graph

51 Commits

Author SHA1 Message Date
jpekkila
5232d987c1 Added acStoreWithOffset to the revised interface 2019-08-05 16:18:22 +03:00
jpekkila
567ad61465 Multinode MPI implementation should be done later in its own branch. The focus of this branch is to revise the node and device layers. Commented out references to the Grid layer. 2019-08-02 13:54:54 +03:00
jpekkila
2b6bf10ae6 Dummy implementation of the Grid interface 2019-08-01 18:37:36 +03:00
jpekkila
5be775dbff Various intermediate changes 2019-07-31 17:48:48 +03:00
jpekkila
efd9d54fef Stashing WIP changes (interface revision) s.t. I can continue work on a different machine 2019-07-30 14:34:44 +03:00
jpekkila
1ceb6739ae Merge branch 'master' into node_device_interface_revision_07-23 2019-07-30 14:31:33 +03:00
jpekkila
69deef66fe Added sum reduction. NOTE: Scalar sum does not pass the automated test but vector sum does. I couldn't see anything wrong with the code itself and I strongly suspect that the failures are caused by loss of precision due to summing a huge amount of numbers of different magnitudes. However I'm not yet completely sure. Something like the Kahan summation algorithm might be useful if the errors are really caused by fp arithmetic. 2019-07-30 14:28:18 +03:00
jpekkila
b65454d523 Stashed some testing files used to make sure that the library can also be used from pure C projects (better compatibility). These changes will never go to master as-is. 2019-07-23 18:24:47 +03:00
jpekkila
323d4e3b31 Replaced all calls to AC_VTXBUF_IDX with acVertexBufferIdx etc in all files 2019-07-23 14:37:28 +03:00
jpekkila
f74df5339f Cleaned up the include directory: removed all unnecessary stuff and moved common definitions to a separate file 2019-07-22 19:46:45 +03:00
jpekkila
168b3c4d8b Peer access to neighboring GPUs is now enabled during initialization 2019-07-22 13:02:19 +03:00
jpekkila
78aba6428e Updated the copyright years throughout the project 2019-07-16 14:28:32 +03:00
jpekkila
93fc121f5c Introduced versions of the asynchronous functions which take a stream as a parameter 2019-07-10 15:49:21 +03:00
jpekkila
bd98eaf9f7 Added a stream to loadDeviceConstant call. 2019-07-10 15:29:54 +03:00
jpekkila
0bda016e17 Reviewed the Astaroth interface. Now there's a clear distinction between synchronous and asynchronous functions. For basic usage, we provide a set of functions that are always safe to call (acIntegrate, acLoad, etc.), but because of this they must be quite restricted: e.g. the whole mesh must be loaded at once and computations cannot be executed concurrently on multiple GPUs. For more advanced users we provide asynchronous functions (such as acLoadWithOffset). Since we cannot know how the asynchronous functions are called (for example, when the integration step has been fully completed and the halos of neighboring subgrids can be safely communicated between GPUs), the responsibility of synchronization must be left to the user. In the existing implementations we currently use only the basic "safe" set of functions (except in renderer.cc), so the existing functionality has not been changed with these latest commits. Autotests also pass. 2019-07-09 18:42:00 +03:00
jpekkila
1251f61570 Removed a stray acBoundcondStep() in acStore where it definitely shouldn't be. Removed code duplication: acBoundcondStep now uses the new acLocalBoundcondStep and acGlobalBoundcondStep functions. 2019-07-09 17:08:18 +03:00
jpekkila
a086821e7c Added a function acAutoOptimize to the interface and removed rk3_step_async in kernels.cuh (moved into rkStep) 2019-07-09 14:21:22 +03:00
jpekkila
5fdfdeca9e Multi-GPU optimizations: removed some unnecessary synchronization and divided the calculation of boundary conditions into local and global steps. 2019-07-05 18:21:44 +03:00
jpekkila
f1066a2c11 Added preliminary pragmas for dispatching commands simultaneously to multiple GPUs (commented out) 2019-07-05 17:16:12 +03:00
jpekkila
2092adc0f6 Preparations for multi-GPU optimizations 2019-07-05 15:44:30 +03:00
jpekkila
ce8fe53f91 Moved explanations and comments to the beginning of astaroth.cu. No code changes. 2019-07-05 15:39:52 +03:00
jpekkila
224b91b83a Added more control for synchronizing streams and halos among the GPUs 2019-07-05 15:17:20 +03:00
jpekkila
332f1a4f40 Reordered some of the functions in astaroth.cu and introduced acExchangeHalos() for synchronizing the part of the grid that is independent from the chosen boundary conditions between subgrids. 2019-07-05 15:01:51 +03:00
jpekkila
d1a93b7d4e acIntegrateStepWithOffset corrected and confirmed to work on 1-4 GPUs 2019-07-04 16:58:24 +03:00
jpekkila
01437411b6 Comment 2019-07-04 16:39:20 +03:00
jpekkila
91f119e8dd Deprecated the old implementation of acIntegrateStep. acIntegrateStep now calls acIntegrateStepWithOffset instead of device.cuh functions. 2019-07-04 16:37:55 +03:00
jpekkila
5049dadc1c Implemented acIntegrateStepWithOffset 2019-07-04 16:31:16 +03:00
jpekkila
a53e0a170d Overloaded max/min for int3 and removed old comments 2019-07-04 16:24:08 +03:00
jpekkila
e1d545b0eb Code readability and cleanup (remembered that int3 has + and - operators defined in math_utils.h) 2019-07-04 16:16:49 +03:00
jpekkila
30254d9abb Removed a redundant and old gridIdxx function which I thought I had already removed a long time ago. 2019-07-04 16:10:29 +03:00
jpekkila
0884c4bf38 Moved the definition of acForcingVec to host_forcing.cc since it depends on user parameters that may not be defined in all projects 2019-07-04 15:28:18 +03:00
jpekkila
7abb959828 Overhaul to the user-defined parameters done: All logical switches, parameters and vertex buffer handles are now defined in a single header file (the default location is acc/mhd_solver/stencil_defines.h). This header is used when preprocessing the DSL sources and is linked to the include/ directory when calling scripts/compile_acc.sh. astaroth.h is now used for configuring internal stuff only and should not be modified by users 2019-07-03 19:01:16 +03:00
jpekkila
08e9a32cb1 Added a comment about acForcingVec 2019-07-03 16:37:16 +03:00
jpekkila
d4d2680f40 Added a new generic function to the interface (astaroth.h) for loading arbitrary device constants. Also (unintended) autoformatting. 2019-07-03 16:19:25 +03:00
Miikka Vaisala
03689709df Merge branch 'master' into forcing 2019-07-02 16:43:10 +08:00
Miikka Vaisala
9f0be0d9ff Solved the forcing function boundary problem. 2019-07-01 11:06:42 +08:00
jpekkila
7e40889245 Grid and subgrid dimensions are now only printed if VERBOSE_PRINTING == 1 2019-06-27 12:54:36 +03:00
Miikka Vaisala
d30b866a21 Merge branch 'master' into forcing
Now I need to test what works...

Conflicts:
	acc/mhd_solver/stencil_process.sps
2019-06-27 11:22:31 +08:00
jpekkila
401172bb74 Formatting 2019-06-26 19:43:37 +03:00
jpekkila
ee075e6741 Set the default number of devices to 0 (this is updated at acInit()) 2019-06-26 19:42:49 +03:00
jpekkila
cda17c9b08 VERBOSE_PRINTING flag is now globally used in the whole program and should be used to suppress development/debugging-related printing. Also added comments to the new interface function acCheckDeviceAvailability and made it free from side effects. 2019-06-26 18:50:15 +03:00
Matthias Rheinhardt
0bc8b7e827 MR: VTXBUF_DENSITY -> VTXBUF_LNRHO, minor 2019-06-26 17:14:24 +03:00
Matthias Rheinhardt
522da0041f MR: new name for GetDevice 2019-06-26 16:53:56 +03:00
Miikka Vaisala
be0e46c814 Can move forcing vector information now from the host to device.
Next step is to generate random waves on the CPU with a chosen degree of helicity etc.
2019-06-26 17:41:39 +08:00
Miikka Vaisala
231a8aa06e Trying to figure out how to upload values to GPU. 2019-06-26 15:23:46 +08:00
jpekkila
2310186c71 Added a skeleton function for updating an arbitrary block inside the computational domain instead of the whole mesh 2019-06-19 19:43:46 +03:00
jpekkila
2eacb98246 Now acBoundcondStep is applied after acIntegrate to ensure that the whole grid visible to the host, including boundaries, is always up to date 2019-06-19 14:29:07 +03:00
jpekkila
8864266042 Autoformatted all CUDA/C/C++ code 2019-06-18 16:42:56 +03:00
jpekkila
4ca4dbefdf Added the machinery for implementing forcing with the DSL on multiple GPUs and a simple model solution 2019-06-18 16:13:32 +03:00
jpekkila
59086b3e79 Added multi-GPU reductions. Tested to work with 1-2 GPUs with power of two grid dimensions. Requires more testing in special cases (when using exotic grid dimensions and a large number of GPUs) 2019-06-17 14:45:41 +03:00
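
Note on commit 69deef66fe: the message suspects that the failing scalar sum loses precision when many values of different magnitudes are added, and names Kahan summation as a possible remedy. The following is a minimal sketch in C of that technique next to a naive sum for comparison. The function names, the use of float, and the serial host-side loop are assumptions made for illustration and are not taken from the Astaroth sources; a GPU reduction would need to carry the compensation term per thread or per partial sum.

```c
#include <stddef.h>

/* Hypothetical helper (not part of Astaroth): plain left-to-right sum. */
static float naive_sum(const float* data, size_t count)
{
    float sum = 0.0f;
    for (size_t i = 0; i < count; ++i)
        sum += data[i];
    return sum;
}

/* Hypothetical helper (not part of Astaroth): Kahan (compensated) summation.
 * The rounding error of each addition is captured in `compensation` and
 * fed back into the next term. Compile without -ffast-math, otherwise the
 * compiler is allowed to simplify the compensation away. */
static float kahan_sum(const float* data, size_t count)
{
    float sum          = 0.0f;
    float compensation = 0.0f; /* running estimate of the lost low-order bits */
    for (size_t i = 0; i < count; ++i) {
        const float y = data[i] - compensation; /* corrected next term */
        const float t = sum + y;                /* low-order bits of y may be lost here */
        compensation  = (t - sum) - y;          /* recover what was lost (negated) */
        sum           = t;
    }
    return sum;
}
```

With single-precision input such as one value of 1e8 followed by several values of 1, the naive loop discards every small term because each one is below half the spacing between representable floats near 1e8, whereas the compensated loop accumulates them. Whether this accounts for the failing scalar sum test is not established by the commit message.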