Commit Graph

173 Commits

Author SHA1 Message Date
jpekkila
26bbfa089d Better multi-node communication: fire and forget. 2019-10-17 18:17:37 +03:00
jpekkila
3d852e5082 Added timing to the MPI benchmark 2019-10-17 17:43:54 +03:00
jpekkila
588a94c772 Added more MPI stuff. Now multi-node GPU-GPU communication with GPUDirect RDMA should work. Also, device memory is now allocated in unified memory by default, as this makes MPI communication simpler if RDMA is not supported. This does not affect Astaroth in any other way, since different devices use different portions of the memory space and we continue managing memory transfers manually. 2019-10-17 16:09:05 +03:00
jpekkila
0e88d6c339 Marked some internal functions static 2019-10-17 14:41:44 +03:00
jpekkila
f1e988ba6a Added stuff to the device layer for testing GPU-GPU MPI. This is a quick and dirty solution which is primarily meant for benchmarking/verification. Figuring out what the MPI interface should look like is more challenging and is not the priority right now. 2019-10-17 14:40:53 +03:00
jpekkila
65a2d47ef7 Made grid.cu (multi-node) compile without errors. It is not used, though. 2019-10-17 13:03:42 +03:00
jpekkila
0865f0499b Various improvements to the MPI-GPU implementation, but linking MPI libraries with both the host C-project and the core library seems to be a major pain. Currently the communication is done via gpu->cpu->cpu->gpu. 2019-10-15 19:32:16 +03:00
jpekkila
113be456d6 Undeprecated the wrong function in commit b693c8a 2019-10-15 18:11:07 +03:00
jpekkila
1ca089c163 New cmake option: MPI_ENABLED. Enables MPI functions on the device layer 2019-10-15 17:57:53 +03:00
jpekkila
b693c8adb4 Undeprecated acDeviceLoadMesh and acDeviceStoreMesh; these are actually very nice to have. 2019-10-15 16:12:31 +03:00
jpekkila
08188f3f5b is_valid is now consistently overloaded (parameter passed as a reference). Older CUDA compilers complained about this. 2019-10-14 21:18:21 +03:00
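The commit above only states the convention. Here is a minimal C++ sketch of it, assuming stand-in types `AcReal` (aliased to `double` here) and `AcReal3` (a plain three-component struct). These definitions are illustrative assumptions, and the library's real types and `is_valid` bodies may differ:

```cpp
#include <cassert>
#include <cmath>

// Assumption: stand-in types for illustration only.
typedef double AcReal;

struct AcReal3 {
    AcReal x, y, z;
};

// All overloads take their argument the same way: by const reference.
// A value is considered valid if it is finite (neither NaN nor +-inf).
static inline bool
is_valid(const AcReal& a)
{
    return std::isfinite(a);
}

// The vector overload delegates to the scalar overload component-wise.
static inline bool
is_valid(const AcReal3& a)
{
    return is_valid(a.x) && is_valid(a.y) && is_valid(a.z);
}
```

With this sketch, `is_valid(AcReal3{1.0, HUGE_VAL, 3.0})` returns false because one component is infinite.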
jpekkila
08f155cbec Finetuning some error checks 2019-10-07 20:40:32 +03:00
jpekkila
5d4f47c3d2 Added overloads for vector in-place addition and subtraction 2019-10-07 19:40:54 +03:00
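In C++, in-place vector addition and subtraction are typically written as free operator overloads that modify the left operand and return it by reference. The sketch below shows that pattern on a stand-in `AcReal3` struct, which is an assumption. The library's actual vector type and operator definitions are not shown in this log:

```cpp
#include <cassert>

// Assumption: stand-in 3-vector for illustration only.
typedef double AcReal;

struct AcReal3 {
    AcReal x, y, z;
};

// a += b: component-wise addition into the left operand.
static inline AcReal3&
operator+=(AcReal3& lhs, const AcReal3& rhs)
{
    lhs.x += rhs.x;
    lhs.y += rhs.y;
    lhs.z += rhs.z;
    return lhs; // returning a reference allows chaining, e.g. (a += b) += c
}

// a -= b: component-wise subtraction from the left operand.
static inline AcReal3&
operator-=(AcReal3& lhs, const AcReal3& rhs)
{
    lhs.x -= rhs.x;
    lhs.y -= rhs.y;
    lhs.z -= rhs.z;
    return lhs;
}
```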
jpekkila
ba49e7e400 Replaced deprecated DCONST_INT calls with overloaded DCONST() 2019-10-07 19:40:27 +03:00
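Overloading on the parameter handle's enum type lets a single name such as `DCONST()` replace per-type accessors like `DCONST_INT`. The sketch below illustrates this on the host. The enums, the `MeshInfo` struct, the `mesh_info` global and the parameter values are all hypothetical stand-ins for the library's device-side parameter storage:

```cpp
#include <cassert>

// Hypothetical parameter handles and storage, for illustration only.
typedef double AcReal;

enum AcIntParam { AC_nx, AC_ny, AC_nz, NUM_INT_PARAMS };
enum AcRealParam { AC_dsx, AC_dsy, AC_dsz, NUM_REAL_PARAMS };

struct MeshInfo {
    int int_params[NUM_INT_PARAMS];
    AcReal real_params[NUM_REAL_PARAMS];
};

static MeshInfo mesh_info = {{64, 64, 64}, {0.1, 0.1, 0.1}};

// Old style: one accessor name per parameter type.
static inline int
DCONST_INT(const AcIntParam param)
{
    return mesh_info.int_params[param];
}

// Overloaded style: the enum type of the handle selects the table.
// Unscoped enums do not convert implicitly to each other, so each
// handle matches exactly one overload.
static inline int
DCONST(const AcIntParam param)
{
    return mesh_info.int_params[param];
}

static inline AcReal
DCONST(const AcRealParam param)
{
    return mesh_info.real_params[param];
}
```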
jpekkila
66cfcefb34 More error checks 2019-10-07 17:00:23 +03:00
jpekkila
0e1d1b9fb4 Some optimizations for DSL compilation. Also a new feature: in-place addition and subtraction (+= and -=) are now allowed. 2019-10-07 16:33:24 +03:00
jpekkila
f7c079be2a Removed everything unnecessary from integration.cuh. Now all derivatives etc. are available in a standard library header (acc/stdlib/stdderiv.h) 2019-10-07 15:47:33 +03:00
jpekkila
9a16c79ce6 Renamed all references to uniforms for consistency with the DSL, f.ex. loadScalarConstant -> loadScalarUniform 2019-10-01 17:12:20 +03:00
jpekkila
2c8c49ee24 Removed or updated some old .gitignore files 2019-09-24 17:50:41 +03:00
jpekkila
e4eea7db83 Added support for Volta GPUs 2019-09-24 17:19:45 +03:00
jpekkila
3bb6ca1712 The Astaroth Code Compiler (acc) is now built with cmake. Additionally, make is now used to generate the CUDA headers from DSL sources, and the headers are properly regenerated whenever a DSL file has been changed. With this commit, the DSL is seamlessly integrated into the library and we no longer need complicated scripts to figure out the correct files. The current workflow for using custom DSL sources is to pass the DSL module directory to cmake, f.ex. cmake -DDSL_MODULE_DIR=/acc/mhd_solver. CMake then takes all DSL files in that directory and handles the rest. Note that the path must be either absolute or relative to the directory containing CMakeLists.txt; f.ex. cd build && cmake -DDSL_MODULE_DIR=../acc/mhd_solver does not work. 2019-09-18 17:28:29 +03:00
jpekkila
bce3e4de03 Made warnings about unused device functions go away 2019-09-18 16:58:04 +03:00
jpekkila
021e5f3774 Renamed NUM_STREAM_TYPES -> NUM_STREAMS 2019-09-12 15:48:38 +03:00
jpekkila
53230c9b61 Added error checking and more flexibility to the new acDeviceLoadScalarArray function 2019-09-05 19:56:04 +03:00
jpekkila
263a1d23a3 Added a function for loading ScalarArrays to the GPU 2019-09-05 16:35:08 +03:00
jpekkila
9e57aba9b7 New feature: ScalarArray. ScalarArrays are read-only 1D arrays containing max(mx, max(my, mz)) elements. ScalarArray is a new type of uniform and can be used for storing f.ex. forcing profiles. The DSL now also supports complex numbers and some basic arithmetic (exp, multiplication) 2019-09-02 21:26:57 +03:00
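The size rule above means every ScalarArray is long enough to be indexed along any of the three axes. The sketch below computes that length and fills an example profile along z on the host. The function names and the linear profile are hypothetical, and uploading the array to the GPU is not shown:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Length of a ScalarArray: max(mx, max(my, mz)).
static inline std::size_t
scalar_array_length(const int mx, const int my, const int mz)
{
    assert(mx > 0 && my > 0 && mz > 0);
    return static_cast<std::size_t>(std::max(mx, std::max(my, mz)));
}

// Hypothetical example: a linear profile along z. Because the array length
// is at least mz, indices 0..mz-1 are always in bounds; remaining entries
// (if mx or my is larger) stay zero.
static std::vector<double>
make_z_profile(const int mx, const int my, const int mz)
{
    std::vector<double> profile(scalar_array_length(mx, my, mz), 0.0);
    for (int k = 0; k < mz; ++k)
        profile[k] = static_cast<double>(k) / mz;
    return profile;
}
```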
jpekkila
6ea02fa28e DSL now 'feature complete' with respect to what I had in mind before the summer. Users can now create multiple kernels and the library functions are generated automatically for them. The generated library functions are of the form acDeviceKernel_<name> and acNodeKernel_<name>. More features are needed though. The next features to be added at some point are 1D and 2D device constant arrays in order to support profiles for f.ex. forcing. 2019-08-27 18:19:20 +03:00
jpekkila
20138263f4 The previous attempt (dsl_feature_completeness_2019-08-23) to enable arbitrary kernel functions was a failure: we get a significant performance loss (25-100%) if step_number is not passed as a template parameter to the integration kernel. Apparently the CUDA compiler cannot perform some optimizations if there is an if/else construct in a performance-critical part which cannot be evaluated at compile time. This branch keeps step_number as a template parameter but takes the rest of the user parameters as uniforms (dt is no longer passed as a function parameter but as a uniform with the DSL instead). 2019-08-27 17:36:33 +03:00
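The pattern described above can be illustrated with a host-side C++ analog. This is not CUDA and not Astaroth's integration kernel. Because `step_number` is a template parameter, the `if/else` inside the loop is a compile-time constant in each instantiation, so the optimizer is free to drop the untaken branch. The runtime step value is inspected only once, outside the loop, to select an instantiation. The names `update` and `update_dispatch` and the `alpha` weights are hypothetical placeholders, not the library's integration coefficients:

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical per-substep weights (placeholders only).
static constexpr double alpha[] = {0.0, 0.5, 0.25};

// step_number is known at compile time in each instantiation.
template <int step_number>
static void
update(double* out, const double* rate, const std::size_t n, const double dt)
{
    for (std::size_t i = 0; i < n; ++i) {
        if (step_number == 0)
            out[i] = rate[i] * dt; // first substep: overwrite
        else
            out[i] = alpha[step_number] * out[i] + rate[i] * dt; // later substeps: accumulate
    }
}

// A single runtime switch outside the hot loop picks the instantiation.
static void
update_dispatch(const int step_number, double* out, const double* rate,
                const std::size_t n, const double dt)
{
    switch (step_number) {
    case 0: update<0>(out, rate, n, dt); break;
    case 1: update<1>(out, rate, n, dt); break;
    case 2: update<2>(out, rate, n, dt); break;
    default: assert(false && "step_number must be 0, 1 or 2");
    }
}
```

Whether the branch is actually eliminated is up to the compiler. The template only guarantees that the condition is a constant it can fold.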
jpekkila
022e46f2e7 Merge branch 'master' into dsl_parameter_overhaul_2019-08-19 2019-08-23 13:13:57 +03:00
jpekkila
f6040f89dc Added acPrintMeshInfo for printing all mesh parameters 2019-08-21 16:24:48 +03:00
jpekkila
39dcda4a04 Made warnings about unused functions go away (this is intended functionality and not all programs will use all types of device constants, thus unnecessary warning) 2019-08-21 14:28:46 +03:00
jpekkila
51cf1f1068 The C header is now generated from the DSL. Stashing the changes just to be sure, since I might overwrite something when updating the compilation scripts to work with this new scheme 2019-08-19 18:19:28 +03:00
jpekkila
d801ebdd41 Now parameters and vertexbuffers (fields) can be declared with the DSL only. TODO: translation from the DSL header to C 2019-08-19 17:35:03 +03:00
jpekkila
bcdd827a4f Added proper declarations for all user-specified uniforms. Note: built-in uniforms are not correctly translated into CUDA 2019-08-19 17:05:56 +03:00
jpekkila
0208d55e4e Moved STENCIL_ORDER and NGHOST out of the user-defined parameters, as these are actually internal defines used to configure the built-in functions. Additionally, renamed all explicitly declared uniforms from dsx -> AC_dsx in the DSL in preparation for having a clear connection between DSL uniforms and the library parameter handles created by the user (AcRealParam etc.) 2019-08-19 16:40:47 +03:00
jpekkila
787363226b Added functions for loading int, int3, scalar and vector constants to the device layer (acDeviceLoad...Constant) 2019-08-19 15:28:16 +03:00
jpekkila
41805dcb68 Added some error checking for the case where user supplies an incomplete meshinfo to acDeviceLoadMeshInfo 2019-08-19 15:17:51 +03:00
jpekkila
598799d7c3 Added a new function to the device interface: acDeviceLoadMeshInfo 2019-08-19 15:14:00 +03:00
jpekkila
e89897985e Battled with math.h and cmath. We probably should move from C standard libraries to C++ ones internally (in places which are not visible via the interface) 2019-08-19 14:02:30 +03:00
jpekkila
6d4d53342e Removed old comments 2019-08-15 11:14:52 +03:00
jpekkila
36fea70560 Moved basic built-in functions for vector operations to math_utils.h from integration.cuh so that they are shared with the CPU and GPU 2019-08-15 11:04:22 +03:00
jpekkila
d5b2e5bb42 Added placeholders for new built-in variables in the DSL. Also added overloaded alternatives to DCONST_INT etc. Naming is still pending, and old DCONST_REAL etc. calls still work. 2019-08-12 14:05:35 +03:00
jpekkila
b8c4d07de2 Removed unnecessary comments 2019-08-12 13:31:24 +03:00
jpekkila
e027f7e548 Removed grid_n in astaroth.cu and replaced it with the new acNodeQueryDeviceConfiguration call 2019-08-12 13:25:47 +03:00
jpekkila
bba9ec7c3b Implemented acNodeQueryDeviceConfiguration 2019-08-12 11:40:38 +03:00
jpekkila
b5daf22c26 Added interface function acSynchronizeMesh 2019-08-12 10:25:05 +03:00
jpekkila
8bbb2cd5df Now prints device info before trying to run the dummy kernel 2019-08-12 09:46:37 +03:00
jpekkila
b53cabbc44 Made the DSL syntax less confusing: input and output arrays are now ScalarFields and VectorFields instead of scalars and vectors. C++ initializers are now also possible, removing the need to declare Fields as int or int3, which was very confusing, like "what, you assign an int value to a real, what the &^%@?" 2019-08-08 21:07:36 +03:00
jpekkila
5397495496 Added acLoadWithOffset 2019-08-08 20:43:01 +03:00
jpekkila
e79e1207f2 Added a function for checking whether CUDA-capable devices are available 2019-08-08 20:35:02 +03:00