Commit Graph

199 Commits

Author SHA1 Message Date
jpekkila
263a1d23a3 Added a function for loading ScalarArrays to the GPU 2019-09-05 16:35:08 +03:00
jpekkila
9e57aba9b7 New feature: ScalarArray. ScalarArrays are read-only 1D arrays containing max(mx, max(my, mz)) elements. ScalarArray is a new type of uniform and can be used for storing f.ex. forcing profiles. The DSL now also supports complex numbers and some basic arithmetic (exp, multiplication) 2019-09-02 21:26:57 +03:00
jpekkila
6ea02fa28e DSL now 'feature complete' with respect to what I had in mind before the summer. Users can now create multiple kernels and the library functions are generated automatically for them. The generated library functions are of the form acDeviceKernel_<name> and acNodeKernel_<name>. More features are needed though. The next features to be added at some point are 1D and 2D device constant arrays in order to support profiles for f.ex. forcing. 2019-08-27 18:19:20 +03:00
jpekkila
20138263f4 The previous attempt (dsl_feature_completeness_2019-08-23) to enable arbitrary kernel functions was a failure: we get significant performance loss (25-100%) if step_number is not passed as a template parameter to the integration kernel. Apparently the CUDA compiler cannot perform some optimizations if there is an if/else construct in a performance-critical part which cannot be evaluated at compile time. This branch keeps step_number as a template parameter but takes the rest of the user parameters as uniforms (dt is no longer passed as a function parameter but as a uniform with the DSL instead). 2019-08-27 17:36:33 +03:00
jpekkila
022e46f2e7 Merge branch 'master' into dsl_parameter_overhaul_2019-08-19 2019-08-23 13:13:57 +03:00
jpekkila
f6040f89dc Added acPrintMeshInfo for printing all mesh parameters 2019-08-21 16:24:48 +03:00
jpekkila
39dcda4a04 Made warnings about unused functions go away (this is intended functionality and not all programs will use all types of device constants, thus unnecessary warning) 2019-08-21 14:28:46 +03:00
jpekkila
51cf1f1068 The C header is now generated from the DSL, stashing the changes just to be sure since I might overwrite something when updating the compilation scripts to work with this new scheme 2019-08-19 18:19:28 +03:00
jpekkila
d801ebdd41 Now parameters and vertexbuffers (fields) can be declared with the DSL only. TODO: translation from the DSL header to C 2019-08-19 17:35:03 +03:00
jpekkila
bcdd827a4f Added proper declarations for all user-specified uniforms. Note: built-in uniforms are not correctly translated into CUDA 2019-08-19 17:05:56 +03:00
jpekkila
0208d55e4e Moved STENCIL_ORDER and NGHOST out of user-defined parameter as these are actually internal defines used to configure the built-in functions. Additionally, renamed all explicitly declared uniforms from dsx -> AC_dsx in the DSL in preparation for having clear connection between DSL uniforms and the library parameter handles created by the user (AcRealParam etc) 2019-08-19 16:40:47 +03:00
jpekkila
787363226b Added functions for loading int, int3, scalar and vector constants to the device layer (acDeviceLoad...Constant) 2019-08-19 15:28:16 +03:00
jpekkila
41805dcb68 Added some error checking for the case where user supplies an incomplete meshinfo to acDeviceLoadMeshInfo 2019-08-19 15:17:51 +03:00
jpekkila
598799d7c3 Added a new function to the device interface: acDeviceLoadMeshInfo 2019-08-19 15:14:00 +03:00
jpekkila
e89897985e Battled with math.h and cmath. We probably should move from C standard libraries to C++ ones internally (in places which are not visible via the interface) 2019-08-19 14:02:30 +03:00
jpekkila
6d4d53342e Removed old comments 2019-08-15 11:14:52 +03:00
jpekkila
36fea70560 Moved basic built-in functions for vector operations to math_utils.h from integration.cuh so that they are shared with the CPU and GPU 2019-08-15 11:04:22 +03:00
jpekkila
d5b2e5bb42 Added placeholders for new built-in variables in the DSL. Also overloads to DCONST_INT etc. Naming still pending and old DCONST_REAL etc calls still work. 2019-08-12 14:05:35 +03:00
jpekkila
b8c4d07de2 Removed unnecessary comments 2019-08-12 13:31:24 +03:00
jpekkila
e027f7e548 Removed grid_n in astaroth.cu and replaced it with the new acNodeQueryDeviceConfiguration call 2019-08-12 13:25:47 +03:00
jpekkila
bba9ec7c3b Implemented acNodeQueryDeviceConfiguration 2019-08-12 11:40:38 +03:00
jpekkila
b5daf22c26 Added interface function acSynchronizeMesh 2019-08-12 10:25:05 +03:00
jpekkila
8bbb2cd5df Now prints device info before trying to run the dummy kernel 2019-08-12 09:46:37 +03:00
jpekkila
b53cabbc44 Made the DSL syntax less confusing: Input and output arrays are now ScalarFields and VectorFields instead of scalars and vectors. C++ initializers are now also possible, removing the need to declare Fields as int or int3 which was very confusing, like "what, you assign an int value to a real, what the &^%@?" 2019-08-08 21:07:36 +03:00
jpekkila
5397495496 Added acLoadWithOffset 2019-08-08 20:43:01 +03:00
jpekkila
e79e1207f2 Added a function for checking whether CUDA-capable devices are available 2019-08-08 20:35:02 +03:00
jpekkila
8a9099d75e Added missing functions to fix backwards compatibility with the version interfaced with Pencil Code 2019-08-08 19:49:57 +03:00
jpekkila
322cdce52c Added some new comments + some helpful old comments from a time before the interface revision 2019-08-07 20:05:54 +03:00
jpekkila
1525e0603f Added some preliminary pragma omps and verified that acIntegrate works as it should. 2019-08-07 19:08:52 +03:00
jpekkila
c2bd5ae3e6 Simplified the optimized multi-GPU integration function 2019-08-07 18:17:03 +03:00
jpekkila
a930864f42 Merge branch 'master' into node_device_interface_revision_07-23 2019-08-07 07:43:28 +03:00
jpekkila
cf6b75f82a Merged in cmakelist_rewrite_and_C_API_conformity_07-26 (pull request #1) 2019-08-07 06:53:17 +03:00
jpekkila
6b53eb31ef Errors with forcing now down from 3 to 1 after switching from fast & inaccurate trig functions to more accurate ones 2019-08-06 19:29:40 +03:00
jpekkila
daee456660 Merge branch 'cmakelist_rewrite_and_C_API_conformity_07-26' into node_device_interface_revision_07-23 2019-08-06 17:57:30 +03:00
jpekkila
abf4815174 Merge branch 'master' into cmakelist_rewrite_and_C_API_conformity_07-26 2019-08-06 17:53:53 +03:00
jpekkila
5870081645 Split kernels.cuh into boundconds.cuh, integration.cuh and reductions.cuh 2019-08-06 17:50:41 +03:00
jpekkila
405fa4d6d6 Moved old kernels to kernels/deprecated 2019-08-06 17:46:52 +03:00
jpekkila
3726847683 Made globalGridN and d_multigpu_offsets built-in parameters. Note the renaming from globalGrid.n to globalGridN. 2019-08-06 16:39:15 +03:00
jpekkila
1dd9975528 Formatting 2019-08-06 15:44:51 +03:00
jpekkila
b2632c87b4 Merge branch 'cmakelist_rewrite_and_C_API_conformity_07-26' into node_device_interface_revision_07-23 2019-08-06 15:18:33 +03:00
jpekkila
280804a438 Merge branch 'master' into cmakelist_rewrite_and_C_API_conformity_07-26 2019-08-06 15:14:33 +03:00
jpekkila
5f4246fb42 Standalone now uses O2 optimization level instead of O3. Also removed -march=native since this causes issues if the program is compiled on a different architecture than it is run on. Since we do not do heavy arithmetic on the host side and the host code is not a performance-critical part of the code, -march=native is not very useful anyways 2019-08-06 14:46:13 +03:00
jpekkila
b73c2675e8 Added the optimized implementation of acNodeIntegrate where boundconds are done before integration instead of after 2019-08-05 20:10:13 +03:00
jpekkila
8df49370c8 Cleanup 2019-08-05 19:08:05 +03:00
jpekkila
fa6e1116cb The interface revision now actually works. The issue was incorrect order of src and dst indices when storing the mesh. 2019-08-05 17:26:05 +03:00
jpekkila
5232d987c1 Added acStoreWithOffset to the revised interface 2019-08-05 16:18:22 +03:00
jpekkila
f3de2fa03c Made globalVertexIdx available during preprocessing. NOTE: potentially dangerous. globalVertexIdx should never be used for reading data from the vertex buffers. 2019-08-05 15:03:02 +03:00
jpekkila
6dfd03664d Still does not work. I'm starting to think that instead of this one huge revision, we should modify the existing interface step-by-step. 2019-08-02 15:31:24 +03:00
jpekkila
5f2378e91b Now compiles (does not work though) 2019-08-02 15:15:18 +03:00
jpekkila
567ad61465 Multinode MPI implementation should be done later in its own branch. The focus of this branch is to revise the node and device layers. Commented out references to the Grid layer. 2019-08-02 13:54:54 +03:00