jpekkila
|
53230c9b61
|
Added error checking and more flexibility to the new acDeviceLoadScalarArray function
|
2019-09-05 19:56:04 +03:00 |
|
jpekkila
|
263a1d23a3
|
Added a function for loading ScalarArrays to the GPU
|
2019-09-05 16:35:08 +03:00 |
|
jpekkila
|
9e57aba9b7
|
New feature: ScalarArray. ScalarArrays are read-only 1D arrays containing max(mx, max(my, mz)) elements. ScalarArray is a new type of uniform and can be used for storing f.ex. forcing profiles. The DSL now also supports complex numbers and some basic arithmetic (exp, multiplication)
|
2019-09-02 21:26:57 +03:00 |
|
jpekkila
|
6ea02fa28e
|
DSL now 'feature complete' with respect to what I had in mind before the summer. Users can now create multiple kernels and the library functions are generated automatically for them. The generated library functions are of the form acDeviceKernel_<name> and acNodeKernel_<name>. More features are needed though. The next features to be added at some point are 1D and 2D device constant arrays in order to support profiles for f.ex. forcing.
|
2019-08-27 18:19:20 +03:00 |
|
jpekkila
|
20138263f4
|
The previous attempt (dsl_feature_completeness_2019-08-23) to enable arbitrary kernel functions was a failure: we get a significant performance loss (25-100%) if step_number is not passed as a template parameter to the integration kernel. Apparently the CUDA compiler cannot perform some optimizations if there is an if/else construct in a performance-critical part which cannot be evaluated at compile time. This branch keeps step_number as a template parameter but takes the rest of the user parameters as uniforms (dt is no longer passed as a function parameter but as a uniform with the DSL instead).
|
2019-08-27 17:36:33 +03:00 |
|
jpekkila
|
022e46f2e7
|
Merge branch 'master' into dsl_parameter_overhaul_2019-08-19
|
2019-08-23 13:13:57 +03:00 |
|
jpekkila
|
f6040f89dc
|
Added acPrintMeshInfo for printing all mesh parameters
|
2019-08-21 16:24:48 +03:00 |
|
jpekkila
|
39dcda4a04
|
Made warnings about unused functions go away (this is intended functionality and not all programs will use all types of device constants, thus unnecessary warning)
|
2019-08-21 14:28:46 +03:00 |
|
jpekkila
|
51cf1f1068
|
The C header is now generated from the DSL. Stashing the changes just to be sure, since I might overwrite something when updating the compilation scripts to work with this new scheme
|
2019-08-19 18:19:28 +03:00 |
|
jpekkila
|
d801ebdd41
|
Now parameters and vertexbuffers (fields) can be declared with the DSL only. TODO: translation from the DSL header to C
|
2019-08-19 17:35:03 +03:00 |
|
jpekkila
|
bcdd827a4f
|
Added proper declarations for all user-specified uniforms. Note: built-in uniforms are not correctly translated into CUDA
|
2019-08-19 17:05:56 +03:00 |
|
jpekkila
|
0208d55e4e
|
Moved STENCIL_ORDER and NGHOST out of the user-defined parameters as these are actually internal defines used to configure the built-in functions. Additionally, renamed all explicitly declared uniforms from dsx -> AC_dsx in the DSL in preparation for having a clear connection between DSL uniforms and the library parameter handles created by the user (AcRealParam etc)
|
2019-08-19 16:40:47 +03:00 |
|
jpekkila
|
787363226b
|
Added functions for loading int, int3, scalar and vector constants to the device layer (acDeviceLoad...Constant)
|
2019-08-19 15:28:16 +03:00 |
|
jpekkila
|
41805dcb68
|
Added some error checking for the case where user supplies an incomplete meshinfo to acDeviceLoadMeshInfo
|
2019-08-19 15:17:51 +03:00 |
|
jpekkila
|
598799d7c3
|
Added a new function to the device interface: acDeviceLoadMeshInfo
|
2019-08-19 15:14:00 +03:00 |
|
jpekkila
|
e89897985e
|
Battled with math.h and cmath. We probably should move from C standard libraries to C++ ones internally (in places which are not visible via the interface)
|
2019-08-19 14:02:30 +03:00 |
|
jpekkila
|
6d4d53342e
|
Removed old comments
|
2019-08-15 11:14:52 +03:00 |
|
jpekkila
|
36fea70560
|
Moved basic built-in functions for vector operations to math_utils.h from integration.cuh so that they are shared between the CPU and GPU
|
2019-08-15 11:04:22 +03:00 |
|
jpekkila
|
d5b2e5bb42
|
Added placeholders for new built-in variables in the DSL. Also overloads to DCONST_INT etc. Naming still pending and old DCONST_REAL etc calls still work.
|
2019-08-12 14:05:35 +03:00 |
|
jpekkila
|
b8c4d07de2
|
Removed unnecessary comments
|
2019-08-12 13:31:24 +03:00 |
|
jpekkila
|
e027f7e548
|
Removed grid_n in astaroth.cu and replaced it with the new acNodeQueryDeviceConfiguration call
|
2019-08-12 13:25:47 +03:00 |
|
jpekkila
|
bba9ec7c3b
|
Implemented acNodeQueryDeviceConfiguration
|
2019-08-12 11:40:38 +03:00 |
|
jpekkila
|
b5daf22c26
|
Added interface function acSynchronizeMesh
|
2019-08-12 10:25:05 +03:00 |
|
jpekkila
|
8bbb2cd5df
|
Now prints device info before trying to run the dummy kernel
|
2019-08-12 09:46:37 +03:00 |
|
jpekkila
|
b53cabbc44
|
Made the DSL syntax less confusing: Input and output arrays are now ScalarFields and VectorFields instead of scalars and vectors. C++ initializers are now also possible, removing the need to declare Fields as int or int3 which was very confusing, like "what, you assign an int value to a real, what the &^%@?"
|
2019-08-08 21:07:36 +03:00 |
|
jpekkila
|
5397495496
|
Added acLoadWithOffset
|
2019-08-08 20:43:01 +03:00 |
|
jpekkila
|
e79e1207f2
|
Added a function for checking whether CUDA-capable devices are available
|
2019-08-08 20:35:02 +03:00 |
|
jpekkila
|
8a9099d75e
|
Added missing functions to fix backwards compatibility with the version interfaced with Pencil Code
|
2019-08-08 19:49:57 +03:00 |
|
jpekkila
|
322cdce52c
|
Added some new comments + some helpful old comments from a time before the interface revision
|
2019-08-07 20:05:54 +03:00 |
|
jpekkila
|
1525e0603f
|
Added some preliminary pragma omps and verified that acIntegrate works as it should.
|
2019-08-07 19:08:52 +03:00 |
|
jpekkila
|
c2bd5ae3e6
|
Simplified the optimized multi-GPU integration function
|
2019-08-07 18:17:03 +03:00 |
|
jpekkila
|
a930864f42
|
Merge branch 'master' into node_device_interface_revision_07-23
|
2019-08-07 07:43:28 +03:00 |
|
jpekkila
|
cf6b75f82a
|
Merged in cmakelist_rewrite_and_C_API_conformity_07-26 (pull request #1)
|
2019-08-07 06:53:17 +03:00 |
|
jpekkila
|
6b53eb31ef
|
Errors with forcing now down from 3 to 1 after switching from fast & inaccurate trig functions to more accurate ones
|
2019-08-06 19:29:40 +03:00 |
|
jpekkila
|
daee456660
|
Merge branch 'cmakelist_rewrite_and_C_API_conformity_07-26' into node_device_interface_revision_07-23
|
2019-08-06 17:57:30 +03:00 |
|
jpekkila
|
abf4815174
|
Merge branch 'master' into cmakelist_rewrite_and_C_API_conformity_07-26
|
2019-08-06 17:53:53 +03:00 |
|
jpekkila
|
5870081645
|
Split kernels.cuh into boundconds.cuh, integration.cuh and reductions.cuh
|
2019-08-06 17:50:41 +03:00 |
|
jpekkila
|
405fa4d6d6
|
Moved old kernels to kernels/deprecated
|
2019-08-06 17:46:52 +03:00 |
|
jpekkila
|
3726847683
|
Made globalGridN and d_multigpu_offsets built-in parameters. Note the renaming from globalGrid.n to globalGridN.
|
2019-08-06 16:39:15 +03:00 |
|
jpekkila
|
1dd9975528
|
Formatting
|
2019-08-06 15:44:51 +03:00 |
|
jpekkila
|
b2632c87b4
|
Merge branch 'cmakelist_rewrite_and_C_API_conformity_07-26' into node_device_interface_revision_07-23
|
2019-08-06 15:18:33 +03:00 |
|
jpekkila
|
280804a438
|
Merge branch 'master' into cmakelist_rewrite_and_C_API_conformity_07-26
|
2019-08-06 15:14:33 +03:00 |
|
jpekkila
|
5f4246fb42
|
Standalone now uses O2 optimization level instead of O3. Also removed -march=native since this causes issues if the program is compiled on a different architecture than it is run on. Since we do not do heavy arithmetic on the host side and the host code is not a performance-critical part of the code, -march=native is not very useful anyway
|
2019-08-06 14:46:13 +03:00 |
|
jpekkila
|
b73c2675e8
|
Added the optimized implementation of acNodeIntegrate where boundconds are done before integration instead of after
|
2019-08-05 20:10:13 +03:00 |
|
jpekkila
|
8df49370c8
|
Cleanup
|
2019-08-05 19:08:05 +03:00 |
|
jpekkila
|
fa6e1116cb
|
The interface revision now actually works. The issue was incorrect order of src and dst indices when storing the mesh.
|
2019-08-05 17:26:05 +03:00 |
|
jpekkila
|
5232d987c1
|
Added acStoreWithOffset to the revised interface
|
2019-08-05 16:18:22 +03:00 |
|
jpekkila
|
f3de2fa03c
|
Made globalVertexIdx available during preprocessing. NOTE: potentially dangerous. globalVertexIdx should never be used for reading data from the vertex buffers.
|
2019-08-05 15:03:02 +03:00 |
|
jpekkila
|
6dfd03664d
|
Still does not work. I'm starting to think that instead of this one huge revision, we should modify the existing interface step-by-step.
|
2019-08-02 15:31:24 +03:00 |
|
jpekkila
|
5f2378e91b
|
Now compiles (does not work though)
|
2019-08-02 15:15:18 +03:00 |
|