jpekkila
|
8a9099d75e
|
Added missing functions to fix backwards compatibility with the version interfaced with Pencil Code
|
2019-08-08 19:49:57 +03:00 |
|
jpekkila
|
322cdce52c
|
Added some new comments + some helpful old comments from a time before the interface revision
|
2019-08-07 20:05:54 +03:00 |
|
jpekkila
|
1525e0603f
|
Added some preliminary pragma omps and verified that acIntegrate works as it should.
|
2019-08-07 19:08:52 +03:00 |
|
jpekkila
|
c2bd5ae3e6
|
Simplified the optimized multi-GPU integration function
|
2019-08-07 18:17:03 +03:00 |
|
jpekkila
|
a930864f42
|
Merge branch 'master' into node_device_interface_revision_07-23
|
2019-08-07 07:43:28 +03:00 |
|
jpekkila
|
cf6b75f82a
|
Merged in cmakelist_rewrite_and_C_API_conformity_07-26 (pull request #1)
|
2019-08-07 06:53:17 +03:00 |
|
jpekkila
|
6b53eb31ef
|
Errors with forcing now down from 3 to 1 after switching from fast & inaccurate trig functions to more accurate ones
|
2019-08-06 19:29:40 +03:00 |
|
jpekkila
|
daee456660
|
Merge branch 'cmakelist_rewrite_and_C_API_conformity_07-26' into node_device_interface_revision_07-23
|
2019-08-06 17:57:30 +03:00 |
|
jpekkila
|
abf4815174
|
Merge branch 'master' into cmakelist_rewrite_and_C_API_conformity_07-26
|
2019-08-06 17:53:53 +03:00 |
|
jpekkila
|
5870081645
|
Split kernels.cuh into boundconds.cuh, integration.cuh and reductions.cuh
|
2019-08-06 17:50:41 +03:00 |
|
jpekkila
|
405fa4d6d6
|
Moved old kernels to kernels/deprecated
|
2019-08-06 17:46:52 +03:00 |
|
jpekkila
|
3726847683
|
Made globalGridN and d_multigpu_offsets built-in parameters. Note the renaming from globalGrid.n to globalGridN.
|
2019-08-06 16:39:15 +03:00 |
|
jpekkila
|
1dd9975528
|
Formatting
|
2019-08-06 15:44:51 +03:00 |
|
jpekkila
|
b2632c87b4
|
Merge branch 'cmakelist_rewrite_and_C_API_conformity_07-26' into node_device_interface_revision_07-23
|
2019-08-06 15:18:33 +03:00 |
|
jpekkila
|
280804a438
|
Merge branch 'master' into cmakelist_rewrite_and_C_API_conformity_07-26
|
2019-08-06 15:14:33 +03:00 |
|
jpekkila
|
5f4246fb42
|
Standalone now uses O2 optimization level instead of O3. Also removed -march=native since this causes issues if the program is compiled on a different architecture than it is run on. Since we do not do heavy arithmetic on the host side and the host code is not a performance-critical part of the code, -march=native is not very useful anyway
|
2019-08-06 14:46:13 +03:00 |
|
jpekkila
|
b73c2675e8
|
Added the optimized implementation of acNodeIntegrate where boundconds are done before integration instead of after
|
2019-08-05 20:10:13 +03:00 |
|
jpekkila
|
8df49370c8
|
Cleanup
|
2019-08-05 19:08:05 +03:00 |
|
jpekkila
|
fa6e1116cb
|
The interface revision now actually works. The issue was incorrect order of src and dst indices when storing the mesh.
|
2019-08-05 17:26:05 +03:00 |
|
jpekkila
|
5232d987c1
|
Added acStoreWithOffset to the revised interface
|
2019-08-05 16:18:22 +03:00 |
|
jpekkila
|
f3de2fa03c
|
Made globalVertexIdx available during preprocessing. NOTE: potentially dangerous. globalVertexIdx should never be used for reading data from the vertex buffers.
|
2019-08-05 15:03:02 +03:00 |
|
jpekkila
|
6dfd03664d
|
Still does not work. I'm starting to think that instead of this one huge revision, we should modify the existing interface step-by-step.
|
2019-08-02 15:31:24 +03:00 |
|
jpekkila
|
5f2378e91b
|
Now compiles (does not work though)
|
2019-08-02 15:15:18 +03:00 |
|
jpekkila
|
567ad61465
|
Multinode MPI implementation should be done later in its own branch. The focus of this branch is to revise the node and device layers. Commented out references to the Grid layer.
|
2019-08-02 13:54:54 +03:00 |
|
jpekkila
|
2b6bf10ae6
|
Dummy implementation of the Grid interface
|
2019-08-01 18:37:36 +03:00 |
|
jpekkila
|
328b809efe
|
Added the revised node interface
|
2019-08-01 14:04:11 +03:00 |
|
jpekkila
|
92376588ba
|
Merge branch 'master' into cmakelist_rewrite_and_C_API_conformity_07-26
|
2019-07-31 20:12:22 +03:00 |
|
jpekkila
|
fb0610c1ba
|
Intermediate changes to the revised node interface
|
2019-07-31 20:04:39 +03:00 |
|
jpekkila
|
0a5d025172
|
Formatting
|
2019-07-31 19:08:16 +03:00 |
|
jpekkila
|
9b7f4277fc
|
Fixed errors in device.cu
|
2019-07-31 19:07:26 +03:00 |
|
jpekkila
|
49026bd26b
|
Revised device interface done
|
2019-07-31 18:46:41 +03:00 |
|
jpekkila
|
5be775dbff
|
Various intermediate changes
|
2019-07-31 17:48:48 +03:00 |
|
jpekkila
|
efd9d54fef
|
Stashing WIP changes (interface revision) s.t. I can continue work on a different machine
|
2019-07-30 14:34:44 +03:00 |
|
jpekkila
|
1ceb6739ae
|
Merge branch 'master' into node_device_interface_revision_07-23
|
2019-07-30 14:31:33 +03:00 |
|
jpekkila
|
62100b1140
|
Merge branch 'master' of https://bitbucket.org/jpekkila/astaroth
|
2019-07-30 14:28:25 +03:00 |
|
jpekkila
|
69deef66fe
|
Added sum reduction. NOTE: Scalar sum does not pass the automated test but vector sum does. I couldn't see anything wrong with the code itself and I strongly suspect that the failures are caused by loss of precision due to summing a huge amount of numbers of different magnitudes. However I'm not yet completely sure. Something like the Kahan summation algorithm might be useful if the errors are really caused by fp arithmetic.
|
2019-07-30 14:28:18 +03:00 |
|
jpekkila
|
fdc1e7333c
|
Added macros for getting int3 and AcReal3 device constants from within kernels (and DSL).
|
2019-07-30 09:10:06 +00:00 |
|
jpekkila
|
c9fafe41e5
|
Tidied the CMakeLists, moved stuff to more logical places and added comments. Also tested that ALTER_CONF=ON still works
|
2019-07-26 15:12:55 +03:00 |
|
jpekkila
|
818893a0ea
|
Fixed stray comma in CUDA_ARCH_FLAGS
|
2019-07-26 14:10:17 +03:00 |
|
jpekkila
|
f322bc8b37
|
Rewrote all CMakeLists. Now much cleaner and there's a clear separation during compilation between the core and standalone modules.
|
2019-07-23 20:50:37 +03:00 |
|
jpekkila
|
b65454d523
|
Stashed some testing files used to make sure that the library can also be used from pure C projects (better compatibility). These changes will never go to master as-is.
|
2019-07-23 18:24:47 +03:00 |
|
jpekkila
|
323d4e3b31
|
Replaced all calls to AC_VTXBUF_IDX with acVertexBufferIdx etc. in all files
|
2019-07-23 14:37:28 +03:00 |
|
jpekkila
|
fee03b7149
|
Moved some device limits used only during auto-optimization from astaroth.h to device.cu
|
2019-07-22 19:54:46 +03:00 |
|
jpekkila
|
f74df5339f
|
Cleaned up the include directory: removed all unnecessary stuff and moved common definitions to a separate file
|
2019-07-22 19:46:45 +03:00 |
|
jpekkila
|
01a013f3bc
|
Added WARNCHK_CUDA_ALWAYS to errchk.h
|
2019-07-22 13:05:08 +03:00 |
|
jpekkila
|
a950be99f2
|
Streams now created with priority (all streams have the same priority by default)
|
2019-07-22 13:04:04 +03:00 |
|
jpekkila
|
168b3c4d8b
|
Peer access to neighboring GPUs is now enabled during initialization
|
2019-07-22 13:02:19 +03:00 |
|
jpekkila
|
0db61dd411
|
Disabled the project-wide maxrregcount flag by default since it is only beneficial for resource-heavy kernels. The maximum register count should be defined per kernel instead if needed.
|
2019-07-22 12:58:28 +03:00 |
|
jpekkila
|
78aba6428e
|
Updated the copyright years throughout the project
|
2019-07-16 14:28:32 +03:00 |
|
jpekkila
|
93fc121f5c
|
Introduced versions of the asynchronous functions which take a stream as a parameter
|
2019-07-10 15:49:21 +03:00 |
|