Commit Graph

173 Commits

Author SHA1 Message Date
jpekkila
8a9099d75e Added missing functions to fix backwards compatibility with the version interfaced with Pencil Code 2019-08-08 19:49:57 +03:00
jpekkila
322cdce52c Added some new comments + some helpful old comments from a time before the interface revision 2019-08-07 20:05:54 +03:00
jpekkila
1525e0603f Added some preliminary pragma omps and verified that acIntegrate works as it should. 2019-08-07 19:08:52 +03:00
jpekkila
c2bd5ae3e6 Simplified the optimized multi-GPU integration function 2019-08-07 18:17:03 +03:00
jpekkila
a930864f42 Merge branch 'master' into node_device_interface_revision_07-23 2019-08-07 07:43:28 +03:00
jpekkila
cf6b75f82a Merged in cmakelist_rewrite_and_C_API_conformity_07-26 (pull request #1) 2019-08-07 06:53:17 +03:00
jpekkila
6b53eb31ef Errors with forcing now down from 3 to 1 after switching from fast & inaccurate trig functions to more accurate ones 2019-08-06 19:29:40 +03:00
jpekkila
daee456660 Merge branch 'cmakelist_rewrite_and_C_API_conformity_07-26' into node_device_interface_revision_07-23 2019-08-06 17:57:30 +03:00
jpekkila
abf4815174 Merge branch 'master' into cmakelist_rewrite_and_C_API_conformity_07-26 2019-08-06 17:53:53 +03:00
jpekkila
5870081645 Split kernels.cuh into bounconds.cuh, integration.cuh and reductions.cuh 2019-08-06 17:50:41 +03:00
jpekkila
405fa4d6d6 Moved old kernels to kernels/deprecated 2019-08-06 17:46:52 +03:00
jpekkila
3726847683 Made globalGridN and d_multigpu_offsets built-in parameters. Note the renaming from globalGrid.n to globalGridN. 2019-08-06 16:39:15 +03:00
jpekkila
1dd9975528 Formatting 2019-08-06 15:44:51 +03:00
jpekkila
b2632c87b4 Merge branch 'cmakelist_rewrite_and_C_API_conformity_07-26' into node_device_interface_revision_07-23 2019-08-06 15:18:33 +03:00
jpekkila
280804a438 Merge branch 'master' into cmakelist_rewrite_and_C_API_conformity_07-26 2019-08-06 15:14:33 +03:00
jpekkila
5f4246fb42 Standalone now uses O2 optimization level instead of O3. Also removed -march=native since this causes issues if the program is compiled on a different architecture than it is run on. Since we do not do heavy arithmetic on the host side and the host code is not a performance-critical part of the code, -march=native is not very useful anyway 2019-08-06 14:46:13 +03:00
jpekkila
b73c2675e8 Added the optimized implementation of acNodeIntegrate where boundconds are done before integration instead of after 2019-08-05 20:10:13 +03:00
jpekkila
8df49370c8 Cleanup 2019-08-05 19:08:05 +03:00
jpekkila
fa6e1116cb The interface revision now actually works. The issue was incorrect order of src and dst indices when storing the mesh. 2019-08-05 17:26:05 +03:00
jpekkila
5232d987c1 Added acStoreWithOffset to the revised interface 2019-08-05 16:18:22 +03:00
jpekkila
f3de2fa03c Made globalVertexIdx available during preprocessing. NOTE: potentially dangerous. globalVertexIdx should never be used for reading data from the vertex buffers. 2019-08-05 15:03:02 +03:00
jpekkila
6dfd03664d Still does not work. I'm starting to think that instead of this one huge revision, we should modify the existing interface step-by-step. 2019-08-02 15:31:24 +03:00
jpekkila
5f2378e91b Now compiles (does not work though) 2019-08-02 15:15:18 +03:00
jpekkila
567ad61465 Multinode MPI implementation should be done later in its own branch. The focus of this branch is to revise the node and device layers. Commented out references to the Grid layer. 2019-08-02 13:54:54 +03:00
jpekkila
2b6bf10ae6 Dummy implementation of the Grid interface 2019-08-01 18:37:36 +03:00
jpekkila
328b809efe Added the revised node interface 2019-08-01 14:04:11 +03:00
jpekkila
92376588ba Merge branch 'master' into cmakelist_rewrite_and_C_API_conformity_07-26 2019-07-31 20:12:22 +03:00
jpekkila
fb0610c1ba Intermediate changes to the revised node interface 2019-07-31 20:04:39 +03:00
jpekkila
0a5d025172 Formatting 2019-07-31 19:08:16 +03:00
jpekkila
9b7f4277fc Fixed errors in device.cu 2019-07-31 19:07:26 +03:00
jpekkila
49026bd26b Revised device interface done 2019-07-31 18:46:41 +03:00
jpekkila
5be775dbff Various intermediate changes 2019-07-31 17:48:48 +03:00
jpekkila
efd9d54fef Stashing WIP changes (interface revision) s.t. I can continue work on a different machine 2019-07-30 14:34:44 +03:00
jpekkila
1ceb6739ae Merge branch 'master' into node_device_interface_revision_07-23 2019-07-30 14:31:33 +03:00
jpekkila
62100b1140 Merge branch 'master' of https://bitbucket.org/jpekkila/astaroth 2019-07-30 14:28:25 +03:00
jpekkila
69deef66fe Added sum reduction. NOTE: Scalar sum does not pass the automated test but vector sum does. I couldn't see anything wrong with the code itself and I strongly suspect that the failures are caused by loss of precision due to summing a huge number of values of different magnitudes. However, I'm not yet completely sure. Something like the Kahan summation algorithm might be useful if the errors are really caused by floating-point arithmetic. 2019-07-30 14:28:18 +03:00
jpekkila
fdc1e7333c Added macros for getting int3 and AcReal3 device constants from within kernels (and DSL). 2019-07-30 09:10:06 +00:00
jpekkila
c9fafe41e5 Tidied the CMakeLists, moved stuff to more logical places and added comments. Also tested that ALTER_CONF=ON still works 2019-07-26 15:12:55 +03:00
jpekkila
818893a0ea Fixed stray comma in CUDA_ARCH_FLAGS 2019-07-26 14:10:17 +03:00
jpekkila
f322bc8b37 Rewrote all CMakeLists. Now much cleaner and there's a clear separation during compilation between the core and standalone modules. 2019-07-23 20:50:37 +03:00
jpekkila
b65454d523 Stashed some testing files used to make sure that the library can also be used from pure C projects (better compatibility). These changes will never go to master as-is. 2019-07-23 18:24:47 +03:00
jpekkila
323d4e3b31 Replaced all calls to AC_VTXBUF_IDX with acVertexBufferIdx etc. in all files 2019-07-23 14:37:28 +03:00
jpekkila
fee03b7149 Moved some device limits used only during auto-optimization from astaroth.h to device.cu 2019-07-22 19:54:46 +03:00
jpekkila
f74df5339f Cleaned up the include directory: removed all unnecessary stuff and moved common definitions to a separate file 2019-07-22 19:46:45 +03:00
jpekkila
01a013f3bc Added WARNCHK_CUDA_ALWAYS to errchk.h 2019-07-22 13:05:08 +03:00
jpekkila
a950be99f2 Streams now created with priority (all streams have the same priority by default) 2019-07-22 13:04:04 +03:00
jpekkila
168b3c4d8b Peer access to neighboring GPUs is now enabled during initialization 2019-07-22 13:02:19 +03:00
jpekkila
0db61dd411 Disabled the project-wide maxrregcount flag by default since it is only beneficial for resource-heavy kernels. The maximum register count should be defined per kernel instead if needed. 2019-07-22 12:58:28 +03:00
jpekkila
78aba6428e Updated the copyright years throughout the project 2019-07-16 14:28:32 +03:00
jpekkila
93fc121f5c Introduced versions of the asynchronous functions which take a stream as a parameter 2019-07-10 15:49:21 +03:00