astaroth

Author	SHA1	Message	Date
jpekkila	b2632c87b4	Merge branch 'cmakelist_rewrite_and_C_API_conformity_07-26' into node_device_interface_revision_07-23	2019-08-06 15:18:33 +03:00
jpekkila	280804a438	Merge branch 'master' into cmakelist_rewrite_and_C_API_conformity_07-26	2019-08-06 15:14:33 +03:00
jpekkila	5f4246fb42	Standalone now uses O2 optimization level instead of O3. Also removed -march=native since this causes issues if the program is compiled on a different architecture than it is run on. Since we do not do heavy arithmetic on the host side and the host code is not a performance-critical part of the code, -march=native is not very useful anyway	2019-08-06 14:46:13 +03:00
jpekkila	b73c2675e8	Added the optimized implementation of acNodeIntegrate where boundconds are done before integration instead of after	2019-08-05 20:10:13 +03:00
jpekkila	8df49370c8	Cleanup	2019-08-05 19:08:05 +03:00
jpekkila	fa6e1116cb	The interface revision now actually works. The issue was incorrect order of src and dst indices when storing the mesh.	2019-08-05 17:26:05 +03:00
jpekkila	5232d987c1	Added acStoreWithOffset to the revised interface	2019-08-05 16:18:22 +03:00
jpekkila	f3de2fa03c	Made globalVertexIdx available during preprocessing. NOTE: potentially dangerous. globalVertexIdx should never be used for reading data from the vertex buffers.	2019-08-05 15:03:02 +03:00
jpekkila	6dfd03664d	Still does not work. I'm starting to think that instead of this one huge revision, we should modify the existing interface step-by-step.	2019-08-02 15:31:24 +03:00
jpekkila	5f2378e91b	Now compiles (does not work though)	2019-08-02 15:15:18 +03:00
jpekkila	567ad61465	Multinode MPI implementation should be done later in its own branch. The focus of this branch is to revise the node and device layers. Commented out references to the Grid layer.	2019-08-02 13:54:54 +03:00
jpekkila	2b6bf10ae6	Dummy implementation of the Grid interface	2019-08-01 18:37:36 +03:00
jpekkila	328b809efe	Added the revised node interface	2019-08-01 14:04:11 +03:00
jpekkila	92376588ba	Merge branch 'master' into cmakelist_rewrite_and_C_API_conformity_07-26	2019-07-31 20:12:22 +03:00
jpekkila	fb0610c1ba	Intermediate changes to the revised node interface	2019-07-31 20:04:39 +03:00
jpekkila	0a5d025172	Formatting	2019-07-31 19:08:16 +03:00
jpekkila	9b7f4277fc	Fixed errors in device.cu	2019-07-31 19:07:26 +03:00
jpekkila	49026bd26b	Revised device interface done	2019-07-31 18:46:41 +03:00
jpekkila	5be775dbff	Various intermediate changes	2019-07-31 17:48:48 +03:00
jpekkila	efd9d54fef	Stashing WIP changes (interface revision) s.t. I can continue work on a different machine	2019-07-30 14:34:44 +03:00
jpekkila	1ceb6739ae	Merge branch 'master' into node_device_interface_revision_07-23	2019-07-30 14:31:33 +03:00
jpekkila	62100b1140	Merge branch 'master' of https://bitbucket.org/jpekkila/astaroth	2019-07-30 14:28:25 +03:00
jpekkila	69deef66fe	Added sum reduction. NOTE: Scalar sum does not pass the automated test but vector sum does. I couldn't see anything wrong with the code itself and I strongly suspect that the failures are caused by loss of precision due to summing a huge amount of numbers of different magnitudes. However I'm not yet completely sure. Something like the Kahan summation algorithm might be useful if the errors are really caused by fp arithmetic.	2019-07-30 14:28:18 +03:00
jpekkila	fdc1e7333c	Added macros for getting int3 and AcReal3 device constants from within kernels (and DSL).	2019-07-30 09:10:06 +00:00
jpekkila	c9fafe41e5	Tidied the CMakeLists, moved stuff to more logical places and added comments. Also tested that ALTER_CONF=ON still works	2019-07-26 15:12:55 +03:00
jpekkila	818893a0ea	Fixed stray comma in CUDA_ARCH_FLAGS	2019-07-26 14:10:17 +03:00
jpekkila	f322bc8b37	Rewrote all CMakeLists. Now much cleaner and there's a clear separation during compilation between the core and standalone modules.	2019-07-23 20:50:37 +03:00
jpekkila	b65454d523	Stashed some testing files used to make sure that the library can also be used from pure C projects (better compatibility). These changes will never go to master as-is.	2019-07-23 18:24:47 +03:00
jpekkila	323d4e3b31	Replaced all calls to AC_VTXBUF_IDX with acVertexBufferIdx etc. in all files	2019-07-23 14:37:28 +03:00
jpekkila	fee03b7149	Moved some device limits used only during auto-optimization from astaroth.h to device.cu	2019-07-22 19:54:46 +03:00
jpekkila	f74df5339f	Cleaned up the include directory: removed all unnecessary stuff and moved common definitions to a separate file	2019-07-22 19:46:45 +03:00
jpekkila	01a013f3bc	Added WARNCHK_CUDA_ALWAYS to errchk.h	2019-07-22 13:05:08 +03:00
jpekkila	a950be99f2	Streams now created with priority (all streams have the same priority by default)	2019-07-22 13:04:04 +03:00
jpekkila	168b3c4d8b	Peer access to neighboring GPUs is now enabled during initialization	2019-07-22 13:02:19 +03:00
jpekkila	0db61dd411	Disabled the project-wide maxrregcount flag by default since it is only beneficial for resource-heavy kernels. The maximum register count should be defined per kernel instead if needed.	2019-07-22 12:58:28 +03:00
jpekkila	78aba6428e	Updated the copyright years throughout the project	2019-07-16 14:28:32 +03:00
jpekkila	93fc121f5c	Introduced versions of the asynchronous functions which take a stream as a parameter	2019-07-10 15:49:21 +03:00
jpekkila	bd98eaf9f7	Added a stream to loadDeviceConstant call.	2019-07-10 15:29:54 +03:00
jpekkila	b08d5b26f5	cudaMemcpyToSymbol -> cudaMemcpyToSymbolAsync	2019-07-10 15:05:57 +03:00
jpekkila	976bf05c8d	Wrong scope for num_iterations in the last commit, fixed	2019-07-10 14:37:32 +03:00
jpekkila	866ec8a192	Removed some old hack I used for benchmarking a while back	2019-07-10 14:34:05 +03:00
jpekkila	d0b95c39b6	Disabled writing out unnecessary files when auto-optimizing the code	2019-07-09 18:51:04 +03:00
jpekkila	0bda016e17	Reviewed the Astaroth interface. Now there's a clear distinction between synchronous and asynchronous functions. For basic usage, we provide a set of functions that are always safe to call (acIntegrate, acLoad, etc), but because of this, must be quite restricted in the sense that e.g. the whole mesh must be loaded at once and computations cannot be executed concurrently on multiple GPUs. For more advanced users we provide asynchronous functions (such as acLoadWithOffset). Since we cannot know how the asynchronous functions are called (for example, when the integration step has been fully completed and the halos of neighboring subgrids can be safely communicated between GPUs), the responsibility of synchronization must be left to the user. In the existing implementations we currently use only the basic "safe" set of functions (except in renderer.cc), so the existing functionality has not been changed with these latest commits. Autotests also pass.	2019-07-09 18:42:00 +03:00
jpekkila	1251f61570	Removed a stray acBoundcondStep() in acStore where it definitely shouldn't be. Removed code duplication: acBoundcondStep now uses the new acLocalBoundcondStep and acGlobalBoundcondStep functions.	2019-07-09 17:08:18 +03:00
jpekkila	10a98b01a9	Experimental change: now the integration function is automatically optimized during acInit	2019-07-09 14:46:24 +03:00
jpekkila	a086821e7c	Added a function acAutoOptimize to the interface and removed rk3_step_async in kernels.cuh (moved into rkStep)	2019-07-09 14:21:22 +03:00
jpekkila	84d96de42b	Merge branch 'master' into multigpu_optimization_2019-07-05	2019-07-09 13:40:33 +03:00
jpekkila	508d15b578	Switched from math.h to cmath in math_utils.h. The old-school C math functions are bugged/not overloaded properly in GCC < 6.0 when compiling C++.	2019-07-09 13:37:08 +03:00
jpekkila	deebe570da	Merge branch 'master' into multigpu_optimization_2019-07-05	2019-07-08 16:11:24 +03:00
Miikka Vaisala	6ba15c3a7c	props.totalConstMem and props.sharedMemPerBlock cause an assembler error while compiling on TIARA gp cluster. Therefore commented out.	2019-07-08 11:00:12 +08:00


110 Commits