Commit Graph

198 Commits

Author SHA1 Message Date
jpekkila
316d44b843 Fixed an out-of-bounds error with auto-optimization (introduced in the last few commits) 2019-12-03 16:04:44 +02:00
jpekkila
7e4212ddd9 Enabled the generation of API hooks for calling DSL functions (it was interfering with compilation earlier) 2019-12-03 15:17:27 +02:00
jpekkila
5a6a3110df Reformatted 2019-12-03 15:14:26 +02:00
jpekkila
f14e35620c Now nvcc is used to compile kernels only. All host code, incl. device.cc, the MPI communication code and the rest, is now compiled with the host C++ compiler. This should work around an nvcc/MPI bug on Puhti. 2019-12-03 15:12:17 +02:00
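
A minimal sketch of the compilation split described in the commit above, under the assumption (file and function names are made up) that CUDA kernels live in their own .cu translation unit behind a plain C++ launcher declaration, so device.cc and the MPI code can go through the host C++ compiler while only the kernel file is handed to nvcc:

// ---- kernel_launcher.h: included by host-compiled sources (e.g. device.cc) ----
#pragma once
#include <cstddef>

// Plain C++ declaration: no <<<...>>> launch syntax, no CUDA types.
void launch_scale(float* d_data, std::size_t n, float factor);

// ---- kernel_launcher.cu: the only translation unit compiled by nvcc ----
#include <cstddef>
#include <cuda_runtime.h>

__global__ void scale_kernel(float* data, std::size_t n, float factor)
{
    const std::size_t i = blockIdx.x * (std::size_t)blockDim.x + threadIdx.x;
    if (i < n)
        data[i] *= factor;
}

void launch_scale(float* d_data, std::size_t n, float factor)
{
    const int tpb    = 256;
    const int blocks = (int)((n + tpb - 1) / tpb);
    scale_kernel<<<blocks, tpb>>>(d_data, n, factor);
    cudaDeviceSynchronize(); // block until the kernel has finished
}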
jpekkila
8bffb2a1d0 Fixed ambiguous logic in acNodeStoreVertexBufferWithOffset; now the halos of arbitrary GPUs do not overwrite valid data from the computational domain of a neighboring GPU. Also disabled p2p transfers temporarily until I figure out a clean way to avoid cudaErrorPeerAccessAlreadyEnabled errors 2019-12-02 12:58:09 +02:00
jpekkila
0178d4788c The core library now links to the CXX MPI library instead of the C one 2019-11-27 14:51:49 +02:00
jpekkila
ab539a98d6 Replaced old deprecated instances of DCONST_INT with DCONST 2019-11-27 13:48:42 +02:00
jpekkila
1270332f48 Fixed a small mistake in the last merge 2019-11-27 11:58:14 +02:00
Johannes Pekkila
3eabf94f92 Merge branch 'master' of https://bitbucket.org/jpekkila/astaroth 2019-11-27 08:55:23 +01:00
jpekkila
5e3caf086e Device id is now properly set when using MPI and there are multiple visible GPUs per node 2019-11-26 16:54:56 +02:00
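
A hedged sketch of per-rank device selection along the lines of the commit above (illustrative only, not the library's actual code): derive a node-local rank with MPI_Comm_split_type and map it onto the GPUs visible on that node.

#include <cstdio>
#include <cuda_runtime.h>
#include <mpi.h>

int main(int argc, char** argv)
{
    MPI_Init(&argc, &argv);

    // Ranks that share a node end up in the same communicator.
    MPI_Comm node_comm;
    MPI_Comm_split_type(MPI_COMM_WORLD, MPI_COMM_TYPE_SHARED, 0, MPI_INFO_NULL,
                        &node_comm);

    int node_rank;
    MPI_Comm_rank(node_comm, &node_rank);

    int device_count;
    cudaGetDeviceCount(&device_count);
    const int device_id = node_rank % device_count; // one GPU per node-local rank
    cudaSetDevice(device_id);

    printf("node-local rank %d -> device %d\n", node_rank, device_id);

    MPI_Comm_free(&node_comm);
    MPI_Finalize();
    return 0;
}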
jpekkila
0b0ccd697a Added some explicit casts in get_neighbor (MPI) to fix warnings raised when compiling with older gcc 2019-11-20 10:18:10 +02:00
Johannes Pekkila
981331e7d7 Benchmark results now written out to a file 2019-10-24 15:53:08 +02:00
Johannes Pekkila
4ffde83215 Set default values for benchmarking 2019-10-24 15:22:47 +02:00
Johannes Pekkila
8894b7c7d6 Added a function for getting the pid of a neighboring process when decomposing in 3D 2019-10-23 19:26:35 +02:00
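
A sketch of the idea behind such a neighbor lookup (the row-major rank layout and the names below are assumptions, not necessarily what the library uses): place the processes on an nx x ny x nz grid and wrap the coordinates periodically when stepping to a neighbor.

#include <cstdio>

struct Int3 {
    int x, y, z;
};

// Non-negative modulo for periodic wrapping.
static int mod(const int a, const int m) { return ((a % m) + m) % m; }

// Rank -> 3D coordinates, assuming a row-major process layout.
static Int3 pid_to_coords(const int pid, const Int3 decomp)
{
    return Int3{pid % decomp.x, (pid / decomp.x) % decomp.y,
                pid / (decomp.x * decomp.y)};
}

// Coordinates (wrapped periodically) -> rank.
static int coords_to_pid(const Int3 c, const Int3 decomp)
{
    return mod(c.x, decomp.x) + mod(c.y, decomp.y) * decomp.x +
           mod(c.z, decomp.z) * decomp.x * decomp.y;
}

// Pid of the neighbor of `pid` in direction `dir` (components in {-1, 0, 1}).
static int neighbor_pid(const int pid, const Int3 dir, const Int3 decomp)
{
    const Int3 c = pid_to_coords(pid, decomp);
    return coords_to_pid(Int3{c.x + dir.x, c.y + dir.y, c.z + dir.z}, decomp);
}

int main()
{
    const Int3 decomp{2, 2, 2}; // eight processes decomposed as 2 x 2 x 2
    printf("+x neighbor of pid 0: %d\n", neighbor_pid(0, Int3{1, 0, 0}, decomp));
    printf("-z neighbor of pid 0: %d\n", neighbor_pid(0, Int3{0, 0, -1}, decomp));
    return 0;
}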
Johannes Pekkila
474bdf185d Cleaned up the MPI solution for 3D decomp test 2019-10-23 12:33:46 +02:00
Johannes Pekkila
1d81333ff7 More concurrent kernels and MPI comm 2019-10-23 12:07:23 +02:00
Johannes Pekkila
04867334e7 Full integration step with MPI comms 2019-10-22 19:59:15 +02:00
Johannes Pekkila
870cd91b5f Added the final MPI solution for the benchmark tests: RDMA is now used and I don't think we can go much faster with the current decomposition scheme. To get better scaling, we would probably have to switch to a 3D decomposition instead of the current simple 1D decomposition 2019-10-22 19:28:35 +02:00
jpekkila
3d7ad7c8f2 Code cleanup 2019-10-22 15:38:34 +03:00
jpekkila
64221c218d Made some warnings go away 2019-10-22 15:03:55 +03:00
Johannes Pekkila
e4a7cdcf1d Added functions for packing and unpacking data on the device 2019-10-22 13:48:47 +02:00
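
A minimal sketch of device-side packing in the spirit of the commit above (kernel and struct names are placeholders): copy a 3D sub-block of a larger buffer into a contiguous staging buffer that can be handed to MPI as a single message; unpacking is the mirror image, reading the contiguous buffer and writing back to the strided positions.

#include <cuda_runtime.h>

struct Dims {
    int x, y, z;
};

__global__ void pack_block(const float* __restrict__ src, const Dims src_dims,
                           const Dims offset, const Dims block,
                           float* __restrict__ dst)
{
    const int i = blockIdx.x * blockDim.x + threadIdx.x;
    const int j = blockIdx.y * blockDim.y + threadIdx.y;
    const int k = blockIdx.z * blockDim.z + threadIdx.z;
    if (i >= block.x || j >= block.y || k >= block.z)
        return;

    // Strided index into the full buffer, contiguous index into the staging buffer.
    const int src_idx = (offset.x + i) + (offset.y + j) * src_dims.x +
                        (offset.z + k) * src_dims.x * src_dims.y;
    const int dst_idx = i + j * block.x + k * block.x * block.y;
    dst[dst_idx]      = src[src_idx];
}

void pack_halo(const float* d_src, const Dims src_dims, const Dims offset,
               const Dims block, float* d_dst, const cudaStream_t stream)
{
    const dim3 tpb(32, 4, 4);
    const dim3 bpg((block.x + tpb.x - 1) / tpb.x, (block.y + tpb.y - 1) / tpb.y,
                   (block.z + tpb.z - 1) / tpb.z);
    pack_block<<<bpg, tpb, 0, stream>>>(d_src, src_dims, offset, block, d_dst);
}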
Johannes Pekkila
915e1c7c14 Trying to overlap MPI communication with computation of boundary conditions. However, NVIDIA seemed to forget one important detail in the documentation for CUDA-aware MPI: it looks like CUDA streams are not supported with CUDA-aware MPI communication. So in the end the fastest solution might be to use old-school gpu->cpu->cpu->gpu MPI communication after all 2019-10-21 15:50:53 +02:00
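
A hedged sketch of the old-school gpu->cpu->cpu->gpu exchange mentioned above (names are placeholders, not the library's API): stage the packed halo through pinned host buffers with asynchronous copies, exchange it with nonblocking MPI, and copy the received halo back to the device.

#include <cuda_runtime.h>
#include <mpi.h>

// h_send/h_recv are pinned host staging buffers (cudaMallocHost), `count` floats each.
void exchange_halo_staged(const float* d_send, float* d_recv, float* h_send,
                          float* h_recv, const int count, const int send_to,
                          const int recv_from, const cudaStream_t stream)
{
    // GPU -> CPU
    cudaMemcpyAsync(h_send, d_send, count * sizeof(float), cudaMemcpyDeviceToHost,
                    stream);
    cudaStreamSynchronize(stream); // MPI must not touch the buffer before the copy completes

    // CPU -> CPU (the neighboring process does the same in the opposite direction)
    MPI_Request reqs[2];
    MPI_Irecv(h_recv, count, MPI_FLOAT, recv_from, 0, MPI_COMM_WORLD, &reqs[0]);
    MPI_Isend(h_send, count, MPI_FLOAT, send_to, 0, MPI_COMM_WORLD, &reqs[1]);
    MPI_Waitall(2, reqs, MPI_STATUSES_IGNORE);

    // CPU -> GPU
    cudaMemcpyAsync(d_recv, h_recv, count * sizeof(float), cudaMemcpyHostToDevice,
                    stream);
}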
jpekkila
f120343110 Bugfix: peer access was not disabled when Node was destroyed, leading to cudaErrorPeerAccessAlreadyEnabled error when creating new Nodes 2019-10-21 16:23:24 +03:00
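
A minimal sketch of the peer-access bookkeeping the fix above implies (illustrative only): enable access per device pair when a Node is created and disable it again on teardown, tolerating the already-enabled/not-enabled error codes so that repeated setup does not fail with cudaErrorPeerAccessAlreadyEnabled.

#include <cuda_runtime.h>

void enable_peer_access(const int device, const int peer)
{
    int can_access = 0;
    cudaDeviceCanAccessPeer(&can_access, device, peer);
    if (!can_access)
        return;

    cudaSetDevice(device);
    if (cudaDeviceEnablePeerAccess(peer, 0) == cudaErrorPeerAccessAlreadyEnabled)
        cudaGetLastError(); // already enabled: clear the sticky error and carry on
}

void disable_peer_access(const int device, const int peer)
{
    cudaSetDevice(device);
    if (cudaDeviceDisablePeerAccess(peer) == cudaErrorPeerAccessNotEnabled)
        cudaGetLastError(); // nothing to disable: clear the sticky error
}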
Johannes Pekkila
7b475b6dee Better MPI synchronization 2019-10-18 11:50:22 +02:00
Johannes Pekkila
155d369888 MPI communication now 10x faster 2019-10-17 22:39:57 +02:00
jpekkila
26bbfa089d Better multi-node communication: fire and forget. 2019-10-17 18:17:37 +03:00
jpekkila
3d852e5082 Added timing to the MPI benchmark 2019-10-17 17:43:54 +03:00
jpekkila
588a94c772 Added more MPI stuff. Now multi-node GPU-GPU communication with GPUDirect RDMA should work. Device memory is also now allocated as unified memory by default, as this makes MPI communication simpler if RDMA is not supported. This does not affect Astaroth in any other way, since different devices use different portions of the memory space and we continue managing memory transfers manually. 2019-10-17 16:09:05 +03:00
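
A hedged illustration of the allocation policy described above (the function below is made up): allocating with cudaMallocManaged gives a pointer that is valid on both host and device, so MPI can fall back to host-side access when GPUDirect RDMA is unavailable, while a prefetch keeps the pages resident on the owning GPU in the normal case.

#include <cuda_runtime.h>

float* alloc_vertex_buffer(const size_t count, const int device_id)
{
    float* buf = NULL;
    cudaMallocManaged(&buf, count * sizeof(float));

    // Keep the buffer resident on its owning device; pages migrate only if
    // something (e.g. MPI without RDMA support) touches them from the host.
    cudaMemPrefetchAsync(buf, count * sizeof(float), device_id, 0);
    return buf;
}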
jpekkila
0e88d6c339 Marked some internal functions static 2019-10-17 14:41:44 +03:00
jpekkila
f1e988ba6a Added stuff for the device layer for testing GPU-GPU MPI. This is a quick and dirty solution which is primarily meant for benchmarking/verification. Figuring out what the MPI interface should look like is more challenging and is not the priority right now 2019-10-17 14:40:53 +03:00
jpekkila
65a2d47ef7 Made grid.cu (multi-node) compile without errors. Not used yet, though. 2019-10-17 13:03:42 +03:00
jpekkila
0865f0499b Various improvements to the MPI-GPU implementation, but linking MPI libraries with both the host C-project and the core library seems to be a major pain. Currently the communication is done via gpu->cpu->cpu->gpu. 2019-10-15 19:32:16 +03:00
jpekkila
113be456d6 Undeprecated the wrong function in commit b693c8a 2019-10-15 18:11:07 +03:00
jpekkila
1ca089c163 New cmake option: MPI_ENABLED. Enables MPI functions on the device layer 2019-10-15 17:57:53 +03:00
jpekkila
b693c8adb4 Undeprecated acDeviceLoadMesh and acDeviceStoreMesh, these are actually very nice to have 2019-10-15 16:12:31 +03:00
jpekkila
08188f3f5b is_valid is now consistently overloaded (parameter passed as a reference). Older CUDA compilers complained about this. 2019-10-14 21:18:21 +03:00
jpekkila
08f155cbec Finetuning some error checks 2019-10-07 20:40:32 +03:00
jpekkila
5d4f47c3d2 Added overloads for vector in-place addition and subtraction 2019-10-07 19:40:54 +03:00
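
A minimal sketch of what such in-place vector overloads can look like (the AcReal3 definition here is a stand-in, not the library's exact type): += and -= defined as host/device functions so they work both in kernels and in host code.

#include <cstdio>

#ifdef __CUDACC__
#define HOST_DEVICE __host__ __device__
#else
#define HOST_DEVICE
#endif

typedef struct {
    double x, y, z;
} AcReal3;

static HOST_DEVICE inline AcReal3& operator+=(AcReal3& lhs, const AcReal3& rhs)
{
    lhs.x += rhs.x;
    lhs.y += rhs.y;
    lhs.z += rhs.z;
    return lhs;
}

static HOST_DEVICE inline AcReal3& operator-=(AcReal3& lhs, const AcReal3& rhs)
{
    lhs.x -= rhs.x;
    lhs.y -= rhs.y;
    lhs.z -= rhs.z;
    return lhs;
}

int main()
{
    AcReal3 a       = {1, 2, 3};
    const AcReal3 b = {0.5, 0.5, 0.5};
    a += b;
    a -= b;
    printf("%g %g %g\n", a.x, a.y, a.z); // prints 1 2 3 again
    return 0;
}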
jpekkila
ba49e7e400 Replaced deprecated DCONST_INT calls with overloaded DCONST() 2019-10-07 19:40:27 +03:00
jpekkila
66cfcefb34 More error checks 2019-10-07 17:00:23 +03:00
jpekkila
0e1d1b9fb4 Some optimizations for DSL compilation. Also a new feature: in-place addition and subtraction (+= and -=) are now allowed 2019-10-07 16:33:24 +03:00
jpekkila
f7c079be2a Removed everything unnecessary from integration.cuh. Now all derivatives etc are available in a standard library header (acc/stdlib/stdderiv.h) 2019-10-07 15:47:33 +03:00
jpekkila
9a16c79ce6 Renamed all uniform-related functions for consistency with the DSL, f.ex. loadScalarConstant -> loadScalarUniform 2019-10-01 17:12:20 +03:00
jpekkila
2c8c49ee24 Removed or updated some old .gitignore files 2019-09-24 17:50:41 +03:00
jpekkila
e4eea7db83 Added support for Volta GPUs 2019-09-24 17:19:45 +03:00
jpekkila
3bb6ca1712 The Astaroth Code Compiler (acc) is now built with cmake. Additionally, make is now used to generate the CUDA headers from DSL sources. The headers are also properly regenerated whenever a DSL file has been changed. With this commit, the DSL is now seamlessly integrated into the library and we no longer need complicated scripts to figure out the correct files. The current workflow for using custom DSL sources is to pass the DSL module directory to cmake, f.ex. cmake -DDSL_MODULE_DIR=/acc/mhd_solver. Note that the path must be absolute or given relative to the CMakeLists.txt directory; f.ex. cd build && cmake -DDSL_MODULE_DIR=../acc/mhd_solver does not work. CMake then takes all DSL files in that directory and handles the rest. 2019-09-18 17:28:29 +03:00
jpekkila
bce3e4de03 Made warnings about unused device functions go away 2019-09-18 16:58:04 +03:00
jpekkila
021e5f3774 Renamed NUM_STREAM_TYPES -> NUM_STREAMS 2019-09-12 15:48:38 +03:00
jpekkila
53230c9b61 Added error checking and more flexibility to the new acDeviceLoadScalarArray function 2019-09-05 19:56:04 +03:00
jpekkila
263a1d23a3 Added a function for loading ScalarArrays to the GPU 2019-09-05 16:35:08 +03:00
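
A hedged sketch of what a ScalarArray load with error checking might look like (the types and the signature below are illustrative, not the actual acDeviceLoadScalarArray declaration): validate the handle and the bounds on the host, then copy the data to the device asynchronously and report failures to the caller.

#include <cuda_runtime.h>
#include <stddef.h>

typedef double AcReal; // stand-in; the library picks single or double precision at build time

typedef struct {
    AcReal* data;  // device pointer
    size_t length; // number of elements allocated on the device
} ScalarArray;

static int load_scalar_array(const ScalarArray dst, const AcReal* host_src,
                             const size_t start, const size_t count,
                             const cudaStream_t stream)
{
    if (!dst.data || !host_src)
        return -1; // invalid handle or source
    if (start + count > dst.length)
        return -1; // refuse out-of-bounds writes

    const cudaError_t err = cudaMemcpyAsync(&dst.data[start], host_src,
                                            count * sizeof(AcReal),
                                            cudaMemcpyHostToDevice, stream);
    return err == cudaSuccess ? 0 : -1;
}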