Commit Graph

523 Commits

Author SHA1 Message Date
jpekkila 78fbcc090d Reordered src/core for a better division between host and device code (this is more likely to work when compiling with mpicxx). Disabled separate compilation of CUDA kernels, as it complicates compilation and is a source of many CMake/CUDA bugs. As a downside, GPU code takes longer to compile. 2020-01-23 20:06:20 +02:00
jpekkila 96389e9da6 Modified standalone includes to function with new astaroth headers 2020-01-23 20:03:25 +02:00
jpekkila 3adb0242a4 src/utils is now a real library, includable with the astaroth_utils.h header and linkable as libastaroth_utils.a. The purpose of Astaroth Utils is to serve as a generic utility library, in contrast to Astaroth Standalone, which is essentially hardcoded for MHD. 2020-01-23 20:02:38 +02:00
jpekkila 5de163e8d1 Added commented-out pragma unrolls as a reminder of how packing could be improved, though at the moment the unrolls actually make performance much worse, for reasons unknown. 2020-01-22 19:27:45 +02:00
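For context, a minimal sketch of the kind of packing kernel the commented-out unrolls refer to; `AcReal`, `NUM_VTXBUF_HANDLES`, and the kernel itself are illustrative stand-ins, not the actual Astaroth code:

```cuda
// Sketch only: stand-ins for Astaroth's types and constants.
typedef double AcReal;
#define NUM_VTXBUF_HANDLES (8)

__global__ void
pack(const int3 start, const int3 dims, const int3 mesh_dims,
     AcReal* const* vtxbufs, AcReal* packed)
{
    const int idx   = blockIdx.x * blockDim.x + threadIdx.x;
    const int count = dims.x * dims.y * dims.z;
    if (idx >= count)
        return;

    // Unflatten idx into coordinates inside the block being packed
    const int i = start.x + idx % dims.x;
    const int j = start.y + (idx / dims.x) % dims.y;
    const int k = start.z + idx / (dims.x * dims.y);
    const size_t src = i + j * (size_t)mesh_dims.x +
                       k * (size_t)mesh_dims.x * mesh_dims.y;

    // #pragma unroll // left disabled: unrolling degraded performance here
    for (int w = 0; w < NUM_VTXBUF_HANDLES; ++w)
        packed[w * count + idx] = vtxbufs[w][src];
}
```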
jpekkila 41f8e9aebb Removed an old inefficient function for MPI comm 2020-01-22 19:26:33 +02:00
jpekkila caacf2b33c Removed --restrict flag from CUDA compilation for safety 2020-01-22 19:25:26 +02:00
jpekkila 354cf81777 MPI_Request was saved to an address pointing to local memory; fixed 2020-01-20 19:15:20 +02:00
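The bug class being fixed here is a classic MPI lifetime error: a request stored in a stack variable dangles once the posting function returns, so a later `MPI_Wait` reads garbage. A sketch with illustrative function names:

```c
#include <mpi.h>

static MPI_Request* saved_req;

/* Broken: the request lives in the local stack frame. */
void
post_recv_broken(double* buf, int count, int src)
{
    MPI_Request req; /* local memory! */
    MPI_Irecv(buf, count, MPI_DOUBLE, src, 0, MPI_COMM_WORLD, &req);
    saved_req = &req; /* dangling as soon as this function returns */
}

/* Fixed: the caller owns storage that outlives the communication. */
void
post_recv_fixed(double* buf, int count, int src, MPI_Request* req)
{
    MPI_Irecv(buf, count, MPI_DOUBLE, src, 0, MPI_COMM_WORLD, req);
}
```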
jpekkila 54d91e7eeb Removed debug synchronization from packing.cu 2020-01-20 18:58:06 +02:00
jpekkila 993bfc4533 Better concurrency and some simplifications (MPI). 2020-01-20 18:45:24 +02:00
jpekkila 765ce9a573 Some concurrency optimizations for 3D blocking 2020-01-20 17:08:23 +02:00
jpekkila 6d4f696e60 Initial implementation for parallel compute + communication 2020-01-20 16:21:19 +02:00
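The idea behind overlapping compute and communication: inner grid points depend only on local data, so their integration can run while halos are in flight; only the boundary shell must wait. A hypothetical skeleton (all names, stub kernels, and launch dimensions are illustrative; halos are staged through the host, as in the earlier commits):

```cuda
#include <mpi.h>

__global__ void integrate_inner() { /* stub: inner-domain stencil */ }
__global__ void integrate_outer() { /* stub: boundary-shell stencil */ }

void
step_overlapped(cudaStream_t compute, cudaStream_t comm,
                double* d_send, double* d_recv, double* h_send, double* h_recv,
                const int count, const int up, const int down)
{
    // 1) Start integrating the inner domain immediately.
    integrate_inner<<<256, 256, 0, compute>>>();

    // 2) Concurrently, exchange halos (staged through the host).
    cudaMemcpyAsync(h_send, d_send, count * sizeof(double),
                    cudaMemcpyDeviceToHost, comm);
    cudaStreamSynchronize(comm);

    MPI_Request reqs[2];
    MPI_Irecv(h_recv, count, MPI_DOUBLE, down, 0, MPI_COMM_WORLD, &reqs[0]);
    MPI_Isend(h_send, count, MPI_DOUBLE, up, 0, MPI_COMM_WORLD, &reqs[1]);
    MPI_Waitall(2, reqs, MPI_STATUSES_IGNORE);

    cudaMemcpyAsync(d_recv, h_recv, count * sizeof(double),
                    cudaMemcpyHostToDevice, comm);

    // 3) The boundary shell needs fresh halos: run it after the exchange.
    cudaStreamSynchronize(comm);
    integrate_outer<<<256, 256, 0, compute>>>();
}
```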
jpekkila 3625e9db5f Added timing to acDeviceRunMPITest() 2020-01-20 14:54:26 +02:00
jpekkila d034cadfac Updated copyright years 2020-01-17 15:34:10 +02:00
jpekkila 88a4a1718d More cleanup 2020-01-17 15:27:02 +02:00
jpekkila ff6a7155e5 Added a simplified and cleaned up 3D decomp MPI implementation. Tested to work at least up to 2x2x2 nodes. 2020-01-17 15:22:23 +02:00
jpekkila 975a15f7f4 Removed all MPI-related code in preparation of a rewrite of the MPI stuff 2020-01-17 14:22:11 +02:00
jpekkila 9264b7515a Working 3D decomp, unoptimized 2020-01-16 21:47:05 +02:00
jpekkila 29b38d3b89 MPI distribute and gather were incorrect; fixed. Now tested to work with 1, 2, and 4 GPUs. 2020-01-16 19:12:32 +02:00
jpekkila d7f56eeb67 Boundary conditions for 3D decomposition with MPI now working on a single node. 2020-01-16 16:34:33 +02:00
jpekkila 50bf8b7148 MPI communication of corners via CPU OK 2020-01-16 15:17:57 +02:00
jpekkila c76c2afd5e Merge branch 'master' into 3d-decomposition-2020-01 2020-01-16 13:21:59 +02:00
jpekkila 23efcb413f Introduced a sample directory and moved all non-library-components from src to there 2020-01-15 16:24:38 +02:00
Miikka Vaisala 185b33980f Forcing function bug correction. 2020-01-14 13:58:11 +08:00
jpekkila 5e1500fe97 Happy new year! :) 2020-01-13 21:38:07 +02:00
jpekkila 92a6a1bdec Added more professional run flags to ./ac_run 2020-01-13 15:35:01 +02:00
jpekkila 794e4393c3 Added a new function for the legacy Astaroth layer: acGetNode(). This function returns a Node, which can be used to access acNode-layer functions 2020-01-13 11:33:15 +02:00
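Assumed usage of the new accessor: `acGetNode()` comes from this commit, but the surrounding calls merely follow Astaroth's naming conventions and their exact signatures here are guesses.

```c
#include "astaroth.h"

int
main(void)
{
    AcMeshInfo info = {0};
    /* ... fill in mesh dimensions ... */
    acInit(info);

    Node node = acGetNode();                   /* hand out the legacy layer's Node */
    acNodeSynchronizeStream(node, STREAM_ALL); /* e.g. an acNode* call */

    acQuit();
    return 0;
}
```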
jpekkila 1d315732e0 Giving up on 3D decomposition with CUDA-aware MPI. The MPI implementation on Puhti seems to be painfully bugged: device pointers are not tracked properly in some cases (e.g. if there's an array of structures containing CUDA pointers). Going to implement the 3D decomposition the traditional way for now (communicating via the CPU). It's easy to switch to CUDA-aware MPI once Mellanox/NVIDIA/CSC have fixed their software. 2020-01-07 21:06:27 +02:00
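With CUDA-aware MPI one would pass device pointers straight to `MPI_Isend`/`MPI_Irecv`; the traditional fallback described here never hands MPI a device pointer and instead stages through (ideally pinned) host buffers. A sketch with illustrative names:

```cuda
#include <mpi.h>

void
exchange_via_host(const double* d_send, double* d_recv, double* h_send,
                  double* h_recv, const int count, const int peer)
{
    // Device -> host staging buffer
    cudaMemcpy(h_send, d_send, count * sizeof(double), cudaMemcpyDeviceToHost);

    // Plain host-to-host MPI; no device pointers involved
    MPI_Sendrecv(h_send, count, MPI_DOUBLE, peer, 0,
                 h_recv, count, MPI_DOUBLE, peer, 0,
                 MPI_COMM_WORLD, MPI_STATUS_IGNORE);

    // Host staging buffer -> device
    cudaMemcpy(d_recv, h_recv, count * sizeof(double), cudaMemcpyHostToDevice);
}
```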
jpekkila 299ff5cb67 All fields are now packed to simplify communication 2020-01-07 21:01:22 +02:00
jpekkila 5d60791f13 Current 3D decomp method still too complicated. Starting again from scratch. 2020-01-07 14:40:51 +02:00
jpekkila eaee81bf06 Merge branch 'master' into 3d-decomposition-2020-01 2020-01-07 14:25:06 +02:00
jpekkila f0208c66a6 Now compiles also for P100 by default (was removed accidentally in earlier commits) 2020-01-07 10:29:44 +00:00
jpekkila 1dbcc469fc Allocations for packed data (MPI) 2020-01-05 18:57:14 +02:00
jpekkila bee930b151 Merge branch 'master' into 3d-decomposition-2020-01 2020-01-05 16:48:26 +02:00
jpekkila be7946c2af Added the multiplication operator for int3 structures 2020-01-05 16:47:28 +02:00
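The plausible form of the added operator is a component-wise product, which is the natural reading for decomposition arithmetic:

```cuda
#include <cuda_runtime.h> // int3, make_int3

static __host__ __device__ inline int3
operator*(const int3& a, const int3& b)
{
    // Component-wise product
    return make_int3(a.x * b.x, a.y * b.y, a.z * b.z);
}
```

This kind of operator is handy when, say, converting a process's 3D grid coordinates into a mesh offset: `const int3 offset = pid3d * subgrid_dims;` (names hypothetical).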
jpekkila 51b48a5a36 Some intermediate MPI changes 2020-01-05 16:46:40 +02:00
jpekkila d6c81c89fb This 3D blocking approach is getting too complicated; removed the code and trying again 2019-12-28 16:38:10 +02:00
jpekkila e86b082c98 MPI transfer for the first corner with 3D blocking now complete. Disabled/enabled some error checking for development 2019-12-27 13:43:22 +02:00
jpekkila bd0cc3ee20 There was some kind of mismatch between the CUDA and MPI (UCX) libraries when linking with cudart. Switching to the cudart provided by CMake fixed the issue. 2019-12-27 13:41:18 +02:00
jpekkila 6b5910f7df Added allocations for the packed buffers 2019-12-21 19:00:35 +02:00
jpekkila 57a1f3e30c Added a generic pack/unpack function 2019-12-21 16:20:40 +02:00
jpekkila e4f7214b3a benchmark.cc edited online with Bitbucket 2019-12-21 11:26:54 +00:00
jpekkila 3ecd47fe8b Merge branch 'master' into 3d-decomposition-2020-01 2019-12-21 13:22:45 +02:00
jpekkila 35b56029cf Build failed with single precision; added the correct casts to modelsolver.c 2019-12-21 13:21:56 +02:00
jpekkila 4d873caf38 Changed the utils CMakeLists.txt to modern CMake style 2019-12-21 13:16:08 +02:00
jpekkila bad64f5307 Started the 3D decomposition branch. Four tasks: 1) determine how to distribute the work given n processes, 2) distribute and gather the mesh to/from these processes, 3) create packing/unpacking functions, and 4) transfer packed data blocks between neighbors. Tasks 1 and 2 done with this commit. 2019-12-21 12:37:01 +02:00
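A sketch of what task 1 amounts to, assuming the global grid `nn` is divisible by the process-grid dimensions (names are illustrative; `MPI_Dims_create` picks a balanced 3D grid for the given process count):

```c
#include <mpi.h>

void
decompose(const int nprocs, const int pid, const int nn[3],
          int local_dims[3], int offset[3])
{
    int p[3] = {0, 0, 0}; // zeros let MPI choose all three dimensions
    MPI_Dims_create(nprocs, 3, p);

    // Map the linear rank to 3D process-grid coordinates (x fastest)
    const int px = pid % p[0];
    const int py = (pid / p[0]) % p[1];
    const int pz = pid / (p[0] * p[1]);

    // Each rank owns an (nn/p)-sized block starting at its offset
    for (int i = 0; i < 3; ++i)
        local_dims[i] = nn[i] / p[i];

    offset[0] = px * local_dims[0];
    offset[1] = py * local_dims[1];
    offset[2] = pz * local_dims[2];
}
```

Task 2 then reduces to shipping each rank the subarray at its offset (for instance with `MPI_Scatterv`, or point-to-point sends per vertex buffer) and reversing the mapping on gather.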
jpekkila ecff5c3041 Added some final changes to benchmarking 2019-12-15 21:47:41 +02:00
jpekkila 8bd81db63c Added CPU parallelization to make CPU integration and boundconds faster 2019-12-14 15:45:42 +02:00
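The likely shape of this CPU-side speedup is a parallel-for over the outer loops of the host integration and boundary-condition routines; the snippet below (periodic x-boundaries with OpenMP) is an assumption-laden illustration, not the actual modelsolver.c code:

```c
#include <omp.h>

void
periodic_boundconds_x(double* f, const int mx, const int my, const int mz,
                      const int ng)
{
    // Parallelize over the two outer loops; iterations are independent
    #pragma omp parallel for collapse(2)
    for (int k = 0; k < mz; ++k)
        for (int j = 0; j < my; ++j)
            for (int i = 0; i < ng; ++i) {
                const size_t row = (size_t)j * mx + (size_t)k * mx * my;
                f[row + i]           = f[row + mx - 2 * ng + i]; // left ghosts
                f[row + mx - ng + i] = f[row + ng + i];          // right ghosts
            }
}
```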
jpekkila ff35d78509 Rewrote the MPI benchmark-verification function 2019-12-14 15:26:19 +02:00
jpekkila f0e77181df Benchmark finetuning 2019-12-14 14:52:06 +02:00
jpekkila b8a997b0ab Added code for doing a proper verification run with MPI. Passes nicely with full MHD + upwinding when using the new utility functions introduced in the previous commits. Note: forcing is not enabled in the utility library by default. 2019-12-14 07:37:59 +02:00