astaroth

Author	SHA1	Message	Date
Johannes Pekkila	1d81333ff7	More concurrent kernels and MPI comm	2019-10-23 12:07:23 +02:00
Johannes Pekkila	04867334e7	Full integration step with MPI comms	2019-10-22 19:59:15 +02:00
Johannes Pekkila	870cd91b5f	Added the final MPI solution for the benchmark tests: RDMA is now used and I don't think we can go much faster with the current decomposition scheme. To get better scaling, we would probably have to change to 3D decomposition instead of using the current simple 1D decomp	2019-10-22 19:28:35 +02:00
jpekkila	3d7ad7c8f2	Code cleanup	2019-10-22 15:38:34 +03:00
jpekkila	64221c218d	Made some warnings go away	2019-10-22 15:03:55 +03:00
Johannes Pekkila	e4a7cdcf1d	Added functions for packing and unpacking data on the device	2019-10-22 13:48:47 +02:00
Johannes Pekkila	915e1c7c14	Trying to overlap MPI communication with computation of boundary conditions. However, NVIDIA seemed to forget one important detail in the documentation for CUDA-aware MPI: it looks like CUDA streams are not supported with CUDA-aware MPI communication. So in the end the fastest solution might be to use old-school gpu->cpu->cpu->gpu MPI communication after all	2019-10-21 15:50:53 +02:00
jpekkila	f120343110	Bugfix: peer access was not disabled when Node was destroyed, leading to cudaErrorPeerAccessAlreadyEnabled error when creating new Nodes	2019-10-21 16:23:24 +03:00
Johannes Pekkila	7b475b6dee	Better MPI synchronization	2019-10-18 11:50:22 +02:00
jpekkila	f3cb6e7049	Removed old unused tokens from the DSL grammar	2019-10-18 02:14:19 +03:00
jpekkila	0f5acfbb33	Merge branch 'master' of https://bitbucket.org/jpekkila/astaroth	2019-10-18 02:06:15 +03:00
jpekkila	7c79a98cdc	Added support for various binary operations (>=, <=, /= etc). Also bitwise operators | and & are now allowed	2019-10-18 01:52:14 +03:00
Johannes Pekkila	155d369888	MPI communication now 10x faster	2019-10-17 22:39:57 +02:00
jpekkila	26bbfa089d	Better multi-node communication: fire and forget.	2019-10-17 18:17:37 +03:00
jpekkila	3d852e5082	Added timing to the MPI benchmark	2019-10-17 17:43:54 +03:00
jpekkila	e0a631d81a	Added the hires timer to utils	2019-10-17 17:43:34 +03:00
jpekkila	588a94c772	Added more MPI stuff. Now multi-node GPU-GPU communication with GPUDirect RDMA should work. Also device memory is now allocated in unified memory by default as this makes MPI communication simpler if RDMA is not supported. This does not affect Astaroth any other way since different devices use different portions of the memory space and we continue managing memory transfers manually.	2019-10-17 16:09:05 +03:00
jpekkila	0e88d6c339	Marked some internal functions static	2019-10-17 14:41:44 +03:00
jpekkila	7390d53f79	Added missing extern Cs to verification.h	2019-10-17 14:41:13 +03:00
jpekkila	f1e988ba6a	Added stuff for the device layer for testing GPU-GPU MPI. This is a quick and dirty solution which is primarily meant for benchmarking/verification. Figuring out what the MPI interface should look like is more challenging and is not the priority right now	2019-10-17 14:40:53 +03:00
jpekkila	bb9e65a741	AC_DEFAULT_CONFIG now propagated to projects that link to astaroth utils	2019-10-17 13:05:17 +03:00
jpekkila	859195eda4	exampleproject no longer compiled with astaroth utils	2019-10-17 13:04:39 +03:00
jpekkila	65a2d47ef7	Made grid.cu (multi-node) compile without errors. Not used though.	2019-10-17 13:03:42 +03:00
jpekkila	ef94ab5b96	A small update to ctest	2019-10-17 13:02:41 +03:00
jpekkila	4fcf9d861f	More undeprecated/deprecated fixes	2019-10-15 19:46:57 +03:00
jpekkila	0865f0499b	Various improvements to the MPI-GPU implementation, but linking MPI libraries with both the host C-project and the core library seems to be a major pain. Currently the communication is done via gpu->cpu->cpu->gpu.	2019-10-15 19:32:16 +03:00
jpekkila	113be456d6	Undeprecated the wrong function in commit `b693c8a`	2019-10-15 18:11:07 +03:00
jpekkila	1ca089c163	New cmake option: MPI_ENABLED. Enables MPI functions on the device layer	2019-10-15 17:57:53 +03:00
jpekkila	0d02faa5f5	Working base for gathering, distributing and communicating halos with MPI	2019-10-15 17:39:26 +03:00
jpekkila	b11ef143eb	Moved a debug print further to reduce clutter	2019-10-15 17:38:29 +03:00
jpekkila	fd9dc7ca98	Added periodic boundconds to utils	2019-10-15 17:37:57 +03:00
jpekkila	ff1ad37047	Some small improvements to the utils library	2019-10-15 17:00:58 +03:00
jpekkila	46ad9da8c8	Pulled some stuff from the mpi branch	2019-10-15 17:00:44 +03:00
jpekkila	4ae9c74d9d	Added a function for randomizing vertex buffers (useful for testing)	2019-10-15 16:13:11 +03:00
jpekkila	37171689c8	Formatting	2019-10-15 16:12:44 +03:00
jpekkila	b693c8adb4	Undeprecated acDeviceLoadMesh and acDeviceStoreMesh, these are actually very nice to have	2019-10-15 16:12:31 +03:00
jpekkila	8d86ac6f9e	Started preparing the MPI version for benchmarks and added a solver-independent version of the verification functions to the utils library	2019-10-15 15:54:15 +03:00
jpekkila	08188f3f5b	is_valid is now consistently overloaded (parameter passed as a reference). Older CUDA compilers complained about this.	2019-10-14 21:18:21 +03:00
jpekkila	b667735906	Removed debug prints from the preprocessing script	2019-10-08 00:31:15 +03:00
jpekkila	44a86f5e80	acc: Removed debug prints, old code. Also the scope of the declarations made inside a for statement is now properly tracked	2019-10-08 00:20:57 +03:00
jpekkila	08f155cbec	Finetuning some error checks	2019-10-07 20:40:32 +03:00
jpekkila	ea4438f331	Adapted the old example of helical forcing with profiles to conform with the revised syntax	2019-10-07 19:43:25 +03:00
jpekkila	0cc5bdaa08	Added support for ScalarArrays back	2019-10-07 19:42:24 +03:00
jpekkila	5d4f47c3d2	Added overloads for vector in-place addition and subtraction	2019-10-07 19:40:54 +03:00
jpekkila	ba49e7e400	Replaced deprecated DCONST_INT calls with overloaded DCONST()	2019-10-07 19:40:27 +03:00
jpekkila	9c575f8059	Merge branch 'master' into acc_rewrite_20191002	2019-10-07 18:28:33 +03:00
jpekkila	ff12332f06	Clarified the syntax for real number literals. 1.0 is the same precision as AcReal, 1.0f is an explicit float and 1.0d is an explicit double.	2019-10-07 18:24:32 +03:00
jpekkila	ffb139883f	API_specification_and_user_manual.md edited online with Bitbucket	2019-10-07 15:22:26 +00:00
jpekkila	aa6c2b23d9	Built-in parameters are now added during compilation instead of defining them in CUDA sources. IMPORTANT: DCONST macro should no longer be used when accessing built-in variables. Now all uniforms are consistently accessed with the handle only	2019-10-07 17:39:27 +03:00
jpekkila	3fe7b62d3e	Removed the old accrevision directory	2019-10-07 17:37:09 +03:00

580 Commits