Commit Graph

614 Commits

Author SHA1 Message Date
jpekkila
2ab605e125 Added the default testcase for MPI benchmarks 2019-12-05 18:14:36 +02:00
jpekkila
d136834219 Re-enabled and updated MPI integration with the proper synchronization from earlier commits, removed old stuff. Should now work and be ready for benchmarks 2019-12-05 16:48:45 +02:00
jpekkila
f16826f2cd Removed old code 2019-12-05 16:40:48 +02:00
jpekkila
9f4742bafe Fixed the UCX warning from the last commit. The indexing of MPI_Waitall was wrong, and UCX also required that MPI_Isend requests are "waited" on even though they should implicitly complete at the same time as the matching MPI_Irecv 2019-12-05 16:40:30 +02:00
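
A minimal sketch of the pattern this fix implies, assuming one receive and one send request per neighbor collected into a single array (NUM_NEIGHBORS, nbr and the buffers are illustrative names, not the actual Astaroth code):

    // Wait on both the MPI_Irecv and the MPI_Isend requests; UCX-backed MPI
    // warns if any nonblocking request is left uncompleted at MPI_Finalize.
    MPI_Request reqs[2 * NUM_NEIGHBORS];
    for (int i = 0; i < NUM_NEIGHBORS; ++i) {
        MPI_Irecv(recv_buf[i], count, MPI_DOUBLE, nbr[i], tag, MPI_COMM_WORLD, &reqs[2 * i + 0]);
        MPI_Isend(send_buf[i], count, MPI_DOUBLE, nbr[i], tag, MPI_COMM_WORLD, &reqs[2 * i + 1]);
    }
    MPI_Waitall(2 * NUM_NEIGHBORS, reqs, MPI_STATUSES_IGNORE);
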
jpekkila
e47cfad6b5 MPI now compiles and runs on Puhti; basic verification test with boundary transfers OK. Gives an "UCX WARN object 0x2fa7780 was not returned to mpool ucp_requests" warning, though, which seems to indicate that not all asynchronous MPI calls finished before MPI_Finalize 2019-12-05 16:17:17 +02:00
jpekkila
9d70a29ae0 Now the minimum cmake version is 3.9. This is required for proper CUDA & MPI support. Older versions of cmake are very buggy when compiling CUDA and it's a pain in the neck to try to work around all the quirks. 2019-12-05 15:35:51 +02:00
jpekkila
e99a428dec OpenMP is now properly linked with the standalone without propagating it to nvcc (which would cause an error) 2019-12-05 15:30:48 +02:00
jpekkila
9adb9dc38a Disabled MPI integration temporarily and enabled verification for MPI tests 2019-12-04 15:11:40 +02:00
jpekkila
6a250f0572 Rewrote core CMakeLists.txt for cmake versions with proper CUDA & MPI support (3.9+) 2019-12-04 15:09:38 +02:00
jpekkila
0ea2fa9337 Cleaner MPI linking with the core library. Requires cmake 3.9+ though, might have to modify later to work with older versions. 2019-12-04 13:49:38 +02:00
jpekkila
6e63411170 Moved the definition of AC_DEFAULT_CONFIG to the root-level CMakeLists.txt. Now should be visible throughout the project. 2019-12-03 18:42:49 +02:00
jpekkila
f97e5cb77c Fixed parts which caused a shadowing warning (same variable name used for different variables in the same scope) 2019-12-03 18:41:08 +02:00
jpekkila
04e27e85b2 Removed MPI from the core library dependencies: instead one should use the appropriate mpi compiler for compiling host code by passing something like -DCMAKE_C_COMPILER=/appl/opt/openmpi/3.1.3-cuda/gcc/7.3.0/bin/mpicc -DCMAKE_CXX_COMPILER=/appl/opt/openmpi/3.1.3-cuda/gcc/7.3.0/bin/mpicxx to cmake 2019-12-03 18:40:15 +02:00
jpekkila
c273fcf110 More rigorous error checking 2019-12-03 18:38:15 +02:00
jpekkila
49581e8eaa Added forward declaration for yyparse to avoid warnings with some compilers when compiling acc 2019-12-03 18:36:21 +02:00
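
The forward declaration in question is typically a single prototype for the bison-generated parser entry point; for reference (the default signature, which may differ if parse parameters are used):

    int yyparse(void);  // silences implicit-declaration warnings in files that call the parser
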
jpekkila
825aa0efaa More warning flags for host code in the core library + small misc changes 2019-12-03 16:58:20 +02:00
jpekkila
316d44b843 Fixed an out-of-bounds error with auto-optimization (introduced in the last few commits) 2019-12-03 16:04:44 +02:00
jpekkila
7e4212ddd9 Enabled the generation of API hooks for calling DSL functions (was messing up compilation earlier) 2019-12-03 15:17:27 +02:00
jpekkila
5a6a3110df Reformatted 2019-12-03 15:14:26 +02:00
jpekkila
f14e35620c Now nvcc is used to compile kernels only. All host code, incl. device.cc, MPI communication and others, is now compiled with the host C++ compiler. This should work around an nvcc/MPI bug on Puhti. 2019-12-03 15:12:17 +02:00
jpekkila
8bffb2a1d0 Fixed ambiguous logic in acNodeStoreVertexBufferWithOffset: now halos of arbitrary GPUs do not overwrite valid data from the computational domain of a neighboring GPU. Also disabled p2p transfers temporarily until I figure out a clean way to avoid cudaErrorPeerAccessAlreadyEnabled errors 2019-12-02 12:58:09 +02:00
jpekkila
0178d4788c The core library now links to the CXX MPI library instead of the C one 2019-11-27 14:51:49 +02:00
jpekkila
ab539a98d6 Replaced old deprecated instances of DCONST_INT with DCONST 2019-11-27 13:48:42 +02:00
jpekkila
1270332f48 Fixed a small mistake in the last merge 2019-11-27 11:58:14 +02:00
Johannes Pekkila
3d35897601 The structure holding an abstract syntax tree node (acc) was not properly initialized to 0, fixed 2019-11-27 09:16:32 +01:00
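
For context, malloc leaves struct fields indeterminate; zero-initialization needs calloc or an explicit memset. A generic illustration (ASTNode is a placeholder name, not necessarily the acc type):

    ASTNode* node = (ASTNode*)calloc(1, sizeof(ASTNode));  // all fields start as 0/NULL
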
Johannes Pekkila
3eabf94f92 Merge branch 'master' of https://bitbucket.org/jpekkila/astaroth 2019-11-27 08:55:23 +01:00
jpekkila
5e3caf086e Device id is now properly set when using MPI and there are multiple visible GPUs per node 2019-11-26 16:54:56 +02:00
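
A common way to pick the device, sketched under the assumption that MPI ranks are placed contiguously within a node (not necessarily how Astaroth selects it):

    int rank, ndevices;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    cudaGetDeviceCount(&ndevices);
    cudaSetDevice(rank % ndevices);  // map each local rank to a distinct visible GPU
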
jpekkila
53695d66a3 Benchmarking now prints out also percentiles 2019-11-26 16:26:31 +02:00
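
As a reminder of what a percentile over timing samples looks like, a simple sketch that sorts a copy of the measurements and indexes into it (not necessarily the exact interpolation used here):

    #include <algorithm>
    #include <vector>

    static double
    percentile(std::vector<double> samples, const double p) // p in [0, 100]
    {
        std::sort(samples.begin(), samples.end());
        const size_t idx = static_cast<size_t>((p / 100.0) * (samples.size() - 1));
        return samples[idx];
    }
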
jpekkila
0b0ccd697a Added some explicit casts in get_neighbor (MPI) to fix warnings raised when compiling with older gcc 2019-11-20 10:18:10 +02:00
Miikka Vaisala
d3260edd2a Can now picture the magnetic field and streamlines. And some other minor improvements. 2019-11-04 11:27:53 +08:00
Johannes Pekkila
981331e7d7 Benchmark results now written out to a file 2019-10-24 15:53:08 +02:00
Johannes Pekkila
4ffde83215 Set default values for benchmarking 2019-10-24 15:22:47 +02:00
Johannes Pekkila
8894b7c7d6 Added a function for getting pid of a neighboring process when decomposing in 3D 2019-10-23 19:26:35 +02:00
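
Such a lookup is usually modular arithmetic on the 3D process grid coordinates; a sketch with illustrative names (the real function may differ):

    // Periodic process grid of size (nx, ny, nz), pid = x + y*nx + z*nx*ny.
    static int
    neighbor_pid(const int pid, const int dx, const int dy, const int dz,
                 const int nx, const int ny, const int nz)
    {
        const int x = pid % nx;
        const int y = (pid / nx) % ny;
        const int z = pid / (nx * ny);
        return ((x + dx + nx) % nx) + ((y + dy + ny) % ny) * nx + ((z + dz + nz) % nz) * nx * ny;
    }
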
Johannes Pekkila
474bdf185d Cleaned up the MPI solution for 3D decomp test 2019-10-23 12:33:46 +02:00
Johannes Pekkila
1d81333ff7 More concurrent kernels and MPI comm 2019-10-23 12:07:23 +02:00
Johannes Pekkila
04867334e7 Full integration step with MPI comms 2019-10-22 19:59:15 +02:00
Johannes Pekkila
870cd91b5f Added the final MPI solution for the benchmark tests: RDMA is now used and I don't think we can go much faster with the current decomposition scheme. To get better scaling, we would probably have to switch to a 3D decomposition instead of the current simple 1D decomp 2019-10-22 19:28:35 +02:00
jpekkila
3d7ad7c8f2 Code cleanup 2019-10-22 15:38:34 +03:00
jpekkila
64221c218d Made some warnings go away 2019-10-22 15:03:55 +03:00
Johannes Pekkila
e4a7cdcf1d Added functions for packing and unpacking data on the device 2019-10-22 13:48:47 +02:00
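
Packing copies a strided halo region into a contiguous buffer so it can be handed to MPI in one piece (unpacking is the mirror image). A minimal CUDA sketch with made-up names, not the actual Astaroth kernels:

    // Copy a dimx x dimy x dimz sub-volume starting at (x0, y0, z0) of a field
    // with row stride mx and slab stride mx*my into a dense buffer.
    __global__ void
    pack_halo(const double* __restrict__ field, double* __restrict__ buf,
              const int x0, const int y0, const int z0,
              const int dimx, const int dimy, const int dimz,
              const int mx, const int my)
    {
        const int i = blockIdx.x * blockDim.x + threadIdx.x;
        const int j = blockIdx.y * blockDim.y + threadIdx.y;
        const int k = blockIdx.z * blockDim.z + threadIdx.z;
        if (i >= dimx || j >= dimy || k >= dimz)
            return;

        const size_t src = (x0 + i) + (size_t)(y0 + j) * mx + (size_t)(z0 + k) * mx * my;
        const size_t dst = i + (size_t)j * dimx + (size_t)k * dimx * dimy;
        buf[dst] = field[src];
    }
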
Johannes Pekkila
915e1c7c14 Trying to overlap MPI communication with computation of boundary conditions. However, NVIDIA seemed to forget one important detail in the documentation for CUDA-aware MPI: it looks like CUDA streams are not supported with CUDA-aware MPI communication. So in the end the fastest solution might be to use old-school gpu->cpu->cpu->gpu MPI communication after all 2019-10-21 15:50:53 +02:00
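
The staged fallback mentioned here moves the data through pinned host buffers; a rough sketch of one exchange (buffer and neighbor names are placeholders):

    // device -> pinned host -> MPI -> pinned host -> device
    cudaMemcpyAsync(host_send, dev_send, bytes, cudaMemcpyDeviceToHost, stream);
    cudaStreamSynchronize(stream); // the host buffer must be complete before MPI reads it

    MPI_Request reqs[2];
    MPI_Irecv(host_recv, count, MPI_DOUBLE, nbr, tag, MPI_COMM_WORLD, &reqs[0]);
    MPI_Isend(host_send, count, MPI_DOUBLE, nbr, tag, MPI_COMM_WORLD, &reqs[1]);
    MPI_Waitall(2, reqs, MPI_STATUSES_IGNORE);

    cudaMemcpyAsync(dev_recv, host_recv, bytes, cudaMemcpyHostToDevice, stream);
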
jpekkila
f120343110 Bugfix: peer access was not disabled when Node was destroyed, leading to cudaErrorPeerAccessAlreadyEnabled error when creating new Nodes 2019-10-21 16:23:24 +03:00
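
For context, cudaDeviceEnablePeerAccess returns cudaErrorPeerAccessAlreadyEnabled if the same device pair is enabled twice, so teardown should mirror setup; an illustrative pairing (not the actual Node code):

    cudaSetDevice(dev_a);
    cudaDeviceEnablePeerAccess(dev_b, 0);   // when the Node is created
    // ... peer-to-peer copies ...
    cudaSetDevice(dev_a);
    cudaDeviceDisablePeerAccess(dev_b);     // when the Node is destroyed
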
Johannes Pekkila
7b475b6dee Better MPI synchronization 2019-10-18 11:50:22 +02:00
jpekkila
f3cb6e7049 Removed old unused tokens from the DSL grammar 2019-10-18 02:14:19 +03:00
jpekkila
0f5acfbb33 <q:::qqq!!!:::q:[2~:wqMer§§gccc:qq[2~: branch 'master' of
https://bitbucket.org/jpekkila/astaroth:q Z
bin/sh: 1: !:: not .>.Merge branch 'master' of https://bitbucket.org/jpekkila/astaroth
2019-10-18 02:06:15 +03:00
jpekkila
7c79a98cdc Added support for various binary operations (>=, <=, /= etc). Also bitwise operators | and & are now allowed 2019-10-18 01:52:14 +03:00
Johannes Pekkila
155d369888 MPI communication now 10x faster 2019-10-17 22:39:57 +02:00
jpekkila
26bbfa089d Better multi-node communication: fire and forget. 2019-10-17 18:17:37 +03:00
jpekkila
3d852e5082 Added timing to the MPI benchmark 2019-10-17 17:43:54 +03:00
jpekkila
e0a631d81a Added the hires timer to utils 2019-10-17 17:43:34 +03:00
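
A high-resolution timer of this kind is typically a thin wrapper around a monotonic clock; an illustrative sketch (the actual utility may differ):

    #include <chrono>

    using Timer = std::chrono::steady_clock::time_point;

    static Timer timer_start(void) { return std::chrono::steady_clock::now(); }

    static double
    timer_diff_ms(const Timer start)
    {
        const auto end = std::chrono::steady_clock::now();
        return std::chrono::duration<double, std::milli>(end - start).count();
    }
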