jpekkila
|
7e4212ddd9
|
Enabled the generation of API hooks for calling DSL functions (this was breaking compilation earlier)
|
2019-12-03 15:17:27 +02:00 |
|
jpekkila
|
5a6a3110df
|
Reformatted
|
2019-12-03 15:14:26 +02:00 |
|
jpekkila
|
f14e35620c
|
nvcc is now used to compile kernels only. All host code, including device.cc, MPI communication and the rest, is now compiled with the host C++ compiler. This should work around an nvcc/MPI bug on Puhti.
|
2019-12-03 15:12:17 +02:00 |
|
jpekkila
|
8bffb2a1d0
|
Fixed ambiguous logic in acNodeStoreVertexBufferWithOffset: halos of arbitrary GPUs no longer overwrite valid data in the computational domain of a neighboring GPU. Also disabled p2p transfers temporarily until I figure out a clean way to avoid cudaErrorPeerAccessAlreadyEnabled errors
|
2019-12-02 12:58:09 +02:00 |
|
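The fix in a nutshell: when a stored subvolume spans several devices, each device's copy window must be intersected with its own computational domain, so its halo never lands on a neighbor's interior. A minimal sketch of that clamping, with illustrative names (the real acNodeStoreVertexBufferWithOffset internals differ):

    /* A device owns interior indices [own_start, own_end) along the
     * decomposed axis; the caller requested [req_start, req_end). Only the
     * intersection may be stored from this device. */
    static void clamp_store_range(const int own_start, const int own_end,
                                  const int req_start, const int req_end,
                                  int* out_start, int* out_end)
    {
        *out_start = req_start > own_start ? req_start : own_start;
        *out_end   = req_end < own_end ? req_end : own_end;
        if (*out_end < *out_start)
            *out_end = *out_start; /* empty: this device contributes nothing */
    }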
jpekkila
|
0178d4788c
|
The core library now links to the CXX MPI library instead of the C one
|
2019-11-27 14:51:49 +02:00 |
|
jpekkila
|
ab539a98d6
|
Replaced old deprecated instances of DCONST_INT with DCONST
|
2019-11-27 13:48:42 +02:00 |
|
jpekkila
|
1270332f48
|
Fixed a small mistake in the last merge
|
2019-11-27 11:58:14 +02:00 |
|
Johannes Pekkila
|
3d35897601
|
Fixed: the structure holding an abstract syntax tree node (acc) was not properly zero-initialized
|
2019-11-27 09:16:32 +01:00 |
|
Johannes Pekkila
|
3eabf94f92
|
Merge branch 'master' of https://bitbucket.org/jpekkila/astaroth
|
2019-11-27 08:55:23 +01:00 |
|
jpekkila
|
5e3caf086e
|
The device id is now properly set when using MPI and there are multiple visible GPUs per node
|
2019-11-26 16:54:56 +02:00 |
|
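The usual pattern behind a fix like this is to select the GPU by the node-local rank rather than the world rank. A minimal sketch, assuming ranks are grouped per node with MPI_Comm_split_type (the helper name is illustrative):

    #include <mpi.h>
    #include <cuda_runtime.h>

    /* Pick one GPU per rank. MPI_Comm_split_type groups the ranks that share
     * a node, so node_rank counts ranks within this node only. */
    static void select_device(void)
    {
        int world_rank;
        MPI_Comm_rank(MPI_COMM_WORLD, &world_rank);

        MPI_Comm node_comm;
        MPI_Comm_split_type(MPI_COMM_WORLD, MPI_COMM_TYPE_SHARED, world_rank,
                            MPI_INFO_NULL, &node_comm);
        int node_rank;
        MPI_Comm_rank(node_comm, &node_rank);
        MPI_Comm_free(&node_comm);

        int device_count;
        cudaGetDeviceCount(&device_count);
        cudaSetDevice(node_rank % device_count); /* wrap if ranks > visible GPUs */
    }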
jpekkila
|
53695d66a3
|
Benchmarking now also prints out percentiles
|
2019-11-26 16:26:31 +02:00 |
|
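A nearest-rank percentile over sorted per-iteration timings is the simplest way to report this; a sketch (function name and units are assumptions, not the benchmark's actual code):

    #include <algorithm>
    #include <vector>

    /* Nearest-rank percentile of per-iteration timings in milliseconds.
     * Sorting a copy keeps the caller's measurement order intact. */
    static double percentile(std::vector<double> samples, const double p)
    {
        std::sort(samples.begin(), samples.end());
        const size_t i = static_cast<size_t>(p / 100.0 * (samples.size() - 1));
        return samples[i];
    }
    /* Usage: percentile(times, 50) for the median, percentile(times, 90), ... */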
jpekkila
|
0b0ccd697a
|
Added some explicit casts in get_neighbor (MPI) to fix warnings raised when compiling with older gcc
|
2019-11-20 10:18:10 +02:00 |
|
Miikka Vaisala
|
d3260edd2a
|
Can now visualize the magnetic field and streamlines, plus some other minor improvements.
|
2019-11-04 11:27:53 +08:00 |
|
Johannes Pekkila
|
981331e7d7
|
Benchmark results now written out to a file
|
2019-10-24 15:53:08 +02:00 |
|
Johannes Pekkila
|
4ffde83215
|
Set default values for benchmarking
|
2019-10-24 15:22:47 +02:00 |
|
Johannes Pekkila
|
8894b7c7d6
|
Added a function for getting the pid of a neighboring process when decomposing in 3D
|
2019-10-23 19:26:35 +02:00 |
|
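With a 3D decomposition the neighbor lookup reduces to modular arithmetic on the process grid. A sketch under the assumption of periodic wrapping and row-major rank ordering (type and function names are illustrative):

    /* pid and dir are process-grid coordinates and an offset such as
     * (-1, 0, 1); decomp holds the number of processes per axis. */
    typedef struct { int x, y, z; } GridPos;

    static int neighbor_pid(const GridPos pid, const GridPos dir, const GridPos decomp)
    {
        const int i = (pid.x + dir.x + decomp.x) % decomp.x; /* wraps periodically */
        const int j = (pid.y + dir.y + decomp.y) % decomp.y;
        const int k = (pid.z + dir.z + decomp.z) % decomp.z;
        return i + j * decomp.x + k * decomp.x * decomp.y;
    }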
Johannes Pekkila
|
474bdf185d
|
Cleaned up the MPI solution for the 3D decomp test
|
2019-10-23 12:33:46 +02:00 |
|
Johannes Pekkila
|
1d81333ff7
|
More concurrent kernels and MPI comm
|
2019-10-23 12:07:23 +02:00 |
|
Johannes Pekkila
|
04867334e7
|
Full integration step with MPI comms
|
2019-10-22 19:59:15 +02:00 |
|
Johannes Pekkila
|
870cd91b5f
|
Added the final MPI solution for the benchmark tests: RDMA is now used and I don't think we can go much faster with the current decomposition scheme. To get better scaling, we would probably have to switch to a 3D decomposition instead of the current simple 1D decomp
|
2019-10-22 19:28:35 +02:00 |
|
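The scaling argument behind that last remark: with 1D slabs each process exchanges two full n*n faces no matter how many processes there are, while 3D blocks shrink the exchanged surface as p grows. A back-of-the-envelope sketch (all numbers illustrative):

    #include <math.h>
    #include <stdio.h>

    /* Cells communicated per process for an n^3 grid with halo width r. */
    int main(void)
    {
        const double n = 512, r = 3;
        for (double p = 8; p <= 512; p *= 8) {
            const double slab  = 2 * r * n * n;       /* 1D: constant in p  */
            const double side  = n / cbrt(p);         /* 3D block edge      */
            const double block = 6 * r * side * side; /* 3D: shrinks with p */
            printf("p = %3.0f: 1D %.3g vs 3D %.3g cells\n", p, slab, block);
        }
        return 0;
    }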
jpekkila
|
3d7ad7c8f2
|
Code cleanup
|
2019-10-22 15:38:34 +03:00 |
|
jpekkila
|
64221c218d
|
Silenced some compiler warnings
|
2019-10-22 15:03:55 +03:00 |
|
Johannes Pekkila
|
e4a7cdcf1d
|
Added functions for packing and unpacking data on the device
|
2019-10-22 13:48:47 +02:00 |
|
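Packing here means copying a halo subvolume into one contiguous buffer so it can be handed to MPI as a single message; unpacking is the same loop with source and destination swapped. A hedged sketch of such a kernel (the actual Astaroth packing functions are not shown; names and the double element type are assumptions):

    #include <cuda_runtime.h>

    /* Pack a (dims.x * dims.y * dims.z) subvolume starting at offset of a
     * full (mm.x * mm.y * mm.z) mesh into a contiguous buffer. */
    __global__ void pack_kernel(const double* __restrict__ src, const int3 mm,
                                const int3 offset, const int3 dims,
                                double* __restrict__ packed)
    {
        const int i = blockIdx.x * blockDim.x + threadIdx.x;
        const int j = blockIdx.y * blockDim.y + threadIdx.y;
        const int k = blockIdx.z * blockDim.z + threadIdx.z;
        if (i >= dims.x || j >= dims.y || k >= dims.z)
            return;

        const int src_idx = (offset.x + i) + (offset.y + j) * mm.x +
                            (offset.z + k) * mm.x * mm.y;
        packed[i + j * dims.x + k * dims.x * dims.y] = src[src_idx];
    }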
Johannes Pekkila
|
915e1c7c14
|
Trying to overlap MPI communication with the computation of boundary conditions. However, NVIDIA seems to have left one important detail out of the CUDA-aware MPI documentation: CUDA streams do not appear to be supported with CUDA-aware MPI communication. So in the end the fastest solution might be old-school gpu->cpu->cpu->gpu MPI communication after all
|
2019-10-21 15:50:53 +02:00 |
|
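The gpu->cpu->cpu->gpu fallback is straightforward: stage through host buffers and synchronize the stream before MPI touches them. A sketch under the assumption of one peer per exchange (buffer names and the message tag are illustrative; pinned host memory makes the async copies effective):

    #include <mpi.h>
    #include <cuda_runtime.h>

    static void exchange_halo(const double* d_send, double* d_recv,
                              double* h_send, double* h_recv, const int count,
                              const int peer, cudaStream_t stream)
    {
        cudaMemcpyAsync(h_send, d_send, count * sizeof(double),
                        cudaMemcpyDeviceToHost, stream);
        cudaStreamSynchronize(stream); /* h_send must be complete before MPI reads it */

        MPI_Request reqs[2];
        MPI_Irecv(h_recv, count, MPI_DOUBLE, peer, 0, MPI_COMM_WORLD, &reqs[0]);
        MPI_Isend(h_send, count, MPI_DOUBLE, peer, 0, MPI_COMM_WORLD, &reqs[1]);
        MPI_Waitall(2, reqs, MPI_STATUSES_IGNORE);

        cudaMemcpyAsync(d_recv, h_recv, count * sizeof(double),
                        cudaMemcpyHostToDevice, stream);
    }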
jpekkila
|
f120343110
|
Bugfix: peer access was not disabled when a Node was destroyed, leading to a cudaErrorPeerAccessAlreadyEnabled error when creating new Nodes
|
2019-10-21 16:23:24 +03:00 |
|
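Peer access is enabled per device pair and outlives the Node that turned it on, so teardown has to disable it explicitly. A sketch of the symmetric cleanup, assuming devices are numbered 0..num_devices-1:

    #include <cuda_runtime.h>

    static void disable_peer_access(const int num_devices)
    {
        for (int i = 0; i < num_devices; ++i) {
            cudaSetDevice(i);
            for (int j = 0; j < num_devices; ++j) {
                int can_access = 0;
                cudaDeviceCanAccessPeer(&can_access, i, j);
                if (can_access)
                    cudaDeviceDisablePeerAccess(j); /* a NotEnabled error here is harmless */
            }
        }
    }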
Johannes Pekkila
|
7b475b6dee
|
Better MPI synchronization
|
2019-10-18 11:50:22 +02:00 |
|
jpekkila
|
f3cb6e7049
|
Removed old unused tokens from the DSL grammar
|
2019-10-18 02:14:19 +03:00 |
|
jpekkila
|
0f5acfbb33
|
Merge branch 'master' of https://bitbucket.org/jpekkila/astaroth
|
2019-10-18 02:06:15 +03:00 |
|
jpekkila
|
7c79a98cdc
|
Added support for various binary operations (>=, <=, /=, etc.). Bitwise operators | and & are now also allowed
|
2019-10-18 01:52:14 +03:00 |
|
Johannes Pekkila
|
155d369888
|
MPI communication now 10x faster
|
2019-10-17 22:39:57 +02:00 |
|
jpekkila
|
26bbfa089d
|
Better multi-node communication: fire and forget.
|
2019-10-17 18:17:37 +03:00 |
|
jpekkila
|
3d852e5082
|
Added timing to the MPI benchmark
|
2019-10-17 17:43:54 +03:00 |
|
jpekkila
|
e0a631d81a
|
Added the hires timer to utils
|
2019-10-17 17:43:34 +03:00 |
|
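A high-resolution timer of this kind typically wraps CLOCK_MONOTONIC; a sketch with illustrative names (not necessarily the exact utils API):

    #include <time.h>

    typedef struct { struct timespec t; } Timer;

    static void timer_reset(Timer* timer) { clock_gettime(CLOCK_MONOTONIC, &timer->t); }

    /* Milliseconds elapsed since the timer was reset. */
    static double timer_diff_ms(const Timer start)
    {
        struct timespec now;
        clock_gettime(CLOCK_MONOTONIC, &now);
        return (now.tv_sec - start.t.tv_sec) * 1e3 +
               (now.tv_nsec - start.t.tv_nsec) * 1e-6;
    }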
jpekkila
|
588a94c772
|
Added more MPI stuff. Multi-node GPU-GPU communication with GPUDirect RDMA should now work. Device memory is now also allocated as unified memory by default, since this makes MPI communication simpler when RDMA is not supported. This does not affect Astaroth in any other way, since different devices use different portions of the memory space and we continue to manage memory transfers manually.
|
2019-10-17 16:09:05 +03:00 |
|
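For reference, switching to unified memory only changes the allocation call: the returned pointer is then valid on the host, on any device, and in MPI calls alike, while explicit copies keep working as before. A minimal sketch (function name and element type are assumptions):

    #include <cuda_runtime.h>

    static double* alloc_field(const size_t count)
    {
        double* ptr = NULL;
        cudaMallocManaged(&ptr, count * sizeof(double), cudaMemAttachGlobal);
        return ptr; /* usable by host code, kernels on any device, and MPI */
    }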
jpekkila
|
0e88d6c339
|
Marked some internal functions static
|
2019-10-17 14:41:44 +03:00 |
|
jpekkila
|
7390d53f79
|
Added missing extern "C" declarations to verification.h
|
2019-10-17 14:41:13 +03:00 |
|
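Without the guards, C and C++ translation units disagree on the linkage of the declared symbols and linking fails. The standard form (the declaration shown is illustrative, not a copy of verification.h):

    #ifdef __cplusplus
    extern "C" {
    #endif

    /* C-linkage declarations, e.g. an illustrative verification call: */
    bool acVerifyMesh(const AcMesh model, const AcMesh candidate);

    #ifdef __cplusplus
    } /* extern "C" */
    #endif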
jpekkila
|
f1e988ba6a
|
Added stuff to the device layer for testing GPU-GPU MPI. This is a quick and dirty solution primarily meant for benchmarking/verification. Figuring out what the MPI interface should look like is more challenging and not the priority right now
|
2019-10-17 14:40:53 +03:00 |
|
jpekkila
|
bb9e65a741
|
AC_DEFAULT_CONFIG now propagated to projects that link to astaroth utils
|
2019-10-17 13:05:17 +03:00 |
|
jpekkila
|
859195eda4
|
exampleproject no longer compiled with astaroth utils
|
2019-10-17 13:04:39 +03:00 |
|
jpekkila
|
65a2d47ef7
|
Made grid.cu (multi-node) compile without errors. It is not used yet, though.
|
2019-10-17 13:03:42 +03:00 |
|
jpekkila
|
ef94ab5b96
|
A small update to ctest
|
2019-10-17 13:02:41 +03:00 |
|
jpekkila
|
4fcf9d861f
|
More deprecation/undeprecation fixes
|
2019-10-15 19:46:57 +03:00 |
|
jpekkila
|
0865f0499b
|
Various improvements to the MPI-GPU implementation, but linking MPI libraries with both the host C project and the core library seems to be a major pain. Currently the communication is done via gpu->cpu->cpu->gpu.
|
2019-10-15 19:32:16 +03:00 |
|
jpekkila
|
113be456d6
|
Undeprecated the wrong function in commit b693c8a
|
2019-10-15 18:11:07 +03:00 |
|
jpekkila
|
1ca089c163
|
New CMake option: MPI_ENABLED. Enables MPI functions on the device layer
|
2019-10-15 17:57:53 +03:00 |
|
jpekkila
|
0d02faa5f5
|
Working base for gathering, distributing and communicating halos with MPI
|
2019-10-15 17:39:26 +03:00 |
|
jpekkila
|
b11ef143eb
|
Moved a debug print further to reduce clutter
|
2019-10-15 17:38:29 +03:00 |
|
jpekkila
|
fd9dc7ca98
|
Added periodic boundconds to utils
|
2019-10-15 17:37:57 +03:00 |
|
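The core of a periodic boundary condition is an index wrap: every ghost cell mirrors the interior cell one full domain away. A sketch, assuming ng ghost cells per side and nn interior cells along an axis:

    /* Map any mesh index (ghost zones included) to the interior index holding
     * its periodic image. Mesh indices run over [0, nn + 2*ng). */
    static int wrap_periodic(const int i, const int ng, const int nn)
    {
        return ng + (i - ng + nn) % nn;
    }
    /* E.g. with ng = 3, nn = 128: ghost index 0 maps to interior index 128. */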
jpekkila
|
ff1ad37047
|
Some small improvements to the utils library
|
2019-10-15 17:00:58 +03:00 |
|
jpekkila
|
46ad9da8c8
|
Pulled some stuff from the mpi branch
|
2019-10-15 17:00:44 +03:00 |
|