jpekkila | 5ec2f6ad75 | Better wording in config_loader.c | 2019-12-14 07:23:25 +02:00
jpekkila | 164d11bfca | Removed flush-to-zero flags from kernel compilation. No significant effect on performance but may affect accuracy in some cases | 2019-12-14 07:22:14 +02:00
jpekkila | 6b38ef461a | Puhti GPUDirect fails for some reason if the code is linked against the cuda library instead of cudart | 2019-12-11 17:26:21 +02:00
jpekkila | a1a2d838ea | Merge branch 'master' of https://bitbucket.org/jpekkila/astaroth | 2019-12-08 23:22:51 +02:00
jpekkila | 752f44b0a7 | Second attempt at getting the Bitbucket Pipelines build to compile | 2019-12-08 23:22:33 +02:00
jpekkila | 420f8b9e06 | MPI benchmark now writes out the 95th percentile instead of average running time | 2019-12-08 23:12:23 +02:00
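For context on the statistic mentioned in the commit above, here is a minimal sketch of picking the 95th percentile from a set of measured iteration times. The function and variable names are illustrative only, not the actual benchmark code.

```c
#include <stdio.h>
#include <stdlib.h>

/* Comparison function for qsort on doubles (ascending) */
static int
compare_doubles(const void* a, const void* b)
{
    const double x = *(const double*)a;
    const double y = *(const double*)b;
    return (x > y) - (x < y);
}

/* Returns the 95th percentile of the measured running times (milliseconds).
 * Hypothetical helper: the real benchmark may compute this differently. */
static double
percentile_95(double* times_ms, const size_t num_samples)
{
    qsort(times_ms, num_samples, sizeof(times_ms[0]), compare_doubles);
    const size_t idx = (size_t)(0.95 * (double)(num_samples - 1));
    return times_ms[idx];
}

int
main(void)
{
    double samples[] = {1.2, 0.9, 1.1, 5.3, 1.0, 1.05, 0.95, 1.3, 1.15, 1.0};
    printf("95th percentile: %g ms\n",
           percentile_95(samples, sizeof(samples) / sizeof(samples[0])));
    return 0;
}
```

Reporting a high percentile instead of the mean makes the benchmark far less sensitive to occasional slow outliers, e.g. from network jitter.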
jpekkila | 90f85069c6 | The Bitbucket Pipelines build fails because the CUDA include directory does not seem to be on the include path for some reason. This is an attempted fix | 2019-12-08 23:08:45 +02:00
jpekkila | 2ab605e125 | Added the default testcase for MPI benchmarks | 2019-12-05 18:14:36 +02:00
jpekkila | d136834219 | Re-enabled and updated MPI integration with the proper synchronization from earlier commits, removed old stuff. Should now work and be ready for benchmarks | 2019-12-05 16:48:45 +02:00
jpekkila | f16826f2cd | Removed old code | 2019-12-05 16:40:48 +02:00
jpekkila | 9f4742bafe | Fixed the UCX warning from the last commit. The indexing passed to MPI_Waitall was wrong, and UCX also requires that MPI_Isend requests are explicitly waited on, even though they should complete implicitly at the same time as the matching MPI_Irecv | 2019-12-05 16:40:30 +02:00
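A minimal sketch of the synchronization pattern described in the commit above: every MPI_Isend request is collected alongside the MPI_Irecv requests and the combined array, with the correct count, is passed to a single MPI_Waitall. The buffer layout and the 26-neighbor bound are assumptions for illustration, not the actual Astaroth code.

```c
#include <mpi.h>

/* Hypothetical halo-exchange sketch: both the Isend and the Irecv requests
 * are waited on, so no request is left dangling at MPI_Finalize. */
void
exchange_halos(double* send_bufs[], double* recv_bufs[], const int counts[],
               const int neighbors[], const int num_neighbors)
{
    /* Two requests per neighbor: [0, n) receives, [n, 2n) sends */
    MPI_Request requests[2 * 26]; /* assuming at most 26 neighbors in 3D */

    for (int i = 0; i < num_neighbors; ++i)
        MPI_Irecv(recv_bufs[i], counts[i], MPI_DOUBLE, neighbors[i], 0,
                  MPI_COMM_WORLD, &requests[i]);

    for (int i = 0; i < num_neighbors; ++i)
        MPI_Isend(send_bufs[i], counts[i], MPI_DOUBLE, neighbors[i], 0,
                  MPI_COMM_WORLD, &requests[num_neighbors + i]);

    /* Wait on all 2 * num_neighbors requests, not just the receives */
    MPI_Waitall(2 * num_neighbors, requests, MPI_STATUSES_IGNORE);
}
```

Leaving send requests unwaited is exactly the kind of dangling request that UCX flags with the "was not returned to mpool" warning quoted in the neighbouring commits.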
jpekkila | e47cfad6b5 | MPI now compiles and runs on Puhti, basic verification test with boundary transfers OK. Still gives an "UCX WARN object 0x2fa7780 was not returned to mpool ucp_requests" warning, which seems to indicate that not all asynchronous MPI calls finished before MPI_Finalize | 2019-12-05 16:17:17 +02:00
jpekkila | 9d70a29ae0 | Now the minimum cmake version is 3.9. This is required for proper CUDA & MPI support. Older versions of cmake are very buggy when compiling cuda and it's a pain in the neck to try and work around all the quirks. | 2019-12-05 15:35:51 +02:00
jpekkila | e99a428dec | OpenMP is now properly linked with the standalone without propagating it to nvcc (which would cause an error) | 2019-12-05 15:30:48 +02:00
jpekkila | 9adb9dc38a | Disabled MPI integration temporarily and enabled verification for MPI tests | 2019-12-04 15:11:40 +02:00
jpekkila | 6a250f0572 | Rewrote core CMakeLists.txt for cmake versions with proper CUDA & MPI support (3.9+) | 2019-12-04 15:09:38 +02:00
jpekkila | 0ea2fa9337 | Cleaner MPI linking with the core library. Requires cmake 3.9+ though, might have to modify later to work with older versions. | 2019-12-04 13:49:38 +02:00
jpekkila | 6e63411170 | Moved the definition of AC_DEFAULT_CONFIG to the root-level CMakeLists.txt. Now should be visible throughout the project. | 2019-12-03 18:42:49 +02:00
jpekkila | f97e5cb77c | Fixed parts which caused a shadowing warning (same variable name used for different variables in the same scope) | 2019-12-03 18:41:08 +02:00
jpekkila | 04e27e85b2 | Removed MPI from the core library dependencies: instead one should use the appropriate MPI compiler for compiling host code by passing something like -DCMAKE_C_COMPILER=/appl/opt/openmpi/3.1.3-cuda/gcc/7.3.0/bin/mpicc -DCMAKE_CXX_COMPILER=/appl/opt/openmpi/3.1.3-cuda/gcc/7.3.0/bin/mpicxx to cmake | 2019-12-03 18:40:15 +02:00
jpekkila | c273fcf110 | More rigorous error checking | 2019-12-03 18:38:15 +02:00
jpekkila | 49581e8eaa | Added forward declaration for yyparse to avoid warnings with some compilers when compiling acc | 2019-12-03 18:36:21 +02:00
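For context, the usual fix for the yyparse warning mentioned above is a declaration like the following before the parser is first called (a generic sketch, not the exact code added to acc):

```c
/* Forward declaration of the Bison/Yacc-generated parser entry point,
 * so callers built with strict warning flags do not trigger
 * implicit-declaration warnings. */
int yyparse(void);
```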
jpekkila | 825aa0efaa | More warning flags for host code in the core library + small misc changes | 2019-12-03 16:58:20 +02:00
jpekkila | 316d44b843 | Fixed an out-of-bounds error with auto-optimization (introduced in the last few commits) | 2019-12-03 16:04:44 +02:00
jpekkila | 7e4212ddd9 | Enabled the generation of API hooks for calling DSL functions (was interfering with compilation earlier) | 2019-12-03 15:17:27 +02:00
jpekkila | 5a6a3110df | Reformatted | 2019-12-03 15:14:26 +02:00
jpekkila | f14e35620c | Now nvcc is used to compile kernels only. All host code, including device.cc and the MPI communication code, is now compiled with the host C++ compiler. This should work around an nvcc/MPI bug on Puhti. | 2019-12-03 15:12:17 +02:00
jpekkila | 8bffb2a1d0 | Fixed ambiguous logic in acNodeStoreVertexBufferWithOffset, now halos of arbitrary GPUs do not overwrite valid data from the computational domain of a neighboring GPU. Also disabled p2p transfers temporarily until I figure out a clean way to avoid cudaErrorPeerAccessAlreadyEnabled errors | 2019-12-02 12:58:09 +02:00
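As a rough illustration of the ambiguity being fixed (not the actual acNodeStoreVertexBufferWithOffset code): when several GPUs each own a slab of the computational domain, a per-device store can be clamped to the intersection of the requested range and that device's own slab, so its halo cells never overwrite a neighbor's valid interior data. The types and names below are hypothetical.

```c
/* Hypothetical sketch: clamp a requested copy range [req.start, req.end)
 * to the interior slab [owned.start, owned.end) held by one device, so that
 * halo zones are never written back over a neighbor's valid data. */
typedef struct {
    int start, end; /* half-open range along the decomposed axis */
} Range;

static Range
clamp_to_owned(const Range requested, const Range owned)
{
    Range r;
    r.start = requested.start > owned.start ? requested.start : owned.start;
    r.end   = requested.end   < owned.end   ? requested.end   : owned.end;
    if (r.end < r.start)
        r.end = r.start; /* empty range: nothing to store from this device */
    return r;
}
```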
jpekkila | 0178d4788c | The core library now links to the CXX MPI library instead of the C one | 2019-11-27 14:51:49 +02:00
jpekkila | ab539a98d6 | Replaced old deprecated instances of DCONST_INT with DCONST | 2019-11-27 13:48:42 +02:00
jpekkila | 1270332f48 | Fixed a small mistake in the last merge | 2019-11-27 11:58:14 +02:00
Johannes Pekkila | 3d35897601 | The structure holding an abstract syntax tree node in acc was not properly zero-initialized; fixed | 2019-11-27 09:16:32 +01:00
Johannes Pekkila | 3eabf94f92 | Merge branch 'master' of https://bitbucket.org/jpekkila/astaroth | 2019-11-27 08:55:23 +01:00
jpekkila | 5e3caf086e | Device id is now properly set when using MPI and there are multiple visible GPUs per node | 2019-11-26 16:54:56 +02:00
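A minimal sketch of the usual pattern for selecting a device when several GPUs are visible per node, assuming ranks are placed consecutively within a node; the real code may use a different mapping.

```c
#include <mpi.h>
#include <cuda_runtime.h>

/* Hypothetical device-selection sketch: map each MPI rank to one of the
 * GPUs visible on its node by taking the rank modulo the device count. */
static void
select_device(void)
{
    int rank = 0, num_devices = 0;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    cudaGetDeviceCount(&num_devices);
    cudaSetDevice(rank % num_devices);
}
```

A more robust variant derives a node-local rank via MPI_Comm_split_type with MPI_COMM_TYPE_SHARED before taking the modulo, which also works when ranks are not packed node by node.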
jpekkila | 53695d66a3 | Benchmarking now also prints out percentiles | 2019-11-26 16:26:31 +02:00
jpekkila | 0b0ccd697a | Added some explicit casts in get_neighbor (MPI) to fix warnings raised when compiling with older gcc | 2019-11-20 10:18:10 +02:00
Miikka Vaisala | d3260edd2a | Can now visualize the magnetic field and streamlines, plus some other minor improvements | 2019-11-04 11:27:53 +08:00
Johannes Pekkila | 981331e7d7 | Benchmark results now written out to a file | 2019-10-24 15:53:08 +02:00
Johannes Pekkila | 4ffde83215 | Set default values for benchmarking | 2019-10-24 15:22:47 +02:00
Johannes Pekkila | 8894b7c7d6 | Added a function for getting the process id (rank) of a neighboring process when decomposing in 3D | 2019-10-23 19:26:35 +02:00
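A hedged sketch of what such a lookup typically does, assuming a periodic 3D process grid; the grid dimensions and the linearization order here are assumptions, not necessarily the ones used in Astaroth.

```c
/* Hypothetical neighbor lookup for a periodic 3D decomposition:
 * the process grid has dimensions (nx, ny, nz) and rank = x + y*nx + z*nx*ny. */
static int
neighbor_rank(const int rank, const int dx, const int dy, const int dz,
              const int nx, const int ny, const int nz)
{
    const int x = rank % nx;
    const int y = (rank / nx) % ny;
    const int z = rank / (nx * ny);

    /* Periodic wrapping (the extra + n keeps the result non-negative) */
    const int xn = (x + dx + nx) % nx;
    const int yn = (y + dy + ny) % ny;
    const int zn = (z + dz + nz) % nz;

    return xn + yn * nx + zn * nx * ny;
}
```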
Johannes Pekkila | 474bdf185d | Cleaned up the MPI solution for 3D decomp test | 2019-10-23 12:33:46 +02:00
Johannes Pekkila | 1d81333ff7 | More concurrent kernels and MPI comm | 2019-10-23 12:07:23 +02:00
Johannes Pekkila | 04867334e7 | Full integration step with MPI comms | 2019-10-22 19:59:15 +02:00
Johannes Pekkila | 870cd91b5f | Added the final MPI solution for the benchmark tests: RDMA is now used and I don't think we can go much faster with the current decomposition scheme. To get better scaling, we would probably have to switch to a 3D decomposition instead of the current simple 1D decomposition | 2019-10-22 19:28:35 +02:00
jpekkila | 3d7ad7c8f2 | Code cleanup | 2019-10-22 15:38:34 +03:00
jpekkila | 64221c218d | Made some warnings go away | 2019-10-22 15:03:55 +03:00
Johannes Pekkila | e4a7cdcf1d | Added functions for packing and unpacking data on the device | 2019-10-22 13:48:47 +02:00
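A rough sketch of what device-side packing of a halo sub-block into a contiguous send buffer can look like; the field layout, grid dimensions, and names are assumptions, not the actual Astaroth kernels.

```cuda
#include <cuda_runtime.h>

/* Hypothetical packing kernel: copy a (dimx x dimy x dimz) sub-block starting
 * at (x0, y0, z0) of a field with leading extents mx and my (x fastest) into
 * a contiguous buffer suitable for a single MPI send. */
__global__ void
pack_subblock(const double* __restrict__ field, double* __restrict__ packed,
              const int x0, const int y0, const int z0,
              const int dimx, const int dimy, const int dimz,
              const int mx, const int my)
{
    const int i = blockIdx.x * blockDim.x + threadIdx.x;
    const int j = blockIdx.y * blockDim.y + threadIdx.y;
    const int k = blockIdx.z * blockDim.z + threadIdx.z;
    if (i >= dimx || j >= dimy || k >= dimz)
        return;

    const size_t src = (size_t)(x0 + i) + (size_t)(y0 + j) * mx + (size_t)(z0 + k) * mx * my;
    const size_t dst = (size_t)i + (size_t)j * dimx + (size_t)k * dimx * dimy;
    packed[dst] = field[src];
}
```

Unpacking is the same indexing with source and destination swapped.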
Johannes Pekkila | 915e1c7c14 | Trying to overlap MPI communication with computation of boundary conditions. However, NVIDIA seemed to forget one important detail in the documentation for CUDA-aware MPI: it looks like CUDA streams are not supported with CUDA-aware MPI communication. So in the end the fastest solution might be to use old-school gpu->cpu->cpu->gpu MPI communication after all | 2019-10-21 15:50:53 +02:00
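A hedged sketch of the gpu->cpu->cpu->gpu fallback mentioned above: stage the halo through pinned host buffers with asynchronous copies, then let plain (non-CUDA-aware) MPI move the host data. The function and buffer names are illustrative, not the actual implementation.

```c
#include <mpi.h>
#include <cuda_runtime.h>

/* Hypothetical staged halo exchange for one neighbor pair:
 * device -> pinned host -> MPI -> pinned host -> device. */
void
staged_exchange(const double* d_send, double* d_recv, double* h_send, double* h_recv,
                const size_t count, const int neighbor, cudaStream_t stream)
{
    MPI_Request reqs[2];

    /* 1. Copy the packed halo from device to pinned host memory */
    cudaMemcpyAsync(h_send, d_send, count * sizeof(double),
                    cudaMemcpyDeviceToHost, stream);
    cudaStreamSynchronize(stream);

    /* 2. Exchange host buffers with ordinary MPI */
    MPI_Irecv(h_recv, (int)count, MPI_DOUBLE, neighbor, 0, MPI_COMM_WORLD, &reqs[0]);
    MPI_Isend(h_send, (int)count, MPI_DOUBLE, neighbor, 0, MPI_COMM_WORLD, &reqs[1]);
    MPI_Waitall(2, reqs, MPI_STATUSES_IGNORE);

    /* 3. Copy the received halo back to the device */
    cudaMemcpyAsync(d_recv, h_recv, count * sizeof(double),
                    cudaMemcpyHostToDevice, stream);
    cudaStreamSynchronize(stream);
}
```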
jpekkila | f120343110 | Bugfix: peer access was not disabled when Node was destroyed, leading to cudaErrorPeerAccessAlreadyEnabled error when creating new Nodes | 2019-10-21 16:23:24 +03:00
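The fix described above follows the standard CUDA peer-access pairing; a minimal sketch, with device lists and error handling omitted and names hypothetical:

```c
#include <cuda_runtime.h>

/* Hypothetical setup/teardown pair: every cudaDeviceEnablePeerAccess issued
 * when a Node is created gets a matching cudaDeviceDisablePeerAccess when the
 * Node is destroyed, so re-creating a Node does not hit
 * cudaErrorPeerAccessAlreadyEnabled. */
static void
enable_peer_access(const int device, const int peer)
{
    cudaSetDevice(device);
    cudaDeviceEnablePeerAccess(peer, 0); /* flags parameter must be 0 */
}

static void
disable_peer_access(const int device, const int peer)
{
    cudaSetDevice(device);
    cudaDeviceDisablePeerAccess(peer);
}
```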
Johannes Pekkila | 7b475b6dee | Better MPI synchronization | 2019-10-18 11:50:22 +02:00