Found a workaround that gives good inter- and intra-node performance. The HPC-X MPI implementation does not know how to do p2p communication with pinned arrays (should be 80 GiB/s, measured 10 GiB/s), and internode communication is very slow without pinned arrays (should be 40 GiB/s, measured < 1 GiB/s). Made a proof-of-concept communicator that pins only the arrays that are sent to or received from another node.
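
Roughly what the workaround looks like (a minimal sketch, not the actual communicator added in this commit; the helper name and the dst_is_on_other_node flag are illustrative): page-lock the host buffer only when the peer rank lives on another node, and leave intra-node transfers unpinned.

#include <stddef.h>
#include <mpi.h>
#include <cuda_runtime.h>

/* Sketch: send `count` doubles to rank `dst`. Pin the buffer only for
 * internode transfers; intra-node sends stay unpinned, matching the
 * measurements in the commit message. Names are illustrative. */
static void send_block(double *buf, size_t count, int dst,
                       int dst_is_on_other_node, MPI_Comm comm)
{
    if (dst_is_on_other_node) {
        /* Page-lock the existing allocation so the interconnect can DMA it. */
        cudaHostRegister(buf, count * sizeof(double), cudaHostRegisterDefault);
        MPI_Send(buf, (int)count, MPI_DOUBLE, dst, 0, comm);
        cudaHostUnregister(buf);
    } else {
        /* Intra-node path: left unpinned, since pinned p2p only reached
         * 10 GiB/s here instead of the expected 80 GiB/s. */
        MPI_Send(buf, (int)count, MPI_DOUBLE, dst, 0, comm);
    }
}

In a real communicator the registration would be done once per buffer and cached rather than repeated on every send, since cudaHostRegister/cudaHostUnregister are expensive calls.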

jpekkila
2020-04-05 20:15:32 +03:00
parent 88e53dfa21
commit cc9d3f1b9c
3 changed files with 258 additions and 5 deletions


@@ -6,4 +6,4 @@ find_package(CUDAToolkit)
 add_executable(bwtest main.c)
 add_compile_options(-O3)
-target_link_libraries(bwtest MPI::MPI_C OpenMP::OpenMP_C CUDA::cudart)
+target_link_libraries(bwtest MPI::MPI_C OpenMP::OpenMP_C CUDA::cudart_static)