Commit Graph

638 Commits

Author SHA1 Message Date
jpekkila
51b48a5a36 Some intermediate MPI changes 2020-01-05 16:46:40 +02:00
jpekkila
d6c81c89fb This 3D blocking approach is getting too complicated, removed code and trying again 2019-12-28 16:38:10 +02:00
jpekkila
e86b082c98 MPI transfer for the first corner with 3D blocking now complete. Disabled/enabled some error checking for development 2019-12-27 13:43:22 +02:00
jpekkila
bd0cc3ee20 There was some kind of mismatch between CUDA and MPI (UCX) libraries when linking with cudart. Switching to provided by cmake fixed the issue. 2019-12-27 13:41:18 +02:00
jpekkila
6b5910f7df Added allocations for the packed buffers 2019-12-21 19:00:35 +02:00
jpekkila
57a1f3e30c Added a generic pack/unpack function 2019-12-21 16:20:40 +02:00
jpekkila
3ecd47fe8b Merge branch 'master' into 3d-decomposition-2020-01 2019-12-21 13:22:45 +02:00
jpekkila
35b56029cf Build failed with single-precision, added the correct casts to modelsolver.c 2019-12-21 13:21:56 +02:00
jpekkila
4d873caf38 Changed utils CMakeList.txt to modern cmake style 2019-12-21 13:16:08 +02:00
jpekkila
bad64f5307 Started the 3D decomposition branch. Four tasks: 1) Determine how to distribute the work given n processes 2) Distribute and gather the mesh to/from these processes 3) Create packing/unpacking functions and 4) Transfer packed data blocks between neighbors. Tasks 1 and 2 done with this commit. 2019-12-21 12:37:01 +02:00
jpekkila
ecff5c3041 Added some final changes to benchmarking 2019-12-15 21:47:41 +02:00
jpekkila
8bd81db63c Added CPU parallelization to make CPU integration and boundconds faster 2019-12-14 15:45:42 +02:00
jpekkila
ff35d78509 Rewrote the MPI benchmark-verification function 2019-12-14 15:26:19 +02:00
jpekkila
f0e77181df Benchmark finetuning 2019-12-14 14:52:06 +02:00
jpekkila
b8a997b0ab Added code for doing a proper verification run with MPI. Passes nicely with full MHD + upwinding when using the new utility stuff introduced in the previous commits. Note: forcing is not enabled in the utility library by default. 2019-12-14 07:37:59 +02:00
jpekkila
277905aafb Added a model integrator to the utility library (written in pure C). Requires support for AVX vector instructions. 2019-12-14 07:34:33 +02:00
jpekkila
22a3105068 Finished the latest version of autotesting (utility library). Uses ulps to determine the acceptable error instead of the relative error used previously 2019-12-14 07:27:11 +02:00
jpekkila
5ec2f6ad75 Better wording in config_loader.c 2019-12-14 07:23:25 +02:00
jpekkila
164d11bfca Removed flush-to-zero flags from kernel compilation. No significant effect on performance but may affect accuracy in some cases 2019-12-14 07:22:14 +02:00
jpekkila
6b38ef461a Puhti GPUDirect fails for some reason if the cuda library is linked with instead of cudart 2019-12-11 17:26:21 +02:00
jpekkila
a1a2d838ea Merge branch 'master' of https://bitbucket.org/jpekkila/astaroth 2019-12-08 23:22:51 +02:00
jpekkila
752f44b0a7 Second attempt at getting bitbucket to compile 2019-12-08 23:22:33 +02:00
jpekkila
420f8b9e06 MPI benchmark now writes out the 95th percentile instead of average running time 2019-12-08 23:12:23 +02:00
jpekkila
90f85069c6 Bitbucket pipelines building fails because the CUDA include dir does not seem to be included for some reason. This is an attempted fix 2019-12-08 23:08:45 +02:00
jpekkila
2ab605e125 Added the default testcase for MPI benchmarks 2019-12-05 18:14:36 +02:00
jpekkila
d136834219 Re-enabled and updated MPI integration with the proper synchronization from earlier commits, removed old stuff. Should now work and be ready for benchmarks 2019-12-05 16:48:45 +02:00
jpekkila
f16826f2cd Removed old code 2019-12-05 16:40:48 +02:00
jpekkila
9f4742bafe Fixed the UCX warning from the last commit. Indexing of MPI_Waitall was wrong and also UCX required that MPI_Isend is also "waited" even though it should implicitly complete at the same time with MPI_Irecv 2019-12-05 16:40:30 +02:00
jpekkila
e47cfad6b5 MPI now compiles and runs on Puhti, basic verification test with boundary transfers OK. Gives an "UCX WARN object 0x2fa7780 was not returned to mpool ucp_requests" warning though which seems to indicate that not all asynchronous MPI calls finished before MPI_Finalize 2019-12-05 16:17:17 +02:00
jpekkila
9d70a29ae0 Now the minimum cmake version is 3.9. This is required for proper CUDA & MPI support. Older versions of cmake are very buggy when compiling cuda and it's a pain in the neck to try and work around all the quirks. 2019-12-05 15:35:51 +02:00
jpekkila
e99a428dec OpenMP is now properly linked with the standalone without propagating it to nvcc (which would cause an error) 2019-12-05 15:30:48 +02:00
jpekkila
9adb9dc38a Disabled MPI integration temporarily and enabled verification for MPI tests 2019-12-04 15:11:40 +02:00
jpekkila
6a250f0572 Rewrote core CMakeLists.txt for cmake versions with proper CUDA & MPI support (3.9+) 2019-12-04 15:09:38 +02:00
jpekkila
0ea2fa9337 Cleaner MPI linking with the core library. Requires cmake 3.9+ though, might have to modify later to work with older versions. 2019-12-04 13:49:38 +02:00
jpekkila
6e63411170 Moved the definition of AC_DEFAULT_CONFIG to the root-level CMakeLists.txt. Now should be visible throughout the project. 2019-12-03 18:42:49 +02:00
jpekkila
f97e5cb77c Fixed parts which caused a shadowing warning (same variable name used for different variables in the same scope) 2019-12-03 18:41:08 +02:00
jpekkila
04e27e85b2 Removed MPI from the core library dependencies: instead one should use the appropriate mpi compiler for compiling host code by passing something like -DCMAKE_C_COMPILER=/appl/opt/openmpi/3.1.3-cuda/gcc/7.3.0/bin/mpicc -DCMAKE_CXX_COMPILER=/appl/opt/openmpi/3.1.3-cuda/gcc/7.3.0/bin/mpicxx to cmake 2019-12-03 18:40:15 +02:00
jpekkila
c273fcf110 More rigorous error checking 2019-12-03 18:38:15 +02:00
jpekkila
49581e8eaa Added forward declaration for yyparse to avoid warnings with some compilers when compiling acc 2019-12-03 18:36:21 +02:00
jpekkila
825aa0efaa More warning flags for host code in the core library + small misc changes 2019-12-03 16:58:20 +02:00
jpekkila
316d44b843 Fixed an out-of-bounds error with auto-optimization (introduced in the last few commits) 2019-12-03 16:04:44 +02:00
jpekkila
7e4212ddd9 Enabled the generation of API hooks for calling DSL functions (was messing up with compilation earlier) 2019-12-03 15:17:27 +02:00
jpekkila
5a6a3110df Reformatted 2019-12-03 15:14:26 +02:00
jpekkila
f14e35620c Now nvcc is used to compile kernels only. All host code, incl. device.cc, MPI communication and others are now compiled with the host C++ compiler. This should work around an nvcc/MPI bug on Puhti. 2019-12-03 15:12:17 +02:00
jpekkila
8bffb2a1d0 Fixed ambiguous logic in acNodeStoreVertexBufferWithOffset, now halos of arbitrary GPUs do not overwrite valid data from the computational domain of a neighboring GPU. Also disabled p2p transfers temporarily until I figure out a clean way to avoid cudaErrorPeerAccessAlreadyEnabled errors 2019-12-02 12:58:09 +02:00
jpekkila
0178d4788c The core library now links to the CXX MPI library instead of the C one 2019-11-27 14:51:49 +02:00
jpekkila
ab539a98d6 Replaced old deprecated instances of DCONST_INT with DCONST 2019-11-27 13:48:42 +02:00
jpekkila
1270332f48 Fixed a small mistake in the last merge 2019-11-27 11:58:14 +02:00
Johannes Pekkila
3d35897601 The structure holding an abstract syntax tree node (acc) was not properly initialized to 0, fixed 2019-11-27 09:16:32 +01:00
Johannes Pekkila
3eabf94f92 Merge branch 'master' of https://bitbucket.org/jpekkila/astaroth 2019-11-27 08:55:23 +01:00