Commit Graph

625 Commits

Author SHA1 Message Date
jpekkila
f0e77181df Benchmark fine-tuning 2019-12-14 14:52:06 +02:00
jpekkila
b8a997b0ab Added code for doing a proper verification run with MPI. Passes nicely with full MHD + upwinding when using the new utility stuff introduced in the previous commits. Note: forcing is not enabled in the utility library by default. 2019-12-14 07:37:59 +02:00
jpekkila
277905aafb Added a model integrator to the utility library (written in pure C). Requires support for AVX vector instructions. 2019-12-14 07:34:33 +02:00
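A minimal sketch of the kind of AVX-vectorized update loop a pure-C model integrator relies on; the function name and the y += a*x update are illustrative, not the actual integrator code:

```c
#include <immintrin.h>
#include <stddef.h>

// Processes four doubles per iteration; a scalar tail loop would be
// needed when n is not a multiple of 4.
static void axpy_avx(const size_t n, const double a,
                     const double* x, double* y)
{
    const __m256d va = _mm256_set1_pd(a);
    for (size_t i = 0; i + 4 <= n; i += 4) {
        const __m256d vx = _mm256_loadu_pd(&x[i]);
        const __m256d vy = _mm256_loadu_pd(&y[i]);
        _mm256_storeu_pd(&y[i], _mm256_add_pd(vy, _mm256_mul_pd(va, vx)));
    }
}
```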
jpekkila
22a3105068 Finished the latest version of autotesting (utility library). Uses ULPs (units in the last place) to determine the acceptable error instead of the relative error used previously 2019-12-14 07:27:11 +02:00
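A minimal sketch of a ULP-based acceptance check, assuming finite (non-NaN) inputs of comparable magnitude; the function names and the tolerance are illustrative, not the actual utility-library API:

```c
#include <stdint.h>
#include <string.h>

static int64_t ulp_distance(const double a, const double b)
{
    int64_t ia, ib;
    memcpy(&ia, &a, sizeof(ia));
    memcpy(&ib, &b, sizeof(ib));

    // Remap negative values so that integer ordering matches float ordering
    // and adjacent representable doubles differ by exactly one.
    if (ia < 0)
        ia = INT64_MIN - ia;
    if (ib < 0)
        ib = INT64_MIN - ib;

    const int64_t diff = ia - ib;
    return diff < 0 ? -diff : diff;
}

static int acceptable(const double model, const double candidate)
{
    const int64_t max_ulps = 4; // illustrative tolerance
    return ulp_distance(model, candidate) <= max_ulps;
}
```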
jpekkila
5ec2f6ad75 Better wording in config_loader.c 2019-12-14 07:23:25 +02:00
jpekkila
164d11bfca Removed flush-to-zero flags from kernel compilation. No significant effect on performance but may affect accuracy in some cases 2019-12-14 07:22:14 +02:00
jpekkila
6b38ef461a GPUDirect on Puhti fails for some reason if the driver library (cuda) is linked against instead of the runtime library (cudart) 2019-12-11 17:26:21 +02:00
jpekkila
a1a2d838ea Merge branch 'master' of https://bitbucket.org/jpekkila/astaroth 2019-12-08 23:22:51 +02:00
jpekkila
752f44b0a7 Second attempt at getting the Bitbucket build to compile 2019-12-08 23:22:33 +02:00
jpekkila
420f8b9e06 MPI benchmark now writes out the 95th percentile instead of average running time 2019-12-08 23:12:23 +02:00
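A minimal sketch of reporting a percentile of timing samples instead of the mean, which is easily skewed by outliers; the names and the indexing convention are illustrative:

```c
#include <stdlib.h>

static int cmp_double(const void* a, const void* b)
{
    const double x = *(const double*)a;
    const double y = *(const double*)b;
    return (x > y) - (x < y);
}

// Destructively sorts the samples; uses one common indexing convention,
// chosen here for brevity.
static double percentile(double* samples, const size_t n, const double p)
{
    qsort(samples, n, sizeof(double), cmp_double);
    size_t idx = (size_t)(p / 100.0 * (double)n);
    if (idx >= n)
        idx = n - 1;
    return samples[idx];
}
// Usage: const double p95 = percentile(times, num_iters, 95.0);
```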
jpekkila
90f85069c6 The Bitbucket Pipelines build fails because the CUDA include directory does not seem to be on the include path for some reason. This is an attempted fix 2019-12-08 23:08:45 +02:00
jpekkila
2ab605e125 Added the default testcase for MPI benchmarks 2019-12-05 18:14:36 +02:00
jpekkila
d136834219 Re-enabled and updated MPI integration with the proper synchronization from earlier commits, removed old stuff. Should now work and be ready for benchmarks 2019-12-05 16:48:45 +02:00
jpekkila
f16826f2cd Removed old code 2019-12-05 16:40:48 +02:00
jpekkila
9f4742bafe Fixed the UCX warning from the last commit. The indexing passed to MPI_Waitall was wrong, and UCX also requires that MPI_Isend requests are waited on even though they should implicitly complete at the same time as the matching MPI_Irecv 2019-12-05 16:40:30 +02:00
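A minimal sketch of the corrected pattern, with illustrative buffer and neighbor bookkeeping: one request slot per MPI_Irecv and per MPI_Isend, all completed by a single MPI_Waitall:

```c
#include <mpi.h>

#define MAX_NEIGHBORS 26 // up to 26 neighbors in a 3D decomposition

// Exchanges one buffer per neighbor; buffer allocation and neighbor ranks
// are assumed to be set up by the caller. Illustrative, not the actual API.
void exchange_halos(double* send_buf[], double* recv_buf[],
                    const int neighbors[], const int num_neighbors,
                    const int count)
{
    MPI_Request requests[2 * MAX_NEIGHBORS];
    for (int i = 0; i < num_neighbors; ++i) {
        MPI_Irecv(recv_buf[i], count, MPI_DOUBLE, neighbors[i], 0,
                  MPI_COMM_WORLD, &requests[2 * i + 0]);
        MPI_Isend(send_buf[i], count, MPI_DOUBLE, neighbors[i], 0,
                  MPI_COMM_WORLD, &requests[2 * i + 1]);
    }
    // Complete the sends explicitly as well: otherwise UCX never returns
    // its request objects to the pool and warns at MPI_Finalize.
    MPI_Waitall(2 * num_neighbors, requests, MPI_STATUSES_IGNORE);
}
```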
jpekkila
e47cfad6b5 MPI now compiles and runs on Puhti; basic verification test with boundary transfers OK. Gives an "UCX WARN object 0x2fa7780 was not returned to mpool ucp_requests" warning, though, which seems to indicate that not all asynchronous MPI calls finished before MPI_Finalize 2019-12-05 16:17:17 +02:00
jpekkila
9d70a29ae0 Now the minimum CMake version is 3.9, which is required for proper CUDA & MPI support. Older versions of CMake are very buggy when compiling CUDA, and it's a pain in the neck to work around all the quirks. 2019-12-05 15:35:51 +02:00
jpekkila
e99a428dec OpenMP is now properly linked with the standalone without propagating it to nvcc (which would cause an error) 2019-12-05 15:30:48 +02:00
jpekkila
9adb9dc38a Disabled MPI integration temporarily and enabled verification for MPI tests 2019-12-04 15:11:40 +02:00
jpekkila
6a250f0572 Rewrote core CMakeLists.txt for cmake versions with proper CUDA & MPI support (3.9+) 2019-12-04 15:09:38 +02:00
jpekkila
0ea2fa9337 Cleaner MPI linking with the core library. Requires CMake 3.9+, though; might have to modify this later to work with older versions. 2019-12-04 13:49:38 +02:00
jpekkila
6e63411170 Moved the definition of AC_DEFAULT_CONFIG to the root-level CMakeLists.txt. Now should be visible throughout the project. 2019-12-03 18:42:49 +02:00
jpekkila
f97e5cb77c Fixed parts which caused a shadowing warning (same variable name used for different variables in the same scope) 2019-12-03 18:41:08 +02:00
jpekkila
04e27e85b2 Removed MPI from the core library dependencies: instead, one should use the appropriate MPI compiler wrappers for compiling host code by passing something like -DCMAKE_C_COMPILER=/appl/opt/openmpi/3.1.3-cuda/gcc/7.3.0/bin/mpicc -DCMAKE_CXX_COMPILER=/appl/opt/openmpi/3.1.3-cuda/gcc/7.3.0/bin/mpicxx to cmake 2019-12-03 18:40:15 +02:00
jpekkila
c273fcf110 More rigorous error checking 2019-12-03 18:38:15 +02:00
jpekkila
49581e8eaa Added a forward declaration for yyparse to avoid warnings with some compilers when compiling acc 2019-12-03 18:36:21 +02:00
jpekkila
825aa0efaa More warning flags for host code in the core library + small misc changes 2019-12-03 16:58:20 +02:00
jpekkila
316d44b843 Fixed an out-of-bounds error with auto-optimization (introduced in the last few commits) 2019-12-03 16:04:44 +02:00
jpekkila
7e4212ddd9 Enabled the generation of API hooks for calling DSL functions (was interfering with compilation earlier) 2019-12-03 15:17:27 +02:00
jpekkila
5a6a3110df Reformatted 2019-12-03 15:14:26 +02:00
jpekkila
f14e35620c Now nvcc is used to compile kernels only. All host code, including device.cc and the MPI communication code, is now compiled with the host C++ compiler. This should work around an nvcc/MPI bug on Puhti. 2019-12-03 15:12:17 +02:00
jpekkila
8bffb2a1d0 Fixed ambiguous logic in acNodeStoreVertexBufferWithOffset, now halos of arbitrary GPUs do not overwrite valid data from the computational domain of a neighboring GPU. Also disabled p2p transfers temporarily until I figure out a clean way to avoid cudaErrorPeerAccessAlreadyEnabled errors 2019-12-02 12:58:09 +02:00
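One possible way to sidestep cudaErrorPeerAccessAlreadyEnabled is to treat it as success and clear the sticky error state; a sketch with an illustrative helper name, not necessarily how it was later resolved:

```c
#include <cuda_runtime.h>

static cudaError_t enable_peer_access(const int device, const int peer)
{
    cudaSetDevice(device);
    const cudaError_t err = cudaDeviceEnablePeerAccess(peer, 0);
    if (err == cudaErrorPeerAccessAlreadyEnabled) {
        (void)cudaGetLastError(); // clear the sticky error state
        return cudaSuccess;       // already enabled is fine for our purposes
    }
    return err;
}
```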
jpekkila
0178d4788c The core library now links to the CXX MPI library instead of the C one 2019-11-27 14:51:49 +02:00
jpekkila
ab539a98d6 Replaced old deprecated instances of DCONST_INT with DCONST 2019-11-27 13:48:42 +02:00
jpekkila
1270332f48 Fixed a small mistake in the last merge 2019-11-27 11:58:14 +02:00
Johannes Pekkila
3d35897601 The structure holding an abstract syntax tree node (acc) was not properly zero-initialized; fixed 2019-11-27 09:16:32 +01:00
Johannes Pekkila
3eabf94f92 Merge branch 'master' of https://bitbucket.org/jpekkila/astaroth 2019-11-27 08:55:23 +01:00
jpekkila
5e3caf086e Device id is now properly set when using MPI and there are multiple visible GPUs per node 2019-11-26 16:54:56 +02:00
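A common pattern for this (a sketch, not necessarily the exact code in the commit) is to derive the node-local rank with an MPI-3 shared-memory split and use it to pick the device:

```c
#include <mpi.h>
#include <cuda_runtime.h>

void select_device(void)
{
    // Rank within the node via an MPI-3 shared-memory communicator split
    MPI_Comm node_comm;
    MPI_Comm_split_type(MPI_COMM_WORLD, MPI_COMM_TYPE_SHARED, 0,
                        MPI_INFO_NULL, &node_comm);
    int local_rank;
    MPI_Comm_rank(node_comm, &local_rank);
    MPI_Comm_free(&node_comm);

    int ndevices;
    cudaGetDeviceCount(&ndevices);
    cudaSetDevice(local_rank % ndevices); // round-robin ranks over GPUs
}
```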
jpekkila
53695d66a3 Benchmarking now also prints out percentiles 2019-11-26 16:26:31 +02:00
jpekkila
0b0ccd697a Added some explicit casts in get_neighbor (MPI) to fix warnings raised when compiling with older gcc 2019-11-20 10:18:10 +02:00
Miikka Vaisala
d3260edd2a Can now visualize the magnetic field and streamlines, plus some other minor improvements. 2019-11-04 11:27:53 +08:00
Johannes Pekkila
981331e7d7 Benchmark results now written out to a file 2019-10-24 15:53:08 +02:00
Johannes Pekkila
4ffde83215 Set default values for benchmarking 2019-10-24 15:22:47 +02:00
Johannes Pekkila
8894b7c7d6 Added a function for getting the process id of a neighboring process when decomposing in 3D 2019-10-23 19:26:35 +02:00
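A sketch of what such a mapping typically looks like, assuming a periodic nx × ny × nz process grid with row-major rank ordering; names are illustrative, not necessarily the committed implementation:

```c
// Returns the pid of the process at offset (dx, dy, dz) from pid,
// with periodic wrap-around in all three dimensions.
static int get_neighbor_pid(const int pid, const int dx, const int dy,
                            const int dz, const int nx, const int ny,
                            const int nz)
{
    const int x = pid % nx;
    const int y = (pid / nx) % ny;
    const int z = pid / (nx * ny);

    const int xn = (x + dx + nx) % nx;
    const int yn = (y + dy + ny) % ny;
    const int zn = (z + dz + nz) % nz;

    return xn + yn * nx + zn * nx * ny;
}
```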
Johannes Pekkila
474bdf185d Cleaned up the MPI solution for the 3D decomposition test 2019-10-23 12:33:46 +02:00
Johannes Pekkila
1d81333ff7 More concurrent kernels and MPI comm 2019-10-23 12:07:23 +02:00
Johannes Pekkila
04867334e7 Full integration step with MPI comms 2019-10-22 19:59:15 +02:00
Johannes Pekkila
870cd91b5f Added the final MPI solution for the benchmark tests: RDMA is now used, and I don't think we can go much faster with the current decomposition scheme. To get better scaling, we would probably have to switch to a 3D decomposition instead of the current simple 1D decomposition 2019-10-22 19:28:35 +02:00
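For intuition, a back-of-the-envelope comparison of per-process halo traffic under the two schemes, with assumed grid size, halo depth, and process count:

```c
#include <stdio.h>

int main(void)
{
    const long nx = 512, ny = 512, nghost = 3; // assumed grid and halo depth

    // 1D slabs: every process exchanges two full nx*ny*nghost slabs,
    // independent of the process count.
    const long slab_cells = 2 * nx * ny * nghost;

    // 3D blocks with 4*4*4 = 64 processes: each face shrinks with the
    // process count; edge and corner traffic is small and omitted here.
    const long side       = nx / 4;
    const long face_cells = 6 * side * side * nghost;

    printf("1D halo cells per process: %ld\n", slab_cells); // 1572864
    printf("3D face cells per process: %ld\n", face_cells); //  294912
    return 0;
}
```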
jpekkila
3d7ad7c8f2 Code cleanup 2019-10-22 15:38:34 +03:00
jpekkila
64221c218d Made some warnings go away 2019-10-22 15:03:55 +03:00