384 Commits

Author SHA1 Message Date
jpekkila
92a6a1bdec Added more professional run flags to ./ac_run 2020-01-13 15:35:01 +02:00
jpekkila
794e4393c3 Added a new function for the legacy Astaroth layer: acGetNode(). This function returns a Node, which can be used to access acNode layer functions 2020-01-13 11:33:15 +02:00
jpekkila
f0208c66a6 Now compiles also for P100 by default (was removed accidentally in earlier commits) 2020-01-07 10:29:44 +00:00
jpekkila
e4f7214b3a benchmark.cc edited online with Bitbucket 2019-12-21 11:26:54 +00:00
jpekkila
35b56029cf Build failed with single-precision, added the correct casts to modelsolver.c 2019-12-21 13:21:56 +02:00
jpekkila
4d873caf38 Changed utils CMakeLists.txt to modern cmake style 2019-12-21 13:16:08 +02:00
jpekkila
ecff5c3041 Added some final changes to benchmarking 2019-12-15 21:47:41 +02:00
jpekkila
8bd81db63c Added CPU parallelization to make CPU integration and boundconds faster 2019-12-14 15:45:42 +02:00
jpekkila
ff35d78509 Rewrote the MPI benchmark-verification function 2019-12-14 15:26:19 +02:00
jpekkila
f0e77181df Benchmark finetuning 2019-12-14 14:52:06 +02:00
jpekkila
b8a997b0ab Added code for doing a proper verification run with MPI. Passes nicely with full MHD + upwinding when using the new utility stuff introduced in the previous commits. Note: forcing is not enabled in the utility library by default. 2019-12-14 07:37:59 +02:00
jpekkila
277905aafb Added a model integrator to the utility library (written in pure C). Requires support for AVX vector instructions. 2019-12-14 07:34:33 +02:00
jpekkila
22a3105068 Finished the latest version of autotesting (utility library). Uses ulps to determine the acceptable error instead of the relative error used previously 2019-12-14 07:27:11 +02:00
jpekkila
5ec2f6ad75 Better wording in config_loader.c 2019-12-14 07:23:25 +02:00
jpekkila
164d11bfca Removed flush-to-zero flags from kernel compilation. No significant effect on performance but may affect accuracy in some cases 2019-12-14 07:22:14 +02:00
jpekkila
6b38ef461a Puhti GPUDirect fails for some reason if the cuda library is linked instead of cudart 2019-12-11 17:26:21 +02:00
jpekkila
a1a2d838ea Merge branch 'master' of https://bitbucket.org/jpekkila/astaroth 2019-12-08 23:22:51 +02:00
jpekkila
752f44b0a7 Second attempt at getting bitbucket to compile 2019-12-08 23:22:33 +02:00
jpekkila
420f8b9e06 MPI benchmark now writes out the 95th percentile instead of average running time 2019-12-08 23:12:23 +02:00
jpekkila
90f85069c6 Bitbucket pipelines building fails because the CUDA include dir does not seem to be included for some reason. This is an attempted fix 2019-12-08 23:08:45 +02:00
jpekkila
2ab605e125 Added the default testcase for MPI benchmarks 2019-12-05 18:14:36 +02:00
jpekkila
d136834219 Re-enabled and updated MPI integration with the proper synchronization from earlier commits, removed old stuff. Should now work and be ready for benchmarks 2019-12-05 16:48:45 +02:00
jpekkila
f16826f2cd Removed old code 2019-12-05 16:40:48 +02:00
jpekkila
9f4742bafe Fixed the UCX warning from the last commit. Indexing of MPI_Waitall was wrong, and UCX also required that MPI_Isend is "waited" on even though it should implicitly complete at the same time as MPI_Irecv 2019-12-05 16:40:30 +02:00
jpekkila
e47cfad6b5 MPI now compiles and runs on Puhti, basic verification test with boundary transfers OK. Gives an "UCX WARN object 0x2fa7780 was not returned to mpool ucp_requests" warning though which seems to indicate that not all asynchronous MPI calls finished before MPI_Finalize 2019-12-05 16:17:17 +02:00
jpekkila
e99a428dec OpenMP is now properly linked with the standalone without propagating it to nvcc (which would cause an error) 2019-12-05 15:30:48 +02:00
jpekkila
9adb9dc38a Disabled MPI integration temporarily and enabled verification for MPI tests 2019-12-04 15:11:40 +02:00
jpekkila
6a250f0572 Rewrote core CMakeLists.txt for cmake versions with proper CUDA & MPI support (3.9+) 2019-12-04 15:09:38 +02:00
jpekkila
0ea2fa9337 Cleaner MPI linking with the core library. Requires cmake 3.9+ though, might have to modify later to work with older versions. 2019-12-04 13:49:38 +02:00
jpekkila
6e63411170 Moved the definition of AC_DEFAULT_CONFIG to the root-level CMakeLists.txt. Now should be visible throughout the project. 2019-12-03 18:42:49 +02:00
jpekkila
f97e5cb77c Fixed parts which caused a shadowing warning (same variable name used for different variables in the same scope) 2019-12-03 18:41:08 +02:00
jpekkila
04e27e85b2 Removed MPI from the core library dependencies: instead one should use the appropriate mpi compiler for compiling host code by passing something like -DCMAKE_C_COMPILER=/appl/opt/openmpi/3.1.3-cuda/gcc/7.3.0/bin/mpicc -DCMAKE_CXX_COMPILER=/appl/opt/openmpi/3.1.3-cuda/gcc/7.3.0/bin/mpicxx to cmake 2019-12-03 18:40:15 +02:00
jpekkila
c273fcf110 More rigorous error checking 2019-12-03 18:38:15 +02:00
jpekkila
825aa0efaa More warning flags for host code in the core library + small misc changes 2019-12-03 16:58:20 +02:00
jpekkila
316d44b843 Fixed an out-of-bounds error with auto-optimization (introduced in the last few commits) 2019-12-03 16:04:44 +02:00
jpekkila
7e4212ddd9 Enabled the generation of API hooks for calling DSL functions (was messing up with compilation earlier) 2019-12-03 15:17:27 +02:00
jpekkila
5a6a3110df Reformatted 2019-12-03 15:14:26 +02:00
jpekkila
f14e35620c Now nvcc is used to compile kernels only. All host code, incl. device.cc, MPI communication and others are now compiled with the host C++ compiler. This should work around an nvcc/MPI bug on Puhti. 2019-12-03 15:12:17 +02:00
jpekkila
8bffb2a1d0 Fixed ambiguous logic in acNodeStoreVertexBufferWithOffset, now halos of arbitrary GPUs do not overwrite valid data from the computational domain of a neighboring GPU. Also disabled p2p transfers temporarily until I figure out a clean way to avoid cudaErrorPeerAccessAlreadyEnabled errors 2019-12-02 12:58:09 +02:00
jpekkila
0178d4788c The core library now links to the CXX MPI library instead of the C one 2019-11-27 14:51:49 +02:00
jpekkila
ab539a98d6 Replaced old deprecated instances of DCONST_INT with DCONST 2019-11-27 13:48:42 +02:00
jpekkila
1270332f48 Fixed a small mistake in the last merge 2019-11-27 11:58:14 +02:00
Johannes Pekkila
3eabf94f92 Merge branch 'master' of https://bitbucket.org/jpekkila/astaroth 2019-11-27 08:55:23 +01:00
jpekkila
5e3caf086e Device id is now properly set when using MPI and there are multiple visible GPUs per node 2019-11-26 16:54:56 +02:00
jpekkila
53695d66a3 Benchmarking now also prints out percentiles 2019-11-26 16:26:31 +02:00
jpekkila
0b0ccd697a Added some explicit casts in get_neighbor (MPI) to fix warnings raised when compiling with older gcc 2019-11-20 10:18:10 +02:00
Johannes Pekkila
981331e7d7 Benchmark results now written out to a file 2019-10-24 15:53:08 +02:00
Johannes Pekkila
4ffde83215 Set default values for benchmarking 2019-10-24 15:22:47 +02:00
Johannes Pekkila
8894b7c7d6 Added a function for getting pid of a neighboring process when decomposing in 3D 2019-10-23 19:26:35 +02:00
Johannes Pekkila
474bdf185d Cleaned up the MPI solution for 3D decomp test 2019-10-23 12:33:46 +02:00