jpekkila
|
d6c81c89fb
|
This 3D blocking approach is getting too complicated, removed code and trying again
|
2019-12-28 16:38:10 +02:00 |
|
jpekkila
|
e86b082c98
|
MPI transfer for the first corner with 3D blocking now complete. Disabled/enabled some error checking for development
|
2019-12-27 13:43:22 +02:00 |
|
jpekkila
|
bd0cc3ee20
|
There was some kind of mismatch between the CUDA and MPI (UCX) libraries when linking with cudart. Switching to the one provided by cmake fixed the issue.
|
2019-12-27 13:41:18 +02:00 |
|
jpekkila
|
6b5910f7df
|
Added allocations for the packed buffers
|
2019-12-21 19:00:35 +02:00 |
|
jpekkila
|
57a1f3e30c
|
Added a generic pack/unpack function
|
2019-12-21 16:20:40 +02:00 |
|
jpekkila
|
e4f7214b3a
|
benchmark.cc edited online with Bitbucket
|
2019-12-21 11:26:54 +00:00 |
|
jpekkila
|
3ecd47fe8b
|
Merge branch 'master' into 3d-decomposition-2020-01
|
2019-12-21 13:22:45 +02:00 |
|
jpekkila
|
35b56029cf
|
Build failed with single-precision, added the correct casts to modelsolver.c
|
2019-12-21 13:21:56 +02:00 |
|
jpekkila
|
4d873caf38
|
Changed utils CMakeLists.txt to modern cmake style
|
2019-12-21 13:16:08 +02:00 |
|
jpekkila
|
bad64f5307
|
Started the 3D decomposition branch. Four tasks: 1) Determine how to distribute the work given n processes 2) Distribute and gather the mesh to/from these processes 3) Create packing/unpacking functions and 4) Transfer packed data blocks between neighbors. Tasks 1 and 2 done with this commit.
|
2019-12-21 12:37:01 +02:00 |
|
jpekkila
|
ecff5c3041
|
Added some final changes to benchmarking
|
2019-12-15 21:47:41 +02:00 |
|
jpekkila
|
8bd81db63c
|
Added CPU parallelization to make CPU integration and boundconds faster
|
2019-12-14 15:45:42 +02:00 |
|
jpekkila
|
ff35d78509
|
Rewrote the MPI benchmark-verification function
|
2019-12-14 15:26:19 +02:00 |
|
jpekkila
|
f0e77181df
|
Benchmark finetuning
|
2019-12-14 14:52:06 +02:00 |
|
jpekkila
|
b8a997b0ab
|
Added code for doing a proper verification run with MPI. Passes nicely with full MHD + upwinding when using the new utility stuff introduced in the previous commits. Note: forcing is not enabled in the utility library by default.
|
2019-12-14 07:37:59 +02:00 |
|
jpekkila
|
277905aafb
|
Added a model integrator to the utility library (written in pure C). Requires support for AVX vector instructions.
|
2019-12-14 07:34:33 +02:00 |
|
jpekkila
|
22a3105068
|
Finished the latest version of autotesting (utility library). Uses ulps to determine the acceptable error instead of the relative error used previously
|
2019-12-14 07:27:11 +02:00 |
|
jpekkila
|
5ec2f6ad75
|
Better wording in config_loader.c
|
2019-12-14 07:23:25 +02:00 |
|
jpekkila
|
164d11bfca
|
Removed flush-to-zero flags from kernel compilation. No significant effect on performance but may affect accuracy in some cases
|
2019-12-14 07:22:14 +02:00 |
|
jpekkila
|
6b38ef461a
|
GPUDirect on Puhti fails for some reason if the cuda library is linked instead of cudart
|
2019-12-11 17:26:21 +02:00 |
|
jpekkila
|
a1a2d838ea
|
Merge branch 'master' of https://bitbucket.org/jpekkila/astaroth
|
2019-12-08 23:22:51 +02:00 |
|
jpekkila
|
752f44b0a7
|
Second attempt at getting bitbucket to compile
|
2019-12-08 23:22:33 +02:00 |
|
jpekkila
|
420f8b9e06
|
MPI benchmark now writes out the 95th percentile instead of average running time
|
2019-12-08 23:12:23 +02:00 |
|
jpekkila
|
90f85069c6
|
Bitbucket pipelines building fails because the CUDA include dir does not seem to be included for some reason. This is an attempted fix
|
2019-12-08 23:08:45 +02:00 |
|
jpekkila
|
2ab605e125
|
Added the default testcase for MPI benchmarks
|
2019-12-05 18:14:36 +02:00 |
|
jpekkila
|
d136834219
|
Re-enabled and updated MPI integration with the proper synchronization from earlier commits, removed old stuff. Should now work and be ready for benchmarks
|
2019-12-05 16:48:45 +02:00 |
|
jpekkila
|
f16826f2cd
|
Removed old code
|
2019-12-05 16:40:48 +02:00 |
|
jpekkila
|
9f4742bafe
|
Fixed the UCX warning from the last commit. The indexing of MPI_Waitall was wrong, and UCX also required that MPI_Isend is "waited" on, even though it should implicitly complete at the same time as MPI_Irecv
|
2019-12-05 16:40:30 +02:00 |
|
jpekkila
|
e47cfad6b5
|
MPI now compiles and runs on Puhti, basic verification test with boundary transfers OK. Gives a "UCX WARN object 0x2fa7780 was not returned to mpool ucp_requests" warning though, which seems to indicate that not all asynchronous MPI calls finished before MPI_Finalize
|
2019-12-05 16:17:17 +02:00 |
|
jpekkila
|
9d70a29ae0
|
The minimum cmake version is now 3.9. This is required for proper CUDA & MPI support. Older versions of cmake are very buggy when compiling CUDA and it's a pain in the neck to try and work around all the quirks.
|
2019-12-05 15:35:51 +02:00 |
|
jpekkila
|
e99a428dec
|
OpenMP is now properly linked with the standalone without propagating it to nvcc (which would cause an error)
|
2019-12-05 15:30:48 +02:00 |
|
jpekkila
|
9adb9dc38a
|
Disabled MPI integration temporarily and enabled verification for MPI tests
|
2019-12-04 15:11:40 +02:00 |
|
jpekkila
|
6a250f0572
|
Rewrote core CMakeLists.txt for cmake versions with proper CUDA & MPI support (3.9+)
|
2019-12-04 15:09:38 +02:00 |
|
jpekkila
|
0ea2fa9337
|
Cleaner MPI linking with the core library. Requires cmake 3.9+ though, might have to modify later to work with older versions.
|
2019-12-04 13:49:38 +02:00 |
|
jpekkila
|
6e63411170
|
Moved the definition of AC_DEFAULT_CONFIG to the root-level CMakeLists.txt. Now should be visible throughout the project.
|
2019-12-03 18:42:49 +02:00 |
|
jpekkila
|
f97e5cb77c
|
Fixed parts which caused a shadowing warning (same variable name used for different variables in the same scope)
|
2019-12-03 18:41:08 +02:00 |
|
jpekkila
|
04e27e85b2
|
Removed MPI from the core library dependencies: instead one should use the appropriate mpi compiler for compiling host code by passing something like -DCMAKE_C_COMPILER=/appl/opt/openmpi/3.1.3-cuda/gcc/7.3.0/bin/mpicc -DCMAKE_CXX_COMPILER=/appl/opt/openmpi/3.1.3-cuda/gcc/7.3.0/bin/mpicxx to cmake
|
2019-12-03 18:40:15 +02:00 |
|
jpekkila
|
c273fcf110
|
More rigorous error checking
|
2019-12-03 18:38:15 +02:00 |
|
jpekkila
|
49581e8eaa
|
Added forward declaration for yyparse to avoid warnings with some compilers when compiling acc
|
2019-12-03 18:36:21 +02:00 |
|
jpekkila
|
825aa0efaa
|
More warning flags for host code in the core library + small misc changes
|
2019-12-03 16:58:20 +02:00 |
|
jpekkila
|
316d44b843
|
Fixed an out-of-bounds error with auto-optimization (introduced in the last few commits)
|
2019-12-03 16:04:44 +02:00 |
|
jpekkila
|
7e4212ddd9
|
Enabled the generation of API hooks for calling DSL functions (was interfering with compilation earlier)
|
2019-12-03 15:17:27 +02:00 |
|
jpekkila
|
5a6a3110df
|
Reformatted
|
2019-12-03 15:14:26 +02:00 |
|
jpekkila
|
f14e35620c
|
Now nvcc is used to compile kernels only. All host code, incl. device.cc, MPI communication and others, is now compiled with the host C++ compiler. This should work around an nvcc/MPI bug on Puhti.
|
2019-12-03 15:12:17 +02:00 |
|
jpekkila
|
8bffb2a1d0
|
Fixed ambiguous logic in acNodeStoreVertexBufferWithOffset; halos of arbitrary GPUs no longer overwrite valid data from the computational domain of a neighboring GPU. Also disabled p2p transfers temporarily until I figure out a clean way to avoid cudaErrorPeerAccessAlreadyEnabled errors
|
2019-12-02 12:58:09 +02:00 |
|
jpekkila
|
0178d4788c
|
The core library now links to the CXX MPI library instead of the C one
|
2019-11-27 14:51:49 +02:00 |
|
jpekkila
|
ab539a98d6
|
Replaced old deprecated instances of DCONST_INT with DCONST
|
2019-11-27 13:48:42 +02:00 |
|
jpekkila
|
1270332f48
|
Fixed a small mistake in the last merge
|
2019-11-27 11:58:14 +02:00 |
|
Johannes Pekkila
|
3d35897601
|
The structure holding an abstract syntax tree node (acc) was not properly initialized to 0, fixed
|
2019-11-27 09:16:32 +01:00 |
|
Johannes Pekkila
|
3eabf94f92
|
Merge branch 'master' of https://bitbucket.org/jpekkila/astaroth
|
2019-11-27 08:55:23 +01:00 |
|