185b33980f Forcing function bug correction.
Miikka Vaisala
2020-01-14 13:58:11 +08:00
ae0163b0e5 Missed one
jpekkila
2020-01-13 21:52:58 +02:00
5e1500fe97 Happy new year! :)
jpekkila
2020-01-13 21:38:07 +02:00
81aeff8b78 Updated the licence and made it .md
jpekkila
2020-01-13 21:35:14 +02:00
d51d48071f Updated documentation and made it work with Doxygen. Now the doc/doxygen/index.html generated with it looks quite good and contains lots of useful and up-to-date information about Astaroth
jpekkila
2020-01-13 21:11:04 +02:00
a6cf5a8b79 CONTRIBUTING.md edited online with Bitbucket
jpekkila
2020-01-13 16:39:26 +00:00
d01e20a3d9 README.md edited online with Bitbucket. Now the links work (had to append markdown-header-* to the link)
jpekkila
2020-01-13 16:34:57 +00:00
a85a9614e6 README.md edited online with Bitbucket. Now it's gotta work.
jpekkila
2020-01-13 16:30:47 +00:00
cc933a0949 README.md edited online with Bitbucket. Consistent headings and another attempt at linking.
jpekkila
2020-01-13 16:26:06 +00:00
b6451c4b82 Fixed hyperlinks in README.md
jpekkila
2020-01-13 16:22:22 +00:00
74f68d4371 CONTRIBUTING.md created online with Bitbucket
jpekkila
2020-01-13 16:16:55 +00:00
bd640a8ff5 Removed unnecessary linebreaks from README.md.
jpekkila
2020-01-13 15:31:05 +00:00
92a6a1bdec Added more professional run flags to ./ac_run
jpekkila
2020-01-13 15:35:01 +02:00
794e4393c3 Added a new function for the legacy Astaroth layer: acGetNode(). This function returns a Node, which can be used to access acNode layer functions
jpekkila
2020-01-13 11:33:15 +02:00
1d315732e0 Giving up on 3D decomposition with CUDA-aware MPI. The MPI implementation on Puhti seems to be painfully buggy; device pointers are not tracked properly in some cases (e.g. if there's an array of structures that contain CUDA pointers). Going to implement 3D decomp the traditional way for now (communicating via the CPU). It's easy to switch to CUDA-aware MPI once Mellanox/NVIDIA/CSC have fixed their software.
jpekkila
2020-01-07 21:06:22 +02:00
299ff5cb67 All fields are now packed to simplify communication
jpekkila
2020-01-07 21:01:22 +02:00
5d60791f13 Current 3D decomp method still too complicated. Starting again from scratch.
jpekkila
2020-01-07 14:40:32 +02:00
eaee81bf06 Merge branch 'master' into 3d-decomposition-2020-01
jpekkila
2020-01-07 14:25:06 +02:00
f0208c66a6 Now also compiles for P100 by default (this was accidentally removed in earlier commits)
jpekkila
2020-01-07 10:29:44 +00:00
1dbcc469fc Allocations for packed data (MPI)
jpekkila
2020-01-05 18:57:14 +02:00
bee930b151 Merge branch 'master' into 3d-decomposition-2020-01
jpekkila
2020-01-05 16:48:26 +02:00
be7946c2af Added the multiplication operator for int3 structures
jpekkila
2020-01-05 16:47:28 +02:00
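For illustration, such an operator is typically defined component-wise; the following is a minimal sketch (not necessarily the committed code), using CUDA's int3 and make_int3:

    #include <cuda_runtime.h> // int3, make_int3

    // Component-wise product, e.g. for scaling per-process block dimensions
    static __host__ __device__ inline int3
    operator*(const int3& a, const int3& b)
    {
        return make_int3(a.x * b.x, a.y * b.y, a.z * b.z);
    }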
d6c81c89fb This 3D blocking approach is getting too complicated; removed the code and trying again
jpekkila
2019-12-28 16:38:10 +02:00
e86b082c98 MPI transfer for the first corner with 3D blocking now complete. Disabled/enabled some error checking for development
jpekkila
2019-12-27 13:43:22 +02:00
bd0cc3ee20 There was some kind of mismatch between the CUDA and MPI (UCX) libraries when linking with cudart. Switching to the version provided by cmake fixed the issue.
jpekkila
2019-12-27 13:41:18 +02:00
6b5910f7df Added allocations for the packed buffers
jpekkila
2019-12-21 19:00:35 +02:00
57a1f3e30c Added a generic pack/unpack function
jpekkila
2019-12-21 16:20:40 +02:00
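As an illustration of what a generic packing function does (names, data layout, and real type below are assumptions, not Astaroth's actual API): a kernel of this kind copies a 3D sub-block of a field into a contiguous buffer, and unpacking is the mirror image.

    __global__ void
    pack_block(const double* __restrict__ field, double* __restrict__ packed,
               const int3 offset, const int3 block, const int3 dims)
    {
        const int i = threadIdx.x + blockIdx.x * blockDim.x;
        const int j = threadIdx.y + blockIdx.y * blockDim.y;
        const int k = threadIdx.z + blockIdx.z * blockDim.z;
        if (i >= block.x || j >= block.y || k >= block.z)
            return;

        // Source index in the full field, destination index in the contiguous buffer
        const size_t src = (offset.x + i) + (offset.y + j) * (size_t)dims.x +
                           (offset.z + k) * (size_t)dims.x * dims.y;
        const size_t dst = i + j * (size_t)block.x + k * (size_t)block.x * block.y;
        packed[dst] = field[src];
    }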
e4f7214b3a benchmark.cc edited online with Bitbucket
jpekkila
2019-12-21 11:26:54 +00:00
3ecd47fe8b Merge branch 'master' into 3d-decomposition-2020-01
jpekkila
2019-12-21 13:22:45 +02:00
35b56029cf Build failed with single precision; added the correct casts to modelsolver.c
jpekkila
2019-12-21 13:21:56 +02:00
4d873caf38 Changed the utils CMakeLists.txt to modern cmake style
jpekkila
2019-12-21 13:16:08 +02:00
bad64f5307 Started the 3D decomposition branch. Four tasks: 1) determine how to distribute the work given n processes, 2) distribute and gather the mesh to/from these processes, 3) create packing/unpacking functions, and 4) transfer packed data blocks between neighbors. Tasks 1 and 2 done with this commit.
jpekkila
2019-12-21 12:37:01 +02:00
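A hypothetical sketch of task 1, assuming the decomposition simply factorizes the process count so the local blocks stay as close to cubic as possible (the committed strategy may differ):

    #include <cuda_runtime.h> // int3, make_int3
    #include <limits.h>

    // nn = global grid dimensions, nprocs = number of MPI processes
    static int3
    decompose(const int nprocs, const int3 nn)
    {
        int3 best = make_int3(nprocs, 1, 1);
        long best_cost = LONG_MAX;
        for (int pz = 1; pz <= nprocs; ++pz) {
            if (nprocs % pz) continue;
            for (int py = 1; py <= nprocs / pz; ++py) {
                if ((nprocs / pz) % py) continue;
                const int px = nprocs / (pz * py);
                if (nn.x % px || nn.y % py || nn.z % pz) continue;
                // Cost: halo surface area of a local block (communication volume)
                const long lx = nn.x / px, ly = nn.y / py, lz = nn.z / pz;
                const long cost = 2 * (lx * ly + ly * lz + lz * lx);
                if (cost < best_cost) { best_cost = cost; best = make_int3(px, py, pz); }
            }
        }
        return best;
    }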
ecff5c3041 Added some final changes to benchmarking
jpekkila
2019-12-15 21:47:41 +02:00
8bd81db63c Added CPU parallelization to make CPU integration and boundconds faster
jpekkila
2019-12-14 15:45:42 +02:00
ff35d78509 Rewrote the MPI benchmark-verification function
jpekkila
2019-12-14 15:26:19 +02:00
b8a997b0ab Added code for doing a proper verification run with MPI. Passes nicely with full MHD + upwinding when using the new utility stuff introduced in the previous commits. Note: forcing is not enabled in the utility library by default.
jpekkila
2019-12-14 07:37:59 +02:00
277905aafb Added a model integrator to the utility library (written in pure C). Requires support for AVX vector instructions.
jpekkila
2019-12-14 07:34:33 +02:00
22a3105068 Finished the latest version of autotesting (utility library). Uses ulps to determine the acceptable error instead of the relative error used previously
jpekkila
2019-12-14 07:27:11 +02:00
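The ULP-based technique referred to above works roughly as follows: reinterpret the floating-point bit patterns as ordered integers and bound their distance. This is a generic sketch of the method, not necessarily the exact code in the utility library:

    #include <math.h>
    #include <stdint.h>
    #include <stdlib.h>
    #include <string.h>

    // Map a double to an integer such that adjacent representable doubles differ by 1
    static int64_t
    ordered_bits(const double x)
    {
        int64_t bits;
        memcpy(&bits, &x, sizeof(bits));
        return bits >= 0 ? bits : INT64_MIN - bits;
    }

    // Accept the candidate if it is within max_ulps representable values of the model result
    static bool
    within_ulps(const double model, const double candidate, const int64_t max_ulps)
    {
        if (isnan(model) || isnan(candidate))
            return false;
        return llabs(ordered_bits(model) - ordered_bits(candidate)) <= max_ulps;
    }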
5ec2f6ad75 Better wording in config_loader.c
jpekkila
2019-12-14 07:23:25 +02:00
164d11bfca Removed flush-to-zero flags from kernel compilation. No significant effect on performance but may affect accuracy in some cases
jpekkila
2019-12-14 07:22:14 +02:00
6b38ef461a Puhti GPUDirect fails for some reason if linking against the cuda library instead of cudart
jpekkila
2019-12-11 17:26:21 +02:00
752f44b0a7 Second attempt at getting the Bitbucket build to compile
jpekkila
2019-12-08 23:22:33 +02:00
420f8b9e06 MPI benchmark now writes out the 95th percentile instead of the average running time
jpekkila
2019-12-08 23:12:23 +02:00
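For reference, taking the 95th percentile of the measured running times amounts to something like the following illustrative helper (not the benchmark's actual code):

    #include <algorithm>
    #include <vector>

    // p in [0, 1]; e.g. percentile(times_ms, 0.95). Assumes samples is non-empty.
    static double
    percentile(std::vector<double> samples, const double p)
    {
        std::sort(samples.begin(), samples.end());
        const size_t i = std::min(samples.size() - 1, (size_t)(p * samples.size()));
        return samples[i];
    }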
90f85069c6 The Bitbucket Pipelines build fails because the CUDA include dir does not seem to be included for some reason. This is an attempted fix
jpekkila
2019-12-08 23:08:45 +02:00
2ab605e125 Added the default testcase for MPI benchmarks
jpekkila
2019-12-05 18:14:36 +02:00
d136834219 Re-enabled and updated MPI integration with the proper synchronization from earlier commits, and removed old code. Should now work and be ready for benchmarks
jpekkila
2019-12-05 16:48:45 +02:00
f16826f2cd Removed old code
jpekkila
2019-12-05 16:40:48 +02:00
9f4742bafe Fixed the UCX warning from the last commit. The indexing of MPI_Waitall was wrong, and UCX also requires that MPI_Isend requests are waited on, even though they should implicitly complete at the same time as the matching MPI_Irecv
jpekkila
2019-12-05 16:40:30 +02:00
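A sketch of the corrected pattern (buffer and neighbor bookkeeping is illustrative): both the MPI_Irecv and the MPI_Isend requests go into one correctly sized array and are completed with a single MPI_Waitall.

    #include <mpi.h>

    #define MAX_NEIGHBORS 26 // at most 26 neighbors in a 3D decomposition

    void
    exchange_halos(double* recv_bufs[], double* send_bufs[], const int counts[],
                   const int neighbors[], const int num_neighbors)
    {
        // Assumes num_neighbors <= MAX_NEIGHBORS
        MPI_Request reqs[2 * MAX_NEIGHBORS];
        int nreqs = 0;
        for (int i = 0; i < num_neighbors; ++i) {
            // Tags are assumed to be chosen consistently on both sides of each exchange
            MPI_Irecv(recv_bufs[i], counts[i], MPI_DOUBLE, neighbors[i], i,
                      MPI_COMM_WORLD, &reqs[nreqs++]);
            MPI_Isend(send_bufs[i], counts[i], MPI_DOUBLE, neighbors[i], i,
                      MPI_COMM_WORLD, &reqs[nreqs++]);
        }
        // Wait on the send requests as well; UCX warns if they are left uncompleted
        MPI_Waitall(nreqs, reqs, MPI_STATUSES_IGNORE);
    }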
e47cfad6b5 MPI now compiles and runs on Puhti; the basic verification test with boundary transfers passes. Gives an "UCX WARN object 0x2fa7780 was not returned to mpool ucp_requests" warning though, which seems to indicate that not all asynchronous MPI calls finished before MPI_Finalize
jpekkila
2019-12-05 16:15:37 +02:00
9d70a29ae0 Now the minimum cmake version is 3.9. This is required for proper CUDA & MPI support. Older versions of cmake are very buggy when compiling cuda and it's a pain in the neck to try and work around all the quirks.
jpekkila
2019-12-05 15:35:51 +02:00
e99a428dec OpenMP is now properly linked with the standalone without propagating it to nvcc (which would cause an error)
jpekkila
2019-12-05 15:30:48 +02:00
9adb9dc38a Disabled MPI integration temporarily and enabled verification for MPI tests
jpekkila
2019-12-04 15:11:40 +02:00
6a250f0572 Rewrote core CMakeLists.txt for cmake versions with proper CUDA & MPI support (3.9+)
jpekkila
2019-12-04 15:09:38 +02:00
0ea2fa9337 Cleaner MPI linking with the core library. Requires cmake 3.9+ though, might have to modify later to work with older versions.
jpekkila
2019-12-04 13:49:38 +02:00
6e63411170 Moved the definition of AC_DEFAULT_CONFIG to the root-level CMakeLists.txt. It should now be visible throughout the project.
jpekkila
2019-12-03 18:42:49 +02:00
f97e5cb77c Fixed parts which caused a shadowing warning (same variable name used for different variables in the same scope)
jpekkila
2019-12-03 18:41:08 +02:00
04e27e85b2 Removed MPI from the core library dependencies: instead, one should use the appropriate MPI compiler wrapper for compiling host code by passing something like -DCMAKE_C_COMPILER=/appl/opt/openmpi/3.1.3-cuda/gcc/7.3.0/bin/mpicc -DCMAKE_CXX_COMPILER=/appl/opt/openmpi/3.1.3-cuda/gcc/7.3.0/bin/mpicxx to cmake
jpekkila
2019-12-03 18:40:15 +02:00
f14e35620c Now nvcc is used to compile kernels only. All host code, including device.cc and the MPI communication code, is now compiled with the host C++ compiler. This should work around an nvcc/MPI bug on Puhti.
jpekkila
2019-12-03 15:12:17 +02:00
8bffb2a1d0 Fixed ambiguous logic in acNodeStoreVertexBufferWithOffset; now the halos of arbitrary GPUs do not overwrite valid data from the computational domain of a neighboring GPU. Also disabled p2p transfers temporarily until I figure out a clean way to avoid cudaErrorPeerAccessAlreadyEnabled errors
jpekkila
2019-12-02 12:58:09 +02:00
0178d4788c The core library now links to the CXX MPI library instead of the C one
jpekkila
2019-11-27 14:51:49 +02:00
ab539a98d6 Replaced old deprecated instances of DCONST_INT with DCONST
jpekkila
2019-11-27 13:48:42 +02:00
1270332f48 Fixed a small mistake in the last merge
jpekkila
2019-11-27 11:58:14 +02:00
3d35897601 The structure holding an abstract syntax tree node (acc) was not properly initialized to 0; fixed
Johannes Pekkila
2019-11-27 09:16:32 +01:00
5e3caf086e Device id is now properly set when using MPI and there are multiple visible GPUs per node
jpekkila
2019-11-26 16:54:56 +02:00
53695d66a3 Benchmarking now also prints out percentiles
jpekkila
2019-11-26 16:26:31 +02:00
0b0ccd697a Added some explicit casts in get_neighbor (MPI) to fix warnings raised when compiling with older gcc
jpekkila
2019-11-20 10:18:10 +02:00
d3260edd2a Can now plot the magnetic field and streamlines. Also some other minor improvements.
Miikka Vaisala
2019-11-04 11:27:53 +08:00
981331e7d7 Benchmark results now written out to a file
Johannes Pekkila
2019-10-24 15:53:08 +02:00
4ffde83215 Set default values for benchmarking
Johannes Pekkila
2019-10-24 15:22:47 +02:00
8894b7c7d6 Added a function for getting the pid of a neighboring process when decomposing in 3D
Johannes Pekkila
2019-10-23 19:26:35 +02:00
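An illustrative version of such a helper (not necessarily the committed one): map the process id to 3D coordinates in the decomposition, apply the offset with periodic wrapping, and map back.

    #include <cuda_runtime.h> // int3

    // decomp = number of processes per dimension, offset in {-1, 0, 1}^3
    static int
    get_neighbor_pid(const int pid, const int3 decomp, const int3 offset)
    {
        const int px = pid % decomp.x;
        const int py = (pid / decomp.x) % decomp.y;
        const int pz = pid / (decomp.x * decomp.y);

        const int nx = (px + offset.x + decomp.x) % decomp.x;
        const int ny = (py + offset.y + decomp.y) % decomp.y;
        const int nz = (pz + offset.z + decomp.z) % decomp.z;

        return nx + ny * decomp.x + nz * decomp.x * decomp.y;
    }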
474bdf185d Cleaned up the MPI solution for the 3D decomp test
Johannes Pekkila
2019-10-23 12:33:46 +02:00
1d81333ff7 More concurrent kernels and MPI comm
Johannes Pekkila
2019-10-23 12:07:23 +02:00
04867334e7 Full integration step with MPI comms
Johannes Pekkila
2019-10-22 19:59:15 +02:00
870cd91b5f Added the final MPI solution for the benchmark tests: RDMA is now used and I don't think we can go much faster with the current decomposition scheme. To get better scaling, we would probably have to switch to 3D decomposition instead of the current simple 1D decomp
Johannes Pekkila
2019-10-22 19:28:35 +02:00
64221c218d Made some warnings go away
jpekkila
2019-10-22 15:03:55 +03:00
e4a7cdcf1d Added functions for packing and unpacking data on the device
Johannes Pekkila
2019-10-22 13:48:47 +02:00
915e1c7c14 Trying to overlap MPI communication with the computation of boundary conditions. However, NVIDIA seems to have left one important detail out of the CUDA-aware MPI documentation: it looks like CUDA streams are not supported with CUDA-aware MPI communication. So in the end the fastest solution might be to use old-school gpu->cpu->cpu->gpu MPI communication after all
Johannes Pekkila
2019-10-21 15:50:53 +02:00
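The old-school path mentioned above, sketched with illustrative names and simple blocking copies: stage the exchange through host buffers around the MPI call instead of handing device pointers to MPI.

    #include <cuda_runtime.h>
    #include <mpi.h>

    void
    exchange_via_host(const double* d_send, double* d_recv, double* h_send, double* h_recv,
                      const int count, const int peer)
    {
        // Device -> host, exchange host buffers over MPI, host -> device
        cudaMemcpy(h_send, d_send, count * sizeof(double), cudaMemcpyDeviceToHost);
        MPI_Sendrecv(h_send, count, MPI_DOUBLE, peer, 0,
                     h_recv, count, MPI_DOUBLE, peer, 0,
                     MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        cudaMemcpy(d_recv, h_recv, count * sizeof(double), cudaMemcpyHostToDevice);
    }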
f120343110 Bugfix: peer access was not disabled when a Node was destroyed, leading to a cudaErrorPeerAccessAlreadyEnabled error when creating new Nodes
jpekkila
2019-10-21 16:23:24 +03:00
7b475b6dee Better MPI synchronization
Johannes Pekkila
2019-10-18 11:50:22 +02:00
f3cb6e7049 Removed old unused tokens from the DSL grammar
jpekkila
2019-10-18 02:14:19 +03:00
7c79a98cdc Added support for various binary operators (>=, <=, /= etc.). Bitwise operators | and & are also now allowed
jpekkila
2019-10-18 01:52:14 +03:00
155d369888 MPI communication now 10x faster
Johannes Pekkila
2019-10-17 22:39:57 +02:00