185b33980f Forcing function bug correction.
Miikka Vaisala
2020-01-14 13:58:11 +08:00
ae0163b0e5 Missed one
jpekkila
2020-01-13 21:52:58 +02:00
5e1500fe97 Happy new year! :)
jpekkila
2020-01-13 21:38:07 +02:00
81aeff8b78 Updated the licence and made it .md
jpekkila
2020-01-13 21:35:14 +02:00
d51d48071f Updated documentation and made it work with Doxygen. Now the doc/doxygen/index.html generated with it looks quite good and contains lots of useful and up-to-date information about Astaroth
jpekkila
2020-01-13 21:11:04 +02:00
a6cf5a8b79 CONTRIBUTING.md edited online with Bitbucket
jpekkila
2020-01-13 16:39:26 +00:00
d01e20a3d9 README.md edited online with Bitbucket. Now the links work (had to append markdown-header-* to the link)
jpekkila
2020-01-13 16:34:57 +00:00
a85a9614e6 README.md edited online with Bitbucket. Now it's gotta work.
jpekkila
2020-01-13 16:30:47 +00:00
cc933a0949 README.md edited online with Bitbucket. Consistent headings and another attempt at linking.
jpekkila
2020-01-13 16:26:06 +00:00
b6451c4b82 Fixed hyperlinks in README.md
jpekkila
2020-01-13 16:22:22 +00:00
74f68d4371 CONTRIBUTING.md created online with Bitbucket
jpekkila
2020-01-13 16:16:55 +00:00
bd640a8ff5 Removed unnecessary linebreaks from README.md.
jpekkila
2020-01-13 15:31:05 +00:00
92a6a1bdec Added more professional run flags to ./ac_run
jpekkila
2020-01-13 15:35:01 +02:00
794e4393c3 Added a new function for the legacy Astaroth layer: acGetNode(). This function returns a Node, which can be used to access acNode layer functions
jpekkila
2020-01-13 11:33:15 +02:00
1d315732e0 Giving up on 3D decomposition with CUDA-aware MPI. The MPI implementation on Puhti seems to be painfully buggy; device pointers are not tracked properly in some cases (e.g. if there's an array of structures that contain CUDA pointers). Going to implement 3D decomp the traditional way for now (communicating via the CPU). It's easy to switch to CUDA-aware MPI once Mellanox/NVIDIA/CSC have fixed their software.
jpekkila
2020-01-07 21:06:22 +02:00
299ff5cb67 All fields are now packed to simplify communication
jpekkila
2020-01-07 21:01:22 +02:00
5d60791f13 Current 3D decomp method still too complicated. Starting again from scratch.
jpekkila
2020-01-07 14:40:32 +02:00
eaee81bf06 Merge branch 'master' into 3d-decomposition-2020-01
jpekkila
2020-01-07 14:25:06 +02:00
f0208c66a6 Now also compiles for P100 by default (this was accidentally removed in earlier commits)
jpekkila
2020-01-07 10:29:44 +00:00
1dbcc469fc Allocations for packed data (MPI)
jpekkila
2020-01-05 18:57:14 +02:00
bee930b151 Merge branch 'master' into 3d-decomposition-2020-01
jpekkila
2020-01-05 16:48:26 +02:00
be7946c2af Added the multiplication operator for int3 structures
jpekkila
2020-01-05 16:47:28 +02:00
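For illustration, such an operator is typically defined component-wise; the following is a minimal sketch (not necessarily the committed code), using CUDA's int3 and make_int3:

    #include <cuda_runtime.h> // int3, make_int3

    // Component-wise product, e.g. for scaling per-process block dimensions
    static __host__ __device__ inline int3
    operator*(const int3& a, const int3& b)
    {
        return make_int3(a.x * b.x, a.y * b.y, a.z * b.z);
    }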
d6c81c89fb This 3D blocking approach is getting too complicated; removed the code and trying again
jpekkila
2019-12-28 16:38:10 +02:00
e86b082c98 MPI transfer for the first corner with 3D blocking now complete. Disabled/enabled some error checking for development
jpekkila
2019-12-27 13:43:22 +02:00
bd0cc3ee20 There was some kind of mismatch between the CUDA and MPI (UCX) libraries when linking with cudart. Switching to the version provided by cmake fixed the issue.
jpekkila
2019-12-27 13:41:18 +02:00
6b5910f7df Added allocations for the packed buffers
jpekkila
2019-12-21 19:00:35 +02:00
57a1f3e30c Added a generic pack/unpack function
jpekkila
2019-12-21 16:20:40 +02:00
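As an illustration of what a generic packing function does (names, data layout, and real type below are assumptions, not Astaroth's actual API): a kernel of this kind copies a 3D sub-block of a field into a contiguous buffer, and unpacking is the mirror image.

    __global__ void
    pack_block(const double* __restrict__ field, double* __restrict__ packed,
               const int3 offset, const int3 block, const int3 dims)
    {
        const int i = threadIdx.x + blockIdx.x * blockDim.x;
        const int j = threadIdx.y + blockIdx.y * blockDim.y;
        const int k = threadIdx.z + blockIdx.z * blockDim.z;
        if (i >= block.x || j >= block.y || k >= block.z)
            return;

        // Source index in the full field, destination index in the contiguous buffer
        const size_t src = (offset.x + i) + (offset.y + j) * (size_t)dims.x +
                           (offset.z + k) * (size_t)dims.x * dims.y;
        const size_t dst = i + j * (size_t)block.x + k * (size_t)block.x * block.y;
        packed[dst] = field[src];
    }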
e4f7214b3a benchmark.cc edited online with Bitbucket
jpekkila
2019-12-21 11:26:54 +00:00
3ecd47fe8b Merge branch 'master' into 3d-decomposition-2020-01
jpekkila
2019-12-21 13:22:45 +02:00
35b56029cf Build failed with single precision; added the correct casts to modelsolver.c
jpekkila
2019-12-21 13:21:56 +02:00
4d873caf38 Changed the utils CMakeLists.txt to modern cmake style
jpekkila
2019-12-21 13:16:08 +02:00
bad64f5307 Started the 3D decomposition branch. Four tasks: 1) determine how to distribute the work given n processes, 2) distribute and gather the mesh to/from these processes, 3) create packing/unpacking functions, and 4) transfer packed data blocks between neighbors. Tasks 1 and 2 done with this commit.
jpekkila
2019-12-21 12:37:01 +02:00
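A hypothetical sketch of task 1, assuming the decomposition simply factorizes the process count so the local blocks stay as close to cubic as possible (the committed strategy may differ):

    #include <cuda_runtime.h> // int3, make_int3
    #include <limits.h>

    // nn = global grid dimensions, nprocs = number of MPI processes
    static int3
    decompose(const int nprocs, const int3 nn)
    {
        int3 best = make_int3(nprocs, 1, 1);
        long best_cost = LONG_MAX;
        for (int pz = 1; pz <= nprocs; ++pz) {
            if (nprocs % pz) continue;
            for (int py = 1; py <= nprocs / pz; ++py) {
                if ((nprocs / pz) % py) continue;
                const int px = nprocs / (pz * py);
                if (nn.x % px || nn.y % py || nn.z % pz) continue;
                // Cost: halo surface area of a local block (communication volume)
                const long lx = nn.x / px, ly = nn.y / py, lz = nn.z / pz;
                const long cost = 2 * (lx * ly + ly * lz + lz * lx);
                if (cost < best_cost) { best_cost = cost; best = make_int3(px, py, pz); }
            }
        }
        return best;
    }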
ecff5c3041 Added some final changes to benchmarking
jpekkila
2019-12-15 21:47:41 +02:00
8bd81db63c Added CPU parallelization to make CPU integration and boundconds faster
jpekkila
2019-12-14 15:45:42 +02:00
ff35d78509 Rewrote the MPI benchmark-verification function
jpekkila
2019-12-14 15:26:19 +02:00
b8a997b0ab Added code for doing a proper verification run with MPI. Passes nicely with full MHD + upwinding when using the new utility stuff introduced in the previous commits. Note: forcing is not enabled in the utility library by default.
jpekkila
2019-12-14 07:37:59 +02:00
277905aafb Added a model integrator to the utility library (written in pure C). Requires support for AVX vector instructions.
jpekkila
2019-12-14 07:34:33 +02:00
22a3105068 Finished the latest version of autotesting (utility library). Uses ulps to determine the acceptable error instead of the relative error used previously
jpekkila
2019-12-14 07:27:11 +02:00
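The ULP-based technique referred to above works roughly as follows: reinterpret the floating-point bit patterns as ordered integers and bound their distance. This is a generic sketch of the method, not necessarily the exact code in the utility library:

    #include <math.h>
    #include <stdint.h>
    #include <stdlib.h>
    #include <string.h>

    // Map a double to an integer such that adjacent representable doubles differ by 1
    static int64_t
    ordered_bits(const double x)
    {
        int64_t bits;
        memcpy(&bits, &x, sizeof(bits));
        return bits >= 0 ? bits : INT64_MIN - bits;
    }

    // Accept the candidate if it is within max_ulps representable values of the model result
    static bool
    within_ulps(const double model, const double candidate, const int64_t max_ulps)
    {
        if (isnan(model) || isnan(candidate))
            return false;
        return llabs(ordered_bits(model) - ordered_bits(candidate)) <= max_ulps;
    }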
5ec2f6ad75 Better wording in config_loader.c
jpekkila
2019-12-14 07:23:25 +02:00
164d11bfca Removed flush-to-zero flags from kernel compilation. No significant effect on performance but may affect accuracy in some cases
jpekkila
2019-12-14 07:22:14 +02:00
6b38ef461a Puhti GPUDirect fails for some reason if linking against the cuda library instead of cudart
jpekkila
2019-12-11 17:26:21 +02:00
752f44b0a7 Second attempt at getting the Bitbucket build to compile
jpekkila
2019-12-08 23:22:33 +02:00
420f8b9e06 MPI benchmark now writes out the 95th percentile instead of the average running time
jpekkila
2019-12-08 23:12:23 +02:00
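For reference, taking the 95th percentile of the measured running times amounts to something like the following illustrative helper (not the benchmark's actual code):

    #include <algorithm>
    #include <vector>

    // p in [0, 1]; e.g. percentile(times_ms, 0.95). Assumes samples is non-empty.
    static double
    percentile(std::vector<double> samples, const double p)
    {
        std::sort(samples.begin(), samples.end());
        const size_t i = std::min(samples.size() - 1, (size_t)(p * samples.size()));
        return samples[i];
    }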
90f85069c6 The Bitbucket Pipelines build fails because the CUDA include dir does not seem to be included for some reason. This is an attempted fix
jpekkila
2019-12-08 23:08:45 +02:00
2ab605e125 Added the default testcase for MPI benchmarks
jpekkila
2019-12-05 18:14:36 +02:00
d136834219 Re-enabled and updated MPI integration with the proper synchronization from earlier commits, and removed old code. Should now work and be ready for benchmarks
jpekkila
2019-12-05 16:48:45 +02:00
f16826f2cd Removed old code
jpekkila
2019-12-05 16:40:48 +02:00
9f4742bafe Fixed the UCX warning from the last commit. The indexing of MPI_Waitall was wrong, and UCX also requires that MPI_Isend requests are waited on, even though they should implicitly complete at the same time as the matching MPI_Irecv
jpekkila
2019-12-05 16:40:30 +02:00
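A sketch of the corrected pattern (buffer and neighbor bookkeeping is illustrative): both the MPI_Irecv and the MPI_Isend requests go into one correctly sized array and are completed with a single MPI_Waitall.

    #include <mpi.h>

    #define MAX_NEIGHBORS 26 // at most 26 neighbors in a 3D decomposition

    void
    exchange_halos(double* recv_bufs[], double* send_bufs[], const int counts[],
                   const int neighbors[], const int num_neighbors)
    {
        // Assumes num_neighbors <= MAX_NEIGHBORS
        MPI_Request reqs[2 * MAX_NEIGHBORS];
        int nreqs = 0;
        for (int i = 0; i < num_neighbors; ++i) {
            // Tags are assumed to be chosen consistently on both sides of each exchange
            MPI_Irecv(recv_bufs[i], counts[i], MPI_DOUBLE, neighbors[i], i,
                      MPI_COMM_WORLD, &reqs[nreqs++]);
            MPI_Isend(send_bufs[i], counts[i], MPI_DOUBLE, neighbors[i], i,
                      MPI_COMM_WORLD, &reqs[nreqs++]);
        }
        // Wait on the send requests as well; UCX warns if they are left uncompleted
        MPI_Waitall(nreqs, reqs, MPI_STATUSES_IGNORE);
    }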
e47cfad6b5 MPI now compiles and runs on Puhti; the basic verification test with boundary transfers passes. Gives an "UCX WARN object 0x2fa7780 was not returned to mpool ucp_requests" warning though, which seems to indicate that not all asynchronous MPI calls finished before MPI_Finalize
jpekkila
2019-12-05 16:15:37 +02:00
9d70a29ae0 Now the minimum cmake version is 3.9. This is required for proper CUDA & MPI support. Older versions of cmake are very buggy when compiling cuda and it's a pain in the neck to try and work around all the quirks.
jpekkila
2019-12-05 15:35:51 +02:00
e99a428dec OpenMP is now properly linked with the standalone without propagating it to nvcc (which would cause an error)
jpekkila
2019-12-05 15:30:48 +02:00
9adb9dc38a Disabled MPI integration temporarily and enabled verification for MPI tests
jpekkila
2019-12-04 15:11:40 +02:00
6a250f0572 Rewrote core CMakeLists.txt for cmake versions with proper CUDA & MPI support (3.9+)
jpekkila
2019-12-04 15:09:38 +02:00
0ea2fa9337 Cleaner MPI linking with the core library. Requires cmake 3.9+ though, might have to modify later to work with older versions.
jpekkila
2019-12-04 13:49:38 +02:00
6e63411170 Moved the definition of AC_DEFAULT_CONFIG to the root-level CMakeLists.txt. It should now be visible throughout the project.
jpekkila
2019-12-03 18:42:49 +02:00
f97e5cb77c Fixed parts which caused a shadowing warning (same variable name used for different variables in the same scope)
jpekkila
2019-12-03 18:41:08 +02:00
04e27e85b2 Removed MPI from the core library dependencies: instead, one should use the appropriate MPI compiler wrapper for compiling host code by passing something like -DCMAKE_C_COMPILER=/appl/opt/openmpi/3.1.3-cuda/gcc/7.3.0/bin/mpicc -DCMAKE_CXX_COMPILER=/appl/opt/openmpi/3.1.3-cuda/gcc/7.3.0/bin/mpicxx to cmake
jpekkila
2019-12-03 18:40:15 +02:00
f14e35620c Now nvcc is used to compile kernels only. All host code, including device.cc and the MPI communication code, is now compiled with the host C++ compiler. This should work around an nvcc/MPI bug on Puhti.
jpekkila
2019-12-03 15:12:17 +02:00
8bffb2a1d0 Fixed ambiguous logic in acNodeStoreVertexBufferWithOffset; now the halos of arbitrary GPUs do not overwrite valid data from the computational domain of a neighboring GPU. Also disabled p2p transfers temporarily until I figure out a clean way to avoid cudaErrorPeerAccessAlreadyEnabled errors
jpekkila
2019-12-02 12:58:09 +02:00
0178d4788c The core library now links to the CXX MPI library instead of the C one
jpekkila
2019-11-27 14:51:49 +02:00
ab539a98d6 Replaced old deprecated instances of DCONST_INT with DCONST
jpekkila
2019-11-27 13:48:42 +02:00
1270332f48 Fixed a small mistake in the last merge
jpekkila
2019-11-27 11:58:14 +02:00
3d35897601 The structure holding an abstract syntax tree node (acc) was not properly initialized to 0; fixed
Johannes Pekkila
2019-11-27 09:16:32 +01:00
5e3caf086e Device id is now properly set when using MPI and there are multiple visible GPUs per node
jpekkila
2019-11-26 16:54:56 +02:00
53695d66a3 Benchmarking now also prints out percentiles
jpekkila
2019-11-26 16:26:31 +02:00
0b0ccd697a Added some explicit casts in get_neighbor (MPI) to fix warnings raised when compiling with older gcc
jpekkila
2019-11-20 10:18:10 +02:00
d3260edd2a Can now plot the magnetic field and streamlines. Also some other minor improvements.
Miikka Vaisala
2019-11-04 11:27:53 +08:00
981331e7d7 Benchmark results now written out to a file
Johannes Pekkila
2019-10-24 15:53:08 +02:00
4ffde83215 Set default values for benchmarking
Johannes Pekkila
2019-10-24 15:22:47 +02:00
8894b7c7d6 Added a function for getting the pid of a neighboring process when decomposing in 3D
Johannes Pekkila
2019-10-23 19:26:35 +02:00
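An illustrative version of such a helper (not necessarily the committed one): map the process id to 3D coordinates in the decomposition, apply the offset with periodic wrapping, and map back.

    #include <cuda_runtime.h> // int3

    // decomp = number of processes per dimension, offset in {-1, 0, 1}^3
    static int
    get_neighbor_pid(const int pid, const int3 decomp, const int3 offset)
    {
        const int px = pid % decomp.x;
        const int py = (pid / decomp.x) % decomp.y;
        const int pz = pid / (decomp.x * decomp.y);

        const int nx = (px + offset.x + decomp.x) % decomp.x;
        const int ny = (py + offset.y + decomp.y) % decomp.y;
        const int nz = (pz + offset.z + decomp.z) % decomp.z;

        return nx + ny * decomp.x + nz * decomp.x * decomp.y;
    }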
474bdf185d Cleaned up the MPI solution for the 3D decomp test
Johannes Pekkila
2019-10-23 12:33:46 +02:00
1d81333ff7 More concurrent kernels and MPI comm
Johannes Pekkila
2019-10-23 12:07:23 +02:00
04867334e7 Full integration step with MPI comms
Johannes Pekkila
2019-10-22 19:59:15 +02:00
870cd91b5f Added the final MPI solution for the benchmark tests: RDMA is now used and I don't think we can go much faster with the current decomposition scheme. To get better scaling, we would probably have to switch to 3D decomposition instead of the current simple 1D decomp
Johannes Pekkila
2019-10-22 19:28:35 +02:00
64221c218d Made some warnings go away
jpekkila
2019-10-22 15:03:55 +03:00
e4a7cdcf1d Added functions for packing and unpacking data on the device
Johannes Pekkila
2019-10-22 13:48:47 +02:00
915e1c7c14 Trying to overlap MPI communication with the computation of boundary conditions. However, NVIDIA seems to have left one important detail out of the CUDA-aware MPI documentation: it looks like CUDA streams are not supported with CUDA-aware MPI communication. So in the end the fastest solution might be to use old-school gpu->cpu->cpu->gpu MPI communication after all
Johannes Pekkila
2019-10-21 15:50:53 +02:00
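The old-school path mentioned above, sketched with illustrative names and simple blocking copies: stage the exchange through host buffers around the MPI call instead of handing device pointers to MPI.

    #include <cuda_runtime.h>
    #include <mpi.h>

    void
    exchange_via_host(const double* d_send, double* d_recv, double* h_send, double* h_recv,
                      const int count, const int peer)
    {
        // Device -> host, exchange host buffers over MPI, host -> device
        cudaMemcpy(h_send, d_send, count * sizeof(double), cudaMemcpyDeviceToHost);
        MPI_Sendrecv(h_send, count, MPI_DOUBLE, peer, 0,
                     h_recv, count, MPI_DOUBLE, peer, 0,
                     MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        cudaMemcpy(d_recv, h_recv, count * sizeof(double), cudaMemcpyHostToDevice);
    }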
f120343110 Bugfix: peer access was not disabled when a Node was destroyed, leading to a cudaErrorPeerAccessAlreadyEnabled error when creating new Nodes
jpekkila
2019-10-21 16:23:24 +03:00
7b475b6dee Better MPI synchronization
Johannes Pekkila
2019-10-18 11:50:22 +02:00
f3cb6e7049 Removed old unused tokens from the DSL grammar
jpekkila
2019-10-18 02:14:19 +03:00
7c79a98cdc Added support for various binary operators (>=, <=, /= etc.). Bitwise operators | and & are also now allowed
jpekkila
2019-10-18 01:52:14 +03:00
155d369888 MPI communication now 10x faster
Johannes Pekkila
2019-10-17 22:39:57 +02:00