jpekkila | cc933a0949 | README.md edited online with Bitbucket. Consistent headings and another attempt at linking. | 2020-01-13 16:26:06 +00:00
jpekkila | b6451c4b82 | Fixed hyperlinks in README.md | 2020-01-13 16:22:22 +00:00
jpekkila | 74f68d4371 | CONTRIBUTING.md created online with Bitbucket | 2020-01-13 16:16:55 +00:00
jpekkila | bd640a8ff5 | Removed unnecessary linebreaks from README.md. | 2020-01-13 15:31:05 +00:00
jpekkila | 785230053d | Rewrote README.md | 2020-01-13 15:27:24 +00:00
jpekkila | 92a6a1bdec | Added more professional run flags to ./ac_run | 2020-01-13 15:35:01 +02:00
jpekkila | 794e4393c3 | Added a new function to the legacy Astaroth layer: acGetNode(). This function returns a Node, which can be used to access acNode layer functions. | 2020-01-13 11:33:15 +02:00
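The hand-off might look roughly like this. A minimal sketch: only acGetNode() and the Node type are named by the commit; acInit(), acQuit(), AcMeshInfo, and acNodeSynchronizeStream() are assumptions about the surrounding API.

```c
#include "astaroth.h" /* assumed public header */

int
main(void)
{
    AcMeshInfo info = {0};   /* mesh configuration (contents elided) */
    acInit(info);            /* legacy-layer setup, signature assumed */

    /* New in this commit: fetch the Node behind the legacy layer... */
    Node node = acGetNode();

    /* ...and pass it to acNode layer functions directly (call assumed). */
    acNodeSynchronizeStream(node, STREAM_ALL);

    acQuit();
    return 0;
}
```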
jpekkila | 1d315732e0 | Giving up on 3D decomposition with CUDA-aware MPI. The MPI implementation on Puhti seems to be painfully bugged: device pointers are not tracked properly in some cases (e.g. when an array of structures contains CUDA pointers). Going to implement the 3D decomposition the traditional way for now (communicating via the CPU). It will be easy to switch back to CUDA-aware MPI once Mellanox/NVIDIA/CSC have fixed their software. | 2020-01-07 21:06:27 +02:00
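"Communicating via the CPU" typically means staging device halos through host buffers so MPI never sees a device pointer. A hedged sketch of one direction of such a transfer; all names and sizes are illustrative, not Astaroth's actual code.

```c
#include <cuda_runtime.h>
#include <mpi.h>

/* Stage a device-side halo through a host buffer before sending. */
static void
isend_halo_via_host(const double* d_halo, double* h_staging, const size_t count,
                    const int peer, MPI_Request* req)
{
    /* Device -> host first; only the host pointer is handed to MPI. */
    cudaMemcpy(h_staging, d_halo, count * sizeof(double),
               cudaMemcpyDeviceToHost);
    MPI_Isend(h_staging, (int)count, MPI_DOUBLE, peer, 0, MPI_COMM_WORLD, req);
}
```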
jpekkila | 299ff5cb67 | All fields are now packed to simplify communication | 2020-01-07 21:01:22 +02:00
jpekkila | 5d60791f13 | Current 3D decomp method still too complicated. Starting again from scratch. | 2020-01-07 14:40:51 +02:00
jpekkila | eaee81bf06 | Merge branch 'master' into 3d-decomposition-2020-01 | 2020-01-07 14:25:06 +02:00
jpekkila | f0208c66a6 | Now also compiles for P100 by default (this was accidentally removed in earlier commits) | 2020-01-07 10:29:44 +00:00
jpekkila | 1dbcc469fc | Allocations for packed data (MPI) | 2020-01-05 18:57:14 +02:00
jpekkila | bee930b151 | Merge branch 'master' into 3d-decomposition-2020-01 | 2020-01-05 16:48:26 +02:00
jpekkila | be7946c2af | Added the multiplication operator for int3 structures | 2020-01-05 16:47:28 +02:00
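A component-wise multiplication operator for CUDA's int3 could be defined along these lines; a sketch, the exact definition used in the commit may differ.

```cuda
#include <cuda_runtime.h> // int3, make_int3

// Component-wise multiplication for int3, usable on host and device.
static __host__ __device__ inline int3
operator*(const int3& a, const int3& b)
{
    return make_int3(a.x * b.x, a.y * b.y, a.z * b.z);
}
```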
jpekkila | 51b48a5a36 | Some intermediate MPI changes | 2020-01-05 16:46:40 +02:00
jpekkila | d6c81c89fb | This 3D blocking approach is getting too complicated; removed the code and trying again | 2019-12-28 16:38:10 +02:00
jpekkila | e86b082c98 | MPI transfer for the first corner with 3D blocking now complete. Disabled/enabled some error checking for development | 2019-12-27 13:43:22 +02:00
jpekkila | bd0cc3ee20 | There was some kind of mismatch between the CUDA and MPI (UCX) libraries when linking with cudart. Switching to the one provided by CMake fixed the issue. | 2019-12-27 13:41:18 +02:00
jpekkila | 6b5910f7df | Added allocations for the packed buffers | 2019-12-21 19:00:35 +02:00
jpekkila | 57a1f3e30c | Added a generic pack/unpack function | 2019-12-21 16:20:40 +02:00
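A generic pack function for this kind of halo exchange usually copies a 3D sub-block into a contiguous buffer. A hedged CUDA sketch: the field dimensions mm and the block offset/extent parameters are illustrative, not Astaroth's actual signature.

```cuda
// Copy the sub-block starting at `offset` with size `extent` out of a field
// of dimensions `mm` into a contiguous buffer. Unpacking is the same kernel
// with the src/dst roles swapped.
__global__ void
pack_kernel(const double* __restrict__ field, const int3 mm, const int3 offset,
            const int3 extent, double* __restrict__ packed)
{
    const int i = threadIdx.x + blockIdx.x * blockDim.x;
    const int j = threadIdx.y + blockIdx.y * blockDim.y;
    const int k = threadIdx.z + blockIdx.z * blockDim.z;
    if (i >= extent.x || j >= extent.y || k >= extent.z)
        return;

    const size_t src = (offset.x + i) + (offset.y + j) * (size_t)mm.x +
                       (offset.z + k) * (size_t)mm.x * mm.y;
    const size_t dst = i + j * (size_t)extent.x +
                       k * (size_t)extent.x * extent.y;
    packed[dst] = field[src];
}
```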
jpekkila | e4f7214b3a | benchmark.cc edited online with Bitbucket | 2019-12-21 11:26:54 +00:00
jpekkila | 3ecd47fe8b | Merge branch 'master' into 3d-decomposition-2020-01 | 2019-12-21 13:22:45 +02:00
jpekkila | 35b56029cf | The build failed with single precision; added the correct casts to modelsolver.c | 2019-12-21 13:21:56 +02:00
jpekkila | 4d873caf38 | Changed the utils CMakeLists.txt to modern CMake style | 2019-12-21 13:16:08 +02:00
jpekkila | bad64f5307 | Started the 3D decomposition branch. Four tasks: 1) determine how to distribute the work given n processes, 2) distribute and gather the mesh to/from these processes, 3) create packing/unpacking functions, and 4) transfer packed data blocks between neighbors. Tasks 1 and 2 done with this commit. | 2019-12-21 12:37:01 +02:00
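For task 1, MPI itself offers a balanced factorization of n processes into a 3D grid. A sketch assuming MPI_Dims_create's heuristic is acceptable; the commit does not say which heuristic Astaroth actually uses.

```c
#include <mpi.h>
#include <stdio.h>

int
main(int argc, char** argv)
{
    MPI_Init(&argc, &argv);

    int nprocs;
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

    int dims[3] = {0, 0, 0}; /* zeros let MPI choose each dimension */
    MPI_Dims_create(nprocs, 3, dims);
    printf("decomposition: %d x %d x %d\n", dims[0], dims[1], dims[2]);

    MPI_Finalize();
    return 0;
}
```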
jpekkila | ecff5c3041 | Added some final changes to benchmarking | 2019-12-15 21:47:41 +02:00
jpekkila | 8bd81db63c | Added CPU parallelization to make CPU integration and boundconds faster | 2019-12-14 15:45:42 +02:00
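The commit does not name the mechanism, but CPU parallelization of grid loops like these is typically done with OpenMP. A hedged sketch with placeholder dimensions and a placeholder update, not Astaroth's actual integrator.

```c
#include <stddef.h>

/* Parallelize the outer grid loops across CPU threads with OpenMP. */
static void
integrate_cpu(double* field, const int nx, const int ny, const int nz)
{
#pragma omp parallel for collapse(2)
    for (int k = 0; k < nz; ++k)
        for (int j = 0; j < ny; ++j)
            for (int i = 0; i < nx; ++i) {
                const size_t idx = i + j * (size_t)nx + k * (size_t)nx * ny;
                field[idx] += 1.0; /* stand-in for the real stencil update */
            }
}
```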
jpekkila | ff35d78509 | Rewrote the MPI benchmark-verification function | 2019-12-14 15:26:19 +02:00
jpekkila | f0e77181df | Benchmark fine-tuning | 2019-12-14 14:52:06 +02:00
jpekkila | b8a997b0ab | Added code for doing a proper verification run with MPI. Passes nicely with full MHD + upwinding when using the new utility stuff introduced in the previous commits. Note: forcing is not enabled in the utility library by default. | 2019-12-14 07:37:59 +02:00
jpekkila | 277905aafb | Added a model integrator to the utility library (written in pure C). Requires support for AVX vector instructions. | 2019-12-14 07:34:33 +02:00
jpekkila | 22a3105068 | Finished the latest version of autotesting (utility library). Uses ULPs to determine the acceptable error instead of the relative error used previously. | 2019-12-14 07:27:11 +02:00
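ULP-based comparison reinterprets the IEEE-754 bit patterns as ordered integers and bounds their distance in representable values. A sketch of the idea, not necessarily the utility library's exact implementation.

```c
#include <math.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Map a double's bit pattern to an integer whose ordering matches the
 * ordering of the floats (the usual sign-magnitude flip). */
static int64_t
ordered_bits(const double x)
{
    int64_t bits;
    memcpy(&bits, &x, sizeof(bits));
    return bits < 0 ? INT64_MIN - bits : bits;
}

/* Accept a and b if they are at most max_ulps representable doubles apart.
 * (Integer overflow for wildly differing values is ignored in this sketch.) */
static int
within_ulps(const double a, const double b, const int64_t max_ulps)
{
    if (isnan(a) || isnan(b))
        return 0;
    return llabs(ordered_bits(a) - ordered_bits(b)) <= max_ulps;
}
```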
jpekkila | 5ec2f6ad75 | Better wording in config_loader.c | 2019-12-14 07:23:25 +02:00
jpekkila | 164d11bfca | Removed flush-to-zero flags from kernel compilation. No significant effect on performance but may affect accuracy in some cases | 2019-12-14 07:22:14 +02:00
jpekkila | 6b38ef461a | GPUDirect on Puhti fails for some reason if the cuda library is linked instead of cudart | 2019-12-11 17:26:21 +02:00
jpekkila | a1a2d838ea | Merge branch 'master' of https://bitbucket.org/jpekkila/astaroth | 2019-12-08 23:22:51 +02:00
jpekkila | 752f44b0a7 | Second attempt at getting the Bitbucket build to compile | 2019-12-08 23:22:33 +02:00
jpekkila | 420f8b9e06 | The MPI benchmark now writes out the 95th percentile instead of the average running time | 2019-12-08 23:12:23 +02:00
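A 95th-percentile reduction over the timing samples can be as simple as sorting and indexing. A sketch; the nearest-rank index convention used by the actual benchmark is an assumption.

```c
#include <stdlib.h>

static int
cmp_double(const void* a, const void* b)
{
    const double x = *(const double*)a;
    const double y = *(const double*)b;
    return (x > y) - (x < y);
}

/* Sort the timing samples and pick the 95th percentile. */
static double
percentile95(double* samples, const size_t n)
{
    qsort(samples, n, sizeof(double), cmp_double);
    return samples[(size_t)(0.95 * (double)(n - 1))];
}
```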
jpekkila | 90f85069c6 | The Bitbucket Pipelines build fails because the CUDA include dir does not seem to be included for some reason. This is an attempted fix | 2019-12-08 23:08:45 +02:00
jpekkila | 2ab605e125 | Added the default test case for MPI benchmarks | 2019-12-05 18:14:36 +02:00
jpekkila | d136834219 | Re-enabled and updated MPI integration with the proper synchronization from earlier commits; removed obsolete code. Should now work and be ready for benchmarks | 2019-12-05 16:48:45 +02:00
jpekkila | f16826f2cd | Removed old code | 2019-12-05 16:40:48 +02:00
jpekkila | 9f4742bafe | Fixed the UCX warning from the last commit. The indexing of MPI_Waitall was wrong, and UCX also requires that the MPI_Isend requests are waited on explicitly, even though they should complete implicitly at the same time as the matching MPI_Irecv. | 2019-12-05 16:40:30 +02:00
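The fix described here amounts to keeping the send requests alongside the receive requests and waiting on all of them. A hedged sketch: the neighbor count, ranks, buffers, and counts are placeholders, not Astaroth's actual halo-exchange code.

```c
#include <mpi.h>

#define NUM_NEIGHBORS 26 /* assumption: full 3D halo, 26 neighbors */

/* Collect BOTH the MPI_Irecv and MPI_Isend requests and pass every one of
 * them to MPI_Waitall, instead of waiting on the receives only. */
static void
exchange_halos(double* send_buf[], double* recv_buf[],
               const int neighbor[], const int count)
{
    MPI_Request reqs[2 * NUM_NEIGHBORS];
    int nreqs = 0;

    for (int n = 0; n < NUM_NEIGHBORS; ++n) {
        MPI_Irecv(recv_buf[n], count, MPI_DOUBLE, neighbor[n], 0,
                  MPI_COMM_WORLD, &reqs[nreqs++]);
        MPI_Isend(send_buf[n], count, MPI_DOUBLE, neighbor[n], 0,
                  MPI_COMM_WORLD, &reqs[nreqs++]);
    }
    /* Waiting on the sends as well returns their requests to UCX's pool. */
    MPI_Waitall(nreqs, reqs, MPI_STATUSES_IGNORE);
}
```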
jpekkila | e47cfad6b5 | MPI now compiles and runs on Puhti; the basic verification test with boundary transfers is OK. Gives an "UCX WARN object 0x2fa7780 was not returned to mpool ucp_requests" warning, though, which seems to indicate that not all asynchronous MPI calls finished before MPI_Finalize | 2019-12-05 16:17:17 +02:00
jpekkila | 9d70a29ae0 | The minimum CMake version is now 3.9, which is required for proper CUDA & MPI support. Older versions of CMake are very buggy when compiling CUDA, and working around all their quirks is a pain in the neck. | 2019-12-05 15:35:51 +02:00
jpekkila | e99a428dec | OpenMP is now properly linked with the standalone without propagating it to nvcc (which would cause an error) | 2019-12-05 15:30:48 +02:00
jpekkila | 9adb9dc38a | Disabled MPI integration temporarily and enabled verification for MPI tests | 2019-12-04 15:11:40 +02:00
jpekkila | 6a250f0572 | Rewrote the core CMakeLists.txt for CMake versions with proper CUDA & MPI support (3.9+) | 2019-12-04 15:09:38 +02:00
jpekkila | 0ea2fa9337 | Cleaner MPI linking with the core library. Requires CMake 3.9+, though; might have to modify it later to work with older versions. | 2019-12-04 13:49:38 +02:00