Johannes Pekkila
|
915e1c7c14
|
Trying to overlap MPI communication with the computation of boundary conditions. However, NVIDIA seems to have left one important detail out of the CUDA-aware MPI documentation: it looks like CUDA streams are not supported with CUDA-aware MPI communication. So in the end the fastest solution might be to use old-school gpu->cpu->cpu->gpu MPI communication after all.
|
2019-10-21 15:50:53 +02:00 |
|
jpekkila
|
f120343110
|
Bugfix: peer access was not disabled when Node was destroyed, leading to cudaErrorPeerAccessAlreadyEnabled error when creating new Nodes
|
2019-10-21 16:23:24 +03:00 |
|
Johannes Pekkila
|
7b475b6dee
|
Better MPI synchronization
|
2019-10-18 11:50:22 +02:00 |
|
jpekkila
|
f3cb6e7049
|
Removed old unused tokens from the DSL grammar
|
2019-10-18 02:14:19 +03:00 |
|
jpekkila
|
0f5acfbb33
|
Merge branch 'master' of https://bitbucket.org/jpekkila/astaroth
|
2019-10-18 02:06:15 +03:00 |
|
jpekkila
|
7c79a98cdc
|
Added support for various binary operations (>=, <=, /= etc). Also bitwise operators | and & are now allowed
|
2019-10-18 01:52:14 +03:00 |
|
Johannes Pekkila
|
155d369888
|
MPI communication now 10x faster
|
2019-10-17 22:39:57 +02:00 |
|
jpekkila
|
26bbfa089d
|
Better multi-node communication: fire and forget.
|
2019-10-17 18:17:37 +03:00 |
|
jpekkila
|
3d852e5082
|
Added timing to the MPI benchmark
|
2019-10-17 17:43:54 +03:00 |
|
jpekkila
|
e0a631d81a
|
Added the hires timer to utils
|
2019-10-17 17:43:34 +03:00 |
|
jpekkila
|
588a94c772
|
Added more MPI stuff. Multi-node GPU-GPU communication with GPUDirect RDMA should now work. Device memory is also now allocated in unified memory by default, as this makes MPI communication simpler if RDMA is not supported. This does not affect Astaroth in any other way, since different devices use different portions of the memory space and we continue managing memory transfers manually.
|
2019-10-17 16:09:05 +03:00 |
|
jpekkila
|
0e88d6c339
|
Marked some internal functions static
|
2019-10-17 14:41:44 +03:00 |
|
jpekkila
|
7390d53f79
|
Added missing extern "C" declarations to verification.h
|
2019-10-17 14:41:13 +03:00 |
|
jpekkila
|
f1e988ba6a
|
Added stuff for the device layer for testing GPU-GPU MPI. This is a quick and dirty solution which is primarily meant for benchmarking/verification. Figuring out what the MPI interface should look like is more challenging and is not the priority right now
|
2019-10-17 14:40:53 +03:00 |
|
jpekkila
|
bb9e65a741
|
AC_DEFAULT_CONFIG now propagated to projects that link to astaroth utils
|
2019-10-17 13:05:17 +03:00 |
|
jpekkila
|
859195eda4
|
exampleproject no longer compiled with astaroth utils
|
2019-10-17 13:04:39 +03:00 |
|
jpekkila
|
65a2d47ef7
|
Made grid.cu (multi-node) compile without errors. It is not used, though.
|
2019-10-17 13:03:42 +03:00 |
|
jpekkila
|
ef94ab5b96
|
A small update to ctest
|
2019-10-17 13:02:41 +03:00 |
|
jpekkila
|
4fcf9d861f
|
More undeprecated/deprecated fixes
|
2019-10-15 19:46:57 +03:00 |
|
jpekkila
|
0865f0499b
|
Various improvements to the MPI-GPU implementation, but linking MPI libraries with both the host C-project and the core library seems to be a major pain. Currently the communication is done via gpu->cpu->cpu->gpu.
|
2019-10-15 19:32:16 +03:00 |
|
jpekkila
|
113be456d6
|
Undeprecated the wrong function in commit b693c8a
|
2019-10-15 18:11:07 +03:00 |
|
jpekkila
|
1ca089c163
|
New cmake option: MPI_ENABLED. Enables MPI functions on the device layer
|
2019-10-15 17:57:53 +03:00 |
|
jpekkila
|
0d02faa5f5
|
Working base for gathering, distributing and communicating halos with MPI
|
2019-10-15 17:39:26 +03:00 |
|
jpekkila
|
b11ef143eb
|
Moved a debug print further to reduce clutter
|
2019-10-15 17:38:29 +03:00 |
|
jpekkila
|
fd9dc7ca98
|
Added periodic boundconds to utils
|
2019-10-15 17:37:57 +03:00 |
|
jpekkila
|
ff1ad37047
|
Some small improvements to the utils library
|
2019-10-15 17:00:58 +03:00 |
|
jpekkila
|
46ad9da8c8
|
Pulled some stuff from the mpi branch
|
2019-10-15 17:00:44 +03:00 |
|
jpekkila
|
4ae9c74d9d
|
Added a function for randomizing vertex buffers (useful for testing)
|
2019-10-15 16:13:11 +03:00 |
|
jpekkila
|
37171689c8
|
Formatting
|
2019-10-15 16:12:44 +03:00 |
|
jpekkila
|
b693c8adb4
|
Undeprecated acDeviceLoadMesh and acDeviceStoreMesh, these are actually very nice to have
|
2019-10-15 16:12:31 +03:00 |
|
jpekkila
|
8d86ac6f9e
|
Started preparing the MPI version for benchmarks and added a solve-independent version of the verification functions to the utils library
|
2019-10-15 15:54:15 +03:00 |
|
jpekkila
|
08188f3f5b
|
is_valid is now consistently overloaded (parameter passed as a reference). Older CUDA compilers complained about this.
|
2019-10-14 21:18:21 +03:00 |
|
jpekkila
|
b667735906
|
Removed debug prints from the preprocessing script
|
2019-10-08 00:31:15 +03:00 |
|
jpekkila
|
44a86f5e80
|
acc: Removed debug prints, old code. Also the scope of the declarations made inside a for statement is now properly tracked
|
2019-10-08 00:20:57 +03:00 |
|
jpekkila
|
08f155cbec
|
Fine-tuning some error checks
|
2019-10-07 20:40:32 +03:00 |
|
jpekkila
|
ea4438f331
|
Adapted the old example of helical forcing with profiles to conform with the revised syntax
|
2019-10-07 19:43:25 +03:00 |
|
jpekkila
|
0cc5bdaa08
|
Added support for ScalarArrays back
|
2019-10-07 19:42:24 +03:00 |
|
jpekkila
|
5d4f47c3d2
|
Added overloads for vector in-place addition and subtraction
|
2019-10-07 19:40:54 +03:00 |
|
jpekkila
|
ba49e7e400
|
Replaced deprecated DCONST_INT calls with overloaded DCONST()
|
2019-10-07 19:40:27 +03:00 |
|
jpekkila
|
9c575f8059
|
Merge branch 'master' into acc_rewrite_20191002
|
2019-10-07 18:28:33 +03:00 |
|
jpekkila
|
ff12332f06
|
Clarified the syntax for real number literals. 1.0 is the same precision as AcReal, 1.0f is an explicit float and 1.0d is an explicit double.
|
2019-10-07 18:24:32 +03:00 |
|
jpekkila
|
ffb139883f
|
API_specification_and_user_manual.md edited online with Bitbucket
|
2019-10-07 15:22:26 +00:00 |
|
jpekkila
|
aa6c2b23d9
|
Built-in parameters are now added during compilation instead of defining them in CUDA sources. IMPORTANT: DCONST macro should no longer be used when accessing built-in variables. Now all uniforms are consistently accessed with the handle only
|
2019-10-07 17:39:27 +03:00 |
|
jpekkila
|
3fe7b62d3e
|
Removed the old accrevision directory
|
2019-10-07 17:37:09 +03:00 |
|
jpekkila
|
6560be7056
|
Moved the old mhd solver to mhd_solver_DEPRECATED and replaced it with the new stencil_kernel.ac file
|
2019-10-07 17:36:30 +03:00 |
|
jpekkila
|
8c1e603a98
|
On second thought, let's revert the changes in mhd_solver and use the file I already modified instead of doing the same changes twice
|
2019-10-07 17:29:53 +03:00 |
|
jpekkila
|
16c8b1e748
|
Autoformatting
|
2019-10-07 17:17:58 +03:00 |
|
jpekkila
|
c8e0586b60
|
Renamed the old .sas and .sdh files to regular headers and added #pragma once.
|
2019-10-07 17:17:26 +03:00 |
|
jpekkila
|
ee4ff730f6
|
Deprecated inv_dsx and friends from utils/config_loader.c since those are not defined in the case where the user does not include stdderiv.h
|
2019-10-07 17:01:21 +03:00 |
|
jpekkila
|
66cfcefb34
|
More error checks
|
2019-10-07 17:00:23 +03:00 |
|