Commit Graph

580 Commits

Author SHA1 Message Date
Johannes Pekkila
1d81333ff7 More concurrent kernels and MPI comm 2019-10-23 12:07:23 +02:00
Johannes Pekkila
04867334e7 Full integration step with MPI comms 2019-10-22 19:59:15 +02:00
Johannes Pekkila
870cd91b5f Added the final MPI solution for the benchmark tests: RDMA is now used and I don't think we can go much faster with the current decomposition scheme. To get better scaling, we would probably have to switch to a 3D decomposition instead of the current simple 1D decomposition 2019-10-22 19:28:35 +02:00
jpekkila
3d7ad7c8f2 Code cleanup 2019-10-22 15:38:34 +03:00
jpekkila
64221c218d Made some warnings go away 2019-10-22 15:03:55 +03:00
Johannes Pekkila
e4a7cdcf1d Added functions for packing and unpacking data on the device 2019-10-22 13:48:47 +02:00
Johannes Pekkila
915e1c7c14 Trying to overlap MPI communication with computation of boundary conditions. However, NVIDIA seemed to forget one important detail in the documentation for CUDA-aware MPI: it looks like CUDA streams are not supported with CUDA-aware MPI communication. So in the end the fastest solution might be to use old-school gpu->cpu->cpu->gpu MPI communication after all 2019-10-21 15:50:53 +02:00
jpekkila
f120343110 Bugfix: peer access was not disabled when Node was destroyed, leading to cudaErrorPeerAccessAlreadyEnabled error when creating new Nodes 2019-10-21 16:23:24 +03:00
Johannes Pekkila
7b475b6dee Better MPI synchronization 2019-10-18 11:50:22 +02:00
jpekkila
f3cb6e7049 Removed old unused tokens from the DSL grammar 2019-10-18 02:14:19 +03:00
jpekkila
0f5acfbb33 Merge branch 'master' of https://bitbucket.org/jpekkila/astaroth 2019-10-18 02:06:15 +03:00
jpekkila
7c79a98cdc Added support for various binary operations (>=, <=, /= etc). Also bitwise operators | and & are now allowed 2019-10-18 01:52:14 +03:00
Johannes Pekkila
155d369888 MPI communication now 10x faster 2019-10-17 22:39:57 +02:00
jpekkila
26bbfa089d Better multi-node communication: fire and forget. 2019-10-17 18:17:37 +03:00
jpekkila
3d852e5082 Added timing to the MPI benchmark 2019-10-17 17:43:54 +03:00
jpekkila
e0a631d81a Added the hires timer to utils 2019-10-17 17:43:34 +03:00
jpekkila
588a94c772 Added more MPI stuff. Now multi-node GPU-GPU communication with GPUDirect RDMA should work. Also, device memory is now allocated in unified memory by default, as this makes MPI communication simpler if RDMA is not supported. This does not affect Astaroth in any other way, since different devices use different portions of the memory space and we continue managing memory transfers manually. 2019-10-17 16:09:05 +03:00
jpekkila
0e88d6c339 Marked some internal functions static 2019-10-17 14:41:44 +03:00
jpekkila
7390d53f79 Added missing extern "C" declarations to verification.h 2019-10-17 14:41:13 +03:00
jpekkila
f1e988ba6a Added stuff for the device layer for testing GPU-GPU MPI. This is a quick and dirty solution which is primarily meant for benchmarking/verification. Figuring out what the MPI interface should look like is more challenging and is not the priority right now 2019-10-17 14:40:53 +03:00
jpekkila
bb9e65a741 AC_DEFAULT_CONFIG now propagated to projects that link to astaroth utils 2019-10-17 13:05:17 +03:00
jpekkila
859195eda4 exampleproject no longer compiled with astaroth utils 2019-10-17 13:04:39 +03:00
jpekkila
65a2d47ef7 Made grid.cu (multi-node) compile without errors. Not used though. 2019-10-17 13:03:42 +03:00
jpekkila
ef94ab5b96 A small update to ctest 2019-10-17 13:02:41 +03:00
jpekkila
4fcf9d861f More undeprecated/deprecated fixes 2019-10-15 19:46:57 +03:00
jpekkila
0865f0499b Various improvements to the MPI-GPU implementation, but linking MPI libraries with both the host C-project and the core library seems to be a major pain. Currently the communication is done via gpu->cpu->cpu->gpu. 2019-10-15 19:32:16 +03:00
jpekkila
113be456d6 Undeprecated the wrong function in commit b693c8a 2019-10-15 18:11:07 +03:00
jpekkila
1ca089c163 New cmake option: MPI_ENABLED. Enables MPI functions on the device layer 2019-10-15 17:57:53 +03:00
jpekkila
0d02faa5f5 Working base for gathering, distributing and communicating halos with MPI 2019-10-15 17:39:26 +03:00
jpekkila
b11ef143eb Moved a debug print further to reduce clutter 2019-10-15 17:38:29 +03:00
jpekkila
fd9dc7ca98 Added periodic boundconds to utils 2019-10-15 17:37:57 +03:00
jpekkila
ff1ad37047 Some small improvements to the utils library 2019-10-15 17:00:58 +03:00
jpekkila
46ad9da8c8 Pulled some stuff from the mpi branch 2019-10-15 17:00:44 +03:00
jpekkila
4ae9c74d9d Added a function for randomizing vertex buffers (useful for testing) 2019-10-15 16:13:11 +03:00
jpekkila
37171689c8 Formatting 2019-10-15 16:12:44 +03:00
jpekkila
b693c8adb4 Undeprecated acDeviceLoadMesh and acDeviceStoreMesh, these are actually very nice to have 2019-10-15 16:12:31 +03:00
jpekkila
8d86ac6f9e Started preparing the MPI version for benchmarks and added a solve-independent version of the verification functions to the utils library 2019-10-15 15:54:15 +03:00
jpekkila
08188f3f5b is_valid is now consistently overloaded (parameter passed as a reference). Older CUDA compilers complained about this. 2019-10-14 21:18:21 +03:00
jpekkila
b667735906 Removed debug prints from the preprocessing script 2019-10-08 00:31:15 +03:00
jpekkila
44a86f5e80 acc: Removed debug prints, old code. Also the scope of the declarations made inside a for statement is now properly tracked 2019-10-08 00:20:57 +03:00
jpekkila
08f155cbec Finetuning some error checks 2019-10-07 20:40:32 +03:00
jpekkila
ea4438f331 Adapted the old example of helical forcing with profiles to conform with the revised syntax 2019-10-07 19:43:25 +03:00
jpekkila
0cc5bdaa08 Added support for ScalarArrays back 2019-10-07 19:42:24 +03:00
jpekkila
5d4f47c3d2 Added overloads for vector in-place addition and subtraction 2019-10-07 19:40:54 +03:00
jpekkila
ba49e7e400 Replaced deprecated DCONST_INT calls with overloaded DCONST() 2019-10-07 19:40:27 +03:00
jpekkila
9c575f8059 Merge branch 'master' into acc_rewrite_20191002 2019-10-07 18:28:33 +03:00
jpekkila
ff12332f06 Clarified the syntax for real number literals. 1.0 is the same precision as AcReal, 1.0f is an explicit float and 1.0d is an explicit double. 2019-10-07 18:24:32 +03:00
jpekkila
ffb139883f API_specification_and_user_manual.md edited online with Bitbucket 2019-10-07 15:22:26 +00:00
jpekkila
aa6c2b23d9 Built-in parameters are now added during compilation instead of defining them in CUDA sources. IMPORTANT: DCONST macro should no longer be used when accessing built-in variables. Now all uniforms are consistently accessed with the handle only 2019-10-07 17:39:27 +03:00
jpekkila
3fe7b62d3e Removed the old accrevision directory 2019-10-07 17:37:09 +03:00