From 37cafd26aa767929a4c6d2ab3076ceb9ad0be80a Mon Sep 17 00:00:00 2001 From: jpekkila Date: Tue, 14 Jan 2020 14:44:06 +0200 Subject: [PATCH] Various small improvements to the website (navigation panel, better headings, formatting, etc) --- CONTRIBUTING.md | 3 - README.md | 7 +- .../API_specification_and_user_manual.md | 92 +++++++----------- doc/astaroth_logo_small.png | Bin 0 -> 1565 bytes doxyfile | 10 +- include/astaroth_device.h | 7 ++ 6 files changed, 47 insertions(+), 72 deletions(-) create mode 100644 doc/astaroth_logo_small.png diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md index 411f8c8..7b7f524 100644 --- a/CONTRIBUTING.md +++ b/CONTRIBUTING.md @@ -1,6 +1,3 @@ -Contributing -============ - # Contributing Contributions to Astaroth are very welcome! diff --git a/README.md b/README.md index 1d4fe76..827882e 100644 --- a/README.md +++ b/README.md @@ -1,9 +1,4 @@ -Astaroth Documentation {#mainpage} -============ - -![Astaroth Sigil](doc/astaroth_logo.svg) - -# Astaroth - A Multi-GPU Library for Generic Stencil Computations +# Astaroth - A Multi-GPU Library for Generic Stencil Computations {#mainpage} [Specification](doc/Astaroth_API_specification_and_user_manual/API_specification_and_user_manual.md) | [Contributing](CONTRIBUTING.md) | [Licence](LICENCE.md) | [Issue Tracker](https://bitbucket.org/jpekkila/astaroth/issues?status=new&status=open) | [Wiki](https://bitbucket.org/jpekkila/astaroth/wiki/Home) diff --git a/doc/Astaroth_API_specification_and_user_manual/API_specification_and_user_manual.md b/doc/Astaroth_API_specification_and_user_manual/API_specification_and_user_manual.md index fe77ce9..eb955d0 100644 --- a/doc/Astaroth_API_specification_and_user_manual/API_specification_and_user_manual.md +++ b/doc/Astaroth_API_specification_and_user_manual/API_specification_and_user_manual.md @@ -1,6 +1,3 @@ -Astaroth Specification and User Manual -============ - # Astaroth Specification and User Manual Copyright (C) 2014-2020, Johannes Pekkila, Miikka Vaisala. @@ -52,17 +49,17 @@ usable via the Astaroth API. While the Astaroth library is written in C++/CUDA, the C99 standard. -# Publications +## Publications The foundational work was done in (Väisälä, Pekkilä, 2017) and the library, API and DSL described in this document were introduced in (Pekkilä, 2019). We kindly wish the users of Astaroth to cite to these publications in their work. -> J. Pekkilä, Astaroth: A Library for Stencil Computations on Graphics Processing Units. Master's thesis, Aalto University School of Science, Espoo, Finland, 2019. +> [J. Pekkilä, Astaroth: A Library for Stencil Computations on Graphics Processing Units. Master's thesis, Aalto University School of Science, Espoo, Finland, 2019.](http://urn.fi/URN:NBN:fi:aalto-201906233993) -> M. S. Väisälä, Magnetic Phenomena of the Interstellar Medium in Theory and Observation. PhD thesis, University of Helsinki, Finland, 2017. +> [M. S. Väisälä, Magnetic Phenomena of the Interstellar Medium in Theory and Observation. PhD thesis, University of Helsinki, Finland, 2017.](http://urn.fi/URN:ISBN:978-951-51-2778-5) -> J. Pekkilä, M. S. Väisälä, M. Käpylä, P. J. Käpylä, and O. Anjum, “Methods for compressible fluid simulation on GPUs using high-order finite differences, ”Computer Physics Communications, vol. 217, pp. 11–22, Aug. 2017. +> [J. Pekkilä, M. S. Väisälä, M. Käpylä, P. J. Käpylä, and O. Anjum, “Methods for compressible fluid simulation on GPUs using high-order finite differences, ”Computer Physics Communications, vol. 217, pp. 11–22, Aug. 2017.](https://doi.org/10.1016/j.cpc.2017.03.011) @@ -218,9 +215,10 @@ AcResult acDeviceLoadMeshInfo(const Device device, const Stream stream, const AcMeshInfo device_config); ``` +### Integration, Reductions and Boundary Conditions -### Computation - +The library provides the following functions for integration, reductions and computing periodic +boundary conditions. ```C AcResult acDeviceIntegrateSubstep(const Device device, const Stream stream, const int step_number, const int3 start, const int3 end, const AcReal dt); @@ -248,7 +246,16 @@ AcResult acNodeReduceVec(const Node node, const Stream stream_type, const Reduct const VertexBufferHandle vtxbuf2, AcReal* result); ``` -### Stream Synchronization +Finally, there's a library function that is automatically generated for all user-specified `Kernel` +functions written with the Astaroth DSL, +```C +AcResult acDeviceKernel_##identifier(const Device device, const Stream stream, + const int3 start, const int3 end); +``` +Where `##identifier` is replaced with the name of the user-specified kernel. For example, a device +function `Kernel solve()` can be called with `acDeviceKernel_solve()` via the API. + +## Stream Synchronization All library functions that take a `Stream` as a parameter are asynchronous. When calling these functions, control returns immediately back to the host even if the called device function has not @@ -270,13 +277,20 @@ synchronized at once by passing the alias `STREAM_ALL` to the synchronization fu Usage of streams is demonstrated with the following example. ```C funcA(STREAM_0); -funcB(STREAM_0); // Blocks until funcA has completed -funcC(STREAM_1); // May execute in parallel with funcB +funcB(STREAM_0); // Blocks until funcA has completed +funcC(STREAM_1); // May execute in parallel with funcB barrierSynchronizeStream(STREAM_ALL); // Blocks until functions in all streams have completed -funcD(STREAM_2); // Is started when command returns from synchronizeStream() +funcD(STREAM_2); // Is started when command returns from synchronizeStream() ``` -### Data Synchronization +Astaroth API provides the following functions for barrier synchronization. +```C +AcResult acSynchronize(void); +AcResult acNodeSynchronizeStream(const Node node, const Stream stream); +AcResult acDeviceSynchronizeStream(const Device device, const Stream stream); +``` + +## Data Synchronization Stream synchronization works in the same fashion on node and device layers. However on the node layer, one has to take in account that a portion of the mesh is shared between devices and that the @@ -296,7 +310,7 @@ AcResult acNodeSynchronizeVertexBuffer(const Node node, const Stream stream, > **NOTE**: Local halos must be up to date before synchronizing the data. Local halos are the grid points outside the computational domain which are used only by a single device. The mesh is distributed to multiple devices by blocking along the z axis. If there are *n* devices and the z-dimension of the computational domain is *nz*, then each device is assigned *nz / n* two-dimensional planes. For example with two devices, the data block that has to be up to date ranges from *(0, 0, nz)* to *(mx, my, nz + 2 * NGHOST)*. -### Input and Output Buffers +## Input and Output Buffers The mesh is duplicated to input and output buffers for performance reasons. The input buffers are read-only in user-specified compute kernels, which allows us to read them via the texture cache @@ -357,14 +371,14 @@ Meshes are the primary structures for passing information to the library and ker of a `Mesh` is declared as ```C typedef struct { - int int_params[NUM_INT_PARAMS]; - int3 int3_params[NUM_INT3_PARAMS]; - AcReal real_params[NUM_REAL_PARAMS]; + int int_params[NUM_INT_PARAMS]; + int3 int3_params[NUM_INT3_PARAMS]; + AcReal real_params[NUM_REAL_PARAMS]; AcReal3 real3_params[NUM_REAL3_PARAMS]; } AcMeshInfo; typedef struct { - AcReal* vertex_buffer[NUM_VTXBUF_HANDLES]; + AcReal* vertex_buffer[NUM_VTXBUF_HANDLES]; AcMeshInfo info; } AcMesh; ``` @@ -415,45 +429,7 @@ Let *i* be the device id. The portion of the halos shared by neighboring devices `acNodeSynchronizeVertexBuffer` and `acNodeSynchronizeMesh` communicate these shared areas among the devices in the node. -## Integration, Reductions and Boundary Conditions - -The library provides the following functions for integration, reductions and computing periodic -boundary conditions. -```C -AcResult acDeviceIntegrateSubstep(const Device device, const Stream stream, const int step_number, - const int3 start, const int3 end, const AcReal dt); -AcResult acDevicePeriodicBoundcondStep(const Device device, const Stream stream, - const VertexBufferHandle vtxbuf_handle, const int3 start, - const int3 end); -AcResult acDevicePeriodicBoundconds(const Device device, const Stream stream, const int3 start, - const int3 end); -AcResult acDeviceReduceScal(const Device device, const Stream stream, const ReductionType rtype, - const VertexBufferHandle vtxbuf_handle, AcReal* result); -AcResult acDeviceReduceVec(const Device device, const Stream stream_type, const ReductionType rtype, - const VertexBufferHandle vtxbuf0, const VertexBufferHandle vtxbuf1, - const VertexBufferHandle vtxbuf2, AcReal* result); - -AcResult acNodeIntegrateSubstep(const Node node, const Stream stream, const int step_number, - const int3 start, const int3 end, const AcReal dt); -AcResult acNodeIntegrate(const Node node, const AcReal dt); -AcResult acNodePeriodicBoundcondStep(const Node node, const Stream stream, - const VertexBufferHandle vtxbuf_handle); -AcResult acNodePeriodicBoundconds(const Node node, const Stream stream); -AcResult acNodeReduceScal(const Node node, const Stream stream, const ReductionType rtype, - const VertexBufferHandle vtxbuf_handle, AcReal* result); -AcResult acNodeReduceVec(const Node node, const Stream stream_type, const ReductionType rtype, - const VertexBufferHandle vtxbuf0, const VertexBufferHandle vtxbuf1, - const VertexBufferHandle vtxbuf2, AcReal* result); -``` - -Finally, there's a library function that is automatically generated for all user-specified `Kernel` -functions written with the Astaroth DSL, -```C -AcResult acDeviceKernel_##identifier(const Device device, const Stream stream, - const int3 start, const int3 end); -``` -Where `##identifier` is replaced with the name of the user-specified kernel. For example, a device -function `Kernel solve()` can be called with `acDeviceKernel_solve()` via the API. +> **NOTE:** The decomposition scheme is subject to change. # Astaroth Domain-Specific Language diff --git a/doc/astaroth_logo_small.png b/doc/astaroth_logo_small.png new file mode 100644 index 0000000000000000000000000000000000000000..8bd3616dfb381d1bceccdcd2bd8e00e4ac94d6dd GIT binary patch literal 1565 zcmV+&2IBdNP)EX>4Tx04R}tkv&MmP!xqvQ>CI+2Rn#3WT;Lph!%0wDionYs1;guFnQ@8G-*gu zTpR`0f`dPcRR6lU)@9ukc|2eTX3z=x)?xH-)yYJ8HS92Bvd?N8IGfbO!gLrz= zHaPDShgeZoiO-26CS8#Dk?V@fZ=4HF7Intq8~3b{&P zw!>l~*KK$>Qiya5gl zf$;)muY0_^r*m%q_O#~r1Dc_7w`(JC-T(jq24YJ`L;(K){{a7>y{D4^000SaNLh0L z01ejw01ejxLMWSf00007bV*G`2jl?`3pF$oZgf=u00bFHL_t(&-tC%CY*a-Y$3L^| zQi`<=hayD|Rpd|h9rV%=Pb5a-!K8@?51x!3)OeA`DoEPxYaxLqB&8e(9zZcMO*CGN zCpmF68x1CIW57U-SpK|_X8+g*7T+y-D= zO`;6kuq5|WU^5_q7l1e7cJ4Y*RPT$bZ3EYUF(vOg;B8BC-&gXIz&*8d4h{~F31X!K z#$@^@V7n$@+847YV+Z-12M%aoF5>fU>=2nAmhh4GX?|SH?pw)RldD*e?ZA#oLn3?t8x=0j@AAv^N(CP(|Kg1H!6@5n)0^+Q@MU5`?Yp)xe3Wb5}wh%WX45q)S0C_ zbE`v1nPSB-@A%T&rgxZ_$SsAEpt_=hrHs)BxFe9-v843al~c}Rsup~`Z4!A?@6*+t z$`1ozejXz16SMOSyKUmx<| z#lc{4i&q`9c_9}hfBn8LzbqBXbD^ZOnOq=fkZk+F^JeV|Ae z69?*&V8JClK!=h*cOC$2@v^)$I8bRWUM5TayMbkaUSBjrZWG?+0`QbJ-(sA%{8K$Q z0lz%~mUlgFzRk|u(i3*PmR1ty;#j1Mb>lt3P;&+bK^ z;Ve$;0l8Et&k4k!A?in8rnlBAaWUDTqW|AZo-VzD7D$$5_dcDvhYcm2rt|P+dTV_p zUZWhGS-LaVGLEH(P4){CnHCjvOZF>7OzNg635#RWJF1y`z9qB$u*yXgcZ~^7kT})` zOVyJ7GMz`Hm>kjmdjmM>%+j5?#+a|q3EN_V1EU@bT369BFYCw5mfcp3P% zRVdg1+ys8NST$7Q>8MJtZUKXq$)13T0t?uemZ5y~x#4An$pQyGC zl+~hDNZG&RcJ8V&pAMo+Zno}HT~Ybuo4`@vV~cV>Q%dc`xY*|4um. */ +/** + * @file Single-Device Interface + * \brief Provides functions for controlling a single device. + * + * Detailed info. + * + */ #pragma once #ifdef __cplusplus