diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md index 411f8c8..7b7f524 100644 --- a/CONTRIBUTING.md +++ b/CONTRIBUTING.md @@ -1,6 +1,3 @@ -Contributing -============ - # Contributing Contributions to Astaroth are very welcome! diff --git a/README.md b/README.md index 1d4fe76..827882e 100644 --- a/README.md +++ b/README.md @@ -1,9 +1,4 @@ -Astaroth Documentation {#mainpage} -============ - -![Astaroth Sigil](doc/astaroth_logo.svg) - -# Astaroth - A Multi-GPU Library for Generic Stencil Computations +# Astaroth - A Multi-GPU Library for Generic Stencil Computations {#mainpage} [Specification](doc/Astaroth_API_specification_and_user_manual/API_specification_and_user_manual.md) | [Contributing](CONTRIBUTING.md) | [Licence](LICENCE.md) | [Issue Tracker](https://bitbucket.org/jpekkila/astaroth/issues?status=new&status=open) | [Wiki](https://bitbucket.org/jpekkila/astaroth/wiki/Home) diff --git a/doc/Astaroth_API_specification_and_user_manual/API_specification_and_user_manual.md b/doc/Astaroth_API_specification_and_user_manual/API_specification_and_user_manual.md index fe77ce9..eb955d0 100644 --- a/doc/Astaroth_API_specification_and_user_manual/API_specification_and_user_manual.md +++ b/doc/Astaroth_API_specification_and_user_manual/API_specification_and_user_manual.md @@ -1,6 +1,3 @@ -Astaroth Specification and User Manual -============ - # Astaroth Specification and User Manual Copyright (C) 2014-2020, Johannes Pekkila, Miikka Vaisala. @@ -52,17 +49,17 @@ usable via the Astaroth API. While the Astaroth library is written in C++/CUDA, the C99 standard. -# Publications +## Publications The foundational work was done in (Väisälä, Pekkilä, 2017) and the library, API and DSL described in this document were introduced in (Pekkilä, 2019). We kindly wish the users of Astaroth to cite to these publications in their work. -> J. Pekkilä, Astaroth: A Library for Stencil Computations on Graphics Processing Units. 
Master's thesis, Aalto University School of Science, Espoo, Finland, 2019. +> [J. Pekkilä, Astaroth: A Library for Stencil Computations on Graphics Processing Units. Master's thesis, Aalto University School of Science, Espoo, Finland, 2019.](http://urn.fi/URN:NBN:fi:aalto-201906233993) -> M. S. Väisälä, Magnetic Phenomena of the Interstellar Medium in Theory and Observation. PhD thesis, University of Helsinki, Finland, 2017. +> [M. S. Väisälä, Magnetic Phenomena of the Interstellar Medium in Theory and Observation. PhD thesis, University of Helsinki, Finland, 2017.](http://urn.fi/URN:ISBN:978-951-51-2778-5) -> J. Pekkilä, M. S. Väisälä, M. Käpylä, P. J. Käpylä, and O. Anjum, “Methods for compressible fluid simulation on GPUs using high-order finite differences, ”Computer Physics Communications, vol. 217, pp. 11–22, Aug. 2017. +> [J. Pekkilä, M. S. Väisälä, M. Käpylä, P. J. Käpylä, and O. Anjum, “Methods for compressible fluid simulation on GPUs using high-order finite differences,” Computer Physics Communications, vol. 217, pp. 11–22, Aug. 2017.](https://doi.org/10.1016/j.cpc.2017.03.011) @@ -218,9 +215,10 @@ AcResult acDeviceLoadMeshInfo(const Device device, const Stream stream, const AcMeshInfo device_config); ``` +### Integration, Reductions and Boundary Conditions -### Computation - +The library provides the following functions for integration, reductions and computing periodic +boundary conditions.
```C AcResult acDeviceIntegrateSubstep(const Device device, const Stream stream, const int step_number, const int3 start, const int3 end, const AcReal dt); @@ -248,7 +246,16 @@ AcResult acNodeReduceVec(const Node node, const Stream stream_type, const Reduct const VertexBufferHandle vtxbuf2, AcReal* result); ``` -### Stream Synchronization +Finally, there's a library function that is automatically generated for all user-specified `Kernel` +functions written with the Astaroth DSL, +```C +AcResult acDeviceKernel_##identifier(const Device device, const Stream stream, + const int3 start, const int3 end); +``` +where `##identifier` is replaced with the name of the user-specified kernel. For example, a device +function `Kernel solve()` can be called with `acDeviceKernel_solve()` via the API. + +## Stream Synchronization All library functions that take a `Stream` as a parameter are asynchronous. When calling these functions, control returns immediately back to the host even if the called device function has not @@ -270,13 +277,20 @@ synchronized at once by passing the alias `STREAM_ALL` to the synchronization fu Usage of streams is demonstrated with the following example. ```C funcA(STREAM_0); -funcB(STREAM_0); // Blocks until funcA has completed -funcC(STREAM_1); // May execute in parallel with funcB +funcB(STREAM_0); // Starts only after funcA has completed +funcC(STREAM_1); // May execute in parallel with funcB barrierSynchronizeStream(STREAM_ALL); // Blocks until functions in all streams have completed -funcD(STREAM_2); // Is started when command returns from synchronizeStream() +funcD(STREAM_2); // Starts after barrierSynchronizeStream() has returned ``` -### Data Synchronization +The Astaroth API provides the following functions for barrier synchronization.
+```C +AcResult acSynchronize(void); +AcResult acNodeSynchronizeStream(const Node node, const Stream stream); +AcResult acDeviceSynchronizeStream(const Device device, const Stream stream); +``` + +## Data Synchronization Stream synchronization works in the same fashion on node and device layers. However on the node layer, one has to take in account that a portion of the mesh is shared between devices and that the @@ -296,7 +310,7 @@ AcResult acNodeSynchronizeVertexBuffer(const Node node, const Stream stream, > **NOTE**: Local halos must be up to date before synchronizing the data. Local halos are the grid points outside the computational domain which are used only by a single device. The mesh is distributed to multiple devices by blocking along the z axis. If there are *n* devices and the z-dimension of the computational domain is *nz*, then each device is assigned *nz / n* two-dimensional planes. For example with two devices, the data block that has to be up to date ranges from *(0, 0, nz)* to *(mx, my, nz + 2 * NGHOST)*. -### Input and Output Buffers +## Input and Output Buffers The mesh is duplicated to input and output buffers for performance reasons. The input buffers are read-only in user-specified compute kernels, which allows us to read them via the texture cache @@ -357,14 +371,14 @@ Meshes are the primary structures for passing information to the library and ker of a `Mesh` is declared as ```C typedef struct { - int int_params[NUM_INT_PARAMS]; - int3 int3_params[NUM_INT3_PARAMS]; - AcReal real_params[NUM_REAL_PARAMS]; + int int_params[NUM_INT_PARAMS]; + int3 int3_params[NUM_INT3_PARAMS]; + AcReal real_params[NUM_REAL_PARAMS]; AcReal3 real3_params[NUM_REAL3_PARAMS]; } AcMeshInfo; typedef struct { - AcReal* vertex_buffer[NUM_VTXBUF_HANDLES]; + AcReal* vertex_buffer[NUM_VTXBUF_HANDLES]; AcMeshInfo info; } AcMesh; ``` @@ -415,45 +429,7 @@ Let *i* be the device id. 
The portion of the halos shared by neighboring devices `acNodeSynchronizeVertexBuffer` and `acNodeSynchronizeMesh` communicate these shared areas among the devices in the node. -## Integration, Reductions and Boundary Conditions - -The library provides the following functions for integration, reductions and computing periodic -boundary conditions. -```C -AcResult acDeviceIntegrateSubstep(const Device device, const Stream stream, const int step_number, - const int3 start, const int3 end, const AcReal dt); -AcResult acDevicePeriodicBoundcondStep(const Device device, const Stream stream, - const VertexBufferHandle vtxbuf_handle, const int3 start, - const int3 end); -AcResult acDevicePeriodicBoundconds(const Device device, const Stream stream, const int3 start, - const int3 end); -AcResult acDeviceReduceScal(const Device device, const Stream stream, const ReductionType rtype, - const VertexBufferHandle vtxbuf_handle, AcReal* result); -AcResult acDeviceReduceVec(const Device device, const Stream stream_type, const ReductionType rtype, - const VertexBufferHandle vtxbuf0, const VertexBufferHandle vtxbuf1, - const VertexBufferHandle vtxbuf2, AcReal* result); - -AcResult acNodeIntegrateSubstep(const Node node, const Stream stream, const int step_number, - const int3 start, const int3 end, const AcReal dt); -AcResult acNodeIntegrate(const Node node, const AcReal dt); -AcResult acNodePeriodicBoundcondStep(const Node node, const Stream stream, - const VertexBufferHandle vtxbuf_handle); -AcResult acNodePeriodicBoundconds(const Node node, const Stream stream); -AcResult acNodeReduceScal(const Node node, const Stream stream, const ReductionType rtype, - const VertexBufferHandle vtxbuf_handle, AcReal* result); -AcResult acNodeReduceVec(const Node node, const Stream stream_type, const ReductionType rtype, - const VertexBufferHandle vtxbuf0, const VertexBufferHandle vtxbuf1, - const VertexBufferHandle vtxbuf2, AcReal* result); -``` - -Finally, there's a library function that is 
automatically generated for all user-specified `Kernel` -functions written with the Astaroth DSL, -```C -AcResult acDeviceKernel_##identifier(const Device device, const Stream stream, - const int3 start, const int3 end); -``` -Where `##identifier` is replaced with the name of the user-specified kernel. For example, a device -function `Kernel solve()` can be called with `acDeviceKernel_solve()` via the API. +> **NOTE:** The decomposition scheme is subject to change. # Astaroth Domain-Specific Language diff --git a/doc/astaroth_logo_small.png b/doc/astaroth_logo_small.png new file mode 100644 index 0000000..8bd3616 Binary files /dev/null and b/doc/astaroth_logo_small.png differ diff --git a/doxyfile b/doxyfile index 5f89d90..6d3f388 100644 --- a/doxyfile +++ b/doxyfile @@ -38,7 +38,7 @@ PROJECT_NAME = "Astaroth" # could be handy for archiving the generated documentation or if some version # control system is used. -PROJECT_NUMBER = +PROJECT_NUMBER = 2.1 # Using the PROJECT_BRIEF tag one can provide an optional one line description # for a project that appears at the top of each page and should give viewer a @@ -51,7 +51,7 @@ PROJECT_BRIEF = # pixels and the maximum width should not exceed 200 pixels. Doxygen will copy # the logo to the output directory. -PROJECT_LOGO = +PROJECT_LOGO = doc/astaroth_logo_small.png # The OUTPUT_DIRECTORY tag is used to specify the (relative or absolute) path # into which the generated documentation will be written. If a relative path is @@ -242,7 +242,7 @@ TCL_SUBST = # members will be omitted, etc. # The default value is: NO. -OPTIMIZE_OUTPUT_FOR_C = NO +OPTIMIZE_OUTPUT_FOR_C = YES # Set the OPTIMIZE_OUTPUT_JAVA tag to YES if your project consists of Java or # Python sources only. Doxygen will then generate output that is more tailored @@ -1187,7 +1187,7 @@ HTML_TIMESTAMP = NO # The default value is: NO. # This tag requires that the tag GENERATE_HTML is set to YES. 
-HTML_DYNAMIC_SECTIONS = NO +HTML_DYNAMIC_SECTIONS = YES # With HTML_INDEX_NUM_ENTRIES one can control the preferred number of entries # shown in the various tree structured indices initially; the user can expand @@ -1416,7 +1416,7 @@ DISABLE_INDEX = NO # The default value is: NO. # This tag requires that the tag GENERATE_HTML is set to YES. -GENERATE_TREEVIEW = NO +GENERATE_TREEVIEW = YES # The ENUM_VALUES_PER_LINE tag can be used to set the number of enum values that # doxygen will group on one line in the generated HTML documentation. diff --git a/include/astaroth_device.h b/include/astaroth_device.h index bb9d182..8abe642 100644 --- a/include/astaroth_device.h +++ b/include/astaroth_device.h @@ -16,6 +16,13 @@ You should have received a copy of the GNU General Public License along with Astaroth. If not, see . */ +/** + * @file + * \brief Single-Device Interface: provides functions for controlling a single device. + * + * Detailed info. + * + */ #pragma once #ifdef __cplusplus