diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md index 6bff505..24cae80 100644 --- a/CONTRIBUTING.md +++ b/CONTRIBUTING.md @@ -1,20 +1,23 @@ +Contributing +============ + # Contributing Contributions to Astaroth are very welcome! This document details how to create good contributions. There are two primary concerns: -0. The codebase should stay maintainable and commits should adhere to a consistent style. -0. New additions should not disrupt the work of others. +1. The codebase should stay maintainable and commits should adhere to a consistent style. +2. New additions should not disrupt the work of others. ## Basic workflow -*"There is something that needs fixing"* +> "There is something that needs fixing" -0. Create your work. See [Programming](#markdown-header-programming) and [Committing](#markdown-header-committing) . -0. When done, check that autotests still pass by running `./ac_run -t`. -0. **[Recommended]:** Autoformat your code. See [Formatting](#markdown-header-formatting). -0. Create a pull request. +1. Create your work. See [Programming](#markdown-header-programming) and [Committing](#markdown-header-committing) . +2. When done, check that autotests still pass by running `./ac_run -t`. +3. **[Recommended]:** Autoformat your code. See [Formatting](#markdown-header-formatting). +4. Create a pull request. 
## Programming * **Strive for code clarity over micro-optimizations.** diff --git a/README.md b/README.md index 9f5227a..7b31d29 100644 --- a/README.md +++ b/README.md @@ -1,8 +1,11 @@ -![astaroth_logo](./doc/astaroth_logo.svg "Astaroth Sigil") +Astaroth Documentation {#mainpage} +============ + +![Astaroth Sigil](doc/astaroth_logo.svg) # Astaroth - A Multi-GPU Library for Generic Stencil Computations -[Wiki](https://bitbucket.org/jpekkila/astaroth/wiki/Home) | [Issue Tracker](https://bitbucket.org/jpekkila/astaroth/issues?status=new&status=open) | [Contributing](https://bitbucket.org/jpekkila/astaroth/src/master/CONTRIBUTING.md) | [Licence](https://bitbucket.org/jpekkila/astaroth/src/master/LICENCE.txt) +[Specification](doc/Astaroth_API_specification_and_user_manual/API_specification_and_user_manual.md) | [Contributing](CONTRIBUTING.md) | [Licence](https://bitbucket.org/jpekkila/astaroth/src/master/LICENCE.txt) | [Issue Tracker](https://bitbucket.org/jpekkila/astaroth/issues?status=new&status=open) | [Wiki](https://bitbucket.org/jpekkila/astaroth/wiki/Home) Astaroth is a multi-GPU library for three-dimensional stencil computations. It is designed especially for performing high-order stencil computations in structured grids, where several coupled fields are updated each time step. Astaroth consists of a multi-GPU and single-GPU @@ -11,7 +14,7 @@ makes Astaroth especially suitable for multiphysics simulations. Astaroth is licenced under the terms of the GNU General Public Licence, version 3, or later (see [LICENCE.txt](https://bitbucket.org/miikkavaisala/astaroth-code/src/master/astaroth_2.0/LICENCE.txt)). For contributing guidelines, -see [Contributing](https://bitbucket.org/jpekkila/astaroth/src/master/CONTRIBUTING.md). +see [Contributing](CONTRIBUTING.md). ## System Requirements @@ -24,14 +27,14 @@ Relative recent versions of ## Building -In the base astaroth directory, run +In the base directory, run -0. `mkdir build` -0. `cd build` -0. `cmake ..` -0. 
`make -j` +1. `mkdir build` +2. `cd build` +3. `cmake ..` +4. `make -j` -> **Optional:** Documentation can be generated with `doxygen doxyfile` (requires Doxygen). The +> **Optional:** Documentation can be generated by running `doxygen` in the base directory. The generated documentation can be found in `doc/doxygen`. > **Tip:** The library is configured by passing [options](#markdown-header-cmake-options) to CMake with `-D[option]=[ON|OFF]`. @@ -74,7 +77,7 @@ See `analysis/python/` directory of existing data visualization and analysis scr * `astaroth/include/astaroth.h`: Legacy interface for backwards compatibility and quick testing. * `astaroth/include/astaroth_node.h`: Multi-GPU interface (single node). * `astaroth/include/astaroth_device.h`: Single-GPU interface. -* `astaroth/src/utils`: Utility library for host-side memory allocations, verification and other tasks. +* `astaroth/src/utils/`: Utility library for host-side memory allocations, verification and other tasks. ## FAQ @@ -92,5 +95,5 @@ Otherwise the build steps are the same. Run with `mpirun -np 4 ./mpitest`. How do I contribute? -> See [Contributing](https://bitbucket.org/jpekkila/astaroth/src/master/CONTRIBUTING.md). +> See [Contributing](CONTRIBUTING.md). 
diff --git a/acc/README.md b/acc/README.md index 6197fed..467b63b 100644 --- a/acc/README.md +++ b/acc/README.md @@ -1,3 +1,6 @@ +Astaroth DSL compiler +============ + # Dependencies ## Debian/Ubuntu `apt install flex bison build-essential` diff --git a/doc/Astaroth_API_specification_and_user_manual/API_specification_and_user_manual.md b/doc/Astaroth_API_specification_and_user_manual/API_specification_and_user_manual.md index 717df22..6f03032 100644 --- a/doc/Astaroth_API_specification_and_user_manual/API_specification_and_user_manual.md +++ b/doc/Astaroth_API_specification_and_user_manual/API_specification_and_user_manual.md @@ -1,4 +1,7 @@ -# Astaroth specification and user manual +Astaroth Specification and User Manual +============ + +# Astaroth Specification and User Manual Copyright (C) 2014-2019, Johannes Pekkila, Miikka Vaisala. @@ -20,7 +23,7 @@ Copyright (C) 2014-2019, Johannes Pekkila, Miikka Vaisala. along with Astaroth. If not, see . -# Introduction and background +# Introduction and Background Astaroth is a collection of tools for utilizing multiple graphics processing units (GPUs) efficiently in three-dimensional stencil computations. This document specifies the Astaroth @@ -67,8 +70,8 @@ to these publications in their work. The Astroth application-programming interface (API) provides the means for controlling execution of user-defined and built-in functions on multiple graphics processing units. Functions in the API are -prefixed with lower case ```ac```, while structures and data types are prefixed with capitalized -```Ac```. Compile-time constants, such as definitions and enumerations, have the prefix ```AC_```. +prefixed with lower case `ac`, while structures and data types are prefixed with capitalized +`Ac`. Compile-time constants, such as definitions and enumerations, have the prefix `AC_`. All of the API functions return an AcResult value indicating either success or failure. 
The return codes are ```C @@ -103,13 +106,13 @@ Finally, a third layer is provided for convenience and backwards compatibility. There are also several helper functions defined in `include/astaroth_defines.h`, which can be used for, say, determining the size or performing index calculations within the simulation domain. -## List of Astaroth API functions +## List of Astaroth API Functions Here's a non-exhaustive list of astaroth API functions. For more info and an up-to-date list, see the corresponding header files in `include/astaroth_defines.h`, `include/astaroth.h`, `include/ astaroth_node.h`, `include/astaroth_device.h`. -### Initialization, quitting and helper functions +### Initialization, Quitting and Helper Functions Device layer. ```C @@ -137,7 +140,7 @@ size_t acVertexBufferCompdomainSizeBytes(const AcMeshInfo info); size_t acVertexBufferIdx(const int i, const int j, const int k, const AcMeshInfo info); ``` -### Loading and storing +### Loading and Storing Loading meshes and vertex buffers to device memory. ```C @@ -245,7 +248,7 @@ AcResult acNodeReduceVec(const Node node, const Stream stream_type, const Reduct const VertexBufferHandle vtxbuf2, AcReal* result); ``` -### Stream synchronization +### Stream Synchronization All library functions that take a `Stream` as a parameter are asynchronous. When calling these functions, control returns immediately back to the host even if the called device function has not @@ -273,7 +276,7 @@ barrierSynchronizeStream(STREAM_ALL); // Blocks until functions in all streams h funcD(STREAM_2); // Is started when command returns from synchronizeStream() ``` -### Data synchronization +### Data Synchronization Stream synchronization works in the same fashion on node and device layers. 
However on the node layer, one has to take in account that a portion of the mesh is shared between devices and that the @@ -291,14 +294,9 @@ AcResult acNodeSynchronizeVertexBuffer(const Node node, const Stream stream, ``` -> **NOTE**: Local halos must be up to date before synchronizing the data. Local halos are the grid -points outside the computational domain which are used only by a single device. The mesh is -distributed to multiple devices by blocking along the z axis. If there are *n* devices and the z- -dimension of the computational domain is *nz*, then each device is assigned *nz / n* two- -dimensional planes. For example with two devices, the data block that has to be up to date ranges -from *(0, 0, nz)* to *(mx, my, nz + 2 * NGHOST)* +> **NOTE**: Local halos must be up to date before synchronizing the data. Local halos are the grid points outside the computational domain which are used only by a single device. The mesh is distributed to multiple devices by blocking along the z axis. If there are *n* devices and the z-dimension of the computational domain is *nz*, then each device is assigned *nz / n* two-dimensional planes. For example with two devices, the data block that has to be up to date ranges from *(0, 0, nz)* to *(mx, my, nz + 2 * NGHOST)*. -### Input and output buffers +### Input and Output Buffers The mesh is duplicated to input and output buffers for performance reasons. The input buffers are read-only in user-specified compute kernels, which allows us to read them via the texture cache @@ -313,10 +311,7 @@ is done via the API calls AcResult acDeviceSwapBuffers(const Device device); AcResult acNodeSwapBuffers(const Node node); ``` -> **NOTE**: All functions provided with the API operate on input buffers and ensure that the -complete result is available in the input buffer when the function has completed. User-specified -kernels are exceptions and write the result to output buffers. 
Therefore buffers have to be swapped -only after calling user-specified kernels. +> **NOTE**: All functions provided with the API operate on input buffers and ensure that the complete result is available in the input buffer when the function has completed. User-specified kernels are exceptions and write the result to output buffers. Therefore buffers have to be swapped only after calling user-specified kernels. ## Devices @@ -420,7 +415,7 @@ Let *i* be the device id. The portion of the halos shared by neighboring devices `acNodeSynchronizeVertexBuffer` and `acNodeSynchronizeMesh` communicate these shared areas among the devices in the node. -## Integration, reductions and boundary conditions +## Integration, Reductions and Boundary Conditions The library provides the following functions for integration, reductions and computing periodic boundary conditions. @@ -487,18 +482,18 @@ pipeline shown in the following figure. | Stage | File ending | Description | |--------------------|-------------|---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| -| Stencil assembly | .sas | Defines the shape of the stencils and functions to be preprocessed before entering the stencil processing stage. Reading from input arrays is only possible during this stage. | -| Stencil process | .sps | The functions executed on streams of data are defined here. Contains kernels, which are essentially main functions of GPU programs. | -| Stencil definition | .sdh | All field identifiers and constant memory symbols are defined in this file. | -| Any | .h | Optional header files which can be included in any other file. +| Stencil assembly | .ac | Defines the shape of the stencils and functions to be preprocessed before entering the stencil processing stage. Reading from input arrays is only possible during this stage. 
| +| Stencil process | .ac | The functions executed on streams of data are defined here. Contains kernels, which are essentially main functions of GPU programs. | +| Stencil definition | .ac | All field identifiers and constant memory symbols are defined in this file. | +| Any | .h | Optional header files which can be included in any file. Compilation of the DSL files is integrated into `CMakelists.txt` provided with the library and dependencies are recompiled if needed when calling `make`. All DSL files should reside in the same -directory and there should be only one `.sas`, `.sps` and `.sdh` file. There may be any number of +directory and there should be only one `.ac` file. There may be any number of optional `.h` files. When configuring the project, the user should pass the path to the DSL directory as a cmake option like so: ```cmake -DDSL_MODULE_DIR="some/user/dir" ..```. -## Data types +## Data Types In addition to basic datatypes in C/C++/CUDA, such as int and int3, we provide the following datatypes with the DSL. @@ -517,13 +512,13 @@ In addition to basic datatypes in C/C++/CUDA, such as int and int3, we provide t `Scalars` are 32-bit floating-point numbers by default. Double precision can be turned on by setting cmake option `DOUBLE_PRECISION=ON`. All real number literals are converted automatically to the correct precision. In cases where , the precision can be declared explicitly by appending `f` or `d` postfix to the real number. For example, ```C -1.0 // The same precision as Scalar/AcReal -1.0f // Explicit float -1.0d // Explicit double +1.0 // The same precision as Scalar/AcReal +1.0f // Explicit float +1.0d // Explicit double (1.0f * 1.0d) // 1.0f is implicitly cast to double and the multiplication is done in double precision. ``` -## Control flow +## Control Flow Conditional statements are expressed with the `if-else` construct. 
Unlike in C and C++, we require that the scope of the `if-else` statement is explicitly declared using braces `{` and `}` in order @@ -566,19 +561,21 @@ The following built-in variables are available in `Kernel`s. | globalVertexIdx | Holds the global index of the currently processed vertex. If there is only single device, then vertexIdx is the same as globalVertexIdx. Otherwise globalVertexIdx is offset accordingly. | | globalGridN | Holds the dimensions of the computational domain. | -## Preprocessed functions +## Preprocessed Functions The type qualifier `Preprocessed` indicates which functions should be evaluated immediately when entering a `Kernel` function. The return values of `Preprocessed` functions are cached and calling these functions during the stencil processing stage is essentially free. As main memory bandwidth is significantly slower than on-chip memories and registers, declaring reading-heavy functions as -`Preprocessed` is critical for obtaining good performance in stencil codes. - -`Preprocessed` functions may only be defined in stencil assembly files. +`Preprocessed` is critical for obtaining good performance in stencil codes. The built-in variables `vertexIdx`, `globalVertexidx` and `globalGridN` are available in all `Preprocessed` functions. +## Device Functions + +The type qualifier `Device` indicates which functions can be called from `Kernel` functions or other `Device` functions. + ## Uniforms `Uniform`s are global device variables which stay constant for the duration of a kernel launch. @@ -603,17 +600,23 @@ Instead, one should load the appropriate values during runtime using the `acLoad related functions. -## Standard libraries +## Standard Libraries -> Not implemented +The following table lists the standard libraries currently available. 
-## Performance considerations +| Built-in variable | Description | +|-------------------|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| +| stdderiv.h | Contains functions for computing 2nd, 4th, 6th and 8th order derivatives (configured by defining the STENCIL_ORDER before including stdderiv.h) | + +Astaroth DSL libraries can be included in the same way as C/C++ headers. For example, `#include `. + +## Performance Considerations Uniforms are as fast as compile-time constants as long as -0. The halting condition of a tight loop does not depend on an uniform or a variable, as this would prevent unrolling of the loop during compile-time. -0. Uniforms are not multiplied with each other. The result should be stored in an auxiliary uniform instead. For example, the result of `nx * ny` should be stored in a new `uniform nxy` -0. At least 32 neighboring streams in the x-axis access the same `uniform`. That is, the vertices at vertexIdx.x = i... i + 32 should access the same `uniform` where i is a multiple of 32. +1. The halting condition of a tight loop does not depend on an uniform or a variable, as this would prevent unrolling of the loop during compile-time. +2. Uniforms are not multiplied with each other. The result should be stored in an auxiliary uniform instead. For example, the result of `nx * ny` should be stored in a new `uniform nxy` +3. At least 32 neighboring streams in the x-axis access the same `uniform`. That is, the vertices at vertexIdx.x = i... i + 32 should access the same `uniform` where i is a multiple of 32. diff --git a/doxyfile b/doxyfile index 7bab478..5f89d90 100644 --- a/doxyfile +++ b/doxyfile @@ -771,7 +771,7 @@ WARN_LOGFILE = doc/doxygen/doxygen_warnings.log # spaces. See also FILE_PATTERNS and EXTENSION_MAPPING # Note: If this tag is empty the current directory is searched. 
-INPUT = src include +INPUT = # This tag can be used to specify the character encoding of the source files # that doxygen parses. Internally doxygen uses the UTF-8 encoding. Doxygen uses @@ -796,7 +796,7 @@ INPUT_ENCODING = UTF-8 # *.m, *.markdown, *.md, *.mm, *.dox, *.py, *.pyw, *.f90, *.f, *.for, *.tcl, # *.vhd, *.vhdl, *.ucf, *.qsf, *.as and *.js. -FILE_PATTERNS = *.cc *.h *.cu *.cuh +FILE_PATTERNS = *.c *.cc *.h *.cu *.cuh *.md # The RECURSIVE tag can be used to specify whether or not subdirectories should # be searched for input files as well. @@ -920,7 +920,7 @@ FILTER_SOURCE_PATTERNS = # (index.html). This can be useful if you have a project on for instance GitHub # and want to reuse the introduction page also for the doxygen output. -USE_MDFILE_AS_MAINPAGE = +USE_MDFILE_AS_MAINPAGE = #--------------------------------------------------------------------------- # Configuration options related to source browsing @@ -2314,7 +2314,7 @@ DIRECTORY_GRAPH = YES # The default value is: png. # This tag requires that the tag HAVE_DOT is set to YES. -DOT_IMAGE_FORMAT = png +DOT_IMAGE_FORMAT = svg # If DOT_IMAGE_FORMAT is set to svg, then this option can be set to YES to # enable generation of interactive SVG images that allow zooming and panning. @@ -2326,7 +2326,7 @@ DOT_IMAGE_FORMAT = png # The default value is: NO. # This tag requires that the tag HAVE_DOT is set to YES. -INTERACTIVE_SVG = NO +INTERACTIVE_SVG = YES # The DOT_PATH tag can be used to specify the path where the dot tool can be # found. If left blank, it is assumed the dot tool can be found in the path. diff --git a/include/astaroth_node.h b/include/astaroth_node.h index 61438f3..d1cb131 100644 --- a/include/astaroth_node.h +++ b/include/astaroth_node.h @@ -41,16 +41,48 @@ typedef struct { Grid subgrid; } DeviceConfiguration; -/** */ +/** +Initializes all devices on the current node. + +Devices on the node are configured based on the contents of AcMesh. + +@return Exit status. 
Places the newly created handle in the output parameter. +@see AcMeshInfo + + +Usage example: +@code +AcMeshInfo info; +acLoadConfig(AC_DEFAULT_CONFIG, &info); + +Node node; +acNodeCreate(0, info, &node); +acNodeDestroy(node); +@endcode + */ AcResult acNodeCreate(const int id, const AcMeshInfo node_config, Node* node); -/** */ +/** +Resets all devices on the current node. + +@see acNodeCreate() + */ AcResult acNodeDestroy(Node node); -/** */ +/** +Prints information about the devices available on the current node. + +Requires that the Node has been initialized with acNodeCreate(). +@see acNodeCreate() +*/ AcResult acNodePrintInfo(const Node node); -/** */ +/** + + + +@see DeviceConfiguration +*/ AcResult acNodeQueryDeviceConfiguration(const Node node, DeviceConfiguration* config); /** */