Updated documentation and made it work with Doxygen. The generated doc/doxygen/index.html now looks quite good and contains lots of useful and up-to-date information about Astaroth.
@@ -1,20 +1,23 @@
|
||||
Contributing
|
||||
============
|
||||
|
||||
# Contributing
|
||||
|
||||
Contributions to Astaroth are very welcome!
|
||||
|
||||
This document details how to create good contributions. There are two primary concerns:
|
||||
|
||||
0. The codebase should stay maintainable and commits should adhere to a consistent style.
|
||||
0. New additions should not disrupt the work of others.
|
||||
1. The codebase should stay maintainable and commits should adhere to a consistent style.
|
||||
2. New additions should not disrupt the work of others.
|
||||
|
||||
## Basic workflow
|
||||
|
||||
*"There is something that needs fixing"*
|
||||
> "There is something that needs fixing"
|
||||
|
||||
0. Create your work. See [Programming](#markdown-header-programming) and [Committing](#markdown-header-committing).
|
||||
0. When done, check that autotests still pass by running `./ac_run -t`.
|
||||
0. **[Recommended]:** Autoformat your code. See [Formatting](#markdown-header-formatting).
|
||||
0. Create a pull request.
|
||||
1. Create your work. See [Programming](#markdown-header-programming) and [Committing](#markdown-header-committing).
|
||||
2. When done, check that autotests still pass by running `./ac_run -t`.
|
||||
3. **[Recommended]:** Autoformat your code. See [Formatting](#markdown-header-formatting).
|
||||
4. Create a pull request.
|
||||
|
||||
## Programming
|
||||
* **Strive for code clarity over micro-optimizations.**
|
||||
|
25
README.md
@@ -1,8 +1,11 @@
|
||||

|
||||
Astaroth Documentation {#mainpage}
|
||||
============
|
||||
|
||||

|
||||
|
||||
# Astaroth - A Multi-GPU Library for Generic Stencil Computations
|
||||
|
||||
[Wiki](https://bitbucket.org/jpekkila/astaroth/wiki/Home) | [Issue Tracker](https://bitbucket.org/jpekkila/astaroth/issues?status=new&status=open) | [Contributing](https://bitbucket.org/jpekkila/astaroth/src/master/CONTRIBUTING.md) | [Licence](https://bitbucket.org/jpekkila/astaroth/src/master/LICENCE.txt)
|
||||
[Specification](doc/Astaroth_API_specification_and_user_manual/API_specification_and_user_manual.md) | [Contributing](CONTRIBUTING.md) | [Licence](https://bitbucket.org/jpekkila/astaroth/src/master/LICENCE.txt) | [Issue Tracker](https://bitbucket.org/jpekkila/astaroth/issues?status=new&status=open) | [Wiki](https://bitbucket.org/jpekkila/astaroth/wiki/Home)
|
||||
|
||||
Astaroth is a multi-GPU library for three-dimensional stencil computations. It is designed especially for performing high-order stencil
|
||||
computations in structured grids, where several coupled fields are updated each time step. Astaroth consists of a multi-GPU and single-GPU
|
||||
@@ -11,7 +14,7 @@ makes Astaroth especially suitable for multiphysics simulations.
|
||||
|
||||
Astaroth is licenced under the terms of the GNU General Public Licence, version 3, or later
|
||||
(see [LICENCE.txt](https://bitbucket.org/miikkavaisala/astaroth-code/src/master/astaroth_2.0/LICENCE.txt)). For contributing guidelines,
|
||||
see [Contributing](https://bitbucket.org/jpekkila/astaroth/src/master/CONTRIBUTING.md).
|
||||
see [Contributing](CONTRIBUTING.md).
|
||||
|
||||
|
||||
## System Requirements
|
||||
@@ -24,14 +27,14 @@ Relative recent versions of
|
||||
|
||||
## Building
|
||||
|
||||
In the base astaroth directory, run
|
||||
In the base directory, run
|
||||
|
||||
0. `mkdir build`
|
||||
0. `cd build`
|
||||
0. `cmake ..`
|
||||
0. `make -j`
|
||||
1. `mkdir build`
|
||||
2. `cd build`
|
||||
3. `cmake ..`
|
||||
4. `make -j`
|
||||
|
||||
> **Optional:** Documentation can be generated with `doxygen doxyfile` (requires Doxygen). The
|
||||
> **Optional:** Documentation can be generated by running `doxygen` in the base directory. The
|
||||
generated documentation can be found in `doc/doxygen`.
|
||||
|
||||
> **Tip:** The library is configured by passing [options](#markdown-header-cmake-options) to CMake with `-D[option]=[ON|OFF]`.
|
||||
@@ -74,7 +77,7 @@ See `analysis/python/` directory of existing data visualization and analysis scr
|
||||
* `astaroth/include/astaroth.h`: Legacy interface for backwards compatibility and quick testing.
|
||||
* `astaroth/include/astaroth_node.h`: Multi-GPU interface (single node).
|
||||
* `astaroth/include/astaroth_device.h`: Single-GPU interface.
|
||||
* `astaroth/src/utils`: Utility library for host-side memory allocations, verification and other tasks.
|
||||
* `astaroth/src/utils/`: Utility library for host-side memory allocations, verification and other tasks.
|
||||
|
||||
## FAQ
|
||||
|
||||
@@ -92,5 +95,5 @@ Otherwise the build steps are the same. Run with `mpirun -np 4 ./mpitest`.
|
||||
|
||||
How do I contribute?
|
||||
|
||||
> See [Contributing](https://bitbucket.org/jpekkila/astaroth/src/master/CONTRIBUTING.md).
|
||||
> See [Contributing](CONTRIBUTING.md).
|
||||
|
||||
|
@@ -1,3 +1,6 @@
|
||||
Astaroth DSL compiler
|
||||
============
|
||||
|
||||
# Dependencies
|
||||
## Debian/Ubuntu
|
||||
`apt install flex bison build-essential`
|
||||
|
@@ -1,4 +1,7 @@
|
||||
# Astaroth specification and user manual
|
||||
Astaroth Specification and User Manual
|
||||
============
|
||||
|
||||
# Astaroth Specification and User Manual
|
||||
|
||||
Copyright (C) 2014-2019, Johannes Pekkila, Miikka Vaisala.
|
||||
|
||||
@@ -20,7 +23,7 @@ Copyright (C) 2014-2019, Johannes Pekkila, Miikka Vaisala.
|
||||
along with Astaroth. If not, see <http://www.gnu.org/licenses/>.
|
||||
|
||||
|
||||
# Introduction and background
|
||||
# Introduction and Background
|
||||
|
||||
Astaroth is a collection of tools for utilizing multiple graphics processing units (GPUs)
|
||||
efficiently in three-dimensional stencil computations. This document specifies the Astaroth
|
||||
@@ -67,8 +70,8 @@ to these publications in their work.
|
||||
|
||||
The Astaroth application-programming interface (API) provides the means for controlling execution of
|
||||
user-defined and built-in functions on multiple graphics processing units. Functions in the API are
|
||||
prefixed with lower case ```ac```, while structures and data types are prefixed with capitalized
|
||||
```Ac```. Compile-time constants, such as definitions and enumerations, have the prefix ```AC_```.
|
||||
prefixed with lower case `ac`, while structures and data types are prefixed with capitalized
|
||||
`Ac`. Compile-time constants, such as definitions and enumerations, have the prefix `AC_`.
|
||||
All of the API functions return an AcResult value indicating either success or failure. The return
|
||||
codes are
|
||||
```C
|
||||
@@ -103,13 +106,13 @@ Finally, a third layer is provided for convenience and backwards compatibility.
|
||||
There are also several helper functions defined in `include/astaroth_defines.h`, which can be used for, say, determining the size or performing index calculations within the simulation domain.
|
||||
|
||||
|
||||
## List of Astaroth API functions
|
||||
## List of Astaroth API Functions
|
||||
|
||||
Here's a non-exhaustive list of Astaroth API functions. For more info and an up-to-date list, see
|
||||
the corresponding header files in `include/astaroth_defines.h`, `include/astaroth.h`,
`include/astaroth_node.h`, `include/astaroth_device.h`.
|
||||
|
||||
### Initialization, quitting and helper functions
|
||||
### Initialization, Quitting and Helper Functions
|
||||
|
||||
Device layer.
|
||||
```C
|
||||
@@ -137,7 +140,7 @@ size_t acVertexBufferCompdomainSizeBytes(const AcMeshInfo info);
|
||||
size_t acVertexBufferIdx(const int i, const int j, const int k, const AcMeshInfo info);
|
||||
```
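As a rough illustration of the kind of index calculation a helper such as `acVertexBufferIdx` performs, the sketch below maps a 3D vertex coordinate to a 1D offset. The `MeshDims` struct, the `vertex_idx` name and the x-fastest memory layout are assumptions made for this example, not the library's definitions; consult `include/astaroth_defines.h` for the real ones.

```C
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the mesh dimensions held in AcMeshInfo. */
typedef struct {
    int mx, my, mz; /* Full mesh dimensions (assumed to include halos) */
} MeshDims;

/* Maps (i, j, k) to a linear offset, assuming x varies fastest in memory. */
static size_t
vertex_idx(const int i, const int j, const int k, const MeshDims dims)
{
    assert(i >= 0 && i < dims.mx);
    assert(j >= 0 && j < dims.my);
    assert(k >= 0 && k < dims.mz);

    return (size_t)i + (size_t)j * dims.mx + (size_t)k * dims.mx * dims.my;
}
```

With this layout, stepping by one in `j` skips `mx` elements and stepping by one in `k` skips `mx * my` elements.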
|
||||
|
||||
### Loading and storing
|
||||
### Loading and Storing
|
||||
|
||||
Loading meshes and vertex buffers to device memory.
|
||||
```C
|
||||
@@ -245,7 +248,7 @@ AcResult acNodeReduceVec(const Node node, const Stream stream_type, const Reduct
|
||||
const VertexBufferHandle vtxbuf2, AcReal* result);
|
||||
```
|
||||
|
||||
### Stream synchronization
|
||||
### Stream Synchronization
|
||||
|
||||
All library functions that take a `Stream` as a parameter are asynchronous. When calling these
|
||||
functions, control returns immediately back to the host even if the called device function has not
|
||||
@@ -273,7 +276,7 @@ barrierSynchronizeStream(STREAM_ALL); // Blocks until functions in all streams h
|
||||
funcD(STREAM_2); // Is started when command returns from synchronizeStream()
|
||||
```
|
||||
|
||||
### Data synchronization
|
||||
### Data Synchronization
|
||||
|
||||
Stream synchronization works in the same fashion on node and device layers. However on the node
|
||||
layer, one has to take into account that a portion of the mesh is shared between devices and that the
|
||||
@@ -291,14 +294,9 @@ AcResult acNodeSynchronizeVertexBuffer(const Node node, const Stream stream,
|
||||
|
||||
```
|
||||
|
||||
> **NOTE**: Local halos must be up to date before synchronizing the data. Local halos are the grid
|
||||
points outside the computational domain which are used only by a single device. The mesh is
|
||||
distributed to multiple devices by blocking along the z axis. If there are *n* devices and the z-
|
||||
dimension of the computational domain is *nz*, then each device is assigned *nz / n* two-
|
||||
dimensional planes. For example with two devices, the data block that has to be up to date ranges
|
||||
from *(0, 0, nz)* to *(mx, my, nz + 2 * NGHOST)*
|
||||
> **NOTE**: Local halos must be up to date before synchronizing the data. Local halos are the grid points outside the computational domain which are used only by a single device. The mesh is distributed to multiple devices by blocking along the z axis. If there are *n* devices and the z-dimension of the computational domain is *nz*, then each device is assigned *nz / n* two-dimensional planes. For example with two devices, the data block that has to be up to date ranges from *(0, 0, nz)* to *(mx, my, nz + 2 * NGHOST)*.
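
The z-axis blocking described in the note can be sketched as follows. This is a minimal illustration, assuming that *nz* is evenly divisible by the number of devices and that plane indices are counted inside the computational domain (halo offsets are left out). `ZRange` and `device_z_range` are hypothetical names, not part of the Astaroth API.

```C
#include <assert.h>

/* Half-open range [z_begin, z_end) of xy-planes owned by one device. */
typedef struct {
    int z_begin;
    int z_end;
} ZRange;

/* Assigns nz / n_devices consecutive planes to each device. */
static ZRange
device_z_range(const int device_id, const int n_devices, const int nz)
{
    assert(n_devices > 0);
    assert(device_id >= 0 && device_id < n_devices);
    assert(nz % n_devices == 0); /* Assumption: the planes divide evenly */

    const int planes_per_device = nz / n_devices;
    const ZRange range          = {device_id * planes_per_device,
                                   (device_id + 1) * planes_per_device};
    return range;
}
```

For example, with two devices and `nz = 128`, device 0 would own planes 0 to 63 and device 1 planes 64 to 127.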
|
||||
|
||||
### Input and output buffers
|
||||
### Input and Output Buffers
|
||||
|
||||
The mesh is duplicated to input and output buffers for performance reasons. The input buffers are
|
||||
read-only in user-specified compute kernels, which allows us to read them via the texture cache
|
||||
@@ -313,10 +311,7 @@ is done via the API calls
|
||||
AcResult acDeviceSwapBuffers(const Device device);
|
||||
AcResult acNodeSwapBuffers(const Node node);
|
||||
```
|
||||
> **NOTE**: All functions provided with the API operate on input buffers and ensure that the
|
||||
complete result is available in the input buffer when the function has completed. User-specified
|
||||
kernels are exceptions and write the result to output buffers. Therefore buffers have to be swapped
|
||||
only after calling user-specified kernels.
|
||||
> **NOTE**: All functions provided with the API operate on input buffers and ensure that the complete result is available in the input buffer when the function has completed. User-specified kernels are exceptions and write the result to output buffers. Therefore buffers have to be swapped only after calling user-specified kernels.
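
The input/output buffer scheme can be pictured with a host-side sketch. The code below is not the library's implementation: `DoubleBuffer`, `example_kernel` and `swap_buffers` are hypothetical names that only illustrate why a swap is needed after a kernel that reads from the input buffer and writes to the output buffer.

```C
#include <assert.h>
#include <stddef.h>

/* Hypothetical pair of buffers: kernels read from `in` and write to `out`. */
typedef struct {
    double* in;
    double* out;
    size_t count;
} DoubleBuffer;

/* Stand-in for a user-specified kernel: the input is never modified. */
static void
example_kernel(DoubleBuffer* buf)
{
    assert(buf && buf->in && buf->out);
    for (size_t i = 0; i < buf->count; ++i)
        buf->out[i] = 2.0 * buf->in[i];
}

/* Exchanges the roles of the buffers; no data is copied. */
static void
swap_buffers(DoubleBuffer* buf)
{
    assert(buf);
    double* tmp = buf->in;
    buf->in     = buf->out;
    buf->out    = tmp;
}
```

After `example_kernel` returns, the new values are visible only through `out`. Calling `swap_buffers` makes them the input of the next step.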
|
||||
|
||||
## Devices
|
||||
|
||||
@@ -420,7 +415,7 @@ Let *i* be the device id. The portion of the halos shared by neighboring devices
|
||||
`acNodeSynchronizeVertexBuffer` and `acNodeSynchronizeMesh` communicate these shared areas among
|
||||
the devices in the node.
|
||||
|
||||
## Integration, reductions and boundary conditions
|
||||
## Integration, Reductions and Boundary Conditions
|
||||
|
||||
The library provides the following functions for integration, reductions and computing periodic
|
||||
boundary conditions.
|
||||
@@ -487,18 +482,18 @@ pipeline shown in the following figure.
|
||||
|
||||
| Stage | File ending | Description |
|
||||
|--------------------|-------------|---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
|
||||
| Stencil assembly | .sas | Defines the shape of the stencils and functions to be preprocessed before entering the stencil processing stage. Reading from input arrays is only possible during this stage. |
|
||||
| Stencil process | .sps | The functions executed on streams of data are defined here. Contains kernels, which are essentially main functions of GPU programs. |
|
||||
| Stencil definition | .sdh | All field identifiers and constant memory symbols are defined in this file. |
|
||||
| Any | .h | Optional header files which can be included in any other file. |
|
||||
| Stencil assembly | .ac | Defines the shape of the stencils and functions to be preprocessed before entering the stencil processing stage. Reading from input arrays is only possible during this stage. |
|
||||
| Stencil process | .ac | The functions executed on streams of data are defined here. Contains kernels, which are essentially main functions of GPU programs. |
|
||||
| Stencil definition | .ac | All field identifiers and constant memory symbols are defined in this file. |
|
||||
| Any | .h | Optional header files which can be included in any file. |
|
||||
|
||||
Compilation of the DSL files is integrated into `CMakeLists.txt` provided with the library and
|
||||
dependencies are recompiled if needed when calling `make`. All DSL files should reside in the same
|
||||
directory and there should be only one `.sas`, `.sps` and `.sdh` file. There may be any number of
|
||||
directory and there should be only one `.ac` file. There may be any number of
|
||||
optional `.h` files. When configuring the project, the user should pass the path to the DSL
|
||||
directory as a CMake option like so: `cmake -DDSL_MODULE_DIR="some/user/dir" ..`.
|
||||
|
||||
## Data types
|
||||
## Data Types
|
||||
|
||||
In addition to basic datatypes in C/C++/CUDA, such as int and int3, we provide the following datatypes with the DSL.
|
||||
|
||||
@@ -517,13 +512,13 @@ In addition to basic datatypes in C/C++/CUDA, such as int and int3, we provide t
|
||||
`Scalars` are 32-bit floating-point numbers by default. Double precision can be turned on by setting cmake option `DOUBLE_PRECISION=ON`.
|
||||
All real number literals are converted automatically to the correct precision. If needed, the precision can be declared explicitly by appending an `f` or `d` suffix to the real number. For example,
|
||||
```C
|
||||
1.0 // The same precision as Scalar/AcReal
|
||||
1.0f // Explicit float
|
||||
1.0d // Explicit double
|
||||
1.0 // The same precision as Scalar/AcReal
|
||||
1.0f // Explicit float
|
||||
1.0d // Explicit double
|
||||
(1.0f * 1.0d) // 1.0f is implicitly cast to double and the multiplication is done in double precision.
|
||||
```
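The promotion rule in the last line of the example above matches the usual arithmetic conversions of C, which can be checked with a small sketch. Note two assumptions: standard C has no `d` suffix (an unsuffixed literal such as `1.0` is already a `double` there, unlike in the DSL), and the `Real` typedef with its `DOUBLE_PRECISION` macro is a hypothetical mirror of the cmake option rather than the library's actual definition of `AcReal`.

```C
#include <assert.h>
#include <stddef.h>

/* Hypothetical mirror of the DOUBLE_PRECISION build option. */
#ifdef DOUBLE_PRECISION
typedef double Real;
#else
typedef float Real;
#endif

/* float * double is evaluated in double: the float operand is converted first. */
static_assert(sizeof(1.0f * 1.0) == sizeof(double), "mixed product must be double");

/* Size in bytes of the configured real type. */
static size_t
real_size(void)
{
    return sizeof(Real);
}

/* Multiplies in mixed precision; the float operand is promoted to double. */
static double
mixed_product(const float a, const double b)
{
    return a * b;
}
```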
|
||||
|
||||
## Control flow
|
||||
## Control Flow
|
||||
|
||||
Conditional statements are expressed with the `if-else` construct. Unlike in C and C++, we require
|
||||
that the scope of the `if-else` statement is explicitly declared using braces `{` and `}` in order
|
||||
@@ -566,19 +561,21 @@ The following built-in variables are available in `Kernel`s.
|
||||
| globalVertexIdx | Holds the global index of the currently processed vertex. If there is only a single device, then vertexIdx is the same as globalVertexIdx. Otherwise globalVertexIdx is offset accordingly. |
|
||||
| globalGridN | Holds the dimensions of the computational domain. |
|
||||
|
||||
## Preprocessed functions
|
||||
## Preprocessed Functions
|
||||
|
||||
The type qualifier `Preprocessed` indicates which functions should be evaluated immediately when
|
||||
entering a `Kernel` function. The return values of `Preprocessed` functions are cached and calling
|
||||
these functions during the stencil processing stage is essentially free. As main memory is
significantly slower than on-chip memories and registers, declaring reading-heavy functions as
|
||||
`Preprocessed` is critical for obtaining good performance in stencil codes.
|
||||
|
||||
`Preprocessed` functions may only be defined in stencil assembly files.
|
||||
`Preprocessed` is critical for obtaining good performance in stencil codes.
|
||||
|
||||
The built-in variables `vertexIdx`, `globalVertexIdx` and `globalGridN` are available in all
|
||||
`Preprocessed` functions.
|
||||
|
||||
## Device Functions
|
||||
|
||||
The type qualifier `Device` indicates which functions can be called from `Kernel` functions or other `Device` functions.
|
||||
|
||||
## Uniforms
|
||||
|
||||
`Uniform`s are global device variables which stay constant for the duration of a kernel launch.
|
||||
@@ -603,17 +600,23 @@ Instead, one should load the appropriate values during runtime using the `acLoad
|
||||
related functions.
|
||||
|
||||
|
||||
## Standard libraries
|
||||
## Standard Libraries
|
||||
|
||||
> Not implemented
|
||||
The following table lists the standard libraries currently available.
|
||||
|
||||
## Performance considerations
|
||||
| Library | Description |
|------------|-------------|
| stdderiv.h | Contains functions for computing 2nd, 4th, 6th and 8th order derivatives (configured by defining `STENCIL_ORDER` before including `stdderiv.h`). |
|
||||
|
||||
Astaroth DSL libraries can be included in the same way as C/C++ headers. For example, `#include <stdderiv.h>`.
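
The idea of configuring a derivative library with a macro defined before the include can be sketched in plain C. The code below is not the contents of `stdderiv.h`: the `coeffs` table and `first_derivative` are hypothetical, and only the standard central-difference coefficients for a first derivative are assumed.

```C
#include <assert.h>

/* Normally defined by the user before including the library header. */
#ifndef STENCIL_ORDER
#define STENCIL_ORDER 6
#endif

/* Standard central-difference coefficients for the first derivative. */
#if STENCIL_ORDER == 2
static const double coeffs[] = {1.0 / 2.0};
#elif STENCIL_ORDER == 4
static const double coeffs[] = {2.0 / 3.0, -1.0 / 12.0};
#elif STENCIL_ORDER == 6
static const double coeffs[] = {3.0 / 4.0, -3.0 / 20.0, 1.0 / 60.0};
#elif STENCIL_ORDER == 8
static const double coeffs[] = {4.0 / 5.0, -1.0 / 5.0, 4.0 / 105.0, -1.0 / 280.0};
#else
#error "Unsupported STENCIL_ORDER"
#endif

/*
 * Approximates df/dx at the center sample. `pencil` points to the center, and
 * pencil[-STENCIL_ORDER / 2] ... pencil[STENCIL_ORDER / 2] must be valid.
 * `inv_ds` is the inverse of the grid spacing.
 */
static double
first_derivative(const double* pencil, const double inv_ds)
{
    assert(pencil);
    double res = 0.0;
    for (int i = 1; i <= STENCIL_ORDER / 2; ++i)
        res += coeffs[i - 1] * (pencil[i] - pencil[-i]);
    return res * inv_ds;
}
```

Because `STENCIL_ORDER` is a compile-time constant, the loop bound is known to the compiler and the loop can be unrolled.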
|
||||
|
||||
## Performance Considerations
|
||||
|
||||
Uniforms are as fast as compile-time constants as long as
|
||||
|
||||
0. The halting condition of a tight loop does not depend on a uniform or a variable, as this would prevent unrolling of the loop during compile-time.
|
||||
0. Uniforms are not multiplied with each other. The result should be stored in an auxiliary uniform instead. For example, the result of `nx * ny` should be stored in a new `uniform nxy`
|
||||
0. At least 32 neighboring streams in the x-axis access the same `uniform`. That is, the vertices at vertexIdx.x = i... i + 32 should access the same `uniform` where i is a multiple of 32.
|
||||
1. The halting condition of a tight loop does not depend on a uniform or a variable, as this would prevent unrolling of the loop during compile-time.
|
||||
2. Uniforms are not multiplied with each other. The result should be stored in an auxiliary uniform instead. For example, the result of `nx * ny` should be stored in a new `uniform nxy`.
|
||||
3. At least 32 neighboring streams in the x-axis access the same `uniform`. That is, the vertices at vertexIdx.x = i... i + 32 should access the same `uniform` where i is a multiple of 32.
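
The second point can be illustrated on the host side. The sketch below precomputes the product once and reuses it, which is the pattern the list recommends for uniforms. `GridConstants`, `make_grid_constants` and `plane_offset` are hypothetical names, and loading the values into actual uniforms is not shown.

```C
#include <assert.h>

/* Hypothetical set of constants that would be loaded into uniforms. */
typedef struct {
    int nx;
    int ny;
    int nxy; /* Auxiliary value: nx * ny, computed once on the host */
} GridConstants;

/* Computes the auxiliary product once instead of in every kernel invocation. */
static GridConstants
make_grid_constants(const int nx, const int ny)
{
    assert(nx > 0 && ny > 0);
    const GridConstants c = {nx, ny, nx * ny};
    return c;
}

/* Uses the precomputed product rather than multiplying nx and ny per access. */
static int
plane_offset(const int k, const GridConstants c)
{
    return k * c.nxy;
}
```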
|
||||
|
||||
|
||||
|
||||
|
10
doxyfile
@@ -771,7 +771,7 @@ WARN_LOGFILE = doc/doxygen/doxygen_warnings.log
|
||||
# spaces. See also FILE_PATTERNS and EXTENSION_MAPPING
|
||||
# Note: If this tag is empty the current directory is searched.
|
||||
|
||||
INPUT = src include
|
||||
INPUT =
|
||||
|
||||
# This tag can be used to specify the character encoding of the source files
|
||||
# that doxygen parses. Internally doxygen uses the UTF-8 encoding. Doxygen uses
|
||||
@@ -796,7 +796,7 @@ INPUT_ENCODING = UTF-8
|
||||
# *.m, *.markdown, *.md, *.mm, *.dox, *.py, *.pyw, *.f90, *.f, *.for, *.tcl,
|
||||
# *.vhd, *.vhdl, *.ucf, *.qsf, *.as and *.js.
|
||||
|
||||
FILE_PATTERNS = *.cc *.h *.cu *.cuh
|
||||
FILE_PATTERNS = *.c *.cc *.h *.cu *.cuh *.md
|
||||
|
||||
# The RECURSIVE tag can be used to specify whether or not subdirectories should
|
||||
# be searched for input files as well.
|
||||
@@ -920,7 +920,7 @@ FILTER_SOURCE_PATTERNS =
|
||||
# (index.html). This can be useful if you have a project on for instance GitHub
|
||||
# and want to reuse the introduction page also for the doxygen output.
|
||||
|
||||
USE_MDFILE_AS_MAINPAGE =
|
||||
USE_MDFILE_AS_MAINPAGE =
|
||||
|
||||
#---------------------------------------------------------------------------
|
||||
# Configuration options related to source browsing
|
||||
@@ -2314,7 +2314,7 @@ DIRECTORY_GRAPH = YES
|
||||
# The default value is: png.
|
||||
# This tag requires that the tag HAVE_DOT is set to YES.
|
||||
|
||||
DOT_IMAGE_FORMAT = png
|
||||
DOT_IMAGE_FORMAT = svg
|
||||
|
||||
# If DOT_IMAGE_FORMAT is set to svg, then this option can be set to YES to
|
||||
# enable generation of interactive SVG images that allow zooming and panning.
|
||||
@@ -2326,7 +2326,7 @@ DOT_IMAGE_FORMAT = png
|
||||
# The default value is: NO.
|
||||
# This tag requires that the tag HAVE_DOT is set to YES.
|
||||
|
||||
INTERACTIVE_SVG = NO
|
||||
INTERACTIVE_SVG = YES
|
||||
|
||||
# The DOT_PATH tag can be used to specify the path where the dot tool can be
|
||||
# found. If left blank, it is assumed the dot tool can be found in the path.
|
||||
|
@@ -41,16 +41,48 @@ typedef struct {
|
||||
Grid subgrid;
|
||||
} DeviceConfiguration;
|
||||
|
||||
/** */
|
||||
/**
|
||||
Initializes all devices on the current node.
|
||||
|
||||
Devices on the node are configured based on the contents of AcMeshInfo.
|
||||
|
||||
@return Exit status. Places the newly created handle in the output parameter.
|
||||
@see AcMeshInfo
|
||||
|
||||
|
||||
Usage example:
|
||||
@code
|
||||
AcMeshInfo info;
|
||||
acLoadConfig(AC_DEFAULT_CONFIG, &info);
|
||||
|
||||
Node node;
|
||||
acNodeCreate(0, info, &node);
|
||||
acNodeDestroy(node);
|
||||
@endcode
|
||||
*/
|
||||
AcResult acNodeCreate(const int id, const AcMeshInfo node_config, Node* node);
|
||||
|
||||
/** */
|
||||
/**
|
||||
Resets all devices on the current node.
|
||||
|
||||
@see acNodeCreate()
|
||||
*/
|
||||
AcResult acNodeDestroy(Node node);
|
||||
|
||||
/** */
|
||||
/**
|
||||
Prints information about the devices available on the current node.
|
||||
|
||||
Requires that Node has been initialized with
|
||||
@see acNodeCreate().
|
||||
*/
|
||||
AcResult acNodePrintInfo(const Node node);
|
||||
|
||||
/** */
|
||||
/**
|
||||
|
||||
|
||||
|
||||
@see DeviceConfiguration
|
||||
*/
|
||||
AcResult acNodeQueryDeviceConfiguration(const Node node, DeviceConfiguration* config);
|
||||
|
||||
/** */
|
||||
|