diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md
index 6bff505..24cae80 100644
--- a/CONTRIBUTING.md
+++ b/CONTRIBUTING.md
@@ -1,20 +1,23 @@
+Contributing
+============
+
# Contributing
Contributions to Astaroth are very welcome!
This document details how to create good contributions. There are two primary concerns:
-0. The codebase should stay maintainable and commits should adhere to a consistent style.
-0. New additions should not disrupt the work of others.
+1. The codebase should stay maintainable and commits should adhere to a consistent style.
+2. New additions should not disrupt the work of others.
## Basic workflow
-*"There is something that needs fixing"*
+> "There is something that needs fixing"
-0. Create your work. See [Programming](#markdown-header-programming) and [Committing](#markdown-header-committing) .
-0. When done, check that autotests still pass by running `./ac_run -t`.
-0. **[Recommended]:** Autoformat your code. See [Formatting](#markdown-header-formatting).
-0. Create a pull request.
+1. Create your work. See [Programming](#markdown-header-programming) and [Committing](#markdown-header-committing).
+2. When done, check that autotests still pass by running `./ac_run -t`.
+3. **[Recommended]:** Autoformat your code. See [Formatting](#markdown-header-formatting).
+4. Create a pull request.
## Programming
* **Strive for code clarity over micro-optimizations.**
diff --git a/README.md b/README.md
index 9f5227a..7b31d29 100644
--- a/README.md
+++ b/README.md
@@ -1,8 +1,11 @@
-
+Astaroth Documentation {#mainpage}
+============
+
+
# Astaroth - A Multi-GPU Library for Generic Stencil Computations
-[Wiki](https://bitbucket.org/jpekkila/astaroth/wiki/Home) | [Issue Tracker](https://bitbucket.org/jpekkila/astaroth/issues?status=new&status=open) | [Contributing](https://bitbucket.org/jpekkila/astaroth/src/master/CONTRIBUTING.md) | [Licence](https://bitbucket.org/jpekkila/astaroth/src/master/LICENCE.txt)
+[Specification](doc/Astaroth_API_specification_and_user_manual/API_specification_and_user_manual.md) | [Contributing](CONTRIBUTING.md) | [Licence](https://bitbucket.org/jpekkila/astaroth/src/master/LICENCE.txt) | [Issue Tracker](https://bitbucket.org/jpekkila/astaroth/issues?status=new&status=open) | [Wiki](https://bitbucket.org/jpekkila/astaroth/wiki/Home)
Astaroth is a multi-GPU library for three-dimensional stencil computations. It is designed especially for performing high-order stencil
computations in structured grids, where several coupled fields are updated each time step. Astaroth consists of a multi-GPU and single-GPU
@@ -11,7 +14,7 @@ makes Astaroth especially suitable for multiphysics simulations.
Astaroth is licenced under the terms of the GNU General Public Licence, version 3, or later
(see [LICENCE.txt](https://bitbucket.org/miikkavaisala/astaroth-code/src/master/astaroth_2.0/LICENCE.txt)). For contributing guidelines,
-see [Contributing](https://bitbucket.org/jpekkila/astaroth/src/master/CONTRIBUTING.md).
+see [Contributing](CONTRIBUTING.md).
## System Requirements
@@ -24,14 +27,14 @@ Relative recent versions of
## Building
-In the base astaroth directory, run
+In the base directory, run
-0. `mkdir build`
-0. `cd build`
-0. `cmake ..`
-0. `make -j`
+1. `mkdir build`
+2. `cd build`
+3. `cmake ..`
+4. `make -j`
-> **Optional:** Documentation can be generated with `doxygen doxyfile` (requires Doxygen). The
+> **Optional:** Documentation can be generated by running `doxygen` in the base directory. The
generated documentation can be found in `doc/doxygen`.
> **Tip:** The library is configured by passing [options](#markdown-header-cmake-options) to CMake with `-D[option]=[ON|OFF]`.
@@ -74,7 +77,7 @@ See `analysis/python/` directory of existing data visualization and analysis scr
* `astaroth/include/astaroth.h`: Legacy interface for backwards compatibility and quick testing.
* `astaroth/include/astaroth_node.h`: Multi-GPU interface (single node).
* `astaroth/include/astaroth_device.h`: Single-GPU interface.
-* `astaroth/src/utils`: Utility library for host-side memory allocations, verification and other tasks.
+* `astaroth/src/utils/`: Utility library for host-side memory allocations, verification and other tasks.
## FAQ
@@ -92,5 +95,5 @@ Otherwise the build steps are the same. Run with `mpirun -np 4 ./mpitest`.
How do I contribute?
-> See [Contributing](https://bitbucket.org/jpekkila/astaroth/src/master/CONTRIBUTING.md).
+> See [Contributing](CONTRIBUTING.md).
diff --git a/acc/README.md b/acc/README.md
index 6197fed..467b63b 100644
--- a/acc/README.md
+++ b/acc/README.md
@@ -1,3 +1,6 @@
+Astaroth DSL compiler
+============
+
# Dependencies
## Debian/Ubuntu
`apt install flex bison build-essential`
diff --git a/doc/Astaroth_API_specification_and_user_manual/API_specification_and_user_manual.md b/doc/Astaroth_API_specification_and_user_manual/API_specification_and_user_manual.md
index 717df22..6f03032 100644
--- a/doc/Astaroth_API_specification_and_user_manual/API_specification_and_user_manual.md
+++ b/doc/Astaroth_API_specification_and_user_manual/API_specification_and_user_manual.md
@@ -1,4 +1,7 @@
-# Astaroth specification and user manual
+Astaroth Specification and User Manual
+============
+
+# Astaroth Specification and User Manual
Copyright (C) 2014-2019, Johannes Pekkila, Miikka Vaisala.
@@ -20,7 +23,7 @@ Copyright (C) 2014-2019, Johannes Pekkila, Miikka Vaisala.
along with Astaroth. If not, see .
-# Introduction and background
+# Introduction and Background
Astaroth is a collection of tools for utilizing multiple graphics processing units (GPUs)
efficiently in three-dimensional stencil computations. This document specifies the Astaroth
@@ -67,8 +70,8 @@ to these publications in their work.
The Astaroth application-programming interface (API) provides the means for controlling execution of
user-defined and built-in functions on multiple graphics processing units. Functions in the API are
-prefixed with lower case ```ac```, while structures and data types are prefixed with capitalized
-```Ac```. Compile-time constants, such as definitions and enumerations, have the prefix ```AC_```.
+prefixed with lower case `ac`, while structures and data types are prefixed with capitalized
+`Ac`. Compile-time constants, such as definitions and enumerations, have the prefix `AC_`.
All of the API functions return an AcResult value indicating either success or failure. The return
codes are
```C
@@ -103,13 +106,13 @@ Finally, a third layer is provided for convenience and backwards compatibility.
There are also several helper functions defined in `include/astaroth_defines.h`, which can be used for, say, determining the size or performing index calculations within the simulation domain.
-## List of Astaroth API functions
+## List of Astaroth API Functions
Here's a non-exhaustive list of Astaroth API functions. For more info and an up-to-date list, see
the corresponding header files in `include/astaroth_defines.h`, `include/astaroth.h`, `include/
astaroth_node.h`, `include/astaroth_device.h`.
-### Initialization, quitting and helper functions
+### Initialization, Quitting and Helper Functions
Device layer.
```C
@@ -137,7 +140,7 @@ size_t acVertexBufferCompdomainSizeBytes(const AcMeshInfo info);
size_t acVertexBufferIdx(const int i, const int j, const int k, const AcMeshInfo info);
```
-### Loading and storing
+### Loading and Storing
Loading meshes and vertex buffers to device memory.
```C
@@ -245,7 +248,7 @@ AcResult acNodeReduceVec(const Node node, const Stream stream_type, const Reduct
const VertexBufferHandle vtxbuf2, AcReal* result);
```
-### Stream synchronization
+### Stream Synchronization
All library functions that take a `Stream` as a parameter are asynchronous. When calling these
functions, control returns immediately back to the host even if the called device function has not
@@ -273,7 +276,7 @@ barrierSynchronizeStream(STREAM_ALL); // Blocks until functions in all streams h
funcD(STREAM_2); // Is started when command returns from synchronizeStream()
```
-### Data synchronization
+### Data Synchronization
Stream synchronization works in the same fashion on node and device layers. However on the node
layer, one has to take into account that a portion of the mesh is shared between devices and that the
@@ -291,14 +294,9 @@ AcResult acNodeSynchronizeVertexBuffer(const Node node, const Stream stream,
```
-> **NOTE**: Local halos must be up to date before synchronizing the data. Local halos are the grid
-points outside the computational domain which are used only by a single device. The mesh is
-distributed to multiple devices by blocking along the z axis. If there are *n* devices and the z-
-dimension of the computational domain is *nz*, then each device is assigned *nz / n* two-
-dimensional planes. For example with two devices, the data block that has to be up to date ranges
-from *(0, 0, nz)* to *(mx, my, nz + 2 * NGHOST)*
+> **NOTE**: Local halos must be up to date before synchronizing the data. Local halos are the grid points outside the computational domain which are used only by a single device. The mesh is distributed to multiple devices by blocking along the z axis. If there are *n* devices and the z-dimension of the computational domain is *nz*, then each device is assigned *nz / n* two-dimensional planes. For example with two devices, the data block that has to be up to date ranges from *(0, 0, nz)* to *(mx, my, nz + 2 * NGHOST)*.
-### Input and output buffers
+### Input and Output Buffers
The mesh is duplicated to input and output buffers for performance reasons. The input buffers are
read-only in user-specified compute kernels, which allows us to read them via the texture cache
@@ -313,10 +311,7 @@ is done via the API calls
AcResult acDeviceSwapBuffers(const Device device);
AcResult acNodeSwapBuffers(const Node node);
```
-> **NOTE**: All functions provided with the API operate on input buffers and ensure that the
-complete result is available in the input buffer when the function has completed. User-specified
-kernels are exceptions and write the result to output buffers. Therefore buffers have to be swapped
-only after calling user-specified kernels.
+> **NOTE**: All functions provided with the API operate on input buffers and ensure that the complete result is available in the input buffer when the function has completed. User-specified kernels are exceptions and write the result to output buffers. Therefore buffers have to be swapped only after calling user-specified kernels.
## Devices
@@ -420,7 +415,7 @@ Let *i* be the device id. The portion of the halos shared by neighboring devices
`acNodeSynchronizeVertexBuffer` and `acNodeSynchronizeMesh` communicate these shared areas among
the devices in the node.
-## Integration, reductions and boundary conditions
+## Integration, Reductions and Boundary Conditions
The library provides the following functions for integration, reductions and computing periodic
boundary conditions.
@@ -487,18 +482,18 @@ pipeline shown in the following figure.
| Stage | File ending | Description |
|--------------------|-------------|---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
-| Stencil assembly | .sas | Defines the shape of the stencils and functions to be preprocessed before entering the stencil processing stage. Reading from input arrays is only possible during this stage. |
-| Stencil process | .sps | The functions executed on streams of data are defined here. Contains kernels, which are essentially main functions of GPU programs. |
-| Stencil definition | .sdh | All field identifiers and constant memory symbols are defined in this file. |
-| Any | .h | Optional header files which can be included in any other file.
+| Stencil assembly | .ac | Defines the shape of the stencils and functions to be preprocessed before entering the stencil processing stage. Reading from input arrays is only possible during this stage. |
+| Stencil process | .ac | The functions executed on streams of data are defined here. Contains kernels, which are essentially main functions of GPU programs. |
+| Stencil definition | .ac | All field identifiers and constant memory symbols are defined in this file. |
+| Any | .h | Optional header files which can be included in any file. |
Compilation of the DSL files is integrated into `CMakeLists.txt` provided with the library and
dependencies are recompiled if needed when calling `make`. All DSL files should reside in the same
-directory and there should be only one `.sas`, `.sps` and `.sdh` file. There may be any number of
+directory and there should be only one `.ac` file. There may be any number of
optional `.h` files. When configuring the project, the user should pass the path to the DSL
directory as a cmake option like so: ```cmake -DDSL_MODULE_DIR="some/user/dir" ..```.
-## Data types
+## Data Types
In addition to basic datatypes in C/C++/CUDA, such as int and int3, we provide the following datatypes with the DSL.
@@ -517,13 +512,13 @@ In addition to basic datatypes in C/C++/CUDA, such as int and int3, we provide t
`Scalars` are 32-bit floating-point numbers by default. Double precision can be turned on by setting cmake option `DOUBLE_PRECISION=ON`.
All real number literals are converted automatically to the correct precision. Where needed, the precision can be declared explicitly by appending an `f` or `d` suffix to the literal. For example,
```C
-1.0 // The same precision as Scalar/AcReal
-1.0f // Explicit float
-1.0d // Explicit double
+1.0 // The same precision as Scalar/AcReal
+1.0f // Explicit float
+1.0d // Explicit double
(1.0f * 1.0d) // 1.0f is implicitly cast to double and the multiplication is done in double precision.
```
-## Control flow
+## Control Flow
Conditional statements are expressed with the `if-else` construct. Unlike in C and C++, we require
that the scope of the `if-else` statement is explicitly declared using braces `{` and `}` in order
@@ -566,19 +561,21 @@ The following built-in variables are available in `Kernel`s.
| globalVertexIdx | Holds the global index of the currently processed vertex. If there is only a single device, then vertexIdx is the same as globalVertexIdx. Otherwise globalVertexIdx is offset accordingly. |
| globalGridN | Holds the dimensions of the computational domain. |
-## Preprocessed functions
+## Preprocessed Functions
The type qualifier `Preprocessed` indicates which functions should be evaluated immediately when
entering a `Kernel` function. The return values of `Preprocessed` functions are cached and calling
these functions during the stencil processing stage is essentially free. As main memory bandwidth is
significantly slower than on-chip memories and registers, declaring reading-heavy functions as
-`Preprocessed` is critical for obtaining good performance in stencil codes.
-
-`Preprocessed` functions may only be defined in stencil assembly files.
+`Preprocessed` is critical for obtaining good performance in stencil codes.
The built-in variables `vertexIdx`, `globalVertexIdx` and `globalGridN` are available in all
`Preprocessed` functions.
+## Device Functions
+
+The type qualifier `Device` indicates which functions can be called from `Kernel` functions or other `Device` functions.
+
## Uniforms
`Uniform`s are global device variables which stay constant for the duration of a kernel launch.
@@ -603,17 +600,23 @@ Instead, one should load the appropriate values during runtime using the `acLoad
related functions.
-## Standard libraries
+## Standard Libraries
-> Not implemented
+The following table lists the standard libraries currently available.
-## Performance considerations
+| Library | Description |
+|-------------------|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
+| stdderiv.h | Contains functions for computing 2nd, 4th, 6th and 8th order derivatives (configured by defining the STENCIL_ORDER before including stdderiv.h) |
+
+Astaroth DSL libraries can be included in the same way as C/C++ headers. For example, `#include <stdderiv.h>`.
+
+## Performance Considerations
Uniforms are as fast as compile-time constants as long as
-0. The halting condition of a tight loop does not depend on an uniform or a variable, as this would prevent unrolling of the loop during compile-time.
-0. Uniforms are not multiplied with each other. The result should be stored in an auxiliary uniform instead. For example, the result of `nx * ny` should be stored in a new `uniform nxy`
-0. At least 32 neighboring streams in the x-axis access the same `uniform`. That is, the vertices at vertexIdx.x = i... i + 32 should access the same `uniform` where i is a multiple of 32.
+1. The halting condition of a tight loop does not depend on a uniform or a variable, as this would prevent unrolling of the loop at compile time.
+2. Uniforms are not multiplied with each other. The result should be stored in an auxiliary uniform instead. For example, the result of `nx * ny` should be stored in a new `uniform nxy`.
+3. At least 32 neighboring streams in the x-axis access the same `uniform`. That is, the vertices at vertexIdx.x = i... i + 32 should access the same `uniform` where i is a multiple of 32.
diff --git a/doxyfile b/doxyfile
index 7bab478..5f89d90 100644
--- a/doxyfile
+++ b/doxyfile
@@ -771,7 +771,7 @@ WARN_LOGFILE = doc/doxygen/doxygen_warnings.log
# spaces. See also FILE_PATTERNS and EXTENSION_MAPPING
# Note: If this tag is empty the current directory is searched.
-INPUT = src include
+INPUT =
# This tag can be used to specify the character encoding of the source files
# that doxygen parses. Internally doxygen uses the UTF-8 encoding. Doxygen uses
@@ -796,7 +796,7 @@ INPUT_ENCODING = UTF-8
# *.m, *.markdown, *.md, *.mm, *.dox, *.py, *.pyw, *.f90, *.f, *.for, *.tcl,
# *.vhd, *.vhdl, *.ucf, *.qsf, *.as and *.js.
-FILE_PATTERNS = *.cc *.h *.cu *.cuh
+FILE_PATTERNS = *.c *.cc *.h *.cu *.cuh *.md
# The RECURSIVE tag can be used to specify whether or not subdirectories should
# be searched for input files as well.
@@ -920,7 +920,7 @@ FILTER_SOURCE_PATTERNS =
# (index.html). This can be useful if you have a project on for instance GitHub
# and want to reuse the introduction page also for the doxygen output.
-USE_MDFILE_AS_MAINPAGE =
+USE_MDFILE_AS_MAINPAGE =
#---------------------------------------------------------------------------
# Configuration options related to source browsing
@@ -2314,7 +2314,7 @@ DIRECTORY_GRAPH = YES
# The default value is: png.
# This tag requires that the tag HAVE_DOT is set to YES.
-DOT_IMAGE_FORMAT = png
+DOT_IMAGE_FORMAT = svg
# If DOT_IMAGE_FORMAT is set to svg, then this option can be set to YES to
# enable generation of interactive SVG images that allow zooming and panning.
@@ -2326,7 +2326,7 @@ DOT_IMAGE_FORMAT = png
# The default value is: NO.
# This tag requires that the tag HAVE_DOT is set to YES.
-INTERACTIVE_SVG = NO
+INTERACTIVE_SVG = YES
# The DOT_PATH tag can be used to specify the path where the dot tool can be
# found. If left blank, it is assumed the dot tool can be found in the path.
diff --git a/include/astaroth_node.h b/include/astaroth_node.h
index 61438f3..d1cb131 100644
--- a/include/astaroth_node.h
+++ b/include/astaroth_node.h
@@ -41,16 +41,48 @@ typedef struct {
Grid subgrid;
} DeviceConfiguration;
-/** */
+/**
+Initializes all devices on the current node.
+
+Devices on the node are configured based on the contents of AcMeshInfo.
+
+@return Exit status. Places the newly created handle in the output parameter.
+@see AcMeshInfo
+
+
+Usage example:
+@code
+AcMeshInfo info;
+acLoadConfig(AC_DEFAULT_CONFIG, &info);
+
+Node node;
+acNodeCreate(0, info, &node);
+acNodeDestroy(node);
+@endcode
+ */
AcResult acNodeCreate(const int id, const AcMeshInfo node_config, Node* node);
-/** */
+/**
+Resets all devices on the current node.
+
+@see acNodeCreate()
+ */
AcResult acNodeDestroy(Node node);
-/** */
+/**
+Prints information about the devices available on the current node.
+
+Requires that Node has been initialized with acNodeCreate().
+@see acNodeCreate()
+*/
AcResult acNodePrintInfo(const Node node);
-/** */
+/**
+
+
+
+@see DeviceConfiguration
+*/
AcResult acNodeQueryDeviceConfiguration(const Node node, DeviceConfiguration* config);
/** */