Various small improvements to the website (navigation panel, better headings, formatting, etc)
@@ -1,6 +1,3 @@
-Contributing
-============
-
 # Contributing
 
 Contributions to Astaroth are very welcome!
@@ -1,9 +1,4 @@
-Astaroth Documentation {#mainpage}
-============
-
-
-
-# Astaroth - A Multi-GPU Library for Generic Stencil Computations
+# Astaroth - A Multi-GPU Library for Generic Stencil Computations {#mainpage}
 
 [Specification](doc/Astaroth_API_specification_and_user_manual/API_specification_and_user_manual.md) | [Contributing](CONTRIBUTING.md) | [Licence](LICENCE.md) | [Issue Tracker](https://bitbucket.org/jpekkila/astaroth/issues?status=new&status=open) | [Wiki](https://bitbucket.org/jpekkila/astaroth/wiki/Home)
 
@@ -1,6 +1,3 @@
-Astaroth Specification and User Manual
-============
-
 # Astaroth Specification and User Manual
 
 Copyright (C) 2014-2020, Johannes Pekkila, Miikka Vaisala.
@@ -52,17 +49,17 @@ usable via the Astaroth API. While the Astaroth library is written in C++/CUDA,
 the C99 standard.
 
 
-# Publications
+## Publications
 
 The foundational work was done in (Väisälä, Pekkilä, 2017) and the library, API and DSL described
 in this document were introduced in (Pekkilä, 2019). We kindly wish the users of Astaroth to cite
 to these publications in their work.
 
-> J. Pekkilä, Astaroth: A Library for Stencil Computations on Graphics Processing Units. Master's thesis, Aalto University School of Science, Espoo, Finland, 2019.
+> [J. Pekkilä, Astaroth: A Library for Stencil Computations on Graphics Processing Units. Master's thesis, Aalto University School of Science, Espoo, Finland, 2019.](http://urn.fi/URN:NBN:fi:aalto-201906233993)
 
-> M. S. Väisälä, Magnetic Phenomena of the Interstellar Medium in Theory and Observation. PhD thesis, University of Helsinki, Finland, 2017.
+> [M. S. Väisälä, Magnetic Phenomena of the Interstellar Medium in Theory and Observation. PhD thesis, University of Helsinki, Finland, 2017.](http://urn.fi/URN:ISBN:978-951-51-2778-5)
 
-> J. Pekkilä, M. S. Väisälä, M. Käpylä, P. J. Käpylä, and O. Anjum, “Methods for compressible fluid simulation on GPUs using high-order finite differences, ”Computer Physics Communications, vol. 217, pp. 11–22, Aug. 2017.
+> [J. Pekkilä, M. S. Väisälä, M. Käpylä, P. J. Käpylä, and O. Anjum, “Methods for compressible fluid simulation on GPUs using high-order finite differences, ”Computer Physics Communications, vol. 217, pp. 11–22, Aug. 2017.](https://doi.org/10.1016/j.cpc.2017.03.011)
 
 
 
@@ -218,9 +215,10 @@ AcResult acDeviceLoadMeshInfo(const Device device, const Stream stream,
 const AcMeshInfo device_config);
 ```
 
+### Integration, Reductions and Boundary Conditions
 
-### Computation
-
+The library provides the following functions for integration, reductions and computing periodic
+boundary conditions.
 ```C
 AcResult acDeviceIntegrateSubstep(const Device device, const Stream stream, const int step_number,
 const int3 start, const int3 end, const AcReal dt);
@@ -248,7 +246,16 @@ AcResult acNodeReduceVec(const Node node, const Stream stream_type, const Reduct
 const VertexBufferHandle vtxbuf2, AcReal* result);
 ```
 
-### Stream Synchronization
+Finally, there's a library function that is automatically generated for all user-specified `Kernel`
+functions written with the Astaroth DSL,
+```C
+AcResult acDeviceKernel_##identifier(const Device device, const Stream stream,
+const int3 start, const int3 end);
+```
+Where `##identifier` is replaced with the name of the user-specified kernel. For example, a device
+function `Kernel solve()` can be called with `acDeviceKernel_solve()` via the API.
+
+## Stream Synchronization
 
 All library functions that take a `Stream` as a parameter are asynchronous. When calling these
 functions, control returns immediately back to the host even if the called device function has not
@@ -270,13 +277,20 @@ synchronized at once by passing the alias `STREAM_ALL` to the synchronization fu
 Usage of streams is demonstrated with the following example.
 ```C
 funcA(STREAM_0);
 funcB(STREAM_0); // Blocks until funcA has completed
 funcC(STREAM_1); // May execute in parallel with funcB
 barrierSynchronizeStream(STREAM_ALL); // Blocks until functions in all streams have completed
 funcD(STREAM_2); // Is started when command returns from synchronizeStream()
 ```
 
-### Data Synchronization
+Astaroth API provides the following functions for barrier synchronization.
+```C
+AcResult acSynchronize(void);
+AcResult acNodeSynchronizeStream(const Node node, const Stream stream);
+AcResult acDeviceSynchronizeStream(const Device device, const Stream stream);
+```
+
+## Data Synchronization
 
 Stream synchronization works in the same fashion on node and device layers. However on the node
 layer, one has to take in account that a portion of the mesh is shared between devices and that the
@@ -296,7 +310,7 @@ AcResult acNodeSynchronizeVertexBuffer(const Node node, const Stream stream,
 
 > **NOTE**: Local halos must be up to date before synchronizing the data. Local halos are the grid points outside the computational domain which are used only by a single device. The mesh is distributed to multiple devices by blocking along the z axis. If there are *n* devices and the z-dimension of the computational domain is *nz*, then each device is assigned *nz / n* two-dimensional planes. For example with two devices, the data block that has to be up to date ranges from *(0, 0, nz)* to *(mx, my, nz + 2 * NGHOST)*.
 
-### Input and Output Buffers
+## Input and Output Buffers
 
 The mesh is duplicated to input and output buffers for performance reasons. The input buffers are
 read-only in user-specified compute kernels, which allows us to read them via the texture cache
@@ -357,14 +371,14 @@ Meshes are the primary structures for passing information to the library and ker
 of a `Mesh` is declared as
 ```C
 typedef struct {
 int int_params[NUM_INT_PARAMS];
 int3 int3_params[NUM_INT3_PARAMS];
 AcReal real_params[NUM_REAL_PARAMS];
 AcReal3 real3_params[NUM_REAL3_PARAMS];
 } AcMeshInfo;
 
 typedef struct {
 AcReal* vertex_buffer[NUM_VTXBUF_HANDLES];
 AcMeshInfo info;
 } AcMesh;
 ```
@@ -415,45 +429,7 @@ Let *i* be the device id. The portion of the halos shared by neighboring devices
 `acNodeSynchronizeVertexBuffer` and `acNodeSynchronizeMesh` communicate these shared areas among
 the devices in the node.
 
-## Integration, Reductions and Boundary Conditions
+> **NOTE:** The decomposition scheme is subject to change.
 
-The library provides the following functions for integration, reductions and computing periodic
-boundary conditions.
-```C
-AcResult acDeviceIntegrateSubstep(const Device device, const Stream stream, const int step_number,
-const int3 start, const int3 end, const AcReal dt);
-AcResult acDevicePeriodicBoundcondStep(const Device device, const Stream stream,
-const VertexBufferHandle vtxbuf_handle, const int3 start,
-const int3 end);
-AcResult acDevicePeriodicBoundconds(const Device device, const Stream stream, const int3 start,
-const int3 end);
-AcResult acDeviceReduceScal(const Device device, const Stream stream, const ReductionType rtype,
-const VertexBufferHandle vtxbuf_handle, AcReal* result);
-AcResult acDeviceReduceVec(const Device device, const Stream stream_type, const ReductionType rtype,
-const VertexBufferHandle vtxbuf0, const VertexBufferHandle vtxbuf1,
-const VertexBufferHandle vtxbuf2, AcReal* result);
-
-AcResult acNodeIntegrateSubstep(const Node node, const Stream stream, const int step_number,
-const int3 start, const int3 end, const AcReal dt);
-AcResult acNodeIntegrate(const Node node, const AcReal dt);
-AcResult acNodePeriodicBoundcondStep(const Node node, const Stream stream,
-const VertexBufferHandle vtxbuf_handle);
-AcResult acNodePeriodicBoundconds(const Node node, const Stream stream);
-AcResult acNodeReduceScal(const Node node, const Stream stream, const ReductionType rtype,
-const VertexBufferHandle vtxbuf_handle, AcReal* result);
-AcResult acNodeReduceVec(const Node node, const Stream stream_type, const ReductionType rtype,
-const VertexBufferHandle vtxbuf0, const VertexBufferHandle vtxbuf1,
-const VertexBufferHandle vtxbuf2, AcReal* result);
-```
-
-Finally, there's a library function that is automatically generated for all user-specified `Kernel`
-functions written with the Astaroth DSL,
-```C
-AcResult acDeviceKernel_##identifier(const Device device, const Stream stream,
-const int3 start, const int3 end);
-```
-Where `##identifier` is replaced with the name of the user-specified kernel. For example, a device
-function `Kernel solve()` can be called with `acDeviceKernel_solve()` via the API.
-
 # Astaroth Domain-Specific Language
 

BIN doc/astaroth_logo_small.png
Binary file not shown. After: 1.5 KiB
doxyfile
@@ -38,7 +38,7 @@ PROJECT_NAME = "Astaroth"
 # could be handy for archiving the generated documentation or if some version
 # control system is used.
 
-PROJECT_NUMBER =
+PROJECT_NUMBER = 2.1
 
 # Using the PROJECT_BRIEF tag one can provide an optional one line description
 # for a project that appears at the top of each page and should give viewer a
@@ -51,7 +51,7 @@ PROJECT_BRIEF =
 # pixels and the maximum width should not exceed 200 pixels. Doxygen will copy
 # the logo to the output directory.
 
-PROJECT_LOGO =
+PROJECT_LOGO = doc/astaroth_logo_small.png
 
 # The OUTPUT_DIRECTORY tag is used to specify the (relative or absolute) path
 # into which the generated documentation will be written. If a relative path is
@@ -242,7 +242,7 @@ TCL_SUBST =
 # members will be omitted, etc.
 # The default value is: NO.
 
-OPTIMIZE_OUTPUT_FOR_C = NO
+OPTIMIZE_OUTPUT_FOR_C = YES
 
 # Set the OPTIMIZE_OUTPUT_JAVA tag to YES if your project consists of Java or
 # Python sources only. Doxygen will then generate output that is more tailored
@@ -1187,7 +1187,7 @@ HTML_TIMESTAMP = NO
 # The default value is: NO.
 # This tag requires that the tag GENERATE_HTML is set to YES.
 
-HTML_DYNAMIC_SECTIONS = NO
+HTML_DYNAMIC_SECTIONS = YES
 
 # With HTML_INDEX_NUM_ENTRIES one can control the preferred number of entries
 # shown in the various tree structured indices initially; the user can expand
@@ -1416,7 +1416,7 @@ DISABLE_INDEX = NO
 # The default value is: NO.
 # This tag requires that the tag GENERATE_HTML is set to YES.
 
-GENERATE_TREEVIEW = NO
+GENERATE_TREEVIEW = YES
 
 # The ENUM_VALUES_PER_LINE tag can be used to set the number of enum values that
 # doxygen will group on one line in the generated HTML documentation.

@@ -16,6 +16,13 @@
 You should have received a copy of the GNU General Public License
 along with Astaroth. If not, see <http://www.gnu.org/licenses/>.
 */
+/**
+* @file Single-Device Interface
+* \brief Provides functions for controlling a single device.
+*
+* Detailed info.
+*
+*/
 #pragma once
 
 #ifdef __cplusplus
|