Updated documentation and made it work with Doxygen. Now the doc/doxygen/index.html generated with it looks quite good and contains lots of useful and up-to-date information about Astaroth.
@@ -1,20 +1,23 @@
|
|||||||
|
Contributing
|
||||||
|
============
|
||||||
|
|
||||||
# Contributing
|
# Contributing
|
||||||
|
|
||||||
Contributions to Astaroth are very welcome!
|
Contributions to Astaroth are very welcome!
|
||||||
|
|
||||||
This document details how to create good contributions. There are two primary concerns:
|
This document details how to create good contributions. There are two primary concerns:
|
||||||
|
|
||||||
0. The codebase should stay maintainable and commits should adhere to a consistent style.
|
1. The codebase should stay maintainable and commits should adhere to a consistent style.
|
||||||
0. New additions should not disrupt the work of others.
|
2. New additions should not disrupt the work of others.
|
||||||
|
|
||||||
## Basic workflow
|
## Basic workflow
|
||||||
|
|
||||||
*"There is something that needs fixing"*
|
> "There is something that needs fixing"
|
||||||
|
|
||||||
0. Create your work. See [Programming](#markdown-header-programming) and [Committing](#markdown-header-committing) .
|
1. Create your work. See [Programming](#markdown-header-programming) and [Committing](#markdown-header-committing) .
|
||||||
0. When done, check that autotests still pass by running `./ac_run -t`.
|
2. When done, check that autotests still pass by running `./ac_run -t`.
|
||||||
0. **[Recommended]:** Autoformat your code. See [Formatting](#markdown-header-formatting).
|
3. **[Recommended]:** Autoformat your code. See [Formatting](#markdown-header-formatting).
|
||||||
0. Create a pull request.
|
4. Create a pull request.
|
||||||
|
|
||||||
## Programming
|
## Programming
|
||||||
* **Strive for code clarity over micro-optimizations.**
|
* **Strive for code clarity over micro-optimizations.**
|
||||||
|
25
README.md
@@ -1,8 +1,11 @@
|
|||||||

|
Astaroth Documentation {#mainpage}
|
||||||
|
============
|
||||||
|
|
||||||
|

|
||||||
|
|
||||||
# Astaroth - A Multi-GPU Library for Generic Stencil Computations
|
# Astaroth - A Multi-GPU Library for Generic Stencil Computations
|
||||||
|
|
||||||
[Wiki](https://bitbucket.org/jpekkila/astaroth/wiki/Home) | [Issue Tracker](https://bitbucket.org/jpekkila/astaroth/issues?status=new&status=open) | [Contributing](https://bitbucket.org/jpekkila/astaroth/src/master/CONTRIBUTING.md) | [Licence](https://bitbucket.org/jpekkila/astaroth/src/master/LICENCE.txt)
|
[Specification](doc/Astaroth_API_specification_and_user_manual/API_specification_and_user_manual.md) | [Contributing](CONTRIBUTING.md) | [Licence](https://bitbucket.org/jpekkila/astaroth/src/master/LICENCE.txt) | [Issue Tracker](https://bitbucket.org/jpekkila/astaroth/issues?status=new&status=open) | [Wiki](https://bitbucket.org/jpekkila/astaroth/wiki/Home)
|
||||||
|
|
||||||
Astaroth is a multi-GPU library for three-dimensional stencil computations. It is designed especially for performing high-order stencil
|
Astaroth is a multi-GPU library for three-dimensional stencil computations. It is designed especially for performing high-order stencil
|
||||||
computations in structured grids, where several coupled fields are updated each time step. Astaroth consists of a multi-GPU and single-GPU
|
computations in structured grids, where several coupled fields are updated each time step. Astaroth consists of a multi-GPU and single-GPU
|
||||||
@@ -11,7 +14,7 @@ makes Astaroth especially suitable for multiphysics simulations.
|
|||||||
|
|
||||||
Astaroth is licenced under the terms of the GNU General Public Licence, version 3, or later
|
Astaroth is licenced under the terms of the GNU General Public Licence, version 3, or later
|
||||||
(see [LICENCE.txt](https://bitbucket.org/miikkavaisala/astaroth-code/src/master/astaroth_2.0/LICENCE.txt)). For contributing guidelines,
|
(see [LICENCE.txt](https://bitbucket.org/miikkavaisala/astaroth-code/src/master/astaroth_2.0/LICENCE.txt)). For contributing guidelines,
|
||||||
see [Contributing](https://bitbucket.org/jpekkila/astaroth/src/master/CONTRIBUTING.md).
|
see [Contributing](CONTRIBUTING.md).
|
||||||
|
|
||||||
|
|
||||||
## System Requirements
|
## System Requirements
|
||||||
@@ -24,14 +27,14 @@ Relative recent versions of
|
|||||||
|
|
||||||
## Building
|
## Building
|
||||||
|
|
||||||
In the base astaroth directory, run
|
In the base directory, run
|
||||||
|
|
||||||
0. `mkdir build`
|
1. `mkdir build`
|
||||||
0. `cd build`
|
2. `cd build`
|
||||||
0. `cmake ..`
|
3. `cmake ..`
|
||||||
0. `make -j`
|
4. `make -j`
|
||||||
|
|
||||||
> **Optional:** Documentation can be generated with `doxygen doxyfile` (requires Doxygen). The
|
> **Optional:** Documentation can be generated by running `doxygen` in the base directory. The
|
||||||
generated documentation can be found in `doc/doxygen`.
|
generated documentation can be found in `doc/doxygen`.
|
||||||
|
|
||||||
> **Tip:** The library is configured by passing [options](#markdown-header-cmake-options) to CMake with `-D[option]=[ON|OFF]`.
|
> **Tip:** The library is configured by passing [options](#markdown-header-cmake-options) to CMake with `-D[option]=[ON|OFF]`.
|
||||||
@@ -74,7 +77,7 @@ See `analysis/python/` directory of existing data visualization and analysis scr
|
|||||||
* `astaroth/include/astaroth.h`: Legacy interface for backwards compatibility and quick testing.
|
* `astaroth/include/astaroth.h`: Legacy interface for backwards compatibility and quick testing.
|
||||||
* `astaroth/include/astaroth_node.h`: Multi-GPU interface (single node).
|
* `astaroth/include/astaroth_node.h`: Multi-GPU interface (single node).
|
||||||
* `astaroth/include/astaroth_device.h`: Single-GPU interface.
|
* `astaroth/include/astaroth_device.h`: Single-GPU interface.
|
||||||
* `astaroth/src/utils`: Utility library for host-side memory allocations, verification and other tasks.
|
* `astaroth/src/utils/`: Utility library for host-side memory allocations, verification and other tasks.
|
||||||
|
|
||||||
## FAQ
|
## FAQ
|
||||||
|
|
||||||
@@ -92,5 +95,5 @@ Otherwise the build steps are the same. Run with `mpirun -np 4 ./mpitest`.
|
|||||||
|
|
||||||
How do I contribute?
|
How do I contribute?
|
||||||
|
|
||||||
> See [Contributing](https://bitbucket.org/jpekkila/astaroth/src/master/CONTRIBUTING.md).
|
> See [Contributing](CONTRIBUTING.md).
|
||||||
|
|
||||||
|
@@ -1,3 +1,6 @@
|
|||||||
|
Astaroth DSL compiler
|
||||||
|
============
|
||||||
|
|
||||||
# Dependencies
|
# Dependencies
|
||||||
## Debian/Ubuntu
|
## Debian/Ubuntu
|
||||||
`apt install flex bison build-essential`
|
`apt install flex bison build-essential`
|
||||||
|
@@ -1,4 +1,7 @@
|
|||||||
# Astaroth specification and user manual
|
Astaroth Specification and User Manual
|
||||||
|
============
|
||||||
|
|
||||||
|
# Astaroth Specification and User Manual
|
||||||
|
|
||||||
Copyright (C) 2014-2019, Johannes Pekkila, Miikka Vaisala.
|
Copyright (C) 2014-2019, Johannes Pekkila, Miikka Vaisala.
|
||||||
|
|
||||||
@@ -20,7 +23,7 @@ Copyright (C) 2014-2019, Johannes Pekkila, Miikka Vaisala.
|
|||||||
along with Astaroth. If not, see <http://www.gnu.org/licenses/>.
|
along with Astaroth. If not, see <http://www.gnu.org/licenses/>.
|
||||||
|
|
||||||
|
|
||||||
# Introduction and background
|
# Introduction and Background
|
||||||
|
|
||||||
Astaroth is a collection of tools for utilizing multiple graphics processing units (GPUs)
|
Astaroth is a collection of tools for utilizing multiple graphics processing units (GPUs)
|
||||||
efficiently in three-dimensional stencil computations. This document specifies the Astaroth
|
efficiently in three-dimensional stencil computations. This document specifies the Astaroth
|
||||||
@@ -67,8 +70,8 @@ to these publications in their work.
|
|||||||
|
|
||||||
The Astaroth application-programming interface (API) provides the means for controlling execution of
|
The Astaroth application-programming interface (API) provides the means for controlling execution of
|
||||||
user-defined and built-in functions on multiple graphics processing units. Functions in the API are
|
user-defined and built-in functions on multiple graphics processing units. Functions in the API are
|
||||||
prefixed with lower case ```ac```, while structures and data types are prefixed with capitalized
|
prefixed with lower case `ac`, while structures and data types are prefixed with capitalized
|
||||||
```Ac```. Compile-time constants, such as definitions and enumerations, have the prefix ```AC_```.
|
`Ac`. Compile-time constants, such as definitions and enumerations, have the prefix `AC_`.
|
||||||
All of the API functions return an AcResult value indicating either success or failure. The return
|
All of the API functions return an AcResult value indicating either success or failure. The return
|
||||||
codes are
|
codes are
|
||||||
```C
|
```C
|
||||||
@@ -103,13 +106,13 @@ Finally, a third layer is provided for convenience and backwards compatibility.
|
|||||||
There are also several helper functions defined in `include/astaroth_defines.h`, which can be used for, say, determining the size or performing index calculations within the simulation domain.
|
There are also several helper functions defined in `include/astaroth_defines.h`, which can be used for, say, determining the size or performing index calculations within the simulation domain.
|
||||||
|
|
||||||
|
|
||||||
## List of Astaroth API functions
|
## List of Astaroth API Functions
|
||||||
|
|
||||||
Here's a non-exhaustive list of astaroth API functions. For more info and an up-to-date list, see
|
Here's a non-exhaustive list of astaroth API functions. For more info and an up-to-date list, see
|
||||||
the corresponding header files in `include/astaroth_defines.h`, `include/astaroth.h`, `include/
|
the corresponding header files in `include/astaroth_defines.h`, `include/astaroth.h`, `include/
|
||||||
astaroth_node.h`, `include/astaroth_device.h`.
|
astaroth_node.h`, `include/astaroth_device.h`.
|
||||||
|
|
||||||
### Initialization, quitting and helper functions
|
### Initialization, Quitting and Helper Functions
|
||||||
|
|
||||||
Device layer.
|
Device layer.
|
||||||
```C
|
```C
|
||||||
@@ -137,7 +140,7 @@ size_t acVertexBufferCompdomainSizeBytes(const AcMeshInfo info);
|
|||||||
size_t acVertexBufferIdx(const int i, const int j, const int k, const AcMeshInfo info);
|
size_t acVertexBufferIdx(const int i, const int j, const int k, const AcMeshInfo info);
|
||||||
```
|
```
|
||||||
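The index helper listed above can be pictured with a small sketch. This is not the library's implementation: it assumes a row-major layout in which x is the fastest-varying index, and `ExampleMeshDims` and `example_vertex_buffer_idx` are hypothetical names introduced only for illustration (the real function takes an `AcMeshInfo`).

```C
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the mesh dimensions (halos included).
 * The real AcMeshInfo stores this information differently. */
typedef struct {
    int mx, my, mz;
} ExampleMeshDims;

/* Maps (i, j, k) to a linear offset, assuming x is the fastest-varying index. */
size_t
example_vertex_buffer_idx(const int i, const int j, const int k, const ExampleMeshDims dims)
{
    assert(i >= 0 && i < dims.mx);
    assert(j >= 0 && j < dims.my);
    assert(k >= 0 && k < dims.mz);

    return (size_t)i + (size_t)j * (size_t)dims.mx +
           (size_t)k * (size_t)dims.mx * (size_t)dims.my;
}
```

Under this assumed layout, neighbouring x indices are adjacent in memory, which is the usual choice when consecutive GPU threads process consecutive x coordinates.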
|
|
||||||
### Loading and storing
|
### Loading and Storing
|
||||||
|
|
||||||
Loading meshes and vertex buffers to device memory.
|
Loading meshes and vertex buffers to device memory.
|
||||||
```C
|
```C
|
||||||
@@ -245,7 +248,7 @@ AcResult acNodeReduceVec(const Node node, const Stream stream_type, const Reduct
|
|||||||
const VertexBufferHandle vtxbuf2, AcReal* result);
|
const VertexBufferHandle vtxbuf2, AcReal* result);
|
||||||
```
|
```
|
||||||
|
|
||||||
### Stream synchronization
|
### Stream Synchronization
|
||||||
|
|
||||||
All library functions that take a `Stream` as a parameter are asynchronous. When calling these
|
All library functions that take a `Stream` as a parameter are asynchronous. When calling these
|
||||||
functions, control returns immediately back to the host even if the called device function has not
|
functions, control returns immediately back to the host even if the called device function has not
|
||||||
@@ -273,7 +276,7 @@ barrierSynchronizeStream(STREAM_ALL); // Blocks until functions in all streams h
|
|||||||
funcD(STREAM_2); // Is started when command returns from synchronizeStream()
|
funcD(STREAM_2); // Is started when command returns from synchronizeStream()
|
||||||
```
|
```
|
||||||
|
|
||||||
### Data synchronization
|
### Data Synchronization
|
||||||
|
|
||||||
Stream synchronization works in the same fashion on node and device layers. However, on the node
|
Stream synchronization works in the same fashion on node and device layers. However, on the node
|
||||||
layer, one has to take into account that a portion of the mesh is shared between devices and that the
|
layer, one has to take into account that a portion of the mesh is shared between devices and that the
|
||||||
@@ -291,14 +294,9 @@ AcResult acNodeSynchronizeVertexBuffer(const Node node, const Stream stream,
|
|||||||
|
|
||||||
```
|
```
|
||||||
|
|
||||||
> **NOTE**: Local halos must be up to date before synchronizing the data. Local halos are the grid
|
> **NOTE**: Local halos must be up to date before synchronizing the data. Local halos are the grid points outside the computational domain which are used only by a single device. The mesh is distributed to multiple devices by blocking along the z axis. If there are *n* devices and the z-dimension of the computational domain is *nz*, then each device is assigned *nz / n* two-dimensional planes. For example with two devices, the data block that has to be up to date ranges from *(0, 0, nz)* to *(mx, my, nz + 2 * NGHOST)*.
|
||||||
points outside the computational domain which are used only by a single device. The mesh is
|
|
||||||
distributed to multiple devices by blocking along the z axis. If there are *n* devices and the z-
|
|
||||||
dimension of the computational domain is *nz*, then each device is assigned *nz / n* two-
|
|
||||||
dimensional planes. For example with two devices, the data block that has to be up to date ranges
|
|
||||||
from *(0, 0, nz)* to *(mx, my, nz + 2 * NGHOST)*
|
|
||||||
|
|
||||||
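The blocking along the z axis described in the note can be sketched as follows. This is a minimal illustration, not library code: it assumes *nz* is evenly divisible by the number of devices and that device *i* owns the *i*-th contiguous block of planes, and the names `ExampleZRange` and `example_device_z_range` are hypothetical.

```C
#include <assert.h>

/* Hypothetical description of the z-planes owned by one device. */
typedef struct {
    int z_begin; /* First owned plane (inclusive), in computational-domain coordinates */
    int z_end;   /* One past the last owned plane */
} ExampleZRange;

/* Splits nz planes evenly among num_devices.
 * Assumes nz is divisible by num_devices and that device i owns the i-th block. */
ExampleZRange
example_device_z_range(const int device_id, const int num_devices, const int nz)
{
    assert(num_devices > 0);
    assert(device_id >= 0 && device_id < num_devices);
    assert(nz % num_devices == 0);

    const int planes_per_device = nz / num_devices;

    ExampleZRange range;
    range.z_begin = device_id * planes_per_device;
    range.z_end   = range.z_begin + planes_per_device;
    return range;
}
```

Halo planes are deliberately left out of this sketch; each device would additionally hold `NGHOST` planes on both sides of its owned range.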
### Input and output buffers
|
### Input and Output Buffers
|
||||||
|
|
||||||
The mesh is duplicated to input and output buffers for performance reasons. The input buffers are
|
The mesh is duplicated to input and output buffers for performance reasons. The input buffers are
|
||||||
read-only in user-specified compute kernels, which allows us to read them via the texture cache
|
read-only in user-specified compute kernels, which allows us to read them via the texture cache
|
||||||
@@ -313,10 +311,7 @@ is done via the API calls
|
|||||||
AcResult acDeviceSwapBuffers(const Device device);
|
AcResult acDeviceSwapBuffers(const Device device);
|
||||||
AcResult acNodeSwapBuffers(const Node node);
|
AcResult acNodeSwapBuffers(const Node node);
|
||||||
```
|
```
|
||||||
> **NOTE**: All functions provided with the API operate on input buffers and ensure that the
|
> **NOTE**: All functions provided with the API operate on input buffers and ensure that the complete result is available in the input buffer when the function has completed. User-specified kernels are exceptions and write the result to output buffers. Therefore buffers have to be swapped only after calling user-specified kernels.
|
||||||
complete result is available in the input buffer when the function has completed. User-specified
|
|
||||||
kernels are exceptions and write the result to output buffers. Therefore buffers have to be swapped
|
|
||||||
only after calling user-specified kernels.
|
|
||||||
|
|
||||||
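The input/output buffer scheme described in the note can be modelled on the host with two pointers whose roles are exchanged. The sketch below only illustrates the idea and does not call the Astaroth API; `ExampleBufferPair`, `example_user_kernel` and `example_swap_buffers` are hypothetical names.

```C
#include <assert.h>
#include <stddef.h>

/* Hypothetical host-side model of one vertex buffer and its two copies. */
typedef struct {
    double* in;   /* Read by kernels */
    double* out;  /* Written by user-specified kernels */
    size_t count; /* Number of elements in each copy */
} ExampleBufferPair;

/* Stand-in for a user-specified kernel: reads `in` and writes `out`. */
void
example_user_kernel(ExampleBufferPair* buf)
{
    assert(buf != NULL);
    for (size_t i = 0; i < buf->count; ++i)
        buf->out[i] = 2.0 * buf->in[i];
}

/* Exchanges the roles of the two copies so that the next step reads the new results. */
void
example_swap_buffers(ExampleBufferPair* buf)
{
    assert(buf != NULL);
    double* tmp = buf->in;
    buf->in     = buf->out;
    buf->out    = tmp;
}
```

Until the swap is performed, the input copy still holds the old values, which is why the swap is needed only after user-specified kernels.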
## Devices
|
## Devices
|
||||||
|
|
||||||
@@ -420,7 +415,7 @@ Let *i* be the device id. The portion of the halos shared by neighboring devices
|
|||||||
`acNodeSynchronizeVertexBuffer` and `acNodeSynchronizeMesh` communicate these shared areas among
|
`acNodeSynchronizeVertexBuffer` and `acNodeSynchronizeMesh` communicate these shared areas among
|
||||||
the devices in the node.
|
the devices in the node.
|
||||||
|
|
||||||
## Integration, reductions and boundary conditions
|
## Integration, Reductions and Boundary Conditions
|
||||||
|
|
||||||
The library provides the following functions for integration, reductions and computing periodic
|
The library provides the following functions for integration, reductions and computing periodic
|
||||||
boundary conditions.
|
boundary conditions.
|
||||||
@@ -487,18 +482,18 @@ pipeline shown in the following figure.
|
|||||||
|
|
||||||
| Stage | File ending | Description |
|
| Stage | File ending | Description |
|
||||||
|--------------------|-------------|---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
|
|--------------------|-------------|---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
|
||||||
| Stencil assembly | .sas | Defines the shape of the stencils and functions to be preprocessed before entering the stencil processing stage. Reading from input arrays is only possible during this stage. |
|
| Stencil assembly | .ac | Defines the shape of the stencils and functions to be preprocessed before entering the stencil processing stage. Reading from input arrays is only possible during this stage. |
|
||||||
| Stencil process | .sps | The functions executed on streams of data are defined here. Contains kernels, which are essentially main functions of GPU programs. |
|
| Stencil process | .ac | The functions executed on streams of data are defined here. Contains kernels, which are essentially main functions of GPU programs. |
|
||||||
| Stencil definition | .sdh | All field identifiers and constant memory symbols are defined in this file. |
|
| Stencil definition | .ac | All field identifiers and constant memory symbols are defined in this file. |
|
||||||
| Any | .h | Optional header files which can be included in any other file.
|
| Any | .h | Optional header files which can be included in any file.
|
||||||
|
|
||||||
Compilation of the DSL files is integrated into `CMakeLists.txt` provided with the library and
|
Compilation of the DSL files is integrated into `CMakeLists.txt` provided with the library and
|
||||||
dependencies are recompiled if needed when calling `make`. All DSL files should reside in the same
|
dependencies are recompiled if needed when calling `make`. All DSL files should reside in the same
|
||||||
directory and there should be only one `.sas`, `.sps` and `.sdh` file. There may be any number of
|
directory and there should be only one `.ac` file. There may be any number of
|
||||||
optional `.h` files. When configuring the project, the user should pass the path to the DSL
|
optional `.h` files. When configuring the project, the user should pass the path to the DSL
|
||||||
directory as a cmake option like so: ```cmake -DDSL_MODULE_DIR="some/user/dir" ..```.
|
directory as a cmake option like so: ```cmake -DDSL_MODULE_DIR="some/user/dir" ..```.
|
||||||
|
|
||||||
## Data types
|
## Data Types
|
||||||
|
|
||||||
In addition to basic datatypes in C/C++/CUDA, such as int and int3, we provide the following datatypes with the DSL.
|
In addition to basic datatypes in C/C++/CUDA, such as int and int3, we provide the following datatypes with the DSL.
|
||||||
|
|
||||||
@@ -517,13 +512,13 @@ In addition to basic datatypes in C/C++/CUDA, such as int and int3, we provide t
|
|||||||
`Scalars` are 32-bit floating-point numbers by default. Double precision can be turned on by setting cmake option `DOUBLE_PRECISION=ON`.
|
`Scalars` are 32-bit floating-point numbers by default. Double precision can be turned on by setting cmake option `DOUBLE_PRECISION=ON`.
|
||||||
All real number literals are converted automatically to the correct precision. Where needed, the precision can be declared explicitly by appending an `f` or `d` suffix to the real number. For example,
|
All real number literals are converted automatically to the correct precision. Where needed, the precision can be declared explicitly by appending an `f` or `d` suffix to the real number. For example,
|
||||||
```C
|
```C
|
||||||
1.0 // The same precision as Scalar/AcReal
|
1.0 // The same precision as Scalar/AcReal
|
||||||
1.0f // Explicit float
|
1.0f // Explicit float
|
||||||
1.0d // Explicit double
|
1.0d // Explicit double
|
||||||
(1.0f * 1.0d) // 1.0f is implicitly cast to double and the multiplication is done in double precision.
|
(1.0f * 1.0d) // 1.0f is implicitly cast to double and the multiplication is done in double precision.
|
||||||
```
|
```
|
||||||
|
|
||||||
## Control flow
|
## Control Flow
|
||||||
|
|
||||||
Conditional statements are expressed with the `if-else` construct. Unlike in C and C++, we require
|
Conditional statements are expressed with the `if-else` construct. Unlike in C and C++, we require
|
||||||
that the scope of the `if-else` statement is explicitly declared using braces `{` and `}` in order
|
that the scope of the `if-else` statement is explicitly declared using braces `{` and `}` in order
|
||||||
@@ -566,7 +561,7 @@ The following built-in variables are available in `Kernel`s.
|
|||||||
| globalVertexIdx | Holds the global index of the currently processed vertex. If there is only single device, then vertexIdx is the same as globalVertexIdx. Otherwise globalVertexIdx is offset accordingly. |
|
| globalVertexIdx | Holds the global index of the currently processed vertex. If there is only single device, then vertexIdx is the same as globalVertexIdx. Otherwise globalVertexIdx is offset accordingly. |
|
||||||
| globalGridN | Holds the dimensions of the computational domain. |
|
| globalGridN | Holds the dimensions of the computational domain. |
|
||||||
|
|
||||||
## Preprocessed functions
|
## Preprocessed Functions
|
||||||
|
|
||||||
The type qualifier `Preprocessed` indicates which functions should be evaluated immediately when
|
The type qualifier `Preprocessed` indicates which functions should be evaluated immediately when
|
||||||
entering a `Kernel` function. The return values of `Preprocessed` functions are cached and calling
|
entering a `Kernel` function. The return values of `Preprocessed` functions are cached and calling
|
||||||
@@ -574,11 +569,13 @@ these functions during the stencil processing stage is essentially free. As main
|
|||||||
significantly slower than on-chip memories and registers, declaring reading-heavy functions as
|
significantly slower than on-chip memories and registers, declaring reading-heavy functions as
|
||||||
`Preprocessed` is critical for obtaining good performance in stencil codes.
|
`Preprocessed` is critical for obtaining good performance in stencil codes.
|
||||||
|
|
||||||
`Preprocessed` functions may only be defined in stencil assembly files.
|
|
||||||
|
|
||||||
The built-in variables `vertexIdx`, `globalVertexIdx` and `globalGridN` are available in all
|
The built-in variables `vertexIdx`, `globalVertexIdx` and `globalGridN` are available in all
|
||||||
`Preprocessed` functions.
|
`Preprocessed` functions.
|
||||||
|
|
||||||
|
## Device Functions
|
||||||
|
|
||||||
|
The type qualifier `Device` indicates which functions can be called from `Kernel` functions or other `Device` functions.
|
||||||
|
|
||||||
## Uniforms
|
## Uniforms
|
||||||
|
|
||||||
`Uniform`s are global device variables which stay constant for the duration of a kernel launch.
|
`Uniform`s are global device variables which stay constant for the duration of a kernel launch.
|
||||||
@@ -603,17 +600,23 @@ Instead, one should load the appropriate values during runtime using the `acLoad
|
|||||||
related functions.
|
related functions.
|
||||||
|
|
||||||
|
|
||||||
## Standard libraries
|
## Standard Libraries
|
||||||
|
|
||||||
> Not implemented
|
The following table lists the standard libraries currently available.
|
||||||
|
|
||||||
## Performance considerations
|
| Library           | Description                                                                                                                                                                               |
|
||||||
|
|-------------------|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
|
||||||
|
| stdderiv.h | Contains functions for computing 2nd, 4th, 6th and 8th order derivatives (configured by defining the STENCIL_ORDER before including stdderiv.h) |
|
||||||
|
|
||||||
|
Astaroth DSL libraries can be included in the same way as C/C++ headers. For example, `#include <stdderiv.h>`.
|
||||||
|
|
||||||
|
## Performance Considerations
|
||||||
|
|
||||||
Uniforms are as fast as compile-time constants as long as
|
Uniforms are as fast as compile-time constants as long as
|
||||||
|
|
||||||
0. The halting condition of a tight loop does not depend on a uniform or a variable, as this would prevent unrolling of the loop during compile-time.
|
1. The halting condition of a tight loop does not depend on a uniform or a variable, as this would prevent unrolling of the loop during compile-time.
|
||||||
0. Uniforms are not multiplied with each other. The result should be stored in an auxiliary uniform instead. For example, the result of `nx * ny` should be stored in a new `uniform nxy`.
|
2. Uniforms are not multiplied with each other. The result should be stored in an auxiliary uniform instead. For example, the result of `nx * ny` should be stored in a new `uniform nxy`.
|
||||||
0. At least 32 neighboring streams in the x-axis access the same `uniform`. That is, the vertices at vertexIdx.x = i... i + 32 should access the same `uniform` where i is a multiple of 32.
|
3. At least 32 neighboring streams in the x-axis access the same `uniform`. That is, the vertices at vertexIdx.x = i... i + 32 should access the same `uniform` where i is a multiple of 32.
|
||||||
|
|
||||||
|
|
||||||
|
|
||||||
|
8
doxyfile
@@ -771,7 +771,7 @@ WARN_LOGFILE = doc/doxygen/doxygen_warnings.log
|
|||||||
# spaces. See also FILE_PATTERNS and EXTENSION_MAPPING
|
# spaces. See also FILE_PATTERNS and EXTENSION_MAPPING
|
||||||
# Note: If this tag is empty the current directory is searched.
|
# Note: If this tag is empty the current directory is searched.
|
||||||
|
|
||||||
INPUT = src include
|
INPUT =
|
||||||
|
|
||||||
# This tag can be used to specify the character encoding of the source files
|
# This tag can be used to specify the character encoding of the source files
|
||||||
# that doxygen parses. Internally doxygen uses the UTF-8 encoding. Doxygen uses
|
# that doxygen parses. Internally doxygen uses the UTF-8 encoding. Doxygen uses
|
||||||
@@ -796,7 +796,7 @@ INPUT_ENCODING = UTF-8
|
|||||||
# *.m, *.markdown, *.md, *.mm, *.dox, *.py, *.pyw, *.f90, *.f, *.for, *.tcl,
|
# *.m, *.markdown, *.md, *.mm, *.dox, *.py, *.pyw, *.f90, *.f, *.for, *.tcl,
|
||||||
# *.vhd, *.vhdl, *.ucf, *.qsf, *.as and *.js.
|
# *.vhd, *.vhdl, *.ucf, *.qsf, *.as and *.js.
|
||||||
|
|
||||||
FILE_PATTERNS = *.cc *.h *.cu *.cuh
|
FILE_PATTERNS = *.c *.cc *.h *.cu *.cuh *.md
|
||||||
|
|
||||||
# The RECURSIVE tag can be used to specify whether or not subdirectories should
|
# The RECURSIVE tag can be used to specify whether or not subdirectories should
|
||||||
# be searched for input files as well.
|
# be searched for input files as well.
|
||||||
@@ -2314,7 +2314,7 @@ DIRECTORY_GRAPH = YES
|
|||||||
# The default value is: png.
|
# The default value is: png.
|
||||||
# This tag requires that the tag HAVE_DOT is set to YES.
|
# This tag requires that the tag HAVE_DOT is set to YES.
|
||||||
|
|
||||||
DOT_IMAGE_FORMAT = png
|
DOT_IMAGE_FORMAT = svg
|
||||||
|
|
||||||
# If DOT_IMAGE_FORMAT is set to svg, then this option can be set to YES to
|
# If DOT_IMAGE_FORMAT is set to svg, then this option can be set to YES to
|
||||||
# enable generation of interactive SVG images that allow zooming and panning.
|
# enable generation of interactive SVG images that allow zooming and panning.
|
||||||
@@ -2326,7 +2326,7 @@ DOT_IMAGE_FORMAT = png
|
|||||||
# The default value is: NO.
|
# The default value is: NO.
|
||||||
# This tag requires that the tag HAVE_DOT is set to YES.
|
# This tag requires that the tag HAVE_DOT is set to YES.
|
||||||
|
|
||||||
INTERACTIVE_SVG = NO
|
INTERACTIVE_SVG = YES
|
||||||
|
|
||||||
# The DOT_PATH tag can be used to specify the path where the dot tool can be
|
# The DOT_PATH tag can be used to specify the path where the dot tool can be
|
||||||
# found. If left blank, it is assumed the dot tool can be found in the path.
|
# found. If left blank, it is assumed the dot tool can be found in the path.
|
||||||
|
@@ -41,16 +41,48 @@ typedef struct {
|
|||||||
Grid subgrid;
|
Grid subgrid;
|
||||||
} DeviceConfiguration;
|
} DeviceConfiguration;
|
||||||
|
|
||||||
/** */
|
/**
|
||||||
|
Initializes all devices on the current node.
|
||||||
|
|
||||||
|
Devices on the node are configured based on the contents of AcMesh.
|
||||||
|
|
||||||
|
@return Exit status. Places the newly created handle in the output parameter.
|
||||||
|
@see AcMeshInfo
|
||||||
|
|
||||||
|
|
||||||
|
Usage example:
|
||||||
|
@code
|
||||||
|
AcMeshInfo info;
|
||||||
|
acLoadConfig(AC_DEFAULT_CONFIG, &info);
|
||||||
|
|
||||||
|
Node node;
|
||||||
|
acNodeCreate(0, info, &node);
|
||||||
|
acNodeDestroy(node);
|
||||||
|
@endcode
|
||||||
|
*/
|
||||||
AcResult acNodeCreate(const int id, const AcMeshInfo node_config, Node* node);
|
AcResult acNodeCreate(const int id, const AcMeshInfo node_config, Node* node);
|
||||||
|
|
||||||
/** */
|
/**
|
||||||
|
Resets all devices on the current node.
|
||||||
|
|
||||||
|
@see acNodeCreate()
|
||||||
|
*/
|
||||||
AcResult acNodeDestroy(Node node);
|
AcResult acNodeDestroy(Node node);
|
||||||
|
|
||||||
/** */
|
/**
|
||||||
|
Prints information about the devices available on the current node.
|
||||||
|
|
||||||
|
Requires that Node has been initialized with acNodeCreate().
|
||||||
|
@see acNodeCreate()
|
||||||
|
*/
|
||||||
AcResult acNodePrintInfo(const Node node);
|
AcResult acNodePrintInfo(const Node node);
|
||||||
|
|
||||||
/** */
|
/**
|
||||||
|
|
||||||
|
|
||||||
|
|
||||||
|
@see DeviceConfiguration
|
||||||
|
*/
|
||||||
AcResult acNodeQueryDeviceConfiguration(const Node node, DeviceConfiguration* config);
|
AcResult acNodeQueryDeviceConfiguration(const Node node, DeviceConfiguration* config);
|
||||||
|
|
||||||
/** */
|
/** */
|
||||||
|