Rewrote acc/README.md
@@ -1,45 +1,36 @@
|
|||||||
Astaroth DSL compiler
|
# ACC - Astaroth Code Compiler
|
||||||
============
|
|
||||||
|
|
||||||
# Dependencies
|
ACC is a source-to-source compiler that generates CUDA kernels from programs written in Astaroth Code (AC). This document focuses on how to build and run the compiler. For a detailed description of the code generation and compilation phases, see [J. Pekkilä, Astaroth: A Library for Stencil Computations on Graphics Processing Units. 2019.](http://urn.fi/URN:NBN:fi:aalto-201906233993), Section 4.3. For a detailed description of AC syntax, see the [Specification](doc/Astaroth_API_specification_and_user_manual/API_specification_and_user_manual.md).
|
||||||
## Debian/Ubuntu
|
|
||||||
`apt install flex bison build-essential`
|
|
||||||
|
|
||||||
# Usage
|
ACC is automatically compiled and invoked when compiling the Astaroth Library; user intervention is not needed. The instructions in this file are only for developers looking to debug the AC compiler.
|
||||||
* `./build_acc.sh # Builds the ASPL compiler (acc)`
|
|
||||||
* `./compile.sh <.sps or .sas source> # Compiles the given stage into CUDA`
|
|
||||||
* `./test.sh # Tries to compile the sample stages`
|
|
||||||
* `./clean.sh # Removes directories generated by build_acc.sh and test.sh`
|
|
||||||
|
|
||||||
## Example
|
## Dependencies
|
||||||
|
|
||||||
- `./compile.sh src/stencil_assembly.sas # Generates stencil_assembly.cuh`
|
`gcc flex bison`
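
Before building, it can help to confirm that the tools named above are on the `PATH`. The following is a minimal sketch, not part of the ACC scripts; the helper name `missing_tools` is hypothetical and introduced only for this example.

```shell
# Sketch only: missing_tools is a hypothetical helper, not shipped with ACC.
# It prints the name of every tool in its argument list that cannot be found
# on the PATH, one per line, and prints nothing when all of them are present.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

# Example call (commented out so that sourcing this file has no side effects):
# missing_tools gcc flex bison
```

An empty result suggests the listed dependencies are installed; any printed name would need to be installed through the system's package manager.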
|
||||||
- `./compile.sh src/stencil_process.sps # Generates stencil_process.cuh`
|
|
||||||
|
|
||||||
# What happens under the hood
|
## Building
|
||||||
|
|
||||||
The compiler consists of a scanner (flex), a parser (bison), an implementation of the abstract syntax tree (AST), and a code generator.
|
1. `mkdir build`
|
||||||
The language is defined by tokens and grammars found in acc.l and acc.y. These files are given as input to flex and bison, which generate the scanning and parsing stages for the compiler. The resulting AST is defined in ast.h. Finally, we traverse the generated AST with our code generator, generating CUDA code.
|
2. `cd build`
|
||||||
|
3. `cmake ..`
|
||||||
|
4. `make -j`
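
The four build steps can also be wrapped in one function, which is convenient when rebuilding often while debugging. This is a sketch under assumptions: the function name `build_acc_out_of_source` is hypothetical, and its argument is assumed to be the directory that holds the CMake project.

```shell
# Sketch only: build_acc_out_of_source is a hypothetical wrapper around the
# steps listed above (mkdir build, cd build, cmake .., make -j).
# $1 is assumed to be the directory containing the CMake project (default: .).
build_acc_out_of_source() {
    # Run in a subshell so the caller's working directory is left unchanged.
    (
        src_dir="${1:-.}"
        mkdir -p "$src_dir/build" &&
        cd "$src_dir/build" &&
        cmake .. &&
        make -j
    )
}
```

The `&&` chain stops at the first failing step, so the function returns a non-zero status if configuration or compilation fails.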
|
||||||
|
|
||||||
## ACC compilation stages
|
## Usage
|
||||||
|
|
||||||
### In short:
|
The script `compile_acc_module.sh` executes all compilation stages from preprocessing to linking the AC standard libraries. The resulting CUDA headers are placed in the current working directory. The script should be invoked as follows:
|
||||||
* Preprocess .ac
|
|
||||||
* Compile preprocessed .ac to .cuh
|
|
||||||
* Compile .cuh
|
|
||||||
|
|
||||||
### More detailed:
|
> `./compile_acc_module.sh <a directory containing AC files>`
|
||||||
0. A Parser is generated: bison --verbose -d acc.y
|
|
||||||
0. A Scanner is generated: flex acc.l
|
|
||||||
0. The compiler is built: gcc -std=gnu11 code_generator.c acc.tab.c lex.yy.c -lfl
|
|
||||||
0. Source files (.sps and .sas) are preprocessed using the GCC preprocessor and cleaned of any residual directives, which would be useful only when compiling the code further with GCC. ACC does not need these directives, and our grammar does not recognize them.
|
|
||||||
0. Either the stencil processing stage (.sps) or the stencil assembly stage (.sas) is generated by passing the preprocessed file to acc. This emits the final CUDA code.
|
|
||||||
0. Compilation is continued with the NVIDIA CUDA compiler
|
|
||||||
|
|
||||||
### Even more detailed:
|
For preprocessing only, see `preprocess.sh`. The first parameter is regarded as the AC source file, while the rest of the parameters are passed to gcc. For example:
|
||||||
The NVIDIA CUDA compiler compiles .cuh to .fatbin, which is embedded into a C++ binary containing the host code of the program. A fatbin contains .cubin files, which contain the configuration of the GPU and the kernels in streaming assembly code (.sass). We could also compile for a virtual architecture (.ptx) instead of the actual hardware-specific machine code (.cubin) by passing the -code=compute_XX flag to nvcc, which would compile the CUDA sources at runtime (just-in-time compilation, JIT) when creating the CUDA context. However, we always know which architecture we want to run the code on, and JIT compilation would just increase the time it takes to launch the program.
|
|
||||||
|
|
||||||
nvcc -DAC_DOUBLE_PRECISION=1 -ptx --relocatable-device-code true -O3 -std=c++11 --maxrregcount=255 -ftz=true -gencode arch=compute_60,code=sm_60 device.cu -I ../../include -I ../../
|
> `./preprocess.sh file.ac -I dir`
|
||||||
nvcc -DAC_DOUBLE_PRECISION=1 -cubin --relocatable-device-code true -O3 -std=c++11 --maxrregcount=255 -ftz=true -gencode arch=compute_60,code=sm_60 device.cu -I ../../include -I ../../
|
|
||||||
cuobjdump --dump-sass device.cubin > device.sass
|
Preprocesses `file.ac` and searches `dir` for files to be included.
|
||||||
|
|
||||||
|
To invoke the code generator, pass preprocessed files that respect AC syntax to `acc`.
|
||||||
|
|
||||||
|
For example:
|
||||||
|
> `acc < file.ac.preprocessed`
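
When several preprocessed files need to be checked, the invocation above can be repeated in a loop. The following is a minimal sketch: the function `run_acc_on_dir` and the variable `ACC` are hypothetical names introduced for this example, and the `.ac.preprocessed` suffix is assumed from the file name shown above. Nothing is assumed about where `acc` writes its output.

```shell
# Sketch only: run_acc_on_dir and ACC are hypothetical names for this example.
# ACC holds the path to the acc binary and defaults to "acc" on the PATH.
ACC="${ACC:-acc}"

# Feed every *.ac.preprocessed file in directory $1 to acc on standard input,
# one file at a time, stopping at the first file that acc rejects.
run_acc_on_dir() {
    dir="$1"
    for f in "$dir"/*.ac.preprocessed; do
        # Skip the unexpanded pattern when the directory has no matches.
        [ -e "$f" ] || continue
        echo "acc < $f" >&2
        "$ACC" < "$f" || return 1
    done
}
```

Calling `run_acc_on_dir some_dir` then behaves like running `acc < file.ac.preprocessed` once per file, which makes it easier to see which input triggers a parse error.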
|
||||||
|
|
||||||
|
See [Building](#markdown-header-building) for how to obtain `acc`.
|
||||||
|
|
||||||
|