Rewrote acc/README.md

jpekkila
2020-01-14 21:37:56 +02:00
parent 25180c00b3
commit 8dbeb9b654


@@ -1,45 +1,36 @@
Astaroth DSL compiler
============
# ACC - Astaroth Code Compiler
# Dependencies
## Debian/Ubuntu
`apt install flex bison build-essential`
ACC is a source-to-source compiler that generates CUDA kernels from programs written in Astaroth Code (AC). This document focuses on how to build and run the compiler. For a detailed description of the code generation and compilation phases, we refer the reader to [J. Pekkilä, Astaroth: A Library for Stencil Computations on Graphics Processing Units. 2019.](http://urn.fi/URN:NBN:fi:aalto-201906233993), Section 4.3. For a detailed description of the AC syntax, see the [Specification](doc/Astaroth_API_specification_and_user_manual/API_specification_and_user_manual.md).
# Usage
* `./build_acc.sh # Builds the ASPL compiler (acc)`
* `./compile.sh <.sps or .sas source> # Compiles the given stage into CUDA`
* `./test.sh # Tries to compile the sample stages`
* `./clean.sh # Removes directories generated by build_acc.sh and test.sh`
ACC is compiled and invoked automatically when the Astaroth library is built, so no user intervention is needed. The instructions in this file are intended only for developers who want to debug the AC compiler.
## Example
## Dependencies
- `./compile.sh src/stencil_assembly.sas # Generates stencil_assembly.cuh`
- `./compile.sh src/stencil_process.sps # Generates stencil_process.cuh`
`gcc flex bison`
# What happens under the hood
## Building
The compiler is made of a scanner (flex), parser (bison), implementation of the abstract syntax tree (AST) and a code generator.
The language is defined by the tokens and grammar rules in acc.l and acc.y. These files are given as input to flex and bison, which generate the scanning and parsing stages of the compiler. The resulting AST is defined in ast.h. Finally, the code generator traverses the AST and emits CUDA code.
1. `mkdir build`
2. `cd build`
3. `cmake ..`
4. `make -j`
## ACC compilation stages
## Usage
### In short:
* Preprocess .ac
* Compile preprocessed .ac to .cuh
* Compile .cuh
The script `compile_acc_module.sh` executes all compilation stages, from preprocessing to linking the AC standard libraries. The resulting CUDA headers are placed in the current working directory. The script should be invoked as follows.
### More detailed:
0. A Parser is generated: bison --verbose -d acc.y
0. A Scanner is generated: flex acc.l
0. The compiler is built: gcc -std=gnu11 code_generator.c acc.tab.c lex.yy.c -lfl
0. Source files (.sps and .sas) are preprocessed with the GCC preprocessor and cleaned of any residual directives. Those directives would be useful when compiling the code further with GCC, but ACC does not need them and our grammar does not recognize them.
0. Either the stencil processing stage (.sps) or the stencil assembly stage (.sas) is generated by passing the preprocessed file to acc. This emits the final CUDA code.
0. Compilation continues with the NVIDIA CUDA compiler.
> `./compile_acc_module.sh <a directory containing AC files>`
### Even more detailed:
The NVIDIA CUDA compiler compiles .cuh to .fatbin, which is embedded into a C++ binary containing the host code of the program. A fatbin contains .cubin files, which hold the configuration of the GPU and the kernels in streaming assembly code (.sass). We could also compile for a virtual architecture (.ptx) instead of the actual hardware-specific machine code (.cubin) by passing the -code=compute_XX flag to nvcc. In that case, the PTX would be compiled to machine code at runtime (just-in-time compilation, JIT) when the CUDA context is created. However, we always know which architecture we want to run the code on, and JIT compilation would only increase the time it takes to launch the program.
For preprocessing only, see `preprocess.sh`. The first parameter is treated as the AC source file, while the rest of the parameters are passed to gcc. For example:
nvcc -DAC_DOUBLE_PRECISION=1 -ptx --relocatable-device-code true -O3 -std=c++11 --maxrregcount=255 -ftz=true -gencode arch=compute_60,code=sm_60 device.cu -I ../../include -I ../../
nvcc -DAC_DOUBLE_PRECISION=1 -cubin --relocatable-device-code true -O3 -std=c++11 --maxrregcount=255 -ftz=true -gencode arch=compute_60,code=sm_60 device.cu -I ../../include -I ../../
cuobjdump --dump-sass device.cubin > device.sass
> `./preprocess.sh file.ac -I dir`
Preprocesses `file.ac` and searches `dir` for files to be included.
To invoke the code generator, pass preprocessed files that conform to the AC syntax to `acc` through standard input.
For example:
> `acc < file.ac.preprocessed`
See [Building](#markdown-header-building) on how to obtain `acc`.