Astaroth DSL compiler

Dependencies

Debian/Ubuntu

apt install flex bison build-essential

Usage

  • ./build_acc.sh # Builds the ASPL compiler (acc)
  • ./compile.sh <.sps or .sas source> # Compiles the given stage into CUDA
  • ./test.sh # Tries to compile the sample stages
  • ./clean.sh # Removes directories generated by build_acc.sh and test.sh

Example

  • ./compile.sh src/stencil_assembly.sas # Generates stencil_assembly.cuh
  • ./compile.sh src/stencil_process.sps # Generates stencil_process.cuh

What happens under the hood

The compiler consists of a scanner (flex), a parser (bison), an abstract syntax tree (AST) implementation, and a code generator. The language is defined by the tokens and grammar rules found in acc.l and acc.y. These files are given as input to flex and bison, which generate the scanning and parsing stages of the compiler. The AST built by the parser is defined in ast.h. Finally, the code generator traverses the AST and emits CUDA code.
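
As a rough illustration of the last two pieces, the sketch below shows one way an AST node and a recursive code generator could fit together. The node kinds and field names here are invented for the example; the actual definitions live in ast.h and code_generator.c.

    #include <stdio.h>

    /* Illustrative AST node: node kinds and fields are assumptions,
     * not the definitions from ast.h. */
    typedef enum { NODE_NUMBER, NODE_BINARY_OP } NodeType;

    typedef struct AstNode {
        NodeType type;
        double value;               /* used by NODE_NUMBER */
        char op;                    /* used by NODE_BINARY_OP: '+', '*', ... */
        struct AstNode *lhs, *rhs;  /* children of NODE_BINARY_OP */
    } AstNode;

    /* Traverse the AST recursively and print an equivalent CUDA/C expression. */
    static void generate(const AstNode *node, FILE *out)
    {
        switch (node->type) {
        case NODE_NUMBER:
            fprintf(out, "%g", node->value);
            break;
        case NODE_BINARY_OP:
            fprintf(out, "(");
            generate(node->lhs, out);
            fprintf(out, " %c ", node->op);
            generate(node->rhs, out);
            fprintf(out, ")");
            break;
        }
    }

    int main(void)
    {
        AstNode two   = { NODE_NUMBER, 2.0, 0, NULL, NULL };
        AstNode three = { NODE_NUMBER, 3.0, 0, NULL, NULL };
        AstNode sum   = { NODE_BINARY_OP, 0.0, '+', &two, &three };
        generate(&sum, stdout);  /* prints: (2 + 3) */
        printf("\n");
        return 0;
    }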

ACC compilation stages

In short:

  • Preprocess .ac
  • Compile preprocessed .ac to .cuh
  • Compile .cuh

More detailed:

  1. A parser is generated: bison --verbose -d acc.y
  2. A scanner is generated: flex acc.l
  3. The compiler is built: gcc -std=gnu11 code_generator.c acc.tab.c lex.yy.c -lfl
  4. Source files (.sps and .sas) are preprocessed with the GCC preprocessor and then cleaned of any residual directives (see the sketch after this list). Such directives would be useful if the code were compiled further with GCC, but they are not needed when compiling with ACC and are not recognized by our grammar.
  5. Either the stencil processing stage (.sps) or the stencil assembly stage (.sas) is generated by passing the preprocessed file to acc, which emits the final CUDA code.
  6. Compilation is continued with the NVIDIA CUDA compiler (nvcc)
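
As a concrete illustration of step 4, the minimal sketch below (an assumption about the cleanup, not the actual implementation) filters out the linemarkers that gcc -E leaves behind:

    #include <stdio.h>

    /* Drop preprocessor linemarkers such as
     *     # 1 "src/stencil_process.sps"
     * which gcc -E emits but which our grammar does not recognize. */
    int main(void)
    {
        char line[4096];
        while (fgets(line, sizeof line, stdin)) {
            if (line[0] == '#')
                continue;  /* residual directive: skip it */
            fputs(line, stdout);
        }
        return 0;
    }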

Even more detailed:

The NVIDIA CUDA compiler compiles .cuh to .fatbin, which is embedded into a C++ binary containing the host code of the program. A fatbin contains .cubin files, which hold the configuration of the GPU and the kernels in streaming assembly code (.sass). We could also compile for a virtual architecture (.ptx) instead of the actual hardware-specific machine code (.cubin) by passing the -code=compute_XX flag to nvcc; the CUDA sources would then be compiled at runtime (just-in-time compilation, JIT) when the CUDA context is created. However, we always know which architecture we want to run the code on, and JIT compilation would only increase the time it takes to launch the program.

nvcc -DAC_DOUBLE_PRECISION=1 -ptx --relocatable-device-code true -O3 -std=c++11 --maxrregcount=255 -ftz=true -gencode arch=compute_60,code=sm_60 device.cu -I ../../include -I ../../
nvcc -DAC_DOUBLE_PRECISION=1 -cubin --relocatable-device-code true -O3 -std=c++11 --maxrregcount=255 -ftz=true -gencode arch=compute_60,code=sm_60 device.cu -I ../../include -I ../../
cuobjdump --dump-sass device.cubin > device.sass
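
For comparison, the JIT path described above corresponds roughly to loading the module at runtime with the CUDA driver API, as in the sketch below. The module and kernel names are placeholders; loading device.ptx triggers JIT compilation for the current GPU, while loading device.cubin would not.

    #include <stdio.h>
    #include <cuda.h>

    /* Build with, e.g.: gcc jit_load.c -I/usr/local/cuda/include -lcuda */
    int main(void)
    {
        CUdevice dev;
        CUcontext ctx;
        CUmodule mod;
        CUfunction fn;

        cuInit(0);
        cuDeviceGet(&dev, 0);
        cuCtxCreate(&ctx, 0, dev);

        /* PTX is JIT-compiled on load; a cubin would be loaded as-is. */
        if (cuModuleLoad(&mod, "device.ptx") != CUDA_SUCCESS) {
            fprintf(stderr, "failed to load module\n");
            return 1;
        }
        cuModuleGetFunction(&fn, mod, "my_kernel");  /* placeholder name */
        /* ... set up arguments and launch with cuLaunchKernel ... */

        cuModuleUnload(mod);
        cuCtxDestroy(ctx);
        return 0;
    }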