PDSEC
26
content/publication/20220304_pearson_pdsec/index.md
Normal file
@@ -0,0 +1,26 @@
+++
title = "[IPDPSw] Machine Learning for CUDA+MPI Design Rules"
date = 2022-03-04T00:00:00 # Schedule page publish date.
draft = false

math = false

tags = ["CUDA", "mpi"]
+++

**Carl Pearson, Aurya Javeed, Karen Devine**

To be presented at the *23rd IEEE International Workshop on Parallel and Distributed Scientific and Engineering Computing (PDSEC)*

We present a new strategy for automatically exploring the design space of key CUDA+MPI programs and providing design rules that discriminate slow from fast implementations.
In such programs, the order of operations (e.g., GPU kernels, MPI communication) and the assignment of operations to resources (e.g., GPU streams) make the space of possible designs enormous.
Systems experts have the task of redesigning and reoptimizing these programs to effectively utilize each new platform.
This work provides a prototype tool to reduce that burden.

In our approach, a directed acyclic graph of CUDA and MPI operations defines the design space for the program.
Monte Carlo tree search discovers regions of the design space that have a large impact on the program's performance.
A sequence-to-vector transformation defines features for each explored implementation, and each implementation is assigned a class label according to its relative performance.
A decision tree is trained on the features and labels to produce design rules for each class; these rules can be used by systems experts to guide their implementations.
We demonstrate our strategy using a key kernel from scientific computing --- sparse matrix-vector multiplication --- on a platform with multiple MPI ranks and GPU streams.
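
The pipeline above can be illustrated with a small sketch. It is not the tool described in the paper: it assumes a hypothetical five-operation graph, enumerates every valid ordering exhaustively instead of using Monte Carlo tree search, replaces measured run times with a made-up cost function (`toy_cost`), and fits only a single-split decision stump rather than a full decision tree. All operation names and helper functions are illustrative.

```python
from itertools import permutations

# Hypothetical toy DAG (an assumption): operation -> operations that must run first.
DEPS = {
    "pack": set(),
    "send": {"pack"},
    "local_spmv": set(),
    "recv": set(),
    "remote_spmv": {"recv"},
}
OPS = sorted(DEPS)


def valid_orders(deps):
    """Yield every ordering of the operations that respects the DAG."""
    for order in permutations(sorted(deps)):
        pos = {op: i for i, op in enumerate(order)}
        if all(pos[d] < pos[op] for op in deps for d in deps[op]):
            yield order


def features(order):
    """Sequence-to-vector step: one boolean per pair, 'a runs before b'."""
    pos = {op: i for i, op in enumerate(order)}
    return {f"{a} before {b}": pos[a] < pos[b]
            for i, a in enumerate(OPS) for b in OPS[i + 1:]}


def toy_cost(order):
    """Made-up cost model, not a measurement: delaying the send is expensive."""
    pos = {op: i for i, op in enumerate(order)}
    penalty = 5.0 if pos["local_spmv"] < pos["send"] else 0.0
    return 10.0 + penalty + 0.1 * pos["recv"]


def gini(labels):
    """Gini impurity of a list of 'fast'/'slow' labels."""
    if not labels:
        return 0.0
    p = labels.count("fast") / len(labels)
    return 2.0 * p * (1.0 - p)


def best_rule(rows, labels):
    """Return (weighted impurity, feature) for the best single-feature split."""
    best = None
    for name in rows[0]:
        yes = [l for r, l in zip(rows, labels) if r[name]]
        no = [l for r, l in zip(rows, labels) if not r[name]]
        score = (len(yes) * gini(yes) + len(no) * gini(no)) / len(labels)
        if best is None or score < best[0]:
            best = (score, name)
    return best


orders = list(valid_orders(DEPS))
costs = [toy_cost(o) for o in orders]
threshold = (min(costs) + max(costs)) / 2  # class boundary between fast and slow
labels = ["fast" if c < threshold else "slow" for c in costs]
rows = [features(o) for o in orders]
score, rule = best_rule(rows, labels)
print(f"{len(orders)} orderings; most discriminating feature: {rule!r}")
```

Because the toy cost depends mainly on whether `send` precedes `local_spmv`, the stump selects the feature comparing those two operations. With real measurements, a deeper tree over many such features would be needed to recover rules of the kind the abstract describes.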

* [arXiv](https://arxiv.org/abs/2012.14363)