diff --git a/content/publication/20220304_pearson_pdsec/index.md b/content/publication/20220304_pearson_pdsec/index.md new file mode 100644 index 0000000..f4f5abb --- /dev/null +++ b/content/publication/20220304_pearson_pdsec/index.md @@ -0,0 +1,26 @@ ++++ +title = "[IPDPSw] Machine Learning for CUDA+MPI Design Rules" +date = 2022-03-04T00:00:00 # Schedule page publish date. +draft = false + +math = false + +tags = ["CUDA", "mpi"] ++++ + +**Carl Pearson, Aurya Javeed, Karen Devine** + +To be presented in *23rd IEEE International Workshop on Parallel and Distributed Scientific and Engineering Computing (PDSEC)* + +We present a new strategy for automatically exploring the design space of key CUDA+MPI programs and providing design rules that discriminate slow from fast implementations. +In such programs, the order of operations (e.g., GPU kernels, MPI communication) and assignment of operations to resources (e.g., GPU streams) makes the space of possible designs enormous. +Systems experts have the task of redesigning and reoptimizing these programs to effectively utilize each new platform. +This work provides a prototype tool to reduce that burden. + +In our approach, a directed acyclic graph of CUDA and MPI operations defines the design space for the program. +Monte-Carlo tree search discovers regions of the design space that have large impact on the program's performance. +A sequence-to-vector transformation defines features for each explored implementation, and each implementation is assigned a class label according to its relative performance. +A decision tree is trained on the features and labels to produce design rules for each class; these rules can be used by systems experts to guide their implementations. +We demonstrate our strategy using a key kernel from scientific computing --- sparse-matrix vector multiplication --- on a platform with multiple MPI ranks and GPU streams. + +* [arxiv](https://arxiv.org/abs/2012.14363)