
+++
title = "Node-Aware Stencil Communication on Heterogeneous Supercomputers"
date = 2020-03-09T00:00:00  # Schedule page publish date.
draft = false

# Authors. Comma separated list, e.g. `["Bob Smith", "David Jones"]`.
authors = ["Carl Pearson", "Mert Hidayetoglu", "Mohammad Almasri", "Omer Anjum", "I-Hsin Chung", "Jinjun Xiong", "Wen-Mei Hwu"]

# Publication type.
# Legend:
# 0 = Uncategorized
# 1 = Conference paper
# 2 = Journal article
# 3 = Manuscript
# 4 = Report
# 5 = Book
# 6 = Book section
publication_types = ["1"]

# Publication name and optional abbreviated version.
publication = "2020 IEEE International Workshop on Automatic Performance Tuning"
publication_short = "In iWAPT'20"

# Does this page contain LaTeX math? (true/false)
math = false

# Does this page require source code highlighting? (true/false)
highlight = false

# Featured image thumbnail (optional)
image_preview = ""

# Is this a selected publication? (true/false)
selected = true

# Projects (optional).
#   Associate this publication with one or more of your projects.
#   Simply enter your project's folder or file name without extension.
#   E.g. `projects = ["deep-learning"]` references
#   `content/project/deep-learning/index.md`.
#   Otherwise, set `projects = []`.
projects = ["stencil_library"]

# Links (optional)
url_pdf = "pdf/20200522_pearson_iwapt.pdf"
url_preprint = ""
url_code = "https://github.com/cwpearson/stencil"
url_dataset = ""
url_project = ""
url_slides = "pdf/20200522_pearson_iwapt_slides.pdf"
url_video = ""
url_poster = ""
url_source = ""

# Featured image
# To use, add an image named `featured.jpg/png` to your page's folder.
[image]
  # Caption (optional)
  caption = ""

  # Focal point (optional)
  # Options: Smart, Center, TopLeft, Top, TopRight, Left, Right, BottomLeft, Bottom, BottomRight
  focal_point = ""
+++

High-performance distributed computing systems increasingly feature nodes with multiple CPU sockets and multiple GPUs. The communication bandwidth between these components is non-uniform, and these systems may also expose different communication capabilities between these components. For communication-heavy applications, making optimal use of these capabilities is both challenging and essential for performance. Bespoke codes with optimized communication may not be portable across run-time, software, and hardware configurations, and existing stencil frameworks neglect communication optimization. This work presents node-aware approaches to automatic data placement and communication implementation for 3D stencil codes on multi-GPU nodes with non-homogeneous communication performance and capabilities. Benchmarking results on the Summit system show that data placement choices can yield a 20% improvement in single-node exchange time, and that communication specialization can yield a further 6x improvement in exchange time within a single node and a 16% improvement at 1536 GPUs.
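The abstract does not spell out the placement algorithm. As a rough illustration of why placement matters when links are non-uniform, the Python sketch below scores every assignment of 3D stencil subdomains to GPUs by the halo bytes each pair exchanges divided by an assumed link bandwidth, and keeps the cheapest assignment. Everything here is a hypothetical toy model, not the method or code from the paper or the `stencil` repository: the 6-GPU, two-socket node layout, the bandwidth numbers, the "sum of transfer times" cost, and all function names are assumptions for illustration.

```python
import itertools

# The 26 neighbor offsets of a subdomain in a 3D grid (faces, edges, corners).
DIRECTIONS = [d for d in itertools.product((-1, 0, 1), repeat=3) if d != (0, 0, 0)]


def halo_bytes(sub_dims, radius, elem_size, direction):
    """Bytes a subdomain sends to its neighbor in `direction`.

    The halo is `radius` cells thick along every axis where the offset is
    nonzero and spans the full subdomain extent along the other axes.
    """
    cells = 1
    for offset, extent in zip(direction, sub_dims):
        cells *= radius if offset != 0 else extent
    return cells * elem_size


def comm_matrix(grid, sub_dims, radius, elem_size):
    """comm[i][j] = bytes subdomain i sends to subdomain j (non-periodic grid)."""
    coords = list(itertools.product(*(range(g) for g in grid)))
    index = {c: i for i, c in enumerate(coords)}
    comm = [[0] * len(coords) for _ in coords]
    for c in coords:
        for d in DIRECTIONS:
            nbr = tuple(ci + di for ci, di in zip(c, d))
            if nbr in index:
                comm[index[c]][index[nbr]] += halo_bytes(sub_dims, radius, elem_size, d)
    return comm


def exchange_cost(comm, bw, placement):
    """Relative exchange time if subdomain i runs on GPU placement[i].

    Toy simplification: per-pair transfer times are summed, ignoring
    overlap between transfers and link contention.
    """
    n = len(comm)
    return sum(comm[i][j] / bw[placement[i]][placement[j]]
               for i in range(n) for j in range(n) if comm[i][j])


def best_placement(comm, bw):
    """Exhaustive search over all subdomain-to-GPU mappings (OK for ~6 GPUs)."""
    return min(itertools.permutations(range(len(comm))),
               key=lambda p: exchange_cost(comm, bw, p))


# Hypothetical node: 6 GPUs, GPUs 0-2 on socket 0 and GPUs 3-5 on socket 1.
# Bandwidths (GB/s) are illustrative placeholders, not measurements.
SAME_SOCKET, CROSS_SOCKET = 50.0, 20.0
BW = [[SAME_SOCKET if a // 3 == b // 3 else CROSS_SOCKET for b in range(6)]
      for a in range(6)]

if __name__ == "__main__":
    # 1 x 2 x 3 grid of 512^3 subdomains, radius-2 stencil, 8-byte values.
    comm = comm_matrix(grid=(1, 2, 3), sub_dims=(512, 512, 512), radius=2, elem_size=8)
    naive = tuple(range(6))  # subdomain i on GPU i
    best = best_placement(comm, BW)
    print("naive:", exchange_cost(comm, BW, naive))
    print("best: ", best, exchange_cost(comm, BW, best))
```

In this toy model, the naive "subdomain i on GPU i" mapping puts one whole row of the 2x3 subdomain grid on each socket, so four edge halos cross the slower link. The search finds an L-shaped split that crosses the same number of face halos but only two edge halos. Exhaustive search grows factorially with the number of GPUs, so it is only practical for a single node's worth of devices.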