% IEEE Paper Template for A4 Page Size (V1)
% Sample Conference Paper using IEEE LaTeX style file for A4 pagesize.
% Copyright (C) 2006 Causal Productions Pty Ltd.
% Permission is granted to distribute and revise this file provided that
% this header remains intact.
%
%
% This style file is produced for CEM'17 Computational Electromagnetics Workshop
% Modified from a file indicated above.

\documentclass[10pt,conference,a4paper]{IEEEtran}
\usepackage{times,amsmath,epsfig}
\usepackage{makecell}

\title{Solving Problems Involving Inhomogeneous Media with MLFMA on GPU Clusters}
\author{
{Carl Pearson{\small $^{1}$}, Mert Hidayetoglu{\small $^{1}$}, and
Wen-Mei Hwu{\small $^{1}$} }
\vspace{1.6mm}\\
\fontsize{10}{10}\selectfont\itshape
$~^{1}$University of Illinois Urbana-Champaign Electrical and Computer Engineering, Urbana, 61801, USA\\
$~^{2}$Second Affiliation, City, Postal Code, Country\\
\fontsize{9}{9}\upshape \texttt{\{pearson, hidayet2, w-hwu\}}@illinois.edu}
\begin{document}
\maketitle

\begin{abstract}
The multilevel fast multipole method (MLFMM) is a key tool for efficiently solving large scattering problems.
Highly inhomogeneous media prevent converting the problem into a surface-scattering problem via the equivalence principle, so we instead solve the corresponding volume integral equation.
We evaluate an efficient implementation of MLFMM for such two-dimensional volumetric scattering problems on high-performance GPU-accelerated supercomputing nodes.
This class of problems is commonly encountered in imaging and inverse-scattering applications.
\end{abstract}

\section{Introduction}

To achieve an efficient implementation on multiple graphics processing units (GPUs), we formulate the MLFMM operations as matrix-matrix multiplications, where the large matrices are partitioned among message passing interface (MPI) processes. Each process uses a single GPU to perform its partial multiplications, and the implementation can employ up to 16 GPUs. During the MLFMM multiplications, the GPUs exchange the required data through MPI. These communications are costly because they move data from GPUs to central processing units (CPUs), from CPUs to CPUs (as in the traditional CPU implementation), and then from CPUs back to GPUs. To reduce this cost, we minimize the amount of data transferred and merge small MPI buffers into large ones. Furthermore, we reorder the MLFMM operations so that communication overlaps with GPU computation. This strategy completely hides the communication overhead and yields an MPI parallelization efficiency of 94\%.
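
For reference, the efficiency quoted above can be read against the conventional strong-scaling definition. This is a sketch under stated assumptions: the symbols $T_1$, $T_p$, $T_{\mathrm{comp}}$, and $T_{\mathrm{comm}}$ are introduced here for illustration only, and the paper does not state which definition underlies the 94\% figure. With $T_1$ the wall time on one GPU and $T_p$ the wall time on $p$ GPUs,
\begin{equation}
E_p = \frac{T_1}{p\,T_p},
\end{equation}
so $E_p = 1$ corresponds to ideal scaling. Likewise, if one MLFMM stage needs $T_{\mathrm{comp}}$ of GPU computation and $T_{\mathrm{comm}} = T_{\mathrm{D2H}} + T_{\mathrm{MPI}} + T_{\mathrm{H2D}}$ of GPU-to-CPU, CPU-to-CPU, and CPU-to-GPU transfer, a simple model of the stage time is
\begin{equation}
T_{\mathrm{serial}} = T_{\mathrm{comp}} + T_{\mathrm{comm}}, \qquad
T_{\mathrm{overlap}} = \max\left(T_{\mathrm{comp}},\, T_{\mathrm{comm}}\right),
\end{equation}
so communication is fully hidden in this model whenever $T_{\mathrm{comm}} \le T_{\mathrm{comp}}$.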

\section{Inverse-Scattering Formulation and Application Architecture}
\label{sec:application}

Table~\ref{tab:components} shows the breakdown of application component execution times on the Blue Waters supercomputer.

\begin{table}{}
\centering \caption{Breakdown of Application Component Time} \label{tab:components}
\begin{tabular}{|c|c|}
\hline \textbf{Component} & \textbf{Wall Time (s)} \\
\hline
\hline Preprocessing & 0 \\
\hline Setup & 0 \\
\hline Solution & 0 \\
\hline Matvec & 0 \\
\hline Solver & 0 \\
\hline Postprocessing & 0 \\
\hline Other & 0 \\
\hline Total & 0 \\
\hline
\end{tabular}
\end{table}

\section{MLFMM Results}

As described in Section~\ref{sec:application} and shown in Table~\ref{tab:components}, the MLFMM realization of matrix-vector multiplications forms the core computational kernel of the application, and its performance dominates that of the full inverse solver.
This section analyzes the performance of the MLFMM algorithm in three different environments.

\subsection{Evaluation Environments}

\begin{table}{}
\centering \caption{Evaluation Systems} \label{tab:systems}
\begin{tabular}{|c|c|c|c|}
\hline & \textbf{XK Node} & \textbf{XE Node} & \textbf{S822LC} \\
\hline
\hline \textbf{CPU 1} & AMD Opteron 6276 & AMD Opteron 6276 & IBM Power8 \\
\hline \textbf{CPU 2} & -- & AMD Opteron 6276 & IBM Power8 \\
\hline
\hline \textbf{GPU 1} & \makecell{K20X \\ (6 GB RAM) } & -- & P100 (16 GB RAM) \\
\hline \textbf{GPU 2} & -- & -- & P100 (16 GB RAM) \\
\hline \textbf{GPU 3} & -- & -- & P100 (16 GB RAM) \\
\hline \textbf{GPU 4} & -- & -- & P100 (16 GB RAM) \\
\hline \textbf{RAM} & 32 GB & 64 GB & 512 GB \\
\hline \makecell{\textbf{CPU-GPU} \\ \textbf{Bus}} & PCIe & -- & NVLink \\
\hline
\end{tabular}
\end{table}

The performance of MLFMM is evaluated in three different computing environments: Blue Waters XE nodes, Blue Waters XK nodes, and an IBM S822LC, summarized in Table~\ref{tab:systems}.
The Blue Waters XE and XK nodes are two kinds of compute nodes available on the Blue Waters supercomputer.
Each Blue Waters node is a two-socket system: the XE node has two AMD Opteron 6276 CPUs, each with eight floating-point units, hardware support for 16 concurrently executing threads, and $32$~GB of RAM.
The XK node replaces one of these CPUs with an NVIDIA K20X GPU, which uses the Kepler architecture and has $6$~GB of RAM.
The K20X is connected to the Opteron 6276 via PCIe.
These XE and XK nodes are representative of the compute capabilities of current-generation clusters and supercomputers.
The IBM S822LC represents a next-generation accelerator-heavy supercomputing node.
It has two IBM Power8 CPUs, each with ten floating-point units, hardware support for 80 concurrently executing threads, and $256$~GB of RAM.
In addition, each S822LC (``Minsky'') machine has four NVIDIA P100 Pascal-architecture GPUs, each with $16$~GB of RAM.
The P100s are connected to the Power8 CPUs via $80$~GB/s NVLink connections.

\subsection{MLFMM Performance}

\subsection{GPU Kernel Performance}

Table~\ref{tab:mlfmm_breakdown} shows the breakdown of MLFMM kernel times in different execution environments.

\begin{figure}[b]
\begin{center}
\begin{tabular}{c}
\mbox{\psfig{figure=figures/example_fig0.pdf,width=8cm}}
\end{tabular}
\end{center}
\caption{A three-dimensional plot with gray-scale
format.}\label{fig1}
\end{figure}

This document is a template for authors preparing papers for the
CEM'17 Computing and Electromagnetics Workshop in Barcelona, Spain.
The papers are required to use the IEEE style by following the
instructions provided in this document. The language is English.
The papers are expected to be two pages long.

\section{Text Format}
Page size is A4, which is 210 mm (8.27 in) wide and 297 mm
(11.69 in) long. The margins are as follows:
\begin{itemize}
\item Top: 19 mm (0.75 in)
\item Bottom: 43 mm (1.69 in)
\item Left-Right: 14.32 mm (0.56 in)
\end{itemize}
The paper is in two-column format with a space of 4.22 mm (0.17 in)
between columns. All title and author details must be in
single-column format and must be centered. All paragraphs are
indented. The entire document should be in Times New Roman or
Times font. Recommended font size is 10~pt for the main text.
Headings of the subsections are as follows, if required:
\subsection{This is First-Level Subsection}
You may use 1st level subsections, if required.
\\
\subsubsection{This is Second-Level Subsection}
You may use 2nd level subsections, if required.
\\
\\
\indent Page numbers, headers and footers should not be used. All
hypertext links and bookmarks should be removed from papers. If
you need to refer to an Internet email address or URL in your
paper, you should type out the address or URL fully in regular
font.

\begin{figure}[t]
\begin{center}
\begin{tabular}{c}
\mbox{\psfig{figure=example_fig1.pdf,width=8cm}}\\
{(a)}\\\\ \mbox{\psfig{figure=example_fig2.pdf,width=8cm}}\\{(b)}
\end{tabular}
\end{center}
\caption{Three-dimensional plots with colors. Using (a)
inappropriate and (b) appropriate colors for gray-scale
prints.}\label{fig2}
\end{figure}

\section{Figures and Tables}
Figures should be centered in the column, but large figures may
span across both columns, if they are positioned either at the top
or at the bottom of the page. Graphics should have an adequate
resolution. Fig.~\ref{fig1} presents an example plot in gray-scale
format. Colors can be used; however, it is recommended that the
graphics are checked to reproduce the required details in
gray-scale copy. For example, the colors in Fig.~\ref{fig2}(a) are
not appropriate for a gray-scale print. For the same plot,
Fig.~\ref{fig2}(b) is preferable. Figures are numbered using
Arabic numerals and the captions are in 8~pt regular font. Tables
should be numbered using uppercase Roman numerals and their
captions are centered as in Table~\ref{table1}.

\begin{table}{}
\centering \caption{Caption of the Table.} \label{table1}
\begin{tabular}{|c|c|c|c|}
\hline Item~1 & Item~2 & Item~3 & Item~4\\
\hline\hline \multicolumn{4}{|c|}{Item~5} \\
\hline Item~6 & \multicolumn{3}{|c|}{Item~7}\\
\hline Item~8 & Item~9 & Item~10 & Item~11\\
\hline
\end{tabular}
\end{table}

\section{References}
The heading of the references section is
not numbered and all reference items are in 8~pt font.
References are required to be in IEEE style. Please refer to the
examples for journals~\cite{journal}, for
books~\cite{book1},~\cite{book2}, and for conference
papers~\cite{conf1},~\cite{conf2}.

% the following vfill coarsely balances the columns on the last page
\vfill \pagebreak

\section{Conclusions}
This template uses IEEE style and provides necessary information
to prepare papers for CEM'17 Workshop. Thank you for your
contributions.

\section*{Acknowledgment}
Acknowledgments should be here.

\bibliographystyle{IEEEtran}
\begin{thebibliography}{99}
\bibitem{journal} A.~Author, B.~Author, and C.~Author,
``Publication title,'' {\it Journal Title}, vol.~0, no.~0,
pp.~00--00, Month~Year.
\bibitem{book1} A.~Author, B.~Author, and C.~Author,
{\it Book Title}. Location: Publisher,~Year.
\bibitem{book2} A.~Author, B.~Author, and C.~Author,
``Chapter title,'' in {\it Book Title}, A.~Editor,~Ed. Location:
Publisher,~Year,~Chap.~0.
\bibitem{conf1} A.~Author, B.~Author, and C.~Author, ``Paper
title,'' in {\it Proc. Conference Title}, vol.~0, Year, pp.~0--0.
\bibitem{conf2} A.~Author, B.~Author, and C.~Author, ``Paper
title,'' {\it Conference Title}, Location, Country, Month~Year.
\end{thebibliography}

\end{document}