Updates from ShareLaTeX
main.tex
@@ -16,15 +16,15 @@
%\title{Solving Problems Involving Inhomogeneous Media with MLFMM on GPU Clusters}
\title{Evaluating MLFMM for Large Scattering Problems on Multiple GPUs}
\title{Evaluating MLFMM Performance for 2-D VIE Problems on Multiple Architectures}
\author{
{Carl Pearson{\small $^{1}$}, Mert Hidayetoglu{\small $^{1}$}, Wei Ren{\small $^{1}$}, and Wen-Mei Hwu{\small $^{1}$} }
{Carl Pearson{\small $^{1}$}, Mert Hidayetoglu{\small $^{1}$}, Wei Ren{\small $^{2}$}, Levent Gurel{\small $^{1}$}, and Wen-Mei Hwu{\small $^{1}$} }
\vspace{1.6mm}\\
\fontsize{10}{10}\selectfont\itshape
$~^{1}$Department of Electrical and Computer Engineering, University of Illinois Urbana-Champaign, Urbana, 61801, USA\\
$~^{2}$Second Affiliation, City, Postal Code, Country\\
\fontsize{9}{9}\upshape \texttt{\{pearson, hidayet2, weiren2, w-hwu\}}@illinois.edu}
$~^{1}$Department of Electrical and Computer Engineering, University of Illinois Urbana-Champaign, Urbana, IL 61801, USA\\
$~^{2}$Department of Physics, University of Illinois Urbana-Champaign, Urbana, IL 61801, USA\\
%\fontsize{9}{9}\upshape \texttt{\{pearson, hidayet2, weiren2, lgurel, w-hwu\}}@illinois.edu}
\fontsize{9}{9}\upshape pearson@illinois.edu}
\begin{document}
\maketitle
@@ -42,13 +42,13 @@ The MLFMM is evaluated on current- and next-generation GPU-accelerated supercomp
MLFMM computes pairwise interactions between pixels in the scattering problem by hierarchically clustering pixels into a spatial quad-tree. In a ``nearfield'' phase, nearby pixel interactions are computed within a level of a tree. An ``aggregation'' and ``disaggregation'' phase propagate interactions up and down the tree, and a ``translation'' phase propagates long-range interactions within a level. In this way, for $N$ pixels $\mathcal{O}(N)$ work for $N^2$ interactions is achieved~\cite{rokhlin93}.
MLFMM computes pairwise interactions between pixels in the scattering problem by hierarchically clustering pixels into a spatial quad-tree. In the nearfield phase, nearby pixel interactions are computed within the lowest level of the MLFMM tree. The aggregation and disaggregation phases propagate interactions up and down the tree, and the translation phase propagates long-range interactions within a level. In this way, the $N^2$ pairwise interactions among $N$ pixels are evaluated with $\mathcal{O}(N)$ work~\cite{rokhlin93}.
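As an illustration of the phase ordering, the sketch below expresses the algorithm as one upward and one downward pass over the quad-tree levels. It is a minimal sketch only; the routines \texttt{nearfield}, \texttt{aggregate}, \texttt{translate}, and \texttt{disaggregate} are hypothetical placeholders rather than the actual implementation.

\begin{verbatim}
// Minimal sketch only: the four MLFMM phases as an upward and a downward
// pass over the levels of the spatial quad-tree. The four routines are
// hypothetical placeholders, not the actual MLFMM implementation.
#include <cstdio>

static void nearfield()         { std::printf("nearfield at the leaf level\n"); }
static void aggregate(int l)    { std::printf("aggregate level %d into %d\n", l, l - 1); }
static void translate(int l)    { std::printf("translate within level %d\n", l); }
static void disaggregate(int l) { std::printf("disaggregate level %d into %d\n", l, l + 1); }

int main() {
  const int leafLevel = 5;               // lowest (finest) level of the quad-tree
  nearfield();                           // nearby interactions at the leaf level
  for (int l = leafLevel; l > 2; --l)    // upward pass toward the coarse levels
    aggregate(l);
  for (int l = 2; l <= leafLevel; ++l)   // long-range interactions within each level
    translate(l);
  for (int l = 2; l < leafLevel; ++l)    // downward pass back to the leaf level
    disaggregate(l);
  return 0;
}
\end{verbatim}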
Even with this algorithmic speedup, a high-performance parallel MLFMM implementation is needed to take full advantage of high-performance computing resources.
This work presents how a GPU-accelerated MLFMM effectively scales from current to next-generation computers.
To achieve an efficient implementation on graphics processing units (GPUs), these four MLFMM phases are formulated as matrix multiplications.
Common operators are pre-computed, moved to the GPU, and reused as needed to avoid repeated host-device data transfers.
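A minimal sketch of this reuse pattern is given below: an operator is copied to the GPU once and then applied repeatedly through cuBLAS GEMM calls with no further host-device transfers. The operator name, size, and the use of real single precision are placeholders chosen for brevity; the actual MLFMM operators are complex-valued and phase-specific.

\begin{verbatim}
// Minimal sketch only: a pre-computed operator T is copied to the GPU once
// and reused for many matrix multiplications (cuBLAS SGEMM). Sizes, names,
// and real single precision are illustrative placeholders.
#include <vector>
#include <cuda_runtime.h>
#include <cublas_v2.h>

int main() {
  const int n = 512;                                    // illustrative operator size
  std::vector<float> hT(n * n, 1.0f), hX(n * n, 1.0f);  // host operator and operand

  float *dT, *dX, *dY;
  cudaMalloc((void **)&dT, n * n * sizeof(float));
  cudaMalloc((void **)&dX, n * n * sizeof(float));
  cudaMalloc((void **)&dY, n * n * sizeof(float));
  cudaMemcpy(dT, hT.data(), n * n * sizeof(float), cudaMemcpyHostToDevice); // once
  cudaMemcpy(dX, hX.data(), n * n * sizeof(float), cudaMemcpyHostToDevice);

  cublasHandle_t handle;
  cublasCreate(&handle);
  const float one = 1.0f, zero = 0.0f;
  for (int iter = 0; iter < 10; ++iter)                 // operator reused, no new transfers
    cublasSgemm(handle, CUBLAS_OP_N, CUBLAS_OP_N, n, n, n,
                &one, dT, n, dX, n, &zero, dY, n);

  cublasDestroy(handle);
  cudaFree(dT); cudaFree(dX); cudaFree(dY);
  return 0;
}
\end{verbatim}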
Large matrices are partitioned among message passing interface (MPI) processes and each process employs a single GPU for performing partial multiplications.
The MLFMM tree structure is partitioned among message passing interface (MPI) processes, and each process employs a single GPU to perform its partial multiplications.
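The process-to-GPU mapping can be sketched as below, assuming one GPU per MPI rank selected by rank modulo the local device count; the tree partitioning itself is omitted.

\begin{verbatim}
// Minimal sketch only: each MPI process binds to a single GPU and would own
// one partition of the MLFMM tree (the partitioning itself is omitted).
#include <cstdio>
#include <mpi.h>
#include <cuda_runtime.h>

int main(int argc, char **argv) {
  MPI_Init(&argc, &argv);
  int rank = 0;
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);

  int nDevices = 0;
  cudaGetDeviceCount(&nDevices);
  cudaSetDevice(rank % nDevices);          // one GPU per process

  std::printf("rank %d -> GPU %d of %d\n", rank, rank % nDevices, nDevices);

  MPI_Finalize();
  return 0;
}
\end{verbatim}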
During the MLFMM multiplications, data is transferred between GPUs through their owning MPI processes: it is copied from the GPU to the central processing unit (CPU), exchanged between CPUs through MPI, and then copied from the CPU back to the GPU.
To hide this communication cost, MPI communication is overlapped with GPU kernels.
This strategy completely hides the communication cost and provides $96$\% MPI parallelization efficiency on up to 16 GPUs.
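The overlap can be sketched as follows, assuming one compute stream and one communication stream per process, pinned host staging buffers, and a simple ring exchange between ranks; the kernel and buffer names are placeholders rather than the actual MLFMM kernels.

\begin{verbatim}
// Minimal sketch only: a placeholder kernel runs on one CUDA stream while
// boundary data is staged GPU -> CPU, exchanged between ranks with
// nonblocking MPI, and staged CPU -> GPU on a second stream.
#include <mpi.h>
#include <cuda_runtime.h>

__global__ void localWork(float *x, int n) {   // stands in for the MLFMM kernels
  int i = blockIdx.x * blockDim.x + threadIdx.x;
  if (i < n) x[i] += 1.0f;
}

int main(int argc, char **argv) {
  MPI_Init(&argc, &argv);
  int rank, size;
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);
  MPI_Comm_size(MPI_COMM_WORLD, &size);
  int nDev = 0;
  cudaGetDeviceCount(&nDev);
  cudaSetDevice(rank % nDev);

  const int n = 1 << 20, m = 1 << 16;           // local work and exchange sizes
  float *dLocal, *dHalo, *sendBuf, *recvBuf;
  cudaMalloc((void **)&dLocal, n * sizeof(float));
  cudaMalloc((void **)&dHalo, m * sizeof(float));
  cudaMallocHost((void **)&sendBuf, m * sizeof(float));  // pinned, so copies overlap
  cudaMallocHost((void **)&recvBuf, m * sizeof(float));
  cudaMemset(dLocal, 0, n * sizeof(float));
  cudaMemset(dHalo, 0, m * sizeof(float));

  cudaStream_t compute, comm;
  cudaStreamCreate(&compute);
  cudaStreamCreate(&comm);

  // Stage outgoing data GPU -> CPU on the communication stream while
  // independent work runs on the compute stream.
  cudaMemcpyAsync(sendBuf, dHalo, m * sizeof(float), cudaMemcpyDeviceToHost, comm);
  localWork<<<(n + 255) / 256, 256, 0, compute>>>(dLocal, n);

  cudaStreamSynchronize(comm);                  // send buffer is ready
  int next = (rank + 1) % size, prev = (rank - 1 + size) % size;
  MPI_Request reqs[2];
  MPI_Irecv(recvBuf, m, MPI_FLOAT, prev, 0, MPI_COMM_WORLD, &reqs[0]);
  MPI_Isend(sendBuf, m, MPI_FLOAT, next, 0, MPI_COMM_WORLD, &reqs[1]);
  MPI_Waitall(2, reqs, MPI_STATUSES_IGNORE);    // exchange overlaps the kernel

  cudaMemcpyAsync(dHalo, recvBuf, m * sizeof(float), cudaMemcpyHostToDevice, comm);
  cudaStreamSynchronize(comm);
  cudaStreamSynchronize(compute);               // kernel finishes here

  cudaStreamDestroy(compute); cudaStreamDestroy(comm);
  cudaFreeHost(sendBuf); cudaFreeHost(recvBuf);
  cudaFree(dLocal); cudaFree(dHalo);
  MPI_Finalize();
  return 0;
}
\end{verbatim}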