Updates from ShareLaTeX

This commit is contained in:
Carl Pearson
2017-05-08 18:00:14 -07:00
parent 8464eaba6a
commit 177df1529b


@@ -15,6 +15,7 @@
\usepackage{todonotes}
\usepackage{verbatim}
%\title{Solving Problems Involving Inhomogeneous Media with MLFMM on GPU Clusters}
\title{Evaluating MLFMM for Large Scattering Problems on Multiple GPUs}
\author{
@@ -38,6 +39,8 @@ We evaluate an efficient implementation of MLFMM for such two-dimensional volume
\section{Introduction}
\label{sec:introduction}
To achieve an efficient implementation on graphics processing units (GPUs), the MLFMM operations are formulated as matrix-matrix multiplications.
To avoid host-device data transfer, common operators are pre-computed, moved to the GPU, and reused as needed.
Large matrices are partitioned among message passing interface (MPI) processes, and each process uses a single GPU to perform its partial multiplications.
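A minimal sketch of this scheme is shown below, assuming cuBLAS for the multiplications and a one-GPU-per-rank mapping; the buffer names, sizes, and iteration count are illustrative rather than taken from the implementation.
\begin{verbatim}
#include <mpi.h>
#include <cuda_runtime.h>
#include <cublas_v2.h>

int main(int argc, char **argv) {
  MPI_Init(&argc, &argv);
  int rank;
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);

  int ndev = 1;
  cudaGetDeviceCount(&ndev);
  cudaSetDevice(rank % ndev);            // assumed one-GPU-per-process mapping

  const int m = 512, k = 512, n = 1024;  // illustrative block sizes
  float *d_op, *d_in, *d_out;
  cudaMalloc(&d_op,  sizeof(float) * m * k);
  cudaMalloc(&d_in,  sizeof(float) * k * n);
  cudaMalloc(&d_out, sizeof(float) * m * n);
  // ... copy the pre-computed operator into d_op once,
  //     and this rank's partition of the input into d_in ...

  cublasHandle_t h;
  cublasCreate(&h);
  const float one = 1.0f, zero = 0.0f;
  for (int iter = 0; iter < 10; ++iter) {
    // Reuse the device-resident operator: no host-device
    // transfer inside the multiplication loop.
    cublasSgemm(h, CUBLAS_OP_N, CUBLAS_OP_N, m, n, k,
                &one, d_op, m, d_in, k, &zero, d_out, m);
  }

  cublasDestroy(h);
  cudaFree(d_op); cudaFree(d_in); cudaFree(d_out);
  MPI_Finalize();
  return 0;
}
\end{verbatim}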
@@ -148,6 +151,12 @@ This reflects the current slow pace of single-threaded CPU performance improveme
The corresponding single-GPU speedup in S822LC over XK is $4.4\times$.
On a per-node basis (``1 GPU'' in XK, ``4 GPU'' in S822LC), the speedup is $17.9\times$.
\subsection{MPI Communication Overlap}
\tikzstyle{int}=[draw, fill=blue!20, minimum size=2em]
\tikzstyle{init} = [pin edge={to-,thin,black}]
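As an illustration of the overlap pattern, the sketch below posts nonblocking MPI requests before launching a kernel that depends only on local data; the kernels, buffers, and neighbor ranks are hypothetical placeholders, not the implementation's.
\begin{verbatim}
#include <mpi.h>
#include <cuda_runtime.h>

// Placeholder kernels standing in for MLFMM stages.
__global__ void local_work(float *x, int n)  { /* ... */ }
__global__ void remote_work(float *x, int n) { /* ... */ }

void step(float *d_local, float *d_halo,
          float *h_sendbuf, float *h_recvbuf,
          int count, int left, int right, int n, cudaStream_t s) {
  MPI_Request req[2];
  // Post communication first so it proceeds in the background.
  MPI_Irecv(h_recvbuf, count, MPI_FLOAT, left,  0, MPI_COMM_WORLD, &req[0]);
  MPI_Isend(h_sendbuf, count, MPI_FLOAT, right, 0, MPI_COMM_WORLD, &req[1]);

  // Work that depends only on local data overlaps the communication.
  local_work<<<256, 256, 0, s>>>(d_local, n);

  MPI_Waitall(2, req, MPI_STATUSES_IGNORE);  // received data is now ready
  cudaMemcpyAsync(d_halo, h_recvbuf, count * sizeof(float),
                  cudaMemcpyHostToDevice, s);
  remote_work<<<256, 256, 0, s>>>(d_halo, count);
  cudaStreamSynchronize(s);
}
\end{verbatim}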
\subsection{Computation Kernel Breakdown}
Fig.~\ref{fig:kernel_breakdown} shows the amount of MLFMM execution time spent in each computational kernel.
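One common way to obtain such a per-kernel breakdown is to bracket each launch with CUDA events, as sketched below; \texttt{some\_kernel} is a placeholder rather than one of the MLFMM kernels.
\begin{verbatim}
#include <cuda_runtime.h>

__global__ void some_kernel(float *x, int n) { /* ... */ }

// Returns the elapsed time of one kernel launch in milliseconds.
float time_kernel(float *d_x, int n) {
  cudaEvent_t start, stop;
  cudaEventCreate(&start);
  cudaEventCreate(&stop);

  cudaEventRecord(start);
  some_kernel<<<256, 256>>>(d_x, n);
  cudaEventRecord(stop);
  cudaEventSynchronize(stop);

  float ms = 0.0f;
  cudaEventElapsedTime(&ms, start, stop);
  cudaEventDestroy(start);
  cudaEventDestroy(stop);
  return ms;
}
\end{verbatim}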