Updates from ShareLaTeX

2017-05-07 15:54:56 -07:00
parent 932c9f6770
commit 0be0e6b1c1
1 changed files with 42 additions and 39 deletions
--- a/main.tex
+++ b/main.tex
@@ -1,248 +0,0 @@
-% IEEE Paper Template for A4 Page Size (V1)
-% Sample Conference Paper using IEEE LaTeX style file for A4 pagesize.
-% Copyright (C) 2006 Causal Productions Pty Ltd.
-% Permission is granted to distribute and revise this file provided that
-% this header remains intact.
-%
-%
-% This style file is produced for CEM'17 Computational Electromagnetics Workshop
-% Modified from a file indicated above.
-
-
-\documentclass[10pt,conference,a4paper]{IEEEtran}
-\usepackage{times,amsmath,epsfig}
-\usepackage{makecell}
-\usepackage{todonotes}
-\usepackage{verbatim}
-
-\title{Solving Problems Involving Inhomogeneous Media with MLFMA on GPU Clusters}
-\author{
-{Carl Pearson{\small $^{1}$}, Mert Hidayetoglu{\small $^{1}$}, and
-Wen-Mei Hwu{\small $^{1}$} }
-\vspace{1.6mm}\\
-\fontsize{10}{10}\selectfont\itshape
-$~^{1}$Department of Electrical and Computer Engineering, University of Illinois at Urbana-Champaign, Urbana, IL 61801, USA\\
-%$~^{2}$Second Affiliation, City, Postal Code, Country\\
-\fontsize{9}{9}\upshape \texttt{\{pearson, hidayet2, w-hwu\}}@illinois.edu}
-\begin{document}
-\maketitle
-
-\begin{abstract}
-The multilevel fast multipole method (MLFMM) is a key tool for efficiently solving large scattering problems.
-Highly inhomogeneous media prevent converting the problem into a surface-scattering problem via the equivalence principle, so we solve the corresponding volume integral equation instead.
-We evaluate an efficient implementation of MLFMM for such two-dimensional volumetric scattering problems on high-performance GPU-accelerated supercomputing nodes.
-This class of problems is commonly encountered in imaging and inverse-scattering applications.
-\end{abstract}
-
-\section{Introduction}
-
-
-To achieve an efficient implementation on multiple graphics processing units (GPUs), we formulate the MLFMM operations as matrix-matrix multiplications, where the large matrices are partitioned among message passing interface (MPI) processes. Each process uses a single GPU to perform its share of the partial multiplications. The implementation can employ up to 16 GPUs. During the MLFMM multiplications, the GPUs exchange the data they require through MPI. These communications are costly because they move data from GPUs to central processing units (CPUs), from CPUs to CPUs (as in the traditional CPU implementation), and then from CPUs back to GPUs. To reduce this cost, we minimize the amount of data transferred and merge small MPI buffers into large ones. Furthermore, we overlap the communications with the GPU computations by reordering the MLFMM operations. This strategy completely hides the communication overhead and yields an MPI parallelization efficiency of 94\%.
-
-\section{Inverse-Scattering Formulation and Application Architecture}
-\label{sec:application}
-
-Fig.~\ref{fig:app_breakdown} shows the amount of time the full inverse-solver application spends on MLFMM in two parallelized CPU executions.
-
-``BW (32T)'' corresponds to a 32-thread OpenMP parallel run on a single XE node, and S822LC corresponds to a 160-thread OpenMP parallel run on the S822LC node.
-Non-MLFMM operations account for a minority of the execution time, and their share shrinks further as the object reconstructions grow larger.
-
-\begin{figure}[b]
-\begin{center}
-\begin{tabular}{c}
-\mbox{\psfig{figure=figures/cpu_matvec.pdf,width=8cm}}
-\end{tabular}
-\end{center}
-  \caption{Time spent in MLFMM and in non-MLFMM operations by the full inverse solver in two parallelized CPU executions.}
-  \label{fig:app_breakdown}
-\end{figure}
-
-
-
-\section{MLFMM Results}
-
-As described in Section~\ref{sec:application} and shown in Table~\ref{tab:components}, the MLFMM realization of matrix-vector multiplications forms the core computational kernel of the application, and its performance dominates that of the full inverse solver.
-This section presents an analysis of the performance of the MLFMM algorithm in three different environments.
-
-\subsection{Evaluation Environments}
-
-\begin{table}{}
-\centering \caption{Evaluation Systems} \label{tab:systems}
-\begin{tabular}{|c|c|c|c|}
-\hline & \textbf{XK Node} & \textbf{XE Node} & \textbf{S822LC} \\
-\hline
-\hline \textbf{CPU 1} & AMD Opteron 6276 & AMD Opteron 6276 & IBM Power8 \\
-\hline \textbf{CPU 2} & --               & AMD Opteron 6276 & IBM Power8 \\
-\hline
-\hline \textbf{GPU 1} & \makecell{K20X \\ (6 GB RAM) }  & --               & P100 (16 GB RAM) \\
-\hline \textbf{GPU 2} & --                              & --               & P100 (16 GB RAM) \\
-\hline \textbf{GPU 3} & --                              & --               & P100 (16 GB RAM) \\
-\hline \textbf{GPU 4} & --                              & --               & P100 (16 GB RAM) \\
-\hline \textbf{RAM}   & 32 GB                           & 64 GB            & 512 GB \\
-\hline \makecell{\textbf{CPU-GPU} \\ \textbf{Bus}} & PCIe & -- & NVLink \\
-\hline
-\end{tabular}
-\end{table}
-
-The performance of MLFMM is evaluated in three different computing environments: Blue Waters XE nodes, Blue Waters XK nodes, and an IBM S822LC.
-The Blue Waters XE and XK nodes are two different kinds of computing nodes available on the Blue Waters supercomputer.
-Each Blue Waters node is a two-socket system: the XE node has two AMD Opteron 6276 CPUs, each with eight floating-point units, hardware support for 16 executing threads, and $32$~GB of RAM.
-The XK node replaces one of these CPUs with an NVIDIA K20X GPU with the Kepler architecture and $6$~GB of RAM.
-The K20X is connected to the Opteron 6276 via PCIe.
-These XE and XK nodes are representative of the compute capabilities of current-generation clusters and supercomputers.
-The IBM S822LC represents a next-generation accelerator-heavy supercomputing node.
-It has two IBM Power8 CPUs, each with ten floating-point units, support for 80 executing threads, and $256$~GB of RAM.
-In addition, each S822LC (``Minsky'') machine has four NVIDIA P100 Pascal-architecture GPUs, each with $16$~GB of RAM.
-The P100s are connected to the Power8 CPUs via $80$~GB/s NVLink connections.
-
-\subsection{MLFMM Performance}
-
-All evaluations are done on a problem with these parameters. \todo{get from mert}
-
-
-Fig.~\ref{fig:kernel_breakdown} shows the amount of MLFMM execution time spent in computational kernels.
-
-\begin{figure}[b]
-\begin{center}
-\begin{tabular}{c}
-\mbox{\psfig{figure=figures/mlfmm_bw.pdf,width=8cm}}
-\end{tabular}
-\end{center}
-  \caption{BW.}
-  \label{fig:mlfmm_bw}
-\end{figure}
-
-Fig.~\ref{fig:kernel_breakdown} shows the amount of MLFMM execution time spent in computational kernels.
-
-\begin{figure}[b]
-\begin{center}
-\begin{tabular}{c}
-\mbox{\psfig{figure=figures/mlfmm_minsky.pdf,width=8cm}}
-\end{tabular}
-\end{center}
-  \caption{S822LC.}
-  \label{fig:mlfmm_minsky}
-\end{figure}
-
-
-
-\subsection{Computation Kernel Breakdown}
-
-Fig.~\ref{fig:kernel_breakdown} shows the amount of MLFMM execution time spent in computational kernels.
-
-\begin{figure}[b]
-\begin{center}
-\begin{tabular}{c}
-\mbox{\psfig{figure=figures/kernels.pdf,width=8cm}}
-\end{tabular}
-\end{center}
-  \caption{MLFMM execution time spent in computational kernels.}
-  \label{fig:kernel_breakdown}
-\end{figure}
-
-
-
-This document is a template for authors preparing papers for the
-CEM'17 Computing and Electromagnetics Workshop in Barcelona, Spain.
-The papers are required to use the IEEE style by following the
-instructions provided in this document. The language is English.
-The papers are expected to be two-pages long.
-
-
-\section{Text Format}
-%Page size is A4, which is 210 mm (8.27 in) wide and 297 mm
-%(11.69 in) long. The margins are as follows:
-%\begin{itemize}
-%\item   Top: 19 mm (0.75 in) \item   Bottom: 43 mm (1.69 in) \item
-%Left-Right: 14.32 mm (0.56 in)
-%\end{itemize}
-%The paper is in two column format with a space of 4.22 mm (0.17 in)
-%between columns. All title and author details must be in
-%single-column format and must be centered. All paragraphs are
-%indented. The entire document should be in Times New Roman or
-%Times font. Recommended font size is 10~pt for the main text.
-%Headings of the subsections are as follows, if required:
-%\subsection{This is First-Level Subsection}
-%You may use 1st level subsections, if required.
-%\\
-%\subsubsection{This is Second-Level Subsection}
-%You may use 2nd level subsections, if required.
-%\\
-%\\
-%\indent Page numbers, headers and footers should not be used. All
-%hypertext links and bookmarks should be removed from papers. If
-%you need to refer to an Internet email address or URL in your
-%paper, you should type out the address or URL fully in regular
-%font.
-
-
-
-
-%\section{Figures and Tables}
-%Figures should be centered in the column, but large figures may
-%span across both columns, if they are positioned either at the top
-%or at the bottom of the page. Graphics should have an adequate
-%resolution. Fig.~\ref{fig1} presents an example plot in gray-scale
-%format. Colors can be used; however, it is recommended that the
-%graphics are checked to reproduce the required details in
-%gray-scale copy. For example, the colors in Fig.~\ref{fig2}(a) are
-%not appropriate for a gray-scale print. For the same plot,
-%Fig.~\ref{fig2}(b) is more preferable. Figures are numbered using
-%Arabic numerals and the captions are in 8~pt regular font. Tables
-%should be numbered using uppercase Roman numerals and their
-%captions are centered as in Table~\ref{table1}.
-
-
-
-\begin{table}{}
-\centering \caption{Caption of the Table.} \label{table1}
-\begin{tabular}{|c|c|c|c|}
-\hline Item~1& Item~2
-& Item~3 & Item~4\\
-\hline\hline  \multicolumn{4}{|c|}{Item~5} \\
-\hline Item~6&
-\multicolumn{3}{|c|}{Item~7}\\
-\hline Item~8 & Item~9 & Item~10 & Item~11\\
-\hline
-\end{tabular}
-\end{table}
-
-
-
-\section{References} 
-%The heading of the references section is
-%not be numbered and all reference items are in 8~pt font.
-%References are required to be in IEEE style.  Please refer to the
-%examples for journals~\cite{journal}, for
-%books~\cite{book1},~\cite{book2}, and for conference
-%papers~\cite{conf1},~\cite{conf2}.
-
-% the following vfill coarsely balances the columns on the last page
-\vfill \pagebreak
-
-\section{Conclusions}
-%This template uses IEEE style and provides necessary information
-%to prepare papers for CEM'17 Workshop. Thank you for your
-%contributions.
-
-
-\section*{Acknowledgment}
-%Acknowledgments should be here.
-
-\bibliographystyle{IEEEtran}
-\begin{thebibliography}{99}
-\bibitem{journal} A.~Author, B.~Author, and C.~Author,
-``Publication title,'' {\it Journal Title}, vol.~0, no.~0,
-pp.~00--00, Month~Year.
-\bibitem{book1} A.~Author, B.~Author, and C.~Author,
-{\it Book Title}. Location: Publisher,~Year.
-\bibitem{book2} A.~Author, B.~Author, and C.~Author,
-``Chapter title,'' in {\it Book Title}, A.~Editor,~Ed. Location:
-Publisher,~Year,~Chap.~0.
-\bibitem{conf1} A.~Author, B.~Author, and C.~Author, ``Paper
-title,'' in {\it Proc. Conference Title}, vol.~0, Year, pp.~0--0.
-\bibitem{conf2} A.~Author, B.~Author, and C.~Author, ``Paper
-title,'' {\it Conference Title}, Location, Country, Month~Year.
-\end{thebibliography}
-
-\end{document}