Initial ShareLaTeX Import

2017-05-04 14:03:15 -05:00
commit 473f335540
9 changed files with 15215 additions and 0 deletions
--- a/cem17_template.tex
+++ b/cem17_template.tex
@@ -0,0 +1,232 @@
+% IEEE Paper Template for A4 Page Size (V1)
+% Sample Conference Paper using IEEE LaTeX style file for A4 pagesize.
+% Copyright (C) 2006 Causal Productions Pty Ltd.
+% Permission is granted to distribute and revise this file provided that
+% this header remains intact.
+%
+%
+% This style file is produced for CEM'17 Computational Electromagnetics Workshop
+% Modified from a file indicated above.
+
+
+\documentclass[10pt,conference,a4paper]{IEEEtran}
+\usepackage{times,amsmath,epsfig}
+
+\title{Solving Problems Involving Inhomogeneous Media with MLFMA on GPU Clusters}
+\author{
+{Carl Pearson{\small $^{1}$}, Mert Hidayetoglu{\small $^{1}$}, and
+Wen-Mei Hwu{\small $^{1}$} }
+\vspace{1.6mm}\\
+\fontsize{10}{10}\selectfont\itshape
+$~^{1}$University of Illinois Urbana-Champaign Electrical and Computer Engineering, Urbana, 61801, USA\\
+\fontsize{9}{9}\upshape \texttt{\{pearson, hidayet2, w-hwu\}}@illinois.edu}
+\begin{document}
+\maketitle
+
+\begin{abstract}
+The multilevel fast multipole method (MLFMM) is a key tool for efficiently solving large scattering problems.
+Highly inhomogeneous media prevent converting the problem into a surface-scattering problem via the equivalence principle, so we instead solve the corresponding volume integral equation.
+We evaluate an efficient implementation of MLFMM for such two-dimensional volumetric scattering problems on high-performance GPU-accelerated supercomputing nodes.
+This class of problems is commonly encountered in imaging and inverse-scattering applications.
+\end{abstract}
+
+\section{Introduction}
+
+
+To achieve an efficient implementation on multiple graphics processing units (GPUs), we formulate the MLFMM operations as matrix-matrix multiplications in which the large matrices are partitioned among message passing interface (MPI) processes. Each process uses a single GPU to perform its partial multiplications, and the implementation can employ up to 16 GPUs. During the MLFMM multiplications, the GPUs exchange the required data through MPI. These communications are costly because they move data from GPUs to central processing units (CPUs), between CPUs (as in the traditional CPU implementation), and then from CPUs back to GPUs. To reduce this cost, we minimize the amount of data transferred and merge small MPI buffers into large ones. Furthermore, we reorder the MLFMM operations so that communication overlaps with GPU computation. This strategy completely hides the communication overhead and yields an MPI parallelization efficiency of 94\%.
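+
+As a point of reference, and assuming the usual strong-scaling definition (the symbols $p$, $T_1$, and $T_p$ are introduced here for illustration only), the parallelization efficiency on $p$ GPUs can be written as
+\begin{equation}
+E_p = \frac{T_1}{p\,T_p},
+\end{equation}
+where $T_1$ and $T_p$ denote the wall times on one GPU and on $p$ GPUs, respectively. Under this definition, an efficiency of 94\% would correspond to a speedup of $0.94p$ over a single GPU, e.g., roughly $15\times$ if it were measured at $p = 16$.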
+
+\section{Inverse-Scattering Formulation and Application Architecture}
+\label{sec:application}
+
+Table~\ref{tab:components} shows the breakdown of application component execution times on the Blue Waters supercomputer.
+
+\begin{table}{}
+\centering \caption{Breakdown of Application Component Time} \label{tab:components}
+\begin{tabular}{|c|c|}
+\hline \textbf{Component} & \textbf{Wall Time (s)} \\
+\hline
+\hline Preprocessing      & 0 \\
+\hline Setup              & 0 \\
+\hline Solution           & 0 \\
+\hline Matvec             & 0 \\
+\hline Solver             & 0 \\
+\hline Postprocessing     & 0 \\
+\hline Other              & 0 \\
+\hline Total              & 0 \\
+\hline
+\end{tabular}
+\end{table}
+
+\section{MLFMM Results}
+
+As described in Section~\ref{sec:application} and shown in Table~\ref{tab:components}, the MLFMM realization of matrix-vector multiplications forms the core computational kernel of the application, and its performance dominates that of the full inverse solver. This section analyzes the performance of the MLFMM algorithm on three different systems.
+
+\subsection{Evaluation Systems}
+
+\begin{table}{}
+\centering \caption{Evaluation Systems} \label{tab:systems}
+\begin{tabular}{|c|c|c|c|}
+\hline & \textbf{Blue Waters XK Node} & \textbf{Blue Waters XE Node} & \textbf{Minsky} \\
+\hline
+\hline \textbf{CPU 1} & AMD Opteron 6276 & AMD Opteron 6276 & IBM Power8 \\
+\hline \textbf{CPU 2} & --               & AMD Opteron 6276 & IBM Power8 \\
+\hline
+\hline \textbf{GPU 1} & K20X (6 GB RAM)  & --               & P100 (16 GB RAM) \\
+\hline \textbf{GPU 2} & --               & --               & P100 (16 GB RAM) \\
+\hline \textbf{GPU 3} & --               & --               & P100 (16 GB RAM) \\
+\hline \textbf{GPU 4} & --               & --               & P100 (16 GB RAM) \\
+\hline \textbf{RAM}   & 32 GB            & 64 GB            & 512 GB \\
+\hline \textbf{CPU-GPU Connection} & PCIe & -- & NVLink \\
+\hline
+\end{tabular}
+\end{table}
+
+The XE and XK nodes are two different kinds of compute nodes available on the Blue Waters supercomputer.
+Both are two-socket systems: the XE node has two AMD Opteron 6276 CPUs, each of which has 8 floating-point units and hardware support for 16 executing threads. The XK node replaces one of these CPUs with an NVIDIA K20X GPU, based on the Kepler architecture, with 6 GB of RAM.
+These systems are representative of the nodes in current-generation clusters and supercomputers.
+The Minsky system represents a next-generation accelerator-heavy supercomputing node.
+It has two IBM Power8 CPUs with 10 floating-point units and 80 executing threads.
+In addition, each Minsky machine has four NVIDIA P100 GPUs, based on the Pascal architecture, each with 16 GB of RAM.
+
+\subsection{MLFMM Performance}
+
+\subsection{GPU Kernel Performance}
+
+\begin{table}{}
+\centering \caption{GPU Kernel Performance} \label{tab:kernels}
+\begin{tabular}{|c|c|c|}
+\hline \textbf{Kernel} & \textbf{XK} & \textbf{Minsky} \\
+\hline
+\hline \textbf{CPU 1} & AMD Opteron 6276 & AMD Opteron 6276  \\
+\hline \textbf{CPU 2} & --               & AMD Opteron 6276  \\
+\hline
+\hline \textbf{GPU 1} & K20X (6 GB RAM)  & --               \\
+\hline \textbf{GPU 2} & --               & --               \\
+\hline \textbf{GPU 3} & --               & --               \\
+\hline \textbf{GPU 4} & --               & --               \\
+\hline \textbf{RAM}   & 32 GB            & 64 GB            \\
+\hline \textbf{CPU-GPU Connection} & PCIe &  \\
+\hline
+\end{tabular}
+\end{table}
+
+This document is a template for authors preparing papers for the
+CEM'17 Computing and Electromagnetics Workshop in Barcelona, Spain.
+The papers are required to use the IEEE style by following the
+instructions provided in this document. The language is English.
+The papers are expected to be two pages long.
+\begin{figure}[b]
+\begin{center}
+\begin{tabular}{c}
+\mbox{\psfig{figure=example_fig0.pdf,width=8cm}}
+\end{tabular}
+\end{center}
+ \caption{A three-dimensional plot with gray-scale
+format.}\label{fig1}
+\end{figure}
+
+\section{Text Format} Page size is A4, which is 210 mm (8.27 in) wide and 297 mm
+(11.69 in) long. The margins are as follows:
+\begin{itemize}
+\item Top: 19 mm (0.75 in)
+\item Bottom: 43 mm (1.69 in)
+\item Left-Right: 14.32 mm (0.56 in)
+\end{itemize}
+The paper is in two column format with a space of 4.22 mm (0.17 in)
+between columns. All title and author details must be in
+single-column format and must be centered. All paragraphs are
+indented. The entire document should be in Times New Roman or
+Times font. Recommended font size is 10~pt for the main text.
+Headings of the subsections are as follows, if required:
+\subsection{This is First-Level Subsection}
+You may use 1st level subsections, if required.
+\\
+\subsubsection{This is Second-Level Subsection}
+You may use 2nd level subsections, if required.
+\\
+\\
+\indent Page numbers, headers and footers should not be used. All
+hypertext links and bookmarks should be removed from papers. If
+you need to refer to an Internet email address or URL in your
+paper, you should type out the address or URL fully in regular
+font.
+
+\begin{figure}[t]
+\begin{center}
+\begin{tabular}{c}
+\mbox{\psfig{figure=example_fig1.pdf,width=8cm}}\\
+{(a)}\\\\ \mbox{\psfig{figure=example_fig2.pdf,width=8cm}}\\{(b)}
+\end{tabular}
+\end{center}
+\caption{Three-dimensional plots with colors. Using (a)
+inappropriate and (b) appropriate colors for gray-scale
+prints.}\label{fig2}
+\end{figure}
+
+\section{Figures and Tables}
+Figures should be centered in the column, but large figures may
+span across both columns, if they are positioned either at the top
+or at the bottom of the page. Graphics should have an adequate
+resolution. Fig.~\ref{fig1} presents an example plot in gray-scale
+format. Colors can be used; however, it is recommended that the
+graphics are checked to reproduce the required details in
+gray-scale copy. For example, the colors in Fig.~\ref{fig2}(a) are
+not appropriate for a gray-scale print. For the same plot,
+Fig.~\ref{fig2}(b) is preferable. Figures are numbered using
+Arabic numerals and the captions are in 8~pt regular font. Tables
+should be numbered using uppercase Roman numerals and their
+captions are centered as in Table~\ref{table1}.
+
+
+
+\begin{table}{}
+\centering \caption{Caption of the Table.} \label{table1}
+\begin{tabular}{|c|c|c|c|}
+\hline Item~1& Item~2
+& Item~3 & Item~4\\
+\hline\hline  \multicolumn{4}{|c|}{Item~5} \\
+\hline Item~6&
+\multicolumn{3}{|c|}{Item~7}\\
+\hline Item~8 & Item~9 & Item~10 & Item~11\\
+\hline
+\end{tabular}
+\end{table}
+
+
+
+\section{References} The heading of the references section is
+not numbered and all reference items are in 8~pt font.
+References are required to be in IEEE style.  Please refer to the
+examples for journals~\cite{journal}, for
+books~\cite{book1},~\cite{book2}, and for conference
+papers~\cite{conf1},~\cite{conf2}.
+
+% the following vfill coarsely balances the columns on the last page
+\vfill \pagebreak
+
+\section{Conclusions}
+This template uses IEEE style and provides necessary information
+to prepare papers for CEM'17 Workshop. Thank you for your
+contributions.
+
+
+\section*{Acknowledgment}
+Acknowledgments should be here.
+
+\bibliographystyle{IEEEtran}
+\begin{thebibliography}{99}
+\bibitem{journal} A.~Author, B.~Author, and C.~Author,
+``Publication title,'' {\it Journal Title}, vol.~0, no.~0,
+pp.~00--00, Month~Year.
+\bibitem{book1} A.~Author, B.~Author, and C.~Author,
+{\it Book Title}. Location: Publisher,~Year.
+\bibitem{book2} A.~Author, B.~Author, and C.~Author,
+``Chapter title,'' in {\it Book Title}, A.~Editor,~Ed. Location:
+Publisher,~Year,~Chap.~0.
+\bibitem{conf1} A.~Author, B.~Author, and C.~Author, ``Paper
+title,'' in {\it Proc. Conference Title}, vol.~0, Year, pp.~0--0.
+\bibitem{conf2} A.~Author, B.~Author, and C.~Author, ``Paper
+title,'' {\it Conference Title}, Location, Country, Month~Year.
+\end{thebibliography}
+
+\end{document}