Merge sharelatex-2017-05-07-2001 into master

This commit is contained in:
Carl Pearson
2017-05-07 15:01:20 -05:00
committed by GitHub


@@ -11,6 +11,7 @@
\documentclass[10pt,conference,a4paper]{IEEEtran}
\usepackage{times,amsmath,epsfig}
\usepackage{makecell}
\title{Solving Problems Involving Inhomogeneous Media with MLFMA on GPU Clusters}
\author{
@@ -60,56 +61,55 @@ Table \ref{tab:components} shows the breakdown of application component executio
\section{MLFMM Results}
As described in section \ref{sec:application} and shown in Table \ref{tab:components}, the MLFMM realization of matrix-vector multiplications forms the core computational kernel of the application, and its performance dominates that of the full inverse solver.
This section presents an analysis of the performance of the MLFMM algorithm in three different environments.
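The role the matrix-vector product plays in the overall solver cost can be sketched as follows. This is a minimal, illustrative stand-in, not the paper's code: a plain Jacobi iteration with a dense Python `matvec`, where MLFMM would replace the dense product with an approximate fast evaluation. All function and variable names are hypothetical.

```python
# Sketch: an iterative solve whose runtime is dominated by repeated
# matrix-vector products (the role MLFMM plays in the application).
# Dense pure-Python matvec used as a stand-in; names are illustrative.

def matvec(A, x):
    """Dense matrix-vector product; MLFMM replaces this O(n^2) step
    with an approximate fast evaluation."""
    return [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) for row in A]

def residual_norm(A, x, b):
    """Max-norm of the residual b - A x."""
    Ax = matvec(A, x)
    return max(abs(ax_i - b_i) for ax_i, b_i in zip(Ax, b))

def jacobi_solve(A, b, iters=50):
    """Each iteration costs one matvec, so total solve time is roughly
    (iteration count) x (matvec time) -- hence speeding up the matvec
    speeds up the whole solver."""
    x = [0.0] * len(b)
    for _ in range(iters):
        Ax = matvec(A, x)
        x = [x_i + (b_i - ax_i) / A[i][i]
             for i, (x_i, b_i, ax_i) in enumerate(zip(x, b, Ax))]
    return x

if __name__ == "__main__":
    # Small diagonally dominant test system with solution x = (1, 1, 1).
    A = [[4.0, 1.0, 0.0], [1.0, 4.0, 1.0], [0.0, 1.0, 4.0]]
    b = [5.0, 6.0, 5.0]
    x = jacobi_solve(A, b)
    print(residual_norm(A, x, b) < 1e-6)
```

Since every iteration performs one product, any reduction in matvec time translates almost directly into solver speedup, which is why the MLFMM kernel's performance dominates the analysis below.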
\subsection{Evaluation Environments}
\begin{table}
\centering \caption{Evaluation Systems} \label{tab:systems}
\begin{tabular}{|c|c|c|c|}
\hline & \textbf{XK Node} & \textbf{XE Node} & \textbf{S822LC} \\
\hline
\hline \textbf{CPU 1} & AMD Opteron 6276 & AMD Opteron 6276 & IBM Power8 \\
\hline \textbf{CPU 2} & -- & AMD Opteron 6276 & IBM Power8 \\
\hline
\hline \textbf{GPU 1} & \makecell{K20X \\ (6 GB RAM)} & -- & P100 (16 GB RAM) \\
\hline \textbf{GPU 2} & -- & -- & P100 (16 GB RAM) \\
\hline \textbf{GPU 3} & -- & -- & P100 (16 GB RAM) \\
\hline \textbf{GPU 4} & -- & -- & P100 (16 GB RAM) \\
\hline \textbf{RAM} & 32 GB & 64 GB & 512 GB \\
\hline \makecell{\textbf{CPU-GPU} \\ \textbf{Bus}} & PCIe & -- & NVLink \\
\hline
\end{tabular}
\end{table}
The performance of MLFMM is evaluated in three different computing environments: Blue Waters XE nodes, Blue Waters XK nodes, and an IBM S822LC.
The Blue Waters XE and XK nodes are two different kinds of computing nodes available on the Blue Waters supercomputer.
Each Blue Waters node is a two-socket system: the XE node has two AMD Opteron 6276 CPUs, each with eight floating-point units, hardware support for 16 executing threads, and $32$~GB of RAM.
The XK node replaces one of these CPUs with an NVIDIA K20X GPU with the Kepler architecture and $6$~GB of RAM.
The K20X is connected to the Opteron 6276 via PCIe.
These XE and XK nodes are representative of the compute capabilities of current-generation clusters and supercomputers.
The IBM S822LC represents a next-generation accelerator-heavy supercomputing node.
It has two IBM Power8 CPUs with ten floating-point units, support for 80 executing threads, and $256$~GB of RAM.
In addition, each S822LC has four NVIDIA Pascal-architecture P100 GPUs, each with $16$~GB of RAM.
The P100s are connected to the Power8 CPUs via $80$~GB/s NVLink connections.
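A back-of-envelope comparison makes the bus difference concrete. The $80$~GB/s NVLink figure is from the text above; the PCIe figure is an assumed peak of roughly $8$~GB/s for the gen-2 x16 link on the XK node (an assumption, not a measured value), and the payload size is hypothetical.

```python
# Back-of-envelope CPU-GPU transfer times for the two buses described
# above. The 80 GB/s NVLink figure is from the text; the ~8 GB/s PCIe
# figure (assumed gen-2 x16 peak on the XK node) and the 4 GiB payload
# are illustrative assumptions, not measurements.

def transfer_time_s(bytes_moved, bandwidth_gb_s):
    """Ideal (peak-bandwidth) time in seconds to move bytes_moved bytes."""
    return bytes_moved / (bandwidth_gb_s * 1e9)

GiB = 1 << 30
payload = 4 * GiB  # hypothetical working set shipped to the GPU

pcie_s = transfer_time_s(payload, 8.0)     # assumed PCIe 2.0 x16 peak
nvlink_s = transfer_time_s(payload, 80.0)  # NVLink peak from the text

print(f"PCIe:    {pcie_s:.3f} s")
print(f"NVLink:  {nvlink_s:.3f} s")
print(f"speedup: {pcie_s / nvlink_s:.1f}x")
```

Under these peak-bandwidth assumptions the NVLink transfer is 10x faster, which is why the CPU-GPU bus row in Table \ref{tab:systems} matters for a kernel that repeatedly stages data onto the accelerator.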
\subsection{GPU Kernel Performance}
Table \ref{tab:mlfmm_breakdown} shows the breakdown of MLFMM kernel times in the different execution environments.
\begin{figure}[b]
\begin{center}
\begin{tabular}{c}
\mbox{\psfig{figure=figures/example_fig0.pdf,width=8cm}}
\end{tabular}
\end{center}
\caption{A three-dimensional plot with gray-scale
format.}\label{fig1}
\end{figure}
This document is a template for authors preparing papers for the
CEM'17 Computing and Electromagnetics Workshop in Barcelona, Spain.