Updates from ShareLaTeX
@@ -47,13 +47,16 @@ Fig.~\ref{fig:app_breakdown} shows the amount of time the full inverse-solver ap
 ``BW (32T)'' corresponds to a 32-thread OpenMP parallel run on a single XE node, and S822LC corresponds to a 160-thread OpenMP parallel run on the S822LC node.
 Non-MLFMM operations are a minority of the time, and become an even smaller proportion of the time as the object reconstructions grow larger.

-\begin{figure}[b]
+\begin{figure}[h]
 \begin{center}
 \begin{tabular}{c}
 \mbox{\psfig{figure=figures/cpu_matvec.pdf,width=8cm}}
 \end{tabular}
 \end{center}
-\caption{A three-dimensional plot with gray-scale format.}
+\caption{
+Amount of application time spent in MLFMM for two different execution environments.
+MLFMM is the dominant component even with CPU parallelization on a single node.
+}
 \label{fig:app_breakdown}
 \end{figure}

@@ -61,30 +64,30 @@ Non-MLFMM operations are a minority of the time, and become an even smaller prop

 \section{MLFMM Results}

-As described in section \ref{sec:application} and shown in Table \ref{tab:components}, the MLFMM realization of matrix-vector multiplications forms the core computational kernel of the application, and its performance dominates that of the full inverse solver.
+As described in Section~\ref{sec:application} and shown in Fig.~\ref{fig:app_breakdown}, the MLFMM realization of matrix-vector multiplications forms the core computational kernel of the application, and its performance dominates that of the full inverse solver.
 This section presents an analysis of the performance of the MLFMM algorithm in three different environments.

 \subsection{Evaluation Environments}

-\begin{table}{}
-\centering \caption{Evaluation Systems} \label{tab:systems}
-\begin{tabular}{|c|c|c|c|}
-\hline & \textbf{XK Node} & \textbf{XE Node} & \textbf{S822LC} \\
-\hline
-\hline \textbf{CPU 1} & AMD Opteron 6276 & AMD Opteron 6276 & IBM Power8 \\
-\hline \textbf{CPU 2} & -- & AMD Opteron 6276 & IBM Power8 \\
-\hline
-\hline \textbf{GPU 1} & \makecell{K20X \\ (6 GB RAM) } & -- & P100 (16GB RAM) \\
-\hline \textbf{GPU 2} & -- & -- & P100 (16GB RAM) \\
-\hline \textbf{GPU 3} & -- & -- & P100 (16GB RAM) \\
-\hline \textbf{GPU 4} & -- & -- & P100 (16GB RAM) \\
-\hline \textbf{RAM} & 32GB & 64 GB & 512 GB \\
-\hline \makecell{\textbf{CPU-GPU} \\ \textbf{Bus}} & PCIe & -- & NVLink \\
-\hline
-\end{tabular}
-\end{table}
+%\begin{table}{}
+%\centering \caption{Evaluation Systems} \label{tab:systems}
+%\begin{tabular}{|c|c|c|c|}
+%\hline & \textbf{XK Node} & \textbf{XE Node} & \textbf{S822LC} \\
+%\hline
+%\hline \textbf{CPU 1} & AMD Opteron 6276 & AMD Opteron 6276 & IBM Power8 \\
+%\hline \textbf{CPU 2} & -- & AMD Opteron 6276 & IBM Power8 \\
+%\hline
+%\hline \textbf{GPU 1} & \makecell{K20X \\ (6 GB RAM) } & -- & P100 (16GB RAM) \\
+%\hline \textbf{GPU 2} & -- & -- & P100 (16GB RAM) \\
+%\hline \textbf{GPU 3} & -- & -- & P100 (16GB RAM) \\
+%\hline \textbf{GPU 4} & -- & -- & P100 (16GB RAM) \\
+%\hline \textbf{RAM} & 32GB & 64 GB & 512 GB \\
+%\hline \makecell{\textbf{CPU-GPU} \\ \textbf{Bus}} & PCIe & -- & NVLink \\
+%\hline
+%\end{tabular}
+%\end{table}

-The performance of MLFMM is evaluated in three different computing environments: Blue Waters XE nodes, Blue Waters XK nodes, and an IBM S822LC.
+The performance of MLFMM is evaluated on three different computing systems: Blue Waters XE nodes, Blue Waters XK nodes, and an IBM S822LC.
 The Blue Waters XE and XK nodes are two different kinds of computing nodes available on the Blue Waters supercomputer.
 Each Blue Waters node is a two-socket system: the XE node has two AMD Opteron 6276 CPUs, each with eight floating-point units, hardware support for 16 executing threads, and $32$~GB of RAM.
 The XK node replaces one of these CPUs with an NVIDIA K20X GPU with the Kepler architecture and $6$~GB of RAM.
@@ -100,7 +103,7 @@ The P100s are connected to the Power8 CPUs via $80$~GB/s NVLink connections.
 All evaluations are done on a problem with these parameters. \todo{get from mert}


-Fig.~\ref{fig:kernel_breakdown} shows the amount of of MLFMM execution time spent in computational kernels.
+Fig.~\ref{fig:mlfmm_bw} shows the amount of MLFMM execution time spent in computational kernels.

 \begin{figure}[b]
 \begin{center}
@@ -109,10 +112,10 @@ Fig.~\ref{fig:kernel_breakdown} shows the amount of of MLFMM execution time spe
 \end{tabular}
 \end{center}
 \caption{BW.}
-\label{fig:kernel_breakdown}
+\label{fig:mlfmm_bw}
 \end{figure}

-Fig.~\ref{fig:kernel_breakdown} shows the amount of of MLFMM execution time spent in computational kernels.
+Fig.~\ref{fig:mlfmm_minsky} shows the amount of MLFMM execution time spent in computational kernels.

 \begin{figure}[b]
 \begin{center}
@@ -120,8 +123,8 @@ Fig.~\ref{fig:kernel_breakdown} shows the amount of of MLFMM execution time spe
 \mbox{\psfig{figure=figures/mlfmm_minsky.pdf,width=8cm}}
 \end{tabular}
 \end{center}
-\caption{A three-dimensional plot with gray-scale format.}
-\label{fig:kernel_breakdown}
+\caption{S822LC.}
+\label{fig:mlfmm_minsky}
 \end{figure}


@@ -136,7 +139,7 @@ Fig.~\ref{fig:kernel_breakdown} shows the amount of of MLFMM execution time spe
 \mbox{\psfig{figure=figures/kernels.pdf,width=8cm}}
 \end{tabular}
 \end{center}
-\caption{A three-dimensional plot with gray-scale format.}
+\caption{Normalized breakdown of the computation time across different MLFMM kernels in different execution environments.}
 \label{fig:kernel_breakdown}
 \end{figure}

@@ -194,18 +197,18 @@ The papers are expected to be two-pages long.



-\begin{table}{}
-\centering \caption{Caption of the Table.} \label{table1}
-\begin{tabular}{|c|c|c|c|}
-\hline Item~1& Item~2
-& Item~3 & Item~4\\
-\hline\hline \multicolumn{4}{|c|}{Item~5} \\
-\hline Item~6&
-\multicolumn{3}{|c|}{Item~7}\\
-\hline Item~8 & Item~9 & Item~10 & Item~11\\
-\hline
-\end{tabular}
-\end{table}
+%\begin{table}{}
+%\centering \caption{Caption of the Table.} \label{table1}
+%\begin{tabular}{|c|c|c|c|}
+%\hline Item~1& Item~2
+%& Item~3 & Item~4\\
+%\hline\hline \multicolumn{4}{|c|}{Item~5} \\
+%\hline Item~6&
+%\multicolumn{3}{|c|}{Item~7}\\
+%\hline Item~8 & Item~9 & Item~10 & Item~11\\
+%\hline
+%\end{tabular}
+%\end{table}


