work
@@ -47,7 +47,7 @@ Fig.~\ref{fig:app_breakdown} shows the amount of time the full inverse-solver ap
 ``BW (32T)'' corresponds to a 32-thread OpenMP parallel run on a single XE node, and S822LC corresponds to a 160-thread OpenMP parallel run on the S822LC node.
 Non-MLFMM operations account for a minority of the run time, and their share shrinks further as the object reconstructions grow larger.
 
-\begin{figure}[h]
+\begin{figure}[ht]
 \begin{center}
 \begin{tabular}{c}
 \mbox{\psfig{figure=figures/cpu_matvec.pdf,width=8cm}}
@@ -56,6 +56,7 @@ Non-MLFMM operations are a minority of the time, and become an even smaller prop
 \caption{
 Amount of application time spent in MLFMM for two different execution environments.
 MLFMM is the dominant component even with CPU parallelization on a single node.
+As object reconstructions grow larger or more challenging, MLFMM's share of application time grows further.
 }
 \label{fig:app_breakdown}
 \end{figure}
@@ -103,27 +104,32 @@ The P100s are connected to the Power8 CPUs via $80$~GB/s NVLink connections.
 All evaluations are done on a problem with these parameters. \todo{get from mert}
 
 
-Fig.~\ref{fig:mlfmm_bw} shows the amount of of MLFMM execution time spent in computational kernels.
+Fig.~\ref{fig:mlfmm_bw} shows the MLFMM performance scaling on various Blue Waters configurations.
 
-\begin{figure}[b]
+\begin{figure}[htbp]
 \begin{center}
 \begin{tabular}{c}
 \mbox{\psfig{figure=figures/mlfmm_bw.pdf,width=8cm}}
 \end{tabular}
 \end{center}
-\caption{BW.}
+\caption{
+MLFMM performance scaling on Blue Waters (BW) configurations.
+}
 \label{fig:mlfmm_bw}
 \end{figure}
 
-Fig.~\ref{fig:mlfmm_minsky} shows the amount of MLFMM execution time spent in computational kernels.
+Fig.~\ref{fig:mlfmm_minsky} shows the MLFMM performance scaling on various S822LC configurations.
 
-\begin{figure}[b]
+
+\begin{figure}[htbp]
 \begin{center}
 \begin{tabular}{c}
 \mbox{\psfig{figure=figures/mlfmm_minsky.pdf,width=8cm}}
 \end{tabular}
 \end{center}
-\caption{S822LC.}
+\caption{
+MLFMM performance scaling on S822LC configurations.
+}
 \label{fig:mlfmm_minsky}
 \end{figure}
 
@@ -132,8 +138,12 @@ Fig.~\ref{fig:mlfmm_minsky} shows the amount of MLFMM execution time spent in co
 \subsection{Computation Kernel Breakdown}
 
 Fig.~\ref{fig:kernel_breakdown} shows how MLFMM execution time is divided among the computational kernels.
+\texttt{P2P} denotes the ``particle-to-particle'' (near-field) interactions.
+\texttt{P2M} and \texttt{M2M} are the lowest-level and higher-level aggregations, respectively.
+\texttt{L2L} and \texttt{L2P} are the higher-level and lowest-level disaggregations, respectively.
+\texttt{M2L} denotes the translations.
 
-\begin{figure}[b]
+\begin{figure}[htbp]
 \begin{center}
 \begin{tabular}{c}
 \mbox{\psfig{figure=figures/kernels.pdf,width=8cm}}
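The kernel names in the breakdown follow the usual fast-multipole convention. As a schematic sketch only (the symbols $\mathbf{Z}$, $\mathbf{x}$, $\mathbf{A}$, $\mathbf{T}$, and $\mathbf{D}$ are illustrative assumptions, not notation taken from the paper), one MLFMM matrix-vector product can be read as a near-field term plus a far-field pipeline of aggregation, translation, and disaggregation, where the upward and downward passes are applied level by level through the tree:

```latex
% Schematic MLFMM matrix-vector product (illustrative notation, not from the paper).
% A: aggregation (P2M at the lowest level, then M2M up the tree)
% T: translation (M2L between well-separated groups at each level)
% D: disaggregation (L2L down the tree, then L2P at the lowest level)
\begin{equation}
  \mathbf{Z}\mathbf{x} \approx
  \underbrace{\mathbf{Z}_{\mathrm{near}}\,\mathbf{x}}_{\mathtt{P2P}}
  + \underbrace{\mathbf{D}\,\mathbf{T}\,\mathbf{A}\,\mathbf{x}}_{\mathrm{far\ field}},
  \qquad
  \mathbf{A} = \mathtt{M2M}\circ\mathtt{P2M}, \quad
  \mathbf{T} = \mathtt{M2L}, \quad
  \mathbf{D} = \mathtt{L2P}\circ\mathtt{L2L}.
\end{equation}
```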
@@ -145,11 +155,11 @@ Fig.~\ref{fig:kernel_breakdown} shows the amount of of MLFMM execution time spe
 
 
 
-This document is a template for authors preparing papers for the
-CEM'17 Computing and Electromagnetics Workshop in Barcelona, Spain.
-The papers are required to use the IEEE style by following the
-instructions provided in this document. The language is English.
-The papers are expected to be two-pages long.
+%This document is a template for authors preparing papers for the
+%CEM'17 Computing and Electromagnetics Workshop in Barcelona, Spain.
+%The papers are required to use the IEEE style by following the
+%instructions provided in this document. The language is English.
+%The papers are expected to be two-pages long.
 
 
 \section{Text Format}