bench
Prototype C++11 MPI benchmark support library inspired by google/benchmark.
Benchmark Loop
An example ping-pong benchmark (bin/pingpong.cpp)
```cpp
#include "bench/bench.hpp"

#include <mpi.h>

#include <vector>

void pingpong(bench::State &state) {
  // Setup: runs before the timed loop.
  const int rank = bench::world_rank();
  const int size = bench::world_size(); // this benchmark needs at least 2 ranks
  const int sz = 1;                     // message size in bytes
  std::vector<char> sbuf(sz);
  std::vector<char> rbuf(sz);

  // Timed loop: the library decides how many iterations to run.
  for (auto _ : state) {
    if (0 == rank) {
      MPI_Send(sbuf.data(), sz, MPI_BYTE, 1, 0, MPI_COMM_WORLD);
      MPI_Recv(rbuf.data(), sz, MPI_BYTE, 1, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
    } else if (1 == rank) {
      MPI_Recv(rbuf.data(), sz, MPI_BYTE, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
      MPI_Send(sbuf.data(), sz, MPI_BYTE, 0, 0, MPI_COMM_WORLD);
    }
    // Ranks other than 0 and 1 do no work inside the loop.
  }

  // Teardown: record the number of bytes per iteration.
  state.set_bytes_processed(sz);
}

BENCH(pingpong)->timing_root_rank()->no_iter_barrier();
BENCH_MAIN()
```
The library will automatically determine the number of iterations to run.
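The README does not say how the iteration count is chosen. One common approach, roughly what google/benchmark does, is to keep increasing the count until a single run lasts at least some minimum time. The sketch below illustrates that idea only; pick_iterations, min_time_ns, max_iters and the growth factors are hypothetical and not part of bench. With MPI, every rank must run the same number of iterations, so a real implementation would likely decide the count on one rank and broadcast it (for example with MPI_Bcast).

```cpp
#include <algorithm>
#include <cstdint>
#include <functional>

// Hypothetical sketch, not bench's actual algorithm.
// `run(iters)` executes `iters` iterations and returns the elapsed time in ns.
// The count grows until one run takes at least `min_time_ns`.
std::int64_t pick_iterations(const std::function<double(std::int64_t)> &run,
                             double min_time_ns,
                             std::int64_t max_iters = 1000000000) {
  std::int64_t iters = 1;
  while (true) {
    const double elapsed_ns = run(iters);
    if (elapsed_ns >= min_time_ns || iters >= max_iters) {
      return iters;
    }
    // Extrapolate to the target with 40% headroom, growing at most 10x per step.
    double scale = elapsed_ns > 0 ? 1.4 * min_time_ns / elapsed_ns : 10.0;
    scale = std::min(scale, 10.0);
    const auto next =
        static_cast<std::int64_t>(static_cast<double>(iters) * scale);
    iters = std::min(std::max(next, iters + 1), max_iters);
  }
}
```

For a loop body that costs 100 ns per iteration and a 1 ms target, the count grows 1, 10, 100, 1000, 10000 and stops at 10000.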
When a benchmark runs:
- The library calls MPI_Barrier(MPI_COMM_WORLD), then calls pingpong.
- Setup code runs before the for (auto _ : state) loop.
- Each iteration of the loop contributes to the total time.
- After each iteration, the library calls MPI_Barrier(MPI_COMM_WORLD); the time spent in this barrier is not counted. Benchmark::no_iter_barrier() disables this barrier, as the example does.
- After the loop, benchmark-specific teardown runs.
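The loop structure above can be mimicked in plain C++ to see how a range-for over a state object can time each iteration while leaving the barrier untimed. MiniState is a hypothetical class, not bench's real bench::State; its between_iterations callback stands in for MPI_Barrier(MPI_COMM_WORLD), and passing an empty callback corresponds to no_iter_barrier().

```cpp
#include <chrono>
#include <cstdint>
#include <functional>
#include <utility>

// Hypothetical sketch of the loop mechanics; not bench's real State class.
// `between_iterations` stands in for MPI_Barrier(MPI_COMM_WORLD) and runs
// outside the timed region. An empty callback mimics no_iter_barrier().
class MiniState {
public:
  using Clock = std::chrono::steady_clock;

  MiniState(std::int64_t iterations, std::function<void()> between_iterations)
      : remaining_(iterations), between_(std::move(between_iterations)) {}

  // Minimal iterator so that `for (auto _ : state)` compiles.
  struct Iterator {
    MiniState *state;
    bool operator!=(const Iterator &) const { return state->keep_running(); }
    void operator++() {}
    int operator*() const { return 0; }
  };

  Iterator begin() { return Iterator{this}; }
  Iterator end() { return Iterator{this}; }

  std::chrono::nanoseconds elapsed() const { return elapsed_; }
  std::int64_t completed() const { return completed_; }

private:
  // Called before every loop body and once more after the last one.
  bool keep_running() {
    if (started_) {
      // Stop the clock for the iteration that just finished...
      elapsed_ += std::chrono::duration_cast<std::chrono::nanoseconds>(
          Clock::now() - start_);
      ++completed_;
      started_ = false;
      // ...then synchronize without counting the time.
      if (between_) {
        between_();
      }
    }
    if (remaining_ <= 0) {
      return false;
    }
    --remaining_;
    started_ = true;
    start_ = Clock::now(); // start timing the next loop body
    return true;
  }

  std::int64_t remaining_;
  std::int64_t completed_ = 0;
  bool started_ = false;
  std::function<void()> between_;
  Clock::time_point start_;
  std::chrono::nanoseconds elapsed_{0};
};
```

Because the clock restarts only after between_iterations() returns, barrier time never reaches elapsed().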
In the example, timing_root_rank() means the reported time is the elapsed time measured on the root rank (rank 0) only.
no_iter_barrier() means no MPI_Barrier() is called between loop iterations.
Reporting
The reported time is the average time per iteration, in nanoseconds.
If state.set_bytes_processed() is used, the value passed should be the number of bytes processed per iteration.
The reported throughput is then in bytes per second.
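For example, if 1,000 iterations take 2,000,000 ns in total and the benchmark calls state.set_bytes_processed(1), the report would show 2,000 ns per iteration and 1,000 bytes / 0.002 s = 500,000 bytes per second. A sketch of that arithmetic follows; Report and make_report are hypothetical names, not bench's reporter.

```cpp
#include <cstdint>

// Hypothetical helper illustrating the reported units; not part of bench.
struct Report {
  double ns_per_iter;   // average time per iteration
  double bytes_per_sec; // throughput derived from bytes per iteration
};

// Assumes iterations > 0 and total_ns > 0.
Report make_report(double total_ns, std::int64_t iterations,
                   std::int64_t bytes_per_iter) {
  Report r{};
  r.ns_per_iter = total_ns / static_cast<double>(iterations);
  // Total bytes moved divided by total seconds (1 s = 1e9 ns).
  const double total_bytes =
      static_cast<double>(bytes_per_iter) * static_cast<double>(iterations);
  r.bytes_per_sec = total_bytes * 1e9 / total_ns;
  return r;
}
```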
The following options can be chained onto BENCH(...):
- Benchmark::timing_max_rank(): report the maximum time consumed across all ranks.
- Benchmark::timing_root_rank(): only record the time on rank 0.
- Benchmark::no_iter_barrier(): do not call MPI_Barrier() between iterations.
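A minimal sketch of how the two timing options differ, assuming each rank has measured its own elapsed time for the whole loop. TimingMode and reported_time are hypothetical; across real MPI processes the maximum would typically be collected on rank 0 with MPI_Reduce using MPI_MAX, which this single-process sketch replaces with a vector indexed by rank.

```cpp
#include <algorithm>
#include <stdexcept>
#include <vector>

// Hypothetical illustration, not bench's code.
// per_rank_ns[i] holds the elapsed time measured on rank i.
enum class TimingMode { RootRank, MaxRank };

double reported_time(const std::vector<double> &per_rank_ns, TimingMode mode) {
  if (per_rank_ns.empty()) {
    throw std::invalid_argument("reported_time: no ranks");
  }
  switch (mode) {
  case TimingMode::RootRank:
    return per_rank_ns.front(); // only rank 0's clock counts
  case TimingMode::MaxRank:
    return *std::max_element(per_rank_ns.begin(), per_rank_ns.end());
  }
  return 0.0; // unreachable for valid enum values
}
```

In the ping-pong example, rank 0 waits for the reply inside every iteration, so its own elapsed time already covers each full round trip.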
Roadmap
- Automatic timing
  - timing_root_rank
  - timing_max_rank
  - timing_wall: the wall time from when the first rank starts to when the last rank ends
  - timing_aggregate: the aggregate of the time consumed on each rank
- Manual timing
  - state.pause_timing()
  - state.resume_timing()
  - state.set_iteration_time()
- Iteration control
  - manual
  - automatic
- Support running a benchmark over multiple communicators
  - Benchmark must take a communicator
  - All pairs of ranks
  - Specific pairs of ranks
- CSV reporter
- Add arguments to a benchmark
- Add statistics for repeated runs
  - trimean
  - standard deviation
  - min
  - max
- JSON reporter
- Benchmark registration
- static
- Auto-generated main function
- function pointer
- lambda function
- static
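The roadmap's timing_wall and timing_aggregate modes can be pinned down with a small sketch. RankSpan, wall_time and aggregate_time are hypothetical names, and the sketch assumes all ranks' timestamps come from a common clock, which MPI ranks on different nodes do not automatically share.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical sketch of two roadmap timing modes; bench does not provide them.
// Assumes every rank's start/end timestamps share one time base.
struct RankSpan {
  double start_ns;
  double end_ns;
};

// timing_wall: from the first rank's start to the last rank's end.
double wall_time(const std::vector<RankSpan> &spans) {
  if (spans.empty()) {
    return 0.0;
  }
  double first_start = spans.front().start_ns;
  double last_end = spans.front().end_ns;
  for (const RankSpan &s : spans) {
    first_start = std::min(first_start, s.start_ns);
    last_end = std::max(last_end, s.end_ns);
  }
  return last_end - first_start;
}

// timing_aggregate: the sum of the time each rank spent.
double aggregate_time(const std::vector<RankSpan> &spans) {
  double total = 0.0;
  for (const RankSpan &s : spans) {
    total += s.end_ns - s.start_ns;
  }
  return total;
}
```

For three ranks running over [0, 100], [10, 130] and [5, 60] ns, the wall time is 130 ns while the aggregate is 100 + 120 + 55 = 275 ns.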
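The manual-timing items would let a benchmark exclude per-iteration setup from the measurement. The sketch below shows one possible accumulator behind calls with those names; ManualTimer is hypothetical, and in this sketch set_iteration_time() simply adds an externally measured duration, for instance one taken with MPI_Wtime().

```cpp
#include <chrono>

// Hypothetical sketch of the roadmap's manual-timing calls; not part of bench.
// Time accumulates only while the timer is running.
class ManualTimer {
public:
  using Clock = std::chrono::steady_clock;

  // Start (or restart) accumulating time; no-op if already running.
  void resume_timing() {
    if (!running_) {
      running_ = true;
      t0_ = Clock::now();
    }
  }

  // Stop accumulating; time until the next resume_timing() is not counted.
  void pause_timing() {
    if (running_) {
      total_ += std::chrono::duration_cast<std::chrono::nanoseconds>(
          Clock::now() - t0_);
      running_ = false;
    }
  }

  // Add an externally measured duration given in seconds.
  void set_iteration_time(double seconds) {
    total_ += std::chrono::duration_cast<std::chrono::nanoseconds>(
        std::chrono::duration<double>(seconds));
  }

  // Accumulated time of completed (paused) segments.
  std::chrono::nanoseconds total() const { return total_; }

private:
  bool running_ = false;
  Clock::time_point t0_;
  std::chrono::nanoseconds total_{0};
};
```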
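For the "statistics for repeated runs" item, the four listed statistics could be computed from per-repetition times as below. summarize and quantile are hypothetical; the sketch uses the sample standard deviation and linearly interpolated quartiles for the trimean (Q1 + 2 * median + Q3) / 4, both choices the roadmap leaves open.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <stdexcept>
#include <vector>

struct Stats {
  double min;
  double max;
  double mean;
  double stddev;  // sample standard deviation (n - 1 denominator)
  double trimean; // (Q1 + 2 * median + Q3) / 4
};

// Quantile q in [0, 1] of already-sorted, non-empty data, linearly interpolated.
double quantile(const std::vector<double> &sorted, double q) {
  const double pos = q * static_cast<double>(sorted.size() - 1);
  const auto lo = static_cast<std::size_t>(std::floor(pos));
  const std::size_t hi = std::min(lo + 1, sorted.size() - 1);
  const double frac = pos - static_cast<double>(lo);
  return sorted[lo] + frac * (sorted[hi] - sorted[lo]);
}

Stats summarize(std::vector<double> samples) {
  if (samples.empty()) {
    throw std::invalid_argument("summarize: no samples");
  }
  std::sort(samples.begin(), samples.end());
  const double n = static_cast<double>(samples.size());

  double sum = 0.0;
  for (double x : samples) sum += x;
  const double mean = sum / n;

  double sq = 0.0;
  for (double x : samples) sq += (x - mean) * (x - mean);

  Stats s{};
  s.min = samples.front();
  s.max = samples.back();
  s.mean = mean;
  s.stddev = samples.size() > 1 ? std::sqrt(sq / (n - 1.0)) : 0.0;
  s.trimean = (quantile(samples, 0.25) + 2.0 * quantile(samples, 0.5) +
               quantile(samples, 0.75)) / 4.0;
  return s;
}
```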
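The registration items (static registration, an auto-generated main function, function pointers and lambdas) could be served by a registry of std::function objects, which accepts both function pointers and lambdas, filled by static initializers so that a generated main() can run every entry. Everything below (FakeState, registry, register_bench, SKETCH_BENCH, run_all) is hypothetical and is not necessarily how bench's BENCH() and BENCH_MAIN() work; a generated main() would simply call run_all().

```cpp
#include <functional>
#include <string>
#include <utility>
#include <vector>

struct FakeState {}; // hypothetical stand-in for bench::State

using BenchFn = std::function<void(FakeState &)>;

// Function-local static: safe to use from other static initializers.
std::vector<std::pair<std::string, BenchFn>> &registry() {
  static std::vector<std::pair<std::string, BenchFn>> entries;
  return entries;
}

// Accepts function pointers and lambdas alike via std::function.
bool register_bench(std::string name, BenchFn fn) {
  registry().emplace_back(std::move(name), std::move(fn));
  return true;
}

// Static registration: the initializer runs before main().
#define SKETCH_BENCH(fn) \
  static const bool fn##_registered = register_bench(#fn, fn)

// What a generated main() could call: run every registered benchmark.
int run_all() {
  FakeState state;
  int count = 0;
  for (auto &entry : registry()) {
    entry.second(state);
    ++count;
  }
  return count;
}

void empty_bench(FakeState &) {}
SKETCH_BENCH(empty_bench);
```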