# bench

Prototype C++11 MPI benchmark support library inspired by [google/benchmark](https://github.com/google/benchmark).

## Benchmark Loop

An example ping-pong benchmark (`bin/pingpong.cpp`):

```c++
#include "bench/bench.hpp"

#include <mpi.h>

void pingpong(bench::State &state) {

  // Setup: runs once, before the timed loop
  const int rank = bench::world_rank();
  const int size = bench::world_size();
  (void)size; // unused in this example

  const size_t sz = 1;      // message size in bytes
  char *sbuf = new char[sz](); // zero-initialized send buffer
  char *rbuf = new char[sz]();

  // Timed region: each iteration is one round trip between ranks 0 and 1
  for (auto _ : state) {
    if (0 == rank) {
      MPI_Send(sbuf, sz, MPI_BYTE, 1, 0, MPI_COMM_WORLD);
      MPI_Recv(rbuf, sz, MPI_BYTE, 1, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
    } else if (1 == rank) {
      MPI_Recv(rbuf, sz, MPI_BYTE, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
      MPI_Send(sbuf, sz, MPI_BYTE, 0, 0, MPI_COMM_WORLD);
    }
  }

  // Bytes per iteration, used for the bytes/second report
  state.set_bytes_processed(sz);

  // Teardown: runs once, after the timed loop
  delete[] sbuf;
  delete[] rbuf;
}

BENCH(pingpong)->timing_root_rank()->no_iter_barrier();

BENCH_MAIN()
```

The library automatically determines the number of iterations to run.

The benchmark proceeds as follows:

* Before the `pingpong` function is called, the library calls `MPI_Barrier(MPI_COMM_WORLD)`.
* `pingpong` is then called. Setup code runs before the `auto _ : state` loop.
* Each iteration of the loop contributes to the total time.
* After each iteration, an `MPI_Barrier(MPI_COMM_WORLD)` is invoked. Its time does not contribute to the total. This barrier can be disabled (see `Benchmark::no_iter_barrier()`).
* After the loop, benchmark-specific teardown occurs.

In the example above, `timing_root_rank()` says that the reported time should be the elapsed time measured on the root rank only.
`no_iter_barrier()` says that there should be no `MPI_Barrier()` between state iterations.

## Reporting

The reported time is the average ns/iteration.
If `state.set_bytes_processed` is used, the provided value should be the number of bytes per iteration.
The reported throughput is then given in bytes/second.

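
The reporting arithmetic described above can be sketched in a few lines. This is only an illustration of the two formulas (average ns/iteration, and bytes/second derived from bytes per iteration); the `Report` struct and `make_report` function are hypothetical names invented for this sketch and are not part of the library.

```c++
#include <cassert>
#include <cstdint>

// Hypothetical container for the two reported quantities (not library API).
struct Report {
  double ns_per_iter;   // average time of one iteration, in nanoseconds
  double bytes_per_sec; // throughput derived from bytes per iteration
};

// Hypothetical helper: derive the reported values from the total measured
// time, the iteration count, and the bytes-per-iteration value that a
// benchmark would pass to state.set_bytes_processed().
Report make_report(double total_ns, std::uint64_t iters,
                   std::uint64_t bytes_per_iter) {
  assert(iters > 0 && total_ns > 0.0);
  Report r;
  r.ns_per_iter = total_ns / static_cast<double>(iters);
  // 1 second = 1e9 ns, so bytes/s = bytes_per_iter * 1e9 / ns_per_iter
  r.bytes_per_sec = static_cast<double>(bytes_per_iter) * 1e9 / r.ns_per_iter;
  return r;
}
```

For instance, 1000 iterations taking 2 ms in total with 1 byte per iteration gives `make_report(2.0e6, 1000, 1)`, which works out to 2000 ns/iteration and 500000 bytes/second.
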
## Benchmark Options

* `Benchmark::timing_max_rank()`: report the maximum time consumed across all ranks.
* `Benchmark::timing_root_rank()`: only record time in rank 0.
* `Benchmark::no_iter_barrier()`: do not do an `MPI_Barrier()` between iterations.

## Roadmap

- [ ] Automatic Timing
  - [x] `timing_root_rank`
  - [x] `timing_max_rank`
  - [ ] `timing_wall`: the wall time from when the first rank starts to when the last rank ends
  - [ ] `timing_aggregate`: aggregate time consumed in each rank
- [ ] Manual timing
  - [x] `state.pause_timing()`
  - [x] `state.resume_timing()`
  - [ ] `state.set_iteration_time()`
- [ ] Iteration control
  - [ ] manual
  - [ ] automatic
- [ ] Support running a benchmark over multiple communicators
  - [ ] Benchmark must take a communicator
  - [ ] All pairs of ranks
  - [ ] Specific pairs of ranks
- [ ] CSV reporter
- [ ] Add arguments to a benchmark
- [ ] Add statistics for repeated runs
  - [ ] trimean
  - [ ] standard deviation
  - [ ] min
  - [ ] max
- [ ] JSON reporter
- [ ] Benchmark registration
  - [x] static
  - [x] Auto-generated main function
  - [x] function pointer
  - [ ] lambda function
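
The difference between the two timing modes already checked off above (`timing_root_rank` and `timing_max_rank`) can be illustrated without MPI by treating the per-rank elapsed times as a plain vector. This is a sketch of the semantics only: `root_rank_time` and `max_rank_time` are hypothetical helper names, and how the library gathers the per-rank times internally is not described here. In a real MPI program the maximum would typically be obtained with a reduction such as `MPI_Reduce` using `MPI_MAX`.

```c++
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical helpers (not library API). per_rank_ns[i] is assumed to hold
// the elapsed time measured on rank i, in nanoseconds.

// timing_root_rank semantics: only the time measured on rank 0 is reported.
double root_rank_time(const std::vector<double> &per_rank_ns) {
  assert(!per_rank_ns.empty());
  return per_rank_ns[0];
}

// timing_max_rank semantics: the slowest rank determines the reported time.
double max_rank_time(const std::vector<double> &per_rank_ns) {
  assert(!per_rank_ns.empty());
  return *std::max_element(per_rank_ns.begin(), per_rank_ns.end());
}
```

With per-rank times of 1500, 2100, and 1800 ns, the root-rank mode would report 1500 ns while the max-rank mode would report 2100 ns, which is why the max-rank mode is the more conservative choice when ranks do unequal work.
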