#include "bench/bench.hpp"

#include <mpi.h>

void pingpong(bench::State &state) {
  // Setup: runs once, outside the timed loop.
  const int rank = bench::world_rank();

  // Message size in bytes.
  const size_t sz = 1;

  // Value-initialize the buffers so the bytes sent are well defined.
  char *sbuf = new char[sz]();
  char *rbuf = new char[sz]();

  // Timed region: each pass through the loop is one iteration.
  for (auto _ : state) {
    if (0 == rank) {
      MPI_Send(sbuf, sz, MPI_BYTE, 1, 0, MPI_COMM_WORLD);
      MPI_Recv(rbuf, sz, MPI_BYTE, 1, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
    } else if (1 == rank) {
      MPI_Recv(rbuf, sz, MPI_BYTE, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
      MPI_Send(sbuf, sz, MPI_BYTE, 0, 0, MPI_COMM_WORLD);
    }
    // Ranks other than 0 and 1 do not communicate.
  }

  // Bytes per iteration, used to report throughput.
  state.set_bytes_processed(sz);

  // Teardown.
  delete[] sbuf;
  delete[] rbuf;
}

BENCH(pingpong)->timing_root_rank()->no_iter_barrier();
BENCH_MAIN()

The library will automatically determine the number of iterations to run.

Before pingpong is called, the library calls MPI_Barrier(MPI_COMM_WORLD). Setup code runs before the auto _ : state loop, and each iteration of the loop contributes to the total time. By default, the library invokes MPI_Barrier(MPI_COMM_WORLD) after each iteration; its time does not count toward the total (see Benchmark::no_iter_barrier()). Benchmark-specific teardown runs after the loop.

In this example, timing_root_rank() specifies that the reported time is the elapsed time measured on the root rank only, and no_iter_barrier() disables the MPI_Barrier() between iterations.
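The sequence described above can be sketched without MPI. This is only an illustration under stated assumptions, not the library's implementation: LoopResult and run_loop are hypothetical names, and the barrier is passed in as a callable that stands in for MPI_Barrier(MPI_COMM_WORLD).

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>
#include <functional>

// Hypothetical result type for this sketch; not part of the library.
struct LoopResult {
  std::int64_t iterations;        // loop iterations executed
  std::int64_t barriers;          // barrier calls, including the initial one
  std::chrono::nanoseconds timed; // time spent in the body only
};

// Hypothetical driver: one barrier up front, then timed iterations.
// The optional per-iteration barrier runs outside the timed span,
// so its cost does not count toward the total.
LoopResult run_loop(std::int64_t iterations, bool iter_barrier,
                    const std::function<void()> &body,
                    const std::function<void()> &barrier) {
  assert(iterations >= 0);
  using clock = std::chrono::steady_clock;
  LoopResult result{0, 0, std::chrono::nanoseconds{0}};

  barrier(); // synchronize before the benchmark starts
  ++result.barriers;

  for (std::int64_t i = 0; i < iterations; ++i) {
    const auto start = clock::now();
    body();
    result.timed += std::chrono::duration_cast<std::chrono::nanoseconds>(
        clock::now() - start);
    ++result.iterations;

    if (iter_barrier) { // skipped when no_iter_barrier() is requested
      barrier();
      ++result.barriers;
    }
  }
  return result;
}
```

Calling run_loop with iter_barrier set to false models the effect of no_iter_barrier(): only the initial barrier is issued.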

Reporting

The reported time is the average ns/iteration. If state.set_bytes_processed is used, the provided value should be the number of bytes per iteration, and the reported throughput is in bytes/second.
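As a worked example of the arithmetic, the following sketch derives both reported values from a total elapsed time and an iteration count. Report and make_report are hypothetical names chosen for illustration; the library's own reporter may compute or round these differently.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical report type for this sketch; not part of the library.
struct Report {
  double ns_per_iter;   // average time per iteration, in nanoseconds
  double bytes_per_sec; // throughput derived from bytes per iteration
};

// total_ns:       total timed duration across all iterations
// iterations:     number of iterations executed
// bytes_per_iter: the value passed to state.set_bytes_processed
Report make_report(std::int64_t total_ns, std::int64_t iterations,
                   std::int64_t bytes_per_iter) {
  assert(total_ns > 0 && iterations > 0 && bytes_per_iter >= 0);
  const double ns_per_iter =
      static_cast<double>(total_ns) / static_cast<double>(iterations);
  // One second is 1e9 ns, so bytes/second = bytes per iteration * 1e9 / ns per iteration.
  const double bytes_per_sec =
      static_cast<double>(bytes_per_iter) * 1e9 / ns_per_iter;
  return Report{ns_per_iter, bytes_per_sec};
}
```

For instance, 1000 iterations taking 2000 ns in total with 1 byte per iteration gives 2 ns/iteration and 5e8 bytes/second.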

Benchmark::timing_max_rank(): report the maximum time consumed across all ranks
Benchmark::timing_root_rank(): report only the time recorded on rank 0
Benchmark::no_iter_barrier(): do not call MPI_Barrier() between iterations
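The difference between the two timing policies can be illustrated on a plain vector of per-rank times, with index 0 standing in for rank 0. This is a sketch under that assumption: the function names are hypothetical, and in a real MPI program the maximum would typically be gathered with a reduction such as MPI_Reduce with MPI_MAX rather than computed from a local vector.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical helpers for this sketch; not part of the library.
// per_rank[i] is the elapsed time measured on rank i.

// Root-rank policy: only rank 0's measurement is reported.
double reported_root_rank(const std::vector<double> &per_rank) {
  assert(!per_rank.empty());
  return per_rank[0];
}

// Max-rank policy: the slowest rank determines the reported time.
double reported_max_rank(const std::vector<double> &per_rank) {
  assert(!per_rank.empty());
  return *std::max_element(per_rank.begin(), per_rank.end());
}
```

The two policies agree only when rank 0 happens to be the slowest rank; otherwise the max-rank value is larger.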

Roadmap

Automatic timing
- timing_root_rank
- timing_max_rank
- timing_wall: the wall time from when the first rank starts to when the last rank ends
- timing_aggregate: aggregate time consumed in each rank
Manual timing
- state.pause_timing()
- state.resume_timing()
- state.set_iteration_time()
Iteration control
- manual
- automatic
Support running a benchmark over multiple communicators
- Benchmark must take a communicator
- All pairs of ranks
- Specific pairs of ranks
CSV reporter
Add arguments to a benchmark
Add statistics for repeated runs
- trimean
- standard deviation
- min
- max
JSON reporter
Benchmark registration
- static
  - Auto-generated main function
- function pointer
- lambda function
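The statistics listed in the roadmap for repeated runs (trimean, standard deviation, min, max) could be computed along the following lines. This is a sketch, not a planned implementation: RunStats and summarize are hypothetical names, quartiles are taken by linear interpolation between closest ranks (one of several common conventions), and the standard deviation is the sample form with an n - 1 divisor.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical summary type for this sketch; not part of the library.
struct RunStats {
  double trimean;
  double stddev;
  double min;
  double max;
};

// Quantile of sorted data by linear interpolation between closest ranks.
double quantile_sorted(const std::vector<double> &sorted, double p) {
  const double pos = p * static_cast<double>(sorted.size() - 1);
  const std::size_t lo = static_cast<std::size_t>(std::floor(pos));
  const std::size_t hi = std::min(lo + 1, sorted.size() - 1);
  const double frac = pos - static_cast<double>(lo);
  return sorted[lo] + frac * (sorted[hi] - sorted[lo]);
}

// Summarize the per-run times of a repeated benchmark.
RunStats summarize(std::vector<double> samples) {
  assert(samples.size() >= 2);
  std::sort(samples.begin(), samples.end());

  const double q1 = quantile_sorted(samples, 0.25);
  const double q2 = quantile_sorted(samples, 0.50);
  const double q3 = quantile_sorted(samples, 0.75);

  double sum = 0.0;
  for (double s : samples) sum += s;
  const double mean = sum / static_cast<double>(samples.size());

  double sq = 0.0;
  for (double s : samples) sq += (s - mean) * (s - mean);
  const double stddev =
      std::sqrt(sq / static_cast<double>(samples.size() - 1));

  // Trimean: (Q1 + 2 * median + Q3) / 4.
  return RunStats{(q1 + 2.0 * q2 + q3) / 4.0, stddev, samples.front(),
                  samples.back()};
}
```

The trimean weights the median twice as heavily as each quartile, which makes it less sensitive to outlier runs than the mean.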