perfect
CPU/GPU Performance control library for benchmarking on Linux, x86, POWER, and Nvidia.
Features
- GPU power/utilization/temperature monitoring (nvidia)
- Disable CPU turbo (linux)
- Set OS CPU performance mode to maximum (linux)
- Set GPU clocks (nvidia)
- Disable GPU turbo (nvidia)
- Flush addresses from cache (amd64, POWER)
- CUDA not required (GPU functions will not be compiled)
- Flush file system caches (linux)
Installing
CMake
Ensure you have CMake 3.13+.
Add the source tree to your project and then use add_subdirectory
git submodule add git@github.com:cwpearson/perfect.git thirdparty/perfect
CMakeLists.txt
...
add_subdirectory(thirdparty/perfect)
...
target_link_libraries(your-target perfect)
Without CMake
Download the source and
- for compiling with a non-CUDA compiler:
  - add the include directory to your includes
  - add -lnvidia-ml to your link flags
  - add -DPERFECT_HAS_CUDA to your compile definitions
- with a CUDA compiler, just compile normally (PERFECT_HAS_CUDA is defined for you)
g++ code_using_perfect.cpp -DPERFECT_HAS_CUDA -Iperfect/include -lnvidia-ml
nvcc code_using_perfect.cu -Iperfect/include -lnvidia-ml
If you don't have CUDA, then you could just do
g++ code_using_perfect.cpp -I perfect/include
Usage
The perfect functions all return a perfect::Result, which is defined in include/perfect/result.hpp.
When things are working, it will be perfect::Result::SUCCESS.
A PERFECT macro is also defined, which will terminate with an error message unless the perfect::Result is perfect::Result::SUCCESS.
perfect::CpuTurboState state;
PERFECT(perfect::get_cpu_turbo_state(&state));
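The check-and-terminate pattern described above can be sketched roughly as follows. This is an illustration only, not the library's definition: the `Result` enumerators other than `SUCCESS`, the `to_string` and `check` helpers, the `CHECK` macro name, and `read_setting` are all hypothetical stand-ins for what include/perfect/result.hpp and the PERFECT macro provide.

```cpp
#include <cassert>
#include <cstdio>
#include <cstdlib>

// Hypothetical stand-in for perfect::Result. Only SUCCESS is named in the
// text above; the other enumerators are invented for illustration.
enum class Result { SUCCESS, NO_PERMISSION, UNKNOWN };

// Hypothetical helper: human-readable name for a Result value.
inline const char *to_string(Result r) {
  switch (r) {
  case Result::SUCCESS: return "success";
  case Result::NO_PERMISSION: return "no permission";
  default: return "unknown error";
  }
}

// Print the failing call site and terminate unless r is SUCCESS.
inline void check(Result r, const char *file, int line) {
  if (r != Result::SUCCESS) {
    std::fprintf(stderr, "%s:%d: %s\n", file, line, to_string(r));
    std::exit(EXIT_FAILURE);
  }
}

// Hypothetical analogue of the PERFECT macro: captures file and line.
#define CHECK(expr) check((expr), __FILE__, __LINE__)

// Example function that follows the "return a Result" convention.
inline Result read_setting(int *out) {
  assert(out != nullptr);
  *out = 42;
  return Result::SUCCESS;
}
```

Wrapping each call in the macro keeps benchmark setup code short while still failing loudly when a privileged operation is refused.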
Monitoring
perfect can monitor and record GPU activity.
#include "perfect/gpu_monitor.hpp"
- Monitor(std::ostream *stream): create a monitor that will write to stream.
- void Monitor::start(): start the monitor
- void Monitor::stop(): terminate the monitor
- void Monitor::pause(): pause the monitor thread
- void Monitor::resume(): resume the monitor thread
Flush file system caches
perfect can drop various filesystem caches.
See tools/sync_drop_caches.cpp
#include "perfect/drop_caches.hpp"
- Result sync(): flush filesystem caches to disk
- Result drop_caches(DropCaches_t mode): remove file system caches
  - mode = PAGECACHE: drop page caches
  - mode = ENTRIES: drop dentries and inodes
  - mode = PAGECACHE | ENTRIES: both
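For readers curious about the underlying mechanism, the following is a minimal sketch of how dropping caches is commonly done on Linux: call sync(), then write 1, 2, or 3 to /proc/sys/vm/drop_caches. It is not taken from the library's source; the `DropMode` enum, the `drop_caches_sketch` name, and the `path` parameter are assumptions made so the sketch can be tried on a scratch file without root.

```cpp
#include <cassert>
#include <fstream>
#include <string>
#include <unistd.h> // sync()

// Hypothetical mode flags mirroring the PAGECACHE / ENTRIES modes above.
// The numeric values are those the Linux kernel documents for
// /proc/sys/vm/drop_caches: 1 = page cache, 2 = dentries and inodes, 3 = both.
enum DropMode : int { PAGECACHE = 1, ENTRIES = 2 };

// Write the mode to the control file. Returns false if the file cannot be
// opened or written (for example, when not running as root).
// `path` is a parameter only so the sketch can target a scratch file.
inline bool drop_caches_sketch(
    int mode, const std::string &path = "/proc/sys/vm/drop_caches") {
  assert(mode >= 1 && mode <= 3);
  sync(); // dirty pages cannot be dropped, so write them back first
  std::ofstream f(path);
  if (!f) return false;
  f << mode << '\n';
  f.flush();
  return f.good();
}
```

Calling `drop_caches_sketch(PAGECACHE | ENTRIES)` as root would request both kinds of cache to be dropped; as an unprivileged user it is expected to return false.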
CPU Turbo
perfect can enable and disable CPU boost through the Intel p-state mechanism or the ACPI cpufreq mechanism.
#include "perfect/cpu_turbo.hpp"
- Result get_cpu_turbo_state(CpuTurboState *state): save the current CPU turbo state
- Result set_cpu_turbo_state(CpuTurboState *state): restore a saved CPU turbo state
- Result disable_cpu_turbo(): disable CPU turbo
- Result enable_cpu_turbo(): enable CPU turbo
- bool is_turbo_enabled(CpuTurboState state): check if turbo is enabled
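As background, the two mechanisms mentioned above are exposed by Linux through sysfs files. The sketch below shows one way the current turbo state could be queried from them. It is an assumption-laden illustration rather than the library's implementation: the `Turbo` enum, `read_int`, `query_turbo`, and the `root` parameter are hypothetical names.

```cpp
#include <cassert>
#include <fstream>
#include <string>

// Hypothetical three-valued result for this sketch.
enum class Turbo { ENABLED, DISABLED, UNKNOWN };

// Read a single integer from a file; returns false if it cannot be read.
inline bool read_int(const std::string &path, int *out) {
  assert(out != nullptr);
  std::ifstream f(path);
  return static_cast<bool>(f >> *out);
}

// Query turbo through the two Linux sysfs interfaces.
// `root` defaults to the real sysfs CPU directory and is a parameter only so
// the sketch can be exercised against a scratch directory.
inline Turbo query_turbo(const std::string &root = "/sys/devices/system/cpu") {
  int v = 0;
  // intel_pstate driver: no_turbo == 1 means turbo is disabled.
  if (read_int(root + "/intel_pstate/no_turbo", &v))
    return v == 0 ? Turbo::ENABLED : Turbo::DISABLED;
  // acpi-cpufreq driver: boost == 1 means boost is enabled.
  if (read_int(root + "/cpufreq/boost", &v))
    return v != 0 ? Turbo::ENABLED : Turbo::DISABLED;
  return Turbo::UNKNOWN;
}
```

Note the inverted sense of the two files: the Intel file records that turbo is off, while the ACPI file records that boost is on. Changing either one normally requires root.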
OS Performance
perfect can control the OS governor on Linux.
See examples/os_perf.cpp.
#include "perfect/os_perf.hpp"
- Result get_os_perf_state(OsPerfState *state, const int cpu): Save the current OS governor mode for CPU cpu.
- Result os_perf_state_maximum(const int cpu): Set the OS governor to its maximum performance mode.
- Result set_os_perf_state(const int cpu, OsPerfState state): Restore a previously-saved OS governor mode.
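The save, maximize, restore sequence maps naturally onto the Linux cpufreq sysfs interface, where each CPU has a scaling_governor file. The following sketch shows that mechanism under stated assumptions: `governor_path`, `get_governor`, `set_governor`, and the `root` parameter are hypothetical names, and nothing here is claimed to match the library's internals.

```cpp
#include <cassert>
#include <fstream>
#include <string>

// Build the sysfs path of the cpufreq governor file for one CPU.
inline std::string governor_path(
    int cpu, const std::string &root = "/sys/devices/system/cpu") {
  assert(cpu >= 0);
  return root + "/cpu" + std::to_string(cpu) + "/cpufreq/scaling_governor";
}

// Read the current governor name (e.g. "powersave"); false if unreadable.
inline bool get_governor(const std::string &path, std::string *out) {
  assert(out != nullptr);
  std::ifstream f(path);
  return static_cast<bool>(f >> *out);
}

// Write a governor name. "performance" is the standard cpufreq governor that
// requests the highest frequency within the scaling limits. Writing to the
// real sysfs file usually requires root.
inline bool set_governor(const std::string &path, const std::string &name) {
  std::ofstream f(path);
  if (!f) return false;
  f << name << '\n';
  f.flush();
  return f.good();
}
```

A benchmark harness would typically read the governor into a string before the run, write "performance", and write the saved string back afterwards.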
GPU Turbo
perfect can enable/disable GPU turbo boost.
#include "perfect/gpu_turbo.hpp"
- Result get_gpu_turbo_state(GpuTurboState *state, unsigned int idx): Get the current turbo state for GPU idx, useful to restore later.
- bool is_turbo_enabled(GpuTurboState state): Check if turbo is enabled.
- Result set_gpu_turbo_state(GpuTurboState state, unsigned int idx): Set a previously saved turbo state.
- Result disable_gpu_turbo(unsigned int idx): Disable GPU idx turbo.
- Result enable_gpu_turbo(unsigned int idx): Enable GPU idx turbo.
GPU Clocks
perfect can lock GPU clocks to their maximum values.
#include "perfect/gpu_clocks.hpp"
- Result set_max_gpu_clocks(unsigned int idx): Set GPU idx clocks to their maximum reported values.
- Result reset_gpu_clocks(unsigned int idx): Unset GPU idx clocks.
CPU Cache
perfect can flush data from CPU caches. Unlike the other APIs, these do not return a Result because they do not fail.
#include "perfect/cpu_cache.hpp"
- void flush_all(void *p, const size_t n): Flush all cache lines starting at p for n bytes.
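To make the idea of flushing a byte range concrete, here is a rough x86-64 sketch using the `_mm_clflush` and `_mm_mfence` intrinsics. It assumes a 64-byte cache line and does nothing on other architectures (POWER would need its own instructions). The names `kLineSize`, `lines_spanned`, and `flush_range` are hypothetical and are not the library's implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#if defined(__x86_64__) || defined(_M_X64)
#include <emmintrin.h> // _mm_clflush, _mm_mfence
#endif

// Assumed cache-line size: 64 bytes is typical on x86-64 but not queried here.
constexpr std::size_t kLineSize = 64;

// Number of cache lines that overlap the byte range [p, p + n).
inline std::size_t lines_spanned(const void *p, std::size_t n) {
  if (n == 0) return 0;
  const auto a = reinterpret_cast<std::uintptr_t>(p);
  const std::uintptr_t first = a / kLineSize;
  const std::uintptr_t last = (a + n - 1) / kLineSize;
  return static_cast<std::size_t>(last - first + 1);
}

// Flush every cache line overlapping [p, p + n).
// On targets other than x86-64 this sketch is a no-op.
inline void flush_range(const void *p, std::size_t n) {
  assert(p != nullptr || n == 0);
  if (n == 0) return;
#if defined(__x86_64__) || defined(_M_X64)
  const auto a = reinterpret_cast<std::uintptr_t>(p);
  const std::uintptr_t begin = a - (a % kLineSize); // align down to a line
  for (std::uintptr_t addr = begin; addr < a + n; addr += kLineSize)
    _mm_clflush(reinterpret_cast<const void *>(addr));
  _mm_mfence(); // order the flushes before later loads and stores
#else
  (void)p;
#endif
}
```

Flushing does not change the contents of the buffer; it only evicts the lines so that the next access has to go to memory, which is what a cold-cache benchmark wants.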
Changelog
- v0.3.0
  - Add filesystem cache interface
- v0.2.0
  - Add GPU monitoring
  - Make CUDA optional
- v0.1.0
  - cache control
  - Intel P-State control
  - Linux governor control
  - POWER cpufreq control
  - Nvidia GPU boost control
  - Nvidia GPU clock control
Wish List
- only monitor certain GPUs
- hyperthreading interface
- ASLR interface
- process priority interface
- A wrapper utility
  - disable hyperthreading
  - reserve cores
  - set process priority
  - disable ASLR
Related
- LLVM benchmarking instructions covering ASLR, Linux governor, cpuset shielding, SMT, and Intel turbo.
- easyperf.net blog post discussing ACPI/Intel turbo, SMT, Linux governor, CPU affinity, process priority, file system caches, and ASLR.
- temci benchmarking tool for CPU shielding and disabling hyperthreading, among other things.
- perflock tool for locking CPU frequency scaling domains