20 Commits

Author SHA1 Message Date
Carl Pearson
fabfecd306 Update README.md
Some checks failed
CI / build_cuda10-1 (push) Failing after 10s
CI / build (push) Failing after 2s
2019-10-02 07:34:58 -05:00
Carl Pearson
4c0eabed89 add a tool to fix broken cpusets 2019-10-01 14:48:00 -05:00
Carl Pearson
46ca4d00ef perfect-cli cleans up on SIGINT, fixed a problem where cpu_set would silently fail 2019-10-01 14:31:36 -05:00
Carl Pearson
bbda6e1262 add interface for scheduling priority 2019-10-01 06:55:50 -05:00
Carl Pearson
343b2b35ca remove test from actions on CUDA job 2019-09-30 15:08:08 -05:00
Carl Pearson
c28e7b0945 add -h --help flag 2019-09-30 13:23:25 -05:00
Carl Pearson
46aa8c85ac run build/tools/perfect-cli -h in test step 2019-09-30 13:07:19 -05:00
Carl Pearson
7b6332c90e add test -h to binary 2019-09-30 12:07:29 -05:00
Carl Pearson
cc92923509 drop fs caches before each iteration 2019-09-30 12:04:52 -05:00
Carl Pearson
09e8757f72 . 2019-09-30 11:56:08 -05:00
Carl Pearson
1695ebb8ea Add -n flag, change --no-aslr to --aslr, add --stdout and --stderr, chown outputs when run with sudo 2019-09-30 11:51:04 -05:00
Carl Pearson
158bffa61f always change CPU turbo state 2019-09-26 12:30:10 -05:00
Carl Pearson
057fec7411 --no-cpu-turbo -> --cpu-turbo 2019-09-26 12:24:37 -05:00
Carl Pearson
a8d83417e8 add drop fs caches to tools/perfect-cli 2019-09-26 11:02:53 -05:00
Carl Pearson
1b3cf604a8 OsPerfState saves for all CPUs 2019-09-26 10:58:01 -05:00
Carl Pearson
d576ac099d add tools/perfect-cli 2019-09-26 10:37:26 -05:00
Carl Pearson
aff90d408e add NO_TASK result 2019-09-26 10:37:14 -05:00
Carl Pearson
6ace6932a7 simplify addrs 2019-09-26 08:56:46 -05:00
Carl Pearson
33243fe3bb add some discussion of ASLR tools 2019-09-25 15:49:20 -05:00
Carl Pearson
64eb67cc2d add tools/addrs 2019-09-25 15:45:18 -05:00
18 changed files with 9426 additions and 119 deletions

@@ -38,6 +38,7 @@ jobs:
g++ --version
nvcc --version
make VERBOSE=1
build:
runs-on: ubuntu-latest
steps:
@@ -61,3 +62,6 @@ jobs:
cd build
g++ --version
make VERBOSE=1
- name: test
run: |
build/tools/perfect-cli -h

124
README.md

@@ -17,6 +17,7 @@ CPU/GPU Performance control library for benchmarking on Linux, x86, POWER, and N
- [x] CUDA not required (GPU functions will not be compiled)
- [x] Flush file system caches (linux)
- [x] Disable ASLR (linux)
- [x] process priority interface (linux)
## Contributors
* [Carl Pearson](https://cwpearson.github.io)
@@ -59,7 +60,87 @@ If you don't have CUDA, then you could just do
g++ code_using_perfect.cpp -I perfect/include
```
## Usage
## Tools Usage
### tools/perfect-cli
`perfect` provides some useful tools on Linux:
```
$ tools/perfect-cli -h
SYNOPSIS
./tools/perfect-cli --no-mod [-n <INT>] -- <cmd>...
./tools/perfect-cli ([-u <INT>] | [-s <INT>]) [--no-drop-cache] [--no-max-perf] [--aslr]
[--cpu-turbo] [--stdout <PATH>] [--stderr <PATH>] [-n <INT>] -- <cmd>...
OPTIONS
--no-mod don't control performance
-u number of unshielded CPUs
-s number of shielded CPUs
--no-drop-cache do not drop filesystem caches
--no-max-perf do not max os perf
--aslr enable ASLR
--cpu-turbo enable CPU turbo
--stdout redirect child stdout
--stderr redirect child stderr
-n run multiple times
```
The basic usage is `tools/perfect-cli -- my-exe`, which will attempt to configure the system for repeatable performance before executing `my-exe`, and then restore the system to the original performance state before exiting.
Most modifications require elevated privileges.
The default behavior is to:
* disable ASLR
* set CPU performance to maximum
* disable CPU turbo
* drop filesystem caches before each iteration
Some options (all should be provided before the `--` separator):
* `--no-mod` flag will cause `perfect-cli` to not modify the system performance state
* `-n INT` will run the requested program `INT` times.
* `--stderr`/`--stdout` will redirect the program-under-test's stderr and stdout to the provided paths.
* `-s`/`-u`: set the number of shielded/unshielded CPUs. The program-under-test will run on the shielded CPUs. All other tasks will run on the unshielded CPUs.
A common invocation might look like:
```
sudo tools/perfect-cli -n 5 --stderr=run.err --stdout=run.out -- ./my-benchmark
```
This will disable ASLR, set CPU performance to maximum, disable CPU turbo, and then run `./my-benchmark` 5 times, dropping the filesystem caches before each run and redirecting the stdout/stderr of `./my-benchmark` to `run.out`/`run.err`.
The owner of `run.out` and `run.err` will be set to whichever user called `sudo`.
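
The ownership change described above can be sketched as follows. This is only an illustration under stated assumptions, not the code in `tools/perfect.cpp`: `Owner`, `lookup_owner`, and `chown_to_user` are hypothetical names, and the sketch checks the `getpwnam` result before using it.

```cpp
#include <cassert>
#include <string>

#include <pwd.h>
#include <sys/types.h>
#include <unistd.h>

// Hypothetical helper type (not part of perfect): numeric owner of a file.
struct Owner {
  uid_t uid;
  gid_t gid;
};

// Look up the uid/gid of `user`. Returns false if the user is unknown.
bool lookup_owner(const std::string &user, Owner &owner) {
  assert(!user.empty() && "user name must not be empty");
  const struct passwd *pwd = getpwnam(user.c_str());
  if (pwd == nullptr) {
    return false; // never dereference a failed lookup
  }
  owner.uid = pwd->pw_uid;
  owner.gid = pwd->pw_gid;
  return true;
}

// Hand `path` over to `user`. Returns false if the lookup or chown fails.
bool chown_to_user(const std::string &path, const std::string &user) {
  Owner owner{};
  if (!lookup_owner(user, owner)) {
    return false;
  }
  return chown(path.c_str(), owner.uid, owner.gid) == 0;
}
```

A caller would typically read the user name from the `SUDO_USER` environment variable and skip the ownership change when it is unset.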
### tools/addrs
Print the address of `main`, a stack variable, and a heap variable.
Useful for demoing ASLR.
### tools/no-aslr
Disable ASLR for the provided command.
With ASLR, addresses are different with each invocation:
```
$ tools/addrs
main: 94685074364704
stack: 140734279743492
heap: 94685084978800
$ tools/addrs
main: 93891046344992
stack: 140722671706708
heap: 93891068624496
```
Without ASLR, addresses are the same in each invocation:
```
$ tools/no-aslr tools/addrs
main: 93824992233760
stack: 140737488347460
heap: 93824994414192
$ tools/no-aslr tools/addrs
main: 93824992233760
stack: 140737488347460
heap: 93824994414192
```
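
One common way to get this behavior on Linux is the `personality(2)` system call with the `ADDR_NO_RANDOMIZE` flag. The sketch below assumes that approach; the function names are hypothetical and it is not necessarily how `tools/no-aslr` is implemented. The flag affects programs executed after the call, so a wrapper would set it and then `exec` the target.

```cpp
#include <cassert>

#include <sys/personality.h>

// True if the calling process already has address randomization disabled.
bool aslr_disabled() {
  // personality(0xffffffff) queries the persona without changing it
  const int persona = personality(0xffffffff);
  return persona != -1 && (persona & ADDR_NO_RANDOMIZE) != 0;
}

// Ask the kernel to disable ASLR for this process and the programs it
// executes afterwards. Returns false if the kernel refuses, which can
// happen in sandboxes that filter the personality system call.
bool disable_aslr_for_self() {
  const int old = personality(0xffffffff);
  if (old == -1) {
    return false;
  }
  return personality(static_cast<unsigned long>(old) | ADDR_NO_RANDOMIZE) != -1;
}
```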
## API Usage
The `perfect` functions all return a `perfect::Result`, which is defined in [include/perfect/result.hpp].
When things are working, it will be `perfect::Result::SUCCESS`.
@@ -70,7 +151,19 @@ perfect::CpuTurboState state;
PERFECT(perfect::get_cpu_turbo_state(&state));
```
## Monitoring
### High Priority
`perfect` can set a high scheduling priority for the calling process.
See [examples/high_priority.cpp](examples/high_priority.cpp)
```c++
#include "perfect/priority.hpp"
```
* `Result set_high_priority()`: set the highest possible scheduling priority for the calling process
### Monitoring
`perfect` can monitor and record GPU activity.
@@ -100,6 +193,7 @@ See [tools/no_aslr.cpp](tools/no_aslr.cpp)
* `Result get_aslr(AslrState &state)`: save the current ASLR state
* `Result set_aslr(const AslrState &state)`: set a previously-saved ASLR state
### Flush file system caches
`perfect` can drop various filesystem caches
@@ -111,7 +205,7 @@ See [tools/sync_drop_caches.cpp](tools/sync_drop_caches.cpp)
```
* `Result sync()`: flush filesystem caches to disk
* `Result drop_caches(DropCaches_t mode)`: remove file system caches
* `Result drop_caches(DropCaches_t mode = DropCaches_t(PAGECACHE | ENTRIES))`: remove file system caches
* `mode = PAGECACHE`: drop page caches
* `mode = ENTRIES`: drop dentries and inodes
* `mode = PAGECACHE | ENTRIES`: both
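
On Linux these modes correspond to the values written to `/proc/sys/vm/drop_caches`: `1` for the page cache, `2` for dentries and inodes, `3` for both. The sketch below shows that mapping. The enum values and `drop_caches_value` are hypothetical stand-ins chosen for illustration, not the definitions in `perfect`.

```cpp
#include <cassert>
#include <string>

// Hypothetical flag values, chosen to mirror the kernel's encoding.
enum DropCaches : int { PAGECACHE = 1, ENTRIES = 2 };

// Value to write to /proc/sys/vm/drop_caches for `mode`.
// Returns an empty string when no flag is set.
std::string drop_caches_value(int mode) {
  assert((mode & ~(PAGECACHE | ENTRIES)) == 0 && "unknown flag");
  // test each flag separately; `mode & PAGECACHE & ENTRIES` would always be
  // zero for distinct bits
  const bool pages = (mode & PAGECACHE) != 0;
  const bool entries = (mode & ENTRIES) != 0;
  if (pages && entries) {
    return "3";
  }
  if (entries) {
    return "2";
  }
  if (pages) {
    return "1";
  }
  return "";
}
```

Calling `sync()` first matters because dirty pages cannot be dropped until they have been written back.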
@@ -143,9 +237,9 @@ See [examples/os_perf.cpp](examples/os_perf.cpp).
#include "perfect/os_perf.hpp"
```
* `Result get_os_perf_state(OsPerfState *state, const int cpu)`: Save the current OS governor mode for CPU `cpu`.
* `Result get_os_perf_state(OsPerfState &state)`: Save the current OS governor mode for all CPUs.
* `Result os_perf_state_maximum(const int cpu)`: Set the OS governor to its maximum performance mode.
* `Result set_os_perf_state(const int cpu, OsPerfState state)`: Restore a previously-saved OS governor mode.
* `Result set_os_perf_state(OsPerfState state)`: Restore a previously-saved OS governor mode.
### GPU Turbo
@@ -188,6 +282,7 @@ See [examples/cpu_cache.cpp](examples/cpu_cache.cpp).
* `void flush_all(void *p, const size_t n)`: Flush all cache lines starting at `p` for `n` bytes.
## Changelog
* v0.5.0
@@ -217,16 +312,17 @@ See [examples/cpu_cache.cpp](examples/cpu_cache.cpp).
- [ ] only monitor certain GPUs
- [ ] hyperthreading interface
- [ ] process priority interface
- [ ] A wrapper utility
- [ ] disable hyperthreading
- [ ] reserve cores
- [ ] set process priority
- [ ] disable ASLR
## Related
* [LLVM benchmarking instructions](https://llvm.org/docs/Benchmarking.html#linux) covering ASLR, Linux governor, cpuset shielding, SMT, and Intel turbo.
* [easyperf.net](https://easyperf.net/blog/2019/08/02/Perf-measurement-environment-on-Linux#2-disable-hyper-threading) blog post discussing ACPI/Intel turbo, SMT, Linux governor, CPU affinity, process priority, file system caches, and ASLR.
* [temci](https://github.com/parttimenerd/temci) benchmarking tool for cpu shielding and disabling hyperthreading, among other things.
* [perflock](https://github.com/aclements/perflock) tool for locking CPU frequency scaling domains
* [easyperf.net blog post](https://easyperf.net/blog/2019/08/02/Perf-measurement-environment-on-Linux#2-disable-hyper-threading) discussing ACPI/Intel turbo, SMT, Linux governor, CPU affinity, process priority, file system caches, and ASLR.
* [parttimenerd/temci](https://github.com/parttimenerd/temci) benchmarking tool for cpu shielding and disabling hyperthreading, among other things.
* [aclements/perflock](https://github.com/aclements/perflock) tool for locking CPU frequency scaling domains
* [lpechacek/cpuset](https://github.com/lpechacek/cpuset) python package/tool for managing CPU shielding
## Acks
* Uses [muellan/clipp](https://github.com/muellan/clipp) for cli option parsing.
* Uses [martinmoene/optional-lite](https://github.com/martinmoene/optional-lite).

@@ -43,6 +43,9 @@ target_link_libraries(cpu-turbo perfect)
add_executable(os-perf os_perf.cpp)
target_link_libraries(os-perf perfect)
add_executable(high-priority high_priority.cpp)
target_link_libraries(high-priority perfect)
if(CMAKE_CUDA_COMPILER)
add_executable(gpu-clocks gpu_clocks.cu)
target_link_libraries(gpu-clocks perfect)

@@ -0,0 +1,12 @@
#include <iostream>
#include "perfect/priority.hpp"
int main(void) {
perfect::init();
PERFECT(perfect::set_high_priority());
// do things with high process scheduling priority
}

@@ -5,23 +5,20 @@
int main(void) {
perfect::init();
std::map<int, perfect::OsPerfState> states;
// os performance state for each cpu
perfect::OsPerfState state;
// store the current state
PERFECT(perfect::get_os_perf_state(state));
// max state for each cpu
for (auto cpu : perfect::cpus()) {
perfect::OsPerfState state;
perfect::Result result;
result = perfect::get_os_perf_state(&state, cpu);
if (perfect::Result::SUCCESS == result) {
states[cpu] = state;
}
perfect::os_perf_state_maximum(cpu);
PERFECT(perfect::os_perf_state_maximum(cpu));
}
// do things with all CPUs set to the maximum performance mode by the OS
for (auto kv : states) {
int cpu = kv.first;
perfect::OsPerfState state = kv.second;
perfect::set_os_perf_state(cpu, state);
}
// restore original state
PERFECT(perfect::set_os_perf_state(state));
}
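
The save, modify, restore pattern in this example can also be wrapped in a small scope guard so that the restore step runs on every exit path. The sketch below is a generic illustration; `RestoreGuard` is a hypothetical name and is not provided by `perfect`.

```cpp
#include <cassert>
#include <functional>
#include <utility>

// Hypothetical scope guard: runs `restore` once when it goes out of scope.
class RestoreGuard {
public:
  explicit RestoreGuard(std::function<void()> restore)
      : restore_(std::move(restore)) {}

  // non-copyable so the restore action cannot run twice
  RestoreGuard(const RestoreGuard &) = delete;
  RestoreGuard &operator=(const RestoreGuard &) = delete;

  ~RestoreGuard() {
    if (restore_) {
      restore_();
    }
  }

private:
  std::function<void()> restore_;
};
```

With the API shown in this diff, the guard's callback would presumably call `perfect::set_os_perf_state(state)` on the state saved by `perfect::get_os_perf_state(state)`. A destructor does not run when the process is killed by a signal, so a tool that must survive ctrl-c still needs a signal handler.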

@@ -12,18 +12,10 @@
#include <string>
#include <vector>
#include "detail/fs.hpp"
#include "init.hpp"
#include "result.hpp"
#define SUCCESS_OR_RETURN(stmt) \
{\
Result _ret; \
_ret = (stmt); \
if (_ret != Result::SUCCESS) {\
return _ret;\
}\
}
std::set<int> operator-(const std::set<int> &lhs, const std::set<int> &rhs) {
std::set<int> result;
for (auto e : lhs) {
@@ -34,6 +26,17 @@ std::set<int> operator-(const std::set<int> &lhs, const std::set<int> &rhs) {
return result;
}
// intersection
std::set<int> operator&(const std::set<int> &lhs, const std::set<int> &rhs) {
std::set<int> result;
for (auto e : lhs) {
if (1 == rhs.count(e)) {
result.insert(e);
}
}
return result;
}
std::string remove_space(const std::string &s) {
std::string result;
@@ -86,7 +89,6 @@ std::set<int> parse_token(const std::string &token) {
}
std::set<int> parse_cpuset(const std::string &s) {
// std::cerr << "parse_cpuset: parsing '" << s << "'\n";
std::set<int> result;
std::string token;
@@ -109,11 +111,12 @@ namespace perfect {
class CpuSet {
public:
std::string path_;
std::set<int> cpus_;
std::set<int> mems_;
CpuSet *parent_;
// make sure cpuset is initialized
CpuSet() : path_(""), parent_(nullptr) {}
CpuSet(const CpuSet &other) : path_(other.path_), parent_(other.parent_) {}
// make sure cpuset system is initialized
static Result init() {
// check for "nodev cpuset" in /proc/filesystems
@@ -148,8 +151,8 @@ public:
return Result::SUCCESS;
}
case EPERM: {
// std::cerr << "EPERM in mount: " << strerror(errno) << "\n";
return Result::NO_PERMISSION;
// std::cerr << "EPERM in mount: " << strerror(errno) << "\n";
return Result::NO_PERMISSION;
}
case ENOENT:
case EROFS:
@@ -162,23 +165,24 @@ public:
return Result::SUCCESS;
}
std::string get_raw_cpus() {
std::ifstream is(path_ + "/cpuset.cpus");
std::string get_raw_cpus() const {
std::string path = path_ + "/cpuset.cpus";
std::ifstream is(path);
std::stringstream ss;
ss << is.rdbuf();
return remove_space(ss.str());
}
std::string get_raw_mems() {
std::string get_raw_mems() const {
std::ifstream is(path_ + "/cpuset.mems");
std::stringstream ss;
ss << is.rdbuf();
return remove_space(ss.str());
}
std::set<int> get_cpus() { return parse_cpuset(get_raw_cpus()); }
std::set<int> get_cpus() const { return parse_cpuset(get_raw_cpus()); }
std::set<int> get_mems() { return parse_cpuset(get_raw_mems()); }
std::set<int> get_mems() const { return parse_cpuset(get_raw_mems()); }
// migrate the caller task from this cpu set to another
Result migrate_self_to(CpuSet &other) {
@@ -193,11 +197,12 @@ public:
std::string line;
while (std::getline(is, line)) {
line = remove_space(line);
if (std::to_string(self) == line) {
// std::cerr << "migrating self task " << line << " to " << other.path
// << "\n";
other.write_task(line);
return Result::SUCCESS;
// std::cerr << "migrating self task " << line << " to " << other.path_
// << "\n";
pid_t pid = std::stoi(line);
return other.write_task(pid);
}
}
return Result::NO_TASK;
@@ -205,46 +210,58 @@ public:
// migrate tasks in this cpu set to another
Result migrate_tasks_to(CpuSet &other) {
// other must have cpus and mems
auto s = other.get_cpus();
assert(!other.get_cpus().empty());
assert(!other.get_mems().empty());
// enable memory migration in other
SUCCESS_OR_RETURN(other.enable_memory_migration());
PERFECT_SUCCESS_OR_RETURN(other.enable_memory_migration());
// read this cpuset's tasks and write each line to other's tasks file
std::ifstream is(path_ + "/tasks");
std::string line;
while (std::getline(is, line)) {
// std::cerr << "migrating task " << line << " to " << other.path << "\n";
other.write_task(line);
pid_t pid = std::stoi(line);
// std::cerr << "migrating task " << pid << " to " << other.path_ << "\n";
Result result = other.write_task(pid);
if (Result::ERRNO_INVALID == result) {
// std::cerr << "task " << pid << " is unmovable\n";
} else {
PERFECT_SUCCESS_OR_RETURN(result);
}
}
return Result::SUCCESS;
}
Result enable_memory_migration() {
std::ofstream ofs(path_ + "/" + "cpuset.memory_migrate");
ofs << "1";
ofs.close();
if (ofs.fail()) {
switch (errno) {
case EACCES:
return Result::NO_PERMISSION;
case ENOENT:
return Result::NOT_SUPPORTED;
default:
return Result::UNKNOWN;
}
}
return Result::SUCCESS;
return detail::write_str(path_ + "/cpuset.memory_migrate", "1");
}
void write_task(const std::string &task) {
// write `task` to path/tasks
std::ofstream os(path_ + "/tasks");
os << task << "\n";
Result write_task(pid_t pid) {
return detail::write_str(path_ + "/tasks", std::to_string(pid) + "\n");
}
static Result get_affinity(std::set<int> &cpus, pid_t pid) {
cpu_set_t mask;
CPU_ZERO(&mask);
if (sched_getaffinity(pid, sizeof(mask), &mask)) {
return from_errno(errno);
}
cpus.clear();
for (int i = 0; i < CPU_SETSIZE; ++i) {
if (CPU_ISSET(i, &mask)) {
cpus.insert(i);
}
}
return Result::SUCCESS;
}
// object representing the root CPU set
static Result get_root(CpuSet &root) {
SUCCESS_OR_RETURN(CpuSet::init());
PERFECT_SUCCESS_OR_RETURN(CpuSet::init());
root.path_ = "/dev/cpuset";
root.parent_ = nullptr;
return Result::SUCCESS;
@@ -256,7 +273,7 @@ public:
Result make_child(CpuSet &child, const std::string &name) {
if (mkdir((path_ + "/" + name).c_str(),
S_IRUSR | S_IWUSR | S_IRGRP | S_IROTH)) {
S_IRUSR | S_IWUSR | S_IRGRP | S_IXGRP | S_IROTH | S_IXOTH)) {
switch (errno) {
case EEXIST: {
// okay
@@ -264,8 +281,6 @@ public:
}
case EACCES:
return Result::NO_PERMISSION;
case ENOENT:
case EROFS:
default:
return Result::UNKNOWN;
}
@@ -276,6 +291,8 @@ public:
return Result::SUCCESS;
}
std::vector<CpuSet> get_children() { assert(false && "unimplemented"); }
Result enable_cpu(const int cpu) {
std::set<int> cpus = get_cpus();
cpus.insert(cpu);
@@ -290,30 +307,28 @@ public:
return write_cpus(finalCpus);
}
// FIXME: check error
Result write_cpus(std::set<int> cpus) {
std::ofstream os(path_ + "/cpuset.cpus");
std::string str;
bool comma = false;
for (auto cpu : cpus) {
if (comma)
os << ",";
os << cpu << "-" << cpu;
str += ",";
str += std::to_string(cpu) + "-" + std::to_string(cpu);
comma = true;
}
return Result::SUCCESS;
return detail::write_str(path_ + "/cpuset.cpus", str);
}
// FIXME: check write
Result write_mems(std::set<int> mems) {
std::ofstream os(path_ + "/cpuset.mems");
std::string str;
bool comma = false;
for (auto mem : mems) {
if (comma)
os << ",";
os << mem << "-" << mem;
str += ",";
str += std::to_string(mem) + "-" + std::to_string(mem);
comma = true;
}
return Result::SUCCESS;
return detail::write_str(path_ + "/cpuset.mems", str);
}
Result enable_mem(const int mem) {
@@ -331,31 +346,40 @@ public:
}
Result destroy() {
// already destroyed
if (!detail::path_exists(path_)) {
return Result::SUCCESS;
}
// remove all child cpu sets
// move all attached processes back to parent
assert(parent_);
migrate_tasks_to(*parent_);
assert(parent_ && "should not call destroy on root cpuset");
PERFECT_SUCCESS_OR_RETURN(migrate_tasks_to(*parent_));
// remove with rmdir
Result result = Result::UNKNOWN;
if (rmdir(path_.c_str())) {
switch (errno) {
case ENOENT:
// already gone
result = Result::SUCCESS;
break;
default:
std::cerr << "unhandled error in rmdir: " << strerror(errno) << "\n";
return Result::UNKNOWN;
result = Result::UNKNOWN;
}
}
path_ = "";
return Result::SUCCESS;
return result;
}
};
std::ostream &operator<<(std::ostream &s, const CpuSet &c) {
s << c.path_;
return s;
}
std::ostream &operator<<(std::ostream &s, const CpuSet &c) {
s << c.path_;
return s;
}
} // namespace perfect
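
`write_cpus` and `write_mems` above emit one `N-N` range per element. The kernel's cpuset list format also accepts merged ranges such as `0-3,8`, so a more compact encoding is possible. The following is a sketch of that alternative under the assumption that only the string building changes; `format_cpuset` is a hypothetical helper, not part of this commit.

```cpp
#include <cassert>
#include <set>
#include <string>

// Hypothetical helper: encode a set of CPU (or memory node) ids in the
// kernel's list format, merging consecutive ids into ranges.
std::string format_cpuset(const std::set<int> &ids) {
  std::string out;
  auto it = ids.begin();
  while (it != ids.end()) {
    const int first = *it;
    int last = first;
    ++it;
    // extend the range while the next id is consecutive
    while (it != ids.end() && *it == last + 1) {
      last = *it;
      ++it;
    }
    if (!out.empty()) {
      out += ",";
    }
    out += std::to_string(first);
    if (last != first) {
      out += "-" + std::to_string(last);
    }
  }
  return out;
}
```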

@@ -1,5 +1,6 @@
#pragma once
#include <cstring>
#include <fstream>
#include <string>
@@ -32,15 +33,20 @@ Result write_str(const std::string &path, const std::string &val) {
if (ofs.fail()) {
switch (errno) {
case EACCES:
std::cerr << "EACCES when writing to " << path << "\n";
// std::cerr << "EACCES when writing to " << path << "\n";
return Result::NO_PERMISSION;
case EPERM:
std::cerr << "EPERM when writing to " << path << "\n";
// std::cerr << "EPERM when writing to " << path << "\n";
return Result::NO_PERMISSION;
case ENOENT:
std::cerr << "ENOENT when writing to " << path << "\n";
// std::cerr << "ENOENT when writing to " << path << "\n";
return Result::NOT_SUPPORTED;
case EINVAL:
// std::cerr << "EINVAL when writing to " << path << "\n";
return Result::ERRNO_INVALID;
default:
std::cerr << strerror(errno) << " when writing " << val << " to " << path
<< "\n";
return Result::UNKNOWN;
}
}
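
The `switch (errno)` above relies on `errno` still holding the cause of the failure after a `std::ofstream` operation, which the C++ standard does not promise. A variant built directly on POSIX `open`/`write` makes the error code explicit. This is a sketch of that alternative, not the code in `detail/fs.hpp`; `write_str_errno` is a hypothetical name, and unlike `std::ofstream` it does not create missing files.

```cpp
#include <cassert>
#include <cerrno>
#include <string>

#include <fcntl.h>
#include <unistd.h>

// Hypothetical variant of write_str: returns 0 on success, otherwise the
// errno value of the call that failed.
int write_str_errno(const std::string &path, const std::string &val) {
  const int fd = open(path.c_str(), O_WRONLY);
  if (fd == -1) {
    return errno;
  }
  const char *p = val.data();
  size_t left = val.size();
  while (left > 0) {
    const ssize_t n = write(fd, p, left);
    if (n == -1) {
      if (errno == EINTR) {
        continue; // interrupted before writing anything: retry
      }
      const int err = errno; // save before close() can overwrite it
      close(fd);
      return err;
    }
    p += n;
    left -= static_cast<size_t>(n);
  }
  if (close(fd) == -1) {
    return errno;
  }
  return 0;
}
```

The returned value can then be mapped to a `Result` in one place, for example `EINVAL` to `Result::ERRNO_INVALID` as the hunk above does.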

@@ -13,6 +13,8 @@
#include <sys/types.h>
#include <unistd.h>
#include <sys/personality.h>
#include <sys/time.h>
#include <sys/resource.h>
#include "perfect/result.hpp"
@@ -107,6 +109,26 @@ Result set_personality(const int persona) {
}
return Result::SUCCESS;
}
// give the calling process the highest priority
Result set_high_priority() {
if (setpriority(PRIO_PROCESS, 0, -20)) {
return from_errno(errno);
}
return Result::SUCCESS;
}
// disable all but one SMT thread for all CPUs the calling process can run on
Result disable_smt() {
return Result::NOT_SUPPORTED;
}
// enable SMT for all CPUs the calling process can run on
Result enable_smt() {
return Result::NOT_SUPPORTED;
}
} // namespace detail
} // namespace perfect

@@ -24,7 +24,7 @@ Result sync() {
return Result::SUCCESS;
}
Result drop_caches(const DropCaches_t mode) {
Result drop_caches(const DropCaches_t mode = DropCaches_t(PAGECACHE | ENTRIES)) {
using detail::write_str;
const std::string path = "/proc/sys/vm/drop_caches";
if (mode & PAGECACHE & ENTRIES) {

@@ -3,6 +3,7 @@
#include <vector>
#include <string>
#include <cassert>
#include <map>
#ifdef __linux__
#include "detail/os/linux.hpp"
@@ -17,19 +18,23 @@ namespace perfect {
struct OsPerfState {
#ifdef __linux__
std::string governor;
std::map<int, std::string> governors;
#else
#error "unsupported platform"
#endif
};
Result get_os_perf_state(OsPerfState *state, const int cpu) {
assert(state);
Result get_os_perf_state(OsPerfState &state) {
#ifdef __linux__
return get_governor(state->governor, cpu);
for (auto cpu : cpus()) {
std::string gov;
PERFECT_SUCCESS_OR_RETURN(get_governor(gov, cpu));
state.governors[cpu] = gov;
}
#else
#error "unsupported platform"
#endif
return Result::SUCCESS;
}
Result os_perf_state_maximum(const int cpu) {
@@ -48,13 +53,15 @@ Result os_perf_state_minimum(const int cpu) {
#endif
}
Result set_os_perf_state(const int cpu, OsPerfState state) {
#ifdef __linux__
return set_governor(cpu, state.governor);
Result set_os_perf_state(OsPerfState state) {
#ifdef __linux__
for (auto kv : state.governors) {
PERFECT_SUCCESS_OR_RETURN(set_governor(kv.first, kv.second));
}
#else
#error "unsupported platform"
#endif
return Result::SUCCESS;
}
};

@@ -0,0 +1,15 @@
#pragma once
#ifdef __linux__
#include "detail/os/linux.hpp"
#else
#error "unsupported platform"
#endif
#include "init.hpp"
namespace perfect {
Result set_high_priority() {
return detail::set_high_priority();
}
}

@@ -12,11 +12,17 @@
#include <nvml.h>
#endif
#ifdef __linux__
#include <cerrno>
#endif
namespace perfect {
enum class Result {
NO_PERMISSION,
NOT_SUPPORTED,
NO_TASK,
ERRNO_INVALID,
NVML_NO_PERMISSION,
NVML_NOT_SUPPORTED,
NVML_UNINITIALIZED,
@@ -38,6 +44,23 @@ Result from_nvml(nvmlReturn_t nvml) {
case NVML_ERROR_INVALID_ARGUMENT:
case NVML_ERROR_GPU_IS_LOST:
case NVML_ERROR_UNKNOWN:
case NVML_ERROR_ALREADY_INITIALIZED:
case NVML_ERROR_NOT_FOUND:
case NVML_ERROR_INSUFFICIENT_SIZE:
case NVML_ERROR_INSUFFICIENT_POWER:
case NVML_ERROR_DRIVER_NOT_LOADED:
case NVML_ERROR_TIMEOUT:
case NVML_ERROR_IRQ_ISSUE:
case NVML_ERROR_LIBRARY_NOT_FOUND:
case NVML_ERROR_FUNCTION_NOT_FOUND:
case NVML_ERROR_CORRUPTED_INFOROM:
case NVML_ERROR_RESET_REQUIRED:
case NVML_ERROR_OPERATING_SYSTEM:
case NVML_ERROR_LIB_RM_VERSION_MISMATCH:
case NVML_ERROR_IN_USE:
case NVML_ERROR_MEMORY:
case NVML_ERROR_NO_DATA:
case NVML_ERROR_VGPU_ECC_NOT_SUPPORTED:
default:
assert(0 && "unhandled nvmlReturn_t");
}
@@ -45,12 +68,28 @@ Result from_nvml(nvmlReturn_t nvml) {
}
#endif
#ifdef __linux__
Result from_errno(int err) {
switch (err) {
default:
assert(0 && "unhandled errno");
}
return Result::UNKNOWN;
}
#endif
const char *get_string(const Result &result) {
switch (result) {
case Result::SUCCESS:
return "success";
case Result::NO_PERMISSION:
return "no permission";
case Result::NOT_SUPPORTED:
return "unsupported operation";
case Result::NO_TASK:
return "no such task";
case Result::ERRNO_INVALID:
return "errno EINVAL";
case Result::UNKNOWN:
return "unknown error";
case Result::NVML_NOT_SUPPORTED:
@@ -59,8 +98,7 @@ const char *get_string(const Result &result) {
return "nvidia-ml returned no permission";
case Result::NVML_UNINITIALIZED:
return "nvidia-ml returned uninitialized";
case Result::NOT_SUPPORTED:
return "unsupported operation";
default:
assert(0 && "unexpected perfect::Result");
}
@@ -81,11 +119,11 @@ inline void check(Result result, const char *file, const int line) {
#define PERFECT(stmt) check(stmt, __FILE__, __LINE__);
#define PERFECT_SUCCESS_OR_RETURN(stmt) \
{\
Result _ret; \
_ret = (stmt); \
if (_ret != Result::SUCCESS) {\
return _ret;\
}\
}
#define PERFECT_SUCCESS_OR_RETURN(stmt) \
{ \
Result _ret; \
_ret = (stmt); \
if (_ret != Result::SUCCESS) { \
return _ret; \
} \
}
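
`from_errno` in this file currently asserts on every value. A fuller mapping could follow the cases that `write_str` already handles. The sketch below assumes that mapping; the local `Result` enum is a stand-in with a subset of the real enumerators, `from_errno_sketch` is a hypothetical name, and the `ESRCH` case is an assumption rather than something this commit defines.

```cpp
#include <cassert>
#include <cerrno>

// Stand-in for perfect::Result with only the enumerators used here.
enum class Result {
  SUCCESS,
  NO_PERMISSION,
  NOT_SUPPORTED,
  NO_TASK,
  ERRNO_INVALID,
  UNKNOWN
};

// Map an errno value to a Result without asserting on unknown values.
Result from_errno_sketch(int err) {
  switch (err) {
  case 0:
    return Result::SUCCESS;
  case EACCES:
  case EPERM:
    return Result::NO_PERMISSION;
  case ENOENT:
    return Result::NOT_SUPPORTED;
  case EINVAL:
    return Result::ERRNO_INVALID;
  case ESRCH: // assumption: "no such process" maps to NO_TASK
    return Result::NO_TASK;
  default:
    return Result::UNKNOWN;
  }
}
```

Returning `Result::UNKNOWN` instead of asserting keeps release and debug builds consistent, since `assert` compiles away under `NDEBUG`.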

@@ -52,6 +52,12 @@ target_link_libraries(max-os-perf perfect)
add_executable(min-os-perf min_os_perf.cpp)
target_link_libraries(min-os-perf perfect)
add_executable(addrs addrs.cpp)
add_executable(perfect-cli perfect.cpp)
target_link_libraries(perfect-cli perfect)
target_include_directories(perfect-cli PUBLIC thirdparty)
## OpenMP
find_package(OpenMP)
if (OpenMP_FOUND)

9
tools/addrs.cpp Normal file

@@ -0,0 +1,9 @@
#include <cstdint> // uintptr_t
#include <iostream>
int main(void) {
int *a = new int;
std::cout << "main: " << uintptr_t(main) << "\n";
std::cout << "stack: " << uintptr_t(&a) << "\n";
std::cout << "heap: " << uintptr_t(a) << "\n";
delete a;
}

14
tools/migrate-to-cpuset.sh Executable file

@@ -0,0 +1,14 @@
#! /bin/bash
while read i; do
echo $i;
echo $i > /dev/cpuset/tasks;
done < /dev/cpuset/unshielded/tasks
while read i; do
echo $i;
echo $i > /dev/cpuset/tasks;
done < /dev/cpuset/shielded/tasks
rmdir /dev/cpuset/shielded
rmdir /dev/cpuset/unshielded

446
tools/perfect.cpp Normal file

@@ -0,0 +1,446 @@
#include <cassert>
#include <cerrno>
#include <chrono>
#include <cstdio>  // printf
#include <cstdlib> // getenv, exit
#include <cstring> // strerror
#include <functional>
#include <iostream>
#include <string>
#include <thread>
#include <vector>
#ifdef __linux__
#include <fcntl.h>
#include <pwd.h>
#include <signal.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>
#else
#error "unsupported platform"
#endif
#include "clipp/clipp.h"
#include "nonstd/optional.hpp"
#include "perfect/aslr.hpp"
#include "perfect/cpu_set.hpp"
#include "perfect/cpu_turbo.hpp"
#include "perfect/detail/os/linux.hpp"
#include "perfect/drop_caches.hpp"
#include "perfect/os_perf.hpp"
#include "perfect/priority.hpp"
typedef std::function<perfect::Result()> CleanupFn;
std::vector<CleanupFn> cleanups;
// restore the system state to how we found it
void cleanup(int dummy) {
(void)dummy;
std::cerr << "caught ctrl-c\n";
// unregister our handler
signal(SIGINT, SIG_DFL);
std::cerr << "cleaning up\n";
std::cerr << "ctrl-c again to quit\n";
for (auto f : cleanups) {
perfect::Result result = f();
}
exit(EXIT_FAILURE);
}
// argv must be null-terminated
// outf and errf are the file descriptors that the child's stdout and stderr
// are redirected to; a value of 0 or less means no redirection
int fork_child(char *const *argv, int outf, int errf) {
pid_t pid;
int status;
pid = fork();
if (pid == -1) {
// pid == -1 means an error occurred
std::cerr << "can't fork, error occurred\n";
return EXIT_FAILURE;
} else if (pid == 0) {
// in the child process
if (outf > 0) {
std::cerr << "redirecting child stdout to file\n";
// dup2() returns the new descriptor on success and -1 on error
if (dup2(outf, 1) == -1) {
std::cerr << "dup2 error: " << strerror(errno) << "\n";
/*
Possible errors, from dup2(2):
EBADF: oldfd isn't an open file descriptor, or newfd is out of the
allowed range for file descriptors.
EBUSY: (Linux only) may be returned by dup2() or dup3() during a race
condition with open(2) and dup().
EINTR: the dup2() or dup3() call was interrupted by a signal; see
signal(7).
EINVAL: (dup3()) flags contain an invalid value, or oldfd was equal to
newfd.
EMFILE: the process already has the maximum number of file descriptors
open and tried to open a new one.
*/
}
if (close(outf)) {
/*
EBADF
The fildes argument is not a valid file descriptor.
EINTR
The close() function was interrupted by a signal.
The close() function may fail if:
EIO
An I/O error occurred while reading from or writing to the file
system.
*/
}
}
if (errf > 0) {
std::cerr << "redirecting child stderr to file\n";
// dup2() returns the new descriptor on success and -1 on error
if (dup2(errf, 2) == -1) {
std::cerr << "dup2 error: " << strerror(errno) << "\n";
/*
Possible errors, from dup2(2):
EBADF: oldfd isn't an open file descriptor, or newfd is out of the
allowed range for file descriptors.
EBUSY: (Linux only) may be returned by dup2() or dup3() during a race
condition with open(2) and dup().
EINTR: the dup2() or dup3() call was interrupted by a signal; see
signal(7).
EINVAL: (dup3()) flags contain an invalid value, or oldfd was equal to
newfd.
EMFILE: the process already has the maximum number of file descriptors
open and tried to open a new one.
*/
}
if (close(errf)) {
/*
EBADF
The fildes argument is not a valid file descriptor.
EINTR
The close() function was interrupted by a signal.
The close() function may fail if:
EIO
An I/O error occurred while reading from or writing to the file system.
*/
}
}
// execvp() only returns if an error occurred, in which case it returns -1
execvp(argv[0], argv);
// exit the child with 127 so the parent can report the exec failure
_exit(127);
} else {
// parent process
if (waitpid(pid, &status, 0) > 0) {
if (WIFEXITED(status) && !WEXITSTATUS(status)) {
// success
return status;
}
else if (WIFEXITED(status) && WEXITSTATUS(status)) {
if (WEXITSTATUS(status) == 127) {
std::cerr << "execv failed\n";
return status;
} else {
std::cerr << "program terminated normally, but returned a non-zero "
"status\n";
return status;
}
} else {
printf("program didn't terminate normally\n");
return status;
}
} else {
printf("waitpid() failed\n");
return EXIT_FAILURE;
}
return 0;
}
}
int main(int argc, char **argv) {
signal(SIGINT, cleanup);
using namespace clipp;
size_t numUnshielded = 0;
size_t numShielded = 0;
bool aslr = false;
nonstd::optional<bool> cpuTurbo = false;
nonstd::optional<bool> maxOsPerf = true;
bool dropCaches = true;
bool highPriority = true;
std::vector<std::string> program;
std::string stdoutPath;
std::string stderrPath;
int iters = 1;
int sleepMs = 1000;
bool help = false;
auto helpMode = option("-h", "--help").set(help).doc("show help");
auto shieldGroup = ((option("-u").doc("number of unshielded CPUs") &
value("INT", numUnshielded)) |
(option("-s").doc("number of shielded CPUs") &
value("INT", numShielded)));
auto noModMode = (option("--no-mod")
.doc("don't control performance")
.set(aslr, true)
.call([&]() { cpuTurbo = nonstd::nullopt; })
.call([&]() { maxOsPerf = nonstd::nullopt; })
.set(dropCaches, false)
.set(highPriority, false));
auto modMode = (shieldGroup,
option("--no-drop-cache")
.set(dropCaches, false)
.doc("do not drop filesystem caches"),
option("--no-max-perf").doc("do not max os perf").call([&]() {
maxOsPerf = false;
}),
option("--aslr").set(aslr, true).doc("enable ASLR"),
option("--no-priority")
.set(highPriority, false)
.doc("don't set high priority"),
option("--cpu-turbo").doc("enable CPU turbo").call([&]() {
cpuTurbo = true;
}),
(option("--stdout").doc("redirect child stdout") &
value("PATH", stdoutPath)),
(option("--stderr").doc("redirect child stderr") &
value("PATH", stderrPath)));
auto cli =
helpMode |
((noModMode | modMode),
(option("--sleep-ms").doc("sleep before run") & value("INT", sleepMs)),
(option("-n").doc("run multiple times") & value("INT", iters)), helpMode,
// run everything after "--"
required("--") & greedy(values("cmd", program))
);
if (!parse(argc, argv, cli)) {
auto fmt = doc_formatting{}.doc_column(31);
std::cout << make_man_page(cli, argv[0], fmt);
return -1;
}
if (help) {
auto fmt = doc_formatting{}.doc_column(31);
std::cout << make_man_page(cli, argv[0], fmt);
return 0;
}
// open the redirect files, if needed
int errf = 0;
int outf = 0;
if (!stderrPath.empty()) {
std::cerr << "open " << stderrPath << "\n";
errf = open(stderrPath.c_str(), O_WRONLY | O_CREAT,
S_IRUSR | S_IWUSR | S_IRGRP | S_IROTH);
if (-1 == errf) {
std::cerr << "error while opening " << stderrPath << ": "
<< strerror(errno) << "\n";
}
}
if (!stdoutPath.empty()) {
outf = open(stdoutPath.c_str(), O_WRONLY | O_CREAT,
S_IRUSR | S_IWUSR | S_IRGRP | S_IROTH);
if (-1 == outf) {
std::cerr << "error while opening " << stdoutPath << ": "
<< strerror(errno) << "\n";
}
}
// if called with sudo, chown the files to whoever called sudo
const char *sudoUser = std::getenv("SUDO_USER");
if (sudoUser) {
std::cerr << "called with sudo by " << sudoUser << "\n";
uid_t uid;
gid_t gid;
struct passwd *pwd;
pwd = getpwnam(sudoUser);
if (pwd == NULL) {
// do not dereference a failed lookup
std::cerr << "failed to look up user " << sudoUser << "\n";
return EXIT_FAILURE;
}
uid = pwd->pw_uid;
gid = pwd->pw_gid;
if (!stdoutPath.empty()) {
if (chown(stdoutPath.c_str(), uid, gid) == -1) {
// die("chown fail");
}
}
if (!stderrPath.empty()) {
if (chown(stderrPath.c_str(), uid, gid) == -1) {
// die("chown fail");
}
}
}
// build the program arguments
std::vector<char *> args;
for (auto &c : program) {
args.push_back((char *)c.c_str());
}
args.push_back(nullptr);
// init the perfect library
PERFECT(perfect::init());
auto cpus = perfect::cpus();
if (0 < numShielded) {
numUnshielded = cpus.size() - numShielded;
} else if (0 < numUnshielded) {
numShielded = cpus.size() - numUnshielded;
}
// handle CPU shielding
perfect::CpuSet root, shielded, unshielded;
if (numShielded) {
std::cerr << "shielding " << numShielded << " cpus\n";
PERFECT(perfect::CpuSet::get_root(root));
PERFECT(root.make_child(shielded, "shielded"));
PERFECT(root.make_child(unshielded, "unshielded"));
std::cerr << "enable memory\n";
PERFECT(shielded.enable_mem(0));
PERFECT(unshielded.enable_mem(0));
std::cerr << "enable cpus\n";
size_t i = 0;
for (; i < cpus.size() - numShielded; ++i) {
std::cerr << "unshield cpu " << cpus[i] << "\n";
unshielded.enable_cpu(cpus[i]);
}
for (; i < cpus.size(); ++i) {
std::cerr << "shield cpu " << cpus[i] << "\n";
shielded.enable_cpu(cpus[i]);
}
std::cerr << "migrate self\n";
PERFECT(root.migrate_self_to(shielded));
std::cerr << "migrate other (1/2)\n";
PERFECT(root.migrate_tasks_to(unshielded));
// some tasks may have been spawned by unmigrated tasks while we migrated
std::cerr << "migrate other (2/2)\n";
PERFECT(root.migrate_tasks_to(unshielded));
cleanups.push_back(CleanupFn([&] {
std::cerr << "cleanup: shielded cpu set\n";
shielded.destroy();
std::cerr << "cleanup: unshielded cpu set\n";
unshielded.destroy();
return perfect::Result::SUCCESS;
}));
}
// handle aslr
if (!aslr) {
std::cerr << "disable ASLR for this process\n";
PERFECT(perfect::disable_aslr());
}
// handle CPU turbo
perfect::CpuTurboState cpuTurboState;
if (cpuTurbo.has_value()) {
PERFECT(perfect::get_cpu_turbo_state(&cpuTurboState));
if (false == cpuTurbo) {
std::cerr << "disabling cpu turbo\n";
PERFECT(perfect::disable_cpu_turbo());
} else {
std::cerr << "enabling cpu turbo\n";
PERFECT(perfect::enable_cpu_turbo());
}
cleanups.push_back(CleanupFn([&] {
std::cerr << "cleanup: restore CPU turbo state\n";
return perfect::set_cpu_turbo_state(cpuTurboState);
}));
}
// handle governor
perfect::OsPerfState osPerfState;
if (maxOsPerf.has_value()) {
PERFECT(perfect::get_os_perf_state(osPerfState));
if (true == maxOsPerf) {
std::cerr << "set max performance state\n";
for (auto cpu : perfect::cpus()) {
PERFECT(perfect::os_perf_state_maximum(cpu));
}
}
cleanups.push_back(CleanupFn([&] {
std::cerr << "cleanup: os governor\n";
return perfect::set_os_perf_state(osPerfState);
}));
}
if (highPriority) {
std::cerr << "set high priority\n";
PERFECT(perfect::set_high_priority());
}
// parent should return
for (int runIter = 0; runIter < iters; ++runIter) {
// drop filesystem caches before each run
if (dropCaches) {
std::cerr << "clearing file system cache\n";
PERFECT(perfect::drop_caches());
}
// sleep before each run
if (sleepMs) {
std::cerr << "sleep " << sleepMs << " ms before run\n";
std::this_thread::sleep_for(std::chrono::milliseconds(sleepMs));
}
std::cerr << "exec ";
for (size_t i = 0; i < args.size() - 1; ++i) {
std::cerr << args[i] << " ";
}
std::cerr << "\n";
int status = fork_child(args.data(), outf, errf);
if (0 != status) {
std::cerr << "did not terminate successfully\n";
}
std::cerr << "finished execution\n";
}
// clean up CpuSets (if needed)
if (numShielded) {
std::cerr << "clean up cpu sets\n";
shielded.destroy();
unshielded.destroy();
}
// restore original turbo state
if (cpuTurbo.has_value()) {
std::cerr << "restore CPU turbo\n";
PERFECT(perfect::set_cpu_turbo_state(cpuTurboState));
}
if (maxOsPerf.has_value()) {
std::cerr << "restore os performance state\n";
PERFECT(perfect::set_os_perf_state(osPerfState));
}
return 0;
}
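
The fork, redirect, exec, and wait sequence in `fork_child` can be condensed into a short sketch. It is written under a few assumptions and is not the code from this commit: `run_child` is a hypothetical name, a negative descriptor means "do not redirect", and the child exits with 127 when `execvp` fails so that it never returns into the caller's code.

```cpp
#include <cassert>
#include <cerrno>
#include <string>
#include <vector>

#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical helper: run `program` and return its exit code, or -1 if the
// child could not be created, waited for, or did not exit normally.
int run_child(const std::vector<std::string> &program, int outfd = -1,
              int errfd = -1) {
  assert(!program.empty() && "need at least the executable name");

  // execvp expects a null-terminated array of char*
  std::vector<char *> argv;
  for (const std::string &arg : program) {
    argv.push_back(const_cast<char *>(arg.c_str()));
  }
  argv.push_back(nullptr);

  const pid_t pid = fork();
  if (pid == -1) {
    return -1;
  }
  if (pid == 0) {
    // child: dup2 returns the new descriptor on success, -1 on failure
    if (outfd >= 0 && dup2(outfd, STDOUT_FILENO) == -1) {
      _exit(126);
    }
    if (errfd >= 0 && dup2(errfd, STDERR_FILENO) == -1) {
      _exit(126);
    }
    execvp(argv[0], argv.data());
    _exit(127); // only reached if execvp failed
  }

  // parent: wait for the child, retrying if a signal interrupts the wait
  int status = 0;
  pid_t waited;
  do {
    waited = waitpid(pid, &status, 0);
  } while (waited == -1 && errno == EINTR);
  if (waited == -1 || !WIFEXITED(status)) {
    return -1;
  }
  return WEXITSTATUS(status);
}
```

Returning the decoded exit code rather than the raw wait status lets the caller compare against 0 or 127 directly.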

7023
tools/thirdparty/clipp/clipp.h vendored Normal file

File diff suppressed because it is too large

1585
tools/thirdparty/nonstd/optional.hpp vendored Normal file

File diff suppressed because it is too large