Update README.md

add a tool to fix broken cpusets
perfect-cli cleans up on SIGINT, fixed a problem where cpu_set would silently fail
2019-10-02 07:34:58 -05:00 · 2019-10-01 14:48:00 -05:00 · 2019-10-01 14:31:36 -05:00 · 2019-10-01 06:55:50 -05:00 · 2019-09-30 15:08:08 -05:00 · 2019-09-30 13:23:25 -05:00
18 changed files with 9426 additions and 119 deletions
--- a/.github/workflows/ccpp.yml
+++ b/.github/workflows/ccpp.yml
@@ -38,6 +38,7 @@ jobs:
        g++ --version
        nvcc --version
        make VERBOSE=1
+
  build:
    runs-on: ubuntu-latest
    steps:
@@ -61,3 +62,6 @@ jobs:
        cd build
        g++ --version
        make VERBOSE=1
+    - name: test
+      run: |
+        build/tools/perfect-cli -h
--- a/README.md
+++ b/README.md
@@ -17,6 +17,7 @@ CPU/GPU Performance control library for benchmarking on Linux, x86, POWER, and N
 - [x] CUDA not required (GPU functions will not be compiled)
 - [x] Flush file system caches (linux)
 - [x] Disable ASLR (linux)
+- [x] process priority interface (linux)

 ## Contributors
 * [Carl Pearson](https://cwpearson.github.io)
@@ -59,7 +60,87 @@ If you don't have CUDA, then you could just do
 g++ code_using_perfect.cpp -I perfect/include
 ```

-## Usage
+## Tools Usage
+
+### tools/perfect-cli
+
+`perfect` provides some useful tools on Linux:
+
+```
+$ tools/perfect-cli -h
+SYNOPSIS
+        ./tools/perfect-cli --no-mod [-n <INT>] -- <cmd>...
+        ./tools/perfect-cli ([-u <INT>] | [-s <INT>]) [--no-drop-cache] [--no-max-perf] [--aslr]
+                            [--cpu-turbo] [--stdout <PATH>] [--stderr <PATH>] [-n <INT>] -- <cmd>...
+
+OPTIONS
+        --no-mod               don't control performance
+        -u                     number of unshielded CPUs
+        -s                     number of shielded CPUs
+        --no-drop-cache        do not drop filesystem caches
+        --no-max-perf          do not max os perf
+        --aslr                 enable ASLR
+        --cpu-turbo            enable CPU turbo
+        --stdout               redirect child stdout
+        --stderr               redirect child stderr
+        -n                     run multiple times
+```
+
+The basic usage is `tools/perfect-cli -- my-exe`, which will attempt to configure the system for repeatable performance before executing `my-exe`, and then restore the system to the original performance state before exiting.
+Most modifications require elevated privileges.
+The default behavior is to:
+* disable ASLR
+* set CPU performance to maximum
+* disable CPU turbo
+* drop filesystem caches before each iteration
+
+Some options (all must be provided before `--`):
+* `--no-mod` flag will cause `perfect-cli` to not modify the system performance state
+* `-n INT` will run the requested program `INT` times.
+* `--stderr`/`--stdout` will redirect the program-under-test's stderr and stdout to the provided paths.
+* `-s`/`-u`: set the number of shielded/unshielded CPUs. The program-under-test will run on the shielded CPUs; all other tasks will run on the unshielded CPUs.
+
+A common invocation might look like:
+```
+sudo tools/perfect-cli -n 5 --stderr=run.err --stdout=run.out -- ./my-benchmark
+```
+This will disable ASLR, set CPU performance to maximum, and disable CPU turbo, then run `./my-benchmark` 5 times, dropping the filesystem caches before each run and redirecting its stdout/stderr to `run.out`/`run.err`.
+The owner of `run.out` and `run.err` will be set to whichever user called `sudo`.
+
+### tools/addrs
+
+Print the address of `main`, a stack variable, and a heap variable.
+Useful for demoing ASLR.
+
+### tools/no-aslr
+
+Run the provided command with ASLR disabled.
+
+With ASLR, addresses differ between invocations:
+```
+$ tools/addrs
+main:  94685074364704
+stack: 140734279743492
+heap:  94685084978800
+$ tools/addrs
+main:  93891046344992
+stack: 140722671706708
+heap:  93891068624496
+```
+
+Without ASLR, addresses are the same in each invocation:
+```
+$ tools/no-aslr tools/addrs
+main:  93824992233760
+stack: 140737488347460
+heap:  93824994414192
+$ tools/no-aslr tools/addrs
+main:  93824992233760
+stack: 140737488347460
+heap:  93824994414192
+```
+
+## API Usage

 The `perfect` functions all return a `perfect::Result`, which is defined in [include/perfect/result.hpp].
 When things are working, it will be `perfect::Result::SUCCESS`.
@@ -70,7 +151,19 @@ perfect::CpuTurboState state;
 PERFECT(perfect::get_cpu_turbo_state(&state));
 ```

-## Monitoring
+### High Priority
+
+`perfect` can raise the scheduling priority of the calling process.
+
+See [examples/high_priority.cpp](examples/high_priority.cpp)
+
+```c++
+#include "perfect/priority.hpp"
+```
+
+* `Result set_high_priority()`: set the highest possible scheduling priority for the calling process
+
+### Monitoring

 `perfect` can monitor and record GPU activity.

@@ -100,6 +193,7 @@ See [tools/no_aslr.cpp](tools/no_aslr.cpp)
 * `Result get_aslr(AslrState &state)`: save the current ASLR state
 * `Result set_aslr(const AslrState &state)`: set a previously-saved ASLR state

+
 ### Flush file system caches

 `perfect` can drop various filesystem caches
@@ -111,7 +205,7 @@ See [tools/sync_drop_caches.cpp](tools/sync_drop_caches.cpp)
 ```

 * `Result sync()`: flush filesystem caches to disk
-* `Result drop_caches(DropCaches_t mode)`: remove file system caches
+* `Result drop_caches(DropCaches_t mode = DropCaches_t(PAGECACHE | ENTRIES))`: remove file system caches
  * `mode = PAGECACHE`: drop page caches
  * `mode = ENTRIES`: drop dentries and inodes
  * `mode = PAGECACHE | ENTRIES`: both
@@ -143,9 +237,9 @@ See [examples/os_perf.cpp](examples/os_perf.cpp).
 #include "perfect/os_perf.hpp"
 ```

-* `Result get_os_perf_state(OsPerfState *state, const int cpu)`: Save the current OS governor mode for CPU `cpu`.
+* `Result get_os_perf_state(OsPerfState &state)`: Save the current OS governor mode for all CPUs.
 * `Result os_perf_state_maximum(const int cpu)`: Set the OS governor for CPU `cpu` to its maximum performance mode.
-* `Result set_os_perf_state(const int cpu, OsPerfState state)`: Restore a previously-saved OS governor mode.
+* `Result set_os_perf_state(OsPerfState state)`: Restore a previously-saved OS governor mode.

 ### GPU Turbo

@@ -188,6 +282,7 @@ See [examples/cpu_cache.cpp](examples/cpu_cache.cpp).

 * `void flush_all(void *p, const size_t n)`: Flush all cache lines starting at `p` for `n` bytes.

+
 ## Changelog

 * v0.5.0
@@ -217,16 +312,17 @@ See [examples/cpu_cache.cpp](examples/cpu_cache.cpp).

 - [ ] only monitor certain GPUs
 - [ ] hyperthreading interface
-- [ ] process priority interface
-- [ ] A wrapper utility
-    - [ ] disable hyperthreading
-    - [ ] reserve cores 
-    - [ ] set process priority
-    - [ ] disable ASLR
+

 ## Related

 * [LLVM benchmarking instructions](https://llvm.org/docs/Benchmarking.html#linux) covering ASLR, Linux governor, cpuset shielding, SMT, and Intel turbo.
-* [easyperf.net](https://easyperf.net/blog/2019/08/02/Perf-measurement-environment-on-Linux#2-disable-hyper-threading) blog post discussing ACPI/Intel turbo, SMT, Linux governor, CPU affinity, process priority, file system caches, and ASLR. 
-* [temci](https://github.com/parttimenerd/temci) benchmarking tool for cpu sheilding and disabling hyperthreading, among other things.
-* [perflock](https://github.com/aclements/perflock) tool for locking CPU frequency scaling domains
+* [easyperf.net blog post](https://easyperf.net/blog/2019/08/02/Perf-measurement-environment-on-Linux#2-disable-hyper-threading) discussing ACPI/Intel turbo, SMT, Linux governor, CPU affinity, process priority, file system caches, and ASLR.
+* [parttimenerd/temci](https://github.com/parttimenerd/temci) benchmarking tool for CPU shielding and disabling hyperthreading, among other things.
+* [aclements/perflock](https://github.com/aclements/perflock) tool for locking CPU frequency scaling domains
+* [lpechacek/cpuset](https://github.com/lpechacek/cpuset) python package/tool for managing CPU shielding
+
+## Acks
+
+* Uses [muellan/clipp](https://github.com/muellan/clipp) for cli option parsing.
+* Uses [martinmoene/optional-lite](https://github.com/martinmoene/optional-lite).
--- a/examples/CMakeLists.txt
+++ b/examples/CMakeLists.txt
@@ -43,6 +43,9 @@ target_link_libraries(cpu-turbo perfect)
 add_executable(os-perf os_perf.cpp)
 target_link_libraries(os-perf perfect)

+add_executable(high-priority high_priority.cpp)
+target_link_libraries(high-priority perfect)
+
 if(CMAKE_CUDA_COMPILER)
    add_executable(gpu-clocks gpu_clocks.cu)
    target_link_libraries(gpu-clocks perfect)
--- a/examples/high_priority.cpp
+++ b/examples/high_priority.cpp
@@ -0,0 +1,12 @@
+#include <iostream>
+
+#include "perfect/priority.hpp"
+
+int main(void) {
+  perfect::init();
+
+  PERFECT(perfect::set_high_priority());
+
+  // do things with high process scheduling priority
+
+}
--- a/examples/os_perf.cpp
+++ b/examples/os_perf.cpp
@@ -5,23 +5,20 @@
 int main(void) {
  perfect::init();

-  std::map<int, perfect::OsPerfState> states;
+  // os performance state for each cpu
+  perfect::OsPerfState state;

+  // store the current state
+  PERFECT(perfect::get_os_perf_state(state));
+
+  // max state for each cpu
  for (auto cpu : perfect::cpus()) {
-    perfect::OsPerfState state;
-    perfect::Result result;
-    result = perfect::get_os_perf_state(&state, cpu);
-    if (perfect::Result::SUCCESS == result) {
-      states[cpu] = state;
-    }
-    perfect::os_perf_state_maximum(cpu);
+    PERFECT(perfect::os_perf_state_maximum(cpu));
  }

  // do things with all CPUs set to the maximum performance mode by the OS

-  for (auto kv : states) {
-    int cpu = kv.first;
-    perfect::OsPerfState state = kv.second;
-    perfect::set_os_perf_state(cpu, state);
-  }
+  // restore original state
+  PERFECT(perfect::set_os_perf_state(state));
+
 }
--- a/include/perfect/cpu_set.hpp
+++ b/include/perfect/cpu_set.hpp
@@ -12,18 +12,10 @@
 #include <string>
 #include <vector>

+#include "detail/fs.hpp"
 #include "init.hpp"
 #include "result.hpp"

-#define SUCCESS_OR_RETURN(stmt) \
-{\
-  Result _ret; \
-  _ret = (stmt); \
-if (_ret != Result::SUCCESS) {\
-  return _ret;\
-}\
-}
-
 std::set<int> operator-(const std::set<int> &lhs, const std::set<int> &rhs) {
  std::set<int> result;
  for (auto e : lhs) {
@@ -34,6 +26,17 @@ std::set<int> operator-(const std::set<int> &lhs, const std::set<int> &rhs) {
  return result;
 }

+// intersection
+std::set<int> operator&(const std::set<int> &lhs, const std::set<int> &rhs) {
+  std::set<int> result;
+  for (auto e : lhs) {
+    if (1 == rhs.count(e)) {
+      result.insert(e);
+    }
+  }
+  return result;
+}
+
 std::string remove_space(const std::string &s) {
  std::string result;

@@ -86,7 +89,6 @@ std::set<int> parse_token(const std::string &token) {
 }

 std::set<int> parse_cpuset(const std::string &s) {
-  // std::cerr << "parse_cpuset: parsing '" << s << "'\n";
  std::set<int> result;

  std::string token;
@@ -109,11 +111,12 @@ namespace perfect {
 class CpuSet {
 public:
  std::string path_;
-  std::set<int> cpus_;
-  std::set<int> mems_;
  CpuSet *parent_;

-  // make sure cpuset is initialized
+  CpuSet() : path_(""), parent_(nullptr) {}
+  CpuSet(const CpuSet &other) : path_(other.path_), parent_(other.parent_) {}
+
+  // make sure cpuset system is initialized
  static Result init() {

    // check for "nodev cpuset" in /proc/filesystems
@@ -148,8 +151,8 @@ public:
          return Result::SUCCESS;
        }
        case EPERM: {
-        // std::cerr << "EPERM in mount: " << strerror(errno) << "\n";
-        return Result::NO_PERMISSION;
+          // std::cerr << "EPERM in mount: " << strerror(errno) << "\n";
+          return Result::NO_PERMISSION;
        }
        case ENOENT:
        case EROFS:
@@ -162,23 +165,24 @@ public:
    return Result::SUCCESS;
  }

-  std::string get_raw_cpus() {
-    std::ifstream is(path_ + "/cpuset.cpus");
+  std::string get_raw_cpus() const {
+    std::string path = path_ + "/cpuset.cpus";
+    std::ifstream is(path);
    std::stringstream ss;
    ss << is.rdbuf();
    return remove_space(ss.str());
  }

-  std::string get_raw_mems() {
+  std::string get_raw_mems() const {
    std::ifstream is(path_ + "/cpuset.mems");
    std::stringstream ss;
    ss << is.rdbuf();
    return remove_space(ss.str());
  }

-  std::set<int> get_cpus() { return parse_cpuset(get_raw_cpus()); }
+  std::set<int> get_cpus() const { return parse_cpuset(get_raw_cpus()); }

-  std::set<int> get_mems() { return parse_cpuset(get_raw_mems()); }
+  std::set<int> get_mems() const { return parse_cpuset(get_raw_mems()); }

  // migrate the caller task from this cpu set to another
  Result migrate_self_to(CpuSet &other) {
@@ -193,11 +197,12 @@ public:
    std::string line;
    while (std::getline(is, line)) {
      line = remove_space(line);
+
      if (std::to_string(self) == line) {
-        // std::cerr << "migrating self task " << line << " to " << other.path
-        //           << "\n";
-        other.write_task(line);
-        return Result::SUCCESS;
+        // std::cerr << "migrating self task " << line << " to " << other.path_
+        //            << "\n";
+        pid_t pid = std::stoi(line);
+        return other.write_task(pid);
      }
    }
    return Result::NO_TASK;
@@ -205,46 +210,58 @@ public:

  // migrate tasks in this cpu set to another
  Result migrate_tasks_to(CpuSet &other) {
+    // other must have cpus and mems
+    assert(!other.get_cpus().empty());
+    assert(!other.get_mems().empty());
+
    // enable memory migration in other
-    SUCCESS_OR_RETURN(other.enable_memory_migration());
+    PERFECT_SUCCESS_OR_RETURN(other.enable_memory_migration());

    // read this tasks and write each line to other.tasks
    std::ifstream is(path_ + "/tasks");
    std::string line;
    while (std::getline(is, line)) {
-      // std::cerr << "migrating task " << line << " to " << other.path << "\n";
-      other.write_task(line);
+      pid_t pid = std::stoi(line);
+      // std::cerr << "migrating task " << pid << " to " << other.path_ << "\n";
+      Result result = other.write_task(pid);
+      if (Result::ERRNO_INVALID == result) {
+        // std::cerr << "task " << pid << " is unmovable\n";
+      } else {
+        PERFECT_SUCCESS_OR_RETURN(result);
+      }
    }
-
    return Result::SUCCESS;
  }

  Result enable_memory_migration() {
-    std::ofstream ofs(path_ + "/" + "cpuset.memory_migrate");
-    ofs << "1";
-    ofs.close();
-    if (ofs.fail()) {
-    switch (errno) {
-    case EACCES:
-      return Result::NO_PERMISSION;
-    case ENOENT:
-      return Result::NOT_SUPPORTED;
-    default:
-      return Result::UNKNOWN;
-    }
-  }
-    return Result::SUCCESS;
+    return detail::write_str(path_ + "/cpuset.memory_migrate", "1");
  }

-  void write_task(const std::string &task) {
-    // write `task` to path/tasks
-    std::ofstream os(path_ + "/tasks");
-    os << task << "\n";
+  Result write_task(pid_t pid) {
+    return detail::write_str(path_ + "/tasks", std::to_string(pid) + "\n");
+  }
+
+  static Result get_affinity(std::set<int> &cpus, pid_t pid) {
+    cpu_set_t mask;
+    CPU_ZERO(&mask);
+    if (sched_getaffinity(pid, sizeof(mask), &mask)) {
+      return from_errno(errno);
+    }
+
+    cpus.clear();
+    for (int i = 0; i < CPU_SETSIZE; ++i) {
+      if (CPU_ISSET(i, &mask)) {
+        cpus.insert(i);
+      }
+    }
+
+    return Result::SUCCESS;
  }

  // object representing the root CPU set
  static Result get_root(CpuSet &root) {
-    SUCCESS_OR_RETURN(CpuSet::init());
+    PERFECT_SUCCESS_OR_RETURN(CpuSet::init());
    root.path_ = "/dev/cpuset";
    root.parent_ = nullptr;
    return Result::SUCCESS;
@@ -256,7 +273,7 @@ public:
  Result make_child(CpuSet &child, const std::string &name) {

    if (mkdir((path_ + "/" + name).c_str(),
-              S_IRUSR | S_IWUSR | S_IRGRP | S_IROTH)) {
+              S_IRUSR | S_IWUSR | S_IRGRP | S_IXGRP | S_IROTH | S_IXOTH)) {
      switch (errno) {
      case EEXIST: {
        // okay
@@ -264,8 +281,6 @@ public:
      }
      case EACCES:
        return Result::NO_PERMISSION;
-      case ENOENT:
-      case EROFS:
      default:
        return Result::UNKNOWN;
      }
@@ -276,6 +291,8 @@ public:
    return Result::SUCCESS;
  }

+  std::vector<CpuSet> get_children() {
+    assert(false && "unimplemented");
+    return {};
+  }
+
  Result enable_cpu(const int cpu) {
    std::set<int> cpus = get_cpus();
    cpus.insert(cpu);
@@ -290,30 +307,28 @@ public:
    return write_cpus(finalCpus);
  }

-  // FIXME: check error
  Result write_cpus(std::set<int> cpus) {
-    std::ofstream os(path_ + "/cpuset.cpus");
+    std::string str;
    bool comma = false;
    for (auto cpu : cpus) {
      if (comma)
-        os << ",";
-      os << cpu << "-" << cpu;
+        str += ",";
+      str += std::to_string(cpu) + "-" + std::to_string(cpu);
      comma = true;
    }
-    return Result::SUCCESS;
+    return detail::write_str(path_ + "/cpuset.cpus", str);
  }

-  // FIXME: check write
  Result write_mems(std::set<int> mems) {
-    std::ofstream os(path_ + "/cpuset.mems");
+    std::string str;
    bool comma = false;
    for (auto mem : mems) {
      if (comma)
-        os << ",";
-      os << mem << "-" << mem;
+        str += ",";
+      str += std::to_string(mem) + "-" + std::to_string(mem);
      comma = true;
    }
-    return Result::SUCCESS;
+    return detail::write_str(path_ + "/cpuset.mems", str);
  }

  Result enable_mem(const int mem) {
@@ -331,31 +346,40 @@ public:
  }

  Result destroy() {
+
+    // already destroyed
+    if (!detail::path_exists(path_)) {
+      return Result::SUCCESS;
+    }
+
    // remove all child cpu sets

    // move all attached processes back to parent
-    assert(parent_);
-    migrate_tasks_to(*parent_);
+    assert(parent_ && "should not call destroy on root cpuset");
+    PERFECT_SUCCESS_OR_RETURN(migrate_tasks_to(*parent_));

    // remove with rmdir
+    Result result = Result::UNKNOWN;
    if (rmdir(path_.c_str())) {
      switch (errno) {
+      case ENOENT:
+        // already gone
+        result = Result::SUCCESS;
+        break;
      default:
        std::cerr << "unhandled error in rmdir: " << strerror(errno) << "\n";
-        return Result::UNKNOWN;
+        result = Result::UNKNOWN;
      }
    }

    path_ = "";
-    return Result::SUCCESS;
+    return result;
  }
-
-
 };

-  std::ostream &operator<<(std::ostream &s, const CpuSet &c) {
-    s << c.path_;
-    return s;
-  }
+std::ostream &operator<<(std::ostream &s, const CpuSet &c) {
+  s << c.path_;
+  return s;
+}

 } // namespace perfect
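`write_cpus` and `write_mems` above emit each id as a one-element range (for example `0-0,2-2,3-3`), which is valid cpuset list syntax. When a compact string is more convenient, for logging or for comparing against what the kernel reports in `cpuset.cpus`, a range-collapsing formatter is a small addition. A minimal sketch; `format_cpu_list` is a hypothetical helper, not part of `perfect`:

```cpp
#include <cassert>
#include <iterator>
#include <set>
#include <string>

// Hypothetical helper (not part of perfect).
// Format ids in cpuset list syntax, collapsing runs of consecutive ids into
// "first-last" ranges, e.g. {0, 1, 2, 5} -> "0-2,5". An empty set gives "".
std::string format_cpu_list(const std::set<int> &ids) {
  std::string out;
  auto it = ids.begin();
  while (it != ids.end()) {
    const int first = *it;
    int last = first;
    // extend the run while the next id is exactly one larger
    auto next = std::next(it);
    while (next != ids.end() && *next == last + 1) {
      last = *next;
      ++next;
    }
    if (!out.empty()) {
      out += ",";
    }
    out += std::to_string(first);
    if (last != first) {
      out += "-" + std::to_string(last);
    }
    it = next;
  }
  return out;
}
```

Because `std::set` iterates in ascending order, one pass is enough to find every run. The resulting string could be handed to `detail::write_str` in place of the hand-built one.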
--- a/include/perfect/detail/fs.hpp
+++ b/include/perfect/detail/fs.hpp
@@ -1,5 +1,6 @@
 #pragma once

+#include <cstring>
 #include <fstream>
 #include <string>

@@ -32,15 +33,20 @@ Result write_str(const std::string &path, const std::string &val) {
  if (ofs.fail()) {
    switch (errno) {
    case EACCES:
-    std::cerr << "EACCES when writing to " << path << "\n";
+      // std::cerr << "EACCES when writing to " << path << "\n";
      return Result::NO_PERMISSION;
    case EPERM:
-      std::cerr << "EPERM when writing to " << path << "\n";
+      // std::cerr << "EPERM when writing to " << path << "\n";
      return Result::NO_PERMISSION;
    case ENOENT:
-    std::cerr << "ENOENT when writing to " << path << "\n";
+      // std::cerr << "ENOENT when writing to " << path << "\n";
      return Result::NOT_SUPPORTED;
+    case EINVAL:
+      // std::cerr << "EINVAL when writing to " << path << "\n";
+      return Result::ERRNO_INVALID;
    default:
+      std::cerr << strerror(errno) << " when writing " << val << " to " << path
+                << "\n";
      return Result::UNKNOWN;
    }
  }
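`write_str` decides what went wrong by reading `errno` after the `std::ofstream` reports failure. The C++ standard does not promise that a stream failure leaves a meaningful `errno` (libstdc++ usually preserves the value from the underlying system call, but that is an implementation detail). Also, because the stream buffers its output, errors that kernel files report from `write(2)`, such as the `EINVAL` handled above, only surface when the buffer is flushed. A sketch of the same operation on raw POSIX descriptors, where each error comes straight from the failing call. `write_str_posix` is a hypothetical name, and it returns a raw errno rather than a `perfect::Result`:

```cpp
#include <cassert>
#include <cerrno>
#include <string>

#include <fcntl.h>
#include <unistd.h>

// Hypothetical helper (not part of perfect).
// Write `val` to an existing file at `path`, truncating it first.
// Returns 0 on success or the errno of the first failing system call.
int write_str_posix(const std::string &path, const std::string &val) {
  int fd = open(path.c_str(), O_WRONLY | O_TRUNC | O_CLOEXEC);
  if (fd == -1) {
    return errno;  // e.g. ENOENT, EACCES
  }
  size_t done = 0;
  while (done < val.size()) {
    ssize_t n = write(fd, val.data() + done, val.size() - done);
    if (n == -1) {
      if (errno == EINTR) {
        continue;  // interrupted by a signal: retry
      }
      int err = errno;  // save before close() can overwrite it
      close(fd);
      return err;  // e.g. EINVAL from a cpuset or cgroup file
    }
    done += static_cast<size_t>(n);
  }
  if (close(fd) == -1) {
    return errno;
  }
  return 0;
}
```

Mapping the returned errno to a `perfect::Result` could then happen in one place, such as `from_errno`.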
--- a/include/perfect/detail/os/linux.hpp
+++ b/include/perfect/detail/os/linux.hpp
@@ -13,6 +13,8 @@
 #include <sys/types.h>
 #include <unistd.h>
 #include <sys/personality.h>
+#include <sys/time.h>
+#include <sys/resource.h>

 #include "perfect/result.hpp"

@@ -107,6 +109,26 @@ Result set_personality(const int persona) {
  }
  return Result::SUCCESS;
 }
+
+// give the calling process the highest priority
+Result set_high_priority() {
+  if (setpriority(PRIO_PROCESS, 0, -20)) {
+    return from_errno(errno);
+  }
+  return Result::SUCCESS;
 }

+// disable all but one SMT thread for all CPUs the calling process can run on
+Result disable_smt() {
+  return Result::NOT_SUPPORTED;
+}
+
+// enable SMT for all CPUs the calling process can run on
+Result enable_smt() {
+  return Result::NOT_SUPPORTED;
+}
+
+} // namespace detail
+
 } // namespace perfect
--- a/include/perfect/drop_caches.hpp
+++ b/include/perfect/drop_caches.hpp
@@ -24,7 +24,7 @@ Result sync() {
    return Result::SUCCESS;
 }

-Result drop_caches(const DropCaches_t mode) {
+Result drop_caches(const DropCaches_t mode = DropCaches_t(PAGECACHE | ENTRIES)) {
    using detail::write_str;
    const std::string path = "/proc/sys/vm/drop_caches";
    if (mode & PAGECACHE & ENTRIES) {
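The kernel's `/proc/sys/vm/drop_caches` accepts `1` (page cache), `2` (reclaimable slab objects, including dentries and inodes), or `3` (both). Only clean objects are freed, which is why a `sync()` comes first. If `PAGECACHE` and `ENTRIES` are distinct bits (their definitions aren't part of this diff), `mode & PAGECACHE & ENTRIES` is always zero, so the combined mode has to be detected by testing each bit separately. A sketch of that mapping under the distinct-bits assumption; `DropMode`, `kPageCache`, `kEntries`, and `drop_caches_value` are hypothetical names:

```cpp
#include <cassert>
#include <string>

// Hypothetical flags standing in for perfect's DropCaches_t; assumed to be
// distinct bits so that both can be combined with `|`.
enum DropMode { kPageCache = 1, kEntries = 2 };

// Value to write to /proc/sys/vm/drop_caches for the requested mode:
// "1" = page cache, "2" = dentries and inodes, "3" = both, "" = nothing.
std::string drop_caches_value(int mode) {
  // test each bit on its own; `mode & kPageCache & kEntries` would be 0
  const bool page = (mode & kPageCache) != 0;
  const bool entries = (mode & kEntries) != 0;
  if (page && entries) {
    return "3";
  }
  if (page) {
    return "1";
  }
  if (entries) {
    return "2";
  }
  return "";
}
```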
--- a/include/perfect/os_perf.hpp
+++ b/include/perfect/os_perf.hpp
@@ -3,6 +3,7 @@
 #include <vector>
 #include <string>
 #include <cassert>
+#include <map>

 #ifdef __linux__
 #include "detail/os/linux.hpp"
@@ -17,19 +18,23 @@ namespace perfect {

 struct OsPerfState {
 #ifdef __linux__
-    std::string governor;
+    std::map<int, std::string> governors;
 #else
 #error "unsupported platform"
 #endif
 };

-Result get_os_perf_state(OsPerfState *state, const int cpu) {
-    assert(state);
+Result get_os_perf_state(OsPerfState &state) {
    #ifdef __linux__
-    return get_governor(state->governor, cpu);
+    for (auto cpu : cpus()) {
+        std::string gov;
+        PERFECT_SUCCESS_OR_RETURN(get_governor(gov, cpu));
+        state.governors[cpu] = gov;
+    }
    #else
    #error "unsupported platform"
    #endif
+    return Result::SUCCESS;
 }

 Result os_perf_state_maximum(const int cpu) {
@@ -48,13 +53,15 @@ Result os_perf_state_minimum(const int cpu) {
    #endif
 }

-Result set_os_perf_state(const int cpu, OsPerfState state) {
-        #ifdef __linux__
-    return set_governor(cpu, state.governor);
+Result set_os_perf_state(OsPerfState state) {
+    #ifdef __linux__
+    for (auto kv : state.governors) {
+        PERFECT_SUCCESS_OR_RETURN(set_governor(kv.first, kv.second));
+    }
    #else
    #error "unsupported platform"
    #endif
-
+    return Result::SUCCESS;
 }

 };
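The new `OsPerfState` keeps one governor per CPU in a `std::map<int, std::string>`. `get_governor`/`set_governor` aren't part of this diff. On Linux, the cpufreq governor is exposed per CPU at `/sys/devices/system/cpu/cpuN/cpufreq/scaling_governor`. A standalone sketch of the same save/restore loop, with the sysfs root passed in so it can be pointed at a scratch directory. `read_governors` and `write_governors` are hypothetical names, not perfect's implementation:

```cpp
#include <cassert>
#include <fstream>
#include <map>
#include <string>
#include <vector>

// Hypothetical helper (not part of perfect).
// Read scaling_governor for each CPU under `root` (normally
// "/sys/devices/system/cpu"). Returns false if any file can't be read.
bool read_governors(const std::string &root, const std::vector<int> &cpus,
                    std::map<int, std::string> &governors) {
  for (int cpu : cpus) {
    std::ifstream is(root + "/cpu" + std::to_string(cpu) +
                     "/cpufreq/scaling_governor");
    std::string gov;
    if (!(is >> gov)) {  // governor names contain no whitespace
      return false;
    }
    governors[cpu] = gov;
  }
  return true;
}

// Hypothetical helper (not part of perfect).
// Write each saved governor back; returns false on the first failure.
bool write_governors(const std::string &root,
                     const std::map<int, std::string> &governors) {
  for (const auto &kv : governors) {
    std::ofstream os(root + "/cpu" + std::to_string(kv.first) +
                     "/cpufreq/scaling_governor");
    os << kv.second;
    os.close();  // flush so write errors are reported before checking
    if (os.fail()) {
      return false;
    }
  }
  return true;
}
```

Restoring from a saved map touches only the CPUs that were read successfully, which matches how `set_os_perf_state` iterates over `state.governors`.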
--- a/include/perfect/priority.hpp
+++ b/include/perfect/priority.hpp
@@ -0,0 +1,15 @@
+#pragma once
+
+#ifdef __linux__
+#include "detail/os/linux.hpp"
+#else
+#error "unsupported platform"
+#endif
+
+#include "init.hpp"
+
+namespace perfect {
+    Result set_high_priority() {
+        return detail::set_high_priority();
+    }
+}
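`detail::set_high_priority` calls `setpriority(PRIO_PROCESS, 0, -20)`, which requests the lowest nice value (highest priority). Without `CAP_SYS_NICE` or a suitable `RLIMIT_NICE`, that call fails with `EACCES`. On Linux the nice value is in practice a per-thread attribute (see setpriority(2)), so threads that already exist keep their old value, while threads and children created afterwards inherit the new one. A standalone sketch of the call with a read-back; `set_nice` and `get_nice` are hypothetical helpers:

```cpp
#include <cassert>
#include <cerrno>

#include <sys/resource.h>

// Hypothetical helper (not part of perfect).
// Set the nice value of the calling process. Returns 0 or the errno from
// setpriority(2); lowering the value below the current one normally needs
// CAP_SYS_NICE, otherwise the result is EACCES.
int set_nice(int value) {
  if (setpriority(PRIO_PROCESS, 0, value) == -1) {
    return errno;
  }
  return 0;
}

// Hypothetical helper (not part of perfect).
// Read the nice value of the calling process. getpriority(2) can return -1
// as a valid result, so errno is cleared first to distinguish errors.
bool get_nice(int &value) {
  errno = 0;
  int v = getpriority(PRIO_PROCESS, 0);
  if (v == -1 && errno != 0) {
    return false;
  }
  value = v;
  return true;
}
```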
--- a/include/perfect/result.hpp
+++ b/include/perfect/result.hpp
@@ -12,11 +12,17 @@
 #include <nvml.h>
 #endif

+#ifdef __linux__
+#include <cerrno>
+#endif
+
 namespace perfect {

 enum class Result {
  NO_PERMISSION,
  NOT_SUPPORTED,
+  NO_TASK,
+  ERRNO_INVALID,
  NVML_NO_PERMISSION,
  NVML_NOT_SUPPORTED,
  NVML_UNINITIALIZED,
@@ -38,6 +44,23 @@ Result from_nvml(nvmlReturn_t nvml) {
  case NVML_ERROR_INVALID_ARGUMENT:
  case NVML_ERROR_GPU_IS_LOST:
  case NVML_ERROR_UNKNOWN:
+  case NVML_ERROR_ALREADY_INITIALIZED:
+  case NVML_ERROR_NOT_FOUND:
+  case NVML_ERROR_INSUFFICIENT_SIZE:
+  case NVML_ERROR_INSUFFICIENT_POWER:
+  case NVML_ERROR_DRIVER_NOT_LOADED:
+  case NVML_ERROR_TIMEOUT:
+  case NVML_ERROR_IRQ_ISSUE:
+  case NVML_ERROR_LIBRARY_NOT_FOUND:
+  case NVML_ERROR_FUNCTION_NOT_FOUND:
+  case NVML_ERROR_CORRUPTED_INFOROM:
+  case NVML_ERROR_RESET_REQUIRED:
+  case NVML_ERROR_OPERATING_SYSTEM:
+  case NVML_ERROR_LIB_RM_VERSION_MISMATCH:
+  case NVML_ERROR_IN_USE:
+  case NVML_ERROR_MEMORY:
+  case NVML_ERROR_NO_DATA:
+  case NVML_ERROR_VGPU_ECC_NOT_SUPPORTED:
  default:
    assert(0 && "unhandled nvmlReturn_t");
  }
@@ -45,12 +68,28 @@ Result from_nvml(nvmlReturn_t nvml) {
 }
 #endif

+#ifdef __linux__
+Result from_errno(int err) {
+  switch (err) {
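+  case EPERM:
+  case EACCES:
+    return Result::NO_PERMISSION;
+  case ENOENT:
+    return Result::NOT_SUPPORTED;
+  case ESRCH:
+    return Result::NO_TASK;
+  case EINVAL:
+    return Result::ERRNO_INVALID;
+  default:
+    assert(0 && "unhandled errno");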
+  }
+  return Result::UNKNOWN;
+}
+#endif
+
 const char *get_string(const Result &result) {
  switch (result) {
  case Result::SUCCESS:
    return "success";
  case Result::NO_PERMISSION:
    return "no permission";
+  case Result::NOT_SUPPORTED:
+    return "unsupported operation";
+  case Result::NO_TASK:
+    return "no such task";
+  case Result::ERRNO_INVALID:
+    return "errno EINVAL";
  case Result::UNKNOWN:
    return "unknown error";
  case Result::NVML_NOT_SUPPORTED:
@@ -59,8 +98,7 @@ const char *get_string(const Result &result) {
    return "nvidia-ml returned no permission";
  case Result::NVML_UNINITIALIZED:
    return "nvidia-ml returned uninitialized";
-  case Result::NOT_SUPPORTED:
-    return "unsupported operation";
+
  default:
    assert(0 && "unexpected perfect::Result");
  }
@@ -81,11 +119,11 @@ inline void check(Result result, const char *file, const int line) {

 #define PERFECT(stmt) check(stmt, __FILE__, __LINE__);

-#define PERFECT_SUCCESS_OR_RETURN(stmt) \
-{\
-  Result _ret; \
-  _ret = (stmt); \
-if (_ret != Result::SUCCESS) {\
-  return _ret;\
-}\
-}
+#define PERFECT_SUCCESS_OR_RETURN(stmt)                                        \
+  {                                                                            \
+    Result _ret;                                                               \
+    _ret = (stmt);                                                             \
+    if (_ret != Result::SUCCESS) {                                             \
+      return _ret;                                                             \
+    }                                                                          \
+  }
--- a/tools/CMakeLists.txt
+++ b/tools/CMakeLists.txt
@@ -52,6 +52,12 @@ target_link_libraries(max-os-perf perfect)
 add_executable(min-os-perf min_os_perf.cpp)
 target_link_libraries(min-os-perf perfect)

+add_executable(addrs addrs.cpp)
+
+add_executable(perfect-cli perfect.cpp)
+target_link_libraries(perfect-cli perfect)
+target_include_directories(perfect-cli PUBLIC thirdparty)
+
 ## OpenMP
 find_package(OpenMP)
 if (OpenMP_FOUND)
--- a/tools/addrs.cpp
+++ b/tools/addrs.cpp
@@ -0,0 +1,9 @@
+#include <cstdint>
+#include <iostream>
+
+int main(void) {
+    int *a = new int;
+    std::cout << "main:  " << uintptr_t(main) << "\n";
+    std::cout << "stack: " << uintptr_t(&a) << "\n";
+    std::cout << "heap:  " << uintptr_t(a) << "\n";
+    delete a;
+}
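`tools/addrs` prints the addresses that the README's `tools/no-aslr` example compares. `no_aslr.cpp` isn't part of this diff, but on Linux the usual mechanism is the `personality(2)` flag `ADDR_NO_RANDOMIZE` (the same flag `setarch -R` sets). It is inherited across `execve` and applies to the address space the new program gets. The library's `detail::set_personality` appears to wrap the same system call. A minimal sketch of setting the flag, assuming a launcher that then execs the target; `disable_aslr_for_next_exec` is a hypothetical name:

```cpp
#include <cassert>
#include <cerrno>

#include <sys/personality.h>

// Hypothetical helper (not part of perfect).
// Add ADDR_NO_RANDOMIZE to the calling process's persona. The current
// address space is unchanged; the flag affects the next execve(2).
// Returns 0 or the errno from personality(2).
int disable_aslr_for_next_exec() {
  // 0xffffffff queries the current persona without changing it
  int current = personality(0xffffffff);
  if (current == -1) {
    return errno;
  }
  if (personality(static_cast<unsigned long>(current) | ADDR_NO_RANDOMIZE) ==
      -1) {
    return errno;
  }
  return 0;
}
```

A wrapper would call this and then `execvp(argv[1], argv + 1)`. Some container runtimes restrict `personality` through seccomp, in which case the call may fail with `EPERM`.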
--- a/tools/migrate-to-cpuset.sh
+++ b/tools/migrate-to-cpuset.sh
@@ -0,0 +1,14 @@
+#! /bin/bash
+
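+# move every task in the unshielded and shielded cpusets back to the root
+# cpuset; rmdir only succeeds on a cpuset with no attached tasks
+while read -r i; do
+  echo "$i"
+  echo "$i" > /dev/cpuset/tasks
+done < /dev/cpuset/unshielded/tasks
+
+while read -r i; do
+  echo "$i"
+  echo "$i" > /dev/cpuset/tasks
+done < /dev/cpuset/shielded/tasks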
+
+rmdir /dev/cpuset/shielded
+rmdir /dev/cpuset/unshielded
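`fork_child` in `tools/perfect.cpp`, whose diff follows, uses the classic fork/dup2/exec/waitpid pattern. Two details are easy to get wrong. First, `dup2` returns the new descriptor (not 0) on success. Second, a child whose `exec` fails must `_exit` rather than return into the parent's code. A minimal sketch with only stdout redirection. `run_child` is a hypothetical helper, exit code 127 follows the shell convention that `fork_child` already checks for, and 126 for a failed `dup2` is an arbitrary choice here:

```cpp
#include <cassert>
#include <cerrno>
#include <cstdio>
#include <cstring>

#include <fcntl.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical helper (not part of perfect).
// Run argv (NULL-terminated) in a child, optionally redirecting its stdout to
// outFd (-1 means no redirection). Returns the child's exit status (127 if
// exec failed), or -1 if fork/waitpid failed or the child was killed.
int run_child(char *const argv[], int outFd) {
  pid_t pid = fork();
  if (pid == -1) {
    return -1;
  }
  if (pid == 0) {
    // child: dup2 returns the new descriptor on success, -1 on error
    if (outFd >= 0 && outFd != STDOUT_FILENO) {
      if (dup2(outFd, STDOUT_FILENO) == -1) {
        _exit(126);
      }
      close(outFd);  // stdout now refers to the same file
    }
    execvp(argv[0], argv);
    // only reached if exec failed; _exit skips the parent's atexit handlers
    std::fprintf(stderr, "execvp %s: %s\n", argv[0], std::strerror(errno));
    _exit(127);
  }
  int status = 0;
  while (waitpid(pid, &status, 0) == -1) {
    if (errno != EINTR) {  // retry only if interrupted by a signal
      return -1;
    }
  }
  return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```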
--- a/tools/perfect.cpp
+++ b/tools/perfect.cpp
@@ -0,0 +1,446 @@
+#include <cassert>
+#include <cerrno>
+#include <chrono>
+#include <functional>
+#include <iostream>
+#include <string>
+#include <thread>
+#include <vector>
+
+#ifdef __linux__
+#include <fcntl.h>
+#include <pwd.h>
+#include <signal.h>
+#include <sys/stat.h>
+#include <sys/types.h>
+#include <sys/wait.h>
+#include <unistd.h>
+#else
+#error "unsupported platform"
+#endif
+
+#include "clipp/clipp.h"
+#include "nonstd/optional.hpp"
+
+#include "perfect/aslr.hpp"
+#include "perfect/cpu_set.hpp"
+#include "perfect/cpu_turbo.hpp"
+#include "perfect/detail/os/linux.hpp"
+#include "perfect/drop_caches.hpp"
+#include "perfect/os_perf.hpp"
+#include "perfect/priority.hpp"
+
+typedef std::function<perfect::Result()> CleanupFn;
+std::vector<CleanupFn> cleanups;
+
+// restore the system state to how we found it
+void cleanup(int dummy) {
+  (void)dummy;
+  std::cerr << "caught ctrl-c\n";
+
+  // unregister our handler
+  signal(SIGINT, SIG_DFL);
+  std::cerr << "cleaning up\n";
+  std::cerr << "ctrl-c again to quit\n";
+
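+  for (auto f : cleanups) {
+    perfect::Result result = f();
+    if (perfect::Result::SUCCESS != result) {
+      std::cerr << "cleanup failed: " << perfect::get_string(result) << "\n";
+    }
+  }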
+
+  exit(EXIT_FAILURE);
+}
+
+// Run argv (which must be NULL-terminated) in a child process and wait for it.
+// outf and errf are file descriptors to which the child's stdout and stderr
+// are redirected; values <= 0 mean no redirection.
+int fork_child(char *const *argv, int outf, int errf) {
+
+  pid_t pid;
+  int status;
+  pid = fork();
+  if (pid == -1) {
+    // pid == -1 means an error occurred
+    std::cerr << "fork failed: " << strerror(errno) << "\n";
+    return EXIT_FAILURE;
+  } else if (pid == 0) {
+    // in the child process
+
+    if (outf > 0) {
+      std::cerr << "redirecting child stdout to file\n";
+      // dup2 returns the new descriptor on success and -1 on error
+      // (EBADF, EBUSY, EINTR, or EMFILE; see dup2(2))
+      if (dup2(outf, STDOUT_FILENO) == -1) {
+        std::cerr << "dup2 error: " << strerror(errno) << "\n";
+      }
+
+      // close(2) may fail with EBADF, EINTR, or EIO; the descriptor is no
+      // longer needed after dup2, so a failure here is ignored
+      close(outf);
+    }
+
+    if (errf > 0) {
+      std::cerr << "redirecting child stderr to file\n";
+      if (dup2(errf, STDERR_FILENO) == -1) {
+        std::cerr << "dup2 error: " << strerror(errno) << "\n";
+      }
+      close(errf);
+    }
+
+    // execvp only returns if an error occurred (its return value is -1).
+    // Exit with 127, which the parent reports as an exec failure, instead of
+    // returning into the caller as a duplicate process.
+    execvp(argv[0], argv);
+    std::cerr << "execvp " << argv[0] << ": " << strerror(errno) << "\n";
+    _exit(127);
+  } else {
+    // parent process
+    if (waitpid(pid, &status, 0) > 0) {
+
+      if (WIFEXITED(status) && !WEXITSTATUS(status)) {
+        // success
+        return status;
+      }
+
+      else if (WIFEXITED(status) && WEXITSTATUS(status)) {
+        if (WEXITSTATUS(status) == 127) {
+          std::cerr << "execvp failed\n";
+          return status;
+        } else {
+          std::cerr << "program terminated normally, but returned a non-zero "
+                       "status\n";
+          return status;
+        }
+      } else {
+        std::cerr << "program didn't terminate normally\n";
+        return status;
+      }
+    } else {
+      std::cerr << "waitpid() failed: " << strerror(errno) << "\n";
+      return EXIT_FAILURE;
+    }
+  }
+}
+
+int main(int argc, char **argv) {
+
+  signal(SIGINT, cleanup);
+
+  using namespace clipp;
+
+  size_t numUnshielded = 0;
+  size_t numShielded = 0;
+  bool aslr = false;
+  nonstd::optional<bool> cpuTurbo = false;
+  nonstd::optional<bool> maxOsPerf = true;
+  bool dropCaches = true;
+  bool highPriority = true;
+
+  std::vector<std::string> program;
+  std::string stdoutPath;
+  std::string stderrPath;
+  int iters = 1;
+  int sleepMs = 1000;
+
+  bool help = false;
+
+  auto helpMode = option("-h", "--help").set(help).doc("show help");
+
+  auto shieldGroup = ((option("-u").doc("number of unshielded CPUs") &
+                       value("INT", numUnshielded)) |
+                      (option("-s").doc("number of shielded CPUs") &
+                       value("INT", numShielded)));
+
+  auto noModMode = (option("--no-mod")
+                        .doc("don't control performance")
+                        .set(aslr, true)
+                        .call([&]() { cpuTurbo = nonstd::nullopt; })
+                        .call([&]() { maxOsPerf = nonstd::nullopt; })
+                        .set(dropCaches, false)
+                        .set(highPriority, false));
+
+  auto modMode = (shieldGroup,
+                  option("--no-drop-cache")
+                      .set(dropCaches, false)
+                      .doc("do not drop filesystem caches"),
+                  option("--no-max-perf").doc("do not max os perf").call([&]() {
+                    maxOsPerf = false;
+                  }),
+                  option("--aslr").set(aslr, true).doc("enable ASLR"),
+                  option("--no-priority")
+                      .set(highPriority, false)
+                      .doc("don't set high priority"),
+                  option("--cpu-turbo").doc("enable CPU turbo").call([&]() {
+                    cpuTurbo = true;
+                  }),
+                  (option("--stdout").doc("redirect child stdout") &
+                   value("PATH", stdoutPath)),
+                  (option("--stderr").doc("redirect child stderr") &
+                   value("PATH", stderrPath)));
+
+  auto cli =
+      helpMode |
+      ((noModMode | modMode),
+       (option("--sleep-ms").doc("sleep before run") & value("INT", sleepMs)),
+       (option("-n").doc("run multiple times") & value("INT", iters)), helpMode,
+       // run everything after "--"
+       required("--") & greedy(values("cmd", program)));
+
+  if (!parse(argc, argv, cli)) {
+    auto fmt = doc_formatting{}.doc_column(31);
+    std::cout << make_man_page(cli, argv[0], fmt);
+    return -1;
+  }
+
+  if (help) {
+    auto fmt = doc_formatting{}.doc_column(31);
+    std::cout << make_man_page(cli, argv[0], fmt);
+    return 0;
+  }
+
+  // open the redirect files, if needed
+  int errf = 0;
+  int outf = 0;
+  if (!stderrPath.empty()) {
+    std::cerr << "open " << stderrPath << "\n";
+    errf = open(stderrPath.c_str(), O_WRONLY | O_CREAT | O_TRUNC,
+                S_IRUSR | S_IWUSR | S_IRGRP | S_IROTH);
+    if (-1 == errf) {
+      std::cerr << "error while opening " << stderrPath << ": "
+                << strerror(errno) << "\n";
+      return EXIT_FAILURE;
+    }
+  }
+  if (!stdoutPath.empty()) {
+    std::cerr << "open " << stdoutPath << "\n";
+    outf = open(stdoutPath.c_str(), O_WRONLY | O_CREAT | O_TRUNC,
+                S_IRUSR | S_IWUSR | S_IRGRP | S_IROTH);
+    if (-1 == outf) {
+      std::cerr << "error while opening " << stdoutPath << ": "
+                << strerror(errno) << "\n";
+      return EXIT_FAILURE;
+    }
+  }
+
+  // if called with sudo, chown the files to whoever called sudo
+  const char *sudoUser = std::getenv("SUDO_USER");
+  if (sudoUser) {
+    std::cerr << "called with sudo by " << sudoUser << "\n";
+    struct passwd *pwd = getpwnam(sudoUser);
+    if (pwd == NULL) {
+      std::cerr << "failed to look up user " << sudoUser
+                << "; output files keep their current owner\n";
+    } else {
+      const uid_t uid = pwd->pw_uid;
+      const gid_t gid = pwd->pw_gid;
+
+      if (!stdoutPath.empty() && chown(stdoutPath.c_str(), uid, gid) == -1) {
+        std::cerr << "chown " << stdoutPath << ": " << strerror(errno) << "\n";
+      }
+      if (!stderrPath.empty() && chown(stderrPath.c_str(), uid, gid) == -1) {
+        std::cerr << "chown " << stderrPath << ": " << strerror(errno) << "\n";
+      }
+    }
+  }
+
+  // build the program arguments
+  std::vector<char *> args;
+  for (auto &c : program) {
+    args.push_back(const_cast<char *>(c.c_str()));
+  }
+  args.push_back(nullptr);
+
+  // init the perfect library
+  PERFECT(perfect::init());
+
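+  auto cpus = perfect::cpus();
+  if (numShielded > cpus.size() || numUnshielded > cpus.size()) {
+    std::cerr << "requested more CPUs than the " << cpus.size()
+              << " available\n";
+    return EXIT_FAILURE;
+  }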
+  if (0 < numShielded) {
+    numUnshielded = cpus.size() - numShielded;
+  } else if (0 < numUnshielded) {
+    numShielded = cpus.size() - numUnshielded;
+  }
+
+  // handle CPU shielding
+  perfect::CpuSet root, shielded, unshielded;
+  if (numShielded) {
+    std::cerr << "shielding " << numShielded << " cpus\n";
+
+    PERFECT(perfect::CpuSet::get_root(root));
+    PERFECT(root.make_child(shielded, "shielded"));
+    PERFECT(root.make_child(unshielded, "unshielded"));
+
+    std::cerr << "enable memory\n";
+    PERFECT(shielded.enable_mem(0));
+    PERFECT(unshielded.enable_mem(0));
+
+    std::cerr << "enable cpus\n";
+    size_t i = 0;
+    for (; i < cpus.size() - numShielded; ++i) {
+      std::cerr << "unshield cpu " << cpus[i] << "\n";
+      PERFECT(unshielded.enable_cpu(cpus[i]));
+    }
+    for (; i < cpus.size(); ++i) {
+      std::cerr << "shield cpu " << cpus[i] << "\n";
+      PERFECT(shielded.enable_cpu(cpus[i]));
+    }
+
+    std::cerr << "migrate self\n";
+    PERFECT(root.migrate_self_to(shielded));
+    std::cerr << "migrate other (1/2)\n";
+    PERFECT(root.migrate_tasks_to(unshielded));
+    // some tasks may have been spawned by unmigrated tasks while we migrated
+    std::cerr << "migrate other (2/2)\n";
+    PERFECT(root.migrate_tasks_to(unshielded));
+
+    cleanups.push_back(CleanupFn([&] {
+      std::cerr << "cleanup: shielded cpu set\n";
+      shielded.destroy();
+      std::cerr << "cleanup: unshielded cpu set\n";
+      unshielded.destroy();
+      return perfect::Result::SUCCESS;
+    }));
+  }
+
+  // handle aslr
+  if (!aslr) {
+    std::cerr << "disable ASLR for this process\n";
+    PERFECT(perfect::disable_aslr());
+  }
+
+  // handle CPU turbo
+  perfect::CpuTurboState cpuTurboState;
+  if (cpuTurbo.has_value()) {
+    PERFECT(perfect::get_cpu_turbo_state(&cpuTurboState));
+    if (false == cpuTurbo) {
+      std::cerr << "disabling cpu turbo\n";
+      PERFECT(perfect::disable_cpu_turbo());
+    } else {
+      std::cerr << "enabling cpu turbo\n";
+      PERFECT(perfect::enable_cpu_turbo());
+    }
+
+    cleanups.push_back(CleanupFn([&] {
+      std::cerr << "cleanup: restore CPU turbo state\n";
+      return perfect::set_cpu_turbo_state(cpuTurboState);
+    }));
+  }
+
+  // handle governor
+  perfect::OsPerfState osPerfState;
+  if (maxOsPerf.has_value()) {
+    PERFECT(perfect::get_os_perf_state(osPerfState));
+    if (true == maxOsPerf) {
+      std::cerr << "set max performance state\n";
+      for (auto cpu : perfect::cpus()) {
+        PERFECT(perfect::os_perf_state_maximum(cpu));
+      }
+    }
+
+    cleanups.push_back(CleanupFn([&] {
+      std::cerr << "cleanup: os governor\n";
+      return perfect::set_os_perf_state(osPerfState);
+    }));
+  }
+
+  if (highPriority) {
+    std::cerr << "set high priority\n";
+    PERFECT(perfect::set_high_priority());
+  }
+
+  // run the command `iters` times; the parent waits for each child to finish
+  for (int runIter = 0; runIter < iters; ++runIter) {
+
+    // drop filesystem caches before each run
+    if (dropCaches) {
+      std::cerr << "clearing file system cache\n";
+      PERFECT(perfect::drop_caches());
+    }
+
+    // sleep before each run
+    if (sleepMs) {
+      std::cerr << "sleep " << sleepMs << " ms before run\n";
+      std::this_thread::sleep_for(std::chrono::milliseconds(sleepMs));
+    }
+
+    std::cerr << "exec ";
+    for (size_t i = 0; i < args.size() - 1; ++i) {
+      std::cerr << args[i] << " ";
+    }
+    std::cerr << "\n";
+
+    int status = fork_child(args.data(), outf, errf);
+    if (0 != status) {
+      std::cerr << "did not terminate successfully\n";
+    }
+    std::cerr << "finished execution\n";
+  }
+
+  // clean up CpuSets (if needed)
+  if (numShielded) {
+    std::cerr << "clean up cpu sets\n";
+    shielded.destroy();
+    unshielded.destroy();
+  }
+
+  // restore original turbo state
+  if (cpuTurbo.has_value()) {
+    std::cerr << "restore CPU turbo\n";
+    PERFECT(perfect::set_cpu_turbo_state(cpuTurboState));
+  }
+
+  if (maxOsPerf.has_value()) {
+    std::cerr << "restore os performance state\n";
+    PERFECT(perfect::set_os_perf_state(osPerfState));
+  }
+
+  return 0;
+}
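A standalone sketch of the fork/exec/wait pattern that `fork_child()` implements, assuming only POSIX and no `perfect` headers. The names `open_redirect` and `run_child` are hypothetical, a negative fd means "inherit", and the return-value mapping (127 for a failed `execvp`, 128 + signal number for a killed child) is an assumption borrowed from shell convention, not what the tool returns.

```cpp
#include <cassert>
#include <cerrno>
#include <cstring>
#include <iostream>
#include <string>
#include <vector>

#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Open (create or truncate) a file to receive a child's output, with the
// same 0644 mode the tool uses. Returns -1 and prints the reason on failure.
int open_redirect(const std::string &path) {
  int fd = open(path.c_str(), O_WRONLY | O_CREAT | O_TRUNC,
                S_IRUSR | S_IWUSR | S_IRGRP | S_IROTH);
  if (fd == -1) {
    std::cerr << "open " << path << ": " << std::strerror(errno) << "\n";
  }
  return fd;
}

// Run cmd in a child process. A negative outFd/errFd means "inherit".
// Returns the child's exit code, 128 + signal number if a signal killed it,
// or -1 if fork()/waitpid() failed.
int run_child(std::vector<std::string> cmd, int outFd, int errFd) {
  assert(!cmd.empty());
  std::vector<char *> argv;
  for (std::string &arg : cmd) {
    argv.push_back(&arg[0]); // execvp wants char *, not const char *
  }
  argv.push_back(nullptr);

  pid_t pid = fork();
  if (pid == -1) {
    return -1;
  }
  if (pid == 0) {
    // child: redirect, then replace this process image
    if (outFd >= 0 && dup2(outFd, STDOUT_FILENO) == -1) _exit(126);
    if (errFd >= 0 && dup2(errFd, STDERR_FILENO) == -1) _exit(126);
    execvp(argv[0], argv.data());
    _exit(127); // only reached if execvp failed
  }

  // parent: wait, retrying if a signal interrupts waitpid()
  int status = 0;
  while (waitpid(pid, &status, 0) == -1) {
    if (errno != EINTR) {
      return -1;
    }
  }
  if (WIFEXITED(status)) {
    return WEXITSTATUS(status);
  }
  if (WIFSIGNALED(status)) {
    return 128 + WTERMSIG(status);
  }
  return -1;
}
```

Calling `_exit()` rather than `exit()` or `return` in the child skips flushing stdio buffers inherited from the parent and keeps the child out of the caller's loop.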
--- a/tools/thirdparty/clipp/clipp.h
+++ b/tools/thirdparty/clipp/clipp.h
--- a/tools/thirdparty/nonstd/optional.hpp
+++ b/tools/thirdparty/nonstd/optional.hpp
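`main()` registers restore actions in `cleanups`, repeats the same restores at the end of the run, and installs `cleanup` as the SIGINT handler, which presumably runs the registered actions. A sketch of an alternative, assuming a hypothetical `CleanupStack` in place of the tool's `CleanupFn` list: the handler only sets a `sig_atomic_t` flag (iostreams are not async-signal-safe), and a single `run()` restores state in the reverse order it was changed, exactly once.

```cpp
#include <cassert>
#include <csignal>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical replacement for the tool's list of CleanupFn objects.
class CleanupStack {
public:
  void push(std::function<void()> fn) {
    assert(fn); // an empty std::function would throw when called
    fns_.push_back(std::move(fn));
  }

  // Run each registered action once, most recently registered first, so
  // state is restored in the reverse order it was changed.
  void run() {
    while (!fns_.empty()) {
      std::function<void()> fn = std::move(fns_.back());
      fns_.pop_back();
      fn();
    }
  }

private:
  std::vector<std::function<void()>> fns_;
};

// The handler only records that SIGINT arrived; the main loop acts on it.
volatile std::sig_atomic_t g_interrupted = 0;

void on_sigint(int) { g_interrupted = 1; }
```

In the run loop, checking `if (g_interrupted) break;` after each child finishes lets Ctrl-C end the loop (the terminal also delivers SIGINT to the child, which shares the foreground process group), and the restores then happen in one place.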
Author	SHA1	Message	Date
Carl Pearson	fabfecd306	Update README.md	2019-10-02 07:34:58 -05:00
Carl Pearson	4c0eabed89	add a tool to fix broken cpusets	2019-10-01 14:48:00 -05:00
Carl Pearson	46ca4d00ef	perfect-cli cleans up on SIGINT, fixed a problem where cpu_set would silently fail	2019-10-01 14:31:36 -05:00
Carl Pearson	bbda6e1262	add interface for scheduling priority	2019-10-01 06:55:50 -05:00
Carl Pearson	343b2b35ca	remove test from actions on CUDA job	2019-09-30 15:08:08 -05:00
Carl Pearson	c28e7b0945	add -h --help flag	2019-09-30 13:23:25 -05:00
Carl Pearson	46aa8c85ac	run build/tools/perfect-cli -h in test step	2019-09-30 13:07:19 -05:00
Carl Pearson	7b6332c90e	add test -h to binary	2019-09-30 12:07:29 -05:00
Carl Pearson	cc92923509	drop fs caches before each iteration	2019-09-30 12:04:52 -05:00
Carl Pearson	09e8757f72	.	2019-09-30 11:56:08 -05:00
Carl Pearson	1695ebb8ea	Add -n flag, change --no-aslr to --aslr, add --stdout and --stderr, chown outputs when run with sudo	2019-09-30 11:51:04 -05:00
Carl Pearson	158bffa61f	always change CPU turbo state	2019-09-26 12:30:10 -05:00
Carl Pearson	057fec7411	--no-cpu-turbo -> --cpu-turbo	2019-09-26 12:24:37 -05:00
Carl Pearson	a8d83417e8	add drop fs caches to tools/perfect-cli	2019-09-26 11:02:53 -05:00
Carl Pearson	1b3cf604a8	OsPerfState saves for all CPUs	2019-09-26 10:58:01 -05:00
Carl Pearson	d576ac099d	add tools/perfect-cli	2019-09-26 10:37:26 -05:00
Carl Pearson	aff90d408e	add NO_TASK result	2019-09-26 10:37:14 -05:00
Carl Pearson	6ace6932a7	simplify addrs	2019-09-26 08:56:46 -05:00
Carl Pearson	33243fe3bb	add some discussion of ASLR tools	2019-09-25 15:49:20 -05:00
Carl Pearson	64eb67cc2d	add tools/addrs	2019-09-25 15:45:18 -05:00
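The CPU split in `main()` subtracts a user-supplied count from `cpus.size()`; both are `size_t`, so without a bounds check the result wraps around when the count exceeds the number of CPUs. A sketch of the same split with the bounds checked, using a hypothetical `split_cpus` helper over plain CPU ids. Rejecting a request that would leave the unshielded set with no CPUs is an assumption (that set receives every migrated task), not something the tool does.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// CPU ids on each side of the shield.
struct CpuSplit {
  std::vector<int> unshielded;
  std::vector<int> shielded;
};

// Hypothetical helper: pass at most one non-zero count, since -s and -u are
// alternatives. Returns false instead of wrapping around when a count is
// larger than the number of CPUs.
bool split_cpus(const std::vector<int> &cpus, std::size_t numShielded,
                std::size_t numUnshielded, CpuSplit &out) {
  assert(numShielded == 0 || numUnshielded == 0);
  const std::size_t n = cpus.size();
  if (numShielded > n || numUnshielded > n) {
    return false;
  }
  if (numUnshielded > 0) {
    numShielded = n - numUnshielded;
  }
  // assumption: keep at least one CPU for the tasks migrated off the shield
  if (numShielded > 0 && numShielded == n) {
    return false;
  }
  // leading CPUs stay unshielded, trailing CPUs are shielded, as in main()
  const auto firstShielded = static_cast<std::ptrdiff_t>(n - numShielded);
  out.unshielded.assign(cpus.begin(), cpus.begin() + firstShielded);
  out.shielded.assign(cpus.begin() + firstShielded, cpus.end());
  return true;
}
```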