From 84bdee85ce64d6e5d9b82e69299bddecad72edfc Mon Sep 17 00:00:00 2001
From: Carl Pearson
Date: Mon, 29 Nov 2021 07:52:27 -0800
Subject: [PATCH] readme

---
 README.md | 16 +++++++++++++---
 1 file changed, 13 insertions(+), 3 deletions(-)

diff --git a/README.md b/README.md
index 566406a..4148540 100644
--- a/README.md
+++ b/README.md
@@ -14,10 +14,20 @@ poetry run python list.py
 poetry run python download.py
 ```

-## Notes
+To download all datasets
+```
+poetry run python download.py all
+```

-ssgetpy does not discriminate "real" datatype from "integer".
-Therefore, we have to do some filtering on the returned results.
+## What it does
+
+Downloads subsets of the suitesparse collection to different directories.
+For a given subset, the matrices are first downloaded to the `$SCRATCH/suitesparse` folder. If a required matrix already exists there, it is not redownloaded.
+Then, a relative symlink is created from the `$SCRATCH//.mtx` file to the corresponding `.mtx` file in `$SCRATCH/suitesparse`.
+
+This makes use of a [fork of the `ssgetpy`](github.com/cwpearson/ssgetpy) package with a faster download limit.
+ssgetpy does not discriminate "real" datatype from "integer" datatype, as shown on the suitesparse collection website.
+Therefore, `lists.py` maintains a manually-curated list of `integer` datatype matrices to facilitate discrimination.

 ## how this was done
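
The download-then-symlink behavior described under "What it does" can be sketched roughly as follows. This is a minimal illustration, not the code in `download.py`: the function name `link_matrix`, the `fetch` callback, and the `<scratch>/<subset>/<name>.mtx` link layout are all hypothetical, since the README does not spell out the subset path, and the real download goes through the `ssgetpy` fork rather than a callback.

```python
import os
from pathlib import Path
from typing import Callable


def link_matrix(scratch: Path, subset: str, name: str,
                fetch: Callable[[Path], None]) -> Path:
    """Ensure <name>.mtx is in the shared store, then symlink it into a subset dir.

    `fetch` is a hypothetical stand-in for the actual download step; it is
    called with the destination path only when the file is not already there.
    """
    store = scratch / "suitesparse"
    store.mkdir(parents=True, exist_ok=True)
    target = store / f"{name}.mtx"
    if not target.exists():
        fetch(target)  # skipped when the matrix was downloaded earlier

    # Assumed layout: one directory per subset, directly under scratch.
    subset_dir = scratch / subset
    subset_dir.mkdir(parents=True, exist_ok=True)
    link = subset_dir / f"{name}.mtx"
    if not link.is_symlink():
        # A relative target keeps the link valid if the scratch tree is moved.
        link.symlink_to(os.path.relpath(target, start=subset_dir))
    return link
```

Calling `link_matrix` twice for the same matrix, even from two different subsets, triggers at most one download, and each subset directory ends up holding only a small relative link into the shared store.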