ss-downloader
pipenv shell
pip install -r requirements.txt
how to use
source load-env.sh
python list.py
python download.py <dataset name>
To download all datasets:
python download.py all
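The `all` argument suggests a simple dispatch: a single subset name selects one subset, while `all` expands to every known subset. The following is a minimal sketch under one assumption: it uses a hypothetical `SUBSETS` registry whose placeholder names are not the real ones, which are whatever `python list.py` prints.

```python
from __future__ import annotations

import argparse

# Hypothetical registry of subset names; the real list comes from list.py.
SUBSETS = ["subset_a", "subset_b", "subset_c"]


def resolve_targets(name: str) -> list[str]:
    """Map the command-line argument to the subsets that should be downloaded."""
    if name == "all":
        return list(SUBSETS)
    if name not in SUBSETS:
        raise SystemExit(f"unknown dataset {name!r}; run list.py to see the options")
    return [name]


def main(argv: list[str] | None = None) -> list[str]:
    parser = argparse.ArgumentParser(description="Download SuiteSparse subsets (sketch).")
    parser.add_argument("dataset", help="subset name, or 'all'")
    args = parser.parse_args(argv)
    targets = resolve_targets(args.dataset)
    for subset in targets:
        print(f"would download {subset}")  # placeholder for the real download step
    return targets


if __name__ == "__main__":
    main()
```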
Because the symlinks are relative, you can move the dataset directories to another location. For example:
rsync -azvh $SCRATCH/ ~/cfm_m3918/pearson
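You can check directly why relative links make the tree relocatable. A link stores a path like `../suitesparse/<matrix>.mtx`, which is resolved from the link's own directory, so it still points at the right file after the whole tree is moved. The demonstration below is self-contained, runs in a temporary directory, and uses made-up file and subset names.

```python
import os
import shutil
import tempfile
from pathlib import Path


def build_tree(root: Path) -> Path:
    """Create suitesparse/<matrix>.mtx plus a relative link to it from a subset folder."""
    store = root / "suitesparse"
    subset = root / "example_subset"  # hypothetical subset name
    store.mkdir(parents=True)
    subset.mkdir()
    target = store / "example.mtx"
    target.write_text("%%MatrixMarket matrix coordinate real general\n")
    link = subset / "example.mtx"
    # Store the target relative to the link's directory, not as an absolute path.
    link.symlink_to(os.path.relpath(target, start=subset))
    return link


def demo() -> tuple:
    with tempfile.TemporaryDirectory() as tmp:
        old_root = Path(tmp) / "old"
        link = build_tree(old_root)
        new_root = Path(tmp) / "new"
        shutil.move(str(old_root), str(new_root))  # relocate the whole tree
        moved_link = new_root / link.relative_to(old_root)
        # The link text is unchanged and still resolves inside the new location.
        return os.readlink(moved_link), moved_link.read_text()
```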
how to use (unsupported platform)
Set the SS_DIR environment variable to the directory where you want the dataset folders to be generated.
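On a supported platform the scripts write under `$SCRATCH`, and elsewhere `SS_DIR` names the output directory. The sketch below shows one way that lookup could work. Two things in it are assumptions this README does not confirm: that `SS_DIR` takes precedence when both variables are set, and the helper name `base_dir`, which is hypothetical.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def base_dir(env: Optional[Mapping[str, str]] = None) -> Path:
    """Return the root under which dataset folders are created (hypothetical helper)."""
    env = os.environ if env is None else env
    for var in ("SS_DIR", "SCRATCH"):  # assumed precedence: SS_DIR, then SCRATCH
        value = env.get(var)
        if value:
            return Path(value).expanduser()
    raise RuntimeError("set SS_DIR to the directory where dataset folders should go")
```

With this lookup, running `export SS_DIR=/some/data/dir` before `python download.py ...` would redirect all output there.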
What it does
Downloads subsets of the SuiteSparse Matrix Collection, each into its own directory.
For a given subset, the matrices are first downloaded to the $SCRATCH/suitesparse folder. If a required matrix already exists there, it is not redownloaded.
Then, a relative symlink is created from the $SCRATCH/<subset>/<matrix>.mtx file to the corresponding .mtx file in $SCRATCH/suitesparse.
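The two steps above can be sketched as follows: download into the shared `suitesparse` folder only if the matrix is missing, then link to it from the subset folder. The `fetch` callback stands in for the actual ssgetpy download call. Like the function name `link_matrix`, it is hypothetical.

```python
import os
from pathlib import Path
from typing import Callable


def link_matrix(base: Path, subset: str, matrix: str,
                fetch: Callable[[str, Path], None]) -> Path:
    """Ensure base/suitesparse/<matrix>.mtx exists, then link base/<subset>/<matrix>.mtx to it."""
    store = base / "suitesparse"
    store.mkdir(parents=True, exist_ok=True)
    target = store / f"{matrix}.mtx"
    if not target.exists():
        fetch(matrix, target)  # only download matrices that are not cached yet
    subset_dir = base / subset
    subset_dir.mkdir(parents=True, exist_ok=True)
    link = subset_dir / f"{matrix}.mtx"
    if not link.is_symlink():
        # A relative target keeps the link valid if the whole tree is moved.
        link.symlink_to(os.path.relpath(target, start=subset_dir))
    return link
```

Because the cache check happens before any download, two subsets that share a matrix trigger only one fetch.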
This uses a fork of the ssgetpy package with a higher download rate limit.
ssgetpy does not distinguish the "real" datatype from the "integer" datatype, although the SuiteSparse collection website shows them separately.
Therefore, we read https://sparse.tamu.edu/files/ss_index.mat to determine this metadata for each matrix.
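Once the index is loaded (for example with `scipy.io.loadmat`), you can read each matrix's datatype from its Rutherford-Boeing type code. The first letter of that code is `r` (real), `c` (complex), `i` (integer), or `p` (pattern). The sketch below assumes the index has already been turned into parallel lists of matrix names and `RBtype` codes. How those fields are laid out inside `ss_index.mat` is an assumption you should check against the file.

```python
from typing import Dict, Sequence

# First letter of a Rutherford-Boeing type code -> datatype label.
RB_FIRST_LETTER = {"r": "real", "c": "complex", "i": "integer", "p": "pattern"}


def datatype_from_rbtype(code: str) -> str:
    """Translate a code such as 'rua' or 'isa' into a datatype label."""
    try:
        return RB_FIRST_LETTER[code.strip().lower()[0]]
    except (IndexError, KeyError):
        raise ValueError(f"unrecognized RBtype {code!r}") from None


def datatypes_by_name(names: Sequence[str], rbtypes: Sequence[str]) -> Dict[str, str]:
    """Pair each matrix name with the datatype decoded from its RBtype entry."""
    if len(names) != len(rbtypes):
        raise ValueError("names and rbtypes must have the same length")
    return {name: datatype_from_rbtype(code) for name, code in zip(names, rbtypes)}
```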
Transfer data to a different filesystem
rsync -rzvh --links --info=progress2 pearson@cori.nersc.gov:$SS_DIR/ .
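`--links` makes rsync copy symlinks as symlinks rather than following them. The copied tree therefore stays valid only if every link is relative and points inside `$SS_DIR`. A small pre-transfer check along those lines could look like this. The helper name `non_portable_links` is hypothetical.

```python
import os
from pathlib import Path
from typing import List


def _inside(path: Path, root: Path) -> bool:
    """True if path lies under root (both already resolved)."""
    try:
        path.relative_to(root)
        return True
    except ValueError:
        return False


def non_portable_links(root: Path) -> List[Path]:
    """Return symlinks under root that are absolute or resolve outside root."""
    root = root.resolve()
    bad = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Links to directories show up in dirnames, links to files in filenames.
        for name in dirnames + filenames:
            path = Path(dirpath) / name
            if not path.is_symlink():
                continue
            target = os.readlink(path)
            resolved = (path.parent / target).resolve()
            if os.path.isabs(target) or not _inside(resolved, root):
                bad.append(path)
    return bad
```

An empty result means the tree can be copied with `--links` and relocated freely.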
how this was done
poetry init
poetry add ssgetpy
poetry install