ss-downloader

Install poetry & Python 3.8+

curl -sSL https://raw.githubusercontent.com/python-poetry/poetry/master/get-poetry.py | python -

How to use

source load-env.sh
poetry run python list.py
poetry run python download.py <dataset name>

To download all datasets

poetry run python download.py all

Because the symlinks are relative, you can move the datasets to another location without breaking them. For example:

rsync -azvh $SCRATCH/ ~/cfm_m3918/pearson

How to use (unsupported platform)

Set the SS_DIR environment variable to the directory where you want the dataset folders to be generated.
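
As a rough sketch of how a script might pick up this setting, the snippet below reads SS_DIR from the environment and fails with a clear message if it is unset. The helper name `dataset_root` is hypothetical, and this is not taken from the repository's code.

```python
import os
from pathlib import Path


def dataset_root(environ=os.environ) -> Path:
    """Return the directory named by SS_DIR (hypothetical helper)."""
    value = environ.get("SS_DIR")
    if not value:
        # Fail early rather than silently writing datasets somewhere unexpected.
        raise RuntimeError("SS_DIR is not set; export it before running the scripts")
    return Path(value).expanduser()
```

Passing the environment as an argument keeps the helper easy to exercise without changing the real process environment.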

What it does

Downloads subsets of the SuiteSparse collection to different directories. For a given subset, the matrices are first downloaded to the $SCRATCH/suitesparse folder. If a required matrix already exists there, it is not downloaded again. Then, a relative symlink is created at $SCRATCH/<subset>/<matrix>.mtx, pointing to the corresponding .mtx file in $SCRATCH/suitesparse.
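
The symlink layout described above can be sketched in a few lines of Python. This is a minimal illustration, not the repository's actual code: the helper name `link_matrix` is hypothetical, the download step is left out, and the sketch assumes the .mtx files sit directly inside the suitesparse folder.

```python
import os
from pathlib import Path


def link_matrix(root: Path, subset: str, matrix: str) -> Path:
    """Create root/<subset>/<matrix>.mtx as a relative symlink to
    root/suitesparse/<matrix>.mtx (hypothetical helper, for illustration)."""
    target = root / "suitesparse" / f"{matrix}.mtx"
    link = root / subset / f"{matrix}.mtx"
    link.parent.mkdir(parents=True, exist_ok=True)
    # Express the target relative to the link's own directory so the link
    # keeps working when the whole tree under root is moved or copied.
    rel_target = os.path.relpath(target, start=link.parent)
    if link.is_symlink():
        link.unlink()  # replace a stale link instead of failing
    os.symlink(rel_target, link)
    return link
```

Because the stored link text is a path such as ../suitesparse/<matrix>.mtx rather than an absolute path, copying or renaming the whole root directory leaves every subset link valid.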

This makes use of a fork of the ssgetpy package with a faster download limit. ssgetpy does not distinguish the "real" datatype from the "integer" datatype, which the SuiteSparse collection website lists separately. Therefore, we read https://sparse.tamu.edu/files/ss_index.mat to determine that metadata for each matrix.

Transfer data to a different filesystem

rsync -rzvh --links --info=progress2 pearson@cori.nersc.gov:$SS_DIR/ .

How this was done

poetry init
poetry add ssgetpy
poetry install