# ss-downloader ``` pipenv shell pip install -r requirements.txt ``` ## how to use ``` source load-env.sh python list.py python download.py ``` To download all datasets ``` python download.py all ``` You can move the datasets due to relative symlinks. For example: ``` rsync -azvh $SCRATCH/ ~/cfm_m3918/pearson ``` ## how to use (unsupported platform) set `SS_DIR` environment variable to the directory where you want the dataset folders to be generated. ## What it does Downloads subsets of the suitesparse collection to different directories. For a given subset, the matrices are first downloaded to the `$SCRATCH/suitesparse` folder. If a required matrix already exists there, it is not redownloaded. Then, a relative symlink is created from the `$SCRATCH//.mtx` file to the corresponding `.mtx` file in `$SCRATCH/suitesparse`. This makes use of a [fork of the `ssgetpy`](github.com/cwpearson/ssgetpy) package with a faster download limit. ssgetpy does not discriminate "real" datatype from "integer" datatype, as shown on the suitesparse collection website. Therefore, we access https://sparse.tamu.edu/files/ss_index.mat to determine that metadata for each file. ## Transfer data to a different filesystem ``` rsync -rzvh --links --info=progress2 pearson@cori.nersc.gov:$SS_DIR/ . ``` ## how this was done ``` poetry-new init poetry add ssgetpy ``` ``` poetry install ```