Download Images and Metadata
After installing the som tools:
pip install som
You can type the command som to see its usage:
usage: som [-h] [--version] [--debug] {get} ...

Stanford Open Modules for Python [SOM]

optional arguments:
  -h, --help  show this help message and exit
  --version   show software version
  --debug     use verbose logging to debug

actions:
  actions for som tools

  {get}       google storage and datastore
    get       download data from storage and datastore
Right now the only thing this client does is download data, meaning images and metadata:
- metadata is from Google DataStore
- images are from Google Storage
Specifically, we query DataStore to get some subset of records, and then use the download links in those records to retrieve the corresponding images from Storage.
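Under the hood this is just the standard Google Cloud flow. Here is a minimal sketch of what that could look like with the google-cloud-datastore and google-cloud-storage libraries; this is not the som client's actual code, and the entity kind ("Image") and field name ("storage_name") are made-up assumptions for illustration:

# A rough sketch of "query DataStore, then download from Storage".
# Assumes GOOGLE_APPLICATION_CREDENTIALS is exported (see Credentials below).
from google.cloud import datastore, storage

project = "som-irlearning"   # example project used later in this post
bucket_name = "irlhs-dicom"  # example bucket used later in this post

ds = datastore.Client(project=project)                        # metadata lives here
bucket = storage.Client(project=project).bucket(bucket_name)  # images live here

# Query DataStore for the subset of interest (hypothetical schema).
query = ds.query(kind="Image")
query.add_filter("suid", "=", "IR661f32")

for entity in query.fetch():
    # Assume each record points at its blob in Storage.
    blob = bucket.blob(entity["storage_name"])
    blob.download_to_filename("/tmp/%s" % entity["storage_name"].split("/")[-1])

With that picture in mind, let's take a closer look at the get command: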
som get --help
usage: som get [-h] [--outfolder OUTFOLDER] --project PROJECT --suid SUID
               [--query-images] --bucket BUCKET --collection COLLECTION

optional arguments:
  -h, --help            show this help message and exit
  --outfolder OUTFOLDER
                        full path to folder for output; stays in tmp (or pwd)
                        if not specified
  --project PROJECT     name of the Google Cloud project (e.g., irlhs-learning)
  --suid SUID           an suid associated with an entity to find images
                        (default), or with a study to query single images
                        (set the flag --query-images); typically begins
                        with IRXXXX
  --query-images        query across single images for the study instead of
                        across whole entities
  --bucket BUCKET       name of the storage bucket (e.g., irlhs-dicom)
  --collection COLLECTION
                        name of the collection (e.g., IRB33192)
Credentials
Before this will work for you, you need to be given a credentials file, and you need to export the path to this credentials file in your environment. For example, let's say I saved my top secret credentials in the file /home/vanessa/.secrets/hufflepuff.json. The export could go in my $HOME/.bashrc or $HOME/.profile:
GOOGLE_APPLICATION_CREDENTIALS="/home/vanessa/.secrets/hufflepuff.json"
export GOOGLE_APPLICATION_CREDENTIALS
This is important so your credentials are discovered automatically. If you don't do this, you will get permissions errors (in the 400 family, e.g., 401 or 403).
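A quick sanity check, after opening a new shell (or sourcing your profile), is to echo the variable and make sure it points at your file:

echo $GOOGLE_APPLICATION_CREDENTIALS
/home/vanessa/.secrets/hufflepuff.json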
Parameters
As you can see from the help above, we need several parameters:
- outfolder: an output folder where your study folder will be generated, with images and associated metadata saved inside.
- project: the name of the Google Cloud project. This likely needs to be given to you.
- suid: a Stanford unique id associated with a patient (de-identified). The default of the client is to query across all studies for a single patient ID, and in this case you would set the suid on the level of the patient. If you want to query across the compressed images, set the suid to a study identifier and also set the flag --query-images (see the example after this list).
- bucket: the Google Cloud Storage bucket. This also needs to be given to you if you don't know it.
- collection: the collection is typically named by your IRB (e.g., IRB33192). This should also be given to you.
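For example, to query on the level of single images for a study, the call would look something like the following (the study suid IR000000 here is a made-up placeholder; yours will come from your collection):

som get --outfolder /tmp --project som-irlearning --suid IR000000 --bucket irlhs-dicom --collection IRB33192 --query-images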
API Call
An example API Call might look like the following:
som get --outfolder /tmp --project som-irlearning --suid IR661f32 --bucket irlhs-dicom --collection IRB33192
Collecting available images...
Found 2 images for suid IR661f32 in collection IRB33192
DEBUG Saving images and metadata...
Progress |===================================| 100.0%
Yeah, it's a lot of parameters. We try to make this really hard on your memory centers :)