Development

This section is intended for those interested in how the identifiers module works, in case you want to extend or change it, or build a similar module for another endpoint. If we look at the entire application, the API modules are organized as follows:

som/api
├── base
├── google
├── __init__.py
└── identifiers

Let’s look more closely at what is in each folder, starting with identifiers, which is structured like this:

som/api/identifiers
├── client.py
├── data
├── dicom
├── standards.py
├── utils.py
└── validators

The main client (an instance of the class defined in base, mentioned above) provides functions to deidentify, along with the following:

The dicom folder is a module that defines settings for preparing the request to the API from identifiers extracted from the data, discussed next.

Dicom

The dicom module contains functionality for working with dicom. Before we talk about this module, it’s important to distinguish two things that work together but are very different: the process of de-identifying the data itself (removing or replacing fields in the headers), and the process of interacting with the API to exchange identifiers.

This application specializes in the second, and for the first (needed) functionality we use a module called deid. We keep these separate so that, if someone wanted to de-identify data outside of the API, that would be possible.

Step 1: The De-id Recipe

We can tell deid how to replace identifiers by defining a recipe. For more details on how deid works, see its docs. The default recipe that we enforce for the identifiers endpoint blanks all fields (deid’s default) and then replaces entity and item identifiers with those returned from DASHER. The recipe looks like this:

FORMAT dicom

%header

REPLACE PatientID var:entity_id
REPLACE SOPInstanceUID var:item_id
ADD PatientIdentityRemoved "Yes"
REPLACE PatientBirthDate var:entity_timestamp
REPLACE InstanceCreationDate var:item_timestamp
REPLACE InstanceCreationTime var:item_timestamp
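The intent of the recipe can be sketched in plain Python. This is a minimal, hypothetical rendering of what the recipe asks for (the real implementation lives in deid), assuming the identifiers returned from DASHER are available in a `variables` dict keyed by the `var:` names above:

```python
# Hypothetical sketch: apply the recipe's actions to a dicom header
# represented as a plain dict. The actual work is done by deid; the
# variables dict mirrors the var: lookups (entity_id, item_id, ...).

def apply_recipe(header, variables):
    """Blank all fields by default, then REPLACE/ADD per the recipe."""
    cleaned = {field: "" for field in header}          # deid default: blank everything
    cleaned["PatientID"] = variables["entity_id"]      # REPLACE ... var:entity_id
    cleaned["SOPInstanceUID"] = variables["item_id"]   # REPLACE ... var:item_id
    cleaned["PatientIdentityRemoved"] = "Yes"          # ADD
    cleaned["PatientBirthDate"] = variables["entity_timestamp"]
    cleaned["InstanceCreationDate"] = variables["item_timestamp"]
    cleaned["InstanceCreationTime"] = variables["item_timestamp"]
    return cleaned

header = {"PatientID": "1234", "PatientName": "Jane Doe",
          "SOPInstanceUID": "1.2.3", "InstanceCreationDate": "20170101"}
variables = {"entity_id": "anon-entity", "item_id": "anon-item",
             "entity_timestamp": "1970-01-01", "item_timestamp": "2017-01-01"}
cleaned = apply_recipe(header, variables)
```

Fields not named in the recipe (like PatientName above) simply come out blanked, which is the safe default.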

Step 2: Extract identifiers

We don’t need a recipe to extract identifiers, since by default deid returns everything it finds (aside from pixel data). However, after extraction we need to know which of those identifiers to send to DASHER as custom_fields for saving, and which identifiers in the data map to the variables (id_source, etc.) that the API expects. To do this, we define dicom settings that enforce the fields we want to send to the API to keep securely. For example, here is the specification for an entity:


entity = {'id_source': 'PatientID',
          'id_timestamp': {"date":"PatientBirthDate"},
          'custom_fields':[ "AccessionNumber",
                            "OtherPatientIDs",
                            "OtherPatientNames",
                            "OtherPatientIDsSequence",
                            "PatientAddress",
                            "PatientBirthDate",
                            "PatientBirthName",
                            "PatientID",
                            "PatientMotherBirthName",
                            "PatientName",
                            "PatientTelephoneNumbers",
                            "ReferringPhysicianName"
                          ]
         }
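To make the role of this specification concrete, here is a hypothetical helper (not part of the module) showing how it might be used: given identifiers extracted from a dicom header, pull out the source id and the custom_fields to keep securely. The key/value shape used for custom_fields below is an assumption about the request payload, and the spec is abbreviated:

```python
# Hypothetical helper: apply an entity spec (like the one above, here
# abbreviated) to a dict of extracted identifiers. Fields missing from
# the data are simply skipped.

entity_spec = {'id_source': 'PatientID',
               'custom_fields': ["AccessionNumber", "PatientBirthDate",
                                 "PatientID", "PatientName"]}

def prepare_entity(identifiers, spec):
    return {"id_source": spec["id_source"],
            "id": identifiers[spec["id_source"]],
            "custom_fields": [{"key": field, "value": identifiers[field]}
                              for field in spec["custom_fields"]
                              if field in identifiers]}

ids = {"PatientID": "1234", "PatientName": "Jane Doe",
       "PatientBirthDate": "19700101", "Modality": "CT"}
prepared = prepare_entity(ids, entity_spec)
```

Note that fields not listed in the spec (like Modality above) never reach the request, which is the point of enforcing the settings here.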

And logically, it follows that if you want or need to change any of the data going to DASHER, this would be the file to change.

Step 3: Send to Dasher

As we saw in the example provided, the last step is to send the prepared request to DASHER.
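As a rough sketch of this step, the prepared entity (with its items) gets packaged into a request body. The endpoint URL and token header below are placeholders to illustrate the shape of the call, not the actual DASHER interface:

```python
import json

# Hypothetical sketch of the final step: packaging the prepared entity
# into a request for DASHER. The URL and Authorization header are
# placeholders, not the real DASHER endpoint or auth scheme.

def build_request(study, entity, token="xxxx"):
    """Return the url, headers, and serialized body for the call."""
    body = {"identifiers": [entity]}
    return {"url": "https://dasher.example.org/api/uid/%s" % study,  # placeholder
            "headers": {"Authorization": "Token %s" % token,
                        "Content-Type": "application/json"},
            "data": json.dumps(body)}

req = build_request("mystudy", {"id_source": "PatientID", "id": "1234",
                                "items": []})
sent_id = json.loads(req["data"])["identifiers"][0]["id"]
```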

Generally, as a developer, you can use this dicom folder as a template for a different data type and write the corresponding functions for your data type. Please post an issue if you have questions or need help with this task.

Finally, you might be wondering: “what happens between getting identifiers from the data (with deid) and sending a request to the API?” We need to do things like calculate timestamps and generate the data structure, and this module can only validate that structure once it exists. The answer is that the more specific logic for creating these variables is implemented by the application using the API, for example, sendit.
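As one example of the kind of glue such an application provides, here is a hypothetical conversion from dicom date/time values to ISO 8601 timestamps (the function name is illustrative, not part of this module):

```python
from datetime import datetime

# Hypothetical glue code: turn dicom date (YYYYMMDD) and optional
# time (HHMMSS) values into an ISO 8601 timestamp string.

def dicom_timestamp(date, time=None):
    """Convert a dicom date, and optionally a time, to ISO 8601."""
    stamp = datetime.strptime(date + (time or "000000")[:6], "%Y%m%d%H%M%S")
    return stamp.isoformat()

dicom_timestamp("19700101")            # -> '1970-01-01T00:00:00'
dicom_timestamp("20170102", "134501")  # -> '2017-01-02T13:45:01'
```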