
Translating Cancer Data Commons (CDA) to FHIR (Fast Healthcare Interoperability Resources) format.


FHIR-Aggregator/CDA2FHIR


CDA2FHIR

Status License: MIT

Translating Cancer Data Commons (CDA) to 🔥 FHIR (Fast Healthcare Interoperability Resources) format.

Usage

Installation

  • from source
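# clone the repo (URL assumed from the repository name FHIR-Aggregator/CDA2FHIR)
git clone https://github.com/FHIR-Aggregator/CDA2FHIR.git
cd CDA2FHIR
# set up a virtual env and install in editable mode
python3 -m venv venv
. venv/bin/activate
pip install -e .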

Transform to FHIR

Data

To run the transformer, ensure that CDA raw data is located in the ./data/raw/ directory. If you need to retrieve the raw data, please contact cancerdataaggregator @ gmail.

Usage: cda2fhir transform [OPTIONS]

Options:
  -s, --save                 Save FHIR ndjson to CDA2FHIR/data/META folder.
                             [default: True]
  -v, --verbose
  -ns, --n_samples TEXT      Number of samples to randomly select - max 100.
  -nd, --n_diagnosis TEXT    Number of diagnosis to randomly select - max 100.
  -nf, --n_files TEXT        Number of files to randomly select - max 100.
  -f, --transform_files      Transform CDA files to FHIR DocumentReference and
                             Group.
  -t, --transform_treatment  Transform CDA treatment to all sub-hierarchy of
                             FHIR MedicationAdministration ->
                             SubstanceDefinitionRepresentation.
  -c, --transform_condition  Transform CDA disease to Condition
  -m, --transform_mutation   Transform CDA mutation to Observation
  -p, --path TEXT            Path to save the FHIR NDJSON files. default is
                             CDA2FHIR/data/META.
  --help                     Show this message and exit.
  • example
cda2fhir transform 

FHIR data validation

Run validate

 cda2fhir validate --path data/META
{'summary': {'Specimen': 742505, 'Medication': 214, 'Observation': 832864, 'ResearchStudy': 429, 'SubstanceDefinition': 214, 'BodyStructure': 135, 'Condition': 114804, 'ResearchSubject': 184888, 'MedicationAdministration': 38267, 'Patient': 159047, 'Substance': 214}}

This command validates your FHIR entities and the references between them. It also prints a summary count of the entities in each NDJSON file.

NOTE: Because of the size of the current data, this process may take 5 minutes or more, depending on your platform and compute power.
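For a quick, independent sanity check of the same two ideas (entity counts and reference integrity), a minimal Python sketch follows. It is not the implementation behind `cda2fhir validate`; the helper names `load_ndjson`, `find_references`, and `summarize` are hypothetical, and it assumes every resource carries `resourceType` and `id` and that references are relative strings of the form `ResourceType/id`.

```python
import json
from collections import Counter
from pathlib import Path


def load_ndjson(path):
    """Yield one parsed JSON object per non-empty line of an NDJSON file."""
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            if line.strip():
                yield json.loads(line)


def find_references(node):
    """Recursively yield every string stored under a 'reference' key."""
    if isinstance(node, dict):
        for key, value in node.items():
            if key == "reference" and isinstance(value, str):
                yield value
            else:
                yield from find_references(value)
    elif isinstance(node, list):
        for item in node:
            yield from find_references(item)


def summarize(meta_dir):
    """Return (counts per resourceType, sorted unresolved references).

    Assumption: references look like 'Patient/123'. Absolute URLs or
    contained ('#id') references would be reported as unresolved.
    """
    counts, known_ids, references = Counter(), set(), set()
    for path in sorted(Path(meta_dir).glob("*.ndjson")):
        for resource in load_ndjson(path):
            resource_type = resource["resourceType"]
            counts[resource_type] += 1
            known_ids.add(f"{resource_type}/{resource['id']}")
            references.update(find_references(resource))
    return dict(counts), sorted(references - known_ids)
```

Calling `summarize("data/META")` returns the per-type counts and a list of references that point at no known resource. An empty list suggests the reference graph is closed under the assumptions above.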

Check that every record contains a field, e.g. extension

awk '!/extension/ {exit 1}' data/META/ResearchSubject.ndjson && echo "Every line contains 'extension'" || echo "Not every line contains 'extension'"
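The awk one-liner matches the text `extension` anywhere on a line, including inside values. Where a stricter check is wanted, a small Python sketch can parse each record and test for a top-level key instead. The function name `lines_missing_field` is hypothetical and not part of the `cda2fhir` package.

```python
import json


def lines_missing_field(path, field="extension"):
    """Return 1-based line numbers of records lacking a top-level `field`."""
    missing = []
    with open(path, encoding="utf-8") as handle:
        for number, line in enumerate(handle, start=1):
            # Skip blank lines; parse every other line as one JSON record.
            if line.strip() and field not in json.loads(line):
                missing.append(number)
    return missing
```

For example, `lines_missing_field("data/META/ResearchSubject.ndjson")` returns an empty list when every record has the field, and otherwise points to the offending lines.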

Testing

The current integration tests run on all data and may take approximately 2 hours.

pytest --cov
