Topic: Data Pipelines for Ingress and Egress

Session 1, Room 2

  • Chair: Arron Lacey

Prompts

  • Challenges building data pipelines?

  • Automation / Reproducibility

Notes

  • Challenges

    • Data Quality: creating one solution to ingest different datasets / schemas

    • Different date formats

    • Limits of normality in data

    • Need to harmonise data

      • Should this happen before data arrives in TREs?

      • Does it need to happen inside a TRE (i.e., can it typically be done at schema level, or does it require sight of aggregate data or record-level data)?

    • Instrument level

  • Automation / Reproducibility

    • Introduce common standards, e.g. BIDS, NIfTI, DICOM. Need to consult with domain experts.

      • Can TRE operators (information governance authorities?) require data providers to send data only in a set number of formats? (like data depositories do)

    • Each additional processing step in a data pipeline can introduce more error into the final output
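
Two of the points above (different date formats, and accepting data only in a fixed set of formats) could be sketched roughly as follows. This is a minimal illustration and not something proposed in the session: the format list, the suffix allowlist, and the function names `to_iso_date` and `is_accepted_format` are all hypothetical, and a suffix check is only a first filter, not real format validation.

```python
from datetime import datetime

# Hypothetical date formats a data provider might use. Order matters:
# an ambiguous value such as 01/03/2023 is read as day/month/year here.
KNOWN_DATE_FORMATS = ("%Y-%m-%d", "%d/%m/%Y", "%d %b %Y", "%Y%m%d")

# Hypothetical allowlist of file suffixes an operator might agree to accept.
ACCEPTED_SUFFIXES = (".nii", ".nii.gz", ".dcm", ".json", ".tsv")


def to_iso_date(raw, formats=KNOWN_DATE_FORMATS):
    """Return the date as an ISO 8601 string, or None if no format matches."""
    text = raw.strip()
    for fmt in formats:
        try:
            return datetime.strptime(text, fmt).date().isoformat()
        except ValueError:
            continue  # try the next candidate format
    return None  # unparseable values are flagged, not guessed


def is_accepted_format(filename, suffixes=ACCEPTED_SUFFIXES):
    """Check the file name against the suffix allowlist (case-insensitive)."""
    return filename.lower().endswith(suffixes)


if __name__ == "__main__":
    for value in ["2023-03-01", "01/03/2023", "20230301", "sometime in March"]:
        print(value, "->", to_iso_date(value))
    for name in ["scan.nii.gz", "report.docx"]:
        print(name, "->", is_accepted_format(name))
```

Returning `None` for unrecognised values, instead of guessing, keeps the harmonisation step from silently adding error to later stages; rejected values can be reported back to the data provider.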