Use DICOM data transformation in healthcare data solutions

The DICOM data transformation capability in healthcare data solutions allows you to ingest, store, and analyze Digital Imaging and Communications in Medicine (DICOM) data from various sources. To learn more about the capability and how to deploy and configure it, see Deploy and configure DICOM data transformation.

DICOM data transformation is an optional capability with healthcare data solutions in Microsoft Fabric.

Prerequisites

Before you run the DICOM data transformation pipeline, make sure you complete the prerequisites, deployment process, and configuration steps explained in Deploy and configure DICOM data transformation.

Data ingestion options

This article provides step-by-step guidance on how to use the DICOM data transformation capability to ingest, transform, and unify the DICOM imaging dataset. The capability supports the following two ingestion options:

  • Option 1: End-to-end ingestion of DICOM files. The DICOM files, in either the native (DCM) or compressed (ZIP) format, are ingested into the lakehouse. This option is called the Ingest option.

  • Option 2: Integration with the DICOM service. The ingestion is facilitated through native integration with the DICOM service in Azure Health Data Services. In this option, the DCM files are first transferred from the Azure Health Data Services DICOM service to Data Lake Storage Gen2. The pipeline then follows the Bring Your Own Storage (BYOS) ingestion pattern. This option is called the Azure Health Data Services (AHDS) option.

To understand the transformation mapping details, see DICOM metadata transformation mapping in healthcare data solutions.

Option 1: End-to-end ingestion of DICOM files

In this option, we ingest and transform the imaging data from DICOM files into the healthcare data solutions lakehouses using the prebuilt data pipeline. The end-to-end transformation consists of the following consecutive steps:

  1. Ingest DICOM files into OneLake
  2. Organize DICOM files in OneLake
  3. Extract DICOM metadata into the bronze lakehouse
  4. Convert DICOM metadata to the FHIR (Fast Healthcare Interoperability Resources) format
  5. Ingest data into the ImagingStudy delta table in the bronze lakehouse
  6. Flatten and transform data into the ImagingStudy delta table in the silver lakehouse
  7. Convert and ingest data into the Image_Occurrence table in the gold lakehouse (optional)
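
The numbered steps above are handled by the prebuilt pipeline and notebooks. To make step 3 more concrete, the following minimal Python sketch shows how DICOM metadata can be read from a single DCM file with the open-source pydicom library. It illustrates the kind of tags involved and isn't the pipeline's own implementation; the file name is a placeholder.

# Illustrative sketch only (not the pipeline's implementation): read one DCM
# file and print a few of the DICOM tags that identify a study, series, and instance.
import pydicom

ds = pydicom.dcmread("example.dcm")  # placeholder local sample file
print("StudyInstanceUID: ", ds.StudyInstanceUID)
print("SeriesInstanceUID:", ds.SeriesInstanceUID)
print("SOPInstanceUID:   ", ds.SOPInstanceUID)
print("Modality:         ", ds.get("Modality", "unknown"))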

Tip

This ingestion option uses the 340ImagingStudies sample dataset that contains compressed ZIP files. Alternatively, you can ingest DICOM files directly in their native DCM format by placing them in the Ingest folder. Within the ZIP files, the DCM files may be structured into multiple nested subfolders. There's no limit on the number of DCM files or the number, depth, and nesting of subfolders within the ingested ZIP files. For information on file size limits, see Ingestion file size.

Step 1: Ingest DICOM files into OneLake

The Ingest folder in the bronze lakehouse serves as a drop (queue) folder. Place the DICOM files in this folder; they then move to an organized folder structure within the bronze lakehouse.

  1. Go to the Ingest\Imaging\DICOM\DICOM-HDS folder in the bronze lakehouse.

  2. Select ... (ellipsis) > Upload > Upload folder.

  3. Select and upload the 340ImagingStudies imaging dataset from the SampleData\Imaging\DICOM\DICOM-HDS folder. Alternatively, you can use OneLake file explorer or Azure Storage Explorer to upload the sample dataset.
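
If you prefer to script the upload instead of using the Fabric portal, OneLake also exposes Azure Data Lake Storage Gen2 compatible endpoints, so the standard Azure Storage SDK can write to the Ingest folder. The following Python sketch is illustrative only; the workspace name, bronze lakehouse name, and file name are placeholders, and it assumes you have the azure-identity and azure-storage-file-datalake packages and permission to write to the workspace.

# Minimal sketch: upload one ZIP from the sample dataset to the bronze
# lakehouse Ingest folder through the OneLake ADLS Gen2 endpoint.
# The workspace and lakehouse names below are placeholders.
from azure.identity import DefaultAzureCredential
from azure.storage.filedatalake import DataLakeServiceClient

service = DataLakeServiceClient(
    account_url="https://onelake.dfs.fabric.microsoft.com",
    credential=DefaultAzureCredential(),
)
workspace = service.get_file_system_client("<your-workspace-name>")
target = workspace.get_file_client(
    "<your-bronze-lakehouse>.Lakehouse/Files/Ingest/Imaging/DICOM/DICOM-HDS/imaging-studies.zip"
)
with open("imaging-studies.zip", "rb") as data:
    target.upload_data(data, overwrite=True)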

Step 2: Run the imaging data pipeline

After moving the DCM/ZIP files to the Ingest folder in the bronze lakehouse, you can run the imaging data pipeline to organize and process the data into the silver lakehouse.

  1. In your healthcare data solutions environment, open the healthcare#_msft_imaging_with_clinical_foundation_ingestion data pipeline.

  2. Select the Run button to begin processing the imaging data from the bronze to silver lakehouse.

This data pipeline sequentially runs five notebooks: three deployed as part of the healthcare data foundations capability and two from the DICOM data transformation capability. To learn more about these notebooks, see DICOM data transformation: Artifacts.
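
If you want to queue the same run programmatically instead of selecting Run in the portal, you can call the Fabric job scheduler REST API. The following Python sketch is an outline under that assumption; the workspace ID, pipeline item ID, and access token are placeholders you supply.

# Outline only: queue an on-demand run of the imaging pipeline through the
# Fabric REST API. Workspace ID, item ID, and token are placeholders.
import requests

workspace_id = "<workspace-id>"
pipeline_item_id = "<imaging-pipeline-item-id>"
token = "<entra-access-token-for-the-fabric-api>"

url = (
    "https://api.fabric.microsoft.com/v1/workspaces/"
    f"{workspace_id}/items/{pipeline_item_id}/jobs/instances?jobType=Pipeline"
)
response = requests.post(url, headers={"Authorization": f"Bearer {token}"})
response.raise_for_status()  # 202 Accepted means the run was queued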

A screenshot displaying a sample pipeline run.

Step 3: Run the silver to gold transformation notebook

Note

This transformation step is optional. Use it only if you need to further transform your DICOM data into the Observational Medical Outcomes Partnership (OMOP) common data model (CDM) format. Otherwise, you can skip this step.

Before you run this transformation, deploy and configure the OMOP transformations capability in healthcare data solutions.

After you run the imaging pipeline, your imaging data is transformed up to the silver lakehouse. The silver lakehouse serves as the initial point where data from various modalities begins to converge in a structured manner. To further transform your data into the OMOP research standard for use in the Discover and build cohorts (preview) capability, run the silver to gold transformation notebook.

  1. In your healthcare data solutions environment, open the healthcare#_msft_omop_silver_gold_transformation notebook.

    This notebook uses the healthcare data solutions OMOP APIs to transform resources from the silver lakehouse into OMOP CDM delta tables in the gold lakehouse. By default, you don't need to make any changes to the notebook configuration.

  2. Select Run all to run the notebook.

    The notebook implements the OMOP tracking approach to track and process new or updated records in the ImagingStudy delta table in the silver lakehouse. It transforms data from the FHIR delta tables in the silver lakehouse (including the ImagingStudy table) into the corresponding OMOP delta tables in the gold lakehouse (including the Image_Occurrence table). For more information on this transformation, see Transformation mapping for silver to gold delta table.

    For detailed OMOP mapping information, see FHIR to OMOP mapping.
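
After the notebook completes, you can spot-check the results from a Fabric notebook. The following minimal sketch assumes the gold lakehouse is attached as the notebook's default lakehouse so that the OMOP tables resolve by name.

# Quick check from a Fabric notebook with the gold lakehouse attached as the
# default lakehouse: count the rows written to the Image_Occurrence table.
df = spark.sql("SELECT COUNT(*) AS image_occurrence_count FROM Image_Occurrence")
df.show()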

Step 4: Validate the data

In real-world scenarios, data ingestion involves sources with varying quality levels. The validation engine, detailed in Data validation, intentionally triggers validation failures on some of the provided imaging sample data. Files that don't comply with DICOM standards move to the Failed folder and aren't processed. However, a single file failure doesn't disrupt the entire pipeline, as the imaging sample data demonstrates: the pipeline and associated notebooks run successfully, but the Failed folder under Imaging\DICOM\DICOM-HDS\YYYY\MM\DD contains a noncompliant file. All other valid files process successfully, resulting in an overall successful pipeline status. We intentionally include this invalid file in the imaging sample data to illustrate how the imaging pipeline handles invalid files and to help you identify dataset issues.

A screenshot displaying the invalid file in the **Failed** folder.
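
To inspect failed files programmatically rather than browsing the lakehouse, you can list the folder from a Fabric notebook attached to the bronze lakehouse. The following sketch is illustrative; the exact Failed folder path is an assumption based on the folder pattern described above, and the date segments must match your run date.

# Illustrative sketch: list the contents of a date-partitioned Failed folder
# from a Fabric notebook attached to the bronze lakehouse. The path pattern is
# an assumption; align it with your actual Failed folder structure and run date.
from notebookutils import mssparkutils

failed_path = "Files/Failed/Imaging/DICOM/DICOM-HDS/<YYYY>/<MM>/<DD>"  # replace with the run date
for item in mssparkutils.fs.ls(failed_path):
    print(item.name, item.size)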

To confirm the pipeline successfully extracted all the metadata from the raw DICOM files, open the bronze lakehouse, switch to the SQL analytics endpoint, and select New SQL query.

A screenshot displaying the SQL analytics endpoint switch.

If the pipeline ran correctly, the ImagingDicom table should contain 7739 successfully processed DICOM instances. To verify, run the following SQL query and confirm that the Results pane shows 7739. This number represents the total number of DICOM instances in the sample data, which spans modalities such as computed tomography (CT) and magnetic resonance imaging (MRI).

select count(*) from ImagingDicom

A screenshot displaying the DICOM instances in the bronze lakehouse.

To confirm the pipeline successfully hydrated the lakehouses, open the silver lakehouse, switch to the SQL analytics endpoint, and select New SQL query. For a correct pipeline run, you should see 339 successfully processed ImagingStudy resources. The sample data starts with 340 ImagingStudy resources, but one encounters an error during processing. To verify, run the following SQL query.

select count(*) from ImagingStudy

A screenshot displaying the DICOM instances in the silver lakehouse.

Option 2: Integration with the DICOM service

Important

Use this transformation option only if you use the Azure Health Data Services DICOM service and have deployed the DICOM API.

This transformation approach extends the Bring Your Own Storage (BYOS) pattern with the Azure Health Data Services DICOM service. The DICOM service supports a subset of the DICOMweb APIs, which allow you to store, search, retrieve, and delete DICOM objects. It integrates with the Azure Data Lake Storage Gen2 account linked to your Fabric workspace, so that the transformation pipeline can directly access your DICOM data.

Alternatively, you can skip using the Azure DICOM API and ingest DICOM files stored in your Data Lake Storage Gen2 account (in this case, begin from step 5).

  1. Review and complete the configuration in Deploy the DICOM API in Azure Health Data Services.

  2. After deploying the Azure DICOM service, use the Store (STOW-RS) API to ingest DCM files. Test it by downloading a DCM file from the imaging sample data using either OneLake file explorer or Azure Storage Explorer.

  3. Upload the DCM files to the server using your preferred language or tool. For one illustrative approach in Python, see the STOW-RS sketch after this procedure.

  4. Verify that the file upload was successful:

    1. In the Azure portal, select the storage account linked to the DICOM service.
    2. Go to Containers and follow the path [ContainerName]/AHDS/[AzureHealthDataServicesWorkspaceName]/dicom/[DICOMServiceName].
    3. Check if you can see the DCM file uploaded here.

    An Azure portal screenshot displaying the uploaded data.

    Note

    • The file name might change when uploaded to the server. However, the file content remains unchanged.
    • For information on file size limits, see Ingestion file size.
  5. Create a shortcut in the bronze lakehouse for the DICOM file stored in the Data Lake Storage Gen2 location. Follow the steps in Create an Azure Data Lake Storage Gen2 shortcut.

    • For the Azure DICOM service, make sure you use the Data Lake Storage Gen2 account created with the service.
    • If you aren't using the Azure DICOM service, you can create a new Data Lake Storage Gen2 account or use an existing one. To learn more, see Create a storage account to use with Azure Data Lake Storage.

    For consistency, use the following folder structure to create the shortcut: Files\External\Imaging\DICOM\[Namespace]\[BYOSShortcutName]. The Namespace value ensures logical separation of shortcuts from different source systems. For example, you can use the Data Lake Storage Gen2 name for the Namespace value.

    A screenshot displaying how to create the shortcut using the recommended folder structure.

    Note

    OneLake shortcuts also support multiple storage systems beyond Data Lake Storage Gen2. For a full list of supported storage types, see OneLake shortcuts.

  6. Configure the admin lakehouse to enable BYOS:

    1. Go to the healthcare#_msft_admin lakehouse and open the deploymentParametersConfiguration.json file under Files\system-configurations.

    2. Enable the BYOS setting in this configuration file. Use OneLake file explorer to open the deploymentParametersConfiguration.json file from the following folder path: OneLake - Microsoft\[WorkspaceName]\healthcare#_msft_admin.Lakehouse\Files\system-configurations. Use any JSON or text editor (such as Windows Notepad) to open the file, search for the byos_enabled parameter, and set it to true.

      A screenshot displaying the setting in the configuration file.

  7. The DICOM data transformation capability can now access all your DICOM files in their source Data Lake Storage Gen2 location, regardless of the folder hierarchy or structure. You don't need to manually ingest the DICOM files as you do in the Ingest option. Begin execution from Step 2: Run the imaging data pipeline in the previous section to use the imaging pipeline and transform your DICOM data.
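
The following Python sketch, referenced in step 3, shows one way to push a single DCM file to the DICOM service with a STOW-RS request. The service URL, token scope, and single-part application/dicom upload shown here are assumptions to verify against the DICOM service documentation for your deployment; it assumes the azure-identity package and a signed-in identity with sufficient DICOM data permissions.

# Illustrative STOW-RS upload of one DCM file to the Azure Health Data Services
# DICOM service. The service URL is a placeholder; the token scope and the
# single-part application/dicom content type are assumptions to verify against
# the DICOM service documentation.
import requests
from azure.identity import DefaultAzureCredential

service_url = "https://<workspace>-<dicom-service>.dicom.azurehealthcareapis.com/v2"
token = DefaultAzureCredential().get_token(
    "https://dicom.healthcareapis.azure.com/.default"
).token

with open("sample.dcm", "rb") as f:
    dicom_bytes = f.read()

response = requests.post(
    f"{service_url}/studies",
    data=dicom_bytes,
    headers={
        "Authorization": f"Bearer {token}",
        "Accept": "application/dicom+json",
        "Content-Type": "application/dicom",
    },
)
print(response.status_code)  # 200 indicates the instance was stored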

Note

To understand the integration limitations with Azure Health Data Services DICOM service, see Integration with DICOM service.