Deploy and configure SDOH datasets - Transformations in healthcare data solutions

The SDOH datasets - Transformations pipeline helps you integrate various social determinants of health (SDOH) datasets into Fabric OneLake. You can deploy and configure this capability after you deploy healthcare data solutions to your Fabric workspace and deploy the healthcare data foundations capability. This article outlines the deployment process and shows you how to access the public datasets for end-to-end execution.

SDOH datasets - Transformations is an optional capability under healthcare data solutions in Microsoft Fabric. You have the flexibility to decide whether or not to use it, depending on your specific needs or scenarios.

Prerequisites

Deploy SDOH datasets - Transformations

You can deploy the capability and the associated sample data using the setup module explained in Healthcare data solutions: Deploy healthcare data foundations. Alternatively, you can deploy the sample data later using the steps in Deploy sample data.

This capability uses the 8SDOHPublicDatasets sample dataset. It contains SDOH data published by government agencies and other official sources, consolidated at geographic levels such as state, county, or ZIP code. The current release provides eight sample SDOH datasets to help you run data pipelines and explore the capability. For more information, see Public datasets in SDOH datasets - Transformations.

If you didn't use the setup module to deploy the capability and want to use the capability tile instead, follow these steps:

  1. Go to the healthcare data solutions home page on Fabric.

  2. Select the SDOH datasets - Transformations tile.

    A screenshot displaying the capability tile.

  3. On the capability page, select Deploy to workspace.

    A screenshot displaying how to deploy the capability to the workspace.

  4. The deployment can take a few minutes to complete. Don't close the tab or the browser while deployment is in progress. While you wait, you can work in another tab.

    After the deployment completes, you can see a notification on the message bar.

  5. Select Manage capability from the message bar to go to the Capability management page.

    Here, you can view, configure, and manage the artifacts deployed with the capability.

Artifacts

The capability installs the following notebooks and data pipeline in your healthcare data solutions environment:

Artifact | Type | Description
--- | --- | ---
healthcare#_msft_sdoh_raw_extract_bronze_ingestion | Notebook | Ingests the SDOH public datasets into delta tables in the bronze lakehouse.
healthcare#_msft_sdoh_bronze_silver_flatten | Notebook | Transforms the SDOH public datasets from the bronze lakehouse and ingests the data into the silver lakehouse.
healthcare#_msft_sdoh_ingestion | Data pipeline | Runs a series of notebooks in sequence to ingest and transform SDOH public datasets from the landing zone into a custom data model in the silver lakehouse. This lets you unify SDOH data with core healthcare data modalities such as clinical and claims data.
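As a rough illustration of the sequential execution that healthcare#_msft_sdoh_ingestion performs, the following Python sketch runs the two notebooks in order and stops at the first failure. It isn't the pipeline's actual definition. The `run_notebook` callable and the `run_sdoh_ingestion` helper are hypothetical, and the stop-on-failure behavior is an assumption chosen because the silver step depends on the bronze output.

```python
from typing import Callable, List, Sequence

# Notebook names from the Artifacts table, in the order the data flows
# (landing zone -> bronze -> silver). "#" is the workspace-specific
# prefix number and is left as-is here.
SDOH_NOTEBOOKS: Sequence[str] = (
    "healthcare#_msft_sdoh_raw_extract_bronze_ingestion",
    "healthcare#_msft_sdoh_bronze_silver_flatten",
)


def run_sdoh_ingestion(
    run_notebook: Callable[[str], object],
    notebooks: Sequence[str] = SDOH_NOTEBOOKS,
) -> List[str]:
    """Run each notebook in order and return the names that completed.

    `run_notebook` is a hypothetical callable that executes one notebook
    and raises an exception if it fails. Because the silver step reads
    the bronze output, the exception is allowed to propagate so that no
    later notebook runs after a failed one.
    """
    completed: List[str] = []
    for name in notebooks:
        run_notebook(name)
        completed.append(name)
    return completed


if __name__ == "__main__":
    # Dry run: the stub runner only prints which notebook would execute.
    done = run_sdoh_ingestion(lambda name: print(f"running {name}"))
    print(f"completed {len(done)} of {len(SDOH_NOTEBOOKS)} notebooks")
```

Inside a Fabric notebook you might pass a runner such as `lambda name: notebookutils.notebook.run(name, 1800)`, assuming NotebookUtils is available in your session; the 1800-second timeout is an arbitrary example value. In practice, the deployed data pipeline handles this orchestration for you.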

Notebook configuration

  • Global configuration: The SDOH datasets - Transformations pipeline uses the global configuration values described in Admin lakehouse: Global configuration and the healthcare#_msft_config_notebook described in Deploy healthcare data foundations.

  • Notebook-level configuration: The SDOH datasets - Transformations notebooks deploy with the preconfigured values required to run the associated data pipeline. Some configuration parameters inherit from the global configuration and can be overridden at the notebook level. You typically don't need to change the notebook configuration. If needed, you can review or modify it by opening the respective notebook in your environment.

  • Runtime configuration: By default, the SDOH notebooks are preconfigured to run on Runtime 1.2 (Spark 3.4, Delta 2.4). Keep this setting at the environment level. To learn more, see Reset Spark runtime version in the Fabric workspace.
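To illustrate how a notebook-level value can override an inherited global value, here's a minimal Python sketch of that precedence rule. The `resolve_config` helper and the parameter names and values are hypothetical placeholders, not the actual keys in healthcare#_msft_config_notebook or the SDOH notebooks. Review the notebooks in your environment for the real parameters.

```python
from typing import Any, Dict, Mapping


def resolve_config(
    global_config: Mapping[str, Any],
    notebook_overrides: Mapping[str, Any],
) -> Dict[str, Any]:
    """Merge settings so that notebook-level values win over global ones.

    Keys absent from `notebook_overrides` keep their inherited global value.
    Neither input mapping is modified.
    """
    effective = dict(global_config)       # start from the inherited global values
    effective.update(notebook_overrides)  # notebook-level values take precedence
    return effective


# Hypothetical keys and values, for illustration only.
global_config = {"bronze_lakehouse": "example_bronze", "max_files_per_run": 100}
notebook_overrides = {"max_files_per_run": 10}

effective = resolve_config(global_config, notebook_overrides)
print(effective)
```

Here only `max_files_per_run` is overridden, so the effective configuration keeps the inherited `bronze_lakehouse` value while `max_files_per_run` takes the notebook-level value.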
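If you want to confirm that a notebook session runs on the Spark version that Runtime 1.2 provides, you could compare the session's Spark version string against 3.4. This is a sketch: `is_runtime_1_2_spark` is a hypothetical helper, and the commented-out check assumes a Fabric notebook where `spark` is the predefined SparkSession, whose `version` attribute returns the Spark version string.

```python
def is_runtime_1_2_spark(spark_version: str) -> bool:
    """Return True if a Spark version string is a Spark 3.4.x release.

    Runtime 1.2 ships Spark 3.4 (with Delta 2.4), so only the major and
    minor components are compared; patch and build suffixes are ignored.
    """
    parts = spark_version.split(".")
    if len(parts) < 2:
        return False
    return (parts[0], parts[1]) == ("3", "4")


# In a Fabric notebook, `spark` is predefined; uncomment to check the session.
# if not is_runtime_1_2_spark(spark.version):
#     print(f"Spark {spark.version} found; SDOH notebooks expect Runtime 1.2 (Spark 3.4).")
```

Comparing only the major and minor components keeps the check tolerant of vendor build suffixes that can follow the patch number.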