Overview of SDOH datasets - Transformations (preview) in healthcare data solutions

Important

  • This is a preview feature.
  • Preview features aren't meant for production use and might have restricted functionality. These features are available before an official release so that customers can get early access and provide feedback.
  • To review the terms of service, see Healthcare data solutions in Microsoft Fabric.

Social determinants of health (SDOH) refer to the socio-economic factors that influence health outcomes. These determinants include where individuals are born, grow, live, work, and age, and factors such as income and access to resources. With SDOH data, you can gain insights into the nonmedical factors that affect patients' health and well-being, enabling you to design more targeted care interventions.

SDOH datasets - Transformations (preview) in healthcare data solutions enables you to ingest, store, and analyze geography-level SDOH datasets in CSV (comma-separated values) and XLSX (Excel Open XML Spreadsheet) formats. When combined with other healthcare data, including clinical, imaging, and claims, SDOH data supports unified analytics for population health, care management, and risk stratification.

This capability integrates SDOH data with core healthcare domains such as clinical and claims data to power large-scale analytics. The data pipeline streamlines the transformation of geography-level SDOH datasets into tabular formats, persisting the data in the lake under the healthcare data model.
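As a rough illustration of what transforming a geography-level dataset into a tabular form can look like, the following minimal sketch reshapes a small county-level CSV extract into one row per location and metric. It isn't the capability's actual pipeline: the column names, the sample values, and the `to_long_rows` helper are all hypothetical, chosen only to make the idea concrete.

```python
import csv
import io

# Hypothetical county-level extract. Column names and values are
# illustrative assumptions, not taken from a real published dataset.
RAW_CSV = """state,county,county_fips,TOT_POP_WT,CIV_VETERANS
Washington,King,53033,1000,50
Washington,Pierce,53053,400,30
"""

# Columns that identify the geography rather than carry a metric (assumed).
LOCATION_COLUMNS = ("state", "county", "county_fips")


def to_long_rows(csv_text):
    """Turn one wide row per geography into one row per (geography, metric)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = []
    for record in reader:
        location = {name: record[name] for name in LOCATION_COLUMNS}
        for column, value in record.items():
            if column in LOCATION_COLUMNS:
                continue  # location fields are copied onto every output row
            rows.append({**location, "metric": column, "value": float(value)})
    return rows


long_rows = to_long_rows(RAW_CSV)
```

Keeping the FIPS code as text rather than a number preserves leading zeros, which matter for states whose codes start with 0.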

SDOH datasets - Transformations (preview) is an optional capability under healthcare data solutions in Microsoft Fabric. You can choose whether to use it, depending on your needs and scenarios.

To learn how to deploy, configure, and use the capability, see:

Conceptual architecture

SDOH datasets - Transformations (preview) uses the medallion lakehouse design explained in Data architecture and management in healthcare data solutions.

The medallion architecture for this capability consists of the following three fundamental layers:

  • Bronze: Also called the raw zone, this layer stores SDOH datasets in their original format such as XLSX or CSV files. Before ingestion, prepare the raw datasets by adding three sets of vital information:

    • Dataset metadata: Include details such as the dataset name, publishing agency, and publication date.
    • Layout: Define the abbreviated SDOH metrics, including the social determinant name, description, unit, and categories.
    • Location configuration: Specify location information within the dataset, such as state, county, and County Federal Information Processing Series (FIPS).

    In this layer, the ingestion notebooks convert each sheet of the datasets into tables. The dataset metadata and layout information are stored in one common table.

    The harmonization key is an annotation, or tag, that identifies the category of each SDOH metric. For example:

    • Total weighted population is assigned the harmonization key Demographics.
    • Total number of civilian veterans is assigned the harmonization key Veteran.

    Use these keys to identify and query related metrics within and across datasets, regardless of publishers, to enrich your analytical use cases.

  • Silver: Also called the enriched zone, this layer stores SDOH datasets sourced from the bronze lakehouse within a custom data model. Because public SDOH datasets don't follow predefined standards or map to Fast Healthcare Interoperability Resources (FHIR) resources, a custom data model is used to persist SDOH data.

    Here's the entity-relationship diagram for this layer:

    Entity relationship diagram for SDOH datasets - Transformations (preview)

  • Gold: Also called the curated zone, this layer stores SDOH data sourced from the corresponding silver lakehouse. The preview release focuses on the bronze and silver layers. However, you can use the gold layer with the foundational healthcare data solutions architecture, which integrates SDOH and other core healthcare modalities such as clinical, imaging, and claims. This layer enables you to build various analytical models such as risk stratification dashboards, care management analytics, and similar value-based care use cases.
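
The harmonization keys described for the bronze layer can be pictured with a small sketch that groups metrics from different publishers under a shared key. This is only an illustration of the idea, not the capability's schema: the `LAYOUT` entries, their field names, the dataset names, and the `metrics_by_key` helper are hypothetical. Only the two key assignments (Demographics and Veteran) come from the examples earlier in this article.

```python
from collections import defaultdict

# Hypothetical layout entries. Field names, dataset names, and metric
# abbreviations are assumptions made for illustration.
LAYOUT = [
    {"dataset": "dataset_a", "metric": "TOT_POP_WT",
     "description": "Total weighted population",
     "harmonization_key": "Demographics"},
    {"dataset": "dataset_a", "metric": "CIV_VETERANS",
     "description": "Total number of civilian veterans",
     "harmonization_key": "Veteran"},
    {"dataset": "dataset_b", "metric": "VET_COUNT",
     "description": "Veteran count",
     "harmonization_key": "Veteran"},
]


def metrics_by_key(layout):
    """Group (dataset, metric) pairs under their harmonization key."""
    grouped = defaultdict(list)
    for entry in layout:
        grouped[entry["harmonization_key"]].append(
            (entry["dataset"], entry["metric"])
        )
    return dict(grouped)


grouped = metrics_by_key(LAYOUT)
# All veteran-related metrics, regardless of which dataset published them.
veteran_metrics = grouped["Veteran"]
```

Looking up a key this way returns related metrics from every dataset that uses it, which is the cross-publisher query the harmonization key is meant to support.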