Use CMS claims data transformations (preview) in healthcare data solutions

[This article is prerelease documentation and is subject to change.]

With CMS claims data transformations (preview), you can ingest, store, and analyze claims data in CMS (Centers for Medicare & Medicaid Services) CCLF (Claim and Claim Line Feed) format. To learn more about the capability and understand how to deploy and configure it, see:

Understand the transformation mechanism

The claims data transformation pipeline ingests claims files in either native or compressed format into the lakehouse. The end-to-end transformation follows these high-level consecutive steps:

  • Ingest the claims files into OneLake
  • Organize the claims files in OneLake
  • Extract claims data into the bronze lakehouse
  • Convert claims data to FHIR NDJSON files
  • Transform claims data into FHIR flattened tables in the bronze lakehouse
  • Transform claims data into FHIR relational tables in the silver lakehouse
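To illustrate the file format named in the "Convert claims data to FHIR NDJSON files" step, the following minimal Python sketch writes one JSON object per line, which is what NDJSON (newline-delimited JSON) means. It is not the pipeline's own conversion code. The helper name to_ndjson and the two placeholder resources are hypothetical, and the placeholders carry only a resourceType and an id rather than a complete FHIR ExplanationOfBenefit resource.

```python
import json

def to_ndjson(resources):
    """Serialize an iterable of dicts as NDJSON: one compact JSON object per line."""
    return "".join(json.dumps(r, separators=(",", ":")) + "\n" for r in resources)

# Hypothetical placeholder resources, not complete FHIR ExplanationOfBenefit records.
resources = [
    {"resourceType": "ExplanationOfBenefit", "id": "example-1"},
    {"resourceType": "ExplanationOfBenefit", "id": "example-2"},
]

ndjson_text = to_ndjson(resources)
print(ndjson_text, end="")
```

Each line can be parsed independently with json.loads, which is why the format suits large, line-oriented ingestion jobs.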

Run the claims data transformations pipeline

Ensure you complete the steps in Set up claims sample data before running the claims data transformations pipeline.

  1. To transform the claims data from the bronze lakehouse to the silver lakehouse, open the healthcare#_msft_clinical_claims_cclf_data_transformation data pipeline and select Run.

    A screenshot displaying a sample data pipeline run.

  2. After the pipeline runs successfully, open the ExplanationOfBenefit table in the silver lakehouse to view the transformed data.

    A screenshot displaying transformed data in the ExplanationOfBenefit table.

Usage considerations

Review these key points before using the CMS claims data transformations (preview) capability.

Spark version

The notebooks are preconfigured to run with Spark runtime version 1.2 (Spark 3.4, Delta 2.4) by default. Ensure you maintain this setting at the environment level. To learn more, see Reset Spark runtime version in the Fabric workspace.
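As a quick sanity check, a notebook cell can compare the running Spark version against the expected 3.4 line. The sketch below is only an illustration: the helper name is_expected_spark_version is hypothetical, and it assumes the version string starts with major.minor components, as the string returned by the spark.version attribute of a Spark session does.

```python
def is_expected_spark_version(version: str, expected: str = "3.4") -> bool:
    """Return True if the major.minor part of a Spark version string equals `expected`."""
    major_minor = ".".join(version.split(".")[:2])
    return major_minor == expected

# In a notebook with an active session, pass spark.version instead of a literal.
print(is_expected_spark_version("3.4.1"))
print(is_expected_spark_version("3.5.0"))
```

Comparing only the major and minor components keeps the check tolerant of patch-level or vendor-specific suffixes in the version string.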

File extension

The uploaded CCLF files must use a file extension in the range *.T1000001 to *.T1000009. Files with any other extension move to the Failed folder in the bronze lakehouse.
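To catch naming problems before upload, you could run a local check along the following lines. This is a minimal sketch, not the pipeline's validation logic. The function name has_valid_cclf_extension is hypothetical, and the check assumes the extension is case-sensitive and is the final dot-separated part of the file name.

```python
import re

# Assumption: the extension is case-sensitive and ends the file name.
_CCLF_EXTENSION = re.compile(r"\.T100000[1-9]\Z")

def has_valid_cclf_extension(filename: str) -> bool:
    """Return True if the file name ends in .T1000001 through .T1000009."""
    return _CCLF_EXTENSION.search(filename) is not None

# Hypothetical file names used only for illustration.
for name in ["claims.T1000001", "claims.T1000010", "claims.txt"]:
    print(name, has_valid_cclf_extension(name))
```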

Record length

A record length mismatch in CCLF files occurs when one or more records deviate from the required fixed-length format. This mismatch can cause data misalignment, incomplete data capture, or processing errors. Files with records that don't meet the expected length move to the Failed folder in the bronze lakehouse.
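A local pre-check can flag records whose length differs from the expected fixed length before you upload a file. The sketch below is an assumption-laden illustration, not the pipeline's own check. The function name find_length_mismatches is hypothetical, and the expected length of 10 is a made-up value for the sample data. The actual record length for each CCLF file type comes from the CMS file layout specification.

```python
from typing import Iterable, List

def find_length_mismatches(lines: Iterable[str], expected_length: int) -> List[int]:
    """Return the 1-based numbers of records whose length differs from expected_length."""
    mismatches = []
    for number, line in enumerate(lines, start=1):
        record = line.rstrip("\r\n")  # ignore the line terminator when measuring
        if len(record) != expected_length:
            mismatches.append(number)
    return mismatches

# Hypothetical sample: the second record is one character short.
sample = ["ABCDE12345\n", "ABCDE1234\n", "ABCDE12345\n"]
print(find_length_mismatches(sample, expected_length=10))  # [2]
```

Because an open text file iterates line by line, the same function can be applied directly to a file handle without loading the whole file into memory.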