Update a Common Data Model data source to use Delta tables

Update an existing data connection that uses Common Data Model tables to Delta-formatted tables, without removing and recreating the configuration that depends on the data connection.

Key reasons to connect to data stored in Delta format:

  • Directly import Delta-formatted data to save time and effort.
  • Eliminate the compute and storage costs of transforming and storing a copy of your lakehouse data.
  • Automatically improve the reliability of data ingestion into Customer Insights - Data through Delta versioning.

Delta is a term introduced with Delta Lake, the foundation for storing data and tables in the Databricks Lakehouse Platform. Delta Lake is an open-source storage layer that brings ACID (atomicity, consistency, isolation, and durability) transactions to big data workloads. For more information, see the Delta Lake documentation.
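
Delta's versioning works through a transaction log: each commit to a table is recorded as a JSON file in the table's _delta_log folder, named after its version number. As a minimal, hypothetical illustration (this helper isn't part of Customer Insights - Data or the Delta Lake libraries), the current version of a table can be read from those file names:

```python
from pathlib import Path

def latest_delta_version(delta_log: Path) -> int:
    """Return the highest commit version recorded in a Delta table's
    _delta_log folder. Each commit is a JSON file named after its
    zero-padded version number (in practice 20 digits, for example
    00000000000000000001.json)."""
    versions = [
        int(p.stem)            # "0001.json" -> 1
        for p in delta_log.glob("*.json")
        if p.stem.isdigit()    # skip checkpoints and other files
    ]
    if not versions:
        raise ValueError(f"No commit files found in {delta_log}")
    return max(versions)
```

Because every change appends a new numbered commit rather than rewriting files in place, readers always see a consistent snapshot of the table, which is what makes ingestion more reliable.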

Prerequisites

  • The Azure Data Lake Storage must be in the same tenant and Azure region as Customer Insights - Data.

  • To connect to storage that's protected by firewalls, see Set up Azure private links.

  • The Customer Insights - Data service principal must have Storage Blob Data Contributor permissions to access the storage account. For more information, see Grant permissions to the service principal to access the storage account.

  • The user that sets up or updates the data source needs at least Storage Blob Data Reader permissions on the Azure Data Lake Storage account.

  • Data stored in online services might be stored in a different location than where data is processed or stored. By importing or connecting to data stored in online services, you agree that data can be transferred. Learn more at the Microsoft Trust Center.

  • Customer Insights - Data supports Databricks reader version 2. Delta tables using features that require Databricks reader version 3 or above aren't supported. Learn more: Supported Databricks features.

  • The Delta tables must be in a folder in the storage container and can't be in the container root directory. For example:

    storageaccountcontainer/
        DeltaDataRoot/
           ADeltaTable/
                 _delta_log/
                     0000.json
                     0001.json
                 part-0001-snappy.parquet
                 part-0002-snappy.parquet
    
  • The Delta tables and their schemas must match the tables in the existing Common Data Model data source and be in the same storage container. In Delta, a table's name is the name of the folder where its data is stored, so the folder names and schemas under the new data folder must exactly match the tables selected in the Common Data Model data source. Otherwise, the update fails.

    For example, if the selected Common Data Model data source tables are Table1 and Table2, then the folder you choose for the update must show Table1 and Table2 in the hierarchy.

    storageaccountroot/
        DeltaDataRoot/
            Table1/
            Table2/
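
The naming and layout rules above can be checked before you start the update. The following sketch is hypothetical (validate_delta_layout isn't part of Customer Insights - Data) and assumes the Delta data folder is available on a local path:

```python
from pathlib import Path

def validate_delta_layout(data_root: Path, expected_tables: set[str]) -> list[str]:
    """Compare a candidate Delta data folder against the table names
    selected in the Common Data Model data source. In Delta, the table
    name is the folder name, so the folders must match exactly.
    Returns a list of problems; an empty list means the layout matches."""
    problems = []
    found = {p.name for p in data_root.iterdir() if p.is_dir()}
    # Every selected table needs a same-named folder under the data root.
    for table in sorted(expected_tables - found):
        problems.append(f"missing folder for table: {table}")
    # Each matching folder must be a real Delta table (has a _delta_log).
    for table in sorted(expected_tables & found):
        if not (data_root / table / "_delta_log").is_dir():
            problems.append(f"{table} has no _delta_log folder, so it isn't a Delta table")
    return problems
```

For the example above, you would call validate_delta_layout on the DeltaDataRoot folder with {"Table1", "Table2"} and expect an empty result before selecting Update data source.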
    

Update Common Data Model data tables to Delta tables

  1. Go to Data > Data sources.

  2. Select the Azure Data Lake Common Data Model data source and then select Update to Delta tables. Or, select Begin update from the Add tables page if you're editing the Common Data Model data source.

    Data sources page showing a Common Data Model data source with Update to Delta tables highlighted.

  3. Select Browse and navigate to the folder that contains the data in Delta format and exactly matches the selected Azure Data Lake data source tables. Select the folder, and then select Update data source.

    The Data sources page opens showing the new data source in Refreshing status.

    Important

    Don't stop the refresh process. Stopping it could negatively impact the data source update.

    Tip

    Statuses are shown for tasks and processes. Most processes depend on other upstream processes, such as data source and data profiling refreshes.

    Select the status to open the Progress details pane and view the progress of the tasks. To cancel the job, select Cancel job at the bottom of the pane.

    Under each task, you can select See details for more progress information, such as the processing time, the last processing date, and any applicable errors and warnings associated with the task or process. Select View system status at the bottom of the pane to see other processes in the system.

We recommend that you continue to stream your data to the Data Lake Storage location through your existing pipeline and maintain the manifests and schemas until you confirm that the update was successful and everything is working as expected.

Revert the conversion from Common Data Model tables to Delta tables

If you tried to update an Azure Data Lake Common Data Model data source to Delta tables and the process failed, perform the following steps.

Prerequisites

  • Your organization has continued to stream the Data Lake Storage data through your pipeline.
  • Your organization has maintained the Data Lake Storage manifests and schemas.

Revert to an Azure Data Lake Common Data Model data source

  1. Go to Data > Data sources.

  2. Select the Azure Data Lake Common Data Model data source and then select Revert to Common Data Model tables.

  3. Confirm that you want to revert. The Data sources page opens showing the new data source in Refreshing status.

    Important

    Don't stop the refresh process. Stopping it could negatively impact reverting the data source.