Convert a Delta Live Tables pipeline into a Databricks Asset Bundle project

This article shows how to convert an existing Delta Live Tables (DLT) pipeline into a bundle project. Bundles enable you to define and manage your Azure Databricks data processing configuration in a single, source-controlled YAML file that provides easier maintenance and enables automated deployment to target environments.

Conversion process overview

Diagram showing the specific steps in converting an existing pipeline to a bundle

The steps you take to convert an existing pipeline to a bundle are:

  1. Make sure you have access to a previously configured pipeline you want to convert to a bundle.
  2. Create or prepare a folder (preferably in a source-controlled hierarchy) to store the bundle.
  3. Generate a configuration for the bundle from the existing pipeline, using the Azure Databricks CLI.
  4. Review the generated bundle configuration to ensure it is complete.
  5. Link the bundle to the original pipeline.
  6. Deploy the pipeline to a target workspace using the bundle config.

Requirements

Before you start, you must have:

Step 1: Set up a folder for your bundle project

You must have access to a Git repository that is configured in Azure Databricks as a Git folder. You will create your bundle project in this repository, which will apply source control and make it available to other collaborators through a Git folder in the corresponding Azure Databricks workspace. (For more details on Git folders, see Git integration for Databricks Git folders.)

  1. Go to the root of the cloned Git repository on your local machine.

  2. At an appropriate place in the folder hierarchy, create a folder specifically for your bundle project. For example:

    mkdir -p ~/source/my-pipelines/ingestion/events/my-bundle
    
  3. Change your current working directory to this new folder. For example:

    cd ~/source/my-pipelines/ingestion/events/my-bundle
    
  4. Initialize a new bundle by running databricks bundle init and answering the prompts. Once it completes, you will have a project configuration file named databricks.yml in the new home folder for your project. This file is required for deploying your pipeline from the command line. For more details on this configuration file, see Databricks Asset Bundle configuration.
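The folder setup above can be sketched as a small shell helper. This is only an illustration: `prepare_bundle_dir` is a hypothetical function name, not part of the Azure Databricks CLI, and it only reports whether databricks.yml is present rather than running databricks bundle init for you.

```shell
#!/usr/bin/env bash
# Sketch only: prepare_bundle_dir is a hypothetical helper, not a CLI feature.

# Create the bundle folder if needed, switch to it, and report whether
# the project configuration file (databricks.yml) already exists there.
prepare_bundle_dir() {
  bundle_dir="$1"
  mkdir -p "$bundle_dir" || return 1
  cd "$bundle_dir" || return 1
  if [ -f databricks.yml ]; then
    echo "databricks.yml found in $bundle_dir"
  else
    echo "databricks.yml missing: run 'databricks bundle init' in $bundle_dir"
  fi
}

# Example, using the path from the steps above:
# prepare_bundle_dir ~/source/my-pipelines/ingestion/events/my-bundle
```

Because the function changes the current directory, call it directly in your shell session (not in a subshell) if you want to stay in the bundle folder afterwards.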

Step 2: Generate the pipeline configuration

From this new directory in your cloned Git repository’s folder tree, run the following Azure Databricks CLI command and provide the ID of your DLT pipeline as <pipeline-id>:

databricks bundle generate pipeline --existing-pipeline-id <pipeline-id> --profile <profile-name>

When you run the generate command, it creates a bundle configuration file for your pipeline in the bundle’s resources folder and downloads any referenced artifacts to the src folder. The --profile (or -p) flag is optional. If you want to use a specific Databricks configuration profile (defined in the .databrickscfg file created when you installed the Azure Databricks CLI) instead of the default profile, provide it in this command. For information about Databricks configuration profiles, see Azure Databricks configuration profiles.
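As a sketch of how the generate step might be guarded in a script, the wrapper below checks that a pipeline ID was supplied and that databricks.yml exists in the current folder before calling the CLI. The function name `generate_pipeline_config` and its argument order are assumptions made for illustration; only the databricks bundle generate command itself comes from this article.

```shell
#!/usr/bin/env bash
# Sketch only: generate_pipeline_config is a hypothetical wrapper.

# $1: ID of the existing pipeline
# $2: optional configuration profile name
generate_pipeline_config() {
  pipeline_id="$1"
  profile="${2:-}"
  if [ -z "$pipeline_id" ]; then
    echo "usage: generate_pipeline_config <pipeline-id> [profile-name]" >&2
    return 2
  fi
  # The generate command expects an existing bundle project in this folder.
  if [ ! -f databricks.yml ]; then
    echo "databricks.yml not found; run 'databricks bundle init' first" >&2
    return 1
  fi
  if [ -n "$profile" ]; then
    databricks bundle generate pipeline --existing-pipeline-id "$pipeline_id" --profile "$profile"
  else
    databricks bundle generate pipeline --existing-pipeline-id "$pipeline_id"
  fi
}
```

Checking for databricks.yml up front gives a clearer message than letting the CLI fail partway through.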

Step 3: Review the bundle project files

When the bundle generate command completes, it will have created two new folders:

  • resources is the project subdirectory that contains project configuration files.
  • src is the project folder where source files, such as queries and notebooks, are stored.

The command also creates some additional files:

  • *.pipeline.yml under the resources subdirectory. This file contains the specific configuration and settings for your DLT pipeline.
  • Source files such as SQL queries under the src subdirectory, copied from your existing DLT pipeline.

├── databricks.yml                            # Project configuration file created with the bundle init command
├── resources/
│   └── {your-pipeline-name}.pipeline.yml     # Pipeline configuration
└── src/
    └── {SQL-query-retrieved-from-your-existing-pipeline}.sql # Your pipeline's declarative query
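One way to review the result is to check that the expected files and folders exist. The following is a minimal sketch, assuming the layout described above; `check_bundle_layout` is a hypothetical helper name, and it verifies only that the files are present, not that their contents are complete.

```shell
#!/usr/bin/env bash
# Sketch only: check_bundle_layout is a hypothetical helper.

# $1: bundle project root (defaults to the current directory)
# Prints one line per missing item and returns non-zero if anything is missing.
check_bundle_layout() {
  root="${1:-.}"
  status=0
  if [ ! -f "$root/databricks.yml" ]; then
    echo "missing: databricks.yml"
    status=1
  fi
  if [ ! -d "$root/src" ]; then
    echo "missing: src/"
    status=1
  fi
  # At least one *.pipeline.yml file should exist under resources/.
  found=0
  for f in "$root"/resources/*.pipeline.yml; do
    if [ -f "$f" ]; then
      found=1
    fi
  done
  if [ "$found" -ne 1 ]; then
    echo "missing: resources/*.pipeline.yml"
    status=1
  fi
  return "$status"
}
```

You still need to open the generated pipeline YAML and compare its settings with the original pipeline by hand.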

Step 4: Bind the bundle pipeline to your existing pipeline

You must link, or bind, the pipeline definition in the bundle to your existing pipeline in order to keep it up to date as you make changes. To do this, run the following Azure Databricks CLI command:

databricks bundle deployment bind <pipeline-name> <pipeline-ID> --profile <profile-name>

<pipeline-name> is the name of the pipeline. It should match the part of the pipeline configuration file name that precedes .pipeline.yml in your new resources directory. For example, if you have a pipeline configuration file named ingestion_data_pipeline.pipeline.yml in your resources folder, then you must provide ingestion_data_pipeline as your pipeline name.

<pipeline-ID> is the ID for your pipeline. It is the same as the one you copied as part of the requirements for these instructions.
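The naming rule above can be sketched in shell. Both function names below are hypothetical, and the sketch assumes that the pipeline name matches the file-name prefix, as described in this step; if you have renamed the resource inside the YAML file, pass that name to the bind command directly instead.

```shell
#!/usr/bin/env bash
# Sketch only: both functions are hypothetical helpers, not CLI features.

# Derive the pipeline name from a generated configuration file name by
# stripping the directory and the ".pipeline.yml" suffix.
pipeline_name_from_file() {
  basename "$1" .pipeline.yml
}

# Bind the bundle's pipeline definition to the existing pipeline.
# $1: path to the generated *.pipeline.yml file
# $2: ID of the existing pipeline
# $3: optional configuration profile name
bind_pipeline() {
  name="$(pipeline_name_from_file "$1")"
  if [ -n "${3:-}" ]; then
    databricks bundle deployment bind "$name" "$2" --profile "$3"
  else
    databricks bundle deployment bind "$name" "$2"
  fi
}

# Example:
# pipeline_name_from_file resources/ingestion_data_pipeline.pipeline.yml
# prints ingestion_data_pipeline
```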

Step 5: Deploy your pipeline using your new bundle

Now, deploy your pipeline bundle to your target workspace using the bundle deploy Azure Databricks CLI command:

databricks bundle deploy --target <target-name> --profile <profile-name>

The --target flag is required and must be set to a string that matches a configured target workspace name, such as development or production.

If this command is successful, you now have your DLT pipeline configuration in an external project that can be loaded into other workspaces and run, and easily shared with other Azure Databricks users in your account.

Troubleshooting

Issue: “databricks.yml not found” error when running bundle generate
Solution: Currently, the bundle generate command doesn’t create the bundle configuration file (databricks.yml) automatically. You must create the file using databricks bundle init or manually.

Issue: Existing pipeline settings don’t match the values in the generated pipeline YAML configuration
Solution: The pipeline ID does not appear in the bundle configuration YAML file. If you notice any other missing settings, you can manually apply them.

Tips for success

  • Always use version control. If you aren’t using Databricks Git folders, store your project subdirectories and files in a Git or other version-controlled repository or file system.
  • Test your pipeline in a non-production environment (such as a “development” or “test” environment) before deploying it to a production environment. It’s easy to introduce a misconfiguration by accident.

Additional resources

For more information about using bundles to define and manage data processing, see: