

Databricks Asset Bundles library dependencies

This article describes the syntax for declaring Databricks Asset Bundles library dependencies. Bundles enable programmatic management of Azure Databricks workflows. For an overview, see What are Databricks Asset Bundles?

In addition to notebooks, your Azure Databricks jobs will likely depend on libraries in order to work as expected. Dependencies for local development are specified in requirements*.txt files at the root of the bundle project. Job task library dependencies, by contrast, are declared in your bundle configuration files and are often required as part of the job task type specification.

Bundles support the following library dependencies for Azure Databricks jobs:

  • Python wheel file
  • JAR file (Java or Scala)
  • PyPI, Maven, or CRAN packages

Note

Whether or not a library is supported depends on the cluster configuration for the job and the library source. For complete library support information, see Libraries.

Python wheel file

To add a Python wheel file to a job task, specify a whl mapping in libraries for each wheel file to install. You can install a wheel file from workspace files, Unity Catalog volumes, cloud object storage, or a local file path.

Important

Libraries can be installed from DBFS when using Databricks Runtime 14.3 LTS and below. However, any workspace user can modify library files stored in DBFS. To improve the security of libraries in an Azure Databricks workspace, storing library files in the DBFS root is deprecated and disabled by default in Databricks Runtime 15.1 and above. See Storing libraries in DBFS root is deprecated and disabled by default.

Instead, Databricks recommends uploading all libraries, including Python libraries, JAR files, and Spark connectors, to workspace files or Unity Catalog volumes, or using library package repositories. If your workload does not support these patterns, you can also use libraries stored in cloud object storage.
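If you fall back to cloud object storage, the wheel is referenced by its full URI instead of a workspace or volume path. The following is a minimal sketch that assumes an Azure Data Lake Storage Gen2 account the job's compute is already authorized to read. The container name, storage account name, folder, and wheel file name are hypothetical placeholders.

```yaml
resources:
  jobs:
    my_job:
      # ...
      tasks:
        - task_key: my_task
          # ...
          libraries:
            # Hypothetical container (my-container), storage account (mystorageaccount),
            # folder, and wheel name. Replace these with your own values.
            # The job's compute must already have read access to this location.
            - whl: abfss://my-container@mystorageaccount.dfs.core.windows.net/libraries/my_wheel-0.1.0-py3-none-any.whl
```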

The following example shows how to install three Python wheel files for a job task.

  • The first Python wheel file was either previously uploaded to the Azure Databricks workspace or added as an include item in the sync mapping, and is in the same local folder as the bundle configuration file.
  • The second Python wheel file is in the specified workspace files location in the Azure Databricks workspace.
  • The third Python wheel file was previously uploaded to the volume named my-volume in the Azure Databricks workspace.

resources:
  jobs:
    my_job:
      # ...
      tasks:
        - task_key: my_task
          # ...
          libraries:
            - whl: ./my_wheel-0.1.0-py3-none-any.whl
            - whl: /Workspace/Shared/Libraries/my_wheel-0.0.1-py3-none-any.whl
            - whl: /Volumes/main/default/my-volume/my_wheel-0.1.0-py3-none-any.whl
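A wheel library is usually paired with the task type that consumes it. The following is a minimal sketch of a Python wheel task that runs an entry point from a wheel stored in a Unity Catalog volume. The package name my_wheel, the entry point main, and the parameters are hypothetical and must match what your wheel actually defines. Compute settings are elided with # ..., as in the example above.

```yaml
resources:
  jobs:
    my_job:
      # ...
      tasks:
        - task_key: my_wheel_task
          # ...
          python_wheel_task:
            # Hypothetical distribution name and entry point defined by the wheel.
            package_name: my_wheel
            entry_point: main
            # Hypothetical command-line arguments passed to the entry point.
            parameters: ["--env", "dev"]
          libraries:
            # The wheel that provides package_name must be installed for the task.
            - whl: /Volumes/main/default/my-volume/my_wheel-0.1.0-py3-none-any.whl
```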

JAR file

To add a JAR file to a job task, specify a jar mapping in libraries for each JAR file to install. You can install a JAR file from workspace files, Unity Catalog volumes, cloud object storage, or a local file path.

Important

Libraries can be installed from DBFS when using Databricks Runtime 14.3 LTS and below. However, any workspace user can modify library files stored in DBFS. To improve the security of libraries in an Azure Databricks workspace, storing library files in the DBFS root is deprecated and disabled by default in Databricks Runtime 15.1 and above. See Storing libraries in DBFS root is deprecated and disabled by default.

Instead, Databricks recommends uploading all libraries, including Python libraries, JAR files, and Spark connectors, to workspace files or Unity Catalog volumes, or using library package repositories. If your workload does not support these patterns, you can also use libraries stored in cloud object storage.

The following example shows how to install a JAR file that was previously uploaded to the volume named my-volume in the Azure Databricks workspace.

resources:
  jobs:
    my_job:
      # ...
      tasks:
        - task_key: my_task
          # ...
          libraries:
            - jar: /Volumes/main/default/my-volume/my-java-library-1.0.jar
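A JAR library is typically consumed by a JAR task, which names the class whose main method should run. The following is a minimal sketch that assumes the JAR is stored in workspace files rather than a volume. The class name com.example.MyMain and the workspace path are hypothetical.

```yaml
resources:
  jobs:
    my_job:
      # ...
      tasks:
        - task_key: my_jar_task
          # ...
          spark_jar_task:
            # Hypothetical fully qualified name of the class that contains main().
            main_class_name: com.example.MyMain
          libraries:
            # Hypothetical JAR stored in workspace files.
            - jar: /Workspace/Shared/Libraries/my-java-library-1.0.jar
```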

PyPI package

To add a PyPI package to a job task definition, specify a pypi mapping in libraries for each PyPI package to install. For each mapping, specify the following:

  • For package, specify the name of the PyPI package to install, optionally followed by an exact version specification (for example, numpy==1.25.2).
  • Optionally, for repo, specify the repository where the PyPI package can be found. If not specified, the default pip index is used (https://pypi.org/simple/).

The following example shows how to install two PyPI packages.

  • The first PyPI package uses the specified package version and the default pip index.
  • The second PyPI package uses the specified package version and the explicitly specified pip index.

resources:
  jobs:
    my_job:
      # ...
      tasks:
        - task_key: my_task
          # ...
          libraries:
            - pypi:
                package: wheel==0.41.2
            - pypi:
                package: numpy==1.25.2
                repo: https://pypi.org/simple/

Maven package

To add a Maven package to a job task definition, specify a maven mapping in libraries for each Maven package to install. For each mapping, specify the following:

  • For coordinates, specify the Gradle-style Maven coordinates for the package, in the form groupId:artifactId:version (for example, com.databricks:databricks-sdk-java:0.8.1).
  • Optionally, for repo, specify the Maven repo to install the Maven package from. If omitted, both the Maven Central Repository and the Spark Packages Repository are searched.
  • Optionally, for exclusions, specify any dependencies to explicitly exclude. See Maven dependency exclusions.

The following example shows how to install two Maven packages.

  • The first Maven package uses the specified package coordinates and searches for this package in both the Maven Central Repository and the Spark Packages Repository.
  • The second Maven package uses the specified package coordinates, searches for this package only in the Maven Central Repository, and does not include any of this package’s dependencies that match the specified pattern.

resources:
  jobs:
    my_job:
      # ...
      tasks:
        - task_key: my_task
          # ...
          libraries:
            - maven:
                coordinates: com.databricks:databricks-sdk-java:0.8.1
            - maven:
                coordinates: com.databricks:databricks-dbutils-scala_2.13:0.1.4
                repo: https://repo1.maven.org/maven2/
                exclusions:
                  - org.scala-lang:scala-library:2.13.0-RC*
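CRAN packages, listed earlier among the supported dependency types, follow the same pattern through a cran mapping. The following is a minimal sketch based on the Jobs API library specification, which accepts a package name and an optional repo. The package names and mirror URL are assumptions to replace with your own. As the note at the top of this article explains, support also depends on the job's cluster configuration.

```yaml
resources:
  jobs:
    my_job:
      # ...
      tasks:
        - task_key: my_task
          # ...
          libraries:
            # Installs a CRAN package from the default CRAN repository.
            - cran:
                package: data.table
            # Assumed CRAN mirror URL. Omit repo to use the default repository.
            - cran:
                package: ggplot2
                repo: https://cloud.r-project.org
```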