Connect to Azure Blob Storage with WASB (legacy)

Microsoft has deprecated the Windows Azure Storage Blob driver (WASB) for Azure Blob Storage in favor of the Azure Blob Filesystem driver (ABFS); see Connect to Azure Data Lake Storage Gen2 and Blob Storage. ABFS has numerous benefits over WASB; see Azure documentation on ABFS.

This article provides documentation for maintaining code that uses the WASB driver. Databricks recommends using ABFS for all connections to Azure Blob Storage.

Configure WASB credentials in Databricks

The WASB driver allows you to use either a storage account access key or a Shared Access Signature (SAS). If you are reading data from a public storage account, you do not need to configure credentials.

Databricks recommends using secrets whenever you need to pass credentials in Azure Databricks. Secrets are available to all users with access to the containing secret scope.

You can pass credentials:

  • Scoped to the cluster in the Spark configuration
  • Scoped to the notebook
  • Attached to a mounted directory

Databricks recommends upgrading all your connections to use ABFS to access Azure Blob Storage; ABFS provides access patterns similar to those of WASB. Use ABFS for the best security and performance when interacting with Azure Blob Storage.

To configure cluster credentials, set Spark configuration properties when you create the cluster. Credentials set at the cluster level are available to all users with access to that cluster.

To configure notebook-scoped credentials, use spark.conf.set(). Credentials passed at the notebook level are available to all users with access to that notebook.
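For cluster-scoped credentials, the cluster's Spark config takes one key-value pair per line, separated by a space, and Databricks lets you reference a secret with the {{secrets/<scope-name>/<key-name>}} syntax instead of pasting the key itself. The sketch below renders such a line for an account key. The helper name cluster_spark_conf_line is hypothetical, and the spark.hadoop. prefix is an assumption based on the standard Spark convention for forwarding a property into the Hadoop configuration that the WASB driver reads; check your workspace documentation if you set the bare key instead.

```python
def cluster_spark_conf_line(
    storage_account: str, scope: str, key: str, prefix: str = "spark.hadoop."
) -> str:
    """Return one cluster Spark config line that sets a WASB account key.

    The value is a Databricks secret reference ({{secrets/<scope>/<key>}}),
    so the access key itself never appears in the cluster configuration.
    The default prefix is an assumption (Spark's spark.hadoop.* convention).
    """
    conf_key = f"{prefix}fs.azure.account.key.{storage_account}.blob.core.windows.net"
    # Plain string concatenation keeps the literal double braces intact.
    secret_ref = "{{secrets/" + scope + "/" + key + "}}"
    return f"{conf_key} {secret_ref}"


if __name__ == "__main__":
    # Paste the printed line into the cluster's Spark config field.
    print(cluster_spark_conf_line("<storage-account-name>", "<scope-name>", "<key-name>"))
```

Because the line only contains a reference, anyone who can view the cluster configuration sees the scope and key names but not the secret value.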

Setting Azure Blob Storage credentials with a storage account access key

A storage account access key grants full access to all containers within a storage account. While this pattern is useful for prototyping, avoid using it in production to reduce risks associated with granting unrestricted access to production data.

spark.conf.set(
  "fs.azure.account.key.<storage-account-name>.blob.core.windows.net",
  "<storage-account-access-key>"
)
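Following the recommendation to use secrets, the notebook-scoped account key can be read from a secret scope at run time instead of being typed into the notebook. This is a minimal sketch: the helper set_wasb_account_key is hypothetical, and it takes the configuration setter and the secret getter as arguments (on Databricks, spark.conf.set and dbutils.secrets.get) so the logic can run outside a cluster.

```python
from typing import Callable


def set_wasb_account_key(
    conf_set: Callable[[str, str], None],
    get_secret: Callable[..., str],
    storage_account: str,
    scope: str,
    key: str,
) -> str:
    """Set a notebook-scoped WASB account key that is read from a secret scope.

    conf_set is expected to behave like spark.conf.set and get_secret like
    dbutils.secrets.get. Returns the name of the property that was set.
    """
    prop = f"fs.azure.account.key.{storage_account}.blob.core.windows.net"
    # Fetch the access key at run time instead of pasting it into the notebook.
    conf_set(prop, get_secret(scope=scope, key=key))
    return prop
```

In a Databricks notebook this would be called as set_wasb_account_key(spark.conf.set, dbutils.secrets.get, "<storage-account-name>", "<scope-name>", "<key-name>").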

You can upgrade account key URIs to use ABFS. For more information, see Connect to Azure Data Lake Storage Gen2 and Blob Storage.

Setting Azure Blob Storage credentials with a Shared Access Signature (SAS)

You can use a SAS token to grant limited access to a single container in a storage account. The token expires at a specific time.

spark.conf.set(
  "fs.azure.sas.<container-name>.<storage-account-name>.blob.core.windows.net",
  "<sas-token-for-container>"
)
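Because each SAS property names one container, a notebook that reads from several containers needs one property per container. The sketch below sets them in a loop; set_wasb_sas_tokens is a hypothetical helper, and conf_set stands in for spark.conf.set so the function can be exercised without a cluster.

```python
from typing import Callable, Dict, List


def set_wasb_sas_tokens(
    conf_set: Callable[[str, str], None],
    storage_account: str,
    tokens_by_container: Dict[str, str],
) -> List[str]:
    """Set one notebook-scoped SAS property per container.

    Each token applies to a single container, so every container gets its
    own fs.azure.sas.<container>.<account>.blob.core.windows.net property.
    Returns the property names in the order they were set.
    """
    props = []
    for container, token in tokens_by_container.items():
        prop = f"fs.azure.sas.{container}.{storage_account}.blob.core.windows.net"
        conf_set(prop, token)
        props.append(prop)
    return props
```

On Databricks the token values would typically come from dbutils.secrets.get(scope="<scope-name>", key="<key-name>") rather than literals.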

Access Azure Blob Storage using the DataFrame API

The Apache Spark DataFrame API can use credentials configured at either the notebook or cluster level. All WASB driver URIs specify the container and storage account names. The directory name is optional and can specify multiple nested directories relative to the container.

wasbs://<container-name>@<storage-account-name>.blob.core.windows.net/<directory-name>
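The URI pattern above can be assembled programmatically, which avoids doubled or missing slashes when directory names come from variables. This is a sketch with a hypothetical helper name, wasbs_uri; the directory arguments are optional and may themselves contain nested paths.

```python
def wasbs_uri(container: str, storage_account: str, *directories: str) -> str:
    """Build a wasbs:// URI; directory segments are optional and may be nested."""
    base = f"wasbs://{container}@{storage_account}.blob.core.windows.net"
    # Strip stray slashes so segments such as "/raw/" and "2024/" join cleanly.
    path = "/".join(d.strip("/") for d in directories if d.strip("/"))
    return f"{base}/{path}" if path else base
```

For example, wasbs_uri("<container-name>", "<storage-account-name>", "<directory-name>") produces the same string as the template above.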

The following code examples show how you can use the DataFrame API and the Databricks Utilities (dbutils) reference to interact with a named directory within a container.

df = spark.read.format("parquet").load("wasbs://<container-name>@<storage-account-name>.blob.core.windows.net/<directory-name>")

dbutils.fs.ls("wasbs://<container-name>@<storage-account-name>.blob.core.windows.net/<directory-name>")

To use ABFS instead of WASB, update your URIs. For more information, see Access Azure storage.
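Updating a URI from WASB to ABFS keeps the container, storage account, and path, but changes the scheme to abfss and the endpoint from blob.core.windows.net to dfs.core.windows.net. The sketch below performs that rewrite with a regular expression; wasb_to_abfs is a hypothetical helper, and it does not touch credentials, which for ABFS are also keyed on the dfs.core.windows.net endpoint.

```python
import re

# Matches wasb:// or wasbs:// URIs and captures container, account, and path.
_WASB_URI = re.compile(
    r"^wasbs?://(?P<container>[^@/]+)@(?P<account>[^./]+)"
    r"\.blob\.core\.windows\.net(?P<path>/.*)?$"
)


def wasb_to_abfs(uri: str) -> str:
    """Rewrite a wasb:// or wasbs:// URI to the equivalent abfss:// URI."""
    match = _WASB_URI.match(uri)
    if match is None:
        raise ValueError(f"not a WASB URI: {uri}")
    path = match.group("path") or ""  # the path is optional
    return (
        f"abfss://{match.group('container')}@{match.group('account')}"
        f".dfs.core.windows.net{path}"
    )
```

Raising on anything that is not a WASB URI makes it safe to run over a list of mixed paths and spot the ones that need manual attention.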

Access Azure Blob Storage with SQL

Credentials set in a notebook’s session configuration are not accessible to notebooks running Spark SQL.

After an account access key or a SAS is set up in your cluster configuration, you can use standard Spark SQL queries with Azure Blob Storage:

-- SQL
CREATE DATABASE <db-name>
LOCATION "wasbs://<container-name>@<storage-account-name>.blob.core.windows.net/";
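The same statement can be issued from Python with spark.sql, assuming the account key or SAS is set in the cluster configuration as described above. The sketch below only builds the statement text; create_database_sql is a hypothetical helper, and the IF NOT EXISTS clause is an addition so that rerunning the cell does not fail.

```python
def create_database_sql(db_name: str, location: str) -> str:
    """Build a CREATE DATABASE statement whose LOCATION is a storage URI.

    The database name is wrapped in backticks (embedded backticks doubled)
    so names such as my-db parse. Locations containing a single quote are
    rejected rather than escaped, to keep the sketch simple.
    """
    if "'" in location:
        raise ValueError("location must not contain a single quote")
    quoted_name = "`" + db_name.replace("`", "``") + "`"
    return f"CREATE DATABASE IF NOT EXISTS {quoted_name} LOCATION '{location}'"
```

On a cluster this would be run as spark.sql(create_database_sql("<db-name>", "wasbs://<container-name>@<storage-account-name>.blob.core.windows.net/")).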

To use ABFS instead of WASB, update your URIs; see Access Azure storage.

Mount Azure Blob Storage containers to DBFS

You can mount an Azure Blob Storage container or a folder inside a container to DBFS. For Databricks recommendations, see Mounting cloud object storage on Azure Databricks.

Important

  • Azure Blob storage supports three blob types: block, append, and page. You can only mount block blobs to DBFS.
  • All users have read and write access to the objects in Blob storage containers mounted to DBFS.
  • After a mount point is created through a cluster, users of that cluster can immediately access the mount point. To use the mount point in another running cluster, you must run dbutils.fs.refreshMounts() on that running cluster to make the newly created mount point available.

DBFS uses the credential that you provide when you create the mount point to access the mounted Blob storage container. If a Blob storage container is mounted using a storage account access key, DBFS uses temporary SAS tokens derived from the storage account key when it accesses this mount point.

Mount an Azure Blob storage container

Databricks recommends using ABFS instead of WASB. For more information about mounting with ABFS, see Mount ADLS Gen2 or Blob Storage with ABFS.

  1. To mount a Blob storage container or a folder inside a container, use the following command:

    Python

    dbutils.fs.mount(
      source = "wasbs://<container-name>@<storage-account-name>.blob.core.windows.net",
      mount_point = "/mnt/<mount-name>",
      extra_configs = {"<conf-key>": dbutils.secrets.get(scope = "<scope-name>", key = "<key-name>")})
    

    Scala

    dbutils.fs.mount(
      source = "wasbs://<container-name>@<storage-account-name>.blob.core.windows.net/<directory-name>",
      mountPoint = "/mnt/<mount-name>",
      extraConfigs = Map("<conf-key>" -> dbutils.secrets.get(scope = "<scope-name>", key = "<key-name>")))
    

    where

    • <storage-account-name> is the name of your Azure Blob storage account.
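    • <container-name> is the name of a container in your Azure Blob storage account.
    • <directory-name> (optional, as in the Scala example) is a folder inside the container to mount; omit it to mount the whole container.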
    • <mount-name> is a DBFS path representing where the Blob storage container or a folder inside the container (specified in source) will be mounted in DBFS.
    • <conf-key> can be either fs.azure.account.key.<storage-account-name>.blob.core.windows.net or fs.azure.sas.<container-name>.<storage-account-name>.blob.core.windows.net
    • dbutils.secrets.get(scope = "<scope-name>", key = "<key-name>") gets the key that has been stored as a secret in a secret scope.
  2. Access files in your container as if they were local files, for example:

    Python

    # Both paths refer to the same mount point.
    df = spark.read.format("text").load("/mnt/<mount-name>/...")
    df = spark.read.format("text").load("dbfs:/mnt/<mount-name>/...")
    

    Scala

    // Both paths refer to the same mount point.
    val df = spark.read.format("text").load("/mnt/<mount-name>/...")
    val dfWithScheme = spark.read.format("text").load("dbfs:/mnt/<mount-name>/...")
    

    SQL

    -- SQL
    CREATE DATABASE <db-name>
    LOCATION "/mnt/<mount-name>";
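Mounting a path that is already mounted typically fails, so notebooks that run repeatedly often check the existing mounts first. The sketch below does that; mount_if_absent is a hypothetical helper that takes the list from dbutils.fs.mounts() (whose entries expose a mountPoint attribute) and the dbutils.fs.mount function as arguments, so it can be tested without a cluster.

```python
from typing import Callable, Dict, Iterable


def mount_if_absent(
    mounts: Iterable,
    mount: Callable[..., object],
    source: str,
    mount_point: str,
    extra_configs: Dict[str, str],
) -> bool:
    """Mount source at mount_point unless that mount point already exists.

    mounts is expected to be the result of dbutils.fs.mounts(); each entry
    has a mountPoint attribute. Returns True if a new mount was created.
    """
    if any(m.mountPoint == mount_point for m in mounts):
        return False
    mount(source=source, mount_point=mount_point, extra_configs=extra_configs)
    return True
```

On Databricks this would be called as mount_if_absent(dbutils.fs.mounts(), dbutils.fs.mount, "wasbs://<container-name>@<storage-account-name>.blob.core.windows.net", "/mnt/<mount-name>", {"<conf-key>": dbutils.secrets.get(scope = "<scope-name>", key = "<key-name>")}). Other running clusters still need dbutils.fs.refreshMounts() before they see a newly created mount.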