Enable firewall support for your workspace storage account

Each Azure Databricks workspace has an associated Azure storage account in a managed resource group known as the workspace storage account. The workspace storage account includes workspace system data (job output, system settings, and logs), DBFS root, and in some cases a Unity Catalog workspace catalog. This article describes how to limit access to your workspace storage account to only authorized resources and networks using an ARM (Azure Resource Manager) template.

What is firewall support for your workspace storage account?

By default, your workspace storage account accepts authenticated connections from all networks. You can limit this access by enabling firewall support for your workspace storage account. This ensures that public network access is disallowed and the workspace storage account is not accessible from unauthorized networks. You might want to configure this if your organization has Azure policies that require storage accounts to be private.

When firewall support for your workspace storage account is enabled, all access from services outside Azure Databricks must use approved private endpoints with Private Link. Azure Databricks creates an access connector to connect to the storage using an Azure managed identity. Access from Azure Databricks serverless compute must use either service endpoints or private endpoints.

Requirements

  • Your workspace must enable VNet injection for connections from the classic compute plane.

  • Your workspace must enable secure cluster connectivity (No Public IP / NPIP) for connections from the classic compute plane.

  • Your workspace must be on the Premium plan.

  • You must have a separate subnet for the private endpoints for the storage account. This is in addition to the two main subnets for basic Azure Databricks functionality.

    The subnet must be in the same VNet as the workspace or in a separate VNet that the workspace can access. Use the minimum size /28 in CIDR notation.

  • If you are using Cloud Fetch with the Microsoft Fabric Power BI service, you must always use a gateway for private access to the workspace storage account or disable Cloud Fetch. See Step 2 (Recommended): Configure private endpoints for Cloud Fetch client VNets.
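
The requirements above can be checked before you start. The following is a minimal sketch in Python, not an official tool. The WorkspaceConfig fields and the check_requirements helper are hypothetical names, and you would fill in the values by hand from your own workspace settings. It only encodes the rules listed in this section.

```python
import ipaddress
from dataclasses import dataclass, field


@dataclass
class WorkspaceConfig:
    """Hypothetical description of a workspace; fill in the values by hand."""
    vnet_injection: bool
    secure_cluster_connectivity: bool  # No Public IP / NPIP
    pricing_tier: str                  # for example "premium"
    workspace_subnets: list = field(default_factory=list)  # the two main subnets (CIDR)
    private_endpoint_subnet: str = ""  # separate subnet for the storage private endpoints


def check_requirements(cfg: WorkspaceConfig) -> list:
    """Return a list of problems; an empty list means every check passed."""
    problems = []
    if not cfg.vnet_injection:
        problems.append("VNet injection is not enabled")
    if not cfg.secure_cluster_connectivity:
        problems.append("secure cluster connectivity (NPIP) is not enabled")
    if cfg.pricing_tier.lower() != "premium":
        problems.append("workspace is not on the Premium plan")

    if not cfg.private_endpoint_subnet:
        problems.append("no private endpoint subnet specified")
        return problems

    pe_net = ipaddress.ip_network(cfg.private_endpoint_subnet)
    # /28 is the minimum size, so the prefix length must be 28 or shorter.
    if pe_net.prefixlen > 28:
        problems.append(f"{pe_net} is smaller than /28")
    # The private endpoint subnet must be separate from the workspace subnets.
    for cidr in cfg.workspace_subnets:
        if pe_net.overlaps(ipaddress.ip_network(cidr)):
            problems.append(f"{pe_net} overlaps workspace subnet {cidr}")
    return problems
```

An empty result means the configuration you described satisfies the listed requirements. It does not verify anything against Azure itself.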

You can also use the ARM template in Step 5: Deploy the required ARM template to create a new workspace. In that case, shut down all compute in your workspace before you follow steps 1 through 4.

Step 1: Create private endpoints to the storage account

Create two private endpoints to your workspace storage account from the VNet that you used for VNet injection, one for each of the Target sub-resource values: dfs and blob.

  1. In the Azure portal, navigate to your workspace.

  2. Under Essentials, click the name of the Managed Resource Group.

  3. Under Resources, click the resource of type Storage account that has a name that begins with dbstorage.

  4. In the sidebar, click Networking.

  5. Click Private endpoint connections.

  6. Click + Private endpoint.

  7. In the Resource Group name field, set your resource group.

    Important

    The resource group must not be the same as the managed resource group that your workspace storage account is in.

  8. In the Name field, type a unique name for this private endpoint:

    • For the first private endpoint you create for each source network, create a DFS endpoint. Databricks recommends you add the suffix -dfs-pe
    • For the second private endpoint you create for each source network, create a Blob endpoint. Databricks recommends you add the suffix -blob-pe

    The Network Interface Name field auto-populates.

  9. Set the Region field to the region of your workspace.

  10. Click Next.

  11. In Target sub-resource, click the target resource type.

    • For the first private endpoint you create for each source network, set this to dfs.
    • For the second private endpoint you create for each source network, set this to blob.
  12. In the Virtual network field, select a VNet.

  13. In the subnet field, set the subnet to the separate subnet you have for the private endpoints for the storage account.

    This field might auto-populate with the subnet for your private endpoints, but you may have to set it explicitly. You cannot use one of the two workspace subnets that are used for basic Azure Databricks workspace functionality, which are typically called private-subnet and public-subnet.

  14. Click Next. The DNS tab auto-populates to the right subscription and resource group that you previously selected. Change them if needed.

  15. Click Next and add tags if desired.

  16. Click Next and review the fields.

  17. Click Create.
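
Because the same steps are repeated once per sub-resource, it can help to write down the two endpoints before you click through the portal. The sketch below is illustrative only: PrivateEndpointPlan and plan_private_endpoints are hypothetical names, and the code creates nothing in Azure. It only applies the naming suffixes, the dfs-then-blob order, and the resource group rule from the steps above.

```python
from dataclasses import dataclass

# Target sub-resource values and the recommended name suffixes from the steps above.
SUB_RESOURCES = (("dfs", "-dfs-pe"), ("blob", "-blob-pe"))


@dataclass(frozen=True)
class PrivateEndpointPlan:
    name: str
    target_sub_resource: str
    resource_group: str
    vnet: str
    subnet: str


def plan_private_endpoints(base_name, resource_group, managed_resource_group, vnet, subnet):
    """Return the two endpoints (dfs first, then blob) to create for one source network."""
    # The endpoints must not be placed in the workspace's managed resource group.
    if resource_group.lower() == managed_resource_group.lower():
        raise ValueError("resource group must differ from the managed resource group")
    return [
        PrivateEndpointPlan(f"{base_name}{suffix}", sub_resource, resource_group, vnet, subnet)
        for sub_resource, suffix in SUB_RESOURCES
    ]
```

For example, calling plan_private_endpoints("ws1", "rg-network", "databricks-rg-ws1", "vnet1", "pe-subnet") yields plans named ws1-dfs-pe and ws1-blob-pe. All of these argument values are placeholders.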

To disable firewall support for your workspace storage account, deploy the ARM template as described in Step 5: Deploy the required ARM template, but set the parameter Storage Account Firewall (storageAccountFirewall in the template) to Disabled and set the Workspace Catalog Enabled field to true or false based on whether your workspace uses a Unity Catalog workspace catalog. See What are catalogs in Azure Databricks?.

Step 2 (Recommended): Configure private endpoints for Cloud Fetch client VNets

Cloud Fetch is a mechanism in ODBC and JDBC for fetching data in parallel through cloud storage to bring the data faster to BI tools. If you are fetching query results larger than 1 MB from BI tools, you are likely using Cloud Fetch.

Note

If you are using the Microsoft Fabric Power BI service with Azure Databricks, you must disable Cloud Fetch, because firewall support blocks direct access to the workspace storage account from Fabric Power BI. Alternatively, you can configure a virtual network data gateway or on-premises data gateway to allow private access to the workspace storage account. This does not apply to Power BI Desktop. To disable Cloud Fetch, use the configuration EnableQueryResultDownload=0.
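
As an illustration of the EnableQueryResultDownload=0 setting, the following sketch adds it to a driver connection string. It assumes a semicolon-separated key=value connection string and does not handle values that themselves contain semicolons. The disable_cloud_fetch helper is a hypothetical name, and the other keys in the usage example are placeholders rather than a complete driver configuration.

```python
def disable_cloud_fetch(connection_string: str) -> str:
    """Return the connection string with EnableQueryResultDownload set to 0."""
    setting = "EnableQueryResultDownload"
    # Drop empty segments and any existing value for the setting
    # (keys are compared case-insensitively).
    parts = [
        part for part in connection_string.split(";")
        if part.strip() and part.split("=", 1)[0].strip().lower() != setting.lower()
    ]
    parts.append(f"{setting}=0")
    return ";".join(parts)
```

For example, disable_cloud_fetch("Host=example;HTTPPath=/sql/1.0/x") returns "Host=example;HTTPPath=/sql/1.0/x;EnableQueryResultDownload=0".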

If you use Cloud Fetch, create private endpoints to the workspace storage account from any VNets of your Cloud Fetch clients.

For each source network for Cloud Fetch clients, create two private endpoints that use two different Target sub-resource values: dfs and blob. Refer to Step 1: Create private endpoints to the storage account for detailed steps. In those steps, for the Virtual network field when creating the private endpoint, be sure that you specify your source VNet for each Cloud Fetch client.

Step 3: Confirm endpoint approvals

After you create all your private endpoints to the storage account, check if they are approved. They might auto-approve or you might need to approve them on the storage account.

  1. Navigate to your workspace in the Azure portal.
  2. Under Essentials, click the name of the Managed Resource Group.
  3. Under Resources, click the resource of type Storage account that has a name that begins with dbstorage.
  4. In the sidebar, click Networking.
  5. Click Private endpoint connections.
  6. Check the Connection state column to confirm that each endpoint shows Approved. If an endpoint is not approved, select it and click Approve.
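
If you have many endpoints, you can run the same check against an exported JSON description of the storage account instead of reading the portal table. The sketch below assumes the JSON contains a privateEndpointConnections list whose entries carry privateLinkServiceConnectionState.status, either directly or nested under properties. That shape is an assumption about your export, so compare it with your actual output. The unapproved_connections helper is a hypothetical name.

```python
import json


def unapproved_connections(storage_account_json: str) -> list:
    """Return (endpoint id, status) pairs for connections that are not Approved."""
    account = json.loads(storage_account_json)
    # Depending on the tool, the list sits at the top level or under "properties".
    connections = (
        account.get("privateEndpointConnections")
        or account.get("properties", {}).get("privateEndpointConnections")
        or []
    )
    pending = []
    for conn in connections:
        # Some outputs flatten "properties" away; accept both shapes.
        props = conn.get("properties", conn)
        status = props.get("privateLinkServiceConnectionState", {}).get("status", "Unknown")
        if status != "Approved":
            endpoint_id = props.get("privateEndpoint", {}).get("id", "<unknown endpoint>")
            pending.append((endpoint_id, status))
    return pending
```

An empty list means every connection in the export reports Approved. Any entries that are returned still need to be approved on the storage account.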

Step 4: Authorize serverless compute connections

You must authorize serverless compute to connect to your workspace storage account by attaching a network connectivity configuration (NCC) to your workspace. When an NCC is attached to a workspace, the required network rules are automatically added to the workspace storage account. For instructions, see Configure a firewall for serverless compute access.

If you want to enable access from Azure Databricks serverless compute using private endpoints, contact your Azure Databricks account team.

Step 5: Deploy the required ARM template

This step uses an ARM template to manage the Azure Databricks workspace. You can also update or create your workspace using Terraform. See the azurerm_databricks_workspace Terraform provider.

  1. In the Azure portal, search for and select Deploy a custom template.

  2. Click Build your own template in the editor.

  3. Copy the ARM template from ARM template for firewall support for your workspace storage account and paste it in the editor.

  4. Click Save.

  5. Review and edit the fields. Use the same parameters that you used to create the workspace, such as the subscription, region, workspace name, subnet names, and resource ID of the existing VNet.

    For a description of the fields, see ARM template fields.

  6. Click Review and Create, then Create.
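
If you prefer to keep the field values in a file rather than typing them into the portal, they can be written as an ARM deployment parameters document. The sketch below is a hedged example: storageAccountFirewall is the parameter named in this article, but workspaceCatalogEnabled is an assumed name for the Workspace Catalog Enabled field, and Enabled is assumed to be the counterpart of the Disabled value. Check both against the template you pasted. The build_parameters helper and the workspaceName key in the usage note are hypothetical.

```python
import json


def build_parameters(workspace_params: dict, firewall_enabled: bool,
                     workspace_catalog_enabled: bool) -> str:
    """Return an ARM deployment parameters document as a JSON string."""
    values = dict(workspace_params)  # the same values you used to create the workspace
    # storageAccountFirewall is the template parameter named in this article.
    values["storageAccountFirewall"] = "Enabled" if firewall_enabled else "Disabled"
    # ASSUMPTION: the real parameter name for "Workspace Catalog Enabled" may differ.
    values["workspaceCatalogEnabled"] = workspace_catalog_enabled
    document = {
        "$schema": "https://schema.management.azure.com/schemas/2019-04-01/deploymentParameters.json#",
        "contentVersion": "1.0.0.0",
        "parameters": {name: {"value": value} for name, value in values.items()},
    }
    return json.dumps(document, indent=2)
```

For example, build_parameters({"workspaceName": "ws1"}, True, False) produces a document whose storageAccountFirewall value is Enabled.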

Note

The public network access on your workspace storage account is set to Enabled from selected virtual networks and IP addresses, not to Disabled, in order to support serverless compute resources without requiring private endpoints. The workspace storage account is in a managed resource group, and the storage firewall can only be updated when you add a network connectivity configuration (NCC) for serverless connections to your workspace. If you want to enable access from Azure Databricks serverless compute using private endpoints, contact your Azure Databricks account team.
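
To confirm the resulting setting from an exported copy of the storage account properties, the following sketch maps two properties to the labels shown in the portal. The mapping from publicNetworkAccess and networkAcls.defaultAction to those labels is an assumption based on how storage account resources are commonly described, not something this article states, so verify it against your own account. The portal_network_setting helper is a hypothetical name.

```python
def portal_network_setting(storage_properties: dict) -> str:
    """Map storage account properties to the portal's public network access label."""
    # ASSUMPTION: missing values default to public access enabled and default action Allow.
    public_access = storage_properties.get("publicNetworkAccess", "Enabled")
    default_action = storage_properties.get("networkAcls", {}).get("defaultAction", "Allow")
    if public_access == "Disabled":
        return "Disabled"
    if default_action == "Deny":
        return "Enabled from selected virtual networks and IP addresses"
    return "Enabled from all networks"
```

After firewall support is enabled, the expected result under these assumptions is Enabled from selected virtual networks and IP addresses.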