Set up and use Azure managed identities authentication for Azure Databricks automation

Follow this article’s steps to authenticate managed identities for Azure resources (formerly Managed Service Identities (MSI)) to automate your Azure Databricks accounts and workspaces.

Azure automatically manages identities in Microsoft Entra ID for applications to use when connecting to resources that support Microsoft Entra ID authentication. These resources include Azure Databricks accounts and workspaces. Azure managed identities authentication for Azure Databricks uses managed identities to obtain Microsoft Entra ID tokens without having to manage any credentials.

Note

Managed identities for Azure resources are different from Microsoft Entra ID managed service principals, which Azure Databricks also supports for authentication. To learn how to use Microsoft Entra ID managed service principals for Azure Databricks authentication instead of managed identities for Azure resources, see:

This article demonstrates how to set up and use Azure managed identities authentication as follows:

  • Create a user-assigned managed identity. Azure supports system-assigned and user-assigned managed identities. Databricks recommends that you use user-assigned managed identities for Azure managed identities authentication with Azure Databricks.
  • Assign your managed identity to your Azure Databricks account and to an Azure Databricks workspace in that account.
  • Create and log in to an Azure virtual machine (Azure VM). You must use a resource that supports managed identities such as an Azure VM, with a managed identity assigned to that Azure VM, to programmatically call Azure Databricks account and workspace operations.
  • Assign your user-assigned managed identity to your Azure VM.
  • Install the Databricks CLI on your Azure VM and then configure the Databricks CLI for Azure managed identities authentication for Azure Databricks by using the assigned managed identity.
  • Run commands with the Databricks CLI to automate your Azure Databricks account and workspace by using Azure managed identities authentication for Azure Databricks with the assigned managed identity.

Requirements

Step 1: Create a user-assigned managed identity

In this step, you create a user-assigned managed identity for Azure resources. Azure supports system-assigned and user-assigned managed identities. Databricks recommends that you use user-assigned managed identities for Azure managed identities authentication with Azure Databricks. See also Manage user-assigned managed identities.

  1. Sign in to the Azure portal.

    Note

    The portal to use is different depending on whether you use the Azure public cloud or a national or sovereign cloud. For more information, see National clouds.

  2. If you have access to multiple tenants, subscriptions, or directories, click the gear (Settings) icon in the top menu to switch to the directory in which you want to create the managed identity.

  3. In Search resources, services, and docs, search for and select the Azure service named Managed Identities.

  4. Click + Create.

  5. On the Basics tab, for Resource group, choose an existing resource group to add this managed identity to, or click Create new to create a new resource group to add this managed identity to. For information about resource groups, see Manage Azure resource groups by using the Azure portal.

  6. For Region, choose the appropriate region to add this managed identity to. For information about regions, see Choose the right Azure region for you.

  7. For Name, enter some unique name for this managed identity that’s easy for you to remember.

  8. On the Review + create tab, click Create.

  9. Click Go to resource.

  10. Copy the value of the Client ID field, as you will need it later in Steps 2, 3, and 8.

    If you forget to copy this value, you can return to your managed identity’s overview page later to get this value. To return to your managed identity’s overview page, in Search resources, services, and docs, search for and select your managed identity’s name. Then, on the managed identity’s settings page, click Overview in the sidebar.
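If you prefer the command line to the portal, the same identity can be created with the Azure CLI. The following is a minimal sketch, assuming the Azure CLI (az) is installed and signed in to the right subscription. The function names, the run dry-run helper, and the sample values my-rg, my-identity, and eastus are illustrative and are not part of this article's procedure.

```shell
#!/bin/sh
# Sketch only: create a user-assigned managed identity with the Azure CLI.

# Hypothetical helper: print the command instead of running it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Create the identity. Arguments: resource group, identity name, region.
create_identity() {
  run az identity create --resource-group "$1" --name "$2" --location "$3"
}

# Print the identity's Client ID, the same value shown on the portal overview page.
# Arguments: resource group, identity name.
get_client_id() {
  run az identity show --resource-group "$1" --name "$2" --query clientId --output tsv
}

# Example with placeholder values:
#   create_identity my-rg my-identity eastus
#   get_client_id my-rg my-identity
```

Setting DRY_RUN=1 before calling either function prints the az command without contacting Azure, which is a convenient way to review it first.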

Step 2: Assign your managed identity to your Azure Databricks account

In this step, you give your managed identity access to your Azure Databricks account. If you do not want to give your managed identity access to your Azure Databricks account, skip ahead to Step 3.

  1. In your Azure Databricks workspace, click your username in the top bar and click Manage account.

    Alternatively, go directly to your Azure Databricks account console, at https://accounts.azuredatabricks.net.

  2. Sign in to your Azure Databricks account, if prompted.

  3. On the sidebar, click User management.

  4. Click the Service principals tab.

    Note

    Although this tab is labeled Service principals, this tab works with managed identities as well. Azure Databricks treats managed identities as service principals in your Azure Databricks account.

  5. Click Add service principal.

  6. Enter some unique Name for the service principal that’s easy for you to remember.

  7. For UUID, enter the Client ID value for your managed identity from Step 1.

  8. Click Add. Your managed identity is added as a service principal in your Azure Databricks account.

  9. Assign any account-level permissions that you want the service principal to have:

    1. On the Service principals tab, click the name of your service principal.
    2. On the Roles tab, toggle to enable or disable each target role that you want this service principal to have.
    3. On the Permissions tab, grant access to any Azure Databricks users, service principals, and account group roles that you want to manage and use this service principal. See Manage roles on a service principal.

Step 3: Assign your managed identity to your Azure Databricks workspace

In this step, you give your managed identity access to your Azure Databricks workspace.

If your workspace is enabled for identity federation:

  1. In your Azure Databricks workspace, click your username in the top bar and click Settings.

  2. Click Service principals.

    Note

    Although this tab is labeled Service principals, this tab works with managed identities as well. Azure Databricks treats managed identities as service principals in your Azure Databricks account.

  3. Click Add service principal.

  4. Select your service principal from Step 2 and click Add. Your service principal is added to your Azure Databricks workspace.

  5. Assign any workspace-level permissions that you want the service principal to have:

    1. On the Service principals tab, click the name of your service principal.
    2. On the Configurations tab, select or clear to grant or revoke each target status or entitlement that you want this service principal to have.
    3. On the Permissions tab, grant access to any Azure Databricks users, service principals, and account group roles that you want to manage and use this service principal. See Manage roles on a service principal.

Skip ahead to Step 4.

If your workspace is not enabled for identity federation:

  1. In your Azure Databricks workspace, click your username in the top bar and click Settings.

  2. Click Service principals.

    Note

    Although this tab is labeled Service principals, this tab works with managed identities as well. Azure Databricks treats managed identities as service principals in your Azure Databricks workspace.

  3. Click Add service principal.

  4. In the Service Principal list, select Add new service principal.

  5. For ApplicationId, enter the Client ID for your managed identity from Step 1.

  6. Enter some Display Name that’s easy for you to remember for the new service principal, and click Add. Your managed identity is added as a service principal in your Azure Databricks workspace.

  7. Assign any workspace-level permissions that you want the service principal to have:

    1. On the Service principals tab, click the name of your service principal.
    2. On the Configurations tab, select or clear to grant or revoke each target status or entitlement that you want this service principal to have.
    3. On the Permissions tab, grant access to any Azure Databricks users, service principals, and account group roles that you want to manage and use this service principal. See Manage roles on a service principal.

Step 4: Get the Azure resource ID for your Azure Databricks workspace

In this step, you get the resource ID that Azure assigns to your Azure Databricks workspace. You will need this Azure resource ID later to help Azure managed identities authentication determine the specific Azure resource that Azure associates with your Azure Databricks workspace.

  1. In your Azure Databricks workspace, click your username in the top bar and click Azure Portal.

  2. On the side pane, in the Settings section, click Properties.

  3. In the Essentials section, copy the Id value, as you will need it later in Step 8. It should look similar to the following:

    /subscriptions/<subscription-id>/resourceGroups/<resource-group-id>/providers/Microsoft.Databricks/workspaces/<workspace-id>
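A mistyped resource ID tends to surface only later, as an authentication failure, so it can help to sanity-check the copied value first. The following is a minimal sketch for a POSIX shell. The function names are hypothetical, and the pattern check is deliberately loose: it confirms the overall shape shown above, not that the workspace exists.

```shell
#!/bin/sh
# Sketch only: check and split an Azure Databricks workspace resource ID.

# Succeed if the value has the expected overall shape.
is_workspace_resource_id() {
  case "$1" in
    /subscriptions/*/resourceGroups/*/providers/Microsoft.Databricks/workspaces/*)
      return 0 ;;
    *)
      return 1 ;;
  esac
}

# Print one "/"-separated field of the resource ID.
# Field 3 is the subscription, field 5 the resource group, field 9 the workspace.
resource_id_field() {
  printf '%s\n' "$1" | cut -d/ -f"$2"
}

# Example with a placeholder value:
#   rid="/subscriptions/<subscription-id>/resourceGroups/<resource-group-id>/providers/Microsoft.Databricks/workspaces/<workspace-id>"
#   is_workspace_resource_id "$rid" && resource_id_field "$rid" 9
```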
    

Step 5: Create and log in to an Azure VM

In this step, you create and log in to an Azure virtual machine (Azure VM). Azure VMs are one of the resource types that support managed identities. See also Quickstart: Create a Linux virtual machine in the Azure portal.

This Azure VM is intended only for demonstration purposes. This Azure VM uses settings that are not necessarily optimized for your ongoing usage needs. After you are done experimenting with this Azure VM, you should delete it as shown later in Step 11.

  1. In the Azure portal that you signed in to in Step 1, in Search resources, services, and docs, search for and select the Azure service named Virtual machines.

  2. Click + Create > Azure virtual machine.

  3. On the Basics tab, for Resource group, choose an existing resource group to add this Azure VM to, or click Create new to create a new resource group to add this Azure VM to. For information about resource groups, see Manage Azure resource groups by using the Azure portal.

  4. For Virtual machine name, enter some unique name for this Azure VM that’s easy for you to remember.

  5. For Region, choose the appropriate region to add this Azure VM to. For information about regions, see Choose the right Azure region for you.

  6. For Image, choose Ubuntu Server 22.04 LTS - x64 Gen 2.

  7. For Authentication type, select SSH public key.

  8. For Username, enter azureuser.

  9. For SSH public key source, leave the default of Generate new key pair.

  10. For Key pair name, enter myKey.

  11. For Public inbound ports, select Allow selected ports.

  12. For Select inbound ports, select HTTP (80) and SSH (22).

  13. Leave the remaining defaults.

  14. On the Review + create tab, click Create.

  15. Click Download private key and create resource. Your key file is downloaded to your local development machine as myKey.pem. Make a note of where this myKey.pem file is downloaded, as you will need it to log in to the Azure VM later in this Step.

    If you lose your key file, you can return to your Azure VM’s settings page later to get a replacement key file. To return to your Azure VM’s settings page, in Search resources, services, and docs, search for and select your Azure VM’s name. To get a replacement key file from your Azure VM’s settings page, do the following:

    1. In the Help section on the side pane, click Reset password.
    2. Select Add SSH public key.
    3. For Key pair name, enter some unique name.
    4. Click Update.
    5. Click Download + create. Your key file is downloaded with a .pem extension. Make a note of where this .pem file is downloaded, as you will need it to log in to the Azure VM later in this Step.
  16. After the Azure VM is created, click Go to resource.

  17. Copy the value of the Public IP address field, as you will need it to log in to the Azure VM later in this Step.

    If you forget to copy this value, you can return to your Azure VM’s overview page later to get this value. To return to your Azure VM’s overview page, in Search resources, services, and docs, search for and select your Azure VM’s name. Then, on the Azure VM’s settings page, click Overview in the sidebar and look for the Public IP address field.

  18. If your local development machine runs Linux, macOS, or WSL on Windows, restrict the private key that you just downloaded so that only your user can read it, because SSH rejects private keys that other users can access. To do this, run the following command from your local development machine’s terminal or command prompt. In this command, replace the following values:

    • Replace </path/to> with the path to your downloaded .pem file.
    • Replace myKey.pem with the filename of your downloaded .pem file.

    chmod 400 </path/to>/myKey.pem
    
  19. Log in to the Azure VM. To do this, from your local development machine’s terminal or command prompt, run the following command. In this command, replace the following values:

    • Replace </path/to> with the path to your downloaded .pem file.
    • Replace myKey.pem with the filename of your downloaded .pem file.
    • Replace <public-ip-address> with the value of the Public IP address field that you copied earlier in this Step.

    ssh -i </path/to>/myKey.pem azureuser@<public-ip-address>
    

    For example:

    ssh -i ./myKey.pem azureuser@192.0.2.0
    
  20. If you have never connected to this Azure VM before, you are prompted to verify the host’s fingerprint. To do this, follow the on-screen prompts. Databricks recommends that you always validate the host’s fingerprint.

  21. The terminal or command prompt changes to azureuser@<your-azure-vm-name>:~$.

  22. To exit the Azure VM at any time, run the command logout or exit. The terminal or command prompt then changes back to normal.

Step 6: Assign your managed identity to your Azure VM

In this step, you associate your managed identity with your Azure VM. This enables Azure to use the managed identity for authentication as needed while the Azure VM is running. See also Assign a user-assigned managed identity to an existing VM.

  1. In the Azure portal that you signed in to in Step 1, on the Azure VM’s settings page, in the Settings section on the side pane, click Identity.

    To return to your Azure VM’s overview page if you closed it earlier, in Search resources, services, and docs, search for and select your Azure VM’s name.

  2. On the User assigned tab, click + Add.

  3. Select your user-assigned managed identity that you created in Step 1, and click Add.
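To confirm that the identity is attached, one option is to request a token from the Azure Instance Metadata Service (IMDS) from inside the Azure VM. This is the mechanism that lets a managed identity obtain Microsoft Entra ID tokens without stored credentials. The following is a minimal sketch, assuming curl is available on the VM. The function names are hypothetical, and 2ff814a6-3304-4ab8-85cb-cd0e6f879c1d is assumed here to be the application ID that identifies Azure Databricks as the token audience.

```shell
#!/bin/sh
# Sketch only: ask IMDS for a Microsoft Entra ID token for Azure Databricks.

# Application ID for Azure Databricks, used as the token audience (assumption).
DATABRICKS_RESOURCE_ID="2ff814a6-3304-4ab8-85cb-cd0e6f879c1d"

# Build the IMDS token URL. Argument: the managed identity's Client ID from Step 1.
imds_token_url() {
  printf 'http://169.254.169.254/metadata/identity/oauth2/token?api-version=2018-02-01&resource=%s&client_id=%s' \
    "$DATABRICKS_RESOURCE_ID" "$1"
}

# Request the token. This only works from inside an Azure resource
# that has the managed identity assigned.
request_token() {
  curl -fsS -H "Metadata: true" "$(imds_token_url "$1")"
}

# Example, run on the Azure VM with a placeholder value:
#   request_token "<client-id>"
```

A JSON response that contains an access_token field suggests that the assignment worked. Treat the token as a secret and avoid writing it to logs.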

Step 7: Install the Databricks CLI on your Azure VM

In this step, you install the Databricks CLI so that you can use it to run commands that automate your Azure Databricks accounts and workspaces.

Tip

You can also use the Databricks Terraform provider or the Databricks SDK for Go along with Azure managed identities authentication to automate your Azure Databricks accounts and workspaces by running HCL or Go code. See the Databricks SDK for Go and Azure managed identities authentication.

  1. With the terminal or command prompt still open and logged in to your Azure VM from Step 5, install the Databricks CLI by running the following two commands:

    sudo apt install unzip
    curl -fsSL https://raw.githubusercontent.com/databricks/setup-cli/main/install.sh | sudo sh
    
  2. Confirm that the Databricks CLI is installed by running the following command, which prints the installed Databricks CLI version:

    databricks -v
    

Step 8: Configure the Databricks CLI for Azure managed identities authentication

In this step, you set up the Databricks CLI to use Azure managed identities authentication for Azure Databricks with your managed identity’s settings. To do this, you create a file with a default filename in a default location where the Databricks CLI expects to find the authentication settings it needs.

  1. With the terminal or command prompt still open and logged in to your Azure VM from Step 5, use the vi text editor to create and open a file named .databrickscfg for editing in the logged-in user’s home directory, by running the following command:

    vi ~/.databrickscfg
    
  2. Begin editing the .databrickscfg file by pressing the editor key combination Esc followed by i. The command prompt disappears, the vi editor starts, and the word -- INSERT -- appears at the bottom of the editor to indicate that the .databrickscfg file is in editing mode.

  3. Enter the following content. In this content, replace the following values:

    • Replace <account-console-url> with your Azure Databricks account console URL, such as https://accounts.azuredatabricks.net.
    • Replace <account-id> with your Azure Databricks account ID. See Locate your account ID.
    • Replace <azure-managed-identity-application-id> with the Client ID value for your managed identity from Step 1.
    • Replace <workspace-url> with your per-workspace URL, for example https://adb-1234567890123456.7.azuredatabricks.net.
    • Replace <azure-workspace-resource-id> with the Azure resource ID from Step 4.
    • You can replace the suggested configuration profile names AZURE_MI_ACCOUNT and AZURE_MI_WORKSPACE with different configuration profile names if desired. These specific names are not required.

    If you do not want to run account-level operations, you can omit the [AZURE_MI_ACCOUNT] section in the following content.

    [AZURE_MI_ACCOUNT]
    host            = <account-console-url>
    account_id      = <account-id>
    azure_client_id = <azure-managed-identity-application-id>
    azure_use_msi   = true
    
    [AZURE_MI_WORKSPACE]
    host                        = <workspace-url>
    azure_workspace_resource_id = <azure-workspace-resource-id>
    azure_client_id             = <azure-managed-identity-application-id>
    azure_use_msi               = true
    
  4. Save your edits to the .databrickscfg file by pressing the editor key combination Esc, followed by entering :wq, followed by Enter. The vi editor closes and the command prompt reappears.
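For scripted setups, the same profile can be written without an interactive editor. The following is a minimal sketch that appends the workspace profile shown above to a configuration file. The function name write_workspace_profile is hypothetical, and the chmod 600 call is an added precaution that is not part of this article's steps. An account profile could be written the same way with its own keys.

```shell
#!/bin/sh
# Sketch only: append the AZURE_MI_WORKSPACE profile to a Databricks config file.
# Arguments: config file path, workspace URL, Azure resource ID, Client ID.
write_workspace_profile() {
  cat >> "$1" <<EOF
[AZURE_MI_WORKSPACE]
host                        = $2
azure_workspace_resource_id = $3
azure_client_id             = $4
azure_use_msi               = true
EOF
  # Keep the file readable and writable by the current user only.
  chmod 600 "$1"
}

# Example with placeholder values:
#   write_workspace_profile ~/.databrickscfg "<workspace-url>" \
#     "<azure-workspace-resource-id>" "<azure-managed-identity-application-id>"
```

Because the function appends, calling it twice on the same file produces a duplicate section, so run it once per profile or start from an empty file.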

Step 9: Run an account-level command

In this step, you use the Databricks CLI to run a command that automates the Azure Databricks account that was configured in Step 8.

If you do not want to run account-level commands, skip ahead to Step 10.

With the terminal or command prompt still open and logged in to your Azure VM from Step 5, run the following command to list all available users in your Azure Databricks account. If you renamed AZURE_MI_ACCOUNT in Step 8, be sure to replace it here.

databricks account users list -p AZURE_MI_ACCOUNT

Step 10: Run a workspace-level command

In this step, you use the Databricks CLI to run a command that automates the Azure Databricks workspace that was configured in Step 8.

With the terminal or command prompt still open and logged in to your Azure VM from Step 5, run the following command to list all available users in your Azure Databricks workspace. If you renamed AZURE_MI_WORKSPACE in Step 8, be sure to replace it here.

databricks users list -p AZURE_MI_WORKSPACE
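In automation, it is sometimes more convenient to pass settings through environment variables than through a .databrickscfg file. The following is a minimal sketch, assuming that the Databricks CLI reads DATABRICKS_HOST, DATABRICKS_AZURE_RESOURCE_ID, ARM_CLIENT_ID, and ARM_USE_MSI as the equivalents of the host, azure_workspace_resource_id, azure_client_id, and azure_use_msi fields from Step 8. The function name use_msi_env is hypothetical.

```shell
#!/bin/sh
# Sketch only: export environment-variable equivalents of the AZURE_MI_WORKSPACE profile.
# Arguments: workspace URL, Azure resource ID, managed identity Client ID.
use_msi_env() {
  export DATABRICKS_HOST="$1"
  export DATABRICKS_AZURE_RESOURCE_ID="$2"
  export ARM_CLIENT_ID="$3"
  export ARM_USE_MSI="true"
}

# Example, run on the Azure VM with placeholder values:
#   use_msi_env "<workspace-url>" "<azure-workspace-resource-id>" \
#     "<azure-managed-identity-application-id>"
#   databricks users list
```

With these variables set, the -p option is omitted. If both a profile and environment variables are present, check which one the CLI picks up before relying on either.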

Step 11: Clean up

This step is optional. It deletes the Azure VM to save costs, and it deletes the managed identity if you no longer want to keep using it. This Step also removes the deleted managed identity from your Azure Databricks account and workspace for completeness.

Delete the Azure VM

  1. If the terminal or command prompt is still open and logged in to your Azure VM from Step 5, exit the Azure VM by running the command logout or exit. The terminal or command prompt then changes back to normal.
  2. In the Azure portal that you signed in to in Step 1, return to your Azure VM’s overview page if you closed it earlier. To do this, in Search resources, services, and docs, search for and select your Azure VM’s name.
  3. On your Azure VM’s overview page’s menu bar, click Delete.
  4. Select the I have read and understand checkbox, and click Delete.

Delete the managed identity from your Azure subscription

  1. In the Azure portal that you signed in to in Step 1, return to your managed identity’s overview page if you closed it earlier. To do this, in Search resources, services, and docs, search for and select your managed identity’s name.
  2. On your managed identity’s overview page’s menu bar, click Delete.
  3. Select the I have read and understand checkbox, and click Delete.

Remove the managed identity from your Azure Databricks account

  1. In your Azure Databricks account, on the sidebar, click User management.
  2. Click the Service principals tab.
  3. Click the name of the service principal that you added in Step 2. If the service principal’s name is not visible, use Filter service principals to find it.
  4. Click the ellipsis button, and then click Delete.
  5. Click Confirm delete.

Remove the managed identity from your Azure Databricks workspace

  1. In your Azure Databricks workspace, click your username in the top bar and click Settings.
  2. Click the Service principals tab.
  3. Click the name of the service principal that you added in Step 3. If the service principal’s name is not visible, use Filter service principals to find it.
  4. Click Delete.
  5. In the confirmation dialog, click Delete.
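The two Azure deletions above can also be scripted with the Azure CLI. The following is a minimal sketch, assuming the Azure CLI (az) is installed and signed in on your local development machine. The function names, the run dry-run helper, and the sample values my-rg, my-vm, and my-identity are illustrative. Resources created alongside the VM, such as disks and network interfaces, may remain after the VM is deleted and might need separate cleanup.

```shell
#!/bin/sh
# Sketch only: delete the demo Azure VM and the managed identity with the Azure CLI.

# Hypothetical helper: print the command instead of running it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Delete the VM without an interactive confirmation prompt.
# Arguments: resource group, VM name.
delete_vm() {
  run az vm delete --resource-group "$1" --name "$2" --yes
}

# Delete the user-assigned managed identity.
# Arguments: resource group, identity name.
delete_identity() {
  run az identity delete --resource-group "$1" --name "$2"
}

# Example with placeholder values:
#   delete_vm my-rg my-vm
#   delete_identity my-rg my-identity
```

Removing the service principal from your Azure Databricks account and workspace is still done as described in the steps above.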