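Compute configuration for Databricks Connect

Article
02/07/2025

Note

This article covers Databricks Connect for Databricks Runtime 13.3 LTS and above.

In this article, you configure properties to establish a connection between Databricks Connect and your Azure Databricks cluster or serverless compute. This information applies to both the Python and Scala versions of Databricks Connect unless stated otherwise.

Databricks Connect enables you to connect popular IDEs such as Visual Studio Code, PyCharm, RStudio Desktop, and IntelliJ IDEA, as well as notebook servers and other custom applications, to Azure Databricks clusters. See What is Databricks Connect?.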

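Requirements

To configure a connection to Databricks compute, you must have:

- Databricks Connect installed. For installation requirements and steps for specific language versions of Databricks Connect, see:
  - Install Databricks Connect for Python.
  - Install Databricks Connect for R.
  - Install Databricks Connect for Scala.
- An Azure Databricks account and workspace that have Unity Catalog enabled. See Set up and manage Unity Catalog and Enable a workspace for Unity Catalog.
- An Azure Databricks cluster with Databricks Runtime 13.3 LTS or above.
- A cluster whose Databricks Runtime version is equal to or above the Databricks Connect package version. Databricks recommends that you use the most recent Databricks Connect package that matches the Databricks Runtime version. To use features that are available in later versions of the Databricks Runtime, you must upgrade the Databricks Connect package. For a list of available Databricks Connect releases, see the Databricks Connect release notes. For Databricks Runtime release notes, see Databricks Runtime release notes versions and compatibility.
- A cluster that uses the Assigned or Shared cluster access mode. See Access modes.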
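The version requirement above (the cluster's Databricks Runtime version must be equal to or above the Databricks Connect package version) can be checked before you attempt to connect. The following is a minimal sketch, not part of Databricks Connect. The function names are hypothetical, and the sketch assumes that version strings look like "13.3 LTS" or "14.3.1" and that comparing only the major and minor components is sufficient.

```python
def parse_version(version):
    """Return (major, minor) from strings such as "13.3 LTS" or "14.3.1"."""
    numeric = version.strip().split()[0]      # drop a trailing label such as "LTS"
    major, minor = numeric.split(".")[:2]     # ignore any patch component
    return int(major), int(minor)


def is_compatible(runtime_version, connect_version):
    """True if the cluster runtime is equal to or above the Connect package version."""
    return parse_version(runtime_version) >= parse_version(connect_version)


print(is_compatible("15.4", "14.3.1"))      # True
print(is_compatible("13.3 LTS", "14.3.1"))  # False
```

A check like this only mirrors the rule stated in the requirements. The Databricks Connect release notes remain the authority on which versions work together.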

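Setup

Before you begin, you need the following:

- If you are connecting to a cluster, the ID of the cluster. You can retrieve the cluster ID from the URL. See Cluster URL and ID.
- The Azure Databricks workspace instance name. This is the Server Hostname value for your compute. See Get connection details for an Azure Databricks compute resource.
- Any other properties that are necessary for the Databricks authentication type that you want to use.

Note

- OAuth user-to-machine (U2M) authentication is supported on Databricks SDK for Python 0.19.0 and above. Update your code project's installed version of the Databricks SDK for Python to 0.19.0 or above to use OAuth U2M authentication. See Get started with the Databricks SDK for Python.

  For OAuth U2M authentication, you must use the Databricks CLI to authenticate before you run your Python code. See the tutorial.
- OAuth machine-to-machine (M2M) authentication is supported on Databricks SDK for Python 0.18.0 and above. Update your code project's installed version of the Databricks SDK for Python to 0.18.0 or above to use OAuth M2M authentication. See Get started with the Databricks SDK for Python.
- The Databricks SDK for Python has not yet implemented Azure managed identities authentication.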

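Configure a connection to a cluster

There are several ways to configure the connection to your cluster. Databricks Connect searches for configuration properties in the following order and uses the first configuration that it finds. For advanced configuration information, see Advanced usage of Databricks Connect for Python.

1. The remote() method of the DatabricksSession class.
2. A Databricks configuration profile.
3. The DATABRICKS_CONFIG_PROFILE environment variable.
4. An environment variable for each configuration property.
5. A Databricks configuration profile named DEFAULT.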
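The search order above can be pictured as a chain of fallbacks in which the first match wins. The following toy model is only an illustration of that order and is not how Databricks Connect is implemented. The function name, its parameters, and the choice to detect per-property environment variables by the presence of DATABRICKS_HOST are all assumptions made for this sketch.

```python
def resolve_connection(explicit=None, profile=None, env=None, profiles=None):
    """Return (source, properties) for the first configuration that is found.

    explicit: properties passed directly in code, or None
    profile:  name of a profile selected in code, or None
    env:      mapping of environment variables
    profiles: mapping of profile name -> properties (a parsed config file)
    """
    env = env or {}
    profiles = profiles or {}

    if explicit:                                  # 1. remote() arguments
        return "remote()", explicit
    if profile and profile in profiles:           # 2. profile named in code
        return "profile", profiles[profile]
    env_profile = env.get("DATABRICKS_CONFIG_PROFILE")
    if env_profile and env_profile in profiles:   # 3. profile named by env var
        return "DATABRICKS_CONFIG_PROFILE", profiles[env_profile]
    if env.get("DATABRICKS_HOST"):                # 4. per-property env vars
        props = {k: v for k, v in env.items() if k.startswith("DATABRICKS_")}
        return "environment variables", props
    if "DEFAULT" in profiles:                     # 5. the DEFAULT profile
        return "DEFAULT profile", profiles["DEFAULT"]
    raise LookupError("no connection configuration found")


# Placeholder hosts, used only to show which source is selected.
profiles = {
    "DEFAULT": {"host": "https://default.example"},
    "dev": {"host": "https://dev.example"},
}
print(resolve_connection(profiles=profiles)[0])
# DEFAULT profile
print(resolve_connection(env={"DATABRICKS_CONFIG_PROFILE": "dev"}, profiles=profiles)[0])
# DATABRICKS_CONFIG_PROFILE
```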

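The `remote()` method of the `DatabricksSession` class

For this option, which applies to Azure Databricks personal access token authentication only, specify the workspace instance name, the Azure Databricks personal access token, and the ID of the cluster.

You can initialize the DatabricksSession class in several ways:

- Set the host, token, and cluster_id fields in DatabricksSession.builder.remote().
- Use the Databricks SDK's Config class.
- Specify a Databricks configuration profile along with the cluster_id field.

Instead of specifying these connection properties in your code, Databricks recommends configuring them through environment variables or configuration files, as described throughout this section. The following code examples assume that you provide some implementation of the proposed retrieve_* functions to get the necessary properties from the user or from some other configuration store, such as Azure KeyVault.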
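As one possible shape for the retrieve_* functions, the sketch below reads each value from an environment variable and fails early with a clear message when a value is missing. This is an assumption for illustration only. The variable names MY_WORKSPACE_INSTANCE_NAME, MY_DATABRICKS_TOKEN, and MY_CLUSTER_ID are made up, and a real project might read from a secret store instead.

```python
import os


def _require(name):
    """Return the value of an environment variable, or fail with a clear message."""
    value = os.environ.get(name, "").strip()
    if not value:
        raise RuntimeError(f"Set the {name} environment variable first")
    return value


def retrieve_workspace_instance_name():
    # Hypothetical variable name chosen for this sketch.
    return _require("MY_WORKSPACE_INSTANCE_NAME")


def retrieve_token():
    # Hypothetical variable name chosen for this sketch.
    return _require("MY_DATABRICKS_TOKEN")


def retrieve_cluster_id():
    # Hypothetical variable name chosen for this sketch.
    return _require("MY_CLUSTER_ID")
```

Keeping the lookups in small functions means that the token never appears in source code, and the storage location can change later without touching the code that creates the session.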

The code for each of these approaches to initializing the DatabricksSession class is as follows:

Python

# Set the host, token, and cluster_id fields in DatabricksSession.builder.remote.
# If you have already set the DATABRICKS_CLUSTER_ID environment variable with the
# cluster's ID, you do not also need to set the cluster_id field here.
from databricks.connect import DatabricksSession

spark = DatabricksSession.builder.remote(
    host       = f"https://{retrieve_workspace_instance_name()}",
    token      = retrieve_token(),
    cluster_id = retrieve_cluster_id()
).getOrCreate()

Scala

// Set the host, token, and clusterId fields in DatabricksSession.builder.
// If you have already set the DATABRICKS_CLUSTER_ID environment variable with the
// cluster's ID, you do not also need to set the clusterId field here.
import com.databricks.connect.DatabricksSession

val spark = DatabricksSession.builder()
    .host(retrieveWorkspaceInstanceName())
    .token(retrieveToken())
    .clusterId(retrieveClusterId())
    .getOrCreate()

Python

# Use the Databricks SDK's Config class.
# If you have already set the DATABRICKS_CLUSTER_ID environment variable with the
# cluster's ID, you do not also need to set the cluster_id field here.
from databricks.connect import DatabricksSession
from databricks.sdk.core import Config

config = Config(
    host       = f"https://{retrieve_workspace_instance_name()}",
    token      = retrieve_token(),
    cluster_id = retrieve_cluster_id()
)

spark = DatabricksSession.builder.sdkConfig(config).getOrCreate()

Scala

// Use the Databricks SDK's Config class.
// If you have already set the DATABRICKS_CLUSTER_ID environment variable with the
// cluster's ID, you do not also need to set the clusterId field here.
import com.databricks.connect.DatabricksSession
import com.databricks.sdk.core.DatabricksConfig

val config = new DatabricksConfig()
    .setHost(retrieveWorkspaceInstanceName())
    .setToken(retrieveToken())
val spark = DatabricksSession.builder()
    .sdkConfig(config)
    .clusterId(retrieveClusterId())
    .getOrCreate()

Python

# Specify a Databricks configuration profile along with the `cluster_id` field.
# If you have already set the DATABRICKS_CLUSTER_ID environment variable with the
# cluster's ID, you do not also need to set the cluster_id field here.
from databricks.connect import DatabricksSession
from databricks.sdk.core import Config

config = Config(
    profile    = "<profile-name>",
    cluster_id = retrieve_cluster_id()
)

spark = DatabricksSession.builder.sdkConfig(config).getOrCreate()

Scala

// Specify a Databricks configuration profile along with the clusterId field.
// If you have already set the DATABRICKS_CLUSTER_ID environment variable with the
// cluster's ID, you do not also need to set the clusterId field here.
import com.databricks.connect.DatabricksSession
import com.databricks.sdk.core.DatabricksConfig

val config = new DatabricksConfig()
    .setProfile("<profile-name>")
val spark = DatabricksSession.builder()
    .sdkConfig(config)
    .clusterId(retrieveClusterId())
    .getOrCreate()

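A Databricks configuration profile

For this option, create or identify an Azure Databricks configuration profile that contains the cluster_id field and any other fields that are necessary for the Databricks authentication type that you want to use.

The required configuration profile fields for each authentication type are as follows:

- For Azure Databricks personal access token authentication: host and token.
- For OAuth machine-to-machine (M2M) authentication (where supported): host, client_id, and client_secret.
- For OAuth user-to-machine (U2M) authentication (where supported): host.
- For Microsoft Entra ID (formerly Azure Active Directory) service principal authentication: host, azure_tenant_id, azure_client_id, azure_client_secret, and possibly azure_workspace_resource_id.
- For Azure CLI authentication: host.
- For Azure managed identities authentication (where supported): host, azure_use_msi, azure_client_id, and possibly azure_workspace_resource_id.

Then set the name of this configuration profile through the configuration class.

You can specify cluster_id in a couple of ways:

- Include the cluster_id field in your configuration profile, and then just specify the configuration profile's name.
- Specify the configuration profile name along with the cluster_id field.

If you have already set the DATABRICKS_CLUSTER_ID environment variable with the cluster's ID, you do not also need to specify cluster_id.

The code for each of these approaches is as follows: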

Python

# Include the cluster_id field in your configuration profile, and then
# just specify the configuration profile's name:
from databricks.connect import DatabricksSession

spark = DatabricksSession.builder.profile("<profile-name>").getOrCreate()

Scala

// Include the cluster_id field in your configuration profile, and then
// just specify the configuration profile's name:
import com.databricks.connect.DatabricksSession
import com.databricks.sdk.core.DatabricksConfig

val config = new DatabricksConfig()
    .setProfile("<profile-name>")
val spark = DatabricksSession.builder()
    .sdkConfig(config)
    .getOrCreate()

Python

# Specify the configuration profile name along with the cluster_id field.
# In this example, retrieve_cluster_id() assumes some custom implementation that
# you provide to get the cluster ID from the user or from some other
# configuration store:
from databricks.connect import DatabricksSession
from databricks.sdk.core import Config

config = Config(
    profile    = "<profile-name>",
    cluster_id = retrieve_cluster_id()
)

spark = DatabricksSession.builder.sdkConfig(config).getOrCreate()

Scala

// Specify a Databricks configuration profile along with the clusterId field.
// If you have already set the DATABRICKS_CLUSTER_ID environment variable with the
// cluster's ID, you do not also need to set the clusterId field here.
import com.databricks.connect.DatabricksSession
import com.databricks.sdk.core.DatabricksConfig

val config = new DatabricksConfig()
    .setProfile("<profile-name>")
val spark = DatabricksSession.builder()
    .sdkConfig(config)
    .clusterId(retrieveClusterId())
    .getOrCreate()
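Before you connect, it can help to confirm that a profile actually defines the fields that its authentication type requires, as listed earlier in this section. The following sketch uses Python's standard configparser module to parse profile text and report missing fields. It is not part of Databricks Connect. The dictionary keys that label each authentication type, the function name, and the sample host are assumptions made for this sketch.

```python
import configparser

# Required profile fields per authentication type, as listed in this section.
# The keys are labels chosen for this sketch, not official identifiers.
REQUIRED_FIELDS = {
    "pat": ["host", "token"],
    "oauth-m2m": ["host", "client_id", "client_secret"],
    "oauth-u2m": ["host"],
    "entra-service-principal": [
        "host", "azure_tenant_id", "azure_client_id", "azure_client_secret",
    ],
    "azure-cli": ["host"],
    "azure-managed-identity": ["host", "azure_use_msi", "azure_client_id"],
}


def missing_fields(profile_text, profile_name, auth_type):
    """Return the required fields that the named profile does not define."""
    parser = configparser.ConfigParser()
    parser.read_string(profile_text)
    # Note: with configparser, other sections inherit keys from [DEFAULT].
    section = parser[profile_name]   # raises KeyError if the profile is absent
    return [f for f in REQUIRED_FIELDS[auth_type] if not section.get(f)]


# Placeholder values; replace them with your own.
SAMPLE = """
[dev]
host = https://adb-1234567890123456.7.azuredatabricks.net
token = <personal-access-token>
cluster_id = <cluster-id>
"""

print(missing_fields(SAMPLE, "dev", "pat"))        # []
print(missing_fields(SAMPLE, "dev", "oauth-m2m"))  # ['client_id', 'client_secret']
```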

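The `DATABRICKS_CONFIG_PROFILE` environment variable

For this option, create or identify an Azure Databricks configuration profile that contains the cluster_id field and any other fields that are necessary for the Databricks authentication type that you want to use.

If you have already set the DATABRICKS_CLUSTER_ID environment variable with the cluster's ID, you do not also need to specify cluster_id.

The required configuration profile fields for each authentication type are as follows:

- For Azure Databricks personal access token authentication: host and token.
- For OAuth machine-to-machine (M2M) authentication (where supported): host, client_id, and client_secret.
- For OAuth user-to-machine (U2M) authentication (where supported): host.
- For Microsoft Entra ID (formerly Azure Active Directory) service principal authentication: host, azure_tenant_id, azure_client_id, azure_client_secret, and possibly azure_workspace_resource_id.
- For Azure CLI authentication: host.
- For Azure managed identities authentication (where supported): host, azure_use_msi, azure_client_id, and possibly azure_workspace_resource_id.

Set the DATABRICKS_CONFIG_PROFILE environment variable to the name of this configuration profile. Then initialize the DatabricksSession class: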

Python

from databricks.connect import DatabricksSession

spark = DatabricksSession.builder.getOrCreate()

Scala

import com.databricks.connect.DatabricksSession

val spark = DatabricksSession.builder().getOrCreate()

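An environment variable for each configuration property

For this option, set the DATABRICKS_CLUSTER_ID environment variable and any other environment variables that are necessary for the Databricks authentication type that you want to use.

The required environment variables for each authentication type are as follows:

- For Azure Databricks personal access token authentication: DATABRICKS_HOST and DATABRICKS_TOKEN.
- For OAuth machine-to-machine (M2M) authentication (where supported): DATABRICKS_HOST, DATABRICKS_CLIENT_ID, and DATABRICKS_CLIENT_SECRET.
- For OAuth user-to-machine (U2M) authentication (where supported): DATABRICKS_HOST.
- For Microsoft Entra ID (formerly Azure Active Directory) service principal authentication: DATABRICKS_HOST, ARM_TENANT_ID, ARM_CLIENT_ID, ARM_CLIENT_SECRET, and possibly DATABRICKS_AZURE_RESOURCE_ID.
- For Azure CLI authentication: DATABRICKS_HOST.
- For Azure managed identities authentication (where supported): DATABRICKS_HOST, ARM_USE_MSI, ARM_CLIENT_ID, and possibly DATABRICKS_AZURE_RESOURCE_ID.

Then initialize the DatabricksSession class: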

Python

from databricks.connect import DatabricksSession

spark = DatabricksSession.builder.getOrCreate()

Scala

import com.databricks.connect.DatabricksSession

val spark = DatabricksSession.builder().getOrCreate()
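When the connection relies entirely on environment variables, a missing variable tends to surface only as a failure at session creation. The following preflight check is a minimal sketch that lists which of the variables named in this section are unset for a given authentication type. The function name and the authentication-type labels are assumptions made for this sketch, and only three of the authentication types are included to keep it short.

```python
import os

# Environment variables required per authentication type, from the list above.
# The keys are labels chosen for this sketch, not official identifiers.
REQUIRED_ENV = {
    "pat": ["DATABRICKS_HOST", "DATABRICKS_TOKEN"],
    "oauth-m2m": ["DATABRICKS_HOST", "DATABRICKS_CLIENT_ID", "DATABRICKS_CLIENT_SECRET"],
    "azure-cli": ["DATABRICKS_HOST"],
}


def missing_env(auth_type, env=None):
    """Return the names of required variables that are unset or empty."""
    env = os.environ if env is None else env
    required = ["DATABRICKS_CLUSTER_ID"] + REQUIRED_ENV[auth_type]
    return [name for name in required if not env.get(name)]


# Check the current process environment for personal access token authentication.
problems = missing_env("pat")
if problems:
    print("Missing environment variables:", ", ".join(problems))
```

Passing a dictionary as env makes the check easy to exercise without changing the real environment.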

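A Databricks configuration profile named `DEFAULT`

For this option, create or identify an Azure Databricks configuration profile that contains the cluster_id field and any other fields that are necessary for the Databricks authentication type that you want to use.

If you have already set the DATABRICKS_CLUSTER_ID environment variable with the cluster's ID, you do not also need to specify cluster_id.

The required configuration profile fields for each authentication type are as follows:

- For Azure Databricks personal access token authentication: host and token.
- For OAuth machine-to-machine (M2M) authentication (where supported): host, client_id, and client_secret.
- For OAuth user-to-machine (U2M) authentication (where supported): host.
- For Microsoft Entra ID (formerly Azure Active Directory) service principal authentication: host, azure_tenant_id, azure_client_id, azure_client_secret, and possibly azure_workspace_resource_id.
- For Azure CLI authentication: host.
- For Azure managed identities authentication (where supported): host, azure_use_msi, azure_client_id, and possibly azure_workspace_resource_id.

Name this configuration profile DEFAULT.

Then initialize the DatabricksSession class: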

Python

from databricks.connect import DatabricksSession

spark = DatabricksSession.builder.getOrCreate()

Scala

import com.databricks.connect.DatabricksSession

val spark = DatabricksSession.builder().getOrCreate()

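Configure a connection to serverless compute

Important

This feature is in Public Preview.

Databricks Connect for Python supports connecting to serverless compute. To use this feature, the version requirements for connecting to serverless must be met. See Requirements.

Important

This feature has the following limitations:

- This feature is only supported in Databricks Connect for Python.
- Python and Databricks Connect versions must be compatible. See Version support matrix.
- All Databricks Connect for Python limitations apply.
- All serverless compute limitations apply.
- Only Python dependencies that are included as part of the serverless compute environment can be used for UDFs. See Serverless environment versions. Additional dependencies cannot be installed.
- UDFs with custom modules are not supported.

You can configure a connection to serverless compute in one of the following ways:

- Set the local environment variable DATABRICKS_SERVERLESS_COMPUTE_ID to auto. If this environment variable is set, Databricks Connect ignores the cluster_id.
- In a local Databricks configuration profile, set serverless_compute_id = auto, then reference that profile from your code.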
```
[DEFAULT]
host = https://my-workspace.cloud.databricks.com/
serverless_compute_id = auto
token = dapi123...
```
Or use either of the following options:

# Option 1: the serverless() builder method.
from databricks.connect import DatabricksSession

spark = DatabricksSession.builder.serverless(True).getOrCreate()

# Option 2: the serverless argument of remote().
from databricks.connect import DatabricksSession

spark = DatabricksSession.builder.remote(serverless=True).getOrCreate()

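Note

The serverless compute session times out after 10 minutes of inactivity. After this, a new Spark session should be created using getOrCreate() to connect to serverless compute.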
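One way to account for the idle timeout in a long-lived application is to track when the session was last used and to ask for a fresh one once the idle period has elapsed. The following sketch shows that bookkeeping only. The class name and its parameters are hypothetical, and the factory stands for whatever function you use to create a session, such as one that calls getOrCreate().

```python
import time


class IdleAwareSession:
    """Hand out a session, recreating it after it has been idle too long."""

    def __init__(self, factory, idle_timeout_s=600, clock=time.monotonic):
        self._factory = factory          # callable that creates a session
        self._timeout = idle_timeout_s   # 600 s = 10 minutes, per the note above
        self._clock = clock              # injectable so the logic can be tested
        self._session = None
        self._last_used = None

    def get(self):
        now = self._clock()
        expired = (
            self._last_used is not None
            and now - self._last_used >= self._timeout
        )
        if self._session is None or expired:
            self._session = self._factory()
        self._last_used = now
        return self._session
```

Assuming databricks-connect is installed, a factory could be as simple as `lambda: DatabricksSession.builder.serverless(True).getOrCreate()`. The wrapper itself does not depend on Databricks Connect.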

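Validate the connection to Databricks

To validate that your environment, default credentials, and connection to compute are correctly set up for Databricks Connect, run the databricks-connect test command. The command fails with a non-zero exit code and a corresponding error message when it detects any incompatibility in the setup.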

databricks-connect test
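Because the command reports failure through a non-zero exit code, it can also be run from a script, for example as a setup step before a longer job. The following sketch wraps the command with Python's subprocess module. The function name is hypothetical, and the sketch assumes that the databricks-connect executable is on your PATH.

```python
import subprocess


def connection_is_valid(command=("databricks-connect", "test")):
    """Run the validation command and report whether it exited with code 0."""
    try:
        result = subprocess.run(command, capture_output=True, text=True)
    except FileNotFoundError:
        # The executable is not on PATH, for example because the
        # databricks-connect package is not installed in this environment.
        return False
    if result.returncode != 0:
        # Show the error message that the command printed.
        print(result.stderr.strip())
    return result.returncode == 0
```

The command parameter exists so that the wrapper can be exercised with a stand-in command. In normal use you would call connection_is_valid() with no arguments.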

In Databricks Connect 14.3 and above, you can also validate your environment using validateSession():

DatabricksSession.builder.validateSession(True).getOrCreate()

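Disable Databricks Connect

The Databricks Connect (and the underlying Spark Connect) services can be disabled on any given cluster.

To disable the Databricks Connect service, set the following Spark configuration on the cluster.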

spark.databricks.service.server.enabled false
