Install Databricks Connect for Python
Note
This article covers Databricks Connect for Databricks Runtime 13.3 LTS and above.
This article describes how to install Databricks Connect for Python. For an overview of Databricks Connect, see What is Databricks Connect? For the Scala version of this article, see Install Databricks Connect for Scala.
Requirements
To install Databricks Connect for Python, the following requirements must be met:
If you are connecting to serverless compute, your workspace must meet the requirements for serverless compute.
Note
Serverless compute is supported in Databricks Connect version 15.1 and above. In addition, Databricks Connect versions at or below the Databricks Runtime release on serverless are fully compatible. See Release notes. To verify whether the Databricks Connect version is compatible with serverless compute, see Validate the connection to Databricks.
If you are connecting to a cluster, your target cluster must meet the cluster configuration requirements, which include Databricks Runtime version requirements.
You must have Python 3 installed on your development machine, and the minor version of Python installed on your development machine must meet the version requirements in the table below.
Compute type    Databricks Connect version    Compatible Python version
Serverless      15.1 and above                3.11
Cluster         15.1 and above                3.11
Cluster         13.3 LTS to 14.3 LTS          3.10
If you want to use PySpark UDFs, your development machine’s installed minor version of Python must match the minor version of Python that is included with the Databricks Runtime installed on the cluster or serverless compute. To find the minor Python version of your cluster, refer to the System environment section of the Databricks Runtime release notes for your cluster or serverless compute. See Databricks Runtime release notes versions and compatibility and Serverless compute release notes.
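The Python version requirement can also be checked from the local interpreter before installing the client. The following is a minimal sketch, not part of Databricks Connect: the helper names `compatible_python_minor` and `local_python_matches` are hypothetical, and the mapping is copied from the table in this article, so it may not cover releases that the table does not list.

```python
import sys


def compatible_python_minor(connect_version):
    """Return the (major, minor) Python version that the table in this article
    lists for a Databricks Connect (major, minor) version, or None if the
    table does not cover that version."""
    # 13.3 LTS to 14.3 LTS -> Python 3.10
    if (13, 3) <= connect_version <= (14, 3):
        return (3, 10)
    # 15.1 and above -> Python 3.11 (as stated in the table; check the
    # release notes for versions newer than the ones this article covers)
    if connect_version >= (15, 1):
        return (3, 11)
    return None


def local_python_matches(connect_version):
    """Compare the running interpreter's major.minor with the table entry."""
    return sys.version_info[:2] == compatible_python_minor(connect_version)


if __name__ == "__main__":
    target = (15, 4)  # Assumed Databricks Connect version; change to match your compute.
    print("Local Python:", ".".join(str(p) for p in sys.version_info[:2]))
    print("Matches table entry:", local_python_matches(target))
```

Run the script with the same interpreter that you plan to use for the virtual environment, because the result depends on which Python executable runs it.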
Activate a Python virtual environment
Databricks strongly recommends that you have a Python virtual environment activated for each Python version that you use with Databricks Connect. Python virtual environments help to make sure that you are using the correct versions of Python and Databricks Connect together. For more information about these tools and how to activate them, see venv or Poetry.
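To confirm that a virtual environment is active before installing anything, you can compare the interpreter's prefixes. This is a minimal sketch that relies on standard `venv` behavior (inside a virtual environment, `sys.prefix` differs from `sys.base_prefix`); the function name `in_virtual_environment` is hypothetical.

```python
import sys


def in_virtual_environment():
    """Return True if the running interpreter belongs to a virtual environment.

    Inside an environment created by venv, sys.prefix points at the
    environment directory while sys.base_prefix points at the base Python
    installation. Outside a virtual environment the two are equal.
    """
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    if in_virtual_environment():
        print("Virtual environment active:", sys.prefix)
    else:
        print("No virtual environment is active.")
```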
Install the Databricks Connect client
This section describes how to install the Databricks Connect client with venv or Poetry.
Note
If you already have the Databricks extension for Visual Studio Code installed, you do not need to follow these setup instructions, because the Databricks extension for Visual Studio Code already has built-in support for Databricks Connect for Databricks Runtime 13.3 LTS and above. Skip to Debug code using Databricks Connect for the Databricks extension for Visual Studio Code.
Install the Databricks Connect client with venv
With your virtual environment activated, uninstall PySpark, if it is already installed, by running the uninstall command. This is required because the databricks-connect package conflicts with PySpark. For details, see Conflicting PySpark installations. To check whether PySpark is already installed, run the show command.
    # Is PySpark already installed?
    pip3 show pyspark
    # Uninstall PySpark
    pip3 uninstall pyspark
With your virtual environment still activated, install the Databricks Connect client by running the install command. Use the --upgrade option to upgrade any existing client installation to the specified version.
    pip3 install --upgrade "databricks-connect==15.4.*"  # Or X.Y.* to match your cluster version.
Note
Databricks recommends that you append the “dot-asterisk” notation to specify databricks-connect==X.Y.* instead of databricks-connect==X.Y, to make sure that the most recent package is installed. While this is not a requirement, it helps make sure that you can use the latest supported features for that cluster.
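The checks in the preceding steps can also be run from Python inside the activated environment. The following is a minimal sketch that uses the standard library's `importlib.metadata` to look up installed distributions; the helper names `installed_version` and `check_environment` are hypothetical, and the sketch assumes that the distribution names are `pyspark` and `databricks-connect`, as used in the commands above.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package):
    """Return the installed version string of a distribution, or None if absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


def check_environment():
    """Return a list of human-readable problems found in the current environment."""
    problems = []
    # A separately installed PySpark distribution conflicts with databricks-connect.
    if installed_version("pyspark") is not None:
        problems.append("pyspark is installed; uninstall it first")
    if installed_version("databricks-connect") is None:
        problems.append("databricks-connect is not installed")
    return problems


if __name__ == "__main__":
    issues = check_environment()
    print("Environment looks consistent." if not issues else "\n".join(issues))
```

An empty list means that no standalone PySpark distribution was found and that the Databricks Connect client is present in the environment that runs the script.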
Install the Databricks Connect client with Poetry
With your virtual environment activated, uninstall PySpark, if it is already installed, by running the remove command. This is required because the databricks-connect package conflicts with PySpark. For details, see Conflicting PySpark installations. To check whether PySpark is already installed, run the show command.
    # Is PySpark already installed?
    poetry show pyspark
    # Uninstall PySpark
    poetry remove pyspark
With your virtual environment still activated, install the Databricks Connect client by running the add command.
    poetry add databricks-connect@~15.4  # Or X.Y to match your cluster version.
Note
Databricks recommends that you use the “at-tilde” notation to specify databricks-connect@~15.4 instead of databricks-connect==15.4, to make sure that the most recent package is installed. While this is not a requirement, it helps make sure that you can use the latest supported features for that cluster.
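To illustrate why the tilde notation picks up newer patch releases, the following sketch models a Poetry-style tilde requirement of the form ~X.Y as the range from X.Y up to, but not including, X.(Y+1). It is a simplified illustration rather than Poetry's own resolver: the function names `tilde_bounds` and `satisfies_tilde` are hypothetical, and only purely numeric versions are handled.

```python
def tilde_bounds(spec):
    """Return (lower, upper) bounds for a tilde requirement written as '~X.Y'.

    '~15.4' is treated as: at least 15.4 and below 15.5.
    """
    major, minor = (int(part) for part in spec.lstrip("~").split("."))
    return (major, minor), (major, minor + 1)


def satisfies_tilde(candidate, spec):
    """Check whether a numeric version string falls inside the tilde range."""
    lower, upper = tilde_bounds(spec)
    parts = tuple(int(part) for part in candidate.split("."))
    # Tuple comparison is element-wise, so (15, 4, 3) sorts between
    # (15, 4) and (15, 5).
    return lower <= parts < upper


if __name__ == "__main__":
    for candidate in ("15.3.9", "15.4", "15.4.3", "15.5.0"):
        print(candidate, satisfies_tilde(candidate, "~15.4"))
```

In this model, any 15.4.x release satisfies ~15.4, whereas an exact pin on 15.4 would not move to a later patch release.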
Next steps
After you have installed Databricks Connect, you need to configure a connection to Databricks. See Compute configuration for Databricks Connect.