Set Spark configuration properties on Azure Databricks

You can set Spark configuration properties (Spark confs) to customize settings in your compute environment.

Databricks generally recommends against configuring most Spark properties. Especially when migrating from open-source Apache Spark or upgrading Databricks Runtime versions, legacy Spark configurations can override new default behaviors that optimize workloads.

For many behaviors controlled by Spark properties, Azure Databricks also provides options to either enable behavior at a table level or to configure custom behavior as part of a write operation. For example, schema evolution was previously controlled by a Spark property, but now has coverage in SQL, Python, and Scala. See Schema evolution syntax for merge.
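For illustration, schema evolution can be enabled for a single Delta write with the mergeSchema write option rather than a session-wide Spark property; merge statements have their own schema evolution syntax, covered in the linked article. The following Python sketch assumes a DataFrame named df and an existing Delta table named target_table; both names are placeholders, not part of this article.

Python

# Placeholder example: evolve the target table's schema for this write only,
# instead of changing behavior with a session-wide Spark property.
(
    df.write
      .format("delta")
      .mode("append")
      .option("mergeSchema", "true")
      .saveAsTable("target_table")
)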

Configure Spark properties for notebooks and jobs

You can set Spark properties for notebooks and jobs. The scope of the configuration depends on how you set it.

  • Properties configured using compute configuration apply to all notebooks and jobs run with the compute resource.
  • Properties configured within a notebook apply only to the SparkSession for the current notebook.

For instructions on configuring Spark properties at the compute level, see Spark configuration.
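In the compute configuration UI, Spark properties are typically entered in the Spark config field as one key-value pair per line, separated by a space. The values below are placeholders for illustration:

spark.sql.ansi.enabled true
spark.sql.shuffle.partitions 200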

To set a Spark property in a notebook, use the following syntax:

SQL

SET spark.sql.ansi.enabled = true

Python

spark.conf.set("spark.sql.ansi.enabled", "true")

Scala

spark.conf.set("spark.sql.ansi.enabled", "true")

Configure Spark properties in Databricks SQL

Databricks SQL allows admins to configure Spark properties for data access in the workspace settings menu. See Enable data access configuration.

Other than data access configurations, Databricks SQL only allows a handful of Spark confs, which have been aliased to shorter names for simplicity. See Configuration parameters.

For most supported SQL configurations, you can override the global behavior in your current session. The following example turns off ANSI mode:

SET ANSI_MODE = false

Configure Spark properties for Delta Live Tables pipelines

Delta Live Tables allows you to configure Spark properties for a pipeline, for one compute resource configured for a pipeline, or for individual flows, materialized views, or streaming tables.

You can set pipeline and compute Spark properties using the UI or JSON. See Configure a Delta Live Tables pipeline.

Use the spark_conf option in DLT decorator functions to configure Spark properties for flows, views, or tables. See Python Delta Live Tables properties.
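For example, the following Python sketch applies a Spark property to a single dataset through the spark_conf argument of the dlt.table decorator. The dataset name, source table, and property value are placeholders.

Python

import dlt

# Placeholder example: apply a shuffle-partition setting to this dataset only
@dlt.table(
    spark_conf={"spark.sql.shuffle.partitions": "64"}
)
def sales_summary():
    return spark.read.table("sales_raw").groupBy("region").count()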

Configure Spark properties for serverless notebooks and jobs

Serverless compute does not support setting most Spark properties for notebooks or jobs. The following are the properties you can configure (a short example follows the list):

  • spark.sql.legacy.timeParserPolicy (Default value is EXCEPTION)
  • spark.sql.session.timeZone (Default value is Etc/UTC)
  • spark.sql.shuffle.partitions (Default value is auto)
  • spark.sql.ansi.enabled (Default value is true)
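For example, a notebook running on serverless compute can still override the session time zone. This Python line uses the same spark.conf.set syntax shown earlier; the time zone value is a placeholder.

Python

# Placeholder value: set the session time zone for this notebook's SparkSession
spark.conf.set("spark.sql.session.timeZone", "America/Los_Angeles")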