Serverless compute release notes

This article explains the features and behaviors that are currently available and upcoming on serverless compute for notebooks and jobs.

For more information on serverless compute, see Connect to serverless compute.

Azure Databricks periodically releases updates to serverless compute, automatically upgrading the serverless compute runtime to support enhancements and upgrades to the platform. All users get the same updates, rolled out over a short period of time.

Serverless environment versions

Databricks serverless compute for notebooks and jobs features a Spark Connect-based architecture, enabling independent engine upgrades without impacting the application. To ensure application compatibility, serverless workloads use a versioned API, known as the environment version or client, which remains compatible with newer server versions.

The latest environment version continues to receive updates until a new version is released. Users can select from any of the currently supported environment versions.

Release notes

This section includes release notes for serverless compute. Release notes are organized by year and week of year. Serverless compute always runs using the most recently released version listed here.

High memory setting available on serverless notebooks (Public Preview)

February 7, 2025

You can now configure a higher memory size for your serverless compute notebook workloads. This setting can be applied to both interactive and scheduled notebook workloads.

Serverless usage with high memory has a higher DBU emission rate than standard memory.

For more information, see Configure high memory for your serverless workloads.

Version 16.1

February 5, 2025

This serverless compute release roughly corresponds to Databricks Runtime 16.0 and Databricks Runtime 16.1.

New features

  • Avro support for recursive schema: You can now use the recursiveFieldMaxDepth option with the from_avro function and the avro data source. This option sets the maximum depth for schema recursion on the Avro data source. See Read and write streaming Avro data.

  • Expanded support for Confluent Schema Registry for Avro: Serverless now supports Avro schema reference with the Confluent Schema Registry. See Authenticate to an external Confluent Schema Registry.

  • Force reclustering on tables with liquid clustering: You can now use the OPTIMIZE FULL syntax to force the reclustering of all records in a table with liquid clustering enabled. See Force reclustering for all records.

  • The Delta APIs for Python now support identity columns: You can now use the Delta APIs for Python to create tables with identity columns. See Use identity columns in Delta Lake.

  • Create liquid clustered tables during streaming writes: You can now use clusterBy to enable liquid clustering when creating new tables with Structured Streaming writes. See Enable liquid clustering.

  • Support for the OPTIMIZE FULL clause: Serverless compute now supports the OPTIMIZE FULL clause. This clause optimizes all records in a table that uses liquid clustering, including data that might have previously been clustered.

  • Support for WITH options specification in INSERT and table-reference: Serverless compute now supports an options specification for table references and table names of an INSERT statement, which can be used to control the behavior of data sources.

  • New SQL functions: The following SQL functions are now available on serverless compute (a short example follows this list):

    • try_url_decode is an error-tolerant version of url_decode.
    • zeroifnull(expr) returns 0 if expr is NULL, and expr otherwise.
    • nullifzero(expr) returns NULL if expr is 0, and expr otherwise.
    • dayname(expr) returns the three-letter English abbreviation for the day of the week for the given date.
    • uniform(expr1, expr2 [,seed]) returns a random value drawn from a uniform distribution over the specified range of numbers.
    • randstr(length) returns a random alphanumeric string of the specified length.
  • Enable automatic schema evolution when merging data into a Delta table: Support has been added for the withSchemaEvolution() member of the DeltaMergeBuilder class. Use withSchemaEvolution() to enable automatic schema evolution during MERGE operations. For example, mergeBuilder.whenMatched(...).withSchemaEvolution().execute(). A fuller sketch appears after this list.

  • Support for collations in Apache Spark is in Public Preview: You can now assign language-aware, case-insensitive, and accent-insensitive collations to STRING columns and expressions. These collations are used in string comparisons, sorting, grouping operations, and many string functions. See Collation.

  • Support for collations in Delta Lake is in Public Preview: You can now define collations for columns when creating or altering a Delta table. See Collation support for Delta Lake.

  • LITE mode for vacuum is in Public Preview: You can now use VACUUM table_name LITE to perform a lighter-weight vacuum operation that leverages metadata in the Delta transaction log. See Full vs. lite mode and VACUUM.

  • Support for parameterizing the USE CATALOG with IDENTIFIER clause: The IDENTIFIER clause is now supported for the USE CATALOG statement. With this support, you can parameterize the current catalog based on a string variable or parameter marker.

  • COMMENT ON COLUMN support for tables and views: The COMMENT ON statement now supports altering comments for view and table columns.

  • Named parameter invocation for more functions: Additional built-in functions now support named parameter invocation.

  • The SYNC METADATA parameter to the REPAIR TABLE command is supported with the Hive metastore: You can now use the SYNC METADATA parameter with the REPAIR TABLE command to update the metadata of a Hive metastore managed table. See REPAIR TABLE.

  • Enhanced data integrity for compressed Apache Arrow batches: To further protect against data corruption, every LZ4 compressed Arrow batch now includes the LZ4 content and block checksums. See LZ4 Frame Format Description.

  • Built-in Oracle JDBC Driver: Serverless compute now has the Oracle JDBC Driver built in. If you use a customer-uploaded JDBC driver JAR via DriverManager, you must rewrite scripts to explicitly use the custom JAR. Otherwise, the built-in driver is used. This driver only supports Lakehouse Federation. For other use cases, you must provide your own driver.

  • More detailed errors for Delta tables accessed with paths: A new error message experience for Delta tables accessed using paths is now available. All exceptions are now forwarded to the user. The exception DELTA_MISSING_DELTA_TABLE is now reserved for when underlying files cannot be read as a Delta table.
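
The following is a minimal sketch of the new SQL functions listed above, run through spark.sql from a Python notebook cell. The column aliases are arbitrary and the commented results are illustrative:

    from pyspark.sql import SparkSession

    # On Databricks the session already exists as `spark`; getOrCreate() only
    # keeps this sketch self-contained.
    spark = SparkSession.builder.getOrCreate()

    spark.sql("""
        SELECT
          try_url_decode('https%3A%2F%2Fexample.com') AS decoded_url,   -- error-tolerant decode
          zeroifnull(NULL)                            AS zero_if_null,  -- 0
          nullifzero(0)                               AS null_if_zero,  -- NULL
          dayname(DATE'2025-02-05')                   AS day_abbrev,    -- 'Wed'
          uniform(0, 100)                             AS random_value,  -- random number in the range
          randstr(8)                                  AS random_string  -- 8 random alphanumeric characters
    """).show(truncate=False)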
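
For the withSchemaEvolution() item, a hedged sketch of a Python MERGE using the Delta Lake DeltaMergeBuilder API might look like the following; the table names (main.default.target, main.default.source_updates) and the join column id are hypothetical:

    from delta.tables import DeltaTable

    # `spark` is the SparkSession that Databricks notebooks provide.
    target = DeltaTable.forName(spark, "main.default.target")     # hypothetical target table
    src_df = spark.read.table("main.default.source_updates")      # hypothetical source table

    (
        target.alias("t")
        .merge(src_df.alias("s"), "t.id = s.id")
        .withSchemaEvolution()           # new source columns are added to the target schema
        .whenMatchedUpdateAll()
        .whenNotMatchedInsertAll()
        .execute()
    )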

Behavior changes

  • Breaking change: Hosted RStudio is end-of-life: With this release, Databricks-hosted RStudio Server is end-of-life and unavailable on any Azure Databricks workspace running on serverless compute. To learn more and see a list of alternatives to RStudio, see Hosted RStudio Server deprecation.

  • Breaking change: Removal of support for changing byte, short, int and long types to wider types: To ensure consistent behavior across Delta and Iceberg tables, the following data type changes can no longer be applied to tables with the type widening feature enabled:

    • byte, short, int and long to decimal.
    • byte, short, and int to double.
  • Correct parsing of regex patterns with negation in nested character grouping: This release includes a change to support the correct parsing of regex patterns with negation in nested character grouping. For example, [^[abc]] will be parsed as “any character that is NOT one of ‘abc’”.

    Additionally, Photon behavior was inconsistent with Spark for nested character classes. Regex patterns containing nested character classes will no longer use Photon, and instead will use Spark. A nested character class is any pattern containing square brackets within square brackets, such as [[a-c][1-3]].

  • Improve duplicate match detection in Delta Lake MERGE: MERGE now considers conditions specified in the WHEN MATCHED clause. See Upsert into a Delta Lake table using merge.

  • The addArtifact() functionality is now consistent across compute types: When you use addArtifact(archive = True) to add a dependency to serverless compute, the archive is automatically unpacked. This change makes the addArtifact(archive = True) behavior consistent with single user compute, which already supports automatically unpacking archives.

  • The VARIANT data type can no longer be used with operations that require comparisons: You cannot use the following clauses or operators in queries that include a VARIANT data type:

    • DISTINCT
    • INTERSECT
    • EXCEPT
    • UNION
    • DISTRIBUTE BY

    Additionally, you cannot use these DataFrame functions:

    • df.dropDuplicates()
    • df.repartition()

    These operations perform comparisons, and comparisons that use the VARIANT data type produce undefined results and are not supported in Databricks. If you use the VARIANT type in your Azure Databricks workloads or tables, Databricks recommends the following changes:

    • Update queries or expressions to explicitly cast VARIANT values to non-VARIANT data types, as shown in the sketch after this list.
    • If you have fields that must be used with any of the above operations, extract those fields from the VARIANT data type and store them using non-VARIANT data types.

    See Query variant data.
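
As a sketch of the casting recommendation above, run from a Databricks notebook where spark is predefined (the events table and its VARIANT column payload are hypothetical names), cast the VARIANT value to a non-VARIANT type before using a comparison-based operation such as DISTINCT:

    # SELECT DISTINCT payload FROM events is not supported when payload is VARIANT;
    # casting to STRING first makes the comparison well defined.
    spark.sql("""
        SELECT DISTINCT CAST(payload AS STRING) AS payload_str
        FROM events
    """).show(truncate=False)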

Bug fixes

  • Timezone offsets now include seconds when serialized to CSV, JSON, and XML: Timestamps with timezone offsets that include seconds (common for timestamps from before 1900) previously omitted the seconds component when serialized to CSV, JSON, and XML. The default timestamp formatter has been fixed and now returns the correct offset values for these timestamps.

Other changes

  • Renamed error codes for the cloudFiles Structured Streaming source: The following error codes have been renamed:
    • _LEGACY_ERROR_TEMP_DBR_0143 is renamed to CF_INCORRECT_STREAM_USAGE.
    • _LEGACY_ERROR_TEMP_DBR_0260 is renamed to CF_INCORRECT_BATCH_USAGE.

Version 15.4

October 28, 2024

This serverless compute release roughly corresponds to Databricks Runtime 15.4.

New features

  • UTF-8 validation functions: This release introduces the following functions for validating UTF-8 strings (a short example follows this list):
    • is_valid_utf8 verifies whether a string is a valid UTF-8 string.
    • make_valid_utf8 converts a potentially invalid UTF-8 string to a valid UTF-8 string using substitution characters.
    • validate_utf8 raises an error if the input is not a valid UTF-8 string.
    • try_validate_utf8 returns NULL if the input is not a valid UTF-8 string.
  • Enable UniForm Iceberg using ALTER TABLE: You can now enable UniForm Iceberg on existing tables without rewriting data files. See Enable Iceberg reads on an existing table.
  • try_url_decode function: This release introduces the try_url_decode function, which decodes a URL-encoded string. If the string is not in the correct format, the function returns NULL instead of raising an error.
  • Optionally allow the optimizer to rely on unenforced foreign key constraints: To improve query performance, you can now specify the RELY keyword on FOREIGN KEY constraints when you CREATE or ALTER a table.
  • Parallelized job runs for selective overwrites: Selective overwrites using replaceWhere now run jobs that delete data and insert new data in parallel, improving query performance and cluster utilization.
  • Improved performance for change data feed with selective overwrites: Selective overwrites using replaceWhere on tables with change data feed no longer write separate change data files for inserted data. These operations use a hidden _change_type column present in the underlying Parquet data files to record changes without write amplification.
  • Improved query latency for the COPY INTO command: This release includes a change that improves the query latency for the COPY INTO command. This improvement is implemented by making the loading of state by the RocksDB state store asynchronous. With this change, you should see an improvement in start times for queries with large states, such as queries with a large number of already ingested files.
  • Support for dropping the check constraints table feature: You can now drop the checkConstraints table feature from a Delta table using ALTER TABLE table_name DROP FEATURE checkConstraints. See Disable check constraints.
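
The following is a short, hedged sketch of the UTF-8 validation functions introduced above, run from a Databricks notebook where spark is predefined; x'80' is a deliberately invalid byte sequence and the commented results are illustrative:

    spark.sql("""
        SELECT
          is_valid_utf8('hello')    AS valid_string,   -- true
          is_valid_utf8(x'80')      AS invalid_bytes,  -- false
          make_valid_utf8(x'80')    AS repaired,       -- substitution character
          try_validate_utf8(x'80')  AS try_validated   -- NULL instead of an error
    """).show(truncate=False)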

Behavior changes

  • Schema binding change for views: When the data types in a view’s underlying query change from those used when the view was first created, Databricks no longer throws errors for references to the view when no safe cast can be performed.

    Instead, the view compensates by using regular casting rules where possible. This change allows Databricks to tolerate table schema changes more readily.

  • Disallow undocumented ! syntax toleration for NOT outside boolean logic: Databricks will no longer tolerate the use of ! as a synonym for NOT outside of boolean logic. This change reduces confusion, aligns with the SQL standard, and makes SQL more portable. For example:

    CREATE ... IF ! EXISTS, IS ! NULL, ! NULL column or field property, ! IN and ! BETWEEN must be replaced with:

    CREATE ... IF NOT EXISTS, IS NOT NULL, NOT NULL column or field property, NOT IN and NOT BETWEEN.

    The boolean prefix operator ! (e.g. !is_mgr or !(true AND false)) is unaffected by this change.

  • Disallow undocumented and unprocessed portions of column definition syntax in views: Databricks supports CREATE VIEW with named columns and column comments.

    The specification of column types, NOT NULL constraints, or DEFAULT has been tolerated in the syntax without having any effect. Databricks will remove this syntax toleration. Doing so reduces confusion, aligns with the SQL standard, and allows for future enhancements.

  • Consistent error handling for Base64 decoding in Spark and Photon: This release changes how Photon handles Base64 decoding errors to match the Spark handling of these errors. Before these changes, the Photon and Spark code generation path sometimes failed to raise parsing exceptions, while the Spark interpreted execution correctly raised IllegalArgumentException or ConversionInvalidInputError. This update ensures that Photon consistently raises the same exceptions as Spark during Base64 decoding errors, providing more predictable and reliable error handling.

  • Adding a CHECK constraint on an invalid column now returns the UNRESOLVED_COLUMN.WITH_SUGGESTION error class: To provide more useful error messaging, in Databricks Runtime 15.3 and above, an ALTER TABLE ADD CONSTRAINT statement that includes a CHECK constraint referencing an invalid column name returns the UNRESOLVED_COLUMN.WITH_SUGGESTION error class. Previously, an INTERNAL_ERROR was returned.

The JDK is upgraded from JDK 8 to JDK 17

August 15, 2024

Serverless compute for notebooks and workflows has migrated from Java Development Kit (JDK) 8 to JDK 17 on the server side. This upgrade includes the following behavioral changes:

  • Correct parsing of regex patterns with negation in nested character grouping: With this upgrade, Azure Databricks now supports the correct parsing of regex patterns with negation in nested character grouping. For example, [^[abc]] will be parsed as “any character that is NOT one of ‘abc’”.

    Additionally, Photon behavior was inconsistent with Spark for nested character classes. Regex patterns containing nested character classes will no longer use Photon, and instead will use Spark. A nested character class is any pattern containing square brackets within square brackets, such as [[a-c][1-3]].

Version 15.1

July 23, 2024

This serverless compute release roughly corresponds to Databricks Runtime 15.1.

New features

Support for star (*) syntax in the WHERE clause: You can now use the star (*) syntax in the WHERE clause to reference all columns from the SELECT list.

For example, SELECT * FROM VALUES(1, 2) AS T(a1, a2) WHERE 1 IN(T.*).

Changes

Improved error recovery for JSON parsing: The JSON parser used for from_json() and JSON path expressions now recovers faster from malformed syntax, resulting in less data loss.

When encountering malformed JSON syntax in a struct field, an array value, a map key, or a map value, the JSON parser will now return NULL only for the unreadable field, key, or element. Subsequent fields, keys, or elements will be properly parsed. Prior to this change, the JSON parser abandoned parsing the array, struct, or map and returned NULL for the remaining content.
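
For illustration only (the schema and the malformed sample below are made up), the improved recovery means from_json can return a partially parsed struct rather than discarding everything after the first bad token:

    from pyspark.sql import functions as F

    # The value for "b" is malformed JSON; with the improved parser, only that
    # field is expected to come back NULL while "a" and "c" still parse.
    df = spark.createDataFrame([('{"a": 1, "b": +123, "c": 3}',)], ["raw"])
    df.select(F.from_json("raw", "a INT, b INT, c INT").alias("parsed")).show(truncate=False)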

Version 14.3

April 15, 2024

This is the initial serverless compute version. This version roughly corresponds to Databricks Runtime 14.3 with some modifications that remove support for some non-serverless and legacy features.

Supported Spark configuration parameters

To automate the configuration of Spark on serverless compute, Azure Databricks has removed support for manually setting most Spark configurations. You can manually set only the following Spark configuration parameters:

  • spark.sql.legacy.timeParserPolicy (Default value is CORRECTED)
  • spark.sql.session.timeZone (Default value is Etc/UTC)
  • spark.sql.shuffle.partitions (Default value is auto)
  • spark.sql.ansi.enabled (Default value is true)

Job runs on serverless compute will fail if you set a Spark configuration that is not in this list.
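
For example, the following notebook cell sets two of the allowed parameters; the values shown are arbitrary:

    # Only the four parameters listed above can be set manually on serverless compute.
    spark.conf.set("spark.sql.session.timeZone", "America/Los_Angeles")
    spark.conf.set("spark.sql.legacy.timeParserPolicy", "LEGACY")

    # Setting any other key (for example spark.executor.memory) causes the job run to fail.
    print(spark.conf.get("spark.sql.session.timeZone"))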

For more on configuring Spark properties, see Set Spark configuration properties on Azure Databricks.

input_file functions are deprecated

The input_file_name(), input_file_block_length(), and input_file_block_start() functions have been deprecated. Using these functions is highly discouraged.

Instead, use the file metadata column to retrieve file metadata information.
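
A minimal sketch of the replacement pattern is shown below; the input path and file format are hypothetical:

    # Instead of input_file_name(), read fields from the _metadata column exposed
    # by file-based data sources.
    df = (
        spark.read.format("json")
        .load("/path/to/source/files")   # hypothetical path
        .select("*", "_metadata.file_path", "_metadata.file_name")
    )
    df.show(truncate=False)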

Behavioral changes

Serverless compute version 2024.15 includes the following behavioral changes:

  • unhex(hexStr) bug fix: When using the unhex(hexStr) function, hexStr is always padded left to a whole byte. Previously the unhex function ignored the first half-byte. For example: unhex('ABC') now produces x'0ABC' instead of x'BC'.
  • Auto-generated column aliases are now stable: When the result of an expression is referenced without a user-specified column alias, this auto-generated alias will now be stable. The new algorithm may result in a change to the previously auto-generated names used in features like materialized views.
  • Table scans with CHAR type fields are now always padded: Delta tables, certain JDBC tables, and external data sources store CHAR data in non-padded form. When reading, Azure Databricks will now pad the data with spaces to the declared length to ensure correct semantics.
  • Casts from BIGINT/DECIMAL to TIMESTAMP throw an exception for overflowed values: Azure Databricks allows casting from BIGINT and DECIMAL to TIMESTAMP by treating the value as the number of seconds from the Unix epoch. Previously, Azure Databricks would return overflowed values, but it now throws an exception in cases of overflow. Use try_cast to return NULL instead of an exception, as shown in the sketch after this list.
  • PySpark UDF execution has been improved to match the exact behavior of UDF execution on single user compute: The following changes have been made:
    • UDFs with a string return type no longer implicitly convert non-string values into strings. Previously, UDFs with a return type of str would apply a str(..) wrapper to the result regardless of the actual data type of the returned value.
    • UDFs with timestamp return types no longer implicitly apply a timezone conversion to timestamps.
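
As a sketch of the try_cast recommendation in the cast-overflow item above, run from a Databricks notebook where spark is predefined (the literal values are illustrative):

    # CAST raises an error when the seconds value overflows the TIMESTAMP range;
    # try_cast returns NULL instead.
    spark.sql("""
        SELECT
          try_cast(1700000000          AS TIMESTAMP) AS valid_timestamp,  -- 2023-11-14 22:13:20 UTC
          try_cast(9223372036854775807 AS TIMESTAMP) AS overflow_result   -- NULL
    """).show(truncate=False)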