What’s coming?
Learn about upcoming Azure Databricks releases.
Behavior change for working with variant data type
Azure Databricks is blocking support for using fields with the variant data type in comparisons performed as part of the following operators and clauses:
- DISTINCT
- INTERSECT
- EXCEPT
- UNION
- DISTRIBUTE BY
The same holds for these DataFrame functions:
- df.dropDuplicates()
- df.repartition()
Azure Databricks does not support these operators and functions for variant data type comparisons because they produce non-deterministic results.
These expressions will be blocked when using variant in Databricks Runtime 16.1 and above. Maintenance releases will block support in Databricks Runtime 15.3 and above.
If you use the VARIANT type in your Azure Databricks workloads or tables, take the following recommended actions:
- Find the queries that use variant with any of the listed operators.
- Update these queries using recommended patterns that explicitly cast variant values to non-variant types.
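As a starting point for the first step, the following is a minimal sketch of a text scan that flags queries or code containing any of the listed operators and functions. It is a rough heuristic, not a Databricks tool: the helper name `find_affected_constructs` is hypothetical, and the scan cannot tell whether a flagged expression actually involves a variant column, so each hit still needs manual review.

```python
import re

# SQL operators and clauses named in this notice (matched case-insensitively).
SQL_PATTERNS = {
    "DISTINCT": r"\bDISTINCT\b",
    "INTERSECT": r"\bINTERSECT\b",
    "EXCEPT": r"\bEXCEPT\b",
    "UNION": r"\bUNION\b",
    "DISTRIBUTE BY": r"\bDISTRIBUTE\s+BY\b",
}

# DataFrame methods named in this notice (matched case-sensitively).
DATAFRAME_PATTERNS = {
    "dropDuplicates": r"\.dropDuplicates\s*\(",
    "repartition": r"\.repartition\s*\(",
}


def find_affected_constructs(text):
    """Return the sorted names of listed operators or functions found in text.

    Hypothetical helper: it only flags candidates and does not inspect
    column types, so it cannot confirm that a variant value is involved.
    """
    hits = []
    for name, pattern in SQL_PATTERNS.items():
        if re.search(pattern, text, re.IGNORECASE):
            hits.append(name)
    for name, pattern in DATAFRAME_PATTERNS.items():
        if re.search(pattern, text):
            hits.append(name)
    return sorted(hits)


print(find_affected_constructs("SELECT distinct(variant_expr) FROM t"))
```

Running the scan over saved queries or notebook source gives a candidate list to check against the columns that are known to be variant.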
The following table provides examples of existing unintended functionality and recommended workarounds:
Unintended use | Recommended use |
---|---|
SELECT distinct(variant_expr) FROM ... | SELECT distinct(variant_expr?::string) FROM ... |
SELECT variant_expr FROM ... EXCEPT SELECT variant_expr FROM ... | SELECT variant_expr?::string FROM ... EXCEPT SELECT variant_expr?::string FROM ... |
Note
For any fields you plan to use for comparison or distinct operations, Databricks recommends extracting these fields from the variant column and storing them using non-variant types.
See Query variant data. Contact your Databricks account representative if you require additional support or advisement.
Update to Databricks Marketplace and Partner Connect UI
We are simplifying the sidebar by merging Partner Connect and Marketplace into a single Marketplace link. The new Marketplace link will be higher on the sidebar.
IPYNB notebooks will become the default notebook format for Azure Databricks in December 2024
Currently, Databricks creates all new notebooks in the “Databricks source format” by default. In December 2024, the new default notebook format will be IPYNB (.ipynb). This new default can be changed by the user in the workspace user Settings pane if they prefer the Databricks source format.
Workspace files will be enabled for all Azure Databricks workspaces on Feb 1, 2025
Databricks will enable workspace files for all Azure Databricks workspaces on February 1, 2025. This change unblocks workspace users from using new workspace file features. After February 1, 2025, you won’t be able to disable workspace files using the enableWorkspaceFilesystem property with the Azure Databricks PATCH workspace-conf/setstatus REST API. For more details on workspace files, see What are workspace files?.
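To check how a workspace is currently configured before the change takes effect, you could read the enableWorkspaceFilesystem key through the workspace configuration API. The sketch below only builds the request and does not send it. The endpoint path `/api/2.0/workspace-conf`, the `keys` query parameter, the example host, and the helper name `build_workspace_conf_request` are assumptions for illustration; confirm them against the REST API reference for your workspace.

```python
import urllib.parse
import urllib.request


def build_workspace_conf_request(host, token, key="enableWorkspaceFilesystem"):
    """Build (but do not send) a GET request for one workspace-conf key.

    Hypothetical helper. The endpoint path and query parameter are
    assumptions and should be checked against the REST API reference.
    """
    query = urllib.parse.urlencode({"keys": key})
    url = f"{host.rstrip('/')}/api/2.0/workspace-conf?{query}"
    return urllib.request.Request(
        url,
        headers={"Authorization": f"Bearer {token}"},
        method="GET",
    )


# Placeholder host and token, not real values.
req = build_workspace_conf_request("https://example.azuredatabricks.net", "<token>")
print(req.get_method(), req.full_url)
```

Passing the request to `urllib.request.urlopen` would send it; that step is left out so the sketch has no network side effects.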
Tables are shared with history by default in Delta Sharing
Databricks plans to change the default setting for tables shared using Delta Sharing to include history by default. Previously, history sharing was disabled by default. Sharing table history improves read performance and provides automatic support for advanced Delta optimizations.
Predictive optimization enabled by default on all new Azure Databricks accounts
On November 11, Databricks will enable predictive optimization as the default for all new Azure Databricks accounts. Previously, it was disabled by default and could be enabled by your account administrator. When predictive optimization is enabled, Azure Databricks automatically runs maintenance operations for Unity Catalog managed tables. For more information on predictive optimization, see Predictive optimization for Unity Catalog managed tables.
Reduced cost and more control over performance vs. cost for your serverless compute for workflows workloads
In addition to the currently supported automatic performance optimizations, enhancements to the serverless compute for workflows optimization features will give you more control over whether workloads are optimized for performance or cost. To learn more, see Cost savings on serverless compute for Notebooks, Jobs, and Pipelines.
Changes to legacy dashboard version support
Databricks recommends using AI/BI dashboards (formerly Lakeview dashboards). Earlier versions of dashboards, previously referred to as Databricks SQL dashboards, are now called legacy dashboards. Databricks does not recommend creating new legacy dashboards. AI/BI dashboards offer improved features compared to legacy dashboards, including AI-assisted authoring, draft and published modes, and cross-filtering.
To help transition to the latest version, upgrade tools are available in both the user interface and the API. For instructions on how to use the built-in migration tool in the UI, see Clone a legacy dashboard to an AI/BI dashboard. For tutorials about creating and managing dashboards using the REST API, see Use Azure Databricks APIs to manage dashboards.
Changes to serverless compute workload attribution
Currently, your billable usage system table might include serverless SKU billing records with null values for run_as, job_id, job_run_id, and notebook_id. These records represent costs associated with shared resources that are not directly attributable to any particular workload.
To help simplify cost reporting, Databricks will soon attribute these shared costs to the specific workloads that incurred them. You will no longer see billing records with null values in workload identifier fields. As you increase your usage of serverless compute and add more workloads, the proportion of these shared costs on your bill will decrease as they are shared across more workloads.
For more information on monitoring serverless compute costs, see Monitor the cost of serverless compute.
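To gauge how much of your current serverless usage falls into these unattributed records, you could compute their share from an export of the billing data. The following is a minimal sketch over plain dictionaries. The flat record layout and the helper names are assumptions for illustration (the system table may nest these identifiers inside struct columns), and the sample quantities are made up.

```python
# Workload identifier fields named in this notice.
WORKLOAD_FIELDS = ("run_as", "job_id", "job_run_id", "notebook_id")


def is_unattributed(record):
    """True when every workload identifier in the record is null (None)."""
    return all(record.get(field) is None for field in WORKLOAD_FIELDS)


def unattributed_share(records):
    """Fraction of total usage_quantity carried by unattributed records."""
    total = sum(r["usage_quantity"] for r in records)
    if total == 0:
        return 0.0
    shared = sum(r["usage_quantity"] for r in records if is_unattributed(r))
    return shared / total


# Made-up sample rows in an assumed flat layout.
records = [
    {"run_as": "user@example.com", "job_id": "1", "job_run_id": "10",
     "notebook_id": None, "usage_quantity": 6},
    {"run_as": None, "job_id": None, "job_run_id": None,
     "notebook_id": None, "usage_quantity": 2},
    {"run_as": "user@example.com", "job_id": None, "job_run_id": None,
     "notebook_id": "42", "usage_quantity": 2},
]

print(unattributed_share(records))
```

Once shared costs are attributed to workloads, a check like this should find no unattributed records in new billing data.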
The sourceIPAddress field in audit logs will no longer include a port number
Due to a bug, certain authorization and authentication audit logs include a port number in addition to the IP in the sourceIPAddress field (for example, "sourceIPAddress":"10.2.91.100:0"). The port number, which is logged as 0, does not provide any real value and is inconsistent with the rest of the Databricks audit logs. To enhance the consistency of audit logs, Databricks plans to change the format of the IP address for these audit log events. This change will gradually roll out starting in early August 2024.
If the audit log contains a sourceIPAddress of 0.0.0.0, Databricks might stop logging it.
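If downstream tooling parses this field, it may help to accept both the old and new formats during the rollout. The sketch below normalizes a value by dropping a trailing port from an IPv4 address and leaves anything else unchanged. The helper name `strip_port` is hypothetical, and the sketch assumes that only IPv4 values carry the extra port.

```python
import ipaddress


def strip_port(source_ip):
    """Return the IPv4 address without a trailing ':port', if one is present.

    Hypothetical helper: values that are not 'IPv4:digits' are returned
    unchanged, so plain addresses and IPv6 addresses pass through as-is.
    """
    host, sep, port = source_ip.rpartition(":")
    if sep and port.isdigit():
        try:
            ipaddress.IPv4Address(host)
            return host
        except ipaddress.AddressValueError:
            pass
    return source_ip


print(strip_port("10.2.91.100:0"))
```

Validating the host part with `ipaddress` keeps the function from truncating IPv6 addresses, which also contain colons.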
Legacy Git integration is EOL on January 31
After January 31, 2024, Databricks will remove legacy notebook Git integrations. This feature has been in legacy status for more than two years, and a deprecation notice has been displayed in the product UI since November 2023.
For details on migrating to Databricks Git folders (formerly Repos) from legacy Git integration, see Switching to Databricks Repos from Legacy Git integration. If this removal impacts you and you need an extension, contact your Databricks account team.
JDK 8 and JDK 11 will be unsupported
Azure Databricks plans to remove JDK 8 support with the next major Databricks Runtime version, when Spark 4.0 releases. Azure Databricks plans to remove JDK 11 support with the next LTS version of Databricks Runtime 14.x.
Automatic enablement of Unity Catalog for new workspaces
Databricks has begun to enable Unity Catalog automatically for new workspaces. This removes the need for account admins to configure Unity Catalog after a workspace is created. Rollout is proceeding gradually across accounts.
sqlite-jdbc upgrade
Databricks Runtime plans to upgrade the sqlite-jdbc version from 3.8.11.2 to 3.42.0.0 in all Databricks Runtime maintenance releases. The APIs of version 3.42.0.0 are not fully compatible with 3.8.11.2. Confirm that the methods and return types your code relies on are compatible with version 3.42.0.0.
If you are using sqlite-jdbc in your code, check the sqlite-jdbc compatibility report.