Maintenance updates for Databricks Runtime (archived)
This archived page lists maintenance updates issued for Databricks Runtime releases that are no longer supported. To add a maintenance update to an existing cluster, restart the cluster.
Important
This documentation has been retired and might not be updated. The products, services, or technologies mentioned in this content have reached end-of-support. See Databricks Runtime release notes versions and compatibility.
Note
This article contains references to the term whitelist, a term that Azure Databricks does not use. When the term is removed from the software, we’ll remove it from this article.
Databricks Runtime releases
Maintenance updates by release:
- Databricks Runtime 15.3
- Databricks Runtime 15.2
- Databricks Runtime 15.1
- Databricks Runtime 15.0
- Databricks Runtime 14.2
- Databricks Runtime 14.0
- Databricks Runtime 13.1
- Databricks Runtime 12.2 LTS
- Databricks Runtime 11.3 LTS
- Databricks Runtime 10.4 LTS
- Databricks Runtime 9.1 LTS
- Databricks Runtime 13.0 (EoS)
- Databricks Runtime 12.1 (EoS)
- Databricks Runtime 12.0 (EoS)
- Databricks Runtime 11.2 (EoS)
- Databricks Runtime 11.1 (EoS)
- Databricks Runtime 11.0 (EoS)
- Databricks Runtime 10.5 (EoS)
- Databricks Runtime 10.3 (EoS)
- Databricks Runtime 10.2 (EoS)
- Databricks Runtime 10.1 (EoS)
- Databricks Runtime 10.0 (EoS)
- Databricks Runtime 9.0 (EoS)
- Databricks Runtime 8.4 (EoS)
- Databricks Runtime 8.3 (EoS)
- Databricks Runtime 8.2 (EoS)
- Databricks Runtime 8.1 (EoS)
- Databricks Runtime 8.0 (EoS)
- Databricks Runtime 7.6 (EoS)
- Databricks Runtime 7.5 (EoS)
- Databricks Runtime 7.3 LTS (EoS)
- Databricks Runtime 6.4 Extended Support (EoS)
- Databricks Runtime 5.5 LTS (EoS)
- Databricks Light 2.4 Extended Support
- Databricks Runtime 7.4 (EoS)
- Databricks Runtime 7.2 (EoS)
- Databricks Runtime 7.1 (EoS)
- Databricks Runtime 7.0 (EoS)
- Databricks Runtime 6.6 (EoS)
- Databricks Runtime 6.5 (EoS)
- Databricks Runtime 6.3 (EoS)
- Databricks Runtime 6.2 (EoS)
- Databricks Runtime 6.1 (EoS)
- Databricks Runtime 6.0 (EoS)
- Databricks Runtime 5.4 ML (EoS)
- Databricks Runtime 5.4 (EoS)
- Databricks Runtime 5.3 (EoS)
- Databricks Runtime 5.2 (EoS)
- Databricks Runtime 5.1 (EoS)
- Databricks Runtime 5.0 (EoS)
- Databricks Runtime 4.3 (EoS)
- Databricks Runtime 4.2 (EoS)
- Databricks Runtime 4.1 ML (EoS)
- Databricks Runtime 4.1 (EoS)
- Databricks Runtime 4.0 (EoS)
- Databricks Runtime 3.5 LTS (EoS)
- Databricks Runtime 3.4 (EoS)
For the maintenance updates on supported Databricks Runtime versions, see Databricks Runtime maintenance updates.
Databricks Runtime 15.3
See Databricks Runtime 15.3 (EoS).
- November 26, 2024
- With this release, you can now query the vector_search function using query_text for text input or query_vector for embedding input.
- Operating system security updates.
- November 5, 2024
- [SPARK-49905] Use dedicated ShuffleOrigin for stateful operator to prevent the shuffle to be modified from AQE
- [SPARK-49867][SQL] Improve the error message when index is out of bounds when calling GetColumnByOrdinal
- [SPARK-48843][15.3,15.2] Prevent infinite loop with BindParameters
- [SPARK-49829] Revise the optimization on adding input to state store in stream-stream join (correctness fix)
- [SPARK-49863][SQL] Fix NormalizeFloatingNumbers to preserve nullability of nested structs
- [SPARK-49782][SQL] ResolveDataFrameDropColumns rule resolves UnresolvedAttribute with child output
- [SPARK-46632][SQL] Fix subexpression elimination when equivalent ternary expressions have different children
- Operating system security updates.
- October 22, 2024
- [SPARK-49905] Use dedicated ShuffleOrigin for stateful operator to prevent the shuffle to be modified from AQE
- [SPARK-49867][SQL] Improve the error message when index is out of bounds when calling GetColumnByOrdinal
- [SPARK-48843][15.3,15.2] Prevent infinite loop with BindParameters
- [SPARK-49829] Revise the optimization on adding input to state store in stream-stream join (correctness fix)
- [SPARK-49863][SQL] Fix NormalizeFloatingNumbers to preserve nullability of nested structs
- [SPARK-49782][SQL] ResolveDataFrameDropColumns rule resolves UnresolvedAttribute with child output
- [SPARK-46632][SQL] Fix subexpression elimination when equivalent ternary expressions have different children
- Operating system security updates.
- October 10, 2024
- [SPARK-49688][CONNECT] Fix a data race between interrupt and execute plan
- [SPARK-49743][SQL] OptimizeCsvJsonExpr should not change schema fields when pruning GetArrayStructFields
- [BACKPORT][SPARK-49474][SS] Classify Error class for FlatMapGroupsWithState user function error
- Operating system security updates.
- September 25, 2024
- [SPARK-49492][CONNECT] Reattach attempted on inactive ExecutionHolder
- [SPARK-49628][SQL] ConstantFolding should copy stateful expression before evaluating
- [SPARK-49000][SQL] Fix “select count(distinct 1) from t” where t is empty table by expanding RewriteDistinctAggregates
- [SPARK-49458][CONNECT][PYTHON] Supply server-side session id via ReattachExecute
- [SPARK-48719][SQL] Fix the calculation bug of RegrSlope & RegrIntercept when the first parameter is null
- Operating system security updates.
- September 17, 2024
- [SPARK-49336][CONNECT] Limit the nesting level when truncating a protobuf message
- [SPARK-49526][CONNECT][15.3.5] Support Windows-style paths in ArtifactManager
- [SPARK-49366][CONNECT] Treat Union node as leaf in dataframe column resolution
- [SPARK-43242][CORE] Fix throw ‘Unexpected type of BlockId’ in shuffle corruption diagnose
- [SPARK-49409][CONNECT] Adjust the default value of CONNECT_SESSION_PLAN_CACHE_SIZE
- Operating system security updates.
- August 29, 2024
- [SPARK-49263][CONNECT] Spark Connect python client: Consistently handle boolean Dataframe reader options
- [SPARK-49056][SQL] ErrorClassesJsonReader cannot handle null properly
- [SPARK-48862][PYTHON][CONNECT] Avoid calling _proto_to_string when INFO level is not enabled
- [SPARK-49146][SS] Move assertion errors related to watermark missing in append mode streaming queries to error framework
- August 14, 2024
- [SPARK-48941][SPARK-48970] Backport ML writer / reader fixes
- [SPARK-48706][PYTHON] Python UDF in higher order functions should not throw internal error
- [SPARK-48954] try_mod() replaces try_remainder()
- [SPARK-48597][SQL] Introduce a marker for isStreaming property in text representation of logical plan
- [SPARK-49065][SQL] Rebasing in legacy formatters/parsers must support non JVM default time zones
- [SPARK-49047][PYTHON][CONNECT] Truncate the message for logging
- [SPARK-48740][SQL] Catch missing window specification error early
- August 1, 2024
- [Breaking change] In Databricks Runtime 15.3 and above, calling any Python user-defined function (UDF), user-defined aggregate function (UDAF), or user-defined table function (UDTF) that uses a VARIANT type as an argument or return value throws an exception. This change is made to prevent issues that might occur because of an invalid value returned by one of these functions. To learn more about the VARIANT type, see use VARIANTs to store semi-structured data.
- On serverless compute for notebooks and jobs, ANSI SQL mode is enabled by default. See Supported Spark configuration parameters.
- On compute configured with shared access mode, Kafka batch reads and writes now have the same limitations enforced as those documented for Structured Streaming. See Streaming limitations and requirements for Unity Catalog shared access mode.
- The output from a SHOW CREATE TABLE statement now includes any row filters or column masks defined on a materialized view or streaming table. See SHOW CREATE TABLE. To learn about row filters and column masks, see Filter sensitive table data using row filters and column masks.
- [SPARK-46957][CORE] Decommission migrated shuffle files should be able to cleanup from executor
- [SPARK-48648][PYTHON][CONNECT] Make SparkConnectClient.tags properly threadlocal
- [SPARK-48896] [SPARK-48909] [SPARK-48883] Backport spark ML writer fixes
- [SPARK-48713][SQL] Add index range check for UnsafeRow.pointTo when baseObject is byte array
- [SPARK-48834][SQL] Disable variant input/output to python scalar UDFs, UDTFs, UDAFs during query compilation
- [SPARK-48934][SS] Python datetime types converted incorrectly for setting timeout in applyInPandasWithState
- [SPARK-48705][PYTHON] Explicitly use worker_main when it starts with pyspark
- [SPARK-48544][SQL] Reduce memory pressure of empty TreeNode BitSets
- [SPARK-48889][SS] testStream to unload state stores before finishing
- [SPARK-49054][SQL] Column default value should support current_* functions
- [SPARK-48653][PYTHON] Fix invalid Python data source error class references
- [SPARK-48463] Make StringIndexer supporting nested input columns
- [SPARK-48810][CONNECT] Session stop() API should be idempotent and not fail if the session is already closed by the server
- [SPARK-48873][SQL] Use UnsafeRow in JSON parser.
- Operating system security updates.
- July 11, 2024
- (Behavior change) DataFrames cached against Delta table sources are now invalidated if the source table is overwritten. This change means that all state changes to Delta tables now invalidate cached results. Use .checkpoint() to persist a table state throughout the lifetime of a DataFrame.
- The Snowflake JDBC Driver is updated to version 3.16.1.
- This release includes a fix to an issue that prevented the Spark UI Environment tab from displaying correctly when running in Databricks Container Services.
- To ignore invalid partitions when reading data, file-based data sources, such as Parquet, ORC, CSV, or JSON, can set the ignoreInvalidPartitionPaths data source option to true. For example: spark.read.format("parquet").option("ignoreInvalidPartitionPaths", "true").load(…). You can also use the SQL configuration spark.sql.files.ignoreInvalidPartitionPaths. However, the data source option takes precedence over the SQL configuration. This setting is false by default.
- [SPARK-48100][SQL] Fix issues in skipping nested structure fields not selected in schema
- [SPARK-47463][SQL] Use V2Predicate to wrap expression with return type of boolean
- [SPARK-48292][CORE] Revert [SPARK-39195][SQL] Spark OutputCommitCoordinator should abort stage when committed file not consistent with task status
- [SPARK-48475][PYTHON] Optimize _get_jvm_function in PySpark.
- [SPARK-48286] Fix analysis of column with exists default expression - Add user facing error
- [SPARK-48481][SQL][SS] Do not apply OptimizeOneRowPlan against streaming Dataset
- Revert “[SPARK-47406][SQL] Handle TIMESTAMP and DATETIME in MYSQLDialect”
- [SPARK-48383][SS] Throw better error for mismatched partitions in startOffset option in Kafka
- [SPARK-48503][14.3-15.3][SQL] Fix invalid scalar subqueries with group-by on non-equivalent columns that were incorrectly allowed
- [SPARK-48445][SQL] Don’t inline UDFs with expensive children
- [SPARK-48252][SQL] Update CommonExpressionRef when necessary
- [SPARK-48273][master][SQL] Fix late rewrite of PlanWithUnresolvedIdentifier
- [SPARK-48566][PYTHON] Fix bug where partition indices are incorrect when UDTF analyze() uses both select and partitionColumns
- [SPARK-48556][SQL] Fix incorrect error message pointing to UNSUPPORTED_GROUPING_EXPRESSION
- Operating system security updates.
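The ignoreInvalidPartitionPaths setting described in the July 11, 2024 update can be applied per read or per session. The following is a minimal sketch, not a definitive implementation: it assumes an existing SparkSession passed in as spark, and the function names, the path argument, and the fmt default are hypothetical names chosen for illustration.

```python
def read_skipping_invalid_partitions(spark, path, fmt="parquet"):
    """Load `path` while ignoring partition paths that are invalid.

    `spark` is assumed to be an existing SparkSession. `path` and `fmt`
    are hypothetical caller-supplied values.
    """
    return (
        spark.read.format(fmt)
        # Per-read data source option. According to the release note, this
        # takes precedence over the SQL configuration.
        .option("ignoreInvalidPartitionPaths", "true")
        .load(path)
    )


def enable_for_session(spark):
    """Session-wide alternative: set the SQL configuration instead."""
    # The setting is false by default, so it must be enabled explicitly.
    spark.conf.set("spark.sql.files.ignoreInvalidPartitionPaths", "true")
```

Using the data source option keeps the behavior local to one read, which might be preferable when only a single dataset is known to contain malformed partition directories.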
Databricks Runtime 15.2
See Databricks Runtime 15.2 (EoS).
- November 26, 2024
- Operating system security updates.
- November 5, 2024
- [SPARK-49905] Use dedicated ShuffleOrigin for stateful operator to prevent the shuffle to be modified from AQE
- [SPARK-48843][15.3,15.2] Prevent infinite loop with BindParameters
- [SPARK-49829] Revise the optimization on adding input to state store in stream-stream join (correctness fix)
- [SPARK-49863][SQL] Fix NormalizeFloatingNumbers to preserve nullability of nested structs
- [SPARK-49782][SQL] ResolveDataFrameDropColumns rule resolves UnresolvedAttribute with child output
- [SPARK-46632][SQL] Fix subexpression elimination when equivalent ternary expressions have different children
- Operating system security updates.
- October 22, 2024
- [SPARK-49905] Use dedicated ShuffleOrigin for stateful operator to prevent the shuffle to be modified from AQE
- [SPARK-48843][15.3,15.2] Prevent infinite loop with BindParameters
- [SPARK-49829] Revise the optimization on adding input to state store in stream-stream join (correctness fix)
- [SPARK-49863][SQL] Fix NormalizeFloatingNumbers to preserve nullability of nested structs
- [SPARK-49782][SQL] ResolveDataFrameDropColumns rule resolves UnresolvedAttribute with child output
- [SPARK-46632][SQL] Fix subexpression elimination when equivalent ternary expressions have different children
- Operating system security updates.
- October 10, 2024
- [BACKPORT][SPARK-49474][SS] Classify Error class for FlatMapGroupsWithState user function error
- [SPARK-49743][SQL] OptimizeCsvJsonExpr should not change schema fields when pruning GetArrayStructFields
- [SPARK-49688][CONNECT] Fix a data race between interrupt and execute plan
- Operating system security updates.
- September 25, 2024
- [SPARK-49000][SQL] Fix “select count(distinct 1) from t” where t is empty table by expanding RewriteDistinctAggregates
- [SPARK-48719][SQL] Fix the calculation bug of RegrSlope & RegrIntercept when the first parameter is null
- [SPARK-49458][CONNECT][PYTHON] Supply server-side session id via ReattachExecute
- [SPARK-49628][SQL] ConstantFolding should copy stateful expression before evaluating
- [SPARK-49492][CONNECT] Reattach attempted on inactive ExecutionHolder
- Operating system security updates.
- September 17, 2024
- [SPARK-49336][CONNECT] Limit the nesting level when truncating a protobuf message
- [SPARK-49526][CONNECT] Support Windows-style paths in ArtifactManager
- [SPARK-49366][CONNECT] Treat Union node as leaf in dataframe column resolution
- [SPARK-43242][CORE] Fix throw ‘Unexpected type of BlockId’ in shuffle corruption diagnose
- [SPARK-49409][CONNECT] Adjust the default value of CONNECT_SESSION_PLAN_CACHE_SIZE
- Operating system security updates.
- August 29, 2024
- [SPARK-49056][SQL] ErrorClassesJsonReader cannot handle null properly
- [SPARK-48597][SQL] Introduce a marker for isStreaming property in text representation of logical plan
- [SPARK-48862][PYTHON][CONNECT] Avoid calling _proto_to_string when INFO level is not enabled
- [SPARK-49263][CONNECT] Spark Connect python client: Consistently handle boolean Dataframe reader options
- [SPARK-49146][SS] Move assertion errors related to watermark missing in append mode streaming queries to error framework
- August 14, 2024
- [SPARK-48941][SPARK-48970] Backport ML writer / reader fixes
- [SPARK-48050][SS] Log logical plan at query start
- [SPARK-48706][PYTHON] Python UDF in higher order functions should not throw internal error
- [SPARK-48740][SQL] Catch missing window specification error early
- [SPARK-49065][SQL] Rebasing in legacy formatters/parsers must support non JVM default time zones
- [SPARK-49047][PYTHON][CONNECT] Truncate the message for logging
- August 1, 2024
- On serverless compute for notebooks and jobs, ANSI SQL mode is enabled by default. See Supported Spark configuration parameters.
- On compute configured with shared access mode, Kafka batch reads and writes now have the same limitations enforced as those documented for Structured Streaming. See Streaming limitations and requirements for Unity Catalog shared access mode.
- The output from a SHOW CREATE TABLE statement now includes any row filters or column masks defined on a materialized view or streaming table. See SHOW CREATE TABLE. To learn about row filters and column masks, see Filter sensitive table data using row filters and column masks.
- [SPARK-48705][PYTHON] Explicitly use worker_main when it starts with pyspark
- [SPARK-48047][SQL] Reduce memory pressure of empty TreeNode tags
- [SPARK-48810][CONNECT] Session stop() API should be idempotent and not fail if the session is already closed by the server
- [SPARK-48873][SQL] Use UnsafeRow in JSON parser.
- [SPARK-46957][CORE] Decommission migrated shuffle files should be able to cleanup from executor
- [SPARK-48889][SS] testStream to unload state stores before finishing
- [SPARK-48713][SQL] Add index range check for UnsafeRow.pointTo when baseObject is byte array
- [SPARK-48896] [SPARK-48909] [SPARK-48883] Backport spark ML writer fixes
- [SPARK-48544][SQL] Reduce memory pressure of empty TreeNode BitSets
- [SPARK-48934][SS] Python datetime types converted incorrectly for setting timeout in applyInPandasWithState
- [SPARK-48463] Make StringIndexer supporting nested input columns
- Operating system security updates.
- July 11, 2024
- (Behavior change) DataFrames cached against Delta table sources are now invalidated if the source table is overwritten. This change means that all state changes to Delta tables now invalidate cached results. Use .checkpoint() to persist a table state throughout the lifetime of a DataFrame.
- The Snowflake JDBC Driver is updated to version 3.16.1.
- This release includes a fix to an issue that prevented the Spark UI Environment tab from displaying correctly when running in Databricks Container Services.
- On serverless notebooks and jobs, ANSI SQL mode is enabled by default and supports short names.
- To ignore invalid partitions when reading data, file-based data sources, such as Parquet, ORC, CSV, or JSON, can set the ignoreInvalidPartitionPaths data source option to true. For example: spark.read.format("parquet").option("ignoreInvalidPartitionPaths", "true").load(…). You can also use the SQL configuration spark.sql.files.ignoreInvalidPartitionPaths. However, the data source option takes precedence over the SQL configuration. This setting is false by default.
- [SPARK-48273][SQL] Fix late rewrite of PlanWithUnresolvedIdentifier
- [SPARK-48292][CORE] Revert [SPARK-39195][SQL] Spark OutputCommitCoordinator should abort stage when committed file not consistent with task status
- [SPARK-48100][SQL] Fix issues in skipping nested structure fields not selected in schema
- [SPARK-48286] Fix analysis of column with exists default expression - Add user facing error
- [SPARK-48294][SQL] Handle lowercase in nestedTypeMissingElementTypeError
- [SPARK-48556][SQL] Fix incorrect error message pointing to UNSUPPORTED_GROUPING_EXPRESSION
- [SPARK-48648][PYTHON][CONNECT] Make SparkConnectClient.tags properly threadlocal
- [SPARK-48503][SQL] Fix invalid scalar subqueries with group-by on non-equivalent columns that were incorrectly allowed
- [SPARK-48252][SQL] Update CommonExpressionRef when necessary
- [SPARK-48475][PYTHON] Optimize _get_jvm_function in PySpark.
- [SPARK-48566][PYTHON] Fix bug where partition indices are incorrect when UDTF analyze() uses both select and partitionColumns
- [SPARK-48481][SQL][SS] Do not apply OptimizeOneRowPlan against streaming Dataset
- [SPARK-47463][SQL] Use V2Predicate to wrap expression with return type of boolean
- [SPARK-48383][SS] Throw better error for mismatched partitions in startOffset option in Kafka
- [SPARK-48445][SQL] Don’t inline UDFs with expensive children
- Operating system security updates.
- June 17, 2024
- applyInPandasWithState() is available on shared clusters.
- Fixes a bug where the rank-window optimization using Photon TopK incorrectly handled partitions with structs.
- Fixed a bug in the try_divide() function where inputs containing decimals resulted in unexpected exceptions.
- [SPARK-48197][SQL] Avoid assert error for invalid lambda function
- [SPARK-48276][PYTHON][CONNECT] Add the missing __repr__ method for SQLExpression
- [SPARK-48014][SQL] Change the makeFromJava error in EvaluatePython to a user-facing error
- [SPARK-48016][SQL] Fix a bug in try_divide function when with decimals
- [SPARK-47986][CONNECT][PYTHON] Unable to create a new session when the default session is closed by the server
- [SPARK-48173][SQL] CheckAnalysis should see the entire query plan
- [SPARK-48056][CONNECT][PYTHON] Re-execute plan if a SESSION_NOT_FOUND error is raised and no partial response was received
- [SPARK-48172][SQL] Fix escaping issues in JDBCDialects backport to 15.2
- [SPARK-48105][SS] Fix the race condition between state store unloading and snapshotting
- [SPARK-48288] Add source data type for connector cast expression
- [SPARK-48310][PYTHON][CONNECT] Cached properties must return copies
- [SPARK-48277] Improve error message for ErrorClassesJsonReader.getErrorMessage
- Revert “[SPARK-47406][SQL] Handle TIMESTAMP and DATETIME in MYSQLDialect”
- [SPARK-47994][SQL] Fix bug with CASE WHEN column filter push down in SQLServer
- [SPARK-47764][CORE][SQL] Cleanup shuffle dependencies based on ShuffleCleanupMode
- [SPARK-47921][CONNECT] Fix ExecuteJobTag creation in ExecuteHolder
- [SPARK-48010][SQL] Avoid repeated calls to conf.resolver in resolveExpression
- [SPARK-48146][SQL] Fix aggregate function in With expression child assertion
- [SPARK-48180][SQL] Improve error when UDTF call with TABLE arg forgets parentheses around multiple PARTITION/ORDER BY exprs
- Operating system security updates.
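The July 11, 2024 behavior change means a cached DataFrame no longer survives an overwrite of its Delta source table, and the note recommends .checkpoint() to persist a table state. The following is a minimal sketch of that recommendation under stated assumptions: spark is an existing SparkSession, the function name and all three arguments are hypothetical, and setting the checkpoint directory through sparkContext is an assumption that might not be available on every compute type.

```python
def snapshot_table(spark, table_name, checkpoint_dir):
    """Return a DataFrame checkpointed from the table's current state.

    `table_name` and `checkpoint_dir` are hypothetical caller-supplied
    values. Access to `spark.sparkContext` is assumed.
    """
    # checkpoint() needs a directory to write the checkpointed data to.
    spark.sparkContext.setCheckpointDir(checkpoint_dir)

    df = spark.read.table(table_name)

    # An eager checkpoint materializes the data immediately, rather than
    # waiting for the first action on the returned DataFrame.
    return df.checkpoint(eager=True)
```

A caller would keep the returned DataFrame and use it in place of a cached one wherever a stable view of the table is needed for the lifetime of the DataFrame.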
Databricks Runtime 15.1
See Databricks Runtime 15.1 (EoS).
- October 22, 2024
- [SPARK-49863][SQL] Fix NormalizeFloatingNumbers to preserve nullability of nested structs
- [SPARK-46632][SQL] Fix subexpression elimination when equivalent ternary expressions have different children
- [SPARK-49782][SQL] ResolveDataFrameDropColumns rule resolves UnresolvedAttribute with child output
- [SPARK-49905] Use dedicated ShuffleOrigin for stateful operator to prevent the shuffle to be modified from AQE
- [SPARK-49829] Revise the optimization on adding input to state store in stream-stream join (correctness fix)
- Operating system security updates.
- October 10, 2024
- [SPARK-49688][CONNECT] Fix a data race between interrupt and execute plan
- [SPARK-49743][SQL] OptimizeCsvJsonExpr should not change schema fields when pruning GetArrayStructFields
- [BACKPORT][SPARK-49474][SS] Classify Error class for FlatMapGroupsWithState user function error
- Operating system security updates.
- September 25, 2024
- [SPARK-49628][SQL] ConstantFolding should copy stateful expression before evaluating
- [SPARK-48719][SQL] Fix the calculation bug of RegrSlope & RegrIntercept when the first parameter is null
- [SPARK-49492][CONNECT] Reattach attempted on inactive ExecutionHolder
- [SPARK-49000][SQL] Fix “select count(distinct 1) from t” where t is empty table by expanding RewriteDistinctAggregates
- [SPARK-49458][CONNECT][PYTHON] Supply server-side session id via ReattachExecute
- Operating system security updates.
- September 17, 2024
- [SPARK-49336][CONNECT] Limit the nesting level when truncating a protobuf message
- [SPARK-49526][CONNECT] Support Windows-style paths in ArtifactManager
- [SPARK-49409][CONNECT] Adjust the default value of CONNECT_SESSION_PLAN_CACHE_SIZE
- [SPARK-43242][CORE] Fix throw ‘Unexpected type of BlockId’ in shuffle corruption diagnose
- [SPARK-49366][CONNECT] Treat Union node as leaf in dataframe column resolution
- August 29, 2024
- [SPARK-49263][CONNECT] Spark Connect python client: Consistently handle boolean Dataframe reader options
- [SPARK-49056][SQL] ErrorClassesJsonReader cannot handle null properly
- [SPARK-48862][PYTHON][CONNECT] Avoid calling _proto_to_string when INFO level is not enabled
- [SPARK-49146][SS] Move assertion errors related to watermark missing in append mode streaming queries to error framework
- August 14, 2024
- [SPARK-48941][SPARK-48970] Backport ML writer / reader fixes
- [SPARK-48050][SS] Log logical plan at query start
- [SPARK-48706][PYTHON] Python UDF in higher order functions should not throw internal error
- [SPARK-48597][SQL] Introduce a marker for isStreaming property in text representation of logical plan
- [SPARK-49065][SQL] Rebasing in legacy formatters/parsers must support non JVM default time zones
- [SPARK-49047][PYTHON][CONNECT] Truncate the message for logging
- [SPARK-48740][SQL] Catch missing window specification error early
- August 1, 2024
- On serverless compute for notebooks and jobs, ANSI SQL mode is enabled by default. See Supported Spark configuration parameters.
- On compute configured with shared access mode, Kafka batch reads and writes now have the same limitations enforced as those documented for Structured Streaming. See Streaming limitations and requirements for Unity Catalog shared access mode.
- The output from a SHOW CREATE TABLE statement now includes any row filters or column masks defined on a materialized view or streaming table. See SHOW CREATE TABLE. To learn about row filters and column masks, see Filter sensitive table data using row filters and column masks.
- [SPARK-48544][SQL] Reduce memory pressure of empty TreeNode BitSets
- [SPARK-46957][CORE] Decommission migrated shuffle files should be able to cleanup from executor
- [SPARK-47202][PYTHON] Fix typo breaking datetimes with tzinfo
- [SPARK-48713][SQL] Add index range check for UnsafeRow.pointTo when baseObject is byte array
- [SPARK-48896] [SPARK-48909] [SPARK-48883] Backport spark ML writer fixes
- [SPARK-48810][CONNECT] Session stop() API should be idempotent and not fail if the session is already closed by the server
- [SPARK-48873][SQL] Use UnsafeRow in JSON parser.
- [SPARK-48934][SS] Python datetime types converted incorrectly for setting timeout in applyInPandasWithState
- [SPARK-48705][PYTHON] Explicitly use worker_main when it starts with pyspark
- [SPARK-48889][SS] testStream to unload state stores before finishing
- [SPARK-48047][SQL] Reduce memory pressure of empty TreeNode tags
- [SPARK-48463] Make StringIndexer supporting nested input columns
- Operating system security updates.
- July 11, 2024
- (Behavior change) DataFrames cached against Delta table sources are now invalidated if the source table is overwritten. This change means that all state changes to Delta tables now invalidate cached results. Use .checkpoint() to persist a table state throughout the lifetime of a DataFrame.
- The Snowflake JDBC Driver is updated to version 3.16.1.
- This release includes a fix to an issue that prevented the Spark UI Environment tab from displaying correctly when running in Databricks Container Services.
- On serverless compute for notebooks and jobs, ANSI SQL mode is enabled by default. See Supported Spark configuration parameters.
- To ignore invalid partitions when reading data, file-based data sources, such as Parquet, ORC, CSV, or JSON, can set the ignoreInvalidPartitionPaths data source option to true. For example: spark.read.format("parquet").option("ignoreInvalidPartitionPaths", "true").load(…). You can also use the SQL configuration spark.sql.files.ignoreInvalidPartitionPaths. However, the data source option takes precedence over the SQL configuration. This setting is false by default.
- [SPARK-48383][SS] Throw better error for mismatched partitions in startOffset option in Kafka
- [SPARK-48481][SQL][SS] Do not apply OptimizeOneRowPlan against streaming Dataset
- [SPARK-48100][SQL] Fix issues in skipping nested structure fields not selected in schema
- [SPARK-47463][SQL] Use V2Predicate to wrap expression with return type of boolean
- [SPARK-48445][SQL] Don’t inline UDFs with expensive children
- [SPARK-48292][CORE] Revert [SPARK-39195][SQL] Spark OutputCommitCoordinator should abort stage when committed file not consistent with task status
- [SPARK-48566][PYTHON] Fix bug where partition indices are incorrect when UDTF analyze() uses both select and partitionColumns
- [SPARK-48648][PYTHON][CONNECT] Make SparkConnectClient.tags properly threadlocal
- [SPARK-48503][SQL] Fix invalid scalar subqueries with group-by on non-equivalent columns that were incorrectly allowed
- [SPARK-48252][SQL] Update CommonExpressionRef when necessary
- [SPARK-48475][PYTHON] Optimize _get_jvm_function in PySpark.
- [SPARK-48294][SQL] Handle lowercase in nestedTypeMissingElementTypeError
- [SPARK-48286] Fix analysis of column with exists default expression - Add user facing error
- [SPARK-47309][SQL] XML: Add schema inference tests for value tags
- [SPARK-47309][SQL][XML] Add schema inference unit tests
- [SPARK-48273][SQL] Fix late rewrite of PlanWithUnresolvedIdentifier
- Operating system security updates.
- June 17, 2024
- applyInPandasWithState() is available on shared clusters.
- Fixes a bug where the rank-window optimization using Photon TopK incorrectly handled partitions with structs.
- [SPARK-48310][PYTHON][CONNECT] Cached properties must return copies
- [SPARK-48276][PYTHON][CONNECT] Add the missing __repr__ method for SQLExpression
- [SPARK-48277] Improve error message for ErrorClassesJsonReader.getErrorMessage
- [SPARK-47764][CORE][SQL] Cleanup shuffle dependencies based on ShuffleCleanupMode
- Operating system security updates.
- May 21, 2024
- Fixed a bug in the try_divide() function where inputs containing decimals resulted in unexpected exceptions.
- [SPARK-48173][SQL] CheckAnalysis should see the entire query plan
- [SPARK-48016][SQL] Fix a bug in try_divide function when with decimals
- [SPARK-48105][SS] Fix the race condition between state store unloading and snapshotting
- [SPARK-48197][SQL] Avoid assert error for invalid lambda function
- [SPARK-48180][SQL] Improve error when UDTF call with TABLE arg forgets parentheses around multiple PARTITION/ORDER BY exprs
- [SPARK-48014][SQL] Change the makeFromJava error in EvaluatePython to a user-facing error
- [SPARK-48056][CONNECT][PYTHON] Re-execute plan if a SESSION_NOT_FOUND error is raised and no partial response was received
- [SPARK-48146][SQL] Fix aggregate function in With expression child assertion
- [SPARK-47994][SQL] Fix bug with CASE WHEN column filter push down in SQLServer
- Operating system security updates.
- May 9, 2024
- [SPARK-47543][CONNECT][PYTHON] Inferring dict as MapType from Pandas DataFrame to allow DataFrame creation
- [SPARK-47739][SQL] Register logical avro type
- [SPARK-48044][PYTHON][CONNECT] Cache DataFrame.isStreaming
- [SPARK-47855][CONNECT] Add spark.sql.execution.arrow.pyspark.fallback.enabled in the unsupported list
- [SPARK-48010][SQL] Avoid repeated calls to conf.resolver in resolveExpression
- [SPARK-47941] [SS] [Connect] Propagate ForeachBatch worker initialization errors to users for PySpark
- [SPARK-47819][CONNECT][Cherry-pick-15.0] Use asynchronous callback for execution cleanup
- [SPARK-47956][SQL] Sanity check for unresolved LCA reference
- [SPARK-47839][SQL] Fix aggregate bug in RewriteWithExpression
- [SPARK-48018][SS] Fix null groupId causing missing param error when throwing KafkaException.couldNotReadOffsetRange
- [SPARK-47371] [SQL] XML: Ignore row tags found in CDATA
- [SPARK-47907][SQL] Put bang under a config
- [SPARK-47895][SQL] group by all should be idempotent
- [SPARK-47973][CORE] Log call site in SparkContext.stop() and later in SparkContext.assertNotStopped()
- [SPARK-47986][CONNECT][PYTHON] Unable to create a new session when the default session is closed by the server
- Operating system security updates.
Databricks Runtime 15.0
See Databricks Runtime 15.0 (EoS).
- May 30, 2024
- (Behavior change) `dbutils.widgets.getAll()` is now supported to get all widget values in a notebook.
- April 25, 2024
- [SPARK-47786] SELECT DISTINCT () should not become SELECT DISTINCT struct() (revert to previous behavior)
- [SPARK-47802][SQL] Revert () from meaning struct() back to meaning *
- [SPARK-47509][SQL] Block subquery expressions in lambda and higher-order functions
- [SPARK-47722] Wait until RocksDB background work finish before closing
- [SPARK-47081][CONNECT][FOLLOW] Improving the usability of the Progress Handler
- [SPARK-47694][CONNECT] Make max message size configurable on the client side
- [SPARK-47669][SQL][CONNECT][PYTHON] Add `Column.try_cast`
- [SPARK-47664][PYTHON][CONNECT][Cherry-pick-15.0] Validate the column name with cached schema
- [SPARK-47818][CONNECT][Cherry-pick-15.0] Introduce plan cache in SparkConnectPlanner to improve performance of Analyze requests
- [SPARK-47704][SQL] JSON parsing fails with “java.lang.ClassCastException” when spark.sql.json.enablePartialResults is enabled
- [SPARK-47755][CONNECT] Pivot should fail when the number of distinct values is too large
- [SPARK-47713][SQL][CONNECT] Fix a self-join failure
- [SPARK-47812][CONNECT] Support Serialization of SparkSession for ForEachBatch worker
- [SPARK-47828][CONNECT][PYTHON] `DataFrameWriterV2.overwrite` fails with invalid plan
- [SPARK-47862][PYTHON][CONNECT] Fix generation of proto files
- [SPARK-47800][SQL] Create new method for identifier to tableIdentifier conversion
- Operating system security updates.
- April 3, 2024
- (Behavior change) To ensure consistent behavior across compute types, PySpark UDFs on shared clusters now match the behavior of UDFs on no-isolation and assigned clusters. This update includes the following changes that might break existing code:
  - UDFs with a `string` return type no longer implicitly convert non-`string` values into `string` values. Previously, UDFs with a return type of `str` would wrap the return value with a `str()` function regardless of the actual data type of the returned value.
  - UDFs with `timestamp` return types no longer implicitly apply a conversion to `timestamp` with `timezone`.
  - The Spark cluster configurations `spark.databricks.sql.externalUDF.*` no longer apply to PySpark UDFs on shared clusters.
  - The Spark cluster configuration `spark.databricks.safespark.externalUDF.plan.limit` no longer affects PySpark UDFs, removing the Public Preview limitation of 5 UDFs per query for PySpark UDFs.
  - The Spark cluster configuration `spark.databricks.safespark.sandbox.size.default.mib` no longer applies to PySpark UDFs on shared clusters. Instead, available memory on the system is used. To limit the memory of PySpark UDFs, use `spark.databricks.pyspark.udf.isolation.memoryLimit` with a minimum value of `100m`.
- The `TimestampNTZ` data type is now supported as a clustering column with liquid clustering. See Use liquid clustering for Delta tables.
- [SPARK-47218][SQL] XML: Ignore commented row tags in XML tokenizer
- [SPARK-46990][SQL] Fix loading empty Avro files emitted by event-hubs
- [SPARK-47033][SQL] Fix EXECUTE IMMEDIATE USING does not recognize session variable names
- [SPARK-47368][SQL] Remove inferTimestampNTZ config check in ParquetRowConverter
- [SPARK-47561][SQL] Fix analyzer rule order issues about Alias
- [SPARK-47638][PS][CONNECT] Skip column name validation in PS
- [SPARK-46906][BACKPORT][SS] Add a check for stateful operator change for streaming
- [SPARK-47569][SQL] Disallow comparing variant.
- [SPARK-47241][SQL] Fix rule order issues for ExtractGenerator
- [SPARK-47218] [SQL] XML: Changed SchemaOfXml to fail on DROPMALFORMED mode
- [SPARK-47300][SQL] `quoteIfNeeded` should quote identifier starts with digits
- [SPARK-47009][SQL][Collation] Enable create table support for collation
- [SPARK-47322][PYTHON][CONNECT] Make `withColumnsRenamed` column names duplication handling consistent with `withColumnRenamed`
- [SPARK-47544][PYTHON] SparkSession builder method is incompatible with visual studio code intellisense
- [SPARK-47511][SQL] Canonicalize With expressions by re-assigning IDs
- [SPARK-47385] Fix tuple encoders with Option inputs.
- [SPARK-47200][SS] Error class for Foreach batch sink user function error
- [SPARK-47135][SS] Implement error classes for Kafka data loss exceptions
- [SPARK-38708][SQL] Upgrade Hive Metastore Client to the 3.1.3 for Hive 3.1
- [SPARK-47305][SQL] Fix PruneFilters to tag the isStreaming flag of LocalRelation correctly when the plan has both batch and streaming
- [SPARK-47380][CONNECT] Ensure on the server side that the SparkSession is the same
- Operating system security updates.
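Because UDFs with a `string` return type on shared clusters no longer wrap the returned value in `str()`, one defensive option is to convert explicitly inside the function body. The sketch below shows only that body in plain Python so it runs without a cluster. The name `format_order_id` is hypothetical, and registering it as a PySpark UDF with a string return type is assumed to happen elsewhere.

```python
from typing import Optional


def format_order_id(order_id: Optional[int]) -> Optional[str]:
    """Hypothetical UDF body that returns a real str instead of relying on implicit conversion."""
    if order_id is None:
        return None  # keep NULL inputs as NULL
    return str(order_id)  # explicit conversion, independent of the runtime's behavior


# The function returns a str for non-null input regardless of cluster access mode.
example = format_order_id(42)
```

Converting inside the function keeps the result type the same on shared, assigned, and no-isolation clusters, so the code does not depend on which side of this behavior change it runs on.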
Databricks Runtime 14.2
See Databricks Runtime 14.2 (EoS).
- October 22, 2024
- [SPARK-49782][SQL] ResolveDataFrameDropColumns rule resolves UnresolvedAttribute with child output
- [SPARK-49905] Use dedicated ShuffleOrigin for stateful operator to prevent the shuffle to be modified from AQE
- Operating system security updates.
- October 10, 2024
- [SPARK-49743][SQL] OptimizeCsvJsonExpr should not change schema fields when pruning GetArrayStructFields
- [BACKPORT][SPARK-49474][SS] Classify Error class for FlatMapGroupsWithState user function error
- September 25, 2024
- [SPARK-48719][SQL] Fix the calculation bug of `RegrS…
- [SPARK-49628][SQL] ConstantFolding should copy stateful expression before evaluating
- [SPARK-49000][SQL] Fix “select count(distinct 1) from t” where t is empty table by expanding RewriteDistinctAggregates
- [SPARK-43242][CORE] Fix throw ‘Unexpected type of BlockId’ in shuffle corruption diagnose
- [SPARK-46601] [CORE] Fix log error in handleStatusMessage
- Operating system security updates.
- September 17, 2024
- [SPARK-49526][CONNECT] Support Windows-style paths in ArtifactManager
- August 29, 2024
- [SPARK-49263][CONNECT] Spark Connect python client: Consistently handle boolean Dataframe reader options
- [SPARK-49146][SS] Move assertion errors related to watermark missing in append mode streaming queries to error framework
- [SPARK-49056][SQL] ErrorClassesJsonReader cannot handle null properly
- August 14, 2024
- [SPARK-48050][SS] Log logical plan at query start
- [SPARK-48597][SQL] Introduce a marker for isStreaming property in text representation of logical plan
- [SPARK-49065][SQL] Rebasing in legacy formatters/parsers must support non JVM default time zones
- [SPARK-48706][PYTHON] Python UDF in higher order functions should not throw internal error
- August 1, 2024
- This release includes a bug fix for the `ColumnVector` and `ColumnarArray` classes in the Spark Java interface. Previous to this fix, an `ArrayIndexOutOfBoundsException` might be thrown or incorrect data returned when an instance of one of these classes contained `null` values.
- The output from a `SHOW CREATE TABLE` statement now includes any row filters or column masks defined on a materialized view or streaming table. See SHOW CREATE TABLE. To learn about row filters and column masks, see Filter sensitive table data using row filters and column masks.
- [SPARK-47202][PYTHON] Fix typo breaking datetimes with tzinfo
- [SPARK-48705][PYTHON] Explicitly use worker_main when it starts with pyspark
- Operating system security updates.
- July 11, 2024
- (Behavior change) DataFrames cached against Delta table sources are now invalidated if the source table is overwritten. This change means that all state changes to Delta tables now invalidate cached results. Use `.checkpoint()` to persist a table state throughout the lifetime of a DataFrame.
- The Snowflake JDBC Driver is updated to version 3.16.1.
- This release includes a fix to an issue that prevented the Spark UI Environment tab from displaying correctly when running in Databricks Container Services.
- [SPARK-48292][CORE] Revert [SPARK-39195][SQL] Spark OutputCommitCoordinator should abort stage when committed file not consistent with task status
- [SPARK-48273][SQL] Fix late rewrite of PlanWithUnresolvedIdentifier
- [SPARK-48503][SQL] Fix invalid scalar subqueries with group-by on non-equivalent columns that were incorrectly allowed
- [SPARK-48481][SQL][SS] Do not apply OptimizeOneRowPlan against streaming Dataset
- [SPARK-48475][PYTHON] Optimize _get_jvm_function in PySpark.
- [SPARK-48100][SQL] Fix issues in skipping nested structure fields not selected in schema
- [SPARK-48445][SQL] Don’t inline UDFs with expensive children
- [SPARK-48383][SS] Throw better error for mismatched partitions in startOffset option in Kafka
- Operating system security updates.
- June 17, 2024
- Fixes a bug where the rank-window optimization using Photon TopK incorrectly handled partitions with structs.
- [SPARK-48276][PYTHON][CONNECT] Add the missing `__repr__` method for `SQLExpression`
- [SPARK-48277] Improve error message for ErrorClassesJsonReader.getErrorMessage
- Operating system security updates.
- May 21, 2024
- (Behavior change) `dbutils.widgets.getAll()` is now supported to get all widget values in a notebook.
- [SPARK-48173][SQL] CheckAnalysis should see the entire query plan
- [SPARK-48197][SQL] Avoid assert error for invalid lambda function
- [SPARK-47994][SQL] Fix bug with CASE WHEN column filter push down in SQLServer
- [SPARK-48105][SS] Fix the race condition between state store unloading and snapshotting
- Operating system security updates.
- May 9, 2024
- [SPARK-48044][PYTHON][CONNECT] Cache `DataFrame.isStreaming`
- [SPARK-47956][SQL] Sanity check for unresolved LCA reference
- [SPARK-47371] [SQL] XML: Ignore row tags found in CDATA
- [SPARK-47812][CONNECT] Support Serialization of SparkSession for ForEachBatch worker
- [SPARK-47895][SQL] group by all should be idempotent
- [SPARK-47973][CORE] Log call site in SparkContext.stop() and later in SparkContext.assertNotStopped()
- Operating system security updates.
- April 25, 2024
- [SPARK-47704][SQL] JSON parsing fails with “java.lang.ClassCastException” when spark.sql.json.enablePartialResults is enabled
- [SPARK-47828][CONNECT][PYTHON] `DataFrameWriterV2.overwrite` fails with invalid plan
- Operating system security updates.
- April 11, 2024
- [SPARK-47309][SQL][XML] Add schema inference unit tests
- [SPARK-46990][SQL] Fix loading empty Avro files emitted by event-hubs
- [SPARK-47638][PS][CONNECT] Skip column name validation in PS
- [SPARK-47509][SQL] Block subquery expressions in lambda and higher-order functions
- [SPARK-38708][SQL] Upgrade Hive Metastore Client to the 3.1.3 for Hive 3.1
- Operating system security updates.
- April 1, 2024
- [SPARK-47322][PYTHON][CONNECT] Make `withColumnsRenamed` column names duplication handling consistent with `withColumnRenamed`
- [SPARK-47385] Fix tuple encoders with Option inputs.
- [SPARK-47070] Fix invalid aggregation after subquery rewrite
- [SPARK-47218] [SQL] XML: Changed SchemaOfXml to fail on DROPMALFORMED mode
- [SPARK-47305][SQL] Fix PruneFilters to tag the isStreaming flag of LocalRelation correctly when the plan has both batch and streaming
- [SPARK-47218][SQL] XML: Ignore commented row tags in XML tokenizer
- Revert “[SPARK-46861][CORE] Avoid Deadlock in DAGScheduler”
- [SPARK-47300][SQL] `quoteIfNeeded` should quote identifier starts with digits
- [SPARK-47368][SQL] Remove inferTimestampNTZ config check in ParquetRowConverter
- Operating system security updates.
- March 14, 2024
- [SPARK-47035][SS][CONNECT] Protocol for Client-Side Listener
- [SPARK-47121][CORE] Avoid RejectedExecutionExceptions during StandaloneSchedulerBackend shutdown
- [SPARK-47145][SQL] Pass table identifier to row data source scan exec for V2 strategy.
- [SPARK-47176][SQL] Have a ResolveAllExpressionsUpWithPruning helper function
- [SPARK-47167][SQL] Add concrete class for JDBC anonymous relation
- [SPARK-47129][CONNECT][SQL] Make `ResolveRelations` cache connect plan properly
- [SPARK-47044][SQL] Add executed query for JDBC external datasources to explain output
- Operating system security updates.
- February 29, 2024
- Fixed an issue where using a local collection as source in a MERGE command could result in the operation metric numSourceRows reporting double the correct number of rows.
- Creating a schema with a defined location now requires the user to have SELECT and MODIFY privileges on ANY FILE.
- You can now ingest XML files using Auto Loader, read_files, COPY INTO, DLT, and DBSQL. XML file support can automatically infer and evolve schema, rescue data with type mismatches, validate XML using XSD, and support SQL expressions like from_xml, schema_of_xml, and to_xml. See XML file support for more details. If you previously used the external spark-xml package, see the migration guidance.
- [SPARK-46954][SQL] XML: Wrap InputStreamReader with BufferedReader
- [SPARK-46630][SQL] XML: Validate XML element name on write
- [SPARK-46248][SQL] XML: Support for ignoreCorruptFiles and ignoreMissingFiles options
- [SPARK-46954][SQL] XML: Optimize schema index lookup
- [SPARK-47059][SQL] Attach error context for ALTER COLUMN v1 command
- [SPARK-46993][SQL] Fix constant folding for session variables
- February 8, 2024
- Change data feed (CDF) queries on Unity Catalog materialized views are not supported, and attempting to run a CDF query with a Unity Catalog materialized view returns an error. Unity Catalog Streaming tables support CDF queries on non-`APPLY CHANGES` tables in Databricks Runtime 14.1 and later. CDF queries are not supported with Unity Catalog Streaming tables in Databricks Runtime 14.0 and earlier.
- [SPARK-46930] Add support for a custom prefix for Union type fields in Avro.
- [SPARK-46822] Respect spark.sql.legacy.charVarcharAsString when casting jdbc type to catalyst type in jdbc.
- [SPARK-46952] XML: Limit size of corrupt record.
- [SPARK-46644] Change add and merge in SQLMetric to use isZero.
- [SPARK-46861] Avoid Deadlock in DAGScheduler.
- [SPARK-46794] Remove subqueries from LogicalRDD constraints.
- [SPARK-46941] Can’t insert window group limit node for top-k computation if contains SizeBasedWindowFunction.
- [SPARK-46933] Add query execution time metric to connectors which use JDBCRDD.
- Operating system security updates.
- January 31, 2024
- [SPARK-46382] XML: Update doc for `ignoreSurroundingSpaces`.
- [SPARK-46382] XML: Capture values interspersed between elements.
- [SPARK-46763] Fix assertion failure in ReplaceDeduplicateWithAggregate for duplicate attributes.
- Revert [SPARK-46769] Refine timestamp related schema inference.
- [SPARK-46677] Fix `dataframe["*"]` resolution.
- [SPARK-46382] XML: Default ignoreSurroundingSpaces to true.
- [SPARK-46633] Fix Avro reader to handle zero-length blocks.
- [SPARK-45964] Remove private sql accessor in XML and JSON package under catalyst package.
- [SPARK-46581] Update comment on isZero in AccumulatorV2.
- [SPARK-45912] Enhancement of XSDToSchema API: Change to HDFS API for cloud storage accessibility.
- [SPARK-45182] Ignore task completion from old stage after retrying parent-indeterminate stage as determined by checksum.
- [SPARK-46660] ReattachExecute requests updates aliveness of SessionHolder.
- [SPARK-46610] Create table should throw exception when no value for a key in options.
- [SPARK-46383] Reduce Driver Heap Usage by Reducing the Lifespan of `TaskInfo.accumulables()`.
- [SPARK-46769] Refine timestamp related schema inference.
- [SPARK-46684] Fix CoGroup.applyInPandas/Arrow to pass arguments properly.
- [SPARK-46676] dropDuplicatesWithinWatermark should not fail on canonicalization of the plan.
- [SPARK-45962] Remove `treatEmptyValuesAsNulls` and use `nullValue` option instead in XML.
- [SPARK-46541] Fix the ambiguous column reference in self join.
- [SPARK-46599] XML: Use TypeCoercion.findTightestCommonType for compatibility check.
- Operating system security updates.
- January 17, 2024
- The `shuffle` node of the explain plan returned by a Photon query is updated to add the `causedBroadcastJoinBuildOOM=true` flag when an out-of-memory error occurs during a shuffle that is part of a broadcast join.
- To avoid increased latency when communicating over TLSv1.3, this maintenance release includes a patch to the JDK 8 installation to fix JDK bug JDK-8293562.
- [SPARK-46261] `DataFrame.withColumnsRenamed` should keep the dict/map ordering.
- [SPARK-46538] Fix the ambiguous column reference issue in `ALSModel.transform`.
- [SPARK-46145] spark.catalog.listTables does not throw exception when the table or view is not found.
- [SPARK-46484] Make `resolveOperators` helper functions keep the plan id.
- [SPARK-46394] Fix spark.catalog.listDatabases() issues on schemas with special characters when `spark.sql.legacy.keepCommandOutputSchema` set to true.
- [SPARK-46609] Avoid exponential explosion in PartitioningPreservingUnaryExecNode.
- [SPARK-46446] Disable subqueries with correlated OFFSET to fix correctness bug.
- [SPARK-46152] XML: Add DecimalType support in XML schema inference.
- [SPARK-46602] Propagate `allowExisting` in view creation when the view/table does not exists.
- [SPARK-45814] Make ArrowConverters.createEmptyArrowBatch call close() to avoid memory leak.
- [SPARK-46058] Add separate flag for privateKeyPassword.
- [SPARK-46132] Support key password for JKS keys for RPC SSL.
- [SPARK-46600] Move shared code between SqlConf and SqlApiConf to SqlApiConfHelper.
- [SPARK-46478] Revert SPARK-43049 to use oracle varchar(255) for string.
- [SPARK-46417] Do not fail when calling hive.getTable and throwException is false.
- [SPARK-46153] XML: Add TimestampNTZType support.
- [SPARK-46056][BACKPORT] Fix Parquet vectorized read NPE with byteArrayDecimalType default value.
- [SPARK-46466] Vectorized parquet reader should never do rebase for timestamp ntz.
- [SPARK-46260] `DataFrame.withColumnsRenamed` should respect the dict ordering.
- [SPARK-46036] Removing error-class from raise_error function.
- [SPARK-46294] Clean up semantics of init vs zero value.
- [SPARK-46173] Skipping trimAll call during date parsing.
- [SPARK-46250] Deflake test_parity_listener.
- [SPARK-46587] XML: Fix XSD big integer conversion.
- [SPARK-46396] Timestamp inference should not throw exception.
- [SPARK-46241] Fix error handling routine so it wouldn’t fall into infinite recursion.
- [SPARK-46355] XML: Close InputStreamReader on read completion.
- [SPARK-46370] Fix bug when querying from table after changing column defaults.
- [SPARK-46265] Assertions in AddArtifact RPC make the connect client incompatible with older clusters.
- [SPARK-46308] Forbid recursive error handling.
- [SPARK-46337] Make `CTESubstitution` retain the `PLAN_ID_TAG`.
- December 14, 2023
- [SPARK-46141] Change default for spark.sql.legacy.ctePrecedencePolicy to CORRECTED.
- [SPARK-45730] Make ReloadingX509TrustManagerSuite less flaky.
- [SPARK-45852] Gracefully deal with recursion error during logging.
- [SPARK-45808] Better error handling for SQL Exceptions.
- [SPARK-45920] group by ordinal should be idempotent.
- Revert “[SPARK-45649] Unify the prepare framework for `OffsetWindowFunctionFrame`”.
- [SPARK-45733] Support multiple retry policies.
- [SPARK-45509] Fix df column reference behavior for Spark Connect.
- [SPARK-45655] Allow non-deterministic expressions inside AggregateFunctions in CollectMetrics.
- [SPARK-45905] Least common type between decimal types should retain integral digits first.
- [SPARK-45136] Enhance ClosureCleaner with Ammonite support.
- [SPARK-46255] Support complex type -> string conversion.
- [SPARK-45859] Make UDF objects in ml.functions lazy.
- [SPARK-46028] Make `Column.__getitem__` accept input column.
- [SPARK-45798] Assert server-side session ID.
- [SPARK-45892] Refactor optimizer plan validation to decouple `validateSchemaOutput` and `validateExprIdUniqueness`.
- [SPARK-45844] Implement case-insensitivity for XML.
- [SPARK-45770] Introduce plan `DataFrameDropColumns` for `Dataframe.drop`.
- [SPARK-44790] XML: to_xml implementation and bindings for python, connect and SQL.
- [SPARK-45851] Support multiple policies in scala client.
- Operating system security updates.
- November 29, 2023
- Installed a new package, `pyarrow-hotfix`, to remediate a PyArrow RCE vulnerability.
- Fixed an issue where escaped underscores in `getColumns` operations originating from JDBC or ODBC clients were wrongly interpreted as wildcards.
- [SPARK-45730] Improved time constraints for `ReloadingX509TrustManagerSuite`.
- [SPARK-45852] The Python client for Spark Connect now catches recursion errors during text conversion.
- [SPARK-45808] Improved error handling for SQL exceptions.
- [SPARK-45920] `GROUP BY` ordinal doesn’t replace the ordinal.
- Revert [SPARK-45649].
- [SPARK-45733] Added support for multiple retry policies.
- [SPARK-45509] Fixed `df` column reference behavior for Spark Connect.
- [SPARK-45655] Allow non-deterministic expressions inside `AggregateFunctions` in `CollectMetrics`.
- [SPARK-45905] The least common type between decimal types now retains integral digits first.
- [SPARK-45136] Enhance `ClosureCleaner` with Ammonite support.
- [SPARK-45859] Made UDF objects in `ml.functions` lazy.
- [SPARK-46028] `Column.__getitem__` accepts input columns.
- [SPARK-45798] Assert server-side session ID.
- [SPARK-45892] Refactor optimizer plan validation to decouple `validateSchemaOutput` and `validateExprIdUniqueness`.
- [SPARK-45844] Implement case-insensitivity for XML.
- [SPARK-45770] Fixed column resolution with `DataFrameDropColumns` for `Dataframe.drop`.
- [SPARK-44790] Added `to_xml` implementation and bindings for Python, Spark Connect, and SQL.
- [SPARK-45851] Added support for multiple policies in the Scala client.
- Operating system security updates.
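The `getColumns` fix in this update concerns JDBC metadata search patterns, in which `_` matches any single character and `%` matches any sequence of characters unless the wildcard is preceded by the driver's escape character. The sketch below translates such a pattern into a regular expression to show why an escaped underscore must match only a literal underscore. It assumes a backslash escape character (drivers report their own through `DatabaseMetaData.getSearchStringEscape()`). The helper `jdbc_pattern_to_regex` is hypothetical and is not the Databricks implementation.

```python
import re


def jdbc_pattern_to_regex(pattern: str, escape: str = "\\") -> re.Pattern:
    """Hypothetical helper: translate a JDBC metadata search pattern into a regex."""
    parts = []
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == escape and i + 1 < len(pattern):
            # An escaped character is always matched literally.
            parts.append(re.escape(pattern[i + 1]))
            i += 2
            continue
        if ch == "_":
            parts.append(".")      # unescaped underscore: any single character
        elif ch == "%":
            parts.append(".*")     # unescaped percent: any sequence of characters
        else:
            parts.append(re.escape(ch))
        i += 1
    return re.compile("".join(parts), re.DOTALL)


escaped = jdbc_pattern_to_regex(r"order\_id")   # literal underscore only
unescaped = jdbc_pattern_to_regex("order_id")   # underscore acts as a wildcard
```

With these two patterns, `escaped` matches only the column name `order_id`, while `unescaped` also matches names such as `orderXid`. Treating the escaped form like the unescaped one is the kind of over-matching this fix addresses.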
Databricks Runtime 14.0
See Databricks Runtime 14.0 (EoS).
- February 8, 2024
- [SPARK-46396] Timestamp inference should not throw exception.
- [SPARK-46794] Remove subqueries from LogicalRDD constraints.
- [SPARK-45182] Ignore task completion from old stage after retrying parent-indeterminate stage as determined by checksum.
- [SPARK-46933] Add query execution time metric to connectors which use JDBCRDD.
- [SPARK-45957] Avoid generating execution plan for non-executable commands.
- [SPARK-46861] Avoid Deadlock in DAGScheduler.
- [SPARK-46930] Add support for a custom prefix for Union type fields in Avro.
- [SPARK-46941] Can’t insert window group limit node for top-k computation if contains SizeBasedWindowFunction.
- [SPARK-45582] Ensure that store instance is not used after calling commit within output mode streaming aggregation.
- Operating system security updates.
- January 31, 2024
- [SPARK-46541] Fix the ambiguous column reference in self join.
- [SPARK-46676] dropDuplicatesWithinWatermark should not fail on canonicalization of the plan.
- [SPARK-46769] Refine timestamp related schema inference.
- [SPARK-45498] Followup: Ignore task completion from old stage attempts.
- Revert [SPARK-46769] Refine timestamp related schema inference.
- [SPARK-46383] Reduce Driver Heap Usage by Reducing the Lifespan of `TaskInfo.accumulables()`.
- [SPARK-46633] Fix Avro reader to handle zero-length blocks.
- [SPARK-46677] Fix `dataframe["*"]` resolution.
- [SPARK-46684] Fix CoGroup.applyInPandas/Arrow to pass arguments properly.
- [SPARK-46763] Fix assertion failure in ReplaceDeduplicateWithAggregate for duplicate attributes.
- [SPARK-46610] Create table should throw exception when no value for a key in options.
- Operating system security updates.
- January 17, 2024
- The `shuffle` node of the explain plan returned by a Photon query is updated to add the `causedBroadcastJoinBuildOOM=true` flag when an out-of-memory error occurs during a shuffle that is part of a broadcast join.
- To avoid increased latency when communicating over TLSv1.3, this maintenance release includes a patch to the JDK 8 installation to fix JDK bug JDK-8293562.
- [SPARK-46394] Fix spark.catalog.listDatabases() issues on schemas with special characters when `spark.sql.legacy.keepCommandOutputSchema` set to true.
- [SPARK-46250] Deflake test_parity_listener.
- [SPARK-45814] Make ArrowConverters.createEmptyArrowBatch call close() to avoid memory leak.
- [SPARK-46173] Skipping trimAll call during date parsing.
- [SPARK-46484] Make `resolveOperators` helper functions keep the plan id.
- [SPARK-46466] Vectorized parquet reader should never do rebase for timestamp ntz.
- [SPARK-46056] Fix Parquet vectorized read NPE with byteArrayDecimalType default value.
- [SPARK-46058] Add separate flag for privateKeyPassword.
- [SPARK-46478] Revert SPARK-43049 to use oracle varchar(255) for string.
- [SPARK-46132] Support key password for JKS keys for RPC SSL.
- [SPARK-46417] Do not fail when calling hive.getTable and throwException is false.
- [SPARK-46261] `DataFrame.withColumnsRenamed` should keep the dict/map ordering.
- [SPARK-46370] Fix bug when querying from table after changing column defaults.
- [SPARK-46609] Avoid exponential explosion in PartitioningPreservingUnaryExecNode.
- [SPARK-46600] Move shared code between SqlConf and SqlApiConf to SqlApiConfHelper.
- [SPARK-46538] Fix the ambiguous column reference issue in `ALSModel.transform`.
- [SPARK-46337] Make `CTESubstitution` retain the `PLAN_ID_TAG`.
- [SPARK-46602] Propagate `allowExisting` in view creation when the view/table does not exists.
- [SPARK-46260] `DataFrame.withColumnsRenamed` should respect the dict ordering.
- [SPARK-46145] spark.catalog.listTables does not throw exception when the table or view is not found.
- December 14, 2023
- Fixed an issue where escaped underscores in getColumns operations originating from JDBC or ODBC clients were handled incorrectly and interpreted as wildcards.
- [SPARK-46255] Support complex type -> string conversion.
- [SPARK-46028] Make `Column.__getitem__` accept input column.
- [SPARK-45920] group by ordinal should be idempotent.
- [SPARK-45433] Fix CSV/JSON schema inference when timestamps do not match specified timestampFormat.
- [SPARK-45509] Fix df column reference behavior for Spark Connect.
- Operating system security updates.
- November 29, 2023
- Installed a new package, `pyarrow-hotfix`, to remediate a PyArrow RCE vulnerability.
- Fixed an issue where escaped underscores in `getColumns` operations originating from JDBC or ODBC clients were wrongly interpreted as wildcards.
- When ingesting CSV data using Auto Loader or streaming tables, large CSV files are now splittable and can be processed in parallel during both schema inference and data processing.
- Spark-snowflake connector is upgraded to 2.12.0.
- [SPARK-45859] Made UDF objects in `ml.functions` lazy.
- Revert [SPARK-45592].
- [SPARK-45892] Refactor optimizer plan validation to decouple `validateSchemaOutput` and `validateExprIdUniqueness`.
- [SPARK-45592] Fixed correctness issue in AQE with `InMemoryTableScanExec`.
- [SPARK-45620] APIs related to Python UDF now use camelCase.
- [SPARK-44784] Made SBT testing hermetic.
- [SPARK-45770] Fixed column resolution with `DataFrameDropColumns` for `Dataframe.drop`.
- [SPARK-45544] Integrated SSL support into `TransportContext`.
- [SPARK-45730] Improved time constraints for `ReloadingX509TrustManagerSuite`.
- Operating system security updates.
- November 10, 2023
- Changed data feed queries on Unity Catalog streaming tables and materialized views to display error messages.
- [SPARK-45545] `SparkTransportConf` inherits `SSLOptions` upon creation.
- [SPARK-45584] Fixed subquery run failure with `TakeOrderedAndProjectExec`.
- [SPARK-45427] Added RPC SSL settings to `SSLOptions` and `SparkTransportConf`.
- [SPARK-45541] Added `SSLFactory`.
- [SPARK-45430] `FramelessOffsetWindowFunction` no longer fails when `IGNORE NULLS` and `offset > rowCount`.
- [SPARK-45429] Added helper classes for SSL RPC communication.
- [SPARK-44219] Added extra per-rule validations for optimization rewrites.
- [SPARK-45543] Fixed an issue where `InferWindowGroupLimit` generated an error if the other window functions didn’t have the same window frame as the rank-like functions.
- Operating system security updates.
- October 23, 2023
- [SPARK-45426] Added support for `ReloadingX509TrustManager`.
- [SPARK-45396] Added doc entry for `pyspark.ml.connect` module, and added `Evaluator` to `__all__` at `ml.connect`.
- [SPARK-45256] Fixed an issue where `DurationWriter` failed when writing more values than initial capacity.
- [SPARK-45279] Attached `plan_id` to all logical plans.
- [SPARK-45250] Added support for stage-level task resource profile for yarn clusters when dynamic allocation is turned off.
- [SPARK-45182] Added support for rolling back shuffle map stage so all stage tasks can be retried when the stage output is indeterminate.
- [SPARK-45419] Avoid reusing `rocksdb sst` files in a different `rocksdb` instance by removing file version map entries of larger versions.
- [SPARK-45386] Fixed an issue where `StorageLevel.NONE` would incorrectly return 0.
- Operating system security updates.
- October 13, 2023
- Snowflake-jdbc dependency upgraded from 3.13.29 to 3.13.33.
- The `array_insert` function is now 1-based for both positive and negative indexes. Previously, it was 0-based for negative indexes. It now inserts a new element at the end of input arrays for the index -1. To restore the previous behavior, set `spark.sql.legacy.negativeIndexInArrayInsert` to `true`.
- Azure Databricks no longer ignores corrupt files when a CSV schema inference with Auto Loader has enabled `ignoreCorruptFiles`.
- [SPARK-45227] Fixed a subtle thread-safety issue with `CoarseGrainedExecutorBackend`.
- [SPARK-44658] `ShuffleStatus.getMapStatus` should return `None` instead of `Some(null)`.
- [SPARK-44910] `Encoders.bean` does not support superclasses with generic type arguments.
- [SPARK-45346] Parquet schema inference respects case-sensitive flags when merging schema.
- Revert [SPARK-42946].
- [SPARK-42205] Updated the JSON protocol to remove Accumulables logging in a task or stage start events.
- [SPARK-45360] Spark session builder supports initialization from `SPARK_REMOTE`.
- [SPARK-45316] Add new parameters `ignoreCorruptFiles`/`ignoreMissingFiles` to `HadoopRDD` and `NewHadoopRDD`.
- [SPARK-44909] Skip running the torch distributor log streaming server when it is not available.
- [SPARK-45084] `StateOperatorProgress` now uses accurate shuffle partition number.
- [SPARK-45371] Fixed shading issues in the Spark Connect Scala Client.
- [SPARK-45178] Fallback to running a single batch for `Trigger.AvailableNow` with unsupported sources rather than using the wrapper.
- [SPARK-44840] Make `array_insert()` 1-based for negative indexes.
- [SPARK-44551] Edited comments to sync with OSS.
- [SPARK-45078] The `ArrayInsert` function now makes explicit casting when the element type does not equal the derived component type.
- [SPARK-45339] PySpark now logs retry errors.
- [SPARK-45057] Avoid acquiring read lock when `keepReadLock` is false.
- [SPARK-44908] Fixed cross-validator `foldCol` param functionality.
param functionality. - Operating system security updates.
Databricks Runtime 13.1
See Databricks Runtime 13.1 (EoS).
- November 29, 2023
- Fixed an issue where escaped underscores in getColumns operations originating from JDBC or ODBC clients were wrongly interpreted as wildcards.
- [SPARK-44846] Removed complex grouping expressions after RemoveRedundantAggregates.
- [SPARK-43802] Fixed an issue where codegen for unhex and unbase64 expressions would fail.
- [SPARK-43718] Fixed nullability for keys in USING joins.
- Operating system security updates.
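To see why escaped underscores matter in getColumns patterns, the following sketch converts a metadata search pattern into a regular expression. It is an illustration only, not the driver's implementation: the helper name `metadata_pattern_to_regex` is hypothetical, and the backslash escape character is an assumption (drivers report their own escape string).

```python
import re


def metadata_pattern_to_regex(pattern: str, escape: str = "\\") -> "re.Pattern":
    """Translate a LIKE-style metadata pattern: _ is one char, % is any run."""
    parts = []
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == escape and i + 1 < len(pattern):
            # An escaped character is matched literally, never as a wildcard.
            parts.append(re.escape(pattern[i + 1]))
            i += 2
            continue
        if ch == "_":
            parts.append(".")
        elif ch == "%":
            parts.append(".*")
        else:
            parts.append(re.escape(ch))
        i += 1
    return re.compile("".join(parts), re.DOTALL)


escaped = metadata_pattern_to_regex(r"user\_id")
wildcard = metadata_pattern_to_regex("user_id")
print(bool(escaped.fullmatch("userXid")))   # False: the underscore is literal
print(bool(wildcard.fullmatch("userXid")))  # True: the underscore is a wildcard
```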
- November 14, 2023
- Partition filters on Delta Lake streaming queries are pushed down before rate limiting to achieve better utilization.
- Changed data feed queries on Unity Catalog Streaming tables and materialized views to display error messages.
- [SPARK-45584] Fixed subquery run failure with TakeOrderedAndProjectExec.
- [SPARK-45430] FramelessOffsetWindowFunction no longer fails when IGNORE NULLS and offset > rowCount.
- [SPARK-45543] Fixed an issue where InferWindowGroupLimit caused an issue if the other window functions didn’t have the same window frame as the rank-like functions.
- Operating system security updates.
- October 24, 2023
- [SPARK-43799] Added descriptor binary option to PySpark Protobuf API.
- Revert [SPARK-42946].
- [SPARK-45346] Parquet schema inference now respects case-sensitive flag when merging a schema.
- Operating system security updates.
- October 13, 2023
- Snowflake-jdbc dependency upgraded from 3.13.29 to 3.13.33.
- No longer ignoring corrupt files when ignoreCorruptFiles is enabled during CSV schema inference with Auto Loader.
- [SPARK-44658] ShuffleStatus.getMapStatus returns None instead of Some(null).
- [SPARK-45178] Fallback to running a single batch for Trigger.AvailableNow with unsupported sources rather than using the wrapper.
- [SPARK-42205] Updated the JSON protocol to remove Accumulables logging in a task or stage start events.
- Operating system security updates.
- September 12, 2023
- [SPARK-44718] Match ColumnVector memory-mode config default to OffHeapMemoryMode config value.
- [SPARK-44878] Turned off strict limit for RocksDB write manager to avoid insertion exception on cache complete.
- Miscellaneous fixes.
- August 30, 2023
- [SPARK-44871] Fixed percentile_disc behavior.
- [SPARK-44714] Ease restriction of LCA resolution regarding queries.
- [SPARK-44245] PySpark.sql.dataframe sample() doc tests are now illustrative-only.
- [SPARK-44818] Fixed race for pending task interrupt issued before taskThread is initialized.
- Operating system security updates.
- August 15, 2023
- [SPARK-44485] Optimized TreeNode.generateTreeString.
- [SPARK-44643] Fixed Row.__repr__ when the row is empty.
- [SPARK-44504] Maintenance task now cleans up loaded providers on stop error.
- [SPARK-44479] Fixed protobuf conversion from an empty struct type.
- [SPARK-44464] Fixed applyInPandasWithStatePythonRunner to output rows that have Null as the first column value.
- Miscellaneous fixes.
- July 27, 2023
- Fixed an issue where dbutils.fs.ls() returned INVALID_PARAMETER_VALUE.LOCATION_OVERLAP when called for a storage location path that clashed with another external or managed storage location.
- [SPARK-44199] CacheManager no longer refreshes the fileIndex unnecessarily.
- [SPARK-44448] Fixed wrong results bug from DenseRankLimitIterator and InferWindowGroupLimit.
- Operating system security updates.
- July 24, 2023
- Revert [SPARK-42323].
- [SPARK-41848] Fixed task over-schedule issue with TaskResourceProfile.
- [SPARK-44136] Fixed an issue where StateManager would get materialized in an executor instead of the driver in FlatMapGroupsWithStateExec.
- [SPARK-44337] Fixed an issue where any field set to Any.getDefaultInstance caused parse errors.
- Operating system security updates.
- June 27, 2023
- Operating system security updates.
- June 15, 2023
- Photonized approx_count_distinct.
- JSON parser in failOnUnknownFields mode now drops the record in DROPMALFORMED mode and fails directly in FAILFAST mode.
- Snowflake-jdbc library is upgraded to 3.13.29 to address a security issue.
- The PubSubRecord attributes field is stored as JSON instead of the string from a Scala map for more straightforward serialization and deserialization.
- The EXPLAIN EXTENDED command now returns the result cache eligibility of the query.
- Improve the performance of incremental updates with SHALLOW CLONE Iceberg and Parquet.
- [SPARK-43032] Python SQM bug fix.
- [SPARK-43404] Skip reusing the sst file for the same version of RocksDB state store to avoid the ID mismatch error.
- [SPARK-43340] Handle missing stack-trace field in eventlogs.
- [SPARK-43527] Fixed catalog.listCatalogs in PySpark.
- [SPARK-43541] Propagate all Project tags in resolving of expressions and missing columns.
- [SPARK-43300] NonFateSharingCache wrapper for Guava Cache.
- [SPARK-43378] Properly close stream objects in deserializeFromChunkedBuffer.
- [SPARK-42852] Revert NamedLambdaVariable related changes from EquivalentExpressions.
- [SPARK-43779] ParseToDate now loads EvalMode in the main thread.
- [SPARK-43413] Fix IN subquery ListQuery nullability.
- [SPARK-43889] Add check for column name for __dir__() to filter out error-prone column names.
- [SPARK-43043] Improved the performance of MapOutputTracker.updateMapOutput.
- [SPARK-43522] Fixed creating struct column name with index of array.
- [SPARK-43457] Augment user agent with OS, Python and Spark versions.
- [SPARK-43286] Updated aes_encrypt CBC mode to generate random IVs.
- [SPARK-42851] Guard EquivalentExpressions.addExpr() with supportedExpression().
- Revert [SPARK-43183].
- Operating system security updates.
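The failOnUnknownFields note in the list above describes two outcomes for a record that carries a field outside the schema. The sketch below models that behavior in plain Python, assuming a flat JSON record per line; the function `parse_records` and the sample field names are hypothetical, and this is not how the Spark JSON parser is implemented.

```python
import json
from typing import Dict, Iterable, List, Set


def parse_records(lines: Iterable[str], known_fields: Set[str], mode: str) -> List[Dict]:
    """Model: unknown fields drop the record (DROPMALFORMED) or raise (FAILFAST)."""
    if mode not in ("DROPMALFORMED", "FAILFAST"):
        raise ValueError("this sketch only models DROPMALFORMED and FAILFAST")
    rows = []
    for line in lines:
        record = json.loads(line)
        unknown = set(record) - known_fields
        if unknown:
            if mode == "FAILFAST":
                raise ValueError(f"unknown fields: {sorted(unknown)}")
            continue  # DROPMALFORMED: skip the whole record
        rows.append(record)
    return rows


sample = ['{"id": 1}', '{"id": 2, "extra": true}']
dropped = parse_records(sample, {"id"}, "DROPMALFORMED")
print(dropped)  # [{'id': 1}]
```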
Databricks Runtime 12.2 LTS
See Databricks Runtime 12.2 LTS.
- November 29, 2023
- Fixed an issue where escaped underscores in getColumns operations originating from JDBC or ODBC clients were wrongly interpreted as wildcards.
- [SPARK-42205] Removed logging accumulables in Stage and Task start events.
- [SPARK-44846] Removed complex grouping expressions after RemoveRedundantAggregates.
- [SPARK-43718] Fixed nullability for keys in USING joins.
- [SPARK-45544] Integrated SSL support into TransportContext.
- [SPARK-43973] Structured Streaming UI now displays failed queries correctly.
- [SPARK-45730] Improved time constraints for ReloadingX509TrustManagerSuite.
- [SPARK-45859] Made UDF objects in ml.functions lazy.
- Operating system security updates.
- November 14, 2023
- Partition filters on Delta Lake streaming queries are pushed down before rate limiting to achieve better utilization.
- [SPARK-45545] SparkTransportConf inherits SSLOptions upon creation.
- [SPARK-45427] Added RPC SSL settings to SSLOptions and SparkTransportConf.
- [SPARK-45584] Fixed subquery run failure with TakeOrderedAndProjectExec.
- [SPARK-45541] Added SSLFactory.
- [SPARK-45430] FramelessOffsetWindowFunction no longer fails when IGNORE NULLS and offset > rowCount.
- [SPARK-45429] Added helper classes for SSL RPC communication.
- Operating system security updates.
- October 24, 2023
- [SPARK-45426] Added support for ReloadingX509TrustManager.
- Miscellaneous fixes.
- October 13, 2023
- Snowflake-jdbc dependency upgraded from 3.13.29 to 3.13.33.
- [SPARK-42553] Ensure at least one time unit after interval.
- [SPARK-45346] Parquet schema inference respects case sensitive flag when merging schema.
- [SPARK-45178] Fallback to running a single batch for Trigger.AvailableNow with unsupported sources rather than using the wrapper.
- [SPARK-45084] StateOperatorProgress to use an accurate, adequate shuffle partition number.
- September 12, 2023
- [SPARK-44873] Added support for alter view with nested columns in the Hive client.
- [SPARK-44718] Match ColumnVector memory-mode config default to OffHeapMemoryMode config value.
- [SPARK-43799] Added descriptor binary option to PySpark Protobuf API.
- Miscellaneous fixes.
- August 30, 2023
- [SPARK-44485] Optimized TreeNode.generateTreeString.
- [SPARK-44818] Fixed race for pending task interrupt issued before taskThread is initialized.
- [SPARK-44871][11.3-13.0] Fixed percentile_disc behavior.
- [SPARK-44714] Eased restriction of LCA resolution regarding queries.
- Operating system security updates.
- August 15, 2023
- [SPARK-44504] Maintenance task cleans up loaded providers on stop error.
- [SPARK-44464] Fixed applyInPandasWithStatePythonRunner to output rows that have Null as the first column value.
- Operating system security updates.
- July 29, 2023
- Fixed an issue where dbutils.fs.ls() returned INVALID_PARAMETER_VALUE.LOCATION_OVERLAP when called for a storage location path that clashed with another external or managed storage location.
- [SPARK-44199] CacheManager no longer refreshes the fileIndex unnecessarily.
- Operating system security updates.
- July 24, 2023
- [SPARK-44337] Fixed an issue where any field set to Any.getDefaultInstance caused parse errors.
- [SPARK-44136] Fixed an issue where StateManager would get materialized in an executor instead of the driver in FlatMapGroupsWithStateExec.
- Operating system security updates.
- June 23, 2023
- Operating system security updates.
- June 15, 2023
- Photonized approx_count_distinct.
- Snowflake-jdbc library is upgraded to 3.13.29 to address a security issue.
- [SPARK-43779] ParseToDate now loads EvalMode in the main thread.
- [SPARK-43156][SPARK-43098] Extended scalar subquery count error test with decorrelateInnerQuery turned off.
- Operating system security updates.
- June 2, 2023
- The JSON parser in failOnUnknownFields mode drops a record in DROPMALFORMED mode and fails directly in FAILFAST mode.
- Improve the performance of incremental updates with SHALLOW CLONE Iceberg and Parquet.
- Fixed an issue in Auto Loader where different source file formats were inconsistent when the provided schema did not include inferred partitions. This issue could cause unexpected failures when reading files with missing columns in the inferred partition schema.
- [SPARK-43404] Skip reusing the sst file for the same version of RocksDB state store to avoid the ID mismatch error.
- [SPARK-43413][11.3-13.0] Fixed IN subquery ListQuery nullability.
- [SPARK-43522] Fixed creating struct column name with index of array.
- [SPARK-43541] Propagate all Project tags in resolving of expressions and missing columns.
- [SPARK-43527] Fixed catalog.listCatalogs in PySpark.
- [SPARK-43123] Internal field metadata no longer leaks to catalogs.
- [SPARK-43340] Fixed missing stack trace field in eventlogs.
- [SPARK-42444] DataFrame.drop now handles duplicated columns correctly.
- [SPARK-42937] PlanSubqueries now sets InSubqueryExec#shouldBroadcast to true.
- [SPARK-43286] Updated aes_encrypt CBC mode to generate random IVs.
- [SPARK-43378] Properly close stream objects in deserializeFromChunkedBuffer.
- May 17, 2023
- Parquet scans are now robust against OOMs when scanning exceptionally structured files by dynamically adjusting batch size. File metadata is analyzed to preemptively lower batch size and is lowered again on task retries as a final safety net.
- If an Avro file was read with just the failOnUnknownFields option or with Auto Loader in the failOnNewColumns schema evolution mode, columns that have different data types would be read as null instead of throwing an error stating that the file cannot be read. These reads now fail and recommend that users use the rescuedDataColumn option.
- Auto Loader now does the following.
  - Correctly reads and no longer rescues Integer, Short, and Byte types if one of these data types is provided, but the Avro file suggests one of the other two types.
  - Prevents reading interval types as date or timestamp types to avoid getting corrupt dates.
  - Prevents reading Decimal types with lower precision.
- [SPARK-43172] Exposes host and token from Spark connect client.
- [SPARK-43293] __qualified_access_only is ignored in normal columns.
- [SPARK-43098] Fixed correctness COUNT bug when scalar subquery has a group by clause.
- [SPARK-43085] Support for column DEFAULT assignment for multi-part table names.
- [SPARK-43190] ListQuery.childOutput is now consistent with secondary output.
- [SPARK-43192] Removed user agent charset validation.
- Operating system security updates.
- April 25, 2023
- If a Parquet file was read with just the failOnUnknownFields option or with Auto Loader in the failOnNewColumns schema evolution mode, columns that had different data types would be read as null instead of throwing an error stating that the file cannot be read. These reads now fail and recommend that users use the rescuedDataColumn option.
- Auto Loader now correctly reads and no longer rescues Integer, Short, and Byte types if one of these data types is provided but the Parquet file suggests one of the other two types. When the rescued data column was previously enabled, the data type mismatch would cause columns to be rescued even though they were readable.
- [SPARK-43009] Parameterized sql() with Any constants
- [SPARK-42406] Terminate Protobuf recursive fields by dropping the field
- [SPARK-43038] Support the CBC mode by aes_encrypt()/aes_decrypt()
- [SPARK-42971] Change to print workdir if appDirs is null when worker handle WorkDirCleanup event
- [SPARK-43018] Fix bug for INSERT commands with timestamp literals
- Operating system security updates.
- April 11, 2023
- Support legacy data source formats in the SYNC command.
- Fixes an issue in the %autoreload behavior in notebooks outside of a repo.
- Fixed an issue where Auto Loader schema evolution can go into an infinite fail loop when a new column is detected in the schema of a nested JSON object.
- [SPARK-42928] Makes resolvePersistentFunction synchronized.
- [SPARK-42936] Fixes an LCA issue when the clause can be resolved directly by its child aggregate.
- [SPARK-42967] Fixes SparkListenerTaskStart.stageAttemptId when a task starts after the stage is canceled.
- Operating system security updates.
- March 29, 2023
- Databricks SQL now supports specifying default values for columns of Delta Lake tables, either at table creation time or afterward. Subsequent INSERT, UPDATE, DELETE, and MERGE commands can refer to any column’s default value using the explicit DEFAULT keyword. In addition, if any INSERT assignment has an explicit list of fewer columns than the target table, corresponding column default values are substituted for the remaining columns (or NULL if no default is specified).
  For example:
  CREATE TABLE t (first INT, second DATE DEFAULT CURRENT_DATE());
  INSERT INTO t VALUES (0, DEFAULT);
  INSERT INTO t VALUES (1, DEFAULT);
  SELECT first, second FROM t;
  > 0, 2023-03-28
  1, 2023-03-28
- Auto Loader now initiates at least one synchronous RocksDB log cleanup for Trigger.AvailableNow streams to check that the checkpoint can get regularly cleaned up for fast-running Auto Loader streams. This can cause some streams to take longer before they shut down, but it will save you storage costs and improve the Auto Loader experience in future runs.
- You can now modify a Delta table to add support to table features using DeltaTable.addFeatureSupport(feature_name).
- [SPARK-42794] Increase the lockAcquireTimeoutMs to 2 minutes for acquiring the RocksDB state store in Structured Streaming
- [SPARK-42521] Add NULLs for INSERTs with user-specified lists of fewer columns than the target table
- [SPARK-42702][SPARK-42623] Support parameterized query in subquery and CTE
- [SPARK-42668] Catch exception while trying to close the compressed stream in HDFSStateStoreProvider stop
- [SPARK-42403] JsonProtocol should handle null JSON strings
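The column-default substitution described in this update can be pictured with a small model. This is a pure-Python sketch under stated assumptions, not Delta Lake code: `resolve_insert` and the `DEFAULT` sentinel are hypothetical names, the default value is a fixed string rather than an expression evaluated at insert time, and `None` stands in for SQL NULL.

```python
from typing import Any, Dict, List

DEFAULT = object()  # stands in for the SQL DEFAULT keyword


def resolve_insert(table_columns: List[str],
                   column_defaults: Dict[str, Any],
                   values: Dict[str, Any]) -> List[Any]:
    """Fill omitted or DEFAULT columns with the declared default, else NULL (None)."""
    row = []
    for name in table_columns:
        value = values.get(name, DEFAULT)   # an omitted column behaves like DEFAULT
        if value is DEFAULT:
            value = column_defaults.get(name)  # None when no default is declared
        row.append(value)
    return row


columns = ["first", "second"]
defaults = {"second": "2023-03-28"}
explicit = resolve_insert(columns, defaults, {"first": 0, "second": DEFAULT})
omitted = resolve_insert(columns, defaults, {"second": "2023-04-01"})
print(explicit)  # [0, '2023-03-28']
print(omitted)   # [None, '2023-04-01']
```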
- March 8, 2023
- The error message “Failure to initialize configuration” has been improved to provide more context for the customer.
- There is a terminology change for adding features to a Delta table using the table property. The preferred syntax is now 'delta.feature.featureName'='supported' instead of 'delta.feature.featureName'='enabled'. For backward compatibility, using 'delta.feature.featureName'='enabled' still works and will continue to work.
- Starting from this release, it is possible to create/replace a table with an additional table property delta.ignoreProtocolDefaults to ignore protocol-related Spark configs, which includes default reader and writer versions and table features supported by default.
- [SPARK-42070] Change the default value of the argument of the Mask function from -1 to NULL
- [SPARK-41793] Incorrect result for window frames defined by a range clause on significant decimals
- [SPARK-42484] UnsafeRowUtils better error message
- [SPARK-42516] Always capture the session time zone config while creating views
- [SPARK-42635] Fix the TimestampAdd expression.
- [SPARK-42622] Turned off substitution in values
- [SPARK-42534] Fix DB2Dialect Limit clause
- [SPARK-42121] Add built-in table-valued functions posexplode, posexplode_outer, json_tuple and stack
- [SPARK-42045] ANSI SQL mode: Round/Bround should return an error on tiny/small/significant integer overflow
- Operating system security updates.
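The table-feature property syntax from the March 8, 2023 update can be sketched as a small string builder. The helper names, the table name, and the feature name `myFeature` below are hypothetical placeholders; only the property key shape and the two accepted values come from the notes above. The generated statement is printed rather than executed.

```python
def feature_property(feature_name: str, legacy_value: bool = False) -> str:
    """Build the table property; 'supported' is preferred, 'enabled' is the older spelling."""
    value = "enabled" if legacy_value else "supported"
    return f"'delta.feature.{feature_name}'='{value}'"


def alter_table_statement(table: str, feature_name: str) -> str:
    """Compose an ALTER TABLE statement that sets the feature property."""
    return f"ALTER TABLE {table} SET TBLPROPERTIES ({feature_property(feature_name)})"


statement = alter_table_statement("my_table", "myFeature")
print(statement)
# ALTER TABLE my_table SET TBLPROPERTIES ('delta.feature.myFeature'='supported')
```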
Databricks Runtime 11.3 LTS
See Databricks Runtime 11.3 LTS.
- November 29, 2023
- Fixed an issue where escaped underscores in getColumns operations originating from JDBC or ODBC clients were wrongly interpreted as wildcards.
- [SPARK-43973] Structured Streaming UI now displays failed queries correctly.
- [SPARK-45730] Improved time constraints for ReloadingX509TrustManagerSuite.
- [SPARK-45544] Integrated SSL support into TransportContext.
- [SPARK-45859] Made UDF objects in ml.functions lazy.
- [SPARK-43718] Fixed nullability for keys in USING joins.
- [SPARK-44846] Removed complex grouping expressions after RemoveRedundantAggregates.
- Operating system security updates.
- November 14, 2023
- Partition filters on Delta Lake streaming queries are pushed down before rate limiting to achieve better utilization.
- [SPARK-42205] Removed logging accumulables in Stage and Task start events.
- [SPARK-45545] SparkTransportConf inherits SSLOptions upon creation.
- Revert [SPARK-33861].
- [SPARK-45541] Added SSLFactory.
- [SPARK-45429] Added helper classes for SSL RPC communication.
- [SPARK-45584] Fixed subquery run failure with TakeOrderedAndProjectExec.
- [SPARK-45430] FramelessOffsetWindowFunction no longer fails when IGNORE NULLS and offset > rowCount.
- [SPARK-45427] Added RPC SSL settings to SSLOptions and SparkTransportConf.
- Operating system security updates.
- October 24, 2023
- [SPARK-45426] Added support for ReloadingX509TrustManager.
- Miscellaneous fixes.
- October 13, 2023
- Snowflake-jdbc dependency upgraded from 3.13.29 to 3.13.33.
- [SPARK-45178] Fallback to running a single batch for Trigger.AvailableNow with unsupported sources rather than using the wrapper.
- [SPARK-45084] StateOperatorProgress to use an accurate, adequate shuffle partition number.
- [SPARK-45346] Parquet schema inference now respects case-sensitive flag when merging a schema.
- Operating system security updates.
- September 10, 2023
- Miscellaneous fixes.
- August 30, 2023
- [SPARK-44818] Fixed race for pending task interrupt issued before taskThread is initialized.
- [SPARK-44871][11.3-13.0] Fixed percentile_disc behavior.
- Operating system security updates.
- August 15, 2023
- [SPARK-44485] Optimized TreeNode.generateTreeString.
- [SPARK-44504] Maintenance task cleans up loaded providers on stop error.
- [SPARK-44464] Fixed applyInPandasWithStatePythonRunner to output rows that have Null as the first column value.
- Operating system security updates.
- July 27, 2023
- Fixed an issue where dbutils.fs.ls() returned INVALID_PARAMETER_VALUE.LOCATION_OVERLAP when called for a storage location path that clashed with another external or managed storage location.
- [SPARK-44199] CacheManager no longer refreshes the fileIndex unnecessarily.
- Operating system security updates.
- July 24, 2023
- [SPARK-44136] Fixed an issue that StateManager can get materialized in executor instead of driver in FlatMapGroupsWithStateExec.
- Operating system security updates.
- June 23, 2023
- Operating system security updates.
- June 15, 2023
- Photonized approx_count_distinct.
- Snowflake-jdbc library is upgraded to 3.13.29 to address a security issue.
- [SPARK-43779] ParseToDate now loads EvalMode in the main thread.
- [SPARK-40862] Support non-aggregated subqueries in RewriteCorrelatedScalarSubquery
- [SPARK-43156][SPARK-43098] Extended scalar subquery count bug test with decorrelateInnerQuery turned off.
- [SPARK-43098] Fix correctness COUNT bug when scalar subquery has a group by clause
- Operating system security updates.
- June 2, 2023
- The JSON parser in failOnUnknownFields mode drops a record in DROPMALFORMED mode and fails directly in FAILFAST mode.
- Improve the performance of incremental updates with SHALLOW CLONE Iceberg and Parquet.
- Fixed an issue in Auto Loader where different source file formats were inconsistent when the provided schema did not include inferred partitions. This issue could cause unexpected failures when reading files with missing columns in the inferred partition schema.
- [SPARK-43404] Skip reusing the sst file for the same version of RocksDB state store to avoid the ID mismatch error.
- [SPARK-43527] Fixed catalog.listCatalogs in PySpark.
- [SPARK-43413][11.3-13.0] Fixed IN subquery ListQuery nullability.
- [SPARK-43340] Fixed missing stack trace field in eventlogs.
Databricks Runtime 10.4 LTS
See Databricks Runtime 10.4 LTS.
- November 29, 2023
- [SPARK-45544] Integrated SSL support into TransportContext.
- [SPARK-45859] Made UDF objects in ml.functions lazy.
- [SPARK-43718] Fixed nullability for keys in USING joins.
- [SPARK-45730] Improved time constraints for ReloadingX509TrustManagerSuite.
- [SPARK-42205] Removed logging accumulables in Stage and Task start events.
- [SPARK-44846] Removed complex grouping expressions after RemoveRedundantAggregates.
- Operating system security updates.
- November 14, 2023
- [SPARK-45541] Added SSLFactory.
- [SPARK-45545] SparkTransportConf inherits SSLOptions upon creation.
- [SPARK-45427] Added RPC SSL settings to SSLOptions and SparkTransportConf.
- [SPARK-45429] Added helper classes for SSL RPC communication.
- [SPARK-45584] Fixed subquery run failure with TakeOrderedAndProjectExec.
- Revert [SPARK-33861].
- Operating system security updates.
- October 24, 2023
- [SPARK-45426] Added support for ReloadingX509TrustManager.
- Operating system security updates.
- October 13, 2023
- [SPARK-45084] StateOperatorProgress to use an accurate, adequate shuffle partition number.
- [SPARK-45178] Fallback to running a single batch for Trigger.AvailableNow with unsupported sources rather than using the wrapper.
- Operating system security updates.
- September 10, 2023
- Miscellaneous fixes.
- August 30, 2023
- [SPARK-44818] Fixed race for pending task interrupt issued before taskThread is initialized.
- Operating system security updates.
- August 15, 2023
- [SPARK-44504] Maintenance task cleans up loaded providers on stop error.
- [SPARK-43973] Structured Streaming UI now displays failed queries correctly.
- Operating system security updates.
- June 23, 2023
- Operating system security updates.
- June 15, 2023
- Snowflake-jdbc library is upgraded to 3.13.29 to address a security issue.
- [SPARK-43098] Fix correctness COUNT bug when scalar subquery has a group by clause
- [SPARK-40862] Support non-aggregated subqueries in RewriteCorrelatedScalarSubquery
- [SPARK-43156][SPARK-43098] Extended scalar subquery count test with decorrelateInnerQuery turned off.
- Operating system security updates.
- June 2, 2023
- The JSON parser in failOnUnknownFields mode drops a record in DROPMALFORMED mode and fails directly in FAILFAST mode.
- Fixed an issue in JSON rescued data parsing to prevent UnknownFieldException.
- Fixed an issue in Auto Loader where different source file formats were inconsistent when the provided schema did not include inferred partitions. This issue could cause unexpected failures when reading files with missing columns in the inferred partition schema.
- [SPARK-43404] Skip reusing the sst file for the same version of RocksDB state store to avoid the ID mismatch error.
- [SPARK-43413] Fixed IN subquery ListQuery nullability.
- Operating system security updates.
- May 17, 2023
- Parquet scans are now robust against OOMs when scanning exceptionally structured files by dynamically adjusting batch size. File metadata is analyzed to preemptively lower batch size and is lowered again on task retries as a final safety net.
- [SPARK-41520] Split AND_OR tree pattern to separate AND and OR.
- [SPARK-43190] ListQuery.childOutput is now consistent with secondary output.
- Operating system security updates.
- April 25, 2023
- [SPARK-42928] Make resolvePersistentFunction synchronized.
- Operating system security updates.
- April 11, 2023
- Fixed an issue where Auto Loader schema evolution can go into an infinite fail loop when a new column is detected in the schema of a nested JSON object.
- [SPARK-42937] PlanSubqueries now sets InSubqueryExec#shouldBroadcast to true.
- [SPARK-42967] Fix SparkListenerTaskStart.stageAttemptId when a task is started after the stage is canceled.
- March 29, 2023
- [SPARK-42668] Catch exception while trying to close the compressed stream in HDFSStateStoreProvider stop
- [SPARK-42635] Fix the …
- Operating system security updates.
- March 14, 2023
- [SPARK-41162] Fix anti- and semi-join for self-join with aggregations
- [SPARK-33206] Fix shuffle index cache weight calculation for small index files
- [SPARK-42484] Improved the UnsafeRowUtils error message
- Miscellaneous fixes.
- February 28, 2023
- Support generated column for yyyy-MM-dd date_format. This change supports partition pruning for yyyy-MM-dd as a date_format in generated columns.
- Users can now read and write specific Delta tables requiring Reader version 3 and Writer version 7, using Databricks Runtime 9.1 LTS or later. To succeed, table features listed in the tables’ protocol must be supported by the current version of Databricks Runtime.
- Operating system security updates.
- February 16, 2023
- [SPARK-30220] Enable using Exists/In subqueries outside of the Filter node
- Operating system security updates.
- January 31, 2023
- Table types of JDBC tables are now EXTERNAL by default.
- January 18, 2023
- Azure Synapse connector returns a more descriptive error message when a column name contains invalid characters such as whitespace or semicolons. In such cases, the following message will be returned: Azure Synapse Analytics failed to run the JDBC query produced by the connector. Check column names do not include not valid characters such as ';' or white space.
- [SPARK-38277] Clear write batch after RocksDB state store’s commit
- [SPARK-41199] Fix metrics issue when DSv1 streaming source and DSv2 streaming source are co-used
- [SPARK-41198] Fix metrics in streaming query having CTE and DSv1 streaming source
- [SPARK-41339] Close and recreate RocksDB write batch instead of just clearing
- [SPARK-41732] Apply tree-pattern based pruning for the rule SessionWindowing
- Operating system security updates.
- November 29, 2022
- Users can configure leading and trailing whitespaces’ behavior when writing data using the Redshift connector. The following options have been added to control white space handling:
  - csvignoreleadingwhitespace, when set to true, removes leading white space from values during writes when tempformat is set to CSV or CSV GZIP. Whitespaces are retained when the config is set to false. By default, the value is true.
  - csvignoretrailingwhitespace, when set to true, removes trailing white space from values during writes when tempformat is set to CSV or CSV GZIP. Whitespaces are retained when the config is set to false. By default, the value is true.
- Fixed an issue with JSON parsing in Auto Loader when all columns were left as strings (cloudFiles.inferColumnTypes was not set or set to false) and the JSON contained nested objects.
- Operating system security updates.
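The effect of the two Redshift connector options can be modeled on a single value. This is a minimal sketch, not the connector's code: the function `prepare_csv_value` is hypothetical, its parameters simply reuse the option names and defaults listed above, and Python's `lstrip`/`rstrip` are assumed to approximate what the connector treats as white space.

```python
def prepare_csv_value(value: str,
                      csvignoreleadingwhitespace: bool = True,
                      csvignoretrailingwhitespace: bool = True) -> str:
    """Model how a string value is trimmed before a CSV or CSV GZIP write."""
    if csvignoreleadingwhitespace:
        value = value.lstrip()
    if csvignoretrailingwhitespace:
        value = value.rstrip()
    return value


default_behavior = prepare_csv_value("  padded  ")
keep_leading = prepare_csv_value("  padded  ", csvignoreleadingwhitespace=False)
print(repr(default_behavior))  # 'padded'
print(repr(keep_leading))      # '  padded'
```

Because both options default to true, values that depend on surrounding white space need the relevant option set to false on the write.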
- November 15, 2022
- Upgraded Apache commons-text to 1.10.0.
- [SPARK-40646] JSON parsing for structs, maps, and arrays has been fixed so when a part of a record does not match the schema, the rest of the record can still be parsed correctly instead of returning nulls. To opt-in for the improved behavior, set spark.sql.json.enablePartialResults to true. The flag is turned off by default to preserve the original behavior.
- [SPARK-40292] Fix column names in arrays_zip function when arrays are referenced from nested structs
- Operating system security updates.
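Opting in to the partial-results behavior from [SPARK-40646] is a one-line session setting. The sketch below wraps it in a hypothetical helper, `enable_partial_json_results`; the two stub classes exist only so the snippet runs without a cluster and are not part of any Spark API. On a real cluster, pass the notebook's `spark` session instead.

```python
class _StubConf:
    """Stand-in for spark.conf so the sketch runs without a cluster."""

    def __init__(self):
        self.values = {}

    def set(self, key, value):
        self.values[key] = value


class _StubSession:
    """Stand-in for a Spark session exposing only .conf."""

    def __init__(self):
        self.conf = _StubConf()


def enable_partial_json_results(spark) -> None:
    # The flag is off by default, so the opt-in must be explicit.
    spark.conf.set("spark.sql.json.enablePartialResults", "true")


session = _StubSession()
enable_partial_json_results(session)
print(session.conf.values)
```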
- November 1, 2022
- Fixed an issue where if a Delta table had a user-defined column named _change_type, but Change data feed was turned off on that table, data in that column would incorrectly fill with NULL values when running MERGE.
- Fixed an issue with Auto Loader where a file can be duplicated in the same micro-batch when allowOverwrites is enabled
- [SPARK-40697] Add read-side char padding to cover external data files
- [SPARK-40596] Populate ExecutorDecommission with messages in ExecutorDecommissionInfo
- Operating system security updates.
- October 18, 2022
- Operating system security updates.
- October 5, 2022
- [SPARK-40468] Fix column pruning in CSV when _corrupt_record is selected.
- Operating system security updates.
- September 22, 2022
- Users can call spark.conf.set("spark.databricks.io.listKeysWithPrefix.azure.enabled", "true") to re-enable the built-in listing for Auto Loader on ADLS Gen2. Built-in listing was previously turned off due to performance issues, but turning it off may have led to increased storage costs for customers.
- [SPARK-40315] Add hashCode() for Literal of ArrayBasedMapData
- [SPARK-40213] Support ASCII value conversion for Latin-1 characters
- [SPARK-40380] Fix constant-folding of InvokeLike to avoid non-serializable literal embedded in the plan
- [SPARK-38404] Improve CTE resolution when a nested CTE references an outer CTE
- [SPARK-40089] Fix sorting for some Decimal types
- [SPARK-39887] RemoveRedundantAliases should keep aliases that make the output of projection nodes unique
- September 6, 2022
- [SPARK-40235] Use interruptible lock instead of synchronized in Executor.updateDependencies()
- [SPARK-40218] GROUPING SETS should preserve the grouping columns
- [SPARK-39976] ArrayIntersect should handle null in left expression correctly
- [SPARK-40053] Add assume to dynamic cancel cases which require Python runtime environment
- [SPARK-35542] Fix: Bucketizer created for multiple columns with parameters splitsArray, inputCols and outputCols can not be loaded after saving it
- [SPARK-40079] Add Imputer inputCols validation for empty input case
- August 24, 2022
- [SPARK-39983] Do not cache unserialized broadcast relations on the driver
- [SPARK-39775] Disable validate default values when parsing Avro schemas
- [SPARK-39962] Apply projection when group attributes are empty
- [SPARK-37643] when charVarcharAsString is true, for char datatype predicate query should skip rpadding rule
- Operating system security updates.
- August 9, 2022
- [SPARK-39847] Fix race condition in RocksDBLoader.loadLibrary() if the caller thread is interrupted
- [SPARK-39731] Fix issue in CSV and JSON data sources when parsing dates in “yyyyMMdd” format with CORRECTED time parser policy
- Operating system security updates.
- July 27, 2022
- [SPARK-39625] Add Dataset.as(StructType)
- [SPARK-39689] Support 2-chars lineSep in CSV data source
- [SPARK-39104] InMemoryRelation#isCachedColumnBuffersLoaded should be thread-safe
- [SPARK-39570] Inline table should allow expressions with alias
- [SPARK-39702] Reduce memory overhead of TransportCipher$EncryptedMessage by using a shared byteRawChannel
- [SPARK-39575] add ByteBuffer#rewind after ByteBuffer#get in AvroDeserializer
- [SPARK-39476] Disable Unwrap cast optimize when casting from Long to Float/ Double or from Integer to Float
- [SPARK-38868] Don’t propagate exceptions from filter predicate when optimizing outer joins
- Operating system security updates.
- July 20, 2022
- Make Delta MERGE operation results consistent when the source is non-deterministic.
- [SPARK-39355] Single column uses quoted to construct UnresolvedAttribute
- [SPARK-39548] CreateView Command with a window clause query hit a wrong window definition not found issue
- [SPARK-39419] Fix ArraySort to throw an exception when the comparator returns null
- Turned off Auto Loader’s use of built-in cloud APIs for directory listing on Azure.
- Operating system security updates.
- July 5, 2022
- [SPARK-39376] Hide duplicated columns in star expansion of subquery alias from NATURAL/USING JOIN
- Operating system security updates.
- June 15, 2022
- [SPARK-39283] Fix deadlock between TaskMemoryManager and UnsafeExternalSorter.SpillableIterator
- [SPARK-39285] Spark should not check field names when reading files
- [SPARK-34096] Improve performance for nth_value ignore nulls over offset window
- [SPARK-36718] Fix the isExtractOnly check in CollapseProject
- June 2, 2022
- [SPARK-39093] Avoid codegen compilation error when dividing year-month intervals or day-time intervals by an integral
- [SPARK-38990] Avoid NullPointerException when evaluating date_trunc/trunc format as a bound reference
- Operating system security updates.
- May 18, 2022
- Fixes a potential built-in memory leak in Auto Loader.
- [SPARK-38918] Nested column pruning should filter out attributes that do not belong to the current relation
- [SPARK-37593] Reduce default page size by LONG_ARRAY_OFFSET if G1GC and ON_HEAP are used
- [SPARK-39084] Fix df.rdd.isEmpty() by using TaskContext to stop iterator on task completion
- [SPARK-32268] Add ColumnPruning in injectBloomFilter
- [SPARK-38974] Filter registered functions with a given database name in list functions
- [SPARK-38931] Create root dfs directory for RocksDBFileManager with an unknown number of keys on 1st checkpoint
- Operating system security updates.
- April 19, 2022
- Upgraded Java AWS SDK from version 1.11.655 to 1.12.1899.
- Fixed an issue with notebook-scoped libraries not working in batch streaming jobs.
- [SPARK-38616] Keep track of SQL query text in Catalyst TreeNode
- Operating system security updates.
- April 6, 2022
- The following Spark SQL functions are now available with this release:
  - timestampadd() and dateadd(): Add a time duration in a specified unit to a timestamp expression.
  - timestampdiff() and datediff(): Calculate the time difference between two timestamp expressions in a specified unit.
- Parquet-MR has been upgraded to 1.12.2
- Improved support for comprehensive schemas in parquet files
- [SPARK-38631] Uses Java-based implementation for un-tarring at Utils.unpack
- [SPARK-38509][SPARK-38481] Cherry-pick three timestampadd/diff changes.
- [SPARK-38523] Fix referring to the corrupt record column from CSV
- [SPARK-38237] Allow ClusteredDistribution to require full clustering keys
- [SPARK-38437] Lenient serialization of datetime from datasource
- [SPARK-38180] Allow safe up-cast expressions in correlated equality predicates
- [SPARK-38155] Disallow distinct aggregate in lateral subqueries with unsupported predicates
- Operating system security updates.
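The timestampadd() and timestampdiff() functions listed in the April 6, 2022 update can be pictured with a plain-Python analogy. This is a sketch only, not Spark's implementation: the helper names, the unit table, and the choice to truncate partial units toward zero are all assumptions made for illustration, and only fixed-length units are covered.

```python
from datetime import datetime, timedelta

# Fixed-length units only. Calendar units such as MONTH or YEAR are left out
# because their length varies. This table is an illustration, not Spark's.
_UNITS = {
    "SECOND": timedelta(seconds=1),
    "MINUTE": timedelta(minutes=1),
    "HOUR": timedelta(hours=1),
    "DAY": timedelta(days=1),
}


def timestamp_add(unit, amount, ts):
    """Return ts shifted by `amount` of `unit` (analogy for timestampadd)."""
    return ts + amount * _UNITS[unit.upper()]


def timestamp_diff(unit, start, end):
    """Return the whole number of `unit`s between start and end.

    Assumption for this sketch: partial units are truncated toward zero.
    """
    delta = end - start
    whole = abs(delta) // _UNITS[unit.upper()]
    return whole if delta >= timedelta(0) else -whole
```

For example, timestamp_add("HOUR", 2, datetime(2022, 4, 6, 10, 0)) yields 12:00 on the same day, and timestamp_diff("DAY", datetime(2022, 4, 1), datetime(2022, 4, 6, 12, 0)) yields 5.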
Databricks Runtime 9.1 LTS
See Databricks Runtime 9.1 LTS.
- November 29, 2023
- [SPARK-45859] Made UDF objects in ml.functions lazy.
- [SPARK-45544] Integrated SSL support into TransportContext.
- [SPARK-45730] Improved time constraints for ReloadingX509TrustManagerSuite.
- Operating system security updates.
- November 14, 2023
- [SPARK-45545] SparkTransportConf inherits SSLOptions upon creation.
- [SPARK-45429] Added helper classes for SSL RPC communication.
- [SPARK-45427] Added RPC SSL settings to SSLOptions and SparkTransportConf.
- [SPARK-45584] Fixed subquery run failure with TakeOrderedAndProjectExec.
- [SPARK-45541] Added SSLFactory.
- [SPARK-42205] Removed logging accumulables in Stage and Task start events.
- Operating system security updates.
- October 24, 2023
- [SPARK-45426] Added support for ReloadingX509TrustManager.
- Operating system security updates.
- October 13, 2023
- Operating system security updates.
- September 10, 2023
- Miscellaneous fixes.
- August 30, 2023
- Operating system security updates.
- August 15, 2023
- Operating system security updates.
- June 23, 2023
- Snowflake-jdbc library is upgraded to 3.13.29 to address a security issue.
- Operating system security updates.
- June 15, 2023
- [SPARK-43098] Fix correctness COUNT bug when scalar subquery has a group by clause
- [SPARK-43156][SPARK-43098] Extend scalar subquery count bug test with decorrelateInnerQuery turned off.
- [SPARK-40862] Support non-aggregated subqueries in RewriteCorrelatedScalarSubquery
- Operating system security updates.
- June 2, 2023
- The JSON parser in failOnUnknownFields mode drops a record in DROPMALFORMED mode and fails directly in FAILFAST mode.
- Fixed an issue in JSON rescued data parsing to prevent UnknownFieldException.
- Fixed an issue in Auto Loader where different source file formats were inconsistent when the provided schema did not include inferred partitions. This issue could cause unexpected failures when reading files with missing columns in the inferred partition schema.
- [SPARK-37520] Add the startswith() and endswith() string functions
- [SPARK-43413] Fixed IN subquery ListQuery nullability.
- Operating system security updates.
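The JSON parser behavior described in the June 2, 2023 update can be emulated in plain Python to show the difference between the two modes. This is a minimal sketch under stated assumptions: the function name parse_records, its parameters, and the use of ValueError are hypothetical and are not Spark APIs. Only the mode names and the drop-versus-fail behavior come from the note above.

```python
import json


def parse_records(lines, known_fields, mode="DROPMALFORMED"):
    """Parse JSON lines, treating any field outside known_fields as an error.

    Emulates the described behavior only: DROPMALFORMED skips the offending
    record, FAILFAST raises on the first one. Any other mode keeps the record.
    """
    rows = []
    for line in lines:
        record = json.loads(line)
        unknown = set(record) - set(known_fields)
        if unknown:
            if mode == "FAILFAST":
                # Fail directly instead of continuing with the remaining lines.
                raise ValueError(f"unknown fields: {sorted(unknown)}")
            if mode == "DROPMALFORMED":
                # Drop this record and keep reading.
                continue
        rows.append(record)
    return rows
```

With the input lines '{"id": 1}' and '{"id": 2, "extra": true}' and known_fields={"id"}, the default mode returns only the first record, while mode="FAILFAST" raises.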
- May 17, 2023
- Operating system security updates.
- April 25, 2023
- Operating system security updates.
- April 11, 2023
- Fixed an issue where Auto Loader schema evolution can go into an infinite fail loop when a new column is detected in the schema of a nested JSON object.
- [SPARK-42967] Fix SparkListenerTaskStart.stageAttemptId when a task is started after the stage is canceled.
- March 29, 2023
- Operating system security updates.
- March 14, 2023
- [SPARK-42484] Improved error message for UnsafeRowUtils.
- Miscellaneous fixes.
- February 28, 2023
- Users can now read and write specific Delta tables requiring Reader version 3 and Writer version 7, using Databricks Runtime 9.1 LTS or later. To succeed, table features listed in the tables’ protocol must be supported by the current version of Databricks Runtime.
- Operating system security updates.
- February 16, 2023
- Operating system security updates.
- January 31, 2023
- Table types of JDBC tables are now EXTERNAL by default.
- January 18, 2023
- Operating system security updates.
- November 29, 2022
- Fixed an issue with JSON parsing in Auto Loader when all columns were left as strings (cloudFiles.inferColumnTypes was not set or set to false) and the JSON contained nested objects.
- Operating system security updates.
- November 15, 2022
- Upgraded Apache commons-text to 1.10.0.
- Operating system security updates.
- Miscellaneous fixes.
- November 1, 2022
- Fixed an issue where if a Delta table had a user-defined column named _change_type, but Change data feed was turned off on that table, data in that column would incorrectly fill with NULL values when running MERGE.
- Fixed an issue with Auto Loader where a file can be duplicated in the same micro-batch when allowOverwrites is enabled.
- [SPARK-40596] Populate ExecutorDecommission with messages in ExecutorDecommissionInfo
- Operating system security updates.
- October 18, 2022
- Operating system security updates.
- October 5, 2022
- Miscellaneous fixes.
- Operating system security updates.
- September 22, 2022
- Users can set spark.conf.set("spark.databricks.io.listKeysWithPrefix.azure.enabled", "true") to re-enable the built-in listing for Auto Loader on ADLS Gen2. Built-in listing was previously turned off due to performance issues, but this may have led to increased storage costs for customers.
- [SPARK-40315] Add hashCode() for Literal of ArrayBasedMapData
- [SPARK-40089] Fix sorting for some Decimal types
- [SPARK-39887] RemoveRedundantAliases should keep aliases that make the output of projection nodes unique
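The Auto Loader listing setting from the September 22, 2022 update could be applied from a notebook roughly as follows. This is a minimal sketch: the helper name enable_builtin_listing is hypothetical, and spark is assumed to be the SparkSession that a notebook already provides. Only the configuration key and value are taken from the note above.

```python
# Configuration key quoted in the release note above.
LIST_KEYS_CONF = "spark.databricks.io.listKeysWithPrefix.azure.enabled"


def enable_builtin_listing(spark):
    """Re-enable built-in listing for Auto Loader on ADLS Gen2.

    `spark` is assumed to be an active SparkSession. This helper is
    hypothetical and is not part of any Databricks API.
    """
    spark.conf.set(LIST_KEYS_CONF, "true")
    # Read the value back so the caller can confirm what is now in effect.
    return spark.conf.get(LIST_KEYS_CONF)
```

Because the setting is applied through the session configuration, it is worth confirming the returned value before starting the stream.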
- September 6, 2022
- [SPARK-40235] Use interruptible lock instead of synchronized in Executor.updateDependencies()
- [SPARK-35542] Fix: Bucketizer created for multiple columns with parameters splitsArray, inputCols and outputCols can not be loaded after saving it
- [SPARK-40079] Add Imputer inputCols validation for empty input case
- August 24, 2022
- [SPARK-39666] Use UnsafeProjection.create to respect spark.sql.codegen.factoryMode in ExpressionEncoder
- [SPARK-39962] Apply projection when group attributes are empty
- Operating system security updates.
- August 9, 2022
- Operating system security updates.
- July 27, 2022
- Make Delta MERGE operation results consistent when the source is non-deterministic.
- [SPARK-39689] Support for 2-chars lineSep in CSV data source
- [SPARK-39575] Added ByteBuffer#rewind after ByteBuffer#get in AvroDeserializer.
- [SPARK-37392] Fixed the performance error for catalyst optimizer.
- Operating system security updates.
- July 13, 2022
- [SPARK-39419] ArraySort throws an exception when the comparator returns null.
- Turned off Auto Loader’s use of built-in cloud APIs for directory listing on Azure.
- Operating system security updates.
- July 5, 2022
- Operating system security updates.
- Miscellaneous fixes.
- June 15, 2022
- [SPARK-39283] Fix deadlock between TaskMemoryManager and UnsafeExternalSorter.SpillableIterator.
- June 2, 2022
- [SPARK-34554] Implement the copy() method in ColumnarMap.
- Operating system security updates.
- May 18, 2022
- Fixed a potential built-in memory leak in Auto Loader.
- Upgrade AWS SDK version from 1.11.655 to 1.11.678.
- [SPARK-38918] Nested column pruning should filter out attributes that do not belong to the current relation
- [SPARK-39084] Fix df.rdd.isEmpty() by using TaskContext to stop iterator on task completion
- Operating system security updates.
- April 19, 2022
- Operating system security updates.
- Miscellaneous fixes.
- April 6, 2022
- [SPARK-38631] Uses Java-based implementation for un-tarring at Utils.unpack
- Operating system security updates.
- March 22, 2022
- Changed the current working directory of notebooks on High Concurrency clusters with either table access control or credential passthrough enabled to the user’s home directory. Previously, the active directory was /databricks/driver.
- [SPARK-38437] Lenient serialization of datetime from datasource
- [SPARK-38180] Allow safe up-cast expressions in correlated equality predicates
- [SPARK-38155] Disallow distinct aggregate in lateral subqueries with unsupported predicates
- [SPARK-27442] Removed a check field when reading or writing data in a parquet.
- March 14, 2022
- [SPARK-38236] Absolute file paths specified in the create/alter table are treated as relative
- [SPARK-34069] Interrupt task thread if local property SPARK_JOB_INTERRUPT_ON_CANCEL is set to true.
- February 23, 2022
- [SPARK-37859] SQL tables created with JDBC with Spark 3.1 are not readable with Spark 3.2.
- February 8, 2022
- [SPARK-27442] Removed a check field when reading or writing data in a parquet.
- Operating system security updates.
- February 1, 2022
- Operating system security updates.
- January 26, 2022
- Fixed an issue where concurrent transactions on Delta tables could commit in a non-serializable order under certain rare conditions.
- Fixed an issue where the OPTIMIZE command could fail when the ANSI SQL dialect was enabled.
- January 19, 2022
- Minor fixes and security enhancements.
- Operating system security updates.
- November 4, 2021
- Fixed an issue that could cause Structured Streaming streams to fail with an ArrayIndexOutOfBoundsException.
- Fixed a race condition that might cause a query failure with an IOException like java.io.IOException: No FileSystem for scheme or that might cause modifications to sparkContext.hadoopConfiguration to not take effect in queries.
- The Apache Spark Connector for Delta Sharing was upgraded to 0.2.0.
- October 20, 2021
- Upgraded BigQuery connector from 0.18.1 to 0.22.2. This adds support for the BigNumeric type.
Databricks Runtime 13.0 (EoS)
See Databricks Runtime 13.0 (EoS).
October 13, 2023
- Snowflake-jdbc dependency upgraded from 3.13.29 to 3.13.33.
- [SPARK-42553][SQL] Ensure at least one time unit after interval.
- [SPARK-45178] Fallback to running a single batch for Trigger.AvailableNow with unsupported sources rather than using wrapper.
- [SPARK-44658][CORE] ShuffleStatus.getMapStatus returns None instead of Some(null).
- [SPARK-42205][CORE] Remove logging of Accumulables in Task/Stage start events in JsonProtocol.
- Operating system security updates.
September 12, 2023
- [SPARK-44485][SQL] Optimize TreeNode.generateTreeString.
- [SPARK-44718][SQL] Match ColumnVector memory-mode config default to OffHeapMemoryMode config value.
- Miscellaneous bug fixes.
August 30, 2023
- [SPARK-44818][Backport] Fixed race for pending task interrupt issued before taskThread is initialized.
- [SPARK-44714] Ease restriction of LCA resolution regarding queries.
- [SPARK-44245][PYTHON] pyspark.sql.dataframe sample() doctests are now illustrative-only.
- [SPARK-44871][11.3-13.0][SQL] Fixed percentile_disc behavior.
- Operating system security updates.
August 15, 2023
- [SPARK-44643][SQL][PYTHON] Fix Row.__repr__ when the row is empty.
- [SPARK-44504][Backport] Maintenance task cleans up loaded providers on stop error.
- [SPARK-44479][CONNECT][PYTHON] Fixed protobuf conversion from an empty struct type.
- [SPARK-44464][SS] Fixed applyInPandasWithStatePythonRunner to output rows that have Null as first column value.
- Miscellaneous bug fixes.
July 29, 2023
- Fixed a bug where dbutils.fs.ls() returned INVALID_PARAMETER_VALUE.LOCATION_OVERLAP when called for a storage location path that clashed with another external or managed storage location.
- [SPARK-44199] CacheManager no longer refreshes the fileIndex unnecessarily.
- Operating system security updates.
July 24, 2023
- [SPARK-44337][PROTOBUF] Fixed an issue where any field set to Any.getDefaultInstance caused parse errors.
- [SPARK-44136][SS] Fixed an issue where StateManager would get materialized in an executor instead of driver in FlatMapGroupsWithStateExec.
- Revert [SPARK-42323][SQL] Assign name to _LEGACY_ERROR_TEMP_2332.
- Operating system security updates.
June 23, 2023
- Operating system security updates.
June 15, 2023
- Photonized approx_count_distinct.
- Snowflake-jdbc library is upgraded to 3.13.29 to address a security issue.
- [SPARK-43156][SPARK-43098][SQL] Extend scalar subquery count bug test with decorrelateInnerQuery disabled
- [SPARK-43779][SQL] ParseToDate now loads EvalMode in the main thread.
- [SPARK-42937][SQL] PlanSubqueries should set InSubqueryExec#shouldBroadcast to true
- Operating system security updates.
June 2, 2023
- The JSON parser in failOnUnknownFields mode drops a record in DROPMALFORMED mode and fails directly in FAILFAST mode.
- Improve the performance of incremental update with SHALLOW CLONE Iceberg and Parquet.
- Fixed an issue in Auto Loader where different source file formats were inconsistent when the provided schema did not include inferred partitions. This issue could cause unexpected failures when reading files with missing columns in the inferred partition schema.
- [SPARK-43404][Backport] Skip reusing sst file for same version of RocksDB state store to avoid ID mismatch error.
- [SPARK-43340][CORE] Fixed missing stack trace field in eventlogs.
- [SPARK-43300][CORE] NonFateSharingCache wrapper for Guava Cache.
- [SPARK-43378][CORE] Properly close stream objects in deserializeFromChunkedBuffer.
- [SPARK-16484][SQL] Use 8-bit registers for representing DataSketches.
- [SPARK-43522][SQL] Fixed creating struct column name with index of array.
- [SPARK-43413][11.3-13.0][SQL] Fixed IN subquery ListQuery nullability.
- [SPARK-43043][CORE] Improved MapOutputTracker.updateMapOutput performance.
- [SPARK-16484][SQL] Added support for DataSketches HllSketch.
- [SPARK-43123][SQL] Internal field metadata no longer leaks to catalogs.
- [SPARK-42851][SQL] Guard EquivalentExpressions.addExpr() with supportedExpression().
- [SPARK-43336][SQL] Casting between Timestamp and TimestampNTZ requires timezone.
- [SPARK-43286][SQL] Updated aes_encrypt CBC mode to generate random IVs.
- [SPARK-42852][SQL] Reverted NamedLambdaVariable related changes from EquivalentExpressions.
- [SPARK-43541][SQL] Propagate all Project tags in resolving of expressions and missing columns.
- [SPARK-43527][PYTHON] Fixed catalog.listCatalogs in PySpark.
- Operating system security updates.
May 31, 2023
- Default optimized write support for Delta tables registered in Unity Catalog has expanded to include CTAS statements and INSERT operations for partitioned tables. This behavior aligns to defaults on SQL warehouses. See Optimized writes for Delta Lake on Azure Databricks.
May 17, 2023
- Fixed a regression where _metadata.file_path and _metadata.file_name would return incorrectly formatted strings. For example, a path with spaces is now represented as s3://test-bucket/some%20directory/some%20data.csv instead of s3://test-bucket/some directory/some data.csv.
- Parquet scans are now robust against OOMs when scanning exceptionally structured files by dynamically adjusting batch size. File metadata is analyzed to preemptively lower batch size and is lowered again on task retries as a final safety net.
- If an Avro file was read with just the failOnUnknownFields option or with Auto Loader in the failOnNewColumns schema evolution mode, columns that have different data types would be read as null instead of throwing an error stating that the file cannot be read. These reads now fail and recommend users to use the rescuedDataColumn option.
- Auto Loader now does the following.
  - Correctly reads and no longer rescues Integer, Short, Byte types if one of these data types is provided, but the Avro file suggests one of the other two types.
  - Prevents reading interval types as date or timestamp types to avoid getting corrupt dates.
  - Prevents reading Decimal types with lower precision.
- [SPARK-43172] [CONNECT] Exposes host and token from Spark connect client.
- [SPARK-43293][SQL] __qualified_access_only is ignored in normal columns.
- [SPARK-43098][SQL] Fixed correctness COUNT bug when scalar subquery has a group by clause.
- [SPARK-43085][SQL] Support for column DEFAULT assignment for multi-part table names.
- [SPARK-43190][SQL] ListQuery.childOutput is now consistent with secondary output.
- [SPARK-43192][CONNECT] Removed user agent charset validation.
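The _metadata.file_path example in the May 17, 2023 update shows a percent-encoded path. The same transformation can be reproduced with Python's standard library, which may help when comparing metadata values against raw storage paths. This is only an illustration of the encoding: the helper name encode_path is hypothetical, and the runtime's own encoding logic is not shown in the note above.

```python
from urllib.parse import quote, urlsplit, urlunsplit


def encode_path(uri):
    """Percent-encode the path part of a URI, leaving scheme and bucket intact."""
    parts = urlsplit(uri)
    # quote() keeps "/" by default, so only characters such as spaces change.
    return urlunsplit(parts._replace(path=quote(parts.path)))
```

Applied to the path from the note, encode_path("s3://test-bucket/some directory/some data.csv") gives the encoded form quoted there.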
April 25, 2023
- You can modify a Delta table to add support for a Delta table feature using DeltaTable.addFeatureSupport(feature_name).
- The SYNC command now supports legacy data source formats.
- Fixed a bug where using the Python formatter before running any other commands in a Python notebook could cause the notebook path to be missing from sys.path.
- Azure Databricks now supports specifying default values for columns of Delta tables. INSERT, UPDATE, DELETE, and MERGE commands can refer to a column’s default value using the explicit DEFAULT keyword. For INSERT commands with an explicit list of fewer columns than the target table, corresponding column default values are substituted for the remaining columns (or NULL if no default is specified).
- Fixes a bug where the web terminal could not be used to access files in /Workspace for some users.
- If a Parquet file was read with just the failOnUnknownFields option or with Auto Loader in the failOnNewColumns schema evolution mode, columns that had different data types would be read as null instead of throwing an error stating that the file cannot be read. These reads now fail and recommend users to use the rescuedDataColumn option.
- Auto Loader now correctly reads and no longer rescues Integer, Short, Byte types if one of these data types is provided, but the Parquet file suggests one of the other two types. When the rescued data column was previously enabled, the data type mismatch would cause columns to be rescued even though they were readable.
- Fixed a bug where Auto Loader schema evolution can go into an infinite fail loop when a new column is detected in the schema of a nested JSON object.
- [SPARK-42794][SS] Increase the lockAcquireTimeoutMs to 2 minutes for acquiring the RocksDB state store in Structured Streaming.
- [SPARK-39221][SQL] Make sensitive information be redacted correctly for thrift server job/stage tab.
- [SPARK-42971][CORE] Change to print workdir if appDirs is null when worker handle WorkDirCleanup event.
- [SPARK-42936][SQL] Fix LCA bug when the having clause can be resolved directly by its child aggregate.
- [SPARK-43018][SQL] Fix bug for INSERT commands with timestamp literals.
- Revert [SPARK-42754][SQL][UI] Fix backward compatibility issue in nested SQL run.
- Revert [SPARK-41498] Propagate metadata through Union.
- [SPARK-43038][SQL] Support the CBC mode by aes_encrypt()/aes_decrypt().
- [SPARK-42928][SQL] Make resolvePersistentFunction synchronized.
- [SPARK-42521][SQL] Add NULL values for INSERT with user-specified lists of fewer columns than the target table.
- [SPARK-41391][SQL] The output column name of groupBy.agg(count_distinct) was incorrect.
- [SPARK-42548][SQL] Add ReferenceAllColumns to skip rewriting attributes.
- [SPARK-42423][SQL] Add metadata column file block start and length.
- [SPARK-42796][SQL] Support accessing TimestampNTZ columns in CachedBatch.
- [SPARK-42266][PYTHON] Remove the parent directory in shell.py run when IPython is used.
- [SPARK-43011][SQL] array_insert should fail with 0 index.
- [SPARK-41874][CONNECT][PYTHON] Support SameSemantics in Spark Connect.
- [SPARK-42702][SPARK-42623][SQL] Support parameterized query in subquery and CTE.
- [SPARK-42967][CORE] Fix SparkListenerTaskStart.stageAttemptId when a task is started after the stage is cancelled.
- Operating system security updates.
Databricks Runtime 12.1 (EoS)
See Databricks Runtime 12.1 (EoS).
June 23, 2023
- Operating system security updates.
June 15, 2023
- Photonized approx_count_distinct.
- Snowflake-jdbc library is upgraded to 3.13.29 to address a security issue.
- [SPARK-43779][SQL] ParseToDate now loads EvalMode in the main thread.
- [SPARK-43156][SPARK-43098][SQL] Extend scalar subquery count bug test with decorrelateInnerQuery disabled
- Operating system security updates.
June 2, 2023
- The JSON parser in failOnUnknownFields mode drops a record in DROPMALFORMED mode and fails directly in FAILFAST mode.
- Improve the performance of incremental update with SHALLOW CLONE Iceberg and Parquet.
- Fixed an issue in Auto Loader where different source file formats were inconsistent when the provided schema did not include inferred partitions. This issue could cause unexpected failures when reading files with missing columns in the inferred partition schema.
- [SPARK-43404][Backport] Skip reusing sst file for same version of RocksDB state store to avoid ID mismatch error.
- [SPARK-43413][11.3-13.0][SQL] Fixed IN subquery ListQuery nullability.
- [SPARK-43522][SQL] Fixed creating struct column name with index of array.
- [SPARK-42444][PYTHON] DataFrame.drop now handles duplicated columns properly.
- [SPARK-43541][SQL] Propagate all Project tags in resolving of expressions and missing columns.
- [SPARK-43340][CORE] Fixed missing stack trace field in eventlogs.
- [SPARK-42937][SQL] PlanSubqueries now sets InSubqueryExec#shouldBroadcast to true.
- [SPARK-43527][PYTHON] Fixed catalog.listCatalogs in PySpark.
- [SPARK-43378][CORE] Properly close stream objects in deserializeFromChunkedBuffer.
May 17, 2023
- Parquet scans are now robust against OOMs when scanning exceptionally structured files by dynamically adjusting batch size. File metadata is analyzed to preemptively lower batch size and is lowered again on task retries as a final safety net.
- If an Avro file was read with just the failOnUnknownFields option or with Auto Loader in the failOnNewColumns schema evolution mode, columns that have different data types would be read as null instead of throwing an error stating that the file cannot be read. These reads now fail and recommend users to use the rescuedDataColumn option.
- Auto Loader now does the following.
  - Correctly reads and no longer rescues Integer, Short, Byte types if one of these data types is provided, but the Avro file suggests one of the other two types.
  - Prevents reading interval types as date or timestamp types to avoid getting corrupt dates.
  - Prevents reading Decimal types with lower precision.
- [SPARK-43098][SQL] Fixed correctness COUNT bug when scalar subquery has a group by clause.
- [SPARK-43190][SQL] ListQuery.childOutput is now consistent with secondary output.
- Operating system security updates.
April 25, 2023
- If a Parquet file was read with just the failOnUnknownFields option or with Auto Loader in the failOnNewColumns schema evolution mode, columns that had different data types would be read as null instead of throwing an error stating that the file cannot be read. These reads now fail and recommend users to use the rescuedDataColumn option.
- Auto Loader now correctly reads and no longer rescues Integer, Short, Byte types if one of these data types is provided, but the Parquet file suggests one of the other two types. When the rescued data column was previously enabled, the data type mismatch would cause columns to be rescued even though they were readable.
- [SPARK-43009][SQL] Parameterized sql() with Any constants.
- [SPARK-42971][CORE] Change to print workdir if appDirs is null when worker handle WorkDirCleanup event.
- Operating system security updates.
April 11, 2023
- Support legacy data source formats in SYNC command.
- Fixes a bug in the %autoreload behavior in notebooks that are outside of a repo.
- Fixed a bug where Auto Loader schema evolution can go into an infinite fail loop, when a new column is detected in the schema of a nested JSON object.
- [SPARK-42928][SQL] Makes resolvePersistentFunction synchronized.
- [SPARK-42967][CORE] Fixes SparkListenerTaskStart.stageAttemptId when a task starts after the stage is cancelled.
- Operating system security updates.
March 29, 2023
- Auto Loader now triggers at least one synchronous RocksDB log clean up for Trigger.AvailableNow streams to ensure that the checkpoint can get regularly cleaned up for fast-running Auto Loader streams. This can cause some streams to take longer before they shut down, but will save you storage costs and improve the Auto Loader experience in future runs.
- You can now modify a Delta table to add support to table features using DeltaTable.addFeatureSupport(feature_name).
- [SPARK-42702][SPARK-42623][SQL] Support parameterized query in subquery and CTE
- [SPARK-41162][SQL] Fix anti- and semi-join for self-join with aggregations
- [SPARK-42403][CORE] JsonProtocol should handle null JSON strings
- [SPARK-42668][SS] Catch exception while trying to close compressed stream in HDFSStateStoreProvider abort
- [SPARK-42794][SS] Increase the lockAcquireTimeoutMs to 2 minutes for acquiring the RocksDB state store in Structured Streaming
March 14, 2023
- There is a terminology change for adding features to a Delta table using the table property. The preferred syntax is now 'delta.feature.featureName'='supported' instead of 'delta.feature.featureName'='enabled'. For backwards compatibility, using 'delta.feature.featureName'='enabled' still works and will continue to work.
- [SPARK-42622][CORE] Disable substitution in values
- [SPARK-42534][SQL] Fix DB2Dialect Limit clause
- [SPARK-42635][SQL] Fix the TimestampAdd expression.
- [SPARK-42516][SQL] Always capture the session time zone config while creating views
- [SPARK-42484] [SQL] UnsafeRowUtils better error message
- [SPARK-41793][SQL] Incorrect result for window frames defined by a range clause on large decimals
- Operating system security updates.
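The table property syntax from the March 14, 2023 terminology change can be assembled into a statement as sketched below. The helper name feature_property_sql and the table name in the usage note are hypothetical, and featureName is the placeholder used in the note above, not a real feature name. The sketch assumes the property is set through ALTER TABLE ... SET TBLPROPERTIES.

```python
def feature_property_sql(table, feature, value="supported"):
    """Build an ALTER TABLE statement that sets a Delta table feature property.

    'supported' is the preferred value; 'enabled' is accepted only because
    the note above says it keeps working for backwards compatibility.
    """
    if value not in ("supported", "enabled"):
        raise ValueError("value must be 'supported' or 'enabled'")
    return (
        f"ALTER TABLE {table} "
        f"SET TBLPROPERTIES ('delta.feature.{feature}' = '{value}')"
    )
```

For example, feature_property_sql("main.default.events", "featureName") produces a statement using the preferred 'supported' value, which could then be passed to spark.sql() in a notebook.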
February 24, 2023
- You can now use a unified set of options (host, port, database, user, password) for connecting to the data sources supported in Query Federation (PostgreSQL, MySQL, Synapse, Snowflake, Redshift, SQL Server). Note that port is optional and uses the default port number for each data source if not provided.
Example of PostgreSQL connection configuration
  CREATE TABLE postgresql_table
  USING postgresql
  OPTIONS (
    dbtable '<table-name>',
    host '<host-name>',
    database '<database-name>',
    user '<user>',
    password secret('scope', 'key')
  );
Example of Snowflake connection configuration
  CREATE TABLE snowflake_table
  USING snowflake
  OPTIONS (
    dbtable '<table-name>',
    host '<host-name>',
    port '<port-number>',
    database '<database-name>',
    user secret('snowflake_creds', 'my_username'),
    password secret('snowflake_creds', 'my_password'),
    schema '<schema-name>',
    sfWarehouse '<warehouse-name>'
  );
- [SPARK-41989][PYTHON] Avoid breaking logging config from pyspark.pandas
- [SPARK-42346][SQL] Rewrite distinct aggregates after subquery merge
- [SPARK-41990][SQL] Use FieldReference.column instead of apply in V1 to V2 filter conversion
- Revert [SPARK-41848][CORE] Fixing task over-scheduled with TaskResourceProfile
- [SPARK-42162] Introduce MultiCommutativeOp expression as a memory optimization for canonicalizing large trees of commutative expressions
- Operating system security updates.
February 16, 2023
- SYNC command supports syncing recreated Hive Metastore tables. If an HMS table has been SYNCed previously to Unity Catalog but then dropped and recreated, a subsequent re-sync will work instead of throwing a TABLE_ALREADY_EXISTS status code.
- [SPARK-41219][SQL] IntegralDivide use decimal(1, 0) to represent 0
- [SPARK-36173][CORE] Support getting CPU number in TaskContext
- [SPARK-41848][CORE] Fixing task over-scheduled with TaskResourceProfile
- [SPARK-42286][SQL] Fallback to previous codegen code path for complex expr with CAST
January 31, 2023
- Creating a schema with a defined location now requires the user to have SELECT and MODIFY privileges on ANY FILE.
- [SPARK-41581][SQL] Assign name to _LEGACY_ERROR_TEMP_1230
- [SPARK-41996][SQL][SS] Fix kafka test to verify lost partitions to account for slow Kafka operations
- [SPARK-41580][SQL] Assign name to _LEGACY_ERROR_TEMP_2137
- [SPARK-41666][PYTHON] Support parameterized SQL by sql()
- [SPARK-41579][SQL] Assign name to _LEGACY_ERROR_TEMP_1249
- [SPARK-41573][SQL] Assign name to _LEGACY_ERROR_TEMP_2136
- [SPARK-41574][SQL] Assign name to _LEGACY_ERROR_TEMP_2009
- [SPARK-41049][Followup] Fix a code sync regression for ConvertToLocalRelation
- [SPARK-41576][SQL] Assign name to _LEGACY_ERROR_TEMP_2051
- [SPARK-41572][SQL] Assign name to _LEGACY_ERROR_TEMP_2149
- [SPARK-41575][SQL] Assign name to _LEGACY_ERROR_TEMP_2054
- Operating system security updates.
Databricks Runtime 12.0 (EoS)
See Databricks Runtime 12.0 (EoS).
June 15, 2023
- Photonized approx_count_distinct.
- Snowflake-jdbc library is upgraded to 3.13.29 to address a security issue.
- [SPARK-43156][SPARK-43098][SQL] Extend scalar subquery count bug test with decorrelateInnerQuery disabled
- [SPARK-43779][SQL] ParseToDate now loads EvalMode in the main thread.
- Operating system security updates.
June 2, 2023
- The JSON parser in failOnUnknownFields mode drops a record in DROPMALFORMED mode and fails directly in FAILFAST mode.
- Improve the performance of incremental update with SHALLOW CLONE Iceberg and Parquet.
- Fixed an issue in Auto Loader where different source file formats were inconsistent when the provided schema did not include inferred partitions. This issue could cause unexpected failures when reading files with missing columns in the inferred partition schema.
- [SPARK-42444][PYTHON] DataFrame.drop now handles duplicated columns properly.
- [SPARK-43404][Backport] Skip reusing sst file for same version of RocksDB state store to avoid ID mismatch error.
- [SPARK-43413][11.3-13.0][SQL] Fixed IN subquery ListQuery nullability.
- [SPARK-43527][PYTHON] Fixed catalog.listCatalogs in PySpark.
- [SPARK-43522][SQL] Fixed creating struct column name with index of array.
- [SPARK-43541][SQL] Propagate all Project tags in resolving of expressions and missing columns.
- [SPARK-43340][CORE] Fixed missing stack trace field in eventlogs.
- [SPARK-42937][SQL] PlanSubqueries now sets InSubqueryExec#shouldBroadcast to true.
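The failOnUnknownFields behavior in the first item of the list above can be pictured with a plain-Python model. This is a sketch of the described semantics only, not Spark's parser: the function parse_records and its arguments are hypothetical, and a record counts as malformed here simply because it carries a field outside the known set.

```python
import json


def parse_records(lines, known_fields, mode):
    """Parse JSON lines, treating any field outside known_fields as malformed."""
    if mode not in ("DROPMALFORMED", "FAILFAST"):
        raise ValueError(f"unsupported mode: {mode}")
    rows = []
    for line in lines:
        record = json.loads(line)
        unknown = set(record) - set(known_fields)
        if unknown:
            if mode == "FAILFAST":
                # Fail directly on the first record with an unknown field.
                raise ValueError(f"unknown fields: {sorted(unknown)}")
            # DROPMALFORMED: skip the offending record and keep going.
            continue
        rows.append(record)
    return rows


lines = ['{"id": 1}', '{"id": 2, "extra": "x"}']
kept = parse_records(lines, ["id"], "DROPMALFORMED")
print(kept)
```

Calling parse_records(lines, ["id"], "FAILFAST") on the same input raises instead of returning a partial result.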
May 17, 2023
- Parquet scans are now robust against OOMs when scanning exceptionally structured files by dynamically adjusting batch size. File metadata is analyzed to preemptively lower batch size and is lowered again on task retries as a final safety net.
- If an Avro file was read with just the failOnUnknownFields option or with Auto Loader in the failOnNewColumns schema evolution mode, columns that have different data types would be read as null instead of throwing an error stating that the file cannot be read. These reads now fail and recommend that users use the rescuedDataColumn option.
- Auto Loader now does the following.
  - Correctly reads and no longer rescues Integer, Short, Byte types if one of these data types is provided, but the Avro file suggests one of the other two types.
  - Prevents reading interval types as date or timestamp types to avoid getting corrupt dates.
  - Prevents reading Decimal types with lower precision.
- [SPARK-43172] [CONNECT] Exposes host and token from Spark connect client.
- [SPARK-41520][SQL] Split AND_OR tree pattern to separate AND and OR.
- [SPARK-43098][SQL] Fixed correctness COUNT bug when scalar subquery is grouped by clause.
- [SPARK-43190][SQL] ListQuery.childOutput is now consistent with secondary output.
is now consistent with secondary output. - Operating system security updates.
April 25, 2023
- If a Parquet file was read with just the failOnUnknownFields option or with Auto Loader in the failOnNewColumns schema evolution mode, columns that had different data types would be read as null instead of throwing an error stating that the file cannot be read. These reads now fail and recommend that users use the rescuedDataColumn option.
- Auto Loader now correctly reads and no longer rescues Integer, Short, Byte types if one of these data types is provided, but the Parquet file suggests one of the other two types. When the rescued data column was previously enabled, the data type mismatch would cause columns to be rescued even though they were readable.
- [SPARK-42971][CORE] Change to print workdir if appDirs is null when worker handle WorkDirCleanup event
- Operating system security updates.
April 11, 2023
- Support legacy data source formats in SYNC command.
- Fixes a bug in the %autoreload behavior in notebooks which are outside of a repo.
- Fixed a bug where Auto Loader schema evolution can go into an infinite fail loop, when a new column is detected in the schema of a nested JSON object.
- [SPARK-42928][SQL] Makes resolvePersistentFunction synchronized.
- [SPARK-42967][CORE] Fixes SparkListenerTaskStart.stageAttemptId when a task starts after the stage is cancelled.
- Operating system security updates.
March 29, 2023
- [SPARK-42794][SS] Increase the lockAcquireTimeoutMs to 2 minutes for acquiring the RocksDB state store in Structured Streaming
- [SPARK-41162][SQL] Fix anti- and semi-join for self-join with aggregations
- [SPARK-42403][CORE] JsonProtocol should handle null JSON strings
- [SPARK-42668][SS] Catch exception while trying to close compressed stream in HDFSStateStoreProvider abort
- Miscellaneous bug fixes.
March 14, 2023
- [SPARK-42534][SQL] Fix DB2Dialect Limit clause
- [SPARK-42622][CORE] Disable substitution in values
- [SPARK-41793][SQL] Incorrect result for window frames defined by a range clause on large decimals
- [SPARK-42484] [SQL] UnsafeRowUtils better error message
- [SPARK-42635][SQL] Fix the TimestampAdd expression.
- [SPARK-42516][SQL] Always capture the session time zone config while creating views
- Operating system security updates.
February 24, 2023
Standardized Connection Options for Query Federation
You can now use a unified set of options (host, port, database, user, password) for connecting to the data sources supported in Query Federation (PostgreSQL, MySQL, Synapse, Snowflake, Redshift, SQL Server). Note that port is optional and will use the default port number for each data source if not provided.
Example of PostgreSQL connection configuration
CREATE TABLE postgresql_table USING postgresql OPTIONS ( dbtable '<table-name>', host '<host-name>', database '<database-name>', user '<user>', password secret('scope', 'key') );
Example of Snowflake connection configuration
CREATE TABLE snowflake_table USING snowflake OPTIONS ( dbtable '<table-name>', host '<host-name>', port '<port-number>', database '<database-name>', user secret('snowflake_creds', 'my_username'), password secret('snowflake_creds', 'my_password'), schema '<schema-name>', sfWarehouse '<warehouse-name>' );
- Revert [SPARK-41848][CORE] Fixing task over-scheduled with TaskResourceProfile
- [SPARK-42162] Introduce MultiCommutativeOp expression as a memory optimization for canonicalizing large trees of commutative expressions
- [SPARK-41990][SQL] Use FieldReference.column instead of apply in V1 to V2 filter conversion
- [SPARK-42346][SQL] Rewrite distinct aggregates after subquery merge
- Operating system security updates.
February 16, 2023
- Users can now read and write certain Delta tables that require Reader version 3 and Writer version 7, by using Databricks Runtime 9.1 or later. To succeed, table features listed in the tables’ protocol must be supported by the current version of Databricks Runtime.
- SYNC command supports syncing recreated Hive Metastore tables. If an HMS table has been SYNCed previously to Unity Catalog but then dropped and recreated, a subsequent re-sync will work instead of throwing a TABLE_ALREADY_EXISTS status code.
- [SPARK-36173][CORE] Support getting CPU number in TaskContext
- [SPARK-42286][SQL] Fallback to previous codegen code path for complex expr with CAST
- [SPARK-41848][CORE] Fixing task over-scheduled with TaskResourceProfile
- [SPARK-41219][SQL] IntegralDivide use decimal(1, 0) to represent 0
January 25, 2023
- [SPARK-41660][SQL] Only propagate metadata columns if they are used
- [SPARK-41379][SS][PYTHON] Provide cloned spark session in DataFrame in user function for foreachBatch sink in PySpark
- [SPARK-41669][SQL] Early pruning in canCollapseExpressions
- Operating system security updates.
January 18, 2023
- REFRESH FUNCTION SQL command now supports SQL functions and SQL Table functions. For example, the command could be used to refresh a persistent SQL function that was updated in another SQL session.
- Java Database Connectivity (JDBC) data source v1 now supports LIMIT clause pushdown to improve performance in queries. This feature is enabled by default and can be disabled with spark.databricks.optimizer.jdbcDSv1LimitPushdown.enabled set to false.
- In Legacy Table ACLs clusters, creating functions that reference JVM classes now requires the MODIFY_CLASSPATH privilege.
- Azure Synapse connector now returns a more descriptive error message when a column name contains invalid characters such as whitespaces or semicolons. In such cases, the following message will be returned: Azure Synapse Analytics failed to execute the JDBC query produced by the connector. Make sure column names do not include any invalid characters such as ';' or whitespace.
- Spark structured streaming now works with format("deltasharing") on a delta sharing table as a source.
- [SPARK-38277][SS] Clear write batch after RocksDB state store’s commit
- [SPARK-41733][SQL][SS] Apply tree-pattern based pruning for the rule ResolveWindowTime
- [SPARK-39591][SS] Async Progress Tracking
- [SPARK-41339][SQL] Close and recreate RocksDB write batch instead of just clearing
- [SPARK-41198][SS] Fix metrics in streaming query having CTE and DSv1 streaming source
- [SPARK-41539][SQL] Remap stats and constraints against output in logical plan for LogicalRDD
- [SPARK-41732][SQL][SS] Apply tree-pattern based pruning for the rule SessionWindowing
- [SPARK-41862][SQL] Fix correctness bug related to DEFAULT values in Orc reader
- [SPARK-41199][SS] Fix metrics issue when DSv1 streaming source and DSv2 streaming source are co-used
- [SPARK-41261][PYTHON][SS] Fix issue for applyInPandasWithState when the columns of grouping keys are not placed in order from earliest
- Operating system security updates.
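The JDBC LIMIT pushdown flag in the list above is a Spark configuration key, so it can be switched off for a session. The following is a minimal sketch under stated assumptions: disable_jdbc_limit_pushdown is a hypothetical helper, and FakeSession and FakeConf are stand-ins so the snippet runs without a cluster. It assumes only that the session object exposes conf.set and conf.get, as PySpark's SparkSession.conf does.

```python
# Configuration key named in the release note above.
LIMIT_PUSHDOWN_KEY = "spark.databricks.optimizer.jdbcDSv1LimitPushdown.enabled"


class FakeConf:
    """Stand-in for spark.conf (hypothetical, for illustration only)."""

    def __init__(self):
        self._values = {}

    def set(self, key, value):
        self._values[key] = value

    def get(self, key, default=None):
        return self._values.get(key, default)


class FakeSession:
    """Stand-in for a SparkSession that only exposes .conf."""

    def __init__(self):
        self.conf = FakeConf()


def disable_jdbc_limit_pushdown(spark):
    """Turn off LIMIT pushdown for JDBC data source v1 and return the stored value."""
    spark.conf.set(LIMIT_PUSHDOWN_KEY, "false")
    return spark.conf.get(LIMIT_PUSHDOWN_KEY)


session = FakeSession()
print(disable_jdbc_limit_pushdown(session))
```

On a cluster, the notebook's own spark object would be passed in place of FakeSession().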
May 17, 2023
- Parquet scans are now robust against OOMs when scanning exceptionally structured files by dynamically adjusting batch size. File metadata is analyzed to preemptively lower batch size and is lowered again on task retries as a final safety net.
- Fixed a regression that caused Azure Databricks jobs to persist after failing to connect to the metastore during cluster initialization.
- [SPARK-41520][SQL] Split AND_OR tree pattern to separate AND and OR.
- [SPARK-43190][SQL] ListQuery.childOutput is now consistent with secondary output.
- Operating system security updates.
April 25, 2023
- If a Parquet file was read with just the failOnUnknownFields option or with Auto Loader in the failOnNewColumns schema evolution mode, columns that had different data types would be read as null instead of throwing an error stating that the file cannot be read. These reads now fail and recommend that users use the rescuedDataColumn option.
- Auto Loader now correctly reads and no longer rescues Integer, Short, Byte types if one of these data types is provided, but the Parquet file suggests one of the other two types. When the rescued data column was previously enabled, the data type mismatch would cause columns to be rescued even though they were readable.
- [SPARK-42937][SQL] PlanSubqueries now sets InSubqueryExec#shouldBroadcast to true.
- Operating system security updates.
April 11, 2023
- Support legacy data source formats in SYNC command.
- Fixes a bug in the %autoreload behavior in notebooks which are outside of a repo.
- Fixed a bug where Auto Loader schema evolution can go into an infinite fail loop, when a new column is detected in the schema of a nested JSON object.
- [SPARK-42928][SQL] Make resolvePersistentFunction synchronized.
- [SPARK-42967][CORE] Fix SparkListenerTaskStart.stageAttemptId when a task is started after the stage is cancelled.
March 29, 2023
- [SPARK-42794][SS] Increase the lockAcquireTimeoutMs to 2 minutes for acquiring the RocksDB state store in Structured Streaming
- [SPARK-42403][CORE] JsonProtocol should handle null JSON strings
- [SPARK-42668][SS] Catch exception while trying to close compressed stream in HDFSStateStoreProvider abort
- Operating system security updates.
March 14, 2023
- [SPARK-42635][SQL] Fix the TimestampAdd expression.
- [SPARK-41793][SQL] Incorrect result for window frames defined by a range clause on large decimals
- [SPARK-42484] [SQL] UnsafeRowUtils better error message
- [SPARK-42534][SQL] Fix DB2Dialect Limit clause
- [SPARK-41162][SQL] Fix anti- and semi-join for self-join with aggregations
- [SPARK-42516][SQL] Always capture the session time zone config while creating views
- Miscellaneous bug fixes.
February 28, 2023
Standardized Connection Options for Query Federation
You can now use a unified set of options (host, port, database, user, password) for connecting to the data sources supported in Query Federation (PostgreSQL, MySQL, Synapse, Snowflake, Redshift, SQL Server). Note that port is optional and uses the default port number for each data source if not provided.
Example of PostgreSQL connection configuration
CREATE TABLE postgresql_table USING postgresql OPTIONS ( dbtable '<table-name>', host '<host-name>', database '<database-name>', user '<user>', password secret('scope', 'key') );
Example of Snowflake connection configuration
CREATE TABLE snowflake_table USING snowflake OPTIONS ( dbtable '<table-name>', host '<host-name>', port '<port-number>', database '<database-name>', user secret('snowflake_creds', 'my_username'), password secret('snowflake_creds', 'my_password'), schema '<schema-name>', sfWarehouse '<warehouse-name>' );
- [SPARK-42286][SQL] Fallback to previous codegen code path for complex expr with CAST
- [SPARK-41989][PYTHON] Avoid breaking logging config from pyspark.pandas
- [SPARK-42346][SQL] Rewrite distinct aggregates after subquery merge
- [SPARK-41360][CORE] Avoid BlockManager re-registration if the executor has been lost
- [SPARK-42162] Introduce MultiCommutativeOp expression as a memory optimization for canonicalizing large trees of commutative expressions
- [SPARK-41990][SQL] Use FieldReference.column instead of apply in V1 to V2 filter conversion
- Operating system security updates.
February 16, 2023
- Users can now read and write certain Delta tables that require Reader version 3 and Writer version 7, by using Databricks Runtime 9.1 or later. To succeed, table features listed in the tables’ protocol must be supported by the current version of Databricks Runtime.
- SYNC command supports syncing recreated Hive Metastore tables. If an HMS table has been SYNCed previously to Unity Catalog but then dropped and recreated, a subsequent re-sync will work instead of throwing a TABLE_ALREADY_EXISTS status code.
- [SPARK-41219][SQL] IntegralDivide use decimal(1, 0) to represent 0
- [SPARK-40382][SQL] Group distinct aggregate expressions by semantically equivalent children in RewriteDistinctAggregates
- Operating system security updates.
January 25, 2023
- [SPARK-41379][SS][PYTHON] Provide cloned spark session in DataFrame in user function for foreachBatch sink in PySpark
- [SPARK-41660][SQL] Only propagate metadata columns if they are used
- [SPARK-41669][SQL] Early pruning in canCollapseExpressions
- Miscellaneous bug fixes.
January 18, 2023
- REFRESH FUNCTION SQL command now supports SQL functions and SQL Table functions. For example, the command could be used to refresh a persistent SQL function that was updated in another SQL session.
- Java Database Connectivity (JDBC) data source v1 now supports LIMIT clause pushdown to improve performance in queries. This feature is enabled by default and can be disabled with spark.databricks.optimizer.jdbcDSv1LimitPushdown.enabled set to false.
- Azure Synapse connector now returns a more descriptive error message when a column name contains invalid characters such as whitespaces or semicolons. In such cases, the following message will be returned: Azure Synapse Analytics failed to execute the JDBC query produced by the connector. Make sure column names do not include any invalid characters such as ';' or whitespace.
- [SPARK-41198][SS] Fix metrics in streaming query having CTE and DSv1 streaming source
- [SPARK-41862][SQL] Fix correctness bug related to DEFAULT values in Orc reader
- [SPARK-41539][SQL] Remap stats and constraints against output in logical plan for LogicalRDD
- [SPARK-39591][SS] Async Progress Tracking
- [SPARK-41199][SS] Fix metrics issue when DSv1 streaming source and DSv2 streaming source are co-used
- [SPARK-41261][PYTHON][SS] Fix issue for applyInPandasWithState when the columns of grouping keys are not placed in order from earliest
- [SPARK-41339][SQL] Close and recreate RocksDB write batch instead of just clearing
- [SPARK-41732][SQL][SS] Apply tree-pattern based pruning for the rule SessionWindowing
- [SPARK-38277][SS] Clear write batch after RocksDB state store’s commit
- Operating system security updates.
November 29, 2022
- Users can configure leading and trailing whitespaces’ behavior when writing data using the Redshift connector. The following options have been added to control whitespace handling:
  - csvignoreleadingwhitespace, when set to true, removes leading whitespace from values during writes when tempformat is set to CSV or CSV GZIP. Whitespaces are retained when the config is set to false. By default, the value is true.
  - csvignoretrailingwhitespace, when set to true, removes trailing whitespace from values during writes when tempformat is set to CSV or CSV GZIP. Whitespaces are retained when the config is set to false. By default, the value is true.
- Fixed a bug with JSON parsing in Auto Loader when all columns were left as strings (cloudFiles.inferColumnTypes was not set or set to false) and the JSON contained nested objects.
- Upgrade snowflake-jdbc dependency to version 3.13.22.
- Table types of JDBC tables are now EXTERNAL by default.
- [SPARK-40906][SQL] Mode should copy keys before inserting into Map
- Operating system security updates.
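The effect of the two Redshift whitespace options in the list above can be modeled in plain Python. This is an illustration of the documented semantics, not the connector's code: prepare_csv_value is a hypothetical function whose keyword arguments mirror the option names, and the trimming only applies when tempformat is CSV or CSV GZIP.

```python
def prepare_csv_value(value, csvignoreleadingwhitespace=True, csvignoretrailingwhitespace=True):
    """Model how a string value is trimmed before a CSV-based write."""
    # Both options default to true, matching the defaults in the release note.
    if csvignoreleadingwhitespace:
        value = value.lstrip()
    if csvignoretrailingwhitespace:
        value = value.rstrip()
    return value


raw = "  a b  "
trimmed = prepare_csv_value(raw)
keep_leading = prepare_csv_value(raw, csvignoreleadingwhitespace=False)
untouched = prepare_csv_value(
    raw, csvignoreleadingwhitespace=False, csvignoretrailingwhitespace=False
)
print(repr(trimmed), repr(keep_leading), repr(untouched))
```

Setting both options to false is the way to retain values exactly as they were written.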
November 15, 2022
- Table ACLs and UC Shared clusters now allow the Dataset.toJSON method from Python.
- [SPARK-40646] JSON parsing for structs, maps, and arrays has been fixed so when a part of a record does not match the schema, the rest of the record can still be parsed correctly instead of returning nulls. To opt in to the improved behavior, set spark.sql.json.enablePartialResults to true. The flag is disabled by default to preserve the original behavior.
- [SPARK-40903][SQL] Avoid reordering decimal Add for canonicalization if data type is changed
- [SPARK-40618][SQL] Fix bug in MergeScalarSubqueries rule with nested subqueries using reference tracking
- [SPARK-40697][SQL] Add read-side char padding to cover external data files
- Operating system security updates.
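The difference that spark.sql.json.enablePartialResults makes, as described in the list above, can be pictured with a toy model. This is a simplified sketch, not Spark's JSON parser: parse_struct is a hypothetical function, the schema is a flat mapping of field names to Python types, and the real feature also covers nested structs, maps, and arrays.

```python
def parse_struct(record, schema, enable_partial_results=False):
    """Parse a flat record against a schema of {field_name: python_type}."""
    parsed = {}
    mismatch = False
    for field, expected_type in schema.items():
        value = record.get(field)
        if value is None or isinstance(value, expected_type):
            parsed[field] = value
        else:
            # This field does not match the schema.
            parsed[field] = None
            mismatch = True
    if mismatch and not enable_partial_results:
        # Original behavior in this model: the whole struct comes back as nulls.
        return {field: None for field in schema}
    # Improved behavior: keep every field that did parse.
    return parsed


schema = {"a": int, "b": int}
record = {"a": 1, "b": "oops"}
original = parse_struct(record, schema)
partial = parse_struct(record, schema, enable_partial_results=True)
print(original, partial)
```

In this model only the mismatched field b is lost when the flag is on, which is the opt-in behavior the note describes.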
November 1, 2022
- Structured Streaming in Unity Catalog now supports refreshing temporary access tokens. Streaming workloads running with Unity Catalog all purpose or jobs clusters no longer fail after the initial token expiry.
- Fixed an issue where if a Delta table had a user-defined column named _change_type, but Change data feed was disabled on that table, data in that column would incorrectly fill with NULL values when running MERGE.
- Fixed an issue where running MERGE and using exactly 99 columns from the source in the condition could result in java.lang.ClassCastException: org.apache.spark.sql.vectorized.ColumnarBatch cannot be cast to org.apache.spark.sql.catalyst.InternalRow.
- Fixed an issue with Auto Loader where a file can be duplicated in the same micro-batch when allowOverwrites is enabled.
- Upgraded Apache commons-text to 1.10.0.
- [SPARK-38881][DSTREAMS][KINESIS][PYSPARK] Added Support for CloudWatch MetricsLevel Config
- [SPARK-40596][CORE] Populate ExecutorDecommission with messages in ExecutorDecommissionInfo
- [SPARK-40670][SS][PYTHON] Fix NPE in applyInPandasWithState when the input schema has “non-nullable” column(s)
- Operating system security updates.
Databricks Runtime 11.2 (EoS)
See Databricks Runtime 11.2 (EoS).
- February 28, 2023
- [SPARK-42286][SQL] Fallback to previous codegen code path for complex expr with CAST
- [SPARK-42346][SQL] Rewrite distinct aggregates after subquery merge
- Operating system security updates.
- February 16, 2023
- Users can now read and write certain Delta tables that require Reader version 3 and Writer version 7, by using Databricks Runtime 9.1 or later. To succeed, table features listed in the tables’ protocol must be supported by the current version of Databricks Runtime.
- SYNC command supports syncing recreated Hive Metastore tables. If an HMS table has been SYNCed previously to Unity Catalog but then dropped and recreated, a subsequent re-sync will work instead of throwing a TABLE_ALREADY_EXISTS status code.
- [SPARK-41219][SQL] IntegralDivide use decimal(1, 0) to represent 0
- Operating system security updates.
- January 31, 2023
- Table types of JDBC tables are now EXTERNAL by default.
- [SPARK-41379][SS][PYTHON] Provide cloned spark session in DataFrame in user function for foreachBatch sink in PySpark
- January 18, 2023
- Azure Synapse connector now returns a more descriptive error message when a column name contains invalid characters such as whitespaces or semicolons. In such cases, the following message will be returned: Azure Synapse Analytics failed to execute the JDBC query produced by the connector. Make sure column names do not include any invalid characters such as ';' or whitespace.
- [SPARK-41198][SS] Fix metrics in streaming query having CTE and DSv1 streaming source
- [SPARK-41862][SQL] Fix correctness bug related to DEFAULT values in Orc reader
- [SPARK-41539][SQL] Remap stats and constraints against output in logical plan for LogicalRDD
- [SPARK-41199][SS] Fix metrics issue when DSv1 streaming source and DSv2 streaming source are co-used
- [SPARK-41339][SQL] Close and recreate RocksDB write batch instead of just clearing
- [SPARK-41732][SQL][SS] Apply tree-pattern based pruning for the rule SessionWindowing
- [SPARK-38277][SS] Clear write batch after RocksDB state store’s commit
- Operating system security updates.
- November 29, 2022
- Users can configure leading and trailing whitespaces’ behavior when writing data using the Redshift connector. The following options have been added to control whitespace handling:
  - csvignoreleadingwhitespace, when set to true, removes leading whitespace from values during writes when tempformat is set to CSV or CSV GZIP. Whitespaces are retained when the config is set to false. By default, the value is true.
  - csvignoretrailingwhitespace, when set to true, removes trailing whitespace from values during writes when tempformat is set to CSV or CSV GZIP. Whitespaces are retained when the config is set to false. By default, the value is true.
- Fixed a bug with JSON parsing in Auto Loader when all columns were left as strings (cloudFiles.inferColumnTypes was not set or set to false) and the JSON contained nested objects.
- [SPARK-40906][SQL] Mode should copy keys before inserting into Map
- Operating system security updates.
- November 15, 2022
- [SPARK-40646] JSON parsing for structs, maps, and arrays has been fixed so when a part of a record does not match the schema, the rest of the record can still be parsed correctly instead of returning nulls. To opt in to the improved behavior, set spark.sql.json.enablePartialResults to true. The flag is disabled by default to preserve the original behavior.
- [SPARK-40618][SQL] Fix bug in MergeScalarSubqueries rule with nested subqueries using reference tracking
- [SPARK-40697][SQL] Add read-side char padding to cover external data files
- Operating system security updates.
- November 1, 2022
- Upgraded Apache commons-text to 1.10.0.
- Fixed an issue where if a Delta table had a user-defined column named _change_type, but Change data feed was disabled on that table, data in that column would incorrectly fill with NULL values when running MERGE.
- Fixed an issue where running MERGE and using exactly 99 columns from the source in the condition could result in java.lang.ClassCastException: org.apache.spark.sql.vectorized.ColumnarBatch cannot be cast to org.apache.spark.sql.catalyst.InternalRow.
- Fixed an issue with Auto Loader where a file can be duplicated in the same micro-batch when allowOverwrites is enabled.
- [SPARK-40596][CORE] Populate ExecutorDecommission with messages in ExecutorDecommissionInfo
- Operating system security updates.
- October 19, 2022
- Fixed an issue with COPY INTO usage with temporary credentials on Unity Catalog enabled clusters / warehouses.
- [SPARK-40213][SQL] Support ASCII value conversion for Latin-1 characters
- Operating system security updates.
- October 5, 2022
- Users can run spark.conf.set("spark.databricks.io.listKeysWithPrefix.azure.enabled", "true") to re-enable native listing for Auto Loader on ADLS Gen2. Native listing was previously turned off due to performance issues, but may have led to an increase in storage costs for customers. This change was rolled out to DBR 10.4 and 9.1 in the previous maintenance update.
- [SPARK-40315][SQL] Support url encode/decode as built-in function and tidy up url-related functions
- [SPARK-40156][SQL] url_decode() should return an error class
- [SPARK-40169] Don’t pushdown Parquet filters with no reference to data schema
- [SPARK-40460][SS] Fix streaming metrics when selecting _metadata
- [SPARK-40055][SQL] listCatalogs should also return spark_catalog even when spark_catalog implementation is defaultSessionCatalog
- Operating system security updates.
- September 22, 2022
- [SPARK-40315][SQL] Add hashCode() for Literal of ArrayBasedMapData
- [SPARK-40389][SQL] Decimals can’t upcast as integral types if the cast can overflow
- [SPARK-40380][SQL] Fix constant-folding of InvokeLike to avoid non-serializable literal embedded in the plan
- [SPARK-40066][SQL][FOLLOW-UP] Check if ElementAt is resolved before getting its dataType
- [SPARK-40109][SQL] New SQL function: get()
- [SPARK-40066][SQL] ANSI mode: always return null on invalid access to map column
- [SPARK-40089][SQL] Fix sorting for some Decimal types
- [SPARK-39887][SQL] RemoveRedundantAliases should keep aliases that make the output of projection nodes unique
- [SPARK-40152][SQL] Fix split_part codegen compilation issue
- [SPARK-40235][CORE] Use interruptible lock instead of synchronized in Executor.updateDependencies()
- [SPARK-40212][SQL] SparkSQL castPartValue does not properly handle byte, short, or float
- [SPARK-40218][SQL] GROUPING SETS should preserve the grouping columns
- [SPARK-35542][ML] Fix: Bucketizer created for multiple columns with parameters
- [SPARK-40079] Add Imputer inputCols validation for empty input case
- [SPARK-39912][SPARK-39828][SQL] Refine CatalogImpl
Databricks Runtime 11.1 (EoS)
See Databricks Runtime 11.1 (EoS).
January 31, 2023
- [SPARK-41379][SS][PYTHON] Provide cloned spark session in DataFrame in user function for foreachBatch sink in PySpark
- Miscellaneous bug fixes.
January 18, 2023
- Azure Synapse connector now returns a more descriptive error message when a column name contains invalid characters such as whitespaces or semicolons. In such cases, the following message will be returned: Azure Synapse Analytics failed to execute the JDBC query produced by the connector. Make sure column names do not include any invalid characters such as ';' or whitespace.
- [SPARK-41198][SS] Fix metrics in streaming query having CTE and DSv1 streaming source
- [SPARK-41862][SQL] Fix correctness bug related to DEFAULT values in Orc reader
- [SPARK-41199][SS] Fix metrics issue when DSv1 streaming source and DSv2 streaming source are co-used
- [SPARK-41339][SQL] Close and recreate RocksDB write batch instead of just clearing
- [SPARK-41732][SQL][SS] Apply tree-pattern based pruning for the rule SessionWindowing
- [SPARK-38277][SS] Clear write batch after RocksDB state store’s commit
- Operating system security updates.
November 29, 2022
- Users can configure leading and trailing whitespaces’ behavior when writing data using the Redshift connector. The following options have been added to control whitespace handling:
  - csvignoreleadingwhitespace, when set to true, removes leading whitespace from values during writes when tempformat is set to CSV or CSV GZIP. Whitespaces are retained when the config is set to false. By default, the value is true.
  - csvignoretrailingwhitespace, when set to true, removes trailing whitespace from values during writes when tempformat is set to CSV or CSV GZIP. Whitespaces are retained when the config is set to false. By default, the value is true.
- Fixed a bug with JSON parsing in Auto Loader when all columns were left as strings (cloudFiles.inferColumnTypes was not set or set to false) and the JSON contained nested objects.
- [SPARK-39650][SS] Fix incorrect value schema in streaming deduplication with backward compatibility
- Operating system security updates.
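The Redshift whitespace options above are passed like any other writer option. The following is a minimal sketch, not the connector's reference usage: the helper name, the JDBC URL, the table, and the S3 path are hypothetical, and authentication options are omitted.

```python
def redshift_csv_write_options(jdbc_url, table, temp_dir,
                               strip_leading=True, strip_trailing=True,
                               gzip=False):
    """Build the option map for a Redshift write that stages data as CSV.

    The two whitespace options only take effect when tempformat is
    CSV or CSV GZIP, and both default to true in the connector.
    """
    return {
        "url": jdbc_url,
        "dbtable": table,
        "tempdir": temp_dir,
        "tempformat": "CSV GZIP" if gzip else "CSV",
        "csvignoreleadingwhitespace": str(strip_leading).lower(),
        "csvignoretrailingwhitespace": str(strip_trailing).lower(),
    }


# Hypothetical connection details, for illustration only.
opts = redshift_csv_write_options(
    "jdbc:redshift://example-host:5439/dev",
    "public.events",
    "s3a://example-bucket/tmp/",
    strip_leading=False,  # keep leading whitespace in written values
)
print(opts)

# On a cluster, assuming `df` is an existing DataFrame and that the
# connector is registered under this format name in your runtime:
# df.write.format("com.databricks.spark.redshift").options(**opts).mode("append").save()
```

Setting only one of the two flags to false retains whitespace on that side while the other side is still trimmed.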
November 15, 2022
- [SPARK-40646] JSON parsing for structs, maps, and arrays has been fixed so when a part of a record does not match the schema, the rest of the record can still be parsed correctly instead of returning nulls. To opt in to the improved behavior, set spark.sql.json.enablePartialResults to true. The flag is disabled by default to preserve the original behavior.
- Operating system security updates.
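As a sketch of opting in to the partial-results behavior described above, the flag can be set on the session before reading JSON with an explicit schema. The helper name, the path, and the schema string are hypothetical. `spark` is assumed to be an active SparkSession.

```python
def read_json_with_partial_results(spark, path, schema):
    """Enable partial results for JSON parsing, then read `path`.

    With the flag on, a struct, map, or array field that does not match
    the schema no longer prevents the rest of the record from being parsed.
    """
    spark.conf.set("spark.sql.json.enablePartialResults", "true")
    return spark.read.schema(schema).json(path)


# Usage on a cluster (hypothetical path and schema):
# df = read_json_with_partial_results(
#     spark, "/tmp/events.json", "id INT, payload STRUCT<a: INT>")
```

The flag is session-scoped here, so other sessions keep the original behavior unless they opt in as well.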
November 1, 2022
- Upgraded Apache commons-text to 1.10.0.
- Fixed an issue where if a Delta table had a user-defined column named _change_type, but Change data feed was disabled on that table, data in that column would incorrectly fill with NULL values when running MERGE.
- Fixed an issue where running MERGE and using exactly 99 columns from the source in the condition could result in java.lang.ClassCastException: org.apache.spark.sql.vectorized.ColumnarBatch cannot be cast to org.apache.spark.sql.catalyst.InternalRow.
- Fixed an issue with Auto Loader where a file can be duplicated in the same micro-batch when allowOverwrites is enabled.
- [SPARK-40697][SQL] Add read-side char padding to cover external data files
- [SPARK-40596][CORE] Populate ExecutorDecommission with messages in ExecutorDecommissionInfo
- Operating system security updates.
October 18, 2022
- Fixed an issue with COPY INTO usage with temporary credentials on Unity Catalog enabled clusters / warehouses.
- [SPARK-40213][SQL] Support ASCII value conversion for Latin-1 characters
- Operating system security updates.
October 5, 2022
- Users can set spark.conf.set("spark.databricks.io.listKeysWithPrefix.azure.enabled", "true") to re-enable native listing for Auto Loader on ADLS Gen2. Native listing was previously turned off due to performance issues, but may have led to an increase in storage costs for customers. This change was rolled out to DBR 10.4 and 9.1 in the previous maintenance update.
- [SPARK-40169] Don’t pushdown Parquet filters with no reference to data schema
- [SPARK-40460][SS] Fix streaming metrics when selecting _metadata
- [SPARK-40468][SQL] Fix column pruning in CSV when _corrupt_record is selected
- [SPARK-40055][SQL] listCatalogs should also return spark_catalog even when spark_catalog implementation is defaultSessionCatalog
- Operating system security updates.
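To illustrate the native listing setting from this update, the sketch below sets the flag and then opens an Auto Loader stream. The helper name and its parameters are hypothetical, and `spark` is assumed to be an active SparkSession on a cluster where Auto Loader (the cloudFiles source) is available.

```python
NATIVE_LISTING_FLAG = "spark.databricks.io.listKeysWithPrefix.azure.enabled"


def auto_loader_stream(spark, source_path, schema, file_format="json",
                       native_listing=True):
    """Toggle native listing on ADLS Gen2, then open an Auto Loader stream."""
    spark.conf.set(NATIVE_LISTING_FLAG, "true" if native_listing else "false")
    return (
        spark.readStream.format("cloudFiles")
        .option("cloudFiles.format", file_format)
        .schema(schema)
        .load(source_path)
    )


# Usage on a cluster (hypothetical container and schema):
# stream = auto_loader_stream(
#     spark, "abfss://data@exampleaccount.dfs.core.windows.net/in/", "id INT")
```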
September 22, 2022
- [SPARK-40315][SQL] Add hashCode() for Literal of ArrayBasedMapData
- [SPARK-40380][SQL] Fix constant-folding of InvokeLike to avoid non-serializable literal embedded in the plan
- [SPARK-40089][SQL] Fix sorting for some Decimal types
- [SPARK-39887][SQL] RemoveRedundantAliases should keep aliases that make the output of projection nodes unique
- [SPARK-40152][SQL] Fix split_part codegen compilation issue
September 6, 2022
- We have updated the permission model in Table Access Controls (Table ACLs) so that only MODIFY permissions are needed to change a table’s schema or table properties with ALTER TABLE. Previously, these operations required a user to own the table. Ownership is still required to grant permissions on a table, change its owner, change its location, or rename it. This change makes the permission model for Table ACLs more consistent with Unity Catalog.
- [SPARK-40235][CORE] Use interruptible lock instead of synchronized in Executor.updateDependencies()
- [SPARK-40212][SQL] SparkSQL castPartValue does not properly handle byte, short, or float
- [SPARK-40218][SQL] GROUPING SETS should preserve the grouping columns
- [SPARK-39976][SQL] ArrayIntersect should handle null in left expression correctly
- [SPARK-40053][CORE][SQL][TESTS] Add assume to dynamic cancel cases which requiring Python runtime environment
- [SPARK-35542][CORE][ML] Fix: Bucketizer created for multiple columns with parameters splitsArray, inputCols and outputCols can not be loaded after saving it
- [SPARK-40079][CORE] Add Imputer inputCols validation for empty input case
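The Table ACL change in this update can be summarized as a lookup from operation to the privilege it now requires. This is only an illustrative model of the rule as stated above. The operation names are hypothetical labels, not identifiers used by Databricks.

```python
# Operations that now need only MODIFY (previously required ownership).
MODIFY_SUFFICIENT = {"alter_schema", "alter_table_properties"}

# Operations that still require ownership of the table.
OWNER_ONLY = {"grant_permissions", "change_owner", "change_location", "rename"}


def required_privilege(operation):
    """Return the privilege needed for a table operation under Table ACLs."""
    if operation in MODIFY_SUFFICIENT:
        return "MODIFY"
    if operation in OWNER_ONLY:
        return "OWNER"
    raise ValueError(f"unknown operation: {operation}")


print(required_privilege("alter_schema"))  # MODIFY
print(required_privilege("rename"))        # OWNER
```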
August 24, 2022
- Shares, providers, and recipients now support SQL commands to change owners, comment, and rename.
- [SPARK-39983][CORE][SQL] Do not cache unserialized broadcast relations on the driver
- [SPARK-39912][SPARK-39828][SQL] Refine CatalogImpl
- [SPARK-39775][CORE][AVRO] Disable validate default values when parsing Avro schemas
- [SPARK-39806] Fixed the issue on queries accessing METADATA struct crash on partitioned tables
- [SPARK-39867][SQL] Global limit should not inherit OrderPreservingUnaryNode
- [SPARK-39962][PYTHON][SQL] Apply projection when group attributes are empty
- [SPARK-39839][SQL] Handle special case of null variable-length Decimal with non-zero offsetAndSize in UnsafeRow structural integrity check
- [SPARK-39713][SQL] ANSI mode: add suggestion of using try_element_at for INVALID_ARRAY_INDEX error
- [SPARK-39847][SS] Fix race condition in RocksDBLoader.loadLibrary() if caller thread is interrupted
- [SPARK-39731][SQL] Fix issue in CSV and JSON data sources when parsing dates in “yyyyMMdd” format with CORRECTED time parser policy
- Operating system security updates.
August 10, 2022
- For Delta tables with table access control, automatic schema evolution through DML statements such as INSERT and MERGE is now available for all users who have MODIFY permissions on such tables. Additionally, permissions required to perform schema evolution with COPY INTO are now lowered from OWNER to MODIFY for consistency with other commands. These changes make the table ACL security model more consistent with the Unity Catalog security model as well as with other operations such as replacing a table.
- [SPARK-39889] Enhance the error message of division by 0
- [SPARK-39795] [SQL] New SQL function: try_to_timestamp
- [SPARK-39749] Always use plain string representation on casting decimal as string under ANSI mode
- [SPARK-39625] Rename df.as to df.to
- [SPARK-39787] [SQL] Use error class in the parsing error of function to_timestamp
- [SPARK-39625] [SQL] Add Dataset.as(StructType)
- [SPARK-39689] Support 2-chars lineSep in CSV datasource
- [SPARK-39579] [SQL][PYTHON][R] Make ListFunctions/getFunction/functionExists compatible with 3 layer namespace
- [SPARK-39702] [CORE] Reduce memory overhead of TransportCipher$EncryptedMessage by using a shared byteRawChannel
- [SPARK-39575] [AVRO] add ByteBuffer#rewind after ByteBuffer#get in AvroDeserializer
- [SPARK-39265] [SQL] Fix test failure when SPARK_ANSI_SQL_MODE is enabled
- [SPARK-39441] [SQL] Speed up DeduplicateRelations
- [SPARK-39497] [SQL] Improve the analysis exception of missing map key column
- [SPARK-39476] [SQL] Disable Unwrap cast optimize when casting from Long to Float/ Double or from Integer to Float
- [SPARK-39434] [SQL] Provide runtime error query context when array index is out of bounding
Databricks Runtime 11.0 (EoS)
See Databricks Runtime 11.0 (EoS).
- November 29, 2022
- Users can configure leading and trailing whitespaces’ behavior when writing data using the Redshift connector. The following options have been added to control whitespace handling:
  - csvignoreleadingwhitespace, when set to true, removes leading whitespace from values during writes when tempformat is set to CSV or CSV GZIP. Whitespaces are retained when the config is set to false. By default, the value is true.
  - csvignoretrailingwhitespace, when set to true, removes trailing whitespace from values during writes when tempformat is set to CSV or CSV GZIP. Whitespaces are retained when the config is set to false. By default, the value is true.
- Fixed a bug with JSON parsing in Auto Loader when all columns were left as strings (cloudFiles.inferColumnTypes was not set or set to false) and the JSON contained nested objects.
- [SPARK-39650][SS] Fix incorrect value schema in streaming deduplication with backward compatibility
- Operating system security updates.
- November 15, 2022
- [SPARK-40646] JSON parsing for structs, maps, and arrays has been fixed so when a part of a record does not match the schema, the rest of the record can still be parsed correctly instead of returning nulls. To opt in to the improved behavior, set spark.sql.json.enablePartialResults to true. The flag is disabled by default to preserve the original behavior.
- November 1, 2022
- Upgraded Apache commons-text to 1.10.0.
- Fixed an issue where if a Delta table had a user-defined column named _change_type, but Change data feed was disabled on that table, data in that column would incorrectly fill with NULL values when running MERGE.
- Fixed an issue with Auto Loader where a file can be duplicated in the same micro-batch when allowOverwrites is enabled.
- [SPARK-40697][SQL] Add read-side char padding to cover external data files
- [SPARK-40596][CORE] Populate ExecutorDecommission with messages in ExecutorDecommissionInfo
- Operating system security updates.
- October 18, 2022
- [SPARK-40213][SQL] Support ASCII value conversion for Latin-1 characters
- Operating system security updates.
- October 5, 2022
- Users can set spark.conf.set("spark.databricks.io.listKeysWithPrefix.azure.enabled", "true") to re-enable native listing for Auto Loader on ADLS Gen2. Native listing was previously turned off due to performance issues, but may have led to an increase in storage costs for customers. This change was rolled out to DBR 10.4 and 9.1 in the previous maintenance update.
- [SPARK-40169] Don’t pushdown Parquet filters with no reference to data schema
- [SPARK-40460][SS] Fix streaming metrics when selecting _metadata
- [SPARK-40468][SQL] Fix column pruning in CSV when _corrupt_record is selected
- Operating system security updates.
- September 22, 2022
- [SPARK-40315][SQL] Add hashCode() for Literal of ArrayBasedMapData
- [SPARK-40380][SQL] Fix constant-folding of InvokeLike to avoid non-serializable literal embedded in the plan
- [SPARK-40089][SQL] Fix sorting for some Decimal types
- [SPARK-39887][SQL] RemoveRedundantAliases should keep aliases that make the output of projection nodes unique
- [SPARK-40152][SQL] Fix split_part codegen compilation issue
- September 6, 2022
- [SPARK-40235][CORE] Use interruptible lock instead of synchronized in Executor.updateDependencies()
- [SPARK-40212][SQL] SparkSQL castPartValue does not properly handle byte, short, or float
- [SPARK-40218][SQL] GROUPING SETS should preserve the grouping columns
- [SPARK-39976][SQL] ArrayIntersect should handle null in left expression correctly
- [SPARK-40053][CORE][SQL][TESTS] Add assume to dynamic cancel cases which requiring Python runtime environment
- [SPARK-35542][CORE][ML] Fix: Bucketizer created for multiple columns with parameters splitsArray, inputCols and outputCols can not be loaded after saving it
- [SPARK-40079][CORE] Add Imputer inputCols validation for empty input case
- August 24, 2022
- [SPARK-39983][CORE][SQL] Do not cache unserialized broadcast relations on the driver
- [SPARK-39775][CORE][AVRO] Disable validate default values when parsing Avro schemas
- [SPARK-39806] Fixed the issue on queries accessing METADATA struct crash on partitioned tables
- [SPARK-39867][SQL] Global limit should not inherit OrderPreservingUnaryNode
- [SPARK-39962][PYTHON][SQL] Apply projection when group attributes are empty
- Operating system security updates.
- August 9, 2022
- [SPARK-39713][SQL] ANSI mode: add suggestion of using try_element_at for INVALID_ARRAY_INDEX error
- [SPARK-39847] Fix race condition in RocksDBLoader.loadLibrary() if caller thread is interrupted
- [SPARK-39731][SQL] Fix issue in CSV and JSON data sources when parsing dates in “yyyyMMdd” format with CORRECTED time parser policy
- [SPARK-39889] Enhance the error message of division by 0
- [SPARK-39795][SQL] New SQL function: try_to_timestamp
- [SPARK-39749] Always use plain string representation on casting decimal as string under ANSI mode
- [SPARK-39625][SQL] Add Dataset.to(StructType)
- [SPARK-39787][SQL] Use error class in the parsing error of function to_timestamp
- Operating system security updates.
- July 27, 2022
- [SPARK-39689] Support 2-chars lineSep in CSV datasource
- [SPARK-39104][SQL] InMemoryRelation#isCachedColumnBuffersLoaded should be thread-safe
- [SPARK-39702][CORE] Reduce memory overhead of TransportCipher$EncryptedMessage by using a shared byteRawChannel
- [SPARK-39575][AVRO] add ByteBuffer#rewind after ByteBuffer#get in AvroDeserializer
- [SPARK-39497][SQL] Improve the analysis exception of missing map key column
- [SPARK-39441][SQL] Speed up DeduplicateRelations
- [SPARK-39476][SQL] Disable Unwrap cast optimize when casting from Long to Float/ Double or from Integer to Float
- [SPARK-39434][SQL] Provide runtime error query context when array index is out of bounding
- [SPARK-39570][SQL] Inline table should allow expressions with alias
- Operating system security updates.
- July 13, 2022
- Make Delta MERGE operation results consistent when source is non-deterministic.
- Fixed an issue for the cloud_files_state TVF when running on non-DBFS paths.
- Disabled Auto Loader’s use of native cloud APIs for directory listing on Azure.
- [SPARK-38796][SQL] Update to_number and try_to_number functions to allow PR with positive numbers
- [SPARK-39272][SQL] Increase the start position of query context by 1
- [SPARK-39419][SQL] Fix ArraySort to throw an exception when the comparator returns null
- Operating system security updates.
- July 5, 2022
- Improvement on error messages for a range of error classes.
- [SPARK-39451][SQL] Support casting intervals to integrals in ANSI mode
- [SPARK-39361] Don’t use Log4J2’s extended throwable conversion pattern in default logging configurations
- [SPARK-39354][SQL] Ensure show Table or view not found even if there are dataTypeMismatchError related to Filter at the same time
- [SPARK-38675][CORE] Fix race during unlock in BlockInfoManager
- [SPARK-39392][SQL] Refine ANSI error messages for try_* function hints
- [SPARK-39214][SQL][3.3] Improve errors related to CAST
- [SPARK-37939][SQL] Use error classes in the parsing errors of properties
- [SPARK-39085][SQL] Move the error message of INCONSISTENT_BEHAVIOR_CROSS_VERSION to error-classes.json
- [SPARK-39376][SQL] Hide duplicated columns in star expansion of subquery alias from NATURAL/USING JOIN
- [SPARK-39283][CORE] Fix deadlock between TaskMemoryManager and UnsafeExternalSorter.SpillableIterator
- [SPARK-39285][SQL] Spark should not check field names when reading files
- Operating system security updates.
Databricks Runtime 10.5 (EoS)
See Databricks Runtime 10.5 (EoS).
- November 1, 2022
- Fixed an issue where if a Delta table had a user-defined column named _change_type, but Change data feed was disabled on that table, data in that column would incorrectly fill with NULL values when running MERGE.
- [SPARK-40697][SQL] Add read-side char padding to cover external data files
- [SPARK-40596][CORE] Populate ExecutorDecommission with messages in ExecutorDecommissionInfo
- Operating system security updates.
- October 18, 2022
- Operating system security updates.
- October 5, 2022
- Users can set spark.conf.set("spark.databricks.io.listKeysWithPrefix.azure.enabled", "true") to re-enable native listing for Auto Loader on ADLS Gen2. Native listing was previously turned off due to performance issues, but may have led to an increase in storage costs for customers. This change was rolled out to DBR 10.4 and 9.1 in the previous maintenance update.
- reload4j has been upgraded to 1.2.19 to fix vulnerabilities.
- [SPARK-40460][SS] Fix streaming metrics when selecting _metadata
- [SPARK-40468][SQL] Fix column pruning in CSV when _corrupt_record is selected
- Operating system security updates.
- September 22, 2022
- [SPARK-40315][SQL] Add hashCode() for Literal of ArrayBasedMapData
- [SPARK-40213][SQL] Support ASCII value conversion for Latin-1 characters
- [SPARK-40380][SQL] Fix constant-folding of InvokeLike to avoid non-serializable literal embedded in the plan
- [SPARK-38404][SQL] Improve CTE resolution when a nested CTE references an outer CTE
- [SPARK-40089][SQL] Fix sorting for some Decimal types
- [SPARK-39887][SQL] RemoveRedundantAliases should keep aliases that make the output of projection nodes unique
- Operating system security updates.
- September 6, 2022
- [SPARK-40235][CORE] Use interruptible lock instead of synchronized in Executor.updateDependencies()
- [SPARK-39976][SQL] ArrayIntersect should handle null in left expression correctly
- [SPARK-40053][CORE][SQL][TESTS] Add assume to dynamic cancel cases which requiring Python runtime environment
- [SPARK-35542][CORE][ML] Fix: Bucketizer created for multiple columns with parameters splitsArray, inputCols and outputCols can not be loaded after saving it
- [SPARK-40079][CORE] Add Imputer inputCols validation for empty input case
- August 24, 2022
- [SPARK-39983][CORE][SQL] Do not cache unserialized broadcast relations on the driver
- [SPARK-39775][CORE][AVRO] Disable validate default values when parsing Avro schemas
- [SPARK-39806] Fixed the issue on queries accessing METADATA struct crash on partitioned tables
- [SPARK-39962][PYTHON][SQL] Apply projection when group attributes are empty
- [SPARK-37643][SQL] when charVarcharAsString is true, for char datatype predicate query should skip rpadding rule
- Operating system security updates.
- August 9, 2022
- [SPARK-39847] Fix race condition in RocksDBLoader.loadLibrary() if caller thread is interrupted
- [SPARK-39731][SQL] Fix issue in CSV and JSON data sources when parsing dates in “yyyyMMdd” format with CORRECTED time parser policy
- Operating system security updates.
- July 27, 2022
- [SPARK-39625][SQL] Add Dataset.as(StructType)
- [SPARK-39689] Support 2-chars lineSep in CSV datasource
- [SPARK-39104][SQL] InMemoryRelation#isCachedColumnBuffersLoaded should be thread-safe
- [SPARK-39570][SQL] Inline table should allow expressions with alias
- [SPARK-39702][CORE] Reduce memory overhead of TransportCipher$EncryptedMessage by using a shared byteRawChannel
- [SPARK-39575][AVRO] add ByteBuffer#rewind after ByteBuffer#get in AvroDeserializer
- [SPARK-39476][SQL] Disable Unwrap cast optimize when casting from Long to Float/ Double or from Integer to Float
- Operating system security updates.
- July 13, 2022
- Make Delta MERGE operation results consistent when source is non-deterministic.
- [SPARK-39355][SQL] Single column uses quoted to construct UnresolvedAttribute
- [SPARK-39548][SQL] CreateView Command with a window clause query hit a wrong window definition not found issue
- [SPARK-39419][SQL] Fix ArraySort to throw an exception when the comparator returns null
- Disabled Auto Loader’s use of native cloud APIs for directory listing on Azure.
- Operating system security updates.
- July 5, 2022
- [SPARK-39376][SQL] Hide duplicated columns in star expansion of subquery alias from NATURAL/USING JOIN
- Operating system security updates.
- June 15, 2022
- [SPARK-39283][CORE] Fix deadlock between TaskMemoryManager and UnsafeExternalSorter.SpillableIterator
- [SPARK-39285][SQL] Spark should not check field names when reading files
- [SPARK-34096][SQL] Improve performance for nth_value ignore nulls over offset window
- [SPARK-36718][SQL][FOLLOWUP] Fix the isExtractOnly check in CollapseProject
- June 2, 2022
- [SPARK-39166][SQL] Provide runtime error query context for binary arithmetic when WSCG is off
- [SPARK-39093][SQL] Avoid codegen compilation error when dividing year-month intervals or day-time intervals by an integral
- [SPARK-38990][SQL] Avoid NullPointerException when evaluating date_trunc/trunc format as a bound reference
- Operating system security updates.
- May 18, 2022
- Fixes a potential native memory leak in Auto Loader.
- [SPARK-38868][SQL] Don’t propagate exceptions from filter predicate when optimizing outer joins
- [SPARK-38796][SQL] Implement the to_number and try_to_number SQL functions according to a new specification
- [SPARK-38918][SQL] Nested column pruning should filter out attributes that do not belong to the current relation
- [SPARK-38929][SQL] Improve error messages for cast failures in ANSI
- [SPARK-38926][SQL] Output types in error messages in SQL style
- [SPARK-39084][PYSPARK] Fix df.rdd.isEmpty() by using TaskContext to stop iterator on task completion
- [SPARK-32268][SQL] Add ColumnPruning in injectBloomFilter
- [SPARK-38908][SQL] Provide query context in runtime error of Casting from String to Number/Date/Timestamp/Boolean
- [SPARK-39046][SQL] Return an empty context string if TreeNode.origin is wrongly set
- [SPARK-38974][SQL] Filter registered functions with a given database name in list functions
- [SPARK-38762][SQL] Provide query context in Decimal overflow errors
- [SPARK-38931][SS] Create root dfs directory for RocksDBFileManager with unknown number of keys on 1st checkpoint
- [SPARK-38992][CORE] Avoid using bash -c in ShellBasedGroupsMappingProvider
- [SPARK-38716][SQL] Provide query context in map key not exists error
- [SPARK-38889][SQL] Compile boolean column filters to use the bit type for MSSQL data source
- [SPARK-38698][SQL] Provide query context in runtime error of Divide/Div/Reminder/Pmod
- [SPARK-38823][SQL] Make NewInstance non-foldable to fix aggregation buffer corruption issue
- [SPARK-38809][SS] Implement option to skip null values in symmetric hash implementation of stream-stream joins
- [SPARK-38676][SQL] Provide SQL query context in runtime error message of Add/Subtract/Multiply
- [SPARK-38677][PYSPARK] Python MonitorThread should detect deadlock due to blocking I/O
- Operating system security updates.
Databricks Runtime 10.3 (EoS)
See Databricks Runtime 10.3 (EoS).
- July 27, 2022
- [SPARK-39689] Support 2-chars lineSep in CSV datasource
- [SPARK-39104][SQL] InMemoryRelation#isCachedColumnBuffersLoaded should be thread-safe
- [SPARK-39702][CORE] Reduce memory overhead of TransportCipher$EncryptedMessage by using a shared byteRawChannel
- Operating system security updates.
- July 20, 2022
- Make Delta MERGE operation results consistent when source is non-deterministic.
- [SPARK-39476][SQL] Disable Unwrap cast optimize when casting from Long to Float/ Double or from Integer to Float
- [SPARK-39548][SQL] CreateView Command with a window clause query hit a wrong window definition not found issue
- [SPARK-39419][SQL] Fix ArraySort to throw an exception when the comparator returns null
- Operating system security updates.
- July 5, 2022
- [SPARK-39376][SQL] Hide duplicated columns in star expansion of subquery alias from NATURAL/USING JOIN
- Operating system security updates.
- June 15, 2022
- [SPARK-39283][CORE] Fix deadlock between TaskMemoryManager and UnsafeExternalSorter.SpillableIterator
- [SPARK-39285][SQL] Spark should not check field names when reading files
- [SPARK-34096][SQL] Improve performance for nth_value ignore nulls over offset window
- [SPARK-36718][SQL][FOLLOWUP] Fix the isExtractOnly check in CollapseProject
- June 2, 2022
- [SPARK-38990][SQL] Avoid NullPointerException when evaluating date_trunc/trunc format as a bound reference
- Operating system security updates.
- May 18, 2022
- Fixes a potential native memory leak in Auto Loader.
- [SPARK-38918][SQL] Nested column pruning should filter out attributes that do not belong to the current relation
- [SPARK-37593][CORE] Reduce default page size by LONG_ARRAY_OFFSET if G1GC and ON_HEAP are used
- [SPARK-39084][PYSPARK] Fix df.rdd.isEmpty() by using TaskContext to stop iterator on task completion
- [SPARK-32268][SQL] Add ColumnPruning in injectBloomFilter
- [SPARK-38974][SQL] Filter registered functions with a given database name in list functions
- [SPARK-38889][SQL] Compile boolean column filters to use the bit type for MSSQL data source
- Operating system security updates.
- May 4, 2022
- Upgraded Java AWS SDK from version 1.11.655 to 1.12.1899.
- April 19, 2022
- [SPARK-38616][SQL] Keep track of SQL query text in Catalyst TreeNode
- Operating system security updates.
- April 6, 2022
- [SPARK-38631][CORE] Uses Java-based implementation for un-tarring at Utils.unpack
- Operating system security updates.
- March 22, 2022
- Changed the current working directory of notebooks on High Concurrency clusters with either table access control or credential passthrough enabled to the user’s home directory. Previously, the working directory was /databricks/driver.
- [SPARK-38437][SQL] Lenient serialization of datetime from datasource
- [SPARK-38180][SQL] Allow safe up-cast expressions in correlated equality predicates
- [SPARK-38155][SQL] Disallow distinct aggregate in lateral subqueries with unsupported predicates
- [SPARK-38325][SQL] ANSI mode: avoid potential runtime error in HashJoin.extractKeyExprAt()
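Because of the working directory change in this update, notebook code that relied on relative paths resolving under /databricks/driver may behave differently. One way to avoid depending on the current directory is to build absolute paths from an explicit base, sketched below. The helper name and the example file path are hypothetical.

```python
import os
from pathlib import PurePosixPath

# Working directory used by affected notebooks before this update.
LEGACY_WORKING_DIR = PurePosixPath("/databricks/driver")


def resolve_data_path(relative_path, base_dir=LEGACY_WORKING_DIR):
    """Return an absolute path that does not depend on os.getcwd()."""
    return str(PurePosixPath(base_dir) / relative_path)


# The current directory now depends on the cluster configuration.
print(os.getcwd())
print(resolve_data_path("output/report.csv"))
```

Passing a different `base_dir` keeps the same call sites working if files are later moved out of the driver directory.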
- March 14, 2022
- Improved transaction conflict detection for empty transactions in Delta Lake.
- [SPARK-38185][SQL] Fix data incorrect if aggregate function is empty
- [SPARK-38318][SQL] regression when replacing a dataset view
- [SPARK-38236][SQL] Absolute file paths specified in create/alter table are treated as relative
- [SPARK-35937][SQL] Extracting date field from timestamp should work in ANSI mode
- [SPARK-34069][SQL] Kill barrier tasks should respect SPARK_JOB_INTERRUPT_ON_CANCEL
- [SPARK-37707][SQL] Allow store assignment between TimestampNTZ and Date/Timestamp
- February 23, 2022
- [SPARK-27442][SQL] Remove check field name when reading/writing data in parquet
Databricks Runtime 10.2 (EoS)
See Databricks Runtime 10.2 (EoS).
- June 15, 2022
- [SPARK-39283][CORE] Fix deadlock between TaskMemoryManager and UnsafeExternalSorter.SpillableIterator
- [SPARK-39285][SQL] Spark should not check field names when reading files
- [SPARK-34096][SQL] Improve performance for nth_value ignore nulls over offset window
- June 2, 2022
- [SPARK-38918][SQL] Nested column pruning should filter out attributes that do not belong to the current relation
- [SPARK-38990][SQL] Avoid NullPointerException when evaluating date_trunc/trunc format as a bound reference
- Operating system security updates.
- May 18, 2022
- Fixes a potential native memory leak in Auto Loader.
- [SPARK-39084][PYSPARK] Fix df.rdd.isEmpty() by using TaskContext to stop iterator on task completion
- [SPARK-38889][SQL] Compile boolean column filters to use the bit type for MSSQL data source
- [SPARK-38931][SS] Create root dfs directory for RocksDBFileManager with unknown number of keys on 1st checkpoint
- Operating system security updates.
- May 4, 2022
- Upgraded Java AWS SDK from version 1.11.655 to 1.12.1899.
- April 19, 2022
- Operating system security updates.
- Miscellaneous bug fixes.
- April 6, 2022
- [SPARK-38631][CORE] Uses Java-based implementation for un-tarring at Utils.unpack
- Operating system security updates.
- March 22, 2022
- Changed the current working directory of notebooks on High Concurrency clusters with either table access control or credential passthrough enabled to the user’s home directory. Previously, the working directory was /databricks/driver.
- [SPARK-38437][SQL] Lenient serialization of datetime from datasource
- [SPARK-38180][SQL] Allow safe up-cast expressions in correlated equality predicates
- [SPARK-38155][SQL] Disallow distinct aggregate in lateral subqueries with unsupported predicates
- [SPARK-38325][SQL] ANSI mode: avoid potential runtime error in HashJoin.extractKeyExprAt()
- March 14, 2022
- Improved transaction conflict detection for empty transactions in Delta Lake.
- [SPARK-38185][SQL] Fix data incorrect if aggregate function is empty
- [SPARK-38318][SQL] regression when replacing a dataset view
- [SPARK-38236][SQL] Absolute file paths specified in create/alter table are treated as relative
- [SPARK-35937][SQL] Extracting date field from timestamp should work in ANSI mode
- [SPARK-34069][SQL] Kill barrier tasks should respect SPARK_JOB_INTERRUPT_ON_CANCEL
- [SPARK-37707][SQL] Allow store assignment between TimestampNTZ and Date/Timestamp
- February 23, 2022
- [SPARK-37577][SQL] Fix ClassCastException: ArrayType cannot be cast to StructType for Generate Pruning
- February 8, 2022
- [SPARK-27442][SQL] Remove check field name when reading/writing data in parquet.
- Operating system security updates.
- February 1, 2022
- Operating system security updates.
- January 26, 2022
- Fixed a bug where concurrent transactions on Delta tables could commit in a non-serializable order under certain rare conditions.
- Fixed a bug where the OPTIMIZE command could fail when the ANSI SQL dialect was enabled.
- January 19, 2022
- Introduced support for inlining temporary credentials to COPY INTO for loading the source data without requiring SQL ANY_FILE permissions
- Bug fixes and security enhancements.
- December 20, 2021
- Fixed a rare bug with Parquet column index based filtering.
Databricks Runtime 10.1 (EoS)
See Databricks Runtime 10.1 (EoS).
- June 15, 2022
- [SPARK-39283][CORE] Fix deadlock between TaskMemoryManager and UnsafeExternalSorter.SpillableIterator
- [SPARK-39285][SQL] Spark should not check field names when reading files
- [SPARK-34096][SQL] Improve performance for nth_value ignore nulls over offset window
- June 2, 2022
- Operating system security updates.
- May 18, 2022
- Fixes a potential native memory leak in Auto Loader.
- [SPARK-39084][PYSPARK] Fix df.rdd.isEmpty() by using TaskContext to stop iterator on task completion
- [SPARK-38889][SQL] Compile boolean column filters to use the bit type for MSSQL data source
- Operating system security updates.
- April 19, 2022
- [SPARK-37270][SQL] Fix push foldable into CaseWhen branches if elseValue is empty
- Operating system security updates.
- April 6, 2022
- [SPARK-38631][CORE] Uses Java-based implementation for un-tarring at Utils.unpack
- Operating system security updates.
- March 22, 2022
- [SPARK-38437][SQL] Lenient serialization of datetime from datasource
- [SPARK-38180][SQL] Allow safe up-cast expressions in correlated equality predicates
- [SPARK-38155][SQL] Disallow distinct aggregate in lateral subqueries with unsupported predicates
- [SPARK-38325][SQL] ANSI mode: avoid potential runtime error in HashJoin.extractKeyExprAt()
- March 14, 2022
- Improved transaction conflict detection for empty transactions in Delta Lake.
- [SPARK-38185][SQL] Fix data incorrect if aggregate function is empty
- [SPARK-38318][SQL] regression when replacing a dataset view
- [SPARK-38236][SQL] Absolute file paths specified in create/alter table are treated as relative
- [SPARK-35937][SQL] Extracting date field from timestamp should work in ANSI mode
- [SPARK-34069][SQL] Kill barrier tasks should respect SPARK_JOB_INTERRUPT_ON_CANCEL
- [SPARK-37707][SQL] Allow store assignment between TimestampNTZ and Date/Timestamp
- February 23, 2022
- [SPARK-37577][SQL] Fix ClassCastException: ArrayType cannot be cast to StructType for Generate Pruning
- February 8, 2022
- [SPARK-27442][SQL] Remove check field name when reading/writing data in parquet.
- Operating system security updates.
- February 1, 2022
- Operating system security updates.
- January 26, 2022
- Fixed a bug where concurrent transactions on Delta tables could commit in a non-serializable order under certain rare conditions.
- Fixed a bug where the OPTIMIZE command could fail when the ANSI SQL dialect was enabled.
- January 19, 2022
- Introduced support for inlining temporary credentials to COPY INTO for loading the source data without requiring SQL ANY_FILE permissions
- Fixed an out of memory issue with query result caching under certain conditions.
- Fixed an issue with USE DATABASE when a user switches the current catalog to a non-default catalog.
- Bug fixes and security enhancements.
- Operating system security updates.
- December 20, 2021
- Fixed a rare bug with Parquet column index based filtering.
Databricks Runtime 10.0 (EoS)
See Databricks Runtime 10.0 (EoS).
- April 19, 2022
- [SPARK-37270][SQL] Fix push foldable into CaseWhen branches if elseValue is empty
- Operating system security updates.
- April 6, 2022
- [SPARK-38631][CORE] Uses Java-based implementation for un-tarring at Utils.unpack
- Operating system security updates.
- March 22, 2022
- [SPARK-38437][SQL] Lenient serialization of datetime from datasource
- [SPARK-38180][SQL] Allow safe up-cast expressions in correlated equality predicates
- [SPARK-38155][SQL] Disallow distinct aggregate in lateral subqueries with unsupported predicates
- [SPARK-38325][SQL] ANSI mode: avoid potential runtime error in HashJoin.extractKeyExprAt()
- March 14, 2022
- Improved transaction conflict detection for empty transactions in Delta Lake.
- [SPARK-38185][SQL] Fix data incorrect if aggregate function is empty
- [SPARK-38318][SQL] regression when replacing a dataset view
- [SPARK-38236][SQL] Absolute file paths specified in create/alter table are treated as relative
- [SPARK-35937][SQL] Extracting date field from timestamp should work in ANSI mode
- [SPARK-34069][SQL] Kill barrier tasks should respect SPARK_JOB_INTERRUPT_ON_CANCEL
- [SPARK-37707][SQL] Allow store assignment between TimestampNTZ and Date/Timestamp
- February 23, 2022
- [SPARK-37577][SQL] Fix ClassCastException: ArrayType cannot be cast to StructType for Generate Pruning
- February 8, 2022
- [SPARK-27442][SQL] Remove check field name when reading/writing data in parquet.
- [SPARK-36905][SQL] Fix reading hive views without explicit column names
- [SPARK-37859][SQL] Fix issue that SQL tables created with JDBC with Spark 3.1 are not readable with 3.2
- Operating system security updates.
- February 1, 2022
- Operating system security updates.
- January 26, 2022
- Fixed a bug where concurrent transactions on Delta tables could commit in a non-serializable order under certain rare conditions.
- Fixed a bug where the OPTIMIZE command could fail when the ANSI SQL dialect was enabled.
- January 19, 2022
- Bug fixes and security enhancements.
- Operating system security updates.
- December 20, 2021
- Fixed a rare bug with Parquet column index based filtering.
- November 9, 2021
- Introduced additional configuration flags to enable fine grained control of ANSI behaviors.
- November 4, 2021
- Fixed a bug that could cause Structured Streaming streams to fail with an ArrayIndexOutOfBoundsException
- Fixed a race condition that might cause a query failure with an IOException like java.io.IOException: No FileSystem for scheme, or that might cause modifications to sparkContext.hadoopConfiguration to not take effect in queries.
- The Apache Spark Connector for Delta Sharing was upgraded to 0.2.0.
- November 30, 2021
- Fixed an issue with timestamp parsing where a timezone string without a colon was considered invalid.
- Fixed an out of memory issue with query result caching under certain conditions.
- Fixed an issue with USE DATABASE when a user switches the current catalog to a non-default catalog.
Databricks Runtime 9.0 (EoS)
See Databricks Runtime 9.0 (EoS).
- February 8, 2022
- Operating system security updates.
- February 1, 2022
- Operating system security updates.
- January 26, 2022
- Fixed a bug where the OPTIMIZE command could fail when the ANSI SQL dialect was enabled.
- January 19, 2022
- Bug fixes and security enhancements.
- Operating system security updates.
- November 4, 2021
- Fixed a bug that could cause Structured Streaming streams to fail with an ArrayIndexOutOfBoundsException
- Fixed a race condition that might cause a query failure with an IOException like java.io.IOException: No FileSystem for scheme, or that might cause modifications to sparkContext.hadoopConfiguration to not take effect in queries.
- The Apache Spark Connector for Delta Sharing was upgraded to 0.2.0.
- September 22, 2021
- Fixed a bug when casting a Spark array that contains null values to a string.
- September 15, 2021
- Fixed a race condition that might cause a query failure with an IOException like java.io.IOException: org.apache.spark.SparkException: Failed to get broadcast_x_piecey of broadcast_x.
- September 8, 2021
- Added support for schema name (databaseName.schemaName.tableName format) as the target table name for Azure Synapse Connector.
- Added geometry and geography JDBC types support for Spark SQL.
- [SPARK-33527][SQL] Extended the function of decode to be consistent with mainstream databases.
- [SPARK-36532][CORE][3.1] Fixed deadlock in CoarseGrainedExecutorBackend.onDisconnected to avoid executorsconnected to prevent executor shutdown hang.
- August 25, 2021
- SQL Server driver library was upgraded to 9.2.1.jre8.
- Snowflake connector was upgraded to 2.9.0.
- Fixed broken link to best trial notebook on AutoML experiment page.
Databricks Runtime 8.4 (EoS)
See Databricks Runtime 8.4 (EoS).
- January 19, 2022
- Operating system security updates.
- November 4, 2021
- Fixed a bug that could cause Structured Streaming streams to fail with an ArrayIndexOutOfBoundsException
- Fixed a race condition that might cause a query failure with an IOException like java.io.IOException: No FileSystem for scheme, or that might cause modifications to sparkContext.hadoopConfiguration to not take effect in queries.
- The Apache Spark Connector for Delta Sharing was upgraded to 0.2.0.
- September 22, 2021
- Spark JDBC driver was upgraded to 2.6.19.1030
- [SPARK-36734][SQL] Upgrade ORC to 1.5.1
- September 15, 2021
- Fixed a race condition that might cause a query failure with an IOException like java.io.IOException: org.apache.spark.SparkException: Failed to get broadcast_x_piecey of broadcast_x.
- Operating system security updates.
- September 8, 2021
- [SPARK-36532][CORE][3.1] Fixed deadlock in CoarseGrainedExecutorBackend.onDisconnected to avoid executorsconnected to prevent executor shutdown hang.
- August 25, 2021
- SQL Server driver library was upgraded to 9.2.1.jre8.
- Snowflake connector was upgraded to 2.9.0.
- Fixes a bug in credential passthrough caused by the new Parquet prefetch optimization, where user’s passthrough credential might not be found during file access.
- August 11, 2021
- Fixes a RocksDB incompatibility problem that prevents older Databricks Runtime 8.4. This fixes forward compatibility for Auto Loader, COPY INTO, and stateful streaming applications.
- Fixes a bug when using Auto Loader to read CSV files with mismatching header files. If column names do not match, the column would be filled in with nulls. Now, if a schema is provided, it assumes the schema is the same and will only save column mismatches if rescued data columns are enabled.
- Adds a new option called externalDataSource into the Azure Synapse connector to remove the CONTROL permission requirement on the database for PolyBase reading.
- July 29, 2021
- [SPARK-36034][BUILD] Rebase datetime in pushed down filters to Parquet
- [SPARK-36163][BUILD] Propagate correct JDBC properties in JDBC connector provider and add connectionProvider option
Databricks Runtime 8.3 (EoS)
See Databricks Runtime 8.3 (EoS).
- January 19, 2022
- Operating system security updates.
- November 4, 2021
- Fixed a bug that could cause Structured Streaming streams to fail with an ArrayIndexOutOfBoundsException
- Fixed a race condition that might cause a query failure with an IOException like java.io.IOException: No FileSystem for scheme, or that might cause modifications to sparkContext.hadoopConfiguration to not take effect in queries.
- September 22, 2021
- Spark JDBC driver was upgraded to 2.6.19.1030
- September 15, 2021
- Fixed a race condition that might cause a query failure with an IOException like java.io.IOException: org.apache.spark.SparkException: Failed to get broadcast_x_piecey of broadcast_x.
- Operating system security updates.
- September 8, 2021
- [SPARK-35700][SQL][WARMFIX] Read char/varchar orc table when created and written by external systems.
- [SPARK-36532][CORE][3.1] Fixed deadlock in CoarseGrainedExecutorBackend.onDisconnected to avoid executorsconnected to prevent executor shutdown hang.
- August 25, 2021
- SQL Server driver library was upgraded to 9.2.1.jre8.
- Snowflake connector was upgraded to 2.9.0.
- Fixes a bug in credential passthrough caused by the new Parquet prefetch optimization, where user’s passthrough credential might not be found during file access.
- August 11, 2021
- Fixes a bug when using Auto Loader to read CSV files with mismatching header files. If column names do not match, the column would be filled in with nulls. Now, if a schema is provided, it assumes the schema is the same and will only save column mismatches if rescued data columns are enabled.
- July 29, 2021
- Upgrade Databricks Snowflake Spark connector to 2.9.0-spark-3.1
- [SPARK-36034][BUILD] Rebase datetime in pushed down filters to Parquet
- [SPARK-36163][BUILD] Propagate correct JDBC properties in JDBC connector provider and add connectionProvider option
- July 14, 2021
- Fixed an issue when using column names with dots in Azure Synapse connector.
- Introduced database.schema.table format for Synapse Connector.
- Added support to provide databaseName.schemaName.tableName format as the target table instead of only schemaName.tableName or tableName.
- June 15, 2021
- Fixed a NoSuchElementException bug in Delta Lake optimized writes that can happen when writing large amounts of data and encountering executor losses.
- Adds SQL CREATE GROUP, DROP GROUP, ALTER GROUP, SHOW GROUPS, and SHOW USERS commands. For details, see Security statements and Show statements.
Databricks Runtime 8.2 (EoS)
See Databricks Runtime 8.2 (EoS).
September 22, 2021
- Operating system security updates.
September 15, 2021
- Fixed a race condition that might cause a query failure with an IOException like java.io.IOException: org.apache.spark.SparkException: Failed to get broadcast_x_piecey of broadcast_x.
September 8, 2021
- [SPARK-35700][SQL][WARMFIX] Read char/varchar orc table when created and written by external systems.
- [SPARK-36532][CORE][3.1] Fixed deadlock in CoarseGrainedExecutorBackend.onDisconnected to avoid executorsconnected to prevent executor shutdown hang.
August 25, 2021
- Snowflake connector was upgraded to 2.9.0.
August 11, 2021
- [SPARK-36034][SQL] Rebase datetime in pushed down filters to parquet.
July 29, 2021
- Upgrade Databricks Snowflake Spark connector to 2.9.0-spark-3.1
- [SPARK-36163][BUILD] Propagate correct JDBC properties in JDBC connector provider and add connectionProvider option
July 14, 2021
- Fixed an issue when using column names with dots in Azure Synapse connector.
- Introduced database.schema.table format for Synapse Connector.
- Added support to provide databaseName.schemaName.tableName format as the target table instead of only schemaName.tableName or tableName.
- Fixed a bug that prevents users from time traveling to older available versions with Delta tables.
June 15, 2021
- Fixes a NoSuchElementException bug in Delta Lake optimized writes that can happen when writing large amounts of data and encountering executor losses.
May 26, 2021
- Updated Python with security patch to fix Python security vulnerability (CVE-2021-3177).
April 30, 2021
- Operating system security updates.
- [SPARK-35227][BUILD] Update the resolver for spark-packages in SparkSubmit
- [SPARK-34245][CORE] Ensure Master removes executors that failed to send finished state
- Fixed an OOM issue when Auto Loader reports Structured Streaming progress metrics.
Databricks Runtime 8.1 (EoS)
See Databricks Runtime 8.1 (EoS).
September 22, 2021
- Operating system security updates.
September 15, 2021
- Fixed a race condition that might cause a query failure with an IOException like java.io.IOException: org.apache.spark.SparkException: Failed to get broadcast_x_piecey of broadcast_x.
September 8, 2021
- [SPARK-35700][SQL][WARMFIX] Read char/varchar orc table when created and written by external systems.
- [SPARK-36532][CORE][3.1] Fixed deadlock in CoarseGrainedExecutorBackend.onDisconnected to avoid executorsconnected to prevent executor shutdown hang.
August 25, 2021
- Snowflake connector was upgraded to 2.9.0.
August 11, 2021
- [SPARK-36034][SQL] Rebase datetime in pushed down filters to parquet.
July 29, 2021
- Upgrade Databricks Snowflake Spark connector to 2.9.0-spark-3.1
- [SPARK-36163][BUILD] Propagate correct JDBC properties in JDBC connector provider and add connectionProvider option
July 14, 2021
- Fixed an issue when using column names with dots in Azure Synapse connector.
- Fixed a bug that prevents users from time traveling to older available versions with Delta tables.
June 15, 2021
- Fixes a NoSuchElementException bug in Delta Lake optimized writes that can happen when writing large amounts of data and encountering executor losses.
May 26, 2021
- Updated Python with security patch to fix Python security vulnerability (CVE-2021-3177).
April 30, 2021
- Operating system security updates.
- [SPARK-35227][BUILD] Update the resolver for spark-packages in SparkSubmit
- Fixed an OOM issue when Auto Loader reports Structured Streaming progress metrics.
April 27, 2021
- [SPARK-34245][CORE] Ensure Master removes executors that failed to send finished state
- [SPARK-34856][SQL] ANSI mode: Allow casting complex types as string type
- [SPARK-35014] Fix the PhysicalAggregation pattern to not rewrite foldable expressions
- [SPARK-34769][SQL] AnsiTypeCoercion: return narrowest convertible type among TypeCollection
- [SPARK-34614][SQL] ANSI mode: Casting String to Boolean will throw exception on parse error
- [SPARK-33794][SQL] ANSI mode: Fix NextDay expression to throw runtime IllegalArgumentException when receiving invalid input under
Databricks Runtime 8.0 (EoS)
See Databricks Runtime 8.0 (EoS).
September 15, 2021
- Fixed a race condition that might cause a query failure with an IOException like java.io.IOException: org.apache.spark.SparkException: Failed to get broadcast_x_piecey of broadcast_x.
August 25, 2021
- Snowflake connector was upgraded to 2.9.0.
August 11, 2021
- [SPARK-36034][SQL] Rebase datetime in pushed down filters to parquet.
July 29, 2021
- [SPARK-36163][BUILD] Propagate correct JDBC properties in JDBC connector provider and add connectionProvider option
July 14, 2021
- Fixed an issue when using column names with dots in Azure Synapse connector.
- Fixed a bug that prevents users from time traveling to older available versions with Delta tables.
May 26, 2021
- Updated Python with security patch to fix Python security vulnerability (CVE-2021-3177).
April 30, 2021
- Operating system security updates.
- [SPARK-35227][BUILD] Update the resolver for spark-packages in SparkSubmit
- [SPARK-34245][CORE] Ensure Master removes executors that failed to send finished state
March 24, 2021
- [SPARK-34681][SQL] Fix bug for full outer shuffled hash join when building left side with non-equal condition
- [SPARK-34534] Fix blockIds order when use FetchShuffleBlocks to fetch blocks
- [SPARK-34613][SQL] Fix view does not capture disable hint config
March 9, 2021
- [SPARK-34543][SQL] Respect the spark.sql.caseSensitive config while resolving partition spec in v1 SET LOCATION
- [SPARK-34392][SQL] Support ZoneOffset +h:mm in DateTimeUtils.getZoneId
- [UI] Fix the href link of Spark DAG Visualization
- [SPARK-34436][SQL] DPP support LIKE ANY/ALL expression
Databricks Runtime 7.6 (EoS)
See Databricks Runtime 7.6 (EoS).
- August 11, 2021
- [SPARK-36034][SQL] Rebase datetime in pushed down filters to parquet.
- July 29, 2021
- [SPARK-32998][BUILD] Add ability to override default remote repos with internal repos only
- July 14, 2021
- Fixed a bug that prevents users from time traveling to older available versions with Delta tables.
- May 26, 2021
- Updated Python with security patch to fix Python security vulnerability (CVE-2021-3177).
- April 30, 2021
- Operating system security updates.
- [SPARK-35227][BUILD] Update the resolver for spark-packages in SparkSubmit
- [SPARK-34245][CORE] Ensure Master removes executors that failed to send finished state
- March 24, 2021
- [SPARK-34768][SQL] Respect the default input buffer size in Univocity
- [SPARK-34534] Fix blockIds order when use FetchShuffleBlocks to fetch blocks
- March 9, 2021
- (Azure only) Fixed an Auto Loader bug that can cause NullPointerException when using Databricks Runtime 7.6 to run an old Auto Loader stream created in Databricks Runtime 7.2
- [UI] Fix the href link of Spark DAG Visualization
- Unknown leaf-node SparkPlan is not handled correctly in SizeInBytesOnlyStatsSparkPlanVisitor
- Restore the output schema of SHOW DATABASES
- [Delta][8.0, 7.6] Fixed calculation bug in file size auto-tuning logic
- Disable staleness check for Delta table files in disk cache
- [SQL] Use correct dynamic pruning build key when range join hint is present
- Disable char type support in non-SQL code path
- Avoid NPE in DataFrameReader.schema
- Fix NPE when EventGridClient response has no entity
- Fix a read closed stream bug in Azure Auto Loader
- [SQL] Do not generate shuffle partition number advice when AOS is enabled
- February 24, 2021
- Upgraded the Spark BigQuery connector to v0.18, which introduces various bug fixes and support for Arrow and Avro iterators.
- Fixed a correctness issue that caused Spark to return incorrect results when the Parquet file’s decimal precision and scale are different from the Spark schema.
- Fixed reading failure issue on Microsoft SQL Server tables that contain spatial data types, by adding geometry and geography JDBC types support for Spark SQL.
- Introduced a new configuration spark.databricks.hive.metastore.init.reloadFunctions.enabled. This configuration controls the built in Hive initialization. When set to true, Azure Databricks reloads all functions from all databases that users have into FunctionRegistry. This is the default behavior in Hive Metastore. When set to false, Azure Databricks disables this process for optimization.
- [SPARK-34212] Fixed issues related to reading decimal data from Parquet files.
- [SPARK-34260][SQL] Fix UnresolvedException when creating temp view twice.
Databricks Runtime 7.5 (EoS)
See Databricks Runtime 7.5 (EoS).
- May 26, 2021
- Updated Python with security patch to fix Python security vulnerability (CVE-2021-3177).
- April 30, 2021
- Operating system security updates.
- [SPARK-35227][BUILD] Update the resolver for spark-packages in SparkSubmit
- [SPARK-34245][CORE] Ensure Master removes executors that failed to send finished state
- March 24, 2021
- [SPARK-34768][SQL] Respect the default input buffer size in Univocity
- [SPARK-34534] Fix blockIds order when use FetchShuffleBlocks to fetch blocks
- March 9, 2021
- (Azure only) Fixed an Auto Loader bug that can cause NullPointerException when using Databricks Runtime 7.5 to run an old Auto Loader stream created in Databricks Runtime 7.2.
- [UI] Fix the href link of Spark DAG Visualization
- Unknown leaf-node SparkPlan is not handled correctly in SizeInBytesOnlyStatsSparkPlanVisitor
- Restore the output schema of SHOW DATABASES
- Disable staleness check for Delta table files in disk cache
- [SQL] Use correct dynamic pruning build key when range join hint is present
- Disable char type support in non-SQL code path
- Avoid NPE in DataFrameReader.schema
- Fix NPE when EventGridClient response has no entity
- Fix a read closed stream bug in Azure Auto Loader
- February 24, 2021
- Upgraded the Spark BigQuery connector to v0.18, which introduces various bug fixes and support for Arrow and Avro iterators.
- Fixed a correctness issue that caused Spark to return incorrect results when the Parquet file’s decimal precision and scale are different from the Spark schema.
- Fixed reading failure issue on Microsoft SQL Server tables that contain spatial data types, by adding geometry and geography JDBC types support for Spark SQL.
- Introduced a new configuration spark.databricks.hive.metastore.init.reloadFunctions.enabled. This configuration controls the built in Hive initialization. When set to true, Azure Databricks reloads all functions from all databases that users have into FunctionRegistry. This is the default behavior in Hive Metastore. When set to false, Azure Databricks disables this process for optimization.
- [SPARK-34212] Fixed issues related to reading decimal data from Parquet files.
- [SPARK-34260][SQL] Fix UnresolvedException when creating temp view twice.
- February 4, 2021
- Fixed a regression that prevents the incremental execution of a query that sets a global limit such as SELECT * FROM table LIMIT nrows. The regression was experienced by users running queries via ODBC/JDBC with Arrow serialization enabled.
- Introduced write time checks to the Hive client to prevent the corruption of metadata in the Hive metastore for Delta tables.
- Fixed a regression that caused DBFS FUSE to fail to start when cluster environment variable configurations contain invalid bash syntax.
- January 20, 2021
- Fixed a regression in the January 12, 2021 maintenance release that can cause an incorrect AnalysisException and say the column is ambiguous in a self join. This regression happens when a user joins a DataFrame with its derived DataFrame (a so-called self-join) with the following conditions:
- These two DataFrames have common columns, but the output of the self join does not have common columns. For example, df.join(df.select($"col" as "new_col"), cond)
- The derived DataFrame excludes some columns via select, groupBy, or window.
- The join condition or the following transformation after the joined DataFrame refers to the non-common columns. For example, df.join(df.drop("a"), df("a") === 1)
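The self-join shapes described above can be reproduced with a small local sketch. This is illustrative only: the local SparkSession, the object name SelfJoinSketch, the sample rows, and the column names a and b are assumptions, not part of the maintenance release. On a runtime that includes the fix, both joins are expected to analyze without an ambiguity error.

```scala
import org.apache.spark.sql.SparkSession

object SelfJoinSketch {
  def main(args: Array[String]): Unit = {
    // Hypothetical local session, used only for illustration.
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("self-join-sketch")
      .getOrCreate()
    import spark.implicits._

    // Hypothetical sample data with columns "a" and "b".
    val df = Seq((1, "x"), (2, "y")).toDF("a", "b")

    // Shape 1: the derived DataFrame renames a column, so the
    // join output has no column name shared by both sides.
    val renamed = df.select($"a" as "new_col")
    val joinedRenamed = df.join(renamed, df("a") === renamed("new_col"))

    // Shape 2: the derived DataFrame drops column "a", and the
    // join condition refers to that non-common column.
    val joinedDropped = df.join(df.drop("a"), df("a") === 1)

    joinedRenamed.show()
    joinedDropped.show()
    spark.stop()
  }
}
```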
- January 12, 2021
- Upgrade Azure Storage SDK from 2.3.8 to 2.3.9.
- [SPARK-33593][SQL] Vector reader got incorrect data with binary partition value
- [SPARK-33480][SQL] updates the error message of char/varchar table insertion length check
Databricks Runtime 7.3 LTS (EoS)
See Databricks Runtime 7.3 LTS (EoS).
September 10, 2023
- Miscellaneous bug fixes.
August 30, 2023
- Operating system security updates.
August 15, 2023
- Operating system security updates.
June 23, 2023
- Snowflake-jdbc library is upgraded to 3.13.29 to address a security issue.
- Operating system security updates.
June 15, 2023
- [SPARK-43413][SQL] Fix IN subquery ListQuery nullability.
- Operating system security updates.
June 2, 2023
- Fixed an issue in Auto Loader where different source file formats were inconsistent when the provided schema did not include inferred partitions. This issue could cause unexpected failures when reading files with missing columns in the inferred partition schema.
May 17, 2023
- Operating system security updates.
April 25, 2023
- Operating system security updates.
April 11, 2023
- [SPARK-42967][CORE] Fix SparkListenerTaskStart.stageAttemptId when a task is started after the stage is cancelled.
- Miscellaneous bug fixes.
March 29, 2023
- Operating system security updates.
March 14, 2023
- Miscellaneous bug fixes.
February 28, 2023
- Operating system security updates.
February 16, 2023
- Operating system security updates.
January 31, 2023
- Table types of JDBC tables are now EXTERNAL by default.
January 18, 2023
- Operating system security updates.
November 29, 2022
- Miscellaneous bug fixes.
November 15, 2022
- Upgraded Apache commons-text to 1.10.0.
- Operating system security updates.
- Miscellaneous bug fixes.
November 1, 2022
- [SPARK-38542][SQL] UnsafeHashedRelation should serialize numKeys out
October 18, 2022
- Operating system security updates.
October 5, 2022
- Miscellaneous bug fixes.
- Operating system security updates.
September 22, 2022
- [SPARK-40089][SQL] Fix sorting for some Decimal types
September 6, 2022
- [SPARK-35542][CORE][ML] Fix: Bucketizer created for multiple columns with parameters splitsArray, inputCols and outputCols can not be loaded after saving it
- [SPARK-40079][CORE] Add Imputer inputCols validation for empty input case
August 24, 2022
- [SPARK-39962][PYTHON][SQL] Apply projection when group attributes are empty
- Operating system security updates.
August 9, 2022
- Operating system security updates.
July 27, 2022
- Make Delta MERGE operation results consistent when source is non-deterministic.
- Operating system security updates.
- Miscellaneous bug fixes.
July 13, 2022
- [SPARK-32680][SQL] Don’t Preprocess V2 CTAS with Unresolved Query
- Disabled Auto Loader’s use of native cloud APIs for directory listing on Azure.
- Operating system security updates.
July 5, 2022
- Operating system security updates.
- Miscellaneous bug fixes.
June 2, 2022
- [SPARK-38918][SQL] Nested column pruning should filter out attributes that do not belong to the current relation
- Operating system security updates.
May 18, 2022
- Upgrade AWS SDK version from 1.11.655 to 1.11.678.
- Operating system security updates.
- Miscellaneous bug fixes.
April 19, 2022
- Operating system security updates.
- Miscellaneous bug fixes.
April 6, 2022
- Operating system security updates.
- Miscellaneous bug fixes.
March 14, 2022
- Remove vulnerable classes from log4j 1.2.17 jar
- Miscellaneous bug fixes.
February 23, 2022
- [SPARK-37859][SQL] Do not check for metadata during schema comparison
February 8, 2022
- Upgrade Ubuntu JDK to 1.8.0.312.
- Operating system security updates.
February 1, 2022
- Operating system security updates.
January 26, 2022
- Fixed a bug where the OPTIMIZE command could fail when the ANSI SQL dialect was enabled.
January 19, 2022
- Conda defaults channel is removed from 7.3 ML LTS
- Operating system security updates.
December 7, 2021
- Operating system security updates.
November 4, 2021
- Fixed a bug that could cause Structured Streaming streams to fail with an ArrayIndexOutOfBoundsException
- Fixed a race condition that might cause a query failure with an IOException like java.io.IOException: No FileSystem for scheme, or that might cause modifications to sparkContext.hadoopConfiguration to not take effect in queries.
September 15, 2021
- Fixed a race condition that might cause a query failure with an IOException like java.io.IOException: org.apache.spark.SparkException: Failed to get broadcast_x_piecey of broadcast_x.
- Operating system security updates.
September 8, 2021
- [SPARK-35700][SQL][WARMFIX] Read char/varchar orc table when created and written by external systems.
- [SPARK-36532][CORE][3.1] Fixed deadlock in CoarseGrainedExecutorBackend.onDisconnected to avoid executorsconnected to prevent executor shutdown hang.
August 25, 2021
- Snowflake connector was upgraded to 2.9.0.
July 29, 2021
- [SPARK-36034][BUILD] Rebase datetime in pushed down filters to Parquet
- [SPARK-34508][BUILD] Skip HiveExternalCatalogVersionsSuite if network is down
July 14, 2021
- Introduced database.schema.table format for Azure Synapse connector.
- Added support to provide databaseName.schemaName.tableName format as the target table instead of only schemaName.tableName or tableName.
- Fixed a bug that prevents users from time traveling to older available versions with Delta tables.
June 15, 2021
- Fixes a NoSuchElementException bug in Delta Lake optimized writes that can happen when writing large amounts of data and encountering executor losses.
- Updated Python with security patch to fix Python security vulnerability (CVE-2021-3177).
April 30, 2021
- Operating system security updates.
- [SPARK-35227][BUILD] Update the resolver for spark-packages in SparkSubmit
- [SPARK-34245][CORE] Ensure Master removes executors that failed to send finished state
- [SPARK-35045][SQL] Add an internal option to control input buffer in univocity
March 24, 2021
- [SPARK-34768][SQL] Respect the default input buffer size in Univocity
- [SPARK-34534] Fix blockIds order when use FetchShuffleBlocks to fetch blocks
- [SPARK-33118][SQL] CREATE TEMPORARY TABLE fails with location
March 9, 2021
- The updated Azure Blob File System driver for Azure Data Lake Storage Gen2 is now enabled by default. It brings multiple stability improvements.
- Fix path separator on Windows for databricks-connect get-jar-dir
- [UI] Fix the href link of Spark DAG Visualization
- [DBCONNECT] Add support for FlatMapCoGroupsInPandas in Databricks Connect 7.3
- Restore the output schema of SHOW DATABASES
- [SQL] Use correct dynamic pruning build key when range join hint is present
- Disable staleness check for Delta table files in disk cache
- [SQL] Do not generate shuffle partition number advice when AOS is enabled
February 24, 2021
- Upgraded the Spark BigQuery connector to v0.18, which introduces various bug fixes and support for Arrow and Avro iterators.
- Fixed a correctness issue that caused Spark to return incorrect results when the Parquet file’s decimal precision and scale are different from the Spark schema.
- Fixed reading failure issue on Microsoft SQL Server tables that contain spatial data types, by adding geometry and geography JDBC types support for Spark SQL.
- Introduced a new configuration spark.databricks.hive.metastore.init.reloadFunctions.enabled. This configuration controls the built in Hive initialization. When set to true, Azure Databricks reloads all functions from all databases that users have into FunctionRegistry. This is the default behavior in Hive Metastore. When set to false, Azure Databricks disables this process for optimization.
- [SPARK-34212] Fixed issues related to reading decimal data from Parquet files.
- [SPARK-33579][UI] Fix executor blank page behind proxy.
- [SPARK-20044][UI] Support Spark UI behind front-end reverse proxy using a path prefix.
- [SPARK-33277][PYSPARK][SQL] Use ContextAwareIterator to stop consuming after the task ends.
February 4, 2021
- Fixed a regression that prevents the incremental execution of a query that sets a global limit such as SELECT * FROM table LIMIT nrows. The regression was experienced by users running queries via ODBC/JDBC with Arrow serialization enabled.
- Fixed a regression that caused DBFS FUSE to fail to start when cluster environment variable configurations contain invalid bash syntax.
January 20, 2021
- Fixed a regression in the January 12, 2021 maintenance release that can cause an incorrect AnalysisException and say the column is ambiguous in a self join. This regression happens when a user joins a DataFrame with its derived DataFrame (a so-called self-join) with the following conditions:
  - These two DataFrames have common columns, but the output of the self join does not have common columns. For example, df.join(df.select($"col" as "new_col"), cond)
  - The derived DataFrame excludes some columns via select, groupBy, or window.
  - The join condition or the following transformation after the joined DataFrame refers to the non-common columns. For example, df.join(df.drop("a"), df("a") === 1)
January 12, 2021
- Operating system security updates.
- [SPARK-33593][SQL] Vector reader got incorrect data with binary partition value
- [SPARK-33677][SQL] Skip LikeSimplification rule if pattern contains any escapeChar
- [SPARK-33592][ML][PYTHON] Pyspark ML Validator params in estimatorParamMaps may be lost after saving and reloading
- [SPARK-33071][SPARK-33536][SQL] Avoid changing dataset_id of LogicalPlan in join() to not break DetectAmbiguousSelfJoin
December 8, 2020
- [SPARK-33587][CORE] Kill the executor on nested fatal errors
- [SPARK-27421][SQL] Fix filter for int column and value class java.lang.String when pruning partition column
- [SPARK-33316][SQL] Support user provided nullable Avro schema for non-nullable catalyst schema in Avro writing
- Spark Jobs launched using Databricks Connect could hang indefinitely with Executor$TaskRunner.$anonfun$copySessionState in executor stack trace
- Operating system security updates.
December 1, 2020
- [SPARK-33404][SQL][3.0] Fix incorrect results in date_trunc expression
- [SPARK-33339][PYTHON] Pyspark application will hang due to non Exception error
- [SPARK-33183][SQL][HOTFIX] Fix Optimizer rule EliminateSorts and add a physical rule to remove redundant sorts
- [SPARK-33371][PYTHON][3.0] Update setup.py and tests for Python 3.9
- [SPARK-33391][SQL] element_at with CreateArray not respect one based index.
- [SPARK-33306][SQL] Timezone is needed when cast date to string
- [SPARK-33260][SQL] Fix incorrect results from SortExec when sortOrder is Stream
November 5, 2020
- Fix ABFS and WASB locking with regard to UserGroupInformation.getCurrentUser().
- Fix an infinite loop bug when Avro reader reads the MAGIC bytes.
- Add support for the USAGE privilege.
- Performance improvements for privilege checking in table access control.
October 13, 2020
- Operating system security updates.
- You can read and write from DBFS using the FUSE mount at /dbfs/ when on a high concurrency credential passthrough enabled cluster. Regular mounts are supported but mounts that need passthrough credentials are not supported yet.
- [SPARK-32999][SQL] Use Utils.getSimpleName to avoid hitting Malformed class name in TreeNode
- [SPARK-32585][SQL] Support scala enumeration in ScalaReflection
- Fixed listing directories in FUSE mount that contain file names with invalid XML characters
- FUSE mount no longer uses ListMultipartUploads
September 29, 2020
- [SPARK-32718][SQL] Remove unnecessary keywords for interval units
- [SPARK-32635][SQL] Fix foldable propagation
- Add a new config spark.shuffle.io.decoder.consolidateThreshold. Set the config value to Long.MAX_VALUE to skip the consolidation of netty FrameBuffers, which prevents java.lang.IndexOutOfBoundsException in corner cases.
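As a rough illustration of the setting above, the following Python sketch builds the corresponding Spark configuration entry. Long.MAX_VALUE is the largest value of a signed 64-bit Java long, 2^63 - 1. The helper name and the space-separated "key value" output format are assumptions made for this example; the release note does not specify them.

```python
# Largest value of a signed 64-bit Java long (Long.MAX_VALUE).
JAVA_LONG_MAX = 2**63 - 1

CONSOLIDATE_THRESHOLD_KEY = "spark.shuffle.io.decoder.consolidateThreshold"


def consolidate_threshold_entry(value: int = JAVA_LONG_MAX) -> str:
    """Return a 'key value' Spark config line (hypothetical helper)."""
    if not 0 <= value <= JAVA_LONG_MAX:
        raise ValueError("value must fit in a non-negative Java long")
    return f"{CONSOLIDATE_THRESHOLD_KEY} {value}"


if __name__ == "__main__":
    # Print the entry that uses Long.MAX_VALUE as the threshold.
    print(consolidate_threshold_entry())
```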
April 25, 2023
- Operating system security updates.
April 11, 2023
- Miscellaneous bug fixes.
March 29, 2023
- Miscellaneous bug fixes.
March 14, 2023
- Operating system security updates.
February 28, 2023
- Operating system security updates.
February 16, 2023
- Operating system security updates.
January 31, 2023
- Miscellaneous bug fixes.
January 18, 2023
- Operating system security updates.
November 29, 2022
- Operating system security updates.
November 15, 2022
- Operating system security updates.
- Miscellaneous bug fixes.
November 1, 2022
- Operating system security updates.
October 18, 2022
- Operating system security updates.
October 5, 2022
- Operating system security updates.
August 24, 2022
- Operating system security updates.
August 9, 2022
- Operating system security updates.
July 27, 2022
- Operating system security updates.
July 5, 2022
- Operating system security updates.
June 2, 2022
- Operating system security updates.
May 18, 2022
- Operating system security updates.
April 19, 2022
- Operating system security updates.
- Miscellaneous bug fixes.
April 6, 2022
- Operating system security updates.
- Miscellaneous bug fixes.
March 14, 2022
- Miscellaneous bug fixes.
February 23, 2022
- Miscellaneous bug fixes.
February 8, 2022
- Upgrade Ubuntu JDK to 1.8.0.312.
- Operating system security updates.
February 1, 2022
- Operating system security updates.
January 19, 2022
- Operating system security updates.
September 22, 2021
- Operating system security updates.
April 30, 2021
- Operating system security updates.
- [SPARK-35227][BUILD] Update the resolver for spark-packages in SparkSubmit
January 12, 2021
- Operating system security updates.
December 8, 2020
- [SPARK-27421][SQL] Fix filter for int column and value class java.lang.String when pruning partition column
- Operating system security updates.
December 1, 2020
- [SPARK-33260][SQL] Fix incorrect results from SortExec when sortOrder is Stream
November 3, 2020
- Upgraded Java version from 1.8.0_252 to 1.8.0_265.
- Fix ABFS and WASB locking with regard to UserGroupInformation.getCurrentUser()
October 13, 2020
- Operating system security updates.
Databricks Runtime 6.4 Extended Support (EoS)
See Databricks Runtime 6.4 (EoS) and Databricks Runtime 6.4 Extended Support (EoS).
July 5, 2022
- Operating system security updates.
- Miscellaneous bug fixes.
June 2, 2022
- Operating system security updates.
May 18, 2022
- Operating system security updates.
April 19, 2022
- Operating system security updates.
- Miscellaneous bug fixes.
April 6, 2022
- Operating system security updates.
- Miscellaneous bug fixes.
March 14, 2022
- Remove vulnerable classes from log4j 1.2.17 jar
- Miscellaneous bug fixes.
February 23, 2022
- Miscellaneous bug fixes.
February 8, 2022
- Upgrade Ubuntu JDK to 1.8.0.312.
- Operating system security updates.
February 1, 2022
- Operating system security updates.
January 26, 2022
- Fixed a bug where the OPTIMIZE command could fail when the ANSI SQL dialect was enabled.
January 19, 2022
- Operating system security updates.
December 8, 2021
- Operating system security updates.
September 22, 2021
- Operating system security updates.
June 15, 2021
- [SPARK-35576][SQL] Redact the sensitive info in the result of Set command
June 7, 2021
- Add a new config called spark.sql.maven.additionalRemoteRepositories, a comma-delimited string config of the optional additional remote maven mirror. The value defaults to https://maven-central.storage-download.googleapis.com/maven2/.
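Because the value is a single comma-delimited string, several mirrors can be combined before the config is set. The sketch below shows one way to assemble such a value in Python. The helper name and the second mirror URL are hypothetical; only the config key and the default URL come from the note above, and the "key value" output line is an assumed format.

```python
DEFAULT_MIRROR = "https://maven-central.storage-download.googleapis.com/maven2/"


def join_repositories(repositories):
    """Join repository URLs into one comma-delimited value (hypothetical helper)."""
    # Trim whitespace and drop empty entries before joining.
    cleaned = [url.strip() for url in repositories if url.strip()]
    for url in cleaned:
        if "," in url:
            raise ValueError(f"URL must not contain a comma: {url!r}")
    return ",".join(cleaned)


if __name__ == "__main__":
    # The second URL is a made-up placeholder for an internal mirror.
    value = join_repositories([DEFAULT_MIRROR, "https://mirror.example.com/maven2/"])
    print(f"spark.sql.maven.additionalRemoteRepositories {value}")
```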
April 30, 2021
- Operating system security updates.
- [SPARK-35227][BUILD] Update the resolver for spark-packages in SparkSubmit
March 9, 2021
- Port HADOOP-17215 to the Azure Blob File System driver (Support for conditional overwrite).
- Fix path separator on Windows for databricks-connect get-jar-dir
- Added support for Hive metastore versions 2.3.5, 2.3.6, and 2.3.7
- Arrow “totalResultsCollected” reported incorrectly after spill
February 24, 2021
- Introduced a new configuration spark.databricks.hive.metastore.init.reloadFunctions.enabled. This configuration controls the built in Hive initialization. When set to true, Azure Databricks reloads all functions from all databases that users have into FunctionRegistry. This is the default behavior in Hive Metastore. When set to false, Azure Databricks disables this process for optimization.
February 4, 2021
- Fixed a regression that prevents the incremental execution of a query that sets a global limit such as SELECT * FROM table LIMIT nrows. The regression was experienced by users running queries via ODBC/JDBC with Arrow serialization enabled.
- Fixed a regression that caused DBFS FUSE to fail to start when cluster environment variable configurations contain invalid bash syntax.
January 12, 2021
- Operating system security updates.
December 8, 2020
- [SPARK-27421][SQL] Fix filter for int column and value class java.lang.String when pruning partition column
- [SPARK-33183][SQL] Fix Optimizer rule EliminateSorts and add a physical rule to remove redundant sorts
- [Runtime 6.4 ML GPU] We previously installed an incorrect version (2.7.8-1+cuda11.1) of NCCL. This release corrects it to 2.4.8-1+cuda10.0 that is compatible with CUDA 10.0.
- Operating system security updates.
December 1, 2020
- [SPARK-33260][SQL] Fix incorrect results from SortExec when sortOrder is Stream
- [SPARK-32635][SQL] Fix foldable propagation
November 3, 2020
- Upgraded Java version from 1.8.0_252 to 1.8.0_265.
- Fix ABFS and WASB locking with regard to UserGroupInformation.getCurrentUser()
- Fix an infinite loop bug of Avro reader when reading the MAGIC bytes.
October 13, 2020
- Operating system security updates.
- [SPARK-32999][SQL][2.4] Use Utils.getSimpleName to avoid hitting Malformed class name in TreeNode
- Fixed listing directories in FUSE mount that contain file names with invalid XML characters
- FUSE mount no longer uses ListMultipartUploads
September 24, 2020
- Fixed a previous limitation where passthrough on a standard cluster would still restrict the filesystem implementation that the user uses. Users can now access local filesystems without restrictions.
- Operating system security updates.
September 8, 2020
- A new parameter was created for Azure Synapse Analytics, maxbinlength. This parameter is used to control the column length of BinaryType columns, and is translated as VARBINARY(maxbinlength). It can be set using .option("maxbinlength", n), where 0 < n <= 8000.
- Update Azure Storage SDK to 8.6.4 and enable TCP keep alive on connections made by the WASB driver
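To make the documented range concrete, the following Python sketch validates a candidate maxbinlength value (0 < n <= 8000) and shows the VARBINARY type it would translate to. The function names are hypothetical and the code does not call the Azure Synapse connector; it only mirrors the rule stated above.

```python
MAX_BIN_LENGTH_LIMIT = 8000  # upper bound stated in the release note


def validate_maxbinlength(n: int) -> int:
    """Return n if it satisfies 0 < n <= 8000, otherwise raise an error."""
    if isinstance(n, bool) or not isinstance(n, int):
        raise TypeError("maxbinlength must be an integer")
    if not 0 < n <= MAX_BIN_LENGTH_LIMIT:
        raise ValueError(
            f"maxbinlength must satisfy 0 < n <= {MAX_BIN_LENGTH_LIMIT}, got {n}"
        )
    return n


def varbinary_type(n: int) -> str:
    """Column type that a BinaryType column is translated to, per the note."""
    return f"VARBINARY({validate_maxbinlength(n)})"


if __name__ == "__main__":
    print(varbinary_type(4000))
```

A value that passes this check would then be supplied to the connector with .option("maxbinlength", n), as described above.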
August 25, 2020
- Fixed ambiguous attribute resolution in self-merge
August 18, 2020
- [SPARK-32431][SQL] Check duplicate nested columns in read from in-built datasources
- Fixed a race condition in the AQS connector when using Trigger.Once.
August 11, 2020
- [SPARK-28676][CORE] Avoid Excessive logging from ContextCleaner
August 3, 2020
- You can now use the LDA transform function on a passthrough-enabled cluster.
- Operating system security updates.
July 7, 2020
- Upgraded Java version from 1.8.0_232 to 1.8.0_252.
April 21, 2020
- [SPARK-31312][SQL] Cache Class instance for the UDF instance in HiveFunctionWrapper
April 7, 2020
- To resolve an issue with pandas udf not working with PyArrow 0.15.0 and above, we added an environment variable (ARROW_PRE_0_15_IPC_FORMAT=1) to enable support for those versions of PyArrow. See the instructions in [SPARK-29367].
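One way to confirm that the variable is visible to a Python process is to read it from the environment. This is a minimal sketch: the function name is hypothetical, and the code only inspects the environment rather than changing how PyArrow behaves.

```python
import os

ARROW_COMPAT_VAR = "ARROW_PRE_0_15_IPC_FORMAT"


def legacy_arrow_ipc_enabled(environ=None) -> bool:
    """Return True if ARROW_PRE_0_15_IPC_FORMAT is set to "1" (hypothetical helper)."""
    # Fall back to the real process environment when no mapping is passed in.
    env = os.environ if environ is None else environ
    return env.get(ARROW_COMPAT_VAR) == "1"


if __name__ == "__main__":
    print(legacy_arrow_ipc_enabled())
```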
March 10, 2020
- Optimized autoscaling is now used by default on interactive clusters on the Security plan.
- The Snowflake connector (spark-snowflake_2.11) included in Databricks Runtime is updated to version 2.5.9. snowflake-jdbc is updated to version 3.12.0.
Databricks Runtime 5.5 LTS (EoS)
See Databricks Runtime 5.5 LTS (EoS) and Databricks Runtime 5.5 Extended Support (EoS).
December 8, 2021
- Operating system security updates.
September 22, 2021
- Operating system security updates.
August 25, 2021
- Downgraded some previously upgraded Python packages in 5.5 ML Extended Support Release to maintain better parity with 5.5 ML LTS (now deprecated). See /release-notes/runtime/5.5xml.md for the updated differences between the two versions.
June 15, 2021
- [SPARK-35576][SQL] Redact the sensitive info in the result of Set command
June 7, 2021
- Add a new config called spark.sql.maven.additionalRemoteRepositories, a comma-delimited string config of the optional additional remote maven mirror. The value defaults to https://maven-central.storage-download.googleapis.com/maven2/.
April 30, 2021
- Operating system security updates.
- [SPARK-35227][BUILD] Update the resolver for spark-packages in SparkSubmit
March 9, 2021
- Port HADOOP-17215 to the Azure Blob File System driver (Support for conditional overwrite).
February 24, 2021
- Introduced a new configuration spark.databricks.hive.metastore.init.reloadFunctions.enabled. This configuration controls the built in Hive initialization. When set to true, Azure Databricks reloads all functions from all databases that users have into FunctionRegistry. This is the default behavior in Hive Metastore. When set to false, Azure Databricks disables this process for optimization.
January 12, 2021
- Operating system security updates.
- Fix for [HADOOP-17130].
December 8, 2020
- [SPARK-27421][SQL] Fix filter for int column and value class java.lang.String when pruning partition column
- Operating system security updates.
December 1, 2020
- [SPARK-33260][SQL] Fix incorrect results from SortExec when sortOrder is Stream
- [SPARK-32635][SQL] Fix foldable propagation
October 29, 2020
- Upgraded Java version from 1.8.0_252 to 1.8.0_265.
- Fix ABFS and WASB locking with regard to UserGroupInformation.getCurrentUser()
- Fix an infinite loop bug of Avro reader when reading the MAGIC bytes.
October 13, 2020
- Operating system security updates.
- [SPARK-32999][SQL][2.4] Use Utils.getSimpleName to avoid hitting Malformed class name in TreeNode
September 24, 2020
- Operating system security updates.
September 8, 2020
- A new parameter was created for Azure Synapse Analytics, maxbinlength. This parameter is used to control the column length of BinaryType columns, and is translated as VARBINARY(maxbinlength). It can be set using .option("maxbinlength", n), where 0 < n <= 8000.
August 18, 2020
- [SPARK-32431][SQL] Check duplicate nested columns in read from in-built datasources
- Fixed a race condition in the AQS connector when using Trigger.Once.
August 11, 2020
- [SPARK-28676][CORE] Avoid Excessive logging from ContextCleaner
August 3, 2020
- Operating system security updates.
July 7, 2020
- Upgraded Java version from 1.8.0_232 to 1.8.0_252.
April 21, 2020
- [SPARK-31312][SQL] Cache Class instance for the UDF instance in HiveFunctionWrapper
April 7, 2020
- To resolve an issue with pandas udf not working with PyArrow 0.15.0 and above, we added an environment variable (ARROW_PRE_0_15_IPC_FORMAT=1) to enable support for those versions of PyArrow. See the instructions in [SPARK-29367].
March 25, 2020
- The Snowflake connector (spark-snowflake_2.11) included in Databricks Runtime is updated to version 2.5.9. snowflake-jdbc is updated to version 3.12.0.
March 10, 2020
- Job output, such as log output emitted to stdout, is subject to a 20MB size limit. If the total output has a larger size, the run will be canceled and marked as failed. To avoid encountering this limit, you can prevent stdout from being returned from the driver by setting the spark.databricks.driver.disableScalaOutput Spark configuration to true. By default the flag value is false. The flag controls cell output for Scala JAR jobs and Scala notebooks. If the flag is enabled, Spark does not return job execution results to the client. The flag does not affect the data that is written in the cluster’s log files. Setting this flag is recommended only for automated clusters for JAR jobs, because it will disable notebook results.
February 18, 2020
- [SPARK-24783][SQL] spark.sql.shuffle.partitions=0 should throw exception
- Credential passthrough with ADLS Gen2 has a performance degradation due to incorrect thread local handling when ADLS client prefetching is enabled. This release disables ADLS Gen2 prefetching when credential passthrough is enabled until we have a proper fix.
January 28, 2020
- [SPARK-30447][SQL] Constant propagation nullability issue.
January 14, 2020
- Upgraded Java version from 1.8.0_222 to 1.8.0_232.
November 19, 2019
- [SPARK-29743] [SQL] sample should set needCopyResult to true if its child’s needCopyResult is true
- R version was unintentionally upgraded to 3.6.1 from 3.6.0. We downgraded it back to 3.6.0.
November 5, 2019
- Upgraded Java version from 1.8.0_212 to 1.8.0_222.
October 23, 2019
- [SPARK-29244][CORE] Prevent freed page in BytesToBytesMap free again
October 8, 2019
- Server side changes to allow Simba Apache Spark ODBC driver to reconnect and continue after a connection failure during fetching results (requires Simba Apache Spark ODBC driver version 2.6.10).
- Fixed an issue affecting use of the Optimize command on table ACL enabled clusters.
- Fixed an issue where pyspark.ml libraries would fail due to Scala UDF forbidden error on table ACL and credential passthrough enabled clusters.
- Allowlisted SerDe and SerDeUtil methods for credential passthrough.
- Fixed NullPointerException when checking error code in the WASB client.
September 24, 2019
- Improved stability of Parquet writer.
- Fixed a problem where a Thrift query cancelled before it starts executing could get stuck in the STARTED state.
September 10, 2019
- Add thread safe iterator to BytesToBytesMap
- [SPARK-27992][SPARK-28881] Allow Python to join with connection thread to propagate errors
- Fixed a bug affecting certain global aggregation queries.
- Improved credential redaction.
- [SPARK-27330][SS] support task abort in foreach writer
- [SPARK-28642] Hide credentials in SHOW CREATE TABLE
- [SPARK-28699][SQL] Disable using radix sort for ShuffleExchangeExec in repartition case
August 27, 2019
- [SPARK-20906][SQL] Allow user-specified schema in the API to_avro with schema registry
- [SPARK-27838][SQL] Support user provided non-nullable avro schema for nullable catalyst schema without any null record
- Improvement on Delta Lake time travel
- Fixed an issue affecting certain transform expressions
- Supports broadcast variables when Process Isolation is enabled
August 13, 2019
- Delta streaming source should check the latest protocol of a table
- [SPARK-28260] Add CLOSED state to ExecutionState
- [SPARK-28489][SS] Fix a bug that KafkaOffsetRangeCalculator.getRanges may drop offsets
July 30, 2019
- [SPARK-28015][SQL] Check stringToDate() consumes entire input for the yyyy and yyyy-[m]m formats
- [SPARK-28308][CORE] CalendarInterval sub-second part should be padded before parsing
- [SPARK-27485] EnsureRequirements.reorder should handle duplicate expressions gracefully
- [SPARK-28355][CORE][PYTHON] Use Spark conf for threshold at which UDF is compressed by broadcast
Databricks Light 2.4 Extended Support
See Databricks Light 2.4 (EoS) and Databricks Light 2.4 Extended Support (EoS).
Databricks Runtime 7.4 (EoS)
See Databricks Runtime 7.4 (EoS).
April 30, 2021
- Operating system security updates.
- [SPARK-35227][BUILD] Update the resolver for spark-packages in SparkSubmit
- [SPARK-34245][CORE] Ensure Master removes executors that failed to send finished state
- [SPARK-35045][SQL] Add an internal option to control input buffer in univocity and a configuration for CSV input buffer size
March 24, 2021
- [SPARK-34768][SQL] Respect the default input buffer size in Univocity
- [SPARK-34534] Fix blockIds order when use FetchShuffleBlocks to fetch blocks
March 9, 2021
- The updated Azure Blob File System driver for Azure Data Lake Storage Gen2 is now enabled by default. It brings multiple stability improvements.
- [ES-67926][UI] Fix the href link of Spark DAG Visualization
- [ES-65064] Restore the output schema of SHOW DATABASES
- [SC-70522][SQL] Use correct dynamic pruning build key when range join hint is present
- [SC-35081] Disable staleness check for Delta table files in disk cache
- [SC-70640] Fix NPE when EventGridClient response has no entity
- [SC-70220][SQL] Do not generate shuffle partition number advice when AOS is enabled
February 24, 2021
- Upgraded the Spark BigQuery connector to v0.18, which introduces various bug fixes and support for Arrow and Avro iterators.
- Fixed a correctness issue that caused Spark to return incorrect results when the Parquet file’s decimal precision and scale are different from the Spark schema.
- Fixed reading failure issue on Microsoft SQL Server tables that contain spatial data types, by adding geometry and geography JDBC types support for Spark SQL.
- Introduced a new configuration spark.databricks.hive.metastore.init.reloadFunctions.enabled. This configuration controls the built in Hive initialization. When set to true, Azure Databricks reloads all functions from all databases that users have into FunctionRegistry. This is the default behavior in Hive Metastore. When set to false, Azure Databricks disables this process for optimization.
- [SPARK-34212] Fixed issues related to reading decimal data from Parquet files.
- [SPARK-33579][UI] Fix executor blank page behind proxy.
- [SPARK-20044][UI] Support Spark UI behind front-end reverse proxy using a path prefix.
- [SPARK-33277][PYSPARK][SQL] Use ContextAwareIterator to stop consuming after the task ends.
February 4, 2021
- Fixed a regression that prevents the incremental execution of a query that sets a global limit such as SELECT * FROM table LIMIT nrows. The regression was experienced by users running queries via ODBC/JDBC with Arrow serialization enabled.
- Fixed a regression that caused DBFS FUSE to fail to start when cluster environment variable configurations contain invalid bash syntax.
January 20, 2021
- Fixed a regression in the January 12, 2021 maintenance release that can cause an incorrect AnalysisException and say the column is ambiguous in a self join. This regression happens when a user joins a DataFrame with its derived DataFrame (a so-called self-join) with the following conditions:
  - These two DataFrames have common columns, but the output of the self join does not have common columns. For example, df.join(df.select($"col" as "new_col"), cond)
  - The derived DataFrame excludes some columns via select, groupBy, or window.
  - The join condition or the following transformation after the joined DataFrame refers to the non-common columns. For example, df.join(df.drop("a"), df("a") === 1)
January 12, 2021
- Operating system security updates.
- [SPARK-33593][SQL] Vector reader got incorrect data with binary partition value
- [SPARK-33677][SQL] Skip LikeSimplification rule if pattern contains any escapeChar
- [SPARK-33071][SPARK-33536][SQL] Avoid changing dataset_id of LogicalPlan in join() to not break DetectAmbiguousSelfJoin
December 8, 2020
- [SPARK-33587][CORE] Kill the executor on nested fatal errors
- [SPARK-27421][SQL] Fix filter for int column and value class java.lang.String when pruning partition column
- [SPARK-33316][SQL] Support user provided nullable Avro schema for non-nullable catalyst schema in Avro writing
- Operating system security updates.
December 1, 2020
- [SPARK-33404][SQL][3.0] Fix incorrect results in date_trunc expression
- [SPARK-33339][PYTHON] Pyspark application will hang due to non Exception error
- [SPARK-33183][SQL][HOTFIX] Fix Optimizer rule EliminateSorts and add a physical rule to remove redundant sorts
- [SPARK-33371][PYTHON][3.0] Update setup.py and tests for Python 3.9
- [SPARK-33391][SQL] element_at with CreateArray not respect one based index.
- [SPARK-33306][SQL] Timezone is needed when cast date to string
- [SPARK-33260][SQL] Fix incorrect results from SortExec when sortOrder is Stream
- [SPARK-33272][SQL] prune the attributes mapping in QueryPlan.transformUpWithNewOutput
Databricks Runtime 7.2 (EoS)
See Databricks Runtime 7.2 (EoS).
February 4, 2021
- Fixed a regression that prevents the incremental execution of a query that sets a global limit such as SELECT * FROM table LIMIT nrows. The regression was experienced by users running queries via ODBC/JDBC with Arrow serialization enabled.
- Fixed a regression that caused DBFS FUSE to fail to start when cluster environment variable configurations contain invalid bash syntax.
January 20, 2021
- Fixed a regression in the January 12, 2021 maintenance release that can cause an incorrect AnalysisException and say the column is ambiguous in a self join. This regression happens when a user joins a DataFrame with its derived DataFrame (a so-called self-join) with the following conditions:
  - These two DataFrames have common columns, but the output of the self join does not have common columns. For example, df.join(df.select($"col" as "new_col"), cond)
  - The derived DataFrame excludes some columns via select, groupBy, or window.
  - The join condition or the following transformation after the joined DataFrame refers to the non-common columns. For example, df.join(df.drop("a"), df("a") === 1)
January 12, 2021
- Operating system security updates.
- [SPARK-33593][SQL] Vector reader got incorrect data with binary partition value
- [SPARK-33677][SQL] Skip LikeSimplification rule if pattern contains any escapeChar
- [SPARK-33071][SPARK-33536][SQL] Avoid changing dataset_id of LogicalPlan in join() to not break DetectAmbiguousSelfJoin
December 8, 2020
- [SPARK-27421][SQL] Fix filter for int column and value class java.lang.String when pruning partition column
- [SPARK-33404][SQL] Fix incorrect results in date_trunc expression
- [SPARK-33339][PYTHON] Pyspark application will hang due to non Exception error
- [SPARK-33183][SQL] Fix Optimizer rule EliminateSorts and add a physical rule to remove redundant sorts
- [SPARK-33391][SQL] element_at with CreateArray not respect one based index.
- Operating system security updates.
December 1, 2020
- [SPARK-33306][SQL] Timezone is needed when cast date to string
- [SPARK-33260][SQL] Fix incorrect results from SortExec when sortOrder is Stream
November 3, 2020
- Upgraded Java version from 1.8.0_252 to 1.8.0_265.
- Fix ABFS and WASB locking with regard to UserGroupInformation.getCurrentUser()
- Fix an infinite loop bug of Avro reader when reading the MAGIC bytes.
October 13, 2020
- Operating system security updates.
- [SPARK-32999][SQL] Use Utils.getSimpleName to avoid hitting Malformed class name in TreeNode
- Fixed listing directories in FUSE mount that contain file names with invalid XML characters
- FUSE mount no longer uses ListMultipartUploads
September 29, 2020
- [SPARK-28863][SQL][WARMFIX] Introduce AlreadyOptimized to prevent reanalysis of V1FallbackWriters
- [SPARK-32635][SQL] Fix foldable propagation
- Add a new config spark.shuffle.io.decoder.consolidateThreshold. Set the config value to Long.MAX_VALUE to skip the consolidation of netty FrameBuffers, which prevents java.lang.IndexOutOfBoundsException in corner cases.
September 24, 2020
- [SPARK-32764][SQL] -0.0 should be equal to 0.0
- [SPARK-32753][SQL] Only copy tags to node with no tags when transforming plans
- [SPARK-32659][SQL] Fix the data issue of inserted Dynamic Partition Pruning on non-atomic type
- Operating system security updates.
September 8, 2020
- A new parameter was created for Azure Synapse Analytics, maxbinlength. This parameter is used to control the column length of BinaryType columns, and is translated as VARBINARY(maxbinlength). It can be set using .option("maxbinlength", n), where 0 < n <= 8000.
Databricks Runtime 7.1 (EoS)
See Databricks Runtime 7.1 (EoS).
February 4, 2021
- Fixed a regression that caused DBFS FUSE to fail to start when cluster environment variable configurations contain invalid bash syntax.
January 20, 2021
- Fixed a regression in the January 12, 2021 maintenance release that can cause an incorrect AnalysisException and say the column is ambiguous in a self join. This regression happens when a user joins a DataFrame with its derived DataFrame (a so-called self-join) with the following conditions:
  - These two DataFrames have common columns, but the output of the self join does not have common columns. For example, df.join(df.select($"col" as "new_col"), cond)
  - The derived DataFrame excludes some columns via select, groupBy, or window.
  - The join condition or the following transformation after the joined DataFrame refers to the non-common columns. For example, df.join(df.drop("a"), df("a") === 1)
January 12, 2021
- Operating system security updates.
- [SPARK-33593][SQL] Vector reader got incorrect data with binary partition value
- [SPARK-33677][SQL] Skip LikeSimplification rule if pattern contains any escapeChar
- [SPARK-33071][SPARK-33536][SQL] Avoid changing dataset_id of LogicalPlan in join() to not break DetectAmbiguousSelfJoin
December 8, 2020
- [SPARK-27421][SQL] Fix filter for int column and value class java.lang.String when pruning partition column
- Spark Jobs launched using Databricks Connect could hang indefinitely with Executor$TaskRunner.$anonfun$copySessionState in executor stack trace
- Operating system security updates.
December 1, 2020
- [SPARK-33404][SQL][3.0] Fix incorrect results in date_trunc expression
- [SPARK-33339][PYTHON] Pyspark application will hang due to non Exception error
- [SPARK-33183][SQL][HOTFIX] Fix Optimizer rule EliminateSorts and add a physical rule to remove redundant sorts
- [SPARK-33371][PYTHON][3.0] Update setup.py and tests for Python 3.9
- [SPARK-33391][SQL] element_at with CreateArray not respect one based index.
- [SPARK-33306][SQL] Timezone is needed when cast date to string
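The SPARK-33391 entry above refers to the one-based indexing of element_at. The sketch below is a pure-Python model of the behavior documented for Spark SQL's element_at on arrays (with ANSI mode off), written only to make "one based index" concrete. It is not Spark's implementation and does not reproduce the bug that was fixed.

```python
from typing import Optional, Sequence, TypeVar

T = TypeVar("T")


def element_at(values: Sequence[T], index: int) -> Optional[T]:
    """Model of one-based array access; a negative index counts from the end."""
    if index == 0:
        # Index 0 is invalid because array indices start at 1.
        raise ValueError("array indices start at 1")
    if abs(index) > len(values):
        # An out-of-range index yields NULL, modeled here as None.
        return None
    # Shift positive indices down by one; negative indices map directly.
    return values[index - 1] if index > 0 else values[index]


first = element_at([10, 20, 30], 1)
last = element_at([10, 20, 30], -1)
missing = element_at([10, 20, 30], 4)
```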
November 3, 2020
- Upgraded Java version from 1.8.0_252 to 1.8.0_265.
- Fix ABFS and WASB locking with regard to UserGroupInformation.getCurrentUser()
- Fix an infinite loop bug of Avro reader when reading the MAGIC bytes.
October 13, 2020
- Operating system security updates.
- [SPARK-32999][SQL] Use Utils.getSimpleName to avoid hitting Malformed class name in TreeNode
- Fixed listing directories in FUSE mount that contain file names with invalid XML characters
- FUSE mount no longer uses ListMultipartUploads
September 29, 2020
- [SPARK-28863][SQL][WARMFIX] Introduce AlreadyOptimized to prevent reanalysis of V1FallbackWriters
- [SPARK-32635][SQL] Fix foldable propagation
- Add a new config spark.shuffle.io.decoder.consolidateThreshold. Set the config value to Long.MAX_VALUE to skip the consolidation of netty FrameBuffers, which prevents java.lang.IndexOutOfBoundsException in corner cases.
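The configuration entry above can be sketched as a small Python fragment that computes Java's Long.MAX_VALUE and renders the setting. The config key comes from the release note. Passing the value as a plain decimal string and rendering it as a "key value" line are assumptions of this sketch.

```python
# Java's Long.MAX_VALUE is the largest signed 64-bit integer, 2**63 - 1.
LONG_MAX_VALUE = 2**63 - 1

# Spark configuration entry named in the release note.
spark_conf = {
    "spark.shuffle.io.decoder.consolidateThreshold": str(LONG_MAX_VALUE),
}

# Render each entry as a "key value" line (assumed configuration text format).
conf_lines = [f"{key} {value}" for key, value in spark_conf.items()]
```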
September 24, 2020
- [SPARK-32764][SQL] -0.0 should be equal to 0.0
- [SPARK-32753][SQL] Only copy tags to node with no tags when transforming plans
- [SPARK-32659][SQL] Fix the data issue of inserted Dynamic Partition Pruning on non-atomic type
- Operating system security updates.
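SPARK-32764 in the list above concerns the comparison of -0.0 and 0.0. The sketch below is plain Python, not Spark code. It only illustrates the IEEE 754 background: the two zeros compare equal numerically although their bit patterns differ, so a comparison based on the raw representation can disagree with numeric equality. How Spark handled this internally is not described in the release note.

```python
import struct


def float_bits(value: float) -> str:
    """Return the big-endian IEEE 754 double encoding of value as hex."""
    return struct.pack(">d", value).hex()


negative_zero = -0.0
positive_zero = 0.0

# Numeric comparison treats the two zeros as equal.
numerically_equal = negative_zero == positive_zero

# The encodings differ only in the sign bit.
bits_differ = float_bits(negative_zero) != float_bits(positive_zero)
```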
September 8, 2020
- A new parameter was created for Azure Synapse Analytics, maxbinlength. This parameter is used to control the column length of BinaryType columns, and is translated as VARBINARY(maxbinlength). It can be set using .option("maxbinlength", n), where 0 < n <= 8000.
August 25, 2020
- [SPARK-32159][SQL] Fix integration between Aggregator[Array[_], _, _] and UnresolvedMapObjects
- [SPARK-32559][SQL] Fix the trim logic in UTF8String.toInt/toLong, which didn’t handle non-ASCII characters correctly
- [SPARK-32543][R] Remove arrow::as_tibble usage in SparkR
- [SPARK-32091][CORE] Ignore timeout error when removing blocks on the lost executor
- Fixed an issue affecting Azure Synapse connector with MSI credentials
- Fixed ambiguous attribute resolution in self-merge
August 18, 2020
- [SPARK-32594][SQL] Fix serialization of dates inserted to Hive tables
- [SPARK-32237][SQL] Resolve hint in CTE
- [SPARK-32431][SQL] Check duplicate nested columns in read from in-built datasources
- [SPARK-32467][UI] Avoid encoding URL twice on https redirect
- Fixed a race condition in the AQS connector when using Trigger.Once.
August 11, 2020
- [SPARK-32280][SPARK-32372][SQL] ResolveReferences.dedupRight should only rewrite attributes for ancestor nodes of the conflict plan
- [SPARK-32234][SQL] Spark SQL commands are failing on selecting the ORC tables
August 3, 2020
- You can now use the LDA transform function on a passthrough-enabled cluster.
Databricks Runtime 7.0 (EoS)
See Databricks Runtime 7.0 (EoS).
February 4, 2021
- Fixed a regression that caused DBFS FUSE to fail to start when cluster environment variable configurations contain invalid bash syntax.
January 20, 2021
- Fixed a regression in the January 12, 2021 maintenance release that can cause an incorrect AnalysisException saying that the column is ambiguous in a self join. This regression happens when a user joins a DataFrame with its derived DataFrame (a so-called self-join) with the following conditions:
- These two DataFrames have common columns, but the output of the self join does not have common columns. For example, df.join(df.select($"col" as "new_col"), cond)
- The derived DataFrame excludes some columns via select, groupBy, or window.
- The join condition or the following transformation after the joined DataFrame refers to the non-common columns. For example, df.join(df.drop("a"), df("a") === 1)
January 12, 2021
- Operating system security updates.
- [SPARK-33593][SQL] Vector reader got incorrect data with binary partition value
- [SPARK-33677][SQL] Skip LikeSimplification rule if pattern contains any escapeChar
- [SPARK-33071][SPARK-33536][SQL] Avoid changing dataset_id of LogicalPlan in join() to not break DetectAmbiguousSelfJoin
December 8, 2020
- [SPARK-27421][SQL] Fix filter for int column and value class java.lang.String when pruning partition column
- [SPARK-33404][SQL] Fix incorrect results in date_trunc expression
- [SPARK-33339][PYTHON] Pyspark application will hang due to non Exception error
- [SPARK-33183][SQL] Fix Optimizer rule EliminateSorts and add a physical rule to remove redundant sorts
- [SPARK-33391][SQL] element_at with CreateArray not respect one based index.
- Operating system security updates.
December 1, 2020
- [SPARK-33306][SQL] Timezone is needed when cast date to string
November 3, 2020
- Upgraded Java version from 1.8.0_252 to 1.8.0_265.
- Fix ABFS and WASB locking with regard to UserGroupInformation.getCurrentUser()
- Fix an infinite loop bug of Avro reader when reading the MAGIC bytes.
October 13, 2020
- Operating system security updates.
- [SPARK-32999][SQL] Use Utils.getSimpleName to avoid hitting Malformed class name in TreeNode
- Fixed listing directories in FUSE mount that contain file names with invalid XML characters
- FUSE mount no longer uses ListMultipartUploads
September 29, 2020
- [SPARK-28863][SQL][WARMFIX] Introduce AlreadyOptimized to prevent reanalysis of V1FallbackWriters
- [SPARK-32635][SQL] Fix foldable propagation
- Add a new config spark.shuffle.io.decoder.consolidateThreshold. Set the config value to Long.MAX_VALUE to skip the consolidation of netty FrameBuffers, which prevents java.lang.IndexOutOfBoundsException in corner cases.
September 24, 2020
- [SPARK-32764][SQL] -0.0 should be equal to 0.0
- [SPARK-32753][SQL] Only copy tags to node with no tags when transforming plans
- [SPARK-32659][SQL] Fix the data issue of inserted Dynamic Partition Pruning on non-atomic type
- Operating system security updates.
September 8, 2020
- A new parameter was created for Azure Synapse Analytics, maxbinlength. This parameter is used to control the column length of BinaryType columns, and is translated as VARBINARY(maxbinlength). It can be set using .option("maxbinlength", n), where 0 < n <= 8000.
August 25, 2020
- [SPARK-32159][SQL] Fix integration between Aggregator[Array[_], _, _] and UnresolvedMapObjects
- [SPARK-32559][SQL] Fix the trim logic in UTF8String.toInt/toLong, which didn’t handle non-ASCII characters correctly
- [SPARK-32543][R] Remove arrow::as_tibble usage in SparkR
- [SPARK-32091][CORE] Ignore timeout error when removing blocks on the lost executor
- Fixed an issue affecting Azure Synapse connector with MSI credentials
- Fixed ambiguous attribute resolution in self-merge
August 18, 2020
- [SPARK-32594][SQL] Fix serialization of dates inserted to Hive tables
- [SPARK-32237][SQL] Resolve hint in CTE
- [SPARK-32431][SQL] Check duplicate nested columns in read from in-built datasources
- [SPARK-32467][UI] Avoid encoding URL twice on https redirect
- Fixed a race condition in the AQS connector when using Trigger.Once.
August 11, 2020
- [SPARK-32280][SPARK-32372][SQL] ResolveReferences.dedupRight should only rewrite attributes for ancestor nodes of the conflict plan
- [SPARK-32234][SQL] Spark SQL commands are failing on selecting the ORC tables
- You can now use the LDA transform function on a passthrough-enabled cluster.
Databricks Runtime 6.6 (EoS)
See Databricks Runtime 6.6 (EoS).
December 1, 2020
- [SPARK-33260][SQL] Fix incorrect results from SortExec when sortOrder is Stream
- [SPARK-32635][SQL] Fix foldable propagation
November 3, 2020
- Upgraded Java version from 1.8.0_252 to 1.8.0_265.
- Fix ABFS and WASB locking with regard to UserGroupInformation.getCurrentUser()
- Fix an infinite loop bug of Avro reader when reading the MAGIC bytes.
October 13, 2020
- Operating system security updates.
- [SPARK-32999][SQL][2.4] Use Utils.getSimpleName to avoid hitting Malformed class name in TreeNode
- Fixed listing directories in FUSE mount that contain file names with invalid XML characters
- FUSE mount no longer uses ListMultipartUploads
September 24, 2020
- Operating system security updates.
September 8, 2020
- A new parameter was created for Azure Synapse Analytics, maxbinlength. This parameter is used to control the column length of BinaryType columns, and is translated as VARBINARY(maxbinlength). It can be set using .option("maxbinlength", n), where 0 < n <= 8000.
- Update Azure Storage SDK to 8.6.4 and enable TCP keep alive on connections made by the WASB driver
August 25, 2020
- Fixed ambiguous attribute resolution in self-merge
August 18, 2020
- [SPARK-32431][SQL] Check duplicate nested columns in read from in-built datasources
- Fixed a race condition in the AQS connector when using Trigger.Once.
August 11, 2020
- [SPARK-28676][CORE] Avoid Excessive logging from ContextCleaner
- [SPARK-31967][UI] Downgrade to vis.js 4.21.0 to fix Jobs UI loading time regression
August 3, 2020
- You can now use the LDA transform function on a passthrough-enabled cluster.
- Operating system security updates.
Databricks Runtime 6.5 (EoS)
See Databricks Runtime 6.5 (EoS).
- September 24, 2020
- Fixed a limitation where passthrough on standard clusters still restricted which filesystem implementation users could use. Users can now access local filesystems without restrictions.
- Operating system security updates.
- September 8, 2020
- A new parameter was created for Azure Synapse Analytics, maxbinlength. This parameter is used to control the column length of BinaryType columns, and is translated as VARBINARY(maxbinlength). It can be set using .option("maxbinlength", n), where 0 < n <= 8000.
- Update Azure Storage SDK to 8.6.4 and enable TCP keep alive on connections made by the WASB driver
- August 25, 2020
- Fixed ambiguous attribute resolution in self-merge
- August 18, 2020
- [SPARK-32431][SQL] Check duplicate nested columns in read from in-built datasources
- Fixed a race condition in the AQS connector when using Trigger.Once.
- August 11, 2020
- [SPARK-28676][CORE] Avoid Excessive logging from ContextCleaner
- August 3, 2020
- You can now use the LDA transform function on a passthrough-enabled cluster.
- Operating system security updates.
- July 7, 2020
- Upgraded Java version from 1.8.0_242 to 1.8.0_252.
- April 21, 2020
- [SPARK-31312][SQL] Cache Class instance for the UDF instance in HiveFunctionWrapper
Databricks Runtime 6.3 (EoS)
See Databricks Runtime 6.3 (EoS).
- July 7, 2020
- Upgraded Java version from 1.8.0_232 to 1.8.0_252.
- April 21, 2020
- [SPARK-31312][SQL] Cache Class instance for the UDF instance in HiveFunctionWrapper
- April 7, 2020
- To resolve an issue with pandas udf not working with PyArrow 0.15.0 and above, we added an environment variable (ARROW_PRE_0_15_IPC_FORMAT=1) to enable support for those versions of PyArrow. See the instructions in [SPARK-29367].
- March 10, 2020
- The Snowflake connector (spark-snowflake_2.11) included in Databricks Runtime is updated to version 2.5.9. snowflake-jdbc is updated to version 3.12.0.
- February 18, 2020
- Credential passthrough with ADLS Gen2 has a performance degradation due to incorrect thread local handling when ADLS client prefetching is enabled. This release disables ADLS Gen2 prefetching when credential passthrough is enabled until we have a proper fix.
- February 11, 2020
- [SPARK-24783][SQL] spark.sql.shuffle.partitions=0 should throw exception
- [SPARK-30447][SQL] Constant propagation nullability issue
- [SPARK-28152][SQL] Add a legacy conf for old MsSqlServerDialect numeric mapping
- Allowlisted the overwrite function so that ML models that extend MLWriter can call it.
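The April 7, 2020 entry above names the environment variable ARROW_PRE_0_15_IPC_FORMAT=1. As a minimal sketch, the Python helper below checks whether that variable is present with the value 1 in a given environment mapping. The helper name arrow_legacy_ipc_enabled is hypothetical, and reading os.environ in the current process is an assumption about where the variable would be visible.

```python
import os
from typing import Mapping

# Variable name and value taken from the release note.
ARROW_COMPAT_VAR = "ARROW_PRE_0_15_IPC_FORMAT"


def arrow_legacy_ipc_enabled(env: Mapping[str, str]) -> bool:
    """Return True when the compatibility variable is set to "1" in env."""
    return env.get(ARROW_COMPAT_VAR) == "1"


# Check the environment of the current process.
enabled_here = arrow_legacy_ipc_enabled(os.environ)
```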
Databricks Runtime 6.2 (EoS)
See Databricks Runtime 6.2 (EoS).
- April 21, 2020
- [SPARK-31312][SQL] Cache Class instance for the UDF instance in HiveFunctionWrapper
- April 7, 2020
- To resolve an issue with pandas udf not working with PyArrow 0.15.0 and above, we added an environment variable (ARROW_PRE_0_15_IPC_FORMAT=1) to enable support for those versions of PyArrow. See the instructions in [SPARK-29367].
- March 25, 2020
- Job output, such as log output emitted to stdout, is subject to a 20MB size limit. If the total output has a larger size, the run will be canceled and marked as failed. To avoid encountering this limit, you can prevent stdout from being returned from the driver by setting the spark.databricks.driver.disableScalaOutput Spark configuration to true. By default the flag value is false. The flag controls cell output for Scala JAR jobs and Scala notebooks. If the flag is enabled, Spark does not return job execution results to the client. The flag does not affect the data that is written in the cluster’s log files. Setting this flag is recommended only for automated clusters for JAR jobs, because it will disable notebook results.
- March 10, 2020
- The Snowflake connector (spark-snowflake_2.11) included in Databricks Runtime is updated to version 2.5.9. snowflake-jdbc is updated to version 3.12.0.
- February 18, 2020
- [SPARK-24783][SQL] spark.sql.shuffle.partitions=0 should throw exception
- Credential passthrough with ADLS Gen2 has a performance degradation due to incorrect thread local handling when ADLS client prefetching is enabled. This release disables ADLS Gen2 prefetching when credential passthrough is enabled until we have a proper fix.
- January 28, 2020
- Allowlisted ML Model Writers’ overwrite function for clusters enabled for credential passthrough, so that model save can use overwrite mode on credential passthrough clusters.
- [SPARK-30447][SQL] Constant propagation nullability issue.
- [SPARK-28152][SQL] Add a legacy conf for old MsSqlServerDialect numeric mapping.
- January 14, 2020
- Upgraded Java version from 1.8.0_222 to 1.8.0_232.
- December 10, 2019
- [SPARK-29904][SQL] Parse timestamps in microsecond precision by JSON/CSV data sources.
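The March 25, 2020 entry above describes the spark.databricks.driver.disableScalaOutput flag. The Python sketch below shows the flag as a Spark configuration entry and a small check that applies the documented default of false. The helper name is_scala_output_disabled is hypothetical, and representing the configuration as a dictionary of strings is an assumption of this sketch.

```python
from typing import Mapping

# Flag name taken from the release note; the default value is false.
DISABLE_SCALA_OUTPUT = "spark.databricks.driver.disableScalaOutput"

# Configuration that keeps stdout from being returned from the driver.
spark_conf = {DISABLE_SCALA_OUTPUT: "true"}


def is_scala_output_disabled(conf: Mapping[str, str]) -> bool:
    """Return True when the flag is set to true, falling back to the default."""
    return conf.get(DISABLE_SCALA_OUTPUT, "false").lower() == "true"


disabled = is_scala_output_disabled(spark_conf)
```

Because the note recommends the flag only for automated clusters that run JAR jobs, a check like this could guard against enabling it on clusters that also serve notebooks.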
Databricks Runtime 6.1 (EoS)
See Databricks Runtime 6.1 (EoS).
- April 7, 2020
- To resolve an issue with pandas udf not working with PyArrow 0.15.0 and above, we added an environment variable (ARROW_PRE_0_15_IPC_FORMAT=1) to enable support for those versions of PyArrow. See the instructions in [SPARK-29367].
- March 25, 2020
- Job output, such as log output emitted to stdout, is subject to a 20MB size limit. If the total output has a larger size, the run will be canceled and marked as failed. To avoid encountering this limit, you can prevent stdout from being returned from the driver by setting the spark.databricks.driver.disableScalaOutput Spark configuration to true. By default the flag value is false. The flag controls cell output for Scala JAR jobs and Scala notebooks. If the flag is enabled, Spark does not return job execution results to the client. The flag does not affect the data that is written in the cluster’s log files. Setting this flag is recommended only for automated clusters for JAR jobs, because it will disable notebook results.
- March 10, 2020
- The Snowflake connector (spark-snowflake_2.11) included in Databricks Runtime is updated to version 2.5.9. snowflake-jdbc is updated to version 3.12.0.
- February 18, 2020
- [SPARK-24783][SQL] spark.sql.shuffle.partitions=0 should throw exception
- Credential passthrough with ADLS Gen2 has a performance degradation due to incorrect thread local handling when ADLS client prefetching is enabled. This release disables ADLS Gen2 prefetching when credential passthrough is enabled until we have a proper fix.
- January 28, 2020
- [SPARK-30447][SQL] Constant propagation nullability issue.
- [SPARK-28152][SQL] Add a legacy conf for old MsSqlServerDialect numeric mapping.
- January 14, 2020
- Upgraded Java version from 1.8.0_222 to 1.8.0_232.
- November 7, 2019
- [SPARK-29743][SQL] sample should set needCopyResult to true if its child’s needCopyResult is true.
- Secrets referenced from Spark configuration properties and environment variables in Public Preview. See Use a secret in a Spark configuration property or environment variable.
- November 5, 2019
- Fixed a bug in DBFS FUSE to handle mount points having // in its path.
- [SPARK-29081] Replace calls to SerializationUtils.clone on properties with a faster implementation
- [SPARK-29244][CORE] Prevent freed page in BytesToBytesMap free again
- (6.1 ML) Library mkl version 2019.4 was installed unintentionally. We downgraded it to mkl version 2019.3 to match Anaconda Distribution 2019.03.
Databricks Runtime 6.0 (EoS)
See Databricks Runtime 6.0 (EoS).
- March 25, 2020
- Job output, such as log output emitted to stdout, is subject to a 20MB size limit. If the total output has a larger size, the run will be canceled and marked as failed. To avoid encountering this limit, you can prevent stdout from being returned from the driver by setting the spark.databricks.driver.disableScalaOutput Spark configuration to true. By default the flag value is false. The flag controls cell output for Scala JAR jobs and Scala notebooks. If the flag is enabled, Spark does not return job execution results to the client. The flag does not affect the data that is written in the cluster’s log files. Setting this flag is recommended only for automated clusters for JAR jobs, because it will disable notebook results.
- February 18, 2020
- Credential passthrough with ADLS Gen2 has a performance degradation due to incorrect thread local handling when ADLS client prefetching is enabled. This release disables ADLS Gen2 prefetching when credential passthrough is enabled until we have a proper fix.
- February 11, 2020
- [SPARK-24783][SQL] spark.sql.shuffle.partitions=0 should throw exception
- January 28, 2020
- [SPARK-30447][SQL] Constant propagation nullability issue.
- [SPARK-28152][SQL] Add a legacy conf for old MsSqlServerDialect numeric mapping.
- January 14, 2020
- Upgraded Java version from 1.8.0_222 to 1.8.0_232.
- November 19, 2019
- [SPARK-29743][SQL] sample should set needCopyResult to true if its child’s needCopyResult is true
- November 5, 2019
- dbutils.tensorboard.start() now supports TensorBoard 2.0 (if installed manually).
- Fixed a bug in DBFS FUSE to handle mount points having // in its path.
- [SPARK-29081] Replace calls to SerializationUtils.clone on properties with a faster implementation
- October 23, 2019
- [SPARK-29244][CORE] Prevent freed page in BytesToBytesMap free again
- October 8, 2019
- Server side changes to allow Simba Apache Spark ODBC driver to reconnect and continue after a connection failure during fetching results (requires Simba Apache Spark ODBC driver version 2.6.10).
- Fixed an issue affecting using Optimize command with table ACL enabled clusters.
- Fixed an issue where pyspark.ml libraries would fail due to Scala UDF forbidden error on table ACL and credential passthrough enabled clusters.
- Allowlisted SerDe/SerDeUtil methods for credential passthrough.
- Fixed NullPointerException when checking error code in the WASB client.
- Fixed the issue where user credentials were not forwarded to jobs created by dbutils.notebook.run().
Databricks Runtime 5.4 ML (EoS)
See Databricks Runtime 5.4 for ML (EoS).
- June 18, 2019
- Improved handling of MLflow active runs in Hyperopt integration
- Improved messages in Hyperopt
- Updated package Markdown from 3.1 to 3.1.1
Databricks Runtime 5.4 (EoS)
See Databricks Runtime 5.4 (EoS).
- November 19, 2019
- [SPARK-29743][SQL] sample should set needCopyResult to true if its child’s needCopyResult is true
- October 8, 2019
- Server side changes to allow Simba Apache Spark ODBC driver to reconnect and continue after a connection failure during fetching results (requires Simba Apache Spark ODBC driver update to version 2.6.10).
- Fixed NullPointerException when checking error code in the WASB client.
- September 10, 2019
- Add thread safe iterator to BytesToBytesMap
- Fixed a bug affecting certain global aggregation queries.
- [SPARK-27330][SS] support task abort in foreach writer
- [SPARK-28642] Hide credentials in SHOW CREATE TABLE
- [SPARK-28699][SQL] Disable using radix sort for ShuffleExchangeExec in repartition case
- [SPARK-28699][CORE] Fix a corner case for aborting indeterminate stage
- August 27, 2019
- Fixed an issue affecting certain transform expressions
- August 13, 2019
- Delta streaming source should check the latest protocol of a table
- [SPARK-28489][SS] Fix a bug that KafkaOffsetRangeCalculator.getRanges may drop offsets
- July 30, 2019
- [SPARK-28015][SQL] Check stringToDate() consumes entire input for the yyyy and yyyy-[m]m formats
- [SPARK-28308][CORE] CalendarInterval sub-second part should be padded before parsing
- [SPARK-27485] EnsureRequirements.reorder should handle duplicate expressions gracefully
- July 2, 2019
- Upgraded snappy-java from 1.1.7.1 to 1.1.7.3.
- June 18, 2019
- Improved handling of MLflow active runs in MLlib integration
- Improved Databricks Advisor message related to using disk caching
- Fixed a bug affecting using higher order functions
- Fixed a bug affecting Delta metadata queries
Databricks Runtime 5.3 (EoS)
See Databricks Runtime 5.3 (EoS).
- November 7, 2019
- [SPARK-29743][SQL] sample should set needCopyResult to true if its child’s needCopyResult is true
- October 8, 2019
- Server side changes to allow Simba Apache Spark ODBC driver to reconnect and continue after a connection failure during fetching results (requires Simba Apache Spark ODBC driver update to version 2.6.10).
- Fixed NullPointerException when checking error code in the WASB client.
- September 10, 2019
- Add thread safe iterator to BytesToBytesMap
- Fixed a bug affecting certain global aggregation queries.
- [SPARK-27330][SS] support task abort in foreach writer
- [SPARK-28642] Hide credentials in SHOW CREATE TABLE
- [SPARK-28699][SQL] Disable using radix sort for ShuffleExchangeExec in repartition case
- [SPARK-28699][CORE] Fix a corner case for aborting indeterminate stage
- August 27, 2019
- Fixed an issue affecting certain transform expressions
- August 13, 2019
- Delta streaming source should check the latest protocol of a table
- [SPARK-28489][SS] Fix a bug that KafkaOffsetRangeCalculator.getRanges may drop offsets
- July 30, 2019
- [SPARK-28015][SQL] Check stringToDate() consumes entire input for the yyyy and yyyy-[m]m formats
- [SPARK-28308][CORE] CalendarInterval sub-second part should be padded before parsing
- [SPARK-27485] EnsureRequirements.reorder should handle duplicate expressions gracefully
- June 18, 2019
- Improved Databricks Advisor message related to using disk caching
- Fixed a bug affecting using higher order functions
- Fixed a bug affecting Delta metadata queries
- May 28, 2019
- Improved the stability of Delta
- Tolerate IOExceptions when reading Delta LAST_CHECKPOINT file
- Added recovery to failed library installation
- May 7, 2019
- Port HADOOP-15778 (ABFS: Fix client side throttling for read) to Azure Data Lake Storage Gen2 connector
- Port HADOOP-16040 (ABFS: Bug fix for tolerateOobAppends configuration) to Azure Data Lake Storage Gen2 connector
- Fixed a bug affecting table ACLs
- Fixed a race condition when loading a Delta log checksum file
- Fixed Delta conflict detection logic to not identify “insert + overwrite” as pure “append” operation
- Ensure that disk caching is not disabled when table ACLs are enabled
- [SPARK-27494][SS] Null keys/values don’t work in Kafka source v2
- [SPARK-27446][R] Use existing spark conf if available.
- [SPARK-27454][ML][SQL] Spark image datasource fail when encounter some illegal images
- [SPARK-27160][SQL] Fix DecimalType when building orc filters
- [SPARK-27338][CORE] Fix deadlock between UnsafeExternalSorter and TaskMemoryManager
Databricks Runtime 5.2 (EoS)
See Databricks Runtime 5.2 (EoS).
- September 10, 2019
- Add thread safe iterator to BytesToBytesMap
- Fixed a bug affecting certain global aggregation queries.
- [SPARK-27330][SS] support task abort in foreach writer
- [SPARK-28642] Hide credentials in SHOW CREATE TABLE
- [SPARK-28699][SQL] Disable using radix sort for ShuffleExchangeExec in repartition case
- [SPARK-28699][CORE] Fix a corner case for aborting indeterminate stage
- August 27, 2019
- Fixed an issue affecting certain transform expressions
- August 13, 2019
- Delta streaming source should check the latest protocol of a table
- [SPARK-28489][SS] Fix a bug that KafkaOffsetRangeCalculator.getRanges may drop offsets
- July 30, 2019
- [SPARK-28015][SQL] Check stringToDate() consumes entire input for the yyyy and yyyy-[m]m formats
- [SPARK-28308][CORE] CalendarInterval sub-second part should be padded before parsing
- [SPARK-27485] EnsureRequirements.reorder should handle duplicate expressions gracefully
- July 2, 2019
- Tolerate IOExceptions when reading Delta LAST_CHECKPOINT file
- June 18, 2019
- Improved Databricks Advisor message related to using disk cache
- Fixed a bug affecting using higher order functions
- Fixed a bug affecting Delta metadata queries
- May 28, 2019
- Added recovery to failed library installation
- May 7, 2019
- Port HADOOP-15778 (ABFS: Fix client side throttling for read) to Azure Data Lake Storage Gen2 connector
- Port HADOOP-16040 (ABFS: Bug fix for tolerateOobAppends configuration) to Azure Data Lake Storage Gen2 connector
- Fixed a race condition when loading a Delta log checksum file
- Fixed Delta conflict detection logic to not identify “insert + overwrite” as pure “append” operation
- Ensure that disk caching is not disabled when table ACLs are enabled
- [SPARK-27494][SS] Null keys/values don’t work in Kafka source v2
- [SPARK-27454][ML][SQL] Spark image datasource fail when encounter some illegal images
- [SPARK-27160][SQL] Fix DecimalType when building orc filters
- [SPARK-27338][CORE] Fix deadlock between UnsafeExternalSorter and TaskMemoryManager
- March 26, 2019
- Avoid embedding platform-dependent offsets literally in whole-stage generated code
- [SPARK-26665][CORE] Fix a bug that BlockTransferService.fetchBlockSync may hang forever.
- [SPARK-27134][SQL] array_distinct function does not work correctly with columns containing array of array.
- [SPARK-24669][SQL] Invalidate tables in case of DROP DATABASE CASCADE.
- [SPARK-26572][SQL] fix aggregate codegen result evaluation.
- Fixed a bug affecting certain PythonUDFs.
- February 26, 2019
- [SPARK-26864][SQL] Query may return incorrect result when python udf is used as a left-semi join condition.
- [SPARK-26887][PYTHON] Create datetime.date directly instead of creating datetime64 as intermediate data.
- Fixed a bug affecting JDBC/ODBC server.
- Fixed a bug affecting PySpark.
- Exclude the hidden files when building HadoopRDD.
- Fixed a bug in Delta that caused serialization issues.
- February 12, 2019
- Fixed an issue affecting using Delta with Azure ADLS Gen2 mount points.
- Fixed an issue where the Spark low-level network protocol could break when sending large RPC error messages with encryption enabled (when spark.network.crypto.enabled is set to true).
- January 30, 2019
- Fixed the StackOverflowError when putting skew join hint on cached relation.
- Fixed the inconsistency between a SQL cache’s cached RDD and its physical plan, which causes incorrect result.
- [SPARK-26706][SQL] Fix illegalNumericPrecedence for ByteType.
- [SPARK-26709][SQL] OptimizeMetadataOnlyQuery does not handle empty records correctly.
- CSV/JSON data sources should avoid globbing paths when inferring schema.
- Fixed constraint inference on Window operator.
- Fixed an issue affecting installing egg libraries with clusters having table ACL enabled.
Databricks Runtime 5.1 (EoS)
See Databricks Runtime 5.1 (EoS).
- August 13, 2019
- Delta streaming source should check the latest protocol of a table
- [SPARK-28489][SS] Fix a bug that KafkaOffsetRangeCalculator.getRanges may drop offsets
- July 30, 2019
- [SPARK-28015][SQL] Check stringToDate() consumes entire input for the yyyy and yyyy-[m]m formats
- [SPARK-28308][CORE] CalendarInterval sub-second part should be padded before parsing
- [SPARK-27485] EnsureRequirements.reorder should handle duplicate expressions gracefully
- July 2, 2019
- Tolerate IOExceptions when reading Delta LAST_CHECKPOINT file
- June 18, 2019
- Fixed a bug affecting using higher order functions
- Fixed a bug affecting Delta metadata queries
- May 28, 2019
- Added recovery to failed library installation
- May 7, 2019
- Port HADOOP-15778 (ABFS: Fix client side throttling for read) to Azure Data Lake Storage Gen2 connector
- Port HADOOP-16040 (ABFS: Bug fix for tolerateOobAppends configuration) to Azure Data Lake Storage Gen2 connector
- Fixed a race condition when loading a Delta log checksum file
- Fixed Delta conflict detection logic to not identify “insert + overwrite” as pure “append” operation
- [SPARK-27494][SS] Null keys/values don’t work in Kafka source v2
- [SPARK-27454][ML][SQL] Spark image datasource fail when encounter some illegal images
- [SPARK-27160][SQL] Fix DecimalType when building orc filters
- [SPARK-27338][CORE] Fix deadlock between UnsafeExternalSorter and TaskMemoryManager
- March 26, 2019
- Avoid embedding platform-dependent offsets literally in whole-stage generated code
- Fixed a bug affecting certain PythonUDFs.
- February 26, 2019
- [SPARK-26864][SQL] Query may return incorrect result when python udf is used as a left-semi join condition.
- Fixed a bug affecting JDBC/ODBC server.
- Exclude the hidden files when building HadoopRDD.
- February 12, 2019
- Fixed an issue affecting installing egg libraries with clusters having table ACL enabled.
- Fixed the inconsistency between a SQL cache’s cached RDD and its physical plan, which causes incorrect result.
- [SPARK-26706][SQL] Fix illegalNumericPrecedence for ByteType.
- [SPARK-26709][SQL] OptimizeMetadataOnlyQuery does not handle empty records correctly.
- Fixed constraint inference on Window operator.
- Fixed an issue where the Spark low-level network protocol might break when sending large RPC error messages with encryption enabled (when spark.network.crypto.enabled is set to true).
- January 30, 2019
- Fixed an issue that could cause df.rdd.count() with UDT to return an incorrect answer in certain cases.
- Fixed an issue affecting installing wheelhouses.
- [SPARK-26267]Retry when detecting incorrect offsets from Kafka.
- Fixed a bug that affects multiple file stream sources in a streaming query.
- Fixed the StackOverflowError when putting skew join hint on cached relation.
- Fixed the inconsistency between a SQL cache’s cached RDD and its physical plan, which causes incorrect result.
- January 8, 2019
- Fixed issue that causes the error org.apache.spark.sql.expressions.Window.rangeBetween(long,long) is not whitelisted.
- [SPARK-26352] join reordering should not change the order of output attributes.
- [SPARK-26366]ReplaceExceptWithFilter should consider NULL as False.
- Stability improvement for Delta Lake.
- Delta Lake is enabled.
- Fixed the issue that caused failed Azure Data Lake Storage Gen2 access when Microsoft Entra ID Credential Passthrough is enabled for Azure Data Lake Storage Gen1.
- Databricks IO Cache is now enabled for Ls series worker instance types for all pricing tiers.
Databricks Runtime 5.0 (EoS)
See Databricks Runtime 5.0 (EoS).
- June 18, 2019
- Fixed a bug affecting using higher order functions
- May 7, 2019
- Fixed a race condition when loading a Delta log checksum file
- Fixed Delta conflict detection logic to not identify “insert + overwrite” as pure “append” operation
- [SPARK-27494][SS] Null keys/values don’t work in Kafka source v2
- [SPARK-27454][ML][SQL] Spark image datasource fail when encounter some illegal images
- [SPARK-27160][SQL] Fix DecimalType when building orc filters
- [SPARK-27338][CORE] Fix deadlock between UnsafeExternalSorter and TaskMemoryManager
- March 26, 2019
- Avoid embedding platform-dependent offsets literally in whole-stage generated code
- Fixed a bug affecting certain PythonUDFs.
- March 12, 2019
- [SPARK-26864][SQL] Query may return incorrect result when python udf is used as a left-semi join condition.
- February 26, 2019
- Fixed a bug affecting JDBC/ODBC server.
- Exclude the hidden files when building HadoopRDD.
- February 12, 2019
- Fixed the inconsistency between a SQL cache’s cached RDD and its physical plan, which causes incorrect result.
- [SPARK-26706][SQL] Fix illegalNumericPrecedence for ByteType.
- [SPARK-26709][SQL] OptimizeMetadataOnlyQuery does not handle empty records correctly.
- Fixed constraint inference on Window operator.
- Fixed an issue where the Spark low-level network protocol might break when sending large RPC error messages with encryption enabled (when spark.network.crypto.enabled is set to true).
- January 30, 2019
- Fixed an issue that could cause df.rdd.count() with UDT to return an incorrect answer in certain cases.
- [SPARK-26267] Retry when detecting incorrect offsets from Kafka.
- Fixed a bug that affects multiple file stream sources in a streaming query.
- Fixed the StackOverflowError when putting skew join hint on cached relation.
- Fixed the inconsistency between a SQL cache’s cached RDD and its physical plan, which causes incorrect result.
- January 8, 2019
- Fixed issue that caused the error org.apache.spark.sql.expressions.Window.rangeBetween(long,long) is not whitelisted.
- [SPARK-26352] join reordering should not change the order of output attributes.
- [SPARK-26366]ReplaceExceptWithFilter should consider NULL as False.
- Stability improvement for Delta Lake.
- Delta Lake is enabled.
- Databricks IO Cache is now enabled for Ls series worker instance types for all pricing tiers.
- December 18, 2018
- [SPARK-26293]Cast exception when having Python UDF in subquery
- Fixed an issue affecting certain queries using Join and Limit.
- Redacted credentials from RDD names in Spark UI
- December 6, 2018
- Fixed an issue that caused incorrect query result when using orderBy followed immediately by groupBy with group-by key as the leading part of the sort-by key.
- Upgraded Snowflake Connector for Spark from 2.4.9.2-spark_2.4_pre_release to 2.4.10.
- Only ignore corrupt files after one or more retries when the spark.sql.files.ignoreCorruptFiles or spark.sql.files.ignoreMissingFiles flag is enabled.
- Fixed an issue affecting certain self union queries.
- Fixed a bug with the thrift server where sessions are sometimes leaked when cancelled.
- [SPARK-26307]Fixed CTAS when INSERT a partitioned table using Hive SerDe.
- [SPARK-26147]Python UDFs in join condition fail even when using columns from only one side of join
- [SPARK-26211]Fix InSet for binary, and struct and array with null.
- [SPARK-26181] The hasMinMaxStats method of ColumnStatsMap is not correct.
- Fixed an issue affecting installing Python Wheels in environments without Internet access.
- November 20, 2018
- Fixed an issue that made a notebook unusable after cancelling a streaming query.
- Fixed an issue affecting certain queries using window functions.
- Fixed an issue affecting a stream from Delta with multiple schema changes.
- Fixed an issue affecting certain aggregation queries with Left Semi/Anti joins.
Databricks Runtime 4.3 (EoS)
See Databricks Runtime 4.3 (EoS).
April 9, 2019
- [SPARK-26665][CORE] Fix a bug that can cause BlockTransferService.fetchBlockSync to hang forever.
- [SPARK-24669][SQL] Invalidate tables in case of DROP DATABASE CASCADE.
March 12, 2019
- Fixed a bug affecting code generation.
- Fixed a bug affecting Delta.
February 26, 2019
- Fixed a bug affecting JDBC/ODBC server.
February 12, 2019
- [SPARK-26709][SQL] OptimizeMetadataOnlyQuery does not handle empty records correctly.
- Excluding the hidden files when building HadoopRDD.
- Fixed Parquet Filter Conversion for IN predicate when its value is empty.
- Fixed an issue where the Spark low-level network protocol might break when sending large RPC error messages with encryption enabled (when spark.network.crypto.enabled is set to true).
January 30, 2019
- Fixed an issue that could cause df.rdd.count() with UDT to return an incorrect answer in certain cases.
- Fixed the inconsistency between a SQL cache’s cached RDD and its physical plan, which causes incorrect result.
January 8, 2019
- Fixed the issue that causes the error org.apache.spark.sql.expressions.Window.rangeBetween(long,long) is not whitelisted.
- Redacted credentials from RDD names in Spark UI
- [SPARK-26352]join reordering should not change the order of output attributes.
- [SPARK-26366]ReplaceExceptWithFilter should consider NULL as False.
- Delta Lake is enabled.
- Databricks IO Cache is now enabled for Ls series worker instance types for all pricing tiers.
December 18, 2018
- [SPARK-25002]Avro: revise the output record namespace.
- Fixed an issue affecting certain queries using Join and Limit.
- [SPARK-26307]Fixed CTAS when INSERT a partitioned table using Hive SerDe.
- Only ignore corrupt files after one or more retries when the spark.sql.files.ignoreCorruptFiles or spark.sql.files.ignoreMissingFiles flag is enabled.
- [SPARK-26181] The hasMinMaxStats method of ColumnStatsMap is not correct.
- Fixed an issue affecting installing Python Wheels in environments without Internet access.
- Fixed a performance issue in query analyzer.
- Fixed an issue in PySpark that caused DataFrame actions to fail with a “connection refused” error.
- Fixed an issue affecting certain self union queries.
November 20, 2018
- [SPARK-17916][SPARK-25241]Fix empty string being parsed as null when nullValue is set.
- [SPARK-25387]Fix for NPE caused by bad CSV input.
- Fixed an issue affecting certain aggregation queries with Left Semi/Anti joins.
November 6, 2018
- [SPARK-25741]Long URLs are not rendered properly in web UI.
- [SPARK-25714]Fix Null Handling in the Optimizer rule BooleanSimplification.
- Fixed an issue affecting temporary objects cleanup in Synapse Analytics connector.
- [SPARK-25816]Fix attribute resolution in nested extractors.
October 16, 2018
- Fixed a bug affecting the output of running SHOW CREATE TABLE on Delta tables.
- Fixed a bug affecting Union operation.
September 25, 2018
- [SPARK-25368][SQL] Incorrect constraint inference returns wrong result.
- [SPARK-25402][SQL] Null handling in BooleanSimplification.
- Fixed NotSerializableException in Avro data source.
September 11, 2018
- [SPARK-25214][SS] Fix the issue that Kafka v2 source may return duplicated records when failOnDataLoss=false.
- [SPARK-24987][SS] Fix Kafka consumer leak when no new offsets for TopicPartition.
- Filter reduction should handle null value correctly.
- Improved stability of execution engine.
August 28, 2018
- Fixed a bug in Delta Lake Delete command that would incorrectly delete the rows where the condition evaluates to null.
- [SPARK-25142] Add error messages when Python worker could not open socket in _load_from_socket.
August 23, 2018
- [SPARK-23935] mapEntry throws org.codehaus.commons.compiler.CompileException.
- Fixed nullable map issue in Parquet reader.
- [SPARK-25051][SQL] FixNullability should not stop on AnalysisBarrier.
- [SPARK-25081]Fixed a bug where ShuffleExternalSorter may access a released memory page when spilling fails to allocate memory.
- Fixed an interaction between Databricks Delta and Pyspark which could cause transient read failures.
- [SPARK-25084] “distribute by” on multiple columns (wrap in brackets) may lead to codegen issue.
- [SPARK-25096]Loosen nullability if the cast is force-nullable.
- Lowered the default number of threads used by the Delta Lake Optimize command, reducing memory overhead and committing data faster.
- [SPARK-25114]Fix RecordBinaryComparator when subtraction between two words is divisible by Integer.MAX_VALUE.
- Fixed secret manager redaction when a command partially succeeds.
Databricks Runtime 4.2 (EoS)
See Databricks Runtime 4.2 (EoS).
February 26, 2019
- Fixed a bug affecting JDBC/ODBC server.
February 12, 2019
- [SPARK-26709][SQL] OptimizeMetadataOnlyQuery does not handle empty records correctly.
- Excluding the hidden files when building HadoopRDD.
- Fixed Parquet Filter Conversion for IN predicate when its value is empty.
- Fixed an issue where the Spark low-level network protocol might break when sending large RPC error messages with encryption enabled (when spark.network.crypto.enabled is set to true).
January 30, 2019
- Fixed an issue that could cause df.rdd.count() with UDT to return an incorrect answer in certain cases.
January 8, 2019
- Fixed issue that causes the error org.apache.spark.sql.expressions.Window.rangeBetween(long,long) is not whitelisted.
- Redacted credentials from RDD names in Spark UI
- [SPARK-26352]join reordering should not change the order of output attributes.
- [SPARK-26366]ReplaceExceptWithFilter should consider NULL as False.
- Delta Lake is enabled.
- Databricks IO Cache is now enabled for Ls series worker instance types for all pricing tiers.
December 18, 2018
- [SPARK-25002]Avro: revise the output record namespace.
- Fixed an issue affecting certain queries using Join and Limit.
- [SPARK-26307]Fixed CTAS when INSERT a partitioned table using Hive SerDe.
- Only ignore corrupt files after one or more retries when the spark.sql.files.ignoreCorruptFiles or spark.sql.files.ignoreMissingFiles flag is enabled.
- [SPARK-26181] The hasMinMaxStats method of ColumnStatsMap is not correct.
- Fixed an issue affecting installing Python Wheels in environments without Internet access.
- Fixed a performance issue in query analyzer.
- Fixed an issue in PySpark that caused DataFrame actions to fail with a “connection refused” error.
- Fixed an issue affecting certain self union queries.
November 20, 2018
- [SPARK-17916][SPARK-25241]Fix empty string being parsed as null when nullValue is set.
- Fixed an issue affecting certain aggregation queries with Left Semi/Anti joins.
November 6, 2018
- [SPARK-25741]Long URLs are not rendered properly in web UI.
- [SPARK-25714]Fix Null Handling in the Optimizer rule BooleanSimplification.
October 16, 2018
- Fixed a bug affecting the output of running SHOW CREATE TABLE on Delta tables.
- Fixed a bug affecting Union operation.
September 25, 2018
- [SPARK-25368][SQL] Incorrect constraint inference returns wrong result.
- [SPARK-25402][SQL] Null handling in BooleanSimplification.
- Fixed NotSerializableException in Avro data source.
September 11, 2018
- [SPARK-25214][SS] Fix the issue that Kafka v2 source may return duplicated records when failOnDataLoss=false.
- [SPARK-24987][SS] Fix Kafka consumer leak when no new offsets for TopicPartition.
- Filter reduction should handle null value correctly.
August 28, 2018
- Fixed a bug in Delta Lake Delete command that would incorrectly delete the rows where the condition evaluates to null.
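The Delete fix above follows standard SQL three-valued logic: a row is removed only when the condition is definitely true, and a condition that evaluates to NULL leaves the row in place. The following is a minimal sketch in plain Python, not Delta Lake code; `delete_where` and `age_over_30` are hypothetical helpers, and `None` stands in for SQL NULL.

```python
def delete_where(rows, condition):
    """Return the rows that survive a DELETE ... WHERE condition.

    `condition` returns True, False, or None (None models SQL NULL).
    Only rows where the condition is exactly True are deleted.
    """
    return [row for row in rows if condition(row) is not True]


def age_over_30(row):
    # Comparing NULL with any value yields NULL in SQL, modeled here as None.
    age = row["age"]
    return None if age is None else age > 30


rows = [
    {"name": "a", "age": 25},
    {"name": "b", "age": 41},
    {"name": "c", "age": None},
]

remaining = delete_where(rows, age_over_30)
print([r["name"] for r in remaining])  # ['a', 'c']
```

Row `c` survives because its condition is NULL rather than true, which is the behavior the fix restores.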
August 23, 2018
- Fixed NoClassDefError for Delta Snapshot
- [SPARK-23935] mapEntry throws org.codehaus.commons.compiler.CompileException.
- [SPARK-24957][SQL] Average with decimal followed by aggregation returns wrong result. AVERAGE might return incorrect results. The CAST added in the Average operator is bypassed if the result of Divide is the same type that it is cast to.
- [SPARK-25081]Fixed a bug where ShuffleExternalSorter may access a released memory page when spilling fails to allocate memory.
- Fixed an interaction between Databricks Delta and Pyspark which could cause transient read failures.
- [SPARK-25114]Fix RecordBinaryComparator when subtraction between two words is divisible by Integer.MAX_VALUE.
- [SPARK-25084] “distribute by” on multiple columns (wrap in brackets) may lead to codegen issue.
- [SPARK-24934][SQL] Explicitly allowlist supported types in upper/lower bounds for in-memory partition pruning. When complex data types are used in query filters against cached data, Spark always returns an empty result set. The in-memory stats-based pruning generates incorrect results, because null is set for upper/lower bounds for complex types. The fix is to not use in-memory stats-based pruning for complex types.
- Fixed secret manager redaction when a command partially succeeds.
- Fixed nullable map issue in Parquet reader.
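The SPARK-24934 entry above describes statistics-based pruning of cached batches. As a rough illustration of the idea (plain Python, not Spark's implementation), the sketch below skips a batch only when its lower and upper bounds prove that no row can match, and always scans batches that have no usable bounds, as is the case for complex types. The names `may_contain` and `batches_to_scan` are hypothetical.

```python
def may_contain(bounds, value):
    """Return False only when the statistics prove the batch cannot match.

    `bounds` is a (lower, upper) tuple, or None when no usable statistics
    exist (for example, for complex types such as arrays or structs).
    """
    if bounds is None:
        # No statistics: pruning would be unsafe, so the batch must be scanned.
        return True
    lower, upper = bounds
    return lower <= value <= upper


def batches_to_scan(all_bounds, value):
    """Indexes of the cached batches that an equality filter has to read."""
    return [i for i, b in enumerate(all_bounds) if may_contain(b, value)]


int_batches = [(1, 10), (11, 20), (21, 30)]
print(batches_to_scan(int_batches, 15))  # [1]

# Complex-typed column: no bounds are recorded, so nothing is pruned.
complex_batches = [None, None]
print(batches_to_scan(complex_batches, [1, 2]))  # [0, 1]
```

Treating missing bounds as "cannot prune" trades some scan cost for correctness, which matches the fix described in the entry.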
August 2, 2018
- Added writeStream.table API in Python.
- Fixed an issue affecting Delta checkpointing.
- [SPARK-24867][SQL] Add AnalysisBarrier to DataFrameWriter. SQL cache is not being used when using DataFrameWriter to write a DataFrame with UDF. This is a regression caused by the changes we made in AnalysisBarrier, since not all the Analyzer rules are idempotent.
- Fixed an issue that could cause mergeInto command to produce incorrect results.
- Improved stability on accessing Azure Data Lake Storage Gen1.
- [SPARK-24809]Serializing LongHashedRelation in executor may result in data error.
- [SPARK-24878][SQL] Fix reverse function for array type of primitive type containing null.
July 11, 2018
- Fixed a bug in query execution that would cause aggregations on decimal columns with different precisions to return incorrect results in some cases.
- Fixed a NullPointerException bug that was thrown during advanced aggregation operations like grouping sets.
Databricks Runtime 4.1 ML (EoS)
See Databricks Runtime 4.1 ML (EoS).
- July 31, 2018
- Added Azure Synapse Analytics to ML Runtime 4.1
- Fixed a bug that could cause incorrect query results when the name of a partition column used in a predicate differs from the case of that column in the schema of the table.
- Fixed a bug affecting Spark SQL execution engine.
- Fixed a bug affecting code generation.
- Fixed a bug (java.lang.NoClassDefFoundError) affecting Delta Lake.
- Improved error handling in Delta Lake.
- Fixed a bug that caused incorrect data skipping statistics to be collected for string columns 32 characters or greater.
Databricks Runtime 4.1 (EoS)
See Databricks Runtime 4.1 (EoS).
January 8, 2019
- [SPARK-26366]ReplaceExceptWithFilter should consider NULL as False.
- Delta Lake is enabled.
December 18, 2018
- [SPARK-25002]Avro: revise the output record namespace.
- Fixed an issue affecting certain queries using Join and Limit.
- [SPARK-26307]Fixed CTAS when INSERT a partitioned table using Hive SerDe.
- Only ignore corrupt files after one or more retries when the spark.sql.files.ignoreCorruptFiles or spark.sql.files.ignoreMissingFiles flag is enabled.
- Fixed an issue affecting installing Python Wheels in environments without Internet access.
- Fixed an issue in PySpark that caused DataFrame actions to fail with a “connection refused” error.
- Fixed an issue affecting certain self union queries.
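The corrupt-file entry above can be read as "retry first, skip only afterwards". The sketch below is one plausible reading in plain Python and is not Spark's implementation: `read_files`, `read_one`, `ignore_corrupt`, and `max_retries` are hypothetical names, and `ignore_corrupt` only loosely stands in for the spark.sql.files.ignoreCorruptFiles flag.

```python
def read_files(paths, read_one, ignore_corrupt=False, max_retries=1):
    """Read every path, retrying failures before treating a file as corrupt."""
    results = []
    for path in paths:
        attempts = 0
        while True:
            try:
                results.append(read_one(path))
                break
            except OSError:
                attempts += 1
                if attempts <= max_retries:
                    continue  # the failure may be transient, so try again
                if ignore_corrupt:
                    break  # retries exhausted: skip the file
                raise  # flag disabled: surface the error
    return results


calls = {}


def flaky_reader(path):
    # Hypothetical reader: "b" fails once and then succeeds, "c" always fails.
    calls[path] = calls.get(path, 0) + 1
    if path == "b" and calls[path] == 1:
        raise OSError("transient failure")
    if path == "c":
        raise OSError("corrupt file")
    return path.upper()


print(read_files(["a", "b", "c"], flaky_reader, ignore_corrupt=True))  # ['A', 'B']
```

File `b` is kept because the retry succeeds, while `c` is dropped only after the retry also fails.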
November 20, 2018
- [SPARK-17916][SPARK-25241]Fix empty string being parsed as null when nullValue is set.
- Fixed an issue affecting certain aggregation queries with Left Semi/Anti joins.
November 6, 2018
- [SPARK-25741]Long URLs are not rendered properly in web UI.
- [SPARK-25714]Fix Null Handling in the Optimizer rule BooleanSimplification.
October 16, 2018
- Fixed a bug affecting the output of running SHOW CREATE TABLE on Delta tables.
- Fixed a bug affecting Union operation.
September 25, 2018
- [SPARK-25368][SQL] Incorrect constraint inference returns wrong result.
- [SPARK-25402][SQL] Null handling in BooleanSimplification.
- Fixed NotSerializableException in Avro data source.
September 11, 2018
- [SPARK-25214][SS] Fix the issue that Kafka v2 source may return duplicated records when failOnDataLoss=false.
- [SPARK-24987][SS] Fix Kafka consumer leak when no new offsets for TopicPartition.
- Filter reduction should handle null value correctly.
August 28, 2018
- Fixed a bug in Delta Lake Delete command that would incorrectly delete the rows where the condition evaluates to null.
- [SPARK-25084] “distribute by” on multiple columns (wrap in brackets) may lead to codegen issue.
- [SPARK-25114]Fix RecordBinaryComparator when subtraction between two words is divisible by Integer.MAX_VALUE.
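The SPARK-25114 title points at a general pitfall: deriving a comparison result from the remainder of a difference. The sketch below models that pitfall in plain Python under stated assumptions. It is not Spark's code; `java_rem`, `buggy_compare`, and `fixed_compare` are hypothetical names, and `java_rem` imitates a remainder that takes the sign of the dividend.

```python
INT_MAX = 2**31 - 1  # value of Java's Integer.MAX_VALUE


def java_rem(a, b):
    """Remainder that takes the sign of the dividend (truncated division)."""
    r = abs(a) % abs(b)
    return r if a >= 0 else -r


def buggy_compare(left, right):
    # The difference is folded into int range with a remainder, so any
    # difference that is a multiple of INT_MAX collapses to 0 ("equal").
    return java_rem(left - right, INT_MAX)


def fixed_compare(left, right):
    # Compare the words directly and return only the sign: -1, 0, or 1.
    return (left > right) - (left < right)


left, right = 2 * INT_MAX, 0
print(buggy_compare(left, right))  # 0
print(fixed_compare(left, right))  # 1
```

Comparing the values directly avoids both the remainder collapse and any dependence on the size of the difference.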
August 23, 2018
- Fixed NoClassDefError for Delta Snapshot.
- [SPARK-24957][SQL] Average with decimal followed by aggregation returns wrong result. AVERAGE might return incorrect results. The CAST added in the Average operator is bypassed if the result of Divide is the same type that it is cast to.
- Fixed nullable map issue in Parquet reader.
- [SPARK-24934][SQL] Explicitly allowlist supported types in upper/lower bounds for in-memory partition pruning. When complex data types are used in query filters against cached data, Spark always returns an empty result set. The in-memory stats-based pruning generates incorrect results, because null is set for upper/lower bounds for complex types. The fix is to not use in-memory stats-based pruning for complex types.
- [SPARK-25081]Fixed a bug where ShuffleExternalSorter may access a released memory page when spilling fails to allocate memory.
- Fixed an interaction between Databricks Delta and Pyspark which could cause transient read failures.
- Fixed secret manager redaction when a command partially succeeds.
August 2, 2018
- [SPARK-24613][SQL] Cache with UDF could not be matched with subsequent dependent caches. Wraps the logical plan with an AnalysisBarrier for execution plan compilation in CacheManager, in order to avoid the plan being analyzed again. This is also a regression of Spark 2.3.
- Fixed a Synapse Analytics connector issue affecting timezone conversion for writing DateType data.
- Fixed an issue affecting Delta checkpointing.
- Fixed an issue that could cause mergeInto command to produce incorrect results.
- [SPARK-24867][SQL] Add AnalysisBarrier to DataFrameWriter. SQL cache is not being used when using DataFrameWriter to write a DataFrame with UDF. This is a regression caused by the changes we made in AnalysisBarrier, since not all the Analyzer rules are idempotent.
- [SPARK-24809]Serializing LongHashedRelation in executor may result in data error.
July 11, 2018
- Fixed a bug in query execution that would cause aggregations on decimal columns with different precisions to return incorrect results in some cases.
- Fixed a NullPointerException bug that was thrown during advanced aggregation operations like grouping sets.
June 28, 2018
- Fixed a bug that could cause incorrect query results when the name of a partition column used in a predicate differs from the case of that column in the schema of the table.
June 7, 2018
- Fixed a bug affecting Spark SQL execution engine.
- Fixed a bug affecting code generation.
- Fixed a bug (java.lang.NoClassDefFoundError) affecting Delta Lake.
- Improved error handling in Delta Lake.
May 17, 2018
- Fixed a bug that caused incorrect data skipping statistics to be collected for string columns 32 characters or greater.
Databricks Runtime 4.0 (EoS)
See Databricks Runtime 4.0 (EoS).
November 6, 2018
- [SPARK-25714]Fix Null Handling in the Optimizer rule BooleanSimplification.
October 16, 2018
- Fixed a bug affecting Union operation.
September 25, 2018
- [SPARK-25368][SQL] Incorrect constraint inference returns wrong result.
- [SPARK-25402][SQL] Null handling in BooleanSimplification.
- Fixed NotSerializableException in Avro data source.
September 11, 2018
- Filter reduction should handle null value correctly.
August 28, 2018
- Fixed a bug in Delta Lake Delete command that would incorrectly delete the rows where the condition evaluates to null.
August 23, 2018
- Fixed nullable map issue in Parquet reader.
- Fixed secret manager redaction when a command partially succeeds.
- Fixed an interaction between Databricks Delta and Pyspark which could cause transient read failures.
- [SPARK-25081]Fixed a bug where ShuffleExternalSorter may access a released memory page when spilling fails to allocate memory.
- [SPARK-25114]Fix RecordBinaryComparator when subtraction between two words is divisible by Integer.MAX_VALUE.
August 2, 2018
- [SPARK-24452] Avoid possible overflow in int add or multiply.
- [SPARK-24588]Streaming join should require HashClusteredPartitioning from children.
- Fixed an issue that could cause mergeInto command to produce incorrect results.
- [SPARK-24867][SQL] Add AnalysisBarrier to DataFrameWriter. SQL cache is not being used when using DataFrameWriter to write a DataFrame with UDF. This is a regression caused by the changes we made in AnalysisBarrier, since not all the Analyzer rules are idempotent.
- [SPARK-24809]Serializing LongHashedRelation in executor may result in data error.
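The SPARK-24452 entry above concerns 32-bit integer arithmetic that overflows before the result is widened to a 64-bit value. Because Python integers do not overflow, the sketch below simulates 32-bit wraparound explicitly. It illustrates the general hazard rather than the Spark change itself, and `wrap_int32`, `multiply_as_int`, and `multiply_as_long` are hypothetical helpers.

```python
def wrap_int32(value):
    """Wrap an integer to signed 32-bit two's complement."""
    value &= 0xFFFFFFFF
    return value - 0x100000000 if value >= 0x80000000 else value


def multiply_as_int(a, b):
    # Models int * int evaluated in 32 bits: the product wraps around
    # before it could be assigned to a wider variable.
    return wrap_int32(a * b)


def multiply_as_long(a, b):
    # Models widening one operand to 64 bits before multiplying.
    return a * b


a, b = 100_000, 50_000
print(multiply_as_int(a, b))   # 705032704
print(multiply_as_long(a, b))  # 5000000000
```

Both operands fit comfortably in 32 bits, yet the 32-bit product is silently wrong, which is why widening has to happen before the arithmetic rather than after it.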
June 28, 2018
- Fixed a bug that could cause incorrect query results when the name of a partition column used in a predicate differs from the case of that column in the schema of the table.
June 7, 2018
- Fixed a bug affecting Spark SQL execution engine.
- Improved error handling in Delta Lake.
May 17, 2018
- Bug fixes for Databricks secret management.
- Improved stability on reading data stored in Azure Data Lake Store.
- Fixed a bug affecting RDD caching.
- Fixed a bug affecting Null-safe Equal in Spark SQL.
April 24, 2018
- Upgraded Azure Data Lake Store SDK from 2.0.11 to 2.2.8 to improve the stability of access to Azure Data Lake Store.
- Fixed a bug affecting the insertion of overwrites to partitioned Hive tables when spark.databricks.io.hive.fastwriter.enabled is false.
- Fixed an issue that failed task serialization.
- Improved Delta Lake stability.
March 14, 2018
- Prevent unnecessary metadata updates when writing into Delta Lake.
- Fixed an issue caused by a race condition that could, in rare circumstances, lead to loss of some output files.
Databricks Runtime 3.5 LTS (EoS)
See Databricks Runtime 3.5 LTS (EoS).
November 7, 2019
- [SPARK-29743][SQL] sample should set needCopyResult to true if its child’s needCopyResult is true
October 8, 2019
- Server side changes to allow Simba Apache Spark ODBC driver to reconnect and continue after a connection failure during fetching results (requires Simba Apache Spark ODBC driver update to version 2.6.10).
September 10, 2019
- [SPARK-28699][SQL] Disable using radix sort for ShuffleExchangeExec in repartition case
April 9, 2019
- [SPARK-26665][CORE] Fix a bug that can cause BlockTransferService.fetchBlockSync to hang forever.
February 12, 2019
- Fixed an issue where the Spark low-level network protocol might break when sending large RPC error messages with encryption enabled (when spark.network.crypto.enabled is set to true).
January 30, 2019
- Fixed an issue that could cause df.rdd.count() with UDT to return an incorrect answer in certain cases.
December 18, 2018
- Only ignore corrupt files after one or more retries when the spark.sql.files.ignoreCorruptFiles or spark.sql.files.ignoreMissingFiles flag is enabled.
- Fixed an issue affecting certain self union queries.
November 20, 2018
- [SPARK-25816]Fixed attribute resolution in nested extractors.
November 6, 2018
- [SPARK-25714]Fix Null Handling in the Optimizer rule BooleanSimplification.
October 16, 2018
- Fixed a bug affecting Union operation.
September 25, 2018
- [SPARK-25402][SQL] Null handling in BooleanSimplification.
- Fixed NotSerializableException in Avro data source.
September 11, 2018
- Filter reduction should handle null value correctly.
August 28, 2018
- Fixed a bug in Delta Lake Delete command that would incorrectly delete the rows where the condition evaluates to null.
- [SPARK-25114]Fix RecordBinaryComparator when subtraction between two words is divisible by Integer.MAX_VALUE.
August 23, 2018
- [SPARK-24809]Serializing LongHashedRelation in executor may result in data error.
- Fixed nullable map issue in Parquet reader.
- [SPARK-25081]Fixed a bug where ShuffleExternalSorter may access a released memory page when spilling fails to allocate memory.
- Fixed an interaction between Databricks Delta and Pyspark which could cause transient read failures.
June 28, 2018
- Fixed a bug that could cause incorrect query results when the name of a partition column used in a predicate differs from the case of that column in the schema of the table.
June 7, 2018
- Fixed a bug affecting Spark SQL execution engine.
- Improved error handling in Delta Lake.
May 17, 2018
- Improved stability on reading data stored in Azure Data Lake Store.
- Fixed a bug affecting RDD caching.
- Fixed a bug affecting Null-safe Equal in Spark SQL.
- Fixed a bug affecting certain aggregations in streaming queries.
April 24, 2018
- Upgraded Azure Data Lake Store SDK from 2.0.11 to 2.2.8 to improve the stability of access to Azure Data Lake Store.
- Fixed a bug affecting the insertion of overwrites to partitioned Hive tables when spark.databricks.io.hive.fastwriter.enabled is false.
- Fixed an issue that failed task serialization.
March 09, 2018
- Fixed an issue caused by a race condition that could, in rare circumstances, lead to loss of some output files.
March 01, 2018
- Improved the efficiency of handling streams that can take a long time to stop.
- Fixed an issue affecting Python autocomplete.
- Applied Ubuntu security patches.
- Fixed an issue affecting certain queries using Python UDFs and window functions.
- Fixed an issue affecting the use of UDFs on a cluster with table access control enabled.
January 29, 2018
- Fixed an issue affecting the manipulation of tables stored in Azure Blob storage.
- Fixed aggregation after dropDuplicates on empty DataFrame.
Databricks Runtime 3.4 (EoS)
See Databricks Runtime 3.4 (EoS).
June 7, 2018
- Fixed a bug affecting Spark SQL execution engine.
- Improved error handling in Delta Lake.
May 17, 2018
- Improved stability on reading data stored in Azure Data Lake Store.
- Fixed a bug affecting RDD caching.
- Fixed a bug affecting Null-safe Equal in Spark SQL.
April 24, 2018
- Fixed a bug affecting the insertion of overwrites to partitioned Hive tables when spark.databricks.io.hive.fastwriter.enabled is false.
March 09, 2018
- Fixed an issue caused by a race condition that could, in rare circumstances, lead to loss of some output files.
December 13, 2017
- Fixed an issue affecting UDFs in Scala.
- Fixed an issue affecting the use of Data Skipping Index on data source tables stored in non-DBFS paths.
December 07, 2017
- Improved shuffle stability.
Unsupported Databricks Runtime releases
For the original release notes, follow the link below the subheading.