Compute access mode limitations for Unity Catalog
Databricks recommends using Unity Catalog and shared access mode for most workloads. This article outlines limitations and requirements for each access mode with Unity Catalog. For details on access modes, see Access modes.
Databricks recommends using compute policies to simplify configuration options for most users. See Create and manage compute policies.
Note
No-isolation shared and credential passthrough are legacy access modes that do not support Unity Catalog.
Important
Init scripts and libraries have different support across access modes and Databricks Runtime versions. See Where can init scripts be installed? and Cluster-scoped libraries.
Single user access mode limitations on Unity Catalog
Single user access mode on Unity Catalog has the following limitations. These are in addition to the general limitations for all Unity Catalog access modes. See General limitations for Unity Catalog.
Fine-grained access control limitations for Unity Catalog single user access mode
On Databricks Runtime 15.3 and below, fine-grained access control on single user compute is not supported. Specifically:
- You cannot access a table that has a row filter or column mask.
- You cannot access dynamic views.
- To read from any view, you must have `SELECT` on all tables and views that are referenced by the view.
To query dynamic views, views on which you don’t have `SELECT` on the underlying tables and views, and tables with row filters or column masks, use one of the following:
- A SQL warehouse.
- Compute with shared access mode.
- Compute with single user access mode on Databricks Runtime 15.4 LTS or above.
Databricks Runtime 15.4 LTS and above support fine-grained access control on single user compute. To take advantage of the data filtering provided in Databricks Runtime 15.4 LTS and above, verify that your workspace is enabled for serverless compute.
Serverless compute handles data filtering, which allows access to a view without requiring permissions on its underlying tables and views. Because serverless compute handles data filtering, you might incur serverless compute charges when you use single user compute to query views. For more information, see Fine-grained access control on single user compute.
Streaming table and materialized view limitations for Unity Catalog single user access mode
On Databricks Runtime 15.3 and below, you cannot use single user compute to query tables that were created using a Delta Live Tables pipeline, including streaming tables and materialized views, if those tables are owned by other users. The user who creates a table is the owner.
To query streaming tables and materialized views created by Delta Live Tables and owned by other users, use one of the following:
- A SQL warehouse.
- Compute with shared access mode on Databricks Runtime 13.3 LTS or above.
- Compute with single user access mode on Databricks Runtime 15.4 LTS or above.
Your workspace must also be enabled for serverless compute. For more information, see Fine-grained access control on single user compute.
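The compute requirements in the two sections above can be summarized as a small decision function. The following Python sketch is hypothetical: the function names, the `compute` and `object_kind` strings, and the runtime version-string format are assumptions made for illustration. It encodes only the rules stated in this article and is not a Databricks API.

```python
def parse_dbr_version(version: str) -> tuple[int, int]:
    """Parse a runtime label such as "15.4 LTS" into (15, 4)."""
    major, minor = version.split()[0].split(".")[:2]
    return int(major), int(minor)


def can_query(compute: str, object_kind: str, dbr_version: str = "",
              serverless_enabled: bool = False) -> bool:
    """Hypothetical check of whether a compute type can query an object.

    compute: "sql_warehouse", "shared", or "single_user" (illustrative labels).
    object_kind: "filtered" for dynamic views, tables with row filters or
        column masks, and views without SELECT on their underlying objects;
        "dlt_other_owner" for streaming tables and materialized views that
        were created by Delta Live Tables and are owned by another user.
    """
    if object_kind not in ("filtered", "dlt_other_owner"):
        raise ValueError(f"unknown object kind: {object_kind}")
    if compute == "sql_warehouse":
        return True
    if compute == "shared":
        # The article states a minimum runtime only for Delta Live Tables objects.
        if object_kind == "filtered":
            return True
        return parse_dbr_version(dbr_version) >= (13, 3)
    if compute == "single_user":
        # Data filtering runs on serverless compute, so the workspace must allow it.
        return serverless_enabled and parse_dbr_version(dbr_version) >= (15, 4)
    raise ValueError(f"unknown compute type: {compute}")
```

For example, `can_query("single_user", "filtered", "15.4 LTS", serverless_enabled=True)` returns `True`, while the same call with `"15.3"` returns `False`.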
Streaming limitations for Unity Catalog single user access mode
- Asynchronous checkpointing is not supported in Databricks Runtime 11.3 LTS and below.
- `StreamingQueryListener` requires Databricks Runtime 15.1 or above to use credentials or interact with objects managed by Unity Catalog on single user compute.
Shared access mode limitations on Unity Catalog
Shared access mode in Unity Catalog has the following limitations. These are in addition to the general limitations for all Unity Catalog access modes. See General limitations for Unity Catalog.
- Databricks Runtime ML and Spark Machine Learning Library (MLlib) are not supported.
- Spark-submit job tasks are not supported. Use a JAR task instead.
- DBUtils and other clients that directly read the data from cloud storage are only supported when you use an external location to access the storage location. See Create an external location to connect cloud storage to Azure Databricks.
- In Databricks Runtime 13.3 and above, individual rows must not exceed 128 MB.
- DBFS root and mounts do not support FUSE.
- Custom containers are not supported.
Language support for Unity Catalog shared access mode
- R is not supported.
- Scala is supported in Databricks Runtime 13.3 and above.
- In Databricks Runtime 15.4 LTS and above, all Java or Scala libraries (JAR files) bundled with Databricks Runtime are available on compute in Unity Catalog access modes.
- For Databricks Runtime 15.3 or below on compute that uses shared access mode, set the Spark config `spark.databricks.scala.kernel.fullClasspath.enabled` to `true`.
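As an illustration, the following Python sketch builds a compute definition for shared access mode and adds the `spark.databricks.scala.kernel.fullClasspath.enabled` setting only on Databricks Runtime 15.3 and below. It assumes the field names of the Databricks Clusters API (`cluster_name`, `spark_version`, `node_type_id`, `num_workers`, `data_security_mode`, `spark_conf`) and that `USER_ISOLATION` is the value for shared access mode; the helper functions and the node type are hypothetical examples. Verify the payload against the Clusters API reference before using it.

```python
import json


def parse_dbr_version(spark_version: str) -> tuple[int, int]:
    """Parse a runtime key such as "15.3.x-scala2.12" into (15, 3)."""
    major, minor = spark_version.split(".")[:2]
    return int(major), int(minor)


def shared_cluster_spec(cluster_name: str, spark_version: str,
                        node_type_id: str, num_workers: int) -> dict:
    """Build a hypothetical Clusters API payload for shared access mode."""
    spec = {
        "cluster_name": cluster_name,
        "spark_version": spark_version,
        "node_type_id": node_type_id,
        "num_workers": num_workers,
        # Assumed Clusters API value for shared access mode.
        "data_security_mode": "USER_ISOLATION",
        "spark_conf": {},
    }
    if parse_dbr_version(spark_version) <= (15, 3):
        # Only needed on 15.3 and below; 15.4 LTS and above expose the
        # runtime's bundled JARs without this setting.
        spec["spark_conf"]["spark.databricks.scala.kernel.fullClasspath.enabled"] = "true"
    return spec


if __name__ == "__main__":
    # Print an example payload for a 15.3 runtime (node type is illustrative).
    print(json.dumps(shared_cluster_spec("etl", "15.3.x-scala2.12", "Standard_DS3_v2", 2), indent=2))
```

Keeping the version check in one place means the same helper can produce definitions for both older and newer runtimes without leaving an unnecessary setting on 15.4 LTS and above.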
Spark API limitations and requirements for Unity Catalog shared access mode
- RDD APIs are not supported.
- Spark Context (`sc`), `spark.sparkContext`, and `sqlContext` are not supported for Scala in any Databricks Runtime and are not supported for Python in Databricks Runtime 14.0 and above.
  - Databricks recommends using the `spark` variable to interact with the `SparkSession` instance.
  - The following `sc` functions are also not supported: `emptyRDD`, `range`, `init_batched_serializer`, `parallelize`, `pickleFile`, `textFile`, `wholeTextFiles`, `binaryFiles`, `binaryRecords`, `sequenceFile`, `newAPIHadoopFile`, `newAPIHadoopRDD`, `hadoopFile`, `hadoopRDD`, `union`, `runJob`, `setSystemProperty`, `uiWebUrl`, `stop`, `setJobGroup`, `setLocalProperty`, `getConf`.
- The following Scala Dataset API operations require Databricks Runtime 15.4 LTS or above: `map`, `mapPartitions`, `foreachPartition`, `flatMap`, `reduce`, and `filter`.
- The Spark configuration property `spark.executor.extraJavaOptions` is not supported.
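To audit existing notebooks before moving them to shared access mode, a simple text scan can flag calls to the `sc` functions listed above. The following Python sketch is hypothetical and is not a Databricks tool: it matches only the literal prefixes `sc.` and `spark.sparkContext.`, so it misses aliases and dynamic calls, and it can report matches inside strings or comments.

```python
import re

# The sc functions that this article lists as unsupported on shared access mode.
UNSUPPORTED_SC_FUNCTIONS = frozenset({
    "emptyRDD", "range", "init_batched_serializer", "parallelize", "pickleFile",
    "textFile", "wholeTextFiles", "binaryFiles", "binaryRecords", "sequenceFile",
    "newAPIHadoopFile", "newAPIHadoopRDD", "hadoopFile", "hadoopRDD", "union",
    "runJob", "setSystemProperty", "uiWebUrl", "stop", "setJobGroup",
    "setLocalProperty", "getConf",
})

# Matches "sc.<name>" or "spark.sparkContext.<name>" and captures <name>.
_SC_CALL = re.compile(r"\b(?:sc|spark\.sparkContext)\.(\w+)")


def find_unsupported_sc_calls(source: str) -> list[tuple[int, str]]:
    """Return (line_number, function_name) for each listed sc function found."""
    hits = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        for match in _SC_CALL.finditer(line):
            if match.group(1) in UNSUPPORTED_SC_FUNCTIONS:
                hits.append((lineno, match.group(1)))
    return hits
```

Lines that already use the `spark` session, such as `spark.range(10)`, are not reported, which matches the recommendation to use `spark` instead of `sc`.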
UDF limitations and requirements for Unity Catalog shared access mode
User-defined functions (UDFs) have the following limitations with shared access mode:
- Hive UDFs are not supported.
- `applyInPandas` and `mapInPandas` require Databricks Runtime 14.3 or above.
- PySpark UDFs cannot access Git folders, workspace files, or volumes to import modules in Databricks Runtime 14.2 and below.
- Scala scalar UDFs require Databricks Runtime 14.2 or above. Other Scala UDFs and UDAFs are not supported.
- In Databricks Runtime 14.2 and below, using a custom version of `grpc`, `pyarrow`, or `protobuf` in a PySpark UDF through notebook-scoped or cluster-scoped libraries is not supported because the installed version is always preferred. To find the version of installed libraries, see the System Environment section of the specific Databricks Runtime version release notes.
- Python scalar UDFs and Pandas UDFs require Databricks Runtime 13.3 LTS or above.
- Non-scalar Python and Pandas UDFs, including UDAFs, UDTFs, and Pandas on Spark, require Databricks Runtime 14.3 LTS or above.
See User-defined functions (UDFs) in Unity Catalog.
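The runtime requirements for UDFs on shared access mode can be collected into a lookup table. In the following hypothetical Python sketch, the category names are labels invented for illustration, and `None` marks a UDF type that this article lists as unsupported on any runtime.

```python
from typing import Optional

# Minimum Databricks Runtime (major, minor) per UDF category on shared access mode.
# None means the category is not supported on any runtime.
UDF_MIN_RUNTIME: dict[str, Optional[tuple[int, int]]] = {
    "hive_udf": None,
    "python_scalar_udf": (13, 3),
    "pandas_scalar_udf": (13, 3),
    "scala_scalar_udf": (14, 2),
    "scala_udaf": None,
    "scala_other_udf": None,
    "applyInPandas": (14, 3),
    "mapInPandas": (14, 3),
    "python_udaf": (14, 3),
    "python_udtf": (14, 3),
    "pandas_on_spark": (14, 3),
}


def parse_dbr_version(version: str) -> tuple[int, int]:
    """Parse a runtime label such as "14.3 LTS" into (14, 3)."""
    major, minor = version.split()[0].split(".")[:2]
    return int(major), int(minor)


def udf_supported(kind: str, dbr_version: str) -> bool:
    """Return True if the UDF category can run on shared access mode at this runtime."""
    minimum = UDF_MIN_RUNTIME[kind]  # raises KeyError for an unknown category
    return minimum is not None and parse_dbr_version(dbr_version) >= minimum
```

A table keeps the rules in one place, so a new runtime requirement means changing one entry rather than a chain of conditionals.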
Streaming limitations and requirements for Unity Catalog shared access mode
Note
Some of the listed Kafka options have limited support when used for supported configurations on Azure Databricks. All listed Kafka limitations are valid for both batch and stream processing. See Stream processing with Apache Kafka and Azure Databricks.
- For Scala, `foreach` requires Databricks Runtime 16.1 or above. `foreachBatch` and `flatMapGroupsWithState` are not supported.
- For Python, `foreachBatch` has the following behavior changes in Databricks Runtime 14.0 and above:
  - `print()` commands write output to the driver logs.
  - You cannot access the `dbutils.widgets` submodule inside the function.
  - Any files, modules, or objects referenced in the function must be serializable and available on Spark.
- For Scala, `from_avro` requires Databricks Runtime 14.2 or above.
- `applyInPandasWithState` requires Databricks Runtime 14.3 LTS or above.
- Working with socket sources is not supported.
- The `sourceArchiveDir` must be in the same external location as the source when you use `option("cleanSource", "archive")` with a data source managed by Unity Catalog.
- For Kafka sources and sinks, the following options are not supported:
  - `kafka.sasl.client.callback.handler.class`
  - `kafka.sasl.login.callback.handler.class`
  - `kafka.sasl.login.class`
  - `kafka.partition.assignment.strategy`
- The following Kafka options are supported in Databricks Runtime 13.3 LTS and above but unsupported in Databricks Runtime 12.2 LTS. You can only specify external locations managed by Unity Catalog for these options:
  - `kafka.ssl.truststore.location`
  - `kafka.ssl.keystore.location`
- For Scala, `StreamingQueryListener` requires Databricks Runtime 16.1 and above.
- For Python, `StreamingQueryListener` requires Databricks Runtime 14.3 LTS or above to use credentials or interact with objects managed by Unity Catalog on shared compute.
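The Kafka restrictions can be checked before a stream is started. The following Python sketch is hypothetical: `check_kafka_options` is not a Databricks or Spark API, and it cannot verify that the keystore and truststore paths point to external locations managed by Unity Catalog, which still needs to be confirmed separately.

```python
# Kafka options that shared access mode does not support for sources or sinks.
UNSUPPORTED_KAFKA_OPTIONS = frozenset({
    "kafka.sasl.client.callback.handler.class",
    "kafka.sasl.login.callback.handler.class",
    "kafka.sasl.login.class",
    "kafka.partition.assignment.strategy",
})

# Supported only on Databricks Runtime 13.3 LTS and above, and only with paths
# in Unity Catalog external locations (the path check is not done here).
UC_LOCATION_KAFKA_OPTIONS = frozenset({
    "kafka.ssl.truststore.location",
    "kafka.ssl.keystore.location",
})


def parse_dbr_version(version: str) -> tuple[int, int]:
    """Parse a runtime label such as "13.3 LTS" into (13, 3)."""
    major, minor = version.split()[0].split(".")[:2]
    return int(major), int(minor)


def check_kafka_options(options: dict[str, str], dbr_version: str) -> list[str]:
    """Return a message for each option that shared access mode would reject."""
    runtime = parse_dbr_version(dbr_version)
    problems = []
    for key in sorted(options):
        # Normalize case so that differently cased keys are still caught.
        normalized = key.lower()
        if normalized in UNSUPPORTED_KAFKA_OPTIONS:
            problems.append(f"{key}: not supported on shared access mode")
        elif normalized in UC_LOCATION_KAFKA_OPTIONS and runtime < (13, 3):
            problems.append(f"{key}: requires Databricks Runtime 13.3 LTS or above")
    return problems
```

An empty result means only that none of the listed options were found; other streaming limitations in this section still apply.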
Network and file system access limitations and requirements for Unity Catalog shared access mode
- Commands on compute nodes run as a low-privilege user that is forbidden from accessing sensitive parts of the filesystem.
- In Databricks Runtime 11.3 LTS and below, you can only create network connections to ports 80 and 443.
- You cannot connect to the instance metadata service or Azure WireServer.
General limitations for Unity Catalog
The following limitations apply to all Unity Catalog-enabled access modes.
Streaming limitations for Unity Catalog
- Apache Spark continuous processing mode is not supported. See Continuous Processing in the Spark Structured Streaming Programming Guide.
See also Streaming limitations for Unity Catalog single user access mode and Streaming limitations and requirements for Unity Catalog shared access mode.
For more on streaming with Unity Catalog, see Using Unity Catalog with Structured Streaming.