Enable Hive metastore table access control on a cluster (legacy)
This article describes how to enable table access control for the built-in Hive metastore on a cluster.
For information about how to set privileges on Hive metastore securable objects once table access control has been enabled on a cluster, see Hive metastore privileges and securable objects (legacy).
Note
Hive metastore table access control is a legacy data governance model. Databricks recommends that you use Unity Catalog instead for its simplicity and account-centered governance model. You can upgrade the tables managed by the Hive metastore to the Unity Catalog metastore.
Enable table access control for a cluster
Table access control is available in two versions:
- SQL-only table access control, which restricts users to SQL commands.
- Python and SQL table access control, which allows users to run SQL, Python, and PySpark commands.
Table access control is not supported with Machine Learning Runtime.
Important
Even if table access control is enabled for a cluster, Azure Databricks workspace administrators have access to file-level data.
SQL-only table access control
This version of table access control restricts users to SQL commands only.
To enable SQL-only table access control on a cluster and restrict that cluster to use only SQL commands, set the following flag in the cluster’s Spark conf:
spark.databricks.acl.sqlOnly true
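When clusters are defined programmatically, the same flag can be carried in the cluster specification's Spark configuration map. The following is a minimal sketch, not a definitive implementation: it assumes the specification is a plain dictionary with a `spark_conf` key (the field name used by the Clusters REST API), and the helper name `with_sql_only_acl` and the cluster name are hypothetical.

```python
import copy


def with_sql_only_acl(cluster_spec):
    """Return a copy of cluster_spec with the SQL-only table ACL flag set.

    Assumption: cluster_spec is a dict shaped like a Clusters API request
    body, where Spark settings live under the "spark_conf" key.
    """
    spec = copy.deepcopy(cluster_spec)  # leave the caller's dict untouched
    spark_conf = spec.setdefault("spark_conf", {})
    # Spark conf values are passed as strings, matching the "key value" line above.
    spark_conf["spark.databricks.acl.sqlOnly"] = "true"
    return spec


# Hypothetical base specification; other required fields are omitted here.
base = {"cluster_name": "sql-only-example", "spark_conf": {}}
sql_only = with_sql_only_acl(base)
```

The helper copies the input so that an existing specification can be reused for clusters that should not be restricted to SQL.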
Note
Access to SQL-only table access control is not affected by the Enable Table Access Control setting in the admin settings page. That setting controls only the workspace-wide enablement of Python and SQL table access control.
Python and SQL table access control
This version of table access control lets users run Python commands that use the DataFrame API as well as SQL. When it is enabled on a cluster, users on that cluster:
- Can access Spark only using the Spark SQL API or DataFrame API. In both cases, access to tables and views is restricted according to the privileges that administrators grant (see Privileges you can grant on Hive metastore objects).
- Must run their commands on cluster nodes as a low-privilege user forbidden from accessing sensitive parts of the filesystem or creating network connections to ports other than 80 and 443.
- Only built-in Spark functions can create network connections on ports other than 80 and 443.
- Only workspace admin users or users with ANY FILE privilege can read data from external databases through the PySpark JDBC connector.
- If you want Python processes to be able to access additional outbound ports, you can set the Spark config spark.databricks.pyspark.iptable.outbound.whitelisted.ports to the ports you want to allow access. The supported format of the configuration value is [port[:port][,port[:port]]...], for example: 21,22,9000:9999. The port must be within the valid range, that is, 0-65535.
Attempts to get around these restrictions will fail with an exception. These restrictions are in place so that users can never use the cluster to access data on which they have no privileges.
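The outbound port value format described above (port[:port][,port[:port]]...) can be checked before it is placed in a cluster's Spark config. The following is a minimal sketch of such a check, not the parser Azure Databricks itself uses: the function name `parse_outbound_ports` is hypothetical, and rejecting a range whose start exceeds its end is an assumption added here, not a documented rule.

```python
def parse_outbound_ports(value):
    """Parse 'port[:port][,port[:port]]...' into a list of (low, high) tuples.

    A single port N is returned as (N, N). Raises ValueError on malformed
    input or on ports outside the valid range 0-65535.
    """
    ranges = []
    for item in value.split(","):
        item = item.strip()
        if not item:
            raise ValueError(f"empty port entry in {value!r}")
        parts = item.split(":")
        if len(parts) > 2:
            raise ValueError(f"too many ':' separators in {item!r}")
        try:
            bounds = [int(p) for p in parts]
        except ValueError:
            raise ValueError(f"non-numeric port in {item!r}") from None
        low, high = bounds[0], bounds[-1]
        for port in (low, high):
            if not 0 <= port <= 65535:
                raise ValueError(f"port {port} is outside 0-65535")
        # Assumption: a descending range such as 9999:9000 is treated as an error.
        if low > high:
            raise ValueError(f"range start exceeds range end in {item!r}")
        ranges.append((low, high))
    return ranges


# The example value from the text above.
allowed = parse_outbound_ports("21,22,9000:9999")
```

For the example value, `allowed` holds one tuple per entry, with single ports expanded to a one-port range.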
Enable table access control for your workspace
Before users can configure Python and SQL table access control, an Azure Databricks workspace admin must enable table access control for the Azure Databricks workspace and deny users access to clusters that are not enabled for table access control.
- Go to the settings page.
- Click the Security tab.
- Turn on the Table Access Control option.
Enforce table access control
To ensure that your users access only the data that you want them to, you must restrict your users to clusters with table access control enabled. In particular, you should ensure that:
- Users do not have permission to create clusters. If they create a cluster without table access control, they can access any data from that cluster.
- Users do not have CAN ATTACH TO permission for any cluster that is not enabled for table access control.
See Compute permissions for more information.
Create a cluster enabled for table access control
Table access control is enabled by default in clusters with Shared access mode.
To create the cluster using the REST API, see Create new cluster.
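As an illustration, a request body for such a cluster might be assembled as follows. This is a sketch under stated assumptions rather than a confirmed recipe: it assumes that the Clusters API field `data_security_mode` with the value `USER_ISOLATION` corresponds to Shared access mode, and the helper name, cluster name, runtime version, and node type are placeholders. Check the field names and values against the Create new cluster reference before use.

```python
import json


def shared_access_cluster_spec(cluster_name, spark_version, node_type_id, num_workers=1):
    """Build a request body for a cluster in Shared access mode.

    Assumption: "USER_ISOLATION" is the data_security_mode value that
    corresponds to Shared access mode in the Clusters API.
    """
    return {
        "cluster_name": cluster_name,
        "spark_version": spark_version,
        "node_type_id": node_type_id,
        "num_workers": num_workers,
        "data_security_mode": "USER_ISOLATION",
    }


# Placeholder values; substitute a runtime version and node type
# that are available in your workspace.
payload = shared_access_cluster_spec(
    cluster_name="table-acl-example",
    spark_version="13.3.x-scala2.12",
    node_type_id="Standard_DS3_v2",
)
body = json.dumps(payload, indent=2)  # JSON text to send as the request body
```

The sketch only builds the JSON body; sending it requires an authenticated call to the workspace's cluster creation endpoint, which is not shown here.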
Set privileges on a data object
See Hive metastore privileges and securable objects (legacy).