Compute policy reference

This article is a reference for compute policy definitions. It includes a reference of the available policy attributes and limitation types, as well as sample policies you can reference for common use cases.

What are policy definitions?

Policy definitions are individual policy rules expressed in JSON. A definition can add a rule to any of the attributes controlled with the Clusters API. For example, these definitions set a default autotermination time, forbid users from using pools, and enforce the use of Photon:

{
  "autotermination_minutes": {
    "type": "unlimited",
    "defaultValue": 4320,
    "isOptional": true
  },
  "instance_pool_id": {
    "type": "forbidden",
    "hidden": true
  },
  "runtime_engine": {
    "type": "fixed",
    "value": "PHOTON",
    "hidden": true
  }
}

There can only be one limitation per attribute. An attribute’s path reflects the API attribute name. For nested attributes, the path concatenates the nested attribute names using dots. Attributes that aren’t defined in a policy definition won’t be limited.
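
For example, a minimal sketch of a definition that limits the nested autoscale.max_workers attribute (the path joins autoscale and max_workers with a dot; the limit of 10 is illustrative):

{
  "autoscale.max_workers": {
    "type": "range",
    "maxValue": 10
  }
}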

Supported attributes

Policies support all attributes controlled with the Clusters API. The types of restrictions you can place on an attribute vary per setting, based on the attribute’s type and its relation to the UI elements. You cannot use policies to define compute permissions.

You can also use policies to set the max DBUs per hour and cluster type. See Virtual attribute paths.

The following table lists the supported policy attribute paths:

Attribute path Type Description
autoscale.max_workers optional number When hidden, removes the maximum worker number field from the UI.
autoscale.min_workers optional number When hidden, removes the minimum worker number field from the UI.
autotermination_minutes number A value of 0 represents no auto termination. When hidden, removes the auto termination checkbox and value input from the UI.
azure_attributes.availability string Controls whether the compute uses on-demand or spot instances (ON_DEMAND_AZURE or SPOT_WITH_FALLBACK_AZURE).
azure_attributes.first_on_demand number Controls the number of nodes to put on on-demand instances.
azure_attributes.spot_bid_max_price number Controls the maximum price for Azure spot instances.
cluster_log_conf.path string The destination URL of the log files.
cluster_log_conf.type string The type of log destination. DBFS is the only acceptable value.
cluster_name string The cluster name.
custom_tags.* string Control specific tag values by appending the tag name, for example: custom_tags.<mytag>.
data_security_mode string Sets the access mode of the cluster. Unity Catalog requires SINGLE_USER or USER_ISOLATION (shared access mode in the UI). A value of NONE means no security features are enabled.
docker_image.basic_auth.password string The password for the Databricks Container Services image basic authentication.
docker_image.basic_auth.username string The user name for the Databricks Container Services image basic authentication.
docker_image.url string Controls the Databricks Container Services image URL. When hidden, removes the Databricks Container Services section from the UI.
driver_node_type_id optional string When hidden, removes the driver node type selection from the UI.
enable_local_disk_encryption boolean Set to true to enable, or false to disable, encrypting disks that are locally attached to the cluster (as specified through the API).
init_scripts.*.workspace.destination, init_scripts.*.volumes.destination, init_scripts.*.abfss.destination, init_scripts.*.file.destination string * refers to the index of the init script in the attribute array. See Writing policies for array attributes.
instance_pool_id string Controls the pool used by worker nodes if driver_instance_pool_id is also defined, or for all cluster nodes otherwise. If you use pools for worker nodes, you must also use pools for the driver node. When hidden, removes pool selection from the UI.
driver_instance_pool_id string If specified, configures a different pool for the driver node than for worker nodes. If not specified, inherits instance_pool_id. If you use pools for worker nodes, you must also use pools for the driver node. When hidden, removes driver pool selection from the UI.
node_type_id string When hidden, removes the worker node type selection from the UI.
num_workers optional number When hidden, removes the worker number specification from the UI.
runtime_engine string Determines whether the cluster uses Photon or not. Possible values are PHOTON or STANDARD.
single_user_name string The user name for credential passthrough single user access.
spark_conf.* optional string Controls specific configuration values by appending the configuration key name, for example: spark_conf.spark.executor.memory.
spark_env_vars.* optional string Controls specific Spark environment variable values by appending the environment variable, for example: spark_env_vars.<environment variable name>.
spark_version string The Spark image version name as specified through the API (the Databricks Runtime). You can also use special policy values that dynamically select the Databricks Runtime. See Special policy values for Databricks Runtime selection.
workload_type.clients.jobs boolean Defines whether the compute resource can be used for jobs. See Prevent compute from being used with jobs.
workload_type.clients.notebooks boolean Defines whether the compute resource can be used with notebooks. See Prevent compute from being used with jobs.

Virtual attribute paths

The following table lists two additional synthetic attributes supported by policies; an example that combines them follows the table:

Attribute path Type Description
dbus_per_hour number Calculated attribute representing the maximum DBUs a resource can use on an hourly basis including the driver node. This metric is a direct way to control cost at the individual compute level. Use with range limitation.
cluster_type string Represents the type of cluster that can be created:

- all-purpose for Azure Databricks all-purpose compute
- job for job compute created by the job scheduler
- dlt for compute created for Delta Live Tables pipelines

Allow or block specified types of compute to be created from the policy. If the all-purpose value is not allowed, the policy is not shown in the all-purpose create compute UI. If the job value is not allowed, the policy is not shown in the create job compute UI.
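
For example, a minimal sketch of a policy that caps hourly cost and restricts the policy to job compute only (the DBU limit is illustrative):

{
  "dbus_per_hour": {
    "type": "range",
    "maxValue": 50
  },
  "cluster_type": {
    "type": "fixed",
    "value": "job"
  }
}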

Special policy values for Databricks Runtime selection

The spark_version attribute supports special values that dynamically map to a Databricks Runtime version based on the current set of supported Databricks Runtime versions.

The following values can be used in the spark_version attribute:

  • auto:latest: Maps to the latest GA Databricks Runtime version.
  • auto:latest-ml: Maps to the latest Databricks Runtime ML version.
  • auto:latest-lts: Maps to the latest long-term support (LTS) Databricks Runtime version.
  • auto:latest-lts-ml: Maps to the latest LTS Databricks Runtime ML version.
  • auto:prev-major: Maps to the second-latest GA Databricks Runtime version. For example, if auto:latest is 14.2, then auto:prev-major is 13.3.
  • auto:prev-major-ml: Maps to the second-latest GA Databricks Runtime ML version. For example, if auto:latest-ml is 14.2, then auto:prev-major-ml is 13.3.
  • auto:prev-lts: Maps to the second-latest LTS Databricks Runtime version. For example, if auto:latest-lts is 13.3, then auto:prev-lts is 12.2.
  • auto:prev-lts-ml: Maps to the second-latest LTS Databricks Runtime ML version. For example, if auto:latest-lts-ml is 13.3, then auto:prev-lts-ml is 12.2.

Note

Using these values does not make the compute auto-update when a new runtime version is released. A user must explicitly edit the compute for the Databricks Runtime version to change.
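
For example, a sketch of an allowlist that only offers the dynamically resolved LTS runtimes (the default value is illustrative):

{
  "spark_version": {
    "type": "allowlist",
    "values": ["auto:latest-lts", "auto:latest-lts-ml"],
    "defaultValue": "auto:latest-lts"
  }
}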

Supported policy types

This section includes a reference for each of the available policy types. There are two categories of policy types: fixed policies and limiting policies.

Fixed policies prevent user configuration on an attribute. The two types of fixed policies are:

  • Fixed policy
  • Forbidden policy

Limiting policies limit a user’s options for configuring an attribute. Limiting policies also allow you to set default values and make attributes optional. See Additional limiting policy fields.

Your options for limiting policies are:

  • Allowlist policy
  • Blocklist policy
  • Regex policy
  • Range policy
  • Unlimited policy

Fixed policy

Fixed policies limit the attribute to the specified value. For attribute values other than numeric and boolean, the value must be represented by or convertible to a string.

With fixed policies, you can also hide the attribute from the UI by setting the hidden field to true.

interface FixedPolicy {
    type: "fixed";
    value: string | number | boolean;
    hidden?: boolean;
}

This example policy fixes the Databricks Runtime version and hides the field from the user’s UI:

{
  "spark_version": { "type": "fixed", "value": "auto:latest-lts", "hidden": true }
}

Forbidden policy

A forbidden policy prevents users from configuring an attribute. Forbidden policies are only compatible with optional attributes.

interface ForbiddenPolicy {
    type: "forbidden";
}

This policy forbids attaching pools to the compute for worker nodes. Pools are also forbidden for the driver node, because driver_instance_pool_id inherits the policy.

{
  "instance_pool_id": { "type": "forbidden" }
}

Allowlist policy

An allowlist policy specifies a list of values the user can choose between when configuring an attribute.

interface AllowlistPolicy {
  type: "allowlist";
  values: (string | number | boolean)[];
  defaultValue?: string | number | boolean;
  isOptional?: boolean;
}

This allowlist example allows the user to select between two Databricks Runtime versions:

{
  "spark_version":  { "type": "allowlist", "values": [ "13.3.x-scala2.12", "12.2.x-scala2.12" ] }
}

Blocklist policy

The blocklist policy lists disallowed values. Since the values must be exact matches, this policy might not work as expected when the attribute is lenient in how the value is represented (for example, allowing leading and trailing spaces).

interface BlocklistPolicy {
  type: "blocklist";
  values: (string | number | boolean)[];
  defaultValue?: string | number | boolean;
  isOptional?: boolean;
}

This example blocks the user from selecting 7.3.x-scala2.12 as the Databricks Runtime.

{
  "spark_version":  { "type": "blocklist", "values": [ "7.3.x-scala2.12" ] }
}

Regex policy

A regex policy limits the available values to ones that match the regex. For safety, make sure your regex is anchored to the beginning and end of the string value.

interface RegexPolicy {
  type: "regex";
  pattern: string;
  defaultValue?: string | number | boolean;
  isOptional?: boolean;
}

This example limits the Databricks Runtime versions a user can select from:

{
  "spark_version":  { "type": "regex", "pattern": "13\\.[3456].*" }
}
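
An explicitly anchored sketch of the same restriction, using ^ and $ as recommended above:

{
  "spark_version":  { "type": "regex", "pattern": "^13\\.[3456].*$" }
}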

Range policy

A range policy limits the value to a specified range using the minValue and maxValue fields. The value must be a decimal number. The numeric limits must be representable as a double floating point value. To indicate lack of a specific limit, you can omit either minValue or maxValue.

interface RangePolicy {
  type: "range";
  minValue?: number;
  maxValue?: number;
  defaultValue?: string | number | boolean;
  isOptional?: boolean;
}

This example limits the maximum number of workers to 10:

{
  "num_workers":  { "type": "range", "maxValue": 10 }
}

Unlimited policy

The unlimited policy is used to make attributes required or to set the default value in the UI.

interface UnlimitedPolicy {
  type: "unlimited";
  defaultValue?: string | number | boolean;
  isOptional?: boolean;
}

This example adds the COST_BUCKET tag to the compute:

{
  "custom_tags.COST_BUCKET":  { "type": "unlimited" }
}

To set a default value for a Spark configuration variable, but also allow omitting (removing) it:

{
  "spark_conf.spark.my.conf":  { "type": "unlimited", "isOptional": true, "defaultValue": "my_value" }
}

Additional limiting policy fields

For limiting policy types you can specify two additional fields:

  • defaultValue - The value that automatically populates in the create compute UI.
  • isOptional - A limiting policy on an attribute automatically makes it required. To make the attribute optional, set the isOptional field to true.

Note

Default values don’t automatically get applied to compute created with the Clusters API. To apply default values using the API, add the parameter apply_policy_default_values to the compute definition and set it to true.
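
For example, a sketch of a Clusters API create request body that opts in to policy defaults (the policy ID and other values shown are illustrative):

{
  "cluster_name": "team-cluster",
  "policy_id": "ABCD000000000000",
  "apply_policy_default_values": true,
  "spark_version": "auto:latest-lts",
  "node_type_id": "Standard_DS3_v2",
  "num_workers": 2
}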

This example policy specifies the default value id1 for the pool for worker nodes, but makes it optional. When creating the compute, you can select a different pool or choose not to use one. If driver_instance_pool_id isn’t defined in the policy or when creating the compute, the same pool is used for worker nodes and the driver node.

{
  "instance_pool_id": { "type": "unlimited", "isOptional": true, "defaultValue": "id1" }
}

Writing policies for array attributes

You can specify policies for array attributes in two ways:

  • Generic limitations for all array elements. These limitations use the * wildcard symbol in the policy path.
  • Specific limitations for an array element at a specific index. These limitations use a number in the path.

For example, for the array attribute init_scripts, the generic paths start with init_scripts.* and the specific paths with init_scripts.<n>, where <n> is an integer index in the array (starting with 0). You can combine generic and specific limitations, in which case the generic limitation applies to each array element that does not have a specific limitation. In each case only one policy limitation will apply.

The following sections show common examples that use array attributes.

Require inclusion of specific entries

You cannot require specific values without specifying the order. For example:

{
  "init_scripts.0.volumes.destination": {
    "type": "fixed",
    "value": "<required-script-1>"
  },
  "init_scripts.1.volumes.destination": {
    "type": "fixed",
    "value": "<required-script-2>"
  }
}

Require a fixed value for the entire list

{
  "init_scripts.0.volumes.destination": {
    "type": "fixed",
    "value": "<required-script-1>"
  },
  "init_scripts.*.volumes.destination": {
    "type": "forbidden"
  }
}

Disallow the use of init scripts altogether

{
   "init_scripts.*.volumes.destination": {
    "type": "forbidden"
  }
}

Allow entries that follow a specific restriction

{
    "init_scripts.*.volumes.destination": {
    "type": "regex",
    "pattern": ".*<required-content>.*"
  }
}

Fix a specific set of init scripts

For init_scripts paths, the array can contain one of multiple structures, and depending on the use case you might need to handle all possible variants. For example, to require a specific set of init scripts and disallow any variant of the other structures, use the following pattern:

{
  "init_scripts.0.volumes.destination": {
    "type": "fixed",
    "value": "<volume-paths>"
  },
  "init_scripts.1.volumes.destination": {
    "type": "fixed",
    "value": "<volume-paths>"
  },
  "init_scripts.*.workspace.destination": {
    "type": "forbidden"
  },
  "init_scripts.*.abfss.destination": {
    "type": "forbidden"
  },
  "init_scripts.*.file.destination": {
    "type": "forbidden"
  }
}

Policy examples

This section includes policy examples you can use as references for creating your own policies. You can also use the Azure Databricks-provided policy families as templates for common policy use cases.

General compute policy

A general-purpose compute policy meant to guide users and restrict some functionality, while requiring tags, restricting the maximum number of instances, and enforcing a timeout.

{
  "instance_pool_id": {
    "type": "forbidden",
    "hidden": true
  },
  "spark_version": {
    "type": "regex",
    "pattern": "12\\.[0-9]+\\.x-scala.*"
  },
  "node_type_id": {
    "type": "allowlist",
    "values": [
      "Standard_L4s",
      "Standard_L8s",
      "Standard_L16s"
    ],
    "defaultValue": "Standard_L16s_v2"
  },
  "driver_node_type_id": {
    "type": "fixed",
    "value": "Standard_L16s_v2",
    "hidden": true
  },
  "autoscale.min_workers": {
    "type": "fixed",
    "value": 1,
    "hidden": true
  },
  "autoscale.max_workers": {
    "type": "range",
    "maxValue": 25,
    "defaultValue": 5
  },
  "autotermination_minutes": {
    "type": "fixed",
    "value": 30,
    "hidden": true
  },
  "custom_tags.team": {
    "type": "fixed",
    "value": "product"
  }
}

Define limits on Delta Live Tables pipeline compute

Note

When using policies to configure Delta Live Tables compute, Databricks recommends applying a single policy to both the default and maintenance compute.

To configure a policy for a pipeline compute, create a policy with the cluster_type field set to dlt. The following example creates a minimal policy for a Delta Live Tables compute:

{
  "cluster_type": {
    "type": "fixed",
    "value": "dlt"
  },
  "num_workers": {
    "type": "unlimited",
    "defaultValue": 3,
    "isOptional": true
  },
  "node_type_id": {
    "type": "unlimited",
    "isOptional": true
  },
  "spark_version": {
    "type": "unlimited",
    "hidden": true
  }
}

Simple medium-sized policy

Allows users to create medium-sized compute with minimal configuration. The only field required at creation time is the compute name; the rest are fixed and hidden.

{
  "instance_pool_id": {
    "type": "forbidden",
    "hidden": true
  },
  "spark_conf.spark.databricks.cluster.profile": {
    "type": "forbidden",
    "hidden": true
  },
  "autoscale.min_workers": {
    "type": "fixed",
    "value": 1,
    "hidden": true
  },
  "autoscale.max_workers": {
    "type": "fixed",
    "value": 10,
    "hidden": true
  },
  "autotermination_minutes": {
    "type": "fixed",
    "value": 60,
    "hidden": true
  },
  "node_type_id": {
    "type": "fixed",
    "value": "Standard_L8s_v2",
    "hidden": true
  },
  "driver_node_type_id": {
    "type": "fixed",
    "value": "Standard_L8s_v2",
    "hidden": true
  },
  "spark_version": {
    "type": "fixed",
    "value": "auto:latest-ml",
    "hidden": true
  },
  "custom_tags.team": {
    "type": "fixed",
    "value": "product"
  }
}

Job-only policy

Allows users to create job compute to run jobs. Users cannot create all-purpose compute using this policy.

{
  "cluster_type": {
    "type": "fixed",
    "value": "job"
  },
  "dbus_per_hour": {
    "type": "range",
    "maxValue": 100
  },
  "instance_pool_id": {
    "type": "forbidden",
    "hidden": true
  },
  "num_workers": {
    "type": "range",
    "minValue": 1
  },
  "node_type_id": {
    "type": "regex",
    "pattern": "Standard_[DLS]*[1-6]{1,2}_v[2,3]"
  },
  "driver_node_type_id": {
    "type": "regex",
    "pattern": "Standard_[DLS]*[1-6]{1,2}_v[2,3]"
  },
  "spark_version": {
    "type": "unlimited",
    "defaultValue": "auto:latest-lts"
  },
  "custom_tags.team": {
    "type": "fixed",
    "value": "product"
  }
}

External metastore policy

Allows users to create compute with an admin-defined metastore already attached. This is useful to allow users to create their own compute without requiring additional configuration.

{
  "spark_conf.spark.hadoop.javax.jdo.option.ConnectionURL": {
      "type": "fixed",
      "value": "jdbc:sqlserver://<jdbc-url>"
  },
  "spark_conf.spark.hadoop.javax.jdo.option.ConnectionDriverName": {
      "type": "fixed",
      "value": "com.microsoft.sqlserver.jdbc.SQLServerDriver"
  },
  "spark_conf.spark.databricks.delta.preview.enabled": {
      "type": "fixed",
      "value": "true"
  },
  "spark_conf.spark.hadoop.javax.jdo.option.ConnectionUserName": {
      "type": "fixed",
      "value": "<metastore-user>"
  },
  "spark_conf.spark.hadoop.javax.jdo.option.ConnectionPassword": {
      "type": "fixed",
      "value": "<metastore-password>"
  }
}

Prevent compute from being used with jobs

This policy prevents users from using the compute to run jobs. Users will only be able to use the compute with notebooks.

{
  "workload_type.clients.notebooks": {
    "type": "fixed",
    "value": true
  },
  "workload_type.clients.jobs": {
    "type": "fixed",
    "value": false
  }
}

Remove autoscaling policy

This policy disables autoscaling and allows the user to set the number of workers within a given range.

{
  "num_workers": {
  "type": "range",
  "maxValue": 25,
  "minValue": 1,
  "defaultValue": 5
  }
}

Custom tag enforcement

To add a compute tag rule to a policy, use the custom_tags.<tag-name> attribute.

For example, any user using this policy needs to fill in a COST_CENTER tag with 9999, 9921, or 9531 for the compute to launch:

   {"custom_tags.COST_CENTER": {"type":"allowlist", "values":["9999", "9921", "9531" ]}}