Compute policy reference

12/17/2024

This article is a reference for compute policy definitions. It includes a reference of the available policy attributes and limitation types. There are also sample policies you can reference for common use cases.

What are policy definitions?

Policy definitions are individual policy rules expressed in JSON. A definition can add a rule to any of the attributes controlled with the Clusters API. For example, these definitions set a default autotermination time, forbid users from using pools, and enforce the use of Photon:

{
  "autotermination_minutes": {
    "type": "unlimited",
    "defaultValue": 4320,
    "isOptional": true
  },
  "instance_pool_id": {
    "type": "forbidden",
    "hidden": true
  },
  "runtime_engine": {
    "type": "fixed",
    "value": "PHOTON",
    "hidden": true
  }
}

There can only be one limitation per attribute. An attribute’s path reflects the API attribute name. For nested attributes, the path concatenates the nested attribute names using dots. Attributes that aren’t defined in a policy definition won’t be limited.
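As an illustration of the dotted-path rule, the following minimal Python sketch flattens a nested compute specification into policy-style attribute paths. The helper name `flatten_paths` and the sample values are hypothetical and are not part of the Clusters API.

```python
def flatten_paths(spec, prefix=""):
    """Return {dotted_path: value} for a nested compute specification."""
    paths = {}
    for key, value in spec.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            # Nested attribute: join the parent and child names with a dot.
            paths.update(flatten_paths(value, path))
        else:
            paths[path] = value
    return paths


# Sample values are made up for illustration.
cluster_spec = {
    "autotermination_minutes": 60,
    "autoscale": {"min_workers": 1, "max_workers": 10},
    "spark_conf": {"spark.executor.memory": "4g"},
}

flat = flatten_paths(cluster_spec)
for path, value in flat.items():
    print(path, "=", value)
```

Each printed path, such as `autoscale.max_workers`, is the key you would use in a policy definition.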

Supported attributes

Policies support all attributes controlled with the Clusters API. The type of restrictions you can place on attributes may vary per setting based on their type and relation to the UI elements. You cannot use policies to define compute permissions.

You can also use policies to set the max DBUs per hour and cluster type. See Virtual attribute paths.

The following table lists the supported policy attribute paths:

Attribute path	Type	Description
`autoscale.max_workers`	optional number	When hidden, removes the maximum worker number field from the UI.
`autoscale.min_workers`	optional number	When hidden, removes the minimum worker number field from the UI.
`autotermination_minutes`	number	A value of 0 represents no auto termination. When hidden, removes the auto termination checkbox and value input from the UI.
`azure_attributes.availability`	string	Controls whether the compute uses on-demand or spot instances (`ON_DEMAND_AZURE` or `SPOT_WITH_FALLBACK_AZURE`).
`azure_attributes.first_on_demand`	number	Controls the number of nodes to put on on-demand instances.
`azure_attributes.spot_bid_max_price`	number	Controls the maximum price for Azure spot instances.
`cluster_log_conf.path`	string	The destination URL of the log files.
`cluster_log_conf.type`	string	The type of log destination. `DBFS` is the only acceptable value.
`cluster_name`	string	The cluster name.
`custom_tags.*`	string	Control specific tag values by appending the tag name, for example: `custom_tags.<mytag>`.
`data_security_mode`	string	Sets the access mode of the cluster. Unity Catalog requires `SINGLE_USER` or `USER_ISOLATION` (shared access mode in the UI). A value of `NONE` means no security features are enabled.
`docker_image.basic_auth.password`	string	The password for the Databricks Container Services image basic authentication.
`docker_image.basic_auth.username`	string	The user name for the Databricks Container Services image basic authentication.
`docker_image.url`	string	Controls the Databricks Container Services image URL. When hidden, removes the Databricks Container Services section from the UI.
`driver_node_type_id`	optional string	When hidden, removes the driver node type selection from the UI.
`enable_local_disk_encryption`	boolean	Set to `true` to enable, or `false` to disable, encrypting disks that are locally attached to the cluster (as specified through the API).
`init_scripts.*.workspace.destination` `init_scripts.*.volumes.destination` `init_scripts.*.abfss.destination` `init_scripts.*.file.destination`	string	`*` refers to the index of the init script in the attribute array. See Writing policies for array attributes.
`instance_pool_id`	string	Controls the pool used by worker nodes if `driver_instance_pool_id` is also defined, or for all cluster nodes otherwise. If you use pools for worker nodes, you must also use pools for the driver node. When hidden, removes pool selection from the UI.
`driver_instance_pool_id`	string	If specified, configures a different pool for the driver node than for worker nodes. If not specified, inherits `instance_pool_id`. If you use pools for worker nodes, you must also use pools for the driver node. When hidden, removes driver pool selection from the UI.
`node_type_id`	string	When hidden, removes the worker node type selection from the UI.
`num_workers`	optional number	When hidden, removes the worker number specification from the UI.
`runtime_engine`	string	Determines whether the cluster uses Photon or not. Possible values are `PHOTON` or `STANDARD`.
`single_user_name`	string	Controls which users or groups can be assigned to the compute resource.
`spark_conf.*`	optional string	Controls specific configuration values by appending the configuration key name, for example: `spark_conf.spark.executor.memory`.
`spark_env_vars.*`	optional string	Controls specific Spark environment variable values by appending the environment variable, for example: `spark_env_vars.<environment variable name>`.
`spark_version`	string	The Spark image version name as specified through the API (the Databricks Runtime). You can also use special policy values that dynamically select the Databricks Runtime. See Special policy values for Databricks Runtime selection.
`workload_type.clients.jobs`	boolean	Defines whether the compute resource can be used for jobs. See Prevent compute from being used with jobs.
`workload_type.clients.notebooks`	boolean	Defines whether the compute resource can be used with notebooks. See Prevent compute from being used with jobs.

Virtual attribute paths

This table includes two additional synthetic attributes supported by policies:

Attribute path	Type	Description
`dbus_per_hour`	number	Calculated attribute representing the maximum DBUs a resource can use on an hourly basis including the driver node. This metric is a direct way to control cost at the individual compute level. Use with range limitation.
`cluster_type`	string	Represents the type of cluster that can be created: `all-purpose` for Azure Databricks all-purpose compute, `job` for job compute created by the job scheduler, and `dlt` for compute created for Delta Live Tables pipelines. Allows or blocks the specified types of compute from being created from the policy. If the `all-purpose` value is not allowed, the policy is not shown in the all-purpose create compute UI. If the `job` value is not allowed, the policy is not shown in the create job compute UI.

Special policy values for Databricks Runtime selection

The spark_version attribute supports special values that dynamically map to a Databricks Runtime version based on the current set of supported Databricks Runtime versions.

The following values can be used in the spark_version attribute:

auto:latest: Maps to the latest GA Databricks Runtime version.
auto:latest-ml: Maps to the latest Databricks Runtime ML version.
auto:latest-lts: Maps to the latest long-term support (LTS) Databricks Runtime version.
auto:latest-lts-ml: Maps to the latest LTS Databricks Runtime ML version.
auto:prev-major: Maps to the second-latest GA Databricks Runtime version. For example, if auto:latest is 14.2, then auto:prev-major is 13.3.
auto:prev-major-ml: Maps to the second-latest GA Databricks Runtime ML version. For example, if auto:latest-ml is 14.2, then auto:prev-major-ml is 13.3.
auto:prev-lts: Maps to the second-latest LTS Databricks Runtime version. For example, if auto:latest-lts is 13.3, then auto:prev-lts is 12.2.
auto:prev-lts-ml: Maps to the second-latest LTS Databricks Runtime ML version. For example, if auto:latest-lts-ml is 13.3, then auto:prev-lts-ml is 12.2.

Note

Using these values does not make the compute auto-update when a new runtime version is released. A user must explicitly edit the compute for the Databricks Runtime version to change.

Supported policy types

This section includes a reference for each of the available policy types. There are two categories of policy types: fixed policies and limiting policies.

Fixed policies prevent user configuration on an attribute. The two types of fixed policies are:

Fixed policy
Forbidden policy

Limiting policies limit a user’s options for configuring an attribute. Limiting policies also allow you to set default values and make attributes optional. See Additional limiting policy fields.

Your options for limiting policies are:

Allowlist policy
Blocklist policy
Regex policy
Range policy
Unlimited policy

Fixed policy

Fixed policies limit the attribute to the specified value. For attribute values other than numeric and boolean, the value must be represented by or convertible to a string.

With fixed policies, you can also hide the attribute from the UI by setting the hidden field to true.

interface FixedPolicy {
    type: "fixed";
    value: string | number | boolean;
    hidden?: boolean;
}

This example policy fixes the Databricks Runtime version and hides the field from the user’s UI:

{
  "spark_version": { "type": "fixed", "value": "auto:latest-lts", "hidden": true }
}

Forbidden policy

A forbidden policy prevents users from configuring an attribute. Forbidden policies are only compatible with optional attributes.

interface ForbiddenPolicy {
    type: "forbidden";
}

This policy forbids attaching pools to the compute for worker nodes. Pools are also forbidden for the driver node, because driver_instance_pool_id inherits the policy.

{
  "instance_pool_id": { "type": "forbidden" }
}

Allowlist policy

An allowlist policy specifies a list of values the user can choose between when configuring an attribute.

interface AllowlistPolicy {
  type: "allowlist";
  values: (string | number | boolean)[];
  defaultValue?: string | number | boolean;
  isOptional?: boolean;
}

This allowlist example allows the user to select between two Databricks Runtime versions:

{
  "spark_version":  { "type": "allowlist", "values": [ "13.3.x-scala2.12", "12.2.x-scala2.12" ] }
}

Blocklist policy

The blocklist policy lists disallowed values. Since the values must be exact matches, this policy might not work as expected when the attribute is lenient in how the value is represented (for example, allowing leading and trailing spaces).

interface BlocklistPolicy {
  type: "blocklist";
  values: (string | number | boolean)[];
  defaultValue?: string | number | boolean;
  isOptional?: boolean;
}

This example blocks the user from selecting 7.3.x-scala2.12 as the Databricks Runtime.

{
  "spark_version":  { "type": "blocklist", "values": [ "7.3.x-scala2.12" ] }
}
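The exact-match caveat can be seen in a small Python sketch. The helper `is_blocked` is hypothetical, and whether a given attribute actually tolerates padded values is an assumption made only for illustration.

```python
def is_blocked(value, blocked_values):
    """Exact-match membership test, mirroring the 'exact matches' rule."""
    return value in blocked_values


blocked = ["7.3.x-scala2.12"]

# The exact string is caught by the blocklist.
exact = is_blocked("7.3.x-scala2.12", blocked)

# A padded variant is a different string, so an exact-match check misses it.
padded = is_blocked(" 7.3.x-scala2.12 ", blocked)

print(exact, padded)
```

If an attribute accepts such variants, an allowlist or an anchored regex policy may be a safer choice than a blocklist.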

Regex policy

A regex policy limits the available values to ones that match the regex. For safety, make sure your regex is anchored to the beginning and end of the string value.

interface RegexPolicy {
  type: "regex";
  pattern: string;
  defaultValue?: string | number | boolean;
  isOptional?: boolean;
}

This example limits the Databricks Runtime versions a user can select from:

{
  "spark_version":  { "type": "regex", "pattern": "13\\.[3456].*" }
}
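To see why anchoring matters, the following sketch compares an unanchored search with an anchored full match using Python's `re` module. How the policy service itself applies the pattern is not stated here, so treat this purely as an illustration. The candidate string `113.3.x-scala2.12` is made up and is not a real runtime version.

```python
import re

unanchored = r"13\.[3456].*"
anchored = r"^13\.[3456]\..*$"

# Hypothetical value that merely contains a permitted version string.
candidate = "113.3.x-scala2.12"

# An unanchored search finds "13.3..." starting at the second character.
search_hit = re.search(unanchored, candidate) is not None

# An anchored pattern must match the whole value, so the candidate is rejected.
full_hit = re.fullmatch(anchored, candidate) is not None

# A genuine 13.3 version string still matches the anchored pattern.
valid_hit = re.fullmatch(anchored, "13.3.x-scala2.12") is not None

print(search_hit, full_hit, valid_hit)
```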

Range policy

A range policy limits the value to a specified range using the minValue and maxValue fields. The value must be a decimal number. The numeric limits must be representable as a double floating point value. To indicate lack of a specific limit, you can omit either minValue or maxValue.

interface RangePolicy {
  type: "range";
  minValue?: number;
  maxValue?: number;
  defaultValue?: string | number | boolean;
  isOptional?: boolean;
}

This example limits the maximum number of workers to 10:

{
  "num_workers":  { "type": "range", "maxValue": 10 }
}

Unlimited policy

The unlimited policy is used to make attributes required or to set the default value in the UI.

interface UnlimitedPolicy {
  type: "unlimited";
  defaultValue?: string | number | boolean;
  isOptional?: boolean;
}

This example requires the COST_BUCKET tag on the compute, without restricting its value:

{
  "custom_tags.COST_BUCKET":  { "type": "unlimited" }
}

To set a default value for a Spark configuration variable, but also allow omitting (removing) it:

{
  "spark_conf.spark.my.conf":  { "type": "unlimited", "isOptional": true, "defaultValue": "my_value" }
}

Additional limiting policy fields

For limiting policy types you can specify two additional fields:

defaultValue - The value that automatically populates in the create compute UI.
isOptional - A limiting policy on an attribute automatically makes it required. To make the attribute optional, set the isOptional field to true.

Note

Default values don’t automatically get applied to compute created with the Clusters API. To apply default values using the API, add the parameter apply_policy_default_values to the compute definition and set it to true.
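As a rough sketch, a compute definition that opts in to policy defaults might be assembled like this before it is sent to the Clusters API. Only `apply_policy_default_values` comes from the note above. The `cluster_name` and `policy_id` fields and their values are illustrative assumptions, and no request is actually sent.

```python
import json

# Illustrative compute definition; the name and policy ID are placeholders.
compute_definition = {
    "cluster_name": "example-compute",
    "policy_id": "<policy-id>",
    # Ask the service to fill unset attributes from the policy's defaultValue fields.
    "apply_policy_default_values": True,
}

request_body = json.dumps(compute_definition, indent=2)
print(request_body)
```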

This example policy specifies the default value id1 for the pool for worker nodes, but makes it optional. When creating the compute, you can select a different pool or choose not to use one. If driver_instance_pool_id isn’t defined in the policy or when creating the compute, the same pool is used for worker nodes and the driver node.

{
  "instance_pool_id": { "type": "unlimited", "isOptional": true, "defaultValue": "id1" }
}
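The policy types and the isOptional field described in this reference can be summarized in a small local checker. This is a minimal sketch of the documented semantics, not the service's implementation. The function name `satisfies` and the `MISSING` sentinel are hypothetical, and treating a missing value under a fixed policy as a failure is an assumption of this sketch.

```python
import re

MISSING = object()  # sentinel: the attribute was not set at all


def satisfies(limitation, value=MISSING):
    """Return True if a value is acceptable under one policy limitation."""
    kind = limitation["type"]
    if kind == "forbidden":
        return value is MISSING
    if value is MISSING:
        # Limiting policies make the attribute required unless isOptional is true.
        # Fixed policies have no isOptional field, so a missing value fails here.
        return limitation.get("isOptional", False)
    if kind == "fixed":
        return value == limitation["value"]
    if kind == "allowlist":
        return value in limitation["values"]
    if kind == "blocklist":
        return value not in limitation["values"]
    if kind == "regex":
        return re.fullmatch(limitation["pattern"], str(value)) is not None
    if kind == "range":
        low = limitation.get("minValue", float("-inf"))
        high = limitation.get("maxValue", float("inf"))
        return low <= value <= high
    if kind == "unlimited":
        return True
    raise ValueError(f"unknown policy type: {kind}")


policy = {
    "spark_version": {"type": "allowlist", "values": ["13.3.x-scala2.12", "12.2.x-scala2.12"]},
    "num_workers": {"type": "range", "maxValue": 10},
    "instance_pool_id": {"type": "unlimited", "isOptional": True, "defaultValue": "id1"},
}

ok_version = satisfies(policy["spark_version"], "13.3.x-scala2.12")
too_many = satisfies(policy["num_workers"], 11)
no_pool = satisfies(policy["instance_pool_id"])
print(ok_version, too_many, no_pool)
```

A checker like this can be handy for testing a policy definition locally before you upload it, but the service remains the source of truth.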

Writing policies for array attributes

You can specify policies for array attributes in two ways:

Generic limitations for all array elements. These limitations use the * wildcard symbol in the policy path.
Specific limitations for an array element at a specific index. These limitations use a number in the path.

For example, for the array attribute init_scripts, the generic paths start with init_scripts.* and the specific paths with init_scripts.<n>, where <n> is an integer index in the array (starting with 0). You can combine generic and specific limitations, in which case the generic limitation applies to each array element that does not have a specific limitation. In each case only one policy limitation will apply.
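The precedence rule above, where a specific index wins over the generic wildcard, can be sketched as follows. The helper `resolve_array_limitation` is hypothetical. It assumes that any purely numeric path segment is an array index, which is a simplification.

```python
def resolve_array_limitation(policy, path):
    """Return the single limitation that applies to a concrete array path, or None."""
    # A limitation written for the exact index takes precedence.
    if path in policy:
        return policy[path]
    # Otherwise fall back to the generic form, with the index replaced by "*".
    generic = ".".join("*" if part.isdigit() else part for part in path.split("."))
    return policy.get(generic)


policy = {
    "init_scripts.0.volumes.destination": {"type": "fixed", "value": "<required-script-1>"},
    "init_scripts.*.volumes.destination": {"type": "forbidden"},
}

first = resolve_array_limitation(policy, "init_scripts.0.volumes.destination")
second = resolve_array_limitation(policy, "init_scripts.1.volumes.destination")
other = resolve_array_limitation(policy, "init_scripts.0.workspace.destination")
print(first, second, other)
```

In this sketch, index 0 resolves to the fixed limitation, index 1 falls back to the forbidden wildcard, and the workspace path has no limitation at all.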

The following sections show common examples that use array attributes.

Require inclusion-specific entries

You cannot require specific values without specifying the order. For example:

{
  "init_scripts.0.volumes.destination": {
    "type": "fixed",
    "value": "<required-script-1>"
  },
  "init_scripts.1.volumes.destination": {
    "type": "fixed",
    "value": "<required-script-2>"
  }
}

Require a fixed value of the entire list

{
  "init_scripts.0.volumes.destination": {
    "type": "fixed",
    "value": "<required-script-1>"
  },
  "init_scripts.*.volumes.destination": {
    "type": "forbidden"
  }
}

Disallow the use altogether

{
  "init_scripts.*.volumes.destination": {
    "type": "forbidden"
  }
}

Allow entries that follow a specific restriction

{
  "init_scripts.*.volumes.destination": {
    "type": "regex",
    "pattern": ".*<required-content>.*"
  }
}

Fix a specific set of init scripts

For init_scripts paths, each array element can take one of several structures (volumes, workspace, abfss, or file), and depending on the use case you may need to handle every variant. For example, to require a specific set of init scripts and disallow any other variant, you can use the following pattern:

{
  "init_scripts.0.volumes.destination": {
    "type": "fixed",
    "value": "<volume-paths>"
  },
  "init_scripts.1.volumes.destination": {
    "type": "fixed",
    "value": "<volume-paths>"
  },
  "init_scripts.*.workspace.destination": {
    "type": "forbidden"
  },
  "init_scripts.*.abfss.destination": {
    "type": "forbidden"
  },
  "init_scripts.*.file.destination": {
    "type": "forbidden"
  }
}

Policy examples

This section includes policy examples you can use as references for creating your own policies. You can also use the Azure Databricks-provided policy families as templates for common policy use cases.

General compute policy
Define limits on Delta Live Tables pipeline compute
Simple medium-sized policy
Job-only policy
External metastore policy
Prevent compute from being used with jobs
Remove autoscaling policy
Custom tag enforcement

General compute policy

A general purpose compute policy meant to guide users and restrict some functionality, while requiring tags, restricting the maximum number of instances, and enforcing timeout.

{
  "instance_pool_id": {
    "type": "forbidden",
    "hidden": true
  },
  "spark_version": {
    "type": "regex",
    "pattern": "12\\.[0-9]+\\.x-scala.*"
  },
  "node_type_id": {
    "type": "allowlist",
    "values": [
      "Standard_L4s",
      "Standard_L8s",
      "Standard_L16s"
    ],
    "defaultValue": "Standard_L16s"
  },
  "driver_node_type_id": {
    "type": "fixed",
    "value": "Standard_L16s_v2",
    "hidden": true
  },
  "autoscale.min_workers": {
    "type": "fixed",
    "value": 1,
    "hidden": true
  },
  "autoscale.max_workers": {
    "type": "range",
    "maxValue": 25,
    "defaultValue": 5
  },
  "autotermination_minutes": {
    "type": "fixed",
    "value": 30,
    "hidden": true
  },
  "custom_tags.team": {
    "type": "fixed",
    "value": "product"
  }
}

Define limits on Delta Live Tables pipeline compute

Note

When using policies to configure Delta Live Tables compute, Databricks recommends applying a single policy to both the default and maintenance compute.

To configure a policy for pipeline compute, create a policy with the cluster_type field set to dlt. The following example creates a minimal policy for Delta Live Tables compute:

{
  "cluster_type": {
    "type": "fixed",
    "value": "dlt"
  },
  "num_workers": {
    "type": "unlimited",
    "defaultValue": 3,
    "isOptional": true
  },
  "node_type_id": {
    "type": "unlimited",
    "isOptional": true
  },
  "spark_version": {
    "type": "unlimited",
    "hidden": true
  }
}

Simple medium-sized policy

Allows users to create a medium-sized compute with minimal configuration. The only required field at creation time is compute name; the rest is fixed and hidden.

{
  "instance_pool_id": {
    "type": "forbidden",
    "hidden": true
  },
  "spark_conf.spark.databricks.cluster.profile": {
    "type": "forbidden",
    "hidden": true
  },
  "autoscale.min_workers": {
    "type": "fixed",
    "value": 1,
    "hidden": true
  },
  "autoscale.max_workers": {
    "type": "fixed",
    "value": 10,
    "hidden": true
  },
  "autotermination_minutes": {
    "type": "fixed",
    "value": 60,
    "hidden": true
  },
  "node_type_id": {
    "type": "fixed",
    "value": "Standard_L8s_v2",
    "hidden": true
  },
  "driver_node_type_id": {
    "type": "fixed",
    "value": "Standard_L8s_v2",
    "hidden": true
  },
  "spark_version": {
    "type": "fixed",
    "value": "auto:latest-ml",
    "hidden": true
  },
  "custom_tags.team": {
    "type": "fixed",
    "value": "product"
  }
}

Job-only policy

Allows users to create job compute to run jobs. Users cannot create all-purpose compute using this policy.

{
  "cluster_type": {
    "type": "fixed",
    "value": "job"
  },
  "dbus_per_hour": {
    "type": "range",
    "maxValue": 100
  },
  "instance_pool_id": {
    "type": "forbidden",
    "hidden": true
  },
  "num_workers": {
    "type": "range",
    "minValue": 1
  },
  "node_type_id": {
    "type": "regex",
    "pattern": "Standard_[DLS]*[1-6]{1,2}_v[2,3]"
  },
  "driver_node_type_id": {
    "type": "regex",
    "pattern": "Standard_[DLS]*[1-6]{1,2}_v[2,3]"
  },
  "spark_version": {
    "type": "unlimited",
    "defaultValue": "auto:latest-lts"
  },
  "custom_tags.team": {
    "type": "fixed",
    "value": "product"
  }
}

External metastore policy

Allows users to create compute with an admin-defined metastore already attached. This is useful to allow users to create their own compute without requiring additional configuration.

{
  "spark_conf.spark.hadoop.javax.jdo.option.ConnectionURL": {
      "type": "fixed",
      "value": "jdbc:sqlserver://<jdbc-url>"
  },
  "spark_conf.spark.hadoop.javax.jdo.option.ConnectionDriverName": {
      "type": "fixed",
      "value": "com.microsoft.sqlserver.jdbc.SQLServerDriver"
  },
  "spark_conf.spark.databricks.delta.preview.enabled": {
      "type": "fixed",
      "value": "true"
  },
  "spark_conf.spark.hadoop.javax.jdo.option.ConnectionUserName": {
      "type": "fixed",
      "value": "<metastore-user>"
  },
  "spark_conf.spark.hadoop.javax.jdo.option.ConnectionPassword": {
      "type": "fixed",
      "value": "<metastore-password>"
  }
}

Prevent compute from being used with jobs

This policy prevents users from using the compute to run jobs. Users will only be able to use the compute with notebooks.

{
  "workload_type.clients.notebooks": {
    "type": "fixed",
    "value": true
  },
  "workload_type.clients.jobs": {
    "type": "fixed",
    "value": false
  }
}

Remove autoscaling policy

This policy disables autoscaling and allows the user to set the number of workers within a given range.

{
  "num_workers": {
    "type": "range",
    "maxValue": 25,
    "minValue": 1,
    "defaultValue": 5
  }
}

Custom tag enforcement

To add a compute tag rule to a policy, use the custom_tags.<tag-name> attribute.

For example, any user using this policy needs to fill in a COST_CENTER tag with 9999, 9921, or 9531 for the compute to launch:

{
  "custom_tags.COST_CENTER": { "type": "allowlist", "values": [ "9999", "9921", "9531" ] }
}
