Databricks Utilities (dbutils) reference
This article is a reference for Databricks Utilities (`dbutils`). The utilities provide commands that enable you to work with your Databricks environment from notebooks. For example, you can manage files and object storage, and work with secrets. `dbutils` is available in Python, R, and Scala notebooks.
Note
`dbutils` only supports compute environments that use DBFS.
Utility modules
The following table lists the Databricks Utilities modules, which you can retrieve using `dbutils.help()`.
| Module | Description |
|---|---|
| data | Utilities for understanding and interacting with datasets (EXPERIMENTAL) |
| fs | Utilities for accessing the Databricks file system (DBFS) |
| jobs | Utilities for leveraging job features |
| library | Deprecated. Utilities for managing session-scoped libraries |
| notebook | Utilities for managing the control flow of notebooks (EXPERIMENTAL) |
| secrets | Utilities for leveraging secrets within notebooks |
| widgets | Utilities for parameterizing notebooks |
| api | Utilities for managing application builds |
Command help
To list commands for a utility module along with a short description of each command, append `.help()` to the name of the utility module. The following example lists the available commands for the notebook utility:
dbutils.notebook.help()
The notebook module.
exit(value: String): void -> This method lets you exit a notebook with a value
run(path: String, timeoutSeconds: int, arguments: Map): String -> This method runs a notebook and returns its exit value
To output help for a command, run `dbutils.<utility-name>.help("<command-name>")`. The following example displays help for the file system utility's copy command, `dbutils.fs.cp`:
dbutils.fs.help("cp")
/**
* Copies a file or directory, possibly across FileSystems.
*
* Example: cp("/mnt/my-folder/a", "dbfs:/a/b")
*
* @param from FileSystem URI of the source file or directory
* @param to FileSystem URI of the destination file or directory
* @param recurse if true, all files and directories will be recursively copied
* @return true if all files were successfully copied
*/
cp(from: java.lang.String, to: java.lang.String, recurse: boolean = false): boolean
Data utility (dbutils.data)
Important
This feature is in Public Preview.
Note
Available in Databricks Runtime 9.0 and above.
The data utility allows you to understand and interact with datasets.
The following table lists the available commands for this utility, which you can retrieve using `dbutils.data.help()`.
| Command | Description |
|---|---|
| summarize | Summarizes a Spark DataFrame and visualizes the statistics to get quick insights |
summarize command (dbutils.data.summarize)
Note
This feature is in Public Preview.
summarize(df: Object, precise: boolean): void
Calculates and displays summary statistics of an Apache Spark DataFrame or pandas DataFrame. This command is available for Python, Scala and R.
Important
This command analyzes the complete contents of the DataFrame. Running this command for very large DataFrames can be very expensive.
To display complete help for this command, run:
dbutils.data.help("summarize")
In Databricks Runtime 10.4 LTS and above, you can use the additional `precise` parameter to adjust the precision of the computed statistics.
- When `precise` is set to false (the default), some returned statistics include approximations to reduce run time.
  - The number of distinct values for categorical columns may have ~5% relative error for high-cardinality columns.
  - The frequent value counts may have an error of up to 0.01% when the number of distinct values is greater than 10000.
  - The histograms and percentile estimates may have an error of up to 0.01% relative to the total number of rows.
- When `precise` is set to true, the statistics are computed with higher precision. All statistics except for the histograms and percentiles for numeric columns are exact.
  - The histograms and percentile estimates may have an error of up to 0.0001% relative to the total number of rows.
The tooltip at the top of the data summary output indicates the mode of the current run.
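To see why approximate statistics can be much cheaper, here is a hypothetical sketch (not Databricks' implementation) of a KMV (k-minimum values) distinct-count estimator, the general kind of technique that trades a small relative error for bounded memory:

```python
import hashlib

def approx_distinct(values, k=256):
    """Estimate the number of distinct values by keeping only the k
    smallest hash values (a KMV sketch). Memory stays O(k) no matter
    how many rows are scanned, at the cost of a small relative error."""
    # Map each value to a pseudo-uniform point in [0, 1).
    hashes = sorted({
        int(hashlib.sha256(str(v).encode()).hexdigest(), 16) / 2**256
        for v in values
    })
    if len(hashes) < k:
        return len(hashes)  # few distinct values: the count is exact
    # The k-th smallest of n uniform points lands near k / n.
    return int(round((k - 1) / hashes[k - 1]))

estimate = approx_distinct(range(10000))
print(estimate)  # close to 10000, typically within a few percent
```

The error bounds quoted above come from Databricks' own (different) estimators; this sketch only illustrates the accuracy-versus-cost trade-off that the `precise` flag controls.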
Example
This example displays summary statistics for an Apache Spark DataFrame with approximations enabled by default. To see the results, run this command in a notebook. This example is based on Sample datasets.
Python
df = spark.read.format('csv').load(
'/databricks-datasets/Rdatasets/data-001/csv/ggplot2/diamonds.csv',
header=True,
inferSchema=True
)
dbutils.data.summarize(df)
R
df <- read.df("/databricks-datasets/Rdatasets/data-001/csv/ggplot2/diamonds.csv", source = "csv", header="true", inferSchema = "true")
dbutils.data.summarize(df)
Scala
val df = spark.read.format("csv")
.option("inferSchema", "true")
.option("header", "true")
.load("/databricks-datasets/Rdatasets/data-001/csv/ggplot2/diamonds.csv")
dbutils.data.summarize(df)
The visualization uses SI notation to concisely render numerical values smaller than 0.01 or larger than 10000. For example, the numerical value `1.25e-15` is rendered as `1.25f`. One exception: the visualization uses "B" for `1.0e9` (giga) instead of "G".
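The rendering rule can be sketched locally. `format_si` below is a hypothetical helper (not part of dbutils, and simplified in that it applies a suffix to every magnitude) that mimics the described behavior, including the `B`-for-giga exception:

```python
def format_si(x: float) -> str:
    """Render a number with an SI suffix, using 'B' instead of 'G'
    for 1e9, as the summarize visualization does."""
    prefixes = [
        (1e12, "T"), (1e9, "B"), (1e6, "M"), (1e3, "k"),
        (1, ""), (1e-3, "m"), (1e-6, "u"), (1e-9, "n"),
        (1e-12, "p"), (1e-15, "f"),
    ]
    if x == 0:
        return "0"
    for scale, suffix in prefixes:
        if abs(x) >= scale:
            return f"{x / scale:g}{suffix}"
    return f"{x:g}"

print(format_si(1.25e-15))  # 1.25f
print(format_si(1.0e9))     # 1B
```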
File system utility (dbutils.fs)
The file system utility allows you to access DBFS (see What is DBFS?), making it easier to use Azure Databricks as a file system.
Warning
The Python implementation of all `dbutils.fs` methods uses `snake_case` rather than `camelCase` for keyword formatting.
For example, `dbutils.fs.help()` displays the option `extraConfigs` for `dbutils.fs.mount()`. However, in Python you would use the keyword `extra_configs`.
The following table lists the available commands for this utility, which you can retrieve using `dbutils.fs.help()`.
| Command | Description |
|---|---|
| cp | Copies a file or directory, possibly across filesystems |
| head | Returns up to the first `maxBytes` bytes of the given file as a string encoded in UTF-8 |
| ls | Lists the contents of a directory |
| mkdirs | Creates the given directory if it does not exist, also creating any necessary parent directories |
| mount | Mounts the given source directory into DBFS at the given mount point |
| mounts | Displays information about what is mounted within DBFS |
| mv | Moves a file or directory, possibly across filesystems |
| put | Writes the given string out to a file, encoded in UTF-8 |
| refreshMounts | Forces all machines in this cluster to refresh their mount cache, ensuring they receive the most recent information |
| rm | Removes a file or directory |
| unmount | Deletes a DBFS mount point |
| updateMount | Similar to mount(), but updates an existing mount point instead of creating a new one |
Tip
In notebooks, you can use the `%fs` magic command to access DBFS. For example, `%fs ls /Volumes/main/default/my-volume/` is the same as `dbutils.fs.ls("/Volumes/main/default/my-volume/")`. See magic commands.
cp command (dbutils.fs.cp)
cp(from: String, to: String, recurse: boolean = false): boolean
Copies a file or directory, possibly across filesystems.
To display complete help for this command, run:
dbutils.fs.help("cp")
Example
This example copies the file named `data.csv` from `/Volumes/main/default/my-volume/` to `new-data.csv` in the same volume.
Python
dbutils.fs.cp("/Volumes/main/default/my-volume/data.csv", "/Volumes/main/default/my-volume/new-data.csv")
# Out[4]: True
R
dbutils.fs.cp("/Volumes/main/default/my-volume/data.csv", "/Volumes/main/default/my-volume/new-data.csv")
# [1] TRUE
Scala
dbutils.fs.cp("/Volumes/main/default/my-volume/data.csv", "/Volumes/main/default/my-volume/new-data.csv")
// res3: Boolean = true
head command (dbutils.fs.head)
head(file: String, maxBytes: int = 65536): String
Returns up to the specified maximum number of bytes in the given file. The bytes are returned as a UTF-8 encoded string.
To display complete help for this command, run:
dbutils.fs.help("head")
Example
This example displays the first 25 bytes of the file `data.csv` located in `/Volumes/main/default/my-volume/`.
Python
dbutils.fs.head("/Volumes/main/default/my-volume/data.csv", 25)
# [Truncated to first 25 bytes]
# Out[12]: 'Year,First Name,County,Se'
R
dbutils.fs.head("/Volumes/main/default/my-volume/data.csv", 25)
# [1] "Year,First Name,County,Se"
Scala
dbutils.fs.head("/Volumes/main/default/my-volume/data.csv", 25)
// [Truncated to first 25 bytes]
// res4: String =
// "Year,First Name,County,Se"
ls command (dbutils.fs.ls)
ls(dir: String): Seq
Lists the contents of a directory.
To display complete help for this command, run:
dbutils.fs.help("ls")
Example
This example displays information about the contents of `/Volumes/main/default/my-volume/`. The `modificationTime` field is available in Databricks Runtime 10.4 LTS and above. In R, `modificationTime` is returned as a string.
Python
dbutils.fs.ls("/Volumes/main/default/my-volume/")
# Out[13]: [FileInfo(path='dbfs:/Volumes/main/default/my-volume/data.csv', name='data.csv', size=2258987, modificationTime=1711357839000)]
R
dbutils.fs.ls("/Volumes/main/default/my-volume/")
# For prettier results from dbutils.fs.ls(<dir>), please use `%fs ls <dir>`
# [[1]]
# [[1]]$path
# [1] "/Volumes/main/default/my-volume/data.csv"
# [[1]]$name
# [1] "data.csv"
# [[1]]$size
# [1] 2258987
# [[1]]$isDir
# [1] FALSE
# [[1]]$isFile
# [1] TRUE
# [[1]]$modificationTime
# [1] "1711357839000"
Scala
dbutils.fs.ls("/Volumes/main/default/my-volume/")
// res6: Seq[com.databricks.backend.daemon.dbutils.FileInfo] = WrappedArray(FileInfo(/Volumes/main/default/my-volume/data.csv, 2258987, 1711357839000))
mkdirs command (dbutils.fs.mkdirs)
mkdirs(dir: String): boolean
Creates the given directory if it does not exist. Also creates any necessary parent directories.
To display complete help for this command, run:
dbutils.fs.help("mkdirs")
Example
This example creates the directory `my-data` within `/Volumes/main/default/my-volume/`.
Python
dbutils.fs.mkdirs("/Volumes/main/default/my-volume/my-data")
# Out[15]: True
R
dbutils.fs.mkdirs("/Volumes/main/default/my-volume/my-data")
# [1] TRUE
Scala
dbutils.fs.mkdirs("/Volumes/main/default/my-volume/my-data")
// res7: Boolean = true
mount command (dbutils.fs.mount)
mount(source: String, mountPoint: String, encryptionType: String = "",
owner: String = null, extraConfigs: Map = Map.empty[String, String]): boolean
Mounts the specified source directory into DBFS at the specified mount point.
To display complete help for this command, run:
dbutils.fs.help("mount")
Example
Python
dbutils.fs.mount(
source = "wasbs://<container-name>@<storage-account-name>.blob.core.windows.net",
mount_point = "/mnt/<mount-name>",
extra_configs = {"<conf-key>":dbutils.secrets.get(scope = "<scope-name>", key = "<key-name>")})
Scala
dbutils.fs.mount(
source = "wasbs://<container-name>@<storage-account-name>.blob.core.windows.net/<directory-name>",
mountPoint = "/mnt/<mount-name>",
extraConfigs = Map("<conf-key>" -> dbutils.secrets.get(scope = "<scope-name>", key = "<key-name>")))
For additional code examples, see Connect to Azure Data Lake Storage Gen2 and Blob Storage.
mounts command (dbutils.fs.mounts)
mounts: Seq
Displays information about what is currently mounted within DBFS.
To display complete help for this command, run:
dbutils.fs.help("mounts")
Example
Warning
Call `dbutils.fs.refreshMounts()` on all other running clusters to propagate the new mount. See refreshMounts command (dbutils.fs.refreshMounts).
Python
dbutils.fs.mounts()
Scala
dbutils.fs.mounts()
For additional code examples, see Connect to Azure Data Lake Storage Gen2 and Blob Storage.
mv command (dbutils.fs.mv)
mv(from: String, to: String, recurse: boolean = false): boolean
Moves a file or directory, possibly across filesystems. A move is a copy followed by a delete, even for moves within filesystems.
To display complete help for this command, run:
dbutils.fs.help("mv")
Example
This example moves the file `rows.csv` from `/Volumes/main/default/my-volume/` to `/Volumes/main/default/my-volume/my-data/`.
Python
dbutils.fs.mv("/Volumes/main/default/my-volume/rows.csv", "/Volumes/main/default/my-volume/my-data/")
# Out[2]: True
R
dbutils.fs.mv("/Volumes/main/default/my-volume/rows.csv", "/Volumes/main/default/my-volume/my-data/")
# [1] TRUE
Scala
dbutils.fs.mv("/Volumes/main/default/my-volume/rows.csv", "/Volumes/main/default/my-volume/my-data/")
// res1: Boolean = true
put command (dbutils.fs.put)
put(file: String, contents: String, overwrite: boolean = false): boolean
Writes the specified string to a file. The string is UTF-8 encoded.
To display complete help for this command, run:
dbutils.fs.help("put")
Example
This example writes the string `Hello, Databricks!` to a file named `hello.txt` in `/Volumes/main/default/my-volume/`. If the file exists, it is overwritten.
Python
dbutils.fs.put("/Volumes/main/default/my-volume/hello.txt", "Hello, Databricks!", True)
# Wrote 18 bytes.
# Out[6]: True
R
dbutils.fs.put("/Volumes/main/default/my-volume/hello.txt", "Hello, Databricks!", TRUE)
# [1] TRUE
Scala
dbutils.fs.put("/Volumes/main/default/my-volume/hello.txt", "Hello, Databricks!", true)
// Wrote 18 bytes.
// res2: Boolean = true
refreshMounts command (dbutils.fs.refreshMounts)
refreshMounts: boolean
Forces all machines in the cluster to refresh their mount cache, ensuring they receive the most recent information.
To display complete help for this command, run:
dbutils.fs.help("refreshMounts")
Example
Python
dbutils.fs.refreshMounts()
Scala
dbutils.fs.refreshMounts()
For additional code examples, see Connect to Azure Data Lake Storage Gen2 and Blob Storage.
rm command (dbutils.fs.rm)
rm(dir: String, recurse: boolean = false): boolean
Removes a file or directory and, optionally, all of its contents. If a file is specified, the `recurse` parameter is ignored. If a directory is specified, an error occurs when `recurse` is disabled and the directory is not empty.
To display complete help for this command, run:
dbutils.fs.help("rm")
Example
This example removes the entire directory `/Volumes/main/default/my-volume/my-data/`, including its contents.
Python
dbutils.fs.rm("/Volumes/main/default/my-volume/my-data/", True)
# Out[8]: True
R
dbutils.fs.rm("/Volumes/main/default/my-volume/my-data/", TRUE)
# [1] TRUE
Scala
dbutils.fs.rm("/Volumes/main/default/my-volume/my-data/", true)
// res6: Boolean = true
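The removal semantics described above can be mirrored on a local filesystem with Python's standard library. This is an illustrative analogy only, not the dbutils implementation:

```python
import os
import shutil
import tempfile

def rm(path: str, recurse: bool = False) -> bool:
    """Local analogy of dbutils.fs.rm semantics."""
    if os.path.isfile(path):
        os.remove(path)       # 'recurse' is ignored for files
        return True
    if recurse:
        shutil.rmtree(path)   # remove the directory and all of its contents
        return True
    os.rmdir(path)            # raises an error if the directory is not empty
    return True

root = tempfile.mkdtemp()
open(os.path.join(root, "data.csv"), "w").close()
rm(os.path.join(root, "data.csv"), recurse=True)  # recurse ignored for a file
rm(root)                                          # directory now empty, succeeds
```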
unmount command (dbutils.fs.unmount)
unmount(mountPoint: String): boolean
Deletes a DBFS mount point.
Warning
To avoid errors, never modify a mount point while other jobs are reading or writing to it. After modifying a mount, always run `dbutils.fs.refreshMounts()` on all other running clusters to propagate any mount updates. See refreshMounts command (dbutils.fs.refreshMounts).
To display complete help for this command, run:
dbutils.fs.help("unmount")
Example
dbutils.fs.unmount("/mnt/<mount-name>")
For additional code examples, see Connect to Azure Data Lake Storage Gen2 and Blob Storage.
updateMount command (dbutils.fs.updateMount)
updateMount(source: String, mountPoint: String, encryptionType: String = "",
owner: String = null, extraConfigs: Map = Map.empty[String, String]): boolean
Similar to the `dbutils.fs.mount` command, but updates an existing mount point instead of creating a new one. Returns an error if the mount point is not present.
Warning
To avoid errors, never modify a mount point while other jobs are reading or writing to it. After modifying a mount, always run `dbutils.fs.refreshMounts()` on all other running clusters to propagate any mount updates. See refreshMounts command (dbutils.fs.refreshMounts).
This command is available in Databricks Runtime 10.4 LTS and above.
To display complete help for this command, run:
dbutils.fs.help("updateMount")
Example
Python
dbutils.fs.updateMount(
source = "wasbs://<container-name>@<storage-account-name>.blob.core.windows.net",
mount_point = "/mnt/<mount-name>",
extra_configs = {"<conf-key>":dbutils.secrets.get(scope = "<scope-name>", key = "<key-name>")})
Scala
dbutils.fs.updateMount(
source = "wasbs://<container-name>@<storage-account-name>.blob.core.windows.net/<directory-name>",
mountPoint = "/mnt/<mount-name>",
extraConfigs = Map("<conf-key>" -> dbutils.secrets.get(scope = "<scope-name>", key = "<key-name>")))
Jobs utility (dbutils.jobs)
Provides utilities for leveraging job features.
Note
This utility is available only for Python.
The following table lists the available modules for this utility, which you can retrieve using `dbutils.jobs.help()`.
| Submodule | Description |
|---|---|
| taskValues | Provides utilities for leveraging job task values |
taskValues subutility (dbutils.jobs.taskValues)
Note
This subutility is available only for Python.
Provides commands for leveraging job task values.
Use this subutility to set and get arbitrary values during a job run. These values are called task values. Any task can get values set by upstream tasks and set values for downstream tasks to use.
Each task value has a unique key within the same task. This unique key is known as the task value’s key. A task value is accessed with the task name and the task value’s key. You can use this to pass information downstream from task to task within the same job run. For example, you can pass identifiers or metrics, such as information about the evaluation of a machine learning model, between different tasks within a job run.
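These semantics can be illustrated with a small local simulation using plain Python dictionaries (this is not the dbutils API, just a hypothetical stand-in): each value is addressed by the pair (task name, key), so two tasks can reuse the same key without collision.

```python
# Hypothetical in-memory stand-in for job task values.
_task_values: dict[tuple[str, str], object] = {}

def set_value(task: str, key: str, value: object) -> None:
    _task_values[(task, key)] = value  # the key is scoped to the task

def get_value(task_key: str, key: str, default=None):
    # In the real API, 'default' cannot be None; this sketch treats
    # None simply as "no default supplied".
    if default is None:
        return _task_values[(task_key, key)]  # KeyError if missing
    return _task_values.get((task_key, key), default)

# Two tasks set the same key "accuracy" without colliding.
set_value("train-model", "accuracy", 0.91)
set_value("validate-model", "accuracy", 0.88)

print(get_value("train-model", "accuracy"))              # 0.91
print(get_value("missing-task", "accuracy", default=0))  # 0
```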
The following table lists the available commands for this subutility, which you can retrieve using `dbutils.jobs.taskValues.help()`.
| Command | Description |
|---|---|
| get | Gets the contents of the specified task value for the specified task in the current job run |
| set | Sets or updates a task value. You can set up to 250 task values for a job run |
get command (dbutils.jobs.taskValues.get)
Note
This command is available only for Python.
On Databricks Runtime 10.4 and earlier, if `get` cannot find the task, a `Py4JJavaError` is raised instead of a `ValueError`.
get(taskKey: String, key: String, default: int, debugValue: int): Seq
Gets the contents of the specified task value for the specified task in the current job run.
To display complete help for this command, run:
dbutils.jobs.taskValues.help("get")
Example
For example:
dbutils.jobs.taskValues.get(taskKey = "my-task", \
key = "my-key", \
default = 7, \
debugValue = 42)
In the preceding example:
- `taskKey` is the name of the task that sets the task value. If the command cannot find this task, a `ValueError` is raised.
- `key` is the name of the task value's key that you set with the set command (dbutils.jobs.taskValues.set). If the command cannot find this task value's key, a `ValueError` is raised (unless `default` is specified).
- `default` is an optional value that is returned if `key` cannot be found. `default` cannot be `None`.
- `debugValue` is an optional value that is returned if you try to get the task value from within a notebook that is running outside of a job. This can be useful during debugging when you want to run your notebook manually and return some value instead of raising a `TypeError` by default. `debugValue` cannot be `None`.
If you try to get a task value from within a notebook that is running outside of a job, this command raises a `TypeError` by default. However, if the `debugValue` argument is specified in the command, the value of `debugValue` is returned instead of raising a `TypeError`.
set command (dbutils.jobs.taskValues.set)
Note
This command is available only for Python.
set(key: String, value: String): boolean
Sets or updates a task value. You can set up to 250 task values for a job run.
To display complete help for this command, run:
dbutils.jobs.taskValues.help("set")
Example
Some examples include:
dbutils.jobs.taskValues.set(key = "my-key", \
value = 5)
dbutils.jobs.taskValues.set(key = "my-other-key", \
value = "my other value")
In the preceding examples:
- `key` is the task value's key. This key must be unique to the task. That is, if two different tasks each set a task value with key `K`, these are two different task values that have the same key `K`.
- `value` is the value for this task value's key. This command must be able to represent the value internally in JSON format. The size of the JSON representation of the value cannot exceed 48 KiB.
If you try to set a task value from within a notebook that is running outside of a job, this command does nothing.
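Because the JSON representation of a task value cannot exceed 48 KiB, it can help to check the serialized size before calling set. `json_size` below is a hypothetical helper, not part of dbutils:

```python
import json

TASK_VALUE_LIMIT = 48 * 1024  # 48 KiB cap on the JSON representation

def json_size(value) -> int:
    """Size in bytes of a value's JSON representation."""
    return len(json.dumps(value).encode("utf-8"))

value = {"model": "m-1", "metrics": [0.91, 0.88]}
assert json_size(value) <= TASK_VALUE_LIMIT
# In a notebook you would then call:
# dbutils.jobs.taskValues.set(key="my-key", value=value)
```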
Library utility (dbutils.library)
Most methods in the `dbutils.library` submodule are deprecated. See Library utility (dbutils.library) (legacy).
You might need to programmatically restart the Python process on Azure Databricks to ensure that locally installed or upgraded libraries function correctly in the Python kernel for your current SparkSession. To do this, run the `dbutils.library.restartPython()` command. See Restart the Python process on Azure Databricks.
Notebook utility (dbutils.notebook)
The notebook utility allows you to chain together notebooks and act on their results. See Run a Databricks notebook from another notebook.
The following table lists the available commands for this utility, which you can retrieve using `dbutils.notebook.help()`.
| Command | Description |
|---|---|
| exit | Exits a notebook with a value |
| run | Runs a notebook and returns its exit value |
exit command (dbutils.notebook.exit)
exit(value: String): void
Exits a notebook with a value.
To display complete help for this command, run:
dbutils.notebook.help("exit")
Example
This example exits the notebook with the value `Exiting from My Other Notebook`.
Python
dbutils.notebook.exit("Exiting from My Other Notebook")
# Notebook exited: Exiting from My Other Notebook
R
dbutils.notebook.exit("Exiting from My Other Notebook")
# Notebook exited: Exiting from My Other Notebook
Scala
dbutils.notebook.exit("Exiting from My Other Notebook")
// Notebook exited: Exiting from My Other Notebook
Note
If the run has a query with structured streaming running in the background, calling `dbutils.notebook.exit()` does not terminate the run. The run continues to execute for as long as the query is executing in the background. You can stop the query running in the background by clicking Cancel in the cell of the query or by running `query.stop()`. When the query stops, you can terminate the run with `dbutils.notebook.exit()`.
run command (dbutils.notebook.run)
run(path: String, timeoutSeconds: int, arguments: Map): String
Runs a notebook and returns its exit value. The notebook will run in the current cluster by default.
Note
The maximum length of the string value returned from the `run` command is 5 MB. See Get the output for a single run (`GET /jobs/runs/get-output`).
To display complete help for this command, run:
dbutils.notebook.help("run")
Example
This example runs a notebook named `My Other Notebook` in the same location as the calling notebook. The called notebook ends with the line of code `dbutils.notebook.exit("Exiting from My Other Notebook")`. If the called notebook does not finish running within 60 seconds, an exception is thrown.
Python
dbutils.notebook.run("My Other Notebook", 60)
# Out[14]: 'Exiting from My Other Notebook'
Scala
dbutils.notebook.run("My Other Notebook", 60)
// res2: String = Exiting from My Other Notebook
Secrets utility (dbutils.secrets)
The secrets utility allows you to store and access sensitive credential information without making it visible in notebooks. See Secret management and Step 3: Use the secrets in a notebook.
The following table lists the available commands for this utility, which you can retrieve using `dbutils.secrets.help()`.
| Command | Description |
|---|---|
| get | Gets the string representation of a secret value with scope and key |
| getBytes | Gets the bytes representation of a secret value with scope and key |
| list | Lists secret metadata for secrets within a scope |
| listScopes | Lists secret scopes |
get command (dbutils.secrets.get)
get(scope: String, key: String): String
Gets the string representation of a secret value for the specified secrets scope and key.
Warning
Administrators, secret creators, and users granted permission can read Azure Databricks secrets. While Azure Databricks makes an effort to redact secret values that might be displayed in notebooks, it is not possible to prevent such users from reading secrets. For more information, see Secret redaction.
To display complete help for this command, run:
dbutils.secrets.help("get")
Example
This example gets the string representation of the secret value for the scope named `my-scope` and the key named `my-key`.
Python
dbutils.secrets.get(scope="my-scope", key="my-key")
# Out[14]: '[REDACTED]'
R
dbutils.secrets.get(scope="my-scope", key="my-key")
# [1] "[REDACTED]"
Scala
dbutils.secrets.get(scope="my-scope", key="my-key")
// res0: String = [REDACTED]
getBytes command (dbutils.secrets.getBytes)
getBytes(scope: String, key: String): byte[]
Gets the bytes representation of a secret value for the specified scope and key.
To display complete help for this command, run:
dbutils.secrets.help("getBytes")
Example
This example gets the byte representation of the secret value (in this example, `a1!b2@c3#`) for the scope named `my-scope` and the key named `my-key`.
Python
dbutils.secrets.getBytes(scope="my-scope", key="my-key")
# Out[1]: b'a1!b2@c3#'
R
dbutils.secrets.getBytes(scope="my-scope", key="my-key")
# [1] 61 31 21 62 32 40 63 33 23
Scala
dbutils.secrets.getBytes(scope="my-scope", key="my-key")
// res1: Array[Byte] = Array(97, 49, 33, 98, 50, 64, 99, 51, 35)
list command (dbutils.secrets.list)
list(scope: String): Seq
Lists the metadata for secrets within the specified scope.
To display complete help for this command, run:
dbutils.secrets.help("list")
Example
This example lists the metadata for secrets within the scope named `my-scope`.
Python
dbutils.secrets.list("my-scope")
# Out[10]: [SecretMetadata(key='my-key')]
R
dbutils.secrets.list("my-scope")
# [[1]]
# [[1]]$key
# [1] "my-key"
Scala
dbutils.secrets.list("my-scope")
// res2: Seq[com.databricks.dbutils_v1.SecretMetadata] = ArrayBuffer(SecretMetadata(my-key))
listScopes command (dbutils.secrets.listScopes)
listScopes: Seq
Lists the available scopes.
To display complete help for this command, run:
dbutils.secrets.help("listScopes")
Example
This example lists the available scopes.
Python
dbutils.secrets.listScopes()
# Out[14]: [SecretScope(name='my-scope')]
R
dbutils.secrets.listScopes()
# [[1]]
# [[1]]$name
# [1] "my-scope"
Scala
dbutils.secrets.listScopes()
// res3: Seq[com.databricks.dbutils_v1.SecretScope] = ArrayBuffer(SecretScope(my-scope))
Widgets utility (dbutils.widgets)
The widgets utility allows you to parameterize notebooks. See Databricks widgets.
The following table lists the available commands for this utility, which you can retrieve using `dbutils.widgets.help()`.
| Command | Description |
|---|---|
| combobox | Creates a combobox input widget with a given name, default value, and choices |
| dropdown | Creates a dropdown input widget with a given name, default value, and choices |
| get | Retrieves the current value of an input widget |
| getAll | Retrieves a map of all widget names and their values |
| getArgument | Deprecated. Equivalent to get |
| multiselect | Creates a multiselect input widget with a given name, default value, and choices |
| remove | Removes an input widget from the notebook |
| removeAll | Removes all widgets in the notebook |
| text | Creates a text input widget with a given name and default value |
combobox command (dbutils.widgets.combobox)
combobox(name: String, defaultValue: String, choices: Seq, label: String): void
Creates and displays a combobox widget with the specified programmatic name, default value, choices, and optional label.
To display complete help for this command, run:
dbutils.widgets.help("combobox")
Example
This example creates and displays a combobox widget with the programmatic name `fruits_combobox`. It offers the choices `apple`, `banana`, `coconut`, and `dragon fruit`, and is set to the initial value of `banana`. This combobox widget has an accompanying label `Fruits`. This example ends by printing the initial value of the combobox widget, `banana`.
Python
dbutils.widgets.combobox(
name='fruits_combobox',
defaultValue='banana',
choices=['apple', 'banana', 'coconut', 'dragon fruit'],
label='Fruits'
)
print(dbutils.widgets.get("fruits_combobox"))
# banana
R
dbutils.widgets.combobox(
name='fruits_combobox',
defaultValue='banana',
choices=list('apple', 'banana', 'coconut', 'dragon fruit'),
label='Fruits'
)
print(dbutils.widgets.get("fruits_combobox"))
# [1] "banana"
Scala
dbutils.widgets.combobox(
"fruits_combobox",
"banana",
Array("apple", "banana", "coconut", "dragon fruit"),
"Fruits"
)
print(dbutils.widgets.get("fruits_combobox"))
// banana
SQL
CREATE WIDGET COMBOBOX fruits_combobox DEFAULT "banana" CHOICES SELECT * FROM (VALUES ("apple"), ("banana"), ("coconut"), ("dragon fruit"))
SELECT :fruits_combobox
-- banana
dropdown command (dbutils.widgets.dropdown)
dropdown(name: String, defaultValue: String, choices: Seq, label: String): void
Creates and displays a dropdown widget with the specified programmatic name, default value, choices, and optional label.
To display complete help for this command, run:
dbutils.widgets.help("dropdown")
Example
This example creates and displays a dropdown widget with the programmatic name `toys_dropdown`. It offers the choices `alphabet blocks`, `basketball`, `cape`, and `doll`, and is set to the initial value of `basketball`. This dropdown widget has an accompanying label `Toys`. This example ends by printing the initial value of the dropdown widget, `basketball`.
Python
dbutils.widgets.dropdown(
name='toys_dropdown',
defaultValue='basketball',
choices=['alphabet blocks', 'basketball', 'cape', 'doll'],
label='Toys'
)
print(dbutils.widgets.get("toys_dropdown"))
# basketball
R
dbutils.widgets.dropdown(
name='toys_dropdown',
defaultValue='basketball',
choices=list('alphabet blocks', 'basketball', 'cape', 'doll'),
label='Toys'
)
print(dbutils.widgets.get("toys_dropdown"))
# [1] "basketball"
Scala
dbutils.widgets.dropdown(
"toys_dropdown",
"basketball",
Array("alphabet blocks", "basketball", "cape", "doll"),
"Toys"
)
print(dbutils.widgets.get("toys_dropdown"))
// basketball
SQL
CREATE WIDGET DROPDOWN toys_dropdown DEFAULT "basketball" CHOICES SELECT * FROM (VALUES ("alphabet blocks"), ("basketball"), ("cape"), ("doll"))
SELECT :toys_dropdown
-- basketball
get command (dbutils.widgets.get)
get(name: String): String
Gets the current value of the widget with the specified programmatic name. This programmatic name can be either:
- The name of a custom widget in the notebook, for example, `fruits_combobox` or `toys_dropdown`.
- The name of a custom parameter passed to the notebook as part of a notebook task, for example, `name` or `age`. For more information, see the coverage of parameters for notebook tasks in the jobs UI or the `notebook_params` field in the Trigger a new job run (`POST /jobs/run-now`) operation in the Jobs API.
To display complete help for this command, run:
dbutils.widgets.help("get")
Example
This example gets the value of the widget that has the programmatic name `fruits_combobox`.
Python
dbutils.widgets.get('fruits_combobox')
# banana
R
dbutils.widgets.get('fruits_combobox')
# [1] "banana"
Scala
dbutils.widgets.get("fruits_combobox")
// res6: String = banana
SQL
SELECT :fruits_combobox
-- banana
This example gets the value of the notebook task parameter that has the programmatic name `age`. This parameter was set to `35` when the related notebook task was run.
Python
dbutils.widgets.get('age')
# 35
R
dbutils.widgets.get('age')
# [1] "35"
Scala
dbutils.widgets.get("age")
// res6: String = 35
SQL
SELECT :age
-- 35
getAll command (dbutils.widgets.getAll)
getAll: map
Gets a mapping of all current widget names and values. This can be especially useful to quickly pass widget values to a `spark.sql()` query.
This command is available in Databricks Runtime 13.3 LTS and above. It is only available for Python and Scala.
To display complete help for this command, run:
dbutils.widgets.help("getAll")
Example
This example gets the map of widget values and passes it as parameter arguments in a Spark SQL query.
Python
df = spark.sql("SELECT * FROM table where col1 = :param", dbutils.widgets.getAll())
df.show()
# Query output
Scala
val df = spark.sql("SELECT * FROM table where col1 = :param", dbutils.widgets.getAll())
df.show()
// res6: Query output
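The named-parameter binding shown above is not specific to Spark, so the pattern can be tried locally. The following sketch uses Python's standard-library sqlite3 module, which also binds a dict of values to :name placeholders in a query; here a plain dict stands in for the map returned by dbutils.widgets.getAll(), and the table and values are hypothetical:

```python
import sqlite3

# Throwaway in-memory table (hypothetical stand-in for a Spark table).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (col1 TEXT)")
conn.executemany("INSERT INTO t VALUES (?)", [("banana",), ("apple",)])

# A plain dict standing in for dbutils.widgets.getAll(); its keys match the
# :name placeholders in the query, just as widget names do in spark.sql().
params = {"param": "banana"}
rows = conn.execute("SELECT * FROM t WHERE col1 = :param", params).fetchall()
print(rows)  # [('banana',)]
```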
getArgument command (dbutils.widgets.getArgument)
getArgument(name: String, optional: String): String
Gets the current value of the widget with the specified programmatic name. If the widget does not exist, an optional message can be returned.
Note
This command is deprecated. Use dbutils.widgets.get instead.
To display complete help for this command, run:
dbutils.widgets.help("getArgument")
Example
This example gets the value of the widget that has the programmatic name fruits_combobox. If this widget does not exist, the message Error: Cannot find fruits combobox is returned.
Python
dbutils.widgets.getArgument('fruits_combobox', 'Error: Cannot find fruits combobox')
# Deprecation warning: Use dbutils.widgets.text() or dbutils.widgets.dropdown() to create a widget and dbutils.widgets.get() to get its bound value.
# Out[3]: 'banana'
R
dbutils.widgets.getArgument('fruits_combobox', 'Error: Cannot find fruits combobox')
# Deprecation warning: Use dbutils.widgets.text() or dbutils.widgets.dropdown() to create a widget and dbutils.widgets.get() to get its bound value.
# [1] "banana"
Scala
dbutils.widgets.getArgument("fruits_combobox", "Error: Cannot find fruits combobox")
// command-1234567890123456:1: warning: method getArgument in trait WidgetsUtils is deprecated: Use dbutils.widgets.text() or dbutils.widgets.dropdown() to create a widget and dbutils.widgets.get() to get its bound value.
// dbutils.widgets.getArgument("fruits_combobox", "Error: Cannot find fruits combobox")
// ^
// res7: String = banana
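When migrating off the deprecated getArgument, its fallback-message behavior can be reproduced with dbutils.widgets.get plus ordinary exception handling. The sketch below uses a hypothetical stand-in widgets object so the pattern can run anywhere; in a Databricks notebook you would pass dbutils.widgets instead:

```python
class FakeWidgets:
    """Hypothetical stand-in for dbutils.widgets, for illustration only."""
    def __init__(self, values):
        self._values = values

    def get(self, name):
        return self._values[name]  # raises KeyError if the widget is missing

def get_or_message(widgets, name, message):
    # Mirror the deprecated getArgument(name, optional) semantics: return the
    # widget's value if it exists, otherwise return the fallback message.
    try:
        return widgets.get(name)
    except Exception:
        return message

w = FakeWidgets({"fruits_combobox": "banana"})
print(get_or_message(w, "fruits_combobox", "Error: Cannot find fruits combobox"))
# banana
print(get_or_message(w, "missing_widget", "Error: Cannot find fruits combobox"))
# Error: Cannot find fruits combobox
```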
multiselect command (dbutils.widgets.multiselect)
multiselect(name: String, defaultValue: String, choices: Seq, label: String): void
Creates and displays a multiselect widget with the specified programmatic name, default value, choices, and optional label.
To display complete help for this command, run:
dbutils.widgets.help("multiselect")
Example
This example creates and displays a multiselect widget with the programmatic name days_multiselect. It offers the choices Monday through Sunday and is set to the initial value of Tuesday. This multiselect widget has an accompanying label Days of the Week. This example ends by printing the initial value of the multiselect widget, Tuesday.
Python
dbutils.widgets.multiselect(
name='days_multiselect',
defaultValue='Tuesday',
choices=['Monday', 'Tuesday', 'Wednesday', 'Thursday',
'Friday', 'Saturday', 'Sunday'],
label='Days of the Week'
)
print(dbutils.widgets.get("days_multiselect"))
# Tuesday
R
dbutils.widgets.multiselect(
name='days_multiselect',
defaultValue='Tuesday',
choices=list('Monday', 'Tuesday', 'Wednesday', 'Thursday',
'Friday', 'Saturday', 'Sunday'),
label='Days of the Week'
)
print(dbutils.widgets.get("days_multiselect"))
# [1] "Tuesday"
Scala
dbutils.widgets.multiselect(
"days_multiselect",
"Tuesday",
Array("Monday", "Tuesday", "Wednesday", "Thursday",
"Friday", "Saturday", "Sunday"),
"Days of the Week"
)
print(dbutils.widgets.get("days_multiselect"))
// Tuesday
SQL
CREATE WIDGET MULTISELECT days_multiselect DEFAULT "Tuesday" CHOICES SELECT * FROM (VALUES ("Monday"), ("Tuesday"), ("Wednesday"), ("Thursday"), ("Friday"), ("Saturday"), ("Sunday"))
SELECT :days_multiselect
-- Tuesday
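When more than one value is selected, dbutils.widgets.get returns the selections as a single comma-separated string. The following minimal sketch splits such a string back into a list, with a hard-coded value standing in for the widget call:

```python
# Stand-in for dbutils.widgets.get("days_multiselect") after the user has
# selected two values; a multiselect widget returns a comma-separated string.
selected = "Tuesday,Friday"
days = [d.strip() for d in selected.split(",")]
print(days)  # ['Tuesday', 'Friday']
```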
remove command (dbutils.widgets.remove)
remove(name: String): void
Removes the widget with the specified programmatic name.
To display complete help for this command, run:
dbutils.widgets.help("remove")
Important
If you add a command to remove a widget, you cannot add a subsequent command to create a widget in the same cell. You must create the widget in another cell.
Example
This example removes the widget with the programmatic name fruits_combobox.
Python
dbutils.widgets.remove('fruits_combobox')
R
dbutils.widgets.remove('fruits_combobox')
Scala
dbutils.widgets.remove("fruits_combobox")
SQL
REMOVE WIDGET fruits_combobox
removeAll command (dbutils.widgets.removeAll)
removeAll: void
Removes all widgets from the notebook.
To display complete help for this command, run:
dbutils.widgets.help("removeAll")
Important
If you add a command to remove all widgets, you cannot add a subsequent command to create any widgets in the same cell. You must create the widgets in another cell.
Example
This example removes all widgets from the notebook.
Python
dbutils.widgets.removeAll()
R
dbutils.widgets.removeAll()
Scala
dbutils.widgets.removeAll()
text command (dbutils.widgets.text)
text(name: String, defaultValue: String, label: String): void
Creates and displays a text widget with the specified programmatic name, default value, and optional label.
To display complete help for this command, run:
dbutils.widgets.help("text")
Example
This example creates and displays a text widget with the programmatic name your_name_text. It is set to the initial value of Enter your name. This text widget has an accompanying label Your name. This example ends by printing the initial value of the text widget, Enter your name.
Python
dbutils.widgets.text(
name='your_name_text',
defaultValue='Enter your name',
label='Your name'
)
print(dbutils.widgets.get("your_name_text"))
# Enter your name
R
dbutils.widgets.text(
name='your_name_text',
defaultValue='Enter your name',
label='Your name'
)
print(dbutils.widgets.get("your_name_text"))
# [1] "Enter your name"
Scala
dbutils.widgets.text(
"your_name_text",
"Enter your name",
"Your name"
)
print(dbutils.widgets.get("your_name_text"))
// Enter your name
SQL
CREATE WIDGET TEXT your_name_text DEFAULT "Enter your name"
SELECT :your_name_text
-- Enter your name
Databricks Utilities API library
Important
The Databricks Utilities API (dbutils-api) library is deprecated. Databricks recommends that you use one of the following libraries instead:
To accelerate application development, it can be helpful to compile, build, and test applications before you deploy them as production jobs. To enable you to compile against Databricks Utilities, Databricks provides the dbutils-api library. You can download the dbutils-api library from the DBUtils API webpage on the Maven Repository website or include the library by adding a dependency to your build file:
SBT
libraryDependencies += "com.databricks" % "dbutils-api_TARGET" % "VERSION"
Maven
<dependency>
    <groupId>com.databricks</groupId>
    <artifactId>dbutils-api_TARGET</artifactId>
    <version>VERSION</version>
</dependency>
Gradle
compile 'com.databricks:dbutils-api_TARGET:VERSION'
Replace TARGET with the desired target (for example, 2.12) and VERSION with the desired version (for example, 0.0.5). For a list of available targets and versions, see the DBUtils API webpage on the Maven Repository website.
Once you build your application against this library, you can deploy the application.
Important
The dbutils-api library only allows you to locally compile an application that uses dbutils, not to run it. To run the application, you must deploy it in Azure Databricks.
Limitations
Calling dbutils inside of executors can produce unexpected results or errors. If you need to run file system operations on executors using dbutils, refer to the parallel listing and delete methods using Spark in How to list and delete files faster in Databricks.
For information about executors, see Cluster Mode Overview on the Apache Spark website.