Jaa


DBFS CLI (legacy)

Important

This documentation has been retired and might not be updated.

This information applies to legacy Databricks CLI versions 0.18 and below. Databricks recommends that you use newer Databricks CLI version 0.205 or above instead. See What is the Databricks CLI?. To find your version of the Databricks CLI, run databricks -v.

To migrate from Databricks CLI version 0.18 or below to Databricks CLI version 0.205 or above, see Databricks CLI migration.

You run Databricks DBFS CLI subcommands appending them to databricks fs (or the alias dbfs), prefixing all DBFS paths with dbfs:/. These subcommands call the DBFS API.

databricks fs -h
Usage: databricks fs [OPTIONS] COMMAND [ARGS]...

  Utility to interact with DBFS. DBFS paths are all prefixed
  with dbfs:/. Local paths can be absolute or local.

Options:
  -v, --version
  -h, --help     Show this message and exit.

Commands:
  cat        Shows the contents of a file. Does not work for directories.
  configure
  cp         Copies files to and from DBFS.
    Options:
      -r, --recursive
      --overwrite     Overwrites files that exist already.
  ls         Lists files in DBFS.
    Options:
      --absolute      Displays absolute paths.
      -l              Displays full information including size and file type.
  mkdirs     Makes directories in DBFS.
  mv         Moves a file between two DBFS paths.
  rm         Removes files from DBFS.
    Options:
      -r, --recursive

For operations that list, move, or delete more than 10k files, we strongly discourage using the DBFS CLI.

  • The list operation (databricks fs ls) will time out after approximately 60s.
  • The move operation (databricks fs mv) will time out after approximately 60s, potentially resulting in partially moved data.
  • The delete operation (databricks fs rm) will incrementally delete batches of files.

We recommend that you perform such operations in the context of a cluster, using File system utility (dbutils.fs). dbutils.fs covers the functional scope of the DBFS REST API, but from notebooks. Running such operations using notebooks provides better control, such as selective deletes, manageability, and the possibility to automate periodic jobs.

Limitations

Using the Databricks DBFS CLI with firewall enabled storage containers is not supported. Databricks recommends you use Databricks Connect or az storage.

List the contents of a file

To display usage documentation, run databricks fs cat --help.

databricks fs cat dbfs:/tmp/my-file.txt
Apache Spark is awesome!

Copy a file

To display usage documentation, run databricks fs cp --help.

databricks fs cp dbfs:/tmp/your_file.txt dbfs:/parent/child/grandchild/my_file.txt --overwrite

On success, this command displays nothing.

List information about files and directories

To display usage documentation, run databricks fs ls --help.

databricks fs ls dbfs:/tmp --absolute -l
file  42408084  dbfs:/tmp/LoanStats.csv    1590005159000
file        40  dbfs:/tmp/file_b.txt       1603991038000
dir          0  dbfs:/tmp/hive                         0
dir          0  dbfs:/tmp/mlflow                       0
file       385  dbfs:/tmp/multi-line.json  1597770632000
dir          0  dbfs:/tmp/new                          0
dir          0  dbfs:/tmp/parent                       0
file       243  dbfs:/tmp/test.json        1597770628000
file        40  dbfs:/tmp/test_dbfs.txt    1603989162000

Create a directory

To display usage documentation, run databricks fs mkdirs --help.

databricks fs mkdirs dbfs:/tmp/new-dir

On success, this command displays nothing.

Move a file

To display usage documentation, run databricks fs mv --help.

databricks fs mv dbfs:/tmp/my-file.txt dbfs:/parent/child/grandchild/my-file.txt

On success, this command displays nothing.

Delete a file

To display usage documentation, run databricks fs rm --help.

databricks fs rm dbfs:/tmp/parent/child/grandchild/my-file.txt
Delete finished successfully.