CACHE TABLE

Applies to: check marked yes Databricks Runtime

Caches contents of a table or output of a query with the given storage level in Apache Spark cache. If a query is cached, then a temp view is created for this query. This reduces scanning of the original files in future queries.

Syntax

CACHE [ LAZY ] TABLE table_name
  [ OPTIONS ( 'storageLevel' [ = ] value ) ] [ [ AS ] query ]

See Disk cache vs. Spark cache for the differences between disk caching and the Apache Spark cache.

Parameters

  • LAZY

    Only cache the table when it is first used, instead of immediately.

  • table_name

    Identifies the Delta table or view to cache. The name must not include a temporal specification or options specification. If the table cannot be found Azure Databricks raises a TABLE_OR_VIEW_NOT_FOUND error.

  • OPTIONS ( ‘storageLevel’ [ = ] value )

    OPTIONS clause with storageLevel key and value pair. A warning is issued when a key other than storageLevel is used. The valid options for storageLevel are:

    • NONE
      • DISK_ONLY
      • DISK_ONLY_2
      • MEMORY_ONLY
      • MEMORY_ONLY_2
      • MEMORY_ONLY_SER
      • MEMORY_ONLY_SER_2
      • MEMORY_AND_DISK
      • MEMORY_AND_DISK_2
      • MEMORY_AND_DISK_SER
      • MEMORY_AND_DISK_SER_2
      • OFF_HEAP

    An Exception is thrown when an invalid value is set for storageLevel. If storageLevel is not explicitly set using OPTIONS clause, the default storageLevel is set to MEMORY_AND_DISK.

  • query

    A query that produces the rows to be cached. It can be in one of following formats:

    • A SELECT statement
    • A TABLE statement
    • A FROM statement

Examples

> CACHE TABLE testCache OPTIONS ('storageLevel' 'DISK_ONLY') SELECT * FROM testData;