Retrieving data from Azure storage account archive tier through Azure Databricks

BrianC 60 Reputation points
2025-02-27T08:58:17.7566667+00:00

Hi Community,

We are currently studying how to implement data retention in Azure. We use Azure Databricks, with a Storage Account as our data lake.

We are considering using Databricks to extract part of the data from a Delta table and write it to a location in the storage account whose files are kept in the archive tier.

Once the data is in the archive tier, we may need to retrieve it someday. Could I simply read the data from the archive-tier location through Databricks and write it back to the Delta table that sits in the hot tier, or is extra effort needed?

Thanks in advance.

Azure Databricks
An Apache Spark-based analytics platform optimized for Azure.

  1. Smaran Thoomu 20,630 Reputation points Microsoft Vendor
    2025-02-27T15:05:39.43+00:00

    Hi @BrianC
    Welcome to the Microsoft Q&A platform, and thanks for posting your query here.

    Yes, you can retrieve data from the Azure Storage Account archive tier using Azure Databricks, but there are a few steps to keep in mind:

    1. Rehydrate the Data: Before you can read data that is in the archive tier, you need to rehydrate it to an online tier such as hot or cool. Archived blobs are offline, so their contents cannot be read directly. You can rehydrate either by changing each blob's access tier or by copying the archived blob to a new blob in an online tier, using the Azure portal, PowerShell, Azure CLI, or the Azure Storage client libraries.
    2. Read the Data: Once the data has been rehydrated, read it from Databricks the same way you read any other data in the storage account, for example with spark.read.
    3. Write Back to Delta Table: After reading the data, you can write it back to your Delta table in the hot tier as usual.
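
    A minimal Python sketch of step 1, assuming the archived files sit under a single prefix in one container and that the azure-storage-blob package is used. The helper names blobs_to_rehydrate and start_rehydration are hypothetical; container_client is assumed to be an azure.storage.blob.ContainerClient, whose list_blobs, get_blob_client and set_standard_blob_tier methods the sketch calls.

```python
from typing import Iterable, List


def blobs_to_rehydrate(blobs: Iterable) -> List[str]:
    """Names of blobs that are in the Archive tier and have no rehydration pending."""
    names = []
    for blob in blobs:
        # blob_tier reports "Archive" for archived blobs; archive_status is set
        # (for example "rehydrate-pending-to-hot") while a rehydration is in progress.
        if blob.blob_tier == "Archive" and not blob.archive_status:
            names.append(blob.name)
    return names


def start_rehydration(container_client, prefix: str,
                      target_tier: str = "Hot", priority: str = "Standard") -> List[str]:
    """Request rehydration of every archived blob under `prefix` (hypothetical helper).

    `container_client` is assumed to be an azure.storage.blob.ContainerClient.
    Returns the blob names for which a tier change was requested.
    """
    archived = blobs_to_rehydrate(container_client.list_blobs(name_starts_with=prefix))
    for name in archived:
        # Changing the tier from Archive to an online tier starts the rehydration;
        # priority is "Standard" or "High".
        container_client.get_blob_client(name).set_standard_blob_tier(
            target_tier, rehydrate_priority=priority)
    return archived
```

    For example, with a client created through ContainerClient.from_connection_string(conn_str, container_name), calling start_rehydration(client, "archive/sales/") submits one tier change per archived blob under that prefix; the container name and prefix are placeholders. Blobs that already have a rehydration pending are skipped, so the call can be repeated safely.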

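    Keep in mind that rehydration is not instant. With standard priority it can take up to 15 hours; high priority may complete in under an hour for blobs smaller than 10 GB, at a higher cost. There are also charges for reading data out of the archive tier, and archived blobs are subject to a 180-day minimum storage period, so rehydrating, moving, or deleting a blob earlier incurs a prorated early deletion fee.
    For more information, refer to this article: https://learn.microsoft.com/en-us/azure/databricks/connect/storage/azure-storage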
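
    Because rehydration runs asynchronously, a job that reads the data afterwards has to wait for it to finish. A hedged sketch of such a wait loop, assuming container_client is an azure.storage.blob.ContainerClient; rehydration_finished and wait_for_rehydration are hypothetical helpers, and the 10-minute poll interval and 16-hour timeout are illustrative choices sized to the standard-priority upper bound.

```python
import time
from typing import Callable, Iterable


def rehydration_finished(blobs: Iterable) -> bool:
    """True when no blob in `blobs` is still archived or waiting on a rehydration."""
    # While a rehydration is pending the blob stays in the Archive tier with
    # archive_status set; once it completes, blob_tier is the target online tier.
    return all(blob.blob_tier != "Archive" and not blob.archive_status for blob in blobs)


def wait_for_rehydration(container_client, prefix: str,
                         poll_seconds: float = 600, timeout_seconds: float = 16 * 3600,
                         sleep: Callable[[float], None] = time.sleep) -> bool:
    """Poll blobs under `prefix` until rehydration finishes (True) or the timeout passes (False)."""
    waited = 0.0
    while True:
        if rehydration_finished(container_client.list_blobs(name_starts_with=prefix)):
            return True
        if waited >= timeout_seconds:
            return False
        sleep(poll_seconds)
        waited += poll_seconds
```

    The sleep function is a parameter only so the loop can be exercised without real waiting; in a Databricks job the default time.sleep is what you would use.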
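
    Once rehydration has finished, steps 2 and 3 are ordinary Spark reads and writes. A minimal sketch, assuming the archived extract was written as Parquet files in an ADLS Gen2 account and that the cluster is already authorized to read it as described in the article above; abfss_path and restore_to_delta are hypothetical helpers, and spark is the SparkSession that Databricks notebooks provide.

```python
def abfss_path(container: str, account: str, path: str) -> str:
    """Build an ABFS URI for a path in an ADLS Gen2 storage account."""
    return f"abfss://{container}@{account}.dfs.core.windows.net/{path.lstrip('/')}"


def restore_to_delta(spark, source_path: str, table_name: str, source_format: str = "parquet"):
    """Read rehydrated files and append them to an existing Delta table (hypothetical helper)."""
    df = spark.read.format(source_format).load(source_path)
    # Append keeps the rows already in the table; use MERGE instead if the
    # restored rows might overlap with rows that are still there.
    df.write.format("delta").mode("append").saveAsTable(table_name)
    return df
```

    For example, restore_to_delta(spark, abfss_path("datalake", "mystorageacct", "archive/sales/2023/"), "sales.orders"), where the container, account, path, and table names are placeholders. If the extract was instead written as a Delta table, pass source_format="delta" and make sure the _delta_log files were rehydrated along with the data files.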

    Hope this helps. Do let us know if you have any further queries.


    If this answers your query, please click Accept Answer and select Yes for "Was this answer helpful?".

