Retrieving data from Azure storage account archive tier through Azure Databricks

Question

Hi Community,

We are currently looking into data retention practices in Azure. We use Databricks with a Storage Account as our data lake.

We are thinking of using Databricks to extract part of the data from a Delta table and write it to an archive-tier location in the storage account.

After storing the data in the archive tier, we may need to retrieve it someday. Can I simply read the data from the archive-tier location through Databricks and write it back to the Delta table that sits in the hot tier, or is any extra effort required?

Thanks in advance.

Answer

Hi @BrianC
Welcome to the Microsoft Q&A platform, and thanks for posting your query here.

Yes, you can retrieve data from the Azure Storage Account archive tier using Azure Databricks, but there are a few steps to keep in mind:

1. Rehydrate the Data: Before you can read data from the archive tier, you need to rehydrate it, that is, bring it back to an online tier such as hot or cool. Data in the archive tier is offline and cannot be read directly, so a Spark read against archived blobs will fail. You can rehydrate either by changing the blob's access tier (in the Azure portal, PowerShell, or Azure CLI) or by copying the archived blob to a new blob in an online tier.
2. Read the Data: Once the data has been rehydrated to an online tier, you can read it with Databricks in the same way you read data from any other online location, for example with spark.read.
3. Write Back to Delta Table: After reading the data, you can write it back to your Delta table in the hot tier as usual.
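The three steps above can be sketched in Python roughly as follows. This is only an illustration, not a definitive implementation: it assumes the `azure-storage-blob` package is installed on the cluster and that you pass in an `azure.storage.blob.ContainerClient` you have already authenticated. The function names, the prefix, the path, and the table name are hypothetical, and the source format depends on how the archived files were written.

```python
def rehydrate_prefix(container_client, prefix, target_tier="Hot", priority="Standard"):
    """Request rehydration for every archived blob under `prefix`.

    `container_client` is expected to behave like
    azure.storage.blob.ContainerClient. Returns the names of the blobs
    for which a tier change was requested. The call only starts the
    rehydration; it does not wait for it to finish.
    """
    requested = []
    for blob in container_client.list_blobs(name_starts_with=prefix):
        # blob_tier may be a plain string or a string-valued enum.
        tier = getattr(blob.blob_tier, "value", blob.blob_tier)
        if tier != "Archive":
            continue  # already online, or an entry that has no tier
        if blob.archive_status:
            continue  # a rehydration is already pending for this blob
        container_client.get_blob_client(blob.name).set_standard_blob_tier(
            target_tier, rehydrate_priority=priority
        )
        requested.append(blob.name)
    return requested


def restore_to_delta(spark, source_path, target_table, source_format="delta"):
    """Read the rehydrated files and append them to an existing Delta table.

    `source_format` is an assumption: use whatever format the archived
    data was written in (for example "parquet" or "delta").
    """
    df = spark.read.format(source_format).load(source_path)
    df.write.format("delta").mode("append").saveAsTable(target_table)
    return df
```

In a notebook you would call `rehydrate_prefix(container_client, "archive/sales/2019/")`, wait until the blobs are back online, and then call `restore_to_delta(spark, "<path to the rehydrated data>", "<target table>")`. Passing the clients in as arguments keeps authentication out of the sketch and makes the functions easy to try against a test container first.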

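Keep in mind that rehydration is not immediate. With standard priority it can take up to 15 hours; high priority is faster (it may complete in under an hour for blobs smaller than 10 GB) but costs more. Rehydration also incurs data retrieval charges, and a blob that is deleted or moved out of the archive tier before it has been there for 180 days is subject to an early deletion charge.
For more information on connecting Databricks to Azure storage, you can refer to this article: https://learn.microsoft.com/en-us/azure/databricks/connect/storage/azure-storage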
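Because rehydration can take hours, a job that restores data usually has to wait until the blobs are back online before it tries to read them. A minimal polling sketch is shown here, again assuming an object that behaves like `azure.storage.blob.ContainerClient`; the function name, the poll interval, and the timeout are hypothetical choices, not recommended values.

```python
import time


def wait_for_rehydration(container_client, prefix, poll_seconds=600,
                         timeout_seconds=15 * 3600, sleep=time.sleep):
    """Poll until no blob under `prefix` is still in the archive tier.

    Returns True once every blob is online, or False if the timeout is
    reached first. A blob keeps reporting the Archive tier while its
    rehydration is pending, so checking the tier is sufficient here.
    """
    waited = 0
    while True:
        offline = [
            blob.name
            for blob in container_client.list_blobs(name_starts_with=prefix)
            if getattr(blob.blob_tier, "value", blob.blob_tier) == "Archive"
        ]
        if not offline:
            return True
        if waited >= timeout_seconds:
            return False
        sleep(poll_seconds)
        waited += poll_seconds
```

Keeping a cluster running just to poll can be expensive, so in practice it may be preferable to run the check from a scheduled job and only start the Spark read once it returns True.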

Hope this helps. If this answers your query, please click Accept Answer and select Yes for "Was this answer helpful". If you have any further queries, do let us know.
