I have a set of pickle files for which I want to write a Python script to read them and store them as datasets in Azure. What's the procedure to do that?

Chitti, Srinivasa 20 Reputation points
2025-01-31T10:09:52.1333333+00:00

I have a set of pickle files for which I want to write a Python script to read them and store them as datasets in Azure. What's the best procedure to do that?

Azure Open Datasets

Accepted answer
  1. Vikram Singh 1,070 Reputation points Microsoft Employee
    2025-01-31T11:23:24.4833333+00:00

    Hi @Chitti, Srinivasa

    Thanks for posting your question in Microsoft Q&A.

    To store pickle files as datasets in Azure, you can upload them to Azure Blob Storage, authenticating with either a connection string or Azure AD. Below is a sample script that uses a connection string for authentication:

    import os
    import pickle
    import pandas as pd
    from azure.storage.blob import BlobServiceClient
    
    # Authenticate using the storage account connection string
    AZURE_STORAGE_CONNECTION_STRING = "your_connection_string"
    CONTAINER_NAME = "your-container-name"
    blob_service_client = BlobServiceClient.from_connection_string(AZURE_STORAGE_CONNECTION_STRING)
    container_client = blob_service_client.get_container_client(CONTAINER_NAME)
    
    # Folder containing the pickle files
    pickle_folder = "path/to/pickle/files"
    for filename in os.listdir(pickle_folder):
        if filename.endswith(".pkl"):
            file_path = os.path.join(pickle_folder, filename)
            # Load the pickle file
            with open(file_path, "rb") as file:
                data = pickle.load(file)
            # Convert to CSV (only DataFrames are handled; other objects are skipped)
            if isinstance(data, pd.DataFrame):
                csv_data = data.to_csv(index=False)
    
                # Upload to Azure Blob Storage, replacing the .pkl extension with .csv
                csv_name = os.path.splitext(filename)[0] + ".csv"
                blob_client = container_client.get_blob_client(f"datasets/{csv_name}")
                blob_client.upload_blob(csv_data, overwrite=True)
                print(f"Uploaded: {csv_name}")
    
    print("All files uploaded successfully.")
    

    This script reads each pickle file in the specified folder, converts any pandas DataFrames to CSV, and uploads them to the designated Azure container. Make sure you have the azure-storage-blob library installed (pip install azure-storage-blob). Alternatively, for Azure AD authentication, you can use DefaultAzureCredential from azure.identity.
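
    As a minimal sketch of that Azure AD option (assuming the azure-identity package is installed; the account URL and container name below are placeholders to replace with your own values), only the client construction changes:

    from azure.identity import DefaultAzureCredential
    from azure.storage.blob import BlobServiceClient
    
    # Placeholder account URL and container name; replace with your own values
    ACCOUNT_URL = "https://<your-storage-account>.blob.core.windows.net"
    CONTAINER_NAME = "your-container-name"
    
    # DefaultAzureCredential picks up environment variables, a managed identity,
    # or an Azure CLI login, so no secret needs to be embedded in the script
    credential = DefaultAzureCredential()
    blob_service_client = BlobServiceClient(account_url=ACCOUNT_URL, credential=credential)
    container_client = blob_service_client.get_container_client(CONTAINER_NAME)

    The rest of the upload loop stays the same as in the script above.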


    Do let me know if you are facing any issue.

    1 person found this answer helpful.

1 additional answer

  1. Chitti, Srinivasa 20 Reputation points
    2025-02-04T06:56:32.69+00:00

    Hi Vikram,

    Thanks for answering. This was really helpful. I also got clarity on the difference between Azure Blob Storage and Azure Data Lake Storage Gen2 (that is, the hierarchical way of organizing data).

    Can I ask a follow-up question, if you don't mind? Since my data is now present in Blob storage, can you please direct me to some material on how I can create an automated ETL pipeline to process the data and load it into a relational table?

    Thanks a lot in advance

    Phani

