Thanks for posting your question in Microsoft Q&A.
To store pickle files as datasets in Azure, you can use Azure Blob Storage and authenticate with either a connection string or Azure AD. Below is a sample script that authenticates with a connection string, converts each pickle file to CSV, and uploads it:
import os
import pickle
import pandas as pd
from azure.storage.blob import BlobServiceClient

# Authenticate using a connection string
AZURE_STORAGE_CONNECTION_STRING = "your_connection_string"
CONTAINER_NAME = "your-container-name"

blob_service_client = BlobServiceClient.from_connection_string(AZURE_STORAGE_CONNECTION_STRING)
container_client = blob_service_client.get_container_client(CONTAINER_NAME)

# Local folder containing the pickle files
pickle_folder = "path/to/pickle/files"

for filename in os.listdir(pickle_folder):
    if filename.endswith(".pkl"):
        file_path = os.path.join(pickle_folder, filename)

        # Load the pickle file
        with open(file_path, "rb") as file:
            data = pickle.load(file)

        # Convert to CSV and upload (only if the object is a DataFrame)
        if isinstance(data, pd.DataFrame):
            csv_data = data.to_csv(index=False)

            # Upload to the Azure container under a "datasets/" prefix,
            # replacing the .pkl extension with .csv
            blob_name = f"datasets/{os.path.splitext(filename)[0]}.csv"
            blob_client = container_client.get_blob_client(blob_name)
            blob_client.upload_blob(csv_data, overwrite=True)
            print(f"Uploaded: {blob_name}")

print("All files uploaded successfully.")
This script reads each pickle file in the specified folder, converts any DataFrame it contains to CSV, and uploads it to the designated Azure container. Make sure you have the azure-storage-blob library installed (pip install azure-storage-blob). Alternatively, for Azure AD authentication you can use DefaultAzureCredential from azure.identity instead of a connection string.
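Here is a minimal sketch of the Azure AD approach, assuming you have azure-identity installed (pip install azure-identity), your storage account URL (placeholder below), and that the signed-in identity has a role such as Storage Blob Data Contributor on the account:

from azure.identity import DefaultAzureCredential
from azure.storage.blob import BlobServiceClient

# Replace with your storage account URL (placeholder shown here)
account_url = "https://<your-storage-account>.blob.core.windows.net"

# DefaultAzureCredential tries environment variables, managed identity,
# and Azure CLI login, among others, in order
credential = DefaultAzureCredential()

blob_service_client = BlobServiceClient(account_url=account_url, credential=credential)
container_client = blob_service_client.get_container_client("your-container-name")

The rest of the upload loop stays the same; only the way the BlobServiceClient is created changes.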
Do let me know if you face any issues.