@Adam Kupiec You can read the content of .docx and .doc files stored in Azure Blob Storage directly in Python without saving them to local disk first. The Azure Storage Blob SDK for Python lets you download a blob's bytes straight into an in-memory buffer, which a document library can then parse.
Here's a general approach to achieve this:
Install the required packages: the Azure Storage Blob SDK for Python, plus python-docx, which the example below uses to parse .docx files. You can install both with pip:
```shell
pip install azure-storage-blob python-docx
```
Authenticate and Access the Blob Storage: Use the SDK to authenticate and access the blob storage container where your files are stored. Here's an example code snippet to read the content of a .docx file:
```python
from io import BytesIO

import docx  # provided by the python-docx package
from azure.storage.blob import BlobServiceClient

# Replace with your connection string, container name, and blob name
connection_string = "your_connection_string"
container_name = "your_container_name"
blob_name = "your_file.docx"

# Create a BlobServiceClient and get a client for the specific blob
blob_service_client = BlobServiceClient.from_connection_string(connection_string)
container_client = blob_service_client.get_container_client(container_name)
blob_client = container_client.get_blob_client(blob_name)

# Download the blob content into an in-memory buffer (nothing is written to disk)
stream = BytesIO()
blob_client.download_blob().readinto(stream)
stream.seek(0)  # rewind so python-docx reads from the start

# Use python-docx to read the paragraphs of the .docx file
doc = docx.Document(stream)
for paragraph in doc.paragraphs:
    print(paragraph.text)
```
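If you want to avoid the python-docx dependency, or just see what it is parsing, note that a .docx file is a ZIP archive whose body text lives in `word/document.xml`. The following is a minimal stdlib-only sketch of that idea, not a replacement for python-docx: the function name `docx_paragraphs` is hypothetical, and it only joins the `<w:t>` text runs of each paragraph. Unlike python-docx's `doc.paragraphs`, it also picks up paragraphs inside tables; headers, footers, and footnotes live in other parts of the archive and are not read.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML main namespace used in word/document.xml
W_NS = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def docx_paragraphs(data: bytes) -> list[str]:
    """Hypothetical helper: return the text of each paragraph in a .docx given as raw bytes.

    Simplified sketch: only <w:t> text is kept (tabs and line breaks are dropped),
    text boxes are not handled specially, and headers/footers/footnotes are ignored.
    """
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))
    paragraphs = []
    # iter() walks the XML in document order, so paragraphs come out in reading order
    for para in root.iter(f"{W_NS}p"):
        text = "".join(node.text or "" for node in para.iter(f"{W_NS}t"))
        paragraphs.append(text)
    return paragraphs
```

With the `blob_client` from the example above, usage would be `docx_paragraphs(blob_client.download_blob().readall())`.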
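Handling .doc Files: python-docx can only read .docx files; it cannot open the older binary .doc format. For .doc files, convert them to .docx first, for example with the pywin32 library to automate Microsoft Word (Windows only, and Word must be installed) or with LibreOffice in headless mode. Note that pandoc, and therefore pypandoc, does not accept .doc as an input format, so it will not help here. These converters work on files, so the .doc blob typically has to be written to a temporary file for the conversion step.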
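One way to convert a .doc blob to .docx without Microsoft Word is LibreOffice's headless command-line conversion (`soffice --headless --convert-to docx --outdir <dir> <file>`). A minimal sketch, assuming LibreOffice is installed and its `soffice` executable is on PATH (or passed explicitly); both function names are hypothetical. Because LibreOffice works on files, the bytes go through a temporary directory that is deleted afterwards.

```python
import subprocess
import tempfile
from pathlib import Path


def build_soffice_command(input_path: Path, outdir: Path, soffice: str = "soffice") -> list[str]:
    """Hypothetical helper: command line asking LibreOffice to convert one file to .docx."""
    return [soffice, "--headless", "--convert-to", "docx", "--outdir", str(outdir), str(input_path)]


def convert_doc_bytes_to_docx(data: bytes, soffice: str = "soffice", timeout: float = 120) -> bytes:
    """Hypothetical helper: convert legacy .doc bytes to .docx bytes.

    Assumes LibreOffice is installed; raises FileNotFoundError if `soffice`
    cannot be found and CalledProcessError if the conversion fails.
    """
    with tempfile.TemporaryDirectory() as tmp:
        tmpdir = Path(tmp)
        src = tmpdir / "input.doc"
        src.write_bytes(data)
        subprocess.run(
            build_soffice_command(src, tmpdir, soffice),
            check=True,
            timeout=timeout,
            capture_output=True,
        )
        # LibreOffice writes <stem>.docx into --outdir, i.e. input.docx here
        return (tmpdir / "input.docx").read_bytes()
```

The result can then be parsed like any .docx, e.g. `docx.Document(BytesIO(convert_doc_bytes_to_docx(blob_client.download_blob().readall())))`.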
For .docx files, this approach reads the content directly from Azure Blob Storage into memory, so nothing has to be saved to local disk.
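Since the whole file ends up in memory, it can be worth capping how much you are willing to buffer. The `StorageStreamDownloader` returned by `blob_client.download_blob()` has a `chunks()` method that yields the content piece by piece; the sketch below copies those chunks into a `BytesIO` and stops once a limit is exceeded. The function name `buffer_blob` and the `max_bytes` parameter are hypothetical, and any object with a `chunks()` method works.

```python
from io import BytesIO
from typing import Iterable, Optional, Protocol


class SupportsChunks(Protocol):
    """Anything with a chunks() method, e.g. the downloader returned by BlobClient.download_blob()."""

    def chunks(self) -> Iterable[bytes]: ...


def buffer_blob(downloader: SupportsChunks, max_bytes: Optional[int] = None) -> BytesIO:
    """Hypothetical helper: copy a download into an in-memory buffer chunk by chunk.

    Raises ValueError as soon as the buffered size exceeds max_bytes,
    so reading stops early for an unexpectedly large blob.
    """
    buffer = BytesIO()
    for chunk in downloader.chunks():
        buffer.write(chunk)
        if max_bytes is not None and buffer.tell() > max_bytes:
            raise ValueError(f"blob is larger than {max_bytes} bytes")
    buffer.seek(0)  # rewind so the caller reads from the start
    return buffer
```

Usage: `doc = docx.Document(buffer_blob(blob_client.download_blob(), max_bytes=50 * 1024 * 1024))`.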
Please let us know if you have any further queries. I’m happy to assist you further.