Auto Document Splitting Prebuilt Bank Statements Extractor

Question

I am currently using the prebuilt bank statement extractor from Azure AI Document Intelligence. However, my PDFs often contain multiple bank statements. From my understanding, the default split mode is "none", so the service always returns only one statement. How do I change this to "auto" so that the statements are split up? This is my current code snippet:

    from azure.core.credentials import AzureKeyCredential
    from azure.ai.documentintelligence import DocumentIntelligenceClient
    from azure.ai.documentintelligence.models import SplitMode

    poller = document_intelligence_client.begin_analyze_document(
        model_id="prebuilt-bankStatement.us", body=file_bytes
    )
    bankstatements = poller.result()

Answer

Hi Pile, Joshua,
Greetings, and welcome to the Microsoft Q&A forum! Thanks for posting your query.

To enable automatic document splitting for your bank statements, set the split_mode parameter to SplitMode.AUTO in your begin_analyze_document call:

poller = document_intelligence_client.begin_analyze_document(
    model_id="prebuilt-bankStatement.us",
    body=file_bytes,
    split_mode=SplitMode.AUTO
)

This should enable automatic splitting of your PDF into multiple bank statements.
For more information, see: Document splitting.

Hope this helps. Do let us know if you have any further queries.

If this answers your query, please click Accept Answer and Yes for "Was this answer helpful".

Answer

Hello Pile, Joshua,

Welcome to Microsoft Q&A, and thank you for posting your question here.

I understand that you are having an issue with automatic document splitting in the prebuilt bank statement extractor in Azure AI Document Intelligence.

The code below is intended to resolve the TypeError and ensure that multi-statement PDFs are split properly.

First, make sure you are using the latest version of azure-ai-documentintelligence. To update, run: pip install --upgrade azure-ai-documentintelligence

Second, the code below corrects the previous version so that split_mode is included in the request body (an AnalyzeDocumentRequest object).
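
To confirm which version is actually installed after the upgrade, the Python standard library can report it. This is only a small sketch: the helper name installed_version is my own, and it returns None when the package is not installed in the current environment.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package: str):
    """Return the installed version string of a package, or None if it is missing."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# Prints the installed SDK version, or None if the package is not installed here.
print(installed_version("azure-ai-documentintelligence"))
```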

from azure.core.credentials import AzureKeyCredential
from azure.ai.documentintelligence import DocumentIntelligenceClient
from azure.ai.documentintelligence.models import AnalyzeDocumentRequest, SplitMode

# Initialize the client
document_intelligence_client = DocumentIntelligenceClient(
    endpoint="your_endpoint",
    credential=AzureKeyCredential("your_key")
)

# Create a request with the file bytes and split mode
request = AnalyzeDocumentRequest(
    base64_source=file_bytes,  # Ensure file_bytes is properly encoded
    split_mode=SplitMode.AUTO
)

# Analyze the document
poller = document_intelligence_client.begin_analyze_document(
    model_id="prebuilt-bankStatement.us",
    analyze_request=request  # Pass the request object here
)

# Get results (will return multiple documents if split)
bank_statements = poller.result()
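
To check whether the PDF was actually split, it can help to list each returned document with the pages it covers. The sketch below assumes the result exposes a documents list whose entries carry doc_type and bounding_regions with a page_number, which should be verified against your installed SDK version. The helper name summarize_documents and the stand-in objects (including the "bankStatement.us" value) are illustrative only, so the example runs without calling the service.

```python
from types import SimpleNamespace


def summarize_documents(result):
    """Return a (doc_type, sorted page numbers) pair for each document in a result."""
    summary = []
    for doc in getattr(result, "documents", None) or []:
        regions = getattr(doc, "bounding_regions", None) or []
        pages = sorted({region.page_number for region in regions})
        summary.append((getattr(doc, "doc_type", None), pages))
    return summary


# Stand-in object shaped like the assumed result; not real service output.
fake_result = SimpleNamespace(
    documents=[
        SimpleNamespace(
            doc_type="bankStatement.us",
            bounding_regions=[
                SimpleNamespace(page_number=1),
                SimpleNamespace(page_number=2),
            ],
        ),
        SimpleNamespace(
            doc_type="bankStatement.us",
            bounding_regions=[SimpleNamespace(page_number=3)],
        ),
    ]
)

for doc_type, pages in summarize_documents(fake_result):
    print(doc_type, pages)
```

With a real result you would call summarize_documents(bank_statements). A single entry spanning every page would suggest that no splitting took place.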

NOTE THAT:

Use AnalyzeDocumentRequest to encapsulate parameters like split_mode.
Ensure file_bytes is a base64-encoded string (use base64.b64encode(file_bytes).decode('utf-8') if needed).
The newer SDK (azure-ai-documentintelligence) uses base64_source or url_source in the request body.
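
The encoding step mentioned in the notes above can be sketched with the standard library alone. The helper name to_base64_text and the sample bytes are placeholders of mine. Whether your SDK version expects pre-encoded text or raw bytes is something to confirm in its documentation.

```python
import base64


def to_base64_text(file_bytes: bytes) -> str:
    """Encode raw file bytes as an ASCII base64 string."""
    return base64.b64encode(file_bytes).decode("utf-8")


# Placeholder bytes standing in for a PDF read with open(path, "rb").
sample_bytes = b"%PDF-1.7 sample"
encoded = to_base64_text(sample_bytes)

# Decoding recovers the original bytes, which confirms the round trip.
assert base64.b64decode(encoded) == sample_bytes
```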


I hope this is helpful! Do not hesitate to let me know if you have any other questions.

Please don't forget to close the thread by upvoting and accepting this as the answer if it is helpful.
