Auto Document Splitting Prebuilt Bank Statements Extractor

Pile, Joshua 0 Reputation points
2025-02-07T17:16:25.8533333+00:00

I am currently using the prebuilt bank statement extractor from Azure AI Document Intelligence. However, a single PDF often contains multiple bank statements. From my understanding, the default split mode is "none", so the service always returns only one statement. How do I change this to "auto" so that the statements are split up? This is the current snippet of my code.

    from azure.core.credentials import AzureKeyCredential
    from azure.ai.documentintelligence import DocumentIntelligenceClient
    from azure.ai.documentintelligence.models import SplitMode

    poller = document_intelligence_client.begin_analyze_document(
        model_id="prebuilt-bankStatement.us", body=file_bytes
    )
    bankstatements = poller.result()
Azure AI Document Intelligence
An Azure service that turns documents into usable data. Previously known as Azure Form Recognizer.

2 answers

  1. Pavankumar Purilla 3,235 Reputation points Microsoft Vendor
    2025-02-07T18:52:07.6766667+00:00

    Hi Pile, Joshua,
    Greetings & Welcome to Microsoft Q&A forum! Thanks for posting your query!

    To enable auto document splitting for your bank statements, set the split_mode parameter to SplitMode.AUTO in your begin_analyze_document call:

    poller = document_intelligence_client.begin_analyze_document(
        model_id="prebuilt-bankStatement.us",
        body=file_bytes,
        split_mode=SplitMode.AUTO
    )
    
    

    This should enable the automatic splitting of your PDF into multiple bank statements.
    For more information, please see the documentation on document splitting.
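
    To check whether splitting took effect, it can help to count the documents in the result and list the pages each one covers. The following is a minimal sketch, not an official sample: summarize_statements is a hypothetical helper, and it assumes the result exposes a documents list whose items carry doc_type and bounding_regions with a page_number. The stand-in data, including the doc_type string, is made up so the sketch runs without an Azure resource.

```python
from types import SimpleNamespace


def summarize_statements(result):
    """Return one (doc_type, sorted page numbers) tuple per extracted document."""
    summary = []
    # `documents` may be None when nothing was extracted, so fall back to [].
    for doc in (getattr(result, "documents", None) or []):
        pages = sorted(
            {region.page_number for region in (doc.bounding_regions or [])}
        )
        summary.append((doc.doc_type, pages))
    return summary


# Made-up stand-in for a real analysis result (assumed shape, not real output).
fake_result = SimpleNamespace(
    documents=[
        SimpleNamespace(
            doc_type="bankStatement.us",
            bounding_regions=[
                SimpleNamespace(page_number=1),
                SimpleNamespace(page_number=2),
            ],
        ),
        SimpleNamespace(
            doc_type="bankStatement.us",
            bounding_regions=[SimpleNamespace(page_number=3)],
        ),
    ]
)

for index, (doc_type, pages) in enumerate(summarize_statements(fake_result), start=1):
    print(f"Statement {index}: type={doc_type}, pages={pages}")
```

    With a real response you would pass poller.result() to the helper. If only one entry comes back for a PDF that holds several statements, the split setting was probably not applied.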

    Hope this helps. Do let us know if you have any further queries.


    If this answers your query, please click "Accept Answer" and select "Yes" for "Was this answer helpful".


  2. Sina Salam 17,336 Reputation points
    2025-02-10T10:36:27.78+00:00

    Hello Pile, Joshua,

    Welcome to the Microsoft Q&A and thank you for posting your questions here.

    I understand that you are having an issue with auto document splitting for the prebuilt bank statement extractor in Azure AI Document Intelligence.

    The code below is intended to resolve the TypeError and ensure proper splitting of multi-statement PDFs.

    First, make sure you are using the latest version of azure-ai-documentintelligence by running this command to update: pip install --upgrade azure-ai-documentintelligence

    Second, the code below corrects the previous version so that split_mode is included in the request body (an AnalyzeDocumentRequest object).

    from azure.core.credentials import AzureKeyCredential
    from azure.ai.documentintelligence import DocumentIntelligenceClient
    from azure.ai.documentintelligence.models import AnalyzeDocumentRequest, SplitMode

    # Initialize the client
    document_intelligence_client = DocumentIntelligenceClient(
        endpoint="your_endpoint",
        credential=AzureKeyCredential("your_key")
    )

    # Create a request with the file bytes and split mode
    request = AnalyzeDocumentRequest(
        base64_source=file_bytes,  # Ensure file_bytes is properly encoded
        split_mode=SplitMode.AUTO
    )

    # Analyze the document
    poller = document_intelligence_client.begin_analyze_document(
        model_id="prebuilt-bankStatement.us",
        analyze_request=request  # Pass the request object here
    )

    # Get results (will return multiple documents if split)
    bank_statements = poller.result()
    

    NOTE THAT:

    • Use AnalyzeDocumentRequest to encapsulate parameters like split_mode.
    • Ensure file_bytes is a base64-encoded string (use base64.b64encode(file_bytes).decode('utf-8') if needed).
    • The newer SDK (azure-ai-documentintelligence) uses base64_source or url_source in the request body.
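
    The base64 step mentioned in the notes above can be sketched with the standard library alone. This is only an illustration: to_base64_text is a hypothetical helper name and the sample bytes are made up. Whether the client expects pre-encoded text or raw bytes may depend on the installed SDK version, so treat that as an assumption to verify.

```python
import base64


def to_base64_text(raw: bytes) -> str:
    """Encode raw file bytes as an ASCII base64 string."""
    return base64.b64encode(raw).decode("utf-8")


# Made-up stand-in for the bytes you would read from a PDF file.
sample_bytes = b"%PDF-1.7 example"
encoded = to_base64_text(sample_bytes)

# Decoding should give back exactly the original bytes.
assert base64.b64decode(encoded) == sample_bytes
print(f"Encoded {len(sample_bytes)} bytes into {len(encoded)} base64 characters")
```

    In practice the bytes would come from reading the PDF in binary mode, for example with open(path, "rb").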

    To read more, use the following links:

    I hope this is helpful! Do not hesitate to let me know if you have any other questions.


    Please don't forget to close the thread by upvoting and accepting this answer if it is helpful.

