Is there a way to configure the distance of connecting digits into a single number?

Adam Mucha 10 Reputation points
2024-12-13T09:20:44.2+00:00

Hello!

We have many invoices on which digits separated by a relatively long gap, for example around a thousands separator, are read as two separate numbers (instead of 1032 we receive 1 and 032). Is there a way to configure, within Azure, the maximum distance at which digits are still joined into a single number, so that this stops happening?

Azure Computer Vision

Accepted answer
  1. Sina Salam 15,011 Reputation points
    2024-12-13T15:03:04.74+00:00

    Hello Adam Mucha,

    Welcome to the Microsoft Q&A and thank you for posting your questions here.

    I understand that digits on your invoices that are separated by a long gap are being read as separate numbers (e.g., 1 and 032 instead of 1032). Unfortunately, Azure's OCR does not expose a setting for the spacing threshold that decides whether neighboring digits belong to the same number.
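One way to approximate such a threshold yourself is in post-processing: Azure's OCR output includes a bounding box or polygon for each recognized word, so you can compare the horizontal gap between neighboring numeric words. Below is a minimal sketch, assuming you have already converted each word on a single line into its text plus left/right x coordinates. The `Word` class, `merge_digit_groups` and `max_gap_ratio` are hypothetical names, and the default of 1.5 character widths is an illustrative guess you would tune on your own invoices.

```python
from dataclasses import dataclass


@dataclass
class Word:
    # Hypothetical minimal word record: recognized text plus the left/right
    # x coordinates of its bounding box (pixels), taken from your OCR output.
    text: str
    left: float
    right: float


def merge_digit_groups(words, max_gap_ratio=1.5):
    """Merge a numeric word with a following three-digit word when the gap
    between them is small relative to the average character width.

    `words` must belong to a single text line and be sorted left to right.
    `max_gap_ratio` is the largest gap, in average character widths, that
    still counts as "the same number" (an assumed tuning knob).
    """
    merged = []
    for word in words:
        if merged:
            prev = merged[-1]
            gap = word.left - prev.right
            # Average character width of the previous word, used to scale the gap.
            char_width = (prev.right - prev.left) / max(len(prev.text), 1)
            # Only join a pure-digit word followed by exactly three digits,
            # which is what a thousands separator typically produces.
            if (prev.text.isdigit() and word.text.isdigit()
                    and len(word.text) == 3
                    and gap <= max_gap_ratio * char_width):
                merged[-1] = Word(prev.text + word.text, prev.left, word.right)
                continue
        merged.append(word)
    return merged
```

For example, with `Total` at x=0..50, `1` at 60..70, `032` at 85..115 and `EUR` at 200..230, the function returns words whose texts are `Total`, `1032` and `EUR`. Measuring the gap in character widths rather than pixels keeps the threshold roughly independent of scan resolution, and restricting merges to three-digit groups lowers the risk of joining unrelated numbers such as a quantity and a price.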

    However, here are some steps you can take to mitigate this issue:

    • If you are not already using Azure Form Recognizer (now called Azure AI Document Intelligence), consider it, since it is optimized for extracting structured data such as tables and numbers. Use the prebuilt invoice model or a custom model, depending on your invoice format.
    • Develop post-processing logic to merge split digits by identifying patterns (e.g., spacing, or context such as thousands separators). For example:
        import re
        def merge_digits(text):
            # Remove a space only between a digit and a group of exactly three digits ("1 032" -> "1032")
            return re.sub(r'(?<=\d) (?=\d{3}\b)', '', text)
      
    • If you use a Form Recognizer custom model, include training documents that contain spaced numbers, labeled with the expected values, so the model learns how to interpret such cases.
    • Increase scanning resolution to 300 DPI or higher. You can use preprocessing techniques like binarization to improve OCR results.
    • Review the OCR API's documented parameters for anything that may improve recognition on your documents. Some other OCR tools expose settings for character or word separation, but, as mentioned above, Azure does not offer a spacing threshold.
    • If none of the above resolves the issue, open an Azure support request to ask for technical guidance or to request this as a feature.
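Binarization converts a grayscale scan into pure black and white, which can reduce background noise before OCR. A common way to choose the cut-off is Otsu's method, which picks the threshold that best separates the dark (text) and light (paper) brightness classes. The sketch below is a plain-Python illustration of that idea on 8-bit values (0-255); the function names are hypothetical, and in practice you would more likely use an image library, e.g. OpenCV's `cv2.threshold` with the `cv2.THRESH_OTSU` flag, on the scanned page before sending it to Azure.

```python
def otsu_threshold(pixels):
    """Return the Otsu threshold (0-255) for a flat list of 8-bit grayscale values."""
    histogram = [0] * 256
    for value in pixels:
        histogram[value] += 1
    total = len(pixels)
    sum_all = sum(i * count for i, count in enumerate(histogram))

    best_threshold, best_variance = 0, -1.0
    weight_bg, sum_bg = 0, 0
    for t in range(256):
        weight_bg += histogram[t]        # pixels with value <= t
        if weight_bg == 0:
            continue
        weight_fg = total - weight_bg    # pixels with value > t
        if weight_fg == 0:
            break
        sum_bg += t * histogram[t]
        mean_bg = sum_bg / weight_bg
        mean_fg = (sum_all - sum_bg) / weight_fg
        # Between-class variance (up to a constant factor); Otsu maximizes it.
        variance = weight_bg * weight_fg * (mean_bg - mean_fg) ** 2
        if variance > best_variance:
            best_threshold, best_variance = t, variance
    return best_threshold


def binarize(image, threshold):
    """Map each pixel of a 2-D grayscale image (list of rows) to 0 or 255."""
    return [[255 if value > threshold else 0 for value in row] for row in image]
```

Pixels at or below the threshold become black (0) and the rest white (255), so dark ink on light paper stays dark after conversion.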

    I hope this is helpful! Do not hesitate to let me know if you have any other questions.


    Please don't forget to close the thread by upvoting and accepting this answer if it was helpful.

