How to improve line breaks in Document Intelligence's OCR output?

Yathharthha Kaushal 0 Reputation points
2025-02-08T07:23:42.6166667+00:00

Hello everyone,

I'm working with an image of source code that I need to convert into text using OCR. The problem is that the code depends on its line breaks: if they are changed, the result could fail to compile or break at runtime.

Here's an example of the OCR output I am getting:

Paragraph:
public static int foo(int bar) {
==========
Paragraph:
bar++; if (bar < 10) bar = foo(bar);
==========
Paragraph:
int i = 0; int j = 0; while (i > foo(j - bar)) { j++; bar += j;
==========
Paragraph:
3
==========
Paragraph:
return bar;
==========
Paragraph:
}
==========

However, the code should look like this:

public static int foo(int bar) {
    bar++;
    if (bar < 10)
        bar = foo(bar);

    int i = 0;
    int j = 0;
    while (i > foo(j - bar)) {
        j++;
        bar += j;
    }

    return bar;
}

Here's the actual image used for the OCR:

[Image: actual image used for the OCR]

Is there any way to make Document Intelligence's OCR output preserve line breaks more accurately, so that the code keeps its original formatting?
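
For context, here is a rough sketch of the kind of post-processing I could fall back on if the service cannot do this directly. It assumes the OCR result can give me each visual line of text together with the top-left corner of its bounding box (I believe the analyze result exposes page-level lines with a polygon, but I have not verified that it works for this image). The names `OcrLine`, `rebuild_code`, `char_width` and `line_height` are my own placeholders, not part of any SDK, and the coordinates in the example are made up.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class OcrLine:
    """One visual line of OCR text (hypothetical structure, not an SDK type)."""
    text: str
    x: float  # left edge of the line's bounding box
    y: float  # top edge of the line's bounding box


def rebuild_code(lines: List[OcrLine], char_width: float, line_height: float) -> str:
    """Rebuild line breaks, indentation and blank lines from line positions.

    Assumes each OcrLine is exactly one visual line of the image, and that
    char_width and line_height are estimated in the same units as x and y.
    """
    if not lines:
        return ""

    # Read top to bottom; x only breaks ties between equal y values.
    ordered = sorted(lines, key=lambda line: (line.y, line.x))
    left_margin = min(line.x for line in ordered)

    output = []
    previous_y = None
    for line in ordered:
        # A vertical gap well above one line height is treated as a blank line.
        if previous_y is not None and line.y - previous_y > 1.5 * line_height:
            output.append("")
        # Horizontal offset from the left margin becomes leading spaces.
        indent = round((line.x - left_margin) / char_width)
        output.append(" " * indent + line.text)
        previous_y = line.y

    return "\n".join(output)


if __name__ == "__main__":
    # Made-up coordinates, for illustration only.
    sample = [
        OcrLine("public static int foo(int bar) {", x=10, y=0),
        OcrLine("bar++;", x=50, y=20),
        OcrLine("if (bar < 10)", x=50, y=40),
        OcrLine("bar = foo(bar);", x=90, y=60),
        OcrLine("return bar;", x=50, y=100),
        OcrLine("}", x=10, y=120),
    ]
    print(rebuild_code(sample, char_width=10, line_height=20))
```

This only restores layout. It would not fix misrecognized characters such as the `3` that appears in place of `}` in the output above, and it would need extra handling if one visual line comes back as several fragments.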

Best regards,
YK

Azure AI Document Intelligence
An Azure service that turns documents into usable data. Previously known as Azure Form Recognizer.