Add subscript/superscript handling to Azure DocumentAI

Florian F 0 Reputation points
2025-01-22T14:37:02.3+00:00

Hi,

I am using the Azure DocAI layout model and I really like its high-quality results. One thing I have been struggling with recently is the handling of subscript and superscript characters. I am using PDF input documents (not scanned), though I guess this doesn't matter much.

Subscript and superscript characters (e.g. a superscript ² referring to footnote "2") are not recognized and are instead output as a "normal" 2 (instead of superscript ²). If the values around the superscript character are also numbers, this actually results in incorrectly extracted numbers. It would be great if subscript and superscript characters were recognized and rendered correctly in the output (I guess this would hold for the read model as well as the layout model).
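
To make the failure mode concrete, here is a minimal sketch in plain Python. It does not call the Document Intelligence API; the sample strings and the helper `split_footnote` are made up for illustration. It only shows that a superscript digit is a distinct Unicode code point that downstream code can separate from the value, whereas a flattened plain digit cannot be told apart from the number itself.

```python
import unicodedata

# Hypothetical samples: the value 1234 followed by footnote marker 2.
superscript_aware = "1234\u00b2"  # marker kept as SUPERSCRIPT TWO (U+00B2)
flattened = "12342"               # marker emitted as a plain digit


def split_footnote(text):
    """Split trailing Unicode superscript characters off the value they follow."""
    i = len(text)
    # Superscript characters carry a "<super>" compatibility decomposition.
    while i > 0 and unicodedata.decomposition(text[i - 1]).startswith("<super>"):
        i -= 1
    value = text[:i]
    # NFKC maps the superscript marker to its plain form, e.g. "\u00b2" to "2".
    marker = unicodedata.normalize("NFKC", text[i:])
    return value, marker


print(split_footnote(superscript_aware))  # ('1234', '2')
print(split_footnote(flattened))          # ('12342', '')
```

With the superscript preserved, the value and the footnote marker can be recovered; with the flattened output, the marker is indistinguishable from the last digit of the number.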

For the built-in layout model, the top-notch feature would be to recognize such references and output them as new types (e.g. FootnoteReference and Footnote) in the analysis result.
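
As a rough sketch of what such types could look like: everything below is hypothetical. Neither `FootnoteReference` nor `Footnote` exists in the current analysis result, and the field names `marker`, `offset` and `content` as well as the helper `resolve` are assumptions made only for illustration.

```python
from dataclasses import dataclass


# Hypothetical shapes only: these types are a proposal, not an existing API.
@dataclass
class Footnote:
    marker: str   # e.g. "2"
    content: str  # text of the footnote itself


@dataclass
class FootnoteReference:
    marker: str   # marker as it appears in the body text
    offset: int   # character offset of the marker in the extracted content


def resolve(ref, footnotes):
    """Return the footnote whose marker matches the reference, or None."""
    return next((f for f in footnotes if f.marker == ref.marker), None)
```

A consumer could then link each reference in the body text to its footnote, for example `resolve(FootnoteReference("2", 4), footnotes)`, instead of parsing the marker out of a number.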

Azure AI Document Intelligence
An Azure service that turns documents into usable data. Previously known as Azure Form Recognizer.

1 answer

  1. Pavankumar Purilla 2,930 Reputation points Microsoft Vendor
    2025-01-22T21:11:53.77+00:00

    Hi Florian F,
    Greetings & Welcome to Microsoft Q&A forum! Thanks for posting your query!

    I understand the challenge you're facing with subscript and superscript characters not being recognized correctly in your PDF input documents. This can indeed lead to inaccuracies, especially when dealing with numerical data.
    I encourage you to submit a feature request to enhance Azure Document Intelligence to support the recognition and handling of subscript and superscript characters. You can request features such as accurate detection of typographical positioning (e.g., subscript and superscript), along with structured output for references like FootnoteReference and Footnote. This enhancement would significantly improve the analysis of documents with specialized formatting, such as scientific or legal texts.

    You can submit your idea through the Azure Feedback Forum: Post idea · Community (azure.com).

    Hope this helps. Do let us know if you have any further queries.


    If this answers your query, please click Accept Answer and select Yes for "Was this answer helpful".

