Add subscript/superscript handling to Azure DocumentAI

Question

Hi,

I am using the Azure DocAI layout model and I really like its high-quality results. One thing I have been struggling with recently is the handling of subscript and superscript characters. I am using PDF input documents (not scanned), though I guess this doesn't matter much.

Subscript and superscript characters (e.g. a superscript ² referring to footnote "2") are not recognized and are instead output as a "normal" 2. If the values around the superscript character are also numbers, this actually results in incorrectly extracted numbers. It would be great if subscript and superscript characters were recognized and rendered correctly in the output (I guess this would hold for the read model as well as the layout model).

For the built-in layout model, the top-notch feature would be to recognize such references and output them as new types (e.g. FootnoteReference and Footnote) in the analysis result.

Answer

Hi Florian F,
Greetings & Welcome to Microsoft Q&A forum! Thanks for posting your query!

I understand the challenge you're facing with subscript and superscript characters not being recognized correctly in your PDF input documents. This can indeed lead to inaccuracies, especially when dealing with numerical data.
I encourage you to submit a feature request to enhance Azure Document Intelligence to support the recognition and handling of subscript and superscript characters. You can request features such as accurate detection of typographical positioning (e.g., subscript and superscript), along with structured output for references like FootnoteReference and Footnote. This enhancement would significantly improve the analysis of documents with specialized formatting, such as scientific or legal texts.

You can submit your idea through the Azure Feedback Forum: Post idea · Community (azure.com).
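Until such a feature exists, one possible interim workaround is to post-process the word geometry yourself. The snippet below is only a rough sketch, not part of Azure Document Intelligence: it assumes you have already reduced each word on a text line to a plain dict with the hypothetical keys "content", "top" and "bottom" (y growing downward), and that the raised character comes back as its own word. If the service has already merged the superscript into the neighbouring number, this heuristic cannot separate it. The function name and both thresholds are illustrative assumptions that would need tuning on your documents.

```python
from statistics import median


def flag_raised_words(words, height_ratio=0.75, min_gap_ratio=0.2):
    """Return indices of words on one text line that look like superscripts.

    `words` is a list of dicts with the hypothetical keys "content", "top"
    and "bottom" (same units for every word, y grows downward). This is a
    heuristic, not an Azure API: a word is flagged when it is clearly
    shorter than the typical word on the line AND its bottom edge sits
    clearly above the typical bottom edge.
    """
    if len(words) < 2:
        return []  # nothing to compare against

    heights = [w["bottom"] - w["top"] for w in words]
    typical_height = median(heights)
    typical_bottom = median(w["bottom"] for w in words)

    flagged = []
    for i, (w, h) in enumerate(zip(words, heights)):
        is_small = h < height_ratio * typical_height
        is_raised = (typical_bottom - w["bottom"]) > min_gap_ratio * typical_height
        if is_small and is_raised:
            flagged.append(i)
    return flagged


# Made-up example line: "Revenue 1,250 ² million", with the footnote marker
# returned as a plain "2" in a smaller, raised box.
line = [
    {"content": "Revenue", "top": 100.0, "bottom": 112.0},
    {"content": "1,250", "top": 100.0, "bottom": 112.0},
    {"content": "2", "top": 98.0, "bottom": 105.0},
    {"content": "million", "top": 100.0, "bottom": 112.0},
]

for i in flag_raised_words(line):
    print("possible superscript:", line[i]["content"])
```

Flagged words can then be excluded before parsing numbers or treated as candidate footnote references. Subscripts would need the mirrored check (a small word whose top edge sits clearly below the typical top edge).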

Hope this helps. Do let us know if you have any further queries.

If this answers your query, please click Accept Answer and select Yes for "Was this answer helpful".
