Concerns regarding the change to tables in Markdown output in 2024-07-31-preview

Xi, Jonathan 20 Reputation points
2024-10-27T15:09:43.2366667+00:00

Hello Azure AI Document Intelligence Team,

We have some concerns about how tables are represented in Markdown output as of the 2024-07-31-preview release:

Starting with 2024-07-31-preview, tables in Markdown output are represented as HTML tables to enable rendering of merged cells, multi-row headers, etc.

Our main concern is token usage: in our testing, HTML tables required nearly twice as many tokens as the equivalent Markdown tables. Since we're developing GenAI apps on top of Azure AI Document Intelligence, this increase means each chunk holds less context, which might reduce the accuracy of our applications.
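For anyone who wants to reproduce this kind of comparison, here is a minimal sketch. It uses a crude regex-based count (words plus individual punctuation marks) as a stand-in for a real tokenizer, so the ratio it prints is only indicative; actual token counts depend on the model's tokenizer. The sample table and the `rough_token_count` helper are made up for illustration.

```python
import re


def rough_token_count(text: str) -> int:
    # Crude proxy, NOT a real tokenizer: counts runs of word characters
    # and each punctuation mark as one "token".
    return len(re.findall(r"\w+|[^\w\s]", text))


# The same small table in both representations (hypothetical sample data).
markdown_table = (
    "| Region | Sales |\n"
    "| --- | --- |\n"
    "| East | 10 |\n"
    "| West | 12 |"
)
html_table = (
    "<table>"
    "<tr><th>Region</th><th>Sales</th></tr>"
    "<tr><td>East</td><td>10</td></tr>"
    "<tr><td>West</td><td>12</td></tr>"
    "</table>"
)

ratio = rough_token_count(html_table) / rough_token_count(markdown_table)
print(f"HTML / Markdown rough token ratio: {ratio:.2f}")
```

Swapping `rough_token_count` for the tokenizer of the target model would give numbers that match real usage.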

Could we explore alternative ways of handling merged cells and multi-row headers? For example, could cells be duplicated across the rows they span instead of converting every table to HTML? It would also help to have an option for choosing between Markdown and HTML table output, so that we can handle merged cells in Markdown ourselves if Azure can't address the issue.
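To illustrate the cell-duplication idea, here is a minimal sketch of a client-side workaround that converts an HTML table back to a Markdown table, repeating a merged cell's text in every slot it spans. It is not part of the Document Intelligence SDK; the names `html_table_to_markdown` and `_TableParser` are made up for this example. It assumes a single, non-nested table with numeric `rowspan`/`colspan` values and treats the first row as the header.

```python
from html.parser import HTMLParser


class _TableParser(HTMLParser):
    """Collects table cells as (text, rowspan, colspan) tuples, row by row."""

    def __init__(self):
        super().__init__()
        self.rows = []      # one list of cell tuples per <tr>
        self._cell = None   # text fragments of the cell currently being read
        self._span = (1, 1)

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows.append([])
        elif tag in ("td", "th"):
            if not self.rows:           # tolerate a cell without a <tr>
                self.rows.append([])
            a = dict(attrs)
            self._span = (int(a.get("rowspan") or 1), int(a.get("colspan") or 1))
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            text = " ".join("".join(self._cell).split())  # collapse whitespace
            self.rows[-1].append((text, *self._span))
            self._cell = None


def html_table_to_markdown(html: str) -> str:
    parser = _TableParser()
    parser.feed(html)

    grid = {}  # (row, col) -> cell text, with merged cells duplicated
    for r, cells in enumerate(parser.rows):
        c = 0
        for text, rowspan, colspan in cells:
            while (r, c) in grid:       # slot already filled by a rowspan above
                c += 1
            for dr in range(rowspan):
                for dc in range(colspan):
                    grid[(r + dr, c + dc)] = text
            c += colspan

    if not grid:
        return ""
    n_rows = max(r for r, _ in grid) + 1
    n_cols = max(c for _, c in grid) + 1

    lines = []
    for r in range(n_rows):
        row = [grid.get((r, c), "").replace("|", "\\|") for c in range(n_cols)]
        lines.append("| " + " | ".join(row) + " |")
        if r == 0:                      # separator after the header row
            lines.append("|" + " --- |" * n_cols)
    return "\n".join(lines)


sample = (
    "<table>"
    "<tr><th>Region</th><th>Quarter</th><th>Sales</th></tr>"
    '<tr><td rowspan="2">East</td><td>Q1</td><td>10</td></tr>'
    "<tr><td>Q2</td><td>12</td></tr>"
    "</table>"
)
print(html_table_to_markdown(sample))
```

For the sample above, "East" is repeated in both data rows, so each Markdown row is self-contained when the document is chunked. Multi-row headers are only flattened into ordinary rows here; merging them into a single header row would need extra logic.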

Thank you for considering these suggestions.

Best regards, Jonathan

Azure AI Document Intelligence
An Azure service that turns documents into usable data. Previously known as Azure Form Recognizer.