How to summarize a PDF, .json or .txt file using Azure AI?

Eefke 0 Reputation points
2025-02-04T18:15:36.1833333+00:00

For a company, I am experimenting with Azure's AI playground. I want to generate a summary of a Dutch medical document containing notes taken over a certain period of time. Does Azure provide any facilities for this?

So far, I have tried the language playground, but the text summarization function is barely customizable with regard to length and content, and it is not available in Dutch. I also tried the chat playground, but I need to feed it the document paragraph by paragraph, and it keeps giving time-out errors as my prompts grow too large over time.

Does anyone know how I can summarize a large document using the AI playground? Or, if this proves too inconvenient or impossible, are there other Azure services better suited to this?

Azure AI Language
An Azure service that provides natural language capabilities including sentiment analysis, entity extraction, and automated question answering.
Azure AI Document Intelligence
An Azure service that turns documents into usable data. Previously known as Azure Form Recognizer.
Azure AI services
A group of Azure services, SDKs, and APIs designed to make apps more intelligent, engaging, and discoverable.

1 answer

  1. Manas Mohanty (Quadrant Resource LLC) 135 Reputation points Microsoft Vendor
    2025-02-05T04:49:01.0966667+00:00

    Hi Eefke!

    Welcome to the Azure AI Q&A forum. Thank you for posting your query here.

    You are right, Dutch is not supported in text summarization. Here is an approach that works around the constraints on target language, file size, and word count using the Python SDK.

    1. A function to extract the text from PDF, JSON, and text files using the Read (OCR) model.
    2. A function to save the extracted text as smaller text files using the Python SDK, so that you stay under the character and file size limits given in the Data limits document (1 MB max for all files in a single request) when sending them for summarization.
    3. A function to translate those smaller files from Dutch to English in batches using Translator.
    4. A function to summarize each file in batches, then generate an overall summary from those partial summaries so that it covers the entire large document.
    5. Translate the final summary back to Dutch if needed.
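
A minimal sketch of steps 2 to 5, assuming the text has already been extracted in step 1. It deliberately avoids calling any Azure SDK directly: `summarize`, `to_english`, and `to_dutch` are hypothetical callables that you would implement yourself on top of the summarization and Translator clients, and the default `max_chars=5000` is an illustrative value, not a documented service limit (check the Data limits document for the real numbers).

```python
from typing import Callable, List, Optional


def split_into_chunks(text: str, max_chars: int = 5000) -> List[str]:
    """Group paragraphs (separated by blank lines) into chunks of at most max_chars."""
    chunks: List[str] = []
    current = ""
    for paragraph in text.split("\n\n"):
        paragraph = paragraph.strip()
        if not paragraph:
            continue
        # A single paragraph longer than the limit is cut into fixed-size pieces.
        while len(paragraph) > max_chars:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(paragraph[:max_chars])
            paragraph = paragraph[max_chars:]
        candidate = f"{current}\n\n{paragraph}" if current else paragraph
        if len(candidate) <= max_chars:
            current = candidate
        else:
            chunks.append(current)
            current = paragraph
    if current:
        chunks.append(current)
    return chunks


def summarize_large_text(
    text: str,
    summarize: Callable[[str], str],            # hypothetical: wraps your summarization call
    max_chars: int = 5000,                      # assumed limit, adjust to the service's limits
    to_english: Optional[Callable[[str], str]] = None,  # hypothetical: wraps Translator
    to_dutch: Optional[Callable[[str], str]] = None,    # hypothetical: wraps Translator
) -> str:
    """Summarize each chunk, then summarize the summaries until one remains."""
    chunks = split_into_chunks(text, max_chars)
    if to_english is not None:
        chunks = [to_english(chunk) for chunk in chunks]
    summaries = [summarize(chunk) for chunk in chunks]
    while len(summaries) > 1:
        merged = split_into_chunks("\n\n".join(summaries), max_chars)
        if len(merged) >= len(summaries):
            # Guard against looping forever when summaries do not get shorter.
            raise ValueError("Summaries are not shrinking; increase max_chars.")
        summaries = [summarize(chunk) for chunk in merged]
    result = summaries[0] if summaries else ""
    return to_dutch(result) if to_dutch is not None else result
```

Note that translated text can be longer than the Dutch original, so it may be safer to chunk with some headroom below the actual limit.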

    You can adjust the time to live, retry settings, sleep time, and number of files per batch to optimize the entire process.
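
As an illustration of the tunable settings above, here is a small sketch of batching plus retry with a growing sleep time. The names `batched` and `call_with_retry` and the defaults (`max_attempts=4`, `base_delay=1.0`) are assumptions made for this example, not part of any Azure SDK; `func` stands for whatever request you send per batch.

```python
import time
from typing import Callable, Iterator, List, Sequence, TypeVar

T = TypeVar("T")


def batched(items: Sequence[T], batch_size: int) -> Iterator[List[T]]:
    """Yield consecutive batches of at most batch_size items."""
    for start in range(0, len(items), batch_size):
        yield list(items[start:start + batch_size])


def call_with_retry(
    func: Callable[[], T],
    max_attempts: int = 4,       # assumed default, tune for your workload
    base_delay: float = 1.0,     # seconds before the first retry
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call func, retrying on any exception with exponentially growing sleeps."""
    for attempt in range(1, max_attempts + 1):
        try:
            return func()
        except Exception:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error
            sleep(base_delay * 2 ** (attempt - 1))
    raise RuntimeError("max_attempts must be at least 1")
```

In practice you would narrow `except Exception` to the time-out or throttling errors your client actually raises, so that genuine bugs are not retried.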

    Reference:

    Data limits

    Sample Python SDK for read model to split into paragraphs

    If this answer is helpful, please don't forget to upvote this answer and say "yes".

    Thank you.
