Hi Eefke!
Welcome to the Azure AI Q&A forum. Thank you for posting your query here.
Yes, that is correct: Dutch is not supported in text summarization. Here is an approach to handle the constraints on target language, file size, and word count using the Python SDK:
- A function to extract the text from PDF, JSON, and text files using the Read (OCR) model.
- A function to save the extracted text as smaller text files, so that you stay under the character and file size limits given in the data limits documentation (1 MB max for all files in a single request) when sending them for summarization.
- A function to translate those smaller files from Dutch to English using Translator, in batches.
- A function to summarize each document in batches, then generate an overall summary from those summaries to cover the entire large document.
- Translate the summary back to Dutch if needed.
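The splitting and batching steps can be sketched in plain Python, independent of any Azure client. This is only a minimal sketch: `chunk_text` and `batched` are hypothetical helper names, and the `max_chars` and `batch_size` values are placeholders that you would set from the limits in the data limits documentation. The actual Read, Translator, and summarization calls are left out.

```python
from typing import Iterator, List


def chunk_text(text: str, max_chars: int) -> List[str]:
    """Split text into chunks of at most max_chars, preferring paragraph breaks.

    Hypothetical helper: max_chars is an assumption to be set from the
    service's documented character limit.
    """
    if max_chars <= 0:
        raise ValueError("max_chars must be positive")
    chunks: List[str] = []
    current = ""
    for paragraph in text.split("\n\n"):
        paragraph = paragraph.strip()
        if not paragraph:
            continue
        # A single paragraph over the limit is hard-split by character count.
        while len(paragraph) > max_chars:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(paragraph[:max_chars])
            paragraph = paragraph[max_chars:]
        candidate = f"{current}\n\n{paragraph}" if current else paragraph
        if len(candidate) <= max_chars:
            current = candidate
        else:
            chunks.append(current)
            current = paragraph
    if current:
        chunks.append(current)
    return chunks


def batched(items: List[str], batch_size: int) -> Iterator[List[str]]:
    """Yield consecutive groups of at most batch_size items."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]
```

Each batch produced by `batched(chunk_text(text, max_chars), batch_size)` can then be passed to your translation and summarization functions in turn.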
You can adjust the time to live, retry settings, sleep time, and number of files per batch to optimize the entire process.
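As a rough illustration of the retry and sleep settings, a generic wrapper like the one below can be placed around each batch call. `call_with_retry` is a hypothetical helper, not part of any Azure SDK, and the default attempt count and sleep time are assumptions to tune for your workload.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_retry(
    operation: Callable[[], T],
    max_attempts: int = 3,
    sleep_seconds: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run operation, retrying on any exception with a growing pause.

    Hypothetical helper: the defaults are placeholders, not recommended values.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except Exception:
            # Re-raise once the last allowed attempt has failed.
            if attempt == max_attempts:
                raise
            # Wait longer after each failed attempt before trying again.
            sleep(sleep_seconds * attempt)
    raise RuntimeError("unreachable")
```

For example, `call_with_retry(lambda: summarize_batch(batch))`, where `summarize_batch` stands in for your own summarization function.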
Reference:
Sample Python SDK for read model to split into paragraphs
If this answer is helpful, please don't forget to upvote this answer and say "yes".
Thank you.