Training a Custom Translator Model with Less Than 10,000 Sentences

Tyler Chlumecky 20 Reputation points
2025-01-06T19:45:36.96+00:00

Is there a workaround for training a custom translator model with a dataset of approximately 9.4k translated sentences? This is just below the 10k sentence threshold, so I am unable to begin training a custom model.

Azure Translator
An Azure service to easily conduct machine translation with a simple REST API call.

Accepted answer
  1. SriLakshmi C 2,010 Reputation points Microsoft Vendor
    2025-01-06T20:22:40.0533333+00:00

    Hello Tyler Chlumecky,

    Greetings and welcome to Microsoft Q&A! Thanks for posting the question.

    Training a Custom Translator model typically requires at least 10,000 sentence pairs to start the process. However, there are a few workarounds you can consider:

    Dictionary-only model: For better results, we recommend letting the system learn from your training data. However, when you don't have enough parallel sentences to meet the 10,000-sentence minimum, or when sentences and compound nouns must be rendered as-is, use dictionary-only training. A dictionary-only model typically completes training faster than a full training run. See "When to select dictionary-only training" in the Custom Translator documentation.

    Data augmentation: Create additional sentence pairs by rewriting sentences, splitting or combining them, substituting synonyms, or translating them back and forth. These methods expand and diversify your dataset so that it can pass the 10,000-sentence threshold.
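
    The "translating back and forth" idea can be sketched as follows. This is a minimal illustration, not an official Custom Translator feature: the function name `augment_by_round_trip` and the `translate` callable are hypothetical, and you would supply your own `translate` implementation (for example, a wrapper around whichever translation service you use). Each source sentence is translated into a pivot language and back, and any paraphrase that differs from the existing source sentences is paired with the original target sentence.

```python
from typing import Callable, Iterable, List, Tuple

Pair = Tuple[str, str]
# Hypothetical signature: (text, from_lang, to_lang) -> translated text.
TranslateFn = Callable[[str, str, str], str]


def augment_by_round_trip(
    pairs: Iterable[Pair],
    translate: TranslateFn,
    source_lang: str,
    pivot_lang: str,
) -> List[Pair]:
    """Return the original pairs plus round-trip paraphrases of the source side."""
    original = list(pairs)
    # Track source sentences already present so no duplicates are added.
    seen = {src.strip().lower() for src, _ in original}
    augmented = list(original)
    for src, tgt in original:
        pivot = translate(src, source_lang, pivot_lang)
        paraphrase = translate(pivot, pivot_lang, source_lang).strip()
        key = paraphrase.lower()
        # Keep only non-empty paraphrases that differ from every known source.
        if paraphrase and key not in seen:
            seen.add(key)
            augmented.append((paraphrase, tgt))
    return augmented


if __name__ == "__main__":
    # Stub translator for demonstration only; replace with a real service call.
    table = {
        ("The invoice is overdue.", "en", "fr"): "La facture est en retard.",
        ("La facture est en retard.", "fr", "en"): "The invoice is late.",
    }

    def stub_translate(text: str, from_lang: str, to_lang: str) -> str:
        return table.get((text, from_lang, to_lang), text)

    data = [("The invoice is overdue.", "Die Rechnung ist überfällig.")]
    for pair in augment_by_round_trip(data, stub_translate, "en", "fr"):
        print(pair)
```

    Review the generated pairs before uploading them: round-trip paraphrases can drift in meaning, and low-quality synthetic sentences may hurt the trained model more than they help.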

    Also refer to: https://learn.microsoft.com/en-us/azure/ai-services/translator/custom-translator/beginners-guide#is-a-custom-translation-model-the-right-choice-for-me

    I hope this helps. Do let me know if you have any further questions.


    If this answers your query, please click "Accept Answer" and select "Yes" for "Was this answer helpful".

    Thank you!
