Custom translator model is generating additional/unnecessary characters

Tyler Chlumecky 20 Reputation points
2025-01-14T20:04:38.76+00:00

Training appears to add unexpected characters to the test output and to the translations returned by the trained model, but only for one of my two models.

I am training two separate custom models via Azure Translator, one for each target language (es-MX and fr-CA).

I prepared three documents to train each of these models:

  1. Human-translated phrases from English to [target language]
  2. Vernacular dictionary (domain-specific phrases for proper nouns commonly used throughout our system)
  3. Token-based dictionary (phrases throughout our website use tokens for singular/plural forms of commonly used proper nouns, and we also use string interpolation on portions of the phrases we translate)
    1. For example: "There are {0} course(s) in this Curriculum" or "You have {submissionCount} submissions available for review"
      1. We do not want the placeholders inside these braces (or similar ones) to be translated, so they are defined in this dictionary

The Spanish model seems to be correct every time, and I have no issues with tokens during training or translation.

The French model, however, produces faulty output in both the test output and the translations I request through the model's API.

Instead of returning French translations of these sentences with the tokens intact:
"There are {0} course(s) in this Curriculum" or "You have {submissionCount}"

the tokens come back with an extra opening curly brace, like this:
"There are {{0} course(s) in this Curriculum" or "You have {{submissionCount}"
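
The symptom can be checked mechanically by comparing the tokens in the source string with those in the returned string. A minimal Python sketch, assuming tokens are delimited by curly braces as in the examples above; the function names are hypothetical and this is only an illustration of the check, not a fix for the model:

```python
import re

# One or more opening braces, a token name, then one or more closing braces.
# Matching runs of braces lets a doubled brace such as "{{0}" be captured whole.
PLACEHOLDER = re.compile(r"\{+[^{}]*\}+")


def placeholders(text):
    """Return every brace-delimited token found in `text`, in order."""
    return PLACEHOLDER.findall(text)


def malformed_placeholders(source, translated):
    """Return tokens in `translated` that do not appear verbatim in `source`."""
    expected = set(placeholders(source))
    return [token for token in placeholders(translated) if token not in expected]


if __name__ == "__main__":
    source = "There are {0} course(s) in this Curriculum"
    returned = "There are {{0} course(s) in this Curriculum"
    print(malformed_placeholders(source, returned))
```

An empty list means every token in the returned string matches one in the source exactly; anything else lists the tokens that were altered.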

Please advise.

Azure Translator