Custom translator model is generating additional/unnecessary characters
I have noticed an issue where the training mechanism seems to be adding unexpected characters to the test-output/trained model, but only for one of my models.
I have two separate custom models that I am training via Azure Translator, one for each target language (es-MX and fr-CA).
There are 3 documents I have prepared and provided in order to train each of these models:
- Human translated phrases from English to [target language]
- Vernacular Dictionary (Domain-specific phrases for Proper nouns commonly used throughout our system)
- Token-based Dictionary (Phrases throughout our website use tokens for singular/plural representations of commonly used Proper nouns, as well as additional use cases where we do string interpolation on portions of the phrases that we translate)
  - For example: "There are {0} course(s) in this Curriculum" or "You have {submissionCount} submissions available for review"
  - We do not want the placeholders inside these curly braces (or similar ones) to be translated, so they have been defined in this dictionary
The Spanish model seems to be correct every time and I have no issues when training/translating tokens as needed.
The French model, however, produces malformed tokens in both the Test output and the translations I request through the model's API.
Instead of obtaining the representation in French for these sentences:
"There are {0} course(s) in this Curriculum" or "You have {submissionCount}"
the output ends up with an extra opening curly brace on each token, like this:
"There are {{0} course(s) in this Curriculum" or "You have {{submissionCount}"
Please advise.
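
To make the symptom easier to reproduce, here is a minimal Python sketch of how the output could be checked, and patched as a stopgap, on the client side. It is not part of the Azure service or of my training setup: the function names are hypothetical, and it assumes tokens never contain spaces or nested braces.

```python
import re
from collections import Counter

# Assumption: a token is a run of one or more "{", a name without spaces
# or braces, then one or more "}". This also catches the malformed "{{0}".
PLACEHOLDER = re.compile(r"\{+[^{}\s]+\}+")


def extract_placeholders(text):
    """Return every token-like span found in the text, in order."""
    return PLACEHOLDER.findall(text)


def placeholders_preserved(source, translated):
    """True if the translated text carries exactly the same tokens as the source."""
    return Counter(extract_placeholders(source)) == Counter(
        extract_placeholders(translated)
    )


def collapse_doubled_braces(text):
    """Stopgap: rewrite any "{{name}" or "{name}}" style token as "{name}"."""
    return re.sub(r"\{+([^{}\s]+)\}+", r"{\1}", text)


source = "There are {0} course(s) in this Curriculum"
model_output = "There are {{0} course(s) in this Curriculum"

if not placeholders_preserved(source, model_output):
    model_output = collapse_doubled_braces(model_output)
```

Collapsing braces this way would also flatten any intentional "{{" escape sequences, so it only hides the problem in the returned text and does not explain why the fr-CA model produces the extra brace.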