Fine-tuning speech-to-text base model for better address recognition

Question

Hello, my team is creating a solution to transcribe addresses with higher accuracy. Our initial benchmarks for using a STT base model for address transcription suggests that it needs to be improved in order to be utilized in a production environment. I am wondering if Azure have base models specific to recognizing address or if there is a best way to fine-tune a base model that would have the same result.

Accepted Answer

Hello, thanks for reaching out to us, I can see three possible solutions here for your reference.

The first one is training your custom model or fine-tuning the base model, which is more related to a better speech recognition result.

Document reference for Fine-tuning - https://learn.microsoft.com/en-us/azure/ai-services/speech-service/custom-speech-ai-foundry-portal

Document reference for Custom model - https://learn.microsoft.com/en-us/azure/ai-services/speech-service/custom-speech-overview

The second one is combining Azure Language Service and Azure Speech Service, because Azure Language Service is good at extract the address, to extract an address from text using Azure Language Service, you would utilize the "Named Entity Recognition" (NER) feature, which identifies and classifies entities like locations (including addresses) within a text. You can play with it in the Azure Language Studio and see how it works on your scenario.

The last one I may try is Azure OpenAI, you may combine Azure Speech Service with Azure OpenAI to get a better result of address, but please also consider the price.

I hope this helps!

Regards,

Yutong

-Please kindly accept the answer if you feel helpful to support the community, thanks a lot.

Share via

Fine-tuning speech-to-text base model for better address recognition

0 additional answers

Your answer