Speech Recognition customize engine question on Windows

clark-zh 40

Hi MSFT member,

On Windows, there are about three API options for developing a speech recognition app: Windows.Media.SpeechRecognition, System.Speech.Recognition, and the Microsoft Speech API (SAPI).

Since we want a local, offline solution, the Azure Speech service is out of scope.

Q1: Can an OEM customize the SR engine? If so, how do we build a custom SR engine around an ASR AI model? Is there a guide document for building an SR engine? Are there any restrictions?

Q2: If we can customize the ASR AI model, can we configure where the model runs, such as CPU/GPU/NPU/eNPU?

Q3: For Windows.Media.SpeechRecognition, System.Speech.Recognition, and Microsoft Speech API (SAPI), where do their default models run? CPU, GPU, or NPU?

Q4: Are there any permission restrictions on the use of these APIs?

Thank you very much.

1 answer

Jiale Xue - MSFT 49,051 Reputation points Microsoft Vendor

2025-02-26T06:22:52.09+00:00
Hi @clark-zh, welcome to Microsoft Q&A.

It seems you have already asked the UWP side for their opinion.

I will answer what I can from the perspective of C#.

For Windows.Media.SpeechRecognition, System.Speech.Recognition, and Microsoft Speech API (SAPI), where do their default models run? CPU, GPU, or NPU?

They all default to the CPU. Windows.Media.SpeechRecognition is optimized for UWP apps and may use the NPU on newer devices.

System.Speech.Recognition is based on the traditional .NET speech engine and uses only the CPU. Microsoft Speech API (SAPI) is a legacy API and does not support GPU/NPU.

Are there any permission restrictions on the use of these APIs?

Windows.Media.SpeechRecognition (UWP):

Requires microphone and voice permissions (declared under "Capabilities" in the app manifest).

Speech recognition requires user consent.

Some features are regionally restricted (e.g. some languages may not work offline).

System.Speech.Recognition (.NET Framework): No strict permission requirements, but limited to installed system voices.

Microsoft Speech API (SAPI):

No explicit permission requirements, but requires installation of SAPI-compatible voices.

Some voices have license restrictions.

For more details, I recommend asking someone who specializes in this area.

Best Regards,

Jiale

clark-zh 40 Reputation points

2025-02-26T07:44:04.9033333+00:00

Hi Jiale,

Thank you very much for your reply.

Can you answer question 1?

Q1: Can an OEM customize the SR engine? If so, how do we build a custom SR engine around an ASR AI model? Is there a guide document for building an SR engine?

Also, if we can run the SR engine or ASR model on a GPU/NPU, what solution do you recommend?

Best Regards

Jiale Xue - MSFT 49,051 Reputation points Microsoft Vendor

2025-02-26T07:56:01.0166667+00:00

Microsoft does not provide public APIs (in Windows.Media.SpeechRecognition, System.Speech.Recognition, or SAPI) for modifying the built-in speech engines, and Windows does not allow replacing the built-in SR engines of Windows.Media.SpeechRecognition or System.Speech.Recognition. You can use machine learning frameworks to develop a custom SR engine, but that is outside what this forum's C# tag supports. You can add the ".NET Machine learning" tag, and perhaps a relevant member can help you. Training models requires open or proprietary datasets.

clark-zh 40 Reputation points

2025-02-26T08:11:06.7933333+00:00

Can a Windows laptop OEM develop a custom SR engine and then put the engine inside the Windows kernel or a driver?

Jiale Xue - MSFT 49,051 Reputation points Microsoft Vendor

2025-02-26T08:19:36.2966667+00:00

In theory, OEMs can develop custom speech recognition (SR) engines, but Windows does not allow direct replacement or modification of the system's built-in SR engine. This forum cannot offer further support on that topic.

clark-zh 40 Reputation points

2025-02-26T08:40:07.9166667+00:00

From https://learn.microsoft.com/zh-cn/previous-versions/windows/desktop/ee125096(v=vs.85), it seems we can register a new custom engine. Can we then use this engine from the engine collection?

ref: https://learn.microsoft.com/en-us/dotnet/api/system.speech.recognition.speechrecognitionengine.installedrecognizers?view=net-9.0-pp

foreach (RecognizerInfo ri in SpeechRecognitionEngine.InstalledRecognizers())

Jiale Xue - MSFT 49,051 Reputation points Microsoft Vendor

2025-02-26T08:51:30.0133333+00:00

Yes, you can call it if that meets your requirements.

clark-zh 40 Reputation points

2025-02-26T11:12:02.1+00:00

Hi Jiale,

You said Windows does not allow direct replacement or modification of the system's built-in SR engine. Does that mean we can develop a sample engine but cannot deploy or publish it to users' Windows computers? We are concerned about whether we, as an OEM, can add a custom SR engine to Windows.

If we register an engine through SAPI, can we then use it through the System.Speech.Recognition API?

Registration command:

regsvr32 "C:\Program Files\Microsoft Speech SDK5.0\bin\sreng.dll"

B.R.

Jiale Xue - MSFT 49,051 Reputation points Microsoft Vendor

2025-02-27T02:10:16.28+00:00

OEMs can develop their own SR engines and register them with the Microsoft Speech API (SAPI) to make them available on Windows, similar to a piece of software. System.Speech.Recognition relies on SAPI, so if your SR engine has been registered with SAPI and complies with the SAPI SR Engine specification, System.Speech.Recognition can use it.
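As a rough illustration of the point above, here is a minimal C# sketch that lists the recognizers System.Speech.Recognition can see and opens one by name. It assumes the custom engine is already registered with SAPI in a way System.Speech can discover; the name fragment "Contoso" and the tiny yes/no grammar are hypothetical placeholders, not something from this thread.

```csharp
using System;
using System.Linq;
using System.Speech.Recognition; // Windows only; on modern .NET, add the System.Speech NuGet package

class ListRecognizers
{
    // Hypothetical name fragment; replace it with the name your engine registers under.
    const string CustomEngineName = "Contoso";

    static void Main()
    {
        // Every recognizer that System.Speech can see on this machine.
        var recognizers = SpeechRecognitionEngine.InstalledRecognizers();
        foreach (RecognizerInfo ri in recognizers)
        {
            Console.WriteLine($"{ri.Id} | {ri.Name} | {ri.Culture.Name}");
        }

        // Pick the first recognizer whose display name contains the fragment.
        RecognizerInfo custom = recognizers.FirstOrDefault(
            ri => ri.Name.IndexOf(CustomEngineName, StringComparison.OrdinalIgnoreCase) >= 0);

        if (custom == null)
        {
            Console.WriteLine("Custom engine not found; check its SAPI registration.");
            return;
        }

        using (var engine = new SpeechRecognitionEngine(custom))
        {
            // The grammar culture must match the recognizer culture.
            var builder = new GrammarBuilder(new Choices("yes", "no"));
            builder.Culture = custom.Culture;
            engine.LoadGrammar(new Grammar(builder));

            engine.SetInputToDefaultAudioDevice();

            // Listen once; Recognize returns null if nothing was recognized.
            RecognitionResult result = engine.Recognize(TimeSpan.FromSeconds(5));
            Console.WriteLine(result?.Text ?? "(no speech recognized)");
        }
    }
}
```

If the custom engine does not show up in the printed list, the registration is the first thing to check before looking at the application code.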

clark-zh 40 Reputation points

2025-02-27T03:12:02.68+00:00

Thanks.

How about Q1?

You said Windows does not allow direct replacement or modification of the system's built-in SR engine. Does that mean we can develop a sample engine but cannot deploy or publish it to users' Windows computers? We are concerned about whether we, as an OEM, can add a custom SR engine to Windows.

Jiale Xue - MSFT 49,051 Reputation points Microsoft Vendor

2025-02-27T03:15:00.4133333+00:00

OEMs can develop their own SR engines and register them with the Microsoft Speech API (SAPI) to make them available on Windows, similar to a piece of software.

clark-zh 40 Reputation points

2025-02-28T08:22:54.27+00:00

Hi Jiale,

Thanks for your patient reply.

There are some features such as Grammar and Use Custom Pronunciation.

Where does Grammar work: at the engine level, or at an upper layer such as the application interface level? And at which level does Custom Pronunciation work? Is there an architecture diagram that would help us understand this better?

Thanks again.

Jiale Xue - MSFT 49,051 Reputation points Microsoft Vendor

2025-02-28T08:28:09.04+00:00

Sorry, I only have a general idea of it, and this in-depth material is beyond the scope of support. If the above helped you, I hope you can mark it as an answer.