Question about customizing the speech recognition engine on Windows

clark-zh 40 Reputation points
2025-02-25T12:16:55.0766667+00:00

Hi MSFT member,

On Windows, there are roughly three API options for developing a speech recognition app:

  1. Windows.Media.SpeechRecognition
  2. System.Speech Programming Guide for .NET Framework
  3. Microsoft Speech API (SAPI)

Since we want a local, offline solution, the Azure Speech service is not an option.

Q1: Can an OEM customize the SR engine, and how would we build a custom SR engine around an ASR AI model? Do you have a guide for building an SR engine? Are there any restrictions?

Q2: If we can customize the ASR AI model, can we configure where the model runs, such as CPU/GPU/NPU/eNPU?

Q3: For Windows.Media.SpeechRecognition, System.Speech.Recognition and Microsoft Speech API (SAPI), where do their default models run? CPU, GPU or NPU?

Q4: Are there any permission restrictions on the use of these APIs?

Thank you very much.

C#

1 answer

  1. Jiale Xue - MSFT 49,051 Reputation points Microsoft Vendor
    2025-02-26T06:22:52.09+00:00

    Hi @clark-zh, welcome to Microsoft Q&A,

    It seems that you have already asked for input from the UWP side.

    I will answer what I can from the perspective of C#.

    For Windows.Media.SpeechRecognition, System.Speech.Recognition and Microsoft Speech API (SAPI), where do their default models run? CPU, GPU or NPU?

    They all default to CPU. Windows.Media.SpeechRecognition is optimized for UWP apps and may use NPU on newer devices.

    System.Speech.Recognition is built on the traditional desktop speech engine exposed to .NET and uses only the CPU. Microsoft Speech API (SAPI) is a legacy API and does not support GPU/NPU.

    Are there any permission restrictions on the use of these APIs?

    Windows.Media.SpeechRecognition (UWP):

    • Requires microphone and speech permissions (declared under "Capabilities" in the app manifest).
    • Speech recognition requires user consent.
    • Some features are regionally restricted (e.g. some languages may not work offline).
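
    As a minimal sketch of the manifest declaration mentioned above, assuming a standard UWP Package.appxmanifest (any other capabilities your app needs are omitted here):

    ```xml
    <!-- Package.appxmanifest (fragment): declares microphone access for the app -->
    <Capabilities>
      <DeviceCapability Name="microphone" />
    </Capabilities>
    ```

    With this declared, Windows asks the user for microphone access the first time the app tries to capture audio.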

    System.Speech.Recognition (.NET Framework): No strict permission requirements, but limited to the speech recognizers (and voices) installed on the system.
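
    To see what a given machine actually offers, the installed recognizers can be listed with SpeechRecognitionEngine.InstalledRecognizers(). A minimal sketch, assuming a Windows console project that references the System.Speech assembly (the class name ListRecognizers is only illustrative):

    ```csharp
    using System;
    using System.Speech.Recognition;

    class ListRecognizers
    {
        static void Main()
        {
            // Each RecognizerInfo describes one speech recognizer installed on this machine.
            foreach (RecognizerInfo info in SpeechRecognitionEngine.InstalledRecognizers())
            {
                // Culture shows which language the recognizer supports.
                Console.WriteLine($"{info.Name} | {info.Culture.Name} | {info.Description}");
            }
        }
    }
    ```

    If the list is empty or lacks the language you need, a matching language pack with speech support probably has to be installed before System.Speech.Recognition can be used for that language.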

    Microsoft Speech API (SAPI):

    • No explicit permission requirements, but requires SAPI-compatible engines and voices to be installed.
    • Some voices have license restrictions.

    For more details, I recommend asking someone who specializes in this area.

    Best Regards,

    Jiale

