Use the Microsoft Audio Stack (MAS)

Article
10/16/2024

The Speech SDK integrates the Microsoft Audio Stack (MAS), so any application or product can apply its audio processing capabilities to input audio. For an overview, see the audio processing documentation.

In this article, you learn how to use the Microsoft Audio Stack (MAS) with the Speech SDK.

Important

On Speech SDK for C++ and C# v1.33.0 and later, the Microsoft.CognitiveServices.Speech.Extension.MAS package must be installed to use the Microsoft Audio Stack on Windows, and also on Linux if you install the Speech SDK with NuGet.

Default options

This sample shows how to use MAS with all default enhancement options on input from the device's default microphone.

// C#
using Microsoft.CognitiveServices.Speech;
using Microsoft.CognitiveServices.Speech.Audio;

var speechConfig = SpeechConfig.FromSubscription("YourSubscriptionKey", "YourServiceRegion");

var audioProcessingOptions = AudioProcessingOptions.Create(AudioProcessingConstants.AUDIO_INPUT_PROCESSING_ENABLE_DEFAULT);
var audioInput = AudioConfig.FromDefaultMicrophoneInput(audioProcessingOptions);

var recognizer = new SpeechRecognizer(speechConfig, audioInput);

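// C++
#include <speechapi_cxx.h>
using namespace Microsoft::CognitiveServices::Speech;
using namespace Microsoft::CognitiveServices::Speech::Audio;

auto speechConfig = SpeechConfig::FromSubscription("YourSubscriptionKey", "YourServiceRegion");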

auto audioProcessingOptions = AudioProcessingOptions::Create(AUDIO_INPUT_PROCESSING_ENABLE_DEFAULT);
auto audioInput = AudioConfig::FromDefaultMicrophoneInput(audioProcessingOptions);

auto recognizer = SpeechRecognizer::FromConfig(speechConfig, audioInput);

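// Java
import com.microsoft.cognitiveservices.speech.*;
import com.microsoft.cognitiveservices.speech.audio.*;

SpeechConfig speechConfig = SpeechConfig.fromSubscription("YourSubscriptionKey", "YourServiceRegion");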

AudioProcessingOptions audioProcessingOptions = AudioProcessingOptions.create(AudioProcessingConstants.AUDIO_INPUT_PROCESSING_ENABLE_DEFAULT);
AudioConfig audioInput = AudioConfig.fromDefaultMicrophoneInput(audioProcessingOptions);

SpeechRecognizer recognizer = new SpeechRecognizer(speechConfig, audioInput);

Preset microphone geometry

This sample shows how to use MAS with a predefined microphone geometry on a specified audio input device. In this example:

Enhancement options: The default enhancements are applied to the input audio stream.
Preset geometry: The preset geometry represents a linear array of two microphones.
Audio input device: The audio input device ID is hw:0,1. For more information, see How to select an audio input device with the Speech SDK.

// C#
var speechConfig = SpeechConfig.FromSubscription("YourSubscriptionKey", "YourServiceRegion");

var audioProcessingOptions = AudioProcessingOptions.Create(AudioProcessingConstants.AUDIO_INPUT_PROCESSING_ENABLE_DEFAULT, PresetMicrophoneArrayGeometry.Linear2);
var audioInput = AudioConfig.FromMicrophoneInput("hw:0,1", audioProcessingOptions);

var recognizer = new SpeechRecognizer(speechConfig, audioInput);

// C++
auto speechConfig = SpeechConfig::FromSubscription("YourSubscriptionKey", "YourServiceRegion");

auto audioProcessingOptions = AudioProcessingOptions::Create(AUDIO_INPUT_PROCESSING_ENABLE_DEFAULT, PresetMicrophoneArrayGeometry::Linear2);
auto audioInput = AudioConfig::FromMicrophoneInput("hw:0,1", audioProcessingOptions);

auto recognizer = SpeechRecognizer::FromConfig(speechConfig, audioInput);

// Java
SpeechConfig speechConfig = SpeechConfig.fromSubscription("YourSubscriptionKey", "YourServiceRegion");

AudioProcessingOptions audioProcessingOptions = AudioProcessingOptions.create(AudioProcessingConstants.AUDIO_INPUT_PROCESSING_ENABLE_DEFAULT, PresetMicrophoneArrayGeometry.Linear2);
AudioConfig audioInput = AudioConfig.fromMicrophoneInput("hw:0,1", audioProcessingOptions);

SpeechRecognizer recognizer = new SpeechRecognizer(speechConfig, audioInput);

Custom microphone geometry

This sample shows how to use MAS with a custom microphone geometry on audio input read from a file. In this example:

Enhancement options: The default enhancements are applied to the input audio stream.
Custom geometry: A custom microphone geometry for a 7-microphone array is provided through the microphone coordinates. The coordinates are in millimeters.
Audio input: The audio input comes from a file. The audio in the file is expected to have been captured by an audio input device that matches the specified custom geometry.

// C#
var speechConfig = SpeechConfig.FromSubscription("YourSubscriptionKey", "YourServiceRegion");

MicrophoneCoordinates[] microphoneCoordinates = new MicrophoneCoordinates[7]
{
    new MicrophoneCoordinates(0, 0, 0),
    new MicrophoneCoordinates(40, 0, 0),
    new MicrophoneCoordinates(20, -35, 0),
    new MicrophoneCoordinates(-20, -35, 0),
    new MicrophoneCoordinates(-40, 0, 0),
    new MicrophoneCoordinates(-20, 35, 0),
    new MicrophoneCoordinates(20, 35, 0)
};
var microphoneArrayGeometry = new MicrophoneArrayGeometry(MicrophoneArrayType.Planar, microphoneCoordinates);
var audioProcessingOptions = AudioProcessingOptions.Create(AudioProcessingConstants.AUDIO_INPUT_PROCESSING_ENABLE_DEFAULT, microphoneArrayGeometry, SpeakerReferenceChannel.LastChannel);
var audioInput = AudioConfig.FromWavFileInput("katiesteve.wav", audioProcessingOptions);

var recognizer = new SpeechRecognizer(speechConfig, audioInput);

// C++
auto speechConfig = SpeechConfig::FromSubscription("YourSubscriptionKey", "YourServiceRegion");

MicrophoneArrayGeometry microphoneArrayGeometry
{
    MicrophoneArrayType::Planar,
    { { 0, 0, 0 }, { 40, 0, 0 }, { 20, -35, 0 }, { -20, -35, 0 }, { -40, 0, 0 }, { -20, 35, 0 }, { 20, 35, 0 } }
};
auto audioProcessingOptions = AudioProcessingOptions::Create(AUDIO_INPUT_PROCESSING_ENABLE_DEFAULT, microphoneArrayGeometry, SpeakerReferenceChannel::LastChannel);
auto audioInput = AudioConfig::FromWavFileInput("katiesteve.wav", audioProcessingOptions);

auto recognizer = SpeechRecognizer::FromConfig(speechConfig, audioInput);

// Java
SpeechConfig speechConfig = SpeechConfig.fromSubscription("YourSubscriptionKey", "YourServiceRegion");

MicrophoneCoordinates[] microphoneCoordinates = new MicrophoneCoordinates[7];
microphoneCoordinates[0] = new MicrophoneCoordinates(0, 0, 0);
microphoneCoordinates[1] = new MicrophoneCoordinates(40, 0, 0);
microphoneCoordinates[2] = new MicrophoneCoordinates(20, -35, 0);
microphoneCoordinates[3] = new MicrophoneCoordinates(-20, -35, 0);
microphoneCoordinates[4] = new MicrophoneCoordinates(-40, 0, 0);
microphoneCoordinates[5] = new MicrophoneCoordinates(-20, 35, 0);
microphoneCoordinates[6] = new MicrophoneCoordinates(20, 35, 0);
MicrophoneArrayGeometry microphoneArrayGeometry = new MicrophoneArrayGeometry(MicrophoneArrayType.Planar, microphoneCoordinates);
AudioProcessingOptions audioProcessingOptions = AudioProcessingOptions.create(AudioProcessingConstants.AUDIO_INPUT_PROCESSING_ENABLE_DEFAULT, microphoneArrayGeometry, SpeakerReferenceChannel.LastChannel);
AudioConfig audioInput = AudioConfig.fromWavFileInput("katiesteve.wav", audioProcessingOptions);

SpeechRecognizer recognizer = new SpeechRecognizer(speechConfig, audioInput);
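
The seven coordinates in this sample appear to describe one centre microphone surrounded by six microphones spaced 60 degrees apart on a circle of roughly 40 mm radius. As a minimal sketch of how such a list could be generated instead of typed by hand, the following self-contained Java helper reproduces those values rounded to whole millimeters. The class HexArrayLayout and its method are hypothetical and not part of the Speech SDK, and the hexagonal reading of the layout is an assumption.

```java
import java.util.Arrays;

// Hypothetical helper, not part of the Speech SDK.
final class HexArrayLayout {

    /** Returns {x, y, z} in millimeters: one centre mic plus six mics on a circle. */
    static int[][] coordinates(int radiusMm) {
        int[][] mics = new int[7][];
        mics[0] = new int[] {0, 0, 0};               // centre microphone
        for (int i = 0; i < 6; i++) {
            // Step clockwise in 60-degree increments, starting on the positive x-axis.
            double angle = Math.toRadians(-60.0 * i);
            int x = (int) Math.round(radiusMm * Math.cos(angle));
            int y = (int) Math.round(radiusMm * Math.sin(angle));
            mics[i + 1] = new int[] {x, y, 0};       // all mics lie in the z = 0 plane
        }
        return mics;
    }

    public static void main(String[] args) {
        for (int[] mic : coordinates(40)) {
            System.out.println(Arrays.toString(mic));
        }
    }
}
```

Each row could then be passed to a MicrophoneCoordinates constructor in the same order as in the samples above.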

Select enhancements

This sample shows how to use MAS with a custom set of enhancements on the input audio. By default, all enhancements are enabled, but dereverberation, noise suppression, automatic gain control, and echo cancellation can each be disabled individually through AudioProcessingOptions.

In this example:

Enhancement options: Echo cancellation and noise suppression are disabled, while all other enhancements remain enabled.
Audio input device: The audio input device is the device's default microphone.

// C#
var speechConfig = SpeechConfig.FromSubscription("YourSubscriptionKey", "YourServiceRegion");

var audioProcessingOptions = AudioProcessingOptions.Create(AudioProcessingConstants.AUDIO_INPUT_PROCESSING_DISABLE_ECHO_CANCELLATION | AudioProcessingConstants.AUDIO_INPUT_PROCESSING_DISABLE_NOISE_SUPPRESSION | AudioProcessingConstants.AUDIO_INPUT_PROCESSING_ENABLE_DEFAULT);
var audioInput = AudioConfig.FromDefaultMicrophoneInput(audioProcessingOptions);

var recognizer = new SpeechRecognizer(speechConfig, audioInput);

// C++
auto speechConfig = SpeechConfig::FromSubscription("YourSubscriptionKey", "YourServiceRegion");

auto audioProcessingOptions = AudioProcessingOptions::Create(AUDIO_INPUT_PROCESSING_DISABLE_ECHO_CANCELLATION | AUDIO_INPUT_PROCESSING_DISABLE_NOISE_SUPPRESSION | AUDIO_INPUT_PROCESSING_ENABLE_DEFAULT);
auto audioInput = AudioConfig::FromDefaultMicrophoneInput(audioProcessingOptions);

auto recognizer = SpeechRecognizer::FromConfig(speechConfig, audioInput);

// Java
SpeechConfig speechConfig = SpeechConfig.fromSubscription("YourSubscriptionKey", "YourServiceRegion");

AudioProcessingOptions audioProcessingOptions = AudioProcessingOptions.create(AudioProcessingConstants.AUDIO_INPUT_PROCESSING_DISABLE_ECHO_CANCELLATION | AudioProcessingConstants.AUDIO_INPUT_PROCESSING_DISABLE_NOISE_SUPPRESSION | AudioProcessingConstants.AUDIO_INPUT_PROCESSING_ENABLE_DEFAULT);
AudioConfig audioInput = AudioConfig.fromDefaultMicrophoneInput(audioProcessingOptions);

SpeechRecognizer recognizer = new SpeechRecognizer(speechConfig, audioInput);
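
The samples above combine several constants with the bitwise OR operator. As a rough illustration of that bit-flag pattern only, the following self-contained Java sketch models "everything on by default, minus the explicitly disabled items". The class EnhancementFlags, its constant names, and especially the numeric bit values are hypothetical placeholders chosen for this example. They are not the SDK's actual values, and the decoding logic is an assumption about the described behaviour, not the SDK's implementation.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of the flag pattern; values are NOT the Speech SDK's constants.
final class EnhancementFlags {
    static final int ENABLE_DEFAULT            = 0x01; // placeholder value
    static final int DISABLE_DEREVERBERATION   = 0x02; // placeholder value
    static final int DISABLE_NOISE_SUPPRESSION = 0x04; // placeholder value
    static final int DISABLE_GAIN_CONTROL      = 0x08; // placeholder value
    static final int DISABLE_ECHO_CANCELLATION = 0x10; // placeholder value

    /** Lists the enhancements that remain active for a combined flag value. */
    static List<String> active(int flags) {
        List<String> on = new ArrayList<>();
        if ((flags & ENABLE_DEFAULT) == 0) {
            return on; // defaults were never switched on, so nothing is active
        }
        if ((flags & DISABLE_DEREVERBERATION) == 0)   on.add("dereverberation");
        if ((flags & DISABLE_NOISE_SUPPRESSION) == 0) on.add("noise suppression");
        if ((flags & DISABLE_GAIN_CONTROL) == 0)      on.add("automatic gain control");
        if ((flags & DISABLE_ECHO_CANCELLATION) == 0) on.add("echo cancellation");
        return on;
    }

    public static void main(String[] args) {
        int flags = DISABLE_ECHO_CANCELLATION | DISABLE_NOISE_SUPPRESSION | ENABLE_DEFAULT;
        System.out.println(active(flags));
    }
}
```

Because each option occupies its own bit in this model, the order of the operands in the OR expression does not matter.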

Specify beamforming angles

This sample shows how to use MAS with a custom microphone geometry and beamforming angles on audio input from a push stream. In this example:

Enhancement options: The default enhancements are applied to the input audio stream.
Custom geometry: A custom microphone geometry for a 4-microphone array is provided by specifying the microphone coordinates. The coordinates are in millimeters.
Beamforming angles: Beamforming angles are specified to optimize for audio originating within that range. The angles are in degrees.
Audio input: The audio input comes from a push stream. The audio in the stream is expected to have been captured by an audio input device that matches the specified custom geometry.

In the following code example, the start angle is set to 70 degrees and the end angle is set to 110 degrees.

// C#
var speechConfig = SpeechConfig.FromSubscription("YourSubscriptionKey", "YourServiceRegion");

MicrophoneCoordinates[] microphoneCoordinates = new MicrophoneCoordinates[4]
{
    new MicrophoneCoordinates(-60, 0, 0),
    new MicrophoneCoordinates(-20, 0, 0),
    new MicrophoneCoordinates(20, 0, 0),
    new MicrophoneCoordinates(60, 0, 0)
};
var microphoneArrayGeometry = new MicrophoneArrayGeometry(MicrophoneArrayType.Linear, 70, 110, microphoneCoordinates);
var audioProcessingOptions = AudioProcessingOptions.Create(AudioProcessingConstants.AUDIO_INPUT_PROCESSING_ENABLE_DEFAULT, microphoneArrayGeometry, SpeakerReferenceChannel.LastChannel);
var pushStream = AudioInputStream.CreatePushStream();
var audioInput = AudioConfig.FromStreamInput(pushStream, audioProcessingOptions);

var recognizer = new SpeechRecognizer(speechConfig, audioInput);

// C++
auto speechConfig = SpeechConfig::FromSubscription("YourSubscriptionKey", "YourServiceRegion");

MicrophoneArrayGeometry microphoneArrayGeometry
{
    MicrophoneArrayType::Linear,
    70,
    110,
    { { -60, 0, 0 }, { -20, 0, 0 }, { 20, 0, 0 }, { 60, 0, 0 } }
};
auto audioProcessingOptions = AudioProcessingOptions::Create(AUDIO_INPUT_PROCESSING_ENABLE_DEFAULT, microphoneArrayGeometry, SpeakerReferenceChannel::LastChannel);
auto pushStream = AudioInputStream::CreatePushStream();
auto audioInput = AudioConfig::FromStreamInput(pushStream, audioProcessingOptions);

auto recognizer = SpeechRecognizer::FromConfig(speechConfig, audioInput);

// Java
SpeechConfig speechConfig = SpeechConfig.fromSubscription("YourSubscriptionKey", "YourServiceRegion");

MicrophoneCoordinates[] microphoneCoordinates = new MicrophoneCoordinates[4];
microphoneCoordinates[0] = new MicrophoneCoordinates(-60, 0, 0);
microphoneCoordinates[1] = new MicrophoneCoordinates(-20, 0, 0);
microphoneCoordinates[2] = new MicrophoneCoordinates(20, 0, 0);
microphoneCoordinates[3] = new MicrophoneCoordinates(60, 0, 0);
MicrophoneArrayGeometry microphoneArrayGeometry = new MicrophoneArrayGeometry(MicrophoneArrayType.Linear, 70, 110, microphoneCoordinates);
AudioProcessingOptions audioProcessingOptions = AudioProcessingOptions.create(AudioProcessingConstants.AUDIO_INPUT_PROCESSING_ENABLE_DEFAULT, microphoneArrayGeometry, SpeakerReferenceChannel.LastChannel);
PushAudioInputStream pushStream = AudioInputStream.createPushStream();
AudioConfig audioInput = AudioConfig.fromStreamInput(pushStream, audioProcessingOptions);

SpeechRecognizer recognizer = new SpeechRecognizer(speechConfig, audioInput);
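
To get a feel for what a 70 to 110 degree range covers, the following self-contained Java sketch computes the bearing of a sound source and checks whether it falls inside a start/end range. It assumes that angles are measured counter-clockwise from the positive x-axis of the coordinate system used for the microphone coordinates, so that 90 degrees is directly in front of a linear array laid out along the x-axis. That convention is an assumption made for this illustration, and the class BeamRange and its methods are hypothetical, not part of the Speech SDK.

```java
// Hypothetical helper, not part of the Speech SDK.
final class BeamRange {

    /** Bearing of point (x, y) in degrees, in [0, 360), counter-clockwise from the +x axis (assumed convention). */
    static double bearingDegrees(double x, double y) {
        double deg = Math.toDegrees(Math.atan2(y, x));
        return deg < 0 ? deg + 360.0 : deg;
    }

    /** True if the bearing lies within the inclusive range [startDeg, endDeg]. */
    static boolean inRange(double bearingDeg, int startDeg, int endDeg) {
        return bearingDeg >= startDeg && bearingDeg <= endDeg;
    }

    public static void main(String[] args) {
        // A talker 1 m straight in front of the array, and one 1 m off to the side.
        double front = bearingDegrees(0, 1000);
        double side = bearingDegrees(1000, 0);
        System.out.println("front in range: " + inRange(front, 70, 110));
        System.out.println("side in range:  " + inRange(side, 70, 110));
    }
}
```

Under these assumptions, a talker directly in front of the array falls inside the range, while one positioned along the array's axis does not.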

Reference channel for echo cancellation

Microsoft Audio Stack requires a reference channel (also known as a loopback channel) to perform echo cancellation. The source of the reference channel differs by platform:

Windows: The Speech SDK gathers the reference channel automatically if the SpeakerReferenceChannel::LastChannel option is provided when creating AudioProcessingOptions.
Linux: ALSA (Advanced Linux Sound Architecture) must be configured to provide the reference audio stream as the last channel of the audio input device in use. This ALSA configuration is required in addition to providing the SpeakerReferenceChannel::LastChannel option when creating AudioProcessingOptions.
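
To make the "last channel" layout concrete, the following self-contained Java sketch pulls the final channel out of a buffer of interleaved 16-bit PCM samples. It assumes interleaved frames of the form mic 1, mic 2, ..., reference. The class LastChannelSplit is hypothetical and only illustrates the layout. It is not how the Speech SDK or ALSA processes the audio.

```java
import java.util.Arrays;

// Hypothetical helper, not part of the Speech SDK.
final class LastChannelSplit {

    /** Extracts the last channel from interleaved samples (assumed layout: mics first, reference last). */
    static short[] lastChannel(short[] interleaved, int channels) {
        if (channels < 2 || interleaved.length % channels != 0) {
            throw new IllegalArgumentException("need at least 2 channels and whole frames");
        }
        int frames = interleaved.length / channels;
        short[] reference = new short[frames];
        for (int f = 0; f < frames; f++) {
            // The reference sample is the final sample of each frame.
            reference[f] = interleaved[f * channels + (channels - 1)];
        }
        return reference;
    }

    public static void main(String[] args) {
        // Two frames of two microphone channels plus one reference channel.
        short[] buffer = {1, 2, 9, 3, 4, 8};
        System.out.println(Arrays.toString(lastChannel(buffer, 3)));
    }
}
```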

Language and platform support

Language	Platform	Reference docs
C++	Windows, Linux	C++ docs
C#	Windows, Linux	C# docs
Java	Windows, Linux	Java docs
