Audio Stream

09/12/2013

Kinect for Windows 1.5, 1.6, 1.7, 1.8

The Kinect sensor includes a four-element, linear microphone array, shown here in purple.

[Figure: Kinect sensor with the four-element microphone array highlighted in purple]

The microphone array captures audio data at 24-bit resolution, which preserves accuracy across a wide dynamic range of voice data, from normal speech at three or more meters to a person yelling.
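To put the 24-bit figure in perspective, a common back-of-the-envelope calculation is the theoretical dynamic range of an ideal N-bit converter: 20·log10(2^N) dB, or about 6.02 dB per bit. The sketch below is a generic calculation, not part of the Kinect SDK. The helper name pcmDynamicRangeDb is hypothetical, and a real sensor's usable range is lower because of analog noise.

```cpp
#include <cassert>
#include <cmath>

// Theoretical dynamic range, in decibels, of an ideal N-bit linear PCM converter:
// the ratio of full scale (2^N quantization steps) to a single step, 20*log10(2^N).
// Analog noise is ignored, so real hardware achieves less than this.
double pcmDynamicRangeDb(int bits) {
    assert(bits > 0 && bits <= 32);
    return 20.0 * std::log10(std::pow(2.0, bits));
}
```

With this formula, 24-bit capture gives roughly 144.5 dB of theoretical range, compared with roughly 96.3 dB for 16-bit capture.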

What Can You Do with Audio?

The sensor (microphone array) enables several user scenarios, such as:

High-quality audio capture
Focus on audio coming from a particular direction with beamforming
Identification of the direction of audio sources
Improved speech recognition as a result of audio capture and beamforming
Raw voice data access
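This article does not describe the algorithm the SDK uses to identify the direction of an audio source. As a generic illustration only, and not the KinectAudio DMO's implementation, direction can be estimated for one pair of microphones by finding the inter-microphone delay with cross-correlation and converting it to an angle under a far-field assumption: sin θ = c·τ / d, where c is the speed of sound, τ is the delay, and d is the microphone spacing. The function names, the 343 m/s speed of sound, and any spacing you pass in are assumptions.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <limits>
#include <vector>

// Hypothetical helper: lag (in samples) of signal b relative to signal a that
// maximizes their cross-correlation, searched over [-maxLag, maxLag].
// A positive result means the sound reached microphone b later than microphone a.
int bestLag(const std::vector<double>& a, const std::vector<double>& b, int maxLag) {
    assert(a.size() == b.size() && maxLag >= 0);
    const int n = static_cast<int>(a.size());
    int best = 0;
    double bestScore = std::numeric_limits<double>::lowest();
    for (int lag = -maxLag; lag <= maxLag; ++lag) {
        double score = 0.0;
        for (int i = 0; i < n; ++i) {
            const int j = i + lag;  // compare a[i] with b[i + lag]
            if (j >= 0 && j < n) score += a[i] * b[j];
        }
        if (score > bestScore) {
            bestScore = score;
            best = lag;
        }
    }
    return best;
}

// Hypothetical helper: far-field direction of arrival in degrees from broadside
// (perpendicular to the line through both microphones), using sin(theta) = c*tau/d.
// Positive angles point toward microphone a's end of the pair.
double arrivalAngleDeg(int lagSamples, double sampleRateHz, double spacingM,
                       double speedOfSoundMps = 343.0) {
    assert(sampleRateHz > 0.0 && spacingM > 0.0);
    const double tau = lagSamples / sampleRateHz;  // delay in seconds
    double s = speedOfSoundMps * tau / spacingM;
    s = std::clamp(s, -1.0, 1.0);  // noise can push |s| slightly past 1
    const double kPi = 3.14159265358979323846;
    return std::asin(s) * 180.0 / kPi;
}
```

A practical localizer also reports how confident it is in each estimate, because reverberation and noise can produce misleading correlation peaks.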

Implementing Audio in a Native (Unmanaged) Application

A native application can implement these audio scenarios in one of two ways:

Use the KinectAudio DirectX Media Object (DMO), as shown in the AudioBasics-D2D C++ sample
Use the Windows Audio Session API (WASAPI), as shown in the AudioCaptureRaw-Console C++ sample

Using the KinectAudio DirectX Media Object (DMO)

Windows Vista, Windows 7, and Windows 8 include a voice-capture digital signal processor (DSP) that supports microphone arrays. Developers typically access that DSP through a DMO, which is a standard COM object that can be incorporated into a DirectShow graph or a Microsoft Media Foundation topology. The SDK includes an extended version of the Windows microphone array DMO, referred to here as the KinectAudio DMO, to support the Kinect microphone array.

In C++, access the KinectAudio DMO by calling NuiGetAudioSource or INuiSensor::NuiGetAudioSource.

Using the Windows Audio Session API (WASAPI)

Use the Windows Audio Session API (WASAPI) to capture the raw audio stream as shown in the AudioCaptureRaw-Console C++ sample in the Developer Toolkit.
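WASAPI delivers multichannel capture data as interleaved frames: one sample per channel, frame after frame. If the raw stream exposes one channel per array element (an assumption here; check the format WASAPI reports, for example through IAudioClient::GetMixFormat), per-microphone processing starts by splitting each buffer into separate channels. The sketch below assumes float samples and uses a hypothetical helper name; it is not code from the AudioCaptureRaw-Console sample.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: splits an interleaved buffer
// (frame 0: ch0, ch1, ..., chN-1; frame 1: ch0, ch1, ...; ...)
// into one vector per channel. Float samples are an assumption; use the sample
// type and channel count from the format that WASAPI reports.
std::vector<std::vector<float>> deinterleave(const std::vector<float>& interleaved,
                                             std::size_t channels) {
    assert(channels > 0 && interleaved.size() % channels == 0);
    const std::size_t frames = interleaved.size() / channels;
    std::vector<std::vector<float>> out(channels, std::vector<float>(frames));
    for (std::size_t f = 0; f < frames; ++f)
        for (std::size_t c = 0; c < channels; ++c)
            out[c][f] = interleaved[f * channels + c];
    return out;
}
```

For a four-channel buffer of two frames, out[0] holds the first microphone's two samples, out[1] the second's, and so on.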

For more information about WASAPI, see About WASAPI (Windows).

Implementing Audio in a Managed Application

Managed applications use a KinectAudioSource object to implement all of the scenarios listed above.

KinectAudioSource Wraps a DirectX Media Object

The standard Windows voice-capture DirectX Media Object (DMO) is a common Windows component for microphone capture. Using it as a building block, the KinectAudio DMO, which KinectAudioSource wraps, extends this component with the following additional capabilities:

An additional microphone mode, which is customized to support the Kinect microphone array
Beamforming and source localization
Noise suppression and acoustic echo cancellation (AEC), applied to the 24-bit audio captured by the sensor
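This article does not document how the KinectAudio DMO performs beamforming. Delay-and-sum is the simplest textbook beamformer and is shown here only to illustrate the idea: sound from the steered direction adds coherently across microphones, while sound from other directions partially cancels. This is a generic sketch, not the DMO's algorithm; the function name and the use of whole-sample delays are simplifying assumptions.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical delay-and-sum beamformer with integer-sample delays.
// channels[m][n] is sample n from microphone m; delays[m] is how many samples
// later the target sound arrives at microphone m than at the earliest microphone.
// Each channel is advanced by its delay so the target lines up, then all are averaged.
std::vector<double> delayAndSum(const std::vector<std::vector<double>>& channels,
                                const std::vector<int>& delays) {
    assert(!channels.empty() && channels.size() == delays.size());
    const std::size_t n = channels[0].size();
    std::vector<double> out(n, 0.0);
    for (std::size_t m = 0; m < channels.size(); ++m) {
        assert(channels[m].size() == n && delays[m] >= 0);
        for (std::size_t i = 0; i < n; ++i) {
            const std::size_t j = i + static_cast<std::size_t>(delays[m]);
            if (j < n) out[i] += channels[m][j];  // samples past the end contribute 0
        }
    }
    for (double& v : out) v /= static_cast<double>(channels.size());
    return out;
}
```

In practice the delays are derived from the steering angle and the microphone positions and are usually fractional, which calls for interpolation or frequency-domain processing rather than whole-sample shifts.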

Access the audio stream in managed code using the KinectSensor.AudioSource property.
