Using the Kinect as an Audio Device
Kinect for Windows 1.5, 1.6, 1.7, 1.8
As mentioned above, the Kinect sensor contains a four-microphone array.
Several samples in the Kinect for Windows Developer Toolkit show how to use the Kinect sensor as an audio capture device. The audio stream is exposed slightly differently in the C++ and C# APIs.
In addition, it is possible to do raw audio capture from the sensor using the WASAPI interfaces available in Windows 7 and Windows 8.
C++ Audio API
KinectAudio DMO
Windows Vista, Windows 7, and Windows 8 include a voice-capture digital signal processor (DSP) that supports microphone arrays. Developers typically access that DSP through a DirectX Media Object (DMO), a standard COM object that can be incorporated into a DirectShow graph or a Microsoft Media Foundation topology. The SDK includes an extended version of the Windows microphone array DMO, referred to here as the KinectAudio DMO, to support the Kinect microphone array.
Although the internal details are different, the KinectAudio DMO supports the same interfaces as the standard microphone array DMO and works in much the same way. However, the KinectAudio DMO:
- Supports an additional microphone mode, which is customized to support the Kinect microphone array.
- Includes beamforming and source-localization functionality, which are exposed through the INuiAudioBeam interface.
To access the DMO in C++, call NuiGetAudioSource or INuiSensor::NuiGetAudioSource.
The beamforming functionality supports 11 fixed beams, which range from -0.875 to 0.875 radians (approximately -50 to 50 degrees in 10-degree increments). Applications can use the DMO’s adaptive beamforming option, which automatically selects the optimal beam, or specify a particular beam. The DMO also includes a source localization algorithm, which estimates the source direction.
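The beam arithmetic above can be illustrated with a short, SDK-independent sketch. It assumes the 11 beams are evenly spaced at the nominal 10-degree steps the text gives; the function names (NominalBeamAngleDegrees, DegreesToRadians, NearestBeamIndex) are hypothetical helpers for illustration and are not part of the Kinect API. Picking the nearest nominal beam here is simple rounding, not the DMO's adaptive beam selection.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>

// Number of fixed beams described in the text.
constexpr std::size_t kBeamCount = 11;
// Assumed nominal layout: -50 to 50 degrees in 10-degree steps.
constexpr double kMinBeamDegrees = -50.0;
constexpr double kMaxBeamDegrees = 50.0;
constexpr double kBeamStepDegrees = 10.0;
constexpr double kPi = 3.14159265358979323846;

// Hypothetical helper: nominal angle, in degrees, of the beam at `index` (0..10).
double NominalBeamAngleDegrees(std::size_t index) {
    assert(index < kBeamCount);
    return kMinBeamDegrees + kBeamStepDegrees * static_cast<double>(index);
}

// Converts degrees to radians.
double DegreesToRadians(double degrees) {
    return degrees * kPi / 180.0;
}

// Hypothetical helper: index of the nominal beam closest to `angleDegrees`.
// Angles outside the -50..50 range are clamped to the outermost beams.
std::size_t NearestBeamIndex(double angleDegrees) {
    const double clamped =
        std::fmin(kMaxBeamDegrees, std::fmax(kMinBeamDegrees, angleDegrees));
    const long index = std::lround((clamped - kMinBeamDegrees) / kBeamStepDegrees);
    return static_cast<std::size_t>(index);
}
```

Under these assumptions, an estimated source direction of 23 degrees maps to the beam at index 7 (nominally 20 degrees), and the outermost beam at 50 degrees converts to roughly 0.873 radians, close to the 0.875 figure quoted above.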
The KinectAudio DMO also supports noise suppression and acoustic echo cancellation (AEC).
Raw Audio Capture
Applications can use WASAPI (Windows Audio Session API) to capture the raw audio stream from the Kinect sensor’s microphones. The Developer Toolkit contains a C++ sample, AudioCaptureRaw-Console, that illustrates the use of WASAPI. A later topic provides a walkthrough of this sample.
Developers who want to know more about Windows audio can find full details in the Windows Audio Session API (WASAPI) documentation.
Samples
The toolkit includes the following C++ samples illustrating audio capture:
- The AudioBasics-D2D C++ sample shows how to capture sound and determine the selected beam and source direction by using the KinectAudio DMO.
- The AudioCaptureRaw-Console C++ sample shows how to use WASAPI to access the raw audio from the Kinect sensor. The KinectAudio DMO is not used in this sample.
- The Audio Explorer-D2D C++ sample shows how to capture sound by using the KinectAudio DMO. It also shows how acoustic echo cancellation affects the audio stream, and how the output differs when the Kinect microphone array is treated as a single microphone rather than as an array microphone.
C# Audio API
The SDK includes a managed audio API, a wrapper over the KinectAudio DMO that supports the same functionality but is simpler to use. The managed API allows applications to configure the DMO and to start, capture, and stop the audio stream. It also raises events that report the source and beam directions to the application.
To access the KinectAudio DMO in C#, use the AudioSource property of the KinectSensor object.
Samples
The Developer Toolkit contains one C# sample for accessing audio:
- The AudioBasics-WPF C# sample shows how to capture sound and determine the selected beam and source direction by using the managed wrapper over the KinectAudio DMO.