New Audio APIs for Vista

In an earlier post, I mentioned that we totally re-wrote the audio stack for Windows Vista.  Today I want to talk a bit about the APIs that came along with the new stack.

There are three major API components to the Vista audio architecture:

  • Multimedia Device API (MMDEVAPI) - an API for enumerating and managing audio endpoints.
  • Device Topology - an API for discovering the internals of your audio card's topology.
  • Windows Audio Session API (WASAPI) - the low-level API for rendering audio.

All the existing audio APIs have been re-plumbed to use these APIs internally; in Vista, all audio goes through these three APIs.  For the vast majority of existing audio applications, things should "just work"...

In general, we don't expect that anyone will move to these new APIs.  They're documented for completeness reasons, but the reality is that unless you're dealing with extremely low latency audio (sub-20ms) or writing a control panel applet for a specific audio adapter, you're not likely to ever want to deal with them.  The new APIs really are very low level APIs - using the higher level APIs is both easier and less error prone.

MMDEVAPI:

MMDEVAPI is the entrypoint API - it's a COM class that allows applications to enumerate endpoints and "activate" interfaces on them.  Endpoints fall into two general types: Capture and Render (you can think of Capture endpoints as microphones and line in; Render endpoints are things like speakers).  MMDEVAPI also allows the user to manage defaults for each of the types.  As I write this, there are actually three different sets of defaults supported in Vista: "Console", "Multimedia", and "Communications".  "Console" is used for general purpose audio, "Multimedia" is intended for audio playback applications (media players, etc), and "Communications" is intended for voice communications (applications like Yahoo! Messenger, Microsoft Communicator, etc).

Windows XP had two sets of defaults (the "default" default and the "communications" default); we're adding a third default type to enable multimedia playback.  Consider the following scenario.  I have a Media Center computer.  The SPDIF output from the audio adapter is connected to my home AV receiver, I have a USB headset that I want to use for VOIP, and there are stereo speakers connected to the machine that I use for day-to-day operations.  We want to enable applications to make intelligent choices when they choose which audio device to use - the default in this scenario is to use the desktop speakers, but we want to allow Communicator (or Messenger, or whatever) to use the headset, and Media Center to use the external receiver.  We may end up changing these sets before Vista ships, but this gives a flavor of what we're thinking about.
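
To make that concrete, here's a minimal sketch of picking per-category defaults, using the endpoint role names as they appear in the current SDK headers (eConsole, eMultimedia, eCommunications) - these could still change before we ship:

```cpp
// Sketch: ask MMDEVAPI for the per-category default render endpoints.
// Interface and enum names are from the current Vista SDK headers and
// may change before the final release.
#include <windows.h>
#include <mmdeviceapi.h>

int main()
{
    CoInitializeEx(NULL, COINIT_MULTITHREADED);

    IMMDeviceEnumerator *pEnumerator = NULL;
    HRESULT hr = CoCreateInstance(__uuidof(MMDeviceEnumerator), NULL,
                                  CLSCTX_ALL, __uuidof(IMMDeviceEnumerator),
                                  (void**)&pEnumerator);
    if (SUCCEEDED(hr))
    {
        IMMDevice *pSpeakers = NULL, *pHeadset = NULL;

        // "Console" default - the desktop speakers in the scenario above.
        pEnumerator->GetDefaultAudioEndpoint(eRender, eConsole, &pSpeakers);

        // "Communications" default - the USB headset for VOIP.
        pEnumerator->GetDefaultAudioEndpoint(eRender, eCommunications, &pHeadset);

        if (pSpeakers != NULL) pSpeakers->Release();
        if (pHeadset != NULL) pHeadset->Release();
        pEnumerator->Release();
    }

    CoUninitialize();
    return 0;
}
```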

MMDEVAPI supports an "activation" design pattern - essentially, instead of calling a class factory to create a generic object and then binding it to another object, with activation you enumerate objects (endpoints in this case) and "activate" an interface on a particular object.  It's a really convenient pattern when you have a set of objects that may or may not have the same type.
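
Here's a sketch of what that pattern looks like in code, again using the interface names from the current SDK headers (IMMDeviceEnumerator, IMMDevice::Activate) - treat the specifics as subject to change:

```cpp
// Sketch: enumerate the active render endpoints and "activate" an
// IAudioClient on the first one. Note that there's no class factory
// involved - the endpoint itself hands back the requested interface.
// Assumes COM has already been initialized on this thread.
#include <mmdeviceapi.h>
#include <audioclient.h>

HRESULT ActivateFirstRenderEndpoint(IAudioClient **ppClient)
{
    IMMDeviceEnumerator *pEnumerator = NULL;
    IMMDeviceCollection *pEndpoints = NULL;
    IMMDevice *pEndpoint = NULL;

    HRESULT hr = CoCreateInstance(__uuidof(MMDeviceEnumerator), NULL,
                                  CLSCTX_ALL, __uuidof(IMMDeviceEnumerator),
                                  (void**)&pEnumerator);
    if (SUCCEEDED(hr))
        hr = pEnumerator->EnumAudioEndpoints(eRender, DEVICE_STATE_ACTIVE,
                                             &pEndpoints);
    if (SUCCEEDED(hr))
        hr = pEndpoints->Item(0, &pEndpoint);
    if (SUCCEEDED(hr))
        // The same call could just as easily ask for IDeviceTopology -
        // that's the convenience of the activation pattern.
        hr = pEndpoint->Activate(__uuidof(IAudioClient), CLSCTX_ALL, NULL,
                                 (void**)ppClient);

    if (pEndpoint != NULL) pEndpoint->Release();
    if (pEndpoints != NULL) pEndpoints->Release();
    if (pEnumerator != NULL) pEnumerator->Release();
    return hr;
}
```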

Btw, you can access the category defaults using wave or mixer messages; this page from MSDN describes how.  The console default is accessed via DRVM_MAPPER_PREFERRED_GET, and the communications default via DRVM_MAPPER_CONSOLEVOICECOM_GET.
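
As a sketch, that looks something like the following.  The DRVM_MAPPER_* constants live in mmddk.h in the DDK; the values below are copied from there, so treat them as an assumption if you don't have the header handy:

```cpp
// Sketch: read the category default wave output devices via waveOutMessage.
#include <windows.h>
#include <mmsystem.h>

// From mmddk.h (DDK) - defined here only so the sketch is self-contained.
#ifndef DRVM_MAPPER
#define DRVM_MAPPER                     0x2000
#define DRVM_MAPPER_PREFERRED_GET       (DRVM_MAPPER + 21)
#define DRVM_MAPPER_CONSOLEVOICECOM_GET (DRVM_MAPPER + 23)
#endif

void GetCategoryDefaults()
{
    DWORD dwConsoleId = 0, dwCommsId = 0, dwFlags = 0;

    // The console ("default") default render device.
    waveOutMessage((HWAVEOUT)(UINT_PTR)WAVE_MAPPER,
                   DRVM_MAPPER_PREFERRED_GET,
                   (DWORD_PTR)&dwConsoleId, (DWORD_PTR)&dwFlags);

    // The communications default render device.
    waveOutMessage((HWAVEOUT)(UINT_PTR)WAVE_MAPPER,
                   DRVM_MAPPER_CONSOLEVOICECOM_GET,
                   (DWORD_PTR)&dwCommsId, (DWORD_PTR)&dwFlags);
}
```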

Device Topology:

Personally, I don't believe that anyone will ever use Device Topology except for audio hardware vendors who are writing custom control panel extensions.  It exists for control panel-style applications that need to determine information about the actual hardware.

Device Topology exposes collections of parts and the connections between those parts.  On any part, there are zero or more controls, which roughly correspond to the controls exposed by the audio driver.  One cool thing about device topologies is that topologies can connect to other topologies.  So in the future, it's possible that an application running on an RDP server may be able to enumerate and address the audio devices on the RDP client - instead of treating the client as an endpoint, the server might be able to enumerate the device topology on the RDP client and manipulate controls directly on the client.  Similarly, in the future, the hardware volume control for a SPDIF connector might manipulate the volume on an external AV receiver via an external control connection (1394 or S/LINK).
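
For the hardware vendors who do need it, here's a minimal sketch of getting at a topology, assuming the interface names from the current devicetopology.h header:

```cpp
// Sketch: activate IDeviceTopology on a device and count its connectors.
// Each connector is a "part"; from there you can walk across connections
// to other parts and inspect their controls (volume, mute, etc).
// Assumes COM is initialized and pDevice came from MMDEVAPI.
#include <mmdeviceapi.h>
#include <devicetopology.h>

HRESULT CountConnectors(IMMDevice *pDevice, UINT *pCount)
{
    IDeviceTopology *pTopology = NULL;
    HRESULT hr = pDevice->Activate(__uuidof(IDeviceTopology), CLSCTX_ALL,
                                   NULL, (void**)&pTopology);
    if (SUCCEEDED(hr))
    {
        hr = pTopology->GetConnectorCount(pCount);
        pTopology->Release();
    }
    return hr;
}
```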

One major change between XP and Vista is that Device Topology will never lie about the capabilities of the hardware - before Vista, if a piece of hardware didn't have a particular control the system tried to be helpful and provide controls that it thought ought to be there (for instance if a piece of hardware didn't have a volume control, the system helpfully added one).  For Vista, we're reporting exactly what the audio hardware reports, and nothing more.  This is a part of our philosophy of "don't mess with the user's audio streams if we don't have to" - emulating a hardware control when it's not necessary adds potentially unwanted DSP to the audio stream.

Again, the vast majority of applications shouldn't need to use these controls; for most applications, the functionality provided by the primary APIs (mixerLine, wave, DSound, etc) is going to be more suitable for their needs.

WASAPI:

WASAPI is the "big kahuna" for the audio engine.  You activate WASAPI on an endpoint, and it provides functionality for rendering/capturing audio streams.  It also provides functions to manage the audio clock and manipulate the volume of the audio stream.

In general, WASAPI operates in two modes.  In "shared" mode, audio streams are rendered by the application and mixed by the global audio engine before they're rendered out to the audio device.  In "exclusive" mode, audio streams are rendered directly to the audio adapter, and no other application's audio will play.  Obviously the vast majority of applications will operate in shared mode; that's the default for the wave APIs and DSound.  One relatively common scenario that WILL use exclusive mode is rendering content that requires a codec that's present in the hardware but that Windows doesn't understand.  A simple example of this is compressed AC3 audio rendered over a SPDIF connection - if you attempt to render this content and Windows doesn't have a decoder for it, DSound will automatically initialize WASAPI in exclusive mode and render the content directly to the hardware.
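
Here's a hedged sketch of initializing a shared mode stream, using the IAudioClient names from the current audioclient.h header; swapping in AUDCLNT_SHAREMODE_EXCLUSIVE (and a format the hardware accepts natively) is the exclusive mode variation:

```cpp
// Sketch: activate WASAPI on an endpoint and initialize a shared mode stream.
// Assumes COM is initialized and pEndpoint came from MMDEVAPI.
#include <mmdeviceapi.h>
#include <audioclient.h>

HRESULT InitializeSharedStream(IMMDevice *pEndpoint, IAudioClient **ppClient)
{
    WAVEFORMATEX *pFormat = NULL;
    HRESULT hr = pEndpoint->Activate(__uuidof(IAudioClient), CLSCTX_ALL,
                                     NULL, (void**)ppClient);
    if (SUCCEEDED(hr))
        // In shared mode you render in the engine's mix format; in
        // exclusive mode you'd supply the hardware's native format instead.
        hr = (*ppClient)->GetMixFormat(&pFormat);
    if (SUCCEEDED(hr))
    {
        hr = (*ppClient)->Initialize(AUDCLNT_SHAREMODE_SHARED,
                                     0,         // no special stream flags
                                     10000000,  // 1 second buffer, in 100ns units
                                     0,         // periodicity - 0 in shared mode
                                     pFormat,
                                     NULL);     // default audio session
        CoTaskMemFree(pFormat);
    }
    return hr;
}
```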

If your application is a pro audio application, or is interested in extremely low latency audio, then you probably want to consider using WASAPI; otherwise it's better to stick with the existing APIs.

Tomorrow: Volume control (a subject that's near and dear to my heart) :)

Comments

  • Anonymous
    September 23, 2005
    The comment has been removed

  • Anonymous
    September 23, 2005
    Do I understand correctly that DSound is now layered on top of this new API (WASAPI)?

  • Anonymous
    September 23, 2005
    <i>"MMDEVAPI is the entrypoint API - it's a COM class..."</i>

    Still COM? Is there a specific reason you guys (and gals) made the API set a COM object? I had hoped that every new API set in Vista would be implemented as a .NET class. I thought this was the basic idea of WinFX.

  • Anonymous
    September 23, 2005
    This is cool. WASAPI sounds very promising.
    Will the Device Topology APIs let developers do, for example, device aggregation? (When I have 2 audio devices with 2 inputs each, can it see them as one device with 4 inputs?)

  • Anonymous
    September 23, 2005
    The comment has been removed

  • Anonymous
    September 23, 2005
    Well I'm a pro audio guy and this low latency stuff sounds like a winner. Keep up the good work.

  • Anonymous
    September 23, 2005
    Dennis, EVERYTHING involving audio is layered over WASAPI.

    And the reason it's all unmanaged is relatively simple: We don't want every single application calling PlaySound to have to have the .Net framework injected into it.

    steamy, I don't think so, but I'm not sure it matters - MMDEVAPI presents the inputs from all the different adapters as separate capture inputs - the endpoint abstraction allows you to divorce the capture/render device from the hardware.

    Stebet, that's an interesting thought...

  • Anonymous
    September 23, 2005
    The comment has been removed

  • Anonymous
    September 23, 2005
    Minh, I'd look at DirectSound; it should have the flexibility to do what you need. Mixing the tracks is easy (you just add the samples and clip), but the volume envelope makes things trickier.

    Essentially, you write your streams into DirectSound secondary buffers, and you should be able to tap off the primary buffer before it gets rendered (I think). If not, then you can certainly do this with DirectShow.
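
    For what it's worth, the "add the samples and clip" part looks roughly like this for 16-bit PCM (just a sketch, not tied to any particular API):

    ```cpp
    // Mix two 16-bit PCM buffers by summing samples and clipping to the
    // representable range.
    #include <stddef.h>
    #include <limits.h>

    void MixPcm16(const short *a, const short *b, short *out, size_t count)
    {
        for (size_t i = 0; i < count; i++)
        {
            int sum = (int)a[i] + (int)b[i];   // widen to avoid overflow
            if (sum > SHRT_MAX) sum = SHRT_MAX;
            if (sum < SHRT_MIN) sum = SHRT_MIN;
            out[i] = (short)sum;
        }
    }
    ```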

  • Anonymous
    September 23, 2005
    I just wanted to say that I appreciate the effort that has been put into ensuring the old wave APIs still work by rerouting them through the new systems. It was nice when the multimedia APIs got rerouted through the kernel mixer and us luddites who still used waveIn/waveOut got the benefits of lower latency and sound device sharing.

  • Anonymous
    September 26, 2005
    Does this then support some obvious use cases for a family (shared) media computer? E.g. playing AC3 out of SPDIF to an AV amp for the DVD being played, while routing the VOIP call output to a USB handset?

  • Anonymous
    September 27, 2005
    The comment has been removed

  • Anonymous
    November 17, 2005
    Does that mean that all the "old" low-latency entry points (kernel streaming, from both user space and kernel space) will no longer exist?

    How would a kernel-mode-driver that needs to stream audio to a soundcard work using Vista?

    Best regards,
    Tobias

  • Anonymous
    December 05, 2005
    The comment has been removed

  • Anonymous
    September 17, 2006
    In the world of digital audio recording, latency corresponds to the wait time…

