What's an audio endpoint?

One of the parts of the audio engine rework was a paradigm shift in how audio devices are addressed.

Before Vista, audio devices were enumerated (more or less) by the KSCATEGORY_AUDIO PnP devinterface that exposed a Wave filter.
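
For the curious, the old-style enumeration looked roughly like this - a minimal sketch using the SetupDi* APIs (error handling abbreviated; the device path it prints is what you'd hand to CreateFile to open the filter):

```cpp
// Minimal sketch: enumerate KSCATEGORY_AUDIO device interfaces the
// pre-Vista way. Error handling and cleanup are abbreviated.
#include <windows.h>
#include <initguid.h>   // instantiate the GUIDs declared in ks.h
#include <setupapi.h>
#include <ks.h>         // KSCATEGORY_AUDIO
#include <stdio.h>
#include <stdlib.h>

#pragma comment(lib, "setupapi.lib")

int main()
{
    HDEVINFO devInfo = SetupDiGetClassDevs(&KSCATEGORY_AUDIO, NULL, NULL,
                                           DIGCF_DEVICEINTERFACE | DIGCF_PRESENT);
    if (devInfo == INVALID_HANDLE_VALUE) return 1;

    SP_DEVICE_INTERFACE_DATA interfaceData = { sizeof(interfaceData) };
    for (DWORD i = 0;
         SetupDiEnumDeviceInterfaces(devInfo, NULL, &KSCATEGORY_AUDIO, i, &interfaceData);
         i++)
    {
        // First call just retrieves the required buffer size.
        DWORD size = 0;
        SetupDiGetDeviceInterfaceDetailW(devInfo, &interfaceData, NULL, 0, &size, NULL);

        PSP_DEVICE_INTERFACE_DETAIL_DATA_W detail =
            (PSP_DEVICE_INTERFACE_DETAIL_DATA_W)malloc(size);
        detail->cbSize = sizeof(SP_DEVICE_INTERFACE_DETAIL_DATA_W);
        if (SetupDiGetDeviceInterfaceDetailW(devInfo, &interfaceData, detail, size, NULL, NULL))
        {
            // The "device path" names the PnP devinterface - not anything
            // a user would recognize.
            wprintf(L"%ls\n", detail->DevicePath);
        }
        free(detail);
    }
    SetupDiDestroyDeviceInfoList(devInfo);
    return 0;
}
```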

The big problem with this is that it doesn't even come close to representing how users think about their audio solution (audio adapter, USB audio device, motherboard audio chipset connected to output speaker and microphone jacks).  I'm willing to bet that 90% of the people reading this post have no idea what a "KSCATEGORY_AUDIO PnP devinterface that exposed a Wave filter" is.  But every single one of you knows what a speaker is.

It also turns out that the PnP definition is unnecessarily simplistic.  It doesn't cover scenarios where the device that renders the audio isn't physically attached to the PC.  There are a number of these scenarios in Windows today (remote desktop audio is a perfect example), and to solve them, developers have designed a number of hack-o-rama solutions, none of which is particularly attractive.

For Vista, what we've done is to define a new concept called an "audio endpoint".  An "audio endpoint" represents the ultimate destination for audio rendering.  It might be the speakers on your local workstation, it might be the speakers on a remote machine running an RDP client, it might be the speakers connected to the receiver of your home stereo, it might be the microphone or headset connected to your laptop, it might be something we've not yet figured out.

The key thing about an audio endpoint is that it represents a piece of plastic, and NOT a PnP thingamajig[1].  The concept of endpoints goes directly to the 3rd set of problems - troubleshooting audio is simply too hard, because the objects that the pre-Vista audio subsystem referenced dealt with the physical audio hardware, and NOT with the things to which users relate.
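
To make the contrast concrete, here's a rough sketch of enumerating render endpoints with the MMDevice API that (eventually) shipped in Vista - it prints the friendly names users actually see ("Speakers", "Headphones", and so on). Error handling is omitted for brevity:

```cpp
// Minimal sketch: enumerate the active render endpoints - the "pieces
// of plastic" - with the Vista MMDevice API and print their names.
#include <windows.h>
#include <mmdeviceapi.h>
#include <functiondiscoverykeys_devpkey.h>  // PKEY_Device_FriendlyName
#include <stdio.h>

#pragma comment(lib, "ole32.lib")

int main()
{
    CoInitializeEx(NULL, COINIT_MULTITHREADED);

    IMMDeviceEnumerator *enumerator = NULL;
    CoCreateInstance(__uuidof(MMDeviceEnumerator), NULL, CLSCTX_ALL,
                     __uuidof(IMMDeviceEnumerator), (void**)&enumerator);

    IMMDeviceCollection *endpoints = NULL;
    enumerator->EnumAudioEndpoints(eRender, DEVICE_STATE_ACTIVE, &endpoints);

    UINT count = 0;
    endpoints->GetCount(&count);
    for (UINT i = 0; i < count; i++)
    {
        IMMDevice *endpoint = NULL;
        endpoints->Item(i, &endpoint);

        IPropertyStore *props = NULL;
        endpoint->OpenPropertyStore(STGM_READ, &props);

        // The friendly name is the name users see - "Speakers", etc.
        PROPVARIANT name;
        PropVariantInit(&name);
        props->GetValue(PKEY_Device_FriendlyName, &name);
        wprintf(L"Endpoint %u: %ls\n", i, name.pwszVal);

        PropVariantClear(&name);
        props->Release();
        endpoint->Release();
    }
    endpoints->Release();
    enumerator->Release();
    CoUninitialize();
    return 0;
}
```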

Adding the concept of an audio endpoint also makes some scenarios, like the RDP scenario I mentioned above, orders of magnitude simpler.  Before Vista, remote desktop audio was implemented with a DLL that effectively replaced winmm on the RDP server and redirected audio to the RDP client.  With this architecture, it would have been extremely difficult to implement features like per-application volume and other DSP-related scenarios for remote clients.  For Vista, remote desktop audio was implemented as an audio endpoint.  As such, applications running on a remote desktop server function just as they would on the local machine - instead of bypassing the audio engine in the client application, remote desktop audio runs through the audio engine just like local audio does, and it gets redirected at the back end of the engine: instead of playing out of the local audio adapter, the audio is redirected over the RDP channel.

Once again, some of this stuff won't appear until Vista Beta2 - the Beta1 RDP code still uses the old RDP infrastructure.  In addition, while the endpoint mechanism provides a paradigm for addressing the speakers connected to your AV receiver as an endpoint, the functionality to implement this doesn't exist in Vista.

[1] Did you know that Microsoft Offices spelling corrector will correct the spelling of  thingamajig?

Comments

  • Anonymous
    September 21, 2005
    How will this show up in Vista? I mean, what will really be different? You are correct, I don't know what KSCATEGORY_AUDIO is; however, right now when I look through things like the hardware applet or output controls, I see things like Speaker, Microphone, headset, etc.

    [2] Did you know spelling correction corrects the word juju as well? http://en.wikipedia.org/wiki/Juju

    I learned that one today when sending an email to an end user who, no matter what type of .jpg file he clicked on, got an unrecognized format error and couldn't open it. I told him in the opening sentence of my reply that "it sounded like he had some bad juju".

  • Anonymous
    September 21, 2005
    Re: I'm willing to bet that 90% of the people reading this post have no idea ....

    Yes, exactly. The problem is that this information is hard to come by. You could learn by writing an audio driver, but as no audio manufacturer releases specs for their cards, that's unrealistic - and who is going to sit there learning the sound driver model just "in case" they ever have to build an audio driver?

    When looking into /low level/ graphics ((Hardware <- 2D API) level stuff), I found that the only information available was a decade old... I found a book from around 1994 about implementing VGA, and that was it. It seems that sound/graphics hardware people and others in that area don't want general developers to know how this stuff works.

    Intel documents their hardware REALLY well; but they are the exception and not the rule. You might ask yourself "Intel has to, developers need to know that stuff" -- that isn't really true; with compilers, the developer only needs to know C or C# to get along these days... The same can be said for graphics, sound, and other areas: the people producing them could document as well as Intel does, but they've found they can get away with not doing so.

    Frankly I blame the Microsoft monopoly for some of this (if lots of people implemented a graphics / sound interface they would have to release it to the 'public').

    Sorry about the rant.... :-(

  • Anonymous
    September 21, 2005
    But apparently its grammar correction didn't find the missing apostrophe..... ("Office's")

  • Anonymous
    September 21, 2005
    Jeff, the hardware controls describe the hardware - they're things like "Bass", "Treble", "CD Audio", "Wave", etc.

    Many/most of these things have no relationship to the user - what does the "CD Audio" control do, for example? And trying to figure out how your S/PDIF connection works is an absolute nightmare.

    The endpoints are what the new mmsys.cpl will expose as primary devices - users will interact with endpoints, not with PnP devices like they do today.

  • Anonymous
    September 21, 2005
    The 5.1 test tool's included in Vista; it's a part of the new mmsys.cpl UI. And playback of CD audio through WMP should use very little DSP - a volume converter, a format converter (or two), and a mixer should be about it (there might be a matrix DSP in the mix if the endpoint's output isn't stereo).

    As I said, we're pretty darned serious about audio quality in Vista.

    The current clock on the audio adapter's not actually that interesting, IMHO - but we've enhanced the ability of apps that do multimedia to do clock sync.

    The "What's connected to what" application can be built on one of our new APIs (more on those tomorrow).

    As far as what apps are using what, we don't expose that - we do show (in the UI) what apps are rendering currently and what apps have recently rendered, but that's it.
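
    Here's what that might look like - a hedged sketch assuming the API in question is the DeviceTopology API that shipped in Vista, walking from a render endpoint across its connector to the adapter device it's wired to (COM initialization and error handling abbreviated):

    ```cpp
    // Hedged sketch: starting from an IMMDevice render endpoint (obtained
    // as in the enumeration sketch in the post above), walk across the
    // endpoint's connector to find the adapter it's physically wired to.
    #include <windows.h>
    #include <mmdeviceapi.h>
    #include <devicetopology.h>
    #include <stdio.h>

    void ShowConnection(IMMDevice *endpoint)
    {
        // An endpoint device exposes a topology with a single connector.
        IDeviceTopology *endpointTopology = NULL;
        endpoint->Activate(__uuidof(IDeviceTopology), CLSCTX_ALL, NULL,
                           (void**)&endpointTopology);

        IConnector *endpointConnector = NULL;
        endpointTopology->GetConnector(0, &endpointConnector);

        // The far side of the connector lives on the adapter's topology.
        IConnector *adapterConnector = NULL;
        if (SUCCEEDED(endpointConnector->GetConnectedTo(&adapterConnector)))
        {
            IPart *adapterPart = NULL;
            adapterConnector->QueryInterface(__uuidof(IPart), (void**)&adapterPart);

            IDeviceTopology *adapterTopology = NULL;
            adapterPart->GetTopologyObject(&adapterTopology);

            LPWSTR deviceId = NULL;
            adapterTopology->GetDeviceId(&deviceId);
            wprintf(L"Endpoint is wired to: %ls\n", deviceId);

            CoTaskMemFree(deviceId);
            adapterTopology->Release();
            adapterPart->Release();
            adapterConnector->Release();
        }
        endpointConnector->Release();
        endpointTopology->Release();
    }
    ```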

  • Anonymous
    September 21, 2005
    Thanks for the answer! This direct interaction thing with you MS blog guys is still quite surreal. I'm very curious about your forthcoming post dealing with latency too. I gotta say I'm pretty happy about your newfound focus for better audio in Vista. Keep up the good work.
