What's in an audio volume?

I've been talking about audio controls - volumes and mutes and the like, but one of the more confusing things I've run into here at work is the concept of "volume".

First off, what IS volume?

Well, roughly speaking (and I know the audiophiles out there will get on my case about this), there are actually several concepts when you talk about "volume".  The first (and most common) is that volume is a representation of "loudness".  But it turns out that in practice, volume is a representation of "intensity".

The difference between "loudness" and "intensity" is that "loudness" is perceptual - how do you perceive a sound.  But "intensity" is actually what's measured - as SPL (Sound Pressure Level), which is a representation of energy in the sound space.

Typically volumes are measured in decibels - the decibel is a logarithmic unit (each 10dB increase is a 10x increase in sound intensity, i.e. power; the sound pressure itself only goes up by about 3.16x).  20dB is about the volume of a whisper, 140dB is that of a jet airplane taking off next door.

Now when you deal with volumes in pro audio equipment, volume is measured by two factors - attenuation and amplification.  0 means that the sound is playing at its native level, negative numbers are reductions in volume from that native level, and positive numbers indicate amplification. 

For most computer hardware, volume is measured as attenuations - negative numbers running from 0 (max volume) to -infinity (0 volume).  In practice, the number runs from 0 to -96dB.  Typically computers don't ever amplify signals, just attenuate them.  If you think about how digital audio works this makes sense.  Since an audio sample at full volume is at 0dB, it's easy to attenuate the samples (just scale them down appropriately).  On the other hand, it's not easy to amplify them - they're already AT 100% - any amplification would have to come AFTER the DAC.  So digital volumes ultimately measure attenuation.

But audio volumes AREN'T in decibels (because that would be easy).  Instead, the audio volume is represented in a number of different sets of units, depending on your API.

And that's where it gets really, really ugly.  There are at least five different sets of APIs in the system that measure audio volume, and they use totally different units.

For example, the wave APIs (waveOutSetVolume, waveOutGetVolume) represent each channel's volume as a number between 0x0000 and 0xffff, where 0 represents silence and 0xffff represents full volume.  The wave APIs assume that all audio outputs are stereo, and they pack the left and right channels into a single DWORD.  Of course if your audio system has more than two channels, that's a problem, but the reality is that almost nobody ever wants to adjust the balance as a normal activity (it's typically done once during system setup and then ignored).
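
Here's a small sketch of that packing in portable C (uint32_t standing in for DWORD; the helper names are hypothetical).  Per the waveOutSetVolume documentation, the low-order word carries the left channel and the high-order word the right channel.

```c
#include <assert.h>
#include <stdint.h>

/* Map a fraction in [0, 1] onto the 0x0000..0xFFFF per-channel range. */
uint16_t wave_level_from_fraction(double fraction)
{
    assert(fraction >= 0.0 && fraction <= 1.0);
    return (uint16_t)(fraction * 65535.0 + 0.5);
}

/* Pack two channel levels the way the wave APIs expect:
 * left in the low-order word, right in the high-order word. */
uint32_t pack_wave_volume(uint16_t left, uint16_t right)
{
    return ((uint32_t)right << 16) | left;
}

uint16_t wave_volume_left(uint32_t packed)
{
    return (uint16_t)(packed & 0xFFFFu);
}

uint16_t wave_volume_right(uint32_t packed)
{
    return (uint16_t)(packed >> 16);
}
```

The packed value is what you'd hand to waveOutSetVolume as its DWORD argument; full volume on both channels is 0xFFFFFFFF.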

The mixer APIs on the other hand set their volumes with the mixerSetControlDetails API.  That API takes an integer between a low bound and a high bound, determined from the dwMinimum and dwMaximum fields of the relevant MIXERCONTROL.  The MIXERCONTROL structure also defines the number of steps between the low and the high value.  For most audio adapters, this is a number between 0 and 0xffff, with 0xffff steps, but this is not guaranteed - I've seen audio adapters with discrete volumes - 256 steps, for example.
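
Since the bounds and the step count vary by adapter, code that drives a mixer volume control has to scale into whatever range it's given.  The sketch below shows one way to do that in portable C.  The volume_range struct is a hypothetical stand-in for the dwMinimum, dwMaximum and step-count values read from MIXERCONTROL, and treating the step count as the number of equal intervals between the bounds is an assumption made for the example.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the values read from a MIXERCONTROL. */
typedef struct {
    uint32_t minimum;   /* low bound (dwMinimum)               */
    uint32_t maximum;   /* high bound (dwMaximum)              */
    uint32_t steps;     /* number of intervals between bounds  */
} volume_range;

/* Map a fraction in [0, 1] to the nearest value the control can
 * actually represent, by snapping to a step first. */
uint32_t mixer_value_from_fraction(volume_range r, double fraction)
{
    assert(r.maximum > r.minimum && r.steps >= 1);
    assert(fraction >= 0.0 && fraction <= 1.0);

    /* Nearest step index in 0..steps. */
    uint32_t step = (uint32_t)(fraction * r.steps + 0.5);

    /* Position of that step within the control's range. */
    double span = (double)(r.maximum - r.minimum);
    return r.minimum + (uint32_t)(step * span / r.steps + 0.5);
}
```

For the common 0 to 0xffff case with 0xffff steps this is just a linear scale; for a coarse control it quantizes the request to the nearest step the hardware supports.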

And then there's DirectSound.  DirectSound sets volume on individual DSound buffers - you set the volume with the IDirectSoundBuffer8::SetVolume API.  The DSound set volume API takes the volume as a LONG measured in hundredths of a dB, ranging from 0 to -10,000 (0 to -100dB).

Oh, and I can't forget the audio CD playback volume.  The IOCTL_CDROM_GET_VOLUME (which is used to control the volume of CD playback when you're playing an audio CD over the analog connector to your sound card) specifies volumes in numbers between 0 and 255.

And of course, the audio device driver that's actually used to render all these different volume levels takes yet another type of volume.  The KSPROPERTY_AUDIO_VOLUMELEVEL property takes a number from -2147483648 to +2147483647 where -2147483648 is silence (-32767 dB), 0 is max volume and 2147483647 is +32767 decibels (gain).  The units for the sysaudio volume are 1/65536ths of a decibel, which is nice since the high 16 bits represent the decibel value and the low 16 bits represent the fractional portion of the volume (typically 0).
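
The 1/65536th-of-a-decibel unit is just a signed 16.16 fixed-point number, which the sketch below converts to and from plain decibels.  This is portable C with hypothetical helper names; it only does the arithmetic and doesn't touch the actual kernel streaming property.

```c
#include <assert.h>
#include <stdint.h>

#define KS_UNITS_PER_DB 65536.0

/* 16.16 fixed-point volume level -> decibels. */
double ks_volume_to_db(int32_t level)
{
    return level / KS_UNITS_PER_DB;
}

/* Decibels -> 16.16 fixed-point level, rounded to the nearest unit. */
int32_t ks_volume_from_db(double db)
{
    assert(db >= -32768.0 && db < 32768.0);
    double scaled = db * KS_UNITS_PER_DB;
    int64_t rounded = (int64_t)(scaled < 0.0 ? scaled - 0.5 : scaled + 0.5);
    if (rounded > INT32_MAX) rounded = INT32_MAX;
    if (rounded < INT32_MIN) rounded = INT32_MIN;
    return (int32_t)rounded;
}

/* Low 16 bits: the fractional part, in 1/65536ths of a dB. */
uint16_t ks_volume_fraction(int32_t level)
{
    return (uint16_t)((uint32_t)level & 0xFFFFu);
}

/* High 16 bits: the whole-dB part (rounded down for negative levels). */
int16_t ks_volume_whole_db(int32_t level)
{
    return (int16_t)((level - (int32_t)ks_volume_fraction(level)) / 65536);
}
```

Note that for negative levels the two halves follow two's complement rules: -0.5dB is stored as a whole part of -1 plus a fraction of 0x8000 (one half).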

Sigh.

Comments

  • Anonymous
    June 15, 2005
    Larry, your posts are incredibly interesting. One question: I understood that a 10dB increase is double, not 10x. Which is correct?
  • Anonymous
    June 15, 2005
    Sorry for snarking tangents since I don't entirely follow the main thread here (even though my stereo back in the day was considered more or less audio filesystem).

    > The IOCTL_CDROM_GET_VOLUME (which is used to
    > control the volume of CD playback when
    > you're playing an audio CD

    Is there a corresponding IOCTL_CDAUDIO_GET_VOLUME which is used to control the serial number, amount of free space, etc., on a data CD? ^u^

    > KSPROPERTY_AUDIO_VOLUMELEVELEL [...]
    > -2147483648 is silence (-32767 dB), [...]
    > 2147473647 is +32767 decibels (gain).

    Computer scientists wonder why -2147483648 isn't -32768 dB. Complements of the house ^-^
  • Anonymous
    June 15, 2005
    The comment has been removed
  • Anonymous
    June 15, 2005
    Steve: You're probably right - I quoted that from an article I read.

    Norman: "Computer scientists wonder why -2147483648 isn't -32768 dB. Complements of the house ^-^ "

    Because I rounded. It's actually -32767.9994

    And I don't know about the CDROM IOCTLs. You'd have to look in the DDK documentation.

    Chris: You're 100% right, they did. The KSPROPERTY_AUDIO_VOLUMELEVEL API settings are close to being right, but we can (and will :)) do better in Longhorn (as if that's not a big enough hint).
  • Anonymous
    June 15, 2005
    Does any audio equipment use all of that range of dB in the kernel API? It's just that I would have thought that the windows startup tone played some 10^3262.8 times louder than a jet engine is likely to do some serious interstellar damage :)
  • Anonymous
    June 15, 2005
    Just to clarify, 3dB equals a doubling of sound intensity. So an increase in 10dB would mean the sound intensity increases by a factor of ~6.66.
  • Anonymous
    June 15, 2005
    Oh, and by the way. Reading your blog has become part of my daily routine :)
  • Anonymous
    June 15, 2005
    My home stereo has the volume going from -72 (I think) to 0, which confuses us greatly. The old one went from 0 to 63 or so.
  • Anonymous
    June 15, 2005
    So, the secret is finally out, you're working on "Unified Volume Control"?

    Note that this stuff was already announced at WinHEC2003, I wonder why you can't talk about it?

    See:
    http://download.microsoft.com/download/c/f/1/cf1806ad-5a4f-4f7d-a5b2-07fdb59a7adb/WH03_TPA66a.exe
  • Anonymous
    June 15, 2005
    NorwegianGuy: Just to clarify a bit more, you are wrong. 10 dB difference really does mean a 10x increase (by definition). 3 dB is roughly a 2x increase, exactly it is 10^0.3 ~ 1.995. (If you want to compute it the other way around, if 3 dB was taken as exactly 2x, then 10 dB would be 2^(10/3) = 2^3.333... ~ 10.079.)

    See e. g. http://en.wikipedia.org/wiki/Decibel
  • Anonymous
    June 16, 2005
    Something like this http://www.winsupersite.com/images/showcase/lh-winhec-01.png maybe?

    Yeah, I know this is probably not true anymore, since it is from WinHEC 2003. But looks a lot better than sndvol32 :)
  • Anonymous
    June 16, 2005
    The comment has been removed
  • Anonymous
    June 16, 2005
    Sorta offtopic but...if longhorn will implement a recent list of apps (like in the screenshot posted) please add a "clear recent list option" having items from some autorun menu app from a cd you used years ago is kinda pointless (hidden tray items list suffers from this)
  • Anonymous
    June 16, 2005
    The comment has been removed
  • Anonymous
    June 16, 2005
    Thanks for clarifying the decibel-stuff. :) Guess I'll go read some wikipedia now ;)
  • Anonymous
    June 16, 2005
    Norman: I missed the bad math, sorry. I thought he was talking about the 3 dB = 2x assertion. (2^3.333... = 10.08 roughly, not 6.66.)