What IS audio on a PC anyway?

10/26/2004

This may be well known, but maybe not (I didn’t understand it until I joined the Windows Audio team).

Just what is digital audio, anyway? Well, at its core, all of digital audio is a “pop” sound made on the speaker. When you get right down to it, that’s all it is. A “sound” in digital audio is a voltage spike applied to a speaker jack, with a specific amplitude. The amplitude determines how much the speaker diaphragm moves when the signal is received by the speaker.

That’s it, that’s all that digital audio is – it’s a “pop” noise. The trick that makes it sound like Sondheim is that you make a LOT of pops every second – thousands and thousands of pops per second. When you make the pops quickly enough, your ear puts the pops together and turns them into a continuous sound. You can hear a simple example of this effect when you walk near a high voltage power transformer. AC power in the US runs at 60 cycles per second, and as the transformer works, it emits a noise on each cycle. The brain smears those individual noises together and turns them into the “hum” that you hear near power equipment.

Another way of thinking about this (thanks Frank) is to consider the speaker on your home stereo. As you’re listening to music, if you pull the cover off the speaker, you can see the cone move in and out with the music. Well, if you were to take a ruler and measure the displacement of the cone from its rest position (call that 0), the distance that it has moved from that origin is the amplitude of the pop. Now start measuring really fast – thousands of times a second. Your collected measurements make up an encoded representation of the sound you just heard.

To play back the audio, take your measurements, and move the cone the same amount, and it will reproduce the original sound.

Since a picture is worth a thousand words, Simon Cooke was gracious enough to draw the following...

Take an audio signal, say a sine wave:

Then, you sample the sine wave (in this case, 16 samples per cycle):

Each of the bars under the sine wave is a sample. When you play back the samples, the speaker will reproduce the original sound. One thing to keep in mind (as Simon commented) is that the output waveform doesn't look quite like the stepped function that the samples would generate. Instead, after the Digital-to-Analog Converter (DAC) in the sound card, there's a low pass filter that smooths the output signal.
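As a rough illustration of the sampling step described above, here is a minimal Python sketch that measures one cycle of a sine wave at 16 evenly spaced points and prints a crude text bar for each measurement. The function name `sample_sine` and the bar width of 20 characters are made up for this example; they are not from any audio API.

```python
import math

SAMPLES_PER_CYCLE = 16  # same count as the figure described in the text


def sample_sine(cycles=1, samples_per_cycle=SAMPLES_PER_CYCLE):
    """Return evenly spaced amplitude measurements of a sine wave in [-1.0, 1.0]."""
    total = cycles * samples_per_cycle
    return [math.sin(2 * math.pi * n / samples_per_cycle) for n in range(total)]


samples = sample_sine()

for n, value in enumerate(samples):
    # One text "bar" per sample; its length is proportional to the amplitude.
    bar = "#" * round(abs(value) * 20)
    sign = "-" if value < 0 else "+"
    print(f"{n:2d} {value:+.3f} {sign}{bar}")
```

The list `samples` is the "encoded representation" of the wave: playing it back means driving the speaker cone to each of those positions in turn, at the same rate the measurements were taken.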

When you take an analog audio signal and encode it in this format, the result is known as “Pulse Code Modulation”, or “PCM”. Ultimately, all PC audio comes out as PCM; that’s typically what’s sent to the sound card when you’re playing back audio.

When an analog signal is captured (in a recording studio, for example), the amplitude of the signal is sampled at some frequency (44.1 kHz for CD audio). Each of the samples is captured with a particular range of amplitudes (the quantization). For CD audio, the quantization is 16 bits per sample, which means that each sample has one of at most 65,536 values, and that is enough for most audio applications. Since CD audio is stereo, there are two channels, so there are two 16 bit values for each sampling instant.
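To make the quantization step concrete, the following sketch maps amplitudes in the range -1.0 to 1.0 onto signed 16 bit integers and packs them as interleaved stereo PCM, then wraps the result in a WAV container using Python's standard `wave` module. The helper names (`quantize`, `make_tone_frames`), the 440 Hz test tone, and the 441-frame length (10 ms at 44.1 kHz) are assumptions chosen for illustration.

```python
import io
import math
import struct
import wave

SAMPLE_RATE = 44_100  # CD sampling rate, in Hz
BITS = 16             # CD quantization, in bits per sample
CHANNELS = 2          # stereo


def quantize(value, bits=BITS):
    """Map an amplitude in [-1.0, 1.0] to a signed integer of the given width."""
    max_level = 2 ** (bits - 1) - 1  # 32767 for 16 bits
    clipped = max(-1.0, min(1.0, value))
    return round(clipped * max_level)


def make_tone_frames(freq_hz=440.0, num_frames=441):
    """Return interleaved little-endian 16 bit stereo PCM bytes for a sine tone."""
    frames = bytearray()
    for n in range(num_frames):
        level = quantize(math.sin(2 * math.pi * freq_hz * n / SAMPLE_RATE))
        frames += struct.pack("<hh", level, level)  # left value, then right value
    return bytes(frames)


pcm = make_tone_frames()

# Wrap the raw PCM in a WAV header, written to memory rather than to disk.
buffer = io.BytesIO()
with wave.open(buffer, "wb") as wav:
    wav.setnchannels(CHANNELS)
    wav.setsampwidth(BITS // 8)
    wav.setframerate(SAMPLE_RATE)
    wav.writeframes(pcm)

print(f"{2 ** BITS:,} possible values per sample, {len(pcm):,} bytes of PCM")
```

Each stereo frame here takes four bytes: two for the left channel and two for the right.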

Other devices, like telephones, typically use 8 bit samples and acquire their samples at 8 kHz – that’s why the sound quality of telephone communications is so poor (btw, telephones don’t actually use direct 8 bit samples; instead their data stream is compressed (companded) using a format called mu-law (or a-law in Europe), standardized as G.711). On the other hand, the bandwidth used by typical telephone communication is significantly lower than CD audio – CD audio’s bandwidth is 44,100*16*2 = 1,411,200 bits/second (about 1.35 Mb/second if you count 1,024 bits to the Kb), or 176,400 bytes/second (about 172 KB/second). The bandwidth of a telephone conversation is 8,000*8 = 64,000 bits/second, or 8,000 bytes/second (reduced to somewhere between 3.2 Kb/s and 11 Kb/s with compression), more than an order of magnitude lower. When you’re dealing with low bandwidth networks like the analog phone network or wireless networks, this reduction in bandwidth is critical.
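The bandwidth arithmetic above is just sample rate times bits per sample times channel count. A small sketch, with `pcm_bit_rate` as a made-up helper name:

```python
def pcm_bit_rate(sample_rate_hz, bits_per_sample, channels):
    """Bits per second needed for uncompressed PCM."""
    return sample_rate_hz * bits_per_sample * channels


cd_bits = pcm_bit_rate(44_100, 16, 2)    # CD audio: 44.1 kHz, 16 bit, stereo
phone_bits = pcm_bit_rate(8_000, 8, 1)   # telephone: 8 kHz, 8 bit, mono

for name, bits in (("CD audio", cd_bits), ("Telephone", phone_bits)):
    print(f"{name}: {bits:,} bits/s = {bits // 8:,} bytes/s")

print(f"CD audio needs roughly {cd_bits // phone_bits} times the telephone bandwidth")
```

Whether 1,411,200 bits/second is quoted as 1.41 Mb/s or 1.35 Mb/s depends only on whether a Kb is taken to be 1,000 or 1,024 bits.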

It’s also possible to sample at higher frequencies and with larger sample sizes. Some common sample sizes are 20 bits/sample and 24 bits/sample. I’ve also seen 96 kHz sample frequencies and sometimes even higher.

When you’re ripping your CDs, on the other hand, it’s pointless to rip them at anything other than 44.1 kHz, 16 bit stereo; there’s nothing you can do to improve the resolution. There ARE other forms of audio that have a higher bit rate. For example, DVD-Audio allows sample rates of 44.1, 48, 88.2, 96, 176.4 or 192 kHz, and sample sizes of 16, 20, or 24 bits/sample, with up to six channels at 96 kHz or two channels at 192 kHz.

One thing to realize about PCM audio is that it’s extraordinarily sparse – there is a huge amount of compression that can be done to reduce the size of the audio data. But in most cases, when the data finally hits your sound card, it’s represented as PCM data (this isn’t always the case; for example, if you’re using the S/PDIF connector on your sound card, the data sent to the card isn’t necessarily PCM).

Edit: Corrected math slightly.

Edit: Added a couple of pictures (Thanks Simon!)

Edit3: Not high pass, low pass filter, thanks Stefan.

Comments

Anonymous
October 26, 2004
Simon, you're right, the analog filtering after the DAC smears them out, but from a conceptual standpoint, that was the easiest way of explaining the idea that these are discrete samples.

And 60Hz is above the lower threshold of human hearing (roughly 20Hz), which is why you can hear the transformer. I chose it because it was the most common example of a low frequency sound I could come up with.
Anonymous
October 26, 2004
Oh, and you're right, phone is 8kHz, not 11kHz. That's why I indicated that it was for devices LIKE telephones.
Anonymous
October 26, 2004
Your units are wrong on your math. Your "44,1000" also has one too many 0s. The correct formula is:

44,100 (1/s) * 16 (b) * 2 = 1411200 b/s = 1378.125 Kb/s = 1.35 Mb/s.

That's in bits. In bytes, you get:

(44,100 (1/s) * 16 (b) * 2) * 1/8 (B/b) = 176400 B/s = 172.27 KB/s = 0.17 MB/s

Thus, your numbers are correct but your units (bytes vs. bits) are wrong, assuming the standard notation of 'b' = bits and 'B' = bytes. I also used 1024 B/KB (b/Kb, KB/MB, Kb/Mb) rather than 1000.
Anonymous
October 26, 2004
Strangely, given the topic, this is one case where a picture really is worth a thousand words.
Anonymous
October 26, 2004
I'd like to do a picture, but I'm not good enough in Mathematica to do it justice, and I'm not going to steal someone else's work.
Anonymous
October 26, 2004
Now another question along the same lines: in Media Player, the visualizations that run while you're listening to music, like the graphical equalizer and such, do those pretty much tap into the same signal to draw on the screen? I would assume so, but I'm not sure; it's one of those things I just sat back and enjoyed, and was curious about, but never dug into.

But I am assuming those pulses for the speaker are the same pulses that you see on the screen.
Anonymous
October 26, 2004
Yup, visualizations operate on the samples being sent to the sound card. Internally they're implemented as dshow filters that render their samples to the screen instead of performing some kind of transform on them.
Anonymous
October 26, 2004
Might want to clarify that you're talking about cellphones, not landline phones. The yunguns might be confused.

You might also clarify that you can transfer PCM over S/PDIF, but that isn't the only data format available. Further, in the PCM case, the PCM data might be encoded for SPDIF on the card itself, not on the host PC. :)
Anonymous
October 26, 2004
Larry - send me an email, and let me know what you need. I should be able to knock something out pretty quickly.
Anonymous
October 26, 2004
Actually, modern audio compression algorithms can do much more than simply cut off the higher frequencies -- encoders like MP3 have a psycho-acoustic model of the way we hear things to improve compression even further. For example, a strong tone will often "mask" a weak tone which is close to it in frequency, so the second one can often be thrown out as part of the lossy compression.
Anonymous
October 26, 2004
I've read that a sample rate higher than 2x the highest sampled frequency is unnecessary according to Nyquist's theorem. If this is so, and human hearing tops out at around 20KHz (or let's say 30KHz for the super humans), then sample rates higher than 40-60KHz are just taking up more space on our storage media for no real gain.

Some say that there are still effects from those inaudible frequencies on those in the audible range. If that is so then recording microphones (which also top out at around 20KHz) would still record the effects of those frequencies. Any other improvement in perceived sound quality is attributed to the quality of the equipment in the recording and/or playback chain (or is psychological).

Search for Dan Lavry on usenet or elsewhere for a more in depth discussion/argument of this.
Anonymous
October 26, 2004
Chris - there is a subtlety here.

The Nyquist limit details how many samples one must take to accurately reproduce a signal of a given frequency without aliasing.

If a sound was generated by a single stationary point source in an infinitely absorbing room (i.e. no echo), then Nyquist will tell you everything you need to know to reproduce that sound.

However, when you start positioning sounds in space, higher frequencies become important. While the human ear can only hear frequencies up to about 22.5kHz (on average - some people can hear more, some less), it can discriminate between the arrival times of sounds at much higher resolution - on the order of what would be a frequency of 100,000Hz. That is, if the same sound wave arrives 10 microseconds apart, at one ear first and then the other, the listener can tell the difference, and interprets this as spatial separation of the sounds.

A lot of positioning information is encoded in the higher frequency domain. So while Nyquist is strictly correct for a given signal, it's a very much idealised form when you're dealing with stereo positional audio.
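The aliasing that the Nyquist limit guards against can be shown with a few lines of Python. This is only a sketch: the 30 kHz test tone and the helper `sample_cosine` are assumptions picked for illustration. Sampled at 44.1 kHz, a 30 kHz cosine produces the same sample values as a 14.1 kHz cosine (44.1 kHz minus 30 kHz), so once sampled the two tones cannot be told apart.

```python
import math

SAMPLE_RATE = 44_100          # CD sampling rate, in Hz
NYQUIST = SAMPLE_RATE / 2     # highest frequency that can be represented


def sample_cosine(freq_hz, sample_rate_hz, count):
    """Sample a unit-amplitude cosine of the given frequency."""
    return [math.cos(2 * math.pi * freq_hz * n / sample_rate_hz) for n in range(count)]


tone_hz = 30_000                  # above the Nyquist limit
alias_hz = SAMPLE_RATE - tone_hz  # 14,100 Hz, below the Nyquist limit

above = sample_cosine(tone_hz, SAMPLE_RATE, 64)
alias = sample_cosine(alias_hz, SAMPLE_RATE, 64)

# Apart from floating point rounding, the two sample lists are identical.
max_diff = max(abs(a - b) for a, b in zip(above, alias))
print(f"Nyquist limit: {NYQUIST:,.0f} Hz, largest sample difference: {max_diff:.2e}")
```

This is why the signal has to be low pass filtered before it is sampled: anything above the limit that gets through shows up as a spurious lower frequency.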
Anonymous
October 26, 2004
Wow.

Thanks for that.
Anonymous
October 26, 2004
10/26/2004 2:31 PM Eric Lippert

> That's why when you listen to overly
> compressed audio, things like applause
> and symbol crashes sound awful.

Ann if ewe overtly comprise dictionaries four spilling chequers, sings like cymbal clashes look awe full? @ leased they sound like symbol crashes.
Anonymous
October 27, 2004
There is a different method of raw audio encoding than PCM which is used by Super Audio CD; a 1 bit digital stream known as DSD.

"The DSD technology uses a sampling frequency of 2.8224 MHz, which is 64 times higher than that of CD. This enables a frequency response up to 100 kHz and a dynamic range of 120 dB across the entire audible range."

From http://www.licensing.philips.com/information/sacd/protec/documents1089.html
Anonymous
October 27, 2004
DAT is still used in the music industry though.
Anonymous
October 27, 2004
I remember being told that one of the other problems with the sampling rates is that before encoding, the source signal must be low pass filtered to prevent frequencies above the Nyquist limit sneaking through and causing aliasing.

Because this filtering cannot be ideal (because of the group delays mentioned above), you lose some of the higher frequencies below the Nyquist limit. If you raise the sample rate, you can move the filter cutoff higher and you save more of the perceptible audio.
Anonymous
November 01, 2004
It might annoy some at Microsoft to call out the comparison, but when I read Scoble this morning I couldn't help but think of Big Blue. One of the factors about IBM I always find impressive is seniority. Folks stay at...
Anonymous
April 29, 2005
Before I can talk about reading audio CDs using DAE (Digital Audio Extraction), I need to talk a bit...
Anonymous
June 15, 2005
I've been talking about audio controls - volumes and mutes and the like, but one of the more confusing...
