How long is a WAV file?

One question that kept on coming up during my earlier post was "How long is it going to take to play a .WAV file?".

It turns out that this isn't actually a hard question to answer.  The answer is embedded in the .WAV file if you know where to look.  Just for grins, I spent a few minutes and whipped up a function that will parse a WAV file and return its playing time in milliseconds.

Remember that a .WAV file is a RIFF file which contains a "WAVE" chunk; the "WAVE" chunk in turn contains two chunks called "fmt " and "data".  The "fmt " chunk contains a WAVEFORMATEX structure that describes the file.  The code below ends up looking a lot like the "ReversePlay" sample (but I didn't learn about that sample until after I'd written this code :)).
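To make the chunk layout concrete, here is a minimal sketch in portable C that walks the chunks of a RIFF/WAVE image held in memory. It is not the mmio-based code from this post: `read_le32` and `find_chunk` are hypothetical helper names, and the sketch assumes the standard RIFF layout, that is, a 12-byte header ("RIFF", a little-endian 32-bit size, "WAVE") followed by chunks that each start with a four-character ID and a little-endian 32-bit size, with odd-sized payloads padded to an even length.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Read a little-endian 32-bit value from a byte buffer. */
uint32_t read_le32(const unsigned char *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/*
 * Hypothetical helper: find the chunk with the 4-character ID `id` inside a
 * RIFF/WAVE image. Returns a pointer to the chunk's payload and stores the
 * payload size in *size, or returns NULL if the image is not RIFF/WAVE, is
 * truncated, or does not contain the chunk.
 */
const unsigned char *find_chunk(const unsigned char *buf, size_t len,
                                const char *id, uint32_t *size)
{
    assert(buf != NULL && id != NULL && size != NULL);

    /* 12-byte header: "RIFF", total size, form type "WAVE". */
    if (len < 12 || memcmp(buf, "RIFF", 4) != 0 || memcmp(buf + 8, "WAVE", 4) != 0)
        return NULL;

    size_t pos = 12;
    while (len - pos >= 8) {
        uint32_t cksize = read_le32(buf + pos + 4);
        size_t remaining = len - pos - 8;   /* bytes available after the chunk header */

        if (cksize > remaining)
            return NULL;                    /* chunk claims more bytes than exist */
        if (memcmp(buf + pos, id, 4) == 0) {
            *size = cksize;
            return buf + pos + 8;
        }

        /* Skip the payload, plus one pad byte if the size is odd. */
        size_t advance = (size_t)cksize + (cksize & 1u);
        if (advance > remaining)
            return NULL;
        pos += 8 + advance;
    }
    return NULL;
}
```

Calling `find_chunk(image, length, "data", &size)` plays the same role here as the `mmioDescend` call with `MMIO_FINDCHUNK` does in the function below: it locates the chunk and reports its size without reading the samples.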

I'm using the built-in multimedia I/O functions, which have the added benefit of being able to parse RIFF files without my having to come up with a ton of code.

#define FOURCC_WAVE mmioFOURCC('W', 'A', 'V', 'E')
#define FOURCC_FMT mmioFOURCC('f', 'm', 't', ' ')
#define FOURCC_DATA mmioFOURCC('d', 'a', 't', 'a')

DWORD CalculateWaveLength(LPTSTR FileName)
{
    MMIOINFO mmioinfo = {0};
    MMCKINFO mmckinfoRIFF = {0};
    MMCKINFO mmckinfoFMT = {0};
    MMCKINFO mmckinfoDATA = {0};
    MMRESULT mmr;
    WAVEFORMATEXTENSIBLE waveFormat = {0};
    HMMIO mmh = mmioOpen(FileName, &mmioinfo, MMIO_DENYNONE | MMIO_READ);
    if (mmh == NULL)
    {
        printf("Unable to open %s: %x\n", FileName, mmioinfo.wErrorRet);
        exit(1);
    }

    mmr = mmioDescend(mmh, &mmckinfoRIFF, NULL, 0);
    if (mmr != MMSYSERR_NOERROR || mmckinfoRIFF.ckid != FOURCC_RIFF)
    {
        printf("Unable to find RIFF section in .WAV file, possible file format error: %x\n", mmr);
        exit(1);
    }
    if (mmckinfoRIFF.fccType != FOURCC_WAVE)
    {
        printf("RIFF file %s is not a WAVE file, possible file format error\n", FileName);
        exit(1);
    }

    // It's a wave file, read the format tag.
    mmckinfoFMT.ckid = FOURCC_FMT;
    mmr = mmioDescend(mmh, &mmckinfoFMT, &mmckinfoRIFF, MMIO_FINDCHUNK);
    if (mmr != MMSYSERR_NOERROR)
    {
        printf("Unable to find FMT section in RIFF file, possible file format error: %x\n", mmr);
        exit(1);
    }
    // The format tag fits into a WAVEFORMAT, so read it in.
    if (mmckinfoFMT.cksize >= sizeof( WAVEFORMAT ))
    {
        // Read the requested size (limit the read to the existing buffer though).
        LONG readLength = mmckinfoFMT.cksize;
        if (mmckinfoFMT.cksize >= sizeof(waveFormat))
        {
            readLength = sizeof(waveFormat);
        }
        if (readLength != mmioRead(mmh, (char *)&waveFormat, readLength))
        {
            printf("Read error reading WAVE format from file\n");
            exit(1);
        }
    }
    if (waveFormat.Format.wFormatTag != WAVE_FORMAT_PCM)
    {
        printf("WAVE file %s is not a PCM format file, it's a %d format file\n", FileName, waveFormat.Format.wFormatTag);
        exit(1);
    }
    // Pop back up a level
    mmr = mmioAscend(mmh, &mmckinfoFMT, 0);
    if (mmr != MMSYSERR_NOERROR)
    {
        printf("Unable to pop up in RIFF file, possible file format error: %x\n", mmr);
        exit(1);
    }

    // Now read the data section.
    mmckinfoDATA.ckid = FOURCC_DATA;
    mmr = mmioDescend(mmh, &mmckinfoDATA, &mmckinfoRIFF, MMIO_FINDCHUNK);
    if (mmr != MMSYSERR_NOERROR)
    {
        printf("Unable to find data section in RIFF file, possible file format error: %x\n", mmr);
        exit(1);
    }
    // Close the handle, we're done.
    mmr = mmioClose(mmh, 0);
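    //
    // We now have all the info we need to calculate the playing time. Widen to
    // 64 bits BEFORE multiplying: a 32 bit cksize * 1000 overflows once the
    // data chunk is larger than about 4 million bytes.
    //
    if (waveFormat.Format.nAvgBytesPerSec == 0)
    {
        printf("WAVE format has a zero byte rate, possible file format error\n");
        exit(1);
    }
    LONGLONG fileLengthinMS = (LONGLONG)mmckinfoDATA.cksize * 1000;
    fileLengthinMS /= waveFormat.Format.nAvgBytesPerSec;
    return (DWORD)fileLengthinMS;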
}

Essentially this function opens the WAV file specified, finds the RIFF chunk at the beginning, checks that it is a WAVE chunk, then descends into the WAVE chunk.  It locates the "fmt " chunk within the WAVE chunk and reads it into a structure on the stack (making sure that it doesn't overflow the buffer).  It then pops up a level and finds the "data" chunk.  It doesn't bother to read the data chunk; the only thing needed from it is the length of the chunk, which is the number of bytes occupied by the samples in the WAV file.
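The bounded read of the "fmt " chunk can also be sketched without the Windows headers. What follows is a minimal, portable illustration and not the code from this post: `WaveFmt`, `fmt_le16`, `fmt_le32` and `parse_fmt` are hypothetical names, and the sketch assumes the payload starts with the 14-byte little-endian WAVEFORMAT fields, optionally followed by the 16-bit bits-per-sample field that PCM files carry. It reads only the fields it knows about, no matter how large the chunk claims to be, and refuses chunks that are too short.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, portable stand-in for the leading fields of WAVEFORMATEX. */
typedef struct {
    uint16_t format_tag;        /* 1 is WAVE_FORMAT_PCM */
    uint16_t channels;
    uint32_t samples_per_sec;
    uint32_t avg_bytes_per_sec;
    uint16_t block_align;
    uint16_t bits_per_sample;   /* left at 0 if the chunk is too short to hold it */
} WaveFmt;

uint16_t fmt_le16(const unsigned char *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

uint32_t fmt_le32(const unsigned char *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/*
 * Parse the payload of a "fmt " chunk of `cksize` bytes.
 * Returns 1 on success, 0 if the chunk is smaller than the 14-byte WAVEFORMAT.
 * Bytes beyond the fields listed in WaveFmt are ignored, so an oversized
 * chunk can never write past the end of *out.
 */
int parse_fmt(const unsigned char *payload, size_t cksize, WaveFmt *out)
{
    assert(payload != NULL && out != NULL);

    if (cksize < 14)
        return 0;

    out->format_tag        = fmt_le16(payload + 0);
    out->channels          = fmt_le16(payload + 2);
    out->samples_per_sec   = fmt_le32(payload + 4);
    out->avg_bytes_per_sec = fmt_le32(payload + 8);
    out->block_align       = fmt_le16(payload + 12);
    out->bits_per_sample   = (cksize >= 16) ? fmt_le16(payload + 14) : 0;
    return 1;
}
```

Unlike the function above, which carries on with a zeroed structure when the chunk is too small and only fails later at the format tag check, this sketch reports the short chunk directly.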

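Once we have the format of the data, and the number of bytes in the data chunk, it's trivial to figure out how long the sample will take to play: divide the number of bytes by the format's nAvgBytesPerSec, and multiply by 1000 to get milliseconds.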
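The arithmetic can be isolated into a small portable function. This is a sketch rather than the code from this post: `wave_length_ms` is a hypothetical name, and its two inputs stand in for the data chunk's cksize and the format's nAvgBytesPerSec. It widens to 64 bits before multiplying, because a 32-bit `cksize * 1000` wraps around once the data chunk is larger than 4,294,967 bytes, and it rejects a zero byte rate instead of dividing by it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Compute the playing time of a constant-bitrate stream in milliseconds.
 * Returns 1 and stores the result in *out_ms on success; returns 0 if the
 * byte rate is zero. Results too large for 32 bits are clamped to UINT32_MAX.
 */
int wave_length_ms(uint32_t data_bytes, uint32_t avg_bytes_per_sec, uint32_t *out_ms)
{
    assert(out_ms != NULL);

    if (avg_bytes_per_sec == 0)
        return 0;

    /* The cast happens before the multiply, so the product is computed in 64 bits. */
    uint64_t ms = (uint64_t)data_bytes * 1000u / avg_bytes_per_sec;

    *out_ms = (ms > UINT32_MAX) ? UINT32_MAX : (uint32_t)ms;
    return 1;
}
```

For example, 176,400 bytes of 44.1 kHz 16-bit stereo PCM (176,400 bytes per second) comes out to 1000 ms, and a 100,000,000 byte data chunk at the same rate comes out to 566,893 ms, a case where the 32-bit multiply would already have overflowed.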

Btw, please note that this only looks for WAVE_FORMAT_PCM samples - there are other constant bitrate formats that could be supported but I wanted to hard code this to just PCM samples (it IS just a sample program). 

 

To verify that my calculation is correct, I took my function and dropped it into a tiny test harness:

int _tmain(int argc, _TCHAR* argv[])
{
    if (argc != 2)
    {
        printf("Usage: WaveLength <.WAV file name>\n");
        exit(1);
    }
    DWORD waveLengthInMilliseconds = CalculateWaveLength(argv[1]);
    printf("File %S is %d milliseconds long\n", argv[1], waveLengthInMilliseconds);
    DWORD soundStartTime = GetTickCount();
    PlaySound(argv[1], NULL, SND_SYNC);
    DWORD soundStopTime = GetTickCount();
    printf("Playing %S took %d milliseconds actually\n", argv[1], soundStopTime - soundStartTime);
    return 0;
}

If I run this on some of the Vista sounds, I get:

C:\Users\larryo\Documents\Visual Studio 2005\Projects\WaveLength>debug\WaveLength.exe "c:\Windows\Media\Windows Exclamation.wav"
File c:\Windows\Media\Windows Exclamation.wav is 2020 milliseconds long
Playing c:\Windows\Media\Windows Exclamation.wav took 2281 milliseconds actually

The difference between the actual time and the calculated time (261 ms in this run) is the overhead of the PlaySound API itself.  You can see this by trying it on other .WAV files - there appears to be about 200 ms of overhead (on my dev machine) associated with building the audio graph and tearing it down.

Comments

  • Anonymous
    January 10, 2007
    Hey this is great! One more question though. How long does it take to run an empty loop? I'll just do an empty loop until enough time goes by for the sound to play. :)

  • Anonymous
    January 10, 2007
    I guess this would be good information to have if you need to display an estimate to the user, but I hope nobody uses it to determine when to free the sample buffer after an asynchronous call.

  • Anonymous
    January 10, 2007
    I'll add that non-WAVE_FORMAT_PCM .wav files have a "fact" chunk which contains the length of the stream in frames.  Divide this by the number of frames per second (nSamplesPerSec in the format section) and you have the length of the file in seconds.

  • Anonymous
    January 10, 2007
    > The difference between the actual time and the calculated
    > time is the overhead of the PlaySound API itself.
    Sure.  It still means that it's not easy for the calling application to really know when to free the memory.  The caller still has to do polling.
    In Windows 95 OSR2 the difference was even longer.  Sure it was due to a bug and Microsoft developed a fix internally, but Microsoft didn't allow existing customers to get the fix.
    Wednesday, January 10, 2007 7:28 PM by Skywing
    > A bit of a minor nitpick, but you're mixing your CHARs,
    > TCHARs, and WCHARs,
    I thought it wasn't socially acceptable to notice that?  I was thermonucleated a few days ago for noticing things like that.
    > as Michael Kaplan might say.
    Oops.  Does that mean a(nother) thermonuclear civil war will start inside Microsoft?
    > especially given the use of %S
    By the way does that mean that %S works in ordinary Windows versions?  I didn't test it.  I only found it broken in Windows CE and had to make a bunch of calls to MultiByteToWideChar as a workaround.

  • Anonymous
    January 10, 2007
    Shouldn't the cast to LONGLONG take place before the multiply by 1000?

  • Anonymous
    January 10, 2007
    Phaeron: Hmm, I think you might be right.  I never remember the order of promotion (whether the multiply happens before the promotion or after).

  • Anonymous
    January 11, 2007
    Norman: %S always means 'the opposite of the version of printf() you're calling'. That is, if you call sprintf() it means interpret the argument as a string of WCHAR, while if you call swprintf() it means interpret the argument as a string of CHAR. If you want the argument type to be invariant over the use of the UNICODE macro, use %hs for strings that are always CHAR, and %ls for strings that are always WCHAR. I never had a problem with %S on Windows CE when I was using it correctly, but I haven't honestly found much use for it.

  • Anonymous
    January 11, 2007
    Could these 261 ms come in part from the audio latency introduced by the buffersize of the audio card?

  • Anonymous
    January 12, 2007
    > Could these 261 ms come in part from the audio latency introduced by the buffersize of the audio card?
    No, that latency would be introduced downstream and wouldn't affect the return of the PlaySound call.

  • Anonymous
    January 17, 2007
    If you continue to use TCHAR and attendant warts, you're insulated from the future transition to UTF-32. :-)