Normalizing prompts, part II

I received more feedback than usual about normalizing prompts, so I have decided to talk about it some more.

The standard and most widely agreed upon method of prompt normalization is called RMS, which stands for "Root Mean Square".  It works like this:

1) For each input value, take its square.

2) Sum the squares of all of the input values.

3) Divide the sum by the total number of input values (this gives the mean of the squares).

4) Take the square root of that mean.

Because RMS works from an average over all of the input values, it is far less sensitive to spikes.  If you normalize to peaks, you are essentially telling REDS to normalize to the tallest spike, which is almost never what you want.
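To make the difference concrete, here is a minimal Python sketch of the standard RMS calculation (mean of the squares, then the square root) next to a peak measurement. The function names and the sample values are made up for illustration; this is not REDS code, just the arithmetic.

```python
import math

def rms(samples):
    """Root mean square: square each value, average the squares, take the square root."""
    if not samples:
        raise ValueError("samples must not be empty")
    mean_square = sum(s * s for s in samples) / len(samples)
    return math.sqrt(mean_square)

def peak(samples):
    """Largest absolute sample value, i.e. the tallest spike."""
    return max(abs(s) for s in samples)

# Hypothetical data: a steady, quiet signal, and the same signal with one spike.
quiet = [0.1, -0.1, 0.1, -0.1, 0.1, -0.1, 0.1, -0.1]
spiky = quiet[:-1] + [1.0]

print("quiet: rms=%.3f peak=%.3f" % (rms(quiet), peak(quiet)))
print("spiky: rms=%.3f peak=%.3f" % (rms(spiky), peak(spiky)))
```

With this data the single spike raises the peak tenfold, while the RMS value rises by less than a factor of four, so a peak-based normalization would be driven almost entirely by that one sample.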

So what should you normalize to?  Ideally you would be able to enter an RMS value from one to a hundred for RMS normalization.  Unfortunately, that is not the way it works with REDS.  We do support RMS normalization, but only by normalizing against another wave file.

So which wave file should you pick? Ideally you want the smallest difference in volume between TTS and recorded prompts.  The old SAPI SDK included a TTS tool that let you record a TTS prompt.  If we still shipped the SAPI SDK, you could normalize against a recorded TTS prompt to get as close to the TTS volume as possible, though you might run into problems with noise depending on the TTS text you chose.

Since that tool is not available, the next best choice is to listen to recorded prompts together with TTS.  The easiest way to do this is through the debugger.  Simply use the peml:div tag within a played prompt so that part of the prompt uses a recording and the rest uses TTS.  Then adjust the volume of the recording using external tools (or, if you don't have them, you'll be forced to rerecord the prompt repeatedly) until the recording and the TTS sound as closely matched as you can get them.

Once you have a prompt that matches the TTS volume as closely as possible, normalize all other prompts against that one.
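For readers curious what normalizing against a reference file amounts to numerically, here is a rough Python sketch. It assumes the samples have already been read into lists of floats, and it simply scales a prompt so that its RMS matches the RMS of the reference. The names are hypothetical, it does not guard against clipping, and I am not claiming this is how REDS implements it internally.

```python
import math

def rms(samples):
    """Root mean square of a non-empty list of samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))

def match_rms(samples, reference):
    """Scale samples so that their RMS equals the RMS of the reference."""
    current = rms(samples)
    if current == 0:
        # Pure silence cannot be scaled to a target level; return it unchanged.
        return list(samples)
    gain = rms(reference) / current
    return [s * gain for s in samples]

# Hypothetical data: the reference is the prompt you matched to TTS by ear.
reference = [0.5, -0.5, 0.5, -0.5]
prompt = [0.1, -0.2, 0.1, -0.2]

normalized = match_rms(prompt, reference)
print("before: %.3f  after: %.3f  target: %.3f"
      % (rms(prompt), rms(normalized), rms(reference)))
```

Every sample is multiplied by the same gain, so the shape of the waveform is preserved and only the overall level changes.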
