Text to Speech in Mission Impossible 3: A Dissection

Besides being the best of the three MI movies, Mission Impossible 3 has two instances of TTS that deserve some discussion (and clarification). One of the scenes was simple and plausible, while the second was a definite stretch (i.e., not doable with today's technology).

In the first scene with TTS, one of the "good" guys was automating the destruction of a warehouse full of "bad" guys, using vehicles equipped with large guns. When the automation started, the computer began speaking out some information using TTS. I'm pretty sure it was Mac OS X TTS. Definitely low on the naturalness scale, but intelligible nonetheless. (Can anyone confirm which TTS voice this was?)

In the second scene(s), THE "good" guy (i.e., Tom Cruise's character) forces THE "bad" guy, at gunpoint, to read several sentences off a business-card-sized card. The sentences are syntactically well formed but semantically meaningless. Within seconds of the reading being finished, another "good" guy, who has picked up the recording at a location beneath the complex, has generated a highly natural and intelligible TTS voice. That voice is sent back to our protagonist in a bathroom, who can then talk with the "bad" guy's voice.

OK, so I'm actually quite forgiving in movies, giving the technology the benefit of the doubt (i.e., I pretend that I'm watching Sci-Fi and not a modern-day action movie). So, if we assume that this was some other technology beyond TTS, great. No worries. However, if you insist that the movie stick to currently plausible technology, then here's what's wrong with the TTS in this second scene:

1) The TTS voice was generated from just a few sentences. Today, it takes many, many hours of recordings to build a natural-sounding engine.

2) The recording was made in a bathroom next to a loud party and then streamed to a nearby underground location. That is not likely to yield the high-quality recordings one would need for TTS.

3) The recording was streamed through rock. I'm imagining that some signal loss would be encountered in real life.

4) The resulting TTS sounded almost EXACTLY like the "bad" guy (egads, as if it truly was the other actor speaking with Tom lip-synching)! Even the BEST concatenative engines (i.e., those based on 40+ hours of recordings of a person's voice) won't sound just like the real person.

Comments? Alternative takes?
