Full Text Searching Audio and Video Files

The most fun that I had today was playing with the Microsoft Speech SDK v 5.1 for Windows, not to be confused with the Microsoft .NET Speech SDK Beta 2.  Of course I WAS confused for a short while, because I didn't realize at first that these are completely different products.  I had assumed that the .NET version was a complete superset of v 5.1. WRONG!

We have a customer who wants to full text search audio and video files.  The Speech SDK v 5.1 has a few very cool sample applications for speech recognition.  What I am looking into is how to pass it an audio stream instead of using the microphone.  Eventually we will also have to write an IFilter so that the index or SPS search engine can search the recognized content.  And I will have to figure out how to link the recognized text back to its position in the stream, so that you could jump from your search results straight to the "hit" in the audio or video file instead of always starting from the beginning.  I believe that there is a standard for linking media together in this way (but I don't remember the name of the standard).  This may turn into a huge project unless we can find someone else who has done this already, either as a commercial product or as sample code.
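To make the "jump to the hit" idea concrete, here is a minimal Python sketch of the lookup side only. It assumes the recognizer can hand back each recognized word together with its offset (in seconds) into the audio. The transcript data and the names build_index and find_phrase are made up for illustration; none of this is Speech SDK or IFilter code.

```python
from collections import defaultdict


def build_index(timed_words):
    """Map each lowercased word to the offsets (seconds) where it was recognized."""
    index = defaultdict(list)
    for word, offset_seconds in timed_words:
        index[word.lower()].append(offset_seconds)
    return index


def find_phrase(timed_words, phrase):
    """Return the start offset of every occurrence of a multi-word phrase."""
    target = phrase.lower().split()
    if not target:
        return []
    words = [word.lower() for word, _ in timed_words]
    hits = []
    for i in range(len(words) - len(target) + 1):
        if words[i:i + len(target)] == target:
            # The offset of the first word is where playback should start.
            hits.append(timed_words[i][1])
    return hits


# Hypothetical recognizer output: (word, offset in seconds from start of file).
transcript = [
    ("welcome", 0.0), ("to", 0.5), ("the", 0.7), ("quarterly", 0.9),
    ("budget", 1.9), ("review", 2.4), ("the", 3.0), ("budget", 3.2),
    ("is", 3.7), ("final", 3.9),
]

index = build_index(transcript)
single_word_hits = index["budget"]
phrase_hits = find_phrase(transcript, "the budget")
```

A search result would then carry one of these offsets, and the player would seek to it rather than start at the beginning of the file. A real implementation would also have to deal with recognition errors and store the offsets somewhere the search engine can return them.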

I'm off to NYC for a couple of days to meet with NYPD-CT.  It should be a very interesting two days.  BTW, the NYPD Counter Terrorism group was on 60 Minutes on Sunday night.

Go Terps!

Comments

  • Anonymous
    November 17, 2003
    Hi David - yes, I know the distinction between the 5.1 SDK and the .NET Speech SDK is confusing.

    What makes matters even more confusing is that we've actually changed the name of the .NET Speech SDK - now it's the Speech Application SDK! Plus we're now on Beta 3...

    You're right, of course - for your application the Speech SDK 5.1 (aka SAPI) is the right thing to use. The Speech Application SDK is intended for authoring speech-enabled ASP.NET pages, primarily to run on the forthcoming Microsoft Speech Server.

    More confusion... for Longhorn you'll have yet another Speech API (System.Speech)...