Thoughts on speech API topics

I've been chatting with quite a few people on the team this week about what categories of posts would be interesting and relevant (and inserting "that's a great idea, you should blog it" into each conversation :-). I'm thinking these three broad categories will work well for this blog:

  1. "What's new", and what can be done with it. When we release new APIs, you can bet I'll be talking about them here. In fact, come to think of it, our existing APIs are pretty powerful and useful. I should talk about them too.
  2. Multi-modal UI design. Combining speech and GUI in an application can result in a pretty powerful UI, not only for scenarios where your user isn't staring at a screen or hovering their hands over a keyboard & mouse, but also as a simplifier for complex GUIs.
  3. "How-to" for common speech UI implementation challenges, like dictating long serial numbers into fields, highlighting words as they're read, etc. These can be tricky to implement the first time, so I figure some how-to's should save you time and help you avoid pitfalls if you ever need to implement any of them.

Anyway, if there's anything else you'd like me to talk about, let me know.

Also check out some other speech bloggers I work with: Richard and Jen.

Comments

  • Anonymous
    February 21, 2005
    SAPI 4.0a had phoneme segmentation, but that was the only Microsoft Speech Recognition API that ever did (although one needed to <a href="http://groups-beta.google.com/group/comp.speech.research/msg/7cc34a16612cf98a?dmode=source">jump through hoops</a> to get to it). Those of us developing educational applications will thank you.
  • Anonymous
    February 22, 2005
    Hi Robert,

    There is so little information out there on such a wonderful product as the Speech Application SDK.

    It would be great if your team put more effort into bringing all the good work to the community and enabling them with speech technologies.

    Regards
  • Anonymous
    February 25, 2005
    May I suggest a short topic for you to cover as an example? I've looked all over the net and have been unable to find text navigation performed with SAPI. Everyone can read out a string of text. The real processing required for navigation is something most people cannot wrap their minds around. I know these guys (http://slappy.cs.uiuc.edu/fall00/team3/mainframe.html) had a problem with SAPI navigation too. So as a request from a desperate college community, "PLEASE" show us the right way to do navigation. :-) Thanks.