New log analysis tools in Microsoft Speech Application SDK

As I mentioned in my last post, we've been working on some tools to analyze and report against the copious logs that are generated by the Microsoft Speech Server.

What started as a bit of a stretch goal, just to have something in the product for looking at these log files, has turned into a full-blown effort, and we've ended up with some pretty rich functionality (although there is always more we could do in vNext).

The MSS logs are generated on the server using the Enterprise Instrumentation Framework (EIF). This has the advantage of being very low impact on the server runtime: the performance hit of logging even vast quantities of events is very low. It's also highly flexible in terms of configuring which events get added to the logs, and it supports multiple log sinks.

However, for log analysis tools this makes for a different story. In order to reduce runtime overhead, the logs are stored in a binary format which is great for writing to sequentially, but for the kind of random access queries we'd like to be able to perform on the logs, it is less than ideal. Additionally, the flexibility of the format in terms of configuration and the complexity of the schema used for all the different types of events that can get logged by the server has made it hard to design robust tools with which to analyze the logs.
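To make the trade-off concrete, here is a minimal sketch of an append-only binary log. The record layout (call id, event type, payload length) and the function names are invented for illustration and are not the real EIF or MSS format. Appending is a single write at the end of the stream, but finding the events for one call means reading every record from the start.

```python
import io
import struct

# Hypothetical record layout (NOT the real EIF/MSS format):
# 4-byte call id, 2-byte event type, 2-byte payload length, then the payload.
HEADER = struct.Struct("<IHH")


def append_event(log, call_id, event_type, payload):
    """Appending is cheap: pack the header and write at the end of the stream."""
    log.write(HEADER.pack(call_id, event_type, len(payload)))
    log.write(payload)


def events_for_call(log, wanted_call_id):
    """Finding one call's events requires scanning every record from the start."""
    log.seek(0)
    matches = []
    scanned = 0
    while True:
        header = log.read(HEADER.size)
        if len(header) < HEADER.size:
            break  # end of log
        call_id, event_type, length = HEADER.unpack(header)
        payload = log.read(length)
        scanned += 1
        if call_id == wanted_call_id:
            matches.append((event_type, payload))
    return matches, scanned


# Build an in-memory log with two events for each of 100 calls.
log = io.BytesIO()
for call_id in range(1, 101):
    append_event(log, call_id, 1, b"prompt")
    append_event(log, call_id, 2, b"recognition")

# Looking up a single call still touches all of the records in the log.
matches, scanned = events_for_call(log, 42)
print(len(matches), "matching events found after scanning", scanned, "records")
```

With no index, the cost of every query grows with the size of the whole log rather than with the size of the answer, which is why this kind of format suits the writer much better than the analyst.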

In order to do anything useful we decided that we'd have to get the log data into a more appropriate format for querying and reporting against, so we've put together a command line tool and parallel Data Transformation Services task to import the data into a database. On top of this we have two main components:

  • CallViewer: a standalone tool to run queries against the speech server logs. This tool can be used to find calls matching a specific set of criteria, and then to see the events you're most interested in within those calls. A typical use would be to see summaries of all of the Question & Answer cycles within a call. It's easy to get a quick overview of the 'shape' of an individual call - what the system said, how the user responded etc. - and then to hear a recording of what the user actually said to the system.
  • Speech Application Reports: a set of standard reports designed to display an overview of Microsoft Speech Server usage and how well specific applications hosted on the server are performing. We've developed this on top of the newly launched SQL Server Reporting Services. (I believe we may be the first Microsoft server offering to ship with reporting built on top of this framework.) This is a pretty extensible framework, and our goal has been to provide a baseline set of reports that customers and ISVs can easily augment with custom reports.
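As a rough illustration of why a database makes these two kinds of question easy, here is a small sketch using SQLite from Python. Everything in it is an assumption made for the example: the table names (`calls`, `qa_cycles`), the columns, the sample rows and the helper functions are hypothetical and do not reflect the actual MSS reporting schema or the CallViewer implementation.

```python
import sqlite3

# Hypothetical schema for illustration only; not the real MSS reporting database.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE calls (
        call_id     INTEGER PRIMARY KEY,
        application TEXT,
        started     TEXT
    );
    CREATE TABLE qa_cycles (
        call_id    INTEGER REFERENCES calls(call_id),
        seq        INTEGER,
        prompt     TEXT,
        recognized TEXT,
        confidence REAL
    );
    CREATE INDEX idx_qa_call ON qa_cycles (call_id, seq);
""")

# Made-up sample data: two calls to an imaginary application.
conn.executemany(
    "INSERT INTO calls VALUES (?, ?, ?)",
    [(1, "PizzaOrder", "2004-03-01T10:15:00"),
     (2, "PizzaOrder", "2004-03-01T10:20:00")],
)
conn.executemany(
    "INSERT INTO qa_cycles VALUES (?, ?, ?, ?, ?)",
    [(1, 1, "What size?", "large", 0.91),
     (1, 2, "Which topping?", "mushroom", 0.84),
     (2, 1, "What size?", "small", 0.88),
     (2, 2, "Which topping?", "pepper only", 0.32)],
)


def call_shape(conn, call_id):
    """Return the Question & Answer cycles of one call, in order."""
    return conn.execute(
        "SELECT seq, prompt, recognized, confidence "
        "FROM qa_cycles WHERE call_id = ? ORDER BY seq",
        (call_id,),
    ).fetchall()


def calls_with_low_confidence(conn, threshold):
    """Find calls containing at least one recognition below the threshold."""
    return conn.execute(
        "SELECT DISTINCT c.call_id, c.application "
        "FROM calls c JOIN qa_cycles q ON q.call_id = c.call_id "
        "WHERE q.confidence < ? ORDER BY c.call_id",
        (threshold,),
    ).fetchall()


shape = call_shape(conn, 1)
struggling = calls_with_low_confidence(conn, 0.5)
print(shape)
print(struggling)
```

The first query corresponds to looking at the 'shape' of a single call, and the second to the kind of aggregate question a report might answer. Both are indexed lookups or joins rather than scans of a binary file.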

I'm hoping that some of our partners in the speech industry will see this as a real opportunity to add value to the MSS community. Tuning during the test and pilot stages is one of the most important phases of speech application development, so further reports that help application authors determine what users are struggling with, or which parts of their application need the most work, could be very valuable indeed.

Either way, I'm pretty happy that we've plugged a huge hole in our end-to-end development story by providing these tools out of the box with the Speech Application SDK.

Comments

  • Anonymous
    March 22, 2004
    Bravo, these are quite terrific.
  • Anonymous
    May 06, 2004
    Are there any extensibility points in the CallViewer application?
  • Anonymous
    May 06, 2004
    Sorry, to give an example: would it be possible to extend the CallViewer application to allow a human to transcribe the utterances?
  • Anonymous
    May 06, 2004
    No, there aren't any direct extensibility hooks. However, the reports are designed to be extensible, and are very configurable. The reporting database schema is also much more straightforward. It would be possible to write a tool to walk through the utterances in the reporting DB and extend the schema to include the transcriptions.