Improving Recognition Results
This section describes how you can improve recognition results by providing the handwriting recognizer with additional information about the entries that the user is likely to write.
The handwriting recognizer's default parsing is optimized to match the user's handwriting with words that appear in the dictionary for the selected language. By default, the recognizer tries to match the entire handwritten word to a dictionary entry. The recognizer can also match the individual letter shapes in the handwriting and return the resulting string.
If your users are likely to enter words that do not appear in the dictionary, you can help the recognizer by providing additional information about these entries.
The following examples show handwriting that is not easily recognized, so the recognizer cannot convert it accurately if you do not supply additional information.
There are two ways that you can improve recognition results for these cases:
- Use word lists to extend the recognizer's vocabulary with a known set of terms that users are likely to write. Word lists have the greatest impact on recognition quality. For example, the previous example from a set of physician's notes would be more easily recognized with a word list containing medical terms. For more information about using a word list, see "Extend the dictionary with word lists," later in this topic.
- Use input scopes, which identify the data type and format of inputs that are too broad to be enumerated in a list. Recognition quality for the date, social security number, URL, and email strings in the previous example would all benefit from input scopes. For more information about using input scopes, see "Allow for alternate input formats," later in this topic.
Recognition results are returned in an alternates list. The items in the list are ranked by confidence level. Generally, your application chooses the top-ranked recognition result. By providing extra information about special types of content that the user might enter, you can improve the chances that the user's intended entry is the top alternate.
Recommendation
For the best possible recognition experience, use both word lists and input scopes, as applicable, and also combine them with AutoComplete functionality, discussed in Using AutoComplete with Input Panel.
Programming interfaces
The following table summarizes the key interfaces to use when applying input scopes and word lists to text entry applications.
Text entry method | Input scope usage | Key interfaces |
---|---|---|
Input Panel (Windows Forms) | Associate word lists and input scopes with text controls | SetInputScope Function, SetInputScopes Function |
Input Panel (Windows Presentation Foundation) | Same as for Input Panel (Windows Forms). The TextBox.InputScope property is inherited from System.Windows.Controls.TextBox. | |
Custom inking interface | Configure an AnalysisHintNode object with the word list and input scope | Factoid, SetWordlist, CoerceToFactoid |
Extend the dictionary with word lists
If your users are likely to write words or phrases that are not included in the dictionary, you can improve recognition performance by providing a list of these potential entries in a word list. Word lists can significantly improve the accuracy of text recognition in your application. A word list can contain any list of phrases, such as names of places or people, e-mail addresses, special terms, file names, and acronyms and abbreviations.
For example, you could apply word lists in the following scenarios:
- In a handwriting area where the user enters an e-mail address, you might provide a word list that contains e-mail addresses that appear in the user's address book.
- A system designed for data collection in a physician's office might use a word list that contains medical terms.
- An e-commerce form that includes a field for typing the expiration date on a credit card might use a word list that contains a list of month and year combinations, such as 10/07, 10/2007, and other variations.
- An inventory application that presents form fields for part numbers might create a word list of all valid part numbers for the business.
How the recognizer uses word lists
You can use word lists in two ways:
- You can add them to the list of terms in the dictionary. In this case, the recognizer promotes matches from the word list or the dictionary.
- You can replace the list of terms in the dictionary. In this case, the recognizer promotes only matches that appear in the word list.
In both approaches, you can either accept the letter-by-letter translation of the phrase in addition to the matched phrases or restrict results to those that match the word list or dictionary.
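The two modes and the coercion option can be modeled with a small sketch. This is a conceptual illustration only, not the recognizer's actual algorithm or API: the `Alternate` type, the numeric `score`, and the `rank_with_word_list` function are all hypothetical names introduced here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Alternate:
    text: str
    score: float  # hypothetical confidence in [0, 1]; higher is better


def rank_with_word_list(alternates, dictionary, word_list,
                        replace_dictionary=False, coerce=False):
    """Re-rank alternates so that known terms come first.

    replace_dictionary: if True, only word_list terms count as known;
        otherwise word_list and dictionary terms both count.
    coerce: if True, drop letter-by-letter results that match no known term.
    """
    known = set(word_list) if replace_dictionary else set(word_list) | set(dictionary)
    known = {term.lower() for term in known}

    def sort_key(alt):
        # Known terms sort ahead of unknown ones; ties break on score.
        return (alt.text.lower() not in known, -alt.score)

    ranked = sorted(alternates, key=sort_key)
    if coerce:
        ranked = [alt for alt in ranked if alt.text.lower() in known]
    return ranked


alternates = [
    Alternate("tackle cardio", 0.61),  # letter-by-letter guess
    Alternate("tachycardia", 0.58),    # medical term from the word list
    Alternate("tacky", 0.40),          # ordinary dictionary word
]
dictionary = {"tacky", "tackle"}
medical_terms = {"tachycardia", "bradycardia"}

# Mode 1: word list added to the dictionary, letter-by-letter results kept.
extended = rank_with_word_list(alternates, dictionary, medical_terms)
# Mode 2: word list replaces the dictionary, results coerced to the list.
restricted = rank_with_word_list(alternates, dictionary, medical_terms,
                                 replace_dictionary=True, coerce=True)
print([a.text for a in extended])
print([a.text for a in restricted])
```

In the first call the medical term moves to the top while the other results remain available; in the second call only the word-list match survives.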
Allow for alternate input formats
Windows defines input scopes that enable you to classify handwriting according to the type of entry that you expect, such as a date, file name, or e-mail address. The InputScope enumeration (and its managed code counterparts, InputScopeName and InputScopeNameValue) defines a number of data formats for text fields.
For example, the following members are from the System.Windows.Input.InputScopeNameValue enumeration.
Member | Extends recognition to allow |
---|---|
Url | https://www.microsoft.com |
FullFilePath | \\servername\sharename\filename.txt |
EmailSmtpAddress | someone@example.com |
PostalAddress | 123 Main Street, Seattle, WA 98121 |
You can assign one or more of these input scopes to the same field. You describe input scope combinations by using a special regular expression format for input scopes. The following example shows a regular expression that combines two input scopes so that a currency value with or without a currency symbol is allowed:
(!IS_CURRENCY_AMOUNT)|(!IS_CURRENCY_AMOUNTANDSYMBOL)
You can also use a regular expression to define a special text format that is not one of the supported input scopes. For example, you might define a regular expression for a part number that consists of three letters, optionally followed by a hyphen and one to five digits, so that ABC-12345, ABC-3, and ABC are all recognized as valid part numbers.
For more information about the regular expression format, see Regular Expression Syntax Reference.
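To make the part-number format concrete, here is a minimal sketch written with Python's `re` module. Python's syntax is not the input scope regular expression format, so treat this only as a way to check the intended pattern. It assumes that the hyphen and digits are optional as a group (so "ABC-" alone is rejected); `PART_NUMBER` and `is_part_number` are illustrative names.

```python
import re

# Three letters, optionally followed by a group of: optional hyphen + 1-5 digits.
PART_NUMBER = re.compile(r"[A-Za-z]{3}(?:-?\d{1,5})?")


def is_part_number(text):
    """Return True if the whole string matches the part-number format."""
    return PART_NUMBER.fullmatch(text) is not None


for candidate in ["ABC-12345", "ABC-3", "ABC", "AB-1", "ABC-123456"]:
    print(candidate, is_part_number(candidate))
```

The first three candidates match; "AB-1" has too few letters and "ABC-123456" has too many digits.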
How the recognizer uses input scopes
Options for using input scopes are similar to those for word lists:
- An input scope can extend the list of dictionary terms, which enables the recognizer to promote matches to the input scope in addition to matches from the dictionary.
- An input scope can be restrictive, in which case the recognizer returns only results that match the input scope.
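The difference between an extending and a restrictive input scope can be sketched as follows. This is a conceptual model, not how the recognizer is implemented: the two Python patterns are simplified, hypothetical stand-ins for the currency scopes combined earlier, and `rank_by_scope` is an illustrative name.

```python
import re

# Simplified, hypothetical stand-ins for two input scopes.
AMOUNT = r"\d+(?:\.\d{2})?"        # for example 12 or 12.50
AMOUNT_AND_SYMBOL = r"\$" + AMOUNT  # for example $12.50

# Combine the two scopes with alternation so that either format is accepted.
COMBINED_SCOPE = re.compile(f"(?:{AMOUNT})|(?:{AMOUNT_AND_SYMBOL})")


def rank_by_scope(candidates, scope, restrictive=False):
    """Promote candidates that match the scope.

    restrictive=False: matches move to the front, other results are kept.
    restrictive=True: only matches are returned.
    """
    matches = [c for c in candidates if scope.fullmatch(c)]
    if restrictive:
        return matches
    others = [c for c in candidates if not scope.fullmatch(c)]
    return matches + others


candidates = ["S12.50", "$12.50", "512.50"]
print(rank_by_scope(candidates, COMBINED_SCOPE))
print(rank_by_scope(candidates, COMBINED_SCOPE, restrictive=True))
```

A restrictive scope suits fields where any other format is invalid; an extending scope is the safer choice when users might also write ordinary words.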
Allow space for handwriting
In general, handwriting takes more space than text, and there's great variation in the size of individuals' handwriting. Be sure to allow plenty of room on your forms for handwriting, or allow the input area to grow as the user writes, as Input Panel does. If you plan to provide an open space for handwriting, such as found in Windows Journal, consider including stationery options that give users choices in line size.
Provide handwriting guides
Handwriting guides are various styles of stationery or ruled backgrounds for handwriting areas. These guides can help users write more neatly. You can use the InkRecognizerGuide structure to identify the position and layout of these rules. By providing information to the recognizer about the location of handwritten input, you can improve text recognition.
There are three basic types of handwriting guides:
- Free-form (no lines). You can draw a box that defines the borders of the writing area. Provide the location of the box to help the recognizer discern handwriting from extraneous entries.
- Lined guides (horizontal or vertical lines, but not both). With lined guides, the user can freely enter sentences or paragraphs. The recognizer benefits by having information about the location and orientation of the lines. The writing pad in Input Panel and the default stationery used by Windows Journal are examples of lined guides.
- Boxed guides (both horizontal and vertical lines). The user enters each character in a separate box. Although this style of input tends to slow handwriting speed, it can improve the precision of entry when the user needs to write a specific number of characters (for example, an identification number that is exactly nine characters long). The correction area and character pad in Input Panel are examples of boxed guides.
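As a rough illustration of the layout information a boxed guide conveys, the sketch below computes one rectangle per character cell and maps a pen position to its cell. It does not use or reproduce the InkRecognizerGuide structure; `Rect`, `boxed_guide_cells`, and `cell_index_for_point` are hypothetical helpers, and the pixel values are arbitrary.

```python
from typing import NamedTuple


class Rect(NamedTuple):
    left: int
    top: int
    right: int
    bottom: int


def boxed_guide_cells(origin_x, origin_y, cell_width, cell_height,
                      columns, rows=1):
    """Return one Rect per character box, in row-major order."""
    cells = []
    for row in range(rows):
        for col in range(columns):
            left = origin_x + col * cell_width
            top = origin_y + row * cell_height
            cells.append(Rect(left, top, left + cell_width, top + cell_height))
    return cells


def cell_index_for_point(cells, x, y):
    """Return the index of the box containing (x, y), or None if outside."""
    for index, cell in enumerate(cells):
        if cell.left <= x < cell.right and cell.top <= y < cell.bottom:
            return index
    return None


# A single row of nine boxes, for example a nine-character identification number.
cells = boxed_guide_cells(10, 20, 40, 60, columns=9)
print(len(cells), cells[0], cells[-1])
print(cell_index_for_point(cells, 95, 50))
```

Knowing which box a stroke falls in lets an application treat each box as exactly one character, which is what makes boxed input precise for fixed-length entries.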
The following illustration shows handwriting within lined guides, which helps recognition.
The character pad provides more precision for entering Web addresses and other unfamiliar character strings that can be difficult for the recognizer to parse in a lined guide.
The following illustration shows the user writing a web address in the character pad.
Consider how to manage recognition alternates
The handwriting recognizer calculates a list of recognition alternates for a given segment of handwriting. Each alternate is assigned a confidence rating. You might use the top-ranking result by default and present a list of alternatives that the user can choose from. You access recognition alternates by using the GetAlternatesFromSelection or the GetAlternates methods.
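One way to structure the default-plus-choices approach is sketched below. It assumes that alternates arrive as (text, confidence) pairs and models confidence as a three-level enumeration; `Confidence` and `split_alternates` are hypothetical names, and the sketch does not call GetAlternates or GetAlternatesFromSelection.

```python
from enum import IntEnum


class Confidence(IntEnum):  # hypothetical three-level rating
    POOR = 0
    INTERMEDIATE = 1
    STRONG = 2


def split_alternates(alternates, max_choices=5):
    """Return (default_text, other_choices) from (text, Confidence) pairs.

    The highest-confidence alternate becomes the default; the remaining
    distinct texts are offered as choices, in descending confidence order.
    """
    if not alternates:
        return None, []
    # sorted() is stable, so equally rated alternates keep their input order.
    ranked = sorted(alternates, key=lambda alt: alt[1], reverse=True)
    default = ranked[0][0]
    choices = []
    for text, _ in ranked[1:]:
        if text != default and text not in choices:
            choices.append(text)
    return default, choices[:max_choices]


default, choices = split_alternates([
    ("clear", Confidence.INTERMEDIATE),
    ("dear", Confidence.STRONG),
    ("clean", Confidence.POOR),
    ("dear", Confidence.INTERMEDIATE),  # duplicate text is offered only once
])
print(default, choices)
```

Capping the number of choices keeps a correction menu short enough to scan quickly.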
Clearly link a handwriting area to a text box
Users need clear visual feedback that links a handwriting area with the text box that receives the recognized text. For example, when the user clicks a text entry area, sees the Input Panel icon appear, and taps the icon to display Input Panel, the user has established a clear link between Input Panel and the text entry area that will receive the recognized handwriting.
Users expect the writing that they enter in an input area to map to that area when converted to text. If the handwriting area overlaps or abuts more than one text entry area, users get confused about which one will receive the recognized text. Make it easy for users to differentiate the handwriting area for each text entry area. If necessary, provide some type of visual feedback to show where recognized text will be inserted.
Allow the user access to the text after it's inserted
Applications that automatically display a handwriting area when the user sets the input focus to a text entry area need to enable the user to edit the existing contents of the text entry area. For example, the user might want to set the caret within the text entry area and delete characters by using the keyboard. Consider automatically displaying the handwriting area only if the text entry area is empty.
Consider how to time the display of recognition results
The recognition engine is optimized to recognize vocabulary words. Until the entire word has been written, recognition results are inaccurate and tend to change as the writing progresses. Make sure that you're not distracting users by displaying recognition results before they finish writing. Many applications implement some sort of delay; a two-second delay works well.
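The delay can be implemented as a simple gate that opens only after the pen has been idle for the chosen interval. The sketch below is one possible shape, assuming the application can timestamp each stroke; `RecognitionDisplayGate` is a hypothetical class, and timestamps are passed in explicitly so the logic is easy to test (a real application might supply `time.monotonic()`).

```python
class RecognitionDisplayGate:
    """Decide when recognition results may be shown to the user."""

    def __init__(self, delay_seconds=2.0):
        self.delay_seconds = delay_seconds
        self._last_stroke_at = None

    def on_stroke(self, now):
        # Every new stroke restarts the idle period.
        self._last_stroke_at = now

    def should_display(self, now):
        # Nothing to show until the user has written something.
        if self._last_stroke_at is None:
            return False
        return now - self._last_stroke_at >= self.delay_seconds


gate = RecognitionDisplayGate()
gate.on_stroke(now=0.0)
gate.on_stroke(now=1.5)              # the user is still writing
print(gate.should_display(now=3.0))  # only 1.5 seconds idle
print(gate.should_display(now=3.5))  # 2.0 seconds idle
```

Restarting the idle period on every stroke means results appear only once the user pauses, rather than a fixed time after writing begins.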
Implement post-entry correction
In text editing environments that support the Text Services Framework (TSF), users can correct text in Input Panel after it's been recognized and entered into the text entry area. Examples of TSF-enabled environments include the RichEdit control and the MSHTML Editor feature in Internet Explorer. In these environments, invoking Input Panel to correct recognized text displays recognition alternates based on the original ink. The ability to pick alternates can save users significant time.
Starting with Windows Vista, you can use a new TSF programming interface called IHandwrittenTextInsertion to store recognition alternates along with recognized text. Your custom text entry interface can draw from these alternates for its correction interface.
For information about additional benefits of using IHandwrittenTextInsertion, see the reference documentation.
Build date: 12/5/2008