Using MathML-Based Speech to Edit Math in Different Math Models

This post discusses how an Assistive Technology program (AT) can use Presentation MathML to create consistent speech for editing equations created with different math models, such as OfficeMath and MathType. A goal is to make the speech and editing experiences as similar as possible, even though the underlying math models differ in significant ways. An important aid for editing is that the editor handle navigation so that the insertion point (IP) and speech are synchronized. When navigation occurs in a MathML copy of the math zone, editing isn’t possible unless there’s a way to convert a location in the MathML copy to the corresponding character position (cp) in the document. Such synchronization is also needed to calculate the bounding rectangles of the math being spoken. Math keyboard input is facilitated by sophisticated input methods, such as special hot keys, autocorrection, and autocompletion, so the AT should not attempt to handle math keyboard input itself.

The post Speaking of math… describes two granularities of math speech: coarse-grained (navigate by words—siblings), which speaks math expressions fluently in a natural language, and fine-grained (navigate by characters), which reveals the content at the insertion point (IP) in enough detail to enable unambiguous editing. It seems clear that an AT can generate the same coarse-grained math speech from the MathML for an equation regardless of the underlying math model. The question arises as to whether the fine-grained math speech can also be the same for different math models.

Math speech generality

To create math speech for all math models, the MathML and the speech generated therefrom need to be rich enough semantically to describe the union of all arguments of all math objects (fractions, subscripts, integrals, math functions, etc.) of the various math models. The post Integrands, Summands, and Math Function Arguments compares some math objects for OfficeMath, Presentation MathML, Content MathML, [La]TeX, MathType/Equation Editor, and Nemeth math braille. To illustrate one difficulty, Presentation MathML, LaTeX, and Nemeth braille don’t have explicit N-ary elements, while the others do. If the same fine-grained math speech is to work for all models, they all need to supply MathML that lets the AT announce that the insertion point is at the end of an integrand, for example.

Another case is the OfficeMath math-function object, which has a function-name argument and an argument for the function. For example, in memory sin 𝑥 is stored as a math-function object with the name “sin” and the math argument 𝑥. It’s important for fine-grained speech to announce “end of function name” when the user navigates past the ‘n’ and “end of argument” when the user navigates past 𝑥 but not yet out of the math-function object. That notifies the user that subsequent keyboard input will go into the function name or the argument, respectively. The user might want to change sin 𝑥 to sin 𝑥², for which the math-function argument is 𝑥², rather than to (sin 𝑥)².

Math semantics are important for correct math speech. For example, most superscripts represent raising a base to a power, so that speaking 𝑎² as “a squared” is correct. But in tensor analysis, superscripts are used as indices and 𝑎² should be spoken as “a superscript 2” or “a sup 2”. Presentation MathML 3.0 doesn’t have a way to distinguish between these cases. Adding new MathML attributes could provide a concise way to convey the semantics for speech. Alternatively, the <semantics> tag could be included with the corresponding Content MathML, but that approach is probably too involved for most ATs.
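The attribute idea can be sketched in a few lines of Python. This is only an illustration: the attribute name data-script-role is hypothetical and is not part of MathML 3.0, and the spoken phrases are the ones quoted above.

```python
import xml.etree.ElementTree as ET

def speak_superscript(mathml: str) -> str:
    """Speak an <msup> whose base and script are single tokens."""
    msup = ET.fromstring(mathml)
    base, script = (child.text for child in msup)
    # "data-script-role" is a hypothetical attribute, not part of MathML 3.0.
    if msup.get("data-script-role") == "index":
        return f"{base} superscript {script}"
    if script == "2":
        return f"{base} squared"
    return f"{base} to the power {script}"

power = speak_superscript("<msup><mi>a</mi><mn>2</mn></msup>")
index = speak_superscript(
    '<msup data-script-role="index"><mi>a</mi><mn>2</mn></msup>'
)
```

Without the attribute the sketch falls back to the power reading, which is the common case.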

Differences in MathML and OfficeMath models

To illustrate differences in computer math models, consider how MathML, MathType, and OfficeMath represent sin 𝑥. In the PowerPoint and RichEdit OfficeMath memory layouts, sin 𝑥 appears as <U+FDD0>sin<U+FDEE>𝑥<U+FDDF>. Here the Unicode character <U+FDD0> is the math-object start delimiter, the <U+FDEE> is the argument separator, and the <U+FDDF> is the math-object end delimiter. Word also has such delimiters but with different values. Starting at the <U+FDD0>, each → arrow key moves past one of these characters. In a math model that doesn’t have the math-function object, no → arrow key is needed to move to the start of the function name or to move out of the math-function argument. That’s a basic difference in UI between models that affects fine-grained math speech. It doesn’t affect coarse-grained math speech, which in English is “sine x” for all math models.
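The difference in arrow-key stops can be made concrete with a small Python sketch. It uses the delimiter values quoted above. The phrase “start of function” and the helper name describe_ip are assumptions made for illustration, and Python counts 𝑥 as one character where an editor counting UTF-16 code units would count two.

```python
# Delimiter values as quoted in the text for the PowerPoint/RichEdit layout.
START, SEP, END = "\uFDD0", "\uFDEE", "\uFDDF"

# Assumed wording for a math-function object; other objects need other phrases.
ANNOUNCE = {
    START: "start of function",
    SEP: "end of function name",
    END: "end of argument",
}

def describe_ip(store: str, cp: int) -> str:
    """Describe the character that follows an insertion point at position cp."""
    if cp >= len(store):
        return "end of math zone"
    return ANNOUNCE.get(store[cp], store[cp])

# sin 𝑥 stored as a math-function object, and in a model with no such object.
with_object = START + "sin" + SEP + "\U0001D465" + END
without_object = "sin" + "\U0001D465"

stops_with = [describe_ip(with_object, cp) for cp in range(len(with_object))]
stops_without = [describe_ip(without_object, cp) for cp in range(len(without_object))]
```

Here stops_with has three more entries than stops_without, one for each delimiter, which is exactly where the fine-grained speech of the two models diverges.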

Ideally the Presentation MathML for sin 𝑥 is

<mrow><mi>sin</mi><mo>&#x2061;</mo><mi>𝑥</mi></mrow>

How do you relate a position in this MathML to the corresponding position in the OfficeMath memory? Do the <mrow> and </mrow> each have a character position (cp)? They do for OfficeMath, but not for MathType. Are <mi>sin</mi> and <mi>𝑥</mi> separated by a character? They are in OfficeMath, but not in MathType. MathType represents the difference between the sin and the 𝑥 by character formatting and doesn’t use object delimiters for math functions.
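Whatever the answer to the cp-mapping questions, the coarse-grained speech depends only on the MathML. The following Python sketch shows one way an AT might derive “sine x” from the MathML for sin 𝑥. The function-name table and the helper name coarse_speech are assumptions for illustration, not part of any AT.

```python
import unicodedata
import xml.etree.ElementTree as ET

# A tiny, assumed table of spoken function names.
FUNCTION_NAMES = {"sin": "sine", "cos": "cosine", "tan": "tangent"}
FUNCTION_APPLY = "\u2061"  # invisible function-application operator

def coarse_speech(mathml: str) -> str:
    """Speak the token elements of a simple MathML expression in order."""
    words = []
    for node in ET.fromstring(mathml).iter():
        text = (node.text or "").strip()
        if not text or text == FUNCTION_APPLY:
            continue  # container elements and the invisible operator are silent
        # NFKC folds math-italic letters such as U+1D465 to plain letters.
        plain = unicodedata.normalize("NFKC", text)
        words.append(FUNCTION_NAMES.get(plain, plain))
    return " ".join(words)

speech = coarse_speech(
    "<mrow><mi>sin</mi><mo>&#x2061;</mo><mi>&#x1D465;</mi></mrow>"
)
```

Nothing in this sketch needs character positions, which is why the coarse-grained speech can be the same for all math models.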

Another cp mapping example is <mfrac><mi>a</mi><mi>b</mi></mfrac>. In MathML, there’s nothing between the numerator <mi>a</mi> and the denominator <mi>b</mi>, while in the OfficeMath backing store, there’s the U+FDEE argument separator. In a MathML model, when at the start of <mi>a</mi> the → arrow key might move directly into the denominator, while in OfficeMath it moves to the end of the numerator, allowing the user to insert characters there. It takes an additional → arrow key to move to the start of the denominator.

In addition to location differences in MathML and math-zone spaces, there are text-length changes resulting from automatic conversions of ASCII and lower-case Greek letters to math-italic letters, hidden text, revision marks, and special objects not representable in MathML such as images, hyperlinks, and fields. So, mapping from MathML space to editing space needs special assistance.
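The text-length change from math-italic conversion is easy to demonstrate. The Python sketch below assumes that character positions count UTF-16 code units, so each math-italic letter, which lies outside the Basic Multilingual Plane, takes two units where its ASCII or Greek counterpart takes one. The helper names are illustrative.

```python
def utf16_length(text: str) -> int:
    """Number of UTF-16 code units, the unit assumed here for cp counting."""
    return len(text.encode("utf-16-le")) // 2

def offset_map(text: str) -> list[int]:
    """UTF-16 offset at which each character of text starts."""
    offsets, position = [], 0
    for ch in text:
        offsets.append(position)
        position += utf16_length(ch)
    return offsets

plain = "a+\u03B1"                    # ASCII a, plus sign, Greek small alpha
italic = "\U0001D44E+\U0001D6FC"      # math-italic a, plus sign, math-italic alpha

plain_offsets = offset_map(plain)
italic_offsets = offset_map(italic)
```

The two offset lists disagree from the second character on, so even this simple case needs a lookup rather than a fixed shift.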

MathPlayer uses a MathML representation of an equation and navigates in the abstract MathML space, not the editing space. Hence it needs a way to transfer MathPlayer locations to the user selection for inserting/deleting/selecting text and displaying bounding rectangles. In principle, the MathML writer can create a cp array indexed by the MathML-tag index. Every MathML tag would have an entry in the array, including all closing tags. Then a new UIA method could allow an AT navigating in the MathML space to set a client selection end to the cp for the nth tag. In particular, the AT could set the edit insertion point and bounding rectangles corresponding to locations in the MathML space.
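A minimal Python sketch of such a cp array follows, for the sin 𝑥 math-function layout described earlier. The writer function, its name, and the tag-counting regular expression are assumptions for illustration. They are not the RichEdit writer or a real UIA method, and cps here count Python characters rather than UTF-16 code units.

```python
import re

START, SEP, END = "\uFDD0", "\uFDEE", "\uFDDF"

def write_function_mathml(store: str, base_cp: int = 0) -> tuple[str, list[int]]:
    """Write MathML for a store laid out as START name SEP argument END.

    Returns the MathML plus a cp array with one entry per tag,
    closing tags included.
    """
    sep, end = store.index(SEP), store.index(END)
    pieces = [
        ("<mrow>", 0),
        ("<mi>", 1), (store[1:sep], None), ("</mi>", sep),
        ("<mo>", sep), ("&#x2061;", None), ("</mo>", sep),
        ("<mi>", sep + 1), (store[sep + 1:end], None), ("</mi>", end),
        ("</mrow>", end + 1),
    ]
    mathml = "".join(text for text, _ in pieces)
    tag_cps = [base_cp + cp for _, cp in pieces if cp is not None]
    return mathml, tag_cps

store = START + "sin" + SEP + "\U0001D465" + END
mathml, tag_cps = write_function_mathml(store)

# The AT counts tags as it walks the MathML and looks up the cp for the nth tag.
tags = re.findall(r"</?\w+>", mathml)
cp_after_name = tag_cps[tags.index("</mi>")]
```

The first </mi> maps to the position just before the argument separator, which is where “end of function name” would be announced.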

This approach requires that the AT keep track of the MathML tag indices. The post Math Accessibility Trees compares a display tree to a semantic tree for the equation

1/(2π) ∫_0^(2π) dθ/(a + b sin θ) = 1/√(a² − b²)

This equation appears in Nemeth braille as

⠹⠂⠌⠆⠨⠏⠼⠮⠰⠴⠘⠆⠨⠏⠐⠹⠨⠈⠈⠙⠨⠹⠌⠁⠬⠃⠀⠎⠊⠝⠀⠨⠹⠼⠀⠨⠅⠀⠹⠂⠌⠜⠁⠘⠆⠐⠤⠃⠘⠆⠐⠻⠼

There are nodes in both trees to attach the MathML tag indices to. Each node needs to cache the tag indices for the start and end tags that delimit the node.

This approach isn’t likely to be implemented by Microsoft Word, since Word creates MathML by converting the OMML for the requested math to MathML using OMML2MML.xsl, a process that doesn’t keep track of MathML tag cp’s. Word would have to create a native MathML writer to implement a tag-cp mapping array. Such an array could be implemented in the RichEdit native MathML writer, thereby enabling tag-cp mapping in PowerPoint and OneNote. The RichEdit MathML writer was created before the OfficeMath build up/down facility was written. That facility uses a subset of the TOM (Text Object Model) interfaces that is implemented by Word and OfficeArt, enabling them to use the facility. The RichEdit MathML reader also uses the TOM subset and in principle could be used by Word instead of converting MathML to OMML using MML2OMML.xsl. In contrast, the RichEdit MathML writer is pretty RichEdit-specific. Math zones can be copied from Word into RichEdit, but the original Word cp’s aren’t copied.

If navigation is done in the editing space, the post Editing Math using MathML for Speech describes two ways for an AT to produce fine-grained math speech from MathML: 1) the MathML contains an <maction> element that gives the explicit math speech for the content at the insertion point, or 2) the MathML represents the math object in which the insertion point is located and includes an <maction> element identifying the insertion point. The first approach typically leads to different speech for different programs.

Character navigation in the editing space depends on the order in which math-object arguments appear in memory and how the arrow keys are handled. All math models put the numerator of a fraction before the denominator and can in principle be traversed using the ← and → arrow keys. MathType puts the integrand of an integral object first, followed by the lower limit and then the upper limit, while OfficeMath puts the limits first followed by the integrand, which is the visual order. Also, MathType uses ↑ and ↓ to navigate between N-ary object arguments vertically, perhaps because the arguments aren’t located in visual order and the ← and → arrow keys are strictly geometric and don’t traverse every character in an equation. In OfficeMath, the ← and → arrow keys move logically, rather than purely geometrically, and traverse every character in an equation. For example, the → arrow key at the end of a numerator moves to the start of the denominator. The MathML <mroot> puts the index after the radicand, while OfficeMath puts it before (in display order).
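The argument-order differences can be captured in a small table. The Python sketch below reorders arguments between the orders described above. The table layout, role names, and function name are illustrative assumptions, not any product’s data structure.

```python
# Argument orders as described in the text; keys are (object, model).
ARGUMENT_ORDER = {
    ("integral", "MathType"): ("integrand", "lower limit", "upper limit"),
    ("integral", "OfficeMath"): ("lower limit", "upper limit", "integrand"),
    ("root", "MathML"): ("radicand", "index"),
    ("root", "OfficeMath"): ("index", "radicand"),
}

def reorder(args: list[str], obj: str, source: str, target: str) -> list[str]:
    """Reorder an object's arguments from one model's order to another's."""
    by_role = dict(zip(ARGUMENT_ORDER[(obj, source)], args))
    return [by_role[role] for role in ARGUMENT_ORDER[(obj, target)]]

integral_args = reorder(["dθ", "0", "2π"], "integral", "MathType", "OfficeMath")
root_args = reorder(["x", "3"], "root", "MathML", "OfficeMath")
```

An AT that navigates in the editing space would see the arguments in whichever order the model stores them, so the order in which fine-grained speech announces them differs accordingly.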

The most straightforward approach is to navigate in the editing space, as is done with character and word (sibling) navigation with OfficeMath speech and Narrator. Then the insertion point is always synced to the speech. Characters are entered and deleted correctly, and information that MathML doesn’t represent is retained. MathPlayer was designed to explore equations and wasn’t intended to be used for creating and editing equations. The MathType editor has no accessibility support, so editing equations using speech wasn’t a design scenario. In contrast, it was an essential design scenario for OfficeMath. It wouldn’t be hard to duplicate the richer MathPlayer navigation experience in OfficeMath by navigating in the editing space rather than in a virtual MathML space.

One might ask whether it’s desirable to have a one-size-fits-all fine-grained editing experience. It might be unexpected or even confusing to say “end of argument” after the 𝑥 in sin 𝑥 in environments that don’t have a math-function object. That, coupled with the differences in the way math text is laid out in memory, makes the goal of identical fine-grained speech for all math models seem impractical. But for coarse-grained speech, these details are hidden, and it should be possible to have the same coarse-grained math speech for all math models.

It’s a pleasure to thank Doug Geoffray, Ron Parker, and Neil Soiffer for very helpful discussions on these topics.