Mathematical RTF
This post discusses the Word 2007 math RTF control words. A good way to understand these control words is to note that they are actually OMML tag names written with RTF syntax. Hence you can refer to the very thorough OMML documentation for more detailed information. For example, in OMML the built-up skewed fraction for a/b is represented by
<m:f>
<m:fPr>
<m:type m:val="skw"/>
</m:fPr>
<m:num>
<i>
<m:r>a</m:r>
</i>
</m:num>
<m:den>
<i>
<m:r>b</m:r>
</i>
</m:den>
</m:f>
In RTF, it can be represented by
{\mf{\mfPr{\mctrlPr}{\mtype skw}}
{\mnum\u-10187?\u-9138?}
{\mden\u-10187?\u-9137?}}
If you want the text to inherit character formatting from the surrounding text, you need to include the math object's properties group, here {\mfPr…}, and inside it the {\mctrlPr} group, even if the latter is empty.
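As a rough illustration of the fraction layout shown above, here is a minimal Python sketch that assembles the same group structure as a string. The function name rtf_fraction and its parameters are hypothetical, not part of any Word or RTF API, and the sketch assumes the numerator and denominator are passed in as already valid math RTF.

```python
def rtf_fraction(num: str, den: str, frac_type: str = "") -> str:
    """Build a math RTF fraction group (hypothetical helper, not a Word API).

    num and den must already be valid math RTF, for example surrogate-pair
    escapes or {\\mr ...} runs. frac_type is an optional fraction type such
    as "skw", as in the example in the post.
    """
    # Always emit the properties group with an (empty) \mctrlPr group, since
    # the post says it is needed to inherit surrounding character formatting.
    props = r"{\mctrlPr}"
    if frac_type:
        props += r"{\mtype " + frac_type + "}"
    # The space after \mnum and \mden is the RTF control-word delimiter.
    return (r"{\mf{\mfPr" + props + "}"
            + r"{\mnum " + num + "}"
            + r"{\mden " + den + "}}")


if __name__ == "__main__":
    # Math italic a over math italic b as a skewed fraction.
    print(rtf_fraction(r"\u-10187?\u-9138?", r"\u-10187?\u-9137?", "skw"))
```

The output differs from the hand-written example only in the delimiter space after \mnum and \mden and in the absence of line breaks between the groups.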
Word generally doesn't write surrogate pairs for the math alphabetics, but they work and they're simpler to use since they're used internally for most math variables. Word writes {\mr\mscr0\msty2 a} for the math italic a (U+1D44E) in the numerator of the fraction above and {\mr\mscr0\msty2 b} for the math italic b (U+1D44F) in the denominator, probably because it's easier for human beings to understand, especially since U+1D44E is represented in RTF as the decimal surrogate pair \u-10187?\u-9138?. But the extra translation isn't really that important since RTF is usually only handled by computers. In case you really need to know what the UTF-32 value is, you can convert the RTF pair to hexadecimal form D835 DC4E by pasting -10187 -9138 into the "Decimal code points" box of the Unicode Code Converter, and then convert that to 1D44E. Surrogate pairs must appear inside math object groups as in this example, or inside a math text-run group {\mr…} if not inside a math object. Technically for RTF the latter case shouldn't be necessary, but it happens because Word's RTF reader shares code with the OMML reader and OMML requires the <m:r>.
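If you'd rather not paste numbers into a converter, the arithmetic is short enough to script. The following Python sketch (function names are made up for illustration) converts a signed decimal RTF surrogate pair to its UTF-32 code point and back again.

```python
def surrogate_pair_to_code_point(high: int, low: int) -> int:
    # RTF writes \uN as a signed 16-bit decimal, so mask to get the
    # unsigned UTF-16 code unit first.
    hi = high & 0xFFFF
    lo = low & 0xFFFF
    if not (0xD800 <= hi <= 0xDBFF and 0xDC00 <= lo <= 0xDFFF):
        raise ValueError("not a valid UTF-16 surrogate pair")
    return 0x10000 + ((hi - 0xD800) << 10) + (lo - 0xDC00)


def code_point_to_rtf(cp: int) -> str:
    # Split code points above the BMP into a surrogate pair; BMP characters
    # need only one code unit.
    if cp >= 0x10000:
        v = cp - 0x10000
        units = [0xD800 + (v >> 10), 0xDC00 + (v & 0x3FF)]
    else:
        units = [cp]
    out = []
    for u in units:
        signed = u - 0x10000 if u >= 0x8000 else u
        # "?" is the fallback character for readers that don't handle \u.
        out.append("\\u" + str(signed) + "?")
    return "".join(out)


if __name__ == "__main__":
    cp = surrogate_pair_to_code_point(-10187, -9138)
    print(f"U+{cp:X}")                  # U+1D44E
    print(code_point_to_rtf(0x1D44F))   # \u-10187?\u-9137?
```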
Math information is collected into two areas:
- math document properties in the {\mmathPr…} group
- math zones in {\mmath…} groups
Math zones can be inline or "display mode", corresponding to TeX's $ and $$ toggles. With Office math, math zones are identified internally by a character-format effect bit like bold. If a math zone fills an entire paragraph, it is a display-mode math zone. If it shares a paragraph with nonmath text, the math zone is inline. The math RTF for an inline math zone replaces the first ellipsis of the nested group structure
{\mmath {\*\moMath…}{\mmathPict…}}
Readers that don't understand the ignorable {\*\moMath…} group can use one of the pictures in the {\mmathPict…} group. An RTF display-mode math zone replaces the second ellipsis in the nested group structure
{\mmath{\*\moMathPara{\moMathParaPr…}{\*\moMath…}}{\mmathPict…}}
The {\mmathPict…} group is a great backward compatibility feature, but it sure bloats Word's math RTF files. One way to alleviate the bloat is to zip the RTF file, just as the docx format is zipped.
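The two nested group structures above can be sketched as a pair of Python helpers. The function names and parameters are hypothetical. The sketch also assumes that a reader will accept a math zone with no {\mmathPict…} group and an empty {\moMathParaPr} group, which the post doesn't confirm, so treat those defaults as assumptions to test against Word.

```python
def inline_math_zone(math_rtf: str, fallback_pict: str = "") -> str:
    """Wrap math RTF in an inline math zone (hypothetical helper)."""
    zone = r"{\mmath{\*\moMath " + math_rtf + "}"
    if fallback_pict:
        # Picture data for readers that skip the ignorable \moMath group.
        zone += r"{\mmathPict " + fallback_pict + "}"
    return zone + "}"


def display_math_zone(math_rtf: str, para_props: str = "",
                      fallback_pict: str = "") -> str:
    """Wrap math RTF in a display-mode math zone (hypothetical helper).

    para_props, if given, is expected to be one or more complete RTF
    groups that go inside the \\moMathParaPr group.
    """
    para = (r"{\*\moMathPara{\moMathParaPr" + para_props + "}"
            + r"{\*\moMath " + math_rtf + "}}")
    zone = r"{\mmath" + para
    if fallback_pict:
        zone += r"{\mmathPict " + fallback_pict + "}"
    return zone + "}"


if __name__ == "__main__":
    print(inline_math_zone(r"{\mr x}"))
    print(display_math_zone(r"{\mr x}"))
```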
Math Objects
Built-up objects like fractions and integrals can appear inside the {\*\moMath…} group and are defined in the following table:
Control word | Meaning |
\macc | Accent object, consisting of a base and a combining diacritical mark. |
\mbar | Bar object, consisting of a base argument and an overbar or underbar |
\mborderBox | Border Box object, consisting of a border drawn around an equation |
\mbox | Box object, which is used to group components of an equation |
\md | Delimiter object, consisting of opening and closing delimiters (such as parentheses, braces, brackets, and vertical bars), and an element contained inside |
\meqArr | Equation-Array object, an object consisting of one or more equations that can be vertically justified as a unit with respect to surrounding text on the line. Alignment of multiple points within each equation can occur within the equation array |
\mf | Fraction object, consisting of a numerator and denominator separated by a fraction bar |
\mfunc | Function-Apply object used for math functions like sin x |
\mgroupChr | Group Character object used for stretching a character above or below other characters |
\mlimLow | Lower limit object |
\mlimUpp | Upper limit object |
\mm | Matrix object, consisting of one or more elements laid out in one or more rows and one or more columns |
\mnary | n-ary object |
\mphant | Phantom object used to introduce or suppress spacing |
\mrad | Radical object |
\msPre | Pre-Sub-Superscript object, which consists of a base e and a subscript and superscript placed to the left of the base |
\msSub | Subscript object |
\msSubSup | Subscript superscript object |
\msSup | Superscript object |
Math Object Arguments
Each math object group contains a property group and one or more arguments. The arguments are contained in the special groups defined in the following argument table:
Control word | Meaning |
\mdeg | Degree argument in radical object |
\mden | Denominator argument in fraction object |
\me | Base argument of a mathematical object |
\mlim | Limit argument of a limLow or limUpp object |
\mfName | Function name argument of the Function-Apply object |
\mnum | Numerator argument of fraction object |
\msub | Subscript argument of n-ary, sPre, sSub, sSubSup objects |
\msup | Superscript argument of n-ary, sPre, sSup, sSubSup objects |
Math RTF Control Words
To see as many examples of math RTF as you desire, type the relevant math into a Word 2007 document and save it as RTF. Then you can use Notepad to see what Word has written. You'll find a huge amount of stuff, but the math RTF will be embedded where it needs to be. That's the way I learned how it worked. Okay, I did look a little at the Word source code ☺. A complete alphabetic listing of all RTF math control words will be part of a new version of the RTF specification, which will appear sometime soon on the web. If you want to start generalizing your RTF reader or writer right now to handle math RTF, you can get the list from the corresponding OMML tags by prefixing them with "\m".
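The "\m" prefix rule is simple enough to express in a few lines. Here is a minimal Python sketch. The function name omml_tag_to_rtf is made up for illustration, and the sketch only maps tag names to control words; it makes no attempt to translate attributes or text runs.

```python
def omml_tag_to_rtf(tag: str) -> str:
    """Map an OMML tag name to its math RTF control word (hypothetical helper).

    Accepts a bare local name ("fPr"), a prefixed name ("m:fPr"), or an
    ElementTree-style qualified name ("{namespace-uri}fPr").
    """
    local = tag.rsplit("}", 1)[-1]      # drop "{namespace-uri}" if present
    local = local.split(":", 1)[-1]     # drop the "m:" prefix if present
    return "\\m" + local


if __name__ == "__main__":
    for name in ("m:f", "m:fPr", "m:num", "m:den", "m:sSubSup", "m:oMath"):
        print(name, "->", omml_tag_to_rtf(name))
```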
Comments
Anonymous
January 25, 2007
Very interesting! :) How can one use the control in an application? Does this have any follow up in WPF's RichTextBox? Thanks!
Anonymous
February 06, 2007
You can use the Office 2007 RichEdit control in an application, but we haven't released the documentation for the new features. RichEdit 5/6 have been extensively tested inside the Microsoft Office environment, but not within the general Windows environment. We hope to get the necessary testing done and release these more advanced versions for general use. At that time we'll include the relevant documentation in the SDK.