Udostępnij za pośrednictwem


OfficeMath

Microsoft Word 2007 and RichEdit 6.0 introduced the native Office math facility. PowerPoint, Excel, and OneNote followed suit in 2010, and Mac Word followed in 2011. But ironically the native math facility hasn’t had a recognizable name. “Microsoft Equation Editor” (MEE) seems natural, but it’s the name of the Design Science math editor that shipped first in Office on Windows and the Mac in 1992 and was recently discontinued due to security problems. In fact, the post Converting Microsoft Equation Editor Objects to OfficeMath needs a name for the native math facility since it describes how you can convert MEE OLE math objects into native math zones. Sometimes people refer to the native math facility as OMML (Office Math Markup Language), which is the XML file format that encapsulates the in-memory math model and is used in docx, pptx, and xlsx files. But the facility is more than a file format since it has TeX typographical quality, is based on Unicode, has user interfaces (UI) and works with an OpenType math font. “Microsoft Math” is a possible name, but other companies might want to ship something similar. None of its specifications are proprietary.

A good name is OfficeMath. “Office” alludes to Microsoft Office but needn’t be exclusive. “Office” suggests a high-quality level (okay, maybe I’m biased ☺). OfficeMath might suggest calculations rather than math text, but documentation can resolve that ambiguity, which also exists for the linear formats AsciiMath and UnicodeMath. The heart of OfficeMath is its in-memory model, named “Professional” in the OfficeMath UI. This model is mirrored in the OMML file format. It features N-ary structures such as integrals with limits and integrands, subscripts, superscripts and accents with well-defined bases, and math functions with function names and arguments. This level of detail is ordinarily reserved for content math formats such as Content MathML and OpenMath. OfficeMath incorporated these structures to support high-quality math typography, with the nice side effect of facilitating symbolic manipulations and graphing (OneNote Math Assistant). This post summarizes OfficeMath’s history, model, file format support, interoperability, math font, and math formatting, and includes links to further information in OfficeMath-oriented posts in Math in Office. OfficeMath UI is discussed in a separate post.

History

A fun place to learn about the origins of OfficeMath is the post LineServices, which tells how the LineServices line-layout component came to be and how it evolved to yield TeX-quality math typography. OfficeMath depends on other technologies as well, including the creation of the math-font OpenType standard described in High-Quality Editing and Display of Mathematical Text in Office 2007 and OpenType Math Tables. For older history, the post How I got into technical WP describes the first math display program (Scroll, 1970) and predecessors of UnicodeMath.

OfficeMath was based on Unicode from the start. Unicode 3.2 (March, 2002) already had most of the current Unicode math character set. The Unicode Technical Committee is committed to including all attested math symbols in the Unicode Standard, so Unicode makes an ideal foundation on which to build math functionality. It also streamlines incorporation into Microsoft Office applications, since they are based on Unicode.

Math Model

As with [La]TeX, MathML, MathType, and other math presentation programs, OfficeMath puts all math expressions and equations into math zones. Math-zone typography differs from the typography of ordinary text (see the section on Formatting below).

In the OfficeMath in-memory "Professional" format, mathematical objects like fraction and subscript are represented by a start delimiter, the first argument, an argument separator if the object has more than one argument, the second argument, etc., with the final argument terminated by an end delimiter. For example, the fraction 𝑎/𝑏 is represented in built-up format by {frac 𝑎|𝑏} where {frac is the start delimiter, | is the argument separator, and } is the end delimiter. Similarly, the subscript object 𝑎𝑏 is represented by {sub 𝑎|𝑏}. The start delimiter is the same character for all math objects as are the separator and end delimiters. In RichEdit, these delimiters are given by the Unicode characters U+FDD0, U+FDEE, and U+FDEF, respectively. In OMML, the start delimiter is represented by an container element, such as <f> for fraction and arguments appear within argument element containers, such as <num>…</num> for a numerator.

The type of object is specified by a character-format property associated with the start delimiter. In plain text, the built-up forms of the fraction and subscript are identical if the fraction arguments are the same as their subscript counterparts. In the example here, a plain-text search for {frac 𝑎|𝑏} matches {sub 𝑎|𝑏} as well as {frac 𝑎|𝑏}. Searching for OfficeMath equations involves plain-text searches like this together with comparison of the object types as discussed in Math Find/Replace and Rich Text Searches. The OfficeMath math objects are listed in the table in the next section along with their OMML and Presentation MathML representations. The objects are represented by prefix notation: the character formatting of the object start delimiter contains the object properties (see ITextRange2::GetInlineObject()). This differs from infix notation like 𝑎/𝑏, which needs to be parsed. The OfficeMath in-memory format is a “built-up” format as distinguished from linear formats like UnicodeMath and LaTeX.

Supported File formats

The OMML format is the XML format that encapsulates the OfficeMath in-memory “Professional” format. When OfficeMath was designed, Presentation MathML 3.0 was nearing publication. But Presentation MathML is missing two important elements which therefore require <mrow> emulations to represent OfficeMath. Specifically, Presentation MathML doesn’t have an explicit N-ary element, nor does it have an explicit math-function element. Furthermore, OfficeMath needs to embed client (Word, PowerPoint, Excel, …) XML easily into the math XML. The MathML <semantics> element can embed such information, but it’s awkward. Accordingly, OMML was created to describe the OfficeMath in-memory format naturally. With best practices, MathML without the <semantics> element can be used to round-trip OfficeMath equations apart from non-math formatting like revision markings and embedded objects.

Here is a listing from MathML and Ecma Math (OMML) of the OMML elements and exact or approximate MathML counterparts

Built-up Office Math Object... OMML tag... MathMl
Accent acc mover/munder
Bar bar mover/munder
Box box menclose (approx)
Boxed Formula borderBox menclose
Delimiters d mfenced
Equation Array eqArr mtable (with alignment groups)
Fraction f mfrac
Math Function func mrow with FunctionApply (2061) mo
Left SubSup sPre mmultiscripts (special case of)
Lower Limit limLow munder
Matrix m mtable
N-ary nary mrow msubsup/moverunder with N-ary mo
Phantom phant mphantom and/or mpadded
Radical rad msqrt/mroot
Group Char groupChr mover/munder
Subscript sSub msub
SubSup sSubSup msubsup
Superscript sSup msup
Upper Limit limUpp mover

Other OMML references are Extracting OMML from Word 2003 Math Zone Images and OMML Specification, Version 2.

More MathML discussion is given in MathML 3.0, Improved MathML support in Word 2007, Rendering MathML in HTML5, and MathML on the Windows Clipboard.

Mathematical RTF is essentially OMML in RTF syntax. See also Office Math RTF and OMML Documentation and Updated RTF Specification.

Linear Format Notations for Mathematics include UnicodeMath and LaTeX Math in Office. See also Recognizing LaTeX Input in UnicodeMath Input Mode.

Interoperability

Major interoperability is afforded via Presentation MathML and [La]TeX math. In addition, the Design Science MEE and MathType equations can be converted to OfficeMath as described in Converting Microsoft Equation Editor Objects to OfficeMath. MathType can convert OfficeMath to MathType equations. These equation facilities are compared in Equation-Editor Office-Math Feature Comparison and Other Office Math Editing Facilities. The latter also compares them to the Microsoft Word EQ Field.

With a bit of effort, equations can be imported into Office applications from Wikipedia articles as described in Copying Equations from Wikipedia into Office Applications. You can create HTML documents with equations in them as described in Creating Math Web Documents using Word 2007.

Math Font

A basic part of OfficeMath is the Unicode OpenType math font. The first such font, Cambria Math, and the OpenType math tables were developed together with the Office 2007 math software, each influencing the other to obtain high quality results. Some history is given in the post High-Quality Editing and Display of Mathematical Text in Office 2007. The font contains extensive math tables, glyph variants and glyphs for most of the Unicode math character set. The tables were incorporated into the OpenType standard as noted in OpenType Math Tables. Posts elaborating on the math font are Special Capabilities of a Math Font and High Fonts and Math Fonts.

Cambria Math and Cambria are serifed fonts designed to look good on digital displays. As such, the stem widths never get skinny, in contrast to Times Roman fonts. If you prefer, the STIX math font is a Times Roman font that includes the OpenType math table support and works with OfficeMath. This font is discussed further in Math STIX Fonts 2.0 and UTR #25 Updates.

Formatting

This section discusses how OfficeMath handles math formatting involving math spacing, math styles, and alignments, and gives links to posts with further information. A math zone is defined by the math-zone character-format effect, an effect like bold or italic. As such, this is a non-nestable property, unlike math objects like fractions, which can be nested arbitrarily deeply. Adjacent math zones automatically merge into a single math zone.

An essential part of good math typography is math spacing. Within a math zone, OfficeMath follows the math spacing rules given in Appendix G of The TeXbook plus some enhancements that weren’t added to TeX for reasons of archivability. Section 3.16 of UnicodeMath summarizes the rules for the most common situations. Also see User Spaces in Math Zones for ways that OfficeMath autocorrects typical user input spacing errors. Two Math Typography Niceties shows how phantom objects can improve math spacing beyond the standard spacing rules.

Math bold and math italic define different math variables in math zones (𝐚 ≠ 𝑎 ≠ a ≠ 𝒂), while in ordinary text, bold and italic are used for emphasis. In math zones, math bold and math italic characters are different Unicode alphanumeric characters, while in ordinary text, bold and italic are character format attributes with no change in character codes. For example, 𝐚 is U+1D41A, 𝑎 is U+1D44E, a is U+0061, and 𝒂 is U+1D482. Even though the math and ordinary-text uses of bold/italic are unrelated semantically, the user can control these math styles using the usual bold and italic UI as described in Using Math Italic and Bold in Word 2007. There are other math styles that yield still different mathematical variables, such as open-face, script, Fractur, and sans serif (see Section 2.2 of Unicode Technical Report #25). In general, character formatting is controlled in math zones as described in Restricted Math Zone Character Formatting. In informal documents, people may want to use sans-serif characters instead of serif characters for aesthetic reasons rather than for defining different variables. Currently OfficeMath doesn’t support this choice, but maybe it should.

Occasionally one needs to embed ordinary text, such as words, into math zones. OfficeMath defines a character format attribute “ordinary text” for this purpose. Text with this attribute uses standard character formatting for italic, bold, etc. Unless the “ordinary text” attribute is active, the bold and italic settings only affect math alphanumerics; ASCII digits, punctuation, operators, and non-math characters are all rendered nonbold and upright.

In addition, OfficeMath has a “no-build-up” attribute to treat operator characters literally rather than use them in build-up translations. For example, if ‘/’ is marked with this attribute, build up in UnicodeMath mode leaves it as the character ‘/’ rather than converting it with the arguments around it into a built-up “stacked” fraction.

Since math zones are one level deep, you can embed ordinary text into a math zone, but you can’t nest a math zone within that ordinary text or elsewhere within the math zone. This hasn’t proven to be a limitation, although TeX can embed ordinary text inside math zones and nested math zones inside the ordinary text. It always seems to be possible to unwrap such nested math-zone scenarios into unnested math zones.

It’s useful to be able to define math properties for an entire document, rather than specify them for each math zone. This is described in Default Document Math Properties. A new property could be defined to use sans-serif math characters instead of serif characters.

There are two kinds of math zones: inline and display. For example, an inline math zone in TeX has the form $...$ and a display math zone has the form $$...$$. Inline math zones use reduced spacing and character sizes to make expressions fit better in line with normal text. In OfficeMath a display math zone starts at the start of a document or follows a hard or soft paragraph end (U+000D or U+000B, respectively) and ends with a hard or soft paragraph end. In some cases, it would be useful to apply display math-zone formatting to inline math zones, but this isn’t currently available.

Inter-equation alignment and line breaking involve multiple lines. To handle these cases and equation numbering, OfficeMath has the Math Paragraph, while MathML uses tables and MathType uses PILEs. A math paragraph is a sequence of one or more display math zones separated by soft paragraph ends (U+000B). Line breaking can be automatic or manual as described in Breaking Equations into Multiple Lines. Background on paragraph formatting is given in Paragraphs and Paragraph Formatting.

In a document with more than a few equations, it’s useful to number equations referred to from elsewhere in the document. The math paragraph has elegant equation-number support, but it hasn’t been exposed beyond prototyping. The earliest way to handle equation numbering is described in Cool Equation Number Macros for Word 2007. Later ideas are in More on Equation Numbering and equation numbering using equation arrays is described in Equation Numbering in Office 2016. This last approach isn’t quite as convenient as the ideal math-paragraph equation numbering, but it can handle virtually all cases.