Directionality in Math Zones

In most places, mathematical text is written “left to right” (LTR). For example, in the expression x + y the plus is displayed to the right of the x and the y is displayed to the right of the plus. But in some Arabic locales, mathematical text is written right to left (RTL). Instead of E = mc², one would see ²cm = E, although the letters would be Arabic, not Latin.

In such RTL locales, square roots are mirrored, so that the surd symbol √ is reflected about the vertical axis. Similarly, integral signs are mirrored, although the circular arrows in contour integrals are not, since they pertain to the 2D complex plane, not the 2D text plane.

The Presentation MathML 3.0 specification provides for RTL math zones. In fact, it allows a dir attribute with the value “ltr” or “rtl” on the top-level <math> element as well as on <mrow>, <mstyle> and token elements like <mi>. Except in rare cases, only the <math> direction need be specified, since all the elements inside have the same directionality (see Section 3.1.5 of the MathML 3.0 specification). The specification has now been through Last Call, so implementations of the new features are needed. Accordingly, I’m interested in implementing at least part of the RTL functionality, namely RTL math zones.
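To make the markup concrete, here is a minimal Python sketch that builds a small expression and sets dir only on the top-level <math> element, leaving the inner elements to inherit it. The helper name make_math and the Arabic letters standing in for x and y are illustrative assumptions, not anything prescribed by the specification.

```python
import xml.etree.ElementTree as ET


def make_math(tokens, direction="rtl"):
    """Build a <math> element whose overall direction is set by `dir`.

    `tokens` is a list of (tag, text) pairs such as ("mi", "x").
    Hypothetical helper for illustration only.
    """
    if direction not in ("ltr", "rtl"):
        raise ValueError("direction must be 'ltr' or 'rtl'")
    # Only the top-level element carries dir; children inherit it.
    math = ET.Element("math", {"dir": direction})
    row = ET.SubElement(math, "mrow")
    for tag, text in tokens:
        child = ET.SubElement(row, tag)
        child.text = text
    return ET.tostring(math, encoding="unicode")


if __name__ == "__main__":
    # Arabic seen and sad used here as stand-ins for x and y (an assumption).
    print(make_math([("mi", "\u0633"), ("mo", "+"), ("mi", "\u0635")]))
```

The token list is given in logical order. Deciding where each glyph lands on the line is left to the renderer, which is the point of specifying the direction once at the top.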

First, consider what an LTR math zone is, since this is what Word 2007 and the Office 2010 applications implement. An LTR math zone still displays text RTL wherever Arabic or standard Hebrew characters appear adjacent to one another. But all operators and other “neutral” characters are treated as “strong LTR”, that is, they are displayed to the right of the character that precedes them. This can be quite different from a display that obeys the Unicode Bidirectional Algorithm. Digits are a good example. The digits within a sequence are always displayed LTR, regardless of the character that precedes the sequence, both inside math zones and, according to the Unicode bidi algorithm, outside them. What differs is where the sequence as a whole is placed. Inside LTR math zones, a sequence of digits is displayed to the right of the character that precedes it even if that character is Arabic. According to the Unicode bidi algorithm, a number following an Arabic character is displayed to the left of the Arabic character in both LTR and RTL paragraphs. Inside normal text embedded in a math zone, the usual rules for bidi text are followed. Note that except for such text, the math-zone bidi rules are much simpler than those of the Unicode bidi algorithm, which gets quite tricky in complicated scenarios.

Perhaps you noticed the term “standard Hebrew characters” above. By this I mean all Hebrew characters except the four Hebrew letter-like math symbols ALEF SYMBOL, BET SYMBOL, GIMEL SYMBOL, and DALET SYMBOL (U+2135..U+2138). These symbols are strong LTR characters, unlike their HEBREW LETTER counterparts located in the Unicode Hebrew block (U+0590..U+05FF).
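The distinction can be checked against the Unicode Character Database. The following Python sketch uses the standard unicodedata module to report the bidi class of a letter-like symbol and of its Hebrew-block counterpart. The helper names describe and is_standard_hebrew are hypothetical, and the range test simply encodes the definition given above. It is not a claim about how any particular math engine classifies characters.

```python
import unicodedata

# Letter-like math symbols paired with their HEBREW LETTER counterparts.
PAIRS = [
    ("\u2135", "\u05d0"),  # ALEF SYMBOL  / HEBREW LETTER ALEF
    ("\u2136", "\u05d1"),  # BET SYMBOL   / HEBREW LETTER BET
    ("\u2137", "\u05d2"),  # GIMEL SYMBOL / HEBREW LETTER GIMEL
    ("\u2138", "\u05d3"),  # DALET SYMBOL / HEBREW LETTER DALET
]


def describe(ch):
    """Return (code point, character name, bidi class) for one character."""
    return (f"U+{ord(ch):04X}", unicodedata.name(ch), unicodedata.bidirectional(ch))


def is_standard_hebrew(ch):
    """True for characters in the Hebrew block, per the definition above."""
    return 0x0590 <= ord(ch) <= 0x05FF


if __name__ == "__main__":
    for symbol, letter in PAIRS:
        print(describe(symbol), describe(letter))
```

The symbols report bidi class L, while the Hebrew-block letters report R.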

Analogously, in an RTL math zone and in the absence of directional overrides, operators and other neutrals are treated as strong RTL characters. A sequence of digits is still displayed LTR, but it appears to the left of the character that precedes it even if that character is Latin. Sequences of Arabic and standard Hebrew letters are RTL as usual. At least that’s how I think a typical RTL math zone should be displayed.
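The LTR and RTL rules described so far can be captured in a toy model. The Python sketch below is an assumption-laden simplification, not the layout code of Word or of any MathML renderer. It ignores built-up structures such as fractions and radicals, embedded normal text, directional overrides, combining marks, and glyph mirroring. The function names split_units and visual_order are hypothetical. The model treats a run of Arabic or Hebrew letters as one unit laid out RTL, a run of digits as one unit laid out LTR, and every other character as a unit of its own. The units are then placed left to right in an LTR zone and right to left in an RTL zone.

```python
import unicodedata


def split_units(logical):
    """Group a logical-order string into (kind, text) layout units."""
    units = []
    for ch in logical:
        bidi = unicodedata.bidirectional(ch)
        if bidi in ("R", "AL"):
            kind = "rtl"      # Arabic or standard Hebrew letter
        elif bidi in ("EN", "AN"):
            kind = "digits"   # European or Arabic-Indic digit
        else:
            kind = None       # operators, Latin letters, spaces, ...
        # Only RTL letters and digits merge with a preceding unit of the same kind.
        if kind is not None and units and units[-1][0] == kind:
            units[-1] = (kind, units[-1][1] + ch)
        else:
            units.append((kind, ch))
    return units


def visual_order(logical, zone_dir="ltr"):
    """Return the characters in left-to-right display order for the zone."""
    if zone_dir not in ("ltr", "rtl"):
        raise ValueError("zone_dir must be 'ltr' or 'rtl'")
    # RTL letter runs read right to left; digit runs keep their order.
    pieces = [text[::-1] if kind == "rtl" else text
              for kind, text in split_units(logical)]
    if zone_dir == "rtl":
        pieces.reverse()      # units themselves flow right to left
    return "".join(pieces)


if __name__ == "__main__":
    # Print code points so a bidi-aware terminal does not reorder the result.
    for zone in ("ltr", "rtl"):
        shown = visual_order("\u05d0\u05d1+12", zone)
        print(zone, [f"U+{ord(c):04X}" for c in shown])
```

In this model the mass-energy example from the opening paragraph, entered as "E=mc2", comes out as "2cm=E" in an RTL zone, and a number that follows an Arabic or Hebrew letter stays on that letter's right in an LTR zone. The returned string is in visual order, so it should be inspected by code point rather than printed directly, since a bidi-aware display would reorder it again.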

This description of math-zone directionality is somewhat simplified compared to the generality encountered in the real world. To see some of the special cases that can happen, please read the papers by Azzeddine Lazrek:

https://www.ucam.ac.ma/fssm/rydarab/doc/unicode/amassf.doc

https://www.ucam.ac.ma/fssm/rydarab/doc/unicode/amasl.pdf

https://www.ucam.ac.ma/fssm/rydarab/doc/unicode/amdsl.pdf

https://www.ucam.ac.ma/fssm/rydarab/doc/unicode/others.pdf

https://www.ucam.ac.ma/fssm/rydarab/doc/communic/unicodem.pdf

https://www.w3.org/TR/arabic-math/

https://www.ucam.ac.ma/fssm/rydarab/

The following overviews are excellent introductions to RTL math:

https://en.wikipedia.org/wiki/Modern_Arabic_mathematical_notation

https://www.ima.umn.edu/2006-2007/SW12.8-9.06/activities/Lazrek-Azzeddine/MathArabIMAe.pdf

Comments

  • Anonymous
    November 13, 2009
    Are there any changes with regard to support for RTL text inside math zones in Office 2010? In my Probability class, for which I type notes in Word 2007, the professor would often write something like P(<name of some event>)=<some expression>, where P is a probability function and the expression is Western-style LTR math, but the name of the event is descriptive text in Hebrew and should be RTL. However, as soon as there's a space in <name of some event>, the first word appears on the left and the second on the right instead of being ordered correctly as an RTL sentence. A workaround is to go in and out of the math zone when entering RTL phrases, but this is pretty clumsy, may cause all sorts of layout issues, and of course isn't needed when intermixing LTR text with math.

  • Anonymous
    November 13, 2009
    Put the Hebrew text inside double quotes to format it as Normal Text inside a math zone. Or equivalently, select the Hebrew text and click the Normal Text button on the math ribbon. Normal text is laid out using the usual bidi rules instead of LTR math zone rules.

  • Anonymous
    November 19, 2009
    The comment has been removed

  • Anonymous
    November 19, 2009
    I tried to reproduce these Shift+Enter problems with Word 2010 Beta 2, and wasn't able to. So I believe the problems have been fixed. Sorry for the inconvenience with Word 2007, with which I was able to reproduce the problems.