Speaking Subscripts, Superscripts, and Fractions
You might think that there’s just one good way to speak a math expression, at least in each natural language. But actually, there are a number of good ways to speak math expressions, each with advantages and disadvantages. This post discusses some of these choices for subscripts, superscripts, and fractions. The post Speaking of math… gives a general introduction to how we can speak Microsoft Office math zones. See also the ClearSpeak specification and the MathSpeak™ Core Specification Grammar Rules. The MathSpeak choices are influenced by Nemeth braille (see Nemeth Braille—the first math linear format).
Subscripts and Superscripts
Let’s start with the Nemeth braille approach used by MathSpeak, though it doesn’t use braille codes. Nemeth braille uses subscript/superscript level shifters. For example,
A sub/sup level stays active until another level is met. The level shifter to go back to the baseline is “baseline”. This is the way the blind mathematician Abraham Nemeth liked to have people speak subscripts and superscripts to him. Back in his day, computer math speech wasn’t available and people read math to him. His sub/sup speech is efficient and unambiguous. He didn’t even say x² as “x squared”, but as “x sup 2”.
This has the advantage that if x² is the second component of the vector x, it isn’t misidentified as “x squared”. Superscripts don’t always mean powers. For example, the triple scalar product a ⋅ (b×c) of the vectors a, b, and c is given by εᵢⱼₖ aⁱ bʲ cᵏ, where εᵢⱼₖ is the Levi-Civita symbol, and aⁱ, bʲ, and cᵏ are vector components. Here the superscripts are indices, not powers, and they are automatically summed over since they are repeated.
Furthermore, the level of nested subscripts/superscripts is always clear with level shifters. On the other hand, saying “e to the minus x squared” gives the meaning of that expression without any parsing. A more verbose version of the Nemeth approach is to say “superscript” and “subscript” instead of “sup” and “sub”. Saying the complete words is helpful at first. But as you get familiar with it, the three-letter abbreviations are faster and easier to follow. Too much verbiage gets in the way of comprehending math.
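To make the level-shifter idea concrete, here is a minimal Python sketch. It is an illustration only, not the MathSpeak rule set: the tree encoding, the function name `speak_levels`, and the choice to speak a nested level as the path of shifters (for example “sup sup”) are all assumptions made for this example. It also assumes that a trailing “baseline” is omitted at the end of an expression, in line with saying “x sup 2”.

```python
def speak_levels(nodes, base_word="baseline"):
    """Speak a script tree with level shifters (hypothetical, simplified).

    `nodes` is a list whose items are either spoken tokens (str) or
    pairs ("sup" | "sub", [child items]) for a raised or lowered group.
    """
    words = []
    current = ()  # path of active script levels; () means the baseline

    def walk(items, level):
        nonlocal current
        for item in items:
            if isinstance(item, str):
                # Announce a shifter only when the level actually changes.
                if level != current:
                    words.append(" ".join(level) if level else base_word)
                    current = level
                words.append(item)
            else:
                kind, children = item
                walk(children, level + (kind,))

    walk(nodes, ())
    return " ".join(words)


print(speak_levels(["x", ("sup", ["2"])]))
print(speak_levels(["x", ("sup", ["2"]), "plus", "1"]))
print(speak_levels(["e", ("sup", ["minus", "x", ("sup", ["2"])])]))
```

With this encoding the three calls give “x sup 2”, “x sup 2 baseline plus 1”, and “e sup minus x sup sup 2”. The level stays active until a token at a different level is met, so no grouping words are needed.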
Except for Nemeth himself, the references linked to at the start of this post all say x² as “x squared” and x³ as “x cubed”. Superscripts as indices aren’t common and a little AI could recognize them. ClearSpeak says xⁿ as “x to the nth power”, while I prefer “x to the n”. “nth” requires localization, whereas ‘n’ alone does not. In my lectures on physics over the years, I don’t think I ever added the word “power”. Although grammatically correct, it wastes time, and being grammatically correct isn’t necessarily a goal for math speech. Math speech wants to be efficient and unambiguous, but some degree of abbreviation helps convey the semantics more efficiently. In fact, mathematics owes a significant part of its success to its concise notations. A side benefit of using abbreviated speech is that localization is simplified: you don’t have to worry much about word order differences and declensions.
If you don’t use the sub/sup/base level shifters, how do you handle compound subscripts and superscripts unambiguously? The various math linear formats except for Nemeth braille all handle compound scripts using tree structures such as TeX’s a^{b_2} or UnicodeMath’s a^(b_2) for a with the compound superscript b₂. One could speak these characters, but it’s better to speak what they represent since {} and () are used for a variety of syntactic purposes and may be nested. Accordingly, one can say “a to the b sub 2 end sup”. Here UnicodeMath’s ‘(‘ is replaced by “to the” and the ‘)’ is replaced by “end sup”. For a_j^2 (the square of a sub j), one can say “a sub j squared”, while the less common a_(j^2) (a with the subscript j²) can be said as “a sub quantity j squared”. Well defined pauses can also indicate the scopes of the scripts. But it does get tricky. There’s a lot to be said for the Nemeth sub/sup approach for compound scripts as used in MathSpeak. It’s also easy to translate that approach into the corresponding Nemeth braille.
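The tree-speech examples in this section can be sketched in a few lines of Python. This is a toy under stated assumptions, not ClearSpeak’s or anyone’s actual rules: the tuple encoding `("sup", base, script)` / `("sub", base, script)` and the function name `speak_tree` are made up for illustration, and the only rules implemented are the ones quoted above (“squared”, “cubed”, “to the”, “end sup”, and “sub quantity”).

```python
def speak_tree(node):
    """Speak a script tree using scope words instead of level shifters.

    A node is a token (str) or a triple (kind, base, script) with
    kind "sup" or "sub". Hypothetical encoding for this sketch.
    """
    if isinstance(node, str):
        return node
    kind, base, script = node
    base_words = speak_tree(base)
    simple = isinstance(script, str)  # a single token needs no end marker
    if kind == "sup":
        if script == "2":
            return f"{base_words} squared"
        if script == "3":
            return f"{base_words} cubed"
        if simple:
            return f"{base_words} to the {script}"
        return f"{base_words} to the {speak_tree(script)} end sup"
    # kind == "sub"
    if simple:
        return f"{base_words} sub {script}"
    return f"{base_words} sub quantity {speak_tree(script)}"


print(speak_tree(("sup", "a", ("sub", "b", "2"))))   # a^(b_2)
print(speak_tree(("sup", ("sub", "a", "j"), "2")))   # a_j^2
print(speak_tree(("sub", "a", ("sup", "j", "2"))))   # a_(j^2)
```

The three calls reproduce “a to the b sub 2 end sup”, “a sub j squared”, and “a sub quantity j squared”. Deeper nesting quickly needs more scope words, which is where the level-shifter approach has the advantage.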
Fractions
Numeric fractions like ¼ are spoken as “one fourth” and simple fractions like a/b are spoken as “a over b”. A fraction is compound if it contains one or more operators with lower precedence than division, such as (a+b)/c. For compound fractions, the beginning and end of the fraction need to be spoken to differentiate between expressions like a/(b+c) and a/b + c. If you say “a over b plus c” it means a/b + c, since we adopt the usual convention that division has higher precedence than addition. It also helps to pause a bit before saying “plus c”.
In the spirit of announcing the start and end of compound entities, one might want to speak a compound numerator as “numerator…end numerator” and a compound denominator as “denominator…end denominator”. But both ClearSpeak and MathSpeak prefer to speak a compound fraction as
“start fraction <numerator> over <denominator> end fraction”.
This is similar to TeX’s notation “{<numerator>\over<denominator>}” and to the Nemeth braille fraction ⠹ <numerator> ⠌ <denominator> ⠼ . This choice is more efficient when both numerator and denominator are compound. Both approaches allow nesting of fractions. Briefer choices include “frac…over…end frac” and “b frac…over…e frac”.
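As a rough illustration of the simple-versus-compound distinction, the following Python sketch brackets a fraction with “start fraction … end fraction” only when the numerator or denominator contains an operator of lower precedence than division. The token-list representation, the names `speak_fraction` and `LOW_PRECEDENCE`, and the restriction to ‘+’ and ‘-’ are assumptions for this example; a real implementation would cover many more operators and would handle numeric fractions like “one fourth” separately.

```python
# Operators assumed (for this sketch) to bind less tightly than division,
# mapped to the words used to speak them.
LOW_PRECEDENCE = {"+": "plus", "-": "minus"}


def speak_tokens(tokens):
    """Speak a flat token list, replacing known operators by words."""
    return " ".join(LOW_PRECEDENCE.get(tok, tok) for tok in tokens)


def is_compound(tokens):
    """True if the tokens contain an operator of lower precedence than division."""
    return any(tok in LOW_PRECEDENCE for tok in tokens)


def speak_fraction(numerator, denominator):
    """Speak a fraction, bracketing it only when it is compound."""
    body = f"{speak_tokens(numerator)} over {speak_tokens(denominator)}"
    if is_compound(numerator) or is_compound(denominator):
        return f"start fraction {body} end fraction"
    return body


# a/(b+c) versus a/b + c
print(speak_fraction(["a"], ["b", "+", "c"]))
print(speak_fraction(["a"], ["b"]) + " plus c")
```

The first call yields “start fraction a over b plus c end fraction” and the second yields “a over b plus c”, so the two expressions are no longer spoken the same way.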
The last of these is how Abraham Nemeth liked fractions to be spoken. Furthermore, if a fraction contains another fraction, he’d say “b b frac … o over … e e frac” for the outer fraction and “b frac…over…e frac” for the inner fraction. He’d repeat the ‘b’, ‘o’, and ‘e’ as many times as the deepest fraction’s nesting level, like stuttering. MathSpeak has a similar option that uses “start” for ‘b’, “over” for ‘o’ and “end” for ‘e’. Revealing the nesting levels is similar to the way we speak nested parentheses as “open paren”, “open second paren”, “open third paren”, and so forth as in ClearSpeak, but in opposite nesting order. MathSpeak and Nemeth Braille indicate the nesting level of square roots and other roots, but don’t give a way to indicate the nesting level of parentheses.
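The repetition rule can be sketched as follows. The rule used here is inferred from the two examples above (“b frac…over…e frac” for an innermost fraction and “b b frac … o over … e e frac” for a fraction containing one), so its extension to deeper nesting is an assumption, as are the tuple encoding `("frac", numerator, denominator)` and the function names.

```python
def frac_depth(node):
    """Nesting depth: 0 for a token, 1 for a fraction with no inner fraction."""
    if isinstance(node, str):
        return 0
    _, num, den = node
    return 1 + max(frac_depth(num), frac_depth(den))


def speak_nested(node):
    """Speak nested fractions with repeated 'b', 'o', 'e' markers.

    Assumed rule: a fraction of depth d is spoken with d copies of 'b'
    and 'e', and d - 1 copies of 'o' before 'over'.
    """
    if isinstance(node, str):
        return node
    _, num, den = node
    d = frac_depth(node)
    begin = " ".join(["b"] * d + ["frac"])
    over = " ".join(["o"] * (d - 1) + ["over"])
    end = " ".join(["e"] * d + ["frac"])
    return f"{begin} {speak_nested(num)} {over} {speak_nested(den)} {end}"


# (x/y)/z: the inner fraction x/y is the numerator of the outer one
print(speak_nested(("frac", ("frac", "x", "y"), "z")))
```

For (x/y)/z this gives “b b frac b frac x over y e frac o over z e e frac”: the listener hears from the first words that a nested fraction is coming, and the doubled markers say which “over” and which “end” belong to the outer fraction.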
One ends up with a plethora of choices. Since different folks like different choices, both MathSpeak and ClearSpeak offer several speech options. Some choices can be handled by a verbosity level. But qualitatively different choices might best be handled with settings in a dialog box. Nemeth sub/sup level shifters versus tree speech of compound scripts is an example of the latter. See also Larry's Speakeasy, which gives English speech for a wide variety of mathematics.