Science and Nature have difficulties with Word 2007 mathematics
Science and Nature, two premier science publications, are having difficulties with Word 2007’s elegant new mathematics facility. Part of the reason is a misunderstanding about Word’s MathML support, which this post will hopefully help to rectify. The other part is that the new facility represents mathematical text in a way that Word itself understands. Such mathematical text can differ dramatically from text entered using the Equation Editor and MathType, which use embedded OLE objects that are opaque to Word. Since this second area is primarily responsible for the choices made in Word 2007, I discuss it first.
As soon as mathematical text is represented in a way that Word itself understands, things are both simpler and more complicated. Things are simpler because Word’s user interfaces, formatting commands, object model, etc., can be used directly with mathematical text. Things are more complicated because this convergence in user interfaces allows users to insert Word-oriented features into math zones, for example:
· Images
· Revision markings
· Footnotes and comments
· Elaborate formatting and styles, …
The file format needs to be general enough to express such material faithfully. Unfortunately, MathML 2.0 isn’t able to handle embedded XML namespaces and therefore simply isn’t general enough to represent Word 2007 technical documents. Accordingly, we had to develop an XML approach that is general enough, so we created OMML (Office MathML), which can be embedded in Word’s primary XML, WordprocessingML, and vice versa.
Office 2007 also ships XSLTs to convert OMML to MathML (omml2mml.xsl) and MathML to OMML (mml2omml.xsl). These XSLTs are used, for example, by Word for MathML clipboard support. They are stored in the subdirectory C:\Program Files\Microsoft Office\Office12. Naturally the MathML resulting from OMML in this way is missing content like images, revision markings, footnotes, etc., but for many purposes that’s acceptable. It just isn’t acceptable in the Word docx format, since this format has to reproduce exactly what the user created. The docx format and OMML are international standards and are thoroughly documented as noted in previous blog posts.
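To make the embedding concrete, here is a minimal Python sketch of pulling the math zones out of a docx file. It relies on two facts from the Office Open XML standard: a docx file is a zip package whose main part is word/document.xml, and OMML elements live in the namespace shown below. The function name math_zone_texts is hypothetical, and the sketch only collects the literal text runs of each zone; it does not reproduce the conversion that omml2mml.xsl performs.

```python
import zipfile
import xml.etree.ElementTree as ET

# OMML namespace URI from the Office Open XML standard.
OMML_NS = "http://schemas.openxmlformats.org/officeDocument/2006/math"


def math_zone_texts(docx_file):
    """Return the concatenated text runs (m:t) of every math zone (m:oMath).

    docx_file may be a path or a binary file-like object.
    The function name is illustrative, not part of any Office API.
    """
    with zipfile.ZipFile(docx_file) as package:
        root = ET.fromstring(package.read("word/document.xml"))

    zones = []
    # m:oMath marks one math zone, whether inline or inside m:oMathPara.
    for zone in root.iter(f"{{{OMML_NS}}}oMath"):
        runs = zone.iter(f"{{{OMML_NS}}}t")
        zones.append("".join(run.text or "" for run in runs))
    return zones
```

Calling math_zone_texts("paper.docx") on a real document would return one string per math zone. Anything richer, such as producing MathML, would still go through omml2mml.xsl and an XSLT processor.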
One of the very nice features of XML is that it can be translated relatively easily from one kind of XML to another. David Carlisle has used this flexibility to advantage in converting Word’s HTML to HTML with embedded MathML. Word’s HTML contains the math zones in two formats: OMML in comments and images. David’s program extracts the OMML, uses the omml2mml.xsl to convert to MathML and puts it all back together. Admittedly David is a magician, but he proves it can be done :-)
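The extraction step can be sketched in a few lines of Python. This is not David's program. It is only an illustration, and it assumes that Word wraps each OMML fragment in a conditional comment of the form `<!--[if gte msEquation 12]> ... <![endif]-->`. That marker is an assumption on my part and is not spelled out above, and the function name omml_from_word_html is hypothetical.

```python
import re

# Assumed shape of the comment Word writes around OMML in its HTML output:
#   <!--[if gte msEquation 12]> ...OMML... <![endif]-->
OMML_COMMENT = re.compile(
    r"<!--\[if gte msEquation \d+\]>(.*?)<!\[endif\]-->",
    re.DOTALL,
)


def omml_from_word_html(html):
    """Return the OMML fragments found inside Word's conditional comments."""
    return [fragment.strip() for fragment in OMML_COMMENT.findall(html)]
```

Each returned fragment could then be handed to omml2mml.xsl with any XSLT processor, and the resulting MathML substituted for the comment and its fallback image.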
The bottom line is that Word 2007’s new math facility is a huge improvement over past approaches. But anytime such big improvements occur, there can be, and evidently are, problems with upgrading. I think the trouble is well worth it in both user convenience and the marvelous typographic quality. I’ve been doing technical word processing since the late 1960s and Word 2007’s mathematical capabilities still amaze me. Not that it’s finished; we do have a number of features to add…
Comments
Anonymous
June 04, 2007
Just two points. First, it is not only Science and Nature: I have yet to come across any journal that would accept Word 2007 documents. Second, have you contacted the publishers of these journals? Have you tried to find out what blocks them from accepting the new equations? Are you proactively trying to mitigate this situation?
Anonymous
June 05, 2007
We are in the process of contacting Nature and Science to understand their difficulties better and hopefully to offer solutions. The new docx math format is substantially richer typographically than earlier formats and should be considerably more valuable for a publisher. Admittedly when you have an infrastructure that works, the easiest thing is to just keep using it. But the thoroughly documented docx format should provide a much more faithful conversion path than that of the earlier doc format and MathML is readily extracted from it. In addition, we have more exciting things in mind. It's a great time to be a scientist, engineer, mathematician, or student of those disciplines.
Anonymous
June 05, 2007
Excellent to hear that! I guess a lot has to do with third party apps they use in the downstream processing of documents and their compatibility with the docx format. I don't really know, but presumably most DTP programs are not yet compatible with docx? Or are they? But sorting these things out with publishers would be incredibly helpful. Also, quite a number of journals use automated systems for paper submission, where you essentially upload a doc file and then they create a pdf out of it on the server. They will need to invest quite heavily to update those pieces of software, I guess... I came across http://www.editorialmanager.com quite often, maybe working with them to add docx support would be helpful?
Anonymous
June 07, 2007
Thanks for the lead. Word 2007 files can be "published" in pdf format by Word itself. The user does have to download the Office 2007 pdf handler from http://www.microsoft.com/downloads/details.aspx?FamilyID=4d951911-3e7e-4ae6-b059-a2e79ed87041&DisplayLang=en since Adobe didn't want Microsoft to ship it with Office 2007. But once that's installed, you can create the pdf directly from Word and then post it on the web. I've been using this facility for over a year now with my Unicode Technical Note #28 (http://www.unicode.org/notes/tn28/UTN28-PlainTextMath-v2.pdf).
Anonymous
June 08, 2007
The comment has been removed
Anonymous
June 11, 2007
Word 2007's object model does have the method Document.ExportAsFixedFormat, which enables a program to export pdf from Word. To see this, launch Word, press Alt+F11 to get to Visual Basic, then choose View/Object Browser and click on Document. Further clicking on ExportAsFixedFormat shows this method's prototype. Its ExportFormat argument, of type WdExportFormat, can take either wdExportFormatPDF or wdExportFormatXPS.
Anonymous
June 11, 2007
The comment has been removed
Anonymous
July 03, 2007
I don't think the key point is whether Science and Nature are able to deal with "Word 2007’s elegant new mathematics facility". These publishers should only accept a standardized format (such as ODF) or the widespread LaTeX source files. This would spare authors from worrying about the version of Word they're using. As long as MS has not provided a real standard (ISO approved), editors should recommend other solutions. Best regards, L
Anonymous
July 03, 2007
When it comes to scientific publishing, one has to deal with the great variety of hardware, software, OS, and such... Since Word isn't an interoperable solution, it makes absolutely no sense to use it as a so-called "widespread medium" for disseminating the papers. Microsoft is still trying to hook the market with its closed and Windows-only software and file formats. I'm afraid this is totally in contradiction with what Science is all about.
Anonymous
July 15, 2007
I have not been posting as frequently as I would have liked, but I plan to correct this soon. Meanwhile, here are several useful OpenXML links (WordprocessingML and Word 2007 focused): End user downloads Compatibility packs for older ver