Highlighting in a document

I've had a lot of folks ask me to provide more information on what features are missing from ODF and why we decided to create our own XML format (Open XML). I didn't want to get too involved in pulling together a full detailed list, but it's probably worthwhile pointing things out every once in a while. Most of you know that ODF wasn't even around when we first started working on our XML formats, so that's really one of the big reasons. Another reason is that we needed to make sure we created an XML format that all of our customers could (and would) use. We want our customers to move all their existing documents into this new format, and we need them to be willing to use it as the default format. ODF just wouldn't have allowed us to achieve that, both because of a lack of functionality and because of different optimizations that sacrifice things like performance.

An area I just came across today that really surprised me was highlighting. I'm sure most folks are familiar with highlighting in a Word document. You can use highlighting to call attention to different areas in a document, either for yourself or to point things out to others. The key thing about highlighting is that it does not affect any other formatting. Character shading (aka background-color in ODF), for instance, will still be preserved when you highlight some text. I've seen some implementations out there that try to use shading as a substitute for highlighting, but that doesn't really work because people may also want to apply shading in addition to highlighting. For example, you may have a range of text shaded with light gray (i.e. the background-color is light gray), and then you want to highlight some of the text in that range. Then, once folks have reviewed the document, you want to remove the highlighting without removing the gray shading. In the ODF spec I saw support for shading on text but not for highlighting, and we view those as two different things (I only saw mention of highlighting on tables).
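To make the distinction concrete, here is a minimal sketch of a text run that carries shading and highlighting as two separate properties. It assumes the element names and namespace from the published Ecma Open XML standard (w:highlight and w:shd inside w:rPr), which may not match the draft discussed in this post exactly. The helper names q, make_run, and remove_highlight are hypothetical and exist only for this illustration.

```python
import xml.etree.ElementTree as ET

# Assumption: namespace URI as published in the final Ecma standard,
# not necessarily the one used in the 2006 working draft.
W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
ET.register_namespace("w", W)


def q(name):
    """Return a namespace-qualified tag or attribute name (hypothetical helper)."""
    return f"{{{W}}}{name}"


def make_run(text, shading_fill=None, highlight=None):
    """Build a w:r element with optional shading and highlighting."""
    run = ET.Element(q("r"))
    rpr = ET.SubElement(run, q("rPr"))
    if highlight is not None:
        # Highlighting is its own run property.
        ET.SubElement(rpr, q("highlight"), {q("val"): highlight})
    if shading_fill is not None:
        # Shading is stored separately, so it is untouched by highlighting.
        ET.SubElement(
            rpr,
            q("shd"),
            {q("val"): "clear", q("color"): "auto", q("fill"): shading_fill},
        )
    t = ET.SubElement(run, q("t"))
    t.text = text
    return run


def remove_highlight(run):
    """Drop any highlight from the run and leave every other property alone."""
    rpr = run.find(q("rPr"))
    if rpr is None:
        return
    for element in rpr.findall(q("highlight")):
        rpr.remove(element)


if __name__ == "__main__":
    # Light gray shading plus a yellow highlight on the same text.
    run = make_run("reviewed text", shading_fill="D9D9D9", highlight="yellow")
    remove_highlight(run)  # the review is done, so the highlight goes away
    print(ET.tostring(run, encoding="unicode"))
```

Because the two properties live in separate elements, removing the highlight leaves the gray shading in place. A format that only has a single background-color property would have to remember the earlier shading somewhere else to get the same result.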

I came across this the other day while I was looking through the ODF spec and comparing it to the Ecma draft, trying to get a better handle on why the ODF spec was so much lighter (700 pages compared to 4000). I wanted to see if there were things we could do to reduce the number of pages in the Open XML spec without losing any of the necessary information. It looks like there are some things that can be done for minor size reductions, but we just have a lot more functionality, and there is no way we could get it anywhere close to that small while still fully covering WordprocessingML, SpreadsheetML, and PresentationML. There are three reasons that we have so much more content. The first is that we are representing a much richer set of features (since we have to XMLize all the existing Microsoft Office binary documents), so there is just a lot more to document. The second reason is that the ODF spec points off to other specs for the details of certain things. The third reason is that the Ecma Open XML spec is just a lot more detailed about how things work. The WordprocessingML sections are the furthest along in the latest draft, and if you read through the paragraphs and rich formatting section, for instance (Section 19), you'll see what I'm talking about. The ODF spec, on the other hand, is very light and vague on a number of issues (like the numbering format issue I pointed out earlier).

-Brian

Comments

  • Anonymous
    June 01, 2006
    I can tell you another difference.  You are reading and learning from their spec.  Mostly what we see from detractors of OOX is complaints about something I don't think they are reading.  And I don't think ODF is getting much critical reading.  Thanks for your care and even-tempered approach here.

    By the way, I just remembered that Doug Mahugh has a different blog for his tech work and finally subscribed to it.  He gives a great breakdown of what is to be found in TC45 working draft 1.3. It's at http://blogs.msdn.com/dmahugh/archive/2006/05/25/DraftSpecTour13.aspx

    I suggested in a recent comment elsewhere that it might be useful to split the TC45 document into two, since it tends to fall into two parts, with the conceptual information in the first, um, 700 pages or so, and basically reference material in the rest.  I also think tables and other arrangements can cut down on the space taken, improving the density of some of the reference parts.  I love that the PDF takes full advantage of cross-referencing and linking.  The ODF spec is much harder to handle on my computer in that respect (and I am a lover of fine-grained section numbering in specifications too -- makes it much easier to submit comments and suggestions).
  • Anonymous
    June 02, 2006
    35,000 more computers to OpenOffice.org
    - No problems with the migration
    - More to come

    35000 * whatever exorbitant amount you charge for Office = HAHAHAHA
  • Anonymous
    June 02, 2006
    Thanks Gilberto. That isn't really related to this discussion about file formats, but I'm glad you shared it with us. :-)

    -Brian
  • Anonymous
    June 02, 2006
    Dennis; you don't think ODF is getting much critical reading?

    It got plenty of comments back from the last ISO round, there are a number of developers building support for it into their applications as the default format, and lots of third-party developers using it.

    I know ODF doesn't have all the features that OXML has. I also know that the way OXML is being developed, it will not have all the features of ODF - Microsoft are essentially treating Office as being "feature complete". If that's what they think, great, but ODF is a specification which will be continuously developed.

    ODF is where the real innovation in office file formats is happening.
  • Anonymous
    June 02, 2006
    Alex, we aren't treating Office as being "feature complete" at all. In fact if you look at Office 2007 there is a ton of innovation. Look at the support for custom defined schema and content controls in Word for example. That's where developers are really getting excited (we have hundreds of thousands of 3rd party developers already building solutions on top of the XML support from Office 2003).

    My point with the ODF comparison is that there already exist billions of Microsoft Office documents today and our spec absolutely has to support those documents. That's not innovation, that's just matching the world today. The spec will then continue to grow and evolve over the years in Ecma as we innovate and build.

    -Brian
  • Anonymous
    June 05, 2006
    As we move forward with the standardization of the Office Open XML formats, it's interesting to look...
  • Anonymous
    July 10, 2006
    Maybe it is because ODF is based on existing standards whilst Microsoft has decided to start from scratch?