Word XHTML - Bullets and Numbering

This is the fourth post by Zeyad Rajabi, who owns the XHTML output from Word's new blogging feature. In earlier posts, Zeyad gave a general overview of the XHTML, covered XHTML compliance, and explained how we map styles to semantics. Today Zeyad discusses how styles are tied directly to specific XHTML tags.

Today's post is a short one about lists in our blogging feature. Word 2007 gives you a rich editing experience for creating many different types of lists, ranging from simple single-level lists and multilevel lists to custom-defined bulleted and numbered lists.

Given the time and resource constraints on our blogging feature, we decided to take a simpler route with lists. Our blogging feature outputs only two types of lists: unordered and ordered (we do not support definition lists). That is, we rely only on the <ul> and <ol> HTML elements to render lists, which gives the host browser full control over how they look.
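
To make the reduction concrete, here is a minimal Python sketch, not Word's actual code. It assumes the list format arrives as an Office Open XML `w:numFmt` value such as `decimal`, `lowerRoman`, or `bullet`; the function name `html_list_tag` is hypothetical. Every bullet collapses to `<ul>` and every numbering scheme to `<ol>`:

```python
def html_list_tag(num_fmt: str) -> str:
    """Map an OOXML w:numFmt value (assumed input) to one of the two list elements.

    Hypothetical sketch: any bullet becomes <ul>; every numbering scheme,
    whether decimal, roman or lettered, becomes a plain <ol>.
    """
    return "ul" if num_fmt == "bullet" else "ol"


# Distinct Word numbering formats that become indistinguishable in the output.
formats = ["decimal", "lowerRoman", "upperLetter", "bullet"]
print({fmt: html_list_tag(fmt) for fmt in formats})
```

Because the numbering scheme itself is discarded, the browser (or the blog's stylesheet) decides whether an `<ol>` shows 1, 2, 3 or i, ii, iii.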

For this release of the blogging feature, we will not output the following CSS properties:

  • list-style
  • list-style-image
  • list-style-position
  • list-style-type

Omitting these CSS properties limits the fidelity our blogging feature can achieve compared with the full power of the Word 2007 Bullets and Numbering feature.
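
Here is a minimal sketch of what "not outputting" these properties could look like, assuming inline CSS is held as a simple `name:value;` string. The helper name `drop_list_style` is hypothetical, and the naive parser ignores edge cases such as semicolons inside `url(...)` values:

```python
# The four properties listed above, which the blogging feature does not emit.
LIST_STYLE_PROPERTIES = {
    "list-style",
    "list-style-image",
    "list-style-position",
    "list-style-type",
}


def drop_list_style(css: str) -> str:
    """Remove list-style* declarations from an inline CSS string.

    Hypothetical helper; assumes no ';' appears inside a declaration value.
    """
    kept = []
    for declaration in css.split(";"):
        name, _, value = declaration.partition(":")
        if not name.strip():
            continue  # skip empty fragments, e.g. after a trailing ';'
        if name.strip().lower() in LIST_STYLE_PROPERTIES:
            continue  # the browser's default list rendering applies instead
        kept.append(f"{name.strip()}:{value.strip()}")
    return ";".join(kept)


print(drop_list_style("list-style-type:lower-roman;margin-top:0px"))
```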

Word 2007 lets you define custom list styles, such as using the strings “Heading 1” and “Heading 2” to mark different levels of a list. Because we rely only on the <ul> and <ol> HTML elements and not on the CSS properties listed above, our blogging feature supports far fewer list formats than Word 2007.

Sample Lists

Below is a collection of some example lists and the corresponding HTML output.

Simple flat numbered list

  1. item 1
  2. item 2
  3. item 3

HTML:

<ol>
   <li>item 1</li>
   <li>item 2</li>
   <li>item 3</li>
</ol>

Simple flat bulleted list

  • item 1
  • item 2
  • item 3

HTML:

<ul>
   <li>item 1</li>
   <li>item 2</li>
   <li>item 3</li>
</ul>
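
Both flat examples have the same shape and differ only in the element name. A minimal Python sketch that reproduces them (the function `flat_list` is hypothetical; it uses the standard library's `html.escape` so item text cannot break the markup):

```python
from html import escape


def flat_list(items: list[str], numbered: bool) -> str:
    """Render one level of list items as <ol> (numbered) or <ul> (bulleted)."""
    tag = "ol" if numbered else "ul"
    lines = [f"<{tag}>"]
    lines += [f"   <li>{escape(item)}</li>" for item in items]
    lines.append(f"</{tag}>")
    return "\n".join(lines)


print(flat_list(["item 1", "item 2", "item 3"], numbered=True))
```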

Nested bulleted and numbered lists

  • Level 1 item 1
    • Level 2 item 1
    • Level 2 item 2
  • Level 1 item 2
    1. Level 2 item 1
    2. Level 2 item 2

HTML:

<ul>
   <li>Level 1 item 1
      <ul>
         <li>Level 2 item 1</li>
         <li>Level 2 item 2</li>
      </ul>
   </li>
   <li>Level 1 item 2
      <ol>
         <li>Level 2 item 1</li>
         <li>Level 2 item 2</li>
      </ol>
   </li>
</ul>
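
The nesting in this example can be expressed with a stack of open lists: a deeper level opens a new list inside the current `<li>`, and a shallower level closes lists (and the items that contain them) until the depths match. Below is a minimal Python sketch of that idea, not Word's implementation. It assumes a hypothetical input of `(level, numbered, text)` tuples, with the first item at level 1 and levels never increasing by more than one at a time; whitespace is omitted from the output:

```python
from html import escape


def nested_lists(paragraphs):
    """Build nested <ul>/<ol> markup from (level, numbered, text) tuples.

    Assumes the first item is at level 1 and that levels never jump by more
    than one, so every nested list sits inside an open <li>.
    """
    out = []
    stack = []  # tag of each currently open list, outermost first
    for level, numbered, text in paragraphs:
        tag = "ol" if numbered else "ul"
        # Close deeper lists, together with the <li> each one is nested in.
        while len(stack) > level:
            out.append(f"</li></{stack.pop()}>")
        if len(stack) == level:
            out.append("</li>")  # close the previous sibling item
            if stack[-1] != tag:  # list kind changed at the same level
                out.append(f"</{stack.pop()}>")
        # Open a new list inside the current item (or at the top level).
        while len(stack) < level:
            stack.append(tag)
            out.append(f"<{tag}>")
        out.append(f"<li>{escape(text)}")
    # Close everything still open at the end of the input.
    while stack:
        out.append(f"</li></{stack.pop()}>")
    return "".join(out)


example = [
    (1, False, "Level 1 item 1"),
    (2, False, "Level 2 item 1"),
    (2, False, "Level 2 item 2"),
    (1, False, "Level 1 item 2"),
    (2, True, "Level 2 item 1"),
    (2, True, "Level 2 item 2"),
]
print(nested_lists(example))
```

Fed the input `[(1, False, "level 1"), (2, False, "level 2"), (3, False, "level 3")]`, the same function yields the structure of the Multilevel List example that follows.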

Multilevel List

  • level 1
    • level 2
      • level 3

HTML:

<ul>
   <li>level 1
      <ul>
         <li>level 2
            <ul>
               <li>level 3</li>
            </ul>
         </li>
      </ul>
   </li>
</ul>

Nested paragraphs

  • Item 1

    Some text.

  • Item 2

    Some text.

HTML:

<ul>
   <li>Item 1
      <p>Some text.</p>
   </li>
   <li>Item 2
      <p>Some text.</p>
   </li>
</ul>
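
A continuation paragraph stays inside the `<li>` it belongs to as a `<p>` element rather than starting a new list item. A minimal Python sketch of that rule, assuming a hypothetical input of `(item_text, [paragraph_texts])` pairs for a single-level list:

```python
from html import escape


def list_with_paragraphs(items, numbered=False):
    """Render list items whose extra paragraphs are kept inside the same <li>."""
    tag = "ol" if numbered else "ul"
    parts = [f"<{tag}>"]
    for text, paragraphs in items:
        parts.append(f"<li>{escape(text)}")
        # Continuation paragraphs become <p> children of the item.
        parts.extend(f"<p>{escape(p)}</p>" for p in paragraphs)
        parts.append("</li>")
    parts.append(f"</{tag}>")
    return "".join(parts)


print(list_with_paragraphs([("Item 1", ["Some text."]), ("Item 2", ["Some text."])]))
```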

Nested paragraphs (without spacing)

  • Item 1

    Some text.

  • Item 2

    Some text.

HTML:

<ul>
   <li style="margin-top:0px;margin-bottom:0px">Item 1
      <p style="margin-top:0px;margin-bottom:0px">Some text.</p>
   </li>
   <li style="margin-top:0px;margin-bottom:0px">Item 2
      <p style="margin-top:0px;margin-bottom:0px">Some text.</p>
   </li>
</ul>
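
The only difference from the previous example is the inline `margin-top:0px;margin-bottom:0px` style on each `<li>` and `<p>`, which removes the spacing the blog's stylesheet would otherwise apply. A minimal sketch of that decision, assuming Word's paragraph spacing is available in points; the function `open_tag` and the rule of writing margins only when both values are zero are assumptions for illustration, not documented behavior:

```python
def open_tag(tag: str, space_before_pt: float, space_after_pt: float) -> str:
    """Return an opening tag, adding zero margins when the paragraph has no spacing.

    Hypothetical rule: only collapsed (zero) spacing is written inline;
    any other spacing is left to the blog's CSS.
    """
    if space_before_pt == 0 and space_after_pt == 0:
        return f'<{tag} style="margin-top:0px;margin-bottom:0px">'
    return f"<{tag}>"


print(open_tag("li", 0, 0) + "Item 1")
print(open_tag("p", 0, 0) + "Some text.</p>")
```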

Comments are welcome

Any comments or questions are welcome.

Comments

  • Anonymous
    July 12, 2006
    Hm, are all the ol in the sample correct or just some typos? E.g. shouldn't the Simple flat bulleted list be an ul instead of an ol?

  • Anonymous
    July 12, 2006
    The current behavior is that unless the user specifies a font directly, then it is left unspecified in XHTML so that the blog's CSS can control the look.

    Would you like a different behavior instead?

    Mike, just because something is a top priority for you to see discussed doesn't mean it is for everyone else. If there are subjects you'd like to see covered, just let me know though and I'll try to get something posted.

    -Brian

  • Anonymous
    July 12, 2006
    In every version of Word I've ever used (through Office 2003) the built-in numbered lists feature is broken. There's even a very lengthy article from a Word guru about why on ExpertsExchange (sorry I don't have the url handy) that is pretty definitive. The ONLY way I've found to have working numbered lists in Word is to use sequence fields ... and such is the workaround of choice with other shops I've talked to. Anyway that leads to my question ... a two-parter: 1) Do you know if lists are still broken in Word 2007?, 2) Does your XHTML by any chance support list numbering through sequence fields? In regards to part 2 of my question it seems that it would be relatively easy to recognize the sequence fields and convert to the HTML structured tags...

  • Anonymous
    July 12, 2006
    Zeyad, "Instead of outputting sequence fields as <ul> or <ol> lists we will simply output paragraphs with appropriate number of non breaking spaces." does not sound very promising in regards to semantics - yep, I hate to be that emotional about the HTML I produce (or produce through applications) but we all have to bear our little burden :]

  • Anonymous
    July 12, 2006

    The whole point of blogging using Word 2007 as a formatting tool is to get it rendered with full-fidelity anywhere it's viewed. Otherwise, why even bother?

    There is only one case where this fidelity does not matter: when you type text without any font formatting at all. I guess that's only a slice of users.

  • Anonymous
    July 12, 2006
    Chris, all good points! Too bad Mike missed them.

  • Anonymous
    July 13, 2006

    "I think there are many more points to using Word for blogging than 100% WYSIWYG everywhere. Background spelling/grammar, autocorrect, local save, auto-recover, inline images and so on are some of many reasons. "

    Try Firefox. Two extensions added to v 1.5 and you are good. Free. Or maybe you are assuming customers of Word 2007 are ignorant of what's available out there?

    "People had conspiracy theories about us co-opting HTML"

    Hehe. Because Microsoft has never been heavy-handed with HTML, has it? Isn't DHTML its own standard (i.e. a proprietary file format used by a single vendor)?

    You managed to write a lengthy paragraph totally off-topic. I as a user want my blog to appear with full fidelity. Period. It encompasses fonts. It's a very old problem. Adobe fixed it by embedding fonts. Microsoft tried to co-opt web font embedding back with IE4. I honestly don't know how much of a mess it still is (pending patents, sub-licensing, ...), but I do know Microsoft is responsible for web font embedding having remained in such a sorry state since 1997.

    If you don't provide full fidelity, it's the equivalent of writing a program on your computer and being unable to run it on someone else's computer.

  • Anonymous
    July 13, 2006
    Mike, your point was that they didn't provide "full fidelity" and that that was a problem.  Seems to me that Chris directly addressed that point.  In fact, your criticisms of past practices of different Microsoft teams is what is off-topic.

    Personally, I have no need for "full fidelity" in a blog feature, especially if it requires contorted HTML.  Most decent blogging software I've seen already has a CSS, and if you were not using blog software, you could create your own CSS.

    I can certainly see why some people might want full fidelity, though.  Maybe some sort of option would be a good idea, assuming enough people need this capability to make the effort worthwhile.

    I will note that, if the only way to accomplish this is through all kinds of hacked-up HTML, maybe the less-functionality but better-standardized code is really the right way to go.

  • Anonymous
    July 13, 2006

    Andrew,

    You are not living in 2007. You are satisfied with the way computers worked in the early 90s. That's fine. But I don't expect that from software, especially if I have to pay for it.

  • Anonymous
    July 13, 2006
    I'm not living in 2007?  You have a firm grasp on the obvious. ;)

    I admit I'm a bit confused by your response.  What is it about my point (really Chris's point, since I think I was really echoing him) that is not applicable to 2007?  What is my 1990s expectation that is unrealistic of 2007?

  • Anonymous
    July 13, 2006
    I have been contributing comments here about HTML export and lists for a while now.

    Please don't try to provide full fidelity rendering to HTML - as others have noted that is not a sensible goal. But please do something to let users produce decent HTML. As Zeyad notes it is too hard to map arbitrary Word lists of all kinds to HTML. But using styles is (relatively) easy.

    More from me here: http://localhost:8000/documents/blog/drafts/more_word_export.htm

  • Anonymous
    July 13, 2006
    Oops - I posted a local URL in the last comment.

    Should be: http://ptsefton.com/blog/2006/07/14/more_word_export

  • Anonymous
    July 17, 2006
    Hi Mark,

    Thanks for the feedback.

    As for playing around with the XHTML output, you can do so with Beta 2. Please be aware that there are a few bugs with our XHTML output in Beta 2. We have made many improvements since then.

    Zeyad Rajabi (MS)

  • Anonymous
    August 23, 2006
    I have the same problem as Denise. Can anyone help with this XML problem?
