Sdílet prostřednictvím


Table models in file formats

In my post last week about the lack of table support in ODF, some folks were curious as to why the Ecma Open XML formats have three different table models. I explained that when you are designing a file format, you need to examine closely the target user scenarios of the applications that will use those formats.

Obviously the use cases for a table in a spreadsheet are different from those around tables in presentations or wordprocessing documents. For instance, it's not much of a stretch to imagine a table of data in a spreadsheet with 50,000 rows and 200 columns. That would never happen in a presentation though. A table in a presentation is much more heavily focused on the layout and formatting of the data (same case with a wordprocessing document).

I think a great example of why you often need different table models would be the ODF spec itself. It was too difficult to map their existing table model to the presentation format so instead of working through that issue they instead left it out of the spec. Otherwise the spec would have just stated that the table model applied to all three formats. As it stands right now, the only way to get a table in a presentation is to embed a spreadsheet. The plan is that in V 1.2 (which is still over a year out) they will have support for spreadsheet formulas and presentation tables. One could argue over whether it would have been better to actually finish the spec before submitting it to ISO and creating organizations like the ODF Alliance who purpose is to push for policies that mandate ODF, but maybe that's just me ;-).

Looking at the table models, I do think the ODF guys made a big mistake in the design of their spreadsheet format. They chose to make the table model for wordprocessing documents and spreadsheet documents the exact same (but it looks like it's still different from HTML or CALS). Now I do understand that this level of commonality is the nirvana for most folks and I also had the goal of making the Open XML formats as close to this ideal as possible (it's something we actually looked really hard at doing when we first started working on the Open XML formats). The problem is that for the same reason you often need different user interfaces in the different applications, you also need a different file format at times. Sure there are plenty of concepts that can be reused (such as basic formatting), but a spreadsheet grid is different from a presentation table. Otherwise you're stuck with a format that sells everyone a bit short as it's the greatest common factor of all the applications, and isn't optimized for the unique customer scenarios.

If you've looked at the Ecma spec, you can see that we had to diverge in the table design of for the Office Open XML formats. The use of tables in wordprocessing documents and presentations is very similar, and as a result the table models in those formats are very similar. In a spreadsheet though, you have to account for much larger sets of data, and at that point the efficiency with which you write out that information can have a much more significant impact in the amount of time it takes to actually parse the files.

So, is the spreadsheetML format super easy? Well that depends on who you ask. For people that have developed against the old binary formats, things will be unbelievably easier and more reliable. But for folks who've primarily used table models like HTML, there will be a bit of a learning curve. That's why the file format documentation that we're doing in Ecma is so important. It will empower anyone to program against these files. We could have gone with a more verbose simple table model, but that would have been at the detriment of every user out there. Most people don't care about their file format, they just want things to work. As I said in an earlier post, we had to take the training wheels off, but we're going to be there with you as you learn to ride on your own.

-Brian

Comments

  • Anonymous
    July 25, 2006

    "Looking at the table models, I do think the ODF guys made a big mistake in the design of their spreadsheet format but that's because they had different design goals. They chose to make the table model for wordprocessing documents and spreadsheet documents the exact same"

    You speak as if there were any table model in BIFF. There isn't. You've added a "list"model in Office 2003 which is now a full fledge "Table" in 2007, but it's disingenuous to say what you said the way you said it, since billions of BIFF users have obviously managed their way through without a table model so far.

  • Anonymous
    July 25, 2006
    The comment has been removed

  • Anonymous
    July 25, 2006
    As far as I can tell, the table model in BIFF is the worksheet. It sounds like the ODF guys just said "a spreadsheet is a table, so every time somebody needs a table just make it a spreadsheet" or maybe "everytime somebody needs a spreadsheet just make it a table".

    Brian makes a good point that ODF obviously does not have a very good format for representing spreadsheets. It does a perfectly good job of representing tables that can be used as spreadsheets, but large spreadsheets used by financial analysts and the like are just not well-supported.

    If the format is not optimal and requires long (i.e. minutes) save and load times, people won't use it unless they are forced to. If people won't use the format, why create it? MS is trying to create formats that people will want to use.

  • Anonymous
    July 25, 2006
    The comment has been removed

  • Anonymous
    July 26, 2006
    I would argue that a table and a worksheet are not the same beast, so having to deal with two models is of little concern.  Having two types of tables bothers me, but I will let this go to those who know better why it had to be.

    I am bothered by what I read, however.  I had thought that we were moving toward application units (maybe read services?) that did what they did very well and didn't try to do all things.  By putting these units together, you could build what you needed, not what someone decided to allow.  If I need a spreadsheet in the middle of a document, then I get it.  I am not hampered by a word processing table model which isn't complete enough.  I get a full spreadsheet.

    If this apporach is taken, there is no lowest common denominator for someone with all of the units installed.  But what if they do not have the units they need, or the units come from different vendors.  This is the case for completeness and fidelity in the document format.  The format can have everything in it that the application unit can describe.  If another unit can express more, then it does, but still within the published format.  This is the beauty of XML - it's extensible.

    If my unit doesn't know how to do everything that has been captured in the document format, it ignores what it can't do and does the best with what it can.  If it has to compromise what is there, it warns the user and gives them the choice of continuing.  And if this happens too often, they will use a better application unit for that function.

    Why do we still view things so monolithically?  I like the idea of competition for the service provided.  Go for it.  But I want the underlying document format to allow me to change vendors, upgrade to better/higher function, pick what's important to me.

  • Anonymous
    July 26, 2006
    I have one off topic question I would like to know your answer on. Our company is starting to get .docx documents more and more from outside, and we are wondering if MS is recommending the deployment of the beta compatibility pack for office2003. I don’t see an easy way to roll it out in any case. I think MS should have waited on the .docx default till after the beta, when the compatibility packs were also out of beta and ready for originations to roll out.

  • Anonymous
    July 26, 2006
    The comment has been removed

  • Anonymous
    July 26, 2006
    Brian,

    I woud argue that if anyone is using Excel to analyse spreadsheets with 50,000 rows and 200 columns they need to be sent for re training. There is plenty of documentation about spreadsheet errors that have resulted in mis-statement of financial results. There are tools more suited to the task than a spreadsheet.

  • Anonymous
    July 26, 2006
    The comment has been removed

  • Anonymous
    July 26, 2006
    Todd: What happens to performance if you increase spreadsheet size drastically? For example, take it to the 65536x256 grid size that was the upper limit for pre-2007 Excel versions? And then, take it further till you reach the 1048576x16384 grid limit of Excel 2007.
    I would be highly interested in seeing a graph of performance vs. spreadsheet size for that.

  • Anonymous
    July 26, 2006
    Also note that I'm specifically talking about load and save time, not the performance of the application once the file is loaded.

    OpenOffice is obviously slow in those cases as well, but in most cases that isn't tied to the file format itself. If you look at the design of the two formats, you'll see that SpreadsheetML can be loaded and saved much faster than ODF.

    Also, while I've been refering to the "table model" of spreadsheets, I really do mean the entire spreadsheet. If you look at an ODF spreadsheet, the entire sheet is wrapped in a <table> tag.

    -Brian

  • Anonymous
    July 26, 2006
    First, embedded OLE objects have their place, but an embedded spreadsheet isn't the same as native table functionality.  Embedded spreadsheets are useful if you have a need to activate the spreadsheet object in order to work with the spreadsheet functionality of the spreadsheet app.  But embedded spreadsheets are not necessarily the best way to present a table of data.

    Second, advocating that word processors include native spreadsheet functionality (as someone seemed to above) is whacky.  Do you really want word processors to have to support pivot tables and the like?

    Lastly, tables are not the same as spreadsheets.  Spreadsheets obviously have more calculation features than tables.  But more than that, tables (at least in Microsoft Office) don't necessarily have the same shape as spreadsheets.  In Word, you can use the free form table drawing tool to create a table that looks nothing like a spreadsheet grid (with diagonal grid lines, gridlines that don't go all the way across or down the table, nested tables, etc).

    Forcing the same format for tables and spreadsheets is a misguided attempt at elegance.

  • Anonymous
    July 26, 2006
    After seeing OpenOffice 2.0.3's implementation of the ODF format versus Microsoft Office (beta 2)'s plugin's implementation of the ODF format here: http://www.robweir.com/blog/, I'm left wondering why anyone associated with Microsoft can carry any authority about how to implement the ODF format--particularly with regard to tables, which in the examples I've seen, are pretty thoroughly munged in Microsoft Office (beta 2).

  • Anonymous
    July 26, 2006
    OLE is irrelevent in this discussion.

    Biff would like us all to think that because the table data is specified in the same way, that must mean that ODF-compliant apps have to embed the spreadsheet into the presentation app in order to make it work.

    And, while discussion of the Ecma spec. is all well and good, remember that ODF is the only spec. here that's actually been, y'know, standardised. Given ODF is already out, and being implemented, it really would be a sorry tale for the Ecma guys if they couldn't improve on it. Well, a sorrier tale than people not contributing to the multi-vendor TC at OASIS, anyway.

  • Anonymous
    July 27, 2006
    The comment has been removed

  • Anonymous
    July 27, 2006
    Alex, table data isn't defined at all in terms of a presentation right now. The only way to get a table in a presentation for ODF is to embed a spreadsheet. So Biff's statments are very accurate. The ODF newsgroups say that they are going to work to get tables in for version 1.2 which is over a  year away.

    Your second point really hits on my big concern at this point. While ODF has gone through standardization, it was done so prematurely. It's missing a number of significant pieces of functionality, and it's clear if you look at the discussions that they pushed it through for political reasons.

    -Brian

  • Anonymous
    July 27, 2006
    Yes, ODF standardization was prematurely pushed thru for political reasons, and it's sad that ISO allowed itself to be used like that.

  • Anonymous
    July 30, 2006
    I love how these mini-ODF dramas are turned into crises.
    <a href="http://www.b086.com/blog2/1251/" title="竹地板">竹地板</a>

  • Anonymous
    July 30, 2006
    I love how these mini-ODF dramas are turned into crises.
    <a href="http://www.b086.com/blog2/1251/" title="竹地板">竹地板</a>

  • Anonymous
    July 30, 2006
    The comment has been removed

  • Anonymous
    June 14, 2009
    PingBack from http://patiocushionsource.info/story.php?id=2118

  • Anonymous
    June 16, 2009
    PingBack from http://fixmycrediteasily.info/story.php?id=17446

  • Anonymous
    June 18, 2009
    PingBack from http://adirondackchairshub.info/story.php?id=3614

  • Anonymous
    June 19, 2009
    PingBack from http://mydebtconsolidator.info/story.php?id=16977