Open XML links for 10-11-2007
Andy Updegrove's "Meanwhile, Back in Minnesota: Your Chance to Help" explains how to send the state of Minnesota feedback on document-format legislation. The deadline for feedback is next Monday, so if you'd like to participate in the process, now's the time to do it. You might want to share your views on whether choice is a good thing, whether governments should mandate specific formats rather than general guidelines, and related topics. The Massachusetts ETRM is a good example of one state's approach in this area. The state of Texas also did a study entitled "Estimated Two-year Net Impact to General Revenue Related Funds" that sheds some light on the costs involved in mandating document formats.
Wouter van Vugt's "Extracting data from a xml-mapped document" includes a handy XSLT for extracting the custom XML data from WordprocessingML. I've covered before how custom markup works in WordprocessingML, and although it has some unique benefits, one downside relative to custom XML parts is that the business data is interspersed with the Open XML markup. Wouter's sample transformation helps to simplify that messy detail.
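Wouter's sample is an XSLT. For readers who would rather script it, here is a rough alternative (a different technique, not his transformation): a minimal Python sketch using only the standard library. It assumes the custom markup appears as `w:customXml` elements whose `w:element` attribute names the custom element, and that the main document part is stored at `word/document.xml`; the function names are hypothetical.

```python
import xml.etree.ElementTree as ET
import zipfile

# Namespace of the WordprocessingML main document part (ECMA-376).
W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def custom_xml_values(document_xml):
    """Return (element name, text) pairs for each w:customXml element.

    The text is the concatenation of every w:t run inside the element
    (including runs inside nested custom elements), so the surrounding
    paragraph and run markup is dropped.
    """
    root = ET.fromstring(document_xml)
    results = []
    # iter() walks the tree in document order, so outer elements come first.
    for node in root.iter(W + "customXml"):
        name = node.get(W + "element", "")
        text = "".join(t.text or "" for t in node.iter(W + "t"))
        results.append((name, text))
    return results


def custom_xml_from_docx(path):
    # Assumes the main document part is at word/document.xml; a more robust
    # reader would locate it by following the package relationships.
    with zipfile.ZipFile(path) as package:
        return custom_xml_values(package.read("word/document.xml"))
```

Usage would look like `custom_xml_from_docx("invoice.docx")`, where the file name is hypothetical. Unlike a custom XML part, which already holds the business data as a standalone XML document, this has to dig the values out from between the runs and paragraphs, which is exactly the "messy detail" mentioned above.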
By the way, if you're interested in creating documents with custom schemas attached using Microsoft Word (no programming involved), MSDN has a how-to article entitled "Create an XML document based on a custom Schema" that takes you through the steps involved.
Guy Creese has done some informal "IBM Lotus Symphony Performance Tests" to assess the performance of IBM's recently announced open-source suite ...
I put up a post last week about IBM's new Lotus Notes Symphony office software suite, saying that based on an article in PC World, it seemed to be sloow in loading and a significant consumer of system resources. In short, the free software had some hidden costs. Shazaam, I got a ping from IBM Analyst Relations along the lines of, mmm, a few facts are not correct and how about a briefing on the product?
Fair enough, I thought. I'm still waiting for that to occur. But in the meantime, I figured I'd download the software and try it out myself, so I could ask some intelligent questions during the briefing. At a summary level, here's what I found, when running the software on a Pentium 4 with 2 GB of memory:
- On average, an IBM Lotus Notes Symphony app (Beta 1) takes three to four times as long to load as the comparable Microsoft Office 2003 product (with some significant outliers: e.g., 15 and 33 times as long).
- An IBM Lotus Notes Symphony app (Beta 1) consumes more CPU at load time than the comparable Microsoft Office 2003 product.
- An IBM Lotus Notes Symphony app (Beta 1) consumes three to five times more memory than the comparable Microsoft Office 2003 product.
It will be interesting to hear his thoughts after the analyst briefing.
And finally, speaking of performance, Zeth posted a comparison on Command Line Warriors this week about file sizes for a few document formats. He created a table showing the size of a document that only contains "Hello World" in a few different formats:
Format | Application | File Size (bytes) |
---|---|---|
.txt | Emacs 21.4.1 | 11 |
.abw | Abiword 2.4.6 | 2517 |
.odt | OpenOffice Writer 2.20 | 6674 |
.doc | Microsoft Word 2003 SP2 | 24064 |
Just to extend this research a bit, here are the results with the two editors I use most often:
Format | Application | File Size (bytes) |
---|---|---|
.docx | Microsoft Word 2007 | 9870 |
.txt | Notepad 6.0 | 11 |
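A .docx file is a ZIP package of XML parts, so its size on disk covers compressed XML plus ZIP bookkeeping, not just the eleven characters of text. A minimal Python sketch for seeing where the bytes go, using only the standard library; `describe_package` and `print_package` are hypothetical helper names, not part of any Open XML SDK.

```python
import zipfile


def describe_package(path):
    """Return (part name, compressed size, uncompressed size) for each part.

    Sizes are in bytes and come from the ZIP directory, so nothing is
    decompressed. `path` may be a file name or a binary file object.
    """
    with zipfile.ZipFile(path) as package:
        return [(info.filename, info.compress_size, info.file_size)
                for info in package.infolist()]


def print_package(path):
    parts = describe_package(path)
    for name, packed, unpacked in parts:
        print(f"{name:40} {packed:>8} {unpacked:>8}")
    # Part data only: ZIP local headers and the central directory are extra,
    # so the compressed total is smaller than the file's size on disk.
    print(f"{'total (part data only)':40} "
          f"{sum(p for _, p, _ in parts):>8} {sum(u for _, _, u in parts):>8}")
```

Calling `print_package("HelloWorld.docx")` (a hypothetical file name) lists every part with both sizes. The uncompressed column is what an application actually has to parse, while the compressed column is what takes up space on disk.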
Comments
Anonymous
October 12, 2007
Doug, now please post file size results for documents that have some actual content, say, the text of the U.S. Constitution or something :)))
Anonymous
October 12, 2007
Well, Anna, I was just following the rules of that comparison page, to get an apples-to-apples comparison. But since you asked ... I found a copy of the text of the Constitution at http://www.usconstitution.net/const.txt. Pasting that into a Word 2007 document and saving it, I get these file sizes:
- .TXT = 45,992 bytes
- .DOC = 85,504 bytes
- .DOCX = 46,618 bytes
Anonymous
October 12, 2007
Thanks, Doug! These numbers make much more practical sense than the "Hello, World!" example :)))
Anonymous
October 16, 2007
The zipped version of the plain-text U.S. Constitution is only 14,182 bytes. That is 30% of the size of the zipped .docx file. What does the other 70% of the .docx file do?
Anonymous
October 17, 2007
If a text file includes everything you need, then I'd agree that a DOCX is overkill. Text is great if file size is a top priority, and DOCX is great if compatibility with existing Office documents is a priority. As for the contents of a DOCX, you can simply rename it to .zip and open it; everything's XML-based and defined in the spec, so you can look up the individual elements if you'd like to understand what's there in more detail. Most users see a significant reduction in disk storage requirements when moving from DOC to DOCX.
Anonymous
October 20, 2007
Your apples-to-apples comparison is misleading: it suggests the docx file format is as compact as the original text file. However, when accessed, the docx file is much larger. I'm sure storage requirements go down with the new format since it uses zip compression; most pudgy files do. I wonder when the casual user will realize they can no longer save disk space by zipping the new formats. The storage might even get larger. I'm not against non-text document files, just against pudgy ones, particularly those where the pudgy part is not pertinent to the content of the document.
Anonymous
October 20, 2007
Well, the two rows I added to that table show that the DOCX (9870 bytes) is about 900 times larger than the text file (11 bytes). It's not clear to me how that is "suggesting that a docx file is as compact as the original text file."
Anonymous
October 20, 2007
I believe I wrote "apples to apples." There was a reason I did not say "table." Quoting: "dmahugh said on October 12, 2007 5:00 PM: Well, Anna, I was just following the rules of that comparison page, to get an apples-to-apples comparison. But since you asked ... I found a copy of the text of the constitution at http://www.usconstitution.net/const.txt So pasting that into a Word 2007 document and saving it, I get these file sizes ... .TXT = 45,992 bytes .DOC = 85,504 bytes .DOCX = 46,618 bytes" Presenting 45k TXT -> 46k DOCX is misleading to anyone who has to ask, such as the anonymous Anna Niemous.
Anonymous
October 21, 2007
"DOCX is great if compatibility with existing Office documents is a priority." -Doug Mahugh. Files, documents, and formats are always compatible with each other, in the sense that nothing in one file or format causes the contents of another to fail. The real question is whether the new format is compatible with the old applications, and whether the old format is compatible with the new applications. For Microsoft's own applications, a converter had to be written to patch the old applications. That shows only that a new application can bridge the gap, not that the new format is compatible with the old applications. Any older application that was written to handle the old format cannot read the new format, so the new format is not really compatible.