New blog: Gray Matter
People often ask me how much smaller Open XML documents are than corresponding Office binary documents. It's a hard question to answer with any precision, because the difference in size is so dependent on the document content. For example, a long simple text document will compress by orders of magnitude, a typical document with graphics and other types of content compresses somewhat less, and you can even create documents that are a bit larger in Open XML format than they are in the binary formats.
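To see why content matters so much, it helps to remember that an Open XML document is a ZIP package, so each part inside it is compressed separately. Here is a minimal Python sketch of how you might inspect that yourself. It is not taken from Gray's post: the helper names `part_sizes` and `build_demo_package` are hypothetical, and the demo package is a stand-in made of one repetitive XML part and one block of random bytes playing the role of an already-compressed image, not a valid .docx.

```python
import io
import os
import zipfile


def part_sizes(package):
    """Return (name, uncompressed_bytes, compressed_bytes) for each part in a ZIP package.

    `package` may be a path to a .docx/.xlsx/.pptx file or a file-like object.
    """
    with zipfile.ZipFile(package) as zf:
        return [(info.filename, info.file_size, info.compress_size)
                for info in zf.infolist()]


def build_demo_package():
    """Build a small in-memory ZIP for illustration only (not a valid Open XML file)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        # Repetitive markup, similar in spirit to a long simple text document.
        zf.writestr("word/document.xml",
                    "<w:p><w:r><w:t>Hello</w:t></w:r></w:p>" * 2000)
        # Random bytes stand in for media that is already compressed.
        zf.writestr("word/media/image1.bin", os.urandom(20000))
    buf.seek(0)
    return buf


if __name__ == "__main__":
    for name, raw, packed in part_sizes(build_demo_package()):
        print(f"{name}: {raw} -> {packed} bytes ({packed / raw:.0%} of original)")
```

Pointing `part_sizes` at a real document (for example `part_sizes("report.docx")`, where the file name is just a placeholder) should show the same pattern: the XML parts shrink a great deal, while embedded media barely shrinks at all.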
Gray Knowlton's post on "File size reduction for Open XML" covers some of the issues involved, and explains what you can expect in general terms. Gray is a group product manager in Office, and I'm on his team. We have a lot going on around Open XML these days, and this post is the first in a 3-part series he'll be doing on file size reduction, document "sanitization" and improvements in document format security. You can subscribe to Gray's blog right here, and you can expect he'll have some interesting things to say about Open XML going forward.
Comments
Anonymous
December 18, 2007
PingBack from http://geeklectures.info/2007/12/18/new-blog-gray-matter/
Anonymous
December 19, 2007
I downloaded Gray's test cases and found that zipping the Test 5 MSO binary format resulted in a file 58% (15/26ths) of the MSO-XML format file size. Zipping the MSO-XML file reduced it to 85% (22/26) of its original size, for a zip-off final in which the zipped MSO binary file is about 70% of the size of the zipped MSO-XML file. After extracting and re-zipping the MSO-XML file, the file size is 14k, or 14/15 of the zipped MSO binary file, a 7% reduction. It looks like zipping the old format is the easiest way to get a decent reduction. Resuffixing (to .zip), extracting, re-zipping the MSO-XML contents, and resuffixing (to .docx) offers a slight size advantage for a lot more work than click & zip. A comparison of the zipped files shows the MS-zip algorithm is consistently worse than WinZip's. The largest difference (in both % and byte count) was on the document.xml file, which MS-zip squeezed down by 97% and WinZip by 99%. That leaves the MS-zip output at 3% of the original size versus 1% for WinZip. (All values refer to Test 5.)