Introduction to Office Open XML

Hi all,

Today I am going to talk a little about Office Open XML (OOXML) and touch on the Open XML development tools.

You may have noticed that since Office 2007, your favourite Office documents have an additional "x" in the file extension (i.e. doc became docx). This marks the introduction of the Open XML file format. The easiest way to see the difference between a doc and a docx file is to:

  1. Make a separate copy of a docx file, for example: docx_example.docx
  2. Change the file extension to zip so that you have docx_example.zip
  3. Open the zip file. You should be able to browse it as a set of folders containing multiple XML files!
  4. Try the same with an old doc file: it will not work.

Try it! This works because an Open XML file is a ZIP package whose parts, most of them XML files, store the document's content as well as its other attributes, such as styles, images, properties, and comments.
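The same inspection can be done in code instead of renaming the file by hand. The following is a minimal sketch in Python, using only the standard library rather than the Open XML SDK; the helper names list_package_parts and read_part are made up for this illustration.

```python
import zipfile


def list_package_parts(docx):
    """Return the names of all parts stored in an Open XML package.

    `docx` may be a file path or a binary file-like object. No renaming
    is needed: a .docx file is already a ZIP archive.
    """
    with zipfile.ZipFile(docx) as package:
        return package.namelist()


def read_part(docx, part_name):
    """Return one part of the package decoded as UTF-8 text.

    Assumes the part is a UTF-8 encoded XML file, not a binary part
    such as an image.
    """
    with zipfile.ZipFile(docx) as package:
        return package.read(part_name).decode("utf-8")
```

Calling list_package_parts("docx_example.docx") on a Word document typically returns names such as [Content_Types].xml, _rels/.rels and word/document.xml, and read_part("docx_example.docx", "word/document.xml") returns the raw XML of the main document. Pointing the same functions at an old doc file should fail with zipfile.BadZipFile, because that format is not a ZIP package.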

Open XML File Format Benefits:

  • The open file formats and the open specifications enable broad access to technologies.
  • Decoupling the content from the other document building blocks allows great programming flexibility and accessibility.
  • Segmented data storage improves data recovery and fault tolerance. Unlike the older binary Office files, a document can often still be recovered if part of the file is corrupted.
  • ZIP compression reduces file sizes.
  • Create and manipulate Office documents without using the Office object model (OM), which is perfect for server-based scenarios.
  • Lightweight requirements: all you need is a way to open and save ZIP files and an XML parser.
  • High performance when processing large numbers of documents.
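To make the "lightweight requirements" point concrete, here is a rough sketch that writes a Word document with nothing but a ZIP library, with no Office installation and no SDK. It is written in Python for brevity. The function name build_minimal_docx is hypothetical, and the three parts shown are what I understand to be the minimum a WordprocessingML package needs, so treat it as an illustration rather than a reference implementation.

```python
import io
import zipfile
from xml.sax.saxutils import escape

# Declares the content type of every part in the package.
CONTENT_TYPES = (
    '<?xml version="1.0" encoding="UTF-8" standalone="yes"?>'
    '<Types xmlns="http://schemas.openxmlformats.org/package/2006/content-types">'
    '<Default Extension="rels" '
    'ContentType="application/vnd.openxmlformats-package.relationships+xml"/>'
    '<Default Extension="xml" ContentType="application/xml"/>'
    '<Override PartName="/word/document.xml" '
    'ContentType="application/vnd.openxmlformats-officedocument.'
    'wordprocessingml.document.main+xml"/>'
    '</Types>'
)

# Package-level relationship pointing at the main document part.
ROOT_RELS = (
    '<?xml version="1.0" encoding="UTF-8" standalone="yes"?>'
    '<Relationships '
    'xmlns="http://schemas.openxmlformats.org/package/2006/relationships">'
    '<Relationship Id="rId1" '
    'Type="http://schemas.openxmlformats.org/officeDocument/2006/'
    'relationships/officeDocument" Target="word/document.xml"/>'
    '</Relationships>'
)

# Main document part: one paragraph holding one run holding one text node.
DOCUMENT = (
    '<?xml version="1.0" encoding="UTF-8" standalone="yes"?>'
    '<w:document '
    'xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main">'
    '<w:body><w:p><w:r><w:t>{text}</w:t></w:r></w:p></w:body>'
    '</w:document>'
)


def build_minimal_docx(text):
    """Return the bytes of a minimal .docx containing a single paragraph."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as package:
        package.writestr("[Content_Types].xml", CONTENT_TYPES)
        package.writestr("_rels/.rels", ROOT_RELS)
        # Escape &, < and > so the text cannot break the XML.
        package.writestr("word/document.xml", DOCUMENT.format(text=escape(text)))
    return buffer.getvalue()
```

Writing the returned bytes to a file with a .docx extension should give a document that Word can open. Because nothing here touches the Office object model, the same code can run on a server that has no Office installed.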

Now, with these open file formats and open specifications, Microsoft Office development becomes faster, easier and more customizable. Open XML development allows developers to quickly parse the details and custom content of Office documents, manipulate specific parts of those documents, and rapidly generate large numbers of Open XML Office documents.

A typical application of Open XML development is to quickly generate Office documents by reusing a single Open XML document template and changing only the delta between the documents.
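The template idea can be sketched in a few lines. The example below is Python using only the standard library, not the Open XML SDK. The function name fill_template and the {{NAME}}-style placeholders are assumptions made for this illustration. The sketch also assumes that each placeholder sits inside a single run in the template, since Word sometimes splits typed text across several runs, and that the main part is UTF-8 encoded.

```python
import io
import zipfile
from xml.sax.saxutils import escape

DOCUMENT_PART = "word/document.xml"


def fill_template(template_bytes, replacements):
    """Copy a .docx template, substituting placeholders in the main part.

    Every part other than word/document.xml is copied through unchanged,
    so only the delta between the generated documents is rewritten.
    """
    output = io.BytesIO()
    with zipfile.ZipFile(io.BytesIO(template_bytes)) as source, \
            zipfile.ZipFile(output, "w", zipfile.ZIP_DEFLATED) as target:
        for item in source.infolist():
            data = source.read(item.filename)
            if item.filename == DOCUMENT_PART:
                xml = data.decode("utf-8")
                for placeholder, value in replacements.items():
                    # Escape the value so it cannot break the XML markup.
                    xml = xml.replace(placeholder, escape(value))
                data = xml.encode("utf-8")
            target.writestr(item.filename, data)
    return output.getvalue()
```

Generating a batch is then a loop over the data, for example calling fill_template(template, {"{{NAME}}": name}) once per recipient and writing each result to its own .docx file. Plain string replacement keeps the sketch short. A more robust version would parse the XML and work on the text elements directly.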

For Open XML development, there are several tools I would recommend:

  • Open XML SDK 2.0
    • Strongly typed part classes to manipulate Open XML document packages
    • Strongly typed content classes to manipulate Open XML parts
    • Content Construction, Search, and Manipulation using LINQ
    • Validation of Open XML documents
  • Open XML SDK Productivity Tools
    • Explore the structure of Open XML documents
    • Generate Open XML SDK source code based on document content
    • Highlight differences between Open XML documents
    • Validate a document, part, or segment against 2007 and 2010 formats
    • SDK Documentation
  • Open XML SDK Code Snippets
    • Visual Studio code snippets
    • 52 snippets for performing common tasks in Word, Excel, and PowerPoint
    • Speed up your development or use as a learning tool
  • Open XML Package Editor Power Tool for Visual Studio
    • An add-in for Visual Studio 2010 that enables you to parse and edit Open Packaging Convention files (including Word, Excel, and PowerPoint documents).
    • Open any Open XML Package file directly in Visual Studio 2010
    • Browse contents in a tree view
    • Open parts in Visual Studio's rich XML editor
    • Add/remove parts and relationships
    • Import and export part contents
    • Create new Office Packages from a set of templates using Visual Studio's File > New dialog

With the Open XML SDK, creating a new Word document can be as simple as this: 

    using DocumentFormat.OpenXml;
    using DocumentFormat.OpenXml.Packaging;
    using DocumentFormat.OpenXml.Wordprocessing;

    // docName is the path of the .docx file to create.
    // Create a Wordprocessing document.
    using (WordprocessingDocument package =
        WordprocessingDocument.Create(docName, WordprocessingDocumentType.Document))
    {
        // Add a new main document part.
        package.AddMainDocumentPart();

        // Create the Document DOM.
        package.MainDocumentPart.Document =
            new Document(
                new Body(
                    new Paragraph(
                        new Run(
                            new Text("Hello World!")))));
    }
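It can help to see the XML that this nesting of Document, Body, Paragraph, Run and Text corresponds to. The sketch below is Python with the standard library XML module, not the SDK. It builds the equivalent element tree by hand, and the helper names w and hello_world_document_xml are made up for the illustration.

```python
import xml.etree.ElementTree as ET

# Namespace of the WordprocessingML elements (w:document, w:body, ...).
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
ET.register_namespace("w", W_NS)


def w(tag):
    """Qualify a local tag name with the WordprocessingML namespace."""
    return "{%s}%s" % (W_NS, tag)


def hello_world_document_xml(text="Hello World!"):
    """Build the main document part as an XML string.

    Each element mirrors one SDK class from the snippet above:
    Document -> w:document, Body -> w:body, Paragraph -> w:p,
    Run -> w:r, Text -> w:t.
    """
    document = ET.Element(w("document"))
    body = ET.SubElement(document, w("body"))
    paragraph = ET.SubElement(body, w("p"))
    run = ET.SubElement(paragraph, w("r"))
    ET.SubElement(run, w("t")).text = text
    return ET.tostring(document, encoding="unicode")
```

The returned string is the content that would live in word/document.xml inside the package. The strongly typed SDK classes spare you from handling namespaces and tag names like this yourself.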

Feel free to drop a note on what you would specifically like to hear about Open XML. Everything you need to know about developing for Office can also be found here: https://msdn.microsoft.com/en-us/office/bb265236.aspx

Cheers,
Danny

Comments

  • Anonymous
    July 13, 2011
    Hi Danny, Great intro to Open XML. The Open XML Developer Center is super - also, for developers both old and new to Open XML, OpenXMLDeveloper.org has a ton of great content, as well as super forums where you can get your questions answered. -Eric.

  • Anonymous
    July 13, 2011
    OpenXML is difficult to work with and counter intuitive for the most part.  Your example is simplistic and if you were to try to do a real world document generation with OpenXML the code would be bloated and difficult to maintain. The OpenXML sdk needs to seriously rework their metaphor and stop making it so complex.  For example, if I make a paragraph why do I need to make a "Run" before I add text. Clean up your bloat.

  • Anonymous
    July 14, 2011
    Welcome to the blog!
    Hi Eric, thanks for the comment and you are absolutely correct! There are tons of external resources on Open XML. Thank you for sharing with us!
    Hi Darren, thanks for the comment and let me try to elaborate more. :) The Open XML SDK is a useful tool for manipulating Open XML documents and the underlying Open XML schema elements within a package. The classes in the Open XML SDK encapsulate many common tasks that developers perform on Open XML packages. Also, the reusability of the modules allows users to generate a large number of documents with great consistency and performance.
    To address your concern about "Run": a Run is a container for one or more pieces of text that share the same set of properties. For example, a Run might have the property bold, which indicates that the Run's text is to be displayed in a bold typeface. The purpose of having Run here is to reduce the number of repeated definitions and properties, as well as the amount of work required to change the document's appearance. Hope this helps. Thanks!
    Cheers, Danny