Announcing the First CTP of Open XML SDK V2

Today, the development team for the Open XML SDK has announced that they have released the first Community Technology Preview (CTP) of version 2 of the Open XML SDK.  Download it at https://go.microsoft.com/fwlink/?LinkId=127912.  There is a lot of very cool stuff in this release, including:

  • Strongly typed document object model (DOM).

  • Tool to compare two Open XML files.

  • Class explorer that helps you understand the markup and determine which classes to use in the strongly typed DOM.

  • Document reflector that can write a lot of your code for generating documents or content.

This is a big step forward.  Version 1 of the SDK provided us with strongly typed access to the parts of a package; however, it didn’t provide any facilities for consuming or producing the XML contained in the parts.  This version of the SDK really helps a lot with many aspects of modifying and producing the XML parts.  Developers who work with Open XML should incorporate the SDK and associated tools into their toolkit.

Development of this SDK has been, and will continue to be, an interactive process with the users of the SDK.  You can participate!  Sign up on the Open XML SDK Connect site to receive SDK-related news and provide feedback directly to the team.  You can find the Connect site here.

Strongly Typed Document Object Model

The most important new feature of this version of the SDK is the strongly typed document object model (DOM).  I’ve written a whole pile of code that uses V1 of the SDK together with LINQ to XML, and perhaps the biggest single issue that I encounter is remembering the namespaces and names of elements and attributes in the markup.  Because LINQ to XML identifies elements and attributes by name at run time, it is far too easy to write code that compiles but doesn’t run properly.  If you misname an element in a query, your query can erroneously return an empty collection.  If you misname an element when generating markup, you will generate an invalid document.
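To make that failure mode concrete, here is a minimal sketch that uses only System.Xml.Linq, so it needs neither version of the SDK.  The class name WeakTypingSketch, the method name CountBodyChildren, and the tiny inline document are all made up for illustration; only the WordprocessingML namespace URI is real.

```csharp
using System.Linq;
using System.Xml.Linq;

public static class WeakTypingSketch
{
    // The WordprocessingML namespace, as used for the "w" prefix in the markup.
    static readonly XNamespace w =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // A hypothetical, stripped-down main document part with two empty paragraphs.
    const string Markup =
        "<w:document xmlns:w='http://schemas.openxmlformats.org/wordprocessingml/2006/main'>" +
        "<w:body><w:p/><w:p/></w:body></w:document>";

    // Counts the children of w:body that have the given local name.
    // The name is just a string, so the compiler cannot check it.
    public static int CountBodyChildren(string localName)
    {
        XDocument xdoc = XDocument.Parse(Markup);
        return xdoc.Root
            .Element(w + "body")
            .Elements(w + localName)
            .Count();
    }
}
```

Calling WeakTypingSketch.CountBodyChildren("p") returns 2.  The misspelled CountBodyChildren("para") compiles just as happily and quietly returns 0, which is exactly the kind of bug that is easy to miss.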

The gist of the strongly typed DOM is that the SDK defines classes for elements in the markup.  When you access the contents of a part, you access them via these classes.  For example, instead of retrieving a collection of System.Xml.Linq.XElement objects for the paragraph (w:p) elements, you retrieve a collection of DocumentFormat.OpenXml.Wordprocessing.Paragraph objects.  The SDK defines generic methods that take a type parameter so that you can more easily retrieve all Paragraph elements that are child elements of the Body element.  In addition, there are a number of places in Open XML where an attribute can have one of several values.  The SDK defines enumerations for the values of the properties that represent those attributes; this provides more strongly typed goodness.  The API is LINQ friendly, including lazy access, axis methods that parallel the LINQ to XML axes, and annotations so that you can store application-specific information on each object.
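As a rough illustration of the generic methods and enumerations described above, the following sketch returns the text of every paragraph that is explicitly centered.  It is written against the API names of the released Open XML SDK (Elements&lt;T&gt;, ParagraphProperties, Justification, JustificationValues); the CTP may differ in details.  The class and method names are hypothetical.

```csharp
using System.Linq;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Wordprocessing;

public static class TypedDomSketch
{
    // Returns the text of each paragraph directly under the body whose
    // justification is set to "center" by direct formatting.
    // Centering that comes from a style is not detected by this sketch.
    public static string[] CenteredParagraphTexts(string path)
    {
        // Open the document read-only.
        using (WordprocessingDocument doc = WordprocessingDocument.Open(path, false))
        {
            Body body = doc.MainDocumentPart.Document.Body;
            return body.Elements<Paragraph>()              // a type, not a string name
                .Where(p => p.ParagraphProperties != null
                    && p.ParagraphProperties.Justification != null
                    && p.ParagraphProperties.Justification.Val != null
                    && p.ParagraphProperties.Justification.Val.Value
                        == JustificationValues.Center)      // an enumeration, not "center"
                .Select(p => p.InnerText)
                .ToArray();                                 // force evaluation before Dispose
        }
    }
}
```

The ToArray call matters here: the query is lazy, so without it the elements would be enumerated after the document had already been closed.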

The easiest way to show the benefits of the new, strongly typed DOM is to present a very small example using LINQ to XML (a weakly typed approach), and the same example using V2 of the SDK.

The following example shows a typical LINQ to XML query to retrieve paragraphs from a word processing document:

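// w is the WordprocessingML namespace (an XNamespace).  GetXDocument() and
// StringConcatenate() are helper methods assumed to be defined elsewhere.
var paragraphs = doc.MainDocumentPart
    .GetXDocument()
    .Element(w + "document")
    .Element(w + "body")
    .Elements(w + "p")
    .Select
    (
        p => new
        {
            ParagraphNode = p,
            // Concatenate the text of all runs, including inserted (tracked) runs.
            Text = p
                .Elements()
                .Where(z => z.Name == w + "r" || z.Name == w + "ins")
                .Descendants(w + "t")
                .Select(t => (string)t)
                .StringConcatenate()
        }
    );
foreach (var b in paragraphs)
    Console.WriteLine(b.Text);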

The following shows the equivalent LINQ query using the strongly typed DOM:

var build = doc.MainDocumentPart
    .Document.Body.Elements<Paragraph>()
    .Select
    (
        p => new
        {
            ParagraphNode = p,
            Text = p
                .Elements()
                .Where(z => z is Run || z is InsertedRun)
                // Descendants, not Elements: inside an InsertedRun (w:ins) the
                // text sits in nested runs, matching Descendants(w + "t") above.
                .Select(z => z.Descendants<Text>()
                    .Select(t => t.Text)
                    .StringConcatenate())
                .StringConcatenate()
        }
    );
foreach (var b in build)
    Console.WriteLine(b.Text);

Note that the second version doesn’t use strings to identify the nodes.  Instead, you use actual type names.  For instance, the following line retrieves all of the child paragraph nodes of the Body element:

var build = doc.MainDocumentPart
    .Document.Body.Elements<Paragraph>()

In this case, it isn’t possible to misspell “Paragraph”.  It is a type name, so the code would not compile if it were misspelled.
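Both queries above call StringConcatenate, which is not part of LINQ or of the SDK, so they won’t compile without it.  A minimal sketch of such an extension method might look like the following; the class name StringExtensions is an assumption, and the helper actually used for these examples may differ.

```csharp
using System.Collections.Generic;
using System.Text;

public static class StringExtensions
{
    // Concatenates every string in the sequence, in order, with no separator.
    // An empty sequence yields an empty string.
    public static string StringConcatenate(this IEnumerable<string> source)
    {
        StringBuilder sb = new StringBuilder();
        foreach (string s in source)
            sb.Append(s);
        return sb.ToString();
    }
}
```

Using a StringBuilder rather than repeated string concatenation keeps the cost roughly linear in the total length of the text, which helps for long paragraphs.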

And because we now have strongly typed objects in the DOM, we can write some very cool extension methods that make programming with the DOM much simpler.  Also, there is a strongly typed streaming reader and writer.  These are topics for another day.

OpenXmlDiff

I’ve blogged about various approaches to comparing two Open XML documents.  This is the best approach so far.  It is a super tool that should become part of every Open XML developer’s toolkit.  The use of it couldn’t be simpler: open two documents, click the compare button, and see the differences.

Open XML Class Explorer

This tool allows you to navigate through the classes in the SDK.  Its primary purpose is to allow you to explore the markup and determine the appropriate strongly typed class to use for the markup.  This would be super all by itself, but the team has put together something extra!  When you click on a class, in addition to seeing the class hierarchy, you also see the actual text of the Ecma 376 specification.  This is a very convenient way to explore the specification.

Open XML Document Reflector

If V2 of the SDK only contained the above items, it would be an incredibly impressive release.  But, there is more!

The document reflector can really help you to write code to generate documents.  In my job as technical evangelist for Open XML, I’ve talked with a large number of customers, and the most common scenario is document generation.  Well, this tool really helps!

To use the document reflector, you open a document and click on a node within the hierarchy of nodes that is presented to you.  The document reflector then presents you with C# code to generate that node within the document.  You can click on a part, and the document reflector will generate the code to create the part.  And if you click on the document node itself, the document reflector presents you with the code to create the entire document.  You can then take the code generated by the document reflector, parameterize it, and modify it as necessary.  This really simplifies the process of creating a program to generate documents.
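To give a feel for what document generation code built on the strongly typed DOM looks like once it has been parameterized, here is a small hand-written sketch.  It is not actual output from the document reflector, and it uses the API names of the released Open XML SDK, which may differ from the CTP.  The class name, the method name, and the path and text parameters are hypothetical.

```csharp
using DocumentFormat.OpenXml;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Wordprocessing;

public static class GenerationSketch
{
    // Creates a minimal word processing document containing one paragraph.
    // "path" and "text" stand in for the values you might parameterize
    // after pasting generated code into your own program.
    public static void CreateDocument(string path, string text)
    {
        using (WordprocessingDocument doc =
            WordprocessingDocument.Create(path, WordprocessingDocumentType.Document))
        {
            // Every word processing package needs a main document part.
            MainDocumentPart mainPart = doc.AddMainDocumentPart();

            // Build the element tree: document > body > paragraph > run > text.
            mainPart.Document = new Document(
                new Body(
                    new Paragraph(
                        new Run(
                            new Text(text)))));

            // Write the element tree back into the part.
            mainPart.Document.Save();
        }
    }
}
```

The nested constructors mirror the nesting of the markup itself, which is why code in this style is fairly easy to read against the XML it produces.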

This version of the toolkit is released with the Community Technology Preview (CTP) license, which means that you can’t deploy solutions using it until it is released with a ‘go-live’ license.  However, there is a lot of value in the accompanying tools for developers who must ship using V1 of the SDK.

Finally, because this is a CTP, be aware that the API may change between now and the final release.  Sign up for the Open XML SDK Connect site and get all the latest news.

Comments

  • Anonymous
    September 06, 2008
    nice this should make my xml life easier.

  • Anonymous
    September 12, 2008
    Hi Eric, As regards the diff tool, is there any way it can be (or be taught by the user to be) a bit smarter about which parts it is diffing? If the parts aren't identically named, but are semantically similar or even identical, you can't diff them. For example, our app writes out an Excel sheet as xl/worksheets/sheet.xml, not sheet1.xml, so when comparing with Excel, it doesn't work. Same with themes, we use theme.xml. I know many people will just choose the empirical norm of following Excel's behavior, but for us mavericks, any way to "map" them, or even manually choose which parts within a package to diff, one at a time, would be good ;) Very useful tool though. Gareth

  • Anonymous
    September 23, 2008
    Stupid. I did similar thing as an experiment with WordML (Word 2003) - with slight modifications to the schemas I used the XSD tool from VS2005 to generate classes that match the schema. This approach still doesn't make up for a decent API - not even close. How do I do a "simple" copy-paste from one document to another? The API gives no clues about that. Transferring an already complex schema into code doesn't do much good.

  • Anonymous
    November 02, 2008
    Hi Sanjay, The go-live version will be released in the same timeframe as Office 14.  Microsoft hasn't announced any date for 14 yet, though. -Eric

  • Anonymous
    November 10, 2008
    In your example:
        .Where(z => z is Run || z is InsertedRun)
        .Select(z => z.Elements<Text>()
            .Select(t => t.Text)
            .StringConcatenate())
        .StringConcatenate()
    What is .StringConcatenate? Won't compile for me...

  • Anonymous
    November 10, 2008
    Hi PhilM, Sorry about that - this is a function that I've been using for some time.  You can find that function, along with an explanation of it here:  http://blogs.msdn.com/ericwhite/pages/Aggregation.aspx -Eric
