Technical Improvements in the Open XML SDK

This blog is inactive.  New blog: EricWhite.com/blog

(Note: July 9, 2008 - I've written a new post that shows an even better way to implement functionality like this.) 

Sometimes I get to write a blog post that is really fun to write, and this is one of them.  This particular subject started brewing in my mind last November and December, before I started in my current job.  At the time, I was writing some code to find the most effective and approachable way to access Open XML documents using LINQ to XML.

One of the problems that I ran into was that after I had populated an XML tree from a part, there was no good place to keep that populated XDocument.  It would be possible to keep it in a dictionary, and then look it up from the part every time you need it, but this didn't appeal to me.  However, if the Open XML SDK had annotations, in the style of LINQ to XML, then after populating an XDocument from a part, we could attach the XDocument to the part, and before populating an XDocument, we could first check whether the part already has one.  Well, annotations have been added to the April 2008 CTP of the Open XML SDK.

This makes it easier to deal with the XML contained in the parts.  All a developer needs to do is open the WordprocessingDocument and get the XDocument for specific parts as necessary.  If the XDocument has already been loaded, the work to load it will not be repeated.
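
The check-then-attach pattern can be tried without the SDK, because LINQ to XML nodes already support annotations.  The following is a minimal sketch, not code from the SDK: AnnotationDemo, WordCount, and GetWordCount are hypothetical names invented for illustration.  It computes a value once, attaches it to an XElement, and returns the same cached object on later calls.

```csharp
using System;
using System.Xml.Linq;

public static class AnnotationDemo
{
    // Hypothetical cache payload; annotations must be reference types.
    public sealed class WordCount
    {
        public int Value { get; set; }
    }

    // Returns the WordCount attached to the element, computing and
    // attaching it on first use.
    public static WordCount GetWordCount(XElement element)
    {
        WordCount cached = element.Annotation<WordCount>();
        if (cached != null)
            return cached;
        cached = new WordCount
        {
            Value = element.Value
                .Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries)
                .Length
        };
        element.AddAnnotation(cached);
        return cached;
    }

    public static void Main()
    {
        XElement p = new XElement("p", "annotations travel with the node");
        WordCount first = GetWordCount(p);
        WordCount second = GetWordCount(p);
        Console.WriteLine(first.Value);                     // 5
        Console.WriteLine(ReferenceEquals(first, second));  // True
    }
}
```

The second call returns the same object as the first, which is the behavior we want when caching an XDocument on a part.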

There are more sophisticated uses of this new feature.  One possible enhancement: automatically reserialize the XDocument objects back to the package if the XDocument was changed.  I'll be blogging more on this.
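
As a rough sketch of how such change tracking might work (an assumption for illustration, not a feature of the SDK): LINQ to XML raises a Changed event on an XDocument when any node in its tree is modified, so a small tracker annotation can record whether the tree needs to be written back.  DirtyTrackingSketch, ChangeTracker, LoadTracked, and SaveIfChanged are hypothetical names, and a MemoryStream stands in for a part's stream.

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml;
using System.Xml.Linq;

public static class DirtyTrackingSketch
{
    // Hypothetical marker annotation: records whether the tree changed
    // after it was loaded.
    public sealed class ChangeTracker
    {
        public bool IsDirty { get; set; }
    }

    // Loads an XDocument from the stream and attaches a ChangeTracker.
    public static XDocument LoadTracked(Stream stream)
    {
        XDocument xdoc;
        using (XmlReader reader = XmlReader.Create(stream))
            xdoc = XDocument.Load(reader);
        ChangeTracker tracker = new ChangeTracker();
        xdoc.AddAnnotation(tracker);
        // Changed fires for modifications anywhere in the tree.
        xdoc.Changed += (sender, e) => tracker.IsDirty = true;
        return xdoc;
    }

    // Writes the tree back only if it was modified; returns true if written.
    public static bool SaveIfChanged(XDocument xdoc, Stream stream)
    {
        ChangeTracker tracker = xdoc.Annotation<ChangeTracker>();
        if (tracker == null || !tracker.IsDirty)
            return false;
        // Discard the old content, then write the current tree from the start.
        stream.Position = 0;
        stream.SetLength(0);
        using (XmlWriter writer = XmlWriter.Create(stream))
            xdoc.Save(writer);
        tracker.IsDirty = false;
        return true;
    }

    public static void Main()
    {
        byte[] xml = Encoding.UTF8.GetBytes("<root><item>old</item></root>");
        using (MemoryStream stream = new MemoryStream())
        {
            stream.Write(xml, 0, xml.Length);
            stream.Position = 0;
            XDocument xdoc = LoadTracked(stream);
            Console.WriteLine(SaveIfChanged(xdoc, stream)); // False: nothing changed yet
            xdoc.Root.Element("item").Value = "new";
            Console.WriteLine(SaveIfChanged(xdoc, stream)); // True: tree was modified
            Console.WriteLine(SaveIfChanged(xdoc, stream)); // False: already saved
        }
    }
}
```

With the SDK, the stream would come from the part rather than from memory, and the save step would have to run before the package is closed.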

In the following example, I've written an extension method, GetXDocument, that you can call on any OpenXmlPart.  You can see how this method uses annotations.

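public static XDocument GetXDocument(this OpenXmlPart part)
{
    // Return the cached tree if this part has already been loaded.
    XDocument xdoc = part.Annotation<XDocument>();
    if (xdoc != null)
        return xdoc;
    // Otherwise load the part's XML and cache it on the part as an annotation.
    using (StreamReader streamReader = new StreamReader(part.GetStream()))
    using (XmlReader xmlReader = XmlReader.Create(streamReader))
        xdoc = XDocument.Load(xmlReader);
    part.AddAnnotation(xdoc);
    return xdoc;
}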

Here is the entire example:

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.IO;
using Microsoft.Office.DocumentFormat.OpenXml.Packaging;
using System.Xml;
using System.Xml.Linq;
namespace OpenXmlSdkExample
{
    public class Comment
    {
        public int Id { get; set; }
        public string Text { get; set; }
        public string Author { get; set; }
        public Paragraph Parent { get; set; }
        public Comment(Paragraph parent) { Parent = parent; }
    }
    public class Paragraph
    {
        public XElement ParagraphElement { get; set; }
        public string StyleName { get; set; }
        public string Text { get; set; }
        public IEnumerable<Comment> Comments()
        {
            XNamespace w =
                "http://schemas.openxmlformats.org/wordprocessingml/2006/main";
            XElement p = ParagraphElement;
            // ids of the comments whose ranges start in this paragraph
            var commentIds = p
                .Elements(w + "commentRangeStart")
                .Attributes(w + "id")
                .Select(c => (int)c);
            return
                commentIds
                .Select(i =>
                    new Comment(this)
                    {
                        Id = i,
                        Author =
                            Parent.MainDocumentPart.CommentsPart.GetXDocument()
                            .Root
                            .Elements(w + "comment")
                            .Where(c => (int)c.Attribute(w + "id") == i)
                            .First()
                            .Attribute(w + "author")
                            .Value,
                        Text =
                            Parent.MainDocumentPart.CommentsPart.GetXDocument()
                            .Root
                            .Elements(w + "comment")
                            .Where(c => (int)c.Attribute(w + "id") == i)
                            .First()
                            .Descendants(w + "p")
                            // one line of text per paragraph of the comment
                            .Select(para => para
                                .Descendants(w + "t")
                                .StringConcatenate(e => (string)e)
                                + "\n")
                            .Aggregate(new StringBuilder(), (sb, v) => sb.Append(v), sb => sb.ToString())
                            .Trim()
                    }
                );
        }
        public WordprocessingDocument Parent { get; set; }
        public Paragraph(WordprocessingDocument parent) { Parent = parent; }
    }
    public static class LocalExtensions
    {
        public static string DefaultStyle(this WordprocessingDocument doc)
        {
            XNamespace w =
                "http://schemas.openxmlformats.org/wordprocessingml/2006/main";
            XDocument styleXDocument = doc.MainDocumentPart.StyleDefinitionsPart.GetXDocument();
            return (string)(
                from style in styleXDocument.Root.Elements(w + "style")
                where (string)style.Attribute(w + "type") == "paragraph" &&
                      (string)style.Attribute(w + "default") == "1"
                select style
            ).First().Attribute(w + "styleId");
        }
        public static IEnumerable<Paragraph> Paragraphs(this WordprocessingDocument doc)
        {
            // a good convention to use is to name the XNamespace
            // variable with the same name as the namespace prefix,
            // and to name XName variables with the local name of the element
            XNamespace w =
                "http://schemas.openxmlformats.org/wordprocessingml/2006/main";
            XName r = w + "r";
            XName ins = w + "ins";
            string defaultStyle = doc.DefaultStyle();
            // query for all paragraphs in the document.
            return
                from p in doc
                    .MainDocumentPart
                    .GetXDocument()
                    .Root
                    .Element(w + "body")
                    .Descendants(w + "p")
                let styleNode = p
                    .Elements(w + "pPr")
                    .Elements(w + "pStyle")
                    .FirstOrDefault()
                select new Paragraph(doc)
                {
                    ParagraphElement = p,
                    StyleName = styleNode != null ?
                        (string)styleNode.Attribute(w + "val") :
                        defaultStyle,
                    // in the following query, we need to select both
                    // the r and ins elements in order to assemble the text
                    // properly for paragraphs that have tracked changes.
                    Text = p
                        .Elements()
                        .Where(z => z.Name == r || z.Name == ins)
                        .Descendants(w + "t")
                        .StringConcatenate(element => (string)element)
                };
        }
        public static XDocument GetXDocument(this OpenXmlPart part)
        {
            // return the cached tree if this part has already been loaded
            XDocument xdoc = part.Annotation<XDocument>();
            if (xdoc != null)
                return xdoc;
            // otherwise load the part's XML and cache it on the part
            using (StreamReader streamReader = new StreamReader(part.GetStream()))
            using (XmlReader xmlReader = XmlReader.Create(streamReader))
                xdoc = XDocument.Load(xmlReader);
            part.AddAnnotation(xdoc);
            return xdoc;
        }
        public static string StringConcatenate<T>(this IEnumerable<T> source,
            Func<T, string> func)
        {
            StringBuilder sb = new StringBuilder();
            foreach (T item in source) sb.Append(func(item));
            return sb.ToString();
        }
    }
    class Program
    {
        static void Main(string[] args)
        {
            using (WordprocessingDocument wordDoc = WordprocessingDocument.Open("Test.docx", true))
            {
                Console.WriteLine(wordDoc.DefaultStyle());
                foreach (var p in wordDoc.Paragraphs())
                    Console.WriteLine("{0}:{1}", p.StyleName.PadRight(20), p.Text);
            }
        }
    }
}

Comments

  • Anonymous
    March 13, 2008
    After nine months of developer feedback on the Open XML SDK, we have some good news today: a roadmap

  • Anonymous
    March 13, 2008
    The announcement has just landed on openxmldeveloper.org: the Open XML SDK CTP 2 is going to be made available

  • Anonymous
    March 17, 2008
    The Open XML SDK roadmap has been published. Here are some links for further reading: Open XML SDK download

  • Anonymous
    March 22, 2008
    I would like to see OOXML 2.0 incorporate XML-based open formats for Visio, Publisher, and OneNote, and for Access too if possible and feasible.

  • Anonymous
    March 27, 2008
    On March 13th, 2008, Microsoft announced a roadmap for the Open XML SDK. The Open XML SDK, originally

  • Anonymous
    April 01, 2008
    Following the ECMA standardization, the completion of the formal process has been announced for the approval

  • Anonymous
    April 17, 2008
    We are glad to announce that the Open XML Format SDK April CTP is available! You can download the new

  • Anonymous
    April 17, 2008
    The April 2008 CTP of the Open XML SDK is now live on the web, and available for download! I'm really

  • Anonymous
    April 17, 2008
    Following Microsoft's announcement about the availability of the Open XML SDK, here at last is the

  • Anonymous
    July 08, 2008
    In this post, I’m presenting some code that uses the Open XML SDK and LINQ to XML to query an Open XML

  • Anonymous
    April 27, 2009
    There is an interesting approach that we use in PowerTools for Open XML that makes it easy to write cmdlets