Simplifying Open XML WordprocessingML Queries by First Accepting Revisions

Revision tracking is one of the more involved areas of the Open XML standard.  More than 40 elements and attributes, some with intricate semantics, define tracked revisions.  I've written an MSDN article, Accepting Revisions in Open XML Word-Processing Documents, that describes the exact semantics of revision tracking markup.  If you accept revisions first, you no longer need to process those elements and attributes in order to retrieve the contents of the document.  However, you may not want to modify the document on disk.  It is easy to write code that pulls the document into memory and accepts revisions there, without touching the original document on disk.  This post presents two examples that show how to do this using the Open XML SDK.

This is one in a series of posts on transforming Open XML WordprocessingML to XHTML.  You can find the complete list of posts here.

Note: The RevisionAccepter class in PowerTools for Open XML provides a complete implementation of accepting revisions.  You can download the RevisionAccepter by going to PowerTools for Open XML, clicking on the Downloads tab, and downloading RevisionAccepter.zip.  PowerTools for Open XML is licensed under the Microsoft Public License (Ms-PL), which gives you wide latitude in how you use the code.

The gist of the technique is to read the document into a byte array, create a resizable memory stream, write the byte array into the memory stream, and then open the document from the memory stream.  By using this technique, we can write queries that don't need to take revision tracking into account, but we won't touch the original document on disk.  The following example shows how to do this:

using System;
using System.IO;
using System.Linq;
using System.Xml.Linq;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Wordprocessing;
using OpenXmlPowerTools;

class Program
{
    static void Main(string[] args)
    {
        // Read the whole document into memory; the file on disk is only read.
        byte[] byteArray = File.ReadAllBytes("Test.docx");

        // Create a resizable stream and copy the document bytes into it.
        using (MemoryStream memoryStream = new MemoryStream())
        {
            memoryStream.Write(byteArray, 0, byteArray.Length);

            // Open the in-memory copy for editing.
            using (WordprocessingDocument wordDoc =
                WordprocessingDocument.Open(memoryStream, true))
            {
                RevisionAccepter.AcceptRevisions(wordDoc);

                // Print the markup for the first paragraph after
                // revisions have been accepted.
                XDocument xdoc = wordDoc.MainDocumentPart.GetXDocument();
                XElement para1 = xdoc.Root.Element(W.body).Elements(W.p).FirstOrDefault();
                if (para1 != null)
                    Console.WriteLine(para1);
            }
        }
    }
}
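
A note on why the example creates an empty MemoryStream and writes the byte array into it, rather than passing the array to the MemoryStream constructor: a stream constructed over an existing byte array is fixed in size, and a package opened for editing may need to grow. The following is a minimal sketch that demonstrates the difference using only System.IO. The StreamDemo class and its ToResizableStream and CanGrow methods are hypothetical names introduced for illustration; they are not part of the Open XML SDK or PowerTools for Open XML.

```csharp
using System;
using System.IO;

static class StreamDemo
{
    // Hypothetical helper: copies the bytes into an expandable MemoryStream,
    // the same way the example above does.
    public static MemoryStream ToResizableStream(byte[] bytes)
    {
        MemoryStream stream = new MemoryStream();
        stream.Write(bytes, 0, bytes.Length);
        return stream;
    }

    // Hypothetical helper: tries to append one byte past the current end
    // of the stream and reports whether the stream was able to grow.
    public static bool CanGrow(MemoryStream stream)
    {
        try
        {
            stream.Seek(0, SeekOrigin.End);
            stream.WriteByte(0);
            return true;
        }
        catch (NotSupportedException)
        {
            // A MemoryStream constructed over an existing array is not expandable.
            return false;
        }
    }
}

class Program
{
    static void Main()
    {
        byte[] bytes = { 1, 2, 3, 4 };

        // Wraps the array directly; the stream is fixed at four bytes.
        MemoryStream fixedSize = new MemoryStream(bytes);

        // Starts empty and grows as needed.
        MemoryStream resizable = StreamDemo.ToResizableStream(bytes);

        Console.WriteLine("Fixed-size stream can grow: " + StreamDemo.CanGrow(fixedSize));
        Console.WriteLine("Resizable stream can grow:  " + StreamDemo.CanGrow(resizable));
    }
}
```

Run as a console application, this should report False for the fixed-size stream and True for the resizable one, which is why the examples in this post copy the bytes into a stream created with the parameterless constructor.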

Even though the RevisionAccepter class is written using LINQ to XML, this technique works equally well when used with the strongly-typed object model of the Open XML SDK 2.0:

using System;
using System.IO;
using System.Linq;
using System.Xml.Linq;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Wordprocessing;
using OpenXmlPowerTools;

class Program
{
    static void Main(string[] args)
    {
        // Read the whole document into memory; the file on disk is only read.
        byte[] byteArray = File.ReadAllBytes("Test.docx");

        // Create a resizable stream and copy the document bytes into it.
        using (MemoryStream memoryStream = new MemoryStream())
        {
            memoryStream.Write(byteArray, 0, byteArray.Length);

            // Open the in-memory copy for editing.
            using (WordprocessingDocument wordDoc =
                WordprocessingDocument.Open(memoryStream, true))
            {
                RevisionAccepter.AcceptRevisions(wordDoc);

                // Print the markup for the first paragraph after
                // revisions have been accepted.
                Paragraph para1 = wordDoc.MainDocumentPart.Document.Body
                    .Elements<Paragraph>().FirstOrDefault();
                if (para1 != null)
                    Console.WriteLine(XElement.Parse(para1.OuterXml));
            }
        }
    }
}

Comments

  • Anonymous
    March 01, 2010
    Hi Eric, Is it possible to merge similar runs? Suppose the user changes the formatting for some text in a paragraph (a new run is added with a new rPr) and then puts the old formatting back (the new run remains, with the initial rPr). Now the whole paragraph has the same formatting but multiple runs. In this scenario, is it possible to merge the runs? Thanks, Sandeep

  • Anonymous
    March 01, 2010
    Hi Sandeep, Take a look at this post: http://blogs.msdn.com/ericwhite/archive/2010/02/08/enabling-better-transformations-by-simplifying-open-xml-wordprocessingml-markup.aspx It shows you how to merge adjacent runs with identical formatting. -Eric

  • Anonymous
    March 22, 2010
    Hi Eric, I'm new to the Open XML standard. I'm stuck trying to implement Revisions as part of a web application that I'm working on. Here are a few details: Existing Scenario: Users enter data in a large web form (over 50 web pages and multiple input fields on each page). This data is then reviewed by a Reviewer. Functionality exists for a Reviewer to view data entered by a user in .docx format. Once reviewed, the Reviewer may request changes from the user. The user then goes back to the web form, updates the data, and notifies the Reviewer once ready for a second review. The Reviewer is then able to generate another .docx document. The Problem: When work has been re-submitted, the Reviewer has to manually open both .docx documents (first and second submissions) and manually look for the changes. We would like to implement the "Track Changes" functionality so that in the case of re-submissions, the Reviewer can view a .docx document with changes highlighted. I'm unable to find anything on the Web describing how to go about doing something like this. I keep coming across your MSDN article, Accepting Revisions in Open XML Word-Processing Documents. Are you able to assist with the following queries: Firstly, is this possible? What is the best way to go about this? Any assistance would be greatly appreciated. I'm not looking for specific code, just guidance on how this could be achieved. Thanks, Chagir

  • Anonymous
    May 03, 2010
    Chagir, I was wondering if you had any luck with this?  I am also trying to implement Track Changes and haven't had any luck. Thanks, Bob

  • Anonymous
    May 03, 2010
    Hi Chagir and Bob, This is certainly doable, but it is a non-trivial task.  To generate documents with tracked revisions, you need to understand the markup that I've explained in http://msdn.microsoft.com/en-us/library/ee836138.aspx. It also helps to create documents with tracked changes and then examine the resulting markup. Fully understanding tracked changes took me a good deal of study, and it took a while.  I wish I had a better answer for you, but unfortunately, I don't. -Eric