Using the Open XML SDK and LINQ to XML to Remove Comments from an Open XML Wordprocessing Document

This post presents a snippet of code to remove comments from an Open XML Wordprocessing document.


Note: This post may be of interest to LINQ to XML developers, as it contains some information that helps you write queries that perform better. In the case of very large documents, the approach described below performs much better than other approaches.

The code is very simple: remove all w:commentRangeStart, w:commentRangeEnd, and w:commentReference elements in the main document part, and then remove the comment part.

The following code removes those elements:

// pre-atomize the XName objects so that they are not atomized for every item in the collection
XName commentRangeStart = w + "commentRangeStart";
XName commentRangeEnd = w + "commentRangeEnd";
XName commentReference = w + "commentReference";
mainDocumentXDoc.Descendants()
    .Where(x =>
       x.Name == commentRangeStart ||
       x.Name == commentRangeEnd ||
       x.Name == commentReference)
    .Remove();
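The snippet above uses w and mainDocumentXDoc without defining them. As a minimal sketch, assuming w is the WordprocessingML main namespace and using a small hand-built document in place of a real main document part, the query can be exercised as follows. The class and method names are illustrative only and are not taken from the attached code.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

public static class RemoveCommentMarkupDemo
{
    // WordprocessingML main namespace, conventionally bound to the "w" prefix.
    static readonly XNamespace w =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Builds a tiny stand-in for a main document part: one paragraph with one comment.
    public static XDocument BuildSample()
    {
        return new XDocument(
            new XElement(w + "document",
                new XAttribute(XNamespace.Xmlns + "w", w.NamespaceName),
                new XElement(w + "body",
                    new XElement(w + "p",
                        new XElement(w + "commentRangeStart", new XAttribute(w + "id", "0")),
                        new XElement(w + "r", new XElement(w + "t", "Hello")),
                        new XElement(w + "commentRangeEnd", new XAttribute(w + "id", "0")),
                        new XElement(w + "r",
                            new XElement(w + "commentReference", new XAttribute(w + "id", "0")))))));
    }

    // Same query as in the post: one pass over Descendants with pre-atomized names.
    public static void RemoveCommentMarkup(XDocument mainDocumentXDoc)
    {
        XName commentRangeStart = w + "commentRangeStart";
        XName commentRangeEnd = w + "commentRangeEnd";
        XName commentReference = w + "commentReference";

        mainDocumentXDoc.Descendants()
            .Where(x =>
               x.Name == commentRangeStart ||
               x.Name == commentRangeEnd ||
               x.Name == commentReference)
            .Remove();
    }

    public static void Main()
    {
        XDocument doc = BuildSample();
        RemoveCommentMarkup(doc);
        Console.WriteLine(doc); // the comment markup elements are gone; the text run remains
    }
}
```

Calling Remove directly on the query is safe because the extension method first copies the matching nodes to a list and only then detaches them from the tree.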

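There are a few ways to write this query. The most direct is to remove each kind of element with its own call to the Descendants axis:

mainDocumentXDoc
    .Descendants(w + "commentRangeStart")
    .Remove();

mainDocumentXDoc
    .Descendants(w + "commentRangeEnd")
    .Remove();

mainDocumentXDoc
    .Descendants(w + "commentReference")
    .Remove();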

Of course, this iterates over all of the descendants three times, which is not desirable for large documents.

So, keeping this in mind, you might write it like this:

mainDocumentXDoc.Descendants()
    .Where(x => x.Name == w + "commentRangeStart" ||
        x.Name == w + "commentRangeEnd" ||
        x.Name == w + "commentReference")
    .Remove();

This iterates over the Descendants axis only once. However, there is a subtler performance issue here: the names (as expressed by w + "commentRangeStart", etc.) are atomized over and over again, once for every item in the Descendants axis. To make the code perform as well as possible, we pre-atomize the XName objects, then we use them in the call to the Where extension method:

XName commentRangeStart = w + "commentRangeStart";
XName commentRangeEnd = w + "commentRangeEnd";
XName commentReference = w + "commentReference";
mainDocumentXDoc.Descendants()
    .Where(x =>
       x.Name == commentRangeStart ||
       x.Name == commentRangeEnd ||
       x.Name == commentReference)
    .Remove();

For more detailed information about atomization and LINQ to XML performance, see Performance of LINQ to XML.
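To make the atomization point concrete, here is a small sketch (the class and method names are illustrative). XName objects are atomized, so two expressions that produce the same namespace and local name return the same instance. Each evaluation of w + "commentRangeStart" still has to look that instance up, which is the per-item cost that hoisting the names out of the lambda avoids.

```csharp
using System;
using System.Xml.Linq;

public static class AtomizationDemo
{
    // Returns true if two separately built XName values are the same object.
    public static bool SameInstance()
    {
        XNamespace w = "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

        XName first = w + "commentRangeStart";   // looks up (or creates) the atomized name
        XName second = w + "commentRangeStart";  // looks up the same atomized name again

        return object.ReferenceEquals(first, second);
    }

    public static void Main()
    {
        Console.WriteLine(SameInstance()); // True
    }
}
```

Because equal names are the same object, comparing x.Name against a pre-atomized XName is a cheap reference comparison.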

The attached code also includes a method that returns a bool indicating whether the document contains comments.
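The attachment is not reproduced in this post, so the following is only a guess at what such a check could look like, not the attached implementation. It assumes that a document "contains comments" when its main document part holds any of the three comment markup elements. The names CommentCheck and HasComments are hypothetical.

```csharp
using System.Linq;
using System.Xml.Linq;

public static class CommentCheck
{
    static readonly XNamespace w =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Hypothetical helper: true if the main document XML contains any comment markup.
    public static bool HasComments(XDocument mainDocumentXDoc)
    {
        // Pre-atomize the names once, outside the lambda.
        XName commentRangeStart = w + "commentRangeStart";
        XName commentRangeEnd = w + "commentRangeEnd";
        XName commentReference = w + "commentReference";

        // Any stops at the first match, so a commented document is detected early.
        return mainDocumentXDoc.Descendants()
            .Any(x =>
               x.Name == commentRangeStart ||
               x.Name == commentRangeEnd ||
               x.Name == commentReference);
    }
}
```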

Code is attached.

RemoveComments.cs
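For readers without the attachment, here is a rough end-to-end sketch of the two steps described at the start of the post: clean the main document part, then delete the comments part. It is not the attached code. It assumes the DocumentFormat.OpenXml.Packaging API as found in Open XML SDK 2.0 and later (member names may differ in SDK 1.0), and the class name CommentRemover and the path parameter are placeholders.

```csharp
using System.IO;
using System.Linq;
using System.Xml.Linq;
using DocumentFormat.OpenXml.Packaging;

public static class CommentRemover
{
    static readonly XNamespace w =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    public static void RemoveComments(string path)
    {
        // Open the package for editing.
        using (WordprocessingDocument doc = WordprocessingDocument.Open(path, true))
        {
            MainDocumentPart mainPart = doc.MainDocumentPart;

            // Load the main document part into a LINQ to XML tree.
            XDocument mainDocumentXDoc;
            using (Stream stream = mainPart.GetStream(FileMode.Open, FileAccess.Read))
                mainDocumentXDoc = XDocument.Load(stream);

            // Step 1: remove the comment markup in a single pass.
            XName commentRangeStart = w + "commentRangeStart";
            XName commentRangeEnd = w + "commentRangeEnd";
            XName commentReference = w + "commentReference";
            mainDocumentXDoc.Descendants()
                .Where(x =>
                   x.Name == commentRangeStart ||
                   x.Name == commentRangeEnd ||
                   x.Name == commentReference)
                .Remove();

            // Write the modified XML back, replacing the old part content.
            using (Stream stream = mainPart.GetStream(FileMode.Create, FileAccess.Write))
                mainDocumentXDoc.Save(stream);

            // Step 2: delete the comments part, if the document has one.
            WordprocessingCommentsPart commentsPart = mainPart.WordprocessingCommentsPart;
            if (commentsPart != null)
                mainPart.DeletePart(commentsPart);
        }
    }
}
```

A call such as CommentRemover.RemoveComments("Test.docx") would modify the file in place, so it is worth working on a copy.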

Comments

  • Anonymous
    July 13, 2008
    In the last three posts, in addition to the information regarding how we want to alter the markup in

  • Anonymous
    July 17, 2008
    Here they are: PowerTools: Using System.IO.Packaging in PowerTools to modify properties (Doug

  • Anonymous
    July 18, 2008
    In the next series of blog posts, I’ll be exploring some interesting aspects of SharePoint development.

  • Anonymous
    July 20, 2008
    Just installed the OpenXML SDK v1.0.  Forgive me if my question is not directly related.  I'm trying to select all Tables in a Word document and write them out as Worksheets in an Excel workbook.  I can collect Word tables with VSTO but can't easily write them out to Excel.  Can I do that with OpenXML SDK?

  • Anonymous
    July 22, 2008
    This post presents a custom application page in SharePoint that uses Open XML, the Open XML SDK and LINQ

  • Anonymous
    August 17, 2008
    This post would not go out on Thursday or on Friday, so here it is! Never-ending updates

  • Anonymous
    February 06, 2009
    One of the more common scenarios related to a Wordprocessing document is the need to sanitize a document