Using the Open XML SDK and LINQ to XML to Remove Comments from an Open XML Wordprocessing Document
This post presents a snippet of code to remove comments from an Open XML Wordprocessing document.
This blog is inactive. New blog: EricWhite.com/blog
Note: This post may be of interest to LINQ to XML developers, as it contains some information that helps you write queries that perform better. In the case of very large documents, the approach described below performs much better than other approaches.
The code is very simple: remove all w:commentRangeStart, w:commentRangeEnd, and w:commentReference elements in the main document part, and then remove the comment part.
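As a rough end-to-end illustration of the two steps just described, here is a minimal sketch. It assumes a later Open XML SDK release (2.0 or newer) in which `MainDocumentPart.WordprocessingCommentsPart`, `GetStream`, and `DeletePart` are available; the class name `CommentRemover` and the method `RemoveComments` are hypothetical names chosen for this example and are not taken from the attached code.

```csharp
using System.IO;
using System.Linq;
using System.Xml;
using System.Xml.Linq;
using DocumentFormat.OpenXml.Packaging;

public static class CommentRemover
{
    // WordprocessingML main namespace (the "w" prefix used throughout the post).
    static readonly XNamespace w =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Hypothetical helper: removes comment markup and the comments part
    // from the document stored at 'path'.
    public static void RemoveComments(string path)
    {
        using (WordprocessingDocument doc = WordprocessingDocument.Open(path, true))
        {
            MainDocumentPart mainPart = doc.MainDocumentPart;

            // Load the main document part into a LINQ to XML tree.
            XDocument mainDocumentXDoc;
            using (Stream stream = mainPart.GetStream(FileMode.Open, FileAccess.Read))
            using (XmlReader reader = XmlReader.Create(stream))
                mainDocumentXDoc = XDocument.Load(reader);

            // Step 1: remove the three comment-related elements.
            XName commentRangeStart = w + "commentRangeStart";
            XName commentRangeEnd = w + "commentRangeEnd";
            XName commentReference = w + "commentReference";
            mainDocumentXDoc.Descendants()
                .Where(x => x.Name == commentRangeStart ||
                            x.Name == commentRangeEnd ||
                            x.Name == commentReference)
                .Remove();

            // Write the modified XML back, replacing the old part content.
            using (Stream stream = mainPart.GetStream(FileMode.Create, FileAccess.Write))
            using (XmlWriter writer = XmlWriter.Create(stream))
                mainDocumentXDoc.Save(writer);

            // Step 2: remove the comments part, if the document has one.
            WordprocessingCommentsPart commentsPart = mainPart.WordprocessingCommentsPart;
            if (commentsPart != null)
                mainPart.DeletePart(commentsPart);
        }
    }
}
```

Calling `CommentRemover.RemoveComments("report.docx")` (a placeholder file name) would modify the file in place, so it is worth working on a copy.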
The following code removes the above-mentioned elements.
// w is the WordprocessingML main namespace
XNamespace w = "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

// pre-atomize the XName objects so that they are not atomized for every item in the collection
XName commentRangeStart = w + "commentRangeStart";
XName commentRangeEnd = w + "commentRangeEnd";
XName commentReference = w + "commentReference";
mainDocumentXDoc.Descendants()
    .Where(x => x.Name == commentRangeStart ||
                x.Name == commentRangeEnd ||
                x.Name == commentReference)
    .Remove();
A more obvious way to write this would be three separate queries, one per element name:
mainDocumentXDoc
    .Descendants(w + "commentRangeStart")
    .Remove();
mainDocumentXDoc
    .Descendants(w + "commentRangeEnd")
    .Remove();
mainDocumentXDoc
    .Descendants(w + "commentReference")
    .Remove();
Of course, this iterates over all of the descendants three times, which is undesirable for large documents.
So, keeping this in mind, you might write it like this:
mainDocumentXDoc.Descendants()
    .Where(x => x.Name == w + "commentRangeStart" ||
                x.Name == w + "commentRangeEnd" ||
                x.Name == w + "commentReference")
    .Remove();
This iterates over the Descendants axis only once. However, there is a subtler performance issue here: the names (as expressed by w + "commentRangeStart", etc.) are atomized again for every item in the Descendants axis. To make the code perform as well as possible, we pre-atomize the XName objects once, then use them in the call to the Where extension method:
XName commentRangeStart = w + "commentRangeStart";
XName commentRangeEnd = w + "commentRangeEnd";
XName commentReference = w + "commentReference";
mainDocumentXDoc.Descendants()
    .Where(x =>
        x.Name == commentRangeStart ||
        x.Name == commentRangeEnd ||
        x.Name == commentReference)
    .Remove();
For more detailed information about atomization and LINQ to XML performance, see Performance of LINQ to XML.
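To make the atomization point concrete, the following self-contained sketch uses only System.Xml.Linq and a small in-memory document instead of a real .docx file. The class `AtomizationDemo` and its methods are hypothetical names made up for this illustration. It shows that two XName values built from the same namespace and local name are the same object, which is why comparing against pre-built names is cheap, while building them inside the lambda repeats the name lookup for every element.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

public static class AtomizationDemo
{
    static readonly XNamespace w =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Builds a tiny stand-in for a main document part with one comment.
    public static XDocument BuildSample()
    {
        return new XDocument(
            new XElement(w + "document",
                new XElement(w + "body",
                    new XElement(w + "p",
                        new XElement(w + "commentRangeStart", new XAttribute(w + "id", "0")),
                        new XElement(w + "r", new XElement(w + "t", "Hello")),
                        new XElement(w + "commentRangeEnd", new XAttribute(w + "id", "0")),
                        new XElement(w + "r",
                            new XElement(w + "commentReference", new XAttribute(w + "id", "0")))))));
    }

    // Removes the comment markup in a single pass and returns how many
    // elements were removed. The XName objects are created once, outside
    // the lambda, so the lookup is not repeated per element.
    public static int RemoveCommentMarkup(XDocument mainDocumentXDoc)
    {
        XName commentRangeStart = w + "commentRangeStart";
        XName commentRangeEnd = w + "commentRangeEnd";
        XName commentReference = w + "commentReference";

        XElement[] matches = mainDocumentXDoc.Descendants()
            .Where(x => x.Name == commentRangeStart ||
                        x.Name == commentRangeEnd ||
                        x.Name == commentReference)
            .ToArray();
        matches.Remove();
        return matches.Length;
    }

    // XName instances are atomized: equal names share one object instance.
    public static bool NamesAreAtomized()
    {
        return object.ReferenceEquals(w + "commentRangeStart", w + "commentRangeStart");
    }
}
```

A quick check would be to call `AtomizationDemo.RemoveCommentMarkup(AtomizationDemo.BuildSample())` and confirm that no comment elements remain in the tree. Measuring the actual speed difference would need a much larger document and a timer, which this sketch leaves out.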
The attached code also includes a bool method that indicates whether the document contains comments.
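The attached method is not reproduced in this post, so the following is only a guess at what such a check might look like, not the author's implementation. It assumes that "contains comments" means the main document part contains any of the three comment-related elements; the names `CommentDetector` and `HasComments` are hypothetical.

```csharp
using System.Linq;
using System.Xml.Linq;

public static class CommentDetector
{
    static readonly XNamespace w =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Hypothetical check: returns true if the main document XML contains
    // any comment range or comment reference element.
    public static bool HasComments(XDocument mainDocumentXDoc)
    {
        // Pre-atomize the names, as in the removal code.
        XName commentRangeStart = w + "commentRangeStart";
        XName commentRangeEnd = w + "commentRangeEnd";
        XName commentReference = w + "commentReference";

        // Any() stops at the first match, so a document with an early
        // comment does not require a full traversal.
        return mainDocumentXDoc.Descendants()
            .Any(x => x.Name == commentRangeStart ||
                      x.Name == commentRangeEnd ||
                      x.Name == commentReference);
    }
}
```

An alternative design would be to test whether the package has a comments part at all, which avoids loading the main document XML; which of the two the attached code uses is not stated here.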
Comments
Anonymous
July 13, 2008
In the last three posts, in addition to the information regarding how we want to alter the markup in
Anonymous
July 17, 2008
Here they are: PowerTools: Using System.IO.Packaging in PowerTools to modify properties (Doug
Anonymous
July 18, 2008
In the next series of blog posts, I’ll be exploring some interesting aspects of SharePoint development.
Anonymous
July 20, 2008
Just installed the OpenXML SDK v1.0. Forgive me if my question is not directly related. I'm trying to select all Tables in a Word document and write them out as Worksheets in an Excel workbook. I can collect Word tables with VSTO but can't easily write them out to Excel. Can I do that with OpenXML SDK?
Anonymous
July 22, 2008
This post presents a custom application page in SharePoint that uses Open XML, the Open XML SDK and LINQ
Anonymous
August 17, 2008
This post refused to go out on either Thursday or Friday, so here it is! Never-ending updates
Anonymous
February 06, 2009
One of the more common scenarios related to a Wordprocessing document is the need to sanitize a document