Modifying Open XML Documents using the SharePoint Object Model
When working with Open XML documents from within SharePoint, you may want to open a specific document, modify it in some way, and then save it, either replacing the original document, or saving to a new location. This isn’t very hard, but there are a few issues. This post describes the issues and presents a minimal amount of code that shows how to open an Open XML document from a SharePoint list, modify it, and then save it back to either the same location, or a different one if you desire.
In detail, the technique consists of:
- Writing a function that takes an SPFile as an argument. This SPFile object is the DocumentLibrary list item that we want to modify. This factoring allows us to first write this code as a console application. Then, when we have the code debugged, we can easily move the factored code into a SharePoint feature.
- Reading the document into a byte array using the SPFile.OpenBinary method.
- Creating a resizable MemoryStream object using the parameterless constructor of the MemoryStream class.
- Writing the byte array into the MemoryStream.
- Opening the document using the Open XML SDK, and trapping the System.IO.FileFormatException that the SDK throws if the document isn’t valid.
- Modifying the document using LINQ to XML and the Open XML SDK. This code is similar to a bunch of other code that I’ve posted on my blog.
- Writing the document back to the document library.
This code can use either the Open XML SDK V1 or CTP1 of the Open XML SDK V2.
The code that I present in this post uses the techniques for working with Open XML documents in memory. The blog post, Working With In-Memory Open XML Documents, describes some of the issues, including the issue that we can’t use the MemoryStream constructor that takes a byte array as a parameter, because that constructor creates a non-resizable MemoryStream. The Open XML SDK requires that the stream be resizable, as a modified part that is written back to the package may be larger than the original.
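The difference between the two constructors is easy to see in isolation. The following is a minimal sketch, not part of the original sample: the class name MemoryStreamDemo and the helper CanGrow are made up for illustration. It tries to append bytes past the end of each kind of stream and reports whether the stream was able to grow.

```csharp
using System;
using System.IO;

static class MemoryStreamDemo
{
    // Hypothetical helper: returns true if bytes can be appended
    // past the current end of the stream.
    public static bool CanGrow(MemoryStream stream)
    {
        try
        {
            stream.Seek(0, SeekOrigin.End);
            stream.Write(new byte[16], 0, 16);
            return true;
        }
        catch (NotSupportedException)
        {
            // A MemoryStream that wraps a caller-supplied array cannot expand.
            return false;
        }
    }

    static void Main()
    {
        byte[] original = { 1, 2, 3, 4 };

        // Constructed over a byte array: fixed capacity.
        using (MemoryStream fixedSize = new MemoryStream(original))
            Console.WriteLine(CanGrow(fixedSize));   // False

        // Parameterless constructor, then copy the bytes in: resizable.
        using (MemoryStream resizable = new MemoryStream())
        {
            resizable.Write(original, 0, original.Length);
            Console.WriteLine(CanGrow(resizable));   // True
        }
    }
}
```

This is why the sample in this post creates an empty MemoryStream and writes the byte array into it, rather than passing the array to the constructor.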
In this example, I use a technique that I’ve come to really like when working with the SharePoint object model. I first develop the code that uses the object model in a console application, factoring it so that I can easily move the necessary code to a feature. This means that while I’m writing my code and getting it to work properly, I have a very fast edit/compile/debug cycle. Then, when the code is factored the way I want it, I can assemble my SharePoint feature in no time at all.
Note that one of the exceptions that the Open XML SDK can throw is System.IO.FileFormatException. Although this exception is defined in the System.IO namespace, it lives in the WindowsBase assembly. To compile this console application, you must add references to the following three assemblies:
- DocumentFormat.OpenXml, which is the Open XML SDK (either V1 or CTP1 of V2)
- Microsoft.SharePoint
- WindowsBase
Because this is a console application that is using the SharePoint object model, you must run this code as an administrator on the box that hosts the SharePoint site. In my case, I wrote this code inside of the virtual machine that you can find here.
Here is the code that shows how to modify an Open XML document:
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.IO;
using System.IO.Packaging;
using System.Xml;
using System.Xml.Linq;
using Microsoft.SharePoint;
using DocumentFormat.OpenXml.Packaging;

public static class LocalExtensions
{
    public static XDocument GetXDocument(this OpenXmlPart part)
    {
        XDocument xdoc = part.Annotation<XDocument>();
        if (xdoc != null)
            return xdoc;
        using (StreamReader sr = new StreamReader(part.GetStream()))
        using (XmlReader xr = XmlReader.Create(sr))
            xdoc = XDocument.Load(xr);
        part.AddAnnotation(xdoc);
        return xdoc;
    }

    public static void PutXDocument(this OpenXmlPart part)
    {
        XDocument xdoc = part.GetXDocument();
        if (xdoc != null)
        {
            // Serialize the XDocument object back to the package.
            using (XmlWriter xw =
                XmlWriter.Create(part.GetStream
                    (FileMode.Create, FileAccess.Write)))
            {
                xdoc.Save(xw);
            }
        }
    }

    public static string StringConcatenate(
        this IEnumerable<string> source)
    {
        return source.Aggregate(
            new StringBuilder(),
            (s, i) => s.Append(i),
            s => s.ToString());
    }
}

class Program
{
    enum ModifyDocumentResults
    {
        Success,
        InvalidFileFormat
    }

    static ModifyDocumentResults ModifyDocument(SPFile file)
    {
        byte[] byteArray = file.OpenBinary();
        // use the parameterless constructor so that the stream is resizable
        using (MemoryStream mem = new MemoryStream())
        {
            mem.Write(byteArray, 0, byteArray.Length);
            try
            {
                using (WordprocessingDocument wordDoc =
                    WordprocessingDocument.Open(mem, true))
                {
                    XNamespace w =
                        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";
                    // modify the document as necessary
                    // for this example, we'll convert the first paragraph to upper case
                    XDocument doc = wordDoc.MainDocumentPart.GetXDocument();
                    XElement firstParagraph = doc
                        .Element(w + "document")
                        .Element(w + "body")
                        .Element(w + "p");
                    if (firstParagraph != null)
                    {
                        // collect only w:t elements; text inside w:ins (a tracked
                        // insertion) is already held in its descendant w:t elements,
                        // so also selecting w:ins would count that text twice
                        string text = firstParagraph
                            .Descendants(w + "t")
                            .Select(n => (string)n)
                            .StringConcatenate();
                        firstParagraph.ReplaceWith(
                            new XElement(w + "p",
                                new XElement(w + "r",
                                    new XElement(w + "t", text.ToUpper()))));
                        // write the XDocument back into the Open XML document
                        wordDoc.MainDocumentPart.PutXDocument();
                    }
                }
                // write it back to the document library
                // change linkFilename if you want to write to a different location
                // than the original.
                string linkFilename = file.Item["LinkFilename"] as string;
                file.ParentFolder.Files.Add(linkFilename, mem, true);
            }
            catch (System.IO.FileFormatException)
            {
                return ModifyDocumentResults.InvalidFileFormat;
            }
        }
        return ModifyDocumentResults.Success;
    }

    static void Main(string[] args)
    {
        // dispose the SPSite when done; its RootWeb is cleaned up along with it
        using (SPSite siteCollection = new SPSite("https://localhost"))
        {
            SPWeb rootWeb = siteCollection.RootWeb;
            // find a file to modify - any file will do so long as it's an Open XML
            // WordprocessingDocument. For this example, I modify the first document
            // in a document library named 'Open XML Documents'. However, the code
            // modifies the document only if it has an extension of .docx. Further,
            // the code traps exceptions when attempting to open using the Open XML SDK.
            // If an exception is thrown, then the document isn't valid.
            SPList spList = rootWeb.Lists["Open XML Documents"];
            if (spList.ItemCount >= 1)
            {
                // fetch the item once; every access to spList.Items builds
                // a new item collection
                SPListItem item = spList.Items[0];
                string linkFilename = item["LinkFilename"] as string;
                if (linkFilename != null &&
                    linkFilename.EndsWith(".docx", StringComparison.OrdinalIgnoreCase))
                {
                    ModifyDocument(item.File);
                }
            }
        }
    }
}
Code is attached.
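If you want to experiment with the paragraph transformation without a SharePoint server or the Open XML SDK at hand, the LINQ to XML portion can be exercised on its own. The following is only a sketch under that assumption: the class FirstParagraphDemo and the method UppercaseFirstParagraph are hypothetical names, and the XDocument is a hand-built stand-in for a real main document part.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

static class FirstParagraphDemo
{
    static readonly XNamespace w =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Replaces the first paragraph in the body with a single run
    // that holds the paragraph's text in upper case.
    public static void UppercaseFirstParagraph(XDocument doc)
    {
        XElement body = doc.Root.Element(w + "body");
        XElement first = body == null ? null : body.Element(w + "p");
        if (first == null)
            return;
        string text = string.Concat(
            first.Descendants(w + "t").Select(t => (string)t));
        first.ReplaceWith(
            new XElement(w + "p",
                new XElement(w + "r",
                    new XElement(w + "t", text.ToUpper()))));
    }

    static void Main()
    {
        // A tiny stand-in for the XML of a main document part.
        XDocument doc = new XDocument(
            new XElement(w + "document",
                new XElement(w + "body",
                    new XElement(w + "p",
                        new XElement(w + "r", new XElement(w + "t", "Hello, ")),
                        new XElement(w + "r", new XElement(w + "t", "world"))),
                    new XElement(w + "p",
                        new XElement(w + "r", new XElement(w + "t", "unchanged"))))));

        UppercaseFirstParagraph(doc);

        // Print the text of the first w:t element after the change.
        Console.WriteLine(doc.Descendants(w + "t").First().Value); // HELLO, WORLD
    }
}
```

Only the first paragraph is rewritten; the second paragraph in the stand-in document is left as it was.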
Comments
Anonymous
August 04, 2009
Excellent post Eric!!! I have used the code you provide here in order to create a sample about how to expose Open XML document metadata in a WSS Web Site Web Part. See the post here: http://twurl.nl/s7v2zq
Anonymous
February 03, 2012
For those of you who are getting an error when it tries to load the SPSite (error states site does not exist when in fact it does), try targeting x64 instead of x86 and that should clear things up.