Working with In-Memory Open XML Documents
Sometimes you want to work with Open XML documents in memory. There are two scenarios that I know of:
When working with document libraries in SharePoint, you retrieve a document from the document library as a byte array. You can then modify it as necessary, and then put it back into the document library, either as a new document, or replacing the original. This post shows how to do this.
In a web application, you may want to fabricate Open XML documents on the fly and serve them up to remote users. You don’t want to serialize such temporary documents to the file system. After creating them, you want to send them directly to the end user of the web application.
This blog post presents a bit of code that shows how to work with in-memory documents as a MemoryStream. The code works with either Open XML SDK V1 or CTP1 of the Open XML SDK V2.
There is one important point to make about using the Open XML SDK with MemoryStream objects. There is a MemoryStream constructor that takes a byte array as an argument. However, we can’t use that constructor because it creates a non-resizable instance of the MemoryStream class, and the Open XML SDK needs a resizable memory stream, as parts may change in size when serialized back into the Open XML package. Instead, we use the constructor that takes no parameters. This creates a resizable MemoryStream. We can then write the byte array to the MemoryStream, and then open the Open XML package from the MemoryStream (using the WordprocessingDocument class in this example).
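The difference between the two constructors can be seen without the Open XML SDK at all. The following is a minimal sketch, not part of the original sample: the class name StreamGrowth and its two helper methods are hypothetical names introduced only for illustration. It relies on the documented behavior that a MemoryStream wrapping a byte array throws NotSupportedException when a write would exceed its fixed capacity.

```csharp
using System;
using System.IO;

// Hypothetical helper class, for illustration only.
public static class StreamGrowth
{
    // Seeks to the end of the stream and reports whether one more byte
    // can be written there.
    public static bool CanAppend(MemoryStream stream)
    {
        stream.Seek(0, SeekOrigin.End);
        try
        {
            stream.WriteByte(0);
            return true;
        }
        catch (NotSupportedException)
        {
            // Thrown when the stream wraps a fixed byte array and the
            // write would have to grow it past its capacity.
            return false;
        }
    }

    // Copies the bytes into a stream created with the parameterless
    // constructor, which can grow as needed.
    public static MemoryStream ToResizable(byte[] bytes)
    {
        MemoryStream mem = new MemoryStream();
        mem.Write(bytes, 0, bytes.Length);
        return mem;
    }
}
```

With a four-byte array named bytes, StreamGrowth.CanAppend(new MemoryStream(bytes)) returns false, while StreamGrowth.CanAppend(StreamGrowth.ToResizable(bytes)) returns true. The second form is the one the Open XML SDK needs when a part grows during serialization.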
After opening the WordprocessingDocument, we can work with the document as normal using the Open XML SDK. After leaving the scope of the ‘using’ statement that opens the document, the memory stream will contain the new, modified document.
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.IO;
using System.Xml;
using System.Xml.Linq;
using DocumentFormat.OpenXml.Packaging;

public static class LocalExtensions
{
    // Returns the part's XML as an XDocument, caching it as an
    // annotation on the part so it is loaded only once.
    public static XDocument GetXDocument(this OpenXmlPart part)
    {
        XDocument xdoc = part.Annotation<XDocument>();
        if (xdoc != null)
            return xdoc;
        using (StreamReader sr = new StreamReader(part.GetStream()))
        using (XmlReader xr = XmlReader.Create(sr))
            xdoc = XDocument.Load(xr);
        part.AddAnnotation(xdoc);
        return xdoc;
    }

    public static void PutXDocument(this OpenXmlPart part)
    {
        XDocument xdoc = part.GetXDocument();
        if (xdoc != null)
        {
            // Serialize the XDocument object back to the package.
            using (XmlWriter xw =
                XmlWriter.Create(part.GetStream
                    (FileMode.Create, FileAccess.Write)))
            {
                xdoc.Save(xw);
            }
        }
    }

    // Concatenates a sequence of strings into a single string.
    public static string StringConcatenate(
        this IEnumerable<string> source)
    {
        return source.Aggregate(
            new StringBuilder(),
            (s, i) => s.Append(i),
            s => s.ToString());
    }
}
class Program
{
    static void Main(string[] args)
    {
        byte[] byteArray = File.ReadAllBytes("Test.docx");
        // Use the parameterless constructor so that the stream is resizable.
        using (MemoryStream mem = new MemoryStream())
        {
            mem.Write(byteArray, 0, byteArray.Length);
            using (WordprocessingDocument wordDoc =
                WordprocessingDocument.Open(mem, true))
            {
                // The WordprocessingML namespace URI uses http, not https.
                XNamespace w =
                    "http://schemas.openxmlformats.org/wordprocessingml/2006/main";
                // modify the document as necessary
                // for this example, we'll convert the first paragraph to upper case
                XDocument doc = wordDoc.MainDocumentPart.GetXDocument();
                XElement firstParagraph = doc
                    .Element(w + "document")
                    .Element(w + "body")
                    .Element(w + "p");
                if (firstParagraph != null)
                {
                    // Select only w:t elements. Text inside w:ins is already
                    // reached through its descendant w:t elements, so also
                    // selecting w:ins would include inserted text twice.
                    string text = firstParagraph
                        .Descendants()
                        .Where(n => n.Name == w + "t")
                        .Select(n => (string)n)
                        .StringConcatenate();
                    firstParagraph.ReplaceWith(
                        new XElement(w + "p",
                            new XElement(w + "r",
                                new XElement(w + "t", text.ToUpper()))));
                    // write the XDocument back into the Open XML document
                    wordDoc.MainDocumentPart.PutXDocument();
                }
            }
            // at this point, the MemoryStream contains the modified document.
            // We could write it back to a SharePoint document library or serve
            // it from a web server.
            // in this example, we'll serialize back to the file system to verify
            // that the code worked properly.
            // FileMode.Create overwrites an existing Test2.docx, so the sample
            // can be run more than once.
            using (FileStream fileStream = new FileStream("Test2.docx",
                System.IO.FileMode.Create))
            {
                mem.WriteTo(fileStream);
            }
        }
    }
}
Code is attached.
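For the SharePoint scenario, the modified document usually has to go back out as a byte array rather than to a file. The following is a minimal sketch under stated assumptions, not part of the attached code: the class InMemoryRoundTrip and its Modify method are hypothetical names, and the edit delegate stands in for the 'using' block that opens the WordprocessingDocument on the stream. It uses MemoryStream.ToArray, which copies exactly the bytes written, whereas MemoryStream.GetBuffer returns the underlying buffer, which may include unused bytes beyond the stream's Length.

```csharp
using System;
using System.IO;

// Hypothetical helper class, for illustration only.
public static class InMemoryRoundTrip
{
    // Copies the original bytes into a resizable MemoryStream, lets the
    // caller modify the stream, and returns the stream's exact contents.
    public static byte[] Modify(byte[] original, Action<MemoryStream> edit)
    {
        using (MemoryStream mem = new MemoryStream())
        {
            mem.Write(original, 0, original.Length);

            // In the document scenario, this is where the package would
            // be opened on the stream, changed, and disposed.
            edit(mem);

            // ToArray returns a copy that is exactly mem.Length bytes long,
            // regardless of the stream's current position or capacity.
            return mem.ToArray();
        }
    }
}
```

For example, InMemoryRoundTrip.Modify(new byte[] { 1, 2, 3 }, m => m.WriteByte(4)) returns a four-byte array ending in 4. The returned array can then be written back to a document library or sent in a web response.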
Comments
Anonymous
December 11, 2008
Among the technical posts not to be missed: How to assemble Word 2007 documents (using
Anonymous
December 22, 2008
Thanks a million for your insight on this subject. Using your logic I got my asp.net app working like a champ now.
Anonymous
January 12, 2009
Man, it's already the second week of 2009. Where does the time go? Here are a few links to posts and
Anonymous
September 11, 2009
Thanks Jason! I'm happy it worked for you!
Anonymous
December 23, 2009
Hi, I have a problem using the memory stream; maybe you can help. When working on localhost the code works great! But when I copy the project to the main server I receive: HTTP 404 file or directory not found. I don't understand why. I am able to read the file into a FileStream and then send the memory stream as in your example. But when I try WordprocessingDocument.Open(memStream, true) I get the error. How can this be? I am working on the MemoryStream and still get the file error (404)? Please help!
Anonymous
December 24, 2009
Hi Karen, I would guess that opening the memory stream is not the cause of the issue. Opening the memory stream doesn't cause an http 404 error. The WordprocessingDocument.Open method throws exceptions (ArgumentNullException and OpenXmlPackageException). I suspect that you have a security configuration issue on your web server. I am not an expert in those areas, but that is where I would start looking. -Eric
Anonymous
March 11, 2010
How can I perform a find and replace in this document? For example, I have a string "<<NAME>>" that I want to replace with another value.
Anonymous
April 16, 2010
Hi Eric, how can I serve the file to the client to download from a web application? Thanks
Anonymous
November 25, 2010
Thank you Eric, I spent an embarrassing amount of time getting this to work. I don't think that it would have been possible to come up with PutXDocument() without this blog. One other issue that I had, though, was in returning a byte[] to my database. I couldn't find a way to get the correct contents out of the MemoryStream, so I wrote to another one. I suspect that there is a better way... so this:
return ms.GetBuffer()
did not work, but this:
MemoryStream output = new MemoryStream(); ms.WriteTo(output); return output.GetBuffer();