Embedding an Open XML File in another Open XML File

A couple of weeks ago I gave a presentation on the Open XML SDK to a few customers, where I was asked how to embed files within Open XML documents. I thought it would be a good opportunity to devote a couple of posts to this topic. In today's post I am going to show you how to embed an Open XML file in another Open XML file. Specifically, I am going to show you how to embed an Excel spreadsheet (.xlsx) into a Word document (.docx). The next post will cover how to embed other file types in Open XML files.

My post will talk about using version 2 of the SDK.

If you just want to jump straight into the code, feel free to download this solution here.

Solution

To embed an Excel spreadsheet into a Word document we can take the following actions:

  1. Create a template in Word that contains a content control that will be used to demarcate the region where the embedded object will be inserted
  2. Open up the Word document via the Open XML SDK and access its main document part
  3. Add an image part to the document (this image will be a placeholder image of the embedded object file)
  4. Add an embedded package part to the document
  5. Create a paragraph that contains the embedded object
  6. Locate the content control that will contain the embedded object
  7. Swap out the content control for the newly created paragraph
  8. Save changes made to the Word document

Note that the steps outlined above are just one method to accomplish this scenario.

For the sake of this example, let's say I am starting with the following Word document:

This document contains a content control, named "EmbedObject," which will contain my embedded object. In addition, let's say I have the following Excel spreadsheet I wish to embed:

Embedded Objects in Open XML

Before we get into the code, I wanted to talk more about embedded objects. Office has three ways of storing embedded objects:

  0. Those where Office persists the IStorage as given to Office during OLE operations.
  1. Those where Office persists the IStorage as given during OLE operations, but gives the embedded object a friendly extension and filename. This method assumes that the embedded object is a native file format of the application in question.
  2. Those where Office interprets the IStorage given during OLE operations as simply a wrapper for a package and only stores the package. This method assumes that the package conforms to Open Packaging Conventions.

The major difference between type #0 and types #1 and #2 is in how objects are embedded within a file. Types #1 and #2 allow developers working with Open XML files to more easily extract and insert embedded objects because there is no need to talk to an OLE server. Instead, developers can simply read and write embedded objects as if they were reading from or writing to files on disk.

Office differentiates between these three types by looking for a specific registry key under HKCR\CLSID\{Apps_OLE_Storage_CLSID}, where Apps_OLE_Storage_CLSID is the CLSID of the OLE storage server. The Office applications look for a subkey named IPersistStorageType and determine the type of the embedded object in the following manner:

  • Office assumes the embedded object is type #0 if no subkey is specified or if the value of the subkey is 0
  • Office assumes the embedded object is type #1 if the subkey has a value of 1
  • Office assumes the embedded object is type #2 if the subkey has a value of 2

The cool thing is that other applications can take advantage of this registry key. For example, if an application writes out a value of 1 for this subkey for a particular file format, then the Office applications will embed files of that type natively in the Open XML file formats.
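To make the lookup rules above a bit more concrete, here is a minimal sketch of how a tool might classify an OLE storage server. The type and method names (EmbeddedStorageType, EmbeddedStorageLookup) are made up for illustration. The sketch also assumes that IPersistStorageType can be read as a named value directly under the CLSID key and that any unrecognized value falls back to type #0. Neither assumption comes from the Office documentation, so treat this as a starting point rather than how Office itself does it.

```csharp
using System;
using Microsoft.Win32;

// Hypothetical names; the three members mirror types #0, #1, and #2 above.
public enum EmbeddedStorageType
{
    OleStorage = 0,   // type #0: IStorage persisted as given
    NativeFile = 1,   // type #1: IStorage with a friendly filename and extension
    Package = 2       // type #2: only the OPC package is stored
}

public static class EmbeddedStorageLookup
{
    // Maps a raw registry value to one of the three types.
    // A missing value means type #0; unknown values are also treated
    // as type #0, which is an assumption made for this sketch.
    public static EmbeddedStorageType Classify(object rawValue)
    {
        if (rawValue == null)
        {
            return EmbeddedStorageType.OleStorage;
        }

        switch (Convert.ToInt32(rawValue))
        {
            case 1: return EmbeddedStorageType.NativeFile;
            case 2: return EmbeddedStorageType.Package;
            default: return EmbeddedStorageType.OleStorage;
        }
    }

    // Windows only. Assumption: IPersistStorageType is stored as a named
    // value under HKCR\CLSID\{clsid}. If it is a true subkey on your
    // machine, read that subkey's default value instead.
    public static EmbeddedStorageType ReadFromRegistry(string clsid)
    {
        using (RegistryKey key = Registry.ClassesRoot.OpenSubKey(@"CLSID\" + clsid))
        {
            return Classify(key == null ? null : key.GetValue("IPersistStorageType"));
        }
    }
}
```

Keeping Classify separate from the registry read means the mapping can be exercised without touching the registry at all.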

One more thing to note is that all embedded object types require a ProgID, which you can find in the registry, as well as an image representation of the object.

The Code

As mentioned above, when an object is embedded in a document, both a visual representation of the object and the underlying data are stored. The visual representation is simply an image of what you would see if you were to activate the object. For the sake of this solution, my visual representation of the document will be a placeholder image that tells users how to refresh the embedded object. It will look like the following image:

Looking at the steps outlined above in the Solution section, here is the code snippet to accomplish steps two through four:

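    // Requires: using System.IO; using DocumentFormat.OpenXml.Packaging;
    // "output" is the path to the Word document being modified.
    using (WordprocessingDocument myDoc = WordprocessingDocument.Open(output, true))
    {
        MainDocumentPart mainPart = myDoc.MainDocumentPart;

        // Step 3: add the placeholder image for the embedded object.
        ImagePart imagePart = mainPart.AddImagePart(ImagePartType.Png);
        using (FileStream imageStream = File.Open("placeholder.png", FileMode.Open))
        {
            imagePart.FeedData(imageStream);
        }

        // Step 4: add the spreadsheet as an embedded package part.
        EmbeddedPackagePart embeddedObjectPart = mainPart.AddEmbeddedPackagePart(
            @"application/vnd.openxmlformats-officedocument.spreadsheetml.sheet");
        using (FileStream packageStream = File.Open("embed.xlsx", FileMode.Open))
        {
            embeddedObjectPart.FeedData(packageStream);
        }
    }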

The placeholder.png file is the placeholder image I showed you above, and the embed.xlsx file is the spreadsheet that will be embedded. The string "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet" is the content type of an Excel spreadsheet with the extension .xlsx. You can find the content type of a particular file format by going to HKCR\.XXX, where XXX is the extension of the file format, and looking for a value named "Content Type."
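If you would rather not hard-code the content type, the registry lookup described above can be sketched in a few lines. ContentTypeLookup and its methods are hypothetical names I made up for this example, and the sketch assumes "Content Type" is stored as a string value on the extension's key under HKCR. It only works on Windows, and only for file formats that are registered on the machine.

```csharp
using System;
using Microsoft.Win32;

// Hypothetical helper, not part of the Open XML SDK.
public static class ContentTypeLookup
{
    // Accepts "xlsx" or ".xlsx" and returns the form used as a key name under HKCR.
    public static string NormalizeExtension(string extension)
    {
        if (string.IsNullOrWhiteSpace(extension))
        {
            throw new ArgumentException("An extension is required.", "extension");
        }

        string trimmed = extension.Trim();
        return trimmed.StartsWith(".") ? trimmed : "." + trimmed;
    }

    // Windows only. Returns null when the extension is not registered
    // or has no "Content Type" value.
    public static string GetContentType(string extension)
    {
        using (RegistryKey key = Registry.ClassesRoot.OpenSubKey(NormalizeExtension(extension)))
        {
            return key == null ? null : key.GetValue("Content Type") as string;
        }
    }
}
```

On a machine with Office installed, I would expect ContentTypeLookup.GetContentType("xlsx") to return the same string that is passed to AddEmbeddedPackagePart above, but check for null before relying on it.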

The next step is to create a paragraph that represents our embedded object. Like my other post on importing SmartArt from PowerPoint to Word, I am going to take advantage of the Document Reflector tool that ships free with the SDK. Using this tool's output as a starting point, I am able to generate the necessary paragraph with the following code snippet:

    // Namespace aliases used below:
    //   using V = DocumentFormat.OpenXml.Vml;
    //   using OVML = DocumentFormat.OpenXml.Vml.Office;
    static Paragraph CreateEmbeddedObjectParagraph(string imageId, string embedId)
    {
        Paragraph p = new Paragraph(
            new Run(
                new EmbeddedObject(
                    // Shape type definition for the picture frame.
                    new V.Shapetype(
                        new V.Stroke() { JoinStyle = V.StrokeJoinStyleValues.Miter },
                        new V.Formulas(
                            new V.Formula() { Equation = "if lineDrawn pixelLineWidth 0" },
                            new V.Formula() { Equation = "sum @0 1 0" },
                            new V.Formula() { Equation = "sum 0 0 @1" },
                            new V.Formula() { Equation = "prod @2 1 2" },
                            new V.Formula() { Equation = "prod @3 21600 pixelWidth" },
                            new V.Formula() { Equation = "prod @3 21600 pixelHeight" },
                            new V.Formula() { Equation = "sum @0 0 1" },
                            new V.Formula() { Equation = "prod @6 1 2" },
                            new V.Formula() { Equation = "prod @7 21600 pixelWidth" },
                            new V.Formula() { Equation = "sum @8 21600 0" },
                            new V.Formula() { Equation = "prod @7 21600 pixelHeight" },
                            new V.Formula() { Equation = "sum @10 21600 0" }),
                        new V.Path()
                        {
                            AllowGradientShape = V.BooleanValues.T,
                            ConnectionPointType = OVML.ConnectValues.Rectangle,
                            AllowExtrusion = V.BooleanValues.F
                        },
                        new OVML.Lock()
                        {
                            Extension = V.ExtensionHandlingBehaviorValues.Edit,
                            AspectRatio = OVML.BooleanValues.T
                        }
                    )
                    {
                        Id = "_x0000_t75",
                        CoordinateSize = "21600,21600",
                        Filled = V.BooleanValues.F,
                        Stroked = V.BooleanValues.F,
                        OptionalNumber = 75,
                        PreferRelative = V.BooleanValues.T,
                        EdgePath = "m@4@5l@4@11@9@11@9@5xe"
                    },
                    // The shape that displays the placeholder image.
                    new V.Shape(
                        new V.ImageData() { Title = "", RelationshipId = imageId }
                    )
                    {
                        Id = "_x0000_i1025",
                        Style = "width:500pt;height:400pt",
                        Ole = V.BooleanEntryWithBlankValues.Empty,
                        Type = "#_x0000_t75"
                    },
                    // The OLE object that points at the embedded package part.
                    new OVML.OleObject()
                    {
                        Type = OVML.OLEValues.Embed,
                        ProgId = "Excel.Sheet.12",
                        ShapeId = "_x0000_i1025",
                        DrawAspect = OVML.OLEDrawAspectValues.Content,
                        ObjectId = "_1307530183",
                        Id = embedId
                    }
                )
                { DxaOriginal = (UInt32Value)10957U, DyaOriginal = (UInt32Value)8455U })
        );
        return p;
    }

The last step of the solution is to swap out the content control for this newly created paragraph. This code is very similar to a lot of my previous posts where I used content controls as semantic structures. Here is the code snippet to accomplish this task:

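    // Runs inside the using block shown earlier, where mainPart, imagePart,
    // and embeddedObjectPart are in scope. Requires: using System.Linq;
    Paragraph p = CreateEmbeddedObjectParagraph(
        mainPart.GetIdOfPart(imagePart),
        mainPart.GetIdOfPart(embeddedObjectPart));

    // Find the content control whose alias is "EmbedObject".
    SdtBlock sdt = mainPart.Document.Descendants<SdtBlock>()
        .Where(s => s.GetFirstChild<SdtProperties>().GetFirstChild<Alias>().Val.Value
            .Equals("EmbedObject"))
        .First();

    // Insert the paragraph after the content control, then remove the control.
    OpenXmlElement parent = sdt.Parent;
    parent.InsertAfter(p, sdt);
    sdt.Remove();
    mainPart.Document.Save();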
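Since an Open XML file is a ZIP package, one quick way to sanity check the result without opening Word is to list the entries in the saved document. The following is a minimal sketch that uses only System.IO.Compression rather than the SDK. PackageInspector is a hypothetical helper name, not something that ships with the SDK.

```csharp
using System.Collections.Generic;
using System.IO;
using System.IO.Compression;
using System.Linq;

// Hypothetical helper for inspecting an Open XML package.
public static class PackageInspector
{
    // Returns the full names of all entries in the package.
    // The caller keeps ownership of the stream (leaveOpen is true).
    public static List<string> ListEntryNames(Stream package)
    {
        using (ZipArchive archive = new ZipArchive(package, ZipArchiveMode.Read, true))
        {
            return archive.Entries.Select(entry => entry.FullName).ToList();
        }
    }
}
```

To use it, open the output document with File.OpenRead, pass the stream to PackageInspector.ListEntryNames, and print the names. Word itself writes embedded packages under word/embeddings/, so that is where I would look first, although the exact part name the SDK chooses may differ.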

End Result

Running this code I should end up with a document that looks like the following:

Upon activating the embedded object I will see the following:

Pretty easy stuff! Next time I will show you how to embed other file formats, like PDF.

Zeyad Rajabi

Comments

  • Anonymous
    July 05, 2009
    hi Brian, I want to create office2007 file with OpenXML SDK2 in VC++ 6.0, is it possible? Thanks/Jimburg

  • Anonymous
    July 06, 2009
    The Open XML SDK is based on .NET. That being said, there are ways for unmanaged code to interoperate with managed code. Here is one article that talks about doing this: http://msdn.microsoft.com/en-us/library/ms973872.aspx

  • Anonymous
    July 06, 2009
    Thank you very much, it really helps.