Office Word Basic operations: Open XML SDK 2.5 Office Word documents
Introduction
This article provides a solid base for performing common operations on Microsoft Word documents in the Word 2007 (.docx) format using the Open XML SDK 2.5 for Office. It does not drill down into each element type, such as the paragraph element, or into what makes up a specific type of style, whether for a paragraph or a table.
The intent is to give developers who have never worked with Open XML for Office documents examples that let them create documents quickly without getting into all the specifics.
What is Open XML
Open XML is an open ECMA 376 standard and is also approved as the ISO/IEC 29500 standard that defines a set of XML schemas for representing spreadsheets, charts, presentations, and word processing documents. Microsoft Office Word 2007, Excel 2007, PowerPoint 2007, and the later versions all use Open XML as the default file format.
A document (WordprocessingML document) is organized around the concept of stories. A story is a region of content in a WordprocessingML document.
Not all stories must be present in a valid WordprocessingML document. The simplest, valid WordprocessingML document only requires a single story—the main document story. In WordprocessingML, the main document story is represented by the main document part. At a minimum, to create a valid WordprocessingML document using code, add a main document part to the document. In the code samples provided, the first code sample creates a document only with the main document part.
public bool CreateEmptyDocument(string pFileName)
{
var fileName = Path.Combine(DocumentFolder, pFileName);
if (File.Exists(fileName))
{
File.Delete(fileName);
}
using (var document = WordprocessingDocument.Create(fileName, WordprocessingDocumentType.Document))
{
MainDocumentPart mainPart = document.AddMainDocumentPart();
mainPart.Document = new Document();
mainPart.Document.AppendChild(new Body());
mainPart.Document.Save();
}
return Helpers.ValidateWordDocument(fileName) == 0;
}
The main document story of the simplest WordprocessingML document consists of the following XML elements:
document | The root element for a WordprocessingML's main document part, which defines the main document story. |
body | The container for the collection of block-level structures that comprise the main story. |
p | Paragraph, created with: Paragraph para = body.AppendChild(new Paragraph()); |
r | Run, created with: Run runPara = para.AppendChild(new Run()); |
t | Range of text, created with: runPara.AppendChild(new Text("Some text in a paragraph")); |
A simple example using the paragraph, run, and text elements above.
public bool CreateDocumentWithSimpleParagraph(string pFileName)
{
var fileName = Path.Combine(DocumentFolder, pFileName);
if (File.Exists(fileName))
{
File.Delete(fileName);
}
using (var document = WordprocessingDocument.Create(fileName, WordprocessingDocumentType.Document))
{
MainDocumentPart mainPart = document.AddMainDocumentPart();
mainPart.Document = new Document();
var body = mainPart.Document.AppendChild(new Body());
Paragraph para = body.AppendChild(new Paragraph());
Run runPara = para.AppendChild(new Run());
// Set the font to Arial to the first Run.
var runProperties = new RunProperties(
new RunFonts()
{
Ascii = "Arial"
});
var color = new Color { Val = Helpers.ColorConverter(System.Drawing.Color.SandyBrown) };
runProperties.Append(color);
Run run = document.MainDocumentPart.Document.Descendants<Run>().First();
run.PrependChild<RunProperties>(runProperties);
var paragraphText = "Hello from Word";
runPara.AppendChild(new Text(paragraphText));
mainPart.Document.Save();
}
return Helpers.ValidateWordDocument(fileName) == 0;
}
The code above generates the following XML, which can be viewed by changing the .docx file extension to .zip and opening the archive.
<?xml version="1.0" encoding="utf-8"?>
<w:document xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main">
<w:body>
<w:p>
<w:r>
<w:rPr>
<w:rFonts w:ascii="Arial" />
<w:color w:val="F4A460" />
</w:rPr>
<w:t>Hello from Word</w:t>
</w:r>
</w:p>
</w:body>
</w:document>
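Because a .docx file is a ZIP archive, the same XML can also be read in code without renaming the file. The following is a minimal sketch, not part of the accompanying samples: the class name PackageInspector is hypothetical, and it assumes the main document part is stored at word/document.xml, which is where the SDK places it by default. On .NET Framework, ZipFile requires a reference to the System.IO.Compression.FileSystem assembly.

```csharp
using System.IO;
using System.IO.Compression;

// Hypothetical helper for inspecting a package; not part of the article's source code.
public static class PackageInspector
{
    // Returns the raw XML of the main document part,
    // or null when the archive has no word/document.xml entry.
    public static string ReadMainDocumentXml(string docxPath)
    {
        using (ZipArchive archive = ZipFile.OpenRead(docxPath))
        {
            // Assumption: the main part lives at the conventional location.
            ZipArchiveEntry entry = archive.GetEntry("word/document.xml");
            if (entry == null)
            {
                return null;
            }

            using (var reader = new StreamReader(entry.Open()))
            {
                return reader.ReadToEnd();
            }
        }
    }
}
```

Calling PackageInspector.ReadMainDocumentXml(fileName) after one of the create methods has run returns the markup as a string, which can be written to the Output window while learning.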
Why use Open XML
The Open XML file formats are useful for developers because they use an open standard and are based on well-known technologies: ZIP and XML. See also "Seven key benefits of Open XML" by Eric White.
Installation
To work with Open XML documents, install the DocumentFormat.OpenXml NuGet package. Right-click the solution in Visual Studio, select Manage NuGet Packages, select the “Browse” tab, and type DocumentFormat into the search box; the first item will be DocumentFormat.OpenXml. With this item selected, check the box next to each project in your solution that will use the library and press the Install button. Alternatively, from the Visual Studio Tools menu select NuGet Package Manager, then Package Manager Console, copy the install command from the package's NuGet page into the console (for example, Install-Package DocumentFormat.OpenXml), and press Enter to install.
Using Open XML
Depending on your objectives, different using statements will be needed. To learn which are required, first create a class for performing operations on documents, as done in the accompanying code samples (Operations.cs). For instance, copy the using block from the CreateEmptyDocument method and change the file name. At this point Visual Studio will complain that it does not recognize the types. Hover over each type and, when the lightbulb appears, allow it to insert the appropriate using statement.
Add the new method to a class and execute it, then open the document from Windows Explorer. If the document fails to open, the code used to create it most likely produced a malformed document. Rather than navigating to the document in Windows Explorer, copy Helpers.ValidateWordDocument from the accompanying source code into your project and pass it the newly created document. A return value of 0 means the document should be valid, while any value greater than 0 indicates one or more errors in the document structure. As written, ValidateWordDocument writes the errors to Visual Studio’s Output window, which helps in tracking down the problem.
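The source of Helpers.ValidateWordDocument is not listed in this article. As a rough idea of what such a helper can look like, the sketch below uses the SDK's OpenXmlValidator class. The class name ValidationSketch and the output format are assumptions, not the accompanying project's code.

```csharp
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Validation;

// Hypothetical stand-in for Helpers.ValidateWordDocument.
public static class ValidationSketch
{
    // Returns the number of validation errors; 0 means none were reported.
    public static int ValidateWordDocument(string fileName)
    {
        // Open read-only (second argument: isEditable = false).
        using (WordprocessingDocument document = WordprocessingDocument.Open(fileName, false))
        {
            var validator = new OpenXmlValidator();
            List<ValidationErrorInfo> errors = validator.Validate(document).ToList();

            foreach (ValidationErrorInfo error in errors)
            {
                // Debug.WriteLine output appears in Visual Studio's Output window.
                Debug.WriteLine($"{error.ErrorType}: {error.Description}");
                Debug.WriteLine($"  Part: {error.Part?.Uri}  Path: {error.Path?.XPath}");
            }

            return errors.Count;
        }
    }
}
```

Returning the error count keeps the calling pattern used throughout the samples, for example return ValidationSketch.ValidateWordDocument(fileName) == 0;.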
Caveat: the source code provided contains very little exception handling, but that does not mean you should omit exception handling, in the form of try/catch statements, from your own code. The most common cause of an exception is that the document already exists and is still open, perhaps in Word: developers like to check the results of their code, forget to close the document, and then run the same code that created it again.
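One way to guard against the open-document case described above is to wrap the delete step in a try/catch. This is only a sketch: the FileGuard class and TryDeleteExisting method are hypothetical names, and a real application would probably log or report the failure rather than just return false.

```csharp
using System;
using System.IO;

// Hypothetical helper; the accompanying samples call File.Delete directly.
public static class FileGuard
{
    // Returns true when the file is absent or was deleted,
    // false when it could not be removed (for example, still open in Word).
    public static bool TryDeleteExisting(string fileName)
    {
        try
        {
            if (File.Exists(fileName))
            {
                File.Delete(fileName);
            }
            return true;
        }
        catch (IOException)
        {
            // On Windows this is raised when another process has the file open.
            return false;
        }
        catch (UnauthorizedAccessException)
        {
            // Read-only file or insufficient permissions.
            return false;
        }
    }
}
```

A create method could call FileGuard.TryDeleteExisting(fileName) first and return false early instead of letting WordprocessingDocument.Create fail.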
Taking small steps
Rather than writing a document with a header, several paragraphs, images, and a list all at once, start slowly. The code samples are built this way for ease of learning.
Step 1, create an empty document as shown below.
using (var document = WordprocessingDocument.Create(fileName, WordprocessingDocumentType.Document))
{
MainDocumentPart mainPart = document.AddMainDocumentPart();
mainPart.Document = new Document();
mainPart.Document.AppendChild(new Body());
mainPart.Document.Save();
}
Even with this simple code, it is recommended to validate the newly created document with Helpers.ValidateWordDocument during development. When ready for production, the easiest way to disable the call is with conditional compilation directives: run validation in DEBUG builds and skip it in RELEASE builds, or keep validating in RELEASE builds too, depending on your comfort level with what can go wrong in the wild.
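A sketch of debug-only validation follows. It uses the Conditional attribute, which makes the compiler drop calls to the method when the DEBUG symbol is not defined; wrapping the call in #if DEBUG / #endif achieves the same thing. The DebugValidation class is a hypothetical name, and the validator is passed in as a delegate so the sketch does not depend on the Helpers class.

```csharp
using System;
using System.Diagnostics;

// Hypothetical wrapper; not part of the accompanying samples.
public static class DebugValidation
{
    // Calls to this method are compiled only when DEBUG is defined.
    // Conditional methods must return void, so the result is written out
    // rather than returned.
    [Conditional("DEBUG")]
    public static void Validate(string fileName, Func<string, int> validator)
    {
        int errorCount = validator(fileName);
        Debug.WriteLine($"{fileName}: {errorCount} validation error(s)");
    }
}
```

A typical call would be DebugValidation.Validate(fileName, Helpers.ValidateWordDocument); placed after the using block that creates the document.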
Once you have studied the empty-document code in Step 1, move on to adding a paragraph.
private void NextLevel(string pFileName)
{
var fileName = Path.Combine(DocumentFolder, pFileName);
if (File.Exists(fileName))
{
File.Delete(fileName);
}
using (var document = WordprocessingDocument.Create(fileName, WordprocessingDocumentType.Document))
{
MainDocumentPart mainPart = document.AddMainDocumentPart();
mainPart.Document = new Document();
var body = mainPart.Document.AppendChild(new Body());
var para = body.AppendChild(new Paragraph());
Run runPara = para.AppendChild(new Run());
var paragraphText = "My first paragraph.";
runPara.AppendChild(new Text(paragraphText));
mainPart.Document.Save();
}
Console.WriteLine(Helpers.ValidateWordDocument(fileName));
}
In the example above a paragraph is added to a newly created document with no styling of the text. Styling can get complex and, as mentioned, it is best to learn in steps.
The following example builds on the one above by styling the font color of the sole paragraph. The color is specified in hex, and many developers do not know the hex representation of a color without consulting a conversion table. For this reason Helpers.ColorConverter is provided to translate a Color to its hex representation: pass in System.Drawing.Color.SandyBrown and get back F4A460.
private void NextLevel_1(string pFileName)
{
var fileName = Path.Combine(DocumentFolder, pFileName);
if (File.Exists(fileName))
{
File.Delete(fileName);
}
using (var document = WordprocessingDocument.Create(fileName, WordprocessingDocumentType.Document))
{
MainDocumentPart mainPart = document.AddMainDocumentPart();
mainPart.Document = new Document();
var body = mainPart.Document.AppendChild(new Body());
var para = body.AppendChild(new Paragraph());
Run runPara = para.AppendChild(new Run());
// Set the font to Arial to the first Run.
var runProperties = new RunProperties(
new RunFonts()
{
Ascii = "Arial"
});
var color = new Color { Val = Helpers.ColorConverter(System.Drawing.Color.SandyBrown) };
runProperties.Append(color);
Run run = document.MainDocumentPart.Document.Descendants<Run>().First();
run.PrependChild<RunProperties>(runProperties);
var paragraphText = "Styling paragraph with font color";
runPara.AppendChild(new Text(paragraphText));
mainPart.Document.Save();
}
Console.WriteLine(Helpers.ValidateWordDocument(fileName));
}
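The body of Helpers.ColorConverter is not shown in this article. A minimal sketch of such a conversion, assuming it only needs the red, green, and blue channels, could look like this (ColorSketch and ToHex are hypothetical names):

```csharp
using System.Drawing;

// Hypothetical equivalent of Helpers.ColorConverter.
public static class ColorSketch
{
    // Formats the red, green, and blue channels as a six-digit
    // uppercase hex string, the form expected by Color.Val.
    public static string ToHex(Color color)
    {
        return $"{color.R:X2}{color.G:X2}{color.B:X2}";
    }
}
```

ColorSketch.ToHex(System.Drawing.Color.SandyBrown) yields F4A460, matching the value in the generated XML shown earlier. The alpha channel is ignored because the w:color element takes an RGB value.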
Repetitive code
When developers find they are writing the same code over and over again, it is a sure sign that a common method may be in order. Adding a new paragraph might seem the perfect case, yet it takes too little code to warrant a common method. A better candidate is creating borders for a table within a document.
The following is a generic method to create borders for a table.
public static TableProperties CreateTableProperties()
{
return new TableProperties(
new TableBorders(
new TopBorder { Val = new EnumValue<BorderValues>(BorderValues.Single), Size = 12 },
new BottomBorder { Val = new EnumValue<BorderValues>(BorderValues.Single), Size = 12 },
new LeftBorder { Val = new EnumValue<BorderValues>(BorderValues.Single), Size = 12 },
new RightBorder { Val = new EnumValue<BorderValues>(BorderValues.Single), Size = 12 },
new InsideHorizontalBorder { Val = new EnumValue<BorderValues>(BorderValues.Single), Size = 12 },
new InsideVerticalBorder { Val = new EnumValue<BorderValues>(BorderValues.Single), Size = 12 })
);
}
Which is called as follows.
var table = new Table();
// set borders
TableProperties props = Helpers.CreateTableProperties();
table.AppendChild(props);
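The call above only produces an empty table with borders. As a sketch of how rows and cells might then be added, the hypothetical BuildTable method below takes the TableProperties and fills the table from a jagged string array. It also adds a TableGrid because the schema lists that element after the table properties; Word generally opens tables without it, so treat that part as a precaution rather than a requirement taken from the accompanying samples.

```csharp
using DocumentFormat.OpenXml.Wordprocessing;

// Hypothetical helper; names are illustrative only.
public static class TableSketch
{
    public static Table BuildTable(TableProperties props, string[][] rows)
    {
        var table = new Table();
        table.AppendChild(props);

        // One GridColumn per column of the widest row.
        int columnCount = 0;
        foreach (string[] cells in rows)
        {
            if (cells.Length > columnCount) columnCount = cells.Length;
        }
        var grid = new TableGrid();
        for (int i = 0; i < columnCount; i++)
        {
            grid.AppendChild(new GridColumn());
        }
        table.AppendChild(grid);

        // Each cell must contain at least one paragraph.
        foreach (string[] cells in rows)
        {
            var row = new TableRow();
            foreach (string cellText in cells)
            {
                row.AppendChild(new TableCell(new Paragraph(new Run(new Text(cellText)))));
            }
            table.AppendChild(row);
        }

        return table;
    }
}
```

The finished table is then appended to the body like any other block-level element, for example body.AppendChild(TableSketch.BuildTable(Helpers.CreateTableProperties(), data));.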
A method such as CreateTableProperties can not only be reused, it also cleans up the calling code, making it easier to write and maintain. Another candidate for reuse is adding an image to a document, since appending an image involves many parts, as shown below.
public static void AddImageToBody(WordprocessingDocument document, string relationshipId, int pWidth, int pHeight)
{
// Define the reference of the image.
var element =
new Drawing(
new Inline(
new Extent() { Cx = pWidth, Cy = pHeight },
new EffectExtent()
{
LeftEdge = 0L,
TopEdge = 0L,
RightEdge = 0L,
BottomEdge = 0L
},
new DocProperties()
{
Id = (UInt32Value)1U,
Name = "Picture 1"
},
new NonVisualGraphicFrameDrawingProperties(
new GraphicFrameLocks()
{
NoChangeAspect = true
}),
new Graphic(
new GraphicData(
new Picture(
new NonVisualPictureProperties(
new NonVisualDrawingProperties()
{
Id = (UInt32Value)0U,
Name = "New Bitmap Image.jpg"
},
new NonVisualPictureDrawingProperties()),
new BlipFill(
new Blip(
new BlipExtensionList(
new BlipExtension()
{
Uri = "{28A0092B-C50C-407E-A947-70E740481C1C}"
})
)
{
Embed = relationshipId,
CompressionState =
BlipCompressionValues.Print
},
new Stretch(
new FillRectangle())),
new ShapeProperties(
new Transform2D(
new Offset() { X = 0L, Y = 0L },
new Extents() { Cx = pWidth, Cy = pHeight }),
new PresetGeometry(
new AdjustValueList()
)
{ Preset = ShapeTypeValues.Rectangle }))
)
{ Uri = "http://schemas.openxmlformats.org/drawingml/2006/picture" })
)
{
DistanceFromTop = (UInt32Value)0U,
DistanceFromBottom = (UInt32Value)0U,
DistanceFromLeft = (UInt32Value)0U,
DistanceFromRight = (UInt32Value)0U
});
// Append the reference to body, the element should be in a Run.
document.MainDocumentPart.Document.Body.AppendChild(new Paragraph(new Run(element)));
}
Note the structure of the method above: each nested element sits on its own line. Compare it with the same method crammed into a few long lines, shown next, and imagine debugging or modifying that code. This is why formatting code as done above makes sense, not only for this example but for any complex operation.
public static void AddImageToBodyBad(WordprocessingDocument document, string relationshipId, int pWidth, int pHeight)
{
// Define the reference of the image.
var element = new Drawing(new Inline(new Extent() { Cx = pWidth, Cy = pHeight },new EffectExtent() {LeftEdge = 0L,TopEdge = 0L,RightEdge = 0L,BottomEdge = 0L},
new DocProperties() {Id = (UInt32Value)1U,Name = "Picture 1"},new NonVisualGraphicFrameDrawingProperties(new GraphicFrameLocks() {NoChangeAspect = true}),
new Graphic(new GraphicData(new Picture(new NonVisualPictureProperties(new NonVisualDrawingProperties() {Id = (UInt32Value)0U,Name = "New Bitmap Image.jpg"},new NonVisualPictureDrawingProperties()),
new BlipFill(new Blip(new BlipExtensionList(new BlipExtension() {Uri = "{28A0092B-C50C-407E-A947-70E740481C1C}" })) {Embed = relationshipId,CompressionState =BlipCompressionValues.Print},
new Stretch(new FillRectangle())), new ShapeProperties(new Transform2D( new Offset() { X = 0L, Y = 0L }, new Extents() { Cx = pWidth, Cy = pHeight }), new PresetGeometry( new AdjustValueList() ) { Preset = ShapeTypeValues.Rectangle }))) { Uri = "http://schemas.openxmlformats.org/drawingml/2006/picture" })) {DistanceFromTop = (UInt32Value)0U,DistanceFromBottom = (UInt32Value)0U,DistanceFromLeft = (UInt32Value)0U, DistanceFromRight = (UInt32Value)0U });
// Append the reference to body, the element should be in a Run.
document.MainDocumentPart.Document.Body.AppendChild(new Paragraph(new Run(element)));
}
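AddImageToBody expects a relationship id and a width and height, none of which are explained above. The sketch below shows one way to obtain them, under these assumptions: the image is a JPEG file on disk, the sizes passed to AddImageToBody are in English Metric Units (EMUs, 914,400 per inch), and the image is treated as 96 DPI. ImageSketch and its members are hypothetical names.

```csharp
using System.IO;
using DocumentFormat.OpenXml.Packaging;

// Hypothetical helper; not part of the accompanying samples.
public static class ImageSketch
{
    // 914,400 EMUs per inch divided by 96 pixels per inch.
    public const int EmusPerPixelAt96Dpi = 9525;

    // Converts a pixel measurement to EMUs, assuming 96 DPI.
    public static int PixelsToEmus(int pixels)
    {
        return pixels * EmusPerPixelAt96Dpi;
    }

    // Adds the image file to the package and returns the relationship id
    // that the Blip element's Embed attribute refers to.
    public static string AddJpegPart(WordprocessingDocument document, string imagePath)
    {
        MainDocumentPart mainPart = document.MainDocumentPart;
        ImagePart imagePart = mainPart.AddImagePart(ImagePartType.Jpeg);

        using (FileStream stream = File.OpenRead(imagePath))
        {
            imagePart.FeedData(stream);
        }

        return mainPart.GetIdOfPart(imagePart);
    }
}
```

With these in place, a 300 by 200 pixel picture could be added by passing ImageSketch.AddJpegPart(document, imagePath), ImageSketch.PixelsToEmus(300), and ImageSketch.PixelsToEmus(200) to AddImageToBody.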
Building blocks
Breaking document construction into small methods helps the maintainer understand the code flow and makes adding or modifying code easier. The following code example (included with the accompanying source code) has several methods (with overloads) to add paragraphs and bullets to a document, along with a method to save the document to disk. When new functionality is needed, such as adding a header, a footer, or an image, the developer writes one method per feature rather than coding everything in one method, which goes back to the code reusability discussed above.
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using DocumentFormat.OpenXml;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Wordprocessing;
namespace WordOpenXml_cs
{
/// <summary>
/// Code by Karen Payne MVP along with assistance
/// from various forum post this class has been glued
/// together.
/// </summary>
public class DocumentWriter : IDisposable
{
private MemoryStream _memoryStream;
/// <summary>
/// Represents the document to work on
/// </summary>
private WordprocessingDocument _document;
/// <summary>
/// Create a new document
/// </summary>
public DocumentWriter()
{
_memoryStream = new MemoryStream();
_document = WordprocessingDocument.Create(_memoryStream, WordprocessingDocumentType.Document);
var mainPart = _document.AddMainDocumentPart();
var body = new Body();
mainPart.Document = new Document(body);
}
/// <summary>
/// Append a paragraph to the document
/// </summary>
/// <param name="sentence"></param>
public void AddParagraph(string sentence)
{
List<Run> runList = ListOfStringToRunList(new List<string> { sentence });
AddParagraph(runList);
}
/// <summary>
/// Append one paragraph containing one run per sentence
/// </summary>
/// <param name="sentences"></param>
public void AddParagraph(List<string> sentences)
{
List<Run> runList = ListOfStringToRunList(sentences);
AddParagraph(runList);
}
/// <summary>
/// Append a single paragraph built from a list of Run objects.
/// </summary>
/// <param name="runList"></param>
public void AddParagraph(List<Run> runList)
{
var para = new Paragraph();
foreach (Run runItem in runList)
{
para.AppendChild(runItem);
}
var body = _document.MainDocumentPart.Document.Body;
body.AppendChild(para);
}
/// <summary>
/// Append to the document a list of sentences (list of string) and create bullet list
/// </summary>
/// <param name="sentences"></param>
public void AddBulletList(List<string> sentences)
{
var runList = ListOfStringToRunList(sentences);
AddBulletList(runList);
}
/// <summary>
/// Append to the document a list of sentences (list of Run) and create bullet list
/// </summary>
/// <param name="runList"></param>
public void AddBulletList(List<Run> runList)
{
// Introduce bulleted numbering in case it will be needed at some point
NumberingDefinitionsPart numberingPart = _document.MainDocumentPart.NumberingDefinitionsPart;
if (numberingPart == null)
{
numberingPart = _document.MainDocumentPart.AddNewPart<NumberingDefinitionsPart>("NumberingDefinitionsPart001");
var element = new Numbering();
element.Save(numberingPart);
}
// Insert an AbstractNum into the numbering part numbering list. The order seems to matter or it will not pass the
// Open XML SDK Productivity Tool validation test. AbstractNum comes first and then NumberingInstance and we want to
// insert this AFTER the last AbstractNum and BEFORE the first NumberingInstance or we will get a validation error.
var abstractNumberId = numberingPart.Numbering.Elements<AbstractNum>().Count() + 1;
var abstractLevel = new Level(new NumberingFormat()
{
Val = NumberFormatValues.Bullet
}, new LevelText() { Val = "·" }) { LevelIndex = 0 };
var abstractNum1 = new AbstractNum(abstractLevel) { AbstractNumberId = abstractNumberId };
if (abstractNumberId == 1)
{
numberingPart.Numbering.Append(abstractNum1);
}
else
{
var lastAbstractNum = numberingPart.Numbering.Elements<AbstractNum>().Last();
numberingPart.Numbering.InsertAfter(abstractNum1, lastAbstractNum);
}
// Insert a NumberingInstance into the numbering part numbering list. The order seems to matter or it will not pass the
// Open XML SDK Productivity Tool validation test. AbstractNum comes first and then NumberingInstance and we want to
// insert this AFTER the last NumberingInstance and AFTER all the AbstractNum entries or we will get a validation error.
var numberId = numberingPart.Numbering.Elements<NumberingInstance>().Count() + 1;
var numberingInstance1 = new NumberingInstance() { NumberID = numberId };
var abstractNumId1 = new AbstractNumId() { Val = abstractNumberId };
numberingInstance1.Append(abstractNumId1);
if (numberId == 1)
{
numberingPart.Numbering.Append(numberingInstance1);
}
else
{
var lastNumberingInstance = numberingPart.Numbering.Elements<NumberingInstance>().Last();
numberingPart.Numbering.InsertAfter(numberingInstance1, lastNumberingInstance);
}
Body body = _document.MainDocumentPart.Document.Body;
foreach (Run runItem in runList)
{
// Create items for paragraph properties
var numberingProperties = new NumberingProperties(new NumberingLevelReference()
{
Val = 0
}, new NumberingId() { Val = numberId });
var spacingBetweenLines1 = new SpacingBetweenLines() { After = "0" }; // Get rid of space between bullets
var indentation = new Indentation() { Left = "720", Hanging = "360" }; // correct indentation
var paragraphMarkRunProperties1 = new ParagraphMarkRunProperties();
var runFonts1 = new RunFonts() { Ascii = "Symbol", HighAnsi = "Symbol" };
paragraphMarkRunProperties1.Append(runFonts1);
// create paragraph properties
var paragraphProperties = new ParagraphProperties(
numberingProperties,
spacingBetweenLines1,
indentation,
paragraphMarkRunProperties1);
// Create paragraph
var newPara = new Paragraph(paragraphProperties);
// Add run to the paragraph
newPara.AppendChild(runItem);
// Add one bullet item to the body
body.AppendChild(newPara);
}
}
public void Dispose()
{
CloseAndDisposeOfDocument();
if (_memoryStream != null)
{
_memoryStream.Dispose();
_memoryStream = null;
}
}
/// <summary>
/// Save document.
/// </summary>
/// <param name="pFileName">Path and file name to save to</param>
public void SaveToFile(string pFileName)
{
if (_document != null)
{
CloseAndDisposeOfDocument();
}
if (_memoryStream == null)
throw new ObjectDisposedException(nameof(DocumentWriter), "This object has already been disposed of so you cannot save it!");
using (var fs = File.Create(pFileName))
{
_memoryStream.WriteTo(fs);
}
}
/// <summary>
/// Dispose of document object.
/// </summary>
private void CloseAndDisposeOfDocument()
{
if (_document != null)
{
_document.Close();
_document.Dispose();
_document = null;
}
}
private static List<Run> ListOfStringToRunList(List<string> sentences)
{
var runList = new List<Run>();
foreach (var item in sentences)
{
var newRun = new Run();
newRun.AppendChild(new Text(item));
runList.Add(newRun);
}
return runList;
}
}
}
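The class above can be used as follows. This sketch relies only on the public members of DocumentWriter shown in the listing; the DocumentWriterDemo class, the CreateSample method, and the sample text are illustrative.

```csharp
using System.Collections.Generic;
using WordOpenXml_cs;

// Illustrative caller for the DocumentWriter class listed above.
public static class DocumentWriterDemo
{
    public static void CreateSample(string fileName)
    {
        // DocumentWriter implements IDisposable, so a using block
        // releases the in-memory stream when finished.
        using (var writer = new DocumentWriter())
        {
            writer.AddParagraph("Shopping list");
            writer.AddBulletList(new List<string> { "Apples", "Bread", "Coffee" });
            writer.AddParagraph("End of list.");

            // SaveToFile closes the document, so call it last:
            // no further content can be added afterwards.
            writer.SaveToFile(fileName);
        }
    }
}
```

Because the document is built in a MemoryStream and only written to disk in SaveToFile, nothing is left behind on disk if an exception occurs while content is being added.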
Alternate methods to using Open XML.
A logical choice for many is Word automation, which is done by adding references to the Primary Interop Assemblies (PIAs) as explained in the following code sample. Word automation is easier to use than Open XML but has drawbacks: the PIAs must be present on the machine creating and modifying documents, and their version must match the DLLs your solution depends on. There is also a possibility of objects not being properly released, which over time can slow or even crash a machine.
Another option is a third-party library such as Aspose, e-iceblue, or GemBox. These libraries are easier to use than Open XML or Word automation, yet that does not rule out either of those options. The accompanying code samples include a project mirroring the Open XML project that uses GemBox, to give an idea of the difference between the two approaches. There are no code samples for Word automation because they may fail depending on the developer machine that runs them, while the Open XML and GemBox code samples will not fail unless a package is missing. A missing package may be resolved by right-clicking the solution in Solution Explorer and selecting Restore NuGet Packages.
Code samples
Code samples are broken down into separate methods which, from top to bottom, build on each other, from creating an empty document to working with lists (bulleted lists), images, styling, tables, and simple modification of text.
You are encouraged to run each code sample once and view the results, then go back, set a breakpoint at the start of each code sample, and step through the code to better understand what it does.
Integration into your solution
Add the NuGet package for DocumentFormat.OpenXml as explained in the Installation section above. Create a unit test project and create a test method for each operation your application will perform. As the methods do not exist yet, Visual Studio will prompt you to create them.
If in a test method you write
var wordOperations = new WordOperations();
The class does not exist yet, so select the lightbulb and choose to generate class WordOperations in a new file. Once created, open this file and change the class from internal to public. Alternatively, select “generate new type”, which brings up a dialog for adding the class to another project; this is the best option, as the class does not need to be in the unit test project. Once this is done, write calls to methods (which of course do not exist yet) and, as with the class, Visual Studio will create these methods and properties for you. This style of unit testing is known as TDD (Test Driven Development).
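A test written in this style might look like the sketch below. It assumes the MSTest framework used by the default Visual Studio unit test project, and it assumes the generated WordOperations class ends up with a CreateEmptyDocument method like the one shown at the start of this article; both the test name and the file name are illustrative.

```csharp
using Microsoft.VisualStudio.TestTools.UnitTesting;

[TestClass]
public class WordOperationsTests
{
    [TestMethod]
    public void CreateEmptyDocument_ProducesValidDocument()
    {
        // Arrange: WordOperations is the class generated from the lightbulb.
        var wordOperations = new WordOperations();

        // Act: the method returns true when validation reports no errors.
        bool isValid = wordOperations.CreateEmptyDocument("Empty.docx");

        // Assert
        Assert.IsTrue(isValid);
    }
}
```

Writing the assertion first makes the expected contract explicit: each create method reports success only when the resulting document validates.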
The alternative, which many developers opt for, is to first write the Open XML code in a class project (or in a class within the project that will call the Word methods), then create a unit test project and add a reference to the class project to write unit tests against.
The last alternative is to simply write the methods and, once the code executes, open the newly generated or modified documents from Windows Explorer and examine them.
Important note: in both projects, documents are created under Bin\Debug\Documents. If the Documents folder does not exist, it is created by a post-build event (refer to project properties, build events).
References
Tools
Open XML Package Editor for Modern Visual Studios
Requires
Running the code samples requires Microsoft Visual Studio 2015 or higher.
Summary
This article has presented the basics of writing code for the most common operations when generating Word documents with Open XML, along with tips and suggestions for writing maintainable code. Open XML can do a good deal more, and it becomes easier as a developer grows comfortable with it, whereas occasional use can be frustrating. Take your time and build on what has been examined in this article.
Source code