OpenXmlCodeTester: Validating Code in Open XML Documents

Many types of documents contain code, including API documentation, tutorials, specifications, technical books, and magazine articles.  Too often, there is no way to automate testing and validation of this code.  This post presents a small example program that shows how to use content controls in Open XML word-processing (DOCX) documents to delineate code snippets so that you can automate validation of the snippets.


This scenario was my first use of Open XML; it was the killer application (for me) that prompted me to crack open an Open XML document.  I wrote some code to extract the paragraphs that contain code, compile the code, run it, and validate the output.  Because I had a system in place to automate testing of code in documents, whenever I received a new drop of the library that I was documenting, it was a simple matter to re-run the tests and fix all examples that were broken by changes in the library's programming interface.

I suspect that this blog post will be very interesting to a very few people, but of casual note to all others.  I’m very interested to hear if this scenario is useful to you.  But more than this particular scenario, I want to spark interest in the possibilities that content controls open up for you.

This may be the longest blog post I’ve written, because it also serves as a manual of sorts for OpenXmlCodeTester.

About the Code

This code may be of interest to students of functional programming in C#, because it is written in the pure functional style:

  • No variables are mutated (modified) after initialization.  In this context, variables really should be called ‘symbols’ instead of variables, because their values don’t “vary”.  This means that you can use variables with impunity in queries: if a variable is in scope, you can use it in a query, and it will always have the same value.
  • All functions are pure – that is, they don’t modify any data outside of the function, and given the same set of arguments, will produce the same results.
  • Where appropriate, queries are lazy.  However, it doesn’t make sense to go back again and again to the actual Open XML documents – instead, the program assembles an XML tree that contains the snippets, build instructions, and configuration information.  It then transforms this XML tree into a new XML tree that contains the results of validating each snippet.  Then, finally, the utility prints a text report from the XML tree that contains the test results.  This is an example of what I was thinking of when I made the point that you can redefine a problem in terms of a transformation, and therefore apply functional programming in many scenarios where you might not normally think to do so.
  • If you are not familiar with performing transformations in C# using the pure functional style, I recommend that you work through this Functional Programming Tutorial.
  • The only exception to purity is that the program writes files and directories to disk, and then alters the contents of those directories.  These operations are side effects, but a utility such as this can’t avoid them.

OpenXmlCodeTester uses the RunExecutable method presented in Running an Executable and Collecting the Output.  It uses this function both to compile the code, and then to run the compiled snippet.

OpenXmlCodeTester also uses the code presented in Using LINQ to XML to Retrieve the Content of Content Controls.
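The referenced post has the author's own version of this code.  As a rough illustration of the idea, here is a minimal sketch, assuming the main document part (word/document.xml) has already been loaded into an XDocument and that the content controls are block-level (they wrap whole paragraphs, as snippet controls do).  In Word 2007 markup a content control is a w:sdt element, and the title you set in the Properties dialog is stored in w:sdtPr/w:alias/@w:val.  The class and method names are hypothetical, and the sketch ignores tabs and line breaks inside runs.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class ContentControls
{
    // WordprocessingML main namespace.
    static readonly XNamespace W =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Returns (Title, Text) for every content control in the main document part.
    // Each paragraph inside w:sdtContent becomes one line of the returned text.
    public static IEnumerable<(string Title, string Text)> GetContentControls(XDocument mainPart)
    {
        return mainPart.Descendants(W + "sdt")
            .Select(sdt => (
                Title: (string)sdt.Elements(W + "sdtPr")
                                  .Elements(W + "alias")
                                  .Attributes(W + "val")
                                  .FirstOrDefault(),
                Text: string.Join("\n",
                    sdt.Elements(W + "sdtContent")
                       .Descendants(W + "p")
                       .Select(p => string.Concat(
                           p.Descendants(W + "t").Select(t => (string)t))))));
    }
}
```

Because the query is lazy and returns plain tuples, its results can feed directly into the XML tree of snippets that the program assembles.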

OpenXmlCodeTester uses the MSBUILD utility to compile each snippet.  A function named CreateProject assembles an XML tree that contains the project file that MSBUILD uses.  The program writes this XML tree into the directory that contains the snippet to be compiled, so that MSBUILD can find and use it.  The MSBuild documentation on MSDN has more information about MSBUILD.
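The post doesn't list CreateProject, so the following is only a guess at its general shape: a minimal sketch that uses LINQ to XML to build a classic (Visual Studio 2008 era) C# project file that compiles the given source files into a console executable.  The class name ProjectBuilder, its parameters, the ToolsVersion and TargetFrameworkVersion values, and the list of references are all assumptions.

```csharp
using System.Linq;
using System.Xml.Linq;

public static class ProjectBuilder
{
    // Classic (pre-SDK-style) MSBuild project files live in this namespace.
    static readonly XNamespace Msb = "http://schemas.microsoft.com/developer/msbuild/2003";

    // Hypothetical stand-in for CreateProject: a project that compiles the given
    // source files into assemblyName.exe in the project's own directory.
    public static XDocument CreateProject(string assemblyName, params string[] sourceFiles)
    {
        string[] references = { "System", "System.Core", "System.Xml", "System.Xml.Linq" };
        return new XDocument(
            new XElement(Msb + "Project",
                new XAttribute("ToolsVersion", "3.5"),
                new XAttribute("DefaultTargets", "Build"),
                new XElement(Msb + "PropertyGroup",
                    new XElement(Msb + "AssemblyName", assemblyName),
                    new XElement(Msb + "OutputType", "Exe"),
                    new XElement(Msb + "TargetFrameworkVersion", "v3.5"),
                    new XElement(Msb + "OutputPath", @".\")),
                new XElement(Msb + "ItemGroup",
                    references.Select(r => new XElement(Msb + "Reference", new XAttribute("Include", r)))),
                new XElement(Msb + "ItemGroup",
                    sourceFiles.Select(f => new XElement(Msb + "Compile", new XAttribute("Include", f)))),
                // Imports the standard C# build targets (Build, Rebuild, Clean).
                new XElement(Msb + "Import",
                    new XAttribute("Project", @"$(MSBuildBinPath)\Microsoft.CSharp.targets"))));
    }
}
```

You would then save the tree into the test directory, for example with project.Save(Path.Combine(testDirectory, "Snippet.csproj")), and pass that file to MSBUILD.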

OpenXmlCodeTester uses the Open XML SDK.  Download it at https://go.microsoft.com/fwlink/?LinkId=120908.

OpenXmlCodeTester, as presented in this blog post, only compiles and validates C# code.  However, it is a fairly trivial exercise to modify the code to validate code written in Visual Basic, C++, or any other language.  In the version that I used to validate the code in the LINQ to XML documentation, I added an attribute on an element in the build instructions indicating the language, so I could validate the C# code and the Visual Basic code in a single test pass.
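The post doesn't show how the language attribute was implemented.  A minimal sketch, assuming a hypothetical optional Language attribute on the Test element with the values CSharp and VisualBasic; neither the attribute nor these values appear in the build-instruction schema in this post, so the schema would need a matching optional xs:attribute.  The sketch only picks the source-file extension and the MSBuild targets file to import.

```csharp
using System;
using System.Xml.Linq;

public static class LanguageSupport
{
    // Hypothetical: reads the optional Language attribute of a Test element,
    // defaulting to C#, and returns the settings a project builder would need.
    public static (string Extension, string TargetsFile) GetLanguageSettings(XElement test)
    {
        string language = (string)test.Attribute("Language") ?? "CSharp";
        switch (language)
        {
            case "CSharp":
                return (".cs", @"$(MSBuildBinPath)\Microsoft.CSharp.targets");
            case "VisualBasic":
                return (".vb", @"$(MSBuildBinPath)\Microsoft.VisualBasic.targets");
            default:
                throw new ArgumentException("Unsupported language: " + language);
        }
    }
}
```

Defaulting to C# keeps existing build instructions, which have no Language attribute, working unchanged.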

Overview of OpenXmlCodeTester

OpenXmlCodeTester finds snippets of code that are contained within content controls, compiles each snippet, runs the resulting executable, and validates the output against the output in the source document.
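How strict the output comparison should be is a design choice the post doesn't spell out.  One minimal sketch, assuming that differences in line-ending style, trailing spaces, and trailing blank lines should not count as failures (text copied out of a Word document rarely matches console output byte for byte).  OutputComparer is a hypothetical name, and the actual tool may compare differently.

```csharp
using System.Linq;

public static class OutputComparer
{
    // Compares the actual console output with the expected output taken from
    // the document after normalizing both strings.
    public static bool OutputMatches(string expected, string actual)
    {
        return Normalize(expected) == Normalize(actual);
    }

    static string Normalize(string text)
    {
        // Unify CRLF and lone CR to LF, then strip trailing spaces on each line.
        var lines = text.Replace("\r\n", "\n")
                        .Replace('\r', '\n')
                        .Split('\n')
                        .Select(line => line.TrimEnd());
        // TrimEnd('\n') drops the final newline that Console.WriteLine emits.
        return string.Join("\n", lines).TrimEnd('\n');
    }
}
```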

Often, you want to test code in a number of documents, and those documents may be in any number of directories on the disk.  OpenXmlCodeTester.Test is a function that takes as an argument an array of strings specifying the directories.  OpenXmlCodeTester.Test finds all Open XML documents (DOCX files) in all of the specified directories, and tests all code found in those documents in a single test pass.
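Finding the documents is a single query over the directory list.  A minimal sketch, assuming that only the top level of each directory is searched (not subdirectories) and that Word's temporary owner files (named like ~$Name.docx, created while a document is open) should be skipped; DocumentFinder is a hypothetical name.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

public static class DocumentFinder
{
    // Returns every .docx file directly inside each of the given directories,
    // in a stable order so that test reports are reproducible.
    public static IEnumerable<string> FindDocuments(string[] directories)
    {
        return directories
            .SelectMany(dir => Directory.EnumerateFiles(dir, "*.docx"))
            .Where(path => !Path.GetFileName(path).StartsWith("~$"))
            .OrderBy(path => path, StringComparer.OrdinalIgnoreCase);
    }
}
```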

One key point about this utility – there isn’t any configuration information maintained outside of the documents.  The Open XML documents themselves contain all the configuration information necessary to run the utility.  This simplifies the process of keeping the build instructions up-to-date.

Content Controls

To work with content controls, you need to configure Word 2007 to show the Developer tab in the ribbon.  Click the Office Button, click Word Options, and then, on the Popular page, make sure that the “Show Developer tab in the Ribbon” check box is selected.

To create a content control, select the text that you want to be contained in the content control, and then click the button on the Developer ribbon to create a rich text content control:

To set the title of the content control, first place the insertion point in the content control, and then click the Properties button:

Snippets

For the purposes of this blog post, the content of one content control is called a “snippet”.  The title of a content control is the snippet identifier.  In the following screen clipping, you can see a content control that contains code, and has the title of “0001Snip”.

There are two types of snippets – code snippets and output snippets.  In the above example, the snippet is the code to be compiled and validated.  The following screen clipping shows a content control that contains the expected output from the code.  Its title is “0001out”.

OpenXmlCodeTester reports an error if multiple snippets have the same title (and hence the same snippet id).  Snippet ids must be unique for a test run.
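In the pure functional style, the uniqueness check is a grouping query that produces error messages rather than mutating any state.  A minimal sketch; IdChecks, FindDuplicateIds, and the message format are hypothetical.  The same function works for any set of identifiers, such as snippet titles or test ids.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class IdChecks
{
    // Returns one error message per identifier that occurs more than once.
    // 'kind' only labels the message (for example "snippet" or "test").
    public static IEnumerable<string> FindDuplicateIds(IEnumerable<string> ids, string kind)
    {
        return ids
            .GroupBy(id => id)
            .Where(g => g.Count() > 1)
            .Select(g => $"Error: {kind} id '{g.Key}' occurs {g.Count()} times");
    }
}
```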

In addition to using content controls to contain snippets, the utility uses content controls for specifying build instructions and for specifying general configuration information for a test run.  You use XML to specify the build instructions and configuration information.

A good approach is to format the text in the build-instruction and configuration content controls as hidden text.  That way, you don’t need to remove these content controls before printing the document.  OpenXmlCodeTester reads the text whether or not it is hidden.  The attached zip file contains documents that have properly set-up content controls.

When you specify build instructions for a snippet, you insert a content control containing the build instructions with the title of “Build” immediately before the snippet to be tested.  In the following screen clipping, you can see the build instructions content control.

As you can see in the above screen clipping, each build instruction contains an identifier – the Id attribute.  This is the test identifier.  Test identifiers must be unique for a test run.  OpenXmlCodeTester reports an error if the test identifiers are not unique.
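Because the Build content control sits immediately before the snippet it applies to, pairing them only requires looking at adjacent content controls in document order.  A minimal sketch, assuming the controls have already been extracted as (Title, Text) pairs; BuildPairing and PairBuildWithSnippet are hypothetical names.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class BuildPairing
{
    // Pairs each content control titled "Build" with the content control that
    // immediately follows it in document order. A Build control at the very end
    // has no successor and is silently dropped by Zip here; a real tool would
    // report it as an error.
    public static IEnumerable<(string BuildXml, string SnippetId, string SnippetText)>
        PairBuildWithSnippet(IReadOnlyList<(string Title, string Text)> controls)
    {
        return controls
            .Zip(controls.Skip(1), (current, next) => (Current: current, Next: next))
            .Where(pair => pair.Current.Title == "Build")
            .Select(pair => (BuildXml: pair.Current.Text,
                             SnippetId: pair.Next.Title,
                             SnippetText: pair.Next.Text));
    }
}
```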

The content control that contains configuration information can be placed in any document in any of the specified directories.  OpenXmlCodeTester must find one and only one content control with the title “Configuration”.  The following screen clipping shows a typical configuration content control:

Snippet Context (How a Snippet Becomes a Complete, Compilable Program)

Most snippets are, by themselves, not complete, compilable programs.  For instance, they often don’t have using directives.  They may not have a Main method.  In some cases, the snippet doesn’t even contain a function declaration; the snippet needs to be placed inside a function in order to compile.  OpenXmlCodeTester inserts each snippet into a “snippet context” to create a complete program that can be compiled.  The snippet context contains small XML elements that are replaced with the snippets to be tested.  For example, the following small snippet can be inserted into a Main method:

for (int i = 0; i < 10; i++)
    Console.WriteLine(i);

Here is a snippet context that can be used for bits of code that can be inserted into a Main method:

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Xml;
using System.Xml.Schema;
using System.Xml.Linq;

class Program
{
    static void Main(string[] args)
    {
        <Snippet/>
    }
}

As you might guess, the snippet is inserted at the point of the <Snippet/> XML element inside of the snippet context.

When you specify the configuration information for OpenXmlCodeTester, you must include an element named DefaultSnippetContext (see the screen clipping of configuration information above).  If you don’t specify a SnippetContext element in the build instructions for a specific snippet, then OpenXmlCodeTester uses the default snippet context from the configuration information.

There are a number of scenarios where you will want to specify a snippet context for a particular test.  For example, if the snippet is, in fact, a complete, buildable program, you can specify a snippet context that contains only the <Snippet/> element.  The following snippet is a complete program:

using System;

class Program
{
    static void Main(string[] args)
    {
        Console.WriteLine("Hello, World");
    }
}

Here are the build instructions for the above snippet:

<Test Id="0003" OutputId="0003out">
  <SnippetContext>
    <![CDATA[<Snippet />]]>
  </SnippetContext>
</Test>

In some cases, you may want to assemble multiple snippets into a single executable.  On a Snippet element inside a snippet context, you can use an Id attribute to specify which snippet to insert, and a snippet context can contain more than one Snippet element.  The following build instruction contains a snippet context that assembles three snippets into a single executable:

<Test Id="0002" OutputId="0002out">
  <SnippetContext>
    <![CDATA[
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;

class Program
{
    static void Main(string[] args)
    {
        <Snippet Id="0002a"/>
        <Snippet Id="0002b"/>
        <Snippet />
    }
}
]]>
  </SnippetContext>
</Test>

If the Snippet element contains an Id, it pulls in the appropriate snippet.  As mentioned above, when you specify build instructions for a snippet, you insert a content control containing the build instructions with the title of “Build” immediately before the snippet to be tested.  If the Snippet element does not contain an Id, then the element is replaced with the snippet immediately following the build instructions.  The documents that are included in the zip file that is attached to this blog post contain examples of:

  • A configuration content control that contains a default snippet context.
  • A build instruction that contains a snippet context for a complete compilable program.
  • A build instruction that contains a snippet context that assembles multiple snippets into a compilable program.
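The two substitution rules can be expressed as a single replacement pass over the snippet context.  A minimal sketch that swaps in a regular expression for whatever parsing the real tool uses, assuming Id values are written in double quotes as in the examples above; SnippetAssembler and Assemble are hypothetical names.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class SnippetAssembler
{
    // Matches <Snippet/>, <Snippet />, and <Snippet Id="0002a"/>.
    static readonly Regex SnippetElement =
        new Regex(@"<Snippet(?:\s+Id\s*=\s*""(?<id>[^""]*)"")?\s*/>");

    // A Snippet element with an Id pulls in that snippet from allSnippets;
    // one without an Id receives the snippet that immediately follows the
    // build instructions (currentSnippet).
    public static string Assemble(string snippetContext, string currentSnippet,
                                  IDictionary<string, string> allSnippets)
    {
        return SnippetElement.Replace(snippetContext, m =>
        {
            if (!m.Groups["id"].Success)
                return currentSnippet;
            if (allSnippets.TryGetValue(m.Groups["id"].Value, out var text))
                return text;
            throw new KeyNotFoundException("No snippet with Id " + m.Groups["id"].Value);
        });
    }
}
```

A missing Id is treated as an error here rather than silently leaving the element in place, since the resulting program would not compile anyway.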

Directory Structure

When you run OpenXmlCodeTester, it creates a subdirectory named Test under the working directory where OpenXmlCodeTester is started.  It then creates a subdirectory under the Test directory for each test.  OpenXmlCodeTester places the complete assembled program for each test in the directory created for the test.  In addition, it places the MSBUILD project in that directory.  The resulting executable will also be placed in that directory by the compiler.  The following screen clipping shows the directory structure that is created when you run OpenXmlCodeTester from Visual Studio.  When running from Visual Studio, the starting directory is OpenXmlCodeTester/bin/Debug:

The Test directory, along with the subdirectories for each test, is deleted and re-created on each test run.
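This is the side-effecting part of the utility mentioned earlier.  A minimal sketch of the directory handling, assuming the assembled programs are already available keyed by test id and that each is written to a file named Program.cs (the file name, class, and method names are assumptions).

```csharp
using System.Collections.Generic;
using System.IO;

public static class TestDirectories
{
    // Deletes and re-creates <workingDirectory>/Test, then creates one
    // subdirectory per test that contains the assembled program.
    public static string PrepareTestDirectory(string workingDirectory,
                                              IDictionary<string, string> programsByTestId)
    {
        string testRoot = Path.Combine(workingDirectory, "Test");
        if (Directory.Exists(testRoot))
            Directory.Delete(testRoot, recursive: true);
        Directory.CreateDirectory(testRoot);
        foreach (var test in programsByTestId)
        {
            string dir = Path.Combine(testRoot, test.Key);
            Directory.CreateDirectory(dir);
            File.WriteAllText(Path.Combine(dir, "Program.cs"), test.Value);
        }
        return testRoot;
    }
}
```

The project file and any CopySnippet files would be written into the same per-test directory before MSBUILD runs.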

Build Instructions

This section describes in detail how you specify build instructions for a test.

The root element for a build instruction is named Test.

The Test element has two required attributes: Id and OutputId.  Id is the test identifier.  OutputId is the snippet identifier of the content control that contains the expected output from the snippet.

The Test element may contain an optional child element, SnippetContext, which is expected to contain a CDATA section with the snippet context for the test.  If the build instructions for a test don’t include a SnippetContext element, the test uses the default snippet context from the configuration content control.
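With LINQ to XML, this fallback is a one-line expression, because casting a missing element to string yields null.  A minimal sketch; SnippetContexts and GetSnippetContext are hypothetical names.

```csharp
using System.Xml.Linq;

public static class SnippetContexts
{
    // Returns the text of the test's SnippetContext child (the CDATA content),
    // or the DefaultSnippetContext from the configuration when the child is absent.
    public static string GetSnippetContext(XElement test, XElement configuration)
    {
        return (string)test.Element("SnippetContext")
            ?? (string)configuration.Element("DefaultSnippetContext");
    }
}
```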

On occasion, you may want to test a snippet that expects to load a file from the disk.  You can specify a child element, CopySnippet, which tells OpenXmlCodeTester to copy a snippet to a file in the test directory.  The snippet can then use the file when the test is run.

For example, you may have a snippet that shows how to use an XSD schema to validate an XML document.  The following snippet expects Test.xsd and Test.xml to exist as files when it is run:

XmlSchemaSet schemas = new XmlSchemaSet();
schemas.Add("", XmlReader.Create("Test.xsd"));
XDocument doc = XDocument.Load("Test.xml");
string errorMessage = "";
doc.Validate(schemas, (o, e) => errorMessage += e.Message + Environment.NewLine);
if (errorMessage.Length > 0)
    Console.WriteLine(errorMessage);
else
    Console.WriteLine("Document Validated");

If you have created content controls for the XSD and XML files (with snippet Ids of 0007xsd and 0007xml respectively), then the following build instruction tells OpenXmlCodeTester to copy those snippets to the test directory with the appropriate file names:

<Test Id="0007"
      OutputId="0007out">
  <CopySnippet FromId="0007xml" ToFile="Test.xml"/>
  <CopySnippet FromId="0007xsd" ToFile="Test.xsd"/>
</Test>

The documents in the zip file that is attached to this post contain an example of this type of test.
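Handling CopySnippet amounts to one file write per element.  A minimal sketch, assuming the snippets are available in a dictionary keyed by snippet id; SnippetCopier and CopySnippets are hypothetical names, and the sketch does not guard against ToFile values that point outside the test directory.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Xml.Linq;

public static class SnippetCopier
{
    // For each CopySnippet child of a Test element, writes the text of the
    // snippet named by FromId to the file named by ToFile in the test directory.
    public static void CopySnippets(XElement test, string testDirectory,
                                    IDictionary<string, string> snippetsById)
    {
        foreach (XElement copy in test.Elements("CopySnippet"))
        {
            string fromId = (string)copy.Attribute("FromId");
            string toFile = (string)copy.Attribute("ToFile");
            File.WriteAllText(Path.Combine(testDirectory, toFile), snippetsById[fromId]);
        }
    }
}
```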

Here is the XSD schema for the build instructions:

<xs:schema attributeFormDefault='unqualified'
           elementFormDefault='qualified'
           xmlns:xs='http://www.w3.org/2001/XMLSchema'>
  <xs:element name='Test'>
    <xs:complexType>
      <xs:sequence minOccurs='0'>
        <xs:choice maxOccurs='unbounded'>
          <xs:element name='CopySnippet'>
            <xs:complexType>
              <xs:attribute name='FromId' type='xs:string' use='required' />
              <xs:attribute name='ToFile' type='xs:string' use='required' />
            </xs:complexType>
          </xs:element>
          <xs:element name='SnippetContext' type='xs:string' />
        </xs:choice>
      </xs:sequence>
      <xs:attribute name='Id' type='xs:string' use='required' />
      <xs:attribute name='OutputId' type='xs:string' use='required' />
    </xs:complexType>
  </xs:element>
</xs:schema>

OpenXmlCodeTester validates each build instruction against this XSD, and reports errors if the build instruction doesn’t validate properly.
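A minimal sketch of this validation step using XmlSchemaSet and the LINQ to XML Validate extension method, the same APIs as the schema-validation snippet earlier in this post.  Note that the XML Schema namespace must be http://www.w3.org/2001/XMLSchema; a schema whose xs prefix is bound to an https URI is not recognized as XML Schema.  BuildInstructionValidator and Validate are hypothetical names.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Xml;
using System.Xml.Linq;
using System.Xml.Schema;

public static class BuildInstructionValidator
{
    // The build-instruction schema from this post.
    const string TestSchema = @"<xs:schema attributeFormDefault='unqualified'
  elementFormDefault='qualified' xmlns:xs='http://www.w3.org/2001/XMLSchema'>
  <xs:element name='Test'>
    <xs:complexType>
      <xs:sequence minOccurs='0'>
        <xs:choice maxOccurs='unbounded'>
          <xs:element name='CopySnippet'>
            <xs:complexType>
              <xs:attribute name='FromId' type='xs:string' use='required' />
              <xs:attribute name='ToFile' type='xs:string' use='required' />
            </xs:complexType>
          </xs:element>
          <xs:element name='SnippetContext' type='xs:string' />
        </xs:choice>
      </xs:sequence>
      <xs:attribute name='Id' type='xs:string' use='required' />
      <xs:attribute name='OutputId' type='xs:string' use='required' />
    </xs:complexType>
  </xs:element>
</xs:schema>";

    static XmlSchemaSet CreateSchemaSet()
    {
        var schemas = new XmlSchemaSet();
        schemas.Add("", XmlReader.Create(new StringReader(TestSchema)));
        return schemas;
    }

    // Returns the validation messages for one build instruction;
    // an empty list means the instruction is valid.
    public static List<string> Validate(string buildInstructionXml)
    {
        var errors = new List<string>();
        XDocument doc = XDocument.Parse(buildInstructionXml);
        doc.Validate(CreateSchemaSet(), (sender, e) => errors.Add(e.Message));
        return errors;
    }
}
```

Collecting messages instead of throwing on the first error lets a single test pass report every malformed build instruction at once.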

Configuration Information

As mentioned, there must be one and only one content control with the title of “Configuration”.  The configuration information is XML with the root element of Configuration.  There are two required child elements of the Configuration element: DefaultSnippetContext and MSBuildPath.  You’ve already seen an example configuration content control.
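A minimal sketch of enforcing the one-and-only-one rule across all the content controls gathered from every document in the test run, assuming they are available as (Title, Text) pairs.  ConfigurationFinder and GetConfiguration are hypothetical names; this sketch throws on failure, whereas a tool in the pure functional style might instead add an error to its results tree.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class ConfigurationFinder
{
    // Returns the parsed Configuration element, or throws if there is not
    // exactly one content control titled "Configuration".
    public static XElement GetConfiguration(IEnumerable<(string Title, string Text)> controls)
    {
        var configs = controls.Where(c => c.Title == "Configuration").ToList();
        if (configs.Count != 1)
            throw new InvalidOperationException(
                $"Expected exactly one Configuration content control, found {configs.Count}.");
        return XElement.Parse(configs[0].Text);
    }
}
```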

Here is the XSD schema for the configuration information:

<xs:schema attributeFormDefault='unqualified'
           elementFormDefault='qualified'
           xmlns:xs='http://www.w3.org/2001/XMLSchema'>
  <xs:element name='Configuration'>
    <xs:complexType>
      <xs:sequence>
        <xs:element name='DefaultSnippetContext' type='xs:string' />
        <xs:element name='MSBuildPath' type='xs:string' />
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>

About the Attached Zip File

The attached zip file contains a Visual Studio 2008 project.  The sample DOCX files are in bin/Debug, so you can simply compile and run the project to see OpenXmlCodeTester in action.

OpenXmlCodeTester.zip

Comments


  • Anonymous
    October 07, 2008
    Eric -- this was a very interesting post!  I've spent the last few years working on a similar project that encapsulates scripting code (IronPython, IronRuby, R, MATLAB, etc.) into Word and Excel documents that can be executed to create results documents.  It's impressive what you can do in Microsoft Office given the updated content control and XML storage capabilities introduced in Office 2007 (and 2003 to a lesser degree).  If you're curious, you can check out www.InferenceForDotNET.com for more info.
