How to Control Sections when using OpenXml.PowerTools.DocumentBuilder

DocumentBuilder is a small API (part of the PowerTools for Open XML project, an open source project on CodePlex) that allows you to merge contents of documents while retaining document integrity and resolving issues of markup interdependence.  This post contains detailed information on interdependence of Open XML WordprocessingML markup.  This post introduces DocumentBuilder, and gives a few examples of its use.

This blog is inactive.
New blog: EricWhite.com/blog

One of the scenarios that you want to control is how sections are copied to the document you are building.  Sections carry a fair amount of formatting information, including the layout of the page, and the headers and footers that will be used for the section.

A super-short summary of how you use DocumentBuilder: to assemble a new document, you open any number of source Open XML WordprocessingML documents, then assemble a List<OpenXml.PowerTools.Source>, where members of the list are Source objects that you construct, passing a number of parameters.  There are three overloads of the Source constructor:

public Source(WordprocessingDocument source, bool keepSections)
public Source(WordprocessingDocument source, int start, bool keepSections)
public Source(WordprocessingDocument source, int start, int count, bool keepSections)

The first parameter is the open WordprocessingML source document.

If you use the first overload, then the entire source document is copied (along with all dependencies) to the destination document.  If you use the second overload, you can specify a starting paragraph (technically the starting child element of the w:body element), and the document from that point on is copied to the destination document.  The third overload allows you to pluck just a subsection of the source document for inclusion in the destination document.

The last parameter is always a bool, keepSections.  You can specifically control how sections are copied to the newly-built document using the keepSections argument.

Semantics of the Keep Sections argument

If you pass false for keepSections for every document in the source list, then DocumentBuilder takes the section from the first document in the list.  The following code uses the section (and its headers) from Header1.docx, the first document in the list:

using (WordprocessingDocument part1 =
    WordprocessingDocument.Open("Header1.docx", true))
using (WordprocessingDocument part2 =
    WordprocessingDocument.Open("Header2.docx", true))
{
    List<Source> sources = new List<Source>();
    sources.Add(new Source(part1, false));
    sources.Add(new Source(part2, false));
    DocumentBuilder.BuildDocument(sources, "Header3.docx");
}

If you specify true for all documents in the list, then the section properties of each document are copied.  If each document contains only one section, then the resulting document has the same number of sections as there are source documents.  The following code results in a document with two sections, each with its own headers (given that the source documents have only one section each):

using (WordprocessingDocument part1 =
    WordprocessingDocument.Open("Header1.docx", true))
using (WordprocessingDocument part2 =
    WordprocessingDocument.Open("Header2.docx", true))
{
    List<Source> sources = new List<Source>();
    sources.Add(new Source(part1, true));
    sources.Add(new Source(part2, true));
    DocumentBuilder.BuildDocument(sources, "Header3.docx");
}

Both of the above examples use the first overload of the Source constructor where you don't specify ranges of content.

The situation gets a little more complicated if you specify ranges of content.  To clarify what the semantics are, let's take a look at the markup for a small document that has two sections, with two paragraphs in the first section, and one paragraph in the second:

<w:body>
  <w:p>
    <w:r>
      <w:t>First paragraph.</w:t>
    </w:r>
  </w:p>
  <w:p>
    <w:pPr>
      <w:sectPr>
        <w:headerReference w:type="default"
                           r:id="rId7"/>
        <w:pgSz w:w="12240"
                w:h="15840"/>
        <w:pgMar w:top="1440"
                 w:right="1440"
                 w:bottom="1440"
                 w:left="1440"
                 w:header="720"
                 w:footer="720"
                 w:gutter="0"/>
        <w:cols w:space="720"/>
        <w:docGrid w:linePitch="360"/>
      </w:sectPr>
    </w:pPr>
  </w:p>
  <w:p>
    <w:r>
      <w:lastRenderedPageBreak/>
      <w:t>Second paragraph.</w:t>
    </w:r>
  </w:p>
  <w:sectPr>
    <w:headerReference w:type="default"
                       r:id="rId8"/>
    <w:pgSz w:w="12240"
            w:h="15840"/>
    <w:pgMar w:top="1440"
             w:right="1440"
             w:bottom="1440"
             w:left="1440"
             w:header="720"
             w:footer="720"
             w:gutter="0"/>
    <w:cols w:space="720"/>
    <w:docGrid w:linePitch="360"/>
  </w:sectPr>
</w:body>

In this example, you can see that the section properties element (w:sectPr) for the first section is a child of the paragraph properties (w:pPr) element of the last paragraph (w:p) in that section. The section properties element for the last section of any document is instead a child of the w:body element: it is a sibling of the paragraphs, and follows the last paragraph of the document.
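Because ranges are expressed as indexes of the w:body child elements, it can help to see which of those indexes carry section properties before choosing start and count. The following is a minimal LINQ to XML sketch, not part of DocumentBuilder: the class and method names (SectionIndexFinder, FindSectionIndexes) are hypothetical, and the inline markup is a trimmed stand-in for the two-section document above.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class SectionIndexFinder
{
    // WordprocessingML main namespace.
    static readonly XNamespace W =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Returns the zero-based indexes of the w:body child elements that carry
    // section properties: either a w:p whose w:pPr contains a w:sectPr, or
    // the final w:sectPr that is itself a child of w:body.
    // (Hypothetical helper, for illustration only.)
    public static List<int> FindSectionIndexes(XElement body)
    {
        return body.Elements()
            .Select((e, i) => new { Element = e, Index = i })
            .Where(x => x.Element.Name == W + "sectPr" ||
                        x.Element.Elements(W + "pPr").Elements(W + "sectPr").Any())
            .Select(x => x.Index)
            .ToList();
    }

    public static void Main()
    {
        // Trimmed stand-in for the markup above: p, p (with sectPr), p, sectPr.
        XElement body = XElement.Parse(
            @"<w:body xmlns:w='http://schemas.openxmlformats.org/wordprocessingml/2006/main'>
                <w:p><w:r><w:t>First paragraph.</w:t></w:r></w:p>
                <w:p><w:pPr><w:sectPr/></w:pPr></w:p>
                <w:p><w:r><w:t>Second paragraph.</w:t></w:r></w:p>
                <w:sectPr/>
              </w:body>");

        // Prints the indexes that hold section properties: 1, 3
        Console.WriteLine(string.Join(", ", FindSectionIndexes(body)));
    }
}
```

For the sample document, the indexes are 1 (the paragraph that ends the first section) and 3 (the body-level section properties).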

The rule for the keepSections argument when specifying a range of content is that if and only if the section properties fall within the range of content, and you specify true for keepSections, then the section properties will be transferred to the new document. The range of content is specified using the index of the child elements of the w:body element. If DocumentBuilder finds a paragraph in the content range that contains section properties, those section properties are moved to the new document in that location. If the range of content is such that the range includes the last section property element, which is a child of w:body, and a following sibling to the last paragraph, then those section properties are moved to the new document.

If you are specifying ranges of content, and there are no section properties within any of the ranges, the resulting document is created without any section properties, which is valid per the Open XML spec.
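The range rule can be modeled in a few lines. This sketch is only a restatement of the rule as described in this post, not DocumentBuilder's implementation; the class and method names are hypothetical, and it handles a single source only (it does not model the fallback to the first document's section when every source passes false).

```csharp
using System;
using System.Linq;

public static class KeepSectionsModel
{
    // Counts the section properties that the rule described above would
    // transfer for one source. carriesSectPr[i] is true when the i-th child
    // element of w:body holds section properties.
    // (Hypothetical model of the rule, for illustration only.)
    public static int CountTransferredSections(
        bool[] carriesSectPr, int start, int count, bool keepSections)
    {
        if (!keepSections)
            return 0;

        // Only section properties that fall inside the range are kept.
        return carriesSectPr.Skip(start).Take(count).Count(flag => flag);
    }

    public static void Main()
    {
        // Body children of the two-section document: p, p (with sectPr), p, sectPr.
        bool[] twoSections = { false, true, false, true };

        Console.WriteLine(CountTransferredSections(twoSections, 0, 4, true)); // 2: whole body
        Console.WriteLine(CountTransferredSections(twoSections, 0, 2, true)); // 1: first two children
        Console.WriteLine(CountTransferredSections(twoSections, 0, 1, true)); // 0: first child only
    }
}
```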

The following example creates a new document with the section (and headers) from the Header2.docx document:

using (WordprocessingDocument part1 =
    WordprocessingDocument.Open("Header1.docx", true))
using (WordprocessingDocument part2 =
    WordprocessingDocument.Open("Header2.docx", true))
{
    List<Source> sources = new List<Source>();
    sources.Add(new Source(part1, false));
    sources.Add(new Source(part2, true));
    DocumentBuilder.BuildDocument(sources, "Header3.docx");
}

The following example creates a document with no section properties, given that the first paragraph in each document doesn't contain section properties:

using (WordprocessingDocument part1 =
    WordprocessingDocument.Open("Header1.docx", true))
using (WordprocessingDocument part2 =
    WordprocessingDocument.Open("Header2.docx", true))
{
    List<Source> sources = new List<Source>();
    sources.Add(new Source(part1, 0, 1, true));
    sources.Add(new Source(part2, 0, 1, true));
    DocumentBuilder.BuildDocument(sources, "Header3.docx");
}

The following example creates a document with four sections: two from the first document, and one each from the last two. The first document is the one that I listed in the above markup that contains two sections.

using (WordprocessingDocument twoSections =
    WordprocessingDocument.Open("TwoSections.docx", true))
using (WordprocessingDocument part1 =
    WordprocessingDocument.Open("Header1.docx", true))
using (WordprocessingDocument part2 =
    WordprocessingDocument.Open("Header2.docx", true))
{
    List<Source> sources = new List<Source>();
    sources.Add(new Source(twoSections, true));
    sources.Add(new Source(part1, true));
    sources.Add(new Source(part2, true));
    DocumentBuilder.BuildDocument(sources, "Header3.docx");
}

If we specify a range for the first document, picking up only its first two paragraphs (the second of which carries the section properties for the first section), then the resulting document has three sections (one from each document):

using (WordprocessingDocument twoSections =
    WordprocessingDocument.Open("TwoSections.docx", true))
using (WordprocessingDocument part1 =
    WordprocessingDocument.Open("Header1.docx", true))
using (WordprocessingDocument part2 =
    WordprocessingDocument.Open("Header2.docx", true))
{
    List<Source> sources = new List<Source>();
    sources.Add(new Source(twoSections, 0, 2, true));
    sources.Add(new Source(part1, true));
    sources.Add(new Source(part2, true));
    DocumentBuilder.BuildDocument(sources, "Header3.docx");
}

Using the keepSections argument appropriately allows you to precisely control which sets of section properties are moved from source documents into the destination document.

Comments

  • Anonymous
    January 11, 2011
    Is there any way to use the library without PowerTools?

  • Anonymous
    July 08, 2012
    Hi, I embed HTML content in docx files. While merging those two docx files by using DocumentBuilder i'm getting error 'Source contains altChunk,

  • Anonymous
    July 08, 2012
    @Rajesh, Documentbuilder does not work with documents that contain altChunk in them.  You have to first merge the altChunk info using Word or Word Automation Services, and then you can use DocumentBuilder. -Eric
