Leveraging the Power of Word Automation Services and the Open XML SDK

Have you ever wanted to update page number fields or a table of contents within Word documents on the server? The Open XML SDK makes it easy to add or remove content within a Word document. However, as mentioned in the announcement of the Open XML SDK 2.0, the SDK does not provide runtime application behaviors such as layout and recalculation. Recalculating a table of contents requires a layout engine that can determine where each page of the document begins and ends.

Word Automation Services is designed to handle tasks that require application logic, such as file conversion and layout. As Brian mentioned in a previous post, Open XML and Office Services are really meant to work better together. In this post, I am going to show you how to leverage Word Automation Services to update a table of contents after a document has been modified by the Open XML SDK.

If you want to jump straight into the code, feel free to download the solutions here.

Scenario

Imagine a scenario where I am working with a group of people to create a book about the solar system. We've divided up the book into separate chapters where each chapter is assigned to a particular author. Once everyone is done authoring their assigned chapter we want to merge all the documents into one final document. In fact, this scenario is very similar to the scenario I talked about in a previous post titled the easy way to assemble multiple Word documents. The big difference is that in this scenario I want to make sure all page references, including the table of contents, are properly set in the final document.

Solution

The scenario discussed above requires two actions:

  1. Assemble multiple Word documents into one final document using the Open XML SDK
  2. Leverage Word Automation Services, which is part of SharePoint 2010, to update any fields, like the table of contents, within the document

As is the case with many of my previous posts, setting up the right template is the most important step in starting an Office document solution. Once we have the template set up, our next task is to come up with an easy way for users to run the document assembly solution. Since this solution will run on SharePoint, we will create a custom action, which users will be able to access right from the drop down menu of the template document. This custom action will run the code necessary to assemble the document and will call into Word Automation Services to update fields.

In summary, we will need to take the following actions:

  1. Create the right template
  2. Create SharePoint libraries that will store the template as well as chapter documents
  3. Create a custom action that can be invoked from the SharePoint document drop down menu. This custom action will allow users to assemble all the documents together
  4. Using the Open XML SDK, open the template document and look for all content controls
  5. For every content control found, find the corresponding document in the library and merge that content into the final document
  6. Once the document assembly is complete, invoke Word Automation Services to update the fields in the final document

Step 1 – Creating the Right Template

The template represents the final look of the document we want to create. Each chapter will be merged into a specific location within the template. We will use content controls as an easy mechanism for marking semantic regions within the document. In other words, content controls allow us to uniquely identify a specific region within a document. Here is a screenshot of the template we will use:

image

In the example above, we will use the content control named "SolarOverview" to represent the location where the solar system overview document will be merged. The content of the content control, in this case "Planets/SolarOverview.docx", gives the SharePoint library and file name of the document to be merged.
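The "Library/Document" convention can be sketched in isolation. The following is a minimal illustration, not code from the downloadable solution: the `DocumentReference` type and the file name `Mars.docx` are hypothetical, and the parsing simply mirrors the `Trim().Split('/')` logic used by the assembly code later in this post.

```csharp
using System;

// Hypothetical helper type (not part of the original solution).
public sealed class DocumentReference
{
    public string Library;
    public string FileName;

    // Returns null when the text is not of the form "Library/File".
    public static DocumentReference Parse(string contentControlText)
    {
        if (contentControlText == null)
        {
            return null;
        }

        // Content control text usually carries a trailing newline, so trim first.
        string[] parts = contentControlText.Trim().Split('/');
        if (parts.Length != 2 || parts[0].Length == 0 || parts[1].Length == 0)
        {
            return null;
        }

        return new DocumentReference { Library = parts[0], FileName = parts[1] };
    }
}

public static class Program
{
    public static void Main()
    {
        // "Mars.docx" is a made-up chapter name used only for this example.
        DocumentReference reference = DocumentReference.Parse("  Planets/Mars.docx\r\n");
        Console.WriteLine(reference.Library + " | " + reference.FileName);

        // Placeholder text without a slash is rejected rather than causing an error.
        Console.WriteLine(DocumentReference.Parse("Click here to enter text.") == null);
    }
}
```

Rejecting anything that does not split into exactly two parts means content controls used for other purposes in the template are simply ignored.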

Step 2 – Leveraging SharePoint Libraries

The template document will exist in its own SharePoint library, while the chapters of the solar system book will be stored in the Planets SharePoint library:

image

Note: There is no technical reason to separate the location of the template document from the chapter documents.

Step 3 – Create a Custom Action within SharePoint

There are several ways to provide UI to users to allow them to invoke our document assembly solution. For the sake of this blog post, we are going to create a custom action that can invoke the document assembly solution straight off of the drop down menu for our template document. Here is a screenshot of the custom action we will create:

image

Notice that this custom action menu has two Assemble Open XML Document commands. The difference between them is that one also invokes Word Automation Services to update fields within the document. To create a custom action within SharePoint, we need to create our own custom feature. Here is the XML necessary to create such a feature:

<Feature
    Id="2119559A-5740-42fa-83E5-C02FB46FC701"
    Title="Open XML Demo"
    Description="Open XML Demo"
    Version="1.0.0.0"
    Scope="Web"
    Hidden="FALSE"
    ImageUrl="menuprofile.gif"
    xmlns="http://schemas.microsoft.com/sharepoint/">
    <ElementManifests>
        <ElementManifest Location="elements.xml" />
    </ElementManifests>
</Feature>

Our next task is to define what this feature looks like via the elements.xml file:

<Elements xmlns="http://schemas.microsoft.com/sharepoint/">
    <!-- Per Item Dropdown (ECB) Link -->
    <CustomAction Id="Assemble Document"
        RegistrationType="List"
        RegistrationId="101"
        ImageUrl="/_layouts/images/GORTL.GIF"
        Location="EditControlBlock"
        Sequence="101"
        Title="Assemble Open XML Document" >
        <UrlAction Url="~site/_layouts/CustomApplicationPages/AssembleDocument.aspx?ItemId={ItemId}&amp;ListId={ListId}"/>
    </CustomAction>
    <CustomAction Id="Assemble Document (New)"
        RegistrationType="List"
        RegistrationId="101"
        ImageUrl="/_layouts/images/GORTL.GIF"
        Location="EditControlBlock"
        Sequence="101"
        Title="Assemble Open XML Document (New)" >
        <UrlAction Url="~site/_layouts/CustomApplicationPages/AssembleDocumentNew.aspx?ItemId={ItemId}&amp;ListId={ListId}"/>
    </CustomAction>
</Elements>

The XML above defines two custom actions, "Assemble Open XML Document" and "Assemble Open XML Document (New)". These commands direct users to two different ASP.NET pages, where they can specify the name of the merged document. Both pages contain a text field and an Assemble Document button:

image
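Each page receives the `ItemId` and `ListId` values that SharePoint substitutes into the custom action URL. As a hedged sketch, the values could be validated before use so that a malformed link produces a friendly message rather than an exception. The `AssembleRequest` helper below is hypothetical and uses only base class library calls available in .NET 3.5; the GUID in `Main` is a made-up sample value.

```csharp
using System;

// Hypothetical helper (not part of the original solution).
public static class AssembleRequest
{
    // Returns false instead of throwing when either value is missing or malformed.
    public static bool TryRead(string listIdText, string itemIdText,
                               out Guid listId, out int itemId)
    {
        listId = Guid.Empty;

        // List item IDs are positive integers.
        if (!int.TryParse(itemIdText, out itemId) || itemId <= 0)
        {
            itemId = 0;
            return false;
        }

        if (string.IsNullOrEmpty(listIdText))
        {
            itemId = 0;
            return false;
        }

        try
        {
            // The Guid constructor accepts the value with or without braces.
            listId = new Guid(listIdText);
            return true;
        }
        catch (FormatException)
        {
            itemId = 0;
            return false;
        }
        catch (OverflowException)
        {
            itemId = 0;
            return false;
        }
    }
}

public static class Program
{
    public static void Main()
    {
        Guid listId;
        int itemId;

        // Sample values standing in for Request.QueryString["ListId"] and ["ItemId"].
        bool ok = AssembleRequest.TryRead("{0f8fad5b-d9cb-469f-a165-70867728950e}", "7",
                                          out listId, out itemId);
        Console.WriteLine(ok + " " + itemId);

        Console.WriteLine(AssembleRequest.TryRead("not-a-guid", "7", out listId, out itemId));
    }
}
```

In the page itself, the two strings would come from `Request.QueryString`, as the click handler in the next step shows.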

Step 4 – Merge Documents with the Open XML SDK

The document assembly solution is invoked by the Assemble Document button on the ASP.NET page mentioned above. This command performs the following actions:

  1. Retrieve the template document from the SharePoint library
  2. Open the template document with the Open XML SDK
  3. Retrieve all content controls and their content from the template document
  4. Retrieve all referenced documents from the appropriate SharePoint library
  5. Replace the content control within the template with the merged content from the SharePoint library (using altChunks)
  6. Save the final document with the name/path specified in the text field on the ASP.NET page

The following code snippet accomplishes these actions:

    using System;
    using System.Collections.Generic;
    using System.IO;
    using System.Linq;
    using DocumentFormat.OpenXml;
    using DocumentFormat.OpenXml.Packaging;
    using DocumentFormat.OpenXml.Wordprocessing;
    using Microsoft.SharePoint;

    protected void AssembleDocumentBtn_Click(Object sender, EventArgs e)
    {
        SPSite siteCollection = this.Site;
        SPWeb site = this.Web;
        site.AllowUnsafeUpdates = true;

        lblTemplateDocument.Text = tbNewDocumentName.Text + "<=====";

        // the custom action passes the list and item ids in the query string
        string ListId = Request.QueryString["ListId"];
        SPList list = site.Lists[new Guid(ListId)];
        string ItemId = Request.QueryString["ItemId"];
        SPListItem item = list.Items.GetItemById(Convert.ToInt32(ItemId));

        if (list is SPDocumentLibrary)
        {
            SPDocumentLibrary documentLibrary = (SPDocumentLibrary)list;

            SPFile file = site.GetFile(item.Url);

            // copy the template into an expandable in-memory stream
            byte[] byteArray = file.OpenBinary();
            using (MemoryStream mem = new MemoryStream())
            {
                mem.Write(byteArray, 0, (int)byteArray.Length);
                using (WordprocessingDocument myDoc = WordprocessingDocument.Open(mem, true))
                {
                    // read the "Library/Document" text of every content control
                    var contentControls = myDoc.MainDocumentPart
                        .Document
                        .Descendants<SdtBlock>()
                        .Select(b => GetTextFromContentControl(b));

                    var insertList = contentControls
                        .Select(s => s.Trim().Split('/'))
                        .Where(g => g.Count() == 2)
                        .Select(g => new { DocumentLibrary = g[0], DocumentName = g[1] });

                    var libraryList = insertList.Select(c => c.DocumentLibrary)
                        .Distinct().ToList();
                    List<InsertDocument> insertDocumentList = new List<InsertDocument>();

                    foreach (var lib in libraryList)
                    {
                        SPList clauseList = site.Lists[lib];
                        SPListItemCollection listItems = clauseList.Items;

                        // get list of all insert documents from the document
                        // libraries
                        for (int i = 0; i < listItems.Count; ++i)
                        {
                            insertDocumentList.Add(
                                new InsertDocument()
                                {
                                    DocumentLibraryName = lib,
                                    DocumentName = listItems[i]["LinkFilename"].ToString(),
                                    ListItems = listItems
                                });
                        }
                    }

                    ReplaceContentControls(myDoc, insertDocumentList);
                }

                // write it back to the document library
                SPFolder fldr = site.GetFolder(list.RootFolder.Url);
                SPFileCollection files = fldr.Files;
                files.Add(tbNewDocumentName.Text, mem, true);
            }

            string libraryRelativePath = documentLibrary.RootFolder.ServerRelativeUrl;
            string libraryPath = siteCollection.MakeFullUrl(libraryRelativePath);
            Response.Redirect(libraryPath);
        }
    }

    private void ReplaceContentControls(WordprocessingDocument myDoc,
        List<InsertDocument> insertDocumentList)
    {
        MainDocumentPart mainPart = myDoc.MainDocumentPart;
        List<SdtBlock> sdtList = mainPart.Document.Descendants<SdtBlock>().ToList();
        foreach (var sdt in sdtList)
        {
            string[] text = sdt.InnerText.Trim().Split('/');

            // skip content controls that do not hold a "Library/Document" reference
            if (text.Length != 2)
            {
                continue;
            }

            InsertDocument insertDocument =
                GetInsertDocument(insertDocumentList, text[0], text[1]);

            if (insertDocument != null)
            {
                // create unique AltChunkId
                string altChunkId = "AltChunkId" + (insertDocument.Idx).ToString();

                // grab the file from SharePoint
                // note: Idx is the position in the combined list, so this lookup
                // is only correct when all documents come from a single library
                SPFile insertFile = insertDocument.ListItems[insertDocument.Idx].File;
                byte[] insertDocByteArray = insertFile.OpenBinary();

                // create the new chunk part
                AlternativeFormatImportPart chunk =
                    mainPart.AddAlternativeFormatImportPart(
                        AlternativeFormatImportPartType.WordprocessingML, altChunkId);

                // create a memory stream from the byte array, and feed the
                // memory stream into the newly created chunk
                using (MemoryStream insertMem = new MemoryStream())
                {
                    insertMem.Write(insertDocByteArray, 0, (int)insertDocByteArray.Length);
                    insertMem.Seek(0, SeekOrigin.Begin);
                    chunk.FeedData(insertMem);
                }

                AltChunk altChunk = new AltChunk();
                altChunk.Id = altChunkId;

                // add the chunk element and remove the content control
                OpenXmlElement parent = sdt.Parent;
                parent.InsertAfter(altChunk, sdt);
                sdt.Remove();
            }
        }
        mainPart.Document.Save();
    }

    // StringConcatenate is a custom extension method, not part of LINQ;
    // it is assumed to be defined elsewhere in the solution
    static string GetTextFromContentControl(SdtBlock contentControlNode)
    {
        return contentControlNode.Descendants<Paragraph>()
            .Select
            (
                p =>
                {
                    var t = p.Elements()
                        .Where(z => z is Run || z is InsertedRun)
                        .SelectMany(r => r.Elements<Text>());
                    return t.StringConcatenate(text => text.Text) + Environment.NewLine;
                }
            ).StringConcatenate();
    }

    private class InsertDocument
    {
        public string DocumentLibraryName { get; set; }
        public string DocumentName { get; set; }
        public SPListItemCollection ListItems { get; set; }
        public int Idx { get; set; }
    }

    private InsertDocument GetInsertDocument(List<InsertDocument> source,
        string library, string document)
    {
        return source.Select(
            (s, i) => new InsertDocument()
            {
                DocumentLibraryName = s.DocumentLibraryName,
                DocumentName = s.DocumentName,
                ListItems = s.ListItems,
                Idx = i
            })
            .Where(s => s.DocumentLibraryName == library && s.DocumentName == document)
            .FirstOrDefault();
    }

The above code snippet will merge the Open XML documents together and will ensure that all formatting and content are preserved. Here is a screenshot of what the merged document looks like:

image

That being said, the code snippet will not update your table of contents, as shown below:

image

Step 5 – Updating Fields with Word Automation Services

Instead of requiring users to manually update their table of contents, we can perform this action automatically with Word Automation Services. That's where our second custom action command will come into play. The second custom action command is exactly the same as our first command except that it will also invoke Word Automation Services. Here is the code snippet for the second custom action command:

    protected void AssembleDocumentNewBtn_Click(Object sender, EventArgs e)
    {
        ...

        if (list is SPDocumentLibrary)
        {
            ...

            string outputFileUrl = site.Url + "/" + tbNewDocumentName.Text;

            ConversionJob convJob = new ConversionJob("Word Automation Services");
            convJob.Name = "Document Assembly";
            convJob.UserToken = site.CurrentUser.UserToken;
            convJob.Settings.UpdateFields = true;
            convJob.AddFile(outputFileUrl, outputFileUrl);
            convJob.Start();

            string libraryRelativePath = documentLibrary.RootFolder.ServerRelativeUrl;
            string libraryPath = siteCollection.MakeFullUrl(libraryRelativePath);
            Response.Redirect(libraryPath);
        }
    }

As you can see, calling into Word Automation Services is pretty easy; it's only six lines of code!
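One point worth keeping in mind is that `Start()` only queues the job. The conversion itself runs later on the server, so the table of contents may not be updated yet when the user is redirected back to the library. The sketch below shows one way a caller might wait for the result. It is an illustration under assumptions rather than part of the original solution: the `ConversionJobWaiter` helper is hypothetical, the `ConversionJobStatus` constructor and its `Count`, `Succeeded`, `Failed` and `Canceled` properties are used as I understand them from the `Microsoft.Office.Word.Server.Conversions` namespace and should be checked against the SDK documentation, and the code only runs on a SharePoint 2010 server with Word Automation Services configured.

```csharp
using System;
using System.Threading;
using Microsoft.Office.Word.Server.Conversions;

// Hypothetical helper (not part of the original solution).
public static class ConversionJobWaiter
{
    // Polls until every item in the job has finished or the timeout elapses.
    // Returns true only if all items finished and none failed or was canceled.
    public static bool WaitForJob(string serviceName, Guid jobId, TimeSpan timeout)
    {
        DateTime deadline = DateTime.UtcNow + timeout;
        while (DateTime.UtcNow < deadline)
        {
            // Create a fresh status object on each pass so the counts are current.
            ConversionJobStatus status = new ConversionJobStatus(serviceName, jobId, null);
            int finished = status.Succeeded + status.Failed + status.Canceled;
            if (finished == status.Count)
            {
                return status.Failed == 0 && status.Canceled == 0;
            }

            // Wait a few seconds before asking again.
            Thread.Sleep(TimeSpan.FromSeconds(5));
        }

        // Still queued or in progress when the timeout expired.
        return false;
    }
}
```

A call might look like `ConversionJobWaiter.WaitForJob("Word Automation Services", convJob.JobId, TimeSpan.FromMinutes(20))`. Blocking a page request for that long is rarely a good idea, so a check like this probably fits better in a background process or a separate status page.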

End Result

Using the above code we should end up with a merged document that has an updated table of contents:

image

I am excited about this solution because it shows you the power of combining the Open XML SDK with Word Automation Services.

Zeyad Rajabi

Comments

  • Anonymous
    February 10, 2010
    You mention layout again in this article, yet all the research I've done so far indicates that Word Automation Services only supports print automation and file conversion at this time. Can you please expound on the layout (i.e. pagination) control and when/if it will be supported. Or, if I've missed it, please talk more about how it is implemented? Thanks!

  • Anonymous
    February 10, 2010
    I left this same comment on the Word Automation Services page, but never got a response. Maybe someone will address it here... Our biggest bottleneck at the moment is the requirement to use Word for Windows on the server side for evaluating document pagination and manipulating that pagination in some cases (e.g. the final document has a signature orphaned on the last page by itself, and the last paragraph is too big to allow widow/orphan control to automatically move it to the last page with the signature). To address situations like this one, we need to determine whether the signature block is the first thing on the last page of the document and, if so, count several lines before it and insert a hard page break. I was hoping that Word Automation Services was going to provide pagination, but it does not appear that the functionality is in there. Can someone please suggest an enterprise solution for evaluating a Word document based on final pagination and programmatically modifying that document as described above, or point me in the right direction for learning how Word Automation Services supports this? Thanks!

  • Anonymous
    February 10, 2010
    Drewberk – Thanks for the feedback. Unfortunately, Word Automation Services does not update or add lastRenderedPageBreak elements in your document. It will, however, repaginate your document so that any page reference field, such as those in a table of contents, is updated appropriately. As a workaround for your scenario, you could do the following:

  1. Add a hidden paragraph containing a page number field between every pair of paragraphs
  2. Use Word Automation Services to update and repaginate the document
  3. Look at the page numbers within the fields to determine the pages within your document
  4. Perform your custom document manipulation
  5. Remove the hidden paragraphs from the document

    Hopefully this workaround works for you.

  • Anonymous
    February 17, 2010
    @Zeyad - Interesting suggestion. Thanks! I'll look into that.

  • Anonymous
    February 23, 2010
    Hi Zeyad, At the PDC conference session on the Open XML SDK and Word Automation Services you mentioned that we can even convert files to OpenDocument Text (ODF format). Is that really possible? I cannot seem to find any example, and the save formats under the Microsoft.Office.Word.Server.Conversions namespace do not contain ODF as an option. Could you please elaborate on this? Kind regards, Wamiq Ansari