Mail Merging Into a Single Word Document Using XML Literals
With the release of Microsoft Office 2007 we can work with a much simpler, standard, XML format called Open XML which opens the door for many types of applications that cannot work via COM. What if you needed to build a scalable web service that processes many documents in high volume? What if you wanted to quickly read or write to these formats from a client application but wanted to have minimal dependencies on other applications? These types of programs do not want to require Microsoft Office be installed to run. The cool thing is you have the tools already with Visual Basic 2008. XML Literals are an easy way to manipulate any kind of XML, including Open XML.
I’ve written before on how to use XML Literals to manipulate Word documents, if you missed them:
- DevProConnections Article: Taking Advantage of LINQ and XML in Microsoft Office 2007
- Sample: Working with Open XML in Visual Basic
- OBA Part 3 - Storing and Reading Data in Word Documents
- Accessing Open XML Document Parts with the Open XML SDK
- Mail Merging with Word, LINQ and XML Literals in VB
If you aren’t familiar with XML Literals or Open XML then I’d suggest reading the article in the first bullet above first. It’s a great intro into XML Literals, Open XML and it also demonstrates a couple practical ways of reading and writing to Word document formats directly.
Recently I got a great question on how to change the program outlined in the last bullet above to merge letters we were creating on the fly from a database into a single document instead of multiple documents like we’re doing. This would make it much easier for an end user to print them. That’s such an obvious thing I can’t believe I didn’t think of that! For those of you that aren’t familiar with what we did I’ll give a quick recap of that application.
Creating Word Documents with XML Literals
What I wanted to do is take all the Customers in the Northwind database who had some orders shipping today and send them thank you letters. To get started I created a new Word 2007 document with the letter text and some placeholder field names to indicate where I want the data:
Save the document (I named it Doc1.docx) and then rename the .docx extension to .zip and you can drill into the package and see a bunch of XML documents inside it. If we drill down through the zip file we'll see that the text we just typed is located in the \word\document.xml file. Copy this xml and paste it into the Visual Basic editor, it will infer it as an XDocument object, and then you can use embedded expressions to replace the placeholders. (If I’ve lost you, read this article which explains it step-by-step.)
In order to create letters for our customers in the database, we want to end up with a collection of XDocuments with our data merged into them. I created a simple class called Letter that has two properties, CustomerID As String and Document As XDocument. Then I wrote a LINQ query to select the data and embed it into the document (I’ve omitted all the WordProcessingML for clarity, take a loot at the code sample for the whole listing.)
Dim letters = _
From Order In db.Orders _
Join Employee In db.Employees _
On Order.EmployeeID Equals Employee.EmployeeID _
Where Order.OrderDate IsNot Nothing AndAlso _
Order.ShippedDate IsNot Nothing AndAlso _
Order.ShippedDate.Value.Date >= #1/1/2007# _
Let DateOrder = Order.OrderDate.Value.ToShortDateString _
Let DateShip = Order.ShippedDate.Value.ToShortDateString _
Select New Letter With { _
.CustomerID = Order.Customer.CustomerID, _
.Document = <?xml version="1.0" encoding="utf-8" standalone="yes"?>
<w:document ...
<w:p w:rsidR="00705CFF" w:rsidRDefault="00112228" w:rsidP="00AA2EC6">
<w:r>
<w:br/>
</w:r>
<w:r w:rsidR="007A5236">
<w:t xml:space="preserve">Dear </w:t>
</w:r>
<w:proofErr w:type="spellStart"/>
<w:r>
<w:t> <%= Order.Customer.ContactName %> </w:t>
</w:r>
<w:proofErr w:type="spellEnd"/>
<w:r w:rsidR="00AA2EC6">
<w:t>,</w:t>
</w:r>
</w:p>
<w:p w:rsidR="00E04FB0" w:rsidRDefault="00AA2EC6" w:rsidP="00AA2EC6">
<w:r>
<w:t xml:space="preserve">We’d like to inform you that the order you placed on </w:t>
</w:r>
<w:proofErr w:type="spellStart"/>
<w:r w:rsidR="00112228">
<w:t> <%= DateOrder %> </w:t>
</w:r>
<w:proofErr w:type="spellEnd"/>
<w:r w:rsidR="00806521">
<w:t xml:space="preserve"> has shipped on </w:t>
</w:r>
<w:proofErr w:type="spellStart"/>
<w:r w:rsidR="00112228">
<w:t> <%= DateShip %> </w:t>
</w:r>
... </w:document>}
The query returns an IEnumerable(Of Letter) objects with the Document property set to the document.xml data for that customer. The next step involved taking the template Doc1.docx file, making a copy of it and then simply replacing the document.xml part inside the package with the one here in the Letter class for each letter. However, what I did was create separate letters for each customer. Instead we want to create one .docx file with ALL the letter data merged inside.
Merging into a Single Word Document
The first thing we need to do is insert a page break between all our letters. The easiest thing to do is to open back up your template Doc1.docx file in Word and insert a page break, save it, and then look at the document.xml again in the package. You should see this WordProccessingML element inserted near the end:
<w:p w:rsidR="00622A50" w:rsidRDefault="00622A50">
<w:r>
<w:br w:type="page"/>
</w:r>
</w:p>
Paste this into the query above in the document exactly where you pulled it out of the document.xml.
Next we have a couple options on how to manipulate the package (docx file). I’ve shown how to use the Open XML SDK before to manipulate documents but in this simple case we can use the System.IO.Packaging classes directly. This is because all we’re doing is working with text in the document. If we were working with images or embedded objects it would be easier to use the SDK. In the next post we’ll add photos of the employees on each order into the document, but for this post let’s just focus on how to merge the text.
So the first thing we need to do is Import some namespaces including an XML namespace that we’re going to use:
Imports <xmlns:w="https://schemas.openxmlformats.org/wordprocessingml/2006/main">
Imports System.IO
Imports System.IO.Packaging
Next we need to set up some variables and copy the Doc1.docx template to a new file I’m calling AllLetters.docx.
Dim uri As New Uri("/word/document.xml", UriKind.Relative)
Dim contentType = "application/vnd.openxmlformats-officedocument.wordprocessingml.document.main+xml"
Dim sourceFile = CurDir() & "\Doc1.docx"
Dim letterDir = CurDir() & "\Letters\"
My.Computer.FileSystem.CreateDirectory(letterDir)
Dim mergeFile = letterDir & "AllLetters.docx"
My.Computer.FileSystem.CopyFile(sourceFile, mergeFile, True)
Now we can open the template package and load the main document part as an XElement:
Using p As Package = Package.Open(mergeFile)
'get the main document part (document.xml)
Dim mainPart = p.GetPart(uri)
Dim mainDocumentXML As XElement
Using sr As New StreamReader(mainPart.GetStream)
mainDocumentXML = XElement.Load(sr)
End Using
The next part is where it gets fun. A word document has a top element structure like this:
<w:document>
<w:body>
...
</w:body>
</w:document>
So what we need to do is grab the body of the template, replace it with the first customer’s document body we have and then append the rest of the elements inside the bodies of the rest of the documents. A word document can only have one <w:body> element for it to be legal. The way we get the <w:body> XElement from the document is using the descendants syntax with the three dot notation (…):
Dim mainBody = mainDocumentXML...<w:body>.First()
Dim i = 0
For Each letter In letters
Dim nextBody = letter.Document...<w:body>.First()
If i = 0 Then
'Replace the first body contents in the template
mainBody.ReplaceNodes(nextBody.Elements())
Else
'Append the new contents for the rest of the customers
mainBody.Add(nextBody.Elements())
End If
i += 1
Next
Now that we’ve built up the right body with all our letter data in it, we can replace the main document.xml part in the package and close it.
'Delete the current document.xml file in the template
p.DeletePart(uri)
'Replace that part with our new merged XDocument
Dim replace As PackagePart = p.CreatePart(uri, contentType)
Using sw As New StreamWriter(replace.GetStream())
mainDocumentXML.Save(sw)
sw.Close()
End Using
p.Close()
End Using
Now when we run this, all the letters will be mail merged into the same document for easy printing:
As I mentioned if we have embedded images or objects that we need to replace it gets a little trickier. In the DevProConnections Article I have an example of how to create multiple documents with embedded pictures of the employee’s photo. In the next post I’ll show you how we can use the Open XML SDK to create a single document with embedded pictures as well.
Until then, have a look at the complete code I put up on Code Gallery.
[UPDATE: Merging Text & Photos into a Word Document using Open XML SDK]
Enjoy!
Comments
Anonymous
September 24, 2009
Did you consider using XSLT to perform the replacement and transformation, this would allow the flexibility of changing the document without a recompile? I attempted a similar project for creating public documents for a council in 2005 using Office 2003. The hardest part about it was accepting a word document as a plain .doc file and converting it to .xml to then append it as a supplement to the end of the report. In Office 2003 this project was definitely a death march project, but from looking at your examples it looks like this project would now be significantly easier. I did get the project working however because of the complexity of the xslt's produced and the fact that Linq to XML was not yet available the datasets and xslt's were extremely susceptible to the slightest change.Anonymous
September 24, 2009
Hi Peter, XSLT would certianly work. I have used it in the past for many types of content generation. This problem above is a simple one so it would work well. However, XSTL isn't a real programming language and I find that it quickly gets unweildy if you have a lot of complex logic. If you didn't want to recompile this program you could use XDocument.Load to load the XML and then use axis properties to find the elements you wanted to replace instead of embedded expressions. Cheers, -BAnonymous
October 08, 2009
Thanks Beth, having learned a lot from the program, if we have header and footer in the sourcefile, and using the ReplaceNodes method to replace the body contents, then the we lost the header and footer, wondering if there is some method to solve this problemAnonymous
October 08, 2009
Hi Beth, Just find another article from you, "Merging Text & Photos into a Word Document using Open XML SDK" I got the solution (^_^) you have writen great articles !! thank you so much!Anonymous
November 04, 2009
Thanks Splend! Just for reference for other folks reading this thread, headers and footer appear as links (relationship IDs like images) and so this technique above will work if you have references to them and they can be the same for every page. Inserting the contents into <body> tag should also include the relationship IDs to them in the sectPr element node. i.e. <w:body> ... <w:sectPr w:rsidR="00993580" w:rsidSect="00993580"> <w:headerReference w:type="default" r:id="rId6"/> <w:footerReference w:type="default" r:id="rId7"/> <w:pgSz w:w="12240" w:h="15840"/> ... </w:sectPr> </w:body> Thanks for reading!Anonymous
February 02, 2010
In this example you used an XML literal. I want to do the same thing as 'Peter', i.e. load the XML dynamically so the I don't have to recompile when the user wants to change the template a bit. I need a pointer to some examples as to how I would find and replace your placeholder (e.g. CompanyName) with the dynamic code <% = CompanyName %>. Any help appreciated.Anonymous
February 02, 2010
In this example you used an XML literal. I want to do the same thing as 'Peter', i.e. load the XML dynamically so the I don't have to recompile when the user wants to change the template a bit. I need a pointer to some examples as to how I would find and replace your placeholder (e.g. CompanyName) with the dynamic code <% = CompanyName %>. Any help appreciated.Anonymous
February 03, 2010
Hi PhilM, You wouldn't be able to use an XML literal in this case because the XML is compiled. What you would do instead is use XElement.Load to load in the XML template and then you'd need to query the XML (using axis properties) to find the elements in which you wanted to replace the value. But because WordProcessing XML is hard to query I'd use content controls instead. That way you could just bind them to your custom XML stored inside the document and work with that much easier. I have an example here: http://blogs.msdn.com/bethmassi/archive/2009/02/12/oba-part-3-storing-and-reading-data-in-word-documents.aspx HTH, -BethAnonymous
February 03, 2010
The comment has been removedAnonymous
February 04, 2010
You wouldn't be modifying the content control itself, you'd be modifying the CustomXML that they are bound to, so yes you could transform that using LINQ to XML any way you want as long as the binding remained intact. Though this approach would produce multiple documents. You should be able to get them to repeat or merge the documents using the Open XML SDK though. I haven't played with it but it seems possible.Anonymous
February 04, 2010
The comment has been removedAnonymous
February 04, 2010
The comment has been removedAnonymous
February 04, 2010
Thanks, you've been quite helpful. This is close to being another dead end I guess. At least I learned how it worked. I thought when I saw that you hoisted the XML out of the Word template, put it in the program and modified it, that I could do the same at run time. That way the user could modify the layout of the template without a recompile of the program. No soap I guess. Now what, I'll look over your 'reverse' idea but I really want to have the user have control of the template yet still be able to run a merge using it. There's gotta be a way... sigh.Anonymous
February 04, 2010
You could definitely do this with a VSTO add-in or template. That way you could give the user the ability to drag placeholders onto the actual letter and then you could just use VSTO to query the data and populate the document directly. I'm assuming you don't like this idea though because you're trying to automate and scale the generation process. But VSTO is a great end-user solution. I'll think about your scenario, I think there's a way I just need to focus on it for a bit (and I have a million things running around attm ;-))Anonymous
February 04, 2010
Yeah, I'm actually headed for a server side solution with no MSOffice Installed. So that's why the VSTO solution is out. Thanks, I'll check often to see if you have new ideas. What I'll do for now is recompile when the user changes the letter layout (or anything about it). There must be a way..... Thanks againAnonymous
February 04, 2010
Hey, I found an archaic way to do it!! I can now load the XML from a template docx, loop through all the orders creating letters with the data from the order in each one. It uses your code but drops the XML Literal and doesn't use LINQ. (Sorry, but LINQ seems to be great for compile time). This may be self-evident: Sub Main() Console.WriteLine("Mail merging....") Dim db As New NorthwindDataContext 'Letter query -- Create a query that will create Letter objects for each ' of the customers in Northwind that have shipping orders: 'My version Dim strLetter As String Dim strLetter2 As String 'Now load it dynamically Dim sourceFile = CurDir() & "QuickTest.docx" 'Open the Document template using the Open XML SDK Using wordDoc = WordprocessingDocument.Open(sourceFile, False) 'get the main document part (document.xml) Dim mainPart = wordDoc.MainDocumentPart() Dim mainDocumentXML As XDocument Using sr As New StreamReader(mainPart.GetStream) mainDocumentXML = XDocument.Load(sr) 'load the document XML End Using 'Prepend the document XML. StrLetter becomes our defacto template for each Letter. strLetter = "<?xml version=""1.0"" encoding=""UTF-8"" standalone=""yes""?>" + mainDocumentXML.ToString End Using Dim MyLetters As New List(Of Letter) Dim MyLetter As Letter For Each Order In db.Orders 'No filters to make it simple MyLetter = New Letter strLetter2 = strLetter.Replace("CUSTOMERNAME", HtmlEncode(Order.Customer.CompanyName)) 'Must Import System.Web.Httputility MyLetter.CustomerID = Order.Customer.CustomerID MyLetter.Document = XDocument.Parse(strLetter2) MyLetters.Add(MyLetter) Next CreateOneLetter(MyLetters) End SubAnonymous
February 05, 2010
Hi PhilM, Great, you could also use XSLT as well this way. Just be careful with your Replace statements because it may give you unpredictable results. If you have a placeholder name that the user is also using in the template itself it will overwrite it. Also by using System.Web you won't be able to run on the Client Profile (but that probably doesn't matter for your case). Cheers, -BAnonymous
February 05, 2010
Yeah, the next move would be to figure out how to use XSLT to replace the placeholders with the appropriate data. Then I wouldn't have to go to string, use replace, and then go back. And I probably would not have to import System.Web (I was only looking for HTMLEncode because some of the data has special characters). The Client Profile is new to me but I'm going to run all of this server-side where presumably the full .Net Framework will be installed. Thanks for your help. Anything I can do to say thanks? philm@cottagecomputing.comAnonymous
January 19, 2011
Great blog! I have made some modifications. Instead of adding the following <w:p w:rsidR="00622A50" w:rsidRDefault="00622A50"> <w:r> <w:br w:type="page"/> </w:r> </w:p> for the break page in the query, I have added it in the for each statement, like this For Each letter In letters Dim nextbody = letter.Documents...<w:body>.First If i = 0 Then mainBody.ReplaceNodes(nextbody.Elements) Else 'Add a page break if there is another letter mainBody.Add(<w:p w:rsidR="00622A50" w:rsidRDefault="00622A50"> <w:r> <w:br w:type="page"/> </w:r> </w:p>) mainBody.Add(nextbody.Elements) End If i += 1 Next This way, a blank page is not printed after all other pages have been printed. Good or bad? Is there a better way of doing it?Anonymous
April 11, 2011
Hi, this article is good and i am thankful to you. but some problem i am having that... URI path is not working.... can u please suggest me..... this part Dim uri As New Uri("/word/document.xml", UriKind.Relative) this URI not getting any path value.... and no styles and footer added to the newly created doc file! Please help Regards, Map.