Java and MS-Word
Java and MS-Word - followup
Earlier this month, I posted some references to Java->WordML interop material. This is a follow-up.
I proved to myself that it is pretty easy and straightforward to use Java to dynamically create MS-Word documents that conform to the WordProcessingML schema. Anyone can do this with the schema documentation and an XML-aware Java application platform.
To use this approach, a developer really needs a working installation of Word 2003 for the development or design stage: to design the document and generate the initial XML, and to verify that what you are producing is a valid WordProcessingML document.
How did I do it?
You all know that Microsoft Word (and other Office applications) can load and save XML, and you know the schema is published by Microsoft.
The XML phreaks out there, maybe they like to wake up in the morning, drink 7 cups of Starbucks' best, look at a schema, and start coding angle brackets. Not me. Given an XML schema of reasonable complexity, I have little hope of independently generating an XML document that conforms to that schema within my lifetime. So what I did was use MS-Word as the designer. I just wrote a document. Anybody can do that. I designed the document exactly as I wanted it. Then File... Save As... XML. Boom, I have a template document that conforms to WordProcessingML.
From that starting point, I took two paths. The first was to place keywords or fields within that template document, to be replaced programmatically at runtime with a simple text replacement library. In Java, the java.lang.String class has a replaceAll() method that accepts a regular expression and inserts replacement text. Easy. I just inserted a set of "fields" that look like ##NAME##. These are not MS-Word "fields", just plain old text within the XML document, in a well-known format. You can use any format you like. $$NAME$$ if you want, or whatever.
The Java application then populates a Hashtable of name/value pairs, then mechanically replaces all the fields in the doc whose names are present in the Hashtable, with the value of that key. Simple. Find ##FOO## in the doc, and replace that with Hashtable.get("FOO"). The Hashtable can be populated by any means - I inserted the current time of day as one of the name/value pairs, and I also populated the list with data from a SQL query. It could also be populated from a webservices call. Whatever. It's just a Hashtable.
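A minimal sketch of the template path, assuming the WordML saved from Word has already been read into a String (file I/O omitted). WordTemplateFiller, fill and xmlEscape are hypothetical names, not the post's original source. It uses String.replaceAll() as described, with three details that matter in practice: Pattern.quote() makes a marker such as $$NAME$$ match literally, Matcher.quoteReplacement() keeps a $ or \ in a value from being read as a group reference, and each value is XML-escaped first, as one of the comments on this post points out.

```java
import java.util.Hashtable;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class WordTemplateFiller {

    /** Escapes the five XML special characters in a single pass. */
    public static String xmlEscape(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '<':  sb.append("&lt;");   break;
                case '>':  sb.append("&gt;");   break;
                case '&':  sb.append("&amp;");  break;
                case '"':  sb.append("&quot;"); break;
                case '\'': sb.append("&apos;"); break;
                default:   sb.append(c);
            }
        }
        return sb.toString();
    }

    /**
     * Replaces every delim+key+delim marker in the template with the
     * XML-escaped value for that key. Markers whose key is not in the
     * map are left untouched.
     */
    public static String fill(String templateXml, Map<String, String> values, String delim) {
        String result = templateXml;
        for (Map.Entry<String, String> e : values.entrySet()) {
            // Pattern.quote: match the marker literally, even for delimiters
            // such as "$$" that are regex metacharacters.
            String marker = Pattern.quote(delim + e.getKey() + delim);
            // Matcher.quoteReplacement: a '$' or '\' in the value would
            // otherwise be read by replaceAll() as a group reference.
            String value = Matcher.quoteReplacement(xmlEscape(e.getValue()));
            result = result.replaceAll(marker, value);
        }
        return result;
    }

    public static void main(String[] args) {
        // A tiny stand-in for the WordProcessingML that Word 2003 saved.
        String template = "<w:p><w:r><w:t>Dear ##NAME##, generated at ##TIME##.</w:t></w:r></w:p>";
        Map<String, String> values = new Hashtable<String, String>();
        values.put("NAME", "Smith & Sons");
        values.put("TIME", new java.util.Date().toString());
        System.out.println(fill(template, values, "##"));
    }
}
```

A marker whose name is missing from the map stays in the output, which makes gaps easy to spot when the result is opened in Word. A Hashtable works exactly as the post describes; any Map will do.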
After replacing the "fields", the result was a legal WordProcessingML document, dynamically generated from data. Load that doc into MS-Word, print it, whatever. Easy.
The second path I took was more XML-ish. My data source was an XML document. All data, including current time of day, and anything you might retrieve from a database, gets formatted into an XML document. You choose the schema. This doc could be obtained via a webservices call, from a database query (SQL Server and other databases can return XML documents in response to queries) or just formed in memory. I took the latter approach. Anything will do.
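One way to form the data document in memory is with the standard DOM API (javax.xml.parsers and org.w3c.dom). This is a sketch only: the element names (order, generated, customer, item) are a made-up schema, since the post leaves the choice of schema to you, and OrderDataBuilder is a hypothetical name. A side benefit of building the document through DOM rather than string concatenation is that the serializer escapes characters such as & and < for you.

```java
import java.io.StringWriter;
import java.util.Date;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

public class OrderDataBuilder {

    /** Builds a data document using a hypothetical <order> schema. */
    public static Document build(String customer, String[] items) throws ParserConfigurationException {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        Element order = doc.createElement("order");
        doc.appendChild(order);

        // The current time of day, as in the post's example data.
        appendText(doc, order, "generated", new Date().toString());
        appendText(doc, order, "customer", customer);
        for (int i = 0; i < items.length; i++) {
            appendText(doc, order, "item", items[i]);
        }
        return doc;
    }

    /** Adds <name>text</name> as the last child of parent. */
    private static void appendText(Document doc, Element parent, String name, String text) {
        Element e = doc.createElement(name);
        e.appendChild(doc.createTextNode(text));
        parent.appendChild(e);
    }

    /** Serializes the DOM with an identity transform; special characters get escaped. */
    public static String toXml(Document doc) throws TransformerException {
        StringWriter out = new StringWriter();
        TransformerFactory.newInstance().newTransformer()
                .transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        Document doc = build("Smith & Sons", new String[] {"Widget", "Gadget"});
        System.out.println(toXml(doc));
    }
}
```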
I then deconstructed the template XML document and turned it into an XSL transform that accepts the XML data document and, again, produces a WordProcessingML document. Then it is a simple matter of applying the XSL transform programmatically at runtime. This requires at least Java 1.4, the first release to bundle the JAXP transformation API (javax.xml.transform). You should all be using 1.4 anyway, because it is more current with security fixes. Also, take this route only if you are comfortable with XSL. It is hairy for some people.
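A sketch of applying the transform at runtime with the JAXP transformation API (javax.xml.transform), which ships with the JDK. The stylesheet is a deliberately tiny, hypothetical stand-in: a real one would be carved out of the XML that Word 2003 saved, as described above, and would carry Word's styles, fonts and document properties. It does use the WordprocessingML 2003 namespace and the mso-application processing instruction that Word-saved XML files carry; whether Word accepts a document this bare is an assumption to check by opening the output in Word. WordXslGenerator and the order/customer/item data elements are hypothetical.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class WordXslGenerator {

    // Hypothetical stand-in for the stylesheet derived from the Word-saved template.
    static final String XSL =
          "<xsl:stylesheet version='1.0'"
        + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'"
        + " xmlns:w='http://schemas.microsoft.com/office/word/2003/wordml'>"
        + "<xsl:output method='xml' encoding='UTF-8'/>"
        + "<xsl:template match='/order'>"
        // Tells Windows to open the .xml output in Word.
        + "<xsl:processing-instruction name='mso-application'>progid=\"Word.Document\"</xsl:processing-instruction>"
        + "<w:wordDocument><w:body>"
        + "<w:p><w:r><w:t>Order for <xsl:value-of select='customer'/></w:t></w:r></w:p>"
        // One paragraph per <item> in the data document.
        + "<xsl:for-each select='item'>"
        + "<w:p><w:r><w:t><xsl:value-of select='.'/></w:t></w:r></w:p>"
        + "</xsl:for-each>"
        + "</w:body></w:wordDocument>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    /** Applies the XSL transform to the XML data document and returns the WordML. */
    public static String transform(String dataXml, String xsl) throws TransformerException {
        TransformerFactory factory = TransformerFactory.newInstance();
        Transformer transformer = factory.newTransformer(new StreamSource(new StringReader(xsl)));
        StringWriter out = new StringWriter();
        transformer.transform(new StreamSource(new StringReader(dataXml)), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws TransformerException {
        String data = "<order><customer>Jane Doe</customer>"
                + "<item>Widget</item><item>Gadget</item></order>";
        System.out.println(transform(data, XSL));
    }
}
```

In a web application the same transform() call can write straight to the response stream instead of a StringWriter; compiling the stylesheet once into a javax.xml.transform.Templates object avoids re-parsing it on every request.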
Either path - the template version or the XSL transform - produces the same result: a valid WordProcessingML document. Either works for standalone applications or in web applications.
In Action
Those of you who are familiar with XML technologies won't be surprised to learn that it just works. But even so, the ability to dynamically generate a rich Word document, with images, text formatting, tables, and so on, all from Java, may open up some possibilities for you. Check it out for yourself. Here's a working example that uses a JSP to dynamically generate a document file. You should have MS-Word installed on your PC if you want to see the result.
Next up
I didn't try the XSL-FO route or the RenderX stylesheet I mentioned in my previous post. I also did not try to slurp documents with a custom schema into Word, and I didn't transmit the XML documents over web services. I may explore some of these things in the future. Anyone have any other ideas?
Let me know what you think!
Here's the example, including links to source code.
Enjoy.
-Dino
Comments
Anonymous
March 30, 2005
Dino Chiesa of Microsoft shows how to dynamically generate WordML documents using Java and XSLT. Yep, that's not a typo: Microsoft, WordML and Java. XML serves as peacemaker again. And he even provides a working JSP demo. Cool...
Anonymous
March 31, 2005
If you are taking the "Replace All" approach, such as in CreateOrderConfViaTemplate.java, the value you insert into the XML should be XML-encoded.
For example, the following characters must be escaped:
"less-than" (<) as &lt;
"greater-than" (>) as &gt;
"apostrophe" (') as &apos;
"double-quote" (") as &quot;
"ampersand" (&) as &amp;
Anonymous
March 31, 2005
Good point, Martin. I've updated the examples. Thanks.
Anonymous
April 07, 2005
I need to convert a generated WordML document to a .doc file. Does anybody know how to do this? I would prefer a Java solution, but a .NET solution is OK too.
Anonymous
April 18, 2005
@Gunther,
To do that, you could automate MS-Word from .NET: open the WordML file, then call SaveAs.
There are examples of how to automate Office in the .NET SDK install.
Anonymous
April 19, 2005
Can we achieve Word's mail merge functionality with XML data using this approach?
Anonymous
April 28, 2005
@Gunther, Dino,
A Java WordprocessingML-to-Doc converter sure would be nice, though. I'm a Mac user. I paid half a grand for Office Pro, but Word 2004 doesn't do XML. I have to buy yet another copy of Word, 2003, and run it in Virtual PC, and I can't script the conversion from the OS X side. Where's the interop in that? In the future I really hope to see full support of WordprocessingML in all versions of Word, so that someday we can actually distribute documents in that format; until then, a portable wordml2doc converter would be a good thing for all.
Anonymous
April 28, 2005
Hi, I was trying to view and download the example you mentioned for generating a Word file from a JSP. Unfortunately, the link was not working. Would it be possible to email me the example with source code?
Thanks in advance,
Mamun
Anonymous
May 26, 2005
@Mamun,
Sorry, the quality of service on that machine is a little low. It was sitting on an old laptop that had some power problems. I've since migrated it to a newer machine. The link ought to work now.
http://dinoch.dyndns.org:7070/WordML/
Anonymous
February 05, 2006
PingBack from http://www.neirrek.com/blog/2005/05/11/xml-a-la-rescousse-dela-generation-de-documents-microsoft-office-2003/
Anonymous
October 03, 2006
A while back, the OpenXmlDeveloper.org website offered an example of how to create a WordProcessingML
Anonymous
January 17, 2007
In the past I've posted some articles [ 1 , 2 ] about generating Office 2003 documents from a server-side
Anonymous
September 09, 2008
You can use Rtf Writer2 to write RTF and open it in Word or OpenOffice (Writer) ...
Anonymous
April 09, 2009
Hi, I was trying to view and download the example you mentioned for generating a Word file from a JSP. Unfortunately, the link was not working. Would it be possible to email me the example with source code? Thanks in advance, Mathieu
Anonymous
April 20, 2009
Hi, I am looking for Java code or a utility to check whether a given MS Word document has track changes turned on. Any help is appreciated.
Anonymous
May 14, 2009
Your source code links ain't working 15/05/2009
Anonymous
May 29, 2009
Yes, my server is down and cannot get up! Sorry!
Anonymous
August 11, 2009
If you have an example of this, please send it to my mail ID. Thanks, Ghouse
Anonymous
September 26, 2009
Please send me the example source code at zrosko@yahoo.com
Anonymous
October 05, 2009
Please send the example source code to facp77@live.com.mx
Anonymous
October 10, 2009
I am doing a similar job and need help. Please send the example to wa0805@hotmail.com if you can. Thanks.
Anonymous
November 27, 2009
Hi, I am trying to do a similar job, but need help with tables and images. How do I use data in the XML file to populate a Word table? And similarly, how do I get Word to load an image from a link? Does anyone have a working example?
Anonymous
December 23, 2009
The comment has been removed
Anonymous
January 26, 2010
Hi, I'm trying to make something similar to this. Can you mail your source code to paonethestar@gmail.com so I can use it as a starting point? Thanks, Pavan
Anonymous
March 10, 2010
It's a shame that the source code is no longer available. Could you please send it to me at raluca.stanculescu@gmail.com
Anonymous
April 27, 2010
Dino, can you mail me your code? Thanks
Anonymous
November 30, 2010
Dino, please send the source code to narayanf1@gmail.com. It would be nice if you could share it on some website, since we don't expect you to email it whenever someone asks here :-P
Anonymous
December 09, 2010
Good solution, Dino. Can we convert a .DOT (a Word template file) to a .DOC (a Word document file) programmatically by filling in some values at given places? For example, my template would have "Attribute Display Name: Attribute Value", and the program should fill in these values when I pass an array of values or name/value pairs. Let me know if this is achievable, preferably in Java, C or C++. Post your solution to rnvssudheer@hotmail.com. Thanks, Sudheer
Anonymous
April 05, 2012
The comment has been removed