

Java and MS-Word

Java and MS-Word - followup

Earlier this month, I posted some references to Java-to-WordML interop material. This is a followup.

I proved to myself that it is pretty easy and straightforward to use Java to dynamically create MS-Word documents that conform to the WordProcessingML schema. Anyone can do this, using the schema documentation and an XML-aware Java application platform.

To use this approach, a developer really needs a working installation of Word 2003 during the development or design stage: to design the document and generate the initial XML, and to verify that what the application produces is a valid WordProcessingML document.

How did I do it?

You all know that Microsoft Word (and other Office applications) can load and save XML, and you know the schema is published by Microsoft.

The XML phreaks out there, maybe they like to wake up in the morning, drink 7 cups of Starbucks' best, look at a schema, and start coding angle brackets. Not me. Given an XML schema of reasonable complexity, I have little hope of independently generating an XML document that conforms to that schema within my lifetime. So what I did was use MS-Word as the designer. I just wrote a document. Anybody can do that. I designed the document exactly as I wanted it. Then File... Save As... XML. Boom, I have a template document that conforms to WordProcessingML.

From that starting point, I took two paths. The first was to place keywords, or "fields", within that template document, to be replaced programmatically at runtime with a simple text-replacement library. In Java, the java.lang.String class has a replaceAll() method that accepts a regular expression and a replacement string. Easy. I just inserted a set of "fields" that look like ##NAME##. These are not MS-Word "fields", just plain old text of a well-known format within the XML document. You can use any format you like. $$NAME$$ if you want, or whatever.
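One catch with replaceAll(): its first argument is a regular expression, and in its replacement string the characters $ and \ are special. A marker like ##NAME## is harmless, but $$NAME$$ would be read as end-of-input anchors and never match. A minimal sketch of replacing one field literally, assuming Java 5 or later for Pattern.quote() and Matcher.quoteReplacement(); the class name FieldReplaceDemo and the sample markup are hypothetical, and the value is assumed to be XML-safe already (escaping is not shown here).

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class FieldReplaceDemo {

    // Replace every occurrence of a literal field marker with a literal value.
    // Pattern.quote() stops regex metacharacters in the marker (such as "$")
    // from being interpreted; Matcher.quoteReplacement() does the same for
    // "$" and "\" in the replacement text.
    // Assumption: value is already XML-safe; this sketch does no escaping.
    static String replaceField(String doc, String marker, String value) {
        return doc.replaceAll(Pattern.quote(marker), Matcher.quoteReplacement(value));
    }

    public static void main(String[] args) {
        String doc = "<w:t>Dear ##NAME##,</w:t><w:t>Total: $$TOTAL$$</w:t>";

        // "##" has no special meaning in a regex, so quoting is just a safeguard here.
        doc = replaceField(doc, "##NAME##", "Dino");

        // Unquoted, "$$TOTAL$$" would be parsed as anchors and never match.
        doc = replaceField(doc, "$$TOTAL$$", "$42.00");

        System.out.println(doc);
    }
}
```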

The Java application then populates a Hashtable of name/value pairs, then mechanically replaces all the fields in the doc whose names are present in the Hashtable, with the value of that key. Simple. Find ##FOO## in the doc, and replace that with Hashtable.get("FOO"). The Hashtable can be populated by any means - I inserted the current time of day as one of the name/value pairs, and I also populated the list with data from a SQL query. It could also be populated from a webservices call. Whatever. It's just a Hashtable.
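The Hashtable-driven replacement can also be done in a single pass with one regular expression for the ##NAME## format, using Matcher.appendReplacement() and appendTail(). This is a sketch, not the code from the linked example: the class name TemplateFiller, the restriction of field names to letters, digits and underscore, and the choice to leave unknown fields untouched are all assumptions. It also assumes each field sits intact inside one w:t element of the saved template. Values are XML-escaped before insertion, because a raw < or & in the data would make the result invalid XML. Generics and Matcher.quoteReplacement() assume Java 5 or later.

```java
import java.util.Hashtable;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TemplateFiller {

    // Matches a field such as ##FOO## and captures the name FOO.
    private static final Pattern FIELD = Pattern.compile("##(\\w+)##");

    // Escape characters that would otherwise break the surrounding XML.
    static String escapeXml(String s) {
        StringBuilder out = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '<':  out.append("&lt;");   break;
                case '>':  out.append("&gt;");   break;
                case '&':  out.append("&amp;");  break;
                case '"':  out.append("&quot;"); break;
                case '\'': out.append("&apos;"); break;
                default:   out.append(c);
            }
        }
        return out.toString();
    }

    // Replace each ##KEY## whose KEY is present in the table; unknown fields are left as-is.
    static String fill(String template, Hashtable<String, String> values) {
        Matcher m = FIELD.matcher(template);
        StringBuffer result = new StringBuffer(); // appendReplacement takes a StringBuffer before Java 9
        while (m.find()) {
            String value = values.get(m.group(1));
            String text = (value != null) ? escapeXml(value) : m.group(0);
            // quoteReplacement makes any '$' or '\' in the text literal.
            m.appendReplacement(result, Matcher.quoteReplacement(text));
        }
        m.appendTail(result);
        return result.toString();
    }

    public static void main(String[] args) {
        Hashtable<String, String> values = new Hashtable<String, String>();
        values.put("NAME", "Smith & Sons");
        values.put("TIME", new java.util.Date().toString()); // current time of day
        String template = "<w:p><w:r><w:t>##NAME## (##TIME##) ##MISSING##</w:t></w:r></w:p>";
        System.out.println(fill(template, values));
    }
}
```

Scanning once with a general pattern, rather than calling replaceAll() once per Hashtable key, means the template is traversed a single time no matter how many name/value pairs there are.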

After replacing the "fields", the result was a legal WordProcessingML document, dynamically-generated from data. Load that doc into MS-Word, print it, whatever. Easy.
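To load the result into Word, the generated text has to be written with the same encoding its XML declaration names; writing it with the platform default encoding can corrupt non-ASCII characters. A minimal sketch, assuming the template's declaration says encoding="UTF-8" (check your own saved template) and Java 7 or later for java.nio.file; the class name SaveWordML and the file name order.xml are hypothetical.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class SaveWordML {

    // Write the generated WordprocessingML text as UTF-8 so the bytes on disk
    // match the encoding named in the document's XML declaration.
    static Path save(String wordml, String fileName) throws IOException {
        Path path = Paths.get(fileName);
        Files.write(path, wordml.getBytes(StandardCharsets.UTF_8));
        return path;
    }

    public static void main(String[] args) throws IOException {
        // A tiny hand-written stand-in for a filled-in template.
        String doc = "<?xml version=\"1.0\" encoding=\"UTF-8\" standalone=\"yes\"?>\n"
                + "<?mso-application progid=\"Word.Document\"?>\n"
                + "<w:wordDocument xmlns:w=\"http://schemas.microsoft.com/office/word/2003/wordml\">"
                + "<w:body><w:p><w:r><w:t>Caf\u00e9 order confirmed</w:t></w:r></w:p></w:body>"
                + "</w:wordDocument>";
        Path p = save(doc, "order.xml");
        System.out.println("Wrote " + p.toAbsolutePath());
    }
}
```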

The second path I took was more XML-ish. My data source was an XML document. All data, including current time of day, and anything you might retrieve from a database, gets formatted into an XML document. You choose the schema. This doc could be obtained via a webservices call, from a database query (SQL Server and other databases can return XML documents in response to queries) or just formed in memory. I took the latter approach. Anything will do.
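Forming the data document in memory needs only the standard JAXP classes in the JDK. A minimal sketch; since the post leaves the data schema up to you, the order/customer/time/item element names and the class name DataDocBuilder are hypothetical. Building the tree with DOM, rather than concatenating strings, means characters like & in the data are escaped automatically on output.

```java
import java.io.StringWriter;
import java.util.Date;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

public class DataDocBuilder {

    // Build <order><customer/><time/><item/>...</order> (hypothetical schema).
    static Document buildOrder(String customer, String[] items) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        Element order = doc.createElement("order");
        doc.appendChild(order);

        Element cust = doc.createElement("customer");
        cust.appendChild(doc.createTextNode(customer)); // escaped when serialized
        order.appendChild(cust);

        Element time = doc.createElement("time");
        time.appendChild(doc.createTextNode(new Date().toString())); // current time of day
        order.appendChild(time);

        for (String name : items) {
            Element item = doc.createElement("item");
            item.appendChild(doc.createTextNode(name));
            order.appendChild(item);
        }
        return doc;
    }

    // Serialize a DOM tree to a string with an identity transform.
    static String toXml(Document doc) throws Exception {
        Transformer t = TransformerFactory.newInstance().newTransformer();
        t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        t.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        Document data = buildOrder("Smith & Sons", new String[] {"Widget", "Gadget"});
        System.out.println(toXml(data));
    }
}
```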

I then deconstructed the template XML document and turned it into an XSL transform that accepts the XML data document and, again, produces a WordProcessingML document. Then it is a simple matter of applying the XSL transform programmatically, at runtime. This requires at least Java 1.4, which you all should be using anyway because it is more current with security fixes. Also, you should take this route only if you are comfortable with XSL. It is hairy for some people.
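Applying the transform at runtime uses the javax.xml.transform (JAXP) API that ships with Java 1.4 and later. A minimal sketch; the stylesheet is a hypothetical, hand-trimmed stand-in for one derived from a Word-saved template (a real one would carry the styles, fonts and document properties Word writes), and it expects a hypothetical data format of an order element containing customer and item elements. The w namespace URI and the mso-application processing instruction are the ones Word 2003 writes into its XML files; the class name WordMLTransform is hypothetical.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class WordMLTransform {

    // Hypothetical, stripped-down stylesheet: a heading line, then one paragraph per <item>.
    static final String XSL =
        "<xsl:stylesheet version=\"1.0\""
      + " xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\""
      + " xmlns:w=\"http://schemas.microsoft.com/office/word/2003/wordml\">"
      + "<xsl:output method=\"xml\" encoding=\"UTF-8\" standalone=\"yes\"/>"
      + "<xsl:template match=\"/order\">"
      + "<xsl:processing-instruction name=\"mso-application\">progid=\"Word.Document\"</xsl:processing-instruction>"
      + "<w:wordDocument><w:body>"
      + "<w:p><w:r><w:t>Order for <xsl:value-of select=\"customer\"/></w:t></w:r></w:p>"
      + "<xsl:for-each select=\"item\">"
      + "<w:p><w:r><w:t><xsl:value-of select=\".\"/></w:t></w:r></w:p>"
      + "</xsl:for-each>"
      + "</w:body></w:wordDocument>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    // Transform an XML data document (as text) into WordprocessingML (as text).
    static String transform(String dataXml, String xsl) throws Exception {
        Transformer t = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(xsl)));
        StringWriter out = new StringWriter();
        t.transform(new StreamSource(new StringReader(dataXml)), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        String data = "<order><customer>Smith &amp; Sons</customer>"
                    + "<item>Widget</item><item>Gadget</item></order>";
        System.out.println(transform(data, XSL));
    }
}
```

In a real application the stylesheet would more likely be loaded from a file or classpath resource, and the compiled form could be cached with TransformerFactory.newTemplates() so it is not re-parsed on every request.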

Either path - the template version or the XSL transform - produces the same result: a valid WordProcessingML document. Either works for standalone applications or in web applications.

In Action

Those of you who are familiar with XML technologies won't be surprised to learn that it just works. But even so, the ability to dynamically generate a rich Word document, with images, text formatting, tables, and so on, all from Java, may open up some possibilities for you. Check it out for yourself. Here's a working example that uses a JSP to dynamically generate a document file. You should have MS-Word installed on your PC if you want to see the result.

Next up

I didn't try the XSL-FO route or the RenderX stylesheet I mentioned in my previous post. I also did not try to slurp documents with a custom schema into Word. And I didn't transmit the XML documents over webservices. I may explore some of these things in the future. Anyone have any other ideas?

Let me know what you think!

Here's the example, including links to source code.

Enjoy.
-Dino

Comments

  • Anonymous
    March 30, 2005
    Dino Chiesa of Microsoft shows how to dynamically generate WordML documents using Java and XSLT. Yep, that's not a typo: Microsoft, WordML and Java. XML serves as peacemaker again. And he even provides a working JSP demo. Cool....

  • Anonymous
    March 31, 2005
    If you are taking the "Replace All" approach, such as in CreateOrderConfViaTemplate.java, the value you insert into the XML should be XML-encoded.

    For example, the following characters (spelled out) must be escaped:

    "less-than"
    "greater-than"
    "apostrophe"
    "double-quote"
    "ampersand"

  • Anonymous
    March 31, 2005
    Good point Martin. I've updated the examples. Thanks.

  • Anonymous
    April 07, 2005
    I need to convert a generated WordML document to a .doc file. Does anybody know how to do this? I would prefer a Java solution, but a .NET solution is OK too.

  • Anonymous
    April 18, 2005
    @Gunther,
    to do that you could just automate MS-Word in .NET, open the WordML file, then SaveAs.

    There are examples of how to automate Office in the .NET SDK install.

  • Anonymous
    April 19, 2005
    Can we achieve Word's mail merge functionality with XML data using this approach?

  • Anonymous
    April 28, 2005
    @Gunther, Dino,

    A Java WordprocessingML to Doc converter sure would be nice though. I'm a Mac user. I paid half a grand for Office Pro, but Word 2004 doesn't do XML. I have to buy yet another copy of Word, 2003, and run it in Virtual PC, and I can't script the conversion from the OS X side. Where's the inter-op in that? In the future I really hope to see full support of WordprocessingML in all versions of Word so that someday we can actually distribute documents in that format, but until then a portable wordml2doc converter would be a good thing for all.

  • Anonymous
    April 28, 2005
    Hi, I was trying to view and download the example you mentioned about generating a Word file in JSP. Unfortunately, the link was not working. Would it be possible to email me the example with the source code?

    Thanks in advance,

    Mamun

  • Anonymous
    May 26, 2005
    @Mamun,
    Sorry, the quality of service on that machine is a little low. It was sitting on an old laptop that had some power problems. I've since migrated it to a newer machine. The link ought to work now:
    http://dinoch.dyndns.org:7070/WordML/

  • Anonymous
    February 05, 2006
    PingBack from http://www.neirrek.com/blog/2005/05/11/xml-a-la-rescousse-dela-generation-de-documents-microsoft-office-2003/

  • Anonymous
    October 03, 2006
    A while back, the OpenXmlDeveloper.org website offered an example of how to create a WordProcessingML

  • Anonymous
    January 17, 2007
    In the past I've posted some articles [ 1 , 2 ] about generating Office 2003 documents from a server-side

  • Anonymous
    September 09, 2008
    You can use Rtf Writer2 to write rtf and open in Word or OpenOffice (Writer) ...

  • Anonymous
    April 09, 2009
    Hi, I was trying to view and download the example you mentioned about generating a Word file in JSP. Unfortunately, the link was not working. Would it be possible to email me the example with the source code? Thanks in advance, Mathieu

  • Anonymous
    April 20, 2009
    Hi, I am looking for Java code or a utility to check whether a given MS-Word document has track changes turned on. Any help is appreciated.

  • Anonymous
    May 14, 2009
    Your source code links aren't working (15/05/2009).

  • Anonymous
    May 29, 2009
    Yes, my server is down and cannot get up!  Sorry!

  • Anonymous
    August 11, 2009
    If you have an example of this, please send it to my email address. Thanks, Ghouse

  • Anonymous
    September 26, 2009
    Please send me the example source code at zrosko@yahoo.com

  • Anonymous
    October 05, 2009
    Please send the example source code to facp77@live.com.mx

  • Anonymous
    October 10, 2009
    I am doing a similar job and need help. Please send the example to wa0805@hotmail.com if you can. Thanks.

  • Anonymous
    November 27, 2009
    Hi, I am trying to do a similar job, but need help with tables and images. How do I use data in the XML file to populate a Word table? And similarly, how do I get Word to load an image from a link? Does anyone have a working example?

  • Anonymous
    December 23, 2009
    The comment has been removed

  • Anonymous
    January 26, 2010
    Hi, I'm trying to make something similar to this. Can you mail your source code to paonethestar@gmail.com so I can use it as a starting point? Thanks, Pavan

  • Anonymous
    March 10, 2010
    It's a shame that the source code is no longer available. Could you please send it to me at raluca.stanculescu@gmail.com

  • Anonymous
    April 27, 2010
    Dino, can you mail me your code? Thanks.

  • Anonymous
    November 30, 2010
    Dino, please send the source code to narayanf1@gmail.com. It would be nice if you could share it on some website, since we don't expect you to email it whenever someone asks here :-P

  • Anonymous
    December 09, 2010
    Good solution, Dino. Can we convert a .DOT (Word template) file to a .DOC (Word document) file programmatically, by filling in some values at given places? For example, my template would have "Attribute Display Name: Attribute Value", and the program should fill in these values when I pass an array of values or name/value pairs. Let me know if this is achievable, preferably in Java, C or C++. Post your solution to rnvssudheer@hotmail.com. Thanks, Sudheer

  • Anonymous
    April 05, 2012
    The comment has been removed