Word XSLT: Data Only Transform

If you've played around with Word 2003's XML support, you're probably aware that you can load your own schemas into Word and mark up the document with your XML. When you save the file, you get both the WordML and your XML mixed together. This allows you to search the files for your XML while still maintaining the presentation information. You also have the option to save as Data Only, so the result is just your pure XML.

Often, it's best to store the files with both your XML and the WordML, so that all the presentation information is preserved. You can always transform the file later to remove the WordML if at some point you want to work with just your data. Here's a simple transform you can run on a WordML file that will give you the equivalent of a Data Only save:

https://jonesxml.com/resources/basicDataOnly.zip
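
To make the idea concrete, here is a minimal Python sketch of what a Data Only transform has to do, using only the standard library's xml.etree.ElementTree. It is not the linked stylesheet and not how Word itself implements Data Only. It assumes the input is a Word 2003 XML (WordprocessingML) file and treats every element and attribute in a Microsoft/Office namespace as presentation markup. It keeps every other element and collects the text of w:t runs into the nearest custom element. The names data_only and ignore_mixed_content, and the sample namespace urn:example:orders, are hypothetical. The optional flag imitates Word's "ignore mixed content" XML option by keeping text only in leaf elements.

```python
import xml.etree.ElementTree as ET

W_NS = "http://schemas.microsoft.com/office/word/2003/wordml"
W_T = f"{{{W_NS}}}t"  # the element that holds a run's text

# Assumption: every element or attribute in a Microsoft/Office namespace is
# presentation markup (w:, wx:, aml:, o:, v:, w10:, sl:, wsp:, ...).
OFFICE_NS_PREFIXES = ("http://schemas.microsoft.com/", "urn:schemas-microsoft-com:")


def _is_office(name):
    """True if an ElementTree '{uri}local' name belongs to an Office namespace."""
    uri = name[1:].split("}", 1)[0] if name.startswith("{") else ""
    return uri.startswith(OFFICE_NS_PREFIXES)


def _append_text(parent, text):
    """Add text after parent's last child, or as its leading text if it has none."""
    if len(parent):
        parent[-1].tail = (parent[-1].tail or "") + text
    else:
        parent.text = (parent.text or "") + text


def _walk(src, out):
    """Copy custom elements and w:t text from src's children into out."""
    for child in src:
        if child.tag == W_T:
            _append_text(out, child.text or "")
        elif _is_office(child.tag):
            _walk(child, out)  # drop the Word element but keep what is inside it
        else:
            # Custom element: copy its name and its non-Office attributes.
            attrs = {k: v for k, v in child.attrib.items() if not _is_office(k)}
            _walk(child, ET.SubElement(out, child.tag, attrs))


def _strip_mixed(elem):
    """Discard text that sits next to child elements (mixed content)."""
    if len(elem):
        elem.text = None
        for child in elem:
            child.tail = None
            _strip_mixed(child)


def data_only(word_root, ignore_mixed_content=False):
    """Return the single custom root element hidden in a WordprocessingML tree."""
    holder = ET.Element("holder")
    _walk(word_root, holder)
    if len(holder) != 1:
        raise ValueError(f"expected one custom root element, found {len(holder)}")
    root = holder[0]
    root.tail = None  # text after the custom root is not part of the data
    if ignore_mixed_content:
        _strip_mixed(root)
    return root


# Hypothetical input: a custom namespace (urn:example:orders) mixed with WordML.
SAMPLE = (
    '<w:wordDocument xmlns:w="http://schemas.microsoft.com/office/word/2003/wordml"'
    ' xmlns:ns1="urn:example:orders"><w:body><ns1:Order>'
    '<w:p><w:r><w:rPr><w:b/></w:rPr><w:t>Ship to: </w:t></w:r></w:p>'
    '<ns1:Ship_Name><w:p><w:r><w:t>Alfreds</w:t></w:r><w:r><w:t> </w:t></w:r>'
    '<w:r><w:t>Futterkiste</w:t></w:r></w:p></ns1:Ship_Name>'
    '</ns1:Order></w:body></w:wordDocument>'
)

if __name__ == "__main__":
    print(ET.tostring(data_only(ET.fromstring(SAMPLE)), encoding="unicode"))
```

Like the simple transform, this does not turn paragraph boundaries, tabs, or line breaks into whitespace. Because ElementTree only writes namespace declarations that the output actually uses, the result does not carry the WordML declarations along with your elements. For a file on disk, pass ET.parse(path).getroot() instead of ET.fromstring(SAMPLE).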

It's a really simple transform. People have requested some additional functionality for our Data Only save, such as line breaks for paragraphs and pretty printing. I'll look at adding some of that to the transform at some point and send out another update.

-Brian

Comments

  • Anonymous
    July 08, 2005
    Just discovered your blog. Keep the great content coming!

    I wrote a similar stylesheet for "Office 2003 XML". It's included in the book example files: http://examples.oreilly.com/officexml/

    It's the saveDataOnly.xsl file (in the chapter 4 examples), and it includes some other features like a configurable option to ignore mixed content, and the reconstruction of processing instructions stored in custom document properties.

    It had a dual purpose of demystifying the "Save Data Only" behavior (at least for people familiar with XSLT) and of being used in the "Apply Transform" option when saving the main example of the chapter. The biggest advantage as I recall was in giving the developer the choice as to what mixed content to keep and what to throw away, rather than having to make an all-or-nothing decision.

    Anyway, I thought you'd find it interesting to compare with.

    Evan

  • Anonymous
    July 09, 2005
    That's great, Evan. I'll have to look through those other examples too. Thanks for the post!

  • Anonymous
    July 11, 2005
    I would really appreciate a no-holds-barred response to the design goals behind the article “XHTML Schemas in Word 2003 Documents” (http://songhaysystem.com/document.php?cmd=getDoc&get=24). Your post implies that formatting is only preserved by WordML and that any user-defined schemas loaded into a Word document are for data only. Are we confounding the designers, the application architects, of Word 2003 when we decide to load a formatting schema like XHTML into a Word document? Would you openly discourage such a move? Do you, at the very least, find it redundant and therefore useless?

    Please do not be kind to our Mort and be frank in your reply.

  • Anonymous
    July 11, 2005
    Thanks, Evan. I've read some of the sample chapter from Office 2003 XML. This is a start:

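    <?xml version="1.0" encoding="UTF-8" ?>
    <!-- Converts a Word 2003 XML (WordprocessingML) file to simple XHTML. -->
    <!-- exclude-result-prefixes keeps xmlns:o and xmlns:w off the output elements. -->
    <xsl:stylesheet version="1.0"
      xmlns="http://www.w3.org/1999/xhtml"
      xmlns:o="urn:schemas-microsoft-com:office:office"
      xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
      xmlns:w="http://schemas.microsoft.com/office/word/2003/wordml"
      exclude-result-prefixes="o w">

      <xsl:output method="xml" indent="yes" encoding="UTF-8" />

      <xsl:template match="w:wordDocument">
        <html>
          <head>
            <title><xsl:value-of select="o:DocumentProperties/o:Title" /></title>
          </head>
          <body>
            <xsl:apply-templates select="w:body" />
          </body>
        </html>
      </xsl:template>

      <!-- Word usually wraps body content in wx:sect (and custom XML elements can
           wrap paragraphs too), so pick up paragraphs at any depth. -->
      <xsl:template match="w:body">
        <xsl:apply-templates select=".//w:p" />
      </xsl:template>

      <xsl:template match="w:p">
        <p><xsl:apply-templates select="w:r | w:hlink" /></p>
      </xsl:template>

      <!-- A run is italic when it has w:i, unless w:val="off" turns it off.
           A run may hold several w:t elements; value-of would only copy the first. -->
      <xsl:template match="w:r">
        <xsl:choose>
          <xsl:when test="w:rPr/w:i[not(@w:val = 'off')]">
            <em><xsl:apply-templates select="w:t" /></em>
          </xsl:when>
          <xsl:otherwise>
            <xsl:apply-templates select="w:t" />
          </xsl:otherwise>
        </xsl:choose>
      </xsl:template>

      <xsl:template match="w:t">
        <xsl:value-of select="." />
      </xsl:template>

      <xsl:template match="w:hlink">
        <a href="{@w:dest}"><xsl:apply-templates select="w:r" /></a>
      </xsl:template>

    </xsl:stylesheet>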

  • Anonymous
    July 11, 2005
    That's a great start, Bryan. I was going to post something similar but hadn't gotten around to it yet.
    In answer to your earlier question, I definitely do not recommend loading the XHTML schema into Word and marking up the document; that duplicates too much information.
    We designed the XML support so that you could leverage both WordML and your XML together. If there are features such as formatting, lists, and tables that Word already supports, then you don't need to mark that up. Instead you can just take the subset of your schema that isn't already represented by Word functionality, and only mark up with that.
    Then you can just transform on the way out into your schema. At one point I had an example of doing this for DocBook, but I can't seem to find it anywhere. I'll post it if I ever dig it up.
    -Brian

  • Anonymous
    July 21, 2005
    I think the XSLT does not work on complicated files. I have tried using a form for car mileage claims and applying the data-only XSLT given, but there is no output after I ticked the transform and Data Only options when saving the document. Can someone look at my doc file to see if there is any problem with it? Please email me at red131@gmail.com

  • Anonymous
    August 24, 2005
    Thanks Brian, your response is promising. The Word / XML story is becoming a bit clearer and I look forward to hearing more on the list issue, and more in general about Office and XML.

    I have another post on this at my site:

    http://ptsefton.com/blog/2005/08/25/word_xml_clarified_a_bit

  • Anonymous
    September 21, 2005
    Hey Barry, the Word 2003 XML support was not targeting the same scenarios that XML editors like Epic and XMetal are going after.
    The key scenarios were really around taking existing Word based solutions and leveraging the XML to add additional structure. With that added structure you can more easily pull out information from the file as well as program against your XML structures rather than just the Word structures.

    We've done a lot of work in Word "12" to make it even easier to mark up a document with your data while still keeping your data separate from the presentation. The main scenario for the "12" work is to create mappings from the surface of the document into specific XML nodes in your data structures.

    I'll try to provide more information on this, as we've already talked a bit about it at PDC last week. I'll also try to provide more info around the XML support in 2003 and see if I can help you better understand the functionality that is there, and that isn't.
    The main way we've recommended you enable the end user to insert the XML structures into the files is by creating boilerplate document chunks that are pre-structured and that your solution allows them to easily insert in the correct locations.

    -Brian

  • Anonymous
    October 16, 2005
    If you're planning on using the XML saved by Word in any other application, you're in for a wild (read: probably frustrating) ride.

    I'm looking at the issue of producing XHTML from Word's XML output, and there have been some really strange decisions made in terms of how the document's structure is represented.

    Take tables for example - a cell that spans rows (two or more vertical cells, merged) is marked by a self-closing element. At some point, later in the document, another self-closing element appears to end the merge.

    Compare this to the rowspan attribute in XHTML - one attribute tells you in advance how many rows/cells will be affected.

    As a programmer I can't imagine what possible advantage these obscure decisions can have. Bizarre...

  • Anonymous
    January 22, 2006

    I have a WordML doc with a custom schema attached and would like to extract the data from this custom schema on a server. The problem is that the nodes of my custom schema are mixed with the nodes from the WordML schema, so extraction is not easy and not reliable.

    I would like to do this with .NET code on a BizTalk or SharePoint server. Do you have a good idea how to achieve this, or some links about this topic?

    Bye Marcel


    <ns1:Ship_Name>
      <w:proofErr w:type="spellStart" />
      <w:p>
        <w:pPr>
          <w:ind w:left="720" />
        </w:pPr>
        <w:r>
          <w:t>Alfreds</w:t>
        </w:r>
        <w:proofErr w:type="spellEnd" />
        <w:r>
          <w:t> </w:t>
        </w:r>
        <w:proofErr w:type="spellStart" />
        <w:r>
          <w:t>Futterkiste</w:t>
        </w:r>
        <w:proofErr w:type="spellEnd" />
        <w:r>
          <w:t>. </w:t>
        </w:r>
      </w:p>
    </ns1:Ship_Name>

    I would like to get just this as the result:

    <ns1:Ship_Name>Alfreds Futterkiste</ns1:Ship_Name>

    -------------------------------
    www.gnoth.net

  • Anonymous
    January 26, 2006
    Marcel, did you try applying the transform that I linked to? You should be able to apply the transform and it would give you what you're looking for...

    -Brian

  • Anonymous
    February 05, 2006
    This transform is good for extracting data from WordML document with custom XML schema.
    But how can I generate Word XML document dynamically from a custom application to retain all Word formatting and to conform to my own XML custom schema?

  • Anonymous
    February 13, 2006
    Dinko,
    Could you describe your problem a bit more? What is the custom application? Is it consuming a Word document or just generating one? If it is only generating one, then what do you mean by retaining all Word formatting?
    I'm sorry, but I'm having a hard time understanding your question.

    -Brian

  • Anonymous
    February 13, 2006
    Hello Brian,
    We have a Captaris Workflow and a .NET written model for the workflow that needs to generate Word documents. Documents have a lot of fixed formatted text and about a dozen fields that are to be filled automatically during the workflow process.
    So far I have managed to create an XSD schema that contains my custom properties definitions and placed the schema fields in the right places in the document.
    Using the WML2XSLT.EXE tool, I have created an XSLT transform.
    When I save the document with the "Save data only" option, I get not only my custom fields but also all the fixed text that is contained inside the main XML element.
    When I apply the XSLT to this XML, the document is formatted fine, but the problem is that I want only the data in the custom fields to be contained in the data-only XML document.
    This way the end user can change the formatting if he likes and generate a new XSLT, as long as he doesn't tamper with any XML tags in the document.

    I read about this procedure in the Word 2003 SDK and went through the Memo Styles Sample, but the sample is too simple :(

    (http://www.microsoft.com/downloads/details.aspx?FamilyId=4267E2FF-58C0-49DD-BB2A-02C729C68DD0&displaylang=en)

    I hope I made it clear this time :)

    Dinko

  • Anonymous
    February 14, 2006
    Hey Dinko, I think it's a bit more clear. Could you maybe provide a quick example of what your data looks like, and what the Word document looks like? Is it something like this:

    <dinko>
     <field1>Complete</field1>
     <field2>Submitted</field2>
    </dinko>

    and the resulting WordprocessingML looks like this (in shorthand):

    <w:body>
     <w:p>
       <w:t>Status: </w:t>
       <field1><w:t>Complete</w:t></field1>
     </w:p>
     <w:p>
       <w:t>Approval: </w:t>
       <field2><w:t>Submitted</w:t></field2>
     </w:p>
    </w:body>

    What I'm trying to understand is where the rich formatting comes into play. Are the users formatting the values of one of your nodes, or is it somewhere else in the file that you don't care about? When you save, do you want to throw away just the formatting? Or do you want to throw away the data that they've edited too?

    -Brian

  • Anonymous
    February 20, 2006
    Hi,
    The problem is that I have something like this when I save XML data only:

    <dinko>
     Document title
     some formatted text, document body that never changes
    <field1>Complete</field1>
     Text text text text text text etc   <field2>Submitted</field2>
    </dinko>

    I want to generate XML document like this:

    <dinko>
    <field1>Complete</field1>
    <field2>Submitted</field2>
    </dinko>

    and apply an XSLT that contains all the formatting and text. When the end user wants to change something in the template, he would edit it in Word, save it as XML, and use WML2XSLT.EXE to generate a new XSLT, and the workflow application would again just create the same XML:

    <dinko>
    <field1>Complete</field1>
    <field2>Submitted</field2>
    </dinko>

    Or, in simpler words, I want my end users to be able to change the fonts, look, and appearance of a document as long as they leave all the necessary XML tags inside the document.
    My problem is that the main element <dinko> wraps all the text in the document, and I don't want that.


  • Anonymous
    February 20, 2006
    Have you tried the "ignore mixed content" option?
    It's one of the XML options you can set, and it will basically treat all mixed content as presentation text, and only preserve the content that is in leaf nodes.

    Let me know if that is the type of functionality you are looking for.

    -Brian

  • Anonymous
    February 21, 2006
    This was too easy :)
    It works just the way I need it to.

    Thank you for your time!

  • Anonymous
    May 31, 2006
    Hello Brian,

    My problem is very similar to what Dinko explained in the previous posts, except that I want to know how I can programmatically load the data-only XML (whose values change) and apply the same XSLT (generated by WML2XSLT) to get the Word 2003 document.

    Thanks,
    Kris

  • Anonymous
    August 01, 2006
    Using the data-only XSLT given above, my elements are filled with namespace declarations, e.g.:

    <DISA_Header xmlns:w="http://schemas.microsoft.com/office/word/2003/wordml" xmlns:v="urn:schemas-microsoft-com:vml" xmlns:w10="urn:schemas-microsoft-com:office:word" xmlns:sl="http://schemas.microsoft.com/schemaLibrary/2003/core" xmlns:aml="http://schemas.microsoft.com/aml/2001/core" xmlns:wx="http://schemas.microsoft.com/office/word/2003/auxHint" xmlns:o="urn:schemas-microsoft-com:office:office" xmlns:dt="uuid:C2F41010-65B3-11d1-A29F-00AA00C14882" xmlns:wsp="http://schemas.microsoft.com/office/word/2003/wordml/sp2" xmlns:ns6="urn:schema4">DEFENSE INFORMATION SYSTEMS AGENCY PACIFIC (DISA-PAC)THEATER NETOPS CENTER (TNC) NETDEFENSE (ND) PACIFIC</DISA_Header>

    I want to remove the namespace declarations from the element tags. How could I do that?
    Thanks,
    Denise

  • Anonymous
    April 23, 2008
    PingBack from http://ptsefton.com/2008/04/24/some-comments-on-the-nlm-xml-plugin-for-word-2007.htm

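
The October 16, 2005 comment above compares Word's vertical-merge markup with XHTML's rowspan attribute. In WordprocessingML 2003, the top cell of a vertically merged region carries <w:vMerge w:val="restart"/> in its w:tcPr, and each cell beneath it that belongs to the region carries a bare <w:vMerge/> (a continuation). The span is never stated up front, but it can be recovered with one counting pass down each column. The following is a minimal Python sketch of that pass, not part of the linked transform. It works on a hypothetical grid of marker strings rather than on WordML directly, and it assumes no horizontal merges (w:gridSpan) and well-formed input. The names rowspans, to_xhtml, GRID, and TEXTS are illustrative.

```python
from html import escape

# Hypothetical grid markers mirroring WordprocessingML 2003 vertical merges:
#   "restart"  -> <w:vMerge w:val="restart"/>  (top cell of a merged region)
#   "continue" -> <w:vMerge/>                  (cell merged into the one above)
#   None       -> no w:vMerge                  (ordinary cell)


def rowspans(grid):
    """Return {(row, col): rowspan} for every cell XHTML should emit.

    Assumes no horizontal merges (w:gridSpan), so a cell's index in its row is
    its grid column, and that every "continue" cell has a "restart" above it.
    """
    spans = {}
    for r, row in enumerate(grid):
        for c, marker in enumerate(row):
            if marker == "continue":
                continue  # already covered by a rowspan started higher up
            span = 1
            if marker == "restart":
                # Extend the span over consecutive "continue" cells in this column.
                while (r + span < len(grid) and c < len(grid[r + span])
                       and grid[r + span][c] == "continue"):
                    span += 1
            spans[(r, c)] = span
    return spans


def to_xhtml(grid, texts):
    """Render the grid as XHTML table rows, omitting merged-away cells."""
    spans = rowspans(grid)
    rows = []
    for r, row in enumerate(grid):
        cells = []
        for c in range(len(row)):
            if (r, c) in spans:
                attr = f' rowspan="{spans[(r, c)]}"' if spans[(r, c)] > 1 else ""
                cells.append(f"<td{attr}>{escape(texts[r][c])}</td>")
        rows.append("<tr>" + "".join(cells) + "</tr>")
    return "\n".join(rows)


# Hypothetical 3x2 table whose first column merges the top two cells.
GRID = [["restart", None], ["continue", None], [None, None]]
TEXTS = [["A", "b1"], ["", "b2"], ["c1", "c2"]]

if __name__ == "__main__":
    print(to_xhtml(GRID, TEXTS))
```

Reading the markers out of a real file would mean walking w:tbl/w:tr/w:tc and checking w:tcPr/w:vMerge/@w:val; that step, and the column bookkeeping that w:gridSpan requires, are left out of this sketch.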