

PowerShell and Office Open XML Format Document Generation Sample

This is the PowerShell sample I demoed at the Melbourne Victoria.NET User Group event in June. It's essentially a document generation sample, and yup, it really works, and it's pretty darn quick. On a 1.8 GHz dual-core machine, this PowerShell script was generating 60 to 70 Office Open XML documents a second, and it was only running on one thread/core...

And yes, you can generate Office Open XML documents on the server without needing Office/Word installed, then run the WordML docs through something like https://www.renderx.com to generate PostScript for cheaper offsite printing, PDFs, XSL-FO, etc. All very cool and accessible!!

You'll find the sample at https://projectdistributor.net/Releases/Release.aspx?releaseId=418

Why PowerShell and Office Open XML Format Doc Generation

  1. I wanted to play around with ideas for document generation that also had the flexibility of a script engine driving the process, potential ease of management, etc.
  2. A more likely scenario is that this script would be an extensibility point for an overall solution, not the solution itself. Check out the Windows PowerShell SDK for information on invoking PowerShell scripts from an application.
  3. I wanted to learn how to load .NET assemblies into PowerShell, flow control, etc.
  4. Stuff that is missing: lol, OK, this is a sample, so it has no PowerShell exception handling.
  5. I reckon this sample could go considerably faster, say 20 to 30%. I reload and crack open the template document for every customer record I process; it would make sense to load the template only once at initialisation, but for now I'm not going to optimise...
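Item 5 leaves the template optimisation as an exercise. As a minimal sketch of one possible first step, assuming the cost worth avoiding is re-reading Template.docx from disk for every record, the template bytes can be read once and written out per record. `New-DocsFromTemplate` is a hypothetical helper, not part of the sample, and nothing here has been timed, so it doesn't confirm the 20 to 30% estimate. Reusing a single opened Package across records would need more restructuring than this.

```powershell
# Hypothetical helper (not in the sample): read the template bytes once and
# write a fresh copy for each record, instead of copying the file every time.
function New-DocsFromTemplate([string]$templatePath, [string]$outDir, [int]$count)
{
    # .NET file APIs resolve relative paths against the process working directory,
    # not the current PowerShell location, so pass full paths here
    [byte[]]$templateBytes = [System.IO.File]::ReadAllBytes($templatePath)

    for ($i = 1; $i -le $count; $i++) {
        $newFile = Join-Path $outDir ("LatePayment" + $i + ".docx")
        [System.IO.File]::WriteAllBytes($newFile, $templateBytes)
        # ...open $newFile with System.IO.Packaging.Package and fill in the custom XML parts here...
    }
}

# Usage with a throwaway folder and a dummy 4-byte stand-in for the template
$work = Join-Path ([System.IO.Path]::GetTempPath()) ("dgSketch" + [guid]::NewGuid())
$outDir = Join-Path $work "processedDocs"
New-Item -ItemType Directory -Path $outDir -Force | Out-Null
$template = Join-Path $work "Template.docx"
[System.IO.File]::WriteAllBytes($template, [byte[]](80, 75, 3, 4))
New-DocsFromTemplate $template $outDir 3
```

The demo call uses a temp folder and a fake template so it runs anywhere; in the sample you would pass the full paths of Template.docx and the processedDocs folder instead.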

Sample Contents

The sample zip file at https://projectdistributor.net/Releases/Release.aspx?releaseId=418 contains the following:

  1. An Office Open XML Format document as a template, marked up with content controls bound to custom XML parts using the Word 2007 Content Control Toolkit.
    1. Check out Matthew Scott: Application Development using the Open XML File Formats for more info on using the Word 2007 Content Control Toolkit.
  2. A customers.xml document, which contains stuff like first and last names, amount due, payment date, etc. This could easily be data from a database.
  3. StandardText.xml, which has the standard parts for a letter: terms and conditions, company name, etc.
  4. The C# version of the same scenario.
  5. By default the sample runs out of the d:\dgDemo directory; tweak the script as required...

This is the general flow for the PowerShell script:

  1. Initialise global variables: the globals (the two relationship type URIs, a root URI, a shared XmlDocument and the standard-text hash table) are set up at the top of the script.
  2. Initialise(): Clear out and recreate the processedDocs folder, load StandardText.xml into a hash table and load customers.xml into a DOM.
  3. Main: Run the main customer processing loop, calling ProcessDocs once per customer record.
  4. ProcessDocs([int]$docNumber): Copy Template.docx to a new numbered LatePayment .docx, open the copy with System.IO.Packaging.Package and find the main document part through the package's root officeDocument relationship.
  5. GetDocumentPart($OfficeDocRel): From the main document part, find all relationships of type customXml, i.e. the custom XML parts.
  6. GetCustomXmlPart($customXmlRel): With a reference to a custom XML part, loop through each child element. Use the element name to look up the standard letter parts in the hash table; if it isn't there, look on the current customer record from customers.xml for a matching element/attribute name. The value is written into the custom XML part, and the bound content controls then show it in the document.
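Steps 2 and 6 boil down to a hash table lookup with a fallback to the customer record. The sketch below pulls that logic out of System.IO.Packaging so it runs on its own. The XML shapes for StandardText.xml, the customer record and the custom XML part are assumptions invented for illustration (the real files ship in the sample zip), and `Set-CustomXmlValues` is a hypothetical name.

```powershell
# Hypothetical StandardText.xml shape: <texts><text key="..." value="..." /></texts>
[xml]$stdTextsXml = '<texts><text key="CompanyName" value="Contoso Ltd" /><text key="Terms" value="Payment is due within 14 days." /></texts>'
$stdTexts = @{}
foreach ($t in $stdTextsXml.texts.text) { $stdTexts[$t.GetAttribute("key")] = $t.GetAttribute("value") }

# Hypothetical customers.xml record: values may be attributes or child elements
[xml]$customers = '<Customers><Customer FirstName="Jane"><AmountDue>42.50</AmountDue></Customer></Customers>'
$custRecord = $customers.Customers.Customer

# Hypothetical custom XML part, one level of elements under the document element
[xml]$part = '<Letter><CompanyName /><Terms /><FirstName /><AmountDue /></Letter>'

function Set-CustomXmlValues([xml]$partXml, [hashtable]$texts, $record)
{
    foreach ($node in $partXml.get_DocumentElement().get_ChildNodes()) {
        $name = $node.get_Name()
        # Standard texts win; otherwise use a same-named attribute/element on the record
        if ($texts.ContainsKey($name)) { $value = $texts[$name] }
        else { $value = $record.$name }
        # [string] turns a missing value ($null) into an empty string
        $node.set_InnerText([string]$value)
    }
}

Set-CustomXmlValues $part $stdTexts $custRecord
$part.OuterXml
```

Keeping the lookup in its own function makes it testable without a .docx on hand; in the sample the same loop runs inside GetCustomXmlPart against the part loaded from the package.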

For more information on PowerShell, check out:

  1. https://www.microsoft.com/technet/scriptcenter/hubs/msh.mspx

  2. https://blogs.msdn.com/powershell

 

Cheers and enjoy, Dave

BTW, if you think you'll use it (or some derivation of it), let me know via my blog. I'm just interested :)

 

=================================================================================================

 

# Uncomment to list the assemblies already loaded in to the PowerShell AppDomain
# $psAppDomain = [AppDomain]::CurrentDomain
# $psAppDomain.GetAssemblies()


## The sample runs out of d:\dgDemo by default; change this to suit ##
d:
cd \dgDemo

## Load additional required assemblies: WindowsBase contains System.IO.Packaging ##
$dummy = [System.Reflection.Assembly]::Load("WindowsBase, Version=3.0.0.0, Culture=neutral, PublicKeyToken=31bf3856ad364e35")

## Initialise global variables ##
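#Relationship type URIs are identifiers and must match the strings in the package's .rels parts exactly
$documentType = "http://schemas.openxmlformats.org/officeDocument/2006/relationships/officeDocument"
$customXmlType = "http://schemas.openxmlformats.org/officeDocument/2006/relationships/customXml"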
[uri]$uri = $null; $dummy = [Uri]::TryCreate("/", [UriKind]::Relative, [ref]$uri)
$doc = new-object -type system.xml.xmldocument
$custRecord = $null
[string]$text = ""
[hashtable]$stdTexts = @{}

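function GetCustomXmlPart($customXmlRel)
{
    #Get the CustomXML part. Resolve its target against the part that owns the relationship
    #(the main document part), not the package root
    $partUri = [System.IO.Packaging.PackUriHelper]::ResolvePartUri($customXmlRel.SourceUri, $customXmlRel.TargetUri)
    $docPart = $Package.GetPart($partUri)
    $readStream = $docPart.GetStream([System.IO.FileMode]::Open, [System.IO.FileAccess]::Read)
    $doc.Load($readStream)
    $readStream.Close()

    #This demo looks one level below the document element for element names to match either customer data or std texts
    #Loop through all the child elements in the customXML doc part looking for matches
    foreach ($node in $doc.get_DocumentElement().get_ChildNodes()) {
        $elementName = $node.get_Name()
        if ($stdTexts.ContainsKey($elementName)) { $text = $stdTexts[$elementName] }
        else { $text = $custRecord.$elementName }
        $node.set_InnerText([string]$text)
    }

    #Save the data inserted in to the customXML doc part
    #FileMode Create truncates the part first, so no stale bytes survive if the new XML is shorter
    $writeStream = $docPart.GetStream([System.IO.FileMode]::Create, [System.IO.FileAccess]::Write)
    $doc.Save($writeStream)
    $writeStream.Close()
}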

function GetDocumentPart($OfficeDocRel)
{
    #get ref to word document part find all relationships that are of type CustomXML
    $documentPart = $Package.GetPart([System.IO.Packaging.PackUriHelper]::ResolvePartUri($uri, $OfficeDocRel.TargetUri))
    foreach ($customXmlRel in $documentPart.GetRelationshipsByType($customXmlType)) { GetCustomXmlPart($customXmlRel) }
}

function ProcessDocs([int]$docNumber) 
{
    #create new doc from the template
    $newFile = (get-location).path + "\processedDocs\LatePayment" + $docNumber + ".docx"
    $sourceFile = (get-location).path + "\Template.docx"
    copy -path $sourceFile -destination $newFile

    #loop through the root relationships in the doc looking for word document part - there will only be one
    $Package = [System.IO.Packaging.Package]::Open($newFile)
    foreach ($rel in $Package.GetRelationshipsByType($documentType)) { GetDocumentPart($rel) }
    $Package.Close()
}

function Initialise()
{    
    
    if(test-path -path processedDocs) {remove-item -path processedDocs -recurse}
    $dummy = mkdir processedDocs

    #load customers.xml in to a global cust object
    [xml]$global:cust = get-content .\customers.xml
    #load standard letter parts from standardtexts.xml in to a hash table
    [xml]$stdTextsXml = get-content standardtext.xml
    $stdTextsXml.texts.text | &{process{$stdTexts[$_.key] = $_.value}}
}

Initialise
## Main ##
$start = [datetime]::now

#loop through all customer childnodes in the customers.xml doc
$cust.Customers.Customer | &{
    begin   { $records = 0 }
    process { $records++; $custRecord = $_; ProcessDocs($records) }
    end     { $global:totRecords = $records }
}

$end = [datetime]::now
"============================================================"
"Total Documents Generated: " + $totRecords
"Total processing time (seconds): " + ($end.subtract($start)).totalseconds
"Office Open XML documents generated/sec: " + $totRecords / ($end.subtract($start)).totalseconds
"============================================================"


