How to pretty-print XML in PowerShell, and text pipelines

When I first needed to format an XML document nicely in PowerShell, I was pretty new to PowerShell. Doing it directly didn't go too well, but then I found an example somewhere on the Internet that showed me a different side of PowerShell: it really drove home the point that PowerShell is a shell to the .NET virtual machine. Here is my version of that example, with a bunch of niceties included:

function Format-Xml {
<#
.SYNOPSIS
Format the incoming object as the text of an XML document.
#>
    param(
        ## Text of an XML document.
        [Parameter(ValueFromPipeline = $true)]
        [string[]]$Text
    )

    begin {
        $data = New-Object System.Collections.ArrayList
    }
    process {
        [void] $data.Add($Text -join "`n")
    }
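    end {
        # XmlDocument suffices here; XmlDataDocument is marked obsolete in the .NET Framework.
        $doc = New-Object System.Xml.XmlDocument
        $doc.LoadXml($data -join "`n")
        $sw = New-Object System.IO.StringWriter
        $writer = New-Object System.Xml.XmlTextWriter($sw)
        $writer.Formatting = [System.Xml.Formatting]::Indented
        $doc.WriteContentTo($writer)
        $writer.Flush()
        $sw.ToString()
    }
}
# Export-ModuleMember is only valid when this file is loaded as a module (.psm1).
Export-ModuleMember -Function Format-Xml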
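
Stripped of the pipeline handling, the .NET part of the function can be tried on its own. The following is only a minimal sketch: the inline XML string is made up for the illustration, and XmlDocument is used as the document class.

```powershell
# Made-up sample input, not taken from any real file.
$xml = '<a><b>1</b></a>'

# Parse the text into a DOM document.
$doc = New-Object System.Xml.XmlDocument
$doc.LoadXml($xml)

# Write the document back out through an indenting XML writer.
$sw = New-Object System.IO.StringWriter
$writer = New-Object System.Xml.XmlTextWriter($sw)
$writer.Formatting = [System.Xml.Formatting]::Indented
$doc.WriteContentTo($writer)
$writer.Flush()

# The nested element now sits on its own indented line.
$pretty = $sw.ToString()
$pretty
```

Every object here is a plain .NET class; PowerShell only supplies the glue of New-Object and property assignment.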

Aside from the formatting itself, the function shows how to handle pipeline input. You can call it either way:

Format-Xml (Get-Content c:\work\AzureNano\xml\state.xml)
Get-Content c:\work\AzureNano\xml\state.xml | Format-Xml

But as you can see from the source code, the handling of the input is a bit convoluted. This is because of a mismatch: Get-Content emits every line as a separate result object, while the pipeline assumes that the function acts separately on each incoming object. The two fit together if you want the function to process the input line by line, but not if you want to process the whole input as one complete text. Alternatively, you could use

 Get-Content -ReadCount 0

to get all the lines in one chunk (a single array object), but this option is not exactly mnemonic and I keep forgetting it.
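
The difference between the three ways of feeding the text can be seen with a small experiment. This is only a sketch: Measure-Chunk is a hypothetical helper invented for the illustration, and the temporary file holds made-up content. The helper reports how many strings each call of its process block receives.

```powershell
# Hypothetical helper: emits the number of strings seen by each process-block call.
function Measure-Chunk {
    param(
        [Parameter(ValueFromPipeline = $true)]
        [string[]]$Text
    )
    process { $Text.Count }
}

# Temporary three-line file, created only for this demonstration.
$path = [System.IO.Path]::GetTempFileName()
Set-Content -Path $path -Value '<a>', '<b/>', '</a>'

# Passed as an argument: one call that receives all three lines.
$asArgument = @(Measure-Chunk (Get-Content $path))

# Piped with the default settings: three calls, one line each.
$perLine = @(Get-Content $path | Measure-Chunk)

# Piped with -ReadCount 0: one call that receives all three lines.
$oneChunk = @(Get-Content $path -ReadCount 0 | Measure-Chunk)

Remove-Item $path
```

Because the default pipeline case delivers the lines one at a time, Format-Xml has to gather them in its process block and do the real work in the end block.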

Otherwise, if you want the complete text, you first have to collect it. PowerShell arrays are not a good data structure for collecting the text from pieces, because they have a fixed size: every += allocates a new array and copies the old contents into it. Instead you have to go back to the raw .NET classes, create a growable ArrayList, and collect your data there. But then see how nicely you can use the PowerShell operator -join on that ArrayList, just like on a PowerShell array, because this operator uses the common interface implemented by both classes.
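
A minimal sketch of that last point, with made-up sample strings: the same -join expression works on an ArrayList and on a native array and gives the same result.

```powershell
# Collect pieces in a growable .NET list; Add() returns the new index, so discard it.
$list = New-Object System.Collections.ArrayList
foreach ($piece in '<a>', '<b/>', '</a>') {
    [void] $list.Add($piece)
}

# A plain PowerShell array holding the same pieces, for comparison.
$array = '<a>', '<b/>', '</a>'

# -join accepts either collection and produces the same string.
$fromList  = $list  -join "`n"
$fromArray = $array -join "`n"
$same = ($fromList -eq $fromArray)
```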
