

sed in PowerShell

One of the things that has been annoying me in PowerShell is the lack of sed, the stream editor. So I've made my own.

Like other things in PowerShell, it's not the same: it's much more verbose in usage than the original sed, so it can't really be used as a one-liner. On the other hand, it allows much more flexible scripting. My immediate need was to extract fragments of the bcdedit output, so let me give you an example of this usage. The output of "bcdedit /v" looks like this:

Windows Boot Manager
--------------------
identifier              {9dea862c-5cdd-4e70-acc1-f32b344d4795}
device                  partition=R:
description             Windows Boot Manager
locale                  en-US
inherit                 {7ea2e1ac-2e61-4728-aaa3-896d9d0a9f0e}
... more stuff ...

Windows Boot Loader
-------------------
identifier              {ea740bc9-53ac-11e3-8c04-7446a0a13fa8}
device                  partition=C:
path                    \windows\system32\winload.exe
description             Windows 8.1
locale                  en-US
inherit                 {6efb52bf-1766-41db-a6b3-0ee5eff72bd7}
... more stuff ...

I wanted to get the identifier of the Windows Boot Loader section (there can be multiple of them but more on that later). So I want to find the lines between "Windows Boot Loader" and the empty line, and in them find the line "identifier" and get the value out of it. It can be done like this:

bcdedit /v | xsed -Select {
    if ($_ -match "^Windows Boot Loader") { Skip-TextSelect }
},{
    if ($_ -match "^identifier ") {Enable-OneLine; Skip-TextSelect}
    elseif ($_ -match "^$") {Skip-TextSelect}
} | % { $_ -replace "^.*({[^ ]+}).*",'$1' }

Even though it's long and ugly, it's an approximate analog of two real seds combined:

sed -n '/^Windows Boot Loader/,/^$/p' | sed -nE '/^identifier/s/^.*(\{[^ ]+\}).*/\1/p'

Only it's a bit better than that, because that particular classic sed example would not handle the bcdedit output above correctly: when the "Windows Boot Loader" section goes last, it doesn't have an empty line after it, so that pattern won't work, and good luck making up a pattern that can handle both an empty line and the end of text at the end of a section.

The kind-of-sed-like program for xsed is represented as an array of script blocks in the parameter -Select. Each line is fed to the first script block, which decides what to do with it. The current line is passed to it as $_, and the actions are expressed as small helper commands. The command Skip-TextSelect says to switch over to the next script block for processing the following lines. If there are no more script blocks, the rest of the lines are processed according to the last setting, either passing them through or throwing them away (the default initial mode is throwing away, like sed -n, and can be changed with the switch -Enabled). The command Enable-OneLine is like "p" in sed, passing the current line through.
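To make the block-switching idea concrete, here is a rough plain-PowerShell sketch of the same two-step selection written without xsed. The function name Get-FirstLoaderIdentifierLine and the $block counter are made up for this illustration; this is not how xsed is implemented, only a picture of the state machine that the two script blocks describe. Note that the end of input needs no special handling here: the loop simply stops.

```powershell
# Hypothetical stand-in for the two-block xsed program; NOT the xsed implementation.
function Get-FirstLoaderIdentifierLine {
    param([string[]]$Lines)

    $block = 0      # index of the "script block" currently in charge
    foreach ($line in $Lines) {
        if ($block -eq 0) {
            # Block 0: wait for the section header, then hand over to block 1.
            if ($line -match '^Windows Boot Loader') { $block = 1 }
        }
        elseif ($block -eq 1) {
            if ($line -match '^identifier ') {
                $line           # like Enable-OneLine: pass this line through
                $block = 2      # like Skip-TextSelect: no blocks left
            }
            elseif ($line -match '^$') { $block = 2 }
        }
        # $block -eq 2: past the last block, the default mode drops the line.
    }
}

# Sample input where the Boot Loader section goes last, with no empty line after it.
$sample = @(
    'Windows Boot Manager'
    '--------------------'
    'identifier              {9dea862c-5cdd-4e70-acc1-f32b344d4795}'
    ''
    'Windows Boot Loader'
    '-------------------'
    'identifier              {ea740bc9-53ac-11e3-8c04-7446a0a13fa8}'
    'device                  partition=C:'
)
$idLine = Get-FirstLoaderIdentifierLine -Lines $sample
$id = $idLine -replace '^.*(\{[^ ]+\}).*', '$1'
$id   # {ea740bc9-53ac-11e3-8c04-7446a0a13fa8}
```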

Or here is another way to do the same, using the buffers, which are an extension of xsed:

bcdedit /v  | xsed -Select {
    if ($_ -match "^Windows Boot Loader") { Clear-TextBuffer -Enable }
    elseif ($_ -match "^$") { Paste-TextBuffer; Disable-TextBuffer }
    elseif ($_ -match "^identifier ") { Add-TextBuffer ($_ -replace "^identifier *{(.*)}",'$1') }
}

Buffers generalize the logic above. They cover the case when a number of lines need to be selected from the input and then either sent to the output or not, depending on some value in the middle of them. At least the first lines have to be collected before the decision can be made whether to print them. That's what the buffers do: they collect the lines and remember whether the buffer is enabled or disabled. When a buffer is pasted, it actually gets sent only if it has been enabled. Here the lines with identifiers have the values extracted and placed into the buffer. At the end of each section the buffer gets pasted (it's also automatically pasted at the end of input) and disabled. When a Boot Loader section is met, it enables the buffer, so the paste at the end of that section will actually work.
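For readers who want to see the buffer logic spelled out, here is a small plain-PowerShell sketch of the same idea without xsed. The function name Get-LoaderId and the variables $buffer and $enabled are invented for this illustration, and the real module may well keep its buffer differently; the sketch only mirrors the collect, enable, paste sequence described above, including the implicit paste at the end of input.

```powershell
# Hypothetical helper, NOT part of xsed: mimics a holding buffer with an enabled flag.
function Get-LoaderId {
    param([string[]]$Lines)

    $buffer  = [System.Collections.Generic.List[string]]::new()
    $enabled = $false
    foreach ($line in $Lines) {
        if ($line -match '^Windows Boot Loader') {
            # Like Clear-TextBuffer -Enable: start a fresh, enabled buffer.
            $buffer.Clear(); $enabled = $true
        }
        elseif ($line -match '^$') {
            # Like Paste-TextBuffer; Disable-TextBuffer: emit only if enabled.
            if ($enabled) { $buffer.ToArray() }
            $buffer.Clear(); $enabled = $false
        }
        elseif ($line -match '^identifier ') {
            # Like Add-TextBuffer with an explicit text: keep just the value.
            $buffer.Add(($line -replace '^identifier *\{(.*)\}', '$1'))
        }
    }
    # The implicit paste at the end of input.
    if ($enabled) { $buffer.ToArray() }
}

$sample = @(
    'Windows Boot Manager', '--------------------',
    'identifier              {9dea862c-5cdd-4e70-acc1-f32b344d4795}', '',
    'Windows Boot Loader', '-------------------',
    'identifier              {ea740bc9-53ac-11e3-8c04-7446a0a13fa8}',
    'description             Windows 8.1'
)
$ids = @(Get-LoaderId -Lines $sample)
$ids   # ea740bc9-53ac-11e3-8c04-7446a0a13fa8
```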

The buffered xsed version can select the identifiers of all the Boot Loader sections, even if there are multiple of them. The first example can be adapted to select identifiers from multiple sections too:

bcdedit /v | xsed -Select "START",{
    if ($_ -match "^Windows Boot Loader") { Skip-TextSelect }
},{
    if ($_ -match "^identifier ") {Enable-OneLine}
    elseif ($_ -match "^$") {Skip-TextSelect "START"}
} | % { $_ -replace "^.*({[^ ]+}).*",'$1' }

This is done by adding a label "START" (-Select is actually a mix of script blocks and text labels) and going back to it at the end of the section, thus looping the logic.

Of course, you can combine both approaches. For example, if you want to find the identifiers of the Boot Loader sections whose description matches a pattern, you can do it like this:

$pattern = "My Windows"
bcdedit /v | Edit-Text -Select "START",{
    if ($_ -match "^Windows Boot Loader") {Clear-TextBuffer;Skip-TextSelect}
},{
    if ($_ -match "^identifier ") {Add-TextBuffer $_}
    elseif ($_ -match "^description ") {
        $v = $_ -replace "^description *",""
        if ($v -match $pattern) {Enable-TextBuffer}
    } elseif ($_ -match "^$") {Paste-TextBuffer;Skip-TextSelect "START" }
} | % { $_ -replace "^.*({[^ ]+}).*",'$1' }

The sections of the Windows Boot Loader type are selected by building a state machine, the identifiers from them are buffered, and then a matching description enables the output of the current buffer. Note that the description may go before or after the identifier; the order doesn't matter. The identifier is added to the buffer in any case, and the command Paste-TextBuffer, executed at the end of the section, actually pastes the buffer only if it has been enabled by a matching description.

This example also shows that the script blocks can see the variables, $pattern here, from the caller's context. However, the script blocks can't change the values of these variables, and that's what I meant by saying that they're not quite closures (or maybe they can be called closures in the Haskell sense but not in the Lisp sense). If you need to change them, the only way I know of is to use script-level variables, which might be inconvenient. So xsed helps by providing an implicit hashtable variable $_v that can be used to keep values throughout the run of one edit. Here is an example of how it can be used if the script needs to find the identifier of a section that has both the matching description and the matching device:

$desc_pattern = "My Windows"
$dev_pattern = "^vhd="
bcdedit /v | Edit-Text -Select "START",{
    if ($_ -match "^Windows Boot Loader") {$_v.Clear();Clear-TextBuffer;Skip-TextSelect}
},{
    if ($_ -match "^identifier ") {Add-TextBuffer $_}
    elseif ($_ -match "^description ") {
        $v = $_ -replace "^description *",""
        if ($v -match $desc_pattern) {$_v.desc = $true}
    } elseif ($_ -match "^device ") {
        $v = $_ -replace "^device *",""
        if ($v -match $dev_pattern) {$_v.dev = $true}
    } elseif ($_ -match "^$") {
        if ($_v["dev"] -and $_v["desc"]) { Enable-TextBuffer }
        Paste-TextBuffer
        Skip-TextSelect "START"
    }
} -SelectEnd {
    if ($_v.dev -and $_v.desc) { Enable-TextBuffer }
} | % { $_ -replace "^.*({[^ ]+}).*",'$1' }

This example also shows the parameter -SelectEnd that executes a script block at the end of the text. It's needed here to decide whether the last buffer needs to be enabled before it gets implicitly pasted at the end of editing.
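The scoping behavior behind $_v can be seen in a tiny experiment that doesn't involve xsed at all. The names $total, $state and $block below are made up for the demonstration. A script block run with the call operator gets its own child scope, so an assignment to a plain variable creates a local copy, while a hashtable is shared by reference and can be modified in place, which is presumably why a hashtable is a convenient carrier for state.

```powershell
$total = 0                  # plain variable in the caller's scope
$state = @{ total = 0 }     # hashtable, shared by reference

$block = {
    $total = $total + 1     # reads the caller's value, then assigns a new local copy
    $state.total++          # mutates the caller's hashtable in place
}

# Run the block three times, each time in a fresh child scope.
1..3 | ForEach-Object { & $block }

"total = $total; state.total = $($state.total)"
# prints: total = 0; state.total = 3
```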

There is one more variable passed to the script blocks, $_lineno, that contains the number of the current line and can be used to simulate the sed patterns like "5,$d".
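Here is a guess at how variables like $_ and $_lineno could be handed to a script block, using the ScriptBlock method InvokeWithContext that is mentioned at the end of this post. The function Select-WithLineNo and its parameters are hypothetical and much simpler than what xsed does; the filter block just returns $true for the lines to keep. The usage at the bottom imitates "5,$d" by keeping only the lines before the fifth.

```powershell
# Hypothetical helper, NOT the xsed code: passes $_ and $_lineno to a filter block.
function Select-WithLineNo {
    param(
        [string[]]$Lines,
        [scriptblock]$Filter    # should return $true for the lines to keep
    )
    $lineno = 0
    foreach ($line in $Lines) {
        $lineno++
        # Variables to predefine inside the script block for this one call.
        $vars = [System.Collections.Generic.List[psvariable]]::new()
        $vars.Add([psvariable]::new('_', $line))
        $vars.Add([psvariable]::new('_lineno', $lineno))
        # No extra functions are defined, so the first argument is $null.
        $keep = $Filter.InvokeWithContext($null, $vars)
        if ($keep.Count -gt 0 -and $keep[0]) { $line }
    }
}

# Imitation of sed '5,$d': keep lines 1 through 4.
$out = Select-WithLineNo -Lines ('a', 'b', 'c', 'd', 'e', 'f') -Filter { $_lineno -lt 5 }
$out -join ','   # a,b,c,d
```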

And here is the list of all the helper functions, copy-pasted from the xsed help:

##   Enable-FromThisLine - start passing the lines from this one,
##     and switch to the next script block for the following lines
##   Enable-FromNextLine - start passing the lines from next one,
##     and switch to the next script block for the following lines
##   Disable-FromThisLine - stop passing the lines from this one,
##     and switch to the next script block for the following lines
##   Disable-FromNextLine - stop passing the lines from next one,
##     and switch to the next script block for the following lines
##   Skip-TextSelect [label] - keep whatever choice for the current line
##     and switch to the next script block for the following lines;
##     if the label argument is used, then the script block at that
##     label is used instead of the next one
##   Reparse-TextSelect [label] - switch to the next script block immediately
##     and re-test the current line with it; when combined with
##     En/Disable-* commandlets, it forces the immediate reparsing
##     with the next block, without waiting for the next line;
##     if the label argument is used, then the script block at that
##     label is used instead of the next one
##   Enable-OneLine - pass the current line without changing the general mode
##   Disable-OneLine - skip the current line without changing the general mode
##   Set-OneLine <value> - replace the contents of the current line and enable it
##   Add-BeforeThisLine lines - insert lines before the current one
##   Add-AfterThisLine lines - insert lines after the current one
##     (they won't participate in the Reparse)
##   Set-MultiLine lines - replace the current line with multiple lines
##     (this translates to inserting the argument lines before the
##     current one and disabling the current one line)
##   Add-TextBuffer [switches] - add the current line (or an explicit text)
##     to the holding buffer. The buffer is a convenient concept if a bunch
##     of lines need to be enabled or disabled together based on the value
##     of one of them.
##   Clear-TextBuffer [switches] - clear the current contents of the
##     holding buffer and revert its enablement state
##   Enable-TextBuffer - mark the buffer as enabled, thus allowing to paste it
##   Disable-TextBuffer - mark the buffer as disabled, thus disallowing to paste it
##   Paste-TextBuffer [switches] - insert the contents of the holding buffer
##     after the current line if it was enabled and clear the buffer
##   Get-TextBuffer - get the current contents of the holding buffer
##   Get-TextBufferEnabled - check the current enablement of the holding buffer

The module itself is attached. I kind of wanted to show how its logic is so small and simple but then it got bloated with the added features, and this post also got bloated with all the examples. So I'll just attach it and say that I've found out about the InvokeWithContext when working on xsed.

xsed.psm1

See Also: all the text tools