The “Joy” of Reg-ex part 3: Select-String
One of the PowerShell tools I’ve been using a lot recently is Select-String. When I’m going through lots of files trying to find a mistake I know is in several of them, I can just bang in:
select-string -SimpleMatch "Split-path" -Path *.ps1 -List | edit
Finding text in files is not exactly radical, but Select-String has two things which allow it to do some really clever stuff. First of all, and no surprise since this is a series on regular expressions, it can use a reg-ex instead of a simple match. And I can feel people with Unix experience mumbling “hasn’t he heard of Grep?”. Where things move up a gear, this being PowerShell, is that we get objects back. By way of example, I’ve wanted to show people how my PowerShell library for Hyper-V is built up (for example, which WMI objects I use), and Select-String’s objects allow me to get to that in a handful of commands. First off, get the script and strip out lines which are blank or comments. I could just pipe this into the next command, but for now I’m going to store it in a variable.
$script = get-content .\disk.ps1, .\Helper.ps1, .\menu.ps1 | where {($_ -ne "") -and ($_ -notMatch "^\s*#")}
I actually had a longer list of files, but once everything is in $script I can then use Select-String. What I want it to find is: the word “Function”, followed by at least one space, followed by word characters, then a hyphen, then word characters which run up to a word boundary. Again, for ease of reading I’ll put the lines into a variable rather than pipe them.
$Lines = $script | Select-String -Pattern "Function\s+(\w+-\w+)\b"
Select-String returns MatchInfo objects, which contain the file name (if it is reading files), the line number where the match was found, the pattern which triggered the match (we might have specified more than one), and a collection of matches. Each match in that collection has (among other useful things) a Groups collection. Groups[0] holds the whole match, but using brackets allows us to isolate groups inside a bigger expression. The brackets in the regular expression I used, "Function\s+(\w+-\w+)\b", said “and isolate the function name”. So I can refer to each line’s Matches[0].Groups[1].Value to get the function name.
What I want is new objects with the function name and the line number where it was declared. You’ll see in a moment why I also want them sorted.
$fnlines= $Lines | ForEach-object {new-object psobject -property @{ Function =($_.matches[0].groups[1].value); lineNo =($_.lineNumber)}} | sort-object -descending -property lineNo
Which gives me items that look like this.
Function lineNo
-------- ------
Sync-VMClusterConfig 789
Set-VMSerialPort 456
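To see the pieces of a MatchInfo object in isolation, here is a minimal sketch run against two made-up lines of text. The sample input and the variable names ($sample, $hit and so on) are assumptions for illustration only, not part of the real library.

```powershell
# Hypothetical sample input: two lines of script text, not taken from the real library.
$sample = 'Function Get-VMDisk {', '    $x = 1'

# Select-String emits one MatchInfo object per matching line; only the first line matches.
$hit = $sample | Select-String -Pattern 'Function\s+(\w+-\w+)\b'

$whole  = $hit.Matches[0].Groups[0].Value   # the entire match
$name   = $hit.Matches[0].Groups[1].Value   # only the part inside the brackets
$lineNo = $hit.LineNumber                   # position of the line in the piped input, counting from 1
```

Here $whole should come back as the text from “Function” to the end of the name, $name as just the name, and $lineNo as the position of that line in the input.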
Next I do the same thing for WMI classes. All the class names are strings of text which start with Msvm_, Win32_ or CIM_, so I can specify those with a single regular expression, and I can pick up the match using Matches[0].Groups[0] (I’m looking for the whole match this time, hence Groups[0]). This time I want new objects with the WMI class name and the name of the function where it was found. I know the line where each class was found, and the previous set of objects holds the line number where each function was declared, so I need the first of those objects with a line number less than the one where the class was used (which is why I sorted in reverse order before), and to get its Function property.
$Lines = $script | select-string -pattern "Msvm_\w+|win32_\w+|CIM_\w+"
ForEach ($line in $lines) {new-object psobject -property @{ WmiClass =($line.matches[0].groups[0].value); FunctionName =($fnlines | where {$_.lineNo -lt $line.LineNumber} | select-object -first 1).Function } }
So now I get items back which look like this
FunctionName WmiClass
------------ --------
Add-VMNewHardDisk Msvm_ComputerSystem
Add-VMDisk Msvm_ComputerSystem
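The “nearest function above this line” lookup is the least obvious step, so here it is on its own as a small sketch. The two declarations and the line number 12 are invented for illustration; only the Where-Object and Select-Object pattern comes from the commands above.

```powershell
# Hypothetical declarations, already sorted with the highest line number first.
$decls = @(
    (New-Object psobject -Property @{ Function = 'Set-Thing'; lineNo = 20 }),
    (New-Object psobject -Property @{ Function = 'Get-Thing'; lineNo = 5 })
)

# A made-up line number where some class name was found.
$useLine = 12

# Keep only the declarations above that line; because of the descending sort,
# the first survivor is the closest declaration before the use.
$owner = ($decls | Where-Object { $_.lineNo -lt $useLine } |
          Select-Object -First 1).Function
```

Without the descending sort, Select-Object -First 1 would pick the earliest function in the script rather than the nearest one, which is why the sort order matters.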
The WMI class search is a case of telling Select-String “Look for one of many possibilities, and then show me what matched”. It’s possible to take this further. A lot further. The following line builds a giant reg-ex with every PowerShell cmdlet in it.
Get-command -CommandType cmdlet | foreach-object -Begin {$cmd = ""} `
-process {$cmd += "$_|"} `
-end {$cmd = $cmd -replace "\|$",""}
I can use this to get back lines which contain a cmdlet, and since there might be more than one cmdlet on a line, I use the -allmatches switch to make sure I get all of them
$Lines = $script | select-string -pattern $cmd -AllMatches
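One thing to watch with a big alternation like this: depending on the order of the names, a short name can match inside a longer one (Get-Item inside Get-ItemProperty, say). A hedged variation is to escape each name and wrap the whole list in word boundaries. The sketch below uses a hard-coded list of three names and a variable called $cmdSafe, both of which are assumptions for illustration so that the result does not depend on which cmdlets are installed.

```powershell
# Hypothetical names, hard-coded so the result does not depend on what is installed.
$names = 'Get-Item', 'Get-ItemProperty', 'Write-Host'

# Escape any regex metacharacters in each name, join the names with |,
# and require a word boundary at both ends of whichever name matches.
$escaped = $names | ForEach-Object { [regex]::Escape($_) }
$cmdSafe = '\b(?:' + ($escaped -join '|') + ')\b'

# A made-up line of script text containing two of the names.
$text  = 'Get-ItemProperty $path | Write-Host'
$found = ($text | Select-String -Pattern $cmdSafe -AllMatches).Matches |
         ForEach-Object { $_.Value }
```

With the trailing \b in place, the shorter alternative cannot stop halfway through Get-ItemProperty, so $found should hold the two full names. Using -join also avoids having to trim the trailing | afterwards.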
So now I have a similar set of lines to what I got before – this time there is more than one match in each, but each match will only have one group – so I can unpack the objects to get cmdlet names and group the results to see which ones get used most like this
$lines | foreach-Object {$_.matches | foreach-object {$_.groups[0].value }} |
group-object -NoElement | sort-object -property count -desc
Now this is something I just couldn’t have done in any other language I have worked in. (We did SNOBOL at university, and if I had ever got to grips with it … well, maybe.) The combination of objects, which give back something way more empowering than plain text, and built-in cmdlets, which will do so much of the work for us, with the power of regular expressions is amazing. And for those who like to play “PowerShell Golf” (that is, doing the job with the fewest strokes), you can get it down to one (wrapped) line.
gc .\disk.ps1, .\Helper.ps1 | ?{($_ -ne "") -and ($_ -notMatch "^\s*#")} |
select-string -pattern $cmd -All | %{$_.matches | %{$_.groups[0].value}} |
group -No | sort count -Desc