[PowerShell] Extracting link URLs from a web page
November 30 (Mon)
Here is how to extract the link URLs from a web page using PowerShell.
The session videos from PDC (Professional Developers Conference) 2009 are posted at https://microsoftpdc.com/Videos. I used PowerShell to collect their download URLs.
I fetched the page over HTTP and looked at the HTML source. The row for session CL05 looks like this:
```html
<tr class="">
<td>CL05</td>
<td>
<a href="/Sessions/CL05" alt="">Embodiment: The Third Great Wave of Computing Applications</a>
<br />
<span class="speakers"><em>Butler Lampson</em></span>
</td>
<td><a href="https://ecn.channel9.msdn.com/o9/pdc09/wmv/CL05.wmv" alt="WMV">WMV</a></td>
<td><a href="https://ecn.channel9.msdn.com/o9/pdc09/wmvhigh/CL05.wmv" alt="WMVHigh">WMVHigh</a></td>
<td><a href="https://ecn.channel9.msdn.com/o9/pdc09/mp4/CL05.mp4" alt="MP4">MP4</a></td>
<td><a href="https://ecn.channel9.msdn.com/o9/pdc09/ppt/CL05.pptx" alt="Slides">Slides</a></td>
</tr>
```
The URLs we want are in the href attributes of the a tags. In PowerShell, downloading the HTML and extracting them takes just seven lines:
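```powershell
$w = new-object system.net.webclient                # .NET WebClient for the HTTP request
$enc = [Text.Encoding]::GetEncoding("utf-8")        # decoder for the response bytes
$url = "https://microsoftpdc.com/Videos"
$h = $enc.GetString($w.DownloadData($url))          # page HTML as one string
$regex = "<\s*a\s*[^>]*?href\s*=\s*[`"']*([^`"'>]+)[^>]*?>"   # group 1 = href value
$m = $h | select-string -pattern $regex -AllMatches # every <a> tag, not just the first
$m.matches | %{$_.groups[1].value} | select-string "wmvhigh"  # print the WMVHigh URLs
```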
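The page at https://microsoftpdc.com/Videos may no longer be reachable. This minimal offline sketch runs the same regex and Select-String steps on the CL05 row shown above, with the row embedded as a here-string.

- The variable names `$hrefs` and `$wmvHigh` are illustrative.
- The final filter uses `Where-Object -match` in place of `select-string`, so the results stay plain strings.

```powershell
# Sample input: the CL05 row from the page, embedded so no network access is needed.
$h = @'
<tr class="">
<td>CL05</td>
<td>
<a href="/Sessions/CL05" alt="">Embodiment: The Third Great Wave of Computing Applications</a>
<br />
<span class="speakers"><em>Butler Lampson</em></span>
</td>
<td><a href="https://ecn.channel9.msdn.com/o9/pdc09/wmv/CL05.wmv" alt="WMV">WMV</a></td>
<td><a href="https://ecn.channel9.msdn.com/o9/pdc09/wmvhigh/CL05.wmv" alt="WMVHigh">WMVHigh</a></td>
<td><a href="https://ecn.channel9.msdn.com/o9/pdc09/mp4/CL05.mp4" alt="MP4">MP4</a></td>
<td><a href="https://ecn.channel9.msdn.com/o9/pdc09/ppt/CL05.pptx" alt="Slides">Slides</a></td>
</tr>
'@

# Same regex as the script above: group 1 captures the href value.
$regex = "<\s*a\s*[^>]*?href\s*=\s*[`"']*([^`"'>]+)[^>]*?>"

# $h is a single string, so Select-String treats it as one input "line";
# -AllMatches records every <a> tag in it.
$m = $h | Select-String -Pattern $regex -AllMatches

# All captured href values, in document order.
$hrefs = $m.Matches | ForEach-Object { $_.Groups[1].Value }
$hrefs

# Keep only the high-bitrate WMV link (-match is case-insensitive).
$wmvHigh = $hrefs | Where-Object { $_ -match 'wmvhigh' }
```

The five captured values are the session link and the four media links, and only one of them contains `wmvhigh`.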
Running it prints:
```
https://ecn.channel9.msdn.com/o9/pdc09/wmvhigh/CL05.wmv
https://ecn.channel9.msdn.com/o9/pdc09/wmvhigh/CL06.wmv
https://ecn.channel9.msdn.com/o9/pdc09/wmvhigh/CL07.wmv
```
The first line creates a .NET Framework WebClient object, which handles the HTTP access.
The second line gets the encoding used to decode the HTTP response into a string. GetEncoding is a static method, so it is called as [Text.Encoding]::GetEncoding(...) without new-object.
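This small sketch shows why the decoding step matters. The bytes are hypothetical: the UTF-8 encoding of the single character U+3042, standing in for what `DownloadData` might return. Decoding them as UTF-8 gives that one character. Decoding them as Latin-1 (`iso-8859-1`, chosen here only for contrast) gives three unrelated characters.

```powershell
# Static method call with the [Type]::Method syntax; no New-Object needed.
$utf8 = [Text.Encoding]::GetEncoding("utf-8")
$latin1 = [Text.Encoding]::GetEncoding("iso-8859-1")

# Hypothetical response bytes: UTF-8 for the single character U+3042.
$bytes = [byte[]](0xE3, 0x81, 0x82)

$utf8Text = $utf8.GetString($bytes)      # one character, U+3042
$latin1Text = $latin1.GetString($bytes)  # three Latin-1 characters, one per byte
```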
The third line sets the target URL. The fourth line downloads the HTML and stores it in $h as a single string. If $h ends up as an array of lines instead, join it into one string first with $h -join "".
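This sketch shows why joining matters when the HTML arrives as an array of lines. The two-element `$lines` array is hypothetical, for example what `Get-Content` might return for a file where one `<a>` tag is split across two lines. Select-String matches each array element separately, so the split tag is missed until the lines are joined.

```powershell
$regex = "<\s*a\s*[^>]*?href\s*=\s*[`"']*([^`"'>]+)[^>]*?>"

# Hypothetical input: one <a> tag split across two array elements.
$lines = @(
    '<td><a href="https://ecn.channel9.msdn.com/o9/pdc09/wmv/CL05.wmv"',
    ' alt="WMV">WMV</a></td>'
)

# Matched element by element: neither element holds a complete tag.
$perLine = @($lines | Select-String -Pattern $regex -AllMatches).Count

# Joined into one string first, the whole tag matches.
$joined = ($lines -join "") | Select-String -Pattern $regex -AllMatches
```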
Next, $regex is a regular expression that matches an <a href="…"> tag and captures the … part (the URL) in group 1. I based it on the regex in this article:
Precision Computing: Unit Testing in PowerShell – a Link Parser
https://www.leeholmes.com/blog/UnitTestingInPowerShellALinkParser.aspx
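This sketch checks how tolerant the regex is. It runs the regex on a few hypothetical tag variants with `[regex]::Match`:

- double quotes, single quotes, and no quotes around the URL
- extra whitespace
- another attribute placed before `href`

One caveat: `[regex]::Match` is case-sensitive by default, whereas `select-string` is case-insensitive unless `-CaseSensitive` is given.

```powershell
$regex = "<\s*a\s*[^>]*?href\s*=\s*[`"']*([^`"'>]+)[^>]*?>"

# Hypothetical tag variants.
$samples = @(
    '<a href="/Sessions/CL05" alt="">',
    "<a href='/Sessions/CL05'>",
    '<a href=/Sessions/CL05>',
    '< a  href = "/Sessions/CL05" >',
    '<a class="x" href="/Sessions/CL05">'
)

# [regex]::Match returns the first match; Groups[1] is the captured URL.
$captured = foreach ($s in $samples) { [regex]::Match($s, $regex).Groups[1].Value }
$captured
```

Every variant yields the same `/Sessions/CL05`.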
The select-string line searches the HTML in $h for the pattern in $regex. By default, select-string records only the first match in each line, so -AllMatches is added to get every match.
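This sketch shows the effect of `-AllMatches`, using a hypothetical one-line input that contains two links.

```powershell
$regex = "<\s*a\s*[^>]*?href\s*=\s*[`"']*([^`"'>]+)[^>]*?>"

# Hypothetical single-line input with two <a> tags.
$h = '<a href="/a.wmv">A</a> <a href="/b.wmv">B</a>'

# Without -AllMatches, only the first match in the line is recorded.
$first = $h | Select-String -Pattern $regex
# With -AllMatches, every match in the line is recorded.
$all = $h | Select-String -Pattern $regex -AllMatches

$first.Matches.Count   # 1
$all.Matches.Count     # 2
```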
Piping $m to Get-Member shows that it is a Microsoft.PowerShell.Commands.MatchInfo object, and that the individual regex matches are stored in its Matches property. For example, the match at index 15 looks like this:
```
PSH> ($m.Matches)[15]
Groups   : {<a href="https://ecn.channel9.msdn.com/o9/pdc09/wmvhigh/CL05.wmv" alt="WMVHigh">, https://ecn.channel9.msdn.com/o9/pdc09/wmvhigh/CL05.wmv}
Success  : True
Captures : {<a href="https://ecn.channel9.msdn.com/o9/pdc09/wmvhigh/CL05.wmv" alt="WMVHigh">}
Index    : 6057
Length   : 79
Value    : <a href="https://ecn.channel9.msdn.com/o9/pdc09/wmvhigh/CL05.wmv" alt="WMVHigh">
```
The URL we want is the second element of Groups (index 1, counting from 0), so

```powershell
($m.Matches)[15].Groups[1].Value
```

returns just the URL.
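This sketch illustrates the Groups indexing on the single WMVHigh tag shown above, matched with `[regex]::Match`.

- `Groups[0]` is always the whole match.
- `Groups[1]` is the first parenthesized capture, which holds the URL.

```powershell
$regex = "<\s*a\s*[^>]*?href\s*=\s*[`"']*([^`"'>]+)[^>]*?>"
$tag = '<a href="https://ecn.channel9.msdn.com/o9/pdc09/wmvhigh/CL05.wmv" alt="WMVHigh">'
$match = [regex]::Match($tag, $regex)

$match.Groups.Count      # 2: the whole match plus one capture group
$match.Groups[0].Value   # the whole tag, same as $match.Value
$match.Groups[1].Value   # just the URL
```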
The last line takes Groups[1].Value from every match with % (ForEach-Object). It then pipes the resulting strings to select-string "wmvhigh" so that only the WMVHigh URLs remain. Used this way, select-string works much like grep.
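One detail of the final filter: select-string emits MatchInfo objects, not strings. They display as the matched line, which is why the output above looks like plain URLs. This sketch uses a hypothetical `$hrefs` list to show two ways to keep plain strings:

- read the `.Line` property of each MatchInfo
- swap in `Where-Object` with `-match`

```powershell
# Hypothetical list of extracted hrefs.
$hrefs = @(
    'https://ecn.channel9.msdn.com/o9/pdc09/wmv/CL05.wmv',
    'https://ecn.channel9.msdn.com/o9/pdc09/wmvhigh/CL05.wmv',
    'https://ecn.channel9.msdn.com/o9/pdc09/mp4/CL05.mp4'
)

# Select-String returns MatchInfo objects; the original string is in .Line.
$info = $hrefs | Select-String "wmvhigh"
$viaSelectString = $info | ForEach-Object { $_.Line }

# Where-Object with -match keeps the items as plain strings.
$viaWhere = $hrefs | Where-Object { $_ -match "wmvhigh" }
```

Plain strings are easier to hand on to other commands, for example to write the list to a file.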
Because PowerShell can call the .NET Framework directly, small jobs like this are easy to script.