How to: Identify Hyperlinks in an HTML String in Visual Basic
This example demonstrates a simple regular expression for identifying hyperlinks in an HTML document.
Example
This example uses the regular expression <A[^>]*?HREF\s*=\s*"([^"]+)"[^>]*?>([\s\S]*?)<\/A>, which means:
The string "<A", followed by
The smallest set of zero or more characters that does not include the character ">", followed by
The string "HREF", followed by
Zero or more white-space characters, followed by
The character "=", followed by
Zero or more white-space characters, followed by
The quotation-mark character, followed by
One or more characters other than the quotation-mark character (captured), followed by
The smallest set of zero or more characters that does not include the character ">", followed by
The character ">", followed by
The smallest set of zero or more characters of any kind, including line breaks (captured), followed by
The string "</A>".
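The breakdown above can be checked against a single anchor tag. The following is a minimal sketch, not part of the original example: the module name PatternDemo and the sample string are hypothetical, and the output goes to the console rather than a message box.

```vb
Imports System
Imports System.Text.RegularExpressions

Module PatternDemo
    Sub Main()
        ' Same pattern as in this topic. Quotation marks are doubled
        ' because they appear inside a Visual Basic string literal.
        Dim pattern As String = _
            "<A[^>]*?HREF\s*=\s*""([^""]+)""[^>]*?>([\s\S]*?)<\/A>"

        ' Hypothetical sample input: lowercase tag, an extra attribute
        ' before href, and spaces around the equal sign.
        Dim sample As String = _
            "<a class=""nav"" href = ""https://example.com/docs"">Read the <b>docs</b></a>"

        Dim m As Match = Regex.Match(sample, pattern, RegexOptions.IgnoreCase)

        If m.Success Then
            ' Group 0 is the whole match; groups 1 and 2 are the two
            ' parenthesized (captured) parts of the pattern.
            Console.WriteLine("Whole match: " & m.Value)
            Console.WriteLine("Group 1 (destination): " & m.Groups(1).Value)
            Console.WriteLine("Group 2 (label): " & m.Groups(2).Value)
        Else
            Console.WriteLine("No link found.")
        End If
    End Sub
End Module
```

For this input, group 1 holds https://example.com/docs and group 2 holds Read the <b>docs</b>, so any markup nested inside the link is returned as part of the label. Because the pattern requires a quotation-mark character on both sides of the destination, an HREF value written with single quotation marks or with no quotation marks is not matched.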
The Regex object is initialized with the regular expression and the RegexOptions.IgnoreCase option, which makes the match case-insensitive.
The Regex object's Matches method returns a MatchCollection object that contains information about all the parts of the input string that the regular expression matches.
''' <summary>Identifies hyperlinks in HTML text.</summary>
''' <param name="htmlText">HTML text to parse.</param>
''' <remarks>This method displays the label and destination for
''' each link in the input text.</remarks>
Sub IdentifyLinks(ByVal htmlText As String)
    Dim hrefRegex As New Regex( _
        "<A[^>]*?HREF\s*=\s*""([^""]+)""[^>]*?>([\s\S]*?)<\/A>", _
        RegexOptions.IgnoreCase)
    Dim output As String = ""
    For Each m As Match In hrefRegex.Matches(htmlText)
        output &= "Link label: " & m.Groups(2).Value & vbCrLf
        output &= "Link destination: " & m.Groups(1).Value & vbCrLf
    Next
    MsgBox(output)
End Sub
This example requires that you use the Imports statement to import the System.Text.RegularExpressions namespace. For more information, see Imports Statement.
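To show where the Imports statement goes, here is a sketch of a complete console program built around the same regular expression. The module name LinkDemo and the sample HTML are hypothetical, and the program writes to the console instead of calling MsgBox so that it can run without a user interface.

```vb
' Imports statements must appear at the top of the file,
' before any Module or Class declaration.
Imports System
Imports System.Text.RegularExpressions
Imports Microsoft.VisualBasic

Module LinkDemo
    Sub Main()
        ' Hypothetical sample HTML with two links; the second link's
        ' label spans two lines.
        Dim html As String = _
            "<p><a href=""https://example.com/"">Home</a> and" & vbCrLf & _
            "<A HREF=""/about.html"" title=""About us"">About" & vbCrLf & _
            "us</A></p>"

        Dim hrefRegex As New Regex( _
            "<A[^>]*?HREF\s*=\s*""([^""]+)""[^>]*?>([\s\S]*?)<\/A>", _
            RegexOptions.IgnoreCase)

        ' Matches returns every non-overlapping match in the input.
        Dim links As MatchCollection = hrefRegex.Matches(html)
        Console.WriteLine("Links found: " & links.Count.ToString())

        For Each m As Match In links
            Console.WriteLine(m.Groups(2).Value & " -> " & m.Groups(1).Value)
        Next
    End Sub
End Module
```

With this input the program reports two links. The label of the second link contains the line break from the source text, because [\s\S] matches every character, including line-break characters.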