How to: Identify Text in an HTML String in Visual Basic
This example demonstrates how to use a simple regular expression to remove tags from an HTML document.
HTML tags can be matched with the regular expression \<[^\>]+\>, which means:
The character "<", followed by
A set of one or more characters, not including the ">" character, followed by
The character ">".
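The following sketch shows which substrings the pattern matches in a short sample string. The module name TagPatternDemo and the helper FindTags are hypothetical and exist only for illustration. Because the pattern looks only for angle brackets, it is a simple approximation. It also matches non-tag text such as "< b >" in "a < b > c". It does not match "<>", because at least one character must appear between the brackets.

```vb
Imports System
Imports System.Collections.Generic
Imports System.Text.RegularExpressions

Module TagPatternDemo
    ' Hypothetical helper: returns every substring matched by the tag pattern.
    Function FindTags(ByVal html As String) As List(Of String)
        Dim tags As New List(Of String)()
        ' "<", then one or more characters other than ">", then ">".
        For Each m As Match In Regex.Matches(html, "\<[^\>]+\>")
            tags.Add(m.Value)
        Next
        Return tags
    End Function

    Sub Main()
        ' Prints <p>, <b>, </b>, and </p>, one per line.
        For Each tag As String In FindTags("<p>Hello, <b>world</b>!</p>")
            Console.WriteLine(tag)
        Next
    End Sub
End Module
```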
This example uses the shared Regex.Replace(String, String, String) method to replace every match of the tag regular expression with an empty string.
Imports System.Text.RegularExpressions

Module HtmlHelpers
    ''' <summary>Removes the tags from an HTML document.</summary>
    ''' <param name="htmlText">HTML text to parse.</param>
    ''' <returns>The text of an HTML document without tags.</returns>
    Function GetTextFromHtml(ByVal htmlText As String) As String
        ' Replace every "<...>" sequence with an empty string.
        Dim output As String = Regex.Replace(htmlText, "\<[^\>]+\>", "")
        Return output
    End Function
End Module
This example requires that you use the Imports statement to import the System.Text.RegularExpressions namespace. For more information, see Imports Statement.
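If you prefer not to add the Imports statement, you can qualify the Regex class with its full namespace at the call site instead. The following is a minimal sketch of that alternative. The module name FullyQualifiedDemo and the function GetTextFromHtmlQualified are hypothetical names chosen for illustration.

```vb
Module FullyQualifiedDemo
    ' No Imports statement: Regex is referenced by its fully qualified name.
    Function GetTextFromHtmlQualified(ByVal htmlText As String) As String
        Return System.Text.RegularExpressions.Regex.Replace(htmlText, "\<[^\>]+\>", "")
    End Function

    Sub Main()
        ' Prints "Hello, world!".
        System.Console.WriteLine(GetTextFromHtmlQualified("<p>Hello, <b>world</b>!</p>"))
    End Sub
End Module
```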
See Also
How to: Identify Hyperlinks in an HTML String in Visual Basic
How to: Strip Invalid Characters from a String