How to: Identify Text in an HTML String in Visual Basic
This example demonstrates how to use a simple regular expression to remove the tags from an HTML string, leaving only the text.
Example
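HTML tags can be matched with the regular expression \<[^\>]+\> (the backslashes before "<" and ">" are optional escapes; both characters match literally either way), which means: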
The character "<", followed by
A set of one or more characters, not including the ">" character, followed by
The character ">".
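To see what the pattern described above actually matches, the following minimal sketch lists every match in a short sample string. The module name TagMatchDemo, the helper FindTags, and the sample input are hypothetical and used only for illustration.

```vb
Imports System
Imports System.Text.RegularExpressions

Module TagMatchDemo
    ' Hypothetical helper: returns every substring matched by the tag pattern.
    Function FindTags(ByVal htmlText As String) As String()
        Dim matches As MatchCollection = Regex.Matches(htmlText, "\<[^\>]+\>")
        ' An upper bound of -1 yields an empty array when nothing matches.
        Dim tags(matches.Count - 1) As String
        For i As Integer = 0 To matches.Count - 1
            tags(i) = matches(i).Value
        Next
        Return tags
    End Function

    Sub Main()
        ' Prints <p>, <b>, </b>, and </p>, one per line.
        For Each tag As String In FindTags("<p>Hello, <b>world</b>!</p>")
            Console.WriteLine(tag)
        Next
    End Sub
End Module
```

Because the pattern stops at the first ">" it finds, a ">" inside an attribute value ends the match early. This is one reason the expression suits simple markup rather than arbitrary HTML.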
This example uses the shared Regex.Replace method to replace all matches of the tag regular expression with the empty string.
Imports System.Text.RegularExpressions

''' <summary>Removes the tags from an HTML document.</summary>
''' <param name="htmlText">HTML text to parse.</param>
''' <returns>The text of an HTML document without tags.</returns>
Function GetTextFromHtml(ByVal htmlText As String) As String
    ' Replace every match of the tag pattern with the empty string.
    Dim output As String = Regex.Replace(htmlText, "\<[^\>]+\>", "")
    Return output
End Function
This example requires that you use the Imports statement to import the System.Text.RegularExpressions namespace. For more information, see Imports Statement (.NET Namespace and Type).
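As a usage illustration, the sketch below wraps the function in a console module and calls it on two sample strings. The module name StripTagsDemo and the sample inputs are assumptions made for this example. The second call shows that only tags are removed: character entities such as &amp; are left as they are.

```vb
Imports System
Imports System.Text.RegularExpressions

Module StripTagsDemo
    ' Same technique as the example above: delete every tag match.
    Function GetTextFromHtml(ByVal htmlText As String) As String
        Return Regex.Replace(htmlText, "\<[^\>]+\>", "")
    End Function

    Sub Main()
        ' Prints: Hello, world!
        Console.WriteLine(GetTextFromHtml("<p>Hello, <b>world</b>!</p>"))

        ' Prints: a &amp; b (the entity is not decoded)
        Console.WriteLine(GetTextFromHtml("<p>a &amp; b</p>"))
    End Sub
End Module
```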
See Also
Tasks
How to: Identify Hyperlinks in an HTML String in Visual Basic
How to: Strip Invalid Characters from a String