How to: Identify Text in an HTML String in Visual Basic

This example demonstrates how to use a simple regular expression to remove tags from an HTML document.

Example

HTML tags can be matched with the regular expression \<[^\>]+\>, which means:

  1. The character "<", followed by

  2. A set of one or more characters, not including the ">" character, followed by

  3. The character ">".
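To see what the pattern actually matches, the three steps above can be sketched with Regex.Matches, which returns each match instead of replacing it. The module name and the sample string are illustrative assumptions, not part of the original example.

    Imports System
    Imports System.Text.RegularExpressions

    Module TagPatternDemo
        Sub Main()
            ' Hypothetical sample input, chosen only for illustration.
            Dim sample As String = "<p>Hello, <b>world</b>!</p>"

            ' Enumerate every substring that the tag pattern matches.
            For Each m As Match In Regex.Matches(sample, "\<[^\>]+\>")
                Console.WriteLine(m.Value)
            Next
        End Sub
    End Module

For this input the loop prints <p>, <b>, </b>, and </p>, one per line. The text between the tags is never part of a match.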

This example uses the shared Regex.Replace(String, String, String) method, in the System.Text.RegularExpressions namespace, to replace every match of the tag regular expression with the empty string.

    ''' <summary>Removes the tags from an HTML document.</summary>
    ''' <param name="htmlText">HTML text to parse.</param>
    ''' <returns>The text of an HTML document without tags.</returns>
    Function GetTextFromHtml(ByVal htmlText As String) As String
        Dim output As String = Regex.Replace(htmlText, "\<[^\>]+\>", "")
        Return output
    End Function

This example requires that you use the Imports statement to import the System.Text.RegularExpressions namespace. For more information, see Imports Statement.
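As a minimal sketch of how the pieces fit together, the following console program places the Imports statement at the top of the file and calls the function on a small document. The module name, the Main procedure, and the sample HTML are assumptions made for illustration.

    Imports System
    Imports System.Text.RegularExpressions

    Module HtmlTextDemo
        ' Same approach as the example above: delete everything the tag pattern matches.
        Function GetTextFromHtml(ByVal htmlText As String) As String
            Return Regex.Replace(htmlText, "\<[^\>]+\>", "")
        End Function

        Sub Main()
            ' Hypothetical sample document.
            Dim html As String = "<html><body><h1>Title</h1><p>Some <i>text</i>.</p></body></html>"

            ' Prints: TitleSome text.
            Console.WriteLine(GetTextFromHtml(html))
        End Sub
    End Module

Because each tag is replaced with the empty string, text from adjacent elements runs together ("Title" and "Some" in the output above). The pattern removes only tags, so character entities such as &amp; are left unchanged in the result.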

See Also

Tasks

How to: Identify Hyperlinks in an HTML String in Visual Basic
How to: Strip Invalid Characters from a String

Other Resources

Parsing Strings in Visual Basic