Get a Web Page's Title from a URL (C#)
I was creating an app that saves URLs copied to the clipboard into an XML file. This little bit of code came in handy, so I thought it'd be worth sharing.
This code makes sure the URL points to a valid HTML page by first checking that the request is an HTTP request, then checking the Content-Type header of the response. If it is an HTML page, the page is downloaded and a regular expression is used to pull out the <title> contents.
Example Code: GetWebPageTitle.zip
Uses namespaces: System.Net, System.Collections.Generic, System.Text.RegularExpressions
public static string GetWebPageTitle(string url)
{
    // Create a request to the url
    HttpWebRequest request = WebRequest.Create(url) as HttpWebRequest;
    // If the request wasn't an HTTP request (like a file), ignore it
    if (request == null) return null;
    // Use the user's credentials
    request.UseDefaultCredentials = true;
    // Obtain a response from the server; if there was an error, return nothing
    HttpWebResponse response = null;
    try { response = request.GetResponse() as HttpWebResponse; }
    catch (WebException) { return null; }
    if (response == null) return null;
    // Regular expression for an HTML title; the lazy quantifier stops at the first </title>
    string regex = @"(?<=<title[^>]*>)([\s\S]*?)(?=</title>)";
    // Dispose the response when done so its connection is released for later calls
    using (response)
    {
        // If the correct HTML header exists for HTML text, continue
        if (new List<string>(response.Headers.AllKeys).Contains("Content-Type")
            && response.Headers["Content-Type"].StartsWith("text/html"))
        {
            // Download the page (this is a second request for the same URL)
            using (WebClient web = new WebClient())
            {
                web.UseDefaultCredentials = true;
                string page = web.DownloadString(url);
                // Extract the title
                Regex ex = new Regex(regex, RegexOptions.IgnoreCase);
                return ex.Match(page).Value.Trim();
            }
        }
    }
    // Not a valid HTML page
    return null;
}
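The method above makes two requests and downloads the whole page, and it leaves character decoding to WebClient. As a rough sketch only, the variant below uses a single request, decodes the body with the charset named in the Content-Type header, and stops reading once a complete <title> element has been seen. The names TitleReader, ReadTitle, and EncodingFromContentType, the 64 KB read limit, and the UTF-8 fallback are assumptions made for illustration; they are not part of the original download.

```csharp
using System;
using System.IO;
using System.Net;
using System.Text;
using System.Text.RegularExpressions;

public static class TitleReader
{
    // Hypothetical helper: reads decoded text in 1 KB chunks and returns the
    // contents of the first complete <title> element, or null if none is
    // found before the reader ends or maxChars (an assumed limit) is reached.
    public static string ReadTitle(TextReader reader, int maxChars = 64 * 1024)
    {
        Regex titleRegex = new Regex(@"<title[^>]*>([\s\S]*?)</title>", RegexOptions.IgnoreCase);
        StringBuilder seen = new StringBuilder();
        char[] buffer = new char[1024];
        int read;
        while (seen.Length < maxChars && (read = reader.Read(buffer, 0, buffer.Length)) > 0)
        {
            seen.Append(buffer, 0, read);
            // Re-scanning the accumulated text is simple rather than fast;
            // the maxChars limit keeps the cost bounded.
            Match match = titleRegex.Match(seen.ToString());
            if (match.Success) return match.Groups[1].Value.Trim();
        }
        return null;
    }

    // Hypothetical helper: picks the encoding named by a Content-Type value
    // such as "text/html; charset=utf-8". Falling back to UTF-8 when no usable
    // charset is present is an assumption of this sketch.
    public static Encoding EncodingFromContentType(string contentType)
    {
        if (contentType != null)
        {
            Match m = Regex.Match(contentType, @"charset\s*=\s*""?([^\s;""]+)", RegexOptions.IgnoreCase);
            if (m.Success)
            {
                try { return Encoding.GetEncoding(m.Groups[1].Value); }
                catch (ArgumentException) { /* unknown charset name: use the fallback */ }
            }
        }
        return Encoding.UTF8;
    }

    // Single-request variant of GetWebPageTitle built on the two helpers above.
    public static string GetWebPageTitle(string url)
    {
        HttpWebRequest request = WebRequest.Create(url) as HttpWebRequest;
        if (request == null) return null;
        request.UseDefaultCredentials = true;
        try
        {
            using (HttpWebResponse response = (HttpWebResponse)request.GetResponse())
            {
                string contentType = response.ContentType;
                if (contentType == null ||
                    !contentType.StartsWith("text/html", StringComparison.OrdinalIgnoreCase))
                    return null;
                using (Stream stream = response.GetResponseStream())
                using (StreamReader reader = new StreamReader(stream, EncodingFromContentType(contentType)))
                {
                    return ReadTitle(reader);
                }
            }
        }
        catch (WebException) { return null; }
        catch (IOException) { return null; }
    }
}
```

ReadTitle takes any TextReader, so it can be tried on a string without touching the network, for example TitleReader.ReadTitle(new StringReader("<title>Test</title>")). Pages that declare their charset only in a <meta> tag are not handled here beyond the UTF-8 fallback.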
Comments
Anonymous
February 19, 2007
Handy, but a bit expensive; it requires downloading the entire page in order to get at something that is usually at the start of the HTML.
Anonymous
February 19, 2007
Peter, you're right. This piece of code is obviously not designed for performance-critical scenarios, but it is easy and works well for my client-side apps. If performance were a concern, one could combine it into a single request that downloads byte by byte and uses text parsing (instead of a RegEx) as the file is downloaded to look at the header and body, and stops downloading once the wrong header or the </title> tag is found.
Anonymous
March 02, 2007
It's too bad everyone gripes about the expense. If it does what you want, then use it. If not, or if you have a better solution, back it up with the code for the rest of us to look at.
Anonymous
May 22, 2008
Thanks for the regular expression: @"(?<=<title.*>)([\s\S]*)(?=</title>)"; That's what I wanted... Thanks
Anonymous
January 06, 2009
No really, I think that if you look more you will get a free version that does the same thing. Just try more.
Anonymous
November 24, 2009
Thanks for your... Can you help me obtain the title of a live score website, in which the title changes when the page gets refreshed?
Anonymous
March 28, 2010
My homegrown bookmark utility didn't work with Chrome because it doesn't use the "FileGroupDescriptor" like FF and IE do. Your code just fixed that! That it takes two requests is a small price to pay for being able to use Chrome! Thank you very much.
Anonymous
May 01, 2010
Hi! I tried to use your code in my app and it works mostly, but when I try to get the title of a webpage like http://www.arabic-keyboard.org/ it shows this: "Arabic Keyboard ™ Ů„ŮŘŘ© المŮŘ§ŘŞŮŠŘ Ř§Ů„ŘąŘ±Ř¨ŮŠŘ©". I tried using System.Web.HttpUtility.HtmlDecode, but it still doesn't work. Is it the encoding or something else? Does anyone know a solution to this problem? Thanks!
Anonymous
September 08, 2010
Hi, nice example. I noticed, however, that a repeated call won't come back, at least in my config with .NET 4. Closing the response resolves this issue ;) --> response.Close();
Anonymous
September 07, 2011
Nice, was looking exactly for this.