HttpWebRequest 404 Error
A while back I developed an application that made requests to several web sites, scraped some data, and stored it in a database. The application worked great for several months without issue. Then one day it failed to get a successful response from the sites I was screen scraping. The first thing I checked was whether the publisher had updated their site and changed any critical HTML on me; no dice. So I began debugging and found that my code was failing on the call that requested the HTML page I needed. I kept getting an exception stating “The remote server returned an error: (404) Not Found.” The object I was using to make my request was HttpWebRequest, and, as I said, until then all had been fine in paradise. The code below is what I started with and had been running for months:
Imports System.Globalization
Imports System.IO
Imports System.Net
Imports System.Text

' Member of the scraper's class or module.
Private Function GetHTMLData(ByVal source As String) As String
    Dim request As HttpWebRequest = CType(WebRequest.Create(source), HttpWebRequest)
    ' No User-Agent is set, so the request goes out without that header.
    Using response As WebResponse = request.GetResponse()
        Using reader As New StreamReader(response.GetResponseStream(), Encoding.UTF8)
            Return reader.ReadToEnd().ToUpper(CultureInfo.CurrentCulture)
        End Using
    End Using
End Function
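When GetResponse throws, the WebException it raises often still carries the server's HTTP response, so the actual status code and description can be logged instead of just the exception message. The following is a minimal sketch, not part of the original application; DescribeWebError and TryFetch are hypothetical names.

```vbnet
Imports System
Imports System.Net

Module WebErrorDiagnostics
    ' Hypothetical helper: turns a WebException into a short, readable description.
    Public Function DescribeWebError(ByVal ex As WebException) As String
        ' When the server actually answered (as with the 404 above),
        ' ex.Response holds its HttpWebResponse.
        Dim httpResponse As HttpWebResponse = TryCast(ex.Response, HttpWebResponse)
        If httpResponse Is Nothing Then
            ' No HTTP response at all: DNS failure, timeout, refused connection, etc.
            Return "No HTTP response (" & ex.Status.ToString() & ")"
        End If
        Return "HTTP " & CInt(httpResponse.StatusCode).ToString() & " " & httpResponse.StatusDescription
    End Function

    ' Hypothetical usage: request a page and print either "OK" or the diagnosis.
    Public Sub TryFetch(ByVal source As String)
        Dim request As HttpWebRequest = CType(WebRequest.Create(source), HttpWebRequest)
        Try
            Using response As WebResponse = request.GetResponse()
                Console.WriteLine("OK")
            End Using
        Catch ex As WebException
            Console.WriteLine(DescribeWebError(ex))
            ' The error response is still open; release its connection.
            If ex.Response IsNot Nothing Then ex.Response.Close()
        End Try
    End Sub
End Module
```

A genuine 404 shows up here as a ProtocolError whose response reports status 404, which at least confirms the server itself answered and rejected the request.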
After a couple of days of searching the Internet, I still had no resolution. What made matters worse was that I could take the exact URL my code was requesting, paste it into my web browser, and see the data just fine. Same machine, same network: what could be happening? It did not help that a 404 normally means “Page Not Found,” yet the page clearly existed and contained all the information I expected. I finally decided to fire up Fiddler to see what was happening with my request. I made the request from my browser and then again from my application, compared the two, and found that the only difference was the User-Agent header. My code did not set it; the browser does. It turns out I can set the User-Agent to whatever I like, and as long as it is not blank, I get data back. So I inserted one line of code, it resolved the matter entirely, and now my application runs as smooth as a …. Below is what the code looks like now:
Imports System.Globalization
Imports System.IO
Imports System.Net
Imports System.Text

' Member of the scraper's class or module.
Private Function GetHTMLData(ByVal source As String) As String
    Dim request As HttpWebRequest = CType(WebRequest.Create(source), HttpWebRequest)
    ' The one-line fix: any non-blank User-Agent was enough for these sites.
    request.UserAgent = "Fiddler"
    Using response As WebResponse = request.GetResponse()
        Using reader As New StreamReader(response.GetResponseStream(), Encoding.UTF8)
            Return reader.ReadToEnd().ToUpper(CultureInfo.CurrentCulture)
        End Using
    End Using
End Function
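If an application creates requests in more than one place, one way to make sure none of them goes out with a blank User-Agent is to create them all through a single helper. The following is a minimal sketch under that assumption; CreateScraperRequest and the default string "Mozilla/5.0 (compatible; MyScraper/1.0)" are hypothetical, chosen only to show a value that names the application rather than borrowing Fiddler's.

```vbnet
Imports System
Imports System.Net

Module RequestFactory
    ' Hypothetical default; per the post, any non-blank value was accepted.
    Private Const DefaultUserAgent As String = "Mozilla/5.0 (compatible; MyScraper/1.0)"

    ' Hypothetical helper: creates an HttpWebRequest that never has an empty User-Agent.
    Public Function CreateScraperRequest(ByVal source As String,
                                         Optional ByVal userAgent As String = Nothing) As HttpWebRequest
        Dim request As HttpWebRequest = CType(WebRequest.Create(source), HttpWebRequest)
        ' Fall back to the default when the caller passes Nothing, "" or whitespace.
        If String.IsNullOrWhiteSpace(userAgent) Then
            request.UserAgent = DefaultUserAgent
        Else
            request.UserAgent = userAgent
        End If
        Return request
    End Function
End Module
```

GetHTMLData could then start with `Dim request As HttpWebRequest = CreateScraperRequest(source)` instead of setting the header itself.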
Comments
Anonymous
October 18, 2011
Amazing. Really Really very helpful :)
Anonymous
February 21, 2012
Thanks for your post. You're a lifesaver.
Anonymous
September 12, 2012
Absolutely wonderful..! Can you please tell me what consequences I can face in my application after using request.UserAgent = "Fiddler"?
Anonymous
October 13, 2015
It worked like a charm! Many thanks ;)