How to reverse engineer a shortened URL

URL shortening services have become very popular as Twitter keeps climbing the "top web sites" charts. Twitter limits each message to 140 characters, so users need a way to save space for text, and URL shortening services help a lot.

A URL shortening service, such as tiny.cc or bit.ly, shrinks a long URL into a short form and saves both the short form and the original URL in its database. When a request arrives for the shortened URL, the service looks up the original URL in the database and redirects the user to the original web site.
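The store-and-look-up idea described above can be sketched in a few lines. This is only a toy model: ToyShortener, its base-62 keys, and the in-memory dictionary are assumptions made for illustration, not how tiny.cc or bit.ly actually work.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical in-memory stand-in for a shortening service's database.
public class ToyShortener
{
    private const string Alphabet =
        "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ";

    private readonly Dictionary<string, string> _store =
        new Dictionary<string, string>();
    private long _nextId = 1;

    // Saves the long URL and returns the short key that now maps to it.
    public string Shorten(string longUrl)
    {
        string key = Encode(_nextId++);
        _store[key] = longUrl;
        return key;
    }

    // Looks up a key. A real service would answer with an HTTP redirect
    // whose Location header carries the stored URL.
    public bool TryResolve(string key, out string longUrl)
    {
        return _store.TryGetValue(key, out longUrl);
    }

    // Writes the numeric id in base 62, most significant digit first.
    private static string Encode(long id)
    {
        var digits = new Stack<char>();
        do
        {
            digits.Push(Alphabet[(int)(id % 62)]);
            id /= 62;
        } while (id > 0);
        return new string(digits.ToArray());
    }
}
```

The short key carries no trace of the original host or path, which is why the only way to learn the original URL is to ask the service.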

For example, the link https://blogs.msdn.com/b/amb/archive/2011/02/02/tmg-may-not-log-to-sql-server-if-tmg-service-pack-1-is-installed.aspx is 117 characters long. If I used it in a tweet, I would have only 23 characters left for my own text. If I ran it through a URL shortening service such as https://bit.ly, the link would turn into https://bit.ly/hlqwPI, which is 21 characters long and leaves 119 characters for the rest of the tweet.

Needless to say, URL shortening services have other benefits as well, but those are out of scope for this post. For more information, you can visit one of the services or search the Internet.

So, why would we need to know the original URL?

There are several answers to this question, but one of them is pretty obvious: security. An attacker can hide a link to a “harmful” web site or content behind a short URL, and the short URL tells you nothing about the FQDN (e.g.: https://blogs.msdn.com/) or the page/path information (e.g.: /b/amb/archive/…) of the original URL.

Of course, several web sites, such as https://longurl.org/, help you find the original URL behind a shortened one, but as a developer it is nice to understand how to do this yourself.

So I decided to develop a simple web application to find the original URL for a given shortened one.

The application takes a shortened URL from a text box, and the form button’s click event handler does the magic. This turned out to be a really simple task. Here is the code snippet that does the main job:

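        // Requires: using System.Net;
        // Shortened URL typed in by the user
        string url = TextBox1.Text;
        HttpWebRequest request = (HttpWebRequest)(WebRequest.Create(url));
        // GetResponse follows redirects by default (AllowAutoRedirect is true)
        using (HttpWebResponse response = (HttpWebResponse)request.GetResponse())
        {
            // ResponseUri is the URI that finally answered, i.e. the original URL
            string uriString = response.ResponseUri.AbsoluteUri;
            Label1.Text = uriString;
        }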
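The snippet above passes the text box value straight to WebRequest.Create. An empty or malformed value makes that call throw, and a non-HTTP scheme such as file:// makes the cast to HttpWebRequest fail. Below is a minimal sketch of a check you could run first. ShortUrlInput and TryParseHttpUrl are hypothetical names, not part of the original application.

```csharp
using System;

public static class ShortUrlInput
{
    // Returns true only for absolute http:// or https:// URLs.
    // On failure, uri is set to null.
    public static bool TryParseHttpUrl(string input, out Uri uri)
    {
        uri = null;
        if (string.IsNullOrWhiteSpace(input))
        {
            return false;
        }

        Uri candidate;
        if (!Uri.TryCreate(input.Trim(), UriKind.Absolute, out candidate))
        {
            return false;
        }

        // Reject file://, ftp:// and anything else that is not plain web traffic.
        if (candidate.Scheme != Uri.UriSchemeHttp &&
            candidate.Scheme != Uri.UriSchemeHttps)
        {
            return false;
        }

        uri = candidate;
        return true;
    }
}
```

In the click handler you could call ShortUrlInput.TryParseHttpUrl(TextBox1.Text, out uri) and show a message in Label1 when it returns false. Because the request is made from the web server, you may also want to refuse host names that point into your own network.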

The logic is very simple: we make an HTTP request to the shortened URL, let HttpWebRequest follow the redirect, and read the address of the response that finally comes back.

Because the URL shortening service redirects the user to the original link, its response to the client is a redirect (such as HTTP 301 or 302) with the original URL in the “Location” header. HttpWebRequest follows redirects automatically by default, so the HttpWebResponse object’s ResponseUri.AbsoluteUri property ends up holding the address that finally responded, which is the original URL.

As the code above shows, the user enters the shortened URL in a text box (TextBox1), and the original URL is written to a label control (Label1). I am sure that you can easily modify the code for your needs.

Oops...I cannot get the correct “original URL” for some URLs shortened by some services...

While I was testing my simple project, the application did not work as expected for some shortened URLs. One example is links from shortening services that focus on Facebook links, such as the ones that start with https://fb.me/. Every time I tested an fb.me link, my application gave me the same URL: https://www.facebook.com/common/browser.php

When I browsed to the URL above, I understood that fb.me was not happy with my user-agent. A user-agent string identifies the client (your browser, for example) that makes a request. Because the code snippet above does not send a user-agent, my application failed with URL shortening services that check the client’s user-agent. This is probably done to keep out crawlers or automated tools/sites that are not clever enough to provide user-agent information.

The solution – let’s make the application more clever

I simply took the user-agent from the request to my web application (with Request.ServerVariables["HTTP_USER_AGENT"]) and set it on the request object that makes the HTTP call to the shortened URL:

        string url = TextBox1.Text;
        // User-agent sent by the visitor's browser to this web application
        string userAgent = Request.ServerVariables["HTTP_USER_AGENT"];
        HttpWebRequest request = (HttpWebRequest)(WebRequest.Create(url));
        // Forward the same user-agent to the URL shortening service
        request.UserAgent = userAgent;
        using (HttpWebResponse response = (HttpWebResponse)request.GetResponse())
        {
            string uriString = response.ResponseUri.AbsoluteUri;
            Label1.Text = uriString;
        }
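One gap remains: if the visitor's own client sends no User-Agent header, the value read from Request.ServerVariables may be null or empty, and the outgoing request is back where it started. The sketch below falls back to a fixed string in that case. UserAgentHelper and the fallback value are assumptions for illustration, and I have not verified that any particular shortening service accepts this value.

```csharp
using System;

public static class UserAgentHelper
{
    // Hypothetical fallback value; replace it with one that describes your tool.
    public const string FallbackUserAgent =
        "Mozilla/5.0 (compatible; UrlExpander/1.0)";

    // Returns the incoming user-agent, or the fallback when none was sent.
    public static string Choose(string incoming)
    {
        return string.IsNullOrWhiteSpace(incoming)
            ? FallbackUserAgent
            : incoming.Trim();
    }
}
```

With this helper, the assignment becomes request.UserAgent = UserAgentHelper.Choose(userAgent);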

I then tested fb.me URLs again, and the results were as expected. Needless to say, I have not tested my application with every URL shortening service, so you may find bugs with some of them. If so, please let me know in the comments section below.

<UPDATE>

Thanks to Richard Deeming (please see the comments section), there is a better approach that avoids an unnecessary round trip between the client (in this case the IIS server) and the target server:

You can set the request object's AllowAutoRedirect property to false and read the response's Location header directly. Here is what you need:

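        // Before calling GetResponse: do not follow the redirect
        request.AllowAutoRedirect = false;
        // Inside the using block, instead of reading response.ResponseUri
        uriString = response.Headers["Location"];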

As I said, this avoids an unnecessary round trip between IIS and the target web server, because the application never requests the original page itself.
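Putting the two lines above together with the status code check mentioned in the comment gives roughly the following. Treat it as a sketch under a few assumptions: RedirectReader and its method names are hypothetical, the Location header is resolved against the request URL in case the service returns a relative value, and only the first redirect is read.

```csharp
using System;
using System.Net;

public static class RedirectReader
{
    // The four redirect statuses named in the comment: 301, 302, 303 and 307.
    public static bool IsRedirect(HttpStatusCode status)
    {
        return status == HttpStatusCode.Moved
            || status == HttpStatusCode.Redirect
            || status == HttpStatusCode.RedirectMethod
            || status == HttpStatusCode.RedirectKeepVerb;
    }

    // A Location header may be relative, so resolve it against the requested URL.
    // Returns null when the header is missing or cannot be parsed.
    public static string ResolveLocation(Uri requestUri, string location)
    {
        if (string.IsNullOrEmpty(location))
        {
            return null;
        }

        Uri target;
        return Uri.TryCreate(requestUri, location, out target)
            ? target.AbsoluteUri
            : null;
    }

    // Returns the redirect target of shortUrl, or null if the service
    // did not answer with one of the redirect statuses above.
    public static string GetRedirectTarget(string shortUrl, string userAgent)
    {
        HttpWebRequest request = (HttpWebRequest)WebRequest.Create(shortUrl);
        request.AllowAutoRedirect = false; // read the redirect, do not follow it
        request.UserAgent = userAgent;

        using (HttpWebResponse response = (HttpWebResponse)request.GetResponse())
        {
            if (!IsRedirect(response.StatusCode))
            {
                return null;
            }
            return ResolveLocation(request.RequestUri, response.Headers["Location"]);
        }
    }
}
```

In the click handler this would be used as Label1.Text = RedirectReader.GetRedirectTarget(url, userAgent) ?? "No redirect found";. Keep in mind that GetResponse still throws a WebException for 4xx and 5xx answers, and that a shortened URL can point to another shortened URL, in which case you would have to call the method again on the result to reach the final address.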

</UPDATE>

References

HttpWebRequest Class
https://msdn.microsoft.com/en-us/library/system.net.httpwebrequest.aspx

HttpWebResponse Class
https://msdn.microsoft.com/en-us/library/system.net.httpwebresponse.aspx

How to use HttpWebRequest and HttpWebResponse in .NET
https://www.codeproject.com/KB/IP/httpwebrequest_response.aspx

Applies To

.NET Framework

--
AMB

Comments

  • Anonymous
    February 13, 2011
    That's a nice one!
  • Anonymous
    February 13, 2011
    Awesome post!
  • Anonymous
    February 15, 2011
    Nice! I did something similar a while back, but I decided I didn't want to make an unnecessary request to the original URL. Instead, I set the AllowAutoRedirect property of the request to false, checked the response for one of the redirection status codes (Moved, Redirect, RedirectMethod or RedirectKeepVerb), and read the value of the Location response header.