Build a Google Reverse Proxy Site on Azure Web App in Less than 3 Minutes

I currently live in China, where many popular websites, such as Google and Facebook, are not accessible. So I wrote this article to share the easiest way to make these sites accessible by using Microsoft Azure. I'll use Google as the example.

In this article, I first explain why I chose Azure Web App, then give step-by-step guidance, and finally explain the configurations in detail.

There Are Many Ways to Make These Sites Accessible

1. Set up a VPN by using Azure Virtual Machine.

2. Create an Azure CDN with Google as the origin server.

3. Host a REST service, such as Web API, on Web Role or Web App as a relay.

4. Use ARR and URL Rewrite modules in IIS to set up a reverse proxy service in a virtual machine.

5. Use ARR and URL Rewrite modules to build a reverse proxy service on Azure Web App.

Approach 1 is the ultimate solution, which allows you to access any public website (be aware of DNS cache poisoning, though). However, it requires you to create a virtual machine, and to purchase a certificate if you want to share it with others. Not everyone wants that overhead.

Approach 2 is simple, but it doesn't work for server redirects or for references to the backend server that use absolute URLs.

Approach 3 requires you to write some code, which can be tedious.

Approach 4 also requires you to create a virtual machine, which is more expensive.

Approach 5 is the most lightweight approach, so I will share it in detail. How lightweight is it? Nothing other than a web browser is required, and you don't even need an Azure subscription to go through the steps below.

When Should You Build a Reverse Proxy on Azure Web App?

1. The backend website is publicly accessible from Azure, but is not accessible in your region.

2. You can access the global instance of Microsoft Azure.

3. You have an Azure subscription (global instance) or you are willing to create one.

4. You probably want to share the site with others.

Step-by-step Guidance

(If you already have an Azure subscription and know how to create a Web App, you can skip to step 3.)

1. Go to https://tryappservice.azure.com in your web browser. In the Select your app section, choose Web App, and click the Next button.

2. In the Select a template and create your Web App section, choose Empty Site, and click the Create button. When asked to log in, use your Microsoft Account.

3. Now your web app is created. In my case, it is https://efe73b37-0ee0-4-231-b9ee.azurewebsites.net/ (a temporary site that will no longer be available when you read this article). Below, I use "xyz" in place of the "efe73b37-0ee0-4-231-b9ee" part. Yours will look similar, differing only in the first part of the domain name. Don't visit your new site yet! If you happen to do so, follow steps a-d after step 14.

4. Go to the SCM (Kudu) site of your web app: https://xyz.scm.azurewebsites.net/. Note that you need to insert "scm." before "azurewebsites.net". This is the administration site for your web app.

5. Navigate to Debug console -> CMD.

6. Navigate to the site folder.

7. In the CMD console, run echo 1 > applicationHost.xdt to create the applicationHost.xdt file.

8. You will see that the file applicationHost.xdt has been created. Click the Edit icon to edit it.

9. Replace all contents in the editor with the text below. Then click the Save button.

<?xml version="1.0"?>
<configuration xmlns:xdt="http://schemas.microsoft.com/XML-Document-Transform">
  <system.webServer>
    <proxy xdt:Transform="InsertIfMissing" enabled="true" preserveHostHeader="false" reverseRewriteHostInResponseHeaders="false" />
    <rewrite>
      <allowedServerVariables>
        <add name="HTTP_ACCEPT_ENCODING" xdt:Transform="InsertIfMissing" xdt:Locator="Match(name)" />
        <add name="HTTP_X_ORIGINAL_HOST" xdt:Transform="InsertIfMissing" xdt:Locator="Match(name)" />
      </allowedServerVariables>
    </rewrite>
  </system.webServer>
</configuration>

Note: revised on 1/14/2018 to change Insert to InsertIfMissing for HTTP_ACCEPT_ENCODING and HTTP_X_ORIGINAL_HOST, as advised by David Ebbo.

10. Navigate to the wwwroot folder.

11. In the CMD console, run echo 1 > web.config to create the web.config file.

12. You will see the file web.config created. Click the Edit icon to edit it.

13. Replace all contents in the editor with the text below. Then click the Save button.

<configuration>
  <system.webServer>
    <httpErrors errorMode="Detailed" />
    <rewrite>
      <rules>
        <rule name="ForceSSL" stopProcessing="true">
          <match url="^(.*)" />
          <conditions>
            <add input="{HTTPS}" pattern="^off$" ignoreCase="true" />
          </conditions>
          <action type="Redirect" url="https://{HTTP_HOST}/{R:1}" redirectType="Permanent" />
        </rule>
        <rule name="ProxyGStatic" stopProcessing="true">
          <match url="^gstatic/(.*)" />
          <action type="Rewrite" url="https://encrypted-tbn1.gstatic.com/{R:1}" />
          <serverVariables>
            <set name="HTTP_ACCEPT_ENCODING" value="" />
            <set name="HTTP_X_ORIGINAL_HOST" value="{HTTP_HOST}" />
          </serverVariables>
        </rule>
        <rule name="ProxySSLGStatic" stopProcessing="true">
          <match url="^sslgstatic/(.*)" />
          <action type="Rewrite" url="https://ssl.gstatic.com/{R:1}" />
          <serverVariables>
            <set name="HTTP_ACCEPT_ENCODING" value="" />
            <set name="HTTP_X_ORIGINAL_HOST" value="{HTTP_HOST}" />
          </serverVariables>
        </rule>
        <rule name="Proxy" stopProcessing="true">
          <match url="(.*)" />
          <action type="Rewrite" url="https://www.google.com.sg/{R:1}" />
          <serverVariables>
            <set name="HTTP_ACCEPT_ENCODING" value="" />
            <set name="HTTP_X_ORIGINAL_HOST" value="{HTTP_HOST}" />
          </serverVariables>
        </rule>
      </rules>
      <outboundRules>
        <preConditions>
          <preCondition name="IsHTML">
            <add input="{RESPONSE_CONTENT_TYPE}" pattern="^text/html" />
          </preCondition>
          <preCondition name="IsJson">
            <add input="{RESPONSE_CONTENT_TYPE}" pattern="^application/json" />
          </preCondition>
        </preConditions>
        <rule name="ChangeReferencesToOriginalUrl" preCondition="IsHTML">
          <match filterByTags="A, Area, Base, Form, Frame, Head, IFrame, Img, Input, Link, Script" pattern="^https://www.google.com.sg/(.*)" />
          <action type="Rewrite" value="https://{HTTP_X_ORIGINAL_HOST}/{R:1}" />
        </rule>
        <rule name="ChangeReferencesToOriginalUrlInJson" patternSyntax="ExactMatch" preCondition="IsJson">
          <match pattern="www.google.com.sg" />
          <action type="Rewrite" value="{HTTP_X_ORIGINAL_HOST}" />
        </rule>
        <rule name="ChangeGStaticReferencesToOriginalUrl" preCondition="IsHTML">
          <match filterByTags="A, Area, Base, Form, Frame, Head, IFrame, Img, Input, Link, Script" pattern="^https://encrypted-tbn[0-9].gstatic.com/(.*)" />
          <action type="Rewrite" value="https://{HTTP_X_ORIGINAL_HOST}/gstatic/{R:1}" />
        </rule>
        <rule name="ChangeGStaticReferencesToOriginalUrlInJson" preCondition="IsJson">
          <match pattern="encrypted-tbn[0-9]\.gstatic\.com" />
          <action type="Rewrite" value="{HTTP_X_ORIGINAL_HOST}/gstatic" />
        </rule>
        <rule name="ChangeSSLGstaticReferencesToOriginalUrl" patternSyntax="ExactMatch" preCondition="IsHTML">
          <match pattern="ssl.gstatic.com" />
          <action type="Rewrite" value="{HTTP_X_ORIGINAL_HOST}/sslgstatic" />
        </rule>
        <rule name="RewriteBackendRelativeUrlsInRedirects" preCondition="IsHTML">
          <match serverVariable="RESPONSE_LOCATION" pattern="^https://www.google.com.sg/(.*)" />
          <action type="Rewrite" value="https://{HTTP_X_ORIGINAL_HOST}/{R:1}" />
        </rule>
      </outboundRules>
    </rewrite>
  </system.webServer>
</configuration>

14. Now you can browse https://xyz.azurewebsites.net/. Voila! The main page of Google is displayed.

If you visited the site before editing the applicationHost.xdt file, you need to restart the web app with the following steps, since applicationHost.xdt is only processed when the web app starts.

a. Navigate to Process Explorer.

b. Click the Properties... button for w3wp.exe (the one without "scm").

c. In the Properties pop-up window, click the red Kill button.

d. Now you can visit https://xyz.azurewebsites.net/. If you still see the old page, refresh several times (with Ctrl + F5 in IE to bypass the cache). The w3wp.exe process is recreated automatically when you visit the site.
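Kudu, the SCM site used in step 4 and in steps a-d, follows a fixed naming pattern for default host names. The hypothetical helper below derives it from the site URL; it is a minimal Python sketch that assumes a default *.azurewebsites.net host name (custom domains are not handled).

```python
from urllib.parse import urlsplit, urlunsplit

SUFFIX = ".azurewebsites.net"

def kudu_url(site_url: str) -> str:
    """Return the Kudu (SCM) URL for a default *.azurewebsites.net site URL.

    The Kudu host is the app name followed by ".scm.azurewebsites.net".
    """
    host = urlsplit(site_url).hostname
    if host is None or not host.endswith(SUFFIX):
        raise ValueError(f"not a default Azure Web App URL: {site_url}")
    app_name = host[: -len(SUFFIX)]  # e.g. "xyz"
    return urlunsplit(("https", f"{app_name}.scm{SUFFIX}", "/", "", ""))

print(kudu_url("https://xyz.azurewebsites.net/"))  # https://xyz.scm.azurewebsites.net/
```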

Explanations in Detail

The reverse proxy Web App works as follows.

The user sends a request to your web app at https://xyz.azurewebsites.net. When the request reaches your web app, the URL Rewrite rules configured in your web app kick in: the request URL is changed to https://www.google.com.sg, the Accept-Encoding HTTP header is cleared, and the modified request is forwarded to Google.

Google sends back an uncompressed response to your web app. Your web app scans the response, replaces references to https://www.google.com.sg/* with https://xyz.azurewebsites.net/* (along with a few other minor changes), and sends the modified response back to the user. The user sees the response as if your web app hosted a clone of the Google site.
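The round trip above can be sketched in a few lines of Python. This is only a simulation of what the Proxy rule and its server variables do, not how IIS, ARR, or URL Rewrite are implemented; the function names are hypothetical, and gstatic handling and query-string processing are left out. The point it shows is how the host name saved in X-Original-Host on the way in is reused to rewrite links on the way out.

```python
import re

BACKEND = "https://www.google.com.sg"

def inbound(path: str, headers: dict[str, str]) -> tuple[str, dict[str, str]]:
    """Mimic the Proxy rule: build the backend URL and adjust request headers."""
    forwarded = dict(headers)
    forwarded["Accept-Encoding"] = ""               # HTTP_ACCEPT_ENCODING set to ""
    forwarded["X-Original-Host"] = headers["Host"]  # HTTP_X_ORIGINAL_HOST = {HTTP_HOST}
    forwarded["Host"] = "www.google.com.sg"         # preserveHostHeader="false"
    return f"{BACKEND}/{path}", forwarded

def outbound(body: str, request_headers: dict[str, str]) -> str:
    """Mimic an outbound rule that writes {HTTP_X_ORIGINAL_HOST} into links."""
    host = request_headers["X-Original-Host"]
    return re.sub(re.escape(BACKEND) + "/", f"https://{host}/", body)

url, fwd = inbound("search?q=azure", {"Host": "xyz.azurewebsites.net", "Accept-Encoding": "gzip, deflate"})
print(url)  # https://www.google.com.sg/search?q=azure
page = '<a href="https://www.google.com.sg/preferences">Settings</a>'
print(outbound(page, fwd))  # <a href="https://xyz.azurewebsites.net/preferences">Settings</a>
```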

Steps 1-3: A Web App created with Try Azure App Service lasts for only one hour for you to play with. If you find this approach useful, why not start with a free Azure subscription today?

Step 7: ApplicationHost.xdt is a Web App extensibility feature that lets you transform the ApplicationHost.config file of your website. For more details, read this article.
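The switch from Insert to InsertIfMissing mentioned in the note under step 9 is about not duplicating entries that another transform (for example, one from Application Insights, per the comment at the end of this article) may already have added. The Python simulation below contrasts the two behaviors on an ElementTree copy of the allowedServerVariables list. It assumes InsertIfMissing is keyed on the name attribute, which is what an xdt:Locator="Match(name)" expresses; it is not the real XDT engine, and the helper names are hypothetical.

```python
import xml.etree.ElementTree as ET

WANTED = ["HTTP_ACCEPT_ENCODING", "HTTP_X_ORIGINAL_HOST"]

def existing_list() -> ET.Element:
    # Hypothetical starting point: another extension already allowed HTTP_ACCEPT_ENCODING.
    return ET.fromstring(
        '<allowedServerVariables><add name="HTTP_ACCEPT_ENCODING" /></allowedServerVariables>'
    )

def insert(parent: ET.Element, name: str) -> None:
    """Like xdt:Transform="Insert": always appends a new <add> element."""
    parent.append(ET.Element("add", name=name))

def insert_if_missing(parent: ET.Element, name: str) -> None:
    """Like InsertIfMissing keyed on name: appends only if no <add> has that name."""
    if all(child.get("name") != name for child in parent.findall("add")):
        parent.append(ET.Element("add", name=name))

def names(parent: ET.Element) -> list[str]:
    return [child.get("name") for child in parent.findall("add")]

with_insert = existing_list()
with_if_missing = existing_list()
for variable in WANTED:
    insert(with_insert, variable)
    insert_if_missing(with_if_missing, variable)

print(names(with_insert))      # ['HTTP_ACCEPT_ENCODING', 'HTTP_ACCEPT_ENCODING', 'HTTP_X_ORIGINAL_HOST']
print(names(with_if_missing))  # ['HTTP_ACCEPT_ENCODING', 'HTTP_X_ORIGINAL_HOST']
```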

Step 9: The proxy tag enables the reverse proxy functionality of Application Request Routing (ARR). The allowedServerVariables tag allows the listed server variables to be set in URL Rewrite rules; I discuss the rules in detail under step 13. Without the allowedServerVariables tag, you will get a URL Rewrite Module Error such as 'The server variable "HTTP_ACCEPT_ENCODING" is not allowed to be set. Add the server variable name to the allowed server variable list.'
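A quick way to catch the "not allowed to be set" error before deploying is to compare the variables set in web.config with those added in applicationHost.xdt. The sketch below does this with Python's xml.etree.ElementTree on trimmed copies of the two files. The helpers are hypothetical, and they only see these two files, so a variable already allowed elsewhere in the server's applicationHost.config would still be reported as missing.

```python
import xml.etree.ElementTree as ET

# Trimmed copies of the files from steps 9 and 13 (only the parts the check reads).
APPLICATION_HOST_XDT = """<configuration xmlns:xdt="http://schemas.microsoft.com/XML-Document-Transform">
  <system.webServer>
    <rewrite>
      <allowedServerVariables>
        <add name="HTTP_ACCEPT_ENCODING" xdt:Transform="InsertIfMissing" />
        <add name="HTTP_X_ORIGINAL_HOST" xdt:Transform="InsertIfMissing" />
      </allowedServerVariables>
    </rewrite>
  </system.webServer>
</configuration>"""

WEB_CONFIG = """<configuration>
  <system.webServer>
    <rewrite>
      <rules>
        <rule name="Proxy" stopProcessing="true">
          <match url="(.*)" />
          <action type="Rewrite" url="https://www.google.com.sg/{R:1}" />
          <serverVariables>
            <set name="HTTP_ACCEPT_ENCODING" value="" />
            <set name="HTTP_X_ORIGINAL_HOST" value="{HTTP_HOST}" />
          </serverVariables>
        </rule>
      </rules>
    </rewrite>
  </system.webServer>
</configuration>"""

def allowed_variables(xdt_text: str) -> set[str]:
    """Names listed under <allowedServerVariables> in the transform."""
    root = ET.fromstring(xdt_text)
    return {add.get("name") for group in root.iter("allowedServerVariables") for add in group.findall("add")}

def used_variables(web_config_text: str) -> set[str]:
    """Names assigned by <serverVariables><set .../> in the rewrite rules."""
    root = ET.fromstring(web_config_text)
    return {item.get("name") for group in root.iter("serverVariables") for item in group.findall("set")}

not_allowed = used_variables(WEB_CONFIG) - allowed_variables(APPLICATION_HOST_XDT)
print(sorted(not_allowed))  # []
```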

Step 13: Web.config file is used mainly to describe URL Rewrite rules. See this article for how to write rules for URL Rewrite 2.0.

The httpErrors tag turns on detailed error messages, which helps with debugging if a URL Rewrite module error occurs. The rules in the rules (inbound rules) tag, except ForceSSL, define how the web request is pre-processed before it is forwarded to Google.

Rule ForceSSL redirects HTTP traffic to the HTTPS endpoint of your site.
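A Python sketch of the ForceSSL decision, assuming the IIS {HTTPS} server variable carries "on" or "off" and that {R:1} is the URL path without its leading slash. The function name is hypothetical, and the query string is ignored here.

```python
import re
from typing import Optional, Tuple

def force_ssl(https_flag: str, host: str, path: str) -> Optional[Tuple[int, str]]:
    """Return (status, Location) when the request must be redirected, else None."""
    if re.match(r"^off$", https_flag, re.IGNORECASE):  # condition ignoreCase="true"
        return 301, f"https://{host}/{path}"            # redirectType="Permanent"
    return None

print(force_ssl("off", "xyz.azurewebsites.net", "search"))  # (301, 'https://xyz.azurewebsites.net/search')
print(force_ssl("on", "xyz.azurewebsites.net", "search"))   # None
```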

Rule ProxyGStatic rewrites URLs of the form https://xyz.azurewebsites.net/gstatic/* to https://encrypted-tbn1.gstatic.com/*. It works together with the outbound rules ChangeGStaticReferencesToOriginalUrl and ChangeGStaticReferencesToOriginalUrlInJson. Rule ProxySSLGStatic serves the same purpose for https://ssl.gstatic.com/*, which is exposed as https://xyz.azurewebsites.net/sslgstatic/*.

Rule Proxy rewrites URLs of the form https://xyz.azurewebsites.net/* to https://www.google.com.sg/*. This is the key part of the configuration; all other rules only improve the user experience of the web app.
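The order of the inbound rules matters: because every proxy rule has stopProcessing="true", the first match wins, so the gstatic/ and sslgstatic/ prefixes must be checked before the catch-all Proxy rule. A Python sketch of that routing, with a hypothetical route function, patterns copied from web.config, and the query string left out:

```python
import re

# (rule name, match url pattern, Rewrite target) in the same order as in web.config.
INBOUND_RULES = [
    ("ProxyGStatic", r"^gstatic/(.*)", "https://encrypted-tbn1.gstatic.com/{}"),
    ("ProxySSLGStatic", r"^sslgstatic/(.*)", "https://ssl.gstatic.com/{}"),
    ("Proxy", r"(.*)", "https://www.google.com.sg/{}"),
]

def route(path: str) -> tuple[str, str]:
    """Return (rule name, backend URL) for a URL path without the leading slash.

    Rules are tried in order and the first match wins, as stopProcessing="true" implies.
    """
    for name, pattern, target in INBOUND_RULES:
        match = re.match(pattern, path, re.IGNORECASE)  # URL Rewrite ignores case by default
        if match:
            return name, target.format(match.group(1))
    raise LookupError(path)  # unreachable in practice: "(.*)" matches everything

for p in ["gstatic/images", "sslgstatic/gb/images/a.png", "search", ""]:
    print(p or "(root)", "->", route(p))
```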

The serverVariables tags in the inbound rules clear the Accept-Encoding HTTP header and save the original host name in HTTP_X_ORIGINAL_HOST so that the outbound rules can use it. Clearing the Accept-Encoding header is important: otherwise Google returns compressed content, and the outbound rules cannot be applied to it. I configured allowedServerVariables in applicationHost.xdt so that these server variables can be set in URL Rewrite rules.
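Why compressed content defeats the outbound rules can be seen with Python's gzip module: the backend URL is plainly visible in the raw HTML but not in the gzip bytes, so a pattern match has nothing to find. The HTML fragment is made up for illustration.

```python
import gzip

BACKEND_PREFIX = b"https://www.google.com.sg/"

# A made-up HTML fragment standing in for a Google page.
body = b'<a href="https://www.google.com.sg/intl/en/about.html">About</a>\n' * 20
compressed = gzip.compress(body)

print(BACKEND_PREFIX in body)        # True: a rewrite rule would find the URL
print(BACKEND_PREFIX in compressed)  # False: the gzip bytes no longer contain it as text
print(len(body), len(compressed))    # the compressed body is much smaller
```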

The rules in the outboundRules tag define how the responses from Google are post-processed in the web app before they are sent back to the user.

Rule ChangeReferencesToOriginalUrl changes references to Google (such as hyperlinks and images) in HTML responses so that they go through your web app.

Rule ChangeReferencesToOriginalUrlInJson is to change all references to Google in JSON responses to your web app.
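The difference between these two rules is scope. Below is a Python approximation that assumes filterByTags can be modeled as rewriting only href, src, and action attribute values (the real module knows which attribute each listed tag uses), while the ExactMatch JSON rule is assumed to replace every occurrence of the host name in the body. Function names are hypothetical.

```python
import re

BACKEND_HOST = "www.google.com.sg"

# Approximation of filterByTags: only URLs inside href/src/action attributes are considered.
URL_ATTR = re.compile(r'\b(href|src|action)="https://www\.google\.com\.sg/([^"]*)"', re.IGNORECASE)

def rewrite_html(body: str, proxy_host: str) -> str:
    """ChangeReferencesToOriginalUrl: rewrite backend URLs in tag attributes only."""
    return URL_ATTR.sub(lambda m: f'{m.group(1)}="https://{proxy_host}/{m.group(2)}"', body)

def rewrite_json(body: str, proxy_host: str) -> str:
    """ChangeReferencesToOriginalUrlInJson: plain replacement of the host name."""
    return body.replace(BACKEND_HOST, proxy_host)

html = '<a href="https://www.google.com.sg/maps">Maps</a> visit https://www.google.com.sg/ today'
print(rewrite_html(html, "xyz.azurewebsites.net"))
# <a href="https://xyz.azurewebsites.net/maps">Maps</a> visit https://www.google.com.sg/ today
print(rewrite_json('{"u":"https://www.google.com.sg/x"}', "xyz.azurewebsites.net"))
# {"u":"https://xyz.azurewebsites.net/x"}
```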

Rule ChangeGStaticReferencesToOriginalUrl changes references (such as hyperlinks and images) to https://encrypted-tbn[0-9].gstatic.com/* in HTML responses to the https://xyz.azurewebsites.net/gstatic/* virtual directory of your web app. It relies on the ProxyGStatic inbound rule.

Rule ChangeGStaticReferencesToOriginalUrlInJson changes all references to https://encrypted-tbn[0-9].gstatic.com/* in JSON responses to the https://xyz.azurewebsites.net/gstatic/* virtual directory of your web app. It also relies on the ProxyGStatic inbound rule.

Rule ChangeSSLGstaticReferencesToOriginalUrl changes all references to ssl.gstatic.com in HTML responses to the https://xyz.azurewebsites.net/sslgstatic/* virtual directory of your web app. It relies on the ProxySSLGStatic inbound rule.
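A Python sketch of the three gstatic outbound rules, using the patterns from web.config with the dots escaped. As configured, every host from encrypted-tbn0 to encrypted-tbn9 is mapped to /gstatic/, and the ProxyGStatic inbound rule then forwards all of them to encrypted-tbn1. The helper names are hypothetical and the xyz host is the article's placeholder.

```python
import re

PROXY_HOST = "xyz.azurewebsites.net"  # placeholder host from the article

TBN_URL = re.compile(r"^https://encrypted-tbn[0-9]\.gstatic\.com/(.*)")
TBN_HOST = re.compile(r"encrypted-tbn[0-9]\.gstatic\.com")

def gstatic_attr(url: str, host: str = PROXY_HOST) -> str:
    """ChangeGStaticReferencesToOriginalUrl, applied to one tag attribute value."""
    return TBN_URL.sub(lambda m: f"https://{host}/gstatic/{m.group(1)}", url)

def gstatic_json(body: str, host: str = PROXY_HOST) -> str:
    """ChangeGStaticReferencesToOriginalUrlInJson: every thumbnail host in the body."""
    return TBN_HOST.sub(f"{host}/gstatic", body)

def ssl_gstatic(body: str, host: str = PROXY_HOST) -> str:
    """ChangeSSLGstaticReferencesToOriginalUrl: ExactMatch on the host name."""
    return body.replace("ssl.gstatic.com", f"{host}/sslgstatic")

print(gstatic_attr("https://encrypted-tbn3.gstatic.com/images?q=tbn:abc"))
# https://xyz.azurewebsites.net/gstatic/images?q=tbn:abc
print(gstatic_json('{"img":"https://encrypted-tbn0.gstatic.com/images?q=1"}'))
# {"img":"https://xyz.azurewebsites.net/gstatic/images?q=1"}
print(ssl_gstatic('<img src="https://ssl.gstatic.com/gb/images/a.png">'))
# <img src="https://xyz.azurewebsites.net/sslgstatic/gb/images/a.png">
```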

Rule RewriteBackendRelativeUrlsInRedirects changes the Location header of 301/302 redirects to point to your web app if it originally points to https://www.google.com.sg/*.
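A Python sketch of the Location rewrite, with a hypothetical helper name. It ignores the IsHTML precondition that the real rule also carries, so in IIS the rewrite only happens when the redirect response itself is labeled text/html.

```python
import re

BACKEND_LOCATION = re.compile(r"^https://www\.google\.com\.sg/(.*)")

def rewrite_location(location: str, proxy_host: str) -> str:
    """Point a backend-absolute Location header at the proxy; leave others unchanged."""
    return BACKEND_LOCATION.sub(lambda m: f"https://{proxy_host}/{m.group(1)}", location)

print(rewrite_location("https://www.google.com.sg/?gws_rd=ssl", "xyz.azurewebsites.net"))
# https://xyz.azurewebsites.net/?gws_rd=ssl
print(rewrite_location("https://accounts.google.com/ServiceLogin", "xyz.azurewebsites.net"))
# https://accounts.google.com/ServiceLogin (no rule proxies this host, so it is left alone)
```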

The two pre-conditions in the preConditions tag, IsHTML and IsJson, are used by the outbound rules to check whether the response content type is HTML or JSON, respectively.
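The preconditions are simple prefix matches on the response content type. The Python sketch below assumes they are evaluated case-insensitively (URL Rewrite's default) and uses a hypothetical function name. Since every outbound rule in this configuration carries one of the two preconditions, responses of any other type, such as images or scripts, pass through unchanged.

```python
import re

PRECONDITIONS = {
    "IsHTML": re.compile(r"^text/html", re.IGNORECASE),
    "IsJson": re.compile(r"^application/json", re.IGNORECASE),
}

def matching_preconditions(content_type: str) -> list[str]:
    """Names of the preconditions that a response Content-Type satisfies."""
    return [name for name, pattern in PRECONDITIONS.items() if pattern.search(content_type)]

print(matching_preconditions("text/html; charset=UTF-8"))         # ['IsHTML']
print(matching_preconditions("application/json; charset=UTF-8"))  # ['IsJson']
print(matching_preconditions("image/png"))                        # []
```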

References

https://tomssl.com/2015/06/15/create-your-own-free-reverse-proxy-with-azure-web-apps/

https://ruslany.net/2014/05/using-azure-web-site-as-a-reverse-proxy/

https://www.iis.net/learn/extensions/url-rewrite-module/url-rewrite-module-20-configuration-reference

https://stackoverflow.com/questions/15926203/iis-as-a-reverse-proxy-compression-of-rewritten-response-from-backend-server

Comments

  • Anonymous
    May 08, 2017
    In the allowedServerVariables, please change "Insert" to "InsertIfMissing". We've seen cases where sites break after using this because another xdt (e.g. from ApplicationInsights) also adds those, and they end up duplicated. InsertIfMissing takes care of it. Would be great if you could update your post to reflect that. Thanks!
  • Anonymous
    June 21, 2017
    This is so well written but unfortunately it's not working for me. Does this need to be updated anywhere?