Troubleshooting connections from Windows HPC Server to Azure
This topic provides information to help you troubleshoot connection problems between Windows HPC Server 2008 R2 (with at least Service Pack 1) and Azure. Connection problems have several causes and can prevent you from performing operations related to deploying or using worker or VM nodes in Azure in “burst” scenarios from an on-premises cluster, such as:
- Creating a Windows Azure node template
- Provisioning a Windows Azure node
- Uploading a .vhd file from the head node to a Windows Azure subscription
- Running a diagnostic test
This topic lists some common causes, with solutions or workarounds, for connection problems.
Internet connectivity
It’s basic, but first check that your head node and any client computer that you are using to manage the cluster (for example, by using HPC Cluster Manager, or HPC PowerShell) can access the Internet.
Management certificate
Operations between the HPC cluster and Windows Azure use the Windows Azure management API, so they must be authenticated by an X.509 management certificate.
If a management certificate is not installed or configured properly, you might see an error similar to: The remote server returned an error: (403) Forbidden. This error message could appear in a dialog box or in the Operations log in HPC Cluster Manager. (Note that this generic error message can also indicate a problem with clock synchronization; see below.) Or, you might see multiple occurrences of the following entry in the Operations log: Retrying operation ConfigureAzureStorageBeforeDeployment.
To check the management certificate, verify that the certificate .cer file is added to Management Certificates in the Windows Azure subscription, and the private key certificate (.pfx file) is imported on the head node of the cluster in the Trusted Root Certification Authorities store of the local computer account. In addition, if you are using a client computer to manage the cluster, you must also import the .pfx file in the Trusted Root Certification Authorities of the local computer account on that computer.
For example, it’s easy to mix up the certs and import the .cer file in a local store, or import the .pfx file in the wrong store. Either will cause the Azure node deployment or other management operations to fail. Double-check that the correct .cer file is in the Azure subscription, and the private key certificate is in the right local place!
Note
Windows HPC Server automatically deploys a test certificate that you can use. So, if you’re having trouble using a certificate that you generated yourself (say, using makecert), try a deployment using HPC Cluster Manager on the head node with the test certificate. Just upload the %CCP_HOME%bin\hpccert.cer file to your Windows Azure subscription. Then, when you create a node template, upload a .vhd file, and so on, make sure to select the certificate named Default Microsoft HPC Azure Management. If this certificate works in your test deployment, the problem probably lies in how you generated your own certificate.
Windows Azure Services Connection Test
If you are running at least Windows HPC Server 2008 R2 with SP2, the Windows Azure Services Connection Test verifies that the information in each Windows Azure node template is configured properly for connection with Windows Azure, and the correct certificate is available on the head node.
Firewall ports
A number of firewall ports must be open on your network (enterprise) firewall to allow the traffic for HPC services to Windows Azure. The following table is reproduced from Requirements for Windows Azure Nodes in Windows HPC Server 2008 R2:
Protocol | Direction | Port | Purpose |
---|---|---|---|
TCP | Outbound | 80 | HTTP |
TCP | Outbound | 443 | HTTPS |
TCP | Outbound | 3389 | RDP |
TCP | Outbound | 5901 | Service-oriented architecture (SOA) services |
TCP | Outbound | 5902 | SOA services |
TCP | Outbound | 7998 | File staging |
TCP | Outbound | 7999 | Job scheduling |
If traffic is not allowed on ports 80 and 443, you won’t be able to create a Windows Azure node template or upload a .vhd file. If traffic is not allowed on the other ports in the table, then node provisioning, job scheduling, or node administration in Windows Azure could fail. Note that being able to create a Windows Azure node template does not guarantee that you have all of the connectivity that Windows HPC Server needs to provision nodes and run jobs on them.
Windows Azure Firewall Ports Test
If you have upgraded to Windows HPC Server 2008 R2 with SP2, you can run the Windows Azure Firewall Ports diagnostic test to test that these firewall ports are open. Alternatively, you can try running the following telnet command to see whether the HPC service name used in the Windows Azure node deployment is reachable:
telnet <ServiceName>.cloudapp.net 7999
You can also try running the telnet command for the following additional ports: 5901, 5902, 7998.
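If telnet is not installed, the same outbound TCP check can be scripted. The following is a minimal Python sketch, not part of Windows HPC Server: the names HPC_AZURE_PORTS, port_is_reachable, check_ports, and format_report are hypothetical helpers introduced for illustration, and the port list is copied from the table above. It only tests whether a TCP connection can be opened, which is roughly what the telnet command shows; it does not verify that the HPC services behind the ports are healthy.

```python
import socket

# Outbound TCP ports from the firewall table in this topic.
# The text values are display labels only.
HPC_AZURE_PORTS = {
    80: "HTTP",
    443: "HTTPS",
    3389: "RDP",
    5901: "SOA services",
    5902: "SOA services",
    7998: "File staging",
    7999: "Job scheduling",
}


def port_is_reachable(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port opens within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False


def check_ports(host, ports=HPC_AZURE_PORTS, timeout=5.0):
    """Return a dict mapping each port to True (reachable) or False."""
    return {port: port_is_reachable(host, port, timeout) for port in ports}


def format_report(results):
    """Turn a {port: bool} dict into printable lines, sorted by port number."""
    lines = []
    for port in sorted(results):
        label = HPC_AZURE_PORTS.get(port, "unknown")
        status = "open" if results[port] else "BLOCKED or unreachable"
        lines.append(f"{port:>5}  {label:<16}{status}")
    return lines
```

To use it, call check_ports with the same <ServiceName>.cloudapp.net host name that you would pass to telnet, then print each line returned by format_report. A port reported as blocked here points at the network firewall or a proxy rather than at the Windows HPC Server configuration.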
Proxy servers and clients
Proxy servers or firewalls in your enterprise network, or the proxy clients installed on the computers, might cause connection issues that are difficult to troubleshoot and resolve. One indication that this might be a problem is that the Windows Azure Firewall Ports test (available in at least Windows HPC Server 2008 R2 with SP2) fails, but you have verified that all of the required firewall ports for communication between Windows HPC Server and Windows Azure are open. Alternatively, you might be able to make connections to Windows Azure only when a proxy server is temporarily disabled or the traffic from Windows HPC Server is temporarily redirected.
Proxy server must be HTTP 1.1 compliant
Connections to Windows Azure require HTTP version 1.1, so any proxy server must be fully compliant with HTTP version 1.1. If it is not, the Windows Azure Firewall Ports test will likely fail on all ports other than ports 80 and 443.
Services running under the system account might be blocked
Several Windows HPC Server services, running under the system account, must be able to communicate with services in Windows Azure. The Windows HPC Server services include HPCManagement, HPCScheduler, and HPCBrokerWorker. Communication by services that run under the system account might be blocked by a proxy server or network firewall, so check whether you need to configure the proxy server, or a proxy client, to allow this traffic. For example, if your network firewall is Forefront TMG or a version of ISA Server, you might need to configure the Forefront TMG Client or Microsoft Firewall Client software on the head node to associate specific user credentials with these services. For an example of how to do this, see Configure a Proxy Client on the Head Node.
IP address ranges might be blocked
Certain corporate firewalls might block IP addresses that are used for deploying Windows Azure nodes. This can cause sporadic problems with connectivity to nodes after the nodes have been provisioned in Windows Azure. If your corporate firewall uses IP address filtering, note that the IP addresses used by Windows Azure can change with every deployment. Also, the IP addresses used for diagnostic tests are different from the IP addresses of your actual Windows Azure deployments.
Clock synchronization
The clock must be synchronized properly on any on-premises computer that communicates with Windows Azure. If it is not, the computer will not be able to perform Windows Azure storage transactions, such as uploading a .vhd file. If you have a clock synchronization problem, you might see authentication error messages similar to: The remote server returned an error: (403) Forbidden. (Note that this generic error message can also indicate a problem with the management certificate; see above.)
Double-check the time server configuration on your primary domain controller. To configure the Windows Time service, see Configure the Time Source for the Forest.
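One rough way to estimate how far a computer's clock has drifted is to compare it with the Date header that a web server returns. The following Python sketch illustrates the idea under stated assumptions: the helper names (fetch_server_date, clock_skew_seconds, is_within_tolerance) are hypothetical, the URL is whatever HTTPS endpoint you choose to query, and the tolerance value is a placeholder that you supply, not a limit documented in this topic. It is a quick sanity check, not a substitute for configuring the Windows Time service.

```python
import urllib.error
import urllib.request
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def fetch_server_date(url, timeout=10.0):
    """Return the raw Date header from an HTTP(S) response, or None if absent.

    An HTTP error status (for example 403) still carries a Date header,
    so that case is handled instead of being treated as a failure.
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.headers.get("Date")
    except urllib.error.HTTPError as err:
        return err.headers.get("Date")


def clock_skew_seconds(http_date_header, local_now=None):
    """Return local time minus server time, in seconds.

    A positive result means the local clock is ahead of the server.
    """
    server_time = parsedate_to_datetime(http_date_header)
    if server_time.tzinfo is None:
        # Treat a header without zone information as UTC.
        server_time = server_time.replace(tzinfo=timezone.utc)
    if local_now is None:
        local_now = datetime.now(timezone.utc)
    return (local_now - server_time).total_seconds()


def is_within_tolerance(skew_seconds, tolerance_seconds):
    """Return True if the absolute skew does not exceed the given tolerance."""
    return abs(skew_seconds) <= tolerance_seconds
```

For example, pass the value returned by fetch_server_date for an endpoint you trust to clock_skew_seconds, then compare the result with the tolerance you consider acceptable. The Date header has one-second resolution and the request itself takes time, so treat small differences as noise.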
Version of HPC Pack on VHD
The version of the Microsoft HPC Pack 2008 R2 for Windows Azure Virtual Machines (which you install using the HpcAzureVM.msi installer) needs to be at the same version level as the full HPC Pack installed on the head node.
Incorrect components installed on the VHD
If you install the incorrect version of HPC Pack on a VHD image, provisioning of a Windows Azure VM node can fail with an error in the provisioning log similar to Node <NodeName> never became available. As described in Create a VHD for VM Nodes, the only HPC component on your VHD should be Microsoft HPC Pack 2008 R2 for Windows Azure Virtual Machines (which you install using the HpcAzureVM.msi installer). Don't install on your VHD images the same version of HPC Pack 2008 R2 that you install on your on-premises compute nodes!
Note
You must also install the Windows Azure integration components on your VHDs, in addition to any applications that you want to run on the VM nodes. Optionally, to enable Windows Azure Connect, install the endpoint software for Windows Azure Connect.