Steps to Deploy Azure Nodes with Microsoft HPC Pack

 

Applies To: HPC Pack 2016, Microsoft HPC Pack 2012, Microsoft HPC Pack 2012 R2

This topic describes the overall process for deploying Azure nodes in a “burst” scenario in a cluster running HPC Pack.

Note


Support for adding Azure nodes is available starting with HPC Pack 2008 R2 with Service Pack 1.

Prerequisites

Before you deploy Azure nodes in your Windows HPC cluster, ensure the following:

  • Your cluster and network environments meet the requirements for deploying Azure nodes

  • You can access an Azure subscription

  • The Azure subscription is configured with the necessary management certificate, cloud service, storage account, and other Azure features that your scenario requires

For more information, see Requirements to Add Azure Nodes with Microsoft HPC Pack.

Note


If you plan to deploy a large number of Azure nodes, additional configuration may be needed in your local cluster environment and in your Azure subscription. For more information, see Best Practices for Large Deployments of Azure Nodes with Microsoft HPC Pack.

Step 1: Create an Azure node template

To create an Azure node template, use the Create Node Template Wizard in HPC Cluster Manager.

An Azure node template includes the following configuration information:

  • Information from the Azure subscription that will be used to add a set of Azure nodes to the cluster. Minimally this information includes the Azure subscription ID, a certificate thumbprint for an Azure management certificate, the name of an Azure cloud service, and the name of a storage account. For more information, see Understanding Azure Subscription Information for Microsoft HPC Pack.

  • Optionally, settings to enable additional Azure features that are supported by your version of HPC Pack. For more information about configuring these additional settings, see Configuring an Azure Node Template for Microsoft HPC Pack.

    Note


    If supported by your version of HPC Pack, certain Azure features such as an Azure virtual network must be preconfigured in the Azure subscription before they can be configured in an Azure node template.

  • The availability policy of the nodes—that is, how and when the Azure nodes are started (the Azure role instances are provisioned) and stopped (the role instances are removed from the Azure cloud service). For more information, see Understanding the Azure Node Availability Policy.

To create an Azure node template

  1. Start HPC Cluster Manager.

  2. In Configuration, in the Navigation Pane, click Node Templates.

  3. In the Actions pane, click New. The Create Node Template Wizard appears.

  4. On the Choose Node Template Type page, click Azure node template, and then click Next.

  5. On the Specify Template Name page, type a name for the node template, and optionally type a description for it. Click Next.

  6. On the Provide Subscription Information page, provide the following information from the Azure subscription that will be used to add the nodes:

    1. In the Subscription ID text box, type or paste the ID of a valid Azure subscription account.

    2. In the Management certificate text box, type, paste, or browse to a thumbprint of a certificate with a private key that is in the appropriate Certificates stores on the computer. Then click Next.

      Note

      • The thumbprint must identify a private key certificate that corresponds to the management certificate that is configured in the Azure subscription.
      • If you type or paste the thumbprint, ensure that you remove all spaces.
      • If you click Browse, a list of available server authentication certificates appears, including certificates that you may have configured on the computer. Select a name in the list to add the corresponding thumbprint.
      • If you previously configured the certificate that was automatically generated on the head node when HPC Pack was installed, click Browse and then select Default Microsoft HPC Azure Management. For information about using the Default Microsoft HPC Azure Management certificate, see Options to Configure the Azure Management Certificate for Azure Burst Deployments.
      • If you do not see a certificate that you expect in the list, or there is an error with the certificate that you select, see Troubleshoot certificate problems.

      Important


      If the services running on the head node cannot connect to Azure, you may see an error message similar to The remote server returned an error: (403) Forbidden. This may indicate a problem with the configuration of the network firewall, the management certificate on the head node, or a proxy client that communicates with the network firewall. To ensure that you have configured HPC Pack properly to communicate with Azure, see Requirements to Add Azure Nodes with Microsoft HPC Pack.

  7. On the Provide Service Information page, select an Azure cloud service name and a storage account name that appear in the dropdown lists. Click Next.

  8. Depending on the version of HPC Pack that is installed, you may be able to configure additional Azure settings in the template, such as Remote Desktop credentials, or the name of an Azure virtual network. For more information about these additional settings, see the help topics in Configuring an Azure Node Template for Microsoft HPC Pack.

  9. On the Configure Azure Availability Policy page, select how you want the Azure nodes to start (this provisions the role instances in Azure) and stop (this removes the role instances from Azure):

    1. If you want to start and stop the nodes manually, select that option, and then click Next. Go to the last step in this procedure.

    2. If you want nodes to start (and be brought online) and stop automatically, select that option, and then click Configure Availability Policy. The Configure Azure Availability Policy dialog box appears.

    3. In the Configure Azure Availability Policy dialog box, click and drag the mouse to select the days and hours for the nodes to start and stop.

    4. Optionally, specify the number of minutes before the nodes stop (no new jobs will start on the nodes).

    5. To save your settings, click OK, and then click Next.

      Important

      • Deploying the Azure role instances can take several minutes under some conditions, and deleting the instances can also take several minutes.
      • If you select the option to start and stop nodes automatically, plan additional time in each online time block for node deployment, in addition to the time that you want the nodes to be available to run jobs. You should also avoid scheduling online time blocks at short intervals.
  10. To create the node template, on the Review page, click Create.

To edit an Azure node template

  1. In HPC Cluster Manager, in Configuration, in the Navigation Pane, click Node Templates.

  2. In the views pane, select an Azure node template.

  3. In the Actions pane, click Edit. The Node Template Editor dialog box appears.

  4. To modify the existing template properties, you can specify a template name and description, or modify the additional settings on the Connection Information and other tabs.

  5. To validate the Azure connection information, such as the names of the cloud service and the storage account, on the Connection Information tab, click Validate connection information.

  6. After you are done editing the template, click Save.

Additional considerations

  • To add or validate the subscription information in an Azure node template, you must have an Internet connection and the management certificate for Azure must be properly configured.

  • Editing the connection information does not affect connection settings for the Azure nodes that are already deployed by using the node template. Only nodes that you add later use the new connection information in the template.

  • Editing the Azure node availability policy changes the policy for nodes that are already added to the HPC cluster by using the node template, as well as for nodes that you add later. For example, you can edit the Azure node template so that Azure nodes that are configured to start and stop automatically according to a weekly schedule are now configured to start and stop manually.

    Note


    After you configure an automatic availability policy in an existing Azure node template, the policy does not immediately affect nodes that are currently started (provisioned) in Azure but are offline. If you make this change during one of the configured availability intervals in the template, the provisioned nodes that are offline remain in that state during the interval. These nodes will stop automatically according to the policy but will only start (and be brought online) automatically at the beginning of subsequent availability intervals.

  • Depending on the configuration of the availability policy in the Azure node template and the Task Cancel Grace Period setting in Job Scheduler Configuration, the exact time when Azure nodes are stopped and the deployment ends can differ from the scheduled end of an online time block. This can occur when HPC tasks are still running near the end of the online time block. For more information, see Understanding the Azure Node Availability Policy.

  • You can upload a file package to the storage account that is specified in the template. For example, you might want to upload application or service files that will run on the nodes. If you do this, the package is automatically installed in the nodes at the time the role instances are deployed in Azure. For more information about packaging files and uploading them to a storage account, see hpcpack.
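For example, the packaging and upload steps can be sketched from a command prompt on the head node. The package name, application path, and template name below are illustrative; see the hpcpack topic for the exact syntax supported by your version of HPC Pack.

```
:: Package the application files into an OPC package (paths are illustrative).
hpcpack create myapp.zip C:\Apps\MyApp

:: Upload the package to the storage account configured in the node template.
:: The template name must match an existing Azure node template.
hpcpack upload myapp.zip /nodetemplate:"My Azure Node Template"
```

The uploaded package is then installed automatically on each node when the role instances are deployed.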

Step 2: Add Azure nodes to the Windows HPC cluster

After you create an Azure node template, you can add the nodes to the cluster by using the Add Node Wizard.

To add nodes, you specify the Azure node template and the following information:

  • The number of nodes. This is the number of role instances that will be deployed in Azure when you start the nodes. Ensure that the number is within the quota of role instances in the subscription for Azure.

  • The size of the nodes. This is one of the worker role instance sizes in Azure that can be used with HPC Pack. The size determines characteristics such as the number of CPU cores, the memory capacity, and the local file system size of each role instance. For more information, see Azure worker role instance sizes that can be used in burst deployments.

    Note


    Starting in HPC Pack 2012 R2 Update 1, HPC Pack automatically detects and allows you to select additional supported worker role sizes if they are introduced later in Azure.

To add Azure nodes

  1. In HPC Cluster Manager, in Resource Management (called Node Management in some versions of HPC Pack), in the Actions pane, click Add Node. The Add Node Wizard appears.

  2. On the Select Deployment Method page, click Add Azure nodes, and then click Next.

  3. On the Specify New Nodes page, select a node template, specify the number and the size of the nodes, and then click Next.

  4. On the Completing the Add Node Wizard page, click Finish.

Additional considerations

  • To add Azure nodes, you can also use the Add-HpcNodeSet HPC PowerShell cmdlet.

  • After they are added, the Azure nodes are in the Not-Deployed state and they have a node health state of Unapproved. Before you can use them to run jobs, they must be started (provisioned) and then brought online. The nodes are started and brought online either manually or automatically, as specified in the node template.

  • All Azure nodes that are added to the cluster by using a specific node template define a set of nodes that will be deployed and can be managed together in Azure when you start the nodes. This includes Azure nodes that you add later by using the same node template.

  • For more information, see Add Azure Nodes.
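The Add Node Wizard steps above can also be scripted with HPC PowerShell. The following is a sketch only: the template name is hypothetical, and parameter names can vary between HPC Pack versions, so verify them with Get-Help Add-HpcNodeSet before use.

```powershell
# Get the Azure node template created in Step 1 (name is illustrative).
$template = Get-HpcNodeTemplate -Name "MyAzureTemplate"

# Add 10 Medium-size Azure nodes to the cluster.
# The new nodes appear in the Not-Deployed state.
Add-HpcNodeSet -Template $template -Quantity 10 -Size Medium
```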

Step 3: Start (provision) the Azure nodes

To provision the role instances in Azure, you must start the Azure nodes that you added to the HPC cluster. Then you bring the nodes online so that they are available to run cluster jobs.

How the nodes are started and brought online depends on the availability policy that you configured in the Azure node template as follows:

  • Automatic. The nodes are automatically configured to be in the Online state during one or more intervals each week. You do not need to perform other actions.

  • Manual. You must first start the nodes, and then bring them online to make them available to run cluster jobs.

Important


Starting in HPC Pack 2012 R2 Update 1, you manually select one or more nodes that you want to start in Azure. The nodes that you specify to start can come from more than one Azure node deployment. In earlier versions of HPC Pack, you can only start a complete set of nodes that were deployed using one node template.

To manually start specific Azure nodes (introduced in HPC Pack 2012 R2 Update 1)

  1. In Resource Management (called Node Management in some versions of HPC Pack), in the Navigation Pane, click Nodes.

  2. In the List or Heat Map view, select one or more Azure nodes that you want to start.

  3. In the Actions pane, click Start. The Start Azure Nodes dialog box appears. Click Start to confirm.

  4. During the starting process, the state of the nodes changes from Not-Deployed to Provisioning. If you want to track the provisioning progress, select a node, and then in the Details Pane, click the Provisioning Log tab.

  5. After a node starts successfully, the node state changes to Offline.

  6. To bring the nodes online, select the nodes that are in the Offline state, right-click, and then click Bring Online.

To start a set of Azure nodes manually and bring them online (HPC Pack 2012 R2 and earlier versions)

  1. In Resource Management (called Node Management in some versions of HPC Pack), in the Navigation Pane, click Nodes.

  2. In the List or Heat Map view, select one or more nodes.

  3. In the Actions pane, click Start. The Start Azure Nodes dialog box appears.

  4. If you selected nodes that were added by using different node templates, select a node template to specify the set of nodes to start. Then click Start.

  5. During the starting process, the state of the nodes changes from Not-Deployed to Provisioning. If you want to track the provisioning progress, select a node, and then in the Details Pane, click the Provisioning Log tab.

  6. After a node starts successfully, the node state changes to Offline.

  7. To bring the nodes online, select the nodes that are in the Offline state, right-click, and then click Bring Online.

    Note


    Starting with HPC Pack 2008 R2 with SP3, you can bring some nodes online and start running jobs on them as soon as the nodes have moved from the Provisioning node state to the Offline node state, even if other nodes in the group of nodes that you started to provision are still in the Provisioning state. In this case, the health of the whole group of nodes still appears as Transitional. You do not need to wait for the health of the nodes to transition to OK.

Additional considerations

  • To manually start the set of Azure nodes added using a single node template, you can also use the Start-HpcNodeSet HPC PowerShell cmdlet.

  • Starting in HPC Pack 2012 R2 Update 1, to manually start one or more specified Azure nodes, you can also use the Start-HpcAzureNode HPC PowerShell cmdlet.

  • Starting Azure nodes can take some time to complete, depending on the number of nodes and the performance of Azure. The provisioning log updates infrequently during this time. You can cancel the provisioning of the nodes while the node health is Transitional. If there are errors during the provisioning of one or more nodes, the state of those nodes is set to Unknown and the node health is set to Unapproved. To determine the reason for the failure, review the provisioning logs. You can find additional information about the status of the role instances in the portal. You can also review trace log files that are generated on the role instances. For more information, see Troubleshoot Deployments of Azure Nodes with Microsoft HPC Pack.

  • If an automatic availability policy is configured, the nodes are available to run jobs in an online time block only after the role instances have been provisioned in Azure. The scheduled time to start (and bring online) the nodes does not include the time that Azure takes to provision the role instances.

  • The subscription for Azure will be charged for the time that the nodes are available, as well as for the compute and storage services that are used. For more information, review the terms of the subscription for Azure.

  • Each time that you start a set of Azure nodes, additional proxy role instances are automatically configured by HPC Pack in Azure to facilitate communication between the head node and the nodes. The number and size of the proxy role instances depends on the version of HPC Pack. The proxy role instances are not listed in HPC Cluster Manager after the nodes are provisioned. However, the instances appear in the portal. The proxy role instances incur charges in Azure along with the Azure node instances. For more information, see Set the Number of Azure Proxy Nodes.
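The manual start procedures above can likewise be sketched in HPC PowerShell. This is an illustrative outline, assuming a template named "MyAzureTemplate" and default node names; check Get-Help for the parameters supported by your HPC Pack version.

```powershell
# Start (provision) all Azure nodes added with a given node template.
$template = Get-HpcNodeTemplate -Name "MyAzureTemplate"
Start-HpcNodeSet -Template $template

# Starting in HPC Pack 2012 R2 Update 1, start specific nodes instead:
# Start-HpcAzureNode -Name "AzureCN-0001","AzureCN-0002"

# After the nodes reach the Offline state, bring them online so that
# they are available to run cluster jobs.
$nodes = Get-HpcNode -State Offline -TemplateName "MyAzureTemplate"
Set-HpcNodeState -Node $nodes -State Online
```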

Step 4: Stop the Azure nodes

If you configured an automatic availability policy in the node template for the Azure nodes, the nodes are automatically taken offline and stopped at the end of each online time block in the policy. Stopping the nodes shuts down and removes the instances from the Azure cloud service, and returns the nodes to the Not-Deployed state in the cluster.

If you configured a manual availability policy for the nodes, you can manually stop the worker role instances at any time. You might want to do this to ensure that you are not charged for Azure resources that are not in use.

Important


Starting in HPC Pack 2012 R2, you can manually stop specific nodes from Azure, to scale down Azure nodes when they are no longer needed. The nodes that you specify to stop can come from more than one Azure node deployment. In earlier versions of HPC Pack, you can only stop a complete set of nodes that were deployed using one node template.

To manually stop specific Azure nodes (introduced in HPC Pack 2012 R2)

  1. In Resource Management (called Node Management in some versions of HPC Pack), in the Navigation Pane, click Nodes.

  2. In the List or Heat Map view, select one or more Azure nodes that you want to stop.

  3. In the Actions pane, click Stop. The Stop Azure Nodes dialog box appears.

  4. If you want to cancel jobs that are running on the nodes to stop the nodes immediately, select that option. Otherwise, the nodes will stop gracefully after any running jobs are drained. Then click Stop.

  5. If you want to track the stopping progress, select a node, and then in the Details Pane, click the Provisioning Log tab.

To manually stop a set of Azure nodes (HPC Pack 2012 and earlier versions)

  1. In Resource Management (called Node Management in some versions of HPC Pack), in the Navigation Pane, click Nodes.

  2. In the List or Heat Map view, select one or more Azure nodes.

  3. In the Actions pane, click Stop. The Stop Azure Nodes dialog box appears.

  4. If you selected nodes that were added by using different node templates, select a node template to specify the set of nodes to stop.

  5. If you want to cancel jobs that are running on the nodes to stop the nodes immediately, select that option. Otherwise, the nodes will stop gracefully after any running jobs are drained. Then click Stop.

  6. If you want to track the stopping progress, select a node, and then in the Details Pane, click the Provisioning Log tab.

Additional considerations

  • To manually stop a set of Azure nodes (deployed using a single Azure node template), you can also use the Stop-HpcNodeSet HPC PowerShell cmdlet.

  • If you want to manually stop a set of Azure nodes and remove them from the cluster, you can use the Remove action or the Remove-HpcNodeSet HPC PowerShell cmdlet.

  • Stopping or removing a set of nodes in Azure can take several minutes to complete. Proxy nodes in the cloud service are also removed during this process.

  • Starting in HPC Pack 2012 R2, to manually stop or remove specified Azure nodes, you can also use the Stop-HpcAzureNode or Remove-HpcAzureNode HPC PowerShell cmdlet.

  • You should only stop or remove Azure Nodes by using HPC Cluster Manager. Do not use the portal or other Azure tools to remove role instances.
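The stop and remove operations above can be sketched in HPC PowerShell as follows. The template and node names are hypothetical; confirm the exact parameters with Get-Help for your HPC Pack version.

```powershell
# Stop (deprovision) the whole set of Azure nodes that was deployed
# with one node template; this removes the role instances from Azure.
$template = Get-HpcNodeTemplate -Name "MyAzureTemplate"
Stop-HpcNodeSet -Template $template

# Starting in HPC Pack 2012 R2, stop or remove specific nodes instead:
# Stop-HpcAzureNode -Name "AzureCN-0001"
# Remove-HpcAzureNode -Name "AzureCN-0001"

# Or stop the set and remove the nodes from the cluster entirely:
# Remove-HpcNodeSet -Template $template
```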

See Also

Burst to Azure Worker Instances with Microsoft HPC Pack
Configuring an Azure Node Template for Microsoft HPC Pack