New Feature Evaluation Guide for Windows HPC Server 2008 R2 SP2

Updated: February 2012

Applies To: Windows HPC Server 2008 R2

This guide provides scenarios and steps to try new features in HPC Pack 2008 R2 Service Pack 2.

Before following the steps in this guide, review the Release Notes for Microsoft HPC Pack 2008 R2 Service Pack 2. The release notes also include instructions and information for installing SP2.

This guide includes the following scenarios:

  • Integration with Windows Azure

    • Run an MPI job on Azure Nodes

    • Offload Excel workbooks to Azure Nodes

  • Job Scheduling

    • Use Smart Card authentication for job submission

    • Guarantee availability of computing resources for different user groups

    • Over-subscribe or under-subscribe core or socket counts on cluster nodes

    • Create a job submission page that lets users submit a job from the HPC Web Portal

    • Specify a custom job filter at the job template level

Integration with Windows Azure

The scenarios in this section help you try new Windows Azure integration features in HPC Pack 2008 R2 SP2.

For information about deploying Windows Azure Virtual Machine nodes, see Steps for Deploying Windows Azure VM Nodes. For information about deploying Windows Azure worker roles, see Steps for Deploying Windows Azure Worker Nodes in a Windows HPC Server Cluster.

Run an MPI job on Azure Nodes

Scenario

You have MPI jobs that you would like to try running on Azure Nodes. You periodically have increases in the number of MPI jobs that you have to run, or you have long wait times in the job queue. You want to test the possibility of adding Azure nodes to the cluster to handle the extra workload.

Goal

As an example of how to run an MPI job on Azure Nodes, this guide walks through the steps to run Linpack on a set of Azure Nodes that are deployed with the Worker Node template.

  • Upload Linpack files to the Azure Nodes.

  • Run the Linpack program as an HPC job.

  • Retrieve the output file from the Azure Node.

Note
This guide is meant to introduce some of the steps and tools for deploying and running MPI applications on Azure Nodes, so all of the steps for uploading the application and configuring the firewall and netmask are performed manually in the procedures below. After you have verified application deployment steps manually, many of these steps can be defined as part of the Azure Node provisioning process, so that the application can be automatically available on new Azure Node instances. For more information, see Appendix 2: Configure a Startup Script for Azure Nodes.

Requirements

  • A cluster with HPC Pack 2008 R2 SP2 installed.

  • Administrator permissions on the cluster.

  • Four extra-large Azure worker node instances added to the cluster (this results in 32 cores in Azure).

For detailed requirements and steps for adding Azure worker nodes to your cluster, see Steps for Deploying Windows Azure Worker Nodes in a Windows HPC Server Cluster.

Important considerations

  • Azure worker nodes cannot access on-premises nodes, shares, and license servers without additional setup.

  • Local storage on Azure Nodes is not persistent. When node instances are stopped and then restarted on a different hardware node, the data stored in local storage does not follow the node instance.

  • Applications deployed to Windows Azure are subject to the licensing terms associated with the application.

  • MPI jobs that are not particularly latency and bandwidth sensitive are more likely to scale well in the Azure environment. Latency and bandwidth sensitive MPI jobs can perform well as small jobs, where a single task runs on no more than a few nodes. For example, in the case of an engineering simulation, you can run many small jobs to explore and define the parametric space before increasing the model size.

  • You must register each MPI application with the firewall on the Azure Nodes. This allows MPI communications to take place on a port that is assigned dynamically by the firewall.

  • When you run MPI jobs on Azure Nodes, you must ensure that the IP addresses of the Azure Nodes are within the range of accepted IP addresses that is specified for the MPI network mask. The cluster-wide range is defined through the CCP_MPI_NETMASK cluster environment variable, and the value that is specified in this cluster variable is automatically set as a system environment variable on all cluster nodes. Depending on your requirements, there are several ways that you can configure the network mask. You can disable the netmask on the cluster, broaden the range to include your Azure Nodes, or override this value at the node level or at the job level. For example, you can reset the value of CCP_MPI_NETMASK on only your Azure Nodes, or you can set it at the job level by including -env MPICH_NETMASK <range> in the mpiexec command arguments.

  • MPI jobs cannot run across Azure Nodes that are deployed using different node templates. For example, if you have one set of Azure Nodes that are deployed with a worker role template, and one set of Azure Nodes that are deployed with a virtual machine role template, the MPI job must run on one set or the other.

  • When you add Azure Nodes to your cluster and bring them Online, the HPC Job Scheduler Service will immediately try to start jobs on the nodes. If only a portion of your workload can run on Azure, ensure that you update or create job templates to define what job types can run on Azure. For example, to ensure that jobs submitted with a template only run on on-premises compute nodes, you can add the Node Groups property to the job template and select Compute Nodes as the required value.
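
The netmask consideration above can be checked before you submit a job. The following is a minimal sketch in Python (standard library only), assuming that you have already collected the IP addresses of your Azure Nodes by some other means. The function name and the sample addresses are hypothetical and are not part of HPC Pack.

```python
import ipaddress


def nodes_outside_netmask(node_ips, netmask_range):
    """Return the addresses in node_ips that fall outside netmask_range.

    netmask_range uses the address/mask form shown in this guide,
    for example "10.28.0.0/255.255.0.0".
    """
    network = ipaddress.ip_network(netmask_range, strict=False)
    return [ip for ip in node_ips if ipaddress.ip_address(ip) not in network]


# Hypothetical addresses, for illustration only.
sample_ips = ["10.28.4.17", "10.28.200.3", "10.30.1.9"]
print(nodes_outside_netmask(sample_ips, "10.28.0.0/255.255.0.0"))  # ['10.30.1.9']
print(nodes_outside_netmask(sample_ips, "0.0.0.0/0.0.0.0"))        # []
```

An empty list means that every address is inside the range, so MPI ranks on those nodes should be able to accept connections from each other under that netmask.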

Steps

In the following procedure, we upload the Lizard files and an input file to a set of Azure Nodes that are already deployed. The Lizard utility helps optimize Linpack input on the HPC cluster, and includes the Linpack binaries that we will use for this example. We will run the Linpack command directly, rather than using the Lizard, so that we can run it as a job on the Azure Nodes.

Upload Linpack files and input file to Azure Nodes

  1. Download the Lizard utility that is included in the Microsoft HPC Pack 2008 R2 Tool Pack (lizard_x86.msi).

  2. Run the lizard_x86.msi installer on the head node, and change the default installation path to C:\Lizard\.

  3. Prepare the Linpack input file.

    Note
    This example uses a small problem size (15000 Ns) to speed up the run time. This will not provide a representative measure of cluster performance.  If you are using fewer than 32 cores, change the number of Qs to match the number of cores. For more information about these parameters, see Lizard Help.

    Open Notepad and copy the following text into a new file:

    HPLinpack benchmark input file
    Innovative Computing Laboratory, University of Tennessee
    HPL.out      output file name (if any)
    6            device out (6=stdout,7=stderr,file)
    1            # of problems sizes (N)
    15000 Ns
    1            # of NBs
    160 NBs
    0            PMAP process mapping (0=Row-,1=Column-major)
    1            # of process grids (P x Q)  (1xN works best on Eth hub)
    1  Ps
    32 Qs
    16.00         threshold
    1            # of panel fact
    2 PFACTs (0=left, 1=Crout, 2=Right)
    1            # of recursive stopping criterium
    4 NBMINs (>= 1)
    1            # of panels in recursion
    2            NDIVs
    1            # of recursive panel fact.
    2            RFACTs (0=left, 1=Crout, 2=Right)
    1            # of broadcast
    1            BCASTs (0=1rg,1=1rM,2=2rg,3=2rM,4=Lng,5=LnM)
    1            # of lookahead depth
    0            DEPTHs (>=0)
    2            SWAP (0=bin-exch,1=long,2=mix)
    160            swapping threshold
    0            L1 in (0=transposed,1=no-transposed) form
    0            U  in (0=transposed,1=no-transposed) form
    0            Equilibration (0=no,1=yes)
    16            memory alignment in double (> 0)
  4. Save the file into the Lizard folder as hpl.dat.

    Important
    The hpl.dat input file must be in the same folder as xhplmkl.exe.
  5. Create a folder named C:\AzurePkgs.

  6. Package the Lizard files for upload to Azure storage using hpcPack and save the package to the C:\AzurePkgs folder. At an elevated command prompt window type the following command:

    hpcpack create c:\AzurePkgs\Lizard.zip c:\Lizard

    Note
    You do not need an elevated command prompt (run as Administrator) to run hpcpack create, but the hpcpack upload and clusrun commands in upcoming steps do require elevation.
  7. Upload the package to Azure storage by using the following command, where myHeadNode is the name of your head node, and myAzureTemplate is the name of the template that you used to deploy the Azure Nodes:

    hpcpack upload Lizard.zip /scheduler:myHeadNode /nodetemplate:myAzureTemplate /relativepath:Lizard

    The upload might take a couple of minutes to complete.

  8. Use hpcSync to copy the Lizard files from Azure storage to the Azure Nodes. The files will be extracted to %CCP_PACKAGE_ROOT%\Lizard. Run the following command:

    clusrun /nodegroup:AzureNodes hpcsync

  9. Use hpcfwutil to register the Linpack application with the firewall on the Azure Nodes. Run the following command:

    clusrun /nodegroup:AzureNodes hpcfwutil register lizard """%CCP_PACKAGE_ROOT%\lizard\xhplmkl.exe"""

  10. Use one of the following methods to ensure that the netmask will allow communication between your MPI processes:

    • You can disable the netmask (allow all IP addresses) on your Azure Nodes:

      clusrun /nodegroup:AzureNodes setx CCP_MPI_NETMASK "0.0.0.0/0.0.0.0"

    • You can specify the desired range (or disable) at the job level by using the MPI environment variable MPICH_NETMASK. For example, if your Azure Nodes start with 10.28.x.x:

      Job submit /nodegroup:azurenodes /numcores:32 /stdout:%CCP_PACKAGE_ROOT%\lizard\out.txt /workdir:%CCP_PACKAGE_ROOT%\lizard mpiexec -env MPICH_NETMASK 10.28.0.0/255.255.0.0 xhplmkl.exe

  11. Use job submit to submit the Linpack job. Run the following command (numcores should equal the product of the P and Q values in the input file, which is 32 in this example):

    Job submit /nodegroup:azurenodes /numcores:32 /stdout:%CCP_PACKAGE_ROOT%\lizard\out.txt /workdir:%CCP_PACKAGE_ROOT%\lizard mpiexec xhplmkl.exe

  12. When the job is complete, collect the output file. In this case, only the Rank 0 process writes to the standard out file, so you only need to collect one file. To identify which node has the output file, run the following command:

    Clusrun /nodegroup:azurenodes dir %CCP_PACKAGE_ROOT%\lizard\out.txt

  13. Use HpcFile to download the file from the Azure Node to the head node (where you are running the command from). Run the following command, where AzureCN-000x is the node with the out.txt file, and c:\users\myName\ is the destination folder on the head node:

    Hpcfile get /targetnode:AzureCN-000x /file:%CCP_PACKAGE_ROOT%\lizard\out.txt /destFile:c:\users\myName\

  14. Open out.txt in Notepad.
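
Step 11 depends on the /numcores value matching the process grid in hpl.dat. The following is a minimal Python sketch of that check, assuming the input file follows the layout shown in step 3, where the value lines are tagged Ps and Qs and only one process grid is defined. The function name is hypothetical.

```python
def grid_size(hpl_text):
    """Return P * Q from the text of an hpl.dat file laid out as in step 3."""
    values = {}
    for line in hpl_text.splitlines():
        fields = line.split()
        # Value lines look like "1  Ps" or "32 Qs": a number, then the tag.
        if len(fields) >= 2 and fields[1] in ("Ps", "Qs"):
            values[fields[1]] = int(fields[0])
    return values["Ps"] * values["Qs"]


# The three grid lines from the input file in step 3.
sample = """1            # of process grids (P x Q)
1  Ps
32 Qs
"""
numcores = 32
print(grid_size(sample) == numcores)  # True
```

If you reduce Qs because you have fewer than 32 cores, the same check tells you which /numcores value to pass to job submit.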

Expected results

The output file will include text like the following, which lists the number of floating point operations per second that Linpack measured during the test. Because the problem size was intentionally set to a small number for demonstration purposes, this is not a representative measurement of cluster performance. In the example below, the measured performance was 3.498e+001 Gflops.

Offload Excel workbooks to Azure Nodes

Scenario

You have Microsoft Excel workbooks that can run in parallel on a Windows HPC cluster by using the HPC Macro framework and the Windows HPC Server 2008 R2: HPC Services for Excel. You periodically have sharp increases in the amount of workbook offloading jobs that you need to run, and you want to test the possibility of adding Azure Nodes to the cluster to handle the extra workload.

Goal

  • Prepare a VHD and node template for Azure Nodes that have Microsoft Excel installed.

  • Upload an Excel workbook to the Azure Nodes.

  • Run the workbook.

Requirements

  • A cluster with the Enterprise Edition of HPC Pack 2008 R2 SP2 installed (this includes the HPC Services for Excel).

  • A WCF Broker Node configured.

  • A Windows Azure subscription.

  • Participation in the Azure Beta program (Azure Virtual Machine roles are a pre-release feature of Windows Azure).

  • The head node configured to be able to communicate with Windows Azure. For detailed information, see the Requirements for Windows Azure Nodes in Windows HPC Server 2008 R2.

  • A cluster-enabled Excel workbook that uses the HPC Macro framework to integrate with HPC Services for Excel.

  • Administrator permissions on the cluster.

Important considerations

  • When you run a workbook offloading job on Azure Nodes, the workbook and any dependencies must be copied to each Azure node.

  • Workbook offloading jobs can run on Azure VM roles that are added to the cluster, not on Azure worker roles. UDF offloading jobs can run on Azure VM nodes and on Azure Worker nodes and do not require that you install Excel on the nodes. For more information, see Upload an XLL file to a Windows Azure storage account.

  • Azure nodes cannot access on-premises nodes or shares directly. Workbooks that use external data or shares might require additional development and planning before they can run on Azure Nodes.

  • The Microsoft.Hpc.Scheduler and Microsoft.Hpc.Scheduler.Properties COM APIs are not registered on Azure VMs and therefore workbooks that reference these APIs will fail to resolve the reference.

  • The ExcelClient COM API cannot be used directly on Azure VMs because of limited access back to the on-premises scheduler. References to the COM API resolve correctly, so workbooks that reference it can run, but the API should not be called from within HPC_Execute.

  • Applications deployed to Windows Azure are subject to the licensing terms associated with the application.

Steps

The following steps outline how to prepare a VHD and a node template for Azure virtual machine nodes that already have Microsoft Excel installed. For detailed requirements and steps for adding Azure VM nodes to your cluster, see Steps for Deploying Windows Azure VM Nodes.

Prepare a VHD and node template for Azure Nodes that have Microsoft Excel installed

  1. Follow the instructions in Step 4: Create a VHD for Azure VM Nodes in a Windows HPC Cluster to prepare a VHD operating system image, and for the “Install and configure your applications” step, install Microsoft Excel 2010 and any updates to Excel that you require. Do not copy workbooks or any small dependencies that are likely to change frequently directly to the VHD image.

    Note
    The cluster-side features of HPC Services for Excel are automatically included when you install the Windows HPC Server components on the VHD.
  2. Upload the VHD to your Azure account following the instructions in Step 5: Upload a VHD to Azure from a Windows HPC Cluster.

  3. Create an Azure Node template for VM roles. In HPC Cluster Manager, click Configuration, and then click Node Templates.

  4. Click New, and in the Create Node Template Wizard, select the following options:

    1. In Choose Node Template Type, select Windows Azure node template.

    2. In Specify Template Name, type AzureVMExcel.

    3. Provide your subscription and service information.

    4. In Specify Node Role, select VM role.

    5. In the VHD image dropdown list, select the VHD image that you just created.

    6. Configure the remote desktop credentials. This enables you to use the Remote Desktop action in HPC Cluster Manager to connect to the Azure Nodes.

    7. Select the option to configure Azure availability policy manually.

The following steps describe how to stage an Excel workbook to Azure Storage. Packages that are staged to Azure storage using hpcPack are automatically copied to any new Azure Node instances that you provision (or that are automatically reprovisioned by the Windows Azure system).

Stage an Excel workbook to Azure storage

  1. On the head node, create a folder named C:\AzurePkgs.

  2. Package the workbook for upload to Azure storage by using hpcPack and save the package to the C:\AzurePkgs folder. At an elevated command prompt window type the following command, where “c:\Excel\myWorkbook.xlsb” points to your workbook:

    hpcpack create C:\AzurePkgs\myWorkbook.zip c:\Excel\myWorkbook.xlsb

    Important
    The package name must be the same as the workbook name. If your workbook has dependencies such as other workbooks or DLLs, create a folder that includes the workbook and supporting files, and then package the entire folder. For example: hpcpack create C:\AzurePkgs\myWorkbook.zip c:\Excel\myWorkbookFiles. The workbook must be at the top level of the folder that you are packaging and cannot be contained in a sub-folder. In the example, the workbook would be in “c:\Excel\myWorkbookFiles\myWorkbook.xlsb”.
    Note
    You do not need an elevated command prompt (run as Administrator) to run hpcpack create, but the hpcpack upload command in an upcoming step does require elevation.
  3. Upload the package to Azure storage, for example type:

    hpcpack upload myWorkbook.zip /scheduler:myHeadNode /nodetemplate:AzureVMExcel

Note
If you upload packages to storage after the nodes are started, you can use hpcSync to manually copy the files to the nodes. For example: clusrun /nodegroup:AzureNodes hpcsync
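
The packaging rules in step 2 (the package name matches the workbook name, and the workbook sits at the top level of the packaged folder) can be checked before you run hpcpack create. The following is a minimal Python sketch under those stated rules only; the function name is hypothetical, and the sketch inspects the source folder, not the package that hpcpack produces.

```python
import tempfile
from pathlib import Path, PureWindowsPath


def check_package_layout(folder, package_path):
    """Return a list of problems with the folder that will be packaged."""
    folder = Path(folder)
    # "myWorkbook" for C:\AzurePkgs\myWorkbook.zip
    package_stem = PureWindowsPath(package_path).stem.lower()
    # Workbooks directly inside the folder (not in sub-folders) whose
    # name matches the package name.
    matches = [p for p in folder.glob("*.xls*") if p.stem.lower() == package_stem]
    problems = []
    if not matches:
        problems.append(
            f"no workbook named {package_stem} at the top level of {folder}"
        )
    return problems


# Demonstration in a temporary folder, for illustration only.
with tempfile.TemporaryDirectory() as tmp:
    files = Path(tmp) / "myWorkbookFiles"
    (files / "sub").mkdir(parents=True)
    (files / "sub" / "myWorkbook.xlsb").touch()  # wrong: inside a sub-folder
    print(len(check_package_layout(files, r"C:\AzurePkgs\myWorkbook.zip")))  # 1
    (files / "myWorkbook.xlsb").touch()          # right: at the top level
    print(len(check_package_layout(files, r"C:\AzurePkgs\myWorkbook.zip")))  # 0
```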

The following procedure describes how to start new Azure Node instances. These instances will include Microsoft Excel, the HPC Services for Excel cluster-side features, and any workbooks that you have staged to Azure Storage.

Add Azure Nodes to the cluster

  1. In HPC Cluster Manager, in Node Management, click Add Node and use the wizard to specify the VM node template, the number, and the size of the nodes to add. The nodes will appear in the node list in the Not-Deployed state.

  2. Select the Azure nodes, right-click, and then click Start. This action deploys a set of VM role instances in Windows Azure, and can take some time to complete.

  3. When the nodes move from the Provisioning state to the Offline state, right-click the nodes and then click Bring Online.

You are now ready to submit the offloading job to the cluster. There should be no difference from the point of view of the cluster user that is running the workbook.

Note
When you add Azure Nodes to your cluster and bring them Online, the HPC Job Scheduler Service will immediately try to start jobs on the nodes. If only a portion of your workload can run on Azure, ensure that you update or create job templates to define what job types can run on Azure. For example, to ensure that jobs submitted with a template only run on on-premises compute nodes, you can add the Node Groups property to the job template and select Compute Nodes as the required value.

Expected results

Workbook offloading jobs (using the HPC Macro framework and HPC Services for Excel) can run on Azure Nodes with no change for the end user.

Job Scheduling

The scenarios in this section help you try new job scheduling features in HPC Pack 2008 R2 SP2.

Use Smart Card authentication for job submission

Scenario

Your organization uses Smart Card authentication. You want users to be able to use their Smart Card credentials to submit jobs to the HPC cluster. Some of these jobs are long running, and you don’t want jobs to fail because of ticket time outs.

Because Smart Card users do not have passwords, you want them to be able to use their Smart Card to generate a Soft Card certificate that can be used as credentials on the cluster. If the Soft Card credentials are nearing their expiration date, you want users to generate a new Soft Card before submitting new jobs.

Goal

Configure the HPC Soft Card authentication policy on the cluster and identify a certificate template that must be used when generating an HPC Soft Card for the cluster. Set the Soft Card expiration warning period.

Generate an HPC Soft Card credential and submit a job.

Requirements

  • A cluster with HPC Pack 2008 R2 SP2 installed.

  • The head node and compute nodes must run a version of the Windows Server 2008 R2 operating system. The key storage provider (KSP) for HPC soft cards is not supported on Windows Server 2008.

  • Administrator permissions on the cluster.

  • The Active Directory and Active Directory domain controllers must be configured for Smart Card authentication.

  • The certificate template that will be used to generate HPC Soft Card credentials must allow the private key to be exported.

Note
To generate HPC soft card credentials on a client computer, the client computer must have the Windows Vista® or Windows® 7 operating system installed.

Important considerations

  • You can use HPC Soft Card credentials to submit jobs, run SOA sessions, and run diagnostic tests.

  • If you are using HPC Soft Card credentials, you cannot run jobs as a different user.

  • HPC Soft Card authentication is not supported for cluster deployment operations (for example, unattended installs and joining compute nodes to the domain).

Steps

Before enabling HPC Soft Card authentication on the cluster, work with your certification authority (CA) or PKI administrator to choose or create a certificate template that should be used when generating a soft card for the cluster. The certificate template must allow the private key to be exported. Ensure that the validity period in the template is long enough to accommodate the job lifecycle. Optionally, the template can also have an associated access control list that defines who can use the certificate template.

Note
The CA role service includes several default certificate templates. The CA administrator can create an HPC soft card template by copying and then modifying the default Smart Card Logon template as follows:
  1. In Application Policies, remove smart card.

  2. In Request Handling, select “Allow private key to be exported”.

  3. In Security, specify the users who can enroll (optional).

  4. Ensure that the validity period in the template is long enough to accommodate the job lifecycle.

For more information about installing and managing the CA role service, see Active Directory Certificate Services. For more information about creating certificate templates, see Creating Certificate Templates.

Configure Soft Card authentication on the cluster

  1. Install the key storage provider (KSP) on the head node, compute nodes, and workstation nodes. The installer is included in the SP2 download. Run the version that is appropriate for the operating system on each computer: HpcKsp64.msi or HpcKsp86.msi.

    You can copy the installers to a shared folder that all on-premises nodes can access and then use the clusrun command to install the KSP on all nodes. For example you can copy the installers to the ccpspooldir share on the head node (\\<headnode>\ccpspooldir) and then run the following command (for 64-bit computers):

    clusrun msiexec  /passive /I  \\<headnode>\ccpspooldir\hpcksp_x64.msi

  2. Set the HPC Soft Card authentication policy on the head node by setting the HpcSoftCard cluster property. HpcSoftCard property is set to Disabled by default. If you want users to always use soft card authentication, set the property to Required. If you want users to choose between password or soft card log on, set the property to Allowed.

    For example, run HPC PowerShell as an Administrator and type:

    Set-HpcClusterProperty -HpcSoftCard:Allowed

    Or at an elevated command prompt window, type:

    cluscfg setparams "hpcSoftCard=Allowed"

  3. Set the HpcSoftCardTemplate cluster property to specify the certificate template that should be used to generate a soft card credential.

    For example, run HPC PowerShell as an Administrator and type:

    Set-HpcClusterProperty -HpcSoftCardTemplate:

    Or at an elevated command prompt window, type:

    cluscfg setparams "hpcSoftCardTemplate="

  4. You can configure the warning period for soft cards that are nearing their expiration date. By default, this value is set to 5 days. If a user tries to submit a job with less than 5 days before their credentials expire, the job will be rejected. The user will see an error message about the soft card expiration, and will need to generate a new soft card certificate before resubmitting the job. You can configure this value by setting the SoftCardExpirationWarning cluster property.

    For example, run HPC PowerShell as an Administrator and type:

    Set-HpcClusterProperty -SoftCardExpirationWarning:3

    Or at an elevated command prompt window, type:

    cluscfg setparams "SoftCardExpirationWarning=3"

    Note
    To disable expiration warnings, you can set SoftCardExpirationWarning to 0.
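
The expiration policy described in step 4 can be modeled as follows. This is a minimal Python sketch of the rule as stated in this guide, not the HPC Job Scheduler Service implementation; the function name is hypothetical, and the exact handling of the boundary (exactly the warning period remaining) is an assumption.

```python
from datetime import datetime, timedelta


def soft_card_rejected(expires, now, warning_days=5):
    """Return True when fewer than warning_days remain before expiration.

    A warning_days value of 0 disables the check, as described in the note
    for step 4. The default of 5 matches the default stated in this guide.
    """
    if warning_days == 0:
        return False
    return expires - now < timedelta(days=warning_days)


now = datetime(2012, 2, 1)
expires = now + timedelta(days=4)  # hypothetical certificate with 4 days left
print(soft_card_rejected(expires, now))                  # True: 4 days < 5 days
print(soft_card_rejected(expires, now, warning_days=3))  # False: 4 days >= 3 days
print(soft_card_rejected(expires, now, warning_days=0))  # False: check disabled
```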

The following procedure describes how a cluster user can generate an HPC Soft Card credential. You can use HPC PowerShell or a command prompt window. The commands are used to generate a public key pair and obtain the certificate from the CA that is configured for your Active Directory domain. The certificate is based on the template that is specified by the HpcSoftCardTemplate cluster property. The certificate is placed in your personal certificate store on your computer.

Note
The computer that you log on to must have the HPC Pack 2008 R2 SP2 client utilities installed.

Generate a new HPC Soft Card certificate

  1. Log on to the computer with your Smart Card.

  2. Use one of the following methods to generate an HPC Soft Card.

    Run HPC PowerShell and type:

    New-HpcSoftCard

    Or at a command prompt window, type:

    hpccred createcert

Submit a job using the HPC Soft Card credentials

  1. Use one of the following methods to delete any previously cached credentials, if any:

    Run HPC PowerShell and type:

    remove-hpcJobCredential

    Or at a command prompt window, type:

    hpccred delcreds

  2. Submit a test job, for example:

    Run HPC PowerShell and type:

    New-HpcJob | Add-HpcTask -command:"echo hello" | Submit-HpcJob

    Or at a command prompt window, type:

    Job submit echo hello

  3. When prompted, select which credentials to use.

You can cache credentials on the cluster for jobs, diagnostics, or SOA sessions. To cache an HPC Soft Card, the certificate must be in the user’s personal store on the local computer (from which you are running the command to set credentials). The certificate along with the corresponding key pair will be encrypted and transmitted to the HPC Job Scheduler Service. If an HPC Soft Card for that user is already cached, it will be replaced.

You can use the following commands to manage your HPC Soft Card credentials, or to set SOA or diagnostic credentials:

 

Task                                                     HPC PowerShell                     Command prompt window

Get your HPC Soft Card credential                        get-hpcJobCredential               hpccred getcreds

Delete your cached credentials                           remove-hpcJobCredential            hpccred delcreds

Cache your HPC Soft Card on the cluster (jobs)           set-hpcJobCredential -softcard     hpccred setcreds -softcard

Cache your HPC Soft Card on the cluster (SOA)            set-hpcSoaCredential -softcard     not available

Cache your HPC Soft Card on the cluster (diagnostics)    set-hpcTestCredential -softcard    test setcreds -softcard

Expected results

  • You can use the HPC Soft Card to submit a job. If Soft Cards are allowed (not required), you will be prompted to select an authentication method. If you use HPC Soft Card authentication, the soft card that you created will be used automatically. If there is more than one certificate in your certificate store, you will be prompted to choose from a list of available certificates.

  • If your HPC Soft Card is within SoftCardExpirationWarning days of expiring, you will be prompted to create a new HPC Soft Card before submitting the job.

Guarantee availability of computing resources for different user groups

Scenario

Various user groups in your organization have contributed to the cluster budget, and in return they expect to have a determined portion of the cluster at their disposal. If at any given time a group has a light workload and does not utilize their entire share of the cluster, you want those resources temporarily made available to other groups. So to guarantee availability and maximize cluster utilization, you want the HPC Job Scheduler Service to allocate resources based on Resource Pools.

Goal

Create Resource Pools to define guaranteed cluster proportions. Create Job Templates to associate each user group or job type with a Resource Pool. Configure the HPC Job Scheduler Service to allocate resources based on Resource Pools.

Requirements

  • A cluster with HPC Pack 2008 R2 SP2 installed.

  • Administrator permissions on the cluster.

Important considerations

  • Resource pool definitions

    Weight: An integer between 0 and 999,999 that represents the proportion of cluster cores that should be guaranteed to the pool.

    Guaranteed cores: The number of cores that correspond to the weight defined for the pool. The number of guaranteed cores will vary according to how many nodes are Online and reachable at any given time. The number of guaranteed cores is calculated as (poolWeight/totalWeights)*NumberOfCoresOnline.

    Allocated cores: The number of cores that are actually being used by jobs that are submitted to the pool. This number can be higher or lower than the number of guaranteed cores.

  • A pool with a weight of 0 has no guaranteed cores, but can have allocated cores if there are jobs that are submitted to the pool, and the other pools are not using all of their resources.

  • The Default Pool cannot be deleted. When Resource Pools are enabled in the HPC Job Scheduler Service, any jobs that do not specify a pool will use the Default Pool. Unlike custom pools, specifying the Default Pool does not provide any guarantee of resources. You can set the weight of the Default Pool to 0.

  • When the Job Scheduler calculates the number of cores for each Resource Pool (according to pool weight), the resulting value is rounded down to the nearest whole number. The remainder cores are added to the Default Pool.

  • Resource Pools and node groups provide distinct ways to allocate cluster resources to a job, and they are not intended to be used together. If you add both specific node groups and Resource Pools to a job template, the Job Scheduler will restrict access to cluster resources based on both properties independently.
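
The guaranteed-core calculation described above, (poolWeight/totalWeights)*NumberOfCoresOnline rounded down with the remainder going to the Default Pool, can be worked through as follows. This is a minimal Python sketch of the stated formula, not the scheduler's code; the function name and the figure of 33 online cores are assumptions chosen so that the rounding is visible.

```python
def guaranteed_cores(weights, cores_online, default_pool="Default"):
    """Split cores_online across pools in proportion to their weights.

    Each share is rounded down; leftover cores go to the default pool.
    """
    total = sum(weights.values())
    if total == 0:
        # The guide does not define this case, so the sketch refuses it.
        raise ValueError("at least one pool must have a non-zero weight")
    # Integer arithmetic gives the rounded-down share for each pool.
    shares = {name: w * cores_online // total for name, w in weights.items()}
    leftover = cores_online - sum(shares.values())
    shares[default_pool] = shares.get(default_pool, 0) + leftover
    return shares


print(guaranteed_cores({"PoolA": 60, "PoolB": 40, "Default": 0}, 33))
# {'PoolA': 19, 'PoolB': 13, 'Default': 1}
```

With weights of 60 and 40, PoolA's exact share of 33 cores is 19.8 and PoolB's is 13.2. Both are rounded down, and the one remaining core is added to the Default Pool.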

Steps

In this example, let’s say you have two user groups, and each group expects to be able to use the following proportions of the cluster at any given time: Group A 60%, and Group B 40%. Let’s also say that Group A has two distinct types of jobs for which they want separate job templates: one type is high priority, and the other type is low priority. To enforce the desired scheduling policies, you create three job templates: “GroupA_HighPriJobs”, “GroupA_LowPriJobs”, and “GroupB_AllJobs”.

1. Define Resource Pools

  1. In HPC Cluster Manager, click Configuration.

  2. In the Navigation Pane, click Resource Pools.

  3. In Actions, click Edit Pools and Weights. The dialog box appears.

  4. In the dialog box, click Add two times. Two new rows appear in the list of pools.

  5. In the Pool Name column, rename the pools PoolA and PoolB.

  6. In the Weight column, type the desired weights for each group (60, 40).

  7. Set the weight for the Default pool to 0.

  8. Click OK to save and close the dialog box.

2. Create Job Templates

  1. In the Navigation Pane, click Job Templates.

  2. In Actions, click New to open the Job Template wizard and then define the template as follows:

    • Name: GroupA_HighPriJobs

    • Maximum priority: Highest

    • Default priority: Highest

  3. Open the job template “GroupA_HighPriJobs” in the Job Template Editor and define the Pool and the user permissions as follows:

    • Add the Pool property to the template, and for the Valid Value, select “PoolA”.

    • Click Permissions, and ensure that only users in Group A have permission to submit jobs with that template.

    • Save the changes.

  4. Create a job template for “GroupA_LowPriJobs” as follows:

    • In the Job Template list, right-click “GroupA_HighPriJobs” and then click Copy.

    • Right-click the copy, click Edit, and then define the template as follows:

    • Name: GroupA_LowPriJobs

    • Maximum priority: Normal

    • Default priority: BelowNormal

    • Because this template is based on the “GroupA_HighPriJobs”, the permissions and Pool are already set correctly.

  5. Create a new job template for “GroupB_AllJobs” with the following properties:

    • Name: GroupB_AllJobs

    • Maximum priority: Highest

    • Default priority: Normal

    • Add the Pool property to the template, and for the Valid Value, select “PoolB”.

    • Click Permissions, and ensure that only users in Group B have permission to submit jobs with that template.

3. Configure the HPC Job Scheduler Service to use Resource Pools

  1. In HPC Cluster Manager, click Options, and then click Job Scheduler Configuration.

  2. Select the Resource Pools tab.

  3. Select the Enable Resource Pools check box.

  4. Click OK.

Expected results

  • All jobs that are assigned to a particular Resource Pool will collectively be guaranteed the proportion of cluster cores that is defined for the Resource Pool, and will be scheduled within the pool according to job priority, submit time, and scheduling mode (Queued or Balanced). For example, jobs that are submitted using the job templates “GroupA_HighPriJobs” and “GroupA_LowPriJobs” will collectively be guaranteed 60% of the Online cluster cores.

  • If both groups have jobs in the queue, the cluster will be shared according to the resource pool weights.

  • If one group has no jobs in the queue, or not enough jobs to keep their share of the cluster busy, the other group can temporarily make use of the resources.

Related Resources


Over-subscribe or under-subscribe core or socket counts on cluster nodes

Scenario

Cluster administrators can fine-tune cluster performance by controlling how many HPC tasks should run on a particular node. Over-subscription provides the ability to schedule more processes on a node than there are physical cores or sockets. A process could be an MPI rank, a single-core task, a sub-task, or an instance of a SOA service. For example, if a node has eight cores, then normally eight processes could potentially run on that node. With over-subscription, you can set the subscribedCores node property to a higher number, for example 16, and the HPC Job Scheduler Service could potentially start 16 processes on that node. Conversely, under-subscription provides the ability to schedule fewer processes on a node than there are physical cores or sockets.

For example, this can be useful in the following scenarios:

  • Part of the cluster workload consists of coordinator tasks that use very few compute cycles. An MPI code can have a master process that does not run very much, but distributes work to the other processes. To improve utilization, you can oversubscribe a node, and ensure that your Rank0 process starts on that node (see MPI Rank0 placement script).

  • Your MPI code needs more memory bandwidth than the processor can support if all cores are running. To improve performance, you can undersubscribe the node so that only the desired number of processes can run on that node.

  • You only want to use a subset of cores or sockets on a particular node for running cluster jobs, so you undersubscribe the node. For example, if you enable the compute node role on a broker node or head node, you can essentially limit the number of cores that are used for the compute node role by undersubscribing the node.

Goal

Set the number of subscribed cores and sockets on a node.

Submit a job.

Requirements

  • A cluster with HPC Pack 2008 R2 SP2 installed.

  • Administrator permissions on the cluster.

Important considerations

Node property definitions:

  • subscribedCores

    Specifies the number of cores that you want the HPC Job Scheduler Service to use when it is allocating tasks to the node. It can be larger or smaller than the number of physical cores. To clear this property, set the value to $null.

    Ensure that the number of subscribed cores is divisible by the number of subscribed sockets (or of physical sockets, if no value is set for subscribedSockets). That is to say, each socket must have the same number of cores (for example 8 cores and 4 sockets is ok, but 10 cores and 4 sockets is not).

  • subscribedSockets

    Specifies the number of sockets that the HPC Job Scheduler Service should use when it is allocating tasks to the node. It can be larger or smaller than the number of physical sockets. To clear this property, set the value to $null.

    Ensure that the number of subscribed cores (or of physical cores, if no value is set for subscribedCores) is divisible by the number of subscribed sockets. That is to say, each socket must have the same number of cores (for example 8 cores and 4 sockets is ok, but 10 cores and 4 sockets is not).

  • affinity

    Specifies how affinity is managed for tasks that run on the node. By default, the value is null, which means that affinity is managed according to the job scheduler affinity policy. For more information about the job scheduler affinity policy, see Understanding Affinity. If this property is set, node affinity overrides the job scheduler affinity settings. If it is set to false, affinity on the node is not managed by the HPC services, and the operating system or the application manages placement of tasks on physical cores. If it is set to true, the HPC Node Manager Service sets affinity for tasks (assigns tasks to specific cores).

Note
These properties can only be set on nodes that are in the Offline node state.
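
The divisibility rule for subscribedCores and subscribedSockets can be checked before you change a node. The following Python sketch is only an illustration of the rule as stated above; the function names are hypothetical, and `None` stands in for a property that has not been set ($null).

```python
def effective_counts(physical_cores, physical_sockets,
                     subscribed_cores=None, subscribed_sockets=None):
    """Return the (cores, sockets) pair the rule applies to.

    An unset (None) subscribed value falls back to the physical count.
    """
    cores = physical_cores if subscribed_cores is None else subscribed_cores
    sockets = physical_sockets if subscribed_sockets is None else subscribed_sockets
    return cores, sockets


def is_valid_subscription(physical_cores, physical_sockets,
                          subscribed_cores=None, subscribed_sockets=None):
    """True when every socket ends up with the same whole number of cores."""
    cores, sockets = effective_counts(physical_cores, physical_sockets,
                                      subscribed_cores, subscribed_sockets)
    return cores > 0 and sockets > 0 and cores % sockets == 0


# A node with 4 physical cores and 1 physical socket.
print(is_valid_subscription(4, 1, subscribed_cores=8, subscribed_sockets=2))
print(is_valid_subscription(4, 1, subscribed_cores=10, subscribed_sockets=4))
```

The first call passes because 8 cores divide evenly across 2 sockets. The second fails because 10 cores cannot be split evenly across 4 sockets.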

Steps

As an example, let’s say we have a node named CN001 that has 4 cores and 1 socket. We want to set the subscribed cores and sockets to 8 and 2.

To over-subscribe cores and sockets on a node

  1. Run HPC PowerShell as an Administrator.

  2. List node names, core, and socket counts:

    get-hpcnode|select netbiosname, processorcores, sockets, subscribedcores, subscribedsockets

  3. Take CN001 Offline (if you want to cancel any tasks that are running on the node, include the -force parameter):

    set-hpcnodestate -name:CN001 -state:offline

  4. Set subscribed cores and sockets on the node:

    set-hpcnode -name:CN001 -subscribedcores:8 -subscribedsockets:2

  5. Verify the property changes:

    get-hpcnode -name:CN001|select netbiosname, processorcores, sockets, subscribedcores, subscribedsockets

    Note
    Node property changes are applied during scheduling passes. If your changes are not reflected when you run get-hpcnode, wait a few seconds and then try again.
  6. Bring the node Online:

    set-hpcnodestate -name:CN001 -state:online

You can now submit a job to verify that the HPC Job Scheduler Service starts up to 8 tasks (or sub-tasks) on CN001. For example, in HPC PowerShell, to submit a parametric sweep job that requires 8 cores and requests CN001:

New-hpcjob -numcores:8|add-hpctask -type:parametricsweep -requirednodes:CN001 -command:"echo *"|submit-hpcjob

You can use one of the following methods to verify core allocation of a completed job:

  • In HPC Cluster Manager, in Job Management, double-click the job you just submitted. In the View Job dialog box, click Allocated Nodes.

  • At a command prompt window, type the following (replace “4” with the ID of your job):

    Job view 4 -detailed

To create a customized node view to monitor settings for subscribed cores and sockets

  1. In HPC Cluster Manager, click Node Management.

  2. Click the blank tab to create a new node view, and then click Customize Tab.

  3. In the Customize Tab dialog box, select the following columns:

    • CPU Usage (%)

    • Cores In Use (Note: this reflects physical cores, not subscribed cores)

    • Subscribed Cores

    • Subscribed Sockets

    • Affinity

    • Running Tasks

    • Cores

    • Sockets

  4. You can sort the list by the value of Subscribed Cores by clicking the Subscribed Cores column header in the list view.

  5. To see the running jobs for a node, select the node in the node list, and then in the Actions Pane, click Jobs for the Selected Nodes.

Expected results

The HPC Job Scheduler Service will schedule processes based on the subscribed core and socket counts for a node.

Related Resources


Create a job submission page that lets users submit a job from the HPC Web Portal

Scenario

You want your cluster users to be able to submit and monitor jobs from a web portal.

Goal

  • Launch the HPC Web Portal.

  • Create an Application Profile.

  • Create a Job Submission page.

  • Submit a job through the portal.

Requirements

  • A cluster with HPC Pack 2008 R2 SP2 installed.

  • Administrator permissions on the cluster.

  • The installation file HpcWebComponents.msi. HpcWebComponents.msi is included in the HPC2008R2SP2-Update-x64.zip file available at the Microsoft Download Center, or you can locate the file on the full installation media for HPC Pack 2008 R2 with SP2 or later.

Important considerations

  • When the portal is set up, cluster users will be able to submit and monitor jobs from the https://<headnode>/hpcportal site. Users will need to be logged on to their computer or launch Internet Explorer with their domain credentials.

  • You can use a job template as a basis for more than one submission page.

  • Job submission pages allow the cluster administrator to provide default values for tasks and task properties.

  • Job submission pages allow the cluster administrator to provide a simplified submission experience, but do not provide actual constraints on job or task properties. The submission page can specify constraints on values that are allowed when creating the job, but value limitations that are defined in the web portal are not enforced by the HPC Job Scheduler Service. After the job is submitted, the job owner will be able to see the values for all properties, and modify values within the restrictions defined by the underlying job template, as permitted by the job state. For information about what properties can be modified in different job states, see Modify a Job.

  • Restrictions that are specified by the job template cannot be overridden by the submission page. But the default values that are specified in the submission page do override the default values that are specified in the job template.
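
The precedence between job template values and submission page values can be summarized in a short Python sketch. This is an illustration of the two rules above, not how the portal is implemented: the function name is hypothetical, and raising an error for a page default that violates the template is an assumption made for the sketch.

```python
def effective_default(template_default, template_allowed, page_default=None):
    """Pick the default value that a submission page presents.

    Rule 1: a default set in the submission page overrides the template default.
    Rule 2: the page cannot override the template's restrictions.
    """
    candidate = template_default if page_default is None else page_default
    if candidate not in template_allowed:
        # Assumption for this sketch: reject values outside the template limits.
        raise ValueError(f"{candidate!r} is not allowed by the job template")
    return candidate


allowed = ["Lowest", "BelowNormal", "Normal"]
print(effective_default("BelowNormal", allowed))
print(effective_default("BelowNormal", allowed, page_default="Normal"))
```

The first call returns the template default because the page sets nothing. The second returns the page default, which is valid because it stays within the template's allowed values.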

Steps

The following procedure describes how to set up the web portal. For detailed steps and other options, see Installing the Microsoft HPC Pack Web Components.

To set up the web portal

  1. On the head node, run HpcWebComponents.msi.

  2. Run HPC PowerShell as an Administrator, and then type:

    Powershell.exe -executionpolicy bypass -command set-hpcwebcomponents portal -enable

  3. Select a certificate option from the displayed list. For testing purposes you can type 0 to generate and configure a self-signed certificate.

  4. When configuration and installation are complete, open Internet Explorer and add the web portal to the list of trusted sites as follows:

    1. On the Tools menu, click Internet Options.

    2. In the Security tab, select the Trusted sites zone, and then click Sites.

    3. Add https://localhost/hpcportal to the list of sites.

      Note
      The default security level for trusted sites (Medium) allows AJAX, which is required to view the web portal.
  5. Open Internet Explorer and go to the following address:

    https://localhost/hpcportal

    • If you see a certificate error warning, click “Continue to this website”.

    • If you are prompted, type your domain credentials.

To demonstrate how the application profile works, the following procedure describes how to create a profile for the ping command with the parameter -n. For example, a user could run the following command: ping localhost -n 2

Create an Application Profile for the ping command

  1. In the Windows HPC Server Web Portal, click Application Profiles.

  2. Click New.

  3. Define the computer name parameter for the ping command by specifying the following properties:

    • Type: Text

    • Name: computer

    • Default value: localhost

    • Format: {1}

  4. Click Add to specify a second parameter.

  5. Define the count parameter for the ping command by specifying the following properties:

    • Type: Numeric

    • Name: count

    • Default value: 2

    • Switch: n

    • Format: -{0} {1}

  6. In Specify a command, type ping.

  7. Click Show preview and verify that the command appears as expected. In this example, the command appears as:

    ping localhost -n 2

  8. In Save this application profile as, type “PingCommand”, and then click Save.
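
To see how the Format strings in the steps above combine into a command line, the following Python sketch renders the same two parameters. It assumes that {0} stands for the parameter's switch and {1} for its value, which is inferred from the preview shown in step 7. The function names and the dictionary layout are hypothetical.

```python
def render_parameter(fmt, value, switch=""):
    """Fill a profile Format string: {0} is the switch, {1} is the value."""
    return fmt.format(switch, value)


def render_command(command, parameters):
    """Join the command with each rendered parameter, in order."""
    parts = [command]
    for p in parameters:
        parts.append(render_parameter(p["format"], p["value"], p.get("switch", "")))
    return " ".join(parts)


ping_parameters = [
    {"name": "computer", "format": "{1}", "value": "localhost"},
    {"name": "count", "format": "-{0} {1}", "value": 2, "switch": "n"},
]
print(render_command("ping", ping_parameters))  # prints: ping localhost -n 2
```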

The following screenshot illustrates this application profile definition:

To create a submission page for a parametric ping job

  1. In the Windows HPC Server Web Portal, click Submission pages, and then click New.

  2. In the name and job template page, specify the following properties:

    1. Name for submission page: “PingJob”

    2. Job template: “Default”

    3. Page type: “Parametric Sweep Job”

  3. In the job property visibility and defaults page, configure the properties as follows:

    1. Job Name: “myPingJob”.

    2. Project Name: Clear the Show check box for this property.

    3. Define the parametric sweep values: start=1, end=50, increment=1.

    4. Clear the Show check boxes for the email, working directory, and standard in, out and error properties.

  4. On the next page, define Node Preparation and Release tasks as follows:

    1. Do not show the Node Preparation task options, and specify a default value of “echo hello”.

      Note
      If a property is not shown, the default value is applied, and the job owner cannot change the value from the job submission page. After the job is submitted, the job owner will be able to see the values for all properties, and modify values within the restrictions defined by the underlying job template, as permitted by the job state. For information about what properties can be modified in different job states, see Modify a Job.
    2. Select the check box to show the Node Release task options, and specify a default value of “echo goodbye”.

  5. On the Specify application profile page, select the option to use an existing profile, click Select, select “PingCommand” in the dialog box, and then click OK.

  6. On the next page, accept the default visibility settings for the PingCommand Application Profile.

  7. On the last page, click Finish.

To submit a job from the portal

  1. In the Windows HPC Server Web Portal, click New Job, and then click “PingJob”.

  2. In the PingJob submission page, under Application parameters, verify that the following default values appear:

    1. “computer”: “localhost”

    2. “count”: “2”

  3. In the Node Prep/Node Release Tasks section, notice that the Node Preparation task (echo hello) is not displayed, but you can modify the default command for the Node Release task (echo goodbye).

  4. Click Submit.

  5. In the job list, right-click “myPingJob”, click View Job, and then select the View Tasks tab.

Expected results

  • The sample job “myPingJob” ran a node preparation and a node release task, as defined in the submission page, on each allocated node.

  • The sample job ran the command ping localhost -n 2 50 times.
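
The count of 50 follows from the sweep values defined in the submission page (start=1, end=50, increment=1). The following Python sketch shows the arithmetic, assuming that the end value is inclusive, as the expected result implies. The function name is hypothetical.

```python
def sweep_indices(start, end, increment):
    """List the index values of a parametric sweep with an inclusive end."""
    if increment <= 0:
        raise ValueError("increment must be a positive whole number")
    return list(range(start, end + 1, increment))


indices = sweep_indices(1, 50, 1)
print(len(indices))             # prints: 50
print(indices[0], indices[-1])  # prints: 1 50
```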

Related Resources


Specify a custom job filter at the job template level

Scenario

You support a varied workload on your cluster, and you want to use custom filters to provide additional checks and controls on jobs that are submitted to your cluster. However, some of the properties and conditions that you want to check for only apply to certain job types, so rather than specifying a single filter for all jobs that are submitted to the cluster, you want to specify one or more filters that should run only on jobs that are submitted with a particular job template. For example, you can ensure that an activation filter that checks for license availability only runs on jobs that require a license.

Goal

To demonstrate how custom filters that are defined at the job template level work, this guide describes how to compile and install a sample submission filter from the SDK code samples. The submission filter checks for job property values and if conditions are met, reassigns the job to a different job template. If a job owner already selected a custom job template, we do not want to reassign the job, so we will run this filter only on jobs that are submitted with the Default job template.

  • Build the submission filter sample in the Microsoft HPC Pack 2008 R2 with SP2 SDK.

  • Add the submission filter to the Default job template.

  • Submit a test job.

Requirements

  • A cluster with HPC Pack 2008 R2 SP2 installed.

  • Administrator permissions on the cluster.

  • A development computer with the following software:

    • Microsoft Visual Studio

    • Microsoft HPC Pack 2008 R2 with SP2 SDK and code samples (both are available from the Microsoft Download Center).

Important considerations

  • Filters that are specified at the job template level must be defined as a DLL (and will run in the same process as the HPC Job Scheduler Service), rather than as an executable like the cluster-wide filters (which run in a separate process).

  • Job template filters can modify jobs and influence how the HPC Job Scheduler Service processes jobs in the same way as cluster-wide filters. For more information about how the job scheduler interprets filter exit codes, see Understanding Activation and Submission Filters.

  • When a job is submitted or ready for activation, the job-template filters will run in the order listed in the template, and will run before the cluster-wide filter.

  • HPC Server 2008 R2 has the .NET 3.5 framework installed by default. If you are compiling in Visual Studio 2010, you must select .NET 3.5 when compiling your filter DLL for the job scheduler (the default framework in Visual Studio 2010 is .NET 4.0). Even if you install .NET 4.0 on the cluster, the job scheduler is based on .NET 3.5, so any DLL that loads in the scheduler process must also be .NET 3.5.

Steps

To experiment with job template filters, you can build and try the sample submission filter that is included in the SP2 code samples. The SP2 code samples include a C# project file that you can compile in Visual Studio to produce a DLL file. You can then deploy this DLL file to the cluster and associate it with a job template.

To build the sample submission filter

  1. On your development computer, extract the HPC2008R2.SampleCode.zip file.

  2. In the sample code folder, open the SubmissionJobSize folder (in the HPC2008R2.SampleCode\Scheduler\Filters\dll folder).

  3. Open the SubmissionJobSize Visual C# Project file with Visual Studio.

  4. In the Build menu, click Build Solution to compile the sample, as illustrated in the following screen shot:

  5. Accept the default save location.

  6. The SubmissionJobSize folder now includes a \bin\Debug folder. The Debug folder includes the SubmissionJobSize DLL (application extension file) and PDB (program debug database file).

To deploy the filter to the cluster

  1. On the head node, open the %CCP_DATA%Filters folder (typically, this is C:\Program Files\Microsoft HPC Pack 2008 R2\Data\Filters).

  2. Create a new sub-folder named “SubmissionJobSize”.

  3. Copy the SubmissionJobSize DLL and PDB files (from the SubmissionJobSize\bin\Debug folder) and place them in the new folder you just created.

    Note
    If your filters have more than one file, or if they create output files, it is good practice to create a sub-folder for each filter in the %CCP_DATA%Filters folder. The PDB file is not required, but you can include it to help when debugging the filter.

The sample submission filter checks the job XML file to see if the maximum number of requested resources for the job is greater than 1 (and not set to autocalculate). If so, the filter changes the value for the job template property and assigns a job template named “LargeJobTemplate” (to test this filter, you must first create a job template named “LargeJobTemplate”). We will add the filter to the Default job template. That way, any job that is submitted to the Default template will be checked for maximum requested resource settings, and if greater than 1, the job will be assigned to the new template.
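
The decision that the sample filter makes can be outlined in a few lines of Python. This is a sketch of the logic described above, not the SDK sample itself (which is a C# DLL). The attribute names MaxCores, AutoCalculateMax, and JobTemplate, and the simplified job XML without a namespace, are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

LARGE_TEMPLATE = "LargeJobTemplate"


def reassign_large_job(job_xml):
    """Return (updated_xml, changed) for a simplified job XML string.

    Assumed attributes: MaxCores, AutoCalculateMax, JobTemplate.
    """
    root = ET.fromstring(job_xml)
    auto_max = root.get("AutoCalculateMax", "false").lower() == "true"
    max_cores = int(root.get("MaxCores", "1"))
    changed = False
    # Reassign only when the maximum is set explicitly and is greater than 1.
    if not auto_max and max_cores > 1:
        root.set("JobTemplate", LARGE_TEMPLATE)
        changed = True
    return ET.tostring(root, encoding="unicode"), changed


sample = '<Job Name="FilterTest" JobTemplate="Default" MaxCores="2" AutoCalculateMax="false" />'
updated, changed = reassign_large_job(sample)
print(changed)
print(ET.fromstring(updated).get("JobTemplate"))
```

A job that leaves the maximum on autocalculate, or that requests a maximum of 1, passes through unchanged and stays on the Default template.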

To configure the job templates

  1. In HPC Cluster Manager, in Configuration, select Job Templates.

  2. Click New.

  3. In the wizard, type “LargeJobTemplate” for the template name, select the Finish tab, and then click Finish.

  4. Right-click the Default job template and then click Edit.

  5. Click Add, and then select SubmissionFilters.

  6. In Valid Values, specify the location of the filter relative to the %CCP_DATA%Filters folder. In this case, type SubmissionJobSize\SubmissionJobSize.dll, as illustrated in the following screen shot:

    Note
    You can specify more than one filter in the value field. List each filter on its own line (filters are delimited by a carriage return). Filters run in the listed order, and will run before the cluster-wide filter, if specified.
  7. Click Save.
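
To illustrate how a multi-line SubmissionFilters value maps to files on the head node, the following Python sketch splits the value on line breaks and resolves each entry against the Filters folder. The function name and the second filter entry (LicenseCheck) are hypothetical. The folder path is the typical location mentioned earlier in this section.

```python
import ntpath  # Windows path rules, regardless of where the sketch runs

FILTERS_ROOT = r"C:\Program Files\Microsoft HPC Pack 2008 R2\Data\Filters"


def resolve_filters(value, filters_root=FILTERS_ROOT):
    """Return full filter paths in listed order, one per non-empty line."""
    entries = [line.strip() for line in value.splitlines() if line.strip()]
    return [ntpath.join(filters_root, entry) for entry in entries]


# Two filters, delimited by a carriage return and line feed.
value = "SubmissionJobSize\\SubmissionJobSize.dll\r\nLicenseCheck\\LicenseCheck.dll"
for path in resolve_filters(value):
    print(path)
```

The list order is preserved because job template filters run in the order listed, before any cluster-wide filter.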

To submit a test job

  1. In HPC Cluster Manager, click Job Management.

  2. Click New Job and define the job as follows:

    1. Job name: FilterTest

    2. Job template: Default

    3. Resource type: Core

    4. Maximum: 2

  3. Click Edit Tasks, click Add, and then select Basic Task.

  4. In command line, type echo hello, and then click OK.

  5. Click Submit.

  6. In the job list, select your “FilterTest” job, and then in the Details pane, click the Job Details tab to view the assigned job template.

Expected results

  • The job template value for the test job is now set to “LargeJobTemplate”.

  • The demo filter will only run on jobs that are submitted to the Default job template.

Related Resources


Additional resources