What's New in HPC Pack 2016 Update 3

This document lists the new features and changes available in Microsoft HPC Pack 2016 Update 3, compared with HPC Pack 2016 Update 2. You can download and install HPC Pack 2016 Update 3 on your on-premises computers, or use the deployment templates to create a cluster on Azure with HPC Pack 2016 Update 3.

Performance and reliability improvements

  • Improve scheduler task dispatching rate.
  • Improve SOA request dispatching rate when some service hosts reach their idle timeout.
  • Improve service reliability by fixing service crash and leak issues under certain circumstances.

Setup and Deployment

  • Support skipping installation of the RRAS, DHCP, and WDS components during head node slipstream installation by using the "-SkipComponent:rras,dhcp,wds" option.

  • Use a new VM extension (instead of the HPC compute node image) to deploy Azure IaaS Windows nodes with the following operating systems: Windows Server 2019, Windows Server 2016, Windows Server 2012 R2, Windows Server 2012, Windows Server 2008 R2 SP1.

  • Windows Server 2019 can now be specified as the operating system for Azure PaaS nodes.

  • Users can specify which .NET Framework version is installed on Azure PaaS nodes by choosing one of the following values: NDP461 (default), NDP462, NDP47, NDP471, NDP472, NDP48. The following PowerShell commands show an example.

    Add-PSSnapin Microsoft.Hpc
    Set-HpcClusterRegistry Microsoft.Hpc.Azure.NetFxVersion NDP472
    

Management

  • Machine accounts can now be added in the add user dialog in HPC Cluster Manager.

  • Allow environment variable values that contain "=" in HPC Cluster Manager and HPC Job Manager.

  • Increase task command line length limit from 480 to 1024 in HPC Cluster Manager and HPC Job Manager.

SOA Runtime

  • Support the Export-HPCSoaTrace PowerShell cmdlet.

  • Support automatically growing nodes for SOA jobs based on the estimated time for the queued requests to complete.

    A cluster property named SoaJobTimeToComplete (in minutes) is introduced alongside the existing SoaJobGrowThreshold and SoaRequestsPerCore properties to help make the grow decision for SOA jobs. The grow count is the maximum of the previous calculation, based on the number of requests, and the new calculation, based on the estimated time for the job to complete. The new logic uses a simple formula: growingCount = CallDuration * OutstandingCalls / SoaJobTimeToComplete - RunningTasks. Note that this estimate makes several assumptions: no prefetch, no concurrency, no faults, evenly distributed request times, and zero grow time. In practice, this implementation is expected to address the issue of long-running requests. Also note that the default value of SoaJobTimeToComplete is 0, which means that growing by remaining request time is not enabled.
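
    The grow decision described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the scheduler's actual code: the function and parameter names are hypothetical, CallDuration is assumed to use the same unit (minutes) as SoaJobTimeToComplete, and rounding up and clamping at zero are added here so the result is a usable node count.

```python
import math


def grow_count_by_time(call_duration, outstanding_calls,
                       time_to_complete, running_tasks):
    """Estimate from the time-to-complete formula (hypothetical helper).

    call_duration and time_to_complete are assumed to share one unit (minutes).
    """
    if time_to_complete <= 0:
        # SoaJobTimeToComplete = 0 (the default) disables this estimate.
        return 0
    needed = call_duration * outstanding_calls / time_to_complete
    # Assumption: round up, and never report a negative grow count.
    return max(0, math.ceil(needed) - running_tasks)


def grow_count(count_by_requests, call_duration, outstanding_calls,
               time_to_complete, running_tasks):
    """Final grow count: the larger of the request-based and time-based estimates."""
    by_time = grow_count_by_time(call_duration, outstanding_calls,
                                 time_to_complete, running_tasks)
    return max(count_by_requests, by_time)


# 600 queued requests of about 2 minutes each, a 30-minute target, 10 running tasks.
print(grow_count_by_time(2, 600, 30, 10))  # 2 * 600 / 30 - 10
print(grow_count(12, 2, 600, 30, 10))      # larger of 12 and the estimate above
```

    With these sample numbers the time-based estimate is 30, so it wins over a request-based estimate of 12. With SoaJobTimeToComplete left at 0, only the request-based estimate applies.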

Scheduler

  • Change how an AAD (email format) username is translated to a domain format username: for Linux nodes, the hash is removed and the complete AAD name is used as the username; for Windows nodes, the domain name is included in the hash.

  • Custom scheduler node sorter.

    To use a custom node sorter for node selection when scheduling jobs, first implement the Microsoft.Hpc.Scheduler.AddInFilter.HpcClient.INodeSorter interface defined in Microsoft.Hpc.Scheduler.dll, as shown below. Then rename the custom sorter DLLs to 0.dll through 63.dll (a maximum of 64 custom sorters) and copy them to the folder %CCP_DATA%NodeSorters on the head node. Finally, specify #0 through #63 as the job's orderby property when submitting the job, for example: job submit /orderby:#0 hostname

    namespace Microsoft.Hpc.Scheduler.AddInFilter.HpcClient
    {
        public interface INodeSorter
        {
            // Sort nodes by comparing nodes according to node names
            // Return values:
            //    Less than zero    - The first nodeX precedes the second nodeY in the sort order
            //    Zero              - The nodes occur in the same position in the sort order
            //    Greater than zero - The first nodeX follows the second nodeY in the sort order
            int Compare(string nodeX, string nodeY);
            int Compare(string nodeX, string nodeY);
        }
    }
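
    The return-value contract of Compare can be illustrated outside .NET. The Python sketch below is not a deployable sorter (a real one must be a .NET DLL implementing INodeSorter); it only shows how a negative, zero, or positive result determines node order. The ordering rule (name prefix, then trailing number) and the node names are hypothetical.

```python
import re
from functools import cmp_to_key


def _sort_key(node_name):
    """Split a name like 'NODE10' into ('node', 10); no trailing digits gives -1."""
    match = re.search(r"(\d+)$", node_name)
    prefix = node_name[:match.start()] if match else node_name
    number = int(match.group(1)) if match else -1
    return (prefix.lower(), number)


def compare(node_x, node_y):
    """Mirror the Compare contract: <0 if node_x precedes, 0 if equal, >0 if it follows."""
    key_x, key_y = _sort_key(node_x), _sort_key(node_y)
    return (key_x > key_y) - (key_x < key_y)


nodes = ["NODE10", "NODE2", "node1"]
print(sorted(nodes, key=cmp_to_key(compare)))
```

    Here NODE2 sorts before NODE10 because the trailing number is compared numerically rather than as text.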
    
  • Support gMSA (Group Managed Service Account).

    With this support, the cluster can have gMSA accounts set up for cluster users or admins. To set up gMSA accounts, please check the online docs. In short, this requires adding a KDS root key, creating an ADServiceAccount, and installing the ADServiceAccount on the nodes. To submit a job with a gMSA account, specify the pseudo password "GMSA", for example: job submit /user:hpc\hpcgmsa$ /password:GMSA hostname. Note that the job owner who submits the job must use the same gMSA account.

  • Support Docker tasks on Windows.

  • Support job template environments.

    Users can now set environments in job templates. The scope of job template environments lies between cluster environments and job environments: they override cluster environments and are overridden by job environments.
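
    The precedence order can be pictured as a layered merge. The following Python sketch is only a model of that rule, not the scheduler's implementation; the function name and the variable names in the sample data are hypothetical.

```python
def effective_environment(cluster_env, template_env, job_env):
    """Merge three environment layers; later layers override earlier ones."""
    merged = dict(cluster_env)   # lowest precedence
    merged.update(template_env)  # overrides cluster environments
    merged.update(job_env)       # highest precedence
    return merged


cluster_env = {"LOG_LEVEL": "info", "DATA_DIR": "/shared/data"}
template_env = {"LOG_LEVEL": "debug", "LICENSE_SERVER": "lic01"}
job_env = {"LOG_LEVEL": "trace"}

print(effective_environment(cluster_env, template_env, job_env))
```

    In this example LOG_LEVEL resolves to the job's value, LICENSE_SERVER comes from the job template, and DATA_DIR is inherited from the cluster.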

  • REST API Improvements.

    1. REST API can finish a job or task.
    2. REST API can "Call as another user" by setting the HTTP header "x-ms-as-user".
    3. Server pushes job and task events to HTTP clients through SignalR.
    4. REST API endpoint "/hpc/" now supports HTTP Basic Auth.
    5. Add filter "NodeGroup=<name>" to get jobs for Scheduler REST & Web APIs.
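
    As a rough sketch of items 2 and 4, the Python helper below builds the request headers a client might send. Only the "x-ms-as-user" header name and the use of HTTP Basic Auth come from the list above; the helper name, the account names, and the format of the header value are assumptions, and no request is actually sent.

```python
import base64


def build_headers(user, password, as_user=None):
    """Build HTTP headers for Basic Auth, optionally calling as another user."""
    credentials = f"{user}:{password}".encode("utf-8")
    token = base64.b64encode(credentials).decode("ascii")
    headers = {"Authorization": f"Basic {token}"}
    if as_user is not None:
        # Header name from the release notes; the value format is an assumption.
        headers["x-ms-as-user"] = as_user
    return headers


# Hypothetical accounts for illustration only.
headers = build_headers("hpc\\alice", "example-password", as_user="hpc\\bob")
print(sorted(headers))
```

    Basic Auth credentials are only Base64-encoded, not encrypted, so such headers should be sent over HTTPS.
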
  • Avoid retrieving client version from the dll file when doing impersonation.

  • Allow Job Administrators to connect to the service as a client.

    Previously, only cluster Administrators could connect to the service as a client; now cluster Job Administrators can do so as well.

  • (Preview) HpcData service scheduler integration.

    Users can set up and run the HpcData service on compute nodes (both Linux and Windows) and specify the inputFiles and outputFiles properties for tasks to download input files from, and upload output files to, Azure Blobs, Azure Files, or file servers. Please check README.txt under %CCP_HOME%Core for more details. Note that this is a preview feature and may change in the future based on customer feedback.

  • Linux node manager improvements.

    1. Add the environment variable CCP_DISABLE_CGROUP to enable running a task without cgroup.
    2. Change the default working directory to the home directory.
    3. Enable task statistics while a task is running.
    4. Add the properties 'CcpVersion' and 'CustomProperties' to the node registration info.
    5. Support monitoring InfiniBand network usage.
    6. Support monitoring network usage for multiple instances; this is now the default instead of monitoring total usage.