CycleCloud Slurm 3.0

Article
04/28/2023

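Slurm scheduler support was rewritten as part of the CycleCloud 8.4.0 release. Key features include:

Support for dynamic nodes, and for dynamic partitions via dynamic nodearrays, supporting both single and multiple VM sizes
New Slurm versions 23.02 and 22.05.8
Cost reporting via the azslurm CLI
azslurm CLI-based autoscaler
Ubuntu 20 support
The topology plugin is no longer needed, so the submit plugin has been removed as well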

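Slurm clusters in CycleCloud versions < 8.4.0

Making cluster changes

The Slurm cluster deployed in CycleCloud includes a CLI called azslurm that makes it easier to change the cluster. After making any changes to the cluster, run the following command as root on the Slurm scheduler node to rebuild azure.conf and update the nodes in the cluster: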

      $ sudo -i
      # azslurm scale

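This should create the partitions with the correct number of nodes and the proper gres.conf, and restart slurmctld.

No longer pre-creating execute nodes

As of version 3.0.0 of the CycleCloud Slurm project, nodes are no longer pre-created in CycleCloud. Nodes are created when azslurm resume is invoked, or when you create them manually in CycleCloud via the CLI.

Creating additional partitions

The default template that ships with Azure CycleCloud has three partitions (hpc, htc and dynamic), and you can define custom nodearrays that map directly to Slurm partitions. For example, to create a GPU partition, add the following section to your cluster template: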

   [[nodearray gpu]]
   MachineType = $GPUMachineType
   ImageName = $GPUImageName
   MaxCoreCount = $MaxGPUExecuteCoreCount
   Interruptible = $GPUUseLowPrio
   AdditionalClusterInitSpecs = $ExecuteClusterInitSpecs

      [[[configuration]]]
      slurm.autoscale = true
      # Set to true if nodes are used for tightly-coupled multi-node jobs
      slurm.hpc = false

      [[[cluster-init cyclecloud/slurm:execute:3.0.1]]]
      [[[network-interface eth0]]]
      AssociatePublicIpAddress = $ExecuteNodesPublic

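Dynamic partitions

As of 3.0.1, dynamic partitions are supported. You can make a nodearray map to a dynamic partition by adding the following. Here myfeature can be any feature description you want. It can also be more than one feature, separated by commas.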

      [[[configuration]]]
      slurm.autoscale = true
      # Set to true if nodes are used for tightly-coupled multi-node jobs
      slurm.hpc = false
      # This is the minimum, but see slurmd --help and [slurm.conf](https://slurm.schedmd.com/slurm.conf.html) for more information.
      slurm.dynamic_config := "-Z --conf \"Feature=myfeature\""

This generates a dynamic partition like the following:

# Creating dynamic nodeset and partition using slurm.dynamic_config=-Z --conf "Feature=myfeature"
Nodeset=mydynamicns Feature=myfeature
PartitionName=mydynamicpart Nodes=mydynamicns
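The quoting in slurm.dynamic_config is easy to get wrong when more than one feature is listed. The following is a minimal Python sketch, not part of azslurm, that renders the template line for a list of features. The helper name dynamic_config_line and the second feature name gpu are hypothetical and used only for illustration.

```python
def dynamic_config_line(features):
    """Render a slurm.dynamic_config template line for one or more features.

    Hypothetical helper: it only formats text in the shape shown in the
    article and does not talk to CycleCloud or Slurm.
    """
    if not features:
        raise ValueError("at least one feature is required")
    # Multiple features are joined with commas, as the article describes.
    feature_list = ",".join(features)
    # The inner double quotes are backslash-escaped, as in the template line.
    return 'slurm.dynamic_config := "-Z --conf \\"Feature=%s\\""' % feature_list


print(dynamic_config_line(["myfeature"]))
print(dynamic_config_line(["myfeature", "gpu"]))
```

The first call reproduces the line used in the template above; the second shows the comma-separated form for two features.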

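Using dynamic partitions to autoscale

By default, no nodes are defined in the dynamic partition. Instead, you can start nodes through CycleCloud or by manually invoking azslurm resume, and they join the cluster with whatever name you picked. However, Slurm does not know about these nodes, so it cannot autoscale them up.

Alternatively, you can pre-create node records as follows, which allows Slurm to autoscale them up.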

scontrol create nodename=f4-[1-10] Feature=myfeature State=CLOUD

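Another advantage of dynamic partitions is that you can support multiple VM sizes in the same partition. Simply add the VM size name as a feature, and azslurm can then distinguish which VM size you want to use.

Note: The VM size is added implicitly. You do not need to add it to slurm.dynamic_config.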

scontrol create nodename=f4-[1-10] Feature=myfeature,Standard_F4 State=CLOUD
scontrol create nodename=f8-[1-10] Feature=myfeature,Standard_F8 State=CLOUD

Either way, once you have created these nodes with State=CLOUD, they are available for autoscaling like any other nodes.
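When a partition mixes several VM sizes, the scontrol create lines follow a regular pattern, so they can be generated rather than typed. The sketch below is an illustration only: the function name create_node_commands and the mapping from node-name prefix to VM size are assumptions, and the code just builds the command strings without running them.

```python
def create_node_commands(feature, sizes, count):
    """Build one 'scontrol create' command per VM size.

    sizes maps a node-name prefix (hypothetical naming scheme) to a VM size,
    for example {"f4": "Standard_F4"}. count is the number of node records
    to create for each size.
    """
    if count < 1:
        raise ValueError("count must be at least 1")
    commands = []
    for prefix, vm_size in sizes.items():
        commands.append(
            "scontrol create nodename=%s-[1-%d] Feature=%s,%s State=CLOUD"
            % (prefix, count, feature, vm_size)
        )
    return commands


for command in create_node_commands(
    "myfeature", {"f4": "Standard_F4", "f8": "Standard_F8"}, 10
):
    print(command)
```

With the arguments shown, the printed commands match the two scontrol create lines above. Review the generated commands before running them as root on the scheduler node.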

To support multiple VM sizes in a CycleCloud nodearray, you can change the template to allow multiple VM sizes by adding Config.Multiselect = true.

        [[[parameter DynamicMachineType]]]
        Label = Dyn VM Type
        Description = The VM type for Dynamic nodes
        ParameterType = Cloud.MachineType
        DefaultValue = Standard_F2s_v2
        Config.Multiselect = true

Dynamic scale-down

By default, all nodes in the dynamic partition scale down just like nodes in the other partitions. To disable this, see SuspendExcParts.

Manual scaling

If cyclecloud_slurm detects that autoscale is disabled (SuspendTime=-1), it uses the FUTURE state to denote powered-down nodes instead of relying on the power state in Slurm. That is, when autoscale is enabled, powered-off nodes are shown as idle~ in sinfo. When autoscale is disabled, powered-off nodes do not appear in sinfo at all. You can still see their definitions with scontrol show nodes --future.

To start new nodes, run /opt/azurehpc/slurm/resume_program.sh node_list (for example, htc-[1-10]).

To shut down nodes, run /opt/azurehpc/slurm/suspend_program.sh node_list (for example, htc-[1-10]).
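The node_list argument uses Slurm's bracketed range notation. As an illustration of what a value such as htc-[1-10] stands for, here is a small Python sketch that expands the simplest form, a prefix followed by one numeric range. The function name expand_node_list is hypothetical, and the sketch deliberately ignores richer hostlist syntax such as comma-separated ranges or zero padding; on a real cluster, scontrol show hostnames performs the full expansion.

```python
import re


def expand_node_list(node_list):
    """Expand 'prefix[start-end]' into individual node names.

    Only the single-range form used in the article is handled. Any other
    input is returned unchanged as a one-element list.
    """
    match = re.fullmatch(r"(.+)\[(\d+)-(\d+)\]", node_list)
    if match is None:
        return [node_list]
    prefix = match.group(1)
    start, end = int(match.group(2)), int(match.group(3))
    if start > end:
        raise ValueError("range start must not exceed range end")
    return ["%s%d" % (prefix, index) for index in range(start, end + 1)]


print(expand_node_list("htc-[1-3]"))
```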

To start a cluster in this mode, simply add SuspendTime=-1 to the additional Slurm configuration in the template.

To switch an existing cluster to this mode, add SuspendTime=-1 to slurm.conf and run scontrol reconfigure. Then run azslurm remove_nodes && azslurm scale.
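Before switching modes it can help to confirm which SuspendTime value a configuration file actually sets. The following Python sketch reads slurm.conf text and reports whether it contains SuspendTime=-1. It is an illustration rather than the check cyclecloud_slurm performs: the function names are hypothetical, it assumes SuspendTime sits on its own line, and it does not follow Include directives.

```python
def suspend_time(conf_text):
    """Return the last SuspendTime value found in slurm.conf text, or None."""
    value = None
    for raw_line in conf_text.splitlines():
        # Drop trailing comments and surrounding whitespace.
        line = raw_line.split("#", 1)[0].strip()
        if "=" not in line:
            continue
        key, _, setting = line.partition("=")
        # slurm.conf parameter names are case insensitive.
        if key.strip().lower() == "suspendtime":
            value = setting.strip()
    return value


def autoscale_disabled(conf_text):
    """True when the configuration sets SuspendTime=-1, as in the article."""
    return suspend_time(conf_text) == "-1"


example = "ClusterName=demo\nSuspendTime=-1  # manual scaling\n"
print(autoscale_disabled(example))
```

To check a real file, pass in the contents of the slurm.conf the cluster uses; its location depends on the installation.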

Troubleshooting

Transitioning from 2.7 to 3.0

The installation folder changed: /opt/cycle/slurm -> /opt/azurehpc/slurm
Autoscale logs are now in /opt/azurehpc/slurm/logs instead of /var/log/slurmctld. slurmctld.log remains in /var/log/slurmctld.

cyclecloud_slurm.sh no longer exists. Instead, there is a new azslurm CLI, which can be run as root. azslurm supports autocomplete.

[root@scheduler ~]# azslurm
usage: 
accounting_info      - 
buckets              - Prints out autoscale bucket information, like limits etc
config               - Writes the effective autoscale config, after any preprocessing, to stdout
connect              - Tests connection to CycleCloud
cost                 - Cost analysis and reporting tool that maps Azure costs to Slurm Job Accounting data. This is an experimental feature.
default_output_columns - Output what are the default output columns for an optional command.
generate_topology    - Generates topology plugin configuration
initconfig           - Creates an initial autoscale config. Writes to stdout
keep_alive           - Add, remove or set which nodes should be prevented from being shutdown.
limits               - 
nodes                - Query nodes
partitions           - Generates partition configuration
refresh_autocomplete - Refreshes local autocomplete information for cluster specific resources and nodes.
remove_nodes         - Removes the node from the scheduler without terminating the actual instance.
resume               - Equivalent to ResumeProgram, starts and waits for a set of nodes.
resume_fail          - Equivalent to SuspendFailProgram, shuts down nodes
retry_failed_nodes   - Retries all nodes in a failed state.
scale                - 
shell                - Interactive python shell with relevant objects in local scope. Use --script to run python scripts
suspend              - Equivalent to SuspendProgram, shuts down nodes
wait_for_resume      - Wait for a set of nodes to converge.

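Nodes are no longer pre-populated in CycleCloud. They are created only when needed.
All Slurm binaries are inside the azure-slurm-install-pkg*.tar.gz file, under slurm-pkgs. They are pulled from a specific binary release. The current binary release is 2023-03-13.
For MPI jobs, the only network boundary that exists by default is the partition. Unlike 2.x, there are not multiple "placement groups" per partition, so there is only one colocated VMSS per partition. The topology plugin is no longer used either; it required a job submission plugin, which is also no longer needed. Instead, submitting to multiple partitions is now the recommended option for use cases that require submitting jobs to multiple placement groups.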
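Submitting to multiple partitions uses sbatch's standard --partition option, which accepts a comma-separated list of partition names. The Python sketch below only assembles the command line. The helper name sbatch_command, the partition names hpc and hpc2, and the script name job.sh are placeholders, not names defined by the default template.

```python
import shlex


def sbatch_command(partitions, script):
    """Build an sbatch argument list that offers the job to several partitions."""
    if not partitions:
        raise ValueError("at least one partition is required")
    return ["sbatch", "--partition=" + ",".join(partitions), script]


# Print the command as it would be typed in a shell.
print(shlex.join(sbatch_command(["hpc", "hpc2"], "job.sh")))
```

The returned list can be passed to subprocess.run on a node where sbatch is available. The job still runs inside a single partition, so it stays within one network boundary.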

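CycleCloud supports a standard set of autostop attributes across schedulers:

Attribute	Description
cyclecloud.cluster.autoscale.stop_enabled	Whether autostop is enabled on this node. [true/false]
cyclecloud.cluster.autoscale.idle_time_after_jobs	The amount of time (in seconds) for a node to sit idle after completing jobs before it is scaled down.
cyclecloud.cluster.autoscale.idle_time_before_jobs	The amount of time (in seconds) for a node that has not yet run any jobs to sit idle before it is scaled down.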
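One way to read the three attributes together is as a simple decision rule. The Python sketch below models that reading. It is an interpretation of the descriptions in the table, not CycleCloud's actual implementation; the class and function names are hypothetical, and the default values are placeholders chosen for the example.

```python
from dataclasses import dataclass


@dataclass
class AutostopSettings:
    # Field names mirror the attribute suffixes; defaults are placeholders.
    stop_enabled: bool = True
    idle_time_after_jobs: int = 300
    idle_time_before_jobs: int = 1800


def should_stop(settings, idle_seconds, has_run_jobs):
    """Decide whether an idle node is eligible to be scaled down."""
    if not settings.stop_enabled:
        return False
    # Nodes that already ran jobs use the "after jobs" limit; nodes that
    # have not run anything yet use the "before jobs" limit.
    if has_run_jobs:
        limit = settings.idle_time_after_jobs
    else:
        limit = settings.idle_time_before_jobs
    return idle_seconds >= limit


settings = AutostopSettings()
print(should_stop(settings, idle_seconds=600, has_run_jobs=True))
print(should_stop(settings, idle_seconds=600, has_run_jobs=False))
```

With these placeholder limits, a node idle for 600 seconds is eligible once it has run jobs, but not while it is still waiting for its first job.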
