Resource propagation failure: ClusterResourcePlacementRolloutStarted is False

Article
08/06/2024

This article describes how to troubleshoot ClusterResourcePlacementRolloutStarted issues when you propagate resources by using the ClusterResourcePlacement API object in Azure Kubernetes Fleet Manager.

Symptoms

When you use the ClusterResourcePlacement API object in Azure Kubernetes Fleet Manager to propagate resources, the selected resources aren't rolled out to all scheduled clusters, and the ClusterResourcePlacementRolloutStarted condition status shows as False.

Note

To get more information about why the rollout doesn't start, you can check the rollout controller logs.

Cause

The Cluster Resource Placement rollout strategy is blocked because the RollingUpdate configuration is too strict.

Troubleshooting steps

In the ClusterResourcePlacement status section, check placementStatuses to identify the clusters whose RolloutStarted status is set to False.
Locate the corresponding ClusterResourceBinding for the identified cluster. For more information, see How can I find the latest ClusterResourceBinding resource? This resource should indicate the Work status (whether it was created or updated).
Verify the values of maxSurge and maxUnavailable to make sure that they're in line with your expectations.
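
The first step can be sketched in code. The following is a minimal example, not an official tool: it assumes that the ClusterResourcePlacement object is already loaded as a Python dictionary (for instance, from the JSON that `kubectl get clusterresourceplacement <name> -o json` prints). The function name and the sample data are hypothetical and exist only for illustration.

```python
from typing import Any


def clusters_with_blocked_rollout(crp: dict[str, Any]) -> list[tuple[str, str]]:
    """Return (clusterName, message) pairs whose RolloutStarted condition is "False"."""
    blocked = []
    for placement in crp.get("status", {}).get("placementStatuses", []):
        for condition in placement.get("conditions", []):
            if condition.get("type") == "RolloutStarted" and condition.get("status") == "False":
                blocked.append(
                    (placement.get("clusterName", "<unknown>"), condition.get("message", ""))
                )
    return blocked


# Illustrative, trimmed-down status (hypothetical data, not real cluster output).
sample_crp = {
    "status": {
        "placementStatuses": [
            {
                "clusterName": "kind-cluster-2",
                "conditions": [
                    {"type": "Scheduled", "status": "True", "message": "picked by scheduling policy"},
                    {
                        "type": "RolloutStarted",
                        "status": "False",
                        "message": "The rollout is being blocked by the rollout strategy",
                    },
                ],
            },
            {
                "clusterName": "kind-cluster-1",
                "conditions": [
                    {"type": "RolloutStarted", "status": "True", "message": "started the rollout process"},
                ],
            },
        ]
    }
}

for name, message in clusters_with_blocked_rollout(sample_crp):
    print(f"{name}: {message}")
```

Each cluster that the function reports is a candidate for the second step, which is to inspect its ClusterResourceBinding.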

Case study

In the following example, the ClusterResourcePlacement is trying to propagate a namespace to three member clusters. However, when the ClusterResourcePlacement was initially created, the namespace didn't exist on the hub cluster, and the fleet currently consists of two member clusters that are named kind-cluster-1 and kind-cluster-2.

ClusterResourcePlacement specification

spec:
  policy:
    numberOfClusters: 3
    placementType: PickN
  resourceSelectors:
  - group: ""
    kind: Namespace
    name: test-ns
    version: v1
  revisionHistoryLimit: 10
  strategy:
    type: RollingUpdate

ClusterResourcePlacement status

status:
  conditions:
  - lastTransitionTime: "2024-05-07T23:08:53Z"
    message: could not find all the clusters needed as specified by the scheduling
      policy
    observedGeneration: 1
    reason: SchedulingPolicyUnfulfilled
    status: "False"
    type: ClusterResourcePlacementScheduled
  - lastTransitionTime: "2024-05-07T23:08:53Z"
    message: All 2 cluster(s) start rolling out the latest resource
    observedGeneration: 1
    reason: RolloutStarted
    status: "True"
    type: ClusterResourcePlacementRolloutStarted
  - lastTransitionTime: "2024-05-07T23:08:53Z"
    message: No override rules are configured for the selected resources
    observedGeneration: 1
    reason: NoOverrideSpecified
    status: "True"
    type: ClusterResourcePlacementOverridden
  - lastTransitionTime: "2024-05-07T23:08:53Z"
    message: Works(s) are successfully created or updated in the 2 target clusters'
      namespaces
    observedGeneration: 1
    reason: WorkSynchronized
    status: "True"
    type: ClusterResourcePlacementWorkSynchronized
  - lastTransitionTime: "2024-05-07T23:08:53Z"
    message: The selected resources are successfully applied to 2 clusters
    observedGeneration: 1
    reason: ApplySucceeded
    status: "True"
    type: ClusterResourcePlacementApplied
  - lastTransitionTime: "2024-05-07T23:08:53Z"
    message: The selected resources in 2 cluster are available now
    observedGeneration: 1
    reason: ResourceAvailable
    status: "True"
    type: ClusterResourcePlacementAvailable
  observedResourceIndex: "0"
  placementStatuses:
  - clusterName: kind-cluster-2
    conditions:
    - lastTransitionTime: "2024-05-07T23:08:53Z"
      message: 'Successfully scheduled resources for placement in kind-cluster-2 (affinity
        score: 0, topology spread score: 0): picked by scheduling policy'
      observedGeneration: 1
      reason: Scheduled
      status: "True"
      type: Scheduled
    - lastTransitionTime: "2024-05-07T23:08:53Z"
      message: Detected the new changes on the resources and started the rollout process
      observedGeneration: 1
      reason: RolloutStarted
      status: "True"
      type: RolloutStarted
    - lastTransitionTime: "2024-05-07T23:08:53Z"
      message: No override rules are configured for the selected resources
      observedGeneration: 1
      reason: NoOverrideSpecified
      status: "True"
      type: Overridden
    - lastTransitionTime: "2024-05-07T23:08:53Z"
      message: All of the works are synchronized to the latest
      observedGeneration: 1
      reason: AllWorkSynced
      status: "True"
      type: WorkSynchronized
    - lastTransitionTime: "2024-05-07T23:08:53Z"
      message: All corresponding work objects are applied
      observedGeneration: 1
      reason: AllWorkHaveBeenApplied
      status: "True"
      type: Applied
    - lastTransitionTime: "2024-05-07T23:08:53Z"
      message: All corresponding work objects are available
      observedGeneration: 1
      reason: AllWorkAreAvailable
      status: "True"
      type: Available
  - clusterName: kind-cluster-1
    conditions:
    - lastTransitionTime: "2024-05-07T23:08:53Z"
      message: 'Successfully scheduled resources for placement in kind-cluster-1 (affinity
        score: 0, topology spread score: 0): picked by scheduling policy'
      observedGeneration: 1
      reason: Scheduled
      status: "True"
      type: Scheduled
    - lastTransitionTime: "2024-05-07T23:08:53Z"
      message: Detected the new changes on the resources and started the rollout process
      observedGeneration: 1
      reason: RolloutStarted
      status: "True"
      type: RolloutStarted
    - lastTransitionTime: "2024-05-07T23:08:53Z"
      message: No override rules are configured for the selected resources
      observedGeneration: 1
      reason: NoOverrideSpecified
      status: "True"
      type: Overridden
    - lastTransitionTime: "2024-05-07T23:08:53Z"
      message: All of the works are synchronized to the latest
      observedGeneration: 1
      reason: AllWorkSynced
      status: "True"
      type: WorkSynchronized
    - lastTransitionTime: "2024-05-07T23:08:53Z"
      message: All corresponding work objects are applied
      observedGeneration: 1
      reason: AllWorkHaveBeenApplied
      status: "True"
      type: Applied
    - lastTransitionTime: "2024-05-07T23:08:53Z"
      message: All corresponding work objects are available
      observedGeneration: 1
      reason: AllWorkAreAvailable
      status: "True"
      type: Available

The previous output indicates that the test-ns namespace didn't yet exist on the hub cluster, and it shows the following ClusterResourcePlacement condition statuses:

The ClusterResourcePlacementScheduled condition status shows as False because the specified policy aims to pick three clusters, but the scheduler can place resources only in the two clusters that are currently available and joined.
The ClusterResourcePlacementRolloutStarted condition status shows as True because the rollout process started with the two selected clusters.
The ClusterResourcePlacementOverridden condition status shows as True because no override rules are configured for the selected resources.
The ClusterResourcePlacementWorkSynchronized condition status shows as True.
The ClusterResourcePlacementApplied condition status shows as True.
The ClusterResourcePlacementAvailable condition status shows as True.

To propagate the namespace across the relevant clusters, create the test-ns namespace on the hub cluster.

ClusterResourcePlacement status after the test-ns namespace is created on the hub cluster

status:
  conditions:
  - lastTransitionTime: "2024-05-07T23:08:53Z"
    message: could not find all the clusters needed as specified by the scheduling
      policy
    observedGeneration: 1
    reason: SchedulingPolicyUnfulfilled
    status: "False"
    type: ClusterResourcePlacementScheduled
  - lastTransitionTime: "2024-05-07T23:13:51Z"
    message: The rollout is being blocked by the rollout strategy in 2 cluster(s)
    observedGeneration: 1
    reason: RolloutNotStartedYet
    status: "False"
    type: ClusterResourcePlacementRolloutStarted
  observedResourceIndex: "1"
  placementStatuses:
  - clusterName: kind-cluster-2
    conditions:
    - lastTransitionTime: "2024-05-07T23:08:53Z"
      message: 'Successfully scheduled resources for placement in kind-cluster-2 (affinity
        score: 0, topology spread score: 0): picked by scheduling policy'
      observedGeneration: 1
      reason: Scheduled
      status: "True"
      type: Scheduled
    - lastTransitionTime: "2024-05-07T23:13:51Z"
      message: The rollout is being blocked by the rollout strategy
      observedGeneration: 1
      reason: RolloutNotStartedYet
      status: "False"
      type: RolloutStarted
  - clusterName: kind-cluster-1
    conditions:
    - lastTransitionTime: "2024-05-07T23:08:53Z"
      message: 'Successfully scheduled resources for placement in kind-cluster-1 (affinity
        score: 0, topology spread score: 0): picked by scheduling policy'
      observedGeneration: 1
      reason: Scheduled
      status: "True"
      type: Scheduled
    - lastTransitionTime: "2024-05-07T23:13:51Z"
      message: The rollout is being blocked by the rollout strategy
      observedGeneration: 1
      reason: RolloutNotStartedYet
      status: "False"
      type: RolloutStarted
  selectedResources:
  - kind: Namespace
    name: test-ns
    version: v1

In the previous output, the ClusterResourcePlacementScheduled condition status shows as False. The ClusterResourcePlacementRolloutStarted status also shows as False, together with the message: The rollout is being blocked by the rollout strategy in 2 cluster(s).

Check the latest ClusterResourceSnapshot by running the command in How can I find the latest ClusterResourceBinding resource?

Latest ClusterResourceSnapshot

apiVersion: placement.kubernetes-fleet.io/v1
kind: ClusterResourceSnapshot
metadata:
  annotations:
    kubernetes-fleet.io/number-of-enveloped-object: "0"
    kubernetes-fleet.io/number-of-resource-snapshots: "1"
    kubernetes-fleet.io/resource-hash: 72344be6e268bc7af29d75b7f0aad588d341c228801aab50d6f9f5fc33dd9c7c
  creationTimestamp: "2024-05-07T23:13:51Z"
  generation: 1
  labels:
    kubernetes-fleet.io/is-latest-snapshot: "true"
    kubernetes-fleet.io/parent-CRP: crp-3
    kubernetes-fleet.io/resource-index: "1"
  name: crp-3-1-snapshot
  ownerReferences:
  - apiVersion: placement.kubernetes-fleet.io/v1beta1
    blockOwnerDeletion: true
    controller: true
    kind: ClusterResourcePlacement
    name: crp-3
    uid: b4f31b9a-971a-480d-93ac-93f093ee661f
  resourceVersion: "14434"
  uid: 85ee0e81-92c9-4362-932b-b0bf57d78e3f
spec:
  selectedResources:
  - apiVersion: v1
    kind: Namespace
    metadata:
      labels:
        kubernetes.io/metadata.name: test-ns
      name: test-ns
    spec:
      finalizers:
      - kubernetes

In the ClusterResourceSnapshot specification, the selectedResources section now shows the test-ns namespace.

Check whether the ClusterResourceBinding for kind-cluster-1 was updated after the test-ns namespace was created. For more information, see How can I find the latest ClusterResourceBinding resource?

ClusterResourceBinding for kind-cluster-1

apiVersion: placement.kubernetes-fleet.io/v1
kind: ClusterResourceBinding
metadata:
  creationTimestamp: "2024-05-07T23:08:53Z"
  finalizers:
  - kubernetes-fleet.io/work-cleanup
  generation: 2
  labels:
    kubernetes-fleet.io/parent-CRP: crp-3
  name: crp-3-kind-cluster-1-7114c253
  resourceVersion: "14438"
  uid: 0db4e480-8599-4b40-a1cc-f33bcb24b1a7
spec:
  applyStrategy:
    type: ClientSideApply
  clusterDecision:
    clusterName: kind-cluster-1
    clusterScore:
      affinityScore: 0
      priorityScore: 0
    reason: picked by scheduling policy
    selected: true
  resourceSnapshotName: crp-3-0-snapshot
  schedulingPolicySnapshotName: crp-3-0
  state: Bound
  targetCluster: kind-cluster-1
status:
  conditions:
  - lastTransitionTime: "2024-05-07T23:13:51Z"
    message: The resources cannot be updated to the latest because of the rollout
      strategy
    observedGeneration: 2
    reason: RolloutNotStartedYet
    status: "False"
    type: RolloutStarted
  - lastTransitionTime: "2024-05-07T23:08:53Z"
    message: No override rules are configured for the selected resources
    observedGeneration: 2
    reason: NoOverrideSpecified
    status: "True"
    type: Overridden
  - lastTransitionTime: "2024-05-07T23:08:53Z"
    message: All of the works are synchronized to the latest
    observedGeneration: 2
    reason: AllWorkSynced
    status: "True"
    type: WorkSynchronized
  - lastTransitionTime: "2024-05-07T23:08:53Z"
    message: All corresponding work objects are applied
    observedGeneration: 2
    reason: AllWorkHaveBeenApplied
    status: "True"
    type: Applied
  - lastTransitionTime: "2024-05-07T23:08:53Z"
    message: All corresponding work objects are available
    observedGeneration: 2
    reason: AllWorkAreAvailable
    status: "True"
    type: Available

The ClusterResourceBinding remains unchanged. In the ClusterResourceBinding specification, resourceSnapshotName still references the old ClusterResourceSnapshot name (crp-3-0-snapshot instead of crp-3-1-snapshot). This issue occurs when the user provides no explicit RollingUpdate input, because the default values are then applied:

The maxUnavailable value is configured as 25% × 3 (the desired number), rounded to 1.
The maxSurge value is configured as 25% × 3 (the desired number), rounded to 1.
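
The arithmetic behind these defaults can be checked with a short sketch. It assumes that percentages are rounded up to the next whole number, which is consistent with 25% × 3 = 0.75 becoming 1 as stated above. The helper name `scaled_value` is hypothetical and isn't part of any Fleet API.

```python
from typing import Union


def scaled_value(value: Union[int, str], total: int) -> int:
    """Resolve an absolute number or a percentage string such as "25%" against total.

    Percentages are rounded up (assumption: this matches the values quoted in this article).
    """
    if isinstance(value, int):
        return value
    percent = int(value.rstrip("%"))
    # Integer ceiling division avoids floating-point rounding surprises.
    return (percent * total + 99) // 100


desired_clusters = 3  # numberOfClusters in the case study
max_unavailable = scaled_value("25%", desired_clusters)
max_surge = scaled_value("25%", desired_clusters)
print(max_unavailable, max_surge)  # prints "1 1"
```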

Why ClusterResourceBinding isn't updated

Initially, when the ClusterResourcePlacement was created, two ClusterResourceBindings were generated. However, because the rollout strategy didn't apply to the initial phase, the ClusterResourcePlacementRolloutStarted condition was set to True.

When the test-ns namespace was created on the hub cluster, the rollout controller tried to update the two existing ClusterResourceBindings. However, maxUnavailable was set to 1 and the fleet was one member cluster short of the desired three, which made the RollingUpdate configuration too strict: with only two of the three desired placements in place, updating either binding would leave more than one placement unavailable.

Note

During the update, if one of the bindings fails to apply, that failure also violates the RollingUpdate configuration because maxUnavailable is set to 1.
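
The blocking behavior can be illustrated with a simplified model. This sketch isn't the rollout controller's actual algorithm. It assumes only that at least (target − maxUnavailable) placements must stay available during a rollout, and that a binding is unavailable while it's being updated. The function name is hypothetical.

```python
def updatable_bindings(target: int, available: int, max_unavailable: int) -> int:
    """Return how many available bindings could be updated at once in this simplified model."""
    min_available = target - max_unavailable  # placements that must stay available
    return max(0, available - min_available)


# Case study: target of 3 clusters, 2 available bindings, default maxUnavailable of 1.
print(updatable_bindings(target=3, available=2, max_unavailable=1))  # prints 0: rollout blocked

# Relaxing maxUnavailable to 2 leaves room to update one binding at a time.
print(updatable_bindings(target=3, available=2, max_unavailable=2))  # prints 1

# Joining a third member cluster also leaves room under the default budget.
print(updatable_bindings(target=3, available=3, max_unavailable=1))  # prints 1
```

In this model, the case study leaves no room to take any binding out of service, which matches the RolloutNotStartedYet reason that's reported for both clusters.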

Resolution

To resolve this issue, consider manually setting maxUnavailable to a value greater than 1 to relax the RollingUpdate configuration. Alternatively, you can join a third member cluster to the fleet.
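
As a rough sketch of the first option, the following snippet builds a JSON merge-patch body that raises maxUnavailable. The field path spec.strategy.rollingUpdate.maxUnavailable is an assumption (the specification shown in this article includes only spec.strategy.type), and the value 2 is illustrative. Verify both against the API version that your fleet uses before you apply anything.

```python
import json

# Assumed field layout: spec.strategy.rollingUpdate.maxUnavailable.
# The value 2 is an illustrative choice that is greater than the default of 1.
patch = {
    "spec": {
        "strategy": {
            "type": "RollingUpdate",
            "rollingUpdate": {"maxUnavailable": 2},
        }
    }
}

patch_body = json.dumps(patch)
print(patch_body)
```

The printed JSON could then be supplied to a merge patch against the placement, for example through `kubectl patch` with `--type merge`, or the same fields could be edited directly in the ClusterResourcePlacement manifest.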

Contact us for help

If you have questions or need help, create a support request, or ask Azure community support. You can also submit product feedback to the Azure feedback community.
