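Resource propagation failure: ClusterResourcePlacementRolloutStarted is false
This article describes how to troubleshoot ClusterResourcePlacementRolloutStarted issues that occur when you propagate resources by using the ClusterResourcePlacement API object in Azure Kubernetes Fleet Manager.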
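Symptoms
When you use the ClusterResourcePlacement API object in Azure Kubernetes Fleet Manager to propagate resources, the selected resources aren't rolled out to all scheduled clusters, and the ClusterResourcePlacementRolloutStarted condition status shows as False.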
Note
To learn more about why the rollout doesn't start, check the rollout controller log.
Cause
The ClusterResourcePlacement rollout is blocked because the RollingUpdate configuration is too strict.
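Troubleshooting steps
- In the ClusterResourcePlacement status section, check placementStatuses to identify the clusters whose RolloutStarted status is set to False.
- Locate the corresponding ClusterResourceBinding for each identified cluster. For more information, see How can I find the latest ClusterResourceBinding resource? This resource should indicate the Work status (whether it was created or updated).
- Verify the values of maxUnavailable and maxSurge, and make sure that they match your expectations.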
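The first step above can be scripted once the ClusterResourcePlacement status is available as parsed JSON or YAML. The following is a minimal sketch, not part of any Fleet tooling: the function name and the sample cluster names (member-a, member-b) are hypothetical, and only the field names (placementStatuses, clusterName, conditions, type, status) come from the status output format used in this article.

```python
def clusters_with_rollout_not_started(crp_status):
    """Return the names of clusters whose RolloutStarted condition is "False".

    crp_status is the `status` section of a ClusterResourcePlacement,
    already parsed into Python dicts and lists.
    """
    blocked = []
    for placement in crp_status.get("placementStatuses", []):
        for condition in placement.get("conditions", []):
            if (condition.get("type") == "RolloutStarted"
                    and condition.get("status") == "False"):
                blocked.append(placement.get("clusterName"))
    return blocked


# Hypothetical, trimmed status used only to exercise the function.
sample_status = {
    "placementStatuses": [
        {"clusterName": "member-a",
         "conditions": [{"type": "Scheduled", "status": "True"},
                        {"type": "RolloutStarted", "status": "False"}]},
        {"clusterName": "member-b",
         "conditions": [{"type": "Scheduled", "status": "True"},
                        {"type": "RolloutStarted", "status": "True"}]},
    ]
}

print(clusters_with_rollout_not_started(sample_status))  # ['member-a']
```

Each cluster name that the function returns is a candidate for the second step: locating its ClusterResourceBinding.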
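Case study
In the following example, the ClusterResourcePlacement tries to propagate a namespace to three member clusters. However, when the ClusterResourcePlacement is first created, the namespace doesn't exist on the hub cluster, and the fleet contains only two member clusters, named kind-cluster-1 and kind-cluster-2.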
ClusterResourcePlacement spec
spec:
  policy:
    numberOfClusters: 3
    placementType: PickN
  resourceSelectors:
  - group: ""
    kind: Namespace
    name: test-ns
    version: v1
  revisionHistoryLimit: 10
  strategy:
    type: RollingUpdate
ClusterResourcePlacement status
status:
  conditions:
  - lastTransitionTime: "2024-05-07T23:08:53Z"
    message: could not find all the clusters needed as specified by the scheduling
      policy
    observedGeneration: 1
    reason: SchedulingPolicyUnfulfilled
    status: "False"
    type: ClusterResourcePlacementScheduled
  - lastTransitionTime: "2024-05-07T23:08:53Z"
    message: All 2 cluster(s) start rolling out the latest resource
    observedGeneration: 1
    reason: RolloutStarted
    status: "True"
    type: ClusterResourcePlacementRolloutStarted
  - lastTransitionTime: "2024-05-07T23:08:53Z"
    message: No override rules are configured for the selected resources
    observedGeneration: 1
    reason: NoOverrideSpecified
    status: "True"
    type: ClusterResourcePlacementOverridden
  - lastTransitionTime: "2024-05-07T23:08:53Z"
    message: Works(s) are successfully created or updated in the 2 target clusters'
      namespaces
    observedGeneration: 1
    reason: WorkSynchronized
    status: "True"
    type: ClusterResourcePlacementWorkSynchronized
  - lastTransitionTime: "2024-05-07T23:08:53Z"
    message: The selected resources are successfully applied to 2 clusters
    observedGeneration: 1
    reason: ApplySucceeded
    status: "True"
    type: ClusterResourcePlacementApplied
  - lastTransitionTime: "2024-05-07T23:08:53Z"
    message: The selected resources in 2 cluster are available now
    observedGeneration: 1
    reason: ResourceAvailable
    status: "True"
    type: ClusterResourcePlacementAvailable
  observedResourceIndex: "0"
  placementStatuses:
  - clusterName: kind-cluster-2
    conditions:
    - lastTransitionTime: "2024-05-07T23:08:53Z"
      message: 'Successfully scheduled resources for placement in kind-cluster-2 (affinity
        score: 0, topology spread score: 0): picked by scheduling policy'
      observedGeneration: 1
      reason: Scheduled
      status: "True"
      type: Scheduled
    - lastTransitionTime: "2024-05-07T23:08:53Z"
      message: Detected the new changes on the resources and started the rollout process
      observedGeneration: 1
      reason: RolloutStarted
      status: "True"
      type: RolloutStarted
    - lastTransitionTime: "2024-05-07T23:08:53Z"
      message: No override rules are configured for the selected resources
      observedGeneration: 1
      reason: NoOverrideSpecified
      status: "True"
      type: Overridden
    - lastTransitionTime: "2024-05-07T23:08:53Z"
      message: All of the works are synchronized to the latest
      observedGeneration: 1
      reason: AllWorkSynced
      status: "True"
      type: WorkSynchronized
    - lastTransitionTime: "2024-05-07T23:08:53Z"
      message: All corresponding work objects are applied
      observedGeneration: 1
      reason: AllWorkHaveBeenApplied
      status: "True"
      type: Applied
    - lastTransitionTime: "2024-05-07T23:08:53Z"
      message: All corresponding work objects are available
      observedGeneration: 1
      reason: AllWorkAreAvailable
      status: "True"
      type: Available
  - clusterName: kind-cluster-1
    conditions:
    - lastTransitionTime: "2024-05-07T23:08:53Z"
      message: 'Successfully scheduled resources for placement in kind-cluster-1 (affinity
        score: 0, topology spread score: 0): picked by scheduling policy'
      observedGeneration: 1
      reason: Scheduled
      status: "True"
      type: Scheduled
    - lastTransitionTime: "2024-05-07T23:08:53Z"
      message: Detected the new changes on the resources and started the rollout process
      observedGeneration: 1
      reason: RolloutStarted
      status: "True"
      type: RolloutStarted
    - lastTransitionTime: "2024-05-07T23:08:53Z"
      message: No override rules are configured for the selected resources
      observedGeneration: 1
      reason: NoOverrideSpecified
      status: "True"
      type: Overridden
    - lastTransitionTime: "2024-05-07T23:08:53Z"
      message: All of the works are synchronized to the latest
      observedGeneration: 1
      reason: AllWorkSynced
      status: "True"
      type: WorkSynchronized
    - lastTransitionTime: "2024-05-07T23:08:53Z"
      message: All corresponding work objects are applied
      observedGeneration: 1
      reason: AllWorkHaveBeenApplied
      status: "True"
      type: Applied
    - lastTransitionTime: "2024-05-07T23:08:53Z"
      message: All corresponding work objects are available
      observedGeneration: 1
      reason: AllWorkAreAvailable
      status: "True"
      type: Available
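The previous output indicates that the test-ns namespace has never existed on the hub cluster, and it shows the following ClusterResourcePlacement condition statuses:
- The ClusterResourcePlacementScheduled condition status shows as False because the specified policy aims to pick three clusters, but the scheduler can place resources only on the two clusters that are currently available and joined.
- The ClusterResourcePlacementRolloutStarted condition status shows as True because the rollout process has started with the two selected clusters.
- The ClusterResourcePlacementOverridden condition status shows as True because no override rules are configured for the selected resources.
- The ClusterResourcePlacementWorkSynchronized condition status shows as True.
- The ClusterResourcePlacementApplied condition status shows as True.
- The ClusterResourcePlacementAvailable condition status shows as True.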
To propagate the namespace to the relevant clusters, create the test-ns namespace on the hub cluster.
ClusterResourcePlacement status after the test-ns namespace is created on the hub cluster
status:
  conditions:
  - lastTransitionTime: "2024-05-07T23:08:53Z"
    message: could not find all the clusters needed as specified by the scheduling
      policy
    observedGeneration: 1
    reason: SchedulingPolicyUnfulfilled
    status: "False"
    type: ClusterResourcePlacementScheduled
  - lastTransitionTime: "2024-05-07T23:13:51Z"
    message: The rollout is being blocked by the rollout strategy in 2 cluster(s)
    observedGeneration: 1
    reason: RolloutNotStartedYet
    status: "False"
    type: ClusterResourcePlacementRolloutStarted
  observedResourceIndex: "1"
  placementStatuses:
  - clusterName: kind-cluster-2
    conditions:
    - lastTransitionTime: "2024-05-07T23:08:53Z"
      message: 'Successfully scheduled resources for placement in kind-cluster-2 (affinity
        score: 0, topology spread score: 0): picked by scheduling policy'
      observedGeneration: 1
      reason: Scheduled
      status: "True"
      type: Scheduled
    - lastTransitionTime: "2024-05-07T23:13:51Z"
      message: The rollout is being blocked by the rollout strategy
      observedGeneration: 1
      reason: RolloutNotStartedYet
      status: "False"
      type: RolloutStarted
  - clusterName: kind-cluster-1
    conditions:
    - lastTransitionTime: "2024-05-07T23:08:53Z"
      message: 'Successfully scheduled resources for placement in kind-cluster-1 (affinity
        score: 0, topology spread score: 0): picked by scheduling policy'
      observedGeneration: 1
      reason: Scheduled
      status: "True"
      type: Scheduled
    - lastTransitionTime: "2024-05-07T23:13:51Z"
      message: The rollout is being blocked by the rollout strategy
      observedGeneration: 1
      reason: RolloutNotStartedYet
      status: "False"
      type: RolloutStarted
  selectedResources:
  - kind: Namespace
    name: test-ns
    version: v1
In the previous output, the ClusterResourcePlacementScheduled condition status shows as False. The ClusterResourcePlacementRolloutStarted status also shows as False, with the message: The rollout is being blocked by the rollout strategy in 2 cluster(s).
Check the latest ClusterResourceSnapshot by running the command in How can I find the latest ClusterResourceBinding resource?
Latest ClusterResourceSnapshot
apiVersion: placement.kubernetes-fleet.io/v1
kind: ClusterResourceSnapshot
metadata:
  annotations:
    kubernetes-fleet.io/number-of-enveloped-object: "0"
    kubernetes-fleet.io/number-of-resource-snapshots: "1"
    kubernetes-fleet.io/resource-hash: 72344be6e268bc7af29d75b7f0aad588d341c228801aab50d6f9f5fc33dd9c7c
  creationTimestamp: "2024-05-07T23:13:51Z"
  generation: 1
  labels:
    kubernetes-fleet.io/is-latest-snapshot: "true"
    kubernetes-fleet.io/parent-CRP: crp-3
    kubernetes-fleet.io/resource-index: "1"
  name: crp-3-1-snapshot
  ownerReferences:
  - apiVersion: placement.kubernetes-fleet.io/v1beta1
    blockOwnerDeletion: true
    controller: true
    kind: ClusterResourcePlacement
    name: crp-3
    uid: b4f31b9a-971a-480d-93ac-93f093ee661f
  resourceVersion: "14434"
  uid: 85ee0e81-92c9-4362-932b-b0bf57d78e3f
spec:
  selectedResources:
  - apiVersion: v1
    kind: Namespace
    metadata:
      labels:
        kubernetes.io/metadata.name: test-ns
      name: test-ns
    spec:
      finalizers:
      - kubernetes
In the ClusterResourceSnapshot spec, the selectedResources section now shows the test-ns namespace.
Check the ClusterResourceBinding for kind-cluster-1 to see whether it was updated after the test-ns namespace was created. For more information, see How can I find the latest ClusterResourceBinding resource?
ClusterResourceBinding for kind-cluster-1
apiVersion: placement.kubernetes-fleet.io/v1
kind: ClusterResourceBinding
metadata:
  creationTimestamp: "2024-05-07T23:08:53Z"
  finalizers:
  - kubernetes-fleet.io/work-cleanup
  generation: 2
  labels:
    kubernetes-fleet.io/parent-CRP: crp-3
  name: crp-3-kind-cluster-1-7114c253
  resourceVersion: "14438"
  uid: 0db4e480-8599-4b40-a1cc-f33bcb24b1a7
spec:
  applyStrategy:
    type: ClientSideApply
  clusterDecision:
    clusterName: kind-cluster-1
    clusterScore:
      affinityScore: 0
      priorityScore: 0
    reason: picked by scheduling policy
    selected: true
  resourceSnapshotName: crp-3-0-snapshot
  schedulingPolicySnapshotName: crp-3-0
  state: Bound
  targetCluster: kind-cluster-1
status:
  conditions:
  - lastTransitionTime: "2024-05-07T23:13:51Z"
    message: The resources cannot be updated to the latest because of the rollout
      strategy
    observedGeneration: 2
    reason: RolloutNotStartedYet
    status: "False"
    type: RolloutStarted
  - lastTransitionTime: "2024-05-07T23:08:53Z"
    message: No override rules are configured for the selected resources
    observedGeneration: 2
    reason: NoOverrideSpecified
    status: "True"
    type: Overridden
  - lastTransitionTime: "2024-05-07T23:08:53Z"
    message: All of the works are synchronized to the latest
    observedGeneration: 2
    reason: AllWorkSynced
    status: "True"
    type: WorkSynchronized
  - lastTransitionTime: "2024-05-07T23:08:53Z"
    message: All corresponding work objects are applied
    observedGeneration: 2
    reason: AllWorkHaveBeenApplied
    status: "True"
    type: Applied
  - lastTransitionTime: "2024-05-07T23:08:53Z"
    message: All corresponding work objects are available
    observedGeneration: 2
    reason: AllWorkAreAvailable
    status: "True"
    type: Available
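The ClusterResourceBinding remains unchanged. In the ClusterResourceBinding spec, resourceSnapshotName still references the old ClusterResourceSnapshot name (crp-3-0-snapshot instead of crp-3-1-snapshot). This issue occurs when the user provides no explicit RollingUpdate input, because the default values are applied:
- The maxUnavailable value is configured as 25% × 3 (the desired number), which is rounded to 1.
- The maxSurge value is configured as 25% × 3 (the desired number), which is rounded to 1.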
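The arithmetic behind these defaults can be checked with a short calculation. This is only a sketch: the function name is hypothetical, and rounding up with math.ceil is an assumption chosen because it reproduces the value of 1 stated above (25% of 3 is 0.75).

```python
import math


def scaled_value(percent, desired):
    """Convert a percentage of the desired cluster count to an absolute number.

    Assumption: the result is rounded up, which matches the value of 1
    that this article reports for 25% of 3.
    """
    return math.ceil(desired * percent / 100)


# 25% of the 3 desired clusters is 0.75, which rounds up to 1.
print(scaled_value(25, 3))  # 1
```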
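Why the ClusterResourceBinding isn't updated
Initially, when the ClusterResourcePlacement was created, two ClusterResourceBindings were generated. Because the rollout strategy doesn't apply to this initial phase, the ClusterResourcePlacementRolloutStarted condition was set to True.
When the test-ns namespace was created on the hub cluster, the rollout controller tried to update the two existing ClusterResourceBindings. However, because maxUnavailable is 1 and one of the three requested member clusters is missing, the RollingUpdate configuration is too strict to allow the update.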
Note
During an update, a binding that fails to apply also violates the RollingUpdate configuration when maxUnavailable is set to 1.
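To confirm which bindings are stuck on an older snapshot, the resourceSnapshotName of each ClusterResourceBinding can be compared with the name of the latest ClusterResourceSnapshot. The following is a minimal sketch that works on already-parsed objects. The function name is hypothetical; the field names and sample values are taken from the outputs shown in this article.

```python
def stale_bindings(bindings, latest_snapshot_name):
    """Return names of bindings whose spec doesn't reference the latest snapshot."""
    return [
        binding["metadata"]["name"]
        for binding in bindings
        if binding["spec"].get("resourceSnapshotName") != latest_snapshot_name
    ]


# Trimmed version of the ClusterResourceBinding for kind-cluster-1.
bindings = [
    {
        "metadata": {"name": "crp-3-kind-cluster-1-7114c253"},
        "spec": {
            "resourceSnapshotName": "crp-3-0-snapshot",
            "targetCluster": "kind-cluster-1",
        },
    }
]

# The latest snapshot is crp-3-1-snapshot, so the binding is reported as stale.
print(stale_bindings(bindings, "crp-3-1-snapshot"))
# ['crp-3-kind-cluster-1-7114c253']
```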
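Resolution
To resolve this issue, consider manually setting maxUnavailable to a value that's greater than 1 to relax the RollingUpdate configuration. Alternatively, you can join a third member cluster.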
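The effect of both options can be illustrated with a simplified model of the rollout budget. This is an assumption made for illustration, not the rollout controller's actual code: it supposes that the number of available bindings must stay at or above the target count minus maxUnavailable, so only the bindings above that floor can be updated at one time. The function and parameter names are hypothetical.

```python
def updatable_now(target, available, max_unavailable):
    """Number of bindings that can be updated at once in this simplified model.

    Assumption: at least (target - max_unavailable) bindings must remain
    available while an update is in progress.
    """
    floor = target - max_unavailable
    return max(0, available - floor)


# Case study: 3 clusters requested, 2 joined, default maxUnavailable of 1.
print(updatable_now(target=3, available=2, max_unavailable=1))  # 0 (blocked)

# Option 1: relax maxUnavailable to 2.
print(updatable_now(target=3, available=2, max_unavailable=2))  # 1

# Option 2: join a third member cluster.
print(updatable_now(target=3, available=3, max_unavailable=1))  # 1
```

Under this model, the case study configuration leaves no room to take either binding out of service, while either resolution frees one binding for update at a time.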
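Contact us for help
If you have questions or need help, create a support request, or ask Azure community support. You can also submit product feedback to the Azure feedback community.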