Azure Red Hat OpenShift 4.0 support policy

Article
09/30/2024

Certain configurations for Azure Red Hat OpenShift 4 clusters can affect your cluster's supportability. Azure Red Hat OpenShift 4 allows cluster administrators to make changes to internal cluster components, but not all changes are supported. This support policy describes which modifications violate the policy and void support from Microsoft and Red Hat.

Note

Features marked Technology Preview in OpenShift Container Platform are not supported in Azure Red Hat OpenShift.

Cluster configuration requirements

Compute

The cluster must have a minimum of three worker nodes and three master nodes.
Don't scale the cluster workers to zero, or attempt a cluster shutdown. Deallocating or powering down any virtual machine in the cluster resource group isn't supported.
Don't create more than 250 worker nodes on a cluster. 250 is the maximum number of nodes that can be created on a cluster. See Configure multiple IP addresses per ARO cluster load balancer for more information.
If you're using infrastructure nodes, don't run any undesignated workloads on them, as this can affect the Service Level Agreement and cluster stability. It's also recommended to have three infrastructure nodes, one in each availability zone. See Deploy infrastructure nodes in an Azure Red Hat OpenShift (ARO) cluster for more information.
Non-RHCOS compute nodes aren't supported. For example, you can't use an RHEL compute node.
Don't attempt to remove, replace, add, or modify a master node. Doing so is a high-risk operation that can cause issues with etcd, permanent network loss, and loss of access and manageability by ARO SRE. If you feel that a master node should be replaced or removed, contact support before making any changes.
Ensure ample VM quota is available in case control plane nodes need to be scaled up by keeping at least double your current control plane vCPU count available.
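
The numeric compute requirements above can be expressed as a simple pre-flight check. The following is a minimal sketch, not an official tool: the `ClusterShape` type, its field names, and the `check_compute` function are hypothetical names introduced for illustration, and the values are assumed to be gathered by you from your own cluster and subscription.

```python
from dataclasses import dataclass

# Limits taken from the compute requirements listed above.
MIN_WORKERS = 3
MIN_MASTERS = 3
MAX_WORKERS = 250


@dataclass
class ClusterShape:
    """Hypothetical summary of a cluster, filled in by the caller."""
    worker_count: int
    master_count: int
    control_plane_vcpus: int    # total vCPUs across all control plane nodes
    available_vcpu_quota: int   # unused vCPU quota in the subscription/region


def check_compute(shape: ClusterShape) -> list[str]:
    """Return a list of human-readable policy problems (empty if none)."""
    problems = []
    if shape.worker_count < MIN_WORKERS:
        problems.append(f"fewer than {MIN_WORKERS} worker nodes")
    if shape.worker_count > MAX_WORKERS:
        problems.append(f"more than {MAX_WORKERS} worker nodes")
    if shape.master_count < MIN_MASTERS:
        problems.append(f"fewer than {MIN_MASTERS} master nodes")
    # Keep at least double the current control plane vCPU count available.
    if shape.available_vcpu_quota < 2 * shape.control_plane_vcpus:
        problems.append("available vCPU quota is less than double the control plane vCPU count")
    return problems
```

For example, a cluster with three Standard_D8s_v3 control plane nodes has 24 control plane vCPUs, so this sketch expects at least 48 vCPUs of unused quota.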

Operators

All OpenShift Cluster operators must remain in a managed state. The list of cluster operators can be returned by running oc get clusteroperators.
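
One way a cluster operator can end up unmanaged is through an override on the ClusterVersion resource with `unmanaged` set to true. The sketch below assumes that this is one of the conditions the policy guards against, and that you have saved the JSON for the ClusterVersion resource (for example from `oc get clusterversion version -o json`) and loaded it into a dictionary. The function name and the sample resource names are hypothetical.

```python
import json


def unmanaged_overrides(cluster_version: dict) -> list[str]:
    """List overrides in a ClusterVersion document that are marked unmanaged."""
    overrides = cluster_version.get("spec", {}).get("overrides") or []
    return [
        f'{o.get("kind")}/{o.get("namespace")}/{o.get("name")}'
        for o in overrides
        if o.get("unmanaged")
    ]


# Hypothetical sample input; the names below are made up for illustration.
sample = json.loads("""
{
  "kind": "ClusterVersion",
  "spec": {
    "overrides": [
      {"kind": "Deployment", "group": "apps",
       "namespace": "openshift-example", "name": "example-operator",
       "unmanaged": true}
    ]
  }
}
""")

if __name__ == "__main__":
    for entry in unmanaged_overrides(sample):
        print("unmanaged override:", entry)
```

An empty result from this check doesn't prove that every operator is managed; it only rules out this one mechanism.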

Workload management

Don't add taints that would prevent any default OpenShift components from being scheduled.
To avoid disruption resulting from cluster maintenance, in-cluster workloads should be configured with high availability practices, including but not limited to pod affinity and anti-affinity, pod disruption budgets, and adequate scaling.
Don't run extra workloads on the control plane nodes. Although workloads can be scheduled on the control plane nodes, doing so causes extra resource usage and stability issues that can affect the entire cluster.
Running custom workloads (including operators installed from Operator Hub or other operators provided by Red Hat) on infrastructure nodes isn't supported.
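
As an illustration of one of the high availability practices mentioned above, the following sketch builds a pod disruption budget manifest as a Python dictionary and prints it as JSON, which can then be applied with `oc apply -f`. The helper name, the `app` label key, and the example values are assumptions chosen for illustration; adjust them to match your own workload.

```python
import json


def pod_disruption_budget(name: str, namespace: str, app_label: str,
                          min_available: int) -> dict:
    """Build a policy/v1 PodDisruptionBudget manifest for pods labeled app=<app_label>."""
    return {
        "apiVersion": "policy/v1",
        "kind": "PodDisruptionBudget",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            # Minimum number of matching pods that must stay available
            # during voluntary disruptions such as node drains.
            "minAvailable": min_available,
            "selector": {"matchLabels": {"app": app_label}},
        },
    }


if __name__ == "__main__":
    # Hypothetical workload: keep at least 2 "web" pods running during maintenance.
    print(json.dumps(pod_disruption_budget("web-pdb", "my-app", "web", 2), indent=2))
```

A budget like this only helps if the workload runs more replicas than `minAvailable`; otherwise node drains during maintenance can be blocked.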

Logging and monitoring

Don't remove or modify the default cluster Prometheus service, except to modify scheduling of the default Prometheus instance.
Don't remove or modify the default cluster Alertmanager service, default receiver, or any default alerting rules, except to add other receivers to notify external systems.
Don't remove or modify Azure Red Hat OpenShift service logging (mdsd pods).

Network and security

Unless you're using your own Network Security Group through the "bring your own" Network Security Group feature, the ARO-provided Network Security Group can't be modified or replaced. Any attempt to modify or replace it will be reverted.
All cluster virtual machines must have direct outbound internet access, at least to the Azure Resource Manager (ARM) and service logging (Geneva) endpoints. No form of HTTPS proxying is supported.
The Azure Red Hat OpenShift service accesses your cluster via Private Link Service. Don't remove or modify service access.
Migrating from OpenShift SDN to OVN isn't supported.

Cluster management

Don't remove or modify the 'arosvc.azurecr.io' cluster pull secret.
Don't create new MachineConfig objects or modify existing ones, unless explicitly supported in the Azure Red Hat OpenShift documentation.
Don't create new KubeletConfig objects or modify existing ones, unless explicitly supported in the Azure Red Hat OpenShift documentation.
Don't set any unsupportedConfigOverrides options. Setting these options prevents minor version upgrades.
Don't place policies within your subscription or management group that prevent SREs from performing normal maintenance against the Azure Red Hat OpenShift cluster. For example, don't require tags on the Azure Red Hat OpenShift RP-managed cluster resource group.
Don't circumvent the deny assignment that is configured as part of the service, or perform administrative tasks normally prohibited by the deny assignment.
OpenShift relies on the ability to automatically tag Azure resources. If you have configured a tagging policy, don't apply more than 10 user-defined tags to resources in the managed resource group.

Incident management

An incident is an event that results in a degradation or outage of Azure Red Hat OpenShift services. Incidents are raised by a customer or Customer Experience and Engagement (CEE) member through a support case, directly by the centralized monitoring and alerting system, or directly by a member of the ARO Site Reliability Engineer (SRE) team.

Depending on the impact on the service and customer, the incident is categorized in terms of severity.

A new incident is generally managed as follows:

An SRE first responder is alerted to a new incident and begins an initial investigation.
After the initial investigation, the incident is assigned an incident lead, who coordinates the recovery efforts.
The incident lead manages all communication and coordination around recovery, including any relevant notifications or support case updates.
The incident is recovered.
The incident is documented and a root cause analysis (RCA) is performed within 5 business days of the incident.
An RCA draft document is shared with the customer within 7 business days of the incident.
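
The two RCA timelines above can be worked out with simple business-day arithmetic. This is a minimal sketch under stated assumptions: business days are taken to be Monday through Friday, public holidays are ignored, and counting starts on the day after the incident. The policy text doesn't define any of these details, and the function names are hypothetical.

```python
from datetime import date, timedelta


def add_business_days(start: date, days: int) -> date:
    """Return the date `days` business days after `start` (Mon-Fri, no holidays)."""
    current = start
    remaining = days
    while remaining > 0:
        current += timedelta(days=1)
        if current.weekday() < 5:  # 0-4 are Monday through Friday
            remaining -= 1
    return current


def rca_deadlines(incident_date: date) -> dict[str, date]:
    """Compute the latest dates for the RCA and for sharing the RCA draft."""
    return {
        "rca_performed_by": add_business_days(incident_date, 5),
        "rca_draft_shared_by": add_business_days(incident_date, 7),
    }


if __name__ == "__main__":
    for label, deadline in rca_deadlines(date(2024, 9, 30)).items():
        print(label, deadline.isoformat())
```

Under these assumptions, an incident on Monday, September 30, 2024 gives an RCA date of October 7 and a draft-sharing date of October 9.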

Supported virtual machine sizes

Azure Red Hat OpenShift 4 supports node instances on the following virtual machine sizes:

Control plane nodes

Series	Size	vCPU	Memory: GiB
Dsv3	Standard_D8s_v3	8	32
Dsv3	Standard_D16s_v3	16	64
Dsv3	Standard_D32s_v3	32	128
Dsv4	Standard_D8s_v4	8	32
Dsv4	Standard_D16s_v4	16	64
Dsv4	Standard_D32s_v4	32	128
Dsv5	Standard_D8s_v5	8	32
Dsv5	Standard_D16s_v5	16	64
Dsv5	Standard_D32s_v5	32	128
Dasv4	Standard_D8as_v4	8	32
Dasv4	Standard_D16as_v4	16	64
Dasv4	Standard_D32as_v4	32	128
Dasv5	Standard_D8as_v5	8	32
Dasv5	Standard_D16as_v5	16	64
Dasv5	Standard_D32as_v5	32	128
Easv4	Standard_E8as_v4	8	64
Easv4	Standard_E16as_v4	16	128
Easv4	Standard_E20as_v4	20	160
Easv4	Standard_E32as_v4	32	256
Easv4	Standard_E48as_v4	48	384
Easv4	Standard_E64as_v4	64	512
Easv4	Standard_E96as_v4	96	672
Easv5	Standard_E8as_v5	8	64
Easv5	Standard_E16as_v5	16	128
Easv5	Standard_E20as_v5	20	160
Easv5	Standard_E32as_v5	32	256
Easv5	Standard_E48as_v5	48	384
Easv5	Standard_E64as_v5	64	512
Easv5	Standard_E96as_v5	96	672
Eisv3	Standard_E64is_v3	64	432
Eisv4	Standard_E80is_v4	80	504
Eidsv4	Standard_E80ids_v4	80	504
Eisv5	Standard_E104is_v5	104	672
Eidsv5	Standard_E104ids_v5	104	672
Esv4	Standard_E8s_v4	8	64
Esv4	Standard_E16s_v4	16	128
Esv4	Standard_E20s_v4	20	160
Esv4	Standard_E32s_v4	32	256
Esv4	Standard_E48s_v4	48	384
Esv4	Standard_E64s_v4	64	504
Esv5	Standard_E8s_v5	8	64
Esv5	Standard_E16s_v5	16	128
Esv5	Standard_E20s_v5	20	160
Esv5	Standard_E32s_v5	32	256
Esv5	Standard_E48s_v5	48	384
Esv5	Standard_E64s_v5	64	512
Esv5	Standard_E96s_v5	96	672
Fsv2	Standard_F72s_v2	72	144
Mms*	Standard_M128ms	128	3892

*Standard_M128ms doesn't support encryption at host.

Worker nodes

General purpose

Series	Size	vCPU	Memory: GiB
Dasv4	Standard_D4as_v4	4	16
Dasv4	Standard_D8as_v4	8	32
Dasv4	Standard_D16as_v4	16	64
Dasv4	Standard_D32as_v4	32	128
Dasv4	Standard_D64as_v4	64	256
Dasv4	Standard_D96as_v4	96	384
Dasv5	Standard_D4as_v5	4	16
Dasv5	Standard_D8as_v5	8	32
Dasv5	Standard_D16as_v5	16	64
Dasv5	Standard_D32as_v5	32	128
Dasv5	Standard_D64as_v5	64	256
Dasv5	Standard_D96as_v5	96	384
Dsv3	Standard_D4s_v3	4	16
Dsv3	Standard_D8s_v3	8	32
Dsv3	Standard_D16s_v3	16	64
Dsv3	Standard_D32s_v3	32	128
Dsv4	Standard_D4s_v4	4	16
Dsv4	Standard_D8s_v4	8	32
Dsv4	Standard_D16s_v4	16	64
Dsv4	Standard_D32s_v4	32	128
Dsv4	Standard_D64s_v4	64	256
Dsv5	Standard_D4s_v5	4	16
Dsv5	Standard_D8s_v5	8	32
Dsv5	Standard_D16s_v5	16	64
Dsv5	Standard_D32s_v5	32	128
Dsv5	Standard_D64s_v5	64	256
Dsv5	Standard_D96s_v5	96	384

Memory optimized

Series	Size	vCPU	Memory: GiB
Easv4	Standard_E4as_v4	4	32
Easv4	Standard_E8as_v4	8	64
Easv4	Standard_E16as_v4	16	128
Easv4	Standard_E20as_v4	20	160
Easv4	Standard_E32as_v4	32	256
Easv4	Standard_E48as_v4	48	384
Easv4	Standard_E64as_v4	64	512
Easv4	Standard_E96as_v4	96	672
Easv5	Standard_E8as_v5	8	64
Easv5	Standard_E16as_v5	16	128
Easv5	Standard_E20as_v5	20	160
Easv5	Standard_E32as_v5	32	256
Easv5	Standard_E48as_v5	48	384
Easv5	Standard_E64as_v5	64	512
Easv5	Standard_E96as_v5	96	672
Esv3	Standard_E4s_v3	4	32
Esv3	Standard_E8s_v3	8	64
Esv3	Standard_E16s_v3	16	128
Esv3	Standard_E32s_v3	32	256
Esv4	Standard_E4s_v4	4	32
Esv4	Standard_E8s_v4	8	64
Esv4	Standard_E16s_v4	16	128
Esv4	Standard_E20s_v4	20	160
Esv4	Standard_E32s_v4	32	256
Esv4	Standard_E48s_v4	48	384
Esv4	Standard_E64s_v4	64	504
Esv5	Standard_E4s_v5	4	32
Esv5	Standard_E8s_v5	8	64
Esv5	Standard_E16s_v5	16	128
Esv5	Standard_E20s_v5	20	160
Esv5	Standard_E32s_v5	32	256
Esv5	Standard_E48s_v5	48	384
Esv5	Standard_E64s_v5	64	512
Esv5	Standard_E96s_v5	96	672
Edsv5	Standard_E96ds_v5	96	672
Eisv3	Standard_E64is_v3	64	432
Eisv4	Standard_E80is_v4	80	504
Eidsv4	Standard_E80ids_v4	80	504
Eisv5	Standard_E104is_v5	104	672
Eidsv5	Standard_E104ids_v5	104	672

Compute optimized

Series	Size	vCPU	Memory: GiB
Fsv2	Standard_F4s_v2	4	8
Fsv2	Standard_F8s_v2	8	16
Fsv2	Standard_F16s_v2	16	32
Fsv2	Standard_F32s_v2	32	64
Fsv2	Standard_F72s_v2	72	144

Memory and compute optimized

Series	Size	vCPU	Memory: GiB
Mms*	Standard_M128ms	128	3892