openshift bare-metal-server

Newly installed OKD 4 cluster getting machine-config errors


I have installed the latest version of OKD 4 on a 5-node cluster with 3 control-plane nodes and 2 compute nodes.

When running oc get co, I see the following error message for the machine-config operator:

NAME                                       VERSION                          AVAILABLE   PROGRESSING   DEGRADED   SINCE   MESSAGE
authentication                             4.10.0-0.okd-2022-07-09-073606   True        False         False      7h14m   
baremetal                                  4.10.0-0.okd-2022-07-09-073606   True        False         False      15h     
cloud-controller-manager                   4.10.0-0.okd-2022-07-09-073606   True        False         False      15h     
cloud-credential                           4.10.0-0.okd-2022-07-09-073606   True        False         False      15h     
cluster-autoscaler                         4.10.0-0.okd-2022-07-09-073606   True        False         False      15h     
config-operator                            4.10.0-0.okd-2022-07-09-073606   True        False         False      15h     
console                                    4.10.0-0.okd-2022-07-09-073606   True        False         False      7h14m   
csi-snapshot-controller                    4.10.0-0.okd-2022-07-09-073606   True        False         False      13h     
dns                                        4.10.0-0.okd-2022-07-09-073606   True        False         False      13h     
etcd                                       4.10.0-0.okd-2022-07-09-073606   True        False         False      13h     
image-registry                             4.10.0-0.okd-2022-07-09-073606   True        False         False      3h1m    
ingress                                    4.10.0-0.okd-2022-07-09-073606   True        False         False      8h      
insights                                   4.10.0-0.okd-2022-07-09-073606   True        False         False      14h     
kube-apiserver                             4.10.0-0.okd-2022-07-09-073606   True        False         False      13h     
kube-controller-manager                    4.10.0-0.okd-2022-07-09-073606   True        False         False      13h     
kube-scheduler                             4.10.0-0.okd-2022-07-09-073606   True        False         False      14h     
kube-storage-version-migrator              4.10.0-0.okd-2022-07-09-073606   True        False         False      13h     
machine-api                                4.10.0-0.okd-2022-07-09-073606   True        False         False      14h     
machine-approver                           4.10.0-0.okd-2022-07-09-073606   True        False         False      15h     
machine-config                                                              True        True          True       13h     Unable to apply 4.10.0-0.okd-2022-07-09-073606: timed out waiting for the condition during syncRequiredMachineConfigPools: error pool master is not ready, retrying. Status: (pool degraded: true total: 3, ready 0, updated: 0, unavailable: 3)
marketplace                                4.10.0-0.okd-2022-07-09-073606   True        False         False      15h     
monitoring                                 4.10.0-0.okd-2022-07-09-073606   True        False         False      8h      
network                                    4.10.0-0.okd-2022-07-09-073606   True        False         False      13h     
node-tuning                                4.10.0-0.okd-2022-07-09-073606   True        False         False      8h      
openshift-apiserver                        4.10.0-0.okd-2022-07-09-073606   True        False         False      13h     
openshift-controller-manager               4.10.0-0.okd-2022-07-09-073606   True        False         False      33m     
openshift-samples                          4.10.0-0.okd-2022-07-09-073606   True        False         False      13h     
operator-lifecycle-manager                 4.10.0-0.okd-2022-07-09-073606   True        False         False      14h     
operator-lifecycle-manager-catalog         4.10.0-0.okd-2022-07-09-073606   True        False         False      14h     
operator-lifecycle-manager-packageserver   4.10.0-0.okd-2022-07-09-073606   True        False         False      13h     
service-ca                                 4.10.0-0.okd-2022-07-09-073606   True        False         False      15h     
storage                                    4.10.0-0.okd-2022-07-09-073606   True        False         False      15h 

When running oc get mcp, I get:

oc get mcp
NAME     CONFIG                                             UPDATED   UPDATING   DEGRADED   MACHINECOUNT   READYMACHINECOUNT   UPDATEDMACHINECOUNT   DEGRADEDMACHINECOUNT   AGE
master                                                      False     True       True       3              0                   0                     3                      15h
worker   rendered-worker-04b4cdd431c21b96c1f98ca595ded448   True      False      False      2              2                   2                     0                      15h

When I describe the degraded machine config pool, I see the following:

oc describe mcp master
Name:         master
Namespace:    
Labels:       machineconfiguration.openshift.io/mco-built-in=
              operator.machineconfiguration.openshift.io/required-for-upgrade=
              pools.operator.machineconfiguration.openshift.io/master=
Annotations:  <none>
API Version:  machineconfiguration.openshift.io/v1
Kind:         MachineConfigPool
Metadata:
  Creation Timestamp:  2022-07-24T03:25:28Z
  Generation:          2
  Managed Fields:
    API Version:  machineconfiguration.openshift.io/v1
    Fields Type:  FieldsV1
    fieldsV1:
      f:metadata:
        f:labels:
          .:
          f:machineconfiguration.openshift.io/mco-built-in:
          f:operator.machineconfiguration.openshift.io/required-for-upgrade:
          f:pools.operator.machineconfiguration.openshift.io/master:
      f:spec:
        .:
        f:configuration:
        f:machineConfigSelector:
          .:
          f:matchLabels:
            .:
            f:machineconfiguration.openshift.io/role:
        f:nodeSelector:
          .:
          f:matchLabels:
            .:
            f:node-role.kubernetes.io/master:
        f:paused:
    Manager:      machine-config-operator
    Operation:    Update
    Time:         2022-07-24T03:25:28Z
    API Version:  machineconfiguration.openshift.io/v1
    Fields Type:  FieldsV1
    fieldsV1:
      f:spec:
        f:configuration:
          f:name:
          f:source:
    Manager:      machine-config-controller
    Operation:    Update
    Time:         2022-07-24T05:05:35Z
    API Version:  machineconfiguration.openshift.io/v1
    Fields Type:  FieldsV1
    fieldsV1:
      f:status:
        .:
        f:conditions:
        f:configuration:
        f:degradedMachineCount:
        f:machineCount:
        f:observedGeneration:
        f:readyMachineCount:
        f:unavailableMachineCount:
        f:updatedMachineCount:
    Manager:         machine-config-controller
    Operation:       Update
    Subresource:     status
    Time:            2022-07-24T05:05:40Z
  Resource Version:  41348
  UID:               6eea1467-dfd1-4e25-a0a5-a303d21c4076
Spec:
  Configuration:
    Name:  rendered-master-5ac7b1a497e20b76e47aaf715bc0dc6f
    Source:
      API Version:  machineconfiguration.openshift.io/v1
      Kind:         MachineConfig
      Name:         00-master
      API Version:  machineconfiguration.openshift.io/v1
      Kind:         MachineConfig
      Name:         01-master-container-runtime
      API Version:  machineconfiguration.openshift.io/v1
      Kind:         MachineConfig
      Name:         01-master-kubelet
      API Version:  machineconfiguration.openshift.io/v1
      Kind:         MachineConfig
      Name:         99-master-generated-crio-seccomp-use-default
      API Version:  machineconfiguration.openshift.io/v1
      Kind:         MachineConfig
      Name:         99-master-generated-registries
      API Version:  machineconfiguration.openshift.io/v1
      Kind:         MachineConfig
      Name:         99-master-okd-extensions
      API Version:  machineconfiguration.openshift.io/v1
      Kind:         MachineConfig
      Name:         99-master-ssh
      API Version:  machineconfiguration.openshift.io/v1
      Kind:         MachineConfig
      Name:         99-okd-master-disable-mitigations
  Machine Config Selector:
    Match Labels:
      machineconfiguration.openshift.io/role:  master
  Node Selector:
    Match Labels:
      node-role.kubernetes.io/master:  
  Paused:                              false
Status:
  Conditions:
    Last Transition Time:  2022-07-24T05:05:36Z
    Message:               
    Reason:                
    Status:                False
    Type:                  RenderDegraded
    Last Transition Time:  2022-07-24T05:05:40Z
    Message:               
    Reason:                
    Status:                False
    Type:                  Updated
    Last Transition Time:  2022-07-24T05:05:40Z
    Message:               All nodes are updating to rendered-master-5ac7b1a497e20b76e47aaf715bc0dc6f
    Reason:                
    Status:                True
    Type:                  Updating
    Last Transition Time:  2022-07-24T05:05:40Z
    Message:               
    Reason:                
    Status:                True
    Type:                  Degraded
    Last Transition Time:  2022-07-24T05:05:40Z
    Message:               Node okd4-control-plane-1 is reporting: "machineconfig.machineconfiguration.openshift.io \"rendered-master-d06288fa8a499313709afdb2c727de31\" not found", Node okd4-control-plane-2 is reporting: "machineconfig.machineconfiguration.openshift.io \"rendered-master-d06288fa8a499313709afdb2c727de31\" not found", Node okd4-control-plane-3 is reporting: "machineconfig.machineconfiguration.openshift.io \"rendered-master-d06288fa8a499313709afdb2c727de31\" not found"
    Reason:                3 nodes are reporting degraded status on sync
    Status:                True
    Type:                  NodeDegraded
  Configuration:
  Degraded Machine Count:     3
  Machine Count:              3
  Observed Generation:        2
  Ready Machine Count:        0
  Unavailable Machine Count:  3
  Updated Machine Count:      0
Events:                       <none>

Any suggestions on how to solve this?


Solution

  • I fixed it by deleting the master machine config pool (mcp). Deleting it triggered its recreation, after which the errors cleared.

    oc delete mcp master
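
Before deleting the pool, it may be worth confirming that the rendered config named in the NodeDegraded message really is absent from the cluster. The following is a minimal sketch, not a definitive procedure: it assumes oc is logged in with cluster-admin rights, and the function names are made up for illustration. It only defines helpers; nothing runs against the cluster until you call them.

```shell
#!/usr/bin/env bash
# Sketch only. Assumes `oc` is logged in with cluster-admin rights.
# The function names below are illustrative, not part of any tool.

# Read a NodeDegraded message on stdin and print the unique rendered
# MachineConfig names it mentions.
configs_named_in_message() {
  grep -oE 'rendered-[a-z]+-[0-9a-f]+' | sort -u
}

# Succeed if the named MachineConfig object exists on the cluster.
machineconfig_exists() {
  oc get machineconfig "$1" >/dev/null 2>&1 </dev/null
}

# Report which rendered configs the master nodes complain about and
# whether each one is actually missing.
report_missing_master_configs() {
  local name
  oc get mcp master \
    -o jsonpath='{.status.conditions[?(@.type=="NodeDegraded")].message}' |
    configs_named_in_message |
    while read -r name; do
      if machineconfig_exists "$name"; then
        echo "present: $name"
      else
        echo "missing: $name"
      fi
    done
}

# Usage, run manually:
#   report_missing_master_configs
#   oc delete mcp master     # the fix described in this answer
#   oc get mcp master -w     # watch until UPDATED is True and DEGRADED is False
```

With the output from the question, the helper would be expected to flag rendered-master-d06288fa8a499313709afdb2c727de31, the config the nodes report as not found, while the pool itself points at rendered-master-5ac7b1a497e20b76e47aaf715bc0dc6f. After the delete, oc get co should eventually show machine-config with DEGRADED set to False.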