Rook does not create any mgr, osd, or other daemons

乡下的树 · May 13, 2022

Problem:

Rook was uninstalled cleanly and then reinstalled.
After running kubectl apply -f cluster.yaml, the mgr/osd/crashcollector pods were never created.
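For context, the reinstall presumably followed the standard Rook quickstart manifests from the deploy/examples directory; only cluster.yaml is confirmed above, the rest of the sequence is an assumption:

kubectl apply -f crds.yaml -f common.yaml -f operator.yaml
kubectl apply -f cluster.yaml
kubectl get pod -n rook-ceph -w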

The state was as follows:

[root@kubeadm-master examples]# kubectl get pod -n rook-ceph
NAME                                                       READY   STATUS             RESTARTS   AGE
csi-cephfsplugin-bsnks                                     3/3     Running            0          42m
csi-cephfsplugin-hvhhv                                     3/3     Running            0          42m
csi-cephfsplugin-nhclv                                     3/3     Running            0          42m
csi-cephfsplugin-provisioner-6f46df5457-ftgck              6/6     Running            17         42m
csi-cephfsplugin-provisioner-6f46df5457-j2chc              6/6     Running            11         42m
csi-rbdplugin-4dbgm                                        3/3     Running            0          42m
csi-rbdplugin-8xb28                                        3/3     Running            0          42m
csi-rbdplugin-provisioner-679495696f-9tjff                 6/6     Running            4          42m
csi-rbdplugin-provisioner-679495696f-dq7cw                 6/6     Running            11         42m
csi-rbdplugin-vl2xp                                        3/3     Running            0          42m
rook-ceph-mon-a-6b5f64864-vbbq7                            1/1     Running            0          42m
rook-ceph-mon-b-7cc87fd477-9pprq                           1/1     Running            0          41m
rook-ceph-mon-c-6cf888779b-sw4kj                           1/1     Running            2          41m
rook-ceph-operator-7b4f6fd594-xbgmc                        1/1     Running            0          43m

A healthy deployment has the following components:

NAME                                                      READY   STATUS      RESTARTS   AGE
csi-cephfsplugin-27zxv                                    3/3     Running     0          15m
csi-cephfsplugin-bp9sl                                    3/3     Running     0          15m
csi-cephfsplugin-lhm7v                                    3/3     Running     0          15m
csi-cephfsplugin-provisioner-6f46df5457-cbs25             6/6     Running     0          15m
csi-cephfsplugin-provisioner-6f46df5457-t9c6d             6/6     Running     0          15m
csi-rbdplugin-b6sch                                       3/3     Running     0          15m
csi-rbdplugin-provisioner-679495696f-j5bbm                6/6     Running     0          15m
csi-rbdplugin-provisioner-679495696f-nwj55                6/6     Running     0          15m
csi-rbdplugin-qdksn                                       3/3     Running     0          15m
csi-rbdplugin-slb26                                       3/3     Running     0          15m
rook-ceph-crashcollector-kubeadm-master-66f5f8669-v7wvp   1/1     Running     0          13m
rook-ceph-crashcollector-kubeadm-node1-78cc4455db-nnwlt   1/1     Running     0          13m
rook-ceph-crashcollector-kubeadm-node2-ffdd84f87-lqbw7    1/1     Running     0          13m
rook-ceph-mgr-a-8468845986-7s8hm                          2/2     Running     0          14m
rook-ceph-mgr-b-6658c6746d-gb8qj                          2/2     Running     0          14m
rook-ceph-mon-a-59c465f888-gj4fk                          1/1     Running     0          15m
rook-ceph-mon-b-6b6df8ff7-mhlt2                           1/1     Running     0          14m
rook-ceph-mon-c-5976d5bff8-jn8pj                          1/1     Running     0          14m
rook-ceph-operator-7b948fb4b9-7xb46                       1/1     Running     0          16m
rook-ceph-osd-0-598c64b764-b2t8g                          1/1     Running     0          13m
rook-ceph-osd-1-685d6dfb69-qxxbn                          1/1     Running     0          13m
rook-ceph-osd-2-5f58b485f5-mp2j5                          1/1     Running     0          13m
rook-ceph-osd-3-57b547fdbd-8m2l9                          1/1     Running     0          13m
rook-ceph-osd-4-59fbff7f6-7xvbr                           1/1     Running     0          13m
rook-ceph-osd-5-f784b6747-ztlnt                           1/1     Running     0          13m
rook-ceph-osd-6-5957ddd669-k6js6                          1/1     Running     0          13m
rook-ceph-osd-7-7c5c685f4b-9csv4                          1/1     Running     0          13m
rook-ceph-osd-8-54dfd587dd-lkccb                          1/1     Running     0          13m
rook-ceph-osd-prepare-kubeadm-master-fcs8b                0/1     Completed   0          13m
rook-ceph-osd-prepare-kubeadm-node1-qb4dw                 0/1     Completed   0          13m
rook-ceph-osd-prepare-kubeadm-node2-p4gk4                 0/1     Completed   0          13m

Checking the operator logs:
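The log below can be pulled from the operator deployment with the usual kubectl command:

kubectl -n rook-ceph logs deploy/rook-ceph-operator -f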

There is no error that points directly at the missing daemons; the operator keeps retrying the reconcile while waiting for the mons to reach quorum:

2022-04-28 07:27:00.979471 E | ceph-cluster-controller: failed to reconcile CephCluster "rook-ceph/rook-ceph". failed to reconcile cluster "rook-ceph": failed to configure local ceph cluster: failed to create cluster: failed to start ceph monitors: failed to start mon pods: failed to check mon quorum b: failed to wait for mon quorum: exceeded max retry count waiting for monitors to reach quorum
2022-04-28 07:27:00.979534 I | ceph-cluster-controller: reconciling ceph cluster in namespace "rook-ceph"
2022-04-28 07:27:00.985564 I | op-mon: parsing mon endpoints: b=192.168.144.210:6789,c=192.168.56.52:6789,a=192.168.248.204:6789
2022-04-28 07:27:01.006742 I | ceph-spec: detecting the ceph image version for image quay.io/ceph/ceph:v16.2.7...
2022-04-28 07:27:02.462914 I | ceph-spec: detected ceph image version: "16.2.7-0 pacific"
2022-04-28 07:27:02.462954 I | ceph-cluster-controller: validating ceph version from provided image
2022-04-28 07:27:02.465954 I | op-mon: parsing mon endpoints: b=192.168.144.210:6789,c=192.168.56.52:6789,a=192.168.248.204:6789
2022-04-28 07:27:02.467929 I | cephclient: writing config file /var/lib/rook/rook-ceph/rook-ceph.config
2022-04-28 07:27:02.468077 I | cephclient: generated admin config in /var/lib/rook/rook-ceph
2022-04-28 07:27:17.547979 E | ceph-cluster-controller: failed to get ceph daemons versions, this typically happens during the first cluster initialization. failed to run 'ceph versions'. . timed out: exit status 1
2022-04-28 07:27:17.548007 I | ceph-cluster-controller: cluster "rook-ceph": version "16.2.7-0 pacific" detected for image "quay.io/ceph/ceph:v16.2.7"
2022-04-28 07:27:17.583608 I | op-mon: start running mons
2022-04-28 07:27:17.586670 I | op-mon: parsing mon endpoints: b=192.168.144.210:6789,c=192.168.56.52:6789,a=192.168.248.204:6789
2022-04-28 07:27:17.594786 I | op-mon: saved mon endpoints to config map map[csi-cluster-config-json:[{"clusterID":"rook-ceph","monitors":["192.168.144.210:6789","192.168.56.52:6789","192.168.248.204:6789"]}] data:b=192.168.144.210:6789,c=192.168.56.52:6789,a=192.168.248.204:6789 mapping:{"node":{"a":{"Name":"kubeadm-node2","Hostname":"kubeadm-node2","Address":"10.4.7.53"},"b":{"Name":"kubeadm-node1","Hostname":"kubeadm-node1","Address":"10.4.7.52"},"c":{"Name":"kubeadm-master","Hostname":"kubeadm-master","Address":"10.4.7.51"}}} maxMonId:2]
2022-04-28 07:27:17.840326 I | cephclient: writing config file /var/lib/rook/rook-ceph/rook-ceph.config
2022-04-28 07:27:17.840541 I | cephclient: generated admin config in /var/lib/rook/rook-ceph
2022-04-28 07:27:19.447544 I | op-mon: targeting the mon count 3
2022-04-28 07:27:19.457815 I | op-config: setting "global"="mon allow pool delete"="true" option to the mon configuration database
2022-04-28 07:27:34.458189 I | exec: timeout waiting for process ceph to return. Sending interrupt signal to the process
2022-04-28 07:27:34.459957 I | op-config: setting "global"="mon cluster log file"="" option to the mon configuration database
2022-04-28 07:27:49.461483 I | exec: timeout waiting for process ceph to return. Sending interrupt signal to the process
2022-04-28 07:27:49.463327 I | op-config: setting "global"="mon allow pool size one"="true" option to the mon configuration database
2022-04-28 07:28:04.465268 I | exec: timeout waiting for process ceph to return. Sending interrupt signal to the process
2022-04-28 07:28:04.467138 I | op-config: setting "global"="osd scrub auto repair"="true" option to the mon configuration database
2022-04-28 07:28:19.468157 I | exec: timeout waiting for process ceph to return. Sending interrupt signal to the process
2022-04-28 07:28:19.470010 W | op-mon: failed to set Rook and/or user-defined Ceph config options before starting mons; will retry after starting mons. failed to apply default Ceph configurations: failed to set one or more Ceph configs: failed to set ceph config in the centralized mon configuration database; you may need to use the rook-config-override ConfigMap. output: Cluster connection aborted: exit status 1: failed to set ceph config in the centralized mon configuration database; you may need to use the rook-config-override ConfigMap. output: Cluster connection aborted: exit status 1: failed to set ceph config in the centralized mon configuration database; you may need to use the rook-config-override ConfigMap. output: Cluster connection aborted: exit status 1: failed to set ceph config in the centralized mon configuration database; you may need to use the rook-config-override ConfigMap. output: Cluster connection aborted: exit status 1
2022-04-28 07:28:19.470029 I | op-mon: checking for basic quorum with existing mons
2022-04-28 07:28:19.501103 I | op-mon: mon "b" endpoint is [v2:192.168.144.210:3300,v1:192.168.144.210:6789]
2022-04-28 07:28:19.509047 I | op-mon: mon "c" endpoint is [v2:192.168.56.52:3300,v1:192.168.56.52:6789]
2022-04-28 07:28:19.875043 I | op-mon: mon "a" endpoint is [v2:192.168.248.204:3300,v1:192.168.248.204:6789]
2022-04-28 07:28:20.481182 I | op-mon: saved mon endpoints to config map map[csi-cluster-config-json:[{"clusterID":"rook-ceph","monitors":["192.168.248.204:6789","192.168.144.210:6789","192.168.56.52:6789"]}] data:b=192.168.144.210:6789,c=192.168.56.52:6789,a=192.168.248.204:6789 mapping:{"node":{"a":{"Name":"kubeadm-node2","Hostname":"kubeadm-node2","Address":"10.4.7.53"},"b":{"Name":"kubeadm-node1","Hostname":"kubeadm-node1","Address":"10.4.7.52"},"c":{"Name":"kubeadm-master","Hostname":"kubeadm-master","Address":"10.4.7.51"}}} maxMonId:2]
2022-04-28 07:28:21.073006 I | cephclient: writing config file /var/lib/rook/rook-ceph/rook-ceph.config
2022-04-28 07:28:21.073197 I | cephclient: generated admin config in /var/lib/rook/rook-ceph
2022-04-28 07:28:21.493428 I | op-mon: deployment for mon rook-ceph-mon-b already exists. updating if needed
2022-04-28 07:28:21.501327 I | op-k8sutil: deployment "rook-ceph-mon-b" did not change, nothing to update
2022-04-28 07:28:21.501357 I | op-mon: waiting for mon quorum with [b c a]
2022-04-28 07:28:22.075107 I | op-mon: mons running: [b c a]
2022-04-28 07:28:42.178852 I | op-mon: mons running: [b c a]
2022-04-28 07:29:02.281005 I | op-mon: mons running: [b c a]
2022-04-28 07:29:22.497241 I | op-mon: mons running: [b c a]
2022-04-28 07:29:42.611496 I | op-mon: mons running: [b c a]
2022-04-28 07:30:02.717940 I | op-mon: mons running: [b c a]

Cause:

After deleting and uninstalling Rook, the disks must be wiped so that nothing from the previous cluster remains on the raw block devices; leftover Ceph data on the disks and the old state under /var/lib/rook keep the new cluster from coming up correctly.
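A quick way to confirm that leftovers exist (a sketch; device names and output will vary per node) is to look for old Ceph signatures, ceph-volume LVM volumes, and Rook state on each node:

# disks used by a previous cluster typically show FSTYPE ceph_bluestore or LVM2_member
lsblk -f

# LVM / device-mapper entries created by ceph-volume for the old OSDs
ls /dev/ceph-* /dev/mapper/ceph-* 2>/dev/null

# state left behind by the previous Rook install (mon data, keyrings, generated config)
ls /var/lib/rook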

Fix:

Clean up the disk metadata and data on every node:

# dd if=/dev/zero of=/dev/sdX bs=1M status=progress
# rm -rf /var/lib/rook
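Zeroing a whole disk with dd can take a long time on large drives; the Rook teardown documentation describes a shorter per-disk cleanup along these lines (a sketch, run as root on every node; /dev/sdX is a placeholder for each OSD disk):

DISK="/dev/sdX"

# wipe the partition table and GPT data structures
sgdisk --zap-all "$DISK"

# overwrite the start of the disk, where Ceph keeps its metadata
dd if=/dev/zero of="$DISK" bs=1M count=100 oflag=direct,dsync

# remove LVM volumes and device-mapper entries left by ceph-volume
ls /dev/mapper/ceph-* 2>/dev/null | xargs -r -I% -- dmsetup remove %
rm -rf /dev/ceph-* /dev/mapper/ceph--*

# remove the old Rook state (mon data, keyrings, generated config)
rm -rf /var/lib/rook

# re-read the partition table
partprobe "$DISK"

Once the disks and /var/lib/rook are clean on every node, re-applying cluster.yaml should let the operator get past the mon stage and start the osd-prepare jobs.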