Recovering a failed master node: a walkthrough
Don't get too tangled up in the present, and don't worry too much about the future. By the time you have lived through something, the scenery in front of you is no longer what it used to be. (Haruki Murakami)
Today I ran an experiment and found that on one of the cluster's master nodes both etcd and apiserver had gone down. Cluster information:
┌──[root@vms100.liruilongs.github.io]-[~]
└─$kubectl get nodes
NAME STATUS ROLES AGE VERSION
vms100.liruilongs.github.io Ready control-plane 415d v1.25.1
vms101.liruilongs.github.io Ready control-plane 415d v1.25.1
vms102.liruilongs.github.io Ready control-plane 415d v1.25.1
vms103.liruilongs.github.io Ready <none> 415d v1.25.1
vms105.liruilongs.github.io Ready <none> 415d v1.25.1
vms106.liruilongs.github.io Ready <none> 415d v1.25.1
┌──[root@vms100.liruilongs.github.io]-[~]
└─$
The apiserver and etcd pods on node vms100.liruilongs.github.io are the ones that are failing:
┌──[root@vms100.liruilongs.github.io]-[~]
└─$kubectl get pod -A -o wide | grep apiserver
kube-system kube-apiserver-vms100.liruilongs.github.io 0/1 CrashLoopBackOff 1448 (3m23s ago) 415d 192.168.26.100 vms100.liruilongs.github.io <none> <none>
kube-system kube-apiserver-vms101.liruilongs.github.io 1/1 Running 272 (3h18m ago) 415d 192.168.26.101 vms101.liruilongs.github.io <none> <none>
kube-system kube-apiserver-vms102.liruilongs.github.io 1/1 Running 246 (3h18m ago) 415d 192.168.26.102 vms102.liruilongs.github.io <none> <none>
┌──[root@vms100.liruilongs.github.io]-[~]
└─$kubectl get pod -A -o wide | grep etcd
kube-system etcd-vms100.liruilongs.github.io 0/1 CrashLoopBackOff 1244 (3m6s ago) 415d 192.168.26.100 vms100.liruilongs.github.io <none> <none>
kube-system etcd-vms101.liruilongs.github.io 1/1 Running 167 (3h18m ago) 415d 192.168.26.101 vms101.liruilongs.github.io <none> <none>
kube-system etcd-vms102.liruilongs.github.io 1/1 Running 173 (3h18m ago) 415d 192.168.26.102 vms102.liruilongs.github.io <none> <none>
Checking keepalived: the corresponding static pods are running normally.
┌──[root@vms100.liruilongs.github.io]-[~]
└─$kubectl get pod -A -o wide | grep keep
kube-system keepalived-vms100.liruilongs.github.io 1/1 Running 63 (3h50m ago) 415d 192.168.26.100 vms100.liruilongs.github.io <none> <none>
kube-system keepalived-vms101.liruilongs.github.io 1/1 Running 54 (3h51m ago) 415d 192.168.26.101 vms101.liruilongs.github.io <none> <none>
kube-system keepalived-vms102.liruilongs.github.io 1/1 Running 60 (3h51m ago) 415d 192.168.26.102 vms102.liruilongs.github.io <none> <none>
┌──[root@vms100.liruilongs.github.io]-[~]
└─$
So the likely culprit is etcd: either its data is out of sync or something else has caused etcd to die. The apiserver on each master node talks only to the etcd on its own node (each etcd forwards write requests to the etcd leader), so once the local etcd dies, the apiserver can no longer serve requests and dies as well.
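Which etcd an apiserver talks to can be read from its static pod manifest. The following is a minimal sketch, assuming a kubeadm-style manifest in which the flag is written as "- --etcd-servers=..." on a single line; the function name etcd_servers_of is made up for illustration.

```shell
# Print the value of kube-apiserver's --etcd-servers flag from a manifest file.
# Assumption: the flag sits on one line as "- --etcd-servers=<urls>",
# which is how kubeadm writes it.
etcd_servers_of() (
  manifest="$1"
  sed -n 's/^[[:space:]]*-[[:space:]]*--etcd-servers=//p' "$manifest"
)
```

Called as `etcd_servers_of /etc/kubernetes/manifests/kube-apiserver.yaml`, it prints the endpoint list. On a kubeadm control plane with stacked etcd this is typically the local endpoint https://127.0.0.1:2379, which is why a dead local etcd takes the local apiserver down with it.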
With etcdctl we can see that etcd on vms100.liruilongs.github.io is completely dead:
┌──[root@vms100.liruilongs.github.io]-[~]
└─$ETCDCTL_API=3 etcdctl --endpoints https://127.0.0.1:2379 \
--cert="/etc/kubernetes/pki/etcd/server.crt" \
--key="/etc/kubernetes/pki/etcd/server.key" \
--cacert="/etc/kubernetes/pki/etcd/ca.crt" \
member list -w table
Error: dial tcp 127.0.0.1:2379: connect: connection refused
Let's switch to another etcd node to run the commands. Check the etcd cluster members:
┌──[root@vms100.liruilongs.github.io]-[~]
└─$ssh vms101.liruilongs.github.io
Last login: Sat Mar 2 09:52:01 2024 from 192.168.26.100
┌──[root@vms101.liruilongs.github.io]-[~]
└─$ETCDCTL_API=3 etcdctl --endpoints https://127.0.0.1:2379 \
--cert="/etc/kubernetes/pki/etcd/server.crt" \
--key="/etc/kubernetes/pki/etcd/server.key" \
--cacert="/etc/kubernetes/pki/etcd/ca.crt" \
member list -w table
+------------------+---------+-----------------------------+-----------------------------+-----------------------------+
| ID | STATUS | NAME | PEER ADDRS | CLIENT ADDRS |
+------------------+---------+-----------------------------+-----------------------------+-----------------------------+
| ee392e5273e89e2 | started | vms100.liruilongs.github.io | https://192.168.26.100:2380 | https://192.168.26.100:2379 |
| 70059e836d19883d | started | vms101.liruilongs.github.io | https://192.168.26.101:2380 | https://192.168.26.101:2379 |
| b8cb9f66c2e63b91 | started | vms102.liruilongs.github.io | https://192.168.26.102:2380 | https://192.168.26.102:2379 |
+------------------+---------+-----------------------------+-----------------------------+-----------------------------+
Check the endpoint status:
┌──[root@vms101.liruilongs.github.io]-[~]
└─$ETCDCTL_API=3 etcdctl --endpoints https://127.0.0.1:2379 \
--cert="/etc/kubernetes/pki/etcd/server.crt" \
--key="/etc/kubernetes/pki/etcd/server.key" \
--cacert="/etc/kubernetes/pki/etcd/ca.crt" \
endpoint status --cluster -w table
Failed to get the status of endpoint https://192.168.26.100:2379 (context deadline exceeded)
+-----------------------------+------------------+---------+---------+-----------+-----------+------------+
| ENDPOINT | ID | VERSION | DB SIZE | IS LEADER | RAFT TERM | RAFT INDEX |
+-----------------------------+------------------+---------+---------+-----------+-----------+------------+
| https://192.168.26.101:2379 | 70059e836d19883d | 3.5.4 | 88 MB | false | 603 | 22208417 |
| https://192.168.26.102:2379 | b8cb9f66c2e63b91 | 3.5.4 | 88 MB | true | 603 | 22208417 |
+-----------------------------+------------------+---------+---------+-----------+-----------+------------+
Confirm which etcd node is faulty:
┌──[root@vms101.liruilongs.github.io]-[~]
└─$ETCDCTL_API=3 etcdctl --endpoints https://127.0.0.1:2379 \
--cert="/etc/kubernetes/pki/etcd/server.crt" \
--key="/etc/kubernetes/pki/etcd/server.key" \
--cacert="/etc/kubernetes/pki/etcd/ca.crt" \
endpoint health --cluster -w table
https://192.168.26.101:2379 is healthy: successfully committed proposal: took = 3.753357ms
https://192.168.26.102:2379 is healthy: successfully committed proposal: took = 2.989943ms
https://192.168.26.100:2379 is unhealthy: failed to connect: dial tcp 192.168.26.100:2379: connect: connection refused
Error: unhealthy cluster
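When there are more than a handful of endpoints, the unhealthy ones can be filtered out of that output. This is a small sketch, assuming the lines have the "<url> is unhealthy: <reason>" shape shown above; unhealthy_endpoints is a hypothetical helper name.

```shell
# Read `etcdctl endpoint health --cluster` output on stdin and print the
# endpoints reported as unhealthy, one per line.
# Assumption: each result line looks like "<url> is unhealthy: <reason>".
unhealthy_endpoints() {
  awk '$2 == "is" && $3 == "unhealthy:" { print $1 }'
}
```

A possible invocation is `etcdctl ... endpoint health --cluster 2>&1 | unhealthy_endpoints`; the `2>&1` is there in case the etcdctl version in use writes the health lines to stderr.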
Check the etcd container logs:
┌──[root@vms100.liruilongs.github.io]-[~]
└─$docker ps -a | grep etcd
0f2f98ebf8c3 a8a176a5d5d6 "etcd --advertise-cl…" 4 minutes ago Exited (2) 4 minutes ago k8s_etcd_etcd-vms100.liruilongs.github.io_kube-system_e8c17bb99f9bd8119cdd769556041e18_1252
a4b39d16a753 registry.aliyuncs.com/google_containers/pause:3.8 "/pause" 4 hours ago Up 4 hours k8s_POD_etcd-vms100.liruilongs.github.io_kube-system_e8c17bb99f9bd8119cdd769556041e18_54
┌──[root@vms100.liruilongs.github.io]-[~]
└─$docker logs 0f2f98ebf8c3
{"level":"info","ts":"2024-03-16T14:46:54.644Z","caller":"etcdmain/etcd.go:73","msg":"Running: ","args":["etcd","--advertise-client-urls=https://192.168.26.100:2379","--cert-file=/etc/kubernetes/pki/etcd/server.crt","--client-cert-auth=true","--data-dir=/var/lib/etcd","--experimental-initial-corrupt-check=true","--experimental-watch-progress-notify-interval=5s","--initial-advertise-peer-urls=https://192.168.26.100:2380","--initial-cluster=vms100.liruilongs.github.io=https://192.168.26.100:2380","--key-file=/etc/kubernetes/pki/etcd/server.key","--listen-client-urls=https://127.0.0.1:2379,https://192.168.26.100:2379","--listen-metrics-urls=http://127.0.0.1:2381","--listen-peer-urls=https://192.168.26.100:2380","--name=vms100.liruilongs.github.io","--peer-cert-file=/etc/kubernetes/pki/etcd/peer.crt","--peer-client-cert-auth=true","--peer-key-file=/etc/kubernetes/pki/etcd/peer.key","--peer-trusted-ca-file=/etc/kubernetes/pki/etcd/ca.crt","--snapshot-count=10000","--trusted-ca-file=/etc/kubernetes/pki/etcd/ca.crt"]}
{"level":"info","ts":"2024-03-16T14:46:54.645Z","caller":"etcdmain/etcd.go:116","msg":"server has been already initialized","data-dir":"/var/lib/etcd","dir-type":"member"}
{"level":"info","ts":"2024-03-16T14:46:54.645Z","caller":"embed/etcd.go:131","msg":"configuring peer listeners","listen-peer-urls":["https://192.168.26.100:2380"]}
{"level":"info","ts":"2024-03-16T14:46:54.645Z","caller":"embed/etcd.go:479","msg":"starting with peer TLS","tls-info":"cert = /etc/kubernetes/pki/etcd/peer.crt, key = /etc/kubernetes/pki/etcd/peer.key, client-cert=, client-key=, trusted-ca = /etc/kubernetes/pki/etcd/ca.crt, client-cert-auth = true, crl-file = ","cipher-suites":[]}
{"level":"info","ts":"2024-03-16T14:46:54.645Z","caller":"embed/etcd.go:139","msg":"configuring client listeners","listen-client-urls":["https://127.0.0.1:2379","https://192.168.26.100:2379"]}
{"level":"info","ts":"2024-03-16T14:46:54.645Z","caller":"embed/etcd.go:308","msg":"starting an etcd server","etcd-version":"3.5.4","git-sha":"08407ff76","go-version":"go1.16.15","go-os":"linux","go-arch":"amd64","max-cpu-set":4,"max-cpu-available":4,"member-initialized":true,"name":"vms100.liruilongs.github.io","data-dir":"/var/lib/etcd","wal-dir":"","wal-dir-dedicated":"","member-dir":"/var/lib/etcd/member","force-new-cluster":false,"heartbeat-interval":"100ms","election-timeout":"1s","initial-election-tick-advance":true,"snapshot-count":10000,"snapshot-catchup-entries":5000,"initial-advertise-peer-urls":["https://192.168.26.100:2380"],"listen-peer-urls":["https://192.168.26.100:2380"],"advertise-client-urls":["https://192.168.26.100:2379"],"listen-client-urls":["https://127.0.0.1:2379","https://192.168.26.100:2379"],"listen-metrics-urls":["http://127.0.0.1:2381"],"cors":["*"],"host-whitelist":["*"],"initial-cluster":"","initial-cluster-state":"new","initial-cluster-token":"","quota-size-bytes":2147483648,"pre-vote":true,"initial-corrupt-check":true,"corrupt-check-time-interval":"0s","auto-compaction-mode":"periodic","auto-compaction-retention":"0s","auto-compaction-interval":"0s","discovery-url":"","discovery-proxy":"","downgrade-check-interval":"5s"}
panic: freepages: failed to get all reachable pages (page 7744: multiple references)
goroutine 109 [running]:
go.etcd.io/bbolt.(*DB).freepages.func2(0xc00009c480)
/go/pkg/mod/go.etcd.io/bbolt@v1.3.6/db.go:1056 +0xe9
created by go.etcd.io/bbolt.(*DB).freepages
/go/pkg/mod/go.etcd.io/bbolt@v1.3.6/db.go:1054 +0x1cd
┌──[root@vms100.liruilongs.github.io]-[~]
└─$
The quickest fix here is to resynchronize the node's data: remove the faulty member from the cluster, clear its old data, and add it back. The steps:
1. Clear the data directory: move the static pod manifests away to stop the services, then delete the etcd data directory.
2. Remove the faulty member: use member remove; it can be run from a healthy node.
3. Add the member: use member add to add the faulty node back.
4. Restart: move the faulty node's YAML files back so the pods start again.
Note: static pods are created from YAML files that the kubelet loads from a designated directory. The kubelet scans that directory periodically: when a YAML file is moved out, the static pod is stopped automatically, and likewise when a YAML file is added, the static pod is created automatically.
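The steps above can be laid out as a dry run before touching anything. The sketch below only prints the commands and executes nothing; the paths and etcdctl flags are the ones used in this article, and print_recovery_plan is a hypothetical helper name.

```shell
# Print (do not run) the recovery commands for one faulty etcd member.
# Arguments come from the operator: member ID, member name, peer URL.
print_recovery_plan() (
  member_id="$1"; name="$2"; peer_url="$3"
  manifests=/etc/kubernetes/manifests
  etcdctl="ETCDCTL_API=3 etcdctl --endpoints https://127.0.0.1:2379"
  etcdctl="$etcdctl --cert=/etc/kubernetes/pki/etcd/server.crt"
  etcdctl="$etcdctl --key=/etc/kubernetes/pki/etcd/server.key"
  etcdctl="$etcdctl --cacert=/etc/kubernetes/pki/etcd/ca.crt"
  printf '%s\n' \
    "# 1. faulty node: stop the static pods, then clear the etcd data directory" \
    "mv $manifests/{etcd.yaml,kube-apiserver.yaml} /tmp/" \
    "rm -rf /var/lib/etcd/*" \
    "# 2. healthy node: remove the faulty member" \
    "$etcdctl member remove $member_id" \
    "# 3. healthy node: add it back" \
    "$etcdctl member add $name --peer-urls=$peer_url" \
    "# 4. faulty node: check etcd.yaml, then restore the manifests" \
    "mv /tmp/{etcd.yaml,kube-apiserver.yaml} $manifests/"
)
```

For example, `print_recovery_plan ee392e5273e89e2 vms100.liruilongs.github.io https://192.168.26.100:2380` prints the sequence for the node in this article. Printing first makes it easy to review which command belongs on which node.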
Move the static pod YAML files away:
┌──[root@vms100.liruilongs.github.io]-[~]
└─$mv /etc/kubernetes/manifests/{etcd.yaml,kube-apiserver.yaml} /tmp/
Delete the etcd data directory:
┌──[root@vms100.liruilongs.github.io]-[~]
└─$rm -rf /var/lib/etcd/*
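If you would rather keep the corrupted files for later inspection, the contents can be moved aside instead of deleted. This is a sketch of that variation, not what was run in this article; move_dir_contents_aside and the backup location are assumptions chosen for illustration.

```shell
# Move everything inside a data directory into a timestamped backup
# directory instead of deleting it, and print the backup path.
# Both paths are arguments; where the backup lives is the caller's choice.
move_dir_contents_aside() (
  data_dir="$1"
  backup_root="$2"
  dest="$backup_root/etcd-backup-$(date +%Y%m%d%H%M%S)"
  mkdir -p "$dest" || exit 1
  for entry in "$data_dir"/* "$data_dir"/.[!.]*; do
    [ -e "$entry" ] || continue   # skip glob patterns that matched nothing
    mv "$entry" "$dest"/ || exit 1
  done
  printf '%s\n' "$dest"
)
```

A call such as `move_dir_contents_aside /var/lib/etcd /root` leaves /var/lib/etcd empty, just as the rm above does, while keeping the old member directory under the printed path.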
Confirm that etcd and apiserver on the node have both stopped:
┌──[root@vms100.liruilongs.github.io]-[~]
└─$kubectl get pod -A -o wide | grep apiserver
kube-system kube-apiserver-vms101.liruilongs.github.io 1/1 Running 272 (4h15m ago) 415d 192.168.26.101 vms101.liruilongs.github.io <none> <none>
kube-system kube-apiserver-vms102.liruilongs.github.io 1/1 Running 246 (4h15m ago) 415d 192.168.26.102 vms102.liruilongs.github.io <none> <none>
┌──[root@vms100.liruilongs.github.io]-[~]
└─$kubectl get pod -A -o wide | grep etcd
kube-system etcd-vms101.liruilongs.github.io 1/1 Running 167 (4h15m ago) 415d 192.168.26.101 vms101.liruilongs.github.io <none> <none>
kube-system etcd-vms102.liruilongs.github.io 1/1 Running 173 (4h15m ago) 415d 192.168.26.102 vms102.liruilongs.github.io <none> <none>
┌──[root@vms100.liruilongs.github.io]-[~]
└─$
Get the faulty member's ID. The following operations are run on a healthy etcd node; alternatively, point --endpoints at a healthy node.
┌──[root@vms101.liruilongs.github.io]-[~]
└─$ETCDCTL_API=3 etcdctl --endpoints https://192.168.26.101:2379 --cert="/etc/kubernetes/pki/etcd/server.crt" --key="/etc/kubernetes/pki/etcd/server.key" --cacert="/etc/kubernetes/pki/etcd/ca.crt" member list -w table
+------------------+---------+-----------------------------+-----------------------------+-----------------------------+
| ID | STATUS | NAME | PEER ADDRS | CLIENT ADDRS |
+------------------+---------+-----------------------------+-----------------------------+-----------------------------+
| ee392e5273e89e2 | started | vms100.liruilongs.github.io | https://192.168.26.100:2380 | https://192.168.26.100:2379 |
| 70059e836d19883d | started | vms101.liruilongs.github.io | https://192.168.26.101:2380 | https://192.168.26.101:2379 |
| b8cb9f66c2e63b91 | started | vms102.liruilongs.github.io | https://192.168.26.102:2380 | https://192.168.26.102:2379 |
+------------------+---------+-----------------------------+-----------------------------+-----------------------------+
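To avoid copying the ID by hand, it can be pulled out of the table by member name. This is a minimal sketch that assumes the "-w table" layout shown above, with ID in the first column and NAME in the third; member_id_by_name is a hypothetical helper.

```shell
# Read `etcdctl member list -w table` output on stdin and print the ID of
# the member whose NAME column equals the given name.
member_id_by_name() {
  awk -F'|' -v want="$1" '
    NF >= 6 {                       # data and header rows; border rows have no "|"
      id = $2; name = $4
      gsub(/[ \t]/, "", id)         # strip the padding around each cell
      gsub(/[ \t]/, "", name)
      if (name == want) print id
    }'
}
```

Piping the member list through `member_id_by_name vms100.liruilongs.github.io` yields the value to pass to member remove.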
Remove the faulty member:
┌──[root@vms101.liruilongs.github.io]-[~]
└─$ETCDCTL_API=3 etcdctl --endpoints https://127.0.0.1:2379 --cert="/etc/kubernetes/pki/etcd/server.crt" --key="/etc/kubernetes/pki/etcd/server.key" --cacert="/etc/kubernetes/pki/etcd/ca.crt" member remove ee392e5273e89e2
Member ee392e5273e89e2 removed from cluster 4816f346663d82a7
Add it back:
┌──[root@vms101.liruilongs.github.io]-[~]
└─$ETCDCTL_API=3 etcdctl --endpoints https://127.0.0.1:2379 --cert="/etc/kubernetes/pki/etcd/server.crt" --key="/etc/kubernetes/pki/etcd/server.key" --cacert="/etc/kubernetes/pki/etcd/ca.crt" member add vms100.liruilongs.github.io --peer-urls=https://192.168.26.100:2380
Member 456f71fdc1ad9917 added to cluster 4816f346663d82a7
ETCD_NAME="vms100.liruilongs.github.io"
ETCD_INITIAL_CLUSTER="vms100.liruilongs.github.io=https://192.168.26.100:2380,vms101.liruilongs.github.io=https://192.168.26.101:2380,vms102.liruilongs.github.io=https://192.168.26.102:2380"
ETCD_INITIAL_ADVERTISE_PEER_URLS="https://192.168.26.100:2380"
ETCD_INITIAL_CLUSTER_STATE="existing"
Go back to node 100 and move the YAML files back to restore the node:
┌──[root@vms100.liruilongs.github.io]-[~]
└─$mv /tmp/{etcd.yaml,kube-apiserver.yaml} /etc/kubernetes/manifests/
Confirm the pod status:
┌──[root@vms100.liruilongs.github.io]-[~]
└─$kubectl get pod -A -o wide | grep etcd
kube-system etcd-vms100.liruilongs.github.io 1/1 Running 0 16s 192.168.26.100 vms100.liruilongs.github.io <none> <none>
kube-system etcd-vms101.liruilongs.github.io 1/1 Running 167 (4h32m ago) 415d 192.168.26.101 vms101.liruilongs.github.io <none> <none>
kube-system etcd-vms102.liruilongs.github.io 1/1 Running 173 (4h32m ago) 415d 192.168.26.102 vms102.liruilongs.github.io <none> <none>
┌──[root@vms100.liruilongs.github.io]-[~]
└─$kubectl get pod -A -o wide | grep apiserver
kube-system kube-apiserver-vms100.liruilongs.github.io 1/1 Running 0 24s 192.168.26.100 vms100.liruilongs.github.io <none> <none>
kube-system kube-apiserver-vms101.liruilongs.github.io 1/1 Running 272 (4h32m ago) 415d 192.168.26.101 vms101.liruilongs.github.io <none> <none>
kube-system kube-apiserver-vms102.liruilongs.github.io 1/1 Running 246 (4h32m ago) 415d 192.168.26.102 vms102.liruilongs.github.io <none> <none>
┌──[root@vms100.liruilongs.github.io]-[~]
└─$
Check the etcd cluster status:
┌──[root@vms101.liruilongs.github.io]-[~]
└─$ETCDCTL_API=3 etcdctl --endpoints https://127.0.0.1:2379 --cert="/etc/kubernetes/pki/etcd/server.crt" --key="/etc/kubernetes/pki/etcd/server.key" --cacert="/etc/kubernetes/pki/etcd/ca.crt" member list -w table
+------------------+-----------+-----------------------------+-----------------------------+-----------------------------+
| ID | STATUS | NAME | PEER ADDRS | CLIENT ADDRS |
+------------------+-----------+-----------------------------+-----------------------------+-----------------------------+
| 54952f3b494c0286 | unstarted | | https://192.168.26.100:2380 | |
| 70059e836d19883d | started | vms101.liruilongs.github.io | https://192.168.26.101:2380 | https://192.168.26.101:2379 |
| b8cb9f66c2e63b91 | started | vms102.liruilongs.github.io | https://192.168.26.102:2380 | https://192.168.26.102:2379 |
+------------------+-----------+-----------------------------+-----------------------------+-----------------------------+
Here we find that the newly added member is not in a normal state: it stays unstarted. Running etcdctl on the faulty node shows that it did not join the cluster and is instead running as a single-node cluster.
┌──[root@vms100.liruilongs.github.io]-[~]
└─$ETCDCTL_API=3 etcdctl --endpoints https://127.0.0.1:2379 --cert="/etc/kubernetes/pki/etcd/server.crt" --key="/etc/kubernetes/pki/etcd/server.key" --cacert="/etc/kubernetes/pki/etcd/ca.crt" member list -w table
+-----------------+---------+-----------------------------+-----------------------------+-----------------------------+
| ID | STATUS | NAME | PEER ADDRS | CLIENT ADDRS |
+-----------------+---------+-----------------------------+-----------------------------+-----------------------------+
| ee392e5273e89e2 | started | vms100.liruilongs.github.io | https://192.168.26.100:2380 | https://192.168.26.100:2379 |
+-----------------+---------+-----------------------------+-----------------------------+-----------------------------+
┌──[root@vms100.liruilongs.github.io]-[~]
└─$ETCDCTL_API=3 etcdctl --endpoints https://127.0.0.1:2379 --cert="/etc/kubernetes/pki/etcd/server.crt" --key="/etc/kubernetes/pki/etcd/server.key" --cacert="/etc/kubernetes/pki/etcd/ca.crt" endpoint status --cluster -w table
+-----------------------------+-----------------+---------+---------+-----------+-----------+------------+
| ENDPOINT | ID | VERSION | DB SIZE | IS LEADER | RAFT TERM | RAFT INDEX |
+-----------------------------+-----------------+---------+---------+-----------+-----------+------------+
| https://192.168.26.100:2379 | ee392e5273e89e2 | 3.5.4 | 815 kB | true | 2 | 2261 |
+-----------------------------+-----------------+---------+---------+-----------+-----------+------------+
┌──[root@vms100.liruilongs.github.io]-[~]
└─$
It has not synchronized the current cluster's data either:
┌──[root@vms100.liruilongs.github.io]-[~]
└─$kubectl get pod -A -o wide --server=https://vms100.liruilongs.github.io:6443
No resources found
In most cases this is caused by a problem in some node's etcd configuration file. In my case the faulty node's etcd manifest contained no information about the other cluster members, so with an empty data directory etcd bootstrapped a new single-member cluster instead of joining the existing one. The fix is to write the cluster information into the manifest.
The original configuration file:
┌──[root@vms100.liruilongs.github.io]-[~]
└─$cat /etc/kubernetes/manifests/etcd.yaml
apiVersion: v1
kind: Pod
metadata:
  annotations:
    kubeadm.kubernetes.io/etcd.advertise-client-urls: https://192.168.26.100:2379
  creationTimestamp: null
  labels:
    component: etcd
    tier: control-plane
  name: etcd
  namespace: kube-system
spec:
  containers:
  - command:
    - etcd
    - --advertise-client-urls=https://192.168.26.100:2379
    - --cert-file=/etc/kubernetes/pki/etcd/server.crt
    - --client-cert-auth=true
    - --data-dir=/var/lib/etcd
    - --experimental-initial-corrupt-check=true
    - --experimental-watch-progress-notify-interval=5s
    - --initial-advertise-peer-urls=https://192.168.26.100:2380
    - --initial-cluster=vms100.liruilongs.github.io=https://192.168.26.100:2380
    - --key-file=/etc/kubernetes/pki/etcd/server.key
    - --listen-client-urls=https://127.0.0.1:2379,https://192.168.26.100:2379
    - --listen-metrics-urls=http://127.0.0.1:2381
    - --listen-peer-urls=https://192.168.26.100:2380
    - --name=vms100.liruilongs.github.io
    - --peer-cert-file=/etc/kubernetes/pki/etcd/peer.crt
    - --peer-client-cert-auth=true
    - --peer-key-file=/etc/kubernetes/pki/etcd/peer.key
    - --peer-trusted-ca-file=/etc/kubernetes/pki/etcd/ca.crt
    - --snapshot-count=10000
    - --trusted-ca-file=/etc/kubernetes/pki/etcd/ca.crt
    image: registry.aliyuncs.com/google_containers/etcd:3.5.4-0
................
The cluster information is incomplete. The configuration file after the additions:
┌──[root@vms100.liruilongs.github.io]-[~]
└─$cat /etc/kubernetes/manifests/etcd.yaml
apiVersion: v1
kind: Pod
metadata:
  annotations:
    kubeadm.kubernetes.io/etcd.advertise-client-urls: https://192.168.26.100:2379
  creationTimestamp: null
  labels:
    component: etcd
    tier: control-plane
  name: etcd
  namespace: kube-system
spec:
  containers:
  - command:
    - etcd
    - --advertise-client-urls=https://192.168.26.100:2379
    - --cert-file=/etc/kubernetes/pki/etcd/server.crt
    - --client-cert-auth=true
    - --data-dir=/var/lib/etcd
    - --experimental-initial-corrupt-check=true
    - --experimental-watch-progress-notify-interval=5s
    - --initial-advertise-peer-urls=https://192.168.26.100:2380
    - --initial-cluster=vms100.liruilongs.github.io=https://192.168.26.100:2380,vms101.liruilongs.github.io=https://192.168.26.101:2380
    - --initial-cluster-state=existing
    - --key-file=/etc/kubernetes/pki/etcd/server.key
    - --listen-client-urls=https://127.0.0.1:2379,https://192.168.26.100:2379
    - --listen-metrics-urls=http://127.0.0.1:2381
    - --listen-peer-urls=https://192.168.26.100:2380
    - --name=vms100.liruilongs.github.io
    - --peer-cert-file=/etc/kubernetes/pki/etcd/peer.crt
    - --peer-client-cert-auth=true
    - --peer-key-file=/etc/kubernetes/pki/etcd/peer.key
    - --peer-trusted-ca-file=/etc/kubernetes/pki/etcd/ca.crt
    - --snapshot-count=10000
    - --trusted-ca-file=/etc/kubernetes/pki/etcd/ca.crt
We then restored the node again using the same procedure as above, and found that this time it did not come up at all:
┌──[root@vms100.liruilongs.github.io]-[~]
└─$kubectl get pod -A -o wide | grep apiserver
kube-system kube-apiserver-vms100.liruilongs.github.io 0/1 CrashLoopBackOff 1 (18s ago) 39s 192.168.26.100 vms100.liruilongs.github.io <none> <none>
kube-system kube-apiserver-vms101.liruilongs.github.io 1/1 Running 272 (5h29m ago) 415d 192.168.26.101 vms101.liruilongs.github.io <none> <none>
kube-system kube-apiserver-vms102.liruilongs.github.io 1/1 Running 246 (5h29m ago) 415d 192.168.26.102 vms102.liruilongs.github.io <none> <none>
┌──[root@vms100.liruilongs.github.io]-[~]
└─$kubectl get pod -A -o wide | grep etcd
kube-system etcd-vms100.liruilongs.github.io 0/1 CrashLoopBackOff 3 (21s ago) 53s 192.168.26.100 vms100.liruilongs.github.io <none> <none>
kube-system etcd-vms101.liruilongs.github.io 1/1 Running 167 (5h29m ago) 415d 192.168.26.101 vms101.liruilongs.github.io <none> <none>
kube-system etcd-vms102.liruilongs.github.io 1/1 Running 173 (5h29m ago) 415d 192.168.26.102 vms102.liruilongs.github.io <none> <none>
Check the logs:
┌──[root@vms100.liruilongs.github.io]-[~]
└─$kubectl logs etcd-vms100.liruilongs.github.io -n kube-system
.............................
{"level":"fatal","ts":"2024-03-16T16:25:19.981Z","caller":"etcdmain/etcd.go:204","msg":"discovery failed","error":"error validating peerURLs {ClusterID:4816f346663d82a7 Members:[&{ID:b8cb9f66c2e63b91 RaftAttributes:{PeerURLs:[https://192.168.26.102:2380] IsLearner:false} Attributes:{Name:vms102.liruilongs.github.io ClientURLs:[https://192.168.26.102:2379]}} &{ID:3fbbbed942c51f7b RaftAttributes:{PeerURLs:[https://192.168.26.100:2380] IsLearner:false} Attributes:{Name: ClientURLs:[]}} &{ID:70059e836d19883d RaftAttributes:{PeerURLs:[https://192.168.26.101:2380] IsLearner:false} Attributes:{Name:vms101.liruilongs.github.io ClientURLs:[https://192.168.26.101:2379]}}] RemovedMemberIDs:[]}: member count is unequal","stacktrace":"go.etcd.io/etcd/server/v3/etcdmain.startEtcdOrProxyV2\n\t/go/src/go.etcd.io/etcd/release/etcd/server/etcdmain/etcd.go:204\ngo.etcd.io/etcd/server/v3/etcdmain.Main\n\t/go/src/go.etcd.io/etcd/release/etcd/server/etcdmain/main.go:40\nmain.main\n\t/go/src/go.etcd.io/etcd/release/etcd/server/main.go:32\nruntime.main\n\t/go/gos/go1.16.15/src/runtime/proc.go:225"}
The log contains a useful message, "RemovedMemberIDs:[]}: member count is unequal": the member counts do not match. Looking through the log in detail:
{
"level": "info",
"ts": "2024-03-16T16:25:19.961Z",
"caller": "etcdmain/etcd.go:73",
"msg": "Running: ",
"args": [
"etcd",
"--advertise-client-urls=https://192.168.26.100:2379",
"--cert-file=/etc/kubernetes/pki/etcd/server.crt",
"--client-cert-auth=true",
"--data-dir=/var/lib/etcd",
"--experimental-initial-corrupt-check=true",
"--experimental-watch-progress-notify-interval=5s",
"--initial-advertise-peer-urls=https://192.168.26.100:2380",
"--initial-cluster=vms100.liruilongs.github.io=https://192.168.26.100:2380,vms101.liruilongs.github.io=https://192.168.26.101:2380",
"--initial-cluster-state=existing",
"--key-file=/etc/kubernetes/pki/etcd/server.key",
"--listen-client-urls=https://127.0.0.1:2379,https://192.168.26.100:2379",
"--listen-metrics-urls=http://127.0.0.1:2381",
"--listen-peer-urls=https://192.168.26.100:2380",
"--name=vms100.liruilongs.github.io",
"--peer-cert-file=/etc/kubernetes/pki/etcd/peer.crt",
"--peer-client-cert-auth=true",
"--peer-key-file=/etc/kubernetes/pki/etcd/peer.key",
"--peer-trusted-ca-file=/etc/kubernetes/pki/etcd/ca.crt",
"--snapshot-count=10000",
"--trusted-ca-file=/etc/kubernetes/pki/etcd/ca.crt"
]
}
..............................................................................
{
"level": "warn",
"ts": "2024-03-16T16:25:19.981Z",
"caller": "etcdmain/etcd.go:146",
"msg": "failed to start etcd",
"error": "error validating peerURLs {ClusterID:4816f346663d82a7 Members:[&{ID:b8cb9f66c2e63b91 RaftAttributes:{PeerURLs:[https://192.168.26.102:2380] IsLearner:false} Attributes:{Name:vms102.liruilongs.github.io ClientURLs:[https://192.168.26.102:2379]}} &{ID:3fbbbed942c51f7b RaftAttributes:{PeerURLs:[https://192.168.26.100:2380] IsLearner:false} Attributes:{Name: ClientURLs:[]}} &{ID:70059e836d19883d RaftAttributes:{PeerURLs:[https://192.168.26.101:2380] IsLearner:false} Attributes:{Name:vms101.liruilongs.github.io ClientURLs:[https://192.168.26.101:2379]}}] RemovedMemberIDs:[]}: member count is unequal"
}
{
"level": "fatal",
"ts": "2024-03-16T16:25:19.981Z",
"caller": "etcdmain/etcd.go:204",
"msg": "discovery failed",
"error": "error validating peerURLs {ClusterID:4816f346663d82a7 Members:[&{ID:b8cb9f66c2e63b91 RaftAttributes:{PeerURLs:[https://192.168.26.102:2380] IsLearner:false} Attributes:{Name:vms102.liruilongs.github.io ClientURLs:[https://192.168.26.102:2379]}} &{ID:3fbbbed942c51f7b RaftAttributes:{PeerURLs:[https://192.168.26.100:2380] IsLearner:false} Attributes:{Name: ClientURLs:[]}} &{ID:70059e836d19883d RaftAttributes:{PeerURLs:[https://192.168.26.101:2380] IsLearner:false} Attributes:{Name:vms101.liruilongs.github.io ClientURLs:[https://192.168.26.101:2379]}}] RemovedMemberIDs:[]}: member count is unequal",
"stacktrace": "go.etcd.io/etcd/server/v3/etcdmain.startEtcdOrProxyV2\n\t/go/src/go.etcd.io/etcd/release/etcd/server/etcdmain/etcd.go:204\ngo.etcd.io/etcd/server/v3/etcdmain.Main\n\t/go/src/go.etcd.io/etcd/release/etcd/server/etcdmain/main.go:40\nmain.main\n\t/go/src/go.etcd.io/etcd/release/etcd/server/main.go:32\nruntime.main\n\t/go/gos/go1.16.15/src/runtime/proc.go:225"
}
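The mismatch can be checked by hand: the args in the log list two members in --initial-cluster, while the Members list in the error contains three peer URLs. The sketch below reports which peer URLs are absent from an --initial-cluster value; peers_missing_from_initial_cluster is a hypothetical helper and the peer URLs are passed in manually.

```shell
# Print each peer URL (arguments after the first) that does not appear in
# the given --initial-cluster value ("name=url,name=url,...").
peers_missing_from_initial_cluster() (
  initial_cluster="$1"
  shift
  # Reduce "name=url,name=url" to a comma-delimited list of URLs only.
  urls=",$(printf '%s\n' "$initial_cluster" | tr ',' '\n' | sed 's/^[^=]*=//' | tr '\n' ','),"
  for peer in "$@"; do
    case "$urls" in
      *",$peer,"*) ;;                  # this peer is listed
      *) printf '%s\n' "$peer" ;;      # this peer is missing
    esac
  done
)
```

Given the --initial-cluster value from the args above and the three PeerURLs from the error, it prints https://192.168.26.102:2380, the one peer the new member was never told about.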
The error suggests that the problem may involve the vms102.liruilongs.github.io node, so let's look at the configuration file on vms102.liruilongs.github.io:
┌──[root@vms102.liruilongs.github.io]-[~]
└─$cat /etc/kubernetes/manifests/etcd.yaml
apiVersion: v1
kind: Pod
metadata:
  annotations:
    kubeadm.kubernetes.io/etcd.advertise-client-urls: https://192.168.26.102:2379
  creationTimestamp: null
  labels:
    component: etcd
    tier: control-plane
  name: etcd
  namespace: kube-system
spec:
  containers:
  - command:
    - etcd
    - --advertise-client-urls=https://192.168.26.102:2379
    - --cert-file=/etc/kubernetes/pki/etcd/server.crt
    - --client-cert-auth=true
    - --data-dir=/var/lib/etcd
    - --experimental-initial-corrupt-check=true
    - --experimental-watch-progress-notify-interval=5s
    - --initial-advertise-peer-urls=https://192.168.26.102:2380
    - --initial-cluster=vms100.liruilongs.github.io=https://192.168.26.100:2380,vms102.liruilongs.github.io=https://192.168.26.102:2380,vms101.liruilongs.github.io=https://192.168.26.101:2380
    - --initial-cluster-state=existing
    - --key-file=/etc/kubernetes/pki/etcd/server.key
    - --listen-client-urls=https://127.0.0.1:2379,https://192.168.26.102:2379
    - --listen-metrics-urls=http://127.0.0.1:2381
    - --listen-peer-urls=https://192.168.26.102:2380
    - --name=vms102.liruilongs.github.io
    - --peer-cert-file=/etc/kubernetes/pki/etcd/peer.crt
    - --peer-client-cert-auth=true
    - --peer-key-file=/etc/kubernetes/pki/etcd/peer.key
    - --peer-trusted-ca-file=/etc/kubernetes/pki/etcd/ca.crt
    - --snapshot-count=10000
    - --trusted-ca-file=/etc/kubernetes/pki/etcd/ca.crt
Comparing the two configuration files shows that the earlier configuration was still wrong: the entry vms102.liruilongs.github.io=https://192.168.26.102:2380 was missing. The first line below is the value used on vms100, the second the value on vms102:
"--initial-cluster=vms100.liruilongs.github.io=https://192.168.26.100:2380,vms101.liruilongs.github.io=https://192.168.26.101:2380",
"--initial-cluster=vms100.liruilongs.github.io=https://192.168.26.100:2380,vms102.liruilongs.github.io=https://192.168.26.102:2380,vms101.liruilongs.github.io=https://192.168.26.101:2380"
After fixing the configuration, run through the same procedure as above to restore the node. Once the node is back, verify with etcdctl:
┌──[root@vms100.liruilongs.github.io]-[~]
└─$ETCDCTL_API=3 etcdctl --endpoints https://127.0.0.1:2379 --cert="/etc/kubernetes/pki/etcd/server.crt" --key="/etc/kubernetes/pki/etcd/server.key" --cacert="/etc/kubernetes/pki/etcd/ca.crt" member list -w table
+------------------+---------+-----------------------------+-----------------------------+-----------------------------+
| ID | STATUS | NAME | PEER ADDRS | CLIENT ADDRS |
+------------------+---------+-----------------------------+-----------------------------+-----------------------------+
| 70059e836d19883d | started | vms101.liruilongs.github.io | https://192.168.26.101:2380 | https://192.168.26.101:2379 |
| ac5f6045dbe477b3 | started | vms100.liruilongs.github.io | https://192.168.26.100:2380 | https://192.168.26.100:2379 |
| b8cb9f66c2e63b91 | started | vms102.liruilongs.github.io | https://192.168.26.102:2380 | https://192.168.26.102:2379 |
+------------------+---------+-----------------------------+-----------------------------+-----------------------------+
┌──[root@vms100.liruilongs.github.io]-[~]
└─$ETCDCTL_API=3 etcdctl --endpoints https://127.0.0.1:2379 --cert="/etc/kubernetes/pki/etcd/server.crt" --key="/etc/kubernetes/pki/etcd/server.key" --cacert="/etc/kubernetes/pki/etcd/ca.crt" endpoint status --cluster -w table
+-----------------------------+------------------+---------+---------+-----------+-----------+------------+
| ENDPOINT | ID | VERSION | DB SIZE | IS LEADER | RAFT TERM | RAFT INDEX |
+-----------------------------+------------------+---------+---------+-----------+-----------+------------+
| https://192.168.26.101:2379 | 70059e836d19883d | 3.5.4 | 88 MB | false | 603 | 22227327 |
| https://192.168.26.100:2379 | ac5f6045dbe477b3 | 3.5.4 | 88 MB | false | 603 | 22227327 |
| https://192.168.26.102:2379 | b8cb9f66c2e63b91 | 3.5.4 | 88 MB | true | 603 | 22227327 |
+-----------------------------+------------------+---------+---------+-----------+-----------+------------+
┌──[root@vms100.liruilongs.github.io]-[~]
└─$
The faulty node is recovered. In real operations, after adding the member, confirm that the faulty node's etcd configuration file is correct before starting it.
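One way to make that confirmation mechanical is to compare the manifest with the ETCD_INITIAL_CLUSTER value that member add prints. The sketch below assumes a kubeadm-style manifest with one flag per line; manifest_matches_cluster is a hypothetical helper, not part of any tool.

```shell
# Succeed only if the manifest's --initial-cluster lists the same members as
# the expected value (order ignored) and --initial-cluster-state is "existing".
manifest_matches_cluster() (
  manifest="$1"
  expected="$2"   # e.g. the ETCD_INITIAL_CLUSTER value printed by `member add`
  # Print the value of one "- --<flag>=<value>" line from the manifest.
  flag_value() {
    sed -n "s/^[[:space:]]*-[[:space:]]*--$1=//p" "$manifest"
  }
  # Sort the comma-separated entries so that ordering does not matter.
  normalize() { tr ',' '\n' | sort | tr '\n' ','; }
  have="$(flag_value initial-cluster | normalize)"
  want="$(printf '%s\n' "$expected" | normalize)"
  state="$(flag_value initial-cluster-state)"
  [ "$have" = "$want" ] && [ "$state" = "existing" ]
)
```

Run it against the manifest while it is still in /tmp, for example `manifest_matches_cluster /tmp/etcd.yaml "$ETCD_INITIAL_CLUSTER" || echo "fix etcd.yaml first"`, and move the file back into the manifests directory only once the check passes.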
© liruilonger@gmail.com, 2018–2024. All rights reserved. Attribution-NonCommercial-ShareAlike (CC BY-NC-SA 4.0).