For production Kubernetes, a separate three-node etcd cluster running outside of the Kubernetes cluster is preferred. That in turn means a dedicated disaster recovery plan for etcd itself is required for production operation.
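As a minimal sketch, assuming the external etcd members listen at 10.0.10.16, 10.0.10.17 and 10.0.10.18 (the last two addresses and the certificate paths below are made up for illustration), the API servers would point at the external cluster with flags like these:
# kube-apiserver flags pointing the control plane at an external etcd cluster
kube-apiserver \
  --etcd-servers=https://10.0.10.16:2379,https://10.0.10.17:2379,https://10.0.10.18:2379 \
  --etcd-cafile=/etc/ssl/etcd/ssl/ca.pem \
  --etcd-certfile=/etc/ssl/etcd/ssl/apiserver-etcd-client.pem \
  --etcd-keyfile=/etc/ssl/etcd/ssl/apiserver-etcd-client-key.pem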
TL;DR
Backup etcd in a Kubernetes cluster
Log in to a node where etcdctl is installed and run the snapshot command with the SSL client certificate and key:
sudo etcdctl --key /etc/ssl/etcd/ssl/admin-node1-key.pem --cert /etc/ssl/etcd/ssl/admin-node1.pem snapshot save snapshot.db
The output looks like the following screenshot
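A quick way to sanity-check the file before copying it off the node is snapshot status. The command below only reads the local snapshot file, so no client certificates are needed (assuming the same etcdctl version as above):
sudo etcdctl snapshot status snapshot.db --write-out=table
It prints the hash, revision, total key count and size of the snapshot.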
Restore etcd
Restoring is quite straightforward:
sudo etcdctl --key /etc/ssl/etcd/ssl/admin-node1-key.pem --cert /etc/ssl/etcd/ssl/admin-node1.pem snapshot restore snapshot.db
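Note that by default this writes a new data directory (default.etcd) in the current working directory. A rough sketch of restoring into a specific directory and re-seeding a member's identity for a three-node cluster could look like the following; the member name, data directory and peer URLs are assumptions for illustration, and snapshot restore only reads the local file, so the client certificates are not strictly needed:
# restore the snapshot for one member of a three-node cluster (illustrative values)
sudo etcdctl snapshot restore snapshot.db \
  --data-dir /var/lib/etcd-restored \
  --name node1 \
  --initial-cluster node1=https://10.0.10.16:2380,node2=https://10.0.10.17:2380,node3=https://10.0.10.18:2380 \
  --initial-advertise-peer-urls https://10.0.10.16:2380
Each member would be restored from the same snapshot with its own --name, --initial-advertise-peer-urls and --data-dir, and etcd is then restarted against the new data directory.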
Other Commands for Operation
- Display the cluster status
sudo etcdctl --key /etc/ssl/etcd/ssl/admin-node1-key.pem --cert /etc/ssl/etcd/ssl/admin-node1.pem endpoint status --cluster --write-out=table
The following is the expected output. You can see that the endpoint 10.0.10.16 is currently the leader of the etcd cluster.
- List members
sudo etcdctl --key /etc/ssl/etcd/ssl/admin-node1-key.pem --cert /etc/ssl/etcd/ssl/admin-node1.pem member list
The output is quite similar to that of the cluster status command.
- Display endpoint health status
sudo etcdctl --key /etc/ssl/etcd/ssl/admin-node1-key.pem --cert /etc/ssl/etcd/ssl/admin-node1.pem endpoint health
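By default endpoint health only probes the endpoints etcdctl is configured to talk to; adding the --cluster flag should check every member discovered from the cluster itself:
sudo etcdctl --key /etc/ssl/etcd/ssl/admin-node1-key.pem --cert /etc/ssl/etcd/ssl/admin-node1.pem endpoint health --cluster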
Reference > https://kubernetes.io/docs/tasks/administer-cluster/configure-upgrade-etcd/