Time estimate: 45 minutes
In this section, we will learn how to tear down and reinstall a Kubernetes cluster. We will also work with calicoctl and etcdctl for examining our Calico and etcd resources.
Chapter Details |
---|---
Chapter Goal | Kubernetes cluster tear down and reinstallation
Chapter Sections |
Step 1 Find the Calico CNI pods:
$ kubectl -n kube-system get pod -o wide -l k8s-app=calico-node
NAME READY STATUS RESTARTS AGE IP NODE ...
calico-node-9qmdk 1/1 Running 0 2d 172.16.1.90 node1
calico-node-bscn7 1/1 Running 0 2d 172.16.1.56 node2
calico-node-zfmx4 1/1 Running 0 2d 172.16.1.41 master
Step 2 Investigate a calico-node pod. Note the use of an init container in the output (substitute one of the pod names from Step 1 for calico-node-xxxxx):
$ kubectl -n kube-system describe pod calico-node-xxxxx
Name: calico-node-zfmx4
Namespace: kube-system
....
Node: master/172.16.1.41
Start Time: Tue, 11 Mar 2020 16:51:49 +0000
Labels: controller-revision-hash=2178918083
k8s-app=calico-node
pod-template-generation=1
Annotations: scheduler.alpha.kubernetes.io/critical-pod=
Status: Running
IP: 172.16.1.41
Controlled By: DaemonSet/calico-node
Init Containers:
install-cni:
Container ID: docker://94253d62fd04282b54b72d72a0e3f9544910517342245fe7bfc47c4222e40137
Image: calico/cni:v3.13.1
Image ID: docker-pullable://calico/cni@sha256:c699d5ec4d0799ca5785e9134cfb1f55a1376ebdbb607f5601394736fceef7c8
Port: <none>
Host Port: <none>
Command:
/install-cni.sh
State: Terminated
Reason: Completed
Exit Code: 0
Started: Fri, 11 Mar 2020 21:08:41 +0000
Finished: Fri, 11 Mar 2020 21:08:41 +0000
Ready: True
Restart Count: 0
Environment:
CNI_CONF_NAME: 10-calico.conflist
CNI_NETWORK_CONFIG: <set to the key 'cni_network_config' of config map 'calico-config'> Optional: false
KUBERNETES_NODE_NAME: (v1:spec.nodeName)
CNI_MTU: <set to the key 'veth_mtu' of config map 'calico-config'> Optional: false
SLEEP: false
Mounts:
/host/etc/cni/net.d from cni-net-dir (rw)
/host/opt/cni/bin from cni-bin-dir (rw)
/var/run/secrets/kubernetes.io/serviceaccount from calico-node-token-zfmx4 (ro)
Containers:
calico-node:
Container ID: docker://b8984f13d50a367af13665755e16c5c1774b289848b20e49111025ee2197e6be
Image: calico/node:v3.13.1
Image ID: docker-pullable://calico/node@sha256:f24c59e93881178bfae85ee1375889fe9399edf1e15b0026713b2870cef079be
Port: <none>
Host Port: <none>
State: Running
Started: Fri, 11 Mar 2020 21:08:42 +0000
Ready: True
Restart Count: 0
Requests:
cpu: 250m
Liveness: exec [/bin/calico-node -felix-live -bird-live] delay=10s timeout=1s period=10s #success=1 #failure=6
Readiness: exec [/bin/calico-node -felix-ready -bird-ready] delay=0s timeout=1s period=10s #success=1 #failure=3
Environment:
DATASTORE_TYPE: kubernetes
WAIT_FOR_DATASTORE: true
NODENAME: (v1:spec.nodeName)
CALICO_NETWORKING_BACKEND: <set to the key 'calico_backend' of config map 'calico-config'> Optional: false
CLUSTER_TYPE: k8s,bgp
IP: autodetect
CALICO_IPV4POOL_IPIP: Always
FELIX_IPINIPMTU: <set to the key 'veth_mtu' of config map 'calico-config'> Optional: false
CALICO_IPV4POOL_CIDR: 192.168.0.0/16
CALICO_DISABLE_FILE_LOGGING: true
FELIX_DEFAULTENDPOINTTOHOSTACTION: ACCEPT
FELIX_IPV6SUPPORT: false
FELIX_LOGSEVERITYSCREEN: info
FELIX_HEALTHENABLED: true
Mounts:
/lib/modules from lib-modules (ro)
/run/xtables.lock from xtables-lock (rw)
/var/lib/calico from var-lib-calico (rw)
/var/run/calico from var-run-calico (rw)
/var/run/nodeagent from policysync (rw)
/var/run/secrets/kubernetes.io/serviceaccount from calico-node-token-zfmx4 (ro)
Conditions:
....
Tolerations: :NoSchedule
:NoExecute
CriticalAddonsOnly
node.kubernetes.io/disk-pressure:NoSchedule
node.kubernetes.io/memory-pressure:NoSchedule
node.kubernetes.io/not-ready:NoExecute
node.kubernetes.io/unreachable:NoExecute
Events: <none>
Step 4 Take a look at the install-cni.sh script. Since the script is baked into the init container image, we will use Docker to start a throwaway container and print the end of the script to the terminal. Note the SLEEP option that keeps the script running so it can watch for updated secrets and refresh the pod's CNI configuration; because it runs here as an initContainer (with SLEEP set to false, as seen in Step 2), we would need to run it as a regular, long-running container to use that behavior:
$ docker run -it --rm calico/cni:v3.13.1 tail -20 /install-cni.sh
# Unless told otherwise, sleep forever.
# This prevents Kubernetes from restarting the pod repeatedly.
should_sleep=${SLEEP:-"true"}
echo "Done configuring CNI. Sleep=$should_sleep"
while [ "$should_sleep" == "true" ]; do
# Kubernetes Secrets can be updated. If so, we need to install the updated
# version to the host. Just check the timestamp on the certificate to see if it
# has been updated. A bit hokey, but likely good enough.
if [ -e ${SECRETS_MOUNT_DIR}/etcd-cert ];
then
stat_output=$(stat -c%y ${SECRETS_MOUNT_DIR}/etcd-cert 2>/dev/null)
sleep 10;
if [ "$stat_output" != "$(stat -c%y ${SECRETS_MOUNT_DIR}/etcd-cert 2>/dev/null)" ]; then
echo "Updating installed secrets at: $(date)"
cp -p ${SECRETS_MOUNT_DIR}/* /host/etc/cni/net.d/calico-tls/
fi
else
sleep 10
fi
done
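To see what the install script actually placed on the host, you can inspect the CNI configuration directory it populates (the directory comes from the cni-net-dir mount and the file name from CNI_CONF_NAME in Step 2; these are this lab's defaults and may differ in other setups):
$ ls /etc/cni/net.d/
$ sudo cat /etc/cni/net.d/10-calico.conflist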
Step 5 Check the processes running in the calico-node container, which is a good example of a complex, multi-process container:
$ kubectl -n kube-system exec calico-node-xxxxx -- sh -c "ps axf"
PID USER TIME COMMAND
1 root 0:00 /sbin/runsvdir -P /etc/service/enabled
69 root 0:00 runsv felix
70 root 0:00 runsv bird
71 root 0:00 runsv confd
72 root 0:00 runsv bird6
73 root 0:01 bird6 -R -s /var/run/calico/bird6.ctl -d -c /etc/calico/confd/config/bird6.cfg
74 root 0:17 confd -confdir=/etc/calico/confd
75 root 0:01 bird -R -s /var/run/calico/bird.ctl -d -c /etc/calico/confd/config/bird.cfg
76 root 2:29 calico-felix
2201 root 0:00 sh -c ps axf
2205 root 0:00 ps axf
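The liveness and readiness probes shown in Step 2 invoke the calico-node binary directly against these processes. You can run the same readiness check by hand; an exit code of 0 means Felix and BIRD report ready (again, substitute one of your pod names for calico-node-xxxxx):
$ kubectl -n kube-system exec calico-node-xxxxx -- /bin/calico-node -felix-ready -bird-ready
$ echo $?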
Step 6 Check the calico-node configuration files for felix, bird, and confd:
$ kubectl -n kube-system exec calico-node-xxxxx -- sh -c "ls -R /etc/calico"
/etc/calico:
confd felix.cfg
/etc/calico/confd:
conf.d config templates
/etc/calico/confd/conf.d:
bird.toml bird6_aggr.toml bird_aggr.toml tunl-ip.toml
bird6.toml bird6_ipam.toml bird_ipam.toml
/etc/calico/confd/config:
bird.cfg bird6_aggr.cfg bird_aggr.cfg
bird6.cfg bird6_ipam.cfg bird_ipam.cfg
....
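The files under /etc/calico/confd/config are rendered by confd from the templates directory. As a quick illustration, you can look at the beginning of the generated BGP configuration for this node (output will vary from node to node):
$ kubectl -n kube-system exec calico-node-xxxxx -- sh -c "head -20 /etc/calico/confd/config/bird.cfg"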
Step 7 Download the calicoctl client application:
$ curl -O -L https://github.com/projectcalico/calicoctl/releases/download/v3.13.1/calicoctl
$ chmod +x calicoctl
Step 8 Use calicoctl to interact with the Calico CNI:
$ export DATASTORE_TYPE=kubernetes
$ export KUBECONFIG=~/.kube/config
$ sudo -E ./calicoctl node status
Calico process is running.
IPv4 BGP status
+--------------+-------------------+-------+----------+-------------+
| PEER ADDRESS | PEER TYPE | STATE | SINCE | INFO |
+--------------+-------------------+-------+----------+-------------+
| 172.16.1.14 | node-to-node mesh | up | 17:25:07 | Established |
| 172.16.1.119 | node-to-node mesh | up | 17:25:35 | Established |
+--------------+-------------------+-------+----------+-------------+
IPv6 BGP status
No IPv6 peers found.
$ sudo -E ./calicoctl get workloadendpoints -n kube-system
NAMESPACE WORKLOAD NODE NETWORKS INTERFACE
kube-system calico-kube-controllers-788d6b9876-kcmd6 master 192.168.219.65/32 caliab5181af135
kube-system coredns-6955765f44-wjz2d master 192.168.219.66/32 calif23cafbf8fb
kube-system coredns-6955765f44-z5vbv master 192.168.219.67/32 cali0302f58d39d
Notes
calicoctl is namespace-aware, so try running ./calicoctl get workloadendpoints against a different namespace (e.g. sudo -E ./calicoctl get wep -n monitoring).
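calicoctl can also show cluster-wide resources. For example, the IP pool backing the CALICO_IPV4POOL_CIDR value seen in Step 2 can be listed as follows (the exact columns depend on the calicoctl version):
$ sudo -E ./calicoctl get ippool -o wide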
Step 1 Find etcd pods:
$ kubectl -n kube-system get pod -o wide -l component=etcd
NAME READY STATUS RESTARTS AGE IP NODE
etcd-master 1/1 Running 0 5d 172.16.1.182 master
Step 2 Copy the etcdctl client application that ships inside the etcd container image:
$ kubectl -n kube-system cp etcd-master:usr/local/bin/etcdctl ./etcdctl
$ chmod +x etcdctl
$ sudo cp etcdctl /usr/local/bin/etcdctl
Step 3 Running etcdctl against the cluster requires several TLS-related parameters. To simplify the commands, become root and define a shell alias:
$ sudo -i
# alias ctl8="ETCDCTL_API=3 etcdctl --endpoints=https://127.0.0.1:2379 --cert=/etc/kubernetes/pki/etcd/server.crt --key=/etc/kubernetes/pki/etcd/server.key --cacert=/etc/kubernetes/pki/etcd/ca.crt"
Step 4 Use etcdctl to manage etcd:
# ctl8 version
etcdctl version: 3.4.3
API version: 3.4
# ctl8 member list
a874c87fd42044f, started, master, https://127.0.0.1:2380, https://127.0.0.1:2379, false
# ctl8 endpoint status
https://127.0.0.1:2379, a874c87fd42044f, 3.3.3, 11 MB, true, false, 2, 7783047, 7783047
# ctl8 check perf
60 / 60 Boooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooo! 100.00%1m0s
PASS: Throughput is 151 writes/s
PASS: Slowest request took too long: 0.025225s
PASS: Stddev is 0.002121s
PASS
# ctl8 snapshot save snap1.etcdb
...
Snapshot saved at snap1.etcdb
# ctl8 --write-out=table snapshot status snap1.etcdb
+----------+----------+------------+------------+
| HASH | REVISION | TOTAL KEYS | TOTAL SIZE |
+----------+----------+------------+------------+
| bbfe515e | 6793301 | 19150 | 24 MB |
+----------+----------+------------+------------+
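Should you ever need to restore from this snapshot, etcdctl can rebuild an etcd data directory from it. A minimal sketch, not to be run against the live cluster, where /var/lib/etcd-restore is just an example target directory:
# ctl8 snapshot restore snap1.etcdb --data-dir /var/lib/etcd-restore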
Step 5 Use etcdctl to query etcd:
# ctl8 get --prefix --keys-only /registry/services
/registry/services/endpoints/default/kubernetes
/registry/services/endpoints/kube-system/calico-typha
/registry/services/endpoints/kube-system/kube-controller-manager
...
# ctl8 get --prefix --keys-only /registry/pods/kube-system
/registry/pods/kube-system/calico-node-9b2sz
/registry/pods/kube-system/calico-node-ggwrd
/registry/pods/kube-system/calico-node-w7zq6
...
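You can also fetch an individual key with its value, but keep in mind that Kubernetes stores objects in a binary protobuf encoding, so the value will not be fully human-readable. For example, using one of the keys from the listing above:
# ctl8 get /registry/services/endpoints/default/kubernetes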
Step 6 Log out from the root user before moving on to the next step:
# logout
stack@master:~$
Tearing down and reinstalling the cluster involves all of the nodes in your cluster.
Step 1 Before proceeding further, record the IP addresses of the nodes in your cluster and make sure you can SSH to each one as follows:
$ for node in node1 node2; do
ssh $node uptime
done
Step 2 For each worker node in the cluster, drain the node, delete it, and reset it:
$ for node in node1 node2; do
kubectl drain $node --delete-local-data --force --ignore-daemonsets
kubectl delete node $node
ssh $node sudo kubeadm reset
ssh $node sudo rm -f /etc/cni/net.d/*
done
node/node1 cordoned
WARNING: Ignoring DaemonSet-managed pods: calico-node-zqfxt, kube-proxy-4t4rr
node "node1" deleted
[reset] WARNING: changes made to this host by 'kubeadm init' or 'kubeadm join' will be reverted.
[reset] are you sure you want to proceed? [y/N]: y
[preflight] running pre-flight checks
[reset] stopping the kubelet service
[reset] unmounting mounted directories in "/var/lib/kubelet"
[reset] removing kubernetes-managed containers
[reset] cleaning up running containers using crictl with socket /var/run/dockershim.sock
[reset] failed to list running pods using crictl: exit status 1. Trying to use docker instead
[reset] no etcd manifest found in "/etc/kubernetes/manifests/etcd.yaml". Assuming external etcd
[reset] deleting contents of stateful directories: [/var/lib/kubelet /etc/cni/net.d /var/lib/dockershim /var/run/kubernetes]
[reset] deleting contents of config directories: [/etc/kubernetes/manifests /etc/kubernetes/pki]
[reset] deleting files: [/etc/kubernetes/admin.conf /etc/kubernetes/kubelet.conf /etc/kubernetes/bootstrap-kubelet.conf /etc/kubernetes/controller-manager.conf /etc/kubernetes/scheduler.conf]
....
Step 3 Check for available nodes:
$ kubectl get node
NAME STATUS ROLES AGE VERSION
master Ready master 1d v1.17.4
Step 4 Drain, delete, and reset the master node:
$ kubectl drain master --delete-local-data --force --ignore-daemonsets
node/master cordoned
WARNING: Ignoring DaemonSet-managed pods: calico-node-kj7w6, kube-proxy-p5frg; Deleting pods with local storage: metrics-server-5c4945fb9f-kmsbn
pod/metrics-server-5c4945fb9f-kmsbn evicted
pod/coredns-78fcdf6894-cmnpk evicted
pod/coredns-78fcdf6894-gsbjl evicted
node/master evicted
If the above command does not complete because it gets stuck evicting the calico Pod, you can safely terminate it with ^C and continue with:
$ kubectl delete node master
node "master" deleted
Reset the master node:
$ sudo kubeadm reset
[reset] WARNING: changes made to this host by 'kubeadm init' or 'kubeadm join' will be reverted.
[reset] are you sure you want to proceed? [y/N]: y
[preflight] running pre-flight checks
[reset] stopping the kubelet service
[reset] unmounting mounted directories in "/var/lib/kubelet"
[reset] removing kubernetes-managed containers
[reset] cleaning up running containers using crictl with socket /var/run/dockershim.sock
[reset] failed to list running pods using crictl: exit status 1. Trying to use docker instead
[reset] deleting contents of stateful directories: [/var/lib/kubelet /etc/cni/net.d /var/lib/dockershim /var/run/kubernetes /var/lib/etcd]
[reset] deleting contents of config directories: [/etc/kubernetes/manifests /etc/kubernetes/pki]
[reset] deleting files: [/etc/kubernetes/admin.conf /etc/kubernetes/kubelet.conf /etc/kubernetes/bootstrap-kubelet.conf /etc/kubernetes/controller-manager.conf /etc/kubernetes/scheduler.conf]
Step 5 Remove the kubectl configuration, cached files, and any leftover CNI configuration:
$ sudo rm -rf ~/.kube/*
$ sudo rm -f /etc/cni/net.d/*
Step 1 Before installing a cluster with kubeadm, make sure that each node has docker and kubeadm:
$ docker version
$ kubeadm version
$ for node in node1 node2; do
ssh $node "docker version && kubeadm version"
done
Also, there should be no cluster running on the nodes already (see 10.3. Tear Down the Cluster).
Step 2 Set the pod network CIDR for the CNI plugin you plan to install, Calico or Flannel.
For Calico:
$ export POD_NETWORK="192.168.0.0/16"
For Flannel:
$ export POD_NETWORK="10.244.0.0/16"
Step 3 On the master node (with $PrivateIP set to the master node's private IP address):
$ sudo kubeadm init --kubernetes-version v1.17.4 \
--apiserver-advertise-address $PrivateIP \
--pod-network-cidr $POD_NETWORK
...
Your Kubernetes master has initialized successfully!
To start using your cluster, you need to run the following as a regular user:
mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config
You should now deploy a pod network to the cluster.
Run "kubectl apply -f [podnetwork].yaml" with one of the options listed at:
https://kubernetes.io/docs/concepts/cluster-administration/addons/
You can now join any number of machines by running the following on each node
as root:
kubeadm join 172.16.1.43:6443 --token bx8ny9.61p0sedk22w6qfev --discovery-token-ca-cert-hash sha256:b1101068444867fcc00fd612474bdec560cc67870e3fa37115356c7ec6435369
Record the token, the --discovery-token-ca-cert-hash value, and your API server IP address and port:
$ token=$(sudo kubeadm token list | awk 'NR > 1 {print $1}')
$ discoveryhash=sha256:xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
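If you no longer have the kubeadm join line at hand, the discovery hash can be recomputed from the cluster CA certificate; the approach documented for kubeadm is roughly:
$ discoveryhash=sha256:$(openssl x509 -pubkey -in /etc/kubernetes/pki/ca.crt | \
    openssl rsa -pubin -outform der 2>/dev/null | \
    openssl dgst -sha256 -hex | sed 's/^.* //')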
Step 4 Follow the instructions in the output from the previous step:
$ sudo cp -i /etc/kubernetes/admin.conf ~/.kube/config
$ sudo chown stack:stack ~/.kube/config
Step 5 Apply the Calico or Flannel CNI plugin for the Kubernetes cluster.
For Calico execute:
stack@master:~$ kubectl apply -f ~/k8s-examples/addons/calico/kube-calico.yaml
configmap/calico-config created
customresourcedefinition.apiextensions.k8s.io/bgpconfigurations.crd.projectcalico.org created
customresourcedefinition.apiextensions.k8s.io/bgppeers.crd.projectcalico.org created
customresourcedefinition.apiextensions.k8s.io/blockaffinities.crd.projectcalico.org created
customresourcedefinition.apiextensions.k8s.io/clusterinformations.crd.projectcalico.org created
customresourcedefinition.apiextensions.k8s.io/felixconfigurations.crd.projectcalico.org created
customresourcedefinition.apiextensions.k8s.io/globalnetworkpolicies.crd.projectcalico.org created
customresourcedefinition.apiextensions.k8s.io/globalnetworksets.crd.projectcalico.org created
customresourcedefinition.apiextensions.k8s.io/hostendpoints.crd.projectcalico.org created
customresourcedefinition.apiextensions.k8s.io/ipamblocks.crd.projectcalico.org created
customresourcedefinition.apiextensions.k8s.io/ipamconfigs.crd.projectcalico.org created
customresourcedefinition.apiextensions.k8s.io/ipamhandles.crd.projectcalico.org created
customresourcedefinition.apiextensions.k8s.io/ippools.crd.projectcalico.org created
For Flannel execute:
$ kubectl apply -f ~/k8s-examples/addons/flannel/kube-flannel.yaml
podsecuritypolicy.policy/psp.flannel.unprivileged created
clusterrole.rbac.authorization.k8s.io/flannel created
clusterrolebinding.rbac.authorization.k8s.io/flannel created
serviceaccount/flannel created
configmap/kube-flannel-cfg created
daemonset.apps/kube-flannel-ds-amd64 created
daemonset.apps/kube-flannel-ds-arm64 created
daemonset.apps/kube-flannel-ds-arm created
daemonset.apps/kube-flannel-ds-ppc64le created
daemonset.apps/kube-flannel-ds-s390x created
Step 6 Test your master node control plane (the output below is for Flannel):
$ kubectl -n kube-system get pod
NAME READY STATUS RESTARTS AGE
coredns-58b5ccf64b-swwp5 1/1 Running 0 19m
coredns-58b5ccf64b-x6g9w 1/1 Running 0 19m
etcd-master 1/1 Running 0 18m
kube-apiserver-master 1/1 Running 0 18m
kube-controller-manager-master 1/1 Running 0 18m
kube-flannel-ds-amd64-jh5hh 1/1 Running 0 3m9s
kube-flannel-ds-amd64-qlpth 1/1 Running 0 3m11s
kube-flannel-ds-amd64-rls7k 1/1 Running 0 4m44s
kube-proxy-mfg8j 1/1 Running 0 19m
kube-proxy-wjbdn 1/1 Running 0 3m9s
kube-proxy-zx2tl 1/1 Running 0 3m11s
kube-scheduler-master 1/1 Running 0 18m
Step 7 Add node1 and node2 to the cluster using the master’s IP address and token:
$ for node in node1 node2 ; do
ssh $node sudo kubeadm join ${PrivateIP}:6443 --token $token --discovery-token-ca-cert-hash $discoveryhash
done
[preflight] running pre-flight checks
I0909 17:04:37.833965 29232 kernel_validator.go:81] Validating kernel version
I0909 17:04:37.834034 29232 kernel_validator.go:96] Validating kernel config
[discovery] Trying to connect to API Server "172.16.1.43:6443"
[discovery] Created cluster-info discovery client, requesting info from "https://172.16.1.43:6443"
[discovery] Requesting info from "https://172.16.1.43:6443" again to validate TLS against the pinned public key
[discovery] Cluster info signature and contents are valid and TLS certificate validates against pinned roots, will use API Server "172.16.1.43:6443"
[discovery] Successfully established connection with API Server "172.16.1.43:6443"
[kubelet] Downloading configuration for the kubelet from the "kubelet-config-1.11" ConfigMap in the kube-system namespace
[kubelet] Writing kubelet configuration to file "/var/lib/kubelet/config.yaml"
[kubelet] Writing kubelet environment file with flags to file "/var/lib/kubelet/kubeadm-flags.env"
[preflight] Activating the kubelet service
[tlsbootstrap] Waiting for the kubelet to perform the TLS Bootstrap...
[patchnode] Uploading the CRI Socket information "/var/run/dockershim.sock" to the Node API object "node1" as an annotation
This node has joined the cluster:
* Certificate signing request was sent to master and a response
was received.
* The Kubelet was informed of the new secure connection details.
Run 'kubectl get nodes' on the master to see this node join the cluster.
....
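As the join output suggests, confirm from the master that both workers have registered and eventually reach the Ready state (it can take a minute for the CNI pods to start on the new nodes):
$ kubectl get node -o wide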
Step 8 Test your cluster by creating a Deployment and testing the Pod endpoints:
$ cat <<EOF | kubectl create -f -
apiVersion: apps/v1
kind: Deployment
metadata:
name: echoserver
spec:
replicas: 2
selector:
matchLabels:
app: echoserver
template:
metadata:
labels:
app: echoserver
spec:
containers:
- name: echoserver
image: k8s.gcr.io/echoserver:1.6
ports:
- containerPort: 8080
EOF
deployment.apps/echoserver created
$ kubectl get pod -o wide
NAME READY STATUS RESTARTS AGE IP NODE ...
echoserver-5d5c779b47-7zvnq 1/1 Running 0 39s 10.244.2.3 node2
echoserver-5d5c779b47-l7prm 1/1 Running 0 39s 10.244.1.3 node1
$ curl 10.244.2.3:8080
Hostname: echoserver-5d5c779b47-7zvnq
Pod Information:
....
$ curl 10.244.1.3:8080
Hostname: echoserver-5d5c779b47-l7prm
Pod Information:
....
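The curl commands above hit the pod IPs directly, which exercises only the pod network. As an optional extra check, not part of the original steps, you can place a Service in front of the Deployment and curl its ClusterIP from the master:
$ kubectl expose deployment echoserver --port 8080
$ curl $(kubectl get svc echoserver -o jsonpath='{.spec.clusterIP}'):8080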
Congratulations! You have successfully reinstalled your cluster with the network plugin of your choice!