In this chapter, we will learn how to tear down and reinstall a Kubernetes cluster.
| Chapter Details | |
| --- | --- |
| Chapter Goal | Kubernetes cluster tear down and reinstallation |
| Chapter Sections | |
Step 1 Find the Calico CNI pods:
$ kubectl -n kube-system get pod -o wide -l k8s-app=calico-node
NAME READY STATUS RESTARTS AGE IP NODE
calico-node-9qmdk 2/2 Running 0 2d 172.16.1.90 node1
calico-node-bscn7 2/2 Running 0 2d 172.16.1.56 node2
calico-node-zfmx4 2/2 Running 0 2d 172.16.1.41 master
Step 2 Investigate the calico-node pod, which comprises two containers:
$ kubectl -n kube-system describe pod calico-node-xxxxx
Name: calico-node-zfmx4
Namespace: kube-system
....
Node: master/172.16.1.41
Start Time: Tue, 11 Sep 2018 16:51:49 +0000
Labels: controller-revision-hash=2178918083
k8s-app=calico-node
pod-template-generation=1
Annotations: scheduler.alpha.kubernetes.io/critical-pod=
Status: Running
IP: 172.16.1.41
Controlled By: DaemonSet/calico-node
Containers:
calico-node:
Container ID: docker://b1e9ff232809d730791e34ade1ac63377335a45f93044858aabb1b2a0e42a00e
Image: quay.io/calico/node:v3.1.3
....
Requests:
cpu: 250m
Liveness: http-get http://:9099/liveness delay=10s timeout=1s period=10s #success=1 #failure=6
Readiness: http-get http://:9099/readiness delay=0s timeout=1s period=10s #success=1 #failure=3
Environment:
DATASTORE_TYPE: kubernetes
FELIX_LOGSEVERITYSCREEN: info
CLUSTER_TYPE: k8s,bgp
....
Mounts:
/lib/modules from lib-modules (ro)
/var/lib/calico from var-lib-calico (rw)
/var/run/calico from var-run-calico (rw)
/var/run/secrets/kubernetes.io/serviceaccount from calico-node-token-t6zwv (ro)
install-cni:
Container ID: docker://edd0926e4884c33bca135c55290ae1375f77134ab7bc567d0ac2f0205532e423
Image: quay.io/calico/cni:v3.1.3
....
Command:
/install-cni.sh
....
Environment:
CNI_CONF_NAME: 10-calico.conflist
....
Mounts:
/host/etc/cni/net.d from cni-net-dir (rw)
/host/opt/cni/bin from cni-bin-dir (rw)
/var/run/secrets/kubernetes.io/serviceaccount from calico-node-token-t6zwv (ro)
Conditions:
....
Tolerations: :NoSchedule
:NoExecute
CriticalAddonsOnly
node.kubernetes.io/disk-pressure:NoSchedule
node.kubernetes.io/memory-pressure:NoSchedule
node.kubernetes.io/not-ready:NoExecute
node.kubernetes.io/unreachable:NoExecute
Events: <none>
Step 3 Check the processes running in the install-cni container:
$ kubectl -n kube-system exec calico-node-xxxxx -c install-cni -- sh -c "ps axf"
PID USER TIME COMMAND
1 root 0:00 {install-cni.sh} /bin/sh /install-cni.sh
1467 root 0:00 sleep 10
1468 root 0:00 ps axf
Step 4 Take a look at the install-cni.sh script. Observe how it watches the mounted Kubernetes Secrets and copies any updates to the host:
$ kubectl -n kube-system exec calico-node-xxxxx -c install-cni -- sh -c "tail -20 /install-cni.sh"
# Unless told otherwise, sleep forever.
# This prevents Kubernetes from restarting the pod repeatedly.
should_sleep=${SLEEP:-"true"}
echo "Done configuring CNI. Sleep=$should_sleep"
while [ "$should_sleep" == "true" ]; do
# Kubernetes Secrets can be updated. If so, we need to install the updated
# version to the host. Just check the timestamp on the certificate to see if it
# has been updated. A bit hokey, but likely good enough.
if [ -e ${SECRETS_MOUNT_DIR}/etcd-cert ];
then
stat_output=$(stat -c%y ${SECRETS_MOUNT_DIR}/etcd-cert 2>/dev/null)
sleep 10;
if [ "$stat_output" != "$(stat -c%y ${SECRETS_MOUNT_DIR}/etcd-cert 2>/dev/null)" ]; then
echo "Updating installed secrets at: $(date)"
cp -p ${SECRETS_MOUNT_DIR}/* /host/etc/cni/net.d/calico-tls/
fi
else
sleep 10
fi
done
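The script installs the CNI configuration file named by CNI_CONF_NAME (10-calico.conflist in Step 2) into the host's /etc/cni/net.d directory, which the container mounts as /host/etc/cni/net.d. To see the generated configuration, you can read it through the same container (if the file name differs in your cluster, use the CNI_CONF_NAME value from Step 2):
$ kubectl -n kube-system exec calico-node-xxxxx -c install-cni -- cat /host/etc/cni/net.d/10-calico.conflist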
Step 5 Check the processes running in the calico-node container, which is a good example of a complex, multi-process container:
$ kubectl -n kube-system exec calico-node-xxxxx -c calico-node -- sh -c "ps axf"
PID USER TIME COMMAND
1 root 0:00 /sbin/runsvdir -P /etc/service/enabled
69 root 0:00 runsv felix
70 root 0:00 runsv bird
71 root 0:00 runsv confd
72 root 0:00 runsv bird6
73 root 0:01 bird6 -R -s /var/run/calico/bird6.ctl -d -c /etc/calico/confd/config/bird6.cfg
74 root 0:17 confd -confdir=/etc/calico/confd
75 root 0:01 bird -R -s /var/run/calico/bird.ctl -d -c /etc/calico/confd/config/bird.cfg
76 root 2:29 calico-felix
2201 root 0:00 sh -c ps axf
2205 root 0:00 ps axf
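The liveness and readiness probes shown in Step 2 target port 9099, where calico-node reports its health. Because the pod uses host networking, you can query that endpoint directly from the node, assuming curl is installed there (depending on which interface the health endpoint binds to, you may need the node's IP address instead of localhost); any 2xx status code means the node reports ready:
$ curl -s -o /dev/null -w "%{http_code}\n" http://localhost:9099/readiness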
Step 6 Check the calico-node configuration files for felix, bird, and confd:
$ kubectl -n kube-system exec calico-node-xxxxx -it -c calico-node -- sh -c "ls -R /etc/calico"
/etc/calico:
confd felix.cfg
/etc/calico/confd:
conf.d config templates
/etc/calico/confd/conf.d:
bird.toml bird6_aggr.toml bird_aggr.toml tunl-ip.toml
bird6.toml bird6_ipam.toml bird_ipam.toml
/etc/calico/confd/config:
bird.cfg bird6_aggr.cfg bird_aggr.cfg
bird6.cfg bird6_ipam.cfg bird_ipam.cfg
....
Step 7 Download the calicoctl client application:
$ curl -O -L https://github.com/projectcalico/calicoctl/releases/download/v3.1.3/calicoctl
$ chmod +x calicoctl
Step 8 Use calicoctl to interact with the Calico CNI:
$ export DATASTORE_TYPE=kubernetes
$ export KUBECONFIG=~/.kube/config
$ sudo -E ./calicoctl node status
Calico process is running.
IPv4 BGP status
+--------------+-------------------+-------+----------+-------------+
| PEER ADDRESS | PEER TYPE | STATE | SINCE | INFO |
+--------------+-------------------+-------+----------+-------------+
| 172.16.1.14 | node-to-node mesh | up | 17:25:07 | Established |
| 172.16.1.119 | node-to-node mesh | up | 17:25:35 | Established |
+--------------+-------------------+-------+----------+-------------+
IPv6 BGP status
No IPv6 peers found.
$ sudo -E ./calicoctl get workloadendpoints
WORKLOAD NODE NETWORKS INTERFACE
php-apache-55b8c5f78f-xpp5b node1 192.168.2.3/32 cali2c0e9a68ed4
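calicoctl can also list the IP pools that Calico assigns pod addresses from, which is a quick way to confirm that the pool matches the pod network CIDR the cluster was initialized with:
$ sudo -E ./calicoctl get ippool -o wide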
Step 1 Find etcd pods:
$ kubectl -n kube-system get pod -o wide -l component=etcd
NAME READY STATUS RESTARTS AGE IP NODE
etcd-master 1/1 Running 0 50d 172.16.1.182 master
Step 2 Copy the etcdctl client binary shipped with the etcd container image:
$ kubectl -n kube-system cp etcd-master:/usr/local/bin/etcdctl-3.2.18 .
$ chmod +x etcdctl-3.2.18
$ sudo cp etcdctl-3.2.18 /usr/local/bin/etcdctl
Step 3 The etcdctl command requires many parameters. To simplify its use, define a shell alias for interacting with etcd:
$ sudo -i
# alias ctl8="ETCDCTL_API=3 etcdctl --endpoints=https://127.0.0.1:2379 --cert=/etc/kubernetes/pki/etcd/server.crt --key=/etc/kubernetes/pki/etcd/server.key --cacert=/etc/kubernetes/pki/etcd/ca.crt"
Step 4 Use etcdctl to manage etcd:
# ctl8 version
etcdctl version: 3.2.18
# ctl8 member list
a874c87fd42044f, started, master, https://127.0.0.1:2380, https://127.0.0.1:2379
# ctl8 endpoint status
https://127.0.0.1:2379, a874c87fd42044f, 3.2.18, 2.4 MB, true, 2, 7783047
# ctl8 check perf
60 / 60 Boooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooo! 100.00%1m0s
PASS: Throughput is 151 writes/s
Slowest request took too long: 0.549516s
PASS: Stddev is 0.050280s
FAIL
# ctl8 snapshot save snap1.etcdb
Snapshot saved at snap1.etcdb
# ctl8 --write-out=table snapshot status snap1.etcdb
+----------+----------+------------+------------+
| HASH | REVISION | TOTAL KEYS | TOTAL SIZE |
+----------+----------+------------+------------+
| bbfe515e | 6793301 | 19150 | 24 MB |
+----------+----------+------------+------------+
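The saved snapshot can later be restored into a fresh data directory. A minimal sketch, using the snap1.etcdb file from above and an example target directory (on a kubeadm cluster you would additionally have to point the etcd static pod manifest at the restored directory, which is beyond the scope of this step):
# ETCDCTL_API=3 etcdctl snapshot restore snap1.etcdb --data-dir /var/lib/etcd-restore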
Step 5 Use etcdctl to query etcd:
# ctl8 get --prefix --keys-only /registry/services
/registry/services/endpoints/default/kubernetes
/registry/services/endpoints/kube-system/calico-typha
/registry/services/endpoints/kube-system/kube-controller-manager
...
# ctl8 get --prefix --keys-only /registry/pods/kube-system
/registry/pods/kube-system/calico-node-9b2sz
/registry/pods/kube-system/calico-node-ggwrd
/registry/pods/kube-system/calico-node-w7zq6
...
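Note that only the keys are readily readable: Kubernetes stores the object values in a binary protobuf encoding. For a rough count of all objects in the registry, you can combine the ctl8 alias with grep:
# ctl8 get --prefix --keys-only /registry | grep -c "^/registry"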
Tear down and reinstallation involve all the nodes in your cluster.
Step 1 Before proceeding further, record the IP addresses of the nodes in your cluster and make sure you can SSH to each one, as follows:
$ for node in node1 node2; do
ssh $node uptime
done
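A convenient way to record the node names and IP addresses before tearing anything down is the wide output of kubectl:
$ kubectl get node -o wide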
Step 2 For each worker node in the cluster, drain the node, delete it, and reset it:
$ for node in node1 node2; do
kubectl drain $node --delete-local-data --force --ignore-daemonsets
kubectl delete node $node
ssh $node sudo kubeadm reset
done
node/node1 cordoned
WARNING: Ignoring DaemonSet-managed pods: calico-node-zqfxt, kube-proxy-4t4rr
node "node1" deleted
[reset] WARNING: changes made to this host by 'kubeadm init' or 'kubeadm join' will be reverted.
[reset] are you sure you want to proceed? [y/N]: y
[preflight] running pre-flight checks
[reset] stopping the kubelet service
[reset] unmounting mounted directories in "/var/lib/kubelet"
[reset] removing kubernetes-managed containers
[reset] cleaning up running containers using crictl with socket /var/run/dockershim.sock
[reset] failed to list running pods using crictl: exit status 1. Trying to use docker instead
[reset] no etcd manifest found in "/etc/kubernetes/manifests/etcd.yaml". Assuming external etcd
[reset] deleting contents of stateful directories: [/var/lib/kubelet /etc/cni/net.d /var/lib/dockershim /var/run/kubernetes]
[reset] deleting contents of config directories: [/etc/kubernetes/manifests /etc/kubernetes/pki]
[reset] deleting files: [/etc/kubernetes/admin.conf /etc/kubernetes/kubelet.conf /etc/kubernetes/bootstrap-kubelet.conf /etc/kubernetes/controller-manager.conf /etc/kubernetes/scheduler.conf]
....
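Note that kubeadm reset does not flush the iptables or IPVS rules created by kube-proxy and the CNI plugin. If you plan to switch CNI plugins when you reinstall, you may also want to clean these up; the following sketch removes all iptables rules on each worker, so use it only on machines dedicated to the cluster:
$ for node in node1 node2; do
ssh $node "sudo iptables -F && sudo iptables -t nat -F && sudo iptables -t mangle -F && sudo iptables -X"
done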
Step 3 Check for available nodes:
$ kubectl get node
NAME STATUS ROLES AGE VERSION
master Ready master 1d v1.11.1
Step 4 Drain, delete, and reset the master node:
$ kubectl drain master --delete-local-data --force --ignore-daemonsets
node/master cordoned
WARNING: Ignoring DaemonSet-managed pods: calico-node-kj7w6, kube-proxy-p5frg; Deleting pods with local storage: metrics-server-5c4945fb9f-kmsbn
pod/metrics-server-5c4945fb9f-kmsbn evicted
pod/coredns-78fcdf6894-cmnpk evicted
pod/coredns-78fcdf6894-gsbjl evicted
If the above command doesn’t complete because of a Calico pod eviction, you can safely terminate it with ^C and continue with:
$ kubectl delete node master
node "master" deleted
$ sudo kubeadm reset
[reset] WARNING: changes made to this host by 'kubeadm init' or 'kubeadm join' will be reverted.
[reset] are you sure you want to proceed? [y/N]: y
[preflight] running pre-flight checks
[reset] stopping the kubelet service
[reset] unmounting mounted directories in "/var/lib/kubelet"
[reset] removing kubernetes-managed containers
[reset] cleaning up running containers using crictl with socket /var/run/dockershim.sock
[reset] failed to list running pods using crictl: exit status 1. Trying to use docker instead
[reset] deleting contents of stateful directories: [/var/lib/kubelet /etc/cni/net.d /var/lib/dockershim /var/run/kubernetes /var/lib/etcd]
[reset] deleting contents of config directories: [/etc/kubernetes/manifests /etc/kubernetes/pki]
[reset] deleting files: [/etc/kubernetes/admin.conf /etc/kubernetes/kubelet.conf /etc/kubernetes/bootstrap-kubelet.conf /etc/kubernetes/controller-manager.conf /etc/kubernetes/scheduler.conf]
Step 5 Remove kubectl configuration and any cache files:
$ sudo rm -rf ~/.kube/*
Step 1 Before installing a cluster with kubeadm, make sure that each node has Docker and kubeadm installed:
$ docker version
$ kubeadm version
$ for node in node1 node2; do
ssh $node "docker version && kubeadm version"
done
Also, no cluster should already be running on the nodes (see 13.3. Tear Down the Cluster).
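A quick check on each node is to verify that no static pod manifests remain and that kubectl can no longer reach an API server:
$ ls /etc/kubernetes/manifests/ 2>/dev/null
$ kubectl get node
The first command should list no files, and the second should fail to connect.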
Step 2 Set the pod network CIDR required by your chosen CNI plugin, Calico or Flannel.
For Calico:
$ export POD_NETWORK="192.168.0.0/16"
For Flannel:
$ export POD_NETWORK="10.244.0.0/16"
Step 3 On the master node, initialize the cluster ($PrivateIP is the master's private IP address):
$ sudo kubeadm init --kubernetes-version v1.11.1 \
--apiserver-advertise-address $PrivateIP \
--pod-network-cidr $POD_NETWORK
...
Your Kubernetes master has initialized successfully!
To start using your cluster, you need to run the following as a regular user:
mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config
You should now deploy a pod network to the cluster.
Run "kubectl apply -f [podnetwork].yaml" with one of the options listed at:
https://kubernetes.io/docs/concepts/cluster-administration/addons/
You can now join any number of machines by running the following on each node
as root:
kubeadm join 172.16.1.43:6443 --token bx8ny9.61p0sedk22w6qfev --discovery-token-ca-cert-hash sha256:b1101068444867fcc00fd612474bdec560cc67870e3fa37115356c7ec6435369
Record the token value, the --discovery-token-ca-cert-hash value, and your API server IP address and port:
$ token=$(sudo kubeadm token list | awk 'NR > 1 {print $1}')
$ discoveryhash=sha256:xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
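If you did not record the full kubeadm join command, the discovery hash can be derived from the cluster CA certificate on the master with openssl:
$ discoveryhash=sha256:$(openssl x509 -pubkey -in /etc/kubernetes/pki/ca.crt | \
openssl rsa -pubin -outform der 2>/dev/null | openssl dgst -sha256 -hex | sed 's/^.* //')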
Step 4 Follow the instructions in the output from the previous step:
$ sudo cp -i /etc/kubernetes/admin.conf ~/.kube/config
$ sudo chown stack:stack ~/.kube/config
Step 5 Apply the Calico or Flannel CNI plugin for Kubernetes v1.11.1.
For Calico execute:
stack@master:~$ kubectl apply -f \
https://docs.projectcalico.org/v3.1/getting-started/kubernetes/installation/hosted/rbac-kdd.yaml
clusterrole.rbac.authorization.k8s.io/calico-node created
clusterrolebinding.rbac.authorization.k8s.io/calico-node created
stack@master:~$ kubectl apply -f \
https://docs.projectcalico.org/v3.1/getting-started/kubernetes/installation/hosted/kubernetes-datastore/calico-networking/1.7/calico.yaml
configmap/calico-config created
service/calico-typha created
deployment.apps/calico-typha created
daemonset.extensions/calico-node created
customresourcedefinition.apiextensions.k8s.io/felixconfigurations.crd.projectcalico.org created
customresourcedefinition.apiextensions.k8s.io/bgppeers.crd.projectcalico.org created
customresourcedefinition.apiextensions.k8s.io/bgpconfigurations.crd.projectcalico.org created
customresourcedefinition.apiextensions.k8s.io/ippools.crd.projectcalico.org created
customresourcedefinition.apiextensions.k8s.io/hostendpoints.crd.projectcalico.org created
customresourcedefinition.apiextensions.k8s.io/clusterinformations.crd.projectcalico.org created
customresourcedefinition.apiextensions.k8s.io/globalnetworkpolicies.crd.projectcalico.org created
customresourcedefinition.apiextensions.k8s.io/globalnetworksets.crd.projectcalico.org created
customresourcedefinition.apiextensions.k8s.io/networkpolicies.crd.projectcalico.org created
serviceaccount/calico-node created
For Flannel execute:
$ kubectl apply -f https://raw.githubusercontent.com/coreos/flannel/v0.10.0/Documentation/kube-flannel.yml
clusterrole.rbac.authorization.k8s.io/flannel created
clusterrolebinding.rbac.authorization.k8s.io/flannel created
serviceaccount/flannel created
configmap/kube-flannel-cfg created
daemonset.extensions/kube-flannel-ds created
Step 6 Test your master node control plane (the output below is for Flannel):
$ kubectl -n kube-system get pod
NAME READY STATUS RESTARTS AGE
coredns-78fcdf6894-8v94c 1/1 Running 0 27m
coredns-78fcdf6894-f9tsx 1/1 Running 0 27m
etcd-master 1/1 Running 0 5m
kube-apiserver-master 1/1 Running 0 5m
kube-controller-manager-master 1/1 Running 0 5m
kube-flannel-ds-hr5jh 1/1 Running 0 5m
kube-proxy-8fj87 1/1 Running 0 27m
kube-scheduler-master 1/1 Running 0 5m
Step 7 Add node1 and node2 to the cluster using the master’s IP address and token:
$ for node in node1 node2 ; do
ssh $node sudo kubeadm join ${PrivateIP}:6443 --token $token --discovery-token-ca-cert-hash $discoveryhash
done
[preflight] running pre-flight checks
I0909 17:04:37.833965 29232 kernel_validator.go:81] Validating kernel version
I0909 17:04:37.834034 29232 kernel_validator.go:96] Validating kernel config
[discovery] Trying to connect to API Server "172.16.1.43:6443"
[discovery] Created cluster-info discovery client, requesting info from "https://172.16.1.43:6443"
[discovery] Requesting info from "https://172.16.1.43:6443" again to validate TLS against the pinned public key
[discovery] Cluster info signature and contents are valid and TLS certificate validates against pinned roots, will use API Server "172.16.1.43:6443"
[discovery] Successfully established connection with API Server "172.16.1.43:6443"
[kubelet] Downloading configuration for the kubelet from the "kubelet-config-1.11" ConfigMap in the kube-system namespace
[kubelet] Writing kubelet configuration to file "/var/lib/kubelet/config.yaml"
[kubelet] Writing kubelet environment file with flags to file "/var/lib/kubelet/kubeadm-flags.env"
[preflight] Activating the kubelet service
[tlsbootstrap] Waiting for the kubelet to perform the TLS Bootstrap...
[patchnode] Uploading the CRI Socket information "/var/run/dockershim.sock" to the Node API object "node1" as an annotation
This node has joined the cluster:
* Certificate signing request was sent to master and a response
was received.
* The Kubelet was informed of the new secure connection details.
Run 'kubectl get nodes' on the master to see this node join the cluster.
....
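As the join output suggests, verify from the master that both workers have joined; all three nodes should report Ready once the CNI plugin pods are running on them:
$ kubectl get node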
Step 8 Test your cluster:
$ cat <<EOF | kubectl create -f -
apiVersion: apps/v1
kind: Deployment
metadata:
name: echoserver
spec:
replicas: 2
selector:
matchLabels:
app: echoserver
template:
metadata:
labels:
app: echoserver
spec:
containers:
- name: echoserver
image: k8s.gcr.io/echoserver:1.6
ports:
- containerPort: 8080
EOF
deployment.apps/echoserver created
$ kubectl get pod -o wide
NAME READY STATUS RESTARTS AGE IP NODE
echoserver-5d5c779b47-7zvnq 1/1 Running 0 39s 10.244.2.3 node2
echoserver-5d5c779b47-l7prm 1/1 Running 0 39s 10.244.1.3 node1
$ curl 10.244.2.3:8080
Hostname: echoserver-5d5c779b47-7zvnq
Pod Information:
....
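To also exercise kube-proxy and the cluster's Service network, you can expose the deployment as a ClusterIP Service and curl it through the Service address (the Service IP will differ in your cluster):
$ kubectl expose deployment echoserver --port=8080
$ curl $(kubectl get service echoserver -o jsonpath='{.spec.clusterIP}'):8080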