How to gracefully remove a node from Kubernetes?

I want to scale up/down the number of machines to increase/decrease the number of nodes in my Kubernetes cluster. When I add one machine, I'm able to successfully register it with Kubernetes, so a new node is created as expected. However, it is not clear to me how to smoothly shut down the machine later. A good workflow would be:

  1. Mark the node related to the machine that I am going to shut down as unschedulable;
  2. Start the pod(s) that are running on that node on other node(s);
  3. Gracefully delete the pod(s) that are running on that node;
  4. Delete the node.

If I understood correctly, even kubectl drain (discussion) doesn't do what I expect, since it doesn't start the pods before deleting them (it relies on a replication controller to start the pods afterwards, which may cause downtime). Am I missing something?

How should I properly shut down a machine?


kubectl drain does work the way you describe. There will be some downtime, just as if the machine had crashed.

Can you describe your setup? How many replicas do you have, and are you provisioned in such a way that you cannot handle any downtime of a single replica?
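For what it's worth, kubectl drain does respect PodDisruptionBudgets, so if downtime of a single replica is unacceptable, running more replicas together with a PDB is the usual way to let a drain evict pods without dropping below a minimum. A minimal sketch, assuming Kubernetes 1.21+ (policy/v1) and a hypothetical app: myapp label:

kubectl apply -f - <<EOF
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: myapp-pdb            # hypothetical name
spec:
  minAvailable: 1            # keep at least one replica running during voluntary disruptions
  selector:
    matchLabels:
      app: myapp             # hypothetical label; match your workload's pod labels
EOF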

List the nodes and get the <node-name> of the node you want to drain (or remove from the cluster):

kubectl get nodes

1) First, drain the node

kubectl drain <node-name>

You may have to ignore the daemonsets and local-data on the machine

kubectl drain <node-name> --ignore-daemonsets --delete-local-data
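On newer kubectl versions the local-data flag has been renamed; the equivalent command there should be:

kubectl drain <node-name> --ignore-daemonsets --delete-emptydir-data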

2) Edit the instance group for the nodes (only if you are using kops)

kops edit ig nodes

Decrease the MIN and MAX size by 1, then just save the file (nothing extra is needed)
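The fields to change in the InstanceGroup manifest are spec.minSize and spec.maxSize; the edit looks roughly like this (the sizes shown are illustrative):

kops edit ig nodes
# in the editor that opens, decrease the size fields by one, e.g.:
#   spec:
#     maxSize: 2   # was 3
#     minSize: 2   # was 3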

You may still see some pods on the node that belong to daemonsets, such as the network plugin, fluentd for logs, kubedns/coredns, etc.

3) Finally, delete the node

kubectl delete node <node-name>

4) Commit the kops state in S3 (only if you are using kops):

kops update cluster --yes

OR (if you are using kubeadm)

If you are using kubeadm and want to reset the machine to the state it was in before running kubeadm join, then run

kubeadm reset
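Note that kubeadm reset does not clean up iptables or IPVS rules; if you want a fully clean slate, the kubeadm documentation suggests clearing them manually, roughly:

# flush iptables rules left behind by kube-proxy / the CNI plugin (run on the node)
iptables -F && iptables -t nat -F && iptables -t mangle -F && iptables -X
# only if kube-proxy was running in IPVS mode:
ipvsadm -C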
  1. Find the node with kubectl get nodes. We will assume the name of the node to be removed is "mynode"; replace it with the actual node name.
  2. Drain it with kubectl drain mynode
  3. Delete it with kubectl delete node mynode
  4. If you are using kubeadm, run kubeadm reset on "mynode" itself (the whole sequence is sketched below)
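Put together, a minimal sketch of that sequence, with "mynode" as the placeholder node name:

kubectl get nodes
kubectl drain mynode
kubectl delete node mynode
# then, on mynode itself (if it was joined with kubeadm):
sudo kubeadm reset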
If the cluster was created by kops:

1. Drain the node; all the pods on it will be evicted:
   kubectl drain <node-name>

2. To ignore daemonsets and delete local data:
   kubectl drain <node-name> --ignore-daemonsets --delete-local-data

3. Edit the instance group and set its max and min size to 0:
   kops edit ig nodes-3 --state=s3://bucketname

4. Delete the node:
   kubectl delete node <node-name>

5. Commit the change:
   kops update cluster --state=s3://bucketname --yes

6. Rolling update if required:
   kops rolling-update cluster --state=s3://bucketname --yes

7. Validate the cluster:
   kops validate cluster --state=s3://bucketname

Now the instance will be terminated.

I ran into some strange behavior when running kubectl drain. Here are my extra steps, since in my case data would otherwise be lost.

Short answer: check whether any PersistentVolumes are mounted to this node. If there are some PVs, see the descriptions below to delete them.


When executing kubectl drain, I noticed that some pods were not evicted (they just did not show up in the output, in lines like evicting pod xxx).

In my case, some were pods with soft anti-affinity (so they did not want to go to the remaining nodes), and some were pods of a StatefulSet of size 1 that wanted to keep at least 1 pod.

If I directly delete that node (using the commands mentioned in other answers), data will get lost because those pods have some PersistentVolumes, and deleting a Node will also delete PersistentVolumes (if using some cloud providers).

Thus, please manually delete those pods one by one. Once they are deleted, Kubernetes will re-schedule the pods onto other nodes (because this node is SchedulingDisabled).
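For example, you could list what is still scheduled on the node and delete the stragglers one at a time (the node, pod, and namespace names here are placeholders):

# pods still scheduled on the drained node
kubectl get pods --all-namespaces -o wide --field-selector spec.nodeName=<node-name>

# delete a remaining pod; its controller recreates it on another node,
# since this node is already cordoned (SchedulingDisabled)
kubectl delete pod <pod-name> -n <namespace>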

After deleting all the pods (excluding DaemonSets), please check that no PersistentVolumes are mounted to this node.
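One way to check, assuming your storage driver records attachments as VolumeAttachment objects, is:

# VolumeAttachment objects show which node each volume is currently attached to
kubectl get volumeattachments | grep <node-name>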

Then you can safely delete the node itself :)

Remove a worker node from Kubernetes

  1. kubectl get nodes
  2. kubectl drain <node-name> --ignore-daemonsets
  3. kubectl delete node <node-name>

When draining a node, we may face the risk of node imbalance and downtime of some processes. The goal of this method is to keep the load balanced across nodes as much as possible while avoiding downtime.

# Mark the node as unschedulable.
echo Mark the node as unschedulable $NODENAME
kubectl cordon $NODENAME

# Get the list of namespaces with pods running on the node.
NAMESPACES=$(kubectl get pods --all-namespaces -o custom-columns=:metadata.namespace --field-selector spec.nodeName=$NODENAME | sort -u | sed -e "/^ *$/d")

# Force a rollout of each namespace's deployment ('name' is a placeholder for
# the deployment to restart). Since the node is unschedulable, Kubernetes
# schedules the replacement pods on other nodes automatically.
for NAMESPACE in $NAMESPACES
do
  echo deployment restart for $NAMESPACE
  kubectl rollout restart deployment/name -n $NAMESPACE
done

# Wait for the deployment rollouts to finish.
for NAMESPACE in $NAMESPACES
do
  echo deployment status for $NAMESPACE
  kubectl rollout status deployment/name -n $NAMESPACE
done

# Drain the node to be removed.
kubectl drain $NODENAME
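Once the drain completes, the node itself can be removed as in the other answers:

kubectl delete node $NODENAME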

The following commands only work if you have plenty of replicas, disruption budgets, etc., but they can help a lot with improving cluster utilization. In our cluster, integration tests are kicked off throughout the day (pods run for an hour and then spin down automatically), and there are also some dev workloads (which run for a few days until a developer spins them down manually). I run this every night and go from about 100 nodes in the cluster down to about 20, which adds up to considerable savings:

for node in $(kubectl get nodes -o name | cut -d "/" -f2); do
  kubectl drain --ignore-daemonsets --delete-emptydir-data $node;
  kubectl delete node $node;
done