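Configuring Taints and Tolerations in Kubernetes

Taints and tolerations let you control which pods are allowed to run on particular nodes.

Reference (official documentation): https://k8smeetup.github.io/docs/concepts/configuration/taint-and-toleration/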

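1. Taint effects

A taint's effect defines how the node repels pods:

NoSchedule: affects scheduling only; pods already running on the node are not affected. A pod that tolerates the taint may still be scheduled onto other nodes in the cluster.
NoExecute: affects both scheduling and pods already running on the node; pods that do not tolerate the taint are evicted.
PreferNoSchedule: a soft version of NoSchedule; the scheduler tries to avoid the node but will still use it if no other node is available.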

2. Adding taints

Add a taint to each of the three nodes worker1, worker2, and worker3:

kubectl taint node rancher-k8s-worker1 item-name=assistant:NoExecute
kubectl taint node rancher-k8s-worker2 item-name=sca:NoExecute
kubectl taint node rancher-k8s-worker3 item-name=kuiyuan:NoExecute

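Notes:

1) The first command gives worker1 a taint with key item-name and value assistant. Only pods with a toleration that matches this taint can be scheduled onto worker1. The same applies to worker2 and worker3.

2) The NoExecute effect also affects pods that are already running on the node:

If a pod does not tolerate the NoExecute taint, it is evicted immediately.
If a pod tolerates the NoExecute taint but its toleration does not specify tolerationSeconds, it keeps running on the node indefinitely.
If a pod tolerates the NoExecute taint and its toleration specifies tolerationSeconds, it keeps running on the node for that many seconds and is then evicted.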
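The three NoExecute cases can be summarized in a small Python sketch. This is a simplified model for illustration only, not Kubernetes code; the function name no_execute_outcome and its return strings are made up here.

```python
from typing import Optional


def no_execute_outcome(tolerated: bool, toleration_seconds: Optional[int] = None) -> str:
    """Describe what happens to a running pod when a NoExecute taint lands on its node."""
    if not tolerated:
        # No matching toleration: the pod is evicted right away.
        return "evicted immediately"
    if toleration_seconds is None:
        # Matching toleration without tolerationSeconds: the pod stays indefinitely.
        return "keeps running"
    # Matching toleration with tolerationSeconds: the pod stays that long, then is evicted.
    return f"evicted after {toleration_seconds}s"


print(no_execute_outcome(False))       # evicted immediately
print(no_execute_outcome(True))        # keeps running
print(no_execute_outcome(True, 300))   # evicted after 300s
```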

Appendix: commands for removing the taints

kubectl taint node rancher-k8s-worker1 item-name-
kubectl taint node rancher-k8s-worker2 item-name-
kubectl taint node rancher-k8s-worker3 item-name-
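To make the argument syntax of kubectl taint explicit (key=value:effect to add a taint, a trailing "-" to remove one), here is a rough Python sketch that splits such an argument into its parts. It is not how kubectl parses its arguments; TaintArg and parse_taint_arg are hypothetical names used only for this illustration, and no validation is performed.

```python
from typing import NamedTuple, Optional


class TaintArg(NamedTuple):
    key: str
    value: Optional[str]
    effect: Optional[str]
    remove: bool


def parse_taint_arg(arg: str) -> TaintArg:
    """Split a kubectl-style taint argument into key, value, effect and a removal flag."""
    # A trailing "-" means "remove this taint".
    remove = arg.endswith("-")
    if remove:
        arg = arg[:-1]
    # The effect, if present, follows the colon.
    spec, colon, effect = arg.partition(":")
    # The value, if present, follows the equals sign.
    key, equals, value = spec.partition("=")
    return TaintArg(key, value if equals else None, effect if colon else None, remove)


print(parse_taint_arg("item-name=assistant:NoExecute"))
print(parse_taint_arg("item-name-"))
```

The first call yields key item-name, value assistant, and effect NoExecute; the second yields only the key item-name with the removal flag set.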

3. Adding tolerations to pods

Run one pod with the corresponding toleration for each of the three nodes.

1) Define a pod whose toleration matches the taint with key item-name and value assistant:

cat > nginx-assistant.yaml <<EOF
apiVersion: v1
kind: Pod
metadata:
  name: nginx-assistant
  labels:
    app: nginx-assistant
spec:
  containers:
  - name: nginx-assistant
    image: nginx
    resources:
      limits:
        cpu: 30m
        memory: 20Mi
      requests:
        cpu: 20m
        memory: 10Mi
  tolerations:
  - key: item-name
    value: assistant
    operator: Equal
    effect: NoExecute
EOF

2) Define a pod whose toleration matches the taint with key item-name and value sca:

cat > nginx-sca.yaml <<EOF
apiVersion: v1
kind: Pod
metadata:
  name: nginx-sca
  labels:
    app: nginx-sca
spec:
  containers:
  - name: nginx-sca
    image: nginx
    resources:
      limits:
        cpu: 30m
        memory: 20Mi
      requests:
        cpu: 20m
        memory: 10Mi
  tolerations:
  - key: item-name
    value: sca
    operator: Equal
    effect: NoExecute
EOF

3) Define a pod whose toleration matches the taint with key item-name and value kuiyuan:

cat > nginx-kuiyuan.yaml <<EOF
apiVersion: v1
kind: Pod
metadata:
  name: nginx-kuiyuan
  labels:
    app: nginx-kuiyuan
spec:
  containers:
  - name: nginx-kuiyuan
    image: nginx
    resources:
      limits:
        cpu: 30m
        memory: 20Mi
      requests:
        cpu: 20m
        memory: 10Mi
  tolerations:
  - key: item-name
    value: kuiyuan
    operator: Equal
    effect: NoExecute
EOF

# Create the pods

kubectl apply -f ./

4. Checking which node each pod runs on

kubectl get pod -o wide

NAME                    READY   STATUS    RESTARTS   AGE    IP            NODE                  NOMINATED NODE   READINESS GATES
nginx-79748b4cb-25cqr   1/1     Running   0          111s   10.42.10.25   rancher-k8s-worker4   <none>           <none>
nginx-79748b4cb-tnknc   1/1     Running   0          107s   10.42.10.26   rancher-k8s-worker4   <none>           <none>
nginx-79748b4cb-xpx76   1/1     Running   0          101s   10.42.10.27   rancher-k8s-worker4   <none>           <none>
nginx-assistant         1/1     Running   0          33m    10.42.4.246   rancher-k8s-worker1   <none>           <none>
nginx-kuiyuan           1/1     Running   0          33m    10.42.5.239   rancher-k8s-worker3   <none>           <none>
nginx-sca               1/1     Running   0          33m    10.42.3.203   rancher-k8s-worker2   <none>           <none>

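Each of the three pods runs on its corresponding node, while the three nginx pods that define no toleration were evicted from the tainted nodes and now run on worker4. (Pods with no matching toleration are scheduled onto nodes that have no taint.)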
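The placement result can be reproduced with a small Python model of the matching rule: a toleration matches a taint when the keys agree, the effects agree (an empty toleration effect matches any effect), and either the operator is Exists or the values are equal. This is a sketch for illustration, not the scheduler's code. The class and function names are invented here, and it assumes worker4 carries no taints, as the output suggests.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Taint:
    key: str
    value: str
    effect: str


@dataclass
class Toleration:
    key: str = ""            # an empty key is only meaningful with operator "Exists"
    operator: str = "Equal"
    value: str = ""
    effect: str = ""         # an empty effect matches every effect


def tolerates(tol: Toleration, taint: Taint) -> bool:
    """Return True if the toleration matches the taint."""
    if tol.effect and tol.effect != taint.effect:
        return False
    if tol.key and tol.key != taint.key:
        return False
    if tol.operator == "Exists":
        return True
    return tol.value == taint.value  # operator "Equal"


def schedulable_nodes(tolerations: List[Toleration], nodes: Dict[str, List[Taint]]) -> List[str]:
    """List the nodes whose NoSchedule/NoExecute taints are all tolerated."""
    allowed = []
    for name, taints in nodes.items():
        # PreferNoSchedule is only a preference, so it never rules a node out.
        hard = [t for t in taints if t.effect in ("NoSchedule", "NoExecute")]
        if all(any(tolerates(tol, t) for tol in tolerations) for t in hard):
            allowed.append(name)
    return allowed


# Taints as set in this article; worker4 is assumed to be untainted.
NODES = {
    "rancher-k8s-worker1": [Taint("item-name", "assistant", "NoExecute")],
    "rancher-k8s-worker2": [Taint("item-name", "sca", "NoExecute")],
    "rancher-k8s-worker3": [Taint("item-name", "kuiyuan", "NoExecute")],
    "rancher-k8s-worker4": [],
}

assistant = [Toleration(key="item-name", value="assistant", effect="NoExecute")]
print(schedulable_nodes(assistant, NODES))  # ['rancher-k8s-worker1', 'rancher-k8s-worker4']
print(schedulable_nodes([], NODES))         # ['rancher-k8s-worker4']
```

Note that in this model nginx-assistant is allowed on both worker1 and worker4: a toleration permits a pod to run on a tainted node but does not force it there.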

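5. Taint-based evictions (alpha feature)

This is eviction behavior, configured per pod, that applies when a node runs into problems.

1) When certain conditions are true, the node controller automatically adds a taint to the node.

The built-in taints currently include:

node.kubernetes.io/not-ready: the node is not ready. This corresponds to the node condition Ready being "False".
node.kubernetes.io/unreachable (node.alpha.kubernetes.io/unreachable in older releases): the node controller cannot reach the node. This corresponds to the node condition Ready being "Unknown".
node.kubernetes.io/out-of-disk: the node has run out of disk space.
node.kubernetes.io/memory-pressure: the node is under memory pressure.
node.kubernetes.io/disk-pressure: the node is under disk pressure.
node.kubernetes.io/network-unavailable: the node's network is unavailable.
node.cloudprovider.kubernetes.io/uninitialized: when the kubelet is started with an "external" cloud provider, it adds this taint to the node to mark it as unusable. After a controller in the cloud-controller-manager initializes the node, the kubelet removes the taint.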

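When the TaintBasedEvictions feature is enabled (alpha in the referenced documentation; it became beta in Kubernetes 1.13 and stable in 1.18), the node controller adds these taints automatically, and the older logic that evicts pods based on the node's Ready condition is disabled.

Note: to keep the existing rate limiting of pod evictions caused by node problems, the system adds these taints in a rate-limited way. This prevents mass pod evictions in scenarios such as the master losing contact with its nodes. With this feature, combined with tolerationSeconds, a pod can specify how long it stays on a node that has one or more of the problems listed above.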

For example, look at the tolerations of the nginx-assistant pod created earlier:

kubectl describe pod nginx-assistant

Tolerations:     item-name=assistant:NoExecute
                 node.kubernetes.io/not-ready:NoExecute for 300s
                 node.kubernetes.io/unreachable:NoExecute for 300s

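Besides the toleration we defined ourselves, the pod also tolerates the not-ready and unreachable taints by default, each with tolerationSeconds set to 5 minutes. These automatically added tolerations ensure that, by default, the pod stays on its current node for 5 minutes after one of these problems is detected. The two default tolerations are added by the DefaultTolerationSeconds admission controller.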
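The effect of these default tolerations can be sketched in Python as follows. This is a simplified model of the behavior described above, not the admission controller's actual code: the function name with_default_tolerations is hypothetical, tolerations are represented as plain dicts, and the 300-second default is taken from the output shown earlier.

```python
from typing import Dict, List

DEFAULT_SECONDS = 300
DEFAULT_KEYS = ("node.kubernetes.io/not-ready", "node.kubernetes.io/unreachable")


def with_default_tolerations(tolerations: List[Dict]) -> List[Dict]:
    """Return the pod's tolerations plus the two defaults, unless the pod already sets them."""
    result = list(tolerations)
    for key in DEFAULT_KEYS:
        # Keep the pod's own toleration if it already covers this key for NoExecute.
        already = any(
            t.get("key") == key and t.get("effect", "") in ("", "NoExecute")
            for t in tolerations
        )
        if not already:
            result.append({
                "key": key,
                "operator": "Exists",
                "effect": "NoExecute",
                "tolerationSeconds": DEFAULT_SECONDS,
            })
    return result


pod = [{"key": "item-name", "operator": "Equal", "value": "assistant", "effect": "NoExecute"}]
for t in with_default_tolerations(pod):
    print(t)
```

For the nginx-assistant pod this yields three tolerations: its own plus not-ready and unreachable with tolerationSeconds of 300. A pod that already tolerates one of the two keys keeps its own setting.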

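We can also set this duration ourselves. For example, an application may prefer to stay on its current node for a longer time during a network outage and wait for the network to recover rather than be evicted. In that case the pod's toleration might look like this:

tolerations:
- key: "node.kubernetes.io/unreachable"
  operator: "Exists"
  effect: "NoExecute"
  tolerationSeconds: 6000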

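2) When the pods of a DaemonSet are created, the NoExecute tolerations that are added automatically for these taints do not specify tolerationSeconds.

For example, system pods (canal, DNS, and so on) do not specify tolerationSeconds: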

kubectl describe pod/canal-75pct -n kube-system

Tolerations:     :NoSchedule
                 :NoExecute
                 CriticalAddonsOnly
                 node.kubernetes.io/disk-pressure:NoSchedule
                 node.kubernetes.io/memory-pressure:NoSchedule
                 node.kubernetes.io/network-unavailable:NoSchedule
                 node.kubernetes.io/not-ready:NoExecute
                 node.kubernetes.io/unreachable:NoExecute
                 node.kubernetes.io/unschedulable:NoSchedule

This ensures that DaemonSet pods are never evicted when these problems occur, which matches the behavior when the TaintBasedEvictions feature is disabled.
