Note:
- This tutorial is available in an Oracle-provided free lab environment.
- It uses example values for Oracle Cloud Infrastructure credentials, tenancy, and compartments. When completing your lab, substitute these values with ones specific to your cloud environment.
Use Taints and Tolerations with Oracle Cloud Native Environment
Introduction
The ability to influence the way Pods are scheduled to provide the best performance, reduce running costs, and make Kubernetes cluster management easier is an important skill for an administrator to master. Node affinity (not covered in this tutorial) attracts Pods to a set of nodes. Taints and Tolerations have the opposite effect by allowing nodes to repel Pods. Frequent use cases for taints and tolerations include:
- Identifying nodes with special hardware.
- Dedicating nodes to specific application pods.
- Defining custom conditions to evict a pod from a node.
Taints allow the Kubernetes administrator to prevent unwanted Pods from running on a predefined set of nodes. Tolerations allow Pods to schedule onto a node with a matching Taint. Together, these allow the administrator to fine-tune how Pods schedule to Nodes.
However, it is important to note that taints and tolerations cannot ensure that a pod schedules to a specific node. The Kubernetes scheduler can deploy a pod onto any node without a taint that repels it. Instead, use Node affinity when control over where Pods schedule is required.
This tutorial shows how to create and use Taints and Tolerations with Oracle Cloud Native Environment.
Objectives
In this lab, you will learn:
- The difference between a Taint and a Toleration
- How to use Taints and Tolerations to influence application deployment on Oracle Cloud Native Environment.
Prerequisites
- Minimum of a 4-node Oracle Cloud Native Environment cluster:
- Operator node
- Kubernetes control plane node
- 2 Kubernetes worker nodes
- Each system should have Oracle Linux installed and configured with:
- An Oracle user account (used during the installation) with sudo access
- Key-based SSH, also known as password-less SSH, between the hosts
- Installation of Oracle Cloud Native Environment
Deploy Oracle Cloud Native Environment
Note: If running in your own tenancy, read the linux-virt-labs GitHub project README.md and complete the prerequisites before deploying the lab environment.
- Open a terminal on the Luna Desktop.
- Clone the linux-virt-labs GitHub project.

git clone https://github.com/oracle-devrel/linux-virt-labs.git
- Change into the working directory.

cd linux-virt-labs/ocne
- Install the required collections.

ansible-galaxy collection install -r requirements.yml
- Deploy the lab environment.

ansible-playbook create_instance.yml -e localhost_python_interpreter="/usr/bin/python3.6"

The free lab environment requires the extra variable localhost_python_interpreter, which sets ansible_python_interpreter for plays running on localhost. This variable is needed because the environment installs the RPM package for the Oracle Cloud Infrastructure SDK for Python, located under the python3.6 modules.

Important: Wait for the playbook to run successfully and reach the pause task. At this stage of the playbook, the installation of Oracle Cloud Native Environment is complete, and the instances are ready. Take note of the previous play, which prints the public and private IP addresses of the nodes it deploys and any other deployment information needed while running the lab.
Confirm the Number of Nodes
It helps to know the number and names of nodes in your Kubernetes cluster.
- Open a terminal and connect via SSH to the ocne-control-01 node.

ssh oracle@<ip_address_of_node>
- List the nodes in the cluster.

kubectl get nodes

Example Output:

NAME              STATUS   ROLES           AGE     VERSION
ocne-control-01   Ready    control-plane   5m32s   v1.28.3+3.el8
ocne-worker-01    Ready    <none>          4m57s   v1.28.3+3.el8
ocne-worker-02    Ready    <none>          5m      v1.28.3+3.el8

This confirms both worker nodes are in a Ready state.
Define and Apply a Taint
Taints are a node-specific property that prevents pods without a matching toleration from being scheduled to run on that node. So how do taints and tolerations differ?
Taints
Taints can be applied to a node using kubectl taint with this syntax:

kubectl taint nodes <node name> <taint key>=<taint value>:<taint effect>

where:

- <node name> is the node receiving the taint
- <taint key> is a user-defined label used to identify the applied taint, optionally paired with a <taint value>
- <taint effect> is one of three possible values:
  - NoSchedule - The Kubernetes scheduler only schedules pods with a matching toleration onto this node
  - PreferNoSchedule - The Kubernetes scheduler tries to avoid scheduling pods without a matching toleration onto this node
  - NoExecute - The Kubernetes scheduler evicts all running pods from this node that do not have a matching toleration
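Once applied, a taint is stored in the node's spec.taints field, which is why the custom-columns queries used later in this tutorial read from .spec.taints. As an illustrative sketch, using this tutorial's example key and value, a tainted worker node's definition contains a fragment like this:

```yaml
# Fragment of `kubectl get node <node name> -o yaml` after tainting
# (the dedicated/test-taint key and value are this tutorial's example values)
spec:
  taints:
  - key: dedicated
    value: test-taint
    effect: NoSchedule
```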
Tolerations
Define Tolerations in the application pod's PodSpec in one of two formats, using either the Equal operator or the Exists operator, depending on requirements. See below for examples of the PodSpec syntax to use:
- Equal Operator

tolerations:
- key: "<taint key>"
  operator: "Equal"
  value: "<taint value>"
  effect: "<taint effect>"
- Exists Operator

tolerations:
- key: "<taint key>"
  operator: "Exists"
  effect: "<taint effect>"
Where:

- The Equal operator means the Pod only schedules onto a node whose taint key, value, and effect all match exactly.
- The Exists operator, on the other hand, accepts any value because it only requires the taint key to be present on the node for the Pod to be allowed to run there.
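A common real-world use of the Exists operator is the toleration that system pods (for example, those deployed by some DaemonSets) carry so they can run on control plane nodes despite the control plane taint shown in this tutorial:

```yaml
tolerations:
- key: "node-role.kubernetes.io/control-plane"
  operator: "Exists"
  effect: "NoSchedule"
```

Because the operator is Exists, no value field is needed; the presence of the taint key alone satisfies the toleration.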
Both nodes and pods can have multiple taints and tolerations applied to them. Think of this approach as acting like a filter providing more flexibility over how Pods schedule to a cluster’s nodes.
If you have ever wondered why Pods do not get deployed to the control plane node, the reason is that it has a taint applied to it.
- Display any taints applied to the Kubernetes cluster.

kubectl describe nodes | grep -i taints

Example Output:

Taints:             node-role.kubernetes.io/control-plane:NoSchedule
Taints:             <none>
Taints:             <none>
- Display a detailed view of any taints applied to the Kubernetes cluster nodes.

kubectl get nodes -o=custom-columns=NodeName:.metadata.name,TaintKey:.spec.taints[*].key,TaintValue:.spec.taints[*].value,TaintEffect:.spec.taints[*].effect

Example Output:

NodeName          TaintKey                                TaintValue   TaintEffect
ocne-control-01   node-role.kubernetes.io/control-plane   <none>       NoSchedule
ocne-worker-01    <none>                                  <none>       <none>
ocne-worker-02    <none>                                  <none>       <none>
Notice that the control plane node has a NoSchedule taint applied (node-role.kubernetes.io/control-plane:NoSchedule), which is one of three possible taint effects that can be assigned to a node. This taint prevents application pods from being scheduled to run on the control plane node, but only if the Pod does not have a matching toleration. The worker nodes, on the other hand, have no taints applied (<none>).

- Apply a NoSchedule taint to the worker nodes.

kubectl taint node ocne-worker-01 dedicated=test-taint:NoSchedule
kubectl taint node ocne-worker-02 dedicated=test-taint:NoSchedule

Where the Taint's Key, Value, and Effect are:

Key = dedicated
Value = test-taint
Effect = NoSchedule

Note: A kubectl taint command can only target a single named node at a time. It is possible to taint multiple nodes simultaneously by using Node Labels. However, that is beyond the remit of this tutorial.
- Confirm the taint applied to both worker nodes.

kubectl describe nodes | grep -i taints

Example Output:

Taints:             node-role.kubernetes.io/control-plane:NoSchedule
Taints:             dedicated=test-taint:NoSchedule
Taints:             dedicated=test-taint:NoSchedule
Note: Applying a Taint is not retrospective, which means that if any applications without a toleration were already running on the cluster, they will continue to do so (unless the administrator evicts them).
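The NoExecute effect is the exception to this: it acts retrospectively by evicting already-running pods that lack a matching toleration. Kubernetes also supports an optional tolerationSeconds field on tolerations for NoExecute taints, which bounds how long a tolerating pod may remain on the tainted node before eviction. A sketch, reusing this tutorial's example key and value:

```yaml
tolerations:
- key: "dedicated"
  operator: "Equal"
  value: "test-taint"
  effect: "NoExecute"
  # Without tolerationSeconds, the pod stays bound indefinitely;
  # with it, the pod is evicted after the specified time (here, one hour)
  tolerationSeconds: 3600
```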
Apply a Toleration to a Pod
Next, you will create a Pod with a Toleration matching the Taint applied to the worker nodes in the previous section.
- Create a Pod template to use.

kubectl run nginx-test --image=nginx --dry-run=client -o yaml > nginx-test.yaml
- Remove the status: {} line from the generated deployment manifest file.

sed -i '$ d' nginx-test.yaml
- The generated template needs to include a toleration definition to allow the scheduler to deploy it onto a node with a matching taint. Next, add the toleration to the template. Note the two-space indentation, which places tolerations under the Pod's spec section.

cat << EOF | tee -a nginx-test.yaml > /dev/null
  tolerations:
  - key: "dedicated"
    value: "test-taint"
    effect: "NoSchedule"
EOF
- Review the template file just created.

cat nginx-test.yaml

Example Output:

apiVersion: v1
kind: Pod
metadata:
  creationTimestamp: null
  labels:
    run: nginx-test
  name: nginx-test
spec:
  containers:
  - image: nginx
    name: nginx-test
    resources: {}
  dnsPolicy: ClusterFirst
  restartPolicy: Always
  tolerations:
  - key: "dedicated"
    value: "test-taint"
    effect: "NoSchedule"
- Deploy the Pod using the manifest file.

kubectl create -f nginx-test.yaml
- Check which node Nginx deployed to.

kubectl get pods -n default -o wide

Example Output:

NAME         READY   STATUS    RESTARTS   AGE   IP           NODE             NOMINATED NODE   READINESS GATES
nginx-test   1/1     Running   0          6s    10.244.1.2   ocne-worker-01   <none>           <none>
Because only one Nginx pod is specified in the YAML file, the actual node the scheduler chooses will vary between ocne-worker-01 and ocne-worker-02.
Note: You may have to query the status a couple of times before the status shows as Running.
Deploy an Application Without Configuring a Toleration
Applying the NoSchedule taint across the Kubernetes cluster means no nodes are available for new deployments without a matching toleration. The following steps attempt a new deployment without a toleration defined.
- Deploy Nginx.

kubectl run nginx --image=nginx
- Confirm the pod status.

kubectl get pods

Example Output:

NAME         READY   STATUS    RESTARTS   AGE
nginx        0/1     Pending   0          16s
nginx-test   1/1     Running   0          15m

Notice that the STATUS column shows this deployment as Pending.

- Investigate why Nginx has not deployed.

kubectl get events | grep nginx

Example Output:

6m26s   Normal    Scheduled          pod/nginx-test   Successfully assigned default/nginx-test to ocne-worker-01
6m26s   Normal    Pulling            pod/nginx-test   Pulling image "nginx"
6m19s   Normal    Pulled             pod/nginx-test   Successfully pulled image "nginx" in 6.268s (6.268s including waiting)
6m19s   Normal    Created            pod/nginx-test   Created container nginx-test
6m19s   Normal    Started            pod/nginx-test   Started container nginx-test
5m38s   Warning   FailedScheduling   pod/nginx        0/3 nodes are available: 1 node(s) had untolerated taint {node-role.kubernetes.io/control-plane: }, 2 node(s) had untolerated taint {dedicated: test-taint}. preemption: 0/3 nodes are available: 3 Preemption is not helpful for scheduling.
The output shows the successful deployment of the nginx-test pod, followed by the nginx pod, which is currently in a Pending status. Looking at the event information for the nginx pod confirms that the pod cannot schedule because no workers without a taint are available.
Remove the NoSchedule Taint From a Node
Taints can be removed by specifying the taint key and its taint effect with a minus (-) sign appended using the following syntax:
kubectl taint nodes <node name> <taint key>:<taint effect>-
Note: Taints also have to be removed individually.
- Remove the previously applied taint from one of the available nodes. This example removes the taint from the node that currently has no Pods deployed on it.

kubectl taint nodes <choose-a-node-with-no-deployments> dedicated=test-taint:NoSchedule-
Note: It does not matter which node has the Taint removed, but it may be more illustrative to remove the Taint from a node that is currently unused. Therefore, replace <choose-a-node-with-no-deployments> with a node that is currently unused. The Example Output shown from here on represents a scenario where the ocne-worker-01 node has the taint removed.

- Confirm the taint was removed.

kubectl get nodes -o=custom-columns=NodeName:.metadata.name,TaintKey:.spec.taints[*].key,TaintValue:.spec.taints[*].value,TaintEffect:.spec.taints[*].effect

Example Output:

NodeName          TaintKey                                TaintValue   TaintEffect
ocne-control-01   node-role.kubernetes.io/control-plane   <none>       NoSchedule
ocne-worker-01    <none>                                  <none>       <none>
ocne-worker-02    dedicated                               test-taint   NoSchedule
- Confirm that Nginx deployed.

kubectl get pods

Example Output:

NAME         READY   STATUS    RESTARTS   AGE
nginx        1/1     Running   0          3m55s
nginx-test   1/1     Running   0          8m29s
- Check which node Nginx deployed to.

kubectl get pods -n default -o wide

Example Output:

NAME         READY   STATUS    RESTARTS   AGE   IP           NODE             NOMINATED NODE   READINESS GATES
nginx        1/1     Running   0          22m   10.244.1.4   ocne-worker-01   <none>           <none>
nginx-test   1/1     Running   0          36m   10.244.2.3   ocne-worker-02   <none>           <none>
- Check the event log again.
kubectl get events | grep nginx
Example Output:
37m     Normal    Scheduled          pod/nginx-test   Successfully assigned default/nginx-test to ocne-worker-02
37m     Normal    Pulling            pod/nginx-test   Pulling image "nginx"
36m     Normal    Pulled             pod/nginx-test   Successfully pulled image "nginx" in 6.332s (6.332s including waiting)
36m     Normal    Created            pod/nginx-test   Created container nginx-test
36m     Normal    Started            pod/nginx-test   Started container nginx-test
117s    Warning   FailedScheduling   pod/nginx        0/3 nodes are available: 1 node(s) had untolerated taint {node-role.kubernetes.io/control-plane: }, 2 node(s) had untolerated taint {dedicated: test-taint}. preemption: 0/3 nodes are available: 3 Preemption is not helpful for scheduling.
34s     Normal    Scheduled          pod/nginx        Successfully assigned default/nginx to ocne-worker-01
34s     Normal    Pulling            pod/nginx        Pulling image "nginx"
27s     Normal    Pulled             pod/nginx        Successfully pulled image "nginx" in 6.661s (6.661s including waiting)
27s     Normal    Created            pod/nginx        Created container nginx
27s     Normal    Started            pod/nginx        Started container nginx
This confirms that removing the Taint from ocne-worker-01 allowed the Nginx pod to deploy successfully onto the ocne-worker-01 node.
Summary
Taints and tolerations, once mastered, provide administrators with a flexible way to repel pods from specific nodes. They are especially powerful when combined with node affinity to fine-tune the Kubernetes scheduler. The most effective way to use them is to keep them short and simple to avoid overcomplicating their effect. Also, remember that taints and tolerations alone cannot ensure pods are scheduled to specific nodes; use them together with node affinity to achieve this.
This concludes the walkthrough introducing Taints and Tolerations and how to use them to manage your deployments flexibly.
For More Information
- Oracle Cloud Native Environment Documentation
- Oracle Cloud Native Environment Track
- Oracle Linux Training Station
More Learning Resources
Explore other labs on docs.oracle.com/learn or access more free learning content on the Oracle Learning YouTube channel. Additionally, visit education.oracle.com/learning-explorer to become an Oracle Learning Explorer.
For product documentation, visit Oracle Help Center.
F94459-06
June 2024