About Resource Requests and Limits
One of the core Linux technologies used to implement containers is control groups (cgroups). Cgroups can be used to enforce CPU and memory usage limits on a process or group of processes.
- Determine the node a particular Pod should be created on: Kubernetes determines which nodes have enough free resources to satisfy your request.
- Enforce these limits at runtime: Kubernetes ensures that applications do not exceed their limits.
Kubernetes chooses which node of the cluster to run a Pod on based on the Pod's resource requests and the resources available on each node in the cluster. Once a Pod is scheduled onto a node, however, Kubernetes does not directly enforce limits at runtime. Rather, Kubernetes passes the limits to Linux through cgroups, and the node's Linux kernel then enforces the limits on Kubernetes' behalf.
apiVersion: v1
kind: Pod
metadata:
  name: sample-pod
spec:
  containers:
  - image: container-registry.oracle.com/timesten/timesten:22.1.1.19.0
    name: sample
    resources:
      limits:
        cpu: 20
        memory: 41Gi
      requests:
        cpu: 20
        memory: 41Gi
Kubernetes determines which node it will run the Pod on. It does this by examining the Pod's resource requests to determine which node has enough free resources to accommodate the Pod.
Once the node is determined, Kubernetes creates a cpu cgroup for the container (sample in this example) and configures the cgroup to have a limit of 20 CPUs. Kubernetes also creates a memory cgroup for the container and configures it to have a limit of 41 gigabytes. Kubernetes then forks off the lead process of the newly created container and associates that initial process with the cpu and memory cgroups.
Since the lead process is associated with or running under these cgroups, that process and all its children are subject to the limits that the cgroups define. The Linux kernel on the node where the container is running automatically enforces these limits without any involvement from Kubernetes.
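Under cgroup v2, the memory limit is exposed to the kernel as a plain byte count in the container cgroup's memory.max interface file. A minimal sketch of the conversion for the 41Gi limit above (the file name and the Ki/Mi/Gi suffix handling are cgroup and Kubernetes conventions, not part of the Pod spec itself):

```python
# Convert a Kubernetes memory quantity such as "41Gi" into the byte
# count the kernel expects in the cgroup v2 memory.max file.
SUFFIXES = {"Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40}

def to_bytes(quantity: str) -> int:
    for suffix, factor in SUFFIXES.items():
        if quantity.endswith(suffix):
            return int(quantity[: -len(suffix)]) * factor
    return int(quantity)  # plain byte count, no suffix

print(to_bytes("41Gi"))  # 44023414784 -- the value written to memory.max
```

Any process in the cgroup that pushes the group's total usage past this byte count is subject to the enforcement described below.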
CPU limits are easy for Linux to enforce. If an application wants to use more CPU than its limit allows, the kernel can simply decline to dispatch the application for a period of time, keeping its usage under control.
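Concretely, cgroup v2 expresses a CPU limit as a quota of runnable microseconds per scheduling period (the cpu.max interface file). A sketch of the arithmetic, assuming the 100 ms period Kubernetes typically configures; the 20-CPU limit above becomes a quota of twenty full periods' worth of CPU time:

```python
# cgroup v2 writes a CPU limit as "<quota> <period>": the group may run
# for at most `quota` microseconds of CPU time per `period` microseconds.
PERIOD_US = 100_000  # 100 ms period, an assumption matching common defaults

def cpu_max(cpu_limit: float) -> str:
    quota = int(cpu_limit * PERIOD_US)
    return f"{quota} {PERIOD_US}"

print(cpu_max(20))   # "2000000 100000" -- 20 CPUs' worth of time per period
print(cpu_max(0.5))  # "50000 100000" -- half a CPU; the group is throttled
                     # for the remainder of each period once the quota is spent
```

Once a group exhausts its quota within a period, the kernel stops scheduling its processes until the next period begins, which is what "not dispatching the application" means in practice.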
Memory limit enforcement is different from CPU enforcement. Linux has a component called the out-of-memory (OOM) killer. If a process exceeds its memory limit, or if the system comes under memory pressure, the OOM killer terminates processes abruptly.
apiVersion: v1
kind: Pod
metadata:
  name: sample-pod
spec:
  containers:
  - image: container-registry.oracle.com/timesten/timesten:22.1.1.19.0
    name: sample
In this Pod, no memory limit is specified. The container runs in the default cgroups, which have no limits. These default cgroups are shared by all processes on the node that have no specified memory limit, whether or not those processes are running in containers. In this case, if the Linux node becomes memory constrained, the OOM killer chooses its victims from among all processes on the node. The victim might be TimesTen, it might be the Kubernetes kubelet, it might be some other process, or it might be all of them.
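To keep a container out of that shared, unlimited cgroup, give it explicit requests and limits. A minimal sketch of such a Pod spec (the 8-CPU and 16Gi values here are illustrative only, not a sizing recommendation for TimesTen):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: sample-pod
spec:
  containers:
  - image: container-registry.oracle.com/timesten/timesten:22.1.1.19.0
    name: sample
    resources:
      limits:
        cpu: 8
        memory: 16Gi
      requests:
        cpu: 8
        memory: 16Gi
```

With limits set, the container gets its own cpu and memory cgroups, so memory pressure inside the container is handled within the container's own cgroup rather than exposing every process on the node to the OOM killer.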