About Resource Requests and Limits

One of the core Linux technologies used to implement containers is control groups (cgroups). Cgroups can be used to enforce CPU and memory usage limits on a process or group of processes.

Kubernetes provides facilities that let you specify the amount of CPU and memory a container requires (its requests) and the maximum amount it may consume (its limits). When specified, Kubernetes uses these values to do the following:
  • Determine the node a particular Pod should be created on: Kubernetes determines which nodes have enough free resources to satisfy the Pod's requests.

  • Enforce limits at runtime: Kubernetes ensures that applications do not exceed their limits.

Kubernetes chooses which node of the cluster to run a Pod on based on the Pod's resource requests and the resources available on each node in the cluster. But once a Pod is scheduled onto a node, Kubernetes does not directly enforce limits at runtime. Rather, Kubernetes passes the limits to Linux through cgroups. The node's Linux kernel then enforces the limits on Kubernetes' behalf.
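
For example, you can see the per-node picture the scheduler works from, a node's allocatable capacity and the requests already committed to Pods running on it, with kubectl. This is a minimal sketch; the node name my-worker-node is a placeholder:

kubectl describe node my-worker-node
# The "Allocatable:" section lists the CPU and memory available for Pods on
# this node; the "Allocated resources:" table lists the requests and limits
# already committed to Pods scheduled here. The exact layout varies with the
# Kubernetes version.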

Let's look at an example of a Pod with the following definition:
apiVersion: v1
kind: Pod
metadata:
  name: samplepod
spec:
  containers:
  - image: container-registry.oracle.com/timesten/timesten:22.1.1.19.0
    name: sample
    resources:
      limits:
        cpu: 20
        memory: 41Gi
      requests:
        cpu: 20
        memory: 41Gi

Kubernetes determines which node it will run the Pod on. It does this by examining the Pod's resource requests to determine which node has enough free resources to accommodate the Pod.

Once the node is determined, Kubernetes creates a cpu cgroup for the container (sample in this example) and configures the cgroup to have a limit of 20 CPUs. Kubernetes also creates a memory cgroup for the container and configures it to have a limit of 41 gigabytes. Kubernetes then forks off the lead process of the newly created container and associates that initial process with the cpu and memory cgroups.

Since the lead process is associated with or running under these cgroups, that process and all its children are subject to the limits that the cgroups define. The Linux kernel on the node where the container is running automatically enforces these limits without any involvement from Kubernetes.
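
You can observe the resulting settings directly. The following is a minimal sketch, assuming a cgroup v2 node where the container runtime exposes the container's own cgroup at /sys/fs/cgroup; on cgroup v1 nodes the equivalent files live under /sys/fs/cgroup/cpu and /sys/fs/cgroup/memory and have names such as cpu.cfs_quota_us and memory.limit_in_bytes:

kubectl exec samplepod -c sample -- cat /sys/fs/cgroup/cpu.max
# 2000000 100000   (a quota of 2,000,000 microseconds of CPU time per
#                   100,000 microsecond period, that is, 20 CPUs' worth)
kubectl exec samplepod -c sample -- cat /sys/fs/cgroup/memory.max
# 44023414784      (41Gi expressed in bytes)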

CPU limits are straightforward for Linux to enforce. If an application tries to use more CPU time than its limit allows, the kernel throttles it: the application is simply not dispatched for a period of time, which keeps its usage under the limit.
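
The kernel keeps counters of this throttling in the container's cgroup, so the following sketch shows one way to watch it happen (again assuming cgroup v2; on cgroup v1 the equivalent counters are in /sys/fs/cgroup/cpu/cpu.stat):

kubectl exec samplepod -c sample -- cat /sys/fs/cgroup/cpu.stat
# nr_periods and nr_throttled count scheduling periods and how many of those
# periods the container was throttled in; throttled_usec is the total time
# the container has spent throttled.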

Memory limit enforcement is different from CPU enforcement: memory that a process has already allocated cannot be throttled the way CPU time can. Instead, Linux has a component called the out-of-memory (OOM) killer. If a process exceeds its memory limit, or if the node as a whole runs short of memory, the OOM killer terminates processes abruptly.
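
When a container is terminated this way, Kubernetes records the reason in the Pod's status. A sketch of how you might check for it, assuming the Pod from the example above:

kubectl get pod samplepod \
  -o jsonpath='{.status.containerStatuses[0].lastState.terminated.reason}'
# Prints OOMKilled if the previous instance of the container was terminated
# by the OOM killer; "kubectl describe pod samplepod" shows the same
# information under "Last State".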

Next, let's consider a Pod with this definition:
apiVersion: v1
kind: Pod
metadata:
  name: samplepod
spec:
  containers:
  - image: container-registry.oracle.com/timesten/timesten:22.1.1.19.0
    name: sample

In this Pod, there are no specified requests or limits. The container runs in a default cgroup with no memory limit. In effect, the container shares memory with every other process on the node that has no specified memory limit, whether that process is running in a container or not. In this case, if the Linux node becomes memory constrained, the OOM killer chooses processes from the entire node to terminate. The victim might be TimesTen, it might be the Kubernetes kubelet, it might be some other process, or it might be all of them.
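
Kubernetes reflects this difference in the Pod's quality-of-service (QoS) class, and evidence of node-level OOM kills ends up in the node's kernel log. A sketch of how you might check both; how you reach the node's shell is environment-specific:

kubectl get pod samplepod -o jsonpath='{.status.qosClass}'
# BestEffort   (no requests or limits, so this Pod is among the first
#               candidates for the OOM killer; the first Pod definition
#               above, with requests equal to limits, reports Guaranteed)

# On the node itself, OOM kills appear in the kernel log:
dmesg | grep -i 'out of memory'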