1 Introduction to Kubernetes

Kubernetes is an open source system for automating the deployment, scaling, and management of containerized applications. Primarily, Kubernetes provides the tools to easily create a cluster of systems across which containerized applications can be deployed and scaled as required.

The Kubernetes project is maintained at:

https://kubernetes.io/

Kubernetes is fully tested on Oracle Linux and includes extra tools developed at Oracle to ease configuration and deployment of a Kubernetes cluster.

Kubernetes Components

You're likely to meet the following common components when you start working with Kubernetes on Oracle Cloud Native Environment. The descriptions provided are brief, and largely intended to help provide a glossary of terms and an overview of the architecture of a typical Kubernetes environment. Upstream documentation can be found at:

https://kubernetes.io/docs/concepts/

Nodes

Kubernetes Node architecture is described in detail at:

https://kubernetes.io/docs/concepts/architecture/nodes/

Control Plane Node

The control plane node is responsible for cluster management and for providing the API that's used to configure and manage resources within the Kubernetes cluster. Kubernetes control plane node components can be run within Kubernetes itself, as a set of containers within a dedicated pod. These components can be replicated to provide highly available (HA) control plane node functionality.

The following components are required for a control plane node:

  • API Server (kube-apiserver): The Kubernetes REST API is exposed by the API Server. This component processes and validates operations and then updates information in the Cluster State Store to trigger operations on the worker nodes. The API is also the gateway to the cluster.

  • Cluster State Store (etcd): Configuration data relating to the cluster state is stored in the Cluster State Store, from which changes are rolled out to coordinating components such as the Controller Manager and the Scheduler. Ensure you have a backup plan in place for the data stored in this component of the cluster.

  • Cluster Controller Manager (kube-controller-manager): This manager performs many of the cluster-level functions and handles application management, based on input from the Cluster State Store and the API Server.

  • Scheduler (kube-scheduler): The Scheduler automatically decides where containers are run by monitoring the availability of resources, quality of service, and affinity specifications.
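
Because these components typically run as containers in pods in the kube-system namespace, you can observe them directly with kubectl. The following sketch assumes a control plane node named control1.example.com; the exact pod names vary between clusters:

    kubectl get pods --namespace kube-system
    NAME                                           READY   STATUS    RESTARTS   AGE
    etcd-control1.example.com                      1/1     Running   0          2d
    kube-apiserver-control1.example.com            1/1     Running   0          2d
    kube-controller-manager-control1.example.com   1/1     Running   0          2d
    kube-scheduler-control1.example.com            1/1     Running   0          2d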

The control plane node can be configured as a worker node within the cluster. Therefore, the control plane node also runs the standard node services: the kubelet service, the container runtime, and the kube-proxy service. Note that it's possible to taint a node to prevent workloads from running on an inappropriate node. The kubeadm utility automatically taints the control plane node so that no other workloads or containers can run on this node. This helps to ensure that the control plane node is never placed under any unnecessary load and that backup and restore of the control plane node for the cluster is simplified.
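For example, you can view the taint that kubeadm applies and, if you accept the consequences for control plane load, remove it so that regular workloads can be scheduled on the control plane node. The node name is a placeholder, and depending on the Kubernetes version the taint key might be node-role.kubernetes.io/master rather than node-role.kubernetes.io/control-plane:

    kubectl describe node control1.example.com | grep Taints
    Taints:             node-role.kubernetes.io/control-plane:NoSchedule

    # Removing the taint allows workloads onto the control plane node (not recommended)
    kubectl taint nodes control1.example.com node-role.kubernetes.io/control-plane:NoSchedule-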

If the control plane node becomes unavailable for a period, cluster functionality is suspended, but the worker nodes continue to run container applications without interruption.

For single node clusters, when the control plane node is offline, the API is unavailable, so the environment can't respond to node failures, and no new operations, such as creating new resources or editing or moving existing resources, can be performed.

A high availability cluster with multiple control plane nodes can handle more requests for control plane functionality, and, with the help of control plane replica nodes, offers improved uptime.

Control Plane Replica Nodes

Control plane replica nodes are responsible for duplicating the functionality and data contained on control plane nodes within a Kubernetes cluster configured for high availability. To benefit from increased uptime and resilience, you can host control plane replica nodes in different zones, and configure them to load balance for the Kubernetes cluster.

Replica nodes are designed to mirror the control plane node configuration and the current cluster state in real time, so that if the control plane nodes become unavailable, the Kubernetes cluster can fail over to the replica nodes automatically. If a control plane node fails, the API continues to be available, the cluster can respond automatically to other node failures, and you can still perform regular operations to create new resources and edit or move existing resources within the cluster.

Worker Nodes

Worker nodes within the Kubernetes cluster are used to run containerized applications and to handle the networking that carries traffic between applications across the cluster and from outside of the cluster. The worker nodes perform any actions triggered by the Kubernetes API, which runs on the control plane node.

All nodes within a Kubernetes cluster must run the following services:

  • Kubelet Service (kubelet): The agent that allows each worker node to communicate with the API Server running on the control plane node. This agent is also responsible for setting up pod requirements, such as mounting volumes, starting containers, and reporting status.

  • Container Runtime: An environment where containers can be run. In this release, the container runtimes are either runC or Kata Containers. For more information about the container runtimes, see Container Runtimes.

  • Kube Proxy Service (kube-proxy): A service that programs rules to handle port forwarding and IP redirects to ensure that network traffic from outside the pod network can be transparently proxied to the pods in a service.

In all cases, these services are run from systemd as daemons.
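For example, you can check the status of these daemons on a node with systemctl. The service names below reflect a typical kubeadm-based installation with CRI-O and might differ on other setups:

    sudo systemctl status kubelet
    sudo systemctl status crio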

Pods

Kubernetes introduces the concept of pods, which are groupings of one or more containers and their shared storage, along with any specific options on how these are to be run together. Pods are used for tightly coupled applications that would typically run on the same logical host and which might require access to the same system resources. Typically, containers in a pod share the same network and IPC namespaces and can access shared volumes for storage. These shared resources allow the containers in a pod to communicate internally in a seamless way, as if they were installed on a single logical host.

You can easily create or destroy pods as a set of containers. This makes it possible to do rolling updates to an application by controlling the scaling of the deployment. You can scale up or down easily by creating or removing replica pods. For more information on pods, see the upstream Kubernetes documentation.
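As a minimal illustration, the following pod manifest (all names and the image are example values) runs a single nginx container:

    apiVersion: v1
    kind: Pod
    metadata:
      name: example-pod
      labels:
        app: example
    spec:
      containers:
      - name: nginx
        image: nginx:1.25
        ports:
        - containerPort: 80

You can create the pod with kubectl apply -f, passing a file such as pod.yaml, and remove it again with kubectl delete pod example-pod.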

ReplicaSet, Deployment, StatefulSet Controllers

Kubernetes provides various controllers that you can use to define how pods are set up and deployed within the Kubernetes cluster. These controllers can be used to group pods together according to their runtime needs and to define pod replication and pod start-up ordering.

You can define a set of pods to be replicated with a ReplicaSet. You define the exact configuration for each of the pods in the group and which resources they can have access to. Using ReplicaSets not only caters to the easy scaling and rescheduling of an application, but also lets you perform rolling or multi-track updates to an application. For more information on ReplicaSets, see the upstream Kubernetes documentation.

You can use a Deployment to manage pods and ReplicaSets. Deployments are useful when you need to roll out changes to ReplicaSets. By using a Deployment to manage a ReplicaSet, you can easily roll back to an earlier Deployment revision. A Deployment lets you create a newer revision of a ReplicaSet and then migrate existing pods from a previous ReplicaSet into the new revision. The Deployment can then manage the cleanup of older unused ReplicaSets. For more information on Deployments, see the upstream Kubernetes documentation.
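
For example, the following Deployment (names and image are illustrative) creates a ReplicaSet of three pods. Changing the pod template, for instance by updating the image tag, triggers a new ReplicaSet revision and a rolling update:

    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: example-deployment
    spec:
      replicas: 3
      selector:
        matchLabels:
          app: example
      template:
        metadata:
          labels:
            app: example
        spec:
          containers:
          - name: nginx
            image: nginx:1.25

If an update misbehaves, kubectl rollout undo deployment/example-deployment returns the Deployment to its previous revision.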

You can use StatefulSets to create pods that guarantee start-up order and unique identifiers, which are then used to ensure that each pod maintains its identity across the lifecycle of the StatefulSet. This feature makes it possible to run stateful applications within Kubernetes, as typical persistent components such as storage and networking are guaranteed. Furthermore, when you create pods they're always created in the same order and allocated identifiers that are applied to host names and the internal cluster DNS. Those identifiers ensure stable and predictable network identities for pods in the environment. For more information on StatefulSets, see the upstream Kubernetes documentation.
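
A minimal StatefulSet sketch follows; the names are example values, and the serviceName field refers to a headless Service that you would create separately to provide the stable DNS entries. The pods are created in order as example-set-0 and example-set-1:

    apiVersion: apps/v1
    kind: StatefulSet
    metadata:
      name: example-set
    spec:
      serviceName: example-headless   # headless Service providing stable DNS names
      replicas: 2
      selector:
        matchLabels:
          app: example-stateful
      template:
        metadata:
          labels:
            app: example-stateful
        spec:
          containers:
          - name: nginx
            image: nginx:1.25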

Services

You can use services to expose access to one or more mutually interchangeable pods. As pods can be replicated for rolling updates and for scalability, clients accessing an application must be directed to a pod running the correct application. Pods might also need access to applications outside of Kubernetes. In either case, you can define a service to make access to these facilities transparent, even if the actual backend changes.

Typically, services consist of port and IP mappings. How services function in network space is defined by the service type when it's created.

The default service type is ClusterIP, which you can use to expose the service on an internal IP of the cluster. This option makes the service reachable only from within the cluster. Therefore, use this option to expose services for applications that need to access each other from within the cluster.
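
For example, the following Service (illustrative names, matching pods labeled app: example as in the earlier sketches) exposes port 80 of those pods on a cluster-internal IP:

    apiVersion: v1
    kind: Service
    metadata:
      name: example-service
    spec:
      type: ClusterIP     # the default, so this line can be omitted
      selector:
        app: example
      ports:
      - port: 80
        targetPort: 80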

Often, clients outside of the Kubernetes cluster might need access to services within the cluster. You can achieve this by creating a NodePort service type. This service type lets you take advantage of the Kube Proxy service that runs on every worker node and reroute traffic to a ClusterIP, which is created automatically along with the NodePort service. The service is exposed on each node IP at a static port, called the NodePort. The Kube Proxy routes traffic destined for the NodePort into the cluster to be serviced by a pod running inside the cluster. This means that if a NodePort service is running in the cluster, it can be accessed from any node in the cluster, regardless of where the pod is running.
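
To expose the same pods outside the cluster, you can change the service type to NodePort. If you omit the nodePort field, Kubernetes allocates a port from the default 30000-32767 range; the value shown here is an example:

    apiVersion: v1
    kind: Service
    metadata:
      name: example-nodeport
    spec:
      type: NodePort
      selector:
        app: example
      ports:
      - port: 80
        targetPort: 80
        nodePort: 30080   # example port from the default NodePort range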

Building on top of these service types, the LoadBalancer service type makes it possible for you to expose the service externally by using a cloud provider's load balancer. The external load balancer redirects traffic to the pods in the cluster by way of the NodePort and ClusterIP services, both of which are created automatically when you set up the LoadBalancer service.

Important:

As you add services for different pods, you must ensure that the network is configured to allow traffic to flow for each service declaration. If you create a NodePort or LoadBalancer service, any of the ports exposed must also be accessible through any firewalls that are in place.

If you're running firewalld on any of the nodes, ensure you add rules to allow traffic for the external facing ports of the services that you create.
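
For example, to open the hypothetical NodePort 30080 used earlier on a node running firewalld:

    sudo firewall-cmd --zone=public --add-port=30080/tcp --permanent
    sudo firewall-cmd --reload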

For more information on services, see the upstream Kubernetes documentation.

Volumes

In Kubernetes, a volume is storage that persists across the containers within a pod for the lifespan of the pod itself. When a container within the pod is restarted, the data in the Kubernetes volume is preserved. Furthermore, Kubernetes volumes can be shared across containers within the pod, providing a file store that different containers can access locally.
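For example, an emptyDir volume, one of the simplest volume types, lives as long as the pod and can be shared between its containers. All names in this sketch are illustrative:

    apiVersion: v1
    kind: Pod
    metadata:
      name: shared-volume-pod
    spec:
      volumes:
      - name: shared-data
        emptyDir: {}
      containers:
      - name: writer
        image: busybox:1.36
        command: ["sh", "-c", "echo hello > /data/hello.txt && sleep 3600"]
        volumeMounts:
        - name: shared-data
          mountPath: /data
      - name: reader
        image: busybox:1.36
        command: ["sh", "-c", "sleep 3600"]
        volumeMounts:
        - name: shared-data
          mountPath: /data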

Kubernetes provides various volume types that define how the data is stored and how it's persisted, which are described in detail in the upstream Kubernetes documentation.

Kubernetes volumes typically have a lifetime that matches the lifetime of the pod, and data in a volume persists while the pod using that volume exists. Containers can be restarted within the pod, but the data remains persistent. If the pod is destroyed, the data is usually destroyed with it.

Sometimes, you might require even more persistence to ensure that the lifecycle of the volume is decoupled from the lifecycle of the pod. Kubernetes introduces the concepts of the PersistentVolume and the PersistentVolumeClaim. PersistentVolumes are similar to volumes except that they exist independently of a pod. They define how to access a storage resource type, such as NFS or iSCSI. You can configure a PersistentVolumeClaim to use the resources available in a PersistentVolume, and the PersistentVolumeClaim specifies the quota and access modes to be applied to the resource for a consumer. A pod you have created can then use the PersistentVolumeClaim to gain access to these resources with the appropriate access modes and size restrictions applied.
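
The following sketch shows this relationship for an NFS backend; the server address and paths are hypothetical, and in many clusters a StorageClass provisions PersistentVolumes automatically instead:

    apiVersion: v1
    kind: PersistentVolume
    metadata:
      name: example-pv
    spec:
      capacity:
        storage: 5Gi
      accessModes:
      - ReadWriteMany
      nfs:
        server: nfs.example.com    # hypothetical NFS server
        path: /exports/data
    ---
    apiVersion: v1
    kind: PersistentVolumeClaim
    metadata:
      name: example-pvc
    spec:
      accessModes:
      - ReadWriteMany
      resources:
        requests:
          storage: 5Gi

A pod then references the claim by name in its volumes section, using persistentVolumeClaim: with claimName: example-pvc.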

For more information about volumes and setting up and using persistent storage with Kubernetes applications, see Oracle Cloud Infrastructure Cloud Controller Manager Module and Rook Module.

Namespaces

Kubernetes implements and maintains strong separation of resources by using namespaces. Namespaces effectively run as virtual clusters backed by the same physical cluster and are intended for use in environments where Kubernetes resources must be shared across use cases.

Kubernetes takes advantage of namespaces to separate cluster management and specific Kubernetes controls from any other user-specific configuration. Therefore, all the pods and services specific to the Kubernetes system are found within the kube-system namespace. A default namespace is also created to run all other deployments for which no namespace has been set.
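
For example, you can list existing namespaces and create a new one for a deployment; the namespace name here is arbitrary:

    kubectl get namespaces
    kubectl create namespace example-app
    kubectl apply -f deployment.yaml --namespace example-app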

For more information on namespaces, see the upstream Kubernetes documentation.

About CRI-O

When you deploy Kubernetes worker nodes, CRI-O is also deployed. CRI-O is an implementation of the Kubernetes Container Runtime Interface (CRI) that enables the use of Open Container Initiative (OCI) compatible runtimes. CRI-O is a lightweight alternative to using Docker as the runtime for Kubernetes, and it allows Kubernetes to use any OCI-compliant runtime as the container runtime for pods.

CRI-O delegates each container to the appropriate runtime engine, based on the configuration set in pod files. Privileged pods can be run using the runC runtime engine (runc), and unprivileged pods can be run using the Kata Containers runtime engine (kata-runtime). Defining whether containers are trusted or untrusted is set in the Kubernetes pod or deployment file.
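
As an illustration, upstream Kubernetes selects a runtime for a pod through a RuntimeClass. The handler name below assumes that a kata-runtime handler has been configured in CRI-O; check the release documentation for the exact mechanism and names used in this release:

    apiVersion: node.k8s.io/v1
    kind: RuntimeClass
    metadata:
      name: kata
    handler: kata    # must match a runtime handler configured in CRI-O
    ---
    apiVersion: v1
    kind: Pod
    metadata:
      name: untrusted-pod
    spec:
      runtimeClassName: kata   # run this pod's containers with Kata Containers
      containers:
      - name: app
        image: nginx:1.25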

For information on how to set the container runtime, see Container Runtimes.