Introduction to Kubernetes

Important:

The software described in this documentation is either in Extended Support or Sustaining Support. See Oracle Open Source Support Policies for more information.

We recommend that you upgrade the software described by this documentation as soon as possible.

Kubernetes is an open-source system for automating the deployment, scaling and management of containerized applications. Primarily, Kubernetes provides the tools to easily create a cluster of systems across which containerized applications can be deployed and scaled as required.

The Kubernetes project is maintained at:

https://kubernetes.io/

Kubernetes is fully tested on Oracle Linux 8 and Oracle Linux 7 and includes additional tools developed at Oracle to ease configuration and deployment of a Kubernetes cluster.

For more information on Kubernetes releases, hardware and software requirements, new and notable features, and known issues, see Release Notes.

Kubernetes Components

You are likely to encounter the following common components when you start working with Kubernetes on Oracle Linux. The descriptions are brief and are intended to serve as a glossary of terms and an overview of the architecture of a typical Kubernetes environment. Upstream documentation can be found at:

https://kubernetes.io/docs/concepts/

Nodes

Kubernetes Node architecture is described in detail at:

https://kubernetes.io/docs/concepts/architecture/nodes/

Control Plane Node

The control plane node is responsible for cluster management and for providing the API that is used to configure and manage resources within the Kubernetes cluster. Kubernetes control plane node components can be run within Kubernetes itself, as a set of containers within a dedicated pod. These components can be replicated to provide highly available (HA) control plane node functionality.

The following components are required for a control plane node:

API Server (kube-apiserver): The Kubernetes REST API is exposed by the API Server. This component processes and validates operations and then updates information in the Cluster State Store to trigger operations on the worker nodes. The API is also the gateway to the cluster.
Cluster State Store (etcd): Configuration data relating to the cluster state is stored in the Cluster State Store, which can roll out changes to the coordinating components like the Controller Manager and the Scheduler. It is essential to have a backup plan in place for the data stored in this component of your cluster.
Cluster Controller Manager (kube-controller-manager): This manager is used to perform many of the cluster-level functions, as well as application management, based on input from the Cluster State Store and the API Server.
Scheduler (kube-scheduler): The Scheduler automatically determines where containers should run by monitoring resource availability, quality of service, and affinity and anti-affinity specifications.

The control plane node also runs the standard node services (the kubelet service, the container runtime and the Kube Proxy service), so it can technically act as a worker node within the cluster. However, it is possible to taint a node to prevent workloads from running on an inappropriate node. The kubeadm utility automatically taints the control plane node so that ordinary workloads are not scheduled on it; only pods that explicitly tolerate the taint can run there. This helps to ensure that the control plane node is not placed under unnecessary load and simplifies backup and restore of the control plane node for the cluster.
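
The effect of a taint on scheduling can be illustrated with a small toy model. The following Python sketch is not the real kube-scheduler algorithm; it only filters nodes by free CPU and by a simplified taint and toleration match. The node names, the CPU figures and the taint key node-role.kubernetes.io/control-plane are assumptions made for illustration (older releases used node-role.kubernetes.io/master).

```python
# Toy model of scheduling with taints; NOT the real kube-scheduler algorithm.

def tolerates(toleration, taint):
    """Return True if one toleration matches one taint (simplified rules)."""
    if toleration.get("effect") and toleration["effect"] != taint["effect"]:
        return False
    if toleration.get("operator", "Equal") == "Exists":
        # An empty key with operator Exists matches every taint key.
        return not toleration.get("key") or toleration["key"] == taint["key"]
    return (toleration.get("key") == taint["key"]
            and toleration.get("value", "") == taint.get("value", ""))


def schedulable_nodes(pod, nodes):
    """Names of nodes with enough free CPU whose taints are all tolerated."""
    result = []
    for node in nodes:
        if node["free_cpu_millicores"] < pod["cpu_millicores"]:
            continue
        tolerations = pod.get("tolerations", [])
        if all(any(tolerates(t, taint) for t in tolerations)
               for taint in node.get("taints", [])):
            result.append(node["name"])
    return result


# Assumed taint key and hypothetical node data.
CONTROL_PLANE_TAINT = {"key": "node-role.kubernetes.io/control-plane",
                       "effect": "NoSchedule"}
nodes = [
    {"name": "control1", "free_cpu_millicores": 4000, "taints": [CONTROL_PLANE_TAINT]},
    {"name": "worker1", "free_cpu_millicores": 2000},
    {"name": "worker2", "free_cpu_millicores": 100},
]

app_pod = {"cpu_millicores": 500}
system_pod = {
    "cpu_millicores": 500,
    "tolerations": [{"key": "node-role.kubernetes.io/control-plane",
                     "operator": "Exists", "effect": "NoSchedule"}],
}

print(schedulable_nodes(app_pod, nodes))     # ['worker1']
print(schedulable_nodes(system_pod, nodes))  # ['control1', 'worker1']
```

In this model the ordinary application pod is kept off the tainted control plane node, while the pod that declares a matching toleration can still be placed there.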

If the control plane node becomes unavailable for a period, cluster management functionality is suspended, but the worker nodes continue to run container applications without interruption.

For clusters with a single control plane node, when that node is offline, the API is unavailable. The environment cannot respond to node failures, and you cannot perform new operations such as creating new resources or editing or moving existing resources.

A high availability cluster with multiple control plane nodes ensures that more requests for control plane node functionality can be handled, and with the assistance of control plane replica nodes, uptime is significantly improved.

Control Plane Replica Nodes

Control plane replica nodes are responsible for duplicating the functionality and data contained on control plane nodes within a Kubernetes cluster configured for high availability. To benefit from increased uptime and resilience, you can host control plane replica nodes in different zones, and configure them to load balance for your Kubernetes cluster.

Replica nodes are designed to mirror the control plane node configuration and the current cluster state in real time, so that if a control plane node becomes unavailable the Kubernetes cluster can fail over to the replica nodes automatically. In the event that a control plane node fails, the API continues to be available, the cluster can respond automatically to other node failures, and you can still perform regular operations such as creating and editing resources within the cluster.

Worker Nodes

Worker nodes within the Kubernetes cluster are used to run containerized applications and handle networking to ensure that traffic between applications across the cluster and from outside of the cluster can be properly facilitated. The worker nodes perform any actions triggered via the Kubernetes API, which runs on the control plane node.

All nodes within a Kubernetes cluster must run the following services:

Kubelet Service: The agent that allows each worker node to communicate with the API Server running on the control plane node. This agent is also responsible for setting up pod requirements, such as mounting volumes, starting containers and reporting status.
Container Runtime: An environment where containers can be run. In this release, the container runtimes are either runC or Kata Containers. For more information about the container runtimes, see Container Runtimes.
Kube Proxy Service: A service that programs rules to handle port forwarding and IP redirects to ensure that network traffic from outside the pod network can be transparently proxied to the pods in a service.

In all cases, these services are run from systemd as interdependent daemons.

Pods

Kubernetes introduces the concept of "pods", which are groupings of one or more containers and their shared storage, and any specific options on how these should be run together. Pods are used for tightly coupled applications that would typically run on the same logical host and which may require access to the same system resources. Typically, containers in a pod share the same network namespace, can communicate through shared memory, and can access shared volumes for storage. These shared resources allow the containers in a pod to communicate internally as if they were installed on a single logical host.

You can easily create or destroy pods as a set of containers. This makes it possible to do rolling updates to an application by controlling the scaling of the deployment. It also allows you to scale up or down easily by creating or removing replica pods. For more information on pods, see the upstream documentation at:

https://kubernetes.io/docs/concepts/workloads/pods/
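
As an illustration of containers sharing storage within a pod, the following Python sketch builds a pod manifest with two containers that mount the same emptyDir volume and prints it as JSON. The pod name, container names, images and paths are hypothetical and chosen only for the example.

```python
import json

# Hypothetical names, images and paths; adjust them for a real cluster.
pod = {
    "apiVersion": "v1",
    "kind": "Pod",
    "metadata": {"name": "web-with-sidecar", "labels": {"app": "web"}},
    "spec": {
        # One volume, defined at pod level, shared by both containers.
        "volumes": [{"name": "shared-data", "emptyDir": {}}],
        "containers": [
            {
                "name": "web",
                "image": "nginx",
                "ports": [{"containerPort": 80}],
                "volumeMounts": [
                    {"name": "shared-data", "mountPath": "/usr/share/nginx/html"}
                ],
            },
            {
                "name": "content-writer",
                "image": "busybox",
                "command": ["sh", "-c",
                            "while true; do date > /data/index.html; sleep 10; done"],
                "volumeMounts": [{"name": "shared-data", "mountPath": "/data"}],
            },
        ],
    },
}

print(json.dumps(pod, indent=2))
```

Because kubectl accepts JSON as well as YAML manifests, the printed output could be saved to a file and applied with kubectl apply -f.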

ReplicaSet, Deployment, StatefulSet Controllers

Kubernetes provides a variety of controllers that you can use to define how pods are set up and deployed within the Kubernetes cluster. These controllers can be used to group pods together according to their runtime needs and to define pod replication and pod startup ordering.

You can define a set of pods that should be replicated with a ReplicaSet. This allows you to define the exact configuration for each of the pods in the group and which resources they should have access to. Using ReplicaSets not only makes it easy to scale and reschedule an application, but also allows you to perform rolling or multi-track updates to an application. For more information on ReplicaSets, see the upstream documentation at:

https://kubernetes.io/docs/concepts/workloads/controllers/replicaset/

You can use a Deployment to manage pods and ReplicaSets. Deployments are useful when you need to roll out changes to ReplicaSets. By using a Deployment to manage a ReplicaSet, you can easily roll back to an earlier Deployment revision. A Deployment allows you to create a newer revision of a ReplicaSet and then migrate existing pods from a previous ReplicaSet into the new revision. The Deployment can then manage the cleanup of older unused ReplicaSets. For more information on Deployments, see the upstream documentation at:

https://kubernetes.io/docs/concepts/workloads/controllers/deployment/
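
The relationship between a Deployment, its replica count and its pod template can be sketched as follows. This Python example builds two revisions of a hypothetical Deployment named web that differ only in the container image. The names, image tags and rolling update settings are assumptions chosen for illustration.

```python
import json


def deployment(name, image, replicas):
    """Build a Deployment manifest whose pods are selected by an app label."""
    labels = {"app": name}
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": name},
        "spec": {
            "replicas": replicas,
            # The selector must match the labels in the pod template.
            "selector": {"matchLabels": labels},
            "strategy": {
                "type": "RollingUpdate",
                "rollingUpdate": {"maxUnavailable": 1, "maxSurge": 1},
            },
            "template": {
                "metadata": {"labels": labels},
                "spec": {"containers": [{"name": name, "image": image}]},
            },
        },
    }


# Hypothetical revisions: only the image in the pod template changes.
revision_1 = deployment("web", "nginx:1.24", 3)
revision_2 = deployment("web", "nginx:1.25", 3)

print(json.dumps(revision_2, indent=2))
```

Applying the second manifest over the first changes the pod template, which causes the Deployment to create a new ReplicaSet and move pods to it gradually. A command such as kubectl rollout undo deployment/web returns to the previous revision.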

You can use StatefulSets to create pods that guarantee startup order and unique identifiers, which are then used to ensure that the pod maintains its identity across the lifecycle of the StatefulSet. This feature makes it possible to run stateful applications within Kubernetes, as typical persistent components such as storage and networking are guaranteed. Furthermore, when you create pods they are always created in the same order and allocated identifiers that are applied to host names and the internal cluster DNS. Those identifiers ensure there are stable and predictable network identities for pods in the environment. For more information on StatefulSets, see the upstream documentation at:

https://kubernetes.io/docs/concepts/workloads/controllers/statefulset/
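
The stable identities described above follow a predictable naming pattern. The following Python sketch derives the pod names and internal DNS names for a StatefulSet, assuming the upstream default of ordinals starting at 0 and the default cluster domain cluster.local. The StatefulSet name db and the headless service name db-headless are hypothetical.

```python
def statefulset_identities(name, service, namespace, replicas,
                           cluster_domain="cluster.local"):
    """Return (pod_name, dns_name) pairs in the order the pods are created."""
    identities = []
    for ordinal in range(replicas):
        pod_name = f"{name}-{ordinal}"
        dns_name = f"{pod_name}.{service}.{namespace}.svc.{cluster_domain}"
        identities.append((pod_name, dns_name))
    return identities


# Hypothetical StatefulSet "db" governed by a headless service "db-headless".
for pod_name, dns_name in statefulset_identities("db", "db-headless", "default", 3):
    print(pod_name, dns_name)
```

A pod that is rescheduled keeps the same name and DNS entry, which is what allows clustered applications to refer to their peers by a fixed address.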

Services

You can use services to expose access to one or more mutually interchangeable pods. Since pods can be replicated for rolling updates and for scalability, clients accessing an application must be directed to a pod running the correct application. Pods may also need access to applications outside of Kubernetes. In either case, you can define a service to make access to these facilities transparent, even if the actual backend changes.

Typically, services consist of port and IP mappings. How services function in network space is defined by the service type when it is created.

The default service type is ClusterIP, which exposes the service on a cluster-internal IP address. This option makes the service reachable only from within the cluster. Therefore, use this option to expose services for applications that need to access each other from within the cluster.

Frequently, clients outside of the Kubernetes cluster may need access to services within the cluster. You can achieve this by creating a NodePort service type. This service type enables you to take advantage of the Kube Proxy service that runs on every worker node and reroute traffic to a ClusterIP, which is created automatically along with the NodePort service. The service is exposed on each node IP at a static port, called the NodePort. The Kube Proxy routes traffic destined to the NodePort into the cluster to be serviced by a pod running inside the cluster. This means that if a NodePort service is running in the cluster, it can be accessed via any node in the cluster, regardless of where the pod is running.

Building on these service types, the LoadBalancer service type makes it possible for you to expose the service externally by using a cloud provider's load balancer. The external load balancer redirects traffic to pods in the cluster through the Kube Proxy. A NodePort service and a ClusterIP service are automatically created when you set up the LoadBalancer service.
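
The three service types differ mainly in the type field and in whether a node port is set. The following Python sketch builds one manifest of each type for pods labeled app: web. The service names, the label, the ports and the node port 30080 are hypothetical. The upstream default range for node ports is 30000-32767.

```python
import json


def service(name, app_label, service_type="ClusterIP",
            port=80, target_port=8080, node_port=None):
    """Build a Service manifest that selects pods by their app label."""
    port_spec = {"port": port, "targetPort": target_port, "protocol": "TCP"}
    if node_port is not None:
        if service_type == "ClusterIP":
            raise ValueError("nodePort applies only to NodePort or LoadBalancer")
        port_spec["nodePort"] = node_port
    return {
        "apiVersion": "v1",
        "kind": "Service",
        "metadata": {"name": name},
        "spec": {
            "type": service_type,
            "selector": {"app": app_label},
            "ports": [port_spec],
        },
    }


# Hypothetical services for the same set of pods.
internal = service("web-internal", "web")
external = service("web-nodeport", "web", service_type="NodePort", node_port=30080)
balanced = service("web-lb", "web", service_type="LoadBalancer")

print(json.dumps(external, indent=2))
```

If the node port is omitted for a NodePort or LoadBalancer service, Kubernetes allocates one from the configured range.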

Important:

As you add services for different pods, you must ensure that your network is properly configured to allow traffic to flow for each service declaration. If you create a NodePort or LoadBalancer service, any of the ports exposed must also be accessible through any firewalls that are in place.

If you are running firewalld on any of your nodes, make sure you add rules to allow traffic for the external facing ports of the services that you create.
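
As a minimal sketch of the firewalld step, the following Python helper builds the firewall-cmd invocations that open a list of externally facing ports. It only prints the commands and does not run them. The port 30080 is a hypothetical NodePort value, and the commands would need to be run as root on each node.

```python
def firewalld_commands(ports, protocol="tcp"):
    """Build firewall-cmd invocations that open the given ports permanently."""
    commands = [
        f"firewall-cmd --permanent --add-port={port}/{protocol}" for port in ports
    ]
    # Reload so that the permanent rules take effect in the running firewall.
    commands.append("firewall-cmd --reload")
    return commands


# Hypothetical NodePort value.
for command in firewalld_commands([30080]):
    print(command)
```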

For more information on services, see the upstream documentation at:

https://kubernetes.io/docs/concepts/services-networking/service/

Volumes

In Kubernetes, a volume is storage that persists across the containers within a pod for the lifespan of the pod itself. When a container within the pod is restarted, the data in the Kubernetes volume is preserved. Furthermore, Kubernetes volumes can be shared across containers within the pod, providing a file store that different containers can access locally.

Kubernetes supports a variety of volume types that define how the data is stored and how persistent it is, which are described in detail in the upstream documentation at:

https://kubernetes.io/docs/concepts/storage/volumes/

Kubernetes volumes typically have a lifetime that matches the lifetime of the pod, and data in a volume persists for as long as the pod using that volume exists. Containers can be restarted within the pod, but the data remains persistent. If the pod is destroyed, the data is usually destroyed with it.

In some cases, you may require even more persistence to ensure the lifecycle of the volume is decoupled from the lifecycle of the pod. Kubernetes introduces the concepts of the PersistentVolume and the PersistentVolumeClaim. PersistentVolumes are similar to Volumes except that they exist independently of a pod. They define how to access a storage resource type, such as NFS or iSCSI. You can configure a PersistentVolumeClaim to make use of the resources available in a PersistentVolume. The PersistentVolumeClaim specifies the amount of storage and the access modes that should be applied to the resource for a consumer. A pod you have created can then use the PersistentVolumeClaim to gain access to these resources with the appropriate access modes and size restrictions applied.
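
The chain from storage resource to pod can be sketched with three manifests: a PersistentVolume that describes an NFS export, a PersistentVolumeClaim that requests part of it, and a pod that mounts the claim. In the following Python example the NFS server address, export path, object names and sizes are all hypothetical.

```python
import json

# Hypothetical NFS server, export path, names and sizes.
persistent_volume = {
    "apiVersion": "v1",
    "kind": "PersistentVolume",
    "metadata": {"name": "nfs-pv"},
    "spec": {
        "capacity": {"storage": "10Gi"},
        "accessModes": ["ReadWriteMany"],
        "nfs": {"server": "nfs.example.com", "path": "/exports/data"},
    },
}

claim = {
    "apiVersion": "v1",
    "kind": "PersistentVolumeClaim",
    "metadata": {"name": "data-claim"},
    "spec": {
        "accessModes": ["ReadWriteMany"],
        # An empty class name asks for a pre-created volume with no storage class.
        "storageClassName": "",
        "resources": {"requests": {"storage": "5Gi"}},
    },
}

pod = {
    "apiVersion": "v1",
    "kind": "Pod",
    "metadata": {"name": "data-consumer"},
    "spec": {
        "containers": [{
            "name": "app",
            "image": "nginx",
            "volumeMounts": [{"name": "data", "mountPath": "/data"}],
        }],
        # The pod refers to the claim by name, never to the volume directly.
        "volumes": [{"name": "data",
                     "persistentVolumeClaim": {"claimName": "data-claim"}}],
    },
}

# Wrap the objects in a List so that they form a single JSON document.
bundle = {"apiVersion": "v1", "kind": "List",
          "items": [persistent_volume, claim, pod]}
print(json.dumps(bundle, indent=2))
```

Because the pod names only the claim, the underlying storage can be replaced without changing the pod definition.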

For more information about volumes and setting up and using persistent storage with Kubernetes applications, see Storage.

Namespaces

Kubernetes implements and maintains strong separation of resources through the use of namespaces. Namespaces effectively run as virtual clusters backed by the same physical cluster and are intended for use in environments where Kubernetes resources must be shared across use cases.

Kubernetes takes advantage of namespaces to separate cluster management and specific Kubernetes controls from any other user-specific configuration. Therefore, all of the pods and services specific to the Kubernetes system are found within the kube-system namespace. A default namespace is also created to run all other deployments for which no namespace has been set.

For more information on namespaces, see the upstream documentation at:

https://kubernetes.io/docs/concepts/overview/working-with-objects/namespaces/
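
The way an object is assigned to a namespace can be shown with a short Python sketch. It builds a Namespace manifest and resolves the namespace for a namespaced object, assuming the client has not been configured with a different default namespace. The namespace team-a and the pod name web are hypothetical.

```python
import json


def namespace_manifest(name):
    """Build a manifest for a new namespace."""
    return {"apiVersion": "v1", "kind": "Namespace", "metadata": {"name": name}}


def effective_namespace(manifest):
    """Namespace a namespaced object is created in when none is overridden."""
    return manifest.get("metadata", {}).get("namespace") or "default"


# Hypothetical namespace and pods.
team_namespace = namespace_manifest("team-a")
pod_in_team = {"apiVersion": "v1", "kind": "Pod",
               "metadata": {"name": "web", "namespace": "team-a"}}
pod_unset = {"apiVersion": "v1", "kind": "Pod", "metadata": {"name": "web"}}

print(json.dumps(team_namespace))
print(effective_namespace(pod_in_team))  # team-a
print(effective_namespace(pod_unset))    # default
```

The two pods share the name web without conflict because object names only need to be unique within a namespace.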

About CRI-O

When you deploy Kubernetes worker nodes, CRI-O is also deployed. CRI-O is an implementation of the Kubernetes Container Runtime Interface (CRI) that enables the use of Open Container Initiative (OCI) compatible runtimes, so Kubernetes can use any OCI-compliant runtime to run pods. It is a lightweight alternative to using Docker as the runtime for Kubernetes.

CRI-O passes containers to the appropriate runtime engine, based on the configuration set in pod files. Privileged (trusted) pods can be run using the runC runtime engine (runc), and unprivileged (untrusted) pods can be run using the Kata Containers runtime engine (kata-runtime). Whether containers are trusted or untrusted is set in the Kubernetes pod or deployment file.
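
One upstream mechanism for choosing a runtime per pod is the RuntimeClass resource, which a pod refers to through its runtimeClassName field. The following Python sketch shows the shape of such a configuration. It is an assumption that this mechanism applies to your release, which may select the runtime differently. The class name kata, the handler name kata and the API version node.k8s.io/v1 are also assumptions, and the handler must match a runtime that is actually configured in CRI-O on the node.

```python
import json

# Assumed names: the handler must match a runtime configured in CRI-O.
runtime_class = {
    "apiVersion": "node.k8s.io/v1",
    "kind": "RuntimeClass",
    "metadata": {"name": "kata"},
    "handler": "kata",
}

# Hypothetical pod that asks to run with the runtime class defined above.
pod = {
    "apiVersion": "v1",
    "kind": "Pod",
    "metadata": {"name": "isolated-web"},
    "spec": {
        "runtimeClassName": runtime_class["metadata"]["name"],
        "containers": [{"name": "web", "image": "nginx"}],
    },
}

print(json.dumps(runtime_class, indent=2))
print(json.dumps(pod, indent=2))
```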

For information on how to set the container runtime, see Container Runtimes.