3 CNE Services
CNE provides the following common services for all the installed applications:
- Runtime Environment - CNE provides a runtime environment in which you can run all the cloud native applications.
- Automation - CNE provides automation solutions to deploy, upgrade, and maintain cloud native applications.
- Security - CNE provides multi-level security measures to protect against malicious attacks.
- Redundancy - CNE provides redundancy or high availability through the specification of anti-affinity rules at the infrastructure level.
- Observability - CNE provides services to capture metrics, logs, and traces for both itself (CNE) and the cloud native applications.
- Maintenance Access - CNE provides a Bastion Host to access the Kubernetes cluster for maintenance purposes.
3.1 Runtime Environment
Note:
CNE 23.4.x supports Kubernetes version 1.27.x.
3.1.1 Kubernetes Cluster
The Kubernetes cluster in CNE contains three controller nodes. Kubernetes uses these controller nodes to coordinate the execution of application workloads across many worker nodes. This configuration allows the cluster to:
- Sustain the loss of one controller node with no impact on the CNE service.
- Tolerate the loss of two controller nodes without losing the cluster configuration data; however, Kubernetes is placed in read-only mode.
When two controller nodes are lost, Kubernetes operates in read-only mode. Application workloads continue to run on the worker nodes, but no changes can be made to the cluster configuration data. As a result, you cannot install new applications or modify the application-specific custom resources that are used to change an application's configuration.
The CNE Kubernetes cluster can contain many worker nodes; a minimum of six is recommended. Kubernetes distributes workloads across all available worker nodes. The Kubernetes placement algorithm attempts to balance workloads across all available worker nodes so that no worker node is overloaded. Kubernetes also attempts to honor the affinity or anti-affinity rules specified by applications.
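For illustration, the following is a minimal sketch of a Deployment that uses a pod anti-affinity rule to spread replicas across worker nodes. All names and the image are hypothetical; actual rules are defined by each application's Helm chart.

```yaml
# Hypothetical Deployment that asks Kubernetes to place each replica
# on a different worker node using pod anti-affinity.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: example-nf
spec:
  replicas: 3
  selector:
    matchLabels:
      app: example-nf
  template:
    metadata:
      labels:
        app: example-nf
    spec:
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            - labelSelector:
                matchLabels:
                  app: example-nf
              topologyKey: kubernetes.io/hostname  # at most one replica per node
      containers:
        - name: app
          image: registry.example.com/example-nf:1.0.0  # illustrative image
```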
Additional Kubernetes cluster details are as follows:
- Cluster configuration data, as well as application-specific custom resources, are stored in etcd.
- Workloads in the Kubernetes cluster are run by the containerd runtime.
- NTP time synchronization is maintained by chrony.
3.1.2 Networking
In-cluster Networking
CNE includes the Calico plug-in for networking within the Kubernetes cluster. Calico provides networking and network security for containers and is known for its performance and flexibility.
External Networking
You must configure at least one external network to allow the applications that run in CNE to communicate with applications that are outside of the CNE platform.
You can also configure multiple external networks in CNE to allow specific applications to communicate on networks that are dedicated to specific purposes. For more information on configuring external networks, see Oracle Communications Cloud Native Core, Cloud Native Environment Installation, Upgrade, and Fault Recovery Guide.
3.1.3 Storage
Bare Metal Deployment Storage
When CNE is deployed on Bare Metal servers, it constructs a Ceph storage cluster from the disk drives attached to the servers designated as Kubernetes worker nodes. The Ceph cluster is then integrated with Kubernetes so that CNE common services and applications that run on CNE can use the Ceph cluster for persistent data storage.
Virtual Deployment Storage
When CNE is deployed on a virtual infrastructure, it delivers a cloud provider that implements the Kubernetes cloud provider interface. The cloud provider interacts with the Kubernetes cluster and the infrastructure's storage manager to provide storage when applications request it. For example, OpenStack uses the Cinder storage manager and VMware uses the vSphere storage manager.
After Kubernetes has access to persistent storage, CNE creates several Kubernetes StorageClass objects for the common services and applications to use:
- CNE creates a storage class named Standard, which allows applications to store application-specific data. Standard is the default storage class and is used when an application creates a Persistent Volume Claim (PVC) with no StorageClass reference.
- CNE creates one storage class to allow Prometheus to store metrics data.
- CNE creates two storage classes to allow OpenSearch to store log and trace data.
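For example, an application PVC that omits the StorageClass reference is bound using the default Standard class. The names below are illustrative:

```yaml
# Hypothetical PVC; because storageClassName is omitted, the default
# storage class (Standard on CNE) is used.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: example-app-data   # illustrative name
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi        # size is application-specific
```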
3.1.4 Ingress Traffic Delivery
Each application must create a Kubernetes Service of type LoadBalancer to allow clients on an external network to connect to the applications within CNE. Each LoadBalancer service is assigned a service IP from an external network. CNE provides load balancers to ensure that ingress traffic from external clients is distributed evenly across the Kubernetes worker nodes that host the application pods.
For Bare Metal deployments, the top of rack (ToR) switches perform the load balancing function. For virtualized deployments, dedicated virtual machines perform the load balancing function.
For more information on selecting an external network for an NF application, see the installation instructions of the NF.
Note:
CNE does not support NodePort services because they are considered insecure in a production environment.
Ingress traffic distribution
CNE Load Balancers distribute ingress traffic evenly across all Kubernetes worker nodes. Each ingress message is distributed in a round-robin fashion to all in-service Kubernetes worker nodes, regardless of the sender or the destination IP address and port.
Note:
- CNE Load Balancers distribute ingress traffic to Kubernetes worker nodes and not pods. Once an ingress message is delivered to a Kubernetes worker node, Kubernetes selects a pod belonging to the target service and delivers the ingress message to the pod. For more information on how Kubernetes delivers ingress messages to pods in the target service, see Kubernetes Network Model.
- CNE Load Balancers support only the Cluster value for the spec.externalTrafficPolicy field in the Kubernetes service specification and do not support the Local value.
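The following sketch shows what such a LoadBalancer service might look like, with the supported Cluster traffic policy. All names, ports, and labels are illustrative; see the NF installation instructions for the actual service definitions and network selection.

```yaml
# Hypothetical Service exposing an application to external clients.
apiVersion: v1
kind: Service
metadata:
  name: example-nf-sig           # illustrative name
spec:
  type: LoadBalancer             # CNE assigns a service IP from an external network
  externalTrafficPolicy: Cluster # the only value CNE Load Balancers support
  selector:
    app: example-nf              # illustrative pod selector
  ports:
    - name: http2
      port: 8080
      targetPort: 8080
```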
3.1.5 Egress Traffic Delivery
CNE uses the same Load Balancers that distribute ingress traffic to deliver egress requests to servers outside the CNE cluster. CNE Load Balancers also perform Egress Network Address Translation (NAT) on the egress requests to ensure that the source IP field of each egress request contains an IP address that belongs to the external network to which the request is delivered. This ensures that the egress requests are not rejected by security rules on the external networks.
For more information on selecting an external network for an NF application, see the installation instructions of the NF.
3.1.6 DNS Query Services
CoreDNS is used to resolve Domain Name System (DNS) queries for services within the Kubernetes cluster. DNS queries for services outside the Kubernetes cluster are routed to DNS nameservers running on the Bastion Hosts, which in turn use customer-specified DNS servers outside the CNE to resolve these DNS queries.
3.1.6.1 Local DNS
The Local DNS feature is a reconfiguration of CoreDNS to support external hostname resolution. Local DNS allows the pods and services running inside the CNE cluster to connect to hosts running outside the cluster through CoreDNS. That is, when Local DNS is enabled, CNE routes connections to external hosts through CoreDNS rather than through the nameservers on the Bastion Hosts. For information about activating this feature, see Activating Local DNS. Local DNS supports the following DNS record types:
- A Records: allow you to define the hostname and IP address used to locate a service
- SRV Records: allow you to define the location (hostname and port number) of a specific service and how your domain handles the service
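As a generic illustration only (not CNE's actual Local DNS configuration, which is managed through the procedures referenced below), CoreDNS can resolve a static external A record and forward other external queries through a ConfigMap similar to the following. The hostname, IP addresses, and upstream nameserver are placeholders:

```yaml
# Generic CoreDNS ConfigMap sketch with a static A record and an
# upstream forwarder; CNE manages its own CoreDNS configuration.
apiVersion: v1
kind: ConfigMap
metadata:
  name: coredns
  namespace: kube-system
data:
  Corefile: |
    .:53 {
        errors
        kubernetes cluster.local in-addr.arpa ip6.arpa
        hosts {
            203.0.113.10 db.external.example.com
            fallthrough
        }
        forward . 203.0.113.53
        cache 30
    }
```

Note that the hosts plugin shown here serves only A/AAAA records; SRV records are served from a zone file through the CoreDNS file plugin.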
For additional details about this feature, see Activating and Configuring Local DNS.
3.2 Automation
CNE provides considerable automation throughout the phases of an application's deployment, upgrade, and maintenance cycles.
3.2.1 Deployment
CNE provides the Helm application to automate the deployment of applications into the Kubernetes cluster. The applications must provide a Helm chart to perform automated deployment. For more information about deploying an application in CNE, see the Maintenance Procedures section.
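As a minimal sketch of the chart requirement, every Helm chart carries a Chart.yaml descriptor similar to the following; the chart name and versions are illustrative:

```yaml
# Minimal Chart.yaml for a hypothetical application chart (Helm 3).
apiVersion: v2            # chart API version used by Helm 3
name: example-nf          # illustrative chart name
description: Example NF deployed on CNE through Helm
version: 1.0.0            # chart version (SemVer)
appVersion: "1.0.0"       # illustrative application version
```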
You can deploy CNE automatically through custom scripts provided with the CNE platform. For more information about installing CNE, see Oracle Communications Cloud Native Core, Cloud Native Environment Installation, Upgrade, and Fault Recovery Guide.
3.2.2 Upgrade
CNE provides Helm to automate the upgrade of applications in the Kubernetes cluster. The applications must provide a Helm chart to perform an automated software upgrade. For more information about upgrading an application in CNE, see the Maintenance Procedures section.
You can upgrade the CNE platform automatically through the execution of a Continuous Delivery (CD) pipeline. For more information on CNE upgrade instructions, see Oracle Communications Cloud Native Core, Cloud Native Environment Installation, Upgrade, and Fault Recovery Guide.
3.2.3 Maintenance
CNE provides a combination of automated and manual procedures for performing maintenance operations on the Kubernetes cluster and common services. For instructions on performing maintenance operations, see the Maintenance Procedures section.
3.3 Security
CNE provides multi-level security measures to protect against malicious attacks.
- All Kubernetes nodes and Bastion Hosts are security-hardened to prevent unauthorized access.
- Maintenance operations on the Kubernetes cluster can only be performed from the Bastion Hosts to prevent Denial of Service (DoS) attacks against the Kubernetes cluster.
- Only the Kubernetes worker nodes can access the Kubernetes controller nodes to protect sensitive cluster data.
- Kyverno provides policies to ensure that malicious applications do not corrupt the Kubernetes controller nodes, worker nodes, or cluster data.
- Bastion Host container registry is secured with TLS and can be accessed from CNE only.
- Kubernetes secrets are encrypted using secretbox before they are stored. This ensures that the Kubernetes secrets are secure and accessible to authorized users only.
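For reference, Kubernetes encryption of secrets at rest with secretbox is expressed through an API server EncryptionConfiguration similar to the following sketch. CNE manages this configuration internally; the key shown is a placeholder:

```yaml
# Generic EncryptionConfiguration using the secretbox provider.
apiVersion: apiserver.config.k8s.io/v1
kind: EncryptionConfiguration
resources:
  - resources:
      - secrets                 # encrypt Secret objects at rest
    providers:
      - secretbox:
          keys:
            - name: key1
              secret: <base64-encoded-32-byte-key>  # placeholder
      - identity: {}            # allows reading pre-existing unencrypted data
```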
For more information on security hardening, see Oracle Communications Cloud Native Core, Security Guide.
3.4 Redundancy
This section provides detail about maintaining redundancy in Kubernetes, ingress load balancing, common services, and Bastion Host.
Infrastructure
In virtualized infrastructures, CNE requires the specification of anti-affinity rules to distribute Kubernetes controller and worker nodes across several physical servers. In Bare Metal infrastructures, each Kubernetes controller and worker node is placed on its own physical server.
Kubernetes
For the Kubernetes cluster, CNE runs three controller nodes with an etcd node on each controller node. This configuration allows CNE to:
- Sustain the loss of one controller node with no impact on the service.
- Sustain the loss of two controller nodes without losing the cluster data in the etcd database. However, when two controller nodes are lost, Kubernetes is placed in read-only mode. In read-only mode, the creation of new pods and the deployment of new services are not possible.
- Restore etcd data from backup when all three controller nodes fail. When all three controller nodes fail, all the cluster data in etcd is lost, and it is essential to restore the data from backup to resume operations.
CNE uses internal load balancers to distribute API requests from applications running in worker nodes evenly across all controller nodes.
Common Services
- OpenSearch uses three master nodes, three ingress nodes, and five data nodes by default. The number of data nodes provisioned can be configured as required. For more details, see the "Common Installation Configuration" section in Oracle Communications Cloud Native Core, Cloud Native Environment Installation, Upgrade, and Fault Recovery Guide. The system can withstand the failure of one node in each of these groups with no impact on the service.
- Fluentd OpenSearch runs as a DaemonSet, where one Fluentd OpenSearch pod runs on each Kubernetes worker node. While the worker node is up and running, Fluentd OpenSearch sends the logs for that node to OpenSearch.
- Two Prometheus instances run simultaneously to provide redundancy. Each Prometheus instance independently collects and stores metrics from all CNE services and all applications running in the CNE Kubernetes cluster. Automatic deduplication removes the repetitive metric information in Prometheus.
Kubernetes distributes common service pods across Kubernetes worker nodes such that even the loss of one worker node has no impact on the service. Kubernetes automatically reschedules the failed pods to maintain redundancy.
Ingress and Egress Traffic Delivery
- For virtual installations, the Load Balancer VMs (LBVM) perform both ingress and egress traffic delivery.
- Two Load Balancer VMs are run for each configured external network in virtual installations. These LB VMs run in an active/standby configuration. Only one LB VM (the active LB VM) processes the ingress traffic for a given external network.
- The active LB VM is continuously monitored. If the active LB VM fails, the standby LB VM is automatically promoted to be the active LB VM.
- If the virtual infrastructure is able to recover the failed LB VM, the recovered LB VM automatically becomes the standby LB VM, and load balancing redundancy is restored. If the infrastructure is not able to recover the failed LB VM, you must perform the Restoring a Failed Load Balancer fault recovery procedure described in Oracle Communications Cloud Native Core, Cloud Native Environment Installation, Upgrade, and Fault Recovery Guide to restore load balancing redundancy.
Bastion Hosts
- There are two Bastion Hosts per CNE.
- Bastion Hosts run in an active/standby configuration.
- Bastion Host health is monitored by an application that runs in the Kubernetes cluster, which instructs the standby Bastion Host to take over when the active Bastion Host fails.
- A DaemonSet running on each Kubernetes worker node ensures that the worker node always retrieves container images and Helm charts from the active Bastion Host.
- Container images, Helm charts, and other files essential to CNE internal operations are synchronized from the active Bastion Host to the standby Bastion Host periodically. This way, when an active Bastion Host fails, the standby appears as an identical replacement upon switchover.
3.5 Observability
CNE provides services to capture metrics, logs, and traces for both itself and the cloud native applications. You can use the observability data for problem detection, troubleshooting, and preventive maintenance. CNE also includes tools to filter, view, and analyze the observability data.
3.5.1 Metrics
Metrics collection
CNE collects metrics from the following sources:
- The node-exporter service collects the hardware and operating system (OS) metrics exposed by the Linux OS on the servers (Bare Metal) or VMs (virtualized) that host the Kubernetes cluster.
- The kube-state-metrics service collects information about the state of all Kubernetes objects. Since Kubernetes uses objects to store information about all nodes, deployments, and pods in the Kubernetes cluster, kube-state-metrics effectively captures metrics on all aspects of the Kubernetes cluster.
- All of the CNE services generate metrics that Prometheus collects and stores.
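CNE ships the Prometheus Operator (see Table 3-1), under which a service typically becomes a scrape target through a ServiceMonitor resource. The following is a hedged sketch with illustrative names; whether a given application registers this way depends on its chart:

```yaml
# Hypothetical ServiceMonitor that tells Prometheus to scrape a service.
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: example-nf-metrics   # illustrative name
spec:
  selector:
    matchLabels:
      app: example-nf        # match the Service that exposes metrics
  endpoints:
    - port: metrics          # named port on the Service
      path: /metrics         # metrics endpoint path
      interval: 30s          # scrape interval
```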
Metrics storage
Prometheus stores the application metrics and CNE metrics in an internal time-series database.
Metrics filtering and viewing
Grafana allows you to view and filter metrics from Prometheus. CNE offers several default Grafana dashboards, which you can clone and customize to suit your requirements. For more information about these default dashboards, see CNE Grafana Dashboards.
For more information about CNE Metrics, see Metrics.
3.5.2 Alerts
CNE uses AlertManager to raise alerts. These alerts notify the user when any of the common services are in an abnormal condition. CNE delivers alerts as Simple Network Management Protocol (SNMP) traps to an external SNMP trap manager. For the definition and description of each CNE alert, see the Alerts section.
Applications deployed on CNE can define their own alerts to inform the user of problems specific to each application. For instructions on loading application alerting rules, see Maintenance Procedures.
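With the Prometheus Operator model, application alerting rules are commonly packaged as a PrometheusRule resource; the following is a minimal sketch with an illustrative metric, threshold, and severity:

```yaml
# Hypothetical PrometheusRule defining one application alert.
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: example-nf-alerts    # illustrative name
spec:
  groups:
    - name: example-nf.rules
      rules:
        - alert: ExampleNfHighErrorRate
          expr: 'rate(http_requests_total{code=~"5.."}[5m]) > 0.05'  # illustrative
          for: 5m
          labels:
            severity: major
          annotations:
            summary: "Example NF 5xx rate above threshold for 5 minutes"
```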
3.5.3 Logs
Logs collection
Fluentd OpenSearch collects logs from the applications installed in CNE and from the CNE services.
Logs storage
CNE stores its logs in Oracle OpenSearch. Each log message is written to an OpenSearch index. The source of a log message determines the index to which the record is written:
- Logs generated by applications in CNE are stored in the current day's application index, named logstash-YYYY.MM.DD.
- Logs generated by CNE services are stored in the current day's CNE index, named occne-logstash-YYYY.MM.DD.
CNE creates new log indices each day. Each day's indices are uniquely named by appending the current date to the index name as a suffix. For example, all application logs generated on November 13, 2021, are written to the logstash-2021.11.13 index. By default, indices are retained for seven days.
Only current-day indices are writable, and they are configured for efficient storage of new log messages. Current-day indices are called "hot" indices and are stored on "hot" data nodes. At the end of each day, new current-day indices are created. At the same time, the previous day's indices are marked read-only, so that no new log messages are written to them, and are compressed for more efficient storage. These previous-day indices are considered "warm" indices and are stored on "warm" data nodes.
Log filtering and viewing
Use Oracle OpenSearch Dashboard to filter and view the logs available in OpenSearch.
3.5.4 Traces
Trace collection
Note:
- Jaeger captures only a small percentage of all application message flows. The default capture rate is 0.01% (1 in 10,000 message flows).
- Traces are not collected from CNE services.
Trace storage
CNE stores its traces in Oracle OpenSearch. The applications deployed in CNE generate traces and store them in the jaeger-trace-YYYY.MM.DD index. CNE creates a new Jaeger index each day. For example, all traces generated by applications on November 13, 2021, are written to the jaeger-trace-2021.11.13 index.
Trace filtering and viewing
Use Oracle OpenSearch Dashboard to filter and view the traces available in Oracle OpenSearch.
3.6 Maintenance Access
To access the Kubernetes cluster for maintenance purposes, CNE provides two Bastion Hosts. Each Bastion Host provides command-line access to several troubleshooting and maintenance tools such as kubectl, Helm, and Jenkins.
3.7 Frequently Used Common Services
This section lists some of the frequently used common services and the versions supported by CNE 23.4.x. You can find the complete list of third-party services in the dependencies TGZ file provided as part of the software delivery package.
Table 3-1 Frequently Used Common Services
Common Service | Supported Version |
---|---|
AlertManager | 0.25.0 |
Grafana | 9.5.3 |
Prometheus | 2.44.0 |
Calico | 3.25.2 |
cert-manager | 1.12.4 |
Containerd | 1.7.5 |
Fluentd - OpenSearch | 1.16.2 |
HAProxy | 2.6.7 |
Helm | 3.12.3 |
Istio | 1.18.2 |
Jaeger | 1.45.0 |
Kubernetes | 1.27.5 |
Kyverno | 1.9.0 |
MetalLB | 0.13.11 |
Oracle OpenSearch | 2.3.0 |
Oracle OpenSearch Dashboard | 2.3.0 |
Prometheus Operator | 0.65.1 |