4 Resource Utilization

CNE applies resource constraints, such as CPU and RAM limits, to each common service. These constraints help ensure that the services do not consume excess resources.

During the initial CNE deployment, each service is given an initial CPU and RAM allocation. While it runs, each service can consume each resource (CPU and RAM) only up to a specified upper limit.
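
These allocations correspond to the Kubernetes resource requests and limits set on each service's pods. The following is a minimal sketch for reading the configured values from a running cluster, assuming kubeconfig access and the Kubernetes Python client; the occne-infra namespace name is an assumption and may need to be adjusted for your deployment:

```python
# Sketch: list the configured CPU/RAM requests and limits of the common
# services. Assumes the kubernetes Python client is installed and that the
# services run in the "occne-infra" namespace (a placeholder; adjust as needed).
from kubernetes import client, config

config.load_kube_config()          # use the current kubeconfig context
v1 = client.CoreV1Api()

for pod in v1.list_namespaced_pod(namespace="occne-infra").items:
    for c in pod.spec.containers:
        requests = c.resources.requests or {}
        limits = c.resources.limits or {}
        print(f"{pod.metadata.name}/{c.name}: "
              f"cpu {requests.get('cpu')}/{limits.get('cpu')}, "
              f"memory {requests.get('memory')}/{limits.get('memory')}")
```

Services whose initial allocation and upper limit are intentionally equal, as described next, print the same value for both.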

For services whose resource consumption is expected to stay at the initial allocation, or where raising the CPU or RAM limits underneath a running application could cause a service disruption, the initial allocation and the upper limit are set to the same value. The resource requests and limits are listed in the following table:

Note:

Observability tools such as Prometheus, OpenSearch, Fluentd, and Jaeger perform resource-intensive operations. Even a small change in the volume of events they process can increase resource consumption exponentially. For example:
  • Changing the log severity level from WARN to INFO dramatically increases CPU and memory usage.
  • Adding new metric labels or enabling new monitors has a significant impact on metric ingestion performance and can cause resource starvation.
  • Adding more traces to Jaeger spikes the ingestion rate in OpenSearch and can starve the data nodes.
Therefore, for observability tools, the limits provided in the following table must not be treated as production-ready values, because suitable values vary greatly with the workload on CNE. It is strongly recommended to consult the workload teams for indicators of appropriate observability resource allocations. One way to gauge actual consumption on a running cluster is sketched after Table 4-1.

Table 4-1 CPU and RAM Resource Requests and Limits

Service | CPU Initial Request (m) | CPU Limit (m) | RAM Initial Request (Mi) | RAM Limit (Mi) | Instances
Prometheus | 2000 | 4000 | 16384 | 16384 | 2
Prometheus Node Exporter | 800 | 800 | 512 | 512 | 1 per node
Prometheus Operator | 100 | 200 | 100 | 200 | 1
Prometheus AlertManager | 20 | 20 | 64 | 64 | 2
Prometheus Kube State Metrics | 20 | 20 | 32 | 100 | 1
Promxy | 100 | 100 | 512 | 512 | 1
OpenSearch Master | 1000 | 1000 | 100 | 2048 | 3
OpenSearch Data | 1000 | 4000 | 16384 | 101904 (100Gi) | 7
OpenSearch Client | 1000 | 3000 | 100 | 32609 (32Gi) | 3
OpenSearch Dashboard | 100 | 100 | 512 | 512 | 1
occne-metrics-server | 100 | 100 | 200 | 200 | 1
occne-alertmanager-snmp-notifier | 100 | 100 | 128 | 128 | 1
Fluentd OpenSearch | 100 | 500 | 128 | 12228 (12Gi) | 1 per worker node
Jaeger Collector | 500 | 1250 | 512 | 1024 | 1
Jaeger Query | 256 | 500 | 128 | 512 | 1
MetalLB Controller (for CNLB disabled CNE only) | 100 | 100 | 100 | 100 | 1
MetalLB Speaker (for CNLB disabled CNE only) | 100 | 100 | 100 | 100 | 1 per worker node
LB Controller (vCNE only; for CNLB disabled CNE only) | 10 | 500 | 128 | 1024 | 1
Egress Controller (for CNLB disabled CNE only) | 100 | 1000 | 200 | 500 | 1 per worker node
Bastion Controller | 10 | 200 | 128 | 256 | 1
Kyverno | 100 | 200 | 256 | 512 | 3
CNLB-App (for CNLB enabled CNE only) | 500 | 4000 | 1024 | 1024 | 4
CNLB-Manager (for CNLB enabled CNE only) | 500 | 4000 | 1024 | 1024 | 1
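
As noted above, suitable observability limits depend heavily on workload, so the live consumption of these pods is the most useful sizing indicator. The following is a minimal sketch for reading live usage, assuming the metrics server (occne-metrics-server in Table 4-1) exposes the standard metrics.k8s.io API; the occne-infra namespace name is again a placeholder:

```python
# Sketch: read live CPU/memory usage per container through the metrics.k8s.io
# API served by the metrics server, to compare against the limits in Table 4-1.
# The "occne-infra" namespace is a placeholder; adjust it for your deployment.
from kubernetes import client, config

config.load_kube_config()
metrics_api = client.CustomObjectsApi()

usage = metrics_api.list_namespaced_custom_object(
    group="metrics.k8s.io", version="v1beta1",
    namespace="occne-infra", plural="pods")

for pod in usage["items"]:
    for c in pod["containers"]:
        print(f"{pod['metadata']['name']}/{c['name']}: "
              f"cpu={c['usage']['cpu']} memory={c['usage']['memory']}")
```

Observed usage that sits close to the configured limits over time is a sign that the allocation for that tool should be revisited with the workload teams.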

The overall common services resource usage varies from one worker node to another. The common services listed in Table 4-1 are distributed evenly across all worker nodes in the Kubernetes cluster.
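
One way to check how evenly the requests in Table 4-1 are spread is to aggregate them per node. The following sketch reuses the assumptions above (kubeconfig access, Kubernetes Python client, placeholder occne-infra namespace):

```python
# Sketch: aggregate the CPU and memory requests of the common services per
# worker node to check how evenly they are spread. The "occne-infra" namespace
# is a placeholder; adjust it for your deployment.
from collections import defaultdict
from kubernetes import client, config

def cpu_millicores(q):
    # Convert a Kubernetes CPU quantity ("250m" or "2") to millicores.
    return int(q[:-1]) if q.endswith("m") else int(float(q) * 1000)

def mem_mebibytes(q):
    # Convert a memory quantity ("128Mi", "12Gi", or plain bytes) to MiB;
    # other suffixes are not expected for these services.
    if q.endswith("Gi"):
        return int(q[:-2]) * 1024
    if q.endswith("Mi"):
        return int(q[:-2])
    return int(q) // (1024 * 1024)

config.load_kube_config()
v1 = client.CoreV1Api()
per_node = defaultdict(lambda: {"cpu_m": 0, "mem_mi": 0})

for pod in v1.list_namespaced_pod(namespace="occne-infra").items:
    node = pod.spec.node_name or "unscheduled"
    for c in pod.spec.containers:
        req = c.resources.requests or {}
        if "cpu" in req:
            per_node[node]["cpu_m"] += cpu_millicores(req["cpu"])
        if "memory" in req:
            per_node[node]["mem_mi"] += mem_mebibytes(req["memory"])

for node, totals in sorted(per_node.items()):
    print(f"{node}: {totals['cpu_m']}m CPU, {totals['mem_mi']}Mi RAM requested")
```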