4 Resource Utilization
CNE constrains the resources, such as CPU and RAM, that each common service can use. These constraints help ensure that no service consumes excess resources.
During initial CNE deployment, each service is given an initial CPU and RAM allocation. While it runs, each service is allowed to consume each resource (CPU and RAM) up to a specified upper limit.
For services whose resource consumption limit stays the same as the initial allocation, or where raising the CPU or RAM limit underneath a running application can cause service disruption, the initial allocation and the upper limit are set to the same value. The resource requests and limits are provided in Table 4-1.
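As a minimal sketch of how an initial request and an upper limit relate, the following Python snippet builds a Kubernetes-style `resources` mapping from one row of Table 4-1. It assumes that the `m` and `Mi` columns correspond to the standard Kubernetes millicore and mebibyte units. The helper names `resources_stanza` and `is_fixed` are hypothetical and are not part of CNE.

```python
def resources_stanza(cpu_request_m, cpu_limit_m, ram_request_mi, ram_limit_mi):
    """Build a Kubernetes-style `resources` mapping (hypothetical helper).

    CPU values are in millicores (m), RAM values in mebibytes (Mi).
    """
    if cpu_request_m > cpu_limit_m or ram_request_mi > ram_limit_mi:
        raise ValueError("a request must not exceed its limit")
    return {
        "requests": {"cpu": f"{cpu_request_m}m", "memory": f"{ram_request_mi}Mi"},
        "limits": {"cpu": f"{cpu_limit_m}m", "memory": f"{ram_limit_mi}Mi"},
    }


def is_fixed(stanza):
    """Return True when the requests and the limits are identical."""
    return stanza["requests"] == stanza["limits"]


# Values copied from Table 4-1.
alertmanager = resources_stanza(20, 20, 64, 64)            # request == limit
jaeger_collector = resources_stanza(500, 1250, 512, 1024)  # room to grow

print(is_fixed(alertmanager))      # True
print(is_fixed(jaeger_collector))  # False
```

A service such as Prometheus AlertManager, whose request and limit match, never grows beyond its initial allocation, while Jaeger Collector can grow from its request up to its limit.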
Note:
Observability tools such as Prometheus, OpenSearch, Fluentd, and Jaeger perform resource-intensive operations. Even a small change in event volume can increase resource consumption exponentially. For example:
- Changing the log severity level from WARN to INFO increases CPU and memory usage dramatically.
- Adding new metric labels or enabling new monitors has a big impact on the performance of metric ingestion and can cause resource starvation.
- Adding more traces to Jaeger spikes the ingestion rate in OpenSearch and causes the data nodes to starve.
Table 4-1 CPU and RAM Resource Requests and Limits
Service | CPU Initial Request (m) | CPU Limit (m) | RAM Initial Request (Mi) | RAM Limit (Mi) | Instances |
---|---|---|---|---|---|
Prometheus | 2000 | 4000 | 16384 | 16384 | 2 |
Prometheus Node Exporter | 800 | 800 | 512 | 512 | 1 per node |
Prometheus Operator | 100 | 200 | 100 | 200 | 1 |
Prometheus AlertManager | 20 | 20 | 64 | 64 | 2 |
Prometheus Kube State Metrics | 20 | 20 | 32 | 100 | 1 |
Promxy | 100 | 100 | 512 | 512 | 1 |
OpenSearch Master | 1000 | 1000 | 100 | 2048 | 3 |
OpenSearch Data | 1000 | 4000 | 16384 | 101904 (100Gi) | 7 |
OpenSearch Client | 1000 | 3000 | 100 | 32609 (32Gi) | 3 |
OpenSearch Dashboard | 100 | 100 | 512 | 512 | 1 |
occne-metrics-server | 100 | 100 | 200 | 200 | 1 |
occne-alertmanager-snmp-notifier | 100 | 100 | 128 | 128 | 1 |
Fluentd OpenSearch | 100 | 500 | 128 | 12228 (12Gi) | 1 per worker node |
Jaeger Collector | 500 | 1250 | 512 | 1024 | 1 |
Jaeger Query | 256 | 500 | 128 | 512 | 1 |
MetalLB Controller | 100 | 100 | 100 | 100 | 1 |
MetalLB Speaker | 100 | 100 | 100 | 100 | 1 per worker node |
LB Controller (vCNE only) | 10 | 500 | 128 | 1024 | 1 |
Egress Controller | 100 | 1000 | 200 | 500 | 1 per worker node |
Bastion Controller | 10 | 200 | 128 | 256 | 1 |
Kyverno | 100 | 200 | 256 | 512 | 3 |
The overall resource usage of the common services varies from one worker node to another. The common services listed in Table 4-1 are evenly distributed across all worker nodes in the Kubernetes cluster.
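To estimate the cluster-wide footprint of the common services, the per-instance values in Table 4-1 can be multiplied by the instance counts and summed. The following Python sketch does this under stated assumptions: "1 per node" is taken to mean every node in the cluster, "1 per worker node" to mean worker nodes only, and the node counts in the usage lines are hypothetical. The names `Service`, `SERVICES`, `instance_count`, and `cluster_totals` are illustrative and are not part of CNE.

```python
from typing import NamedTuple


class Service(NamedTuple):
    name: str
    cpu_request_m: int
    cpu_limit_m: int
    ram_request_mi: int
    ram_limit_mi: int
    instances: int  # instances per scope unit
    scope: str      # "cluster", "node" (assumed: every node), or "worker"


# Rows copied from Table 4-1.
SERVICES = [
    Service("Prometheus", 2000, 4000, 16384, 16384, 2, "cluster"),
    Service("Prometheus Node Exporter", 800, 800, 512, 512, 1, "node"),
    Service("Prometheus Operator", 100, 200, 100, 200, 1, "cluster"),
    Service("Prometheus AlertManager", 20, 20, 64, 64, 2, "cluster"),
    Service("Prometheus Kube State Metrics", 20, 20, 32, 100, 1, "cluster"),
    Service("Promxy", 100, 100, 512, 512, 1, "cluster"),
    Service("OpenSearch Master", 1000, 1000, 100, 2048, 3, "cluster"),
    Service("OpenSearch Data", 1000, 4000, 16384, 101904, 7, "cluster"),
    Service("OpenSearch Client", 1000, 3000, 100, 32609, 3, "cluster"),
    Service("OpenSearch Dashboard", 100, 100, 512, 512, 1, "cluster"),
    Service("occne-metrics-server", 100, 100, 200, 200, 1, "cluster"),
    Service("occne-alertmanager-snmp-notifier", 100, 100, 128, 128, 1, "cluster"),
    Service("Fluentd OpenSearch", 100, 500, 128, 12228, 1, "worker"),
    Service("Jaeger Collector", 500, 1250, 512, 1024, 1, "cluster"),
    Service("Jaeger Query", 256, 500, 128, 512, 1, "cluster"),
    Service("MetalLB Controller", 100, 100, 100, 100, 1, "cluster"),
    Service("MetalLB Speaker", 100, 100, 100, 100, 1, "worker"),
    Service("LB Controller (vCNE only)", 10, 500, 128, 1024, 1, "cluster"),
    Service("Egress Controller", 100, 1000, 200, 500, 1, "worker"),
    Service("Bastion Controller", 10, 200, 128, 256, 1, "cluster"),
    Service("Kyverno", 100, 200, 256, 512, 3, "cluster"),
]


def instance_count(svc, worker_nodes, total_nodes):
    """Number of running instances of a service for the given cluster size."""
    if svc.scope == "cluster":
        return svc.instances
    if svc.scope == "worker":
        return svc.instances * worker_nodes
    if svc.scope == "node":
        return svc.instances * total_nodes
    raise ValueError(f"unknown scope: {svc.scope!r}")


def cluster_totals(services, worker_nodes, total_nodes):
    """Sum requests and limits over all instances of all services."""
    totals = {"cpu_request_m": 0, "cpu_limit_m": 0,
              "ram_request_mi": 0, "ram_limit_mi": 0}
    for svc in services:
        n = instance_count(svc, worker_nodes, total_nodes)
        totals["cpu_request_m"] += n * svc.cpu_request_m
        totals["cpu_limit_m"] += n * svc.cpu_limit_m
        totals["ram_request_mi"] += n * svc.ram_request_mi
        totals["ram_limit_mi"] += n * svc.ram_limit_mi
    return totals


# Hypothetical cluster: 4 worker nodes out of 7 nodes in total.
totals = cluster_totals(SERVICES, worker_nodes=4, total_nodes=7)
for key, value in totals.items():
    print(f"{key}: {value}")
```

Dividing a total by the number of worker nodes gives only a rough per-node average, because the actual usage on each node depends on where the scheduler places the pods.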