24 Managing ECE Pods

Learn how to manage the Elastic Charging Engine (ECE) pods in your Oracle Communications Billing and Revenue Management (BRM) cloud native environment.

Scaling Kubernetes Pods

Kubernetes pods that are created as part of the deployment can be scaled up or down. By default, three ECE server replicas are created during the installation process.

Note:

Kubernetes pods can be scaled only if the partitions are balanced.

To scale a Kubernetes pod, run this command:

kubectl scale statefulsets componentName --replicas=newReplicaCount

If scaling doesn't occur, check the partitionUnbalanced count under Coherence.service.partitionUnbalanced for all cache services.
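As an illustration, the following sketch scales the ecs StatefulSet to five replicas and then verifies the result. The component name ecs, the replica count, and the namespace BrmNameSpace are placeholder values; substitute the names used in your deployment.

```shell
# Scale the ecs StatefulSet to five replicas (example values).
kubectl scale statefulsets ecs --replicas=5 -n BrmNameSpace

# Confirm that the desired and ready replica counts match
# before routing traffic to the new pods.
kubectl get statefulsets ecs -n BrmNameSpace
```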

Setting up Autoscaling of ECE Pods

You can use the Kubernetes Horizontal Pod Autoscaler to automatically scale up or scale down the number of ECE pod replicas based on a pod's CPU or memory utilization. In BRM cloud native deployments, the Horizontal Pod Autoscaler monitors and scales these ECE pods:

  • ecs

  • ecs1

  • httpgateway

Changing the number of replicas in an ECE autoscalable ReplicaSet results in a re-balancing of the in-memory cache distribution across the replicas. This re-balancing activity consumes incremental CPU and memory resources and can take multiple seconds to complete. Therefore, an ECE autoscaling design should attempt to strike a balance between optimizing infrastructure resource usage and minimizing changes to the number of replicas in a ReplicaSet due to autoscaling.

Note:

Enabling autoscaling of ECE pods in a production environment should be preceded by comprehensive validation of all scenarios expected to trigger autoscaling (scale up and scale down). It is recommended that this validation be performed in a demonstration or test environment using infrastructure equivalent to the target production infrastructure. In addition, monitoring the frequency of autoscaling is recommended to detect flapping conditions so that adjustments can be incorporated to avoid flapping.

To set up and enable autoscaling for ECE pods:

  1. Ensure that your ECE cluster is set up and the system is in the UsageProcessing state.

    Note:

    Do not enable Horizontal Pod Autoscaler for your ECE cluster until ECE reaches the UsageProcessing state. Enabling it during customer or balance data loading could lead to customer load failure due to re-balancing of the in-memory cache.

  2. Open your override-values.yaml file for oc-cn-ece-helm-chart.

  3. Enable the Horizontal Pod Autoscaler in ECE by setting the charging.hpaEnabled key to true:

    charging:
       hpaEnabled: "true"
  4. Specify the memory and CPU usage for each supported ECE pod. To do so, set the required keys under the ecs, ecs1, and httpgateway sections:

    • maxReplicas: Set this to the maximum number of pod replicas to deploy when scale up is triggered.

      If a pod's average utilization goes above averageCpuUtilization or averageMemoryUtilization, the Horizontal Pod Autoscaler increases the number of pod replicas up to this maximum count.

    • averageCpuUtilization: Set this as a target or threshold for average CPU usage across all of the pod's replicas with the same entry point. For example, if a cluster has four ecs pod replicas and one ecs1 pod replica, the average will be the sum of CPU usage divided by five. The default is 70% for ecs.

      The autoscaler increases or decreases the number of ecs or httpgateway pod replicas to maintain the average CPU utilization you specified across all pods.

      Note:

      Only the ecs pod and httpgateway pod (with NRF disabled) will be scaled up and down.

    • averageMemoryUtilization: Set this as a target or threshold for average resource consumption across all of the pod's replicas, such as 1 Gi. For example, if a cluster has four ecs pod replicas and one ecs1 pod replica, the average will be the sum of memory utilization divided by five.

      The autoscaler increases or decreases the number of ecs or httpgateway pod replicas to maintain the average memory utilization you specified across all pods.

      Note:

      Only the ecs pod and httpgateway pod (with NRF disabled) will be scaled up and down.

    • cpuLimit: Set this to the maximum amount of CPU that a pod can utilize.

    • cpuRequest: Set this to the minimum CPU amount, in milli-cores, that must be available in a Kubernetes node to deploy a pod. For example, enter 1000m for 1 CPU core.

      If the minimum CPU amount is not available, the pod's status is set to Pending.

    • memoryLimit: Set this to the maximum amount of memory that a pod can utilize. The default is 3 Gi for the ecs pod.

    • memoryRequest: Set this to the minimum amount of memory required for a Kubernetes node to deploy a pod. The default is 2 Gi for the ecs pod.

      If the minimum amount is not available, the pod's status is set to Pending.

    • scaleDownStabilizationWindowSeconds: Specifies the duration, in seconds, of the stabilization window when scaling down pods. Oracle recommends using a value of 120 seconds or more.

    • disableHpaScaleDown: Set this to true to prevent the Horizontal Pod Autoscaler from scaling down the pod.

    This shows sample entries for the httpgateway pod:

    httpgateway:
       httpgatewayList:
          - coherenceMemberName: "httpgateway1"
            maxReplicas: 3
            averageCpuUtilization: 70
            averageMemoryUtilization: ""
            cpuLimit: 2000m
            cpuRequest: 1000m
            memoryLimit: 3Gi
            memoryRequest: 1Gi
            scaleDownStabilizationWindowSeconds: 120
            disableHpaScaleDown: "false"
    
          - coherenceMemberName: "httpgateway2"
            maxReplicas: 3
            averageCpuUtilization: 70
            averageMemoryUtilization: ""
            cpuLimit: 2000m
            cpuRequest: 1000m
            memoryLimit: 3Gi
            memoryRequest: 1Gi
            scaleDownStabilizationWindowSeconds: 120
            disableHpaScaleDown: "false"
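For comparison, a minimal sketch of the corresponding entries for the ecs pod is shown below. The exact nesting under charging.ecs is an assumption inferred from the charging.hpaEnabled key shown in step 3, and the specific numbers (a maximum of 3 replicas, a 70% CPU target, and the resource sizes) are illustrative, not recommendations:

```yaml
charging:
   ecs:
      maxReplicas: 3                 # illustrative maximum replica count
      averageCpuUtilization: 70      # scale when average CPU exceeds 70%
      averageMemoryUtilization: ""   # empty: scale on CPU only
      cpuLimit: 2000m
      cpuRequest: 1000m
      memoryLimit: 3Gi               # documented default for the ecs pod
      memoryRequest: 2Gi             # documented default for the ecs pod
      scaleDownStabilizationWindowSeconds: 120
      disableHpaScaleDown: "false"
```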
  5. To lower the heap memory used by the ECE pods, set the appropriate JVM garbage collection (GC) parameters in the jvmGCOpts key.

    Memory-based scale down occurs only if the amount of pod memory decreases. You can decrease pod memory by using JVM garbage collection (GC). For more information about JVM GC, see the "Java Garbage Collection Basics" tutorial.
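As a sketch, a jvmGCOpts entry that uses the standard HotSpot G1 collector and encourages the JVM to shrink the heap more aggressively might look like the following. The flag selection is an illustrative assumption, not a prescribed configuration, and the placement of jvmGCOpts under the charging section is also assumed:

```yaml
charging:
   # Illustrative G1 settings; smaller free-ratio bounds make the JVM
   # return unused heap sooner, which helps memory-based scale down.
   jvmGCOpts: "-XX:+UseG1GC -XX:MaxGCPauseMillis=200 -XX:MinHeapFreeRatio=10 -XX:MaxHeapFreeRatio=20"
```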

  6. Under the ecs, ecs1, and httpgateway sections, set the replicas key based on your configured Horizontal Pod Autoscaler values. For example, the number of replicas should meet the average resource consumption requirements you set in averageMemoryUtilization.

    This prevents the autoscaler from scaling down the ECE pods during the Helm upgrade, which could result in cache data loss.
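For example, if your averageMemoryUtilization target requires at least two ecs replicas, you might pin the starting replica count accordingly. The nesting under charging.ecs and the value 2 are illustrative assumptions:

```yaml
charging:
   ecs:
      replicas: 2   # illustrative; choose a count that satisfies your HPA targets
```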

  7. Save and close your override-values.yaml file.

  8. Enable Horizontal Pod Autoscaler in ECE by running the helm upgrade command for oc-cn-ece-helm-chart:

    helm upgrade EceReleaseName oc-cn-ece-helm-chart --namespace BrmNameSpace --values OverrideValuesFile

    where:

    • EceReleaseName is the release name for oc-cn-ece-helm-chart and is used to track this installation instance.

    • BrmNameSpace is the namespace in which the BRM Kubernetes objects reside.

    • OverrideValuesFile is the path to the YAML file that overrides the default configurations in the values.yaml file.
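With hypothetical values filled in (release name ece, namespace brm, and an override file in the current directory), the command might look like this:

```shell
# Example only: substitute your own release name, namespace, and file path.
helm upgrade ece oc-cn-ece-helm-chart --namespace brm --values override-values.yaml
```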

Rolling Restart of ECE Pods

You can force a rolling restart of any ECE pod. If you restart a pod with multiple replicas, the pod replicas are restarted in reverse order. For example, if the ecs pod contains three replicas, the replicas are restarted in this order: 3, 2, 1.

To force a rolling restart of one or more ECE pods:

  1. In your override-values.yaml file for oc-cn-ece-helm-chart, increment the appropriate pod's restartCount key by 1. For example, if the key was set to 3, you would increment it to 4.

    Table 24-1 lists the keys to use for restarting each ECE pod.

    Table 24-1 Keys for Restarting ECE Pods

    ECE Pod               Key
    ecs                   charging.ecs.restartCount
    pricingupdater        charging.pricingupdater.restartCount
    customerupdater       customerUpdater.customerUpdaterList.[N].restartCount (1)
    emgateway             emgateway.emgatewayList.[N].restartCount (1)
    diametergateway       diametergateway.diametergatewayList.[N].restartCount (1)
    httpgateway           httpgateway.httpgatewayList.[N].restartCount (1)
    brmgateway            brmgateway.brmgatewayList.[N].restartCount (1)
    radiusgateway         radiusgateway.radiusgatewayList.[N].restartCount (1)
    ratedeventformatter   ratedEventFormatter.ratedEventFormatterList.[N].restartCount (1)
    monitoringagent       monitoringAgent.monitoringAgentList.[N].restartCount (1)

    Notes:

    (1) N represents the index of the item in the block list; each list item is indicated by a dash (-) in the override-values.yaml file.
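For example, to restart only the second emgateway member, you would increment the restartCount in the second dash-prefixed list item. The member names and counts here are illustrative:

```yaml
emgateway:
   emgatewayList:
      - coherenceMemberName: "emgateway1"
        restartCount: 1   # unchanged; this member is not restarted
      - coherenceMemberName: "emgateway2"
        restartCount: 2   # was 1; incremented to trigger a rolling restart
```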

  2. Perform a helm upgrade to update the Helm release:

    helm upgrade EceReleaseName oc-cn-ece-helm-chart --values OverrideValuesFile -n BrmNameSpace