Critical Memory Switchover

You can configure a high availability deployment of the ESBC to switch to the standby when the system detects memory utilization that is persistently high. Over-utilization of memory can trigger a system crash. This function reduces the risk of those crashes.

The ESBC includes an alarm mechanism that raises alarms and sends traps when memory utilization exceeds your configured critical threshold. This function extends that mechanism to trigger an HA switchover when memory utilization exceeds the critical threshold for two consecutive alarm iterations. Each alarm iteration takes 15 seconds, so the switchover occurs once the condition has persisted for a maximum of 30 seconds.

The ESBC uses the existing health score function to trigger this switchover. When the system experiences two consecutive memory alarm conditions, it subtracts 100 from the health score of the active machine. A reduction of this size exceeds any HA trigger setting and causes the system to switch over to the standby. The ESBC also uses the existing alarm and trap implemented for other alarm threshold functions.
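The health score logic described above can be sketched as a small simulation. This is only an illustrative model, not the ESBC implementation: the class and function names, the starting health score of 100, and the one-time application of the penalty are assumptions made for the example. The 15-second interval, the two consecutive iterations, and the subtraction of 100 come from the text.

```python
from dataclasses import dataclass

ALARM_INTERVAL_SECONDS = 15      # duration of one alarm iteration (from the text)
CONSECUTIVE_ALARMS_REQUIRED = 2  # consecutive critical iterations before the penalty
HEALTH_PENALTY = 100             # amount subtracted from the health score


@dataclass
class ActiveNode:
    """Hypothetical model of the active machine in an HA pair."""
    health_score: int = 100      # assumed starting score, for illustration only
    consecutive_alarms: int = 0
    penalized: bool = False

    def check(self, utilization_pct: float, critical_threshold_pct: float) -> None:
        """Process one alarm iteration."""
        if utilization_pct > critical_threshold_pct:
            self.consecutive_alarms += 1
        else:
            # A non-critical iteration breaks the streak.
            self.consecutive_alarms = 0
        if (self.consecutive_alarms >= CONSECUTIVE_ALARMS_REQUIRED
                and not self.penalized):
            # Assumption: the penalty is applied once per critical episode.
            self.health_score -= HEALTH_PENALTY
            self.penalized = True


def simulate(samples, critical_threshold_pct=80):
    """Feed a sequence of utilization samples, one per alarm iteration."""
    node = ActiveNode()
    for pct in samples:
        node.check(pct, critical_threshold_pct)
    return node
```

In this model, simulate([85, 90]) leaves the node with a reduced health score, while simulate([85, 70, 85]) does not, because the two critical samples are not consecutive.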

To configure this switchover function, set the critical-memory-abort option in the system-config to switchover.

ORACLE(system-config)# options +critical-memory-abort=switchover

Valid values for this option include enabled, disabled, and switchover.

If you type options and then the option value without the plus sign, you overwrite any previously configured options. To add a new option to an options list, prepend a plus sign to the new option, as shown above.

This function requires that you configure the critical memory alarm-threshold so that the system has a value to use. The example below sets the critical memory alarm threshold to 80 percent.

ORACLE(system-config)# alarm-threshold
alarm-threshold
 type        memory
 volume
 severity    critical
 value       80

When the ESBC first reaches this critical value, the system:

  • Raises an ACME Alarm
  • Issues an SNMP trap

The alarm and trap remain valid as long as the system remains in a critical state. If the ESBC reaches this critical value twice within 30 seconds, the system decreases its health score by 100, triggering a switchover to the new Active.

At this point, the original Active is not available for any ensuing switchover. After this switchover, however, you can reboot the ESBC on the former active device. This reboot clears memory and resets the health score, making the new standby available for any ensuing switchover.

The system does not automatically clear the alarm or issue a clear trap for this feature. To clear this alarm, run the clear-alarm command manually or reboot the system.

Note:

If you configure this option on a system that is not operating in an HA deployment, the ESBC stops processing calls after it reaches the critical memory threshold.

Switchover Behavior During Traffic Spikes

The ESBC includes an additional check to ensure that this switchover does not take place when excessive memory utilization results from a temporary traffic spike. To accomplish this, the system stores the exact memory usage (excluding the free list) when it first hits the critical memory threshold. The system then compares this stored value with the memory usage in subsequent iterations to determine whether the system remains over the critical memory threshold. The system triggers the switchover in the following cases to ensure a temporary traffic spike did not cause that first detection:

  1. The stored memory value from the first iteration is less than the subsequent iteration value.
  2. The subsequent iteration value is greater than the configured critical threshold value.
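The comparison above can be sketched as follows. This is a hedged illustration, not the ESBC code: the function name and parameters are hypothetical, all values are assumed to be in the same unit, and the sketch assumes that both listed conditions must hold together, which the text does not state explicitly.

```python
def should_switch_over(stored_usage, current_usage, critical_threshold):
    """Decide whether a subsequent iteration confirms a persistent condition.

    stored_usage:       usage recorded when the threshold was first hit
    current_usage:      usage measured in the subsequent iteration
    critical_threshold: configured critical alarm-threshold value
    All three arguments are assumed to share one unit.
    """
    # Condition 1: usage has grown since the first detection.
    still_growing = stored_usage < current_usage
    # Condition 2: usage is still above the configured critical threshold.
    still_critical = current_usage > critical_threshold
    # Assumption: both conditions are required to rule out a temporary spike.
    return still_growing and still_critical
```

Under this reading, a spike that has already receded (current usage at or below the stored value) does not trigger the switchover, even if usage is still above the threshold.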

Related Configuration

This feature interacts with the memory-utilization-threshold and heap-threshold memory options in the system-config.

  • The system allows you to configure the value of the alarm-threshold of severity critical in conjunction with the heap-threshold option. This feature, however, always refers to the critical alarm-threshold to trigger its functionality. The system uses the heap-threshold memory options in the system-config to determine whether it can accept new SIP requests.
  • On the SLB, the system considers the minimum of the heap-threshold and memory-utilization-threshold values to determine when to stop throttling traffic. The lower of these values becomes the value for both.
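The SLB rule in the last item amounts to taking the lower of the two configured values. A minimal sketch, assuming both thresholds are expressed as percentages (the function name is hypothetical and not part of the product):

```python
def effective_slb_threshold(heap_threshold_pct, memory_utilization_threshold_pct):
    """Return the single value that applies to both options on the SLB.

    Per the text, the lower of the two configured values becomes the
    value for both.
    """
    return min(heap_threshold_pct, memory_utilization_threshold_pct)
```

For example, with a heap-threshold of 90 and a memory-utilization-threshold of 75, both options would effectively use 75.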