Software Worker Threads Watchdog Timer and Health Check Trap

The Oracle Communications Session Border Controller monitors specific software threads for faults and provides the user with configurable actions to take in case of thread failure. The system registers applicable threads with this watchdog and assumes a thread has failed when it does not respond. By default, the Oracle Communications Session Border Controller generates information about the event and reboot history. For HA configurations, the system synchronizes this watchdog configuration and runs the watchdog simultaneously on both the active and standby Oracle Communications Session Border Controllers.

You can query the system to show the threads currently being monitored with the show platform health-check command. The output includes these columns:

  • Name: Name of the thread that registered with HealthCheck
  • Count: Health Count of the thread
  • State: State of the thread, one of STOPPED, RUNNING, or EXCLUDE
  • Duration: Stop Expire time in seconds. Shows 0 for RUNNING and EXCLUDE states.

ORACLE# show platform health-check
------------------------------------------------
Name             Count  STATE    DURATION
------------------------------------------------
tLrtd            3      RUNNING  0
lrtdWorkerThrea  3      RUNNING  0
dnsWorker01      3      RUNNING  0
loseld           3      RUNNING  0
npsoft           3      RUNNING  0
tFlowGdTmr       3      RUNNING  0
tLemd            3      RUNNING  0
tServiceHealth   3      RUNNING  0
tAtcpd           3      RUNNING  0
atcpd02          0      EXCLUDE  0
atcpd01          0      EXCLUDE  0
[...]
------------------------------------------------
Total Displayed: 39
------------------------------------------------
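
For monitoring scripts that capture this output, the following is a minimal Python sketch of how the four columns could be parsed and STOPPED threads picked out. It is not an Oracle tool: the names ThreadHealth, parse_health_check, and stopped_threads are hypothetical, and the sketch assumes each data row consists of exactly four whitespace-separated fields, as in the sample above.

```python
from typing import List, NamedTuple

# States listed in the column description above.
VALID_STATES = {"STOPPED", "RUNNING", "EXCLUDE"}

# Excerpt of the sample output shown above.
SAMPLE_OUTPUT = """\
ORACLE# show platform health-check
------------------------------------------------
Name Count STATE DURATION
------------------------------------------------
tLrtd 3 RUNNING 0
dnsWorker01 3 RUNNING 0
atcpd02 0 EXCLUDE 0
[...]
------------------------------------------------
Total Displayed: 39
"""


class ThreadHealth(NamedTuple):
    """One row of the health-check table (hypothetical helper type)."""
    name: str
    count: int
    state: str
    duration: int


def parse_health_check(text: str) -> List[ThreadHealth]:
    """Return the data rows, skipping the prompt, rules, header, and footer."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        # Assumption: a data row has four fields and a known state in column 3.
        if len(fields) != 4 or fields[2] not in VALID_STATES:
            continue
        name, count, state, duration = fields
        if not (count.isdigit() and duration.isdigit()):
            continue
        rows.append(ThreadHealth(name, int(count), state, int(duration)))
    return rows


def stopped_threads(rows: List[ThreadHealth]) -> List[str]:
    """Names of threads currently reported as STOPPED."""
    return [row.name for row in rows if row.state == "STOPPED"]


rows = parse_health_check(SAMPLE_OUTPUT)
print([row.name for row in rows])
print(stopped_threads(rows))
```

In this excerpt no thread is in the STOPPED state, so the second list is empty.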

When an applicable thread is not responding, the Oracle Communications Session Border Controller's default behavior includes:

  • Generate a log message
  • Issue an alarm
  • Issue an SNMP trap
  • Generate a core dump
  • Reboot

The user sets the Software Worker Threads Watchdog action with the sw-health-check-action option in the system-config, using one of the following values:

  • logonly — Generate log message only
  • logandreboot — Generate log message and reboot
  • logcoreandreboot — Generate log message, generate a core dump and reboot [default]

By default, the system checks thread status every 16 seconds. The user can change this interval with the task-health-check-time option configured in the system-config.
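
The general idea of an interval-based thread watchdog can be illustrated with a short Python sketch. This is not the Oracle implementation, and the source does not describe how a thread signals that it is responsive: the heartbeat model, the Watchdog and FakeClock classes, and the rule that a thread is unresponsive once it has been silent for longer than one interval are all assumptions made for illustration. Only the 16-second default interval comes from the text above.

```python
import time


class Watchdog:
    """Illustrative watchdog: registered threads must report periodically."""

    def __init__(self, interval_s=16.0, clock=time.monotonic):
        self.interval_s = interval_s  # 16 seconds is the documented default
        self._clock = clock           # injectable so the sketch can be tested
        self._last_seen = {}

    def register(self, name):
        """Start monitoring a thread; unregistered threads are never checked."""
        self._last_seen[name] = self._clock()

    def heartbeat(self, name):
        """Record that a registered thread is still responding."""
        if name not in self._last_seen:
            raise KeyError(f"thread {name!r} is not registered")
        self._last_seen[name] = self._clock()

    def unresponsive(self):
        """Names of threads silent for longer than one interval (assumption)."""
        now = self._clock()
        return sorted(
            name
            for name, seen in self._last_seen.items()
            if now - seen > self.interval_s
        )


class FakeClock:
    """Manually advanced clock so the example runs without waiting."""

    def __init__(self):
        self.now = 0.0

    def __call__(self):
        return self.now


clock = FakeClock()
watchdog = Watchdog(interval_s=16.0, clock=clock)
watchdog.register("worker_a")  # hypothetical thread names
watchdog.register("worker_b")

clock.now = 10.0
watchdog.heartbeat("worker_a")  # only worker_a reports in

clock.now = 20.0
print(watchdog.unresponsive())  # worker_b has been silent for 20 seconds
```

Injecting the clock keeps the sketch deterministic; a real monitor would run the check from its own timer.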

When the system identifies an unresponsive thread, it sends out the following trap: apUsbcSysThreadNotRespondingTrap. This trap is defined within the apUsbc MIB. The system sends it once by default; this value can be overridden by the trap configuration. This function does not include a clear trap.

Be aware that the tHealthCheckd task monitors only the application tasks that are registered with it. It does not monitor any platform tasks.

None of these configuration options is real-time configurable; the user must reboot the system after changing any of them.

Software Worker Thread Health Check Interval Configuration

Use this procedure to set the timing and action for the Software Worker Thread Health Check and Watchdog Timer.

  1. Access the system-config configuration element.
    ORACLE# configure terminal
    ORACLE(configure)# system
    ORACLE(system)# system-config
    ORACLE(system-config)# 
  2. Type select to begin editing the system-config object.
    ORACLE(system-config)# select
    ORACLE(system-config)#
  3. Set the task-health-check-time option to the preferred interval (in seconds).
    ORACLE(system-config)# option +task-health-check-time=10
  4. Set the sw-health-check-action option to the action the system takes on thread failure.
    ORACLE(system-config)# option +sw-health-check-action=logonly
  5. Type done to save your configuration.
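
When these settings are generated by a provisioning script, the option lines from steps 3 and 4 can be built and checked before they are sent to the device. The following Python sketch is one way to do that. The function name health_check_option_commands is hypothetical, and the requirement that the interval be a positive integer is an assumption, since the valid range is not stated here. The action values and the option syntax are taken from the procedure above.

```python
# Action values documented for sw-health-check-action.
VALID_ACTIONS = ("logonly", "logandreboot", "logcoreandreboot")


def health_check_option_commands(interval_s=None, action=None):
    """Build the system-config option lines for the watchdog settings.

    Arguments left as None are omitted, so the system keeps its default
    for that setting.
    """
    commands = []
    if interval_s is not None:
        # Assumption: the interval must be a positive whole number of seconds.
        is_int = isinstance(interval_s, int) and not isinstance(interval_s, bool)
        if not is_int or interval_s <= 0:
            raise ValueError(f"invalid interval: {interval_s!r}")
        commands.append(f"option +task-health-check-time={interval_s}")
    if action is not None:
        if action not in VALID_ACTIONS:
            raise ValueError(f"unknown action: {action!r}")
        commands.append(f"option +sw-health-check-action={action}")
    return commands


for command in health_check_option_commands(interval_s=10, action="logonly"):
    print(command)
```

Rejecting an unknown action value in the script avoids entering a misspelled option that would only be noticed after the required reboot.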