Software Watchdog Timing

The software watchdog (tHealthCheckd) checks the health status of each registered thread on an interval triggered by the software watchdog timer. The software watchdog timer triggers the software watchdog after each thread runs its own health check (task_health_check_time) on the interval you set. You can set the health check interval from the default of 5 seconds up to 120 seconds with the task_sw_health_check_time option in system-config. The setting applies to all threads on the threadHealthCheckList; you cannot set an interval per thread.

The software watchdog timer uses the following formula to determine how often it alerts the software watchdog to check the threads:

3 x task_health_check_time + 1 second.

For example, suppose you leave the health check interval at the default value of 5 seconds. The software watchdog timer multiplies 5 seconds by 3 to allow the threads up to 3 attempts, at 5-second intervals, to get a reference count, for a total of 15 seconds. It then adds 1 second to that total and directs the software watchdog to check the threads every 16 seconds. Because some threads might respond on the first attempt and others might not respond until the second or third attempt, the system allows the full 15 seconds to pass before triggering the software watchdog. The extra second ensures that each thread can use the full 15 seconds to report its health before the software watchdog begins its check.
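
The arithmetic above can be sketched in a few lines of Python. This is only an illustration of the formula, not the system's implementation: the function name watchdog_period and the constants are hypothetical, and treating 5 seconds as the lower bound is an assumption based on the stated range (default of 5 seconds up to 120 seconds).

```python
# Illustrative sketch only: names and constants are hypothetical,
# not part of the product.
MIN_INTERVAL_S = 5    # default value; assumed here to also be the minimum
MAX_INTERVAL_S = 120  # documented upper bound
ATTEMPTS = 3          # attempts each thread gets to report its health
EXTRA_S = 1           # extra second added after the last attempt


def watchdog_period(task_health_check_time: int = MIN_INTERVAL_S) -> int:
    """Return the seconds between software watchdog checks.

    Applies the formula: 3 x task_health_check_time + 1 second.
    """
    if not MIN_INTERVAL_S <= task_health_check_time <= MAX_INTERVAL_S:
        raise ValueError(
            f"interval must be {MIN_INTERVAL_S}-{MAX_INTERVAL_S} seconds"
        )
    return ATTEMPTS * task_health_check_time + EXTRA_S


if __name__ == "__main__":
    for interval in (5, 30, 120):
        print(f"{interval:>3} s interval -> check every "
              f"{watchdog_period(interval)} s")
```

With the default of 5 seconds the function returns 16, matching the example above. At the maximum of 120 seconds it returns 361, so the software watchdog would check the threads roughly every 6 minutes.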

Note:

The software watchdog monitors threads on both the Active and the Standby in an HA pair.