12.8 Slow Performance, Skipped Checks, and Timeouts

Follow these procedures to fix slow performance and other issues.

When Oracle ORAchk and Oracle EXAchk run commands, a child process is spawned to run the command and a watchdog daemon monitors the child process. If the child process is slow or hung, then the watchdog kills the child process and the check is registered as skipped.

The watchdog.log file also contains entries similar to "killing stuck command".

Depending on the cause of the problem, you may not see skipped checks.
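One way to confirm that the watchdog has been killing checks is to search watchdog.log for the message quoted above. The following is a minimal sketch, not part of the tool: the helper names count_killed and show_killed are invented for illustration, and the location of watchdog.log depends on your collection directory, so pass the path explicitly.

```shell
#!/bin/sh
# Hypothetical helpers for inspecting watchdog.log.
# The search string comes from the log entries described above;
# the match is case-insensitive because the exact wording may vary.

# count_killed LOGFILE: print the number of lines that report a killed command.
count_killed() {
    grep -c -i "killing stuck command" "$1"
}

# show_killed LOGFILE: print the matching lines, prefixed with their line numbers.
show_killed() {
    grep -n -i "killing stuck command" "$1"
}

# Example (the path is an assumption, adjust it to your environment):
#   show_killed /path/to/watchdog.log
```

A nonzero count suggests that timeouts are the cause of the skipped checks, in which case the steps below apply.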

  1. Determine if there is a pattern to what is causing the problem.
    • EBS checks, for example, depend on the amount of data present and may take longer than the default timeout.

    • If there are prompts in the remote profile, then remote checks time out and are killed and skipped. Oracle ORAchk and Oracle EXAchk search for prompts or traps in the remote user profiles. If you have prompts in remote profiles, then comment them out, at least temporarily, and run the test again.

  2. Increase the default timeout.
    • You can override the default timeouts by setting environment variables.

      Table 12-1 Timeout Controlling

      Timeout Controlling                         Default Value (seconds)   Environment Variable
      ------------------------------------------  ------------------------  -------------------------
      Collection of all checks not run by root    Varies per check.         RAT_{CHECK-ID}_TIMEOUT
      (most). Specify the timeout value for
      individual checks.

      General timeout for all checks.             90                        RAT_TIMEOUT

      SSH login DNS handshake. Specify the time   1                         RAT_PASSWORDCHECK_TIMEOUT
      in seconds for checking passwords on the
      remote nodes.

    • The default timeouts are long enough for most cases. If they are not long enough, then you may be experiencing a system performance problem that should be corrected. Frequent timeouts can indicate a problem in the environment that is unrelated to Oracle ORAchk and Oracle EXAchk.

  3. If you cannot increase the timeout, then try excluding the problematic checks, running them separately with a large enough timeout, and then merging the reports back together.
  4. If the problem does not appear to be caused by slow or skipped checks but you have a large cluster, then try increasing the number of slave processes used for the parallel database run.
    • Database collections are run in parallel. The default number of slave processes used for the parallel database run is calculated automatically. You can change the default number using the -dbparallel option followed by the number of slave processes, or the -dbparallelmax option.

      The higher the parallelism, the more resources are consumed, but the elapsed time is reduced. You can raise or lower the number of parallel slaves relative to the default value. When the entire system has been brought back up after maintenance, but before users are permitted on the system, use a higher number of parallel slaves to finish a run as quickly as possible.

      On a busy production system, use a number lower than the default value but higher than serial mode, so that the run completes more quickly than a serial run with less impact on the running system.

      Turn off the parallel database run using the -dbserial option.
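The timeout overrides from step 2 and the parallelism options from step 4 can be combined in a small wrapper. The sketch below is illustrative only: the timeout values, the ./orachk invocation path, and the helper build_orachk_cmd are assumptions, while the environment variable names and the -dbparallel, -dbparallelmax, and -dbserial options are the ones described in this section. The helper prints the command rather than running it, so you can review it first.

```shell
#!/bin/sh
# Hypothetical wrapper: values and the ./orachk path are illustrative assumptions.

# Step 2: override the general check timeout and the SSH password-check
# timeout (both in seconds) for this shell session.
export RAT_TIMEOUT=180
export RAT_PASSWORDCHECK_TIMEOUT=10

# Step 4: build a command line for the chosen database-run mode.
# build_orachk_cmd is a helper invented for this sketch.
#   $1 = "parallel", "max", or "serial"
#   $2 = number of slave processes (used only with "parallel")
build_orachk_cmd() {
    mode=$1
    slaves=$2
    case "$mode" in
        parallel) echo "./orachk -dbparallel $slaves" ;;
        max)      echo "./orachk -dbparallelmax" ;;
        serial)   echo "./orachk -dbserial" ;;
        *)        echo "unknown mode: $mode" >&2; return 1 ;;
    esac
}

# Print the command instead of running it.
build_orachk_cmd parallel 4
```

For Oracle EXAchk, the same variables and options would apply with the exachk executable in place of orachk.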