2.18 Troubleshooting Compliance Framework (Oracle ORAchk and Oracle EXAchk)

Follow the steps in this section to troubleshoot and fix issues related to the Compliance Framework (Oracle ORAchk and Oracle EXAchk).

2.18.1 How to Troubleshoot Oracle ORAchk and Oracle EXAchk Issues

Follow these steps to troubleshoot and fix Oracle ORAchk and Oracle EXAchk issues.

  1. Ensure that you are using the correct tool.

    If you have an Oracle Engineered System other than Oracle Database Appliance, then use Oracle EXAchk. For all other systems, use Oracle ORAchk.

  2. Ensure that you are using the latest versions of Oracle ORAchk and Oracle EXAchk.

    New versions are released every three months.

    1. Check the version using the -v option.
      $ orachk -v
      $ exachk -v
    2. Compare your version with the latest available version:
      1. For Oracle ORAchk, refer to My Oracle Support note 2550798.1.

      2. For Oracle EXAchk, refer to My Oracle Support note 1070954.1.

  3. Check the FAQ in My Oracle Support note 1070954.1 for similar problems.
  4. Review files within the log directory.
    1. Check applicable error.log files for relevant errors.

      This file contains the stderr output captured during the run. Not everything in it indicates a problem, but if you do have a problem, this file may provide more information.

      • output_dir/log/orachk_error.log

      • output_dir/log/exachk_error.log

    2. Check the applicable log file for other relevant information.
      • output_dir/log/orachk.log

      • output_dir/log/exachk.log

  5. Review My Oracle Support notes for similar problems.
  6. For Oracle ORAchk issues, check My Oracle Support Community (MOSC).
  7. If necessary, capture debug output, log a new service request (SR), and attach the resulting zip file.
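The version check in step 2 and the log review in step 4 can be scripted. The following is a minimal POSIX shell sketch, not part of either tool: report_versions and show_error_logs are hypothetical helper names, and the sketch assumes that the tools are on PATH and that you pass the output_dir of the run you are investigating.

```shell
#!/bin/sh
# Sketch only: hypothetical helpers for steps 2 and 4 above.

# Step 2: print the version of each tool found on PATH.
report_versions() {
  for tool in orachk exachk; do
    if command -v "$tool" >/dev/null 2>&1; then
      echo "$tool: $("$tool" -v 2>&1)"
    else
      echo "$tool: not found on PATH"
    fi
  done
}

# Step 4: list the error and main log files under output_dir/log that are
# non-empty, so you know which ones to read first.
show_error_logs() {
  output_dir=$1
  for f in "$output_dir/log/orachk_error.log" "$output_dir/log/exachk_error.log" \
           "$output_dir/log/orachk.log" "$output_dir/log/exachk.log"; do
    if [ -s "$f" ]; then
      echo "$f"
    fi
  done
}
```

For example, run report_versions and compare the result with the My Oracle Support notes listed in step 2, then run show_error_logs with your output directory to see which log files have content.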

2.18.2 How to Capture Debug Output

Follow these procedures to capture debug information.

  1. Before enabling debug, reproduce the problem with the smallest run possible.
    • Debug captures a lot of information and the resulting zip file can be large, so narrow down the scope of the run needed to reproduce the problem.

      Use relevant command line options to limit the scope of checks.

  2. Enable debug.

    If you are running the tool in on-demand mode, then use the -debug argument.

    If the problem area is known, then you can also include the -module argument to constrain debug to a particular module.

    $ orachk -debug [-module [ setup | discovery | execution | output ] ]
    $ exachk -debug [-module [ setup | discovery | execution | output ] ]

    When debug is enabled, Oracle ORAchk and Oracle EXAchk create a new debug log file in:

    • output_dir/log/orachk_debug_date_stamp_time_stamp.log

    • output_dir/log/exachk_debug_date_stamp_time_stamp.log

    The output_dir directory retains a number of other temporary files used during health checks.

    If you run health checks using the daemon, then restart the daemon with the -d start -debug options.

    Running this command generates debug output for the daemon and also enables debug in all client runs:
    $ orachk -d start -debug
    $ exachk -d start -debug
    When debug is run with the daemon, Oracle ORAchk and Oracle EXAchk create a daemon debug log file in the directory from which the daemon was started:
    orachk_daemon_debug.log
    exachk_daemon_debug.log
  3. Collect the resulting output zip file, and the daemon debug log file if applicable.
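The on-demand debug options can be wrapped in a small helper that validates the module name before you start a long run. This is a sketch only: debug_cmd and find_debug_logs are hypothetical helper names, the module list is taken from the syntax shown above, and the file pattern assumes the output_dir/log/orachk_debug_*.log (or exachk_debug_*.log) naming described above.

```shell
#!/bin/sh
# Sketch only: hypothetical helpers around the documented -debug option.

# Print the debug command line for a tool, optionally limited to one module.
debug_cmd() {
  tool=$1
  module=${2:-}
  case "$tool" in
    orachk|exachk) ;;
    *) echo "unknown tool: $tool" >&2; return 1 ;;
  esac
  case "$module" in
    "") echo "$tool -debug" ;;
    setup|discovery|execution|output) echo "$tool -debug -module $module" ;;
    *) echo "unsupported module: $module" >&2; return 1 ;;
  esac
}

# List the debug logs written to output_dir/log for a tool.
find_debug_logs() {
  output_dir=$1
  tool=$2
  # ls -t sorts by modification time, newest first.
  ls -1t "$output_dir/log/${tool}_debug_"*.log 2>/dev/null
}
```

For example, debug_cmd orachk execution prints orachk -debug -module execution; after the run, find_debug_logs output_dir orachk lists the debug logs, newest first. For daemon runs, restart the daemon with -d start -debug instead.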

2.18.3 Remote Login Problems

If Oracle ORAchk and Oracle EXAchk have problems locating and running SSH or SCP, then the tools cannot run any remote checks.

Also, root-privileged commands do not work if:

  • Passwordless remote root login is not permitted over SSH

  • The Expect utility is not able to pass the root password

  1. Verify that the SSH and SCP commands can be found.
    • The SSH commands return the error, No such file or directory, if SSH is not located where expected.

      Set the RAT_SSHELL environment variable pointing to the location of SSH:

      $ export RAT_SSHELL=path_to_ssh
    • The SCP commands return the error, /usr/bin/scp -q: No such file or directory, if SCP is not located where expected.

      Set the RAT_SCOPY environment variable pointing to the location of SCP:
      $ export RAT_SCOPY=path_to_scp
  2. Verify that the user you are running as can run the following command manually, from where you are running Oracle ORAchk and Oracle EXAchk, to whichever remote node is failing.
    $ ssh root@remotehostname "id"
    root@remotehostname's password:
    uid=0(root) gid=0(root) groups=0(root),1(bin),2(daemon),3(sys),4(adm),6(disk),10(wheel)
    • If you face any problems running the command, then contact your system administrators to correct them, at least temporarily, while you run the tool.

    • Oracle ORAchk and Oracle EXAchk search for prompts or traps in remote user profiles. If you have prompts in remote profiles, then comment them out, at least temporarily, and run the test again.

    • If you can configure passwordless remote root login, then edit the /etc/ssh/sshd_config file and change:
      PermitRootLogin no
      to
      PermitRootLogin yes
      Then, as root on all nodes of the cluster, restart the SSH daemon (sshd) so that the change takes effect.
  3. Enable Expect debugging.
    • Oracle ORAchk uses the Expect utility, when available, to answer password prompts when connecting to remote nodes for password validation, and to run root collections. By default, the actual connection process is not logged.

    • Set environment variables to help debug remote target connection issues.

      • RAT_EXPECT_DEBUG: If this variable is set to -d, then Expect command tracing is activated. The trace information is written to the standard output.

        For example:
        export RAT_EXPECT_DEBUG=-d
      • RAT_EXPECT_STRACE_DEBUG: If this variable is set to strace, strace calls the Expect command. The trace information is written to the standard output.

        For example:
        export RAT_EXPECT_STRACE_DEBUG=strace
    • By varying the combinations of these two variables, you can get three levels of Expect connection trace information.
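Steps 1 and 2 can be checked from the shell before you rerun the tools. The sketch below assumes the OpenSSH client (it relies on the BatchMode and ConnectTimeout options) and uses the hypothetical helper names set_rat_ssh_vars and check_root_ssh. Because BatchMode disables password prompts, check_root_ssh only confirms passwordless root login; if you rely on Expect to supply the root password, use the manual ssh test from step 2 instead.

```shell
#!/bin/sh
# Sketch only: hypothetical helpers for the remote login checks.
# Assumes the OpenSSH client options BatchMode and ConnectTimeout.

# Step 1: export RAT_SSHELL and RAT_SCOPY from the ssh and scp found on PATH.
set_rat_ssh_vars() {
  ssh_path=$(command -v ssh) || { echo "ssh not found on PATH" >&2; return 1; }
  scp_path=$(command -v scp) || { echo "scp not found on PATH" >&2; return 1; }
  RAT_SSHELL=$ssh_path
  RAT_SCOPY=$scp_path
  export RAT_SSHELL RAT_SCOPY
}

# Step 2: try a non-interactive root login to each node and run id.
# BatchMode=yes makes ssh fail instead of prompting for a password,
# so only passwordless root logins are reported as ok.
check_root_ssh() {
  rc=0
  for node in "$@"; do
    if "${RAT_SSHELL:-ssh}" -o BatchMode=yes -o ConnectTimeout=10 "root@$node" id >/dev/null 2>&1; then
      echo "$node: ok"
    else
      echo "$node: FAILED"
      rc=1
    fi
  done
  return "$rc"
}
```

For example, run set_rat_ssh_vars && check_root_ssh node1 node2 in the shell you will start the tool from, where node1 and node2 stand for your remote host names.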

Note:

Set the RAT_EXPECT_DEBUG and RAT_EXPECT_STRACE_DEBUG variables only at the direction of Oracle Support or development. These variables are used with other variables and user interface options to restrict the amount of data collected during tracing. The script command is used to capture the standard output.

As a temporary workaround while you resolve remote problems, run reports locally on each node, and then merge them.

On each node, run:
$ orachk -local
$ exachk -local
Then merge the collections to obtain a single report:
$ orachk -merge zipfile1,zipfile2,zipfile3,...
$ exachk -merge zipfile1,zipfile2,zipfile3,...
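When you gather the per-node zip files on one host, building the merge argument by hand is error-prone. This sketch assumes that -merge accepts a single comma-separated list of collection zip files; merge_cmd is a hypothetical helper that only prints the command so you can review it before running it.

```shell
#!/bin/sh
# Sketch only: print (do not run) a merge command for per-node collections.
# Assumes -merge takes one comma-separated list of collection zip files.
merge_cmd() {
  if [ "$#" -lt 3 ]; then
    echo "usage: merge_cmd orachk|exachk zip1 zip2 [zip...]" >&2
    return 1
  fi
  tool=$1
  shift
  list=$1
  shift
  for zip in "$@"; do
    list="$list,$zip"
  done
  echo "$tool -merge $list"
}
```

For example, merge_cmd orachk node1.zip node2.zip prints orachk -merge node1.zip,node2.zip.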

2.18.4 Permission Problems

You must have sufficient directory permissions to run Oracle ORAchk and Oracle EXAchk.

  1. Verify that the permissions on the tool scripts orachk and exachk are set to 755 (-rwxr-xr-x).
    If the permissions are not set, then set the permissions as follows:
    $ chmod 755 orachk
    $ chmod 755 exachk
  2. If you install Oracle ORAchk and Oracle EXAchk as root and run the tools as a different user, then you may not have the necessary directory permissions. For example:
    [root@randomdb01 exachk]# ls -la
    total 14072
    drwxr-xr-x  3 root root    4096 Jun  7 08:25 .
    drwxrwxrwt 12 root root    4096 Jun  7 09:27 ..
    drwxrwxr-x  2 root root    4096 May 24 16:50 .cgrep
    -rw-rw-r--  1 root root 9099005 May 24 16:50 collections.dat
    -rwxr-xr-x  1 root root  807865 May 24 16:50 exachk
    -rw-r--r--  1 root root 1646483 Jun  7 08:24 exachk.zip
    -rw-r--r--  1 root root    2591 May 24 16:50 readme.txt
    -rw-rw-r--  1 root root 2799973 May 24 16:50 rules.dat
    -rw-r--r--  1 root root     297 May 24 16:50 UserGuide.txt
    To avoid this problem, install the tools as follows:

    • If Oracle Clusterware is installed, then:
      • Install Oracle EXAchk in /opt/oracle.SupportTools/exachk as the Oracle Grid Infrastructure home owner

      • Install Oracle ORAchk in CRS_HOME/suptools/orachk as the Oracle Grid Infrastructure home owner

    • If Oracle Clusterware is not installed, then:
      • Install Oracle EXAchk in /opt/oracle.SupportTools/exachk as root

      • Install Oracle ORAchk (in a convenient location) as root, if possible

        or

        Install Oracle ORAchk (in a convenient location) as the Oracle software installation user or the Oracle Database home owner
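The permission checks in this section can be combined into one test that you run as the user who will run the tools. This is a sketch: check_tool_perms is a hypothetical helper, it reads the mode string from ls -ld rather than stat (whose options differ between platforms), and it only checks the orachk and exachk scripts and the directory itself, not every file the tools use.

```shell
#!/bin/sh
# Sketch only: check script modes and directory write access for the
# current user in the directory that contains orachk and/or exachk.
check_tool_perms() {
  dir=$1
  rc=0
  for tool in orachk exachk; do
    f="$dir/$tool"
    [ -e "$f" ] || continue
    # The first 10 characters of ls -ld are the type and permission bits;
    # a trailing . or + (SELinux context or ACL marker) is dropped.
    mode=$(ls -ld "$f" | cut -c1-10)
    if [ "$mode" = "-rwxr-xr-x" ]; then
      echo "$tool: ok ($mode)"
    else
      echo "$tool: expected -rwxr-xr-x, found $mode (fix: chmod 755 $f)"
      rc=1
    fi
  done
  user=$(id -un)
  if [ -w "$dir" ]; then
    echo "$dir: writable by $user"
  else
    echo "$dir: NOT writable by $user"
    rc=1
  fi
  return "$rc"
}
```

Run it as the intended user, for example check_tool_perms /opt/oracle.SupportTools/exachk; a non-zero exit status means at least one problem was reported.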

2.18.5 Slow Performance, Skipped Checks, and Timeouts

Follow these procedures to fix slow performance and other issues.

When Oracle ORAchk and Oracle EXAchk run commands, a child process is spawned to run each command and a watchdog daemon monitors the child process. If the child process is slow or hung, then the watchdog kills the child process and the check is registered as skipped.

The watchdog.log file also contains entries similar to "killing stuck command".

Depending on the cause of the problem, you may not see skipped checks.
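To illustrate the watchdog behavior described above, the following sketch runs a command under a time limit and reports it as skipped when the limit is exceeded. It is not how Oracle ORAchk or Oracle EXAchk implement their watchdog: it assumes the GNU coreutils timeout command, which exits with status 124 when it kills the command, run_with_watchdog is a hypothetical helper name, and its message only imitates the watchdog.log wording.

```shell
#!/bin/sh
# Illustration only, not the tools' implementation. Requires GNU coreutils
# timeout, which exits with status 124 when it has to kill the command.
run_with_watchdog() {
  limit=$1
  shift
  if timeout "$limit" "$@"; then
    status=0
  else
    status=$?
  fi
  if [ "$status" -eq 124 ]; then
    # Mimic a watchdog entry: the command was killed and the check skipped.
    echo "skipped: killing stuck command after ${limit}s: $*" >&2
  fi
  return "$status"
}
```

To find the real watchdog entries for a run, search watchdog.log for the string killing stuck command, for example with grep.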

  1. Determine if there is a pattern to what is causing the problem.
    • EBS checks, for example, depend on the amount of data present and may take longer than the default timeout.

    • If there are prompts in the remote profile, then remote checks time out and are killed and skipped. Oracle ORAchk and Oracle EXAchk search for prompts or traps in the remote user profiles. If you have prompts in remote profiles, then comment them out, at least temporarily, and run the test again.

  2. Increase the default timeout.
    • You can override the default timeouts by setting the following environment variables.

      Table 2-9 Timeout Controlling

      Timeout Controlling                                  Default Value (seconds)  Environment Variable
      ---------------------------------------------------  -----------------------  -------------------------
      Collection of all checks not run by root (most).     Varies per check         RAT_{CHECK-ID}_TIMEOUT
      Specify the timeout value for individual checks.

      General timeout for all checks                       90                       RAT_TIMEOUT

      SSH login DNS handshake. Specify the time in         1                        RAT_PASSWORDCHECK_TIMEOUT
      seconds for checking passwords on the remote nodes.

    • The default timeouts are long enough for most cases. If they are not long enough, then you may be experiencing a system performance problem that should be corrected. Many timeouts can indicate a problem in the environment that is unrelated to Oracle ORAchk and Oracle EXAchk.

  3. If you cannot increase the timeout, then try excluding the problematic checks, running them separately with a large enough timeout, and then merging the reports.
  4. If the problem does not appear to be caused by slow or skipped checks but you have a large cluster, then try increasing the number of slave processes used for parallel database runs.
    • Database collections are run in parallel. The default number of slave processes used for parallel database runs is calculated automatically. You can change the default number using the -dbparallel slave_processes or -dbparallelmax options.

      The higher the parallelism, the more resources are consumed, but the shorter the elapsed time. You can raise or lower the number of parallel slaves relative to the default value. For example, after the entire system is brought up after maintenance, but before users are permitted on the system, use a higher number of parallel slaves to finish a run as quickly as possible.

      On a busy production system, use a number lower than the default value but higher than serial mode, so that a run completes more quickly than in serial mode with less impact on the running system.

      Turn off the parallel database run using the -dbserial option.
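The timeout variables in Table 2-9 and the parallelism options in step 4 can be set from a wrapper script. This sketch uses the hypothetical helper names set_check_timeouts and parallel_opts; the check ID argument is a placeholder for the ID of the check you want to extend, and any values you pass are your own choice, not recommendations.

```shell
#!/bin/sh
# Sketch only: hypothetical helpers for step 2 (timeouts) and step 4
# (parallelism). Source this file (for example with .) so that the
# exported variables reach the tool started from the same shell.

# Export the general timeout and, optionally, one per-check timeout.
# All values are in seconds.
set_check_timeouts() {
  general=$1
  check_id=${2:-}
  check_secs=${3:-}
  RAT_TIMEOUT=$general
  export RAT_TIMEOUT
  if [ -n "$check_id" ]; then
    case "$check_id" in
      *[!A-Za-z0-9_]*) echo "invalid check ID: $check_id" >&2; return 1 ;;
    esac
    case "$check_secs" in
      ""|*[!0-9]*) echo "invalid timeout: $check_secs" >&2; return 1 ;;
    esac
    export "RAT_${check_id}_TIMEOUT=$check_secs"
  fi
}

# Map a run mode to the parallel database run options from step 4.
parallel_opts() {
  case "${1:-}" in
    serial) printf '%s\n' "-dbserial" ;;
    max) printf '%s\n' "-dbparallelmax" ;;
    ""|*[!0-9]*) echo "usage: parallel_opts serial|max|N" >&2; return 1 ;;
    *) printf '%s\n' "-dbparallel $1" ;;
  esac
}
```

For example, after sourcing the file, run set_check_timeouts with your chosen general timeout and then start the tool with orachk $(parallel_opts serial) on a system where you want the lowest impact.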