1.12 Troubleshooting Oracle ORAchk and Oracle EXAchk

Follow the steps explained in this section to troubleshoot and fix Oracle ORAchk and Oracle EXAchk related issues.

1.12.1 How to Troubleshoot Oracle ORAchk and Oracle EXAchk Issues

Follow these steps to fix Oracle ORAchk and Oracle EXAchk related issues.

  1. Ensure that you are using the correct tool.

    If you have an Oracle Engineered System other than Oracle Database Appliance, then use Oracle EXAchk. For all other systems, use Oracle ORAchk.

  2. Ensure that you are using the latest versions of Oracle ORAchk and Oracle EXAchk.

    New versions are released every three months.

    1. Check the version using the -v option:
      $ ./orachk -v
      $ ./exachk -v
    2. Compare your version with the latest version available here:

      1. For Oracle ORAchk, refer to My Oracle Support Note 1268927.2.

      2. For Oracle EXAchk, refer to My Oracle Support Note 1070954.1.

  3. Check the FAQ for similar problems in My Oracle Support Note 1070954.1.

  4. Review files within the log directory.

    • Check applicable error.log files for relevant errors.

      This file contains the stderr output captured during the run. Not every message in this file indicates a problem, but if you do have a problem, then the file might provide more information.

      • output_dir/log/orachk_error.log

      • output_dir/log/exachk_error.log

    • Check applicable log for other relevant information.

      • output_dir/log/orachk.log

      • output_dir/log/exachk.log

  5. Review My Oracle Support Notes for similar problems.

  6. For Oracle ORAchk issues, check My Oracle Support Community (MOSC).

  7. If necessary, capture debug output, log a new SR, and attach the resulting zip file.
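
The log review in step 4 can be made quicker with a small shell helper. The following is a minimal sketch and is not part of Oracle ORAchk or Oracle EXAchk. The function name summarize_error_log is hypothetical. It only counts identical lines, so that messages repeated once for each node or home stand out from one-off errors.

```shell
# Hypothetical helper (not part of Oracle ORAchk or Oracle EXAchk):
# print each distinct line of an error log with its count, most frequent first.
summarize_error_log() {
    # $1: path to an error log such as output_dir/log/orachk_error.log
    [ -r "$1" ] || { echo "cannot read $1" >&2; return 1; }
    # Skip blank lines, then count identical lines.
    grep -v '^[[:space:]]*$' "$1" | sort | uniq -c | sort -rn
}
```

For example, run summarize_error_log output_dir/log/orachk_error.log, replacing output_dir with your own output directory.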

1.12.2 How to Capture Debug Output

Follow this procedure to capture debug output.

  1. Before enabling debug, reproduce the problem with the smallest run possible.

    • Debug captures a large amount of data and the resulting zip file can be large, so narrow down the run to the minimum needed to reproduce the problem.

      Use relevant command-line options to limit the scope of checks.

  2. Enable debug.

    If you are running the tool in on-demand mode, then use the -debug argument.

    If the problem area is known, then constrain debug to a particular module by also including the -module argument.

    $ ./orachk -debug [-module [ setup | discovery | execution | output ] ]
    $ ./exachk -debug [-module [ setup | discovery | execution | output ] ]

    When debug is enabled, Oracle ORAchk and Oracle EXAchk create a new debug log file in:

    • output_dir/log/orachk_debug_date_stamp_time_stamp.log

    • output_dir/log/exachk_debug_date_stamp_time_stamp.log

    The output_dir directory retains a number of other temporary files used during health checks.

    If you run health checks using the daemon, then restart the daemon with the -d start -debug option.

    Running this command generates debug output for the daemon and includes debug output in all client runs:
    $ ./orachk -d start -debug
    $ ./exachk -d start -debug
    When debug is run with the daemon, Oracle ORAchk and Oracle EXAchk create a daemon debug log file in the directory in which the daemon was started:
    orachk_daemon_debug.log
    exachk_daemon_debug.log
  3. Collect the resulting output zip file, and the daemon debug log file if applicable.
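
The on-demand debug syntax above accepts only four module names. As a minimal sketch, the following helper builds the command line and rejects anything else before you start a long run. The function name debug_command is hypothetical, and the helper only prints the command; it does not run either tool.

```shell
# Hypothetical helper: print the on-demand debug command for a tool and an
# optional module. It does not run Oracle ORAchk or Oracle EXAchk.
debug_command() {
    tool=${1:-}      # orachk or exachk
    module=${2:-}    # optional: setup, discovery, execution, or output
    case $tool in
        orachk|exachk) ;;
        *) echo "unknown tool: $tool" >&2; return 1 ;;
    esac
    case $module in
        "") echo "./$tool -debug" ;;
        setup|discovery|execution|output) echo "./$tool -debug -module $module" ;;
        *) echo "unknown module: $module" >&2; return 1 ;;
    esac
}
```

For example, debug_command exachk discovery prints ./exachk -debug -module discovery.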

1.12.3 Error Messages or Unexpected Output

Follow these steps to troubleshoot and fix error messages and unexpected output.

1.12.3.1 Data Entry Terminal Considerations

Use any supported UNIX and Linux terminal type (character mode terminal, ILOM, VNC server) to run Oracle ORAchk and Oracle EXAchk.

Respond to the prompts during interactive runs, or while configuring the daemon.

Each terminal type has advantages and disadvantages. The effect of a dropped network connection varies based on the terminal type used.

For example, in an interactive run using a character mode terminal, if all the prompts are answered before the network drop, then the running process completes successfully even if the network connection drops. If the network connection drops before all the input prompts are answered, then all the running processes hang. Clean up the hung processes manually when the network connection is restored.

Using a remote connection to a VNC server running on the database server where Oracle ORAchk and Oracle EXAchk are running minimizes the interruptions caused by network drops.

If you use accessibility software or devices that prevent the use of a VNC server, and experience network drops, then contact your system administrator to determine the root cause and adjust the environment as necessary.

For example, an accessibility aid might insert suspensions and restarts into the interactive process running Oracle ORAchk and Oracle EXAchk, which can lead to an operating system timeout due to terminal inactivity. Lengthen the inactivity timeouts of the environment before running the commands.

The timeout caused by an assistive tool at the operating system level due to terminal inactivity is not specific to Oracle ORAchk and Oracle EXAchk. The timeout could happen to any process managed by the assistive technology.

1.12.3.2 Tool Runs without Producing Files

Oracle ORAchk and Oracle EXAchk create temporary files and directories at runtime, as well as output files for data collection.

If you cancel Oracle ORAchk using Ctrl+C or if Oracle ORAchk fails due to an error, then Oracle ORAchk cleans up the files that Oracle ORAchk created while running.

If Oracle ORAchk or Oracle EXAchk completes a health check run but does not generate output files, then an error, probably near the end of the run, caused an ungraceful exit. If the problem persists, then run the tool again in debug mode and examine the output. If necessary, contact Oracle Support for assistance.

1.12.3.3 Messages similar to “line ****: **** Killed $perl_cmd 2>> $ERRFIL?”

Oracle ORAchk and Oracle EXAchk have a built-in watchdog process that monitors and kills the commands that exceed default timeouts to prevent hangs.

Killing a command results in an error similar to “line ****: **** Killed $perl_cmd 2>> $ERRFIL?”.
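
The general pattern can be reproduced outside the tools. The sketch below is an illustration only and is not the watchdog that Oracle ORAchk and Oracle EXAchk use. It assumes that the GNU coreutils timeout command is installed, and the function name run_with_limit is hypothetical. A shell typically reports a child process that was terminated by SIGKILL as Killed, which is the form of the message shown above.

```shell
# Illustration only: stop a command that exceeds a time limit.
# Assumes the GNU coreutils `timeout` command is available.
run_with_limit() {
    limit=$1          # time limit in seconds
    shift
    # -s KILL sends SIGKILL to the command when the limit is reached.
    timeout -s KILL "$limit" "$@"
}
```

For example, run_with_limit 1 sleep 5 returns a nonzero status because sleep is killed after one second, while run_with_limit 5 true succeeds.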

1.12.3.4 Messages similar to “RC-001- Unable to read driver files”

There are a number of possible causes, all related to running on an unsupported platform or to being unable to read or write in the temporary, working, or installation directories.

Oracle ORAchk and Oracle EXAchk can also display the same error as RC-002- Unable to read driver files.

Troubleshooting Process

  1. Verify that you are running on a supported platform.

  2. Verify that there is sufficient disk space available in the temporary or output directory. If necessary, increase disk space or direct temporary and output files elsewhere.

  3. Verify that the hidden subdirectory .cgrep exists within the install location. This directory contains various support files, some of which are platform-specific.

  4. Verify that you are able to write into and read out of the temporary and working directory location.
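
Steps 3 and 4 above can be checked from the shell before rerunning the tool. This is a minimal sketch under stated assumptions: the function names dir_is_read_write and has_cgrep_dir are hypothetical, and the directories you pass in are your own temporary, working, and install locations.

```shell
# Hypothetical pre-flight checks; not part of Oracle ORAchk or Oracle EXAchk.

# Succeeds if a file can be created, read back, and removed in directory $1.
dir_is_read_write() {
    probe="$1/.rw_probe.$$"
    { echo ok > "$probe"; } 2>/dev/null || return 1
    if [ "$(cat "$probe" 2>/dev/null)" = "ok" ]; then
        status=0
    else
        status=1
    fi
    rm -f "$probe"
    return $status
}

# Succeeds if the hidden .cgrep subdirectory exists in install location $1.
has_cgrep_dir() {
    [ -d "$1/.cgrep" ]
}
```

For example, dir_is_read_write "$HOME" || echo "cannot read and write in $HOME" reports a problem with the default temporary location.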

1.12.3.5 Messages similar to “There are prompts in user profile on [hostname] which will cause issues in [tool] successful execution”

Oracle ORAchk and Oracle EXAchk tools exit if the tools detect prompts in the user profile.

Oracle ORAchk and Oracle EXAchk fetch the user environment files on all nodes. If the user environment files contain prompts, for example, read -p, or other commands that pause the running commands, then the commands time out because there is no way to respond to the prompts when the files are called.

Not all such commands can be detected in the environment. However, the commands that can be detected lead to this message.

Troubleshooting Process

Comment out all such prompts in the user profile file (at least temporarily) and test the run again.
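
To find candidate lines to comment out, a simple search of the profile file is often enough. The following is a rough heuristic, not the detection logic the tools use: it lists only lines that begin with the read command, and the function name find_profile_prompts is hypothetical.

```shell
# Hypothetical helper: list, with line numbers, the lines of a profile file
# that start with the `read` command (for example `read -p`).
# Lines that are already commented out are not reported.
find_profile_prompts() {
    # $1: profile file, for example ~/.bash_profile
    grep -nE '^[[:space:]]*read([[:space:]]|$)' "$1"
}
```

For example, find_profile_prompts ~/.bash_profile prints each matching line prefixed with its line number, and prints nothing if no such line is found.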

1.12.3.6 Problems Related to Remote Login

If you see messages similar to No such file or directory or /usr/bin/scp -q: No such file or directory, then refer to "Remote Login Problems" to fix the issues.

1.12.3.7 Other Error Messages in orachk_error.log or exachk_error.log

When examining the orachk_error.log or exachk_error.log file, some messages are expected and are not indicative of problems.

These errors are redirected and absorbed into the error.log to keep them from being reported on the screen. You do not need to report these types of errors to Oracle Support.

For example, an error similar to the following is reported numerous times, once for each Oracle software home for each node:

/bin/sh: /u01/app/11.2.0/grid/OPatch/opatch: Permission denied
chmod: changing permissions of `/u01/app/oracle_ebs/product/11.2.0.2/VIS_RAC/.patch_storage': Operation not permitted
OPatch could not open log file, logging will not be possible
Inventory load failed... OPatch cannot load inventory for the given Oracle Home.

These types of errors occur in role-separated environments when the tool, running as the Oracle Database software owner, uses OPatch to list the patch inventories of homes that are owned by Oracle Grid Infrastructure or other Oracle Database home owners. When you run OPatch to list the patch inventories for other users, OPatch fails because the current user does not have permissions on the other homes. In these cases, the OPatch error is ignored and the patch inventories for those homes are gathered by other means. To avoid such errors, Oracle recommends that you run Oracle ORAchk and Oracle EXAchk as root in role-separated environments.

Also, ignore the errors similar to the following:

./orachk: line [N]: [: : integer expression expected

The line number changes over time. However, the error indicates that the tool was expecting an integer return value and no value was found. Because the value was null, the shell was not able to compare the return values. The error is repeated many times for the same command, once for each node.
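
When an error log is dominated by the expected messages shown in this section, it can help to hide them and read what remains. The following is a minimal sketch: the function name filter_expected_errors is hypothetical, and the patterns are taken only from the sample messages above, so treat any line that survives the filter as something to review rather than as a confirmed problem.

```shell
# Hypothetical helper: print an error log without the expected messages
# described in this section, so that other lines are easier to spot.
filter_expected_errors() {
    # $1: path to orachk_error.log or exachk_error.log
    grep -v \
        -e 'opatch: Permission denied' \
        -e 'chmod: changing permissions of' \
        -e 'OPatch could not open log file' \
        -e 'OPatch cannot load inventory' \
        -e 'integer expression expected' \
        "$1"
}
```

For example, run filter_expected_errors output_dir/log/orachk_error.log, replacing output_dir with your own output directory.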

1.12.3.8 Space available on {node_name} at {path} is {x} MB and required space is 500 MB

Oracle ORAchk displays an error message when there is not enough space in the location for temporary files and directories.

Space available on at /users/oracle is 441 MB and required space is 500 MB
Please make at least mentioned space available at above location and retry to continue.[y/n][y]?

Oracle ORAchk creates temporary files and directories during execution. The default location for temporary files and directories is the $HOME directory of the user who runs the tool.

To change the location of Oracle ORAchk temporary files, set the RAT_TMPDIR environment variable to the new location before running Oracle ORAchk.
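
Before choosing a new location, you can check how much space a candidate directory has. This is a minimal sketch that assumes a POSIX df command; the function names free_mb and has_enough_space are hypothetical, and 500 MB is the requirement quoted in the message above.

```shell
# Hypothetical helper: print the free space, in MB, of the file system
# that holds directory $1. Uses the POSIX output format of df (-P, -k).
free_mb() {
    df -Pk "$1" | awk 'NR == 2 { print int($4 / 1024) }'
}

# Succeeds if directory $1 has at least $2 MB free (500 MB by default).
has_enough_space() {
    avail=$(free_mb "$1")
    [ -n "$avail" ] && [ "$avail" -ge "${2:-500}" ]
}
```

For example, if has_enough_space /path/to/candidate succeeds, then export RAT_TMPDIR=/path/to/candidate before running the tool, where /path/to/candidate is a placeholder for a directory on your system.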

1.12.4 Operating System Is Not Discovered Correctly

Oracle ORAchk and Oracle EXAchk display this message if the tools are not able to detect the operating system.

If Oracle ORAchk and Oracle EXAchk are not able to detect the operating system, then the tools prompt:

  • Data needed for the derived platform could not be found

  • Improperly detecting an unsupported platform

Set the RAT_OS environment variable to the correct operating system:
$ export RAT_OS=platform

1.12.5 Oracle Clusterware or Oracle Database is not Detected or Connected Issues

Follow the procedures in this section to troubleshoot and fix Oracle Clusterware or Oracle Database issues.

1.12.5.1 Oracle Clusterware Software is Installed, but Cannot be Found

Oracle ORAchk discovers the location of the Oracle Clusterware home from the oraInst.loc and oraInventory files.

Oracle Clusterware discovery fails due to:

  • Problems discovering the oraInst.loc and oraInventory files

  • Problems with the oraInst.loc and oraInventory files

  • One or more paths in the oraInst.loc and oraInventory files are incorrect

Troubleshooting Process

  1. Ensure that the oraInst.loc file is located correctly and is properly formed.

    If the file is not in the default location, then set the RAT_INV_LOC environment variable to point to the oraInventory directory:
    $ export RAT_INV_LOC=oraInventory directory
  2. If necessary, set the RAT_CRS_HOME environment variable to point to the location of the Oracle Clusterware home:
    $ export RAT_CRS_HOME=CRS_HOME
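
To check what the oraInst.loc file points to, you can read its inventory_loc entry directly. The following is a minimal sketch: the function name inventory_loc_from is hypothetical, and it assumes the usual key=value layout of the file (for example, inventory_loc=/u01/app/oraInventory).

```shell
# Hypothetical helper: print the inventory_loc value from an oraInst.loc file.
inventory_loc_from() {
    # $1: path to oraInst.loc, which holds lines such as inventory_loc=/path
    sed -n 's/^[[:space:]]*inventory_loc[[:space:]]*=[[:space:]]*//p' "$1" | head -n 1
}
```

For example, if inventory_loc_from prints a directory that exists, then that directory is a candidate value for RAT_INV_LOC. If it prints nothing, then the file is missing the entry or is not properly formed.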

1.12.5.2 Oracle Database Software Is Installed, but Cannot Be Found

Oracle ORAchk and Oracle EXAchk tools display this message if the tools cannot find the Oracle Database software installed.

If the Oracle Database software is installed, but Oracle ORAchk and Oracle EXAchk cannot find it, then set the RAT_ORACLE_HOME environment variable to the applicable ORACLE_HOME directory.

For example, enter the following command, where your-oracle-home is the path to the Oracle home on your system:

$ export RAT_ORACLE_HOME=your-oracle-home

Oracle ORAchk and Oracle EXAchk perform best practice and recommended patch checks for all the databases running from the home specified in the RAT_ORACLE_HOME environment variable.

1.12.5.3 Oracle Database Software Is Installed, but Version cannot Be Found

Oracle ORAchk and Oracle EXAchk tools display this message if the tools cannot find the version of the Oracle Database software installed.

If Oracle ORAchk and Oracle EXAchk cannot find the correct version, then set the RAT_DB environment variable to the applicable version.

For example:
$ export RAT_DB=11.2.0.3.0

1.12.5.4 Oracle ASM Software is Installed, but Cannot be Found

Oracle ORAchk and Oracle EXAchk tools display this message if the tools cannot find the Oracle ASM software installed.

If Oracle ORAchk and Oracle EXAchk cannot find Oracle ASM, then set the RAT_ASM_HOME environment variable to the applicable home directory.
$ export RAT_ASM_HOME=ASM_HOME

1.12.5.5 Oracle Database Discovery Issues on Oracle Real Application Clusters (Oracle RAC) Systems

On Oracle Real Application Clusters (Oracle RAC) systems, Oracle ORAchk discovers the database resources registered in the Oracle Cluster Registry.

The ORACLE_HOME for the database resources is derived from the profile of the database resources.

If there is a problem with any of the profiles of the database resources, then Oracle ORAchk cannot recognize or connect to one or more databases. Use the -dbnames option temporarily to fix the problem.

Specify the names of the database in a comma-delimited list as follows:
$ ./orachk -dbnames ORCL,ORADB
Alternatively, use the space-delimited environment variable RAT_DBNAMES:
$ export RAT_DBNAMES="ORCL ORADB"

Use double quotes to specify more than one database.

Note:

Configure the RAT_DBHOMES environment variable if you:

  • Configure RAT_DBNAMES as a subset of databases registered in the Oracle Clusterware

  • Want to check the patch inventories of ALL databases found registered in the Oracle Clusterware for recommended patches

By default, the recommended patch analysis is limited to the homes for the list of databases specified in the RAT_DBNAMES environment variable.

To perform the recommended patch analysis for additional database homes, specify a space-delimited list of all database names in the RAT_DBHOMES environment variable.

For example:
export RAT_DBNAMES="ORCL ORADB"
export RAT_DBHOMES="ORCL ORADB PROD"

Best practice checks are applied to ORCL and ORADB.

Recommended patch checks are applied to ORCL, ORADB, and PROD.
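
The relationship between the two variables can be checked with a few lines of shell. This is a minimal sketch; the function name patch_only_databases is hypothetical. It lists the databases that are named in RAT_DBHOMES but not in RAT_DBNAMES, that is, the databases that receive recommended patch checks only.

```shell
# Hypothetical helper: list names that appear in RAT_DBHOMES but not in
# RAT_DBNAMES. Both variables are space-delimited lists.
patch_only_databases() {
    for db in $RAT_DBHOMES; do
        case " $RAT_DBNAMES " in
            *" $db "*) ;;                 # also in RAT_DBNAMES: skip
            *) printf '%s\n' "$db" ;;
        esac
    done
}

export RAT_DBNAMES="ORCL ORADB"
export RAT_DBHOMES="ORCL ORADB PROD"
patch_only_databases    # prints PROD
```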

1.12.5.6 Oracle Database Login Problems

Oracle Database login problems arise if you run Oracle ORAchk and Oracle EXAchk without sufficient privileges.

If you run Oracle ORAchk and Oracle EXAchk as a user other than the database software installation owner, root or grid, and if you experience problems connecting to the database, then perform the following steps:
  1. Log in to the system as grid (operating system).
  2. Run export ORACLE_HOME=path of Oracle database home
  3. Run export ORACLE_SID=database SID
  4. Run export PATH=$ORACLE_HOME/bin:$ORACLE_HOME/lib:$PATH
  5. Add an alias in the $ORACLE_HOME/network/admin/tnsnames.ora file for database SID.
  6. Connect to the database using $ORACLE_HOME/bin/sqlplus "sys@SID as sysdba", and enter the password.
  7. Ensure that you have a successful connection.

If this method of connecting to the database does not work, then Oracle ORAchk and Oracle EXAchk cannot connect either.

  • If you have multiple homes owned by different users and you are not able to log in to the target database independently in SQL*Plus as the user running Oracle ORAchk, then Oracle ORAchk cannot log in either.

  • If operating system authentication is not set up, then you are still prompted for a user name and password.

1.12.6 Remote Login Problems

If the Oracle ORAchk and Oracle EXAchk tools have problems locating and running SSH or SCP, then the tools cannot run any remote checks.

Also, the root privileged commands do not work if:

  • Passwordless remote root login is not permitted over SSH

  • Expect utility is not able to pass the root password

  1. Verify that the SSH and SCP commands can be found.
    • The SSH commands return the error, No such file or directory, if SSH is not located where expected.

      Set the RAT_SSHELL environment variable pointing to the location of SSH:

      $ export RAT_SSHELL=path to ssh
    • The SCP commands return the error, /usr/bin/scp -q: No such file or directory, if SCP is not located where expected.

      Set the RAT_SCOPY environment variable pointing to the location of SCP:
      $ export RAT_SCOPY=path to scp
  2. Verify that the user you are running as can run the following command manually, from where you are running Oracle ORAchk and Oracle EXAchk, to whichever remote node is failing.
    $ ssh root@remotehostname "id"
    root@remotehostname's password:
    uid=0(root) gid=0(root) groups=0(root),1(bin),2(daemon),3(sys),4(adm),6(disk),10(wheel)
    • If you face any problems running the command, then contact the system administrators to correct the problem, at least temporarily, so that you can run the tool.

    • Oracle ORAchk and Oracle EXAchk search for the prompts or traps in remote user profiles. If you have prompts in remote profiles, then comment them out at least temporarily and test run again.

    • If you can configure passwordless remote root login, then edit the /etc/ssh/sshd_config file as follows:
      n to yes
      Now, run the following command as root on all nodes of the cluster:
      hd restart
  3. Enable Expect debugging.
    • Oracle ORAchk uses the Expect utility, when available, to answer password prompts when connecting to remote nodes for password validation and to run root collections. By default, the actual connection process is not logged.

    • Set environment variables to help debug remote target connection issues.

      • RAT_EXPECT_DEBUG: If this variable is set to -d, then the Expect command tracing is activated. The trace information is written to the standard output.

        For example:
        export RAT_EXPECT_DEBUG=-d
      • RAT_EXPECT_STRACE_DEBUG: If this variable is set to strace, then strace calls the Expect command. The trace information is written to the standard output.

        For example:
        export RAT_EXPECT_STRACE_DEBUG=strace
    • By varying the combinations of these two variables, you can get three levels of Expect connection trace information.

Note:

Set the RAT_EXPECT_DEBUG and RAT_EXPECT_STRACE_DEBUG variables only at the direction of Oracle support or development. The RAT_EXPECT_DEBUG and RAT_EXPECT_STRACE_DEBUG variables are used with other variables and user interface options to restrict the amount of data collected during the tracing. The script command is used to capture standard output.
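
Step 1 of the procedure above can be sketched in the shell as follows. This is a minimal example under stated assumptions: the function name locate_tool is hypothetical, and the sketch assumes that ssh and scp are on the PATH of the user who runs the tool. If they are installed elsewhere, then set the variables to the full paths by hand.

```shell
# Hypothetical helper: print the location of a command found on PATH,
# or fail if the command is not found.
locate_tool() {
    command -v "$1" 2>/dev/null
}

# Point the tools at ssh and scp only if they are found on PATH.
if ssh_path=$(locate_tool ssh); then
    export RAT_SSHELL="$ssh_path"
fi
if scp_path=$(locate_tool scp); then
    export RAT_SCOPY="$scp_path"
fi
```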

As a temporary workaround while you resolve remote problems, run reports locally on each node, and then merge them together later.

On each node, run:
./orachk -local
./exachk -local
Then merge the collections to obtain a single report:
./orachk -merge zipfile_1,zipfile_2,zipfile_3,...
./exachk -merge zipfile_1,zipfile_2,zipfile_3,...

1.12.7 Permission Problems

You must have sufficient directory permissions to run Oracle ORAchk and Oracle EXAchk.

  1. Verify that the permissions on the tool scripts orachk and exachk are set to 755 (-rwxr-xr-x).
    If the permissions are not set, then set the permissions as follows:
    $ chmod 755 orachk
    $ chmod 755 exachk
  2. If you install Oracle ORAchk and Oracle EXAchk as root and run the tools as a different user, then you may not have the necessary directory permissions.
    [root@randomdb01 exachk]# ls -la
    total 14072
    drwxr-xr-x  3 root root    4096 Jun  7 08:25 .
    drwxrwxrwt 12 root root    4096 Jun  7 09:27 ..
    drwxrwxr-x  2 root root    4096 May 24 16:50 .cgrep
    -rw-rw-r--  1 root root 9099005 May 24 16:50 collections.dat
    -rwxr-xr-x  1 root root  807865 May 24 16:50 exachk
    -rw-r--r--  1 root root 1646483 Jun  7 08:24 exachk.zip
    -rw-r--r--  1 root root    2591 May 24 16:50 readme.txt
    -rw-rw-r--  1 root root 2799973 May 24 16:50 rules.dat
    -rw-r--r--  1 root root     297 May 24 16:50 UserGuide.txt
  • If Oracle Clusterware is installed, then:
    • Install Oracle EXAchk in /opt/oracle.SupportTools/exachk as the Oracle Grid Infrastructure home owner

    • Install Oracle ORAchk in CRS_HOME/suptools/orachk as the Oracle Grid Infrastructure home owner

  • If Oracle Clusterware is not installed, then:
    • Install Oracle EXAchk in /opt/oracle.SupportTools/exachk as root

    • Install Oracle ORAchk (in a convenient location) as root (if possible)

      or

      Install Oracle ORAchk (in a convenient location) as Oracle software install user or Oracle Database home owner
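
The permission check in step 1 can be scripted. The following is a minimal sketch; the function name has_mode_755 is hypothetical, and it compares only the mode string that ls reports for the file.

```shell
# Hypothetical helper: succeed if file $1 has mode 755 (-rwxr-xr-x).
has_mode_755() {
    # The first 10 characters of `ls -ld` output are the file type and mode bits.
    mode=$(ls -ld "$1" 2>/dev/null | cut -c1-10)
    [ "$mode" = "-rwxr-xr-x" ]
}
```

For example, has_mode_755 orachk || chmod 755 orachk sets the permissions only when they differ.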

1.12.8 Slow Performance, Skipped Checks, and Timeouts

Follow these procedures to address slow performance and other issues.

When Oracle ORAchk and Oracle EXAchk run commands, a child process is spawned to run the command and a watchdog daemon monitors the child process. If the child process is slow or hung, then the watchdog kills the child process and the check is registered as skipped.

The watchdog.log file also contains entries similar to killing stuck command.

Depending on the cause of the problem, you may not see skipped checks.

  1. Determine if there is a pattern to what is causing the problem.
    • EBS checks, for example, depend on the amount of data present and may take longer than the default timeout.

    • If there are prompts in the remote profile, then remote checks time out and are killed and skipped. Oracle ORAchk and Oracle EXAchk search for prompts or traps in the remote user profiles. If you have prompts in remote profiles, then comment them out at least temporarily, and test the run again.

  2. Increase the default timeout.
    • Override the default timeouts by setting the environment variables.

      Table 1-18 Timeout Controlling

      Timeout Controlling                                      Default Value (seconds)  Environment Variable
      -------------------------------------------------------  -----------------------  -------------------------
      Collection of all checks not run by root (most).         90                       RAT_TIMEOUT
      Specify the timeout in seconds for the checks not run
      by root.

      Collection of all root checks.                           300                      RAT_ROOT_TIMEOUT
      Specify the timeout in seconds for the root checks.

      SSH login DNS handshake.                                 1                        RAT_PASSWORDCHECK_TIMEOUT
      Specify the time in seconds for checking passwords on
      the remote nodes.

      Specify how long to leave cells unlocked without         50m (minutes)            RAT_CELLUNLOCK_TIMEOUT
      automatically locking them again after usage of
      exachk -unlockcells.

      Specify how long to wait for a remote prompt on the      10                       RAT_PROMPT_TIMEOUT
      remote machine to appear.

      Specify how long the execution waits for the prompt      15                       RAT_PROMPT_WAIT_TIMEOUT
      before it times out.

      Specify how long to wait for the complete remote         7200                     RAT_REMOTE_RUN_TIMEOUT
      execution.

      Specify the timeout duration for root checks that are    500                      RAT_ZFS_ROOT_TIMEOUT
      executed on the Oracle ZFS storage cluster.

    • The default timeouts are long enough for most cases. If they are not long enough, then you might be experiencing a system performance problem that should be corrected. Many timeouts can indicate a problem in the environment that is not specific to Oracle ORAchk and Oracle EXAchk.

  3. If you cannot increase the timeout, then try excluding the problematic checks, running them separately with a large enough timeout, and then merging the reports back together.
  4. If the problem does not appear to be caused by slow or skipped checks but you have a large cluster, then try increasing the number of slave processes used for the parallel database run.
    • Database collections are run in parallel. The default number of slave processes used for the parallel database run is calculated automatically. Change this default number using the -dbparallel slave processes or -dbparallelmax option.

    Note:

    The higher the parallelism the more resources are consumed. However, the elapsed time is reduced.

    You can raise or lower the number of parallel slaves beyond the default value.

    After the entire system is brought up after maintenance, but before the users are permitted on the system, use a higher number of parallel slaves to finish a run as quickly as possible.

    On a busy production system, use a number less than the default value, yet more than running in serial mode, to complete a run more quickly with less impact on the running system.

    Turn off the parallel database run using the -dbserial option.
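
The timeout variables listed in Table 1-18 are set in the environment before the tool is run. The following is a minimal sketch: the function name set_timeout is hypothetical, and the values 180 and 600 are example values only (double the documented defaults), not recommendations. The helper accepts whole numbers of seconds, so it does not apply to RAT_CELLUNLOCK_TIMEOUT, whose default is expressed in minutes.

```shell
# Hypothetical helper: export a timeout variable after checking that the
# value is a positive whole number of seconds.
set_timeout() {
    name=$1     # for example RAT_TIMEOUT
    value=$2    # timeout in seconds
    case $value in
        ''|*[!0-9]*|0) echo "invalid timeout: $value" >&2; return 1 ;;
    esac
    export "$name=$value"
}

# Example values only: double the documented defaults of 90 and 300 seconds.
set_timeout RAT_TIMEOUT 180
set_timeout RAT_ROOT_TIMEOUT 600
```

Run the tool from the same shell session so that it inherits the exported variables.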