1.11 Troubleshooting Oracle ORAchk and Oracle EXAchk

Follow the steps explained in this section to troubleshoot and fix Oracle ORAchk and Oracle EXAchk related issues.

1.11.1 How to Troubleshoot Oracle ORAchk and Oracle EXAchk Issues

Follow these steps to fix Oracle ORAchk and Oracle EXAchk related issues.

  1. Ensure that you are using the correct tool.

    If you have an Oracle Engineered System other than Oracle Database Appliance (ODA), then use Oracle EXAchk. For all other systems, use Oracle ORAchk.

  2. Ensure that you are using the latest versions of Oracle ORAchk and Oracle EXAchk.

    New versions are released every three months.

    1. Check the version using the -v option:
      $ ./orachk -v
      
      $ ./exachk -v
      
    2. Compare your version with the latest version available here:

      1. For Oracle ORAchk, refer to My Oracle Support Note 1268927.2, which is available at the following URL:

        ORAchk Health Checks For The Oracle Stack

      2. For Oracle EXAchk, refer to My Oracle Support Note 1070954.1, which is available at the following URL:

        Oracle Exadata Database Machine exachk or HealthCheck

  3. Check the FAQ in My Oracle Support Note 1070954.1 for similar problems.

  4. Review files within the log directory.

    • Check the applicable error.log files for relevant errors.

      This file contains stderr output captured during the run. Not everything in this file indicates a problem, but if you do have a problem, it may provide more information.

      • output_dir/log/orachk_error.log

      • output_dir/log/exachk_error.log

    • Check the applicable log for other relevant information.

      • output_dir/log/orachk.log

      • output_dir/log/exachk.log

  5. Review My Oracle Support Notes for similar problems.

  6. For Oracle ORAchk issues, check My Oracle Support Community (MOSC), which is available at the following URL:

    ORAchk (MOSC)

  7. If necessary, capture debug output, log a new service request (SR), and attach the resulting zip file.

1.11.2 How to Capture Debug Output

Follow these procedures to capture debug information.

To capture debug output, use the following process:

  1. Before enabling debug, reproduce the problem with the smallest run possible.

    • Debug captures a lot of data and the resulting zip file can be large, so narrow down the run needed to reproduce the problem as far as possible.

      Use relevant command line options to limit the scope of checks.

  2. Enable debug.

    If you are running the tool in on-demand mode, use the -debug option:
    $ ./orachk -debug
    
    $ ./exachk -debug
    
    The beginning of the debug output looks similar to the following:
    $ ./orachk -debug
    + PS4='$(date "+ $LINENO: + ")'
     36276: + [[ -z 1 ]]
      36302: + sed 's/[\.\/]//g'
       36302: + basename /global/u01/app/oracle/arch03/ORACLE_CHECK/ORACLE_SR/orachk
      36302: + echo orachk
     36302: + program_name=orachk
      36303: + which bash
      36303: + echo 0
     36303: + bash_found=0
     36304: + SSH_PASS_STATUS=0
     36307: + set +u
     36309: + '[' 0 -ne 0 ']'
     36315: + raccheck_deprecate_msg='RACcheck has been deprecated.  ORAchk provides the same functionality.  Please switch to using ORAchk from same directory.\n\nRACcheck will not be available after this (12.1.0.2.3) release.\n\nSee MOS Note "RACcheck Configuration Audit Tool Statement of Direction - name change to ORAchk (Doc ID 1591208.1)".\n'
     36316: + '[' orachk = raccheck ']'
     36325: + export LC_ALL=C
     36325: + LC_ALL=C
     36326: + NO_WRITE_PASS=0
     36327: + ECHO=:
     36328: + DEBUG=:
     36329: + AUDITTAB=db_audit
     36379: + supported_modules='PREUPGR              
     . . . . . . 
     . . . . . . 
    
    

    When debug is enabled, Oracle ORAchk and Oracle EXAchk create a new debug log file in:

    • output_dir/log/orachk_debug_date_stamp_time_stamp.log

    • output_dir/log/exachk_debug_date_stamp_time_stamp.log

    The debug log file contains:

    • A bash -x trace of the program on the local node

    • A bash -x trace of the program on all remote nodes

    • A bash -x trace of all dynamically generated and called scripts

    The output_dir directory also retains a number of other temporary files used during health checks.

    If you run health checks using the daemon, then restart the daemon with the -d start_debug option. This generates debug output for the daemon and includes debug output in all client runs:
    $ ./orachk -d start_debug
    
    $ ./exachk -d start_debug
        
    When debug is run with the daemon, Oracle ORAchk and Oracle EXAchk create a daemon debug log file in the directory where the daemon was started:
    orachk_daemon_debug.log
    
    exachk_daemon_debug.log
    
  3. Collect the resulting output zip file, and the daemon debug log file if applicable.
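After a debug run, the new debug log can be located with a small shell sketch such as the following. It assumes the `<tool>_debug_<date>_<time>.log` naming shown above and that you pass the output directory of the run; `latest_debug_log` is a hypothetical helper, not part of either tool.

```shell
#!/usr/bin/env bash
# Sketch: print the most recently modified debug log under output_dir/log.
# Assumes the <tool>_debug_<date>_<time>.log naming described above;
# latest_debug_log is a hypothetical helper, not part of the tools.
latest_debug_log() {
  local output_dir=$1           # output directory of the debug run
  local tool=${2:-orachk}       # orachk or exachk
  local newest="" f
  for f in "$output_dir"/log/"$tool"_debug_*.log; do
    [ -e "$f" ] || continue     # the pattern matched nothing
    if [ -z "$newest" ] || [ "$f" -nt "$newest" ]; then
      newest=$f
    fi
  done
  [ -n "$newest" ] || return 1  # no debug log found
  printf '%s\n' "$newest"
}
```

For example, `latest_debug_log "$PWD/output_dir" exachk` prints the path of the newest Oracle EXAchk debug log, which you can then attach to the SR together with the output zip file.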

1.11.3 Error Messages or Unexpected Output

Troubleshoot and fix error messages and unexpected output.

1.11.3.1 Data Entry Terminal Considerations

Use any supported UNIX and Linux terminal type (character mode terminal, ILOM, VNC server) to run Oracle ORAchk and Oracle EXAchk. Respond to the prompts during an interactive run, or while configuring the daemon.

Each terminal type has advantages and disadvantages. The impact of a dropped network connection varies based on the terminal type used.

For example, in an interactive run using a character mode terminal, if all the prompts are answered prior to the network drop and the update messages are scrolling by, then the running process completes successfully even if the network connection drops. If the network connection drops before all of the input prompts are answered, then all of the running processes hang. Clean up the hung processes manually when the network connection is restored.

Using a remote connection to a VNC server running on the database server where Oracle ORAchk and Oracle EXAchk are running minimizes interruptions caused by network drops.

If you use accessibility software or devices that prevent the use of a VNC server, and you experience network drops, then work with your network team and system administrator to determine the root cause and adjust the environment as required.

For example, an accessibility aid that inserts suspensions and restarts into the interactive process running Oracle ORAchk or Oracle EXAchk can cause an operating system timeout due to terminal inactivity. In this case, lengthen the inactivity timeouts of the environment before running the commands.

The timeout caused by an assistive tool at the operating system level due to terminal inactivity is not specific to Oracle ORAchk and Oracle EXAchk. The timeout could happen to any process managed by the assistive technology.
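One common shell-level inactivity timeout is bash's `TMOUT` variable: an interactive bash shell exits if no input line arrives within `TMOUT` seconds of the prompt. The sketch below assumes bash and uses an arbitrary two-hour value chosen only for illustration. It does not affect SSH, network, or assistive-tool timeouts, and if an administrator has made `TMOUT` read-only, the change must come from the system administrator.

```shell
#!/usr/bin/env bash
# Sketch: inspect and raise bash's interactive inactivity timeout (TMOUT).
# The 7200-second value is an illustrative assumption, not an Oracle default.
echo "Current TMOUT: ${TMOUT:-unset (no shell inactivity timeout)}"
if (unset TMOUT) 2>/dev/null; then   # fails in the subshell if TMOUT is read-only
  export TMOUT=7200                  # seconds of idle prompt time before bash exits
  echo "TMOUT raised to $TMOUT seconds"
else
  echo "TMOUT is read-only; ask the system administrator to change it" >&2
fi
```

Run this in the same login session that you then use to start Oracle ORAchk or Oracle EXAchk.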

1.11.3.2 Tool Runs without Producing Files

Oracle ORAchk and Oracle EXAchk create temporary files and directories at runtime, as well as output files for data collection.

If you cancel Oracle ORAchk using control-C or if Oracle ORAchk fails due to an error, then it cleans up files that it created while running.

If Oracle ORAchk or Oracle EXAchk completes a health check run but does not generate output files, then an error, probably near the end of the run, caused an ungraceful exit. If the problem persists, then run the tool again in debug mode and examine the output. If necessary, contact Oracle Support for assistance.

1.11.3.3 Messages similar to “line ****: **** Killed $perl_cmd 2>> $ERRFIL?”

Oracle ORAchk and Oracle EXAchk have a built-in watchdog process that monitors commands and kills any command that exceeds its default timeout, to prevent hangs.

The error message is the result of such a killed command.

1.11.3.4 Messages similar to “RC-001- Unable to read driver files”

There are a number of possible causes related to not having a supported platform or not being able to read or write into temporary, working or installation directories.

Oracle ORAchk and Oracle EXAchk may also display the same error as RC-002- Unable to read driver files.

Troubleshooting Process

  1. Verify that you are running on a supported platform, see:

  2. Verify that there is sufficient disk space available in the temporary or output directory. If necessary, increase the disk space or direct temporary and output files elsewhere.

  3. Verify that the hidden subdirectory .cgrep exists within the install location. This directory contains various support files, some of which are platform-specific.

  4. Verify that you are able to write into and read out of the temporary and working directory location.
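Checks 2 to 4 above can be scripted roughly as follows. This is a sketch, not a tool feature: `check_rc001` is a hypothetical helper, the 500 MB minimum is an assumed threshold rather than a documented requirement, and the temporary location defaults to `RAT_TMPDIR` or, if that is unset, `$HOME` (the parent of the default `$HOME/.orachk`).

```shell
#!/usr/bin/env bash
# Sketch of checks 2-4 above. check_rc001 and its 500 MB default threshold are
# illustrative assumptions, not values documented for the tools.
check_rc001() {
  local install_dir=$1                        # where orachk/exachk was unzipped
  local tmp_dir=${2:-${RAT_TMPDIR:-$HOME}}    # temporary/output location to test
  local min_kb=${3:-512000}                   # assumed minimum free space, in KB
  local rc=0 free_kb probe

  # Check 2: free space in the temporary/output location (POSIX df output).
  free_kb=$(df -Pk "$tmp_dir" 2>/dev/null | awk 'NR == 2 { print $4 }')
  if [ "${free_kb:-0}" -lt "$min_kb" ]; then
    echo "Low disk space in $tmp_dir: ${free_kb:-unknown} KB free" >&2
    rc=1
  fi

  # Check 3: the hidden .cgrep support directory must exist.
  if [ ! -d "$install_dir/.cgrep" ]; then
    echo "Missing $install_dir/.cgrep" >&2
    rc=1
  fi

  # Check 4: write a probe file and read it back.
  probe="$tmp_dir/.rc001_probe.$$"
  if echo ok 2>/dev/null > "$probe" && [ "$(cat "$probe")" = ok ]; then
    rm -f "$probe"
  else
    echo "Cannot write to and read from $tmp_dir" >&2
    rc=1
  fi
  return "$rc"
}
```

For example, `check_rc001 /opt/orachk` (a hypothetical install path) prints one line per failed check to stderr and returns non-zero if any check fails. Platform support (check 1) still has to be confirmed manually.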

1.11.3.5 Messages similar to “There are prompts in user profile on [hostname] which will cause issues in [tool] successful execution”

Oracle ORAchk and Oracle EXAchk source the user environment files on all nodes. If those files contain prompts, for example, read -p, or other commands that pause execution, then the commands time out because there is no way to respond to the prompts when the files are sourced.

If Oracle ORAchk or Oracle EXAchk detects prompts in the user profile, then it displays the referenced message and exits.

The tools may not detect every such command in the environment, but those they do detect lead to this message.

Troubleshooting Process

Comment out all such prompts in the user profile file (at least temporarily) and run the tool again.
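To find candidate prompts before commenting them out, a rough search of the startup files helps. This sketch assumes that prompts come from the `read` or `select` builtins; `find_profile_prompts` is a hypothetical helper, and its simple pattern can both miss prompts and flag harmless lines, so review every hit by hand.

```shell
#!/usr/bin/env bash
# Sketch: list uncommented lines in user startup files that call read or select,
# the kind of prompt described above. find_profile_prompts is a hypothetical
# helper; its simple pattern can miss prompts and flag harmless lines.
find_profile_prompts() {
  local f
  for f in "$@"; do
    [ -r "$f" ] || continue    # skip files that do not exist or are unreadable
    # -H prefixes the file name, -n the line number; then drop comment lines.
    grep -HnE -e '(^|[;&|[:space:]])(read|select)[[:space:]]' -- "$f" |
      grep -vE '^[^:]*:[0-9]+:[[:space:]]*#'
  done
  return 0
}
```

Run it on each node as the user that runs the tool, for example `find_profile_prompts ~/.bash_profile ~/.bashrc ~/.profile`. Each hit is printed as `file:line:text`.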

1.11.3.6 Messages similar to “Syntax error near unexpected token $tag”

This error occurs if an unsupported version of Bash is installed.

When you run Oracle ORAchk and Oracle EXAchk, you may get an error similar to this:

./orachk: line 21817: syntax error near unexpected token `"$tag"'
./orachk: line 21817: `     ?*) path+=("$tag") ;;'

Troubleshooting Process

Install Bash 3.2 or later.
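To confirm which Bash version is in use, the built-in `BASH_VERSINFO` array can be compared against the 3.2 minimum stated above. This is a minimal sketch; `bash_at_least_3_2` is a hypothetical name, and the script should be run with the same bash found first in your `PATH`.

```shell
#!/usr/bin/env bash
# Sketch: confirm that the running bash is at least 3.2, the minimum above.
# bash_at_least_3_2 is a hypothetical helper name.
bash_at_least_3_2() {
  local major=${BASH_VERSINFO[0]:-0} minor=${BASH_VERSINFO[1]:-0}
  [ "$major" -gt 3 ] || { [ "$major" -eq 3 ] && [ "$minor" -ge 2 ]; }
}

if bash_at_least_3_2; then
  echo "bash $BASH_VERSION meets the 3.2 minimum"
else
  echo "bash ${BASH_VERSION:-unknown} is too old; install Bash 3.2 or later" >&2
fi
```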

1.11.3.7 Problems Related to Remote Login

Troubleshoot and fix issues related to remote logins.

Messages similar to “-bash: /usr/bin/ssh -q: No such file or directory”

See Remote Login Problems for more details.

Messages similar to “/usr/bin/scp -q: No such file or directory”

See Remote Login Problems for more details.

1.11.3.8 Messages similar to “Another instance of orachk/exachk is running”

This error occurs if the previous session was abruptly terminated. Abruptly ending a session leaves the process ID lock file in the temporary folder.

The following text appears when you attempt to run Oracle ORAchk and Oracle EXAchk:

Another instance of orachk is running on myhost. Please allow it to finish on myhost before you run it on another node.

Troubleshooting Process

  1. Verify that the previous process is terminated, using the following commands:
    $ ps -ef | grep orachk
    
    $ ps -ef | grep exachk
    
  2. Terminate the process if it is still running, using the command as follows:
    $ kill pid
    
  3. Verify that the temporary directory generated by Oracle ORAchk or Oracle EXAchk during the previous run has been deleted. If the directory still exists, delete it.

    • By default, the temporary directory is $HOME/.orachk or $HOME/.exachk.

    • You can override the default temporary directory using the environment variable RAT_TMPDIR.
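Steps 1 and 3 above can be combined into two small shell functions. This is a sketch under two assumptions: `RAT_TMPDIR`, when set, names the temporary directory itself, and otherwise the directory is `$HOME/.orachk` or `$HOME/.exachk`. Both function names are hypothetical, and nothing is killed or deleted automatically.

```shell
#!/usr/bin/env bash
# Sketch of steps 1 and 3 above; both function names are hypothetical.

# Step 1: list running processes; the [o] bracket stops grep matching itself.
show_tool_processes() {
  local tool=${1:-orachk}
  ps -ef | grep -- "[${tool:0:1}]${tool:1}" || echo "No $tool process found"
}

# Step 3: print a temporary directory left behind by an aborted run, if any.
# Assumes RAT_TMPDIR names the directory itself; otherwise $HOME/.<tool>.
leftover_tmp_dir() {
  local tool=${1:-orachk}
  local tmp_dir=${RAT_TMPDIR:-$HOME/.$tool}
  [ -d "$tmp_dir" ] && printf '%s\n' "$tmp_dir"
}
```

Run `show_tool_processes exachk` first. Only when no process is listed should you delete the directory that `leftover_tmp_dir exachk` prints.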

1.11.3.9 Other Error Messages in orachk_error.log or exachk_error.log

When you examine orachk_error.log or exachk_error.log, you can expect to see some errors. Some errors are expected and do not indicate problems. These errors are redirected to the error log to keep them from being reported on the screen. You do not need to report these types of errors to Oracle Support.

For example, an error similar to the following may be reported numerous times, once for each Oracle software home for each node:

/bin/sh: /u01/app/11.2.0/grid/OPatch/opatch: Permission denied
chmod: changing permissions of `/u01/app/oracle_ebs/product/11.2.0.2/VIS_RAC/.patch_storage': Operation not permitted
OPatch could not open log file, logging will not be possible
Inventory load failed... OPatch cannot load inventory for the given Oracle Home.

These types of errors occur in role-separated environments when the tool, run as the Oracle Database software owner, uses OPatch to list the patch inventories of homes that are owned by other users (GRID or other database home owners). OPatch fails for those homes because the current user does not have permissions on them. In these cases, the OPatch error is ignored and the patch inventories for those homes are gathered by other means. This is one reason why running as root is recommended in role-separated environments.

Additionally, ignore the errors similar to the following:

./orachk: line [N]: [: : integer expression expected

The line number may change over time, but this error just means that the tool expected an integer value and found none; that is, the value was null, so the shell returns this error when it attempts the comparison. This error might be repeated many times for the same command, once for each node.
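To make unexpected errors easier to spot, the expected patterns described in this section can be filtered out of the error log. This is a sketch: `filter_expected_errors` is a hypothetical helper, and its pattern list covers only the examples shown above, so anything it hides is hidden on that basis alone.

```shell
#!/usr/bin/env bash
# Sketch: hide the expected error patterns described above so that the
# remaining lines of orachk_error.log or exachk_error.log stand out.
# The pattern list only covers the examples in this section.
filter_expected_errors() {
  grep -vE \
    -e 'OPatch/opatch: Permission denied' \
    -e 'chmod: changing permissions of .*\.patch_storage.*: Operation not permitted' \
    -e 'OPatch could not open log file' \
    -e 'Inventory load failed\.\.\. OPatch cannot load inventory' \
    -e '\[: : integer expression expected' \
    -- "$@"
}
```

For example, `filter_expected_errors output_dir/log/orachk_error.log` prints only the lines that do not match the known patterns. With no file argument, it reads standard input.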

1.11.4 Operating System Is Not Discovered Correctly

If Oracle ORAchk or Oracle EXAchk cannot detect the platform, then the tool reports that the data needed for the derived platform could not be found, which can result in the platform being improperly detected as unsupported.

Set RAT_OS to the correct operating system:
$ export RAT_OS=platform

1.11.5 Clusterware or Database is not Detected or Connected Issues

Troubleshoot and fix Clusterware or database related issues.

1.11.5.1 Clusterware Software is Installed, but Cannot be Found

Oracle ORAchk discovers the location of the Clusterware home from the oraInst.loc and oraInventory files.

Clusterware discovery can fail due to:

  • Problems discovering those files.

  • Problems with the files themselves.

  • One or more paths in those files are incorrect.

Troubleshooting Process

  1. Ensure that the oraInst.loc file is located correctly and is properly formed.

    If it is not in the default location, then set the RAT_INV_LOC environment variable to point to the oraInventory directory:
    $ export RAT_INV_LOC=oraInventory directory
    
  2. If necessary, set the RAT_CRS_HOME environment variable to point to the location of the Clusterware home:
    $ export RAT_CRS_HOME=CRS_HOME
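Before setting these variables, you can follow the same trail manually, from oraInst.loc to the oraInventory directory, to see whether a Clusterware home is registered. This sketch assumes the Linux default `/etc/oraInst.loc` (other platforms use other locations) and a central inventory file `ContentsXML/inventory.xml` whose Clusterware `HOME` entry carries `CRS="true"`. `find_crs_home` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Sketch: oraInst.loc -> oraInventory -> Clusterware home. Assumes the
# inventory marks the Clusterware home with CRS="true" in inventory.xml.
# find_crs_home is a hypothetical helper name.
find_crs_home() {
  # /etc/oraInst.loc is the usual Linux location; pass another path if needed.
  local loc_file=${1:-/etc/oraInst.loc}
  local inv_dir inv_xml
  [ -r "$loc_file" ] || { echo "Cannot read $loc_file" >&2; return 1; }
  inv_dir=$(sed -n 's/^inventory_loc=//p' "$loc_file" | head -n 1)
  inv_xml="$inv_dir/ContentsXML/inventory.xml"
  [ -r "$inv_xml" ] || { echo "Cannot read $inv_xml" >&2; return 1; }
  # Print the LOC attribute of every HOME entry flagged as the CRS home.
  grep '<HOME ' "$inv_xml" | grep 'CRS="true"' |
    sed -n 's/.*LOC="\([^"]*\)".*/\1/p'
}
```

If the function cannot read a file, set RAT_INV_LOC. If it prints no path, or the wrong one, set RAT_CRS_HOME as shown in the steps above.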
    

1.11.5.2 Database Software Is Installed, but Cannot Be Found

If the database software is installed, but Oracle ORAchk and Oracle EXAchk cannot find it, then set the RAT_ORACLE_HOME environment variable to the applicable ORACLE_HOME directory.

$ export RAT_ORACLE_HOME=ORACLE_HOME

Oracle ORAchk and Oracle EXAchk perform best practice and recommended patch checks for all the databases running from the home specified in the RAT_ORACLE_HOME environment variable.

1.11.5.3 Database Software Is Installed, but Version cannot Be Found

If the database software is installed, but Oracle ORAchk and Oracle EXAchk cannot find the correct version, then set the RAT_DB environment variable to the applicable version.

$ export RAT_DB=11.2.0.3.0

1.11.5.4 ASM Software is Installed, but Cannot be Found

If the ASM software is installed, but Oracle ORAchk and Oracle EXAchk cannot find it, then set the RAT_ASM_HOME environment variable to the applicable home directory.

$ export RAT_ASM_HOME=ASM_HOME

1.11.5.5 Database Discovery Issues on RAC Systems

On RAC systems, Oracle ORAchk discovers the database resources registered in the Oracle Cluster Registry.  The ORACLE_HOME for the database resources is derived from the profile of the database resources.

If there is a problem with any of this information, then Oracle ORAchk may not be able to recognize or connect to one or more databases. If this occurs, find and fix the underlying problem. As a temporary workaround, use the -dbnames option and specify the database names in a comma-delimited list as follows:
$ ./orachk -dbnames ORCL,ORADB
Alternatively, you can use the space-delimited environment variable RAT_DBNAMES:
$ export RAT_DBNAMES="ORCL ORADB"

Use double quotes if you are specifying more than one database.

Note:

If you configure RAT_DBNAMES as a subset of the databases registered in the Clusterware, and you want the patch inventories of ALL databases registered in the Clusterware to be checked for recommended patches, then you must also configure RAT_DBHOMES.

By default, the recommended patch analysis is limited to the homes of the databases specified in the RAT_DBNAMES environment variable.

To perform the recommended patch analysis for more database homes than those specified in the RAT_DBNAMES environment variable, set RAT_DBHOMES to a space-delimited list of all the database names.

For example:
export RAT_DBNAMES="ORCL ORADB"
export RAT_DBHOMES="ORCL ORADB PROD"

Best practice checks are applied to ORCL and ORADB.

Recommended patch checks are applied to ORCL, ORADB, and PROD.

1.11.5.6 Database Login Problems

If you run Oracle ORAchk and Oracle EXAchk as a user other than the database software installation owner, root or grid, and if you experience problems connecting to the database, then perform the following steps:

  1. Log in as the grid (operating system) user on the system.
  2. Run export ORACLE_HOME=path of Oracle database home
  3. Run export ORACLE_SID=database SID
  4. Run export PATH=$ORACLE_HOME/bin:$ORACLE_HOME/lib:$PATH
  5. Add an alias for the database SID to the $ORACLE_HOME/network/admin/tnsnames.ora file.
  6. Connect to the database using $ORACLE_HOME/bin/sqlplus "sys@SID as sysdba", and enter the password.
  7. Ensure that you have a successful connection.
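Steps 2 to 4 above can be wrapped in a small function that you source in the grid user's shell before running the SQL*Plus test in step 6. This is a minimal sketch: `set_db_env` is a hypothetical name, and the `PATH` it builds reproduces step 4 exactly.

```shell
#!/usr/bin/env bash
# Sketch of steps 2-4 above as a reusable function; set_db_env is a
# hypothetical name. Source this file in the grid user's shell.
set_db_env() {
  export ORACLE_HOME=$1   # path of the Oracle Database home
  export ORACLE_SID=$2    # SID of the database instance
  export PATH="$ORACLE_HOME/bin:$ORACLE_HOME/lib:$PATH"
  if [ ! -x "$ORACLE_HOME/bin/sqlplus" ]; then
    echo "Warning: $ORACLE_HOME/bin/sqlplus not found" >&2
  fi
}
```

For example, with a hypothetical home and SID, run `set_db_env /u01/app/oracle/product/12.1.0/dbhome_1 ORCL1` and then `$ORACLE_HOME/bin/sqlplus "sys@ORCL1 as sysdba"`.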

If this method of connecting to the database does not work, then Oracle ORAchk and Oracle EXAchk do not connect either.

  • If you have multiple homes owned by different users, and you cannot log in to the target database independently in SQL*Plus as the user running Oracle ORAchk, then Oracle ORAchk cannot log in either.

  • If operating system authentication is not set up, then the tool should still prompt you for a user name and password.

1.11.6 Remote Connections

Troubleshoot and fix remote connection issues.

1.11.6.1 Remote Login Problems

If Oracle ORAchk and Oracle EXAchk have problems locating and running SSH or SCP, then the tools cannot run any remote checks.

Additionally, if passwordless remote root login is not permitted over SSH, or Expect is not able to pass the root password, then the root privileged commands do not work.

  1. Verify that the SSH and SCP commands can be found.
    • If SSH commands return the error, -bash: /usr/bin/ssh -q: No such file or directory, then it may be because SSH is not located where expected.

      Set the RAT_SSHELL environment variable pointing to the location of SSH:

      $ export RAT_SSHELL=path to ssh
      
    • If SCP commands return the error, /usr/bin/scp -q: No such file or directory, then it may be because SCP is not located where expected.

      Set the RAT_SCOPY environment variable pointing to the location of SCP:
      $ export RAT_SCOPY=path to scp
      
  2. Verify that the user you are running as can run the following command manually from where you are running Oracle ORAchk and Oracle EXAchk to whichever remote node is failing.
    $ ssh root@remotehostname "id"
    root@remotehostname's password:
    uid=0(root) gid=0(root) groups=0(root),1(bin),2(daemon),3(sys),4(adm),6(disk),10(wheel)
    
    • If this fails, then engage your system administrators to correct the problem, if only temporarily while you run the tool.

    • Oracle ORAchk and Oracle EXAchk search for prompts or traps in remote user profiles. However, the tools may miss some. If you have prompts in remote profiles, then comment them out at least temporarily and run the tool again.

    • If passwordless remote root login can be configured, edit the /etc/ssh/sshd_config file and set PermitRootLogin to yes:
      PermitRootLogin yes
      
      Then, run the following command as root on all nodes of the cluster to restart the SSH daemon:
      service sshd restart
      
  3. Enable Expect debugging.
    • When the Expect utility is available, Oracle ORAchk uses it to answer password prompts when connecting to remote nodes, both for password validation and for running root collections. By default, the actual connection process is not logged.

    • Set environment variables to help debug remote target connection issues.

      • RAT_EXPECT_DEBUG: If this variable is set to -d, then Expect command tracing is activated. The trace information is written to the standard output.

        For example:
        export RAT_EXPECT_DEBUG=-d
        
      • RAT_EXPECT_STRACE_DEBUG: If this variable is set to strace, then the Expect command is called through strace. The trace information is written to the standard output.

        For example:
        export RAT_EXPECT_STRACE_DEBUG=strace
        
    • By varying the combinations of these two variables, you can get three levels of Expect connection trace information.

Note:

Set these two variables only at the direction of Oracle Support or development. They are typically used in combination with other variables and user interface options to restrict the amount of data collected during tracing, and with the “script” command to capture standard output. Do not set them for a full Oracle ORAchk run, because that generates a large amount of data, and if the “script” command is not used, the trace data simply scrolls by on the screen and is lost.

As a temporary workaround while you resolve remote problems, you can run reports locally on each node and then merge them together later.

On each node, run:
./orachk -local
./exachk -local
Then merge the collections to obtain a single report.
./orachk -merge zipfile1,zipfile2,zipfile3,...
./exachk -merge zipfile1,zipfile2,zipfile3,...
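After you copy each node's -local result zip file into one directory, the comma-delimited list that -merge takes can be built with a short sketch like this. It assumes that the result files are named `<tool>_*.zip` (so the distribution file `orachk.zip` or `exachk.zip` is not picked up). `merge_list` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Sketch: join the per-node result zips in one directory into the
# comma-delimited list passed to -merge. Assumes results are named
# <tool>_*.zip; merge_list is a hypothetical helper.
merge_list() {
  local dir=$1 tool=${2:-orachk}
  local f list=""
  for f in "$dir"/"$tool"_*.zip; do
    [ -e "$f" ] || continue        # the pattern matched nothing
    list=${list:+$list,}$f         # add a comma before every entry but the first
  done
  printf '%s\n' "$list"
}
```

For example, with a hypothetical collection directory: `./orachk -merge "$(merge_list /tmp/collections)"`.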

1.11.7 Permission Problems

You need the necessary directory permissions to run Oracle ORAchk and Oracle EXAchk.

  1. Verify that the permissions on the tool scripts orachk and exachk are set to 755 (-rwxr-xr-x).
    If the permissions are not currently set to 755, then set the permissions as follows:
    $ chmod 755 orachk
    
    $ chmod 755 exachk
    
  2. If Oracle ORAchk and Oracle EXAchk were installed by root and you are running as a different user, then you may not have the necessary directory permissions.
    [root@randomdb01 exachk]# ls -la
    total 14072
    drwxr-xr-x  3 root root    4096 Jun  7 08:25 .
    drwxrwxrwt 12 root root    4096 Jun  7 09:27 ..
    drwxrwxr-x  2 root root    4096 May 24 16:50 .cgrep
    -rw-rw-r--  1 root root 9099005 May 24 16:50 collections.dat
    -rwxr-xr-x  1 root root  807865 May 24 16:50 exachk
    -rw-r--r--  1 root root 1646483 Jun  7 08:24 exachk.zip
    -rw-r--r--  1 root root    2591 May 24 16:50 readme.txt
    -rw-rw-r--  1 root root 2799973 May 24 16:50 rules.dat
    -rw-r--r--  1 root root     297 May 24 16:50 UserGuide.txt
    

In this case, run the tool as root, or unzip it again as the Oracle software installation owner.
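Both permission checks above can be run before starting the tool. This sketch assumes that the tool needs write access to the directory it was unzipped into, which the root-owned listing above suggests is the failure mode. `check_tool_perms` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Sketch of the two checks above. check_tool_perms is hypothetical; it assumes
# the tool needs write access to the directory it was unzipped into.
check_tool_perms() {
  local script=$1 dir owner rc=0
  dir=$(dirname -- "$script")
  # Check 1: the script itself must be executable (755 is the expected mode).
  if [ ! -x "$script" ]; then
    echo "$script is not executable; run: chmod 755 $script" >&2
    rc=1
  fi
  # Check 2: a root-owned install directory is usually not writable by others.
  if [ ! -w "$dir" ]; then
    owner=$(ls -ld -- "$dir" | awk '{ print $3 }')
    echo "$(id -un) cannot write to $dir (owner: $owner)" >&2
    rc=1
  fi
  return "$rc"
}
```

For example, `check_tool_perms ./exachk` run from the install directory reports either problem and returns non-zero if one is found.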

1.11.8 Slow Performance, Skipped Checks and Timeouts

Follow these procedures to address slow performance and other issues.

When Oracle ORAchk and Oracle EXAchk run commands, a child process is spawned to run the command and a watchdog daemon monitors the child process. If the child process is slow or hung, then the watchdog kills the child process and the check is registered as skipped.

The watchdog.log file also contains entries similar to “killing stuck command”.

Depending on the cause of the problem you may not see skipped checks.

  1. Determine if there is a pattern to what is causing the problem.
    • EBS checks, for example, depend on the amount of data present and may take longer than the default timeout.

    • Remote checks may timeout and be killed and skipped, if there are prompts in the remote profile. Oracle ORAchk and Oracle EXAchk search for prompts or traps in the remote user profiles. If you have prompts in remote profiles, then comment them out at least temporarily and test run again.

  2. Increase the default timeout.
    • You can override the default timeouts by setting the environment variables listed in the following table.


      Table 1-16 Timeout Controlling

      Timeout Controlling               Default Value (seconds)   Environment Variable
      Checks not run by root (most)     90                        RAT_TIMEOUT
      Collection of all root checks     300                       RAT_ROOT_TIMEOUT
      SSH login DNS handshake           1                         RAT_PASSWORDCHECK_TIMEOUT


    • The default timeouts are designed to be long enough for the vast majority of cases. If they are not long enough, then you may be experiencing a system performance problem that should be corrected. Many timeouts can indicate a problem in the environment rather than in Oracle ORAchk or Oracle EXAchk.

  3. If it is not acceptable to increase the timeout to the point where nothing fails, then try excluding the problematic checks, running them separately with a large enough timeout, and then merging the reports back together.
    • See "Using Profiles with Oracle ORAchk and Oracle EXAchk" for more details about excluding all checks in a profile and running only the checks in a specific profile.

      For example: -excludeprofile ebs and -profile ebs.

    • See "Excluding Individual Checks" if this is just a few checks.

    • See "Merging Reports" for more details.

  4. If the problem does not appear to be due to slow or skipped checks but you have a large cluster, then try increasing the number of slave processes used for parallel database runs.
    • Database collections are run in parallel. The default number of slave processes used for a parallel database run is calculated automatically. You can change this default number using the -dbparallel slave processes or -dbparallelmax options.

    Note:

    The higher the parallelism the more resources are consumed. However, the elapsed time is reduced.

    You can raise or lower the number of parallel slaves beyond the default value.

    After the entire system is brought up following maintenance, but before users are permitted on the system, use a higher number of parallel slaves to finish a run as quickly as possible.

    On a busy production system, use a number less than the default value yet more than running in serial mode to get a run more quickly with less impact on the running system.

    Turn off the parallel database run using the -dbserial option.
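For step 2 above, here is a minimal sketch that raises all three timeouts from Table 1-16 by a single factor before a run. The factor of 2 and the helper variable `TIMEOUT_FACTOR` are illustrative assumptions. The tools read only the `RAT_*` variables.

```shell
#!/usr/bin/env bash
# Sketch: scale the default timeouts from Table 1-16 (90, 300 and 1 seconds)
# by one factor. TIMEOUT_FACTOR is a hypothetical helper variable, not one the
# tools read; only the exported RAT_* variables matter to them.
TIMEOUT_FACTOR=${TIMEOUT_FACTOR:-2}
export RAT_TIMEOUT=$((90 * TIMEOUT_FACTOR))
export RAT_ROOT_TIMEOUT=$((300 * TIMEOUT_FACTOR))
export RAT_PASSWORDCHECK_TIMEOUT=$((1 * TIMEOUT_FACTOR))
echo "RAT_TIMEOUT=$RAT_TIMEOUT RAT_ROOT_TIMEOUT=$RAT_ROOT_TIMEOUT" \
     "RAT_PASSWORDCHECK_TIMEOUT=$RAT_PASSWORDCHECK_TIMEOUT"
```

Source this in the shell that then runs `./orachk` or `./exachk`, so the exported values are inherited by the run. If many checks still time out, treat that as a possible system performance problem, as described in step 2.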