To troubleshoot and fix Oracle ORAchk and Oracle EXAchk issues, follow the steps explained in this section.
2.11.1 How to Troubleshoot Oracle ORAchk and Oracle EXAchk Issues
To troubleshoot Oracle ORAchk and Oracle EXAchk:
- Ensure that you are using the correct tool.
Use Oracle EXAchk for Oracle Engineered Systems except for Oracle Database Appliance. For all other systems, use Oracle ORAchk.
- Ensure that you are using the latest versions of Oracle ORAchk and Oracle EXAchk.
- Check the version using the -v option:
$ ./orachk -v
$ ./exachk -v
- Compare your version with the latest version available here:
For Oracle ORAchk, refer to My Oracle Support Note 1268927.2.
For Oracle EXAchk, refer to My Oracle Support Note 1070954.1.
- Check the FAQ for similar problems in My Oracle Support Note 1070954.1.
- Review the log files generated by the run.
Check the applicable error.log files for relevant errors. These files contain the stderr output captured during the run.
Check the applicable log for other relevant information.
- Review My Oracle Support Notes for similar problems.
- For Oracle ORAchk issues, check ORAchk (MOSC) in My Oracle Support Community (MOSC).
- If necessary, capture the debug output, and then log an SR and attach the resulting zip file.
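The log review step above can be partly automated. The following is a minimal sketch, not part of Oracle ORAchk or Oracle EXAchk: scan_error_logs is a hypothetical helper, and both the directory argument and the *error.log file name pattern are assumptions that you must adapt to your own output directory.

```shell
# Print every line that mentions "error" (case-insensitive) from the
# *error.log files in the given directory, prefixed with file name and
# line number. The file name pattern is an assumption for illustration.
scan_error_logs() {
    log_dir=$1
    for f in "$log_dir"/*error.log; do
        [ -f "$f" ] || continue      # the glob matched nothing
        grep -i -n -H 'error' "$f"
    done
    return 0
}
```

For example, scan_error_logs /path/to/logs prints one line for each match, which gives you a quick list to compare against the FAQ and My Oracle Support Notes.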
2.11.2 How to Capture Debug Output
Follow these steps to capture debug information.
To capture debug output:
- Reproduce the problem with the fewest runs possible before enabling debug.
Debug captures a lot of data and the resulting zip file can be large, so try to narrow down the number of runs necessary to reproduce the problem.
Use command-line options to limit the scope of checks.
- Enable debug.
If you are running the tool in on-demand mode, then use the -debug option:
$ ./orachk -debug
$ ./exachk -debug
When you enable debug, Oracle ORAchk and Oracle EXAchk create a new debug log file. The output_dir directory retains various other temporary files used during health checks.
If you run health checks using the daemon, then restart the daemon with the -d start -debug option. Running this command generates debug output for the daemon and includes debug in all client runs:
$ ./orachk -d start -debug
$ ./exachk -d start -debug
When debug is run with the daemon, Oracle ORAchk and Oracle EXAchk create a daemon debug log file in the directory in which the daemon was started.
- Collect the resulting output zip file and the daemon debug log file, if applicable.
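To find the zip file to collect after a debug run, you can list the zip files in the directory where the tool wrote its output and pick the most recent one. The following is a minimal sketch: newest_zip is a hypothetical helper, and the directory you pass to it is an assumption about where your run placed its output.

```shell
# Print the most recently modified .zip file in the given directory.
# Prints nothing if the directory contains no zip files.
newest_zip() {
    dir=$1
    # ls -t sorts newest first; 2>/dev/null hides the error when no zip exists.
    ls -t "$dir"/*.zip 2>/dev/null | head -n 1
}
```

For example, newest_zip . prints the newest zip file in the current directory, which you can then attach to the SR.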
2.11.3 Remote Login Problems
If Oracle ORAchk and Oracle EXAchk have problems locating and running SSH or SCP, then the tools cannot run any remote checks.
Also, root privileged commands do not work if:
- root login is not permitted over SSH
- The Expect utility is not able to pass the root password
- Verify that the SSH and SCP commands can be found.
The SSH commands return the error, -bash: /usr/bin/ssh -q: No such file or directory, if SSH is not located where expected.
Set the RAT_SSHELL environment variable to point to the location of SSH:
$ export RAT_SSHELL=path to ssh
The SCP commands return the error, /usr/bin/scp -q: No such file or directory, if SCP is not located where expected.
Set the RAT_SCOPY environment variable to point to the location of SCP:
$ export RAT_SCOPY=path to scp
- Verify that the user you are running as can run the following command manually, from where you are running Oracle ORAchk and Oracle EXAchk, to whichever remote node is failing.
$ ssh root@remotehostname "id"
root@remotehostname's password:
uid=0(root) gid=0(root) groups=0(root),1(bin),2(daemon),3(sys),4(adm),6(disk),10(wheel)
If you face any problems running the command, then contact your system administrators to correct the problem, at least temporarily, so that you can run the tool.
Oracle ORAchk and Oracle EXAchk search for the prompts or traps in remote user profiles. If you have prompts in remote profiles, then comment them out at least temporarily and test run again.
If you can configure passwordless remote root login, then edit the /etc/ssh/sshd_config file to change the PermitRootLogin setting from no to yes. Then, as root, restart the SSH daemon on all nodes of the cluster so that the change takes effect.
- Enable Expect debugging.
Oracle ORAchk uses the Expect utility, when available, to answer password prompts when connecting to remote nodes for password validation and when running root collections. By default, the actual connection process is not logged.
Set environment variables to help debug remote target connection issues.
RAT_EXPECT_DEBUG: If this variable is set to -d, then the Expect command tracing is activated. The trace information is written to the standard output. For example:
$ export RAT_EXPECT_DEBUG=-d
RAT_EXPECT_STRACE_DEBUG: If this variable is set to strace, then strace calls the Expect command. The trace information is written to the standard output. For example:
$ export RAT_EXPECT_STRACE_DEBUG=strace
By varying the combinations of these two variables, you can get three levels of Expect connection trace information.
Note: Set the RAT_EXPECT_DEBUG and RAT_EXPECT_STRACE_DEBUG variables only at the direction of Oracle support or development. These variables are used with other variables and user interface options to restrict the amount of data collected during the tracing. The script command is used to capture standard output.
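The SSH and SCP checks above can be scripted before you start a run. The following is a minimal sketch under stated assumptions: locate_tool, set_rat_paths, and check_root_ssh are hypothetical helper names, not part of Oracle ORAchk or Oracle EXAchk. Only the RAT_SSHELL and RAT_SCOPY variable names come from this section. The check_root_ssh function uses the standard OpenSSH BatchMode and ConnectTimeout options, so it tests the passwordless case only and fails instead of prompting for a password.

```shell
# Print the full path of a command, or return non-zero if it is not on PATH.
locate_tool() {
    command -v "$1" 2>/dev/null
}

# Export RAT_SSHELL and RAT_SCOPY only when ssh and scp are actually found.
set_rat_paths() {
    ssh_path=$(locate_tool ssh) && export RAT_SSHELL="$ssh_path"
    scp_path=$(locate_tool scp) && export RAT_SCOPY="$scp_path"
}

# Try a non-interactive root login to the given host and run "id" there.
# BatchMode=yes makes ssh fail rather than prompt for a password.
check_root_ssh() {
    ssh -o BatchMode=yes -o ConnectTimeout=10 "root@$1" id
}
```

For example, run set_rat_paths and then check_root_ssh remotehostname for each node that fails. A non-zero exit status from check_root_ssh means that passwordless root login to that node is not working.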
As a temporary workaround while you resolve remote problems, run reports locally on each node, and then merge them together later.
$ ./orachk -merge zipfile_1 zipfile_2 [zipfile_3 ...]
$ ./exachk -merge zipfile_1 zipfile_2 [zipfile_3 ...]
2.11.4 Permission Problems
You must have sufficient directory permissions to run Oracle ORAchk and Oracle EXAchk.
- Verify that the permissions on the tool scripts orachk and exachk are set to 755 (-rwxr-xr-x).
If the permissions are not set, then set the permissions as follows:
$ chmod 755 orachk
$ chmod 755 exachk
- If you install Oracle ORAchk and Oracle EXAchk as root and run the tools as a different user, then you may not have the necessary directory permissions.
[root@randomdb01 exachk]# ls -la
total 14072
drwxr-xr-x  3 root root    4096 Jun  7 08:25 .
drwxrwxrwt 12 root root    4096 Jun  7 09:27 ..
drwxrwxr-x  2 root root    4096 May 24 16:50 .cgrep
-rw-rw-r--  1 root root 9099005 May 24 16:50 collections.dat
-rwxr-xr-x  1 root root  807865 May 24 16:50 exachk
-rw-r--r--  1 root root 1646483 Jun  7 08:24 exachk.zip
-rw-r--r--  1 root root    2591 May 24 16:50 readme.txt
-rw-rw-r--  1 root root 2799973 May 24 16:50 rules.dat
-rw-r--r--  1 root root     297 May 24 16:50 UserGuide.txt
In this case, you must either run the tools as root or unzip them again as the Oracle software install user.
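Both permission checks above can be scripted. The following is a minimal sketch: check_mode_755 and check_dir_writable are hypothetical helper names, and the script compares the permission string printed by ls against the -rwxr-xr-x value given in this section.

```shell
# Report whether a script has mode 755 (-rwxr-xr-x); if not, print the fix.
check_mode_755() {
    file=$1
    [ -f "$file" ] || { echo "missing: $file"; return 1; }
    # The first 10 characters of ls -ld output are the type and permission bits.
    perms=$(ls -ld "$file" | cut -c1-10)
    if [ "$perms" = "-rwxr-xr-x" ]; then
        echo "ok: $file"
    else
        echo "fix: chmod 755 $file (currently $perms)"
    fi
}

# Report whether the current user can write to the installation directory.
check_dir_writable() {
    if [ -w "$1" ]; then
        echo "writable: $1"
    else
        echo "not writable: $1"
    fi
}
```

For example, run check_mode_755 ./exachk and check_dir_writable . as the user who will run the tool. A "not writable" result in a directory owned by root corresponds to the situation shown in the listing above.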
2.11.5 Slow Performance, Skipped Checks and Timeouts
Follow these steps to fix slow performance and other issues.
Figure 2-27 Skipped Checks
The watchdog.log file also contains entries similar to "killing stuck command".
Depending on the cause of the problem, you may not see skipped checks.
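To see how often commands were killed during a run, you can count the matching entries in the watchdog.log file. The following is a minimal sketch: count_killed_commands is a hypothetical helper, the path to watchdog.log is supplied by you, and the search phrase is taken from this section, so the exact wording in your log may differ.

```shell
# Print the number of lines in the given log that mention a killed command.
# Prints 0 if the file does not exist or has no matching lines.
count_killed_commands() {
    log=$1
    [ -f "$log" ] || { echo 0; return 0; }
    # grep -c prints the count; "|| true" keeps the exit status at 0
    # when the count is zero.
    grep -i -c 'killing stuck command' "$log" || true
}
```

A count that grows from one run to the next suggests that you should look for a pattern in which checks are being killed, as described in the steps that follow.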
- Determine if there is a pattern to what is causing the problem.
EBS checks, for example, depend on the amount of data present and may take longer than the default timeout.
Remote checks may time out and then be killed and skipped if there are prompts in the remote profile. Oracle ORAchk and Oracle EXAchk search for prompts or traps in the remote user profiles. If you have prompts in remote profiles, then comment them out at least temporarily and test run again.
- Increase the default timeout.
Override the default timeout by setting the environment variables.
Table 2-4 Timeout Controlling
Timeout Controlling Default Value (seconds) Environment Variable
Checks not run by root (most).
Collection of all root checks.
SSH login DNS handshake.
The default timeouts are designed to be long enough for most cases. If a timeout is not long enough, then it is possible that you are experiencing a system performance problem. Frequent timeouts can indicate a problem in the environment that is unrelated to Oracle ORAchk and Oracle EXAchk.
- If it is not acceptable to increase the timeout to the point where nothing fails, then try excluding the problematic checks, running them separately with a large enough timeout, and then merging the reports back together.
- If the problem does not appear to be caused by slow or skipped checks but you have a large cluster, then try increasing the number of slave processes used for parallel database runs.
Database collections are run in parallel. The default number of slave processes used for parallel database run is calculated automatically. Change the default number using the options:
-dbparallel slave processes, or
The higher the parallelism the more resources are consumed. However, the elapsed time is reduced.
Raise or lower the number of parallel slaves beyond the default value.
After the entire system is brought up after maintenance, but before the users are permitted on the system, use a higher number of parallel slaves to finish a run as quickly as possible.
On a busy production system, use a number lower than the default value but higher than serial mode, so that the run completes more quickly than a serial run while limiting the impact on the running system.
Turn off the parallel database run using the