Oracle® Database Appliance Getting Started Guide
Release 2.10 for Linux x86-64

E22692-36

7 Validating and Troubleshooting Oracle Database Appliance

This chapter contains information about how to validate changes and troubleshoot Oracle Database Appliance problems. The following sections describe various tools that perform one or both of these tasks.

Oracle Database Appliance Diagnostics and Validation Tool

The Oracle Appliance Manager diagnostics and validation tool is managed with Oracle Appliance Manager oakcli validate commands. The tool provides diagnostic and validation functions to resolve support issues. If you experience problems with Oracle Database Appliance, then use the oakcli validate command to verify that your environment is properly configured and that best practices are in effect. When placing a service request, also use Oracle Appliance Manager as described in this chapter to prepare the log files to send to Oracle Support Services.

Note:

The Oracle Appliance Manager diagnostics and validation tool is not available on hardware prior to Oracle Database Appliance X3-2.

Oracle Database Appliance Validation Tool Overview

Use the command oakcli validate to validate the status of Oracle Database Appliance. You must run the oakcli validate command as the root user.

The command uses the following syntax, where checklist is a single check or a comma-delimited list of checks, and output_file_name is the name that you designate for a validation output file:

oakcli validate -h 
oakcli validate [-V | -l | -h]
oakcli validate [-v] [-f output_file_name] [-a | -d | -c checklist]

See the following two tables for a summary of the validation tool options and system checks.

Table 7-1 Oracle Database Appliance Validation Tool Options

Option Purpose

-a

Run all system checks, including DiskCalibration. Oracle recommends that you use this option to validate system readiness before deployment. Do not run oakcli validate with this option on a running system, because the DiskCalibration system check can cause performance issues. See Table 7-2 for details about each check.

-c

Run a comma-delimited list of checks. Use this command to run a specific check or list of checks.

-d

Run the default system checks. The default system checks are NetworkComponents, OSDiskStorage, SharedStorage, and SystemComponents. See Table 7-2 for details about each check.

-f

Create an output file. Provide the name of the file after the -f flag; the output of the validation command is written to that file. If you omit -f, then output is sent to the screen (stdout).

-h

Print help information.

-l

List and describe all system check options.

-v

Provide verbose output.

-V

Print the validation tool version.


Table 7-2 Oracle Database Appliance System Checks

Check Purpose

asr

Validate Oracle Auto Service Request (Oracle ASR) components based on Oracle ASR configuration file and Oracle Integrated Lights Out Manager (Oracle ILOM) sensor data.

DiskCalibration

Preinstallation check for the storage disk performance using /opt/oracle/oak/orion.

Do not run this check after you have deployed Oracle software on Oracle Database Appliance, because running the DiskCalibration command on a deployed system creates performance issues.

NetworkComponents

Validate public and private network hardware connections. Note: This option is not valid on hardware prior to Oracle Database Appliance X3-2.

OSDiskStorage

Validate the operating system disks, and file system information.

SharedStorage

Validate shared storage and multipathing information.

StorageTopology

Validate external JBOD (storage shelf) connectivity.

SystemComponents

Validate system components, based on Oracle ILOM sensor data readings.
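The -c option expects a comma-delimited list of the check names in Table 7-2. As a minimal sketch, the hypothetical shell function build_checklist below assembles such a list and rejects unknown names. It matches names case-insensitively (an assumption, based on the lowercase storagetopology example later in this chapter) and refuses DiskCalibration unless the hypothetical variable ALLOW_CALIBRATION is set to 1, because that check must not run on a deployed system. The function only builds the list; it does not call oakcli.

```shell
# Hypothetical helper (not part of Oracle Appliance Manager): build the
# comma-delimited checklist used by "oakcli validate -c" from Table 7-2 names.
KNOWN_CHECKS="asr DiskCalibration NetworkComponents OSDiskStorage SharedStorage StorageTopology SystemComponents"

# Print a name in lowercase so that comparisons ignore case (an assumption).
to_lower() {
  printf '%s' "$1" | tr '[:upper:]' '[:lower:]'
}

build_checklist() {
  bc_list=""
  for bc_arg in "$@"; do
    bc_match=""
    for bc_known in $KNOWN_CHECKS; do
      if [ "$(to_lower "$bc_arg")" = "$(to_lower "$bc_known")" ]; then
        bc_match=$bc_known   # use the spelling from Table 7-2
      fi
    done
    if [ -z "$bc_match" ]; then
      echo "unknown check: $bc_arg" >&2
      return 1
    fi
    # DiskCalibration is a preinstallation check; require an explicit opt-in.
    if [ "$bc_match" = "DiskCalibration" ] && [ "${ALLOW_CALIBRATION:-0}" != "1" ]; then
      echo "refusing DiskCalibration: set ALLOW_CALIBRATION=1 (undeployed systems only)" >&2
      return 1
    fi
    # Join with commas, without a leading comma before the first entry.
    bc_list=${bc_list:+$bc_list,}$bc_match
  done
  if [ -z "$bc_list" ]; then
    echo "no checks given" >&2
    return 1
  fi
  printf '%s\n' "$bc_list"
}
```

For example, as the root user: oakcli validate -c "$(build_checklist SystemComponents NetworkComponents)"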


Examples of Oracle Database Appliance Validation Tool Commands

The following command lists and describes all validation command options:

# ./oakcli validate -l

The following command runs all system checks, including DiskCalibration. Run it only before deployment:

# ./oakcli validate -a

The following command performs a system check for disk calibration. As with the -a option, run it only before you deploy Oracle software:

# ./oakcli validate -c DiskCalibration

The following command runs system checks to validate hardware system components and Oracle Database Appliance network components:

# ./oakcli validate -c SystemComponents,NetworkComponents

Note:

The NetworkComponents option is not available on hardware prior to Oracle Database Appliance X3-2.

Oracle Database Appliance System Troubleshooting

If you encounter errors while configuring Oracle Database Appliance, then review the following messages and actions:

Error Encountered in Step 11 Validation VIP appears to be up on the network
Cause: This message is most likely to occur when you attempt to redeploy the End-User Bundle without cleaning up a previous deployment. This error occurs because an existing VIP is configured for the addresses assigned to Oracle Database Appliance.
Action: Run cleanupDeploy.pl on Node 0, and then restart Oracle Appliance Manager.
Error "CRS-4402: The CSS daemon was started in exclusive mode but found an active CSS daemon on node oda2-1, number 1, and is terminating"
Cause: This error occurs when the Oracle Grid Infrastructure CSS daemon attempts to start the node as a standalone cluster node, but during startup discovers that the other cluster node is running, and changes to cluster mode to join the cluster.
Action: Ignore this error.
Installation requires partitioning of your hard drive
Cause: This message occurs on a node if one of the two operating system disks is not installed, but you are attempting to reimage the operating system.
Action: Ensure that both operating system disks are installed and are available.
Machine Check Exception ...This is not a software problem
Cause: There is a hardware system error.
Action: Log in to the Oracle ILOM Remote Console to determine the specific hardware error.
No volume control GStreamer plugins and/or devices found
Cause: Operating system plug-ins required for sound cards for the Oracle ILOM remote redirection console are not installed.
Action: Ignore this message. You do not require volume control for the console.
Reboot and Select proper Boot device Or Insert Boot Media in selected Boot device and press a key
Cause: One or both operating system disks are not available. This message occurs if you select "Default hard disk" during reimaging the system, but that disk is not available.
Action: Ensure that both operating system disks are installed and are available.
The AoDB Linux installation tree in that directory does not seem to match your boot media
Cause: This message occurs on a node when both operating system disks are installed and you choose to reimage the operating system disks, but you select "Default (use BIOS settings)" as your imaging option and one or both of the disks is not available.
Action: Ensure that both operating system disks are available for use.
ERROR: Gateway IP is not pingable
Cause: On Windows platforms, the Oracle Appliance Manager configurator uses the echo service on port 7 to contact the gateway. If the echo service is disabled, possibly for security reasons, the ping fails.
Action: Run the native platform ping command. If the ping is successful, then the configurator validation output can be ignored.
ACFS Resources Failed to Start After Applying 2.2 Infra Patch
Cause: Oracle Database Appliance operating system upgrade includes upgrade of Oracle Enterprise Linux to Oracle Unbreakable Enterprise Kernel (Oracle UEK). Since Oracle Automatic Storage Management Cluster File System (ACFS) is not supported on all versions of Oracle Linux, a successful upgrade of the operating system may effectively disable Oracle ACFS.

Upgrade to Oracle Database Appliance 2.2 has three options: --infra, --gi, and --database. The --infra option includes the upgrade from Oracle Enterprise Linux to Oracle UEK. Before the --infra upgrade to 2.2, the operating system is Oracle Enterprise Linux with 11.2.0.2.x Grid Infrastructure. After the --infra upgrade, the operating system is Oracle UEK, but the installed ACFS is still 11.2.0.2.x, which is not compatible with Oracle UEK.

For example, upgrading to the Oracle UEK kernel 2.6.32-300.11.1.el5uek causes reco.acfsvol.acfs and ora.registry.acfs to go temporarily to an OFFLINE state, because 2.6.32-300.11.1.el5uek does not support Oracle 11.2.0.2.x ACFS. After Oracle Grid Infrastructure is upgraded to 11.2.0.3.2, these resources come back online.

Action: Upgrade to Oracle Database Appliance 2.2 with the --gi option. This version of the software includes Oracle Grid Infrastructure 11.2.0.3.2, which includes Oracle ACFS modules that work with Oracle UEK.

For more information, see My Oracle Support Note 1369107.1:

https://support.oracle.com/CSP/main/article?cmd=show&type=NOT&id=1369107.1
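To confirm the OFFLINE state described in the ACFS entry above before and after the --gi upgrade, you can inspect the resources with crsctl. The following is a minimal sketch that assumes crsctl stat res resource_name prints attribute lines such as NAME=... and STATE=...; the function name offline_acfs_resources is hypothetical.

```shell
# Hypothetical helper: read "crsctl stat res" output on stdin and print the
# names of ACFS resources whose STATE line contains OFFLINE.
# Assumes attribute-style output (NAME=..., STATE=...), one resource per block.
offline_acfs_resources() {
  awk -F'=' '
    $1 == "NAME" { name = $2 }    # remember the current resource name
    $1 == "STATE" && $2 ~ /OFFLINE/ && name ~ /acfs/ { print name }
  '
}
```

For example: crsctl stat res ora.registry.acfs | offline_acfs_resources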

Troubleshooting Oracle ILOM Configuration

This section describes Oracle Integrated Lights Out Manager (Oracle ILOM) troubleshooting issues that can occur during installation.

If you encounter errors with configuring Oracle ILOM, then first complete the following tasks:

  1. Check to ensure that you have the required Java Development Kit (JDK) version installed on the server. A 32-bit JDK, such as jdk-6u24-linux-i586.rpm, is required. To download the JDK, go to the following URL: http://www.oracle.com/technetwork/java/javase/downloads/index.html.

  2. Check your environment to ensure that you have the required Java applications installed (for example, Java Web Start, javaws) so that the Remote Console can start.

Oracle ILOM Error Messages and Actions

The following are messages that can occur when using Oracle Integrated Lights Out Manager Remote Console:

Cannot redirect CD-ROM image
Cause: When you attempt to log in to the Remote Console, a message window titled "Cannot redirect CD-ROM image" appears with the error message "CD-ROM image redirection is not supported on your client OS." This error occurs when the required Java Development Kit (JDK) for the Remote Console is not installed. CD-ROM redirection requires a 32-bit JDK, although the Remote Console itself works with both 32-bit and 64-bit JDKs.
Action: Install the JDK required for the Remote Console version you are using. To download the JDK, go to the following URL:

http://www.oracle.com/technetwork/java/javase/downloads/index.html.

Class fault.memory.intel.dimm_ue Error
Cause: You open the Oracle ILOM Remote Console because a node is offline, and receive a class fault.memory.intel.dimm_ue error. This error indicates that a dual in-line memory module (DIMM) has failed.
Action: Refer to Section "4.3 Servicing Memory Modules (DIMMs)" in Oracle Database Appliance Service Manual, Release 2.1, for instructions about how to replace the failed DIMM.
System Status Faulted in System Overview
Cause: A system error has occurred.
Action: Click View, and review the details of the system error.
Reboot and Select proper Boot Device
Cause: The system needs to be reset to read the system startup files.
Action: Enter the following ipmitool commands to reset the system, where servername is the host name or IP address of the node's Oracle ILOM and changeme is the Oracle ILOM root password:
ipmitool -U root -P changeme -H servername chassis power off
ipmitool -U root -P changeme -H servername chassis power on
You have chosen to open jnlpgenerator-16 which is a: JNLP file
Cause: Applications that are required to open the Remote Console are not installed on the system. If you see this message when you start the Remote Console and you click OK to open the file with the default browser, the browser displays the Java console source code instead of starting the console.
Action: Install the missing Java applications (for example, javaws) so that the Remote Console interface can start.
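The action for "Reboot and Select proper Boot Device" powers the node off and then on with two ipmitool commands. The following minimal sketch waits until the node reports that it is off before powering it on. The names power_cycle_node and IPMITOOL are hypothetical, and the sketch assumes the Oracle ILOM root password is exported in IPMI_PASSWORD, which the ipmitool -E option reads so that the password does not appear on the command line.

```shell
# Hypothetical helper: power-cycle a node through its Oracle ILOM.
# IPMITOOL can be overridden (for example, with a stub for a dry run).
IPMITOOL=${IPMITOOL:-ipmitool}

power_cycle_node() {
  pc_host=$1                 # Oracle ILOM host name or IP address
  pc_tries=${2:-30}          # number of 5-second polls to wait for power-off
  $IPMITOOL -U root -E -H "$pc_host" chassis power off || return 1
  while [ "$pc_tries" -gt 0 ]; do
    # "chassis power status" reports "Chassis Power is off" once the node is down.
    if $IPMITOOL -U root -E -H "$pc_host" chassis power status | grep -q 'is off'; then
      $IPMITOOL -U root -E -H "$pc_host" chassis power on
      return $?
    fi
    pc_tries=$((pc_tries - 1))
    sleep 5
  done
  echo "node $pc_host did not power off" >&2
  return 1
}
```

For example: export IPMI_PASSWORD=changeme; power_cycle_node servername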

Preparing Log Files for Oracle Support Services

If you have a system fault that requires help from Oracle Support Services, you might need to provide log records. Collect log file information by running the oakcli manage diagcollect command. This command consolidates information from log files stored on Oracle Database Appliance into a single log file for use by Oracle Support Services. The location of the file is specified in the command output.

Additional Troubleshooting Tools and Commands

This section describes additional tools and commands for diagnosing and troubleshooting problems with Oracle Database Appliance. Some of these tools are specific to Oracle Database Appliance; others are available on all clustered systems.

Oracle Appliance Manager Tools for Configuration Auditing, Disk Diagnosis, and Cabling Validation

Oracle Appliance Manager provides access to a number of sophisticated monitoring and reporting tools, some of them derived from standalone tools that require their own syntax and command sets. This section briefly describes the commands in the following list, which are covered in more detail in Appendix D, "Oracle Appliance Manager Command-Line Interface":

  • ORAchk

    The ORAchk Configuration Audit Tool audits important configuration settings for Oracle RAC two-node deployments in categories such as:

    • Operating system kernel parameters, packages, and so on

    • RDBMS

    • Database parameters and other database configuration settings

    • CRS/Grid infrastructure

    • ASM

  • Disk Diagnostic Tool

    Use the Disk Diagnostic Tool to help identify the cause of disk problems. The tool produces a list of fourteen disk checks for each node. To run the tool, enter the following command, where e_shelf_pd_unit is the name of the disk to check:

    # /opt/oracle/oak/bin/oakcli stordiag e_shelf_pd_unit
    
  • Cabling Validation Tool

    After you cable or re-cable Oracle Database Appliance storage, use Oracle Appliance Manager to check that your cables are properly configured. (This check is not applicable to hardware prior to Oracle Database Appliance X3-2.) After you start up the newly cabled Oracle Database Appliance, run the oakcli validate -c storagetopology command as the root user. If you have deployed the Virtualized Platform, then run the command from ODA_BASE. Output from the command looks similar to the following example:

    # /opt/oracle/oak/bin/oakcli validate -c storagetopology
    INFO    : ODA Topology Verification Utility v0.1
    INFO    : Check hardware type              
    SUCCESS : Type of hardware found : V2
    INFO    : Check for Environment(Bare Metal or Virtual Machine)
    SUCCESS : Type of environment found : Bare Metal
    INFO    : Check number of Controllers      
    SUCCESS : Number of Internal LSI SAS controller found : 1
    SUCCESS : Number of External LSI SAS controller found : 2
    INFO    : Check for Controllers correct PCIe slot address
    SUCCESS : Internal LSI SAS controller   : 50:00.0
    SUCCESS : External LSI SAS controller 0 : 30:00.0
    SUCCESS : External LSI SAS controller 1 : 40:00.0
    INFO    : Check if JBOD powered on         
    SUCCESS : 1JBOD : Powered-on
    INFO    : Check for correct number of EBODS(2 or 4)
    SUCCESS : EBOD found : 2
    INFO    : Check for External Controller 0  
    SUCCESS : Controller connected to correct ebod number
    SUCCESS : Controller port connected to correct ebod port
    SUCCESS : Overall Cable check for controller 0
    INFO    : Check for External Controller 1  
    SUCCESS : Controller connected to correct ebod number
    SUCCESS : Controller port connected to correct ebod port
    SUCCESS : Overall Cable check for controller 1
    INFO    : Check for overall status of cable validation on Node
    SUCCESS : Overall Cable Validation on Node
    INFO    : Check Node Identification status 
    SUCCESS : Node Identification
    SUCCESS : Node name based on cable configuration found : NODE0
    

    Check the output for any reported errors. If errors are reported, then shut down the system and re-cable as directed. After reinstalling the cables, restart Oracle Database Appliance and rerun the cable validation test.
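Instead of reading long validation output line by line, you can filter it. The hypothetical function validation_problems below is a minimal sketch that prints every nonblank line whose first field is not INFO or SUCCESS (the two prefixes in the sample output above) and returns a nonzero status if it finds any. Treating every other prefix as a problem is an assumption.

```shell
# Hypothetical helper: read "oakcli validate" output on stdin and print any
# line that is not an INFO or SUCCESS message; exit 1 if such lines exist.
validation_problems() {
  awk 'NF && $1 != "INFO" && $1 != "SUCCESS" { print; bad = 1 }
       END { exit bad }'
}
```

For example, as the root user: oakcli validate -c storagetopology | validation_problems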

Trace File Analyzer Collector

Trace File Analyzer (TFA) Collector simplifies diagnostic data collection on Oracle Clusterware/Grid Infrastructure and RAC systems. TFA behaves in a similar manner to the diagcollection utility packaged with Oracle Clusterware. Both tools collect and package diagnostic data. However, TFA is much more powerful than diagcollection because TFA centralizes and automates the collection of diagnostic information.

TFA provides the following key benefits and options:

  • Encapsulation of diagnostic data collection for all CRS/GI and RAC components on all cluster nodes into a single command executed from a single node

  • Option to "trim" diagnostic files during data collection to reduce data upload size

  • Options to isolate diagnostic data collection to a given time period and to a particular product component, such as ASM, RDBMS, or Clusterware

  • Centralization of collected diagnostic output to a single node in Oracle Database Appliance, if desired

  • On-demand scans of all log and trace files for conditions indicating a problem

  • Real-time scans of alert logs (database, ASM, and Clusterware alert logs, among others) for conditions indicating a problem

See Also:

TFA Collector- The Preferred Tool for Automatic or Ad Hoc Diagnostic Gathering Across All Cluster Nodes, Document ID 1513912.1, on My Oracle Support for further information.

Ongoing Monitoring with Oracle Cluster Health Monitor and Oracle OSWatcher

Regular monitoring can help you avoid problems that cause shutdowns or degrade performance. The Cluster Health Monitor (CHM) and OSWatcher are two particularly useful tools available on Oracle Database Appliance for this purpose.

The CHM stores real-time metrics about clusterware performance in a repository. Find detailed information about the CHM in Appendix H, Troubleshooting Oracle Clusterware, in the Oracle Clusterware Administration and Deployment Guide.

Occasionally, the Cluster Health Monitor (CHM) repository database grows too large, and you see errors such as the following:

CRS-9011-Error manage: Failed to initialize connection to the Cluster Logger Service

To solve this problem, you should perform the following steps:

  1. Run the command df -h /u01 as the grid user on both nodes as shown in the following example for one node:

    # df -h /u01
    Filesystem Size Used Avail Use% Mounted on
    /dev/mapper/VolGroupSys-LogVolU01
      97G 61G 32G 67% /u01
    
  2. If the result shows that the /u01 partition is over 60% full, as in the preceding example, then on both nodes use the ls -lrth command in the CHM repository directory under the Grid home to find the size of the crfclust.bdb file, as shown in this example for one node:

    # cd /u01/app/11.2.0.3/grid/crf/db/oda1
    # ls -lrth | grep crfclust.bdb
        -rw-r----- 1 root root 31G Jan 13 08:28 crfclust.bdb
    
  3. If the size of crfclust.bdb (shown in Step 2) is larger than the size shown in the output from the df -h command (shown in Step 1) on both nodes, then complete these two steps:

    1. While logged in to either node as the grid user, resize the CHM repository database as shown in the following example:

      # oclumon manage -repos resize 259200
      oda1 --> retention check successful
      oda2 --> retention check successful
      New retention is 259200 and will use 4516300800 bytes of disk space
       
      CRS-9115-Cluster Health Monitor repository size change completed on all nodes.
      
    2. As the grid user, log in to each node in turn to restart the ora.crf resource as shown in the following example for one node only:

      # crsctl stop res ora.crf -init
      CRS-2673: Attempting to stop 'ora.crf' on 'oda1'
      CRS-2677: Stop of 'ora.crf' on 'oda1' succeeded

      # crsctl start res ora.crf -init
      
  4. Re-run the command df -h /u01 as the grid user on both nodes (as done originally in Step 1) to ensure that the problem has been resolved. If the problem still exists, then see My Oracle Support Notes 1574492.1 and 1343105.1 for additional help.
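Steps 1 and 2 compare /u01 usage against a 60% threshold. A minimal sketch of that comparison follows; check_usage is a hypothetical name. It uses df -P (POSIX output format) so that long device names such as /dev/mapper/VolGroupSys-LogVolU01 do not wrap onto a second line, as they do in the df -h output above. Run it as the grid user on both nodes.

```shell
# Hypothetical helper: print the Use% of a mount point and return nonzero when
# it exceeds a limit (default 60, the threshold used in Step 2).
# "df -P" prints one line per file system, so field 5 is always Use%.
check_usage() {
  cu_mount=${1:-/u01}
  cu_limit=${2:-60}
  df -P "$cu_mount" | awk -v limit="$cu_limit" '
    NR == 2 {
      use = $5
      sub(/%/, "", use)                 # "67%" -> "67"
      print use
      if (use + 0 > limit + 0) exit 1   # above the threshold
      exit 0
    }
    END { if (NR < 2) exit 2 }          # df printed no data line
  '
}
```

For example: check_usage /u01 60 || echo "/u01 is more than 60% full"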

OSWatcher captures performance metrics from the operating system. Oracle Support might ask you to provide OSWatcher reports to help diagnose certain types of problems. For more information about OSWatcher, see Oracle Support Note 301137.1 at https://support.oracle.com/CSP/main/article?cmd=show&type=NOT&id=301137.1

Oracle Database Appliance Hardware Monitoring Tool

The Oracle Database Appliance Hardware Monitoring Tool monitors the health of the hardware components in Oracle Database Appliance server nodes as needed. The hardware monitor queries the overall health of all hardware components in the server node. Use the tool on bare metal and on virtualized systems.

The output of the oakcli show -h command lists the components that the tool monitors. They are also shown in the following list for convenience:

  • Server

  • Processor

  • Memory

  • Power

  • Cooling

  • Network

  • Storage Enclosure

See Also:

Appendix D for detailed information about all Oracle Appliance Manager commands including oakcli show.

The information reported by the Oracle Database Appliance Hardware Monitoring Tool is only for the node on which you run the command. Details in the output depend on the component you select to review. The following example shows the output for the power subsystem on the current node:

oakcli show power

NAME            HEALTH HEALTH DETAILS PART_NO. SERIAL_NO.          LOCATION INPUT POWER OUTPUT POWER INLET TEMP      EXHAUST TEMP

Power Supply_0  OK     -              7047410   476856F+1242CE0020 PS0      Present     88 watts     31.250 degree C 34.188 degree C
Power Supply_1  OK     -              7047410   476856F+1242CE004J PS1      Present     66 watts     31.250 degree C 34.188 degree C
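To spot a failed power supply quickly, you can filter this output. The following is a minimal sketch with a hypothetical function name; it assumes the layout shown above, where each data row starts with "Power Supply_N", so HEALTH is the third whitespace-separated field.

```shell
# Hypothetical helper: read "oakcli show power" output on stdin and print the
# name of every power supply whose HEALTH column is not OK.
unhealthy_power_supplies() {
  awk '$1 == "Power" && $2 ~ /^Supply_/ && $3 != "OK" { print $1 " " $2 }'
}
```

For example: oakcli show power | unhealthy_power_supplies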

Note:

Upon initial startup of ODA_BASE on Oracle Database Appliance Virtualized Platform, the Oracle Database Appliance Hardware Monitoring Tool is enabled and collects base statistics for about 5 minutes. During this time, the tool displays a "Gathering Statistics…" message.

Configuring Serial Console to Debug Startup Problems

Complete the instructions in this section only if you have been advised by Oracle Support to configure a serial console to resolve startup problems. Use the procedure in this section based on your system configuration, either bare metal or Oracle Database Appliance Virtualized Platform.

Note:

The content in this section is for Oracle Database Appliance X3-2 and Oracle Database Appliance X4-2.

Configuring Serial Console on Bare Metal Systems

Complete the following actions to set up the required serial console on a bare metal system:

  1. Change the baud rate in the Basic Input/Output System (BIOS) by following these steps:

    1. Log in to ILOM

    2. Click Host Management

    3. Click Host Control

    4. Select BIOS for the "Next Boot Device:" option

    5. Click Save

    6. Click Power Control under "Host Management"

    7. Restart the machine

    8. Launch the remote console

    9. After the machine restarts, select the "Advanced" tab

    10. Press Enter on "Serial Port Console Redirection"

    11. Change the "bits per second" value to 115200

    12. Exit and save your changes, then wait for the automatic restart to complete

  2. Change the ILOM settings by following these steps:

    1. Log in to ILOM

    2. Click ILOM Administration

    3. Click Connectivity

    4. Click the Serial port tab

    5. Change the Host Serial Port "Baud Rate" value to 115200

    6. Change the External Serial Port "Baud Rate" value to 115200

    7. Click "Save"

  3. Update the grub.conf file by following these steps:

    1. Open the /boot/grub/grub.conf file for editing

    2. Add the following text to the end of the row that starts with the word kernel:

      console=ttyS0,115200n8 console=tty1
      

      Use the following as an example of what the edited row should contain:

      kernel /vmlinuz-2.6.39-400.126.1.el5uek ro root=/dev/VolGroupSys/LogVolRoot rhgb pci=noaer crashkernel=256M@64M loglevel=7 panic=60 ipv6.disable=1 debug audit=1 intel_idle.max_cstate=1 nomce numa=off rhgb console=ttyS0,115200n8 console=tty1
      
    3. Save the edited file and restart the system to apply the change.

  4. This step is optional. If you want to see the console from ILOM, then complete the following steps:

    1. Use ssh to connect to your ILOM IP address

    2. Log in as the root user

    3. Run the command start /SP/console

    4. Enter y in response to the Are you sure you want to start /SP/console (y/n)? prompt

    5. When you are ready to exit the console, press Esc and then press Shift+9 (that is, type the "(" character).
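Step 3 of the bare metal procedure above edits grub.conf by hand. As a minimal sketch, the hypothetical function add_serial_console appends the same options to every line that starts with the word kernel, keeps a .bak copy of the original file, and skips lines that already contain console=ttyS0 so that it can be rerun safely. It assumes a GRUB legacy configuration like the example shown; review the result before you restart.

```shell
# Hypothetical helper for Step 3 (bare metal): append serial console options
# to every "kernel" line of a GRUB legacy config, keeping a .bak copy.
add_serial_console() {
  gc_file=${1:-/boot/grub/grub.conf}
  cp "$gc_file" "$gc_file.bak" || return 1
  # Match lines that start (after optional whitespace) with the word kernel;
  # append the options only if console=ttyS0 is not already present.
  sed '/^[[:space:]]*kernel[[:space:]]/{
/console=ttyS0/!s/$/ console=ttyS0,115200n8 console=tty1/
}' "$gc_file.bak" > "$gc_file"
}
```

If the edit goes wrong, restore the original with cp /boot/grub/grub.conf.bak /boot/grub/grub.conf.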

Configuring Serial Console on Virtualized Systems

Complete the following actions to set up the required serial console on a virtualized system:

  1. Change the baud rate in the Basic Input/Output System (BIOS) by following these steps:

    1. Log in to ILOM

    2. Click Host Management

    3. Click Host Control

    4. Select BIOS for the "Next Boot Device:" option

    5. Click Save

    6. Click Power Control under "Host Management"

    7. Restart the machine

    8. Launch the remote console

    9. After the machine restarts, select the "Advanced" tab

    10. Press Enter on "Serial Port Console Redirection"

    11. Change the "bits per second" value to 115200

    12. Exit and save your changes, then wait for the automatic restart to complete

  2. Change the ILOM settings by following these steps:

    1. Log in to ILOM

    2. Click ILOM Administration

    3. Click Connectivity

    4. Click the Serial port tab

    5. Change the Host Serial Port "Baud Rate" value to 115200

    6. Change the External Serial Port "Baud Rate" value to 115200

    7. Click Save

  3. Update the grub.conf file by following these steps:

    1. Open the /boot/grub/grub.conf file for editing

    2. Add the following text to the end of the row that starts with the word kernel:

      console=com1,vga com1=9600,8n1
      

      Use the following as an example of what the edited row should contain:

      kernel /xen.gz dom0_mem=4096M crashkernel=256M@64M console=com1,vga com1=9600,8n1
      
    3. Add the following text to the end of the row that starts with the word module:

      console=hvc0
      

      Use the following as an example of what the edited row should contain:

      module /vmlinuz-2.6.39-400.126.1.el5uek ro root=UUID=6a9a3989-0aab-47b8-822e-0bdb43aba334 nohz=off intel_idle.max_cstate=1 pci=noaer loglevel=7 nomce panic=60 numa=off  console=hvc0
      
    4. Save the edited file and restart the system to apply the change.

  4. This step is optional. If you want to see the console from ILOM, then complete the following steps:

    1. Use ssh to connect to your ILOM IP address

    2. Log in as the root user

    3. Run the command start /SP/console

    4. Enter y in response to the Are you sure you want to start /SP/console (y/n)? prompt

    5. When you are ready to exit the console, press Esc and then press Shift+9 (that is, type the "(" character).
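Step 3 of the virtualized procedure above edits two kinds of grub.conf lines. As a minimal sketch, the hypothetical function add_xen_serial_console appends the Xen console options to the kernel line that loads xen.gz and console=hvc0 to the module line that loads vmlinuz, keeping a .bak copy. Restricting the module edit to the vmlinuz line is an assumption intended to leave any module line that loads an initrd image untouched; review the result before you restart.

```shell
# Hypothetical helper for Step 3 (virtualized): append Xen console options to
# the "kernel ... xen.gz" line and console=hvc0 to the "module ... vmlinuz"
# line, skipping lines that already have the option; keeps a .bak copy.
add_xen_serial_console() {
  xc_file=${1:-/boot/grub/grub.conf}
  cp "$xc_file" "$xc_file.bak" || return 1
  sed -e '/^[[:space:]]*kernel[[:space:]].*xen\.gz/{
/console=com1/!s/$/ console=com1,vga com1=9600,8n1/
}' -e '/^[[:space:]]*module[[:space:]].*vmlinuz/{
/console=hvc0/!s/$/ console=hvc0/
}' "$xc_file.bak" > "$xc_file"
}
```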