4 Validating and Troubleshooting Oracle Database Appliance

This chapter contains information about how to validate changes and troubleshoot Oracle Database Appliance problems. Various tools that perform one or both of these tasks are described in the following sections:

Oracle Database Appliance Diagnostics and Validation Tool

The Oracle Appliance Manager diagnostics and validation tool is managed with Oracle Appliance Manager oakcli validate commands. The tool provides diagnostic and validation functions to resolve support issues. If you experience problems with Oracle Database Appliance, then use the oakcli validate command to verify that your environment is properly configured and that best practices are in effect. When placing a service request, also use Oracle Appliance Manager as described in this chapter to prepare the log files to send to Oracle Support Services.

Note:

The Oracle Appliance Manager diagnostics and validation tool is not available on hardware prior to Oracle Database Appliance X3-2.

Oracle Database Appliance Validation Tool Overview

Use the command oakcli validate to validate the status of Oracle Database Appliance. You must run the oakcli validate command as the root user.

The command uses the following syntax, where checklist is a single check or a comma-delimited list of checks, and output_file is the name that you designate for a validation output file:

oakcli validate -h 
oakcli validate [-V | -l | -h]
oakcli validate [-v] [-f output_file] [-a | -d | -c checklist] [-ver patch_version]

See the following two tables for a summary of the validation tool options and system checks.

Table 4-1 Oracle Database Appliance Validation Tool Options

Option Purpose

-a

Run all system checks, including DiskCalibration. Oracle recommends that you use this command to validate system readiness before deployment. Do not run oakcli validate with this option on a busy production system, because the DiskCalibration system check can cause performance degradation. See Table 4-2 for details about each check.

-c checklist

Run the validation checks for the items identified in checklist, a comma-delimited list. Use this parameter to check either a single item or subset of items.

-d

Run only the default checks. The default checks are NetworkComponents, OSDiskStorage, SharedStorage, and SystemComponents. DiskCalibration is not a default check; it runs only with -a or when named in a checklist. See Table 4-2 for details about each check.

-f output_file

Send output to the file output_file, specified with a fully qualified file name, instead of to the screen (stdout).

-h

Display the online help.

-l

List the items that can be checked along with their descriptions.

-v

Show verbose output (must be used with a parameter that generates a validation report).

-V

Display the version of oakValidation.

-ver patch_version

Report any reasons for not being able to patch Oracle Database Appliance with the patch named in patch_version.


Table 4-2 Oracle Database Appliance Validation Checks

Check Purpose

asr

Validate Oracle Auto Service Request (Oracle ASR) components based on Oracle ASR configuration file and Oracle Integrated Lights Out Manager (Oracle ILOM) sensor data.

DiskCalibration

Preinstallation check for the storage disk performance using /opt/oracle/oak/orion.

Do not run this check after you have deployed Oracle software on Oracle Database Appliance, because running the DiskCalibration command on a deployed system creates performance issues.

NetworkComponents

Validate public and private network hardware connections. Note: This option is not valid on hardware prior to Oracle Database Appliance X3-2.

OSDiskStorage

Validate the operating system disks, and file system information.

ospatch

Validate that the system can complete an upgrade successfully using the named patch.

SharedStorage

Validate shared storage and multipathing information.

StorageTopology

Validate the storage shelf connectivity.

SystemComponents

Validate system components, based on Oracle ILOM sensor data readings.


Examples of Oracle Database Appliance Validation Tool Commands

The following command lists all of the checks that can be run, along with their descriptions:

# oakcli validate -l

The following command runs all system checks:

# oakcli validate -a

The following command performs a system check for disk calibration:

# oakcli validate -c DiskCalibration

The following command runs system checks to validate hardware system components and Oracle Database Appliance network components:

# oakcli validate -c SystemComponents,NetworkComponents

Note:

The NetworkComponents option is not available on hardware prior to Oracle Database Appliance X3-2.
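
The checklist syntax shown in these examples can be assembled by a small script. The following is a minimal sketch, not part of Oracle Appliance Manager: build_validate_cmd is a hypothetical helper that only prints an oakcli validate command line using the -f and -c options from Table 4-1. It refuses to include DiskCalibration because of the performance warning in Table 4-2, and it compares check names without regard to case on the assumption that oakcli accepts either form.

```shell
#!/bin/sh
# Sketch: build an "oakcli validate -c" command line from a list of check names.
# build_validate_cmd is a hypothetical helper, not an oakcli feature.
# It only prints the command; run the printed command as root on the appliance.

build_validate_cmd() {
    output_file=$1      # fully qualified file name for -f, or "" for stdout
    shift
    checklist=""
    for check in "$@"; do
        # Compare case-insensitively (assumption: oakcli accepts either case).
        lower=$(printf '%s' "$check" | tr '[:upper:]' '[:lower:]')
        # DiskCalibration can degrade performance on a deployed system,
        # so this sketch refuses to include it.
        if [ "$lower" = "diskcalibration" ]; then
            echo "refusing to include DiskCalibration" >&2
            return 1
        fi
        # Join the check names with commas, without a leading comma.
        checklist="${checklist:+$checklist,}$check"
    done
    if [ -z "$checklist" ]; then
        echo "no checks given" >&2
        return 1
    fi
    if [ -n "$output_file" ]; then
        echo "oakcli validate -f $output_file -c $checklist"
    else
        echo "oakcli validate -c $checklist"
    fi
}

# Prints: oakcli validate -c SystemComponents,NetworkComponents
build_validate_cmd "" SystemComponents NetworkComponents
# Prints: oakcli validate -f /tmp/validate.out -c OSDiskStorage,SharedStorage
build_validate_cmd /tmp/validate.out OSDiskStorage SharedStorage
```

Printing the command rather than running it keeps the sketch safe to try on any host; the file name /tmp/validate.out is only an illustration.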

The oakcli validate -c StorageTopology command performs a check of the cable configuration between the system controllers and the storage shelf, as well as the storage expansion shelf if one is installed. Oracle recommends that you run this command immediately after deploying the system or after adding an expansion storage shelf. The output shown in the following example reports a successful configuration. If the cabling is not correct, you would see errors in your output.

# oakcli validate -c storagetopology
 It may take a while. Please wait...
 INFO : ODA Topology Verification
 INFO : Running on Node0
 INFO : Check hardware type
 SUCCESS : Type of hardware found : X4-2
 INFO : Check for Environment(Bare Metal or Virtual Machine)
 SUCCESS : Type of environment found : Virtual Machine(ODA BASE)
 SUCCESS : Number of External LSI SAS controller found : 2
 INFO : Check for Controllers correct PCIe slot address
 SUCCESS : External LSI SAS controller 0 : 00:15.0
 SUCCESS : External LSI SAS controller 1 : 00:16.0
 INFO : Check if JBOD powered on
 SUCCESS : 1JBOD : Powered-on
 INFO : Check for correct number of EBODS(2 or 4)
 SUCCESS : EBOD found : 2
 INFO : Check for External Controller 0
 SUCCESS : Controller connected to correct ebod number
 SUCCESS : Controller port connected to correct ebod port
 SUCCESS : Overall Cable check for controller 0
 INFO : Check for External Controller 1
 SUCCESS : Controller connected to correct ebod number
 SUCCESS : Controller port connected to correct ebod port
 SUCCESS : Overall Cable check for controller 1
 INFO : Check for overall status of cable validation on Node0
 SUCCESS : Overall Cable Validation on Node0
 INFO : Check Node Identification status
 SUCCESS : Node Identification
 SUCCESS : Node name based on cable configuration found : NODE0
 INFO : Check JBOD Nickname
 SUCCESS : JBOD Nickname set correctly : Oracle Database Appliance - E0
 INFO : The details for Storage Topology Validation can also be found in log file=/opt/oracle/oak/log/<hostname>/storagetopology/StorageTopology-2014-07-03-08:57:31_7661_15914.log
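
When the validation output is saved to a file (for example with the -f option), a short script can summarize it. The following is a minimal sketch under one assumption: every healthy line begins with INFO or SUCCESS, as in the sample output above, so any other non-empty line is flagged for review. The exact wording of error lines is not documented here, and summarize_validation is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: summarize saved "oakcli validate" output.
# Assumption: healthy lines start with INFO or SUCCESS; any other
# non-empty line (apart from the "Please wait" banner) may indicate a problem.

summarize_validation() {
    awk '
        /^[ \t]*SUCCESS[ \t]*:/ { ok++; next }    # count successful checks
        /^[ \t]*INFO[ \t]*:/    { next }          # informational, ignore
        /^[ \t]*$/              { next }          # blank line, ignore
        /It may take a while/   { next }          # progress banner, ignore
        { other++; print "CHECK: " $0 }           # anything else: flag it
        END {
            printf "success=%d other=%d\n", ok, other
            exit (other > 0)                      # nonzero status if flagged
        }
    ' "$1"
}

# Demonstration with a few lines taken from the sample output.
sample=$(mktemp)
cat > "$sample" <<'EOF'
 It may take a while. Please wait...
 INFO : ODA Topology Verification
 SUCCESS : Type of hardware found : X4-2
 SUCCESS : EBOD found : 2
EOF
summarize_validation "$sample"    # prints: success=2 other=0
rm -f "$sample"
```

The nonzero exit status on flagged lines lets the function be used in a conditional, but a clean summary is not a substitute for reading the log file named at the end of the validation output.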

Oracle Database Appliance Configuration Error Messages

If you encounter errors while configuring Oracle Database Appliance, then review the following messages and actions:

Error Encountered in Step 11 Validation VIP appears to be up on the network
Cause: This message is most likely to occur when you attempt to redeploy the End-User Bundle without cleaning up a previous deployment. This error occurs because an existing VIP is configured for the addresses assigned to Oracle Database Appliance.
Action: Run cleanupDeploy.pl on Node 0, and then restart Oracle Appliance Manager.

Error "CRS-4402: The CSS daemon was started in exclusive mode but found an active CSS daemon on node oda2-1, number 1, and is terminating"
Cause: This error occurs when the Oracle Grid Infrastructure CSS daemon attempts to start the node as a standalone cluster node, but during startup discovers that the other cluster node is running, and changes to cluster mode to join the cluster.
Action: Ignore this error.

Installation requires partitioning of your hard drive
Cause: This message occurs on a node if one of the two operating system disks is not installed, but you are attempting to reimage the operating system.
Action: Ensure that both operating system disks are installed and are available.

Machine Check Exception ...This is not a software problem
Cause: There is a hardware system error.
Action: Log in to the Oracle ILOM Remote Console to determine the specific hardware error.

No volume control GStreamer plugins and/or devices found
Cause: Operating system plug-ins required for sound cards for the Oracle ILOM remote redirection console are not installed.
Action: Ignore this message. You do not require volume control for the console.

Reboot and Select proper Boot device Or Insert Boot Media in selected Boot device and press a key
Cause: One or both operating system disks are not available. This message occurs if you select "Default hard disk" while reimaging the system, but that disk is not available.
Action: Ensure that both operating system disks are installed and are available.

The AoDB Linux installation tree in that directory does not seem to match your boot media
Cause: This message occurs on a node when both operating system disks are installed and you choose to reimage them with "Default (use BIOS settings)" as your imaging option, but one or both of the disks are not available.
Action: Ensure that both operating system disks are available for use.

ERROR: Gateway IP is not pingable
Cause: On Windows platforms, the Oracle Appliance Manager configurator uses the echo service on port 7 to contact the gateway. If the echo service is disabled, possibly for security reasons, the ping fails.
Action: Run the native platform ping command. If the ping is successful, then the configurator validation output can be ignored.

ACFS Resources Failed to Start After Applying 2.2 Infra Patch
Cause: Oracle Database Appliance operating system upgrade includes upgrade of Oracle Enterprise Linux to Oracle Unbreakable Enterprise Kernel (Oracle UEK). Since Oracle Automatic Storage Management Cluster File System (ACFS) is not supported on all versions of Oracle Linux, a successful upgrade of the operating system may effectively disable Oracle ACFS.

Upgrade to Oracle Database Appliance 2.2 has three options: --infra, --gi, and --database. The --infra option includes the upgrade from Oracle Enterprise Linux to Oracle UEK. Before the --infra upgrade to 2.2, the operating system is Oracle Enterprise Linux with 11.2.0.2.x Grid Infrastructure. After the --infra upgrade, the operating system is Oracle UEK, but Oracle ACFS is still at 11.2.0.2.x, which is not compatible with Oracle UEK.

For example, upgrade to Oracle Linux 2.6.32-300.11.1.el5uek causes reco.acfsvol.acfs and ora.registry.acfs to temporarily go to an OFFLINE state, because 2.6.32-300.11.1.el5uek does not support Oracle 11.2.0.2.x ACFS. However, when Oracle Grid Infrastructure is upgraded to 11.2.0.3.2, these components are online again.

Action: Upgrade to Oracle Database Appliance 2.2 with the --gi option. This version of the software includes Oracle Grid Infrastructure 11.2.0.3.2, which includes Oracle ACFS modules that work with Oracle UEK.

For more information, see My Oracle Support note 1369107.1:

https://support.oracle.com/CSP/main/article?cmd=show&type=NOT&id=1369107.1

Preparing Log Files for Oracle Support Services

If you have a system fault that requires help from Oracle Support Services, you might need to provide log records. Collect log file information by running the oakcli manage diagcollect command. This command consolidates information from log files stored on Oracle Database Appliance into a single log file for use by Oracle Support Services. The location of the file is specified in the command output.

Additional Troubleshooting Tools and Commands

This section describes additional tools and commands to diagnose and troubleshoot problems with Oracle Database Appliance, some of which are specific to Oracle Database Appliance while others are tools for all clustered systems. The section provides information about the following resources:

Oracle Appliance Manager Tools for Configuration Auditing and Disk Diagnosis

Oracle Appliance Manager provides access to a number of sophisticated monitoring and reporting tools, some of them derived from standalone tools that require their own syntax and command sets. The following list briefly describes the ORAchk command and the disk diagnostic tool:

  • ORAchk

    The ORAchk Configuration Audit Tool audits important configuration settings for Oracle RAC two-node deployments in categories such as:

    • Operating system kernel parameters, packages, and so on

    • RDBMS

    • Database parameters and other database configuration settings

    • CRS/Grid infrastructure

    • ASM

    ORAchk is system-aware: when run by Oracle Appliance Manager, it checks for best practices that are specific to Oracle Database Appliance. To explore ORAchk on Oracle Database Appliance, use the oakcli orachk -h command. Find more details about ORAchk at https://support.oracle.com/epmos/faces/DocContentDisplay?id=1268927.2.

  • Disk Diagnostic Tool

    Use the Disk Diagnostic Tool to help identify the cause of disk problems. The tool produces a list of fourteen disk checks for each node. To run the tool, enter the following command:

    # oakcli stordiag eshelf_pd_unit
    

Trace File Analyzer Collector

Trace File Analyzer (TFA) Collector simplifies diagnostic data collection on Oracle Clusterware/Grid Infrastructure and Oracle RAC systems. TFA behaves in a similar manner to the diagcollection utility packaged with Oracle Clusterware. Both tools collect and package diagnostic data. However, TFA is more powerful than diagcollection because TFA centralizes and automates the collection of diagnostic information.

TFA provides the following key benefits and options:

  • Encapsulation of diagnostic data collection for all CRS/GI and Oracle RAC components on all cluster nodes into a single command executed from a single node

  • Option to "trim" diagnostic files during data collection to reduce data upload size

  • Options to isolate diagnostic data collection to a given time period and to a particular product component, such as ASM, RDBMS, or Clusterware

  • Centralization of collected diagnostic output to a single node in Oracle Database Appliance, if desired

  • On-demand scans of all log and trace files for conditions indicating a problem

  • Real-time scans of alert logs for conditions indicating a problem (database alert logs, ASM alert logs, Clusterware alert logs, and so on)

See Also:

My Oracle Support note "TFA Collector- Tool for Enhanced Diagnostic Gathering" at https://support.oracle.com/CSP/main/article?cmd=show&type=NOT&id=1513912.1

Oracle Database Appliance Hardware Monitoring Tool

The Oracle Database Appliance Hardware Monitoring Tool, implemented with the Oracle Appliance Manager show command, displays the status of different hardware components in Oracle Database Appliance server nodes. Use the tool on bare metal and on virtualized systems.

See the list of monitored components in the output of the oakcli show -h command.

See Also:

Chapter 5 for detailed information about all Oracle Appliance Manager commands including oakcli show

Note:

Upon initial startup of ODA_BASE on Oracle Database Appliance Virtualized Platform, the Oracle Database Appliance Server Hardware Monitoring Tool is enabled and collects base statistics for about 5 minutes. During this time, the tool displays a "Gathering Statistics…" message.

The information reported by the Oracle Database Appliance Hardware Monitoring Tool is only for the node on which you run the command. Details in the output depend on the component you select to review. The following example shows the output for the power subsystem on the current node:

# oakcli show power

NAME            HEALTH HEALTH DETAILS PART_NO. SERIAL_NO.          LOCATION INPUT POWER OUTPUT POWER INLET TEMP      EXHAUST TEMP

Power Supply_0  OK     -              7047410   476856F+1242CE0020 PS0      Present     88 watts     31.250 degree C 34.188 degree C
Power Supply_1  OK     -              7047410   476856F+1242CE004J PS1      Present     66 watts     31.250 degree C 34.188 degree C
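
Because the report covers only the local node, a script that watches hardware health has to run on each node. The following is a minimal sketch of such a check for the power subsystem, assuming that the third whitespace-separated field of each "Power Supply_" line is the HEALTH value and that any value other than OK deserves attention. The function name check_power is hypothetical, and the sample input is abridged from the oakcli show power output.

```shell
#!/bin/sh
# Sketch: flag power supplies whose HEALTH column is not "OK".
# Assumptions: the layout matches the documented oakcli show power sample,
# where each supply line starts with "Power Supply_N" followed by HEALTH.
# On an appliance the function would be fed with: oakcli show power | check_power

check_power() {
    awk '
        /^Power Supply_/ {
            total++
            # $1="Power", $2="Supply_N", $3=HEALTH
            if ($3 != "OK") {
                bad++
                print $1 " " $2 ": health is " $3
            }
        }
        END {
            printf "supplies=%d unhealthy=%d\n", total, bad
            # Nonzero status if any supply is unhealthy or none was found.
            exit (bad > 0 || total == 0)
        }
    '
}

# Demonstration with abridged sample output (trailing columns omitted).
check_power <<'EOF'
NAME            HEALTH HEALTH DETAILS PART_NO. SERIAL_NO.          LOCATION
Power Supply_0  OK     -              7047410   476856F+1242CE0020 PS0
Power Supply_1  OK     -              7047410   476856F+1242CE004J PS1
EOF
# prints: supplies=2 unhealthy=0
```

Treating an empty report as a failure is a deliberate choice in this sketch: during the initial "Gathering Statistics…" period no supply lines are expected, and a silent success would be misleading.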