10 Validating and Troubleshooting Oracle Database Appliance

This chapter contains information about how to validate changes and troubleshoot Oracle Database Appliance problems.

Oracle Database Appliance Diagnostics and Validation

Use oakcli validate to check your Oracle Database Appliance configuration, and if necessary, to provide information to Oracle Support Services.

The oakcli validate command is the Oracle Appliance Manager diagnostic and validation utility for identifying and resolving support issues. If you experience problems with Oracle Database Appliance, then use the oakcli validate command options to verify that your environment is properly configured and that best practices are in effect. When placing a service request, also use Oracle Appliance Manager as described in this chapter to prepare the log files to send to Oracle Support Services.

Oracle Database Appliance Validation Command Overview

Use the oakcli validate command and options to validate the status of Oracle Database Appliance.

You must run the oakcli validate command as the root user.

Syntax

The command oakcli validate uses the following syntax, where checklist is a single check or a comma-delimited list of checks, and output_file is the name that you designate for a validation output file:

oakcli validate -h 
oakcli validate [-V | -l | -h]
oakcli validate [-v] [-f output_file] [-a | -d | -c checklist] [-ver patch_version]

Parameters

Option Purpose

-a

Run all system checks, including DiskCalibration. Oracle recommends that you use this command to validate system readiness before deployment. Do not run oakcli validate with this option on a busy production system, because the DiskCalibration system check can cause performance degradation.

-c checklist

Run the validation checks for the items identified in checklist, a comma-delimited list. Use this parameter to check either a single item or subset of items.

-d

Run only the default checks. The default checks are NetworkComponents, OSDiskStorage, SharedStorage, and SystemComponents.

-f output_file

Send output to a file with a fully qualified file name, output_file, instead of to the screen (stdout).

-h

Display the online help.

-l

List the items that can be checked (and their descriptions).

-v

Show verbose output (must be used with a parameter that generates a validation report).

-V

Display the version of oakValidation.

-ver patch_version

Report any reasons for not being able to patch Oracle Database Appliance with the patch named in patch_version.
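
The options above can be combined programmatically. The following Python sketch assembles an oakcli validate argument list from the documented options only. The function name build_validate_args and its keyword names are hypothetical, and the sketch requires exactly one of -a, -d, or -c, which is stricter than the syntax line (where that group is optional). It only builds the argument list; running it still requires root on the appliance.

```python
def build_validate_args(checks=None, run_all=False, defaults=False,
                        verbose=False, output_file=None, patch_version=None):
    """Return an argv list for `oakcli validate` (hypothetical helper).

    Only options listed in the Parameters table are emitted:
    -v, -f output_file, -a | -d | -c checklist, -ver patch_version.
    """
    # Assumption of this sketch: exactly one check selector is required.
    selected = sum([bool(checks), run_all, defaults])
    if selected != 1:
        raise ValueError("choose exactly one of checks (-c), run_all (-a), defaults (-d)")

    argv = ["oakcli", "validate"]
    if verbose:
        argv.append("-v")
    if output_file:
        argv += ["-f", output_file]
    if run_all:
        argv.append("-a")
    elif defaults:
        argv.append("-d")
    else:
        # checklist is a comma-delimited list of check names
        argv += ["-c", ",".join(checks)]
    if patch_version:
        argv += ["-ver", patch_version]
    return argv


if __name__ == "__main__":
    # Print the command line for two checks without running anything.
    print(" ".join(build_validate_args(
        checks=["SystemComponents", "NetworkComponents"])))
```

Passing the resulting list to a process launcher (for example, subprocess.run) rather than a shell string avoids quoting problems with the output file name.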

VALIDATE Options

Command Purpose

-c asr

Validate Oracle Auto Service Request (Oracle ASR) components based on the Oracle ASR configuration file and Oracle Integrated Lights Out Manager (Oracle ILOM) sensor data.

-c DiskCalibration

Run a preinstallation check of storage disk performance using /opt/oracle/oak/bin/orion.

Do not run this check after you have deployed Oracle software on Oracle Database Appliance, because running the DiskCalibration command on a deployed system creates performance issues.

Use the default check option (oakcli validate -d) if you do not want to perform a system check for disk calibration.

-c NetworkComponents

Validate public and private network hardware connections.

-c OSDiskStorage

Validate the operating system disks, and file system information.

-c ospatch

Validate that the system can complete an upgrade successfully using the named patch.

-c SharedStorage

Validate shared storage and multipathing information.

-c StorageTopology

Validate the storage shelf connectivity.

-c SystemComponents

Validate system components, based on Oracle ILOM sensor data readings.

Examples of OAKCLI Validate Command Checks

Review these examples to see how you can perform validation checks using the oakcli validate command and options.

Listing All Checks and Their Descriptions

oakcli validate -l

         Checkname -- Description
         =========    ===========
         *SystemComponents -- Validate system components based on ilom sensor data
         readings
         *OSDiskStorage -- Validate OS disks and filesystem information
         *SharedStorage -- Validate Shared storage and multipathing information
         DiskCalibration -- Check disk performance with orion
         *NetworkComponents -- Validate public and private network components
         *StorageTopology -- Validate external JBOD connectivity
         asr -- Validate asr components based on asr config file and ilom sensor
         data readings

* -- These checks are also performed as part of default checks

Note:

The NetworkComponents validation check is not available on hardware prior to Oracle Database Appliance X3-2.
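
If you want to script against the list of checks, the listing shown above can be parsed. The following Python sketch is one possible approach, assuming the output keeps the "name -- description" layout of the sample, with a leading asterisk marking default checks and wrapped description lines carrying no " -- " separator. The function name parse_check_list is hypothetical.

```python
SAMPLE = """\
         Checkname -- Description
         =========    ===========
         *SystemComponents -- Validate system components based on ilom sensor data
         readings
         *OSDiskStorage -- Validate OS disks and filesystem information
         *SharedStorage -- Validate Shared storage and multipathing information
         DiskCalibration -- Check disk performance with orion
         *NetworkComponents -- Validate public and private network components
         *StorageTopology -- Validate external JBOD connectivity
         asr -- Validate asr components based on asr config file and ilom sensor
         data readings

* -- These checks are also performed as part of default checks
"""


def parse_check_list(text):
    """Map each check name to (is_default, description)."""
    checks = {}
    last = None  # most recent check, used to attach wrapped lines
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("="):
            continue
        if " -- " in line:
            name, desc = line.split(" -- ", 1)
            name = name.strip()
            is_default = name.startswith("*")
            name = name.lstrip("*").strip()
            # Skip the column header and the "* -- ..." footnote.
            if not name or name == "Checkname":
                last = None
                continue
            checks[name] = (is_default, desc.strip())
            last = name
        elif last is not None:
            # Continuation of a wrapped description.
            is_default, desc = checks[last]
            checks[last] = (is_default, desc + " " + line)
    return checks


if __name__ == "__main__":
    for name, (is_default, desc) in parse_check_list(SAMPLE).items():
        print(("default  " if is_default else "optional ") + name + ": " + desc)
```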

Running All Checks

Enter the following command to run all checks:

oakcli validate -a

Validating Storage Cable Connections

Check the cable connections between the system controllers and the storage shelf, as well as the cable connection to the storage expansion shelf (if one is installed):

oakcli validate -c storagetopology

Oracle recommends that you run the oakcli validate -c StorageTopology command before deploying the system. Doing so helps prevent deployment problems caused by wrong or missing cable connections. The output shown in the following example reports a successful configuration. If the cabling is not correct, you will see errors in your output.

# oakcli validate -c storagetopology
 It may take a while. Please wait...
 INFO : ODA Topology Verification
 INFO : Running on Node0
 INFO : Check hardware type
 SUCCESS : Type of hardware found : X4-2
 INFO : Check for Environment(Bare Metal or Virtual Machine)
 SUCCESS : Type of environment found : Virtual Machine(ODA BASE)
 SUCCESS : Number of External LSI SAS controller found : 2
 INFO : Check for Controllers correct PCIe slot address
 SUCCESS : External LSI SAS controller 0 : 00:15.0
 SUCCESS : External LSI SAS controller 1 : 00:16.0
 INFO : Check if  powered on
 SUCCESS : 1 : Powered-on
 INFO : Check for correct number of EBODS(2 or 4)
 SUCCESS : EBOD found : 2
 INFO : Check for External Controller 0
 SUCCESS : Controller connected to correct ebod number
 SUCCESS : Controller port connected to correct ebod port
 SUCCESS : Overall Cable check for controller 0
 INFO : Check for External Controller 1
 SUCCESS : Controller connected to correct ebod number
 SUCCESS : Controller port connected to correct ebod port
 SUCCESS : Overall Cable check for controller 1
 INFO : Check for overall status of cable validation on Node0
 SUCCESS : Overall Cable Validation on Node0
 INFO : Check Node Identification status
 SUCCESS : Node Identification
 SUCCESS : Node name based on cable configuration found : NODE0
 INFO : Check  Nickname
 SUCCESS :  Nickname set correctly : Oracle Database Appliance - E0
 INFO : The details for Storage Topology Validation can also be found in 
log file=/opt/oracle/oak/log/<hostname>/storagetopology/StorageTopology-2014-07-03-08:57:31_7661_15914.log
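
Long validation reports such as the one above are easier to review when tallied by message level. The following Python sketch counts lines per level and collects any line whose level is not INFO, SUCCESS, or RESULT. It assumes every report line starts with an uppercase level followed by a colon, as in the samples in this chapter. The exact wording of failure lines is not documented here, so the sketch treats any other level as a problem. The names summarize and OK_LEVELS are hypothetical.

```python
import re
from collections import Counter

# "LEVEL : message" or "LEVEL: message", as in the sample reports.
LEVEL_LINE = re.compile(r"^\s*([A-Z]+)\s*:\s*(.*)$")
OK_LEVELS = {"INFO", "SUCCESS", "RESULT"}  # assumption: everything else needs attention


def summarize(output):
    """Return (counts per level, list of (level, message) that are not OK)."""
    counts = Counter()
    problems = []
    for line in output.splitlines():
        match = LEVEL_LINE.match(line)
        if not match:
            continue  # e.g. the trailing "log file=..." line
        level, message = match.groups()
        counts[level] += 1
        if level not in OK_LEVELS:
            problems.append((level, message))
    return counts, problems


if __name__ == "__main__":
    report = """\
 INFO : Check hardware type
 SUCCESS : Type of hardware found : X4-2
 SUCCESS : External LSI SAS controller 0 : 00:15.0
 SUCCESS : Overall Cable Validation on Node0
"""
    counts, problems = summarize(report)
    print(dict(counts), problems)
```

An empty problem list means only that no line carried an unexpected level; the log file named at the end of the report remains the authoritative record.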

Validating Oracle ASR

Enter the following command to validate your Oracle ASR configuration:

# oakcli validate -c asr
INFO: oak Asr information and Validations
RESULT: /opt/oracle/oak/conf/asr.conf exist
RESULT: ASR Manager ip:10.139.154.17
RESULT: ASR Manager port:1162
SUCCESS: ASR configuration file validation successfully completed
RESULT: /etc/hosts has entry 141.146.156.46 transport.oracle.com
RESULT: ilom alertmgmt level is set to minor
RESULT: ilom alertmgmt type is set to snmptrap
RESULT: alertmgmt snmp_version is set to 2c
RESULT: alertmgmt community_or_username is set to public
RESULT: alertmgmt destination is set to 10.139.154.17
RESULT: alertmgmt destination_port is set to 1162
SUCCESS: Ilom snmp confguration for asr set correctly
RESULT: notification trap configured to ip:10.139.154.17
RESULT: notification trap configured to port:1162
SUCCESS: Asr notification trap set correctly
INFO: IP_ADDRESS HOST_NAME SERIAL_NUMBER ASR PROTOCOL SOURCE PRODUCT_NAME
INFO: --------- ---------- ------------- --- -------- ------ ------------
10.170.79.98 oda-02-c 1130FMW00D Enabled SNMP ILOM SUN FIRE X4370 M2 SERVER
10.170.79.97 oda-01-c 1130FMW00D Enabled SNMP ILOM SUN FIRE X4370 M2 SERVER
INFO: Please use My Oracle Support 'http://support.oracle.com' to view the activation status.
SUCCESS: asr log level is already set to Fine.
RESULT: Registered with ASR backend.
RESULT: test connection successfully completed.
RESULT: submitted test event for asset:10.139.154.17
RESULT: bundle com.sun.svc.asr.sw is in active state
RESULT: bundle com.sun.svc.asr.sw-frag is in resolved state
RESULT: bundle com.sun.svc.asr.sw-rulesdefinitions is in resolved state
RESULT: bundle com.sun.svc.ServiceActivation is in active state
SUCCESS: ASR diag successfully completed

Checking the Viability of a Patch

Use the oakcli validate -c ospatch -ver command to report any reasons for not being able to patch Oracle Database Appliance with the patch named in patch_version. Run this command before you attempt to patch Oracle Database Appliance to determine whether the patch can be applied or whether you must make changes first.

# oakcli validate -c ospatch -ver 12.1.2.5.0
INFO: Validating the OS patch for the version 12.1.2.5.0
WARNING: 2015-10-10 06:30:32: Patching sub directory /opt/oracle/oak/pkgrepos/orapkgs/OEL/5.10/Patches/5.10.1 is not existing
INFO: 2015-10-10 06:30:32: May need to unpack the Infra patch bundle for the version: 12.1.2.5.0
ERROR: 2015-10-10 06:30:32: No OS patch directory found in the repository
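
The timestamped lines in the output above can be turned into structured records, for example to gate an automated patching step. The following Python sketch assumes the "LEVEL: YYYY-MM-DD HH:MM:SS: message" layout shown in the sample and treats any ERROR line as blocking. That rule is an assumption of this sketch, not documented behavior, and the names patch_findings and patch_is_viable are hypothetical.

```python
import re
from datetime import datetime

# Matches lines such as "ERROR: 2015-10-10 06:30:32: No OS patch directory ..."
STAMPED = re.compile(
    r"^(INFO|WARNING|ERROR):\s+(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}):\s+(.*)$"
)


def patch_findings(output):
    """Return a list of (level, timestamp, message) for timestamped lines."""
    findings = []
    for line in output.splitlines():
        match = STAMPED.match(line.strip())
        if match:
            level, stamp, message = match.groups()
            when = datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S")
            findings.append((level, when, message))
    return findings


def patch_is_viable(output):
    """Assumption: the patch is blocked if any ERROR line is reported."""
    return not any(level == "ERROR" for level, _, _ in patch_findings(output))


if __name__ == "__main__":
    sample = """\
INFO: Validating the OS patch for the version 12.1.2.5.0
WARNING: 2015-10-10 06:30:32: Patching sub directory /opt/oracle/oak/pkgrepos/orapkgs/OEL/5.10/Patches/5.10.1 is not existing
INFO: 2015-10-10 06:30:32: May need to unpack the Infra patch bundle for the version: 12.1.2.5.0
ERROR: 2015-10-10 06:30:32: No OS patch directory found in the repository
"""
    for level, when, message in patch_findings(sample):
        print(level, when.isoformat(), message)
    print("viable:", patch_is_viable(sample))
```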

Validating Hardware System and Network Components

The following command runs system checks to validate hardware system components and Oracle Database Appliance network components:

# oakcli validate -c SystemComponents,NetworkComponents

Oracle Database Appliance Configuration Error Messages

If you encounter errors while configuring Oracle Database Appliance, then review the following messages and actions:

Error Encountered in Step 11 Validation VIP appears to be up on the network

Cause: This message is most likely to occur when you attempt to redeploy the End-User Bundle without cleaning up a previous deployment. This error occurs because an existing VIP is configured for the addresses assigned to Oracle Database Appliance.

Action: Run cleanupDeploy.pl on Node 0, and then restart Oracle Appliance Manager.

Error "CRS-4402: The CSS daemon was started in exclusive mode but found an active CSS daemon on node oda2-1, number 1, and is terminating"

Cause: This error occurs when the Oracle Grid Infrastructure CSS daemon attempts to start the node as a standalone cluster node, but during startup discovers that the other cluster node is running, and changes to cluster mode to join the cluster.

Action: Ignore this error.

Installation requires partitioning of your hard drive

Cause: This message occurs on a node if one of the two operating system disks is not installed, but you are attempting to reimage the operating system.

Action: Ensure that both operating system disks are installed and are available.

Machine Check Exception ...This is not a software problem

Cause: There is a hardware system error.

Action: Log in to the Oracle ILOM Remote Console to determine the specific hardware error.

No volume control GStreamer plug-ins and/or devices found

Cause: Operating system plug-ins required for sound cards for the Oracle ILOM remote redirection console are not installed.

Action: Ignore this message. You do not require volume control for the console.

Reboot and select proper boot device or insert boot media in selected boot device and press a key

Cause: One or both operating system disks are not available. This message occurs if you select "Default hard disk" while reimaging the system, but that disk is not available.

Action: Ensure that both operating system disks are installed and are available.

The AoDB Linux installation tree in that directory does not seem to match your boot media

Cause: This message occurs on a node when both operating system disks are installed and you choose to reimage them with "Default (use BIOS settings)" as your imaging option, but one or both of the disks are not available.

Action: Ensure that both operating system disks are available for use.

ERROR: Gateway IP is not pingable

Cause: On Windows platforms, the Oracle Appliance Manager configurator uses the echo service on port 7 to contact the gateway. If the echo service is disabled, possibly for security reasons, the ping fails.

Action: Run the native platform ping command. If the ping is successful, then the configurator validation output can be ignored.
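
As an illustration of the action above, the following Python sketch builds the native ping command for the current platform and runs it once against the gateway. It assumes the standard ping options: -n for the echo count on Windows and -c on Linux and other UNIX-like systems. The function names ping_command and gateway_reachable are hypothetical, and the exit status is only an approximation of reachability, so review the ping output yourself if the result is unexpected.

```python
import platform
import subprocess


def ping_command(host, system=None):
    """Return the native ping argv that sends a single echo request."""
    system = system or platform.system()
    count_flag = "-n" if system == "Windows" else "-c"
    return ["ping", count_flag, "1", host]


def gateway_reachable(host):
    """Run the native ping once; True if it exits with status 0."""
    result = subprocess.run(
        ping_command(host),
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


if __name__ == "__main__":
    # 192.0.2.1 is a documentation-only address; substitute your gateway IP.
    print(" ".join(ping_command("192.0.2.1")))
```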

ACFS resources failed to start after applying 2.2 INFRA patch

Cause: Oracle Database Appliance operating system upgrade includes upgrade of Oracle Linux to Unbreakable Enterprise Kernel (UEK). Because Oracle Automatic Storage Management Cluster File System (Oracle ACFS) is not supported on all versions of Oracle Linux, a successful upgrade of the operating system may effectively disable Oracle ACFS.

Upgrade to Oracle Database Appliance 2.2 has three options: --infra, --gi, and --database. The --infra option includes upgrade from Oracle Linux to UEK. Before the --infra upgrade to 2.2, the operating system is Oracle Linux with 11.2.0.2.x Grid Infrastructure. After the --infra upgrade, the operating system is UEK and 11.2.0.2.x Oracle ACFS, which is not compatible with UEK.

For example, upgrade to Oracle Linux 2.6.32-300.11.1.el5uek causes reco.acfsvol.acfs and ora.registry.acfs to temporarily go to an OFFLINE state, because 2.6.32-300.11.1.el5uek does not support Oracle 11.2.0.2.x ACFS. However, when Oracle Grid Infrastructure is upgraded to 11.2.0.3.2, these components are online again.

Action: Upgrade to Oracle Database Appliance 2.2 with the --gi option. This version of the software includes Oracle Grid Infrastructure 11.2.0.3.2, which includes Oracle ACFS modules that work with UEK.

For more information, see My Oracle Support note 1369107.1:

https://support.oracle.com/CSP/main/article?cmd=show&type=NOT&id=1369107.1

Preparing Log Files for Oracle Support Services

If necessary, use the command oakcli manage diagcollect to collect diagnostic files to send to Oracle Support Services.

If you have a system fault that requires help from Oracle Support Services, then you may need to provide log records to help Oracle support diagnose your issue.

Collect log file information by running the command oakcli manage diagcollect. This command consolidates information from log files stored on Oracle Database Appliance into a single log file for use by Oracle Support Services. The location of the file is specified in the command output.

Additional Troubleshooting Tools and Commands

This section describes additional tools and commands for diagnosing and troubleshooting problems with Oracle Database Appliance.

Although some of these tools are specific to Oracle Database Appliance, others are tools for all clustered systems.

Oracle Appliance Manager Tools for Configuration Auditing and Disk Diagnosis

Oracle Appliance Manager provides access to a number of sophisticated monitoring and reporting tools, some of them derived from standalone tools that require their own syntax and command sets.

The following list briefly describes the ORAchk command, and the disk diagnostic tool:

  • ORAchk

    The ORAchk Configuration Audit Tool audits important configuration settings for Oracle RAC two-node deployments in the following categories:

    • Operating system kernel parameters and packages

    • RDBMS

    • Database parameters, and other database configuration settings

    • Oracle Grid Infrastructure, which includes Oracle Clusterware and Oracle Automatic Storage Management

    ORAchk is aware of the entire system. It checks the configuration to indicate if best practices are being followed. For example, ORAchk reviews the system and identifies best practice issues that are specific to Oracle Database Appliance when ORAchk is run by Oracle Appliance Manager. To explore ORAchk on Oracle Database Appliance, use the following command:

    oakcli orachk -h

    Also review My Oracle Support note 1268927.2, which is available from My Oracle Support.

  • Disk Diagnostic Tool

    Use the Disk Diagnostic Tool to help identify the cause of disk problems. The tool produces a list of 14 disk checks for each node. To run the tool, enter the following command:

    # oakcli stordiag resource_type
    

Trace File Analyzer Collector

Trace File Analyzer (TFA) Collector simplifies diagnostic data collection on Oracle Grid Infrastructure and Oracle Real Application Clusters systems.

TFA behaves in a similar manner to the diagcollection utility packaged with Oracle Clusterware. Both tools collect and package diagnostic data. However, TFA is much more powerful than diagcollection, because TFA centralizes and automates the collection of diagnostic information.

TFA provides the following key benefits and options:

  • Encapsulation of diagnostic data collection for all Oracle Grid Infrastructure and Oracle RAC components on all cluster nodes into a single command, which you run from a single node

  • Option to "trim" diagnostic files during data collection to reduce data upload size

  • Options to isolate diagnostic data collection to a given time period, and to a particular product component, such as Oracle ASM, RDBMS, or Oracle Clusterware

  • Centralization of collected diagnostic output to a single node in Oracle Database Appliance, if desired

  • On-Demand Scans of all log and trace files for conditions indicating a problem

  • Real-Time Scan Alert Logs for conditions indicating a problem (for example, Database Alert Logs, Oracle ASM Alert Logs, and Oracle Clusterware Alert Logs)

Refer to My Oracle Support note 1513912.1 "TFA Collector - Tool for Enhanced Diagnostic Gathering" for more information. https://support.oracle.com/CSP/main/article?cmd=show&type=NOT&id=1513912.1

Oracle Database Appliance Hardware Monitoring Tool

The Oracle Database Appliance Hardware Monitoring Tool displays the status of different hardware components in Oracle Database Appliance server nodes.

The tool is implemented with the Trace File Analyzer collector. Use the tool both on bare-metal and on virtualized systems.

You can see the list of monitored components by running the command oakcli show -h.

To see information about specific components, use the command syntax oakcli show component, where component is the hardware component that you want to query. For example, the command oakcli show power shows information specifically about the Oracle Database Appliance power supply:

oakcli show power

NAME            HEALTH  HEALTH DETAILS  PART_NO.  SERIAL_NO.          LOCATION  INPUT POWER  OUTPUT POWER  INLET TEMP       EXHAUST TEMP

Power Supply_0  OK      -               7047410   476856F+1242CE0020  PS0       Present      88 watts      31.250 degree C  34.188 degree C
Power Supply_1  OK      -               7047410   476856F+1242CE004J  PS1       Present      66 watts      31.250 degree C  34.188 degree C

Note:

Oracle Database Appliance Server Hardware Monitoring Tool is enabled during initial startup of ODA_BASE on Oracle Database Appliance Virtualized Platform. When it starts, the tool collects base statistics for about 5 minutes. During this time, the tool displays the message "Gathering Statistics…".

The Oracle Database Appliance Hardware Monitoring Tool reports information only for the node on which you run the command. The information it displays in the output depends on the component that you select to review.
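
Because the tool reports only on the local node, a monitoring script would need to run on each node and inspect the HEALTH column. The following Python sketch extracts that column from oakcli show power output. It assumes each power supply row begins with a name of the form "Power Supply_N" followed by the health value, as in the sample above, and that OK is the healthy value. Other components may use a different layout. The function names power_supply_health and unhealthy are hypothetical.

```python
import re

# A row starts with the component name, then the HEALTH column.
ROW = re.compile(r"^\s*(Power Supply_\d+)\s+(\S+)")


def power_supply_health(output):
    """Map each power supply name to its HEALTH value."""
    health = {}
    for line in output.splitlines():
        match = ROW.match(line)
        if match:
            health[match.group(1)] = match.group(2)
    return health


def unhealthy(output):
    """Return the names of power supplies whose HEALTH is not OK."""
    return sorted(
        name for name, value in power_supply_health(output).items()
        if value != "OK"
    )


if __name__ == "__main__":
    sample = """\
Power Supply_0  OK     -              7047410   476856F+1242CE0020 PS0
Present     88 watts     31.250 degree C 34.188 degree C
Power Supply_1  OK     -              7047410   476856F+1242CE004J PS1
Present     66 watts     31.250 degree C 34.188 degree C
"""
    print(power_supply_health(sample))
    print("needs attention:", unhealthy(sample))
```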