9 Validating and Troubleshooting Oracle Database Appliance

This chapter contains information about how to validate changes and troubleshoot Oracle Database Appliance problems.

Validate the Host Name

The host name at the login prompt should be oak1 for Node 0 and oak2 for Node 1.

When a default manufacturing host name, such as mtnk4t1-d05-01-host, appears in the login prompt instead of oak1 or oak2, there is an issue that can cause problems when you configure Oracle Database Appliance.

The most likely cause of unexpected host names is that the storage cabling is incorrect or cables are not properly seated in the ports.
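The host name check described above can be sketched in Python. This is a minimal illustration, not an Oracle-supplied tool: the function name node_for_host_name is hypothetical, and stripping a domain suffix and ignoring case are assumptions made for convenience.

```python
import socket

# Host names the appliance is expected to show, mapped to node numbers.
EXPECTED_HOST_NAMES = {"oak1": 0, "oak2": 1}


def node_for_host_name(host_name):
    """Return the node number for an expected host name, or None otherwise."""
    # Assumption: ignore any domain suffix and letter case when comparing.
    short_name = host_name.split(".")[0].lower()
    return EXPECTED_HOST_NAMES.get(short_name)


if __name__ == "__main__":
    name = socket.gethostname()
    node = node_for_host_name(name)
    if node is None:
        print(f"{name}: unexpected host name; check the storage cabling")
    else:
        print(f"{name}: expected host name for Node {node}")
```

A manufacturing host name such as mtnk4t1-d05-01-host yields None, which signals that the cabling checks in the following sections are the next step.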

Verify the Storage Cabling for Oracle Database Appliance

Review the cabling instructions for your Oracle Database Appliance model and verify that the color-coded cables are in the correct ports. Also, verify that connections are secure. The SAS cables must be locked in place. Gently pull on each cable to confirm that it is properly seated. If you can pull the cable out, push it into the port until it clicks into place.

Verify the Storage Cabling for Models Earlier than X5-2

How you connect the storage tray to the appliance determines which server is Node 0 and which is Node 1. This matters because all subsequent software installation and configuration is done from Node 0. In most cases, the server on the bottom is Node 0.

Ensure that the Node Files Exist and are Accurate

If the cabling appears to be correct and you still have unexpected host names, then confirm that the /opt/oracle/oak/conf/node_num.conf file exists for each node. Ensure that the NODENUM parameter is properly defined for each node. Set the parameter to NODENUM=0 for Node 0 and NODENUM=1 for Node 1. Create or edit the files, as needed.
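As an illustration of the node file check, the following Python sketch reads the file and compares NODENUM with the value expected for the node. The helper names read_node_num and check_node_file are hypothetical, and tolerating spaces around the equals sign is an assumption; the documented form is NODENUM=0 or NODENUM=1.

```python
from pathlib import Path

# Path of the node file named in the text above.
NODE_NUM_CONF = Path("/opt/oracle/oak/conf/node_num.conf")


def read_node_num(text):
    """Return the NODENUM value (0 or 1) found in the file text, or None."""
    for line in text.splitlines():
        key, sep, value = line.strip().partition("=")
        # Assumption: whitespace around "=" is tolerated.
        if sep and key.strip() == "NODENUM" and value.strip() in ("0", "1"):
            return int(value.strip())
    return None


def check_node_file(expected, path=NODE_NUM_CONF):
    """Describe whether the file at path defines the expected NODENUM."""
    if not path.exists():
        return f"{path} is missing"
    found = read_node_num(path.read_text())
    if found == expected:
        return f"{path} is correct (NODENUM={found})"
    return f"{path} has NODENUM={found}, expected NODENUM={expected}"
```

For example, calling check_node_file(0) on Node 0 and check_node_file(1) on Node 1 reports whether each file needs to be created or edited.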

Oracle Database Appliance Configuration Error Messages

If you encounter errors while configuring Oracle Database Appliance, then review the following messages and actions:

Error Encountered in Step 11 Validation VIP appears to be up on the network

Cause: This message is most likely to occur when you attempt to redeploy the End-User Bundle without cleaning up a previous deployment. This error occurs because an existing VIP is configured for the addresses assigned to Oracle Database Appliance.

Action: Run cleanupDeploy.pl on Node 0, and then restart Oracle Appliance Manager.

Error "CRS-4402: The CSS daemon was started in exclusive mode but found an active CSS daemon on node oda2-1, number 1, and is terminating"

Cause: This error occurs when the Oracle Grid Infrastructure CSS daemon attempts to start the node as a standalone cluster node, but during startup discovers that the other cluster node is running, and changes to cluster mode to join the cluster.

Action: Ignore this error.

Installation requires partitioning of your hard drive

Cause: This message occurs on a node if one of the two operating system disks is not installed, but you are attempting to reimage the operating system.

Action: Ensure that both operating system disks are installed and are available.

Machine Check Exception ...This is not a software problem

Cause: There is a hardware system error.

Action: Log in to the Oracle ILOM Remote Console to determine the specific hardware error.

No volume control GStreamer plug-ins and/or devices found

Cause: Operating system plug-ins required for sound cards for the Oracle ILOM remote redirection console are not installed.

Action: Ignore this message. You do not require volume control for the console.

Reboot and select proper boot device or insert boot media in selected boot device and press a key

Cause: One or both operating system disks are not available. This message occurs if you select "Default hard disk" while reimaging the system, but that disk is not available.

Action: Ensure that both operating system disks are installed and are available.

The AoDB Linux installation tree in that directory does not seem to match your boot media

Cause: This message occurs on a node when both operating system disks are installed, you choose to reimage them with the "Default (use BIOS settings)" imaging option, and one or both of the disks are not available.

Action: Ensure that both operating system disks are available for use.

ERROR: Gateway IP is not pingable

Cause: On Windows platforms, the Oracle Appliance Manager configurator uses the echo service on port 7 to contact the gateway. If the echo service is disabled, possibly for security reasons, the ping fails.

Action: Run the native platform ping command. If the ping is successful, then the configurator validation output can be ignored.
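The action above amounts to running the platform's own ping command and checking its exit status. A minimal Python sketch follows; the function names are hypothetical, and the address in the usage note is a placeholder. The sketch relies on the standard count options of the native commands: -n on Windows and -c on Linux and other UNIX-like systems.

```python
import platform
import subprocess


def build_ping_command(gateway_ip, system=None):
    """Build a one-packet ping command for the given operating system name."""
    system = system or platform.system()
    # Windows ping counts packets with -n; UNIX-like systems use -c.
    count_flag = "-n" if system == "Windows" else "-c"
    return ["ping", count_flag, "1", gateway_ip]


def gateway_is_pingable(gateway_ip):
    """Return True if the native ping command exits with status 0."""
    result = subprocess.run(
        build_ping_command(gateway_ip),
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0
```

For example, gateway_is_pingable("192.0.2.1") runs a single ping against that placeholder address. On some Windows versions the exit status alone can be misleading, so reading the command output directly is the safer confirmation.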

ACFS resources failed to start after applying 2.2 INFRA patch

Cause: The Oracle Database Appliance operating system upgrade includes an upgrade of Oracle Linux to Unbreakable Enterprise Kernel (UEK). Because Oracle Automatic Storage Management Cluster File System (Oracle ACFS) is not supported on all versions of Oracle Linux, a successful upgrade of the operating system may effectively disable Oracle ACFS.

Upgrade to Oracle Database Appliance 2.2 has three options: --infra, --gi, and --database. The --infra option includes the upgrade from Oracle Linux to UEK. Before the --infra upgrade to 2.2, the operating system is Oracle Linux with 11.2.0.2.x Grid Infrastructure. After the --infra upgrade, the operating system is UEK with 11.2.0.2.x Oracle ACFS, which is not compatible with UEK.

For example, upgrade to Oracle Linux 2.6.32-300.11.1.el5uek causes reco.acfsvol.acfs and ora.registry.acfs to temporarily go to an OFFLINE state, because 2.6.32-300.11.1.el5uek does not support Oracle 11.2.0.2.x ACFS. However, when Oracle Grid Infrastructure is upgraded to 11.2.0.3.2, these components are online again.

Action: Upgrade to Oracle Database Appliance 2.2 with the --gi option. This version of the software includes Oracle Grid Infrastructure 11.2.0.3.2, which includes Oracle ACFS modules that work with UEK.

For more information, see My Oracle Support note 1369107.1:

https://support.oracle.com/CSP/main/article?cmd=show&type=NOT&id=1369107.1

Preparing Log Files for Oracle Support Services

If necessary, use the command odaadmcli manage diagcollect to collect diagnostic files to send to Oracle Support Services.

If you have a system fault that requires help from Oracle Support Services, then you may need to provide log records to help Oracle support diagnose your issue.

Collect log file information by running the command odaadmcli manage diagcollect. This command consolidates information from log files stored on Oracle Database Appliance into a single log file for use by Oracle Support Services. The location of the file is specified in the command output.

Additional Troubleshooting Tools and Commands

This section describes additional tools and commands for diagnosing and troubleshooting problems with Oracle Database Appliance.

Although some of these tools are specific to Oracle Database Appliance, others are tools for all clustered systems.

ORAchk Health Check Tool

Use the ORAchk Health Check Tool to audit configuration settings and check system health.

The ORAchk utility performs proactive health checks for the Oracle software stack and scans for known problems.

The ORAchk Configuration Audit Tool audits important configuration settings for Oracle RAC two-node deployments in the following categories:

  • Operating system kernel parameters and packages

  • RDBMS

  • Database parameters, and other database configuration settings

  • Oracle Grid Infrastructure, which includes Oracle Clusterware and Oracle Automatic Storage Management

ORAchk is aware of the entire system. It checks the configuration to indicate if best practices are being followed.

See Also:

For more information about ORAchk, see My Oracle Support note 1268927.2, "ORAchk Health Checks for the Oracle Stack" at https://support.oracle.com/CSP/main/article?cmd=show&type=NOT&id=1268927.2

To run ORAchk, perform the following steps:

  1. Open the command-line interface as root.
  2. Navigate to the ORAchk tool in the /suptools directory.
    /u01/app/12.2.0.1/grid/suptools/orachk
    
  3. Run the utility.
    ./orachk
    
    When all checks are finished, a detailed report is available. The output displays the location of the report in an HTML format and the location of a zip file if you want to upload the report.
  4. Review the Oracle Database Appliance Assessment Report and system health and troubleshoot any issues that are identified.
    The report includes a summary and filters that enable you to focus on specific areas. For example, you can choose the filter to show failed checks only, show checks with a Fail, Warning, Info, or Pass status, or any combination.

Trace File Analyzer Collector

Trace File Analyzer (TFA) Collector simplifies diagnostic data collection on Oracle Grid Infrastructure and Oracle Real Application Clusters systems.

TFA behaves in a similar manner to the diagcollection utility packaged with Oracle Clusterware. Both tools collect and package diagnostic data. However, TFA is much more powerful than diagcollection, because TFA centralizes and automates the collection of diagnostic information.

TFA provides the following key benefits and options:

  • Encapsulation of diagnostic data collection for all Oracle Grid Infrastructure and Oracle RAC components on all cluster nodes into a single command, which you run from a single node

  • Option to "trim" diagnostic files during data collection to reduce data upload size

  • Options to isolate diagnostic data collection to a given time period, and to a particular product component, such as Oracle ASM, RDBMS, or Oracle Clusterware

  • Centralization of collected diagnostic output to a single node in Oracle Database Appliance, if desired

  • On-Demand Scans of all log and trace files for conditions indicating a problem

  • Real-Time Scan Alert Logs for conditions indicating a problem (for example, Database Alert Logs, Oracle ASM Alert Logs, and Oracle Clusterware Alert Logs)

See Also:

Refer to My Oracle Support note 1513912.1 "TFA Collector - Tool for Enhanced Diagnostic Gathering" for more information. https://support.oracle.com/CSP/main/article?cmd=show&type=NOT&id=1513912.1

Oracle Database Appliance Hardware Monitoring Tool

The Oracle Database Appliance Hardware Monitoring Tool displays the status of different hardware components in Oracle Database Appliance server nodes.

The tool is implemented with the Trace File Analyzer collector. Use the tool both on bare-metal and on virtualized systems.

To see the list of monitored components, run the command odaadmcli show -h.

To see information about specific components, use the command syntax odaadmcli show component, where component is the hardware component that you want to query. For example, the command odaadmcli show power shows information specifically about the Oracle Database Appliance power supply:

# odaadmcli show power
NAME            HEALTH  HEALTH_DETAILS   PART_NO.      SERIAL_NO.
Power_Supply_0  OK            -          7079395     476856Z+1514CE056G
(Continued)
LOCATION    INPUT_POWER   OUTPUT_POWER   INLET_TEMP         EXHAUST_TEMP
PS0         Present       112 watts      28.000 degree C    34.938 degree C

Note:

Oracle Database Appliance Server Hardware Monitoring Tool is enabled during initial startup of ODA_BASE on Oracle Database Appliance Virtualized Platform. When it starts, the tool collects base statistics for about 5 minutes. During this time, the tool displays the message "Gathering Statistics…".

The Oracle Database Appliance Hardware Monitoring Tool reports information only for the node on which you run the command. The information it displays in the output depends on the component that you select to review.
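If you want to process the tool's output in a script, the sample odaadmcli show power output shown earlier can be turned into name and value pairs. The Python sketch below is an illustration only. It assumes the layout printed in this chapter: header and value lines alternate, columns are separated by two or more spaces, a "(Continued)" line marks the wrap, and a single component is listed. The function names are hypothetical, and the output on your system may be laid out differently.

```python
import re

# Sample output copied from the odaadmcli show power example above.
SAMPLE_OUTPUT = """\
NAME            HEALTH  HEALTH_DETAILS   PART_NO.      SERIAL_NO.
Power_Supply_0  OK            -          7079395     476856Z+1514CE056G
(Continued)
LOCATION    INPUT_POWER   OUTPUT_POWER   INLET_TEMP         EXHAUST_TEMP
PS0         Present       112 watts      28.000 degree C    34.938 degree C
"""


def split_columns(line):
    """Split a line on runs of two or more spaces, keeping '112 watts' whole."""
    return re.split(r"\s{2,}", line.strip())


def parse_show_output(text):
    """Merge each header/value line pair into one dictionary."""
    # Drop blank lines and the "(Continued)" wrap marker.
    lines = [
        line for line in text.splitlines()
        if line.strip() and line.strip() != "(Continued)"
    ]
    record = {}
    # Even-indexed lines are headers; odd-indexed lines hold their values.
    for header, values in zip(lines[0::2], lines[1::2]):
        record.update(zip(split_columns(header), split_columns(values)))
    return record


if __name__ == "__main__":
    power = parse_show_output(SAMPLE_OUTPUT)
    print(power["NAME"], power["HEALTH"], power["OUTPUT_POWER"])
```

Splitting on two or more spaces, rather than on any whitespace, keeps multi-word values such as "28.000 degree C" in one field.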