This chapter contains information about how to validate changes and troubleshoot Oracle Database Appliance problems.
Use the oakcli validate command to check your Oracle Database Appliance configuration and, if necessary, to provide information to Oracle Support Services.
The oakcli validate command is the Oracle Appliance Manager diagnostic and validation utility for identifying and resolving support issues. If you experience problems with Oracle Database Appliance, then use the oakcli validate command options to verify that your environment is properly configured and that best practices are in effect. When placing a service request, also use Oracle Appliance Manager as described in this chapter to prepare the log files to send to Oracle Support Services.
Use the oakcli validate command and options to validate the status of Oracle Database Appliance.
You must run the oakcli validate command as the root user.
The oakcli validate command uses the following syntax, where checklist is a single check or a comma-delimited list of checks, and output_file is the name that you designate for a validation output file:
oakcli validate -h
oakcli validate [-V | -l | -h]
oakcli validate [-v] [-f output_file] [-a | -d | -c checklist] [-ver patch_version]
-a: Run all system checks, including DiskCalibration.
-c checklist: Run the validation checks for the items identified in checklist.
-d: Run only the default checks. The default checks are those marked with an asterisk in the output of oakcli validate -l.
-f output_file: Send output to a file with a fully qualified file name, output_file.
-h: Display the online help.
-l: List the items that can be checked (and their descriptions).
-v: Show verbose output (must be used with a parameter that generates a validation report).
-V: Display the version of oakValidation.
-ver patch_version: Report any reasons for not being able to patch Oracle Database Appliance with the patch named in patch_version.
asr: Validate Oracle Auto Service Request (Oracle ASR) components based on the Oracle ASR configuration file and Oracle Integrated Lights Out Manager (Oracle ILOM) sensor data.
DiskCalibration: Preinstallation check for storage disk performance using orion. Do not run this check after you have deployed Oracle software on Oracle Database Appliance. On a deployed system, use the default check option (-d), which does not include DiskCalibration.
NetworkComponents: Validate public and private network hardware connections.
OSDiskStorage: Validate the operating system disks and file system information.
ospatch: Validate that the system can complete an upgrade successfully using the named patch.
SharedStorage: Validate shared storage and multipathing information.
StorageTopology: Validate the storage shelf connectivity.
SystemComponents: Validate system components, based on Oracle ILOM sensor data readings.
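The syntax above can be assembled programmatically when validation runs are scripted. The following is a minimal Python sketch, not part of Oracle Appliance Manager: the function name build_validate_command and its parameters are hypothetical, and it only uses the -v, -f, -a, -d, and -c options described in this section. It builds an argument list and does not run the command.

```python
import posixpath
from typing import List, Optional, Sequence


def build_validate_command(
    checks: Optional[Sequence[str]] = None,
    run_all: bool = False,
    output_file: Optional[str] = None,
    verbose: bool = False,
) -> List[str]:
    """Build an argument list for `oakcli validate` (hypothetical helper)."""
    if run_all and checks:
        # -a, -d, and -c checklist are alternatives in the documented syntax.
        raise ValueError("choose either run_all (-a) or checks (-c), not both")
    if output_file is not None and not posixpath.isabs(output_file):
        # -f expects a fully qualified file name.
        raise ValueError("output_file must be a fully qualified path")

    args = ["oakcli", "validate"]
    if verbose:
        args.append("-v")
    if output_file is not None:
        args += ["-f", output_file]
    if run_all:
        args.append("-a")
    elif checks:
        # checklist is a single check or a comma-delimited list of checks.
        args += ["-c", ",".join(checks)]
    else:
        # With no explicit selection, fall back to the default checks.
        args.append("-d")
    return args


if __name__ == "__main__":
    print(" ".join(build_validate_command(
        checks=["SystemComponents", "NetworkComponents"])))
```

Returning a list rather than a single string keeps the arguments separate, so the result can be passed directly to a process launcher without shell quoting.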
Review these examples to see how you can perform validation checks using the oakcli validate command and options.
Listing All Checks and Their Descriptions
oakcli validate -l
 Checkname -- Description
 ========= ===========
*SystemComponents -- Validate system components based on ilom sensor data readings
*OSDiskStorage -- Validate OS disks and filesystem information
*SharedStorage -- Validate Shared storage and multipathing information
 DiskCalibration -- Check disk performance with orion
*NetworkComponents -- Validate public and private network components
*StorageTopology -- Validate external JBOD connectivity
 asr -- Validate asr components based on asr config file and ilom sensor data readings
* -- These checks are also performed as part of default checks
The NetworkComponents validation check is not available on hardware prior to Oracle Database Appliance X3-2.
Running All Checks
Enter the following command to run all checks:
oakcli validate -a
Validating Storage Cable Connections
Check the cable connections between the system controllers and the storage shelf, as well as the cable connection to the storage expansion shelf (if one is installed):
oakcli validate -c storagetopology
Oracle recommends that you run the oakcli validate -c StorageTopology command before deploying the system. Doing so helps prevent deployment problems caused by wrong or missing cable connections. The output shown in the following example reports a successful configuration. If the cabling is not correct, you will see errors in your output.
# oakcli validate -c storagetopology
It may take a while. Please wait...
INFO : ODA Topology Verification
INFO : Running on Node0
INFO : Check hardware type
SUCCESS : Type of hardware found : X4-2
INFO : Check for Environment(Bare Metal or Virtual Machine)
SUCCESS : Type of environment found : Virtual Machine(ODA BASE)
SUCCESS : Number of External LSI SAS controller found : 2
INFO : Check for Controllers correct PCIe slot address
SUCCESS : External LSI SAS controller 0 : 00:15.0
SUCCESS : External LSI SAS controller 1 : 00:16.0
INFO : Check if powered on
SUCCESS : 1 : Powered-on
INFO : Check for correct number of EBODS(2 or 4)
SUCCESS : EBOD found : 2
INFO : Check for External Controller 0
SUCCESS : Controller connected to correct ebod number
SUCCESS : Controller port connected to correct ebod port
SUCCESS : Overall Cable check for controller 0
INFO : Check for External Controller 1
SUCCESS : Controller connected to correct ebod number
SUCCESS : Controller port connected to correct ebod port
SUCCESS : Overall Cable check for controller 1
INFO : Check for overall status of cable validation on Node0
SUCCESS : Overall Cable Validation on Node0
INFO : Check Node Identification status
SUCCESS : Node Identification
SUCCESS : Node name based on cable configuration found : NODE0
INFO : Check Nickname
SUCCESS : Nickname set correctly : Oracle Database Appliance - E0
INFO : The details for Storage Topology Validation can also be found in log file=/opt/oracle/oak/log/<hostname>/storagetopology/StorageTopology-2014-07-03-08:57:31_7661_15914.log
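Because each result line in the validation output starts with a level tag followed by a colon, saved output can be scanned for problems automatically. The following Python sketch is illustrative only. The function names are hypothetical, and treating every level other than INFO, SUCCESS, and RESULT as a problem is an assumption based on the levels that appear in the examples in this chapter.

```python
import re
from typing import List, Tuple

# A level tag in capital letters, an optional space, a colon, then the message.
LINE_RE = re.compile(r"^\s*([A-Z]+)\s*:\s*(.*)$")

# Assumption: these levels indicate that nothing needs attention.
OK_LEVELS = {"INFO", "SUCCESS", "RESULT"}


def parse_validation_output(text: str) -> List[Tuple[str, str]]:
    """Return (level, message) pairs for every level-tagged line."""
    entries = []
    for line in text.splitlines():
        match = LINE_RE.match(line)
        if match:  # untagged lines, such as progress messages, are skipped
            entries.append((match.group(1), match.group(2).strip()))
    return entries


def find_problems(entries: List[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Keep only the entries whose level is not in OK_LEVELS."""
    return [(level, msg) for level, msg in entries if level not in OK_LEVELS]


if __name__ == "__main__":
    sample = (
        "It may take a while. Please wait...\n"
        "INFO : Check hardware type\n"
        "SUCCESS : Type of hardware found : X4-2\n"
    )
    entries = parse_validation_output(sample)
    print(len(entries), "tagged lines,", len(find_problems(entries)), "problems")
```

The same functions can be applied to output saved with the -f option, provided the file uses the same line layout as the console output.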
Validating Oracle ASR
Enter the following command to validate your Oracle ASR configuration:
# oakcli validate -c asr
INFO: oak Asr information and Validations
RESULT: /opt/oracle/oak/conf/asr.conf exist
RESULT: ASR Manager ip:10.139.154.17
RESULT: ASR Manager port:1162
SUCCESS: ASR configuration file validation successfully completed
RESULT: /etc/hosts has entry 22.214.171.124 transport.oracle.com
RESULT: ilom alertmgmt level is set to minor
RESULT: ilom alertmgmt type is set to snmptrap
RESULT: alertmgmt snmp_version is set to 2c
RESULT: alertmgmt community_or_username is set to public
RESULT: alertmgmt destination is set to 10.139.154.17
RESULT: alertmgmt destination_port is set to 1162
SUCCESS: Ilom snmp confguration for asr set correctly
RESULT: notification trap configured to ip:10.139.154.17
RESULT: notification trap configured to port:1162
SUCCESS: Asr notification trap set correctly
INFO: IP_ADDRESS HOST_NAME SERIAL_NUMBER ASR PROTOCOL SOURCE PRODUCT_NAME
INFO: --------- ---------- ------------- --- -------- ------ ------------
10.170.79.98 oda-02-c 1130FMW00D Enabled SNMP ILOM SUN FIRE X4370 M2 SERVER
10.170.79.97 oda-01-c 1130FMW00D Enabled SNMP ILOM SUN FIRE X4370 M2 SERVER
INFO: Please use My Oracle Support 'http://support.oracle.com' to view the activation status.
SUCCESS: asr log level is already set to Fine.
RESULT: Registered with ASR backend.
RESULT: test connection successfully completed.
RESULT: submitted test event for asset:10.139.154.17
RESULT: bundle com.sun.svc.asr.sw is in active state
RESULT: bundle com.sun.svc.asr.sw-frag is in resolved state
RESULT: bundle com.sun.svc.asr.sw-rulesdefinitions is in resolved state
RESULT: bundle com.sun.svc.ServiceActivation is in active state
SUCCESS: ASR diag successfully completed
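In the example output, the ASR Manager address and port reported from the configuration file match the Oracle ILOM alert destination and the notification trap settings. The following Python sketch shows one way to cross-check those values in saved output. It is an illustration under stated assumptions: the function names and dictionary keys are hypothetical, the patterns are taken only from the RESULT lines shown above, and the idea that all three locations should agree is inferred from the example rather than stated by the tool.

```python
import re
from typing import Dict, List

# Patterns copied from the RESULT lines in the example output.
PATTERNS = {
    "manager_ip": re.compile(r"ASR Manager ip:(\S+)"),
    "manager_port": re.compile(r"ASR Manager port:(\d+)"),
    "ilom_destination": re.compile(r"alertmgmt destination is set to (\S+)"),
    "ilom_port": re.compile(r"alertmgmt destination_port is set to (\d+)"),
    "trap_ip": re.compile(r"notification trap configured to ip:(\S+)"),
    "trap_port": re.compile(r"notification trap configured to port:(\d+)"),
}

# Assumption: each pair is expected to hold the same value.
PAIRS = [
    ("manager_ip", "ilom_destination"),
    ("manager_ip", "trap_ip"),
    ("manager_port", "ilom_port"),
    ("manager_port", "trap_port"),
]


def extract_asr_settings(text: str) -> Dict[str, str]:
    """Collect the address and port values found in the output."""
    settings = {}
    for key, pattern in PATTERNS.items():
        match = pattern.search(text)
        if match:
            settings[key] = match.group(1)
    return settings


def find_mismatches(settings: Dict[str, str]) -> List[str]:
    """Describe every pair of settings that is present but differs."""
    issues = []
    for expected, actual in PAIRS:
        if expected in settings and actual in settings:
            if settings[expected] != settings[actual]:
                issues.append(
                    f"{actual}={settings[actual]} differs from "
                    f"{expected}={settings[expected]}"
                )
    return issues
```

A setting that is missing from the output is skipped rather than reported, so an empty result means only that no conflicting values were found.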
Checking the Viability of a Patch
Use the oakcli validate -c ospatch -ver patch_version command to report any reasons for not being able to patch Oracle Database Appliance with the patch named in patch_version. Run this command before you attempt to patch Oracle Database Appliance to determine whether the patch can be applied or whether you must make changes first.
# oakcli validate -c ospatch -ver 126.96.36.199.0
INFO: Validating the OS patch for the version 188.8.131.52.0
WARNING: 2015-10-10 06:30:32: Patching sub directory /opt/oracle/oak/pkgrepos/orapkgs/OEL/5.10/Patches/5.10.1 is not existing
INFO: 2015-10-10 06:30:32: May need to unpack the Infra patch bundle for the version: 184.108.40.206.0
ERROR: 2015-10-10 06:30:32: No OS patch directory found in the repository
Validating Hardware System and Network Components
The following command runs system checks to validate hardware system components and Oracle Database Appliance network components:
# oakcli validate -c SystemComponents,NetworkComponents
The host name at the login prompt should be oak1 for Node 0 and oak2 for Node 1.
When a default manufacturing host name, such as mtnk4t1-d05-01-host, appears in the login prompt instead of oak1 or oak2, there is an issue that can cause problems when you configure Oracle Database Appliance.
The most likely cause of unexpected host names is that the storage cabling is incorrect or cables are not properly seated in the ports.
Verify the Storage Cabling for Oracle Database Appliance
Review the cabling instructions for your Oracle Database Appliance model and verify that the color-coded cables are in the correct ports. Also, verify that connections are secure. The SAS cables must be locked in place. Gently pull on each cable to confirm that it is properly seated. If you can pull the cable out, push it into the port until it clicks into place.
Verify the Storage Cabling for Models Earlier than X5-2
How you connect the storage tray to the appliance determines which server is Node 0 and which is Node 1. This matters because all subsequent software installation and configuration is done from Node 0. In most cases, the server on the bottom is Node 0.
Ensure that the Node Files Exist and are Accurate
If the cabling appears to be correct and you still have unexpected host names, then confirm that the /opt/oracle/oak/conf/node_num.conf file exists for each node. Ensure that the NODENUM parameter is properly defined for each node. Set the parameter to NODENUM=0 for Node 0 and NODENUM=1 for Node 1. Create or edit the files, as needed.
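The relationship between the NODENUM parameter and the expected host name can be checked with a short script. The following Python sketch is an illustration, not an Oracle utility: the function names are hypothetical, and it assumes that the file contains a line of the form NODENUM=0 or NODENUM=1 as described above. It works on the file contents passed in as text, so it does not read or modify anything on the appliance.

```python
import re
from typing import Optional

# Host names expected at the login prompt, as described in this section.
EXPECTED_HOST = {0: "oak1", 1: "oak2"}


def parse_nodenum(conf_text: str) -> Optional[int]:
    """Return the NODENUM value from node_num.conf contents, if present."""
    match = re.search(r"^\s*NODENUM\s*=\s*(\d+)\s*$", conf_text, re.MULTILINE)
    return int(match.group(1)) if match else None


def check_node(conf_text: str, hostname: str) -> str:
    """Compare the configured node number with the observed host name."""
    nodenum = parse_nodenum(conf_text)
    if nodenum not in EXPECTED_HOST:
        return f"NODENUM missing or invalid: {nodenum!r}"
    expected = EXPECTED_HOST[nodenum]
    if hostname != expected:
        return f"Node {nodenum} should use host name {expected}, found {hostname}"
    return "ok"


if __name__ == "__main__":
    print(check_node("NODENUM=0\n", "oak1"))
```

On a node, the text would come from /opt/oracle/oak/conf/node_num.conf and the host name from the login prompt.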
If you encounter errors while configuring Oracle Database Appliance, then review the following messages and actions:
Cause: This message is most likely to occur when you attempt to redeploy the End-User Bundle without cleaning up a previous deployment. This error occurs because an existing VIP is configured for the addresses assigned to Oracle Database Appliance.
Action: Run cleanupDeploy.pl on Node 0, and then restart Oracle Appliance Manager.
Cause: This error occurs when the Oracle Grid Infrastructure CSS daemon attempts to start the node as a standalone cluster node, but during startup discovers that the other cluster node is running, and changes to cluster mode to join the cluster.
Action: Ignore this error.
Cause: This message occurs on a node if one of the two operating system disks is not installed, but you are attempting to reimage the operating system.
Action: Ensure that both operating system disks are installed and are available.
Cause: There is a hardware system error.
Action: Log in to the Oracle ILOM Remote Console to determine the specific hardware error.
Cause: Operating system plug-ins required for sound cards for the Oracle ILOM remote redirection console are not installed.
Action: Ignore this message. You do not require volume control for the console.
Cause: One or both operating system disks are not available. This message occurs if you select "Default hard disk" during reimaging the system, but that disk is not available.
Action: Ensure that both operating system disks are installed and are available.
Cause: If you select "Default (use BIOS settings)" as your imaging option, but one or both of the disks is not available, this message occurs on a node if both operating disks are installed, and you choose to reimage the operating system disks.
Action: Ensure that both operating system disks are available for use.
Cause: On Windows platforms, the Oracle Appliance Manager configurator uses the echo service on port 7 to contact the gateway. If the echo service is disabled, possibly for security reasons, the ping fails.
Action: Run the native platform ping command. If the ping is successful, then the configurator validation output can be ignored.
Cause: Oracle Database Appliance operating system upgrade includes upgrade of Oracle Linux to Unbreakable Enterprise Kernel (UEK). Because Oracle Automatic Storage Management Cluster File System (Oracle ACFS) is not supported on all versions of Oracle Linux, a successful upgrade of the operating system may effectively disable Oracle ACFS.
Upgrade to Oracle Database Appliance 2.2 has three options. The --infra option includes upgrade from Oracle Linux to UEK. Before the --infra upgrade to 2.2, the operating system is Oracle Linux with 220.127.116.11.x Grid Infrastructure. After the --infra upgrade, the operating system is UEK and 18.104.22.168.x Oracle ACFS, which is not compatible with UEK.
For example, upgrade to Oracle Linux 2.6.32-300.11.1.el5uek causes ora.registry.acfs to temporarily go to an OFFLINE state, because 2.6.32-300.11.1.el5uek does not support Oracle 22.214.171.124.x ACFS. However, when Oracle Grid Infrastructure is upgraded to 126.96.36.199.2, these components are online again.
Action: Upgrade to Oracle Database Appliance 2.2 with the --gi option. This version of the software includes Oracle Grid Infrastructure 188.8.131.52.2, which includes Oracle ACFS modules that work with UEK.
For more information, see My Oracle Support note 1369107.1.
If necessary, use the command oakcli manage diagcollect to collect diagnostic files to send to Oracle Support Services.
If you have a system fault that requires help from Oracle Support Services, then you may need to provide log records to help Oracle support diagnose your issue.
Collect log file information by running the command oakcli manage diagcollect. This command consolidates information from log files stored on Oracle Database Appliance into a single log file for use by Oracle Support Services. The location of the file is specified in the command output.
This section describes additional tools and commands for diagnosing and troubleshooting problems with Oracle Database Appliance.
Although some of these tools are specific to Oracle Database Appliance, others are tools for all clustered systems.
Oracle Appliance Manager provides access to a number of sophisticated monitoring and reporting tools, some of them derived from standalone tools that require their own syntax and command sets.
The following list briefly describes the ORAchk command and the Disk Diagnostic Tool:
The ORAchk Configuration Audit Tool audits important configuration settings for Oracle RAC two-node deployments in the following categories:
Operating system kernel parameters and packages
Database parameters, and other database configuration settings
Oracle Grid Infrastructure, which includes Oracle Clusterware and Oracle Automatic Storage Management
ORAchk is aware of the entire system. It checks the configuration to indicate if best practices are being followed. For example, ORAchk reviews the system and identifies best practice issues that are specific to Oracle Database Appliance when ORAchk is run by Oracle Appliance Manager. To explore ORAchk on Oracle Database Appliance, use the following command:
oakcli orachk -h
Also review My Oracle Support note 1268927.2, which is available from My Oracle Support.
Disk Diagnostic Tool
Use the Disk Diagnostic Tool to help identify the cause of disk problems. The tool produces a list of 14 disk checks for each node. To run the tool, enter the following command:
# oakcli stordiag resource_type
Trace File Analyzer (TFA) Collector simplifies diagnostic data collection on Oracle Grid Infrastructure and Oracle Real Application Clusters systems.
TFA behaves in a similar manner to the diagcollection utility packaged with Oracle Clusterware. Both tools collect and package diagnostic data. However, TFA is much more powerful than diagcollection, because TFA centralizes and automates the collection of diagnostic information.
TFA provides the following key benefits and options:
Encapsulation of diagnostic data collection for all Oracle Grid Infrastructure and Oracle RAC components on all cluster nodes into a single command, which you run from a single node
Option to "trim" diagnostic files during data collection to reduce data upload size
Options to isolate diagnostic data collection to a given time period, and to a particular product component, such as Oracle ASM, RDBMS, or Oracle Clusterware
Centralization of collected diagnostic output to a single node in Oracle Database Appliance, if desired
On-Demand Scans of all log and trace files for conditions indicating a problem
Real-Time Scan Alert Logs for conditions indicating a problem (for example, Database Alert Logs, Oracle ASM Alert Logs, and Oracle Clusterware Alert Logs)
Refer to My Oracle Support note 1513912.1 "TFA Collector - Tool for Enhanced Diagnostic Gathering" for more information. https://support.oracle.com/CSP/main/article?cmd=show&type=NOT&id=1513912.1
The Oracle Database Appliance Hardware Monitoring Tool displays the status of different hardware components in Oracle Database Appliance server nodes.
The tool is implemented with the Trace File Analyzer collector. Use the tool both on bare-metal and on virtualized systems.
You can see the list of monitored components by running the command oakcli show -h.
To see information about specific components, use the command syntax oakcli show component, where component is the hardware component that you want to query. For example, the command oakcli show power shows information specifically about the Oracle Database Appliance power supply:
oakcli show power
NAME            HEALTH  HEALTH DETAILS  PART_NO.  SERIAL_NO.          LOCATION  INPUT POWER  OUTPUT POWER  INLET TEMP       EXHAUST TEMP
Power Supply_0  OK      -               7047410   476856F+1242CE0020  PS0       Present      88 watts      31.250 degree C  34.188 degree C
Power Supply_1  OK      -               7047410   476856F+1242CE004J  PS1       Present      66 watts      31.250 degree C  34.188 degree C
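Output such as the power supply report above can be turned into structured records for monitoring scripts. The following Python sketch is illustrative only: the function names and field names are hypothetical, the pattern is derived solely from the two sample rows shown here, and treating any health value other than OK as a fault is an assumption. Other components or software versions may use a different layout.

```python
import re
from typing import Dict, List

# One power supply row: the name contains a space, and the power and
# temperature columns carry their units, so a whitespace split is not enough.
ROW_RE = re.compile(
    r"(?P<name>Power Supply_\d+)\s+(?P<health>\S+)\s+(?P<details>\S+)\s+"
    r"(?P<part_no>\S+)\s+(?P<serial_no>\S+)\s+(?P<location>\S+)\s+"
    r"(?P<input_power>\S+)\s+(?P<output_watts>\d+) watts\s+"
    r"(?P<inlet_c>[\d.]+) degree C\s+(?P<exhaust_c>[\d.]+) degree C"
)


def parse_power(text: str) -> List[Dict[str, object]]:
    """Return one record per power supply row found in the text."""
    rows = []
    for match in ROW_RE.finditer(text):
        row: Dict[str, object] = dict(match.groupdict())
        row["output_watts"] = int(match.group("output_watts"))
        row["inlet_c"] = float(match.group("inlet_c"))
        row["exhaust_c"] = float(match.group("exhaust_c"))
        rows.append(row)
    return rows


def unhealthy(rows: List[Dict[str, object]]) -> List[str]:
    """Names of power supplies whose health is not OK (assumed healthy value)."""
    return [str(row["name"]) for row in rows if row["health"] != "OK"]
```

Because the pattern matches on runs of whitespace, it accepts the rows whether the columns are aligned or separated by single spaces.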
Oracle Database Appliance Server Hardware Monitoring Tool is enabled during initial startup of ODA_BASE on Oracle Database Appliance Virtualized Platform. When it starts, the tool collects base statistics for about 5 minutes. During this time, the tool displays the message "Gathering Statistics…".
The Oracle Database Appliance Hardware Monitoring Tool reports information only for the node on which you run the command. The information that it displays depends on the component that you select to review.