5 Maintaining Oracle Exadata System Software
This section explains how to maintain Oracle Exadata System Software.
Caution:
All operations in this chapter must be performed with extreme caution and only after you have ensured you have complete backups of the data. If not, then you may experience irrecoverable data loss.
- Recommendations for Changing the Exadata Storage Server Network Address
Review the following advice before changing the fundamental configuration of a storage server, such as changing the IP address, host name, or RDMA Network Fabric address.
- Using the ipconf Utility
The ipconf utility is used to set and change the following parameters on Oracle Exadata servers.
- Oracle Exadata System Software Validation Tests and Utilities
You can use a variety of commands and utilities to validate the Oracle Exadata System Software and hardware configurations.
- Locating Serial Numbers for System Components
You may need to provide the serial numbers for the system components when contacting Oracle Support Services.
- Diagnostic and Repair Utilities
Oracle Exadata System Software includes utilities for diagnostics and repair of Oracle Exadata Storage Server.
- System Diagnostics Data Gathering with sosreports and Oracle ExaWatcher
You can use the sosreport utility and Oracle ExaWatcher to diagnose problems with your system.
- Host Console Support
The storage servers and database servers of Oracle Exadata are configured to provide host console access.
- Oracle Linux Kernel Crash Core Files
The storage servers and database servers of Oracle Exadata are configured to generate Oracle Linux kernel crash core files in the /var/crash directory when the Oracle Linux operating system malfunctions or crashes.
- Monitoring syslog Messages Remotely
By default, storage server syslog messages are written to local log files.
5.1 Recommendations for Changing the Exadata Storage Server Network Address
Review the following advice before changing the fundamental configuration of a storage server, such as changing the IP address, host name, or RDMA Network Fabric address.
-
Before changing the storage server configuration, ensure that all Oracle Automatic Storage Management (Oracle ASM), Oracle Real Application Clusters (Oracle RAC) and database instances that use the storage servers do not access the storage server while you are changing the IP address.
-
After changing the storage server configuration, ensure that consumers of storage server services are correctly reconfigured to use the new connect information of the storage server. If Oracle Auto Service Request (ASR) is being used, then deactivate the asset from Oracle ASR Manager, and activate the asset with the new IP address.
-
When changing a storage server configuration, change only one storage server at a time to ensure that Oracle ASM and Oracle RAC work properly during the changes.
5.2 Using the ipconf Utility
The ipconf utility is used to set and change the following parameters on Oracle Exadata servers.
During initial configuration of Oracle Exadata Database Machine, the utility also configures the database servers.
- IP address
- Host name
- NTP server
- Time zone
- DNS name servers
- RDMA Network Fabric addresses
The ipconf utility makes a backup copy of the files it modifies. When the utility is rerun, it overwrites the existing backup file. The log file maintains the complete history of every ipconf operation performed.
Table 5-1 ipconf Options
Option | Description |
---|---|
no option |
Utility starts in main editing mode. |
|
Determine if pkey is configured on current host by looking into |
|
Starts utility in main editing mode when there is a mismatch between the stored cell configuration and the running configuration. |
|
Prints basic ILOM settings. |
|
Sets basic ILOM settings. |
-pkey-add $PKEY.CONF -pkey-apply [-force] |
Apply the pkey configuration to an unconfigured system or add a new pkey configuration to the current pkey configured host (without changing the current pkey settings). If the interface specified in the pkey file is already, the command exits with an error. If you include the |
|
To delete some pkeys on the current pkey configured host, where |
|
Get the InfiniBand partitioning pkey configuration for the current system. |
|
Get the run-time pkey configuration, and match it with the file specified with the |
|
Verifies the |
|
Checks only the DNS and NTP configuration. |
|
Checks for access to at least one NTP and one DNS server. |
|
Update DNS and NTP servers for Linux and ILOM. For the DNS and NTP servers, specify a list of IP addresses separated by commas. A maximum of three DNS servers are allowed with If the timestamp obtained from the new NTP server differs from the current time known to the system by more than 1 second (time step), then the command errors out and does not update the NTP settings. The The |
|
The |
|
Verifies the consistency between the stored cell configuration and the running configuration. Success returns zero errors. You can use the following options with |
The following example shows the display for the ipconf utility when setting the Sun ILOM interface.
Example 5-1 Using the ipconf Utility to Set the Sun ILOM Interface
# ipconf
Logging started to /var/log/cellos/ipconf.log
Interface ib0 is Linked. hca: mlx4_0
Interface ib1 is Linked. hca: mlx4_0
Interface eth0 is Linked. driver/mac: igb/00:00:00:01:cd:01
Interface eth1 is ... Unlinked. driver/mac: igb/00:00:00:01:cd:02
Interface eth2 is ... Unlinked. driver/mac: igb/00:00:00:01:cd:03
Interface eth3 is ... Unlinked. driver/mac: igb/00:00:00:01:cd:04
Network interfaces
Name State IP address Netmask Gateway Hostname
ib0 Linked
ib1 Linked
eth0 Linked
eth1 Unlinked
eth2 Unlinked
eth3 Unlinked
Warning. Some network interface(s) are disconnected. Check cables and switches
and retry
Do you want to retry (y/n) [y]: n
The current nameserver(s): 192.0.2.10 192.0.2.12 192.0.2.13
Do you want to change it (y/n) [n]:
The current timezone: America/Los_Angeles
Do you want to change it (y/n) [n]:
The current NTP server(s): 192.0.2.06 192.0.2.12 192.0.2.13
Do you want to change it (y/n) [n]:
Network interfaces
Name State IP address Netmask Gateway Hostname
eth0 Linked 192.0.2.151 255.255.252.0 192.0.2.15 Management myg.example.com
eth1 Unlinked
eth2 Unlinked
eth3 Unlinked
bond0 ib0,ib1 192.168.13.101 255.255.252.0 Private myg.example.com
Select interface name to configure or press Enter to continue:
Select canonical hostname from the list below
1: myg.example.com
2: myg-private.example.com
Canonical fully qualified domain name [1]:
Select default gateway interface from the list below
1: eth0
Default gateway interface [1]:
Canonical hostname: myg.example.com
Nameservers: 192.0.2.10 192.0.2.12 192.0.2.13
Timezone: America/Los_Angeles
NTP servers: 192.0.2.06 192.0.2.12 192.0.2.13
Network interfaces
Name State IP address Netmask Gateway Hostname
eth0 Linked 192.0.2.151 255.255.252.0 192.0.2.15 myg.example.com
eth1 Unlinked
eth2 Unlinked
eth3 Unlinked
bond0 ib0,ib1 192.168.13.101 255.255.252.0 Private myg-priv.example.com
Is this correct (y/n) [y]:
Do you want to configure basic ILOM settings (y/n) [y]: y
Loading configuration settings from ILOM ...
ILOM Fully qualified hostname [myg_ilom.example.com]:
ILOM IP discovery (static/dhcp) [static]:
ILOM IP address [192.0.2.201]:
ILOM Netmask [255.255.252.0]:
ILOM Gateway or none [192.0.2.15]:
ILOM Nameserver or none: [192.0.2.10]:
ILOM Use NTP Servers (enabled/disabled) [enabled]:
ILOM First NTP server. Fully qualified hostname or ip address or none [192.0.2.06]:
ILOM Second NTP server. Fully qualified hostname or ip address or none [none]:
Basic ILOM configuration settings:
Hostname : myg.example.com
IP Discovery : static
IP Address : 192.0.2.10
Netmask : 255.255.252.0
Gateway : 192.0.2.15
DNS servers : 192.0.2.10
Use NTP servers : enabled
First NTP server : 192.0.2.06
Second NTP server : none
Timezone (read-only) : America/Los_Angeles
Is this correct (y/n) [y]:
5.3 Oracle Exadata System Software Validation Tests and Utilities
You can use a variety of commands and utilities to validate the Oracle Exadata System Software and hardware configurations.
- Summary of Software and Firmware Components on Oracle Exadata Storage Servers
The imageinfo command located in the /usr/local/bin/ directory provides a summary of release and status of the software and firmware components on Oracle Exadata Storage Servers.
- Oracle Exadata Storage Server Image History
The imagehistory command lists the version history for Oracle Exadata Storage Server.
- Validation of the State and Health of the System
The validation framework runs different tests under certain conditions, such as on first boot after recovery of an Oracle Exadata Storage Server using the rescue and recovery functionality of the CELLBOOT USB flash drive, or when patching an Oracle Exadata Storage Server.
5.3.1 Summary of Software and Firmware Components on Oracle Exadata Storage Servers
The imageinfo command located in the /usr/local/bin/ directory provides a summary of release and status of the software and firmware components on Oracle Exadata Storage Servers.
The software and firmware components comprise the storage server image. The release and status information is required when working with My Oracle Support.
The following table lists the output fields from the imageinfo command.
Table 5-2 Description of imageinfo Command Output
Field | Description |
---|---|
Active image activated |
Date stamp in UTC format when the image on the cell was considered completed, either successfully or unsuccessfully. A cell patch updates the time stamp to indicate the time the cell was patched. |
Active image status |
Status of the cell image based on the success or failure of a set of self-tests and configuration actions, collectively known as validations. When this status is undefined, empty or failure, then examine the different validation logs in the |
Active image version |
Main release version of the overall cell image indicating a specific combination of releases of operating system, core Oracle Exadata System Software (the cell rpm), and the firmware levels for most key components of the cell. A cell patch usually updates this information. The first five separated fields of the version match the standard way Oracle product releases are identified. The last field is the exact build number of the release. It corresponds to |
Active system partition on device |
Cell operating system root ( |
Boot area has rollback archive for version |
For a patched cell using non in-partition cell patch, this indicates whether there is a suitable back up archive that can be used to roll the cell back to the inactive image version. Existence of this archive is necessary but not sufficient for rolling back to inactive version of the cell image. |
Cell boot usb partition |
Oracle Exadata Storage Server boot and rescue USB partition. |
Cell boot usb version |
Version of the software on the boot USB. On a healthy cell this release must be identical to the value of the Active image version line. |
Cell rpm version |
Cell software version or cell rpm version as reported by the CellCLI utility. |
Cell version |
Release version as reported by the CellCLI utility. |
In partition rollback |
Some cell patches do not switch the partitions. These are in-partition patches. This field indicates whether there is enough information to roll back such patch. |
Inactive image activated |
Time stamp for activation of the inactive image. This field is similar to active image activated field. |
Inactive image status |
Status of the inactive image. This field is similar to the status of the active image. |
Inactive image version |
Version of the cell before the most-recent patch was applied. |
Inactive software partition on device |
Oracle Exadata System Software file system partition, |
Inactive system partition on device |
The root ( |
Kernel version |
Operating system kernel version of the cell. |
Rollback to inactive partition |
Summary indicator for a non-in-partition patched cell indicating whether rollback can be run on the cell to take it back to inactive version of the cell image. On a new cell, this field is empty or has the value |
The following is an example of the output from the imageinfo command:
Kernel version: 2.6.18-194.3.1.0.3.el5 #1 SMP Tue Aug 31 22:41:13 EDT 2010 x86_64
Cell version: OSS_MAIN_LINUX.X64_101105
Cell rpm version: cell-11.2.2.1.1_LINUX.X64_101105-1
Active image version: 11.2.2.1.1.101105
Active image activated: 2010-11-06 21:52:08 -0700
Active image status: success
Active system partition on device: /dev/md5
Active software partition on device: /dev/md7
In partition rollback: Impossible
Cell boot usb partition: /dev/sdm1
Cell boot usb version: 11.2.2.1.1.101105
Inactive image version: 11.2.1.3.1
Inactive image activated: 2010-08-28 20:01:30 -0700
Inactive image status: success
Inactive system partition on device: /dev/md6
Inactive software partition on device: /dev/md8
Boot area has rollback archive for the version: 11.2.1.3.1
Rollback to the inactive partitions: Possible
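The field table above notes that, on a healthy cell, the Cell boot usb version must be identical to the Active image version. The following is a minimal sketch of how that comparison could be scripted against output in the format shown above. The function names imageinfo_field and usb_matches_active are hypothetical helpers introduced for illustration and are not part of Oracle Exadata System Software.

```shell
# Print the value of one "Field: value" line from imageinfo-style output on stdin.
# $1 is the exact field label, for example "Active image version".
imageinfo_field() {
    awk -v label="$1" '
        index($0, label ":") == 1 {
            value = substr($0, length(label) + 2)   # text after the colon
            sub(/^[ \t]+/, "", value)               # drop leading blanks
            print value
            exit
        }'
}

# Succeed only if the boot USB version equals the active image version.
# Runs in a subshell so its variables do not leak into the caller.
usb_matches_active() (
    report=$(cat)
    active=$(printf '%s\n' "$report" | imageinfo_field "Active image version")
    usb=$(printf '%s\n' "$report" | imageinfo_field "Cell boot usb version")
    [ -n "$active" ] && [ "$active" = "$usb" ]
)
```

On a storage server, this could be used as: imageinfo | usb_matches_active && echo "USB and active image versions match"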
5.3.2 Oracle Exadata Storage Server Image History
The imagehistory command lists the version history for Oracle Exadata Storage Server.
For example, if a storage server was updated from release 11.2.1.2.6 to release 11.2.1.3.1, and then updated to release 11.2.1.2.3, the imagehistory command displays this history. The following is an example of the output:
# imagehistory
Version : 11.2.1.2.3
Image activation date : 2012-12-03 06:06:46 -0700
Imaging mode : fresh
Imaging status : success
Version : 11.2.3.2.0.120713
Image activation date : 2012-12-12 17:56:31 -0700
Imaging mode : out of partition upgrade
Imaging status : success
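When reviewing a long history, it can help to reduce the output to one line per image. The following is a rough sketch that assumes the "Label : value" layout shown in the example above. The function name summarize_imagehistory is a hypothetical helper and is not an Oracle-supplied command.

```shell
# Read imagehistory-style output on stdin and print one "version status" pair
# per image. Fields are split on a colon surrounded by optional spaces.
summarize_imagehistory() {
    awk -F' *: *' '
        $1 == "Version"        { version = $2 }
        $1 == "Imaging status" { print version, $2 }'
}
```

On a storage server, this could be used as: imagehistory | summarize_imagehistory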
5.3.3 Validation of the State and Health of the System
The validation framework runs different tests under certain conditions, such as on first boot after recovery of an Oracle Exadata Storage Server using the rescue and recovery functionality of the CELLBOOT USB flash drive, or when patching an Oracle Exadata Storage Server.
The validation framework is a set of validation tests that run at boot time at the rc.local level. The logs for the tests are available in the /var/log/cellos/validations directory.
Health check validations are a set of quick health checks that run on each boot, such as a check of the basic health of the disks, and report the status. If a validation fails, examine the log file for the cause, because the failure may indicate a potential problem that requires attention.
Automatic patch rollback occurs if one or more validation checks fail after patch application. Refer to the documentation for the specific patch.
Check for any failures reported in the /var/log/cellos/vldrun.first_boot.log file after the first boot configuration. For all subsequent boots, the /var/log/cellos/validations.log file contains information about failed validations. For each failed validation, perform the following procedure:
-
Look for the /var/log/cellos/validations/failed_validation_name.SuggestedRemedy file. The file exists only if the validation process has identified some corrective action. Follow the suggestions in the file to correct the cause of the failure.
-
If the SuggestedRemedy file does not exist, then examine the log file for the failed validation in /var/log/cellos/validations to track down the cause, and correct it as needed.
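The two-step procedure above can be sketched as a small helper. This is an illustration only: the function name show_remedy_or_logs is hypothetical, the validation name must be supplied by the reader after inspecting validations.log, and the assumption that a validation's log file name contains the validation name is not confirmed by this document.

```shell
# For one failed validation, print the SuggestedRemedy file if it exists;
# otherwise list candidate log files for manual review.
# $1: name of the failed validation
# $2: validations directory (defaults to the path given in the text)
show_remedy_or_logs() (
    name=$1
    dir=${2:-/var/log/cellos/validations}
    remedy="$dir/$name.SuggestedRemedy"
    if [ -f "$remedy" ]; then
        echo "Suggested remedy for $name:"
        cat "$remedy"
    else
        echo "No SuggestedRemedy file for $name; review these logs:"
        # Assumption: the log file name contains the validation name.
        ls -1 "$dir" 2>/dev/null | grep -F -- "$name" \
            || echo "(no matching log found in $dir)"
    fi
)
```

For example, show_remedy_or_logs failed_validation_name prints the suggested remedy when one was generated and otherwise points to the logs to examine.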
5.4 Locating Serial Numbers for System Components
You may need to provide the serial numbers for the system components when contacting Oracle Support Services.
Serial numbers for system components can be determined by using the following procedure:
Each time the system is booted, the serial numbers are written to the /var/log/cellos/validations/SerialNumbers file. This file can be used as a historic record of the serial numbers. The file also contains configuration information for some components.
5.5 Diagnostic and Repair Utilities
Oracle Exadata System Software includes utilities for diagnostics and repair of Oracle Exadata Storage Server.
The utilities help diagnose and repair problems that may occur during the normal life cycle of Oracle Exadata Storage Servers. The utilities are in the /opt/oracle.SupportTools directory.
Note:
All utilities must be run as the root user from the /opt/oracle.SupportTools directory.
- The CheckHWnFWProfile Utility
The CheckHWnFWProfile utility checks that the system meets the required hardware and firmware specifications, and reports any mismatches.
- The Diagnostic ISO File
Use the diagnostic ISO file (diagnostics.iso) to diagnose and recover from serious problems.
- The ibdiagtools Utilities
The most useful of the ibdiagtools utilities are the verify-topology, checkbadlinks.pl, and infinicheck utilities.
- The make_cellboot_usb Utility
The make_cellboot_usb utility allows you to rebuild a damaged CELLBOOT USB flash drive.
5.5.1 The CheckHWnFWProfile Utility
The CheckHWnFWProfile utility checks that the system meets the required hardware and firmware specifications, and reports any mismatches.
Table 5-3 CheckHWnFWProfile Utility Commands
Command | Description |
---|---|
|
When run without options, the utility checks the existing hardware and firmware components against the expected values. |
|
View the existing hardware and firmware versions on the system. |
|
Disable the |
|
Enable the |
|
Check specified components against the expected values. |
|
View the hardware and firmware versions of specified components on the system. |
|
List serial numbers. The list includes the following serial numbers:
Depending on the system, serial numbers for all the memory (RAM) modules may be included. |
|
View the expected hardware and firmware. |
|
View help and utility usage. |
5.5.2 The Diagnostic ISO File
Use the diagnostic ISO file (diagnostics.iso) to diagnose and recover from serious problems.
You may need to boot a server using the diagnostic ISO file to diagnose and recover from serious problems, for example, when the system is inaccessible due to system damage or damage to the CELLBOOT USB flash drive.
Use this facility only as directed to perform specific documented maintenance tasks, or under the guidance of Oracle Support Services.
- Booting a Server using the Diagnostic ISO File
Use this procedure to boot an Oracle Exadata Storage Server or Oracle Exadata Database Server using the diagnostic ISO file (diagnostics.iso).
5.5.2.1 Booting a Server using the Diagnostic ISO File
Use this procedure to boot an Oracle Exadata Storage Server or Oracle Exadata Database Server using the diagnostic ISO file (diagnostics.iso).
Note:
For information on booting an Oracle Linux KVM guest using the diagnostic ISO file, see Starting a Guest using the Diagnostic ISO File.
5.5.3 The ibdiagtools Utilities
The most useful of the ibdiagtools utilities are the verify-topology, checkbadlinks.pl, and infinicheck utilities.
The verify-topology utility checks the correctness and health of InfiniBand Network Fabric connections. For example, it can determine if both cables from the server go to the same switch in the Oracle Exadata Rack. When both cables go to the same switch, the system loses the ability to fail over to another switch if the first switch fails.
The checkbadlinks.pl utility reports the links that are operating at 5 Gbps. This is usually an indication that the cables are loose and need to be reseated.
The infinicheck utility reports the base RDMA Network Fabric performance between servers in Oracle Exadata, such as expected minimum throughput between the database server and storage server, between storage servers, and between a database server and another database server. This utility can help to identify issues in the RDMA Network Fabric. Because the utility runs stress tests on the RDMA Network Fabric, Oracle recommends using the utility when the system is idle and with all cell services shut down. This utility works on both the InfiniBand Network Fabric and RoCE Network Fabric.
See Also:
- For detailed information about the ibdiagtools utilities, refer to the README.txt file in the /opt/oracle.SupportTools/ibdiagtools/ directory.
- Sample outputs from each utility are included in the /opt/oracle.SupportTools/ibdiagtools/SampleOutputs.txt file.
5.5.4 The make_cellboot_usb Utility
The make_cellboot_usb utility allows you to rebuild a damaged CELLBOOT USB flash drive.
Do not connect more than one USB flash drive to the system when running this utility. The utility builds on the first discovered USB flash drive on the system.
Note:
This utility can only be used on Oracle Exadata Storage Server.
-
To see what is done before rebuilding the USB flash drive:
cd /opt/oracle.SupportTools
./make_cellboot_usb -verbose
-
To rebuild the USB flash drive, run the command with one of the following options: -execute, -force, or -rebuild.
./make_cellboot_usb -execute
Or:
./make_cellboot_usb -force
Or:
./make_cellboot_usb -rebuild
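Because the utility builds on the first discovered USB flash drive, it may be worth confirming that only one USB drive is attached before running it. The following is a minimal sketch of such a check. It assumes that the installed lsblk supports the TRAN (transport) column and that the flash drive reports its transport as usb; neither assumption is confirmed by this document, and the function name count_usb_disks is hypothetical.

```shell
# Count the disks whose transport type is "usb".
# Expects two whitespace-separated columns on stdin: device name and transport,
# as produced by "lsblk -d -n -o NAME,TRAN".
count_usb_disks() {
    awk '$2 == "usb" { n++ } END { print n + 0 }'
}
```

For example, lsblk -d -n -o NAME,TRAN | count_usb_disks should print 1 before the utility is run. Any other value suggests checking the attached devices first.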
5.6 System Diagnostics Data Gathering with sosreports and Oracle ExaWatcher
You can use the sosreport utility and Oracle ExaWatcher to diagnose problems with your system.
Every time a server is started, system-wide configuration information is collected by the sosreport utility, and stored in the /var/log/cellos/sosreports directory. You can generate a new sosreport by running the following command as the root user. The script starts collecting the information 30 minutes after entering the command.
/opt/oracle.cellos/vldrun -script sosreport
In addition, the /opt/oracle.ExaWatcher directory contains the Oracle ExaWatcher system data gathering and reporting utilities. Gathered data is stored in archive subdirectories. The following table describes the data gathered at different intervals by the utility:
Table 5-4 Oracle ExaWatcher Collector Names and Descriptions
Collector Name | Description |
---|---|
CellSrvStat |
Cell server status. |
Diskinfo |
I/O statistics of the disk, such as successfully completed reads, merged reads, time spent reading, and so on. |
FlashSpace |
RAW value of the flash card space. Minimum interval limit is 300 seconds. |
IBCardInfo (Currently not available for X8M systems) |
RDMA Network Fabric card information, and status of InfiniBand Network Fabric ports. Minimum interval is 300 seconds. |
IBprocs |
Commands that check the RDMA Network Fabric card status. Minimum interval is 600 seconds. |
Iostat |
CPU statistics, and I/O statistics for devices and partitions. |
Lsof |
Files opened by current processes. Minimum interval limit is 120 seconds. |
MegaRaidFW |
MegaRaid firmware information, such as battery information. Minimum interval is 86400 seconds. |
Meminfo |
Memory management by the kernel. |
Mpstat |
Microprocessor statistics. |
Netstat |
Current network connection statistics. |
Ps |
Active processes statistics. |
RDSinfo |
Availability of cell servers. Interval limit is 30 seconds. |
Slabinfo |
Caches for frequently-used objects in the kernel. |
Top |
Dynamic, real-time view of the system. |
Vmstat |
Virtual memory status. |
To use Oracle ExaWatcher, do the following:
-
As the root user, start the Oracle ExaWatcher processes and service.
# systemctl start ExaWatcher
-
Run the Oracle ExaWatcher utility as the root user.
/opt/oracle.ExaWatcher/ExaWatcher.sh [options]
The following options are available for use with the Oracle ExaWatcher utility:
Option | Description |
---|---|
No options specified |
The utility runs using the default options. |
|
To change the core command to be run on the current group. Only the following core commands can be changed:
Example: |
|
The utility parses all command line inputs, validates them, and creates a configuration file. If the file path and name are not specified, then the utility overwrites the default configuration file. |
|
The name of the collector to be disabled on the utility. Example: |
|
The ending time for the current group. The default value is 10 years from current time. Example: |
|
The configuration file to use with the Oracle ExaWatcher utility. The default configuration files are as follows:
|
|
Starts a new group for gathering data. Other options can be specified with the |
|
Displays help information. |
|
The sampling interval for the current group, in seconds. The default value is 5 seconds. Certain collection modules cannot be run every second because the modules consume resources. Example: |
|
Sets the limit for the amount of storage space used by the utility. The limit is specified in MB. On database servers, the default is 6 GB. On storage servers, the default is 600 MB. Example: |
|
The most-recent configuration file used with the utility. Data is not collected when using this option. |
|
The information about the command inputs. The following are the options:
|
|
The type of collection modules to run for the current group. The following are the options:
The default value is Example: |
|
The archive count of the current group. The default value is 720. Example: |
|
The directory path to store the results of the data collection. Example: |
|
To stop the utility and all its processes, and then to zip the data files. |
|
The starting time for the current group. The default is 20 seconds from the current time. Example: |
|
To include a custom collection module in the current group. Example: |
|
The compression program to use on the collected data. The default program is bzip2. Example: |
5.7 Host Console Support
The storage servers and database servers of Oracle Exadata are configured to provide host console access.
The host console is useful when collecting Oracle Linux kernel traces or creating crash dump files to help diagnose severe malfunctions.
To access the host console, perform the following procedure:
-
Connect to the Integrated Lights Out Manager (ILOM) using SSH and log in as an ILOM administrator.
-
Run the start /SP/console command.
To stop using the console, use the stop /SP/console command.
5.8 Oracle Linux Kernel Crash Core Files
The storage servers and database servers of Oracle Exadata are configured to generate Oracle Linux kernel crash core files in the /var/crash directory when the Oracle Linux operating system malfunctions or crashes.
The crash utility can be used to analyze the crash files. The crash files are automatically removed by the ExaWatcher utility so that the files do not occupy more than 10 percent of the free disk space on the file system. Older crash files are removed first.
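To see how close the crash files are to that limit, the directory size can be compared with the free space on its file system. The following is a sketch under one reading of the rule, namely that the crash directory size is compared against one tenth of the currently available space. How ExaWatcher actually measures this is not described here, and the function names crash_files_over_limit and report_crash_usage are hypothetical.

```shell
# Succeed (exit 0) if crash files use more than 10 percent of free space.
# $1: size of the crash directory in KB
# $2: free space on its file system in KB
crash_files_over_limit() {
    [ $(( $1 * 10 )) -gt "$2" ]
}

# Gather both values for a directory (defaults to /var/crash) and report.
report_crash_usage() (
    dir=${1:-/var/crash}
    used_kb=$(du -sk "$dir" | awk '{ print $1 }')
    free_kb=$(df -Pk "$dir" | awk 'NR == 2 { print $4 }')   # "Available" column
    if crash_files_over_limit "$used_kb" "$free_kb"; then
        echo "$dir: $used_kb KB used, over 10 percent of $free_kb KB free"
    else
        echo "$dir: $used_kb KB used, within 10 percent of $free_kb KB free"
    fi
)
```

Running report_crash_usage as the root user prints a one-line summary for the /var/crash directory.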
5.9 Monitoring syslog Messages Remotely
By default, storage server syslog messages are written to local log files.
A separate management server, known as a loghost server, can receive syslog messages from Oracle Exadata database servers and storage servers.
-
To monitor the syslog messages remotely, configure the syslog service on the loghost server to listen for incoming syslog messages by setting SYSLOGD_OPTIONS -r in the /etc/sysconfig/syslog file on the loghost server.
-
Use the ALTER CELL or ALTER DBSERVER command to configure each server to forward specified syslog messages to the loghost server by setting the syslogconf attribute.
The server configuration is maintained across restarts and updates.
-
Use the ALTER CELL VALIDATE SYSLOGCONF or ALTER DBSERVER VALIDATE SYSLOGCONF command to test message transmission from the Exadata servers to the loghost server.
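Before testing message transmission, it can be useful to confirm the loghost setting from the first step. The following is a minimal sketch that assumes the loghost file uses a shell-style assignment such as SYSLOGD_OPTIONS="-r -m 0". The exact contents of the file vary by syslog implementation, and the function name loghost_accepts_remote is hypothetical.

```shell
# Succeed if the given sysconfig file sets SYSLOGD_OPTIONS with the -r flag.
# $1: path to the file (defaults to /etc/sysconfig/syslog from the text)
loghost_accepts_remote() {
    # Match "-r" as a separate option, not as the start of a longer word.
    grep -Eq '^[[:space:]]*SYSLOGD_OPTIONS=(.*[^[:alnum:]-])?-r([^[:alnum:]]|$)' \
        "${1:-/etc/sysconfig/syslog}"
}
```

On the loghost server, this could be used as: loghost_accepts_remote || echo "SYSLOGD_OPTIONS does not include -r"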
Starting with Oracle Exadata System Software release 21.2.0, you can also configure each server to forward syslog messages from the Integrated Lights Out Manager (ILOM) service processor (SP). To configure this facility, use the ALTER CELL or ALTER DBSERVER command to set the ilomSyslogClients attribute.