5 Maintaining Oracle Exadata Storage Server Software

This chapter explains how to maintain Oracle Exadata Storage Server Software. When changing the fundamental configuration of a cell, such as the IP address, host name, or InfiniBand address, note the following:

  • Before changing the cell configuration, ensure that all Oracle Automatic Storage Management (Oracle ASM), Oracle Real Application Clusters (Oracle RAC), and database instances that use the cell do not access the cell while you are making the change.

  • After changing the cell configuration, ensure that consumers of cell services are correctly reconfigured to use the new connect information of the cell. If Auto Service Request is being used, then deactivate the asset from ASR Manager, and activate the asset with the new IP address.

  • When changing a cell configuration, change only one cell at a time to ensure that Oracle ASM and Oracle RAC work properly during the changes.

Caution:

All operations in this chapter must be performed with extreme caution and only after you have ensured you have complete backups of the data. If not, then you may experience irrecoverable data loss.

5.1 Using the ipconf Utility

The ipconf utility is used to set and change the following parameters on Oracle Exadata Storage Servers. During initial configuration of Oracle Exadata Database Machine, the utility also configures the database servers.

  • IP address

  • Host name

  • NTP server

  • Time zone

  • DNS name servers

  • InfiniBand addresses

The ipconf utility makes a backup copy of the files it modifies. When the utility is rerun, it overwrites the existing backup file. The log file maintains the complete history of every ipconf operation performed.

Table 5-1 lists the ipconf utility options.

Table 5-1 ipconf Options

Option Description

no option
    Utility starts in main editing mode.

-ignoremismatch
    Starts utility in main editing mode when there is a mismatch between the stored cell configuration and the running configuration.

-ilom print
    Prints basic ILOM settings.

-ilom set
    Sets basic ILOM settings.

-verify [-verbose]
    Verifies the consistency between the stored cell configuration and the running configuration. The -verbose option shows all details. If -verbose is not used, then only errors are displayed. Success returns zero errors.

-verify -semantic [-verbose]
    Verifies consistency and checks for access to all DNS and NTP servers. The -verbose option shows all details. If -verbose is not used, then only errors are displayed. Success returns zero errors.

-verify -semantic-min [-verbose]
    Verifies the consistency and checks for access to at least one NTP and one DNS server. The -verbose option shows all details. If -verbose is not used, then only errors are displayed. Success returns zero errors.

Example 5-1 shows the display for the ipconf utility when setting the Sun ILOM interface.

Example 5-1 Using the ipconf Utility to Set the Sun ILOM Interface

# ipconf
Logging started to /var/log/cellos/ipconf.log
Interface ib0 is Linked. hca: mlx4_0
Interface ib1 is Linked. hca: mlx4_0
Interface eth0 is Linked. driver/mac: igb/00:00:00:01:cd:01
Interface eth1 is ... Unlinked. driver/mac: igb/00:00:00:01:cd:02
Interface eth2 is ... Unlinked. driver/mac: igb/00:00:00:01:cd:03
Interface eth3 is ... Unlinked. driver/mac: igb/00:00:00:01:cd:04
 
Network interfaces
Name  State      IP address      Netmask         Gateway         Hostname       
ib0   Linked                                                                    
ib1   Linked                                                                    
eth0  Linked                                                                    
eth1  Unlinked                                                                  
eth2  Unlinked                                                                  
eth3  Unlinked                                                                  
Warning. Some network interface(s) are disconnected. Check cables and switches
         and retry
Do you want to retry (y/n) [y]: n
 
The current nameserver(s): 192.0.2.10 192.0.2.12 192.0.2.13
Do you want to change it (y/n) [n]: 
The current timezone: America/Los_Angeles
Do you want to change it (y/n) [n]: 
The current NTP server(s): 192.0.2.06 192.0.2.12 192.0.2.13
Do you want to change it (y/n) [n]: 
 
Network interfaces
Name  State   IP address      Netmask       Gateway        Hostname       
eth0  Linked  192.0.2.151  255.255.252.0 192.0.2.15  Management myg.example.com
eth1  Unlinked
eth2  Unlinked
eth3  Unlinked
bond0 ib0,ib1    192.168.13.101  255.255.252.0        Private myg.example.com 

Select interface name to configure or press Enter to continue: 
 
Select canonical hostname from the list below
1: myg.example.com
2: myg-private.example.com
Canonical fully qualified domain name [1]: 
 
Select default gateway interface from the list below
1: eth0
Default gateway interface [1]:

Canonical hostname: myg.example.com
Nameservers: 192.0.2.10 192.0.2.12 192.0.2.13
Timezone: America/Los_Angeles
NTP servers: 192.0.2.06 192.0.2.12 192.0.2.13
Network interfaces
Name  State  IP address      Netmask        Gateway        Hostname       
eth0  Linked 192.0.2.151 255.255.252.0  192.0.2.15  myg.example.com
eth1  Unlinked
eth2  Unlinked
eth3  Unlinked
bond0 ib0,ib1  192.168.13.101 255.255.252.0 Private  myg-priv.example.com
Is this correct (y/n) [y]: 
 
Do you want to configure basic ILOM settings (y/n) [y]: y
Loading configuration settings from ILOM ...
ILOM Fully qualified hostname [myg_ilom.example.com]:
ILOM IP discovery (static/dhcp) [static]:
ILOM IP address [192.0.2.201]: 
ILOM Netmask [255.255.252.0]: 
ILOM Gateway or none [192.0.2.15]: 
ILOM Nameserver or none: [192.0.2.10]:
ILOM Use NTP Servers (enabled/disabled) [enabled]: 
ILOM First NTP server. Fully qualified hostname or ip address or none [192.0.2.06]:
ILOM Second NTP server. Fully qualified hostname or ip address or none [none]:

Basic ILOM configuration settings:
Hostname             : myg_ilom.example.com
IP Discovery         : static
IP Address           : 192.0.2.201
Netmask              : 255.255.252.0
Gateway              : 192.0.2.15
DNS servers          : 192.0.2.10
Use NTP servers      : enabled
First NTP server     : 192.0.2.06
Second NTP server    : none
Timezone (read-only) : America/Los_Angeles
 
Is this correct (y/n) [y]:

5.2 Oracle Exadata Storage Server Software Validation Tests and Utilities

Oracle Exadata Storage Server Software includes validation tests that run at boot time, as well as utilities for reporting on and diagnosing the system. These are described in the following sections.

5.2.1 Summary of Software and Firmware Components on Oracle Exadata Storage Servers

The imageinfo command, located in the /usr/local/bin/ directory, provides a summary of the release and status of the software and firmware components on Oracle Exadata Storage Servers. Together, the software and firmware components make up the cell image. The release and status information is required when working with Oracle Support Services.

Table 5-2 lists the output fields from the imageinfo command.

Table 5-2 Description of imageinfo Command Output

Field Description

Active image activated
    Date stamp in UTC format when the image on the cell was considered completed, either successfully or unsuccessfully. A cell patch updates the time stamp to indicate the time the cell was patched.

Active image status
    Status of the cell image based on the success or failure of a set of self-tests and configuration actions, collectively known as validations. Validations are explained in "Validation of the State and Health of the System". When this status is undefined, empty, or failure, examine the validation logs in the /var/log/cellos directory to determine the cause.

Active image version
    Main release version of the overall cell image, indicating a specific combination of releases of the operating system, core Oracle Exadata Storage Server Software (the cell rpm), and the firmware levels for most key components of the cell. A cell patch usually updates this information. The first five dot-separated fields of the version match the standard way Oracle product releases are identified. The last field is the exact build number of the release. It corresponds to the build date in YYMMDD format.

Active system partition on device
    Cell operating system root (/) partition device. A typical successful cell patch switches the cell from its active partitions to its inactive partitions, so each successful cell patch keeps the cell switching between the two sets of partitions. On a few rare occasions, the cell patch does not switch partitions. These patches are known as in-partition patches.

Boot area has rollback archive for the version
    For a cell patched with a non-in-partition cell patch, this indicates whether there is a suitable backup archive that can be used to roll the cell back to the inactive image version. Existence of this archive is necessary but not sufficient for rolling back to the inactive version of the cell image.

Cell boot usb partition
    Oracle Exadata Storage Server boot and rescue USB partition.

Cell boot usb version
    Version of the software on the boot USB. On a healthy cell this release must be identical to the value of the Active image version line.

Cell rpm version
    Cell software version or cell rpm version as reported by the CellCLI utility.

Cell version
    Release version as reported by the CellCLI utility.

In partition rollback
    Some cell patches do not switch the partitions. These are in-partition patches. This field indicates whether there is enough information to roll back such a patch.

Inactive image activated
    Time stamp for activation of the inactive image. This field is similar to the Active image activated field.

Inactive image status
    Status of the inactive image. This field is similar to the Active image status field.

Inactive image version
    Version of the cell before the most recent patch was applied.

Inactive software partition on device
    Oracle Exadata Storage Server Software file system partition, /opt/oracle, for the inactive image.

Inactive system partition on device
    The root (/) file system partition for the inactive image.

Kernel version
    Operating system kernel version of the cell.

Rollback to the inactive partitions
    Summary indicator for a non-in-partition patched cell, indicating whether rollback can be run on the cell to take it back to the inactive version of the cell image. On a new cell, this field is empty or has the value undefined.
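
The structure of the Active image version value can be illustrated with a short shell sketch. The function name split_image_version is hypothetical and is not part of the cell software. The sketch only checks that the last field has six digits, and it assumes that a two-digit year YY means 20YY.

```shell
# Sketch only: split_image_version is a hypothetical helper, not an Oracle tool.
# It splits a version such as 11.2.2.1.1.101105 into the release and build date.
split_image_version() {
    version=$1
    build=${version##*.}      # last dot-separated field, for example 101105
    release=${version%.*}     # everything before the last dot

    # Require exactly six digits in the build field.
    case $build in
        [0-9][0-9][0-9][0-9][0-9][0-9]) ;;
        *) echo "no YYMMDD build field in '$version'" >&2
           return 1 ;;
    esac

    yy=${build%????}          # first two digits
    dd=${build#????}          # last two digits
    mm=${build#??}            # drop the year ...
    mm=${mm%??}               # ... then drop the day

    # Assumption: builds are from the 2000s, so YY maps to 20YY.
    echo "$release 20$yy-$mm-$dd"
}
```

For example, split_image_version 11.2.2.1.1.101105 prints the release 11.2.2.1.1 followed by the build date 2010-11-05. A version without a build field, such as 11.2.1.3.1, is rejected.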

The following is an example of the output from the imageinfo command:

Kernel version: 2.6.18-194.3.1.0.3.el5 #1 SMP Tue Aug 31 22:41:13 EDT 2010 x86_64
Cell version: OSS_MAIN_LINUX.X64_101105
Cell rpm version: cell-11.2.2.1.1_LINUX.X64_101105-1
 
Active image version: 11.2.2.1.1.101105
Active image activated: 2010-11-06 21:52:08 -0700
Active image status: success
Active system partition on device: /dev/md5
Active software partition on device: /dev/md7
 
In partition rollback: Impossible
 
Cell boot usb partition: /dev/sdm1
Cell boot usb version: 11.2.2.1.1.101105
 
Inactive image version: 11.2.1.3.1
Inactive image activated: 2010-08-28 20:01:30 -0700
Inactive image status: success
Inactive system partition on device: /dev/md6
Inactive software partition on device: /dev/md8
 
Boot area has rollback archive for the version: 11.2.1.3.1
Rollback to the inactive partitions: Possible
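
Two of the checks described in Table 5-2 can be automated against saved imageinfo output. The following is a minimal sketch, not an Oracle-supplied tool: the function names imageinfo_field and check_imageinfo are hypothetical, and the sketch assumes that every line has the form "Field: value", as in the preceding example.

```shell
# Sketch only: hypothetical helpers for reading saved imageinfo output.

# Print the value of one "Field: value" line read from standard input.
imageinfo_field() {
    # $1 = exact field label, for example "Active image version"
    awk -v label="$1" '
        index($0, label ":") == 1 {
            value = substr($0, length(label) + 2)
            sub(/^[ \t]+/, "", value)
            sub(/[ \t]+$/, "", value)
            print value
            exit
        }'
}

# Check that the active image status is success and that the boot USB
# version is identical to the active image version.
check_imageinfo() {
    # $1 = file that holds the imageinfo output
    status=$(imageinfo_field "Active image status" < "$1")
    active=$(imageinfo_field "Active image version" < "$1")
    usb=$(imageinfo_field "Cell boot usb version" < "$1")

    if [ "$status" != "success" ]; then
        echo "WARN: active image status is '${status:-empty}'; examine /var/log/cellos"
        return 1
    fi
    if [ "$active" != "$usb" ]; then
        echo "WARN: boot USB version $usb differs from active image version $active"
        return 1
    fi
    echo "OK: image $active, status $status"
}
```

To use the sketch, save the output first, for example with imageinfo > /tmp/imageinfo.out, and then run check_imageinfo /tmp/imageinfo.out. The path /tmp/imageinfo.out is only an illustration.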

5.2.2 Oracle Exadata Storage Server Image History

The imagehistory command lists the version history for Oracle Exadata Storage Server. For example, if a cell was updated from release 11.2.1.2.3 to release 11.2.1.2.6, and then updated to release 11.2.1.3.1, the imagehistory command displays this history. The following is an example of the output:

# imagehistory
Version                  : 11.2.1.2.3
Image activation date    : 2012-12-03 06:06:46 -0700
Imaging mode             : fresh
Imaging status           : success
 
Version                  : 11.2.3.2.0.120713
Image activation date    : 2012-12-12 17:56:31 -0700
Imaging mode             : out of partition upgrade
Imaging status           : success 
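
When the history is long, it can help to condense each record to one line. The following sketch assumes the "Label : value" layout shown in the preceding output. The function name summarize_imagehistory is hypothetical.

```shell
# Sketch only: condense imagehistory output (read from standard input)
# to one line per image: version | imaging mode | imaging status.
summarize_imagehistory() {
    awk -F ' *: ' '
        { sub(/[ \t]+$/, "") }                  # drop trailing blanks, fields are re-split
        $1 == "Version"        { version = $2 }
        $1 == "Imaging mode"   { mode = $2 }
        $1 == "Imaging status" { print version " | " mode " | " $2 }
    '
}
```

On a cell, the function would be used as imagehistory | summarize_imagehistory.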

5.2.3 Validation of the State and Health of the System

The validation framework is a set of validation tests that run at boot time at the rc.local level. The logs for the tests are available in the /var/log/cellos/validations directory. The validation framework also runs different tests under certain conditions, such as on first boot after recovery of an Oracle Exadata Storage Server using the rescue and recovery functionality of the CELLBOOT USB flash drive, or when patching an Oracle Exadata Storage Server.

In addition, health check validations run a set of quick health checks on the system at each boot, such as checking the basic health of the disks, and report the status. If a validation fails, then examine the log file for the cause, because the failure may indicate a potential problem that requires attention.

Automatic patch rollback occurs if one or more validation checks fail after patch application. Refer to the documentation for the specific patch.

Check for any failures reported in the /var/log/cellos/vldrun.first_boot.log file after the first boot configuration. For all subsequent boots, the /var/log/cellos/validations.log file contains information about failed validations. For each failed validation, perform the following procedure:

  1. Look for the /var/log/cellos/validations/failed_validation_name.SuggestedRemedy file. The file exists only if the validation process has identified some corrective action. Follow the suggestions in the file to correct the cause of the failure.

  2. If the SuggestedRemedy file does not exist, then examine the log file for the failed validation in /var/log/cellos/validations to track down the cause, and correct it as needed.
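
The two steps of the procedure can be sketched as a small shell function. This is an illustration only: the function name next_step_for_validation and the VALIDATIONS_DIR variable are hypothetical, and the name of the failed validation must be taken from the validations log.

```shell
# Sketch only: decide which step of the procedure applies to a failed validation.
# VALIDATIONS_DIR is a hypothetical override that defaults to the documented directory.
VALIDATIONS_DIR=${VALIDATIONS_DIR:-/var/log/cellos/validations}

next_step_for_validation() {
    # $1 = name of the failed validation
    remedy="$VALIDATIONS_DIR/$1.SuggestedRemedy"
    if [ -f "$remedy" ]; then
        # Step 1: the validation process identified a corrective action.
        echo "follow: $remedy"
    else
        # Step 2: no remedy file, so the validation logs must be examined.
        echo "examine logs for $1 in $VALIDATIONS_DIR"
    fi
}
```

Run next_step_for_validation failed_validation_name once for each failed validation.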

5.2.4 Serial Numbers for System Components

You may need to provide the serial numbers for the system components when contacting Oracle Support Services. Serial numbers for system components can be determined by using the following procedure:

  1. Log in as the root user.
  2. Enter the following command:
    /opt/oracle.SupportTools/CheckHWnFWProfile -action list -mode serial_numbers
    

Each time the system is booted, the serial numbers are written to the /var/log/cellos/validations/SerialNumbers file. This file can be used as a historic record of the serial numbers. The file also contains configuration information for some components.

5.2.5 Diagnostic and Repair Utilities

Oracle Exadata Storage Server Software includes utilities for diagnostics and repair of Oracle Exadata Storage Server. The utilities help diagnose and repair problems that may occur during the normal life cycle of Oracle Exadata Storage Servers. The utilities are in the /opt/oracle.SupportTools directory.

Note:

All utilities must be run as the root user from the /opt/oracle.SupportTools directory.

5.2.5.1 The CheckHWnFWProfile Utility

The CheckHWnFWProfile utility checks that the system meets the required hardware and firmware specifications, and reports any mismatches.

Table 5-3 CheckHWnFWProfile Utility Commands

Command Description

./CheckHWnFWProfile
    When run without options, the utility checks the existing hardware and firmware components against the expected values.

./CheckHWnFWProfile -action list
    View the existing hardware and firmware versions on the system.

./CheckHWnFWProfile -action alter_config -property HWFW_Checker_Updater_Status -value Disabled
    Disable the CheckHWnFWProfile utility.

./CheckHWnFWProfile -action alter_config -property HWFW_Checker_Updater_Status -value Enabled
    Enable the CheckHWnFWProfile utility.

./CheckHWnFWProfile -action check -component list_of_components
    Check specified components against the expected values.

./CheckHWnFWProfile -action list -component list_of_components
    View the hardware and firmware versions of specified components on the system.

./CheckHWnFWProfile -action list -mode serial_numbers
    List serial numbers. The list includes the following serial numbers:
      • System
      • Disk controller
      • Each disk
      • InfiniBand HCA
    Depending on the system, serial numbers for all the memory (RAM) modules may be included.

./CheckHWnFWProfile -action list -mode supported_info
    View the expected hardware and firmware.

./CheckHWnFWProfile -h
./CheckHWnFWProfile --help
    View help and utility usage.

5.2.5.2 The diagnostics.iso Utility

The diagnostics.iso utility may be used to boot the server to diagnose serious problems when no other way exists to analyze the system because of damage to the system and its CELLBOOT USB flash drive. Use this utility only with Oracle Support Services guidance. The root password should be available to Oracle Support Services, as needed.

5.2.5.3 The ibdiagtools Utilities

The most useful of the ibdiagtools utilities are verify-topology, checkbadlinks.pl, and infinicheck. The verify-topology utility checks the correctness and health of InfiniBand connections. For example, it can determine if both cables from the server go to the same switch in the Oracle Exadata Database Machine. When both cables go to the same switch, the server loses the ability to fail over to another switch if the first InfiniBand switch fails.

The checkbadlinks.pl utility reports the links that are operating at 2.5 Gbps. This is usually an indication that the cables are loose and need to be reseated.

The infinicheck utility reports the base InfiniBand performance between servers in Oracle Exadata Database Machine, such as expected minimum throughput between the database server and cell, cell and cell, and database server and another database server. This utility can help identify potential issues in the InfiniBand fabric.

See Also:

  • For detailed information about the ibdiagtools utilities, refer to the README.txt file in the /opt/oracle.SupportTools/ibdiagtools/ directory.

  • Sample outputs from each utility are included in the /opt/oracle.SupportTools/ibdiagtools/SampleOutputs.txt file.

5.2.5.4 The make_cellboot_usb Utility

The make_cellboot_usb utility allows you to rebuild a damaged CELLBOOT USB flash drive. Do not have more than one USB flash drive connected to the system when running this utility. It builds on the first discovered USB flash drive on the system.

Note:

This utility can only be used on Oracle Exadata Storage Server.

  • To see what is done before rebuilding the USB flash drive:

    cd /opt/oracle.SupportTools
    ./make_cellboot_usb -verbose
    
  • To rebuild the USB flash drive, run the command with one of the following options: -execute, -force, or -rebuild.

    ./make_cellboot_usb -execute
    

    Or:

    ./make_cellboot_usb -force
    

    Or:

    ./make_cellboot_usb -rebuild
    

5.2.6 System Diagnostics Data Gathering with sosreports and Oracle ExaWatcher

On every start of a server, systemwide configuration information is collected by the sosreport utility, and stored in the /var/log/cellos/sosreports directory. The information can be used to help diagnose problems. You can generate a new sosreport by running the following command as the root user. The script starts collecting the information 30 minutes after entering the command.

/opt/oracle.cellos/vldrun -script sosreport

In addition, the /opt/oracle.ExaWatcher directory contains the Oracle ExaWatcher system data gathering and reporting utilities. Gathered data is stored in archive subdirectories. The following data is gathered at different intervals by the utility:

Table 5-4 Oracle ExaWatcher Collector Names and Descriptions

Collector Name Description

CellSrvStat
    Cell server status.

Diskinfo
    I/O statistics of the disk, such as successfully completed reads, merged reads, time spent reading, and so on.

FlashSpace
    RAW value of the flash card space.
    Minimum interval limit is 300 seconds.

IBCardInfo
    InfiniBand card information, and status of InfiniBand ports.
    Minimum interval is 300 seconds.

IBprocs
    Commands that check the InfiniBand status.
    Minimum interval is 600 seconds.

Iostat
    CPU statistics, and I/O statistics for devices and partitions.

Lsof
    Files opened by current processes.
    Minimum interval limit is 120 seconds.

MegaRaidFW
    MegaRaid firmware information, such as battery information.
    Minimum interval is 86400 seconds.

Meminfo
    Memory management by the kernel.

Mpstat
    Microprocessor statistics.

Netstat
    Current network connection statistics.

Ps
    Active processes statistics.

RDSinfo
    Availability of cell servers.
    Interval limit is 30 seconds.

Slabinfo
    Caches for frequently-used objects in the kernel.

Top
    Dynamic, real-time view of the system.

Vmstat
    Virtual memory status.
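
Before choosing a sampling interval, the requested value can be compared with the minimum intervals listed in Table 5-4. The following sketch is illustrative only. The function names min_interval_for and check_interval are hypothetical, and the sketch assumes that the "interval limit" listed for RDSinfo is also a minimum.

```shell
# Sketch only: minimum sampling intervals, in seconds, copied from Table 5-4.
min_interval_for() {
    # $1 = collector name
    case $1 in
        FlashSpace | IBCardInfo) echo 300 ;;
        IBprocs)                 echo 600 ;;
        Lsof)                    echo 120 ;;
        MegaRaidFW)              echo 86400 ;;
        RDSinfo)                 echo 30 ;;    # assumption: the listed limit is a minimum
        *)                       echo 0 ;;     # no minimum is listed for this collector
    esac
}

# Report whether an interval is acceptable for a collector.
check_interval() {
    # $1 = collector name, $2 = requested interval in seconds
    case $2 in
        '' | *[!0-9]*)
            echo "interval must be a whole number of seconds" >&2
            return 2 ;;
    esac
    min=$(min_interval_for "$1")
    if [ "$2" -lt "$min" ]; then
        echo "$1: interval $2 is below the minimum of $min seconds"
        return 1
    fi
    echo "$1: interval $2 is acceptable"
}
```

For example, check_interval Lsof 60 reports that the interval is below the 120-second minimum, while check_interval Vmstat 5 is accepted.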

Use the following command to run Oracle ExaWatcher. Oracle recommends running the command as the root user.

/opt/oracle.ExaWatcher/ExaWatcher.sh [options]

The following options are available for use with the Oracle ExaWatcher utility:

Option Description

No options specified
    The utility runs using the default options.

-c | --command 'collector_name ;; "default_command; ... " '
    Changes the core command that runs for the current group. Only the following core commands can be changed: CellSrvStat, Iostat, Mpstat, Netstat, Ps, Top, and Vmstat.
    Example: --command 'Vmstat;; "vmstat -a"'

--createconf "config_file_to_create" | null
    The utility parses all command line inputs, validates them, and creates a configuration file. If the file path and name are not specified, then the utility overwrites the default configuration file.

-d | --disable "collector_name"
    The name of the collector to disable.
    Example: --disable "Vmstat"

-e | --end "end_time"
    The ending time for the current group. The default value is 10 years from the current time.
    Example: --end "11/06/2013 12:01:00"

--fromconf "configuration_file" | null
    The configuration file to use with the Oracle ExaWatcher utility. The default configuration files are as follows:
      /opt/oracle.ExaWatcher/ExaWatcher.conf for Oracle Linux
      /opt/oracle.ExaWatcher/ExaWatcher_SunOS.conf for Oracle Solaris

-g | --group
    Starts a new group for gathering data. Other options can be specified with the group option.

-h | --help
    Displays help information.

-i | --interval "interval_length"
    The sampling interval for the current group, in seconds. The default value is 5 seconds.
    Certain collection modules cannot be run every second because the modules consume resources.
    Example: --interval 10

-l | --spacelimit
    Sets a limit for the amount of space used by the utility. The limit is specified in MB. The default value is 300 GB.
    Example: --spacelimit 600

--lastconf
    The most-recent configuration file used with the utility.
    Data is not collected when using this option.

--listcmd "Full"|"Nameonly"|"Core"|"CMD"|"Enabled"|null
    Displays information about the command inputs. The following are the options:
      Full displays all the information about the commands and samplers.
      Nameonly displays the name of each command and whether it is enabled.
      Core displays only the core sampler information.
      CMD displays the name of each command, whether it is enabled, and its default commands.

-m | --commandmode {"ALL" | "CORE" | "SELECTED"}
    The type of collection modules to run for the current group. The following are the options:
      ALL runs all collection modules.
      CORE runs only the core collection modules.
      SELECTED runs only the specified collection modules.
    The default value is ALL.
    Example: --commandmode "CORE"

-o | --count "archiving_count"
    The archive count of the current group. The default value is 720.
    Example: --count 500

-r | --resultdir "result_directory"
    The directory path to store the results of the data collection.
    Example: --resultdir "/opt/oracle.ExaWatcher/archive"

--stop
    Stops the utility and all its processes, and then compresses the data files.

-t | --start "start_time"
    The starting time for the current group. The default is 20 seconds from the current time.
    Example: --start "11/05/2013 12:00:00"

-u | --customcmd 'sample_name ;; "custom_command;... " '
    Includes a custom collection module in the current group.
    Example: --customcmd 'Lsl;; "/bin/ls -l"'

-z | --zip "bzip2" | "gzip"
    The compression program to use on the collected data. The default program is bzip2.
    Example: --zip "gzip"
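
As an illustration of how several of these options might be combined for one group, the following sketch assembles a command line and prints it without running it. The function name exawatcher_group_cmd and the EXAWATCHER variable are hypothetical. The combination of options is an assumption based on the statement that other options can be specified with the group option, so review the printed command before using it.

```shell
# Sketch only: print, but do not run, an ExaWatcher command line for one group.
# EXAWATCHER is a hypothetical override for the script path documented above.
EXAWATCHER=${EXAWATCHER:-/opt/oracle.ExaWatcher/ExaWatcher.sh}

exawatcher_group_cmd() {
    # $1 = interval in seconds, $2 = archive count, $3 = ALL, CORE, or SELECTED
    case $3 in
        ALL | CORE | SELECTED) ;;
        *) echo "unknown command mode: $3" >&2
           return 1 ;;
    esac
    printf '%s --group --interval %s --count %s --commandmode "%s"\n' \
        "$EXAWATCHER" "$1" "$2" "$3"
}
```

For example, exawatcher_group_cmd 10 500 CORE prints a command that uses the example values given for the --interval, --count, and --commandmode options.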

5.2.7 Serial Console Support

The cells and database servers of Oracle Exadata Database Machine are configured to provide serial console access. The serial console is useful when taking Linux kernel traces or creating crash dump files to help diagnose severe malfunctions. To access the serial console, perform the following procedure:

  • Connect to the ILOM using SSH and log in as an ILOM administrator. Then run the "start /SP/console" command. To stop using the console, use the "stop /SP/console" command.

5.2.8 Linux Kernel Crash Core Files

The cells and database servers of Oracle Exadata Database Machine are configured to generate Linux kernel crash core files in the /var/crash directory when there is a Linux crash. The crash utility can be used to analyze the crash files. The crash files are automatically removed by the Oracle ExaWatcher utility so that the files do not occupy more than 10 percent of the free disk space on the file system. Older crash files are removed first.
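
To see how close the crash directory is to that threshold, the space used by the crash files can be compared with the free space on the file system. The following sketch only reports the comparison and removes nothing. The function names are hypothetical, and the sketch assumes that the limit means the space used by crash files divided by the free space reported by df.

```shell
# Sketch only: report crash file usage against the 10 percent threshold.

# Succeed when the used space exceeds 10 percent of the free space.
crash_over_limit() {
    # $1 = KB used by crash files, $2 = KB free on the file system
    [ $(( $1 * 10 )) -gt "$2" ]
}

crash_dir_report() {
    # $1 = crash directory, defaults to /var/crash
    dir=${1:-/var/crash}
    used_kb=$(du -sk "$dir" | awk '{ print $1 }')
    free_kb=$(df -Pk "$dir" | awk 'NR == 2 { print $4 }')
    if crash_over_limit "$used_kb" "$free_kb"; then
        echo "$dir: $used_kb KB used, above 10% of $free_kb KB free"
    else
        echo "$dir: $used_kb KB used, within 10% of $free_kb KB free"
    fi
}
```

Run crash_dir_report without arguments to check the /var/crash directory.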