4 Maintaining Oracle Exadata System Software

This section explains how to maintain Oracle Exadata System Software.

Caution:

All operations in this chapter must be performed with extreme caution, and only after you have ensured that you have complete backups of the data. Otherwise, you may experience irrecoverable data loss.

4.1 Understanding Oracle Exadata System Software Release Numbering

The Oracle Exadata System Software release number is related to the Oracle Database release number.

The Oracle Exadata System Software release number matches the highest Oracle Grid Infrastructure and Oracle Database version it supports. For example, the highest version Oracle Exadata System Software release 18 supports is Oracle Grid Infrastructure and Oracle Database release 18. The highest version Oracle Exadata System Software release 12.2 supports is Oracle Grid Infrastructure and Oracle Database release 12.2.0.1.

Release 18c and Later Numbering

The Oracle Exadata System Software release that followed release 12.2.1.1.8 was renamed to 18.1.0, and a new numbering scheme for Oracle Exadata System Software was implemented. Instead of the legacy nomenclature, such as 12.2.1.1.5, a three-field format, Year.Update.Revision, is used (for example, 18.1.0). This new numbering scheme allows you to clearly determine:

  • The annual release designation of the software
  • The latest software update, which can contain new features
  • The latest software revision, which includes security and software fixes

If there are new features or new hardware supported, a new software update is released during the year, for example, 19.2. To allow you to keep current on just security-related and other software fixes after your feature environment becomes stable, software revisions are made available approximately once a month, for example, 19.1.3.

Numbering for Releases Prior to 18c

  • The first two digits of the Oracle Exadata System Software release number represent the major Oracle Database release number, such as Oracle Database 12c Release 1 (12.1). Oracle Exadata System Software release 12.1 is compatible with all Oracle Database 12c Release 1 (12.1) releases.
  • The third digit usually represents the component-specific Oracle Database release number. This digit usually matches the fourth digit of the complete release number, such as 12.1.0.1.0 for the current release of Oracle Database.
  • The last two digits represent the Oracle Exadata System Software release.
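
The two numbering schemes can be told apart mechanically. The following Python sketch splits a release string into the fields described above. The function and type names are illustrative and are not part of any Oracle tool, and the sketch assumes that a first field of 18 or higher indicates the Year.Update.Revision scheme.

```python
from typing import NamedTuple, Union


class YearRelease(NamedTuple):
    """Release 18c and later: Year.Update.Revision."""
    year: int
    update: int
    revision: int


class LegacyRelease(NamedTuple):
    """Releases prior to 18c, such as 12.2.1.1.5."""
    database_release: str   # first two fields, for example "12.2"
    component_release: int  # third field
    exadata_release: str    # last two fields, for example "1.5"


def parse_release(text: str) -> Union[YearRelease, LegacyRelease]:
    fields = text.strip().split(".")
    if not all(field.isdigit() for field in fields):
        raise ValueError(f"not a dotted numeric release: {text!r}")
    # Assumption: the renumbering began with 18.1.0, so a first field of
    # 18 or higher is read as Year.Update.Revision.
    if int(fields[0]) >= 18:
        if len(fields) < 3:
            raise ValueError(f"expected Year.Update.Revision: {text!r}")
        return YearRelease(int(fields[0]), int(fields[1]), int(fields[2]))
    if len(fields) < 5:
        raise ValueError(f"expected five fields for a legacy release: {text!r}")
    return LegacyRelease(
        ".".join(fields[:2]), int(fields[2]), ".".join(fields[3:5])
    )
```

For example, parse_release("19.1.3") yields year 19, update 1, revision 3, while parse_release("12.2.1.1.5") reports database release 12.2 and Exadata release 1.5.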

4.2 Understanding Automated Cell Maintenance

The Management Server (MS) includes a file deletion policy based on the date.

When there is a shortage of space in the Automatic Diagnostic Repository (ADR) directory, then MS deletes the following files:

  • All files in the ADR base directory older than 7 days.
  • All files in the LOG_HOME directory older than 7 days.
  • All metric history files older than 7 days.

The retention period of seven days is the default. The retention period can be modified using the metricHistoryDays and diagHistoryDays attributes with the ALTER CELL command. The diagHistoryDays attribute controls the ADR files, and the metricHistoryDays attribute controls the other files.

If there is sufficient disk space, then trace files are not purged. This can result in files persisting in the ADR base directory past the time limit specified by diagHistoryDays.

In addition, the alert.log file is renamed if it is larger than 10 MB, and versions of the file that are older than 7 days are deleted if their total size is greater than 50 MB.

MS includes a file deletion policy that is triggered when file system utilization is high. Deletion of files in the / (root) directory and the /var/log/oracle directory is triggered when file utilization is 80 percent. Deletion of files in the /opt/oracle file system is triggered when file utilization reaches 90 percent, and the alert is cleared when utilization is below 85 percent. An alert is sent before the deletion begins. The alert includes the name of the directory, and space usage for the subdirectories. In particular, the deletion policy is as follows:

  • For the /var/log/oracle file system, files in the ADR base directory, metric history directory, and LOG_HOME directory are deleted using a policy based on the file modification time stamp.

    • Files older than the number of days set by the metricHistoryDays attribute value are deleted first.
    • Successive deletions occur for earlier files, down to files with modification time stamps older than or equal to 10 minutes, or until file system utilization is less than 75 percent.
    • The renamed alert.log files and ms-odl generation files that are over 5 MB, and older than the successively-shorter age intervals are also deleted.
    • Crash files in the /var/log/oracle/crashfiles directory that are more than one day old can be deleted. If the space pressure is not heavy, then the retention time for crash files is the same as for other files. If there are empty directories under /var/log/oracle/crashfiles, these directories are also deleted.
  • For the /opt/oracle file system, the deletion policy is similar to the preceding settings. However, the file threshold is 90 percent, and files are deleted until the file system utilization is less than 85 percent.

  • When file system utilization is full, the files controlled by the diagHistoryDays and metricHistoryDays attributes are purged using the same purging policy.

  • For the / file system, files in the home directories (cellmonitor and celladmin), /tmp, /var/crash, and /var/spool directories that are over 5 MB and older than one day are deleted.
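
The trigger and stop thresholds above form a simple hysteresis: deletion starts at one utilization level and continues until utilization drops below a lower one. The following Python sketch models only that decision for the two file systems whose stop thresholds are stated. It is an illustration of the policy as described here, not MS code, and the names are hypothetical.

```python
# (trigger, target) utilization percentages taken from the policy above.
THRESHOLDS = {
    "/var/log/oracle": (80, 75),
    "/opt/oracle": (90, 85),
}


def should_delete(file_system: str, utilization_pct: float,
                  deleting: bool = False) -> bool:
    """Return True if deletion should start or keep going.

    deleting=False: deletion has not started; compare with the trigger.
    deleting=True: deletion is under way; continue until utilization
    is less than the target.
    """
    trigger, target = THRESHOLDS[file_system]
    if deleting:
        return utilization_pct >= target
    return utilization_pct >= trigger
```

With these values, /var/log/oracle at 78 percent does not start a purge, but a purge that is already running at 78 percent continues until utilization falls below 75 percent.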

Every hour, MS deletes eligible alerts from the alert history using the following criteria. Alerts are considered eligible if they are stateless or they are stateful alerts which have been resolved.

  • If there are fewer than 500 alerts, then alerts older than 100 days are deleted.

  • If there are between 500 and 999 alerts, then the alerts older than 7 days are deleted.

  • If there are 1,000 or more alerts, then all eligible alerts are deleted every minute.
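
The alert criteria above reduce to an eligibility test and an age limit that depends on the number of alerts. The following Python sketch expresses them; the function names are hypothetical, and it assumes that the count refers to the total number of alerts in the alert history.

```python
from typing import Optional


def is_eligible(stateful: bool, resolved: bool) -> bool:
    """Stateless alerts, and stateful alerts that are resolved, are eligible."""
    return (not stateful) or resolved


def retention_days(alert_count: int) -> Optional[int]:
    """Age limit in days for eligible alerts.

    Returns None when there are 1,000 or more alerts, in which case
    all eligible alerts are deleted regardless of age.
    """
    if alert_count < 500:
        return 100
    if alert_count < 1000:
        return 7
    return None
```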

Note:

Any directories or files with SAVE in the name are not deleted.

4.3 Recommendations for Changing the Exadata Storage Server Network Address

Review the following advice before changing the fundamental configuration of a storage server, such as changing the IP address, host name, or RDMA Network Fabric address.

  • Before changing the storage server configuration, ensure that all Oracle Automatic Storage Management (Oracle ASM), Oracle Real Application Clusters (Oracle RAC) and database instances that use the storage servers do not access the storage server while you are changing the IP address.

  • After changing the storage server configuration, ensure that consumers of storage server services are correctly reconfigured to use the new connect information of the storage server. If Oracle Auto Service Request (ASR) is being used, then deactivate the asset from Oracle ASR Manager, and activate the asset with the new IP address.

  • When changing a storage server configuration, change only one storage server at a time to ensure that Oracle ASM and Oracle RAC work properly during the changes.

4.4 Using the ipconf Utility

The ipconf utility is used to set and change the following parameters on Oracle Exadata servers.

During initial configuration of Oracle Exadata Database Machine, the utility also configures the database servers.

  • IP address
  • Host name
  • NTP server
  • Time zone
  • DNS name servers
  • RDMA Network Fabric addresses

The ipconf utility makes a backup copy of the files it modifies. When the utility is rerun, it overwrites the existing backup file. The log file maintains the complete history of every ipconf operation performed.

Table 4-1 ipconf Options

Option Description

no option

Utility starts in main editing mode.

-check-consistency [-pkey-file pkey.conf]

Determines whether pkey is configured on the current host by examining cell.conf. If pkey is configured, this command checks the sub-interfaces and the ifcfg configuration files, compares them with pkey.conf, and reports any inconsistencies it finds.

-ignoremismatch

Starts utility in main editing mode when there is a mismatch between the stored cell configuration and the running configuration.

-ilom print

Prints basic ILOM settings.

-ilom set

Sets basic ILOM settings.

-pkey-add $PKEY.CONF -pkey-apply [-force]

Apply the pkey configuration to an unconfigured system or add a new pkey configuration to the current pkey configured host (without changing the current pkey settings). If the interface specified in the pkey file is already configured, the command exits with an error.

If you include the -force option, then the current pkey configuration for the already configured interface is deleted, and the configuration based on the new pkey.conf file is applied.

-pkey-delete $PKEY.CONF -pkey-apply

Deletes pkeys from a host that is currently configured with pkeys. $PKEY.CONF is a simple pkey file that includes only the pkey and physical devices in its entries. If the system is not already configured with pkeys, the command exits with an error.

-pkey-getruntime

Get the InfiniBand partitioning pkey configuration for the current system.

-pkey-matchruntime [-pkey-file pkey.conf]

Get the run-time pkey configuration, and match it with the file specified with the -pkey-file option. If a file is not specified, then /opt/oracle.cellos/pkey.conf is used as the default.

-preconf preconf.scv [-pkey-file pkey.conf] {-generate | -generateall | -verify}

Verifies the pkey.conf file and preconf.scv, then generates cell.conf, and finally cross-validates pkey.conf and cell.conf.

-semantic

Checks only the DNS and NTP configuration.

-semantic-min

Checks for access to at least one NTP and one DNS server.

-update [-dns dns_ip_list] [-ntp ntp_ip_list] [-ilom-dns ilom_dns_list] [-ilom-ntp ilom_ntp_list] [-force | -dry]

Update DNS and NTP servers for Linux and ILOM. For the DNS and NTP servers, specify a list of IP addresses separated by commas.

A maximum of three DNS servers are allowed with -ilom-dns. A maximum of two NTP servers are allowed with -ilom-ntp.

If the timestamp obtained from the new NTP server differs from the current time known to the system by more than 1 second (time step), then the command errors out and does not update the NTP settings. The -force flag on command line overrides this check.

The -dry option indicates the command should check all settings, but not apply them.

-verbose

The -verbose option shows all details. If -verbose is not used, then only errors are displayed.

-verify

Verifies the consistency between the stored cell configuration and the running configuration. Success returns zero errors.

You can use the following options with -verify: -pkey-file, -semantic, -semantic-min
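
When ipconf -update is driven from a script, the limits described for the option can be checked before the command is run. The following Python sketch only assembles the argument list from the options in the table. The function name is hypothetical, the default of adding -dry is a choice made for this illustration, and nothing is executed.

```python
from typing import List, Optional, Sequence


def ipconf_update_args(
    dns: Optional[Sequence[str]] = None,
    ntp: Optional[Sequence[str]] = None,
    ilom_dns: Optional[Sequence[str]] = None,
    ilom_ntp: Optional[Sequence[str]] = None,
    dry: bool = True,
    force: bool = False,
) -> List[str]:
    """Build the argument list for 'ipconf -update' without running it."""
    if dry and force:
        raise ValueError("-force and -dry are alternatives; choose one")
    if ilom_dns and len(ilom_dns) > 3:
        raise ValueError("at most three DNS servers are allowed with -ilom-dns")
    if ilom_ntp and len(ilom_ntp) > 2:
        raise ValueError("at most two NTP servers are allowed with -ilom-ntp")

    args = ["ipconf", "-update"]
    options = (("-dns", dns), ("-ntp", ntp),
               ("-ilom-dns", ilom_dns), ("-ilom-ntp", ilom_ntp))
    for flag, servers in options:
        if servers:
            # Server lists are passed as comma-separated IP addresses.
            args += [flag, ",".join(servers)]
    if force:
        args.append("-force")
    elif dry:
        args.append("-dry")
    return args
```

Keeping -dry as the default means a script checks the settings first, and applies them only when the caller passes dry=False explicitly.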

The following example shows the display for the ipconf utility when setting the Sun ILOM interface.

Example 4-1 Using the ipconf Utility to Set the Sun ILOM Interface

# ipconf
Logging started to /var/log/cellos/ipconf.log
Interface ib0 is Linked. hca: mlx4_0
Interface ib1 is Linked. hca: mlx4_0
Interface eth0 is Linked. driver/mac: igb/00:00:00:01:cd:01
Interface eth1 is ... Unlinked. driver/mac: igb/00:00:00:01:cd:02
Interface eth2 is ... Unlinked. driver/mac: igb/00:00:00:01:cd:03
Interface eth3 is ... Unlinked. driver/mac: igb/00:00:00:01:cd:04
 
Network interfaces
Name  State      IP address      Netmask         Gateway         Hostname       
ib0   Linked                                                                    
ib1   Linked                                                                    
eth0  Linked                                                                    
eth1  Unlinked                                                                  
eth2  Unlinked                                                                  
eth3  Unlinked                                                                  
Warning. Some network interface(s) are disconnected. Check cables and switches
         and retry
Do you want to retry (y/n) [y]: n
 
The current nameserver(s): 192.0.2.10 192.0.2.12 192.0.2.13
Do you want to change it (y/n) [n]: 
The current timezone: America/Los_Angeles
Do you want to change it (y/n) [n]: 
The current NTP server(s): 192.0.2.06 192.0.2.12 192.0.2.13
Do you want to change it (y/n) [n]: 
 
Network interfaces
Name  State   IP address      Netmask       Gateway        Hostname       
eth0  Linked  192.0.2.151  255.255.252.0 192.0.2.15  Management myg.example.com
eth1  Unlinked
eth2  Unlinked
eth3  Unlinked
bond0 ib0,ib1    192.168.13.101  255.255.252.0        Private myg.example.com 

Select interface name to configure or press Enter to continue: 
 
Select canonical hostname from the list below
1: myg.example.com
2: myg-private.example.com
Canonical fully qualified domain name [1]: 
 
Select default gateway interface from the list below
1: eth0
Default gateway interface [1]:

Canonical hostname: myg.example.com
Nameservers: 192.0.2.10 192.0.2.12 192.0.2.13
Timezone: America/Los_Angeles
NTP servers: 192.0.2.06 192.0.2.12 192.0.2.13
Network interfaces
Name  State  IP address      Netmask        Gateway        Hostname       
eth0  Linked 192.0.2.151 255.255.252.0  192.0.2.15  myg.example.com
eth1  Unlinked
eth2  Unlinked
eth3  Unlinked
bond0 ib0,ib1  192.168.13.101 255.255.252.0 Private  myg-priv.example.com
Is this correct (y/n) [y]: 
 
Do you want to configure basic ILOM settings (y/n) [y]: y
Loading configuration settings from ILOM ...
ILOM Fully qualified hostname [myg_ilom.example.com]:
ILOM IP discovery (static/dhcp) [static]:
ILOM IP address [192.0.2.201]: 
ILOM Netmask [255.255.252.0]: 
ILOM Gateway or none [192.0.2.15]: 
ILOM Nameserver or none: [192.0.2.10]:
ILOM Use NTP Servers (enabled/disabled) [enabled]: 
ILOM First NTP server. Fully qualified hostname or ip address or none [192.0.2.06]:
ILOM Second NTP server. Fully qualified hostname or ip address or none [none]:

Basic ILOM configuration settings:
Hostname             : myg.example.com
IP Discovery         : static
IP Address           : 192.0.2.10
Netmask              : 255.255.252.0
Gateway              : 192.0.2.15
DNS servers          : 192.0.2.10
Use NTP servers      : enabled
First NTP server     : 192.0.2.06
Second NTP server    : none
Timezone (read-only) : America/Los Angeles
 
Is this correct (y/n) [y]:

4.5 Oracle Exadata System Software Validation Tests and Utilities

You can use a variety of commands and utilities to validate the Oracle Exadata System Software and hardware configurations.

4.5.1 Summary of Software and Firmware Components on Oracle Exadata Storage Servers

The imageinfo command located in the /usr/local/bin/ directory provides a summary of release and status of the software and firmware components on Oracle Exadata Storage Servers.

The software and firmware components comprise the storage server image. The release and status information is required when working with My Oracle Support.

The following table lists the output fields from the imageinfo command.

Table 4-2 Description of imageinfo Command Output

Field Description

Active image activated

Date stamp in UTC format when the image on the cell was considered completed, either successfully or unsuccessfully. A cell patch updates the time stamp to indicate the time the cell was patched.

Active image status

Status of the cell image based on the success or failure of a set of self-tests and configuration actions, collectively known as validations. When this status is undefined, empty or failure, then examine the different validation logs in the /var/log/cellos directory to determine the cause for the status.

Active image version

Main release version of the overall cell image indicating a specific combination of releases of operating system, core Oracle Exadata System Software (the cell rpm), and the firmware levels for most key components of the cell. A cell patch usually updates this information. The first five separated fields of the version match the standard way Oracle product releases are identified. The last field is the exact build number of the release. It corresponds to YYMMDD format of the build date.

Active system partition on device

Cell operating system root (/) partition device. A typical successful cell patch switches the cell from its active partitions to inactive partitions. Each successful cell patch keeps the cell switching between the active and inactive partitions. There are few occasions when the cell patch does not switch partitions. These are rare, and are known as in-partition patches.

Boot area has rollback archive for version

For a cell patched using a non-in-partition cell patch, this indicates whether there is a suitable backup archive that can be used to roll the cell back to the inactive image version. Existence of this archive is necessary but not sufficient for rolling back to the inactive version of the cell image.

Cell boot usb partition

Oracle Exadata Storage Server boot and rescue USB partition.

Cell boot usb version

Version of the software on the boot USB. On a healthy cell this release must be identical to the value of the Active image version line.

Cell rpm version

Cell software version or cell rpm version as reported by the CellCLI utility.

Cell version

Release version as reported by the CellCLI utility.

In partition rollback

Some cell patches do not switch the partitions. These are in-partition patches. This field indicates whether there is enough information to roll back such a patch.

Inactive image activated

Time stamp for activation of the inactive image. This field is similar to active image activated field.

Inactive image status

Status of the inactive image. This field is similar to the status of the active image.

Inactive image version

Version of the cell before the most-recent patch was applied.

Inactive software partition on device

Oracle Exadata System Software file system partition, /opt/oracle, for the inactive image.

Inactive system partition on device

The root (/) file system partition for the inactive image.

Kernel version

Operating system kernel version of the cell.

Rollback to inactive partition

Summary indicator for a non-in-partition patched cell indicating whether rollback can be run on the cell to take it back to inactive version of the cell image. On a new cell, this field is empty or has the value undefined.

The following is an example of the output from the imageinfo command:

Kernel version: 2.6.18-194.3.1.0.3.el5 #1 SMP Tue Aug 31 22:41:13 EDT 2010 x86_64
Cell version: OSS_MAIN_LINUX.X64_101105
Cell rpm version: cell-11.2.2.1.1_LINUX.X64_101105-1
 
Active image version: 11.2.2.1.1.101105
Active image activated: 2010-11-06 21:52:08 -0700
Active image status: success
Active system partition on device: /dev/md5
Active software partition on device: /dev/md7
 
In partition rollback: Impossible
 
Cell boot usb partition: /dev/sdm1
Cell boot usb version: 11.2.2.1.1.101105
 
Inactive image version: 11.2.1.3.1
Inactive image activated: 2010-08-28 20:01:30 -0700
Inactive image status: success
Inactive system partition on device: /dev/md6
Inactive software partition on device: /dev/md8
 
Boot area has rollback archive for the version: 11.2.1.3.1
Rollback to the inactive partitions: Possible
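
Because imageinfo prints one "field: value" pair per line, its output is easy to check from a script. The following Python sketch parses captured output and applies the rule from Table 4-2 that the boot USB version must match the active image version on a healthy cell. The function names are hypothetical, and the sketch assumes that the output has already been captured as text.

```python
from typing import Dict


def parse_imageinfo(output: str) -> Dict[str, str]:
    """Map each imageinfo field name to its value."""
    fields: Dict[str, str] = {}
    for line in output.splitlines():
        # Split on the first colon only, because values such as
        # time stamps contain colons of their own.
        key, sep, value = line.partition(":")
        if sep and key.strip():
            fields[key.strip()] = value.strip()
    return fields


def usb_matches_active_image(fields: Dict[str, str]) -> bool:
    """True if the boot USB version equals the active image version."""
    active = fields.get("Active image version")
    return active is not None and fields.get("Cell boot usb version") == active
```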

4.5.2 Oracle Exadata Storage Server Image History

The imagehistory command lists the version history for Oracle Exadata Storage Server.

For example, if a storage server was updated from release 11.2.1.2.3 to release 11.2.1.2.6, and then updated to release 11.2.1.3.1, the imagehistory command displays this history. The following is an example of the output:

# imagehistory
Version                  : 11.2.1.2.3
Image activation date    : 2012-12-03 06:06:46 -0700
Imaging mode             : fresh
Imaging status           : success
 
Version                  : 11.2.3.2.0.120713
Image activation date    : 2012-12-12 17:56:31 -0700
Imaging mode             : out of partition upgrade
Imaging status           : success 
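
The imagehistory output is a sequence of records, each beginning with a Version line. As a minimal sketch, assuming the output has been captured as text in the layout shown above, the following Python function (a hypothetical helper, not an Oracle tool) groups the lines into one dictionary per image.

```python
from typing import Dict, List


def parse_imagehistory(output: str) -> List[Dict[str, str]]:
    """Return one dictionary per image, in the order they are listed."""
    records: List[Dict[str, str]] = []
    for line in output.splitlines():
        key, sep, value = line.partition(":")
        key = key.strip()
        if not sep or not key:
            continue  # skip blank lines and the command line itself
        if key == "Version":
            records.append({})  # each Version line starts a new record
        if records:
            records[-1][key] = value.strip()
    return records
```

The last element of the returned list corresponds to the last record printed.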

4.5.3 Validation of the State and Health of the System

The validation framework runs different tests under certain conditions, such as on first boot after recovery of an Oracle Exadata Storage Server using the rescue and recovery functionality of the CELLBOOT USB flash drive, or when patching an Oracle Exadata Storage Server.

Validation framework is a set of validation tests that run at boot time at the rc.local level. The logs for the tests are available in the /var/log/cellos/validations directory.

Health check validations are a set of quick checks that run on each boot, such as checks of the basic health of the disks, and report the status. If a validation fails, then examine the log file for the cause, because the failure may indicate a potential problem that requires attention.

Automatic patch rollback occurs if one or more validation checks fail after patch application. Refer to the documentation for the specific patch.

Check for any failures reported in the /var/log/cellos/vldrun.first_boot.log file after the first boot configuration. For all subsequent boots, the /var/log/cellos/validations.log file contains information about failed validations. For each failed validation, perform the following procedure:

  1. Look for /var/log/cellos/validations/failed_validation_name.SuggestedRemedy file. The file exists only if the validation process has identified some corrective action. Follow the suggestions in the file to correct the cause of the failure.

  2. If the SuggestedRemedy file does not exist, then examine the log file for the failed validation in /var/log/cellos/validations to track down the cause, and correct it as needed.
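
The first step of this procedure can be scripted. The following Python sketch looks for the SuggestedRemedy file of a failed validation and returns nothing when the file does not exist, in which case the validation log must be examined by hand. The function name is hypothetical, and the validation name is whatever name appears in the failure report.

```python
from pathlib import Path
from typing import Optional, Union

VALIDATIONS_DIR = Path("/var/log/cellos/validations")


def suggested_remedy(
    validation_name: str,
    validations_dir: Union[str, Path] = VALIDATIONS_DIR,
) -> Optional[Path]:
    """Return the SuggestedRemedy file for a failed validation, if any.

    The file exists only if the validation process identified a
    corrective action, so None means "go to step 2".
    """
    candidate = Path(validations_dir) / f"{validation_name}.SuggestedRemedy"
    return candidate if candidate.is_file() else None
```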

4.6 Locating Serial Numbers for System Components

You may need to provide the serial numbers for the system components when contacting Oracle Support Services.

Serial numbers for system components can be determined by using the following procedure:

  1. Log in as the root user.
  2. Use the CheckHWnFWProfile command to view the serial numbers.
    # /opt/oracle.SupportTools/CheckHWnFWProfile -action list -mode serial_numbers
    

Each time the system is booted, the serial numbers are written to the /var/log/cellos/validations/SerialNumbers file. This file can be used as a historic record of the serial numbers. The file also contains configuration information for some components.

4.7 Diagnostic and Repair Utilities

Oracle Exadata System Software includes utilities for diagnostics and repair of Oracle Exadata Storage Server.

The utilities help diagnose and repair problems that may occur during the normal life cycle of Oracle Exadata Storage Servers. The utilities are in the /opt/oracle.SupportTools directory.

Note:

All utilities must be run as the root user from the /opt/oracle.SupportTools directory.

4.7.1 The CheckHWnFWProfile Utility

The CheckHWnFWProfile utility checks that the system meets the required hardware and firmware specifications, and reports any mismatches.

Table 4-3 CheckHWnFWProfile Utility Commands

Command Description

./CheckHWnFWProfile

When run without options, the utility checks the existing hardware and firmware components against the expected values.

./CheckHWnFWProfile -action list

View the existing hardware and firmware versions on the system.

./CheckHWnFWProfile -action alter_config -property HWFW_Checker_Updater_Status -value Disabled

Disable the CheckHWnFWProfile utility.

./CheckHWnFWProfile -action alter_config -property HWFW_Checker_Updater_Status -value Enabled

Enable the CheckHWnFWProfile utility.

./CheckHWnFWProfile -action check -component list_of_components

Check specified components against the expected values.

./CheckHWnFWProfile -action list -component list_of_components

View the hardware and firmware versions of specified components on the system.

./CheckHWnFWProfile -action list -mode serial_numbers

List serial numbers. The list includes the following serial numbers:

  • System
  • Disk controller
  • Each disk
  • RDMA Network Fabric host channel adapter (HCA)

Depending on the system, serial numbers for all the memory (RAM) modules may be included.

./CheckHWnFWProfile -action list -mode supported_info

View the expected hardware and firmware.

./CheckHWnFWProfile -h

./CheckHWnFWProfile --help

View help and utility usage.

4.7.2 The Diagnostic ISO File

Use the diagnostic ISO file (diagnostics.iso) to diagnose and recover from serious problems.

You may need to boot a server using the diagnostic ISO file to diagnose and recover from serious problems, for example, when the system is inaccessible because of system damage or damage to the CELLBOOT USB flash drive.

Use this facility only as directed to perform specific documented maintenance tasks, or under the guidance of Oracle Support Services.

4.7.2.1 Booting a Server using the Diagnostic ISO File

Use this procedure to boot an Oracle Exadata Storage Server or Oracle Exadata Database Server using the diagnostic ISO file (diagnostics.iso).

Note:

For information on booting an Oracle Linux KVM guest using the diagnostic ISO file, see Starting a Guest using the Diagnostic ISO File.

  1. Download the diagnostic ISO file (diagnostics.iso) corresponding to your current Oracle Exadata System Software release.

    If required, use the imageinfo command on a running server to determine your current Oracle Exadata System Software release.

    Download the diagnostic ISO file to a client workstation or server that you will use to access the ILOM interface on the server that you want to boot in diagnostics mode. The client machine must have a Web browser and network access to the ILOM interface on the target server.

    If you are using Oracle Exadata System Software Release 18.x, or earlier, you can find the diagnostic ISO file at /opt/oracle.SupportTools/diagnostics.iso on your Oracle Exadata Storage Servers or Oracle Exadata Database Servers. Otherwise, search the My Oracle Support (MOS) patch repository using "exadata diagnostic iso" as the search term.

    You can also locate the diagnostic ISO file in the Supplemental README that is associated with your Oracle Exadata System Software release. The Supplemental README for each Oracle Exadata System Software release is documented in My Oracle Support document 888828.1.

  2. Start the ILOM Web client.

    Start a Web browser on your client machine and navigate to http://ILOM_SP_IPaddress. In the URL, ILOM_SP_IPaddress is the IP address of the ILOM service processor (SP) on the target server.

    If you do not know the IP address of the ILOM SP, you can make a serial connection to the ILOM SP and list the network properties by using the show /network and show /networkipv6 commands. For more details, see Connect to Oracle ILOM.

    The Oracle ILOM Web client login dialog appears.
  3. Log in to the Oracle ILOM Web client using the root account and the password.
  4. Use the ILOM Web interface to attach the diagnostics.iso file as a virtual CDROM device.
    1. In the ILOM Web interface navigation hierarchy, click Remote Control, and then click Redirection.
      Depending on the version of the ILOM Web interface that you are using, Redirection may appear in the left hand navigation pane or in a strip of options running along the top of the page.
    2. In the Redirection pane, click Use Video Redirection, then Launch Remote Console.
      In some versions of the ILOM Web interface, the Use Video Redirection button does not exist. In that case, go directly to Launch Remote Console.
    3. In the Remote Console window, choose the Storage menu option in the KVMS menu.
    4. In the Storage Devices dialog, click Add, and in the resulting dialog, select the diagnostics.iso file.

      The following screen image shows an example of what you may see.


      (Illustration: ilom-web-interface-add-storage-device-dialog.jpg)
    5. Back in the Storage Devices dialog, select the entry associated with the diagnostics.iso file and click Connect.

      After the connection is established, the label on the Connect button in the Storage Devices dialog changes to Disconnect.

      Keep in mind the navigation path to the Storage Devices dialog so that you can easily disconnect the diagnostics.iso file after you are done with it.

  5. Configure the server to boot from the diagnostics.iso file by using the newly configured virtual CDROM device.
    1. In the ILOM Web interface navigation hierarchy, click Host Management, and then click Host Control.
      Depending on the version of the ILOM Web interface that you are using, Host Control may appear in the left hand navigation pane or in a strip of options running along the top of the page.
    2. From the resulting list of values, select CDROM.
    3. Click Save.
    Now, when the system is booted, the diagnostics.iso file is used as the default boot image.
  6. Back in the Remote Console window, restart the system by using the following command:
    ->  Start /SP/console -script
    The system now reboots using the diagnostics.iso file as the default boot image.

4.7.3 The ibdiagtools Utilities

The most useful of the ibdiagtools utilities are verify-topology, checkbadlinks.pl, and infinicheck.

The verify-topology utility checks the correctness and health of InfiniBand Network Fabric connections. For example, it can determine if both cables from the server go to the same switch in the Oracle Exadata Rack. When both cables go to the same switch, the system loses the ability to fail over to another switch if the first switch fails.

The checkbadlinks.pl utility reports the links that are operating at 2.5 Gbps. This is usually an indication that the cables are loose, and need to be reseated.

The infinicheck utility reports the base RDMA Network Fabric performance between servers in Oracle Exadata, such as expected minimum throughput between the database server and storage server, between storage servers, and between a database server and another database server. This utility can help to identify issues in the RDMA Network Fabric. Because the utility runs stress tests on the RDMA Network Fabric, Oracle recommends using the utility when the system is idle and with all cell services shut down. This utility works on both the InfiniBand Network Fabric and RoCE Network Fabric.

See Also:

  • For detailed information about the ibdiagtools utilities, refer to the README.txt file in the /opt/oracle.SupportTools/ibdiagtools/ directory.
  • Sample outputs from each utility are included in the /opt/oracle.SupportTools/ibdiagtools/SampleOutputs.txt file.

4.7.4 The make_cellboot_usb Utility

The make_cellboot_usb utility allows you to rebuild a damaged CELLBOOT USB flash drive.

Do not connect more than one USB flash drive to the system when running this utility. The utility builds on the first discovered USB flash drive on the system.

Note:

This utility can only be used on Oracle Exadata Storage Server.
  • To see what is done before rebuilding the USB flash drive:

    cd /opt/oracle.SupportTools
    ./make_cellboot_usb -verbose
    
  • To rebuild the USB flash drive, run the command with one of the following options: -execute, -force, or -rebuild.

    ./make_cellboot_usb -execute
    

    Or:

    ./make_cellboot_usb -force
    

    Or:

    ./make_cellboot_usb -rebuild
    

4.8 System Diagnostics Data Gathering with sosreports and Oracle ExaWatcher

You can use the sosreport utility and Oracle ExaWatcher to diagnose problems with your system.

Every time a server is started, system-wide configuration information is collected by the sosreport utility, and stored in the /var/log/cellos/sosreports directory. You can generate a new sosreport by running the following command as the root user. The script starts collecting the information 30 minutes after entering the command.

/opt/oracle.cellos/vldrun -script sosreport
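
Because collection starts only after the delay, you may want to check later which report is the newest. The following is a minimal sketch that prints the most recently modified entry in the sosreport directory named above. The function name latest_sosreport is hypothetical, and the sketch makes no assumption about how the report files are named.

```shell
#!/bin/sh
# Print the most recently modified entry in a sosreport directory.
# The default path is the directory named in the text above.
latest_sosreport() {
    dir="${1:-/var/log/cellos/sosreports}"
    # ls -t sorts by modification time, newest first; head keeps the first entry.
    ls -t "$dir" 2>/dev/null | head -n 1
}

# Possible use, as the root user:
#   latest_sosreport
```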

In addition, the /opt/oracle.ExaWatcher directory contains the Oracle ExaWatcher system data gathering and reporting utilities. Gathered data is stored in archive subdirectories. The following table describes the data gathered at different intervals by the utility:

Table 4-4 Oracle ExaWatcher Collector Names and Descriptions

Collector Name Description

CellSrvStat

Cell server status.

Diskinfo

I/O statistics of the disk, such as successfully completed reads, merged reads, time spent reading, and so on.

FlashSpace

RAW value of the flash card space.

Minimum interval limit is 300 seconds.

IBCardInfo

(Currently not available for X8M systems)

RDMA Network Fabric card information, and status of InfiniBand Network Fabric ports.

Minimum interval is 300 seconds.

IBprocs

Commands that check the RDMA Network Fabric card status.

Minimum interval is 600 seconds.

Iostat

CPU statistics, and I/O statistics for devices and partitions.

Lsof

Files opened by current processes.

Minimum interval limit is 120 seconds.

MegaRaidFW

MegaRaid firmware information, such as battery information.

Minimum interval is 86400 seconds.

Meminfo

Memory management by the kernel.

Mpstat

Microprocessor statistics.

Netstat

Current network connection statistics.

Ps

Active processes statistics.

RDSinfo

Availability of cell servers.

Interval limit is 30 seconds.

Slabinfo

Caches for frequently-used objects in the kernel.

Top

Dynamic, real-time view of the system.

Vmstat

Virtual memory status.
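
The minimum intervals in Table 4-4 can be expressed as a small lookup. The following is a sketch for illustration only. It does not describe how Oracle ExaWatcher itself handles an interval that is too short, which the table does not state. It assumes that the "Interval limit" listed for RDSinfo is a minimum, and it treats collectors without a listed limit as having a minimum of 0. The function names are hypothetical.

```shell
#!/bin/sh
# Minimum sampling interval, in seconds, for collectors that list one in Table 4-4.
# Collectors without a listed limit return 0 (an assumption of this sketch).
min_interval() {
    case "$1" in
        FlashSpace|IBCardInfo) echo 300 ;;
        IBprocs)               echo 600 ;;
        Lsof)                  echo 120 ;;
        MegaRaidFW)            echo 86400 ;;
        RDSinfo)               echo 30 ;;   # listed as "Interval limit"; treated as a minimum
        *)                     echo 0 ;;
    esac
}

# Print the larger of the requested interval and the collector's minimum.
effective_interval() {
    min=$(min_interval "$1")
    if [ "$2" -lt "$min" ]; then echo "$min"; else echo "$2"; fi
}

effective_interval Lsof 5      # prints 120
effective_interval Vmstat 5    # prints 5
```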

To use Oracle ExaWatcher, do the following:

  1. As the root user, start the Oracle ExaWatcher processes and service.

    # systemctl start ExaWatcher
  2. Run the Oracle ExaWatcher utility as the root user.

    /opt/oracle.ExaWatcher/ExaWatcher.sh [options]
    

The following options are available for use with the Oracle ExaWatcher utility:

Option Description

No options specified

The utility runs using the default options.

-c | --command 'collector_name ;; "default_command; ... " '

Changes the core command that is run for the current group. Only the following core commands can be changed:

CellSrvStat

Iostat

Mpstat

Netstat

Ps

Top

Vmstat

Example: --command 'Vmstat;; "vmstat -a"'

--createconf "config_file_to_create" | null

The utility parses all command line inputs, validates them, and creates a configuration file. If the file path and name is not specified, then the utility overwrites the default configuration file.

-d | --disable "collector_name"

The name of the collector to be disabled on the utility.

Example: --disable "Vmstat"

-e | --end "end_time"

The ending time for the current group. The default value is 10 years from the current time.

Example: --end "11/06/2013 12:01:00"

--fromconf "configuration_file" | null

The configuration file to use with the Oracle ExaWatcher utility. The default configuration file on Oracle Linux is /opt/oracle.ExaWatcher/ExaWatcher.conf.

-g | --group

Starts a new group for gathering data. Other options can be specified with the group option.

-h | --help

Displays help information.

-i | --interval "interval_length"

The sampling interval for the current group, in seconds. The default value is 5 seconds.

Certain collection modules cannot be run every second because the modules consume resources.

Example: --interval 10

-l | --spacelimit

Sets the limit for the amount of storage space used by the utility. The limit is specified in MB. On database servers, the default is 6 GB. On storage servers, the default is 600 MB.

Example: --spacelimit 900

--lastconf

The most-recent configuration file used with the utility.

Data is not collected when using this option.

--listcmd "Full"|"Nameonly"|"Core"|"CMD"|"Enabled"|null

The information about the command inputs. The following are the options:

Full displays all the information about the commands and samplers.

Nameonly displays the name of each command and whether it is enabled.

Core displays only the core sampler information.

CMD displays the name of each command, whether it is enabled, and its default commands.

-m | --commandmode {"ALL" | "CORE" | "SELECTED"}

The type of collection modules to run for the current group. The following are the options:

ALL runs all collection modules.

CORE runs only the core collection modules.

SELECTED runs only the specified collection modules.

The default value is ALL.

Example: --commandmode "CORE"

-o | --count "archiving_count"

The archive count of the current group. The default value is 720.

Example: --count 500

-r | --resultdir "result_directory"

The directory path to store the results of the data collection.

Example: --resultdir "/opt/oracle.ExaWatcher/archive"

--stop

Stops the utility and all its processes, and then compresses the data files.

-t | --start "start_time"

The starting time for the current group. The default is 20 seconds from the current time.

Example: --start "11/05/2013 12:00:00"

-u | --customcmd 'sample_name ;; "custom_command;... " '

Includes a custom collection module in the current group.

Example: --customcmd 'Lsl;; "/bin/ls -l"'

-z | --zip "bzip2" | "gzip"

The compression program to use on the collected data. The default program is bzip2.

Example: --zip "gzip"
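
Several of the options above can be combined for one collection group. The following is a minimal sketch that prints a command line and does not run it. The function name exawatcher_cmd is hypothetical, and the particular combination of --group, --interval, --count, and --commandmode is illustrative only. Use --help to confirm the options available in your release.

```shell
#!/bin/sh
# Path to the utility, as given in the text above.
EXAWATCHER=/opt/oracle.ExaWatcher/ExaWatcher.sh

# Print (without running) an ExaWatcher command line for one collection group.
# Arguments: interval in seconds, archive count, command mode (ALL|CORE|SELECTED).
exawatcher_cmd() {
    case "$3" in
        ALL|CORE|SELECTED) ;;
        *) echo "invalid command mode: $3" >&2; return 1 ;;
    esac
    printf '%s --group --interval %s --count %s --commandmode "%s"\n' \
        "$EXAWATCHER" "$1" "$2" "$3"
}

exawatcher_cmd 10 500 CORE
# prints: /opt/oracle.ExaWatcher/ExaWatcher.sh --group --interval 10 --count 500 --commandmode "CORE"
```

Printing the command first makes it possible to review the options before running the utility as the root user.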

4.9 Host Console Support

The storage servers and database servers of Oracle Exadata are configured to provide host console access.

The host console is useful when collecting Oracle Linux kernel traces or creating crash dump files to help diagnose severe malfunctions.

To access the host console, perform the following procedure:

  1. Connect to the Integrated Lights Out Manager (ILOM) using SSH and log in as an ILOM administrator.

  2. Run the start /SP/console command.

    To stop using the console, use the stop /SP/console command.

4.10 Oracle Linux Kernel Crash Core Files

The storage servers and database servers of Oracle Exadata are configured to generate Oracle Linux kernel crash core files in the /var/crash directory, when the Oracle Linux operating system malfunctions or crashes.

The crash utility can be used to analyze the crash files. The crash files are automatically removed by the ExaWatcher utility so that the files do not occupy more than 10 percent of the free disk space on the file system. Older crash files are removed first.
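
Before running the crash utility, you need to locate the dump to analyze. The following is a minimal sketch that lists crash dump files under /var/crash, newest first. It assumes the dumps are regular files whose names begin with vmcore, which is the usual kdump naming but is not stated in this section. The function name list_vmcores is hypothetical.

```shell
#!/bin/sh
# List crash dump files under a crash directory, newest first.
# Assumes dump file names begin with "vmcore"; adjust the pattern
# if your system names them differently.
list_vmcores() {
    dir="${1:-/var/crash}"
    # ls -t orders the matching paths by modification time, newest first.
    find "$dir" -type f -name 'vmcore*' -exec ls -t {} + 2>/dev/null
}

# Possible use, as the root user:
#   list_vmcores | head -n 1
```

The crash utility is typically started with a kernel image that contains debugging information, followed by the dump file. The location of that kernel image depends on the installed debuginfo packages.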

4.11 Monitoring syslog Messages Remotely

By default, storage server syslog messages are written to local log files.

A separate management server, known as a loghost server, can receive syslog messages from Oracle Exadata database servers and storage servers.

  • To monitor the syslog messages remotely, configure the syslog service on the loghost server to listen for incoming syslog messages by adding the -r option to SYSLOGD_OPTIONS in the /etc/sysconfig/syslog file on the loghost server.

  • Use the ALTER CELL or ALTER DBSERVER command to configure each server to forward specified syslog messages to the loghost server by setting the syslogconf attribute.

    The server configuration is maintained across restarts and updates.

  • Use the ALTER CELL VALIDATE SYSLOGCONF or ALTER DBSERVER VALIDATE SYSLOGCONF command to test message transmission from the Exadata servers to the loghost server.
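
The forwarding step above can be sketched as a helper that prints an ALTER CELL statement for the syslogconf attribute. This is an illustration under assumptions. The host name loghost and the selectors are placeholders. The value format, a parenthesized list of 'selector @host' rules, is not given in this section, so confirm it against the ALTER CELL reference for your release. The function name syslogconf_stmt is hypothetical.

```shell
#!/bin/sh
# Print a CellCLI statement that sets the syslogconf attribute so that
# messages matching each selector are forwarded to the loghost server.
# Arguments: loghost name, then one or more syslog selectors (facility.priority).
syslogconf_stmt() {
    loghost="$1"; shift
    rules=""
    for sel in "$@"; do
        # Append each rule, comma-separated, as 'selector @loghost'.
        rules="${rules:+$rules, }'$sel @$loghost'"
    done
    printf "ALTER CELL syslogconf=(%s)\n" "$rules"
}

syslogconf_stmt loghost 'authpriv.*' 'security.*'
# prints: ALTER CELL syslogconf=('authpriv.* @loghost', 'security.* @loghost')
```

On a storage server, the printed statement could then be passed to CellCLI, for example with cellcli -e. On a database server, the equivalent statement uses ALTER DBSERVER.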

Starting with Oracle Exadata System Software release 21.2.0, you can also configure each server to forward syslog messages from the Integrated Lights Out Manager (ILOM) service processor (SP). To configure this facility, use the ALTER CELL or ALTER DBSERVER command to set the ilomSyslogClients attribute.

Related Topics