2.7 Understanding Oracle EXAchk specifics for Oracle Exadata and Zero Data Loss Recovery Appliance

Understand the features and learn to perform tasks specific to Oracle EXAchk on Oracle Exadata and Zero Data Loss Recovery Appliance.

2.7.1 Installation Requirements for Running Oracle EXAchk on Oracle Exadata and Zero Data Loss Recovery Appliance

Understand the requirements for installing Oracle EXAchk on Oracle Exadata and Zero Data Loss Recovery Appliance, either on your local database or on a remote device that is connected to a database.

Note:

For more information about installing and upgrading Oracle Autonomous Health Framework, see Installing and Upgrading Oracle Autonomous Health Framework.

2.7.2 Using Oracle EXAchk on Oracle Exadata and Zero Data Loss Recovery Appliance

Usage of Oracle EXAchk on Oracle Exadata and Zero Data Loss Recovery Appliance depends on other considerations such as virtualization, parallel run, and so on.

2.7.2.1 Database Default Access on the Client Interface

If you use the client interface as the default access for your database, then use the -clusternodes command-line option to instruct Oracle EXAchk to communicate over the management interface.

For example, if a cluster is configured as follows, then the command must include:
-clusternodes dbadm01,dbadm02,dbadm03,dbadm04

Note:

When using the -clusternodes option, start Oracle EXAchk on the first database in the list.

Table 2-6 Example Cluster Configuration

Interface     Database Host Names
Management    dbadm01, dbadm02, dbadm03, dbadm04
Client        dbclnt01, dbclnt02, dbclnt03, dbclnt04
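
As a minimal sketch of the example above, the following shell snippet joins the management host names from Table 2-6 into the comma-separated value for -clusternodes and prints the resulting command line instead of running it. The helper names join_csv and build_clusternodes_cmd are hypothetical and are not part of Oracle EXAchk.

```shell
# Management-interface host names from Table 2-6.
mgmt_hosts="dbadm01 dbadm02 dbadm03 dbadm04"

# join_csv (hypothetical helper): joins its arguments with commas.
# The body runs in a subshell so the IFS change does not leak out.
join_csv() (
  IFS=,
  printf '%s\n' "$*"
)

# build_clusternodes_cmd (hypothetical helper): prints the command line
# for the given host names; it does not run exachk.
build_clusternodes_cmd() {
  printf 'exachk -clusternodes %s\n' "$(join_csv "$@")"
}

# Unquoted on purpose so the list splits into one argument per host.
build_clusternodes_cmd $mgmt_hosts
```

With the host names shown, the snippet prints exachk -clusternodes dbadm01,dbadm02,dbadm03,dbadm04. Start the printed command on dbadm01, the first host in the list.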

2.7.2.2 Virtualization Considerations

Oracle EXAchk supports virtualization on Oracle Exadata and Zero Data Loss Recovery Appliance.

To run hardware and operating system level checks for database, storage servers, InfiniBand fabric, and InfiniBand switches:

  • Install Oracle EXAchk into the management domain, also referred to as DOM0

  • Run Oracle EXAchk as root

When you run Oracle EXAchk from DOM0, Oracle EXAchk:

  • Discovers all compute nodes, storage servers, and InfiniBand switches in the entire InfiniBand fabric

  • Runs on all those components

To run Oracle EXAchk on a subset of nodes when Oracle EXAchk is run in the management domain, use the command-line options:

  • -clusternodes to designate databases

  • -cells to designate storage servers

  • -ibswitches to designate InfiniBand switches

For example, for a full rack where only the first quarter rack is configured for virtualization, but all components are on the same InfiniBand fabric, run the following command as root on the dom0 database node, randomadm01:
exachk -clusternodes randomadm01,randomadm02 \
       -cells randomceladm01,randomceladm02,randomceladm03 \
       -ibswitches randomsw-ibs0,randomsw-iba0,randomsw-ibb0
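
The following sketch shows one way to assemble such a subset command from three comma-separated lists. It assumes that each of the three options can be left out when its list is empty; the function name build_subset_cmd is hypothetical, and the snippet only prints the command line.

```shell
# build_subset_cmd <db-nodes> <cells> <ib-switches>  (hypothetical helper)
# Each argument is a comma-separated list; pass "" to omit that option.
build_subset_cmd() {
  cmd="exachk"
  if [ -n "$1" ]; then cmd="$cmd -clusternodes $1"; fi
  if [ -n "$2" ]; then cmd="$cmd -cells $2"; fi
  if [ -n "$3" ]; then cmd="$cmd -ibswitches $3"; fi
  printf '%s\n' "$cmd"
}

# Same component names as in the example above; prints the command only.
build_subset_cmd \
  "randomadm01,randomadm02" \
  "randomceladm01,randomceladm02,randomceladm03" \
  "randomsw-ibs0,randomsw-iba0,randomsw-ibb0"
```

Printing the command first makes it easy to review the component lists before running it as root in the management domain.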

In addition to running Oracle EXAchk in the management domain (DOM0), run it separately for each cluster in the user domains, also referred to as DOMUs. Within a DOMU, you do not need the preceding options because Oracle EXAchk automatically discovers the nodes in the cluster.

For example, consider 2 clusters with 4 user domains in each cluster. Although there are 8 user domains in total, Oracle EXAchk runs only twice: once on the first node of the first cluster, running in the first user domain, and once on the first node of the second cluster, running in the second user domain. The user domain runs do not include hardware or operating system level checks on the database, storage servers, or InfiniBand switches.

Note:

Run Oracle EXAchk as root in the management domain and the user domains.
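
To illustrate the "one run per cluster" rule for user domains, the following sketch lists where to start Oracle EXAchk for the two-cluster example. The host names and the helper names first_node and plan_domu_runs are hypothetical; the snippet prints a plan and does not connect to any host.

```shell
# Hypothetical user-domain host names: 2 clusters, 4 user domains each.
cluster1="c1domu01 c1domu02 c1domu03 c1domu04"
cluster2="c2domu01 c2domu02 c2domu03 c2domu04"

# first_node (hypothetical helper): prints the first name in a
# space-separated node list.
first_node() {
  set -- $1
  printf '%s\n' "$1"
}

# plan_domu_runs (hypothetical helper): prints one line per cluster,
# naming the single node on which to start the run.
plan_domu_runs() {
  for nodes in "$@"; do
    printf 'run exachk as root on %s\n' "$(first_node "$nodes")"
  done
}

plan_domu_runs "$cluster1" "$cluster2"
```

Eight user domains produce only two planned runs, one per cluster, in addition to the separate run in the management domain.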

2.7.2.3 Running Serial Data Collection

By default, Oracle EXAchk runs parallel data collection for the storage servers, InfiniBand switches, and Oracle Databases.

You can also configure Oracle EXAchk to run serial data collection.

To run serial data collection for the storage server, database, and InfiniBand switches, set the following environment variables:

  • RAT_COMPUTE_RUNMODE

  • RAT_CELL_RUNMODE

  • RAT_IBSWITCH_RUNMODE

  1. To collect database server data in serial:
    export RAT_COMPUTE_RUNMODE=serial
  2. To collect storage server data in serial:
    export RAT_CELL_RUNMODE=serial
  3. To collect InfiniBand switch data in serial:
    export RAT_IBSWITCH_RUNMODE=serial
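
If you want serial collection for all three component types, the exports can be combined in a loop, as in this minimal sketch. It uses only the variable names listed above and then prints them so you can confirm what a later exachk run in the same shell would inherit.

```shell
# Set all three run-mode variables named above to serial.
for var in RAT_COMPUTE_RUNMODE RAT_CELL_RUNMODE RAT_IBSWITCH_RUNMODE; do
  export "$var=serial"
done

# List the run-mode settings exported to child processes.
env | grep '^RAT_.*_RUNMODE=' | sort
```

The variables apply only to the current shell session and the processes it starts, so run Oracle EXAchk from the same session.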

2.7.2.4 Using the root User ID in Asymmetric and Role Separated Environments

Run Oracle EXAchk as root to simplify the work required in asymmetric or role separated environments.

If database homes are not symmetric, then install Oracle EXAchk on multiple databases in the cluster, such that there is one installation for each Oracle Database home located on a subset of databases.

For this example, assume the following configuration in the same cluster:

Table 2-7 Using root User ID in Asymmetric and Role Separated Environments

Owner User ID   Oracle Database Home   Installed on                                      Databases
user1           /path1/dbhome_1        db01, db02, db03, db04                            dbm-a
user2           /path2/dbhome_2        db05, db06, db07, db08                            dbm-b, dbm-c
grid            /path3/grid            db01, db02, db03, db04, db05, db06, db07, db08    +ASM

Further, there is role separation among user1, user2, and grid, such that none of them can access the database structure of the others. You can also enforce company policy to isolate the system administrators from the database administrators.

Do the following:

  1. As root, install Oracle EXAchk in the /tmp/exachk/121026 directory on db01.

  2. As root, install Oracle EXAchk in the /tmp/exachk/121026 directory on db05.

  3. As root, on db01:
    cd /tmp/exachk/121026
    exachk -clusternodes db01,db02,db03,db04

    Choose dbm-a from the database selection list to collect the database checks for dbm-a.

  4. As root, on db05:
    cd /tmp/exachk/121026
    exachk -excludeprofiles storage,switch -clusternodes db05,db06,db07,db08

    Choose dbm-b and dbm-c from the Oracle Database selection list to collect the database checks for dbm-b and dbm-c.

  5. If desired, use the -merge command-line option to merge the reports.
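
Because each run covers only a subset of the cluster, it can help to confirm that the -clusternodes lists together cover every node. The following sketch checks this for the node names in Table 2-7. The helper name uncovered_nodes is hypothetical, and the check is an illustration, not something Oracle EXAchk does for you.

```shell
# Node lists taken from Table 2-7 and from steps 3 and 4.
grid_nodes="db01 db02 db03 db04 db05 db06 db07 db08"
run1_nodes="db01 db02 db03 db04"
run2_nodes="db05 db06 db07 db08"

# uncovered_nodes <all-nodes> <covered-node>...  (hypothetical helper)
# Prints every node from the first argument that is not among the rest.
uncovered_nodes() {
  all=$1
  shift
  for n in $all; do
    case " $* " in
      *" $n "*) ;;                    # node appears in some run
      *) printf '%s\n' "$n" ;;        # node appears in no run
    esac
  done
}

# Unquoted on purpose so each node becomes a separate argument.
missing=$(uncovered_nodes "$grid_nodes" $run1_nodes $run2_nodes)
if [ -z "$missing" ]; then
  echo "all cluster nodes are covered"
else
  echo "not covered by any run: $missing"
fi
```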

2.7.2.5 Environment Variables for Specifying a Different User Than root

Review the list of environment variables for specifying a different user than root.

  • RAT_CELL_SSH_USER

    By default, Oracle EXAchk runs checks on an Oracle Exadata Storage Server as root.

    If security policies do not permit connection to a storage server as root over SSH, then you can specify a different user by setting this environment variable:
    export RAT_CELL_SSH_USER=celladmin

    Note:

    If you specify RAT_CELL_SSH_USER, then a subset of checks is run, based upon the privileges of the alternate user you specify.

  • RAT_IBSWITCH_USER

    By default, when you run Oracle EXAchk as root on a database, it runs checks on the InfiniBand switches as root. When you run Oracle EXAchk as a user other than root on a database, it uses nm2user to run checks on the InfiniBand switches.

    If security policies do not permit connection to an InfiniBand switch as either the root or nm2user user over SSH, then specify a different user by setting this environment variable:
    export RAT_IBSWITCH_USER=ilom-admin

    Note:

    If you specify RAT_IBSWITCH_USER, then a subset of checks is run, based upon the privileges of the alternate user you specify.
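
The default and override rules described above can be summarized in a short sketch. The functions cell_user and ibswitch_user are hypothetical and only restate the rules from this section in shell form; they are not how Oracle EXAchk is implemented.

```shell
# cell_user (hypothetical helper): user for storage server checks.
# Falls back to root when RAT_CELL_SSH_USER is unset or empty.
cell_user() {
  printf '%s\n' "${RAT_CELL_SSH_USER:-root}"
}

# ibswitch_user <invoking-user>  (hypothetical helper)
# User for InfiniBand switch checks, following the rules described above.
ibswitch_user() {
  if [ -n "${RAT_IBSWITCH_USER:-}" ]; then
    printf '%s\n' "$RAT_IBSWITCH_USER"   # explicit override
  elif [ "$1" = "root" ]; then
    echo root                            # default when run as root
  else
    echo nm2user                         # default when run as another user
  fi
}

# Defaults, then the overrides from the examples above.
unset RAT_CELL_SSH_USER RAT_IBSWITCH_USER
echo "defaults:  cells=$(cell_user) switches=$(ibswitch_user root)"
export RAT_CELL_SSH_USER=celladmin RAT_IBSWITCH_USER=ilom-admin
echo "overrides: cells=$(cell_user) switches=$(ibswitch_user root)"
```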

2.7.2.6 Oracle EXAchk InfiniBand Switch Processing

This topic explains how Oracle EXAchk InfiniBand switch processing is done when Oracle Exalogic and Oracle Exadata engineered systems reside on the same InfiniBand fabric.

When Oracle Exalogic and Oracle Exadata engineered systems reside on the same InfiniBand fabric:
  1. Running Oracle EXAchk on an Oracle Exadata database server excludes the Exalogic gateway switches.

  2. Running Oracle EXAchk on an Oracle Exalogic compute node excludes the Exadata switches.

2.7.3 Troubleshooting Oracle EXAchk on Oracle Exadata and Zero Data Loss Recovery Appliance

Follow these steps to troubleshoot and fix Oracle EXAchk on Oracle Exadata and Zero Data Loss Recovery Appliance issues.

Error RC-003 - No Audit Checks Were Found

Description: While identifying the environment characteristics, Oracle EXAchk:

  • Constructs environment variables

  • Compares with the Oracle EXAchk rules database to determine what checks to run

If one of the environment variables does not match a known profile in the rules database, then Oracle EXAchk displays the error RC-003 - no audit checks were found… and exits.

Cause: The most common case occurs when an older version of Oracle EXAchk is used in an Oracle Exadata Database Machine environment with recently released components. This may occur because of a delay between the release of a new component or version and when Oracle EXAchk incorporates support for it.

For example, when versions of Oracle EXAchk earlier than 2.1.3_20111212 were run on an Oracle Exadata Database Machine where Oracle Database release 11.2.0.3.0 was deployed, Oracle EXAchk exited with the following message:
Error RC-003 - No audit checks were found for LINUXX8664OELRHEL5_112030-. 
Please refer to the section for this error code in 
"Appendix A - Troubleshooting Scenarios" of the "Exachk User Guide".

In this example, _112030 indicates that Oracle Database release 11.2.0.3.0 was installed on the system. Since the version of Oracle EXAchk used did not support 11.2.0.3.0, Oracle EXAchk could not find a known match in the Oracle EXAchk rules database.
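
When diagnosing this error, it can be useful to pull the release token out of the message. The sketch below extracts the digits after the underscore and formats them as a dotted release. It assumes the token has exactly six digits (a two-digit major version followed by four single digits), as in the example above; other token layouts are not handled, and the function names are hypothetical.

```shell
# Message text from the example above.
msg='Error RC-003 - No audit checks were found for LINUXX8664OELRHEL5_112030-.'

# rc003_token (hypothetical helper): prints the digits between "_" and
# the "-" that follows them; prints nothing if there is no such token.
rc003_token() {
  printf '%s\n' "$1" | sed -n 's/.*_\([0-9][0-9]*\)-.*/\1/p'
}

# token_to_release (hypothetical helper): formats a six-digit token as
# a dotted release; prints nothing for any other token length.
token_to_release() {
  printf '%s\n' "$1" |
    sed -n 's/^\([0-9][0-9]\)\([0-9]\)\([0-9]\)\([0-9]\)\([0-9]\)$/\1.\2.\3.\4.\5/p'
}

token_to_release "$(rc003_token "$msg")"
```

For the message shown, the snippet prints 11.2.0.3.0, the Oracle Database release that the installed Oracle EXAchk version did not yet support.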

How Long Should It Take to Run Oracle EXAchk?

The time it takes to run the tool varies based on the number of nodes in a cluster, CPU load, network latency, and so on. Normally the entire process takes only a few minutes per node, that is, less than 5 minutes per node. If it takes substantially more time than 5 minutes, then investigate the problem.

With the introduction of parallelized database collection in 2.2.5, the elapsed time for systems with many databases is reduced. Experience in the field is that it normally takes about 10 minutes for a quarter rack X2-2 system with one database. On an internal X3-2 half rack with 20 storage servers, 9 InfiniBand switches, and 44 databases, the elapsed time was 44 minutes.
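
As a rough aid for applying the 5 minutes per node guideline, the following sketch computes the average time per node from an elapsed time that you measured yourself. The function names and the sample figures (900 seconds, 4 nodes) are hypothetical. Treating anything above 300 seconds per node as worth investigating is a simplification, because the guideline says "substantially more" rather than naming an exact limit.

```shell
# per_node_seconds <elapsed-seconds> <node-count>  (hypothetical helper)
# Prints the average seconds per node, using integer division.
per_node_seconds() {
  echo $(( $1 / $2 ))
}

# needs_investigation <elapsed-seconds> <node-count>  (hypothetical helper)
# Succeeds when the average exceeds 300 seconds (5 minutes) per node.
needs_investigation() {
  [ "$(per_node_seconds "$1" "$2")" -gt 300 ]
}

# Hypothetical measurement: a 900-second run on a 4-node cluster.
if needs_investigation 900 4; then
  echo "average exceeds 5 minutes per node: investigate"
else
  echo "average is within 5 minutes per node"
fi
```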