3.2 Oracle Exadata and Zero Data Loss Recovery Appliance

Understand the features and tasks specific to Oracle EXAchk on Oracle Exadata and Zero Data Loss Recovery Appliance.

3.2.1 Prerequisites for Running Oracle EXAchk on Oracle Exadata and Zero Data Loss Recovery Appliance

Review the list of additional prerequisites for running Oracle EXAchk on Oracle Exadata and Zero Data Loss Recovery Appliance.

3.2.1.1 InfiniBand Switches

If you configure passwordless SSH equivalency from the user that launches Oracle EXAchk on the database server to the nm2user user on each InfiniBand switch, then Oracle EXAchk uses the SSH equivalency credentials to complete the InfiniBand switch checks.

If you have not configured passwordless SSH equivalency, then Oracle EXAchk prompts you for the nm2user user password on each of the InfiniBand switches.
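The following is a minimal sketch of setting up that equivalency from a database server. It is not an Oracle-supplied procedure. It assumes that the OpenSSH client tools ssh-keygen and ssh-copy-id are installed, and the switch host names you pass in are placeholders. ssh-copy-id prompts once for the nm2user password on each switch. After that, the BatchMode check should succeed without a prompt. Set DRY_RUN=1 to print the commands instead of running them.

```shell
#!/bin/sh
# Sketch (not Oracle-supplied): passwordless SSH from the current user to
# nm2user on each InfiniBand switch. Assumes OpenSSH ssh-keygen/ssh-copy-id.
# Switch names passed as arguments are placeholders.
# With DRY_RUN=1, commands are printed instead of executed.

run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

setup_ib_ssh_equivalency() {
  key="$HOME/.ssh/id_rsa"
  # Create a key pair with an empty passphrase only if none exists yet.
  if [ ! -f "$key" ]; then
    run mkdir -p "$HOME/.ssh"
    run chmod 700 "$HOME/.ssh"
    run ssh-keygen -t rsa -N "" -f "$key"
  fi
  for sw in "$@"; do
    # Prompts once for the nm2user password on this switch.
    run ssh-copy-id -i "$key.pub" "nm2user@$sw"
  done
}

verify_ib_ssh_equivalency() {
  rc=0
  for sw in "$@"; do
    # BatchMode=yes makes ssh fail instead of prompting for a password.
    if run ssh -o BatchMode=yes "nm2user@$sw" true; then
      echo "$sw: OK"
    else
      echo "$sw: password still required" >&2
      rc=1
    fi
  done
  return $rc
}

# Example (placeholder switch names):
#   setup_ib_ssh_equivalency randomsw-iba0 randomsw-ibb0
#   verify_ib_ssh_equivalency randomsw-iba0 randomsw-ibb0
```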

3.2.2 Installation Requirements for Running Oracle EXAchk on Oracle Exadata and Zero Data Loss Recovery Appliance

Understand the requirements for installing Oracle EXAchk on Oracle Exadata and Zero Data Loss Recovery Appliance, either locally on a database server or on a remote device that is connected to the database servers.

3.2.2.1 Shared Remote Versus Local Installation

If the environment contains only one Oracle Exadata Database Machine or one Oracle Real Application Clusters (Oracle RAC) database, then install Oracle EXAchk locally on one of the database servers. Do not install Oracle EXAchk on every database server.

When an environment consists of more than one Oracle Exadata Database Machine or Oracle RAC database, consider installing Oracle EXAchk on a remote device that is connected to a database server on each Oracle Exadata Database Machine or Oracle RAC cluster.

The advantage is that you can install Oracle EXAchk in one location, validate it, and then run it wherever required within your environment. This saves time and reduces errors. Because Oracle EXAchk is frequently updated, Oracle recommends that you always use the latest available version.

Use the remote location only to run Oracle EXAchk. Use the RAT_OUTPUT environment variable to write all working directories and output files to the local database servers. The user running Oracle EXAchk must have read, write, and delete privileges on the location you choose for RAT_OUTPUT. Typically, RAT_OUTPUT is set to the local /opt/oracle.SupportTools/exachk directory.

For example, if Oracle EXAchk is installed in the /remotely_mounted_dev/exachk/12.1.0.2.6 directory, then to run it on the local node as the Oracle Database home owner oracle, with output written locally, use the commands:
oracle $ export RAT_OUTPUT=/opt/oracle.SupportTools/exachk
oracle $ /remotely_mounted_dev/exachk/12.1.0.2.6/exachk

Note:

If you use the remote device for Oracle EXAchk output, then consider the following:

  1. Ensure that the remote device can handle the I/O load.

    The performance of Oracle EXAchk is adversely affected when the remote device cannot manage the I/O load. The effect ranges from excessively long run times to unpredictable check timeouts that lead to hard-to-diagnose skipped checks.

  2. Do not write output from multiple Oracle Exadata Database Machines or Oracle RAC clusters into the same output directory.

    Using the same output directory for multiple systems can cause locking or access issues on the remote device.

    At a minimum, use the RAT_OUTPUT environment variable to store the output for each Oracle Exadata Database Machine or Oracle RAC cluster in its own directory structure.
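As a sketch of this per-system layout, the following hypothetical shell function creates one output directory for each Oracle Exadata Database Machine or Oracle RAC cluster. Before you point RAT_OUTPUT at the directory, the function confirms that the current user can create and delete files in it. The base path /remotely_mounted_dev/exachk_output and the name cluster_a are placeholders.

```shell
#!/bin/sh
# Sketch: create a separate output directory for one Oracle Exadata Database
# Machine or Oracle RAC cluster, and confirm that the current user can write
# and delete files there. Base path and cluster name are placeholders.

prepare_rat_output() {
  base="${1%/}"          # drop a trailing slash, if any
  cluster="$2"
  dir="$base/$cluster"
  mkdir -p "$dir" || return 1
  probe="$dir/.exachk_probe.$$"
  # Write test: create an empty file. Delete test: remove it again.
  if ! touch "$probe" 2>/dev/null; then
    echo "cannot write to $dir" >&2
    return 1
  fi
  rm -f "$probe" || return 1
  printf '%s\n' "$dir"
}

# Example (hypothetical paths; one directory per cluster):
#   RAT_OUTPUT="$(prepare_rat_output /remotely_mounted_dev/exachk_output cluster_a)" &&
#     export RAT_OUTPUT &&
#     /remotely_mounted_dev/exachk/12.1.0.2.6/exachk
```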

3.2.2.2 Recommended User and Local Installation Directory

If the installation is local, then install Oracle EXAchk in the /opt/oracle.SupportTools/exachk directory, owned by the Oracle Grid Infrastructure home owner for the relevant cluster. The permissions on the directory must be 775.

For example, in a role-separated environment where the Oracle Grid Infrastructure home is owned by user1, who belongs to the install1 group, the installation directory is as follows:
# ls -lt /opt/oracle.SupportTools | grep exachk 
drwxrwxr-x 2 user1 install1 4096 Jan 23 08:31 exachk
As user1, copy the exachk.zip file into the directory and unzip it. The directory contents then look as follows:
# ls -la 
total 55912 
drwxrwxr-x 5 user1 install1 4096 Jan 23 10:27 . 
drwxr-xr-x 8 root root 4096 Jan 23 08:31 .. 
drwxrwxr-x 3 user1 install1 4096 Jan 22 16:00 .cgrep 
-rw-r--r-- 1 user1 install1 8041431 Jan 22 16:34 exachk.zip 
-rw-r--r-- 1 user1 install1 4580698 Jan 22 16:00 rules.dat 
-rw-r--r-- 1 user1 install1 36866945 Jan 22 16:00 collections.dat 
-rw-r--r-- 1 user1 install1 291 Jan 22 15:59 UserGuide.txt 
-rw-r--r-- 1 user1 install1 2533 Jan 22 15:58 readme.txt 
-rw-r--r-- 1 user1 install1 4114714 Jan 22 15:55 CollectionManager_App.sql 
-rwxr-xr-x 1 user1 install1 1973350 Jan 22 15:55 exachk

This configuration permits the root user and the users in the install1 group to run Oracle EXAchk from the installation directory.
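To confirm this layout on each database server, you can run a check along these lines. It is a hypothetical helper that assumes GNU stat (available on Oracle Linux). user1 and install1 are the example names from the text. The commented root commands show one way to create the directory.

```shell
#!/bin/sh
# Sketch: check that an EXAchk installation directory has mode 775 and the
# expected owner and group. Requires GNU stat (-c), as on Oracle Linux.
#
# One-time setup as root (user1/install1 are the example names from the text):
#   mkdir -p /opt/oracle.SupportTools/exachk
#   chown user1:install1 /opt/oracle.SupportTools/exachk
#   chmod 775 /opt/oracle.SupportTools/exachk

check_exachk_install_dir() {
  dir="$1"
  want_user="$2"
  want_group="$3"
  info="$(stat -c '%a %U %G' "$dir")" || return 1
  # Split "mode user group" into the positional parameters $1 $2 $3.
  set -- $info
  if [ "$1" != "775" ]; then
    echo "$dir: mode is $1, expected 775" >&2
    return 1
  fi
  if [ "$2" != "$want_user" ] || [ "$3" != "$want_group" ]; then
    echo "$dir: owned by $2:$3, expected $want_user:$want_group" >&2
    return 1
  fi
  echo "$dir: OK"
}

# Example:
#   check_exachk_install_dir /opt/oracle.SupportTools/exachk user1 install1
```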

3.2.2.3 Recommended Oracle EXAchk Run Location

By default, Oracle EXAchk stores the output in the directory from which you run it. Oracle recommends that any user who runs Oracle EXAchk first change the working directory to the Oracle EXAchk installation directory.

For example:
[user1]$ cd /opt/oracle.SupportTools/exachk
[user1]$ ./exachk -nodaemon -profile clusterware

This method keeps the output files in one location, even when they are owned by different users.

For example:
[user1]$ ls -lt | grep exachk_
-rw-r--r-- 1 user2 install1  1462155 Jan 23 12:25 exachk_randomdb04_V1201_012315_121443.zip
drwxr-xr-x 8 user2 install1    61440 Jan 23 12:25 exachk_randomdb04_V1201_012315_121443
-rw-r--r-- 1 user1 install1   295994 Jan 23 12:12 exachk_randomdb04_V1201_012315_120457.zip
drwxr-xr-x 8 user1 install1    28672 Jan 23 12:12 exachk_randomdb04_V1201_012315_120457
drwxr-xr-x 8 root  root        69632 Jan 23 10:27 exachk_randomdb04_012315_101719
-rw-r--r-- 1 root  root      1405449 Jan 23 10:27 exachk_randomdb04_012315_101719.zip

If you do not want the output files in this location, then use either the RAT_OUTPUT environment variable or the -output command line option to direct the output to another location. By default, Oracle EXAchk maintains temporary working files in the home directory of the user that runs Oracle EXAchk, and deletes the files at the end of the run.
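Here is a minimal sketch of both run styles, using a hypothetical wrapper function. The wrapper changes to the installation directory in a subshell, so your shell's working directory does not change. It appends -output only when you supply an alternate location. /u01/exachk_out is a placeholder path. Set DRY_RUN=1 to print the command instead of running it.

```shell
#!/bin/sh
# Sketch: run EXAchk from its installation directory so that output lands
# there, or pass -output to send it elsewhere. DRY_RUN=1 prints the command.
# run_exachk_here is a hypothetical helper name.

run_exachk_here() {
  install_dir="$1"
  output_dir="$2"
  shift 2
  # Append -output only when an alternate location was requested.
  if [ -n "$output_dir" ]; then
    set -- "$@" -output "$output_dir"
  fi
  (
    # Subshell: the caller's working directory is left unchanged.
    cd "$install_dir" || exit 1
    if [ "${DRY_RUN:-0}" = "1" ]; then
      echo ./exachk "$@"
    else
      ./exachk "$@"
    fi
  )
}

# Examples (/u01/exachk_out is a placeholder):
#   run_exachk_here /opt/oracle.SupportTools/exachk "" -nodaemon -profile clusterware
#   run_exachk_here /opt/oracle.SupportTools/exachk /u01/exachk_out -nodaemon -profile clusterware
```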

3.2.3 Oracle EXAchk on Oracle Exadata and Zero Data Loss Recovery Appliance Usage

How you run Oracle EXAchk on Oracle Exadata and Zero Data Loss Recovery Appliance depends on considerations such as the default access interface, virtualization, serial or parallel data collection, and role separation.

3.2.3.1 Database Default Access on the Client Interface

If the client interface is the default access path for your database servers, then use the -clusternodes command-line option with the management interface host names to instruct Oracle EXAchk to communicate over the management interface.

For example, if a cluster is configured as shown in Table 3-1, then the command must include:
-clusternodes dbadm01,dbadm02,dbadm03,dbadm04

Note:

When using the -clusternodes option, start Oracle EXAchk on the first database server in the list.

Table 3-1 Example Cluster Configuration

Interface    Database Host Names
Management   dbadm01, dbadm02, dbadm03, dbadm04
Client       dbclnt01, dbclnt02, dbclnt03, dbclnt04
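The following sketch uses hypothetical helper names. It builds the -clusternodes value from the management host names in Table 3-1 and checks that the local host is the first entry, as the note above requires. The local host name comes from uname -n, with any domain suffix removed. The sketch assumes that this name matches the management host name in the list.

```shell
#!/bin/sh
# Sketch: build the -clusternodes value from management interface host names
# and confirm that EXAchk is being started on the first host in that list.
# Assumes the local node name (uname -n) matches the management host name.

join_hosts() (
  # Subshell body: changing IFS here does not affect the caller.
  # "$*" joins the arguments with the first character of IFS.
  IFS=,
  printf '%s\n' "$*"
)

is_first_node() {
  list="$1"
  # Default to this host's name; a second argument overrides it.
  me="${2:-$(uname -n)}"
  me="${me%%.*}"          # strip any domain suffix
  [ "${list%%,*}" = "$me" ]
}

# Example for Table 3-1 (run on dbadm01):
#   nodes="$(join_hosts dbadm01 dbadm02 dbadm03 dbadm04)"
#   is_first_node "$nodes" && ./exachk -clusternodes "$nodes"
```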

3.2.3.2 Virtualization Considerations

Oracle EXAchk supports virtualization on Oracle Exadata and Zero Data Loss Recovery Appliance.

To run hardware and operating system level checks for database servers, storage servers, InfiniBand fabric, and InfiniBand switches:

  • Install Oracle EXAchk in the management domain, also referred to as DOM0

  • Run Oracle EXAchk as root

When you run Oracle EXAchk from DOM0, Oracle EXAchk:

  • Discovers all compute nodes, storage servers, and InfiniBand switches in the entire InfiniBand fabric

  • Runs on all those components

To run Oracle EXAchk on only a subset of nodes from the management domain, use the following command-line options:

  • -clusternodes to designate database servers

  • -cells to designate storage servers

  • -ibswitches to designate InfiniBand switches

For example, for a full rack where only the first quarter rack is configured for virtualization, but all components are on the same InfiniBand fabric, run the following command as root on the database server randomadm01:
./exachk -clusternodes randomadm01,randomadm02 \
                  -cells randomceladm01,randomceladm02,randomceladm03 \
                  -ibswitches randomsw-ibs0,randomsw-iba0,randomsw-ibb0
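If the host lists come from another source, a small function such as the following can assemble the subset options. It is a hypothetical helper. It takes space-separated host lists and converts each one to the comma-separated form shown in the example. An empty list omits the corresponding option.

```shell
#!/bin/sh
# Sketch: assemble -clusternodes, -cells, and -ibswitches from space-separated
# host lists. An empty list omits that option. subset_args and to_csv are
# hypothetical helper names; host names below are the example names from the text.

to_csv() {
  # Turn runs of spaces into single commas.
  printf '%s\n' "$1" | tr -s ' ' ','
}

subset_args() {
  # $1: database servers, $2: storage servers, $3: InfiniBand switches
  args=""
  if [ -n "$1" ]; then args="$args -clusternodes $(to_csv "$1")"; fi
  if [ -n "$2" ]; then args="$args -cells $(to_csv "$2")"; fi
  if [ -n "$3" ]; then args="$args -ibswitches $(to_csv "$3")"; fi
  printf '%s\n' "${args# }"
}

# Example (as root on randomadm01); the unquoted expansion is split on spaces:
#   ./exachk $(subset_args "randomadm01 randomadm02" \
#     "randomceladm01 randomceladm02 randomceladm03" \
#     "randomsw-ibs0 randomsw-iba0 randomsw-ibb0")
```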

In the user domains, also referred to as DOMUs, run Oracle EXAchk separately for each cluster.

For example, consider 2 clusters with 4 user domains in each cluster. Although there are 8 user domains in total, you run Oracle EXAchk only twice: once on the first user domain node of the first cluster, and once on the first user domain node of the second cluster. The user domain runs do not include hardware or operating system level checks on the database servers, storage servers, or InfiniBand switches.

Note:

Run Oracle EXAchk as root in the management domain and the user domains.

3.2.3.3 Running Serial Data Collection

By default, Oracle EXAchk runs parallel data collection for the storage servers, InfiniBand switches, and database servers.

You can also configure Oracle EXAchk to run serial data collection.

To run serial data collection for the storage servers, database servers, and InfiniBand switches, set the following environment variables:

  • RAT_COMPUTE_RUNMODE

  • RAT_CELL_RUNMODE

  • RAT_IBSWITCH_RUNMODE

  1. To collect database server data serially:
    export RAT_COMPUTE_RUNMODE=serial

  2. To collect storage server data serially:
    export RAT_CELL_RUNMODE=serial

  3. To collect InfiniBand switch data serially:
    export RAT_IBSWITCH_RUNMODE=serial
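Exporting these variables leaves them set for every later run in the same shell. As an alternative, the hypothetical wrapper below sets all three run modes for a single command only. It relies on standard shell behavior: variable assignments placed before a command are passed to that command's environment and are not kept afterward.

```shell
#!/bin/sh
# Sketch: apply all three serial run modes to one command only, so the
# settings do not persist in the current shell for later EXAchk runs.
# with_serial_collection is a hypothetical helper name.

with_serial_collection() {
  RAT_COMPUTE_RUNMODE=serial \
  RAT_CELL_RUNMODE=serial \
  RAT_IBSWITCH_RUNMODE=serial \
    "$@"
}

# Example:
#   with_serial_collection ./exachk
```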
    

3.2.3.4 Multiple Asymmetric Database Home Examples

If the Oracle Database homes are not symmetric, then install Oracle EXAchk on multiple database servers in the cluster so that there is one installation for each Oracle Database home, where each home is installed on only a subset of the database servers.

Multiple Asymmetric Database Homes Owned by the Same or Different Users

The following table is an example of a distribution in the same cluster, with role separation between user1 and user2 such that neither can access the other's database home or database:

Table 3-2 Multiple Asymmetric Database Homes Owned by the Same or Different Users

Owner User   Database Home     Installed on             Databases
user1        /path1/dbhome_1   db01, db02, db03, db04   dbm-a
user2        /path2/dbhome_2   db05, db06, db07, db08   dbm-b, dbm-c

Do the following:

  1. As user1, install Oracle EXAchk in /home/exachk/user1 on db01.

  2. As user2, install Oracle EXAchk in /home/exachk/user2 on db05.

  3. As user1, on db01, run the following command to collect the storage server checks, root-level database server checks, and InfiniBand switch checks:
    cd /home/exachk/user1 
    ./exachk -profile sysadmin
    
  4. As user1, on db01, collect the database checks for dbm-a:
    cd /home/exachk/user1
    ./exachk -profile dba -clusternodes db01,db02,db03,db04
    
  5. As user2, on db05:
    cd /home/exachk/user2 
    ./exachk -profile dba -clusternodes db05,db06,db07,db08
    

    Choose dbm-b and dbm-c from the database selection list to collect the database checks for dbm-b and dbm-c.

  6. Optionally, use the -merge command-line option to merge the reports.
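The per-owner dba runs in steps 4 and 5 follow a simple pattern. Each owner runs Oracle EXAchk from their own installation directory and starts it on the first node of their -clusternodes list. The hypothetical function below prints those commands from a small owner, directory, and node-list table modeled on Table 3-2. The table format is an assumption of this sketch, not an EXAchk input file.

```shell
#!/bin/sh
# Sketch: print the -profile dba command each owner runs, derived from lines
# of the form "owner install_dir node_list" modeled on Table 3-2.
# The input format is an assumption for this example, not an EXAchk feature.

plan_dba_runs() {
  while read -r owner dir nodes; do
    [ -n "$owner" ] || continue
    # Start EXAchk on the first node in the -clusternodes list.
    printf 'as %s on %s: cd %s && ./exachk -profile dba -clusternodes %s\n' \
      "$owner" "${nodes%%,*}" "$dir" "$nodes"
  done
}

# Example:
#   plan_dba_runs <<'EOF'
#   user1 /home/exachk/user1 db01,db02,db03,db04
#   user2 /home/exachk/user2 db05,db06,db07,db08
#   EOF
```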

Multiple Asymmetric Database Homes Owned by the Same or Different Users, Grid User, and SYSADMIN/DBA Role Isolation

For this example, assume the following configuration in the same cluster:

Table 3-3 Multiple Asymmetric Database Homes Owned by the Same or Different Users, Grid User, and SYSADMIN/DBA Role Isolation

Owner User ID   Database Home     Installed on             Database(s)
user1           /path1/dbhome_1   db01, db02, db03, db04   dbm-a
user2           /path2/dbhome_2   db05, db06, db07, db08   dbm-b, dbm-c
grid            /path3/grid       db01, db02, db03, db04,  +ASM
                                  db05, db06, db07, db08

In addition, user1, user2, and grid are role separated so that none of them can access the database structures of the others, and company policy isolates the system administrators from the database administrators.

Do the following:

  1. As user1, install Oracle EXAchk in /home/exachk/user1 on db01.

  2. As user2, install Oracle EXAchk in /home/exachk/user2 on db05.

  3. As the grid user, run Oracle Clusterware checks:
    mkdir /home/grid/exachk_reports
    cd /home/grid/exachk_reports
    /home/exachk/user1/exachk -profile clusterware
    

    The working directory and zip file are stored in the /home/grid/exachk_reports directory.

  4. As root, run the sysadmin profile to collect the storage server, root-level database server, and InfiniBand switch checks:
    mkdir /root/exachk_reports
    cd /root/exachk_reports
    /home/exachk/user1/exachk -profile sysadmin
    

    The working directory and zip file are stored in the /root/exachk_reports directory.

  5. As user1 on db01, run the command:
    cd /home/exachk/user1
    ./exachk -profile dba -clusternodes db01,db02,db03,db04
    

    Choose dbm-a from the Oracle database selection list to collect the database checks for dbm-a.

  6. As user2 on db05, run the command:
    cd /home/exachk/user2
    ./exachk -profile dba -clusternodes db05,db06,db07,db08
    

    Choose dbm-b and dbm-c from the database selection list to collect the database checks for dbm-b and dbm-c.

  7. Optionally, use the -merge command-line option to merge the reports.

3.2.3.5 Using the root User ID in Asymmetric and Role Separated Environments

Run Oracle EXAchk as root to simplify the work required in asymmetric or role separated environments.

If the database homes are not symmetric, then install Oracle EXAchk on multiple database servers in the cluster so that there is one installation for each Oracle Database home, where each home is installed on only a subset of the database servers.

For this example, assume the following configuration in the same cluster:

Table 3-4 Using root User ID in Asymmetric and Role Separated Environments

Owner User ID   Database Home     Installed on             Database(s)
user1           /path1/dbhome_1   db01, db02, db03, db04   dbm-a
user2           /path2/dbhome_2   db05, db06, db07, db08   dbm-b, dbm-c
grid            /path3/grid       db01, db02, db03, db04,  +ASM
                                  db05, db06, db07, db08

In addition, user1, user2, and grid are role separated so that none of them can access the database structures of the others. Company policy can also isolate the system administrators from the database administrators.

Do the following:

  1. As root, install Oracle EXAchk in the /tmp/exachk/121026 directory on db01.

  2. As root, install Oracle EXAchk in the /tmp/exachk/121026 directory on db05.

  3. As root, on db01:
    cd /tmp/exachk/121026
    ./exachk -clusternodes db01,db02,db03,db04
    

    Choose dbm-a from the database selection list to collect the database checks for dbm-a.

  4. As root on db05:
    cd /tmp/exachk/121026
    ./exachk -excludeprofiles storage,switch -clusternodes db05,db06,db07,db08
    

    Choose dbm-b and dbm-c from the database selection list to collect the database checks for dbm-b and dbm-c.

  5. If desired, use the -merge command-line option to merge the reports.

3.2.3.6 Environment Variables for Specifying a Different User Than root

Review the list of environment variables for specifying a different user than root.

  • RAT_CELL_SSH_USER

    By default, Oracle EXAchk connects as root to run checks on an Oracle Exadata Storage Server.

    If security policies do not permit connection to a storage server as root over SSH, then you can specify a different user by setting this environment variable:
    export RAT_CELL_SSH_USER=celladmin
    

    Note:

    If you specify RAT_CELL_SSH_USER, then a subset of checks is run, based upon the privileges of the alternate user you specify.

  • RAT_IBSWITCH_USER

    When you run Oracle EXAchk as root on a database server, by default it connects to the InfiniBand switches as root to run the checks. When you run Oracle EXAchk as a user other than root, by default it connects to the InfiniBand switches as nm2user.

    If security policies do not permit connection to an InfiniBand switch as either the root or nm2user user over SSH, then specify a different user by setting this environment variable:
    export RAT_IBSWITCH_USER=ilom-admin
    

    Note:

    If you specify RAT_IBSWITCH_USER, then a subset of checks is run, based upon the privileges of the alternate user you specify.
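You can summarize the defaults above as a small decision procedure. The following hypothetical function mirrors the text to report which remote users a run would use. It does not query Oracle EXAchk, so its result is only as accurate as the documented defaults.

```shell
#!/bin/sh
# Sketch: report which remote users the documented defaults imply for one
# EXAchk run. This mirrors the text above; it is not EXAchk's own logic.
# report_remote_users is a hypothetical helper name.

report_remote_users() {
  run_as="${1:-$(id -un)}"
  # Storage servers: root unless RAT_CELL_SSH_USER names another user.
  cell_user="${RAT_CELL_SSH_USER:-root}"
  # InfiniBand switches: RAT_IBSWITCH_USER if set; otherwise root when
  # EXAchk runs as root, else nm2user.
  if [ -n "${RAT_IBSWITCH_USER:-}" ]; then
    ib_user="$RAT_IBSWITCH_USER"
  elif [ "$run_as" = "root" ]; then
    ib_user=root
  else
    ib_user=nm2user
  fi
  echo "storage servers: $cell_user"
  echo "InfiniBand switches: $ib_user"
  if [ -n "${RAT_CELL_SSH_USER:-}" ] || [ -n "${RAT_IBSWITCH_USER:-}" ]; then
    echo "note: alternate user set; only a subset of checks runs"
  fi
}

# Example:
#   export RAT_CELL_SSH_USER=celladmin
#   report_remote_users
```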

3.2.3.7 Oracle EXAchk InfiniBand Switch Processing

This topic explains how Oracle EXAchk processes InfiniBand switches when Oracle Exalogic and Oracle Exadata engineered systems reside on the same InfiniBand fabric.

When Oracle Exalogic and Oracle Exadata engineered systems reside on the same InfiniBand fabric:
  1. Running Oracle EXAchk on an Exadata database server excludes the Exalogic gateway switches.

  2. Running Oracle EXAchk on an Exalogic compute node excludes the Exadata switches.

3.2.4 Troubleshooting Oracle EXAchk on Oracle Exadata and Zero Data Loss Recovery Appliance

Use the following information to troubleshoot and fix issues with Oracle EXAchk on Oracle Exadata and Zero Data Loss Recovery Appliance.

Error RC-003 - No Audit Checks Were Found

Description: While identifying the environment characteristics, Oracle EXAchk

  • Constructs environment variables

  • Compares with the Oracle EXAchk rules database to determine what checks to run

If one of the environment variables does not match a known profile in the rules database, then Oracle EXAchk displays the error RC-003 - No audit checks were found… and exits.

Cause: The most common case occurs when an older version of Oracle EXAchk is used in an Oracle Exadata Database Machine environment with recently released components. This can occur because of the delay between the release of a new component or version and the time when Oracle EXAchk incorporates support for it.

For example, when versions of Oracle EXAchk earlier than 2.1.3_20111212 were run on an Oracle Exadata Database Machine where Oracle Database release 11.2.0.3.0 was deployed, Oracle EXAchk exited with the following message:
Error RC-003 - No audit checks were found for LINUXX8664OELRHEL5_112030-. 
Please refer to the section for this error code in 
"Appendix A - Troubleshooting Scenarios" of the "Exachk User Guide".

In this example, _112030 indicates that Oracle Database release 11.2.0.3.0 was installed on the system. Because that version of Oracle EXAchk did not support release 11.2.0.3.0, it could not find a match in the Oracle EXAchk rules database.
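To read the release from an RC-003 message, you can use a sketch like the following. It extracts the digits after the underscore and expands them into a dotted release number. It assumes the layout in the 11.2.0.3.0 example: a two-digit major release followed by one digit for each remaining field. Other message formats may not follow this pattern.

```shell
#!/bin/sh
# Sketch: extract the database release encoded in an RC-003 message.
# Assumes the layout of the 11.2.0.3.0 example: a 2-digit major release
# followed by one digit per remaining field (112030 -> 11.2.0.3.0).
# rc003_db_release is a hypothetical helper, not an EXAchk utility.

rc003_db_release() {
  rel="$(printf '%s\n' "$1" |
    sed -n 's/.*for [A-Z0-9]*_\([0-9][0-9]\)\([0-9]\)\([0-9]\)\([0-9]\)\([0-9]\).*/\1.\2.\3.\4.\5/p')"
  [ -n "$rel" ] || return 1
  printf '%s\n' "$rel"
}

# Example:
#   rc003_db_release 'Error RC-003 - No audit checks were found for LINUXX8664OELRHEL5_112030-.'
```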

How Long Should It Take to Run Oracle EXAchk?

The time it takes to run the tool varies based on the number of nodes in a cluster, CPU load, network latency, and so on. Normally the entire process takes a few minutes per node, that is, less than 5 minutes per node. If a run takes substantially longer than 5 minutes per node, then investigate the problem.

With the introduction of parallelized database collection in Oracle EXAchk 2.2.5, the elapsed time for systems with many databases is reduced. In field experience, a run normally takes about 10 minutes on a quarter rack X2-2 system with one database. On an internal X3-2 half rack with 20 storage servers, 9 InfiniBand switches, and 44 databases, the elapsed time was 44 minutes.
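You can turn the 5-minutes-per-node guideline into a simple check. The hypothetical wrapper below times a command with date +%s, which GNU and BusyBox date support. It warns when the elapsed time exceeds 5 minutes per node. Deciding what counts as substantially longer is still up to you.

```shell
#!/bin/sh
# Sketch: time a command and flag runs longer than the "less than 5 minutes
# per node" guideline. The caller supplies the node count.
# Uses date +%s (GNU/BusyBox). Helper names are hypothetical.

run_exceeds_budget() {
  b_nodes="$1"
  b_elapsed="$2"
  budget_sec=$((b_nodes * 5 * 60))
  [ "$b_elapsed" -gt "$budget_sec" ]
}

time_exachk() {
  t_nodes="$1"
  shift
  start=$(date +%s)
  "$@"
  rc=$?
  elapsed=$(( $(date +%s) - start ))
  if run_exceeds_budget "$t_nodes" "$elapsed"; then
    echo "run took ${elapsed}s for $t_nodes node(s); investigate" >&2
  fi
  return $rc
}

# Example for a 4-node cluster:
#   time_exachk 4 ./exachk -nodaemon -profile clusterware
```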