4 Running Exachk for Exalogic: Advanced Usage

This chapter contains the following sections:

4.1 Setting Environment Variables

This section describes the environment variables that you can set for Exachk. This section contains the following topics:

Note:

To see the Exachk-related environment variables that are already configured on the system, run the following command:
export | grep RAT

4.1.1 Setting Environment Variables for Runtime Command Timeouts

To prevent the program from freezing, Exachk automatically terminates commands that exceed default timeouts. On a busy system, Exachk kills commands when the target of the check does not respond within the default timeout. Use the following environment variables to extend the default timeouts:

Note:

If a command times out, analyze the cause of the timeout, and correct the parameters in the environment variables. Timeouts result in missing data, which limits the value of the tool. To avoid frequent timeouts, run the tool during times of least load on the system.

Table 4-1 Setting Environment Variables for Runtime Command Timeouts

Environment Variables Default Timeout Description Example

RAT_TIMEOUT

90 seconds

If the default timeout for an individual command that runs without root privileges is not long enough, Exachk kills that command, which results in missing data.

Set the parameter by setting the environment variable in the script execution environment, as follows:

export RAT_TIMEOUT=120

RAT_ROOT_TIMEOUT

300 seconds

The tool executes a set of root-privileged data collections once for each node, including storage nodes and InfiniBand switches. If the default timeout for this set of root-privileged data collections is not long enough, Exachk kills the command that is running, which results in missing data for that node.

Set the parameter by setting this environment variable in the script execution environment, as follows:

export RAT_ROOT_TIMEOUT=600

RAT_PASSWORDCHECK_TIMEOUT

10 seconds

During SSH login, if there is a delay in communication between the remote target and the DNS server, the login or password validation operation might time out. This can cause password validation to fail, or produce other timeouts in the log.

Set the parameter by setting the environment variable, as follows:

export RAT_PASSWORDCHECK_TIMEOUT=10
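As a minimal sketch, the following POSIX shell fragment sets the three timeouts from Table 4-1 in one place and refuses values that are not positive whole numbers of seconds. The set_rat_timeout function is a hypothetical helper, not part of Exachk; only the RAT_* variable names and the example values come from the table.

```shell
#!/bin/sh
# Hypothetical helper (not part of Exachk): export a RAT_* timeout after
# checking that the value is a positive whole number of seconds.
set_rat_timeout() {
    name=$1
    value=$2
    case $value in
        # Reject empty values, non-digits, and zero or zero-padded numbers.
        ''|*[!0-9]*|0*)
            echo "invalid timeout for $name: '$value'" >&2
            return 1
            ;;
    esac
    export "$name=$value"
}

# Example values from Table 4-1 (seconds); tune them for your system.
set_rat_timeout RAT_TIMEOUT 120
set_rat_timeout RAT_ROOT_TIMEOUT 600
set_rat_timeout RAT_PASSWORDCHECK_TIMEOUT 10

# Show the Exachk-related variables now set in this shell.
export | grep RAT
```

Source the fragment (for example, . ./rat_timeouts.sh, where the file name is arbitrary) rather than executing it, so that the exported variables remain set in the shell from which you then run ./exachk.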

4.1.2 Setting Environment Variables for Local Issues

Exachk attempts to derive all the data it needs from the environment in which it is executed. However, the tool sometimes does not work as expected because of local system variations. In such cases, you can set environment variables to override the default behavior of Exachk.

Table 4-2 lists the environment variables that you can set to address local issues.

Note:

In a virtual configuration, when running Exachk from the vServer that hosts the Enterprise Controller component of the Exalogic Control stack, do not use the RAT_CELLS, RAT_IBSWITCHES, and RAT_CLUSTERNODES variables (as described in Table 4-2) to override the storage nodes, switches, and compute nodes for which Exachk should perform health checks. Instead, use the exachk_exalogic.conf file as described in Section 4.4.

Table 4-2 Setting Environment Variables for Local Issues

Environment Variables Description Example

RAT_OS

Enables the utility to verify the platform information.

For a 64-bit Oracle Enterprise Linux 5 machine with x86 architecture, use the following command to set the RAT_OS variable:

export RAT_OS=LINUXX8664OELRHEL5

For a 64-bit Oracle Solaris 11 machine with x86 architecture, use the following command to set the RAT_OS variable:

export RAT_OS=SOLARISX866411

RAT_SSHELL

Redirects Exachk to the default secure shell location.

export RAT_SSHELL="/usr/bin/ssh -q"

RAT_SCOPY

Redirects Exachk to the default secure copy (scp) location.

export RAT_SCOPY="/usr/bin/scp -q"

RAT_LOCALONLY

If set to 1, directs Exachk to perform health checks on only the compute node from which Exachk is run; that is, Exachk skips the checks for the storage nodes, the switches, and all the compute nodes other than the one from which it is run.

To direct Exachk to perform health checks on only the compute node from which Exachk is run, use the following command:

export RAT_LOCALONLY=1

RAT_CELLS

Directs Exachk to run checks on one of the two storage nodes.

If the names of the storage nodes are non-standard, edit the o_storage.out file that is located in the same directory where Exachk is installed, and specify the name of the storage node.

To direct Exachk to run checks on the second storage node, use the following command:

export RAT_CELLS="el01sn02"

RAT_IBSWITCHES

Directs Exachk to run checks on a subset of the InfiniBand switches, in addition to the default checks on the InfiniBand switches.

If the names of the switches are non-standard, edit the o_ibswitches.out file that is located in the same directory where Exachk is installed, and specify the names of the switches.

To direct Exachk to run checks on the InfiniBand switch el01sw-ib02, use the following command:

export RAT_IBSWITCHES="el01sw-ib02"

RAT_CLUSTERNODES

Directs Exachk to run checks on specific nodes.

On a quarter rack, which has eight compute nodes, use the following command to list the compute nodes on which the health check needs to be performed:

export RAT_CLUSTERNODES="el01cn01 el01cn02 el01cn03 el01cn04 el01cn05 el01cn06 el01cn07 el01cn08"

RAT_ELRACKTYPE

Indicates whether the machine is an eighth rack (0), quarter rack (1), half rack (2), or full rack (3).

To specify that the system is a full rack, use the following command:

export RAT_ELRACKTYPE="3"
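Typing a long RAT_CLUSTERNODES list by hand invites mistakes. As a minimal sketch, assuming your compute nodes follow the el01cnNN pattern used in the examples above, a short loop can build the value. The build_node_list function is hypothetical, and the prefix and count are illustrative values that you must adapt to your rack.

```shell
#!/bin/sh
# Hypothetical helper: build a space-separated compute node list such as
# "el01cn01 ... el01cn08". The el01cnNN pattern is taken from the examples
# in Table 4-2 and may not match the host names on your rack.
build_node_list() {
    prefix=$1
    count=$2
    list=""
    i=1
    while [ "$i" -le "$count" ]; do
        # Zero-pad the node number to two digits (01, 02, ...).
        list="$list $(printf '%s%02d' "$prefix" "$i")"
        i=$((i + 1))
    done
    # Strip the leading space added on the first iteration.
    printf '%s\n' "${list# }"
}

# A quarter rack has eight compute nodes.
RAT_CLUSTERNODES=$(build_node_list el01cn 8)
export RAT_CLUSTERNODES
echo "$RAT_CLUSTERNODES"
```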

4.2 Exachk Command Options

You can run Exachk with the following command-line options:

Note:

Only the options documented in the following table are applicable to Exalogic.

Option Purpose and Syntax
-clusternodes Perform checks on only the specified compute nodes and all the other components, and exclude the unspecified compute nodes.

Syntax:

./exachk -clusternodes cn_1[,cn_2,...]

-diff Compare two Exachk HTML reports and generate an HTML report showing the changes in the health of the Exalogic rack between Exachk runs.

Syntax:

# ./exachk -diff report1 report2 [-outfile compared_report.html]

For more information, see Section 5.3.

-exadiff Compare two Exachk zip collections and generate an HTML report showing the differences in the versions of the infrastructure components (hardware, firmware, and software) between the two reports. The two Exachk reports can be for different Exalogic racks or at different points in time for the same rack, such as before and after upgrading the rack.

Syntax:

./exachk -exadiff exachk_collection_zip_1 exachk_collection_zip_2

For more information, see Section 5.5.

-f Perform checks on already collected data.

Syntax:

./exachk -f report_name

-vmguest Perform checks for guest vServers as well.

Syntax:

./exachk -vmguest conf_file_1[,conf_file_2,...]

For more information, see Section 3.3.

-hybrid Perform checks on the physical nodes as well, in a hybrid rack.

Syntax:

./exachk -hybrid

For more information, see Section 3.2.4.

-localonly Perform checks for only the host on which Exachk is running.

Syntax:

./exachk -localonly

-nopass Exclude passed checks from the HTML report.

Syntax:

./exachk -nopass

-o v Display results for all checks, including those that passed.

Syntax:

./exachk -o v

-phy Use this option along with -hybrid to specify the physical nodes in a hybrid rack.

Syntax:

./exachk -hybrid -phy node_1[,node_2,...]

For more information, see Section 3.2.4.

-profile Perform specific checks or checks for specific components.

Syntax:

./exachk -profile profile_name

For information about the profiles that you can specify, see "Supported Profiles for the -profile Option" after this table.

-s or -S Run Exachk in silent mode.

Syntax:

./exachk -s

For more information, see Section 4.3.

-v Display the version of the tool.

Syntax:

./exachk -v
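The -clusternodes option takes a comma-separated list, whereas the RAT_CLUSTERNODES variable described in Table 4-2 uses spaces. As a sketch, assuming you keep the node list in a space-separated variable, the following POSIX shell fragment converts it and prints the resulting command line instead of running it. The to_comma_list function is a hypothetical helper, not an Exachk feature.

```shell
#!/bin/sh
# Hypothetical helper: turn "cn1 cn2 cn3" into "cn1,cn2,cn3" for -clusternodes.
to_comma_list() {
    # Squeeze repeated spaces, trim one leading/trailing space, join with commas.
    printf '%s\n' "$1" | tr -s ' ' | sed 's/^ //; s/ $//' | tr ' ' ','
}

# Reuse RAT_CLUSTERNODES if it is set; otherwise fall back to example names.
nodes=${RAT_CLUSTERNODES:-"el01cn01 el01cn02 el01cn03 el01cn04"}
echo "./exachk -clusternodes $(to_comma_list "$nodes")"
```

Printing rather than executing keeps the sketch safe to try on a system where Exachk is not staged; review the node list, then run the printed command.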


Supported Profiles for the -profile Option

Profile Description
control_VM Run health checks for only the Exalogic Control components.
el_extensive In addition to the standard set of checks, run the following checks, which are useful for a freshly installed or upgraded machine:
  • Verify whether the BIOS on the compute nodes is configured correctly.

  • Verify whether the PCI 64-bit resource allocation setting on the compute nodes is disabled.

  • In Oracle VM Manager, for each server pool, verify whether VM Start Policy is set to Start on current server.

Note: Before running Exachk with the el_extensive profile, verify whether passwordless SSH has been enabled for the CLI shell of Oracle VM Manager. For more information, see "Verifying and Enabling Passwordless SSH to the Oracle VM Manager CLI" after this table.

switch Run checks for the switches.
virtual_infra Run checks for the Exalogic virtual infrastructure. This profile applies only to Exalogic machines in a virtual configuration.
zfs Run checks for the storage appliance.

Verifying and Enabling Passwordless SSH to the Oracle VM Manager CLI

Before running Exachk with the el_extensive profile, you must verify whether passwordless SSH has been enabled for the CLI shell of Oracle VM Manager. To do this, try logging in via SSH to the Oracle VM Manager CLI shell by running the following command on the vServer that hosts Oracle VM Manager:

# ssh -l admin host_name_of_localhost -p 10000

where host_name_of_localhost is the host name of the local host.

If you can log in without having to enter a password (that is, the OVM> prompt is displayed), then passwordless SSH has been enabled.

If a password prompt is displayed, do the following:

  1. Enter the password for the admin user (default: welcome1).

  2. Log out from the OVM> shell, and try logging in again via SSH.

    If the password prompt continues to be displayed, then passwordless SSH is not enabled. To enable passwordless SSH to the Oracle VM Manager CLI, complete the following steps:

    1. SSH, as root, to the vServer that hosts the Oracle VM Manager.

    2. Ensure that ssh-agent is running:

      # eval `ssh-agent`
      

      The output is similar to the following:

      Agent pid 18529
      
    3. Generate a public/private key pair:

      # ssh-keygen -t rsa -f ~/.ssh/admin

      When prompted for a passphrase (and again to confirm it), press Enter.

      The keys are generated and stored in the ~/.ssh/ directory: the admin file contains the private key and the admin.pub file contains the public key.

    4. Add the private key to the authentication agent:

      # ssh-add ~/.ssh/admin
      Identity added: /home/user/.ssh/admin (/home/user/.ssh/admin)

      If ssh-agent is not running, ssh-add displays the following error message:

      Could not open a connection to your authentication agent.
      
    5. Copy the public key to the .ssh directory in the oracle user's home directory:

      # cp ~/.ssh/admin.pub /home/oracle/.ssh/
      
    6. Append the file containing the public key (that is, admin.pub) to the ovmcli_authorized_keys file:

      # cd /home/oracle/.ssh/
      # cat admin.pub >> ovmcli_authorized_keys
      
    7. SSH, as the admin user, to the Oracle VM Manager CLI:

      # ssh -l admin localhost -p 10000
      

      At the prompt to continue connecting, enter yes.

      At the prompt for the password, enter the admin user's password.

      The following shell is displayed:

      OVM>
      

    For subsequent logins, the newly established passwordless SSH channel is used.
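Steps 5 and 6 append the key with >>, so repeating the procedure adds duplicate lines to ovmcli_authorized_keys. The following sketch combines both steps and appends the key only when the exact line is not already present. The install_ovmcli_key function is a hypothetical helper, not an Oracle-provided script, and it assumes the file layout described in the steps above.

```shell
#!/bin/sh
# Hypothetical helper: steps 5 and 6 above, made safe to re-run.
# $1: public key file (for example ~/.ssh/admin.pub)
# $2: target .ssh directory (for example /home/oracle/.ssh)
install_ovmcli_key() {
    pubkey=$1
    dest=$2
    auth="$dest/ovmcli_authorized_keys"

    [ -r "$pubkey" ] || { echo "cannot read $pubkey" >&2; return 1; }
    # Step 5: copy the public key into the target .ssh directory.
    cp "$pubkey" "$dest/" || return 1

    # Step 6: append the key, but only if that exact line is not there yet.
    touch "$auth" || return 1
    key_line=$(cat "$pubkey")
    # -x: whole-line match; -F: fixed string (keys contain regex characters)
    if grep -qxF -- "$key_line" "$auth"; then
        echo "key already present in $auth"
    else
        printf '%s\n' "$key_line" >> "$auth"
        echo "key appended to $auth"
    fi
}
```

For example, after completing steps 1 to 4 as root on the Oracle VM Manager vServer, you might run install_ovmcli_key ~/.ssh/admin.pub /home/oracle/.ssh and then continue with step 7.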

4.3 Running Exachk in Silent Mode

You can run Exachk in silent mode by using the -s or -S command-line option:

./exachk -s

./exachk -S

Note:

When you run Exachk in silent mode, it does not perform health checks for storage nodes and InfiniBand switches.

Prerequisites

Ensure that the following prerequisites are met before running Exachk in silent mode:

  1. Configure SSH user equivalence for the oracle software owner, from the compute node on which Exachk will be staged to all the other compute nodes on which you will run the health-check tool.

    To verify SSH user equivalence, log in by using the oracle software owner credentials, and run the SSH command, as shown in the following example:

            $ ssh -o NumberOfPasswordPrompts=0 -o StrictHostKeyChecking=no -l oracle el01cn01 "echo \"oracle user equivalence is setup correctly\""
    

    In this example, oracle is the oracle software owner, and el01cn01 is the compute node hostname.

    If SSH user equivalence is not configured correctly on the compute nodes, the following message is displayed:

    Permission denied (publickey,gssapi-with-mic,password)
    

    For more information about configuring passwordless login, see the "Upgrading Multiple Nodes Simultaneously" section in My Oracle Support document 1446396.1.

  2. (required only for the -s option) Add the following line to the sudoers file on each compute node by using the visudo command:

            oracle ALL=(root) NOPASSWD:/tmp/root_exachk.sh 
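When several compute nodes are involved, the manual equivalence check in prerequisite 1 can be repeated in a loop. The following sketch runs the same ssh options against each node and reports which ones fail. The check_ssh_equivalence function is hypothetical, and the node names follow the examples in this chapter.

```shell
#!/bin/sh
# Hypothetical helper: repeat the SSH user-equivalence check for several nodes.
# $1: user to test (for example oracle); remaining arguments: node host names.
# Returns 0 only if every node accepts a login without a password prompt.
check_ssh_equivalence() {
    user=$1
    shift
    failed=0
    for node in "$@"; do
        # Same options as the manual check: never prompt for a password and
        # accept unknown host keys; -n keeps ssh from reading stdin.
        if ssh -n -o NumberOfPasswordPrompts=0 -o StrictHostKeyChecking=no \
               -l "$user" "$node" true >/dev/null 2>&1; then
            echo "OK   $node"
        else
            echo "FAIL $node"
            failed=1
        fi
    done
    return "$failed"
}
```

For example: check_ssh_equivalence oracle el01cn01 el01cn02 el01cn03 el01cn04. Fix every node reported as FAIL before you run Exachk in silent mode.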
    

4.4 Overriding Discovered Component Addresses

In a physical environment, the component IP addresses or host names are determined in the first run based on user input. In a virtual environment, Exachk has a built-in mechanism to automatically discover the IP addresses or host names of all the components. These features are designed to minimize the need for end-user input.

However, if the components were entered incorrectly during the first run or the autodiscovery mechanism fails to identify the components correctly, you can do the following to override the values:

  • If you are running Exachk from a compute node, do the following:

    • To override the names of the InfiniBand switches, edit (or create) the file o_ibswitches.out in the directory that contains the exachk binary. The file should contain a list of host names of the NM2-GW switches, each on a separate line.

    • To override the names of the storage components, edit (or create) the file o_storage.out in the directory that contains the exachk binary. The file should contain a list of host names of the storage heads, each on a separate line.

    • To override the names of the compute nodes, add the environment variable named RAT_CLUSTERNODES, and specify a list of the host names separated by a space, as the value of the variable.

      Example: export RAT_CLUSTERNODES="el01cn01 el01cn02 el01cn03 el01cn04"
      
  • If you are running Exachk from the vServer that hosts the Enterprise Controller component of the Exalogic Control stack, you must use a file named exachk_exalogic.conf to define the names of the components.

    The Exachk bundle contains the following templates for exachk_exalogic.conf in the templates subdirectory:

    • exachk_exalogic.conf.tmpl_full

    • exachk_exalogic.conf.tmpl_half

    • exachk_exalogic.conf.tmpl_quarter

    • exachk_exalogic.conf.tmpl_eight

    Copy the template that corresponds to the size of your Exalogic machine to the directory that contains the exachk binary, and rename the template file to exachk_exalogic.conf.

    Modify exachk_exalogic.conf to match your IP address schema.

    Note:

    Oracle recommends that you save a copy of the exachk_exalogic.conf file that Exachk generates the first time it runs on a fully populated and functional system, so that you can use the file later.
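The following sketch covers both approaches, run from the directory that contains the exachk binary. It assumes that the templates subdirectory sits in that same directory; write_override and use_conf_template are hypothetical helpers, not part of the Exachk bundle.

```shell
#!/bin/sh
# Run from the directory that contains the exachk binary.

# Compute node: write one host name per line into an override file.
# $1: file name (o_ibswitches.out or o_storage.out); remaining args: host names.
write_override() {
    file=$1
    shift
    printf '%s\n' "$@" > "$file"
}

# Enterprise Controller vServer: copy the template for the rack size
# (full, half, quarter, or eight) to exachk_exalogic.conf. Any existing
# exachk_exalogic.conf is kept as exachk_exalogic.conf.bak.
use_conf_template() {
    size=$1
    case $size in
        full|half|quarter|eight) ;;
        *) echo "unknown rack size: $size" >&2; return 1 ;;
    esac
    tmpl="templates/exachk_exalogic.conf.tmpl_$size"
    [ -f "$tmpl" ] || { echo "missing $tmpl" >&2; return 1; }
    if [ -f exachk_exalogic.conf ]; then
        cp exachk_exalogic.conf exachk_exalogic.conf.bak || return 1
    fi
    cp "$tmpl" exachk_exalogic.conf
}
```

For example, on a compute node you might run write_override o_ibswitches.out el01sw-ib01 el01sw-ib02 and write_override o_storage.out el01sn01 el01sn02, where el01sw-ib01 and el01sn01 are hypothetical names added alongside this chapter's examples. On the Enterprise Controller vServer, use_conf_template quarter copies the quarter-rack template, which you then edit to match your IP address schema.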