This section describes the environment variables that you can set for Exachk.
Note:
To see the Exachk-related environment variables that are already configured on the system, run the following command:
export | grep RAT
To prevent a run from hanging, Exachk automatically terminates commands that exceed their default timeouts. On a busy system, this means that Exachk kills a command when the target of the check does not respond within the default timeout. Use the following environment variables to extend the default timeouts:
Note:
If a command times out, analyze the cause of the timeout and adjust the timeout values in the environment variables accordingly. Timeouts result in missing data, which limits the value of the tool. To avoid frequent timeouts, run the tool when the load on the system is lowest.
Table 4-1 Setting Environment Variables for Runtime Command Timeouts
Environment Variables | Default Timeout | Description | Example |
---|---|---|---|
RAT_TIMEOUT | 90 seconds | Timeout for non-root commands. If the default timeout for a non-root command is exceeded, Exachk terminates the command. | Set the variable in the script execution environment, as follows: export RAT_TIMEOUT=120 |
RAT_ROOT_TIMEOUT | 300 seconds | The tool executes a set of root commands. If the default timeout for a root command is exceeded, Exachk terminates the command. | Set the variable in the script execution environment, as follows: export RAT_ROOT_TIMEOUT=600 |
RAT_PASSWORDCHECK_TIMEOUT | 10 seconds | During SSH login, if communication between the remote target and the DNS server is delayed, the login or password validation operation might time out. This can cause password validation to fail, or other timeouts to appear in the log. | Set the variable as follows: export RAT_PASSWORDCHECK_TIMEOUT=10 |
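As a minimal sketch, the three timeout variables can be set together in the shell that launches Exachk and then checked. The values below are illustrative assumptions, not recommendations; RAT_PASSWORDCHECK_TIMEOUT is raised above its 10-second default only to show the pattern.

```shell
# Extend the runtime command timeouts for Exachk runs started from this shell.
# The values are illustrative; size them to the load on your system.
export RAT_TIMEOUT=120               # non-root commands (default: 90 seconds)
export RAT_ROOT_TIMEOUT=600          # root commands (default: 300 seconds)
export RAT_PASSWORDCHECK_TIMEOUT=30  # SSH login/password validation (default: 10 seconds)

# Confirm which Exachk-related variables are now set.
env | grep '^RAT_'

# To apply a value to a single run only, prefix the command instead:
#   RAT_ROOT_TIMEOUT=900 ./exachk
```

Exported values persist for the rest of the shell session; the prefix form affects only the one command it precedes.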
Exachk attempts to derive all the data it needs from the environment in which it runs. However, the tool sometimes does not work as expected because of local system variations. In such cases, you can use local environment variables to override the default behavior of Exachk.
Table 4-2 lists the environment variables that you can set to address local issues.
Note:
In a virtual configuration, when running Exachk from the vServer that hosts the Enterprise Controller component of the Exalogic Control stack, do not use the RAT_CELLS, RAT_SWITCHES, and RAT_CLUSTERNODES variables (described in Table 4-2) to override the storage nodes, switches, and compute nodes on which Exachk performs health checks. Instead, use the exachk_exalogic.conf file, as described in Section 4.4.
Table 4-2 Setting Environment Variables for Local Issues
| Environment Variables | Description | Example |
|---|---|---|
| RAT_OS | Enables the utility to verify the platform information. | For a 64-bit Oracle Enterprise Linux 5 machine with x86 architecture, use the following command to set the RAT_OS variable: For a 64-bit Oracle Solaris 11 machine with x86 architecture, use the following command to set the RAT_OS variable: |
| | Redirects Exachk to the default secure shell location. | |
| | Redirects Exachk to the default secure copy (scp) location. | |
| | If set to 1, directs Exachk to perform health checks on only the compute node from which Exachk is run; that is, Exachk skips the checks for the storage nodes, the switches, and all the compute nodes other than the one from which it is run. | To direct Exachk to perform health checks on only the compute node from which Exachk is run, use the following command: |
| RAT_CELLS | Directs Exachk to run checks on one of the two storage nodes. If the names of the storage nodes are non-standard, edit the o_storage.out file. | To direct Exachk to run checks on the second storage node, use the following command: |
| RAT_SWITCHES | Directs Exachk to run checks on subsets of the InfiniBand switches, in addition to the default checks on the InfiniBand switches. If the names of the switches are non-standard, edit the o_ibswitches.out file. | To direct Exachk to run checks on a specific InfiniBand switch, use the following command: |
| RAT_CLUSTERNODES | Directs Exachk to run checks on specific compute nodes. | On a quarter rack, which has eight compute nodes, use the following command to list the compute nodes on which the health check should be performed: export RAT_CLUSTERNODES="el01cn01 el01cn02 el01cn03 el01cn04 el01cn05 el01cn06 el01cn07 el01cn08" |
| RAT_ELRACKTYPE | Indicates the size of the Exalogic rack (eighth, quarter, half, or full rack). | To specify that the system is a full rack, use the following command: export RAT_ELRACKTYPE="3" |
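Typing a long RAT_CLUSTERNODES list by hand is error-prone. The following sketch builds the list in a loop, assuming compute-node host names that follow a prefix-plus-two-digit pattern like the el01cn01 to el01cn08 example above; the prefix and count variables are hypothetical inputs to adjust for your rack.

```shell
# Build the space-separated RAT_CLUSTERNODES value for compute nodes named
# <prefix><two-digit number>. "el01cn" and 8 (a quarter rack) are assumptions.
prefix=el01cn
count=8

nodes=""
i=1
while [ "$i" -le "$count" ]; do
  nodes="$nodes $(printf '%s%02d' "$prefix" "$i")"   # el01cn01, el01cn02, ...
  i=$((i + 1))
done

export RAT_CLUSTERNODES="${nodes# }"   # strip the leading space
echo "RAT_CLUSTERNODES=$RAT_CLUSTERNODES"
```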
You can run Exachk with the following command-line options:
Note:
Only the options documented in the following table apply to Exalogic.
Option | Purpose and Syntax |
---|---|
-clusternodes | Perform checks on only the specified compute nodes and on all the other components; exclude the unspecified compute nodes. Syntax: ./exachk -clusternodes cn_1[,cn_2,...] |
-diff | Compare two Exachk HTML reports and generate an HTML report that shows the changes in the health of the Exalogic rack between Exachk runs. For more information, see Section 5.3. Syntax: ./exachk -diff report1 report2 [-outfile compared_report.html] |
-exadiff | Compare two Exachk zip collections and generate an HTML report that shows the differences in the versions of the infrastructure components (hardware, firmware, and software) between the two collections. The two collections can be from different Exalogic racks, or from the same rack at different points in time, such as before and after an upgrade. For more information, see Section 5.5. Syntax: ./exachk -exadiff exachk_collection_zip_1 exachk_collection_zip_2 |
-f | Perform checks on already collected data. Syntax: ./exachk -f report_name |
-vmguest | Perform checks for guest vServers as well. For more information, see Section 3.3. Syntax: ./exachk -vmguest conf_file_1[,conf_file_2,...] |
-hybrid | Perform checks on the physical nodes as well, in a hybrid rack. For more information, see Section 3.2.4. Syntax: ./exachk -hybrid |
-localonly | Perform checks for only the host on which Exachk is running. Syntax: ./exachk -localonly |
-nopass | Exclude passed checks from the HTML report. Syntax: ./exachk -nopass |
-o v | Display results for all checks, including those that passed. Syntax: ./exachk -o v |
-phy | Use this option with -hybrid to specify the physical nodes in a hybrid rack. For more information, see Section 3.2.4. Syntax: ./exachk -hybrid -phy node_1[,node_2,...] |
-profile | Perform specific checks or checks for specific components. For information about the profiles that you can specify, see "Supported Profiles for the -profile Option" after this table. Syntax: ./exachk -profile profile_name |
-s or -S | Run Exachk in silent mode. For more information, see Section 4.3. Syntax: ./exachk -s or ./exachk -S |
-v | Display the version of the tool. Syntax: ./exachk -v |
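The RAT_CLUSTERNODES variable takes a space-separated list, whereas the -clusternodes option takes a comma-separated one. A small hypothetical helper (nodes_to_csv) can convert between the two; this sketch prints the resulting command instead of running it, so you can review it first.

```shell
# Hypothetical helper: convert a space-separated node list (the
# RAT_CLUSTERNODES format) into the comma-separated list that
# -clusternodes expects. tr -s squeezes runs of spaces into one comma.
nodes_to_csv() {
  printf '%s\n' "$1" | tr -s ' ' ','
}

csv=$(nodes_to_csv "el01cn01 el01cn02 el01cn03")
echo "./exachk -clusternodes $csv"
```

For example, nodes_to_csv "$RAT_CLUSTERNODES" reuses a list that is already exported.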
Supported Profiles for the -profile Option
Profile | Description |
---|---|
control_VM | Run health checks for only the Exalogic Control components. |
el_extensive | In addition to the standard set of checks, run additional checks that are useful for a freshly installed or upgraded machine. Note: Before running Exachk with the el_extensive profile, verify that passwordless SSH to the Oracle VM Manager CLI is enabled, as described in "Verifying and Enabling Passwordless SSH to the Oracle VM Manager CLI". |
switch | Run checks for the switches. |
virtual_infra | Run checks for the Exalogic virtual infrastructure. This profile applies only to Exalogic machines in a virtual configuration. |
zfs | Run checks for the storage appliance. |
Verifying and Enabling Passwordless SSH to the Oracle VM Manager CLI
Before running Exachk with the el_extensive profile, you must verify whether passwordless SSH has been enabled for the CLI shell of Oracle VM Manager. To do this, try logging in via SSH to the Oracle VM Manager CLI shell by running the following command on the host running the Oracle VM Manager vServer:
# ssh -l admin host_name_of_localhost -p 10000
In this command, host_name_of_localhost is the host name of the local host.
If you can log in without entering a password (that is, the OVM> prompt is displayed), passwordless SSH is enabled.
If a password prompt is displayed, do the following:
Enter the password for the admin user (default: welcome1).
Log out from the OVM> shell, and try logging in again via SSH.
If the password prompt continues to be displayed, then passwordless SSH is not enabled. To enable passwordless SSH to the Oracle VM Manager CLI, complete the following steps:
SSH, as root, to the vServer that hosts Oracle VM Manager.
Ensure that ssh-agent is running:
# eval `ssh-agent`
The output is similar to the following example:
Agent pid 18529
Generate a public/private key pair:
# ssh-keygen -t rsa -f ~/.ssh/admin
When prompted for a passphrase, press Enter (at both the passphrase prompt and the confirmation prompt).
The keys are generated and stored in the ~/.ssh/ directory: the admin file contains the private key, and the admin.pub file contains the public key.
Add the private key to the authentication agent:
# ssh-add ~/.ssh/admin
Identity added: /root/.ssh/admin (/root/.ssh/admin)
If ssh-agent is not running, ssh-add displays the following error message instead:
Could not open a connection to your authentication agent.
Copy the public key to the .ssh directory in the oracle user's home directory:
# cp ~/.ssh/admin.pub /home/oracle/.ssh/
Append the file that contains the public key (admin.pub) to the ovmcli_authorized_keys file:
# cd /home/oracle/.ssh/
# cat admin.pub >> ovmcli_authorized_keys
SSH, as the admin user, to the Oracle VM Manager CLI:
# ssh -l admin localhost -p 10000
At the prompt to continue connecting, enter yes.
At the password prompt, enter the admin user's password.
The following shell is displayed:
OVM>
For subsequent logins, the newly established passwordless SSH channel is used.
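Running the append step more than once with the same key adds duplicate lines to ovmcli_authorized_keys. A minimal sketch of an idempotent alternative, in which add_ovmcli_key is a hypothetical helper; its second argument defaults to /home/oracle/.ssh, as in the steps above, and can point elsewhere for testing:

```shell
# Hypothetical helper: append a one-line public key to ovmcli_authorized_keys
# only if that exact line is not already present.
add_ovmcli_key() {
  pub="$1"                          # e.g. /root/.ssh/admin.pub
  dir="${2:-/home/oracle/.ssh}"     # oracle user's .ssh directory by default
  auth="$dir/ovmcli_authorized_keys"

  key=$(cat "$pub") || return 1
  touch "$auth"                     # create the file if it does not exist yet

  # -x: match whole lines, -F: treat the key as a fixed string, not a regex
  if grep -qxF -- "$key" "$auth"; then
    echo "Key already present in $auth"
  else
    printf '%s\n' "$key" >> "$auth"
    echo "Key appended to $auth"
  fi
}

# Example: add_ovmcli_key ~/.ssh/admin.pub
```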
You can run Exachk in silent mode by using the -s or -S command-line option:
./exachk -s
./exachk -S
Note:
When you run Exachk in silent mode, it does not perform health checks for storage nodes and InfiniBand switches.
Prerequisites
Ensure that the following prerequisites are met before running Exachk in silent mode:
Configure SSH user equivalence for the root user, from the compute node on which Exachk will be staged to all the other compute nodes on which you will run the health-check tool.
To verify SSH user equivalence, log in by using the oracle software owner credentials, and run the ssh command, as shown in the following example:
$ ssh -o NumberOfPasswordPrompts=0 -o StrictHostKeyChecking=no -l oracle el01cn01 "echo \"oracle user equivalence is setup correctly\""
In this example, oracle is the software owner user, and el01cn01 is the host name of the compute node.
If SSH user equivalence is not configured correctly on the compute nodes, the following message is displayed:
Permission denied (publickey,gssapi-with-mic,password)
For more information about configuring passwordless login, see the "Upgrading Multiple Nodes Simultaneously" section in My Oracle Support document 1446396.1.
(Required only for the -s option) Add the following line to the sudoers file on each compute node by using the visudo command:
oracle ALL=(root) NOPASSWD:/tmp/root_exachk.sh
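To check user equivalence for every compute node rather than one at a time, the single-node ssh command shown in the prerequisites can be wrapped in a loop. A minimal sketch, in which check_user_equivalence is a hypothetical helper and SSH_CMD is a hypothetical hook (defaulting to ssh) that exists only so the loop can be exercised without real hosts; it runs true on the remote side instead of echo:

```shell
# Hypothetical helper: check passwordless SSH from this node to each listed node.
# Usage: check_user_equivalence <user> <node> [<node> ...]
# Returns 0 only if every node accepts the login without a password prompt.
check_user_equivalence() {
  user="$1"; shift
  failed=0
  for node in "$@"; do
    # Same options as the single-node example: never prompt for a password,
    # and accept unknown host keys automatically.
    if ${SSH_CMD:-ssh} -o NumberOfPasswordPrompts=0 -o StrictHostKeyChecking=no \
         -l "$user" "$node" true > /dev/null 2>&1; then
      echo "OK:     $user@$node"
    else
      echo "FAILED: $user@$node"
      failed=1
    fi
  done
  return "$failed"
}

# Example: check_user_equivalence oracle el01cn01 el01cn02 el01cn03 el01cn04
```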
In a physical environment, the IP addresses or host names of the components are determined during the first run, based on user input. In a virtual environment, Exachk has a built-in mechanism that automatically discovers the IP addresses or host names of all the components. These features are designed to minimize the need for end-user input.
However, if the components were entered incorrectly during the first run, or if the autodiscovery mechanism fails to identify the components correctly, you can override the values as follows:
If you are running Exachk from a compute node, do the following:
To override the names of the InfiniBand switches, edit (or create) the file o_ibswitches.out in the directory that contains the exachk binary. The file should contain the host names of the NM2-GW switches, one per line.
To override the names of the storage components, edit (or create) the file o_storage.out in the directory that contains the exachk binary. The file should contain the host names of the storage heads, one per line.
To override the names of the compute nodes, set the environment variable RAT_CLUSTERNODES to a space-separated list of the host names.
Example: export RAT_CLUSTERNODES="el01cn01 el01cn02 el01cn03 el01cn04"
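Both override files are plain text with one host name per line. A minimal sketch that writes them, assuming a hypothetical write_override helper, a hypothetical EXACHK_DIR variable for the directory that contains the exachk binary, and hypothetical host names; note that it overwrites any existing file:

```shell
# Hypothetical helper: write one host name per line to the given file.
# printf reuses its format for each remaining argument. Overwrites the file.
write_override() {
  file="$1"; shift
  printf '%s\n' "$@" > "$file"
}

# Hypothetical: directory that contains the exachk binary (defaults to the current one).
EXACHK_DIR=${EXACHK_DIR:-.}

# Hypothetical host names; replace them with the names used in your rack.
write_override "$EXACHK_DIR/o_ibswitches.out" el01gw01 el01gw02
write_override "$EXACHK_DIR/o_storage.out" el01sn01 el01sn02

cat "$EXACHK_DIR/o_ibswitches.out" "$EXACHK_DIR/o_storage.out"
```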
If you are running Exachk from the vServer that hosts the Enterprise Controller component of the Exalogic Control stack, you must use a file named exachk_exalogic.conf to define the names of the components.
The Exachk bundle contains the following templates for exachk_exalogic.conf in the templates subdirectory:
exachk_exalogic.conf.tmpl_full
exachk_exalogic.conf.tmpl_half
exachk_exalogic.conf.tmpl_quarter
exachk_exalogic.conf.tmpl_eight
Copy the template that corresponds to the size of your Exalogic machine to the directory that contains the exachk binary, and rename the copy to exachk_exalogic.conf.
Modify exachk_exalogic.conf to match your IP address schema.
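The copy-and-rename step can be scripted as follows. This is a sketch under stated assumptions: install_conf_template is a hypothetical helper, the templates subdirectory is assumed to sit in the directory that contains the exachk binary, and the rack-size argument is the suffix of the template file name (full, half, quarter, or eight). It refuses to overwrite an existing exachk_exalogic.conf, such as one that Exachk generated earlier.

```shell
# Hypothetical helper: create exachk_exalogic.conf from the matching template.
# Usage: install_conf_template <full|half|quarter|eight> [<exachk_dir>]
install_conf_template() {
  size="$1"
  dir="${2:-.}"   # directory that contains the exachk binary (assumed)

  case "$size" in
    full|half|quarter|eight) ;;
    *) echo "Unknown rack size: $size (expected full, half, quarter, or eight)" >&2
       return 1 ;;
  esac

  src="$dir/templates/exachk_exalogic.conf.tmpl_$size"
  dst="$dir/exachk_exalogic.conf"

  # Do not clobber an existing configuration file.
  if [ -e "$dst" ]; then
    echo "$dst already exists; back it up or remove it first" >&2
    return 1
  fi

  cp "$src" "$dst" && echo "Created $dst from $src"
}

# Example: install_conf_template quarter /path/to/exachk_dir
```

After copying, edit the new file to match your IP address schema.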
Note:
Oracle recommends that you make a copy of the exachk_exalogic.conf file that Exachk generates the first time it runs on a fully populated and functional system, so that you can reuse the file later.
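One way to follow this recommendation is to keep timestamped copies. In this sketch, backup_exachk_conf is a hypothetical helper, and its directory argument defaults to the current directory:

```shell
# Hypothetical helper: copy exachk_exalogic.conf to a timestamped file,
# preserving mode and timestamps (-p), and print the name of the copy.
backup_exachk_conf() {
  dir="${1:-.}"   # directory that contains the exachk binary (assumed)
  src="$dir/exachk_exalogic.conf"
  dst="$src.$(date +%Y%m%d-%H%M%S)"
  cp -p "$src" "$dst" && echo "$dst"
}

# Example: backup_exachk_conf /path/to/exachk_dir
```

To restore a copy later, copy it back over exachk_exalogic.conf, for example cp -p exachk_exalogic.conf.20240101-120000 exachk_exalogic.conf (the timestamp shown is illustrative).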