2.11 Using Oracle EXAchk on Oracle Big Data Appliance

Understand the features and learn to perform tasks specific to Oracle EXAchk on Oracle Big Data Appliance.

2.11.1 Scope and Supported Platforms for Running Oracle EXAchk on Oracle Big Data Appliance

Oracle EXAchk for Oracle Big Data Appliance supports all Oracle Big Data Appliance versions later than 2.0.1.

Oracle EXAchk for Oracle Big Data Appliance audits important configuration settings within an Oracle Big Data Appliance. Oracle EXAchk examines the following components:

  • CPU

  • Hardware, firmware, and BIOS

  • Operating System kernel parameters, system packages

  • Ethernet network, InfiniBand switches

  • RAM, hard disks

  • Software Installed

Goals for Oracle Big Data Appliance Health Checks

  1. Provide a mechanism to check the complete health of an Oracle Big Data Appliance on a proactive and reactive basis.

  2. Provide a “recommendation engine” for best practices and tips to fix Oracle Big Data Appliance known issues.

Recommended Validation Frequency

Oracle recommends validating Oracle Big Data Appliance immediately after initial deployment, before and after any change, and at least once a quarter as part of planned maintenance operations. The runtime duration of Oracle Autonomous Health Framework depends on the number of nodes to check, CPU load, network latency, and so on.

Note:

Plan to run Oracle EXAchk when there is less load on the Oracle Big Data Appliance. This helps you avoid runtime timeouts during health checks.
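One way to act on this note is to check the system load before starting a run. The following is a minimal sketch, not part of Oracle EXAchk: the function name `load_is_low` and the threshold value are illustrative assumptions, and it relies on the Linux `/proc/loadavg` file.

```shell
#!/bin/sh
# Sketch only: load_is_low and the threshold are illustrative assumptions,
# not Oracle EXAchk settings.

# load_is_low THRESHOLD [FILE]
# Succeeds (exit status 0) when the 1-minute load average in FILE
# (default: /proc/loadavg) is strictly below THRESHOLD.
load_is_low() {
    threshold=$1
    file=${2:-/proc/loadavg}
    # The first field of /proc/loadavg is the 1-minute load average.
    # Adding 0 forces a numeric comparison in every awk implementation.
    awk -v t="$threshold" 'NR == 1 { exit !(($1 + 0) < (t + 0)) }' "$file"
}

# Example use as root (the threshold 4 is an arbitrary illustration):
#   load_is_low 4 && exachk -a
```

A suitable threshold depends on the number of cores per node, so treat the value as a starting point rather than a recommendation.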

2.11.2 Running Compliance Checks on Oracle Big Data Using Oracle EXAchk

Run the exachk compliance -h command to view the list of options supported for Oracle Big Data Appliance.

Note:

Run Oracle EXAchk as root from node1 of the Oracle Big Data Appliance cluster.

Most of the data collection options require a password for each InfiniBand switch. The password is required only if SSH user equivalency is not configured between the compute node running Oracle EXAchk and the switch.

  1. To view the command options, run the following command as the root user or as a non-root user:
    exachk -h
    

Note:

If you run a profile that is not supported, then Oracle EXAchk returns an error as follows:

<profile_name> is not supported component. EXAchk will run generic checks for components identified from environment

For example, to perform all checks including the best practice checks and recommendations, run:
# exachk -a

Note:

If you do not specify any options, then Oracle EXAchk runs with the -a option by default.
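The requirement to run as root can be checked before a run starts. The following is a minimal sketch under stated assumptions: the `preflight` function and its return codes are hypothetical helpers, not part of Oracle EXAchk, and the sketch does not verify that the current host is node1.

```shell
#!/bin/sh
# Sketch only: preflight is a hypothetical helper, not an Oracle EXAchk command.

# preflight UID TOOL
# Returns 0 when UID is 0 (root) and TOOL is found on PATH.
# Returns 1 when UID is not root, and 2 when TOOL is missing.
preflight() {
    uid=$1
    tool=$2
    if [ "$uid" -ne 0 ]; then
        echo "Run this check as root." >&2
        return 1
    fi
    if ! command -v "$tool" >/dev/null 2>&1; then
        echo "$tool was not found on PATH." >&2
        return 2
    fi
    return 0
}

# Example use:
#   preflight "$(id -u)" exachk && exachk -a
```

Passing the user ID as an argument keeps the function easy to exercise without root privileges.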

2.11.3 Reviewing Oracle Big Data Compliance Checks Output

Identify the checks that you must act on immediately, and the checks that you must investigate further because they can cause performance or stability issues.

The following message statuses are specific to Oracle EXAchk on Oracle Big Data:

Table 2-11 Oracle EXAchk on Oracle Big Data Message Definitions

| Message Status | Description or Possible Impact | Action to be Taken |
|----------------|--------------------------------|--------------------|
| FAIL | Shows checks that did not pass due to issues. | Address the issue immediately. |
| WARNING | Shows checks that can cause performance or stability issues if not addressed. | Investigate the issue further. |
| INFO | Indicates information about the system. | Read the information displayed in these checks and follow the instructions provided, if any. |
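To triage a long list of results, it can help to count the checks by status. The following is a minimal sketch that assumes a plain-text listing in which each line starts with the message status. That input format and the function name `summarize_statuses` are assumptions made for illustration, not the actual Oracle EXAchk report format.

```shell
#!/bin/sh
# Sketch only: the input format (one check per line, status in the first
# field) is an assumption, not the documented Oracle EXAchk report layout.

# summarize_statuses
# Reads check lines on standard input and prints one summary line with
# the number of FAIL, WARNING, and INFO entries. Other statuses are ignored.
summarize_statuses() {
    awk '
        $1 == "FAIL"    { fail++ }
        $1 == "WARNING" { warn++ }
        $1 == "INFO"    { info++ }
        END { printf "FAIL=%d WARNING=%d INFO=%d\n", fail, warn, info }
    '
}

# Example use with a hypothetical text listing:
#   summarize_statuses < checks.txt
```

Counting FAIL entries first matches the order of urgency in the table: FAIL items need immediate action, WARNING items need investigation, and INFO items are informational.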

2.11.4 Troubleshooting Oracle EXAchk on Oracle Big Data Appliance

In addition to the base troubleshooting, the following guidance applies to Oracle EXAchk on Oracle Big Data Appliance.

If you face any problems running Oracle EXAchk, then create a service request through My Oracle Support.

Refer to My Oracle Support note 1643715.1 for the latest known issues specific to Oracle EXAchk on Oracle Big Data Appliance.

2.11.4.1 Timeouts Checking Switches

If the SSH connection to a switch is slow, then Oracle EXAchk times out and reports an error similar to the following:

Starting to run root privileged commands in background on INFINIBAND SWITCH <cluster>sw-ib1.

Timed out
Unable to create temp directory on <cluster>sw-ib1

Skipping root privileged commands on INFINIBAND SWITCH <cluster>sw-ib1 is available but SSH is blocked.

To resolve the issue, increase the SSH timeout using the Oracle EXAchk environment variable RAT_PASSWORDCHECK_TIMEOUT.

  1. Set and export the environment variable RAT_PASSWORDCHECK_TIMEOUT:
    # export RAT_PASSWORDCHECK_TIMEOUT=40
  2. Rerun Oracle EXAchk.
    # exachk -a
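The timeout can also be supplied for a single run instead of changing the whole shell session. The following is a minimal sketch: the wrapper name `with_switch_timeout` is a hypothetical helper, and only the variable name RAT_PASSWORDCHECK_TIMEOUT and the value 40 come from the steps above.

```shell
#!/bin/sh
# Sketch only: with_switch_timeout is a hypothetical wrapper, not an
# Oracle EXAchk command.

# with_switch_timeout VALUE COMMAND [ARGS...]
# Runs COMMAND with RAT_PASSWORDCHECK_TIMEOUT set to VALUE in its
# environment only. Returns 2 when VALUE is not a whole number.
with_switch_timeout() {
    timeout_value=$1
    shift
    case $timeout_value in
        ''|*[!0-9]*)
            echo "Timeout must be a whole number." >&2
            return 2
            ;;
    esac
    # env places the variable in the environment of the child process,
    # so the calling shell is left unchanged.
    env RAT_PASSWORDCHECK_TIMEOUT="$timeout_value" "$@"
}

# Example use as root:
#   with_switch_timeout 40 exachk -a
```

The variable must reach the environment of the exachk process. A shell variable that is assigned but not exported is not visible to child processes.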