4 Cluster

This chapter provides information about the Cluster metrics.

For each metric, it provides the following information:

  • Description

  • Metric table

    The metric table can include some or all of the following: target version, default collection frequency, default warning threshold, default critical threshold, and alert text.

Clusterware

The metrics in this metric category provide an overview of the clusterware status for this cluster, how many nodes in this cluster have problems, and the Cluster Verification (CLUVFY) utility output for all the nodes of this cluster. Generally, the clusterware is up if the clusterware on at least one host is up.

Cluster Verification Output

This metric shows the CLUVFY output of clusterware for all nodes of this cluster.

Data Source

The following command is the data source for this metric, where node1, node2 ... is the node list for the cluster:

cluvfy comp crs -n node1, node2 ...
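As a minimal sketch of how the command above could be assembled for an arbitrary node list: the helper name `build_cluvfy_command` is hypothetical, and the sketch assumes the node list is passed to `-n` as a single comma-separated argument.

```python
import shlex


def build_cluvfy_command(nodes):
    """Return the argument list for a CRS component check on the given nodes."""
    if not nodes:
        raise ValueError("at least one node name is required")
    # Assumption: the node list is one comma-separated argument after -n.
    return ["cluvfy", "comp", "crs", "-n", ",".join(nodes)]


if __name__ == "__main__":
    # Print the command line instead of running it, so the sketch works
    # on a machine without Oracle Clusterware installed.
    print(shlex.join(build_cluvfy_command(["node1", "node2"])))
```

Returning an argument list rather than a single string keeps the command safe to hand to `subprocess.run` without shell quoting.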

User Action

Search for the Cluster Verification (CLUVFY) utility in the Oracle Clusterware Administration and Deployment Guide.

Clusterware Status

This metric shows the overall clusterware status for this cluster. The clusterware is up if the clusterware on at least one host is up.

Target Version: 10gR2, 11g, 12c
Evaluation and Collection Frequency: Every 5 Minutes
Default Warning Threshold: 2
Default Critical Threshold: 0
Alert Text: Clusterware has problems on the master agent host %CRS_output%
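The "up if at least one host is up" rule can be illustrated with a short sketch. The function names and the shape of the input (a mapping from host name to a boolean) are assumptions for illustration only, and the sketch does not model how the numeric thresholds are evaluated.

```python
def clusterware_is_up(host_status):
    """Return True if clusterware is up on at least one host.

    host_status maps a host name to True (up) or False (down).
    """
    return any(host_status.values())


def hosts_with_problems(host_status):
    """Return the sorted names of hosts where clusterware is not up."""
    return sorted(host for host, up in host_status.items() if not up)


if __name__ == "__main__":
    status = {"node1": False, "node2": True}  # hypothetical sample data
    print(clusterware_is_up(status), hosts_with_problems(status))
```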

Data Source

The following command is the data source for this metric, where node1, node2 ... is the node list for the cluster:

cluvfy comp crs -n node1, node2 ...

User Action

Search for the Cluster Verification (CLUVFY) utility in the Oracle Clusterware Administration and Deployment Guide.

Alert Log Metrics

The metrics in this metric category provide details about the Cluster Alert Log metrics.

There are two Alert Log metric groups in this category:

  • Alert Log Error: This metric group de-duplicates errors that recur over a period of time and raises a single alert for the same underlying issue. It is enabled by default.

  • Clusterware Alert Log: This metric group uses the older collection method and raises an alert for every occurrence of an error. It is disabled by default.
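The difference between the two groups can be sketched as follows. This is only an illustration of de-duplication over a time window, not the product's actual algorithm: the function name, the 30-minute window, and the sample events are all assumptions.

```python
from datetime import datetime, timedelta


def deduplicate(events, window=timedelta(minutes=30)):
    """Return one alert per error code per window.

    events is an iterable of (timestamp, error_code) pairs sorted by
    timestamp. A repeated code raises a new alert only when at least
    `window` has passed since the last alert for that code.
    """
    last_alert = {}
    alerts = []
    for timestamp, code in events:
        previous = last_alert.get(code)
        if previous is None or timestamp - previous >= window:
            alerts.append((timestamp, code))
            last_alert[code] = timestamp
    return alerts


if __name__ == "__main__":
    start = datetime(2024, 1, 1, 12, 0)  # hypothetical sample data
    events = [
        (start, "CRS-1607"),
        (start + timedelta(minutes=5), "CRS-1607"),
        (start + timedelta(minutes=45), "CRS-1607"),
    ]
    print(len(events), "occurrences,", len(deduplicate(events)), "alerts")
```

Raising an alert for every occurrence, as the Clusterware Alert Log group does, corresponds to returning `events` unchanged.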

Note:

Both the Alert Log Error and Clusterware Alert Log metric groups read from the same alert log location, which depends on the target version:

Target Version    Alert Log File
10gR2, 11gR1      %OracleHome%/log/%NodeName%/alert%NodeName%.log
11gR2             %OracleHome%/log/%NodeName%/alert%NodeName%.log
12c, 12cR2        %AdrHome%/trace/alert.log
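The version-to-path mapping above can be expressed as a small lookup. This is a sketch: the function name and the directory values in the usage example are hypothetical, and only the path templates come from the table.

```python
ALERT_LOG_TEMPLATES = {
    "10gR2": "%OracleHome%/log/%NodeName%/alert%NodeName%.log",
    "11gR1": "%OracleHome%/log/%NodeName%/alert%NodeName%.log",
    "11gR2": "%OracleHome%/log/%NodeName%/alert%NodeName%.log",
    "12c": "%AdrHome%/trace/alert.log",
    "12cR2": "%AdrHome%/trace/alert.log",
}


def alert_log_path(version, **values):
    """Fill the %Name% placeholders of the template for a target version."""
    path = ALERT_LOG_TEMPLATES[version]
    for name, value in values.items():
        path = path.replace(f"%{name}%", value)
    return path


if __name__ == "__main__":
    # The Oracle home and node name below are made-up example values.
    print(alert_log_path("11gR2", OracleHome="/u01/app/grid", NodeName="node1"))
```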

Alert Log Error

The metrics in this metric category provide details about the Alert Log Error metrics.

Clusterware Service Alert Log Error

This metric collects certain error messages in the CRS alert log at the cluster level.

Target Version: 10gR2, 11gR1
Evaluation and Collection Frequency: Every 5 Minutes
Default Warning Threshold: CRS-1601
Default Critical Threshold: Not Defined
Alert Text: %clusterwareErrStack% See %alertLogName% for details.

Target Version: 11gR2, 12c
Evaluation and Collection Frequency: Every 5 Minutes
Default Warning Threshold: CRS-(8011|8013|8014|8015)
Default Critical Threshold: Not Defined
Alert Text: %clusterwareErrStack% See %alertLogName% for details.

Note:

Do not modify the default warning and critical thresholds for this metric.
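To show how the warning thresholds above read as patterns, the following sketch tests a log line against the threshold for a target version. It assumes the thresholds are regular expressions searched within each alert log line; the function name and the sample log lines are hypothetical.

```python
import re

# Default warning thresholds for Clusterware Service Alert Log Error.
WARNING_PATTERNS = {
    "10gR2": r"CRS-1601",
    "11gR1": r"CRS-1601",
    "11gR2": r"CRS-(8011|8013|8014|8015)",
    "12c": r"CRS-(8011|8013|8014|8015)",
}


def is_warning(version, log_line):
    """Return True if the log line matches the version's warning threshold."""
    return re.search(WARNING_PATTERNS[version], log_line) is not None


if __name__ == "__main__":
    line = "CRS-8013: sample message text"  # made-up log line
    print(is_warning("11gR2", line), is_warning("10gR2", line))
```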

Node Configuration Alert Log Error

This metric collects CRS-1607, 1802, 1803, 1804, and 1805 messages from the CRS alert log at the cluster level and issues alerts based on the error code.

Target Version: 10gR2, 11gR1
Evaluation and Collection Frequency: Every 5 Minutes
Default Warning Threshold: CRS-180(2|3|4|5)
Default Critical Threshold: CRS-1607
Alert Text: %nodeErrStack% See %alertLogName% for details.

Target Version: 11gR2, 12c
Evaluation and Collection Frequency: Every 5 Minutes
Default Warning Threshold: Not Defined
Default Critical Threshold: CRS-1607
Alert Text: %nodeErrStack% See %alertLogName% for details.

Note:

Do not modify the default warning and critical thresholds for this metric.
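The way an alert severity follows from the error code can be sketched as below. The sketch assumes each threshold is a regular expression searched within a log line and that a critical match takes precedence over a warning match; the names and sample lines are hypothetical.

```python
import re

# Default thresholds for Node Configuration Alert Log Error.
# None stands for "Not Defined".
NODE_CONFIG_THRESHOLDS = {
    "10gR2": {"warning": r"CRS-180(2|3|4|5)", "critical": r"CRS-1607"},
    "11gR1": {"warning": r"CRS-180(2|3|4|5)", "critical": r"CRS-1607"},
    "11gR2": {"warning": None, "critical": r"CRS-1607"},
    "12c": {"warning": None, "critical": r"CRS-1607"},
}


def classify(version, log_line):
    """Return "critical", "warning", or None for one alert log line."""
    thresholds = NODE_CONFIG_THRESHOLDS[version]
    for severity in ("critical", "warning"):
        pattern = thresholds[severity]
        if pattern is not None and re.search(pattern, log_line):
            return severity
    return None


if __name__ == "__main__":
    for version in ("10gR2", "11gR2"):
        print(version, classify(version, "CRS-1803: sample message"))
```

With these defaults, a CRS-1803 message raises a warning on 10gR2 and 11gR1 targets but no alert on 11gR2 and 12c targets, where no warning threshold is defined.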

OCR Alert Log Error

This metric collects CRS-1001, 1002, 1003, 1004, 1005, 1006, 1007, 1008, 1010, and 1011 messages from the CRS alert log at the cluster level and issues alerts based on the error code.

Target Version: 10gR2, 11gR1
Evaluation and Collection Frequency: Every 5 Minutes
Default Warning Threshold: CRS-100(1|2|3|4|5|7)
Default Critical Threshold: CRS-(1006|1008|1010|1011)
Alert Text: %ocrErrStack% See %alertLogName% for details.

Note:

Do not modify the default warning and critical thresholds for this metric.

Voting Disk Alert Log Error

This metric collects CRS-1604, 1605, and 1606 messages from the CRS alert log at the cluster level and issues alerts based on the error code.

Target Version: 10gR2, 11gR1
Evaluation and Collection Frequency: Every 5 Minutes
Default Warning Threshold: Not Defined
Default Critical Threshold: CRS-160(4|5|6)
Alert Text: %votingErrStack% See %alertLogName% for details.

Target Version: 11gR2, 12c
Evaluation and Collection Frequency: Every 5 Minutes
Default Warning Threshold: Not Defined
Default Critical Threshold: CRS-160(4|5|6)
Alert Text: %votingErrStack% See %alertLogName% for details.

Note:

Do not modify the default warning and critical thresholds for this metric.
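The alert text for these metrics is a template with %name% placeholders. The following sketch shows one way such a template could be filled in; the function name and the sample values are hypothetical, and the actual substitution is performed by the monitoring product.

```python
import re


def render_alert_text(template, values):
    """Replace each %name% placeholder that has a value; keep the rest."""
    def substitute(match):
        # Unknown placeholders are left untouched rather than removed.
        return values.get(match.group(1), match.group(0))

    return re.sub(r"%(\w+)%", substitute, template)


if __name__ == "__main__":
    template = "%votingErrStack% See %alertLogName% for details."
    values = {
        "votingErrStack": "CRS-1604: sample message",  # made-up value
        "alertLogName": "/tmp/alert.log",              # made-up value
    }
    print(render_alert_text(template, values))
```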

Clusterware Alert Log Metric

The metrics in this metric category provide details about the Cluster Alert Log metrics.

Clusterware Service Alert Log Error

This metric collects certain error messages in the CRS alert log at the cluster level.

Target Version: 10gR2, 11gR1
Default Warning Threshold: CRS-1601
Default Critical Threshold: Not Defined
Alert Text: %clusterwareErrStack% See %alertLogName% for details.

Target Version: 11gR2, 12c
Default Warning Threshold: CRS-(8011|8013|8014|8015)
Default Critical Threshold: Not Defined
Alert Text: %clusterwareErrStack% See %alertLogName% for details.

Note:

Do not modify the default warning and critical thresholds for this metric.

Node Configuration Alert Log Error

This metric collects CRS-1607, 1802, 1803, 1804, and 1805 messages from the CRS alert log at the cluster level and issues alerts based on the error code.

Target Version: 10gR2, 11gR1
Default Warning Threshold: CRS-180(2|3|4|5)
Default Critical Threshold: CRS-1607
Alert Text: %nodeErrStack% See %alertLogName% for details.

Target Version: 11gR2, 12c
Default Warning Threshold: Not Defined
Default Critical Threshold: CRS-1607
Alert Text: %nodeErrStack% See %alertLogName% for details.

Note:

Do not modify the default warning and critical thresholds for this metric.

OCR Alert Log Error

This metric collects CRS-1001, 1002, 1003, 1004, 1005, 1006, 1007, 1008, 1010, and 1011 messages from the CRS alert log at the cluster level and issues alerts based on the error code.

Target Version: 10gR2, 11gR1
Default Warning Threshold: CRS-100(1|2|3|4|5|7)
Default Critical Threshold: CRS-(1006|1008|1010|1011)
Alert Text: %ocrErrStack% See %alertLogName% for details.

Note:

Do not modify the default warning and critical thresholds for this metric.

Voting Disk Alert Log Error

This metric collects CRS-1604, 1605, and 1606 messages from the CRS alert log at the cluster level and issues alerts based on the error code.

Target Version: 10gR2, 11gR1
Default Warning Threshold: Not Defined
Default Critical Threshold: CRS-160(4|5|6)
Alert Text: %votingErrStack% See %alertLogName% for details.

Target Version: 11gR2, 12c
Default Warning Threshold: Not Defined
Default Critical Threshold: CRS-160(4|5|6)
Alert Text: %votingErrStack% See %alertLogName% for details.

Note:

Do not modify the default warning and critical thresholds for this metric.

QoS Events

The metrics in this metric category provide information about the Quality of Service (QoS) events.

Compliance State

For a database to be managed by Oracle Database QoS Management, the database must be compliant.

Target Version: 11gR2, 12c
Evaluation and Collection Frequency: -
Default Warning Threshold: Not Defined
Default Critical Threshold: NOT_COMPLIANT
Alert Text: Server pool %wlm_entity_name% has a violation. Refer to the Grid Operations Manager log for details.

Memory Pressure Analysis Risk State

Oracle Database QoS Management detects memory pressure on a server in real time and redirects new sessions to other servers to prevent using all available memory on the stressed server.

This metric indicates that the database server is experiencing memory pressure.

Target Version: 11gR2, 12c
Evaluation and Collection Frequency: -
Default Warning Threshold: RED
Default Critical Threshold: Not Defined
Alert Text: Server %wlm_server% is under elevated memory pressure and services on all instances on this server will be stopped.

QoSM State Change

This metric displays the reason for a change in the Oracle Database QoS Management state.

Target Version: 11gR2, 12c
Evaluation and Collection Frequency: -
Default Warning Threshold: USER_DISABLED
Default Critical Threshold: EXCEPTION_DISABLED
Alert Text: QoSM service is disabled due to %wlm_qosm_state%.
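The default thresholds of the three QoS Events metrics can be summarized in one lookup, sketched below. It assumes a threshold triggers when the metric value equals it exactly; the function name is hypothetical.

```python
# Default thresholds for the QoS Events metrics; a missing key
# stands for "Not Defined".
QOS_THRESHOLDS = {
    "Compliance State": {"critical": "NOT_COMPLIANT"},
    "Memory Pressure Analysis Risk State": {"warning": "RED"},
    "QoSM State Change": {
        "warning": "USER_DISABLED",
        "critical": "EXCEPTION_DISABLED",
    },
}


def qos_severity(metric, value):
    """Return "critical", "warning", or None for a metric value."""
    thresholds = QOS_THRESHOLDS[metric]
    for severity in ("critical", "warning"):
        if thresholds.get(severity) == value:
            return severity
    return None


if __name__ == "__main__":
    print(qos_severity("QoSM State Change", "USER_DISABLED"))
    print(qos_severity("Compliance State", "NOT_COMPLIANT"))
```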

Resource State

The metrics in this metric category provide information about the state of Cluster Ready Services (CRS) resources.

State Change

This metric reports changes in the status of a CRS resource.

Target Version: 11gR2, 12c
Evaluation and Collection Frequency: Every 24 Hours
Default Warning Threshold: COMPLETE_INTERMEDIATE|PARTIALLY_UNKNOWN|PARTIALLY_OFFLINE|PARTIALLY_INTERMEDIATE
Default Critical Threshold: COMPLETE_UNKNOWN|COMPLETE_OFFLINE|ADD|DOWN
Alert Text: %crs_entity_name% has %resource_status_alert_count% instances in %resource_status_alert_state% State %resource_status_additional_mesg%
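The State Change thresholds list several alternative states separated by "|". As a sketch, assuming each alternative is compared against the whole state value and that a critical match takes precedence, the severity could be derived as follows; the function name and the non-matching sample state are hypothetical.

```python
WARNING_STATES = (
    "COMPLETE_INTERMEDIATE|PARTIALLY_UNKNOWN|"
    "PARTIALLY_OFFLINE|PARTIALLY_INTERMEDIATE"
)
CRITICAL_STATES = "COMPLETE_UNKNOWN|COMPLETE_OFFLINE|ADD|DOWN"


def resource_severity(state):
    """Return "critical", "warning", or None for a resource state."""
    if state in CRITICAL_STATES.split("|"):
        return "critical"
    if state in WARNING_STATES.split("|"):
        return "warning"
    return None


if __name__ == "__main__":
    for state in ("PARTIALLY_OFFLINE", "COMPLETE_OFFLINE", "SOME_OTHER_STATE"):
        print(state, resource_severity(state))
```

Splitting on "|" and comparing whole values avoids accidental substring matches, which a plain regular expression search would allow.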