Oracle High Availability Service

3 Oracle High Availability Service

This chapter provides information about the Oracle High Availability Service metrics.

For each metric, it provides the following information:

Description
Metric table

The metric table can include some or all of the following: target version, default collection frequency, default warning threshold, default critical threshold, and alert text.

CRS nodeapp Status

The metric in this category monitors the status of the Oracle Cluster Ready Services (CRS) node applications (nodeapps): Virtual IP (VIP), Global Services Daemon (GSD), and Oracle Notification Service (ONS).

nodeapp Status

This metric monitors the status of the nodeapps (VIP, GSD, and ONS). A critical alert is raised for a nodeapp if its status is OFFLINE NOT RESTARTING. A warning alert is raised for a nodeapp if its status is either UNKNOWN or OFFLINE.

Target Version	Evaluation and Collection Frequency	Default Warning Threshold	Default Critical Threshold	Alert Text
10g, 11gR1	Every 5 minutes	UNKNOWN\|OFFLINE	OFFLINE NOT RESTARTING	CRS resource %nodeapps% is %status%
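
The threshold columns hold regular expressions that are compared with the collected status. The following Python sketch shows one way to evaluate them, assuming that a threshold must match the whole status string and that the critical threshold is checked before the warning threshold so that the more severe result wins. The function name `classify_status` is hypothetical; this is an illustration, not the Enterprise Manager implementation.

```python
import re
from typing import Optional

# Default thresholds from the table above, as Python regular expressions.
WARNING = r"UNKNOWN|OFFLINE"
CRITICAL = r"OFFLINE NOT RESTARTING"


def classify_status(status: str,
                    warning: Optional[str] = WARNING,
                    critical: Optional[str] = CRITICAL) -> str:
    """Return 'critical', 'warning', or 'clear' for a nodeapp status.

    Assumptions: each threshold must match the entire status string, the
    critical threshold is evaluated first, and None means "Not Defined".
    """
    if critical is not None and re.fullmatch(critical, status):
        return "critical"
    if warning is not None and re.fullmatch(warning, status):
        return "warning"
    return "clear"


for s in ["ONLINE", "UNKNOWN", "OFFLINE", "OFFLINE NOT RESTARTING"]:
    print(s, "->", classify_status(s))
```

With the defaults, ONLINE is clear, UNKNOWN and OFFLINE raise warnings, and OFFLINE NOT RESTARTING raises a critical alert.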

Multiple Thresholds

For this metric you can set different warning and critical threshold values for each nodeapp object.

If warning or critical threshold values are currently set for any nodeapp object, those thresholds can be viewed on the Metric Detail page for this metric.

To specify or change warning or critical threshold values for each nodeapp object, use the Edit Thresholds page.
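
Per-object thresholds can be pictured as a lookup table keyed by object name, with the metric's default thresholds used for any object that has no entry of its own. The sketch below models that idea in Python. The class name `MetricThresholds` and the object names (`ora.node1.gsd`, `ora.node1.vip`) are hypothetical, and the fallback-to-default behavior is an assumption rather than documented Enterprise Manager behavior.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple

Thresholds = Tuple[Optional[str], Optional[str]]  # (warning, critical); None = Not Defined


@dataclass
class MetricThresholds:
    """Hypothetical model of per-object thresholds with a metric-wide default."""
    default: Thresholds
    per_object: Dict[str, Thresholds] = field(default_factory=dict)

    def set_for(self, obj: str, warning: Optional[str], critical: Optional[str]) -> None:
        # Loosely corresponds to editing one object's row on the Edit Thresholds page.
        self.per_object[obj] = (warning, critical)

    def lookup(self, obj: str) -> Thresholds:
        # Objects without their own entry fall back to the default thresholds.
        return self.per_object.get(obj, self.default)


nodeapp = MetricThresholds(default=("UNKNOWN|OFFLINE", "OFFLINE NOT RESTARTING"))
nodeapp.set_for("ora.node1.gsd", None, "OFFLINE NOT RESTARTING")
print(nodeapp.lookup("ora.node1.gsd"))  # (None, 'OFFLINE NOT RESTARTING')
print(nodeapp.lookup("ora.node1.vip"))  # ('UNKNOWN|OFFLINE', 'OFFLINE NOT RESTARTING')
```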

Data Source

Not available.

User Action

Refer to the Oracle Real Application Clusters Administration and Deployment Guide for information about starting and troubleshooting node applications.

CRS Virtual IP Relocation Status

The metrics in this category indicate whether a Virtual IP relocation has taken place. When a Virtual IP is relocated from the host (node) on which it was originally configured, a critical alert is generated.

Virtual IP Relocated

This metric shows whether the Virtual IP has been relocated from the host (node) on which it was originally configured. The value is TRUE if relocation occurred; otherwise, it is FALSE. When the value is TRUE, a critical alert is raised.

Target Version	Evaluation and Collection Frequency	Default Warning Threshold	Default Critical Threshold	Alert Text
10g, 11gR1	Every 5 minutes	Not Defined	TRUE	CRS resource %vip% was relocated to %current_node%
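
The metric value can be thought of as a comparison between the node on which the VIP was originally configured and the node on which it currently runs. A minimal Python sketch under that assumption; `VipInfo`, `vip_relocated`, `relocation_alert`, and the sample resource and node names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class VipInfo:
    name: str          # VIP resource name
    home_node: str     # node on which the VIP was originally configured
    current_node: str  # node on which the VIP is currently running


def vip_relocated(vip: VipInfo) -> str:
    """Return the metric value: 'TRUE' if the VIP runs away from its home node, else 'FALSE'."""
    return "TRUE" if vip.current_node != vip.home_node else "FALSE"


def relocation_alert(vip: VipInfo) -> str:
    # Mirrors the alert text: CRS resource %vip% was relocated to %current_node%
    return f"CRS resource {vip.name} was relocated to {vip.current_node}"


v = VipInfo(name="node1-vip", home_node="node1", current_node="node2")
if vip_relocated(v) == "TRUE":  # default critical threshold is TRUE
    print("CRITICAL:", relocation_alert(v))
```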

Multiple Thresholds

For this metric you can set different warning and critical threshold values for each Virtual IP Name object.

If warning or critical threshold values are currently set for any Virtual IP Name object, those thresholds can be viewed on the Metric Detail page for this metric.

To specify or change warning or critical threshold values for each Virtual IP Name object, use the Edit Thresholds page.

Data Source

Not available.

User Action

The required actions are specific to your site.

Incident

This metric category provides information about the Incident target.

Alert Log Error Trace File

The alert log error trace file is the name of an associated server trace file generated when the problem causing this incident occurred. If no additional trace file was generated, this field is blank.

Target Version	Collection Frequency
All Versions	Every 5 Minutes

Data Source

The alert log error trace file name is extracted from the database alert log.

User Action

Examine the alert log error trace file for more information about the problem that occurred.

Alert Log Name

This metric contains the fully specified name of the current XML alert log file (including directory path).

Target Version	Collection Frequency
All Versions	Every 15 Minutes

Data Source

This name is retrieved by searching the OMS ADR_HOME/alert directory for the most recent (current) log file.
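
A minimal Python sketch of locating the current alert log, assuming the XML alert logs are files named `log*.xml` in the `alert` subdirectory of ADR_HOME and that the most recently modified file is the current one. The function name `current_alert_log` is hypothetical.

```python
import os
import tempfile
from pathlib import Path
from typing import Optional


def current_alert_log(adr_home: str, pattern: str = "log*.xml") -> Optional[Path]:
    """Return the full path of the newest file matching `pattern` in ADR_HOME/alert.

    Returns None if the directory does not exist or contains no matching file.
    """
    alert_dir = Path(adr_home) / "alert"
    if not alert_dir.is_dir():
        return None
    candidates = [p for p in alert_dir.glob(pattern) if p.is_file()]
    if not candidates:
        return None
    # "Current" is taken to mean the most recently modified file.
    return max(candidates, key=lambda p: p.stat().st_mtime).resolve()


# Usage: a temporary directory stands in for ADR_HOME.
with tempfile.TemporaryDirectory() as home:
    alert = Path(home) / "alert"
    alert.mkdir()
    for i, name in enumerate(["log_1.xml", "log_2.xml", "log.xml"]):
        f = alert / name
        f.write_text("<msg/>")
        os.utime(f, (1_000_000 + i, 1_000_000 + i))  # later entries are newer
    print(current_alert_log(home))  # full path ending in alert/log.xml
```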

User Action

Examine the alert log file for more information about the problem that occurred.

ECID

The Execution Context ID (ECID) tracks requests as they move through the application server. This information is useful for diagnostic purposes because it can be used to correlate related problems encountered by a single user attempting to accomplish a single task.

Target Version	Collection Frequency
All Versions	Every 15 Minutes

Data Source

The ECID is extracted from the database alert log.

User Action

Diagnostic incidents usually indicate software errors and should be reported to Oracle through the Enterprise Manager Support Workbench. When you package a problem using Support Workbench, it uses the ECID to find related problems and include them in the package.
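
ECID-based correlation amounts to grouping incidents that share the same ECID. The Python sketch below illustrates the idea only; the `Incident` record, its fields, the sample problem keys, and the ECID values are hypothetical, and this is not how Support Workbench is implemented.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Incident:
    incident_id: int
    problem_key: str
    ecid: str  # empty string when no ECID was recorded


def group_by_ecid(incidents: List[Incident]) -> Dict[str, List[int]]:
    """Group incident IDs by ECID; incidents without an ECID are left out."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for inc in incidents:
        if inc.ecid:
            groups[inc.ecid].append(inc.incident_id)
    return dict(groups)


sample = [
    Incident(101, "ORA 600", "ecid-A"),
    Incident(102, "ORA 7445", "ecid-A"),
    Incident(103, "ORA 4031", ""),
    Incident(104, "ORA 600", "ecid-B"),
]
print(group_by_ecid(sample))  # {'ecid-A': [101, 102], 'ecid-B': [104]}
```

Incidents 101 and 102 share an ECID, so in this model they would be candidates for packaging together even though their problem keys differ.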

Impact

This metric provides an optional field that reports the impact of the problem that occurred. It may be empty.

Target Version	Collection Frequency
All Versions	Every 15 Minutes

Data Source

The impact is extracted from the database alert log.

User Action

This field is informational. Diagnostic incidents usually indicate software errors and should be reported to Oracle using the Enterprise Manager Support Workbench.

Incident ID

This metric reports the incident ID, a number that uniquely identifies a diagnostic incident (a single occurrence of a problem).

Target Version	Collection Frequency
All Versions	Every 15 Minutes

Data Source

The incident ID is extracted from the database alert log.

User Action

Diagnostic incidents usually indicate software errors and should be reported to Oracle using the Enterprise Manager Support Workbench. A problem can have one or more incidents, each of which is a single occurrence of that problem. If you use Support Workbench, you can use the incident ID to select the correct problem to package and send to Oracle. If you use the ADRCI command-line tool, you can run the SHOW INCIDENT command with the incident ID to retrieve details about the incident.

Generic Incident

This metric reports the number of Generic Incident type incidents observed the last time that Oracle Enterprise Manager scanned the alert log.

Target Version	Evaluation and Collection Frequency	Default Warning Threshold	Default Critical Threshold	Alert Text
12c	Every 5 minutes	Not Defined	.*	Incident (%adr_problemKey%) detected in %alertLogName% at time/line number: %timeLine%.
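
Alert text templates such as the one above contain `%name%` placeholders that are filled in from collected values when an alert is raised. A minimal Python sketch of that substitution, assuming unknown placeholders are left as-is; `expand_alert_text` and the sample values (including the log path and time/line value) are hypothetical.

```python
import re
from typing import Mapping

PLACEHOLDER = re.compile(r"%(\w+)%")


def expand_alert_text(template: str, values: Mapping[str, str]) -> str:
    """Replace each %name% placeholder with values[name].

    Placeholders without a value are left unchanged so the gap stays visible.
    """
    return PLACEHOLDER.sub(lambda m: str(values.get(m.group(1), m.group(0))), template)


template = ("Incident (%adr_problemKey%) detected in %alertLogName% "
            "at time/line number: %timeLine%.")
print(expand_alert_text(template, {
    "adr_problemKey": "ORA 600",
    "alertLogName": "/tmp/example/alert/log.xml",
    "timeLine": "example-time/42",
}))
```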

Data Source

The source for this metric is the Incident metric.

User Action

Use Support Workbench in Enterprise Manager to examine the details of the incidents.

Generic Internal Error

This metric reflects the number of Generic Internal Error incidents observed the last time Enterprise Manager scanned the alert log.

Target Version	Evaluation and Collection Frequency	Default Warning Threshold	Default Critical Threshold	Alert Text
12c	Every 5 minutes	Not Defined	.*	Internal error (%adr_problemKey%) detected in %alertLogName% at time/line number: %timeLine%.

Data Source

The source for this metric is the Incident metric.

User Action

Use Support Workbench in Enterprise Manager to examine the details of the incidents.

Operational Error

This metric category contains metrics for errors, recorded in the database alert log file, that might affect the operation of the database. The alert log file contains a chronological log of messages and errors.

Generic Operational Error

This metric reports the number of generic operational errors observed the last time Enterprise Manager scanned the alert log file.

Target Version	Evaluation and Collection Frequency	Default Warning Threshold	Default Critical Threshold	Alert Text
12c	Every 5 minutes	Not Defined	.*	Operational error (%errorCodes%) detected in %alertLogName% at time/line number: %timeLine%.
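
Because each count covers only what was observed in the last scan, a scanner has to remember where the previous scan stopped. The Python sketch below keeps a byte offset between scans, consumes only complete lines, and restarts from the beginning if the file shrinks (for example, after rotation). It assumes a plain-text log in which error codes look like `ORA-nnnnn`; `scan_since` and the abbreviated sample messages are hypothetical, and the sketch does not distinguish operational errors from other kinds.

```python
import re
import tempfile
from pathlib import Path
from typing import List, Tuple

ERROR_CODE = re.compile(r"ORA-\d{5}")  # assumption: codes look like ORA-nnnnn


def scan_since(path: Path, offset: int) -> Tuple[int, List[str], int]:
    """Scan text appended after `offset`.

    Returns (matching_line_count, error_codes, new_offset). Only complete
    lines are consumed, so a line that is still being written is picked up
    by the next scan. If the file is now shorter than `offset`, it is assumed
    to have been rotated and is scanned from the start.
    """
    if path.stat().st_size < offset:
        offset = 0
    with path.open("rb") as f:
        f.seek(offset)
        chunk = f.read()
    end = chunk.rfind(b"\n") + 1  # 0 when no complete line is available
    lines = chunk[:end].decode("utf-8", errors="replace").splitlines()
    hits = [line for line in lines if ERROR_CODE.search(line)]
    codes = [code for line in hits for code in ERROR_CODE.findall(line)]
    return len(hits), codes, offset + end


# Usage: two scans of a growing log file.
with tempfile.TemporaryDirectory() as d:
    log = Path(d) / "alert_example.log"
    log.write_bytes(b"Starting up\nORA-01653: unable to extend table\n")
    count, codes, pos = scan_since(log, 0)
    print(count, codes)  # 1 ['ORA-01653']
    with log.open("ab") as f:
        f.write(b"ORA-01555: snapshot too old\nCompleted checkpoint\n")
    count, codes, pos = scan_since(log, pos)
    print(count, codes)  # 1 ['ORA-01555']
```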

User-Defined Error

This metric reports the number of user-defined errors observed the last time Enterprise Manager scanned the alert log file.

Target Version	Evaluation and Collection Frequency	Default Warning Threshold	Default Critical Threshold	Alert Text
12c	Every 5 minutes	Not Defined	Not Defined	Error (%errorCodes%) detected in %alertLogName% at time/line number: %timeLine%.

User-Defined Warning

This metric reports the number of user-defined warnings observed the last time Enterprise Manager scanned the alert log file.

Target Version	Evaluation and Collection Frequency	Default Warning Threshold	Default Critical Threshold	Alert Text
12c	Every 5 minutes	Not Defined	Not Defined	Warning (%errorCodes%) detected in %alertLogName% at time/line number: %timeLine%.

Oracle High Availability Service Alert Log

The metrics in this category provide information about the Oracle high availability service alert log.

Alert Log Name

This metric reports the name and full path of the CRS alert log.

Target Version	Collection Frequency
All Versions	Every 5 Minutes

Data Source

Not available.

User Action

The required actions are specific to your site.

CRS Resource Alert Log Error

This metric collects CRS-1203, CRS-1205, and CRS-1206 messages in the CRS alert log at the host level and issues CRS Resource Alert Log Error alerts at a critical level.

Target Version	Evaluation and Collection Frequency	Default Warning Threshold	Default Critical Threshold	Alert Text
10gR2, 11gR1	Every 5 minutes	Not Defined	CRS-120(3\|5\|6)	%resourceErrStack% See %alertLogName% for details.
11gR2, 12c	Every 5 Minutes	CRS-(2765\|2878)	CRS-120(3\|5\|6)\|CRS-(2768\|2769\|2771)	%resourceErrStack% See %alertLogName% for details.
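
A minimal Python sketch of classifying CRS alert log lines against the 11gR2/12c default thresholds, assuming the table's `\|` stands for a plain `|` alternation, that a code may appear anywhere in the line, and that the critical threshold is checked first. `severity` and the sample lines are hypothetical; real CRS alert log lines carry message text that is not reproduced here.

```python
import re
from typing import Optional

# 11gR2/12c defaults from the table above, written with plain "|" alternation.
WARNING = r"CRS-(2765|2878)"
CRITICAL = r"CRS-120(3|5|6)|CRS-(2768|2769|2771)"


def _matches(pattern: str, line: str) -> bool:
    # Assumption: the code may appear anywhere in the line, but must not be a
    # prefix of a longer number (CRS-12030 should not match CRS-1203).
    return re.search(rf"(?:{pattern})(?!\d)", line) is not None


def severity(line: str) -> Optional[str]:
    """Return 'critical', 'warning', or None for one alert log line."""
    if _matches(CRITICAL, line):
        return "critical"
    if _matches(WARNING, line):
        return "warning"
    return None


sample = [
    "[crsd(2891)]CRS-2765: (sample message text)",
    "[crsd(2891)]CRS-2771: (sample message text)",
    "[crsd(2891)]CRS-2672: (sample message text)",
]
for line in sample:
    print(severity(line), "<-", line)
```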

Note:

After an alert is triggered for this metric, it must be manually cleared.

Multiple Thresholds

For this metric you can set different warning and critical threshold values for each Time/Line Number object.

If warning or critical threshold values are currently set for any Time/Line Number object, those thresholds can be viewed on the Metric Detail page for this metric.

To specify or change warning or critical threshold values for each Time/Line Number object, use the Edit Thresholds page.

Data Source

Not available.

User Action

The required actions are specific to your site.

OCR Alert Log Error

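This metric collects Oracle Cluster Registry (OCR) messages, such as CRS-1009, in the CRS alert log at the host level and issues OCR Alert Log Error type alerts. The threshold columns in the table list the message numbers that trigger warning and critical alerts.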

Target Version	Evaluation and Collection Frequency	Default Warning Threshold	Default Critical Threshold	Alert Text
10gR2, 11gR1	Every 5 Minutes	CRS-100(1\|2\|3\|4\|5\|7)	CRS-(1006\|1008\|1010\|1011\|1009)	%ocrErrStack% See %alertLogName% for details.
11gR2, 12c	Every 5 Minutes	CRS-(1021\|1022)	CRS-(1006\|1009\|1011\|1013\|1015\|1016\|1017\|1018\|1019\|1021)	%ocrErrStack% See %alertLogName% for details.

Note:

After an alert is triggered for this metric, it must be manually cleared.

Multiple Thresholds

For this metric you can set different warning and critical threshold values for each Time/Line Number object.

If warning or critical threshold values are currently set for any Time/Line Number object, those thresholds can be viewed on the Metric Detail page for this metric.

To specify or change warning or critical threshold values for each Time/Line Number object, use the Edit Thresholds page.

Data Source

Not available.

User Action

The required actions are specific to your site.

OLR Alert Log Error

The Oracle Local Registry (OLR) Alert Log Error metric collects certain CRS error messages and issues OLR Alert Log Error type alerts.

Target Version	Evaluation and Collection Frequency	Default Warning Threshold	Default Critical Threshold	Alert Text
10gR2, 11g, 12c	Every 5 Minutes	CRS-(2106)	-*	%olrErrStack% See %alertLogName% for details.

Oracle High Availability Service Alert Log Error

This metric collects CRS-1012, CRS-1201, CRS-1202, CRS-1401, CRS-1402, CRS-1602, and CRS-1603 messages in the CRS alert log at the host level.

CRS-1201, CRS-1401, and CRS-1012 alert log messages trigger warning alerts.

CRS-1202, CRS-1402, CRS-1602, and CRS-1603 alert log messages trigger critical alerts.

Target Version	Evaluation and Collection Frequency	Default Warning Threshold	Default Critical Threshold	Alert Text
10gR2, 11gR1	Every 5 Minutes	CRS-(1601\|1201\|1401\|1012)	CRS-(1202\|1402\|1602\|1603\|1604)	%clusterwareErrStack% See %alertLogName% for details.
11gR2, 12c	Every 5 Minutes	CRS-(2412\|8000\|8001\|8002\|8003\|8004\|8005\|8006\|8007\|8009\|80010\|8016\|8018\|8019\|8020\|1601)	CRS-(2402\|2406\|2413\|2414\|1202\|1207\|1208\|1209\|1210\|1212\|1213\|1214\|1215\|1216\|1217\|1218\|1219\|1220\|1221\|1223\|1229\|1231\|1232\|1233\|1234\|1235\|1236\|1237\|1238\|1239\|1305\|1306\|1307\|1308\|1308\|1310\|1339\|1402\|1403\|2301\|2302\|2303\|2304\|2305\|2306\|2307\|2308\|2309\|2310\|2311\|2312\|2313\|2314\|2315\|2316\|2317\|2318\|2319\|2320\|2321\|2322\|2323\|2324\|2325\|2326\|2327\|2330\|2331\|2332\|2333\|2334\|2335\|2336\|2337\|2338\|2339\|2340\|2341\|2342\|5601\|10100\|10101\|10102\|10103\|1602\|1603\|1604)	%clusterwareErrStack% See %alertLogName% for details.
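
The 11gR2/12c threshold expressions are long hand-maintained alternations, where a missing separator or an empty alternative changes what matches. A minimal Python sketch that generates such an expression from a list of message numbers; `build_code_pattern` is a hypothetical helper, not part of Enterprise Manager.

```python
import re
from typing import Iterable


def build_code_pattern(codes: Iterable[int], prefix: str = "CRS-") -> str:
    """Return an expression such as CRS-(1202|1402|1602) for the given message numbers.

    Duplicates are dropped and the numbers are sorted. An empty list is rejected,
    so the result never contains an empty alternative, which would let the bare
    prefix match. `prefix` is used verbatim and should not contain regex
    metacharacters.
    """
    unique = sorted(set(codes))
    if not unique:
        raise ValueError("at least one message number is required")
    return prefix + "(" + "|".join(str(c) for c in unique) + ")"


critical_10g = build_code_pattern([1202, 1402, 1602, 1603, 1604, 1402])
print(critical_10g)  # CRS-(1202|1402|1602|1603|1604)
# Anchored matching: CRS-1602 matches, but CRS-16020 and the bare "CRS-" do not.
print(bool(re.fullmatch(critical_10g, "CRS-1602")),
      bool(re.fullmatch(critical_10g, "CRS-16020")),
      bool(re.fullmatch(critical_10g, "CRS-")))  # True False False
```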

Note:

After an alert is triggered for this metric, it must be manually cleared.

Multiple Thresholds

For this metric you can set different warning and critical threshold values for each Time/Line Number object.

If warning or critical threshold values are currently set for any Time/Line Number object, those thresholds can be viewed on the Metric Detail page for this metric.

To specify or change warning or critical threshold values for each Time/Line Number object, use the Edit Thresholds page.

Data Source

Not available.

User Action

The required actions are specific to your site.

Oracle High Availability Service Alert Log Error

This metric category provides information about node-specific alerts obtained by mining the CRS alert file on that node. The mined alerts fall into the following node-specific categories: Oracle High Availability/Clusterware Stack, CRS Resource, OCR, OLR, and Node Configuration.

Alert Log Name

This metric reports the name and full path of the CRS alert log.

Target Version	Collection Frequency
All Versions	Every 5 Minutes

Alert Time

This metric reports the timestamp of the alert in the CRS alert log file.

Target Version	Collection Frequency
All Versions	Every 5 Minutes

OCR Alert Log Error

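This metric collects Oracle Cluster Registry (OCR) messages, such as CRS-1009, in the CRS alert log and issues OCR Alert Log Error type alerts. The threshold columns in the table list the message numbers that trigger warning and critical alerts.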

Target Version	Evaluation and Collection Frequency	Default Warning Threshold	Default Critical Threshold	Alert Text
10gR2, 11gR1	Every 5 Minutes	CRS-100(1\|2\|3\|4\|5\|7)	CRS-(1006\|1008\|1010\|1011\|1009)	%ocrErrStack% See %alertLogName% for details.
11gR2, 12c	Every 5 Minutes	CRS-(1021\|1022)	CRS-(1006\|1009\|1011\|1013\|1015\|1016\|1017\|1018\|1019\|1021)	%ocrErrStack% See %alertLogName% for details.

OLR Alert Log Error

The Oracle Local Registry (OLR) Alert Log Error metric collects certain CRS error messages and issues OLR Alert Log Error type alerts.

Target Version	Evaluation and Collection Frequency	Default Warning Threshold	Default Critical Threshold	Alert Text
10gR2, 11g, 12c	Every 5 Minutes	Not Defined	CRS-(2106)	%olrErrStack% See %alertLogName% for details.

CRS Resource Alert Log Error

This metric collects CRS-1203, CRS-1205, and CRS-1206 messages in the CRS alert log and issues CRS Resource Alert Log Error alerts at a critical level.

Target Version	Evaluation and Collection Frequency	Default Warning Threshold	Default Critical Threshold	Alert Text
10gR2, 11gR1	Every 5 minutes	Not Defined	CRS-120(3\|5\|6)	%resourceErrStack% See %alertLogName% for details.
11gR2, 12c	Every 5 Minutes	CRS-(2765\|2878)	CRS-120(3\|5\|6)\|CRS-(2768\|2769\|2771)	%resourceErrStack% See %alertLogName% for details.

Oracle High Availability Service Alert Log Error

This metric displays the node-specific Oracle High Availability/Clusterware Stack errors from the CRS alert log file.

Target Version	Evaluation and Collection Frequency	Default Warning Threshold	Default Critical Threshold	Alert Text
10gR2, 11gR1	Every 5 Minutes	CRS-(1601\|1201\|1401\|1012)	CRS-(1202\|1402\|1602\|1603\|1604)	%clusterwareErrStack% See %alertLogName% for details.
11gR2, 12c	Every 5 Minutes	CRS-(2412\|8000\|8001\|8002\|8003\|8004\|8005\|8006\|8007\|8009\|80010\|8016\|8018\|8019\|8020\|1601)	CRS-(2402\|2406\|2413\|2414\|1202\|1207\|1208\|1209\|1210\|1212\|1213\|1214\|1215\|1216\|1217\|1218\|1219\|1220\|1221\|1223\|1229\|1231\|1232\|1233\|1234\|1235\|1236\|1237\|1238\|1239\|1305\|1306\|1307\|1308\|1308\|1310\|1339\|1402\|1403\|2301\|2302\|2303\|2304\|2305\|2306\|2307\|2308\|2309\|2310\|2311\|2312\|2313\|2314\|2315\|2316\|2317\|2318\|2319\|2320\|2321\|2322\|2323\|2324\|2325\|2326\|2327\|2330\|2331\|2332\|2333\|2334\|2335\|2336\|2337\|2338\|2339\|2340\|2341\|2342\|5601\|10100\|10101\|10102\|10103\|1602\|1603\|1604)	%clusterwareErrStack% See %alertLogName% for details.

Witnessed Error Codes

This metric displays the node-specific Oracle High Availability/Clusterware Stack errors from the CRS Alert log file.

Target Version	Collection Frequency
All Versions	Every 5 Minutes

Node Configuration Alert Log Error

This metric displays the node-specific node configuration errors from the CRS Alert log file.

Target Version	Evaluation and Collection Frequency	Default Warning Threshold	Default Critical Threshold	Alert Text
10gR2, 11gR1	Every 5 Minutes	CRS-180(2\|3\|4\|5)	CRS-1607	%nodeErrStack% See %alertLogName% for details.
11gR2, 12c	Every 5 Minutes	CRS-(1801\|1802\|1803\|1804\|1113\|1121\|1123)	CRS-(1110\|1111\|1112\|1116\|1117\|1118\|1119\|1805\|1806\|1807\|1809)	%nodeErrStack% See %alertLogName% for details.

Time/Line Number

This metric displays the timestamp and line number of the alert in the CRS alert log file.

Target Version	Collection Frequency
All Versions	Every 5 Minutes

Resource State

This metric category provides information about resources changing states.

State Change

This metric tracks resource state changes and raises an alert when a resource changes to a state defined in the thresholds.

Target Version	Evaluation and Collection Frequency	Default Warning Threshold	Default Critical Threshold	Alert Text
11gR2, 12c	Every 30 Minutes	COMPLETE_INTERMEDIATE\|PARTIALLY_UNKNOWN\|PARTIALLY_OFFLINE\|PARTIALLY_INTERMEDIATE	COMPLETE_UNKNOWN\|COMPLETE_OFFLINE\|ADD\|DOWN	%crs_entity_name% has %resource_status_alert_count% instances in %resource_status_alert_state% State %resource_status_additional_mesg%
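
The threshold names suggest that per-instance states are summarized across all instances of a resource: COMPLETE_<STATE> when every instance is in that state and PARTIALLY_<STATE> when only some are. The Python sketch below implements that interpretation, which is an assumption and not documented behavior; it does not model the ADD and DOWN values. `summarize_states`, `evaluate_resource`, and the resource name are hypothetical.

```python
import re
from collections import Counter
from typing import Dict, List, Tuple

# 11gR2/12c default thresholds from the table above, as Python regular expressions.
WARNING = r"COMPLETE_INTERMEDIATE|PARTIALLY_UNKNOWN|PARTIALLY_OFFLINE|PARTIALLY_INTERMEDIATE"
CRITICAL = r"COMPLETE_UNKNOWN|COMPLETE_OFFLINE|ADD|DOWN"


def summarize_states(instance_states: List[str]) -> Dict[str, int]:
    """Map each non-ONLINE state to COMPLETE_<STATE> or PARTIALLY_<STATE> with its instance count."""
    counts = Counter(instance_states)
    total = len(instance_states)
    summary: Dict[str, int] = {}
    for state, n in counts.items():
        if state == "ONLINE":  # assumption: ONLINE instances never raise an alert
            continue
        prefix = "COMPLETE" if n == total else "PARTIALLY"
        summary[f"{prefix}_{state}"] = n
    return summary


def evaluate_resource(name: str, instance_states: List[str]) -> List[Tuple[str, str]]:
    """Return (severity, message) pairs; messages omit %resource_status_additional_mesg%."""
    alerts: List[Tuple[str, str]] = []
    for summary, count in summarize_states(instance_states).items():
        if re.fullmatch(CRITICAL, summary):
            level = "critical"
        elif re.fullmatch(WARNING, summary):
            level = "warning"
        else:
            continue
        alerts.append((level, f"{name} has {count} instances in {summary} State"))
    return alerts


print(evaluate_resource("ora.example.db", ["ONLINE", "OFFLINE", "ONLINE"]))
print(evaluate_resource("ora.example.db", ["OFFLINE", "OFFLINE"]))
```

Under this interpretation, one offline instance out of three yields a PARTIALLY_OFFLINE warning, while all instances offline yields a COMPLETE_OFFLINE critical alert.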

Response

The metrics in this category report the status of the host (whether it is up or down).

Status

This metric indicates whether the host is reachable. A host can be unreachable for various reasons: for example, the network is down, or the Management Agent on the host is down (possibly because the host itself is shut down).

Target Version	Evaluation and Collection Frequency	Default Warning Threshold	Default Critical Threshold	Alert Text
All Versions	Every 30 Minutes	Not Defined	0	Oracle High Availability Service has problems on this host %CRS_output%