3 Oracle High Availability Service
For each metric, this chapter provides the following information:
- Description
- Metric table
The metric table can include some or all of the following: target version, default collection frequency, default warning threshold, default critical threshold, and alert text.
CRS nodeapp Status
The metric in this category monitors the status of the Oracle Cluster Ready Services (CRS) node applications (nodeapps), including the Virtual IP (VIP), Global Services Daemon (GSD), and Oracle Notification Service (ONS).
nodeapp Status
This metric monitors the status of the nodeapps: VIP, GSD, and ONS. A critical alert is raised for a nodeapp if its status is OFFLINE NOT RESTARTING. A warning alert is raised for a nodeapp if its status is either UNKNOWN or OFFLINE.
Target Version | Evaluation and Collection Frequency | Default Warning Threshold | Default Critical Threshold | Alert Text |
---|---|---|---|---|
10g, 11gR1 | Every 5 Minutes | UNKNOWN\|OFFLINE | OFFLINE NOT RESTARTING | CRS resource %nodeapps% is %status% |
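The threshold logic described for this metric can be sketched as follows. This is a minimal illustration, assuming the default thresholds are regular expressions that must match the whole status string. The helper name `classify_nodeapp_status` is hypothetical and not part of Enterprise Manager.

```python
import re

# Default thresholds from the metric table, assumed here to be regular
# expressions that must match the entire status string.
WARNING_PATTERN = re.compile(r"UNKNOWN|OFFLINE")
CRITICAL_PATTERN = re.compile(r"OFFLINE NOT RESTARTING")


def classify_nodeapp_status(status: str) -> str:
    """Return 'critical', 'warning', or 'clear' for a nodeapp status (hypothetical helper)."""
    status = status.strip()
    # Check the critical threshold first so that the more severe level wins.
    if CRITICAL_PATTERN.fullmatch(status):
        return "critical"
    if WARNING_PATTERN.fullmatch(status):
        return "warning"
    return "clear"


for s in ("ONLINE", "OFFLINE", "UNKNOWN", "OFFLINE NOT RESTARTING"):
    print(f"{s}: {classify_nodeapp_status(s)}")
```

Checking the critical pattern before the warning pattern matters if substring matching is used instead of whole-string matching, because OFFLINE is a prefix of OFFLINE NOT RESTARTING.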
Multiple Thresholds
For this metric you can set different warning and critical threshold values for each nodeapp object.
If warning or critical threshold values are currently set for any nodeapp object, those thresholds can be viewed on the Metric Detail page for this metric.
To specify or change warning or critical threshold values for each nodeapp object, use the Edit Thresholds page.
Data Source
Not available.
User Action
Refer to the Real Application Clusters Administration and Deployment Guide for node applications startup and troubleshooting information.
CRS Virtual IP Relocation Status
The metrics in this category provide information about whether there is a Virtual IP relocation taking place. When a Virtual IP is relocated from the host (node) on which it was originally configured, a critical alert is generated.
Virtual IP Relocated
This metric shows whether the Virtual IP has relocated from the host (node) where it was originally configured. The value is TRUE if relocation occurred. Otherwise, it is FALSE. When the value is TRUE, a critical alert is raised.
Target Version | Evaluation and Collection Frequency | Default Warning Threshold | Default Critical Threshold | Alert Text |
---|---|---|---|---|
10g, 11gR1 | Every 5 Minutes | Not Defined | TRUE | CRS resource %vip% was relocated to %current_node% |
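The relocation check and the resulting alert text can be illustrated with a short sketch. It assumes that relocation is detected by comparing the node where the Virtual IP was configured with the node where it currently runs, and that node names are compared case-insensitively. Both function names and the sample resource name are hypothetical.

```python
from typing import Optional


def vip_relocated(configured_node: str, current_node: str) -> bool:
    """True when the VIP runs on a node other than the one it was configured on."""
    # Case-insensitive comparison is an assumption made for this sketch.
    return configured_node.strip().lower() != current_node.strip().lower()


def relocation_alert(vip: str, configured_node: str, current_node: str) -> Optional[str]:
    """Return the alert text when relocation occurred, otherwise None."""
    if not vip_relocated(configured_node, current_node):
        return None
    # Alert text template from the metric table, with its %name% placeholders.
    template = "CRS resource %vip% was relocated to %current_node%"
    return template.replace("%vip%", vip).replace("%current_node%", current_node)


# Illustrative values only; they do not come from a real cluster.
print(relocation_alert("ora.node1.vip", "node1", "node2"))
print(relocation_alert("ora.node1.vip", "node1", "node1"))
```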
Multiple Thresholds
For this metric you can set different warning and critical threshold values for each Virtual IP Name object.
If warning or critical threshold values are currently set for any Virtual IP Name object, those thresholds can be viewed on the Metric Detail page for this metric.
To specify or change warning or critical threshold values for each Virtual IP Name object, use the Edit Thresholds page.
Data Source
Not available.
User Action
The required actions are specific to your site.
Incident
This metrics category provides information about the Incident target.
Alert Log Error Trace File
The alert log error trace file is the name of an associated server trace file generated when the problem causing this incident occurred. If no additional trace file was generated, this field is blank.
Target Version | Collection Frequency |
---|---|
All Versions | Every 5 Minutes |
Data Source
The alert log error trace file name is extracted from the database alert log.
User Action
Examine the alert log error trace file for more information about the problem that occurred.
Alert Log Name
This metric contains the fully specified name of the current XML alert log file (including directory path).
Target Version | Collection Frequency |
---|---|
All Versions | Every 15 Minutes |
Data Source
This name is retrieved by searching the OMS ADR_HOME/alert directory for the most recent (current) log file.
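The lookup described here can be approximated with a short sketch. It assumes that "most recent" means the file with the latest modification time and that alert log files carry an `.xml` extension. The function name `current_alert_log` and the default pattern are illustrative, not the agent's actual implementation.

```python
from pathlib import Path
from typing import Optional


def current_alert_log(alert_dir: Path, pattern: str = "*.xml") -> Optional[Path]:
    """Return the most recently modified file matching pattern, or None if there is none."""
    candidates = [p for p in alert_dir.glob(pattern) if p.is_file()]
    if not candidates:
        return None
    # The file with the latest modification time is treated as the current log.
    return max(candidates, key=lambda p: p.stat().st_mtime)
```

A caller would pass the alert directory under the ADR home, for example `current_alert_log(Path(adr_home) / "alert")`, where `adr_home` is whatever path applies at your site.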
User Action
Examine the alert log file for more information about the problem that occurred.
ECID
The Execution Context ID (ECID) tracks requests as they move through the application server. This information is useful for diagnostic purposes because it can be used to correlate related problems encountered by a single user attempting to accomplish a single task.
Target Version | Collection Frequency |
---|---|
All Versions | Every 15 Minutes |
Data Source
The ECID is extracted from the database alert log.
User Action
Diagnostic incidents usually indicate software errors and should be reported to Oracle through the Enterprise Manager Support Workbench. When you package problems using Support Workbench, the Support Workbench uses ECID to correlate and include any additional problems in the package.
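To illustrate how a shared ECID ties related problems together, the following sketch groups incident records by ECID. It is not how Support Workbench is implemented. The record layout (pairs of incident ID and ECID) and the sample values are assumptions made for the example.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def group_incidents_by_ecid(incidents: Iterable[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Group (incident_id, ecid) pairs by ECID, skipping records without an ECID."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for incident_id, ecid in incidents:
        if ecid:
            groups[ecid].append(incident_id)
    return dict(groups)


# Made-up sample data: two incidents share one ECID, one has none.
sample = [("101", "ecid-A"), ("102", "ecid-B"), ("103", "ecid-A"), ("104", "")]
print(group_incidents_by_ecid(sample))
```

Incidents that land in the same group were raised within the same execution context, so they are candidates for packaging together.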
Impact
This metric provides an optional field that reports the impact of the problem that occurred. It may be empty.
Target Version | Collection Frequency |
---|---|
All Versions | Every 15 Minutes |
Data Source
The impact is extracted from the database alert log.
User Action
This field is informational. Diagnostic incidents usually indicate software errors and should be reported to Oracle using the Enterprise Manager Support Workbench.
Incident ID
This metric reports the incident ID, a number that uniquely identifies a diagnostic incident (a single occurrence of a problem).
Target Version | Collection Frequency |
---|---|
All Versions | Every 15 Minutes |
Data Source
The incident ID is extracted from the database alert log.
User Action
Diagnostic incidents usually indicate software errors and should be reported to Oracle using the Enterprise Manager Support Workbench. An incident is a single occurrence of a problem, so a problem can have one or more incidents. If you use Support Workbench, the incident ID can be used to select the correct problem to package and send to Oracle. If you use the ADRCI command-line tool, you can use the SHOW INCIDENT command with the incident ID to retrieve details about the incident.
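As a rough sketch of the ADRCI route, the following helper builds a command line that asks ADRCI for the details of one incident. It only constructs the argument list and does not run anything. The helper name is hypothetical, the incident ID is a placeholder, and depending on your environment an ADR home may need to be selected before the command returns results.

```python
import shlex
from typing import List


def adrci_show_incident_command(incident_id: int) -> List[str]:
    """Build an argument list that asks ADRCI for the details of one incident."""
    # int() rejects anything that is not a plain number.
    predicate = f"incident_id={int(incident_id)}"
    return ["adrci", f'exec=show incident -mode detail -p "{predicate}"']


# Print a shell-quoted form of the command for a placeholder incident ID.
print(shlex.join(adrci_show_incident_command(1681)))
```

The list form can be passed directly to `subprocess.run` on a host where ADRCI is installed and on the path.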
Generic Incident
This metric reports the number of Generic Incident type incidents observed the last time that Oracle Enterprise Manager scanned the alert log.
Target Version | Evaluation and Collection Frequency | Default Warning Threshold | Default Critical Threshold | Alert Text |
---|---|---|---|---|
12c | Every 5 Minutes | Not Defined | .* | Incident (%adr_problemKey%) detected in %alertLogName% at time/line number: %timeLine%. |
Data Source
The source for this metric is the Incident metric.
User Action
Use Support Workbench in Enterprise Manager to examine the details of the incidents.
Generic Internal Error
This metric reflects the number of Generic Internal Error incidents observed the last time Enterprise Manager scanned the alert log.
Target Version | Evaluation and Collection Frequency | Default Warning Threshold | Default Critical Threshold | Alert Text |
---|---|---|---|---|
12c | Every 5 Minutes | Not Defined | .* | Internal error (%adr_problemKey%) detected in %alertLogName% at time/line number: %timeLine%. |
Data Source
The source for this metric is the Incident metric.
User Action
Use Support Workbench in Enterprise Manager to examine the details of the incidents.
Operational Error
This metric category contains metrics representing errors that might affect the operation of the database as recorded in the database alert log file. The alert log file has a chronological log of messages and errors.
Generic Operational Error
This metric reports the number of generic operational errors observed the last time Enterprise Manager scanned the alert log file.
Target Version | Evaluation and Collection Frequency | Default Warning Threshold | Default Critical Threshold | Alert Text |
---|---|---|---|---|
12c | Every 5 Minutes | Not Defined | .* | Operational error (%errorCodes%) detected in %alertLogName% at time/line number: %timeLine%. |
User-Defined Error
This metric reports the number of user-defined errors observed the last time Enterprise Manager scanned the alert log file.
Target Version | Evaluation and Collection Frequency | Default Warning Threshold | Default Critical Threshold | Alert Text |
---|---|---|---|---|
12c | Every 5 Minutes | Not Defined | Not Defined | Error (%errorCodes%) detected in %alertLogName% at time/line number: %timeLine%. |
User-Defined Warning
This metric reports the number of user-defined warnings observed the last time Enterprise Manager scanned the alert log file.
Target Version | Evaluation and Collection Frequency | Default Warning Threshold | Default Critical Threshold | Alert Text |
---|---|---|---|---|
12c | Every 5 Minutes | Not Defined | Not Defined | Warning (%errorCodes%) detected in %alertLogName% at time/line number: %timeLine%. |
Oracle High Availability Service Alert Log
The metrics in this category provide information about the Oracle high availability service alert log.
Alert Log Name
This metric reports the name and full path of the CRS alert log.
Target Version | Collection Frequency |
---|---|
All Versions | Every 5 Minutes |
Data Source
Not available.
User Action
The required actions are specific to your site.
CRS Resource Alert Log Error
This metric collects CRS-1203, CRS-1205, and CRS-1206 messages in the CRS alert log at the host level and issues CRS Resource Alert Log Error alerts at a critical level.
Target Version | Evaluation and Collection Frequency | Default Warning Threshold | Default Critical Threshold | Alert Text |
---|---|---|---|---|
10gR2, 11gR1 | Every 5 Minutes | Not Defined | CRS-120(3\|5\|6) | %resourceErrStack% See %alertLogName% for details. |
11gR2, 12c | Every 5 Minutes | CRS-(2765\|2878) | CRS-120(3\|5\|6)\|CRS-(2768\|2769\|2771) | %resourceErrStack% See %alertLogName% for details. |
Note:
After an alert is triggered for this metric, it must be manually cleared.
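The default thresholds for this metric read as regular expressions over CRS message codes. The following sketch shows how a log line could be classified against the 11gR2 and 12c defaults, assuming the patterns are searched anywhere in the line. The `severity` helper and the sample lines are illustrative, and the message text after each code is a placeholder rather than real CRS output.

```python
import re

# Default thresholds for 11gR2 and 12c targets, copied from the metric table.
WARNING = re.compile(r"CRS-(2765|2878)")
CRITICAL = re.compile(r"CRS-120(3|5|6)|CRS-(2768|2769|2771)")


def severity(line: str) -> str:
    """Return 'critical', 'warning', or 'none' for one alert log line (hypothetical helper)."""
    # The critical threshold is evaluated first so the higher severity wins.
    if CRITICAL.search(line):
        return "critical"
    if WARNING.search(line):
        return "warning"
    return "none"


for line in (
    "CRS-1205: placeholder message text",
    "CRS-2765: placeholder message text",
    "CRS-9999: placeholder message text",
):
    print(severity(line), "<-", line)
```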
Multiple Thresholds
For this metric you can set different warning and critical threshold values for each Time/Line Number object.
If warning or critical threshold values are currently set for any Time/Line Number object, those thresholds can be viewed on the Metric Detail page for this metric.
To specify or change warning or critical threshold values for each Time/Line Number object, use the Edit Thresholds page.
Data Source
Not available.
User Action
The required actions are specific to your site.
OCR Alert Log Error
This metric collects CRS-1009 messages in the CRS alert log at the host level and issues OCR Alert Log Error type alerts. OCR refers to Oracle Cluster Registry.
Target Version | Evaluation and Collection Frequency | Default Warning Threshold | Default Critical Threshold | Alert Text |
---|---|---|---|---|
10gR2, 11gR1 | Every 5 Minutes | CRS-100(1\|2\|3\|4\|5\|7) | CRS-(1006\|1008\|1010\|1011\|1009) | %ocrErrStack% See %alertLogName% for details. |
11gR2, 12c | Every 5 Minutes | CRS-(1021\|1022) | CRS-(1006\|1009\|1011\|1013\|1015\|1016\|1017\|1018\|1019\|1021) | %ocrErrStack% See %alertLogName% for details. |
Note:
After an alert is triggered for this metric, it must be manually cleared.
Multiple Thresholds
For this metric you can set different warning and critical threshold values for each Time/Line Number object.
If warning or critical threshold values are currently set for any Time/Line Number object, those thresholds can be viewed on the Metric Detail page for this metric.
To specify or change warning or critical threshold values for each Time/Line Number object, use the Edit Thresholds page.
Data Source
Not available.
User Action
The required actions are specific to your site.
OLR Alert Log Error
The Oracle Local Registry (OLR) Alert Log Error metric collects certain CRS error messages and issues OLR Alert Log Error type alerts.
Target Version | Evaluation and Collection Frequency | Default Warning Threshold | Default Critical Threshold | Alert Text |
---|---|---|---|---|
10gR2, 11g, 12c | Every 5 Minutes | CRS-(2106) | -* | %olrErrStack% See %alertLogName% for details. |
Oracle High Availability Service Alert Log Error
This metric collects CRS-1012, CRS-1201, CRS-1202, CRS-1401, CRS-1402, CRS-1602, and CRS-1603 messages in the CRS alert log at the host level.
CRS-1012, CRS-1201, and CRS-1401 alert log messages trigger warning alerts.
CRS-1202, CRS-1402, CRS-1602, and CRS-1603 alert log messages trigger critical alerts.
Target Version | Evaluation and Collection Frequency | Default Warning Threshold | Default Critical Threshold | Alert Text |
---|---|---|---|---|
10gR2, 11gR1 | Every 5 Minutes | CRS-(1601\|1201\|1401\|1012) | CRS-(1202\|1402\|1602\|1603\|1604) | %clusterwareErrStack% See %alertLogName% for details. |
11gR2, 12c | Every 5 Minutes | CRS-(2412\|8000\|8001\|8002\|8003\|8004\|8005\|8006\|8007\|8009\|80010\|8016\|8018\|8019\|8020\|1601) | CRS-(2402\|2406\|2413\|2414\|1202\|1207\|1208\|1209\|1210\|1212\|1213\|1214\|1215\|1216\|1217\|1218\|1219\|1220\|1221\|1223\|1229\|1231\|1232\|1233\|1234\|1235\|1236\|1237\|1238\|1239\|1305\|1306\|1307\|1308\|1308\|1310\|1339\|1402\|1403\|2301\|2302\|2303\|2304\|2305\|2306\|2307\|2308\|2309\|2310\|2311\|2312\|2313\|2314\|2315\|2316\|2317\|2318\|2319\|2320\|2321\|2322\|2323\|2324\|2325\|2326\|2327\|2330\|2331\|2332\|2333\|2334\|2335\|2336\|2337\|2338\|2339\|2340\|2341\|2342\|5601\|10100\|10101\|10102\|10103\|1602\|1603\|1604) | %clusterwareErrStack% See %alertLogName% for details. |
Note:
After an alert is triggered for this metric, it must be manually cleared.
Multiple Thresholds
For this metric you can set different warning and critical threshold values for each Time/Line Number object.
If warning or critical threshold values are currently set for any Time/Line Number object, those thresholds can be viewed on the Metric Detail page for this metric.
To specify or change warning or critical threshold values for each Time/Line Number object, use the Edit Thresholds page.
Data Source
Not available.
User Action
The required actions are specific to your site.
Oracle High Availability Service Alert Log Error
This metric category provides information about node-specific alerts that are obtained by mining the CRS alert file on that node. The mined alerts fall into the following node-specific categories: Oracle High Availability/Clusterware Stack, CRS Resource, OCR, OLR, and Node Configuration.
Alert Log Name
This metric reports the name and full path of the CRS alert log.
Target Version | Collection Frequency |
---|---|
All Versions | Every 5 Minutes |
Alert Time
This is the timestamp of the alert in the CRS Alert log file.
Target Version | Collection Frequency |
---|---|
All Versions | Every 5 Minutes |
OCR Alert Log Error
This metric collects CRS-1009 messages in the CRS alert log and issues OCR Alert Log Error type alerts. OCR refers to Oracle Cluster Registry.
Target Version | Evaluation and Collection Frequency | Default Warning Threshold | Default Critical Threshold | Alert Text |
---|---|---|---|---|
10gR2, 11gR1 | Every 5 Minutes | CRS-100(1\|2\|3\|4\|5\|7) | CRS-(1006\|1008\|1010\|1011\|1009) | %ocrErrStack% See %alertLogName% for details. |
11gR2, 12c | Every 5 Minutes | CRS-(1021\|1022) | CRS-(1006\|1009\|1011\|1013\|1015\|1016\|1017\|1018\|1019\|1021) | %ocrErrStack% See %alertLogName% for details. |
OLR Alert Log Error
The Oracle Local Registry (OLR) Alert Log Error metric collects certain CRS error messages and issues OLR Alert Log Error type alerts.
Target Version | Evaluation and Collection Frequency | Default Warning Threshold | Default Critical Threshold | Alert Text |
---|---|---|---|---|
10gR2, 11g, 12c | Every 5 Minutes | Not Defined | CRS-(2106) | %olrErrStack% See %alertLogName% for details. |
CRS Resource Alert Log Error
This metric collects CRS-1203, CRS-1205, and CRS-1206 messages in the CRS alert log and issues CRS Resource Alert Log Error alerts at a critical level.
Target Version | Evaluation and Collection Frequency | Default Warning Threshold | Default Critical Threshold | Alert Text |
---|---|---|---|---|
10gR2, 11gR1 | Every 5 Minutes | Not Defined | CRS-120(3\|5\|6) | %resourceErrStack% See %alertLogName% for details. |
11gR2, 12c | Every 5 Minutes | CRS-(2765\|2878) | CRS-120(3\|5\|6)\|CRS-(2768\|2769\|2771) | %resourceErrStack% See %alertLogName% for details. |
Oracle High Availability Service Alert Log Error
This metric displays the node-specific Oracle High Availability/Clusterware Stack errors from the CRS alert log file.
Target Version | Evaluation and Collection Frequency | Default Warning Threshold | Default Critical Threshold | Alert Text |
---|---|---|---|---|
10gR2, 11gR1 | Every 5 Minutes | CRS-(1601\|1201\|1401\|1012) | CRS-(1202\|1402\|1602\|1603\|1604) | %clusterwareErrStack% See %alertLogName% for details. |
11gR2, 12c | Every 5 Minutes | CRS-(2412\|8000\|8001\|8002\|8003\|8004\|8005\|8006\|8007\|8009\|80010\|8016\|8018\|8019\|8020\|1601) | CRS-(2402\|2406\|2413\|2414\|1202\|1207\|1208\|1209\|1210\|1212\|1213\|1214\|1215\|1216\|1217\|1218\|1219\|1220\|1221\|1223\|1229\|1231\|1232\|1233\|1234\|1235\|1236\|1237\|1238\|1239\|1305\|1306\|1307\|1308\|1308\|1310\|1339\|1402\|1403\|2301\|2302\|2303\|2304\|2305\|2306\|2307\|2308\|2309\|2310\|2311\|2312\|2313\|2314\|2315\|2316\|2317\|2318\|2319\|2320\|2321\|2322\|2323\|2324\|2325\|2326\|2327\|2330\|2331\|2332\|2333\|2334\|2335\|2336\|2337\|2338\|2339\|2340\|2341\|2342\|5601\|10100\|10101\|10102\|10103\|1602\|1603\|1604) | %clusterwareErrStack% See %alertLogName% for details. |
Witnessed Error Codes
This metric displays the node-specific Oracle High Availability/Clusterware Stack errors from the CRS Alert log file.
Target Version | Collection Frequency |
---|---|
All Versions | Every 5 Minutes |
Node Configuration Alert Log Error
This metric displays the node-specific node configuration errors from the CRS Alert log file.
Target Version | Evaluation and Collection Frequency | Default Warning Threshold | Default Critical Threshold | Alert Text |
---|---|---|---|---|
10gR2, 11gR1 | Every 5 Minutes | CRS-180(2\|3\|4\|5) | CRS-1607 | %nodeErrStack% See %alertLogName% for details. |
11gR2, 12c | Every 5 Minutes | CRS-(1801\|1802\|1803\|1804\|1113\|1121\|1123) | CRS-(1110\|1111\|1112\|1116\|1117\|1118\|1119\|1805\|1806\|1807\|1809) | %nodeErrStack% See %alertLogName% for details. |
Resource State
This metric category provides information about resources changing states.
State Change
This metric tracks and raises an alert when a resource changes to a state defined in the thresholds.
Target Version | Evaluation and Collection Frequency | Default Warning Threshold | Default Critical Threshold | Alert Text |
---|---|---|---|---|
11gR2, 12c | Every 30 Minutes | COMPLETE_INTERMEDIATE\|PARTIALLY_UNKNOWN\|PARTIALLY_OFFLINE\|PARTIALLY_INTERMEDIATE | COMPLETE_UNKNOWN\|COMPLETE_OFFLINE\|ADD\|DOWN | %crs_entity_name% has %resource_status_alert_count% instances in %resource_status_alert_state% State %resource_status_additional_mesg% |
Response
The metrics in this category report the status of the host (whether it is up or down).
Status
This metric indicates whether or not the host is reachable. A host can be unreachable for various reasons, for example, when the network is down or the Management Agent on the host is down (which can be because the host itself is shut down).
Target Version | Evaluation and Collection Frequency | Default Warning Threshold | Default Critical Threshold | Alert Text |
---|---|---|---|---|
All Versions | Every 30 Minutes | Not Defined | 0 | Oracle High Availability Service has problems on this host %CRS_output% |