1 Error and Alert Messages

The various software components of Oracle Exadata System Software generate error and alert messages.

Understanding Alert, Incident, and Trace Files

Alert, incident, and trace files provide a diagnostic record of useful information.

Alert files contain information about internal errors and administrative tasks. Incident files contain information about single occurrences. Trace files can contain information about server and background processes.

About Alert Files

An alert file is a log file that records information about internal errors and administrative activities, such as backups.

When an internal error occurs, a message is sent to the terminal screen and is written to the alert file. Additional information about internal errors is also written to the alert file, such as the location and name of any trace files generated because of the error.

The alert file is located at the following path:

/opt/oracle/cell/log/diag/asm/cell/hostname/trace/alert.log

If your system uses an operator console, then some messages from Oracle may appear on the console. All important messages are written to the alert file and the operator console. Because all messages, not just Oracle messages, appear on this console, the alert file is a better record for tracing all Oracle administrative activity and errors than the console log.

About Trace Files

A trace file is created each time an Oracle instance starts or an unexpected event occurs in a user process or background process.

The file extension or file type is usually .trc. If it is different, then it is noted in your operating system-specific Oracle documentation. The contents of the trace file may include dumps of the system global area, process global area, operating call stack, and registers.

Note:

If you change a traceLevel attribute setting, then you need to restart Management Server (MS) for the change to take effect. Restarting MS does not affect the database or the flow of data.

About Automatic Diagnostic Repository

The Automatic Diagnostic Repository (ADR) is a file-based repository for database diagnostic data such as traces, dumps, the alert log, health monitor reports, and more. It has a unified directory structure across multiple instances and multiple products.

The database, Oracle Automatic Storage Management (Oracle ASM), the listener, Oracle Clusterware, Oracle Exadata Storage Server, and other Oracle products or components store all diagnostic data in the ADR. Each instance of each product stores diagnostic data underneath its own home directory within the ADR. For example, in an Oracle Real Application Clusters (Oracle RAC) environment with shared storage and Oracle ASM, each database instance and each Oracle ASM instance has an ADR home directory. ADR's unified directory structure, consistent diagnostic data formats across products and instances, and a unified set of tools enable customers and Oracle Support Services to correlate and analyze diagnostic data across multiple instances. With Oracle Clusterware, each host node in the cluster has an ADR home directory.

Note:

Because all diagnostic data, including the alert log, are stored in the ADR, the initialization parameters BACKGROUND_DUMP_DEST and USER_DUMP_DEST are deprecated. They are replaced by the initialization parameter DIAGNOSTIC_DEST, which identifies the location of the ADR.

About Incidents and Incident Packages

An incident package is a collection of data about incidents for one or more problems.

An incident is a single occurrence of a problem. When a problem occurs multiple times, an incident is created for each occurrence. Incidents are tracked in the Automatic Diagnostic Repository (ADR). Each incident is identified by a numeric incident identifier, which is unique within ADR. When an incident occurs, the database makes an entry in the alert log, sends an incident alert to Oracle Enterprise Manager, gathers diagnostic data about the incident in the dump files (incident dumps), tags the incident dumps with the incident ID, and stores the incident dumps in the ADR subdirectory created for that incident.

Diagnosis and resolution of a critical error usually starts with an incident alert. You can obtain a list of all incidents in ADR using an ADR Command Interpreter (ADRCI) command.

Each incident is mapped to a single problem only. Incidents are compared so that a single problem does not generate too many incidents and incident dumps.

Before uploading diagnostic data to Oracle Support Services, you first collect the data into an intermediate logical structure called an incident package (package). A package is a collection of metadata that is stored in the ADR and that points to diagnostic data files and other files both in and out of the ADR. When you create a package, you select one or more problems to add to the package. The Support Workbench then automatically adds to the package the problem information, incident information, and diagnostic data (such as trace files and dumps) associated with the selected problems. Because a problem can have many incidents (many occurrences of the same problem), by default only the first three and last three incidents for each problem are added to the package, excluding any incidents that are over 90 days old. You can change these default numbers on the Incident Packaging Configuration page of the Support Workbench.
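
The default selection rule described above can be sketched in Python. This is only an illustration, not the Support Workbench implementation: the function name, the tuple layout, and the choice to apply the 90-day filter before picking the first and last three incidents are all assumptions.

```python
from datetime import datetime, timedelta


def select_incidents(incidents, now, keep_first=3, keep_last=3, max_age_days=90):
    """Pick the incident IDs of one problem that would go into a package.

    `incidents` is a list of (incident_id, created_at) tuples (hypothetical layout).
    Assumption: incidents older than `max_age_days` are dropped first, then the
    earliest `keep_first` and latest `keep_last` of the remainder are kept.
    """
    cutoff = now - timedelta(days=max_age_days)
    # Keep only incidents inside the age limit, ordered by creation time.
    recent = sorted((i for i in incidents if i[1] >= cutoff), key=lambda i: i[1])
    if len(recent) <= keep_first + keep_last:
        return [incident_id for incident_id, _ in recent]
    # Slice from an explicit index so keep_last=0 yields an empty tail.
    chosen = recent[:keep_first] + recent[len(recent) - keep_last:]
    return [incident_id for incident_id, _ in chosen]
```

With ten incidents of which two are older than 90 days, the sketch keeps the three oldest and three newest of the remaining eight.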

After the package is created, you can add any type of external file to the package, remove selected files from the package, or edit selected files in the package to remove sensitive data. As you add and remove package contents, only the package metadata is modified. When you are ready to upload the diagnostic data to Oracle Support Services, you first create a zip file that contains all the files referenced by the package metadata.

Diagnostic File Locations

Alert, incident, and trace files are written to the alert, incident, and trace subdirectories in the ADR home directory ($ADR_BASE/diag/asm/cell/cell_name) on the storage server.

The ADR home is located within the ADR base directory ($ADR_BASE). The retention period for ADR files is specified by the diagHistoryDays cell attribute. You can modify this setting with the CellCLI ALTER CELL command.

If you use Secure Shell (SSH) to access the storage server, then you can display the value of $ADR_BASE that was set during installation using the DESCRIBE CELL command.
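
As a small illustration of the directory layout described in this section, the following Python sketch builds the ADR home path and its alert, incident, and trace subdirectories. The function names are hypothetical, and the base directory and cell name in the usage note are taken from the examples elsewhere in this chapter.

```python
from pathlib import PurePosixPath


def adr_home(adr_base, cell_name):
    """Return the ADR home for a cell: $ADR_BASE/diag/asm/cell/cell_name."""
    return PurePosixPath(adr_base) / "diag" / "asm" / "cell" / cell_name


def diagnostic_dirs(adr_base, cell_name):
    """Map each diagnostic file type to its subdirectory in the ADR home."""
    home = adr_home(adr_base, cell_name)
    return {name: home / name for name in ("alert", "incident", "trace")}
```

For example, diagnostic_dirs("/opt/oracle/cell/log", "st-cell03-2")["trace"] gives the directory that holds the trace files and alert.log for that cell.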

Managing Diagnostic Files

ADR Command Interpreter (ADRCI) is a command-line tool that you use to manage the diagnostic data stored in the Automatic Diagnostic Repository (ADR).

Using ADRCI, you can perform the following tasks:

  • View diagnostic data within ADR

  • Package incident and problem information into a zip file for transmission to Oracle Support Services

In order to use ADRCI with Oracle Exadata System Software, set the ADR base using the following command:

ADRCI> SET BASE /opt/oracle/cell/log

With ADRCI you can view the alert, incident, and trace files for a cell, as shown in the following example.

Example 1-1 Viewing Alert, Incident, and Trace Files

$ adrci
ADRCI: Release 11.2.0.1.0 - Production on Wed May 20 02:17:38 2009
Copyright (c) 1982, 2009, Oracle.  All rights reserved.

ADRCI> SET BASE /opt/oracle/cell/log

ADRCI> SHOW HOMES
ADR Homes:
diag/asm/cell/st-cell03-2
...

ADRCI> SET HOMEPATH diag/asm/cell/st-cell03-2

ADRCI> SHOW ALERT
...
ADRCI> SHOW INCIDENT
...
ADRCI> SHOW TRACEFILE
...
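
The interactive session in Example 1-1 can also be scripted. The following Python sketch assembles a non-interactive adrci invocation, assuming the adrci build on the storage server accepts the exec= batch option documented for ADRCI. The helper names are hypothetical, and nothing is run unless run_adrci is called on a host where adrci is installed.

```python
import subprocess


def adrci_argv(base, homepath, commands):
    """Build the argument list for one batch-mode adrci call.

    Assumption: adrci accepts exec="cmd1; cmd2; ..." on the command line.
    """
    script = "; ".join([f"set base {base}", f"set homepath {homepath}", *commands])
    return ["adrci", f"exec={script}"]


def run_adrci(base, homepath, commands):
    """Run adrci and return its standard output (requires adrci on PATH)."""
    result = subprocess.run(
        adrci_argv(base, homepath, commands),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout
```

Passing the commands as a list keeps the ADR base and home path in one place, so the same helper can issue show alert, show incident, or show tracefile.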

Alert Messages

Alert messages are generated by Oracle Exadata System Software.

Format of Alert Messages for E-mail Notification

Alert messages can be sent through e-mail.

The format of an e-mail notification for an alert message is as follows:

Subject: 
cell_name: alert level: { critical | warning | clear } alert

E-mail Content:
Alert Type: { ADR | Hardware | Threshold } Alert alert_name is triggered at
alert_time with message:
alert_message

The suggested action is: 
alert_action
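
To make the template concrete, the following Python sketch fills it in from individual fields. It is a literal reading of the format shown above, so the text of a real notification may differ. The function name and the argument values used to try it are hypothetical.

```python
EMAIL_SEVERITIES = ("critical", "warning", "clear")
EMAIL_ALERT_TYPES = ("ADR", "Hardware", "Threshold")


def format_alert_email(cell_name, severity, alert_type, alert_name,
                       alert_time, alert_message, alert_action):
    """Return (subject, body) that follow the e-mail notification template."""
    if severity not in EMAIL_SEVERITIES:
        raise ValueError(f"unknown severity: {severity!r}")
    if alert_type not in EMAIL_ALERT_TYPES:
        raise ValueError(f"unknown alert type: {alert_type!r}")
    # Assumption: the subject is rendered exactly as the template reads.
    subject = f"{cell_name}: alert level: {severity} alert"
    body = (
        f"Alert Type: {alert_type} Alert {alert_name} is triggered at\n"
        f"{alert_time} with message:\n"
        f"{alert_message}\n"
        "\n"
        "The suggested action is:\n"
        f"{alert_action}\n"
    )
    return subject, body
```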

Format of Alert Messages for SNMP Notification

Simple Network Management Protocol (SNMP) alerts sent by Oracle Exadata Storage Servers conform to a Management Information Base (MIB) which is included in each Oracle Exadata System Software installation.

The MIB file on Oracle Exadata Storage Server is available at /opt/oracle/cell/cellsrv/deploy/config/cell_alert.mib. The SNMP alerts and MIB conform to SNMP version 1 (SNMPv1). The alerts contain variables, such as those shown in the following list.

  • oraCellAlertAction: Recommended action to perform for this alert.

  • oraCellAlertBeginTime: Time stamp when an alert changes state.

  • oraCellAlertEndTime: Time stamp for the end of the period when an alert changes state.

  • oraCellAlertExaminedBy: Administrator who reviewed the alert.

  • oraCellAlertMsg: Brief explanation of the alert.

  • oraCellAlertNotif: Number indicating progress in notifying subscribers to alert messages:

    • 0: Never tried
    • 1: Sent successfully
    • 2: Retrying, up to five times
    • 3: Five failed retries

  • oraCellAlertObjectName: Object, such as cell disk or grid disk, for which a metric threshold has caused an alert.

  • oraCellAlertSeqBeginTime: Time stamp when an alert sequence ID is first created.

  • oraCellAlertSeqID: Unique sequence ID for the alert. When an alert changes state, such as from warning to critical, or critical to clear, another occurrence of the alert is created with the same sequence number and a time stamp of the transition.

  • oraCellAlertSeverity: Severity level. Values are clear, info, warning, or critical.

  • oraCellAlertShortName: Abbreviated name for the alert. If the alert is based on a metric, then the short name is the same as the corresponding metric name attribute.

  • oraCellAlertType: Type of the alert. Values are stateful or stateless.

    • Stateful alerts are automatically cleared on transition to normal.
    • Stateless alerts are never cleared.
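
A trap receiver might interpret a few of these variables along the lines of the following Python sketch. It assumes the trap has already been decoded into a dictionary keyed by variable name, with the severity and type as lowercase strings. That representation, and all function and class names, are assumptions, because the wire encoding is defined by the MIB and is not shown here.

```python
from enum import IntEnum


class NotifState(IntEnum):
    """Values of oraCellAlertNotif as listed above."""
    NEVER_TRIED = 0
    SENT = 1
    RETRYING = 2   # retrying, up to five times
    FAILED = 3     # five failed retries


SNMP_SEVERITIES = ("clear", "info", "warning", "critical")
SNMP_ALERT_TYPES = ("stateful", "stateless")


def summarize_alert(variables):
    """Summarize a decoded alert (hypothetical dict of variable name to value)."""
    severity = variables["oraCellAlertSeverity"]
    alert_type = variables["oraCellAlertType"]
    if severity not in SNMP_SEVERITIES:
        raise ValueError(f"unknown severity: {severity!r}")
    if alert_type not in SNMP_ALERT_TYPES:
        raise ValueError(f"unknown alert type: {alert_type!r}")
    notif = NotifState(int(variables["oraCellAlertNotif"]))
    return {
        "sequence_id": variables["oraCellAlertSeqID"],
        "severity": severity,
        "notification": notif.name,
        # Stateful alerts clear on transition to normal; stateless never clear.
        "clears_automatically": alert_type == "stateful",
        "delivery_failed": notif is NotifState.FAILED,
    }
```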

Threshold Alert Messages

Threshold alerts help you monitor your database. Most alerts notify you when particular metric thresholds are exceeded.

For each alert, you can set critical and warning threshold values. These threshold values are boundary values that, when exceeded, indicate that the system is in an undesirable state. For example, when a tablespace becomes 97 percent full, this can be considered undesirable, and Oracle Database generates a critical alert. The following are examples of threshold alerts:

Threshold name triggered alert state severity

The threshold alert was triggered. Examine the metric value that is violating the specified threshold. Correct the problem indicated by threshold name.

The threshold value is no longer violated. No further action is required for threshold name.

The threshold alert was cleared.
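
The following Python sketch illustrates how warning and critical thresholds partition a metric and when a triggered or cleared alert would result. It assumes that higher values are worse and that a threshold is violated only when the value is strictly greater than it. The function names and the threshold values used to try it are hypothetical.

```python
def classify(value, warning=None, critical=None):
    """Return "critical", "warning", or "normal" for a metric value.

    Assumption: a threshold is exceeded when value > threshold.
    """
    if critical is not None and value > critical:
        return "critical"
    if warning is not None and value > warning:
        return "warning"
    return "normal"


def alert_for_transition(previous, current):
    """Return the alert severity to raise for a state change, or None.

    A move back to "normal" produces a "clear" alert; no change produces nothing.
    """
    if previous == current:
        return None
    return "clear" if current == "normal" else current
```

For instance, with a warning threshold of 85 and a critical threshold of 95, a tablespace that is 97 percent full classifies as critical, and a later reading of 50 yields a clear alert.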

ADR Alert Messages

Problems are tracked in the Automatic Diagnostic Repository (ADR). ADR is a file-based repository for storing diagnostic data.

Because this repository is stored outside the database, the diagnostic data is available even when the database is down. Starting with Oracle Database release 11g, the alert log, all trace and dump files, and other diagnostic data are also stored in ADR.

Each problem has a problem key, which is a text string that describes the problem. The problem key includes the error code (such as ORA-00600), and in some cases, one or more error parameter values or other information. The following is an example of an ADR message:

Errors in file /opt/oracle/log/diag/asm/cell/stado54/trace/svtrc_2763_0.trc 
 (incident=1): ORA-00600: internal error code, arguments: [main_5], [3], 
[Invalid IP Param], [], [], [], [], []

The action to be taken for ADR messages is:

Create an incident package for incident <incident number> using ADRCI 
and upload the incident packages to Oracle Support Services.
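
When triaging many ADR alerts, it can help to pull the trace file, incident number, and error code out of the message text. The following Python sketch does this for messages shaped like the example above. The regular expression is an assumption based on that single example, and the function name is hypothetical.

```python
import re

# Pattern inferred from the one example message in this section.
ADR_MESSAGE = re.compile(
    r"Errors in file (?P<trace>\S+)\s+"
    r"\(incident=(?P<incident>\d+)\):\s+"
    r"(?P<code>ORA-\d{5}): (?P<text>.*)"
)


def parse_adr_message(message):
    """Extract fields from an ADR alert message, or return None if no match."""
    # The message may be wrapped across lines, so collapse all whitespace first.
    flat = " ".join(message.split())
    match = ADR_MESSAGE.search(flat)
    if match is None:
        return None
    return {
        "trace_file": match["trace"],
        "incident": int(match["incident"]),
        "code": match["code"],
        # Bracketed arguments, including empty ones, in their original order.
        "arguments": re.findall(r"\[([^\]]*)\]", match["text"]),
    }
```

The incident number returned here is the value to substitute for <incident number> when creating the incident package with ADRCI.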

Contacting Oracle Support Services

Some error messages or alerts recommend contacting Oracle Support Services to report a problem.

You may also want to contact Oracle Support Services when you have a service request submitted by Oracle Auto Service Request (ASR). When you contact Oracle Support Services, have the following information available:

  • The hardware, operating system, and release number of the operating system running Oracle Database.

  • The complete release number of Oracle Database, such as release 11.2.0.1.0.

  • All Oracle programs (with release numbers) in use when the error occurred, such as SQL*Plus release 11.2.0.1.0.

  • If you encountered one or more error codes or messages, then the exact code numbers and message text, in the order in which they appeared.

  • The problem severity, according to the following codes:

    • 1: Program not usable. Critical effect on operations.

    • 2: Program usable. Operations severely restricted.

    • 3: Program usable with limited functions. Not critical to overall operations.

    • 4: Problem circumvented by customer. Minimal effect, if any, on operations.

You will also be expected to provide the following:

  • Your name

  • The name of your organization

  • Your Oracle Support ID number

  • Your telephone number

  • Rack master serial number