Database Service Events

The Database Service Events feature notifies you about health issues with your Oracle Databases or other components on the DB system.

Oracle Database or Oracle Clusterware may become unhealthy, or system components may run out of space on the DB system. Without this feature, customers are not notified of these conditions. The Database Service Events feature generates events for data plane operations and conditions, and delivers notifications to customers by using the existing OCI Events service and Notifications mechanisms in their tenancy. Customers can then create topics and subscribe to these topics through email, functions, or streams.

Note:

The flow of events on the DB system depends on Oracle Trace File Analyzer (TFA) and the Oracle Database Cloud Service (DBCS) agent. Ensure that both components are up and running.

Receive Notifications about Database Service Events

To receive notifications, subscribe to Database Service Events by using the Oracle Notification service. See Notifications Overview. For more information about Oracle Cloud Infrastructure Events, see Overview of Events.

Events Service - Event Types

  • Database - Critical
  • DB Node - Critical
  • DB Node - Error
  • DB Node - Warning
  • DB Node - Information
  • DB System - Critical
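An Events rule selects events by their event type identifier. The following is a minimal sketch of building a rule condition in Python. It assumes the rule condition is a JSON object with an eventType key, which is how OCI Events rule conditions are commonly written; verify the exact shape against Overview of Events. The identifiers are copied from the Database Service Events table in this topic, and the pairing of each label with an identifier is an assumption based on naming. The function name build_rule_condition is hypothetical.

```python
import json

# Identifiers copied from the Database Service Events table in this topic.
# Pairing them with the friendly labels above is an assumption based on naming.
DB_SERVICE_EVENT_TYPES = {
    "Database - Critical": "com.oraclecloud.databaseservice.database.critical",
    "DB Node - Critical": "com.oraclecloud.databaseservice.dbnode.critical",
    "DB Node - Information": "com.oraclecloud.databaseservice.dbnode.information",
    "DB System - Critical": "com.oraclecloud.databaseservice.dbsystem.critical",
}


def build_rule_condition(labels):
    """Return a JSON condition string that matches the selected event types."""
    event_types = [DB_SERVICE_EVENT_TYPES[label] for label in labels]
    # Assumed condition shape: {"eventType": [...]}.
    return json.dumps({"eventType": event_types})


if __name__ == "__main__":
    print(build_rule_condition(["Database - Critical", "DB Node - Critical"]))
```

The resulting string can be supplied as the condition when you create a rule, with a Notifications topic as the rule action.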

Database Service Event Types

The following table lists the event types that the Database Service emits.

Note:

  • Critical events are triggered by critical conditions and errors that disrupt the database and other critical components. Examples include database hang errors and availability errors for databases, database nodes, and database systems, which let you know when a resource becomes unavailable.
  • Information events are triggered when the database and other critical components work as expected. For example, a clean shutdown or startup of CRS, a CDB, a client listener, or a SCAN listener creates an event with the severity of INFO.
  • Threshold limits reduce the number of notifications that customers receive for similar incident events, while still ensuring that they receive the incident events and are reminded in a timely fashion.

Database Service Events

Table - Database Service Events

Each entry lists the friendly name, followed by the event name, description, remediation, event type, and threshold.

Resource Utilization - Disk Usage

Event Name: HEALTH.DB_GUEST.FILESYSTEM.FREE_SPACE
Description: This event is reported when VM guest file system free space falls below 10%, as determined by the operating system df(1) command, for the following file systems:
  • /
  • /u01
  • /u02
  • /var (X8M and later only)
  • /tmp (X8M and later only)
Remediation: HEALTH-DB_GUEST-FILESYSTEM-FREE_SPACE
Event Type: com.oraclecloud.databaseservice.dbnode.critical
Threshold: Critical threshold: 90%

CRS status Up/Down

Event Name: AVAILABILITY.DB_GUEST.CRS_INSTANCE.DOWN
Description: An event of type CRITICAL is created when the Cluster Ready Service (CRS) is detected to be down.
Remediation: AVAILABILITY-DB_GUEST-CRS_INSTANCE.DOWN
Event Type: com.oraclecloud.databaseservice.dbnode.critical (if .DOWN and NOT "user_action")
Threshold: NA

Event Name: AVAILABILITY.DB_GUEST.CRS_INSTANCE.DOWN_CLEARED
Description: An event of type INFORMATION is created once it is determined that the event for CRS down has cleared.
Remediation: NA
Event Type: com.oraclecloud.databaseservice.dbnode.information (if .DOWN_CLEARED)
Threshold: NA

Event Name: AVAILABILITY.DB_GUEST.CRS_INSTANCE.EVICTION
Description: An event of type CRITICAL is created.
Remediation: AVAILABILITY-DB_GUEST-CRS_INSTANCE-EVICTION
Event Type: com.oraclecloud.databaseservice.dbnode.critical
Threshold: NA

SCAN Listener Up/Down

Event Name: AVAILABILITY.DB_CLUSTER.SCAN_LISTENER.DOWN
Description: A DOWN event is created when a SCAN listener goes down. The event is of type INFORMATION when a SCAN listener is shut down due to user action, such as with the Server Control Utility (srvctl) or Listener Control (lsnrctl) commands, or any Oracle Cloud maintenance action that uses those commands, such as performing a grid infrastructure software update. The event is of type CRITICAL when a SCAN listener goes down unexpectedly. A corresponding DOWN_CLEARED event is created when a SCAN listener is started. There are three SCAN listeners per cluster called LISTENER_SCAN[1,2,3].
Remediation: AVAILABILITY-DB_CLUSTER-SCAN_LISTENER-DOWN
Event Type: com.oraclecloud.databaseservice.dbnode.critical (if .DOWN and NOT "user_action")
Threshold: NA

Event Name: AVAILABILITY.DB_CLUSTER.SCAN_LISTENER.DOWN_CLEARED
Description: An event of type INFORMATION is created once it is determined that the event for SCAN Listener down has cleared.
Remediation: NA
Event Type: com.oraclecloud.databaseservice.dbnode.information (if .DOWN_CLEARED)
Threshold: NA

Net Listener Up/Down

Event Name: AVAILABILITY.DB_GUEST.CLIENT_LISTENER.DOWN
Description: A DOWN event is created when a client listener goes down. The event is of type INFORMATION when a client listener is shut down due to user action, such as with the Server Control Utility (srvctl) or Listener Control (lsnrctl) commands, or any Oracle Cloud maintenance action that uses those commands, such as performing a grid infrastructure software update. The event is of type CRITICAL when a client listener goes down unexpectedly. A corresponding DOWN_CLEARED event is created when a client listener is started. There is one client listener per node, each called LISTENER.
Remediation: AVAILABILITY-DB_GUEST-CLIENT_LISTENER.DOWN
Event Type: com.oraclecloud.databaseservice.database.critical (if .DOWN and NOT "user_action")
Threshold: NA

Event Name: AVAILABILITY.DB_GUEST.CLIENT_LISTENER.DOWN_CLEARED
Description: An event of type INFORMATION is created once it is determined that the event for Client Listener down has cleared.
Remediation: NA
Event Type: com.oraclecloud.databaseservice.database.information (if .DOWN_CLEARED)
Threshold: NA

CDB Up/Down

Event Name: AVAILABILITY.DB_GUEST.CDB_INSTANCE.DOWN
Description: A DOWN event is created when a database instance goes down. The event is of type INFORMATION when a database instance is shut down due to user action, such as with the SQL*Plus (sqlplus) or Server Control Utility (srvctl) commands, or any Oracle Cloud maintenance action that uses those commands, such as performing a database home software update. The event is of type CRITICAL when a database instance goes down unexpectedly. A corresponding DOWN_CLEARED event is created when a database instance is started.
Remediation: AVAILABILITY-DB_GUEST-CDB_INSTANCE-DOWN
Event Type: com.oraclecloud.databaseservice.database.critical (if .DOWN and NOT "user_action")
Threshold: NA

Event Name: AVAILABILITY.DB_GUEST.CDB_INSTANCE.DOWN_CLEARED
Description: An event of type INFORMATION is created once it is determined that the event for the CDB down has cleared.
Remediation: NA
Event Type: com.oraclecloud.databaseservice.database.information (if .DOWN_CLEARED)
Threshold: NA

Critical DB Errors

Event Name: HEALTH.DB_CLUSTER.CDB.CORRUPTION
Description: Database corruption has been detected on your primary or standby database. The database alert.log is parsed for any specific errors that are indicative of physical block corruptions, logical block corruptions, or logical block corruptions caused by lost writes.
Remediation: HEALTH-DB_CLUSTER-CDB-CORRUPTION
Event Type: com.oraclecloud.databaseservice.database.critical
Threshold: NA

Other DB Errors

Event Name: HEALTH.DB_CLUSTER.CDB.ARCHIVER_HANG
Description: An event of type CRITICAL is created if a CDB is either unable to archive the active online redo log or unable to archive the active online redo log fast enough to the log archive destinations.
Remediation: HEALTH-DB_CLUSTER-CDB-ARCHIVER_HANG
Event Type: com.oraclecloud.databaseservice.database.critical
Threshold: NA

Event Name: HEALTH.DB_CLUSTER.CDB.DATABASE_HANG
Description: An event of type CRITICAL is created when a process/session hang is detected in the CDB.
Remediation: HEALTH-DB_CLUSTER-CDB-DATABASE_HANG
Event Type: com.oraclecloud.databaseservice.database.critical
Threshold: NA

Backup Failures

Event Name: HEALTH.DB_CLUSTER.CDB.BACKUP_FAILURE
Description: An event of type CRITICAL is created if there is a CDB backup with a FAILED status reported in the v$rman_status view.
Remediation: HEALTH-DB_CLUSTER-CDB-BACKUP_FAILURE
Event Type: com.oraclecloud.databaseservice.database.critical
Threshold: NA

Event Name: HEALTH.DB_CLUSTER.CDB.BACKUP_FAILURE_CLEARED
Description: An event of type INFORMATION is created.
Remediation: NA
Event Type: com.oraclecloud.databaseservice.database.information
Threshold: NA

Disk Group Usage

Event Name: HEALTH.DB_CLUSTER.DISK_GROUP.FREE_SPACE
Description: An event of type CRITICAL is created when an ASM disk group reaches space usage of 90% or higher. An event of type INFORMATION is created when the ASM disk group space usage drops below 90%.
Remediation: HEALTH-DB_CLUSTER-DISK_GROUP-FREE_SPACE
Event Type: com.oraclecloud.databaseservice.dbsystem.critical, or com.oraclecloud.databaseservice.dbsystem.information (if < 90%)
Threshold: Notifications are sent when the usage hits 70%, 80%, 90%, and 100% with a corresponding severity of 4, 3, 2, and 1.
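The disk group thresholds for HEALTH.DB_CLUSTER.DISK_GROUP.FREE_SPACE can be read as a lookup from usage to severity. The following sketch is illustrative only: the function name severity_for_usage is hypothetical, and it assumes that a usage value between two thresholds takes the severity of the highest threshold it has reached, which the text above does not state explicitly.

```python
# (threshold percent, severity) pairs from the Disk Group Usage entry,
# ordered from the highest threshold to the lowest.
USAGE_SEVERITY = [(100, 1), (90, 2), (80, 3), (70, 4)]


def severity_for_usage(used_percent):
    """Return the severity for a disk group usage percentage.

    Returns None when usage is below the lowest threshold (70%).
    Assumption: values between thresholds map to the highest threshold reached.
    """
    for threshold, severity in USAGE_SEVERITY:
        if used_percent >= threshold:
            return severity
    return None


if __name__ == "__main__":
    for usage in (65, 70, 85, 90, 100):
        print(usage, severity_for_usage(usage))
```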

Temporarily Restrict Automatic Diagnostic Collections for Specific Events

Use the tfactl blackout command to temporarily suppress automatic diagnostic collections.

If you set blackout for a target, then Oracle Trace File Analyzer stops automatic diagnostic collections if it finds events in the alert logs for that target while scanning. By default, blackout will be in effect for 24 hours.

You can also restrict automatic diagnostic collection at a granular level, for example, only for ORA-00600 or even only ORA-00600 with specific arguments.

Syntax

tfactl blackout add|remove|print
    -targettype host|crs|asm|asmdg|database|dbbackup|db_dataguard|
        db_tablespace|pdb_tablespace|pdb|listener|service|os
    -target all|name
    [-container name]
    [-pdb pdb_name]
    -event all|"event_str1,event_str2"|availability
    [-timeout nm|nh|nd|none]
    [-c|-local|-nodes "node1,node2"]
    [-reason "reason for blackout"]
    [-docollection]

Parameters

Table - Parameters

Parameter Description
add|remove|print Adds, removes, or prints blackout conditions.

-targettype type

Target type: host|crs|asm|asmdg|database|dbbackup|db_dataguard|db_tablespace|pdb_tablespace|pdb|listener|service|os

Limits blackout only to the specified target type.

host: The whole node is under blackout. If there is host blackout, then every blackout element that's shown true in the Telemetry JSON will have the reason for the blackout.

crs: Blackout the availability of the Oracle Clusterware resource or events in the Oracle Clusterware logs.

asm: Blackout the availability of Oracle Automatic Storage Management (Oracle ASM) on this machine or events in the Oracle ASM alert logs.

asmdg: Blackout an Oracle ASM disk group.

database: Blackout the availability of an Oracle Database, Oracle Database backup, tablespace, and so on, or events in the Oracle Database alert logs.

dbbackup: Blackout Oracle Database backup events (such as CDB or archive backups).

db_dataguard: Blackout Oracle Data Guard events.

db_tablespace: Blackout Oracle Database tablespace events (container database).

pdb_tablespace: Blackout Oracle pluggable database tablespace events (pluggable database).

pdb: Blackout Oracle pluggable database events.

listener: Blackout the availability of a listener.

service: Blackout the availability of a service.

os: Blackout one or more operating system records.

-target all|name

Specify the target for blackout. You can specify a comma-delimited list of targets.

By default, the target is set to all.

-container name Specify the database container name (db_unique_name) where the blackout will take effect (for PDB, DB_TABLESPACE, and PDB_TABLESPACE).
-pdb pdb_name Specify the PDB where the blackout will take effect (for PDB_TABLESPACE only).
-event all|"str1,str2"

Limits blackout only to the availability events, or event strings, which should not trigger auto collections, or be marked as blacked out in telemetry JSON.

all: Blackout everything for the target specified.

string: Blackout for incidents where any part of the line contains the strings specified.

Specify a comma-delimited list of strings.

-timeout nm|nh|nd|none Specify the duration for blackout in number of minutes, hours, or days before timing out. By default, the timeout is set to 24 hours (24h).
-c|-local

Specify whether the blackout applies cluster-wide or only to the local node.

By default, blackout is set to local.

-reason comment Specify a descriptive reason for the blackout.
-docollection Use this option to do an automatic diagnostic collection even if a blackout is set for this target.
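When blackouts are scripted, for example around a patching window, it can help to assemble the command from validated parts. The following Python sketch builds an argument list using only flags that appear in the syntax above. The helper name build_blackout_command is hypothetical, and the timeout check assumes the nm|nh|nd|none forms shown in the syntax are the only accepted values.

```python
import re
import shlex

# Timeout forms taken from the syntax above: nm, nh, nd, or none.
_TIMEOUT_RE = re.compile(r"^(\d+[mhd]|none)$")


def build_blackout_command(action, targettype=None, target=None,
                           container=None, event=None, timeout=None,
                           reason=None):
    """Return the argument list for a tfactl blackout invocation."""
    if action not in ("add", "remove", "print"):
        raise ValueError(f"unsupported action: {action!r}")
    if timeout is not None and not _TIMEOUT_RE.match(timeout):
        raise ValueError(f"unsupported timeout: {timeout!r}")
    args = ["tfactl", "blackout", action]
    options = [
        ("-targettype", targettype),
        ("-target", target),
        ("-container", container),
        ("-event", event),
        ("-timeout", timeout),
        ("-reason", reason),
    ]
    # Append only the options the caller supplied, in a fixed order.
    for flag, value in options:
        if value is not None:
            args.extend([flag, value])
    return args


if __name__ == "__main__":
    cmd = build_blackout_command("add", targettype="database",
                                 target="mydb", event="ORA-00600")
    # shlex.join quotes arguments that contain spaces, such as a reason.
    print(shlex.join(cmd))
```

Returning a list rather than a string keeps each argument intact if the command is later passed to a process launcher, so a reason that contains spaces does not need manual quoting.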

Examples

The following examples show how to use the tfactl blackout command.

To blackout event: ORA-00600 on targettype: database, target: mydb

tfactl blackout add -targettype database -target mydb -event "ORA-00600"

To blackout event: ORA-04031 on targettype: database, target: all

tfactl blackout add -targettype database -target all -event "ORA-04031" -timeout 1h

To blackout db backup events on targettype: dbbackup, target: mydb

tfactl blackout add -targettype dbbackup -target mydb

To blackout db dataguard events on targettype: db_dataguard, target: mydb

tfactl blackout add -targettype db_dataguard -target mydb -timeout 30m

To blackout db tablespace events on targettype: db_tablespace, target: system, container: mydb

tfactl blackout add -targettype db_tablespace -target system -container mydb -timeout 30m

To blackout ALL events on targettype: host, target: all

tfactl blackout add -targettype host -event all -target all -timeout 1h \
    -reason "Disabling all events during patching"

To print blackout details:

tfactl blackout print
.-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------.
|                                                                                myhostname                                                                                     |
+---------------+---------------------+-----------+------------------------------+------------------------------+--------+---------------+--------------------------------------+
| Target Type   | Target              | Events    | Start Time                   | End Time                     | Status | Do Collection | Reason                               |
+---------------+---------------------+-----------+------------------------------+------------------------------+--------+---------------+--------------------------------------+
| HOST          | ALL                 | ALL       | Thu Mar 24 16:48:39 UTC 2022 | Thu Mar 24 17:48:39 UTC 2022 | ACTIVE | false         | Disabling all events during patching |
| DATABASE      | MYDB                | ORA-00600 | Thu Mar 24 16:39:03 UTC 2022 | Fri Mar 25 16:39:03 UTC 2022 | ACTIVE | false         | NA                                   |
| DATABASE      | ALL                 | ORA-04031 | Thu Mar 24 16:39:54 UTC 2022 | Thu Mar 24 17:39:54 UTC 2022 | ACTIVE | false         | NA                                   |
| DB_DATAGUARD  | MYDB                | ALL       | Thu Mar 24 16:41:38 UTC 2022 | Thu Mar 24 17:11:38 UTC 2022 | ACTIVE | false         | NA                                   |
| DBBACKUP      | MYDB                | ALL       | Thu Mar 24 16:40:47 UTC 2022 | Fri Mar 25 16:40:47 UTC 2022 | ACTIVE | false         | NA                                   |
| DB_TABLESPACE | SYSTEM_CDBNAME_MYDB | ALL       | Thu Mar 24 16:45:56 UTC 2022 | Thu Mar 24 17:15:56 UTC 2022 | ACTIVE | false         | NA                                   |
'---------------+---------------------+-----------+------------------------------+------------------------------+--------+---------------+--------------------------------------'
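To check active blackouts from a script, the printed table can be parsed into records. The following Python sketch assumes the layout shown above: data rows start with a pipe character, border lines do not, and the single-cell host name banner can be ignored. The function name parse_tfactl_table is hypothetical, and the sample text is an abbreviated version of the output above with fewer columns.

```python
def parse_tfactl_table(text):
    """Parse pipe-delimited table rows into a list of dictionaries."""
    rows = []
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith("|"):
            continue  # skip border lines and any echoed command
        cells = [cell.strip() for cell in line.strip("|").split("|")]
        if len(cells) > 1:  # skip the single-cell host name banner
            rows.append(cells)
    if not rows:
        return []
    header, *data = rows
    return [dict(zip(header, row)) for row in data]


# Abbreviated sample based on the output above (columns reduced).
SAMPLE = """\
.-------------------------------------------.
|                myhostname                 |
+-------------+--------+-----------+--------+
| Target Type | Target | Events    | Status |
+-------------+--------+-----------+--------+
| HOST        | ALL    | ALL       | ACTIVE |
| DATABASE    | MYDB   | ORA-00600 | ACTIVE |
'-------------+--------+-----------+--------'
"""

if __name__ == "__main__":
    for row in parse_tfactl_table(SAMPLE):
        print(row["Target Type"], row["Target"], row["Events"], row["Status"])
```

The tfactl status output later in this topic uses the same row layout, so the same approach should apply there, although the format may differ between TFA versions.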

To remove blackout for event: ORA-00600 on targettype: database, target: mydb

tfactl blackout remove -targettype database -event "ORA-00600" -target mydb

To remove blackout for db backup events on targettype: dbbackup, target: mydb

tfactl blackout remove -targettype dbbackup -target mydb

To remove blackout for db tablespace events on targettype: db_tablespace, target: system, container: mydb

tfactl blackout remove -targettype db_tablespace -target system -container mydb

To remove blackout for all events on targettype: host, target: all

tfactl blackout remove -targettype host -event all -target all

Manage Oracle Trace File Analyzer

To check the run status of Oracle Trace File Analyzer, run the tfactl status command as root or a non-root user:

tfactl status
.----------------------------------------------------------------------------------------------.
| Host  | Status of TFA | PID    | Port | Version    | Build ID             | Inventory Status |
+-------+---------------+--------+------+------------+----------------------+------------------+
| node1 | RUNNING       | 41312  | 5000 | 22.1.0.0.0 | 22100020220310214615 | COMPLETE         |
| node2 | RUNNING       | 272300 | 5000 | 22.1.0.0.0 | 22100020220310214615 | COMPLETE         |
'----------------------------------------------------------------------------------------------'

To start the Oracle Trace File Analyzer daemon on the local node, run the tfactl start command as the root user:

tfactl start
Starting TFA..
Waiting up to 100 seconds for TFA to be started..
. . . . .
Successfully started TFA Process..
. . . . .
TFA Started and listening for commands

To stop the Oracle Trace File Analyzer daemon on the local node, run the tfactl stop command as the root user:

tfactl stop
Stopping TFA from the Command Line
Nothing to do !
Please wait while TFA stops
Please wait while TFA stops
TFA-00002 Oracle Trace File Analyzer (TFA) is not running
TFA Stopped Successfully
Successfully stopped TFA..

Manage Database Service Agent

View the /opt/oracle/dcs/log/dcs-agent.log file to identify issues with the agent.

To check the status of the Database Service Agent, run the systemctl status command:

systemctl status dbcsagent.service
dbcsagent.service
Loaded: loaded (/usr/lib/systemd/system/dbcsagent.service; enabled; vendor preset: disabled)
Active: active (running) since Fri 2022-04-01 13:40:19 UTC; 6min ago
Process: 9603 ExecStopPost=/bin/bash -c kill `ps -fu opc |grep "java.*dbcs-agent.*jar" |
    awk '{print $2}' ` (code=exited, status=0/SUCCESS)
Main PID: 10055 (sudo)
CGroup: /system.slice/dbcsagent.service
‣ 10055 sudo -u opc /bin/bash -c umask 077; /bin/java -Doracle.security.jps.config=/opt/oracle/...

To start the agent if it is not running, run the systemctl start command as the root user:

systemctl start dbcsagent.service
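Because events depend on the agent being up, a periodic check can be useful. The following Python sketch wraps the standard systemctl is-active --quiet command, which exits with status 0 when the unit is active. The function name is_service_active and the injectable run parameter are illustrative choices that make the check testable without systemd; the unit name comes from the commands above.

```python
import subprocess


def is_service_active(unit, run=subprocess.run):
    """Return True if systemd reports the unit as active.

    The run parameter is injectable so the check can be exercised
    without a real systemd (an illustrative design choice).
    """
    result = run(["systemctl", "is-active", "--quiet", unit], check=False)
    return result.returncode == 0


if __name__ == "__main__":
    unit = "dbcsagent.service"
    if is_service_active(unit):
        print(f"{unit} is running")
    else:
        print(f"{unit} is not running; start it as the root user")
```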