Oracle Exadata Database Service on Dedicated Infrastructure Events

Exadata Cloud Infrastructure resources emit events, which are structured messages that indicate changes in resources.

About Event Types on Exadata Cloud Infrastructure

Learn about the event types available for Exadata Cloud Infrastructure resources.

Exadata Cloud Infrastructure resources emit events, which are structured messages that indicate changes in resources. For more information about Oracle Cloud Infrastructure Events, see Overview of Events. You can subscribe to events and be notified when they occur by using the Oracle Notification service. For details, see Notifications Overview.

Prerequisites for Event Service

The following prerequisites must be met for events to flow out of the VM cluster:

  1. Events on the VM cluster depend on the Oracle Trace File Analyzer (TFA) agent. Ensure that the agent is up and running. AHF version 22.2.2 or later is required to capture events from the VM cluster. To start, stop, or check the status of TFA, see Incident Logs and Trace Files. To enable AHF Telemetry for the VM cluster using the dbcli utility, see AHF Telemetry Commands.
  2. The following network configurations are required.
    1. Egress rules for outgoing traffic: The default egress rules are sufficient to enable the required network path. For more information, see Default Security List. If you have blocked outgoing traffic by modifying the default egress rules on your virtual cloud network (VCN), you must revert the settings to allow outgoing traffic. The default egress rule allowing outgoing traffic (as shown in Security Rules for the Oracle Exadata Database Service on Dedicated Infrastructure) is as follows:
      • Stateless: No (all rules must be stateful)
      • Destination Type: CIDR
      • Destination CIDR: All <region> Services in Oracle Services Network
      • IP Protocol: TCP
      • Destination Port: 443 (HTTPS)
    2. Public IP or Service Gateway: The database server host must have either a public IP address or a service gateway to be able to send database server host metrics to the Monitoring service.

      If the instance does not have a public IP address, set up a service gateway on the virtual cloud network (VCN). The service gateway lets the instance send database server host metrics to the Monitoring service without the traffic going over the internet. Here are special notes for setting up the service gateway to access the Monitoring service:

      1. When creating the service gateway, enable the service label called All <region> Services in Oracle Services Network. It includes the Monitoring service.
      2. When setting up routing for the subnet that contains the instance, set up a route rule with Target Type set to Service Gateway, and the Destination Service set to All <region> Services in Oracle Services Network.

        For detailed instructions, see Access to Oracle Services: Service Gateway.
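
The egress requirements listed above (stateful rule, TCP, destination port 443) can be expressed as a simple check. The following is a minimal sketch only: the EgressRule class and its field names are hypothetical stand-ins for however you export your security rules, not an Oracle API, and the sketch does not validate the rule's destination.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EgressRule:
    """Hypothetical, simplified view of one egress security rule."""
    stateless: bool
    protocol: str   # for example "TCP"
    port_min: int
    port_max: int


def allows_https_egress(rule: EgressRule) -> bool:
    """True if the rule is stateful and covers outbound TCP port 443."""
    return (
        not rule.stateless
        and rule.protocol.upper() == "TCP"
        and rule.port_min <= 443 <= rule.port_max
    )


rules = [
    EgressRule(stateless=False, protocol="TCP", port_min=443, port_max=443),
    EgressRule(stateless=True, protocol="TCP", port_min=443, port_max=443),   # stateless: rejected
    EgressRule(stateless=False, protocol="TCP", port_min=22, port_max=22),    # wrong port: rejected
]
usable = [rule for rule in rules if allows_https_egress(rule)]
print(f"{len(usable)} of {len(rules)} rules satisfy the prerequisite")
```

A check like this is only a local sanity test of exported rule data; the Console remains the place to confirm the actual security list.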

Oracle Exadata Database Service on Dedicated Infrastructure Event Types

The events in this section are emitted by the cloud Exadata infrastructure resource.

Note

Exadata systems that use the old DB system resource model are deprecated and will be desupported in a future release. The DB system events are not described.

Friendly Name Event Type
Cloud Exadata Infrastructure - Create Begin com.oraclecloud.databaseservice.createcloudexadatainfrastructure.begin
Cloud Exadata Infrastructure - Create End com.oraclecloud.databaseservice.createcloudexadatainfrastructure.end
Cloud Exadata Infrastructure - Change Compartment Begin com.oraclecloud.databaseservice.changecloudexadatainfrastructurecompartment.begin
Cloud Exadata Infrastructure - Change Compartment End com.oraclecloud.databaseservice.changecloudexadatainfrastructurecompartment.end
Cloud Exadata Infrastructure - Critical (see Exadata Cloud Service Infrastructure Critical and Information Event Types for details) com.oraclecloud.databaseservice.cloudexadatainfrastructure.critical
Cloud Exadata Infrastructure - Delete Begin com.oraclecloud.databaseservice.deletecloudexadatainfrastructure.begin
Cloud Exadata Infrastructure - Delete End com.oraclecloud.databaseservice.deletecloudexadatainfrastructure.end
Cloud Exadata Infrastructure - Information (see Exadata Cloud Service Infrastructure Critical and Information Event Types for details) com.oraclecloud.databaseservice.cloudexadatainfrastructure.information
Cloud Exadata Infrastructure - Update Begin com.oraclecloud.databaseservice.updatecloudexadatainfrastructure.begin
Cloud Exadata Infrastructure - Update End com.oraclecloud.databaseservice.updatecloudexadatainfrastructure.end

This is a reference event for a Cloud Exadata Infrastructure resource:

{
  "cloudEventsVersion": "0.1",
  "eventID": "<unique_ID>",
  "eventType": "com.oraclecloud.databaseservice.cloudexadatainfrastructuremaintenance.end",
  "source": "DatabaseService",
  "eventTypeVersion": "1.0",
  "eventTime": "2019-06-27T21:16:04.000Z",
  "contentType": "application/json",
  "extensions": {
    "compartmentId": "ocid1.compartment.oc1.<unique_ID>"
  },
  "data": {
    "compartmentId": "ocid1.compartment.oc1.<unique_ID>",
    "compartmentName": "example_name",
    "resourceName": "my_exadata_infrastructure",
    "resourceId": "ocid1.dbsystem.oc1.eu-frankfurt-1.<unique_ID>",
    "availabilityDomain": "tXPJ:EU-FRANKFURT-1-AD-3",
    "freeFormTags": {
      "Department": "Finance"
    },
    "definedTags": {
      "Operations": {
        "CostCenter": "42"
      }
    },
    "additionalDetails": {
      "subnetId": "ocid1.subnet.oc1.eu-frankfurt-1.<unique_ID>",
      "lifecycleState": "MAINTENANCE_IN_PROGRESS",
      "sshPublicKeys": "...",
      "cpuCoreCount": 32,
      "version": "19.2.8.0.0.191119",
      "nsgIds": "null",
      "backupSubnetId": "ocid1.subnet.oc1.eu-frankfurt-1.<unique_ID>",
      "licenseType": "BRING_YOUR_OWN_LICENSE",
      "dataStoragePercentage": 80,
      "patchHistoryEntries": "null",
      "lifecycleMessage": "The underlying infrastructure of this system (cell storage) is being updated and this will not impact database availability.",
      "exadataIormConfig": "ExadataIormConfigCache(lifecycleState=DISABLED, lifeCycleDetails=null, objective=Auto, dbPlans=[DbIormConfigCache(dbName=default, share=null, flashCacheLimit=null), DbIormConfigCache(dbName=<my_database1>, share=null, flashCacheLimit=null), DbIormConfigCache(dbName=<my_database2>, share=null, flashCacheLimit=null), DbIormConfigCache(dbName=<my_database3>, share=null, flashCacheLimit=null), DbIormConfigCache(dbName=<my_database4>, share=null, flashCacheLimit=null), DbIormConfigCache(dbName=<my_database5>, share=null, flashCacheLimit=null), DbIormConfigCache(dbName=<my_database6>, share=null, flashCacheLimit=null), DbIormConfigCache(dbName=<my_database7>, share=null, flashCacheLimit=null), DbIormConfigCache(dbName=<my_database8>, share=null, flashCacheLimit=null), DbIormConfigCache(dbName=<my_database9>, share=null, flashCacheLimit=null), DbIormConfigCache(dbName=<my_database10>, share=null, flashCacheLimit=null), DbIormConfigCache(dbName=<my_database11>, share=null, flashCacheLimit=null)], undoData=null)"
    }
  }
}
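
A consumer usually needs only a handful of fields from an event message like the one above. The following is a minimal sketch, assuming the message arrives as a JSON string; summarize_event is a hypothetical helper name, and the sample payload is a trimmed copy of the reference event with most fields omitted.

```python
import json


def summarize_event(raw: str) -> dict:
    """Pull a few commonly used fields out of an event message."""
    event = json.loads(raw)
    data = event.get("data", {})
    details = data.get("additionalDetails", {})
    return {
        "eventType": event.get("eventType"),
        "eventTime": event.get("eventTime"),
        "resourceName": data.get("resourceName"),
        "lifecycleState": details.get("lifecycleState"),
    }


# Trimmed copy of the reference event; most fields are omitted.
raw_event = json.dumps({
    "eventType": "com.oraclecloud.databaseservice.cloudexadatainfrastructuremaintenance.end",
    "eventTime": "2019-06-27T21:16:04.000Z",
    "data": {
        "resourceName": "my_exadata_infrastructure",
        "additionalDetails": {"lifecycleState": "MAINTENANCE_IN_PROGRESS"},
    },
})

summary = summarize_event(raw_event)
print(summary["resourceName"], summary["lifecycleState"])
```

Using dict.get with defaults keeps the helper tolerant of events that omit data or additionalDetails.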

Oracle Exadata Database Service on Dedicated Infrastructure Maintenance Event Types

The events in this section are emitted by the cloud Exadata infrastructure resource for maintenance events.

Note

Exadata systems that use the old DB system resource model are deprecated and will be desupported in a future release. The DB system events are not described.

Friendly Name Event Type Event Messages

Cloud Exadata Infrastructure - Maintenance Begin com.oraclecloud.databaseservice.cloudexadatainfrastructuremaintenance.begin
This is an Oracle Cloud Operations notice regarding the quarterly maintenance update installation for your Cloud Exadata Infrastructure instance <infra-name>, ocid <infra-ocid>. The update installation for the service started at <time scheduled>. A follow-up notice will be sent when the maintenance update operation has completed.

Cloud Exadata Infrastructure - Maintenance End com.oraclecloud.databaseservice.cloudexadatainfrastructuremaintenance.end N/A

CloudExadataInfrastructureMaintenance End Success com.oraclecloud.databaseservice
This is an Oracle Cloud Operations notice that your Cloud Exadata Infrastructure quarterly maintenance update installation for service instance <infra-name>, ocid <infra-ocid> which started at <maintenance-start-time> is now successfully complete.

CloudExadataInfrastructureMaintenance End Failed com.oraclecloud.databaseservice.com.
This is an Oracle Cloud Operations notice that your Cloud Exadata Infrastructure quarterly maintenance update installation for service instance <infra-name>, ocid <infra-ocid> which started at <maintenance-start-time> has failed to complete due to technical reasons and operations team are currently looking into the issue. You will receive regular notifications to track progress of this maintenance.

Cloud Exadata Infrastructure - Maintenance Reminder com.oraclecloud.databaseservice.cloudexadatainfrastructuremaintenancereminder
This is an Oracle Cloud Operations reminder notice. Oracle has scheduled a quarterly maintenance update installation for Cloud Exadata Infrastructure instance <infra-name>, ocid <infra-ocid> in approximately <lead-time> weeks on <time-scheduled>.
UPDATED
Rolling:
This is an Oracle Cloud Operations reminder notice. Oracle has scheduled a quarterly maintenance update installation for Cloud Exadata Infrastructure <infra-name>, ocid <ocid> in approximately <no-of-days> days on <time-scheduled>. The maintenance method for this maintenance is <maintenance-method> as selected per the maintenance preferences.
Non Rolling:
This is an Oracle Cloud Operations reminder notice. Oracle has scheduled a quarterly maintenance update installation for Cloud Exadata Infrastructure <infra-name>, ocid <ocid> in approximately <no-of-days> days on <time-scheduled>. The maintenance method for this maintenance is <maintenance-method> as selected per the maintenance preferences. Non-rolling maintenance minimizes maintenance time but will result in full system downtime.

Cloud Exadata Infrastructure - Maintenance Scheduled com.oraclecloud.databaseservice.cloudexadatainfrastructuremaintenancescheduled
Oracle Cloud Operations is announcing the availability of a new quarterly maintenance update for Cloud Exadata Infrastructure. Oracle has scheduled the installation of this new update on your service instance <infra-name>, ocid <infra-ocid> on <time-scheduled>.
UPDATED
Rolling:
Oracle Cloud Operations is announcing the availability of a new quarterly maintenance update for Cloud Exadata Infrastructure. Oracle has scheduled the installation of this new update on your service instance <infra-name>, ocid <infra-ocid> on <time-scheduled>. The maintenance method for this maintenance is <maintenance-method> as selected per the maintenance preferences.
Non Rolling:
Oracle Cloud Operations is announcing the availability of a new quarterly maintenance update for Cloud Exadata Infrastructure. Oracle has scheduled the installation of this new update on your service instance <infra-name>, ocid <infra-ocid> on <time-scheduled>. The maintenance method for this maintenance is <maintenance-method> as selected per the maintenance preferences. Non-rolling maintenance minimizes maintenance time but will result in full system downtime.

CloudExadataInfrastructureMaintenanceVM Begin com.oraclecloud.databaseservice. message may vary

This is a reference event for a Cloud Exadata Infrastructure resource:

{
  "cloudEventsVersion": "0.1",
  "eventID": "<unique_ID>",
  "eventType": "com.oraclecloud.databaseservice.cloudexadatainfrastructuremaintenance.end",
  "source": "DatabaseService",
  "eventTypeVersion": "1.0",
  "eventTime": "2019-06-27T21:16:04.000Z",
  "contentType": "application/json",
  "extensions": {
    "compartmentId": "ocid1.compartment.oc1.<unique_ID>"
  },
  "data": {
    "compartmentId": "ocid1.compartment.oc1.<unique_ID>",
    "compartmentName": "example_name",
    "resourceName": "my_exadata_infrastructure",
    "resourceId": "ocid1.dbsystem.oc1.eu-frankfurt-1.<unique_ID>",
    "availabilityDomain": "tXPJ:EU-FRANKFURT-1-AD-3",
    "freeFormTags": {
      "Department": "Finance"
    },
    "definedTags": {
      "Operations": {
        "CostCenter": "42"
      }
    },
    "additionalDetails": {
      "subnetId": "ocid1.subnet.oc1.eu-frankfurt-1.<unique_ID>",
      "lifecycleState": "MAINTENANCE_IN_PROGRESS",
      "sshPublicKeys": "...",
      "cpuCoreCount": 32,
      "version": "19.2.8.0.0.191119",
      "nsgIds": "null",
      "backupSubnetId": "ocid1.subnet.oc1.eu-frankfurt-1.<unique_ID>",
      "licenseType": "BRING_YOUR_OWN_LICENSE",
      "dataStoragePercentage": 80,
      "patchHistoryEntries": "null",
      "lifecycleMessage": "The underlying infrastructure of this system (cell storage) is being updated and this will not impact database availability.",
      "exadataIormConfig": "ExadataIormConfigCache(lifecycleState=DISABLED, lifeCycleDetails=null, objective=Auto, dbPlans=[DbIormConfigCache(dbName=default, share=null, flashCacheLimit=null), DbIormConfigCache(dbName=<my_database1>, share=null, flashCacheLimit=null), DbIormConfigCache(dbName=<my_database2>, share=null, flashCacheLimit=null), DbIormConfigCache(dbName=<my_database3>, share=null, flashCacheLimit=null), DbIormConfigCache(dbName=<my_database4>, share=null, flashCacheLimit=null), DbIormConfigCache(dbName=<my_database5>, share=null, flashCacheLimit=null), DbIormConfigCache(dbName=<my_database6>, share=null, flashCacheLimit=null), DbIormConfigCache(dbName=<my_database7>, share=null, flashCacheLimit=null), DbIormConfigCache(dbName=<my_database8>, share=null, flashCacheLimit=null), DbIormConfigCache(dbName=<my_database9>, share=null, flashCacheLimit=null), DbIormConfigCache(dbName=<my_database10>, share=null, flashCacheLimit=null), DbIormConfigCache(dbName=<my_database11>, share=null, flashCacheLimit=null)], undoData=null)"
    }
  }
}

Exadata Cloud Infrastructure Critical and Information Event Types

Exadata Cloud Infrastructure resources emit "critical" and "information" data plane events that allow you to receive notifications when your infrastructure resource needs attention.

Exadata Cloud Service infrastructure resources emit "critical" and "information" data plane events that allow you to receive notifications when your infrastructure resource needs urgent attention ("critical" events), or notifications for events that are not critical, but which you may want to monitor ("information" events). The eventType values for these events are the following:

  • com.oraclecloud.databaseservice.exadatainfrastructure.critical
  • com.oraclecloud.databaseservice.exadatainfrastructure.information

These events use the additionalDetails section of the event message to provide specific details about what is happening within the infrastructure resource emitting the event. In the additionalDetails section, the eventName field provides the name of the critical or information event. (Note that some fields in the example that follows have been omitted for brevity.)


{
  "eventType" : "com.oraclecloud.databaseservice.exadatainfrastructure.critical",
  ....
  "data" : {
    ....
    "additionalDetails" : {
      ....
      "description" : "SQL statement terminated by Oracle Database Resource Manager due to excessive consumption of CPU and/or I/O. The execution plan associated with the terminated SQL stmt is quarantined. Please find the sql identifier in sqlId field of this JSON payload. This feature protects an Oracle database from performance degradation. Please review the SQL statement. You can see the statement using the following commands: \"set serveroutput off\", \"select sql_id, sql_text from v$sqltext where sql_id =<sqlId>\", \"set serveroutput on\"",
      "component" : "storage",
      "infrastructureType" : "exadata",
      "eventName" : "HEALTH.INFRASTRUCTURE.CELL.SQL_QUARANTINE",
      "quarantineMode" : "\"FULL Quarantine\""
      ....
    }
  },
  "eventID" : "<unique_ID>",
  ....
}

In the tables below, you can read about the conditions and operations that trigger "critical" and "information" events. Each condition or operation is identified by a unique eventName value.

Critical events for Exadata Cloud Service infrastructure:

Critical Event - EventName Description
HEALTH.INFRASTRUCTURE.CELL.SQL_QUARANTINE

SQL statement terminated by Oracle Database Resource Manager due to excessive consumption of CPU and/or I/O. The execution plan associated with the terminated SQL stmt is quarantined. Please find the sql identifier in sqlId field of this JSON payload. This feature protects an Oracle database from performance degradation. Please review the SQL statement. You can see the statement using the following commands:

  • "set serveroutput off"
  • "select sql_id, sql_text from v$sqltext where sql_id =<sqlId>"
  • "set serveroutput on"

Informational events for Exadata Cloud Service infrastructure:

Information Event - EventName Description
HEALTH.INFRASTRUCTURE.CELL.FLASH_DISK_FAILURE Flash Disk Failure has been detected. This is being investigated by Oracle Exadata team and the disk will be replaced if needed. No action needed from the customer.

In the following example of a "critical" event, you can see within the additionalDetails section of the event message that this particular message concerns an SQL statement that was terminated by Oracle Database Resource Manager because it was consuming excessive CPU or I/O resources. The eventName and description fields within the additionalDetails section provide information regarding the critical situation:


{
  "eventType" : "com.oraclecloud.databaseservice.exadatainfrastructure.critical",
  "cloudEventsVersion" : "0.1",
  "eventTypeVersion" : "2.0",
  "source" : "Exadata Storage",
  "eventTime" : "2021-07-30T04:53:18Z",
  "contentType" : "application/json",
  "data" : {
    "compartmentId" : "ocid1.tenancy.oc1.<unique_ID>",
    "compartmentName" : "example_name",
    "resourceName" : "my_exadata_resource",
    "resourceId" : "ocid1.dbsystem.oc1.phx.<unique_ID>",
    "availabilityDomain" : "phx-ad-2",
    "additionalDetails" : {
      "serviceType" : "exacs",
      "sqlID" : "gnwfm1jgqcfuu",
      "systemId" : "ocid1.dbsystem.oc1.eu-frankfurt-1.<unique_ID>",
      "creationTime" : "2021-05-14T13:29:28+00:00",
      "dbUniqueID" : "1558836122",
      "quarantineType" : "SQLID",
      "dbUniqueName" : "AB0503_FRA1S6",
      "description" : "SQL statement terminated by Oracle Database Resource Manager due to excessive consumption of CPU and/or I/O. The execution plan associated with the terminated SQL stmt is quarantined. Please find the sql identifier in sqlId field of this JSON payload. This feature protects an Oracle database from performance degradation. Please review the SQL statement. You can see the statement using the following commands: \"set serveroutput off\", \"select sql_id, sql_text from v$sqltext where sql_id =<sqlId>\", \"set serveroutput on\"",
      "quarantineReason" : "Manual",
      "asmClusterName" : "None",
      "component" : "storage",
      "infrastructureType" : "exadata",
      "name" : "143",
      "eventName" : "HEALTH.INFRASTRUCTURE.CELL.SQL_QUARANTINE",
      "comment" : "None",
      "quarantineMode" : "\"FULL Quarantine\"",
      "rpmVersion" : "OSS_20.1.8.0.0_LINUX.X64_210317",
      "cellsrvChecksum" : "14f73eb107dc1be0bde757267e931991",
      "quarantinePlan" : "SYSTEM"
    }
  },
  "eventID" : "<unique_ID>",
  "extensions" : {
    "compartmentId" : "ocid1.tenancy.oc1.<unique_ID>"
  }
}
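
The description in the event above tells you to look up the quarantined statement in v$sqltext by its SQL identifier. The following is a minimal sketch of building that lookup from the payload. It assumes the identifier is carried in the sqlID field as in the example (the description text spells it sqlId), and quarantine_lookup_sql is a hypothetical helper name. The identifier is checked to be alphanumeric before it is placed in the statement, and it is quoted as a string literal, which is an assumption about how you would run the query.

```python
from typing import Optional

SQL_QUARANTINE = "HEALTH.INFRASTRUCTURE.CELL.SQL_QUARANTINE"


def quarantine_lookup_sql(event: dict) -> Optional[str]:
    """Build the v$sqltext lookup for a SQL quarantine event, else return None."""
    details = event.get("data", {}).get("additionalDetails", {})
    if details.get("eventName") != SQL_QUARANTINE:
        return None
    sql_id = str(details.get("sqlID") or "")
    # Refuse anything that is not a plain alphanumeric identifier.
    if not sql_id.isalnum():
        return None
    return f"select sql_id, sql_text from v$sqltext where sql_id = '{sql_id}'"


critical_event = {
    "eventType": "com.oraclecloud.databaseservice.exadatainfrastructure.critical",
    "data": {
        "additionalDetails": {
            "eventName": SQL_QUARANTINE,
            "sqlID": "gnwfm1jgqcfuu",
        }
    },
}
print(quarantine_lookup_sql(critical_event))
```

For events with a different eventName, or with a missing identifier, the helper returns None rather than guessing.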

In the following example of an "information" event, you can see within the additionalDetails section of the event message that this particular message concerns a flash disk failure that is being investigated by the Oracle Exadata operations team. The eventName and description fields within the additionalDetails section provide information regarding the event:

{
  "eventType" : "com.oraclecloud.databaseservice.exadatainfrastructure.information",
  "cloudEventsVersion" : "0.1",
  "eventTypeVersion" : "2.0",
  "source" : "Exadata Storage",
  "eventTime" : "2021-12-17T19:14:42Z",
  "contentType" : "application/json",
  "data" : {
    "compartmentId" : "ocid1.tenancy.oc1..aaaaaaaao3lj36x6lwxyvc4wausjouca7pwyjfwb5ebsq5emrpqlql2gj5iq",
    "compartmentName" : "intexadatateam",
    "resourceId" : "ocid1.dbsystem.oc1.phx.abyhqljt5y3taezn7ug445fzwlngjfszbedxlcbctw45ykkaxyzc5isxoula",
    "availabilityDomain" : "phx-ad-2",
    "additionalDetails" : {
      "serviceType" : "exacs",
      "component" : "storage",
      "systemId" : "ocid1.dbsystem.oc1.phx.abyhqljt5y3taezn7ug445fzwlngjfszbedxlcbctw45ykkaxyzc5isxoula",
      "infrastructureType" : "exadata",
      "description" : "Flash Disk Failure has been detected. This is being investigated by Oracle Exadata team and the disk will be replaced if needed. No action needed from the customer.",
      "eventName" : "HEALTH.INFRASTRUCTURE.CELL.FLASH_DISK_FAILURE",
      "FLASH_1_1" : "S2T7NA0HC01251  failed",
      "otto-ingestion-time" : "2021-12-17T19:14:43.205Z",
      "otto-send-EventService-time" : "2021-12-17T19:14:44.198Z"
    }
  },
  "eventID" : "30130ab4-42fa-4285-93a7-47e49522c698",
  "extensions" : {
    "compartmentId" : "ocid1.tenancy.oc1..aaaaaaaao3lj36x6lwxyvc4wausjouca7pwyjfwb5ebsq5emrpqlql2gj5iq"
  }
}

Exadata Cloud Infrastructure VM Cluster Event Types

Review the list of events that can be emitted by a VM cluster.

Friendly Name Event Type
Cloud VM Cluster - Change Compartment Begin com.oraclecloud.databaseservice.changecloudvmclustercompartment.begin
Cloud VM Cluster - Change Compartment End com.oraclecloud.databaseservice.changecloudvmclustercompartment.end
Cloud VM Cluster - Create Begin com.oraclecloud.databaseservice.createcloudvmcluster.begin
Cloud VM Cluster - Create End com.oraclecloud.databaseservice.createcloudvmcluster.end
Cloud VM Cluster - Delete Begin com.oraclecloud.databaseservice.deletecloudvmcluster.begin
Cloud VM Cluster - Delete End com.oraclecloud.databaseservice.deletecloudvmcluster.end
Cloud VM Cluster - Update Begin com.oraclecloud.databaseservice.updatecloudvmcluster.begin
Cloud VM Cluster - Update End com.oraclecloud.databaseservice.updatecloudvmcluster.end
Cloud VM Cluster - Update IORM Configuration Begin com.oraclecloud.databaseservice.updatecloudvmclusteriormconfig.begin
Cloud VM Cluster - Update IORM Configuration End com.oraclecloud.databaseservice.updatecloudvmclusteriormconfig.end

This is a reference event for a cloud VM cluster resource:

{
    "cloudEventsVersion": "0.1",
    "eventID": "<unique_ID>",
    "eventType": "com.oraclecloud.databaseservice.updatecloudvmclusteriormconfig.begin",
    "source": "databaseservice",
    "eventTypeVersion": "2.0",
    "eventTime": "2022-06-27T21:16:04.000Z",
    "contentType": "application/json",
    "data": {
      "eventGroupingId": "<unique_ID>",
      "eventName": "UpdateCloudVmClusterIormConfig",
      "compartmentName": "example_compartment",
      "resourceName": "my_container_database",
      "resourceId": "ocid1.cloudvmcluster.oc1.<unique_ID>",
      "resourceVersion": null,
      "additionalDetails": {
        "cloudExadataInfrastructureId": "ocid1.cloudexadatainfrastructure.oc1.<unique_ID>",
        "freeFormTags": {},
        "definedTags": {},
        "licenseType": "BRING_YOUR_OWN_LICENSE",
        "lifecycleState": "AVAILABLE",
        "giVersion": "19.0.0.0.0",
        "cpuCoreCount": 16
      }
    },
    "timeCreated": "2022-06-15T16:31:31.979Z"
}

Data Guard Association Event Types

Review the list of event types that Data Guard associations emit.

Friendly Name Event Type
Change Protection Mode Begin com.oraclecloud.databaseservice.changeprotectionmode.begin
Change Protection Mode End com.oraclecloud.databaseservice.changeprotectionmode.end
Data Guard Association - Create Begin com.oraclecloud.databaseservice.createdataguardassociation.begin
Data Guard Association - Create End com.oraclecloud.databaseservice.createdataguardassociation.end
Data Guard Association - Failover Begin com.oraclecloud.databaseservice.failoverdataguardassociation.begin
Data Guard Association - Failover End com.oraclecloud.databaseservice.failoverdataguardassociation.end
Data Guard Association - Reinstate Begin com.oraclecloud.databaseservice.reinstatedataguardassociation.begin
Data Guard Association - Reinstate End com.oraclecloud.databaseservice.reinstatedataguardassociation.end
Data Guard Association - Switchover Begin com.oraclecloud.databaseservice.switchoverdataguardassociation.begin
Data Guard Association - Switchover End com.oraclecloud.databaseservice.switchoverdataguardassociation.end
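
To be notified about a subset of these event types, you would typically create an Events rule whose condition lists them. The following is a minimal sketch that assembles such a condition as a JSON string. It assumes the rule condition takes the form of a JSON object with an eventType list; confirm the exact condition syntax in Overview of Events before relying on it. The build_condition helper name is hypothetical.

```python
import json
from typing import Iterable


def build_condition(event_types: Iterable[str]) -> str:
    """Return a JSON condition string listing the given event types once each."""
    return json.dumps({"eventType": sorted(set(event_types))})


condition = build_condition([
    "com.oraclecloud.databaseservice.failoverdataguardassociation.end",
    "com.oraclecloud.databaseservice.switchoverdataguardassociation.end",
    "com.oraclecloud.databaseservice.failoverdataguardassociation.end",  # duplicate, dropped
])
print(condition)
```

Sorting and de-duplicating the list keeps the generated condition stable between runs, which makes it easier to compare against an existing rule.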

This is a reference event for Data Guard associations:

{
    "cloudEventsVersion": "0.1",
    "contentType": "application/json",
    "data": {
        "additionalDetails": {
            "ApplyLag": null,
            "DGConfigId": "7e8eff2b-a4cd-474a-abd5-940b05c0b1fd",
            "DGConfigState": "null",
            "DatabaseId": "ocid1.database.oc1.iad.<unique_ID>",
            "DbHomeId": "ocid1.dbhome.oc1.iad.<unique_ID>",
            "DbSystemId": "ocid1.dbsystem.oc1.iad.<unique_ID>",
            "LastSyncedTime": null,
            "SyncState": "null",
            "dcsDgUpdateTimestamp": null,
            "lastUpdatedIdentifier": null,
            "lifeCycleMessage": null,
            "lifecycleState": "PROVISIONING",
            "timeCreated": "2019-10-25T21:42:19.041Z",
            "timeUpdated": "2019-10-25T21:42:19.041Z"
        },
        "availabilityDomain": "XXIT:US-ASHBURN-AD-1",
        "compartmentId": "ocid1.compartment.oc1.<unique_ID>",
        "compartmentName": "example_compartment",
        "resourceId": "ocid1.dgassociation.oc1.iad.<unique_ID>"
    },
    "eventID": "5b8b7fbf-2e9a-4730-9761-e52715b7bc79",
    "eventTime": "2019-10-25T21:42:16.579Z",
    "eventType": "com.oraclecloud.databaseservice.createdataguardassociation.begin",
    "eventTypeVersion": "2.0",
    "extensions": {
        "compartmentId": "ocid1.compartment.oc1.<unique_ID>"
    },
    "source": "DatabaseService"
}

Oracle Database Home Event Types

Review the list of events emitted by Oracle Database Homes.

Friendly Name Event Type
DB Home - Create Begin com.oraclecloud.databaseservice.createdbhome.begin
DB Home - Create End com.oraclecloud.databaseservice.createdbhome.end
DB Home - Patch Begin com.oraclecloud.databaseservice.patchdbhome.begin
DB Home - Patch End com.oraclecloud.databaseservice.patchdbhome.end
DB Home - Terminate Begin com.oraclecloud.databaseservice.deletedbhome.begin
DB Home - Terminate End com.oraclecloud.databaseservice.deletedbhome.end
DB Home - Update Begin com.oraclecloud.databaseservice.updatedbhome.begin
DB Home - Update End com.oraclecloud.databaseservice.updatedbhome.end

This is a reference event for Database Homes:

{
    "cloudEventsVersion": "0.1",
    "eventID": "60600c06-d6a7-4e85-b56a-1de3e6042f57",
    "eventType": "com.oraclecloud.databaseservice.createdbhome.begin",
    "source": "databaseservice",
    "eventTypeVersion": "2.0",
    "eventTime": "2019-08-29T21:16:04Z",
    "contentType": "application/json",
    "extensions": {
      "compartmentId": "ocid1.compartment.oc1.<unique_ID>"
    },
    "data": {
      "compartmentId": "ocid1.compartment.oc1.<unique_ID>",
      "compartmentName": "example_compartment",
      "resourceName": "my_dbhome",
      "resourceId": "DbHome-unique_ID",
      "availabilityDomain": "all",
      "freeFormTags": {},
      "definedTags": {},
      "additionalDetails": {
        "id": "ocid1.id.oc1.<unique_ID>",
        "lifecycleState": "PROVISIONING",
        "timeCreated": "2019-08-29T12:00:00.000Z",
        "timeUpdated": "2019-08-29T12:30:00.000Z",
        "lifecycleDetails": "detail message",
        "dbSystemId": "DbSystem-unique_ID",
        "dbVersion": "19.0.0.0",
        "recordVersion": 4,
        "displayName": "example_display_name"
      }
    }
}

Database Event Types

These are the event types that Oracle Databases in Exadata Cloud Service instances emit.

Friendly Name Event Type
Database - Automatic Backup Begin com.oraclecloud.databaseservice.automaticbackupdatabase.begin
Database - Automatic Backup End com.oraclecloud.databaseservice.automaticbackupdatabase.end
Database - Create Backup Begin com.oraclecloud.databaseservice.backupdatabase.begin
Database - Create Backup End com.oraclecloud.databaseservice.backupdatabase.end
Database - Critical (see Database Service Event Types for more information) com.oraclecloud.databaseservice.database.critical
Database - Information com.oraclecloud.databaseservice.database.information
Database - Delete Backup Begin com.oraclecloud.databaseservice.deletebackup.begin
Database - Delete Backup End com.oraclecloud.databaseservice.deletebackup.end
Database - Migrate to KMS Key Begin com.oraclecloud.databaseservice.migratedatabasekmskey.begin
Database - Migrate to KMS Key End com.oraclecloud.databaseservice.migratedatabasekmskey.end
Database - Move Begin com.oraclecloud.databaseservice.movedatabase.begin
Database - Move End com.oraclecloud.databaseservice.movedatabase.end
Database - Restore Begin com.oraclecloud.databaseservice.restoredatabase.begin
Database - Restore End com.oraclecloud.databaseservice.restoredatabase.end
Database - Rotate KMS Key Begin com.oraclecloud.databaseservice.rotatedatabasekmskey.begin
Database - Rotate KMS Key End com.oraclecloud.databaseservice.rotatedatabasekmskey.end
Database - Terminate Begin com.oraclecloud.databaseservice.database.terminate.begin
Database - Terminate End com.oraclecloud.databaseservice.database.terminate.end
Database - Update Begin com.oraclecloud.databaseservice.updatedatabase.begin
Database - Update End com.oraclecloud.databaseservice.updatedatabase.end
Database - Upgrade Begin com.oraclecloud.databaseservice.upgradedatabase.begin
Database - Upgrade End com.oraclecloud.databaseservice.upgradedatabase.end
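
Most event types in this table come in begin/end pairs that share everything but the final suffix. The following is a minimal sketch that splits an event type into its operation and phase, and then reports operations that have started but not yet ended. The helper names are hypothetical, and the sketch matches on event type only; a real consumer would also key on the resource identifier.

```python
from typing import Iterable, List, Optional, Tuple


def split_event_type(event_type: str) -> Tuple[str, Optional[str]]:
    """Split '<operation>.begin' or '<operation>.end' into (operation, phase)."""
    operation, _, phase = event_type.rpartition(".")
    if phase in ("begin", "end"):
        return operation, phase
    return event_type, None  # for example the critical and information types


def unfinished_operations(event_types: Iterable[str]) -> List[str]:
    """Return operations seen with a begin event but no matching end event."""
    started, finished = set(), set()
    for event_type in event_types:
        operation, phase = split_event_type(event_type)
        if phase == "begin":
            started.add(operation)
        elif phase == "end":
            finished.add(operation)
    return sorted(started - finished)


seen = [
    "com.oraclecloud.databaseservice.backupdatabase.begin",
    "com.oraclecloud.databaseservice.backupdatabase.end",
    "com.oraclecloud.databaseservice.upgradedatabase.begin",
    "com.oraclecloud.databaseservice.database.critical",
]
print(unfinished_operations(seen))
```

Event types without a begin or end suffix are returned unchanged with a phase of None, so they never count as started or finished.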

This is a reference event for databases:

{
  "eventType" : "com.oraclecloud.databaseservice.backupdatabase.begin",
  "cloudEventsVersion" : "0.1",
  "eventTypeVersion" : "2.0",
  "source" : "DatabaseService",
  "eventTime" : "2020-01-08T17:31:43.666Z",
  "contentType" : "application/json",
  "data" : {
    "compartmentId" : "ocid1.compartment.oc1.<unique_ID>",
    "compartmentName" : "example_compartment_name",
    "resourceName" : "my_backup",
    "resourceId" : "ocid1.dbbckup.oc1.<unique_ID>",
    "availabilityDomain" : "<availability_domain>",
    "additionalDetails" : {
      "timeCreated" : "2020-01-08T17:31:44Z",
      "lifecycleState" : "CREATING",
      "dbSystemId" : "ocid1.dbsystem.oc1.<unique_ID>",
      "dbHomeId" : "ocid1.dbhome.oc1.<unique_ID>",
      "dbUniqueName" : "DB1115_iad1dv",
      "dbVersion" : "11.2.0.4.190716",
      "databaseEdition" : "ENTERPRISE_EDITION_HIGH_PERFORMANCE",
      "autoBackupsEnabled" : "false",
      "backupType" : "FULL",
      "databaseId" : "ocid1.database.oc1.<unique_ID>"
    },
    "definedTags" : {
      "My_example_tag_name" : { "Example_key" : "Example_value" }
    }
  },
  "eventID" : "<unique_ID>",
  "extensions" : {
    "compartmentId" : "ocid1.compartment.oc1.<unique_ID>"
  }
}
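
The event type string encodes both the operation and its phase (begin or end). The following Python sketch shows one way a subscriber might pull those apart, assuming the event arrives as a JSON string shaped like the reference event. The function names split_event_type and summarize are illustrative and are not part of any Oracle SDK.

```python
import json

PREFIX = "com.oraclecloud.databaseservice."


def split_event_type(event_type: str) -> tuple[str, str]:
    """Return (operation, phase) for a Database Service event type."""
    if not event_type.startswith(PREFIX):
        raise ValueError(f"not a Database Service event type: {event_type}")
    # The last dot-separated token is the phase, for example "begin" or "end".
    operation, _, phase = event_type[len(PREFIX):].rpartition(".")
    return operation, phase


def summarize(raw_event: str) -> dict:
    """Extract the fields most handlers need from a raw event string."""
    event = json.loads(raw_event)
    operation, phase = split_event_type(event["eventType"])
    data = event.get("data", {})
    return {
        "operation": operation,
        "phase": phase,
        "resourceId": data.get("resourceId"),
        "lifecycleState": data.get("additionalDetails", {}).get("lifecycleState"),
    }
```

Fields that are absent from a payload come back as None rather than raising, because the set of keys under additionalDetails varies by event type.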

Pluggable Database Event Types

These are the event types that Oracle pluggable databases in Oracle Cloud Infrastructure emit.

Friendly Name Event Type
Pluggable Database - Create Begin com.oraclecloud.databaseservice.createpluggabledatabase.begin
Pluggable Database - Create End com.oraclecloud.databaseservice.createpluggabledatabase.end
Pluggable Database - Delete Begin com.oraclecloud.databaseservice.deletepluggabledatabase.begin
Pluggable Database - Delete End com.oraclecloud.databaseservice.deletepluggabledatabase.end
Pluggable Database - Local Clone Begin com.oraclecloud.databaseservice.localclonepluggabledatabase.begin
Pluggable Database - Local Clone End com.oraclecloud.databaseservice.localclonepluggabledatabase.end
Pluggable Database - Remote Clone Begin com.oraclecloud.databaseservice.remoteclonepluggabledatabase.begin
Pluggable Database - Remote Clone End com.oraclecloud.databaseservice.remoteclonepluggabledatabase.end
Start Pluggable Database - Begin com.oraclecloud.databaseservice.startpluggabledatabase.begin
Start Pluggable Database - End com.oraclecloud.databaseservice.startpluggabledatabase.end
Stop Pluggable Database - Begin com.oraclecloud.databaseservice.stoppluggabledatabase.begin
Stop Pluggable Database - End com.oraclecloud.databaseservice.stoppluggabledatabase.end

This is a reference event for pluggable databases (PDBs):

{
  "eventID": "unique_id",
  "eventTime": "2021-03-23T00:49:14.123Z",
  "extensions": {
    "compartmentId": "ocid1.compartment.oc1.<unique_ID>"
  },
  "eventType": "com.oraclecloud.databaseservice.remoteclonepluggabledatabase.begin",
  "eventTypeVersion": "2.0",
  "cloudEventsVersion": "0.1",
  "source": "databaseservice",
  "contentType": "application/json",
  "definedTags": {},
  "data": {
    "compartmentId": "ocid1.compartment.oc1.<unique_ID>",
    "compartmentName": "MyCompartment",
    "resourceName": "11092020_PKS_PDB1",
    "resourceId": "ocid1.pluggabledatabases.oc1.phx.<unique_ID>",
    "availabilityDomain": "XXIT:PHX-AD-1",
    "freeFormTags": {},
    "definedTags": {},
    "additionalDetails": {
      "id": "ocid1.pluggabledatabases.oc1.phx.<unique_ID>",
      "timeCreated": "2021-03-13T21:15:59.000Z",
      "timeUpdated": "2021-03-13T21:15:59.000Z",
      "databaseId": "ocid1.database.oc1.<unique_ID>",
      "lifecycleState": "AVAILABLE",
      "lifecycleDetails": "Pluggable Database is available",
      "displayName": "Pluggable Database - Remote Clone Begin"
    }
  }
}
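
Because most operations emit a begin event and an end event, a consumer can pair them to measure how long an operation took. The Python sketch below assumes that both events of a pair carry the same data.resourceId and that eventTime is an ISO 8601 UTC timestamp, as in the reference events. The function names are hypothetical.

```python
from datetime import datetime, timedelta


def parse_event_time(value: str) -> datetime:
    # Older Python versions reject a trailing "Z", so map it to an explicit UTC offset.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def operation_durations(events: list[dict]) -> dict[tuple[str, str], timedelta]:
    """Map (resourceId, operation) to the time between its begin and end events."""
    started: dict[tuple[str, str], datetime] = {}
    durations: dict[tuple[str, str], timedelta] = {}
    for event in sorted(events, key=lambda e: parse_event_time(e["eventTime"])):
        operation, _, phase = event["eventType"].rpartition(".")
        key = (event["data"]["resourceId"], operation)
        when = parse_event_time(event["eventTime"])
        if phase == "begin":
            started[key] = when
        elif phase == "end" and key in started:
            durations[key] = when - started.pop(key)
    return durations
```

An end event without a recorded begin event is ignored, which keeps the sketch tolerant of a subscription that was created mid-operation.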

Database Service Events

The Database Service emits events, which are structured messages that indicate changes in resources.

Overview of Database Service Events

The Database Service Events feature notifies you about health issues with your Oracle Databases or other components on the Guest VM.

It is possible that Oracle Database or Clusterware may not be healthy, or that various system components may be running out of space in the Guest VM. You are not notified of this situation unless you opt in.

Note

You are opting in with the understanding that the list of events can change in the future. You can opt out of this feature at any time.

The Database Service Events feature generates events for Guest VM operations and conditions, and sends notifications to customers by leveraging the existing OCI Events service and Notification mechanisms in their tenancy. Customers can then create topics and subscribe to these topics through email, functions, or streams.

Note

Event flow on Exadata Cloud Infrastructure depends on the following components: Oracle Trace File Analyzer (TFA) and the Oracle Database Cloud Service (DBCS) agent. Ensure that these components are up and running.

Manage Oracle Trace File Analyzer

  • To check the run status of Oracle Trace File Analyzer, run the tfactl status command as root or a non-root user:
    # tfactl status
    .-------------------------------------------------------------------------------------------------------.
    | Host           | Status of TFA | PID    | Port | Version    | Build ID             | Inventory Status |
    +----------------+---------------+--------+------+------------+----------------------+------------------+
    | node1          | RUNNING       | 41312  | 5000 | 22.1.0.0.0 | 22100020220310214615 | COMPLETE         |
    | node2          | RUNNING       | 272300 | 5000 | 22.1.0.0.0 | 22100020220310214615 | COMPLETE         |
    '----------------+---------------+--------+------+------------+----------------------+------------------'
  • To start the Oracle Trace File Analyzer daemon on the local node, run the tfactl start command as root:
    # tfactl start
    Starting TFA..
    Waiting up to 100 seconds for TFA to be started..
    . . . . .
    . . . . .
    . . . . .
    . . . . .
    . . . . .
    . . . . .
    . . . . .
    . . . . .
    Successfully started TFA Process..
    . . . . .
    TFA Started and listening for commands
  • To stop the Oracle Trace File Analyzer daemon on the local node, run the tfactl stop command as root:
    # tfactl stop
    Stopping TFA from the Command Line
    Nothing to do !
    Please wait while TFA stops
    Please wait while TFA stops
    TFA-00002 Oracle Trace File Analyzer (TFA) is not running
    TFA Stopped Successfully
    Successfully stopped TFA..
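
For unattended checks, the tfactl status table can be parsed rather than read by eye. The Python sketch below assumes the pipe-delimited layout shown in the sample output, with Host in the first column and Status of TFA in the second. The layout may differ between AHF versions, and the function names are illustrative.

```python
def tfa_status_by_host(status_output: str) -> dict[str, str]:
    """Map each host in `tfactl status` output to its reported TFA status."""
    statuses: dict[str, str] = {}
    for line in status_output.splitlines():
        line = line.strip()
        if not line.startswith("|"):
            continue  # skip border lines and surrounding text
        cells = [cell.strip() for cell in line.strip("|").split("|")]
        if len(cells) < 2 or cells[0] == "Host":
            continue  # skip the header row
        statuses[cells[0]] = cells[1]
    return statuses


def all_running(status_output: str) -> bool:
    """True only if at least one host is listed and every host reports RUNNING."""
    statuses = tfa_status_by_host(status_output)
    return bool(statuses) and all(state == "RUNNING" for state in statuses.values())
```

An empty or unparseable table is treated as not running, so a failed tfactl invocation does not pass the check by accident.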

Manage Database Service Agent

View the /opt/oracle/dcs/log/dcs-agent.log file to identify issues with the agent.

  • To check the status of the Database Service Agent, run the systemctl status command:
    # systemctl status dbcsagent.service
    dbcsagent.service
    Loaded: loaded (/usr/lib/systemd/system/dbcsagent.service; enabled; vendor preset: disabled)
    Active: active (running) since Fri 2022-04-01 13:40:19 UTC; 6min ago
    Process: 9603 ExecStopPost=/bin/bash -c kill `ps -fu opc |grep "java.*dbcs-agent.*jar" |awk '{print $2}' ` (code=exited, status=0/SUCCESS)
    Main PID: 10055 (sudo)
    CGroup: /system.slice/dbcsagent.service
    ‣ 10055 sudo -u opc /bin/bash -c umask 077; /bin/java -Doracle.security.jps.config=/opt/oracle/...
  • To start the agent if it is not running, run the systemctl start command as the root user:
    # systemctl start dbcsagent.service

Receive Notifications about Database Service Events

Subscribe to the Database Service Events and get notified.

To receive notifications, subscribe to Database Service Events using the Oracle Notification service. See Notifications Overview. For more information about Oracle Cloud Infrastructure Events, see Overview of Events.

Events Service - Event Types:
  • Database - Critical
  • DB Node - Critical
  • DB Node - Error
  • DB Node - Warning
  • DB Node - Information
  • DB System - Critical
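
An Events rule selects events by event type. The Python sketch below builds a rule condition for the friendly names listed above. It rests on two assumptions that should be checked against the Events service documentation: that a rule condition is a JSON object with an eventType array, and that every friendly name maps to an event type by the same naming pattern as the types documented in this chapter (for example, com.oraclecloud.databaseservice.dbnode.critical). The helper names are hypothetical.

```python
import json

PREFIX = "com.oraclecloud.databaseservice."

# Assumed mapping from the resource part of a friendly name to its event-type token.
RESOURCE_TOKENS = {"Database": "database", "DB Node": "dbnode", "DB System": "dbsystem"}


def event_type_for(friendly_name: str) -> str:
    """Derive an event type such as '...dbnode.critical' from 'DB Node - Critical'."""
    resource, severity = (part.strip() for part in friendly_name.split(" - "))
    return f"{PREFIX}{RESOURCE_TOKENS[resource]}.{severity.lower()}"


def rule_condition(friendly_names: list[str]) -> str:
    """Build a JSON rule condition that matches any of the given event types."""
    return json.dumps({"eventType": [event_type_for(name) for name in friendly_names]})
```

For example, rule_condition(["DB Node - Critical", "Database - Critical"]) yields a condition that matches both critical event types, which can then be attached to a rule whose action publishes to a Notifications topic.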

Database Service Event Types

Review the list of event types that the Database Service emits.

Note

  • Critical events are triggered due to several types of critical conditions and errors that cause disruption to the database and other critical components. For example, database hang errors, and availability errors for databases, database nodes, and database systems to let you know if a resource becomes unavailable.
  • Information events are triggered when the database and other critical components work as expected. For example, a clean shutdown of CRS, CDB, client, or scan listener, or a startup of these components will create an event with the severity of INFORMATION.
  • Threshold limits reduce the number of notifications customers receive for similar incident events while ensuring that they still receive the incident events and timely reminders.

Table 6-1 Database Service Events

Friendly Name Event Name Remediation Event Type Threshold

Resource Utilization - Disk Usage

HEALTH.DB_GUEST.FILESYSTEM.FREE_SPACE

This event is reported when VM guest file system free space falls below 10% free, as determined by the operating system df(1) command, for the following file systems:
  • /
  • /u01
  • /u02
  • /var (X8M and later only)
  • /tmp (X8M and later only)

HEALTH.DB_GUEST.FILESYSTEM.FREE_SPACE

com.oraclecloud.databaseservice.dbnode.critical

Critical threshold: 90%

CRS status Up/Down

AVAILABILITY.DB_GUEST.CRS_INSTANCE.DOWN

An event of type CRITICAL is created when the Cluster Ready Service (CRS) is detected to be down.

AVAILABILITY.DB_GUEST.CRS_INSTANCE.DOWN

com.oraclecloud.databaseservice.dbnode.critical (if .DOWN and NOT "user_action")

N/A

AVAILABILITY.DB_GUEST.CRS_INSTANCE.DOWN_CLEARED

An event of type INFORMATION is created once it is determined that the event for CRS down has cleared.

N/A

com.oraclecloud.databaseservice.dbnode.information (if .DOWN_CLEARED)

SCAN Listener Up/Down

AVAILABILITY.DB_CLUSTER.SCAN_LISTENER.DOWN

A DOWN event is created when a SCAN listener goes down. The event is of type INFORMATION when a SCAN listener is shut down due to user action, such as with the Server Control Utility (srvctl) or Listener Control (lsnrctl) commands, or any Oracle Cloud maintenance action that uses those commands, such as performing a grid infrastructure software update. The event is of type CRITICAL when a SCAN listener goes down unexpectedly. A corresponding DOWN_CLEARED event is created when a SCAN listener is started.

There are three SCAN listeners per cluster called LISTENER_SCAN[1,2,3].

AVAILABILITY.DB_CLUSTER.SCAN_LISTENER.DOWN

com.oraclecloud.databaseservice.dbnode.critical (if .DOWN and NOT "user_action")

N/A

AVAILABILITY.DB_CLUSTER.SCAN_LISTENER.DOWN_CLEARED

An event of type INFORMATION is created once it is determined that the event for SCAN Listener down has cleared.

N/A

com.oraclecloud.databaseservice.dbnode.information (if .DOWN_CLEARED)

Net Listener Up/Down

AVAILABILITY.DB_GUEST.CLIENT_LISTENER.DOWN

A DOWN event is created when a client listener goes down. The event is of type INFORMATION when a client listener is shut down due to user action, such as with the Server Control Utility (srvctl) or Listener Control (lsnrctl) commands, or any Oracle Cloud maintenance action that uses those commands, such as performing a grid infrastructure software update. The event is of type CRITICAL when a client listener goes down unexpectedly. A corresponding DOWN_CLEARED event is created when a client listener is started.

There is one client listener per node, each called LISTENER.

AVAILABILITY.DB_GUEST.CLIENT_LISTENER.DOWN

com.oraclecloud.databaseservice.database.critical (if .DOWN and NOT "user_action")

N/A

AVAILABILITY.DB_GUEST.CLIENT_LISTENER.DOWN_CLEARED

An event of type INFORMATION is created once it is determined that the event for Client Listener down has cleared.

N/A

com.oraclecloud.databaseservice.database.information (if .DOWN_CLEARED)

CDB Up/Down

AVAILABILITY.DB_GUEST.CDB_INSTANCE.DOWN

A DOWN event is created when a database instance goes down. The event is of type INFORMATION when a database instance is shut down due to user action, such as with the SQL*Plus (sqlplus) or Server Control Utility (srvctl) commands, or any Oracle Cloud maintenance action that uses those commands, such as performing a database home software update. The event is of type CRITICAL when a database instance goes down unexpectedly. A corresponding DOWN_CLEARED event is created when a database instance is started.

AVAILABILITY.DB_GUEST.CDB_INSTANCE.DOWN

com.oraclecloud.databaseservice.database.critical (if .DOWN and NOT "user_action")

N/A

AVAILABILITY.DB_GUEST.CDB_INSTANCE.DOWN_CLEARED

An event of type INFORMATION is created once it is determined that the event for the CDB down has cleared.

N/A

com.oraclecloud.databaseservice.database.information (if .DOWN_CLEARED)

CRS Eviction

AVAILABILITY.DB_GUEST.CRS_INSTANCE.EVICTION

An event of type CRITICAL is created when the Cluster Ready Service (CRS) evicts a node from the cluster. The CRS alert.log is parsed for the CRS-1632 error indicating that a node is being removed from the cluster.

AVAILABILITY.DB_GUEST.CRS_INSTANCE.EVICTION

N/A

Critical DB Errors

HEALTH.DB_CLUSTER.CDB.CORRUPTION

Database corruption has been detected on your primary or standby database. The database alert.log is parsed for any specific errors that are indicative of physical block corruptions, logical block corruptions, or logical block corruptions caused by lost writes.

HEALTH.DB_CLUSTER.CDB.CORRUPTION

com.oraclecloud.databaseservice.database.critical

N/A

Other DB Errors

HEALTH.DB_CLUSTER.CDB.ARCHIVER_HANG

An event of type CRITICAL is created if a CDB is either unable to archive the active online redo log or unable to archive the active online redo log fast enough to the log archive destinations.

HEALTH.DB_CLUSTER.CDB.ARCHIVER_HANG

com.oraclecloud.databaseservice.database.critical

N/A

HEALTH.DB_CLUSTER.CDB.DATABASE_HANG

An event of type CRITICAL is created when a process/session hang is detected in the CDB.

HEALTH.DB_CLUSTER.CDB.DATABASE_HANG

Backup Failures

HEALTH.DB_CLUSTER.CDB.BACKUP_FAILURE

An event of type CRITICAL is created if there is a CDB backup with a FAILED status reported in the v$rman_status view.

HEALTH.DB_CLUSTER.CDB.BACKUP_FAILURE

com.oraclecloud.databaseservice.database.critical

N/A

Disk Group Usage

HEALTH.DB_CLUSTER.DISK_GROUP.FREE_SPACE

An event of type CRITICAL is created when an ASM disk group reaches space usage of 90% or higher. An event of type INFORMATION is created when the ASM disk group space usage drops below 90%.

HEALTH.DB_CLUSTER.DISK_GROUP.FREE_SPACE

com.oraclecloud.databaseservice.dbsystem.critical

com.oraclecloud.databaseservice.dbsystem.information (if < 90%)

Critical threshold: 90%
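
The severity rules in the table can be restated compactly. The following Python sketch is illustrative only: the function names and the user_action flag are hypothetical, and the actual classification is performed by the service, not by customer code. It can still be useful for unit-testing alert routing logic.

```python
CRITICAL_USAGE_PERCENT = 90.0


def availability_severity(event_name: str, user_action: bool) -> str:
    """Severity of an availability event, following the Up/Down rows of the table."""
    if event_name.endswith(".DOWN_CLEARED"):
        return "INFORMATION"
    if event_name.endswith(".DOWN"):
        # A planned shutdown is informational; an unexpected one is critical.
        return "INFORMATION" if user_action else "CRITICAL"
    raise ValueError(f"not an Up/Down availability event: {event_name}")


def disk_group_severity(used_percent: float) -> str:
    """Severity that corresponds to the current ASM disk group usage."""
    return "CRITICAL" if used_percent >= CRITICAL_USAGE_PERCENT else "INFORMATION"
```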

Example 6-62 Database Service DB Node Critical Events Examples

DB node critical reference events:
{
 "eventType" : "com.oraclecloud.databaseservice.dbnode.critical",
 "cloudEventsVersion" : "0.1",
 "eventTypeVersion" : "2.0",
 "source" : "SYSLENS/host_Name/DomU",
 "eventTime" : "2022-03-04T18:19:42Z",
 "contentType" : "application/json",
 "data" : {
   "compartmentId" : "compartment_ID",
   "compartmentName" : "compartment_Name",
   "resourceName" : "resource_Name",
   "resourceId" : "resource_ID",
   "additionalDetails" : {
     "serviceType" : "EXACS",
     "hostName" : "host_Name",
     "description" : "The '/' filesystem is over 90% used.",
     "eventName" : "HEALTH.DB_GUEST.FILESYSTEM.FREE_SPACE",
     "status" : "online"
   }
 },
 "eventID" : "a9752630-9be7-11ec-a203-00163eb980bb",
 "extensions" : {
   "compartmentId" : "compartment_ID"
 }
}
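
A notification handler usually needs only a few fields from this payload. The Python sketch below formats a one-line summary from an event that has already been parsed into a dictionary with the structure of the reference event above. The function name describe_health_event is hypothetical.

```python
def describe_health_event(event: dict) -> str:
    """Format a one-line summary of a Database Service health event."""
    details = event["data"]["additionalDetails"]
    # The last token of the event type carries the severity, for example "critical".
    severity = event["eventType"].rsplit(".", 1)[-1].upper()
    return (
        f"[{severity}] {details['eventName']} on {details['hostName']}: "
        f"{details['description']}"
    )
```

The eventName value is also the title of the matching topic in the Remediation section, so it is a convenient key for linking an alert to its runbook.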

Temporarily Restrict Automatic Diagnostic Collections for Specific Events

Use the tfactl blackout command to temporarily suppress automatic diagnostic collections.

If you set blackout for a target, then Oracle Trace File Analyzer stops automatic diagnostic collections if it finds events in the alert logs for that target while scanning. By default, blackout will be in effect for 24 hours.

You can also restrict automatic diagnostic collection at a granular level, for example, only for ORA-00600 or even only ORA-00600 with specific arguments.

Syntax

tfactl blackout add|remove|print
-targettype host|crs|asm|asmdg|database|dbbackup|db_dataguard|db_tablespace|pdb_tablespace|pdb|listener|service|os
-target all|name
[-container name]
[-pdb pdb_name]
-event all|"event_str1,event_str2"|availability
[-timeout nm|nh|nd|none]
[-c|-local|-nodes "node1,node2"]
[-reason "reason for blackout"]
[-docollection]

Parameters

Table 6-2 tfactl blackout Command Parameters

Parameter Description

add|remove|print

Adds, removes, or prints blackout conditions.

targettype type

Target type: host|crs|asm|asmdg|database|dbbackup|db_dataguard|db_tablespace|pdb_tablespace|pdb|listener|service|os

Limits blackout only to the specified target type.

host: The whole node is under blackout. If there is host blackout, then every blackout element that's shown true in the Telemetry JSON will have the reason for the blackout.

crs: Blackout the availability of the Oracle Clusterware resource or events in the Oracle Clusterware logs.

asm: Blackout the availability of Oracle Automatic Storage Management (Oracle ASM) on this machine or events in the Oracle ASM alert logs.

asmdg: Blackout an Oracle ASM disk group.

database: Blackout the availability of an Oracle Database, Oracle Database backup, tablespace, and so on, or events in the Oracle Database alert logs.

dbbackup: Blackout Oracle Database backup events (such as CDB or archive backups).

db_dataguard: Blackout Oracle Data Guard events.

db_tablespace: Blackout Oracle Database tablespace events (container database).

pdb_tablespace: Blackout Oracle Pluggable Database tablespace events (Pluggable database).

pdb: Blackout Oracle Pluggable Database events.

listener: Blackout the availability of a listener.

service: Blackout the availability of a service.

os: Blackout one or more operating system records.

target all|name

Specify the target for blackout. You can specify a comma-delimited list of targets.

By default, the target is set to all.

container name

Specify the database container name (db_unique_name) where the blackout will take effect (for PDB, DB_TABLESPACE, and PDB_TABLESPACE).

pdb pdb_name

Specify the PDB where the blackout will take effect (for PDB_TABLESPACE only).

event all|"event_str1,event_str2"|availability

Limits blackout only to the availability events, or event strings, which should not trigger auto collections, or be marked as blacked out in telemetry JSON.

all: Blackout everything for the target specified.

string: Blackout for incidents where any part of the line contains the strings specified.

Specify a comma-delimited list of strings.

timeout nm|nh|nd|none

Specify the duration for blackout in number of minutes, hours, or days before timing out. By default, the timeout is set to 24 hours (24h).

c|local

Specify if blackout should be set to cluster-wide or local.

By default, blackout is set to local.

reason comment

Specify a descriptive reason for the blackout.

docollection

Use this option to do an automatic diagnostic collection even if a blackout is set for this target.
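
When blackouts are set from automation, for example around a patching window, assembling the argument list programmatically avoids quoting mistakes. The Python sketch below covers only the options from the syntax above that it names. The function blackout_add_command is hypothetical and only builds the argument list; running it is left to the caller.

```python
TARGET_TYPES = {
    "host", "crs", "asm", "asmdg", "database", "dbbackup", "db_dataguard",
    "db_tablespace", "pdb_tablespace", "pdb", "listener", "service", "os",
}


def blackout_add_command(targettype, target="all", event=None, timeout=None,
                         reason=None, container=None):
    """Build the argument list for a `tfactl blackout add` invocation."""
    if targettype not in TARGET_TYPES:
        raise ValueError(f"unknown target type: {targettype}")
    argv = ["tfactl", "blackout", "add", "-targettype", targettype, "-target", target]
    if container:
        argv += ["-container", container]
    if event:
        argv += ["-event", event]
    if timeout:
        argv += ["-timeout", timeout]
    if reason:
        argv += ["-reason", reason]
    return argv
```

Passing the returned list to subprocess.run without shell=True keeps a reason string that contains spaces as a single argument.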

Example 6-63 tfactl blackout

  • To blackout event: ORA-00600 on target type: database, target: mydb
    tfactl blackout add -targettype database -target mydb -event "ORA-00600"
  • To blackout event: ORA-04031 on target type: database, target: all
    tfactl blackout add -targettype database -target all -event "ORA-04031" -timeout 1h
  • To blackout db backup events on target type: dbbackup, target: mydb
    tfactl blackout add -targettype dbbackup -target mydb
  • To blackout db dataguard events on target type: db_dataguard, target: mydb
    tfactl blackout add -targettype db_dataguard -target mydb -timeout 30m
  • To blackout db tablespace events on target type: db_tablespace, target: system, container: mydb
    tfactl blackout add -targettype db_tablespace -target system -container mydb -timeout 30m
  • To blackout ALL events on target type: host, target: all
    tfactl blackout add -targettype host -event all -target all -timeout 1h -reason "Disabling all events during patching"
  • To print blackout details
    tfactl blackout print
    
    .-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------.
    |                                                                                myhostname                                                                                     |
    +---------------+---------------------+-----------+------------------------------+------------------------------+--------+---------------+--------------------------------------+
    | Target Type   | Target              | Events    | Start Time                   | End Time                     | Status | Do Collection | Reason                               |
    +---------------+---------------------+-----------+------------------------------+------------------------------+--------+---------------+--------------------------------------+
    | HOST          | ALL                 | ALL       | Thu Mar 24 16:48:39 UTC 2022 | Thu Mar 24 17:48:39 UTC 2022 | ACTIVE | false         | Disabling all events during patching |
    | DATABASE      | MYDB                | ORA-00600 | Thu Mar 24 16:39:03 UTC 2022 | Fri Mar 25 16:39:03 UTC 2022 | ACTIVE | false         | NA                                   |
    | DATABASE      | ALL                 | ORA-04031 | Thu Mar 24 16:39:54 UTC 2022 | Thu Mar 24 17:39:54 UTC 2022 | ACTIVE | false         | NA                                   |
    | DB_DATAGUARD  | MYDB                | ALL       | Thu Mar 24 16:41:38 UTC 2022 | Thu Mar 24 17:11:38 UTC 2022 | ACTIVE | false         | NA                                   |
    | DBBACKUP      | MYDB                | ALL       | Thu Mar 24 16:40:47 UTC 2022 | Fri Mar 25 16:40:47 UTC 2022 | ACTIVE | false         | NA                                   |
    | DB_TABLESPACE | SYSTEM_CDBNAME_MYDB | ALL       | Thu Mar 24 16:45:56 UTC 2022 | Thu Mar 24 17:15:56 UTC 2022 | ACTIVE | false         | NA                                   |
    '---------------+---------------------+-----------+------------------------------+------------------------------+--------+---------------+--------------------------------------'
  • To remove blackout for event: ORA-00600 on target type: database, target: mydb
    tfactl blackout remove -targettype database -event "ORA-00600" -target mydb
  • To remove blackout for db backup events on target type: dbbackup, target: mydb
    tfactl blackout remove -targettype dbbackup -target mydb
  • To remove blackout for db tablespace events on target type: db_tablespace, target: system, container: mydb
    tfactl blackout remove -targettype db_tablespace -target system -container mydb
  • To remove blackout for host events on target type: host, target: all
    tfactl blackout remove -targettype host -event all -target all
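
The End Time column in the print output is the start time plus the timeout. The Python sketch below reproduces that arithmetic for the nm, nh, and nd forms and the 24-hour default. It assumes that none means the blackout does not expire. The function name blackout_end is hypothetical.

```python
import re
from datetime import datetime, timedelta

_UNITS = {"m": "minutes", "h": "hours", "d": "days"}


def blackout_end(start: datetime, timeout: str = "24h"):
    """Return when a blackout that began at `start` ends, or None if it never expires."""
    if timeout == "none":
        return None
    match = re.fullmatch(r"(\d+)([mhd])", timeout)
    if match is None:
        raise ValueError(f"unsupported timeout: {timeout}")
    amount, unit = match.groups()
    return start + timedelta(**{_UNITS[unit]: int(amount)})
```

Applied to the DB_DATAGUARD row above, a start of 16:41:38 with a 30m timeout gives 17:11:38, which matches the printed End Time.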

Remediation

These topics cover some common issues you might run into and how to address them.

HEALTH.DB_GUEST.FILESYSTEM.FREE_SPACE

Problem Statement: One or more VM guest file systems have less than 10% free space.

Risk: Insufficient VM guest file system free space can cause disk space allocation failure, which can result in wide-ranging errors and failures in Oracle software (Database, Clusterware, Cloud, Exadata).

Action:

Oracle Cloud and Exadata utilities run automatically to purge old log files and trace files created by Oracle software to reclaim file system space.

If the automatic file system space reclamation utilities cannot sufficiently purge old files to clear this event, then perform the following actions:

  1. Remove unneeded files and/or directories created manually or by customer-installed applications or utilities. Files created by customer-installed software are outside the scope of Oracle's automatic file system space reclamation utilities. The following operating system command, run as the opc user, is useful for identifying directories consuming excessive disk space:
    $ sudo du -hx file-system-mount-point | sort -hr

    Only remove files or directories you are certain can be safely removed.

  2. Reclaim /u02 file system disk space by removing Database Homes that have no databases. For more information about managing Database Homes, see Manage Oracle Database Homes on Exadata Database Service on Exadata Cloud Infrastructure Instance.
  3. Open a service request to receive additional guidance about reducing file system space use.
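
Before and after working through these steps, it can help to confirm which file systems are below the threshold. The Python sketch below checks the mount points that this event monitors against the 10% limit. It uses shutil.disk_usage rather than df, so the figures can differ slightly from df output, and a path that is not a separate mount reports the file system that contains it. The function names are illustrative.

```python
import shutil

MOUNT_POINTS = ["/", "/u01", "/u02", "/var", "/tmp"]
MIN_FREE_PERCENT = 10.0


def free_percent(total: int, free: int) -> float:
    """Free space as a percentage of total size (0.0 for an empty file system)."""
    return 100.0 * free / total if total else 0.0


def low_space_mounts(mount_points=MOUNT_POINTS):
    """Return (mount point, free percent) for each path below the threshold."""
    low = []
    for mount in mount_points:
        try:
            usage = shutil.disk_usage(mount)
        except OSError:
            continue  # the path does not exist on this system
        percent = free_percent(usage.total, usage.free)
        if percent < MIN_FREE_PERCENT:
            low.append((mount, percent))
    return low
```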
AVAILABILITY.DB_GUEST.CRS_INSTANCE.DOWN

Problem Statement: The Cluster Ready Stack is in an offline state or has failed.

Risk: If the Cluster Ready Service is offline on a node, then the node cannot provide database services for the application.

Action:

  1. Check if CRS was stopped by your administrator, as part of a planned maintenance event, or a scale up or down of local storage.
    1. The following patching events will stop CRS:
      1. GRID Patching
      2. Exadata VM patching of Guest
      3. Exadata VM Patching of Host
  2. If CRS has stopped unexpectedly, then the current status can be checked by issuing the crsctl check crs command.
    1. If the node is not responding, then the VM node may be rebooting. Wait for the node reboot to finish; CRS is normally started through the init process.
  3. If CRS is still down, then investigate the cause of the failure by referring to the alert.log found in /u01/app/grid/diag/crs/<node_name>/crs/trace.

    Review the log entries corresponding to the date/time of the down event. Act on any potential remediation.

  4. Restart CRS by issuing the crsctl start crs command.
  5. A successful restart of CRS will generate the clearing event: AVAILABILITY.DB_GUEST.CRS_INSTANCE.DOWN_CLEARED.

AVAILABILITY.DB_CLUSTER.SCAN_LISTENER.DOWN

Problem Statement: A SCAN listener is down and unable to accept application connections.

Risk: If all SCAN listeners are down, then application connections to the database through the SCAN listener will fail.

Action:

Start the SCAN listener to receive the DOWN_CLEARED event.

DOWN event of type INFORMATION

  1. If the event was caused by an Oracle Cloud maintenance action, such as performing a Grid Infrastructure software update, then no action is required. The affected SCAN listener will automatically fail over to an available instance.
  2. If the event was caused by user action, then start the SCAN listener at the next opportunity.

DOWN event of type CRITICAL

Check SCAN status and restart the SCAN listener.

  1. Log in to the VM as the opc user and sudo to the grid user:
     sudo su - grid
  2. Check the SCAN listener status on any node:
     srvctl status scan_listener 
  3. Start the SCAN listener:
     srvctl start scan_listener
  4. Recheck the SCAN listener status on any node:
     srvctl status scan_listener

    If the scan_listener is still down, then investigate the cause of the scan listener failure:

    1. Collect both the CRS and operating system logs from 30 minutes before to 10 minutes after the event for the <hostName> indicated in the log. Note that the time in the event payload is always provided in UTC. For tfactl collection, adjust the time to the time zone of the VM Cluster. As the grid user:
       tfactl diagcollect -crs -os -node <hostName> -from "<eventTime adjusted for local vm timezone> - 30 minutes" -to "<eventTime adjusted for local vm timezone> + 10 minutes"
    2. Review the SCAN listener log located under /u01/app/grid/diag/tnslsnr/<hostName>/<listenerName>/trace
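
The time adjustment for the -from and -to arguments is easy to get wrong by hand. The Python sketch below converts the UTC eventTime from the payload to the VM time zone and returns the collection window. It assumes that the VM time zone is supplied by the caller and that tfactl accepts the "yyyy-mm-dd hh:mm:ss" timestamp format; confirm the accepted formats with the tfactl diagcollect help on your system. The function name diagcollect_window is hypothetical.

```python
from datetime import datetime, timedelta, timezone, tzinfo


def diagcollect_window(event_time_utc: str, vm_tz: tzinfo,
                       before_minutes: int = 30, after_minutes: int = 10):
    """Return (from, to) strings in VM local time around a UTC event time."""
    # Map a trailing "Z" to an explicit offset so older Python versions can parse it.
    event_time = datetime.fromisoformat(event_time_utc.replace("Z", "+00:00"))
    local = event_time.astimezone(vm_tz)
    fmt = "%Y-%m-%d %H:%M:%S"
    start = local - timedelta(minutes=before_minutes)
    end = local + timedelta(minutes=after_minutes)
    return start.strftime(fmt), end.strftime(fmt)
```

For a VM Cluster that runs eight hours behind UTC, diagcollect_window("2022-03-04T18:19:42Z", timezone(timedelta(hours=-8))) returns ("2022-03-04 09:49:42", "2022-03-04 10:29:42"). A zoneinfo.ZoneInfo instance can be passed instead of a fixed offset where daylight saving time applies.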
AVAILABILITY.DB_GUEST.CLIENT_LISTENER.DOWN

Problem Statement: A client listener is down and unable to accept application connections.

Risk:
  • If the node's client listener is down, then the database instances on the node cannot provide services for the application.
  • If the client listener is down on all nodes, then any application that connects to any database using the SCAN or VIP will fail.

Action:

Start the client listener to receive the DOWN_CLEARED event.

DOWN event of type INFORMATION

  1. If the event was caused by an Oracle Cloud maintenance action, such as performing a Grid Infrastructure software update, then no action is required. The affected client listener will automatically restart when maintenance affecting the grid instance is complete.
  2. If the event was caused by user action, then start the client listener at the next opportunity.

DOWN event of type CRITICAL

Check the client listener status and then restart the client listener.

  1. Log in to the VM as the opc user and sudo to the grid user:
    [opc@vm ~] sudo su - grid
  2. Check the client listener status on any node:
    [grid@vm ~] srvctl status listener 
  3. Start the client listener:
    [grid@vm ~] srvctl start listener
  4. Recheck the client listener status on any node:
    [grid@vm ~] srvctl status listener

    If the client listener is still down, then investigate the cause of the client listener failure:

    1. Use tfactl to collect both the CRS and operating system logs from 30 minutes before to 10 minutes after the event for the <hostName> indicated in the log. Note that the time in the event payload is always provided in UTC. For tfactl collection, adjust the time to the time zone of the VM Cluster.
      [grid@vm ~] tfactl diagcollect -crs -os -node <hostName> -from "<eventTime adjusted for local vm timezone> - 30 minutes" -to "<eventTime adjusted for local vm timezone> + 10 minutes"
    2. Review the listener log located under /u01/app/grid/diag/tnslsnr/<hostName>/<listenerName>/trace

AVAILABILITY.DB_GUEST.CDB_INSTANCE.DOWN

Problem Statement: A database instance has gone down.

Risk: A database instance has gone down, which may result in reduced performance if database instances are available on other nodes in the cluster, or complete downtime if database instances on all nodes are down.

Action:

Start the database instance to receive the DOWN_CLEARED event.

DOWN event of type INFORMATION
  1. If the event was caused by an Oracle Cloud maintenance action, such as performing a Database Home software update, then no action is required. The affected database instance will automatically restart when maintenance affecting the instance is complete.
  2. If the event was caused by user action, then start the affected database instance at the next opportunity.

DOWN event of type CRITICAL

  1. Check database status and restart the down database instance.
    1. Log in to the VM as the oracle user.
    2. Set the environment:
      [oracle@vm ~] . <dbName>.env
    3. Check the database status:
      [oracle@vm ~] srvctl status database -db <dbName>
    4. Start the database instance:
      [oracle@vm ~] srvctl start instance -db <dbName> -instance <instanceName>
  2. Investigate the cause of the database instance failure.
    1. Review Trace File Analyzer (TFA) events for the database:
      [oracle@vm ~] tfactl events -database <dbName> -instance <instanceName>
    2. Review the database alert log located at $ORACLE_BASE/diag/rdbms/<dbName>/<instanceName>/trace/alert_<instanceName>.log
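
The output of srvctl status database can be scanned to find which instance to restart. The Python sketch below assumes the usual line format, "Instance <name> is running on node <node>" or "Instance <name> is not running on node <node>". Verify the wording on your Grid Infrastructure version before relying on it. The function name down_instances is hypothetical.

```python
import re

# Group 2 is "not " only when the instance is reported as down.
_STATUS_LINE = re.compile(r"Instance (\S+) is (not )?running on node (\S+)")


def down_instances(srvctl_output: str) -> list[tuple[str, str]]:
    """Return (instance, node) pairs that srvctl reports as not running."""
    return [
        (match.group(1), match.group(3))
        for match in _STATUS_LINE.finditer(srvctl_output)
        if match.group(2)
    ]
```

Each returned pair supplies the <instanceName> value for the srvctl start instance command in the steps above.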
AVAILABILITY.DB_GUEST.CRS_INSTANCE.EVICTION

Problem Statement: The Oracle Clusterware is designed to perform a node eviction by removing one or more nodes from the cluster if some critical problem is detected. A critical problem could be a node not responding via a network heartbeat, a node not responding via a disk heartbeat, a hung or severely degraded machine, or a hung ocssd.bin process. The purpose of this node eviction is to maintain the overall health of the cluster by removing impaired members.

Risk: During the time it takes to restart the evicted node, the node cannot provide database services for the application.

Action: CRS node eviction can be caused by the OCSSD (CSS daemon), CSSDAGENT, or CSSDMONITOR processes. Determine which process was responsible for the node eviction and review the relevant log files. Common causes of OCSSD eviction are network failures or latencies, I/O issues with CSS voting disks, and a member kill escalation. CSSDAGENT or CSSDMONITOR evictions can result from an OS scheduler problem or a hung thread within the CSS daemon.

Log files to review include:

  • clusterware alert log
  • cssdagent log
  • cssdmonitor log
  • ocssd log
  • lastgasp log
  • /var/log/messages
  • CHM/OS Watcher data
  • opatch lsinventory detail

For more information on collecting most of these files together, see Autonomous Health Framework (AHF) - Including TFA and ORAchk/EXAchk (Doc ID 2550798.1).

For more information on troubleshooting CRS node eviction, see Troubleshooting Clusterware Node Evictions (Reboots) (Doc ID 1050693.1).

HEALTH.DB_CLUSTER.CDB.CORRUPTION

Problem Statement: Corruptions can lead to application or database errors and, in the worst case, result in significant data loss if not addressed promptly.

A corrupt block is a block that was changed so that it differs from what Oracle Database expects to find. Block corruptions can be categorized as physical or logical:
  • In a physical block corruption, which is also called a media corruption, the database does not recognize the block at all; the checksum is invalid or the block contains all zeros. An example of a more sophisticated block corruption is when the block header and footer do not match.
  • In a logical block corruption, the contents of the block are physically sound and pass the physical block checks; however, the block can be logically inconsistent. Examples of logical block corruption include incorrect block type, incorrect data or redo block sequence number, corruption of a row piece or index entry, or data dictionary corruptions.

For more information, see Physical and Logical Block Corruptions. All you wanted to know about it. (Doc ID 840978.1).

Block corruptions can also be divided into interblock corruption and intrablock corruption:
  • In an intrablock corruption, the corruption occurs in the block itself and can be either a physical or a logical block corruption.
  • In an interblock corruption, the corruption occurs between blocks and can only be a logical block corruption.
Oracle checks for the following errors in the alert.log:
  • ORA-01578
  • ORA-00752
  • ORA-00753
  • ORA-00600 [3020]
  • ORA-00600 [kdsgrp1]
  • ORA-00600 [kclchkblk_3]
  • ORA-00600 [13013]
  • ORA-00600 [5463]

Risk: A data corruption outage occurs when a hardware, software, or network component causes corrupt data to be read or written. The service-level impact of a data corruption outage may vary, from a small portion of the application or database (down to a single database block) to a large portion of the application or database (making it essentially unusable). If remediation action is not taken promptly, then potential downtime and data loss can increase.

Action:

The event notification currently triggers on physical block corruptions (ORA-01578), lost writes (ORA-00752, ORA-00753, and ORA-00600 with first argument 3020), and logical corruptions (typically detected from ORA-00600 with a first argument of kdsgrp1, kclchkblk_3, 13013, or 5463).
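
The mapping from error codes to corruption categories described above can be sketched in Python. This is an illustrative sketch only, not the logic the event service itself uses; the function name classify_corruption and the regular expression are assumptions, and the sketch assumes alert log lines carry zero-padded ORA codes followed, for ORA-00600, by bracketed arguments.

```python
import re

# Error-to-category mapping taken from the event description above.
LOST_WRITE_CODES = {"ORA-00752", "ORA-00753"}
LOGICAL_600_ARGS = {"kdsgrp1", "kclchkblk_3", "13013", "5463"}

# Captures an ORA code and, when present, the first bracketed argument.
_ORA_RE = re.compile(r"\b(ORA-\d{5})\b(?:[^\[]*\[([^\]]*)\])?")


def classify_corruption(line):
    """Return 'physical', 'lost_write', 'logical', or None for one log line."""
    match = _ORA_RE.search(line)
    if match is None:
        return None
    code, first_arg = match.group(1), match.group(2)
    if code == "ORA-01578":
        return "physical"
    if code in LOST_WRITE_CODES:
        return "lost_write"
    if code == "ORA-00600":
        if first_arg == "3020":
            return "lost_write"
        if first_arg in LOGICAL_600_ARGS:
            return "logical"
    return None


if __name__ == "__main__":
    # Hypothetical sample line, shaped like an ORA-00600 alert log entry.
    sample = "ORA-00600: internal error code, arguments: [kdsgrp1], [], []"
    print(classify_corruption(sample))  # prints logical
```

A classification like this can help decide which of the note references below applies before opening the Service Request.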

Oracle recommends the following steps:
  1. Confirm that these corruptions were reported in the alert.log trace file. Log a Service Request (SR) with the latest EXAchk report, an excerpt of the alert.log and trace file containing the corruption errors, any history of recent application, database, or software changes, and any system, clusterware, and database logs for the same time period. For all these cases, a TFA collection should be available and should be attached to the SR.
  2. For repair recommendations, refer to Handling Oracle Database Corruption Issues (Doc ID 1088018.1).
For physical corruptions or ORA-1578 errors, the following notes will be helpful:
  • Doc ID 1578.1 : OERR: ORA-1578 "ORACLE data block corrupted (file # %s, block # %s)" Primary Note
  • Doc ID 472231.1 : How to identify all the Corrupted Objects in the Database reported by RMAN
  • Doc ID 819533.1 : How to identify the corrupt Object reported by ORA-1578 / RMAN / DBVERIFY
  • Depending on the object that has the corruption, follow the guidance in Doc ID 1088018.1. Note that RMAN can be used to recover one or many data blocks that are physically corrupted. Also, with Active Data Guard and real-time apply, automatic block repair of physical data corruptions would already have occurred.
For logical corruptions caused by lost writes (ORA-00752, ORA-00753, and ORA-00600 with first argument 3020) on the primary or standby databases, detection occurs on the primary or during the standby's redo apply process. The following notes will be helpful:
  • Follow the guidance in Doc ID 1088018.1.
  • If you have a standby and lost write corruption on the primary or standby, refer to Resolving ORA-00752 or ORA-00600 [3020] During Standby Recovery (Doc ID 1265884.1)
For logical corruptions (typically detected from ORA-00600 with a first argument of kdsgrp1, kclchkblk_3, 13013, or 5463):
  • Follow Doc ID 1088018.1 for specific guidance on the error that was detected.
  • If you have a standby and logical corruption on the primary, refer to Resolving Logical Block Corruption Errors in a Physical Standby Database (Doc ID 2821699.1)
HEALTH.DB_CLUSTER.CDB.ARCHIVER_HANG

Problem Statement: A CDB RAC instance may temporarily or permanently stall because the log writer (LGWR) is unable to write the log buffers to an online redo log. This occurs because all online logs need archiving. Once the archiver (ARC) can archive at least one online redo log, LGWR can resume writing the log buffers to online redo logs and the application impact will be alleviated.

Risk: If the archiver hang is temporary, then it can result in a small application brownout or stall for application processes attempting to commit their database changes. If the archiver is not unblocked, applications can experience extended delays in processing.

Action:
  • See Script To Find Redo log Switch History And Find Archivelog Size For Each instance In RAC (Doc ID 2373477.1) to determine the hourly log switch frequency for each thread/instance.
  • If any hourly bucket is greater than 12, then consider resizing the online redo logs. See item 2 below for resizing steps.
  • If the database hangs are temporary, then the archiver may be unable to keep up with the redo generated. Check the alert.log, $ORACLE_BASE/diag/rdbms/<dbName>/<instanceName>/trace/alert_<instanceName>.log, for "All online logs need archiving". Multiple occurrences in a short period point to two possible solutions:
    • If the number of redo log groups per thread is less than 4, then consider adding log groups to reach 4. See item 1 below for the steps to add redo logs.
    • The other possible solution is to resize the redo logs. See item 2 below for resizing steps.
  • For both Data Guard and non-Data Guard configurations, review the section Configure Online Redo Logs Appropriately in Oracle Database High Availability Overview and Best Practices for sizing guidelines.
  1. Add a redo log group for each thread. The additional redo log should equal the current log size.
    1. Use the following query:
      select max(group#) Ending_group_number, thread#, count(*) number_of_groups_per_thread, bytes redo_size_in_bytes from v$log group by thread#,bytes
    2. Add one new group per thread using the same size as the current redo logs.
      alter database add logfile thread <thread_number> Group <max group + 1> ('<DATA_DISKGROUP>') size <redo_size_in_bytes>
  2. Resize the online redo logs by adding larger redo logs and dropping the current smaller redo logs.
    1. Use the following query:
      select max(group#) Ending_group_number, thread#, count(*) number_of_groups_per_thread, bytes redo_size_in_bytes from v$log group by thread#,bytes
    2. For each thread, add the same number of redo logs (<number_of_groups_per_thread>) that currently exist. The <new_redo_size_in_bytes> should be based on the section Configure Online Redo Logs Appropriately in Oracle Database High Availability Overview and Best Practices.
      1. alter database add logfile thread <thread_number> Group <max group + 1> ('<DATA_DISKGROUP>') size <new_redo_size_in_bytes>
      2. The original smaller redo logs should be deleted. A redo log group can only be dropped if its status is INACTIVE.

        To determine the status of the redo logs, issue:

        select group#, thread#, status, bytes from v$log order by bytes, group#, thread#;
        To delete the original smaller redo logs:
        alter database drop logfile group <group#>
  • If the database is hung, the primary log archive destination and its alternate may be full. Review HEALTH.DB_CLUSTER.DISK_GROUP.FREE_SPACE for details on freeing space in the RECO and DATA disk groups.
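
The two thresholds in the guidance above (more than 12 log switches in any hourly bucket, fewer than 4 redo log groups per thread) can be expressed as a small check. This is a minimal sketch under stated assumptions: the function name redo_log_advice and its inputs are hypothetical, and the per-hour switch counts are assumed to have been gathered already, for example from the script in Doc ID 2373477.1.

```python
MAX_SWITCHES_PER_HOUR = 12  # threshold from the guidance above
MIN_GROUPS_PER_THREAD = 4   # threshold from the guidance above


def redo_log_advice(switches_per_hour, groups_per_thread):
    """Suggest redo log changes for one thread.

    switches_per_hour: log switch counts, one entry per hourly bucket.
    groups_per_thread: current number of redo log groups for the thread.
    """
    advice = []
    if groups_per_thread < MIN_GROUPS_PER_THREAD:
        missing = MIN_GROUPS_PER_THREAD - groups_per_thread
        advice.append(f"add {missing} redo log group(s)")
    if any(count > MAX_SWITCHES_PER_HOUR for count in switches_per_hour):
        advice.append("resize the online redo logs")
    return advice


if __name__ == "__main__":
    # Hypothetical thread with 3 groups and one busy hour (14 switches).
    print(redo_log_advice([3, 5, 14], 3))
```

An empty list means neither threshold was crossed for that thread; the sketch does not choose a new redo log size, which should still follow the sizing guidelines referenced above.
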
HEALTH.DB_CLUSTER.CDB.DATABASE_HANG

Problem Statement: Hang management detected a process hang and generated an ORA-32701 error message. Additionally, this event may be raised if the Diagnostic Process (DIA0) detects a hang in a critical database process.

Risk: A hang can indicate resource, operating system, or application coding related issues.

Action:

Investigate the cause of the session hang.
  1. Review TFA events for the database for the following message patterns corresponding to the date/time of the event: ORA-32701, "DIA0 Critical Database Process Blocked" or "DIA0 Critical Database Process As Root".
    [oracle@vm ~] tfactl events -database <dbName> -instance <instanceName>
  2. Review the alert.log file.
    $ORACLE_BASE/diag/rdbms/<dbName>/<instanceName>/trace/alert_<instanceName>.log
  3. For ORA-32701: An overloaded system can cause slow progress, which can be interpreted as a hang.

    The hang manager may attempt to resolve the hang by terminating the final blocker process.

  4. For DIA0 Critical Database Process messages: Review the related diagnostic lines indicating the process and the reason for the hang.
HEALTH.DB_CLUSTER.CDB.BACKUP_FAILURE

Problem Statement: A daily incremental BACKUP of the CDB failed.

Risk: A backup failure can compromise the ability to use the backups to restore and recover the database. The Recovery Point Objective (RPO) and the Recovery Time Objective (RTO) can be impacted.

Action:

Review the RMAN logs corresponding to the date/time of the event. Note that the event time stamp <eventTime> is in UTC; adjust as necessary for the VM's time zone.
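
Adjusting the UTC event time to the VM's time zone can be sketched as follows. This is a minimal sketch that assumes <eventTime> arrives as an ISO 8601 string such as 2024-01-15T02:30:00Z; the function name to_vm_time and the sample values are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def to_vm_time(event_time, vm_tz):
    """Convert an ISO 8601 UTC time stamp string to the VM's time zone."""
    # Older Python versions reject a trailing "Z", so normalize it first.
    parsed = datetime.fromisoformat(event_time.replace("Z", "+00:00"))
    if parsed.tzinfo is None:
        # Treat a time stamp without an offset as UTC, per the note above.
        parsed = parsed.replace(tzinfo=timezone.utc)
    return parsed.astimezone(vm_tz)


if __name__ == "__main__":
    # Hypothetical VM time zone: a fixed offset of UTC-6.
    vm_tz = timezone(timedelta(hours=-6))
    print(to_vm_time("2024-01-15T02:30:00Z", vm_tz))  # 2024-01-14 20:30:00-06:00
```

A fixed offset keeps the sketch independent of the system time zone database; a named zone from the standard zoneinfo module can be passed instead where daylight saving time matters.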

  • For Exadata Cloud Infrastructure Oracle Managed Backups or User Configured Backups under dbaascli:
    • RMAN output can be found at /var/opt/oracle/log/<DB_NAME>/obkup.

      Daily incremental logs within the obkup directory are named in the format obkup_yyyy-mm-dd_24hh:mm:ss.zzzzzzzzzzzz.log. The logs are located on the lowest active node/instance of the database when the backup was initiated.

    • Review the log for any failures:
      • If the failure is due to an external event outside of RMAN, for example the backup location was full or a networking issue, resolve the external issue.
      • For other RMAN script errors, collect the diagnostic logs and open a Service Request. See DBAAS Tooling: Using dbaascli to Collect Cloud Tooling Logs and Perform a Cloud Tooling Health Check.

    • If the issue is transient or is resolved, take a new incremental backup. See dbaascli database backup.

  • For customer-owned and managed backups taken through RMAN:
    • Review the RMAN logs for the backup.
HEALTH.DB_CLUSTER.DISK_GROUP.FREE_SPACE

Problem Statement: ASM disk group space usage is at or exceeds 90%.

Risk: Insufficient ASM disk group space can cause database creation failure, tablespace and data file creation failure, automatic data file extension failure, or ASM rebalance failure.

Action:

ASM disk group used space is determined by running the following query while connected to the ASM instance.
[opc@node ~] sudo su - grid
[grid@node ~] sqlplus / as sysasm
 
SQL> select 'ora.'||name||'.dg', total_mb, free_mb, round ((1-(free_mb/total_mb))*100,2) pct_used from v$asm_diskgroup;
 
NAME                             TOTAL_MB    FREE_MB   PCT_USED
------------------------------ ---------- ---------- ----------
ora.DATAC1.dg                    75497472    7408292      90.19
ora.RECOC1.dg                    18874368   17720208       6.11
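
The PCT_USED arithmetic in the query, together with the 90% threshold from the problem statement, can be reproduced as a quick check. This is an illustrative sketch; the function names are hypothetical, and the rows are the sample values from the output above.

```python
THRESHOLD_PCT = 90.0  # usage level at which the event is raised


def pct_used(total_mb, free_mb):
    """Same arithmetic as the query: round((1 - free_mb/total_mb) * 100, 2)."""
    return round((1 - free_mb / total_mb) * 100, 2)


def disk_groups_over_threshold(rows, threshold=THRESHOLD_PCT):
    """Return the names of disk groups whose usage is at or above threshold."""
    return [name for name, total_mb, free_mb in rows
            if pct_used(total_mb, free_mb) >= threshold]


if __name__ == "__main__":
    # (name, total_mb, free_mb) taken from the sample output above.
    rows = [
        ("ora.DATAC1.dg", 75497472, 7408292),
        ("ora.RECOC1.dg", 18874368, 17720208),
    ]
    print(disk_groups_over_threshold(rows))  # ['ora.DATAC1.dg']
```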

ASM disk group capacity can be increased in the following ways:

  1. Scale Exadata VM Cluster storage to add more ASM disk group capacity. See Scaling an Exadata Cloud Infrastructure Instance.
  2. Scale Exadata Infrastructure storage to add more ASM disk group capacity. See Scaling Exadata X8M and X9M Compute and Storage.

DATA disk group space use can be reduced in the following ways:

  1. Drop unused data files and temp files from databases. See Dropping Data Files.
  2. Terminate unused databases (e.g. test databases). See Using the Console to Terminate a Database.

RECO disk group space use can be reduced in the following ways:

  1. Drop unnecessary Guaranteed Restore Points. See Using Normal and Guaranteed Restore Points.
  2. Delete archived redo logs or database backups already backed up outside the Fast Recovery Area (FRA). See Maintaining the Fast Recovery Area.

SPARSE disk group space use can be reduced in the following ways:

  1. Move full copy test master databases to another disk group (e.g. DATA).
  2. Drop unused snapshot databases or test master databases. See Managing Exadata Snapshots.

For more information about managing the log and diagnostic files, see Managing the Log and Diagnostic Files on Oracle Exadata Database Service on Dedicated Infrastructure.

Managing the Log and Diagnostic Files on Oracle Exadata Database Service on Dedicated Infrastructure

The software components in Oracle Exadata Database Service on Dedicated Infrastructure generate a variety of log and diagnostic files, and not all these files are automatically archived and purged. Thus, managing the identification and removal of these files to avoid running out of file storage space is an important administrative task.

Database deployments on ExaDB-D include the cleandblogs script to simplify this administrative task. The script runs daily as a cron job on each compute node to archive key files and remove old log and diagnostic files.

The cleandblogs script operates by using the adrci (Automatic Diagnostic Repository [ADR] Command Interpreter) tool to identify and purge target diagnostic folders and files for each Oracle Home listed in /etc/oratab. It also targets Oracle Net Listener logs, audit files, and core dumps.

On ExaDB-D, the script is run separately as the oracle user to clean log and diagnostic files that are associated with Oracle Database, and as the grid user to clean log and diagnostic files that are associated with Oracle Grid Infrastructure.

The cleandblogs script uses a configuration file to determine how long to retain each type of log or diagnostic file. You can edit the file to change the default retention periods. The file is located at /var/opt/oracle/cleandb/cleandblogs.cfg on each compute node.

Note

Configure an optimal retention period for each type of log or diagnostic file. An insufficient retention period will hinder root cause analysis and problem investigation.
Parameter Description and Default Value

AlertRetention

Alert log (alert_instance.log) retention value in days.

Default value: 14

ListenerRetention

Listener log (listener.log) retention value in days.

Default value: 14

AuditRetentionDB

Database audit (*.aud) retention value in days.

Default value: 1

CoreRetention

Core dump/files (*.cmdp*) retention value in days.

Default value: 7

TraceRetention

Trace file (*.tr* and *.prf) retention value in days.

Default value: 7

longpRetention

Data designated in the Automatic Diagnostic Repository (ADR) as having a long life (the LONGP_POLICY attribute). For information about ADR, see Automatic Diagnostic Repository (ADR) in the Oracle Database Administrator's Guide.

Default value: 14

shortpRetention

Data designated in the Automatic Diagnostic Repository (ADR) as having a short life (the SHORTP_POLICY attribute). For information about ADR, see Automatic Diagnostic Repository (ADR) in the Oracle Database Administrator's Guide.

Default value: 7

LogRetention

Log file retention in days for files under /var/opt/oracle/log and log files in ACFS under /var/opt/oracle/dbaas_acfs/log.

Default value: 14

LogDirRetention

cleandblogs logfile retention in days.

Default value: 14

ScratchRetention

Temporary file retention in days for files under /scratch.

Default value: 7
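
To illustrate how a retention period in days translates into a purge decision, the following sketch applies the default values listed above to a file's last-modified time. It is not the cleandblogs implementation and does not read cleandblogs.cfg, whose file format is not described here; the function name is_expired is hypothetical. longpRetention and shortpRetention are left out because the list above does not state their unit.

```python
from datetime import datetime, timedelta

# Default retention periods in days, copied from the parameter list above.
# longpRetention and shortpRetention are omitted: their unit is not stated.
DEFAULT_RETENTION_DAYS = {
    "AlertRetention": 14,
    "ListenerRetention": 14,
    "AuditRetentionDB": 1,
    "CoreRetention": 7,
    "TraceRetention": 7,
    "LogRetention": 14,
    "LogDirRetention": 14,
    "ScratchRetention": 7,
}


def is_expired(last_modified, parameter, now, retention=DEFAULT_RETENTION_DAYS):
    """Return True if a file is older than the retention period for its type."""
    return now - last_modified > timedelta(days=retention[parameter])


if __name__ == "__main__":
    now = datetime(2024, 1, 15)
    two_days_old = datetime(2024, 1, 13)
    # A two-day-old audit file exceeds the 1-day default; a trace file does not.
    print(is_expired(two_days_old, "AuditRetentionDB", now))  # True
    print(is_expired(two_days_old, "TraceRetention", now))    # False
```

The example shows why the short AuditRetentionDB default deserves attention: audit files older than one day are already eligible for removal.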

Archiving Alert Logs and Listener Logs

When cleaning up alert and listener logs, cleandblogs first archives and compresses the logs, operating as follows:
  1. The current log file is copied to an archive file that ends with a date stamp.
  2. The current log file is emptied.
  3. The archive file is compressed using gzip.
  4. Any existing compressed archive files older than the retention period are deleted.
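
The four steps above follow a copy, truncate, compress, and purge pattern, which can be sketched in Python. This is an illustration of the pattern only, not the cleandblogs code: the function archive_log, the date stamp format, and the use of file modification time to judge archive age are all assumptions.

```python
import gzip
import os
import shutil
import tempfile
from datetime import datetime


def archive_log(log_path, retention_days):
    """Archive one log file following the four steps described above."""
    now = datetime.now()
    archive_path = f"{log_path}.{now:%Y%m%d_%H%M%S}"  # hypothetical naming

    # 1. Copy the current log to an archive file ending with a date stamp.
    shutil.copy2(log_path, archive_path)
    # 2. Empty the current log file in place.
    with open(log_path, "w"):
        pass
    # 3. Compress the archive with gzip and remove the uncompressed copy.
    with open(archive_path, "rb") as src, gzip.open(archive_path + ".gz", "wb") as dst:
        shutil.copyfileobj(src, dst)
    os.remove(archive_path)
    # 4. Delete compressed archives older than the retention period.
    cutoff = now.timestamp() - retention_days * 86400
    directory = os.path.dirname(log_path) or "."
    prefix = os.path.basename(log_path) + "."
    for name in os.listdir(directory):
        path = os.path.join(directory, name)
        if name.startswith(prefix) and name.endswith(".gz") and os.path.getmtime(path) < cutoff:
            os.remove(path)
    return archive_path + ".gz"


def demo():
    """Run the sketch against throwaway files in a temporary directory."""
    with tempfile.TemporaryDirectory() as tmp:
        log_path = os.path.join(tmp, "alert_demo.log")
        with open(log_path, "w") as handle:
            handle.write("line 1\nline 2\n")
        stale = log_path + ".20000101_000000.gz"
        with gzip.open(stale, "wb") as handle:
            handle.write(b"old")
        os.utime(stale, (0, 0))  # make the old archive look expired
        archived = archive_log(log_path, retention_days=14)
        with gzip.open(archived, "rt") as handle:
            content = handle.read()
        return os.path.getsize(log_path), content, os.path.exists(stale)


if __name__ == "__main__":
    print(demo())
```

Emptying the file in place, rather than renaming it, lets a process that holds the log open keep writing to the same file; lines written between the copy and the truncation can be lost in this simple form.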

Running the cleandblogs Script Manually

The cleandblogs script automatically runs daily on each compute node, but you can also run the script manually if the need arises.

  1. Connect to the compute node as the oracle user to clean log and diagnostic files that are associated with Oracle Database, or connect as the grid user to clean log and diagnostic files that are associated with Oracle Grid Infrastructure.

    For detailed instructions, see Connecting to a Virtual Machine with SSH.

  2. Change to the directory containing the cleandblogs script:
    $ cd /var/opt/oracle/cleandb
  3. Run the cleandblogs script:
    $ ./cleandblogs.pl
    When running the script manually, you can specify an alternate configuration file to use instead of cleandblogs.cfg by using the --pfile option:
    $ ./cleandblogs.pl --pfile config-file-name
  4. Close your connection to the compute node:
    $ exit