4.2 Using Automatic Diagnostic Collections

Oracle Trace File Analyzer monitors your logs for significant problems, such as internal errors (for example, ORA-00600) or node evictions.

4.2.1 Collecting Diagnostics Automatically

This section explains automatic diagnostic collection concepts.

If Oracle Trace File Analyzer detects any problems, then it performs the following actions:

  • Runs necessary diagnostics and collects all relevant log data at the time of a problem

  • Trims log files to collect only what is necessary for diagnosis

  • Collects and packages all trimmed diagnostics from all nodes in the cluster, consolidating everything on a single node

  • Stores diagnostic collections in the Oracle Trace File Analyzer repository

  • Sends you an email notification of the problem, with details of the diagnostic collection that is ready for upload to Oracle Support

Figure 4-2 Automatic Diagnostic Collections


Oracle Trace File Analyzer has a mechanism that prevents repeat errors from overwhelming your system with excessive automatic collections.

Identifying an event sets the start point for a collection, and Oracle Trace File Analyzer starts collecting diagnostic data five minutes later. The five-minute delay enables Oracle Trace File Analyzer to capture other relevant events in one operation. If events are still occurring after five minutes, then diagnostic collection continues to wait until 30 seconds pass with no events, for up to an additional five minutes.

If events continue after 10 minutes, then Oracle Trace File Analyzer proceeds with the diagnostic collection.
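The wait rule above can be modeled as a small Bash function that, given event times, returns when collection would start. This is a hypothetical sketch, not Oracle Trace File Analyzer code: it assumes times are whole seconds measured from the first detected event, and it treats "events still occurring" as any event within the preceding 30 seconds. The function name collection_start and the variable names are illustrative.

```shell
#!/usr/bin/env bash
# Hypothetical model of the collection-start timing; NOT Oracle TFA code.
# Arguments: event times in whole seconds, relative to the first detected event.
INITIAL_WAIT=300   # first collection attempt at five minutes
QUIET_PERIOD=30    # seconds without events required before collecting
MAX_WAIT=600       # after ten minutes, collect regardless

collection_start() {
  local t e busy
  for (( t = INITIAL_WAIT; t < MAX_WAIT; t++ )); do
    busy=0
    for e in "$@"; do
      # An event in the window (t - QUIET_PERIOD, t] means events are still occurring.
      if (( e <= t && t - e < QUIET_PERIOD )); then
        busy=1
        break
      fi
    done
    if (( busy == 0 )); then
      echo "$t"
      return
    fi
  done
  # Events never went quiet: proceed at the ten-minute cap.
  echo "$MAX_WAIT"
}
```

For example, collection_start 0 290 prints 320: the event at 290 seconds delays collection until 30 quiet seconds have passed.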

After completing the diagnostic collections, Oracle Trace File Analyzer sends email notifications that include the collection location to the designated recipients.

If your environment can make a connection to oracle.com, then you can use Oracle Trace File Analyzer to upload the collection to a Service Request.

To turn automatic diagnostic collection on or off:

$ tfactl set autodiagcollect=ON|OFF

Automatic collections are ON by default.

Table 4-3 Log Entries that Trigger Automatic Collection

String Pattern:
  ORA-297(01|02|03|08|09|10|40)
  ORA-00600
  ORA-07445
  ORA-04(69|([7-8][0-9]|9([0-3]|[5-8])))
  ORA-32701
  ORA-00494
  ORA-04020
  ORA-04021
  ORA-01578
  ORA-00700
  System State dumped
Log Monitored:
  Alert Log - Oracle Database
  Alert Log - Oracle Database/Oracle ASM
  Alert Log - Oracle Database/Oracle ASM Proxy

String Pattern:
  CRS-016(07|10|11|12)
Log Monitored:
  Alert Log - Oracle Clusterware
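To check a log excerpt against the patterns in Table 4-3 yourself, you can pass them to grep -E. This is an illustrative sketch that assumes the patterns behave as POSIX extended regular expressions matched anywhere in a line; it does not reproduce how Oracle Trace File Analyzer reads or tracks logs. The function name match_triggers is hypothetical.

```shell
#!/usr/bin/env bash
# Illustrative only: the string patterns from Table 4-3, used with grep -E.
TRIGGER_PATTERNS=(
  'ORA-297(01|02|03|08|09|10|40)'
  'ORA-00600'
  'ORA-07445'
  'ORA-04(69|([7-8][0-9]|9([0-3]|[5-8])))'
  'ORA-32701'
  'ORA-00494'
  'ORA-04020'
  'ORA-04021'
  'ORA-01578'
  'ORA-00700'
  'System State dumped'
  'CRS-016(07|10|11|12)'
)

# Print each line read from stdin that matches any trigger pattern.
# Exit status follows grep: 0 if something matched, 1 if nothing did.
match_triggers() {
  local args=() p
  for p in "${TRIGGER_PATTERNS[@]}"; do
    args+=(-e "$p")
  done
  grep -E "${args[@]}"
}
```

For example, match_triggers < alert_orcl.log (a hypothetical file name) prints only the lines that contain a trigger string.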

Additionally, when Oracle Cluster Health Advisor detects a problem event, Oracle Trace File Analyzer automatically triggers the relevant diagnostic collection.

4.2.2 Configuring Email Notification Details

Configure Oracle Trace File Analyzer to send an email to the registered email address after an automatic collection completes.

To send emails, configure the system on which Oracle Trace File Analyzer is running. You must also configure notification with a user email address.

To configure email notification details:

  1. To set the notification email for a specific ORACLE_HOME, include the operating system owner in the command:
    tfactl set notificationAddress=os_user:email
    For example:
    tfactl set notificationAddress=oracle:some.body@example.com
  2. To set the notification email for any ORACLE_HOME:
    tfactl set notificationAddress=email
    For example:
    tfactl set notificationAddress=another.body@example.com
  3. Configure the SMTP server using tfactl set smtp.

    Set the SMTP parameters when prompted.

    Table 4-4 tfactl set smtp Parameters

    Parameter       Description
    smtp.host       Specify the SMTP server host name.
    smtp.port       Specify the SMTP server port.
    smtp.user       Specify the SMTP user.
    smtp.password   Specify the password for the SMTP user.
    smtp.auth       Set the authentication flag to true or false.
    smtp.ssl        Set the SSL flag to true or false.
    smtp.from       Specify the sender (from) email address.
    smtp.to         Specify a comma-delimited list of recipient email addresses.
    smtp.cc         Specify a comma-delimited list of CC email addresses.
    smtp.bcc        Specify a comma-delimited list of BCC email addresses.
    smtp.debug      Set the debug flag to true or false.

    Note:

    You can view current SMTP configuration details using tfactl print smtp.

  4. Verify the SMTP configuration by sending a test email using tfactl sendmail email_address.

    If Oracle Trace File Analyzer detects that a significant error has occurred, then it sends an email notification.

  5. Do the following after receiving the notification email:
    1. To find the root cause, inspect the referenced collection details.
    2. If you can fix the issue, then resolve the underlying cause of the problem.
    3. If you do not know the root cause of the problem, then open a service request (SR) with Oracle Support and upload the collection.
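If the test email in step 4 does not arrive, one quick check is whether this host can reach the SMTP server at all. The following is a hypothetical pre-check, not part of tfactl: it assumes Bash with /dev/tcp support and the coreutils timeout command, and the function name smtp_reachable is illustrative. Pass it the same values you configured for smtp.host and smtp.port.

```shell
#!/usr/bin/env bash
# Hypothetical pre-check (not a tfactl feature): can this host open a TCP
# connection to the configured SMTP server?
# Usage: smtp_reachable HOST PORT [TIMEOUT_SECONDS]
smtp_reachable() {
  local host=$1 port=$2 secs=${3:-5}
  # The child bash opens file descriptor 3 to /dev/tcp/HOST/PORT ($0 and $1 inside
  # the quoted script). Refusal, DNS failure, or timeout yields a non-zero status.
  timeout "$secs" bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$host" "$port" 2>/dev/null
}
```

For example, if smtp_reachable mail.example.com 25 succeeds (host and port are placeholders), run tfactl sendmail email_address; if it fails, check firewall rules or the smtp.host and smtp.port values first.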

4.2.3 Collecting Problems Detected by Oracle Cluster Health Advisor

Configure Oracle Trace File Analyzer to automatically collect diagnostics for abnormal events detected by Oracle Cluster Health Advisor, and to send email notifications.

  1. To configure Oracle Cluster Health Advisor auto collection for abnormal events:
    tfactl set chaautocollect=ON
  2. To enable Oracle Cluster Health Advisor notification through Oracle Trace File Analyzer:
    tfactl set chanotification=on
  3. To configure the email address to which Oracle Cluster Health Advisor notifications are sent:
    tfactl set notificationAddress=chatfa:john.doe@acompany.com
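If you script this setup, a small wrapper can apply the three settings above in order. In this sketch the function names and the DRY_RUN and TFACTL variables are hypothetical; by default it only prints the commands so you can review them, and DRY_RUN=0 runs them.

```shell
#!/usr/bin/env bash
# Sketch: apply the three Oracle Cluster Health Advisor settings in one step.
# DRY_RUN=1 (default here) prints the commands; DRY_RUN=0 executes them.
TFACTL=${TFACTL:-tfactl}

run_tfactl() {
  if [[ ${DRY_RUN:-1} == 1 ]]; then
    echo "$TFACTL $*"
  else
    "$TFACTL" "$@"
  fi
}

enable_cha_collection() {
  local email=$1
  run_tfactl set chaautocollect=ON || return
  run_tfactl set chanotification=on || return
  run_tfactl set "notificationAddress=chatfa:${email}"
}
```

For example, enable_cha_collection john.doe@acompany.com prints the three commands from the steps above; run it again with DRY_RUN=0 to apply them.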

4.2.4 Sanitizing Sensitive Information in Oracle Trace File Analyzer Collections

After collecting copies of diagnostic data, Oracle Trace File Analyzer uses Adaptive Classification and Redaction (ACR) to sanitize sensitive data in the collections.

Note:

Starting with Oracle Autonomous Health Framework 24.1, the Oracle Trace File Analyzer masking feature is deprecated, and may be desupported in a future release.

To mask or sanitize sensitive data in collections:
tfactl set redact=mask|sanitize|none

mask: blocks out the sensitive data in all collections, for example, replaces myhost1 with *******

sanitize: replaces the sensitive data in all collections with random characters, for example, replaces myhost1 with orzhmv1

none (default): does not mask or sanitize sensitive data in collections

You can use the -sanitize and -mask options with the diagcollect command to sanitize or mask sensitive data in a specific collection.

To mask or sanitize sensitive data:

  1. To mask sensitive data in all collections:
    tfactl set redact=mask
  2. To sanitize sensitive data in all collections:
    tfactl set redact=sanitize
  3. To mask or sanitize sensitive data in a specific collection:
    For example:
    tfactl diagcollect -SRDC ORA-00600 -mask
    tfactl diagcollect -SRDC ORA-00600 -sanitize
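The difference between the mask and sanitize outputs described above can be illustrated with two small Bash functions applied to a single value such as a host name. This is only an illustration of the output shapes, not Adaptive Classification and Redaction: the function names are hypothetical, and sanitize_value simply replaces each letter with a random letter of the same case while keeping digits and punctuation. That matches the myhost1 example but is an assumption about the real behavior.

```shell
#!/usr/bin/env bash
# Illustration only; NOT how ACR redacts data.
LOWER=abcdefghijklmnopqrstuvwxyz
UPPER=ABCDEFGHIJKLMNOPQRSTUVWXYZ

# mask: replace every character with '*', keeping the original length.
mask_value() {
  local s=$1
  printf '%*s\n' "${#s}" '' | tr ' ' '*'
}

# sanitize: replace each letter with a random letter of the same case;
# digits and other characters are kept (an assumption for this sketch).
sanitize_value() {
  local s=$1 out='' c i
  for (( i = 0; i < ${#s}; i++ )); do
    c=${s:i:1}
    if [[ $c == [[:lower:]] ]]; then
      out+=${LOWER:RANDOM%26:1}
    elif [[ $c == [[:upper:]] ]]; then
      out+=${UPPER:RANDOM%26:1}
    else
      out+=$c
    fi
  done
  printf '%s\n' "$out"
}
```

For example, mask_value myhost1 prints *******, while sanitize_value myhost1 prints six random lowercase letters followed by 1.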

4.2.5 Flood Control for Similar Issues

The flood control mechanism helps you save resources by reducing repeat collections for similar issues.

You can:
  • Enable or disable flood control.
  • Set how many times to collect for an event.
  • Pause flood control.

The flood control data is stored in Berkeley DB and persists across Oracle Trace File Analyzer restarts.
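How the limit, limit time, and pause time settings interact can be sketched as follows. This is a hypothetical model inferred from the parameter names (fc.limit, fc.limitTime, fc.pauseTime) and the values shown in Example 4-1, not the actual Oracle Trace File Analyzer algorithm. It assumes that once fc.limit collections for one event happen within fc.limitTime minutes, later occurrences of that event are skipped for fc.pauseTime minutes. The function name flood_decisions is illustrative.

```shell
#!/usr/bin/env bash
# Hypothetical flood control model; NOT Oracle TFA code. Times are in minutes.
LIMIT=${LIMIT:-3}              # mirrors fc.limit
LIMIT_TIME=${LIMIT_TIME:-60}   # mirrors fc.limitTime
PAUSE_TIME=${PAUSE_TIME:-120}  # mirrors fc.pauseTime

# Arguments: occurrence times of one event, in ascending order.
# Prints "<time> collect" or "<time> skip" for each occurrence.
flood_decisions() {
  local window_start=-1 count=0 pause_until=-1 t
  for t in "$@"; do
    if (( t < pause_until )); then
      echo "$t skip"
      continue
    fi
    # Open a new counting window if none is open or the current one expired.
    if (( window_start < 0 || t - window_start >= LIMIT_TIME )); then
      window_start=$t
      count=0
    fi
    count=$((count + 1))
    echo "$t collect"
    # Limit reached inside the window: pause further collections.
    if (( count >= LIMIT )); then
      pause_until=$((t + PAUSE_TIME))
      window_start=-1
      count=0
    fi
  done
}
```

With the values above, flood_decisions 0 10 20 30 200 collects at minutes 0, 10, and 20, skips minute 30 (inside the pause), and collects again at minute 200.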

Example 4-1 Flood Control Examples

To check if flood control is enabled or disabled:
# tfactl get floodcontrol
.----------------------------------------.
|               testhost                 |
+--------------------------------+-------+
| Configuration Parameter        | Value |
+--------------------------------+-------+
| Flood Control ( floodcontrol ) | ON    |
'--------------------------------+-------'
To check flood control limit:
# tfactl get fc.limit
.------------------------------------------------.
|                   testhost                     |
+----------------------------------------+-------+
| Configuration Parameter                | Value |
+----------------------------------------+-------+
| Flood Control Limit Count ( fc.limit ) | 3     |
'----------------------------------------+-------'
To check flood control limit time:
# tfactl get fc.limittime
.-------------------------------------------------------------.
|                          testhost                           |
+-----------------------------------------------------+-------+
| Configuration Parameter                             | Value |
+-----------------------------------------------------+-------+
| Flood Control Limit Time (minutes) ( fc.limitTime ) | 60    |
'-----------------------------------------------------+-------'
To check flood control pause time:
# tfactl get fc.pausetime
.-------------------------------------------------------------.
|                          testhost                           |
+-----------------------------------------------------+-------+
| Configuration Parameter                             | Value |
+-----------------------------------------------------+-------+
| Flood Control Pause Time (minutes) ( fc.pauseTime ) | 120   |
'-----------------------------------------------------+-------'
To print flood control details:
# tfactl floodcontrol print

.----------------------------------------------------------------------------------------------------------------------------------------------------------.
| Event                  | Count | Start Date                   | Last Date                    | Limit | Limit Time | Pause Time | Coll Count | Skip Count |
+------------------------+-------+------------------------------+------------------------------+-------+------------+------------+------------+------------+
| orcl:ORA-00600:user1   |     1 | Thu May 21 09:18:56 UTC 2020 | Thu May 21 09:18:56 UTC 2020 |     3 |         60 |        120 |          1 |          0 |
+------------------------+-------+------------------------------+------------------------------+-------+------------+------------+------------+------------+
| orcl:ORA-00600:user2   |     1 | Thu May 21 09:18:25 UTC 2020 | Thu May 21 09:18:25 UTC 2020 |     3 |         60 |        120 |          4 |          2 |
'------------------------+-------+------------------------------+------------------------------+-------+------------+------------+------------+------------'
To clear flood control data for an event:
# tfactl floodcontrol clear -event orcl:ORA-00600:user1
Successfully cleared Event orcl:ORA-00600:user1

# tfactl floodcontrol print
.---------------------------------------------------------------------------------------------------------------------.
| Event                  | Count | Start Date | Last Date | Limit | Limit Time | Pause Time | Coll Count | Skip Count |
+------------------------+-------+------------+-----------+-------+------------+------------+------------+------------+
| orcl:ORA-00600:user1   |     0 | null       | null      |     3 |         60 |        120 |          3 |          2 |
'------------------------+-------+------------+-----------+-------+------------+------------+------------+------------'
To update flood control details:
# tfactl floodcontrol update -event orcl:ORA-00600:user1 -limit 10 -limittime 90 -pausetime 180
Successfully updated Flood Control Event

# tfactl floodcontrol print -event orcl:ORA-00600:user1
.----------------------------------------------------------------------------------------------------------------------------------------------------------.
| Event                  | Count | Start Date                   | Last Date                    | Limit | Limit Time | Pause Time | Coll Count | Skip Count |
+------------------------+-------+------------------------------+------------------------------+-------+------------+------------+------------+------------+
| orcl:ORA-00600:user1   |     1 | Thu May 21 09:18:25 UTC 2020 | Thu May 21 09:18:25 UTC 2020 |    10 |         90 |        180 |          4 |          2 |
'------------------------+-------+------------------------------+------------------------------+-------+------------+------------+------------+------------'
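To spot events whose repeat collections are being suppressed, you can filter the floodcontrol print output, for example with awk. This sketch assumes the column layout shown in the output above (Event in the first column, Skip Count in the last); the function name skipped_events is hypothetical, and the parsing needs adjusting if the table format differs in your release.

```shell
#!/usr/bin/env bash
# Sketch: list flood-controlled events that have a non-zero Skip Count.
# Reads "tfactl floodcontrol print" output on stdin. Column positions follow
# the sample output: $2 = Event, $10 = Skip Count (fields split on '|').
skipped_events() {
  awk -F'|' '
    $2 ~ /:/ {                      # data rows: event names contain ":"
      event = $2
      gsub(/^ +| +$/, "", event)    # trim padding around the event name
      skip = $10 + 0                # Skip Count column as a number
      if (skip > 0) printf "%s skipped=%d\n", event, skip
    }'
}
```

For example, tfactl floodcontrol print | skipped_events applied to the first print output above would report orcl:ORA-00600:user2 skipped=2.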
