Oracle Solaris Cluster Data Service for Oracle Guide     Oracle Solaris Cluster

Customizing the HA for Oracle Server Fault Monitor

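Customizing the HA for Oracle server fault monitor enables you to modify the behavior of the server fault monitor as follows:

  - Overriding the preset action for an error
  - Specifying an action for an error for which no action is preset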


Caution - Before you customize the HA for Oracle server fault monitor, consider the effects of your customizations, especially if you change an action from restart or switch over to ignore or stop monitoring. If errors remain uncorrected for long periods, the errors might cause problems with the database. If you encounter problems with the database after customizing the HA for Oracle server fault monitor, revert to using the preset actions. Reverting to the preset actions enables you to determine if the problem is caused by your customizations.


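The following sections describe the activities you perform to customize the HA for Oracle server fault monitor:

  - Defining Custom Behavior for Errors
  - Propagating a Custom Action File to All Nodes in a Cluster
  - Specifying the Custom Action File That a Server Fault Monitor Should Use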

Defining Custom Behavior for Errors

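The HA for Oracle server fault monitor detects the following types of errors:

  - DBMS errors that are generated by Oracle
  - Alerts that Oracle logs in the alert log file
  - Timeouts of server fault monitor probes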

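To define custom behavior for these types of errors, create a custom action file. This section contains the following information about custom action files:

  - Custom Action File Format
  - Changing the Response to a DBMS Error
  - Changing the Response to Logged Alerts
  - Changing the Maximum Number of Consecutive Timed-Out Probes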

Custom Action File Format

A custom action file is a plain text file. The file contains one or more entries that define the custom behavior of the HA for Oracle server fault monitor. Each entry defines the custom behavior for a single DBMS error, a single timeout error, or several logged alerts. A maximum of 1024 entries is allowed in a custom action file.


Note - Each entry in a custom action file overrides the preset action for an error, or specifies an action for an error for which no action is preset. Create entries in a custom action file only for the preset actions that you are overriding or for errors for which no action is preset. Do not create entries for actions that you are not changing.


An entry in a custom action file consists of a sequence of keyword-value pairs that are separated by semicolons. Each entry is enclosed in braces.

The format of an entry in a custom action file is as follows:

{
[ERROR_TYPE=DBMS_ERROR|SCAN_LOG|TIMEOUT_ERROR;]
ERROR=error-spec; 
[ACTION=SWITCH|RESTART|STOP|NONE;]
[CONNECTION_STATE=co|di|on|*;]
[NEW_STATE=co|di|on|*;]
[MESSAGE="message-string"]
}

White space may be used between keyword-value pairs and between entries to format the file.

The meaning and permitted values of the keywords in a custom action file are as follows:

ERROR_TYPE

Indicates the type of the error that the server fault monitor has detected. The following values are permitted for this keyword:

DBMS_ERROR

Specifies that the error is a DBMS error.

SCAN_LOG

Specifies that the error is an alert that is logged in the alert log file.

TIMEOUT_ERROR

Specifies that the error is a timeout.

The ERROR_TYPE keyword is optional. If you omit this keyword, the error is assumed to be a DBMS error.

ERROR

Identifies the error. The data type and the meaning of error-spec are determined by the value of the ERROR_TYPE keyword as shown in the following table.

ERROR_TYPE      Data Type                   Meaning
DBMS_ERROR      Integer                     The error number of a DBMS error that is generated by Oracle
SCAN_LOG        Quoted regular expression   A string in an error message that Oracle has logged to the Oracle alert log file
TIMEOUT_ERROR   Integer                     The number of consecutive timed-out probes since the server fault monitor was last started or restarted

You must specify the ERROR keyword. If you omit this keyword, the entry in the custom action file is ignored.

ACTION

Specifies the action that the server fault monitor is to perform in response to the error. The following values are permitted for this keyword:

NONE

Specifies that the server fault monitor ignores the error.

STOP

Specifies that the server fault monitor is stopped.

RESTART

Specifies that the server fault monitor stops and restarts the entity that is specified by the value of the Restart_type extension property of the SUNW.oracle_server resource.

SWITCH

Specifies that the server fault monitor switches over the database server resource group to another node or zone.

The ACTION keyword is optional. If you omit this keyword, the server fault monitor ignores the error.

CONNECTION_STATE

Specifies the required state of the connection between the database and the server fault monitor when the error is detected. The entry applies only if the connection is in the required state when the error is detected. The following values are permitted for this keyword:

*

Specifies that the entry always applies, regardless of the state of the connection.

co

Specifies that the entry applies only if the server fault monitor is attempting to connect to the database.

on

Specifies that the entry applies only if the server fault monitor is online. The server fault monitor is online if it is connected to the database.

di

Specifies that the entry applies only if the server fault monitor is disconnecting from the database.

The CONNECTION_STATE keyword is optional. If you omit this keyword, the entry always applies, regardless of the state of the connection.

NEW_STATE

Specifies the state of the connection between the database and the server fault monitor that the server fault monitor must attain after the error is detected. The following values are permitted for this keyword:

*

Specifies that the state of the connection must remain unchanged.

co

Specifies that the server fault monitor must disconnect from the database and reconnect immediately to the database.

di

Specifies that the server fault monitor must disconnect from the database. The server fault monitor reconnects when it next probes the database.

The NEW_STATE keyword is optional. If you omit this keyword, the state of the database connection remains unchanged after the error is detected.

MESSAGE

Specifies an additional message that is printed to the resource's log file when this error is detected. The message must be enclosed in double quotes. This message is additional to the standard message that is defined for the error.

The MESSAGE keyword is optional. If you omit this keyword, no additional message is printed to the resource's log file when this error is detected.
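
The format rules above can be checked mechanically before a file is handed to the fault monitor. The following Python sketch is not part of HA for Oracle; it is a minimal illustration that parses brace-enclosed entries and checks each keyword against the permitted values listed in this section. The function names parse_entries and validate are hypothetical. The sketch assumes that a quoted value never contains a closing brace or a double quote, and it compares ERROR_TYPE and ACTION without regard to case, which is an assumption rather than documented behavior.

```python
import re

# Permitted values, taken from the keyword descriptions in this section.
PERMITTED = {
    "ERROR_TYPE": {"DBMS_ERROR", "SCAN_LOG", "TIMEOUT_ERROR"},
    "ACTION": {"NONE", "STOP", "RESTART", "SWITCH"},
    "CONNECTION_STATE": {"co", "di", "on", "*"},
    "NEW_STATE": {"co", "di", "on", "*"},
}
MAX_ENTRIES = 1024  # documented limit on entries per file

# One keyword-value pair: KEY=value or KEY="quoted value".
PAIR = re.compile(r'(\w+)\s*=\s*(?:"([^"]*)"|([^;\s]+))')


def parse_entries(text):
    """Return one {keyword: value} dict per brace-enclosed entry."""
    entries = []
    # Assumption: no "}" appears inside a quoted value.
    for body in re.findall(r"\{(.*?)\}", text, flags=re.DOTALL):
        entry = {}
        for match in PAIR.finditer(body):
            key, quoted, bare = match.groups()
            entry[key] = quoted if quoted is not None else bare
        entries.append(entry)
    return entries


def validate(entry):
    """Return a list of problems found in one parsed entry (empty if none)."""
    problems = []
    if "ERROR" not in entry:
        problems.append("ERROR is missing, so the entry would be ignored")
    for key, allowed in PERMITTED.items():
        if key not in entry:
            continue  # every keyword except ERROR is optional
        value = entry[key]
        if key in ("ERROR_TYPE", "ACTION"):
            value = value.upper()  # assumption: case is not significant
        if value not in allowed:
            problems.append(f"{key}={entry[key]} is not a permitted value")
    # ERROR_TYPE defaults to DBMS_ERROR when it is omitted.
    error_type = entry.get("ERROR_TYPE", "DBMS_ERROR").upper()
    if "ERROR" in entry and error_type in ("DBMS_ERROR", "TIMEOUT_ERROR"):
        if not entry["ERROR"].isdigit():
            problems.append(f"ERROR must be an integer for {error_type}")
    return problems


def validate_file(text):
    """Return {entry index: problems} for every entry that has problems."""
    entries = parse_entries(text)
    report = {}
    if len(entries) > MAX_ENTRIES:
        report[-1] = [f"more than {MAX_ENTRIES} entries"]
    for index, entry in enumerate(entries):
        problems = validate(entry)
        if problems:
            report[index] = problems
    return report
```

A check of this kind complements, but does not replace, the validation that occurs when the file is specified to the fault monitor.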

Changing the Response to a DBMS Error

The action that the server fault monitor performs in response to each DBMS error is preset as listed in Table B-1. To decide whether to change the response to a DBMS error, consider the effect of that error on your database and whether the preset action is appropriate. For examples, see the subsections that follow:

  - Responding to an Error Whose Effects Are Major
  - Ignoring an Error Whose Effects Are Minor

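To change the response to a DBMS error, create an entry in a custom action file in which the keywords are set as follows:

  - ERROR_TYPE is set to DBMS_ERROR.
  - ERROR is set to the error number of the DBMS error.
  - ACTION is set to the response that you require.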

Responding to an Error Whose Effects Are Major

If an error that the server fault monitor ignores affects more than one session, action by the server fault monitor might be required to prevent a loss of service.

For example, no action is preset for Oracle error 4031: unable to allocate num-bytes bytes of shared memory. However, this Oracle error indicates that the shared global area (SGA) has insufficient memory, is badly fragmented, or both states apply. If this error affects only a single session, ignoring the error might be appropriate. However, if this error affects more than one session, consider specifying that the server fault monitor restart the database.

The following example shows an entry in a custom action file for changing the response to a DBMS error to restart.

Example 1-4 Changing the Response to a DBMS Error to Restart

{
ERROR_TYPE=DBMS_ERROR;
ERROR=4031; 
ACTION=RESTART;
CONNECTION_STATE=*; 
NEW_STATE=*;
MESSAGE="Insufficient memory in shared pool.";
}

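This example shows an entry in a custom action file that overrides the preset action for DBMS error 4031. This entry specifies the following behavior:

  - In response to DBMS error 4031, the server fault monitor performs a restart.
  - The entry applies regardless of the state of the connection between the database and the server fault monitor when the error is detected.
  - The state of the connection between the database and the server fault monitor remains unchanged after the error is detected.
  - The following message is printed to the resource's log file when this error is detected: Insufficient memory in shared pool.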

Ignoring an Error Whose Effects Are Minor

If the effects of an error to which the server fault monitor responds are minor, ignoring the error might be less disruptive than responding to the error.

For example, the preset action for Oracle error 4030: out of process memory when trying to allocate num-bytes bytes is restart. This Oracle error indicates that the server fault monitor could not allocate private heap memory. One possible cause of this error is that insufficient memory is available to the operating system. If this error affects more than one session, restarting the database might be appropriate. However, this error might not affect other sessions because these sessions do not require further private memory. In this situation, consider specifying that the server fault monitor ignore the error.

The following example shows an entry in a custom action file for ignoring a DBMS error.

Example 1-5 Ignoring a DBMS Error

{
ERROR_TYPE=DBMS_ERROR;
ERROR=4030;
ACTION=NONE;
CONNECTION_STATE=*;
NEW_STATE=*;
MESSAGE="";
}

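This example shows an entry in a custom action file that overrides the preset action for DBMS error 4030. This entry specifies the following behavior:

  - The server fault monitor ignores DBMS error 4030.
  - The entry applies regardless of the state of the connection between the database and the server fault monitor when the error is detected.
  - The state of the connection between the database and the server fault monitor remains unchanged after the error is detected.
  - No additional message is printed to the resource's log file when this error is detected.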

Changing the Response to Logged Alerts

The Oracle software logs alerts in a file that is identified by the alert_log_file extension property. The server fault monitor scans this file and performs actions in response to alerts for which an action is defined.

Logged alerts for which an action is preset are listed in Table B-2. Change the response to logged alerts to change the preset action, or to define new alerts to which the server fault monitor responds.

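To change the response to logged alerts, create an entry in a custom action file in which the keywords are set as follows:

  - ERROR_TYPE is set to SCAN_LOG.
  - ERROR is set to a quoted regular expression that identifies a string in an error message that Oracle has logged to the Oracle alert log file.
  - ACTION is set to the response that you require.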

The server fault monitor processes the entries in a custom action file in the order in which the entries occur. Only the first entry that matches a logged alert is processed. Later entries that match are ignored. If you are using regular expressions to specify actions for several logged alerts, ensure that more specific entries occur before more general entries. Specific entries that occur after general entries might be ignored.

For example, a custom action file might define different actions for errors that are identified by the regular expressions ORA-65 and ORA-6. To ensure that the entry that contains the regular expression ORA-65 is not ignored, ensure that this entry occurs before the entry that contains the regular expression ORA-6.
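
The effect of entry order can be illustrated with a short Python sketch. This is not the fault monitor's implementation. It assumes that an expression matches when it is found anywhere in the alert text, and the actions and the alert text are hypothetical values chosen only for illustration.

```python
import re


def first_match(entries, alert_line):
    """Return the first (pattern, action) pair whose pattern matches, or None."""
    for pattern, action in entries:
        # Assumption: a pattern matches if it occurs anywhere in the alert.
        if re.search(pattern, alert_line):
            return pattern, action
    return None


# Hypothetical actions; the text above names only the two expressions.
specific_first = [("ORA-65", "RESTART"), ("ORA-6", "NONE")]
general_first = [("ORA-6", "NONE"), ("ORA-65", "RESTART")]

alert = "ORA-65: hypothetical alert text"

print(first_match(specific_first, alert))  # ('ORA-65', 'RESTART')
print(first_match(general_first, alert))   # ('ORA-6', 'NONE')
```

With the general expression first, the ORA-65 entry is never reached, because every string that contains ORA-65 also contains ORA-6.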

The following example shows an entry in a custom action file for changing the response to a logged alert.

Example 1-6 Changing the Response to a Logged Alert

{
ERROR_TYPE=SCAN_LOG;
ERROR="ORA-00600: internal error";
ACTION=RESTART;
}

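This example shows an entry in a custom action file that overrides the preset action for logged alerts about internal errors. This entry specifies the following behavior:

  - In response to logged alerts that contain the text ORA-00600: internal error, the server fault monitor performs a restart.
  - Because CONNECTION_STATE is omitted, the entry applies regardless of the state of the connection between the database and the server fault monitor when the error is detected.
  - Because NEW_STATE is omitted, the state of the connection remains unchanged after the error is detected.
  - Because MESSAGE is omitted, no additional message is printed to the resource's log file when this error is detected.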

Changing the Maximum Number of Consecutive Timed-Out Probes

By default, the server fault monitor restarts the database after the second consecutive timed-out probe. If the database is lightly loaded, two consecutive timed-out probes should be sufficient to indicate that the database is hanging. However, during periods of heavy load, a server fault monitor probe might time out even if the database is functioning correctly. To prevent the server fault monitor from restarting the database unnecessarily, increase the maximum number of consecutive timed-out probes.


Caution - Increasing the maximum number of consecutive timed-out probes increases the time that is required to detect that the database is hanging.


To change the maximum number of consecutive timed-out probes allowed, create one entry in a custom action file for each consecutive timed-out probe that is allowed except the first timed-out probe.


Note - You are not required to create an entry for the first timed-out probe. The action that the server fault monitor performs in response to the first timed-out probe is preset.


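For the last allowed timed-out probe, create an entry in which the keywords are set as follows:

  - ERROR_TYPE is set to TIMEOUT_ERROR.
  - ERROR is set to the maximum number of consecutive timed-out probes that are allowed.
  - ACTION is set to RESTART.

For each remaining consecutive timed-out probe except the first timed-out probe, create an entry in which the keywords are set as follows:

  - ERROR_TYPE is set to TIMEOUT_ERROR.
  - ERROR is set to the sequence number of the timed-out probe. For example, for the second consecutive timed-out probe, set this keyword to 2.
  - ACTION is set to NONE.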


Tip - To facilitate debugging, specify a message that indicates the sequence number of the timed-out probe.


The following example shows the entries in a custom action file for increasing the maximum number of consecutive timed-out probes to five.

Example 1-7 Changing the Maximum Number of Consecutive Timed-Out Probes

{
ERROR_TYPE=TIMEOUT_ERROR;
ERROR=2;
ACTION=NONE;
CONNECTION_STATE=*;
NEW_STATE=*;
MESSAGE="Timeout #2 has occurred.";
}

{
ERROR_TYPE=TIMEOUT_ERROR;
ERROR=3;
ACTION=NONE;
CONNECTION_STATE=*;
NEW_STATE=*;
MESSAGE="Timeout #3 has occurred.";
}

{
ERROR_TYPE=TIMEOUT_ERROR;
ERROR=4;
ACTION=NONE;
CONNECTION_STATE=*;
NEW_STATE=*;
MESSAGE="Timeout #4 has occurred.";
}

{
ERROR_TYPE=TIMEOUT_ERROR;
ERROR=5;
ACTION=RESTART;
CONNECTION_STATE=*;
NEW_STATE=*;
MESSAGE="Timeout #5 has occurred. Restarting.";
}

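This example shows the entries in a custom action file for increasing the maximum number of consecutive timed-out probes to five. These entries specify the following behavior:

  - The server fault monitor ignores the second through the fourth consecutive timed-out probes and prints the corresponding message to the resource's log file.
  - In response to the fifth consecutive timed-out probe, the server fault monitor performs a restart and prints the message Timeout #5 has occurred. Restarting. to the resource's log file.
  - The entries apply regardless of the state of the connection between the database and the server fault monitor when the timeout occurs.
  - The state of the connection between the database and the server fault monitor remains unchanged after the timeout occurs.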
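
Because the entries follow a fixed pattern, they can be generated rather than typed by hand. The following Python sketch is a hypothetical helper, not a tool that ships with HA for Oracle. It writes one entry for each timed-out probe from the second through the chosen maximum, with NONE for every probe except the last and RESTART for the last. It uses the TIMEOUT_ERROR value given in the format description, and the message wording simply mirrors the example.

```python
def timeout_entries(max_probes):
    """Build custom action file text that allows max_probes consecutive timeouts."""
    if max_probes < 2:
        # The first timed-out probe has a preset action and needs no entry.
        raise ValueError("max_probes must be at least 2")
    blocks = []
    for n in range(2, max_probes + 1):
        last = n == max_probes
        action = "RESTART" if last else "NONE"
        message = f"Timeout #{n} has occurred." + (" Restarting." if last else "")
        blocks.append(
            "{\n"
            "ERROR_TYPE=TIMEOUT_ERROR;\n"
            f"ERROR={n};\n"
            f"ACTION={action};\n"
            "CONNECTION_STATE=*;\n"
            "NEW_STATE=*;\n"
            f'MESSAGE="{message}";\n'
            "}\n"
        )
    # Separate entries with a blank line, as in the example.
    return "\n".join(blocks)


print(timeout_entries(5))
```

Calling timeout_entries(5) produces four entries, for probes 2 through 5, in the same layout as the example.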

Propagating a Custom Action File to All Nodes in a Cluster

A server fault monitor must behave consistently on all cluster nodes or zones. Therefore, the custom action file that the server fault monitor uses must be identical on all cluster nodes or zones. After you create or modify a custom action file, propagate the file to all cluster nodes or zones by using the method that is most appropriate for your cluster configuration.
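
One way to confirm that the copies are identical is to compute a digest of the file on each node or zone and compare the results. The following Python sketch is only an illustration of that check. The function name file_digest is hypothetical, and how the digests are collected from the other nodes is left to your own tooling.

```python
import hashlib


def file_digest(path):
    """Return the SHA-256 hex digest of the file at path."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read in chunks so that large files are not loaded into memory at once.
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

If the digest differs on any node or zone, propagate the file again before you specify it to the server fault monitor.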

Specifying the Custom Action File That a Server Fault Monitor Should Use

To apply customized actions to a server fault monitor, you must specify the custom action file that the fault monitor should use. Customized actions are applied to a server fault monitor when the server fault monitor reads a custom action file. A server fault monitor reads a custom action file when you specify the file.

Specifying a custom action file also validates the file. If the file contains syntax errors, an error message is displayed. Therefore, after modifying a custom action file, specify the file again to validate the file.


Caution - If syntax errors in a modified custom action file are detected, correct the errors before the fault monitor is restarted. If the syntax errors remain uncorrected when the fault monitor is restarted, the fault monitor reads the erroneous file, ignoring entries that occur after the first syntax error.


How to Specify the Custom Action File That a Server Fault Monitor Should Use

  1. On a cluster node, become superuser or assume a role that provides solaris.cluster.modify RBAC authorization.
  2. Set the Custom_action_file extension property of the SUNW.oracle_server resource.

    Set this property to the absolute path of the custom action file.

    # clresource set -p custom_action_file=filepath server-resource
    -p custom_action_file=filepath

    Specifies the absolute path of the custom action file.

    server-resource

    Specifies the SUNW.oracle_server resource.