Customizing the HA for Oracle Server Fault Monitor
Customizing the HA for Oracle server fault monitor enables you to modify the behavior of the server fault monitor as follows:
Overriding the preset action for an error
Specifying an action for an error for which no action is preset
Customizing the HA for Oracle server fault monitor involves the following activities:
Defining custom behavior for errors
Propagating a custom action file to all nodes or zones in a cluster
Specifying the custom action file that a server fault monitor should use
The HA for Oracle server fault monitor detects the following types of errors:
DBMS errors that occur during a probe of the database by the server fault monitor
Alerts that Oracle logs in the alert log file
Timeouts that result from a failure to receive a response within the time that is set by the Probe_timeout extension property
To define custom behavior for these types of errors, create a custom action file.
A custom action file is a plain text file. The file contains one or more entries that define the custom behavior of the HA for Oracle server fault monitor. Each entry defines the custom behavior for a single DBMS error, a single timeout error, or several logged alerts. A maximum of 1024 entries is allowed in a custom action file.
Note - Each entry in a custom action file overrides the preset action for an error, or specifies an action for an error for which no action is preset. Create entries in a custom action file only for the preset actions that you are overriding or for errors for which no action is preset. Do not create entries for actions that you are not changing.
An entry in a custom action file consists of a sequence of keyword-value pairs that are separated by semicolons. Each entry is enclosed in braces.
The format of an entry in a custom action file is as follows:
{
[ERROR_TYPE=DBMS_ERROR|SCAN_LOG|TIMEOUT_ERROR;]
ERROR=error-spec;
[ACTION=SWITCH|RESTART|STOP|NONE;]
[CONNECTION_STATE=co|di|on|*;]
[NEW_STATE=co|di|on|*;]
[MESSAGE="message-string"]
}
White space may be used between keyword-value pairs and between entries to format the file.
The meaning and permitted values of the keywords in a custom action file are as follows:
ERROR_TYPE - Indicates the type of the error that the server fault monitor has detected. The following values are permitted for this keyword:
DBMS_ERROR - Specifies that the error is a DBMS error.
SCAN_LOG - Specifies that the error is an alert that is logged in the alert log file.
TIMEOUT_ERROR - Specifies that the error is a timeout.
The ERROR_TYPE keyword is optional. If you omit this keyword, the error is assumed to be a DBMS error.
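ERROR - Identifies the error. The data type and the meaning of error-spec are determined by the value of the ERROR_TYPE keyword, as follows:
DBMS_ERROR - Integer. The error number of the DBMS error.
SCAN_LOG - Quoted regular expression. A string in an error message that Oracle has logged to the Oracle alert log file.
TIMEOUT_ERROR - Integer. The sequence number of a consecutive timed-out probe.
You must specify the ERROR keyword. If you omit this keyword, the entry in the custom action file is ignored.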
ACTION - Specifies the action that the server fault monitor is to perform in response to the error. The following values are permitted for this keyword:
NONE - Specifies that the server fault monitor ignores the error.
STOP - Specifies that the server fault monitor is stopped.
RESTART - Specifies that the server fault monitor stops and restarts the entity that is specified by the value of the Restart_type extension property of the SUNW.oracle_server resource.
SWITCH - Specifies that the server fault monitor switches over the database server resource group to another node or zone.
The ACTION keyword is optional. If you omit this keyword, the server fault monitor ignores the error.
CONNECTION_STATE - Specifies the required state of the connection between the database and the server fault monitor when the error is detected. The entry applies only if the connection is in the required state when the error is detected. The following values are permitted for this keyword:
* - Specifies that the entry always applies, regardless of the state of the connection.
co - Specifies that the entry applies only if the server fault monitor is attempting to connect to the database.
on - Specifies that the entry applies only if the server fault monitor is online. The server fault monitor is online if it is connected to the database.
di - Specifies that the entry applies only if the server fault monitor is disconnecting from the database.
The CONNECTION_STATE keyword is optional. If you omit this keyword, the entry always applies, regardless of the state of the connection.
NEW_STATE - Specifies the state of the connection between the database and the server fault monitor that the server fault monitor must attain after the error is detected. The following values are permitted for this keyword:
* - Specifies that the state of the connection must remain unchanged.
co - Specifies that the server fault monitor must disconnect from the database and reconnect immediately to the database.
di - Specifies that the server fault monitor must disconnect from the database. The server fault monitor reconnects when it next probes the database.
The NEW_STATE keyword is optional. If you omit this keyword, the state of the database connection remains unchanged after the error is detected.
MESSAGE - Specifies an additional message that is printed to the resource's log file when this error is detected. The message must be enclosed in double quotes. This message is additional to the standard message that is defined for the error.
The MESSAGE keyword is optional. If you omit this keyword, no additional message is printed to the resource's log file when this error is detected.
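The keyword rules described above can be illustrated with a short Python sketch that reads entries and applies the documented defaults. This is not the parser that the server fault monitor uses. The function name parse_entries, the regular expressions, and the error handling are assumptions made for illustration, and the sketch does not support braces inside quoted values.

```python
import re

# Defaults for optional keywords, as described in the text above.
DEFAULTS = {
    "ERROR_TYPE": "DBMS_ERROR",
    "ACTION": "NONE",
    "CONNECTION_STATE": "*",
    "NEW_STATE": "*",
}
KEYWORDS = set(DEFAULTS) | {"ERROR", "MESSAGE"}
MAX_ENTRIES = 1024  # maximum number of entries allowed in one file

# One pair: KEYWORD=value, where value is a double-quoted string or a bare token.
PAIR = re.compile(r'\s*([A-Z_]+)\s*=\s*("[^"]*"|[^;\s]+)\s*;?\s*')


def parse_entries(text):
    """Parse brace-enclosed entries into dicts (hypothetical helper)."""
    entries = []
    for body in re.findall(r"\{(.*?)\}", text, flags=re.S):
        body = body.strip()
        entry = dict(DEFAULTS)
        pos = 0
        while pos < len(body):
            match = PAIR.match(body, pos)
            if match is None or match.group(1) not in KEYWORDS:
                raise ValueError(f"syntax error near {body[pos:pos + 20]!r}")
            value = match.group(2)
            if value.startswith('"'):
                value = value[1:-1]  # drop the surrounding double quotes
            entry[match.group(1)] = value
            pos = match.end()
        if "ERROR" not in entry:
            continue  # an entry without the ERROR keyword is ignored
        entries.append(entry)
    if len(entries) > MAX_ENTRIES:
        raise ValueError("more than 1024 entries")
    return entries


sample = '{ ERROR=4031; ACTION=RESTART; MESSAGE="Insufficient memory in shared pool."; }'
print(parse_entries(sample))
```

In the sample entry, ERROR_TYPE, CONNECTION_STATE, and NEW_STATE are omitted, so the sketch fills in DBMS_ERROR, *, and * respectively.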
The action that the server fault monitor performs in response to each DBMS error is preset as listed in Table 1. To determine whether you need to change the response to a DBMS error, consider the effect of that error on your database and decide whether the preset action is appropriate. The examples that follow describe an error whose effects are major and an error whose effects are minor.
To change the response to a DBMS error, create an entry in a custom action file in which the keywords are set as follows:
ERROR_TYPE is set to DBMS_ERROR.
ERROR is set to the error number of the DBMS error.
ACTION is set to the action that you require.
If an error that the server fault monitor ignores affects more than one session, action by the server fault monitor might be required to prevent a loss of service.
For example, no action is preset for Oracle error 4031: unable to allocate num-bytes bytes of shared memory. However, this Oracle error indicates that the shared global area (SGA) has insufficient memory, is badly fragmented, or both. If this error affects only a single session, ignoring the error might be appropriate. However, if this error affects more than one session, consider specifying that the server fault monitor restart the database.
The following example shows an entry in a custom action file for changing the response to a DBMS error to restart.
Example 4 Changing the Response to a DBMS Error to Restart
{
ERROR_TYPE=DBMS_ERROR;
ERROR=4031;
ACTION=restart;
CONNECTION_STATE=*;
NEW_STATE=*;
MESSAGE="Insufficient memory in shared pool.";
}
This example shows an entry in a custom action file that overrides the preset action for DBMS error 4031. This entry specifies the following behavior:
In response to DBMS error 4031, the action that the server fault monitor performs is restart.
This entry applies regardless of the state of the connection between the database and the server fault monitor when the error is detected.
The state of the connection between the database and the server fault monitor must remain unchanged after the error is detected.
The following message is printed to the resource's log file when this error is detected:
Insufficient memory in shared pool.
If the effects of an error to which the server fault monitor responds are minor, ignoring the error might be less disruptive than responding to the error.
For example, the preset action for Oracle error 4030: out of process memory when trying to allocate num-bytes bytes is restart. This Oracle error indicates that the server fault monitor could not allocate private heap memory. One possible cause of this error is that insufficient memory is available to the operating system. If this error affects more than one session, restarting the database might be appropriate. However, this error might not affect other sessions because these sessions do not require further private memory. In this situation, consider specifying that the server fault monitor ignore the error.
The following example shows an entry in a custom action file for ignoring a DBMS error.
Example 5 Ignoring a DBMS Error
{
ERROR_TYPE=DBMS_ERROR;
ERROR=4030;
ACTION=none;
CONNECTION_STATE=*;
NEW_STATE=*;
MESSAGE="";
}
This example shows an entry in a custom action file that overrides the preset action for DBMS error 4030. This entry specifies the following behavior:
The server fault monitor ignores DBMS error 4030.
This entry applies regardless of the state of the connection between the database and the server fault monitor when the error is detected.
The state of the connection between the database and the server fault monitor must remain unchanged after the error is detected.
No additional message is printed to the resource's log file when this error is detected.
The Oracle software logs alerts in a file that is identified by the alert_log_file extension property. The server fault monitor scans this file and performs actions in response to alerts for which an action is defined.
Logged alerts for which an action is preset are listed in Table 2. Change the response to logged alerts to change the preset action, or to define new alerts to which the server fault monitor responds.
To change the response to logged alerts, create an entry in a custom action file in which the keywords are set as follows:
ERROR_TYPE is set to SCAN_LOG.
ERROR is set to a quoted regular expression that identifies a string in an error message that Oracle has logged to the Oracle alert log file.
ACTION is set to the action that you require.
The server fault monitor processes the entries in a custom action file in the order in which the entries occur. Only the first entry that matches a logged alert is processed. Later entries that match are ignored. If you are using regular expressions to specify actions for several logged alerts, ensure that more specific entries occur before more general entries. Specific entries that occur after general entries might be ignored.
For example, a custom action file might define different actions for errors that are identified by the regular expressions ORA-65 and ORA-6. To ensure that the entry that contains the regular expression ORA-65 is not ignored, ensure that this entry occurs before the entry that contains the regular expression ORA-6.
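The effect of entry order can be illustrated with a short Python sketch of first-match processing. The function first_match, the sample alert line, and the actions assigned to each pattern are hypothetical. Python's regular expression syntax might also differ from the syntax that the server fault monitor accepts, so the sketch illustrates only the ordering rule.

```python
import re


def first_match(patterns, alert_line):
    """Return the action of the first pattern found in the line, else None."""
    for pattern, action in patterns:
        if re.search(pattern, alert_line):
            return action
    return None


alert = "ORA-65: example alert text"  # made-up alert line for illustration

# The general pattern ORA-6 also matches any line that contains ORA-65.
general_first = [("ORA-6", "NONE"), ("ORA-65", "RESTART")]
specific_first = [("ORA-65", "RESTART"), ("ORA-6", "NONE")]

print(first_match(general_first, alert))   # the ORA-65 entry is never reached
print(first_match(specific_first, alert))  # the ORA-65 entry is applied
```

With the general entry first, the sketch returns NONE for the ORA-65 alert. With the specific entry first, it returns RESTART.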
The following example shows an entry in a custom action file for changing the response to a logged alert.
Example 6 Changing the Response to a Logged Alert
{
ERROR_TYPE=SCAN_LOG;
ERROR="ORA-00600: internal error";
ACTION=RESTART;
}
This example shows an entry in a custom action file that overrides the preset action for logged alerts about internal errors. This entry specifies the following behavior:
In response to logged alerts that contain the text ORA-00600: internal error, the action that the server fault monitor performs is restart.
This entry applies regardless of the state of the connection between the database and the server fault monitor when the error is detected.
The state of the connection between the database and the server fault monitor must remain unchanged after the error is detected.
No additional message is printed to the resource's log file when this error is detected.
By default, the server fault monitor restarts the database after the second consecutive timed-out probe. If the database is lightly loaded, two consecutive timed-out probes should be sufficient to indicate that the database is hanging. However, during periods of heavy load, a server fault monitor probe might time out even if the database is functioning correctly. To prevent the server fault monitor from restarting the database unnecessarily, increase the maximum number of consecutive timed-out probes.
Caution - Increasing the maximum number of consecutive timed-out probes increases the time that is required to detect that the database is hanging.
To change the maximum number of consecutive timed-out probes allowed, create one entry in a custom action file for each consecutive timed-out probe that is allowed except the first timed-out probe.
Note - You are not required to create an entry for the first timed-out probe. The action that the server fault monitor performs in response to the first timed-out probe is preset.
For the last allowed timed-out probe, create an entry in which the keywords are set as follows:
ERROR_TYPE is set to TIMEOUT_ERROR.
ERROR is set to the maximum number of consecutive timed-out probes that are allowed.
ACTION is set to RESTART.
For each remaining consecutive timed-out probe except the first timed-out probe, create an entry in which the keywords are set as follows:
ERROR_TYPE is set to TIMEOUT_ERROR.
ERROR is set to the sequence number of the timed-out probe. For example, for the second consecutive timed-out probe, set this keyword to 2. For the third consecutive timed-out probe, set this keyword to 3.
ACTION is set to NONE.
Tip - To facilitate debugging, specify a message that indicates the sequence number of the timed-out probe.
The following example shows the entries in a custom action file for increasing the maximum number of consecutive timed-out probes to five.
Example 7 Changing the Maximum Number of Consecutive Timed-Out Probes
{
ERROR_TYPE=TIMEOUT_ERROR;
ERROR=2;
ACTION=NONE;
CONNECTION_STATE=*;
NEW_STATE=*;
MESSAGE="Timeout #2 has occurred.";
}
{
ERROR_TYPE=TIMEOUT_ERROR;
ERROR=3;
ACTION=NONE;
CONNECTION_STATE=*;
NEW_STATE=*;
MESSAGE="Timeout #3 has occurred.";
}
{
ERROR_TYPE=TIMEOUT_ERROR;
ERROR=4;
ACTION=NONE;
CONNECTION_STATE=*;
NEW_STATE=*;
MESSAGE="Timeout #4 has occurred.";
}
{
ERROR_TYPE=TIMEOUT_ERROR;
ERROR=5;
ACTION=RESTART;
CONNECTION_STATE=*;
NEW_STATE=*;
MESSAGE="Timeout #5 has occurred. Restarting.";
}
This example shows the entries in a custom action file for increasing the maximum number of consecutive timed-out probes to five. These entries specify the following behavior:
The server fault monitor ignores the second consecutive timed-out probe through the fourth consecutive timed-out probe.
In response to the fifth consecutive timed-out probe, the action that the server fault monitor performs is restart.
The entries apply regardless of the state of the connection between the database and the server fault monitor when the timeout occurs.
The state of the connection between the database and the server fault monitor must remain unchanged after the timeout occurs.
When the second consecutive timed-out probe through the fourth consecutive timed-out probe occurs, a message of the following form is printed to the resource's log file:
Timeout #number has occurred.
When the fifth consecutive timed-out probe occurs, the following message is printed to the resource's log file:
Timeout #5 has occurred. Restarting.
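Because the entries for timed-out probes follow a regular pattern, they can be generated rather than typed by hand. The following Python sketch is one possible approach. The function name timeout_entries is hypothetical, and the sketch uses the TIMEOUT_ERROR value that is given in the entry format.

```python
def timeout_entries(max_timeouts):
    """Build entries for probes 2 through max_timeouts (hypothetical helper)."""
    if max_timeouts < 2:
        raise ValueError("max_timeouts must be at least 2")
    entries = []
    for n in range(2, max_timeouts + 1):
        last = n == max_timeouts
        # Only the last allowed timed-out probe triggers a restart.
        action = "RESTART" if last else "NONE"
        message = f"Timeout #{n} has occurred." + (" Restarting." if last else "")
        entries.append(
            "{\n"
            "ERROR_TYPE=TIMEOUT_ERROR;\n"
            f"ERROR={n};\n"
            f"ACTION={action};\n"
            "CONNECTION_STATE=*;\n"
            "NEW_STATE=*;\n"
            f'MESSAGE="{message}";\n'
            "}"
        )
    return "\n".join(entries)


print(timeout_entries(5))
```

No entry is generated for the first timed-out probe, because the response to that probe is preset.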
A server fault monitor must behave consistently on all cluster nodes or zones. Therefore, the custom action file that the server fault monitor uses must be identical on all cluster nodes or zones. After creating or modifying a custom action file, ensure that this file is identical on all cluster nodes or zones by propagating the file to all cluster nodes or zones. To propagate the file to all cluster nodes or zones, use the method that is most appropriate for your cluster configuration:
Locating the file on a file system that all nodes or zones share
Locating the file on a highly available local file system
Copying the file to the local file system of each cluster node or zone by using operating system commands such as the rcp(1) command or the rdist(1) command
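If you copy the file to each node or zone, comparing checksums of the copies is one way to confirm that they are identical. The following Python sketch illustrates the idea. The helper names and the temporary files that stand in for per-node copies are assumptions made for illustration. In practice, you would compute the digest on each node or zone and compare the results.

```python
import hashlib
import tempfile
from pathlib import Path


def file_digest(path):
    """Return the SHA-256 hex digest of a file's contents."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def all_identical(paths):
    """Return True if every file in paths has the same contents."""
    return len({file_digest(p) for p in paths}) <= 1


# Temporary files stand in for the copies held by two cluster nodes.
with tempfile.TemporaryDirectory() as tmp:
    node1_copy = Path(tmp, "node1_action_file")
    node2_copy = Path(tmp, "node2_action_file")
    node1_copy.write_text("{ ERROR=4031; ACTION=RESTART; }\n")
    node2_copy.write_text("{ ERROR=4031; ACTION=RESTART; }\n")
    same = all_identical([node1_copy, node2_copy])

    # Change one copy to show that the difference is detected.
    node2_copy.write_text("{ ERROR=4031; ACTION=NONE; }\n")
    differs = not all_identical([node1_copy, node2_copy])

print(same, differs)
```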
To apply customized actions to a server fault monitor, you must specify the custom action file that the fault monitor should use. Customized actions are applied to a server fault monitor when the server fault monitor reads a custom action file. A server fault monitor reads a custom action file when you specify the file.
Specifying a custom action file also validates the file. If the file contains syntax errors, an error message is displayed. Therefore, after modifying a custom action file, specify the file again to validate the file.
Caution - If syntax errors in a modified custom action file are detected, correct the errors before the fault monitor is restarted. If the syntax errors remain uncorrected when the fault monitor is restarted, the fault monitor reads the erroneous file, ignoring entries that occur after the first syntax error.
Set the custom_action_file extension property of the SUNW.oracle_server resource to the absolute path of the custom action file.
# clresource set -p custom_action_file=filepath server-resource
-p custom_action_file=filepath - Specifies the absolute path of the custom action file.
server-resource - Specifies the SUNW.oracle_server resource.