This chapter contains the following procedures.
How to Configure Oracle Database Access With Solstice DiskSuite
How to Configure Oracle Database Access With VERITAS Volume Manager
How to Install the Sun Cluster HA for Oracle Packages by Using the Web Start Program
How to Install the Sun Cluster HA for Oracle Packages by Using the scinstall Utility
How to Specify the Custom Action File That a Server Fault Monitor Should Use
You can use SunPlex™ Manager to install and configure this data service. See the SunPlex Manager online help for details.
The following table summarizes the tasks for installing and configuring Sun Cluster HA for Oracle. The table also provides cross-references to detailed instructions for performing the tasks. Perform these tasks in the order that they are listed.
Table 1–1 Task Map: Installing and Configuring HA for Oracle
| Task | Cross-Reference |
|---|---|
| Plan the Sun Cluster HA for Oracle installation and configuration | Planning the Sun Cluster HA for Oracle Installation and Configuration |
| Prepare the nodes and disks | |
| Install the Oracle software | |
| Verify the Oracle installation | |
| Create an Oracle database | |
| Set up Oracle database permissions | |
| Install the Sun Cluster HA for Oracle packages | |
| Register and configure Sun Cluster HA for Oracle | |
| Verify the Sun Cluster HA for Oracle installation | |
| Understand Sun Cluster HA for Oracle fault monitor | |
| (Optional) Customize the Sun Cluster HA for Oracle server fault monitor | Customizing the Sun Cluster HA for Oracle Server Fault Monitor |
| (Optional) Upgrade the SUNW.oracle_server resource type | |
This section contains the information that you need to plan your Sun Cluster HA for Oracle installation and configuration.
Your data service configuration might not be supported if you do not adhere to these requirements.
Use the requirements in this section to plan the installation and configuration of Sun Cluster HA for Oracle. These requirements apply to Sun Cluster HA for Oracle only. You must meet these requirements before you proceed with your Sun Cluster HA for Oracle installation and configuration.
For requirements that apply to all data services, see “Identifying Data Service Special Requirements” on page 3.
Oracle application files – These files include Oracle binaries, configuration files, and parameter files. You can install these files on the local file system, the highly available local file system, or the cluster file system.
See “Configuration Guidelines for Sun Cluster Data Services” in Sun Cluster 3.1 Data Service Planning and Administration Guide for the advantages and disadvantages of placing the Oracle binaries on the local file system, highly available local file system, and the cluster file system.
Database-related files – These files include the control file, redo logs, and data files. You must install these files on the highly available local file system or the cluster file system as either raw devices or regular files.
Use the questions in this section to plan the installation and configuration of Sun Cluster HA for Oracle. Write the answers to these questions in the space that is provided on the data service worksheets in “Configuration Worksheets” in Sun Cluster 3.1 Data Service Planning and Administration Guide.
What resource groups will you use for network addresses and application resources and the dependencies between them?
What is the logical hostname (for failover services) or shared address (for scalable services) for clients that will access the data service?
Where will the system configuration files reside?
See “Configuration Guidelines for Sun Cluster Data Services” in Sun Cluster 3.1 Data Service Planning and Administration Guide for the advantages and disadvantages of placing the Oracle binaries on the local file system rather than the cluster file system.
This section contains the procedures that you need to prepare the nodes and disks.
Use this procedure to prepare for the installation and configuration of Oracle software.
Perform all of the steps in this section on all of the nodes. If you do not perform all of the steps on all of the nodes, the Oracle installation is incomplete. An incomplete Oracle installation causes Sun Cluster HA for Oracle to fail during startup.
Consult the Oracle documentation before you perform this procedure.
The following steps prepare your nodes and install the Oracle software.
Become superuser on all of the cluster members.
Configure the /etc/nsswitch.conf files as follows so that the data service starts and stops correctly if a switchover or failover occurs.
On each node that can master the logical host that runs Sun Cluster HA for Oracle, include one of the following entries for group in the /etc/nsswitch.conf file.
group: files
group: files [NOTFOUND=return] nis
group: files [NOTFOUND=return] nisplus
Sun Cluster HA for Oracle uses the su user command to start and stop the database node. The network information name service might become unavailable when a cluster node's public network fails. Adding one of the preceding entries for group ensures that the su(1M) command does not refer to the NIS/NIS+ name services if the network information name service is unavailable.
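The check below is a minimal sketch of how you might confirm that a node's group entry is one of the three accepted forms. The function name check_group_entry is hypothetical and is not part of Sun Cluster. The sketch assumes that the first line beginning with group: is the active entry.

```shell
#!/bin/sh
# check_group_entry [FILE]: compare the "group" entry of an
# nsswitch.conf-style file with the three entries listed above.
# Hypothetical helper for illustration only.
check_group_entry() {
    conf=${1:-/etc/nsswitch.conf}
    # Take the first "group:" line and normalize tabs and repeated spaces.
    entry=$(grep '^group:' "$conf" | head -n 1 |
        tr '\t' ' ' | tr -s ' ' | sed 's/ *$//')
    case "$entry" in
        "group: files"|\
        "group: files [NOTFOUND=return] nis"|\
        "group: files [NOTFOUND=return] nisplus")
            echo "ok: $entry" ;;
        *)
            echo "unexpected: ${entry:-no group entry found}"
            return 1 ;;
    esac
}
```

Run check_group_entry with no argument on each node that can master the logical host. A nonzero exit status means that the entry needs attention.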
Configure the cluster file system for Sun Cluster HA for Oracle.
If raw devices contain the databases, configure the global devices for raw device access. See the Sun Cluster 3.1 10/03 Software Installation Guide for information about how to configure global devices.
If you use the Solstice DiskSuite™/Solaris Volume Manager software, configure the Oracle software to use UNIX file system (UFS) logging on mirrored metadevices or raw-mirrored metadevices. See the Solstice DiskSuite/Solaris Volume Manager documentation for more information about how to configure raw-mirrored metadevices.
Prepare the $ORACLE_HOME directory on a local or multihost disk.
If you install the Oracle binaries on a local disk, use a separate disk if possible. Installing the Oracle binaries on a separate disk prevents the binaries from being overwritten during operating environment reinstallation.
On each node, create an entry for the database administrator (DBA) group in the /etc/group file, and add potential users to the group.
You typically name the DBA group dba. Verify that the root and oracle users are members of the dba group, and add entries as necessary for other DBA users. Ensure that the group IDs are the same on all of the nodes that run Sun Cluster HA for Oracle, as the following example illustrates.
dba:*:520:root,oracle |
You can create group entries in a network name service (for example, NIS or NIS+). If you create group entries in this way, also add your entries to the local /etc/group file to eliminate dependency on the network name service.
On each node, create an entry for the Oracle user ID (oracle).
You typically name the Oracle user ID oracle. The following command updates the /etc/passwd and /etc/shadow files with an entry for the Oracle user ID.
# useradd -u 120 -g dba -d /Oracle-home oracle |
Ensure that the oracle user entry is the same on all of the nodes that run Sun Cluster HA for Oracle.
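The following sketch shows one way to inspect the DBA group entry on a node so that you can compare the results across nodes. The function names group_gid and group_has_member are hypothetical. The sketch assumes the standard colon-separated /etc/group layout that the dba example above uses.

```shell
#!/bin/sh
# group_gid FILE GROUP: print the numeric ID of GROUP from a
# group-format file (name:password:gid:member,member,...).
group_gid() {
    awk -F: -v g="$2" '$1 == g { print $3 }' "$1"
}

# group_has_member FILE GROUP USER: succeed if USER is listed as a
# member of GROUP in FILE.
group_has_member() {
    awk -F: -v g="$2" -v u="$3" '
        $1 == g {
            n = split($4, members, ",")
            for (i = 1; i <= n; i++)
                if (members[i] == u) found = 1
        }
        END { exit(found ? 0 : 1) }' "$1"
}
```

For example, group_gid /etc/group dba should print the same ID on every node, and group_has_member /etc/group dba oracle should succeed on every node.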
Use this procedure to configure the Oracle database with Solstice DiskSuite volume manager.
Configure the disk devices for the Solstice DiskSuite software to use.
See the Sun Cluster 3.1 10/03 Software Installation Guide for information about how to configure the Solstice DiskSuite software.
If you use raw devices to contain the databases, run the following commands to change each raw-mirrored metadevice's owner, group, and mode.
If you do not use raw devices, do not perform this step.
If you create raw devices, run the following commands for each device on each node that can master the Oracle resource group.
# chown oracle /dev/md/metaset/rdsk/dn
# chgrp dba /dev/md/metaset/rdsk/dn
# chmod 600 /dev/md/metaset/rdsk/dn
metaset – Specifies the name of the diskset
dn – Specifies the name of the raw disk device within the metaset diskset
Verify that the changes are effective.
# ls -lL /dev/md/metaset/rdsk/dn |
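To make the verification repeatable, you could feed the ls -lL output to a small check such as the following sketch. The function name check_raw_device_listing is hypothetical. The sketch assumes the usual long-listing layout in which the mode, link count, owner, and group are the first four fields.

```shell
#!/bin/sh
# check_raw_device_listing LINE: succeed if one line of "ls -lL" output
# shows mode 600 (rw-------), owner oracle, and group dba.
check_raw_device_listing() {
    printf '%s\n' "$1" | awk '
        {
            # Skip the leading file-type character, keep the 9 mode bits.
            mode = substr($1, 2, 9)
            if (mode != "rw-------" || $3 != "oracle" || $4 != "dba") {
                print "mismatch: mode=" mode " owner=" $3 " group=" $4
                bad = 1
            }
        }
        END { exit(bad ? 1 : 0) }'
}
```

A typical call would be check_raw_device_listing "$(ls -lL /dev/md/metaset/rdsk/dn)", repeated for each device on each node that can master the Oracle resource group.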
Use this procedure to configure the Oracle database with VERITAS Volume Manager software.
Configure the disk devices for the VxVM software to use.
See the Sun Cluster 3.1 10/03 Software Installation Guide for information about how to configure VERITAS Volume Manager.
If you use raw devices to contain the databases, run the following commands on the current disk-group primary to change each device's owner, group, and mode.
If you do not use raw devices, do not perform this step.
If you create raw devices, run the following command for each raw device.
# vxedit -g diskgroup set user=oracle group=dba mode=600 volume
diskgroup – Specifies the name of the disk group
volume – Specifies the name of the raw volume within the disk group
Verify that the changes are effective.
# ls -lL /dev/vx/rdsk/diskgroup/volume |
Reregister the disk device group with the cluster to keep the VxVM namespace consistent throughout the cluster.
# scconf -c -D name=diskgroup |
This section contains the procedure that you need to install Oracle software.
Become superuser on a cluster member.
Note the Oracle installation requirements.
Install Oracle binaries on one of the following locations.
Local disks of the cluster nodes
Highly available local file system
Cluster file system
Before you install the Oracle software on the cluster file system, start the Sun Cluster software and become the owner of the disk device group.
See Preparing the Nodes and Disks for more information about installation locations.
Install the Oracle software.
Regardless of where you install the Oracle software, modify each node's /etc/system files as you would in standard Oracle installation procedures. Then reboot.
Log in as oracle to ensure ownership of the entire directory before you perform this step. See the appropriate Oracle installation and configuration guides for instructions about how to install Oracle software.
This section contains the procedure that you need to verify the Oracle installation and configuration.
This procedure does not verify that your application is highly available because you have not yet installed your data service.
Verify that the oracle user and the dba group own the $ORACLE_HOME/bin/oracle file.
Verify that the $ORACLE_HOME/bin/oracle permissions are set as follows.
-rwsr-s--x |
Verify that the listener binaries exist in the $ORACLE_HOME/bin directory.
When you have completed the work in this section, go to Creating an Oracle Database.
This section contains the procedure to configure and create the initial Oracle database in a Sun Cluster environment. If you create and configure additional databases, omit the procedure How to Create an Oracle Database.
Prepare database configuration files.
Place all of the database-related files (data files, redo log files, and control files) on either shared raw global devices or on the cluster file system. See Preparing the Nodes and Disks for information about installation locations.
Within the init$ORACLE_SID.ora or config$ORACLE_SID.ora file, you might need to modify the assignments for control_files and background_dump_dest to specify the locations of the control files and alert files.
If you use Solaris authentication for database logins, set the remote_os_authent variable in the init$ORACLE_SID.ora file to True.
Create the database.
Start the Oracle installer and select the option to create a database. Alternatively, depending on your Oracle version, you can use the Oracle svrmgrl(1M) command to create the database.
During creation, ensure that all of the database-related files are placed in the appropriate location, either on shared global devices or on the cluster file system.
Verify that the file names of your control files match the file names in your configuration files.
Create the v$sysstat view.
Run the catalog scripts that create the v$sysstat view. The Sun Cluster HA for Oracle fault monitor uses this view.
When you have completed the work in this section, go to Setting Up Oracle Database Permissions.
Perform the procedure in this section to set up Oracle database permissions for Oracle 8i and Oracle 9i.
Enable access for the user and password to be used for fault monitoring.
To use the Oracle authentication method – For all of the supported Oracle releases, type the following script at the sqlplus prompt.
# sqlplus "/as sysdba"

grant connect, resource to user identified by passwd;
alter user user default tablespace system quota 1m on system;
grant select on v_$sysstat to user;
grant create session to user;
grant create table to user;
exit;
To use the Solaris authentication method – Grant permission for the database to use Solaris authentication.
The user for which you enable Solaris authentication is the user who owns the files under the $ORACLE_HOME directory. The following code sample shows that the user oracle owns these files.
# sqlplus "/as sysdba"

create user ops$oracle identified by externally
default tablespace system quota 1m on system;
grant connect, resource to ops$oracle;
grant select on v_$sysstat to ops$oracle;
grant create session to ops$oracle;
grant create table to ops$oracle;
exit;
Configure NET8 for the Sun Cluster software.
The listener.ora file must be accessible from all of the nodes that are in the cluster. Place these files either under the cluster file system or in the local file system of each node that can potentially run the Oracle resources.
If you place the listener.ora file in a location other than the /var/opt/oracle directory or the $ORACLE_HOME/network/admin directory, you must specify the TNS_ADMIN variable or an equivalent Oracle variable in a user-environment file. For information about Oracle variables, see the Oracle documentation. You must also run the scrgadm(1M) command to set the resource extension parameter User_env, which sources the user-environment file. See Table 1–2 or Table 1–3 for format details.
Sun Cluster HA for Oracle imposes no restrictions on the listener name—it can be any valid Oracle listener name.
The following code sample identifies the lines in listener.ora that are updated.
LISTENER =
  (ADDRESS_LIST =
    (ADDRESS =
      (PROTOCOL = TCP)
      (HOST = logical-hostname) <- use logical hostname
      (PORT = 1527)
    )
  )
.
.
SID_LIST_LISTENER =
.
.
      (SID_NAME = SID) <- Database name, default is ORCL
The following code sample identifies the lines in tnsnames.ora that are updated on client machines.
service_name =
.
.
      (ADDRESS =
        (PROTOCOL = TCP)
        (HOST = logicalhostname) <- logical hostname
        (PORT = 1527) <- must match port in LISTENER.ORA
      )
    )
    (CONNECT_DATA =
      (SID = <SID>)) <- database name, default is ORCL
The following example shows how to update the listener.ora and tnsnames.ora files for the following Oracle instances.
| Instance | Logical Host | Listener |
|---|---|---|
| ora8 | hadbms3 | LISTENER-ora8 |
| ora9 | hadbms4 | LISTENER-ora9 |
The corresponding listener.ora entries are as follows.
LISTENER-ora9 =
  (ADDRESS_LIST =
    (ADDRESS =
      (PROTOCOL = TCP)
      (HOST = hadbms4)
      (PORT = 1530)
    )
  )
SID_LIST_LISTENER-ora9 =
  (SID_LIST =
    (SID_DESC =
      (SID_NAME = ora9)
    )
  )
LISTENER-ora8 =
  (ADDRESS_LIST =
    (ADDRESS= (PROTOCOL=TCP) (HOST=hadbms3)(PORT=1806))
  )
SID_LIST_LISTENER-ora8 =
  (SID_LIST =
    (SID_DESC =
      (SID_NAME = ora8)
    )
  )
The corresponding tnsnames.ora entries are as follows.
ora8 =
  (DESCRIPTION =
    (ADDRESS_LIST =
      (ADDRESS = (PROTOCOL = TCP) (HOST = hadbms3) (PORT = 1806))
    )
    (CONNECT_DATA = (SID = ora8))
  )
ora9 =
  (DESCRIPTION =
    (ADDRESS_LIST =
      (ADDRESS = (PROTOCOL = TCP) (HOST = hadbms4) (PORT = 1530))
    )
    (CONNECT_DATA = (SID = ora9))
  )
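Because the port in tnsnames.ora must match the port in listener.ora, a quick comparison can catch a mismatch before clients try to connect. The following sketch is illustrative only: ports_for_host and ports_match are hypothetical helper names, and the parsing assumes that each PORT setting directly follows its HOST setting, as in the examples above. It does not handle comments or other layouts.

```shell
#!/bin/sh
# ports_for_host FILE HOST: print the PORT values that follow
# (HOST = HOST) in an Oracle Net configuration file, sorted, one per line.
ports_for_host() {
    # Remove all whitespace, then put each parenthesized setting on its own line.
    tr -d ' \t\n' < "$1" | tr ')' '\n' |
        awk -v host="$2" '
            /\(HOST=/ { want = ($0 ~ ("HOST=" host "$")) }
            want && /\(PORT=/ { sub(/.*PORT=/, ""); print; want = 0 }' |
        sort -u
}

# ports_match LISTENER_FILE TNSNAMES_FILE HOST: succeed if both files
# list the same non-empty set of ports for HOST.
ports_match() {
    a=$(ports_for_host "$1" "$3")
    b=$(ports_for_host "$2" "$3")
    [ -n "$a" ] && [ "$a" = "$b" ]
}
```

For the example instances, ports_match listener.ora tnsnames.ora hadbms3 would succeed only if both files use port 1806 for hadbms3.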
Verify that the Sun Cluster software is installed and running on all of the nodes.
# scstat |
Go to Installing the Sun Cluster HA for Oracle Packages to install the Sun Cluster HA for Oracle packages.
If you did not install the Sun Cluster HA for Oracle packages during your initial Sun Cluster installation, perform this procedure to install the packages. Perform this procedure on each cluster node where you are installing the Sun Cluster HA for Oracle packages. To complete this procedure, you need the Sun Cluster Agents CD-ROM.
If you are installing more than one data service simultaneously, perform the procedure in “Installing the Software” in Sun Cluster 3.1 10/03 Software Installation Guide.
Install the Sun Cluster HA for Oracle packages by using one of the following installation tools:
The Web Start program
The scinstall utility
The Web Start program is not available in releases earlier than Sun Cluster 3.1 Data Services 10/03.
You can run the Web Start program with a command-line interface (CLI) or with a graphical user interface (GUI). The content and sequence of instructions in the CLI and the GUI are similar. For more information about the Web Start program, see the installer(1M) man page.
On the cluster node where you are installing the Sun Cluster HA for Oracle packages, become superuser.
(Optional) If you intend to run the Web Start program with a GUI, ensure that your DISPLAY environment variable is set.
Load the Sun Cluster Agents CD-ROM into the CD-ROM drive.
If the Volume Management daemon vold(1M) is running and configured to manage CD-ROM devices, it automatically mounts the CD-ROM on the /cdrom/scdataservices_3_1_vb directory.
Change to the Sun Cluster HA for Oracle component directory of the CD-ROM.
The Web Start program for the Sun Cluster HA for Oracle data service resides in this directory.
# cd /cdrom/scdataservices_3_1_vb/\
components/SunCluster_HA_Oracle_3.1
Start the Web Start program.
# ./installer |
When you are prompted, select the type of installation.
Follow instructions on the screen to install the Sun Cluster HA for Oracle packages on the node.
After the installation is finished, the Web Start program provides an installation summary. This summary enables you to view logs that the Web Start program created during the installation. These logs are located in the /var/sadm/install/logs directory.
Exit the Web Start program.
Unload the Sun Cluster Agents CD-ROM from the CD-ROM drive.
See Registering and Configuring Sun Cluster HA for Oracle to register Sun Cluster HA for Oracle and to configure the cluster for the data service.
Load the Sun Cluster Agents CD-ROM into the CD-ROM drive.
Run the scinstall utility with no options.
This step starts the scinstall utility in interactive mode.
Choose the menu option, Add Support for New Data Service to This Cluster Node.
The scinstall utility prompts you for additional information.
Provide the path to the Sun Cluster Agents CD-ROM.
The utility refers to the CD as the “data services cd.”
Specify the data service to install.
The scinstall utility lists the data service that you selected and asks you to confirm your choice.
Exit the scinstall utility.
Unload the CD from the drive.
See Registering and Configuring Sun Cluster HA for Oracle to register Sun Cluster HA for Oracle and to configure the cluster for the data service.
This section contains the procedures that you need to configure Sun Cluster HA for Oracle.
Use the extension properties in Table 1–2 to create your resources. Use the command scrgadm -x parameter=value to configure extension properties when you create your resource. Use the procedure in “Administering Data Service Resources” in Sun Cluster 3.1 Data Service Planning and Administration Guide to configure the extension properties if you have already created your resources. You can update some extension properties dynamically. You can update others, however, only when you create or disable a resource. The Tunable entries indicate when you can update each property. See “Standard Properties” in Sun Cluster 3.1 Data Service Planning and Administration Guide for details about all Sun Cluster properties.
Table 1–2 Sun Cluster HA for Oracle Listener Extension Properties
| Name/Data Type | Description |
|---|---|
| LISTENER_NAME | The name of the Oracle listener. Default: LISTENER. Range: None. Tunable: When disabled |
| ORACLE_HOME | The path to the Oracle home directory. Default: None. Range: Minimum = 1. Tunable: When disabled |
| User_env | A file that contains environment variables to be set before listener startup and shutdown. Those environment variables that have values that differ from Oracle defaults must be defined in this file. For example, a user's listener.ora file might not reside under the /var/opt/oracle directory or the $ORACLE_HOME/network/admin directory. In this situation, the TNS_ADMIN environment variable should be defined. The definition of each environment variable must follow the format VARIABLE_NAME=VARIABLE_VALUE. Specify each of these environment variables on its own line in the environment file. Default: "". Range: None. Tunable: Any time |
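The environment file format can be checked before you set the property. The following is a minimal sketch, assuming that blank lines are harmless; that leniency is an assumption of the sketch, not something this guide states. The function name check_user_env is hypothetical.

```shell
#!/bin/sh
# check_user_env FILE: report every line of a user-environment file
# that is not in VARIABLE_NAME=VARIABLE_VALUE form.
check_user_env() {
    # Keep lines that do not start with NAME=, then drop blank lines.
    bad=$(grep -v '^[A-Za-z_][A-Za-z0-9_]*=' "$1" | grep -v '^[[:space:]]*$')
    if [ -n "$bad" ]; then
        printf 'not in VARIABLE_NAME=VARIABLE_VALUE form:\n%s\n' "$bad"
        return 1
    fi
}
```

A file that contains a single line such as TNS_ADMIN=/global/oracle/network/admin (an illustrative path) passes the check. A line such as export TNS_ADMIN=... does not, because the format allows only the assignment itself.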
Table 1–3 describes the extension properties that you can set for the Oracle server. For the Oracle server, you are required to set only the following extension properties:
ORACLE_HOME
ORACLE_SID
Alert_log_file
Connect_string
Use this procedure to configure Sun Cluster HA for Oracle as a failover data service. This procedure assumes that you installed the data service packages during your initial Sun Cluster installation. If you did not install the Sun Cluster HA for Oracle packages as part of your initial Sun Cluster installation, go to Installing the Sun Cluster HA for Oracle Packages to install the data service packages. Otherwise, use this procedure to configure the Sun Cluster HA for Oracle.
You must have the following information to perform this procedure.
The names of the cluster nodes that master the data service.
The network resource that clients use to access the data service. Normally, you set up this IP address when you install the cluster. See the Sun Cluster 3.1 10/03 Concepts Guide for details about network resources.
The path to the Oracle application binaries for the resources that you plan to configure.
Become superuser on a cluster member.
Run the scrgadm command to register the resource types for the data service.
For Sun Cluster HA for Oracle, you register two resource types, SUNW.oracle_server and SUNW.oracle_listener, as follows.
# scrgadm -a -t SUNW.oracle_server
# scrgadm -a -t SUNW.oracle_listener
-a – Adds the data service resource type.
-t – Specifies the predefined resource type name for your data service.
Create a failover resource group to hold the network and application resources.
You can optionally select the set of nodes on which the data service can run with the -h option, as follows.
# scrgadm -a -g resource-group [-h nodelist] |
-g resource-group – Specifies the name of the resource group. This name can be your choice but must be unique for resource groups within the cluster.
-h nodelist – Specifies an optional comma-separated list of physical node names or IDs that identify potential masters. The order here determines the order in which the nodes are considered as primary during failover.
Use the -h option to specify the order of the node list. If all of the nodes that are in the cluster are potential masters, you do not need to use the -h option.
Verify that all of the network resources that you use have been added to your name service database.
You should have performed this verification during the Sun Cluster installation.
Ensure that all of the network resources are present in the server's and client's /etc/inet/hosts file to avoid any failures because of name service lookup.
Add a network resource to the failover resource group.
# scrgadm -a -L -g resource-group -l logical-hostname [-n netiflist] |
-l logical-hostname – Specifies a network resource. The network resource is the logical hostname or shared address (IP address) that clients use to access Sun Cluster HA for Oracle.
-n netiflist – Specifies an optional, comma-separated list that identifies the IP Networking Multipathing groups that are on each node. Each element in netiflist must be in the form of netif@node. netif can be given as an IP Networking Multipathing group name, such as sc_ipmp0. The node can be identified by the node name or node ID, such as sc_ipmp0@1 or sc_ipmp0@phys-schost-1.
Sun Cluster does not currently support the use of the adapter name for netif.
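As a rough pre-check of a netiflist value, the following sketch confirms only that every comma-separated element has the netif@node shape. The function name check_netiflist is hypothetical, and the sketch does not verify that the named groups or nodes exist.

```shell
#!/bin/sh
# check_netiflist LIST: succeed if every comma-separated element of
# LIST has the form netif@node.
check_netiflist() {
    old_ifs=$IFS
    IFS=,
    set -f                      # no pathname expansion while splitting
    for element in $1; do
        case "$element" in
            ?*@?*) ;;           # something before and after the @
            *)
                echo "not in netif@node form: $element"
                IFS=$old_ifs; set +f
                return 1 ;;
        esac
    done
    IFS=$old_ifs; set +f
}
```

For example, check_netiflist "sc_ipmp0@1,sc_ipmp0@phys-schost-2" succeeds, whereas check_netiflist "sc_ipmp0" reports the malformed element.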
Register the SUNW.HAStoragePlus resource type with the cluster.
# scrgadm -a -t SUNW.HAStoragePlus |
Create the resource oracle-hastp-rs of type SUNW.HAStoragePlus.
# scrgadm -a -j oracle-hastp-rs -g oracle-rg -t SUNW.HAStoragePlus \
[If your database is on a raw device, specify the global device path.]
-x GlobalDevicePaths=ora-set1,/dev/global/dsk/d1 \
[If your database is on a Cluster File Service, specify the global file system and local file system mount points.]
-x FilesystemMountPoints=/global/ora-inst,/global/ora-data/logs,/local/ora-data \
[Set AffinityOn to true.]
-x AffinityOn=TRUE
For the local file system to fail over, AffinityOn must be set to TRUE and the local file system must reside on global disk groups.
Run the scswitch command to complete the following tasks and bring the resource group oracle-rg online on a cluster node.
Be sure to switch only at the resource group level. Switching at the device group level confuses the resource group, causing it to fail over.
Move the resource group into a MANAGED state.
Bring the resource group online.
This node is made the primary for device group ora-set1 and raw device /dev/global/dsk/d1. Device groups that are associated with file systems such as /global/ora-inst and /global/ora-data/logs are also made primaries on this node.
# scswitch -Z -g oracle-rg |
Create Oracle application resources in the failover resource group.
Oracle server resource:
# scrgadm -a -j resource -g resource-group \
-t SUNW.oracle_server \
-x Connect_string=user/passwd \
-x ORACLE_SID=instance \
-x ORACLE_HOME=Oracle-home \
-x Alert_log_file=path-to-log \
-x Restart_type=entity-to-restart \
-y resource_dependencies=storageplus-resource
Oracle listener resource:
# scrgadm -a -j resource -g resource-group \
-t SUNW.oracle_listener \
-x LISTENER_NAME=listener \
-x ORACLE_HOME=Oracle-home \
-y resource_dependencies=storageplus-resource
-j resource – Specifies the name of the resource to add.
-g resource-group – Specifies the name of the resource group into which the resources are to be placed.
-t SUNW.oracle_server, -t SUNW.oracle_listener – Specifies the type of the resource to add.
-x Alert_log_file=path-to-log – Sets the path under $ORACLE_HOME for the server message log.
-x Connect_string=user/passwd – Specifies the user and password that the fault monitor uses to connect to the database. These settings must agree with the permissions that you set up in How to Set Up Oracle Database Permissions. If you use Solaris authentication, type a slash (/) instead of the user name and password.
-x ORACLE_SID=instance – Sets the Oracle system identifier.
-x LISTENER_NAME=listener – Sets the name of the Oracle listener instance. This name must match the corresponding entry in listener.ora.
-x ORACLE_HOME=Oracle-home – Sets the path to the Oracle home directory.
-x Restart_type=entity-to-restart – Specifies the entity that the server fault monitor restarts when the response to a fault is restart. Set entity-to-restart as follows:
To specify that all resources in the resource group that contains this resource are restarted, set entity-to-restart to RESOURCE_GROUP_RESTART. By default, the resource group that contains this resource is restarted.
If you set entity-to-restart to RESOURCE_GROUP_RESTART, all other resources (such as Apache or DNS) in the resource group are restarted, even if they are not faulty. Therefore, include in the resource group only the resources that you require to be restarted when the Oracle server resource is restarted.
To specify that only this resource is restarted, set entity-to-restart to RESOURCE_RESTART.
Optionally, you can set additional extension properties that belong to the Oracle data service to override their default values. See Sun Cluster HA for Oracle Extension Properties for a list of extension properties.
Enable the resource and fault monitoring.
# scswitch -Z -g resource-group |
-Z – Enables the resource and monitor, moves the resource group to the MANAGED state, and brings it online.
-g resource-group – Specifies the name of the resource group.
The following example shows how to register Sun Cluster HA for Oracle on a two-node cluster.
Cluster Information
Node names: phys-schost-1, phys-schost-2
Logical Hostname: schost-1
Resource group: resource-group-1 (failover resource group)
Oracle Resources: oracle-server-1, oracle-listener-1
Oracle Instances: ora-lsnr (listener), ora-srvr (server)

(Add the failover resource group to contain all of the resources.)
# scrgadm -a -g resource-group-1

(Add the logical hostname resource to the resource group.)
# scrgadm -a -L -g resource-group-1 -l schost-1

(Register the Oracle resource types.)
# scrgadm -a -t SUNW.oracle_server
# scrgadm -a -t SUNW.oracle_listener

(Add the Oracle application resources to the resource group.)
# scrgadm -a -j oracle-server-1 -g resource-group-1 \
-t SUNW.oracle_server -x ORACLE_HOME=/global/oracle \
-x Alert_log_file=/global/oracle/message-log \
-x ORACLE_SID=ora-srvr -x Connect_string=scott/tiger

# scrgadm -a -j oracle-listener-1 -g resource-group-1 \
-t SUNW.oracle_listener -x ORACLE_HOME=/global/oracle \
-x LISTENER_NAME=ora-lsnr

(Bring the resource group online.)
# scswitch -Z -g resource-group-1
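If you register several instances, it can help to generate the command sequence from a few parameters and review it before you run anything. The following sketch only prints the commands from the example above with your values substituted; it does not run them. The function name print_ha_oracle_commands is hypothetical, and the resource names oracle-server-1 and oracle-listener-1 are carried over from the example.

```shell
#!/bin/sh
# print_ha_oracle_commands RG LOGICAL_HOST ORACLE_HOME SID CONNECT ALERT_LOG LISTENER
# Print, without running, the registration sequence shown in the example.
print_ha_oracle_commands() {
    rg=$1 lh=$2 home=$3 sid=$4 connect=$5 alert=$6 lsnr=$7
    cat <<EOF
scrgadm -a -g $rg
scrgadm -a -L -g $rg -l $lh
scrgadm -a -t SUNW.oracle_server
scrgadm -a -t SUNW.oracle_listener
scrgadm -a -j oracle-server-1 -g $rg -t SUNW.oracle_server -x ORACLE_HOME=$home -x Alert_log_file=$alert -x ORACLE_SID=$sid -x Connect_string=$connect
scrgadm -a -j oracle-listener-1 -g $rg -t SUNW.oracle_listener -x ORACLE_HOME=$home -x LISTENER_NAME=$lsnr
scswitch -Z -g $rg
EOF
}
```

Calling print_ha_oracle_commands resource-group-1 schost-1 /global/oracle ora-srvr scott/tiger /global/oracle/message-log ora-lsnr reproduces the seven commands of the example, one per line.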
Go to Verifying the Sun Cluster HA for Oracle Installation after you register and configure Sun Cluster HA for Oracle.
Perform the following verification tests to make sure that you have correctly installed Sun Cluster HA for Oracle.
These sanity checks ensure that all of the nodes that run Sun Cluster HA for Oracle can start the Oracle instance and that the other nodes in the configuration can access the Oracle instance. Perform these sanity checks to isolate any problems in starting the Oracle software from Sun Cluster HA for Oracle.
Log in as oracle to the node that currently masters the Oracle resource group.
Set the environment variables ORACLE_SID and ORACLE_HOME.
Confirm that you can start the Oracle instance from this node.
Confirm that you can connect to the Oracle instance.
Use the sqlplus command with the user and password that are defined in the Connect_string extension property.
# sqlplus user/passwd@tns_service |
Shut down the Oracle instance.
The Sun Cluster software restarts the Oracle instance because the Oracle instance is under Sun Cluster control.
Switch the resource group that contains the Oracle database resource to another cluster member.
The following example shows how to complete this step.
# scswitch -z -g resource-group -h node |
Log in as oracle to the node that now contains the resource group.
Repeat Step 3 and Step 4 to confirm interactions with the Oracle instance.
Clients must always refer to the database by using the network resource, not the physical hostname. The network resource is an IP address that can move between physical nodes during failover. The physical hostname is a machine name.
For example, in the tnsnames.ora file, you must specify the network resource as the host on which the database instance is running. The network resource is a logical hostname or a shared address. See How to Set Up Oracle Database Permissions.
Oracle client-server connections cannot survive a Sun Cluster HA for Oracle switchover. The client application must be prepared to handle disconnection and reconnection or recovery as appropriate. A transaction monitor might simplify the application. Further, Sun Cluster HA for Oracle node recovery time is application dependent.
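For scripted clients, one simple way to handle the disconnection is to retry the connection with a pause between attempts. The following is a generic sketch, not an Oracle or Sun Cluster facility: retry is a hypothetical helper, and the command that you pass to it must exit with a nonzero status when it cannot connect.

```shell
#!/bin/sh
# retry MAX_ATTEMPTS DELAY_SECONDS COMMAND [ARG...]
# Run COMMAND until it succeeds, up to MAX_ATTEMPTS times, sleeping
# DELAY_SECONDS between attempts.
retry() {
    max=$1 delay=$2
    shift 2
    attempt=1
    while ! "$@"; do
        if [ "$attempt" -ge "$max" ]; then
            echo "giving up after $attempt attempts" >&2
            return 1
        fi
        attempt=$((attempt + 1))
        sleep "$delay"
    done
}
```

Choose the attempt count and delay to cover your measured recovery time, because that time depends on the application.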
Each instance of the Sun Cluster HA for Oracle data service maintains log files in subdirectories of the /var/opt/SUNWscor directory.
The /var/opt/SUNWscor/oracle_server directory contains log files for the Oracle server.
The /var/opt/SUNWscor/oracle_listener directory contains log files for the Oracle listener.
These files contain information about actions that the Sun Cluster HA for Oracle data service performs. Refer to these files to obtain diagnostic information for troubleshooting your configuration or to monitor the behavior of the Sun Cluster HA for Oracle data service.
The two fault monitors for Sun Cluster HA for Oracle are a server and a listener monitor.
The fault monitor for the Oracle server uses a request to the server to query the health of the server.
The server fault monitor is started through pmfadm to make the monitor highly available. If the monitor is killed for any reason, the Process Monitor Facility (PMF) automatically restarts the monitor.
The server fault monitor consists of the following processes.
A main fault monitor process, which performs error lookup and scha_control actions
A database client fault probe, which performs database transactions
The main fault monitor determines that an operation is successful if the database is online and no errors are returned during the transaction.
The database client fault probe queries the dynamic performance view v$sysstat to obtain database performance statistics. Changes to these statistics indicate that the database is operational. If these statistics remain unchanged between consecutive queries, the fault probe performs database transactions to determine if the database is operational. These transactions involve the creation, updating, and dropping of a table in the user table space.
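The two-stage check described above can be sketched as follows. This is an illustration of the decision logic only, not the fault probe's implementation: probe_database, read_sysstat, and run_test_transaction are hypothetical names, and running the test transaction on the very first probe (when no earlier statistics exist) is an assumption.

```python
def probe_database(read_sysstat, run_test_transaction, previous_stats):
    """Return (healthy, current_stats) for one probe cycle.

    read_sysstat and run_test_transaction are hypothetical callables.
    The first returns a snapshot of v$sysstat as a dict. The second
    creates, updates, and drops a test table and returns True on success.
    """
    current_stats = read_sysstat()
    if previous_stats is not None and current_stats != previous_stats:
        # Statistics changed since the last probe: the database is working.
        return True, current_stats
    # No evidence of activity: fall back to an explicit test transaction.
    return run_test_transaction(), current_stats
```

The cheap statistics comparison handles the common case of a busy database, so the more intrusive test transaction runs only when the database appears idle.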
The database client fault probe performs all its transactions as the Oracle user. The ID of this user is specified during the preparation of the nodes as explained in How to Prepare the Nodes.
The probe uses the time-out value that is set in the resource property Probe_timeout to determine how much time to allocate to successfully probe Oracle.
If a database transaction fails, the server fault monitor performs an action that is determined by the error that caused the failure. To change the action that the server fault monitor performs, customize the server fault monitor as explained in Customizing the Sun Cluster HA for Oracle Server Fault Monitor.
If the action requires an external program to be run, the program is run as a separate process in the background.
Possible actions are as follows:
Ignore. The server fault monitor ignores the error.
Stop monitoring. The server fault monitor is stopped without shutting down the database.
Restart. The server fault monitor stops and restarts the entity that is specified by the value of the Restart_type extension property:
If the Restart_type extension property is set to RESOURCE_GROUP_RESTART, the server fault monitor restarts the database server resource group. By default, the server fault monitor restarts the database server resource group.
If the Restart_type extension property is set to RESOURCE_RESTART, the server fault monitor restarts the database server resource.
The number of attempts to restart might exceed the value of the Retry_count resource property within the time that the Retry_interval resource property specifies. If this situation occurs, the server fault monitor attempts to switch over the resource group to another node.
Switch over. The server fault monitor switches over the database server resource group to another node. If no nodes are available, the attempt to switch over the resource group fails. If the attempt to switch over the resource group fails, the database server is restarted.
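The interaction between the restart action, Retry_count, and Retry_interval can be sketched as a sliding-window count. This is a simplified model written for illustration. The function name choose_recovery is hypothetical, and the exact way in which the cluster software counts restart attempts is an assumption.

```python
def choose_recovery(restart_times, now, retry_count, retry_interval):
    """Decide between another restart and a switchover.

    restart_times holds the timestamps (in seconds) of earlier restart
    attempts. Returns the action name and the updated history.
    """
    # Keep only the attempts that fall inside the Retry_interval window.
    recent = [t for t in restart_times if now - t < retry_interval]
    if len(recent) >= retry_count:
        # One more restart would exceed Retry_count within the interval.
        return "SWITCH", recent
    return "RESTART", recent + [now]
```

With retry_count=2 and retry_interval=300, restarts at 0 and 100 seconds are allowed, a failure at 200 seconds leads to a switchover, and a failure at 350 seconds is restarted again because the attempt at 0 seconds has aged out of the window.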
The Oracle software logs alerts in an alert log file. The absolute path of this file is specified by the Alert_log_file extension property of the SUNW.oracle_server resource. The server fault monitor scans the alert log file for new alerts at the following times:
When the server fault monitor is started
Each time that the server fault monitor queries the health of the server
If an action is defined for a logged alert that the server fault monitor detects, the server fault monitor performs the action in response to the alert.
Preset actions for logged alerts are listed in Table A–2. To change the action that the server fault monitor performs, customize the server fault monitor as explained in Customizing the Sun Cluster HA for Oracle Server Fault Monitor.
The Oracle listener fault monitor checks the status of an Oracle listener.
If the listener is running, the Oracle listener fault monitor considers a probe successful. If the fault monitor detects an error, the listener is restarted.
The listener probe is started through pmfadm to make the probe highly available. If the probe is killed, PMF automatically restarts the probe.
If a problem occurs with the listener during a probe, the probe tries to restart the listener. The value that is set in the resource property Retry_count determines the maximum number of times that the probe attempts the restart. If, after trying for the maximum number of times, the probe is still unsuccessful, the probe stops the fault monitor and does not switch over the resource group.
Customizing the Sun Cluster HA for Oracle server fault monitor enables you to modify the behavior of the server fault monitor as follows:
Overriding the preset action for an error
Specifying an action for an error for which no action is preset
Before you customize the Sun Cluster HA for Oracle server fault monitor, consider the effects of your customizations, especially if you change an action from restart or switch over to ignore or stop monitoring. If errors remain uncorrected for long periods, the errors might cause problems with the database. If you encounter problems with the database after customizing the Sun Cluster HA for Oracle server fault monitor, revert to using the preset actions. Reverting to the preset actions enables you to determine if the problem is caused by your customizations.
Customizing the Sun Cluster HA for Oracle server fault monitor involves the following activities:
Defining custom behavior for errors
Propagating a custom action file to all nodes in a cluster
Specifying the custom action file that a server fault monitor should use
The Sun Cluster HA for Oracle server fault monitor detects the following types of errors:
DBMS errors that occur during a probe of the database by the server fault monitor
Alerts that Oracle logs in the alert log file
Timeouts that result from a failure to receive a response within the time that is set by the Probe_timeout extension property
To define custom behavior for these types of errors, create a custom action file.
A custom action file is a plain text file. The file contains one or more entries that define the custom behavior of the Sun Cluster HA for Oracle server fault monitor. Each entry defines the custom behavior for a single DBMS error, a single time-out error, or several logged alerts. A maximum of 1024 entries is allowed in a custom action file.
Each entry in a custom action file overrides the preset action for an error, or specifies an action for an error for which no action is preset. Create entries in a custom action file only for the preset actions that you are overriding or for errors for which no action is preset. Do not create entries for actions that you are not changing.
An entry in a custom action file consists of a sequence of keyword-value pairs that are separated by semicolons. Each entry is enclosed in braces.
The format of an entry in a custom action file is as follows:
{
[ERROR_TYPE=DBMS_ERROR|SCAN_LOG|TIMEOUT_ERROR;]
ERROR=error-spec;
[ACTION=SWITCH|RESTART|STOP|NONE;]
[CONNECTION_STATE=co|di|on|*;]
[NEW_STATE=co|di|on|*;]
[MESSAGE="message-string"]
}
White space may be used between keyword-value pairs and between entries to format the file.
The meaning and permitted values of the keywords in a custom action file are as follows:
ERROR_TYPE
Indicates the type of the error that the server fault monitor has detected. The following values are permitted for this keyword:
DBMS_ERROR
Specifies that the error is a DBMS error.
SCAN_LOG
Specifies that the error is an alert that is logged in the alert log file.
TIMEOUT_ERROR
Specifies that the error is a timeout.
The ERROR_TYPE keyword is optional. If you omit this keyword, the error is assumed to be a DBMS error.
ERROR
Identifies the error. The data type and the meaning of error-spec are determined by the value of the ERROR_TYPE keyword as shown in the following table.
ERROR_TYPE | Data Type | Meaning |
---|---|---|
DBMS_ERROR | Integer | The error number of a DBMS error that is generated by Oracle |
SCAN_LOG | Quoted regular expression | A string in an error message that Oracle has logged to the Oracle alert log file |
TIMEOUT_ERROR | Integer | The number of consecutive timed-out probes since the server fault monitor was last started or restarted |
You must specify the ERROR keyword. If you omit this keyword, the entry in the custom action file is ignored.
ACTION
Specifies the action that the server fault monitor is to perform in response to the error. The following values are permitted for this keyword:
NONE
Specifies that the server fault monitor ignores the error.
STOP
Specifies that the server fault monitor is stopped.
RESTART
Specifies that the server fault monitor stops and restarts the entity that is specified by the value of the Restart_type extension property of the SUNW.oracle_server resource.
SWITCH
Specifies that the server fault monitor switches over the database server resource group to another node.
The ACTION keyword is optional. If you omit this keyword, the server fault monitor ignores the error.
CONNECTION_STATE
Specifies the required state of the connection between the database and the server fault monitor when the error is detected. The entry applies only if the connection is in the required state when the error is detected. The following values are permitted for this keyword:
*
Specifies that the entry always applies, regardless of the state of the connection.
co
Specifies that the entry applies only if the server fault monitor is attempting to connect to the database.
on
Specifies that the entry applies only if the server fault monitor is online. The server fault monitor is online if it is connected to the database.
di
Specifies that the entry applies only if the server fault monitor is disconnecting from the database.
The CONNECTION_STATE keyword is optional. If you omit this keyword, the entry always applies, regardless of the state of the connection.
NEW_STATE
Specifies the state of the connection between the database and the server fault monitor that the server fault monitor must attain after the error is detected. The following values are permitted for this keyword:
*
Specifies that the state of the connection must remain unchanged.
co
Specifies that the server fault monitor must disconnect from the database and reconnect immediately to the database.
di
Specifies that the server fault monitor must disconnect from the database. The server fault monitor reconnects when it next probes the database.
The NEW_STATE keyword is optional. If you omit this keyword, the state of the database connection remains unchanged after the error is detected.
MESSAGE
Specifies an additional message that is printed to the resource's log file when this error is detected. The message must be enclosed in double quotes. This message is additional to the standard message that is defined for the error.
The MESSAGE keyword is optional. If you omit this keyword, no additional message is printed to the resource's log file when this error is detected.
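The entry format and the keyword defaults described above can be illustrated with a small parser. This is a minimal sketch in Python for checking a file by eye before you specify it. It is not the parser that the server fault monitor uses, and the name parse_custom_actions is hypothetical. The sketch does not validate keyword values and does not handle braces inside quoted strings.

```python
import re

# An entry is everything between one pair of braces.
ENTRY_RE = re.compile(r"\{(.*?)\}", re.DOTALL)
# A pair is KEYWORD=value, where value is a quoted string or a bare token.
PAIR_RE = re.compile(
    r'\s*(\w+)\s*=\s*("(?:[^"\\]|\\.)*"|[^;]*?)\s*(?:;|$)', re.DOTALL
)

# Defaults that apply when an optional keyword is omitted.
DEFAULTS = {
    "ERROR_TYPE": "DBMS_ERROR",
    "ACTION": "NONE",
    "CONNECTION_STATE": "*",
    "NEW_STATE": "*",
    "MESSAGE": "",
}


def parse_custom_actions(text):
    """Parse custom action file text into a list of dicts, one per entry."""
    entries = []
    for body in ENTRY_RE.findall(text):
        entry = dict(DEFAULTS)
        for key, value in PAIR_RE.findall(body):
            if value.startswith('"'):
                value = value[1:-1]      # strip the enclosing quotes
            entry[key.upper()] = value
        if "ERROR" in entry:             # entries without ERROR are ignored
            entries.append(entry)
    return entries
```

Printing the result of parse_custom_actions for a draft file shows which defaults each entry would receive and which entries would be dropped because the ERROR keyword is missing.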
The action that the server fault monitor performs in response to each DBMS error is preset as listed in Table A–1. To determine whether you need to change the response to a DBMS error, consider the effect of DBMS errors on your database to determine if the preset actions are appropriate. For examples, see the subsections that follow.
To change the response to a DBMS error, create an entry in a custom action file in which the keywords are set as follows:
ERROR_TYPE is set to DBMS_ERROR.
ERROR is set to the error number of the DBMS error.
ACTION is set to the action that you require.
If an error that the server fault monitor ignores affects more than one session, action by the server fault monitor might be required to prevent a loss of service.
For example, no action is preset for Oracle error 4031: unable to allocate num-bytes bytes of shared memory. However, this Oracle error indicates that the shared global area (SGA) has insufficient memory, is badly fragmented, or both states apply. If this error affects only a single session, ignoring the error might be appropriate. However, if this error affects more than one session, consider specifying that the server fault monitor restart the database.
The following example shows an entry in a custom action file for changing the response to a DBMS error to restart.
{
ERROR_TYPE=DBMS_ERROR;
ERROR=4031;
ACTION=restart;
CONNECTION_STATE=*;
NEW_STATE=*;
MESSAGE="Insufficient memory in shared pool.";
}
This example shows an entry in a custom action file that overrides the preset action for DBMS error 4031. This entry specifies the following behavior:
In response to DBMS error 4031, the action that the server fault monitor performs is restart.
This entry applies regardless of the state of the connection between the database and the server fault monitor when the error is detected.
The state of the connection between the database and the server fault monitor must remain unchanged after the error is detected.
The following message is printed to the resource's log file when this error is detected:
Insufficient memory in shared pool.
If the effects of an error to which the server fault monitor responds are minor, ignoring the error might be less disruptive than responding to the error.
For example, the preset action for Oracle error 4030: out of process memory when trying to allocate num-bytes bytes is restart. This Oracle error indicates that the server fault monitor could not allocate private heap memory. One possible cause of this error is that insufficient memory is available to the operating system. If this error affects more than one session, restarting the database might be appropriate. However, this error might not affect other sessions because these sessions do not require further private memory. In this situation, consider specifying that the server fault monitor ignore the error.
The following example shows an entry in a custom action file for ignoring a DBMS error.
{
ERROR_TYPE=DBMS_ERROR;
ERROR=4030;
ACTION=none;
CONNECTION_STATE=*;
NEW_STATE=*;
MESSAGE="";
}
This example shows an entry in a custom action file that overrides the preset action for DBMS error 4030. This entry specifies the following behavior:
The server fault monitor ignores DBMS error 4030.
This entry applies regardless of the state of the connection between the database and the server fault monitor when the error is detected.
The state of the connection between the database and the server fault monitor must remain unchanged after the error is detected.
No additional message is printed to the resource's log file when this error is detected.
The Oracle software logs alerts in a file that is identified by the Alert_log_file extension property. The server fault monitor scans this file and performs actions in response to alerts for which an action is defined.
Logged alerts for which an action is preset are listed in Table A–2. Change the response to logged alerts to change the preset action, or to define new alerts to which the server fault monitor responds.
To change the response to logged alerts, create an entry in a custom action file in which the keywords are set as follows:
ERROR_TYPE is set to SCAN_LOG.
ERROR is set to a quoted regular expression that identifies a string in an error message that Oracle has logged to the Oracle alert log file.
ACTION is set to the action that you require.
The server fault monitor processes the entries in a custom action file in the order in which the entries occur. Only the first entry that matches a logged alert is processed. Later entries that match are ignored. If you are using regular expressions to specify actions for several logged alerts, ensure that more specific entries occur before more general entries. Specific entries that occur after general entries might be ignored.
For example, a custom action file might define different actions for errors that are identified by the regular expressions ORA-65 and ORA-6. To ensure that the entry that contains the regular expression ORA-65 is not ignored, ensure that this entry occurs before the entry that contains the regular expression ORA-6.
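The effect of entry order can be demonstrated with a short first-match lookup. This is a sketch that uses Python's re module. The regular expression dialect of the server fault monitor might differ, and the alert text and the function name first_matching_action are made up for the example.

```python
import re


def first_matching_action(entries, alert_line):
    """Return the action of the first entry whose pattern matches, else None."""
    for pattern, action in entries:
        if re.search(pattern, alert_line):
            return action
    return None


# Hypothetical alert text, used only to exercise the lookup.
alert = "ORA-65: example alert text"

# The same two entries in two different orders.
specific_first = [(r"ORA-65", "RESTART"), (r"ORA-6", "NONE")]
general_first = [(r"ORA-6", "NONE"), (r"ORA-65", "RESTART")]
```

With specific_first, the lookup returns RESTART for the alert. With general_first, the pattern ORA-6 matches first and the lookup returns NONE, so the ORA-65 entry is never reached.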
The following example shows an entry in a custom action file for changing the response to a logged alert.
{
ERROR_TYPE=SCAN_LOG;
ERROR="ORA-00600: internal error";
ACTION=RESTART;
}
This example shows an entry in a custom action file that overrides the preset action for logged alerts about internal errors. This entry specifies the following behavior:
In response to logged alerts that contain the text ORA-00600: internal error, the action that the server fault monitor performs is restart.
This entry applies regardless of the state of the connection between the database and the server fault monitor when the error is detected.
The state of the connection between the database and the server fault monitor must remain unchanged after the error is detected.
No additional message is printed to the resource's log file when this error is detected.
By default, the server fault monitor restarts the database after the second consecutive timed-out probe. If the database is lightly loaded, two consecutive timed-out probes should be sufficient to indicate that the database is hanging. However, during periods of heavy load, a server fault monitor probe might time out even if the database is functioning correctly. To prevent the server fault monitor from restarting the database unnecessarily, increase the maximum number of consecutive timed-out probes.
Increasing the maximum number of consecutive timed-out probes increases the time that is required to detect that the database is hanging.
To change the maximum number of consecutive timed-out probes allowed, create one entry in a custom action file for each consecutive timed‐out probe that is allowed except the first timed-out probe.
You are not required to create an entry for the first timed‐out probe. The action that the server fault monitor performs in response to the first timed‐out probe is preset.
For the last allowed timed-out probe, create an entry in which the keywords are set as follows:
ERROR_TYPE is set to TIMEOUT_ERROR.
ERROR is set to the maximum number of consecutive timed‐out probes that are allowed.
ACTION is set to RESTART.
For each remaining consecutive timed‐out probe except the first timed-out probe, create an entry in which the keywords are set as follows:
ERROR_TYPE is set to TIMEOUT_ERROR.
ERROR is set to the sequence number of the timed-out probe. For example, for the second consecutive timed-out probe, set this keyword to 2. For the third consecutive timed-out probe, set this keyword to 3.
ACTION is set to NONE.
To facilitate debugging, specify a message that indicates the sequence number of the timed-out probe.
The following example shows the entries in a custom action file for increasing the maximum number of consecutive timed-out probes to five.
{
ERROR_TYPE=TIMEOUT_ERROR;
ERROR=2;
ACTION=NONE;
CONNECTION_STATE=*;
NEW_STATE=*;
MESSAGE="Timeout #2 has occurred.";
}
{
ERROR_TYPE=TIMEOUT_ERROR;
ERROR=3;
ACTION=NONE;
CONNECTION_STATE=*;
NEW_STATE=*;
MESSAGE="Timeout #3 has occurred.";
}
{
ERROR_TYPE=TIMEOUT_ERROR;
ERROR=4;
ACTION=NONE;
CONNECTION_STATE=*;
NEW_STATE=*;
MESSAGE="Timeout #4 has occurred.";
}
{
ERROR_TYPE=TIMEOUT_ERROR;
ERROR=5;
ACTION=RESTART;
CONNECTION_STATE=*;
NEW_STATE=*;
MESSAGE="Timeout #5 has occurred. Restarting.";
}
This example shows the entries in a custom action file for increasing the maximum number of consecutive timed-out probes to five. These entries specify the following behavior:
The server fault monitor ignores the second consecutive timed-out probe through the fourth consecutive timed-out probe.
In response to the fifth consecutive timed-out probe, the action that the server fault monitor performs is restart.
The entries apply regardless of the state of the connection between the database and the server fault monitor when the timeout occurs.
The state of the connection between the database and the server fault monitor must remain unchanged after the timeout occurs.
When the second consecutive timed-out probe through the fourth consecutive timed-out probe occurs, a message of the following form is printed to the resource's log file:
Timeout #number has occurred.
When the fifth consecutive timed-out probe occurs, the following message is printed to the resource's log file:
Timeout #5 has occurred. Restarting.
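Because the timeout entries follow a fixed pattern, they can be generated instead of typed by hand. The following Python sketch builds the entries for any maximum number of consecutive timed-out probes. The function name timeout_entries is hypothetical, and the sketch uses the TIMEOUT_ERROR value that the entry format defines for the ERROR_TYPE keyword.

```python
def timeout_entries(max_timeouts):
    """Build custom action file entries that allow max_timeouts timed-out probes."""
    if max_timeouts < 2:
        raise ValueError("max_timeouts must be at least 2")
    entries = []
    # The first timed-out probe needs no entry: its action is preset.
    for n in range(2, max_timeouts + 1):
        last = n == max_timeouts
        action = "RESTART" if last else "NONE"
        suffix = " Restarting." if last else ""
        entries.append(
            "{\n"
            "ERROR_TYPE=TIMEOUT_ERROR;\n"
            f"ERROR={n};\n"
            f"ACTION={action};\n"
            "CONNECTION_STATE=*;\n"
            "NEW_STATE=*;\n"
            f'MESSAGE="Timeout #{n} has occurred.{suffix}";\n'
            "}"
        )
    return "\n".join(entries)
```

Calling timeout_entries(5) produces four entries: three with ACTION=NONE for probes 2 through 4, and one with ACTION=RESTART for probe 5.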
A server fault monitor must behave consistently on all cluster nodes. Therefore, the custom action file that the server fault monitor uses must be identical on all cluster nodes. After creating or modifying a custom action file, ensure that this file is identical on all cluster nodes by propagating the file to all cluster nodes. To propagate the file to all cluster nodes, use the method that is most appropriate for your cluster configuration:
Locating the file on a file system that all nodes share
Locating the file on a highly available local file system
Copying the file to the local file system of each cluster node by using operating system commands such as the rcp(1) command or the rdist(1) command
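If you copy the file to each node, a checksum comparison is one way to confirm that the copies are identical. The following Python sketch compares files by SHA-256 digest. The function names are hypothetical, and the sketch assumes that every copy is reachable as a local path, for example after the copies are collected on one host or through a mounted file system.

```python
import hashlib
from pathlib import Path


def file_digest(path):
    """Return the SHA-256 hex digest of a file's contents."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def copies_identical(paths):
    """Return True if every file in paths has the same contents."""
    digests = {file_digest(p) for p in paths}
    return len(digests) <= 1
```

Alternatively, run file_digest on each node and compare the printed digests by hand.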
To apply customized actions to a server fault monitor, you must specify the custom action file that the fault monitor should use. Customized actions are applied to a server fault monitor when the server fault monitor reads a custom action file. A server fault monitor reads a custom action file when you specify the file.
Specifying a custom action file also validates the file. If the file contains syntax errors, an error message is displayed. Therefore, after modifying a custom action file, specify the file again to validate the file.
If syntax errors in a modified custom action file are detected, correct the errors before the fault monitor is restarted. If the syntax errors remain uncorrected when the fault monitor is restarted, the fault monitor reads the erroneous file, ignoring entries that occur after the first syntax error.
On a cluster node, become superuser.
Set the Custom_action_file extension property of the SUNW.oracle_server resource.
Set this property to the absolute path of the custom action file.
# scrgadm -c -j server-resource \
-x custom_action_file=filepath
-j server-resource
Specifies the SUNW.oracle_server resource
-x custom_action_file=filepath
Specifies the absolute path of the custom action file
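If you script this step, checking that the path is absolute before you run the command avoids a rejected property value. The following Python sketch only builds the argument list for the scrgadm command shown above and does not run it. The function name custom_action_file_command is hypothetical.

```python
import posixpath


def custom_action_file_command(server_resource, filepath):
    """Return the scrgadm argument list that sets custom_action_file."""
    # Cluster nodes use POSIX paths, so check with posixpath explicitly.
    if not posixpath.isabs(filepath):
        raise ValueError("custom_action_file must be an absolute path")
    return [
        "scrgadm", "-c", "-j", server_resource,
        "-x", f"custom_action_file={filepath}",
    ]
```

The returned list can be passed to a process launcher such as subprocess.run when the script runs as superuser on a cluster node.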
Upgrade the SUNW.oracle_server resource type if the following conditions apply:
You are upgrading from an earlier version of the Sun Cluster HA for Oracle data service.
You need to use the new features of this data service.
For general instructions that explain how to upgrade a resource type, see “Upgrading a Resource Type” in Sun Cluster 3.1 Data Service Planning and Administration Guide. The information that you require to complete the upgrade of the SUNW.oracle_server resource type is provided in the subsections that follow.
The relationship between a resource type version and the release of Sun Cluster data services is shown in the following table. The release of Sun Cluster data services indicates the release in which the version of the resource type was introduced.
Resource Type Version | Sun Cluster Data Services Release |
---|---|
1 | 1.0 |
3.1 | 3.1 5/03 |
4 | 3.1 10/03 |
To determine the version of the resource type that is registered, use one command from the following list:
scrgadm -p
scrgadm -pv
The resource type registration (RTR) file for this resource type is /opt/SUNWscor/oracle_server/etc/SUNW.oracle_server.
The information that you require to edit each instance of the SUNW.oracle_server resource type is as follows:
You can perform the migration at any time.
If you need to use the new features of the Sun Cluster HA for Oracle data service, the required value of the Type_version property is 4.
If you customized the behavior of the server fault monitor, set the Custom_action_file extension property. For more information, see Customizing the Sun Cluster HA for Oracle Server Fault Monitor.
The following example shows a command for modifying an instance of the SUNW.oracle_server resource type.
# scrgadm -cj oracle-rs -y Type_version=4 \
-x custom_action_file=/opt/SUNWscor/oracle_server/etc/srv_mon_cust_actions
This command modifies a SUNW.oracle_server resource as follows:
The SUNW.oracle_server resource is named oracle-rs.
The Type_version property of this resource is set to 4.
Custom behavior for the fault monitor of this resource is specified in the file /opt/SUNWscor/oracle_server/etc/srv_mon_cust_actions.