Oracle® Fail Safe Concepts and Administration Guide
Release 3.3.1 for Windows
Part No. A96684-01
Oracle Fail Safe provides high availability for single-instance Oracle databases (both Standard and Enterprise editions) running on Windows clusters configured with MSCS.
By making a single-instance Oracle database highly available, you ensure that even when a cluster node is shut down or fails, applications that access that database will suffer only a momentary loss of connection with the database while the database is restarted on another cluster node. Applications can automatically reconnect to the database after such a failover event occurs using transparent application failover, resulting in a failover that is not apparent to users.
This chapter discusses the following topics:
|Discovering Standalone Single-Instance Databases||Section 7.1|
|Oracle Net Configuration for Standalone Single-Instance Databases||Section 7.2|
|Adding Single-Instance Oracle Databases to a Group||Section 7.3|
|Oracle Net Listener Resource Creation and Configuration||Section 7.4|
|Security Requirements for Single-Instance Databases||Section 7.5|
|Configuring a ctxsrv Server for High Availability||Section 7.6|
|Integrating with Oracle Enterprise Manager||Section 7.7|
|Optimizations for Single-Instance Database Recovery||Section 7.8|
|Performing Administrative Tasks on a Single-Instance Fail-Safe Database||Section 7.9|
|Configuring Transparent Application Failover (TAF)||Section 7.10|
|Handling Errors and Troubleshooting Problems with Databases||Section 7.11|
Oracle Services for MSCS discovers standalone single-instance databases (those that are not in a group) to display them in the Oracle Fail Safe Manager tree view. Standalone single-instance databases created with the Create Sample Database command (sample databases) and standalone single-instance databases created using other methods are discovered in different ways, as follows:
Sample standalone single-instance databases are discovered by looking in the Windows registry under the key:
Other standalone single-instance databases
Other standalone single-instance databases (those not created with the Create Sample Database command) are discovered by parsing the tnsnames.ora file on each cluster node and looking for valid net service name entries. Standalone single-instance databases are discovered when the following conditions are true:
The standalone single-instance database has an instance on a cluster node.
The standalone single-instance database has a valid entry in the tnsnames.ora file, including:
A SID that matches the database instance on the node or a SERVICE_NAME that matches the service name in the database parameter file.
A host name or IP address (Oracle Fail Safe does not support the use of alias network names.)
If no valid net service name entry can be found for an instance, then the instance name is used to represent the database instance in Oracle Fail Safe Manager.
The following sections briefly summarize the Oracle Net configuration for standalone single-instance databases.
If you use the Database Configuration Assistant (DBCA) to create a standalone single-instance database, DBCA adds information about the new database in the Oracle Net configuration, as follows:
After DBCA configures the Oracle Net information, Oracle Fail Safe displays the new single-instance database in the Oracle Fail Safe Manager tree view (under Standalone Resources for the cluster node on which the database was created).
If there are multiple Oracle homes on the cluster node where you created the database, run the Verify Standalone Database command in Oracle Fail Safe Manager on the new database. Oracle Fail Safe checks the tnsnames.ora files in all Oracle homes. When it detects that a tnsnames.ora file does not contain the net service name entry for the database, it asks if you want to update that file. If you select Yes, Oracle Fail Safe adds a net service name entry for the new database.
If the default domain name values (for example, values of names.default_domain parameters in the sqlnet.ora files) are different across Oracle homes, the net service name entry is not accessible from some Oracle homes. To resolve this problem, edit the tnsnames.ora file in each Oracle home and append the default domain name of the respective Oracle home to the net service name entry.
Beginning with Oracle8i release 8.1.5, the behavior of the Oracle Net listener changed. If the system host name is used in the definition of an Oracle8i release 8.1.5 (or later) listener, then this listener listens on all IP addresses on that node, not just the IP address associated with the host name. This causes conflicts later when you add a database to a group and listeners configured with Oracle Fail Safe are defined to listen on a virtual address.
To work around this change, the listener must use the node IP address for its host entry instead of the host name. When you add a single-instance database to a group, if Oracle Fail Safe finds an Oracle8i release 8.1.5 (or later) listener using a host entry, you are asked if you want Oracle Fail Safe to modify this entry to use an IP address. If you do not, then the Add Resource to Group operation does not continue.
The following is an example of an invalid entry in an Oracle Fail Safe environment:
LISTENER = .... (ADDRESS= (PROTOCOL=TCP) (HOST=NTCLU-152) (PORT=1521) )
The following is an example of a valid entry in an Oracle Fail Safe environment:
LISTENER = .... (ADDRESS= (PROTOCOL=TCP) (HOST=18.104.22.168) (PORT=1521) )
If you upgrade Oracle database server software from Oracle7 or Oracle8 to Oracle8i or Oracle9i, you must use the Oracle8i listener or Oracle9i listener, and you must move the entries in the SID list of the Oracle7 or Oracle8 listener to the Oracle8i or Oracle9i listener, as follows:
Locate the listener.ora files of the Oracle8i or Oracle9i release and the Oracle7 or Oracle8 release. For example:
The listener.ora file for Oracle release 7.3.4, Oracle8i, and Oracle9i is in the <Oracle_Home>\NETWORK\ADMIN directory.
The listener.ora file for Oracle8 is in the <Oracle_Home>\NET80\ADMIN directory.
Determine the SID list of the Oracle7 or Oracle8 listener.
For example, for the Oracle8 listener in <Oracle_Home>\NET80\ADMIN\listener.ora you might find the following:
SID_LIST_LISTENER = (SID_LIST= (SID_DESC= (SID_NAME=ORCL) ) (SID_DESC= (SID_NAME=OFS2) ) )
Copy the SID_DESC entries in the SID list in step 2 and add them to the SID list of the Oracle8i or Oracle9i listener. For example, the updated SID list in the <Oracle_Home>\NETWORK\ADMIN\listener.ora file will be as follows:
SID_LIST_LISTENER = (SID_LIST = (SID_DESC = (SID_NAME = PLSExtProc) (ORACLE_HOME= c:\oracle) (PROGRAM = extproc) ) (SID_DESC= (SID_NAME=ORCL) ) (SID_DESC= (SID_NAME=OFS2) ) )
Stop the listener of the Oracle7 or Oracle8 database if it is started. In addition, change the startup state of the Oracle7 or Oracle8 listener to manual. (To make this change, use the Windows Control Panel to access the Services window.)
Start the Oracle8i or Oracle9i listener.
If you do not use the Oracle8i or Oracle9i listener, you will run into problems when you perform the following Oracle Fail Safe operations:
Add database resource to group
Remove database resource from group
When Oracle Fail Safe searches for a standalone database listener, it looks at the listeners defined in all Oracle homes. When multiple Oracle servers are installed (in one or more Oracle homes), it is possible to have more than one listener. Even when there are multiple listeners, Oracle Fail Safe can find the standalone database listener.
The state of a listener defined in a listener.ora file affects the result when Oracle Fail Safe searches for the listener of a standalone database. The following list shows the order in which Oracle Fail Safe looks for the listeners:
Listener is started.
Listener is stopped.
Listener has no Windows service defined.
For example, suppose you have a database with two listeners in two different Oracle homes, and assume that the listener in Home 1 is stopped and the listener in Home 2 is started. When Oracle Fail Safe looks through the two homes on the system, it finds the listener in Home 2 because in the first pass, Oracle Fail Safe looks only for listeners that are started. As soon as a started listener is found, Oracle Fail Safe stops looking. If a started listener is not found, Oracle Fail Safe looks through the list for stopped listeners, and so on.
Note:Make sure that the listeners of standalone single-instance databases are in the state (stopped or started) that you intend before you run any Oracle Fail Safe operations.
When any database is configured for high availability, Oracle Fail Safe makes adjustments to the default listener. This affects the Oracle Net configuration for all databases, including standalone databases. Therefore, all standalone databases in an Oracle Fail Safe environment require some adjustments to the Oracle Net configuration if any database in the cluster has been made highly available.
If the shared server configuration for standalone single-instance Oracle8i and Oracle9i databases relies on the default listener, then no listener parameters are specified in the database parameter file. (The default listener is a listener that listens on the host name of the node, the default port number, and TCP protocol.) In this case, the configuration will no longer work after Oracle Fail Safe has changed the default listener to use an IP address in place of the host name.
Add the LOCAL_LISTENER parameter to the database initialization parameter file. The LOCAL_LISTENER parameter specifies a network name that resolves to an address of the Oracle Net default listener.
Locate the database initialization parameter file of the Oracle8i or Oracle9i database. Add the LOCAL_LISTENER parameter to the file.
LOCAL_LISTENER = <network-name>
Determine the address of the Oracle Net default listener.
Find the definition of the default listener in the listener.ora file of the Oracle8i or Oracle9i home. In the definition, identify the first address that uses the TCP protocol.
For example, assume that the default listener is defined as follows:
LISTENER = (DESCRIPTION_LIST= (DESCRIPTION= (ADDRESS_LIST= (ADDRESS= (PROTOCOL=TCP) (HOST=22.214.171.124) (PORT=1521) ) ) ) )
Then the first address is:
(ADDRESS_LIST= (ADDRESS= (PROTOCOL=TCP) (HOST=126.96.36.199) (PORT=1521)
Create a <network-name> entry in the tnsnames.ora file.
In the tnsnames.ora file, create an entry for the <network-name> using the address found in step 2.
In this example, the entry is as follows:
<network-name>= (ADDRESS= (PROTOCOL=TCP) (HOST=188.8.131.52) (PORT=1521) )
This change will take effect when the database is restarted.
To configure a single-instance Oracle database for high availability, you add it to a group that currently contains at least one virtual address. Oracle Fail Safe adds all other resources that the single-instance Oracle database requires. Typically, the group includes the following resources:
One or more virtual addresses, each of which consists of an IP address and network name
The Oracle database server instance
All disks used by the Oracle database
An Oracle Net network listener that listens on the virtual address (or addresses) of the group for connection requests to the databases in the group
An Oracle Intelligent Agent configured to use one of the group's virtual addresses (if Oracle Enterprise Manager will be used to manage the database)
All files used by the single-instance database must be on the shared cluster disks, with the exception of the database initialization parameter file, which can be placed on a private disk or on a shared cluster disk. See Section 184.108.40.206 for more information on the placement of the initialization parameter file.
Resources must belong to one group only. Therefore, if two single-instance databases share the same disk drives, then both databases must be in the same group.
In a failover, the data in a temporary table does not fail over. Operations that involve the use of temporary tables and tablespaces (such as sorts and hash joins) re-create any needed temporary objects when restarted on the failover node; however, you should review applications that rely on the existence of specific data in temporary tables to be sure they function as expected.
Refer to the Temporary Tables discussion in the Oracle9i Concepts manual for more information about temporary tables.
The group must contain at least one virtual address.
Table 7-1 provides a quick reference to the tasks needed to configure a single-instance Oracle database for high availability. For detailed instructions about a particular task, see the online help and tutorial. From the Oracle Fail Safe Manager menu bar, choose Help, then "Search for Help on" or Help, then Tutorial for step-by-step instructions.
Table 7-1 Steps for Configuring Databases
|Step||Procedure||Oracle Fail Safe Manager Procedure|
|1||Ensure that the Oracle database server software is installed on a private disk on each node in the cluster that you intend to be a possible owner for the Oracle database.||See the Oracle database server documentation for installation information.|
|2||Create a group and add one or more virtual addresses.||On the Groups menu, choose Create to open the Create Group Wizard. The wizard helps you to set up failover and failback policies and automatically opens the Add Resource to Group Wizard to allow you to add a virtual address to the group. (Choose Resources, then Add to Group to add additional virtual addresses to the group.)|
|3||Create a sample database, if desired.||On the Resources menu, choose Create Sample Database to create a sample standalone single-instance database on which you can try out the features of Oracle Fail Safe before using them on a production database. Do not use the sample database for production work.|
|4||Verify the standalone database.||On the Troubleshooting menu, choose Verify Standalone Database to validate the database and Oracle Net configuration for the database. This command makes sure that Oracle Fail Safe can attach to the database, and confirms that the standalone database is located on a cluster disk.|
|5||Add the Oracle database to the group.||On the Resources menu, choose Add to Group, and then select Oracle Database to open the Add Resource to Group Wizard. The wizard helps you configure the single-instance Oracle database server for high availability.|
|6||Modify the tnsnames.ora file on each client system.||Configure clients (modify the tnsnames.ora file on each client system using a network configuration tool) to recognize the virtual server. See Section 7.4 for more information.|
Oracle Fail Safe Manager provides the Add Resource to Group Wizard to assist you in configuring a single-instance Oracle database for high availability. The pages presented in the wizard vary, depending on the number of virtual addresses currently in the group, and the number of nodes in the cluster.
Typically, one virtual address is in a group; more complex configurations might have more than one virtual address. To perform a typical configuration using the Add Resource to Group Wizard, you need the following data:
Identity of the single-instance Oracle database, including the service name, instance name, database name, and specification for the database initialization parameter file
The database SYSDBA (usually, SYS or INTERNAL) account and password
If you are adding a database to a group that currently contains more than one virtual address, you are also asked to specify which virtual address or addresses you want clients to use to access the database.
The following sections describe in detail the configuration requirements for single-instance databases.
Note:Support for the INTERNAL account is provided for Oracle databases created prior to Oracle9i.
If you are adding a database to a group and the cluster consists of more than two nodes, you are asked to specify which nodes should be possible owners for the database by specifying a list of selected nodes, as shown in Figure 7-1. To specify that a particular node should not be a possible owner for the database, select the node from the Selected Nodes list and click the left arrow.
Section 2.6.6 describes in detail the concept of the possible owner nodes list.
Figure 7-1 Choose Nodes Wizard Page When All Nodes Are Available
If you are adding a single-instance database to a group and the cluster consists of two or more nodes, but one or more nodes are unavailable, you are also asked to specify which nodes should be possible owners for the database. In this case, the wizard page displays which nodes are unavailable and why, as shown in Figure 7-2.
Figure 7-2 Choose Nodes Wizard Page When Any Node Is Unavailable
If the group to which you are adding a single-instance database contains more than one virtual address, the Add Resource to Group Wizard asks you which of the virtual addresses in the group you want clients to use when they access the database or databases in the group, as shown in Figure 7-3. This page is not displayed if the group to which you are adding a database contains only one virtual address.
Figure 7-3 Database Virtual Address Wizard Page
Oracle Fail Safe includes support for multiple virtual addresses in a group. All databases in a group must use the same virtual addresses, and the virtual addresses must be added to the group before you add the databases to the group. The sequence for building a group is as follows:
Create a group.
Add one or more virtual addresses to the group.
Add one or more single-instance databases to the group.
For example, if a group contains a database that is using two virtual addresses and you add a second database to the group, the second database must use the same virtual addresses as the first database that was configured into the group. Oracle Fail Safe Manager checks to ensure that the same virtual addresses are used for all single-instance databases that you add to a group.
See Section 4.7 for information about configuring a resource in a group with multiple virtual addresses.
The Add Resource to Group Wizard requests database identity information to uniquely identify the single-instance database that is being configured for high availability, as shown in Figure 7-4.
Figure 7-4 Database Identity Wizard Page
Oracle Fail Safe uses this data to configure the database into the cluster (for example, to update the tnsnames.ora file). It also passes the data that you supply to MSCS, where it is registered for use when the database is brought online, taken offline, or when Is Alive polling is performed. Oracle Fail Safe requests the following information:
This is the net service name. (Prior to Oracle8i this was called a service name.) This is the name that will appear in the Oracle Fail Safe Manager tree view and the MSCS tree view. This is the name that client applications specify in a connection request.
If you do not specify a domain name in the Oracle Net service name, Oracle Fail Safe will choose a domain name to append to the net service name, as described in Section 220.127.116.11.
This is the name of the database instance, also referred to as a SID.
This is the db_name parameter used to identify the database in the initialization parameter file. It is the name that was used when the database was created (for example, in the SQL CREATE DATABASE statement).
When an Oracle database server starts up, it uses the initialization parameter file to specify the name of the database, the amount of memory to allocate, the names of control files, and various limits and other system parameters.
In most cases, you place the parameter file on a cluster disk so that it can be accessed regardless of which cluster node is currently hosting the database. However, you can place a copy of the initialization parameter file on each node's private disk, if you make sure that the file exists at the same location on all cluster nodes that are configured to run a database. You might decide to place the parameter file on each node's private disk to set different parameters for the database, depending on which node is hosting it. This can be useful if some nodes have less memory or processing capabilities than others. See Section 18.104.22.168.2 for special considerations for using this method on an Oracle9i database that was created with Database Configuration Assistant.
Oracle Fail Safe requires that a text initialization parameter file (PFILE) be specified in the Parameter File field. To use a binary server parameter file (SPFILE) with Oracle9i databases configured for high availability, specify the location of the SPFILE from within the PFILE using the SPFILE=<SPFILE-location> parameter. For example, the contents of the PFILE might include the following parameters:
(If you specify an SPFILE in the PFILE that Oracle Fail Safe uses, be careful if and when you export the SPFILE. If you use a CREATE PFILE FROM SPFILE command without including file specifications, you will overwrite the PFILE that Oracle Fail Safe is using. Therefore, be sure to specify a unique file name for the PFILE to which the SPFILE is exported. See the Oracle9i Database Administrator's Guide for detailed information on server parameter files.)
When you use Database Configuration Assistant to create an Oracle9i database, a text initialization parameter file (init.ora) is created in the <Oracle_Home>\database directory. This file contains the IFILE parameter. The IFILE parameter value is the file specification of another text initialization parameter file, which contains the initialization parameters of the database.
The file specification of the initialization parameter file IFILE parameter is stored in the Windows registry. By default, Oracle Fail Safe displays the file specification of the initialization parameter file that contains the IFILE parameter (except for databases created with the Create Sample Database command) in the Parameter File field. Databases created with the Create Sample Database command will display the initialization parameter file that the IFILE parameter specifies in the Parameter File field.
Oracle Corporation recommends that in a cluster environment, you enter the file that the IFILE parameter specifies in the Parameter File field. This provides the most direct route to the database initialization parameters. In addition, if you choose not to do this and you also choose to keep the parameter file on a private disk, then you must remember to copy both the initialization parameter file containing the IFILE parameter and the file that the IFILE parameter specifies to the private disk of each cluster node.
The Authentication page is presented if the account under which Oracle Services for MSCS was installed is not in one of the following Windows operating system groups: the ORA_DBA group or the ORA_<SID>_DBA group associated with the database. When the account under which Oracle Services for MSCS is in the ORA_DBA group or the ORA_<SID>_DBA group, it can use operating system authentication to access the database. If the account is not a member of the ORA_DBA group or the ORA_<SID>_DBA group, it must use the SYS account to access the database.
This page allows you to specify whether Oracle Services for MSCS should use operating system authentication or the SYS (or INTERNAL) account to access the database and its instances, as shown in Figure 7-5.
Note:The Database Authentication page is not presented if the account under which Oracle Services for MSCS was installed is already a member of a group that will allow it to access the database using operating system authentication.
To specify operating system authentication, select "Use operating system authentication." To specify the database SYS account, select "Use SYS account," select SYS, then enter and confirm the password for the account. For databases created with releases of Oracle server prior to Oracle9i, you can select the INTERNAL account instead of the SYS account in the User Name box.
Figure 7-5 Database Authentication Wizard Page
If you select the "Use operating system authentication" option, then Oracle Services for MSCS opens the Confirm Add to DBA Group window, as shown in Figure 7-6.
The text in Figure 7-6 indicates that Oracle Services for MSCS was installed under the nedcdomain\cluadmin account. If you click Yes, the nedcdomain\cluadmin account will be added to the Windows operating system group ORA_OFS2_DBA.
Figure 7-6 Confirm Add to DBA Group Window
If Oracle Services for MSCS detects that the standalone database has a password file associated with it, then the Add Resource to Group Database Wizard asks if you want Oracle Services for MSCS to create the password file on all nodes that will be possible owner nodes for the fail-safe database, as shown in Figure 7-7.
Oracle Corporation recommends that you select the "Yes, create the password file" option. A password file is often required when you perform remote operations. For example, Recovery Manager (RMAN) requires a password file when connecting to the target database over a nonsecure Oracle Net connection.
If you select the "No, do not create the password file" option, all users must access the database using operating system authentication, and users will not be able to perform remote database administration operations.
Note:If Oracle Services for MSCS does not detect password files for the database that you are adding to a group, then it does not present the Database Password page.
Figure 7-7 Database Password Wizard Page
Oracle Services for MSCS makes the following adjustments to the database initialization parameter file (if needed), depending on whether or not you choose to have Oracle Services for MSCS create the password file on all cluster nodes that are possible owner nodes for the database:
Yes, create the password files
No, do not create the password files
Note:Oracle Services for MSCS does not support setting the Windows registry DBA_AUTHORIZATION parameter to the value of BYPASS.
If you want to change the password for the SYS (or INTERNAL) account after the database has been added to a group, you must also update the password through Oracle Fail Safe Manager. See Section 7.5.2 for information on how to update the password for this account after the database has been added to a group.
When you add a single-instance database to a group, Oracle Fail Safe creates and configures the Oracle Net listener resource and the database resource in the group. Prior to release 3.1, Oracle Fail Safe made the database resource dependent on the listener resource. If the listener failed (for example, while under a heavy load), the cluster took the database resource offline. However, the database resource does not require the listener to be online; the listener is required for the client to be able to connect to the database.
Therefore, starting with release 3.1, Oracle Fail Safe does not create a dependency between the database and listener resources. If Oracle Fail Safe finds a database resource that is dependent on a listener, it will remove the dependency. When it brings the database online, Oracle Fail Safe first makes sure that the listener resource is online. If Oracle Fail Safe is not able to connect to the database through the listener during Is Alive polling of the database resource, it will use the bequeath protocol adapter to connect to the database; it will also log an event to notify users of the failure. If you see this event, check the listener resource. The failure of the listener resource does not affect the existing connections to the database. It does, however, prevent new users from connecting to the database. The listener resource has its own restart policy, so it will be restarted automatically by the cluster in the event of failure.
Starting with release 3.2.1, Oracle Fail Safe creates a dependency between the database and the IP address associated with the listener (but not on the listener itself). This dependency was created to avoid a situation in which clients would hang when an IP address was taken offline before the database.
You can set the USE_SHARED_SOCKET parameter to true to enable the use of shared sockets. If this parameter is set to true, the network listener passes the socket descriptor for client connections to the database thread. As a result, the client does not need to establish a new connection to the database thread and database connection time improves. Also, all database connections share the port number used by the network listener, which can be useful if you are setting up third-party proxy servers.
Note:This feature works for Oracle9i listeners and databases only. If you set the USE_SHARED_SOCKET parameter to true in Oracle homes prior to Oracle9i, the following error will be returned when you add or remove a database from an Oracle Fail Safe group:
Network objects (including databases) are identified by a network address. For a connection between a client and a database to be made, the network address in the tnsnames.ora file on the client and the network address in the listener.ora file on the server must match. In other words, a client uses a network address to send a connection request to a particular network object location, and the recipient listens for requests on this address and grants a connection based on its address information matching its client information.
When you add a single-instance database to a group, Oracle Fail Safe creates a listener for the group in the same Oracle home where the database resides. When Oracle Fail Safe configures the virtual address information, it updates the tnsnames.ora files in all Oracle homes on cluster nodes that are possible owners for the database, and on the client system from which you are running Oracle Fail Safe Manager. This allows Oracle Fail Safe to access the database instance using the updated configuration.
Section 7.4.3 describes how Oracle Fail Safe creates an entry in the listener.ora file and updates the tnsnames.ora file after you add a database to a group so that clients can connect to the database, regardless of which cluster node is hosting the database.
Note:Oracle Fail Safe does not support the use of the TNS_ADMIN Windows environment variable or registry parameter. Oracle Fail Safe retrieves and updates Oracle Net files in the <Oracle_Home>\network\admin directory; it ignores the TNS_ADMIN Windows environment variable or registry parameter if either is specified.
When you add a single-instance database to a group, Oracle Fail Safe changes the Oracle Net configuration for the database in the tnsnames.ora file, the listener.ora file, and the sqlnet.ora file as described in the following sections.
When you add a single-instance database to a group, Oracle Fail Safe updates the net service name entry in the tnsnames.ora file for the database to use the virtual addresses of the group. If there are multiple Oracle homes on the node, all tnsnames.ora files are updated. In addition:
Oracle Fail Safe may make adjustments to the net service name, and the SERVICE_NAME parameter or the SID parameter.
When you create a sample single-instance database or add a single-instance database to a group, if you do not specify a domain name in the Oracle Net service name, Oracle Fail Safe chooses a domain name to append to the net service name as follows:
Oracle Fail Safe looks for the default domain name in the Oracle home of the highest database version on the node. If found, this default domain name is appended to the net service name. For example, assuming Oracle9i is the highest database version on the node, if you specify "MyDB" as the Oracle Net service name, and the default domain name in the Oracle9i home is "us.oracle.com", then the net service name will become "MyDB.us.oracle.com".
If there is no default domain name in the Oracle home of the highest database version on the node, then Oracle Fail Safe appends nothing to the net service name. For example, if you specify "MyDB", then the net service name will also be "MyDB".
In either case, once the net service name is decided, Oracle Fail Safe checks the tnsnames.ora file for an existing entry with the same name and updates the tnsnames.ora file, as follows:
If an entry for the net service name exists, then Oracle Fail Safe updates the entry so that it connects to the virtual host. Oracle Fail Safe does not change the CONNECT_DATA parameter, including the SERVICE_NAME (which was added in Oracle8i).
If an entry does not exist, then Oracle Fail Safe writes a new entry to the tnsnames.ora file. In this case, the SID is used in the CONNECT_DATA parameter, even if the target is an Oracle8i or later database. (Users can still connect to an Oracle8i or later database using a net service name that refers to the SID, rather than the service name.)
On server nodes, addresses that use virtual host names configured only for internal cluster communications will be placed at the top of the address list. Therefore, applications running on the server nodes, including the database resource monitor, will attempt to use addresses that use the private network interconnect among the cluster nodes before using addresses that use the public interconnect for connections to the database resource.
On client nodes, addresses that use virtual host names configured only for internal cluster communications will not be included in the address list.
Note:Oracle Fail Safe updates the tnsnames.ora files on all cluster nodes (that are possible owners for the single-instance database) and on the client node where you are running Oracle Fail Safe Manager. If you need to enable remote clients (that are not running Oracle Fail Safe) to process work against a single-instance Oracle database through a cluster node, you must edit the tnsnames.ora files to update the host name with the virtual address information. Edit each client's local tnsnames.ora file using a network configuration tool.
When you add a single-instance database to a group, Oracle Fail Safe makes the following changes to the listener.ora file:
Creates a new Oracle Fail Safe listener that is configured to listen on the virtual address associated with the single-instance database
Adds the SID_DESC parameter to the new Oracle Fail Safe listener
Stops and restarts the standalone database listener to accept the changes that have been made
Starts the new Oracle Fail Safe listener
When you add a single-instance database to a group, if operating system authentication has been chosen for the database, then Oracle Fail Safe adds the SQLNET.AUTHENTICATION_SERVICES=(NTS) parameter to the sqlnet.ora file (assuming the parameter is not already set).
Oracle Fail Safe configures the address of an external procedure in the Oracle Net listener definition for a group (the listener.ora file). As the first database is added to a group, Oracle Fail Safe determines whether or not external procedures are configured in the original Oracle Net listener serving the database. If they are configured, Oracle Fail Safe creates both an IPC listener address and a SID descriptor (SID_DESC) in the listener.ora file for the group.
Oracle Fail Safe creates the key of the IPC address by concatenating the prefix "EXTPROC" to the network name of the first virtual address in the group. Oracle Fail Safe uses the alphabetic order of the virtual address network names in the group to determine which network name is first. Therefore, if the network name of the first virtual address is ntclu45, then the key value is EXTPROCntclu45 and the IPC address entry will be as follows:
(ADDRESS= (PROTOCOL=IPC) (KEY=EXTPROCntclu45) )
Oracle Fail Safe copies the SID descriptor information from the original Oracle Net listener for the database to the Oracle Net listener for the group. For example, if the original Oracle Net listener for the database has the following SID descriptor, then the Oracle Net listener for the group will have the same descriptor:
(SID_DESC= (SID_NAME=PLSExtProc) (ORACLE_HOME = C:\Oracle\Ora) (PROGRAM=extproc) )
When a PL/SQL or SQL application calls an external procedure, the application specifies the EXTPROC_CONNECTION_DATA net service name entry in the tnsnames.ora file. This entry contains the information needed by the listener to start a process for the external procedure. The IPC address of the Oracle Net listener in a group is added to the EXTPROC_CONNECTION_DATA net service name entry in all tnsnames.ora files on the cluster.
Note:When you set up a database to use a shared servers configuration, you need to make sure that Oracle Fail Safe can continue to use a dedicated server connection for its internal operations. You do this by specifying the (SERVER=DEDICATED) parameter in the connect data portion of the net service name entry for the database in the tnsnames.ora file on each cluster server node. (By default, if shared servers are used and no SERVER parameter is specified, the listener establishes a connection using shared servers.)
Oracle Fail Safe supports single-instance Oracle8 databases that use a shared servers configuration. However, Oracle Fail Safe does not automatically update the database initialization file where the shared servers configuration is defined.
You can configure a standalone single-instance database or a single-instance database that is currently a resource in a group to use shared servers. In both cases, you must update the database initialization file by performing the following steps:
Determine the listener parameters for the group containing the single-instance database using the shared servers configuration, as follows:
Find the listener.ora file in the Oracle Net configuration directory in the Oracle home where the database resides. (See Section 7.2.3 for examples of where the listener.ora files can be found for each Oracle release.)
Search the listener.ora file for the SID of the database and find the first listener address for the group using the TCP protocol.
For example, the boldface text in the following listener.ora file shows the first listener address of the group:
LISTENER = (Entries for default Listener) (ADDRESS_LIST = . . . Fslvirtualnode = (Entries for Fail Safe Listener) (ADDRESS_LIST= (ADDRESS= (PROTOCOL=IPC) (KEY=OFS1) ) (ADDRESS= (PROTOCOL=IPC) (KEY=805mts.world) ) (ADDRESS= (PROTOCOL=TCP) (Host=virtualnode) (Port=1521) ) (ADDRESS= (PROTOCOL=TCP) (Host=virtualnode) (Port=1526) ) ) SID_LIST_Fslvirtualnode = (SID_LIST= (SID_DESC= (SID_NAME=OFS1) ) )
Find the first address in the file that includes the line (PROTOCOL=TCP), and format the address parameters into a single line. For example:
Update the database initialization file (for example, initofs1.ora) to use the listener parameters for the group. To do this, perform the following steps:
Open the database initialization parameter file.
Note that the initialization parameter file for the database may reside on a disk on the shared interconnect, for example:
If each node of the cluster has its own copy of the file on its private disk, then you need to update all copies.
In the database initialization parameter file, search for the line containing the following parameter:
Replace the value of the mts_listener_address parameter with the listener address that you formatted in step 1c.
For example, assume the original mts_listener_address parameter contains the following value:
mts_listener_address = "(ADDRESS=(PROTOCOL=TCP)(HOST=node1)(PORT=1521))"
Replace the line, as follows:
mts_listener_address = "(ADDRESS=(PROTOCOL=TCP)(HOST=virtualnode)(PORT=1521))"
Save the database initialization file.
Check that value of the mts_service parameter is the database SID.
The Database Configuration Assistant might use the database name for the mts_service parameter value. If so, change the value to the database SID.
Stop and restart the resources in the group.
To have your changes take effect, use Oracle Fail Safe Manager or the FSCMD command to take the group that contains the database offline and then place it back online. This stops and restarts all resources in the group.
To use a shared server configuration in an environment with Oracle8i or Oracle9i and Oracle Fail Safe, you might be required to make modifications to the database parameter files of the Oracle8i or Oracle9i databases.
Note:Starting with Oracle release 9.0.1, the MTS_DISPATCHERS parameter is named the DISPATCHERS parameter. The MTS_DISPATCHERS parameter is retained for backward compatibility; however, for Oracle9i, Oracle Corporation recommends that you use the DISPATCHERS parameter.
If the shared server configuration uses the LOCAL_LISTENER parameter to specify full listener information (full listener information specifies both host and port values), then Oracle Fail Safe automatically updates the database parameter file for the shared server configuration during the Add Resource to Group operation.
The single-instance Oracle8i or Oracle9i database will run in shared server mode after you add it to a group. You do not need to make any further changes to the database parameter file.
The following example shows a shared server configuration that will be updated automatically by Oracle Fail Safe:
dispatchers = "(PROTOCOL=TCP)(DISPATCHERS=1)" local_listener = "(ADDRESS=(PROTOCOL=TCP)(HOST=22.214.171.124)(PORT=1521))"
However, if the shared servers configuration uses the DISPATCHERS parameter to specify full listener information, then you must remove the host and port values from the DISPATCHERS parameter. Oracle Fail Safe always writes the LOCAL_LISTENER parameter to the database parameter file.
When you remove a database from a group using Oracle Fail Safe Manager, it deletes the LOCAL_LISTENER parameter from the database initialization file. You must add the parameter back into the database initialization file by following the instructions in Section 7.2.5.
When you create a single-instance sample database or add a single-instance database to a group, Oracle Fail Safe must use operating system authentication or the SYS user account to access the database. Use an authentication password file and set the initialization parameter, REMOTE_LOGIN_PASSWORDFILE, in the database initialization parameter file (init<database-name>.ora) to either SHARED or EXCLUSIVE if users will access the database using the SYS account. Set the REMOTE_LOGIN_PASSWORDFILE to NONE if users will only access the database using operating system authentication.
Note:Oracle Fail Safe does not support setting the Windows registry DBA_AUTHORIZATION parameter to the value of BYPASS.
Refer to the Oracle9i Administrator's Guide for more information about database administrator authentication and the REMOTE_LOGIN_PASSWORDFILE parameter.
Therefore, if you add an account to the password file on one cluster node, then you must add that account to the password file on the other cluster nodes that are configured to run the database instance. If there are accounts in addition to SYS stored in a password file, then you must grant SYSOPER and SYSDBA privileges for the additional accounts on the other cluster nodes for a single-instance fail-safe database.
Note:Support for the INTERNAL account is provided for Oracle databases created prior to Oracle9i. References in this section to the SYS account also apply to the INTERNAL account.
If you add a single-instance database to a group with the Oracle Fail Safe Manager Add Resource to Group Wizard, Oracle Services for MSCS creates a database instance on the other nodes that are configured to run the database and uses the default value for the maximum number of users in the password file. The password file on the node where the instance is created only contains the password for the SYS account that you supply in the Add Resource to Group Wizard.
On the other nodes configured to run the database instance, perform the following steps to synchronize the password files on the other cluster nodes:
If the number of accounts in the password file exceeds the default maximum, create a new password file. Otherwise, skip to step 2.
To create a new password file, refer to instructions about creating password files in the Administrator's Guide for your Oracle database release.
Move the group containing the single-instance database to another node configured to run the database instance.
Grant privileges to accounts other than SYS on the node to which you moved the database.
Repeat steps 2 and 3 for each node in the cluster configured to run the database.
Now the local copies of the password file are identical on all nodes configured to run the database instance.
If you want to change the SYSDBA (SYS or INTERNAL) account password for one or more databases after those databases have been added to groups, use Oracle Fail Safe Manager to make this change so that Oracle Services for MSCS will use the correct password to access the database when it performs operations such as Is Alive polling.
Note:Support for the INTERNAL account is provided for Oracle databases created prior to Oracle9i. References in this section to the SYS account also apply to the INTERNAL account.
To change the SYSDBA password for several databases, Oracle Corporation recommends that you use the Update Database Password Wizard. Access the Update Database Password Wizard in Oracle Fail Safe Manager by choosing Resources
-> Update Database Password.
To change the SYSDBA password for a single database, Oracle Corporation recommends that you use the database Authentication tab. From the tree view, choose the fail-safe database for which you want to change the password, then choose the Authentication tab.
If operating system authentication is enabled (without the use of a password file), then you cannot change the SYSDBA account information using Oracle Fail Safe Manager unless you first remove the database from the group, then add it to the group again with the Add Resource to Group Wizard. When you add the database to the group again, select Use SYS Account on the Database Authentication page of the wizard.
See the online help for the following additional information:
Detailed instructions on using the Update Database Password Wizard and the Authentication property page.
Information on using these tools with releases of Oracle Services for MSCS prior to release 3.3.1.
Information on using these tools when the SYSDBA password has already been changed with a tool other than Oracle Fail Safe Manager.
This section describes how to use the Oracle Database Upgrade Assistant to upgrade a single-instance fail-safe database from one release to another or to move a single-instance Oracle database from one Oracle home to another.
For each single-instance database that you want to upgrade or move to a new home, perform the following steps:
Remove the single-instance database from the group. In the Oracle Fail Safe Manager tree view, select the database. Then choose:
-> Remove from Group
Run the Oracle Database Upgrade Assistant from the Oracle home to which you are moving or upgrading your single-instance database.
Be prepared to provide the location of the database parameter file for the single-instance database you are upgrading. During a database upgrade, the database parameter file is converted. If the database parameter file is on a cluster disk, then your parameter file is appropriately located for Oracle Fail Safe to make the conversion. If the database parameter file is located on a private disk, then the Oracle Database Upgrade Assistant only converts the local copy. In this case, you must edit the copy on the other cluster nodes and make the appropriate changes.
The Oracle Database Upgrade Assistant also asks you to specify the location of the converted database files. Either leave the data files in their current location, or specify a cluster disk that is currently accessible by the local node. If you choose the latter, make sure the cluster disk is not being used by another group.
If you are upgrading an Oracle7 database to an Oracle8 database, the Oracle Database Upgrade Assistant creates a new data file called
<Oracle_Home>\database\mig<SID>.ora on a private disk, where SID is the database instance name. Move this new data file to a cluster disk with your other data files, as follows:
Use SQL*Plus to connect to the database, then shut it down.
Copy the <Oracle_Home>\database\mig<SID>.ora to a cluster disk where <SID> is the database instance name.
Use SQL*Plus to connect to the database and then execute the following commands:
SQL> STARTUP PFILE=init<SID>.ora MOUNT SQL> ALTER DATABASE RENAME FILE '<Oracle_Home>\database\mig<SID>.ora' TO 'cluster_disk\mig<SID>.ora'; SQL> SHUTDOWN SQL> EXIT
When all databases in the group have been upgraded or moved to a new home with the Oracle Database Upgrade Assistant, use Oracle Fail Safe Manager to put the databases back into the group and then place the databases online, as follows:
On the Resources menu, select Add to Group.
Follow the steps through the Add Resource to Group Wizard.
All databases in the group being moved must be from the same Oracle home. If one database in a group is moved with the Oracle Database Upgrade Assistant to a new Oracle home, then all databases in the group must be moved to the new Oracle home.
For releases of the Oracle database server prior to Oracle9i, the ctxsrv server processes background data manipulation language (DML) for indexing, searching, retrieving, and viewing documents. (Beginning with Oracle9i, you can index, search, retrieve, and view documents with standard SQL or PL/SQL procedures.)
If you are using a ctxsrv server with a single-instance Oracle database server, you can configure the ctxsrv server for high availability, as follows:
Create a batch file to start the ctxsrv server and specify the personality mask. For example, create a file named context.bat that contains the following command line:
ctxsrv -user CTXSYS/CTXSYS -personality QDM
On the File menu, click New, then Resource.
On the New Resource page:
On the Possible Owners page, specify the nodes in the cluster on which the ctxsrv server can be brought online. These should be the same as the possible owners for the single-instance database with which the ctxsrv server is associated.
On the Dependencies page, specify the single-instance database with which the ctxsrv server is associated as a resource dependency. If the ctxsrv server has a disk dependency (other than those that the database requires), specify the disk or disks as resource dependencies also.
On the Generic Application Parameters page:
In the Command line field, enter the file specification for the batch file you created in step 1 (for example, context.bat).
In the Current directory field, enter the directory where the ctxsrv server was installed (for example, D:\Orant\bin).
On the Registry Replication page, you need not enter any registry keys; click Finish.
After you have completed the configuration, the cluster service will start up the .bat file and open a command window to display ctxsrv server logging information. If someone closes the command window, another command window opens immediately and the ctxsrv server continues to search, as usual. If the group containing the ctxsrv server fails over, operations and the searching function continue as usual.
You can use Oracle Enterprise Manager to manage and monitor single-instance databases in an Oracle Fail Safe environment. For example, you can use Oracle Enterprise Manager to:
For Oracle Enterprise Manager to discover Oracle Fail Safe clusters, you must edit the nmiconf.lst file and add fs_discover.tcl as the first entry in the list. If the fs_discover.tcl entry is not listed first in the file, Oracle Enterprise Manager may not be able to discover MSCS clusters. This must be done on all nodes of the cluster. If you are using Oracle Intelligent Agents from multiple Oracle homes, then add the fs_discover.tcl entry in each Oracle home. The nmiconf.lst file is located in one of the following directories:
In Oracle8 homes:
In Oracle8i or Oracle9i homes:
Note:You must perform discovery on each group for Oracle Enterprise Manager to see the resources configured in that group. Once discovered, each group appears as a node in the Oracle Enterprise Manager nodes list, and you can manage the resources in the group as you would manage any standalone resource.
Create and register jobs and events for a group as if it were a physical node
Many of the services available from Oracle Enterprise Manager, such as dynamic discovery and job scheduling, depend on an Oracle Intelligent Agent. The following sections provide more information about adding an Oracle Intelligent Agent to a group, scheduling jobs and events, and setting up the proper privileges if you use Oracle Enterprise Manager to manage Oracle databases.
The Oracle Fail Safe online help for more information about integrating with Oracle Enterprise Manager and starting the default Intelligent Agent
Section 7.11.9 for information on troubleshooting problems related to integrating with Oracle Enterprise Manager
After you add a single-instance database to a group, run the Add Resource to Group Wizard again to add an Oracle Intelligent Agent to that group. The wizard displays a dialog box similar to the one shown in Figure 7-8.
Figure 7-8 Add Resource to Group Wizard - Resource Page
You add only one Oracle Intelligent Agent to a group, regardless of the number of databases in the group. However, the group must contain at least one database resource before you can add the Oracle Intelligent Agent. Similarly, you cannot remove the last database resource from a group without first removing the Oracle Intelligent Agent.
When you specify that you want to add an Oracle Intelligent Agent to a group:
Oracle Fail Safe creates a new Intelligent Agent
See the Oracle Fail Safe Help for information on scheduling jobs for Oracle databases configured in a cluster and for monitoring events (such as failovers) using Oracle Enterprise Manager.
Oracle databases configured with Oracle Fail Safe for high availability ensure fast failover and fast recovery during both unplanned and planned outages (such as software upgrades and scheduled maintenance). You can take advantage of Oracle fast-start and disaster-recovery features, control time spent during database recovery, and ensure continuous monitoring of databases configured with Oracle Fail Safe for high availability.
Oracle Fail Safe and Oracle database technology optimize the time it takes to shut down a database on one node and complete instance recovery on another node for both planned and unplanned failovers. The Oracle database checkpoint algorithms optimize the time it takes to perform instance recovery for planned and unplanned failovers.
When you use Oracle Fail Safe Manager (or FSCMD) to execute a planned failover, Oracle Services for MSCS checkpoints the single-instance Oracle database server before it is shut down. The single-instance database is started on the other node in a restricted mode so that instance recovery can be completed quickly and the database made available to the database clients promptly. (If you use MSCS to execute a planned failover, it does not checkpoint the database before shutting it down.)
Note:After you add a single-instance database to a group, use only Oracle Fail Safe Manager or the FSCMD command to place the Oracle database online and take the database offline. Otherwise, the database will not be checkpointed first. In addition, if you use a tool other than Oracle Fail Safe Manager, FSCMD, or MSCS to take a database offline, Oracle Fail Safe will consider it a failed resource and will attempt to place it back online.
For unplanned failover, the instance recovery time is controlled by the database recovery processing. See the Oracle database server documentation for details on fast-start recovery operations.
You perform administrative tasks on a database configured for high availability as you would for any database, with one exception. You must use Oracle Fail Safe Manager or the FSCMD command-line interface (see Chapter 5) to take a database offline (and stop cluster monitoring of the database) during any operation that restricts access to the database or for which you want to temporarily disable the possibility of failover. This includes not only cold backup operations but also administrative operations that need to be performed while users continue to access the database, or operations that could affect response times during the periodic Is Alive polling of the database by MSCS.
Use these steps to perform administrative tasks on a database that is configured in a group with Oracle Fail Safe Manager:
Use Oracle Fail Safe Manager or the FSCMD command to take the database offline, shut down the database, and suspend monitoring of the database by the cluster. All users connected to the database will be disconnected.
Once you have completed the administrative tasks, use a tool such as SQL*Plus to shut down the database.
Use Oracle Fail Safe Manager or the FSCMD command to place the database online again. The cluster will resume monitoring the database.
Chapter 5 provides an example of a script in which FSCMD commands are used to perform a backup operation.
If, during an administrative task, you perform an operation that changes the configuration of the database (such as adding a new tablespace and associated data file), you should run the Verify Group operation. Adding a new data file can introduce a new disk dependency in the group. When you run the Verify Group operation, it checks to make sure that the disk is a cluster disk and that it does not already belong to another group. If adding the new data file introduces a new disk dependency in the group, then the disk is added to the same group as the database and the information in the cluster registry is updated to ensure that the new disk will correctly fail over with the database.
For standalone single-instance databases, transparent application failover (TAF) instructs Oracle Net to reestablish a failed connection to a database by connecting to a different listener. This enables the user to continue work using the new connection as if the original connection had never failed. The transparent application failover feature does not work the same way for a single-instance fail-safe database as it does for a standalone single-instance database. For a fail-safe database, a transparent application failover instructs Oracle Net to reconnect to the same listener, which has moved to another cluster node due to a group failover.
For a standalone database, the term failover in the phrase "transparent application failover" refers to Oracle Net failing over a connection from one listener to another. For a fail-safe database, the term failover in the phrase "transparent application failover" is a bit of a misnomer; the application does not fail over, the listener to which it is connected fails over, and then a connection is reestablished.
These differences in implementation do not affect how you manage transparent application failover.
To take advantage of transparent application failover when connected to a database configured with Oracle Fail Safe, the client applications must:
Use either Oracle OCI release 8.0.6 or later or Oracle ODBC release 8.0.6 or later
Connect through Oracle Net to an Oracle database
With transparent application failover, clients do not need to explicitly reconnect after a group fails over. The OCI connection handles reconnection and state recovery automatically for the client application. In fact, applications that are not actively updating the database at the time of a failure might not notice that failover is occurring.
Note:The transparent application failover feature is supported for Oracle database server Enterprise Edition only.
Transparent application failover (TAF) automatically restores some or all of the following elements associated with active database connections. Other elements, however, may need to be embedded in the application code to enable TAF to reestablish the connection.
Client/server database connections
TAF automatically reestablishes the connection using the same connect string or an alternate connect string that you specify when configuring TAF.
Users' database sessions
TAF automatically logs a user in with the same user ID as was used prior to the failure. If multiple users were using the connection, then TAF automatically logs them in as they attempt to process database commands. TAF cannot automatically restore other session properties. These properties can, however, be restored by invoking a callback function.
If a command was completely executed upon connection failure, and it changed the state of the database, TAF does not resend the command. If TAF reconnects in response to a command that may have changed the database, TAF issues an error message to the application.
Open cursors used for fetching
TAF allows applications that began fetching rows from a cursor before failover to continue fetching rows after failover. This is called select failover. It is accomplished by reexecuting a SELECT statement using the same snapshot, discarding those rows already fetched, and retrieving those rows that were not fetched initially. TAF verifies that the discarded rows are those that were returned initially, or it returns an error message.
Any active transactions are rolled back at the time of failure because TAF cannot preserve active transactions after failover. The application instead receives an error message until a ROLLBACK command is submitted.
Server-side program variables
Server-side program variables, such as PL/SQL package states, are lost during failures; TAF cannot recover them. They can be initialized by making a call from the failover callback.
Note:Applications can use the Oracle Call Interface (OCI) callback function to enhance or customize recovery after a failover. For example, a callback can be used to handle interrupted write operations. See the Oracle Call Interface Programmer's Guide for more information.
With transparent application failover, the application user sees no loss of connection as long as a cluster node is available to serve the application.
Figure 7-9 illustrates the steps involved in the transparent application failover event.
The following list describes the callouts shown in Figure 7-9.
Work proceeds normally until a failure occurs, causing the client to lose its database connection. (The application discovers the lost connection when it next requests work by the Oracle database.)
The OCI library routines attempt to reconnect to the database until a connection is restored, or the application reconnection timeout has been exceeded.
If the database connection cannot be restored within the designated time period, the application notifies the user by returning a status message.
The OCI library routines reconnect the application to the database at the virtual address.
After the client and user sessions are reconnected to a surviving cluster node, the transactions must be reexecuted manually.
When the database connection is reestablished, the OCI library routines can reexecute any SQL SELECT statements that were interrupted after a failover. The OCI routines fetch the same number of rows that had been fetched before the failure and discard those rows. This occurs so that the next fetch after failover continues from where the initial query was interrupted. The original query snapshot time is used when the query is reissued to ensure that the same set of rows will be returned. This feature is activated by the FAILOVER_MODE parameter with the TYPE=SELECT subparameter in the CONNECT_DATA string of the tnsnames.ora file. Information on setting parameters in the tnsnames.ora file is provided in Section 7.10.3.
Because the application uses the same query snapshot time:
In most cases, the ordering of rows retrieved is not fixed when the SELECT statement is reexecuted; for this reason, queries that might be replayed should contain an ORDER BY clause. However, even without an ORDER BY clause, rows returned by the reissued query are nearly always returned in the original order. Known exceptions include queries that execute using the hash join or parallel query functionality. Oracle Call Interface software transparently checks that the set of discarded rows match those previously retrieved to ensure that the application does not generate incorrect results.
Recovery time after a failover can be significantly longer when you use the TYPE=SELECT subparameter. For example, if a query that retrieves 100,000 rows is interrupted by a failover after 99,990 rows have been fetched, then the client application will not be available for new work until it has refetched and discarded 99,990 rows and retrieves the last 10 rows of the query.
If multiple sessions are using a connection when failover occurs, the OCI library can reconnect all sessions. However, if ALTER SESSION statements are in progress for any sessions, the OCI library cannot reexecute them on the failover node. The ALTER SESSION statements must be reexecuted by the user.
To enable transparent application failover for client applications, the Oracle Net tnsnames.ora file must be updated. The changes described in Section 126.96.36.199 and Section 188.8.131.52 are required to enable automatic reconnection for both Oracle8 (and later) ODBC and OCI clients.
The goal is to make the new connection appear as much like the old connection as possible. Ideally, the client (and therefore the user) will not notice that anything has changed.
To access a single-instance database that is a resource in a group, you must first configure the client applications to use the virtual server address associated with the database. Update the tnsnames.ora file on the client system so that the HOST parameter specifies the name of the appropriate virtual server instead of the name of a physical cluster node.
(DESCRIPTION = (ADDRESS_LIST = (ADDRESS=(COMMUNITY=A.WORLD) (PROTOCOL=TCP) (HOST= 184.108.40.206) <-- virtual server address (PORT=1521)) . . . ) )
Specifying the correct virtual server address in the tnsnames.ora file is required whether or not an application uses transparent application failover.
(DESCRIPTION = (ADDRESS_LIST = . . . ) (CONNECT_DATA= (SERVICE_NAME=NH_SALES) (FAILOVER_MODE=(TYPE=SELECT) (METHOD=BASIC) (RETRIES=20) (DELAY=15))))
The TYPE subparameter indicates whether or not to reestablish user sessions, and what application operations are automatically recovered when sessions are reestablished. Table 7-2 describes the options for the TYPE subparameter.
The METHOD subparameter optimizes failover performance. It has two options, BASIC and PRECONNECT, which define how to establish a connection to the failover node. The BASIC subparameter establishes connections at failover time and is the only connection method supported with Oracle Fail Safe. Because METHOD=BASIC is the default value, including the METHOD subparameter in the CONNECT_DATA parameter is optional.
The DELAY subparameter specifies the amount of time in seconds to wait between attempts to connect. If the RETRIES subparameter is specified, the default value for the DELAY subparameter is 1 second. This subparameter is particularly important in an Oracle Fail Safe environment. It allows you to specify time for the database to come back online on the failover node.
Refer to the Oracle Net Services Administrator's Guide for complete information and syntax about connect operations.
|NONE||Does not reestablish user sessions.||Does not reexecute application queries.||Returns an error message to the user without attempting a transparent application failover. This is the default.|
|SELECT||Reestablishes user sessions on the new client connection.||SELECT statements that were started before the failure are reexecuted automatically on the new connection. Active transactions are rolled back and are not reexecuted on the new connection.||Incurs overhead on the client side for all SELECT statements, even if a failure does not occur. When failover occurs, the SELECT statements on the failover node are reexecuted. Users are unaware that SELECT statements are being reexecuted.|
|SESSION||Reestablishes user sessions on the new client connection.||SELECT statements that were started before the failure are not reexecuted on the new connection.||Avoids the overhead that is incurred when you choose the TYPE=SELECT option.|
Oracle Fail Safe provides an executable demonstration that shows transparent application failover for clients using ODBC or OCI in an Oracle Fail Safe configuration. This program and its documentation (readme.txt) are available online in the <Oracle_Home>\fs\fsmgr\sample directory.
The following sections describe how to specify a script to handle errors if they occur when Oracle Fail Safe attempts to bring a highly available single-instance database online and how to troubleshoot specific problems that you may encounter with single-instance Oracle databases configured for high availability. For general information about troubleshooting Oracle databases, see the Oracle database server documentation.
You can specify a script to handle errors that might occur when Oracle Fail Safe is attempting to place a single-instance database online. Oracle Fail Safe uses the same script for all single-instance fail-safe databases on the cluster.
To specify an error handling script:
Create a script to handle the error or errors.
Ensure that the script returns 0 if it succeeds and any nonzero integer if it fails.
Place the script in the following directory on each cluster node that is a possible owner for a database resource and ensure that the file owner has local Administrator privileges on that cluster node:
If Oracle Fail Safe cannot bring a single-instance database online, it spawns a process in which to run the script, then it passes the error code, the database name, the database SID, the TNS service name, and the database parameter file specification to the script and executes the script, as follows:
FsDbError.bat <error-code> <database-name> <SID> <TNS service name> <parameter-file-spec>
FsDbError.bat ORA-01113 OracleDB OracleDB OracleDB.WORLD D:\Ora\admin\OracleDB\pfile\initOracleDB.ora
The process in which the script is running waits for the script to finish for the period of time specified as the Pending Timeout value for database resources. If the script does not finish within the pending timeout period, then the script is terminated.
Oracle Fail Safe logs an event to the Windows Event log to indicate whether the script succeeded, failed, or was terminated by Oracle Fail Safe. If the script failed, the error code is also written to the event log.
Regardless of whether the script succeeds or fails, Oracle Fail Safe will continue to attempt to bring the single-instance database online as defined in the database restart and failover policies.
In most cases, the first step in troubleshooting a problem is to issue the Verify Cluster, Verify Group, or Verify Standalone Database command. These tools are described in general in Chapter 6.
When you issue a Verify Group command on a group containing a single-instance database, Oracle Fail Safe:
Queries each database in the group to determine which disks it uses. Then, it validates that the disks are cluster disks and have been added to the group. If the disk validation fails (for example, because a disk has been added to the database since it was configured for high availability), then the Verify Group operation prompts you before fixing the problem.
Validates that the network name pings the correct IP address.
You can run the Verify Group operation at any time. However, you should run it when any of the following occurs:
A group or resource in a group does not come online.
A new node is added to the cluster.
For example, imagine that you add a new disk to a single-instance database, but you do not use Oracle Fail Safe Manager to update the cluster configuration. If a server node subsequently shuts down, failover will not occur correctly because the cluster software was never notified that there was a change in the configuration. To prevent this from happening, you should verify the group containing a single-instance database whenever you make a structural change to the database. When you verify the group, Oracle Fail Safe automatically detects changes and updates the cluster configuration for you. In the previous example, Oracle Fail Safe would add the new disk to the group for you.
If any problems are found during the group verification, Oracle Fail Safe prompts you to fix them or returns an error message that further describes the problem.
To troubleshoot problems when adding a single-instance database to a group:
Run the Verify Standalone Database operation (described in Section 6.1.3).
Running the Verify Standalone Database operation verifies that the database is a valid working standalone database.
For example, if you try to add a standalone database to a group and it fails during the Oracle Net configuration, Oracle Fail Safe rolls back the clusterwide operation and the database remains as a standalone database. To fix this problem, follow these steps:
Run the Verify Cluster operation to ensure that the cluster network configuration is correct.
Run the Verify Standalone Database operation to ensure that the network (Oracle Net) is working.
Attempt to add the standalone database to a group.
If the Add Database to Group operation fails, check the Oracle Net rollback file as described in Section 220.127.116.11.
The single-instance database files are located on shared cluster disks
For each disk in the configuration, Oracle Fail Safe determines if the disk resides on a shared storage interconnect. If database files are on nonclustered disks, you must move the database files so that they are located on a shared cluster disk.
You correctly specified the following information in the Add Resource to Group Wizard:
User name and password used to access the database
Database parameter file
(Net) service name
If there is a problem placing a group that contains a single-instance database online, try the following:
When you use the Verify Group command (from the Oracle Fail Safe Manager Troubleshooting menu), Oracle Fail Safe checks the group configuration and attempts to fix any problems that it finds. If the Verify Group command cannot fix the problem, it returns an error message that should help you to resolve the problem manually.
Oracle Net logs an entry to the listener log file every time an error is encountered or a database is accessed through the listener. Check for errors in the log file that might help you to identify the problem.
On an Oracle8i or Oracle9i system, the log files reside in the
On an Oracle8 system, the log files reside in the
Check the net service name of the single-instance database.
The Oracle Fail Safe database resource DLL accesses each database in a group at the Is Alive interval. (The Is Alive interval appears on the Failover property page for the database in Oracle Fail Safe Manager.) It uses the database connection information to access the database. If the database access information has changed, then Oracle Fail Safe will fail to access the database. Hence, MSCS will not consider the database resource to be alive.
Check the Oracle Net configuration data.
Bring each resource in the group online individually.
If more than one single-instance database is in the group, this will help you to identify which database is causing the problem.
If a group containing a database fails to come online or frequently fails over, check that the Pending Timeout value is set correctly. Failure to come online and frequent failovers occur if the Pending Timeout value for the database is set too low.
Set the Pending Timeout value to specify the length of time you want the cluster software to allow for the database to be brought online (or taken offline) before considering the operation to have failed. Set the value high enough to prevent a cluster system from mistaking slow response time for unavailability, yet low enough to minimize the failover response time when a failure does occur.
You can set the Pending Timeout value by modifying the database properties, as follows:
In the Oracle Fail Safe Manager tree view, select the database name.
Click the Policies tab.
In the Pending Timeout box, modify the Pending Timeout value.
If users use the SYS (or INTERNAL) account to access the database, make sure that the initialization parameter REMOTE_LOGIN_PASSWORDFILE in the database initialization parameter file (init<database-name>.ora) is set to SHARED or EXCLUSIVE.
If users access the database using operating system authentication only, make sure that the initialization parameter REMOTE_LOGIN_PASSWORDFILE in the database initialization parameter file is set to NONE.
If the password for the account through which Oracle Fail Safe accesses a database changes and you do not update the information through Oracle Fail Safe Manager, the attempts at polling the database will fail. See Section 7.5.2 for information on how to update database password changes for Oracle Fail Safe.
Sometimes, processing-intensive operations (such as an Import operation) can cause Is Alive polling to fail and may result in an undesired group failover. In such cases, you can disable Is Alive polling for the database by issuing the FSCMD DISABLEISALIVE command. However, be aware that when you disable Is Alive polling, Oracle Fail Safe suspends monitoring the instance until Is Alive polling is reenabled. You reenable Is Alive polling with the FSCMD ENABLEISALIVE command.
Oracle Corporation recommends that you issue these FSCMD commands from within a script so that you can ensure that Is Alive polling is reenabled when the processing-intensive operation completes.
For information on the FSCMD commands, see Chapter 5.
If there is a problem when Oracle Fail Safe tries to bring a single-instance database online or offline, the problem might be caused by the way you have set up database authentication. Try the following to solve the problem:
If you selected "Use this account" on the Authentication page in the Add Resource to Group Wizard, ensure that the REMOTE_LOGIN_PASSWORDFILE initialization parameter in the database initialization parameter file (init<database-name>.ora) is set to SHARED or EXCLUSIVE.
Section 7.5 describes how to correctly set up this parameter for database authentication.
For some operations that Oracle Fail Safe performs, such as a group verification and polling the database to ensure that it is online, Oracle Fail Safe must have access to the databases in a group. If the database account password has changed, you need to update it in Oracle Fail Safe Manager. Otherwise, Oracle Fail Safe is unable to monitor the database using Is Alive polling. This situation will be logged to the Windows Event Viewer.
Section 7.5.2 describes how to correctly update the database password.
If you receive errors when you create or delete a sample database, check the following:
If the Create Sample Database command cannot open the Create Sample Database script files, reinstall the sample database files with your Oracle Fail Safe software, or access the files directly from the CD-ROM. Then, try the Create Sample Database operation again. Refer to the Oracle Fail Safe Installation Guide for more information about installing the sample database files.
If you try to create a sample database and you do not have Oracle Fail Safe sample database files installed on your cluster, or if you installed the Oracle database software after you installed Oracle Fail Safe, the following error message is displayed:
FS-10270: Oracle Fail Safe sample database files are not found in the installation directory or CD-ROM.
You must run the Oracle Fail Safe installation and install the Oracle database sample files or load the Oracle Fail Safe CD-ROM before you can create a sample database. When you install Oracle Fail Safe, the only sample database files it installs are those that correspond to the database releases that are currently installed on the cluster node where you are creating the sample database.
If you install a version of the Oracle database server after you install Oracle Fail Safe, then you must install the sample database files only to get a sample database that corresponds to the release of Oracle database server you installed.
If the Delete Sample Database command fails, you may have selected a database that is not a sample database. Make sure that the database you selected is a sample database and try the delete operation again.
Oracle Fail Safe stores information about sample databases in the Windows registry under the HKEY_LOCAL_MACHINE\SOFTWARE\ORACLE\FailSafe\SampleDB key.
If a sample database that you have deleted is listed in the Oracle Fail Safe Manager tree view, make sure that the Windows registry entries for that database have been cleared from the registry.
If you encounter problems when trying to establish a connection to either a standalone database or a database configured in a group, you should check the Oracle Net configuration for the database.
Oracle Fail Safe changes the listener.ora and tnsnames.ora files, and stops and starts listeners when configuring the virtual address information. The following list describes potential problems and the action you can take to correct each problem:
If these files are no longer valid due to improper update or file damage, Oracle Fail Safe cannot use them to configure virtual server information. You should retrieve a valid version of these files or re-create the files using Oracle Net Assistant.
If these files are valid, check that the net service name, the database SID, and the network name of the group used in the operation are correct. Incorrect information may cause the virtual server configuration to fail. You must ensure that a database SID is not included in more than one listener. On systems with multiple Oracle homes, check all of the listener.ora files.
Oracle Fail Safe starts a listener after changing the definition of a listener or creating the definition of a new listener.
The most common reason for this error is that another listener is already listening for a database. There can be only one listener on the system listening for a particular address or database SID. For example, if LISTENER_A has the following definition, then no other listener on the system can listen for key ORCL using the IPC protocol, or port 1521 on host server_A using the TCP protocol, or ORCL SID name:
LISTENER = (ADDRESS_LIST= (ADDRESS= (PROTOCOL=IPC) (KEY=ORCL) ) (ADDRESS= (PROTOCOL=TCP) (Host=server_A) (Port=1521) ) ) SID_LIST_LISTENER = (SID_LIST= (SID_DESC= (SID_NAME=ORCL) ) )
Any other listeners that try to use the same address or database SID as LISTENER_A will fail to start.
When you encounter this problem, Oracle Fail Safe saves the listener.ora and tnsnames.ora files that contain definitions of the updated listeners and net service names as rollback files. The file names of the rollback files are <filename>_rlb.ora.
You should read the rollback files to find the definition of the listener and check it against the definition of all other listeners running on the system. There should be no overlapping addresses or database SID names among the listeners. Again, check for all listener.ora files on systems using multiple Oracle homes.
See the Oracle Net documentation (including information about the log directory) for additional information on troubleshooting problems with the network configuration.
For example, if the Oracle home and network directory path is C:\ORANT\NETWORK\ADMIN, and the virtual address on which the listener listens is ntclu-155, then the listener output files will be written to the following directory and file:
Each listener has its own output file that is named using the listener name and the .out extension. (In the example, the listener name is fslntclu-155.) If you experience difficulties when creating a new listener, you can use the output file to help you diagnose the problem.
Whenever Oracle Fail Safe makes changes in the listener.ora or tnsnames.ora files, the original version of the file is archived. If you need to reference an Oracle Net net service name definition or a listener definition as it was before Oracle Fail Safe changed the definition, you can look at the archived versions of the configuration files.
Oracle Fail Safe keeps up to two archived versions of configuration files. The file name of the archived version has a format of <filename>_000.ora and <filename>_001.ora. Note that <filename>_000.ora is the most recent file.
Whenever Oracle Fail Safe encounters an error during an operation after Oracle Net configuration files have been changed, the updated version of the file is saved as <filename>_rlb.ora. Then, the original version of the file is restored.
The rollback version of the file may be useful for problem diagnosis.
Access and authorization problems occur most frequently when you are attempting to perform operations through Oracle Enterprise Manager.
The following list addresses some typical authentication problems:
In the Oracle Enterprise Manager Console, make sure that the User Credentials for the cluster are those of a Windows Administrator on all cluster nodes and that the user name and domain are specified correctly. (This does not apply to Oracle Enterprise Manager release 2.0 or 2.1; preferred credentials for clusters cannot be specified with Oracle Enterprise Manager 2.0 or 2.1.)
If users and client applications are unable to access a database that is configured in a group, perform the following steps to fix the problem:
Update the tnsnames.ora file to use the virtual server for the group.