Oracle Fail Safe provides high availability for single-instance Oracle Databases (except Oracle Database Personal Edition) running on Windows clusters configured with Microsoft Windows Failover Clusters.
By making a single-instance Oracle Database highly available, you ensure that even when a cluster node is shut down or fails, applications accessing that database suffer only a momentary loss of connection while the database is restarted on another cluster node. Applications can reconnect to the database automatically by using transparent application failover, so the failover is not apparent to users.
Oracle Fail Safe Server discovers standalone single-instance databases (those that are not in a cluster group) by looking for Oracle Database instance Windows services. Any service found on any cluster node that is not currently in a cluster group is displayed in the Oracle Fail Safe Manager's Available Oracle Resources list.
The following sections briefly summarize the Oracle Net configuration for standalone single-instance databases.
If the system host name is used in the definition of a listener, then this listener listens on all IP addresses on that node, not just the IP address associated with the host name. The local listener therefore also opens any cluster IP addresses, which causes a cluster group listener to fail when it attempts to listen on the IP addresses assigned to the group.
To avoid this problem, the listener must use the node IP address for its host entry instead of the host name. Whenever Oracle Fail Safe validates a cluster group or adds a database to a group, it searches ADDRESS entries that have a HOST set to the local node's host name. All HOST entries that use the local node name are changed to use the IP address for the node.
The following is an example of an invalid entry in an Oracle Fail Safe environment:
LISTENER = .... (ADDRESS= (PROTOCOL=TCP) (HOST=NTCLU-152) (PORT=1521) )
The following is an example of a valid entry in an Oracle Fail Safe environment:
LISTENER = .... (ADDRESS= (PROTOCOL=TCP) (HOST=192.0.2.254) (PORT=1521) )
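The host name substitution described above can be pictured with a short Python sketch. This is only an illustration of the rule, not how Oracle Fail Safe implements it: the function name `use_node_ip`, the regular expression, and the sample strings are assumptions made for this example.

```python
import re

# Splits a (HOST=value) entry into prefix, value, and suffix so that only
# the value is rewritten. Hypothetical pattern written for this sketch.
HOST_PATTERN = re.compile(r"(\(\s*HOST\s*=\s*)([^)\s]+)(\s*\))", re.IGNORECASE)


def use_node_ip(listener_text, local_host_name, node_ip):
    """Replace HOST entries that name the local node with the node IP address."""

    def swap(match):
        host = match.group(2)
        if host.lower() == local_host_name.lower():
            return match.group(1) + node_ip + match.group(3)
        # Leave IP addresses and other host names untouched.
        return match.group(0)

    return HOST_PATTERN.sub(swap, listener_text)


invalid = "LISTENER = (ADDRESS= (PROTOCOL=TCP) (HOST=NTCLU-152) (PORT=1521) )"
valid = "LISTENER = (ADDRESS= (PROTOCOL=TCP) (HOST=192.0.2.254) (PORT=1521) )"
print(use_node_ip(invalid, "ntclu-152", "192.0.2.254"))
```

Entries that already use an IP address, or that name a different host, pass through unchanged.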
When a database is configured for high availability, Oracle Fail Safe makes adjustments to the default listener. This affects the Oracle Net configuration for all databases, including standalone databases. As a result, all standalone databases in an Oracle Fail Safe environment require some adjustments to the Oracle Net configuration if any database in the cluster has been made highly available.
If the shared server configuration for standalone single-instance databases relies on the default listener, then no listener parameters are specified in the database parameter file. (The default listener is a listener that listens on the host name of the node, the default port number, and TCP protocol.) In this case, the configuration will no longer work after Oracle Fail Safe has changed the default listener to use an IP address in place of the host name.
When Oracle Fail Safe searches for a standalone database listener, it scans the listener Windows services to find one listening on the network address used by the database. If there are multiple listeners on a network address, then Oracle Fail Safe selects the running listener service. If none of the listeners are started, then Oracle Fail Safe chooses the first listener found that is listening on the network address.
Note:
To prevent network configuration errors, ensure that the listeners of standalone single-instance databases are in the intended state, stopped or started, before you run any Oracle Fail Safe operations.
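The listener selection rule above (prefer a running listener, otherwise take the first one found) can be sketched in Python as follows. The `ListenerService` type and `choose_listener` function are hypothetical names used only for illustration. The source does not say what happens when several listeners on the same address are running, so this sketch simply returns the first running one.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ListenerService:
    name: str       # Windows service name of the listener (illustrative)
    address: str    # network address the listener listens on
    running: bool   # True if the Windows service is started


def choose_listener(services: List[ListenerService], address: str) -> Optional[ListenerService]:
    """Pick the listener for a standalone database on the given address."""
    candidates = [s for s in services if s.address == address]
    for service in candidates:
        if service.running:
            return service
    # No listener is started: fall back to the first one found, if any.
    return candidates[0] if candidates else None


services = [
    ListenerService("LISTENER", "192.0.2.254:1521", running=False),
    ListenerService("LISTENER2", "192.0.2.254:1521", running=True),
]
print(choose_listener(services, "192.0.2.254:1521").name)
```

The sketch also shows why the note above matters: which listener is chosen depends on which services happen to be started at the time.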
To configure a single-instance Oracle Database for high availability, add it to a group that currently contains at least one network name. Oracle Fail Safe adds all other resources that the single-instance Oracle Database requires. Typically, the group includes the following resources:
One or more network names, each of which consists of an IP address and network name
The Oracle Database instance
All disks used by the Oracle Database
An Oracle Net network listener that listens on the network name (or names) of the group for connection requests to the databases in the group
Before you add a single-instance database to a group, note the following prerequisites:
All files used by the single-instance database must be on the shared cluster disks, except the database initialization parameter file, which can be placed on a private disk or on a shared cluster disk. See Identifying Database Parameters for more information about the placement of the initialization parameter file.
Resources must belong to one group only. If two single-instance databases share the same disk drives, then both databases must be in the same group.
In a failover, the data in a temporary table does not fail over. Operations that involve the use of temporary tables and tablespaces (such as sorts and hash joins) re-create any needed temporary objects when restarted on the failover node. However, you must review applications that rely on the existence of specific data in temporary tables to be sure they function as expected.
Refer to the Temporary Tables discussion in the Oracle Database Concepts manual for more information about temporary tables.
The group must contain at least one network name.
Database service names must be unique across the cluster.
The listener and all the databases in the group must use the same Oracle home.
Oracle Fail Safe does not support the use of mounted folders (mount points) or symbolic links for files used by an Oracle database. For example, a control file or server parameter file cannot be referenced by any filename other than the actual filename that represents the file. Files that are stored on a cluster shared volume, and thus use the C:\ClusterStorage root folder, are supported.
A text initialization parameter file (PFILE) must be used to start the database. Use the oradim utility to set the location of the PFILE.
Table 7-1 provides a quick reference to the tasks needed to configure a single-instance Oracle Database for high availability. For detailed instructions about a particular task, see the online help and tutorial. To access online help, select Help from the Actions menu in the right pane of the Oracle Fail Safe Manager window. Alternatively, select Fail Safe Documentation in the middle pane of Oracle Fail Safe Manager, then select the HTML or PDF version of the Tutorial for step-by-step instructions.
Table 7-1 Steps for Configuring Databases
Step | Procedure | Oracle Fail Safe Manager Procedure |
---|---|---|
1 | Ensure that the Oracle Database software is installed on a private disk on each node in the cluster that you intend to be a possible owner for the Oracle Database. | See the Oracle Database documentation for installation information. |
2 | Create a group and add one or more network names. | |
3 | Create a sample database, if desired. | From the Oracle Resources view, choose the Create Sample Database action from the Actions menu in the right pane of the screen. You can use this sample database to try out the features of Oracle Fail Safe before using them on a production database. Do not use the sample database for production work. |
4 | Verify the standalone database. | Select the resource that you want to verify from the Available Oracle Resources list, then select Validate from the Actions menu of the Oracle Resources view. This operation performs validation checks to ensure that the standalone database is configured correctly on the node where it resides and to remove any references to the database that may exist on other cluster nodes. |
5 | Add the Oracle Database to the group. | Select the resource that you want to add from the Available Oracle Resources list, then select Add Resource from the Actions menu of the Oracle Resources view. This helps you configure the single-instance Oracle Database for high availability. |
6 | Configure clients (modify the | |
Oracle Fail Safe Manager provides the Add Resource to Group Wizard to assist you in configuring a single-instance Oracle Database for high availability. The pages presented in the wizard vary, depending on the number of network names currently in the group, and the number of nodes in the cluster.
Typically, each group has one network name, but more complex configurations may have multiple network names. To perform a typical configuration using the Add Resource to Group Wizard, you need the following data:
Identity of the single-instance Oracle Database, including instance name and specification for the database initialization parameter file
The database SYS password, if OS authentication is not used
If you add a database to a group that currently contains multiple network names, then you are also asked to specify the network name or names for the listener.
The following sections describe in detail the configuration requirements for single-instance databases.
Microsoft failover clusters allow you to use any text string for the name of a resource. By default, Oracle Fail Safe uses the instance ID for the database. You can change the name to something more meaningful, if desired. For example, the cluster resource name is changed to Test Database here.
Figure 7-1 Add Resource to Group Cluster Resource Name Wizard Page
If you are adding a database to a group and the cluster consists of more than two nodes, then you are asked to specify the nodes which must be possible owners for the database by specifying a list of selected nodes, as shown in Figure 7-2. To specify that a particular node must not be a possible owner for the database, select the node from the Selected Nodes list and click the left arrow.
Resource Possible Owner Nodes List describes in detail the concept of the possible owner nodes list.
Figure 7-2 Add Resource to Group Wizard Page When All Nodes Are Available
If you are adding a single-instance database to a group and the cluster consists of two or more nodes, but one or more nodes are unavailable, then you are also asked to specify which nodes must be possible owners for the database. In this case, the wizard page displays which nodes are unavailable, as shown in Figure 7-3.
Figure 7-3 Add Resource to Group Wizard Page When Any Node Is Unavailable
If the group to which you are adding a single-instance database contains multiple network names, then the Add Resource to Group Wizard asks you to specify the network name or names for the listener, as shown in Figure 7-4. This page is not displayed if the group to which you are adding a database contains only one network name.
Figure 7-4 Add Resource to Group Network Name Wizard Page
Oracle Fail Safe includes support for multiple network names in a group. All databases in a group must use the same network names, and the network names must be added to the group before you add the databases to the group. The sequence for building a group is as follows:
For example, if a group contains a database that is using two network names and you add a second database to the group, then the second database must use the same network names as the first database that was configured into the group. Oracle Fail Safe Manager checks to ensure that the same network names are used for all single-instance databases that you add to a group.
See Configurations Using Multiple Network Names for information about configuring a resource in a group with multiple network names.
The Add Resource to Group Wizard requests database parameters information to uniquely identify the single-instance database that is being configured for high availability, as shown in Figure 7-5.
Figure 7-5 Database Parameters Wizard Page
Oracle Fail Safe uses this data to configure the database into the cluster (for example, to update the tnsnames.ora file). It also passes the data that you supply to Microsoft Windows Failover Clusters, where it is registered for use when the database is brought online, taken offline, or when Is Alive polling is performed. Oracle Fail Safe requests the name and location of the initialization parameter file.
When an Oracle Database starts, it uses the initialization parameter file to specify the name of the database, the amount of memory to allocate, the names of control files, and various limits and other system parameters.
In most cases, place the parameter file on a cluster disk so that it can be accessed regardless of which cluster node is currently hosting the database. However, a copy of the initialization parameter file can be placed on each node's private disk, if you ensure that the file exists at the same location on all cluster nodes that are configured to run a database. You may decide to place the parameter file on each node's private disk to set different parameters for the database, depending on which node is hosting it. This can be useful if some nodes have less memory or processing capabilities than others.
Note:
If needed, you can move the initialization parameter file after a database has been configured for high availability. See the Oracle Fail Safe Manager Help for information about how this is performed.
Oracle Fail Safe requires that a text initialization parameter file (PFILE) be specified in the Parameter File field. To use a binary server parameter file (SPFILE) with databases configured for high availability, specify the location of the SPFILE from within the PFILE using the SPFILE=SPFILE-location parameter. The SPFILE must reside on a shared disk that is a member of the cluster group where the database resides. For example, the contents of the PFILE may include the following parameters:
SPFILE=F:\OFSDB\oradata\OFS1\spfileTestDboradb.ora
(If you specify an SPFILE in the PFILE that Oracle Fail Safe uses, then use caution if you export the SPFILE. If you use a CREATE PFILE FROM SPFILE command without including file specifications, then you overwrite the PFILE that Oracle Fail Safe is using. Ensure that you specify a unique file name for the PFILE to which the SPFILE is exported. See Oracle Database Administrator's Guide for detailed information about server parameter files.)
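As a rough illustration of the PFILE-to-SPFILE pointer described above, the following Python sketch reads the SPFILE location out of PFILE text and checks whether it sits on one of a given set of shared drives. The function names and the `shared_drives` set are assumptions for this example; Oracle Fail Safe performs its own validation, and this sketch does not reproduce it.

```python
import ntpath
import re
from typing import Optional, Set

# Matches a line of the form SPFILE=<location>, ignoring case and spacing.
SPFILE_LINE = re.compile(r"^\s*SPFILE\s*=\s*(.+?)\s*$", re.IGNORECASE)


def spfile_location(pfile_text: str) -> Optional[str]:
    """Return the SPFILE location named in the PFILE text, if any."""
    for line in pfile_text.splitlines():
        if line.lstrip().startswith("#"):
            continue  # skip comment lines
        match = SPFILE_LINE.match(line)
        if match:
            return match.group(1).strip("'\"")
    return None


def on_shared_disk(path: str, shared_drives: Set[str]) -> bool:
    """Check the drive letter against the group's shared drives (assumed input)."""
    drive = ntpath.splitdrive(path)[0].upper()
    return drive in {d.upper() for d in shared_drives}


pfile = r"SPFILE=F:\OFSDB\oradata\OFS1\spfileTestDboradb.ora"
location = spfile_location(pfile)
print(location, on_shared_disk(location, {"F:"}))
```

A drive-letter comparison is only a coarse check; it says nothing about whether the disk is actually a member of the database's cluster group.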
All Oracle database instances on each node of the cluster must use the same SPFILE, and the file must be on shared storage. If the SPFILE is not currently stored on a shared disk, then create a copy using SQL*Plus as follows:
CREATE SPFILE=shared disk path\spfiledb_unique_name.ora
Create a PFILE, ORACLE_HOME\dbs\initsid.ora, that contains the entry SPFILE=shared disk path\spfiledb_unique_name.ora.
ORA_DBA group
ORA_SID_DBA group or the ORA_home_DBA group associated with the database.
SYS account to access the database.
When an Oracle database is added to a cluster group, Oracle Fail Safe checks to see if the ORA_sid_DBA and ORA_sid_OPER local user groups exist on the cluster node that owns the database. If those local groups exist, then they are replicated to the other nodes in the cluster. Fail Safe will not copy any group members that are specific to that node, such as a local user name. It will copy built-in members. For example, the Administrators member will be copied to other nodes. If a local group has no members then it is ignored.
This page lets you specify whether Oracle Fail Safe should use operating system authentication or the SYS account to access the database and its instances, as shown in Figure 7-6.
If operating system authentication is enabled for all databases in the cluster (the Fail Safe server username is a member of the ORA_DBA user group), then the Use operating system authentication option will be selected by default and cannot be changed.
If cluster-wide operating system authentication is not enabled, then you can choose to Use operating system authentication for a specific database (the Fail Safe server user will be added to the ORA_SID_DBA user group) or select Use SYS account to specify that Fail Safe should authenticate using the database SYS account and its associated password.
If an Oracle Home User is configured, then Oracle Fail Safe displays an additional set of password fields for the Oracle Home User. Ensure that you provide the Oracle Home User password too.
If Oracle Fail Safe detects that the standalone database has an associated password file, then the wizard asks if you want Oracle Fail Safe to create the password file on all possible owner nodes for the fail-safe database, as shown in Figure 7-8.
Oracle recommends that you select the "Yes, create the password file" option. A password file is often required when you perform remote operations. For example, Recovery Manager (RMAN) requires a password file when connecting to the target database over a nonsecure Oracle Net connection.
If you select the "No, do not create the password file" option, then all users must access the database using operating system authentication, and users will not be able to perform remote database administration operations.
Oracle Fail Safe makes the following adjustments to the database initialization parameter file, depending on your choice to have Oracle Fail Safe create the password file on all cluster nodes that are possible owner nodes for the database:
Yes, create the password file: Sets the REMOTE_LOGIN_PASSWORDFILE parameter to EXCLUSIVE.
No, do not create the password file: Sets the REMOTE_LOGIN_PASSWORDFILE parameter to NONE.
If you want to change the password for the SYS account after the database has been added to a group, then you must also update the password through Oracle Fail Safe Manager. See Changing the SYS Account Password for information about how to update the password for this account after the database has been added to a group.
Finally, the Add Resource wizard asks you to confirm the operation. Note that the cluster group is taken offline during the Add operation, so the database and any other resources in the group are unavailable while Oracle Fail Safe adds the database to the group. Click Finish to complete the task of adding the Oracle Database to the group.
Figure 7-9 Database Resource Addition Confirmation Page
When you add a single-instance database to a group, Oracle Fail Safe creates and configures the Oracle Net listener resource and the database resource in the group. The new group listener configuration is based on the listener that the standalone database is using when it is being added to the group. The new listener will be given the same parameters as the original listener and it will use the same port numbers in its address list.
During normal operations, the cluster will periodically poll the listener to verify that the Windows service is started and that the listener responds to status commands. If those checks fail, then the listener is terminated and the cluster starts its failover policies to determine if the listener resource should be restarted or if the group should be failed over to a different node. Any resource failure discovered by the Oracle cluster resource control manager will be logged in the Windows application event log.
Oracle Fail Safe creates a dependency between the database and the IP address associated with the listener but not on the listener itself. This dependency is created to avoid a situation in which clients would stop responding when an IP address was taken offline before the database.
Network objects (including databases) are identified by a network address. For a connection between a client and a database to be made, the network address in the tnsnames.ora file on the client and the network address in the listener.ora file on the server must match. In other words, a client uses a network address to send a connection request to a particular network object location, and the recipient listens for requests on this address and grants a connection based on its address information matching its client information.
When you add a single-instance database to a group, Oracle Fail Safe creates a listener for the group in the same Oracle home where the database resides. When Oracle Fail Safe configures the network name information, it updates the tnsnames.ora files in all Oracle homes on cluster nodes that are possible owners for the database. This enables Oracle Fail Safe to access the database instance using the updated configuration.
When you add a single-instance database to a group, Oracle Fail Safe changes the Oracle Net configuration for the database in the tnsnames.ora file, the listener.ora file, and the sqlnet.ora file as described in the following sections.
If the TNS_ADMIN environment variable exists, then Oracle Fail Safe will update the network configuration files in the directory pointed by the TNS_ADMIN environment variable instead of Oracle home.
First, Oracle Fail Safe scans the tnsnames.ora file for any existing net service descriptors that reference the database. Any existing descriptors are changed to use the address list of the group's TNS listener.
Then, for each service name found in the database's service_names parameter, Oracle Fail Safe ensures that there is a net service descriptor in the tnsnames.ora file.
If no net service descriptor is found, Oracle Fail Safe creates a new net service descriptor that contains an address list that matches the group's listener address list. If there are multiple Oracle homes on the node and the TNS_ADMIN environment variable is not set, then all net service descriptors for the database are duplicated to the tnsnames.ora files in the other Oracle homes.
Similarly, the new net service descriptors are duplicated to all tnsnames.ora files on all other nodes in the cluster.
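The two passes over tnsnames.ora described above (first repoint existing descriptors, then create any that are missing) can be sketched in Python. Everything in this block is a simplified model made up for illustration: entries are plain dictionaries keyed by alias, a descriptor is treated as referencing the database when its service name appears in the database's service names, and a new descriptor reuses the service name as its alias. The real file format and matching rules are richer than this.

```python
from typing import Dict, List


def update_tnsnames(
    entries: Dict[str, dict],
    service_names: List[str],
    group_addresses: List[str],
) -> Dict[str, dict]:
    """Return a copy of the entries adjusted for the group's listener addresses."""
    updated = {alias: dict(entry) for alias, entry in entries.items()}
    wanted = {name.lower() for name in service_names}
    covered = set()

    # Pass 1: existing descriptors for the database take the group address list.
    for entry in updated.values():
        if entry["service_name"].lower() in wanted:
            entry["addresses"] = list(group_addresses)
            covered.add(entry["service_name"].lower())

    # Pass 2: every service name without a descriptor gets a new one.
    for name in service_names:
        if name.lower() not in covered:
            updated[name] = {"service_name": name, "addresses": list(group_addresses)}
            covered.add(name.lower())
    return updated


entries = {
    "OFS1": {"service_name": "ofs1.example.com", "addresses": ["node1:1521"]},
    "OTHER": {"service_name": "other", "addresses": ["node2:1521"]},
}
result = update_tnsnames(entries, ["ofs1.example.com", "ofs1_batch"], ["virtualnode:1521"])
print(sorted(result))
```

Descriptors for unrelated databases are left alone, which matches the description above: only descriptors that reference the database being added are changed.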
When you add a single-instance database to a group and do not specify a domain name in the Oracle Net service name, Oracle Fail Safe chooses a domain name to append to the net service name as follows:
Oracle Fail Safe looks for the default domain name in the Oracle home of the latest database version on the node. If found, this default domain name is appended to the net service name. For example, assuming Oracle Database 12c is the latest database version on the node, if you specify MyDB as the Oracle Net service name, and the default domain name in the Oracle Database 12c home is example.com, then the net service name will become MyDB.example.com.
If there is no default domain name in the Oracle home of the latest database version on the node, then Oracle Fail Safe appends nothing to the net service name. For example, if you specify MyDB, then the net service name will also be MyDB.
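The domain name rule above reduces to a small decision, sketched here in Python. The function name is hypothetical, and treating any name that contains a period as already having a domain is an assumption of this sketch rather than something the text states.

```python
from typing import Optional


def qualify_net_service_name(name: str, default_domain: Optional[str]) -> str:
    """Append the default domain only when the name has none and a default exists."""
    if "." in name:
        # Assumption: a period means the caller already supplied a domain.
        return name
    if default_domain:
        return f"{name}.{default_domain}"
    return name


print(qualify_net_service_name("MyDB", "example.com"))
print(qualify_net_service_name("MyDB", None))
```

Here `default_domain` stands for the default domain found in the Oracle home of the latest database version on the node, or `None` when that home defines no default.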
If you define an archive log destination as a service name, as shown in the following example, then Oracle Fail Safe will not automatically update the tnsnames.ora file on all cluster nodes. On each cluster node, edit or add the service name entry to the tnsnames.ora file manually.
log_archive_dest_2='SERVICE=standby OPTIONAL REOPEN=120'
All client systems that connect to the database must have their tnsnames.ora files updated to use the cluster group's network name for the HOST parameter in each network service descriptor's address list that references the database. Edit each client's local tnsnames.ora file manually or use a network configuration tool.
When you add a single-instance database to a group, Oracle Fail Safe makes the following changes to the listener.ora file:
Creates a new Oracle Fail Safe listener that is configured to listen on the network name associated with the single-instance database
Stops and restarts the standalone database listener to accept the changes that have been made
Starts the new Oracle Fail Safe listener
When a new cluster group listener is created, Oracle Fail Safe duplicates the port numbers from the original listener to the new listener. For example, if the original listener had ADDRESS entries with ports 1521 and 1522 in the ADDRESS_LIST, then the new listener will create an ADDRESS_LIST that contains the same port numbers.
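To make the port duplication concrete, the following Python sketch builds TCP ADDRESS entries for a group listener from the original listener's ports. The function name, the network name `virtualnode`, and the ordering of entries when a group has several network names are assumptions for illustration only.

```python
from typing import List


def group_address_list(original_ports: List[int], network_names: List[str]) -> List[str]:
    """Pair every group network name with every port of the original listener."""
    return [
        f"(ADDRESS=(PROTOCOL=TCP)(HOST={host})(PORT={port}))"
        for host in network_names
        for port in original_ports
    ]


for address in group_address_list([1521, 1522], ["virtualnode"]):
    print(address)
```

Only the port numbers carry over from the original listener; the host portion comes from the group's network name.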
When a new group listener is created, Oracle Fail Safe forces all databases in the cluster group to use secure registration through the IPC protocol. So, Oracle Fail Safe creates a parameter, SECURE_REGISTER_group_listener_name, with the value IPC.
When a database is added to a cluster group and there is no listener configured for the group, then Oracle Fail Safe will copy the parameters from the database's current listener to the new group listener. For example, if the database is currently using the default listener named "listener", and that listener has the parameter INBOUND_CONNECT_TIMEOUT_LISTENER in the listener.ora file, then Oracle Fail Safe will create the parameter INBOUND_CONNECT_TIMEOUT_group_listener_name for the new listener and assign it the value used for the INBOUND_CONNECT_TIMEOUT_LISTENER parameter.
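The parameter copy described above amounts to swapping the listener-name suffix on each control parameter. The Python sketch below shows the idea under stated assumptions: the function name and the group listener name `Fslvirtualnode` are illustrative, and the input is assumed to contain only control parameters, not listener or SID list definitions.

```python
from typing import Dict


def copy_listener_parameters(
    params: Dict[str, str], old_listener: str, new_listener: str
) -> Dict[str, str]:
    """Create new-listener parameters from those suffixed with the old listener name."""
    suffix = "_" + old_listener.upper()
    copied = {}
    for key, value in params.items():
        # Simplification: a plain suffix match, so a listener whose name merely
        # ends with the old name would also match.
        if key.upper().endswith(suffix):
            base = key[: len(key) - len(suffix)]
            copied[f"{base}_{new_listener}"] = value
    return copied


params = {"INBOUND_CONNECT_TIMEOUT_LISTENER": "120", "LOGGING_OTHER": "ON"}
print(copy_listener_parameters(params, "listener", "Fslvirtualnode"))
```

The value is carried over unchanged; only the parameter name is rewritten for the new listener.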
When you add a single-instance database to a group, if operating system authentication has been chosen for the database, then Oracle Fail Safe adds the SQLNET.AUTHENTICATION_SERVICES=(NTS) parameter to the sqlnet.ora file (assuming the parameter is not set).
Oracle Fail Safe does not create external procedure parameters for new group listeners. If your application uses external procedures, then you must manually edit the listener.ora and tnsnames.ora files on each node in the cluster and add the parameters needed for the external procedures used by your application.
The following sections describe how Oracle Fail Safe supports single-instance databases that use a shared server configuration.
Note:
When you set up a database to use a shared servers configuration, ensure that Oracle Fail Safe can continue to use a dedicated server connection for its internal operations. Do this by specifying the (SERVER=DEDICATED) parameter in the connect data portion of the net service name entry for the database in the tnsnames.ora file on each cluster server node. (By default, if shared servers are used and no SERVER parameter is specified, then the listener establishes a connection using shared servers.)
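For reference, the Python sketch below prints a net service name entry whose connect data requests a dedicated server, as the note above recommends. The alias, host, port, and service name values are placeholders invented for this example, and the helper function is hypothetical.

```python
def dedicated_descriptor(alias: str, host: str, port: int, service_name: str) -> str:
    """Format a tnsnames.ora entry that asks the listener for a dedicated server."""
    return (
        f"{alias} =\n"
        f"  (DESCRIPTION =\n"
        f"    (ADDRESS = (PROTOCOL = TCP)(HOST = {host})(PORT = {port}))\n"
        f"    (CONNECT_DATA =\n"
        f"      (SERVER = DEDICATED)\n"
        f"      (SERVICE_NAME = {service_name})\n"
        f"    )\n"
        f"  )\n"
    )


# Placeholder values: substitute the group's network name and the database service.
print(dedicated_descriptor("OFS1", "virtualnode", 1521, "OFS1"))
```

The SERVER setting belongs inside CONNECT_DATA, next to the service name, rather than in the address portion of the entry.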
Oracle Fail Safe supports single-instance Oracle8 databases that use a shared servers configuration. However, Oracle Fail Safe does not automatically update the database initialization file where the shared servers configuration is defined.
You can configure a standalone single-instance database or a single-instance database that is currently a resource in a group to use shared servers. In both cases, you must update the database initialization file by performing the following steps:
Determine the listener parameters for the group containing the single-instance database using the shared servers configuration, as follows:
Find the listener.ora file in the Oracle Net configuration directory in the Oracle home where the database resides.
Search the listener.ora file for the SID of the database and find the first listener address for the group using the TCP protocol.
For example, the boldface text in the following listener.ora file shows the first listener address of the group:
LISTENER =                         (Entries for default Listener)
  (ADDRESS_LIST =
  .
  .
  .
Fslvirtualnode =                   (Entries for Fail Safe Listener)
  (ADDRESS_LIST=
    (ADDRESS=
      (PROTOCOL=IPC)
      (KEY=OFS1)
    )
    (ADDRESS=
      (PROTOCOL=IPC)
      (KEY=805mts.world)
    )
    (ADDRESS=
      (PROTOCOL=TCP)
      (Host=virtualnode)
      (Port=1521)
    )
    (ADDRESS=
      (PROTOCOL=TCP)
      (Host=virtualnode)
      (Port=1526)
    )
  )
SID_LIST_Fslvirtualnode =
  (SID_LIST=
    (SID_DESC=
      (SID_NAME=OFS1)
    )
  )
Find the first address in the file that includes the line (PROTOCOL=TCP), and format the address parameters into a single line. For example:
(ADDRESS=(PROTOCOL=TCP)(Host=virtualnode)(Port=1521))
Update the database initialization file (for example, initofs1.ora) to use the listener parameters for the group. To do this, perform the following steps:
Open the database initialization parameter file.
Note that the initialization parameter file for the database may reside on a disk on the shared interconnect, for example:
H:\OFSDB\OFS1\PARAM\initofs1.ora
If each node of the cluster has its own copy of the file on its private disk, then you need to update all copies.
In the database initialization parameter file, search for the line containing the following parameter:
mts_listener_address
Replace the value of the mts_listener_address parameter with the listener address that you formatted in step 1c.
For example, assume the original mts_listener_address parameter contains the following value:
mts_listener_address = "(ADDRESS=(PROTOCOL=TCP)(HOST=node1)(PORT=1521))"
Replace the line, as follows:
mts_listener_address = "(ADDRESS=(PROTOCOL=TCP)(HOST=virtualnode)(PORT=1521))"
Save the database initialization file.
Check that the value of the mts_service parameter is the database SID.
The Database Configuration Assistant might use the database name for the mts_service parameter value. If so, change the value to the database SID.
Stop and restart the resources in the group.
To have your changes take effect, use Oracle Fail Safe Manager or the FSCMD command to take the group that contains the database offline and then place it back online. This stops and restarts all resources in the group.
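The manual steps above (find the group listener's first TCP address, format it on one line, and put it into mts_listener_address) can be sketched in Python. This is an illustrative helper, not an Oracle tool: the function names are made up, the parsing assumes the listener definition appears before its SID_LIST entry and that the file text contains no annotations inside the definition, and the init file is handled as plain text.

```python
import re
from typing import Optional

# A single ADDRESS entry made of flat (KEY=value) items; ADDRESS_LIST does not match.
ADDRESS_ENTRY = re.compile(r"\(ADDRESS=(?:\([^()]*\))+\)", re.IGNORECASE)
MTS_LINE = re.compile(
    r"^(\s*mts_listener_address\s*=\s*).*$", re.IGNORECASE | re.MULTILINE
)


def first_tcp_address(listener_text: str, listener_name: str) -> Optional[str]:
    """Return the listener's first TCP address, formatted on a single line."""
    compact = re.sub(r"\s+", "", listener_text)
    start = compact.lower().find(listener_name.lower() + "=")
    if start < 0:
        return None
    for match in ADDRESS_ENTRY.finditer(compact, start):
        if "(protocol=tcp)" in match.group(0).lower():
            return match.group(0)
    return None


def set_mts_listener_address(init_text: str, address: str) -> str:
    """Replace the value of mts_listener_address with the quoted address."""
    return MTS_LINE.sub(lambda m: f'{m.group(1)}"{address}"', init_text)


listener_ora = """
Fslvirtualnode =
  (ADDRESS_LIST=
    (ADDRESS= (PROTOCOL=IPC) (KEY=OFS1))
    (ADDRESS= (PROTOCOL=TCP) (Host=virtualnode) (Port=1521))
    (ADDRESS= (PROTOCOL=TCP) (Host=virtualnode) (Port=1526))
  )
"""
init_ora = 'mts_listener_address = "(ADDRESS=(PROTOCOL=TCP)(HOST=node1)(PORT=1521))"\n'

address = first_tcp_address(listener_ora, "Fslvirtualnode")
print(set_mts_listener_address(init_ora, address))
```

If each node keeps its own copy of the initialization file on a private disk, the same replacement has to be applied to every copy, and the group must still be taken offline and brought back online for the change to take effect.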
To use a shared server configuration, it may be necessary to make modifications to the database parameter file.
You can specify listener information in either the LOCAL_LISTENER or the DISPATCHERS parameter for a shared server configuration.
If the shared server configuration uses the LOCAL_LISTENER parameter to specify full listener information (full listener information specifies both host and port values), then Oracle Fail Safe automatically updates the database parameter file for the shared server configuration during the Add Resource to Group operation.
The single-instance database runs in shared server mode after you add it to a group. Do not make any further changes to the database parameter file.
The following example shows a shared server configuration that will be updated automatically by Oracle Fail Safe:
dispatchers = "(PROTOCOL=TCP)(DISPATCHERS=1)"
local_listener = "(ADDRESS=(PROTOCOL=TCP)(HOST=124.7.56.1)(PORT=1521))"
After you add the database to a group, Oracle Fail Safe updates the LOCAL_LISTENER parameter to use the listener information for the group.
However, if the shared servers configuration uses the DISPATCHERS parameter to specify full listener information, then remove the host and port values from the DISPATCHERS parameter. Oracle Fail Safe always writes the LOCAL_LISTENER parameter to the database parameter file.
When you remove a database from a group using Oracle Fail Safe Manager, it deletes the LOCAL_LISTENER parameter from the database initialization file. You must add the parameter back into the database initialization file by following the instructions in Shared Server Configuration and a Standalone Database.
To manage a single-instance Oracle Database, use a database administrator account that has SYSDBA privileges. This lets you administer Oracle Databases from a remote client.
When you create a single-instance sample database or add a single-instance database to a group, Oracle Fail Safe must use operating system authentication or the SYS user account to access the database. Use an authentication password file and set the initialization parameter, REMOTE_LOGIN_PASSWORDFILE, in the database initialization parameter file (initdatabase-name.ora) to either SHARED or EXCLUSIVE if users access the database using the SYS account. Set the REMOTE_LOGIN_PASSWORDFILE to NONE if users only access the database using operating system authentication.
Note:
Oracle Fail Safe does not support setting the Windows registry DBA_AUTHORIZATION parameter to the value of BYPASS.
Refer to Oracle Database Administrator's Guide for more information about database administrator authentication and the REMOTE_LOGIN_PASSWORDFILE parameter.
Database password files are stored on private disks. Changes that you make to the password file on one cluster node are not automatically applied to the corresponding file on the other cluster nodes.
If you add an account to the password file on one cluster node, then you must add that account to the password file on the other cluster nodes that are configured to run the database instance. If there are accounts in addition to SYS stored in a password file, then you must grant SYSOPER and SYSDBA privileges for the additional accounts on the other cluster nodes for a single-instance fail-safe database.
If you add a single-instance database to a group with the Oracle Fail Safe Manager Add Resource to Group Wizard, then Oracle Fail Safe creates a database instance on the other nodes that are configured to run the database and uses the default value for the maximum number of users in the password file. The password file on the node where the instance is created contains only the password for the SYS account that you supply in the Add Resource to Group Wizard.
On the other nodes configured to run the database instance, perform the following steps to synchronize the password files on the other cluster nodes:
The password for the SYS account is normally stored in a password file that is located in the Oracle home associated with the database. Because each cluster node has an Oracle home that is used for a database, there are multiple password files to maintain when a database is a cluster resource. To change the password for the SYS account, use Oracle Fail Safe Manager so that the change is propagated to each Oracle home in the cluster that is associated with the database. Do not change the SYS account password using SQL*Plus or any other utility, because that interferes with the password maintenance strategy used by Oracle Fail Safe.
Select the database from the cluster resource list and click Properties in the Actions menu.
The resource properties page displays.
If the operating system authentication is not used, then the password field displays.
For most databases, the password change takes effect immediately. However, if the database password file is being shared, then the database must be reopened before the password change takes effect. You can do this by taking the database cluster resource offline and then bringing it back online. Alternatively, you can move the cluster group that owns the database to another node in the cluster.
This section describes how to use the Oracle Database Upgrade Assistant to upgrade a single-instance fail-safe database from one release to another, or to move a single-instance Oracle Database from one Oracle home to another.
For each single-instance database you upgrade or move to a new home, perform the following steps:
Remove the single-instance database from the group.
Run the Oracle Database Upgrade Assistant from the Oracle home to which you are moving or upgrading your single-instance database.
Be prepared to provide the location of the database parameter file for the single-instance database you are upgrading. During a database upgrade, the database parameter file is converted. If the database parameter file is on a cluster disk, then your parameter file is appropriately located for Oracle Fail Safe to make the conversion. If the database parameter file is located on a private disk, then the Oracle Database Upgrade Assistant only converts the local copy. In this case, you must edit the copy on the other cluster nodes and make the appropriate changes.
Specify the location of the converted database files as asked by the Oracle Database Upgrade Assistant. Either leave the data files in their current location, or specify a cluster disk that is currently accessible by the local node. If you choose the latter, then ensure the cluster disk is not being used by another group.
If you are upgrading an Oracle7 database to an Oracle8 database, the Oracle Database Upgrade Assistant creates a new data file called <Oracle_Home>\database\mig<SID>.ora on a private disk, where SID is the database instance name. Move this new data file to a cluster disk with your other data files, as follows:
Use SQL*Plus to connect to the database, then shut it down.
SQL> SHUTDOWN
Copy the <Oracle_Home>\database\mig<SID>.ora file to a cluster disk where <SID> is the database instance name.
Use SQL*Plus to connect to the database and then execute the following commands:
SQL> STARTUP PFILE=init<SID>.ora MOUNT
SQL> ALTER DATABASE RENAME FILE
'<Oracle_Home>\database\mig<SID>.ora' TO
'cluster_disk\mig<SID>.ora';
SQL> SHUTDOWN
SQL> EXIT
When all databases in the group have been upgraded or moved to a new home with the Oracle Database Upgrade Assistant, use Oracle Fail Safe Manager to put the databases back into the group and then place the databases online, as follows:
To add an available resource to a group, select the resource you want to add to a group, then select Add Resource from the Actions menu of the Oracle Resources view.
The Add Resource to Group guided process wizard opens to assist in the configuration of the cluster resource.
If one database in a group is moved with the Oracle Database Upgrade Assistant to a new Oracle home, then all databases in the group must use the same Oracle home.
For releases of the Oracle database server prior to Oracle9i, the ctxsrv server processes background data manipulation language (DML) for indexing, searching, retrieving, and viewing documents. (Beginning with Oracle9i, you can index, search, retrieve, and view documents with standard SQL or PL/SQL procedures.)
If you are using a ctxsrv server with a single-instance Oracle database server, you can configure the ctxsrv server for high availability, as follows:
Create a batch file to start the ctxsrv server and specify the personality mask. For example, create a file named context.bat that contains the following command line:
ctxsrv -user CTXSYS/CTXSYS -personality QDM
Open Windows Failover Cluster Manager and configure the ctxsrv server as a highly available generic application, as follows:
On the File menu, click New, then Resource.
On the New Resource page:
In the Name field, enter the name of the ctxsrv server.
In the Description field, enter a description of the ctxsrv server, if desired.
In the Resource type field, select Generic Application.
In the Group field, select the group that contains the database with which the ctxsrv server is associated.
On the Possible Owners page, specify the nodes in the cluster on which the ctxsrv server can be brought online. These should be the same as the possible owners for the single-instance database with which the ctxsrv server is associated.
On the Dependencies page, specify the single-instance database with which the ctxsrv server is associated as a resource dependency. If the ctxsrv server has a disk dependency (other than those that the database requires), specify the disk or disks as resource dependencies also.
On the Generic Application Parameters page:
In the Command line field, enter the file specification for the batch file you created earlier. For example, context.bat.
In the Current directory field, enter the directory where the ctxsrv server was installed (for example, D:\Orant\bin).
On the Registry Replication page, you need not enter any registry keys; click Finish.
After you complete the configuration, the cluster service starts up the .bat file and opens a command window to display ctxsrv server logging information. If someone closes the command window, another command window opens immediately and the ctxsrv server continues to search, as usual. If the group containing the ctxsrv server fails over, operations and the searching function continue as usual.
You can use Oracle Enterprise Manager to manage and monitor single-instance databases in an Oracle Fail Safe environment. For example, you can use Oracle Enterprise Manager to:
For Oracle Enterprise Manager to discover Oracle Fail Safe clusters, you must edit the nmiconf.lst file and add fs_discover.tcl as the first entry in the list. This must be performed on all nodes of the cluster. If you are using Oracle Intelligent Agents from multiple Oracle homes, then add the fs_discover.tcl entry in one of the Oracle homes. The nmiconf.lst file is located in Oracle_Home\NETWORK\AGENT\CONFIG.
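The edit to nmiconf.lst can be scripted so that it is applied identically on every node. The following Python sketch is illustrative only: the function names are hypothetical, and it assumes that nmiconf.lst is a plain text file with one script name per line.

```python
def ensure_first_entry(lines, entry="fs_discover.tcl"):
    """Return the entries with `entry` placed first and any duplicates removed."""
    kept = [
        line.strip()
        for line in lines
        if line.strip() and line.strip().lower() != entry.lower()
    ]
    return [entry] + kept

def update_list_file(path, entry="fs_discover.tcl"):
    """Rewrite the list file at `path` so that `entry` is its first line."""
    with open(path, "r") as handle:
        lines = handle.read().splitlines()
    with open(path, "w") as handle:
        handle.write("\n".join(ensure_first_entry(lines, entry)) + "\n")
```

Running `update_list_file` a second time leaves the file unchanged, so the same script can be run safely on each cluster node. Back up the file before modifying it.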
Discover groups (virtual servers)
Note:
You must perform discovery on each group for Oracle Enterprise Manager to see the resources configured in that group. Once discovered, each group appears as a node in the Oracle Enterprise Manager nodes list, and you can manage the resources in the group as you would manage any standalone resource.
Create and register jobs and events for a database in a group as you would for a standalone database
Create and register jobs and events for a group as if it were a physical node (host)
See Also:
Security Access and Authentication Problems for information about troubleshooting problems related to integrating with Oracle Enterprise Manager.
After you add a single-instance database to a group, run the Add Resource to Group Wizard again to add an Oracle Management Agent to that group. The wizard displays a dialog box similar to the one shown in Figure 7-10.
Figure 7-10 Add Resource to Group Wizard - Resource Page
You add only one Oracle Management Agent to a group, regardless of the number of databases in the group. However, the group must contain at least one database resource before you can add the Oracle Management Agent. Similarly, you cannot remove the last database resource from a group without first removing the Oracle Management Agent.
When you specify that you want to add an Oracle Management Agent to a group:
Oracle Fail Safe creates a new Management Agent
The new Management Agent uses one of the cluster disks (specified when you ran the Add Resource to Group Wizard to add the Oracle Management Agent) to store its jobs and events information
Oracle Fail Safe configures the new Management Agent to be a part of the group:
See the Oracle Fail Safe Help for information about scheduling jobs for Oracle databases configured in a cluster and for monitoring events (such as failovers) using Oracle Enterprise Manager.
Oracle Databases configured with Oracle Fail Safe for high availability ensure fast failover and fast recovery during both unplanned and planned outages (such as software upgrades and scheduled maintenance). You can take advantage of Oracle fast-start and disaster-recovery features, control time spent during database recovery, and ensure continuous monitoring of databases configured with Oracle Fail Safe for high availability.
Oracle Fail Safe and Oracle Database technology optimize the time it takes to shut down a database on one node and complete instance recovery on another node for both planned and unplanned failovers. The Oracle Database checkpoint algorithms optimize the time it takes to perform instance recovery for planned and unplanned failovers.
When you use Oracle Fail Safe Manager (or PowerShell cmdlets) to carry out a planned failover, Oracle Fail Safe checkpoints the single-instance Oracle Database before it is shut down. The single-instance database is started on the other node in a restricted mode so that instance recovery can be completed quickly and the database made available to the database clients promptly. (If you use Microsoft Windows Failover Clusters to carry out a planned failover, then it does not checkpoint the database before shutting it down.)
Note:
If you use a tool other than Oracle Fail Safe Manager, the Oracle Fail Safe PowerShell cmdlets, or Microsoft Windows Failover Clusters to take a database offline, then Oracle Fail Safe considers it a failed resource and attempts to place it back online.
For unplanned failover, the instance recovery time is controlled by the database recovery processing.
Perform administrative tasks on a database configured for high availability as you would for any database, with one exception. Use Oracle Fail Safe Manager or the PowerShell cmdlets command-line interface (see PowerShell Commands) to take a database offline (and stop cluster monitoring of the database) during any operation that restricts access to the database or for which you want to temporarily disable the possibility of failover. This includes cold backup operations, administrative operations that must be performed while users continue to access the database, and operations that could affect response times during the periodic Is Alive polling by Microsoft Windows Failover Clusters.
Use the following steps to perform administrative tasks on a database that is configured in a group with Oracle Fail Safe Manager:
If, during an administrative task, you perform an operation that changes the configuration of the database (such as adding a new tablespace and associated data file), then you must run the Validate group operation. Adding a new data file can introduce a new disk dependency in the group. When you run the Validate group operation, it checks to ensure that the disk is a cluster disk and that it does not belong to another group. If adding the new data file introduces a new disk dependency in the group, then the disk is added to the same group as the database and the information in the cluster registry is updated to ensure that the new disk correctly fails over with the database.
Starting with Oracle Database 12c Release 1 (12.1), Oracle Database supports the use of Oracle Home User specified at the time of installation. Oracle Home User must be a domain user account. Oracle Home User is associated with an Oracle home. Ensure that all nodes in a cluster that use the same Oracle home use the same Oracle Home User.
When Oracle Fail Safe accesses a database, it usually uses the same Oracle Database home to access any database on the system. Oracle Fail Safe selects the database home when the server starts. Oracle Fail Safe scans all database homes to look for the home that has the highest software version. Initially, it only looks at homes that have their \bin path included in the system PATH environment variable. If Oracle Fail Safe does not find any database homes in the system PATH, then it scans all the database homes looking for the home with the highest version.
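The selection rule described above can be sketched in Python as follows. This is an illustration of the documented behavior, not Oracle Fail Safe code: the function names and the input format (a dictionary that maps each home directory to a dotted version string) are assumptions, and tie-breaking between homes of equal version is not specified by the text.

```python
def version_key(version):
    """Convert a dotted version such as '12.1.0.2' into a comparable tuple."""
    return tuple(int(part) for part in version.split("."))

def select_database_home(homes, system_path):
    """Pick the highest-version home, preferring homes whose \\bin is in PATH.

    homes: dict mapping home directory -> version string (hypothetical input).
    system_path: the system PATH value, with entries separated by ';'.
    """
    path_dirs = {
        entry.strip().rstrip("\\").lower()
        for entry in system_path.split(";")
        if entry.strip()
    }
    in_path = {
        home: version
        for home, version in homes.items()
        if (home.rstrip("\\") + "\\bin").lower() in path_dirs
    }
    candidates = in_path or homes  # fall back to all homes if none is in PATH
    if not candidates:
        return None
    return max(candidates, key=lambda home: version_key(candidates[home]))
```

Note that a newer home that is not in the system PATH is ignored whenever at least one older home is in the PATH, which matches the two-stage scan described above.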
Note that since Oracle Fail Safe chooses a database home when it starts, it will not be aware of any database homes that are installed after Oracle Fail Safe started. The Oracle Fail Safe server and resource monitors must be restarted before Oracle Fail Safe considers a new database home for use. After installing a newer version of Oracle Database, the Cluster Service service should be restarted on all nodes so that Oracle Fail Safe can use the new database home.
If there are databases managed by the cluster, then there must be a database home installed on a local disk of each node in the cluster. If a database home is installed on a shared cluster disk, then its \bin directory should not be included in the system PATH environment variable.
There are two different Oracle Fail Safe components that may access a database:
The Oracle Fail Safe server (OracleMSCSServices)
The Oracle Fail Safe database resource DLL (FsResOdbs.dll)
The server will normally only access a database when configuring a database resource (add or delete resource) or during verify operations. During typical system operation the Oracle Fail Safe server does not access any databases.
The resource DLL is invoked by the Windows Cluster Service when a database or a listener resource is referenced by the cluster, for example, when a database is brought online, during IsAlive polling, when the resource fails over to another virtual node, and so on. On systems with multiple database homes, there may be a requirement to have each database on the system accessed using the same database home software that is being used by the instance for that database. You can configure a resource to run in a separate resource monitor process by selecting the "Run this resource in a separate resource monitor" check box on the resource properties page displayed by Oracle Fail Safe Manager. When that option is enabled, instead of always using the highest version database home on the system, the resource monitor process for the database uses the database home that is used to run the database instance when accessing the database. When referencing database listener resources, the resource DLL always uses the software from the \bin path used to start the listener service, regardless of the setting of the "Run this resource in a separate resource monitor" option.
See Oracle Fail Safe Services for information regarding the user accounts used by Oracle Fail Safe components when accessing databases.
For standalone single-instance databases, transparent application failover (TAF) instructs Oracle Net to reestablish a failed connection to a database by connecting to a different listener. This lets the user continue work using the new connection as if the original connection had never failed. The transparent application failover feature does not work the same way for a single-instance Oracle Fail Safe database as it does for a standalone single-instance database. For an Oracle Fail Safe database, transparent application failover instructs Oracle Net to reconnect to the same listener, which has moved to another cluster node due to a group failover.
For a standalone database, the term failover in the phrase "transparent application failover" refers to Oracle Net failing over a connection from one listener to another. For an Oracle Fail Safe database, the term failover in the phrase "transparent application failover" is a bit of a misnomer, because the application does not fail over. Instead, the listener to which it is connected fails over, and then a connection is reestablished.
These differences in implementation do not affect how you manage transparent application failover.
To take advantage of transparent application failover when connected to a database configured with Oracle Fail Safe, the client applications must connect through Oracle Net to an Oracle Database.
With transparent application failover, clients do not need to reconnect explicitly after a group fails over. The OCI connection handles reconnection and state recovery automatically for the client application. In fact, applications that are not actively updating the database at the time of a failure may not notice that failover is occurring.
Refer to the Oracle Net Services Administrator's Guide for complete information about transparent application failover.
The following sections describe how to handle errors that occur when Oracle Fail Safe tries to bring a highly available single-instance database online. They also describe troubleshooting specific problems with single-instance Oracle Databases configured for high availability.
You can create a script to handle errors that may occur when Oracle Fail Safe is attempting to place a single-instance database online. Oracle Fail Safe uses the same script for all single-instance fail-safe databases on the cluster.
To specify an error handling script:
If Oracle Fail Safe cannot bring a single-instance database online, it spawns a process to run the script and passes the error code, the database name, the database SID, the TNS service name, and the database parameter file specification to the script, as follows:
FsDbError.bat error-code database-name SID TNS-service-name parameter-file-spec
For example:
FsDbError.bat ORA-01113 OracleDB OracleDB OracleDB.WORLD D:\Ora\admin\OracleDB\pfile\initOracleDB.ora
The process running the script waits for the script to finish within the time specified as the Pending Timeout value for database resources. If the script does not finish within the pending timeout period, then the script is terminated.
Oracle Fail Safe logs an event to the Windows Event Log to indicate whether the script succeeded, failed, or was terminated by Oracle Fail Safe. If the script failed, the error code is also written to the event log.
Regardless of whether the script succeeds or fails, Oracle Fail Safe continues to attempt to bring the single-instance database online as defined in the database restart and failover policies.
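The five positional arguments shown above can be mapped to named fields before a script acts on them. The following Python sketch is illustrative only. It assumes that the batch file forwards its arguments to a helper of this kind, and the names DbOnlineError, parse_handler_args, and format_log_line are hypothetical.

```python
from collections import namedtuple

# Field order follows the argument list passed to the error handling script.
DbOnlineError = namedtuple(
    "DbOnlineError",
    "error_code database_name sid tns_service_name parameter_file",
)

def parse_handler_args(args):
    """Map the five positional arguments to named fields."""
    if len(args) != 5:
        raise ValueError("expected 5 arguments, got %d" % len(args))
    return DbOnlineError(*args)

def format_log_line(err):
    """Build a one-line summary suitable for a log file."""
    return "%s: database %s (SID %s, service %s), parameter file %s" % (
        err.error_code,
        err.database_name,
        err.sid,
        err.tns_service_name,
        err.parameter_file,
    )
```

Keep any such handler short, because the script is terminated if it runs longer than the Pending Timeout value. A parameter file specification that contains spaces would need to be quoted by the caller to arrive as a single argument.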
In most cases, the first step in troubleshooting a problem is to select the Validate cluster, Validate group, or Validate standalone database action. These tools are described in general in Validating Actions. If the Validate actions do not reveal the source of the problem, review the Windows Application Event Log to see if any error messages have been posted. When the operation fails, Oracle support may also ask to have tracing enabled. See Contacting Oracle Support Services for more information on enabling Oracle Fail Safe tracing.
When you select the Validate action for a group containing a single-instance database, Oracle Fail Safe performs the following tasks:
Queries each database in the group to determine which disks it uses. Then, it validates that the disks are cluster disks and have been added to the group. If the disk validation fails (for example, because a disk has been added to the database since it was configured for high availability), then the Validate group action prompts you before fixing the problem.
Detects disk drive changes and updates resource dependencies, if necessary.
You can select the Validate group action at any time. However, you must run it when any of the following occurs:
A group or resource in a group does not come online.
You add more disks to a single-instance database that is configured in a group.
A new node is added to the cluster.
For example, assume that you add a new disk to a single-instance database, but you do not use Oracle Fail Safe Manager to update the cluster configuration. If a server node subsequently shuts down, failover does not occur correctly because the cluster software was never notified that there was a change in the configuration. To prevent this from happening, you must verify the group containing a single-instance database whenever you make a structural change to the database. When you verify the group, Oracle Fail Safe automatically detects changes and updates the cluster configuration for you. In the previous example, Oracle Fail Safe would add the new disk to the group for you.
If any problems are found during the group verification, then Oracle Fail Safe prompts you to fix them or returns an error message that further describes the problem.
While adding a database to a group, detailed error information might not be displayed by the Oracle Fail Safe Manager when a listener or database resource fails to come online. The Oracle Fail Safe resource control manager will log error information in the Windows application event log. For additional error information, refer to Contacting Oracle Support Services.
If there is a problem placing a group that contains a single-instance database online, then try the following:
When you select Validate (from the Oracle Fail Safe Manager group Actions menu), Oracle Fail Safe checks the group configuration and attempts to fix any problems that it finds.
If the Validate group action finds a problem, then it returns an error message that should help you resolve the problem manually.
Check the Oracle Net listener log.
Oracle Net logs an entry to the listener log file every time an error is encountered or a database is accessed through the listener. Check for errors in the log file that may help you to identify the problem.
Check the Oracle Net configuration data.
The listener.ora file on the server system and the tnsnames.ora file on both the client and server systems must contain valid virtual server addresses for the groups in your cluster.
Bring each resource in the group online individually.
If multiple single-instance databases are in the group, then this helps you to identify the database causing the problem.
Ensure that the single-instance database Pending Timeout value specified in the Advanced Policies property page is sufficient.
If a group containing a database fails to come online or frequently fails over, then check that the Pending Timeout value is set correctly. Failure to come online and frequent failovers occur if the Pending Timeout value for the database is set too low.
Set the Pending Timeout value to specify the length of time you want the cluster software to allow for the database to be brought online (or taken offline) before considering the operation to have failed. Set the value high enough to prevent a cluster system from mistaking slow response time for unavailability, yet low enough to minimize the failover response time when a failure does occur.
Set the Pending Timeout value by modifying the database properties, as follows:
In the Oracle Fail Safe Manager tree view, select the database name.
Click the Advanced Policies tab.
In the Pending Timeout box, modify the Pending Timeout value.
If users use the SYS account to access the database, then ensure that the initialization parameter REMOTE_LOGIN_PASSWORDFILE in the database initialization parameter file (initdatabase-name.ora) is set to SHARED or EXCLUSIVE.
If users access the database using operating system authentication only, then ensure that the initialization parameter REMOTE_LOGIN_PASSWORDFILE in the database initialization parameter file is set to NONE.
If the account password that Oracle Fail Safe uses to access a database has changed, then update that change in Oracle Fail Safe Manager.
If the password for the account through which Oracle Fail Safe accesses a database changes and you do not update the information through Oracle Fail Safe Manager, then the attempts at polling the database will fail. See Changing the SYS Account Password for information about how to update database password changes for Oracle Fail Safe.
Sometimes, processing-intensive operations (such as an Import operation) can cause Is Alive polling to fail and may result in an undesired group failover. In such cases, you can disable Is Alive polling for the database by issuing the (Get-OracleClusterResource dbname).IsAliveEnabled=$false command. However, be aware that when you disable Is Alive polling, Oracle Fail Safe suspends monitoring the instance until Is Alive polling is reenabled. You reenable Is Alive polling with the (Get-OracleClusterResource dbname).IsAliveEnabled=$true command.
Oracle recommends that you run these PowerShell commands from within a script so that you can ensure that Is Alive polling is reenabled when the processing-intensive operation completes.
For information about the PowerShell cmdlets, see PowerShell Commands.
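One way to guarantee that polling is reenabled is to wrap the long-running operation so that the reenable command runs even if the operation fails. The following Python sketch shows that pattern. It is a sketch under stated assumptions: the helper names are hypothetical, the two command strings are the ones shown above, and it assumes that the Oracle Fail Safe cmdlets are available to the powershell session that is started.

```python
import subprocess
from contextlib import contextmanager

def is_alive_command(resource_name, enabled):
    """Build the PowerShell command that sets IsAliveEnabled for a resource."""
    flag = "$true" if enabled else "$false"
    return "(Get-OracleClusterResource %s).IsAliveEnabled=%s" % (resource_name, flag)

def run_powershell(command):
    """Run one command in a new powershell process and fail on a nonzero exit."""
    subprocess.run(["powershell", "-NoProfile", "-Command", command], check=True)

@contextmanager
def is_alive_polling_disabled(resource_name, run=run_powershell):
    """Disable Is Alive polling for the duration of a with-block."""
    run(is_alive_command(resource_name, False))
    try:
        yield
    finally:
        # Always reenable polling, even if the wrapped operation raised.
        run(is_alive_command(resource_name, True))
```

A script would place the Import operation inside `with is_alive_polling_disabled("dbname"):`. The `run` parameter exists so that the command runner can be replaced, for example during a dry run.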
If there is a problem when Oracle Fail Safe tries to bring a single-instance database online or offline, then the problem may be caused by the way database authentication has been set up. Try the following to solve the problem:
If you select Use SYS account for authenticating database in the General property page of Oracle Database, then ensure that the REMOTE_LOGIN_PASSWORDFILE initialization parameter in the database initialization parameter file (initdatabase-name.ora) is set to SHARED or EXCLUSIVE.
Security Requirements for Single-Instance Databases describes how to correctly set up this parameter for database authentication.
Ensure that Oracle Fail Safe has access to the databases in the group.
For some operations that Oracle Fail Safe performs, such as a group verification and polling the database to ensure that it is online, Oracle Fail Safe must have access to the databases in a group. If the database account password has changed, then it must be updated in Oracle Fail Safe Manager. Otherwise, Oracle Fail Safe cannot monitor the database using Is Alive polling. This situation will be logged to the Windows Application Event log.
Changing the SYS Account Password describes how to correctly update the database password.
If you encounter problems when trying to establish a connection to either a standalone database or a database configured in a group, then you must check the Oracle Net configuration for the database.
Oracle Fail Safe provides the Validate group and Validate standalone database operations to help you verify and repair the Oracle Net configuration. See Validating the Configuration of Oracle Resources and Validating Standalone Database for details.
Oracle Fail Safe changes the listener.ora and tnsnames.ora files, and stops and starts listeners when configuring the network name information. The following list describes potential problems and the action you can take to correct each problem:
This message code reports any problems parsing (reading or updating) the Oracle Net listener.ora and tnsnames.ora files:
If these files are no longer valid due to improper update or file damage, then Oracle Fail Safe cannot use these to configure virtual server information. You must retrieve a valid version of these files or re-create the files using Oracle Net Assistant.
If these files are valid, then check that the net service name, the database SID, and the network name of the group used in the operation are correct. Incorrect information may cause the virtual server configuration to fail. Ensure that a database SID is not included in multiple listeners. On systems with multiple Oracle homes, check all of the listener.ora files.
FS-10436 Failed to start Windows service name for the Oracle Net listener
Oracle Fail Safe starts a listener after changing the definition of a listener or creating the definition of a new listener.
The most common reason for this error is that another listener is already listening for an address. There can be only one listener on the system listening for a particular address or database SID. For example, if LISTENER_A has the following definition, then no other listener on the system can listen for key ORCL using the IPC protocol, for port 1521 on host server_A using the TCP protocol, or for the SID name ORCL:
LISTENER = (ADDRESS_LIST= (ADDRESS= (PROTOCOL=IPC) (KEY=ORCL) ) (ADDRESS= (PROTOCOL=TCP) (Host=server_A) (Port=1521) ) )
Any other listeners that try to use the same address or database SID as LISTENER_A will fail to start.
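When several listeners are defined, possibly across more than one listener.ora file, a quick check for duplicate addresses can narrow down the cause of this error. The following Python sketch is illustrative only. It does not parse listener.ora: it assumes that the addresses have already been extracted into tuples, and the function name and input format are hypothetical.

```python
from collections import defaultdict

def find_address_conflicts(listeners):
    """Return addresses that are claimed by more than one listener.

    listeners: dict mapping listener name -> list of address tuples, for
    example ("IPC", "ORCL") or ("TCP", "server_A", 1521).
    """
    owners = defaultdict(list)
    for name, addresses in listeners.items():
        for address in addresses:
            # Compare case-insensitively so that server_A and server_a match.
            key = tuple(str(part).lower() for part in address)
            if name not in owners[key]:
                owners[key].append(name)
    return {key: names for key, names in owners.items() if len(names) > 1}
```

The check compares addresses only as text. It does not detect two listeners that name the same interface differently, for example by host name in one definition and by IP address in another.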
Another common cause for failing to start a listener is the network name. The network name used by the listener must be active on the node where Oracle Fail Safe tries to start the listener.
See the Oracle Net documentation (including information about the log directory) for additional information about troubleshooting problems with the network configuration.
Whenever Oracle Fail Safe makes changes in the listener.ora or tnsnames.ora files, the original version of the file is archived. If you need to reference an Oracle Net service name definition or a listener definition as it was before Oracle Fail Safe changed the definition, then you can look at the archived versions of the configuration files.
Oracle Fail Safe retains previous versions of the configuration files. When a file is updated, the previous version of the file is renamed to filenametimestamp.ora. Note that filename.ora is the most recent file.
Most authentication problems are due to inconsistencies in copies of the database password file between the cluster nodes. If Oracle Fail Safe is able to access the database on one node but not on another, then the password files are no longer synchronized. You can resolve this issue in a number of ways:
Copy the password file from the node which works to the node that does not work.
Remove the database from the cluster using the Oracle Fail Safe Manager Remove command, and then add it again using the Add Resource command.
Manually update the SYS password on the failing nodes by using the ORAPWD utility.
If operating system authentication is being used, then the Oracle authentication local user groups may be inconsistent. To verify the user groups, run Oracle Fail Safe Manager and select the Validate command for the cluster. Oracle Fail Safe will compare the contents of all the Oracle authentication local user groups on all nodes of the cluster and will report any inconsistencies found.
Oracle Fail Safe recognizes that a database is a root container and starts and stops individual pluggable databases owned by the container database. When a database is failed over or moved to another node in the cluster, Oracle Fail Safe starts each pluggable database using the state that was saved by the last SQL ALTER PLUGGABLE DATABASE ALL SAVE STATE command. Oracle Database 12c patch set 1 (12.1.0.2) is required to save the state of the pluggable databases.
You can use the Oracle Fail Safe Manager to open and close individual pluggable databases in a container database. Click on the expansion icon next to the name of the container database to list the pluggable databases owned by the container database. To open or close a pluggable database, click on the desired database in the list, then choose the open or close action in the Actions list.
Figure 7-11 Example of a Pluggable Database
Note:
The container database cluster resource must be online before the Oracle Fail Safe Manager displays any information about pluggable databases owned by the container database.
Oracle Fail Safe does not support the use of triggers to open or close pluggable databases. For example, a trigger similar to the following should not be used for a database in a failover cluster:
CREATE OR REPLACE TRIGGER sys.after_startup
AFTER STARTUP ON DATABASE
BEGIN
  EXECUTE IMMEDIATE 'ALTER PLUGGABLE DATABASE ALL OPEN';
END after_startup;