A Troubleshooting High Availability

This appendix describes common problems that you might encounter when deploying and managing Oracle Application Server in high availability configurations, and explains how to solve them. It contains the following topics:

Section A.1, "Troubleshooting OracleAS Disaster Recovery Topologies"
Section A.2, "Troubleshooting Middle-Tier Components"
Section A.3, "Need More Help?"

A.1 Troubleshooting OracleAS Disaster Recovery Topologies

This section describes common problems and solutions in OracleAS Disaster Recovery configurations. It contains the following topics:

Section A.1.1, "Standby Site Not Synchronized"
Section A.1.2, "Failure to Bring Up Standby Instances After Failover or Switchover"
Section A.1.3, "Switchover Operation Fails At the Step dcmctl resyncInstance -force -script"
Section A.1.4, "Unable to Start Standalone OracleAS Web Cache Installations at the Standby Site"
Section A.1.5, "Standby Site Middle-tier Installation Uses Wrong Hostname"
Section A.1.6, "Failure of Farm Verification Operation with Standby Farm"
Section A.1.7, "Sync Farm Operation Returns Error Message"
Section A.1.8, "On Windows Systems Use of asgctl startup Command May Fail If the PATH Environment Variable Has Exceeded 1024 Characters"

A.1.1 Standby Site Not Synchronized

In the OracleAS Disaster Recovery standby site, you may find that the site's OracleAS Metadata Repository is not synchronized with the OracleAS Metadata Repository in the primary site.

Problem

The OracleAS Disaster Recovery solution requires you to configure and ship data files from the primary site to the standby site manually. In addition, the archived database log files are not applied automatically at the standby site; that is, OracleAS Disaster Recovery does not use managed recovery in Oracle Data Guard.

Solution

The archive log files must be applied manually. The steps for this task are described in Chapter 5, "OracleAS Disaster Recovery".

A.1.2 Failure to Bring Up Standby Instances After Failover or Switchover

Standby instances are not started after a failover or switchover operation.

Problem

IP addresses are used in instance configuration. OracleAS Disaster Recovery setup does not require identical IP addresses in peer instances between the production and standby sites, and OracleAS Disaster Recovery synchronization does not reconcile IP address differences between the two sites. Thus, if you use an explicit IP address (for example, xxx.xx.xxx.xx) in your configuration, the standby configuration will not work after synchronization.

Solution

Avoid using explicit IP addresses. For example, in OracleAS Web Cache and Oracle HTTP Server configurations, use ANY or host names instead of IP addresses as listening addresses.
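
One way to find explicit IP addresses before synchronizing is to scan the configuration text for IPv4 literals. The following is a minimal sketch, not an Oracle-supplied tool: the class name, the sample directives, and the treatment of lines starting with # as comments are all assumptions made for illustration. The pattern is deliberately loose, so it can also flag four-part version numbers.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

/** Hypothetical helper: flags configuration lines that contain a literal IPv4 address. */
class ExplicitIpScan {

    // Four dot-separated groups of 1 to 3 digits. Octet ranges are not validated.
    private static final Pattern IPV4 =
            Pattern.compile("\\b\\d{1,3}(?:\\.\\d{1,3}){3}\\b");

    /** Returns "lineNumber: text" for each non-comment line containing an IPv4 literal. */
    static List<String> findExplicitIps(List<String> lines) {
        List<String> hits = new ArrayList<>();
        for (int i = 0; i < lines.size(); i++) {
            String line = lines.get(i).trim();
            if (line.isEmpty() || line.startsWith("#")) {
                continue; // skip blank lines and '#' comments (an assumption about the file format)
            }
            if (IPV4.matcher(line).find()) {
                hits.add((i + 1) + ": " + line);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        // Made-up listener directives, for illustration only.
        List<String> sample = Arrays.asList(
                "# listener configuration",
                "Listen 192.0.2.10:7777",
                "Listen 7777");
        for (String hit : findExplicitIps(sample)) {
            System.out.println(hit); // prints "2: Listen 192.0.2.10:7777"
        }
    }
}
```

Any line that the scan reports is a candidate for replacement with ANY or a host name.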

A.1.3 Switchover Operation Fails At the Step dcmctl resyncInstance -force -script

The OracleAS Disaster Recovery asgctl switchover operation requires that the value of the TMP variable be defined the same in the opmn.xml file on both the primary and standby sites.

Problem

OracleAS Disaster Recovery switchover fails at the step dcmctl resyncInstance -force -script and displays a message that a directory could not be found.

Solution

During a switchover operation, the opmn.xml file is copied from the primary site to the standby site. For this reason, the value of the TMP variable must be defined the same in the opmn.xml file on both primary and standby sites; otherwise, the switchover operation will fail. Make sure the TMP variable is defined identically in the opmn.xml files and resolves to the same directory structure on both sites before attempting to perform an asgctl switchover operation.

For example, the following code snippets for a Windows and UNIX environment show a sample definition of the TMP variable.

Example in Windows Environment: 
------------------------------- 
.
.
.
<ias-instance id="infraprod.iasha28.us.oracle.com"> 
 <environment> 
 <variable id="TMP" value="C:\DOCUME~1\ntregres\LOCALS~1\Temp"/> 
 </environment> 
.
.
.
Example in Unix Environment: 
---------------------------- 
.
.
.
<ias-instance id="infraprod.iasha28.us.oracle.com"> 
 <environment> 
 <variable id="TMP" value="/tmp"/> 
 </environment> 
.
.
.

A workaround to this problem is to change the value of the TMP variable in the opmn.xml file on the primary site, perform a dcmctl update config operation, then perform the asgctl switchover operation. This approach saves you having to reinstall the mid-tiers to make use of an altered TMP variable.
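
Before attempting a switchover, you could compare the TMP definitions from both sites programmatically. The sketch below is one possible approach, under stated assumptions: the class and method names are hypothetical, the XML is passed in as strings (in practice you would read each site's opmn.xml file), and only the first variable element with id TMP is examined, so a file that defines TMP in several places would need a fuller check.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

/** Hypothetical helper: compares the TMP variable in two opmn.xml documents. */
class TmpVariableCheck {

    /** Returns the value of the first <variable id="TMP"> element, or null if none exists. */
    static String tmpValue(String opmnXml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(opmnXml.getBytes(StandardCharsets.UTF_8)));
            NodeList vars = doc.getElementsByTagName("variable");
            for (int i = 0; i < vars.getLength(); i++) {
                Element v = (Element) vars.item(i);
                if ("TMP".equals(v.getAttribute("id"))) {
                    return v.getAttribute("value");
                }
            }
            return null;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    /** True only if both documents define TMP and the two values are identical strings. */
    static boolean sameTmp(String primaryXml, String standbyXml) {
        String primary = tmpValue(primaryXml);
        return primary != null && primary.equals(tmpValue(standbyXml));
    }

    public static void main(String[] args) {
        // Trimmed-down fragments modeled on the examples in this section.
        String primary = "<ias-instance id=\"primary\"><environment>"
                + "<variable id=\"TMP\" value=\"/tmp\"/></environment></ias-instance>";
        String standby = "<ias-instance id=\"standby\"><environment>"
                + "<variable id=\"TMP\" value=\"/var/tmp\"/></environment></ias-instance>";
        System.out.println("TMP identical: " + sameTmp(primary, standby)); // prints "TMP identical: false"
    }
}
```

The comparison is a plain string match. It does not verify that the directory actually exists on each site, which still has to be checked separately.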

A.1.4 Unable to Start Standalone OracleAS Web Cache Installations at the Standby Site

OracleAS Web Cache cannot be started at the standby site after a failover or switchover, possibly because the standalone OracleAS Web Cache installation is misconfigured.

Problem

OracleAS Disaster Recovery synchronization does not synchronize standalone OracleAS Web Cache installations.

Solution

Use the standard Oracle Application Server full CD image to install the OracleAS Web Cache component.

A.1.5 Standby Site Middle-tier Installation Uses Wrong Hostname

A middle-tier installation in the standby site uses the wrong hostname even after the machine's physical hostname is changed.

Problem

Besides modifying the physical hostname, you also need to make it the first entry in the /etc/hosts file. If you do not, the installer uses the wrong hostname.

Solution

Put the physical hostname as the first entry in the /etc/hosts file. See Section 5.2.2, "Configuring Hostname Resolution" for more information.

A.1.6 Failure of Farm Verification Operation with Standby Farm

When performing a verify farm with standby farm operation, the operation fails with an error message indicating that the middle-tier machine instance cannot be found and that the standby farm is not symmetrical with the production farm.

Problem

The verify farm with standby farm operation is trying to verify that the production and standby farms are symmetrical to one another, that they are consistent, and conform to the requirements for disaster recovery.

The verify operation fails because it sees the middle-tier instance as mid_tier.<hostname> and not as mid_tier.<physical_hostname>. You might suspect a problem with the _CLUSTER_NETWORK_NAME_ environment variable, which is set during installation. In this case, however, a check of that variable shows that it is set correctly. A check of the /etc/hosts file instead shows that the entries for the middle tier in question are incorrect: all middle-tier installations take the hostname from the second column of the /etc/hosts file.

For example, assume the following scenario:

Two environments are used: examp1 and examp2
OracleAS Infrastructure (Oracle Identity Management and OracleAS Metadata Repository) is first installed on examp1 and examp2 as host infra
OracleAS middle-tier (OracleAS Portal and OracleAS Wireless) is then installed on examp1 and examp2 as host node1
Basically, these are two installations (OracleAS Infrastructure and OracleAS middle-tier) on a single node
Updated the latest duf.jar and backup_restore files on all four Oracle homes
Started OracleAS Guard (asgctl) on all four Oracle homes (OracleAS Infrastructure and OracleAS middle-tier on two nodes)
Performed asgctl operations: connect asg, set primary, dump farm
Performed asgctl verify farm with standby farm operation, but it fails because it sees the instance as mid_tier.examp1 and not as mid_tier.node1.us.oracle.com

A check of the /etc/hosts file shows the following entry:

123.45.67.890 examp1 node1.us.oracle.com node1 infra

With this entry, ias.properties and farms show the following, and the verify operation fails:

IASname=midtier_inst.examp1

However, the /etc/hosts file should actually be the following:

123.45.67.890 node1.us.oracle.com node1 infra

With this entry, ias.properties and farms show the following, and the verify operation succeeds:

IASname=midtier_inst.node1.us.oracle.com

Solution

Check and change the second column entry in your /etc/hosts file to match the hostname of the middle-tier node in question as described in the previous explanation.
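
The second-column check can be illustrated with a few lines of code. This is a minimal sketch with a hypothetical class name. It splits a hosts-file entry on whitespace and returns the first name after the address, which, according to the explanation above, is the name the middle-tier installation picks up.

```java
/** Hypothetical helper: extracts the second column of an /etc/hosts entry. */
class HostsSecondColumn {

    /**
     * Returns the second whitespace-separated column of a hosts-file entry
     * (the first name after the address), or null for blank or comment-only lines.
     */
    static String secondColumn(String hostsLine) {
        String line = hostsLine;
        int hash = line.indexOf('#');
        if (hash >= 0) {
            line = line.substring(0, hash); // drop a trailing comment
        }
        String[] cols = line.trim().split("\\s+");
        return cols.length >= 2 ? cols[1] : null;
    }

    public static void main(String[] args) {
        String expected = "node1.us.oracle.com";
        // The two entries from the scenario in this section.
        String failing = "123.45.67.890 examp1 node1.us.oracle.com node1 infra";
        String working = "123.45.67.890 node1.us.oracle.com node1 infra";
        System.out.println(expected.equals(secondColumn(failing))); // prints "false"
        System.out.println(expected.equals(secondColumn(working))); // prints "true"
    }
}
```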

A.1.7 Sync Farm Operation Returns Error Message

A sync farm to operation returns the error message "Cannot Connect to asdb".

Problem

Occasionally, an administrator may forget to set the primary database with the asgctl command-line utility before performing an operation that requires a connection to the asdb database. The following example shows this scenario for a sync farm to operation:

ASGCTL> connect asg hsunnab13 ias_admin/iastest2
Successfully connected to hsunnab13:7890
ASGCTL>  
.
.
.
<Other asgctl operations may follow, such as verify farm, dump farm, 
<and show operation history, and so forth that do not require the connection
<to the asdb database to be established or a time span may elapse of no activity
<and the administrator may miss performing this vital command.
.
.
.
ASGCTL> sync farm to usunnaa11
prodinfra(asr1012): Syncronizing each instance in the farm to standby farm
prodinfra: -->ASG_ORACLE-300: ORA-01031: insufficient privileges
prodinfra: -->ASG_DUF-3700: Failed in SQL*Plus executing SQL statement:  connect null/******@asdb.us.oracle.com as sysdba;.
prodinfra: -->ASG_DUF-3502: Failed to connect to database asdb.us.oracle.com.
prodinfra: -->ASG_DUF-3504: Failed to start database asdb.us.oracle.com.
prodinfra: -->ASG_DUF-3027: Error while executing Syncronizing each instance in the farm to standby farm at step - init step.

Solution

Perform the asgctl set primary database command. This command sets the connection parameters required to open the asdb database in order to perform the sync farm to operation. Note that the set primary database command must also precede the instantiate farm to command and switchover farm to command if the primary database has not been specified in the current connection session.
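
If you keep asgctl commands in a script or checklist, the ordering rule can be checked mechanically. The sketch below is only an illustration: the class is hypothetical, it matches commands by prefix using the command names given in this section, and it assumes that each connect asg command starts a new session in which no primary database has been set yet.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper: checks that "set primary database" precedes operations that need it. */
class AsgctlOrderCheck {

    // Operations that this section says must be preceded by "set primary database".
    private static final List<String> NEEDS_PRIMARY = Arrays.asList(
            "sync farm to", "instantiate farm to", "switchover farm to");

    /** Returns the first command that runs without a primary database set, or null if none. */
    static String firstUnguarded(List<String> commands) {
        boolean primarySet = false;
        for (String raw : commands) {
            String cmd = raw.trim().toLowerCase();
            if (cmd.startsWith("connect asg")) {
                primarySet = false; // assumption: a new connection session starts unset
            } else if (cmd.startsWith("set primary database")) {
                primarySet = true;
            } else if (!primarySet) {
                for (String op : NEEDS_PRIMARY) {
                    if (cmd.startsWith(op)) {
                        return raw.trim();
                    }
                }
            }
        }
        return null;
    }

    public static void main(String[] args) {
        // The command sequence from the failing session shown in this section.
        List<String> session = Arrays.asList(
                "connect asg hsunnab13 ias_admin/iastest2",
                "verify farm",
                "sync farm to usunnaa11");
        System.out.println("Unguarded command: " + firstUnguarded(session));
    }
}
```

For the session above, the check reports the sync farm to command, because no set primary database command precedes it.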

A.1.8 On Windows Systems Use of asgctl startup Command May Fail If the PATH Environment Variable Has Exceeded 1024 Characters

On Windows systems, the system PATH environment variable can exceed the 1024-character limit when you have many OracleAS instances, many third-party software installations, or both on your system. In that case, the asgctl startup command may fail, because it starts the OracleAS Guard server outside of OPMN and the system cannot resolve the directory path.

Problem

Occasionally, on Windows systems with many OracleAS instances, third-party software installations, or both, the asgctl startup command, which runs outside of OPMN, may display a popup error stating that the dynamic link library orawsec9.dll could not be found, followed by a DufException. For example:

C:\product\10.1.3\OC4J_1\dsa\bin> asgctl startup
<<Popup Error:>>
The dynamic link library *orawsec9.dll* could not be found.
<<The exception:>>
oracle.duf.DufException
        at oracle.duf.DufOsBase.constructInstance(DufOsBase.java:1331)
        at oracle.duf.DufOsBase.getDufOs(DufOsBase.java:122)
        at oracle.duf.DufHomeMgr.getCurrentHomePath(DufHomeMgr.java:582)
        at oracle.duf.dufclient.DufClient.main(DufClient.java:132)
stado42: -->ASG_SYSTEM-100: oracle.duf.DufException
-----------------------------------------------------------------------------

However, this DLL does exist in the ORACLE_HOME\bin directory.

This error is not seen in the OracleAS Guard standalone kit because the file orawsec9.dll exists in the ORACLE_HOME\dsa\bin folder.

Solution

The workaround is to either manually edit the system PATH variable with the required path information or manually override the PATH in the command prompt by specifying the relevant %PATH% variables. For example:

C:\> set PATH=C:\product\10.1.3\OracleAS_OC4J_2\bin;C:\product\10.1.3\OracleAS_OHS1\jre\1.4.2\bin\client;C:\product\10.1.3\OracleAS_OHS1\jre\1.4.2\bin;C:\product\10.1.3\OracleAS_OHS1\bin;C:\product\10.1.3\OC4J_1\bin

C:\product\10.1.3\OC4J_1\dsa\bin> asgctl startup
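
To find out whether a system is affected, you can measure the PATH value that a process actually sees. The following is a small sketch with a hypothetical class name. It compares the length of the PATH environment variable against the 1024-character limit named in this section.

```java
/** Hypothetical helper: reports whether PATH exceeds the limit named in this section. */
class PathLengthCheck {

    // Character limit taken from the problem description above.
    static final int LIMIT = 1024;

    /** True if the given PATH value is longer than LIMIT characters; false for null. */
    static boolean exceedsLimit(String path) {
        return path != null && path.length() > LIMIT;
    }

    public static void main(String[] args) {
        String path = System.getenv("PATH"); // null if the variable is not set
        int length = (path == null) ? 0 : path.length();
        System.out.println("PATH length: " + length
                + (exceedsLimit(path) ? " (exceeds " : " (within ") + LIMIT + ")");
    }
}
```

Run the check from the same command prompt in which you intend to run asgctl startup, so that it sees the same PATH value.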

A.2 Troubleshooting Middle-Tier Components

This section describes common problems and solutions for middle-tier components in high availability configurations. It contains the following topics:

Section A.2.1, "Using Multiple NICs with OracleAS Cluster (OC4J-EJB)"
Section A.2.2, "Performance Is Slow When Using the "opmn:" URL Prefix"

A.2.1 Using Multiple NICs with OracleAS Cluster (OC4J-EJB)

Problem

If you are running OracleAS Cluster (OC4J-EJB) on computers with two NICs (network interface cards) and you are using one NIC for connecting to the network and the second NIC for connecting to the other node in the cluster, multicast messages may not be sent or received correctly. This means that session information does not get replicated between the nodes in the cluster.

Figure A-1 OracleAS Cluster (OC4J-EJB) Running on Computers with Two NICs

Solution

Start the OC4J instances with the oc4j.multicast.bindInterface parameter set to the name or IP address of the other NIC on the node, that is, the NIC that connects to the other node in the cluster.

For example, using the values shown in Figure A-1, you would start up the OC4J instances with these parameters:

On node 1, configure the OC4J instance to start up with this parameter:

-Doc4j.multicast.bindInterface=123.45.67.21

On node 2, configure the OC4J instance to start up with this parameter:

-Doc4j.multicast.bindInterface=123.45.67.22

You specify this parameter and its value in the "Java Options" field in the "Command Line Options" section in the Server Properties page in the Application Server Control Console (Figure A-2).

Figure A-2 Server Properties Page in Application Server Control Console
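
If you are unsure which name or address to supply, you can list the interfaces that the JVM sees on each node. The sketch below uses the standard java.net.NetworkInterface API; the class name and output format are hypothetical, and restricting the list to IPv4 addresses on active, non-loopback interfaces is an assumption. It only prints candidates. You still have to pick the NIC that connects to the other cluster node.

```java
import java.net.Inet4Address;
import java.net.InetAddress;
import java.net.NetworkInterface;
import java.net.SocketException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Enumeration;
import java.util.List;

/** Hypothetical helper: lists candidate values for oc4j.multicast.bindInterface. */
class BindInterfaceOptions {

    /** Formats the Java option described in this section for a given NIC name or address. */
    static String toJvmOption(String nameOrAddress) {
        return "-Doc4j.multicast.bindInterface=" + nameOrAddress;
    }

    /** Returns one line per IPv4 address on each interface that is up and not loopback. */
    static List<String> candidateOptions() {
        List<String> options = new ArrayList<>();
        try {
            Enumeration<NetworkInterface> nics = NetworkInterface.getNetworkInterfaces();
            if (nics == null) {
                return options; // no interfaces reported
            }
            for (NetworkInterface nic : Collections.list(nics)) {
                if (!nic.isUp() || nic.isLoopback()) {
                    continue;
                }
                for (InetAddress addr : Collections.list(nic.getInetAddresses())) {
                    if (addr instanceof Inet4Address) {
                        options.add(nic.getName() + " -> " + toJvmOption(addr.getHostAddress()));
                    }
                }
            }
        } catch (SocketException e) {
            // Enumeration failed part-way; return whatever was collected so far.
        }
        return options;
    }

    public static void main(String[] args) {
        for (String option : candidateOptions()) {
            System.out.println(option);
        }
    }
}
```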

A.2.2 Performance Is Slow When Using the "opmn:" URL Prefix

Problem

If you have applications that use the "opmn:" prefix in their Context.PROVIDER_URL property, you may experience slow performance when the application creates an InitialContext.

The following sample code sets the PROVIDER_URL to a URL with an opmn: prefix.

Hashtable env = new Hashtable();
env.put(Context.PROVIDER_URL, "opmn:ormi://hostname:port/cmpapp");
// ... set other properties ...
Context context = new InitialContext(env);

If the host specified in PROVIDER_URL is down, the application has to make a network connection to OPMN to locate another host. Going through the network to OPMN takes time.

Solution

To avoid making another network connection to OPMN to get another host, set the oracle.j2ee.naming.cache.timeout property so that the values returned from OPMN the first time are cached, and the application can use the values in the cache.

The following sample code sets the oracle.j2ee.naming.cache.timeout property.

Hashtable env = new Hashtable();
env.put(Context.PROVIDER_URL, "opmn:ormi://hostname:port/cmpapp");

// set the cache value
env.put("oracle.j2ee.naming.cache.timeout", "30");

// ... set other properties ...

Context context = new InitialContext(env);

Table A-1 shows valid values for the oracle.j2ee.naming.cache.timeout property:

Table A-1 Values for the oracle.j2ee.naming.cache.timeout Property

Value	Meaning
`-1`	No caching.
`0`	Cache only once, without any refreshing.
Greater than `0`	Number of seconds after which the cache can be refreshed. Note that this is not automatic; the refresh occurs only when you invoke "`new` `InitialContext()`" again. If the property is not set, the default value is 60.

With the property set, you will still see some delay on the first "new InitialContext()" call, but subsequent calls should be faster because they are retrieving data from the cache instead of making a network connection to OPMN.

Note that for optimal performance, you should also set Dedicated.Connection to either YES or DEFAULT, and set Dedicated.RMIcontext to FALSE.
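
The timeout values in Table A-1 can be wrapped in a small helper so that every lookup in an application uses the same setting. This is a sketch rather than a prescribed pattern: the class and method names are hypothetical, and rejecting values below -1 is an assumption based on the values the table lists. The helper only builds the environment; creating the InitialContext itself requires the OC4J client libraries.

```java
import java.util.Hashtable;
import javax.naming.Context;

/** Hypothetical helper: builds a JNDI environment with the OPMN lookup cache timeout set. */
class NamingCacheEnv {

    static final String CACHE_TIMEOUT = "oracle.j2ee.naming.cache.timeout";

    /**
     * Builds the environment for an "opmn:" provider URL.
     * timeoutSeconds: -1 = no caching, 0 = cache once without refreshing,
     * greater than 0 = seconds after which the cache can be refreshed.
     */
    static Hashtable<String, String> build(String providerUrl, int timeoutSeconds) {
        if (timeoutSeconds < -1) {
            // Assumption: values below -1 are not listed in Table A-1, so reject them.
            throw new IllegalArgumentException("timeout must be -1, 0, or a positive number of seconds");
        }
        Hashtable<String, String> env = new Hashtable<>();
        env.put(Context.PROVIDER_URL, providerUrl);
        env.put(CACHE_TIMEOUT, Integer.toString(timeoutSeconds));
        return env;
    }

    public static void main(String[] args) {
        Hashtable<String, String> env = build("opmn:ormi://hostname:port/cmpapp", 30);
        System.out.println(env.get(CACHE_TIMEOUT)); // prints "30"
        // An application would then call: new InitialContext(env)
    }
}
```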

A.3 Need More Help?

If the information in the previous sections is not sufficient, you can find more solutions on Oracle MetaLink, http://metalink.oracle.com. If you do not find a solution for your problem, log a service request.