6 Oracle Fusion Middleware High Availability and Enterprise Deployment

This chapter describes issues associated with Oracle Fusion Middleware high availability and enterprise deployment. It includes the following topics:

6.1 General Issues and Workarounds

This section describes general issues and workarounds. It includes the following topics:

6.1.1 Secure Resources in Application Tier

It is highly recommended that the application tier in the SOA Enterprise Deployment topology and the WebCenter Enterprise Deployment topology be protected against anonymous RMI connections. To prevent RMI access to the middle tier from outside the configured subnet, follow the steps in "Configure connection filtering" in the Oracle WebLogic Server Administration Console Online Help. Execute all of the steps, except as noted in the following:

  1. Do not execute the substep for configuring the default connection filter. Execute the substep for configuring a custom connection filter.

  2. In the Connection Filter Rules field, add the rules that will allow all protocol access to servers from the middle tier subnet while allowing only http(s) access from outside the subnet, as shown in the following example:

    nnn.nnn.0.0/nnn.nnn.0.0  * * allow 
    0.0.0.0/0 * * deny t3 t3s 
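
The effect of rule order can be illustrated with a minimal Python sketch. It is a simplified model, not WebLogic Server's own filter implementation. It assumes that rules are evaluated from top to bottom, that the first matching rule decides, and that a connection matching no rule is accepted. The subnet 10.1.0.0/255.255.0.0 and the client addresses are hypothetical stand-ins for the nnn.nnn placeholders, and the second rule is assumed to deny t3 and t3s from outside the subnet. The local address and local port fields are ignored because both are wildcards in the example.

```python
import ipaddress

# Hypothetical rules: full access from the middle tier subnet,
# t3/t3s denied from everywhere else.
RULES = """
10.1.0.0/255.255.0.0 * * allow
0.0.0.0/0 * * deny t3 t3s
"""


def parse_rules(text):
    """Parse 'target localAddress localPort action [protocols]' lines."""
    rules = []
    for line in text.strip().splitlines():
        fields = line.split()
        target, _local_address, _local_port, action = fields[:4]
        protocols = set(fields[4:])  # empty set means "any protocol"
        rules.append((ipaddress.ip_network(target), action, protocols))
    return rules


def is_allowed(rules, client_ip, protocol):
    """Return True if the first matching rule allows the connection."""
    address = ipaddress.ip_address(client_ip)
    for network, action, protocols in rules:
        if address in network and (not protocols or protocol in protocols):
            return action == "allow"
    return True  # assumption: a connection that matches no rule is accepted
```

Under these assumptions, a t3 connection from 10.1.2.3 is accepted by the first rule, a t3 connection from 192.0.2.10 is rejected by the second rule, and an https connection from 192.0.2.10 matches neither rule and is accepted.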
    

6.1.2 mod_wl Not Supported for OHS Routing to Managed Server Cluster

Oracle Fusion Middleware supports only mod_wls_ohs and does not support mod_wl for Oracle HTTP Server routing to a cluster of managed servers.

6.1.3 Only Documented Procedures Supported

For Oracle Fusion Middleware high availability deployments, Oracle strongly recommends following only the configuration procedures documented in the Oracle Fusion Middleware High Availability Guide and the Oracle Fusion Middleware Enterprise Deployment Guides.

6.1.4 SOA Composer Generates Error During Failover

During failover, if you are in a SOA Composer dialog box and the connected server is down, you will receive an error, such as Target Unreachable, 'messageData' returned null.

To continue working in the SOA Composer, open a new browser window and navigate to the SOA Composer.

6.1.5 Accessing Web Services Policies Page in Cold Failover Environment

In a Cold Failover Cluster (CFC) environment, the following exception is displayed when the Web Services policies page is accessed in Fusion Middleware Control:

Unable to connect to Oracle WSM Policy Manager.
Cannot locate policy manager query/update service. Policy manager service
look up did not find a valid service.

To avoid this, implement one of the following options:

  • Create an SSL certificate aliased to the virtual hostname and add it to the keystore.

  • Add "-Dweblogic.security.SSL.ignoreHostnameVerification=true" to the JAVA_OPTIONS parameter in the startWebLogic.sh or startWebLogic.cmd file.

6.1.6 Considerations for Oracle Identity Federation HA in SSL mode

In a high availability environment with two (or more) Oracle Identity Federation servers mirroring one another and a load balancer at the front-end, there are two ways to set up SSL:

  • Configure SSL on the load balancer, so that the SSL connection is between the user and the load balancer. In that case, the keystore/certificate used by the load balancer has a CN referencing the address of the load balancer.

    The communication between the load balancer and the WLS/Oracle Identity Federation can be clear or SSL (and in the latter case, Oracle WebLogic Server can use any keystore/certificates, as long as these are trusted by the load balancer).

  • Configure SSL on the Oracle Identity Federation servers, so that the SSL connection is between the user and the Oracle Identity Federation server. In this case, the CN of the keystore/certificate from the Oracle WebLogic Server/Oracle Identity Federation installation needs to reference the address of the load balancer, because the user connects using the hostname of the load balancer and the certificate CN needs to match that address.

    In short, the keystore/certificate of the SSL endpoint connected to the user (load balancer or Oracle WebLogic Server/Oracle Identity Federation) needs to have its CN set to the hostname of the load balancer, since it is the address that the user will use to connect to Oracle Identity Federation.
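
The CN rule above can be checked with a short Python sketch. It is only an illustration under stated assumptions: the certificate is supplied as a dictionary in the layout returned by Python's ssl.SSLSocket.getpeercert(), the hostname lb.mycompany.com is hypothetical, and the comparison covers the CN only. Clients may also examine other certificate fields, which this sketch does not model.

```python
def common_name(cert):
    """Return the commonName from a getpeercert()-style dictionary, or None."""
    for rdn in cert.get("subject", ()):
        for key, value in rdn:
            if key == "commonName":
                return value
    return None


def cn_matches_load_balancer(cert, lb_hostname):
    """True if the certificate CN equals the load balancer hostname."""
    cn = common_name(cert)
    return cn is not None and cn.lower() == lb_hostname.lower()


# Hypothetical certificate whose CN references the load balancer address.
EXAMPLE_CERT = {
    "subject": (
        (("organizationName", "My Company"),),
        (("commonName", "lb.mycompany.com"),),
    )
}
```

With EXAMPLE_CERT, the check succeeds for lb.mycompany.com and fails for the hostname of an individual Oracle Identity Federation server, which is the situation the second setup must avoid.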

6.2 Configuration Issues and Workarounds

This section describes configuration issues and their workarounds. It includes the following topics:

6.2.1 jca.retry.count Doubled in a Clustered Environment

In a clustered environment, each node maintains its own in-memory HashMap for inbound retry. If the jca.retry.count property is set to 3 for the inbound retry feature, each node retries three times. As a result, the total retry count becomes 6 if the clustered environment has two nodes.
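
The arithmetic generalizes to larger clusters. The following Python sketch assumes, as described above, that every node retries independently up to the configured count. The function name is hypothetical.

```python
def total_inbound_retries(jca_retry_count, node_count):
    """Total retries when each node keeps its own retry counter."""
    return jca_retry_count * node_count
```

For jca.retry.count=3, a two-node cluster gives 6 retries in total, and a four-node cluster would give 12.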

6.2.2 Cluster Time Zones Must Be the Same

All the machines in a cluster must be in the same time zone. WAN clusters are not supported by Oracle Fusion Middleware high availability. Even machines in the same time zone may have issues when started by command line. Oracle recommends using Node Manager to start the servers.

6.2.3 WebLogic Server Restart after Abrupt Machine Failure

If Oracle WebLogic Server does not restart after an abrupt machine failure when JMS messages and transaction logs are stored on an NFS-mounted directory, the following errors may appear in the server log files:

<MMM dd, yyyy hh:mm:ss a z> <Error> <Store> <BEA-280061> <The persistent 
store "_WLS_server_soa1" could not be deployed: 
weblogic.store.PersistentStoreException: java.io.IOException: 
[Store:280021]There was an error while opening the file store file 
"_WLS_SERVER_SOA1000000.DAT" 
weblogic.store.PersistentStoreException: java.io.IOException: 
[Store:280021]There was an error while opening the file store file 
"_WLS_SERVER_SOA1000000.DAT" 
        at weblogic.store.io.file.Heap.open(Heap.java:168) 
        at weblogic.store.io.file.FileStoreIO.open(FileStoreIO.java:88)

If an abrupt machine failure occurs, WebLogic Server restart or whole server migration may fail if the transaction logs or JMS persistence store directory is mounted using NFS. WebLogic Server maintains locks on files used for storing JMS data and transaction logs to protect against potential data corruption if two instances of the same WebLogic Server are accidentally started. Because the NFS protocol is stateless, the storage device does not become aware of the machine failure and therefore does not release the locks. As a result, after an abrupt machine failure followed by a restart, any subsequent attempt by WebLogic Server to acquire locks on the previously locked files may fail. Refer to your storage vendor documentation for additional information on the locking of files stored in NFS-mounted directories on the storage device.

Use one of the following two solutions to unlock the logs and data files.

Solution 1

Manually unlock the logs and JMS data files and start the servers by creating a copy of the locked persistence store file and using the copy for subsequent operations. To create a copy of the locked persistence store file, rename the file, and then copy it back to its original name. The following sample steps assume that transaction logs are stored in the /shared/tlogs directory and JMS data is stored in the /shared/jms directory.

cd /shared/tlogs
mv _WLS_SOA_SERVER1000000.DAT _WLS_SOA_SERVER1000000.DAT.old
cp _WLS_SOA_SERVER1000000.DAT.old _WLS_SOA_SERVER1000000.DAT
cd /shared/jms
mv SOAJMSFILESTORE_AUTO_1000000.DAT SOAJMSFILESTORE_AUTO_1000000.DAT.old
cp SOAJMSFILESTORE_AUTO_1000000.DAT.old SOAJMSFILESTORE_AUTO_1000000.DAT
mv UMSJMSFILESTORE_AUTO_1000000.DAT UMSJMSFILESTORE_AUTO_1000000.DAT.old
cp UMSJMSFILESTORE_AUTO_1000000.DAT.old UMSJMSFILESTORE_AUTO_1000000.DAT

With this solution, the WebLogic file locking mechanism continues to provide protection from any accidental data corruption if multiple instances of the same servers are accidentally started. However, the servers must be restarted manually after abrupt machine failures. File stores create multiple consecutively numbered .DAT files when they are used to store large amounts of data. When this occurs, all of the files may need to be renamed and copied.
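
When a store has several consecutively numbered .DAT files, the rename-and-copy steps can be repeated for each file. The following Python sketch shows one way to do that. It is an illustration, not an Oracle-supplied tool: the function name and the *.DAT pattern are assumptions, and it should be run only while the affected servers are stopped.

```python
import shutil
from pathlib import Path


def recreate_store_files(store_dir, pattern="*.DAT"):
    """Rename each store file to <name>.old, then copy it back to <name>.

    Returns the names of the files that were recreated, in sorted order.
    """
    recreated = []
    # sorted() materializes the listing before any file is renamed.
    for dat_file in sorted(Path(store_dir).glob(pattern)):
        old_file = dat_file.with_name(dat_file.name + ".old")
        if old_file.exists():
            raise FileExistsError(f"{old_file} already exists; not overwriting")
        dat_file.rename(old_file)          # same effect as: mv X.DAT X.DAT.old
        shutil.copy2(old_file, dat_file)   # same effect as: cp X.DAT.old X.DAT
        recreated.append(dat_file.name)
    return recreated
```

For the sample layout above, the function would be called once for /shared/tlogs and once for /shared/jms. It refuses to overwrite an existing .old file so that an earlier backup copy is not lost.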

Solution 2

Disable WebLogic file locking by disabling the native I/O wlfileio2 driver. The following sample steps move the shared object for the driver to a backup location, effectively removing it.

cd WL_HOME/server/native/platform/cpu_arch
mv libwlfileio2.so /shared/backup

With this solution, because WebLogic file locking is disabled, automated server restarts and failovers succeed. However, this solution may also degrade performance. Be very cautious when using this solution. Always configure the database-based leasing option, which enforces an additional locking mechanism using database tables and prevents automated restart of more than one instance of the same WebLogic Server. Additional procedural precautions must be implemented to avoid human error and to ensure that one and only one instance of a server is manually started at any given point in time. Similarly, extra precautions must be taken to ensure that no two domains have a store with the same name that references the same directory.

6.2.4 Cookie Persistence Setting on Load Balancer May Result in Intermittent Timeouts in Accessing Portal on Windows Platforms

Cookie persistence on the load balancer is not required for an Oracle Portal active-active setup. Inadvertently setting cookie persistence to 'active cookie insert' on certain hardware load balancers for Portal deployments on Windows results in intermittent timeouts while accessing Oracle Portal.

6.2.5 Fusion Middleware Control May Display Incorrect Status

In some instances, Oracle WebLogic Fusion Middleware Control may display the incorrect status of a component immediately after the component has been restarted or failed over.

6.2.6 Accumulated BPEL Instances Cause Performance Decrease

In a scaled-out clustered environment, if a large number of BPEL instances accumulate in the database, database performance decreases and the following error is generated: MANY THREADS STUCK FOR 600+ SECONDS.

To avoid this error, remove old BPEL instances from the database.

6.2.7 Extra Message Enqueue when One Cluster Server is Brought Down and Back Up

In a non-XA environment, MQSeries Adapters do not guarantee once-only delivery of messages from inbound adapters to the endpoint in the case of a local transaction. In this scenario, if an inbound message is published to the endpoint and the SOA server is brought down before the transaction is committed, the inbound message is rolled back, and the same message is dequeued and published to the endpoint again. This creates an extra message in the outbound queue.
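
The non-XA sequence can be modeled with a small Python simulation. It is a conceptual sketch only and does not use any MQSeries or adapter API: the queues are plain Python collections, and the function name and the crash flag are hypothetical. The point it illustrates is that publishing to the endpoint is not undone when the inbound dequeue rolls back.

```python
from collections import deque


def deliver_once(inbound, outbound, crash_before_commit=False):
    """Attempt one delivery under a local (non-XA) transaction.

    Returns True if the inbound dequeue was committed.
    """
    message = inbound[0]        # dequeue is provisional until commit
    outbound.append(message)    # publish to the endpoint (not rolled back)
    if crash_before_commit:
        # Server goes down: the inbound dequeue rolls back and the
        # message stays on the inbound queue for redelivery.
        return False
    inbound.popleft()           # commit the inbound dequeue
    return True
```

Running one attempt with crash_before_commit=True followed by a normal attempt leaves the inbound queue empty but places the same message on the outbound queue twice, which is the extra message described above.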

In an XA environment, MQ Messages are actually not lost but held by Queue Manager due to an inconsistent state. To retrieve the held messages, restart the Queue Manager.

6.2.8 Duplicate Unrecoverable Human Workflow Instance Created with Oracle RAC Failover

As soon as Oracle Human Workflow commits its transaction, control passes back to BPEL, which almost instantaneously commits its transaction. If the Oracle RAC instance goes down within this window, the message is retried on failover and can cause duplicate tasks. The duplicate task can show up in two ways: either a duplicate task appears in worklistapp, or an unrecoverable BPEL instance is created. This BPEL instance appears in BPEL Recovery. It is not possible to recover this BPEL instance as consumer, because this task has already completed.

6.2.9 Configuration Files Missing after Planned Administration Server Node Shutdown or Reboot

The following information refers to Chapter 10, "Managing the Topology," of the Oracle Fusion Middleware Enterprise Deployment Guide for Oracle SOA Suite.

When performing a planned stop of the Administration Server's node (rebooting or shutting down the Administration Server's machine), the OS NFS service may be disabled before the Administration Server itself is stopped. Depending on the configuration of services at the OS level, this can cause the detection of missing files in the Administration Server's domain directory and trigger their deletion in the domain directories on other nodes. This can result in the framework deleting some of the files under domain_dir/fmwconfig/. This behavior is typically not observed for unplanned downtimes, such as machine panic, power loss, or machine crash. To avoid this behavior, shut down the Administration Server before performing reboots or, alternatively, use the appropriate OS configuration to set the order of services so that the NFS service is disabled after the Administration Server's process is stopped. See your OS administration documentation for the configuration required to set the order of services.

6.2.10 No High Availability Support for SOA B2B TCP/IP

High availability failover support is not available for the SOA B2B TCP/IP protocol. This primarily affects deployments using HL7 over MLLP. For inbound communication in a clustered environment, all B2B servers are active and the address exposed for inbound traffic is a load balancer virtual server. Also, in an outage scenario where an active managed server is no longer available, the persistent TCP/IP connection is lost and the client is expected to reestablish the connection.

6.2.11 WebLogic Administration Server on Machines with Multiple Network Cards

When installing Oracle WebLogic Server on a server with multiple network cards, always specify a Listen Address for the Administration Server. The address used should be the DNS Name/IP Address of the network card you wish to use for Administration Server communication.

To set the Listen Address:

  1. In the Oracle WebLogic Server Administration Console, select Environment, and then Servers from the domain structure menu.

  2. Click the Administration Server.

  3. Click Lock and Edit from the Change Center to allow editing.

  4. Enter a Listen Address.

  5. Click Save.

  6. Click Activate Changes in the Change Center.

6.2.12 Additional Parameters for SOA and Oracle RAC Data Sources

In some deployments of SOA with Oracle RAC, you may need to set parameters in addition to the out-of-the-box configuration of the individual data sources in an Oracle RAC configuration. The additional parameters are:

  1. Add property oracle.jdbc.ReadTimeout=300000 (300000 milliseconds) for each data source.

    The actual value of the ReadTimeout parameter may differ based on additional considerations.

  2. If the network is not reliable, then it is difficult for a client to detect the frequent disconnections when the server is abruptly disconnected. By default, a client running on Linux takes 7200 seconds (2 hours) to sense the abrupt disconnections. This value is equal to the value of the tcp_keepalive_time property. To configure the application to detect the disconnections faster, set the value of the tcp_keepalive_time, tcp_keepalive_interval, and tcp_keepalive_probes properties to a lower value at the operating system level.

    Note:

    Setting a low value for the tcp_keepalive_interval property leads to frequent probe packets on the network, which can make the system slower. Therefore, the value of this property should be set appropriately based on system requirements.

For example, set tcp_keepalive_time=600 on the system running the WebLogic Server managed server.
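
The combined effect of the three settings can be estimated with a short Python sketch. It assumes Linux keepalive behavior: the first probe is sent after the idle time elapses, and the connection is declared dead after the configured number of unanswered probes, sent one interval apart. On Linux the kernel parameters are net.ipv4.tcp_keepalive_time, net.ipv4.tcp_keepalive_intvl, and net.ipv4.tcp_keepalive_probes, with defaults of 7200, 75, and 9. The function name is hypothetical and the result is an approximation.

```python
def keepalive_detection_seconds(idle_time, probe_interval, probe_count):
    """Approximate time to detect a dead peer on an idle connection."""
    return idle_time + probe_interval * probe_count


# Linux defaults: slightly more than the 2 hours of idle time.
DEFAULT_DETECTION = keepalive_detection_seconds(7200, 75, 9)

# With tcp_keepalive_time=600 and the other two settings left at their defaults.
TUNED_DETECTION = keepalive_detection_seconds(600, 75, 9)
```

With the defaults, detection takes roughly 7875 seconds. Lowering only tcp_keepalive_time to 600 reduces it to roughly 1275 seconds, and lowering the interval or the probe count shortens it further at the cost of more probe traffic.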

Also, you must specify the ENABLE=BROKEN parameter in the DESCRIPTION clause in the connection descriptor. For example:

jdbc:oracle:thin:@(DESCRIPTION=(enable=broken)(ADDRESS_LIST=(ADDRESS=(PROTOCOL=TCP)(HOST=node1-vip.mycompany.com)(PORT=1521)))(CONNECT_DATA=(SERVICE_NAME=orcl.us.oracle.com)(INSTANCE_NAME=orcl1)))

As a result, the data source configuration appears as follows:

<url>jdbc:oracle:thin:@(DESCRIPTION=(enable=broken)(ADDRESS_LIST=(ADDRESS=(PROTOCOL=TCP)(HOST=node1-vip.us.oracle.com)(PORT=1521)))(CONNECT_DATA=(SERVICE_NAME=orcl.us.oracle.com)(INSTANCE_NAME=orcl1)))</url>
    <driver-name>oracle.jdbc.xa.client.OracleXADataSource</driver-name>
    <properties>
      <property>
        <name>oracle.jdbc.ReadTimeout</name>
        <value>300000</value>
      </property>
      <property>
        <name>user</name>
        <value>jmsuser</value>
      </property>
      <property>
        <name>oracle.net.CONNECT_TIMEOUT</name>
        <value>10000</value>
      </property>
    </properties>
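
Because the connection descriptor is long and its parentheses are easy to unbalance, it can help to generate it. The following Python sketch assembles a descriptor of the form shown above, with ENABLE=BROKEN in the DESCRIPTION clause. The function name is hypothetical, and the host, port, service, and instance values are the sample values from this section.

```python
def rac_thin_url(host, port, service_name, instance_name):
    """Build a thin-driver URL whose DESCRIPTION clause sets enable=broken."""
    address = f"(ADDRESS=(PROTOCOL=TCP)(HOST={host})(PORT={port}))"
    connect_data = (
        f"(CONNECT_DATA=(SERVICE_NAME={service_name})"
        f"(INSTANCE_NAME={instance_name}))"
    )
    return (
        "jdbc:oracle:thin:@(DESCRIPTION=(enable=broken)"
        f"(ADDRESS_LIST={address}){connect_data})"
    )


SAMPLE_URL = rac_thin_url(
    "node1-vip.mycompany.com", 1521, "orcl.us.oracle.com", "orcl1"
)
```

The resulting string can be pasted into the url element of the data source configuration. One URL is generated per Oracle RAC instance by changing the host and instance name.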

6.2.13 Message Sequencing and MLLP Not Supported in Oracle B2B HA Environments

Message sequencing and MLLP are not supported in Oracle B2B high availability (HA) environments.

6.2.14 Access Control Exception After Expanding Cluster Against an Extended Domain

The Oracle Identity Federation server has been observed to fail due to access control exceptions under the following circumstances:

  1. You create a domain with no Identity Management components on host1.

  2. On host2, you extend that domain in clustered mode, select all Identity Management components, and select Create Schema.

  3. On host1, you expand the cluster and select all components.

Due to a bug, the file DOMAIN_HOME/config/fmwconfig/system-jazn-data.xml on host1 is overwritten so that the <grant> element is removed, which causes the access control exceptions when the Oracle Identity Federation server is started.

To restore the <grant> element, use the WLST grantPermission command.

On Linux, enter the following three commands at the bash prompt. Type each command on one line.

When typing the commands, replace ORACLE_COMMON_HOME with the path to the Oracle Common Home folder, located in the Middleware Home. When prompted for information to connect to WebLogic, enter the WLS Administrator Credentials and the location of the WebLogic Administration Server.

ORACLE_COMMON_HOME/common/bin/wlst.sh
ORACLE_COMMON_HOME/modules/oracle.jps_11.1.1/common/wlstscripts/grantPermission.py
-codeBaseURL
file:\${domain.home}/servers/\${weblogic.Name}/tmp/_WL_user/OIF_11.1.1.2.0/-
-permClass oracle.security.jps.service.credstore.CredentialAccessPermission
-permTarget context=SYSTEM,mapName=OIF,keyName=* -permActions read

ORACLE_COMMON_HOME/common/bin/wlst.sh
ORACLE_COMMON_HOME/modules/oracle.jps_11.1.1/common/wlstscripts/grantPermission.py
-codeBaseURL
file:\${domain.home}/servers/\${weblogic.Name}/tmp/_WL_user/OIF_11.1.1.2.0/-
-permClass oracle.security.jps.service.credstore.CredentialAccessPermission
-permTarget credstoressp.credstore -permActions read

ORACLE_COMMON_HOME/common/bin/wlst.sh
ORACLE_COMMON_HOME/modules/oracle.jps_11.1.1/common/wlstscripts/grantPermission.py
-codeBaseURL
file:\${domain.home}/servers/\${weblogic.Name}/tmp/_WL_user/OIF_11.1.1.2.0/-
-permClass oracle.security.jps.service.credstore.CredentialAccessPermission
-permTarget credstoressp.credstore.OIF.* -permActions read
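
Because the three Linux commands differ only in the -permTarget value, they can be generated rather than typed. The following Python sketch builds one argument list per command from the values shown above. It is an illustration only: the function name is hypothetical, and it builds the arguments without running anything. The lists are meant for a launcher that does not go through a shell (for example, Python's subprocess.run), so the ${...} placeholders appear without the backslash that the bash prompt requires.

```python
CODE_BASE_URL = (
    "file:${domain.home}/servers/${weblogic.Name}"
    "/tmp/_WL_user/OIF_11.1.1.2.0/-"
)
PERM_CLASS = "oracle.security.jps.service.credstore.CredentialAccessPermission"
PERM_TARGETS = [
    "context=SYSTEM,mapName=OIF,keyName=*",
    "credstoressp.credstore",
    "credstoressp.credstore.OIF.*",
]


def grant_permission_commands(oracle_common_home):
    """Return one argument list per grantPermission invocation."""
    wlst = f"{oracle_common_home}/common/bin/wlst.sh"
    script = (
        f"{oracle_common_home}/modules/oracle.jps_11.1.1"
        "/common/wlstscripts/grantPermission.py"
    )
    return [
        [wlst, script,
         "-codeBaseURL", CODE_BASE_URL,
         "-permClass", PERM_CLASS,
         "-permTarget", target,
         "-permActions", "read"]
        for target in PERM_TARGETS
    ]
```

Each command still prompts for the WebLogic connection information, as described above.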

On Windows, enter the following three commands at the command prompt. Type each command on one line.

When typing the commands, replace ORACLE_COMMON_HOME with the path to the Oracle Common Home folder, located in the Middleware Home. When prompted for information to connect to WebLogic, enter the WLS Administrator Credentials and the location of the WebLogic Administration Server.

ORACLE_COMMON_HOME\common\bin\wlst.cmd
ORACLE_COMMON_HOME\modules\oracle.jps_11.1.1\common\wlstscripts\grantPermission.py
-codeBaseURL
file:${domain.home}/servers/${weblogic.Name}/tmp/_WL_user/OIF_11.1.1.2.0/-
-permClass oracle.security.jps.service.credstore.CredentialAccessPermission
-permTarget context=SYSTEM,mapName=OIF,keyName=* -permActions read

ORACLE_COMMON_HOME\common\bin\wlst.cmd
ORACLE_COMMON_HOME\modules\oracle.jps_11.1.1\common\wlstscripts\grantPermission.py
-codeBaseURL
file:${domain.home}/servers/${weblogic.Name}/tmp/_WL_user/OIF_11.1.1.2.0/-
-permClass oracle.security.jps.service.credstore.CredentialAccessPermission
-permTarget credstoressp.credstore -permActions read

ORACLE_COMMON_HOME\common\bin\wlst.cmd
ORACLE_COMMON_HOME\modules\oracle.jps_11.1.1\common\wlstscripts\grantPermission.py
-codeBaseURL
file:${domain.home}/servers/${weblogic.Name}/tmp/_WL_user/OIF_11.1.1.2.0/-
-permClass oracle.security.jps.service.credstore.CredentialAccessPermission
-permTarget credstoressp.credstore.OIF.* -permActions read

6.3 Documentation Errata

This section describes documentation errata. It includes the following topics:

6.3.1 Documentation Errata for the Fusion Middleware High Availability Guide

This section contains documentation errata for the Oracle Fusion Middleware High Availability Guide.

6.3.1.1 Latest Requirements and Certification Information

Several manuals in the Oracle Fusion Middleware 11g documentation set have information on Oracle Fusion Middleware system requirements, prerequisites, specifications, and certification information.

6.3.1.2 Incorrect Path Given for CFC Procedure

In step 12 of section 10.2.3.7.2, "Transforming Oracle Reports for Cold Failover Clusters," in the Oracle Fusion Middleware High Availability Guide, the following directory path is incorrect:

DOMAIN_HOME/config/fmwconfig/servers/WLS_DISCO/applications/discoverer_11.1.1.2.0/configuration

The correct directory path is as follows:

DOMAIN_HOME/config/fmwconfig/servers/WLS_REPORTS/applications/reports_11.1.1.2.0/configuration

6.3.2 Documentation Errata for the Fusion Middleware Enterprise Deployment Guide for Oracle WebCenter

This section contains documentation errata for the Oracle Fusion Middleware Enterprise Deployment Guide for Oracle WebCenter.

6.3.2.1 Link to Section 8.1.3 is Missing

In Section 8.1, "Configuring the Discussion Forum Connection" of the Oracle Fusion Middleware Enterprise Deployment Guide for Oracle WebCenter, the link to section 8.1.3, "Creating a Discussions Server Connection for WebCenter From EM" is missing.

6.3.2.2 Additional Information for Discussions Forum Multicast to Unicast Conversion

In section 6.14, "Converting Discussions Forum from Multicast to Unicast" of the Oracle Fusion Middleware Enterprise Deployment Guide for Oracle WebCenter, the following information is missing from Step 3:

Step 3: Repeat steps 1 and 2 for WLS_Services2, swapping WCHost1 for WCHost2, and WCHost2 for WCHost1 as follows:

-Dtangosol.coherence.wka1=WCHost2  -Dtangosol.coherence.wka2=WCHost1
-Dtangosol.coherence.localhost=WCHost2 -Dtangosol.coherence.wka1.port=8089
-Dtangosol.coherence.wka2.port=8089
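
The swap described in Step 3 can be expressed as a short Python sketch that generates the flags for either server. It assumes that the flags for the first server mirror the ones shown above with the two hosts exchanged, which this section implies but does not list. The function name is hypothetical.

```python
def coherence_unicast_flags(local_host, peer_host, port=8089):
    """Build the well-known-address flags for one Discussions server."""
    return [
        f"-Dtangosol.coherence.wka1={local_host}",
        f"-Dtangosol.coherence.wka2={peer_host}",
        f"-Dtangosol.coherence.localhost={local_host}",
        f"-Dtangosol.coherence.wka1.port={port}",
        f"-Dtangosol.coherence.wka2.port={port}",
    ]


# Flags for WLS_Services2, which runs on WCHost2.
SERVICES2_FLAGS = " ".join(coherence_unicast_flags("WCHost2", "WCHost1"))
```

Calling the function with the hosts exchanged, coherence_unicast_flags("WCHost1", "WCHost2"), gives the corresponding flags for the first server under the stated assumption.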

6.3.2.2.1 Additional Discussion Connection Properties Explained in Administration Guide

For additional Discussions Server connection properties associated with the procedure in Section 8.1.3 "Creating a Discussions Server Connection for WebCenter From EM" of the Oracle Fusion Middleware Enterprise Deployment Guide for Oracle WebCenter, refer to section 12.3.1, "Registering Discussions Servers Using Fusion Middleware Control," in the Oracle Fusion Middleware Administrator's Guide for Oracle WebCenter.