Oracle® Fusion Middleware Release Notes
11g Release 1 (11.1.1) for HP-UX Itanium

Part Number E14773-06

6 Oracle Fusion Middleware High Availability and Enterprise Deployment

This chapter describes issues associated with Oracle Fusion Middleware high availability and enterprise deployment. It includes the following topics:

6.1 General Issues and Workarounds

This section describes general issues and workarounds. It includes the following topics:

6.1.1 Discoverer Managed Server Starts in Admin Mode on Unpacked Machine

If you use the pack and unpack commands for Managed Servers for Oracle Portal, Oracle Forms, Oracle Reports, and Oracle Discoverer in a cluster, copy the applications from the first node to the second node, because these are externally staged applications. For details about using the pack and unpack commands, see Oracle WebLogic Server Creating Templates and Domains Using the Pack and Unpack Commands.

6.1.2 mod_wl Not Supported for OHS Routing to Managed Server Cluster

Oracle Fusion Middleware supports only mod_wls_ohs and does not support mod_wl for Oracle HTTP Server routing to a cluster of managed servers.

6.1.3 Only Documented Procedures Supported

For Oracle Fusion Middleware high availability deployments, Oracle strongly recommends following only the configuration procedures documented in the Oracle Fusion Middleware High Availability Guide and the Oracle Fusion Middleware Enterprise Deployment Guides.

6.2 Configuration Issues and Workarounds

This section describes configuration issues and their workarounds. It includes the following topics:

6.2.1 XEngine Not Installed on Second Node in a Clustered Environment

In a clustered environment, the XEngine does not get installed on the second node when the node is on another computer. This is because the XEngine extraction occurs only when you run the Configuration Wizard (which is not run automatically on the second node). The workaround is to perform the XEngine extraction manually in this case. After completing the XEngine extraction, you must restart the server.

6.2.2 jca.retry.count Doubled in a Clustered Environment

In a clustered environment, each node maintains its own in-memory HashMap for inbound retry. When the jca.retry.count property is set to 3 for the inbound retry feature, each node retries three times. As a result, the total retry count becomes 6 if the clustered environment has two nodes.
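
The arithmetic above can be sketched as follows. This is an illustrative calculation in standard Python 3 only; the function name total_inbound_retries is hypothetical and is not part of any Oracle API.

```python
def total_inbound_retries(jca_retry_count, node_count):
    """Return the effective number of inbound retries across a cluster.

    Each node keeps its own in-memory retry map, so every node
    independently retries up to jca_retry_count times.
    """
    if jca_retry_count < 0 or node_count < 1:
        raise ValueError("retry count must be >= 0 and node count >= 1")
    return jca_retry_count * node_count


# Two-node cluster with jca.retry.count set to 3, as described above.
print(total_inbound_retries(3, 2))  # prints 6
```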

6.2.3 Cluster Time Zones Must Be the Same

All the machines in a cluster must be in the same time zone. WAN clusters are not supported by Oracle Fusion Middleware high availability. Even machines in the same time zone may have issues when started by command line. Oracle recommends using Node Manager to start the servers.

6.2.4 WebLogic Server Restart after Abrupt Machine Failure

If Oracle WebLogic Server does not restart after an abrupt machine failure when JMS messages and transaction logs are stored on an NFS-mounted directory, the following errors may appear in the server log files:

<MMM dd, yyyy hh:mm:ss a z> <Error> <Store> <BEA-280061> <The persistent 
store "_WLS_server_soa1" could not be deployed: 
weblogic.store.PersistentStoreException: java.io.IOException: 
[Store:280021]There was an error while opening the file store file 
"_WLS_SERVER_SOA1000000.DAT" 
weblogic.store.PersistentStoreException: java.io.IOException: 
[Store:280021]There was an error while opening the file store file 
"_WLS_SERVER_SOA1000000.DAT" 
        at weblogic.store.io.file.Heap.open(Heap.java:168) 
        at weblogic.store.io.file.FileStoreIO.open(FileStoreIO.java:88)

If an abrupt machine failure occurs, WebLogic Server restart or whole server migration may fail if the transaction logs or JMS persistence store directory is mounted using NFS. WebLogic Server maintains locks on the files used for storing JMS data and transaction logs to protect against data corruption if two instances of the same WebLogic Server are accidentally started. The NFS protocol is stateless, and the storage device does not become aware of the machine failure; therefore, the storage device does not release the locks. As a result, after an abrupt machine failure followed by a restart, any subsequent attempt by WebLogic Server to acquire locks on the previously locked files may fail. Refer to your storage vendor documentation for additional information on the locking of files stored in NFS-mounted directories on the storage device.

Use one of the following two solutions to unlock the logs and data files.

Solution 1

Manually unlock the logs and JMS data files and start the servers by creating a copy of the locked persistence store file and using the copy for subsequent operations. To create a copy of the locked persistence store file, rename the file, and then copy it back to its original name. The following sample steps assume that transaction logs are stored in the /shared/tlogs directory and JMS data is stored in the /shared/jms directory.

cd /shared/tlogs
mv _WLS_SOA_SERVER1000000.DAT _WLS_SOA_SERVER1000000.DAT.old
cp _WLS_SOA_SERVER1000000.DAT.old _WLS_SOA_SERVER1000000.DAT
cd /shared/jms
mv SOAJMSFILESTORE_AUTO_1000000.DAT SOAJMSFILESTORE_AUTO_1000000.DAT.old
cp SOAJMSFILESTORE_AUTO_1000000.DAT.old SOAJMSFILESTORE_AUTO_1000000.DAT
mv UMSJMSFILESTORE_AUTO_1000000.DAT UMSJMSFILESTORE_AUTO_1000000.DAT.old
cp UMSJMSFILESTORE_AUTO_1000000.DAT.old UMSJMSFILESTORE_AUTO_1000000.DAT

With this solution, the WebLogic file locking mechanism continues to provide protection from accidental data corruption if multiple instances of the same server are accidentally started. However, the servers must be restarted manually after abrupt machine failures. File stores create multiple consecutively numbered .DAT files when they are used to store large amounts of data. When this occurs, all of the files may need to be copied and renamed.
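
When a store has several consecutively numbered .DAT files, the rename-and-copy steps can be repeated for each file. The following is a minimal sketch in standard Python 3, not an Oracle-supplied tool; the function name recopy_store_files and the choice to stop when a .old backup already exists are assumptions made for illustration.

```python
import glob
import os
import shutil


def recopy_store_files(directory, pattern="*.DAT"):
    """Rename each matching file to <name>.old, then copy it back.

    Mirrors the mv/cp steps shown above for every .DAT file in the
    directory and returns the sorted list of file names processed.
    """
    processed = []
    for path in sorted(glob.glob(os.path.join(directory, pattern))):
        backup = path + ".old"
        if os.path.exists(backup):
            # Assumption: never silently overwrite an earlier backup.
            raise FileExistsError(backup)
        os.rename(path, backup)     # equivalent of: mv FILE FILE.old
        shutil.copy(backup, path)   # equivalent of: cp FILE.old FILE
        processed.append(os.path.basename(path))
    return processed
```

For the sample layout above, this would be called once for /shared/tlogs and once for /shared/jms, assuming the affected servers are not running at the time.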

Solution 2

Disable WebLogic file locking by disabling the native I/O wlfileio2 driver. The following sample steps move the shared object for the driver to a backup location, effectively removing it.

cd WL_HOME/server/native/platform/cpu_arch
mv libwlfileio2.so /shared/backup

With this solution, because WebLogic file locking is disabled, automated server restarts and failovers succeed. However, this may result in performance degradation. Be very cautious when using this solution. Always configure the database-based leasing option, which enforces an additional locking mechanism using database tables and prevents automated restart of more than one instance of the same WebLogic Server. Additional procedural precautions must be implemented to avoid human error and to ensure that one and only one instance of a server is manually started at any given time. Similarly, extra precautions must be taken to ensure that no two domains have a store with the same name that references the same directory.

6.2.5 Port Translation with Oracle Portal Loopback

In a high availability Portal implementation, it is often necessary to configure the Parallel Page Engine to loop back requests through a load balancer. When configuring the load balancer for Portal Loopback, ensure that it is not configured with Port Translation. For example:

The correct configuration: the load balancer listens for requests on port 7777 and passes them on to Web Cache port 7777.

The incorrect configuration: the load balancer listens for requests on port 8888 and passes them on to Web Cache port 7777.
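
The rule can be expressed as a simple check. This is a sketch in standard Python 3 for illustration only; the function name uses_port_translation is hypothetical.

```python
def uses_port_translation(listen_port, backend_port):
    """Return True when the load balancer forwards to a different port."""
    return listen_port != backend_port


# The two configurations described above.
print(uses_port_translation(7777, 7777))  # prints False (correct setup)
print(uses_port_translation(8888, 7777))  # prints True (incorrect setup)
```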

6.2.6 Fusion Middleware Control May Display Incorrect Status

In some instances, Oracle WebLogic Fusion Middleware Control may display the incorrect status of a component immediately after the component has been restarted or failed over.

6.2.7 Accumulated BPEL Instances Cause Performance Decrease

In a scaled out clustered environment, if a large number of BPEL instances are accumulated in the database, it causes the database's performance to decrease, and the following error is generated: MANY THREADS STUCK FOR 600+ SECONDS.

To avoid this error, remove old BPEL instances from the database.

6.2.8 Extra Message Enqueue when One of the Cluster Servers is Brought Down and Back Up

In a non-XA environment, MQSeries Adapters do not guarantee once-only delivery of messages from inbound adapters to the endpoint when a local transaction is used. In this scenario, if an inbound message is published to the endpoint and the SOA server is brought down before the transaction is committed, the inbound message is rolled back, and the same message is dequeued again and published to the endpoint. This creates an extra message in the outbound queue.

In an XA environment, MQ Messages are actually not lost but held by Queue Manager due to an inconsistent state. To retrieve the held messages, restart the Queue Manager.

6.2.9 Duplicate Unrecoverable Human Workflow Instance Created with Oracle RAC Failover

As soon as Oracle Human Workflow commits its transaction, control passes back to BPEL, which almost instantaneously commits its transaction. If the Oracle RAC instance goes down within this window, the message is retried on failover and can cause duplicate tasks. The duplicate task can show up in two ways: either a duplicate task appears in worklistapp, or an unrecoverable BPEL instance is created. This BPEL instance appears in BPEL Recovery. It is not possible to recover this BPEL instance as consumer, because this task has already completed.

6.2.10 Configuration Files Missing after Planned Administration Server Node Shutdown or Reboot

The following information refers to Chapter 10, "Managing the Topology," of the Oracle Fusion Middleware Enterprise Deployment Guide for Oracle SOA Suite.

When performing a planned stop of the Administration Server's node (rebooting or shutting down the Admin Server's machine), the OS NFS service may be disabled before the Administration Server itself is stopped. Depending on the configuration of services at the OS level, this can cause the detection of missing files in the Administration Server's domain directory and trigger their deletion in the domain directories on other nodes. This can result in the framework deleting some of the files under domain_dir/fmwconfig/. This behavior is typically not observed for unplanned downtimes, such as machine panic, power loss, or machine crash. To avoid this behavior, shut down the Administration Server before performing reboots or, alternatively, use the appropriate OS configuration to order the services so that the NFS service is disabled after the Administration Server's process is stopped. See your OS administration documentation for the configuration required to set the order of services.

6.2.11 Load Balancer Issue when Two Nodes Each have a Managed Server

If the cluster configuration is made up of the following:

  • Unicast messaging for cluster communication

  • Clustered Servers running on different physical machines

  • No ListenAddress specified for the servers in the cluster

Be sure to do the following:

  • Define a custom network channel for cluster-broadcast protocol on each of the clustered servers. The channel must have the same name on each server.

  • Set its ListenAddress/port to one of the IP/Port of the machine where the server is running.

  • Set the Unicast Broadcast Channel name for the cluster to be the newly defined custom channel. This channel should be outbound-enabled.

6.2.12 No High Availability Support for SOA B2B TCP/IP

High availability failover support is not available for the SOA B2B TCP/IP protocol. This primarily affects deployments using HL7 over MLLP. For inbound communication in a clustered environment, all B2B servers are active and the address exposed for inbound traffic is a load balancer virtual server. Also, in an outage scenario where an active managed server is no longer available, the persistent TCP/IP connection is lost and the client is expected to reestablish the connection.

6.2.13 404 Errors while Accessing a Composite's end point that uses OHS as Front End

The routing of requests from Oracle HTTP Server to composites' end points exposed in the WLS_SOA servers begins as soon as the WLS_SOA servers change to status "running." The soa-infra application may be unavailable regardless of whether the WLS_SOA server is running. Additionally, composite deployment and syncing across a cluster can take some time after the start of the WLS_SOA server. This may lead to Oracle HTTP Server starting to route to soa-infra context URL while the required composites are not yet available in a server that is being started (after a failover, server migration or simple restart in the node). Oracle recommends including the appropriate retry code in the clients when invoking the end points to overcome the possible 404 HTTP error codes. After the full composite syncing completes, the errors should stop.
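
The recommended client-side retry can be sketched as follows. This is a minimal illustration in standard Python 3, not Oracle-provided code; the function name invoke_with_retry, the default of five attempts, and the two-second delay are assumptions, and send_request stands for whatever call the client uses to invoke the composite end point.

```python
import time


def invoke_with_retry(send_request, max_attempts=5, delay_seconds=2.0,
                      sleep=time.sleep):
    """Call send_request() until it returns a status other than 404.

    send_request is any zero-argument callable returning an HTTP status
    code. The last status seen is returned, which is still 404 when all
    attempts are used up.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    status = None
    for attempt in range(1, max_attempts + 1):
        status = send_request()
        if status != 404:
            return status
        if attempt < max_attempts:
            sleep(delay_seconds)  # give composite syncing time to finish
    return status
```

Only 404 responses are retried here, on the assumption that other error codes indicate a different problem that retrying would not resolve.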

6.3 Documentation Errata

This section describes documentation errata. It includes the following topics:

6.3.1 Documentation Errata for the Fusion Middleware High Availability Guide

This section contains Documentation Errata for Oracle Fusion Middleware High Availability Guide.

6.3.1.1 Latest Requirements and Certification Information

Several manuals in the Oracle Fusion Middleware 11g documentation set have information on Oracle Fusion Middleware system requirements, prerequisites, specifications, and certification information.

6.3.1.2 Synchronizing System Clocks in a Cluster Required

Section 5.11.1.5, "Synchronizing System Clocks," of the Oracle Fusion Middleware High Availability Guide, states that "Oracle recommends synchronizing system clocks on each of the cluster nodes for high availability SOA deployments."

Synchronizing system clocks in a cluster is not a recommendation; it is a mandatory requirement.

6.3.1.3 Incorrect References to NIP and NAP Protocols

Oracle Access Manager components use proprietary protocols called Oracle Access Protocol (OAP) and Oracle Identity Protocol (OIP) to communicate with each other.

Oracle Access Protocol (OAP) enables communication between Access System components (for example, Policy Manager, Access Manager, and WebGate) during user authentication and authorization. This protocol was formerly known as NetPoint Access Protocol (NAP) or COREid Access Protocol.

Oracle Identity Protocol (OIP) governs communications between Identity System components (for example, Identity Server, WebPass) and a Web server. This protocol was formerly known as NetPoint Identity Protocol (NIP) or COREid Identity Protocol.

In the Oracle Fusion Middleware Enterprise Deployment Guide for Oracle Identity Management, Section 1.4.2 "Understanding the Application Tier" includes a reference to the NAP (Network Access Protocol) port, which should be a reference to the Oracle Access Protocol (OAP) port. This section also includes a reference to the NIP (Network Identity Protocol) port, which should be a reference to the Oracle Identity Protocol (OIP) port.

In the Oracle Fusion Middleware Enterprise Deployment Guide for Oracle Identity Management, Section 1.4.3 "Understanding the Web Tier" includes a reference to the Network Access Protocol (NAP), which should be a reference to the Oracle Access Protocol (OAP).

In the Oracle Fusion Middleware Enterprise Deployment Guide for Oracle Identity Management, Table 2-2 "Ports Used in the Oracle Identity Management Enterprise Deployment Topology" includes two references to the NAP protocol, which should be references to the OAP protocol. This table also includes two references to the NIP protocol, which should be references to the OIP protocol.

6.3.1.4 Procedure Missing from High Availability Guide Oracle Reports Configuration Section

In the Oracle Fusion Middleware High Availability Guide, before section "12.6.5.6.4 Restart WLS_REPORTS and WLS_REPORTS1," the following procedure for creating an Oracle Reports server cluster is missing:

By creating a Reports cluster with a database reports queue it is possible to link all of the Reports servers to the same queue. The benefit of this procedure is that when a server has spare capacity, it can take and execute the next report in the queue, thereby distributing the load. It also ensures that if a cluster member becomes unavailable, another Reports server can detect this and run any reports on which the failed server was working.

Create a Reports cluster by adding a cluster entry to the rwservlet.properties file on both APPHOST1 and APPHOST2.

Cluster APPHOST1

Edit the rwservlet.properties file located in the DOMAIN_HOME/user_projects/domains/ReportsDomain/servers/WLS_REPORTS/stage/reports/reports/configuration directory.

Add the following line:

<cluster clustername="cluster_reports" clusternodes="rep_wls_reports1_APPHOST2_reports2"/>

Note:

The value of clusternodes is the value which appears in the <server> tag in the rwservlet.properties file located on APPHOST2.

Note:

The clusternodes parameter should list all of the Reports servers in the cluster (comma separated) EXCEPT the local Reports server.

Cluster APPHOST2

Edit the rwservlet.properties file located in the DOMAIN_HOME/user_projects/domains/ReportsDomain/servers/WLS_REPORTS1/stage/reports/reports/configuration directory.

Add the following line:

<cluster clustername="cluster_reports" clusternodes="rep_wls_reports_APPHOST1_reports1"/>

Note:

The value of clusternodes is the value which appears in the <server> tag in the rwservlet.properties file located on APPHOST1.

Note:

The clusternodes parameter should list all of the Reports servers in the cluster (comma separated) EXCEPT the local Reports server.
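
The rule for the clusternodes value (every Reports server in the cluster except the local one) can be sketched as follows. This is illustrative standard Python 3 only; the function name cluster_entry is hypothetical, and the server names are the ones used in the examples above.

```python
def cluster_entry(local_server, all_servers, cluster_name="cluster_reports"):
    """Build the <cluster> line for one node's rwservlet.properties.

    clusternodes lists every Reports server in the cluster except the
    local one, comma separated.
    """
    if local_server not in all_servers:
        raise ValueError("local server must be a member of the cluster")
    others = [name for name in all_servers if name != local_server]
    return '<cluster clustername="%s" clusternodes="%s"/>' % (
        cluster_name, ",".join(others))


# Server names taken from the APPHOST1 and APPHOST2 examples above.
servers = ["rep_wls_reports_APPHOST1_reports1",
           "rep_wls_reports1_APPHOST2_reports2"]
print(cluster_entry(servers[0], servers))  # line to add on APPHOST1
print(cluster_entry(servers[1], servers))  # line to add on APPHOST2
```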

6.3.2 Documentation Errata for the Fusion Middleware Enterprise Deployment Guides

This section contains Documentation Errata for Oracle Fusion Middleware Enterprise Deployment Guides.

6.3.2.1 Quartz Requires Synchronizing System Clocks in a Cluster

In Chapter 2, "Database and Environment Preconfiguration," of the Oracle Fusion Middleware Enterprise Deployment Guide for Oracle SOA Suite, the following information is missing:

  • Quartz

    Oracle SOA Suite uses Quartz to maintain its jobs and schedules in the database. For the Quartz jobs to be run on different Oracle SOA nodes in a cluster, it is required that the system clocks on the cluster nodes be synchronized.

6.3.2.2 Changes Required for Deploying FOD in an EDG SOA Topology

The following information is missing from section 10.2, "Deploying Composites and Artifacts in SOA Enterprise Deployment Topology" of the Oracle Fusion Middleware Enterprise Deployment Guide for Oracle SOA Suite:

When deploying SOA Fusion Order Demo, the following steps are required in addition to the deployment steps provided in the FOD's README file.

  1. Change the nostage property to false in the build.xml file of the Web applications so that ear files are copied to each node. Edit the CreditCardAuthorization and OrderApprovalHumanTask build.xml files, located in the FOD_dir\CreditCardAuthorization\bin and FOD_dir\OrderApprovalHumanTask\bin directories, and change the following field:

    <target name="deploy-application">
         <wldeploy action="deploy" name="${war.name}"
           source="${deploy.ear.source}" library="false"
           nostage="true"
           user="${wls.user}" password="${wls.password}"
           verbose="false" adminurl="${wls.url}"
           remote="true" upload="true"
           targets="${server.targets}" />
       </target>
    

    To:

    <target name="deploy-application">
         <wldeploy action="deploy" name="${war.name}"
           source="${deploy.ear.source}" library="false"
           nostage="false"
           user="${wls.user}" password="${wls.password}"
           verbose="false" adminurl="${wls.url}"
           remote="true" upload="true"
           targets="${server.targets}" />
       </target>
    
  2. Change the target for the Web applications so that deployments are targeted to the SOA Cluster and not to an individual server. Edit the build.properties file for FOD, located in the FOD_Dir/bin directory, and change the following field:

    # wls target server (for shiphome set to server_soa, for ADRS use AdminServer)
    # Set to the SOA cluster name in your SOA EDG, for example:
    server.targets=SOA_Cluster
    
  3. Change the JMS seed templates so that instead of regular Destinations, Uniform Distributed Destinations are used and the JMS artifacts are targeted to the EDG JMS Modules. Edit the createJMSResources.seed file, located in the FOD_DIR\bin\templates directory, and change:

    # lookup the SOAJMSModule - it's a system resource
         jmsSOASystemResource = lookup("SOAJMSModule","JMSSystemResource")
    
         jmsResource = jmsSOASystemResource.getJMSResource()
        
         cfbean = jmsResource.lookupConnectionFactory('DemoSupplierTopicCF')
         if cfbean is None:
             print "Creating DemoSupplierTopicCF connection factory"
             demoConnectionFactory = jmsResource.createConnectionFactory('DemoSupplierTopicCF')
             demoConnectionFactory.setJNDIName('jms/DemoSupplierTopicCF')
             demoConnectionFactory.setSubDeploymentName('SOASubDeployment')
     .
         topicbean = jmsResource.lookupTopic('DemoSupplierTopic')
         if topicbean is None:
             print "Creating DemoSupplierTopic jms topic"
             demoJMSTopic = jmsResource.createTopic("DemoSupplierTopic")
             demoJMSTopic.setJNDIName('jms/DemoSupplierTopic')
             demoJMSTopic.setSubDeploymentName('SOASubDeployment')
    

    To:

    # lookup the SOAJMSModule - it's a system resource
         jmsSOASystemResource = lookup("SOAJMSModuleUDDs","JMSSystemResource")
    
         jmsResource = jmsSOASystemResource.getJMSResource()
        
         cfbean = jmsResource.lookupConnectionFactory('DemoSupplierTopicCF')
         if cfbean is None:
             print "Creating DemoSupplierTopicCF connection factory"
             demoConnectionFactory = jmsResource.createConnectionFactory('DemoSupplierTopicCF')
             demoConnectionFactory.setJNDIName('jms/DemoSupplierTopicCF')
             demoConnectionFactory.setSubDeploymentName('SOAJMSSubDM')
     .
         topicbean = jmsResource.lookupTopic('DemoSupplierTopic')
         if topicbean is None:
             print "Creating DemoSupplierTopic jms topic"
             demoJMSTopic = jmsResource.createDistributedTopic("DemoSupplierTopic")
             demoJMSTopic.setJNDIName('jms/DemoSupplierTopic')
             demoJMSTopic.setSubDeploymentName('SOAJMSSubDM')
    

6.3.2.3 Configuration Changes Propagation Information Missing from SOA EDG

The following information is missing from Chapter 10, "Managing the Topology" of the Oracle Fusion Middleware Enterprise Deployment Guide for Oracle SOA Suite.

Configuration Changes being applied to the SOA and BAM components in an EDG Topology:

If you are using Oracle SOA Suite in a clustered environment, any configuration property changes you make in Oracle Enterprise Manager on one node must be made on all nodes. Configuration properties are set in Oracle Enterprise Manager through the following options of the SOA Infrastructure menu:

  • Administration > System MBean Browser

  • SOA Administration > any property selections

  • Services and References > Properties tab

In addition, consider the following when making configuration changes to BAM Server in a BAM EDG Topology:

Since server migration is used, the BAM Server is moved to a different node's domain directory. It is necessary to pre-create the BAM Server configuration in the failover node. The BAM Server configuration files are located in the following directory:

ORACLE_BASE/admin/<domain_name>/mserver/<domain_name>/servers/<servername>/tmp/_WL_user/oracle-bam_11.1.1/*/APP-INF/classes/config/

Where '*' represents a directory name randomly generated by Oracle WebLogic Server during deployment, for example, 3682yq.

In order to create the files in preparation for possible failovers, you can force a server migration and copy the files from the source node. For example, for BAM:

  1. Configure the driver for WLS_BAM1 in BAMHOST1.

  2. Force a failover of WLS_BAM1 to BAMHOST2. Verify the directory structure for the BAM Server in the failover node:

    cd ORACLE_BASE/admin/<domain_name>/mserver/<domain_name>/servers/<servername>/tmp/_WL_user/oracle-bam_11.1.1/*/APP-INF/classes/config/
    

    Where '*' represents a directory name randomly generated by Oracle WebLogic Server during deployment, for example, 3682yq.

  3. Do a remote copy of the BAM Server configuration file from BAMHOST1 to BAMHOST2:

    BAMHOST1> scp ORACLE_BASE/admin/<domain_name>/mserver/<domain_name>/servers/<servername>/tmp/_WL_user/oracle-bam_11.1.1/*/APP-INF/classes/config/* \
      oracle@BAMHOST2:ORACLE_BASE/admin/<domain_name>/mserver/<domain_name>/servers/<servername>/tmp/_WL_user/oracle-bam_11.1.1/*/APP-INF/classes/config/
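
Because the directory represented by '*' is generated during deployment, it can help to resolve the actual configuration directory before copying. The following is a minimal sketch in standard Python 3; the function name find_bam_config_dirs is hypothetical, and server_dir stands for the ORACLE_BASE/admin/<domain_name>/mserver/<domain_name>/servers/<servername> directory on the node being checked.

```python
import glob
import os


def find_bam_config_dirs(server_dir):
    """Return the BAM config directories under one server directory.

    The '*' component matches the directory name that Oracle WebLogic
    Server generates during deployment (for example, 3682yq).
    """
    pattern = os.path.join(server_dir, "tmp", "_WL_user",
                           "oracle-bam_11.1.1", "*",
                           "APP-INF", "classes", "config")
    return sorted(path for path in glob.glob(pattern) if os.path.isdir(path))
```

An empty list indicates that the directory structure has not yet been created on that node, for example because the server has not yet been migrated there.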
    

6.3.2.4 Converting Discussions Forum from Multicast to Unicast

The procedure for converting Discussions Forum from multicast to unicast is missing from Chapter 6, "Configuring High Availability for Oracle ADF and WebCenter Applications," in the Oracle Fusion Middleware High Availability Guide.

To convert Discussions Forum from multicast to unicast:

Step 1: Enable system properties in the Oracle Coherence configuration files

To override the default Oracle Coherence settings, set the relevant system properties in the Coherence XML configuration files. For Discussions Forum, edit the tangosol-coherence-override.xml file. This file is part of the coherence.jar deployed with Discussions Forum. Make the following changes to this file:

  1. Extract the tangosol-coherence-override.xml file from coherence.jar in the WLS_Services deployment directory (jar xvf coherence.jar).

  2. Add the following lines to the file within the cluster-config element:

    <unicast-listener>
        <well-known-addresses>
          <socket-address id="1">
            <address system-property="tangosol.coherence.wka1"></address>
            <port system-property="tangosol.coherence.wka1.port">8088</port>
          </socket-address>
          <socket-address id="2">
            <address system-property="tangosol.coherence.wka2"></address>
            <port system-property="tangosol.coherence.wka2.port">8088</port>
          </socket-address>
          ....etc....
          <socket-address id="9">
            <address system-property="tangosol.coherence.wka9"></address>
            <port system-property="tangosol.coherence.wka9.port">8088</port>
          </socket-address>
        </well-known-addresses>
      </unicast-listener> 
    
  3. Repackage the files into coherence.jar (jar cvf coherence.jar *) and restart the server.

Step 2: Add the startup parameters

To add the relevant startup parameters:

  1. In the Oracle WebLogic Server Administration Console, select Servers, WLS_Services1, Configuration, and then Server Start.

  2. In the Arguments box, add the following:

    -Dtangosol.coherence.wka1=Host1  -Dtangosol.coherence.wka2=Host2
    -Dtangosol.coherence.localhost=Host1 -Dtangosol.coherence.wka1.port=8088
    -Dtangosol.coherence.wka2.port=8088 
    

    Where Host1 is the host on which WLS_Services1 is running.

  3. Repeat steps 1 and 2 for WLS_Services2, swapping Host1 for Host2 and Host2 for Host1.

  4. Restart the WLS_Services servers.
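
The argument string for each server follows a regular pattern, which can be sketched as follows. This is illustrative standard Python 3 only; the function name coherence_start_args is hypothetical, and the output is meant to be pasted into the Arguments box described in step 2.

```python
def coherence_start_args(local_host, peer_hosts, port=8088):
    """Build the Server Start arguments for one WLS_Services server.

    The local host is listed first, followed by its peers, which matches
    the host swap described in step 3.
    """
    hosts = [local_host] + list(peer_hosts)
    args = ["-Dtangosol.coherence.wka%d=%s" % (i, host)
            for i, host in enumerate(hosts, start=1)]
    args.append("-Dtangosol.coherence.localhost=%s" % local_host)
    args.extend("-Dtangosol.coherence.wka%d.port=%d" % (i, port)
                for i in range(1, len(hosts) + 1))
    return " ".join(args)


print(coherence_start_args("Host1", ["Host2"]))  # for WLS_Services1
print(coherence_start_args("Host2", ["Host1"]))  # for WLS_Services2
```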

Step 3: Validate the changes

To validate the changes:

  1. Log on to the Discussions Forum Administration panel.

  2. Select Cache Settings in the left pane.

  3. At the bottom of the screen, ensure that Clustering is set to enabled.

  4. Repeat steps 1 through 3 for all members of the cluster.

    As servers join the cluster they appear at the top of the screen.