B OSM High-Availability Guidelines and Best Practices

This appendix provides guidelines and best practices for the various technologies and components that comprise an Oracle Communications Order and Service Management (OSM) high-availability system.

High-Availability System Architecture

Figure B-1 shows a cluster configuration that includes three cluster-managed servers, an administration server, a JMS client sending JMS messages to OSM, and an HTTP load balancer distributing HTTP and HTTPS messages to OSM from the various OSM Web clients. This is an example of an OSM high-availability system deployed across multiple physical servers at the application server layer. Each physical server can host one or more managed servers. The managed servers form the WebLogic Server cluster that runs OSM. At the database server layer, Oracle Database can be configured for either an active-passive or an active-active deployment. OSM supports an active-active deployment of only two Oracle RAC instances.

Figure B-1 OSM High-Availability System Architecture

Shows an OSM high-availability system deployed across multiple physical servers at the application server layer.

About Order Processing Within an OSM WebLogic Cluster

The following sections describe how OSM processes orders within an OSM WebLogic cluster.

About Order Affinity and Ownership in an OSM WebLogic Cluster

When an OSM managed server receives a new order, OSM assigns a unique order ID to the order and associates that ID with the name of the receiving managed server instance within the cluster. Throughout the order fulfillment life cycle, OSM processes this order only on the associated managed server. This OSM principle is called order affinity. It ensures order data integrity and performance by preventing multiple managed server instances from processing the same order, and it avoids the additional communication overhead that would be required to synchronize order objects across multiple managed server instances. The server that has this exclusive control of an order owns the order. OSM routes all requests relating to an order to the owner instance.

Order ownership, however, is transferable in some cases. OSM can transfer an order to another managed server in the following scenarios:

  • If an order becomes a high-activity order, OSM can redistribute the order from the receiving managed server to another less active managed server to better balance the load between each server in the cluster (see "Distribution of High Activity Orders" for more information).

  • If an incoming order is a revision order that arrives on a managed server different from the one processing the base order, OSM transfers order ownership so that the same managed server owns both the base order and the revision order and can execute amendment processing.

  • If the incoming order has a dependency on an order owned by another server instance. For example, a follow-on order that depends on another order is routed to the server where that order was processed.

  • If a managed server is added to or removed from a cluster, OSM notifies all server instances about topology changes to the cluster and re-runs the distribution algorithms that determine which server instance owns an order. Order ownership either remains with the previous owner or moves to a different owner.

Note:

When order ownership changes as the WebLogic cluster resizes, and if Oracle Real Application Clusters (Oracle RAC) is used at the database layer, the reassignment of orders could temporarily impact the database performance.

Note:

Before redistribution of an order to a new or different server instance, that server instance notifies other server instances to complete pending operations on the orders to be redistributed and delete them from their order cache.

About Load Balancing for OSM and Order Affinity

Load balancing helps maximize server resource use and order throughput. It also enables OSM to minimize server response time and the processing delays that can occur if some servers are overloaded while others remain unused. Load balancing also enables rolling downtime of servers during non-peak times without client impact.

For OSM, two types of incoming messages are important for load balancing:

  • Load balancing for JMS over WebLogic T3 or T3S.

  • Load balancing for HTTP and HTTPS.

Inbound JMS messages to OSM can include:

  • OSM Web Service requests, such as CreateOrder requests from a CRM that initiates an OSM order.

  • JMS messages responding to an OSM automation, such as a response to an automation plug-in JMS request message sent to an external fulfillment system.

Inbound HTTP and HTTPS messages to OSM can include:

  • OSM Web Service requests transmitted over HTTP and HTTPS, such as CreateOrder requests from a CRM that initiates an OSM order.

  • OSM Web client interactions, including the Task Web client and Order Management Web client.

  • XML API requests from external systems.

Note:

Depending on your OSM cartridge implementation, OSM automations often use XML API function calls. For example, automation plug-ins in O2A cartridges use the XML API. However, OSM typically uses the XML API locally on the same server instance, because the XML API is most often used to manipulate the same order, which is owned by the local instance.

For JMS messages, OSM uses the WebLogic Server JMS distributed destinations for load balancing. See "JMS Distributed Destinations" for more information. There is no need to load balance JMS messages using an external load balancer.

For HTTP and HTTPS messages, OSM uses a software or hardware load balancer outside of the OSM WebLogic Server cluster.

Load balancing for OSM Web Service requests is important because, with order affinity, the managed server that receives an order becomes the owner of that order; orders must therefore be distributed appropriately among the managed servers within the cluster. See "About JMS Load Balancing Schema Options" for more information about JMS load balancing options and "About HTTP and HTTPS Load Balancing and Session ID Configuration" for more information about HTTP and HTTPS load balancing options.

About the Performance Differences between JMS and HTTP or HTTPS

For some OSM Web Services, such as a GetOrder request or a CreateOrder request that is a revision of a base order owned by another managed server, the managed server receiving the request is not the owner of the order. To maintain order affinity, OSM forwards such requests to the owner node.

If the Web Service request is over JMS, OSM re-directs the request to the owner node. The managed server that originally received the order no longer participates in any way in the processing of the order.

If the Web Service request is over HTTP or HTTPS, OSM forwards the request to the owner node over internal JMS messaging. However, the receiving managed server must maintain a socket connection to the HTTP client or load balancer that sent the message even though another managed server is responsible for processing it. The socket connection on the receiving server must remain open until a response is generated because HTTP messages are synchronous. This restriction adds performance overhead when running Web Service requests over HTTP or HTTPS, and the overhead increases with the size of the WebLogic Server cluster, because the probability of a message arriving at the server that owns a particular order decreases as order ownership is spread across more servers.

Given this limitation, as well as the transactional reliability of JMS message processing, Oracle recommends using the OSM Web Services over JMS for external clients. HTTP and HTTPS messages should be restricted to the OSM Order Management Web client and the OSM Task Web client, since manual client interactions are synchronous by nature.

Distribution of High Activity Orders

High-activity orders have a large number of processes, sub-processes, and tasks that must be executed concurrently. Because the workload for a high-activity order can be significantly higher than for a typical order, OSM may redistribute a high-activity order to other active server instances in proportion to the managed server weights within the cluster. This weight-based redistribution ensures that no single managed server receives an unfair share of high-activity orders because of round-robin load balancing, and that high-activity orders are properly distributed among the members of the cluster.

High-activity order processing is enabled by default. To tune or disable the high-activity order routing mode in OSM, you must configure a set of related parameters in the oms-config.xml file. See OSM System Administrator's Guide for a detailed reference of the available parameters.

General High-Availability Guidelines

None of OSM's physical components are expected to be in any De-Militarized Zone (DMZ), since neither its upstream systems nor Web client users are exposed to a public, untrusted domain such as the Internet.

To attain the fastest possible network connections, it is recommended that the physical servers for WebLogic and Oracle Database be in the same network segment. The performance of OSM is sensitive to network latency between WebLogic and the database; a ping time of less than 1 ms is recommended.

To protect OSM from component failure, faulty configuration updates, or file corruption, WebLogic Server and Oracle Database should be backed up periodically. See OSM System Administrator's Guide for backup and recovery procedures.

Application Server Guidelines

The physical application servers typically host a single Oracle WebLogic Server domain that runs OSM. It is possible to have multiple WebLogic Server domains running the same OSM application. However, managing resources across multiple domains adds complexity.

The OSM WebLogic domain is an interrelated set of WebLogic Server resources that are managed as a unit. This unit includes:

  • A WebLogic Server cluster, which consists of multiple managed servers that host the OSM application. The managed servers are hosted across multiple physical servers to provide high availability at the application layer.

  • An administration server that manages all the resources in the domain. The administration server can run on a separate physical server, or it can be installed on the same physical server that is running a managed server.

It is recommended that you configure at least two managed servers. If you use a 32-bit JVM, the theoretical maximum heap size is 4 GB, and the practical limit is much lower. Consequently, you may need more managed servers to handle your peak traffic. You can add managed servers after OSM is installed (see "Adding a New Managed Server to a Clustered Environment"). Spare managed servers are highly recommended.

The domain may have more than one WebLogic Server cluster. For example, if you run separate instances of the OSM server for central order management and service order management, they could be on different clusters in the same domain. OSM may also share the same domain with other Oracle Communications products such as Oracle Communications ASAP, Oracle Communications IP Service Activator, or Oracle Communications Unified Inventory Management (UIM), each running on separate clusters.

To maximize availability in the OSM WebLogic domain, it is strongly recommended that you run Node Manager on each machine that hosts a WebLogic Server instance. Node Manager is a WebLogic Server utility that starts, shuts down, and restarts the administration server and managed server instances from a remote location. See "Node Manager Considerations" for more information.

Figure B-2 illustrates the relationship between Node Manager and the server instances it controls.

Figure B-2 Node Manager Relationships With Server Instances

Illustrates the relationship between Node Manager and the server instances it controls.

In the WebLogic Server environment, the relationship of an administration server to Node Manager varies depending on the scenario. For example:

  • The administration server can be under Node Manager control. You can start it, monitor it, and restart it using Node Manager.

  • The administration server can be a Node Manager client. When you start or stop managed servers from the Administration Console, you are accessing Node Manager using the administration server.

  • The administration server supports the process of starting up a managed server with Node Manager. When you start a managed server with Node Manager, the managed server contacts the administration server to obtain outstanding configuration updates.

The capacity of OSM deployed on a WebLogic Server cluster can be resized dynamically to meet demand. Server instances can be added to a cluster without interruption of service. However, removing a server instance could cause in-progress orders to become stuck, because JMS messages sent to the removed instance are not rerouted automatically to another WebLogic Server instance. WebLogic capabilities such as JMS distributed destinations and the persistent store ensure high availability in OSM and help mitigate the interruption of service due to the failure or removal of a server instance. See "JMS Distributed Destinations" and "Persistent Store: JMS File Store and JDBC Store" for high-availability considerations.

Administration Server Considerations

The administration server operates as the central control entity for the configuration of your OSM WebLogic domain.

The failure of the administration server does not affect the operation of managed servers in the domain. Furthermore, the load balancing and failover capabilities supported by the domain configuration remain available. However, administration server failure prevents you from changing the domain's configuration and results in the loss of in-progress management and deployment operations and of ongoing logging functionality.

Oracle recommends the following best practices when configuring the administration server in your OSM WebLogic domain:

  • The administration server should not participate in a cluster. Ensure that the administration server's IP address is not included in the cluster-wide DNS name.

  • Start the administration server using Node Manager, so that it restarts automatically in the event of failure. (If the administration server for a domain becomes unavailable, the managed servers in the domain will periodically attempt to reconnect to the administration server.)

For additional clustering best practices, see Oracle Fusion Middleware Using Clusters for Oracle WebLogic Server.

You may also consider transforming the administration server in an existing domain for cold cluster failover.

In this active-passive topology, the administration server is installed on Node1 and then transformed for cold cluster failover; in the event of failure, it fails over to Node2. The administration server domain_home resides on a shared disk that both Node1 and Node2 can mount, but that is mounted by only one of the two nodes at any given time. The listen address of the administration server is a virtual IP address.

See the chapter on active-passive topologies in Oracle Fusion Middleware High Availability Guide for more information.

Node Manager Considerations

Node Manager is a Java utility that runs as a separate process from the WebLogic Server and allows you to perform common operations for a managed server, regardless of its location with respect to its administration server. The Node Manager process is not associated with a specific WebLogic domain but with a machine. Thus each physical server has its own Node Manager, which can control all server instances that reside on the same machine as the Node Manager process.

Consider the following guidelines when using Node Manager:

  • Run Node Manager as an operating system service on UNIX platforms, allowing it to restart automatically when the system is restarted.

  • Set the AutoRestart attribute of the administration server and each managed server to true to allow Node Manager to automatically restart it in the event of failure, depending on the exit code (if the exit code is less than 0, the server is not restarted and you must diagnose the problem).

  • Do not disable the Managed Server Independence (MSI) mode for a managed server (enabled by default). MSI allows Node Manager to automatically restart a managed server after failure even when the administration server is unavailable.

  • To ensure that Node Manager properly restarts servers after a system crash (for example, an OS crash), you must do the following:

    • Ensure that CrashRecoveryEnabled is set to true. This property is disabled by default.

    • Start the administration server using Node Manager. You cannot use Node Manager to start a server instance in MSI mode, only to restart it. For a routine startup, Node Manager requires access to the administration server.

    • Start all managed servers using the administration server. You can accomplish this using the WebLogic Server Scripting Tool command line or scripts, or the Administration Console. A widespread practice is to start managed servers using a shell script.

See Node Manager Administrator's Guide for Oracle WebLogic Server for more information.

WebLogic Server Cluster Considerations

Oracle recommends the following best practices when configuring managed server instances in your clustered OSM domain.

  • Configure Node Manager to automatically restart all managed servers in the domain.

  • Configure all managed server instances to use MSI, which is the default. This feature allows the managed servers to restart even if the administration server is unreachable due to a network, hardware, or software failure. See Oracle Fusion Middleware Managing Server Startup and Shutdown for Oracle WebLogic Server for more information.

About the WebLogic Messaging Mode and OSM Cluster Size

The WebLogic Server cluster messaging mode enables cluster members to remain synchronized and provides the foundation for other WebLogic Server functions such as load balancing, scalability, and high availability.

In an OSM cluster, the messaging mode can be multicast or unicast; however, Oracle recommends using multicast in most OSM installations because multicast is generally more reliable.

In some cases, unicast may be the only option. For example, multicast can only work over a single subnet. If a single subnet is not possible due to technological or IT policy reasons, or if the network's multicast transmission is not reliable, then unicast messaging mode becomes the best option. If you must use unicast, ensure that you apply the WebLogic Server patches that resolve the currently known unicast issues. See "Software Requirements" for patch information.

Do not use unicast for a cluster with more than 20 managed servers. Enabling a reliable multicast network for WebLogic Server multicast messaging mode is the only option for such large cluster sizes. The broadcast nature of multicast over UDP packets works better in such large clusters than one-to-one TCP unicast connections between each pair of managed servers.

You can use unicast for cluster sizes between 10 and 20 managed servers; however, you should consider multicast if you begin to experience poor performance or reliability issues.

About Coherence and Unicast

Oracle recommends unicast mode for Coherence. The OSM cluster performance and robustness are sensitive to the synchronization of cached data maintained by Coherence. The inherently unreliable packet delivery with UDP in multicast transmission may destabilize cache synchronization, and errors can be very difficult to troubleshoot. As a result, Oracle does not recommend using Coherence in multicast mode.

WebLogic Server Migration

OSM's automation framework leverages WebLogic Java Message Service (JMS) to support messaging between automation components and external systems. In addition, upstream systems can access OSM Web services with JMS as one of the transport protocols. Thus, it is critical that JMS be highly available. JMS high availability is achieved by using JMS distributed destinations (see "JMS Distributed Destinations") as well as planning for migration in the event of failure.

Whole Server Migration

WebLogic Server migration is the process of moving a clustered WebLogic Server instance, or a service running on a clustered instance, elsewhere in the event of failure. In the case of whole server migration, the server instance is migrated to a different physical machine upon failure. In the case of service-level migration, the services are moved to a different server instance within the cluster. See "JMS Service Migration" for a discussion of service-level migration. Whole server migration is the preferred and recommended approach because all JMS-related services are migrated together.

WebLogic Server provides migratable servers to make JMS and the JTA transaction system highly available. Migratable servers—clustered server instances that migrate to target servers—provide for both automatic and manual migration at the server level, rather than at the service level.

When a migratable server becomes unavailable for any reason (for example, if it hangs, loses network connectivity, or its host machine fails), migration is automatic. Upon failure, a migratable server is automatically restarted on the same machine if possible. If the migratable server cannot be restarted on the machine where it failed, it is migrated to another machine. In addition, an administrator can manually initiate migration of a server instance.

The target server for the migration can be a spare server on which Node Manager is running. This server does not participate in the cluster until a migratable server is migrated to it.

Another option for the target server is a server that is already hosting a WebLogic Server instance. In the event of failure, the migratable server is migrated to it, resulting in two instances (now running on the same machine) competing for CPU, memory, and disk resources. In this case, performance could be impacted.

Before you configure automatic whole server migration, be aware of the following requirements:

  • Ensure that all servers hosting migratable servers are time-synchronized. Although migration works when servers are not time-synchronized, time-synchronized servers are recommended in a clustered environment.

  • To ensure file availability, use a disk that is accessible from all machines. If you cannot share disks between servers, you must ensure that the contents of domain_dir/bin are copied to each machine.

  • Use high-availability storage for state data. For highest reliability, use a shared storage solution that is itself highly available, for example, a storage area network (SAN). For more information, see Oracle Fusion Middleware Using Clusters for Oracle WebLogic Server.

  • For capacity planning in a production environment, keep in mind that server startup during migration taxes CPU utilization. You cannot assume that because a machine can handle a certain number of servers running concurrently that it also can handle that same number of servers starting up on the same machine at the same time.

For additional requirements, see Oracle Fusion Middleware Using Clusters for Oracle WebLogic Server.

JMS Service Migration

If whole server migration is not possible, JMS service migration is an alternative.

JMS services are singleton services, and, therefore, are not active on all server instances in a cluster. Instead, they are pinned to a single server in the cluster to preserve data consistency. To ensure that singleton JMS services do not introduce a single point of failure for dependent applications in the cluster, WebLogic Server can be configured to automatically or manually migrate them to any server instance in the migratable target list.

Migratable JMS-related services cannot use the default persistent store, so you must configure a custom store and target it to the same migratable target as the JMS server. The custom store must either be accessible from all candidate server members in the migratable target or migrated to a backup server target by pre-migration/post-migration scripts. JMS service and JTA must be migrated together.

For more information, see the chapter on service migration in Oracle Fusion Middleware Using Clusters for Oracle WebLogic Server.

JMS Distributed Destinations

JMS destinations, which may be JMS queues or JMS topics, serve as repositories for messages. A JMS destination provides a specific end point for messages, which a JMS client uses to specify the target of messages that it produces and the source of messages that it consumes. For example, OSM automation plug-ins can specify the JNDI names of the JMS queue and the JMS reply-to queue to produce and consume messages with external systems.

A distributed destination is a single set of destinations that are accessible as a single, logical destination to a client (for example, a distributed topic has its own JNDI name). The members of the set are typically distributed across multiple servers within a cluster, with each member belonging to a separate JMS server. When deployed to a cluster, OSM uses distributed destinations because JMS provides load balancing and failover for the members of a distributed destination in a cluster. For performance reasons, server affinity is enabled on the connection factory to give preference to local destination members.
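The following sketch illustrates how an external fulfillment system might consume an automation plug-in request from such a distributed queue and respond on the reply-to destination. It is an illustration only: the connection factory and queue JNDI names, host names, payload, and correlation scheme are hypothetical placeholders, not actual OSM defaults; use the names and correlation rules defined by your cartridge and environment.

import java.util.Hashtable;

import javax.jms.Connection;
import javax.jms.ConnectionFactory;
import javax.jms.Destination;
import javax.jms.MessageConsumer;
import javax.jms.Session;
import javax.jms.TextMessage;
import javax.naming.Context;
import javax.naming.InitialContext;

// Minimal sketch of an external system responding to an OSM automation
// plug-in request. JNDI names, the URL, and the payload are placeholders.
public class FulfillmentResponder {
    public static void main(String[] args) throws Exception {
        Hashtable<String, String> env = new Hashtable<String, String>();
        env.put(Context.INITIAL_CONTEXT_FACTORY, "weblogic.jndi.WLInitialContextFactory");
        // List every managed server so the lookup succeeds even if one is down.
        env.put(Context.PROVIDER_URL, "t3://osm-ms1:8001,osm-ms2:8001,osm-ms3:8001");
        Context ctx = new InitialContext(env);

        ConnectionFactory factory = (ConnectionFactory) ctx.lookup("osm/ExternalConnectionFactory"); // placeholder
        Destination requestQueue = (Destination) ctx.lookup("osm/ActivationRequestQueue");           // placeholder

        Connection connection = factory.createConnection();
        try {
            Session session = connection.createSession(false, Session.AUTO_ACKNOWLEDGE);
            MessageConsumer consumer = session.createConsumer(requestQueue);
            connection.start();

            TextMessage request = (TextMessage) consumer.receive(30000);
            if (request != null) {
                // Reply to the destination set by the automation plug-in and correlate
                // the response with the request (JMSCorrelationID is shown here; the
                // actual correlation scheme depends on your cartridge).
                TextMessage reply = session.createTextMessage("<activationResponse/>");
                reply.setJMSCorrelationID(request.getJMSMessageID());
                session.createProducer(request.getJMSReplyTo()).send(reply);
            }
        } finally {
            connection.close();
        }
    }
}

A client such as this requires a WebLogic client library (for example, wlthint3client.jar or weblogic.jar) on its classpath to resolve the T3 initial context factory.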

Note:

OSM does not support uniform distributed destinations (UDDs), which are the default type of distributed destination in WebLogic. OSM supports only weighted distributed destinations (WDDs). When configuring distributed destinations in the WebLogic Server Administration Console, select Weighted for the Destination Type to configure the distributed destination as a WDD.

Note:

Messages sent to JMS distributed destinations will always be delivered to member queues. However, messages delivered to a member queue can get stuck in the event of a server failure. In that case, messages cannot be consumed until either the WebLogic server is restarted or the JMS server is migrated.

About WebLogic JMS T3 and T3S Load Balancing

WebLogic T3, and its secure variant T3S, are the transport protocols that WebLogic uses for communication between client applications and a WebLogic Server for application-level services such as Java Transaction API (JTA) transaction coordination, remote access to Enterprise JavaBeans (EJBs), and JMS message delivery. OSM typically communicates messages using T3 or T3S to distributed JMS destinations for:

  • OSM Web Service XML/SOAP messages transmitted over JMS, such as an OSM CreateOrder Web Service request from a CRM that initiates an OSM order.

  • OSM automations that receive messages from external fulfillment systems.

  • OSM internal event handling. For example, the oms_events_queue can be used for triggering data change notifications on an order.

The T3 protocol is fully implemented by the WebLogic Server and its client libraries. T3 client libraries support clustered server URLs for the initial context lookup enabling native WebLogic support for load balancing. The WebLogic cluster then manages load distribution based on system availability and the selected load balancing schema. See the WebLogic documentation for more information about creating a WebLogic client for communicating JMS messages to OSM over T3.
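The following sketch shows how such a client might send an OSM Web Service request over JMS using a clustered provider URL, which lets WebLogic load balance the initial context lookup and the resulting connections across the cluster. The host names, JNDI names, and payload are assumptions for illustration; use the connection factory and request queue names documented for your OSM installation.

import java.util.Hashtable;

import javax.jms.Connection;
import javax.jms.ConnectionFactory;
import javax.jms.Destination;
import javax.jms.MessageProducer;
import javax.jms.Session;
import javax.jms.TextMessage;
import javax.naming.Context;
import javax.naming.InitialContext;

// Minimal sketch of an upstream client sending an OSM Web Service request
// over JMS (T3). Host names, JNDI names, and the payload are placeholders.
public class CreateOrderJmsClient {
    public static void main(String[] args) throws Exception {
        Hashtable<String, String> env = new Hashtable<String, String>();
        env.put(Context.INITIAL_CONTEXT_FACTORY, "weblogic.jndi.WLInitialContextFactory");
        // Clustered URL: WebLogic load balances across the listed managed servers.
        env.put(Context.PROVIDER_URL, "t3://osm-ms1:8001,osm-ms2:8001,osm-ms3:8001");
        Context ctx = new InitialContext(env);

        ConnectionFactory factory = (ConnectionFactory) ctx.lookup("osm/WsConnectionFactory"); // placeholder
        Destination requestQueue = (Destination) ctx.lookup("osm/WsRequestQueue");             // placeholder
        Destination responseQueue = (Destination) ctx.lookup("osm/WsResponseQueue");           // placeholder

        Connection connection = factory.createConnection();
        try {
            Session session = connection.createSession(false, Session.AUTO_ACKNOWLEDGE);
            MessageProducer producer = session.createProducer(requestQueue);

            // The payload is an OSM Web Service SOAP envelope (for example, CreateOrder);
            // see the OSM developer documentation for the required content and the JMS
            // message properties that identify the target operation.
            TextMessage request = session.createTextMessage("<soapenv:Envelope>...</soapenv:Envelope>");
            request.setJMSReplyTo(responseQueue);
            producer.send(request);
        } finally {
            connection.close();
        }
    }
}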

Figure B-3 shows a JMS client whose JNDI properties file is configured with the URLs of each managed server in the cluster to which it sends messages. WebLogic load balances these messages across the URLs based on a load balancing schema (see "About JMS Load Balancing Schema Options" for more information). OSM typically processes an order on the managed server that receives it, but in some cases OSM uses the managed server queues internally to redistribute orders to other managed servers after an order has been received (see "About Order Affinity and Ownership in an OSM WebLogic Cluster" for more information).

Figure B-3 JMS Load Balancing


Note:

Because JMS messages are transmitted in the context of JTA transactions, you must make sure that the WebLogic client always has bi-directional network and firewall access to every OSM WebLogic managed server. If you send a message to a distributed destination, for example, the JTA coordinator used by the WebLogic client must be able to communicate with every managed server. Having only partial access to the managed servers in the OSM cluster can lead to inconsistent message states.

About JMS Load Balancing Schema Options

At the WebLogic cluster level, you can select among three load balancing schemas for JMS load balancing. WebLogic supports only one load balancing schema per cluster, even if multiple applications are hosted on the same cluster. The load balancing schema you select affects OSM on both its external and internal messaging interfaces, for example, incoming messages from an external system and messages exchanged between managed servers within the cluster.

The following lists the OSM WebLogic load balancing options and the situations where they are recommended for OSM clusters:

  • Round-robin: OSM distributes JMS messages evenly across the managed servers in the cluster by cycling through the list of available managed servers. Each server is treated equally. Oracle recommends the round-robin schema for homogeneous cluster layouts where each managed server has access to the same level of hardware resource capacity (for example, where all managed servers are hosted on machines with an identical number of cores).

  • Random-based: Before routing a JMS message, the random-based load balancing schema generates a random number and selects one of the candidate servers as the message destination based on the random number. This schema is useful in scenarios where only one client application sends messages that follow a repeating pattern. It is possible for such a pattern to continually reset a round-robin schema so that only the first managed server within the cluster ever gets any of the messages. This scenario is unlikely with OSM. Also, because the random-based schema incurs additional performance overhead, Oracle only recommends this schema if it is not possible to change the client application to support either round-robin or weight-based schemas.

  • Weight-based: If the OSM cluster consists of managed servers hosted on systems with varying hardware resource capacity, you can assign load balancing weights to each WebLogic instance. For example, if the cluster has one managed server hosted on a system with only 12 cores, and all other managed servers hosted on systems with 36 cores, you could assign a smaller weight to the managed server on the 12-core machine so that it receives fewer JMS messages. Oracle recommends the weight-based schema for non-homogeneous cluster layouts where the managed servers have access to different levels of hardware resource capacity.

About Hardware or Software HTTP and HTTPS Load Balancing Options

Oracle recommends that you use Oracle HTTP Server to load balance HTTP and HTTPS messages (for example, for the OSM Web Clients or for OSM messages over HTTP) for OSM clusters in production environments that require high availability for HTTP messages. The Oracle HTTP Server is a software load-balancing application that is part of the Oracle Fusion Middleware Web Tier Tools. You should install Oracle HTTP Server on a standalone machine or machines that are separate from the hosts of the OSM WebLogic cluster.

Note:

These recommendations only apply to HTTP and HTTPS messages. JMS messages should be load balanced using the WebLogic Server native support for JMS T3 and T3S load balancing (see "About WebLogic JMS T3 and T3S Load Balancing").

For detailed instructions about downloading, installing, and configuring Oracle HTTP Server for OSM, see the knowledge article Setting up an Oracle HTTP Server for OSM Cluster HTTP Load Balancing (Doc ID 1618630.1) on the Oracle support Web site at:

https://support.oracle.com

For development systems or if you do not require high-availability for HTTP traffic in a production system, you can create a managed server and use it as an HTTP proxy that is managed by the WebLogic Administration server, but remains outside of the cluster.

You can also consider a hardware load balancing solution for load balancing HTTP and HTTPS messages. A hardware load balancer can use any algorithm supported by the hardware, including advanced load-based balancing strategies that monitor the utilization of individual machines. If you choose to use hardware to load balance HTTP and HTTPS sessions, it must support a compatible passive or active cookie persistence mechanism, and SSL persistence.

The following lists other possible software load balancer options for OSM HTTP and HTTPS messages:

  • Oracle WebLogic proxy plug-ins for the following standard Web server solutions:

    • Oracle iPlanet Web Server (7.0.9 and later)

    • Apache HTTPD 2.2.x

    • Microsoft Internet Information Services 6.0 and 7.0

  • Dedicated software load balancing solutions like Oracle Traffic Director. Oracle recommends this option for running OSM with Oracle Exalogic and Oracle SuperCluster.

The following lists possible hardware load balancer options for OSM HTTP and HTTPS messages:

  • F5 Big-IP

  • Cisco ACE

About HTTP and HTTPS Load Balancing and Session ID Configuration

Round-robin load balancing for HTTP and HTTPS messages is the only supported option for software load balancers since WebLogic does not propagate managed server weights.

Note:

The WebLogic cluster load balancing schema options (see "About JMS Load Balancing Schema Options") have no effect for load balancing HTTP messages because an HTTP load balancer is outside of the cluster.

When running OSM in a cluster, you must enable the proxy plug-in option at the cluster level, as opposed to the managed server level; otherwise, session drops may occur and login to the OSM Web UI may not be possible.

You must ensure that the chosen HTTP load balancing solution supports WebLogic session IDs and custom HTTP headers. All WebLogic plug-ins, including the Oracle HTTP Server plug-in, support sticky sessions but do not support session ID failover if the server to which the session ID is bound fails.

Managing WebLogic Transactions

Transactions are a means to guarantee that database changes are completed accurately. The WebLogic Server transaction manager is designed to recover from system crashes with minimal user intervention, and makes every effort to resolve transaction branches with a commit or roll back, even after multiple crashes or crashes during recovery.

To facilitate recovery after a crash, the WebLogic Server Transaction Recovery Service automatically attempts to recover transactions on system startup. On startup, the Transaction Recovery Service parses all transaction log records for incomplete transactions and completes them as described in Oracle Fusion Middleware Programming JTA for Oracle WebLogic Server.

Oracle recommends the following guidelines:

  • If a server crashes and you do not expect to be able to restart it within a reasonable period of time, you can migrate either the whole server or the Transaction Recovery Service to another server in the same cluster. The transaction log records are stored in the default persistent store for the server. If the default persistent store is a file store (the default), it must reside in a shared storage system that is accessible to any potential machine to which a failed migratable server might be migrated. See "Persistent Store: JMS File Store and JDBC Store" for high availability considerations.

  • Server instances should be configured using DNS names rather than IP addresses. A server instance is identified by its URL (IP address or DNS name plus the listening port number). Changing the URL by moving the server to a new machine or changing its listening port on the same machine effectively moves the server, so the server identity may no longer match the information stored in the transaction logs. Consequently, any pending transactions stored in the transaction log files will be unrecoverable. This is also critical if firewalls are used to avoid address translation issues.

  • If automatic migration of the Transaction Recovery Service is not configured, you should first attempt to restart a crashed server and allow the Transaction Recovery Service to handle incomplete transactions (rather than move it to a new machine). However, if the server exited with a code less than 0, you should not attempt to restart it unless you diagnose the problem. In this case, the server did not terminate in a stable condition, for example, due to invalid configuration.

    See Oracle Fusion Middleware Programming JTA for Oracle WebLogic Server for more information.

Persistent Store: JMS File Store and JDBC Store

WebLogic's persistent store provides a built-in, high-performance storage solution for WebLogic Server subsystems and services that require persistence. For example, it can store persistent JMS messages or temporarily store messages sent using the Store-and-Forward feature. The persistent store supports persistence to a file-based store or to a JDBC-enabled database. The persistent store is important for OSM because it stores all of the JMS messages from the JMS service.

Considerations When Choosing Between File Store and JDBC Store

There is a trade-off in performance and ease of backup when choosing between JMS file store and JDBC store:

  • JMS file store provides better performance than JDBC store. However, you cannot perform an online backup of OSM data consistent with the JMS file store. You must first shut down OSM, and then back up the JMS file store and the database at the same time. Otherwise, inconsistent message states and database states may result, and the backup cannot be used to restore OSM.

  • The benefit of JDBC store is that online database backups can obtain consistent snapshots of both OSM data and JMS messages. Currently, however, WebLogic does not offer a mechanism for consistent backup of persistent stores and transaction logs. This is because the transaction logs can only be file-based.

  • In an Oracle Communications environment where ASAP, IP Service Activator, or UIM is running in the same OSM WebLogic domain, the JDBC store may yield a more consistent backup strategy across the domain and may outweigh performance considerations. However, you cannot take a consistent backup of OSM because the data is distributed across the database and file system.

To realize high availability for the file store, it should reside on shared disk storage that is itself highly available, for example, a storage area network (SAN).

If you choose JMS file store, it is recommended to configure one custom file store for each managed server.

Oracle Coherence

Oracle Coherence plays a major role in providing grid services to OSM. OSM employs the Coherence Invocation service as well as local and distributed Coherence caches. The Coherence Invocation service is a feature of the Coherence Application Edition or Grid Edition.

Oracle Coherence must be configured to avoid conflicts in a clustered OSM environment. See "Configuring Oracle Coherence for an OSM Cluster" for guidelines and best practices.

Database Server Guidelines

At the database-server layer, OSM supports two high-availability options:

  • Cold cluster failover, also known as cold standby.

  • Oracle Real Application Clusters (RAC), in either an active-passive or active-active topology.

You can configure Oracle Database for cold standby or Oracle RAC, depending on your business goals and availability requirements.

Cold Cluster Failover

Cold cluster failover consists of Oracle Clusterware and two Oracle single-instance databases running on separate physical servers sharing disk storage. Oracle Clusterware monitors the primary active database instance and provides the capability of failover to a cold standby database in case of failure, thus ensuring high availability. Clients on the network experience a period of lockout while the failover occurs and are then served by the other database instance after the instance has started.

See "Configuring Oracle Database with Clusterware" for configuration details.

Oracle RAC

OSM supports Oracle RAC in either an active-passive or active-active configuration. Oracle recommends using an active-active configuration for maximum availability.

In active-passive Oracle RAC, an active instance handles requests and a passive instance is on standby. When the active instance fails, an agent shuts down the active instance completely and brings up the passive instance, and application services resume processing. As a result, the active and passive roles are switched.

In active-active Oracle RAC, all active instances simultaneously process database operations. In addition to load balancing, active-active Oracle RAC can also provide high availability if the physical database server of one Oracle RAC database instance is dimensioned to handle all of the database load upon failure of the other Oracle RAC database instance.

OSM supports Oracle RAC through the use of WebLogic multi data sources. To optimize performance, OSM uses order affinity as described in the following section.

Note:

OSM supports an Oracle RAC configuration of two nodes only.

Order Affinity for OSM

WebLogic multi data sources support XA affinity for global transactions, which ensures that all the database operations for a global transaction performed on an Oracle RAC cluster are directed to the same Oracle RAC instance. However, XA affinity cannot span different global transactions on the same data, which is a key performance requirement for OSM. The objective is to minimize the adverse impact on performance caused by database buffer cache transfers between Oracle RAC instances. Therefore, OSM supports order affinity, which means that database operations for different global transactions on the same order are normally directed to the same Oracle RAC instance. Overall, Oracle RAC instances process database operations simultaneously, but each instance operates on a subset of orders mutually exclusive to each other.

OSM order affinity works in the following way:

  • Each OSM server interacts with Oracle RAC through a WebLogic Server multi data source configured for failover with two data sources (one for each Oracle RAC instance). This setup is used for both active-passive and active-active topologies. Under normal conditions each OSM server always interacts with a single Oracle RAC instance. In an active-active topology, load balancing is achieved by reversing the order of the data sources in the multi data source for half of the OSM servers.

  • If the Oracle RAC database is configured with server-side load balancing (the Oracle RAC instances register with a remote listener process), server-side load balancing must be overridden as discussed in "Remote Listener Considerations".

  • Under normal conditions, the ownership and processing of each order is pinned to a single OSM server in a cluster. Because each OSM server interacts with a single Oracle RAC instance through the primary data source of its multi data source, all database operations for each order are directed to the same Oracle RAC instance. If ownership of an order is transferred to another OSM server (for example, when the cluster resizes or the order becomes a high-activity order), the processing of that order will be pinned again to the new OSM server.

Connecting Oracle RAC with JDBC Multi Data Source

During installation, the OSM installer prompts for the database parameters of the Oracle RAC instances and automatically creates the appropriate configuration in a JDBC multi data source. If you decide to manually configure your JDBC data source, you must understand the following discussion.

At the application layer, OSM maintains WebLogic Server order affinity. That is, all processing of an order is performed exclusively by one WebLogic Server instance, to minimize serialization. The OSM application cannot by itself ensure that all database operations for a particular order maintained by a WebLogic Server instance are directed to the same Oracle RAC instance. Thus, the approach used to preserve database server order affinity is to have each WebLogic Server instance connect to only one Oracle RAC instance at any instant.

OSM uses a multi data source consisting of two data sources, each of which connects to an Oracle RAC database instance. Using the failover algorithm, the first data source is the primary data source and the other data source is the secondary data source. Under normal operation, only the primary data source is connected and used. When the primary data source fails, the multi data source chooses the next available data source as the primary data source.

The failover algorithm is used in both active-passive and active-active topologies. However, the configuration of the data source members within the multi data source is different:

  • In active-passive Oracle RAC, all instances in the WebLogic Server cluster are configured to the PREFERRED Oracle RAC database instance as the primary data source, and to the AVAILABLE Oracle RAC database instance as the secondary. Upon database failure, all WebLogic Server instances transition from the PREFERRED Oracle RAC database instance to the AVAILABLE Oracle RAC database instance.

    Figure B-4 illustrates this configuration.

    Figure B-4 Data Source Configuration for Oracle RAC Active-Passive

    Illustrates the Data Source Configuration for Oracle RAC Active-Passive.
  • In active-active Oracle RAC, WebLogic Server instances are partitioned. Half of the WebLogic Server instances are configured with one Oracle RAC database instance as the first data source and the other Oracle RAC database instance as the second data source. The other half of the WebLogic Server cluster is configured with the sequence of the Oracle RAC database instances swapped. As a result, if one of the Oracle RAC database instances fails, half of the WebLogic Server instances fail over to the remaining Oracle RAC database instance, which is already handling the database operations of the other half of the WebLogic Server cluster.

    Figure B-5 illustrates this configuration.

    Figure B-5 Data Source Configuration for Oracle RAC Active-Active

    Illustrates the Data Source Configuration for Oracle RAC Active-Active.

    In the active-active Oracle RAC configuration, the use of the failover algorithm (as opposed to a load-balancing algorithm) may appear counter-intuitive. However, keep in mind that load balancing in an active-active Oracle RAC configuration is not managed at the multi data source layer, but rather by partitioning the instances in the WebLogic Server cluster. There is no dynamic load balancing between WebLogic instances and database instances.

    The relationship of a WebLogic instance and database instance in an active-active Oracle RAC configuration is many-to-one. That is, more than one WebLogic instance may choose the same database instance as its primary database instance, but a WebLogic instance cannot choose more than one database instance as its primary database instance. When you add a new database instance, you can either reassign an existing WebLogic instance to it, or create a new WebLogic instance.

    WebLogic Server instances must be partitioned appropriately to load-balance with active-active Oracle RAC. The recommended approach is to have an even number of physical application servers with the same hardware dimensioning and weight. You should monitor the performance of your WebLogic and database instances to ensure that they are not overloaded or underutilized. You can add more WebLogic instances to a database instance that is not fully utilized, or reassign a WebLogic instance from a database instance that is overloaded.

    When a WebLogic Server cluster resizes, the ownership of an order is reassigned. The cache transfer of records from one database instance to another has a temporary impact on performance.

Database Failover with Oracle RAC

In an Oracle RAC configuration, all data files, control files, and parameter files are shared for use by all Oracle RAC instances. When a database instance fails, performance may be temporarily affected, but the database service continues to be available.

The use of WebLogic multi data sources and JMS minimizes the impact of a database failure in the following ways:

  • The JDBC multi data source fails over to the secondary data source.

  • In-flight WebLogic transactions are driven to completion or rolled back, based on the state of the transaction at the time of the failure. Because JMS messages are redelivered, most failed transactions are automatically retried (upon redelivery) on the secondary data source. This does not apply to failed Web services and HTTP requests (for example, failed createOrder requests must be resubmitted).

Listener Considerations for Oracle RAC

In an Oracle RAC environment, the database listener process establishes the connections between the JDBC data source of a WebLogic Server instance and an Oracle RAC instance.

To enable the listener functionality, you have two options:

  • Use local listeners. With this option, each Oracle RAC instance is configured to register only with its listener in the same physical server.

  • Use remote listeners. With this option, each Oracle RAC instance is configured to register with a remote listener that may or may not be in the same physical server. There is no "remote listener only" scenario—local listeners must be running for the remote listener to work properly. When a request comes in, the remote listener redirects it to the local listener depending on what instance it is connecting to.

In an Oracle RAC environment, remote listeners are typically used. In 11gR2, when you create an Oracle RAC database with the Database Configuration Assistant (DBCA), the remote listener is a Single Client Access Name (SCAN) listener. The SCAN address resolves to a number of addresses (for example, using round-robin DNS).

When configuring JDBC data sources, you must be aware of your listener process setup. The OSM installer will automatically configure your JDBC data sources based on the listener process considerations discussed in the following sections. These considerations apply to both active-active and active-passive topologies.

Remote Listener Considerations

By default, server-side load balancing is configured when using remote listeners. That is, the remote listener decides how to forward connection requests based on the load of the Oracle RAC instances. However, OSM active-passive (failover) configurations require that server-side load balancing be overridden. To achieve this, the INSTANCE_NAME parameter (the SID of a specific instance) must be included in the JDBC URL of each member data source, in addition to identifying the database service by name. For example:

jdbc:oracle:thin:@(DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=host1)(PORT=1521))(CONNECT_DATA=(SERVICE_NAME=OSM)(INSTANCE_NAME=SID1)))
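The member data source for the other Oracle RAC instance follows the same pattern with that instance's host and SID; the host and instance names shown here are illustrative placeholders for your environment:

jdbc:oracle:thin:@(DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=host2)(PORT=1521))(CONNECT_DATA=(SERVICE_NAME=OSM)(INSTANCE_NAME=SID2)))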

The OSM installer will automatically set up the URL of each JDBC data source in the WebLogic Server instances' multi data source. However, if you choose to manually configure additional Oracle RAC instances, you must populate the SID in the JDBC URL of the member data sources in WebLogic Server. See "Configuring an Additional Data Source for an Oracle RAC Instance" for details.

An alternative to overriding server-side load balancing in remote listeners is to use local listeners instead.

Local Listener Considerations

When configuring local listeners, consider the following:

  • Each database instance should be configured to register only with its local listener.

  • Oracle instances can be configured to register with the listener statically in the listener.ora file, or dynamically using the instance initialization parameter local_listener, or both. Oracle recommends using dynamic registration.

  • A listener can start either a shared dispatcher process or a dedicated process. Oracle recommends using dedicated processes.

Disk Storage Recommendations

In an OSM high-availability architecture, the Oracle database requires high-performance disk storage. WebLogic Server may also need sharable storage. The WebLogic layer and the database layer may use separate disk storage systems.

File storage must be sharable across database instances. In an Oracle RAC configuration, all data files, control files, and parameter files are shared for use by all Oracle RAC instances.

File storage may also need to be shared across WebLogic Server instances due to persistent store requirements. If WebLogic Server binaries are installed on a shared file system, the file system must be highly scalable and available to avoid performance degradation and single point of failure due to file system error or corruption.

A high-availability storage solution that uses one of the following architectures is recommended:

  • Direct Attached Storage (DAS), such as a dual ported disk array or a Storage Area Network (SAN).

  • Network Attached Storage (NAS).

In terms of I/O characteristics, OSM performs a large number of database writes compared to database reads.

RAID Recommendations

RAID (Redundant Array of Independent Disks) disk storage technology increases disk storage functions and reliability through redundancy. The use of RAID with redundant controllers is recommended to ensure there is no single point of failure and to provide better performance on the storage layer.

The database is sensitive to the read/write performance of the redo logs, which should reside on RAID 1, RAID 1+0, or no RAID at all, because the logs are accessed sequentially and performance is enhanced by keeping the disk drive head near the last write location.

Table B-1 summarizes the recommended RAID levels to use with different types of Oracle database files.

Table B-1 RAID Levels

RAID | Type of RAID | Control File | Database File | Redo Log File | Archive Log File
0 | Striping | Avoid (Footnote 1) | OK (Footnote 1) | Avoid (Footnote 1) | Avoid (Footnote 1)
1 | Shadowing | OK | OK | Recommended | Recommended
0+1 | Striping and Shadowing | OK | OK (Footnote 2) | Avoid | Avoid
1+0 | Shadowing and Striping | OK | Recommended (Footnote 3) | Avoid | Avoid
3 | Striping With Static Parity | OK | Avoid (Footnote 4) | Avoid | Avoid
5 | Striping With Rotating Parity | OK | Avoid (Footnote 3) | Avoid | Avoid


Footnote 1 RAID 0 does not provide any protection against failures. It requires a strong backup strategy.

Footnote 2 RAID 0+1 avoids hot spots and provides the best possible performance during a disk failure. The disadvantage of RAID 0+1 is the cost of configuration.

Footnote 3 Applies when heavy write operations involve this data file.

Footnote 4 RAID 1+0 is recommended instead because it provides higher availability and performance, even when one or more drives are lost.