3 Planning Your OSM Production Installation

This chapter describes how to plan an Oracle Communications Order and Service Management (OSM) production installation.

Overview of Planning Your OSM Production Installation

This section provides an overview of the planning process that you must perform to identify and acquire the hardware, software, and networking equipment and configurations required to run OSM in a highly available production environment.

When you plan your installation, you must consider options such as the type of OSM system you need, the types of orders you must process, the hardware you need, the amount of memory, CPU, and I/O required, the networking equipment and configuration needed, and the database, WebLogic Server, and operating system configuration requirements.

When planning for your production installation, you should determine whether there is a more recent patch of OSM available that you want to use for your production system. For more information, see "Checking for a Current OSM Patch Before Going into Production."

Types of Systems

Planning an OSM installation depends on the purpose of the OSM system. You can define the following OSM systems:

  • Development Systems: The purpose of a development system is to develop, deploy, test, and demonstrate new OSM solution functionality. For more information about development systems, see "OSM Development System Guidelines and Best Practices."

  • Production Systems: The purpose of a production system is to process orders in an overall OSM solution. Production systems must be highly available, scalable, and secure. Before you go live with a production system, you must simulate the production system environment and expected order volume as closely as possible. You can use a pre-production environment to generate performance data, to test system tuning procedures, and to provide a staging environment before moving to a production system.

High Availability Architecture

In a highly-available OSM deployment, redundancy is built in at each layer to ensure there is no single point of failure. An OSM system that is deployed in a high-availability architecture consists of the following:

  • The application server layer, which hosts the WebLogic Server cluster that manages the OSM application.

  • The database server layer, which hosts highly available Oracle RAC database instances.

  • A shared storage system that the database servers use to access the database files.

  • A shared storage system for the application servers for whole server migration in case of a server failure.

  • An HTTP load balancer in fault-tolerant mode.

Figure 3-1 shows an example of a highly-available OSM production system topology. This system is deployed across multiple physical servers at the application server layer.

Figure 3-1 OSM High-Availability Test or Production System Topology

The system includes a WebLogic Server cluster with four managed servers, an administration server, a JMS client sending orders to OSM, and an HTTP load balancer that load balances HTTP and HTTPS messages to OSM from various OSM web clients. Each physical server can host one or more managed servers. The managed servers form the WebLogic server cluster that runs OSM. At the database server layer, OSM supports a partitioned active-active deployment of two or more Oracle Real Application Clusters (Oracle RAC) instances with shared storage.

For increased availability for the WebLogic Server, Oracle recommends that you configure managed servers in the cluster with whole server migration. Whole server migration enables a managed server that unexpectedly terminates and cannot be restarted to migrate and start up on a different machine. The standby machine can be empty or host an existing OSM managed server. If the machine hosts an existing managed server, then the machine must have enough capacity to run a second OSM managed server. You must acquire and configure shared storage to ensure the persistence of managed server data when a managed server migrates to another machine.

Initial Sizing Based on Order Complexity and Performance Needs

The size of an OSM production system depends on the overall complexity of the solution. Complexity can be determined using criteria such as:

  • The average number of orders per day

  • The order creation rate during peak hour

  • The number of days the order must be retained

  • The number of order line items per order

  • The number of order components per order

  • The number of tasks per order

  • The number of data elements and their complexity per order

  • The number of tasks per second (throughput)

  • Expected order lifetime (from creation to completion in seconds)

  • Number of manual users

The most common measure of OSM performance is order throughput. OSM must fulfill orders at the rate that is determined by business need. OSM throughput is measured in task transitions per second (TPS).

Although the TPS metric varies for each deployment, it is useful for you to consider the following approximate guidelines and adjust them as your circumstances require:

  • Simple orders, which typically complete fewer than 10 tasks per order.

  • Moderate orders, which complete approximately 25 tasks per order.

  • Complex orders, which complete approximately 100 tasks per order.

  • Very complex orders, which complete approximately 1000 tasks per order.

Given these criteria, you can create an initial estimate of the hardware requirements for an OSM production installation. See "Planning the Physical Architecture" for more information about estimating hardware requirements.
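
For example, the following minimal sketch converts a daily order volume and a task-per-order figure into an approximate required throughput, assuming the 10-hour processing day used by the sizing guidelines later in this chapter; the input values are illustrative only:

public class ThroughputEstimate {

    // Assumed processing window: the sizing tables in this chapter treat a day as 10 hours.
    private static final double SECONDS_PER_DAY = 10 * 60 * 60;

    /** Returns the approximate task transitions per second (TPS) needed. */
    static double requiredTps(long ordersPerDay, int tasksPerOrder) {
        return (ordersPerDay * (double) tasksPerOrder) / SECONDS_PER_DAY;
    }

    public static void main(String[] args) {
        // Illustrative input: 250,000 moderate orders per day at roughly 25 tasks per order.
        System.out.printf("Required throughput: %.0f TPS%n", requiredTps(250_000, 25));
        // Prints approximately 174 TPS.
    }
}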

Note:

The solution architecture impacts the size of orders, the number of tasks, and so on. This enables you to do the initial sizing, but you must still confirm this sizing with actual performance testing. For more information, see "OSM Pre-Production Testing and Tuning."

Planning the Solution Architecture, System Deployment, and Maintenance

You must devise a solution architecture, which refers to the structure, interaction, and abstraction of software applications. It represents how the various components, including OSM, interact. It also addresses questions such as:

  • What role does each component or product play in the solution?

  • Who can access and use this component?

  • How is it secured?

For more information, see OSM Modeling Guide.

In addition to the hardware and software requirements, you must also plan system deployment and ongoing maintenance considerations such as:

  • How will the OSM in the central order management (COM), service order management (SOM), and technical order management (TOM) roles map to the physical architecture? Will there be multiple instances of OSM?

    OSM in the COM role is typically deployed in a separate WebLogic Server cluster, and OSM in the SOM and TOM roles is typically deployed in the same WebLogic Server cluster. You must do performance tests for each OSM instance in your network. Oracle recommends that OSM be deployed on dedicated machines. Sharing machines makes troubleshooting much more difficult.

  • What is the backup and restore strategy?

  • How will application performance monitoring and operational maintenance activities, such as log file clearing, be handled?

  • How will business continuity be maintained in the event of an application failure?

  • What are the data retention requirements and the data purge strategy?

  • How will errors and failures in other interfacing applications be handled?

  • How will the production environment be deployed? Is there a requirement for automated deployment?

Planning the Physical Architecture

The following sections describe hardware sizing guidelines for OSM, using RODOD, RSDOD, and very simple non-RODOD or non-RSDOD solution examples on Oracle Linux. These sections provide guidelines for estimating the hardware required to achieve similar daily order volumes with your own OSM solutions. They also include general sizing guidelines applicable to any solution type.

These guidelines are intended to assist in estimating the total OSM system requirements in very early stages of implementation, before OSM is installed. These guidelines do not contain express or implied warranties of any kind. After you install OSM and build a solution, you must do performance tests to validate whether the hardware selected is enough for production order volumes. For more information, see "OSM Pre-Production Testing and Tuning."

The hardware recommendations below cover only the OSM part of the solution. For all recommendations a day is considered to be 10 hours. The database storage service time is expected to be less than 5 milliseconds. Each managed server in the cluster has a total heap of just under 32 GB. The values in the tables are approximate.

Generally, the more complex your orders are, the fewer orders your OSM system can process per day, per managed server, and per database instance. To increase the number of orders you can process per day, you can either:

  • Simplify your orders, or

  • Configure additional managed servers, database instances, and hardware.

OSM COM Hardware Sizing Guidelines for RODOD Orders

You can use two models for RODOD COM sizing: simple and complex. They differ in the number of sales lines and components and, as a result, in the number of tasks per order and the number of orders that can be completed per day.

Sizing Guidelines for Simple RODOD COM Orders

Table 3-1 shows sizing guideline assumptions for simple RODOD COM orders. One order contains:

  • 20 automated tasks

  • Five components

  • Five sales lines

Table 3-1 Hardware Sizing Guidelines for Simple RODOD COM Orders

Orders/day
  Small: <= 250,000
  Medium: 250,000 <= 1,000,000
  Large: 1,000,000 <= 2,000,000

Application server:

Reference server model
  Small: Oracle Server X8-2
  Medium: Oracle Server X8-2
  Large: Oracle Server X8-2

Number of servers (HA configuration in parentheses)
  Small: 1 (2)
  Medium: 1 (2)
  Large: 2 (3)

CPUs
  Small: 1 Intel(R) Xeon(R) Gold 5222 3.8 GHz (4 cores)
  Medium: 2 Intel(R) Xeon(R) Platinum 8260 2.4 GHz (24 cores)
  Large: 2 Intel(R) Xeon(R) Platinum 8260 2.4 GHz (24 cores)

RAM (DDR4, GB)
  Small: 64
  Medium: 192
  Large: 192

Internal disk space (GB)
  Small: 2x600 SAS-3 HDD (RAID1)
  Medium: 2x600 SAS-3 HDD (RAID1)
  Large: 2x600 SAS-3 HDD (RAID1)

Number of WebLogic Server managed servers (HA configuration in parentheses)
  Small: 2 (4)
  Medium: 4 (8)
  Large: 8 (12)

Shared storage IOPS (for failover purposes), total for all nodes
  Small: 5000
  Medium: 17500
  Large: 35000

Database server:

Reference server model
  Small: Oracle Server X8-2
  Medium: Oracle Server X8-2
  Large: Oracle Server X8-2

Number of servers (HA configuration in parentheses)
  Small: 1 (2)
  Medium: 1 (2)
  Large: 2 (3)

CPUs
  Small: 1 Intel(R) Xeon(R) Gold 5222 3.8 GHz (4 cores)
  Medium: 2 Intel(R) Xeon(R) Platinum 8260 2.4 GHz (24 cores)
  Large: 2 Intel(R) Xeon(R) Platinum 8260 2.4 GHz (24 cores)

RAM (DDR4, GB)
  Small: 32
  Medium: 128
  Large: 128

Database storage IOPS, total for all nodes
  Small: 10000
  Medium: 35000
  Large: 70000

Sizing Guidelines for Complex RODOD COM Orders

Table 3-2 shows sizing guideline assumptions for complex RODOD COM orders. One order contains:

  • 40 automated tasks

  • 15 components

  • 20 sales lines

Table 3-2 Hardware Sizing Guidelines for Complex RODOD COM Orders

Orders/day
  Small: <= 50,000
  Medium: 50,000 <= 200,000
  Large: 200,000 <= 400,000

Application server:

Reference server model
  Small: Oracle Server X8-2
  Medium: Oracle Server X8-2
  Large: Oracle Server X8-2

Number of servers (HA configuration in parentheses)
  Small: 1 (2)
  Medium: 1 (2)
  Large: 2 (3)

CPUs
  Small: 2 Intel(R) Xeon(R) Gold 5222 3.80 GHz (4 cores)
  Medium: 2 Intel(R) Xeon(R) Platinum 8260 2.40 GHz (24 cores)
  Large: 2 Intel(R) Xeon(R) Platinum 8260 2.40 GHz (24 cores)

RAM (DDR4, GB)
  Small: 128
  Medium: 128
  Large: 128

Internal disk space (GB)
  Small: 2x600 SAS-3 HDD (RAID1)
  Medium: 2x600 SAS-3 HDD (RAID1)
  Large: 2x600 SAS-3 HDD (RAID1)

Number of WebLogic Server managed servers (HA configuration in parentheses)
  Small: 2 (4)
  Medium: 4 (8)
  Large: 8 (12)

Shared storage IOPS (for failover purposes), total for all nodes
  Small: 5000
  Medium: 17500
  Large: 35000

Database server:

Reference server model
  Small: Oracle Server X8-2
  Medium: Oracle Server X8-2
  Large: Oracle Server X8-2

Number of servers (HA configuration in parentheses)
  Small: 1 (2)
  Medium: 1 (2)
  Large: 2 (3)

CPUs
  Small: 2 Intel(R) Xeon(R) Gold 5222 3.80 GHz (4 cores)
  Medium: 2 Intel(R) Xeon(R) Platinum 8260 2.40 GHz (24 cores)
  Large: 2 Intel(R) Xeon(R) Platinum 8260 2.40 GHz (24 cores)

RAM (DDR4, GB)
  Small: 64
  Medium: 128
  Large: 128

Database storage IOPS, total for all nodes
  Small: 10000
  Medium: 35000
  Large: 70000

OSM SOM Hardware Sizing Guidelines for RSDOD Orders

Table 3-3 shows sizing guideline assumptions for RSDOD SOM orders. One order contains:

  • 20 automated tasks

  • Five components

  • 10 sales lines

Table 3-3 Hardware Sizing Guidelines for RSDOD SOM Orders

Orders/day
  Small: <= 250,000
  Medium: 250,000 <= 1,000,000
  Large: 1,000,000 <= 2,000,000

Application server:

Reference server model
  Small: Oracle Server X8-2
  Medium: Oracle Server X8-2
  Large: Oracle Server X8-2

Number of servers (HA configuration in parentheses)
  Small: 1 (2)
  Medium: 1 (2)
  Large: 2 (3)

CPUs
  Small: 1 Intel(R) Xeon(R) Gold 5222 3.80 GHz (4 cores)
  Medium: 1 Intel(R) Xeon(R) Platinum 8260 2.40 GHz (24 cores)
  Large: 1 Intel(R) Xeon(R) Platinum 8260 2.40 GHz (24 cores)

RAM (DDR4, GB)
  Small: 64
  Medium: 192
  Large: 192

Internal disk space (GB)
  Small: 2x600 SAS-3 HDD (RAID1)
  Medium: 2x600 SAS-3 HDD (RAID1)
  Large: 2x600 SAS-3 HDD (RAID1)

Number of WebLogic Server managed servers (HA configuration in parentheses)
  Small: 2 (4)
  Medium: 4 (8)
  Large: 8 (12)

Shared storage IOPS (for failover purposes), total for all nodes
  Small: 5000
  Medium: 17500
  Large: 35000

Database server:

Reference server model
  Small: Oracle Server X8-2
  Medium: Oracle Server X8-2
  Large: Oracle Server X8-2

Number of servers (HA configuration in parentheses)
  Small: 1 (2)
  Medium: 1 (2)
  Large: 2 (3)

CPUs
  Small: 2 Intel(R) Xeon(R) Gold 5222 3.80 GHz (4 cores)
  Medium: 2 Intel(R) Xeon(R) Platinum 8260 2.40 GHz (24 cores)
  Large: 2 Intel(R) Xeon(R) Platinum 8260 2.40 GHz (24 cores)

RAM (DDR4, GB)
  Small: 32
  Medium: 128
  Large: 128

Database storage IOPS, total for all nodes
  Small: 10000
  Medium: 35000
  Large: 70000

Simple Order Hardware Sizing Guidelines (Neither RODOD nor RSDOD)

Table 3-4 shows sizing guideline assumptions for simple (neither RODOD nor RSDOD) orders. One order contains:

  • Five automated tasks

  • Five components

  • 10 sales lines

Table 3-4 Hardware Sizing Guidelines for Simple Orders

Orders/day
  Small: <= 2,000,000
  Medium: 2,000,000 <= 5,000,000
  Large: 5,000,000 <= 10,000,000

Application server:

Reference server model
  Small: Oracle Server X8-2
  Medium: Oracle Server X8-2
  Large: Oracle Server X8-2

Number of servers (HA configuration in parentheses)
  Small: 1 (2)
  Medium: 1 (2)
  Large: 2 (3)

CPUs
  Small: 1 Intel(R) Xeon(R) Gold 5222, 3.80 GHz (4 cores)
  Medium: 1 Intel(R) Xeon(R) Platinum 8260, 2.40 GHz (24 cores)
  Large: 1 Intel(R) Xeon(R) Platinum 8260, 2.40 GHz (24 cores)

RAM (DDR4, GB)
  Small: 64
  Medium: 192
  Large: 192

Internal disk space (GB)
  Small: 2x600 SAS-3 HDD (RAID1)
  Medium: 2x600 SAS-3 HDD (RAID1)
  Large: 2x600 SAS-3 HDD (RAID1)

Shared storage IOPS (for failover purposes), total for all nodes
  Small: 500
  Medium: 1000
  Large: 2000

Number of WebLogic Server managed servers (HA configuration in parentheses)
  Small: 2 (4)
  Medium: 4 (8)
  Large: 8 (12)

Database server:

Reference server model
  Small: Oracle Server X8-2
  Medium: Oracle Server X8-2
  Large: Oracle Server X8-2

Number of servers (HA configuration in parentheses)
  Small: 1 (2)
  Medium: 1 (2)
  Large: 2 (3)

CPUs
  Small: 1 Intel(R) Xeon(R) Gold 5222, 3.80 GHz (4 cores)
  Medium: 2 Intel(R) Xeon(R) Gold 5222, 3.80 GHz (4 cores)
  Large: 1 Intel(R) Xeon(R) Platinum 8260, 2.40 GHz (24 cores)

RAM (DDR4, GB)
  Small: 32
  Medium: 128
  Large: 128

Database storage IOPS, total for all nodes
  Small: 3000
  Medium: 6500
  Large: 13000

Number of Oracle RAC nodes (HA configuration in parentheses)
  Small: 1 (2)
  Medium: 1 (2)
  Large: 2 (3)

General Hardware Sizing and Configuration Recommendations

The following sections provide general hardware sizing and configuration recommendations.

OSM Installer and Application Server System Sizing

Ensure that you have a minimum of 20 GB of available disk space for installing and deploying all required OSM packages, creating a domain, and deploying the OSM application.

Application Server Hardware Sizing

The managed servers in your cluster should be distributed equally across the Oracle RAC database instances that OSM uses. For example, if your order volume requirements mandate three managed servers and two Oracle RAC database instances, you must round up the number of managed servers to four so that each database instance serves an equal number of managed servers. Each managed server should have a total heap of just under 32 GB of memory. Oracle recommends that you use the managed server startup parameters and memory configuration specified in "Configuring Managed Server Startup Parameters."
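
The following is a small sketch of that rounding rule; the input counts are hypothetical, and the method simply rounds the managed server count up to the next multiple of the Oracle RAC instance count:

public class ClusterSizing {

    /**
     * Rounds the number of managed servers up to the next multiple of the number of
     * Oracle RAC instances so that each instance serves an equal number of managed servers.
     */
    static int balancedManagedServers(int requiredManagedServers, int racInstances) {
        int remainder = requiredManagedServers % racInstances;
        return remainder == 0 ? requiredManagedServers : requiredManagedServers + (racInstances - remainder);
    }

    public static void main(String[] args) {
        // The example from this section: 3 managed servers and 2 RAC instances round up to 4.
        System.out.println(balancedManagedServers(3, 2)); // 4
    }
}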

Running Multiple WebLogic Servers on the Same System

You can run multiple WebLogic servers in a cluster on the same system so as to maximize the use of available resources while keeping the heap size of the associated JVM instances to a reasonable level. Ensure that you limit the number of JVM instances based on the number of available processors.

Shared Storage for the WebLogic Server

In an OSM high-availability system architecture, if transaction logs and JMS messages are not persisted in JDBC stores, the WebLogic Server requires high-performance disk storage to support whole server migration. Whole server migration enables a managed server that fails on one system to migrate and start up on another system. Various files, such as persistent stores, must be located on shared storage that is accessible to all WebLogic managed server instances.

See "Understanding Whole Server Migration for High Availability" for information about the WebLogic Server files that must be on shared storage to support the whole server migration functionality. All other files can be on the local file system where the managed servers or administration server are running.

Note:

Other shared storage file configurations are possible depending on your business requirements. For more information about best practices for WebLogic Server on shared storage, see the following article on the Oracle Technology Network website:

http://www.oracle.com/technetwork/database/availability/maa-fmwsharedstoragebestpractices-402094.pdf

Use RAID 1+0 (normal redundancy) backed shared storage for installing WebLogic Server, creating the OSM domain, and deploying applications and server log files. Ensure that you have 5 GB for JMS persistent file stores. This requirement might be higher, depending on the design and the order volume.

Database Hardware Sizing

For the database, plan to have at least 500 GB of free disk space for the OSM schema. The OSM schema size is likely to be higher, depending on the design and the order volume, and you must plan for this during hardware sizing. The size of the OSM schema depends on many factors, such as the number of orders you process per day, how long you must retain orders, and so on.
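
As an illustration only, a rough sketch of this kind of estimate follows; the per-order footprint and retention values are hypothetical placeholders that you must replace with figures measured from your own solution:

public class SchemaSizeEstimate {

    /** Very rough OSM schema size estimate in GB; all inputs are solution-specific. */
    static double estimatedSchemaGb(long ordersPerDay, int retentionDays, double mbPerOrder) {
        return (ordersPerDay * (double) retentionDays * mbPerOrder) / 1024.0;
    }

    public static void main(String[] args) {
        // Placeholder values: 50,000 orders/day, 90-day retention, 1 MB per order (hypothetical).
        System.out.printf("Estimated schema size: %.0f GB%n",
                estimatedSchemaGb(50_000, 90, 1.0));
        // Prints approximately 4395 GB; compare this against the 500 GB minimum above.
    }
}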

Shared Storage for the Database

In an OSM high-availability system architecture, the Oracle database requires high-performance shared storage to support database failover operations. In an Oracle RAC configuration, all data files, control files, and parameter files are shared for use by all Oracle RAC instances.

A high-availability storage solution that uses one of the following architectures is recommended:

  • Direct Attached Storage (DAS), such as a dual-ported disk array, or a Storage Area Network (SAN).

  • Network Attached Storage (NAS).

In terms of I/O characteristics, OSM performs a large number of database writes compared to database reads. OSM is highly sensitive to the write performance of the Oracle database. Ensure that the write response times of the storage do not exceed 5 ms under load.

You should also consider backup and restore hardware and software that supports mirror split and re-mirroring such as the Oracle ZFS Storage Appliance. Oracle recommends such mirroring software and hardware because the backup and restore functionality is rapid and can be done online. This functionality is especially important when performing upgrades or when purging database partitions and can reduce the length of maintenance windows or the time it takes to recover from errors. For more information about backing up and restoring OSM files and data, see OSM System Administrator's Guide.

RAID Recommendations for the Database

Redundant Array of Independent Disks (RAID) disk storage technology increases disk storage functions and reliability through redundancy. The use of RAID with redundant controllers is recommended to ensure there is no single point of failure and to provide better performance on the storage layer.

The database is sensitive to the read/write performance of the redo logs, which should be on RAID 1, RAID 1+0, or no RAID at all, because logs are accessed sequentially and performance is enhanced by having the disk drive head near the last write location.

See the I/O Tuning with Different RAID Configurations (Doc ID 30286.1) knowledge article on the Oracle support website for additional information:

https://support.oracle.com

Understanding Order Affinity

The following sections provide information about order affinity and load balancing.

About Order Affinity and Ownership in an OSM WebLogic Cluster

When an OSM managed server receives a new order, OSM assigns a unique order ID to the order. OSM associates the order ID with the receiving managed server instance name within the cluster. Throughout the order fulfillment life cycle, OSM processes this order only with the associated managed server. This OSM principle is called order affinity, and it ensures order data integrity and performance by preventing multiple managed server instances from processing the same order. The server instance that has control of an order owns the order. OSM routes all requests relating to the order to the owner instance.
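
The following is a highly simplified conceptual sketch of the ownership idea; it is not OSM's actual routing implementation and is included only to illustrate that every order maps to exactly one owning managed server:

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Conceptual illustration of order affinity: each order ID maps to exactly one owning server. */
public class OrderAffinityExample {

    private final Map<Long, String> ownerByOrderId = new ConcurrentHashMap<>();

    /** The managed server that receives a new order becomes its owner. */
    void onOrderReceived(long orderId, String receivingServerName) {
        ownerByOrderId.putIfAbsent(orderId, receivingServerName);
    }

    /** All later requests for the order are routed to the owner, not to the receiving server. */
    String routeRequest(long orderId) {
        return ownerByOrderId.get(orderId);
    }

    /** Ownership is transferable, for example when the cluster resizes. */
    void transferOwnership(long orderId, String newOwnerServerName) {
        ownerByOrderId.put(orderId, newOwnerServerName);
    }
}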

Order ownership is transferable. OSM can transfer an order to another managed server in the following scenarios:

  • If an order becomes a high-activity order, OSM can redistribute the order from the receiving managed server to another less-active managed server to better balance the load between each server in the cluster (see "Distribution of High-Activity Orders" for more information).

  • If an incoming order is a revision order that arrives on a managed server different from the one processing the base order, OSM transfers order ownership so that the same managed server owns both the base order and the incoming order.

  • If the incoming order has a dependency on an order owned by a server instance other than the one on which it was received. For example, a follow-on order that has a dependency on another order would be routed to the server where the previous order was processed.

  • Before an order is redistributed to a new or different server instance, that server instance notifies the other server instances to complete pending operations on the orders to be redistributed and to delete them from their order caches.

    Note:

    The reassignment of orders can temporarily impact Oracle RAC database performance when order ownership changes as the OSM WebLogic cluster resizes.

  • If a managed server is added to or removed from a cluster, OSM notifies all server instances about topology changes to the cluster and re-runs the distribution algorithms that determine which server instance owns an order. Order ownership either remains with the previous owner or moves to a different owner.

Note:

If you have a WebLogic Server cluster and the following conditions apply, the user is logged out of the web client and must log back in:

  • A user is viewing an order in the Order Management web client or Task web client.

  • That order is hosted on a managed server that fails or is shut down.

About Load Balancing for OSM and Order Affinity

Load balancing helps maximize server resource use and order throughput. It enables OSM to minimize server response time and processing delays that can occur if some servers are overloaded while others remain unused. Load balancing helps support rolling downtimes of servers for maintenance tasks or upgrade procedures without impacting clients during non-peak times.

For OSM, two types of incoming messages are important for load balancing:

  • Load balancing for JMS over T3 or T3S. Inbound JMS messages to OSM can include:

    • OSM Web Service requests, such as CreateOrder requests from a CRM that initiates an OSM order

    • JMS messages responding to an OSM automation, such as a response to an automation plug-in JMS request messages to external fulfillment systems

  • Load balancing for HTTP and HTTPS. Inbound HTTP and HTTPS messages to OSM can include:

    • OSM Web Service requests transmitted over HTTP and HTTPS, such as CreateOrder requests from a CRM that initiates an OSM order

    • OSM web client interactions, including the Task web client and Order Management web client

    • XML API requests from external systems

Note:

OSM automations often use XML API function calls while processing orders. However, these calls typically do not require load balancing, because OSM uses the XML API locally, on the same server instance that owns the order being manipulated.

For JMS messages, OSM uses the Oracle WebLogic Server JMS distributed destinations for load balancing. See "JMS Distributed Destinations" for more information. You do not need to load balance JMS messages using an external load balancer.

For HTTP and HTTPS messages, Oracle recommends using a software or hardware load balancer outside of the OSM WebLogic cluster.

Load balancing for OSM Web Service requests is important because the OSM order affinity functionality requires that orders be distributed appropriately among the managed servers within the cluster. A managed server that receives an order becomes the owner of that order. See "About JMS Load Balancing Schema Options" for more information about JMS load-balancing options and see "About HTTP and HTTPS Load Balancing and Session ID Configuration" for more information about HTTP and HTTPS load balancing options.

About the Performance Differences Between JMS and HTTP or HTTPS

In some order affinity scenarios, OSM must forward the requests from a receiving managed server to the owner managed server, such as when a CRM system sends a revision order and OSM receives the order on a managed server that is not the owner of the base order. This process is different depending on whether the message is delivered over JMS or over HTTP or HTTPS.

If the CRM sends the revision order over JMS, OSM re-directs the request to the owner instance. The managed server that originally received the order no longer participates in any way in the processing of the order.

If the CRM sends the revision order over HTTP or HTTPS, OSM forwards the request to the owner managed server over the internal JMS messaging queues. However, the receiving managed server must continually maintain a socket connection to the HTTP client or load balancer that sent the revision order even though another managed server is responsible for processing both the revision order and the base order. The socket connection on the receiving server must remain open until a response is generated because HTTP messages are synchronous. This restriction adds a performance overhead when sending orders over HTTP or HTTPS and increases with the size of the WebLogic Server cluster, because the probability of a message, like a revision order, arriving at the server that owns a particular order decreases as the ownership of orders is spread across more servers.

Given this limitation, as well as the advantage of the transactional reliability of JMS message processing, Oracle recommends using the OSM Web Services over JMS for external client communication. Use HTTP and HTTPS messages for the OSM Order Management web client and the OSM Task web client because human interaction with these clients is synchronous by nature.

About Order Affinity and Ownership in an Oracle RAC Database

WebLogic multi data sources support XA affinity for global transactions, which ensures that all the database operations for a global transaction performed on an Oracle RAC cluster are directed to the same Oracle RAC instance. However, XA affinity cannot span different global transactions on the same data, and affinity across such transactions is a key performance requirement for OSM. The objective is to minimize the adverse impact on performance caused by database buffer cache transfers between Oracle RAC instances. Therefore, OSM supports order affinity, which means that database operations for different global transactions on the same order are normally directed to the same Oracle RAC instance. Overall, Oracle RAC instances process database operations simultaneously, but each instance operates on a mutually exclusive subset of orders.

OSM order affinity works in the following way:

  • Each OSM server interacts with Oracle RAC through a WebLogic Server multi data source configured for failover with two data sources (one for each Oracle RAC instance). This setup is used for both active-passive and active-active topologies. Under normal conditions each OSM server always interacts with a single Oracle RAC instance. In an active-active topology, load balancing is achieved by reversing the order of the data sources in the multi data source for half of the OSM servers.

  • If the Oracle RAC database is configured with server-side load balancing (the Oracle RAC instances register with a remote listener process), server-side load balancing must be overridden as discussed in "Remote Listener Considerations."

  • Under normal conditions, the ownership and processing of each order is pinned to a single OSM server in a cluster. Because each OSM server interacts with a single Oracle RAC instance through the primary data source of its multi data source, all database operations for each order are directed to the same Oracle RAC instance. If ownership of an order is transferred to another OSM server (for example, when the cluster resizes or the order becomes a high-activity order), the processing of that order will be pinned again to the new OSM server.

Planning the Network Infrastructure

The following sections provide information about planning your network infrastructure.

Planning Network IP Addresses

The WebLogic Server cluster must have the following:

  • A multicast IP address and port for the WebLogic Server cluster. Use any IP address between 224.0.0.0 and 239.255.255.255; a quick way to check a candidate address is sketched after these lists.

  • IP addresses for each server in the cluster. If you are using whole server migration, the IP addresses must be available for node manager to dynamically allocate as floating IP addresses for the managed servers in the cluster.

The Oracle RAC database must have the following:

  • Three IP addresses that resolve to the same SCAN host name

  • Each Oracle RAC database instance must have a public and private IP address with corresponding host names.
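
As a quick sanity check of a candidate cluster multicast address, you can use the standard Java API; this is a hedged illustration only, and the address shown is an arbitrary example:

import java.net.InetAddress;
import java.net.UnknownHostException;

public class MulticastAddressCheck {
    public static void main(String[] args) throws UnknownHostException {
        // Valid WebLogic cluster multicast addresses fall between 224.0.0.0 and 239.255.255.255.
        String candidate = "239.192.0.10"; // example address; choose one unused on your network
        boolean ok = InetAddress.getByName(candidate).isMulticastAddress();
        System.out.println(candidate + " is a multicast address: " + ok);
    }
}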

Planning Bi-Directional Network and Firewall Access

Because JMS messages are transmitted in the context of JTA transactions, ensure that the WebLogic Server client always has bi-directional network and firewall access to every OSM WebLogic managed server. If you send a message to a distributed destination, for example, the JTA coordinator used by the WebLogic Server client must be able to communicate with every managed server. Having only partial access to the managed servers in the OSM cluster can lead to inconsistent message states.

Network Latency Between WebLogic Server and the Database

To attain the fastest possible network connections, Oracle recommends that the physical servers for the WebLogic Server and Oracle Database be in the same network segment. The performance of OSM is sensitive to network latency between the WebLogic Server and the database.

Oracle recommends connecting the OSM and database servers with a minimal number of network devices in between. The switch connecting the network devices should have 10 Gb capacity. This hardware configuration should produce an optimal network latency between 0.2 and 0.4 msec. Network latency above 1 msec can cause performance degradation.
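
If you want a rough spot check of round-trip time between an application server host and the database listener before formal testing, a crude sketch such as the following can help; the host and port are placeholders, TCP connect time only approximates network latency, and dedicated network tools give more accurate measurements:

import java.net.InetSocketAddress;
import java.net.Socket;

public class LatencyProbe {
    public static void main(String[] args) throws Exception {
        String dbHost = "dbhost1";   // placeholder database listener host
        int dbPort = 1521;           // default Oracle listener port
        int samples = 20;
        long totalNanos = 0;
        for (int i = 0; i < samples; i++) {
            long start = System.nanoTime();
            try (Socket socket = new Socket()) {
                socket.connect(new InetSocketAddress(dbHost, dbPort), 1000);
            }
            totalNanos += System.nanoTime() - start;
        }
        System.out.printf("Average TCP connect time: %.2f ms%n", totalNanos / samples / 1_000_000.0);
    }
}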

Network Latency and NFS Configuration for WebLogic Server Shared Storage

The usual latency requirement from storage is a service time of less than 5 msec. You must decide the IOPS (input/output operations per second) requirement based on hardware sizing.

For more information about recommended parameters for different levels of NFS mount robustness, see Mount Options for Oracle files when used with NFS on NAS devices (Doc ID 359515.1) on the Oracle support website at:

https://support.oracle.com

While this KM note was written for Oracle RAC, it provides a useful overview of various combinations of NFS parameters that are also appropriate for WebLogic Server shared storage.

Operating System Planning

Install the OSM server on UNIX or Linux systems for production environments. The OSM server can be installed on UNIX, Linux, or Windows systems for development, demonstration, and non-performance test systems.

If you plan to use Oracle Communications Design Studio on a Windows system, you should also install the SDK Tools component on the Windows system. If you plan to generate reports using the command line utility of the OSM Reporting Interface, you must install the SDK Tools component.

Database Planning

The following sections provide information about planning your database for the OSM system. In addition to the database planning information provided in this chapter, review the following:

  • Creating Tablespaces: For more information about options and recommendations for creating tablespaces for OSM schemas, see "Tablespace and Schema Considerations for OSM Production Systems."

  • Using Partitioning: For an overview of partitioning in OSM and a discussion about the benefits and pitfalls of partitioning, see OSM System Administrator's Guide. Oracle strongly recommends partitioning in all production deployments or production test environments, particularly those with high order volumes or any volume of large or complex orders. Moreover, partitioning is required if you plan to use active-active Oracle RAC.

  • Order Purge Strategies: For more information about order purge strategies, see OSM System Administrator's Guide. You must decide on an order purge strategy before doing performance testing and before going into production.

  • Sizing Partitions: For more information about sizing partitions for order data, see OSM System Administrator's Guide. Partition sizing depends on your order purge strategy.

  • Cartridge Management Strategy: For more information about cartridge management strategies, see OSM System Administrator's Guide.

  • Online and Offline Maintenance: For more information about online and offline maintenance operations, see OSM System Administrator's Guide.

  • Database Management Procedures: For more information about recommendations for managing your production database, see "Checking for Database Management Procedures."

Oracle RAC Database Active-Active Deployments

At the database-server layer, use Oracle RAC in the active-active high-availability topology for system test and production systems. In active-active Oracle RAC, all active instances simultaneously process database operations. In addition to load balancing, active-active Oracle RAC can also provide high availability if the physical database server of one Oracle RAC database instance is dimensioned to handle all of the database load upon failure of the other Oracle RAC database instance. OSM supports Oracle RAC through the use of WebLogic multi data sources. To optimize performance, OSM uses order affinity, as described in "About Order Affinity and Ownership in an Oracle RAC Database."

Note:

OSM supports an Oracle RAC configuration of two or more nodes. OSM also supports Oracle RAC One Node.

Database Partitioning

During the installation, you specify whether you need to partition the OSM database, and you provide partition sizes. Oracle strongly recommends using partitioning for production databases and production test databases.

You can change the values that you selected during the installation process. However, those updates do not affect existing partition sizes.

For information about partition sizes, see OSM System Administrator's Guide.

Database Failover with Oracle RAC

In an Oracle RAC configuration, all data files, control files, and parameter files are shared for use by all Oracle RAC instances. When a database instance fails, performance may be temporarily affected, but the database service continues to be available.

The use of WebLogic multi data sources and JMS minimizes the impact of a database failure in the following ways:

  • The JDBC multi data source fails over to the secondary data source.

  • In-flight WebLogic transactions are driven to completion or rolled back, based on the state of the transaction at the time of the failure. Because JMS messages are redelivered, most failed transactions are automatically retried (upon redelivery) on the secondary data source. This does not apply to failed web services and HTTP requests (for example, failed createOrder requests must be resubmitted).

Database Failover with Oracle RAC One Node

For Oracle RAC One Node, there is only one instance active at a time. Therefore, a stand-alone data source using the SCAN address (without Instance Name) ensures that all OSM managed servers communicate with the same database instance while still allowing for automated failover.

Install OSM as if you were using a non-RAC database, specifying the SCAN address (without an instance name). OSM treats an Oracle RAC One Node database as if it were a non-RAC database and lets the database and SCAN listener handle failover.

For RAC One Node, a sample data source URL is:
jdbc:oracle:thin:@(DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=host1)(PORT=1521))(CONNECT_DATA=(SERVICE_NAME=OSM)))
Notice that INSTANCE_NAME is not provided, because the SCAN listener chooses the instance when a connection is made based on the current failover state of the database.

Note:

The considerations described in "Listener Considerations for Oracle RAC" are not applicable to Oracle RAC One Node.

Listener Considerations for Oracle RAC

In an Oracle RAC environment, the database listener process establishes the connections between the JDBC data source of a WebLogic Server instance and an Oracle RAC instance.

To enable the listener functionality, Oracle recommends that you use remote listeners. Remote listeners are also known as SCAN listeners. With this option, each Oracle RAC instance is configured to register with a remote listener that may or may not be on the same physical server. There is no "remote listener only" scenario: local listeners must be running for the remote listener to work properly. When a request comes in, the remote listener redirects it to the local listener of the instance to which it is connecting.

When configuring JDBC data sources, you must be aware of your listener process setup. The OSM installer will automatically configure your JDBC data sources based on the listener process considerations discussed in the following sections. These considerations apply to both active-active and active-passive topologies.

Remote Listener Considerations

By default, server-side load balancing is configured when using remote listeners. That is, the remote listener decides how to forward connection requests based on the load of the Oracle RAC instances. OSM active-active configurations require that server-side load balancing be overridden. To achieve this, the OSM installer includes the INSTANCE_NAME parameter (the SID of a specific instance) in the JDBC URL of each member data source, in addition to identifying the database service by name.

For example, the following data source URLs include both INSTANCE_NAME and SERVICE_NAME:

jdbc:oracle:thin:@(DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=host1)(PORT=1521))(CONNECT_DATA=(SERVICE_NAME=OSM)(INSTANCE_NAME=SID1)))

jdbc:oracle:thin:@(DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=host1)(PORT=1521))(CONNECT_DATA=(SERVICE_NAME=OSM)(INSTANCE_NAME=SID2)))

In the example, the host and port are for the SCAN listener, the service name is the same, and the instance names are different.
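
Outside of OSM, you can verify which Oracle RAC instance a member data source URL actually reaches by querying v$instance over a plain JDBC connection. The following is a hedged sketch; the credentials are placeholders, the URL is taken from the example above, and the Oracle JDBC driver must be on the classpath:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class InstancePinningCheck {
    public static void main(String[] args) throws Exception {
        // Member data source URL from the example above; user name and password are placeholders.
        String url = "jdbc:oracle:thin:@(DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=host1)(PORT=1521))"
                + "(CONNECT_DATA=(SERVICE_NAME=OSM)(INSTANCE_NAME=SID1)))";
        try (Connection conn = DriverManager.getConnection(url, "osm_user", "password");
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery("SELECT instance_name FROM v$instance")) {
            if (rs.next()) {
                // Should print SID1 if the connection was pinned to the requested instance.
                System.out.println("Connected to instance: " + rs.getString(1));
            }
        }
    }
}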

The OSM installer will automatically set up the URL of each JDBC data source in the WebLogic Server instances' multi data source. However, if you choose to manually configure additional Oracle RAC instances, you must populate the SID in the JDBC URL of the member data sources in WebLogic Server. See "Manually Configuring Additional Data Sources for an Oracle RAC Instance" for details.

Local Listener Considerations

When configuring local listeners, consider the following:

  • Each database instance should be configured to register only with its local listener.

  • Oracle instances can be configured to register with the listener statically in the listener.ora file, dynamically using the instance initialization parameter local_listener, or both. Oracle recommends using dynamic registration.

  • A listener can start either a shared dispatcher process or a dedicated process. Oracle recommends using dedicated processes.

WebLogic Server Planning

An OSM instance consists of an administration server and a cluster of managed servers. Clustering ensures continuous availability of your OSM server and improves performance by enabling load balancing, scalability, and failover. You may choose to use the clustering feature in OSM if:

  • You want to minimize unexpected system downtime.

  • Your order volume is very high and cannot be sustained with a single WebLogic Server instance or physical host.

OSM supports the following load balancing:

  • Load balancing for JMS messages: The native WebLogic load balancing options for Java Messaging Service (JMS) messages help OSM maximize server resource use and order throughput. Load balancing also enables OSM to minimize server response time and processing delays that can occur if some servers are overloaded with orders while others remain unused. Load balancing allows rolling downtimes of servers without client impact, as long as enough servers are up and running.

  • Load balancing for HTTP and HTTPS messages: In addition to the native WebLogic support for load balancing JMS messages, Oracle recommends installing a software or hardware HTTP load balancer for balancing incoming HTTP or HTTPS messages.

To ensure high availability, the load balancing mechanisms (both the native WebLogic JMS load balancing or the HTTP load balancer) forward messages to other managed servers if one of the managed servers fails. Orders that were being processed by the failed server are delayed until that server is either restarted or migrated.

Understanding the WebLogic Cluster Configuration

Recommendations for the OSM cluster include the following:

  • Messaging mode: Oracle recommends that you use multicast messaging mode when setting up OSM. For more information about using multicast or unicast, see "About the WebLogic Messaging Mode and OSM Cluster Size."

  • Load balancing: Set up clusters to use the random algorithm.

Table 3-5 includes a summary of the recommendations for the OSM cluster.

Table 3-5 Configuration Recommendations for OSM Cluster

Cluster Messaging Mode: multicast

Default Load Algorithm: Random

About Cluster Domain Management

Oracle recommends the following best practices when configuring managed server instances in your clustered OSM domain:

  • Configure Node Manager to automatically restart all managed servers in the domain.

  • Configure all managed server instances to use Managed Server Independence (MSI) mode, which is the default. This feature allows the managed servers to restart even if the administration server is unreachable due to a network, hardware, or software failure. See Oracle Fusion Middleware Managing Server Startup and Shutdown for Oracle WebLogic Server for more information.

About the WebLogic Messaging Mode and OSM Cluster Size

The WebLogic Server cluster messaging mode enables cluster members to remain synchronized and provides the foundation for other WebLogic Server functions such as load balancing, scalability, and high availability.

In an OSM cluster, the messaging mode can be multicast or unicast. Oracle recommends using multicast in most OSM installations because multicast is generally more reliable.

In some cases, unicast may be the only option. For example, multicast can only work over a single subnet. If a single subnet is not possible due to technological or IT policy reasons, or if the network's multicast transmission is not reliable, then unicast messaging mode becomes the best option. If you must use unicast, ensure that you apply the WebLogic Server patches that resolve the currently known unicast issues. See "Software Requirements" for patch information.

Do not use unicast for a cluster with more than 20 managed servers. Enabling a reliable multicast network for WebLogic Server multicast messaging mode is the only option for such large cluster sizes. The broadcast nature of multicast over UDP packets works better in such large clusters than one-to-one TCP unicast connections between each pair of managed servers.

You can use unicast for cluster sizes between 10 and 20 managed servers, but consider multicast if you begin to experience poor performance or reliability issues.

About Coherence and Unicast

Oracle recommends unicast mode for Oracle Coherence. The OSM cluster performance and robustness are sensitive to the synchronization of cached data maintained by Coherence. The inherently unreliable packet delivery with UDP in multicast transmission may destabilize cache synchronization, and errors can be difficult to troubleshoot. As a result, Oracle does not recommend using Coherence in multicast mode.

Understanding the Administration Server

The administration server operates as the central control entity for the configuration of your OSM WebLogic domain.

The failure of the administration server does not affect the operation of managed servers in the domain. Furthermore, the load balancing and failover capabilities supported by the domain configuration remain available. However, it does prevent you from changing the domain's configuration, and it results in the loss of in-progress management and deployment operations and of ongoing logging functionality.

Oracle recommends the following best practices when configuring the administration server in your OSM WebLogic domain:

  • The administration server should not participate in a cluster. Ensure that the administration server's IP address is not included in the cluster-wide DNS name.

  • Start the administration server using Node Manager to ensure that the administration server restarts in the event of a failure. (If the administration server for a domain becomes unavailable, the managed servers in the domain will periodically attempt to reconnect to the administration server.) Do not deploy OSM to the administration server in a production system.

For additional clustering best practices, see Oracle Fusion Middleware Using Clusters for Oracle WebLogic Server.

You may also consider transforming the administration server in an existing domain for cold cluster failover.

In this active-passive topology, the administration server is installed on Node1 and then transformed to a shared disk. In the event of failure, it will be failed over to Node2. The administration server domain_home resides on a shared disk that is mountable by both Node1 and Node2 but is mounted by either one of the two at any given point in time. The listen address of the administration server is a virtual IP.

See the chapter on active-passive topologies in Oracle Fusion Middleware High Availability Guide for full details.

Understanding Node Manager Configuration

Node Manager is a Java utility that runs as a separate process from the WebLogic Server and allows you to perform common operations for a managed server, regardless of its location with respect to its administration server. The Node Manager process is associated with a machine. Thus each physical server has its own Node Manager, which can control all server instances that reside on the same machine as the Node Manager process.

Consider the following guidelines when using Node Manager:

  • Run Node Manager as an operating system service on UNIX platforms, allowing it to restart automatically when the system is restarted.

  • Set the AutoRestart attribute of the administration server and each managed server to true to allow Node Manager to automatically restart it in the event of failure, depending on the exit code (if the exit code is less than 0, the server is not restarted and you must diagnose the problem).

  • Do not disable the Managed Server Independence (MSI) mode for a managed server (enabled by default). MSI allows Node Manager to automatically restart a managed server after failure even when the administration server is unavailable.

  • To ensure that Node Manager properly restarts servers after a system crash (for example, an operating system crash), you must do the following:

    • Ensure that CrashRecoveryEnabled is set to true. This property is disabled by default.

    • Start the administration server using Node Manager. You cannot use Node Manager to start a server instance in MSI mode, only to restart it. For a routine startup, Node Manager requires access to the administration server.

    • Start all managed servers using the administration server. You can accomplish this using the WebLogic Scripting Tool (WLST) command line or scripts, or the Administration Console. A widespread practice is to start managed servers using a shell script.

See Node Manager Administrator's Guide for Oracle WebLogic Server for more information.

Understanding JMS Messaging

Recommendations for JMS Messaging include the following:

  • For production systems, Oracle recommends that JMS message persistence be configured to use a JDBC store rather than a file store. WebLogic added support for the use of JDBC stores for tlogs in Fusion Middleware 12cR1. From an operational perspective, creating JDBC stores for JMS and tlogs on the same database instance used for the OSM schema allows for atomic backups, thereby enabling consistent restoration of overall application state. Note that the use of JDBC stores increases database CPU utilization and storage I/O operations; this needs to be accounted for in infrastructure planning. The exact magnitude of the cost increase varies depending on infrastructure and solution characteristics.

    For optimal performance, for systems using a file store, Oracle recommends that you configure Direct-Write-With-Cache, if this option is supported in your environment. For information about the best practices for configuring a WebLogic file store, see the chapter about using the WebLogic persistent store in Oracle Fusion Middleware Configuring Server Environments for Oracle WebLogic Server.

  • In a clustered environment, WebLogic uses load balancing to distribute the workload across clusters. For OSM, set the load balancing policy for distributed JMS queues to Random. For more information about WebLogic JMS Random distribution, see the chapter about configuring advanced JMS system resources in Oracle Fusion Middleware Configuring and Managing JMS for Oracle WebLogic Server.

  • WebLogic supports the Store and Forward (SAF) service for reliable delivery of messages between distributed applications running on different WebLogic Server instances. It is recommended that you set the Conversation Idle Time Maximum on SAF agents to a positive value to allow messages to be forwarded to other active members when the original target is down or unavailable. For more information about the WebLogic SAF service, see the chapter about understanding the SAF service in Oracle Fusion Middleware Configuring and Managing Store-and-Forward for Oracle WebLogic Server. Oracle recommends that you use SAF to integrate OSM with Oracle Communications ASAP, Oracle Communications IP Service Activator, and Oracle Communications Unified Inventory Management (UIM). For more information about this post OSM installation task, see "OSM Integration with External Systems."

JMS Distributed Destinations

JMS destinations, which may be JMS queues or JMS topics, serve as repositories for messages. A JMS destination provides a specific end point for messages, which a JMS client uses to specify the target of messages that it produces and the source of messages that it consumes. For example, OSM automation plug-ins can specify the JNDI names of the JMS queue and the JMS reply-to queue to produce and consume messages with external systems.

A distributed destination is a single set of destinations that are accessible as a single, logical destination to a client (for example, a distributed topic has its own JNDI name). The members of the set are typically distributed across multiple servers within a cluster, with each member belonging to a separate JMS server. When deployed to a cluster, OSM uses distributed destinations because JMS provides load balancing and failover for the members of a distributed destination in a cluster. For performance reasons, server affinity is enabled on the connection factory to give preference to local destination members.

Note:

OSM does not support uniform distributed destinations (UDDs), which are the default type of distributed destination in WebLogic. OSM supports only weighted distributed destinations (WDDs). When configuring distributed destinations in the WebLogic Server Administration Console, select Weighted for Destination Type to configure the distributed destination as a WDD.

Note:

Messages sent to JMS distributed destinations will always be delivered to member queues. However, messages delivered to a member queue can get stuck in the event of a server failure. In that case, messages cannot be consumed until either the WebLogic server is restarted or the JMS server is migrated.

Cluster and Single-Server Queues

Multiple queues are created automatically when OSM is installed. When OSM is installed to a cluster, additional queues are provided for added processing efficiency.

If your development systems have been installed onto a single-server instance of WebLogic Server, ensure that your client systems are updated to use the queues appropriate to a cluster. For more information about the queues that are installed with OSM, see the discussion of OSM installed components in OSM System Administrator's Guide.

About WebLogic Server JMS T3 and T3S Load Balancing

WebLogic Server T3 and the secure T3S variant are transport protocols for application-level services that OSM uses for communication between client applications and the WebLogic Server. OSM typically sends messages using T3 or T3S to distributed JMS destinations for:

  • OSM Web Service XML/SOAP messages, like an OSM CreateOrder Web Service request from a CRM that initiates an OSM order

  • OSM automations that receive messages from external fulfillment systems

  • OSM internal event handling. For example, oms_events_queue can be used for triggering data change notifications on an order.

The T3 protocol is fully implemented by the WebLogic Server and its client libraries. T3 client libraries support clustered server URLs for the initial context lookup enabling native WebLogic support for load balancing. The WebLogic Server cluster then manages load distribution based on system availability and the selected load balancing schema. See the WebLogic documentation for more information about creating a WebLogic client for communicating JMS messages to OSM over T3.

Figure 3-2 shows a JMS client with a JNDI properties file configured with the URLs for each managed server within a cluster that the client is sending messages to. WebLogic Server load balances these messages using the URLs based on a load balancing schema (see "About JMS Load Balancing Schema Options" for more information). OSM typically processes an OSM order on the managed server that receives the order, but in some cases OSM uses the managed server queues internally to redistribute orders to other managed servers after an order has been received on a managed server (see "About Order Affinity and Ownership in an OSM WebLogic Cluster" for more information).
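
For illustration, the following is a minimal sketch of a standalone JMS client that uses a clustered T3 URL for the initial context lookup. The host names, ports, and JNDI names are placeholders, the WebLogic client libraries (for example, wlthint3client.jar) must be on the classpath, and a real OSM Web Service request would carry the appropriate SOAP payload and JMS properties:

import java.util.Hashtable;
import javax.jms.Connection;
import javax.jms.ConnectionFactory;
import javax.jms.MessageProducer;
import javax.jms.Queue;
import javax.jms.Session;
import javax.jms.TextMessage;
import javax.naming.Context;
import javax.naming.InitialContext;

public class OsmJmsClientSketch {
    public static void main(String[] args) throws Exception {
        Hashtable<String, String> env = new Hashtable<>();
        env.put(Context.INITIAL_CONTEXT_FACTORY, "weblogic.jndi.WLInitialContextFactory");
        // Clustered URL: list every managed server so the lookup can be load balanced.
        env.put(Context.PROVIDER_URL, "t3://ms1host:8001,ms2host:8001,ms3host:8001,ms4host:8001");

        InitialContext ctx = new InitialContext(env);
        // Placeholder JNDI names; use the connection factory and request queue from your OSM solution.
        ConnectionFactory factory = (ConnectionFactory) ctx.lookup("example/ConnectionFactory");
        Queue requestQueue = (Queue) ctx.lookup("example/osmRequestQueue");

        Connection connection = factory.createConnection();
        try {
            Session session = connection.createSession(false, Session.AUTO_ACKNOWLEDGE);
            MessageProducer producer = session.createProducer(requestQueue);
            // Placeholder payload; a real OSM Web Service request over JMS carries a SOAP envelope.
            TextMessage message = session.createTextMessage("placeholder payload");
            producer.send(message);
        } finally {
            connection.close();
        }
    }
}

WebLogic then load balances these lookups and messages across the listed managed servers according to the load balancing schema configured for the cluster, as described in the next section.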

About JMS Load Balancing Schema Options

At the WebLogic Server cluster level, you can select one of three load balancing schemas for JMS load balancing. WebLogic supports only one load balancing schema selection per cluster, even if the cluster hosts multiple applications. The load balancing schema you select affects OSM on both its external and internal messaging interfaces, such as incoming messages from an external system or messages exchanged between managed servers within the cluster.

The following lists the WebLogic Server load balancing options:

  • Round-robin: OSM distributes JMS messages evenly across the managed servers in the cluster by cycling through the list of available managed servers in order. Each server is treated equally.

  • Random-based: Before routing a JMS message, the random-based load balancing schema generates a random number and selects one of the candidate servers as the message destination based on the random number.

  • Weight-based: If the OSM cluster consists of managed servers hosted on systems with varying hardware resource capacity, you can assign load balancing weights to each WebLogic Server instance.

Oracle recommends that you use random-based load balancing. The OSM Installer automatically configures random-based load balancing.
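
If you need to verify or change the schema after installation, the selection is controlled by the cluster's DefaultLoadAlgorithm attribute. The following is a minimal WLST sketch, assuming a hypothetical administration server URL and a cluster named osm_cluster:

    connect('weblogic', '<password>', 't3://adminhost:7001')
    edit()
    startEdit()
    cd('/Clusters/osm_cluster')
    cmo.setDefaultLoadAlgorithm('random')   # other values: 'round-robin', 'weight-based'
    save()
    activate()
    disconnect()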

Understanding Whole Server Migration for High Availability

The OSM automation framework uses WebLogic Java Message Service (JMS) to support messaging between automation components and external systems. In addition, upstream systems can access OSM web services with JMS as one of the transport protocols. Thus, it is critical that JMS be highly available. JMS high availability is achieved by using JMS distributed destinations (see "JMS Distributed Destinations") as well as planning for whole server migration in the event of failure.

WebLogic Server migration is the process of moving a managed server instance elsewhere in the event of failure. In the case of whole server migration, the server instance is migrated to a different physical machine upon failure. Whole server migration is the preferred and recommended approach because all JMS-related services are migrated together.

Note:

JMS service migration is not supported with OSM because using JMS service migration could result in the JMS server not running on the same machine as the managed server to which it is dedicated. This would cause issues similar to those that would arise if server affinity were not configured on the default OSM JMS connection factory.

WebLogic Server provides migratable servers to make JMS and the JTA transaction system highly available. Migratable servers (clustered server instances that migrate to target servers) provide for both automatic and manual migration at the server level, rather than at the service level.

When a migratable server becomes unavailable (for example, if it hangs, loses network connectivity, or its host machine fails), migration is automatic. Upon failure, a migratable server is automatically restarted on the same machine if possible. If the migratable server cannot be restarted on the machine where it failed, it is migrated to another machine. In addition, an administrator can manually initiate migration of a server instance.

The target server for the migration can be a spare server on which Node Manager is running. This server does not participate in the cluster until a migratable server is migrated to it.

Another option for the target is a machine that is already hosting a WebLogic Server instance. In the event of failure, the migratable server is migrated to that machine, and the two instances (which now run on the same machine) compete for CPU, memory, and disk resources. In this case, performance could be impacted.

Before you configure automatic whole server migration, be aware of the following requirements:

  • Ensure that all machines hosting migratable servers are time-synchronized. Although migration works when servers are not time-synchronized, time-synchronized servers are recommended in a clustered environment.

  • To ensure file availability, use a disk that is accessible from all machines. If you cannot share disks between servers, you must ensure that the contents of domain_home/bin are copied to each machine.

  • Ensure that the user account that runs the managed servers can run commands without being prompted for a password.

  • Ensure that the user account that runs the managed servers has execute privileges on the /sbin/ifconfig and /sbin/arping binaries, which are used to create the floating IP address.

  • Use high-availability storage for state data. For highest reliability, use a shared storage solution that is itself highly available; for example, a storage area network (SAN). For more information, see Oracle Fusion Middleware Using Clusters for Oracle WebLogic Server.

  • For capacity planning in a production environment, keep in mind that server startup during migration taxes CPU utilization. You cannot assume that, because a machine can handle a certain number of servers running concurrently, it can also handle that same number of servers starting up on the same machine at the same time.

For additional requirements, see Oracle Fusion Middleware Using Clusters for Oracle WebLogic Server.
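
The following WLST sketch shows, for illustration only, the kind of settings involved in enabling automatic whole server migration: a leasing (migration) basis on the cluster and automatic migration on each managed server. The cluster and server names and the choice of consensus leasing are hypothetical; the complete procedure (Node Manager setup, machines, candidate machines, and floating IP configuration) is described in Oracle Fusion Middleware Using Clusters for Oracle WebLogic Server.

    edit()
    startEdit()
    cd('/Clusters/osm_cluster')
    cmo.setMigrationBasis('consensus')      # or 'database' to use database leasing
    cd('/Servers/osm_ms1')
    cmo.setAutoMigrationEnabled(true)       # repeat for each migratable managed server
    save()
    activate()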

Managing WebLogic Transactions

Transactions are a means to guarantee that database changes are completed accurately. The WebLogic Server transaction manager is designed to recover from system crashes with minimal user intervention and makes every effort to resolve transaction branches with a commit or roll back, even after multiple crashes or crashes during recovery.

To facilitate recovery after a crash, the WebLogic Server Transaction Recovery Service automatically attempts to recover transactions on system startup. On startup, the Transaction Recovery Service parses all transaction log records for incomplete transactions and completes them as described in Oracle Fusion Middleware Programming JTA for Oracle WebLogic Server.

Oracle recommends the following guidelines:

  • If a server crashes and you do not expect to be able to restart it within a reasonable period of time, you can migrate either the whole server or the Transaction Recovery Service to another server in the same cluster. The transaction log records are stored in the default persistent store for the server. If the default persistent store is a file store (which it is by default), it must reside on a shared storage system that is accessible to any potential machine to which a failed migratable server might be migrated. See "Persistent Store: JMS File Store and JDBC Store" for high-availability considerations.

  • Configure server instances using DNS names rather than IP addresses. A server instance is identified by its URL (IP address or DNS name plus the listening port number). Moving the server to a new machine or changing its listening port on the same machine changes the URL and effectively moves the server, so the server identity may no longer match the information stored in the transaction logs. Consequently, any pending transactions stored in the transaction log files would be unrecoverable. Using DNS names is also critical if firewalls are used, to avoid address translation issues.

  • First, attempt to restart a crashed server and allow the Transaction Recovery Service to handle incomplete transactions, rather than moving the server to a new machine. However, if the server exited with a code less than 0, do not attempt to restart it until you have diagnosed the problem. In this case, the server did not terminate in a stable condition; for example, because of an invalid configuration.

    See Oracle Fusion Middleware Programming JTA for Oracle WebLogic Server for more information.

Persistent Store: JMS File Store and JDBC Store

WebLogic's persistent store provides a built-in, high-performance storage solution for WebLogic Server subsystems and services that require persistence. For example, it can store persistent JMS messages or temporarily store messages sent using the Store-and-Forward feature. The persistent store supports persistence to a file-based store or to a JDBC-enabled database. The persistent store is important for OSM because it stores all of the JMS messages from the JMS service.

There is a trade-off between performance and ease of backup when choosing between a JMS file store and a JDBC store:

  • JMS file store provides better performance than JDBC store. However, you cannot perform an online backup of OSM data consistent with the JMS file store. You must first shut down OSM and then back up the JMS file store and the database at the same time. Otherwise, inconsistent message states and database states may result, and the backup cannot be used to restore OSM.

  • The benefit of a JDBC store is that online database backups can obtain consistent snapshots of both OSM data and JMS messages. For more information, see the knowledge article Persistent Store Configuration & Operational Considerations for JMS, SAF & WebLogic tlogs in OSM (Doc ID 2469767.1) on the Oracle support website at: https://support.oracle.com

  • In an Oracle Communications environment where ASAP, IP Service Activator, or UIM is running in the same WebLogic domain as OSM, the JDBC store may yield a more consistent backup strategy across the domain, and this benefit may outweigh performance considerations. However, you cannot take a fully consistent backup of OSM while data remains distributed across both the database and the file system.

To realize high availability for the file store, it should reside on shared disk storage that is itself highly available, for example, a storage area network (SAN).

If you choose JMS file store, Oracle recommends that you configure one custom file store for each managed server.
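
For example, the following WLST sketch creates a custom file store for one managed server on shared storage and assigns it to that server's JMS server. The store, server, path, and JMS server names are hypothetical, and you would repeat the steps for each managed server in the cluster.

    edit()
    startEdit()
    cd('/')
    fs = cmo.createFileStore('osm_ms1_jms_store')
    fs.setDirectory('/shared/osm/stores/osm_ms1')          # shared, highly available storage (for example, a SAN)
    fs.addTarget(getMBean('/Servers/osm_ms1'))
    cd('/JMSServers/osm_jms_server_1')                     # hypothetical JMS server hosted by osm_ms1
    cmo.setPersistentStore(getMBean('/FileStores/osm_ms1_jms_store'))
    save()
    activate()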

Persistent Store: TLog File Store and JDBC Store

Each managed server is associated with a transaction log (TLog) store. For production OSM systems, Oracle recommends replacing the Default Store, which is a file-based store, with a JDBC store. In an Oracle RAC environment, TLog JDBC stores can share a common multi data source configured with load balancing, or a common GridLink data source if this option is licensed in your environment.

For more details about using a JDBC TLog Store, see the chapter about using a JDBC Store in Oracle Fusion Middleware Administering the WebLogic Persistent Store.
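
As a minimal WLST sketch of what the per-server configuration looks like (the managed server name, data source name, and prefix are hypothetical; see the WebLogic persistent store documentation for the complete procedure):

    edit()
    startEdit()
    cd('/Servers/osm_ms1/TransactionLogJDBCStore/osm_ms1')
    cmo.setDataSource(getMBean('/JDBCSystemResources/osm_tlog_ds'))   # shared multi data source or GridLink data source
    cmo.setPrefixName('TLOG_osm_ms1_')                                # keeps each server's TLOG table distinct
    cmo.setEnabled(true)
    save()
    activate()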

Understanding Hardware or Software HTTP and HTTPS Load Balancing Options

Oracle recommends that you use an HTTP Server to load balance HTTP and HTTPS messages for OSM clusters in production environments that require high availability for HTTP messages (for example, for the OSM web clients or for OSM messages over HTTP).

Note:

These recommendations only apply to HTTP and HTTPS messages. JMS messages should be load balanced using the WebLogic Server native support for JMS T3 and T3S load balancing (see "About WebLogic Server JMS T3 and T3S Load Balancing").

For development systems, or if you do not require high availability for HTTP traffic in a production system, you can create a managed server and use it as an HTTP proxy that is managed by the WebLogic Server Administration Console but remains outside of the cluster.

You can also consider a hardware load-balancing solution for load balancing HTTP and HTTPS messages. A hardware load balancer can use any algorithm supported by the hardware, including advanced load-based balancing strategies that monitor the utilization of individual machines. If you choose to use hardware to load balance HTTP and HTTPS sessions, the hardware must support a compatible passive or active cookie persistence mechanism and SSL persistence.

The following lists possible software load-balancer options for OSM HTTP and HTTPS messages:

  • Oracle HTTP Server

  • Oracle WebLogic Server proxy plug-ins for the following standard web server solutions:

    • Oracle iPlanet Web Server (7.0.9 and later)

    • Apache HTTPD 2.2.x

    • Microsoft Internet Information Services 6.0 and 7.0

  • Dedicated software load-balancing solutions like Oracle Traffic Director. Oracle recommends this option for running OSM with Oracle Exalogic and Oracle SuperCluster.

The following lists possible hardware load-balancer options for OSM HTTP and HTTPS messages:

  • F5 Big-IP

  • Cisco ACE

About HTTP and HTTPS Load Balancing and Session ID Configuration

Round-robin load balancing for HTTP and HTTPS messages is the only supported option for software load balancers because WebLogic does not propagate managed server weights externally.

Note:

The WebLogic cluster load balancing schema options (see "About JMS Load Balancing Schema Options") have no effect for load balancing HTTP messages because an HTTP load balancer is outside of the cluster.

When running OSM in a cluster, you must enable the proxy plug-in option at the cluster level rather than at the managed-server level; otherwise, sessions may be dropped and it may not be possible to log in to the OSM web clients.
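
For example, assuming the option referred to is the WebLogic Plug-In Enabled attribute, a minimal WLST sketch that enables it at the cluster level (the cluster name is hypothetical):

    edit()
    startEdit()
    cd('/Clusters/osm_cluster')
    cmo.setWeblogicPluginEnabled(true)
    save()
    activate()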

You must ensure that the chosen HTTP load-balancing solution supports WebLogic session IDs and custom HTTP headers. All WebLogic plug-ins, including the Oracle HTTP Server, support sticky sessions, but they do not support session ID failover if the server to which the session is bound fails.

About Oracle Coherence

Oracle Coherence plays a major role in providing grid services to OSM. OSM employs the Coherence Invocation service as well as local and distributed Coherence caches. The Coherence Invocation service is a feature of the Coherence Application Edition or Grid Edition.

Oracle Coherence must be configured to avoid conflicts in a clustered OSM environment. See the WebLogic documentation for guidelines and best practices.