Oracle Warehouse Management Planning for High Availability and Performance

Introduction

Warehouses and distribution centers are the pivotal points in the supply chain that facilitate the balance between supply and demand. System performance and availability are important considerations in the implementation of a warehouse management system, as any interruption may cause a ripple effect in the supply chain.

This chapter focuses on the system and network design needed to optimize the performance and availability of the Oracle WMS application. The recommendations provided should be tailored to suit specific business requirements and incorporated into the overall deployment plan.

Overview

The goal of any Information Technology (IT) organization is to provide its customers with smooth and uninterrupted access to applications and system resources. It is therefore very important to minimize disruptions due to planned and unplanned downtime.

Because it is unpredictable, unplanned downtime is of primary concern to IT. Software- and hardware-related failures are the most likely to occur and account for 49% of all unplanned downtime. Human error, typically due to lack of training or inadvertent operator error, ranks second, representing about 36% of unplanned downtime.

Disasters, although unlikely, do happen, but they account for only 3% of all unplanned downtime.

Planned downtime is periodic in nature and is necessary to provide good system performance and continuous operation. Regular maintenance activities include backups, upgrades, and data re-organization. The challenge of the IT organization is to minimize planned downtime.

Although traditionally not considered a part of system availability, poor system performance can be regarded as a type of unplanned downtime, in that it has a similar effect in reducing productivity. Here, the lost time is not in a contiguous block, but rather spread out as unproductive waits between user interactions. Often, this can be more frustrating to the user than a total outage. Studies indicate that productivity is inversely related to response time (as response time decreases, productivity increases). Indeed, the productivity of an expert user with poor (5 seconds or more) response time drops to that of a novice user with good (below 1 or 2 seconds) response time. This is particularly true of “execution” type applications such as Oracle Warehouse Management.

Achieving both high availability and good performance usually requires an investment in a robust IT infrastructure. This includes both capital expenditure on hardware and networks and ongoing investment in the management and operation of those resources.

This chapter covers some of the high availability tools that Oracle provides and how they can be applied to provide a reliable warehouse management application. In addition, it makes recommendations for the organization and sizing of the hardware and network resources to deliver good performance.

Causes of Downtime
| Unplanned Downtime | Planned Downtime |
|---|---|
| System failures: power outage, system crash | Routine operations: upgrades, patches |
| Data failures: data corruption | Routine maintenance: backups, defragmentation |
| Human error: deleted data, administrator error | |

Security

Warehouse Management Architecture

Warehouse Management is completely integrated with Oracle ERP and Supply Chain Management applications. It is part of the Oracle E-Business Suite and therefore provides all the functionality and tools of a regular E-Business Suite installation. A typical Oracle Warehouse Management installation consists of the following components:

Warehouse Management Availability

System Failures

System faults and crashes are among the most common causes of unplanned downtime. System faults are the result of hardware failures, power failures, and operating system or server crashes. The amount of disruption incurred depends upon the number of affected users and the speed at which service is restored. High availability systems are designed to recover quickly and automatically from failures, should they occur.

Oracle9i Real Application Clusters provides a highly resilient, fault-tolerant, flexible, and scalable architecture that can virtually eliminate unplanned downtime. An Oracle9i Real Application Cluster consists of a set of nodes (machines) that share a common set of disks. The nodes of the cluster are tightly integrated so that they operate as a cooperating group of servers. In the event of a node failure, users are transparently switched to another node in the cluster. The cluster performs a load balancing function, and nodes can be added without any disruption to the cluster. Applications need not be modified to take advantage of this architecture.

Oracle also provides lower cost alternatives to Real Application Clusters. They focus primarily on fast recovery from system faults rather than on fault tolerance.

Media failures are another major cause of system failures. The use of RAID technology is strongly encouraged to protect against these types of failures. In the event of a disk failure, the RAID storage system automatically re-creates the lost data using its redundancy algorithms. Users of the system are unaware of the failure, although they may experience slower response times.

[Figure: Oracle Real Application Cluster]

Data Failures

It is extremely important to design a solution to protect against and recover from data and media failure. A system or network fault may prevent users from accessing data, but media failures without proper backups can lead to lost data that cannot be recovered. Oracle provides advanced features that guarantee that data will remain intact in the event of a media failure.

Oracle Recovery Manager helps IT managers create proper backup and restore procedures.

In addition, Oracle offers Data Guard, a comprehensive set of data protection and recovery tools.

Human Error

Many studies on system availability point to human error as one of the major causes of unplanned downtime. A recent survey by the Disaster Recovery Journal estimated that some 36% of unplanned downtime is due to human error. Human errors include accidents (e.g. deleting important data), unintended outcomes (e.g. an action that monopolizes system resources), and even sabotage. The real challenge with human error lies in identifying the impact of the error and then taking the fastest route to recovery. The Oracle Database arms administrators with the tools they need to quickly diagnose and recover from human error.

One of the major new features of Oracle Database is Flashback Query. Flashback Query enables administrators to view and reconstruct data that may have been accidentally deleted or changed.

Planned Downtime

In today's environment, businesses are rarely presented with a maintenance window during which no users are affected by the system being unavailable. Planned downtime can be as disruptive for users as unplanned downtime.

As the volume of stored data grows, so does the time required to perform regular maintenance operations. The Oracle Database has been designed so that these routine maintenance operations can be carried out with little or no downtime. The Oracle Recovery Manager can make full and incremental backups of data while the database server is online and users are querying or updating data.

Another important feature is Partitioning. Partitioning of data with the Oracle Database enables administrators to divide large tables into smaller, more manageable chunks without having to change any underlying application code. This allows maintenance tasks to be performed at the partition level, so that the bulk of the data remains unaffected during maintenance.

Users can take advantage of all the features mentioned above to protect their Oracle Warehouse Management instance against unwanted prolonged downtime.

RF Network Availability

The RF network is the network that supports the hand-held devices in the warehouse. Most suppliers of RF networks provide high levels of network redundancy. Oracle strongly recommends redundant RF networks, as disruptions to the network will result in disruptions in the warehouse operation. Please contact your RF hardware vendor (Intermec, Symbol, LXE, and so on) to discuss these capabilities.

Printer Availability

Of all the components listed above, printers are the most likely to experience problems. Backup printers should be available and ready to be brought online if the primary printers malfunction.

Most printer control software solutions provide a simple way to re-assign an IP address to a different printer. The IP address of the malfunctioning printer can be quickly re-assigned to the backup printer with minimal disruption to the warehouse operation.

Effect Of Failures On Warehouse Management Availability

As described above, the Oracle WMS consists of several interrelated components. Failure of any component in this architecture will prevent users from accessing some pieces of the WMS functionality. User access to WMS functionality can be broken down into two parts:

Configuration and Control Functions

These functions involve set-up tasks such as creating organizations, users, rules, and locators, as well as management tasks such as the Materials Workbench and the Control Board. These functions are accessible through the desktop using a standard web browser. Users have access to these functions while the WMS instance and the Wide Area Network are running.

Warehouse Operator Functions

These functions involve all material movement tasks, such as picking and putaway. These functions are only accessible through mobile devices. Users have access to these functions as long as all components are running (the database instance, the MWA Server, the Wide Area Network, and the RF network).

Effects of Failure to Users in WMS
| System Component | Failure Prevention | Effect of Failure on WMS Availability |
|---|---|---|
| Oracle ERP instance | Oracle Real Application Cluster; Oracle Fail-safe | WMS and desktop applications are unavailable. |
| MWA application server | Oracle Real Application Cluster; multiple MWA servers on multiple machines | WMS application is unavailable through mobile devices. Desktop access to applications is available through a browser. |
| Wide Area Network | Redundant Wide Area Network | WMS and desktop applications are unavailable. |
| RF network for mobile devices | Redundant access points and base stations | WMS application is unavailable through mobile devices. Desktop access to applications is available through a browser. |
| Printers | Good maintenance program; backup printers | Unable to print LPNs and shipping documents. |
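
The dependencies described in this section can be summarized as a small lookup. The following Python sketch is illustrative only: the component labels and the function name are assumptions made for this example and are not Oracle identifiers. It encodes the two rules stated above, that desktop functions need the WMS instance and the Wide Area Network, and that mobile functions need every component.

```python
# Hypothetical component labels chosen for this sketch; they are not Oracle names.
DESKTOP_REQUIRES = {"wms_instance", "wan"}
MOBILE_REQUIRES = {"wms_instance", "mwa_server", "wan", "rf_network"}


def available_functions(running):
    """Report which groups of WMS functions are reachable for a set of running components."""
    running = set(running)
    return {
        # Configuration and control functions are used from a desktop browser.
        "configuration_and_control": DESKTOP_REQUIRES <= running,
        # Warehouse operator functions are used from mobile devices.
        "warehouse_operator": MOBILE_REQUIRES <= running,
    }


all_up = {"wms_instance", "mwa_server", "wan", "rf_network"}
print(available_functions(all_up))
print(available_functions(all_up - {"rf_network"}))
```

With the RF network removed, the sketch reports that desktop functions remain available while operator functions do not, which matches the corresponding row of the table.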

Performance

Oracle WMS performance is primarily dependent on the following components of the architecture:

MWA Application Server

The MWA Application Server is written in Java and acts as the middle tier for the hand-held devices. It communicates through Telnet with hand-held devices and displays the screens and data to the users. To provide high availability, the MWA Application Server could be installed in an Oracle Real Application Cluster, or it could be installed on multiple non-clustered nodes. In a non-clustered configuration, Oracle recommends running the MWA server on a separate machine from the WMS database instance. However, these machines should be co-located (housed in the same data center) to provide users with optimal response time.

MWA is shipped with a dispatcher that provides a mechanism for load balancing as well as redundancy. When a user attempts to connect to an MWA server, the Dispatcher routes the connection to an MWA server chosen by a round-robin algorithm.

The dispatcher should be started from the Oracle Concurrent Manager. The Concurrent Manager will automatically re-start the dispatcher in the unlikely event that the Dispatcher experiences any downtime.

The RF devices should be configured to connect to the Dispatcher, NOT to the different instances of the MWA server.
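
Round-robin routing, as described above, hands each new connection to the next server in a fixed rotation. The following Python sketch is a toy model of that idea and not the MWA dispatcher's actual code; the class name and method are hypothetical, and only the server names come from the example later in this chapter.

```python
from itertools import cycle


class RoundRobinDispatcher:
    """Toy model of round-robin routing; not the real MWA dispatcher."""

    def __init__(self, servers):
        if not servers:
            raise ValueError("at least one server is required")
        self._servers = list(servers)
        # cycle() repeats the server list endlessly in the same order.
        self._rotation = cycle(self._servers)

    def route(self):
        """Return the server that should receive the next new connection."""
        return next(self._rotation)


dispatcher = RoundRobinDispatcher(["MWA1", "MWA2", "MWA3"])
assignments = [dispatcher.route() for _ in range(5)]
print(assignments)  # ['MWA1', 'MWA2', 'MWA3', 'MWA1', 'MWA2']
```

Because every device connects to the single dispatcher address, servers can be added to or removed from the rotation without reconfiguring the devices.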

To maintain high performance, the MWA servers should be reset on a periodic basis. This clears memory and resets the Java VM. The reset should happen during warehouse shift changes (every 8 or 12 hours) and should be transparent to end users. For example, in a 3 shift warehouse operation the following reset schedule could be implemented via shell or batch scripts:

  1. At 6:30 a.m. start the MWA dispatcher

  2. At 6:35 a.m. start MWA server instances MWA1, MWA2, and MWA3 for the 7:00 a.m. shift. The dispatcher will load balance all new connections among the 3 MWA server instances.

  3. At 3:45 p.m., gracefully shut down MWA server instances MWA1, MWA2, and MWA3 and start up MWA server instances MWA4, MWA5, and MWA6 for the 4:00 p.m. shift. This will cause the dispatcher to stop sending any new connections to the MWA1, MWA2, and MWA3 instances. As new shift operators connect to the dispatcher, they are routed to the MWA4, MWA5, or MWA6 instance. Once all users logged into MWA1, MWA2, and MWA3 have logged off, these server instances are signaled to shut down.

  4. At 11:45 p.m., gracefully shut down MWA server instances MWA4, MWA5, and MWA6 and start up MWA server instances MWA1, MWA2, and MWA3 for the 12:00 a.m. shift.

  5. At 6:30 a.m. repeat the process.
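
The rotation in the schedule above can be expressed as a lookup from time of day to the group of instances that should be receiving new connections. This is a minimal Python sketch of that logic only. The switch-over times and instance names come from the example, while the function and constant names are hypothetical; it does not start or stop any servers.

```python
from datetime import time

GROUP_A = ("MWA1", "MWA2", "MWA3")
GROUP_B = ("MWA4", "MWA5", "MWA6")

# (start time, group) pairs in chronological order, taken from the example schedule.
SCHEDULE = [
    (time(6, 35), GROUP_A),   # started for the 7:00 a.m. shift
    (time(15, 45), GROUP_B),  # started for the 4:00 p.m. shift
    (time(23, 45), GROUP_A),  # started for the 12:00 a.m. shift
]


def accepting_group(now):
    """Return the instance group that should receive new connections at `now`."""
    # Before the first switch-over of the day, the group started at
    # 11:45 p.m. on the previous evening is still the active one.
    active = SCHEDULE[-1][1]
    for start, group in SCHEDULE:
        if now >= start:
            active = group
    return active


print(accepting_group(time(7, 0)))   # ('MWA1', 'MWA2', 'MWA3')
print(accepting_group(time(16, 0)))  # ('MWA4', 'MWA5', 'MWA6')
```

A scheduled script could use a lookup of this kind to decide which group to start and which group to drain at each shift change.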

Wide Area Network

Today, most WANs provide T1 or fractional-T1 speeds and are highly resilient. Network latencies of 300 milliseconds are very attainable even in international networks. Disruptions are rare and of short duration. Wide Area Networks are typically found in the form of leased lines or private customer networks. Network suppliers can provide a guaranteed quality of service (uptime and speed) based on your needs. Oracle recommends using dedicated leased lines as the primary channel of communication between the data centers and the warehouse.

There are more affordable alternatives to dedicated leased lines and private networks. Telecommunications companies are investing heavily in IP Virtual Private Networks (IP VPNs) to provide small businesses and remote offices with low cost connectivity. Several categories of VPN services are available: dial-up, DSL, Frame Relay, and so on. They are differentiated by the Quality of Service offered by the service provider. Unlike leased Frame Relay or T1 lines, which use dedicated circuits, IP VPNs use the service provider's IP backbone. The network infrastructure is managed by the service provider, and you only need a router at your premises.

VPNs do not typically provide the same quality of service, performance, and predictability as dedicated leased lines. Oracle does not recommend using VPNs as the primary vehicle for data transmission for mission critical applications such as WMS. However, VPNs are a relatively cheap means of providing redundancy to a leased line WAN. You can lease a DSL VPN and use it only as a standby backup to the primary leased line network. In the unlikely event that the leased line experiences problems, critical network traffic can be diverted to the VPN.

Network Traffic

Since most of the network traffic takes place between the MWA server and the Oracle WMS instance, Oracle recommends co-locating the MWA server with the Oracle WMS instance in the data center and accessing the application from the warehouse over a wide area network (WAN). The network traffic between the hand-held devices and the MWA server (at the data center) is much smaller than that between the MWA server and the database. The average size of packets sent from the MWA server to the mobile device is less than 500 bytes; these packets consist of control characters and regular ASCII characters to paint the screen. The average size of packets sent from the mobile device to the MWA server is less than 200 bytes; these packets consist of the data entered or scanned by the user in each of the fields. This volume of traffic is far less than that generated by any equivalent browser-based or traditional client/server application.

As a result, the principal consideration for achieving good performance on the hand-held device is how many packets need to be sent and how long each packet takes to send, rather than packet size. The MWA server and WMS application have been architected to minimize the number of packets generated by the user interaction with the screens. This leaves network latency, rather than total bandwidth, as the principal consideration for good response, though adequate bandwidth is required to eliminate any possible queuing delays.
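
A rough calculation shows why latency dominates for packets of this size. The Python sketch below assumes a simplified model in which delivery time is network latency plus the time needed to put the packet on the wire. The 500-byte packet and 300 ms latency are the figures quoted in this chapter, the T1 rate of 1.544 Mbit/s is the nominal line rate, and the function name is hypothetical.

```python
T1_BITS_PER_SECOND = 1_544_000  # nominal T1 line rate


def delivery_time_ms(packet_bytes, latency_ms, bits_per_second=T1_BITS_PER_SECOND):
    """Simplified estimate: network latency plus serialization time, in milliseconds."""
    serialization_ms = packet_bytes * 8 / bits_per_second * 1000
    return latency_ms + serialization_ms


serialization_only = delivery_time_ms(500, latency_ms=0)
total = delivery_time_ms(500, latency_ms=300)
print(f"serialization: {serialization_only:.2f} ms, total: {total:.2f} ms")
# serialization: 2.59 ms, total: 302.59 ms
```

Under these assumptions, serialization contributes less than 1% of the total, so extra bandwidth helps little, whereas halving the latency roughly halves the delivery time.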

Character vs. Block Mode

The MWA server provides the function of an application server as well as a Telnet server. Handheld devices are configured to act as Telnet clients. Most suppliers provide Handheld devices that can operate in two modes:

Character mode is the default Telnet operation mode. In character mode, any character entered by the user on the terminal (in this case the handheld device) is sent to the Telnet server (in Oracle WMS, this is the MWA server) and echoed back for display on the terminal screen. This results in a large number of packets being sent for each field. For example, if the user enters (or scans) the purchase order number PO-ABC-12345, each of its 12 characters (P, O, -, A...) is sent in a separate packet to the Telnet server and echoed back to the handheld. This results in 3 packets for each character (the character from the handheld, the Echo from the Telnet server, and the Acknowledge from the handheld), for a total of 36 packets. Because user response time depends on the number of packets, this increase in the number of packets in standard Telnet character mode is undesirable.

In contrast, in block mode, all the characters are sent in the same packet to the Telnet server when the user presses the Enter key. For the purchase order above, PO-ABC-12345, this would result in only 3 packets being transmitted (the characters from the handheld, the Echo from the Telnet server and the Acknowledge from the handheld).
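
The packet counts in the two preceding paragraphs follow from a simple rule of 3 packets per exchange. The Python sketch below reproduces that arithmetic; the function names are hypothetical, and the model counts only the data, echo, and acknowledge packets described in the text.

```python
# Per the text: data from the handheld, echo from the server, acknowledge from the handheld.
PACKETS_PER_EXCHANGE = 3


def character_mode_packets(field_value):
    """In character mode each character is sent, echoed, and acknowledged separately."""
    return PACKETS_PER_EXCHANGE * len(field_value)


def block_mode_packets(field_value):
    """In block mode the whole field travels in one exchange when Enter is pressed."""
    return PACKETS_PER_EXCHANGE


po_number = "PO-ABC-12345"
print(character_mode_packets(po_number), block_mode_packets(po_number))  # 36 3
```

The character mode count grows with the length of the field, while the block mode count stays constant in this model.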

Whenever there is significant network latency (over 200 ms), Oracle recommends using handheld devices in Block mode. This greatly improves the user experience.

There are some minor differences from a user perspective when using Block mode as opposed to Character mode:

To illustrate the nature of the traffic between the handheld devices and the MWA server, this document shows the TCP/IP packets generated by a typical Oracle WMS receiving transaction.

The handheld device used in this test is an Intermec-2415 and was configured to operate in both Character and Block Mode. A network analyzer was used to capture packets exchanged between the MWA server and the handheld device.

The three data entry screens shown below are the screens the user interacts with on the handheld device to complete a typical receiving transaction.

The following tables show the number of packets and their distribution according to packet size for both Block mode and Character mode:

Block Mode

| Source Node | Destination Node | Packets | Bytes |
|---|---|---|---|
| Handheld device | MWA server | 62 | 4216 |
| MWA server | Handheld device | 46 | 8406 |

Block Mode - Packets

| Packet Size (bytes) | Number of Packets | % of Total Packets |
|---|---|---|
| 0-64 | 2 | 2 |
| 65-127 | 86 | 79.5 |
| 128-255 | 8 | 7.5 |
| 256-511 | 12 | 11 |
| > 511 | 0 | 0 |

Character Mode

| Source Node | Destination Node | Packets | Bytes |
|---|---|---|---|
| Handheld device | MWA server | 84 | 5712 |
| MWA server | Handheld device | 62 | 9184 |

Character Mode - Packets

| Packet Size (bytes) | Number of Packets | % of Total Packets |
|---|---|---|
| 0-64 | 11 | 7.5 |
| 65-127 | 116 | 79.5 |
| 128-255 | 8 | 5.5 |
| 256-511 | 11 | 7.5 |
| > 511 | 0 | 0 |

For this receiving transaction example, the character mode generates 35% more traffic than block mode (146 vs. 108 packets).
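
The 35% figure can be checked directly from the packet totals in the tables. The Python sketch below repeats the calculation and applies the same formula to the byte totals, a comparison the text does not state explicitly. The helper name is hypothetical.

```python
# Totals from the measurement tables: both directions added together.
block = {"packets": 62 + 46, "bytes": 4216 + 8406}
character = {"packets": 84 + 62, "bytes": 5712 + 9184}


def percent_more(larger, smaller):
    """Percentage by which `larger` exceeds `smaller`."""
    return (larger - smaller) / smaller * 100


packet_overhead = percent_more(character["packets"], block["packets"])
byte_overhead = percent_more(character["bytes"], block["bytes"])
print(f"{packet_overhead:.0f}% more packets, {byte_overhead:.0f}% more bytes")
# 35% more packets, 18% more bytes
```

For this transaction the increase in packet count is larger than the increase in bytes, and packet count is the quantity that matters most when latency dominates response time.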

The difference in the number of packets exchanged increases as the number of characters entered in each data entry field increases. For Oracle WMS, character mode will typically generate between 30% and 90% more traffic.

Network Traffic Priority

Prioritizing traffic improves network latency by giving higher priority to traffic destined for specific IP targets (e.g. an MWA server).

For example, if typical network latency is around 500 milliseconds and varies widely, a high priority can be assigned to the Telnet traffic between the MWA server and the routers that support the mobile devices. In most environments this will provide a consistent network latency that is considerably lower than that of non-prioritized traffic. It is very likely that the latency would drop to about 350 milliseconds. This in turn results in improved response time on the mobile devices.

Network Performance Summary

Minimizing network latency is a key factor for ensuring good performance. A reasonable target is 300 ms or better across the WAN. In situations where network latency is over 200 ms, system administrators can use Block mode and also prioritize the Telnet traffic between the MWA server and the handheld devices.

Summary

Oracle Warehouse Management is a comprehensive warehouse management solution. It is an integral part of the Oracle E-Business Suite and benefits from all the system management features available to the E-Business Suite. Oracle recognizes the need for companies to be “open for business” 24 hours a day, 7 days a week, and has invested heavily in developing technologies that provide high levels of performance and availability. You should carefully consider these features when designing a WMS deployment. Close attention should also be paid to a redundant infrastructure for LANs, WANs, and RF networks to minimize disruptions to the business.