Capacity Planning and Performance Tuning


Capacity Planning Process

The capacity planning process involves several activities. The following sections describe these activities:

Note: The tests described in this guide were conducted in a controlled environment; the numbers presented here may not match the results that you get when you run the tests in your environment. The numbers are meant to illustrate the capacity planning process.

 


Design the WLI Application

The following are some of the performance-related design issues that architects and developers should keep in mind while designing WLI applications:

In general, asynchronous processing provides tremendous value for lengthy processes, while synchronous processing is better suited for tasks that are expected to take less time. A synchronous process that blocks can become a significant bottleneck, causing the server to use many more threads than necessary.

Note: For more information about design considerations that may affect performance, see Best Practices for WebLogic Integration and WLI Tuning.

 


Tune the Environment

Performance of a WLI application depends not just on the design of the application, but also on the environment in which it runs.

The environment includes the WLI server, the database, the operating system and network, and the JVM. All of these components should be tuned appropriately to extract good performance from the system.

 


Prepare the Application for Performance Testing

Minor changes may need to be made to the application for running the performance tests and for invoking the application through load generator scripts.

The extent of change depends on the nature of the application, capability of the load generator, and the outcome that is expected from the capacity planning process.

Following are examples of the changes that may be required:

 


Design the Workload

The quality of the result of any performance test depends on the workload that is used.

Workload is the amount of processing that the system is expected to complete. It consists of certain applications running in the system with a certain number of users connecting to and interacting with the system.

The workload should be designed so that it is as close to the production environment as possible.

The following parameters must be considered while designing the workload:

The next step is to define the unit of work and SLA.

 


Define the Unit of Work and SLA

A Service Level Agreement (SLA) is a contract – between the service provider and service consumer – that defines acceptable (and unacceptable) levels of service. The SLA is typically defined in terms of response time or throughput (transactions per second).

For systems of a synchronous nature, the aim is to tune the system to achieve the highest throughput while meeting the target response time (realistically, for about 95% of the transactions); the response time thus becomes the SLA. For systems of an asynchronous nature, throughput, in messages per second, is the SLA.
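The 95%-of-transactions criterion above can be checked mechanically against measured latencies. The following is a minimal sketch; the 2.0-second threshold and the sample latencies are hypothetical, and a real test harness would collect latencies from the load generator.

```python
import math

def percentile(samples, pct):
    """Nearest-rank percentile: smallest sample >= pct% of all samples."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]

def meets_sla(latencies_sec, sla_sec, pct=95):
    """True if pct% of transactions completed within the SLA response time."""
    return percentile(latencies_sec, pct) <= sla_sec

# Hypothetical per-transaction response times, in seconds.
latencies = [0.8, 1.1, 0.9, 1.4, 2.5, 1.0, 1.2, 0.7, 1.3, 1.6]
print(meets_sla(latencies, sla_sec=2.0))  # the p95 sample here is 2.5 s
```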

For capacity planning purposes, it is important to define the unit of work (that is, the set of activities included in each transaction), before using it to define the SLA.

Consider the purchase order application shown in the following figure.

Figure 2-2 Unit of Work: Purchase Order Application

Unit of Work: Purchase Order Application

Each node is a JPD. All of these JPDs are required for processing the purchase order. In this scenario, the unit of work (transaction) can be defined as either of the following:

It is recommended that the entire flow of business operations, rather than each JPD, be considered a single unit of work.

The next step is to design the load generation script.

 


Design the Load Generation Script

A load generation script is required to load the server with the designed workload while running the tests.

Note: For information about running the tests, see Run Benchmark Tests and Run Scalability Tests.

While writing the load generation script, you should keep the following points in mind:

The load level becomes easier to understand and manage when you limit each simulated user to a single in-flight request. If the rate at which requests are sent is not controlled, requests may continue to arrive faster than the system can process them, leading to issues such as queue overflow.

The following figure depicts a single user sending the next request only after the previous request is processed by the server.

Figure 2-3 Balanced Load Generation Script


With this approach, the arrival rate (load) on the system can be increased by increasing the number of concurrent users, without affecting the system adversely; therefore, the capacity of the system can be measured accurately.
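The balanced, closed-loop behavior described above can be sketched as follows. This is illustrative only: `send_request` is a placeholder for the real, blocking client call (for example, an HTTP request to the WLI process), and the thread-per-user model stands in for whatever your load generator uses.

```python
import threading
import time

def send_request():
    """Placeholder for a real, blocking client call to the server."""
    time.sleep(0.01)
    return "ok"

def user_loop(results, duration_sec):
    """One simulated user: at most one request in flight at a time."""
    deadline = time.monotonic() + duration_sec
    count = 0
    while time.monotonic() < deadline:
        send_request()   # blocks until the server responds
        count += 1       # only then is the next request issued
    results.append(count)

def run_load(concurrent_users, duration_sec):
    """Run the closed-loop workload; return observed throughput (req/sec)."""
    results = []
    threads = [threading.Thread(target=user_loop, args=(results, duration_sec))
               for _ in range(concurrent_users)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sum(results) / duration_sec

tps = run_load(concurrent_users=5, duration_sec=0.2)
print(f"observed throughput: {tps:.0f} req/sec")
```

Because each simulated user waits for its response, raising the arrival rate requires adding users, which is exactly the property that makes the measured capacity trustworthy.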

The following figure depicts a single user sending new requests without waiting for the server to finish processing previous requests.

Figure 2-4 Non-blocking Script With Arrival Rate More Than Throughput


This approach could cause issues such as queue overflow and lead to misinterpretation of capacity.

A balanced load generation script is recommended.

 


Configure the Test Environment

The test environment should be configured as described in this section to ensure that the results of the tests are reliable and not affected by external factors.

 


Run Benchmark Tests

Benchmark tests help in identifying system bottlenecks and tuning the system appropriately.

The tests involve increasing the load on the system, in gradual steps, until the throughput stops increasing.

Note: For the purpose of benchmark tests, load is any aspect of the WLI application under test – number of concurrent users, document size, and so on – that demands system resources.
Note: The load should be increased gradually to ensure that the system has adequate warm-up time.
Note: Benchmark tests are run with no think time and with a single WLI machine.

When the throughput stops increasing, one of the following may have occurred:

The following figure depicts a Mercury LoadRunner ramp-up schedule in which the initial 10 minutes are for warm-up tests with 10 concurrent users. Subsequently, the load is increased at the rate of 10 additional users every 15 minutes.

Figure 2-5 Test Ramp-up Schedule

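A ramp-up schedule like the one in Figure 2-5 can be expressed as a small helper. This is a sketch; the parameter names are illustrative, and the values mirror the schedule described above (10 warm-up minutes at 10 users, then 10 more users every 15 minutes).

```python
def ramp_up_schedule(warmup_min, base_users, step_users, step_min, total_min):
    """Return (minute, users) pairs marking each change in concurrent users."""
    schedule = [(0, base_users)]          # warm-up starts immediately
    minute, users = warmup_min, base_users
    while minute < total_min:
        users += step_users               # add a step of users...
        schedule.append((minute, users))
        minute += step_min                # ...then hold for the step duration
    return schedule

for minute, users in ramp_up_schedule(10, 10, 10, 15, 70):
    print(f"t={minute:3d} min: {users} users")
```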

The following data must be recorded while running the tests:

The following figure shows the result of a benchmark test.

Figure 2-6 Results of Benchmark Test


As users are added, the average TPS rises in a near-linear fashion until the system saturates due to a CPU or input/output constraint; it then levels off and begins to fall. When utilization of one of the hardware resources (in this case, CPU) reaches 100%, the average TPS peaks, and the response time at this point is the optimal result. Response times also rise in a near-linear fashion until the saturation point is reached, and then increase non-linearly.

This pattern of results is typical of a system in which resources are utilized to the maximum.

The next activity in the capacity planning process is to validate the results of the benchmark tests.

Validating the Results Using Little’s Law

Before analyzing the test results, you must validate them using Little’s Law, to identify bottlenecks in the test setup. The test results should not deviate significantly from the result that is obtained when Little’s Law is applied.

The response-time formula for a multi-user system can be derived by using Little's Law. Consider n users with an average think time of z connected to an arbitrary system with response time r. Each user cycles between thinking and waiting-for-response; so the total number of jobs in the meta-system (consisting of users and the computer system) is fixed at n.

n = x (z + r)

r = n/x - z

where n is the load, expressed as the number of users; x is the throughput; r is the average response time; and z is the average think time.

Note: Maintain consistency in defining units. For example, if throughput is expressed in TPS, response time should be expressed in seconds.

Tips for Thorough Validation of Results

Interpreting the Results

While interpreting the results, take care to consider only the steady-state values of the system. Do not include ramp-up and ramp-down time in the performance metrics.

When the throughput saturates, utilization of a resource – CPU, memory, hard disk, or network – must have peaked. If utilization has not peaked for any of the resources, analyze the system for bottlenecks and tune it appropriately.

Tips for Analyzing Bottlenecks and Tuning

If no resource bottlenecks exist at the point when throughput saturates, bottlenecks could exist in the application and system parameters. These bottlenecks could be caused by any of the following:

 


Run Scalability Tests

A system can be considered scalable when adding hardware resources consistently provides a commensurate increase in performance, so that the system can handle increased load without degradation simply by adding resources.

Applications can be scaled horizontally by adding machines and vertically by adding resources (such as CPUs) to the same machine.

Horizontal and Vertical Scaling

The following table compares the relative advantages of horizontal and vertical scaling:

Table 2-1 Relative Advantages of Horizontal and Vertical Scaling

Vertical Scaling (more resources in a single machine):
  • Facilitates easy administration.
  • Improves manageability.
  • Provides more effective interconnection between system resources.

Horizontal Scaling (more machines):
  • Offers high availability.
  • No scalability ceiling.

When an application needs to be scaled, you may opt for horizontal scaling, vertical scaling, or a combination, depending on your requirements.

The following figure shows a comparison between WLI running on a single non-clustered 4-CPU machine (vertical scaling) and on two clustered 2-CPU machines (horizontal scaling).

Figure 2-7 Horizontal and Vertical Scaling


Performance in the horizontal scaling scenario (two 2-CPU machines) is slightly lower than in the vertical scaling scenario (single 4-CPU machine) due to additional load balancing and clustering overhead in the horizontal scaling scenario. However, you can add additional machines to increase the capacity of the horizontally scaled system. This is not possible with a vertically scaled system.

Conducting Scalability Tests

Scalability tests help you find out how the application scales when additional resources are added in the system – horizontally and vertically. This information is useful for estimating the additional hardware resources required for a given scenario.

The scalability test involves increasing the load, in gradual steps, until the SLA is achieved or the target resource utilization is reached, whichever occurs first.

Note: In contrast, benchmark tests involve increasing the load till the throughput stops increasing.

For running scalability tests, the workload should be designed to emulate, as closely as possible, the production scenario. If no human user interaction is necessary and if the process invocations happen programmatically, it is recommended that you use a zero-think-time approach, similar to the approach for benchmark tests.

If the target resource utilization level is reached before the SLA is achieved, additional resources must be added to the system. The additional resources (vertical scaling) or machines (horizontal scaling) must be added in the order 1, 2, 4, 8, and so on.

Note: A minimum of three data points must be used to derive the equation for estimating capacity.

All the data that was recorded while running benchmark tests must be captured while running the scalability test. For more information, see Run Benchmark Tests.

Note: Only the data that is recorded when the resource utilization is closest to the target level must be used to estimate the additional resource requirement.

After running the tests, validate and analyze the results as described for benchmark tests, and then, if required, estimate the additional resource requirement as described in the next section.

 


Estimate Resource Requirement

A capacity plan helps estimate the resources required to meet the current SLA under both current and future loads. To create a capacity plan, you need to build a load model of the system.

The test results provide the data points to create this load model. You can derive an equation for the curve obtained from the test results, and use it to estimate the additional hardware resources that are required. Use techniques such as linear regression and curve fitting to predict the required resources. You can implement these techniques using spreadsheet applications such as Microsoft Excel.

Note: The accuracy of the prediction depends on the correctness of the load model. The load model should be based on each relevant resource for your application such as CPU.
Note: You can also validate the model against the available historical performance data.

The following figure shows the results of a horizontal scalability test.

Figure 2-8 Capacity Estimation: Horizontal Scaling


The graph shows the average number of transactions per second (TPS) at 70% CPU utilization for clusters with varying number of nodes.

For the results of this scalability test, a linear equation is the best fit. For a best-fit curve, R² must approach unity (1).

The equation is y = 12.636x + 4.065, where y is the average TPS and x is the number of nodes.
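Fitting and inverting such a load model can be sketched with ordinary least squares. The data points below are hypothetical and chosen to yield a clean line (y = 12x + 4, close in shape to the equation above); a real plan would use the measured (nodes, TPS) points at the target utilization.

```python
import math

def linear_fit(xs, ys):
    """Ordinary least-squares fit; returns (slope, intercept) of y = m*x + c."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    slope = (sum(x * y for x, y in zip(xs, ys)) - n * mean_x * mean_y) \
            / (sum(x * x for x in xs) - n * mean_x * mean_x)
    return slope, mean_y - slope * mean_x

def nodes_for_target_tps(target_tps, slope, intercept):
    """Invert y = slope*x + intercept; round() guards against float noise."""
    return math.ceil(round((target_tps - intercept) / slope, 6))

# Hypothetical (nodes, average TPS at 70% CPU) points; note that at least
# three data points are needed, as required for deriving the equation.
nodes = [1, 2, 4]
tps = [16.0, 28.0, 52.0]
m, c = linear_fit(nodes, tps)
print(f"y = {m:.3f}x + {c:.3f}")
print(nodes_for_target_tps(100, m, c))  # nodes needed to sustain 100 TPS
```

A spreadsheet trendline (as mentioned above) performs the same regression; the point of the sketch is that once the equation is known, inverting it gives the node count for any target TPS.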

Note: Though adding additional resources horizontally or vertically can result in a higher TPS, this may not be useful if the objective is to achieve a certain response time. In such cases, consider using faster CPUs.

Based on the results of the scalability tests and the tuning that is necessary for achieving the required results, you should configure the application for deployment to the production environment.

Note: If the resources that you decide to purchase for the production environment are not of the same type or model as those used for the scalability tests, you can estimate the resource requirement by using the following formula:

E x T1 / T2

where:
E = estimate derived from the scalability tests
T1 = SPECint rate of the machine on which the test was executed
T2 = SPECint rate of the machine that you want to purchase

This formula is applicable only if the scaling is based on the number of CPUs. The extrapolation using SPECint rates is only an approximation; the capacity planning exercise is best conducted with the same hardware and configuration as the production environment. For more information about SPECint rates, see http://www.spec.org.
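The E x T1 / T2 adjustment can be sketched as follows. The SPECint rates below are placeholders, not published figures; look up real values at http://www.spec.org for the actual hardware involved.

```python
import math

def adjusted_estimate(test_estimate, test_specint_rate, target_specint_rate):
    """E x T1 / T2, rounded up to whole units (e.g., CPUs)."""
    return math.ceil(test_estimate * test_specint_rate / target_specint_rate)

# Hypothetical: the tests indicate 8 CPUs on hardware rated 20; the hardware
# to be purchased is rated 25, so proportionally fewer CPUs should suffice.
print(adjusted_estimate(test_estimate=8,
                        test_specint_rate=20,
                        target_specint_rate=25))  # 8 * 20 / 25 = 6.4 -> 7
```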
