Deployment Guide: Determining System Capacity

Chapter 3 Determining System Capacity

This chapter explains how to measure Netscape Application Server (NAS) performance by determining system capacity. Determining system capacity is critical to a successful deployment of NAS. In this phase of deployment, you gather information about the level of activity that will occur on your server: you determine the number of users, the number of requests, expected response time, and expected application load.

Understanding system capacity is closely linked to understanding how NAS integrates into your current environment, a topic discussed in Chapter 2, "Planning Your Environment." You may find that certain decisions you made earlier are directly linked to system capacity. Read the current chapter carefully, and then, if necessary, revisit the information presented in Chapter 2, "Planning Your Environment," about planning your system topology. You may decide to adjust some of the decisions you made about NAS setup based on system capacity needs.

This chapter contains the following sections:

Establishing System Performance Goals

Developing a Capacity Profile

Developing an Application Profile

Developing a Failover Provision

Establishing System Performance Goals

At some point during the deployment process, decide what your system capacity goals are. Beyond the basic goals of improving response time and maximizing throughput, as explained in Chapter 2, "Planning Your Environment," you should establish specific goals.

To help you plan system capacity, you should develop a profile of your system's performance by determining the following information:

What capacity of requests, or throughput, can the system support?

How many concurrent users can the system support?

What is an acceptable average response time for requests submitted by your users?

What is the average think time between requests?

These factors are interrelated. If you know any three of these four pieces of information, you can always calculate the fourth.

Individual metrics described in this chapter can be calculated using a remote browser emulator (RBE) tool, or web site performance and benchmarking software, that simulates your enterprise's web application activity. Some examples of benchmarking tools include Benchmark Factory and WebBench. Typically, RBE and benchmarking products generate concurrent HTTP requests and then report back the response time and number of requests per minute. You can then use these figures to calculate server activity. For information about running performance tests with the help of benchmarking tools, see Chapter 4, "Performance Testing and Fine-Tuning Your System."

Note that the results of calculations described in the following section, "Developing a Capacity Profile," are not necessarily absolute. Treat them as reference points or benchmarks to work against as you try to fine-tune your server performance.

Developing a Capacity Profile

As your enterprise's NAS deployer, you have the responsibility of understanding your system and all its elements, including the applications you develop. Your system consists of a variety of elements, including the operating system, the network, web servers, back-end data sources, NAS servers, and your applications. The weakest link in this group will always be the bottleneck, so you need to pay attention to all of them when trying to improve capacity. You start by assessing the system's overall capacity and then identify where performance is being compromised.

This section explains how to assess overall system capacity by describing in detail the concepts of maximum number of concurrent users, peak load and peak capacity, steady state, response time, and requests per minute.

Defining Maximum Number of Concurrent Users To better understand capacity, determine the maximum number of concurrent users that your system can handle before performance starts to decline. A user constitutes one running web-browser process that periodically sends requests from a client machine via a web server to NAS. A user is "concurrent" for as long as he or she is on the system as a running web-browser process submitting requests, receiving results of requests from the server, and viewing the results of the requests.

In the following diagram, think of the horizontal line (points x to y) as system activity. A single user is a concurrent user from point A until point E. From points A to B the user's request is being processed by the system.

From points B to C the user is in think time mode, viewing the results of the request and deciding what action to take next. Realistically, a user does not submit requests continuously. A user submits a request, the server receives the request, processes it and then returns a result, at which point the user spends some time analyzing the result before submitting a new request. This time spent reviewing the result of a request is called "think time." The user is not performing any actions against the server, and thus a request is not being generated during think time. However, when you calculate the number of concurrent users, you include think time in the equation.

From points C to D, the system is again processing the request, with D being the point at which the result is returned to the user. From points D to E, the user is again in think time mode. At point E, the user exits the system, after which he or she is no longer a concurrent user.

Determining typical think time length is important because you can use it to calculate more accurately the number of requests per minute and the number of concurrent users your system can support. Essentially, when a user is on the system but not submitting a request, a gap opens for another user to submit a request without altering system load. This also means that you can support more concurrent users.

Eventually, in the course of system activity, as the number of concurrent users submitting requests increases, requests processed per minute begin to decline. The following diagram illustrates this situation:

You want to identify the point at which adding more concurrent users reduces the number of requests that can be processed per minute, as this indicates when performance starts to degrade.

Defining Peak Load and Peak Capacity Peak load refers to the maximum number of concurrent users and requests per minute that you expect the system to support, based on the pattern of activity that typically exists on your system. You may want your system to support a certain number of requests with a certain response time per request. This doesn't mean that your system can in fact handle this load: it might be able to handle the number of users, but only by processing their requests at a slower rate.

If you decide you want your system to handle the maximum number of concurrent users that typically access the system and submit requests, you may have to add server resources to achieve the desired response time. You may, instead, decide to use another design point, steady state load, which is explained on page 61. Whether you choose peak load or steady state load as your system capacity design point depends on several factors, including acceptable response time, all of which are described later in this chapter.

Peak capacity refers to the maximum number of concurrent users that the system can realistically sustain before requests per minute start to decline and response time starts to increase. Peak capacity may be more or less than peak load, and knowing the difference is critical to planning your system. You may find out at first that your system cannot handle the peak load. However, if you use peak load as a design point, you can do things to adjust aspects of your system and bring peak capacity to a level that is acceptable for your enterprise.

In any system, as the number of concurrent users rises, so does the number of requests per minute (or per second). Then, at a certain point, although more users are concurrently on the system submitting requests, the number of requests per minute starts to level off and remain constant, as the following diagram illustrates:

Point A represents the maximum number of concurrent users the system can handle before requests per minute start to level off. Point B represents peak capacity—the maximum number of users the system can support before performance starts to degrade. (Note that peak load is not represented in the diagram because it is not a function of the NAS performance curve, but rather a value defined by your system requirements.) Eventually, requests per minute start to decline, even though the number of concurrent users continues to rise. As the diagram above illustrates, this results in multiple users on the system but fewer requests being processed, a situation referred to as "thrashing." Thrashing occurs when the system is spending more time and resources managing queues and loads than actually processing requests.

Defining Steady State Steady state refers to the point at which the system reaches and maintains steady capacity, or a steady number of requests per minute, while the number of concurrent users continues to increase. In steady-state mode, your system is not processing at peak capacity; it is maintaining a steady capacity and continues to maintain this capacity, even as the number of concurrent users on the system rises.

In the ideal system, capacity levels off with a relatively low number of users, meaning that the system can handle a large number of requests submitted by a small number of users. The system's efficiency is further demonstrated by this large volume of requests per minute remaining at a steady state despite a continued increase in the number of concurrent users.

Here is how the ideal system looks graphically.

There is an initial steep slope upward in the number of requests per minute on the y-axis, to point A number of users on the x-axis. The distance and angle from point A to point B represents steady state: a steady increase in the number of concurrent users, without any change in the number of requests per minute. At point B, which represents peak capacity, 100 percent of system resources are being used, and any additional concurrent users causes requests per minute to decline. Ideally, when thrashing finally does occur at point B, the point at which system resources begin to be less efficient due to the number of users, the decline in requests per minute is gradual, as opposed to steep. Again, this graceful degradation in requests per minute indicates an efficient system: requests per minute decline slowly despite the continued increase in the number of concurrent users. Note that you should not design for any points past point B, as this represents an area of performance beyond capacity limits in which the system may react in an unpredictable manner.

Another way of analyzing steady state is in terms of response time, as you can see in the next graph.

During steady state, despite the added load of concurrent users, response time remains constant. For example, assume that the diagram represents a system in which steady state typically ranges from 0 to 100 concurrent users. Response time remains steady, despite the added load of users on the system. From point A to point B (1000 users), response time increases gradually. But after point B, which represents peak capacity, response time rises more dramatically as peak load is reached.

Designing for Peak Load or Steady State When deciding your capacity goals, look at two factors: peak load and steady state. If your design point is peak load, then you must deploy a system that can sustain the expected maximum load of users and requests without a degradation in response time. This means that your system can handle the extreme cases of expected system load.

If your design point is steady state, then you don't have to deploy a system with all the resources required to handle the sever's expected peak load, or even the peak capacity; however, a system designed to support up to steady state load will have slower response time when peak load occurs.

Importance of Response Time
Remember peak capacity may not be able to meet the demands of the desired peak load. However, even at peak capacity, the response time may be greater than the acceptable response time limit that you get at steady state. If 3 seconds is your limit or threshold, but the system at peak capacity processes requests at a response time greater than 3 seconds, then you must determine which areas of your system you'll adjust so as to reduce response time to the desired 3 second limit.

Frequency and Duration of Peak Load
Another issue that may affect whether or not you want to design for peak load or for steady state is how often your system is expected to handle the peak load. If peak load occurs several times a day or even per week, you may decide that this is enough time to warrant expanding the system's capacity to handle this load. If the system operates at steady state 90 percent of the time, and at peak only 10 percent of the time, then you may decide that you prefer deploying a system designed around steady state load. This means that 10 percent of the time your system's response time will be slower than the other 90 percent of the time. You decide if the frequency or duration of time that the system operates at peak justifies the need to add resources to your system, should this be required to handle peak load.

Determining Average Response Time As explained in Chapter 2, "Planning Your Environment," response time refers to the amount of time it takes for request results to be returned to the user. This figure is affected by a number of factors, including network bandwidth, number of users, number and type of requests submitted, and average think time. In this section, response time refers to mean, or average, response time. Each type of request has its own minimal response time, but when evaluating system performance, analyze based on the average response time of all requests.

The faster the response time, the more requests per minute are being processed. However, as the number of users on your system increases, response time starts to increase as well, even though the number of requests per minute declines due to thrashing, as the following diagram illustrates:

A system performance graph like the one above indicates that at peak capacity, requests per minute are inversely proportional to response time: the sharper the decline in requests per minute, the steeper the increase in response time (represented by the dotted line arrow).

You should always deploy with peak capacity in mind: determine the maximum number of concurrent users your system can support with an acceptable response time. In the above graph, point A represents peak capacity, the point at which requests per minute start to decline. Prior to this, response time calculations are not necessarily accurate because they aren't using peak numbers in the formula. After this point, because of the inversely proportional relationship between requests per minute and response time, you can more accurately calculate response time using the two criteria already discussed in this section: maximum number of users and requests per minute.

To determine response time at peak capacity, use the following formula:

Response time = (concurrent users / requests per minute) - think time

To obtain an accurate response time result, you must always include think time in the equation.

For example, if the following conditions exist:

Maximum number of concurrent users that your system can support at peak capacity equals 5,000

Maximum number of requests the system can process at peak capacity equals 1,000 per second

Average think time equals 3 seconds per request:

Then, the calculation is as follows:

2 seconds response time = (5,000 users / 1,000 requests per second) - 3 seconds think time

After you have calculated your system's response time, particularly at peak capacity, decide what is an acceptable response time for your enterprise. Response time, along with throughput, is one of the factors critical to NAS performance and improving it should be one of your goals. If there is a response time beyond which you do not want to wait, and performance is such that you get response times over that level, then work towards improving your response time or redefine your response time threshold.

Determining Requests Per Minute If you know the number of concurrent users at any given time and the response time of their requests and the average user think time at that time, you can determine server capacity, or requests per minute.

Typically, you start by knowing how many concurrent users are on your system. For example, after running some web site performance software, suppose you have calculated that the average number of concurrent users submitting requests on your online banking web site is 3,000. This is dependent on the number of users who have signed up to be members of your online bank, their banking transaction behavior, the times of the day or week they choose to submit requests, and so on. Therefore, knowing this information means you can use the requests per minute formula described in this section to calculate how many requests per minute your system can handle for this user base.

Then, because requests per minute and response time become inversely proportional at peak capacity, decide if fewer requests per minute are acceptable as a trade-off for better response time, or alternately, if a slower response time is acceptable as a trade-off for more requests per minute. Essentially, you start playing around with the requests per minute and response time thresholds that you will accept as a starting point for fine-tuning system performance. Then you decide which areas of your system you want to adjust.

The requests per second formula is as follows:

requests per second = concurrent users / (response time + think time)

For example, if the following conditions exists:

Concurrent users equals 2,800

Average response time equals 1 second per request

Average think time equals 3 seconds

The calculation is as follows:

700 requests per second = 2,800 / (1+3)

700 requests per second translates to 42,000 requests per minute.

Developing an Application Profile

In determining system capacity, you need to consider not only the user load on your system, but the nature and activity of the applications users will run on Netscape Application Server (NAS).

When a user submits a request against your application or applications that run on NAS, one or more application components are invoked, sometimes triggering other components to be invoked. An application component is an element such as a servlet, JSP, EJB, or AppLogic.

This section contains the following topics:

Complexity of Requests

Application Partitioning Scenarios

Complexity of Requests When users submit requests, the requests trigger application activity of a certain nature. The kind of activity that occurs can affect system performance, depending on the level of processing that takes place as a result of the request. Think of requests as fitting into the following general categories:


Complexity of requests	Characteristics
Lightweight	Generates a browse operation that fetches static information, such as an HTML page.
Mediumweight	Generates a search operation consisting of a read-only database query. The information can be cached so as to increase response time the next time the request is generated.
Heavyweight	Generates a login or logout operation Generates calculations that write or update information on a database Creates new records on a database

Examples of different kinds of application requests are described in Chapter 4, "Performance Testing and Fine-Tuning Your System."

Determine what kinds of application components, and more importantly, what kinds of requests they generate. Perhaps most requests fall into one of the categories listed in the table above, or perhaps you'll develop a combination. Knowing what mix of requests run on your system helps you to calculate the level of activity that will take place on your server.

Application Partitioning Scenarios Application partitioning refers to how you deploy and host applications across servers. Application partitioning affects server performance and should be considered when you make decisions about how to improve throughput and response time.

The following table describes three common ways to host an application on multiple NAS machines:


Hosting configuration	Description
Local	The application is installed on each NAS machine and uses multiple web servers to traffic requests to each server. The NAS machines do not communicate with each other.
Partitioned	Parts of the application are hosted on different NAS machines. Each server knows where the application components of the application are hosted on other servers and forwards requests to the appropriate server.
Distributed for load balancing	Parts or all of the application are duplicated on two or more NAS machines. You can then configure the servers to balance application-request loads.

Hosting Applications Locally on Multiple Servers
Hosting applications locally on multiple servers is the simplest of the three most common server configurations. In this configuration, you deploy the complete application on each NAS machine. If the application is already installed on a NAS machine, you can deploy the application to other servers.

Supporting applications locally on multiple servers means that each server stands alone. That is, the two or more NAS machines in the configuration do not communicate with each other. You must have at least one web server for each NAS machine. The following illustration depicts a local hosting configuration

Local hosting requires that you configure each Web Connector plug-in to forward requests to the appropriate NAS machine.

Hosting Partitioned Applications on Multiple Servers
To partition an application, you must divide up the application components that make up an application. Application components are then hosted by separate NAS machines. Partitioning applications allows each NAS machine to specialize in one type of processing.

For example, servlets responsible primarily for data access are I/O intensive, while servlets responsible for performing calculations are CPU and active-memory intensive. To maximize your application's overall performance, you can partition the application to host these different types of servlets on separate NAS machines. Then you can configure the hardware of the individual servers to support the needs of the application components hosted there.

Keep in mind that if you isolate application components on one machine, you risk losing failover capabilities for that component. For more information about failover planning, see "Developing a Failover Provision," on page 71.

Hosting and Deploying Applications for Load Balancing
Balancing application-request loads, or load balancing, differs from partitioning applications. Load balancing requires that you place one or more copies of an application component on multiple NAS machines rather than simply dividing an application's components among multiple servers (or partitioning the application). You then configure each server, allowing it to find application components on other servers.

When you deploy an application, you must decide if you want to configure the application for load balancing and, if so, how you will configure it. Select the configuration that is most useful for your current scenario; for example, you might have three NAS machines used for testing applications. Your production environment, where users' requests are actually processed, also consists of three NAS machines. Because the application components could be different between the two groups of servers, you do not want to enable application load balancing. Therefore, when you deploy an application from the test servers to the production servers, you should choose only to balance the loads between the destination servers.

Later, should you scale your system to include three more NAS machines in the production group, you can join all the servers in that group when deploying the applications from one of the existing runtime servers to the new servers. The application loads are then balanced between the existing servers and the new servers.

For details about the different application deployment configurations, load balancing, and how to partition applications across servers, see Chapter 12, "Administering Multi-Server Applications," in the Administration Guide.

Developing a Failover Provision

Failover, the ability to continue processing server requests without interruption, is one of the general goals of deployment, as explained in Chapter 1, "Overview of Netscape Application Server Deployment." This feature requires that you configure multiple servers (at least two) and processes to provide failure recovery. Netscape Application Server (NAS) distributes and maintains state and session information across servers that are members of the same cluster, allowing requests to be handled by another server in the cluster in case a server becomes unavailable. Thus, there is no single point of failure for a running application.

Planning Failover Capacity Failover capacity planning means deciding how many additional servers and processes to configure so that in the event of a server or process failure, the system can seamlessly recover data and continue processing. If your system gets overloaded, a process or server failure could result, causing response-time degradation or even total loss of service. Preparing for such an occurrence is critical to a successful deployment process. Every enterprise is unique, and failover requirements vary from site to site. Only you can determine what your capacity requirements are based on the kinds of applications you run and hardware resources you have.

However, as a general rule, to ensure failover, make sure of the following: