Sun Java System Application Server 8.1 2004Q4 Deployment Planning Guide
Chapter 2
Planning Your Environment
Before deploying the Application Server, first determine the performance and availability goals, and then make decisions about the hardware, network, and storage requirements accordingly.
This chapter contains the following sections:
- Establishing Performance Goals
- Planning the Network Configuration to Meet Your Performance Goals
- Planning for Availability
- Design Decisions
Establishing Performance Goals
At its simplest, high performance means maximizing throughput and reducing response time. Beyond these basic goals, the administrator can establish more specific goals by determining the factors described in this section.
Some of the metrics described in this chapter can be calculated using a remote browser emulator (RBE) tool or web site performance and benchmarking software that simulates the enterprise's web application activity. Typically, RBE and benchmarking products generate concurrent HTTP requests and then report back the response time and number of requests per minute. These figures can then be used to calculate server activity.
The results of the calculations described in this chapter are not absolute. Treat them as reference points to work against, as you fine-tune the performance of the Application Server.
This section describes the following topics:
- Estimating Throughput
- Estimating Load on Application Server Instances
- Estimating Load on the HADB
Estimating Throughput
Throughput has a different meaning for application server instances than for the HADB.
A good measure of throughput for Application Server instances is the number of requests processed per minute. A good measure of throughput for the HADB is the number of requests processed per minute by the HADB, together with the session size per request. The session size per request is important because the amount of session data stored varies from request to request.
Estimating Load on Application Server Instances
Consider the following factors, each described below, to estimate the load on application server instances:
- Maximum number of concurrent users
- Think time
- Average response time
- Requests per minute
Calculating Maximum Number of Concurrent Users
A user runs a process (for example, through a web browser) that periodically sends requests from a client machine to the Application Server. When estimating the number of concurrent users, include all users currently active. A user is considered active as long as the session that the user is running is active (that is, the session has neither expired nor terminated).
A user is concurrent for as long as the user is on the system as a running process submitting requests, receiving results of requests from the server, and viewing the results.
Eventually, as the number of concurrent users submitting requests increases, requests processed per minute begins to decline (and the response time begins to increase). The following diagram illustrates this situation.
Figure 2-1 Performance Pattern with Increasing Number of Users
Identify the point at which adding more concurrent users reduces the number of requests that can be processed per minute. This point indicates when performance starts to degrade.
Calculating Think Time
A user does not submit requests continuously. A user submits a request, the server receives and processes the request and then returns a result, and the user then spends some time analyzing the result before submitting a new request. The time spent reviewing the result of a request is called think time.
Determining the typical duration of think time is important. The administrator can use the duration to calculate more accurately the number of requests per minute, as well as the number of concurrent users the system can support. Essentially, when a user is on the system but not submitting a request, a gap opens for another user to submit a request without altering system load. This implies that the system can support more concurrent users.
Calculating Average Response Time
Response time refers to the amount of time it takes for the results of a request to be returned to the user. The response time is affected by a number of factors, including network bandwidth, number of users, number and type of requests submitted, and average think time.
In this section, response time refers to the mean, or average, response time. Each type of request has its own minimal response time. However, when evaluating system performance, base the analysis on the average response time of all requests.
The faster the response time, the more requests per minute are being processed. However, as the number of users on the system increases, the response time starts to increase as well, even though the number of requests per minute declines, as the following diagram illustrates:
Figure 2-2 Response Time with Increasing Number of Users
A system performance graph similar to Figure 2-2 indicates that after a certain point, requests per minute are inversely proportional to response time. The sharper the decline in requests per minute, the steeper the increase in response time (represented by the dotted line arrow).
In Figure 2-2, note the point of the peak load, that is, the point at which requests per minute start to decline. Prior to this point, response time calculations are not necessarily accurate because they do not use peak numbers in the formula. After this point, (because of the inversely proportional relationship between requests per minute and response time), the administrator can more accurately calculate response time using maximum number of users and requests per minute.
To determine response time at peak load, use the following formula:
Response time = (concurrent users / requests per second) - think time in seconds
To obtain an accurate response time result, always include think time in the equation.
Example Calculation of Response Time
For example, if the following conditions exist:
- Maximum number of concurrent users is 5,000
- Number of requests processed per second is 1,000
- Average think time is 3 seconds

then the response time at peak load is:

Response time = (5,000 / 1,000) - 3 = 5 - 3 = 2

Therefore, the response time is 2 seconds.
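As a cross-check, the same arithmetic can be written as a short Java sketch. The figures below are the illustrative values from the example above, not measurements from a real system:

public class ResponseTimeCalc {
    public static void main(String[] args) {
        // Illustrative values from the example above (assumptions, not measurements)
        double concurrentUsers = 5000;
        double requestsPerSecond = 1000;
        double thinkTimeSeconds = 3;

        // Response time = (concurrent users / requests per second) - think time
        double responseTime = (concurrentUsers / requestsPerSecond) - thinkTimeSeconds;
        System.out.println("Response time at peak load: " + responseTime + " seconds"); // prints 2.0
    }
}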
After the system’s response time has been calculated, particularly at peak load, compare it to the acceptable response time for the application. Response time, along with throughput, is one of the main factors critical to the Application Server performance.
Calculating Requests Per Minute
If the number of concurrent users at any given time, the response time of their requests, and the average user think time are known, then the requests per minute can be calculated. Typically, start by estimating the number of concurrent users that are on the system.
For example, after running web site performance software, the administrator concludes that the average number of concurrent users submitting requests on an online banking web site is 3,000. This number depends on the number of users who have signed up as members of the online bank, their banking transaction behavior, the times of the day or week they choose to submit requests, and so on.
Knowing this information enables the administrator to use the requests per minute formula described in this section to calculate how many requests per minute the system can handle for this user base. Since requests per minute and response time become inversely proportional at peak load, decide whether fewer requests per minute are acceptable as a trade-off for better response time, or whether a slower response time is acceptable as a trade-off for more requests per minute.
Essentially, experiment with the requests per minute and response time thresholds that are acceptable as a starting point for fine-tuning system performance. Thereafter, decide which areas of the system require adjustment.
The formula for obtaining the requests per second is as follows:

requests per second = concurrent users / (response time in seconds + think time in seconds)

Example Calculation of Requests per Second
For example, if the following conditions exist:
- Number of concurrent users is 2,800
- Average response time is 1 second
- Average think time is 3 seconds

then the number of requests per second is:

requests per second = 2,800 / (1 + 3) = 700

Therefore, the number of requests per second is 700 and the number of requests per minute is 42,000.
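The same formula can be sketched in Java. The inputs are the illustrative values from this example:

public class RequestRateCalc {
    public static void main(String[] args) {
        // Illustrative values from the example above (assumptions, not measurements)
        double concurrentUsers = 2800;
        double responseTimeSeconds = 1;
        double thinkTimeSeconds = 3;

        // requests per second = concurrent users / (response time + think time)
        double requestsPerSecond = concurrentUsers / (responseTimeSeconds + thinkTimeSeconds);
        System.out.println("Requests per second: " + requestsPerSecond);      // 700.0
        System.out.println("Requests per minute: " + requestsPerSecond * 60); // 42000.0
    }
}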
Estimating Load on the HADB
To calculate the load on the HADB, consider the following factors, each described below:
- Persistence frequency of HTTP sessions and stateful session beans
- Session size and scope per request

For instructions on configuring session persistence, see the Sun Java System Application Server Administration Guide.
Persistence Frequency for HADB
The number of requests per minute received by the HADB depends on the persistence frequency, that is, the frequency at which HTTP session and stateful session bean (SFSB) state is stored in the HADB.
The persistence frequency options are:
- web-method - The session state is stored at the end of each web request, before the response is sent to the client.
- time-based - The session state is stored at a configurable, regular time interval.
Table 2-1 summarizes the advantages and disadvantages of the persistence frequency options.
Session Size and Scope Per Request for HADB
The session size per request depends on the amount of session information stored in the session.
Tip
To improve overall performance, reduce the amount of information in the session, as much as possible.
It is possible to further fine-tune the session size per request through the persistence scope settings. Choose from the following options for HTTP and SFSB session persistence scope:
- session - The entire session state is stored every time the session is accessed.
- modified-session - The entire session state is stored, but only if the session has been modified.
- modified-attribute - Only those attributes are stored that have been modified (inserted, updated, or deleted) since the last time the session was stored.

Table 2-2 summarizes the advantages and disadvantages of the persistence scope options.
In the case of SFSB session persistence, the load on the HADB depends on the number of SFSBs enabled for checkpointing, the number of methods in each bean enabled for checkpointing, and which methods are transactional. Checkpointing generally occurs after any transaction involving the SFSB is completed (even if the transaction rolls back).
For better performance, specify a small subset of methods for checkpointing. The size of the data being checkpointed and the frequency at which checkpointing takes place determine the additional overhead in response time for a given client interaction.
Planning the Network Configuration to Meet Your Performance Goals
When planning how to integrate the Application Server into the network, estimate the bandwidth requirements and plan the network so that it can meet users' performance requirements.
The following topics are covered in this section:
- Estimating Bandwidth Requirements
- Calculating Bandwidth Required
- Estimating Peak Load
- Configuring Subnets
- Choosing Network Cards
- Network Settings for HADB
Estimating Bandwidth Requirements
To decide on the desired size and bandwidth of the network, first determine the network traffic and identify its peak. Check if there is a particular hour, day of the week, or day of the month when overall volume peaks, and then determine the duration of that peak.
Tip
At all times consult network experts at your site about the size and type of network components you are considering.
During peak load times, the number of packets in the network is at its highest level. In general, if you design for peak load, scale your system with the goal of handling 100 percent of peak volume. Bear in mind, however, that any network behaves unpredictably and that despite your scaling efforts, 100 percent of peak volume might not always be handled.
For example, assume that at peak load, five percent of users occasionally do not have immediate Internet access when accessing applications deployed on Application Server. Of that five percent, determine how many users retry access after the first attempt. Again, not all of those users might get through, and of that unsuccessful portion, another percentage will retry. As a result, the peak appears longer because peak use is spread out over time as users continue to attempt access.
To ensure optimal access during times of peak load, start by verifying that the Internet service provider (ISP) has a backbone network connection that can reach an Internet hub without degradation.
Calculating Bandwidth Required
Based on the calculations made in Establishing Performance Goals, determine the additional bandwidth required for deploying the Application Server at your site.
Depending on the method of access (T-1 lines, ISDN, and so on), calculate the amount of increased bandwidth required to handle your estimated load. For example, suppose your site uses T-1 or higher-speed T-3 links for Internet access. Given their bandwidth, estimate how many lines are needed on the network, based on the average number of requests generated per second at your site and the maximum peak load. Calculate these figures using a web site analysis and monitoring tool.
Example Calculation of Bandwidth Required
A single T-1 line can handle 1.544 Mbps. Therefore, a network of four T-1 lines carrying 1.544 Mbps each can handle approximately 6 Mbps of data. Assuming that the average HTML page sent back to a client is 30 kilobytes (KB), this network of four T-1 lines can handle the following traffic per second:
6,176,000 bits per second / 8 bits per byte = 772,000 bytes per second
772,000 bytes per second / 30 KB per page = approximately 25 concurrent client requests for pages per second
At a rate of 25 pages per second, this system can handle 90,000 pages per hour (25 x 60 seconds x 60 minutes), and therefore a maximum of 2,160,000 pages per day, assuming an even load throughout the day. If the maximum peak load is greater than this, increase the bandwidth accordingly.
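A short Java sketch of the same bandwidth arithmetic; the line count, line speed, and average page size are the assumptions used in this example:

public class BandwidthCalc {
    public static void main(String[] args) {
        // Assumptions from the example: four T-1 lines, 30 KB average HTML page
        int lines = 4;
        double bitsPerSecondPerLine = 1_544_000; // one T-1 line: 1.544 Mbps
        double pageSizeBytes = 30 * 1024;        // 30 KB average page

        double bytesPerSecond = (lines * bitsPerSecondPerLine) / 8; // 772,000
        double pagesPerSecond = bytesPerSecond / pageSizeBytes;     // ~25
        double pagesPerDay = pagesPerSecond * 60 * 60 * 24;         // ~2.16 million

        System.out.printf("Pages per second: %.1f%n", pagesPerSecond);
        System.out.printf("Pages per day (even load): %.0f%n", pagesPerDay);
    }
}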
Estimating Peak Load
Having an even load throughout the day is probably not realistic. You need to determine when the peak load occurs, how long it lasts, and what percentage of the total load is the peak load.
Example Calculation of Peak Load
If the peak load lasts for two hours and takes up 30 percent of the total load of 2,160,000 pages, this implies that 648,000 pages must be carried over the T-1 lines during two hours of the day.
Therefore, to accommodate peak load during those two hours, increase the number of T-1 lines according to the following calculations:
648,000 pages/120 minutes = 5,400 pages per minute
5,400 pages per minute/60 seconds = 90 pages per second
If four lines can handle 25 pages per second, then handling approximately four times as many pages requires four times as many lines, in this case 16 lines. These 16 lines can handle the realistic maximum of a 30 percent peak load; the remaining 70 percent of the load can easily be handled throughout the rest of the day by the same lines.
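The peak-load line estimate can be sketched the same way. The figures come from this example, and the rounding to whole four-line blocks mirrors the reasoning above:

public class PeakLoadCalc {
    public static void main(String[] args) {
        // Assumptions from the example: 30% of 2,160,000 daily pages in a two-hour peak
        double peakPages = 648_000;
        double peakSeconds = 2 * 60 * 60;
        double pagesPerSecondPerFourLines = 25; // four T-1 lines handle ~25 pages/second

        double peakPagesPerSecond = peakPages / peakSeconds; // 90
        // Scale the four-line block up until it covers the peak rate
        int blocksNeeded = (int) Math.ceil(peakPagesPerSecond / pagesPerSecondPerFourLines);
        System.out.println("Peak pages per second: " + peakPagesPerSecond);
        System.out.println("T-1 lines needed: " + blocksNeeded * 4); // 16
    }
}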
Configuring Subnets
If the separate tier topology is used, where the application server instances and HADB nodes are on separate tiers, it is possible to achieve a performance improvement by keeping HADB nodes on a separate subnet. This is because HADB uses the User Datagram Protocol (UDP). Using a separate subnet reduces the UDP traffic on the machines outside of that subnet.
Choosing Network Cards
For greater bandwidth and optimal network performance, use at least 100 Mbps Ethernet cards or, preferably, 1 Gbps Ethernet cards between servers hosting the Application Server and the HADB nodes.
Network Settings for HADB
Use the following suggestions to make HADB work optimally in the network:
Planning for Availability
This section contains the following topics:
- Rightsizing Availability
- Using Clusters to Improve Availability
- Adding Redundancy to the System
- Planning Failover Capacity
Rightsizing Availability
When planning availability of systems and applications, you must assess the availability needs of various user groups accessing different applications. For example, external fee-paying users often have higher quality of service (QoS) expectations than internal users or business partners. Thus, it may be more acceptable for an application feature, application, or server to be unavailable to internal users than it is for the same function to be unavailable to paying external customers.
Figure 2-3 illustrates the increasing cost and complexity of protecting against progressively less probable events. At one end of the continuum, a simple load-balanced cluster can tolerate localized application, middleware, and hardware failures. At the other end of the scale, geographically distinct clusters can mitigate major catastrophes affecting an entire data center.
Figure 2-3 Availability vs. Cost & Complexity
To realize a good return on investment, it often makes sense to identify the availability requirements of individual features within an application. For example, it may not be acceptable for an insurance quotation system to be unavailable (potentially turning away new business), but brief unavailability of the account management function (where existing customers can view their current coverage) is unlikely to turn away existing customers.
Using Clusters to Improve Availability
At the most basic level, a cluster is a group of application server instances (often hosted on multiple physical servers) that appear to clients as a single instance. This provides horizontal scalability as well as higher availability than a single instance on a single machine. This basic level of clustering works in conjunction with the Application Server's load balancer plug-in, which accepts HTTP and HTTPS requests and forwards them to one of the application server instances in the cluster. If an instance fails, becomes unavailable (due to network faults), or becomes unresponsive, requests are redirected only to the remaining available instances. The load balancer can also recognize when a failed instance has recovered and redistribute load accordingly. For stateless applications, or applications that involve only low-value, simple user transactions, a simple load-balanced cluster is often all that is required. For stateful, mission-critical applications, consider using the Application Server's HADB (see the section High Availability Database (HADB)).
To perform online upgrades of applications, it is best to group the application server instances into multiple clusters. The Application Server has the ability to quiesce both applications and instances. Quiescence is the ability to take an instance (or group of instances) or a specific application offline in a controlled manner without impacting the users currently being served by the instance or application. As one instance is quiesced, new users are served by the upgraded application on another instance.
Adding Redundancy to the System
One way to achieve high availability is to add hardware and software redundancy to the system. When one unit fails, the redundant unit takes over. This is also referred to as fault tolerance. In general, to achieve high availability, determine and remove every possible point of failure in the system.
Identifying Failure Classes
The level of redundancy is determined by the failure classes (types of failure) that the system needs to tolerate. Some examples of failure classes are:
- System process failure
- Machine (hardware) failure
- Power failure
- Disaster affecting a single building, such as a fire
- Catastrophe affecting an entire geographical area, such as an earthquake

Duplicated system processes tolerate single system process failures, as well as single machine failures. Attaching the mirrored (paired) machines to different power supplies tolerates single power failures. By keeping the mirrored machines in separate buildings, a single-building fire can be tolerated. By keeping them in separate geographical locations, natural catastrophes such as earthquakes can be tolerated.
Using HADB Redundancy Units to Improve Availability
To improve availability, HADB nodes are always used in Data Redundancy Units (DRUs) as explained in Establishing Performance Goals.
Using HADB Spare Nodes to Improve Fault Tolerance
Using spare nodes improves fault tolerance. Although spare nodes are not mandatory, their use is recommended for maximum availability.
Planning Failover Capacity
Failover capacity planning implies deciding how many additional servers and processes you need to add to the Application Server installation so that in the event of a server or process failure, the system can seamlessly recover data and continue processing. If your system gets overloaded, a process or server failure might result, causing response time degradation or even total loss of service. Preparing for such an occurrence is critical to successful deployment.
To maintain capacity, especially at peak loads, add spare machines running Application Server instances to the existing installation. For example, assume a system with two machines running one Application Server instance each. Together, these machines handle a peak load of 300 requests per second. If one of these machines becomes unavailable, the system will be able to handle only 150 requests per second, assuming an even load distribution between the machines. Therefore, half the requests during peak load will not be served.
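The spare-capacity reasoning can be sketched as follows. The peak rate and per-machine capacity come from the example above; the decision to tolerate exactly one machine failure is an assumption made for illustration:

public class FailoverCapacityCalc {
    public static void main(String[] args) {
        // Figures from the example: 300 requests/second peak,
        // each machine handles 150 requests/second
        double peakRequestsPerSecond = 300;
        double capacityPerMachine = 150;
        int toleratedFailures = 1; // assumption: survive one machine failure

        int machinesForPeak = (int) Math.ceil(peakRequestsPerSecond / capacityPerMachine);
        int machinesNeeded = machinesForPeak + toleratedFailures;
        System.out.println("Machines needed to serve peak load after "
                + toleratedFailures + " failure(s): " + machinesNeeded); // 3
    }
}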
Design Decisions
Based on the load on the application server instances, the load on the HADB, and the failover requirements, it is possible to make the following design decisions:
- Number of application server instances required
- Number of HADB nodes and hosts required
- HADB storage capacity required
- Whether to design for peak load or steady state load
Number of Application Server Instances Required
To determine the number of application server instances (hosts) needed, evaluate your environment on the basis of the factors explained in Estimating Load on Application Server Instances. Typically, at least one central processing unit (CPU) is allocated to each application server instance, although each instance can use more than one CPU.
Number of HADB Nodes Required
As a general guideline, plan to have one HADB node for each CPU in the system. For example, use two HADB nodes for a machine that has two CPUs.
Alternatively, use the following procedure to determine the required number of HADB nodes:
- Determine the following parameters:
  - Maximum number of concurrent users, NUSERS
  - Average session size per user, S
  - Maximum transaction rate per user, NTPS (transactions per second)
- Determine the size in gigabytes of the maximum primary data volume, PDV, using the following formula:
PDV = NUSERS * S
- Determine the maximum HADB data transfer rate, R. This reflects the data volume shipped into HADB from the application side. Use the following formula:
R = NUSERS * NTPS * S
- Determine the number of nodes based on data volume considerations, NNODES, using the following formula:
NNODES = PDV / 5 GB
Number of Hosts
Determine the number of hosts based on data transfer requirements. This calculation assumes all hosts have similar hardware configurations and operating systems, and have the necessary resources to accommodate the number of nodes they are supposed to run.
To calculate the number of hosts based on data transfer considerations, follow this procedure:
- Determine the maximum host data transfer rate, RMAX. Determine this value empirically, because it depends on the network and host hardware.
- Updating a data volume V distributed over N hosts causes each host to receive approximately 4V/N of data. The number of hosts needed to accommodate the data transfer rate R (calculated above) is therefore determined using the following formula (a combined sizing sketch follows this procedure):
NHOSTS = 4 * R / RMAX
Each host needs to run at least one node, so if the number of nodes is less than the number of hosts (NNODES < NHOSTS), adjust NNODES to be equal to NHOSTS. If the number of nodes is greater than the number of hosts, (NNODES > NHOSTS), several nodes can be run on the same host.
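The node and host calculations can be combined into one Java sketch. All input values here are hypothetical, RMAX must be measured empirically as noted above, and rounding the node count up to an even number is an assumption based on the mirrored (paired) node design described in this chapter:

public class HadbSizingCalc {
    public static void main(String[] args) {
        // Hypothetical inputs -- substitute values measured for your own system
        double nUsers = 200_000;      // NUSERS: maximum concurrent users
        double sessionSizeGb = 1e-4;  // S: average session size (100 KB expressed in GB)
        double nTps = 0.01;           // NTPS: transactions per second per user
        double rMax = 0.1;            // RMAX: max host transfer rate in GB/s (measure empirically)

        double pdv = nUsers * sessionSizeGb;      // PDV: primary data volume in GB
        double r = nUsers * nTps * sessionSizeGb; // R: HADB data transfer rate in GB/s

        int nNodes = (int) Math.ceil(pdv / 5.0);  // NNODES = PDV / 5 GB
        if (nNodes % 2 != 0) nNodes++;            // assumption: nodes are mirrored in pairs

        int nHosts = (int) Math.ceil(4 * r / rMax); // NHOSTS = 4 * R / RMAX
        if (nNodes < nHosts) nNodes = nHosts;       // each host runs at least one node

        System.out.println("PDV = " + pdv + " GB, NNODES = " + nNodes + ", NHOSTS = " + nHosts);
    }
}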
HADB Storage Capacity Required
The HADB provides near-linear scaling with the addition of more nodes until the network capacity is exceeded. Each node must be configured with storage devices on a dedicated disk or disks. All nodes must have equal space allocated on the storage devices. Make sure that the storage devices are allocated on local disks.
For example, suppose the expected session data is X MB. The HADB replicates the data on mirror nodes, and therefore, 2X MB of storage is needed.
Further, the HADB uses indexes to enable fast access to data. An additional 2X MB is required (for both nodes together) for indexes. This implies that a storage capacity of 4X is required.
Therefore, the expected storage capacity needed by the HADB is four times the expected data volume.
If the design must allow future expansion (adding bigger disks to nodes or adding new nodes to the system) without loss of data from the HADB, the expected storage capacity is eight times the expected data volume. This extra capacity is also required for online upgrades, because the data might need to be refragmented after new nodes are added. In that case, a similar amount (4X) of additional space on the data devices is required, increasing the total storage capacity required to 8X.
Additionally, the HADB uses some disk space for internal purposes.
To summarize, for session data of X MB, the HADB requires 4X MB of device space, or 8X MB if the system must accommodate future expansion and online upgrades without data loss.
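A minimal Java sketch of this storage rule, with a hypothetical session data volume standing in for X:

public class HadbStorageCalc {
    public static void main(String[] args) {
        double sessionDataMb = 512;   // X: hypothetical expected session data volume in MB
        boolean allowExpansion = true;

        // 2X for mirrored data + 2X for indexes = 4X; double to 8X if the design
        // must allow refragmentation after adding nodes or disks
        double storageMb = sessionDataMb * (allowExpansion ? 8 : 4);
        System.out.println("Required HADB device space: " + storageMb + " MB");
    }
}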
If the HADB runs out of device space, client requests to insert or update data are not accepted. However, delete operations are accepted.
Designing for Peak Load or Steady State Load
In a typical deployment, there is a difference between steady state and peak workloads.
If the system is designed around peak load, deploy a system that can sustain the expected maximum load of users and requests without degrading response time. This implies that the system can handle extreme cases of expected system load. If the difference between peak load and steady state load is substantial, designing for peak loads can mean that money is spent on resources that are idle for a significant amount of time.
If the system is designed around a steady state load, then it does not need all the resources required to handle the expected peak load. However, a system designed to support a steady state load has a slower response time when peak load occurs.
How often the system is expected to handle the peak load determines whether to design for peak load or for steady state. If peak load occurs several times a day or week, the administrator might decide to expand capacity to handle it. If the system operates at steady state 90 percent of the time, and at peak only 10 percent of the time, then the administrator might prefer to deploy a system designed around steady state load. This implies that the system's response time will be slower only 10 percent of the time. Decide whether the frequency or duration of time that the system operates at peak justifies the need to add resources to the system.