Deployment Guide: Performance Testing and Fine-Tuning Your System


Request	Description
Browse	Reviews a random group of pages, without involving a Power search. The user randomly goes from one static HTML page to another, by clicking on links. Light operations on the database back-end. Response time should be very fast.
Shopping Cart	Updates a temporary shopping cart record and calculates the total quantity of items in the cart and the total price of the items. Occurs frequently. Databases are not updated. Response time should be fast.
Buy/Place Order	Commits the user's order. Occurs less frequently than a Shopping Cart request. Performs read-write operations to the Order database. Response time should be fast.
Registration	Adds a customer record to the Customer database. Performs a read-write operation. Occurs infrequently, only once per user, compared to other requests such as Buy or Sign In.
Sign in (Log in)	Authenticates the user against the Customer database. Performs a read-only operation on the Customer database. Occurs less frequently than Browse or Shopping Cart requests. Response time should be fast.
Log out	Terminates a user session. Does not affect any database. Response time should be fast.
Search	Goes through the database for a specific item or set of items based on certain search criteria. Performs a read-only database operation. Occurs less frequently than a Shopping Cart request. Does not require as fast a response time as a Shopping Cart or Buy request.

Application Flow The following scenario uses the NAS Online Bookstore application to simulate a single user's interaction with the application. The scenario can be incorporated into a performance test script to test expected application and server activity.

The user enters the URL of the shopping site's home page to navigate to the application.

The user pauses 3 seconds (think time) to view the home page and determine which link or button to click.

The user clicks Sign In to navigate to the Customer Sign In page.

The user pauses 3 seconds (think time) to view the Sign In page.

If the user is already registered, she enters her email address and password, and clicks Sign In.

If she is not yet registered, she does the following:

Clicks Register Here.

Enters the required data on the Registration Page.

Clicks Save.

The user pauses 3 seconds to visually locate the link she will click to search the Book database. (This, plus Step 5—entering email address and password—or Step 6.2—entering the required data on the Registration page—equals think time.)

The user clicks Browse Subjects to navigate to the shopping site's list of book subjects.

The user pauses 3 seconds (think time) to view the Browse Subjects page.

The user clicks a subject name, for example, Literature or Computer, to navigate to the first page of the resulting list.

The user pauses 5 seconds (think time) to view and scroll through the page.

The user clicks the Add to Cart button next to an item on the list to add it to the Shopping Cart and simultaneously navigate to the Shopping Cart page.

The user pauses 3 seconds to view the Shopping Cart page, which displays the total quantity and cost of the items currently in the shopping cart.

The user repeats Steps 7 through 13 three times.

The user clicks Continue Checkout, which navigates to the Checkout page. The checkout page displays a summary of the committed order.

The user pauses 6 seconds (think time) to view the page and ascertain that the ordering information is correct.

The user clicks Logout to exit the application.

This scenario simulates a variety of requests with the following frequency:


Request	Frequency	Steps where request is executed
Browse	7	1, 3, 6.1, 8
Search	4	10
Shopping Cart	4	12
Buy/Place Order	1	15
Registration	1 or 0, depending on user	6.3
Sign in (Log in)	1 or 0, depending on user	5
Log out	1	17

In addition to simulating these requests and their frequency, the application scenario also factors in think time, which affects performance calculations, as explained in Chapter 3, "Determining System Capacity."

Understanding the different requests, when they occur in the application flow, and what parts of the system they affect is important when analyzing test results. If a bottleneck occurs, you can isolate it by removing requests. When you determine which request causes the problem, you'll know what part of the system to address for improving performance.

Running Performance Tests

Assuming that the scenario described in "Application Flow" on page 80 represents a typical application flow, a performance test using this scenario would involve the following tasks:

Prepare your test environment, and make sure it includes the following:

Hardware that mirrors your production environment.

Firewalls that mirror those in your production environment.

A testing tool that predicts and measures system behavior and performance. You want to use a tool that can mimic your enterprise activity by emulating the number of users and requests you expect on your system. After you run the test, such a tool provides statistics and graphs of system behavior to help you analyze performance.

The Netscape Application Server Administrator tool. You can use this tool while running tests to measure NAS performance. For information about how to install Netscape Application Server Administrator, see the Installation Guide. For information about how to measure NAS performance using the Netscape Application Server Administrator tool, see Chapter 3, "Monitoring Server Activity," in the Administration Guide.

For performance test results to be accurate, it is very important that your test environment closely match, if not exactly reproduce, your production environment.

Decide in advance how many users you expect at peak load and at steady state load.

Decide in advance your response time expectations at peak load and at steady state load.

Decide in advance your expectations for requests per minute (or per second) at peak load and at steady state loads.

Know in advance the durations (start and end times) of peak load and steady state load.

Know in advance your user behaviors and the kinds of requests they submit. Your applications determine the mix of requests, based on how they are designed, so know your applications and how your users work with them.

Create a script that automates your application scenario.

The scenario should map as closely to your user behavior as possible, including think times and the typical mix of requests your users submit.

The script should have the ability to simulate multiple users.

Run one or both of the following types of tests, depending on the benchmarking or performance testing tool you use and on what you want to measure:

Peak User Load Test: This test measures how many users your system can handle.

Run the script with the number of users you expect at steady state load as a parameter.

Make sure to vary the iterations so that in half of them, the user performs a Registration request by registering on the application for the first time, while in others, the user performs a Sign In request by signing on to the application.

Increment the number of users over a period of time until you reach the peak load amount. For example, if the initial number of users at steady state is 100, start with that number of simultaneous script iterations and increment this by 20 users every 10 minutes until you reach your peak load of, for example, 300 iterations. Then, continue running the iterations for as long as your peak load typically lasts.

Free Running Threads Test: This test measures how many requests per minute your system can handle, in other words, it validates the system's throughput.

Create an application scenario that does not include think time. It should consist of a variety of raw requests on free-running threads. As soon as a request is submitted, another is submitted immediately following it, and so on, without any think time between requests.

This test is useful because you don't need multiple machines representing multiple clients and you can run the test in less time than a test that includes think time. The normal throughput restrictions found in the peak user load test are eliminated.

Using a web site performance and benchmarking tool, track the following information:

Response time of requests at each phase of the test: What is the response time for requests when there's a load of 100 concurrent users? What is the response time when you increase the number to 120 concurrent users? And so on.

Requests per minute or per second at each phase of the test.

At what point does performance start to degrade? In other words, how many users are on the system concurrently when response time starts to increase and requests per minute start to decrease?

Tracking this information will help you determine peak capacity information; in other words, information about the maximum number of concurrent users your system can realistically sustain, versus what you'd like it to ideally sustain at peak load.

Analyzing Test Results

As explained in Chapter 3, "Determining System Capacity," certain patterns of server behavior are better than others. The benchmarking tools you use to run the performance tests have plotting features for producing graphic representations of test results, or they generate raw data that you can then plot to analyze system performance.

Patterns of System Behavior The following diagram illustrates the patterns you should look for:

In this diagram, the ideal patterns of server behavior, as already explained in Chapter 3, "Determining System Capacity," are lines A and B, or any line that maps as closely as possible to A and B. These two lines represent system activity in which CPU resources reach maximum capacity almost immediately (meaning that any number of users, even the initial small number, can take full advantage of CPU resources), a steady state is maintained while the number of concurrent users increases, and then when peak capacity is reached, performance begins to degrade gradually.

Lines C and D represent activity that is less than optimal. In these cases, as soon as peak capacity is reached, performance declines immediately and at a sharp rate.

Based on the patterns of behavior, you then decide if you are satisfied with the results:

Does the response time for the number of users at a given point in the test cycle meet your response time expectations?

In particular, does the response time for peak capacity meet your expectations?

Does the server handle peak load (the expected maximum number of concurrent users) without a drop in requests per minute?

Is the number of concurrent users acceptable when response time starts to increase and requests per minute start to decrease?

Identifying System Bottlenecks If you are not satisfied with the test results, you must identify where the system bottlenecks are occurring. Examine the various parts of your system to begin isolating the problem.

Analysis Tools
Consider using the analysis tools available with the operating system of the machine where the bottleneck has occurred.

For Solaris machines use the following tools:

mpstat

vmstat

iostat

netstat

proctool

For NT machines use Performance Monitor to look at the resource use of specific components and program processes. This tool creates charts and reports that track the machine's efficiency and identify possible problems.

Possible Bottlenecks

Is the bottleneck occurring on the client side?

Although unlikely, sometimes the bottleneck may be in the client that generates requests to the web server. When you run performance tests, always verify if the script or software that simulates user requests has problems at high loads.

Is the web server performing at near maximum capacity?

Use the operating system tools on the web server machine to determine if a bottleneck exists in the CPU, memory, disk I/O, network I/O, lock usage, and so on. Additionally, look at the web server administration documentation, or consult with your web server system administrator, for information about how to identify performance bottlenecks.

Is the back-end data source the problem?

Use the operating system tools on the back-end data source machine to determine if a bottleneck exists in the CPU, memory, disk I/O, network I/O, lock usage, and so on. For example, if the CPU on the back-end data source is at maximum capacity, while the NAS CPU is, for example, 50 percent idle, then the back-end data source is the bottleneck.

Additionally, look at the back-end data source administration documentation, or consult with your data source system administrator, for information about how to identify performance bottlenecks.

Is the application the source of the bottleneck?

Use an application profiling tool to measure where CPU usage, memory leaks, disk I/O, network I/O, lock usage and other such factors take place in your NAS applications. You may discover that your applications need to be optimized. Refer to the Programmer's Guide for information on how to optimize NAS applications.

Is NAS performing at near maximum capacity?

Use the operating system tools on the NAS machine to determine if a bottleneck exists in the CPU, memory, disk I/O, network I/O, lock usage, and so on. The Netscape Application Server Administrator tool can help monitor several NAS-related statistics and fine-tune NAS. For more information about monitoring and fine-tuning the server, see Chapter 3, "Monitoring Server Activity," in the Administration Guide.

If you determine that the bottleneck exists on NAS, begin by isolating requests that you ran in the tests and fine-tuning different aspects of NAS, according to the information in the following section, "Fine-Tuning Netscape Application Server."

As a general rule, with regard to all the machines that make up your system (the web server, the back-end data source, and NAS), CPU usage should not be at 100 percent of full capacity, but rather at 90 percent. When a machine's CPU is at full capacity, you have no way of knowing by how much the CPU capacity is exceeded. Furthermore, machines tend to underperform when pushed beyond their capacity, because of the increased contention for the CPU resources.

Fine-Tuning Netscape Application Server

If NAS is the bottleneck in your system, you must fine-tune or tweak your server to improve performance. This section provides fine-tuning guidelines for the following NAS areas:

Java Server Engines (KJSs)

C++ Server Engines (KCSs)

Executive Server (KXS)

Sessions and Session Data

Request and Response Objects

Input and Output of Application Components

Sticky Application Components

Load Balancing

Sync Backups

Database Connections

Multiple CPUs

Cluster Sizing

Resizing for Throughput

Web Servers

Java Server Engines (KJSs) Adjusting the number of KJS engines and the number of threads in each can help improve bottlenecks in NAS. As explained in Chapter 3, "Determining System Capacity," each NAS server is made up of several engines, or processes. The Java Server engine (KJS) hosts servlet, JSP, EJB, and AppLogic application components written in Java. You specify the number of KJS engines at installation time and can later adjust this number by using Netscape Application Server Administrator.

Adjusting the Number of Engines Per CPU
The number of KJS engines you configure depends on the nature of your application components, particularly servlets, JSPs and AppLogic components. If most of your application components are CPU intensive, in other words, if they perform a lot of computations that expend CPU time and perform very little I/O operations, you should configure one engine per CPU. So, if your NAS machine has four CPUs, configure four KJS engines.

However, if your application components are I/O intensive, configure between two and four engines per CPU, two being the ideal. Even for long-running application components that are not CPU intensive, two KJS engines per CPU is adequate.

To determine the best configuration for your NAS installation, experiment with these numbers under peak load conditions.

Adjusting the Number of KJS Threads
A KJS engine is a multithreaded process, with each thread running an application component, such as a servlet, an EJB, or an AppLogic component. In some instances, depending on how the application has been designed, an EJB takes up multiple threads.

The range of configurable threads in a KJS engine is 16 to 64. You can adjust the number of available threads for each KJS by using Netscape Application Server Administrator. (Or, if you are familiar with the NAS registry, you can reset the Application Server\\4.0\\CCS0 \\ENG\\engine_number\REQ parameter, where engine_number represents the engine number of the Java Server process. This registry parameter sets threads on a per engine basis, overriding the value in the Application Server\\ 4.0\\CCS0\\REQ parameter, which sets the number of threads for all engines.)

Determining the Number of Threads

Typically, for optimal performance, specify between 32 and 48 threads per engine, depending on the request load you expect for your applications. Since a thread can only be used by one application component at a time, when a component is running in it, the thread becomes busy until the component completes running. Therefore, if your applications use many components, you should configure the number of threads accordingly. Specifying more than 48 threads has its drawbacks: a greater amount of memory is required and a greater amount of thread context switching occurs.

For details about specifying KJS engines during installation, see the Installation Guide. For details about specifying more KJS engines and adjusting the number of threads, see Chapter 7, "Increasing Fault Tolerance and Server Resources," in the Administration Guide.

C++ Server Engines (KCSs) The C++ Server engine (KCS) hosts AppLogic application components written in C++. As with KJS engines, you specify the number of KCS engines at installation time, and can later adjust this number by using Netscape Application Server Administrator. KCS engines also support multiple threads.

For details about specifying KCS engines during installation, see the Installation Guide. For details about specifying more KCS engines and adjusting the number of threads, see Chapter 7, "Increasing Fault Tolerance and Server Resources," in the Administration Guide.

Executive Server (KXS) The Executive Server (KXS) can cause a bottleneck in your NAS configuration. Its role is to accept an application request from the web server, and forward the request to an engine (KJS or KCS) for execution. It also handles the return trip of the request by shuttling the application components back from the KJS and KCS engines to the web server. The KXS does not perform any calculations or heavy-duty operations; it simply controls request traffic. Like KJS and KCS engines, the KXS is a multithreaded process. Each thread handles an application component, such as a servlet, an EJB, or an AppLogic component. When the threads get too busy, throughput is reduced, causing a bottleneck.

Conditions That Affect KXS Threads
The kinds of conditions that make KXS threads busy include:

Large numbers of requests for small application components that spend very little time in the KXS engine. This condition causes the KXS to work harder and more frequently at redirecting requests.

Application component result caching. The KXS must update and search the cache to fulfill requests, so when caching is enabled, it affects KXS performance. The amount of data that can be cached is limited by how much physical memory is available. However, caching large amounts of data degrades performance. Typically, the number of cache entries for any instance of an application component shouldn't be more than 100.

Streaming large amounts of data, such as images and video clips, from the application components. KXS and the Web Connector plug-in performance are affected because all output from an application component must pass through the KXS before reaching the Web Connector plug-in and finally returning to the requesting client.

Huge request and response streams. When the list of name-value pairs sent to and from an application component (such as a servlet) is large, the KXS becomes busy. For more details about request and response objects, see the section "Request and Response Objects" on page 93.

Determining the Number of Threads
The number of request manager threads in the KXS should be equal to or less than the sum of all the threads in all the KJS and KCS engines that the KXS controls. For example, a KXS that controls four KJS engines, with 32 threads per engine, requires 128 or fewer request manager threads. Fewer than 128 request manager threads is adequate, but adding more will not significantly improve performance. To assist in improving NAS performance, make sure to configure enough threads so that the KXS can always accept a request from a web server without delay. However, configuring more threads than those available in your KJS and KCS engines is unnecessary, since the requests from the KXS will have nowhere to be redirected to until a KJS or KCS thread becomes available.

The maximum number of threads you should configure in the KXS is 256; as a rule, configure 128 request manager threads. To adjust the number of available threads in the KXS, use Netscape Application Server Administrator. (Or, if you are familiar with the NAS registry, you can reset the Application Server\\ 4.0\CCS0\\ENG\\0\\REQ parameter in the NAS registry, where 0 represents the engine number of the KXS process unless the engine number has been manually configured to another value. This parameter overrides the value in the CCS0\\REQ parameter, which sets the number of threads for all engines.)

For details about how to adjust the number of KXS threads, see Chapter 7, "Increasing Fault Tolerance and Server Resources," in the Administration Guide.

Sessions and Session Data A session is a grouping of information associated with an application user. The size of a session can affect server performance. If you design an application to use sessions (by calling the getSession() method for servlets, or the CreateSession() method for AppLogics), the information is typically created when the user connects with or logs in to the application. The session is then maintained by NAS until the user exits out of or logs off the application, at which point the session information is destroyed (again, you must explicitly code your application to end the session; this does not happen automatically).

Session Objects
Each session contains a name-value pair list, called the session object, for storing user-specific and interaction-specific information, such as user preferences, security credentials, and shopping cart contents. The name-value pair list is intended to act as a data lookup key, or an index. The session data provides a key that can be used to look up the original data stored elsewhere, such as on a back-end data source. Also, a session is maintained by the Data Synchronization feature of NAS, which provides distributed data synchronization across servers in a cluster.

Limiting the Size of Session Data
Sessions occupy memory and network bandwidth because they must be copied between servers for synchronization purposes. Therefore, keeping sessions small can significantly improve performance when a server participates in data synchronization. Sometimes applications are inadvertently designed in such a way that the session data is treated like a database: the session becomes a channel for pumping large amounts of data to the client. When you design your applications, limit the size of your session data to a few kilobytes, for example 1 to 2 KB, not megabytes. Large numbers of small sessions do not affect performance, but any number of large sessions do.

For information about creating sessions, see Chapter 11, "Creating and Managing User Sessions," in the Programmer's Guide.

Request and Response Objects In addition to the name-value pair list created and maintained by the session, an application component—such as a servlet—has a name-value pair list that is fed into and sent out of it. Every servlet has a name-value pair list, referred to as the request object for application input, and response object for application output. In contrast, a session name-value pair list exists only when you call the getSession() method, for servlets, or the CreateSession() method, for AppLogics. The servlet name-value pair list contains information such as standard HTTP headers—server name, server post, path information, query strings, and so on—and any cookies that were sent by the requesting client, such as a session ID.

Request Objects
The request object is created as soon as the KXS engine receives a request from the Web Connector plug-in by extracting information from the HTTP request. Typically, the request object contains a list of about 15 to 20 name-value entries. The KXS engine essentially deconstructs and rebuilds the request object before sending it to the servlet. Therefore, the length of the list affects performance. Note that the size of a single entry is not as important; a request object with 10 entries, each containing 1000 bytes of data performs better than a request object with a 100 entries each containing 100 bytes of data.

Guidelines for the Name-Value Pair List
Some guidelines for servlet name-value pair lists will help you improve performance:

Keep the number of entries in the list to a minimum.

Even if the name-value pairs in the list are distinct entries, try to bundle them together. NAS does not interpret the data itself when the KXS engine deconstructs the list, so bundle the entries together, using a protocol, and store the bundled entry as a single item in the list. You can then code the application so that it separates the data when it receives the list.

Keep the number of cookies to a minimum.

Every cookie becomes a distinct entry in a name-value pair list. Avoid using too many cookies as a way of reducing the number of entries on the list.

For more information about request and response objects, see Chapter 3, "Controlling Applications with Servlets," in the Programmer's Guide.

Input and Output of Application Components The amount of data that you send to and from your application can affect server performance. Understanding the flow of data can help you avoid some common bottlenecks that affect system performance.

Packet Flow
As the diagram in the section "Planning Firewall Location" on page 25 illustrates, a request is sent from the client (web browser), to the web server over a TCP connection. After further processing inside the Web Connector plug-in, the request is sent to the KXS engine in NAS, again over a TCP connection. The KXS engine performs further processing of the request and then sends it to the KJS or KCS engine, depending on whether your application components are designed in Java or C++.

Any data that the application sends back to the client must first be streamed to the KXS engine, then from there to the Web Connector plug-in, and then from there to the originating web browser client.

Size of Packets
The TCP connection transports the requests as packets of information over network lines, as explained in "Assessing Your Overall Network Configuration" on page 21. Typically, a packet size is about 32 KB, a size that is set by the operating system. Response time will be faster if all the information an application component requires and returns can fit into a single packet. You can also send several packets—for example 256 KB equals 8 packets. However, if you start transferring more than 8—for instance, 10 or more packets—performance will likely degrade as the traffic over the network increases. Given these limitations, try to design your applications so that the size of the HTML page returned to a client in response to a request is no greater than 31 KB. The remaining 1 KB is used by NAS to package HTTP headers, cookies, and so on.

Sticky Application Components When you develop an application component, you can mark it as sticky. This means that it is processed by the same NAS machine or process where it is initially invoked. For example, a servlet called ShopCart is duplicated on two application servers, Server A and Server B, for load balancing. If ShopCart is invoked on Server B by a particular client, all subsequent sticky requests from that same client for that component are processed on Server B only. In other words, the component "sticks" to Server B. This maintains the integrity of state and session information for an application component that does not distribute session information.

Improving Performance with Sticky Application Components
Sticky application components can improve performance because their session data and other application processing data (such as back-end connections), reside locally in a KJS or KCS engine of a particular NAS machine, for a particular requesting client. In other words, if a request for a sticky application component comes from a client that previously submitted a request for a sticky application component, the request is routed to the same engine that handled the earlier request. If the application component is not sticky, the engine has to retrieve the data from the KXS every time the engine processes the component and the KXS has to determine which KJS or KCS to send the request to.

The improvement in performance that you gain from using sticky application components comes at a cost. Sticky application components cannot be dynamically load balanced, because with load balancing the requests are routed to the least loaded server, no matter what its history is of processing application components. So, with sticky application components, servers cannot share the server processing load. If you don't use dynamic load balancing, you risk overloading the sticky server with requests. Furthermore, if the sticky server fails for any reason, then none of the sticky application components selected for that server can execute due to the single point of failure. Decide whether or not to mark an application component as sticky according to the particular application and the access pattern of that application's session data.

For more information about sticky application components, see Chapter 13, "Balancing User-Request Loads," in the Administration Guide.

Load Balancing Load balancing redirects user requests to the server that is best suited to handle them, either because the server is least loaded or because the application component that processes the requests has historically performed best on that particular machine. Load balancing has a direct impact on server performance and on how quickly user requests are processed. In deciding which server to direct requests to, the load-balancing process takes into account factors such as CPU usage, memory thrashing, hard disk access, and so on. It also factors in individual application component statistics, such as last response time, number of times the application component has been executed on a particular NAS machine, and so on.

Broadcasting and Updating NAS Information
For load balancing to be effective, each server involved in the process must have the most current information about all the other servers. This means that information about the factors that affect load balancing must be broadcast to all the NAS machines, and every NAS machine must monitor and update this information to make load-balancing decisions. Broadcasting information too often, results in a high level of network traffic and could slow down response time. However, if the load-balancing information is not calculated and updated frequently, then application components risk not being optimally load balanced because the information NAS uses to make load-balancing decisions is outdated.

When making decisions about load balancing, you face two major dilemmas:

How frequently should a NAS server update its load-balancing information?

How frequently should every NAS installation broadcast its load-balancing information?

Update Interval . A minimum value of 5 seconds and a maximum value of 10 seconds is appropriate in most cases. In general, set the Update Intervals criteria for each server to be twice the response time, under stable conditions, of the most frequently used application component. For example, on a system where the most frequently used application component returns requests in 5 seconds, set the update interval to 10 seconds. Setting it to a more frequent update rate causes the server to do more work and could even alter load-balancing characteristics. Use caution with this calculation; if the response time of a heavily used application component is only 1.5 seconds, do not set the Update Interval to 3 seconds.

Broadcast Interval. As mentioned earlier, broadcasting load-balancing information too frequently will not only increase network traffic, it will also increase the work load of your NAS system as all the servers work to post and gather the information. In general, set the Broadcast Intervals criteria for a server to be twice the value of its Update Interval.

Set the Update Interval and the Broadcast Interval criteria using the Load Balancing tool in Netscape Application Server Administrator.

Monitoring Load-Balancing Information
When you set load-balancing criteria, be patient about the fine-tuning process. Determining the best combination of load balancing criteria takes careful monitoring of your NAS configuration over a period of time, during which you must gather statistics about peak load, your mix of request types, response time averages, bottlenecks, and so on. There is no single load balancing solution for all NAS users, since every system is deployed with different parameters and criteria. As with any aspect of NAS deployment, only you can determine over time the best set of criteria for improving performance of the NAS system deployed at your site.

For more information about load balancing and using Netscape Application Server Administrator to set load-balancing criteria, see Chapter 13, "Balancing User-Request Loads," in the Administration Guide.

Sync Backups As explained in Chapter 2, "Planning Your Environment," deploying a Sync Backup in your NAS cluster can affect performance. Every Sync Backup adds overhead, due to all the state and session data from the Sync Primary that is continuously backed up in realtime to the Sync Backup. This leads to a heavier flow of network traffic as well as memory allocation and deallocation, both of which can degrade performance. When your application server is operating at peak load, the Sync Backup has to work even harder because of the amount of information it has to track. As a general rule, use a Sync Backup only if it's absolutely necessary. For example, if you have a database in which all your server information is stored, you may decide you don't need a Sync Backup.

Whether or not you decide to use a Sync Backup depends on how you balance the tradeoff between your fault tolerance requirements and your performance requirements.

Database Connections An application component can establish a database connection to store and access information to and from the database. The number of connections, which you control when you design your application, to and from the application component can affect performance.

Database connectivity issues to keep in mind include:

The nature of the queries and how long they spend in the database

The speed of the database

The size of the data you are retrieving from the database

The efficiency of the database when handling multiple connections

Setting the Number of Database Connections
Limit the number of database connections to no more than 32. Typically, you can set the number of connections to equal the number of threads per KJS or KCS engine that is processing the particular application component. This guarantees that all the engine threads can communicate with the database concurrently. Note that even if your NAS installation uses more than 32 threads per engine, you should stay within the 32 database connections maximum.

For information about implementing database connectivity in your applications, see the Programmer's Guide, Chapter 8, "Handling Transactions with EJBs" and Chapter 9, "Using JDBC for Database Access," and the Sun Java Database Connectivity (JDBC) API Specification. All specifications are accessible from installdir/nas/docs/index.htm, where installdir is the location in which you installed NAS.

Multiple CPUs As explained in Chapter 2, "Planning Your Environment," when you plan your NAS topology, you must decide how many machines to include in your cluster and, per machine, how many CPUs to implement. These two factors directly affect performance.

Determining the Number of Engines Per CPU
If you follow these recommendations, note that a typical KXS engine can serve approximately 8 KJS or KCS engines, and 16 KJS or KCS engines on an eight CPU machine. For maximum performance, limit the number of KJS and KCS engines on a four CPU machine to 4 or 6. If you intend to exceed this number of engines, do not use more than 2 engines per CPU.

Memory Considerations
The amount of memory on a NAS machine affects its performance. How much memory you should deploy depends on the number of CPUs on that machine and the nature of your applications. Generally, a four CPU machine performs well with a total of 512 MB of memory. This amounts to 128 MB per CPU. The same machine can perform adequately, though not optimally, with at least 64 MB of memory per CPU. However, if your applications require vast amounts of memory because, for example, they allocate a large number of objects, deploy more, such as 256 MB per CPU on a four CPU machine.

Cluster Sizing The number of machines in a NAS cluster also affects system performance. Chapter 2, "Planning Your Environment" describes several different topologies. After you run performance tests, revisit that information and decide if a different topology and cluster size might perhaps improve performance.

Resizing for Throughput Chapter 3, "Determining System Capacity" explains how to plan and measure system performance. After you run performance tests and determine your actual capacity numbers, use the test data to redo the calculations described in Chapter 3, "Determining System Capacity."

Start with the number of concurrent users submitting requests on your system. You can make adjustments to engines and threads per engine based on the number of concurrent requests and the number of CPUs in your system.

Example Sizing Scenario with Lightweight Requests
For example, assume the following scenario:

You have a single NAS machine system with the following configuration

4 CPUs

4 KJS engines (one KJS per CPU)

32 threads per engine

On average, 120 requests are submitted concurrently on your system.

These requests are lightweight (see "Complexity of Requests" on page 68).

The requests should take no more than 2 seconds to complete.

The 120 requests must run on separate threads, which means that there must be approximately 120 free threads to handle the load. In this scenario, there are indeed approximately 120 free threads (32 threads * 4 KJS engines).

Another way of handling this load would be to deploy two NAS machines, each one running two CPUs with two engines per CPU.

Example Sizing Scenario with More Complex Requests
Now, assume that not all your requests are lightweight. Instead, you have a mix of requests, in which 25 percent are mediumweight and heavyweight. Assume that the processing time for these requests is longer, for example 10 to 15 seconds per request. In this case, 30 threads (25 percent of 120) are always busy at any given time. Therefore, if 120 requests are concurrently submitted, only about 98 requests ((32 * 4)-30) are able to find a free thread to run on. The additional requests must wait for a free thread to become available, slowing down response time.

To improve the response time, you should consider adding resources. One option is to increase the number of threads per existing engine, but this might degrade performance because of too many threads on a few already saturated engines. However, if the requests are primarily of a mediumweight level of complexity (see "Complexity of Requests" on page 68), generating a lot of database queries, rather than a heavyweight level of complexity, generating calculations and new records, performance should be adequate. Again, performance depends greatly on the nature and mix of your system's requests and the applications that respond to those requests.

Overloading Engines
Adding too many engines, or too many threads per engine, to fine tune performance is a common situation that you should try to avoid. Often, this causes performance to degrade, because as mentioned above, the CPUs are already saturated (running at 90 to 95 percent of full capacity) and cannot handle the extra load even with the additional resources. The overhead of thread context-switching adds costs that far outweigh the benefits of additional threads. The additional engines and threads don't even have a chance to run, because the CPU is already busy. The only workaround is to add more CPUs or, if possible, deploy more NAS machines on your system.

Anticipating Future Load
Add extra resources, if you can, beyond those you know you'll need to meet peak load requirements. Even if you do not initially add more engines to improve throughput, plan your hardware for the possibility that you will add them in the future as throughput on your server increases, which often happens over time. In other words, deploy multiple CPUs up front in anticipation that you may need to add more engines later. For example, in the scenario described in this section, "Example Sizing Scenario with Lightweight Requests," you may decide to deploy two machines with four CPUs each but only run two engines per machine. This leaves you adequate room to add more engines later, if and when you need them.

Web Servers As explained in Chapter 2, "Planning Your Environment," the number of web servers in your NAS topology can affect performance. After running performance tests, determine if the web server is causing the bottleneck in your overall system.

The web server's role is to accept client requests, package them, send them to the KXS engine in your NAS system, receive the response packets from NAS, and deliver them back to the originating client. The Web Connector plug-in assembles headers (HTTP headers, query strings, cookies, and so on) and sends the packets to the KXS.

Setting the Number of Web Server Threads
Netscape Enterprise Server (NES) can accept approximately 512 concurrent client requests. However, because NES has a smaller back-end thread pool (referred to as the worker threads) to run requests on the Web Connector plug-in, not all of these requests can be concurrently forwarded on to the KXS engine in NAS. You should configure the number of worker threads to be as low as possible, so as to sustain the optimal volume of requests. A high number of threads may allow more requests to be processed concurrently but results in more CPU overhead due to thread context switching.

If your web server is installed on a separate machine from your NAS system, and the CPUs on the web server machine are not operating at full capacity but you notice client requests and responses being queued up, you should increase the number of worker threads. The limit on the number of worker threads is set by the NSCP_THREADPOOL_MAX environment variable in NES. The default value, 100, is usually adequate.

If the CPUs on the web server machine are operating at full capacity and those on the NAS machines are not, then consider adding additional web server machines. Usually, the CPUs on the NAS machine become fully saturated first, but if your applications are returning large amounts of data and a lot of processing is taking place in the Web Connector plug-in, then the Web Connector plug-in could be the bottleneck.

For details about fine-tuning your web server, refer to your web server documentation and consult your web server administrator.

Fine-Tuning Worksheet

The following worksheet summarizes information in the previous section, and helps you track decisions you've made about your NAS configuration. Read the section "Fine-Tuning Netscape Application Server" on page 88 for details about each area of fine-tuning.

This worksheet does not substitute the material covered throughout this Deployment Guide. You should read this guide carefully and use all the information presented when making your deployment decisions.

General System Area
Client that is generating requests

Bottleneck
When running performance tests, the script or tool that generates the requests may have problems with high loads.

Recommendation
Check the script

Your Deployment Decision/Action