Sun Java System Web Proxy Server 4.0.11 Performance Tuning, Sizing, and Scaling Guide

Chapter 2 Tuning Sun Java System Web Proxy Server

This chapter describes specific adjustments you can make that might improve Sun Java System Web Proxy Server performance. It provides an overview of Proxy Server's connection-handling process so that you can better understand the tuning settings.

Note –

Be very careful when tuning your server. Always back up your configuration files before making any changes.

General Tuning Tips

As you tune your server, it is important to remember that your specific environment is unique. The impacts of the suggestions provided in this guide will vary, depending on your specific environment. Ultimately you must rely on your own judgement and observations to select the adjustments that are best for you.

As you work to optimize performance, keep the following guidelines in mind:

Work methodically

As much as possible, make one adjustment at a time. Measure your performance before and after each change, and rescind any change that does not produce a measurable improvement.
Adjust gradually

When adjusting a quantitative parameter, make several changes in succession, rather than trying to make a drastic change all at once. Different systems face different circumstances, and you might pass by your system’s best setting if you change the value too rapidly.
Start fresh

At each major system change, be it a hardware or software upgrade or deployment of a major new application, review all previous adjustments to see whether they still apply. After a Solaris upgrade, you should start over with an unmodified /etc/system file.
Stay informed

Read the Sun Java System Web Proxy Server 4.0.11 Release Notes and the release notes for your operating system whenever you upgrade your system. The release notes often provide updated information about specific adjustments.

Understanding Threads, Processes, and Connections

Before tuning your server, you should understand the connection-handling process in Proxy Server. Request processing threads handle Proxy Server connections. You can configure request processing threads from the Admin console or by editing the configuration file.

Connection-Handling Overview

In Proxy Server, acceptor threads on a listen socket accept connections and put them into a connection queue. Request processing threads in a thread pool then pick up connections from the queue and service the requests.

Figure 2–1 Proxy Server Connection Handling

Connection handling in Proxy Server, showing how a request is transmitted to a request processing thread.

A request is not thread-safe if processing the request requires interaction between a number of threads. A part of the request which is not thread-safe is transferred to a NativePool, which is a collection of threads which can interact with each other. The NativePool processes the request and communicates the request back to the request processing thread.

At startup, the server creates only the number of threads defined by the thread pool minimum threads setting, which by default is set to the number of processors. As the load increases, the server creates more threads. The policy for adding new threads is based on the connection queue state.

Each time a new request is created, the number of requests waiting in the queue, often considered the backlog of connections, is compared to the number of request processing threads already created. If the number of requests is greater than the number of threads, more threads are created.

The process of adding new session threads is strictly limited by the maximum threads value. For more information on maximum threads, see Maximum Threads (Maximum Simultaneous Requests).
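The thread-creation policy described above can be sketched as follows. This is only an illustration of the comparison the text describes, not the server's actual implementation. The function name and the sample numbers are hypothetical.

```python
def should_add_thread(queued_requests: int, current_threads: int, max_threads: int) -> bool:
    """Illustrative policy: grow the pool when the backlog exceeds the
    number of existing request processing threads, but never beyond
    the maximum threads value."""
    if current_threads >= max_threads:
        # The maximum threads value is a hard limit on pool growth.
        return False
    return queued_requests > current_threads


# Hypothetical snapshots of the connection queue and the thread pool.
print(should_add_thread(queued_requests=10, current_threads=4, max_threads=128))     # True
print(should_add_thread(queued_requests=2, current_threads=4, max_threads=128))      # False
print(should_add_thread(queued_requests=500, current_threads=128, max_threads=128))  # False
```

In the last case the backlog is large, but the pool is already at the maximum, so connections stay in the queue until a thread becomes free.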

You can change the settings that affect the number and timeout of threads, processes, and connections in the Admin console.

Low Latency and High Concurrency Modes

The server can run in one of two modes, depending upon the load. It changes modes to accommodate the load most efficiently.

In low latency mode, for keep-alive connections, session threads themselves poll for new requests.
In high concurrency mode, after finishing the request, session threads give the connection to the keep-alive subsystem. In high concurrency mode, the keep-alive subsystem polls for new requests for all keep-alive connections.

When the server is started, it starts in low latency mode. When the load increases, the server moves to high concurrency mode. The decision to move from low latency mode to high concurrency mode and back again is made by the server, based on connection queue length, average total sessions, average idle sessions, and currently active and idle sessions.

Disabled Thread Pools

If a thread pool is disabled, no threads are created in the pool, no connection queue is created, and no keep-alive threads are created. When the thread pool is disabled, the acceptor threads themselves process the request.

Connection-Handling `magnus.conf` Directives for NSAPI

In addition to the settings discussed above, you can edit the following directives in the magnus.conf file to configure additional request-processing settings for NSAPI plug-ins:

KernelThreads – Determines whether NSAPI plug-ins always run on kernel-scheduled threads (Windows only)
TerminateTimeout – Determines the maximum amount of time to wait for NSAPI plug-ins to finish processing requests when the server is shut down

For detailed information about these directives, see the Sun Java System Web Proxy Server 4.0.11 Configuration File Reference.

Custom Thread Pools

By default, the connection queue sends requests to the default thread pool. However, you can also create your own thread pools in magnus.conf using a thread pool Init function. These custom thread pools are used for executing NSAPI Service Application Functions (SAFs), not entire requests.

If the SAF requires the use of a custom thread pool, the current request processing thread queues the request, waits until the other thread from the custom thread pool completes the SAF, then the request processing thread completes the rest of the request.

For example, the obj.conf file contains the following:

NameTrans fn="assign-name" from="/testmod" name="testmod" pool="my-custom-pool"
...
<Object name="testmod">
ObjectType fn="force-type" type="magnus-internal/testmod"
Service method=(GET|HEAD|POST) type="magnus-internal/testmod"
fn="testmod_service" pool="my-custom-pool2"
</Object>

In this example, the request is processed as follows:

The request processing thread, referred to as A1 in this example, picks up the request and executes the steps before the NameTrans directive.
If the URI starts with /testmod, the A1 thread queues the request to the my-custom-pool queue. The A1 thread waits.
A different thread in my-custom-pool, called the B1 thread in this example, picks up the request queued by A1. B1 completes the request and returns to the wait stage.
The A1 thread wakes up and continues processing the request. It executes the ObjectType SAF and moves on to the Service function.
Because the Service function must be processed by a thread in my-custom-pool2, the A1 thread queues the request to my-custom-pool2.
A different thread in my-custom-pool2, called C1 in this example, picks up the queued request. C1 completes the request and returns to the wait stage.
The A1 thread wakes up and continues processing the request.

In this example, three threads, A1, B1, and C1, work to complete the request.

Additional thread pools are a way to run thread-unsafe plug-ins. By defining a pool with a maximum number of threads set to 1, only one request is allowed into the specified service function. In the previous example, if testmod_service is not thread-safe, it must be executed by a single thread. If you create a single thread in the my-custom-pool2, the SAF works in a multi-threaded Proxy Server.

For more information on defining thread pools, see thread-pool-init in Sun Java System Web Proxy Server 4.0.11 Configuration File Reference.

Native Thread Pool

On Windows, the native thread pool (NativePool) is used internally by the server to execute NSAPI functions that require a native thread for execution.

Proxy Server uses Netscape Portable Runtime (NSPR), which is an underlying portability layer providing access to the host OS services. This layer provides abstractions for threads that are not always the same as those for the OS-provided threads. These non-native threads have lower scheduling overhead, so their use improves performance. However, these threads are sensitive to blocking calls to the OS, such as I/O calls. To make it easier to write NSAPI extensions that can make use of blocking calls, the server keeps a pool of threads that safely support blocking calls. These threads are usually native OS threads. During request processing, any NSAPI function that is not marked as being safe for execution on a non-native thread is scheduled for execution on one of the threads in the native thread pool.

If you have written your own NSAPI plug-ins such as NameTrans, Service, or PathCheck functions, these execute by default on a thread from the native thread pool. If your plug-in makes use of the NSAPI functions for I/O exclusively or does not use the NSAPI I/O functions at all, then it can execute on a non-native thread. For this to happen, the function must be loaded with a NativeThread="no" option, indicating that it does not require a native thread.

For example, add the following to the load-modules Init line in the obj.conf file:

Init funcs="pcheck_uri_clean_fixed_init" shlib="C:/Sun/proxyserver40/lib/custom.dll" 
fn="load-modules" NativeThread="no"

The NativeThread flag affects all functions in the funcs list, so if you have more than one function in a library, but only some of them use native threads, use separate Init lines. If you set NativeThread to yes, the thread maps directly to an OS thread.

For information on the load-modules function, see load-modules in Sun Java System Web Proxy Server 4.0.11 Configuration File Reference.

Process Modes

You can run Sun Java System Web Proxy Server in one of the following modes: single-process mode or multi-process mode.

Note –

Multi-process mode is deprecated for Java technology-enabled servers. Most applications are now multi-threaded, and multi-process mode is usually not needed. However, multi-process mode can significantly improve overall server throughput for NSAPI applications that do not implement fine-grained locking.

Single-Process Mode

In the single-process mode, the server receives requests from web clients to a single process. Inside the single server process, acceptor threads are running that are waiting for new requests to arrive. When a request arrives, an acceptor thread accepts the connection and puts the request into the connection queue. A request processing thread picks up the request from the connection queue and handles the request.

Because the server is multi-threaded, all NSAPI extensions written to the server must be thread-safe. This means that if the NSAPI extension uses a global resource, like a shared reference to a file or global variable, then the use of that resource must be synchronized so that only one thread accesses it at a time. All plug-ins provided with the Proxy Server are thread-safe and thread-aware, providing good scalability and concurrency. However, your legacy applications might be single-threaded. When the server runs such an application, it can execute only one request at a time. This leads to server performance problems under load. Unfortunately, in the single-process design, there is no real workaround.

Multi-Process Mode

You can configure the server to handle requests using multiple processes with multiple threads in each process. This flexibility provides optimal performance for sites using threads, and also provides backward compatibility to sites running legacy applications that are not ready to run in a threaded environment. Because applications on Windows generally already take advantage of multi-thread considerations, this feature applies to UNIX and Linux platforms.

The advantage of multiple processes is that legacy applications that are not thread-aware or thread-safe can be run more effectively in Sun Java System Web Proxy Server. However, because all of the Sun Java System extensions are built to support a single-process threaded environment, they might not run in the multi-process mode. The Search plug-ins fail on startup if the server is in multi-process mode, and if session replication is enabled, the server will fail to start in multi-process mode.

In the multi-process mode, the server spawns multiple server processes at startup. Depending on the configuration, each process contains one or more threads that receive incoming requests. Since each process is completely independent, each one has its own copies of global variables, caches, and other resources. Using multiple processes requires more resources from your system. Also, if you try to install an application that requires shared state, it has to synchronize that state across multiple processes. NSAPI provides no helper functions for implementing cross-process synchronization.

When you specify a MaxProcs value greater than 1, the server relies on the operating system to distribute connections among multiple server processes (see MaxProcs (UNIX/Linux) for information about the MaxProcs directive). However, many modern operating systems do not distribute connections evenly, particularly when there are a small number of concurrent connections.

Because Sun Java System Web Proxy Server cannot guarantee that load is distributed evenly among server processes, you might encounter performance problems if you set Maximum Threads to 1 and MaxProcs greater than 1 to accommodate a legacy application that is not thread-safe. The problem is especially pronounced if the legacy application takes a long time to respond to requests, for example, when the legacy application contacts a back-end database. In this scenario, it might be preferable to use the default value for Maximum Threads and serialize access to the legacy application using thread pools. For more information about creating a thread pool, see thread-pool-init in Sun Java System Web Proxy Server 4.0.11 Configuration File Reference.

If you are not running any NSAPI in your server, you should use the default settings: one process and many threads. If you are running an application that is not scalable in a threaded environment, you should use a few processes and many threads, for example, 4 or 8 processes and 128 or 512 threads per process.

`MaxProcs` (UNIX/Linux)

To run a UNIX or Linux server in multi-process mode, set the MaxProcs directive to a value that is greater than 1. Multi-process mode might provide higher scalability on multi-processor machines and improve the overall server throughput on large systems such as the Sun Fire™ T2000 server. If you set the value to less than 1, it is ignored and the default value of 1 is used.

You can set the value for MaxProcs by editing the MaxProcs parameter in magnus.conf.

Note –

You will receive duplicate startup messages when running your server in MaxProcs mode.

Using Monitoring Data to Tune Your Server

This section describes the performance information available through the Admin console, perfdump, and stats-xml. It discusses how to analyze that information and tune parameters to improve your server’s performance.

Proxy Server automatically selects many server defaults based on the system resources. The number of acceptor threads and keep-alive threads defaults to the number of CPUs. The server/thread-pool/max-threads defaults to the greater of 128 or the number of CPUs. The server/thread-pool/min-threads defaults to the lesser of the value of server/thread-pool/max-threads or the number of CPUs. The server/access-log-buffer/max-buffers-per-file defaults to the number of CPUs. The server configures the connection queue size, maximum number of keep-alive connections, and the maximum number of open files in the file cache, based on the total number of available file descriptors in the system. The values for these are obtained from the server log file when the log level is set to fine. All the server-chosen defaults are tunable.
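As a rough illustration of the default-selection rules in the preceding paragraph, the following sketch derives the CPU-based defaults for a given processor count. The function and dictionary key names are hypothetical. The defaults that depend on file descriptors are omitted because the text does not give a formula for them.

```python
def cpu_based_defaults(cpus: int) -> dict:
    """Derive the CPU-based defaults described in the text.
    Key names are illustrative, not configuration element names."""
    max_threads = max(128, cpus)          # greater of 128 or the number of CPUs
    min_threads = min(max_threads, cpus)  # lesser of max-threads or the number of CPUs
    return {
        "acceptor_threads": cpus,
        "keep_alive_threads": cpus,
        "max_threads": max_threads,
        "min_threads": min_threads,
        "max_buffers_per_file": cpus,
    }


# A 4-CPU machine keeps the floor of 128 maximum threads.
print(cpu_based_defaults(4))
# A hypothetical 256-CPU machine raises both limits to the CPU count.
print(cpu_based_defaults(256))
```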

The default tuning parameters are appropriate for all sites except those with very high volume. The only settings that large sites might regularly need to change are the thread pool and keep alive settings. Tune these settings at the configuration level in the Admin console or using wadm commands. It is also possible to tune the server by editing the elements directly in the server.xml file, but editing the server.xml file directly can lead to complications.

perfdump monitors statistics in the categories described in the following sections. In most cases these statistics are also displayed in the Admin console, command-line interface, and stats-xml output. The following sections contain tuning information for all these categories, regardless of which method you use to monitor the data.

Connection Queue Information

In Proxy Server, a connection is first accepted by acceptor threads associated with the HTTP listener. The acceptor threads accept the connection and put it into the connection queue. Then, request processing threads take the connection in the connection queue and process the request. For more information, see Connection-Handling Overview.

Connection queue information shows the number of sessions in the connection queue, and the average delay before the connection is accepted by the request processing thread.

The following is an example of how these statistics are displayed in perfdump:

ConnectionQueue:
-----------------------------------------
Current/Peak/Limit Queue Length            0/1853/160032
Total Connections Queued                   11222922
Average Queue Length (1, 5, 15 minutes)    90.35, 89.64, 54.02
Average Queueing Delay                     4.80 milliseconds

The following table shows the information displayed in the Admin Console when accessing monitoring information for the server instance:

Table 2–1 Connection Queue Statistics


Present Number of Connections Queued	0
Total Number of Connections Queued	11222922
Average Connections Over Last 1 Minute	90.35
Average Connections Over Last 5 Minutes	89.64
Average Connections Over Last 15 Minutes	54.02
Maximum Queue Size	160032
Peak Queue Size	1853
Number of Connections Overflowed	0
Ticks Spent	5389284274
Total Number of Connections Added	425723

Current/Peak/Limit Queue Length

Current/Peak/Limit queue length shows, in order:

The number of connections currently in the queue.
The largest number of connections that have been in the queue simultaneously.
The maximum size of the connection queue. This number is:

Maximum Queue Size = Thread Pool Queue Size + Maximum Threads + Keep-Alive Queue Size

Once the connection queue is full, new connections are dropped.

Tuning

If the peak queue length, also known as the peak queue size, is close to the limit, you can increase the maximum connection queue size to avoid dropping connections under heavy load.
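The following sketch shows the queue-limit formula and a simple headroom check. The three component values passed to the formula are hypothetical, because the sample output does not show how its limit of 160032 is composed. The peak and limit values used in the ratio come from the sample perfdump output. How close is "close to the limit" is a judgment call that the text does not quantify.

```python
def max_queue_size(thread_pool_queue_size: int, max_threads: int, keep_alive_queue_size: int) -> int:
    """Maximum Queue Size as defined in this section."""
    return thread_pool_queue_size + max_threads + keep_alive_queue_size


def peak_utilization(peak: int, limit: int) -> float:
    """Fraction of the queue limit reached at peak."""
    return peak / limit


# Hypothetical component values, chosen only to show the arithmetic.
print(max_queue_size(4096, 128, 256))  # 4480

# Peak and limit taken from the sample perfdump output.
print(f"{peak_utilization(1853, 160032):.1%}")  # 1.2%
```

In the sample output the peak is a small fraction of the limit, so the queue size would not need to be raised.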

Total Connections Queued

Total Connections Queued is the total number of times a connection has been queued. This number includes newly-accepted connections and connections from the keep-alive system.

This setting is not tunable.

Average Queue Length

The Average Queue Length shows the average number of connections in the queue over the most recent one-minute, five-minute, and 15-minute intervals.

This setting is not tunable.

Average Queuing Delay

The Average Queueing Delay is the average amount of time a connection spends in the connection queue. This represents the delay between when a request connection is accepted by the server and when a request processing thread begins servicing the request. It is the Ticks Spent divided by the Total Connections Queued, and converted to milliseconds.

This setting is not tunable.

Ticks Spent

A tick is a system-dependent unit of time. The number of ticks per second is provided by the tickPerSecond attribute of the server element in stats.xml. The ticks spent value is the total amount of time that connections spent in the connection queue and is used to calculate the average queuing delay.

This setting is not tunable.
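The average queueing delay calculation can be reproduced from the raw counters, as in the following sketch. The ticks spent and total connections queued values come from the sample statistics earlier in this section. The tick rate of 100000 ticks per second is an assumption chosen for illustration; read the actual rate for your system from the stats-xml output.

```python
def average_queueing_delay_ms(ticks_spent: int, total_queued: int, ticks_per_second: int) -> float:
    """Ticks Spent divided by Total Connections Queued, converted to milliseconds."""
    if total_queued == 0:
        return 0.0  # no connections queued yet, so no delay to report
    ticks_per_connection = ticks_spent / total_queued
    return ticks_per_connection / ticks_per_second * 1000.0


delay = average_queueing_delay_ms(
    ticks_spent=5389284274,
    total_queued=11222922,
    ticks_per_second=100000,  # assumed tick rate, system-dependent
)
print(f"{delay:.2f} milliseconds")  # 4.80 milliseconds
```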

Total Number of Connections Added

The number of new connections added to the connection queue. This setting is not tunable.

HTTP Listener (Listen Socket) Information

The HTTP listener information includes the IP address, port number, number of acceptor threads, and the default virtual server. For tuning purposes, the most important field in the HTTP listener information is the number of acceptor threads.

The following is an example of how the HTTP listeners information appears in perfdump:

ListenSocket ls1:
------------------------
Address                   https://0.0.0.0:2014
Acceptor Threads          1
Default Virtual Server    https-test

If you have created multiple HTTP listeners, perfdump displays all of them.

For more information about adding and editing listen sockets, see the Sun Java System Web Proxy Server 4.0.11 Administration Guide.

Address

The Address field contains the base address on which this listen socket is listening. A host can have multiple network interfaces and multiple IP addresses. The address contains the IP address and the port number.

If your listen socket listens on all network interfaces for the host machine, the IP part of the address is 0.0.0.0.

Tuning

This setting is tunable when you edit an HTTP listener. If you specify an IP address other than 0.0.0.0, the server makes one less system call per connection. Specify an IP address other than 0.0.0.0 for best possible performance.

Acceptor Threads

Acceptor threads are threads that wait for connections. The threads accept connections and put them in a queue where they are then picked up by worker threads. For more information, see Connection-Handling Overview.

Ideally, you want to have enough acceptor threads so that there is always one available when a user needs one, but few enough so that they do not burden the system. A good rule is to have one acceptor thread per CPU on your system. You can increase this value to about double the number of CPUs if you find indications of TCP/IP listen queue overruns.
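The sizing rule above can be expressed as a small helper. This is a sketch of the rule of thumb only. The function name is hypothetical, and whether you are seeing listen queue overruns is something you determine from your operating system's network statistics.

```python
import os


def suggested_acceptor_threads(cpus: int, listen_queue_overruns: bool) -> int:
    """One acceptor thread per CPU, or about double that if the
    TCP/IP listen queue shows signs of overruns."""
    return 2 * cpus if listen_queue_overruns else cpus


cpus = os.cpu_count() or 1  # fall back to 1 if the count is unavailable
print(suggested_acceptor_threads(cpus, listen_queue_overruns=False))
print(suggested_acceptor_threads(cpus, listen_queue_overruns=True))
```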

Tuning

This setting is tunable when you edit an HTTP listener. The number of acceptor threads defaults to the number of CPUs on your system.

Other HTTP listener settings that affect performance are the size of the send buffer and receive buffer. For more information regarding these buffers, see your operating system documentation.

Tuning

This setting is tunable when you edit an HTTP listener.

Keep-Alive Information

This section provides information about the server’s HTTP-level keep-alive system.

Note –

The term keep-alive should not be confused with TCP keep-alives. Also, note that keep-alive connections are called persistent connections in HTTP 1.1, but Proxy Server continues to refer to these connections as keep-alive connections. Most modern browsers request a web page from the server through persistent connections. The connection is kept alive even after a request is processed, so that subsequent requests can reuse it.

The following example shows the keep-alive statistics displayed by perfdump:

KeepAliveInfo:
--------------------
KeepAliveCount        198/200
KeepAliveHits         0
KeepAliveFlushes      0
KeepAliveRefusals     56844280
KeepAliveTimeouts     365589
KeepAliveTimeout      10 seconds

The following table shows the keep-alive statistics displayed in the Admin Console:

Table 2–2 Keep-Alive Statistics


Number of Connections Processed	0
Total Number of Connections Added	198
Maximum Connection Size	200
Number of Connections Flushed	0
Number of Connections Refused	56844280
Number of Idle Connections Closed	365589
Connection Timeout	10

Both HTTP 1.0 and HTTP 1.1 support the ability to send multiple requests across a single HTTP session. A proxy server can receive hundreds of new HTTP requests per second. If every request is allowed to keep the connection open indefinitely, the server can become overloaded with connections. On UNIX and Linux systems, this can lead to a file table overflow very easily.

To resolve this problem, the server maintains a counter for the maximum number of waiting keep-alive connections. A waiting keep-alive connection has fully completed processing the previous request, and is now waiting for a new request to arrive on the same connection. If the server has more than the maximum waiting connections open when a new connection waits for a keep-alive request, the server closes the oldest connection. This algorithm keeps an upper bound on the number of open waiting keep-alive connections that the server can maintain.

Sun Java System Web Proxy Server does not always honor a keep-alive request from a client. The following conditions cause the server to close a connection, even if the client has requested a keep-alive connection:

The keep alive timeout is set to 0.
The keep alive maximum connections count is exceeded.
Dynamic content, such as a CGI, does not have an HTTP content-length header set. This applies only to HTTP 1.0 requests. If the request is HTTP 1.1, the server honors keep-alive requests even if the content-length is not set. The server can use chunked encoding for these requests if the client can handle them (indicated by the request header transfer-encoding: chunked).
The request is not HTTP GET or HEAD.
The request was determined to be bad, for example, if the client sends only headers with no content.

The keep-alive subsystem in Proxy Server is designed to be massively scalable. The out-of-the-box configuration can be less than optimal if the workload is non-persistent (that is, HTTP 1.0 without the KeepAlive header), or for a lightly loaded system that is primarily servicing keep-alive connections.

Keep-Alive Count

This section in perfdump has two numbers:

Number of connections in keep-alive mode, also known as the total number of connections added
Maximum number of connections allowed in keep-alive mode simultaneously, also known as the maximum connection size

The maximum number of connections allowed in keep-alive mode can be configured using the MaxKeepAliveConnections magnus.conf directive.

Note –

The number of connections specified by the maximum connections setting is divided equally among the keep-alive threads. If the maximum connections setting is not equally divisible by the keep-alive threads setting, the server might allow slightly more than the maximum number of simultaneous keep-alive connections.
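The effect described in the note can be illustrated with a short calculation. The sketch assumes that each keep-alive thread receives the per-thread share rounded up, which is one way the total could exceed the configured maximum. The text does not state the server's exact rounding behavior.

```python
import math


def per_thread_limit(max_connections: int, keep_alive_threads: int) -> int:
    """Per-thread share of keep-alive connections, rounded up (assumed behavior)."""
    return math.ceil(max_connections / keep_alive_threads)


def effective_maximum(max_connections: int, keep_alive_threads: int) -> int:
    """Total connections allowed when every thread is at its share."""
    return per_thread_limit(max_connections, keep_alive_threads) * keep_alive_threads


# 200 connections over 3 threads does not divide evenly.
print(per_thread_limit(200, 3))   # 67
print(effective_maximum(200, 3))  # 201

# 200 connections over 4 threads divides evenly, so the maximum is exact.
print(effective_maximum(200, 4))  # 200
```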

Keep-Alive Hits

The keep-alive hits, or the number of connections processed, is the number of times a request was successfully received from a connection that was kept alive.

This setting is not tunable.

Keep-Alive Flushes

The number of times the server had to close a connection because the total number of connections added exceeded the keep-alive maximum connections setting. The server does not close existing connections when the keep-alive count exceeds the maximum connection size. Instead, new keep-alive connections are refused and the number of connections refused count is incremented.

Keep-Alive Refusals

The number of times the server could not complete the connection to a keep-alive thread, possibly due to too many persistent connections (or when total number of connections added exceeds the keep-alive maximum connections setting). The suggested tuning is to increase the keep-alive maximum connections.

Keep-Alive Timeouts

The number of times the server closed idle keep-alive connections because client connections timed out without any activity. This statistic is useful to monitor. There is no specific tuning advised for this setting.

Keep-Alive Timeout

The time, measured in seconds, before idle keep-alive connections are closed.

Keep-Alive Poll Interval

The keep-alive poll interval specifies the interval in seconds at which the system polls keep-alive connections for further requests. The default is 0.001 second, the lowest value allowed. It is set to a low value to enhance performance at the cost of CPU usage.

Keep-Alive Threads

The KeepAliveThreads magnus.conf directive can be used to specify the number of keep-alive threads.

Tuning for HTTP 1.0-Style Workload

Since HTTP 1.0 results in a large number of new incoming connections, the default acceptor threads of 1 per listen socket would be suboptimal. Increasing this to a higher number should improve performance for HTTP 1.0-style workloads. For instance, for a system with 2 CPUs, you might want to set it to 2. You might also want to reduce the keep-alive connections, for example, to 0.

HTTP 1.0-style workloads can have many connections established and terminated.

If users are experiencing connection timeouts from a browser to Proxy Server when the server is heavily loaded, you can increase the size of the HTTP listener backlog queue by setting the HTTP listener's listen queue size to a larger value, such as 8192. The listen queue size can be specified using the "Configure System Preferences" screen in the admin interface.

The HTTP listener listen queue specifies the maximum number of pending connections on a listen socket. Connections that time out on a listen socket whose backlog queue is full fail.

Tuning for HTTP 1.1-Style Workload

While tuning server-persistent connection handling, balancing throughput and latency is a challenge. The keep-alive poll interval and timeout control latency. Lowering the value of these settings is intended to lower latency on lightly loaded systems, for example, to reduce page load times. Increasing the values of these settings is intended to raise aggregate throughput on heavily loaded systems, for example, by increasing the number of requests per second the server can handle. However, if there is too much latency and too few clients, aggregate throughput suffers as the server sits idle unnecessarily. As a result, the general keep-alive subsystem tuning rules at a particular load are as follows:

If there's idle CPU time, decrease the poll interval.
If there's no idle CPU time, increase the poll interval.

Also, chunked encoding could affect the performance for HTTP 1.1 workloads. Tuning the response buffer size can positively affect performance. A higher response buffer size, set using the magnus.conf parameter ChunkedRequestBufferSize, would result in sending a Content-length: header instead of chunking the response.

You can also set the buffer size for a Service-class function in the obj.conf file, using the UseOutputStreamSize parameter. UseOutputStreamSize overrides the value set using the output-buffer-size property. If UseOutputStreamSize is not set, Proxy Server uses the output-buffer-size setting. If the output-buffer-size is not set, Proxy Server uses the output-buffer-size default value of 8192.

The following example shows setting the buffer size for the nsapi_test Service function:

<Object name="nsapitest">
 ObjectType fn="force-type" type="magnus-internal/nsapitest"
 Service method=(GET) type="magnus-internal/nsapitest" fn="nsapi_test"
 UseOutputStreamSize=12288
</Object>

Thread Information

Maximum Threads (Maximum Simultaneous Requests)

The maximum threads setting specifies the maximum number of simultaneous transactions that Proxy Server can handle. The default value is the greater of 128 or the number of processors in the system. Changes to this value can be used to throttle the server, minimizing latencies for the transactions that are performed. The Maximum Threads value acts across multiple virtual servers, but does not attempt to load balance. It is set for each configuration.

Reaching the maximum number of configured threads is not necessarily undesirable, and you do not need to automatically increase the number of threads in the server. Reaching this limit means that the server needed this many threads at peak load, but as long as it was able to serve requests in a timely manner, the server is adequately tuned. However, at this point connections queue up in the connection queue, potentially overloading it. If you monitor your server's performance regularly and notice that the total number of sessions created is often near the maximum number of threads, consider increasing your thread limits.

To compute the number of simultaneous requests, the server counts the number of active requests, adding one to the number when a new request arrives, subtracting one when it finishes the request. When a new request arrives, the server checks to see if it is already processing the maximum number of requests. If it has reached the limit, it defers processing new requests until the number of active requests drops below the maximum amount.
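The counting scheme described above can be sketched as follows. This is a minimal illustration under the assumption that a deferred request simply waits until a slot frees up; the class and method names are hypothetical and do not correspond to any Proxy Server API.

```python
import threading


class RequestThrottle:
    """Counts active requests and defers new ones at the configured maximum."""

    def __init__(self, max_threads):
        self.max_threads = max_threads
        self.active = 0
        self._cond = threading.Condition()

    def at_limit(self):
        # True when the server is already processing the maximum number of requests.
        with self._cond:
            return self.active >= self.max_threads

    def begin_request(self):
        # Defer the new request until the active count drops below the maximum,
        # then add one to the count.
        with self._cond:
            while self.active >= self.max_threads:
                self._cond.wait()
            self.active += 1

    def end_request(self):
        # Subtract one when the request finishes and wake one deferred request.
        with self._cond:
            self.active -= 1
            self._cond.notify()
```

A worker would call begin_request() before processing and end_request() in a finally clause, so that the count is decremented even when processing fails.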

In theory, you can set the maximum threads to 1 and still have a functional server. Setting this value to 1 would mean that the server could handle only one request at a time. However, HTTP requests for static files generally have a very short duration, with response times as low as 5 milliseconds, so processing one request at a time still allows you to process up to 200 requests per second.
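The 200 requests per second figure follows directly from the response time. The sketch below reproduces that arithmetic; the function name is hypothetical, and it assumes every request takes the same time and that threads are never idle, so the result is an upper bound rather than a prediction.

```python
def max_requests_per_second(threads, response_time_ms):
    """Upper bound on throughput when each request occupies a thread
    for response_time_ms milliseconds."""
    return threads * 1000 / response_time_ms


# One thread, 5 ms per request, as in the text above.
print(max_requests_per_second(1, 5))  # 200.0
```

The same formula shows why slow requests matter: as the time each request holds a thread grows, the achievable rate for a fixed thread count falls in proportion.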

In practice, however, Internet clients frequently connect to the server and then do not complete their requests. In these cases, the server waits 30 seconds or more for the data before timing out. This wait interval can be configured using the AcceptTimeout directive in magnus.conf. By setting this value to less than 30 seconds, you can free up threads sooner, but you might also disconnect users with slower connections. Also, some sites perform heavyweight transactions that take minutes to complete. Both of these factors add to the maximum simultaneous requests that are required. If your site is processing many requests that take many seconds, you might need to increase the number of maximum simultaneous requests.

Suitable maximum threads values range from 100 to 500, depending on the load. Maximum Threads represents a hard limit for the maximum number of active threads that can run simultaneously, which can become a bottleneck for performance.

The thread pool minimum threads setting is the minimum number of threads the server initiates upon startup. The default is the number of processors.

Note –

When configuring Proxy Server to be used with the Solaris Network Cache and Accelerator (SNCA), setting the maximum threads and the queue size to 0 provides better performance. Because SNCA manages the client connections, it is not necessary to set these parameters. These parameters can also be set to 0 with non-SNCA configurations, especially for cases in which short latency responses with no keep-alives must be delivered. It is important to note that the maximum threads and queue size must both be set to 0.

Tuning

You can increase your thread limits in the Admin console by editing the value of "Request Throttle" under "Configure System Preferences".

File Cache Statistics Information

The cache information section provides statistics on how your file cache is being used. The file cache is an in-memory cache that stores frequently accessed objects from the proxy server's disk cache.

For performance reasons, Proxy Server caches as follows:

For small files, it caches the content in memory (heap).
For medium files, it caches the content using mmap.
For large files, it caches the open file descriptors to avoid opening and closing files.
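The three tiers listed above can be pictured as a simple size-based decision. The sketch below is an assumption-laden illustration: the function name is hypothetical, the limits are passed in as plain numbers, and whether the real boundaries are inclusive or exclusive is not stated in this guide. The example values are the SmallFileSizeLimit and MediumFileSizeLimit figures from the sample file cache status output later in this section.

```python
def cache_strategy(file_size, small_limit, medium_limit):
    """Pick a caching tier for a file of file_size bytes.

    Assumes inclusive limits; the server's actual boundary handling may differ.
    """
    if file_size <= small_limit:
        return "heap"             # small file: content cached in memory
    if file_size <= medium_limit:
        return "mmap"             # medium file: content cached using mmap
    return "file-descriptor"      # large file: only the open descriptor is cached


print(cache_strategy(4096, 500000, 1000001))     # heap
print(cache_strategy(750000, 500000, 1000001))   # mmap
print(cache_strategy(5000000, 500000, 1000001))  # file-descriptor
```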

The following is an example of how the cache statistics are displayed in perfdump:

CacheInfo:
------------------
File Cache Enabled       yes
File Cache Entries       141/1024
File Cache Hit Ratio     652/664 ( 98.19%)
Maximum Age              30
Accelerator Entries      120/1024
Acceleratable Requests   281/328 ( 85.67%)
Acceleratable Responses  131/144 ( 90.97%)
Accelerator Hit Ratio    247/281 ( 87.90%)

The following table shows the file cache statistics as displayed in the Admin Console:

Table 2–3 File Cache Statistics


Total Cache Hits	46
Total Cache Misses	52
Total Cache Content Hits	0
Number of File Lookup Failures	9
Number of File Information Lookups	37
Number of File Information Lookup Failures	50
Number of Entries	12
Maximum Cache Size	1024
Number of Open File Entries	0
Number of Maximum Open Files Allowed	1024
Heap Size	36064
Maximum Heap Cache Size	10735636
Size of Memory Mapped File Content	0
Maximum Memory Mapped File Size	0
Maximum Age of Entries	30

Accelerator Entries

The number of files that have been cached in the accelerator cache.

Tuning

You can increase the maximum number of accelerator cache entries by increasing the number of file cache entries as described in File Cache Entries. Note that this number will typically be smaller than the File Cache Entries number because the accelerator cache only caches information about files and not directories. If the number is significantly lower than the File Cache Entries number, you can improve the accelerator cache utilization by following the tuning information described in Acceleratable Requests and Acceleratable Responses.

Acceleratable Requests

The number of client requests that were eligible for processing by the accelerator cache. Only simple GET requests are processed by the accelerator cache. The accelerator cache does not process requests that explicitly disable caching, for example, requests sent when a user clicks Reload in the browser.

Tuning

To maximize the number of acceleratable requests, structure your web sites to use static files when possible and avoid using query strings in requests for static files.

Acceleratable Responses

The number of times the response to an acceleratable request was eligible for addition to the accelerator cache.

Accelerator Hit Ratio

The number of times the response for a request that can be accelerated was found in the accelerator cache.

Tuning

Higher hit ratios result in better performance. To maximize the hit ratio, see the tuning information for Acceleratable Responses.

File Cache Enabled

If the cache is disabled, the rest of this section is not displayed in perfdump. In the Admin console, the File Cache Statistics section shows zeros for the values.

Tuning

The cache is enabled by default. You can disable it in the Admin console on the "Configure File Cache" sub-tab of the "Caching" tab.

File Cache Entries

The number of current cache entries and the maximum number of cache entries are both displayed in perfdump. In the Admin console, they are called the Number of Entries and the Maximum Cache Size. A single cache entry represents a single URI.

Tuning

The available address space for a 32-bit process such as Proxy Server is limited to 4 Gbytes. The max-entries value for the file cache is based on the number of threads (as specified by thread-pool/max-threads) and the connection queue size. Cache small, frequently accessed files in the file cache and use perfdump to confirm that the file cache hit ratio is close to 100%. To achieve this, you can increase the file cache size and fine-tune max-entries for optimal performance.

File Cache Hit Ratio

The hit ratio available through perfdump gives you the number of file cache hits compared to cache lookups. Numbers approaching 100% indicate that the file cache is operating effectively, while numbers approaching 0% indicate that the file cache is not serving many requests.

To compute this number yourself using the statistics provided through the Admin console, divide the Total Cache Hits by the sum of the Total Cache Hits and the Total Cache Misses.

This setting is not tunable.
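As a worked example of this calculation, the sketch below applies the formula to the sample figures shown earlier in this section. The function name is hypothetical; returning 0.0 when there have been no lookups is an assumption made to avoid division by zero.

```python
def hit_ratio(hits, misses):
    """Return the cache hit ratio as a percentage of all lookups."""
    lookups = hits + misses
    if lookups == 0:
        return 0.0
    return 100.0 * hits / lookups


# Admin console sample (Table 2-3): 46 hits, 52 misses.
print(f"{hit_ratio(46, 52):.2f}%")   # 46.94%

# perfdump sample: 652 hits out of 664 lookups, that is, 12 misses.
print(f"{hit_ratio(652, 12):.2f}%")  # 98.19%
```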

Maximum Age

This field displays the maximum age of a valid cache entry. The parameter controls how long cached information is used after a file has been cached. An entry older than the maximum age is replaced by a new entry for the same file.

Maximum Heap Cache Size

The optimal cache heap size depends upon how much system memory is free. A larger heap size means that the Proxy Server can cache more content and therefore obtain a better hit ratio. However, the heap size should not be so large that the operating system starts paging cached files.

File Cache Dynamic Control and Monitoring

The file cache stores file contents in memory. You can add an object to obj.conf to dynamically monitor and control the file cache while the server is running.

To Control and Monitor the File Cache

Add a NameTrans directive to the default object:

NameTrans fn="assign-name" from="/nsfc" name="nsfc"

Add an nsfc object definition:

<Object name="nsfc">
Service fn="service-nsfc-dump"
</Object>

This configuration enables the file cache control and monitoring function (nsfc-dump) to be accessed through the URI /nsfc. To use a different URI, change the from parameter in the NameTrans directive.

The following is an example of the information you receive when you access the URI:

Sun Java System File Cache Status (pid 3602)

The file cache is enabled.
Cache resource utilization

Number of cached file entries = 174968 (152 bytes each, 26595136 total bytes)
Heap space used for cache = 1882632616/1882632760 bytes
Mapped memory used for medium file contents = 0/1 bytes
Number of cache lookup hits = 47615653/48089040 ( 99.02 %)
Number of hits/misses on cached file info = 23720344/324195
Number of hits/misses on cached file content = 16247503/174985
Number of outdated cache entries deleted = 0
Number of cache entry replacements = 0
Total number of cache entries deleted = 0

Parameter settings

ReplaceFiles: false
ReplaceInterval: 1 milliseconds
HitOrder: false
CacheFileContent: true
TransmitFile: false
MaxAge: 3600 seconds
MaxFiles: 600000 files
SmallFileSizeLimit: 500000 bytes
MediumFileSizeLimit: 1000001 bytes
BufferSize: 8192 bytes

CopyFiles: false
Directory for temporary files: /tmp
Hash table size: 1200007 buckets

You can include a query string when you access the URI. The following values are recognized:

?list: Lists the files in the cache.
?refresh=n: Causes the client to reload the page every n seconds.
?restart: Causes the cache to be shut down and then restarted.
?start: Starts the cache.
?stop: Shuts down the cache.

If you choose the ?list option, the file listing includes the file name, a set of flags, the current number of references to the cache entry, the size of the file, and an internal file ID value. The flags are as follows:

C: File contents are cached.
D: Cache entry is marked for delete.
E: PR_GetFileInfo() returned an error for this file.
I: File information including size and modification date is cached.
M: File contents are mapped into virtual memory.
O: File descriptor is cached (when TransmitFile is set to true).
P: File has associated private data and appears on shtml files.
T: Cache entry has a temporary file.
W: Cache entry is locked for write access.

Thread Pool Information

If you are using the default settings, threads from the default thread pool process the request. However, you can also create custom thread pools and use them to run custom NSAPI functions. By default, Proxy Server creates one additional pool, named NativePool. In most cases, the native thread pool is only needed on the Windows platform. For more information on thread pools, see Understanding Threads, Processes, and Connections.

Native Thread Pool

The following example shows native thread pool information as it appears in perfdump:

Native pools:
----------------------------
NativePool:
Idle/Peak/Limit               1/1/128
Work Queue Length/Peak/Limit  0/0/0
my-custom-pool:
Idle/Peak/Limit               1/1/128
Work Queue Length/Peak/Limit  0/0/0

If you have defined additional custom thread pools, they are shown under the Native Pools heading in perfdump.

The following table shows the thread pool statistics as they appear in the Admin Console. If you have not defined additional thread pools, only the NativePool is shown:

Table 2–4 Thread Pools Statistics


Name	NativePool
Idle Threads	1
Threads	1
Requests Queued	0
Peak Requests Queued	0

Idle /Peak /Limit

Idle, listed as Idle Threads in the Admin console, indicates the number of threads that are currently idle. Peak indicates the peak number of threads in the pool. Limit, listed as Threads in the Admin console, indicates the maximum number of native threads allowed in the thread pool, and for NativePool is determined by the setting of NativePoolMaxThreads in the magnus.conf file.

Tuning

You can modify the maximum threads for NativePool by editing the NativePoolMaxThreads parameter in magnus.conf. For more information, see NativePoolMaxThreads Directive.

Work Queue Length /Peak /Limit

These numbers refer to a queue of server requests that are waiting for the use of a native thread from the pool. The Work Queue Length is the current number of requests waiting for a native thread, which is represented as Requests Queued in the Admin console.

Peak indicates peak requests queued in the Admin console and is the highest number of requests that were ever queued up simultaneously for the use of a native thread since the server was started. This value can be viewed as the maximum concurrency for requests requiring a native thread.

Limit is the maximum number of requests that can be queued at one time to wait for a native thread, and is determined by the setting of NativePoolQueueSize.

Tuning

You can modify the queue size for NativePool by editing the NativePoolQueueSize directive in magnus.conf. For more information, see NativePoolQueueSize Directive.

`NativePoolStackSize` Directive

The NativePoolStackSize determines the stack size in bytes of each thread in the native (kernel) thread pool.

Tuning

You can modify the NativePoolStackSize by editing the NativePoolStackSize directive in magnus.conf.

`NativePoolQueueSize` Directive

The NativePoolQueueSize determines the number of threads that can wait in the queue for the thread pool. If all threads in the pool are busy, then the next request-handling thread that needs to use a thread in the native pool must wait in the queue. If the queue is full, the next request-handling thread that tries to get in the queue is rejected, with the result that it returns a busy response to the client. It is then free to handle another incoming request instead of being tied up waiting in the queue.

Setting the NativePoolQueueSize lower than the maximum threads value causes the server to execute a busy function instead of the intended NSAPI function whenever the number of requests waiting for service by pool threads exceeds this value. The default returns a “503 Service Unavailable” response and logs a message, depending on your log level setting. Setting the NativePoolQueueSize higher than the maximum threads causes the server to reject connections before a busy function can execute.

This value represents the maximum number of concurrent requests for service that require a native thread. If your system is unable to fulfill requests due to load, allowing more requests to queue up increases the latency for requests, and could result in all available request threads waiting for a native thread. In general, set this value high enough to avoid rejecting requests, by anticipating the maximum number of concurrent users who would execute requests requiring a native thread.

The difference between this value and the maximum threads is the number of requests reserved for non-native thread requests, such as static HTML and image files. Keeping a reserve and rejecting requests ensures that your server continues to fill requests for static files, which prevents it from becoming unresponsive during periods of very heavy dynamic content load. If your server consistently rejects connections, this value is either set too low, or your server hardware is overloaded.
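The queue-full behavior described in this section can be sketched with a bounded queue. This is an illustration only, not the server's implementation: the class name is hypothetical, and the string return values stand in for the real outcomes (waiting for a native thread, or running the busy function).

```python
import queue


class NativePoolQueue:
    """Bounded wait queue in front of a native thread pool (illustrative)."""

    def __init__(self, queue_size):
        # Note: in Python, queue.Queue(maxsize=0) means "unbounded",
        # so this sketch expects queue_size to be at least 1.
        self._queue = queue.Queue(maxsize=queue_size)

    def submit(self, request):
        try:
            # Room in the queue: the request waits for a native thread.
            self._queue.put_nowait(request)
            return "queued"
        except queue.Full:
            # Queue is full: the request is rejected with a busy response,
            # and the request-handling thread is free for other work.
            return "503 Service Unavailable"


pool_queue = NativePoolQueue(queue_size=2)
print([pool_queue.submit(n) for n in range(3)])
# ['queued', 'queued', '503 Service Unavailable']
```

The design point is that rejecting immediately keeps request-handling threads available for requests that do not need a native thread, which is the reserve described above.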

Tuning

You can modify the NativePoolQueueSize by editing the NativePoolQueueSize directive in magnus.conf.

`NativePoolMaxThreads` Directive

NativePoolMaxThreads determines the maximum number of threads in the native (kernel) thread pool.

A higher value allows more requests to execute concurrently, but has more overhead due to context switching, so bigger is not always better. Typically, you do not need to increase this number, but if the CPU is not saturated and you see requests queue up, then increase this number.

Tuning

You can modify the NativePoolMaxThreads by editing the NativePoolMaxThreads parameter in magnus.conf.

`NativePoolMinThreads` Directive

NativePoolMinThreads determines the minimum number of threads in the native (kernel) thread pool.

Tuning

You can modify the NativePoolMinThreads by editing the NativePoolMinThreads parameter in magnus.conf.

DNS Cache Information

The DNS cache caches IP addresses and DNS names. Proxy Server uses DNS caching for logging and for access control by IP address. DNS cache is enabled by default. The following example shows DNS cache information as displayed in perfdump:

HostDNSCacheInfo:
------------------
enabled             yes
CacheEntries        0/1024
HitRatio            0/0 ( 0.00%)

Async DNS disabled

ClientDNSCacheInfo:
------------------
enabled             yes
CacheEntries        0/1024
HitRatio            0/0 ( 0.00%)

Async DNS disabled

The following example shows the DNS Cache information as displayed in the Admin Console:

Table 2–5 DNS Cache Statistics


Total Cache Hits	62854802
Total Cache Misses	6110
Number of Asynchronous Lookups	0
Lookups in Progress	4
Asynchronous Lookups Enabled	1
Number of Asynchronous Address Lookups Performed	0

Enabled

If the DNS cache is disabled, the rest of this section is not displayed in perfdump. In the Admin console, the page displays zeros.

Tuning

By default, the DNS cache is on. You can enable or disable DNS caching in the Admin console at "Configure DNS Caching".

Note: Proxy Server optionally maintains two types of DNS caches. The first is a Host DNS cache, which caches the results of host name to IP address lookups done on remote hosts. The second is a Client DNS cache, which caches the results of IP address to host name lookups done on clients.

Cache Entries

This section in perfdump shows the number of current cache entries and the maximum number of cache entries. In the Admin Console the current cache entries are shown as Total Cache Hits. A single cache entry represents a single IP address or DNS name lookup. The cache should be as large as the maximum number of clients that access your web site concurrently. Note that setting the cache size too high wastes memory and degrades performance.

Hit Ratio of Cache Hits and Lookups

The hit ratio in perfdump displays the number of cache hits compared to the number of cache lookups. You can compute this number using the statistics in the Admin console by dividing the Total Cache Hits by the sum of the Total Cache Hits and the Total Cache Misses.

This setting is not tunable.

Async DNS Enabled or Disabled

Async DNS enabled or disabled displays whether the server uses its own asynchronous DNS resolver instead of the operating system's synchronous resolver. By default, Async DNS is disabled. If it is disabled, this section does not appear in perfdump.

Tuning the ACL Cache

Proxy Server maintains an ACL cache that maps URLs to ACL lists. The ACL cache improves performance by avoiding the need to build the ACL list applicable to a particular URL for each access.

However, the sheer number of URLs accessed through a proxy server can cause the ACL cache to grow very large. You can use the magnus.conf directive ACLCacheMax to restrict the maximum number of entries in the ACL cache.

Tuning the ACL User Cache (Authentication Cache)

The ACL user cache is active by default. Because of the default size of the cache (200 entries), the ACL user cache can be a bottleneck, or can simply not serve its purpose on a site with heavy traffic. On a busy site, more than 200 users can hit ACL-protected resources in less time than the lifetime of the cache entries. When this situation occurs, Proxy Server must query the LDAP server more often to validate users, which impacts performance.

This bottleneck can be avoided by increasing the maximum users of the ACL cache at "Configure ACL Cache" under "Preferences" in the Admin console.

There can also be a potential (but much harder to hit) bottleneck with the number of groups stored in a cache entry (four by default). If a user belongs to five groups and hits five ACLs that check for these different groups within the ACL cache lifetime, an additional cache entry is created to hold the additional group entry. When there are two cache entries, the entry with the original group information is ignored.

While it would be extremely unusual to hit this possible performance problem, the number of groups cached in a single ACL cache entry can be tuned with "Proxy Auth Group Cache Size" at "Configure ACL Cache" under "Preferences" in the Admin console.

The maximum age setting of the ACL cache determines the number of seconds before the cache entries expire. Each time an entry in the cache is referenced, its age is calculated and checked against the maximum age setting. The entry is not used if its age is greater than or equal to the maximum age. The default value is 120 seconds. If your LDAP entries are not likely to change often, use a large number for the maximum age. However, if your LDAP entries change often, use a smaller value. For example, when the value is 120 seconds, Proxy Server might be out of sync with the LDAP server for as long as two minutes. Depending on your environment, that might or might not be a problem.
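The expiry rule above, where an entry is unusable once its age is greater than or equal to the maximum age, can be expressed as a small check. The function and argument names are hypothetical; times are plain numbers of seconds so the example is deterministic.

```python
DEFAULT_MAX_AGE_SECONDS = 120  # default stated in the text above


def is_entry_usable(created_at, now, max_age=DEFAULT_MAX_AGE_SECONDS):
    """Return True if a cache entry created at created_at may still be used at now."""
    age = now - created_at
    # The entry is not used if its age is greater than or equal to the maximum age.
    return age < max_age


print(is_entry_usable(created_at=0, now=119))  # True
print(is_entry_usable(created_at=0, now=120))  # False
```

With the default of 120 seconds, a change made in LDAP just after an entry was cached can go unnoticed for up to two minutes, which is the trade-off described above.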

Tuning the Proxy Disk Cache to Store Dynamic Content

For a list and description of the settings available to fine-tune the disk cache for a wider range of responses, including dynamic responses, see cache-setting in Sun Java System Web Proxy Server 4.0.11 Configuration File Reference. Note that these settings result in behavior that violates the HTTP standard.

Using Busy Functions

The default busy function returns a "503 Service Unavailable" response and logs a message depending upon the log level setting. You might want to modify this behavior for your application. You can specify your own busy functions for any NSAPI function in the obj.conf file by including a service function in the configuration file in this format:

busy="my-busy-function"

For example, you could use this sample service function:

Service fn="send-cgi" busy="service-toobusy"

This configuration allows different responses if the server becomes too busy in the course of processing a request that includes a number of function types (such as Service, AddLog, and PathCheck). Note that the busy function applies to all functions that require a native thread to execute when the default thread type is non-native.

To use your own busy function instead of the default busy function for the entire server, you can write an NSAPI init function that includes a func_insert call as shown below:

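#include "nsapi.h"
extern "C" NSAPI_PUBLIC int my_custom_busy_function(pblock *pb, Session *sn, Request *rq);
/* Init function: registers the custom busy function for the entire server. */
extern "C" NSAPI_PUBLIC int my_init(pblock *pb, Session *sn, Request *rq)
{
    func_insert("service-toobusy", my_custom_busy_function);
    return REQ_PROCEED;
}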

Busy functions are never executed on a pool thread, so you must be careful to avoid using function calls that could cause the thread to block.

Two other considerations are footprint and promptness. Footprint is the working size of the JVM process, measured in pages and cache lines. Promptness is the time between when an object becomes dead, and when the memory becomes available.

This is an important consideration for distributed systems. A particular generation size makes a trade-off between these four metrics. For example, a large young generation likely maximizes throughput, but at the cost of footprint and promptness.
