Oracle iPlanet Web Proxy Server 4.0.14 Performance Tuning, Sizing, and Scaling Guide

Chapter 6 Scalability Studies

This chapter describes the results of scalability studies. You can refer to these studies for a sample of how the server performs, and how you can configure your system to best take advantage of Proxy Server’s strengths.

This chapter includes the following topics:

Study Goals

The goal of the tests in the study was to show how well Proxy Server 4.0 scales. The tests also helped to determine the configuration and tuning requirements.

Study Conclusion

When tuned, Proxy Server 4.0 provides excellent scalability, reliability and performance, particularly when coupled with a network of suitable capacity and hardware whose chip multithreading capabilities take advantage of Proxy Server 4.0's fully threaded model.

Hardware

Network Configuration

Software

Proxy Server system configuration:

The Web Polygraph benchmarking tool, a popular, freely available tool for benchmarking caching proxies, origin server accelerators, L4/7 switches, content filters, and other web intermediaries, was used to evaluate the performance of Proxy Server 4.0.

Content

The studies were conducted with the following content:

Configuration and Tuning

The following tuning settings are common to all the tests in this study. Individual studies have additional configuration and tuning information.

/etc/system tuning:

set rlim_fd_max=500000
set rlim_fd_cur=500000


set sq_max_size=0
set consistent_coloring=2
set autoup=60
set ip:ip_squeue_bind=0
set ip:ip_soft_rings_cnt=0
set ip:ip_squeue_fanout=1
set ip:ip_squeue_enter=3
set ip:ip_squeue_worker_wait=0

set segmap_percent=6
set bufhwm=32768
set maxphys=1048576
set maxpgio=128
set ufs:smallfile=6000000

*For ipge driver
set ipge:ipge_tx_ring_size=2048
set ipge:ipge_tx_syncq=1
set ipge:ipge_srv_fifo_depth=16000
set ipge:ipge_reclaim_pending=32
set ipge:ipge_bcopy_thresh=512
set ipge:ipge_dvma_thresh=1
set pcie:pcie_aer_ce_mask=0x1

*For e1000g driver
set pcie:pcie_aer_ce_mask=0x1
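
The settings in /etc/system are read only at boot time, so the system must be rebooted before they take effect. As a minimal check (not part of the original study), the per-process file descriptor limits derived from rlim_fd_cur and rlim_fd_max can be confirmed from a new ksh or bash shell after the reboot:

ulimit -Sn
ulimit -Hn

With the values shown above, both commands should report 500000.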

TCP/IP tuning:

ndd -set /dev/tcp tcp_conn_req_max_q 102400
ndd -set /dev/tcp tcp_conn_req_max_q0 102400
ndd -set /dev/tcp tcp_max_buf 4194304
ndd -set /dev/tcp tcp_cwnd_max 2097152
ndd -set /dev/tcp tcp_recv_hiwat 400000
ndd -set /dev/tcp tcp_xmit_hiwat 400000
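
Unlike the /etc/system settings, values set with ndd do not persist across a reboot, so these commands are typically placed in a boot-time script (for example, under /etc/rc2.d) so that they are reapplied automatically. Any of the parameters can be read back with ndd -get to confirm the setting, for example:

ndd -get /dev/tcp tcp_conn_req_max_q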

Network Configuration

Since the tests use multiple network interfaces, it is important to ensure that interrupts from all the network interfaces are not handled by the same core. Network interrupts were enabled on one strand and disabled on the remaining three strands of each core using the following script:


# Build the list of processor IDs (strands) that are currently on line.
allpsr=`/usr/sbin/psrinfo | grep -v off-line | awk '{ print $1 }'`
set $allpsr
numpsr=$#
while [ $numpsr -gt 0 ];
do
    # Leave interrupts enabled on the first strand of this core.
    shift
    numpsr=`expr $numpsr - 1`
    tmp=1
    # Disable interrupt handling (no-intr) on the remaining three strands.
    while [ $tmp -ne 4 ];
    do
        /usr/sbin/psradm -i $1
        shift
        numpsr=`expr $numpsr - 1`
        tmp=`expr $tmp + 1`
    done
done

The following example shows psrinfo output before running the script:


# psrinfo | more
0       on-line   since 12/06/2006 14:28:34
1       on-line   since 12/06/2006 14:28:35
2       on-line   since 12/06/2006 14:28:35
3       on-line   since 12/06/2006 14:28:35
4       on-line   since 12/06/2006 14:28:35
5       on-line   since 12/06/2006 14:28:35
.................

The following example shows psrinfo output after running the script:


0       on-line   since 12/06/2006 14:28:34
1       no-intr   since 12/07/2006 09:17:04
2       no-intr   since 12/07/2006 09:17:04
3       no-intr   since 12/07/2006 09:17:04
4       on-line   since 12/06/2006 14:28:35
5       no-intr   since 12/07/2006 09:17:04
          .................
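
To return the system to its default behavior after testing, the strands can be switched back to the on-line, interrupt-handling state with psradm -n. The following command is only an illustration; the strand numbers depend on which strands were placed in the no-intr state:

/usr/sbin/psradm -n 1 2 3 5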

Proxy Server Tuning

The following table shows the tuning settings used for the Proxy Server.

Table 6–1 Proxy Server Tuning Settings

Component        Default                            Tuned
Access logging   enabled                            disabled
Thread pool      RqThrottle 128                     RqThrottle 320
HTTP listener    Non-secure listener on port 8080   Non-secure listener on port 8080, ListenQ 8192
Keep alive       enabled                            disabled
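
The RqThrottle and ListenQ values in the table correspond to server configuration directives. The following is a minimal sketch of the tuned entries as magnus.conf directives; depending on the release, the listen queue size may instead be configured on the HTTP listener element in server.xml, and the access logging and keep-alive changes are made separately and are not shown here:

RqThrottle 320
ListenQ 8192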

Cache in Memory

The tmpfs filesystem, which keeps all files in virtual memory, was used to carve a filesystem out of main memory to hold the 4-Gbyte proxy cache.


# mkdir -p /proxycache
# mount -F tmpfs -o size=5120m swap /proxycache

This creates a 5-Gbyte filesystem in main memory. Although only 4 Gbytes are actively used by the proxy server, the 5-Gbyte filesystem provides some spare room.
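
A tmpfs filesystem mounted in this way does not survive a reboot. To recreate the cache filesystem automatically at boot time, an entry similar to the following sketch (assuming the same mount point and size) can be added to /etc/vfstab:

swap    -    /proxycache    tmpfs    -    yes    size=5120m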

Performance Tests and Results

The following table contains the performance results for Proxy Server 4.0 running on a Sun SPARC Enterprise T1000 server.

Target Rate   Throughput (operations/second)   Response Time (ms)   Error    Network Utilization
6000          5999.70                          11.02                0%       78%
6900          6906.71                          11.10                0%       88%
7500          7503.58                          15.65                0.51%    98%
8100          7925.65                          293.03               2.15%    100%
9000          7956.88                          365.19               11.59%   100%

- The Target Rate column specifies the target rate at which clients submit requests.

- The Error column specifies the percentage of total requests that resulted in an error reported by the clients.

Further measurements indicated that the Sun SPARC Enterprise T1000 server had approximately 30% CPU idle time during peak loads of the benchmark test. This suggests that performance could be increased further if additional network bandwidth were made available.

References:

http://www.sun.com/blueprints/0607/820-2142.html

Configuration and Performance

Overloading the server obj.conf file with too many assign-name directives can have an adverse effect on performance. Each assign-name directive involves a regular expression comparison, which can be CPU intensive.
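
For reference, an assign-name directive is a NameTrans function in obj.conf that maps a URL pattern to a named object, and every request URI is matched against its from pattern. The following is a hypothetical example; the path and object name are illustrative only:

NameTrans fn="assign-name" from="/sales/*" name="sales-object"

<Object name="sales-object">
...
</Object>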

The following tables contain the performance results for varying numbers of assign-name directives in the server obj.conf file.

The first set of data is for a server with cache enabled, and the content server present in the local network. Note that the response time is for a single request.

Number of assign-name directives in obj.conf   Response time (milliseconds)
10                                             1.05
100                                            1.45
250                                            1.8
1000                                           4.3
2000                                           7.35
4000                                           13.65
6000                                           20.0
8000                                           26.15
10000                                          32.5

As can be seen from the performance numbers, the response times show a marked increase once the number of assign-name directives exceeds 100.

The following data was obtained with the cache disabled and the content server residing in a remote network.

Number of assign-name directives in obj.conf   Response time (milliseconds)
10                                             238.5
100                                            239.7
250                                            240.3
1000                                           242.2
2000                                           245.3
4000                                           252.3
6000                                           258.2
8000                                           264.3
10000                                          271.2

In the above data, the combination of network delay and the absence of a disk cache tends to hide the performance drop caused by the computational overhead of the large number of assign-name directives.

Recommendations: