This chapter describes the results of scalability studies. You can refer to these studies for a sample of how the server performs, and how you can configure your system to best take advantage of Proxy Server’s strengths.
The goal of the tests in the study was to show how well Proxy Server 4.0 scales. The tests also helped to determine the configuration and tuning requirements.
When tuned, Proxy Server 4.0 provides excellent scalability, reliability and performance, particularly when coupled with a network of suitable capacity and hardware whose chip multithreading capabilities take advantage of Proxy Server 4.0's fully threaded model.
Sun SPARC Enterprise T1000 Server
UltraSparc T1 processor with 8 1GHz cores and support for 32 simultaneous threads
8Gbytes RAM
Solaris 10 operating system
Four servers used as test machines to generate the load
Four load generating servers connected to the T1000 server via a Gigabit ethernet switch in a single subnet
Single Gigabit ethernet link
Proxy Server system configuration:
4Gbytes tmpfs cache
Higher values for RqThrottle (320 to 512)
Keep alive disabled
Web Polygraph, a popular, freely available benchmarking tool for caching proxies, origin server accelerators, L4/7 switches, content filters, and other web intermediaries, was used to evaluate the performance of Proxy Server 4.0.
The studies were conducted with the following content:
The content size of each object followed an exponential distribution, with an average content size of 13 Kbytes
All objects were cacheable with a two minute life cycle
The following tuning settings are common to all the tests in this study. Individual studies have additional configuration and tuning information.
set rlim_fd_max=500000
set rlim_fd_cur=500000
set sq_max_size=0
set consistent_coloring=2
set autoup=60
set ip:ip_squeue_bind=0
set ip:ip_soft_rings_cnt=0
set ip:ip_squeue_fanout=1
set ip:ip_squeue_enter=3
set ip:ip_squeue_worker_wait=0
set segmap_percent=6
set bufhwm=32768
set maxphys=1048576
set maxpgio=128
set ufs:smallfile=6000000
*For ipge driver
set ipge:ipge_tx_ring_size=2048
set ipge:ipge_tx_syncq=1
set ipge:ipge_srv_fifo_depth=16000
set ipge:ipge_reclaim_pending=32
set ipge:ipge_bcopy_thresh=512
set ipge:ipge_dvma_thresh=1
set pcie:pcie_aer_ce_mask=0x1
*For e1000g driver
set pcie:pcie_aer_ce_mask = 0x1
ndd -set /dev/tcp tcp_conn_req_max_q 102400
ndd -set /dev/tcp tcp_conn_req_max_q0 102400
ndd -set /dev/tcp tcp_max_buf 4194304
ndd -set /dev/tcp tcp_cwnd_max 2097152
ndd -set /dev/tcp tcp_recv_hiwat 400000
ndd -set /dev/tcp tcp_xmit_hiwat 400000
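Values set with ndd do not survive a reboot, so they are commonly reapplied from a startup script. The following is a minimal sketch of such a script, not the one used in the study. The function names and the DRY_RUN variable are illustrative assumptions; by default the script only prints the commands so that they can be reviewed first.

```shell
#!/bin/sh
# Sketch: emit (or apply) the TCP tunables listed in this section.
# DRY_RUN is a hypothetical switch; it defaults to 1, so nothing is changed
# unless the script is run with DRY_RUN=0 as root on the target system.
DRY_RUN=${DRY_RUN:-1}

# Name/value pairs copied from the ndd commands in this section.
tcp_tunables() {
    cat <<'EOF'
tcp_conn_req_max_q 102400
tcp_conn_req_max_q0 102400
tcp_max_buf 4194304
tcp_cwnd_max 2097152
tcp_recv_hiwat 400000
tcp_xmit_hiwat 400000
EOF
}

apply_tcp_tunables() {
    tcp_tunables | while read -r name value; do
        if [ "$DRY_RUN" = 1 ]; then
            # Print the command instead of running it.
            echo "ndd -set /dev/tcp $name $value"
        else
            ndd -set /dev/tcp "$name" "$value"
        fi
    done
}

apply_tcp_tunables
```

Keeping the values in one list makes it easier to compare them against the settings documented here when the system is retuned.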
Because the tests use multiple network interfaces, it is important to make sure that the interrupts from all the network interfaces are not handled by the same core. Network interrupts were enabled on one strand and disabled on the remaining three strands of each core using the following script:
| allpsr=`/usr/sbin/psrinfo | grep -v off-line | awk '{ print $1 }'`
  set $allpsr
  numpsr=$#
  while [ $numpsr -gt 0 ];
  do
      shift
      numpsr=`expr $numpsr - 1`
      tmp=1
      while [ $tmp -ne 4 ];
      do
          /usr/sbin/psradm -i $1
          shift
          numpsr=`expr $numpsr - 1`
          tmp=`expr $tmp + 1`
      done
  done | 
The following example shows psrinfo output before running the script:
| # psrinfo | more
0       on-line   since 12/06/2006 14:28:34
1       on-line   since 12/06/2006 14:28:35
2       on-line   since 12/06/2006 14:28:35
3       on-line   since 12/06/2006 14:28:35
4       on-line   since 12/06/2006 14:28:35
5       on-line   since 12/06/2006 14:28:35
          ................. | 
The following example shows psrinfo output after running the script:
| 0       on-line   since 12/06/2006 14:28:34
1       no-intr   since 12/07/2006 09:17:04
2       no-intr   since 12/07/2006 09:17:04
3       no-intr   since 12/07/2006 09:17:04
4       on-line   since 12/06/2006 14:28:35
5       no-intr   since 12/07/2006 09:17:04
          ................. | 
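Before changing interrupt handling on a live system, it can help to preview which strands the script would affect. The following dry-run sketch mirrors the loop logic of the script above but only prints the processor IDs that would be passed to psradm -i. It assumes four strands per core with consecutive IDs, as on the T1000 used here; the function name strands_to_mask is hypothetical.

```shell
#!/bin/sh
# Dry run: print the virtual processor IDs whose interrupts would be disabled.
# Assumes 4 strands per core and that IDs are listed in psrinfo order.
strands_to_mask() {
    while [ $# -gt 0 ]; do
        shift                    # first strand of the core keeps interrupts
        n=1
        # Guard on $# so a processor count that is not a multiple of 4
        # does not cause shift to fail.
        while [ "$n" -lt 4 ] && [ $# -gt 0 ]; do
            echo "$1"            # this ID would be passed to psradm -i
            shift
            n=$((n + 1))
        done
    done
}

# Example with the first two cores (IDs 0 through 7).
strands_to_mask 0 1 2 3 4 5 6 7
```

For IDs 0 through 7, the sketch lists strands 1, 2, 3, 5, 6, and 7, which matches the no-intr pattern in the psrinfo output above.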
The following table shows the tuning settings used for the Proxy Server.
Table 6–1 Proxy Server Tuning Settings

| Component | Default | Tuned | 
|---|---|---|
| Access logging | enabled | disabled | 
| Thread pool | RqThrottle 128 | RqThrottle 320 | 
| HTTP listener | Non-secure listener on port 8080 | Non-secure listener on port 8080, ListenQ 8192 | 
| Keep alive | enabled | disabled | 
The proxy cache was placed on a tmpfs filesystem, which keeps all files in virtual memory. The following commands create and mount the filesystem:
| $ mkdir -p /proxycache
$ mount -F tmpfs -o size=5120m swap /proxycache | 
This creates a 5Gbytes filesystem in main memory. Although only 4Gbytes are actively used by the proxy server, a 5Gbytes filesystem provides some spare room.
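The size option follows directly from the cache size plus the spare room. As a small sketch of that arithmetic, the hypothetical helper below builds the mount command from a cache size and headroom given in Gbytes. It only prints the command and does not mount anything.

```shell
#!/bin/sh
# Sketch: derive the tmpfs size option from cache size plus headroom.
# Arguments: cache size (Gbytes), headroom (Gbytes), mount point.
tmpfs_mount_cmd() {
    cache_gb=$1
    headroom_gb=$2
    mount_point=$3
    size_mb=$(( (cache_gb + headroom_gb) * 1024 ))
    # Print the command rather than running it.
    echo "mount -F tmpfs -o size=${size_mb}m swap $mount_point"
}

# 4 Gbytes of cache plus 1 Gbyte of spare room, as in this study.
tmpfs_mount_cmd 4 1 /proxycache
```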
The following table contains the performance results for Proxy Server 4.0 running on the Sun SPARC Enterprise T1000 server.
| Target Rate | Throughput (operations/second) | Response Time (ms) | Error | Network Utilization | 
|---|---|---|---|---|
| 6000 | 5999.70 | 11.02 | 0% | 78% | 
| 6900 | 6906.71 | 11.10 | 0% | 88% | 
| 7500 | 7503.58 | 15.65 | 0.51% | 98% | 
| 8100 | 7925.65 | 293.03 | 2.15% | 100% | 
| 9000 | 7956.88 | 365.19 | 11.59% | 100% | 
- The Target Rate column specifies the rate at which clients were configured to submit requests.
- The Error column specifies the percentage of total requests that resulted in an error reported by the clients.
Further measurements indicated that the Sun SPARC Enterprise server had approximately 30% CPU idle time during peak loads of the benchmark test. Because the network link was saturated while the CPU was not, performance could potentially be increased by making additional network bandwidth available.
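A rough back-of-the-envelope check supports the view that the Gigabit link is the limiting factor. The sketch below multiplies a measured throughput by the 13 Kbytes average object size to estimate the response payload in Mbit/s. It assumes 1 Kbyte = 1024 bytes and ignores HTTP headers, TCP/IP overhead, and traffic to the origin servers, so it understates the real load on the link. The function name payload_mbps is hypothetical.

```shell
#!/bin/sh
# Sketch: estimate response payload bandwidth in Mbit/s.
# $1 = operations per second, $2 = average object size in Kbytes.
# Assumes 1 Kbyte = 1024 bytes; headers and protocol overhead are ignored.
payload_mbps() {
    awk -v ops="$1" -v kb="$2" \
        'BEGIN { printf "%.1f\n", ops * kb * 1024 * 8 / 1000000 }'
}

# Throughput measured at the 7500 target rate.
payload_mbps 7503.58 13
```

For the 7500 target rate, the estimate comes to roughly 800 Mbit/s of payload alone, which is consistent with a single Gigabit link approaching saturation at that load.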
References:
http://www.sun.com/blueprints/0607/820-2142.html
Overloading the server obj.conf with too many assign-name directives can have an adverse effect on performance. Each assign-name directive involves a regular expression comparison, which can be CPU intensive.
The following tables contain the performance results with varying numbers of assign-name directives in the server obj.conf.
The first set of data is for a server with cache enabled, and the content server present in the local network. Note that the response time is for a single request.
| Number of assign-name directives in obj.conf | Response time in milliseconds | 
|---|---|
| 10 | 1.05 | 
| 100 | 1.45 | 
| 250 | 1.8 | 
| 1000 | 4.3 | 
| 2000 | 7.35 | 
| 4000 | 13.65 | 
| 6000 | 20.0 | 
| 8000 | 26.15 | 
| 10000 | 32.5 | 
As the performance numbers show, response times increase markedly once the number of assign-name directives exceeds 100.
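To put a number on this growth, the average cost of each additional directive can be estimated from two rows of the table. The following sketch computes that slope in microseconds per directive. The function name per_directive_us is hypothetical, and the result is a simple two-point estimate rather than a fitted model.

```shell
#!/bin/sh
# Sketch: average added response time per assign-name directive.
# $1, $2 = directive count and response time (ms) for the first row
# $3, $4 = directive count and response time (ms) for the second row
# Prints the slope in microseconds per directive.
per_directive_us() {
    awk -v n1="$1" -v t1="$2" -v n2="$3" -v t2="$4" \
        'BEGIN { printf "%.2f\n", (t2 - t1) / (n2 - n1) * 1000 }'
}

# First and last rows of the table for the cache-enabled case.
per_directive_us 10 1.05 10000 32.5
```

With the first and last rows of the table, the estimate is about 3 microseconds per directive, which is negligible for a handful of directives but adds up to tens of milliseconds at several thousand.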
The following data was obtained with the cache disabled and the content server residing in a remote network.
| Number of assign-name directives in obj.conf | Response time in milliseconds | 
|---|---|
| 10 | 238.5 | 
| 100 | 239.7 | 
| 250 | 240.3 | 
| 1000 | 242.2 | 
| 2000 | 245.3 | 
| 4000 | 252.3 | 
| 6000 | 258.2 | 
| 8000 | 264.3 | 
| 10000 | 271.2 | 
In this data, the combination of network delay and the absence of a disk cache tends to hide the performance drop caused by the computational cost of a high number of assign-name directives.
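The masking effect is easier to see when the increase is expressed relative to the baseline. The sketch below computes the percentage increase in response time between 10 and 10000 directives for both data sets. The function name pct_increase is hypothetical.

```shell
#!/bin/sh
# Sketch: percentage increase from a baseline response time to a later one.
# $1 = baseline response time (ms), $2 = later response time (ms).
pct_increase() {
    awk -v a="$1" -v b="$2" 'BEGIN { printf "%.1f\n", (b - a) / a * 100 }'
}

pct_increase 1.05 32.5     # cache enabled, content server on local network
pct_increase 238.5 271.2   # cache disabled, content server on remote network
```

The absolute added delay is similar in both cases (about 31 ms and 33 ms), but it is a roughly thirtyfold increase against the 1.05 ms baseline and only about a 14% increase against the 238.5 ms baseline.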
Recommendations:
Do not let the number of assign-name directives run into the hundreds.
Place assign-name directives that match commonly accessed URLs earlier in obj.conf.
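A quick way to check a configuration against the first recommendation is to count the directives. The following sketch counts the lines in a configuration file that mention assign-name, skipping lines that begin with a # comment marker. The function name count_assign_name is hypothetical, and the path to obj.conf depends on the installation.

```shell
#!/bin/sh
# Sketch: count assign-name directives in an obj.conf file.
# $1 = path to the configuration file.
# Lines whose first non-blank character is '#' are treated as comments.
count_assign_name() {
    grep -v '^[[:space:]]*#' "$1" | grep -c 'assign-name'
}

# Run against a file only when a path is given on the command line.
if [ $# -gt 0 ]; then
    count_assign_name "$1"
fi
```

If the count is in the hundreds, consider consolidating patterns, and review the order so that the most frequently matched URLs are tested first.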