Appendix B Analysis Tools

Sun Java System Portal Server 6 2005Q1 Deployment Planning Guide

Appendix B
Analysis Tools

The Sun Java™ Enterprise System and SDK include default setting options to ensure a satisfactory out-of-the-box experience. However, these options might not provide optimal performance for your web applications in the Sun Java System Portal Server production environment. This section describes some alternative options and basic tuning techniques.

Note

The tuning settings discussed in this section focus on Portal Server residing on the Solaris platform. However, the principles can be applied to other UNIX-type operating systems.

Table B-1 below lists the performance analysis tools that will help in providing feedback for tuning the Portal Server and its web container. In addition to performance issues, many of these tools can be used to detect other types of bottlenecks at the overall operating system level.

Many tool descriptions provide sample output, suggestions for interpreting output results, tips on improving output results, and links to related sites.

Table B-1  Performance Analysis Tools

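Category	Type	Name	Parameters	Usage
Analysis Tool	Solaris 8 and Solaris 9	mpstat		CPU utilization
Analysis Tool	Solaris 8 and Solaris 9	iostat		Disk I/O subsystem
Analysis Tool	Solaris 8 and Solaris 9	netstat		Network subsystem
Analysis Tool	Solaris 8 and Solaris 9	netstat	-I hme0 10	Interface bandwidth
Analysis Tool	Solaris 8 and Solaris 9	netstat	-sP tcp	TCP kernel module
Analysis Tool	Solaris 8 and Solaris 9	netstat	-a | grep <your_hostname> | wc -l	Socket connection count
Analysis Tool	Portal Server on App Server container	-verbose:gc		Garbage collection
Tuning Parameters	Solaris 8 and Solaris 9	/etc/system	Various	Performance
Tuning Parameters	Solaris 8 and Solaris 9	/etc/rc2.d/ttuning parameters file	Various	TCP kernel tuning parameters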

mpstat

The mpstat utility is a useful tool to monitor CPU utilization, especially with multithreaded applications running on multiprocessor machines, which is a typical configuration for enterprise solutions.

Use mpstat with an interval argument of 5 to 10 seconds.

Output sampled at a shorter interval might be more difficult to analyze. A longer interval smooths the data by removing spikes that could otherwise mislead the result.

Output

# mpstat 10
CPU minf mjf xcal intr ithr  csw icsw migr smtx  srw syscl usr sys  wt idl
  0    1   0 5529  442  302  419  166   12  196    0   775  95   5   0   0
  1    1   0  220  237  100  383  161   41   95    0   450  96   4   0   0
  4    0   0   27  192  100  178   94   38   44    0   100  99   1   0   0

What to Look For

Note the much higher intr and ithr values for certain CPUs. Solaris selects some CPUs to handle the system interrupts. Which CPUs are chosen, and how many, depends on the I/O devices attached to the system, the physical location of those devices, and whether interrupts have been disabled on a CPU (psradm command).

intr - Interrupts.

ithr - Thread interrupts (not including the clock interrupts).

csw - Voluntary context switches. When this number slowly increases and the application is not I/O bound, it may indicate mutex contention.

icsw - Involuntary context switches. When this number increases past 500, the system is under a heavy load.

smtx - Spins on mutexes. A sharp increase, for example from 50 to 500, is a sign of a system resource bottleneck (for example, network or disk).

usr, sys, and idl - Together, these three columns represent CPU saturation. A well-tuned application under full load (0% idle) should show 80% to 90% usr time and, correspondingly, 20% to 10% sys time. A smaller percentage value for sys reflects more time for user code and less preemption, which results in greater throughput for the Portal application.
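The checks above can be applied mechanically to captured mpstat output. The following is a minimal Python sketch, not part of the Portal Server tooling: the function names parse_mpstat and check_cpu are hypothetical, the thresholds are the rules of thumb from this section, and the input is assumed to be a single header line followed by one row per CPU.

```python
# Sample rows copied from the mpstat output shown in this section.
SAMPLE = """\
CPU minf mjf xcal intr ithr csw icsw migr smtx srw syscl usr sys wt idl
0 1 0 5529 442 302 419 166 12 196 0 775 95 5 0 0
1 1 0 220 237 100 383 161 41 95 0 450 96 4 0 0
4 0 0 27 192 100 178 94 38 44 0 100 99 1 0 0
"""


def parse_mpstat(text):
    """Return one dict per CPU row, keyed by the column names in the header."""
    lines = [ln.split() for ln in text.strip().splitlines() if ln.strip()]
    header, rows = lines[0], lines[1:]
    return [dict(zip(header, map(int, row))) for row in rows]


def check_cpu(row, icsw_limit=500, sys_max=20):
    """Apply this section's rules of thumb to one CPU row (hypothetical helper)."""
    notes = []
    if row["icsw"] > icsw_limit:
        notes.append("icsw above %d: heavy load" % icsw_limit)
    if row["idl"] == 0 and row["sys"] > sys_max:
        notes.append("sys above %d%% at full load" % sys_max)
    return notes


cpus = parse_mpstat(SAMPLE)
for cpu in cpus:
    print(cpu["CPU"], cpu["usr"], cpu["sys"], check_cpu(cpu) or "ok")
```

A single sample says little about trends such as a slowly increasing csw value, so in practice the same parsing would be run over several consecutive intervals.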

Considerations

Make your application available to as many CPUs as it can efficiently use. As an example, if one instance gives the best performance on 2 CPUs, you can expect that creating 14 two-CPU processor sets would yield the best performance.

An increasing csw value indicates an increase in network use. A common cause of a high csw value is too many socket connections, created either by not pooling connections or by handling new connections inefficiently. If this is the case, you would also see a high TCP connection count when executing netstat -a | wc -l. Refer to the netstat section.

Do you observe increasing icsw? A common cause of this is preemption, most likely because of an end of time slice on the CPU.

iostat

The iostat tool gives statistics on the disk I/O subsystem. The iostat command has many options. More information can be found in the man pages. The following typical options provide information on locating I/O bottlenecks.

Output

# iostat -xn 10
                    extended device statistics
 r/s  w/s   kr/s   kw/s wait actv wsvc_t asvc_t %w %b device
 0.0  0.0    0.0    0.0  0.0  0.0    0.0    0.0  0  0 fd0
 2.7 58.2   14.6 2507.0  0.0  1.4    0.0   23.0  0 52 d0
47.3  0.0 2465.6    0.0  0.0  0.4    0.0    8.8  0 30 d1

What to Look For

%b - Percentage of time the disk is busy (transactions in progress). Average %b values over 25 could indicate a bottleneck.

%w - Percentage of time transactions are waiting for service (queue non-empty).

asvc_t - Average response time of active transactions, in milliseconds. Despite its label, this field indicates the time between a user process issuing a read and the read completing. Consistent values over 30 ms could indicate a bottleneck.
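As an illustration of these thresholds, the following Python sketch parses the extended device statistics and lists the devices that exceed either limit. The names parse_iostat and suspects are hypothetical, and the sketch checks a single sample, whereas the guidance above refers to average and consistent values over time.

```python
# Sample rows copied from the iostat -xn output shown in this section.
SAMPLE = """\
r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 fd0
2.7 58.2 14.6 2507.0 0.0 1.4 0.0 23.0 0 52 d0
47.3 0.0 2465.6 0.0 0.0 0.4 0.0 8.8 0 30 d1
"""


def parse_iostat(text):
    """Return one dict per device; every column except 'device' becomes a float."""
    lines = [ln.split() for ln in text.strip().splitlines() if ln.strip()]
    header, rows = lines[0], lines[1:]
    devices = []
    for row in rows:
        record = dict(zip(header, row))
        devices.append(
            {k: (v if k == "device" else float(v)) for k, v in record.items()}
        )
    return devices


def suspects(devices, busy_limit=25.0, svc_limit_ms=30.0):
    """Names of devices whose %b or asvc_t exceeds this section's thresholds."""
    return [
        d["device"]
        for d in devices
        if d["%b"] > busy_limit or d["asvc_t"] > svc_limit_ms
    ]


print(suspects(parse_iostat(SAMPLE)))
```

With the sample above, d0 and d1 are reported because their %b values (52 and 30) exceed 25; neither exceeds the 30 ms asvc_t limit.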

Considerations

Add more disks to the file system. If you are using a single-disk file system, upgrading to hardware or software RAID is the next logical step. Hardware RAID is significantly faster than software RAID and is highly recommended. A software RAID solution adds CPU load to the system.

Depending on storage hardware and application behavior, there may be a better block size to use than the UFS default of 8192 bytes (8 KB). Consult the Solaris System Administration Guide.

netstat

The netstat tool gives statistics on the network subsystem. It can be used to analyze many aspects of the network subsystem, two of which are the TCP/IP kernel module and the interface bandwidth. An overview of both uses follows.

netstat -I hme0 10

These netstat options are used to analyze interface bandwidth. The upper bound (max) of the current throughput can be calculated from the output. Only an upper bound can be calculated because netstat reports packet counts, and packets are not necessarily of maximum size. The upper bound of the bandwidth can be calculated using the following equation:

Bandwidth Used = ((Total number of Packets) / (Polling Interval (10))) * MTU (1500 default)

The current MTU for an interface can be found with: ifconfig -a
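A worked example of the equation, as a small Python sketch. The function name bandwidth_upper_bound is hypothetical; the packet counts are the per-interval input values from the (Total) columns of the sample netstat -I hme0 10 output in this section, and the result is in bytes per second, converted to Mbit/s for printing.

```python
def bandwidth_upper_bound(packets, interval_s=10, mtu_bytes=1500):
    """Upper bound on throughput in bytes per second: (packets / interval) * MTU.

    The multiplication is done first so that whole-number inputs give an
    exact result.
    """
    return packets * mtu_bytes / interval_s


# Input packet counts per 10-second interval, (Total) columns of the sample output.
input_packets = [84144, 96144, 89373, 84568, 84720]

for count in input_packets:
    bytes_per_s = bandwidth_upper_bound(count)
    print(count, "packets ->", round(bytes_per_s * 8 / 1e6, 1), "Mbit/s upper bound")
```

For the first interval this gives 12,621,600 bytes per second, or roughly 101 Mbit/s. Because the calculation assumes every packet is MTU-sized, the real throughput is at or below this figure.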

netstat -I hme0 10 Output

# netstat -I hme0 10
    input   hme0      output           input  (Total)    output
packets   errs packets   errs colls packets   errs packets   errs colls
122004816 272  159722061 0    0     348585818 2582 440541305 2    2
0         0    0         0    0     84144     0    107695    0    0
0         0    0         0    0     96144     0    123734    0    0
0         0    0         0    0     89373     0    114906    0    0
0         0    0         0    0     84568     0    108759    0    0
0         0    0         0    0     84720     0    108800    0    0

What to Look For

colls - collisions. If your network is not switched, then a low level of collisions is expected. As the network becomes increasingly saturated, collisions increase and eventually become a bottleneck. The best solution for collisions is a switched network.

errs - errors. The presence of errors could indicate device errors. If your network is switched, errors indicate that you are nearly consuming the bandwidth capacity of your network. The solution to this problem is to give the system more bandwidth, which can be achieved through more network interfaces or a network bandwidth upgrade. This is highly dependent on your particular network architecture.

Considerations

If network saturation is occurring quickly (saturation at fewer than 8 CPUs for an application server running on 100 Mbit Ethernet), then an investigation to ensure conservative network usage is a good first step.

Increase network bandwidth. Possible steps include upgrading to a switched network, adding network interfaces, or upgrading to a higher-bandwidth network to accommodate your network traffic demand.

netstat -sP tcp

These netstat options are used to analyze the TCP kernel module. Many of the fields reported represent fields in the kernel module that indicate bottlenecks. These bottlenecks can be addressed using the ndd command and the tuning parameters referenced in the /etc/inet

netstat -sP tcp Output

# netstat -sP tcp
TCP tcpRtoAlgorithm      =       4  tcpRtoMin           =    400
<snip>
    tcpInDupSegs         =    1144  tcpInDupBytes       = 132520
    tcpInPartDupSegs     =       1  tcpInPartDupBytes   =    416
    tcpInPastWinSegs     =       0  tcpInPastWinBytes   =      0
    tcpInWinProbe        =      46  tcpInWinUpdate      =     48
    tcpInClosed          =     251  tcpRttNoUpdate      =    344
    tcpRttUpdate         = 1105386  tcpTimRetrans       =    989
    tcpTimRetransDrop    =       5  tcpTimKeepalive     =    818
    tcpTimKeepaliveProbe =     183  tcpTimKeepaliveDrop =      0
    tcpListenDrop        =       0  tcpListenDropQ0     =      0
    tcpHalfOpenDrop      =       0  tcpOutSackRetrans   =     56

What to Look For

tcpListenDrop - If after several looks at the command output the tcpListenDrop continues to increase, it could indicate a problem with queue size.

Considerations

A possible cause of increasing tcpListenDrop is the application throughput being bottlenecked by the number of executing threads. At this point increasing application threads may be a good thing to try.

Increase queue size. Increase the request queue sizes using ndd. More information on other ndd commands is available in the Solaris Administration Guide.

ndd -set /dev/tcp tcp_conn_req_max_q <value>

ndd -set /dev/tcp tcp_conn_req_max_q0 <value>
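Checking whether tcpListenDrop keeps growing means comparing several captures of the netstat -sP tcp output. The following Python sketch shows one way to do that. The helper names are hypothetical, the first sample is taken from the output shown in this section, and the two later samples are invented purely to exercise the check.

```python
import re

# First sample: the last two lines of the netstat -sP tcp output in this section.
FIRST = """\
tcpListenDrop = 0 tcpListenDropQ0 = 0
tcpHalfOpenDrop = 0 tcpOutSackRetrans = 56
"""
# Later samples are hypothetical values, not measurements.
SECOND = "tcpListenDrop = 12 tcpListenDropQ0 = 0"
THIRD = "tcpListenDrop = 40 tcpListenDropQ0 = 0"


def parse_tcp_stats(text):
    """Extract 'name = value' pairs into a dict of integer counters."""
    return {
        name: int(value)
        for name, value in re.findall(r"(\w+)\s*=\s*(\d+)", text)
    }


def is_increasing(samples, counter="tcpListenDrop"):
    """True if the counter never decreases and ends higher than it started."""
    values = [sample[counter] for sample in samples]
    never_drops = all(b >= a for a, b in zip(values, values[1:]))
    return never_drops and values[-1] > values[0]


samples = [parse_tcp_stats(text) for text in (FIRST, SECOND, THIRD)]
print(is_increasing(samples))
```

The regular expression tolerates the uneven spacing around the equals sign that appears in real output.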

netstat -a | grep <your_hostname> | wc -l

Running this command gives a rough count of socket connections on the system. The number of connections open at one time is limited; you can use this tool to look for bottlenecks.

netstat -a | grep <your_hostname> | wc -l Output

#netstat -a | wc -l

34567

What to Look For

socket count - If the number returned is greater than 20,000, then the number of socket connections could be a bottleneck.

Consider the following:

Decrease the port number at which anonymous socket connections start.

ndd -set /dev/tcp tcp_smallest_anon_port <value>

Decrease the time a TCP connection stays in TIME_WAIT.

ndd -set /dev/tcp tcp_time_wait_interval <value>

Tuning Parameters for /etc/system

Table B-2 is a list of /etc/system tuning parameters used during the performance study. The changes are applied by appending each to the /etc/system file.

Table B-2  /etc/system Options

/etc/system Option	Description
set rlim_fd_max=<value>	"Hard" limit on file descriptors that a single process might have open. To override this limit requires superuser privilege.
set tcp:tcp_conn_hash_size=<value>	Controls the hash table size in the TCP module for all TCP connections.
set autoup=<value>	Along with tune_t_fsflushr, autoup controls the amount of memory examined for dirty pages in each invocation and the frequency of file system sync operations. The value of autoup is also used to control whether a buffer is written out from the free list. Buffers marked with the B_DELWRI flag (file content pages that have changed) are written out whenever the buffer has been on the list for longer than autoup seconds. Increasing the value of autoup keeps the buffers in memory for a longer time.
set tune_t_fsflushr=<value>	Specifies the number of seconds between fsflush invocations.
set rechoose_interval=<value>	Number of clock ticks before a process is deemed to have lost all affinity for the last CPU it ran on. After this interval expires, any CPU is considered a candidate for scheduling a thread. This parameter is relevant only for threads in the timesharing class. Real-time threads are scheduled on the first available CPU.

A description of all /etc/system parameters can be found in the Solaris Tunable Parameters Reference Manual.
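Appending entries by hand makes it easy to set the same parameter twice. The following Python sketch shows one way to build the lines to append while skipping parameters that are already set. The function name append_settings is hypothetical, and the numeric values are placeholders for illustration only, not recommendations.

```python
# Hypothetical existing file contents. Lines starting with * are comments.
EXISTING = "* local tuning\nset rlim_fd_max=8192\n"

# Placeholder values for illustration only; they are not recommended settings.
SETTINGS = [
    ("rlim_fd_max", 8192),
    ("tcp:tcp_conn_hash_size", 8192),
    ("autoup", 60),
]


def append_settings(existing_text, settings):
    """Return existing_text plus a 'set name=value' line for each unset name."""
    already_set = set()
    for line in existing_text.splitlines():
        line = line.strip()
        if line.startswith("set "):
            already_set.add(line[4:].split("=", 1)[0].strip())
    new_lines = [
        "set %s=%s" % (name, value)
        for name, value in settings
        if name not in already_set
    ]
    if not new_lines:
        return existing_text
    base = existing_text.rstrip("\n")
    prefix = base + "\n" if base else ""
    return prefix + "\n".join(new_lines) + "\n"


print(append_settings(EXISTING, SETTINGS))
```

The sketch only produces text. Writing the result to /etc/system requires superuser privilege, and the settings take effect at the next reboot.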

Table B-3 is a list of TCP kernel tuning parameters. These are the TCP tuning parameters known to most affect performance on Portal Servers. Recommended values for these parameters are discussed in the Identity Server Customization and API Guide.

Table B-3  TCP/IP Options

TCP/IP Options	Description
ndd -set /dev/tcp tcp_xmit_hiwat 65535	The default send window size in bytes.
ndd -set /dev/tcp tcp_recv_hiwat 65535	The default receive window size in bytes.
ndd -set /dev/tcp tcp_cwnd_max 65535	The maximum value of the TCP congestion window (cwnd) in bytes.
ndd -set /dev/tcp tcp_rexmit_interval_min 3000	The default minimum retransmission timeout (RTO) value in milliseconds. The calculated RTO for all TCP connections cannot be lower than this value.
ndd -set /dev/tcp tcp_rexmit_interval_max 10000	The default maximum retransmission timeout (RTO) value in milliseconds. The calculated RTO for all TCP connections cannot exceed this value.
ndd -set /dev/tcp tcp_rexmit_interval_initial 3000	The default initial retransmission timeout (RTO) value in milliseconds.
ndd -set /dev/tcp tcp_time_wait_interval 60000	The time in milliseconds a TCP connection stays in TIME-WAIT state. Refer to RFC 1122, 4.2.2.13 for more information.
ndd -set /dev/tcp tcp_keepalive_interval 900000	The time in milliseconds a TCP connection stays in KEEP-ALIVE state. Refer to RFC 1122, 4.2.3.6 for more information.
ndd -set /dev/tcp tcp_conn_req_max_q <value>	The default maximum number of pending TCP connections for a TCP listener waiting to be accepted by accept(SOCKET).
ndd -set /dev/tcp tcp_conn_req_max_q0 <value>	The default maximum number of incomplete (three-way handshake not yet finished) pending TCP connections for a TCP listener. Refer to RFC 793 for more information on the TCP three-way handshake.
ndd -set /dev/tcp tcp_ip_abort_interval <value>	The default total retransmission timeout value for a TCP connection in milliseconds. For a given TCP connection, if TCP has been retransmitting for the tcp_ip_abort_interval period and has not received any acknowledgment from the other endpoint during this period, TCP closes the connection.
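Settings made with ndd do not persist across a reboot, which is why they are usually collected in a startup script. The following Python sketch renders the Table B-3 entries that have explicit values as ndd command lines and runs a simple sanity check on the retransmission intervals. The function names are hypothetical, the ordering check is an assumption about sensible values rather than a documented requirement, and entries shown as <value> in the table are omitted.

```python
# Values exactly as listed in Table B-3; <value> entries are left out.
TCP_SETTINGS = [
    ("tcp_xmit_hiwat", 65535),
    ("tcp_recv_hiwat", 65535),
    ("tcp_cwnd_max", 65535),
    ("tcp_rexmit_interval_min", 3000),
    ("tcp_rexmit_interval_max", 10000),
    ("tcp_rexmit_interval_initial", 3000),
    ("tcp_time_wait_interval", 60000),
    ("tcp_keepalive_interval", 900000),
]


def render_ndd_commands(settings, device="/dev/tcp"):
    """Return one 'ndd -set' command line per (name, value) pair."""
    return ["ndd -set %s %s %d" % (device, name, value) for name, value in settings]


def rexmit_order_ok(settings):
    """Assumed sanity check: minimum <= initial <= maximum retransmission timeout."""
    values = dict(settings)
    return (
        values["tcp_rexmit_interval_min"]
        <= values["tcp_rexmit_interval_initial"]
        <= values["tcp_rexmit_interval_max"]
    )


commands = render_ndd_commands(TCP_SETTINGS)
print("\n".join(commands))
print("retransmission intervals consistent:", rexmit_order_ok(TCP_SETTINGS))
```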


Part No: 817-7697. Copyright 2004 Sun Microsystems, Inc. All rights reserved.

