CHAPTER 7

Running an OpenSSP

OpenSSP refers to the ability to run third-party software on an SSP workstation or server. Third-party software consists of applications other than the SSP software and Solaris operating environment. Traditionally, SSP software has been the only application allowed to run on SSP workstations or servers. However, in an OpenSSP environment, lightweight software applications, such as boot disk management utilities and monitoring and backup agents, can be run on the SSP.

This chapter explains how to determine whether an SSP workstation or server has sufficient software and hardware resources to run third-party software alongside the SSP software.

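This chapter describes the following topics:

Impact of Third-Party Applications on the SSP
Required Resources for an OpenSSP
Estimating SSP and Third-Party Software Memory Usage
Calculating CPU Utilization
Determining Hardware for an OpenSSP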

Impact of Third-Party Applications on the SSP

This section discusses the impact of third-party applications on the SSP. SSP software runs on a workstation or server running the Solaris operating environment. This workstation or server is dedicated to controlling and monitoring a Sun Enterprise 10000 system and must respond promptly to incoming events from the system's control board. These events include environmental conditions (such as over-temperature boards), software problems (panics), and hardware problems (arbstops). If the SSP is slow to respond, these events can be dropped or handled too late, and information about failures can be lost with no alert or log trail. In the worst case, hardware can be damaged because the SSP did not act in time.

In previous releases, SSP software was the only application permitted to run on an SSP workstation or server. This restriction prevented non-SSP software from interfering with the SSP's critical mission of monitoring and controlling a Sun Enterprise 10000 system. It remains an important consideration when SSP software runs on SPARCstation 5 workstations, which are slower and have less memory than the Sun Netra T1 server or the Sun Ultra 5 and Sun Enterprise 250 workstations. The only non-SSP processes previously allowed on an SSP were the system background programs of the Solaris operating environment and the Common Desktop Environment (CDE).

However, because of increasing demand for third-party software agents that enable centralized control of multiple systems on the network, you can now run third-party applications on the SSP. These agents gather information about the Sun Enterprise 10000 system and the SSP and report back to a central server. Backup servers are structured similarly, allowing centralized backup of multiple networked systems.

When you run third-party software on an SSP system, the primary goal must be non-interference with the SSP software and its ability to monitor and control the Sun Enterprise 10000 system. SSP software normally requires few hardware resources, but when it needs those resources, they must be immediately available.

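To determine whether you can run third-party software on the SSP, you must perform the following tasks:

Verify that the SSP has the required resources for an OpenSSP.
Estimate SSP and third-party software memory usage.
Calculate CPU utilization.
Determine the appropriate hardware for an OpenSSP.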

Required Resources for an OpenSSP

For a list of the minimum OpenSSP software and hardware resource requirements, see TABLE 2-2. The OpenSSP requirements for disk space, processor speed, and CPU utilization are higher than the SSP minimum requirements because of the extra load expected on the SSP.

You must also keep current with the latest recommended patches for the Solaris operating environment and the SSP software, especially for performance improvements. In addition, consider using the Solaris 8 or 9 operating environment to fully benefit from SSP performance features. See the following sections for details.

Keeping Current with Recommended Patches

Sun produces a set of patches, the Recommended and Security Solaris Patch Clusters, which contain Solaris software updates of universal interest for each version of the Solaris operating environment. These selected patches are important and highly recommended because they fix critical system, user, or security-related bugs; some of them also fix performance problems. They are generally safe to apply, unlike higher-risk patches and patches that add new features, new drivers, or low-priority fixes, which are not included in these clusters. A prudent system administrator keeps systems at the latest recommended patch level to protect against system problems. Solaris Recommended Patch Clusters are available at the SunSolve Web site, http://sunsolve.Sun.COM/.

Update your SSP workstation regularly with all the SSP patches available for your release of the SSP software, except for the special-case patches noted in the patch README file; apply those special-case patches on a case-by-case basis. SSP patches are also available from the SunSolve Web site, http://sunsolve.Sun.COM/.

Using the Solaris 8 or 9 Operating Environment

To benefit fully from the SSP performance improvements in SSP 3.5, use the Solaris 8 or 9 operating environment (with the appropriate patches, as explained in SSP 3.5 Hardware and Software Requirements) on the SSP workstation or server, if possible. The Solaris 8 or 9 improvement most relevant to the SSP is better thread handling: when a real-time SSP thread is blocked by a lower-priority thread, the kernel temporarily raises the priority of the blocking thread so that it can finish quickly and release the resource. This results in faster SSP response to Sun Enterprise 10000 events.
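The thread-handling improvement described above is generally known as priority inheritance. As an illustration only, the following Python sketch models it with hypothetical `Thread` and `Lock` classes; it is not Solaris kernel code, and the priority numbers are arbitrary (a larger number means a higher priority).

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(eq=False)  # compare threads by identity, not by field values
class Thread:
    name: str
    base_priority: int  # hypothetical scale: a larger number means higher priority
    inherited: int = 0  # boost lent by higher-priority threads blocked on our lock

    @property
    def effective_priority(self) -> int:
        return max(self.base_priority, self.inherited)


@dataclass
class Lock:
    holder: Optional[Thread] = None
    waiters: List[Thread] = field(default_factory=list)

    def acquire(self, thread: Thread) -> bool:
        """Take the lock if it is free; otherwise block and lend priority to the holder."""
        if self.holder is None:
            self.holder = thread
            return True
        self.waiters.append(thread)
        # Priority inheritance: the holder temporarily runs at the highest
        # waiter's priority so that it finishes and releases the lock sooner.
        self.holder.inherited = max(w.effective_priority for w in self.waiters)
        return False

    def release(self) -> Optional[Thread]:
        """Drop the holder's boost and hand the lock to the highest-priority waiter."""
        if self.holder is None:
            raise RuntimeError("lock is not held")
        self.holder.inherited = 0
        if not self.waiters:
            self.holder = None
            return None
        nxt = max(self.waiters, key=lambda t: t.effective_priority)
        self.waiters.remove(nxt)
        self.holder = nxt
        if self.waiters:
            nxt.inherited = max(w.effective_priority for w in self.waiters)
        return nxt


agent = Thread("third-party agent", base_priority=10)
ssp = Thread("real-time SSP thread", base_priority=100)
resource = Lock()
resource.acquire(agent)          # the agent holds a shared resource
resource.acquire(ssp)            # the SSP thread blocks on it
print(agent.effective_priority)  # 100: the agent runs at the SSP thread's priority
resource.release()
print(agent.effective_priority, resource.holder.name)  # 10 real-time SSP thread
```

Releasing the lock drops the boost, so the third-party thread returns to its normal priority as soon as the SSP thread has the resource.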

Other performance improvements in the Solaris 8 or 9 operating environment can affect third-party applications running on the SSP, especially if those applications communicate over a WAN or use a large number of open files or sockets. For details, see What's New in the Solaris 8 or 9 Operating Environment at
http://www.sun.com/software/solaris/whatsnew.html.


Estimating SSP and Third-Party Software Memory Usage

This section describes how to estimate the memory (real memory and swap space) requirements for third-party applications used on the SSP. When you measure the memory requirements of the SSP workstation, you must consider the cumulative requirements of all the applications running on the workstation as a whole, not just the impact of an individual application.

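You must first determine the type of run-time environment involved by answering the following questions:

How many domains will the Sun Enterprise 10000 system have?
How many Hostview sessions will run at the same time?
Will Sun Management Center run on the SSP?
Which third-party applications will run on the SSP?

Your responses to these questions are used to determine SSP and third-party memory usage, as explained in the following sections.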

Calculating Memory Usage by Third-Party Applications

If an SSP is running third-party applications, you must determine how much virtual and real memory those applications use. This information is usually found in the installation or administration guide for each application.

If the information is not available, you can calculate it from the memory usage output of the pmap command. Run pmap while the application is in an active running state and the system is not thrashing (paging at a high rate); the output then shows how much resident memory the application requires when it is active. For details, see the following procedure, To Obtain the Virtual Memory Amount Used by an Application.

For information on determining whether a system is thrashing, see Verifying That You Have Sufficient Real Memory .


To Obtain the Virtual Memory Amount Used by an Application

1. Type the pmap -x command, followed by the process ID of the application.

The following example shows how to size an application called CST (Configuration and Service Tracker), which has one process, cstd. CST is used only as an example of measuring memory usage; what it does is not relevant here.

# pgrep cstd 
406
# /usr/proc/bin/pmap -x 406
Address  Kbytes Resident Shared Private Permissions    Mapped File 
. . .
total Kb   2848    2496   1400   1096

The last line in this example shows that 1096 Kbytes of resident private memory is being used.

2. Calculate the virtual memory amount by subtracting the shared memory from the total memory, then round the result up to the next whole megabyte.

In the example above, the virtual memory is 1448 Kbytes, derived by subtracting the shared memory (1400 Kbytes) from the total memory (2848 Kbytes). This value is rounded up to 2 megabytes (MB). CST therefore requires 1 MB of resident memory and 2 MBs of virtual memory.

3. If the third-party application has its own application-specific shared libraries, add the real and virtual memory sizes of these libraries to these amounts.

The virtual memory used for these shared libraries is approximately the same as the shared library ( *.so ) file size. The resident memory used by shared libraries is also shown in the pmap -x command output.
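Steps 2 and 3 are simple arithmetic on the `total Kb` line, so they are easy to script. The following Python sketch assumes the `pmap -x` output layout shown in the CST example (Kbytes, Resident, Shared, and Private totals, in Kbytes); the function names are hypothetical, and the output format may differ between Solaris releases. It rounds virtual memory up, as step 2 specifies, and rounds resident memory to the nearest megabyte, as the CST example does.

```python
import math
import subprocess


def parse_pmap_total(pmap_output: str) -> dict:
    """Extract the Kbytes, Resident, Shared, and Private totals (in Kbytes)."""
    for line in pmap_output.splitlines():
        if line.startswith("total Kb"):
            values = [int(v) for v in line.split()[2:6]]
            return dict(zip(("kbytes", "resident", "shared", "private"), values))
    raise ValueError("no 'total Kb' line found in pmap output")


def estimate_memory_mb(totals: dict, lib_resident_kb: int = 0,
                       lib_virtual_kb: int = 0) -> tuple:
    """Return (resident MB, virtual MB) following steps 2 and 3.

    Virtual memory = total - shared, rounded up to a whole MB. Resident memory
    is the private resident figure, rounded to the nearest MB as in the CST
    example. The lib_* arguments add application-specific shared libraries
    (for virtual memory, roughly the size of the *.so files).
    """
    resident_kb = totals["private"] + lib_resident_kb
    virtual_kb = totals["kbytes"] - totals["shared"] + lib_virtual_kb
    return round(resident_kb / 1024), math.ceil(virtual_kb / 1024)


def pmap_totals(pid: int) -> dict:
    """Run pmap -x on a live process (path taken from the example above)."""
    result = subprocess.run(["/usr/proc/bin/pmap", "-x", str(pid)],
                            capture_output=True, text=True, check=True)
    return parse_pmap_total(result.stdout)


sample = "total Kb   2848    2496   1400   1096"
print(estimate_memory_mb(parse_pmap_total(sample)))  # (1, 2), as in the CST example
```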


To Calculate the Memory Requirements for an SSP Using the OpenSSP Memory Worksheet

Use the OpenSSP memory worksheet to determine the virtual and real memory requirements for an SSP workstation or server. TABLE 7-1 is an example of a completed SSP memory worksheet for a system with four domains and one Hostview session. The following steps explain how the sample entries were calculated. For details on how the predetermined values in the worksheet were derived, see Appendix B.

TABLE 7-1 Sample OpenSSP Memory Worksheet

Line  Item                                     Number      Real Memory (MB)       Virtual Memory (MB)    Real Memory     Virtual Memory
                                                                                                         Subtotal (MB)   Subtotal (MB)
1     System                                   1           60                     236                    60              236
2     Base SSP                                 1           22                     35                     22              35
3     Domains (1-16)                           4           3 x no. of domains     4 x no. of domains     12              16
4     Hostviews (0 or more)                    1           12 x no. of hostviews  17 x no. of hostviews  12              17
5     Sun Management Center (0 or 1)           0           0 or 26                0 or 31                0               0
6     Third-party applications                                                                           0               0
7     Subtotal (lines 1 through 6)                                                                       106             304
8     Kernel buffer memory (MB)                128 MB RAM  15% of RAM                                    19
9     Recommended real memory (lines 7 and 8)                                                            125
10    Reserved for /tmp/ in swapfs                                                512                                    512
11    Subtract amount of real memory                                                                                     -128
12    Recommended swap space size                                                                                        688
      (Virtual memory subtotal − real memory total) + /tmp/ reserved


1. In line 3:

    a. In the Number column, enter the highest number of domains (1 to 16) that you expect to have for your Sun Enterprise 10000 system.

    b. Multiply the number of domains by 3 MBs and enter the result in the Real Memory Subtotal column.

    c. Multiply the number of domains by 4 MBs and enter the result in the Virtual Memory Subtotal column.

In line 3 of the example worksheet, 4 domains are specified, which results in 12 MBs for the Real Memory Subtotal and 16 MBs for the Virtual Memory Subtotal.

2. In line 4:

    a. In the Number column, enter the number of Hostview applications (the SSP GUI) that you expect to run at the same time. (This entry is usually 1.)

    b. Multiply the number of Hostview applications by 12 MBs and enter the result in the Real Memory Subtotal column.

    c. Multiply the number of Hostview applications by 17 MBs and enter the result in the Virtual Memory Subtotal column.

In line 4 of the example worksheet, 1 Hostview application is specified. The Real Memory Subtotal value is 12 MBs, and the Virtual Memory Subtotal value is 17 MBs (see the last two columns).

3. If Sun Management Center is installed and running, enter the following in line 5:

    a. In the Number column, enter 1.

    b. In the Real Memory Subtotal column, enter 26 MBs.

    c. In the Virtual Memory Subtotal column, enter 31 MBs.

In line 5 of the example worksheet, Sun Management Center is not used, so 0 is entered in the Number, Real Memory Subtotal, and Virtual Memory Subtotal columns.

4. In line 6, enter the real and virtual memory amounts required for any third-party applications that will be running on the SSP workstation.

For details on estimating these memory requirements, see Calculating Memory Usage by Third-Party Applications .

In line 6 of the example worksheet, 0 is entered in the Real Memory Subtotal and Virtual Memory Subtotal columns because no third-party applications are being used.

5. In line 7, subtotal the values in the Real Memory Subtotal column and the Virtual Memory Subtotal column.

In line 7 of the example worksheet, the subtotal for the Real Memory Subtotal values is 106, and the subtotal for the Virtual Memory Subtotal values is 304.

6. In line 8:

    a. In the Number column, enter the amount of RAM that you will need. This number must be greater than 115% of the Real Memory subtotal in line 7, rounded up to the next 32 MB increment. (The Solaris operating environment uses 15% of the RAM for kernel buffer memory.)

    b. In the Real Memory Subtotal column, enter 15% of the RAM specified in the Number column. This is the amount of buffer memory used by the kernel.

In line 8 of the example worksheet, 128 MBs of RAM is specified in the Number column: 115% of the 106 MB Real Memory subtotal from line 7 is about 122 MBs, and the next 32 MB increment is 128 MBs. Also, 15% of 128 MBs of RAM yields 19 MBs of kernel buffer memory.

7. In line 9, add the values from lines 7 and 8 and enter the resulting value in the Real Memory Subtotal column. This number typically ranges from 128 MBs to 256 MBs.

In line 9 of the example worksheet, adding 106 MBs and 19 MBs results in the minimum memory requirement of 125 MBs.

8. In the Virtual Memory Subtotal column of line 11, enter the negative value of the RAM supplied in line 8.

In the example worksheet, the line 8 RAM value is 128 MBs, so -128 is entered in the Virtual Memory Subtotal column of line 11.

9. In line 12:

    a. Add the Virtual Memory Subtotal values from lines 7, 10, and 11. Because the line 11 value is negative, this subtracts the RAM from the sum of lines 7 and 10.

    b. Enter the result in the Virtual Memory Subtotal column. This number is the minimum swap space size needed by the SSP workstation and typically ranges from 512 MBs to 1 gigabyte (GB).

In line 12 of the example worksheet, 304 MBs of virtual memory is added to 512 MBs (for /tmp/), which results in 816 MBs. Subtracting the 128 MBs of RAM from 816 MBs yields a minimum swap space size of 688 MBs.
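The worksheet arithmetic can be checked with a short script. This Python sketch is not a Sun tool; the function name and its parameters are hypothetical. It uses the per-item values from TABLE 7-1, picks the next 32 MB increment above 115% of the line 7 real memory subtotal (step 6), and truncates the 15% kernel buffer to whole megabytes, as the example does (19 MBs).

```python
# Per-item values from TABLE 7-1, as (real MB, virtual MB).
SYSTEM = (60, 236)
BASE_SSP = (22, 35)
PER_DOMAIN = (3, 4)
PER_HOSTVIEW = (12, 17)
SUN_MC = (26, 31)
TMP_SWAPFS_MB = 512  # line 10: reserved for /tmp/ in swapfs


def openssp_memory_worksheet(domains, hostviews=1, sun_mc=False, third_party=(0, 0)):
    """Return the line 7, 8, 9, and 12 values of the worksheet, in MB.

    third_party is the (real MB, virtual MB) total entered in line 6.
    """
    if not 1 <= domains <= 16:
        raise ValueError("domains must be between 1 and 16")
    items = [
        SYSTEM,                                                      # line 1
        BASE_SSP,                                                    # line 2
        (PER_DOMAIN[0] * domains, PER_DOMAIN[1] * domains),          # line 3
        (PER_HOSTVIEW[0] * hostviews, PER_HOSTVIEW[1] * hostviews),  # line 4
        SUN_MC if sun_mc else (0, 0),                                # line 5
        tuple(third_party),                                          # line 6
    ]
    real_subtotal = sum(real for real, _ in items)     # line 7
    virtual_subtotal = sum(virt for _, virt in items)  # line 7
    # Line 8: RAM must exceed 115% of the line 7 real subtotal; take the
    # next 32 MB increment above that value.
    ram = (int(real_subtotal * 1.15 // 32) + 1) * 32
    kernel_buffer = int(ram * 0.15)                     # 15% of RAM, whole MBs
    recommended_real = real_subtotal + kernel_buffer    # line 9
    swap = virtual_subtotal + TMP_SWAPFS_MB - ram       # line 12 = lines 7 + 10 + 11
    return {"real_subtotal": real_subtotal, "virtual_subtotal": virtual_subtotal,
            "ram": ram, "kernel_buffer": kernel_buffer,
            "recommended_real": recommended_real, "swap": swap}


print(openssp_memory_worksheet(domains=4, hostviews=1))
# ram 128, kernel_buffer 19, recommended_real 125, swap 688, as in TABLE 7-1
```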



Note - The size limit for a swap partition is 2 GBs. However, you can add multiple swap partitions if needed.



Bottom line: The minimum amount of memory needed for an SSP workstation is 128 MBs. If you do not want to calculate the exact amount required, 256 MBs of memory is more than sufficient if you are using other monitoring software. One GB of swap space is more than sufficient for virtual memory and swapfs (/tmp/) space.

Verifying That You Have Sufficient Real Memory

Virtual memory consists of real memory (RAM) and page file (swap) space on disk. Unlike some other systems, real memory for the Solaris operating environment is not mirrored in a swap file. It is no longer necessary to duplicate a page of swap for each page of real memory, so the old rule that "swap space size should be twice real memory size" no longer applies. The only swap space required is the amount of virtual memory that exceeds the real memory for your system.

The amount of real memory a process needs depends on its working set. The working set is the set of pages a process needs to work effectively. A working set needs to be in real memory, or the program may thrash. Thrashing occurs when there is insufficient real memory to hold the working sets of all active processes. As a result, the system spends an excessive amount of time paging working sets in and out of swap space.

The working set for a program is defined as W(t, ω), the set of pages referenced from time (t − ω) to time t. Typically, a program's working set does not change much over time, although it can change drastically on occasion, and increasing the time period ω does not have much effect on it. Pages currently in use are likely to be used in the near future, and memory outside the working set is rarely, if ever, used. Therefore, a program that keeps only its working set in real memory and the remainder in swap space performs almost as well as if all of its pages were in memory. This holds even though disk access is about 100,000 times slower than RAM access (about 10,000,000 nanoseconds versus 100 nanoseconds).
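The definition of W(t, ω) can be made concrete with a page reference trace. In this Python sketch, the trace is a hypothetical list of page numbers, and time is counted in page references rather than seconds, a simplification of the definition above.

```python
from typing import Sequence, Set


def working_set(trace: Sequence[int], t: int, omega: int) -> Set[int]:
    """Return W(t, omega): the pages referenced from time t - omega to time t.

    trace[i] is the page referenced at time i; time is counted in page
    references (a simplification) and both ends of the window are inclusive.
    """
    if omega < 0 or not 0 <= t < len(trace):
        raise ValueError("need omega >= 0 and 0 <= t < len(trace)")
    start = max(0, t - omega)
    return set(trace[start:t + 1])


# Hypothetical reference string: pages 1 and 2 dominate; 3 and 7 are rare.
trace = [1, 2, 1, 3, 1, 2, 1, 2, 7, 1, 2, 1]
print(working_set(trace, t=11, omega=4))  # pages referenced at times 7 through 11
print(working_set(trace, t=11, omega=8))  # doubling omega adds only one page
```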

However, if there is insufficient real memory to keep a process's working set in memory, the process can easily thrash and run much more slowly. Running fewer processes or adding real memory keeps the working sets in memory and stops the thrashing. Thrashing can impair the SSP's ability to handle events in a timely manner, resulting in time-outs and lost SNMP traps. You can prevent thrashing by properly sizing the SSP system for all the applications that it runs.

The easiest way to determine whether a system is thrashing is to check the page scan rate (sr). The Solaris kernel uses a page scanner, which scans a circular list of pages in memory to reclaim memory, paging out to disk any page that has not been referenced since the scanner's last pass. The scanner runs faster as demand for memory increases. If demand is too high, pages in a process's working set can be removed from real memory, which slows that process. This can prevent SSP processes from reacting quickly to real-time events. In addition, the page scanner uses more CPU time as its scan rate increases.
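The scanning behavior described above resembles the classic clock page-replacement algorithm. The following Python sketch is a simplified one-handed clock, not the actual Solaris page scanner, and the class and method names are hypothetical. Each visit clears a page's reference bit, and a page whose bit is still clear on the next visit is paged out. The `scanned` counter plays the role of the sr statistic: asking for more free pages makes the hand sweep further.

```python
from typing import List


class ClockScanner:
    """Toy one-handed clock over a circular list of page frames."""

    def __init__(self, npages: int):
        self.resident: List[bool] = [True] * npages
        self.referenced: List[bool] = [False] * npages
        self.hand = 0
        self.scanned = 0  # pages examined so far, analogous to vmstat's sr

    def touch(self, page: int) -> None:
        """A process references a page: it is (re)loaded and marked as referenced."""
        self.resident[page] = True
        self.referenced[page] = True

    def reclaim(self, wanted: int) -> List[int]:
        """Advance the hand until `wanted` pages are freed or two sweeps are done."""
        freed: List[int] = []
        n = len(self.resident)
        for _ in range(2 * n):  # after one sweep every bit is clear, so two suffice
            if len(freed) >= wanted:
                break
            page = self.hand
            self.hand = (self.hand + 1) % n
            if not self.resident[page]:
                continue
            self.scanned += 1
            if self.referenced[page]:
                self.referenced[page] = False  # referenced since last pass: keep it
            else:
                self.resident[page] = False    # not referenced since last pass: page out
                freed.append(page)
        return freed


scanner = ClockScanner(4)
scanner.touch(0)
scanner.touch(2)
print(scanner.reclaim(2), scanner.scanned)  # [1, 3] 4
```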

If you suspect the system is thrashing, use the vmstat command to sample and display virtual memory statistics, as explained in the following procedure. This command adds little overhead and can run safely for long periods of time, if required.


To Determine Whether a System Is Thrashing

1. Type vmstat followed by the sampling interval in seconds, optionally followed by the number of samples to display.

For example,

% vmstat 5 4

prints four reports at five-second intervals, while

% vmstat 3 100 

prints 100 reports at three-second intervals.

2. Review the sr column in the output displayed by the vmstat command.

Ignore the entries in the first row, because those values summarize activity since the system was booted. If subsequent values in the sr column are non-zero, the system is thrashing. Ignore the po (page-out) column, because those values include swapfs (swap file system or /tmp/) activity.

The following example shows the vmstat output for a system that is thrashing:

% vmstat 3 4
procs    memory               page                 disk         faults       cpu
r  b w  swap  free  re  mf  pi  po  fr  de  sr dd f0 s0 --  in  sy   cs  us sy id
0  0 0 669856 7336  41 233   5   5   5 136   0  4  0  0  0 234 3107 1459 5  8  87
3  4 0 597232 2136 131 717 354 317 533 128 302 48  0  0  0 399 4889 2252 65 33  2
7  1 0 597120 3408 175 745 197 133 218   0 137 61  0  0  0 430 4757 2130 67 33  0
11 0 0 595832 2456 145 757 184 221 376 424 272 26  0  0  0 378 5235 2380 65 35  0

In the output above, note the sr column but ignore the first entry. The remaining sr values are 302, 137, and 272, which indicate that the system is thrashing heavily.

The next example shows a system that is not thrashing. The values in the sr column are zero, indicating that the Solaris kernel is not performing excessive page scanning to free pages:

% vmstat 3 7
procs    memory              page                disk           faults     cpu
r  b w  swap  free  re  mf  pi  po  fr  de  sr dd f0 s0 --  in  sy   cs  us sy id
0 0 0  672728 8376  41 236   4   5   5   0   0  4  0  0  0 243 3358 1585  6  9 85
0 0 0  672472 6960  29  46   0   0   0   0   0  0  0  0  0 239 3858 1924  8  4 88
0 0 0  672488 6992  59 374   0   0   0   0   0  0  0  0  0 237 4215 1933  5 11 83
0 0 0  666968 7248  87 811   2   0   0   0   0 13  0  0  0 266 4971 1938 24 29 47
0 0 0  672520 7200  47 176   0   0   0   0   0  0  0  0  0 292 4043 2043  8  6 86
0 0 0  672520 7200   0   0   0   0   0   0   0  0  0  0  0 240 3516 1861  4  0 96
0 0 0  672520 7200  31  74   0   0   0   0   0  0  0  0  0 235 3726 1876 14  4 82
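Step 2 can also be automated. The following Python sketch assumes Solaris vmstat output in the layout shown above, with a column header line that contains `sr`; the function names are hypothetical. It finds the sr column by name rather than by position, because the disk columns (dd, f0, s0, and so on) vary from system to system.

```python
import subprocess
from typing import List


def sr_values(vmstat_output: str) -> List[int]:
    """Return the sr (scan rate) values, skipping the first since-boot row."""
    rows = [line.split() for line in vmstat_output.strip().splitlines()]
    header = next((r for r in rows if "sr" in r), None)
    if header is None:
        raise ValueError("no vmstat column header containing 'sr' found")
    col = header.index("sr")
    # Keep only data rows: same width as the header and numeric in the sr column.
    # This also skips header lines that vmstat repeats during long runs.
    data = [r for r in rows if len(r) == len(header) and r[col].isdigit()]
    return [int(r[col]) for r in data[1:]]


def is_thrashing(vmstat_output: str) -> bool:
    """Apply the rule from step 2: any non-zero sr value after the first row."""
    return any(v > 0 for v in sr_values(vmstat_output))


def run_vmstat(interval: int = 3, count: int = 4) -> str:
    """Run `vmstat interval count` and return its output as text."""
    result = subprocess.run(["vmstat", str(interval), str(count)],
                            capture_output=True, text=True, check=True)
    return result.stdout
```

For example, `is_thrashing(run_vmstat(3, 4))` takes four samples at three-second intervals and applies the same rule as step 2.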


Calculating CPU Utilization

This section describes how to calculate SSP CPU utilization. Average CPU utilization should be under 65%. This amount allows sufficient CPU resources for SSP software to immediately handle error conditions on a Sun Enterprise 10000 system.

Low CPU utilization, for example 25%, does not necessarily indicate that a system is performing poorly. Nor should you consider replacing such a system with a lower-end one or finding more work for it to handle.

For a batch-processing system, where response times are not as critical, high CPU utilization is usually preferred. However, for an interactive or real-time control system such as the SSP, response time is more important than high CPU utilization. As queuing theory shows, high utilization leads to slower response time: as utilization approaches 100%, the wait time grows without bound.

Queuing theory uses models to predict utilization and wait time for a client/server system. The operations of a retail bank, a hospital emergency room, or a computer server are examples of client/server systems. One of the basic assumptions in queuing theory is that the time between the arrivals of successive customers follows an exponential distribution. In other words, short intervals between customer arrivals are more likely than long ones.

Queuing theory is best illustrated by an example. Assume there is a small-town bank, the Bank of Ethel, which has one teller (Ethel) and several customers.

Ethel's utilization (how busy she is) can be determined using the following formula for a single-server model (one teller):

$$\rho = \frac{\lambda}{\mu}$$

where ρ (rho) is the proportion of time the server (teller) is busy, on a scale of 0 to 1.0, where 0 means no customers at all and 1 means the server is completely busy; λ (lambda) is the mean arrival rate; and μ (mu) is the mean service rate per server.

In this example, if two customers come to the Bank of Ethel every hour and Ethel serves an average of six customers an hour, then λ = 2, μ = 6, and Ethel's utilization is ρ = 2/6 = 1/3, or about 0.33. Multiply ρ by 100 to express it as a percentage (in this case, 33%).

The number of customers expected in the bank at any one time is:

$$L = \frac{\lambda^2}{\mu(\mu - \lambda)}$$

In the above example, the expected number of customers is L = 2²/(6(6 − 2)) = 4/24 = 1/6, or about 0.17. That is, on average there will be about 0.17 customers at the Bank of Ethel, which is not very busy.

What if the number of customer arrivals increases from two to five per hour? Then λ = 5, and the utilization is ρ = 5/6, or about 0.83. This means that Ethel will be serving a customer 83% of the time. However, this has a drastic effect on L, the expected number of customers in the bank: L = 5²/(6(6 − 5)) = 25/6, or about 4.17 customers on average. The number of customers waiting for service is (L − 1), so in this example about 3.17 customers are waiting for service. This example shows why high utilization and immediate service are not possible at the same time.

For an SSP workstation, assume the CPU of the SSP workstation is the server. Utilization, rho, is the percentage that the CPU is in use. The CPU "customers" are processes that are either being serviced by the CPU or waiting in the queue. The load average shown by various commands in the Solaris operating environment is (L - 1), which represents the number of waiting customers.

FIGURE 7-1 illustrates how increased utilization drastically increases customer wait time.

FIGURE 7-1 Utilization and Customers Expected

The X-axis is the utilization of a single server, ranging from 50% to 90%. The Y-axis shows L, the expected number of customers at any one time for a given level of utilization, rho. If L is greater than or equal to 2, at least one customer is always waiting. At 60% utilization or less, almost no one is waiting for service. When utilization exceeds 72% or so, a customer is almost always waiting. When utilization exceeds 80%, multiple customers are usually waiting.

To summarize, for quicker service, you must sacrifice high utilization. The highest utilization you can have without having customers wait for service is usually about 65%.
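The Bank of Ethel figures, and the shape of the curve in FIGURE 7-1, can be reproduced with a short Python sketch of the formulas in this section. The function names are hypothetical; the sketch simply evaluates ρ = λ/μ and the expression given above for L.

```python
def utilization(arrival_rate: float, service_rate: float) -> float:
    """rho = lambda / mu for a single server."""
    if service_rate <= 0:
        raise ValueError("service rate must be positive")
    return arrival_rate / service_rate


def expected_customers(arrival_rate: float, service_rate: float) -> float:
    """L = lambda**2 / (mu * (mu - lambda)), as used in this section.

    Standard M/M/1 texts call this expression the mean queue length (Lq).
    When lambda >= mu the queue grows without bound, so infinity is returned.
    """
    if arrival_rate >= service_rate:
        return float("inf")
    return arrival_rate ** 2 / (service_rate * (service_rate - arrival_rate))


# The two Bank of Ethel scenarios (mu = 6 customers per hour).
for lam in (2, 5):
    print(f"lambda={lam}: rho={utilization(lam, 6):.2f}, L={expected_customers(lam, 6):.2f}")
# prints rho=0.33, L=0.17 and then rho=0.83, L=4.17

# Utilization sweep in the style of FIGURE 7-1: with mu = 1, lambda equals rho.
for pct in range(50, 95, 5):
    print(f"{pct}% utilization: L = {expected_customers(pct / 100, 1.0):.2f}")
```

The sweep shows L passing 2 at roughly 73% utilization, consistent with the "72% or so" threshold described for FIGURE 7-1.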

SSP CPU Requirements

The following table shows the approximate CPU utilization by the SSP software at its busiest state (bringup of domains):

TABLE 7-2 CPU Utilization for SSP Software

Domains    Netra T1 Server or       Sun Enterprise 250
           Sun Ultra 5 Workstation  Workstation
           (Average CPU Usage)      (Average CPU Usage)
1 to 4     24%                      17%
5 to 8     29%                      18%
9 to 16    31%                      20%


If you are using Sun Management Center, add the following CPU overhead to the values in TABLE 7-2:

TABLE 7-3 Additional CPU Utilization for Sun Management Center Software

Domains    Netra T1 Server or       Sun Enterprise 250
           Sun Ultra 5 Workstation  Workstation
           (Average CPU Usage)      (Average CPU Usage)
1 to 8     10%                      5%
9 to 16    20%                      10%


If you are using SunScreen™, add the following CPU overhead to the values in TABLE 7-2:

TABLE 7-4 Additional CPU Utilization for SunScreen

Netra T1 Server or Sun Ultra 5 Workstation   Sun Enterprise 250 Workstation
(Average CPU Usage)                          (Average CPU Usage)
2%                                           4%
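To check a planned configuration against the 65% guideline, you can add up the table values. In this Python sketch, the platform keys (`ultra5`, `e250`) and the function names are hypothetical, and `third_party_pct` is a figure you must measure for your own applications; the tables do not provide it.

```python
# Peak CPU utilization (percent) during domain bringup, from TABLES 7-2 to 7-4.
# "ultra5" = Netra T1 server or Sun Ultra 5 workstation; "e250" = Sun Enterprise 250.
SSP_BASE = {"ultra5": [(4, 24), (8, 29), (16, 31)],
            "e250": [(4, 17), (8, 18), (16, 20)]}
SUN_MC_EXTRA = {"ultra5": [(8, 10), (16, 20)],
                "e250": [(8, 5), (16, 10)]}
SUNSCREEN_EXTRA = {"ultra5": 2, "e250": 4}
TARGET_PCT = 65  # keep average CPU utilization below this level


def _by_domains(bands, domains):
    """Return the value of the first (max_domains, pct) band that covers `domains`."""
    for max_domains, pct in bands:
        if domains <= max_domains:
            return pct
    raise ValueError("domains must be between 1 and 16")


def estimate_cpu_pct(platform, domains, sun_mc=False, sunscreen=False,
                     third_party_pct=0):
    """Sum the table values for a configuration plus a measured third-party figure."""
    if domains < 1:
        raise ValueError("domains must be between 1 and 16")
    total = _by_domains(SSP_BASE[platform], domains)
    if sun_mc:
        total += _by_domains(SUN_MC_EXTRA[platform], domains)
    if sunscreen:
        total += SUNSCREEN_EXTRA[platform]
    return total + third_party_pct


load = estimate_cpu_pct("ultra5", 16, sun_mc=True, sunscreen=True)
print(load, "OK" if load < TARGET_PCT else "consider a faster processor")  # 53 OK
```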



Determining Hardware for an OpenSSP

This section describes how to determine the appropriate hardware for an OpenSSP. Your SSP workstation or server must have enough hardware resources to support the maximum number of domains planned for the Sun Enterprise 10000 system it controls, as well as any third-party applications.

The proper hardware technology depends on the usage profile of the third-party software. Less intensive software requires less CPU and memory than resource-hungry applications. You must determine the CPU and real memory resources needed for these applications before determining the appropriate workstation hardware to use.

Using a Faster Processor

If CPU utilization is too high for your SSP configuration (over 65%), a faster processor will reduce CPU utilization dramatically. For example, when bringing up 16 domains on a Sun Ultra 5 workstation, an UltraSPARC IIi processor running at 360 MHz has about one-half the CPU utilization of a slower processor running at 270 MHz. A faster processor is also appropriate if you add monitoring or other third-party software that is CPU-intensive, such as Sun Management Center.

When comparing CPUs, remember to consider the CPU family and L2 cache size, in addition to CPU speed. The UltraSPARC II processor on the Sun Enterprise 250 workstation is faster than the UltraSPARC IIi processor on the Sun Ultra 5 workstation. The UltraSPARC II processor comes with more L2 cache (1 to 4 MBs) than the UltraSPARC IIi processor (256 Kbytes to 2 MBs).

Adding More Memory

If a system does not have sufficient memory, it will thrash, that is, page excessively between real memory and swap space. Make sure that the SSP workstation has enough memory modules to prevent thrashing so that your software runs with fewer interruptions.

Adding More Swap Space

You can add more swap space to improve system reliability, even though it will not improve system performance. If a system runs out of virtual memory, processes cannot allocate more memory and will begin to fail. Swap space is required to save inactive processes and memory regions, and to handle overflow in swapfs (/tmp/). In addition, automatic SSP failover may occasionally need to propagate large files, such as SSP log files and possibly user-specified files. User-specified files are listed in $SSPVAR/.ssp_private/user_file_list, which identifies data files, including those used by third-party applications, to be synchronized for failover purposes. Therefore, it is important to have adequately sized swap space to hold these files.

Adding More Disk Space

The SSP requires 1 GB of unused disk space in the file system containing /var/, which stores SSP log files (under /var/opt/SUNWssp/ and /var/adm/) and SSP backup files (usually under /var/tmp/). If the file system fills up, the SSP can behave unpredictably, for example by freezing, respawning processes, or failing logins, and event information might not be saved in logs.
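A quick way to check the /var/ free-space requirement is the standard Python `shutil.disk_usage` function. In this sketch, the function name is hypothetical, and 1 GB is taken as 1024³ bytes.

```python
import shutil

# 1 GB of unused space on the file system that holds /var/ (taken as 1024**3 bytes).
REQUIRED_FREE_BYTES = 1024 ** 3


def has_free_space(path: str = "/var", required: int = REQUIRED_FREE_BYTES) -> bool:
    """Return True if the file system containing `path` has `required` bytes free."""
    # On UNIX systems, disk_usage().free excludes blocks reserved for the
    # superuser, so this is a conservative figure.
    return shutil.disk_usage(path).free >= required
```

For example, `has_free_space("/var")` returns False when the file system holding /var/ has less than 1 GB free.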