CHAPTER 7
Running an OpenSSP
OpenSSP refers to the ability to run third-party software on an SSP workstation or server. Third-party software consists of applications other than the SSP software and Solaris operating environment. Traditionally, SSP software has been the only application allowed to run on SSP workstations or servers. However, in an OpenSSP environment, lightweight software applications, such as boot disk management utilities and monitoring and backup agents, can be run on the SSP.
This section discusses the impact of third-party applications on the SSP. SSP software runs on a workstation or server that runs the Solaris operating environment. This workstation or server is dedicated to controlling and monitoring a Sun Enterprise 10000 system and must respond in a timely manner to incoming events from the control board for the system. This includes responding to environmental conditions, such as over-temperature boards, software problems (panics), and hardware problems (arbstops). If the SSP is delayed in responding, these events can be dropped or handled too late. Information about failures can be lost with no alerts or log trail. In a worst-case scenario, hardware can also be damaged due to lack of timely action by the SSP.
In previous releases, SSP software was the only application permitted to run on an SSP workstation or server. This limitation was required to prevent non-SSP software from interfering with the critical SSP mission of monitoring and controlling a Sun Enterprise 10000 system. The limitation is especially important when running SSP software on SPARCstation 5 workstations, which are slower and have less memory than the Sun Netra T1 server, or Sun Ultra 5 or Sun Enterprise 250 workstations. The only non-SSP processes previously allowed on an SSP have been system background programs that run the Solaris operating environment and Common Desktop Environment (CDE). However, due to increasing demand for third-party software agents that enable centralized control of multiple systems on the network, you can now run third-party applications on the SSP. These agents gather information about the Sun Enterprise 10000 system and the SSP and report back to a central server. Backup servers are similarly structured, allowing centralized backup of multiple networked systems.
When you run third-party software on an SSP system, the primary goal must be non-interference with the SSP software and its ability to monitor and control the Sun Enterprise 10000 system. SSP software normally requires few hardware resources, but when it needs those resources, they must be immediately available.
Verify that your SSP workstation or server meets the minimum software and hardware requirements for OpenSSP (see Required Resources for an OpenSSP).
Estimate the amount of real memory and swap space required by both SSP and third-party software (see Estimating SSP and Third-Party Software Memory Usage).
For a list of the minimum OpenSSP software and hardware resource requirements, see TABLE 2-2. Note that the OpenSSP requirements for disk space, processor speed, and CPU utilization are higher than the SSP minimum requirements because of the extra load expected on the SSP.
You must also keep current with the latest recommended patches for the Solaris operating environment and the SSP software, especially for performance improvements. In addition, consider using the Solaris 8 or 9 operating environment to fully benefit from SSP performance features. See the following sections for details.
Sun produces a set of patches, Recommended and Security Solaris Patch Clusters, which contain Solaris software updates of universal interest for each version of the Solaris operating environment. These selected patches are important and highly recommended because they provide fixes for critical system, user, or security-related bugs. Some of these patches also fix performance problems. They are generally safe to apply, as opposed to higher-risk patches, or patches with new features, new drivers, or low-priority fixes that are not included in these patch clusters. A prudent system administrator keeps systems current with the latest recommended patch level to protect against system problems. Solaris Recommended Patch Clusters are available at the SunSolve Web site, http://sunsolve.Sun.COM/.
Update your SSP workstation on a regular basis with all the SSP patches available for the particular release of SSP software used, except for special-case patches noted in the patch README file. Apply these patches on a case-by-case basis. SSP patches are also available from the SunSolve website, http://sunsolve.Sun.COM/.
To fully benefit from the SSP performance improvements in SSP 3.5, use the Solaris 8 or 9 operating environment (with the appropriate patches, as explained in SSP 3.5 Hardware and Software Requirements) on the SSP workstation or server, if possible. The Solaris 8 or 9 software improvement most relevant to the SSP involves better thread handling. When a real-time SSP thread is blocked by a lower-priority thread, the kernel temporarily assigns a higher priority to the blocking thread so that it can finish quickly and release the resource it holds. The result is a faster SSP response time to Sun Enterprise 10000 events.
Other performance improvements in the Solaris 8 or 9 operating environment can affect third-party applications running on the SSP, especially if those applications communicate over a WAN or use a large number of open files or sockets. For details, see What's New in the Solaris 8 or 9 Operating Environment at
This section describes how to estimate the memory (real memory and swap space) requirements for third-party applications used on the SSP. When you measure the memory requirements of the SSP workstation, you must consider the cumulative requirements of all the applications running on the workstation as a whole, not just the impact of an individual application.
What is the maximum number of domains running on the Sun Enterprise 10000 system?
How many Hostview applications could be running at the same time?
Is Sun Management Center installed on the SSP?
Are third-party applications running on the SSP?
If an SSP is running third-party applications, you must determine how much virtual and real memory is used by those applications. This memory amount usually is found in the installation or administrator guide for the application.
If this memory information is not available, it can be easily calculated using the memory usage output from the pmap command. This command, which must be used when the system is not thrashing (paging at a high rate) and the application is in an active running state, displays output that shows how much resident memory the application requires when it is active, but not thrashing. For details on using the pmap command, see the following procedure, To Obtain the Virtual Memory Amount Used by an Application.
For information on determining whether a system is thrashing, see Verifying That You Have Sufficient Real Memory.
The following example shows how to size an application called CST (Configuration and Service Tracker), which has one process, cstd. What this application does is not relevant here, as it is used only as an example of measuring memory usage.
In the example above, the virtual memory is 1448 Kbytes, which is derived by subtracting the shared memory (1400 Kbytes) from the total memory (2848 Kbytes). This value is then rounded up to 2 megabytes (MB). CST requires 1 MB of resident memory and 2 MBs of virtual memory.
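The subtraction and rounding described above can be sketched in a few lines of Python. This is only an illustration of the arithmetic: the function names are hypothetical, the figures are the ones quoted in the text for cstd, and 1 MB is assumed to be 1024 Kbytes.

```python
import math


def private_memory_kb(total_kb: int, shared_kb: int) -> int:
    """Memory attributable to the process itself: total minus shared."""
    return total_kb - shared_kb


def round_up_to_mb(kbytes: int) -> int:
    """Round a Kbyte figure up to whole megabytes (assuming 1 MB = 1024 Kbytes)."""
    return math.ceil(kbytes / 1024)


# Figures quoted in the text for the cstd process.
virtual_kb = private_memory_kb(total_kb=2848, shared_kb=1400)
virtual_mb = round_up_to_mb(virtual_kb)
print(virtual_kb, "Kbytes, rounded up to", virtual_mb, "MB")
```

The same two steps apply to each process of a multi-process application; the per-process results are then summed.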
The virtual memory used for these shared libraries is approximately the same as the shared library (*.so) file size. The resident memory used by shared libraries is also shown in the pmap -x command output.
To Calculate the Memory Requirements For an SSP Using the OpenSSP Memory Worksheet
Use the OpenSSP memory worksheet to determine the virtual and real memory requirements for an SSP workstation or server. TABLE 7-1 is an example of a completed SSP memory worksheet, which contains sample entries in bold font. The following steps also explain how the sample entries were calculated. For details on how the predetermined values in the worksheet were derived, see Appendix B.
a. In the Number column, enter the highest number of domains (1 to 16) that you expect to have for your Sun Enterprise 10000 system.
a. In the Number column, enter the number of Hostview applications (the SSP GUI) that you expect to run at the same time. (This entry is usually 1).
a. In the Number column, enter 1.
For details on estimating these memory requirements, see Calculating Memory Usage by Third-Party Applications .
a. In the Number column, enter the RAM that you will need. This number must be greater than 115% of the subtotal for the Real Memory entered in line 7. You must round this value up to the next 32 MB increment. The Solaris operating environment uses 15% of the RAM for kernel buffer memory.
In line 8 of the example worksheet, 128 MBs of RAM is specified in the Number column, which is greater than the 106 MBs of Real Memory Subtotal entered in line 7. Also, 15% of 128 MBs of RAM yields 19 MBs of kernel buffer memory.
a. Add the subtotals from lines 7 and 10, then subtract the virtual memory total (negative RAM) in line 11 from that amount.
In line 12 of the example worksheet, 304 MBs of virtual memory is added to 512 MBs (for /tmp/), which results in 816 MBs. The real memory value, 128 MBs, is subtracted from 816 MBs, which yields a minimum swap file size of 688 MBs.
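The RAM and swap calculations in the worksheet steps can be checked with a short Python sketch. It is a minimal illustration under stated assumptions: the function names are hypothetical, the inputs are the sample worksheet values quoted in the text (106 MBs real memory subtotal, 304 MBs virtual memory, 512 MBs for /tmp/), and the kernel buffer figure is truncated to whole megabytes to match the 19 MBs in the example.

```python
RAM_INCREMENT_MB = 32    # RAM is rounded up to the next 32 MB increment
RAM_FACTOR_PCT = 115     # RAM must exceed 115% of the real memory subtotal
KERNEL_BUFFER_PCT = 15   # share of RAM used for kernel buffer memory


def required_ram_mb(real_subtotal_mb: int) -> int:
    """Smallest multiple of 32 MB strictly greater than 115% of the subtotal."""
    needed_hundredths = real_subtotal_mb * RAM_FACTOR_PCT
    return RAM_INCREMENT_MB * (needed_hundredths // (RAM_INCREMENT_MB * 100) + 1)


def kernel_buffer_mb(ram_mb: int) -> int:
    """15% of RAM, truncated to whole megabytes."""
    return ram_mb * KERNEL_BUFFER_PCT // 100


def minimum_swap_mb(virtual_mb: int, tmp_mb: int, ram_mb: int) -> int:
    """Virtual memory plus /tmp/ space, minus the part held in real memory."""
    return virtual_mb + tmp_mb - ram_mb


ram = required_ram_mb(106)
print("RAM:", ram, "MB")
print("Kernel buffer:", kernel_buffer_mb(ram), "MB")
print("Minimum swap:", minimum_swap_mb(304, 512, ram), "MB")
```

Integer arithmetic is used so that a subtotal whose 115% figure lands exactly on a 32 MB boundary still rounds up to the next increment, in keeping with the "greater than" wording of the worksheet step.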
Bottom line: The minimum amount of memory needed for an SSP workstation is 128 MBs. If you do not want to calculate the exact amount required, 256 MBs of memory is more than sufficient if you are using other monitoring software. One GB of swap space is more than sufficient for virtual memory and swapfs.
Virtual memory consists of real memory (RAM) and page file (swap) space on disk. Unlike some other systems, real memory for the Solaris operating environment is not mirrored in a swap file. It is no longer necessary to duplicate a page of swap for each page of real memory, so the old rule that "swap space size should be twice real memory size" no longer applies. The only swap space required is the amount of virtual memory that exceeds the real memory for your system.
The amount of virtual memory required depends on the working set model for a process. The working set is the set of pages a process needs to work effectively. A working set needs to be in real memory or the program may thrash. Thrashing occurs when there is insufficient real memory for all the working sets of a process. As a result, the system spends an excessive amount of time paging the process working sets in and out of swap space.
The working set for a program is defined as W(t, Δ), which is the set of pages referenced from time (t - Δ) to time t. Typically, a working set for a program does not change much over time, although it can change drastically on occasion. Increasing the time period, Δ, does not have much effect on the working set. Pages currently in use are likely to be used in the near future. Memory outside the working set is rarely, if ever, used. Therefore, a program that uses only its working set in real memory and the remainder in swap space will perform almost as effectively as if all of its pages were in memory. This is true even though disk access time is about 100,000 times slower than RAM access time (about 10,000,000 nanoseconds versus 100 nanoseconds).
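The working set definition can be made concrete with a small Python sketch. Everything here is illustrative: the page reference trace is invented, time is counted in references rather than seconds, and the window parameter (called delta in the code) covers the last delta references up to and including time t.

```python
def working_set(references, t, delta):
    """Return the set of pages referenced in the window of length `delta`
    that ends at time t. references[i] is the page touched at time i."""
    start = max(0, t - delta + 1)
    return set(references[start:t + 1])


# Hypothetical trace: the program works on pages 1-3, then jumps to pages 7-8.
trace = [1, 2, 1, 3, 1, 2, 1, 2, 7, 7, 7, 8]

print(working_set(trace, t=7, delta=4))    # stable phase, short window
print(working_set(trace, t=7, delta=8))    # same phase, window twice as long
print(working_set(trace, t=11, delta=4))   # after the program changes phase
```

In this made-up trace, doubling the window adds only one page during the stable phase, while the phase change replaces the working set entirely, which mirrors the two behaviors described above.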
However, if there is insufficient real memory to keep the working set for a process in memory, the process can easily thrash and run more slowly. Running fewer processes or adding more real memory keeps the process working set in memory and stops the process from thrashing. Thrashing can affect the SSP ability to handle events in a timely manner, due to time-outs and lost SNMP traps. Thrashing can be prevented by properly sizing the SSP system for all applications that it runs.
The easiest way to determine whether a system is thrashing is to check the paging scan rate (sr). The kernel for the Solaris operating environment uses a page scanner, which scans a circular list of pages in memory in order to reclaim memory and swap it out to disk. Pages not referenced since the last cycle are paged out of memory. The scanner runs faster when demand for memory increases. If the demand is too high, memory in a working set for a process can be removed from real memory, which slows those processes. This can prevent SSP processes from reacting quickly to real-time events. Also, as its scan rate increases, the page scanner uses more CPU time.
If you suspect the system is thrashing, use the vmstat command to sample and display virtual memory statistics, as explained in the following procedure. This command adds little overhead and can run safely for long periods of time, if required.
To Determine Whether a System is Thrashing
Ignore the entries in the first row, as the values are cumulative based on when the system was booted. If subsequent values in the sr column are non-zero, the system is thrashing. Ignore the po (page-out) column, as those values include swapfs (swap file system or /tmp/) activity.
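The rule for reading the sr column can be expressed as a short Python check. This is a sketch only: the function name is hypothetical, it expects the sr values to have been collected already in the order vmstat printed them, and the sample values are invented for illustration rather than taken from a real system.

```python
def is_thrashing(sr_samples):
    """Apply the rule from the text: skip the first row (values since boot)
    and report thrashing if any later scan-rate value is non-zero."""
    later_samples = sr_samples[1:]
    return any(value > 0 for value in later_samples)


# Invented sr values, one per vmstat output row.
print(is_thrashing([25, 0, 0, 0]))      # only the since-boot row is non-zero
print(is_thrashing([0, 0, 140, 310]))   # scanner active in later intervals
```

A single sample row is never enough to decide, because the only row available is the one the text says to ignore.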
This section describes how to calculate SSP CPU utilization. Average CPU utilization should be under 65%. This amount allows sufficient CPU resources for SSP software to immediately handle error conditions on a Sun Enterprise 10000 system.
Low CPU utilization, for example 25%, does not necessarily indicate that the system is performing poorly, nor is it a reason to replace the system with a lower-end one or to find more work for it to handle.
For a batch-processing system, where response times are not as critical, high CPU utilization is usually preferred. However, for an interactive or a real-time controlling system such as SSP, response time is more critical than high CPU utilization. High utilization leads to slower response time, as noted in queuing theory. As utilization approaches 100%, the wait time increases sharply.
Queuing theory uses models to predict utilization and wait time for a client/server system. The operations in a retail bank, hospital emergency room, or computer server are examples of a client/server system. One of the basic assumptions in queuing theory is that the time between customer arrivals follows an exponential distribution. In other words, long periods between customer arrivals are less likely than short periods.
ρ = λ / µ
where ρ (rho) is the proportion of time the servers (tellers) are busy (on a scale of 0 to 1.0, where 0 is no customers at all and 1 indicates the server is completely busy), λ (lambda) is the mean arrival rate, and µ (mu) is the mean service rate per server.
In this example, if two customers come to the Bank of Ethel every hour and Ethel serves an average of six customers an hour, λ = 2, µ = 6, and Ethel's utilization is: ρ = 2 / 6 = 1/3, or about 0.33. Multiply by 100 to convert to percent, that is, 33%.
What if the number of customer arrivals increases from two to five per hour? Then λ = 5, and the utilization will be ρ = 5/6 or about 0.83. This means that Ethel will be serving a customer 83% of the time. However, this has a drastic effect on L, the expected number of customers in the bank. In this example, L = λ² / (µ(µ - λ)) = 5² / (6(6 - 5)) = 25/6 or about 4.17 customers in the bank, on the average. The number of customers waiting for service is (L - 1), so in this example, there are about 3.17 customers waiting for service. This example shows why high utilization and immediate service are not possible at the same time.
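The Bank of Ethel arithmetic can be reproduced with exact fractions in Python. This sketch applies the formulas as the example uses them, with utilization as arrival rate divided by service rate and L as the arrival rate squared divided by µ(µ - λ); the function names are hypothetical.

```python
from fractions import Fraction


def utilization(arrival_rate, service_rate):
    """Proportion of time the server is busy: arrival rate / service rate."""
    return Fraction(arrival_rate) / Fraction(service_rate)


def expected_customers(arrival_rate, service_rate):
    """L as computed in the example: lambda^2 / (mu * (mu - lambda))."""
    lam, mu = Fraction(arrival_rate), Fraction(service_rate)
    if lam >= mu:
        raise ValueError("arrival rate must be below the service rate")
    return lam ** 2 / (mu * (mu - lam))


for arrivals in (2, 5):
    rho = utilization(arrivals, 6)
    big_l = expected_customers(arrivals, 6)
    print(f"arrivals/hour={arrivals}  utilization={float(rho):.2f}  L={float(big_l):.2f}")
```

Raising arrivals from two to five per hour multiplies utilization by 2.5 but multiplies L by 25, which is the disproportionate effect the example is pointing at.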
For an SSP workstation, assume the CPU of the SSP workstation is the server. Utilization, ρ, is the percentage that the CPU is in use. The CPU "customers" are processes that are either being serviced by the CPU or waiting in the queue. The load average shown by various commands in the Solaris operating environment is (L - 1), which represents the number of waiting customers.
FIGURE 7-1 illustrates how increased utilization drastically increases customer wait time.
The X-axis is the utilization of a single server, ranging from 50% to 90%. The Y-axis shows L, the expected number of customers at any one time for a given level of utilization, ρ. If L is greater than or equal to 2, at least one customer is always waiting. At 60% utilization or less, almost no one is waiting for service. When utilization exceeds 72% or so, a customer is almost always waiting. When utilization exceeds 80%, multiple customers are usually waiting.
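The curve described above can be tabulated with a few lines of Python. The sketch assumes the same formula used in the bank example, rewritten in terms of utilization alone: substituting arrival rate = utilization × service rate gives L = ρ² / (1 - ρ). The function name is hypothetical, and the threshold is the utilization at which this formula yields L = 2.

```python
import math


def expected_customers_at(rho):
    """L as a function of utilization only: rho^2 / (1 - rho)."""
    if not 0 <= rho < 1:
        raise ValueError("utilization must be in the range [0, 1)")
    return rho ** 2 / (1 - rho)


# Tabulate L across the utilization range covered by the figure.
for percent in range(50, 91, 10):
    print(f"{percent}%  L = {expected_customers_at(percent / 100):.2f}")

# Solving rho^2 / (1 - rho) = 2 gives rho = sqrt(3) - 1.
threshold = math.sqrt(3) - 1
print(f"L reaches 2 at about {threshold:.1%} utilization")
```

Under this formula L stays below 1 up to 60% utilization, reaches 2 a little above 73%, and exceeds 3 at 80%, which lines up with the breakpoints quoted for the figure.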
If you are using the Sun Management Center, add the CPU overhead values in TABLE 7-2:
If you are using SunScreen, add the following CPU overhead to the values in TABLE 7-1:
This section describes how to determine the appropriate hardware for an OpenSSP. Your SSP workstation or server must have enough hardware resources to operate the maximum number of domains planned on the Sun Enterprise 10000 system it controls, as well as what is required to run third-party applications.
The proper hardware technology depends on the usage profile of the third-party software. Less intensive software requires less CPU and memory than resource-hungry applications. You must determine the CPU and real memory resources needed for these applications before determining the appropriate workstation hardware to use.
If CPU utilization is too high for your SSP configuration (over 65%), using a faster processor will reduce CPU utilization dramatically. For example, when bringing up 16 domains on a Sun Ultra 5 workstation, an UltraSPARC II processor running at 360 MHz has about one-half the CPU utilization of a slower CPU running at 270 MHz. A faster processor is also appropriate if you add monitoring or third-party software, such as Sun Management Center, which uses a lot of CPU.
When comparing CPUs, remember to consider the CPU family and L2 cache size, in addition to CPU speed. The UltraSPARC II processor on the Sun Enterprise 250 workstation is faster than the UltraSPARC IIi processor on the Sun Ultra 5 workstation. The UltraSPARC II processor comes with more L2 cache (1 to 4 MBs) than the UltraSPARC IIi processor (256 Kbytes to 2 MBs).
If a system does not have sufficient memory, it will thrash, that is, swap pages excessively in and out of real memory. Be sure that you have sufficient memory modules to prevent thrashing and enable your software to run with fewer interruptions on the SSP workstation.
You can add more swap space if needed to improve system reliability, even though it will not improve system performance. If a system runs out of memory, processes cannot allocate more memory and will begin to fail. Swap space is required to save inactive processes and memory regions, and to handle overflow in swapfs (/tmp/). Additionally, automatic SSP failover may need to occasionally propagate large files, such as SSP log files and possibly user-specified files (listed in $SSPVAR/.ssp_private/user_file_list, which identifies various data files, including those used by third-party applications, to be synchronized for failover purposes). Therefore, it is important to have an adequately-sized swap file to hold these files.
The SSP requires 1 GB of unused disk space for the file system containing /var/, which is used to store SSP log files (under /var/opt/SUNWssp/ and /var/adm/) and SSP backup files (usually under /var/tmp/). If the file system fills up, the SSP can exhibit strange behavior, such as freezing, respawning processes, or login failures, and event information might not be saved in logs.