Chapter 6 The Production Environment

This chapter describes how to monitor and tune Sun Java™ System Portal Server software, including the Sun Java™ System Portal Server Secure Remote Access product.

Moving to a Production Environment

Monitoring Portal Server

Moving to a Production Environment

Moving to a production environment occurs after you have thoroughly tested your portal and operated it as a trial deployment to test and refine your design.

Monitoring and Tuning

Monitoring and tuning your portal deployment is an ongoing, cyclical process, in which you look for bottlenecks and other performance issues.

Beginning with the trial portal, define a baseline performance for your deployment, before you add in the full complexity of the project.

Using this initial benchmark, define the transaction volume your organization is committed to supporting in the short term and in the long run.

Determine whether your current physical infrastructure can support the transaction volume you have defined. Identify the services that will saturate first as activity on the portal increases. This tells you how much headroom you have and where to focus your tuning effort.

Measure and monitor your traffic regularly to verify your model.

Use the model for long-range scenario planning. Understand how dramatically you will have to change your deployment to meet your overall growth projections for upcoming years.

In a production system, set the logging level to ERROR, not MESSAGE. The MESSAGE level is verbose and can quickly fill the file system. The ERROR level logs all error conditions and exceptions.

Run the perftune script on one of your production servers to determine whether the thread limits are being reached and whether you need to tune the web server parameters further.

The perftune script (located in the portal-server-install-root/SUNWps/bin directory), bundled with Sun Java System Portal Server, automates most of the tuning process discussed in this guide.

See the Portal Server 6 Administration Guide for information on the perftune script.
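The headroom and growth checks in the planning steps above can be sketched as follows. This is a minimal illustration, not part of the product: the function names, the service names, and every capacity and load figure are hypothetical, and a real model would use measurements from your own baseline.

```python
import math

def headroom(capacity, peak_load):
    """Fraction of measured capacity still unused at the current peak load."""
    return 1.0 - peak_load / capacity

def first_to_saturate(services):
    """Return the name of the service with the least headroom.

    services maps a name to a (capacity, peak_load) pair expressed in the
    same unit, for example transactions per second.
    """
    return min(services, key=lambda name: headroom(*services[name]))

def periods_within_capacity(capacity, peak_load, growth_per_period):
    """Whole growth periods during which the projected peak stays within capacity."""
    if peak_load > capacity:
        return 0
    if growth_per_period <= 0:
        return None  # flat or shrinking load never exceeds capacity
    return math.floor(
        math.log(capacity / peak_load) / math.log(1.0 + growth_per_period)
    )

# Hypothetical figures, for illustration only.
services = {
    "directory": (400.0, 120.0),
    "identity": (300.0, 210.0),
    "desktop": (250.0, 100.0),
}
bottleneck = first_to_saturate(services)
```

With these made-up numbers the "identity" entry has the least headroom, so it is the one to watch first as load grows.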

Documenting the Portal

Comprehensive documentation of how your portal functions is an important means of making the system easier to support. To create a supportable solution, document the following areas:

System architecture

Software installation and configuration

Operational procedures, also known as a “run book”

Software customizations

Custom code

Third-party products integration

The run book outlines troubleshooting techniques as well as the deployment life cycle. Make this book available during the training and transfer of knowledge phase of the project.



Tip	Do not wait until the end of the deployment project, when time and money are usually running short, to begin this documentation phase. Documenting your portal should occur as an ongoing activity throughout the entire deployment.

Monitoring Portal Server

This section describes the variables that affect portal performance, as well as the portal monitoring you can perform. Areas to monitor include:

Sun Java™ System Identity Server

Portal Desktop

Sun Java™ System Directory Server

Java™ Virtual Machine

While there are emerging technologies that will enable you to perform detailed monitoring of Portal Server services, this section focuses on the basic but extensive set of hardware and software issues that determine the overall performance of a portal deployment.

Specifically, portal performance is determined by the throughput and latency of which it is capable over a period of time. You must conduct a baseline performance analysis as soon as possible. The baseline performance analysis confirms that your portal substantially conforms to published performance numbers. Establishing a performance baseline helps you to understand infrastructure issues that can severely impact the performance of a production portal.

To keep a portal performing properly, however, you must look at a broad set of issues. The following sections explain those issues in terms of portal performance variables and provide guidelines for determining portal efficiency.

Memory Consumption and Garbage Collection

Before reading this section, read the following document on tuning garbage collection with the Java Virtual Machine, version 1.4.2:


Note	These rules also apply for performance, scalability, and stress tests.

Portal Server requires substantial amounts of memory to provide the highest possible throughput. In fact, the portal-server-install-root/SUNWps/bin/perftune tuning script sets the maximum heap size to 2 GB. This size is divided between the new generation, which receives 256 MB (one eighth of the space), and the old generation, which receives the rest. At initialization, the JVM reserves the maximum address space virtually but does not allocate physical memory until it is needed. The complete address space reserved for object memory is divided into the young (new) and old generations.

For most applications, a larger percentage of the total heap is recommended for the new generation. For Portal Server, however, using only one eighth of the space for the young generation is appropriate, because most memory used by Portal Server is long-lived. The sooner that memory is copied to the old generation, the better the garbage collection (GC) performance.
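The one-eighth split described above is simple arithmetic, sketched below. The helper names are hypothetical. The flags shown (-Xmx, -XX:NewSize, -XX:MaxNewSize) are standard HotSpot options, but whether perftune emits exactly these flags is an assumption; check the script output on your own system.

```python
def generation_sizes(max_heap_mb, new_fraction=1 / 8):
    """Split the maximum heap into (new_generation_mb, old_generation_mb)."""
    new_mb = int(max_heap_mb * new_fraction)
    return new_mb, max_heap_mb - new_mb

def heap_flags(max_heap_mb, new_fraction=1 / 8):
    """Build HotSpot options for the split (assumed, not taken from perftune)."""
    new_mb, _ = generation_sizes(max_heap_mb, new_fraction)
    return [
        f"-Xmx{max_heap_mb}m",          # maximum heap size
        f"-XX:NewSize={new_mb}m",       # initial new generation size
        f"-XX:MaxNewSize={new_mb}m",    # maximum new generation size
    ]

new_mb, old_mb = generation_sizes(2048)  # 2 GB heap, as in the text
flags = heap_flags(2048)
```

For a 2048 MB heap this yields a 256 MB new generation and a 1792 MB old generation, matching the figures in the text.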

Even with a large heap size, after a portal instance has been running under moderate load for a few days, most of the heap will appear to be used because of the lazy nature of the GC. The GC will not start performing full garbage collections until the resident set size (RSS) reaches approximately 85 percent of the total heap space. Those garbage collections can have a measurable impact on performance.

For example, on a 900 MHz UltraSPARCIII™, a full GC on a 2 GB heap can take over ten seconds. During that period of time, the system is unavailable to respond to web requests. During a reliability test, full GCs are clearly visible as spikes in the response time and it is important to understand the impact on performance and the frequency of full GCs. In production, full GCs will go unnoticed most of the time, but any monitoring scripts that measure the performance of the system need to account for the possibility that a full GC might occur.

Measuring the frequency of full GCs is sometimes the only way to determine whether the system has a memory leak. Conduct an analysis that establishes the expected frequency of full GCs on a baseline system, and compare that to the observed rate. To record the frequency of GCs, use the -verbose:gc JVM™ parameter.
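As a rough sketch of the frequency analysis described above, the following script counts full GCs in a GC log and reports their spacing. It assumes the classic HotSpot line format, such as "[Full GC 267628K->83769K(776768K), 1.8479984 secs]", optionally prefixed with an uptime timestamp (which requires a timestamp option such as -XX:+PrintGCTimeStamps in addition to -verbose:gc). The sample lines are invented, and other JVM versions or options produce different formats that this pattern will not match.

```python
import re

# Assumed line format: optional "<uptime>: " prefix, then "[GC ...]" or "[Full GC ...]".
GC_LINE = re.compile(
    r"^(?:(?P<ts>\d+(?:\.\d+)?):\s*)?"
    r"\[(?P<kind>Full GC|GC)\s+"
    r"(?P<before>\d+)K->(?P<after>\d+)K\((?P<total>\d+)K\),\s*"
    r"(?P<secs>\d+(?:\.\d+)?) secs\]"
)

def summarize_full_gcs(lines):
    """Count full GCs and report pause lengths, spacing, and heap use after each."""
    stamps, pauses, after_kb = [], [], []
    for line in lines:
        m = GC_LINE.match(line.strip())
        if not m or m.group("kind") != "Full GC":
            continue  # skip minor collections and unrecognized lines
        pauses.append(float(m.group("secs")))
        after_kb.append(int(m.group("after")))
        if m.group("ts") is not None:
            stamps.append(float(m.group("ts")))
    intervals = [later - earlier for earlier, later in zip(stamps, stamps[1:])]
    return {
        "count": len(pauses),
        "max_pause_secs": max(pauses, default=0.0),
        "mean_interval_secs": sum(intervals) / len(intervals) if intervals else None,
        "heap_after_kb": after_kb,
    }

# Invented sample log lines, for illustration only.
sample = [
    "100.0: [GC 325407K->83000K(2097152K), 0.2300771 secs]",
    "3600.0: [Full GC 1800000K->900000K(2097152K), 10.5 secs]",
    "7200.0: [Full GC 1810000K->950000K(2097152K), 11.0 secs]",
    "9000.0: [Full GC 1820000K->1000000K(2097152K), 11.5 secs]",
]
summary = summarize_full_gcs(sample)
```

Heap occupancy that keeps rising after each full GC, together with shrinking intervals between full GCs, is the pattern to compare against the baseline when you suspect a leak.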

CPU Utilization

When deployed using the building module concept (as described in Chapter 5, "Creating Your Portal Design"), Portal Server has a capable, scalable CPU architecture that also degrades gracefully under high loads.

However, when monitoring a production site, track CPU utilization over time. Load usually comes in spikes and keeping ahead of those spikes involves a careful assessment of availability capabilities.

Most organizations find that portal sites are “sticky” in nature. This means that site usage grows over time as users become more comfortable with the site, even when the size of the user community is fixed. When the user community also grows, a successful portal site can see substantial growth in CPU requirements over a short period of time.

When monitoring a portal server’s CPU utilization, determine the average page latency during peak load and how that differs from the average latency.

Expect peak loads to be four to eight times higher than the average load, but over short periods of time.
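The comparison of peak and average figures can be sketched as follows. The function name, the sample format (requests and mean latency per monitoring interval), and the numbers are all hypothetical; substitute whatever your monitoring tools actually record.

```python
def peak_profile(samples):
    """Summarize load and latency from (requests, mean_latency_ms) intervals."""
    total_requests = sum(requests for requests, _ in samples)
    avg_load = total_requests / len(samples)
    peak_requests, peak_latency = max(samples, key=lambda s: s[0])
    # Weight each interval's latency by its request count so that busy
    # intervals contribute proportionally to the overall average.
    avg_latency = sum(r * lat for r, lat in samples) / total_requests
    return {
        "peak_to_avg_load": peak_requests / avg_load,
        "peak_latency_ms": peak_latency,
        "avg_latency_ms": avg_latency,
    }

# Invented samples: seven quiet intervals and one spike.
samples = [(100, 200.0)] * 7 + [(900, 800.0)]
profile = peak_profile(samples)
in_expected_band = 4.0 <= profile["peak_to_avg_load"] <= 8.0
```

A peak-to-average ratio far outside the four-to-eight range, or peak latency many times the average, suggests revisiting the capacity model.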

Identity Server Cache and Sessions

The performance of a portal system is affected to a large extent by the cache hit ratio of the Identity Server cache. This cache is highly tunable, but there is a trade-off between memory used by this cache and the available memory in the rest of the heap.

You can enable the amSSO and amSDKStats logs to monitor the number of active sessions on the server and the efficiency of the Sun Java System Directory Server cache. These logs are located by default in the /var/opt/SUNWam/debug directory. Use the com.iplanet.am.stats.interval parameter to set the logging interval. Do not use a value less than five (5) seconds. Values of 30 to 60 seconds give good output without impacting performance.

The com.iplanet.services.stats.directory parameter specifies the log location, whether to a file or to the console, and also is used to turn off the logs. You must restart the server for changes to take effect. Logs are not created until the system detects activity.


Note	Multiple web container instances write logs to the same file.

The cache hit ratio displayed in the amSDKStats file gives both an interval value and an overall value since the server was started. Once a user logs in, the user’s session information remains in the cache until the cache fills up. When the cache is full, the oldest entries are removed first. If the server has not needed to remove a user’s entry, the user’s information might be retrieved from the cache on a subsequent login, even days later. High hit ratios give much better performance. A hit ratio of at least 80 percent is a good target, although an even higher ratio is desirable if you can achieve it.
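The relationship between the per-interval ratio and the overall ratio can be illustrated with a small tracker. This is a sketch only: the class is hypothetical, it does not parse the amSDKStats file, and the counts are invented.

```python
class HitRatioTracker:
    """Track per-interval and since-start cache hit ratios against a target."""

    def __init__(self, target=0.80):
        self.target = target      # 80 percent minimum, as suggested in the text
        self.total_hits = 0
        self.total_requests = 0

    def record_interval(self, hits, requests):
        """Add one interval's counts and return (interval_ratio, overall_ratio)."""
        self.total_hits += hits
        self.total_requests += requests
        interval_ratio = hits / requests if requests else 0.0
        overall_ratio = (
            self.total_hits / self.total_requests if self.total_requests else 0.0
        )
        return interval_ratio, overall_ratio

    def meets_target(self):
        """True when the overall ratio since start is at or above the target."""
        if not self.total_requests:
            return False
        return self.total_hits / self.total_requests >= self.target

tracker = HitRatioTracker()
first = tracker.record_interval(hits=90, requests=100)
second = tracker.record_interval(hits=50, requests=100)
```

In this invented run, one poor interval (50 percent) pulls the overall ratio down to 70 percent, below the 80 percent target, even though the first interval was well above it.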

Thread Usage

Use the web container tools to monitor the number of threads being used to service requests. The number of threads actually used is generally lower than many estimates, especially in production sites, where CPU utilization is usually far less than 100 percent.

Portal Usage Information

Portal Server does not include a built-in reporting mechanism for monitoring how portal users use the portal. Such reporting would cover which channels are accessed, how long they are accessed, and the ability to build a behavioral pattern for each user. However, it is relatively easy to build a Java™ servlet that intercepts every Portal Server Desktop request, extracts the SSO token, saves the user access information to a log, and then redirects the user to the intended URL. Such a construct would be based on custom attribute extensions to the Identity Server schema.

Sun Java System Portal Server 6 2004Q2 Deployment Planning Guide