
Sun Java System Portal Server 6 2004Q2 Deployment Planning Guide 

Chapter 6
The Production Environment

This chapter describes how to monitor and tune Sun Java™ System Portal Server software, including the Sun Java™ System Portal Server Secure Remote Access product.

This chapter contains the following sections:


Moving to a Production Environment

Moving to a production environment occurs after you have thoroughly tested your portal and operated it as a trial deployment to test and refine your design.

Monitoring and Tuning

Monitoring and tuning your portal deployment is an ongoing, cyclical process, in which you look for bottlenecks and other performance issues.

When monitoring and tuning your portal, keep the following points in mind:

Documenting the Portal

A comprehensive set of documentation on how your portal functions is an important mechanism for increasing the supportability of the system. The different areas that need to be documented to create a supportable solution include:

The run book outlines troubleshooting techniques as well as the deployment life cycle. Make this book available during the training and transfer of knowledge phase of the project.


Tip

Do not wait until the end of the deployment project, when time and money are usually running short, to begin this documentation phase. Documenting your portal should occur as an ongoing activity throughout the entire deployment.



Monitoring Portal Server

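This section describes the variables that affect portal performance, as well as the portal monitoring you can perform. Areas to monitor include memory consumption and garbage collection, CPU utilization, the Identity Server cache and sessions, thread usage, and portal usage information.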

While there are emerging technologies that will enable you to perform detailed monitoring of Portal Server services, this section focuses on the basic but extensive set of hardware and software issues that determine the overall performance of a portal deployment.

Specifically, portal performance is determined by the throughput and latency that the portal can sustain over a period of time. You must conduct a baseline performance analysis as soon as possible. The baseline performance analysis confirms that your portal substantially conforms to published performance numbers. Establishing a performance baseline helps you to understand infrastructure issues that can severely impact the performance of a production portal.

Nevertheless, when maintaining a properly performing portal, you must look at a broad set of issues. The following sections explain those issues in terms of portal performance variables and provide guidelines for determining portal efficiency.


Note

These rules also apply for performance, scalability, and stress tests.


Memory Consumption and Garbage Collection

Before reading this section, read the following document on tuning garbage collection with the Java Virtual Machine, version 1.4.2:

http://java.sun.com/docs/hotspot/gc1.4.2/index.html

Portal Server requires substantial amounts of memory to provide the highest possible throughput. In fact, the portal-server-install-root/SUNWps/bin/perftune tuning script sets the heap size maximum to 2 GB. This size is divided between the new generation, which receives 256 MB (one eighth of the space), and the old generation, which receives the rest. At initialization, the JVM virtually reserves the maximum address space but does not allocate physical memory until it is needed. The complete address space reserved for object memory can be divided into the young and old generations.
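The split described above is simple arithmetic. The following minimal sketch reproduces it for a 2 GB maximum heap and a one-eighth ratio. It is illustrative only: the class and method names are hypothetical and are not part of Portal Server or the perftune script.

```java
/** Illustrative helper (hypothetical; not part of Portal Server or perftune). */
final class HeapSplit {
    static final long MB = 1024L * 1024L;

    /** Size of the new (young) generation when it receives 1/divisor of the heap. */
    static long newGenBytes(long maxHeapBytes, int divisor) {
        return maxHeapBytes / divisor;
    }

    /** Size of the old generation: whatever the new generation does not take. */
    static long oldGenBytes(long maxHeapBytes, int divisor) {
        return maxHeapBytes - newGenBytes(maxHeapBytes, divisor);
    }

    public static void main(String[] args) {
        long maxHeap = 2048 * MB; // the 2 GB maximum mentioned above
        System.out.println("new generation: " + newGenBytes(maxHeap, 8) / MB + " MB"); // 256 MB
        System.out.println("old generation: " + oldGenBytes(maxHeap, 8) / MB + " MB"); // 1792 MB
        // Maximum heap the running JVM reports for itself, for comparison:
        System.out.println("this JVM max heap: " + Runtime.getRuntime().maxMemory() / MB + " MB");
    }
}
```

Running the same calculation with a different divisor shows how quickly the old generation shrinks when the new generation is given a larger share.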

For most applications, the usual recommendation is to use a larger percentage of the total heap for the new generation. In the case of Portal Server, however, using only one eighth of the space for the young generation is appropriate, because most memory used by Portal Server is long-lived. The sooner the memory is copied to the old generation, the better the garbage collection (GC) performance.

Even with a large heap size, after a portal instance has been running under moderate load for a few days, most of the heap will appear to be used because of the lazy nature of the GC. The GC does not start performing full garbage collections until the resident set size (RSS) reaches approximately 85 percent of the total heap space. Those garbage collections can have a measurable impact on performance.

For example, on a 900 MHz UltraSPARC III™ system, a full GC on a 2 GB heap can take over ten seconds. During that time, the system cannot respond to web requests. During a reliability test, full GCs are clearly visible as spikes in the response time, so it is important to understand both their impact on performance and their frequency. In production, full GCs will go unnoticed most of the time, but any monitoring scripts that measure the performance of the system need to account for the possibility that a full GC might occur.

Measuring the frequency of full GCs is sometimes the only way to determine if the system has a memory leak. It is important to conduct an analysis that shows the expected frequency of full GCs on a baseline system and to compare that to the observed rate. To record the frequency of GCs, use the -verbose:gc JVM™ parameter.
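As a rough illustration of that comparison, the sketch below counts full collections in captured -verbose:gc output and converts the count to a rate. It assumes the line format that the 1.4.2 HotSpot collectors print, for example `[Full GC 267628K->83769K(776768K), 1.8479984 secs]`. Because plain -verbose:gc output carries no timestamps, the elapsed time of the observation window is supplied by the caller. The class name and sample values are hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper for summarizing full GCs in -verbose:gc output. */
final class GcLogStats {
    // Assumed format: "[Full GC <before>K-><after>K(<total>K), <seconds> secs]"
    private static final Pattern FULL_GC =
        Pattern.compile("\\[Full GC .*?, ([0-9]+(?:\\.[0-9]+)?) secs\\]");

    /** Number of lines that report a full collection. */
    static int countFullGcs(List<String> lines) {
        int count = 0;
        for (String line : lines) {
            if (FULL_GC.matcher(line).find()) {
                count++;
            }
        }
        return count;
    }

    /** Total pause time, in seconds, spent in full collections. */
    static double totalFullGcSeconds(List<String> lines) {
        double total = 0.0;
        for (String line : lines) {
            Matcher m = FULL_GC.matcher(line);
            if (m.find()) {
                total += Double.parseDouble(m.group(1));
            }
        }
        return total;
    }

    /** Full collections per hour over an observation window of the given length. */
    static double fullGcsPerHour(int fullGcCount, double elapsedSeconds) {
        if (elapsedSeconds <= 0) {
            throw new IllegalArgumentException("elapsedSeconds must be positive");
        }
        return fullGcCount * 3600.0 / elapsedSeconds;
    }

    public static void main(String[] args) {
        List<String> sample = Arrays.asList(
            "[GC 325407K->83000K(776768K), 0.2300771 secs]",
            "[Full GC 267628K->83769K(776768K), 1.8479984 secs]");
        int fullGcs = countFullGcs(sample);
        System.out.println("full GCs: " + fullGcs);
        System.out.println("per hour over a 2-hour window: " + fullGcsPerHour(fullGcs, 7200));
    }
}
```

Comparing the per-hour rate from a baseline run with the rate from the production system gives the kind of expected-versus-observed figure described above.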

CPU Utilization

When deployed using the building module concept (as described in Chapter 5, "Creating Your Portal Design"), Portal Server has a capable, scalable CPU architecture that also degrades gracefully under high loads.

However, when monitoring a production site, track CPU utilization over time. Load usually comes in spikes and keeping ahead of those spikes involves a careful assessment of availability capabilities.

Most organizations find that portal sites are “sticky” in nature. This means that site usage grows over time, even when the size of the user community is fixed, as users become more comfortable with the site. When the size of the user community also grows over time, a successful portal site can see substantial growth in CPU requirements over a short period of time.

When monitoring a portal server’s CPU utilization, determine the average page latency during peak load and how that differs from the average latency.

Expect peak loads to be four to eight times higher than the average load, but over short periods of time.
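To relate collected load samples to that guideline, a peak-to-average ratio can be computed directly. The following is a minimal sketch that assumes you already have load samples (for example, requests per second) gathered by whatever monitoring tool you use. The class name and sample values are hypothetical.

```java
/** Hypothetical helper for comparing peak load to average load. */
final class LoadProfile {
    /** Arithmetic mean of the samples. */
    static double average(double[] samples) {
        requireNonEmpty(samples);
        double sum = 0.0;
        for (double s : samples) {
            sum += s;
        }
        return sum / samples.length;
    }

    /** Largest sample observed. */
    static double peak(double[] samples) {
        requireNonEmpty(samples);
        double max = samples[0];
        for (double s : samples) {
            max = Math.max(max, s);
        }
        return max;
    }

    /** Ratio of the peak sample to the average of all samples. */
    static double peakToAverage(double[] samples) {
        return peak(samples) / average(samples);
    }

    /** True when the ratio is above the upper end of the four-to-eight guideline. */
    static boolean exceedsExpectedPeak(double ratio) {
        return ratio > 8.0;
    }

    private static void requireNonEmpty(double[] samples) {
        if (samples == null || samples.length == 0) {
            throw new IllegalArgumentException("at least one sample is required");
        }
    }

    public static void main(String[] args) {
        double[] requestsPerSecond = {10, 10, 10, 10, 60}; // made-up samples
        double ratio = peakToAverage(requestsPerSecond);
        System.out.println("peak/average = " + ratio);
        System.out.println("above guideline: " + exceedsExpectedPeak(ratio));
    }
}
```

A ratio well above eight suggests that spikes are sharper than this guideline anticipates and that capacity planning based on the average load alone would be misleading.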

Identity Server Cache and Sessions

The performance of a portal system is affected to a large extent by the cache hit ratio of the Identity Server cache. This cache is highly tunable, but there is a trade-off between memory used by this cache and the available memory in the rest of the heap.

You can enable the amSSO and amSDKStats logs to monitor the number of active sessions on the server and the efficiency of the Sun Java System Directory Server cache. These logs are located by default in the /var/opt/SUNWam/debug directory. Use the com.iplanet.am.stats.interval parameter to set the logging interval. Do not use a value less than five (5) seconds. Values of 30 to 60 seconds give good output without impacting performance.

The com.iplanet.services.stats.directory parameter specifies the log location, whether to a file or to the console, and also is used to turn off the logs. You must restart the server for changes to take effect. Logs are not created until the system detects activity.
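The interval guidance above (never below five seconds, with 30 to 60 seconds preferred) can be checked mechanically before a server restart. The sketch below classifies a value read from a java.util.Properties object. How and from which file you load the properties is left to you, and the class and enum names are hypothetical.

```java
import java.util.Properties;

/** Hypothetical pre-restart check for the statistics logging interval. */
final class StatsIntervalCheck {
    static final String INTERVAL_KEY = "com.iplanet.am.stats.interval";

    enum Verdict { TOO_LOW, ACCEPTABLE, RECOMMENDED }

    /** Classifies an interval in seconds against the guidance in this section. */
    static Verdict classify(int seconds) {
        if (seconds < 5) {
            return Verdict.TOO_LOW;      // below the stated minimum
        }
        if (seconds >= 30 && seconds <= 60) {
            return Verdict.RECOMMENDED;  // range said to give good output
        }
        return Verdict.ACCEPTABLE;       // allowed, but outside the preferred range
    }

    /** Reads the interval from already-loaded properties and classifies it. */
    static Verdict classify(Properties props) {
        String raw = props.getProperty(INTERVAL_KEY);
        if (raw == null) {
            throw new IllegalArgumentException(INTERVAL_KEY + " is not set");
        }
        return classify(Integer.parseInt(raw.trim()));
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        props.setProperty(INTERVAL_KEY, "45"); // example value only
        System.out.println(INTERVAL_KEY + " -> " + classify(props));
    }
}
```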


Note

Multiple web container instances write logs to the same file.


The cache hit ratio displayed in the amSDKStats file gives both an interval value and an overall value since the server was started. Once a user logs in, the user’s session information remains in the cache indefinitely or until the cache fills up. When the cache is full, the oldest entries are removed first. If the server has not needed to remove a user’s entry, a subsequent login (days later, for example) might retrieve the user’s information from the cache. Performance is much better with high hit ratios. A hit ratio of at least 80 percent is a good target, although an even higher ratio is desirable if possible.
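The behavior described above, a bounded cache that evicts its oldest entries and reports a hit ratio, can be modeled in a few lines. The sketch below is a toy model for reasoning about the numbers, not the Identity Server implementation. It assumes that "oldest" means oldest by insertion, and all names are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Toy model of a bounded cache that evicts the oldest entry first. */
final class OldestFirstCache<K, V> {
    private final Map<K, V> entries;
    private long hits;
    private long lookups;

    OldestFirstCache(final int capacity) {
        // Insertion-ordered map; drops the eldest entry once capacity is exceeded.
        this.entries = new LinkedHashMap<K, V>(16, 0.75f, false) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                return size() > capacity;
            }
        };
    }

    /** Looks up a key and records whether the lookup was a hit. */
    V get(K key) {
        lookups++;
        V value = entries.get(key);
        if (value != null) {
            hits++;
        }
        return value;
    }

    void put(K key, V value) {
        entries.put(key, value);
    }

    /** Overall hit ratio since the cache was created (0.0 before any lookup). */
    double hitRatio() {
        return lookups == 0 ? 0.0 : (double) hits / lookups;
    }

    /** True when the overall ratio reaches the 80 percent target. */
    boolean meetsTarget() {
        return hitRatio() >= 0.80;
    }
}
```

Shrinking the capacity in such a model while replaying a realistic login pattern shows how quickly the hit ratio falls below the target, which is the memory trade-off noted at the start of this section.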

Thread Usage

Use the web container tools to monitor the number of threads being used to service requests. The number of threads actually used is generally lower than many estimates, especially in production sites where CPU utilization usually is far less than 100 percent.

Portal Usage Information

Portal Server does not include a built-in reporting mechanism for monitoring how portal users use the portal. Such usage information includes which channels are accessed, how long they are accessed, and the data needed to build a behavioral pattern for portal users. However, it is relatively simple to build a Java™ servlet that would intercept every Portal Server Desktop request, extract the SSO token, save the user access information to a log, then redirect the user to the intended URL. Such a construct would be based on custom attribute extensions to the Identity Server schema.
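The sketch below covers only the logging and reporting half of such a construct. It deliberately omits the servlet itself and the SSO token extraction, which depend on the web container and Identity Server APIs. The log line format (timestamp, user, and URL separated by tabs), the class name, and the sample values are all hypothetical.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical usage log: one tab-separated line per intercepted request. */
final class UsageLog {
    /** Formats one entry as "timestamp<TAB>user<TAB>url". */
    static String formatEntry(long epochMillis, String userId, String url) {
        return epochMillis + "\t" + userId + "\t" + url;
    }

    /** Counts requests per URL from lines produced by formatEntry. */
    static Map<String, Integer> countByUrl(List<String> lines) {
        Map<String, Integer> counts = new LinkedHashMap<String, Integer>();
        for (String line : lines) {
            String[] fields = line.split("\t", 3);
            if (fields.length < 3) {
                continue; // skip malformed lines
            }
            Integer previous = counts.get(fields[2]);
            counts.put(fields[2], previous == null ? 1 : previous + 1);
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
            formatEntry(1000L, "alice", "/news"),
            formatEntry(2000L, "bob", "/news"),
            formatEntry(3000L, "alice", "/mail"));
        System.out.println(countByUrl(lines));
    }
}
```

A servlet that intercepts Desktop requests would call something like formatEntry once per request and append the result to a file. The aggregation step can then run offline without adding load to the portal.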





Copyright 2004 Sun Microsystems, Inc. All rights reserved.