Sun Java System Access Manager 7.1 Performance Tuning and Troubleshooting Guide

Appendix A Known Issues and Workarounds

The most common performance problems are identified by the following symptoms:

Memory Grows or Leaks

Web Server 6.1 bundled with JES 4 fails to start

A memory leak may occur with the security libraries that shipped with JES 4 for Windows. The leak causes Web Server 6.1 (bundled with JES 4) to fail to start.

Solution: Install a standalone version of Web Server 6.1 SP5 to force the Web Server to use its own bundled security libraries from the webserver/bin/https/bin directory. Alternatively, use the JES 5 installer to install Web Server 6.1 SP5 or later.

System Responds Too Slowly

Application Server where Access Manager is deployed throws "cannot create thread" error

You see an error saying "Cannot create thread" with the following stack trace:


"amSessionPoller[9]" daemon prio=10 tid=0x0985e2e0 nid=0x37 in Object.wait() [0x10519000..0x10519a38]
      at java.lang.Object.wait(Native Method)
      - waiting on <0x2ad92c18> (a java.util.ArrayList)
      at java.lang.Object.wait(Object.java:474)
      at com.iplanet.am.util.ThreadPool.getTask(ThreadPool.java:125)
      - locked <0x2ad92c18> (a java.util.ArrayList)
      at com.iplanet.am.util.ThreadPool$WorkerThread.run(ThreadPool.java:144)

The problem is caused either by insufficient JVM heap size or by invalid Access Manager session threads being created without limit. The waiting state shown in the stack trace is expected behavior, not a deadlock.

Solution: To increase the JVM heap size, either edit domain.xml manually or run the Access Manager amtune-as8 script.
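
To judge whether session threads are multiplying before changing the heap size, it can help to count threads by name in a JVM thread dump. The following is a minimal sketch, not part of Access Manager. It assumes that each thread header line starts with the thread name in double quotes, as in the trace above, and that pool members differ only by a trailing index such as [9] or -9. The function name count_threads_by_pool is hypothetical.

```python
import re
from collections import Counter

# A thread header line in a thread dump starts with the quoted thread name.
THREAD_HEADER = re.compile(r'^"([^"]+)"')
# Trailing index such as "[9]" or "-9" that distinguishes members of one pool.
POOL_INDEX = re.compile(r'(\[\d+\]|-\d+)$')


def count_threads_by_pool(dump_text):
    """Count threads per pool name, ignoring the trailing per-thread index."""
    counts = Counter()
    for line in dump_text.splitlines():
        match = THREAD_HEADER.match(line.strip())
        if match:
            pool_name = POOL_INDEX.sub("", match.group(1))
            counts[pool_name] += 1
    return counts
```

Calling count_threads_by_pool(dump_text).most_common(5) on dumps taken a few minutes apart shows which pools are growing. A steadily rising count for the session poller pool points to runaway session threads rather than to heap size alone.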

Directory Server 5.2 hangs and shows high CPU usage when deleting entries

You may see a message in the error log stating that the "search is not indexed". Access Manager automatically enables the Directory Server referential integrity plug-in, but no indexes exist for Access Manager attributes such as iplanet-am-static-group-dn and iplanet-am-modifiable-by. Access Manager does not configure the arguments of the plug-in, so the plug-in uses the default arguments (update interval=0). Every deletion therefore causes an immediate integrity check, which consumes a lot of system resources when the search is not indexed.

Solution: Be sure the referential integrity plug-in is enabled and configured, and that the attributes to be maintained are indexed. Be sure referential integrity checks are configured to run either synchronously or with a delay. Using a delay relieves the thread shortage for each application.

If one or more entries are repeatedly added and deleted, these entries may over time trigger non-indexed searches of the database. This issue is addressed in later versions of Directory Server. Upgrade to Directory Server 5.2 Patch 5 or Directory Server 6.0.

Access Manager throughput is significantly slower when it is deployed on WebLogic or WebSphere Application Server

Solution: First tune the JVM heap and GC (garbage collection) options for WebSphere Application Server. See Third-Party Web Containers for more information. Because JVM 1.4.2 and earlier has much lower throughput than JVM 1.5.0 and later, make sure the JVM version used in the container is 1.5.0 or later when you use the CMS and NewParallel GC options. WebLogic 9.0 or later and WebSphere 6.0 or later use JDK 1.5.0 or later.

"OutOfMemoryError" when Access Manager is deployed in WebLogic or WebSphere Application Server

This occurs on Access Manager 7.0 or higher, even with sufficient JVM heap sizes.

Solution: Be sure that, in addition to a sufficient JVM heap size, the CMS and NewParallel GC options are specified. Also be sure that -XX:-CMSParallelRemarkEnabled is included. See Third-Party Web Containers for more information.
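
A quick way to confirm that a container's JVM option list carries the heap and GC settings is to compare it against a list of required option prefixes. The sketch below is illustrative only. It assumes that the CMS and NewParallel GC options correspond to the HotSpot flags -XX:+UseConcMarkSweepGC and -XX:+UseParNewGC. Take the authoritative list, including any further flags, from Third-Party Web Containers. The function name missing_jvm_options and the sample values are hypothetical.

```python
def missing_jvm_options(configured, required):
    """Return each required option that no configured option starts with."""
    return [
        need
        for need in required
        if not any(option.startswith(need) for option in configured)
    ]


# Hypothetical option list copied from a container's JVM configuration.
configured = ["-Xms2048m", "-Xmx2048m", "-XX:+UseConcMarkSweepGC"]
# Assumed required settings: a maximum heap size plus the two GC flags.
required = ["-Xmx", "-XX:+UseConcMarkSweepGC", "-XX:+UseParNewGC"]

print(missing_jvm_options(configured, required))  # ['-XX:+UseParNewGC']
```

Prefix matching lets a bare -Xmx entry match any heap size, so the check only verifies that a maximum heap is set, not that the value is adequate.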

Server Hangs and Does Not Respond

Access Manager Server hangs during session failover

The problem occurs when the Access Manager server and JMQ (and even BDB) are installed on one machine. Both web server instances hang. A thread dump from the first web server instance shows that all its threads are in socketRead operations.


waiting:

- java.net.SocketInputStream.socketRead0
(java.io.FileDescriptor, byte[], int, int, int) @bci=0,
pc=0xf75e0274, methodOop=0xf33a7aa8
(Compiled frame; information may be imprecise)

A thread dump of the second web server instance shows the corresponding writePacketNoAck calls from jmsclient:


- com.sun.messaging.jmq.jmsclient.ProtocolHandler.writePacketNoAck
(com.sun.messaging.jmq.io.ReadWritePacket) @bci=7, line=235,
pc=0xf79a56b4, methodOop=0xf3650320 (Compiled frame;
information may be imprecise)
- com.sun.messaging.jmq.jmsclient.ProtocolHandler.writeJMSMessage
(javax.jms.Message) @bci=565, line=1567, pc=0xf76bc190,
methodOop=0xf36533c8 (Compiled frame)
- com.sun.messaging.jmq.jmsclient.WriteChannel.sendWithFlowControl
(javax.jms.Message) @bci=10, line=123, pc=0xf7825278,
methodOop=0xf3689e48 (Compiled frame)
- com.sun.messaging.jmq.jmsclient.TopicPublisherImpl.publish
(javax.jms.Message) @bci=2, line=73, pc=0xf782ece0,
methodOop=0xf36b2400 (Compiled frame)
- com.iplanet.dpro.session.jmqdb.JMQSessionRepository.save
(com.iplanet.dpro.session.service.InternalSession) @bci=92,
line=346, pc=0xf7775008, methodOop=0xf3604770 (Compiled frame)
- com.iplanet.dpro.session.service.SessionService.saveForFailover
(com.iplanet.dpro.session.service.InternalSession) @bci=26,
line=2485, pc=0xf7005c34, methodOop=0xf35da5e8 (Interpreted frame)
- com.iplanet.dpro.session.service.InternalSession.updateForFailover()
@bci=46, line=969, pc=0xf74a7c48, methodOop=0xf36c0420
(Compiled frame)

Under a heavy load, the Access Manager server web container process uses most of the machine's CPU resources. JMQ and BDB, which hold the Access Manager sessiondb, are then left without sufficient CPU resources to process incoming requests. Threads carrying requests on the first Access Manager server instance cannot write to the second instance, which has JMQ behind it, because of the lack of CPU resources. Threads also build up on the first instance because of the backlog on the second instance, where JMQ or BDB cannot keep up with updates to the session table.

Solution: Install JMQ and BDB on their own machines, separate from the Access Manager server machine.

Server hangs when processing requests between the load balancer and the Access Manager server

The problem occurs when using two Access Manager 7.0 SP5 servers with a load balancer in front. You may see a stack trace such as this:


   at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:961)
   - locked <0xf0898b88> (a sun.net.www.protocol.http.HttpURLConnection)
   at com.iplanet.services.comm.client.PLLClient.send(PLLClient.java:196)
   at com.iplanet.services.comm.client.PLLClient.send(PLLClient.java:115)

Solution: The problem might be a result of having the server instances separated by a firewall. If this is your environment, move the server instances behind the same firewall.

The problem could also be due to a misconfiguration in the Platform Server List. The stack trace shown above occurs when the Platform Server List is missing its associated site IDs and server instances are denoted by virtual host names. Here is an example of a misconfigured Platform Server List:


site list   : http://hostname:80|11
server list : http://hostname1:80|01
              http://hostname2:80|02

Configure your Platform Server List so that each server entry includes its site identifier. In the following example, 11 is the site identifier.


site list   : http://hostname:80|11
server list : http://hostname1:80|01|11
              http://hostname2:80|02|12
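
Server entries that lack a site ID can be spotted mechanically before they cause a hang. The following is a minimal sketch, assuming each server entry has the form url|serverID|siteID as in the example above. The function name servers_missing_site_id is hypothetical and is not part of any Access Manager tool.

```python
def servers_missing_site_id(server_entries):
    """Return the server entries that have no third '|'-separated field."""
    missing = []
    for entry in server_entries:
        fields = [field.strip() for field in entry.strip().split("|")]
        # Assumed entry shape: url|serverID|siteID
        if len(fields) < 3 or not fields[2]:
            missing.append(entry.strip())
    return missing
```

Passing the server list from the misconfigured example returns both entries. Passing the corrected list returns an empty list.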

Access Manager server hangs when Sun Java System Directory Server restarts

Connections between the Access Manager server and Directory Server seem to close unexpectedly due to unsynchronized access to a shared variable. This is a known problem that is fixed in LDAP JDK 4.19.

Solution: Upgrade to LDAP JDK 4.19. Download Patch 119725-04-1 from the following URL: http://sunsolve.sun.com/search/document.do?assetkey=1-21-119725-04-1 The patch works for Solaris 9 and 10. The same patch is used for both SPARC and x86 platforms.

Access Manager unable to recover after a crash or watch dog restart under heavy load

This problem occurs when using an LDAP v3 plug-in because the plug-in is being initialized more than once.

Solution: This known problem was fixed in both Access Manager 7.0 Patch 6 and Access Manager 7.1 Patch 1. Upgrade to one of these Access Manager versions.

jaxrpc getAttributes throws SSOException

A 403 error occurs. The problem occurs when a session expiration notification arrives between the SSO validation call and the call to fetch profile attributes on the agent side. The SSO validation is successful, but the profile attribute fetch fails with an SSOException from the server. This is the expected behavior. However, the SOAP client does not process this exception properly: the client-side calls wrap the exception in an IdRepoException. As a result, the agent is not notified about the SSOException.

Solution: A fix was made in amclientsdk. This known problem was fixed in both Access Manager 7.0 Patch 6 and Access Manager 7.1 Patch 1. Upgrade to one of these Access Manager versions.

Sun Java System Web Server hangs while handling a large number of image files

The web container that hosts Access Manager hangs while handling a large number of image files. The following errors are displayed in the Web Server errors.log:


IO error:
           "java.io.IOException: WEB8004: Error flushing the output stream
            java.io.IOException: WEB8001: Write failed "

These entries indicate that the Access Manager server was not involved in the web server hang. The root cause was an IO error.

Access Manager Web Policy Agent hangs

When a web policy agent hangs, it is usually due to misconfiguration of the web container where the agent is installed, or misconfiguration of the Access Manager web container on another host system. An Access Manager server may be running out of some resources due to, for example, a runaway number of invalid sessions.

Solution: Set the Web Policy Agent debug logging level to the finest level, all:5. Examine the logs to determine the exact cause.

Access Manager server hangs when multiple clients point to one Access Manager server instance

When multiple clients point to an Access Manager server instance, the session polling mechanism used prior to Access Manager 7.1 can cause the Access Manager server to hang. The older polling mechanism was based on caching time.

Solution: Upgrade to Access Manager 7.1. In the 7.1 version, the session polling mode is configurable. You can use either caching time or idle time mode. By default, polling is based on idle time.

System hangs when Access Manager clientsdk.jar and Access Manager server are in the same JVM instance

The problem occurs when both a client application with Access Manager clientsdk.jar and the Access Manager server are in the same JVM instance. When a Java EE application tries to access IdRepo attributes for any identity, the Access Manager server can hang. The problem is due to an unnecessary synchronization block in SMS ServiceConfigImpl, OrganizationConfigManagerImpl and ServiceConfigManagerImpl in the Access Manager SDK. This known issue was addressed in Access Manager 7.1.

Solution: Upgrade to Access Manager 7.1.

Server Crashes

Access Manager web container crashes with "StackOverflowError" errors

The problem is known to occur when Access Manager 7.0 is deployed on a web server and the -Xss128k JVM option is used with a 64-bit JVM on the web container. The problem can occur with any Java application. For 64-bit JVMs, the per-thread stack size should be at least 256k (-Xss256k), or even 512k, because the default per-thread stack size of a 64-bit JVM is 1 MB.

Solution: For 64-bit JVMs, set the per-thread stack size to at least 512k, because the 64-bit JVM default per-thread stack size is 1 MB. 64-bit JVM support was introduced in Web Server 6.1 SP5. Application Server 8.1 and 8.2 do not support a 64-bit JVM, but Application Server 9.1 will. Access Manager and its amtune scripts support a 64-bit JVM starting with Access Manager 7.0 Patch 5. For more information, see http://java.sun.com/docs/hotspot/threads/threads.html
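
The stack size guidance above can be checked against a container's JVM option list. The sketch below is an illustration under stated assumptions, not an amtune feature. It assumes -Xss values use the usual k, m or g suffixes, and it takes the 512k threshold from the solution text as a default that callers can override. The names parse_size and stack_size_too_small are hypothetical.

```python
import re

# Multipliers for the size suffixes accepted on options such as -Xss.
UNITS = {"": 1, "k": 1024, "m": 1024 ** 2, "g": 1024 ** 3}


def parse_size(text):
    """Convert a size such as '128k' or '1m' to a number of bytes."""
    match = re.fullmatch(r"(\d+)([kKmMgG]?)", text)
    if match is None:
        raise ValueError(f"unrecognized size: {text!r}")
    return int(match.group(1)) * UNITS[match.group(2).lower()]


def stack_size_too_small(jvm_options, minimum="512k"):
    """Return True if any explicit -Xss option is below the minimum.

    An option list without -Xss returns False, because the JVM default applies.
    """
    limit = parse_size(minimum)
    return any(
        parse_size(option[len("-Xss"):]) < limit
        for option in jvm_options
        if option.startswith("-Xss")
    )
```

For example, stack_size_too_small(["-Xmx2g", "-Xss128k"]) flags the setting described in this entry, while an option list with -Xss1m or with no -Xss option passes.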

Apache Web Agent 2.2 on Linux crashes

Apache Web Server crashes when Web Agent is deployed.

Solution: There is no solution at this time. The Apache agent for Access Manager does not support Apache Server when it is compiled or run in multi-process mode (MPM).

Access Manager crashes in SSL mode

The problem occurs when the Access Manager server is configured in SSL mode and there is outbound SSL traffic from the Access Manager server to another Access Manager server or to Directory Server. In some rare situations where SSL socket calls are not closed and queue up, the Access Manager server can crash if it is configured in the default NSS/JSS mode of SSL.

Solution: Upgrade to JSS 4.2.5.

Customized JSP page causes Web Server to crash

Web Server crashes when a customized JSP page for Access Manager is deployed in Sun Java System Web Server. The problem occurs when using a Web Server version prior to 6.1 SP8. The problem is due to a known problem in Sun Web Server that is unrelated to Access Manager server. If the JSP has calls to HttpServlet.getScheme or HttpServlet.service, the Web Server can crash.

Solution: Upgrade to Web Server 6.1 SP8.

Application Server or Web Server crashes under a heavy load

The problem is known to occur when using Sun Fire T1000/T2000 hardware (cool threads, Niagara boxes) to deploy Access Manager server with Sun Java System Application Server or Sun Java System Web Server.

Solution: Set the appropriate memory library in the Application Server asenv.conf file or in the Web Server start script. For Application Server, in the asenv.conf file, replace LD_PRELOAD=/usr/lib/libmtmalloc.so with LD_PRELOAD=/usr/lib/libumem.so. For more information, see http://www.sun.com/servers/coolthreads/tnb/applications_sunone.jsp.

For Web Server, in the start script, replace LIBMTMALLOC=/usr/lib/libmtmalloc.so with LIBMTMALLOC=/usr/lib/libumem.so.
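
The same substitution applies to both files, so it can be scripted and reviewed before the file is written back. The following is a minimal sketch that works on the file contents as a string. It assumes that lines starting with # are comments and should be left alone. The function name switch_allocator is hypothetical.

```python
OLD_LIB = "/usr/lib/libmtmalloc.so"
NEW_LIB = "/usr/lib/libumem.so"


def switch_allocator(config_text):
    """Swap libmtmalloc for libumem and report which line numbers changed."""
    changed = []
    lines = []
    for number, line in enumerate(config_text.splitlines(), start=1):
        # Skip comment lines so that documented old settings stay readable.
        if OLD_LIB in line and not line.lstrip().startswith("#"):
            line = line.replace(OLD_LIB, NEW_LIB)
            changed.append(number)
        lines.append(line)
    trailing_newline = "\n" if config_text.endswith("\n") else ""
    return "\n".join(lines) + trailing_newline, changed
```

The returned line numbers make it easy to confirm that only the LD_PRELOAD or LIBMTMALLOC assignment was touched. Keep a backup of the original file before replacing it.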

Use the latest JDK 1.5 release, 1.5.0_08 or later, when Sun Fire T1000/T2000 machines are used for Access Manager server deployments.
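
Legacy JDK version strings such as 1.5.0_08 do not compare correctly as plain text, so a version check needs to parse them first. The sketch below assumes the major.minor.micro_update format reported by java -version for JDK 1.x releases, with an optional build suffix after a hyphen. The names parse_jdk_version and meets_minimum are hypothetical.

```python
import re


def parse_jdk_version(version):
    """Turn a version string such as '1.5.0_08' into the tuple (1, 5, 0, 8)."""
    match = re.fullmatch(r"(\d+)\.(\d+)\.(\d+)(?:_(\d+))?(?:-.*)?", version.strip())
    if match is None:
        raise ValueError(f"unrecognized JDK version: {version!r}")
    # A missing update number is treated as update 0.
    return tuple(int(part) if part else 0 for part in match.groups())


def meets_minimum(version, minimum="1.5.0_08"):
    """Return True if version is at least the minimum release."""
    return parse_jdk_version(version) >= parse_jdk_version(minimum)
```

With the default minimum, meets_minimum("1.5.0_12") is true and meets_minimum("1.4.2_13") is false.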