The problem occurs when the Access Manager server and JMQ (and even BDB) are installed on the same machine. Both web server instances hang. A thread dump from the first web server instance shows that all of its threads are in socketRead operations:
waiting:
- java.net.SocketInputStream.socketRead0 (java.io.FileDescriptor, byte, int, int, int) @bci=0, pc=0xf75e0274, methodOop=0xf33a7aa8 (Compiled frame; information may be imprecise)
A thread dump of the second web server instance shows the corresponding writePacketNoAck calls from jmsclient:
- com.sun.messaging.jmq.jmsclient.ProtocolHandler.writePacketNoAck (com.sun.messaging.jmq.io.ReadWritePacket) @bci=7, line=235, pc=0xf79a56b4, methodOop=0xf3650320 (Compiled frame; information may be imprecise)
- com.sun.messaging.jmq.jmsclient.ProtocolHandler.writeJMSMessage (javax.jms.Message) @bci=565, line=1567, pc=0xf76bc190, methodOop=0xf36533c8 (Compiled frame)
- com.sun.messaging.jmq.jmsclient.WriteChannel.sendWithFlowControl (javax.jms.Message) @bci=10, line=123, pc=0xf7825278, methodOop=0xf3689e48 (Compiled frame)
- com.sun.messaging.jmq.jmsclient.TopicPublisherImpl.publish (javax.jms.Message) @bci=2, line=73, pc=0xf782ece0, methodOop=0xf36b2400 (Compiled frame)
- com.iplanet.dpro.session.jmqdb.JMQSessionRepository.save (com.iplanet.dpro.session.service.InternalSession) @bci=92, line=346, pc=0xf7775008, methodOop=0xf3604770 (Compiled frame)
- com.iplanet.dpro.session.service.SessionService.saveForFailover (com.iplanet.dpro.session.service.InternalSession) @bci=26, line=2485, pc=0xf7005c34, methodOop=0xf35da5e8 (Interpreted frame)
- com.iplanet.dpro.session.service.InternalSession.updateForFailover () @bci=46, line=969, pc=0xf74a7c48, methodOop=0xf36c0420 (Compiled frame)
Under heavy load, the Access Manager server web container process uses most of the machine's CPU resources, leaving JMQ and/or BDB (with the Access Manager sessiondb) without sufficient CPU to process incoming requests. The request-carrying threads of the first Access Manager server instance cannot write to the second instance, which has JMQ behind it, because of this lack of CPU resources. Threads also build up on the first instance because of the backlog on the second instance, where JMQ and/or BDB cannot keep up with updates to the session table.
Solution: Install JMQ and BDB on their own machines, separate from the Access Manager server machine.
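To confirm the pattern described above in a saved thread dump, it can help to count how many frames mention the blocking calls. The following is a minimal sketch, not an Access Manager tool: the class name ThreadDumpTriage and the method countFrames are hypothetical, and the sample lines are shortened copies of the frames shown in the dumps above.

```java
import java.util.List;

// Hypothetical triage helper; not part of Access Manager or JMQ.
class ThreadDumpTriage {

    /** Counts the lines of a thread dump that mention the given frame name. */
    static long countFrames(List<String> dumpLines, String frameName) {
        return dumpLines.stream()
                .filter(line -> line.contains(frameName))
                .count();
    }

    public static void main(String[] args) {
        // Shortened sample lines modeled on the dumps in this section.
        List<String> dump = List.of(
            "- java.net.SocketInputStream.socketRead0 (java.io.FileDescriptor, byte, int, int, int) @bci=0",
            "- com.sun.messaging.jmq.jmsclient.ProtocolHandler.writePacketNoAck (com.sun.messaging.jmq.io.ReadWritePacket) @bci=7, line=235",
            "- java.net.SocketInputStream.socketRead0 (java.io.FileDescriptor, byte, int, int, int) @bci=0");

        System.out.println("socketRead0 frames: "
                + countFrames(dump, "SocketInputStream.socketRead0"));
        System.out.println("writePacketNoAck frames: "
                + countFrames(dump, "ProtocolHandler.writePacketNoAck"));
    }
}
```

A high count of socketRead0 frames on one instance together with writePacketNoAck frames on the other is consistent with the CPU starvation described here, but it is only a hint; other causes can produce the same frames.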
The problem occurs when using two Access Manager 7.0 SP5 servers with a load balancer in front. You may see a stack trace such as this:
at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:961)
- locked <0xf0898b88> (a sun.net.www.protocol.http.HttpURLConnection)
at com.iplanet.services.comm.client.PLLClient.send(PLLClient.java:196)
at com.iplanet.services.comm.client.PLLClient.send(PLLClient.java:115)
Solution: The problem might be a result of having the server instances separated by a firewall. If this is your environment, move the server instances behind the same firewall.
The problem could also be due to a misconfiguration in the Platform Server List. The stack trace shown above occurs when the Platform Server List is missing its associated site IDs and server instances are denoted by virtual host names. Here is an example of a misconfigured Platform Server List:
site list :
http://hostname:80|11
server list :
http://hostname1:80|01
http://hostname2:80|02
Configure your Platform Server List so that each server entry includes its associated site identifier as a third field. In the following example, 11 is the site identifier from the site list.
site list :
http://hostname:80|11
server list :
http://hostname1:80|01|11
http://hostname2:80|02|12
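The difference between the two lists above can be checked mechanically. The following is a minimal sketch that assumes each server entry has the form URL|serverID|siteID, as in the corrected example; the class PlatformServerListCheck and its method are hypothetical and are not part of Access Manager.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical checker; assumes entries of the form URL|serverID|siteID.
class PlatformServerListCheck {

    /** Returns the server entries that lack a third "|"-separated field. */
    static List<String> entriesMissingSiteId(List<String> serverEntries) {
        List<String> missing = new ArrayList<>();
        for (String entry : serverEntries) {
            // Limit -1 keeps trailing empty fields, so "url|01|" is also reported.
            String[] fields = entry.trim().split("\\|", -1);
            if (fields.length < 3 || fields[2].isEmpty()) {
                missing.add(entry);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        List<String> misconfigured = List.of(
            "http://hostname1:80|01",
            "http://hostname2:80|02");
        List<String> corrected = List.of(
            "http://hostname1:80|01|11",
            "http://hostname2:80|02|12");

        System.out.println("Misconfigured list, entries without a third field: "
                + entriesMissingSiteId(misconfigured));
        System.out.println("Corrected list, entries without a third field: "
                + entriesMissingSiteId(corrected));
    }
}
```

The check only looks at the number of fields. It does not verify that the third field matches an entry in the site list.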
Connections between the Access Manager server and Directory Server close unexpectedly because of unsynchronized access to a shared variable. This is a known problem that is fixed in LDAP JDK 4.19.
Solution: Upgrade to LDAP JDK 4.19. Download Patch 119725-04-1 from the following URL:
http://sunsolve.sun.com/search/document.do?assetkey=1-21-119725-04-1
The patch works for Solaris 9 and 10. The same patch is used for both SPARC and x86 platforms.
This problem occurs when using an LDAP v3 plug-in because the plug-in is being initialized more than once.
Solution: This known problem was fixed in both Access Manager 7.0 Patch 6 and Access Manager 7.1 Patch 1. Upgrade to one of these Access Manager versions.
A 403 error occurs when a session expiration notification arrives between the SSO validation call and the call to fetch profile attributes on the agent side. The SSO validation succeeds, but the profile attribute fetch fails with an SSOException from the server. This is the expected behavior. However, the SOAP client does not process this exception properly and reconstructs it: the client-side calls wrap the exception in an IdRepoException. As a result, the agent is not notified of the SSOException.
Solution: A fix was made in amclientsdk. This known problem was fixed in both Access Manager 7.0 Patch 6 and Access Manager 7.1 Patch 1. Upgrade to one of these Access Manager versions.
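The effect of the wrapping can be illustrated with a small stand-alone sketch. The exception classes below are stand-ins that only borrow the names SSOException and IdRepoException; they are not the real SDK types, and the methods serverFetch, clientFetch, and agentObserves are hypothetical. The sketch only shows why a caller that handles one exception type never sees it once a lower layer rethrows it as another type.

```java
// Illustration only: stand-in types, not the Access Manager SDK classes.
class WrappedExceptionDemo {

    static class SSOException extends Exception {
        SSOException(String message) { super(message); }
    }

    static class IdRepoException extends Exception {
        IdRepoException(String message, Throwable cause) { super(message, cause); }
    }

    // Simulates the server rejecting the attribute fetch for an expired session.
    static void serverFetch() throws SSOException {
        throw new SSOException("session expired");
    }

    // Simulates a client layer that rethrows the failure as a different type.
    static void clientFetch() throws SSOException, IdRepoException {
        try {
            serverFetch();
        } catch (SSOException e) {
            throw new IdRepoException("attribute fetch failed", e);
        }
    }

    // Simulates the caller: reports which exception type it actually received.
    static String agentObserves() {
        try {
            clientFetch();
            return "ok";
        } catch (SSOException e) {
            return "SSOException";
        } catch (IdRepoException e) {
            return "IdRepoException";
        }
    }

    public static void main(String[] args) {
        System.out.println("Caller received: " + agentObserves());
    }
}
```

In this sketch the SSOException handler never runs, because the failure reaches the caller as an IdRepoException. This mirrors the behavior described above, where the agent is not notified of the session failure.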
The web container that hosts Access Manager hangs while handling a large number of image files. The following errors are displayed in the Web Server errors.log:
IO error:
java.io.IOException: WEB8004: Error flushing the output stream
java.io.IOException: WEB8001: Write failed
These entries indicate that the Access Manager server was not involved in the web server hang; the root cause was an IO error.
When a web policy agent hangs, it is usually due to misconfiguration of the web container where the agent is installed, or misconfiguration of the Access Manager web container on another host system. An Access Manager server may be running out of some resources due to, for example, a runaway number of invalid sessions.
Solution: Set the Web Policy Agent debug logging level to the finest level, all:5. Examine the logs to determine the exact cause.
When multiple clients point to an Access Manager server instance, the session polling mechanism used prior to Access Manager 7.1 can cause the Access Manager server to hang. The older polling mechanism was based on caching time.
Solution: Upgrade to Access Manager 7.1. In the 7.1 version, the session polling mode is configurable. You can use either caching time or idle time mode. By default, polling is based on the idle time.
The problem occurs when both a client application that uses the Access Manager clientsdk.jar and the Access Manager server run in the same JVM instance. When a Java EE application tries to access IdRepo attributes for any identity, the Access Manager server can hang. The problem is due to an unnecessary synchronization block in the SMS classes ServiceConfigImpl, OrganizationConfigManagerImpl, and ServiceConfigManagerImpl in the Access Manager SDK. This known issue was addressed in Access Manager 7.1.
Solution: Upgrade to Access Manager 7.1.