Sun Java System Access Manager 7.1 Performance Tuning and Troubleshooting Guide

Appendix B Error Messages

The following are Access Manager error messages you may encounter in log files for Access Manager, Web Server, Directory Server, Portal Server, or the Policy Agent host.

Error Log for the J2EE Policy Agent Application Server

A thread dump from the "kill -3" command shows hundreds of waiting threads like:
"service-j2ee" daemon prio=10 tid=0x00d59180 nid=0x11e waiting for monitor entry
at com.sun.identity.jaxrpc.SOAPClient.encodeMessage
- waiting to lock <0x787e3ad0> (a com.sun.identity.jaxrpc.SOAPClient)

Description:

Error message is found in the J2EE Policy Agent web container log.

Cause:

This issue is caused by an unnecessary synchronization in the SOAP client Java class in amclientsdk for encode and send methods. During a load test of URL policy mode with J2EE policy agents, hundreds of threads can be waiting to lock on the com.sun.identity.jaxrpc.SOAPClient.send method.

Solution:

Two related bugs (CR 6302120 and CR 6517760) were fixed in Access Manager 7.0 Patch 5 and Access Manager 7.1 Patch 1.
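
To gauge how widespread the contention is in a given thread dump, the waiting threads can be counted per lock. The following is a minimal Python sketch and not part of Access Manager. The function name count_lock_waiters is hypothetical, the second thread in the sample is made up for illustration, and the sketch assumes the dump contains "waiting to lock" lines in the form shown in the error message above.

```python
import re
from collections import Counter

# Matches thread-dump lines such as:
#   - waiting to lock <0x787e3ad0> (a com.sun.identity.jaxrpc.SOAPClient)
WAITING_TO_LOCK = re.compile(r"waiting to lock <(0x[0-9a-fA-F]+)> \(a ([\w.$]+)\)")


def count_lock_waiters(dump_text):
    """Count blocked threads per (lock address, class name) in a thread dump."""
    return Counter(WAITING_TO_LOCK.findall(dump_text))


# Illustrative sample: the second thread's tid and nid are invented.
sample_dump = """\
"service-j2ee" daemon prio=10 tid=0x00d59180 nid=0x11e waiting for monitor entry
    at com.sun.identity.jaxrpc.SOAPClient.encodeMessage
    - waiting to lock <0x787e3ad0> (a com.sun.identity.jaxrpc.SOAPClient)
"service-j2ee" daemon prio=10 tid=0x00d59181 nid=0x11f waiting for monitor entry
    at com.sun.identity.jaxrpc.SOAPClient.encodeMessage
    - waiting to lock <0x787e3ad0> (a com.sun.identity.jaxrpc.SOAPClient)
"""

for (address, class_name), waiters in count_lock_waiters(sample_dump).most_common():
    print(f"{waiters} threads waiting on {class_name} <{address}>")
```

A single lock with a very high waiter count on com.sun.identity.jaxrpc.SOAPClient is consistent with the synchronization issue described in this entry.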

Web Policy Agent amAgent.log File

Error 30284:bfc093a0 all: Connection::read():
NSPR Error while reading data:-5961

Description:

The Access Manager server is busy responding to all the previous requests from a web policy agent, and cannot respond to this particular request. Then the socket timeout happens on the web policy agent side, and the user will see this error message in the web policy agent amAgent.log.

Cause:

The agent has timed out waiting for a response from the Access Manager server.

Solution:

Be sure the Access Manager server is properly tuned with values recommended in the amtune script. Also be sure that the web agent HTTP request parameters are properly tuned.

Web Policy Agent amAgent.log File

Error 19516:9088eb0 AM_SSO_SERVICE:SSOTokenService::getSessionInfo(): 
Error 18 for sso token ID             
Error 21907:9088eb0 PolicyEngine: am_policy_evaluate: 
InternalException in Service::initialize() with error message:
Naming query failed during service creation. and code:21

Description:

These errors occur during stress tests of Web Agent 2.2 for Apache Server 2.0.59 on RedHat Linux 3.0.

Cause:

These errors mean that the SSO token has expired on the server side, but the agent is still sending the expired SSO token. In normal cases, if the web policy agent sees this error, it redirects to the Access Manager login page. The Access Manager server becomes overwhelmed by all the incoming requests from the web policy agent. Other errors may occur:

Error 30284:bfc093a0 all: Connection::read():  
NSPR Error while reading data:-5961        
Error 30154:bfc093a0 all: Connection::read():        
NSPR Error while reading data:-5961       
Error 30054:bfc093a0 all: Connection::read():
NSPR Error while reading data:"

These errors occur because the web policy agent has timed out waiting for a response from the Access Manager server. During this load test the Access Manager server was so busy responding to all the previous requests, it failed to respond to this particular request. Then the socket timeout happens on the agent side and the user will see this error message.

Solution:

Be sure the Access Manager server is properly tuned with amtune script recommended values. Also be sure that the web agent HTTP request parameters are properly tuned.

Access Manager amclientSDK File or Error Log for the J2EE Policy Agent

ERROR: Send Polling Error:com.iplanet.am.util.ThreadPoolException: 
amSessionPoller thread pool's task queue is full.

Description:

These errors can occur after you deploy the Distributed Authentication UI web application, J2EE agents, or in any situation where you deploy the Access Manager client SDK on a client machine. These errors are seen in the J2EE Agent web container log or in the amclientSDK container log.

Cause:

The client SDK polling threadpool size and threshold are not sufficient for the number of incoming sessions.

Solution:

If you have many concurrent sessions, add the following properties and values in either the AMConfig.properties file or the AMAgents.properties file:

com.sun.identity.session.polling.threadpool.size=10
com.sun.identity.session.polling.threadpool.threshold=10000

Access Manager amSession.log File

ERROR: Sending Notification Error:com.iplanet.am.util.ThreadPoolException: 
amSession thread pool's task queue is full.

Description:

These errors can occur when the number of incoming Access Manager sessions is more than the notification threadpool size and threshold can handle.

Cause:

The AMConfig.properties default values for com.sun.identity.session.notification.threadpool.size and com.sun.identity.session.notification.threadpool.threshold are too low.

Solution:

Increase the value of com.sun.identity.session.notification.threadpool.size to three times the number of CPUs (or cores, in the case of Niagara boxes) on the host where the Access Manager server is installed. Increase the value of com.sun.identity.session.notification.threadpool.threshold (the queue size) to 30% of maxSessions in the AMConfig.properties file. If the same error still occurs, then an agent may not be processing incoming requests efficiently, or some other bottleneck exists on the client side or in the Access Manager web container.
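
The sizing rule above can be worked through as a small calculation. This is a rough Python sketch. The function name and the example inputs (a 4-CPU host and a maxSessions value of 50000) are assumptions chosen for illustration, not recommended values.

```python
def notification_pool_settings(cpu_count, max_sessions):
    """Apply the rule of thumb: size = 3 x CPUs (or cores), threshold = 30% of maxSessions."""
    return {
        "com.sun.identity.session.notification.threadpool.size": 3 * cpu_count,
        # Integer arithmetic avoids floating-point rounding for the 30% figure.
        "com.sun.identity.session.notification.threadpool.threshold": max_sessions * 30 // 100,
    }


# Example inputs are hypothetical: a 4-CPU host with maxSessions set to 50000.
for name, value in notification_pool_settings(4, 50000).items():
    print(f"{name}={value}")
```

For these example inputs the sketch yields a pool size of 12 and a threshold of 15000, printed in the form used in the AMConfig.properties file.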

Access Manager amSession.log File

ERROR: Individual notification to 
http://mycompany.com:7001/agentapp/notification
com.iplanet.services.comm.server.SendNotificationException: 
Server returned HTTP response code: 503 for 
URL: http://yourcompany.com:7001/agentapp/notification

Description:

These errors occur during a high load scenario when a bottleneck in the notification queue exists on the port between the Access Manager server and its policy agent machines.

Cause:

The notification attempts to the agent were not successful.

Solution:

No solution is currently available. Starting with Federated Access Manager 8.0, notification traffic will use JMQ asynchronous publish and subscribe mechanisms with different ports, which will eliminate this kind of bottleneck.

Access Manager amComm.log File

ERROR: Cannot send notification to 
http://mycompany.com:80/amagent/UpdateAgentCacheServlet?shortcircuit=false
java.io.IOException: Server returned HTTP response code: 503 for URL:
http://yourcompany.com:80/amagent/UpdateAgentCacheServlet?shortcircuit=false

Description:

These errors occur during a high load scenario when a bottleneck in the notification queue exists on the port between the Access Manager server and its policy agent machines.

Cause:

The notification attempts to the agent were not successful.

Solution:

Policy Agent amAgent.log File

Info PolicyAgent: am_web_result_attr_map_set(): 
No profile or session or response attributes to be set as headers or cookies 
Debug all: Log::pSetLevelsFromString(): 
setting log level for module 0 to 4, old level 1.

Description:

This message means that the session or response attribute is missing from the URL string. The Web Server crashes.

Cause:

The maximum URL length (query string length) that can be passed to the web policy agent is too small.

Solution:

Upgrade to Web Policy Agent 2.2-HP8.

SAML2 Debug Log

This log is stored in the following location:

/var/opt/SUNWam/fm/federation/debug/fmSAML2

ERROR: Unable to send SOAPMessage to IDP 
com.sun.xml.messaging.saaj.SOAPExceptionImpl: 
java.security.PrivilegedActionException: 
com.sun.xml.messaging.saaj.SOAPExceptionImpl: Unable to internalize message
at com.sun.xml.messaging.saaj.client.p2p.HttpSOAPConnection.call
at com.sun.identity.saml2.common.SAML2Utils.sendSOAPMessage
at com.sun.identity.saml2.profile.LogoutUtil.doSLOBySOAP

Description:

In a high availability and high load scenario, for example when more than one Access Manager or Federation Manager server is behind a load balancer, SOAP global logout fails if the request is redirected to the wrong server.

Cause:

The signature string is not forwarded when redirected to the internal server instance.

Solution:

This bug was fixed in SAML v2 Patch 3. The fix delays the signature validation until the Access Manager or Federation Manager server finds the session in the local server. This way there is little processing involved before the signature verification is done. Update to SAML v2 Patch 3.

SAML2 Debug Log

This log is stored in the following location:

/var/opt/SUNWam/fm/federation/debug/fmSAML2

ERROR: Unable to get infoKeyString from SSOToken.
ERROR: Error sending Logout Request
com.sun.identity.saml2.common.SAML2Exception: 
Error retrieving NameIdInfoKey from SSOToken.
at com.sun.identity.saml2.profile.SPSingleLogout.initiateLogoutRequest
at _jsps._saml2._jsp._spSingleLogoutInit_jsp._jspService

Description:

This error will sometimes occur during a high load scenario even with SAML v2 Patch 3. The NameIDInfoKey information is stored in session properties, but sometimes during a high load scenario, the information cannot be retrieved.

Cause:

The session properties do not get refreshed immediately.

Solution:

Refresh the session properties before reading the NameIDInfoKey.

Error Log for Web Server or Application Server

SEC_ERROR_NO_MEMORY: Out of memory

Description:

Web Server or Application Server process crashes. This crash will occur only if SSL is enabled for the Web Server or Application Server.

Cause:

A bug exists in NSPR 4.5. NSPR threads created on Linux use a 10240 KB stack regardless of the stack size specified during thread creation. 10240 KB is the default per-thread stack size on the Red Hat Linux platform.

Solution:

Upgrade the NSPR version to 4.6.

Error Log for Web Server or Application Server

Cannot create thread.
amSessionPoller [9] daemon prio=10
tid=0x0985e2e0 nid=0x37 in Object.wait () [0x10519000..0x10519a38]
at java.lang.Object.wait (Native Method)

Description:

Web Server or Application Server processes cannot create any more threads for Access Manager sessions.

Cause:

Either the JVM heap size is insufficient, or invalid Access Manager session threads are being created without limit. This behavior is expected.

Solution:

To increase the JVM heap size, change the domain.xml manually or run the Access Manager amtune-as8 script.

Error Log for Web Server or Application Server

java.io.IOException: Not enough space
at java.lang.UNIXProcess.forkAndExec(Native Method)
at java.lang.UNIXProcess.forkAndExec
at java.lang.UNIXProcess

Description:

The JVM cannot launch itself while trying to fork a process through the system.

Cause:

Either there are not enough file descriptors, or there is not enough swap space.

Solution:

Do one of the following:

Increase the number of system file descriptors, then reboot the machine. To increase the number of file descriptors, you can run the amtune-os script, or set them manually by running the command ulimit -n number_of_file_descriptors.
Free swap space by stopping unnecessary processes.
Add more swap space using the swap command.
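
Before changing the limit, it can help to confirm what a process actually inherits. The following is a minimal Python sketch, assuming a UNIX-like host where Python's standard resource module is available. The function name is hypothetical. It reports the soft and hard limits on open file descriptors for the process that runs it.

```python
import resource


def file_descriptor_limits():
    """Return the (soft, hard) limits on open file descriptors for this process."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


soft, hard = file_descriptor_limits()
# A value equal to resource.RLIM_INFINITY means that no limit is set.
print(f"soft limit: {soft}, hard limit: {hard}")
```

Run the check as the same user and from the same environment that starts the Web Server or Application Server, since limits are inherited from the launching shell.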

Error Log for Web Server or Application Server

Exception in thread "service-j2ee" java.lang.OutOfMemoryError: 
requested 53515 bytes for jbyte in 
/BUILD_AREA/jdk1.5.0_10/hotspot/src/share/vm/prims/jni.cpp. 
Out of swap space?

Description:

The native heap allocation failed and the native heap may be close to exhaustion.

Cause:

Native code, for example C or C++ code, is leaking memory: it continuously requests memory without releasing it to the operating system. There could also be indirect causes, such as an insufficient amount of swap space or another process that is consuming all memory or leaking it.

Solution:

For further diagnosis of a native code memory leak, see the section “Diagnosing Leaks in Native Code” in the Java 5.0 Troubleshooting and Diagnostic Guide at http://java.sun.com/j2se/1.5/pdf/jdk50_ts_guide.pdf. That section describes tools for different operating systems, including mdb and dbx (runtime trace) for Solaris 9 U3 or later, mtrace and libnjamd for Linux, and windbg or userdump for Windows.

Error Log for Web Server or Application Server

Exception in thread "main" java.lang.OutOfMemoryError: <reason>
<stack trace>(Native method)

Description:

A native method has encountered a memory allocation failure. The difference between this and the previous error message is that the allocation failure here is detected in a JNI or native method rather than VM code.

Cause:

Solution:

For further diagnosis of a native code memory leak, see the Java 5.0 Troubleshooting and Diagnostic Guide section “Diagnosing Leaks in Native Code.”

Error Log for Web Server or Application Server

Exception in thread "main" java.lang.OutOfMemoryError: Java heap space

Description:

An object could not be allocated in the Java heap.

Cause:

The cause may be a simple configuration issue: the maximum heap size, set by the -mx option, is not sufficient for the load. Alternatively, the application may be holding references to objects, which prevents them from being garbage collected. This is the Java equivalent of a memory leak. This error can also occur when finalize methods are used so heavily that the finalizer daemon thread cannot keep up with the finalization queue and the heap becomes full.

Solution:

Increase the JVM maximum heap size option, or make coding changes.

Error Log for Web Server or Application Server

Exception in thread "main" java.lang.OutOfMemoryError: PermGen space

Description:

This error occurs when there are a large number of class, method or String objects.

Cause:

The permanent generation is full. The permanent generation is the area of the heap where class and method objects are stored and java.lang.String objects are interned.

Solution:

The JVM option for Perm size may need to be increased.

Error Log for Web Server or Application Server

Exception in thread "main" java.lang.StackOverflowError at java.lang.String.indexOf

Description:

The JVM (Java) stack size is not sufficient.

Cause:

There can be many causes of a StackOverflowError, including a wrong server instance name in the platform list on the Access Manager console, or any one of numerous Java coding issues. The only type of StackOverflowError addressed here is the one that can occur when you use a 64-bit JVM with the -Xss128k option.

Solution:

For 64-bit JVMs, the per-thread stack size should be at least 256k (-Xss256k), or even 512k, since the default per-thread stack size of a 64-bit VM is 1 MB. 64-bit JVM support was introduced in Web Server 6.1 SP5 and continues in later releases, including Web Server 7.0. Application Server 8.1 and 8.2 do not support a 64-bit JVM, but Application Server 9.1 will. Access Manager and its amtune scripts support a 64-bit JVM starting with AM 7.0 Patch 5 and 7.1. For more information, see http://java.sun.com/docs/hotspot/threads/threads.html.
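
A quick check of a JVM option list against this minimum can be sketched as follows. This is an illustrative Python sketch. The function names and the sample option lists are hypothetical, and the 256k floor is the guideline stated above for 64-bit JVMs. The sketch keeps the last -Xss occurrence, on the assumption that a later setting overrides an earlier one.

```python
import re

MIN_STACK_BYTES_64BIT = 256 * 1024  # 256k guideline for 64-bit JVMs

_UNITS = {"": 1, "k": 1024, "m": 1024 ** 2, "g": 1024 ** 3}


def parse_xss(jvm_options):
    """Return the -Xss value in bytes, or None if the option is not set."""
    size = None
    for option in jvm_options:
        match = re.fullmatch(r"-Xss(\d+)([kKmMgG]?)", option)
        if match:
            # Assumption: the last -Xss on the command line takes effect.
            size = int(match.group(1)) * _UNITS[match.group(2).lower()]
    return size


def stack_size_too_small(jvm_options):
    """True if -Xss is set explicitly and is below the 256k guideline."""
    size = parse_xss(jvm_options)
    return size is not None and size < MIN_STACK_BYTES_64BIT


print(stack_size_too_small(["-server", "-Xss128k"]))  # True
print(stack_size_too_small(["-server", "-Xss256k"]))  # False
```

When -Xss is absent, the sketch reports no problem, because the JVM then uses its own default stack size.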
