
Sun ONE Application Server 7, Enterprise Edition Troubleshooting Guide

Chapter 4
Runtime Problems

This chapter addresses problems that you may encounter while running the Sun™ Open Net Environment (ONE) Application Server 7 product.

The following sections are contained in this chapter:


Runtime Logs

Refer to the logs and information in "Server Logs" for information on using the logs to troubleshoot runtime problems.


Load Balancer / Web Server won’t start

This problem occurs when two instances have the same listener value—for example, if instance foo has listener value bar:80 and instance spam has listener value bar:80.

The error messages that result look like this:

[04/Sep/2003:13:01:08] warning ( 2938): reports: lb.runtime: RNTM2029: DaemonMonitor :http://hostname:81 : could be because of connection saturation

[04/Sep/2003:13:01:08] failure ( 2938): ServerInstance.cpp@265: reports: lb.runtime:RNTM3002 : Failed to add listener multiple times: <instance name>

[04/Sep/2003:13:01:08] failure ( 2938): FailoverGroup.cpp@102: reports:lb.failovermanager: FGRP1002: Instance <instance name> could not be added to theFailoverGroup: cluster1

[04/Sep/2003:13:01:08] failure ( 2938): LBConfigurator.cpp@209: reports:lb.cofigurator: CNFG1007 :ServerInstance <instance name> could not be added onFailoverGroup cluster1

[04/Sep/2003:13:01:08] failure ( 2938): lbplugin.cpp@168: reports: lb.runtime:RNTM3004 : Failed to initialise load balancing subsystem

Solution

Make sure that the listener values for each instance are unique.
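For example, a corrected configuration for the two instances described above might look like this in loadbalancer.xml (a sketch only; the instance names foo and spam and the host bar come from the example above, and the attribute names follow the sun-loadbalancer_1_0.dtd used by the plug-in):

```xml
<cluster name="cluster1">
  <!-- Each instance now has a unique listener value. -->
  <instance name="foo"  enabled="true" listeners="http://bar:80"/>
  <instance name="spam" enabled="true" listeners="http://bar:81"/>
</cluster>
```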


Instance goes unused after restarting

An instance was down, and is now back up, but the access log shows that it is not getting any requests.

This situation occurs when the Load Balancer has not been configured to regularly check the health of instances. When the instance was down, the Load Balancer marked it as unhealthy, and recorded that fact in the load balancer log. But now that the instance is up and running again, the Load Balancer doesn’t know that its health has been restored.

Solution

Configure the Load Balancer to check the health of instances regularly by adding the health checker URL to loadbalancer.xml with a line like this:

<health-checker url="/pathToHealthChecker"
  interval-in-seconds="10" timeout-in-seconds="30" />


Can’t access a web page.

A typical error that displays on the browser is the following:

404 Not Found
The requested URL destination_URL was not found on this server.

This means that the web page you are attempting to access is not available at the location you have specified. The most likely causes are a change in the location, or an error in the URL you have specified.

Solution
  1. Try again.
  2. If you still receive this error, verify that you have entered the location correctly.
  3. If you have entered the URL correctly, verify that the location has not changed or been deleted. You will have to contact the page owner to verify this.


Can’t access an application.

There are a number of possible reasons that you cannot access an application. A typical error message in this case is

Consider the following:

Is the application deployed?

The most likely cause is that the application is not deployed. When an application is deployed to a cluster, an entry for it appears in the web server plug-in’s loadbalancer.xml file. If an application has been successfully deployed to a cluster, the following snippet shows how the loadbalancer.xml file should look:
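A successfully deployed application typically appears as a web-module entry alongside the instance entries (a sketch; the instance name, host, and context root /myapp are hypothetical, and element names follow the sun-loadbalancer_1_0.dtd):

```xml
<cluster name="cluster1">
  <instance name="instance1" enabled="true" listeners="http://host1:80"/>
  <!-- This entry appears when the application has been deployed to the cluster. -->
  <web-module context-root="/myapp" enabled="true"/>
  <health-checker url="/" interval-in-seconds="10" timeout-in-seconds="30"/>
</cluster>
```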

The cladmin command is used to deploy an application to all instances in your cluster. Refer to the Sun ONE Application Server Administrator’s Guide for deployment instructions. Otherwise, refer to the Sun ONE Application Server Developer’s Guide for non-cluster deployment guidelines.

Solution

If needed, redeploy the application.

Is your loadbalancer.xml file correct?

Check the web server log files to verify that the load balancer started. If it hasn’t, there may be errors about the loadbalancer.xml file written to the error log.

Consider the following:

Is the web server running?

Verify that the web server has started.

Has the correct port been specified for the web server?

Determine the correct web server port number and verify that the correct port has been specified. Refer to "Is the Admin Server running at the expected port?" for guidelines on determining the port number.


Server responds slowly after being idle

If the server takes a while to service a request after a long period of idleness, consider the following:

Does the log contain “Lost connection” messages?


If the server log shows error messages of the form,

java.io.IOException:..HA Store: Lost connection to the server..

then the server must re-create the JDBC connection pool for HADB.

Solution: Change the timeout value

The default HADB connection timeout value is 1800 seconds. If the application server does not send any request over a JDBC connection during this period, HADB closes the connection, and the application server needs to re-establish it. To change the timeout value, use the hadbm set SessionTimeout= command.
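For example, to raise the timeout to 2400 seconds (the value here is illustrative, not a recommendation; choose a value larger than your JDBC connection pool's idle timeout, per the note below):

```
hadbm set SessionTimeout=2400
```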

Important Note:
Make sure the HADB connection timeout is greater than the JDBC connection pool timeout. If the JDBC connection pool timeout is larger, the connection is closed on the HADB side but remains in the application server's connection pool. When the application next tries to use that connection, the application server must re-create it, which incurs significant overhead.


My application suddenly went away.

Consider the following:

Is the application you are using being quiesced by the load balancer?

When an application is being quiesced, you may experience a loss of service from the time the application is disabled until it is re-enabled.


Requests are not succeeding.

The following problems are addressed in this section:

Is the load balancer timeout correct?

When configuring the response-timeout-in-seconds property in the loadbalancer.xml file, you must take into account the maximum timeouts for all the applications that are running. If the response timeout is set to a very low value, numerous in-flight requests will fail because the load balancer will not wait long enough for the Application Server to respond to the request.

On the other hand, setting the response timeout to an inordinately large value will result in requests being queued to an instance that has stopped responding, resulting in numerous failed requests.

Solution

Set the response-timeout-in-seconds value to the maximum response time of all the applications.

Are the system clocks synchronized?

When a session is stored in HADB, it includes some time information, including the last time the session was accessed and the last time it was modified. If the clocks are not synchronized, then when an instance fails and another instance takes over (on another machine), that instance may think the session was expired when it was not, or worse yet, that the session was last accessed in the future!

Solution

Verify that clocks are synchronized for all systems in the cluster.

Have you enabled the instances of the cluster?

Even if you start an application server instance and define it to be a part of the cluster, the instance will not receive requests from the load balancer until you enable the instance. Enabling makes the instance an active part of the cluster. The correct sequence of events for activating and deactivating an instance is:

  1. Start the Application Server.
  2. Create an Application Server instance.
  3. Enable the instance.
  4. Disable the instance.
  5. Stop the instance.
  6. Start the instance (only if it has been stopped).
  7. Enable the instance.

Is the AppServer communicating with HADB?

HADB may be created and running, but if the persistence store has not yet been created, the Application Server won’t be able to communicate with the HADB. This situation is accompanied by the following message:

WARNING (7715): ConnectionUtilgetConnectionsFromPool failed using connection URL: <connection URL>

Solution

Create the session store in the HADB with a command like the following:

asadmin create-session-store --storeurl <connection URL>
  --storeuser haadmin --storepassword hapasswd
  --dbsystempassword super123


Server log: app persistence-type = memory

The server.log shows that the J2EE application is using memory persistence instead of High Availability, with a message like this:

Enabling no persistence for web module [Application.war]'s sessions: persistence-type = [memory]

This situation occurs when the application server has not been configured to use HA.

Solution

Enable the availability service with a command like this:

asadmin set --user admin --password netscape --host localhost
--port 4848 serverName.availability-service.availabilityEnabled=true


Dynamic reconfiguration failed.

The load balancer plug-in detects changes to its configuration by examining the time stamp of the loadbalancer.xml file. If a change has been made to the loadbalancer.xml file, the load balancer automatically reconfigures itself. The load balancer ensures that the modified configuration data is compliant with the DTD before overwriting the existing configuration.

If changes to the loadbalancer.xml file are not in the correct format, as specified by the sun-loadbalancer_1_0.dtd file, the reconfiguration fails and a failure notice is printed in the web server's error log files. The load balancer continues to use the old configuration in memory.


Note

If the load balancer encounters a hard disk read error while attempting to reconfigure itself, it uses the configuration that is currently in memory, and a warning message is logged to the web server's error log file.


Solution

Edit the loadbalancer.xml file as needed until it follows the correct format as specified in the DTD file.


Session Persistence Problems

The following problems are addressed in this section:

The create-session-store command failed.

Consider the following:

Are the HADB and the application server instance on different sides of a firewall?

The asadmin create-session-store command cannot run across firewalls. Therefore, for the create-session-store command to work, the application server instance and the HADB must be on the same side of a firewall.

The create-session-store command communicates with the HADB and not with the application server instance.

Solution

Locate the HADB and the application server instance on the same side of a firewall.

Configuring instance-level session persistence didn’t work.

The application-level session persistence configuration always takes precedence over instance-level session persistence configuration. Even if you change the instance-level session persistence configuration after an application has been deployed, the settings for the application still override the settings for the application server instance.

Session data seems to be corrupted.

Session data may be corrupted if the system log reports errors under the following circumstances:

If you determine that the data has been corrupted, there are three possible solutions.

Solution

To bring the session store back to a consistent state, do the following:

  1. Use the asadmin clear-session-store command to clear the session store.
  2. If clearing the session store doesn’t work, reinitialize the data space on all the nodes and clear the data in the HADB using the hadbm clear command.
  3. If clearing the HADB doesn’t work, delete and then recreate the database.


HTTP session failover is not working.

The Sun ONE Application Server 7, Enterprise Edition includes the high-availability database (HADB) for storing session data. The HADB is not a general-purpose database but instead is an HttpSession store.

If HTTP session failover is not working correctly, consider the following:

Are the system clocks synchronized?

For HTTP session failover to work, the clocks of all the computers on which the application server instances in a cluster reside must be synchronized. (For more detail, see "Are the system clocks synchronized?".)

Solution

Verify that clocks are synchronized for all systems in the cluster.

Do all objects bound to a distributed session implement the java.io.Serializable interface?

If an object does not implement the java.io.Serializable interface, it will not be persisted. No errors or warnings are produced, because the lack of persistence may well be the desired behavior. The remaining session objects are successfully persisted and will fail over.


Note

If an object declares java.io.Serializable but cannot actually be serialized (for example, because it holds a reference to a non-serializable object), an exception is written to the log. In this case, the entire user session is not persisted, and a failover will produce an empty session.


Solution

Make sure that every class in the session that is supposed to persist implements java.io.Serializable, as in this example:

public class MyClass implements java.io.Serializable
{
  ..
}

Is session information stored in HttpSession?

Sun ONE Application Server 7, Enterprise Edition does not support session failover if session data is stored in a stateful session bean.

Is your web application distributable?

For a web application to be highly available, it should be distributable. An application is non-distributable if the web-app element of the web.xml file does not contain a distributable subelement.

For additional information, refer to the Sun ONE Application Server Developer’s Guide to Web Applications.

Solution

Verify that the web-app element of the web.xml file contains a distributable subelement.

Is the persistence type set to ha?

The persistence type must be set to ha for session failover to work. When you run the clsetup command, the persistence type is set to ha by default. If you do not use the clsetup command to set up your initial cluster, the persistence type defaults to memory. The memory type offers no session persistence upon failover, while the failover capabilities offered by the file persistence type are intended for use only in development systems where session failover capabilities are not strictly required.

Instructions for setting the persistence type are contained in the Session Persistence chapter of the Sun ONE Application Server Administrator’s Guide.

Solution

Verify that the persistence type is set to ha. If it isn’t, modify either the entire instance or your particular application to use the ha persistence type. For details, see the Session Persistence chapter of the Administration Guide.

An object is cloned instead of shared

When using the modified-attribute persistence scope, if a session fails over or is activated after being passivated, an object that is shared between two attributes comes back as two separate copies of the object instead of as a single shared object.

This occurs because one object cannot be referred to by two separate attributes under the modified-attribute scope. The application server serializes and persists each attribute separately, so the shared object gets serialized twice, once for each attribute. When the attributes are deserialized, they contain two separate objects.
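The cloning effect can be reproduced outside the server with plain Java serialization. This standalone sketch (the class and method names are hypothetical, not part of the product) round-trips one shared object twice, once per "attribute", the way separate per-attribute persistence would:

```java
import java.io.*;

// A session attribute type; it must be Serializable to be persisted.
class SharedCounter implements Serializable {
    int value = 42;
}

public class ModifiedAttributeDemo {
    // Round-trip an object through serialization, as the server does
    // for each modified attribute independently.
    static Object roundTrip(Object o) throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (ObjectOutputStream oos = new ObjectOutputStream(bos)) {
            oos.writeObject(o);
        }
        try (ObjectInputStream ois = new ObjectInputStream(
                new ByteArrayInputStream(bos.toByteArray()))) {
            return ois.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        SharedCounter shared = new SharedCounter();
        // Two session attributes reference the same object before persistence...
        Object attrA = roundTrip(shared);
        Object attrB = roundTrip(shared);
        // ...but after independent serialization they come back as two copies.
        System.out.println(attrA == attrB);                // prints false
        System.out.println(((SharedCounter) attrA).value); // prints 42
    }
}
```

If the two attributes had been serialized together in a single object graph, Java serialization would have preserved the sharing; it is the per-attribute round-trip that splits the object.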

Has a session store been created?

HTTP session failover will not work until a session store has been created using the asadmin create-session-store command.

Instructions for creating a session store are contained in the Session Persistence chapter of the Sun ONE Application Server Administrator’s Guide

Are all the machines in the cluster homogenous?

All Application Server instances in a cluster must have the same applications deployed to them. For these applications to take part in failover, they must have a consistent session persistence configuration and point to the same session store.

Any new instance that you add to a cluster must have the same version and same patch level as all existing instances in a cluster.

Has high availability been enabled?

HTTP session failover will not work until high availability has been enabled using the availability-enabled attribute.

Solution

Set the availability-enabled attribute using the asadmin command.


Out of Memory and Stack Overflow errors.

When using the Sun ONE Application Server to deploy web applications, out-of-memory and stack-overflow errors occur even though the volume of data is not large. For example, this has been observed on a system with:

Memory: 2048M
CPU: 1 x 900MHz UltraSPARC III

Solution 1

The Java VM runs out of memory because the web applications are demanding in terms of creating Java objects.

This can usually be solved by setting the VM Xms/Xmx and Xss parameters. If you have default values similar to these:

-Xmx256m - for initial app. server install
-Xss512k - default for 1.4.0 server VM

try something like:

-Xmx1024m -Xss1024k

Solution 2

The Java VM runs out of memory because of poor web application design.

For instance, the application creates many Java objects and keeps cross-references to them throughout the application's lifecycle, preventing the garbage collector from doing its job. Each additional user then consumes more and more heap space until the VM runs out.

There is nothing you can do here except change the application. You are most likely dealing with an application memory leak, which can be traced using a profiling tool such as OptimizeIt.

Solution 3

The other thing you can try is to configure your Application Server with JDK 1.4.2 instead of the default 1.4.0_02; JDK 1.4.2 has better memory management and more aggressive garbage collection.


HADB Transaction Failures

Transaction failures are generally caused by a shortage of system resources. The causes of the failures will generally be found in the application server log.

In your efforts to isolate the problem, consider the following:

Is there a shortage of HADB data devices space?

One possible reason for transaction failure is running out of data device space. If this situation occurs, HADB writes warnings to the history file and aborts the transaction that tried to insert or update data.

To find the messages, follow the instructions in "Examine the history files". Typical messages are:

HIGH LOAD: about to run out of device space, ...

HIGH LOAD: about to run out of device space on mirror node, ...

The general rule of thumb is that the data devices must have room for at least four times the volume of the user data. Please refer to the Tuning Guide for an explanation.

Solution 1

Increase the size of the data devices using the following command:

hadbm set TotalDataDevicePerNode=size

(See the Administrator's Guide)

This solution requires that there is space available on the physical disk which is used for the HADB data device.

HADBM will automatically re-start each node of the database.

Once all nodes have re-started, the database will start accepting insert and update requests.

Solution 2

Stop and clear the HADB, and create a new instance with more nodes, larger data devices, or several data devices per node. See the Administrator's Guide.

Unfortunately, using this solution will erase all persistent data.

Is there a shortage of other HADB resources?

When an HADB node is started, it will allocate:

If an HADB node runs out of resources it will delay and/or abort transactions. Resource usage information is shipped between mirror nodes, so that a node can delay or abort an operation which is likely to fail on its mirror node.

Transactions that are delayed repeatedly may time out, and return an error message to the client. If they do not time out, the situation will be visible to the client only as decreased performance in the period where the system is short on resources.

These problems frequently occur in “High Load” situations. For details, see "Frequent “High Load” Warnings".

Have history files grown too large?

As history files grow, they consume more and more disk space. Using the hadbm clearhistory command recovers that space, with the option of saving those files to a specified directory. For more information, consult the Administrator's guide.
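For example, a command of roughly this shape clears the history while archiving it (the --saveto option name and the archive path are assumptions; check the hadbm documentation for the exact syntax in your release):

```
hadbm clearhistory --saveto=/export/hadb-history-archive
```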


Other Transaction Problems

Can’t set TRANSACTION_SERIALIZABLE level for a bean.

Since the Release Notes say that the check-modified-at-commit flag is not implemented for this release, is there an equivalent of the WebLogic <transaction-isolation> element in Sun ONE Application Server?

Solution

First identify which resource is being used in that bean, then add the following attribute to the jdbc-connection-pool in the server.xml file:

transaction-isolation-level="serializable"

This is an optional element and does not exist in the server.xml file by default. You must explicitly add it.

You can make this change either through the Administration interface, or by using the command-line interface to modify the server.xml file and then running the asadmin reconfig command.
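In server.xml, the attribute goes on the jdbc-connection-pool element itself. A sketch (the pool name, datasource class, and URL property are illustrative; use the values for the resource your bean actually references):

```xml
<jdbc-connection-pool name="OraclePool"
    datasource-classname="oracle.jdbc.pool.OracleDataSource"
    transaction-isolation-level="serializable"
    is-isolation-level-guaranteed="true">
  <!-- Connection URL for the illustrative Oracle datasource. -->
  <property name="URL" value="jdbc:oracle:thin:@dbhost:1521:ORCL"/>
</jdbc-connection-pool>
```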


Sporadic failures during high loads.

High load scenarios are recognizable by the following symptoms:

High load scenarios are commonly caused by one of the following problems:

Note:
Frequently, these problems can be solved by making more CPU horsepower available, as described in "Solution 2: Improve CPU Utilization". (For even more information, consult the Tuning Guide.)

See also: "Frequent “High Load” Warnings".

No space left in Tuple Log

The user operations (delete, insert, update) are logged in the tuple log and then executed. The tuple log may fill up because log records are produced faster than they can be processed (indicated by HIGH LOAD messages in the history files), which happens as a result of:

Solution 1

Check CPU usage, as described in "Solution 2: Improve CPU Utilization".

Solution 2

If CPU utilization is not a problem and there is sufficient memory, increase the LogBufferSize using the hadbm set LogBufferSize= command.

Solution 3

Look for evidence of network contention, and resolve bottlenecks.

Node-internal log gets full

Too many node-internal operations are scheduled but not processed due to CPU or disk I/O problems.

Solution 1

Check CPU usage, as described in "Solution 2: Improve CPU Utilization".

Solution 2

If CPU utilization is not a problem and there is sufficient memory, increase the InternalLogBufferSize using the hadbm set InternalLogBufferSize= command.

Not enough locks

Some extra symptoms that identify this condition are:

Solution 1: Increase the number of locks

Use hadbm set NumberOfLocks= to increase the number of locks.

Solution 2: Improve CPU Utilization

Available CPU cycles and I/O capacity can impose severe restrictions on performance. Resolving and preventing such issues is necessary to optimize system performance (in addition to configuring the HADB optimally.)

If there are unused CPUs on the host, add new nodes to the same host. Otherwise, add new machines and add new nodes on them.

If the machine has enough memory, increase the DataBufferPoolSize, and increase any other internal buffers that are putting warnings into the log files, as described in "Sporadic failures during high loads.". Otherwise, add new machines and add new nodes on them.

For much more information on this subject, consult the Performance and Tuning Guide.


Frequent “High Load” Warnings

In some cases HADB will write informational or warning “HIGH LOAD” messages to the history file. If such messages are frequently encountered in the history file, the database administrator should take certain steps to improve the availability of resources. In most situations, reducing load or improving host performance will increase the availability of resources. This may be accomplished by:

See the Administrator’s Guide for information on how to perform these tasks.

The following resources can be adjusted to improve “HIGH LOAD” problems, as described in the Tuning Guide:

Size of Database Buffer:           hadbm attribute DataBufferPoolSize

Size of Tuple Log Buffer:          hadbm attribute LogBufferSize

Size of Node Internal Log Buffer:  hadbm attribute InternalLogBufferSize

Number of Database Locks:          hadbm attribute NumberOfLocks


Client cannot connect to HADB.

This problem is accompanied by a message in the history file:

HADB-E-11626: Error in IPC operations, iostat = 28: No space left on device

The most likely explanation is that a semget() call failed (see the UNIX man pages). If HADB started successfully and you get this message at runtime, it means that the host computer has too few semaphore “undo structures”. See the “Preparing for HADB Setup” chapter in the Installation Guide for how to configure semmnu in /etc/system.

Solution 1

Stop the affected HADB node, reconfigure and reboot the affected host, re-start the HADB node. HADB will be available during the process.

Solution 2

Stop the HADB database, reconfigure and reboot the affected host, re-start the HADB database. HADB will be unavailable during the process.


Connection Queue Problems

The following problems may occur with connection queues:

Full connection queue closes socket

This problem typically occurs in “high load” scenarios. The server.log file shows the following error:

SEVERE (17357): Connection queue full, closing socket

Solution: Increase configuration values

Increase the values of ConnQueueSize and MaxKeepAliveConnections in the magnus.conf file under the config directory of your Sun ONE Web Server, for example:
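Directives in magnus.conf take the simple form of a name followed by a value. The numbers below are illustrative starting points, not tuned recommendations; size them for your load:

```
ConnQueueSize 8192
MaxKeepAliveConnections 512
```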


Connection Pool Problems

The following problems may occur in relation to connection pools:

Single sign-on requires larger connection pool.

When single sign-on (or session persistence) requires connections and the wait time is exceeded, the following error occurs:

Unable to get connection - Wait-Time has expired

The Sun ONE Application Server uses the same connection pool for both HADB session persistence and single sign-on. Single sign-on is enabled by default. If an application requires single sign-on functionality, the connection pool setting must be doubled.


Tip

If your application does not require single sign-on functionality, disabling it can improve performance. To disable single sign-on, change the following settings in the server.xml file:

<virtual-server id="server1" ... >
  ...
  <property name="sso-enabled" value="false"/>
  ...
</virtual-server>


Solution

Double the size of the connection pool.

For example, if an application indicates that 16 is the optimal number of connections to a single HADB node, the number of connections should be doubled to 32 if single sign-on functionality is required. In this case, the JDBC connection pool settings look like this:

<jdbc-connection-pool name="CluJDBC"
  steady-pool-size="32" max-pool-size="32"
  max-wait-time-in-millis="60000" pool-resize-quantity="2"
  idle-timeout-in-seconds="10000"
  is-isolation-level-guaranteed="true"
  is-connection-validation-required="true"
  connection-validation-method="auto-commit"
  fail-all-connections="false"
  datasource-classname="com.sun.hadb.jdbc.ds.HadbDataSource"
  transaction-isolation-level="repeatable-read">

You should also double the loadbalancer.xml file setting for response-timeout-in-seconds, from 60 seconds to 120 seconds:

<property name="response-timeout-in-seconds" value="120"/>

This value must be equal to or greater than the following:

Server: Unable to obtain connection from pool

The application server is having trouble connecting with HADB, as evidenced by a message like the following in the server.log file:

ConnectionUtilgetConnectionsFromPool failed using connection URL: null Unable to obtain connection from pool

Solution

Make sure HADB is running. Make sure that the session store, the JDBC connection pool, and the JNDI name (jdbc/hastore) have been created. Configure session persistence for High Availability with a command like the following:

asadmin configure-session-persistence --user admin
  --password netscape --host localhost --port 4848
  --type ha --frequency web-method --scope session
  --store jdbc/hastore server1

JDBC connection is not available.

Consider the following:

Is the max-pool-size setting adequate?

The server.xml file defines the following default values:

During server start/restart, the ejb-container steady pool for deployed enterprise beans is created. Since the default steady-pool-size is 32, 32 bean instances are created for each deployed bean unless a different value is specified in the sun-ejb-jar.xml file.

The setEntityContext method will be called for each of the beans created. If more than one bean is grabbing JDBC connections in the setEntityContext method from the same JDBC connection pool, the following happens:

If the newly-created beans attempt to grab a JDBC connection from the same pool in their setEntityContext method, an exception is thrown with the following message:

Solution

Increase the max-pool-size of the default jdbc-connection-pool to a higher value, such as 256.
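A sketch of the change in server.xml (the pool name and datasource class shown here are illustrative; edit the jdbc-connection-pool element your beans actually use):

```xml
<jdbc-connection-pool name="default-pool"
    datasource-classname="com.pointbase.jdbc.jdbcDataSource"
    steady-pool-size="32"
    max-pool-size="256"/>
```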

Exception occurs while calling DataSource.getConnection().

This exception occurs when an invalid DataSource class property is registered within the JDBC connection pool. Misspelling is a common cause. For instance, while creating a jdbc-connection-pool for Oracle, one might specify the following:

<property name="OracleURL" value="jdbc:oracle:...."/>

This will result in the following exception since OracleURL is not a property of any Oracle datasource:

Solution

Verify that all jdbc-connection-pool properties in use are valid.

Verify that the datasource classname specified is for the required vendor datasource class.

Exception occurs while executing against MSSQL.

The following exception occurs while executing a statement against the MSSQL server using a non-XA driver:

java.sql.SQLException: [DataDirect] [SQLServer JDBC Driver] Can't start a cloned connection while in manual transaction mode

This happens when the selectMethod property is not set to cursor.

Solution

Ensure the selectMethod property is set correctly during JDBC connection pool registration:

Using the command-line interface, issue the following command:
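One possible form of the command is shown below. This is a sketch: the flag names, the DataDirect datasource class, and the pool name are assumptions, so verify them against the asadmin create-jdbc-connection-pool help for your release:

```
asadmin create-jdbc-connection-pool --user admin --port 4848
  --datasourceclassname com.ddtek.jdbcx.sqlserver.SQLServerDataSource
  --property selectMethod=cursor mssql-pool
```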

The options can also be set using the graphical Administration interface.

IOException: Connection in invalid state

The error log shows the following message:

WEB2001: Web application service failed
java.io.IOException: Error from HA Store:
  Connection is in invalid state for LOB operations.
  <stack trace>

This error occurs when the transaction-isolation-level entry of your HADB JDBC connection pool is set to read-committed or read-uncommitted.

Solution

Change the transaction-isolation-level value of your HADB JDBC Connection pool to repeatable-read and re-start the application server.





Copyright 2003 Sun Microsystems, Inc. All rights reserved.