Sun Java System Application Server 7 2004Q2 Update 1 Standard and Enterprise Edition Troubleshooting Guide 

Chapter 4
Runtime Problems

This chapter addresses problems that you may encounter while running the Application Server.

The following sections are contained in this chapter:

  Runtime Logs
  Can't stop a remote server instance
  Instance goes unused after restarting
  Can't access a web page
  Can't access an application
  Server responds slowly after being idle
  Application suddenly goes away
  Requests are not succeeding
  Server log: app persistence-type = memory
  Dynamic reconfiguration failed
  Session Persistence Problems
  HTTP session failover is not working
  Out of Memory and Stack Overflow errors
  Performance Problems
  Low Memory in MQ Broker
  HADB Performance Problems
  High Load Problems
  Client cannot connect to HADB
  Connection Queue Problems
  Connection Pool Problems
  Common Runtime Procedures


Runtime Logs

Refer to Server Logs for information on using the logs to troubleshoot runtime problems.
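A quick, hedged way to follow the instance log while reproducing a problem is shown below. The path assumes a default domain and instance layout, so adjust it to your installation.

tail -f /var/opt/SUNWappserver7/domains/domain1/server1/logs/server.log

grep -n "SEVERE\|WARNING" /var/opt/SUNWappserver7/domains/domain1/server1/logs/server.log | tail -20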


Can’t stop a remote server instance

Using root to start an admin server instance for a remote non-root user can lead to difficulties when attempting to stop the instance.

Explanation

When an admin server instance starts up, three Unix processes implement that instance: two watchdog processes and the admin instance process.

When you use root to start an admin instance the watchdog processes run as root, but the admin instance process performs a setuid to run as the instance owner. If the instance owner now attempts to stop the instance, the attempt will fail because the instance owner cannot stop the watchdog processes owned by root.

Then, even if root attempts to stop the instance remotely, the instance only appears to stop; the processes continue to run. For example:

asadmin stop-domain --user admin --password adminadmin --port 4848
DomainStoppedRemotely

pgrep 'appserv*'
19049 19048 19050
(3 processes)

Even though the asadmin client runs as root, the command is sent to the remote admin server instance, which is running as the non-root user, so the root watchdog processes can't be stopped. (The "DomainStoppedRemotely" message really means "Domain Requested to Stop"—it doesn't guarantee success.)

Solution

To actually stop the admin server instance, a local root user must initiate the stop request:

asadmin stop-instance --domain domain1 admin-server
Instance admin-server stopped

pgrep 'appserv*'
(no processes)

To avoid such problems in the future, it is best not to use root to start instances on behalf of other users. Instead, let those users start the instances directly.
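As a hedged illustration (assuming the local start-instance form mirrors the stop-instance form shown above), the instance owner can manage the admin server directly:

asadmin start-instance --domain domain1 admin-server
asadmin stop-instance --domain domain1 admin-server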


Instance goes unused after restarting

An instance was down and is now back up, but the access log shows that it is not getting any requests.

This situation occurs when the Load Balancer has not been configured to regularly check the health of instances. When the instance was down, the Load Balancer marked it as unhealthy, and recorded that fact in the load balancer log. But now that the instance is up and running again, the Load Balancer doesn’t know that its health has been restored.

Solution

Configure the Load Balancer to check the health of instances regularly by adding the health checker URL to loadbalancer.xml with a line like this:

<health-checker url="/pathToHealthChecker"
  interval-in-seconds="10" timeout-in-seconds="30" />

Note:
Consult the Performance and Tuning Guide for recommendations on setting interval-in-seconds. Checking too frequently degrades performance, but checking too seldom (or not at all) creates periods in which the instance goes unused.


Can’t access a web page

Browsers report the following error when accessing an application:

404 Not Found
The requested URL destination_URL was not found on this server.

This means that the web page the user attempted to access is not available at the specified location. The error frequently occurs because an incorrect URL was specified, or it may be transient, but it could also indicate a problem with the server.

Solution

Check that the application is deployed and enabled. See Can’t access an application, next.


Can’t access an application

There are a number of possible reasons why you might not be able to access an application. A typical error message in this case is the 404 Not Found error shown in the previous section.

Consider the following:

Is the application deployed?

The most likely cause is that the application is not deployed. The cladmin command is used to deploy an application to all instances in your cluster. Refer to the Sun Java System Application Server Administrator’s Guide for deployment instructions. Otherwise, refer to the Sun Java System Application Server Developer’s Guide for non-cluster deployment guidelines.

Solution

If needed, redeploy the application.

Is your loadbalancer.xml file correct?

Check the web server log files to verify that the load balancer started. If it hasn’t, there may be errors about the loadbalancer.xml file written to the error log.

Consider the following:

Is the web server running?

Verify that the web server has started.

Has the correct port been specified for the web server?

Determine the correct web server port number and verify that the correct port has been specified. Refer to Is the Admin Server running at the expected port? for guidelines on determining the port number.


Server responds slowly after being idle

If the server takes a while to service a request after a long period of idleness, consider the following:

Does the log contain “Lost connection” messages?

If the server log shows error messages of the form,

java.io.IOException:..HA Store: Lost connection to the server..

then the server has to re-create the JDBC pool for HADB.

Solution: Change the timeout value

The default HADB connection timeout value is 1800 seconds. If the application server does not send any request over a JDBC connection during this period, HADB closes the connection, and the application server needs to re-establish it. To change the timeout value, use the hadbm set SessionTimeout= command.

Important Note:
Make sure the HADB connection timeout is greater than the JDBC connection pool timeout. If the JDBC connection timeout is longer than the HADB connection timeout, HADB closes the connection from its side, but the connection remains in the application server's connection pool. When the application then tries to use that connection, the application server has to re-create it, which incurs significant overhead.
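For example, a minimal sketch (the value is illustrative, not a recommendation); keep the HADB SessionTimeout above the idle-timeout-in-seconds configured for the HADB JDBC connection pool in server.xml:

hadbm set SessionTimeout=2400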


Application suddenly goes away

Consider the following:

Is the application you are using being quiesced by the load balancer?

When an application is being quiesced, you may experience a loss of service from the time the application is disabled until it is re-enabled.


Requests are not succeeding

The following problems are addressed in this section:

Is the load balancer timeout correct?

When configuring the response-timeout-in-seconds property in the loadbalancer.xml file, you must take into account the maximum timeouts for all the applications that are running. If the response timeout is set to a very low value, numerous in-flight requests will fail because the load balancer will not wait long enough for the Application Server to respond to the request.

On the other hand, setting the response timeout to an inordinately large value will result in requests being queued to an instance that has stopped responding, resulting in numerous failed requests.

Solution

Set the response-timeout-in-seconds value to the maximum response time of all the applications.

Are the system clocks synchronized?

When a session is stored in HADB, it includes some time information, including the last time the session was accessed and the last time it was modified. If the clocks are not synchronized, then when an instance fails and another instance (on another machine) takes over, that instance may think the session has expired when it has not, or worse yet, that the session was last accessed in the future!


Note

In a non-colocated configuration, it is important to synchronize the clocks on the machines that host HADB nodes. For more information, see the Installation Guide chapter, “Preparing for HADB Setup”.


Solution

Verify that clocks are synchronized for all systems in the cluster.

Have you enabled the instances of the cluster?

Even if you start an application server instance and define it to be part of the cluster, the instance will not receive requests from the load balancer until you enable it. Enabling makes the instance an active part of the cluster. The correct sequence of events for activating and deactivating an instance is:

  1. Start the Application Server.
  2. Create an Application Server instance.
  3. Enable the instance.
  4. Disable the instance.
  5. Stop the instance.
  6. Start the instance (only if it has been stopped).
  7. Enable the instance.

Is the AppServer communicating with HADB?

HADB may be created and running, but if the persistence store has not yet been created, the Application Server will not be able to communicate with HADB. This situation is accompanied by the following message:

WARNING (7715): ConnectionUtilgetConnectionsFromPool failed using connection URL: <connection URL>

Solution

Create the session store in the HADB with a command like the following:

asadmin create-session-store --storeurl <connection URL> --storeuser haadmin --storepassword hapasswd --dbsystempassword super123


Server log: app persistence-type = memory

The server.log shows that the J2EE application is using memory persistence instead of High Availability, with a message like this:

Enabling no persistence for web module [Application.war]'s sessions: persistence-type = [memory]

This situation occurs when the application server has not been configured to use HA.

Solution

Enable the availability service with a command like this:

asadmin set --user admin --password netscape --host localhost
--port 4848 serverName.availability-service.availabilityEnabled=true


Dynamic reconfiguration failed

The load balancer plug-in detects changes to its configuration by examining the time stamp of the loadbalancer.xml file. If a change has been made to the loadbalancer.xml file, the load balancer automatically reconfigures itself. The load balancer ensures that the modified configuration data is compliant with the DTD before overwriting the existing configuration.

If changes to the loadbalancer.xml file are not in the correct format, as specified by the sun-loadbalancer_1_0.dtd file, the reconfiguration fails and a failure notice is printed in the web server's error log files. The load balancer continues to use the old configuration in memory.


Note

If the load balancer encounters a hard disk read error while attempting to reconfigure itself, it uses the configuration that is currently in memory, and a warning message is logged to the web server's error log file.


Solution 1

You may have to wait up to the amount of time configured for the reload time interval before you see the change. Verify that the interval isn’t longer than you expected.

Solution 2

Edit the loadbalancer.xml file as needed until it follows the correct format as specified in the DTD file.
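One hedged way to check the edited file against the DTD before the load balancer picks it up is with the xmllint tool, if it is available on your system (the file paths are illustrative):

xmllint --noout --dtdvalid sun-loadbalancer_1_0.dtd loadbalancer.xml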


Session Persistence Problems

The following problems are addressed in this section:

The create-session-store command failed

Consider the following:

Are the HADB and the application server instance on different sides of a firewall?

The asadmin create-session-store command cannot run across firewalls. Therefore, for the create-session-store command to work, the application server instance and the HADB must be on the same side of a firewall.

The create-session-store command communicates with the HADB and not with the application server instance.

Solution

Locate the HADB and the application server instance on the same side of a firewall.

Configuring instance-level session persistence didn’t work

The application-level session persistence configuration always takes precedence over instance-level session persistence configuration. Even if you change the instance-level session persistence configuration after an application has been deployed, the settings for the application still override the settings for the application server instance.

Session data seems to be corrupted

Session data may be corrupted if the system log reports errors under the following circumstances:

If you determine that the data has been corrupted, there are three possible solutions.

Solution

To bring the session store back to a consistent state, do the following:

  1. Use the asadmin clear-session-store command to clear the session store.
  2. If clearing the session store doesn’t work, reinitialize the data space on all the nodes and clear the data in the HADB using the hadbm clear command.
  3. If clearing the HADB doesn’t work, delete and then recreate the database.


HTTP session failover is not working

The Sun Java System Application Server 7, Enterprise Edition includes the high-availability database (HADB) for storing session data. The HADB is not a general-purpose database but instead is an HttpSession store.

If HTTP session failover is not working correctly, consider the following:

Are the system clocks synchronized?

For HTTP session failover to work, the clocks of all the computers on which the application server instances in a cluster reside must be synchronized. (For more detail, see Are the system clocks synchronized?.)

Solution

Verify that clocks are synchronized for all systems in the cluster.

Do all objects bound to a distributed session implement the java.io.Serializable interface?

If an object does not implement the java.io.Serializable interface, it will not be persisted. No errors or warnings are produced, because the lack of persistence may well be the desired behavior. The remaining session objects are successfully persisted and will fail over.


Note

If an object that is declared to implement java.io.Serializable cannot actually be serialized (for example, because one of its fields is not serializable), an exception is thrown and logged. In this case, the entire user session is not persisted, and a failover produces an empty session.


Solution

Make sure that every class in the session that is supposed to persist implements java.io.Serializable, as in this example:

public class MyClass implements java.io.Serializable
{
  ..
}

Is your web application distributable?

For a web application to be highly available, it should be distributable. An application is non-distributable if the web-app element of the web.xml file does not contain a distributable subelement.

For additional information, refer to the Sun Java System Application Server Developer’s Guide to Web Applications.

Solution

Verify that the web-app element of the web.xml file contains a distributable subelement.

Is the persistence type set to ha?

The persistence type must be set to ha for session and SFSB failover to work. When you run the clsetup command, the persistence type is set to ha by default. If you do not use the clsetup command to set up your initial cluster, the persistence type is specified as memory, the default. The memory type offers no session persistence upon failover, while the failover capabilities offered by the file persistence type are intended for use only in development systems where failover capabilities are not strictly required.

Instructions for setting the persistence type are contained in the Session Persistence chapter of the Sun Java System Application Server Administrator’s Guide.

Solution

Verify that the persistence type is set to ha. If it isn’t, modify either the entire instance or your particular application to use the ha persistence type. For details, see the Session Persistence chapter of the Administration Guide.
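For instance-level configuration, the following is a hedged sketch using the configure-session-persistence command shown later in this chapter (the host, port, credentials, store name, and instance name are placeholders):

asadmin configure-session-persistence --user admin
  --password adminadmin --host localhost --port 4848
  --type ha --frequency web-method --scope session
  --store jdbc/hastore server1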

An object is cloned instead of shared

When using the modified-attribute persistence scope, and a session fails over or is activated after being passivated, an object that was shared between two attributes comes back as two separate copies of the object instead of as a single shared object.

The situation occurs because one object cannot be referred to by two separate attributes when you use the modified-attribute scope. The application server serializes and persists each attribute separately, so the shared object gets serialized twice, once for each attribute. When the objects are deserialized, they become two separate objects.

Has a session store been created?

HTTP session and SFSB failover will not work until a session store has been created using the asadmin create-session-store command.

Instructions for creating a session store are contained in the Session Persistence chapter of the Sun Java System Application Server Administrator’s Guide.

Are all the machines in the cluster homogenous?

All Application Server instances in a cluster must have the same applications deployed to them. For these applications to take part in failover, they must have a consistent session persistence configuration and point to the same session store.

Any new instance that you add to a cluster must have the same version and same patch level as all existing instances in a cluster.

Has high availability been enabled?

HTTP session and SFSB failover will not work until high availability has been enabled using the availability-enabled attribute.

Solution

Set the availability-enabled attribute using the asadmin command.
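A minimal sketch, modeled on the availability-service command shown earlier in this chapter (serverName, host, port, and the password are placeholders):

asadmin set --user admin --password adminadmin --host localhost
--port 4848 serverName.availability-service.availabilityEnabled=true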


Out of Memory and Stack Overflow errors

When using the Application Server to deploy web applications, out-of-memory errors and stack overflow errors occur even though the volume of data is not large. For example, on a system with the following configuration:

Memory: 2048M
CPU : 1 900Mhz Ultra SparcIII

Solution 1

The Java VM runs out of memory because the web applications create a large number of Java objects.

This can usually be solved by setting the VM Xms/Xmx and Xss parameters. If you have default values similar to these:

-Xmx256m - for initial app. server install
-Xss512k - default for 1.4.0 server VM

try something like:

-Xmx1024m -Xss1024k

Solution 2

The Java VM runs out of memory because of poor web application design.

For instance, the application creates many Java objects that remain cross-referenced throughout the application's lifecycle, preventing the garbage collector from reclaiming them. Each additional user then consumes more heap space, until the heap is exhausted.

There is nothing you can do here except change the application. You are most likely dealing with an application memory leak, which can be traced using a profiling tool such as OptimizeIt.

Solution 3

The other thing you can try is to configure your Application Server with a later version of the J2SE platform, to take advantage of improvements in memory management and garbage collection.


Performance Problems

This section discusses the following issues:

Too much swapping in a colocated HADB system

When HADB is sharing a system with other processes, insufficient memory may lead to exceptionally high levels of swapping activity. When that situation arises, performance suffers and transactions may be lost.

If the system is running slowly, use the Unix vmstat command to check swapping levels, and look for this message in the HADB history files, where M is greater than N:

Process blocked for M sec, max block time is N sec

This message can occur when excessive swapping causes HADB restarts.
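For example, a hedged way to watch swapping on Solaris while the system is under load (column names vary slightly between operating systems):

# Report virtual memory statistics every 5 seconds; sustained page-ins/outs
# and a high scan rate (sr) indicate memory pressure and swapping
vmstat 5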

For additional explanations and recommendations, see the Performance Tuning Guide sections “Tuning for High-Availability”, “Tuning HADB”, “Memory Allocation”.

Indefinite looping in server.log after MDB failure

When invoking a CMP bean from an MDB (with a container-managed transaction), there can be a problem if the CMP bean throws an EJBException but the MDB does not throw a RuntimeException and the global transaction is set to true. The MDB container is then unaware of the exception. Because the message is part of the transaction, the JMS provider (Sun Java™ System Message Queue) keeps trying to deliver the message, and the exception continues to be logged in server.log indefinitely.

Solution

To prevent indefinite looping with Message Queue, throw a runtime exception in the MDB so that the MDB container will be aware of it.

Performance suffers when using server-side SOAP message handlers

SOAP message handlers are commonly used in commercial web services for encryption and decryption before a message is sent to its endpoint, or for setting new headers (for example, setting a session ID and handling attachments). However, they are known to reduce throughput.

Explanation

Bug #4733037 explains the problem. We have observed about a 50% reduction in throughput even with completely empty message handlers on the server side.

Solution

For the sake of performance, it is best to minimize the use of SOAP message handlers on the server side until a service pack or performance pack is released to fix this issue.


Low Memory in MQ Broker

Sometimes, when JMS-related applications run for long hours under stress, the MQ broker enters a low-memory state.

Solution

The default Java heap size is -Xms8m -Xmx128m -Xss128k. Increasing the Broker heap size can help to remedy the issue.

  1. Access the Admin Console at http://<hostname>:<adminport>.
  2. Log in to the Admin Console using the admin username and password.
  3. Expand "App server instances" and select the instance where the JMS application runs.
  4. Select JMS in the menu, expand it, and select Service.
  5. The right-hand pane displays the General and Properties lists.
  6. Under the General list, select Start Arguments.
  7. In the text box, enter the heap size, for example: -vmargs -Xms512m -vmargs -Xmx512m (where 512m is an example heap size).
  8. Click the Save button at the bottom of the screen.
  9. Select the server instance again in the left-hand frame and click Apply Changes.
  10. Start or restart the server instance if it is not running.
  11. To verify that the broker has picked up the new heap size, inspect the broker log (see the example after this list).
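A hedged way to confirm the new heap settings from the command line; the log location below is the typical default for a Solaris package-based Message Queue installation and may differ on your system:

grep -i "heap" /var/imq/instances/imqbroker/log/log.txt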


HADB Performance Problems

Performance is affected when transactions to HADB are delayed or aborted. This situation is generally caused by a shortage of system resources. Delays beyond five seconds cause the transactions to abort. A node failure also causes the transactions that were active on that node at the time of the crash to abort. Any double failure (failure of both mirror nodes) makes the HADB unavailable. The causes of the failures can generally be found in the HADB history files.

In your efforts to isolate the problem, consider the following:

Is there a shortage of CPU or memory resources/too much swapping?

Node restarts or double failures can occur, accompanied by a message of the form: Process blocked for x sec, max block time is 2.500000 sec.

Here, x is the actual length of time the process was blocked, which is greater than 2.5 seconds.

The HADB Node Supervisor Process (NSUP/clu_nsup_srv) tracks the time elapsed since the last time it did some monitoring work. If that time duration exceeds a specified maximum (2500 ms, by default), NSUP concludes that it was blocked too long and restarts the node.

If NSUP is blocked for more than 2.5 seconds, it restarts the node. If mirror nodes are placed on the same host, the probability of a double failure is high. Blocking that occurs simultaneously on the mirror hosts may also lead to double failures.

The situation is especially likely to arise when other processes in the system (for example, in a colocated configuration) compete for CPU or memory, which produces extensive swapping and multiple page faults as processes are rescheduled. NSUP can also be blocked by negative system clock adjustments.

Solution: Ensure that HADB nodes get enough system resources. Also ensure that the time synchronization daemon does not make large jumps (no more than 2 seconds).
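A hedged way to confirm that the clock is being adjusted smoothly is to check the NTP daemon's peer offsets, which should stay small (this assumes an NTP daemon is running on the host):

# Offsets and jitter are reported in milliseconds; large or jumping offsets
# suggest the clock is being stepped rather than slewed
ntpq -p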

Is there disk contention?

Disk contention negatively affects reads and writes of user data to the disk devices, as well as HADB writes to the history files. Severe disk contention may delay or abort user transactions. Delays in writing to the history file may cause node restarts and, in the worst case, lead to double failures.

Disk contention can be identified by monitoring disk I/O from the operating system for the disks used for data devices, log devices, and history files. It can also be identified by the following statement in the history files:

HADB warning: Schedule of async <read,write> operation took ..

Delays in writing to the history file are also recorded in the HADB history files and can be identified by messages such as:

NSUP0 BEWARE <timestamp> Last flush took too long (x msecs).

This warning shows that disk I/O took too long. If the delay exceeds 10 seconds, the node supervisor restarts the trans process with the error message:

Child process trans0 10938 does not respond.

Child died - restarting nsup.

Psup::stop: stopping all processes.

This message indicates that a trans (clu_trans_srv) process was too busy doing other things (for example, waiting to write to the history file) to reply to the node supervisor’s heartbeat check. This causes NSUP to believe that the trans process has died, so it restarts it.

This problem is observed especially in Red Hat Advanced Server 2.1 when multiple HADB nodes are placed on the same host and all the nodes use the same disk to place their devices.

Solution: Use one disk per node to place the devices used by that node. If the node has more than one data device and disk contention is observed, move one data device to another disk. The same applies to the history file.
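A hedged example of monitoring disk I/O on Solaris while the load is running, to confirm which disk is saturated before moving devices (the flags are Solaris-specific):

# Extended per-device statistics every 5 seconds; devices that stay near
# 100% busy (%b) or show long service times are contention candidates
iostat -xn 5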

Is there a shortage of HADB data devices space?

One possible reason for transaction failure is running out of data device space. If this situation occurs, HADB writes warnings to the history file and aborts the transaction that tried to insert or update data.

To find the messages, follow the instructions in Examining the HADB history files. Typical messages are:

HIGH LOAD: about to run out of device space, ...

HIGH LOAD: about to run out of device space on mirror node, ...

The general rule of thumb is that the data devices must have room for at least four times the volume of the user data. Please refer to the Tuning Guide for an explanation.

Solution 1

Increase the size of the data devices using the following command:

hadbm set TotalDataDevicePerNode=size

(See the Administrator's Guide)

This solution requires that there is space available on the physical disks which are used for the HADB data devices on all nodes.

hadbm automatically restarts each node of the database.
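After the resize, a hedged way to confirm that the database and its nodes are running again (the --nodes option may not exist in all hadbm versions; plain hadbm status also reports the database state):

hadbm status --nodes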

Solution 2

Stop and clear the HADB, and create a new instance with more nodes, larger data devices, or several data devices per node. See the Administrator's Guide.

Unfortunately, using this solution will erase all persistent data.

See also, Bug ID 5097447 in Sun Java System Application Server 7 2004Q2 Update 1 Release Notes.

Is there a shortage of other HADB resources?

When an HADB node is started, it will allocate:

If an HADB node runs out of resources it will delay and/or abort transactions. Resource usage information is shipped between mirror nodes, so that a node can delay or abort an operation which is likely to fail on its mirror node.

Transactions that are delayed repeatedly may time out, and return an error message to the client. If they do not time out, the situation will be visible to the client only as decreased performance in the period where the system is short on resources.

These problems frequently occur in “High Load” situations. For details, see High Load Problems.

Direct Disk I/O Mapping Failed Errors

When using Veritas file system on Solaris, you might get the message WRN: Direct disk I/O mapping failed in the HADB history files.

This message indicates that HADB is unable to turn on direct I/O for the data and log devices. Direct I/O is a performance enhancement that reduces the CPU cost of writing disk pages, and also causes less overhead of administering dirty data pages in the operating system.

To use direct I/O with Veritas, you should create the data and log devices on a file system that is mounted with the option mincache=direct. Note that this option will apply to all files created on the file system. For details, see the command mount_vxfs(1M).

An alternative is to use the Veritas Quick I/O facility. In effect, this product makes it possible to perform raw I/O to file system files. For more information, see the Veritas document, VERITAS File System™ 4.0 Administrator’s Guide for Solaris.


Note

This description is based on available documentation only.

Sun Database Technology Group has not tested these configurations.



High Load Problems

High load scenarios are recognizable by the following symptoms:

When you suspect a high load scenario, consider the following:

Note:
Frequently, these problems can be solved by making more CPU horsepower available, as described in Improving CPU Utilization. (For even more information, consult the Tuning Guide.)

Is the Tuple Log out of space?

The user operations (delete, insert, update) are logged in the tuple log and then executed. The tuple log may fill up when the log records are not processed quickly enough (indicated by HIGH LOAD messages in the history files), which can happen as a result of CPU or disk I/O bottlenecks, network contention, or a tuple log buffer that is too small; see the solutions below.

Solution 1

Check CPU usage, as described in Improving CPU Utilization.

Solution 2

If CPU utilization is not a problem, check the disk I/O. If the disk shows contention, avoid page faults during log record processing by increasing the data buffer size with hadbm set DataBufferPoolSize=...

Solution 3

Look for evidence of network contention, and resolve bottlenecks.

Solution 4

Increase the tuple log buffer using hadbm set LogBufferSize=...

See also, Bug ID 5097447 in Sun Java System Application Server 7 2004Q2 Update 1 Release Notes.

Is the node-internal log full?

Too many node-internal operations are scheduled but not processed due to CPU or disk I/O problems.

Solution 1

Check CPU usage, as described in Improving CPU Utilization.

Solution 2

If CPU utilization is not a problem and there is sufficient memory, increase the InternalLogbufferSize using the hadbm set InternalLogbufferSize= command.

Are there enough locks?

Some extra symptoms that identify this condition are:

Solution 1: Increase the number of locks

Use hadbm set NumberOfLocks= to increase the number of locks.

Solution 2: Improve CPU Utilization

Check CPU usage, as described in Improving CPU Utilization.

Can you fix the problem by doing some performance tuning?

In most situations, reducing load or increasing the availability of resources will improve host performance. Some of the more common steps to take are:

In addition, the following resources can be adjusted to improve “HIGH LOAD” problems, as described in the Performance and Tuning Guide:

Size of Database Buffer:        hadbm attribute DataBufferPoolSize
Size of Tuple Log Buffer:         hadbm attribute LogBufferSize
Size of Node Internal Log Buffer:  hadbm attribute InternalLogBufferSize
Number of Database Locks:        hadbm attribute NumberOfLocks
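A hedged sketch of adjusting one of these attributes and reading it back with hadbm (the value is illustrative, not a recommendation; see the Performance and Tuning Guide for sizing):

hadbm set DataBufferPoolSize=512
hadbm get DataBufferPoolSize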


Client cannot connect to HADB

This problem is accompanied by a message in the history file:

HADB-E-11626: Error in IPC operations, iostat = 28: No space left on device

where:

The most likely explanation is that a semget() call failed (see the Unix man pages). If HADB started successfully and you get this message at runtime, it means that the host computer has too few semaphore “undo structures”. See the “Preparing for HADB Setup” chapter in the Installation Guide for how to configure semmnu in /etc/system.

Solution

Stop the affected HADB node, reconfigure and reboot the affected host, and then restart the HADB node. HADB remains available during this process.
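As a hedged illustration of the reconfiguration step, the semaphore tunable is added to /etc/system before the reboot; the value below is an example only, so follow the Installation Guide’s sizing guidance:

set semsys:seminfo_semmnu=1024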


Connection Queue Problems

The following problems may occur with connection queues:

Full connection queue closes socket

This problem typically occurs in “high load” scenarios. The server.log file shows the error: SEVERE (17357): Connection queue full, closing socket.

Solution: Increase configuration values

Increase the values of the ConnQueueSize and MaxKeepAliveConnections directives in the magnus.conf file under the config directory of your Web Server.
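For example, a minimal sketch (the values are illustrative, not recommendations; size them for your expected connection load):

ConnQueueSize 8192
MaxKeepAliveConnections 512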


Connection Pool Problems

The following problems may occur in relation to connection pools:

Single sign-on requires larger connection pool

When single sign-on (or session persistence) requires connections and the wait time is exceeded, the following error occurs:

Unable to get connection - Wait-Time has expired

The Sun Java System Application Server uses the same connection pool for both HADB session persistence and single sign-on. Single sign-on is enabled by default. If an application requires single sign-on functionality, the connection pool setting must be doubled.


Tip

If your application does not require single sign-on functionality, disabling it can improve performance. To disable single sign-on, change the following settings in the server.xml file:

<virtual-server id="server1" ... >
  ...
  <property name="sso-enabled" value="false"/>
  ...
</virtual-server>


Solution

Double the size of the connection pool.

For example, if an application indicates that 16 is the optimal number of connections to a single HADB node, the number of connections should be doubled to 32 if single sign-on functionality is required. In this case, the JDBC connection pool settings look like this:

<jdbc-connection-pool steady-pool-size="32" max-pool-size="32"
  max-wait-time-in-millis="60000" pool-resize-quantity="2"
  idle-timeout-in-seconds="10000" is-isolation-level-guaranteed="true"
  is-connection-validation-required="true"
  connection-validation-method="auto-commit" fail-all-connections="false"
  datasource-classname="com.sun.hadb.jdbc.ds.HadbDataSource" name="CluJDBC"
  transaction-isolation-level="repeatable-read">

You should also double the response-timeout-in-seconds setting in the loadbalancer.xml file, from 60 seconds to 120 seconds:

<property name="response-timeout-in-seconds" value="120"/>

This value must be equal to or greater than the following:

Server: Unable to obtain connection from pool

The application server is having trouble connecting with HADB, as evidenced by a message like the following in the server.log file:

ConnectionUtilgetConnectionsFromPool failed using connection URL: null Unable to obtain connection from pool

Solution

Make sure HADB is running. Make sure that the session store, the JDBC connection pool, and the JNDI name (jdbc/hastore) have been created. Configure session persistence for High Availability with a command like the following:

asadmin configure-session-persistence --user admin
  --password netscape --host localhost --port 4848
  --type ha --frequency web-method --scope session
  --store jdbc/hastore server1

JDBC connection is not available

Consider the following:

Is the max-pool-size setting adequate?

The server.xml file defines the following default values:

During server start or restart, the ejb-container steady pool for deployed enterprise beans is created. Since the default steady-pool-size is 32, 32 instances of each enterprise bean are created unless a different value is specified in the sun-ejb-jar.xml file.

The setEntityContext method will be called for each of the beans created. If more than one bean is grabbing JDBC connections in the setEntityContext method from the same JDBC connection pool, the following happens:

If the newly-created beans attempt to grab a JDBC connection from the same pool in their setEntityContext method, an exception is thrown with the following message:

Solution

Increase the max-pool-size of the default jdbc-connection-pool to a higher value, such as 256.

Exception occurs while calling DataSource.getConnection()

This exception occurs when an invalid DataSource class property is registered within the JDBC connection pool. Misspelling is a common cause. For instance, while creating a jdbc-connection-pool for Oracle, one might specify the following:

<property name="OracleURL" value="jdbc:oracle:...."/>

This will result in the following exception since OracleURL is not a property of any Oracle datasource:

Solution

Verify that all jdbc-connection-pool properties in use are valid.

Verify that the datasource classname specified is for the required vendor datasource class.

Exception occurs while executing against MSSQL

The following exception occurs while executing a statement against the MSSQL server using a non-XA driver:

java.sql.SQLException: [DataDirect] [SQLServer JDBC Driver] Can't start a cloned connection while in manual transaction mode

This happens when the selectMethod property is not set to cursor.

Solution

Ensure the selectMethod property is set correctly during JDBC connection pool registration:

Using the command-line interface, set the property when you register the connection pool, as in the sketch below.
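A hedged sketch of registering the pool with the property set; the pool name, datasource class, and exact flags are placeholders that may differ in your release and driver:

asadmin create-jdbc-connection-pool --user admin --password adminadmin
  --host localhost --port 4848
  --datasourceclassname com.ddtek.jdbcx.sqlserver.SQLServerDataSource
  --property selectMethod=cursor mssql-pool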

The options can also be set using the graphical Administration interface.

IOException: Connection in invalid state

The error log shows the following message:

WEB2001: Web application service failed
java.io.IOException: Error from HA Store:
  Connection is in invalid state for LOB operations.
  <stack trace>

This error occurs when the transaction-isolation-level entry of the HADB JDBC connection pool is set to read-committed or read-uncommitted.

Solution

Change the transaction-isolation-level value of your HADB JDBC connection pool to repeatable-read and restart the application server.


Common Runtime Procedures

This section covers the following common runtime and recovery procedures:

Improving CPU Utilization

Available CPU cycles and I/O capacity can impose severe restrictions on performance. Resolving and preventing such issues is necessary to optimize system performance, in addition to configuring HADB optimally.

If there are unused CPUs on the host, add new nodes to the same host. Otherwise, add new machines and add new nodes on them.

If the machine has enough memory, increase the DataBufferPoolSize, and increase any other internal buffers that are generating warnings in the log files. Otherwise, add new machines and add new nodes on them.

For much more information on this subject, consult the Performance and Tuning Guide.


