Sun Java System Message Queue 3 2005Q1 Administration Guide 

Chapter 12
Troubleshooting Problems

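This chapter explains how to understand and resolve the following problems:

A Client Cannot Establish a Connection
Connection Throughput Is Too Slow
A Client Cannot Create a Message Producer
Message Production Is Delayed or Slowed
Messages Are Backlogged
Message Server Throughput Is Sporadic
Messages Are Not Reaching Consumers
The Dead Message Queue Contains Messages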

When problems occur, check the version number of the installed Message Queue software. Use it to make sure that the version of your documentation matches the version of the software; you also need it to report a problem to Sun. To check the version number, issue the following command:

imqcmd -v


A Client Cannot Establish a Connection

The symptoms of this problem are as follows:

This section explores the following possible causes:

Client applications are not closing connections, causing the number of connections to exceed resource limitations

To confirm this cause of the problem

List all connections to a broker:

imqcmd list cxn

The output lists every connection and the host from which it was made. An unusually large number of open connections from a particular client host points to this cause.
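If the listing is long, a small helper makes the pattern easier to spot. The following Java sketch assumes that you have already copied the host of each connection from the imqcmd list cxn output into a list (the exact output layout is not reproduced here); the class name and the threshold are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical helper: tallies open connections per client host. */
class ConnectionTally {

    /** Counts how many connections each host holds, sorted by host name. */
    static Map<String, Integer> countByHost(List<String> hosts) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String host : hosts) {
            counts.merge(host, 1, Integer::sum);
        }
        return counts;
    }

    /** Returns the hosts holding more connections than the chosen threshold. */
    static List<String> hostsAbove(List<String> hosts, int threshold) {
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, Integer> entry : countByHost(hosts).entrySet()) {
            if (entry.getValue() > threshold) {
                result.add(entry.getKey());
            }
        }
        return result;
    }
}
```

A host that keeps appearing in hostsAbove across successive listings is a good candidate for the client rewrite described below.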

To resolve the problem

Rewrite the offending clients to close unused connections.

Broker is not running or there is a network connectivity problem

To confirm this cause of the problem

To resolve the problem

Connection service is inactive or paused

To confirm this cause of the problem

Check the status of all connection services:

imqcmd list svc

If the status of a connection service is shown as unknown or paused, clients will not be able to establish a connection using that service.

To resolve the problem

Too few threads available for the number of connections required

To confirm this cause of the problem

Check for the following entry in the broker log:

WARNING [B3004]: No threads are available to process a new connection on service ... Closing the new connection.

Also check the number of connections on the connection service and the number of threads currently in use, using either of the following commands:

imqcmd query svc -n serviceName

imqcmd metrics svc -n serviceName -m cxn

Each connection requires two threads: one for incoming messages and one for outgoing messages (see Thread Pool Manager).
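The two-threads-per-connection rule turns into simple arithmetic when you compare the connection count with the thread maximum of the service. The Java sketch below is a hypothetical helper that applies only that rule; for example, a service allowed at most 1000 threads can serve at most 500 connections.

```java
/**
 * Hypothetical helper: thread budget for a connection service in which each
 * connection uses one thread for incoming and one for outgoing messages.
 */
class ThreadBudget {
    static final int THREADS_PER_CONNECTION = 2;

    /** Threads needed for the given number of connections. */
    static int threadsNeeded(int connections) {
        return connections * THREADS_PER_CONNECTION;
    }

    /** Largest number of connections that the given thread maximum can serve. */
    static int maxConnections(int maxThreads) {
        return maxThreads / THREADS_PER_CONNECTION;
    }

    /** True if one more connection would still get both of its threads. */
    static boolean canAcceptOneMore(int currentConnections, int maxThreads) {
        return threadsNeeded(currentConnections + 1) <= maxThreads;
    }
}
```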

To resolve the problem

Too few file descriptors for the number of connections required on the Solaris or Linux operating system

For more information about this issue, see Setting the File Descriptor Limits (Solaris or Linux).

To confirm this cause of the problem

Check for an entry in the broker log similar to the following: Too many open files.

To resolve the problem

Increase the file descriptor limit, as described in the ulimit man page.

TCP backlog limits the number of simultaneous new connection requests that can be established

The TCP backlog places a limit on the number of simultaneous connection requests that can be stored in the system backlog (imq.portmapper.backlog) before the Port Mapper rejects additional requests. (On Windows operating systems there is a hard-coded backlog limit: 5 for Windows desktops and 200 for Windows servers.)
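For comparison, the standard Java API exposes the same idea through the backlog argument of java.net.ServerSocket. The sketch below is a generic JDK illustration, not the Port Mapper's code: it opens a loopback listener with a requested backlog (50 in the usage example is an arbitrary value) and closes it again.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.ServerSocket;

/** Illustration only: how a Java server socket requests a listen backlog. */
class BacklogDemo {

    /** Opens a loopback listener on a free port with the given backlog, then closes it. */
    static boolean canListenWithBacklog(int backlog) {
        // Port 0 lets the OS choose a free port. The backlog caps how many connection
        // requests can wait to be accepted; while that queue is full, further requests
        // are refused or dropped, depending on the OS, which may also adjust the value.
        try (ServerSocket socket =
                new ServerSocket(0, backlog, InetAddress.getLoopbackAddress())) {
            return socket.isBound() && socket.getLocalPort() > 0;
        } catch (IOException e) {
            return false;
        }
    }
}
```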

The rejection of requests because of backlog limits is usually a transient phenomenon, due to an unusually high number of simultaneous connection requests.

To confirm this cause of the problem

Examine the broker log. First, check to see whether the broker is accepting some connections during the same time period that it is rejecting other connections. Next, check for messages that explain rejected connections. If you find such messages, the TCP backlog is probably not the problem, because the broker does not log connection rejections due to the TCP backlog.

If some successful connections are logged, and no connection rejections are logged, the TCP backlog is probably the problem.

To resolve the problem

The following approaches can be used to resolve TCP backlog limitations:

Operating system limits the number of concurrent connections

The Windows operating system license places limits on the number of concurrent remote connections that are supported.

To confirm this cause of the problem

Check that there are plenty of threads available for connections (using imqcmd query svc) and check the terms of your Windows license agreement. If you can make connections from a local client, but not from a remote client, operating system limitations might be the cause of the problem.

To resolve the problem

Authentication or authorization of the user is failing

Authentication can fail because the password is incorrect, because the user repository has no entry for the user, or because the user does not have access permission for the connection service.

To confirm this cause of the problem

Check the broker log for the Forbidden error message. This message indicates an authentication error but does not indicate its cause.

To resolve the problem


Connection Throughput Is Too Slow

The symptoms of this problem are as follows:

This section explores the following possible causes:

Network connection or WAN is too slow

To confirm this cause of the problem

Ping the network to see how long it takes for the ping to return, and then consult a network administrator. You can also send and receive messages using local clients and compare the delivery time with that of remote clients (which use a network link).
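Once you have delivery times for both kinds of client, the comparison is a difference of averages. The Java sketch below assumes you have already measured delivery times, in milliseconds, for a batch of messages sent by local clients and a batch sent by remote clients; how you take those measurements, and the class name, are up to you.

```java
/** Hypothetical helper: compares delivery times measured with local and remote clients. */
class LatencyComparison {

    /** Arithmetic mean of the samples, in milliseconds. */
    static double mean(long[] millis) {
        if (millis.length == 0) {
            throw new IllegalArgumentException("no samples");
        }
        long sum = 0;
        for (long m : millis) {
            sum += m;
        }
        return (double) sum / millis.length;
    }

    /** Average extra delay, in milliseconds, attributable to the network link. */
    static double networkOverheadMillis(long[] localMillis, long[] remoteMillis) {
        return mean(remoteMillis) - mean(localMillis);
    }
}
```

A large, consistent overhead points at the link rather than at the broker.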

To resolve the problem

If the connection is too slow, upgrade the network link.

Connection service protocol is inherently slow compared to TCP

For example, SSL-based and HTTP-based protocols are slower than TCP (see Figure 11-5).

To confirm this cause of the problem

If you are using SSL-based or HTTP-based protocols, try using TCP and compare the delivery times.

To resolve the problem

Application requirements usually dictate which protocols are used, so there is little you can do other than tune the protocol as described in Tuning Transport Protocols.

Connection service protocol is not optimally tuned

To confirm this cause of the problem

Try tuning the protocol and see if it makes a difference.

To resolve the problem

Try tuning the protocol as described in Tuning Transport Protocols.

Messages are so large they consume too much bandwidth

To confirm this cause of the problem

Try running your benchmark with smaller-sized messages.
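A quick calculation shows whether message size alone can saturate the link: payload bandwidth is the message rate times the message size times 8 bits per byte. For example, 100 messages per second at 10,000 bytes each need 8,000,000 bits per second, or 80 percent of a 10 Mbit/s link, before any protocol overhead. The Java sketch below (class name hypothetical) ignores protocol and header overhead, so real usage is higher.

```java
/** Hypothetical helper: payload bandwidth consumed by a message stream. */
class BandwidthEstimate {

    /** Payload bandwidth in bits per second, ignoring protocol and header overhead. */
    static double payloadBitsPerSecond(double messagesPerSecond, long bytesPerMessage) {
        return messagesPerSecond * bytesPerMessage * 8.0;
    }

    /** Fraction of the link capacity consumed by message payloads alone. */
    static double linkUtilization(double messagesPerSecond, long bytesPerMessage,
                                  double linkBitsPerSecond) {
        return payloadBitsPerSecond(messagesPerSecond, bytesPerMessage) / linkBitsPerSecond;
    }
}
```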

To resolve the problem

What appears to be slow connection throughput is actually a bottleneck in some other step of the message delivery process

To confirm this cause of the problem

If none of the causes above appears to explain the slow connection throughput, consult Figure 11-1 for other possible bottlenecks and check for symptoms associated with the following problems:

To resolve the problem

Follow the problem resolution guidelines in the corresponding troubleshooting sections of this chapter.


A Client Cannot Create a Message Producer

The symptoms of this problem are as follows:

This section explores the following possible causes:

A physical destination has been configured to allow only a limited number of producers

One of the ways of avoiding the accumulation of messages on a physical destination is to limit the number of producers (maxNumProducers) that it supports.

To confirm this cause of the problem

Check the physical destination (see Displaying Information about Physical Destinations):

imqcmd query dst

The output shows the current number of producers and the value of maxNumProducers. If the two values are equal, the number of producers has reached its configured limit. When the broker rejects a new producer, it returns a ResourceAllocationException [C4088]: A JMS destination limit was reached, and makes the following entry in the broker log: [B4183]: Producer can not be added to destination.

To resolve the problem

Increase the value of the maxNumProducers attribute (see Updating Physical Destination Properties).

The user is not authorized to create a message producer due to settings in the access control properties file

To confirm this cause of the problem

When a new producer is rejected by the broker, the broker returns the following message:

The broker also makes the following entries in the broker log:

To resolve the problem

Change the access control properties to allow the user to produce messages (see Access Control for Physical Destinations).


Message Production Is Delayed or Slowed

The symptoms of this problem are as follows:

This section explores the following possible causes:

The message server is backlogged and has responded by slowing message producers

A backlogged server accumulates messages in broker memory.

When the number of messages or number of message bytes in physical destination memory reaches configured limits, the broker attempts to conserve memory resources in accordance with the specified limit behavior. The following limit behaviors slow down message producers:

Similarly, when the number of messages or number of message bytes in broker-wide memory (for all physical destinations) reaches configured limits, the broker attempts to conserve memory resources by rejecting the newest messages.

Also, when system memory limits are reached because physical destination or broker-wide limits have not been set properly, the broker takes increasingly serious action to prevent memory overload. These actions include throttling back message producers.

To confirm this cause of the problem

When a message is rejected by the broker due to configured message limits, the broker returns the following message:

The broker also makes this entry in the broker log:

This entry is followed by another indicating which limit has been reached. If the message limit is on a physical destination, the broker makes an entry like the following:

If the message limit is broker wide, the broker makes an entry like the following:

More generally, you can check for message limit conditions before the rejections occur as follows:

To resolve the problem

There are a number of approaches to addressing the slowing of producers due to messages becoming backlogged:

The broker cannot save a persistent message to the data store

If the broker cannot access a data store or write a persistent message to the data store, the producing client is blocked. This condition can also occur if destination or broker-wide message limits are reached, as described above.

To confirm this cause of the problem

If the broker is unable to write to the data store, it makes one of the following entries in the broker log:

[B2011]: Storing of JMS message from connectionID failed…

[B4004]: Failed to persist message messageID

To resolve the problem

Broker acknowledgment timeout is too short

Due to slow connections or a lethargic message server (caused by high CPU utilization or scarce memory resources), a broker might require more time to acknowledge receipt of a persistent message than allowed by the value of the connection factory’s imqAckTimeout attribute.

To confirm this cause of the problem

If the imqAckTimeout value is exceeded, the broker returns the following message:

JMSException [C4000]: Packet acknowledge failed

To resolve the problem

Change the value of the imqAckTimeout connection factory attribute (see Connection Factory Attributes).

A producing client is encountering JVM limitations

To confirm this cause of the problem

To resolve the problem

Adjust the JVM (see Java Virtual Machine Adjustments).


Messages Are Backlogged

The symptoms of this problem are as follows:

This section explores the following possible causes:

There are inactive durable subscriptions on a topic destination

If a durable subscription is inactive, messages are stored in a destination until the corresponding consumer becomes active and can consume the messages.

To confirm this cause of the problem

Check the state of durable subscriptions on each topic destination:

imqcmd list dur -d destName

To resolve the problem

You can take any of the following actions:

There are too few consumers available to consume messages in a queue

If there are too few active consumers to which messages can be delivered, a queue destination can become backlogged as messages accumulate. This condition can occur for any of the following reasons:

To confirm this cause of the problem

To help determine the reason for unavailable consumers, check the number of active consumers on a destination:

imqcmd metrics dst -n destName -t q -m con

To resolve the problem

You can take any of the following actions, depending on the reason for unavailable consumers:

Message consumers are processing too slowly to keep up with message producers

In this case, topic subscribers or queue receivers are consuming messages more slowly than producers are sending them, and one or more destinations become backlogged as a result of this imbalance.

To confirm this cause of the problem

Check for the rate of flow of messages into and out of the broker:

imqcmd metrics bkr -m rts

Then check flow rates for each of the individual destinations:

imqcmd metrics dst -t destType -n destName -m rts
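The rates from these metrics also tell you how quickly a backlog grows and how soon it reaches a destination limit such as maxNumMsgs. The Java sketch below is a hypothetical helper that assumes steady message rates; for example, a destination holding 400 messages with a limit of 1000, receiving 50 messages per second and delivering 20, grows by 30 per second and reaches the limit in 20 seconds.

```java
/** Hypothetical helper: projects backlog growth from message rates. */
class BacklogProjection {

    /** Net growth of the backlog in messages per second (positive means growing). */
    static double netGrowthPerSecond(double inPerSecond, double outPerSecond) {
        return inPerSecond - outPerSecond;
    }

    /**
     * Seconds until the message count reaches the limit at the current rates,
     * or positive infinity if the backlog is not growing.
     */
    static double secondsUntilLimit(long currentMsgs, long maxNumMsgs,
                                    double inPerSecond, double outPerSecond) {
        double growth = netGrowthPerSecond(inPerSecond, outPerSecond);
        if (growth <= 0) {
            return Double.POSITIVE_INFINITY;
        }
        return Math.max(0, maxNumMsgs - currentMsgs) / growth;
    }
}
```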

To resolve the problem

Client acknowledgment processing is slowing down message consumption

Two factors affect the processing of client acknowledgments:

To confirm this cause of the problem

To resolve the problem

The broker cannot keep up with produced messages

In this case, messages are flowing into the broker faster than the broker can route and dispatch them to consumers. The sluggishness of the broker can be due to limitations in any or all of the following: CPU, network socket read/write operations, disk read/write operations, memory paging, the persistent store, or JVM memory limits.

To confirm this cause of the problem

Check that none of the other causes of this problem are responsible.

To resolve the problem

Client code defects: consumers are not acknowledging messages

Messages are held in a destination until they have been acknowledged by all consumers to which the messages have been sent. If a client is not acknowledging consumed messages, the messages accumulate in the destination without being deleted.

For example, client code might have the following defects:

To confirm this cause of the problem

First check all other possible causes listed in this section. Next, list the destination with the following command:

imqcmd list dst

Notice whether the number of messages listed under the UnAcked header is the same as the number of messages in the destination. The messages under the UnAcked header were sent to consumers but not acknowledged. If this number is the same as the total number of messages, the broker has sent all the messages and is waiting for acknowledgment.

To resolve the problem

Request the help of application developers in debugging this problem.


Message Server Throughput Is Sporadic

The symptom of this problem is as follows:

This section explores the following possible causes:

The broker is very low on memory resources

If destination and broker limits have not been set properly, the broker takes increasingly serious action to prevent memory overload, which can make the broker very sluggish until the message backlog is cleared.

To confirm this cause of the problem

Check the broker log for a low memory condition ([B1089]: In low memory condition, broker is attempting to free up resources), followed by an entry describing the new memory state and the amount of total memory being used.

Also check the free memory available in the JVM heap:

imqcmd metrics bkr -m cxn

Free memory is low when the value of total JVM memory is close to the maximum JVM memory value.
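The same rule of thumb can be applied to the numbers you read from the metrics output, or to any JVM through the standard java.lang.Runtime methods. In the Java sketch below, the class name and the threshold (0.9 in the usage example) are hypothetical choices.

```java
/** Hypothetical helper: how close a JVM's reserved heap is to its maximum. */
class HeapUsage {

    /** Fraction of the maximum heap that the JVM has currently reserved (total / max). */
    static double reservedFraction(long totalBytes, long maxBytes) {
        return (double) totalBytes / maxBytes;
    }

    /** Free memory is low when total JVM memory approaches the maximum. */
    static boolean isLow(long totalBytes, long maxBytes, double threshold) {
        return reservedFraction(totalBytes, maxBytes) >= threshold;
    }

    /** Applies the same measure to the JVM this code runs in. */
    static double currentReservedFraction() {
        Runtime rt = Runtime.getRuntime();
        return reservedFraction(rt.totalMemory(), rt.maxMemory());
    }
}
```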

To resolve the problem

JVM memory reclamation (garbage collection) is taking place

Memory reclamation periodically sweeps through the system to free up memory. When this occurs, all threads are blocked. The larger the amount of memory to be freed up and the larger the JVM heap size, the longer the delay due to memory reclamation.

To confirm this cause of the problem

Monitor CPU usage on your computer. CPU usage drops when memory reclamation is taking place.

Also, start the broker with the following command-line options:

imqbrokerd -vmargs -verbose:gc

The broker's standard output then shows when memory reclamation takes place.
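The -verbose:gc option applies to the broker JVM. For a Java client of your own, such as a test producer, the standard java.lang.management API reports the same activity from inside the process. The sketch below (class name hypothetical) sums the collection counts and accumulated collection times of all collectors in the current JVM.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

/** Reports garbage-collection activity of the JVM this code runs in. */
class GcStats {

    /** Sum of collection counts over all collectors; -1 if any collector does not report one. */
    static long totalCollections() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long count = gc.getCollectionCount();
            if (count < 0) {
                return -1; // this collector does not support the statistic
            }
            total += count;
        }
        return total;
    }

    /** Sum of accumulated collection time in milliseconds; -1 if any collector does not report it. */
    static long totalCollectionMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long millis = gc.getCollectionTime();
            if (millis < 0) {
                return -1; // this collector does not support the statistic
            }
            total += millis;
        }
        return total;
    }
}
```

Sampling totalCollectionMillis() before and after a period of sluggish throughput shows whether collection time grew sharply during it.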

To resolve the problem

On computers with multiple CPUs, have memory reclamation take place in parallel by passing the following option to the broker JVM (for example, after -vmargs):

-XX:+UseParallelGC

The JVM is using the Just-In-Time compiler to speed up performance

To confirm this cause of the problem

Check that none of the other causes of this problem are responsible.

To resolve the problem

Let the system run for a while; performance should improve.


Messages Are Not Reaching Consumers

The symptom of this problem is as follows:

This section explores the following possible causes:

Limit behaviors are causing messages to be deleted on the broker

When the number of messages or number of message bytes in destination memory reaches configured limits, the broker attempts to conserve memory resources. Three of the configurable behaviors taken by the broker when these limits are reached will cause messages to be lost:

As the number of messages or number of message bytes in broker memory reaches configured limits, the broker attempts to conserve memory resources by rejecting the newest messages.

To confirm this cause of the problem

Check the dead message queue, as described under The Dead Message Queue Contains Messages. Specifically, use the instructions under The number of messages, or their sizes, exceed destination limits. Look for the REMOVE_OLDEST or REMOVE_LOW_PRIORITY reason.

To resolve the problem

Increase the destination limits. For example:

imqcmd update dst -n MyDest -o maxNumMsgs=1000
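To choose a new value, you can size the limit from the average message size that imqcmd list dst reports in its Avg Size column (assumed here to be in bytes). The Java sketch below is a rough, hypothetical calculation that divides a memory budget you pick by that average and ignores per-message overhead; for example, a budget of 1,177,000 bytes at an average of 1177.0 bytes allows 1000 messages.

```java
/** Hypothetical helper: rough sizing of a destination message-count limit. */
class LimitSizing {

    /** Largest message count whose payloads fit in the memory budget (overhead ignored). */
    static long maxMessagesFor(long memoryBudgetBytes, double avgMessageBytes) {
        if (avgMessageBytes <= 0) {
            throw new IllegalArgumentException("average size must be positive");
        }
        return (long) Math.floor(memoryBudgetBytes / avgMessageBytes);
    }
}
```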

Message time-out value is expiring

The broker deletes messages whose time-out value has expired. If a destination gets sufficiently backlogged with messages, messages whose time-to-live value is too short might be deleted.
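Whether a backlog is deep enough to expire messages can be estimated by dividing the number of queued messages by the consumption rate, for example the rate reported by imqcmd metrics dst with -m rts. The Java sketch below assumes first-in, first-out delivery and a steady consumption rate; the class name is hypothetical. With 500 queued messages drained at 100 per second, a new message waits about 5 seconds, so a 3-second time-to-live would likely expire.

```java
/** Hypothetical helper: estimates whether queued messages outlive their time-to-live. */
class ExpiryRisk {

    /** Approximate wait in milliseconds for a newly queued message (FIFO, steady rate). */
    static double estimatedWaitMillis(long queuedMessages, double consumedPerSecond) {
        if (consumedPerSecond <= 0) {
            return Double.POSITIVE_INFINITY; // nothing is draining the queue
        }
        return queuedMessages / consumedPerSecond * 1000.0;
    }

    /** True if, at the current rate, a new message would likely expire before delivery. */
    static boolean likelyToExpire(long queuedMessages, double consumedPerSecond,
                                  long timeToLiveMillis) {
        return estimatedWaitMillis(queuedMessages, consumedPerSecond) > timeToLiveMillis;
    }
}
```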

To confirm this cause of the problem

Check the dead message queue to see whether messages are timing out.

Use the QBrowser demo application to look at the DMQ contents. The QBrowser demo is in an operating system-specific location; for the location, see Appendix A, "Operating System-Specific Locations of Message Queue Data" and look in the tables for “Example Applications and Locations.”

This is an example of invocation on Windows:

cd \MessageQueue3\demo\applications\qbrowser
java QBrowser

When the QBrowser main window appears, select the queue name mq.sys.dmq and then click Browse. A list like the following appears.

Figure 12-1  QBrowser Window

QBrowser showing messages for mq.sys.dmq. For each message, there is a number, timestamp, type, mode, and priority.

Double-click a message to display details about that message.

Figure 12-2  QBrowser Message Details

Message details window. Top pane shows message; middle pane shows its properties; bottom pane contains message.

Note whether the JMS_SUN_DMQ_UNDELIVERED_REASON property for messages has the value EXPIRED.

To resolve the problem

Contact the application developers and have them increase the time-to-live value.

Clocks are not synchronized

If clocks are not synchronized, broker calculations of message lifetimes can be wrong, causing messages to exceed their expiration times and be deleted.
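The effect of skew follows from how JMS computes expiration: a message's JMSExpiration is the producer's clock at send time plus the time-to-live, and a time-to-live of 0 means the message never expires. Assuming the broker compares that value with its own clock, a broker clock running d milliseconds ahead of the producer's shortens every message's lifetime by d. The Java sketch below is a simplified, hypothetical model; for example, a 5000 ms time-to-live with the broker 3000 ms ahead leaves only 2000 ms.

```java
/** Hypothetical, simplified model of how clock skew shortens message lifetimes. */
class ClockSkewEffect {

    /** Expiration stamped by the producer: its clock at send time plus the TTL (0 = never). */
    static long expiration(long producerSendTimeMillis, long timeToLiveMillis) {
        return timeToLiveMillis == 0 ? 0 : producerSendTimeMillis + timeToLiveMillis;
    }

    /** Lifetime actually left as measured by a broker whose clock runs ahead by brokerAheadMillis. */
    static long effectiveLifetimeMillis(long timeToLiveMillis, long brokerAheadMillis) {
        return timeToLiveMillis - brokerAheadMillis;
    }

    /** True if the broker, reading its own clock, treats the message as expired. */
    static boolean expiredOnBroker(long expirationMillis, long brokerNowMillis) {
        return expirationMillis != 0 && brokerNowMillis > expirationMillis;
    }
}
```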

To confirm this cause of the problem

In the broker log file, look for any of the following messages: B2102, B2103, B2104. These messages all report that possible clock skew was detected.

To resolve this problem

Check that you are running a time synchronization program, as described in Preparing System Resources.

Consuming client failed to start message delivery on a connection

Messages cannot be delivered until client code establishes a connection and starts message delivery on the connection.

To confirm this cause of the problem

Check that client code establishes a connection and starts message delivery.

To resolve the problem

Rewrite the client code to establish a connection and start message delivery.


The Dead Message Queue Contains Messages

The symptom of this problem is as follows:

This section explores the following possible causes:

The number of messages, or their sizes, exceed destination limits

To confirm this cause of the problem

Use the QBrowser demo application to look at the contents of the dead message queue. The QBrowser demo is in an operating system-specific location; for the location, see Appendix A, "Operating System-Specific Locations of Message Queue Data" and look in the tables for “Example Applications and Locations.”

This is an example of invocation on Windows:

cd \MessageQueue3\demo\applications\qbrowser
java QBrowser

When the QBrowser main window appears, select the queue name mq.sys.dmq and then click Browse. A list like the one shown in Figure 12-1 appears.

Double-click any message to display details about that message. The window shown in Figure 12-2 appears.

Note the values for the following message properties:

Under JMS Headers, note the value for JMSDestination to determine the destination whose messages are becoming dead.

To resolve this problem

Increase the destination limits. For example:

imqcmd update dst -n MyDest -o maxNumMsgs=1000

The broker clock and producer clock are not synchronized

To confirm this cause of the problem

Using the QBrowser application, view the message details for messages in the dead message queue. Check the value for JMS_SUN_DMQ_UNDELIVERED_REASON, looking for messages with the reason EXPIRED.

In the broker log file, look for any of the following messages: B2102, B2103, B2104. These messages all report that possible clock skew was detected.

To resolve this problem

Check that you are running a time synchronization program, as described in Preparing System Resources.

Consumers are not receiving the messages before messages time out

To verify this cause of the problem

Using the QBrowser application, view the message details for messages in the dead message queue. Check the value for JMS_SUN_DMQ_UNDELIVERED_REASON, looking for messages with the reason EXPIRED.

Check to see whether there are any consumers on the destination. For example:

imqcmd query dst -t q -n MyDest

Check the value listed for Current Number of Active Consumers. If there are active consumers, one of the following is true:

To resolve the problem

Request that application developers increase message time-to-live values.

There are too many producers for the number of consumers

To confirm this cause of the problem

Using the QBrowser application, view the message details for messages in the dead message queue. Check the value for JMS_SUN_DMQ_UNDELIVERED_REASON.

If the reason is REMOVE_OLDEST or REMOVE_LOW_PRIORITY, use the imqcmd query dst command to check the number of producers and consumers on the destination. If the number of producers exceeds the number of consumers, production rate might be overwhelming consumption rate.

To resolve the problem

Add more consumer clients or set the destination to use the FLOW_CONTROL limit behavior. The FLOW_CONTROL limit behavior uses consumption rate to control production rate.

Start the flow control behavior by using a command such as the following example:

imqcmd update dst -n myDst -t q -o limitBehavior=FLOW_CONTROL
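As an analogy only (this is not how the broker implements the behavior), a bounded Java BlockingQueue shows the idea behind FLOW_CONTROL: once the queue is full, the producer waits until a consumer frees space, instead of the oldest or lowest-priority messages being discarded. The class and method names below are hypothetical.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

/** Analogy only: a bounded queue whose producer is held back when consumers fall behind. */
class FlowControlAnalogy {
    private final BlockingQueue<String> destination;

    FlowControlAnalogy(int limit) {
        destination = new ArrayBlockingQueue<>(limit);
    }

    /** Blocks the producer until space is free, as flow control slows producers. */
    void send(String message) throws InterruptedException {
        destination.put(message);
    }

    /** Non-blocking attempt: returns false instead of waiting when the limit is reached. */
    boolean trySend(String message) {
        return destination.offer(message);
    }

    /** Removes the oldest message, or returns null if the queue is empty. */
    String tryReceive() {
        return destination.poll();
    }

    int depth() {
        return destination.size();
    }
}
```

With send, the producer can never get more than the limit ahead of the consumers; each consumed message lets exactly one more message in.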

Producers are faster than consumers

To confirm this cause of the problem

To determine whether slow consumers are causing producers to slow down, set the destination limit behavior to FLOW_CONTROL. The FLOW_CONTROL limit behavior uses consumption rate to control production rate.

Start the flow control behavior by using a command such as the following example:

imqcmd update dst -n myDst -t q -o limitBehavior=FLOW_CONTROL

Use metrics to examine the destination input and output, by issuing a command like the following example:

imqcmd metrics dst -n myDst -t q -m rts

In the metrics output, examine the following values:

Because flow control aligns production to consumption, note whether production slows or stops. If the rate slows or stops, there is a discrepancy between the processing speed of producers and consumers.

You can also check the number of unacknowledged (UnAcked) sent messages by using the imqcmd list dst command. If the number of unacknowledged messages is less than the size of the destination, the destination has additional capacity and is being held back by client flow control.

To resolve the problem

If the production rate is consistently faster than the consumption rate, consider using flow control regularly to keep the system aligned.

In addition, consider each of the following possible factors, and use the subsequent sections to resolve them:

A consumer is too slow

To confirm this cause of the problem

Use metrics to determine the rate of production and consumption, as described under Producers are faster than consumers.

To resolve the problem

Try one or more of the following:

Clients are not committing messages

To confirm this cause of the problem

Check with application developers to find out whether the application uses transactions. If the application uses transactions, list the active transactions as follows:

imqcmd list txn

This is an example of the command output:

----------------------------------------------------------------------
Transaction ID       State    User name  # Msgs/# Acks Creation time
----------------------------------------------------------------------
6800151593984248832  STARTED  guest       3/2         7/19/04 11:03:08 AM

Note the numbers of messages and number of acknowledgments.

If the number of messages is high, producers may be sending individual messages but failing to commit transactions. Until the broker receives a commit, it cannot route and deliver the messages for that transaction.

If the number of acknowledgments is high, consumers may be sending acknowledgments for individual messages but failing to commit transactions. Until the broker receives a commit, it cannot remove the acknowledgments for that transaction.
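When many transactions are open, a short parser can apply these two checks to each row. The Java sketch below assumes the column order of the sample output above and a user name without spaces; the class name and the threshold values are hypothetical.

```java
/** Hypothetical parser for one data row of "imqcmd list txn" output. */
class TxnRow {
    final String id;
    final String state;
    final String user;
    final int msgs;
    final int acks;

    private TxnRow(String id, String state, String user, int msgs, int acks) {
        this.id = id;
        this.state = state;
        this.user = user;
        this.msgs = msgs;
        this.acks = acks;
    }

    /** Splits a row on whitespace; the creation-time fields at the end are ignored. */
    static TxnRow parse(String line) {
        String[] fields = line.trim().split("\\s+");
        String[] counts = fields[3].split("/"); // "# Msgs/# Acks", e.g. "3/2"
        return new TxnRow(fields[0], fields[1], fields[2],
                Integer.parseInt(counts[0]), Integer.parseInt(counts[1]));
    }

    /** Many uncommitted messages: producers may not be committing. */
    boolean manyUncommittedMessages(int threshold) {
        return msgs > threshold;
    }

    /** Many uncommitted acknowledgments: consumers may not be committing. */
    boolean manyUncommittedAcks(int threshold) {
        return acks > threshold;
    }
}
```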

To resolve this problem

Contact application developers to fix the coding error.

Consumers are failing to acknowledge messages

To confirm this cause of the problem

Contact application developers to determine whether the application uses system-based acknowledgment or client-based acknowledgment. If the application uses system-based acknowledgment, skip this section.

If the application uses client-based acknowledgment (the CLIENT_ACKNOWLEDGE type), first decrease the number of messages stored on the client. Use a command like the following:

imqcmd update dst -n myDst -t q -o consumerFlowLimit=1

Next, determine whether the broker is buffering messages because a consumer is slow, or whether the consumer processes messages quickly but does not acknowledge them.

List the destination, using the following command:

imqcmd list dst

After you supply a user name and password, output like the following appears:

Listing all the destinations on the broker specified by:

---------------------------------
Host       Primary Port
---------------------------------
localhost  7676

-----------------------------------------------------------------------
Name        Type    State     Producers  Consumers  Msgs
                                                    Total Count  UnAck  Avg Size
-----------------------------------------------------------------------
MyDest      Queue   RUNNING   0          0          5            200    1177.0
mq.sys.dmq  Queue   RUNNING   0          0          35           0      1422.0

Successfully listed destinations.

The UnAck number represents messages that the broker has sent and for which it is waiting for acknowledgment. If the UnAck number is high or increasing, you know that the broker is sending messages, so it is not waiting for a slow consumer. You also know that the consumer is not acknowledging the messages.
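To watch the UnAck number over time, you can parse rows of this listing and compare two listings taken some time apart. The Java sketch below assumes the column order of the sample output above and destination names without spaces; the class and method names are hypothetical.

```java
/** Hypothetical parser for one data row of "imqcmd list dst" output. */
class DstRow {
    final String name;
    final String type;
    final String state;
    final int producers;
    final int consumers;
    final long totalMsgs;
    final long unacked;
    final double avgSize;

    private DstRow(String[] f) {
        name = f[0];
        type = f[1];
        state = f[2];
        producers = Integer.parseInt(f[3]);
        consumers = Integer.parseInt(f[4]);
        totalMsgs = Long.parseLong(f[5]);
        unacked = Long.parseLong(f[6]);
        avgSize = Double.parseDouble(f[7]);
    }

    /** Splits a data row on whitespace into its eight columns. */
    static DstRow parse(String line) {
        String[] fields = line.trim().split("\\s+");
        if (fields.length != 8) {
            throw new IllegalArgumentException("unexpected row: " + line);
        }
        return new DstRow(fields);
    }

    /** UnAck equals the message count: everything was sent and awaits acknowledgment. */
    boolean allSentAwaitingAck() {
        return totalMsgs > 0 && unacked == totalMsgs;
    }

    /** True if the UnAck count grew between two listings of the same destination. */
    static boolean unackedIncreasing(DstRow earlier, DstRow later) {
        return later.unacked > earlier.unacked;
    }
}
```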

To resolve the problem

Contact application developers to fix the coding error.

Durable consumers are inactive

To confirm this cause of the problem

Look at the topic’s durable subscribers, using the following command format:

imqcmd list dur -d topicName

To resolve the problem

An unexpected broker error occurred

To confirm this cause of the problem

Use QBrowser to examine a message, as described under The number of messages, or their sizes, exceed destination limits.

If the value for JMS_SUN_DMQ_UNDELIVERED_REASON is ERROR, a broker error occurred.

To resolve the problem





Part No: 819-0066-10.   Copyright 2005 Sun Microsystems, Inc. All rights reserved.