14 Troubleshooting Common Problems

Learn the guidelines to prevent Oracle WebLogic Server cluster problems or troubleshoot them if they occur.

For more information about troubleshooting IP multicast configuration problems, see Troubleshooting Multicast Configuration.

Before You Start the Cluster

Perform the following checks before you start the cluster to prevent problems.

Check the Server Version Numbers

During steady-state operation, all servers in the cluster should be at the same maintenance level, meaning:
  • Same major and minor version number
  • Same Patch Set number
  • Same Patch Set Update number
  • Same Interim/One-off Patches

Rolling upgrade (applying maintenance to servers sequentially within a cluster) is supported in WebLogic Server for applying the following:
  • Interim/One-off Patches
  • Patch Set Updates (PSUs)
  • WebLogic Server 10.3.x Patch Sets, for example, a rolling upgrade from WebLogic Server 10.3.5 to 10.3.6.

The cluster's Administration Server is typically not configured as a cluster member, but it should generally run at the same maintenance level as the Managed Servers. There may be situations where the Administration Server manages multiple clusters within a single domain, which may be at different maintenance levels. In this case, the Administration Server should be at the highest maintenance level of the Managed Servers within the domain.
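
The Administration Server rule can be checked mechanically. The following Python sketch is illustrative only: the function names parse_version and admin_server_level_ok are hypothetical, and it assumes that maintenance levels are available as plain dotted numeric strings such as 10.3.6, without patch identifiers. It treats an Administration Server at or above the highest Managed Server level as acceptable.

```python
def parse_version(version):
    """Turn a dotted version string such as '10.3.6' into a tuple of ints."""
    return tuple(int(part) for part in version.strip().split("."))


def admin_server_level_ok(admin_version, managed_versions):
    """Return True if the Administration Server is at least at the highest
    maintenance level found among the Managed Servers (hypothetical helper)."""
    highest = max(parse_version(v) for v in managed_versions)
    return parse_version(admin_version) >= highest
```

For example, admin_server_level_ok("10.3.5", ["10.3.5", "10.3.6"]) returns False, because one Managed Server is at a higher level than the Administration Server.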

Check the Multicast Address

A problem with the multicast address is one of the most common reasons a cluster does not start or a server fails to join a cluster.

A multicast address is required for each cluster. The multicast address can be an IP address between 224.0.0.0 and 239.255.255.255, or a host name that resolves to an IP address within that range.
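
As a quick sanity check of the range, the following Python sketch uses the standard ipaddress module. The function name is_valid_multicast_address is hypothetical, and the sketch only handles literal IPv4 addresses; it does not resolve host names.

```python
import ipaddress


def is_valid_multicast_address(address):
    """Return True if address is a literal IPv4 address in the range
    224.0.0.0 through 239.255.255.255 (hypothetical helper)."""
    try:
        ip = ipaddress.IPv4Address(address)
    except ipaddress.AddressValueError:
        # Not a literal IPv4 address (for example, a host name).
        return False
    return ip.is_multicast
```

With this sketch, "239.192.0.0" is accepted, while "240.0.0.1" and "192.168.0.101" are rejected.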

On the Configuration > Multicast page in the WebLogic Server Administration Console, check the cluster's multicast address and port.

For each cluster on a network, the combination of multicast address and port must be unique. If two clusters on a network use the same multicast address, they should use different ports. If the clusters use different multicast addresses, they can use the same port or accept the default port, 7001.

Before starting the cluster, make sure the cluster's multicast address and port are correct and do not conflict with the multicast address and port of any other clusters on the network.
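
If you keep an inventory of the clusters on a network, the uniqueness rule is easy to verify. The Python sketch below is an illustration under stated assumptions: the function name find_multicast_conflicts, the cluster names, and the inventory format (a dictionary of cluster name to address and port) are all hypothetical and are not part of WebLogic Server.

```python
from collections import defaultdict


def find_multicast_conflicts(clusters):
    """Map each (address, port) pair used by more than one cluster to the
    names of the clusters that share it (hypothetical helper)."""
    users = defaultdict(list)
    for name, (address, port) in clusters.items():
        users[(address, port)].append(name)
    return {pair: names for pair, names in users.items() if len(names) > 1}


# Hypothetical inventory: clusterA and clusterB collide, clusterC does not
# because it uses a different port with the same address.
clusters = {
    "clusterA": ("239.192.0.1", 7001),
    "clusterB": ("239.192.0.1", 7001),
    "clusterC": ("239.192.0.1", 7002),
}
conflicts = find_multicast_conflicts(clusters)
```

An empty result means that every cluster in the inventory has a unique combination of multicast address and port.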

The errors you are most likely to see if the multicast address is bad are:

Unable to create a multicast socket for clustering
Multicast socket send error
Multicast socket receive error

Check the CLASSPATH Value

Make sure the value of CLASSPATH is the same in all Managed Servers in the cluster. CLASSPATH is set by the setEnv script, which you run before the startManagedWebLogic script to start the Managed Servers.

By default, setEnv sets this value for CLASSPATH (as represented on Windows systems):

set WL_HOME=C:\bea\wlserver_10.00
set JAVA_HOME=C:\bea\jdk131
.
.
set CLASSPATH=%JAVA_HOME%\lib\tools.jar;
	%WL_HOME%\server\lib\weblogic_sp.jar;
	%WL_HOME%\server\lib\weblogic.jar;
	%CLASSPATH%

If you change the value of CLASSPATH in one Managed Server, or change how setEnv sets CLASSPATH, you must change it in all Managed Servers in the cluster.
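
One way to confirm that the values match is to collect the effective CLASSPATH from each Managed Server and compare them. The Python sketch below assumes you have already gathered those strings yourself; the function name classpath_mismatches and the server names are hypothetical. Entries are compared in order, because the order of entries on a class path affects which classes are loaded.

```python
def classpath_mismatches(classpaths, separator=";"):
    """Return the names of servers whose CLASSPATH entries differ from those
    of the first server listed (hypothetical helper).

    classpaths maps a server name to its CLASSPATH string. Use ';' as the
    separator on Windows and ':' on UNIX.
    """
    def entries(value):
        # Ignore empty entries and surrounding whitespace.
        return [e.strip() for e in value.split(separator) if e.strip()]

    names = list(classpaths)
    if not names:
        return []
    reference = entries(classpaths[names[0]])
    return [n for n in names[1:] if entries(classpaths[n]) != reference]
```

Any server name in the returned list has a CLASSPATH that differs from the first server and should be corrected before the cluster is started.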

After You Start the Cluster

After you start a cluster, you can troubleshoot problems using the following methods.

Check Your Commands

If the cluster fails to start, or a server fails to join the cluster, the first step is to check any commands you have entered, such as startManagedWebLogic or a java interpreter command, for errors and misspellings.

Generate a Log File

Before contacting Oracle for help with cluster-related problems, collect diagnostic information. The most important information is a log file with multiple thread dumps from a Managed Server. The log file is especially important for diagnosing cluster freezes and deadlocks.

Note:

A log file that contains multiple thread dumps is a prerequisite for diagnosing your problem.

To create the log file:
  1. Stop the server.
  2. Remove or back up any log files you currently have. You should create a new log file each time you boot a server, rather than appending to an existing log file.
  3. Start the server with this command, which turns on verbose garbage collection and redirects both the standard error and standard output to a log file (the 2>&1 syntax assumes a Bourne-compatible shell such as sh or bash):
    % java -ms64m -mx64m -verbose:gc -classpath $CLASSPATH \
    	-Dweblogic.domain=mydomain -Dweblogic.Name=clusterServer1 \
    	-Djava.security.policy==$WL_HOME/lib/weblogic.policy \
    	-Dweblogic.admin.host=192.168.0.101:7001 \
    	 weblogic.Server >> logfile.txt 2>&1
    

    Redirecting both standard error and standard output places thread dump information in the proper context with server informational and error messages and provides a more useful log.

  4. Continue to run the cluster until you reproduce the problem.
  5. If a server hangs, use kill -3 or <Ctrl>-<Break> to create the necessary thread dumps to diagnose your problem. Make sure to do this several times on each server, spaced about 5-10 seconds apart, to help diagnose deadlocks.
  6. Compress the log file using the following UNIX command, or zip it using a Windows utility.
    % tar czf logfile.tar.gz logfile.txt
    
  7. Attach the compressed log file to an e-mail to your Oracle Support representative. Do not cut and paste the log file into the body of an e-mail.
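
The repeated kill -3 requests in step 5 can be scripted. The following Python sketch is one possible approach on UNIX and Linux, not an Oracle-provided tool: the function name request_thread_dumps and its parameters are hypothetical, and you must supply the PID of the server's JVM process. The send and sleep parameters exist only so that the function can be exercised without signalling a real process.

```python
import os
import signal
import time

# kill -3 sends SIGQUIT, which is signal number 3 on Linux and most UNIX systems.
SIGQUIT = getattr(signal, "SIGQUIT", 3)


def request_thread_dumps(pid, count=3, interval_seconds=5.0,
                         send=os.kill, sleep=time.sleep):
    """Send SIGQUIT to pid `count` times, pausing between signals.

    The JVM writes each thread dump to its own standard output, so the
    dumps appear in the redirected server log file, not in this script's
    output. Returns the number of signals sent.
    """
    for attempt in range(count):
        send(pid, SIGQUIT)
        if attempt < count - 1:
            # Space the dumps apart so that successive dumps can be compared.
            sleep(interval_seconds)
    return count
```

For example, request_thread_dumps(pid, count=3, interval_seconds=5.0) requests three dumps about five seconds apart, which falls within the spacing recommended in step 5.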

Getting an Oracle HotSpot VM Thread Dump

If you use the Oracle HotSpot VM, use one of the following methods to generate a thread dump:

  • Use the WLST threadDump command.

  • Use the jstack utility.

  • If you are using the Oracle HotSpot VM in Linux, use kill -3 PID, where PID is the root of the process tree.

    To obtain the root PID, run the following command:

    ps -efHl | grep 'java'

    For the grep argument, use a string that appears in the process stack and matches the server startup command. The first PID reported is the root process, assuming that the ps output has not been piped to another routine.

    In Linux, each execute thread appears as a separate process under the Linux process stack. To use kill -3 on Linux, the PID you supply must match the PID of the main WebLogic execute thread. Otherwise, no thread dump is produced.

  • If you are using the Oracle HotSpot VM in Windows, you can use the Ctrl-Break command on the application console to generate a thread dump.
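
The idea of the root of the process tree on Linux can be sketched in Python. This is a simplified illustration, not an Oracle utility: the function name find_root_java_pid is hypothetical, and it assumes you have already extracted (PID, parent PID, command) tuples from the ps output yourself. It returns the first process whose command contains the marker string while its parent's command does not.

```python
def find_root_java_pid(processes, marker="java"):
    """Return the PID of the first process whose command contains marker
    and whose parent's command does not, or None if there is no match.

    processes is an iterable of (pid, ppid, command) tuples in the order
    reported by ps (hypothetical input format).
    """
    processes = list(processes)
    commands = {pid: command for pid, ppid, command in processes}
    for pid, ppid, command in processes:
        if marker in command and marker not in commands.get(ppid, ""):
            return pid
    return None


# Hypothetical process listing: PID 200 is the root of the java process tree.
sample = [
    (1, 0, "init"),
    (100, 1, "/bin/sh startManagedWebLogic.sh"),
    (200, 100, "java weblogic.Server"),
    (201, 200, "java weblogic.Server"),
    (202, 200, "java weblogic.Server"),
]
root_pid = find_root_java_pid(sample)
```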

Check Garbage Collection

If you are experiencing cluster problems, you should also check the garbage collection on the Managed Servers. If garbage collection is taking too long, the servers will not be able to make the frequent heartbeat signals that tell the other cluster members they are running and available.

If garbage collection (either first or second generation) is taking 10 or more seconds, you need to tune heap allocation (the -ms and -mx parameters) on your system.
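
To find long collections in a log file produced with -verbose:gc, you can scan it for pause times. The Python sketch below is a rough aid under a stated assumption: it expects the classic HotSpot line format shown in the comment, which varies between JVM versions and options, so adjust the pattern to match your own log. The names GC_LINE and long_gc_pauses are hypothetical.

```python
import re

# Assumed classic -verbose:gc line format, for example:
#   [Full GC 267628K->83769K(776768K), 1.8479984 secs]
GC_LINE = re.compile(
    r"\[(?P<kind>Full GC|GC)\b.*?(?P<secs>\d+(?:\.\d+)?) secs\]"
)


def long_gc_pauses(log_lines, threshold_seconds=10.0):
    """Return (kind, seconds) for each collection at or above the threshold."""
    pauses = []
    for line in log_lines:
        match = GC_LINE.search(line)
        if match and float(match.group("secs")) >= threshold_seconds:
            pauses.append((match.group("kind"), float(match.group("secs"))))
    return pauses
```

Passing the lines of logfile.txt to long_gc_pauses lists every collection that took 10 seconds or longer; a non-empty result suggests that heap tuning is needed.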

Run utils.MulticastTest

You can verify that multicast is working by running utils.MulticastTest from one of the Managed Servers. For more information, see Using the Oracle WebLogic Server Java Utilities in Command Reference for Oracle WebLogic Server.