Sun Directory Server Enterprise Edition 7.0 Troubleshooting Guide

Troubleshooting a Hung or Unresponsive Directory Proxy Server Process

This section describes how to troubleshoot an unresponsive or hung Directory Proxy Server process. A totally unresponsive process is called a hang. The remainder of this section describes how to collect and analyze data about a hang.

Collecting Data About a Directory Proxy Server 7.0 Hang on Solaris

The prstat tool, run with the -L option, tells you the amount of CPU being used by each thread (LWP). If you collect thread stacks using the pstack utility at the same time you run the prstat tool, you can then use the pstack output to see what a thread was doing when it had trouble. If you run the prstat and pstack tools together several times, you can see over time whether the same thread is causing the problem and whether it encounters the problem during the same function call.

To get the process ID of the running Directory Proxy Server, use the jps command. For example, the command is run as follows on Solaris:


# jps
8393 DistributionServerMain
2115 ContainerPrivate
21535 startup.jar
16672 Jps
13953 swupna.jar
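
As a minimal sketch, the PID can be pulled out of the jps output with awk instead of being read by eye. The function name dps_pid_from_jps is hypothetical and not part of the product. The sketch assumes the default jps output format of one "PID NAME" pair per line, as shown above. If several Directory Proxy Server instances are running, it prints one PID per line.

```sh
#!/bin/sh
# Hypothetical helper: print the PID of each JVM that jps reports as
# DistributionServerMain. Reads jps-style output ("PID NAME" per line)
# on standard input, so it can be tried against saved output as well.
dps_pid_from_jps() {
    awk '$2 == "DistributionServerMain" { print $1 }'
}

# Typical use (requires the JDK jps tool on the PATH):
#   DPS_PID=`jps | dps_pid_from_jps`
#   echo "$DPS_PID"
```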

The following script automates the process of running the prstat and pstack tools:


cat scpTools
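#!/bin/sh
# Usage: ./scpTools PID
# Takes 10 snapshots, one second apart, of per-thread CPU usage (prstat -L)
# and thread stacks (pstack) for the given process.

i=0
while [ "$i" -lt "10" ]
do
         echo "$i\n"
         date=`date "+%y%m%d:%H%M%S"`
         # One prstat sample (interval 0, count 1), one line per LWP
         prstat -L -p "$1" 0 1 > /tmp/prstat.$date
         # Stack of every thread, taken at nearly the same moment
         pstack "$1" > /tmp/pstack.$date
         i=`expr $i + 1`
         sleep 1
done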

The value 10 in the [ "$i" -lt "10" ] line can be increased or decreased to cover the period during which the problem you are troubleshooting occurs. This adjustment allows you to collect a full set of process data around the issue.

Collect usage information as follows:


# ./scpTools DPS-PID

DPS-PID is the PID of the unresponsive process. In the jps output, the Directory Proxy Server process is the line that contains DistributionServerMain.
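
Once the snapshots exist, the busiest thread in each one can be picked out mechanically. The following is a rough sketch only. The function name top_lwp is hypothetical. It assumes the usual prstat -L layout, in which each data line has at least ten columns, the CPU percentage is the second-to-last column, and the last column has the form PROCESS/LWPID. Check a real /tmp/prstat.* file before relying on it.

```sh
#!/bin/sh
# Hypothetical helper: read one "prstat -L" snapshot on standard input
# and print the LWP id and CPU percentage of the busiest thread.
# Assumed layout: ... TIME CPU PROCESS/LWPID, for example
#   8393 root 312M 201M cpu1 0 0 0:12:41 24% java/27
top_lwp() {
    awk '
        $1 == "PID" || $1 == "Total:" { next }   # skip header and summary lines
        NF >= 10 {
            cpu = $(NF - 1) + 0                  # "24%" converts to the number 24
            if (found == 0 || cpu > max) {
                max = cpu
                n = split($NF, part, "/")        # "java/27" gives LWP id 27
                line = part[n] " " $(NF - 1)
                found = 1
            }
        }
        END { if (found) print line }
    '
}

# Typical use, one result line per snapshot:
#   for f in /tmp/prstat.*; do
#       printf '%s ' "$f"; top_lwp < "$f"
#   done
```

If the same LWP id tops several consecutive snapshots, look up that LWP in the pstack file with the same timestamp. The pstack output typically groups stack frames under a header line that names the lwp number, which shows what that thread was executing at the time.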

On Solaris and other UNIX platforms, show the system calls that occur during the hang by using the truss command as follows:


truss -o /tmp/trace.txt -ealf -rall -wall -vall -p 21362

The value 21362 corresponds to the PID of the unresponsive process.