Oracle Fusion Middleware Troubleshooting Guide for Oracle Directory Server Enterprise Edition

Analyzing Data About an Unresponsive Process: An Example

For example, you try running an ldapsearch on your Directory Server as follows:


# ldapsearch -p 5389 -D "cn=Directory Manager" -w secret \
-b "o=test" description=*

Suppose this command runs for 40 seconds without returning any results. To analyze why the process is unresponsive, first get the process ID using the following command:


# ps -aef | grep slapd | grep slapd-server1
   mares 15013 24159  0 13:06:20 pts/32   0:00 grep slapd-server1
   mares 14993     1  1 13:05:36 ?        0:04 ./ns-slapd -D
/local/dsInst -i /local/dsInst

Next, rerun the search, and while it is running, run the prstat and pstack commands simultaneously against the Directory Server process, which in the output above has process ID 14993.


prstat -L -p 14993 0 1 > prstat.output ; pstack 14993 > pstack.output

We rerun the commands three times, with an interval of two seconds between consecutive runs.
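If you prefer to capture all three samples in one step, a small shell loop like the following can do it. This is only a sketch: the output file names (prstat.1, pstack.1, and so on) and the sleep between samples are illustrative assumptions, not part of the documented procedure.


for i in 1 2 3; do
    prstat -L -p 14993 0 1 > prstat.$i
    pstack 14993 > pstack.$i
    sleep 2
done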

The output of the first prstat command appears as follows:


   PID USERNAME  SIZE   RSS STATE  PRI NICE      TIME  CPU PROCESS/LWPID      
 14993 mares     128M  110M cpu0    59    0   0:00.02 3.0% ns-slapd/51
 14993 mares     128M  110M sleep   59    0   0:00.49 1.3% ns-slapd/32
 14993 mares     128M  110M sleep   59    0   0:00.00 0.0% ns-slapd/16
 14993 mares     128M  110M sleep   59    0   0:00.00 0.0% ns-slapd/15
 14993 mares     128M  110M sleep   59    0   0:00.00 0.0% ns-slapd/14
 14993 mares     128M  110M sleep   59    0   0:00.00 0.0% ns-slapd/13
 14993 mares     128M  110M sleep   59    0   0:00.00 0.0% ns-slapd/12
 14993 mares     128M  110M sleep   59    0   0:00.00 0.0% ns-slapd/11
 14993 mares     128M  110M sleep   59    0   0:00.00 0.0% ns-slapd/10
 14993 mares     128M  110M sleep   59    0   0:00.00 0.0% ns-slapd/9
 14993 mares     128M  110M sleep   59    0   0:00.00 0.0% ns-slapd/8
 14993 mares     128M  110M sleep   59    0   0:00.00 0.0% ns-slapd/6
 14993 mares     128M  110M sleep   59    0   0:00.00 0.0% ns-slapd/5
 14993 mares     128M  110M sleep   59    0   0:00.00 0.0% ns-slapd/4
 14993 mares     128M  110M sleep   59    0   0:00.00 0.0% ns-slapd/3
Total: 1 processes, 51 lwps, load averages: 0.36, 0.29, 0.17

The problem appears to be occurring in thread 51. Next, we look for thread 51 in the output of the first pstack command, where it appears as follows:


-----------------  lwp# 51 / thread# 51  --------------------
 ffffffff7eb55a78 ???????? (1, 102183a10, ffffffff70c1d340, 1001c5390, 0, 
ffffffff7ecea248)
 ffffffff77925fe0 id2entry (1002b7610, 1a09, 0, ffffffff70c1e7f4, 0, ffffffff77a6faa8) 
+ 3e8
 ffffffff7795ed20 ldbm_back_next_search_entry_ext (101cfcb90, 10190fd60, 0, 101b877b0, 
1a08, 45b4aa34) + 300
 ffffffff7ebaf6f8 ???????? (101cfcb90, 1002b7610, 1, ffffffff70c1eaf4, 0, 0)
 ffffffff7ebafbc4 ???????? (101cfcb90, 1, ffffffff70c1eaf4, 0, 10190fd60, 
ffffffff70c1e980)
 ffffffff7ebaf170 op_shared_search (101cfcb90, 0, 1015ad240, 0, ffffffffffffffff, 
ffffffff7ecea248) + 8c0
 ffffffff7e92efcc search_core_pb (101cfcb90, 2, 1000, 4000, ffffffff7ea4c810, 
ffffffff7ea56088) + 6c4
 ffffffff7e93a710 dispatch_operation_core_pb (101cfcb90, 101cfcb90, c00, 
ffffffff7ea4c810, 0, d10) + cc
 ffffffff7e926420 ???????? (101f3fe80, 102fd3250, 2, 63, 2, 200000)
 ffffffff7e92672c ldap_frontend_main_using_core_api (101f3fe80, 102fd3250, 2, 
101da1218, 10133db10, 0) + fc
 ffffffff7e927764 ???????? (220, 101c97310, ffffffffffffffff, 800, 958, 101f3fe80)
 ffffffff7d036a7c _pt_root (101c97310, ffffffff70b00000, 0, 0, 20000, ffffffff70c1ff48)
 + d4
 ffffffff7c1173bc _lwp_start (0, 0, 0, 0, 0, 0)

Note –

The ends of the lines in this example have been wrapped so that they fit on the page.


The output of the second and third pstack commands shows the same results, with thread 51 performing the same type of operation.

All three pstack outputs, taken at two-second intervals, show thread 51 performing the same search operation. The first parameter of the op_shared_search function contains the address of the operation taking place, which is 101cfcb90. The same operation occurs in each of the three stacks, meaning that the same search is still in progress during the four seconds that elapsed between the first and the last pstack run. Moreover, the prstat output consistently shows thread 51 as the thread consuming the most CPU.
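A quick way to confirm that the same operation address appears in every sample is to grep for the op_shared_search frame across the saved pstack output files. The file names below assume the pstack.N naming used in the loop sketch above; substitute whatever names you used.


grep op_shared_search pstack.1 pstack.2 pstack.3

If the first argument shown for op_shared_search is identical in all three files, the server is still processing the same search.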

If you check the access log for the result of the search operation at the time the hang was observed, you find that it is caused by a search on the unindexed description attribute. Creating an index for the description attribute avoids this hang.
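In Directory Server Enterprise Edition access logs, unindexed searches are flagged with notes=U on the RESULT line, and an index for an attribute can be added with the dsconf command. The following is a minimal sketch that assumes the instance path /local/dsInst shown in the ps output, the default access log location, and the o=test suffix and port 5389 from the example search; dsconf prompts for the Directory Manager password and may require additional connection options in your deployment.


grep "notes=U" /local/dsInst/logs/access

dsconf create-index -p 5389 "o=test" description
dsconf reindex -p 5389 -t description "o=test"

After the index is created and the suffix is reindexed, rerun the original ldapsearch to confirm that it now returns promptly.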