Troubleshooting a Crash

This section describes how to begin troubleshooting a crashed Directory Server process. It describes possible causes of a crash, what pieces of information you need to collect to help identify the problem, and how to analyze the information you collect.

Possible Causes of a Crash

A crash could be caused by one or more of the following:

Buffer overflows
Out of resources, such as memory, disk, or file descriptors
Memory allocation problems, such as double frees or freeing unallocated memory
NULL de-referencing
Other programmatic errors

If a Directory Server process crashes, you need to open a service request with the Sun Support Center.

Collecting Data About a Crash

This section describes the data you need to collect when the server crashes. The most critical data to collect is the core file.

Note - If you contact the Sun Support Center about a crashed Directory Server process, you must provide a core file and logs.

Generating a Core File

Core files and crash dumps are generated when a process or application terminates abnormally. You must configure your system to allow Directory Server to generate a core file if the server crashes. The core file contains a snapshot of the Directory Server process at the time of the crash, and can be indispensable in determining what led to the crash. Core files are written to the same directory as the errors log, which by default is instance-path/logs/. Core files can be quite large, as they include the entry cache.

If a core file was not generated automatically, you can configure your operating system to allow core dumping by using the commands described in the following table and then waiting for the next crash to retrieve the data.

Operating System	Commands
Solaris	coreadm, ulimit -c unlimited, ulimit -H -c unlimited
Linux	ulimit -c unlimited, ulimit -H -c unlimited
HP-UX/AIX	ulimit -c
Windows	Windows crashdump

For example, on Solaris OS, you enable applications to generate core files using the following command:

# coreadm -g /path-to-file/%f.%n.%p.core -e global -e process \
 -e global-setid -e proc-setid -e log

The path-to-file specifies the full path to the core file you want to generate. The file will be named using the executable file name (%f), the system node name (%n), and the process ID (%p).
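
To preview the file name that such a pattern yields, the substitution can be sketched in a few lines of shell. The function name expand_core_pattern is hypothetical and is not part of coreadm. The sketch handles only the three tokens described above and assumes that the values contain no characters special to sed.

```shell
# Hypothetical helper: preview the file name that a coreadm pattern would
# produce. Only the %f, %n, and %p tokens described above are handled.
expand_core_pattern() {
    pattern=$1   # for example /var/cores/%f.%n.%p.core
    exe=$2       # executable file name (%f)
    node=$3      # system node name (%n), as printed by uname -n
    pid=$4       # process ID (%p)
    printf '%s\n' "$pattern" |
        sed -e "s|%f|$exe|g" -e "s|%n|$node|g" -e "s|%p|$pid|g"
}
```

For example, expand_core_pattern '/var/cores/%f.%n.%p.core' ns-slapd host1 18301 prints /var/cores/ns-slapd.host1.18301.core. The host name host1 is made up for the illustration.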

If after enabling core file generation your system still does not create a core file, you may need to change the file-size writing limits set by your operating system. Use the ulimit command to change the maximum core file size and maximum stack segment size as follows:

# ulimit -c unlimited 
# ulimit -s unlimited

Check that the limits are set correctly using the -a option as follows:

# ulimit -a
time(seconds)        unlimited
file(blocks)         unlimited
data(kbytes)         unlimited
stack(kbytes)        unlimited
coredump(blocks)     unlimited
nofiles(descriptors) 256
vmemory(kbytes)      unlimited
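
The same check can be scripted so that it runs before the server starts. The following is a minimal sketch, not a product tool. The function name core_limit_ok is hypothetical, and the sketch inspects only the core file size limit of the current shell.

```shell
# Hypothetical helper: report whether the current shell allows core files
# of any size. Returns 0 when the limit is "unlimited", 1 otherwise.
core_limit_ok() {
    limit=$(ulimit -c)
    if [ "$limit" = "unlimited" ]; then
        echo "core file size: unlimited"
        return 0
    fi
    echo "core file size limited to $limit blocks; run: ulimit -c unlimited"
    return 1
}
```

Run core_limit_ok in the same shell, and as the same user, that starts Directory Server, because limits are inherited by child processes.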

For information about configuring core file generation on Red Hat Linux and Windows, see the respective operating system documentation.

Next, verify that applications can generate core files by using the kill -11 process-id command, which sends signal 11 (SIGSEGV) to the process. The core files should be generated in either the specified directory or in the default instance-name/logs directory.

# cd /var/cores
# sleep 100000 &
[1] process-id
# kill -11 process-id
# ls
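
The manual test above can be wrapped in a small function. This is a sketch under one assumption: shells such as bash and dash report an exit status of 128 plus the signal number for a process killed by a signal, so 139 means the process was killed by signal 11. Other shells might report a different value. The function name crash_test is hypothetical.

```shell
# Sketch: crash a throwaway process with signal 11 and report how it ended.
# Assumption: the shell reports 128 + signal number (139 for signal 11).
crash_test() {
    sleep 100000 &
    pid=$!
    kill -11 "$pid"
    wait "$pid"
    status=$?
    echo "process $pid exited with status $status"
    [ "$status" -eq 139 ]
}
```

After crash_test returns, list the core directory as shown above to confirm that a core file was actually written. The exit status alone shows only that the signal was delivered.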

Getting the Core and Shared Libraries

Get all the libraries and binaries associated with the slapd process for core file analysis. Collect the libraries using the pkgapp script. The pkgapp script packages an executable and all of its shared libraries into one compressed tar file. You provide the process ID of the application and, optionally, the name of the core file to be opened. For more information about the pkgapp script, see Using the pkgapp Script on Solaris.

As superuser, run the pkgapp script as follows:

# pkgapp server-pid core-file

Note - You can also run the pkgapp script without a core file. This reduces the size of the script's output. You must then set the variable to the correct location of the core file later.

Additional Information

To look at the log files created at the time the problem occurred, check the following files:

# instance-name/logs/errors*
# instance-name/logs/access*
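
To send these logs along with the core file, they can be bundled into one archive. The following is a minimal sketch that assumes both errors and access files exist in the logs directory. The function name collect_logs is hypothetical and is unrelated to the pkgapp script.

```shell
# Hypothetical helper: bundle the errors and access logs of one instance.
collect_logs() {
    instance=$1   # instance directory, for example /local/dsInst
    archive=$2    # absolute path of the tar file to create
    # Run tar from inside the logs directory so that the archive holds
    # plain file names instead of full paths.
    ( cd "$instance/logs" && tar cf "$archive" errors* access* )
}
```

For example, collect_logs /local/dsInst /var/tmp/ds-logs.tar creates one archive that contains the current and rotated log files.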

If the crash is related to the operating system running out of disk or memory, retrieve the system logs. For example, on Solaris OS check the /var/adm/messages file and the /var/log/syslog file for hardware or memory failures.

To get complete version output, use the following command:

# dsadm -V

Analyzing Crash Data

When Directory Server crashes on a system that is configured for core dumping, it generates a core file. With this core file and the process stack that you obtain from it by using the binary in the ns-slapd binary directory, you can analyze the problem.

This section describes how to analyze the core file crash data on a Solaris OS.

Examining a Core File on Solaris

Once you have obtained a core file, run the pstack and pmap Solaris utilities on the file. The pmap utility shows the process map, which includes a list of virtual addresses, where the dynamic libraries are loaded, and where the variables are declared. The pstack utility shows the process stack. For each thread in the process, it describes the exact stack of instructions the thread was executing at the moment when the process died or when the pstack command was executed.

# pstack core-file

# pmap core-file

If the pstack output contains almost no useful information, every line in the output looks like the following, with question marks in place of the function name:

0002c3cc ???????? (1354ea0, f3400, 1354ea0, 868, 2fc, 1353ff8)

In this case, make sure to run pstack on the machine where the core file was generated.
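
A quick way to judge whether a saved pstack output is usable is to count the frames whose function name was not resolved. The sketch below assumes that each frame line starts with a hexadecimal address followed by the function name, as in the line shown above. The function name count_unresolved is hypothetical.

```shell
# Hypothetical helper: count how many stack frames pstack could not resolve.
# Reads pstack output on standard input; frame lines start with a hex address.
count_unresolved() {
    awk '
        $1 ~ /^[0-9a-f]+$/ && NF > 1 {
            total++
            if ($2 == "????????") unresolved++
        }
        END { printf "%d of %d frames unresolved\n", unresolved, total }
    '
}
```

For example, pstack core-file | count_unresolved prints a one-line summary. A result in which nearly all frames are unresolved suggests that pstack was run on a different machine from the one that produced the core file.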

You can also use the mdb command instead of the pstack command to display the stack of the core. Run the mdb command as follows:

# mdb $path-to-executable $path-to-core
$C to show the core stack
$q to quit

The output of the mdb and pstack commands provides helpful information about the process stack at the time of the crash. The mdb $C command output identifies the exact thread that caused the crash.

On Solaris 9, the first thread of the pstack output is often the thread responsible for the crash. On Solaris 10, use mdb to find the crashing thread or, if using the pstack command, analyze the stack by looking for threads that do not contain lwp_park, poll, or pollsys.
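
When a process has many threads, this filtering can be sketched with awk. The sketch assumes that each thread in the pstack output starts with a dashed header line containing lwp, as in the example stack in this section, and that waiting threads show a frame named lwp_park, poll, or pollsys, possibly with leading underscores. The function name busy_threads is hypothetical.

```shell
# Sketch: print only the threads whose stacks contain none of the waiting
# calls (lwp_park, poll, pollsys). Reads pstack output on standard input.
busy_threads() {
    awk '
        # A dashed header line starts a new thread: print the previous
        # thread if it was not waiting, then reset the state.
        /^-+ +lwp/ {
            if (block != "" && !idle) printf "%s", block
            block = ""; idle = 0; inthread = 1
        }
        inthread { block = block $0 "\n" }
        $2 ~ /^_*(lwp_park|poll|pollsys)$/ { idle = 1 }
        END { if (block != "" && !idle) printf "%s", block }
    '
}
```

For example, pstack core-file | busy_threads leaves only the threads that were doing work at the time of the crash, which are the candidates to examine first.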

For example, the following core process stack occurs during the call of a plug-in function:

core '/local/dsInst/logs/core' of 18301:    ./ns-slapd \
-D /local/dsInst -i /local/dsInst
-----------------  lwp# 13 / thread# 25  --------------------
 ff2b3148 strlen   (0, fde599fb, 0, fbed1, 706d2d75, fde488a8) + 1c
 ff307ef8 sprintf  (7fffffff, fde488a0, fde599d8, fde599ec, 706d2d75, fde599fc) \
+ 3c
 fde47cf8 ???????? (1354ea0, 850338, fde59260, e50243, 923098, 302e3800) + f8
 fde429cc ???????? (1354ea0, 3, 440298, 154290, 345c10, 154290) + 614
 ff164018 plugin_call_exop_plugins (1354ea0, 8462a0, d0c, ff1e7c70, ff202a94, \
1353ff8) + d0
 0002c3cc ???????? (1354ea0, f3400, 1354ea0, 868, 2fc, 1353ff8)
 00025e08 ???????? (0, 1353ff8, fdd02a68, f3400, f3000, fbc00)
 fef47d18 _pt_root (362298, fe003d10, 0, 5, 1, fe401000) + a4
 fed5b728 _thread_start (362298, 0, 0, 0, 0, 0) + 40

When analyzing process stacks from cores, concentrate on the function calls in the middle of the thread. The calls at the bottom are too general and the calls at the top are too specific. The calls in the middle of the thread are specific to Directory Server and can thus help you identify at which point during processing the operation failed. In the above example, the plugin_call_exop_plugins call indicates a problem calling an extended operation in the custom plug-in.
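
To pick out candidate function names from a thread, the resolved frames can be listed in call order. This is a minimal sketch that assumes the frame layout shown in the example above. The function name named_frames is hypothetical.

```shell
# Hypothetical helper: list the resolved function names in pstack output,
# in the order they appear (most recent call first).
named_frames() {
    awk '$1 ~ /^[0-9a-f]+$/ && NF > 1 && $2 != "????????" { print $2 }'
}
```

Applied to the example stack, the list contains strlen, sprintf, plugin_call_exop_plugins, _pt_root, and _thread_start. The Directory Server call in the middle, plugin_call_exop_plugins, is the one worth investigating.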

If the problem is related to the Directory Server, you can use the function call that seems like the most likely cause of the problem to search on SunSolve for known problems associated with this function call. SunSolve is located at http://sunsolve.sun.com/.

If you do locate a problem related to the one you are experiencing, confirm that it applies to the version of Directory Server that you are running. To get information about the version you are running, use the following command:

# dsadm -V

If after doing a basic analysis of your core files you cannot identify the problem, collect the binaries and libraries using the pkgapp script and contact the Sun Support Center.

	Oracle Directory Server Enterprise Edition Troubleshooting Guide 11g Release 1 (11.1.1.5.0)