A process hang occurs when one of the Calendar Server processes stops responding to requests while the process is still running locally. Calendar Server has six processes:
enpd—Collects and dispatches events that occur to properties of resources (calendars).
csadmind—Provides a single point of authentication for administering Calendar Server. This service also manages alarm notifications and group scheduling requests.
csnotifyd—Sends notifications of events and tasks and subscribes to alarm events. When an alarm event occurs, this service sends SMTP message reminders to recipients.
cshttpd—Listens for and receives HTTP commands from Calendar Server end users and returns calendar data.
csdwpd—Distributes calendar databases across multiple back-end servers.
csstored—When configured properly, creates automatic backups of the calendar database.
Calendar Server processes usually hang because of an orphan lock left in one of the databases. Stopping the server (especially the csstored process) and cleaning the temporary shared database files usually resolves the problem. This task is described in Step 13, at the end of the following procedure.
If you are sure that the hung process is a fleeting and insignificant issue, and you do not need help from the Sun Support Center, you can go to Step 13 now. Otherwise, do not stop and restart Calendar Server until you have gathered the data requested in the following procedure. Stopping and restarting the server destroys all the debug data related to the hung process.
Make sure that you collect all the data over the same time frame in which the problem occurs. See 1.6 Configuring Solaris OS to Generate Core Files if a core file is not generated.
On Solaris systems, you can easily gather the required data by running the cscapture command in Invasive mode.
For more information about running cscapture and its Invasive operation, see Using Calendar Server Capture (cscapture) to Collect Debug Data for Sun Java System Calendar Server.
The cscapture command gathers netstat, ps, swap, gcore, pkg_app, and other data. After you run cscapture on a hung or unresponsive process, proceed to Step 10, below, which describes how to restart the calendar services.
For all other platforms, collect the following information for process hang problems. Run the commands in order when the problem occurs. Be sure to specify the time when the process hang happened and affected processes, if possible.
Collect the general system information as explained in To Collect Required Debug Data for Any Calendar Server Problem.
Specify the time the hang occurred and, if possible, the process that was hung.
Run the netstat command and save the output.
netstat -an | grep calendar-service-port
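The netstat step can be sketched as a pair of small shell helpers. This is only an illustration: the function names filter_port and stamped_name are hypothetical, and the pattern assumes that netstat prints addresses as address.port (Solaris style) or address:port (Linux style).

```shell
#!/bin/sh
# Hypothetical helper: keep only the netstat lines whose address ends in the
# given port. Matching ".PORT" or ":PORT" followed by whitespace (or end of
# line) avoids false hits such as port 8080 when you ask for port 80.
filter_port() {
    grep -E "[.:]$1([[:space:]]|\$)"
}

# Hypothetical helper: build an output file name that records when the
# data was collected, for example netstat.20240101-120000.out.
stamped_name() {
    echo "$1.$(date +%Y%m%d-%H%M%S).out"
}

# Example (not executed here); replace 80 with your calendar service port:
#   netstat -an | filter_port 80 > "/tmp/$(stamped_name netstat)"
```

Putting the collection time in the file name makes it easier to confirm later that all the data covers the same time frame.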
Run the following commands and save the output.
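With System V-style ps options (for example, on Solaris OS or HP-UX):
ps -ef | grep cal-svr-base
vmstat 5 5
iostat -x
top
sar
With BSD-style ps options (for example, on Linux):
ps -aux | grep cal-svr-base
vmstat 5 5
top
uptime
sar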
Obtain the CALENDAR process PID: C:\windbg-root>tlist.exe
Obtain process details of the CALENDAR running process PID: C:\windbg-root>tlist.exe calendar-pid
To use the preceding commands on Windows systems, you need to install the debugging tools, available from the following URL:
Install the latest version of the debugging tools and OS symbols for your version of Windows.
You must also set the _NT_SYMBOL_PATH environment variable.
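On UNIX-like systems, running the commands in order and saving each output can be sketched with a small wrapper. The function name capture and the /tmp/hang directory are hypothetical; the wrapper only records the command line, the start time, and whatever the command prints.

```shell
#!/bin/sh
# capture LABEL DIR COMMAND [ARGS...]
# Hypothetical helper: run COMMAND, save its output (stdout and stderr) in a
# timestamped file under DIR, and print the name of that file.
capture() {
    label=$1
    dir=$2
    shift 2
    out="$dir/$label.$(date +%Y%m%d-%H%M%S).out"
    {
        echo "# command: $*"
        echo "# started: $(date)"
        # Record a non-zero exit status instead of aborting the collection.
        "$@" 2>&1 || echo "# exit status: $?"
    } > "$out"
    echo "$out"
}

# Example (not executed here); pipelines need an explicit sh -c:
#   mkdir -p /tmp/hang
#   capture ps     /tmp/hang sh -c 'ps -ef | grep cal-svr-base'
#   capture vmstat /tmp/hang vmstat 5 5
```

Interactive tools such as top may need a batch or single-iteration option on your platform before their output can be redirected to a file.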
Get the swap information.
On Windows systems, this information is already provided in C:\report.txt, as described in To Collect Required Debug Data for Any Calendar Server Problem.
Get the Calendar Server process log files.
The logfile.logdir parameter in the ics.conf file specifies the directory that contains these log files.
On Solaris systems, the default value of the path is /var/opt/SUNWics5/logs.
Each process uses its own log file, as shown in the following list:
ics.conf setting: logfile.admin.logname
Default value: admin.log
ics.conf setting: logfile.http.logname
Default value: http.log
ics.conf setting: logfile.dwp.logname
Default value: dwp.log
ics.conf setting: logfile.notify.logname
Default value: notify.log
ics.conf setting: logfile.store.logname
Default value: store.log
ics.conf: No setting; file is always called watcher.log.
Default value: watcher.log
The enpd process does not have a log.
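If you want to locate the log files from a script, the settings above can be read from ics.conf with a helper such as the following. This is a sketch under two assumptions that you should check against your own file: parameters appear one per line in the form name = "value", and when a parameter appears more than once the last occurrence is the one to use. The function name ics_conf_get is hypothetical.

```shell
#!/bin/sh
# ics_conf_get FILE NAME
# Hypothetical helper: print the value of parameter NAME from FILE, assuming
# lines of the form   name = "value"   (the quotes are optional here).
ics_conf_get() {
    # Escape the dots in the parameter name so that they match literally.
    key_re=$(printf '%s' "$2" | sed 's/[.]/\\./g')
    sed -n "s/^[[:space:]]*${key_re}[[:space:]]*=[[:space:]]*\"\\{0,1\\}\\([^\"]*\\)\"\\{0,1\\}[[:space:]]*\$/\\1/p" "$1" | tail -n 1
}

# Example (not executed here); the path to ics.conf depends on your installation:
#   logdir=$(ics_conf_get /path/to/ics.conf logfile.logdir)
#   adminlog=$(ics_conf_get /path/to/ics.conf logfile.admin.logname)
#   ls -l "$logdir/$adminlog"
```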
Look for any core file that could have been dumped by one of the Calendar Server processes. If you find one, see To Collect Debug Data on a Calendar Server Crashed Process.
Get the output of the following command for your platform.
tusc -v -fealT -rall -wall -o /tmp/calendar-process-name-calendar-pid.tusc.out -p calendar-pid
strace -fv -o /tmp/calendar-process-name-calendar-pid.strace.out -p calendar-pid
Use DebugView: http://www.sysinternals.com/Utilities/DebugView.html
Wait one minute after launching the appropriate command (truss, strace, tusc, or DebugView), then stop it by pressing Control-C in the terminal where you launched the command.
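When the trace has to be collected from a script rather than a terminal, the "run for one minute, then stop" step can be sketched as follows. The function name run_for is hypothetical. Note one deliberate difference from pressing Control-C: a non-interactive shell starts background commands with SIGINT ignored, so this sketch sends SIGTERM instead. Verify that your tracing tool detaches cleanly from the traced process on SIGTERM before relying on it.

```shell
#!/bin/sh
# run_for SECONDS COMMAND [ARGS...]
# Hypothetical helper: start COMMAND in the background, let it run for
# SECONDS, then ask it to stop and wait until it has exited.
run_for() {
    secs=$1
    shift
    "$@" &
    tracer_pid=$!
    sleep "$secs"
    # SIGTERM rather than SIGINT: background jobs of a non-interactive
    # shell are started with SIGINT ignored.
    kill -TERM "$tracer_pid" 2>/dev/null || true
    wait "$tracer_pid" 2>/dev/null || true
}

# Example (not executed here), using the strace command line shown above:
#   run_for 60 strace -fv -o /tmp/calendar-process-name-calendar-pid.strace.out -p calendar-pid
```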
Get core files and the output of the following commands.
In a process hang situation, it is helpful to compare several core files to review the state of the threads over time. To avoid overwriting a core file, copy that core file to a new name, wait approximately one minute, then rerun the following commands. Do this three times to obtain three core files.
For HP-UX, you need the following two patches to use the gcore command: PHKL_31876 and PHCO_32173. If you cannot install these patches, use the HP-UX /opt/langtools/bin/gdb command, version 3.2 or later, or the dumpcore command.
# cd cal-svr-base/cal/bin
# gcore -p calendar-pid
(gdb) attach calendar-pid
Attaching to process calendar-pid
No executable file name was specified
(gdb) dumpcore
Dumping core to the core file core.calendar-pid
(gdb) quit
The program is running. Quit anyway (and detach it)? (y or n) y
Detaching from program: , process calendar-pid
# cd cal-svr-base/cal/bin
# gdb
(gdb) attach calendar-pid
Attaching to process calendar-pid
No executable file name was specified
(gdb) gcore
Saved corefile core.calendar-pid
(gdb) backtrace
(gdb) quit
Get the CALENDAR process PID:
Generate a crash dump on the CALENDAR running process PID:
C:\windbg-root>adplus.vbs -hang -p calendar-pid -o C:\crashdump_dir
For Windows, provide the complete generated folder under C:\crashdump_dir.
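On UNIX-like platforms, the three-snapshot procedure described in this step (dump, rename, wait, repeat) can be sketched as a loop. The function name snapshot_cores is hypothetical, and the sketch assumes that the dump command you pass in writes a file named core.PID in the current directory. Check how gcore names its output on your platform.

```shell
#!/bin/sh
# snapshot_cores PID COUNT INTERVAL COMMAND [ARGS...]
# Hypothetical helper: run COMMAND COUNT times, INTERVAL seconds apart.
# After each run, rename core.PID to core.PID.N so that the next run does
# not overwrite it. Stops with status 1 if no core file was produced.
snapshot_cores() {
    pid=$1
    count=$2
    interval=$3
    shift 3
    i=1
    while [ "$i" -le "$count" ]; do
        "$@" || return 1
        [ -f "core.$pid" ] || return 1
        mv "core.$pid" "core.$pid.$i"
        if [ "$i" -lt "$count" ]; then
            sleep "$interval"
        fi
        i=$((i + 1))
    done
}

# Example (not executed here): three core files, one minute apart.
#   cd cal-svr-base/cal/bin
#   snapshot_cores calendar-pid 3 60 gcore calendar-pid
```

Numbering the files in collection order lets you compare the thread states from the first snapshot to the last.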
When you have collected all debug data, perform the following steps to restore the service.
Stop Calendar Server.
Make sure that all Calendar Server processes have stopped.
Wait one minute, then kill any remaining processes.
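The wait-then-kill step can be sketched as follows for a single process ID. The function name wait_then_kill is hypothetical. Obtain the process IDs of any remaining Calendar Server processes first, for example from the ps output described earlier in this procedure.

```shell
#!/bin/sh
# wait_then_kill PID TIMEOUT
# Hypothetical helper: give process PID up to TIMEOUT seconds to exit on its
# own. If it is still present after that, send SIGKILL. Prints "exited" or
# "killed" to report what happened.
wait_then_kill() {
    pid=$1
    timeout=$2
    waited=0
    # kill -0 sends no signal; it only tests whether the process exists.
    while kill -0 "$pid" 2>/dev/null && [ "$waited" -lt "$timeout" ]; do
        sleep 1
        waited=$((waited + 1))
    done
    if kill -0 "$pid" 2>/dev/null; then
        kill -KILL "$pid"
        echo killed
    else
        echo exited
    fi
}

# Example (not executed here): allow one minute before killing.
#   wait_then_kill calendar-pid 60
```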
Clean the temporary shared database files.
cd caldb.berkeleydb.homedir.path && rm __db.00*
where caldb.berkeleydb.homedir.path is the path of the database, which is specified in the caldb.berkeleydb.homedir.path parameter in the ics.conf file.
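A slightly more defensive version of this cleanup can be sketched as a function that refuses to run when the directory does not exist, so that rm can never act on the wrong working directory. The function name clean_db_region_files is hypothetical, and the sketch assumes that Calendar Server has already been stopped.

```shell
#!/bin/sh
# clean_db_region_files DIR
# Hypothetical helper: remove the temporary shared database files
# (__db.00*) from DIR and leave every other file untouched.
# Returns 1 without deleting anything if DIR is not a directory.
clean_db_region_files() {
    dir=$1
    if [ ! -d "$dir" ]; then
        echo "not a directory: $dir" >&2
        return 1
    fi
    # -f keeps rm quiet when no __db.00* files are present.
    rm -f "$dir"/__db.00*
}

# Example (not executed here); use the value of
# caldb.berkeleydb.homedir.path from your ics.conf file:
#   clean_db_region_files caldb.berkeleydb.homedir.path
```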
Restart Calendar Server.
After restarting the services, check the logs for any unexpected behavior.