Gathering Debug Data for Sun Java System Calendar Server

To Collect Debug Data on a Hung or Unresponsive Calendar Server Process

A process hang occurs when one of the Calendar Server processes stops responding to requests while the process is still running locally. Calendar Server's six specific processes are csadmind, cshttpd, csdwpd, csnotifyd, csstored, and enpd.


Caution –

Calendar Server processes usually hang because of an orphan lock left in one of the databases. Stopping the server (especially the csstored process) and cleaning the temporary shared database files helps to resolve the problem. This task is described in Step 10, at the end of the following procedure.

If you are sure that the hung process is a fleeting and insignificant issue, and you do not need help from the Sun Support Center, you can go to Step 10 now. Otherwise, do not stop and restart Calendar Server until you have gathered the data requested in the following procedure. Stopping and restarting the server destroys all the debug data related to the hung process.


Before You Begin

Make sure that you collect all the data over the same time frame in which the problem occurs. See 1.6 Configuring Solaris OS to Generate Core Files if a core file is not generated.

On Solaris systems, you can easily gather the required data by running the cscapture command in Invasive mode.

./cscapture -i

For more information about running cscapture and its Invasive operation, see Using Calendar Server Capture (cscapture) to Collect Debug Data for Sun Java System Calendar Server.

The cscapture command gathers netstat, ps, swap, gcore, pkg_app, and other data. After you run cscapture on a hung or unresponsive process, proceed to Step 10, below, which describes how to restart the calendar services.

For all other platforms, collect the following information for process hang problems. Run the commands in order when the problem occurs. Be sure to specify when the process hang occurred and, if possible, which processes were affected.

  1. Collect the general system information as explained in To Collect Required Debug Data for Any Calendar Server Problem.

  2. Specify the time the hang occurred and, if possible, the process that was hung.

  3. Run the netstat command and save the output.

    HP-UX and Linux

    netstat -an | grep calendar-service-port

    Windows

    netstat -an
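One way to save the output together with the time it was collected is sketched below for HP-UX and Linux. The `capture` and `netstat_for_port` helpers and the `OUT_DIR` location are illustrative assumptions, not part of Calendar Server.

```shell
#!/bin/sh
# Hypothetical helpers: OUT_DIR, capture, and netstat_for_port are
# illustrative names, not Calendar Server tools.
OUT_DIR="${OUT_DIR:-/tmp/cal-debug}"   # assumed output location

# Run a command and save its output to OUT_DIR/LABEL.out, preceded by
# the collection time and the command line that produced it.
capture() {
    label="$1"; shift
    mkdir -p "$OUT_DIR" || return 1
    out="$OUT_DIR/$label.out"
    {
        echo "# collected: $(date '+%Y-%m-%d %H:%M:%S %Z')"
        echo "# command: $*"
        "$@" 2>&1
    } > "$out"
    echo "$out"   # print the path of the saved file
}

# Keep only the netstat lines that mention the given port.
netstat_for_port() {
    netstat -an | grep "$1"
}
```

For example, `capture netstat netstat_for_port calendar-service-port` would write the filtered output to `$OUT_DIR/netstat.out`.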

  4. Run the following commands and save the output.

    HP-UX

    ps -ef | grep cal-svr-base
    vmstat 5 5
    iostat -x
    top
    sar

    Linux

    ps -aux | grep cal-svr-base
    vmstat 5 5
    top
    uptime
    sar

    Windows

    Obtain the CALENDAR process PID: C:\windbg-root>tlist.exe

    Obtain process details of the CALENDAR running process PID: C:\windbg-root>tlist.exe calendar-pid


    Note –

    To use the preceding commands on Windows systems, you need to install the debugging tools, available from the following URL:

    http://www.microsoft.com/whdc/devtools/debugging/default.mspx

    Install the latest version of the debugging tools and OS symbols for your version of Windows.

    You also must add the environment variable "_NT_SYMBOL_PATH".
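The Linux commands in this step can be run from one script so that every output ends up in a single directory. The following is a minimal sketch under some assumptions: the function names and the output directory are hypothetical, `top` is run in batch mode (`-b -n 1`) because the interactive form does not suit a script, and tools that are not installed are noted rather than treated as errors.

```shell
#!/bin/sh
# Hypothetical helpers for the Linux commands in this step.

# Save the output of one command to DIR/NAME.out, or record that the
# tool is not installed.
save_if_present() {
    dir="$1"; name="$2"; shift 2
    mkdir -p "$dir" || return 1
    if command -v "$1" >/dev/null 2>&1; then
        "$@" > "$dir/$name.out" 2>&1
    else
        echo "$1: not installed, skipped" > "$dir/$name.out"
    fi
}

# Collect process and system statistics; CAL_BASE stands for cal-svr-base.
collect_linux_stats() {
    stats_dir="$1"; cal_base="$2"
    mkdir -p "$stats_dir" || return 1
    ps aux | grep "$cal_base" | grep -v grep > "$stats_dir/ps.out"
    save_if_present "$stats_dir" vmstat vmstat 5 5
    save_if_present "$stats_dir" top top -b -n 1
    save_if_present "$stats_dir" uptime uptime
    save_if_present "$stats_dir" sar sar
}
```

A call such as `collect_linux_stats /tmp/cal-debug cal-svr-base` takes about 20 seconds, most of it spent in `vmstat 5 5`.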


  5. Get the swap information.

    HP-UX

    swapinfo

    Linux

    free

    Windows

    Already provided in C:\report.txt as described in To Collect Required Debug Data for Any Calendar Server Problem.

  6. Get the Calendar Server process log files.

    The logfile.logdir parameter in the ics.conf file specifies the locations of these log files.

    On Solaris systems, the default value of the path is /var/opt/SUNWics5/logs.

    Each process uses its own log file, as shown in the following list:

    csadmind

    ics.conf setting: logfile.admin.logname

    Default value: admin.log

    cshttpd

    ics.conf setting: logfile.http.logname

    Default value: http.log

    csdwpd

    ics.conf setting: logfile.dwp.logname

    Default value: dwp.log

    csnotifyd

    ics.conf setting: logfile.notify.logname

    Default value: notify.log

    csstored

    ics.conf setting: logfile.store.logname

    Default value: store.log

    watcher (Communications Suite 5 release only)

    ics.conf: No setting; file is always called watcher.log.

    Default value: watcher.log

    The enpd process does not have a log.
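The lookup described above can be scripted. The sketch below assumes that ics.conf entries have the form `name = "value"` and that logfile.logdir holds an absolute path. The `ics_get` and `log_path` helpers are hypothetical, and the fallback directory is the Solaris default mentioned in this step.

```shell
#!/bin/sh
# Hypothetical helpers; assumes ics.conf lines look like: name = "value"

# Print the value of KEY from an ics.conf-style file (last match wins).
ics_get() {
    # Escape the dots in the key so they match literally.
    key=$(printf '%s' "$2" | sed 's/\./\\./g')
    sed -n "s/^[[:space:]]*$key[[:space:]]*=[[:space:]]*\"\{0,1\}\([^\"]*\)\"\{0,1\}[[:space:]]*\$/\1/p" "$1" | tail -n 1
}

# Print the full path of one log file, falling back to the default
# directory and the default file name when a setting is absent.
log_path() {
    conf="$1"; name_key="$2"; default_name="$3"
    log_dir=$(ics_get "$conf" logfile.logdir)
    log_name=$(ics_get "$conf" "$name_key")
    echo "${log_dir:-/var/opt/SUNWics5/logs}/${log_name:-$default_name}"
}
```

For example, `log_path ics.conf logfile.http.logname http.log` prints the location of the cshttpd log.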

  7. Look for any core file that could have been dumped by one of the Calendar Server processes. If you find one, see To Collect Debug Data on a Calendar Server Crashed Process.

  8. Get the output of the following command.

    HP-UX

    tusc -v -fealT -rall -wall -o /tmp/calendar-process-name-calendar-pid.tusc.out -p calendar-pid

    Linux

    strace -fv -o /tmp/calendar-process-name-calendar-pid.strace.out -p calendar-pid

    Windows

    Use DebugView: http://www.sysinternals.com/Utilities/DebugView.html


    Note –

    Wait one minute after launching the appropriate command (truss, strace, tusc, or DebugView) then stop it by pressing Control-C in the terminal where you launched the command.
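On Linux, the one-minute wait and the Control-C can be automated. This sketch assumes the GNU coreutils `timeout` command is available. The `trace_outfile` and `trace_for` helpers are hypothetical names.

```shell
#!/bin/sh
# Hypothetical helpers; assumes GNU coreutils timeout is installed.

# Build the output file name used in this step.
trace_outfile() {
    echo "/tmp/$1-$2.strace.out"
}

# Attach strace to PID for SECS seconds (default 60), then send SIGINT,
# which is the scripted equivalent of pressing Control-C.
trace_for() {
    name="$1"; pid="$2"; secs="${3:-60}"
    timeout -s INT "$secs" strace -fv -o "$(trace_outfile "$name" "$pid")" -p "$pid"
}
```

A call such as `trace_for calendar-process-name calendar-pid` returns after about a minute. The exit status 124 from `timeout` only indicates that the time limit was reached.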


  9. Get core files and the output of the following commands.

    In a process hang situation, it is helpful to compare several core files to review the state of the threads over time. To avoid overwriting a core file, copy it to a new name, wait approximately one minute, then rerun the following commands. Do this three times to obtain three core files.


    Note –

    For HP-UX, you need the following two patches to use the gcore command: PHKL_31876 and PHCO_32173. If you cannot install these patches, use the HP-UX /opt/langtools/bin/gdb command, version 3.2 or later, or the dumpcore command.


    HP-UX

    # cd cal-svr-base/cal/bin 
    # gcore -p calendar-pid
    (gdb) attach calendar-pid
    Attaching to process calendar-pid
    No executable file name was specified
    (gdb) dumpcore
    Dumping core to the core file core.calendar-pid
    (gdb) quit
    The program is running. Quit anyway (and detach it)? (y or n) y
    Detaching from program: , process calendar-pid
    
    Linux

    # cd cal-svr-base/cal/bin
    # gdb
    (gdb) attach calendar-pid
    Attaching to process calendar-pid
    No executable file name was specified
    (gdb) gcore
    Saved corefile core.calendar-pid
    
    (gdb) backtrace
    (gdb) quit
    
    Windows

    Get the CALENDAR process PID:

    C:\windbg-root>tlist.exe

    Generate a crash dump on the CALENDAR running process PID:

    C:\windbg-root>adplus.vbs -hang -p calendar-pid -o C:\crashdump_dir


    Note –

    For Windows, provide the complete generated folder under C:\crashdump_dir.
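On Linux, the three snapshots can be taken in a loop that gives each core file a unique name, so nothing is overwritten. This is a sketch only: it assumes the stand-alone `gcore` script that ships with gdb, which writes `<prefix>.<pid>`. `snapshot_cores`, `dump_with_gcore`, and the `DUMP_CMD` hook are hypothetical names.

```shell
#!/bin/sh
# Hypothetical helpers; assumes the gcore script shipped with gdb.

# Default dump command: writes the core file <prefix>.<pid>.
dump_with_gcore() {
    gcore -o "$2" "$1"
}

# Take COUNT snapshots of PID, INTERVAL seconds apart. DUMP_CMD may name
# another command, which is called as: DUMP_CMD <pid> <output-prefix>.
snapshot_cores() {
    pid="$1"; count="${2:-3}"; interval="${3:-60}"
    i=1
    while [ "$i" -le "$count" ]; do
        "${DUMP_CMD:-dump_with_gcore}" "$pid" "core.$(date '+%Y%m%d-%H%M%S').$i" || return 1
        # Wait between snapshots, but not after the last one.
        [ "$i" -lt "$count" ] && sleep "$interval"
        i=$((i + 1))
    done
}
```

Running `snapshot_cores calendar-pid` from the directory where the core files should be stored produces three files roughly one minute apart.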


  10. When you have collected all debug data, perform the following steps to restore the service.

    1. Stop Calendar Server.

      cd cal-svr-base/cal/sbin

      ./stop-cal

    2. Make sure that all Calendar Server processes stopped.

      Wait one minute, then kill any remaining processes.

    3. Clean the temporary shared database files.

      cd caldb.berkeleydb.homedir.path;
      rm __db.00*

      where caldb.berkeleydb.homedir.path is the path of the database, which is specified in the caldb.berkeleydb.homedir.path parameter in the ics.conf file.

    4. Restart Calendar Server.

      cd cal-svr-base/cal/sbin

      ./start-cal

    5. After restarting the services, check the logs for any unexpected behavior.
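The process check and the cleanup in the substeps above can be wrapped in two small functions that make mistakes less likely, for example by refusing to run `rm` when the database directory does not exist. Both function names are hypothetical, and the directory argument stands for the value of caldb.berkeleydb.homedir.path.

```shell
#!/bin/sh
# Hypothetical helpers for the restore procedure.

# List Calendar Server processes still running under CAL_BASE.
remaining_cal_processes() {
    ps -ef | grep "$1" | grep -v grep
}

# Remove only the temporary shared database files (__db.00*) from DB_DIR.
clean_db_region_files() {
    db_dir="$1"
    if [ ! -d "$db_dir" ]; then
        echo "no such directory: $db_dir" >&2
        return 1
    fi
    rm -f "$db_dir"/__db.00*
}
```

After `./stop-cal` and a one-minute wait, `remaining_cal_processes cal-svr-base` shows whether anything still needs to be killed. Run `clean_db_region_files` with the database path before restarting the server.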