Gathering Debug Data for Sun Java System Calendar Server

1.5 What Calendar Server Debug Data Should You Collect?

This section describes the kinds of debug data that you need to provide based on the kind of problem you are experiencing.

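This section contains the following tasks:

  • To Collect Required Debug Data for Any Calendar Server Problem

  • To Collect Debug Data on Calendar Server Installation Problems

  • To Collect Debug Data on a Hung or Unresponsive Calendar Server Process

  • To Collect Debug Data on a Calendar Server Crashed Process

  • To Collect Debug Data on a Calendar Server Database Problem

  • To Collect Debug Data on a Calendar Server CLD/DWP Problem

  • To Collect Debug Data on a Calendar Express or Communications Express Interface Problem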

Procedure: To Collect Required Debug Data for Any Calendar Server Problem

All problems described in this technical note need basic information collected about when the problem occurred and about the system having the problem. Use this task to collect that basic information.

  1. Note the day(s) and time(s) the problem occurred.

  2. Provide a graphical representation of your deployment. Include all hosts with their host names, IP addresses, operating system versions, and the roles they perform, as well as other important systems such as load balancers and firewalls. Also include the following information:

    • Calendar Server topology. For example, describe front-end and back-end Calendar servers, if you have them.

    • LDAP Directory topology. Describe your Directory Server deployment features such as replicated directories. Are any caches enabled?

    • Calendar Server features. Describe other Calendar Server features you have configured. These include, at a minimum, virtual domains, SSL, SSO, and LDAP cache.

  3. Note the operating system.

    Solaris OS

    uname -a

    HP-UX

    uname -r

    Linux

    more /etc/redhat-release

    Windows

    "C:\Program Files\Common Files\Microsoft Shared\MSInfo\msinfo32.exe" /report C:\report.txt

  4. Note the patch level.

    Solaris OS

    showrev -p | grep SUNWics5

    HP-UX

    swlist

    Linux

    rpm -qa

    Windows

    Already provided in the C:\report.txt file above.


    Note –

    If possible, also provide explorer information (SUNWexplo) of the machine having the problem. Edit the ics.conf file and increase the log level by using the logfile.loglevel = "Debug" parameter.


  5. Note the version of Calendar Server.

    Be sure to send the entire screen output of the command.

    UNIX and Linux

    cd cal-svr-base
    ./cal/bin/csstart version

    Windows

    cd cal-svr-base
    cal\bin\csstart.exe version

  6. Create a tar file of the Calendar Server configuration directory.

    UNIX and Linux

    Create a tar file of the cal-svr-base/cal/bin/config directory.

    Windows

    Create a tar file of the cal-svr-base\cal\bin\config directory.


    Note –

    Always be sure to include the ics.conf file.


  7. Get the start, stop, http, admin, and notifyd processes log files.

    The logfile.logdir parameter in the ics.conf file specifies the paths of these log files.


    Note –

    If possible, provide only the relevant extracts of the log files for the time period in which the problem occurred, with enough context to show what else was happening during the error and shortly before it. For relatively short log files, send the entire file. For large log files from long-running processes, an extract is more appropriate, but be sure to include everything logged at the time of the error and at least some of the logging that precedes it.


  8. Get the Access, Errors, and Audit logs from the Directory Server used by Calendar Server.

    The paths of these files are specified by the following parameters in the Directory Server dse.ldif file:

    • nsslapd-accesslog

    • nsslapd-errorlog

    • nsslapd-auditlog

    UNIX and Linux

    server-root/slapd-identifier/logs/access

    server-root/slapd-identifier/logs/errors

    server-root/slapd-identifier/logs/audit (if enabled)

    Windows

    server-root\slapd-identifier\logs\access

    server-root\slapd-identifier\logs\errors

    server-root\slapd-identifier\logs\audit (if enabled)
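
The ics.conf lookups in steps 6 and 7 above can be scripted on UNIX and Linux. The following is a minimal sketch, not part of the product. The ics_conf_get helper is a hypothetical name, and the sketch assumes that ics.conf holds one name = "value" pair per line, that lines starting with ! are comments, and that the last occurrence of a duplicated parameter is the one in effect. Check these assumptions against your own file before relying on the output.

```sh
# Print the value of one parameter from an ics.conf-style file.
# $1 = path to ics.conf, $2 = parameter name (for example logfile.logdir)
# Assumptions: one name = "value" pair per line, "!" starts a comment,
# and the last occurrence of a duplicated parameter wins.
ics_conf_get() {
    awk -v key="$2" '
        /^[ \t]*!/ { next }                       # skip comment lines
        {
            eq = index($0, "=")
            if (eq == 0) next                     # not a name = value line
            name = substr($0, 1, eq - 1)
            value = substr($0, eq + 1)
            gsub(/^[ \t]+|[ \t]+$/, "", name)     # trim surrounding blanks
            gsub(/^[ \t]+|[ \t]+$/, "", value)
            gsub(/^"|"$/, "", value)              # drop surrounding quotes
            if (name == key) found = value
        }
        END { if (found != "") print found }
    ' "$1"
}

# Example use (cal-svr-base is a placeholder for your installation path):
#   logdir=$(ics_conf_get cal-svr-base/cal/bin/config/ics.conf logfile.logdir)
#   tar cf /tmp/cal-config.tar cal-svr-base/cal/bin/config
```

The helper prints nothing when the parameter is absent, so an empty result means you should fall back to the default locations described in this document.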

Procedure: To Collect Debug Data on Calendar Server Installation Problems

Follow these steps if you are unable to complete the installation or if you get a “failed” installation status for Calendar Server.

  1. Consult the following troubleshooting information.

    Read the “Troubleshooting” chapter in the Sun Java Communications Suite Installation Guide or, for older versions of Calendar Server, in the Sun Java Enterprise System Installation Guide.

    If the problem persists after using this troubleshooting information, then continue with this procedure to collect the necessary data for the Sun Support Center.

  2. Collect the general system information as explained in To Collect Required Debug Data for Any Calendar Server Problem.

  3. Specify if this is a first-time installation or a Hot Fix installation on a pre-existing Calendar Server instance.

  4. Get the installation logs.

    Solaris OS

    /var/sadm/install/logs

    The log file names start with Java_Enterprise_System*_install.Bdatetime, where date and time correspond to the failed installation (for example, B12161532).

    HP-UX and Linux

    /var/opt/sun/install/logs

    The log file names start with Java_Enterprise_System*_install.Bdatetime, where date and time correspond to the failed installation (for example, B12161532).

    Windows

    C:\Documents and Settings\current-user\Local Settings\Temp

    The log file names match CSI*.log (usually a text file). The asterisk (*) represents a random number assigned in the Temp directory for each CSI-based setup.
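
On Solaris, HP-UX, and Linux, the installation logs named in step 4 can be located with a short shell helper. This is only a sketch: list_install_logs is a hypothetical name, and the file name pattern is taken directly from the description above rather than from any installer documentation.

```sh
# List installer log files in a directory.
# $1 = log directory, for example /var/sadm/install/logs (Solaris)
#      or /var/opt/sun/install/logs (HP-UX and Linux)
list_install_logs() {
    for f in "$1"/Java_Enterprise_System*_install.B*; do
        # When nothing matches, the pattern stays literal, so test for existence.
        if [ -e "$f" ]; then
            printf '%s\n' "$f"
        fi
    done
    return 0
}

# Example use:
#   list_install_logs /var/sadm/install/logs
```

Pick the files whose B suffix matches the date and time of the failed installation.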

Procedure: To Collect Debug Data on a Hung or Unresponsive Calendar Server Process

A process hang is defined as one of the Calendar Server processes not responding to requests anymore while the process is still running locally. Calendar Server's six specific processes are:


Caution –

Calendar Server processes usually hang because of an orphan lock left in one of the databases. Stopping the server (especially the csstored process) and cleaning the temporary shared database files helps to resolve the problem. This task is described in Step 10, at the end of the following procedure.

If you are sure that the hung process is a fleeting and insignificant issue, and you do not need help from the Sun Support Center, you can go to Step 10 now. Otherwise, do not stop and restart Calendar Server until you have gathered the data requested in the following procedure. Stopping and restarting the server destroys all the debug data related to the hung process.


Before You Begin

Make sure that you collect all the data over the same time frame in which the problem occurs. See 1.6 Configuring Solaris OS to Generate Core Files if a core file is not generated.

On Solaris systems, you can easily gather the required data by running the cscapture command in Invasive mode.

./cscapture -i

For more information about running cscapture and its Invasive operation, see Using Calendar Server Capture (cscapture) to Collect Debug Data for Sun Java System Calendar Server.

The cscapture command gathers netstat, ps, swap, gcore, pkg_app, and other data. After you run cscapture on a hung or unresponsive process, proceed to Step 10, below, which describes how to restart the calendar services.

For all other platforms, collect the following information for process hang problems. Run the commands in order when the problem occurs. Be sure to specify the time when the process hang happened and affected processes, if possible.

  1. Collect the general system information as explained in To Collect Required Debug Data for Any Calendar Server Problem.

  2. Specify the time the hang occurred and, if possible, the process that was hung.

  3. Run the netstat command and save the output.

    HP-UX and Linux

    netstat -an | grep calendar-service-port

    Windows

    netstat -an

  4. Run the following commands and save the output.

    HP-UX

    ps -ef | grep cal-svr-base
    vmstat 5 5
    iostat -x
    top
    sar

    Linux

    ps -aux | grep cal-svr-base
    vmstat 5 5
    top
    uptime
    sar

    Windows

    Obtain the CALENDAR process PID: C:\windbg-root>tlist.exe

    Obtain process details of the CALENDAR running process PID: C:\windbg-root>tlist.exe calendar-pid


    Note –

    To use the preceding commands on Windows systems, you need to install the debugging tools, available from the following URL:

    http://www.microsoft.com/whdc/devtools/debugging/default.mspx

    Install the latest version of the debugging tools and OS symbols for your version of Windows.

    You also must add the environment variable "_NT_SYMBOL_PATH".


  5. Get the swap information.

    HP-UX

    swapinfo

    Linux

    free

    Windows

    Already provided in C:\report.txt as described in To Collect Required Debug Data for Any Calendar Server Problem.

  6. Get the Calendar Server process log files.

    The logfile.logdir parameter in the ics.conf file specifies the locations of these log files.

    On Solaris systems, the default value of the path is /var/opt/SUNWics5/logs.

    Each process uses its own log file, as shown in the following list:

    csadmind

    ics.conf setting: logfile.admin.logname

    Default value: admin.log

    cshttpd

    ics.conf setting: logfile.http.logname

    Default value: http.log

    csdwpd

    ics.conf setting: logfile.dwp.logname

    Default value: dwp.log

    csnotifyd

    ics.conf setting: logfile.notify.logname

    Default value: notify.log

    csstored

    ics.conf setting: logfile.store.logname

    Default value: store.log

    watcher (Communications Suite 5 release only)

    ics.conf: No setting; file is always called watcher.log.

    Default value: watcher.log

    The enpd process does not have a log.

  7. Look for any core file that could have been dumped by one of the Calendar Server processes. If you find one, see To Collect Debug Data on a Calendar Server Crashed Process.

  8. Get the output of the following command.

    HP-UX

    tusc -v -fealT -rall -wall -o /tmp/calendar-process-name-calendar-pid.tusc.out -p calendar-pid

    Linux

    strace -fv -o /tmp/calendar-process-name-calendar-pid.strace.out -p calendar-pid

    Windows

    Use DebugView: http://www.sysinternals.com/Utilities/DebugView.html


    Note –

    Wait one minute after launching the appropriate command (truss, strace, tusc, or DebugView) then stop it by pressing Control-C in the terminal where you launched the command.


  9. Get core files and the output of the following commands.

    In a process hang situation, it is helpful to compare several core files to review the state of the threads over time. To avoid overwriting a core file, copy it to a new name, wait approximately one minute, then rerun the following commands. Do this three times to obtain three core files.


    Note –

    For HP-UX, you need the following two patches to use the gcore command: PHKL_31876 and PHCO_32173. If you cannot install these patches, use the HP-UX /opt/langtools/bin/gdb command from version 3.2 and later, or the dumpcore command.


    HP-UX

    # cd cal-svr-base/cal/bin
    # gcore -p calendar-pid
    # /opt/langtools/bin/gdb
    (gdb) attach calendar-pid
    Attaching to process calendar-pid
    No executable file name was specified
    (gdb) dumpcore
    Dumping core to the core file core.calendar-pid
    (gdb) quit
    The program is running. Quit anyway (and detach it)? (y or n) y
    Detaching from program: , process calendar-pid
    
    Linux

    # cd cal-svr-base/cal/bin
    # gdb
    (gdb) attach calendar-pid
    Attaching to process calendar-pid
    No executable file name was specified
    (gdb) gcore
    Saved corefile core.calendar-pid
    
    (gdb) backtrace
    (gdb) quit
    
    Windows

    Get the CALENDAR process PID:

    C:\windbg-root>tlist.exe

    Generate a crash dump on the CALENDAR running process PID:

    C:\windbg-root>adplus.vbs -hang -p calendar-pid -o C:\crashdump_dir


    Note –

    For Windows, provide the complete generated folder under C:\crashdump_dir.


  10. When you have collected all debug data, perform the following steps to restore the service.

    1. Stop Calendar Server.

      cd cal-svr-base/cal/sbin

      ./stop-cal

    2. Make sure that all Calendar Server processes stopped.

      Wait one minute, then kill any remaining processes.

    3. Clean the temporary shared database files.

      cd caldb.berkeleydb.homedir.path;
      rm __db.00*

      where caldb.berkeleydb.homedir.path is the path of the database, which is specified in the caldb.berkeleydb.homedir.path parameter in the ics.conf file.

    4. Restart Calendar Server.

      ./start-cal

    5. After restarting the services, check the logs for any unexpected behavior.
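
Step 9 above asks for three core files taken about a minute apart, each copied to a new name so that the next dump does not overwrite it. On UNIX and Linux the renaming part can be sketched as follows. The preserve_core helper is a hypothetical name, and the numeric suffix convention is an assumption made for illustration, not a naming scheme required by the Sun Support Center.

```sh
# Rename a freshly written core file so that the next dump cannot overwrite it.
# $1 = path of the core file, $2 = sequence number (1, 2, 3)
preserve_core() {
    if [ ! -f "$1" ]; then
        echo "no core file at $1" >&2
        return 1
    fi
    mv "$1" "$1.$2"
}

# Example loop (commented out; calendar-pid is a placeholder):
#   for n in 1 2 3; do
#       # ... generate core.calendar-pid with the platform command from step 9 ...
#       preserve_core core.calendar-pid "$n"
#       sleep 60
#   done
```

Run any such loop before Step 10, because stopping the server discards the state of the hung process.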

Procedure: To Collect Debug Data on a Calendar Server Crashed Process

Use this task to collect data when a Calendar Server process has stopped (crashed) unexpectedly.

On Solaris systems, you can easily gather the required data by running the cscapture command.

./cscapture

For more information about running cscapture, see Using Calendar Server Capture (cscapture) to Collect Debug Data for Sun Java System Calendar Server.

The cscapture command gathers ps, swap, gcore, pkg_app, and other data.

On all other platforms, manually run all the commands on the actual machine where the core file(s) were generated, as described in the following procedure.

  1. Collect the general system information as explained in To Collect Required Debug Data for Any Calendar Server Problem.

  2. Note whether you can restart Calendar Server. If the problem is reproducible, provide a test case that can be reproduced in the Sun Support Center labs.

  3. Get the output of the following commands.

    HP-UX

    ps -ef | grep cal-svr-base
    vmstat 5 5
    iostat -x
    top
    sar

    Linux

    ps -aux | grep cal-svr-base
    vmstat 5 5
    top
    uptime
    sar

    Windows

    Obtain the CALENDAR process PID: C:\windbg-root>tlist.exe

    Obtain process details of the CALENDAR running process PID: C:\windbg-root>tlist.exe calendar-pid


    Note –

    To use the preceding commands on Windows systems, you need to install the debugging tools, available from the following URL:

    http://www.microsoft.com/whdc/devtools/debugging/default.mspx

    Install the latest version of the debugging tools and OS symbols for your version of Windows.

    You also must add the environment variable "_NT_SYMBOL_PATH".


  4. Get the swap information.

    HP-UX

    swapinfo

    Linux

    free

    Windows

    Already provided in C:\report.txt as described in To Collect Required Debug Data for Any Calendar Server Problem.

  5. Get the system logs.

    Linux

    /var/adm/messages
    /var/log/syslog

    HP-UX

    /var/adm/syslog/syslog.log

    Windows

    Event log files: Start -> Settings -> Control Panel -> Event Viewer -> Select Log

    Then click Action -> Save log file as

  6. Get core files (called “Crash Dumps” by Windows).

    Solaris OS

    See 1.6 Configuring Solaris OS to Generate Core Files if a core file was not generated.

    Linux

    Core dumps are turned off by default in the /etc/profile file. You can make per user changes by editing your ~/.bash_profile file. Look for the following line:

    ulimit -S -c 0 > /dev/null 2>&1

    You can either comment out the entire line to set no limit on the size of the core files or set your own maximum size.

    Windows

    Generate a crash dump during a crash of Calendar Server by using the following commands:

    Get the CALENDAR process PID: C:\windbg-root>tlist.exe

    Generate a crash dump when the CALENDAR process crashes: C:\windbg-root>adplus.vbs -crash -FullOnFirst -p calendar-pid -o C:\crashdump_dir

    The adplus.vbs command watches calendar-pid until it crashes and will generate the dmp file. Provide the complete generated folder under C:\crashdump_dir.


    Note –

    If you didn't install the Debugging Tools for Windows, you can use the drwtsn32.exe -i command to select Dr. Watson as the default debugger. Use the drwtsn32.exe command, check all options, and choose the path for crash dumps. Then provide the dump and the drwtsn32.log files.
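
For the Linux case in step 6, it can help to confirm the core file size limit before you try to reproduce a crash. The following sketch reports the soft limit of the current shell. The core_limit_status helper is a hypothetical name, and the result applies only to the shell that runs it; a Calendar Server started from another session or from an init script may run under a different limit.

```sh
# Report whether the current shell would allow a core file to be written.
# Reads the soft limit, which is what "ulimit -S -c 0" in /etc/profile sets.
core_limit_status() {
    limit=$(ulimit -S -c)
    if [ "$limit" = "0" ]; then
        echo "disabled"
    else
        echo "enabled ($limit)"
    fi
}

# Example use:
#   core_limit_status
```

If the helper reports disabled, running ulimit -c unlimited in the same shell raises the soft limit for that session, up to whatever the hard limit allows.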


Procedure: To Collect Debug Data on a Calendar Server Database Problem

Use this task to collect data when Calendar Server is experiencing a database problem in a standard database deployment.

A Calendar Server database problem can occur when there are errors in the Calendar Server backup database (obtained with the csbackup command) or when error messages in the logs identify problems in the integrity of the database.

  1. Collect the general system information as explained in To Collect Required Debug Data for Any Calendar Server Problem.

  2. (Solaris only) Run the cscapture script to gather a copy of the Calendar Server's database and additional information:

    ./cscapture -db

  3. (Other platforms) Run the capture_environment.pl script.

    First go to the directory where the script is located. For example:

    cd /export/home/myscripts
    perl capture_environment.pl

    This command creates a file named archive.tar.gz, which contains Calendar Server database logs, system statistics, configuration information, and database information.

  4. Get the output from the csdb check command.

    The csdb check command reports errors on the Calendar Server backup database and on database corruption issues that turn up in the error logs.


    Note –

    This command can return false errors when it is run on a production server database.


    UNIX and Linux

    cd cal-svr-base/cal/bin
    ./csdb check

    Windows

    cd cal-svr-base\cal\bin
    csdb.exe check

  5. Try to recover your database from a backup.

    Read the section “Checking and Rebuilding a Calendar Database” or “To Rebuild the Calendar Databases (caldb)” in the Calendar Server 6 Administration Guide.

  6. If the database is corrupted, and you have tried to recover it without success, provide a tar file of the entire database directory.

    The path of the database is specified in the caldb.berkeleydb.homedir.path parameter in the ics.conf file.

    For a Windows system, the following example shows the physical path of the database event, task, and alarm files:

    caldb.berkeleydb.homedir.path = "C:/www/Cal6-1-1/CalendarServer6/var/csdb"

  7. Obtain user information for at least 2 users impacted by the problem and 2 users unaffected by the problem.

    If no users can connect to the database, state this in your report.

    Use the csattribute command to get the LDIF entry of each user. The csattribute command displays the user's DN and the attributes of the specified user. The following is a sample DN:

    uid=user1,ou=people,o=siroe.com,o=isp

    Run the command as follows:

    UNIX and Linux

    cal-svr-base/cal/sbin/csattribute -v list uid

    Windows

    csattribute -v list uid

    where:

    server-root

    The directory on the Calendar Server machine dedicated to holding the Calendar Server program, and configuration, maintenance, and information files. The default location for UNIX and Linux versions of Calendar Server is /var/opt/mps/serverroot/.

    uid

    User ID of the user you are searching for.

  8. Get the LDIF entry of the domain where the impacted user resides.

    Use the domain portion of the DN shown by the csattribute command in the preceding step.

    UNIX and Linux

    dir-root/shared/bin/ldapsearch -h hostname -p port -D "cn=Directory Manager" -w password -s base -b "baseDN" "(objectclass=*)" > /tmp/domain.ldif

    Windows

    dir-root\shared\bin\ldapsearch.exe -h hostname -p port -D "cn=Directory Manager" -w password -s base -b "baseDN" "(objectclass=*)" > C:\domain.ldif

    where:

    dir-root

    The directory on the Directory Server machine dedicated to holding the server program, and configuration, maintenance, and information files. The default location for UNIX and Linux versions of Calendar Server is /var/opt/mps/serverroot/.

    hostname

    Name of the host running Directory Server. You can omit -h hostname if the Directory Server is running locally.

    port

    Port number on which Directory Server is listening. The default is 389. You can omit -p port if the Directory Server is running on port 389.

  9. If you have not already done so, set the Calendar Server debug log level to Debug and provide all errors printed to the Calendar Server logs.

    Provide all log files with pointers to timestamps or error messages.

  10. If you have not already done so, provide specific steps showing how to reproduce the problem.

    Include any observed error messages from the Web UI, WCAP commands, and the Calendar Server log files and timestamps showing when the error occurred.
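
Step 8 above uses the domain portion of the user DN as the base DN for ldapsearch. For DNs shaped like the sample in step 7, that portion can be derived with a small helper. This is a sketch under a stated assumption: user entries sit directly under an ou container (such as ou=people) inside the domain entry, as in the sample DN. The domain_dn name is hypothetical, and directories with a different layout need a different rule.

```sh
# Derive the domain portion of a user DN by dropping the leading uid=...
# component and, if present, the ou=... container that follows it.
# $1 = user DN as shown by csattribute
domain_dn() {
    printf '%s\n' "$1" | sed -e 's/^uid=[^,]*,//' -e 's/^[oO][uU]=[^,]*,//'
}

# Example use:
#   domain_dn 'uid=user1,ou=people,o=siroe.com,o=isp'
```

For the sample DN, the helper prints o=siroe.com,o=isp, which is the value to pass as baseDN to the ldapsearch command in step 8.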

Procedure: To Collect Debug Data on a Calendar Server CLD/DWP Problem

Use this task to collect data when Calendar Server is experiencing a database problem in a Calendar Lookup Database/Database Wire Protocol (CLD/DWP) deployment.

A Calendar Server CLD/DWP installation comprises a multi-tiered deployment with multiple servers connecting to a distributed back-end Calendar Server database. If a database problem occurs in this environment, you must provide detailed information about your network topology and front-end and back-end servers to facilitate the identification of the issue.

  1. Collect the general system information as explained in To Collect Required Debug Data for Any Calendar Server Problem.

  2. Provide a detailed explanation of how you have set up your Calendar Server deployment. For example, you could have

    • Two front-end servers and two back-end servers

    • Two front-end servers and one back-end server

    • Three front-end servers and three back-end servers

    and so on.

  3. Provide the specific name of each machine in your deployment. Identify whether it is a front-end server or back-end server.

  4. (Solaris only) Run the cscapture script to gather Calendar Server information:

    ./cscapture

  5. (Other platforms) Run the capture_environment.pl script.

    First go to the directory where the script is located. For example:

    cd /export/home/myscripts
    perl capture_environment.pl

    This command creates a file named archive.tar.gz, which contains Calendar Server database logs, system statistics, configuration information, and database information.

  6. Obtain user information for at least 2 users impacted by the problem and 2 users unaffected by the problem.

    If no users can connect to the database, state this in your report.

    Use the csattribute command to get the LDIF entry of each user. The csattribute command displays the user's DN and the attributes of the specified user. The following is a sample DN:

    uid=user1,ou=people,o=siroe.com,o=isp

    Run the command as follows:

    UNIX and Linux

    cal-svr-base/cal/sbin/csattribute -v list uid

    Windows

    csattribute -v list uid

    where:

    server-root

    The directory on the Calendar Server machine dedicated to holding the Calendar Server program, and configuration, maintenance, and information files. The default location for UNIX and Linux versions of Calendar Server is /var/opt/mps/serverroot/.

    uid

    User ID of the user you are searching for.

  7. Get the LDIF entry of the domain where the impacted user resides.

    Use the domain portion of the DN shown by the csattribute command in the preceding step.

    UNIX and Linux

    dir-root/shared/bin/ldapsearch -h hostname -p port -D "cn=Directory Manager" -w password -s base -b "baseDN" "(objectclass=*)" > /tmp/domain.ldif

    Windows

    dir-root\shared\bin\ldapsearch.exe -h hostname -p port -D "cn=Directory Manager" -w password -s base -b "baseDN" "(objectclass=*)" > C:\domain.ldif

    where:

    dir-root

    The directory on the Directory Server machine dedicated to holding the server program, and configuration, maintenance, and information files. The default location for UNIX and Linux versions of Calendar Server is /var/opt/mps/serverroot/.

    hostname

    Name of the host running Directory Server. You can omit -h hostname if the Directory Server is running locally.

    port

    Port number on which Directory Server is listening. The default is 389. You can omit -p port if the Directory Server is running on port 389.

  8. If you have not already done so, set the Calendar Server debug log level to Debug and provide all errors printed to the Calendar Server logs.

    Provide all log files with pointers to timestamps or error messages. For example:


    [04/Jul/2006:19:49:52 +0200] mouline cshttpd[25278]:
    General Error: chttp_ctx_async_connect: ASock_Connect
    failed: pctx->s 924500
    
    [04/Jul/2006:19:49:52 +0200] mouline cshttpd[25278]:
    General Notice: CldCacheInit: attemptint to open cache
    database for 25278

  9. If you have not already done so, provide specific steps showing how to reproduce the problem.

    Include any observed error messages from the Web UI, WCAP commands, and the Calendar Server log files and timestamps showing when the error occurred.
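
To point the Sun Support Center at the right timestamps, as step 8 requests, you can pull the entries for a given period out of a large log file. The following sketch assumes that each entry in the file starts on its own line with a bracketed timestamp in the form shown in the sample above (the sample is wrapped for display). The log_extract name is hypothetical.

```sh
# Print the log entries whose timestamp starts with a given prefix.
# $1 = log file, $2 = timestamp prefix, for example 04/Jul/2006:19:49
log_extract() {
    # -F treats the pattern as a fixed string, so "[" and "/" need no escaping.
    grep -F "[$2" "$1"
}

# Example use (http.log is the default cshttpd log name):
#   log_extract http.log '04/Jul/2006:19:4' > /tmp/http-extract.log
```

A shorter prefix widens the window, which is a simple way to include the lead-in logging from before the error.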

Procedure: To Collect Debug Data on a Calendar Express or Communications Express Interface Problem

Use this task to collect data for a problem with either the Calendar Express or Communications Express interface. The most common problems are related to incorrect translation of fields when using a localized Calendar Server interface.

  1. Collect the general system information as explained in To Collect Required Debug Data for Any Calendar Server Problem.

  2. Take a snapshot of the problematic screen(s).

  3. Note the step-by-step procedure to reproduce the problem. Include a test case.


    Note –

    The Sun Support Center does not support Webmail customizations. Contact your sales representative for those problems.