Sun Gathering Debug Data for Sun Java System Messaging Server

1.5 What Messaging Server Debug Data Should You Collect?

This section describes the kinds of debug data that you need to provide based on the kind of problem you are experiencing.

This section contains the following tasks:

Procedure: To Collect Required Debug Data for Any Messaging Server Problem

All problems described in this technical note need basic information collected about when the problem occurred and about the system having the problem. Use this task to collect that basic information.

  1. Note the day(s) and time(s) the problem occurred.

  2. Provide a graphical representation of your deployment. Include all hosts with their host names, IP addresses, operating system versions, and roles, as well as other important systems such as load balancers and firewalls.

  3. Note the operating system.

    Solaris OS

    uname -a

    HP-UX

    uname -r

    Linux

    more /etc/redhat-release

    Windows

    "C:\Program Files\Common Files\Microsoft Shared\MSInfo\msinfo32.exe" /report C:\report.txt

  4. Note the patch level.

    Solaris OS

    patchadd -p

    HP-UX

    swlist

    Linux

    rpm -qa

    Windows

    Already provided in the C:\report.txt file above.

  5. Note the version of Messaging Server.

    Be sure to send the entire screen output of the imsimta version command.

    • Sun Java System Messaging Server (Messaging Server 6):

      UNIX and Linux

      cd server-root/sbin
      ./imsimta version

      Windows

      cd server-root\sbin
      imsimta.exe version

    • iPlanet Messaging Server (Messaging Server 5):

      UNIX and Linux

      cd server-root/msg-identifier
      ./imsimta version

      Windows

      cd server-root\msg-identifier
      imsimta.exe version

  6. Create a tar file of the Messaging Server configuration directory.

    • Sun Java System Messaging Server (Messaging Server 6):

      UNIX and Linux

      cd server-root/sbin
      ./configutil

      Create a tar file of the server-root/config directory.

      Windows

      cd server-root\config
      configutil.exe

      Create a tar file of the server-root\config directory.

    • iPlanet Messaging Server (Messaging Server 5):

      UNIX and Linux

      cd server-root/msg-identifier
      ./configutil

      Create a tar file of the server-root/msg-identifier/imta/config directory.

      Windows

      cd server-root\msg-identifier
      configutil.exe

      Create a tar file of the server-root\msg-identifier\imta\config directory.


    Note –

    If possible, provide only the relevant extracts of log files for the time period in which the problem occurred, with enough context to show what else was happening during the error and shortly before it. For relatively short log files (for example, MTA channel debug log files), send the entire file. For large, long-running log files, an extract is usually more appropriate, but be sure to include everything logged at the time of the error as well as some of the logging that leads up to it.

    However, when questions arise about message structure or content, or about notification or bounced messages, send an entire sample message, including all of the outermost headers (not just an excerpt of the message).
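
The UNIX and Linux parts of steps 3, 4, and 6 can be scripted. The following is a minimal sketch rather than a supported tool: the function names and the output locations are hypothetical, it assumes the Messaging Server 6 layout (server-root/config), and it does not cover Windows. It uses uname -a on every platform, which also reports the release that uname -r prints.

```shell
#!/bin/sh
# Sketch only: function names and output paths are illustrative assumptions.

# Map an operating system name (as printed by `uname -s`) to the
# patch-listing command named in step 4.
patch_cmd_for_os() {
    case "$1" in
        SunOS) echo "patchadd -p" ;;
        HP-UX) echo "swlist" ;;
        Linux) echo "rpm -qa" ;;
        *)     echo "" ;;
    esac
}

# Record the collection time, OS identification, and patch level
# in the directory given as $1.
collect_basic_info() {
    out_dir=$1
    mkdir -p "$out_dir" || return 1
    date > "$out_dir/collected-at.txt"
    uname -a > "$out_dir/uname.txt"
    patch_cmd=$(patch_cmd_for_os "$(uname -s)")
    if [ -n "$patch_cmd" ]; then
        # Word splitting is intentional: command and flag share one string.
        $patch_cmd > "$out_dir/patches.txt" 2>&1
    fi
}

# Create a tar file ($2) of the config directory under server-root ($1).
archive_config() {
    server_root=$1
    out_tar=$2
    (cd "$server_root" && tar cf "$out_tar" config)
}
```

For example, collect_basic_info /tmp/msg-debug followed by archive_config server-root /tmp/msg-debug/config.tar leaves everything in one directory that can be sent as a unit.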


Procedure: To Collect Debug Data on Messaging Server Installation Problems

Follow these steps if you are unable to complete the installation or if you get a “failed” installation status for Messaging Server.

  1. Consult the following troubleshooting information:

    If the problem persists after using this troubleshooting information, then continue with this procedure to collect the necessary data for the Sun Support Center.

  2. Collect the general system information as explained in To Collect Required Debug Data for Any Messaging Server Problem.

  3. Specify if this is a first-time installation or a Hot Fix installation on a pre-existing Messaging Server instance.

  4. Get the installation logs.

    • Sun Java System Messaging Server (Messaging Server 6):

      Messaging Server 6 log files mostly reside in the server-root/log directory. However, the initial configuration log files reside in the server-root/install directory, which also contains information on the initial configuration.

      Solaris OS

      /var/sadm/install/logs

      The log file names start with Java_Enterprise_System*_install.Bdatetime, where datetime corresponds to the date and time of the failed installation (for example, B12161532).

      HP-UX and Linux

      /var/opt/sun/install/logs

      The log file names start with Java_Enterprise_System*_install.Bdatetime, where datetime corresponds to the date and time of the failed installation (for example, B12161532).

      Windows

      C:\Documents and Settings\current-user\Local Settings\Temp

      The log files are named MSI*.log and are usually text files. The asterisk (*) represents a random number that is generated in the Temp directory for each MSI-based setup.

    • iPlanet Messaging Server (Messaging Server 5): Rerun the installation with the following command and save the resultant file.

      Solaris OS

      truss -ealf -rall -wall -vall -o /tmp/install-messaging.truss ./setup

      HP-UX

      tusc -v -feaIT -rall -wall -o /tmp/install-messaging.tusc ./setup

      Linux

      strace -fv -o /tmp/install-messaging.strace ./setup

      Windows

      Use DebugView: http://www.sysinternals.com/Utilities/DebugView.html
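
Locating the Messaging Server 6 installation logs from step 4 can be sketched as follows. This is an illustration under stated assumptions, not a supported tool: the function names are hypothetical, and the build stamp passed in is whatever Bdatetime value matches the failed installation.

```shell
#!/bin/sh
# Sketch only: function names are illustrative assumptions.

# Print the installer log directory for an OS name as printed by `uname -s`.
install_log_dir_for_os() {
    case "$1" in
        SunOS)       echo "/var/sadm/install/logs" ;;
        HP-UX|Linux) echo "/var/opt/sun/install/logs" ;;
        *)           return 1 ;;
    esac
}

# List the install logs in directory $1 whose names carry the
# build stamp B$2 (for example, 12161532).
list_install_logs() {
    log_dir=$1
    stamp=$2
    for f in "$log_dir"/Java_Enterprise_System*_install.B"$stamp"*; do
        # An unmatched glob stays literal, so skip names that do not exist.
        [ -e "$f" ] && echo "$f"
    done
    return 0
}
```

For example, list_install_logs "$(install_log_dir_for_os "$(uname -s)")" 12161532 prints the logs for the installation attempt stamped B12161532, if any exist.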

Procedure: To Collect Debug Data on a Hung or Unresponsive Messaging Server Process

A process hang is defined as one of the Messaging Server processes no longer responding to requests while the process is still running locally. Messaging Server's seven specific processes are:

Additionally, Messaging Server uses these three processes:

The system can be running more than one of each of these processes (especially tcp_smtp_server).

Other processes from the job_controller.cnf configuration file might be running (autoreply, ims_master, l_master, conversion, reprocess, and so on), as well as reconstruct and imexpire processes.

Before You Begin

Make sure that you collect all the data over the same time frame in which the problem occurs. See 1.6 Configuring Solaris OS to Generate Core Files if a core file is not generated.

Collect the following information for process hang problems. Run the commands in order when the problem occurs. If possible, specify when the process hang happened and which processes were affected.

  1. Collect the general system information as explained in To Collect Required Debug Data for Any Messaging Server Problem.

  2. Run the netstat command and save the output.

    UNIX and Linux

    netstat -an | grep messaging-service-port

    Windows

    netstat -an

  3. Run the following commands and save the output.

    Solaris OS

    ps -ef | grep server-root
    vmstat 5 5
    iostat -x
    top
    uptime

    HP-UX

    ps -aux | grep server-root
    vmstat 5 5
    iostat -x
    top
    sar

    Linux

    ps -aux | grep server-root
    vmstat 5 5
    top
    uptime
    sar

    Windows

    Obtain the MESSAGING process PID: C:\windbg-root>tlist.exe

    Obtain process details of the MESSAGING running process PID: C:\windbg-root>tlist.exe messaging-pid

  4. Get the swap information.

    Solaris OS

    swap -l

    HP-UX

    swapinfo

    Linux

    free

    Windows

    Already provided in C:\report.txt as described in To Collect Required Debug Data for Any Messaging Server Problem.

  5. Get the system logs.

    Solaris OS and Linux

    /var/adm/messages
    /var/log/syslog

    HP-UX

    /var/adm/syslog/syslog.log

    Windows

    Event log files: Start -> Settings -> Control Panel -> Event Viewer -> Select Log

    Then click Action -> Save log file as


    Note –

    For UNIX systems, depending on your site's configuration of the SNDOPR_PRIORITY option.dat option and your syslog configuration (syslog.conf), the MTA might be sending automatically generated syslog notices to a non-default location. Also, the LOG_SNDOPR option.dat option controls whether additional potential syslog notices are generated by the MTA message logging facility.


  6. For UNIX and Linux systems, get the contents of the /etc/nsswitch.conf and /etc/resolv.conf files.

  7. Get the most current log files for the hanging process, if known. Otherwise, get the log files for all processes.

    You should also set LOG_SNDOPR in the option.dat file and check for syslog notices if some expected MTA log file doesn't seem to have been generated. When you set the LOG_SNDOPR option, then the MTA generates a syslog notice if it cannot write to its regular log files.

    • Sun Java System Messaging Server (Messaging Server 6):

      UNIX and Linux

      server-root/log/*

      Windows

      server-root\data\log\default\default
      server-root\data\log\http\http
      server-root\data\log\imap\imap
      server-root\data\log\pop\pop
      server-root\data\log\imta\*

    • iPlanet Messaging Server (Messaging Server 5):

      UNIX and Linux

      server-root/msg-identifier/log/default/default
      server-root/msg-identifier/log/http/http
      server-root/msg-identifier/log/imap/imap
      server-root/msg-identifier/log/pop/pop
      server-root/msg-identifier/log/imta/*

      Windows

      server-root\msg-identifier\log\default\default
      server-root\msg-identifier\log\http\http
      server-root\msg-identifier\log\imap\imap
      server-root\msg-identifier\log\pop\pop
      server-root\msg-identifier\log\imta\*

  8. (Solaris OS only) If you are able to isolate the hanging process, get the following debug data for that process. Otherwise, get the following data for each of the Messaging Server processes.

    Using the PID obtained in Step 3, run each of the following commands five times, once every 10 seconds:

    pstack messaging-pid
    pmap -x messaging-pid
    
  9. Look for any core file that could have been dumped by one of the Messaging Server processes. If you find one, see To Collect Debug Data on a Messaging Server Crashed Process.

  10. If any of the ims_master, imapd, mshttpd, popd, mboxutil, or reconstruct processes is hung, as the mailsrv user, get the outputs of the db_stat command as follows.

    • UNIX and Linux:

      cd msg-instance/data/store/mboxlist
      server-root/lib/db_stat -Co -h msg-instance/data/store/mboxlist/ -N
      server-root/lib/db_stat -t

      If you have set the dbtmpdir configutil variable, use that location instead of msg-instance/data/store/mboxlist/.


      Note –

      For Solaris OS systems only, the script dbhang captures all the following debug data for you. Edit the top of the script to match your system's configuration. Specifically, edit the SRVROOT, INST, MAILUSER, and DBTMPDIR parameters. Then run the script and collect the data. See 1.7 Running the Messaging Server Debugging Scripts for more information.


    • Windows:

      cd msg-instance\data\store\mboxlist
      server-root\lib\db_stat -Co -h msg-instance\data\store\mboxlist\ -N
      server-root\lib\db_stat -t

  11. Get the output of the following command.

    Solaris OS

    truss -ealf -rall -wall -vall -o /tmp/truss.out -p messaging-pid

    HP-UX

    tusc -v -fealT -rall -wall -o /tmp/tusc.out -p messaging-pid

    Linux

    strace -fv -o /tmp/strace.out -p messaging-pid

    Windows

    Use DebugView: http://www.sysinternals.com/Utilities/DebugView.html


    Note –

    Wait one minute after launching the appropriate command (truss, strace, tusc, or DebugView) then stop it by pressing Control-C in the terminal where you launched the command.


  12. Get core files and the output of the following commands.

    In a process hang situation, it is helpful to compare several core files to review the state of the threads over time. To avoid overwriting a core file, copy it to a new name, wait approximately one minute, then rerun the following commands. Do this three times to obtain three core files.


    Note –

    For HP-UX, you need the following two patches to use the gcore command: PHKL_31876 and PHCO_32173. If you cannot install these patches, use the HP-UX /opt/langtools/bin/gdb command, version 3.2 or later, or the dumpcore command.


    Solaris OS

    cd server-root/bin/slapd/server
    gcore -o /tmp/messaging_process-core messaging-pid
    pstack /tmp/messaging_process-core.messaging-pid

    HP-UX

    # gcore -p messaging-pid
    (gdb) attach messaging-pid
    Attaching to process messaging-pid
    No executable file name was specified
    (gdb) dumpcore
    Dumping core to the core file core.messaging-pid
    (gdb) quit
    The program is running. Quit anyway (and detach it)? (y or n) y
    Detaching from program: , process messaging-pid
    
    Linux

    # gdb
    (gdb) attach messaging-pid
    Attaching to process messaging-pid
    No executable file name was specified
    (gdb) gcore
    Saved corefile core.messaging-pid
    
    (gdb) backtrace
    (gdb) quit
    
    Windows

    Get the MESSAGING process PID:

    C:\windbg-root>tlist.exe

    Generate a crash dump on the MESSAGING running process PID:

    C:\windbg-root>adplus.vbs -hang -p messaging-pid -o C:\crashdump_dir


    Note –

    For Windows, provide the complete generated folder under C:\crashdump_dir.


  13. (Solaris OS only) Archive the result of the script pkg_app (one core file is sufficient).

    ./pkg_app.ksh pid-of-application corefile
    

    The Sun Support Center must have the output from the pkg_app script to properly analyze the core file(s).


    Note –

    Make sure the appropriate limitations are set by using the ulimit command, and that the user is not nobody. Also check the coreadm command for additional control. See 1.6 Configuring Solaris OS to Generate Core Files if a core file is not generated.


  14. When you have collected all debug data, perform the following steps to restore the service.

    Messaging Server processes usually hang because of an orphan lock left in one of the databases. Stopping the server (especially the stored process), and cleaning the temporary shared db files helps to resolve the problem.

    1. Stop Messaging Server.

      cd server-root/sbin
      ./stop-msg

    2. Make sure that all Messaging Server processes stopped.

      Wait one minute, then kill any remaining processes, except tcp_smtp_server processes (which do not use databases).

    3. Restart Messaging Server.

      ./start-msg

    4. After restarting the services, check the logs for any unexpected behavior.
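
Step 8 of this procedure asks for repeated samples of the same command. One way to capture them is a small sampling helper like the one below. It is a sketch under stated assumptions: the function name sample_command and the output file names are hypothetical, and any command can be substituted for pstack or pmap.

```shell
#!/bin/sh
# Sketch only: sample_command is an illustrative helper, not a product script.

# Run a command $1 times, $2 seconds apart, and append each timestamped
# sample to the file named by $3. The remaining arguments are the command.
sample_command() {
    count=$1
    interval=$2
    out_file=$3
    shift 3
    i=1
    while [ "$i" -le "$count" ]; do
        echo "=== sample $i of $count: $(date) ===" >> "$out_file"
        "$@" >> "$out_file" 2>&1
        # Sleep between samples, but not after the last one.
        [ "$i" -lt "$count" ] && sleep "$interval"
        i=$((i + 1))
    done
}
```

On Solaris OS, sample_command 5 10 /tmp/pstack.out pstack messaging-pid followed by sample_command 5 10 /tmp/pmap.out pmap -x messaging-pid produces the series of five samples, one every 10 seconds, in two files. Keeping each command in its own file makes it easier to compare thread state across samples.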

Procedure: To Collect Debug Data on a Messaging Server Crashed Process

Use this task to collect data when a Messaging Server process has stopped (crashed) unexpectedly. Run all the commands on the actual machine where the core file(s) were generated.

  1. Collect the general system information as explained in To Collect Required Debug Data for Any Messaging Server Problem.

  2. Note whether you can restart Messaging Server.

  3. Get the output of the following commands.

    Solaris OS

    ps -ef | grep server-root
    vmstat 5 5
    iostat -x
    top
    uptime

    HP-UX

    ps -aux | grep server-root
    vmstat 5 5
    iostat -x
    top
    sar

    Linux

    ps -aux | grep server-root
    vmstat 5 5
    top
    uptime
    sar

    Windows

    Obtain the MESSAGING process PID: C:\windbg-root>tlist.exe

    Obtain process details of the MESSAGING running process PID: C:\windbg-root>tlist.exe messaging-pid

  4. Get the swap information.

    Solaris OS

    swap -l

    HP-UX

    swapinfo

    Linux

    free

    Windows

    Already provided in C:\report.txt as described in To Collect Required Debug Data for Any Messaging Server Problem.

  5. Get the system logs.

    Solaris OS and Linux

    /var/adm/messages
    /var/log/syslog

    HP-UX

    /var/adm/syslog/syslog.log

    Windows

    Event log files: Start -> Settings -> Control Panel -> Event Viewer -> Select Log

    Then click Action -> Save log file as


    Note –

    For UNIX systems, depending on your site's configuration of the SNDOPR_PRIORITY option.dat option and your syslog configuration (syslog.conf), the MTA might be sending automatically generated syslog notices to a non-default location. Also, the LOG_SNDOPR option.dat option controls whether additional potential syslog notices are generated by the MTA message logging facility.


  6. Get core files (called “Crash Dumps” by Windows).

    Solaris OS

    See 1.6 Configuring Solaris OS to Generate Core Files if a core file was not generated.

    Linux

    Core dumps are turned off by default in the /etc/profile file. You can make per user changes by editing your ~/.bash_profile file. Look for the following line:

    ulimit -S -c 0 > /dev/null 2>&1

    You can either comment out the entire line to set no limit on the size of the core files or set your own maximum size.

    Windows

    Generate a crash dump during a crash of Messaging Server by using the following commands:

    Get the MESSAGING process PID:

    C:\windbg-root>tlist.exe

    Generate a crash dump when the MESSAGING process crashes:

    C:\windbg-root>adplus.vbs -crash -FullOnFirst -p messaging-pid -o C:\crashdump_dir

    The adplus.vbs command watches messaging-pid until it crashes and then generates the dmp file. Provide the complete generated folder under C:\crashdump_dir.


    Note –

    If you didn't install the Debugging Tools for Windows, you can use the drwtsn32.exe -i command to select Dr. Watson as the default debugger. Use the drwtsn32.exe command, check all options, and choose the path for crash dumps. Then provide the dump and the drwtsn32.log files.


  7. (Solaris OS only) For each core file, provide the output of the following commands.

    file corefile
    pstack corefile
    pmap corefile
    pflags corefile
    
  8. (Solaris OS only) Archive the result of the script pkg_app (one core file is sufficient).

    ./pkg_app.ksh pid-of-application corefile
    

    Note –

    The Sun Support Center must have the output from the pkg_app script to properly analyze the core file(s).
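
Two parts of this procedure can be sketched in shell: checking whether the current shell allows core files at all (step 6, Linux), and running the inspection commands from step 7 against a core file (Solaris OS). The function names and the output layout are hypothetical, and tools that are not installed are skipped rather than treated as errors.

```shell
#!/bin/sh
# Sketch only: function names and output layout are illustrative assumptions.

# Succeed if the soft core-file size limit of the current shell is not 0.
core_dumps_enabled() {
    [ "$(ulimit -S -c)" != "0" ]
}

# Run each tool against core file $1 and save its output under directory $2.
# Remaining arguments name the tools; the default is the set from step 7.
run_tools_on_core() {
    core_file=$1
    out_dir=$2
    shift 2
    [ "$#" -eq 0 ] && set -- file pstack pmap pflags
    mkdir -p "$out_dir" || return 1
    for tool in "$@"; do
        if command -v "$tool" >/dev/null 2>&1; then
            "$tool" "$core_file" > "$out_dir/$tool.out" 2>&1
        else
            echo "$tool: not installed, skipped" > "$out_dir/$tool.out"
        fi
    done
    return 0
}
```

For example, run_tools_on_core corefile /tmp/core-report writes file.out, pstack.out, pmap.out, and pflags.out. The pkg_app script in step 8 is still required; this sketch does not replace it.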


Procedure: To Collect Debug Data on a Messaging Server Routing Problem

Use this task to collect data when Messaging Server is experiencing a routing problem.

A Messaging Server routing problem is defined as the inability of the system to correctly route a message. For example, a message might be ending up in the wrong Message Store, might be sent to the wrong channel, might be delivered to the wrong user, and so on.

  1. Collect the general system information as explained in To Collect Required Debug Data for Any Messaging Server Problem.

  2. Provide a detailed explanation of what you want to obtain.

  3. Get the output of the following command.

    • Sun Java System Messaging Server (Messaging Server 6):

      UNIX and Linux

      cd server-root/sbin
      ./imsimta test -rewrite -debug=level=5 mailaddress

      Windows

      cd server-root\sbin
      imsimta.exe test -rewrite -debug=level=5 mailaddress

    • iPlanet Messaging Server (Messaging Server 5):

      UNIX and Linux

      cd server-root/msg-identifier
      ./imsimta test -rewrite -debug=level=5 mailaddress

      Windows

      cd server-root\msg-identifier
      imsimta.exe test -rewrite -debug=level=5 mailaddress


    Note –

    When the problem is about Sieve filters, add the option -filter to the above command.

    The -noimage qualifier to the imsimta test -rewrite -debug command enables you to test changes made to the configuration file prior to recompiling the new MTA configuration.

    If possible, use the -source_channel=source_channel option to specify the incoming channel. This is sometimes necessary for testing the interactions with the mapping tables and the antirelay rules. By default, Internet mail arrives on the tcp_local channel whereas internal mail arrives on the tcp_intranet channel. If the connection is authenticated, use tcp_auth.

    The command would then look like:

    # ./imsimta test -rewrite -debug -source_channel=tcp_intranet email-address


  4. Get the LDIF entry of an impacted user.

    UNIX and Linux

    dir-root/shared/bin/ldapsearch -h hostname -p port -D "cn=Directory Manager" -w password -b "basedn" "(uid=user)" > /tmp/user.ldif

    Windows

    dir-root\shared\bin\ldapsearch.exe -h hostname -p port -D "cn=Directory Manager" -w password -b "basedn" "(uid=user)" > C:\user.ldif

    where:

    dir-root

    The directory on the Directory Server machine dedicated to holding the server program, and configuration, maintenance, and information files. The default location for UNIX and Linux versions of Messaging Server is /var/opt/mps/serverroot/.

    hostname

    Name of the host running Directory Server. The default value is localhost. You can omit -h hostname if the Directory Server is running locally.

    port

    Port number on which Directory Server is listening. The default is 389. You can omit -p port if the Directory Server is running on port 389.

    basedn

    The base DN (distinguished name) that serves as the starting point for the search.

  5. Get the LDIF entry of the domain where the impacted user resides.

    UNIX and Linux

    dir-root/shared/bin/ldapsearch -h hostname -p port -D "cn=Directory Manager" -w password -s base -b "baseDN" "(objectclass=*)" > /tmp/domain.ldif

    Windows

    dir-root\shared\bin\ldapsearch.exe -h hostname -p port -D "cn=Directory Manager" -w password -s base -b "baseDN" "(objectclass=*)" > C:\domain.ldif

    where:

    dir-root

    The directory on the Directory Server machine dedicated to holding the server program, and configuration, maintenance, and information files. The default location for UNIX and Linux versions of Messaging Server is /var/opt/mps/serverroot/.

    hostname

    Name of the host running Directory Server. You can omit -h hostname if the Directory Server is running locally.

    port

    Port number on which Directory Server is listening. The default is 389. You can omit -p port if the Directory Server is running on port 389.
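
The options described in the note under step 3 combine in a predictable way. The sketch below only assembles and prints the command line so that it can be reviewed before it is run from the directory that contains imsimta. The function name build_rewrite_test and its argument order are hypothetical; the options themselves are the ones named in this procedure.

```shell
#!/bin/sh
# Sketch only: build_rewrite_test is an illustrative helper.

# Print an `imsimta test -rewrite` command line.
#   $1  mail address to test
#   $2  optional source channel (for example tcp_local, tcp_intranet, tcp_auth)
#   $3  "yes" to add -filter when the problem involves Sieve filters
build_rewrite_test() {
    address=$1
    source_channel=${2:-}
    use_filter=${3:-no}
    cmd="./imsimta test -rewrite -debug=level=5"
    [ -n "$source_channel" ] && cmd="$cmd -source_channel=$source_channel"
    [ "$use_filter" = "yes" ] && cmd="$cmd -filter"
    echo "$cmd $address"
}
```

For example, build_rewrite_test email-address tcp_intranet prints ./imsimta test -rewrite -debug=level=5 -source_channel=tcp_intranet email-address. Printing rather than executing keeps the helper safe to use on a host where the command should first be checked.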

Procedure: To Collect Debug Data on a Messaging Server MTA Queue Problem

Use this task to collect data when Messaging Server is experiencing a queue problem, for example, when a specific queue is growing and messages are not being dequeued.

  1. Collect the general system information as explained in To Collect Required Debug Data for Any Messaging Server Problem.

  2. Determine whether the “growing” queue is full of ZZ*.00 message files or of Z*.00 message files. Count how many ZZ*.00 message files and how many Z*.00 message files there are, and check whether there are any .HELD files.

    1. If the channel has lots of ZZ*.00 message files and relatively few Z*.00 message files (where “relatively” depends heavily on the specific channel and site usage), make sure that the Job Controller is running. For example:

      ps -ef | grep job_controller

    2. If a channel has lots of Z*.00 message files and not very many ZZ*.00 message files, then typically the MTA itself does not have any “problem,” but rather there is a problem with a separate destination host or with the network.

      In this case, look at the delivery history of the Z*.00 messages. You need the message files themselves, or better yet, the output of the imsimta qm history command. Examine the imsimta qm history output for old mail.log* records for those message files. They indicate what sort of SMTP or other error is occurring (and when) for these “old” message files.

      For more information on the imsimta qm command, see the following:

  3. Look at the “Messages are Not Dequeued” section in the Messaging Server Administration Guide.

  4. Run the following imsimta command to see if messages will now get delivered.

    If the messages do now get delivered, then whatever problem was preventing message delivery was probably transient (for example, a network or DNS problem) and is now resolved. Or else the “problem” is simply that you do not have the channel/Job Controller configured for enough simultaneous delivery jobs for that channel to keep up with the current load.

    • Sun Java System Messaging Server (Messaging Server 6):

      UNIX and Linux

      cd server-root/sbin
      ./imsimta run channel

      Windows

      cd server-root\sbin
      imsimta.exe run channel

    • iPlanet Messaging Server (Messaging Server 5):

      UNIX and Linux

      cd server-root/msg-identifier
      ./imsimta run channel


    Note –

    The imsimta run command does not provide much helpful information for Z*.00 message files.


  5. Get the output of the following commands.

    • Sun Java System Messaging Server (Messaging Server 6):

      UNIX and Linux

      cd server-root/sbin
      ./imsimta qm counters show
      ./imsimta qm summ
      ./imsimta qtop -database -domain_to
      ./imsimta qm messages channel

      Windows

      cd server-root\sbin
      imsimta.exe qm counters show
      imsimta.exe qm summ
      imsimta.exe qtop -database -domain_to
      imsimta.exe qm messages channel

    • iPlanet Messaging Server (Messaging Server 5):

      UNIX and Linux

      cd server-root/msg-identifier
      ./imsimta qm counters show
      ./imsimta qtop -database -domain_to

      Windows

      cd server-root\msg-identifier
      imsimta.exe qm counters show
      imsimta.exe qm summ
      imsimta.exe qtop -database -domain_to

  6. Get the current log files.

    • Sun Java System Messaging Server (Messaging Server 6):

      UNIX and Linux

      server-root/log/*

      Windows

      server-root\data\log\imta\*

    • iPlanet Messaging Server (Messaging Server 5):

      UNIX and Linux

      server-root/msg-identifier/log/imta/*

      Windows

      server-root\msg-identifier\log\imta\*
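
The file counts asked for in step 2 can be gathered with find. The sketch below makes two assumptions that are not stated in this procedure: the channel queue directory is passed in by the caller (its location depends on your installation), and "Z*.00" is taken to mean files that start with Z but not with ZZ, because the procedure contrasts the two groups. The function name count_queue_files is hypothetical.

```shell
#!/bin/sh
# Sketch only: count_queue_files is an illustrative helper.

# Print the number of ZZ*.00, other Z*.00, and *.HELD files found
# anywhere under the channel queue directory given as $1.
count_queue_files() {
    queue_dir=$1
    zz=$(find "$queue_dir" -type f -name 'ZZ*.00' | wc -l)
    z_all=$(find "$queue_dir" -type f -name 'Z*.00' | wc -l)
    held=$(find "$queue_dir" -type f -name '*.HELD' | wc -l)
    # Z*.00 also matches ZZ*.00, so subtract to get the non-ZZ count.
    # Arithmetic expansion also strips any padding that wc adds.
    echo "ZZ=$((zz)) Z=$((z_all - zz)) HELD=$((held))"
}
```

A result dominated by ZZ files points toward the Job Controller check in step 2, and a result dominated by Z files points toward the delivery history check.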

Procedure: To Collect Debug Data on a Messaging Server Webmail Problem

Use this task to collect data for a Webmail problem. The most common problems are related to incorrect translation of fields when using a localized Messaging Server interface.

  1. Collect the general system information as explained in To Collect Required Debug Data for Any Messaging Server Problem.

  2. Take a screenshot of the problematic screen(s).

  3. Note the step-by-step procedure to reproduce the problem with a test case.


    Note –

    The Sun Support Center does not support Webmail customizations. Contact your sales representative for those problems.