A process hang occurs when one of the Messaging Server processes stops responding to requests while the process itself is still running. Messaging Server has seven core processes:
imapd—Provides access to IMAP services.
popd—Provides access to POP services.
mshttpd—Provides access to Webmail services.
dispatcher—Permits multiple multithreaded server processes to share responsibility for SMTP connection services.
enpd—Collects and dispatches events that occur to properties of resources (inboxes or calendars).
job_controller—Ensures delivery of messages.
stored—Performs a variety of important tasks such as deadlock and transaction operations of the message database, enforcing aging policies, and expunging and erasing messages stored on disk.
Additionally, Messaging Server uses these three processes:
tcp_smtp_server—Handles SMTP sessions.
tcp_lmtp_server—Handles LMTP sessions.
tcp_lmtpn_server—Handles LMTP native sessions.
The system can be running more than one of each of these processes (especially tcp_smtp_server).
Other processes from the job_controller.cnf configuration file might be running (autoreply, ims_master, l_master, conversion, reprocess, and so on), as well as reconstruct and imexpire processes.
Make sure that you collect all the data over the same time frame in which the problem occurs. See 1.6 Configuring Solaris OS to Generate Core Files if a core file is not generated.
Collect the following information for process hang problems. Run the commands in order when the problem occurs. If possible, note the time at which the hang occurred and which processes were affected.
Collect the general system information as explained in To Collect Required Debug Data for Any Messaging Server Problem.
Run the netstat command and save the output.
netstat -an | grep messaging-service-port
netstat -an
Run the following commands and save the output.
ps -ef | grep server-root
vmstat 5 5
iostat -x
top
uptime

ps -aux | grep server-root
vmstat 5 5
iostat -x
top
sar

ps -aux | grep server-root
vmstat 5 5
top
uptime
sar
Obtain the MESSAGING process PID:
C:\windbg-root>tlist.exe
Obtain process details of the MESSAGING running process PID:
C:\windbg-root>tlist.exe messaging-pid
Get the swap information.
swap -l
swapinfo
free
Already provided in C:\report.txt as described in To Collect Required Debug Data for Any Messaging Server Problem.
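On UNIX and Linux systems, the choice among the swap commands above can be scripted. The following is a minimal sketch, assuming a POSIX-compatible shell; the function name swap_command_for is hypothetical, and the mapping relies on swap -l being the Solaris command, swapinfo the HP-UX command, and free the Linux command.

```sh
# Print the swap-reporting command for a given `uname -s` value.
# swap_command_for is an illustrative helper, not part of Messaging Server.
swap_command_for() {
    case "$1" in
        SunOS) echo "swap -l" ;;    # Solaris OS
        HP-UX) echo "swapinfo" ;;   # HP-UX
        Linux) echo "free" ;;       # Linux
        *)     echo "unsupported platform: $1" >&2; return 1 ;;
    esac
}

# Usage: run the matching command and save its output.
#   cmd=$(swap_command_for "$(uname -s)") && $cmd > /tmp/swap.out
```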
Get the system logs.
/var/adm/messages
/var/log/syslog
/var/adm/syslog/syslog.log
Event log files: Start -> Settings -> Control Panel -> Event Viewer -> Select Log, then click Action -> Save log file as
For UNIX systems, depending on your site's configuration of the SNDOPR_PRIORITY option.dat option and your syslog configuration (syslog.conf), the MTA might be sending automatically generated syslog notices to a non-default location. Also, the LOG_SNDOPR option.dat option controls whether additional potential syslog notices are generated by the MTA message logging facility.
For UNIX and Linux systems, get the contents of the /etc/nsswitch.conf and /etc/resolv.conf files.
Get the most current log files for the hanging process, if known. Otherwise, get the log files for all processes.
You should also set LOG_SNDOPR in the option.dat file and check for syslog notices if some expected MTA log file doesn't seem to have been generated. When you set the LOG_SNDOPR option, then the MTA generates a syslog notice if it cannot write to its regular log files.
Sun Java System Messaging Server (Messaging Server 6):
server-root/log/*
server-root\data\log\default\default
server-root\data\log\http\http
server-root\data\log\imap\imap
server-root\data\log\pop\pop
server-root\data\log\imta\*
iPlanet Messaging Server (Messaging Server 5):
server-root/msg-identifier/log/default/default
server-root/msg-identifier/log/http/http
server-root/msg-identifier/log/imap/imap
server-root/msg-identifier/log/pop/pop
server-root/msg-identifier/log/imta/*

server-root\msg-identifier\log\default\default
server-root\msg-identifier\log\http\http
server-root\msg-identifier\log\imap\imap
server-root\msg-identifier\log\pop\pop
server-root\msg-identifier\log\imta\*
(Solaris OS only) If you are able to isolate the hanging process, get the following debug data for that process. Otherwise, get the following data for each of the Messaging Server processes.
Using the PID obtained in Step 3, run each of the following commands five times, once every 10 seconds:
pstack messaging-pid
pmap -x messaging-pid
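The repeated pstack and pmap runs can be sketched as a small loop. This is an illustration only, assuming a POSIX-compatible shell (for example ksh on Solaris OS); the function name snapshot_process, its parameters, and the output file names are hypothetical.

```sh
# Take COUNT snapshots of pstack and pmap -x for one PID, INTERVAL seconds
# apart. Defaults follow the text: five snapshots, one every 10 seconds.
# The output directory and file names are illustrative choices.
snapshot_process() {
    pid=$1
    outdir=${2:-/tmp}
    count=${3:-5}
    interval=${4:-10}
    i=1
    while [ "$i" -le "$count" ]; do
        pstack "$pid"  > "$outdir/pstack.$pid.$i" 2>&1
        pmap -x "$pid" > "$outdir/pmap.$pid.$i" 2>&1
        # Sleep between snapshots, but not after the last one.
        [ "$i" -lt "$count" ] && sleep "$interval"
        i=$((i + 1))
    done
}

# Usage: snapshot_process messaging-pid /tmp
```

Numbering the files by run makes it easy to compare thread stacks across the five snapshots.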
Look for any core file that could have been dumped by one of the Messaging Server processes. If you find one, see To Collect Debug Data on a Messaging Server Crashed Process.
If any of the ims_master, imapd, mshttpd, popd, mboxutil, or reconstruct processes is hung, as the mailsrv user, get the outputs of the db_stat command as follows.
UNIX and Linux:
cd msg-instance/data/store/mboxlist
server-root/lib/db_stat -Co -h msg-instance/data/store/mboxlist/ -N
server-root/lib/db_stat -t
If you have set the dbtmpdir configutil variable, use that location instead of msg-instance/store/mboxlist/.
For Solaris OS systems only, the script dbhang captures all the following debug data for you. Edit the top of the script to match your system's configuration. Specifically, edit the SRVROOT, INST, MAILUSER, and DBTMPDIR parameters. Then run the script and collect the data. See 1.7 Running the Messaging Server Debugging Scripts for more information.
Windows:
cd msg-instance\data\store\mboxlist
server-root\lib\db_stat -Co -h msg-instance\data\store\mboxlist\ -N
server-root\lib\db_stat -t
Get the output of the following command.
truss -ealf -rall -wall -vall -o /tmp/truss.out -p messaging-pid
tusc -v -fealT -rall -wall -o /tmp/tusc.out -p messaging-pid
strace -fv -o /tmp/strace.out -p messaging-pid
Use DebugView: http://www.sysinternals.com/Utilities/DebugView.html
Wait one minute after launching the appropriate command (truss, strace, tusc, or DebugView) then stop it by pressing Control-C in the terminal where you launched the command.
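For unattended collection, the one-minute wait can be scripted. The sketch below is an assumption-laden example for a POSIX-compatible shell: the function name trace_for is hypothetical, and it sends SIGTERM rather than the SIGINT that Control-C produces, because commands started in the background of a non-interactive shell ignore SIGINT by default. Verify that your tracing tool writes complete output when stopped this way.

```sh
# Run a tracing command in the background, let it collect for DURATION
# seconds, then stop it with SIGTERM and wait for it to exit.
trace_for() {
    duration=$1
    shift
    "$@" &
    trace_pid=$!
    sleep "$duration"
    kill -TERM "$trace_pid" 2>/dev/null || true
    wait "$trace_pid" 2>/dev/null || true
}

# Usage (Solaris OS):
#   trace_for 60 truss -ealf -rall -wall -vall -o /tmp/truss.out -p messaging-pid
```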
Get core files and the output of the following commands.
In a process hang situation, it is helpful to compare several core files to review the state of the threads over time. To avoid overwriting a core file, copy it to a new name, wait approximately one minute, then rerun the following commands. Do this three times to obtain three core files.
For HP-UX, you need the following two patches to use the gcore command: PHKL_31876 and PHCO_32173. If you cannot install these patches, use the HP-UX /opt/langtools/bin/gdb command from version 3.2 and later, or the dumpcore command.
cd server-root/bin/slapd/server
gcore -o /tmp/messaging_process-core messaging-pid
pstack /tmp/messaging_process-core
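Collecting three core files one minute apart can be sketched as follows. This is a minimal example assuming a POSIX-compatible shell and a gcore that names its output PREFIX.PID when given -o PREFIX (as the Solaris OS and gdb versions do). The function name collect_cores and the numbered file names are hypothetical; giving each run its own prefix avoids the copy step.

```sh
# Capture COUNT core files of one process with gcore, INTERVAL seconds
# apart, under distinct names so that no core file is overwritten.
collect_cores() {
    pid=$1
    count=${2:-3}
    interval=${3:-60}
    outdir=${4:-/tmp}
    n=1
    while [ "$n" -le "$count" ]; do
        prefix="$outdir/messaging_process-core.$n"
        # gcore -o PREFIX writes PREFIX.PID
        gcore -o "$prefix" "$pid"
        pstack "$prefix.$pid" > "$outdir/pstack-core.$n.$pid" 2>&1
        # Wait between captures, but not after the last one.
        [ "$n" -lt "$count" ] && sleep "$interval"
        n=$((n + 1))
    done
}

# Usage: collect_cores messaging-pid
```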
# gcore -p messaging-pid
(gdb) attach messaging-pid
Attaching to process messaging-pid
No executable file name was specified
(gdb) dumpcore
Dumping core to the core file core.messaging-pid
(gdb) quit
The program is running. Quit anyway (and detach it)? (y or n) y
Detaching from program: , process messaging-pid

# gdb
(gdb) attach messaging-pid
Attaching to process messaging-pid
No executable file name was specified
(gdb) gcore
Saved corefile core.messaging-pid
(gdb) backtrace
(gdb) quit
Get the MESSAGING process PID:
C:\windbg-root>tlist.exe
Generate a crash dump on the MESSAGING running process PID:
C:\windbg-root>adplus.vbs -hang -p messaging-pid -o C:\crashdump_dir
For Windows, provide the complete generated folder under C:\crashdump_dir.
(Solaris OS only) Archive the result of the script pkg_app (one core file is sufficient).
./pkg_app.ksh pid-of-application corefile
The Sun Support Center must have the output from the pkg_app script to properly analyze the core file(s).
Make sure the appropriate limitations are set by using the ulimit command, and that the user is not nobody. Also check the coreadm command for additional control. See 1.6 Configuring Solaris OS to Generate Core Files if a core file is not generated.
When you have collected all debug data, perform the following steps to restore the service.
Messaging Server processes usually hang because of an orphan lock left in one of the databases. Stopping the server (especially the stored process) and cleaning the temporary shared db files helps to resolve the problem.
Stop Messaging Server.
cd server-root/sbin
./stop-msg
Make sure that all Messaging Server processes stopped.
Wait one minute, then kill any remaining processes, except tcp_smtp_server processes (which do not use databases).
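To see which processes are still running before killing anything, you can filter the ps output. The sketch below only lists candidate PIDs and kills nothing. It assumes a POSIX awk that supports ENVIRON (on Solaris OS, nawk or /usr/xpg4/bin/awk), that the PID is the second column of ps -ef, and that Messaging Server processes show the server root in their command line. The function name remaining_pids is hypothetical.

```sh
# Read `ps -ef` output on stdin and print the PIDs (second column) of
# processes whose line contains SERVER_ROOT, skipping tcp_smtp_server.
# The root is passed through the environment so that it does not appear
# on awk's own command line in the ps listing.
remaining_pids() {
    MS_ROOT="$1" awk '
        index($0, ENVIRON["MS_ROOT"]) && !/tcp_smtp_server/ { print $2 }
    '
}

# Usage (review the list before acting on it):
#   ps -ef | remaining_pids server-root
```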
Restart Messaging Server.
./start-msg
After restarting the services, check the logs for any unexpected behavior.