This section describes the kinds of debug data that you need to provide based on the kind of problem you are experiencing.
This section contains the following tasks:
To Collect Required Debug Data for Any Portal Server Problem
To Collect Debug Data on Portal Server Installation Problems
To Collect Debug Data on a Hung or Unresponsive Portal Server Process
All problems described in this technical note need basic information collected about when the problem occurred and about the system having the problem. Use this task to collect that basic information.
For problems with Portal Server Secure Remote Access (gateway), you need to collect data from both the portal and gateway hosts if they are separate (the usual configuration in a production environment). If possible, provide the output from Sun Explorer Data Collector (SUNWexplo) of the machine where the problem occurred.
Note the day(s) and time(s) the problem occurred.
Provide a graphical representation of your deployment. Include all hosts with their IP addresses, host names, operating system versions, and the role each host performs, as well as other important systems such as load balancers and firewalls.
For Solaris OS systems, use the ps6info.sh script to gather all the necessary information. For HP-UX, Linux, and Windows platforms, or if you do not have the ps6info.sh script, continue with the remaining steps.
Note the operating system.
Solaris OS: uname -a
HP-UX: uname -r
Linux: more /etc/redhat-release
Windows: "C:\Program Files\Common Files\Microsoft Shared\MSInfo\msinfo32.exe" /report C:\report.txt
Note the patch level.
Solaris OS: patchadd -p
HP-UX: swlist
Linux: rpm -qa
Windows: Already provided in the C:\report.txt file above.
Get the /etc/opt/SUNWps/.version file (or .version-sra for the Portal Server Secure Remote Access).
Note the web container (Sun Java System Web Server, Sun Java System Application Server, BEA WebLogic, or IBM WebSphere).
Get the log files.
/var/opt/SUNWps/*
server-root\instance-dir\portal\logs\*
If possible, provide only the relevant extracts of the log files for the time period in which the problem occurred, with enough context to show what else was happening during the error and shortly before it. For relatively short log files, send the entire file. For large, long-running log files, an extract is more appropriate, but include everything logged at the time of the error and at least some lead-in logging from before the error apparently occurred.
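The following is a minimal sketch of one way to cut such an extract from a large log file. The function name extract_with_leadin, the search pattern, and the number of lead-in lines are illustrative assumptions and are not part of Portal Server. Adapt the pattern to the timestamp or error text that actually appears in your logs.

```shell
#!/bin/sh
# extract_with_leadin FILE PATTERN LEAD
# Print FILE from LEAD lines before the first line matching PATTERN
# through the end of the file. Returns 1 if PATTERN never matches.
extract_with_leadin() {
    file=$1
    pattern=$2
    lead=$3
    # Line number of the first match (empty if there is none).
    first=$(grep -n -- "$pattern" "$file" | head -n 1 | cut -d: -f1)
    [ -n "$first" ] || return 1
    # Step back LEAD lines, but never before the first line of the file.
    start=$((first - lead))
    [ "$start" -ge 1 ] || start=1
    sed -n "${start},\$p" "$file"
}
```

For example, extract_with_leadin logfile 'ERROR' 200 > /tmp/extract.log keeps 200 lines of lead-in before the first line that contains ERROR. Both the file name and the output path here are placeholders.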
Get the configuration file.
more /etc/opt/SUNWps/.version (or .version-sra for Secure Remote Access)
more server-root\.version (or .version-sra for Secure Remote Access)
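When the ps6info.sh script is not available, the operating system and patch level commands from this task can be selected by platform. The sketch below only prints the commands for a given platform name. The function name debug_cmds is hypothetical, and the assumption that uname -s reports SunOS, HP-UX, or Linux on your hosts should be checked before you rely on it.

```shell
#!/bin/sh
# debug_cmds PLATFORM
# Print the OS-version command and the patch-level command from this task
# for PLATFORM, where PLATFORM is assumed to be the output of `uname -s`.
debug_cmds() {
    case "$1" in
        SunOS) printf '%s\n' 'uname -a' 'patchadd -p' ;;
        HP-UX) printf '%s\n' 'uname -r' 'swlist' ;;
        Linux) printf '%s\n' 'more /etc/redhat-release' 'rpm -qa' ;;
        *)     echo "unsupported platform: $1" >&2; return 1 ;;
    esac
}
```

Running debug_cmds "$(uname -s)" lists the two commands for the current host so that you can review them before running them and saving their output.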
Follow these steps if you are unable to complete the installation or if you get a “failed” installation status for Portal Server.
Consult the following troubleshooting information:
Sun Java System Portal Server 2005Q4: Chapter 9, “Troubleshooting,” in Sun Java Enterprise System 2005Q4 Installation Guide for UNIX
Sun Java System Portal Server 2005Q1: Chapter 13, “Troubleshooting,” in Sun Java Enterprise System 2005Q1 Installation Guide
Sun Java System Portal Server 2004Q2: Chapter 11, “Troubleshooting,” in Sun Java Enterprise System 2004Q2 Installation Guide
Sun ONE Portal Server: Chapter 9, “Troubleshooting Installation Problems,” in Sun Java Enterprise System 2003Q4 Installation Guide
If the problem persists after using this troubleshooting information, then continue with this procedure to collect the necessary data for the Sun Support Center.
Collect the general system information as explained in To Collect Required Debug Data for Any Portal Server Problem.
Specify if this is a first-time installation or a Hot Fix installation on a pre-existing Sun ONE Portal Server instance.
Get the installation logs.
On Solaris OS systems and Sun ONE Portal Server (Portal Server 6.0 and 6.1) systems, get the following logs:
/var/sadm/install/logs/pssetup.install.pid/*
On Windows systems, if you chose the Config Later option during the installation, provide the following log files:
server-root\instance_dir\portal\config\logs\*
Get the install error messages.
/var/sadm/install/logs
The log file names start with Java_Enterprise_System*_install.Bdatetime, where date and time correspond to the failed installation (for example, B12161532).
/var/opt/sun/install/logs
The log file names start with Java_Enterprise_System*_install.Bdatetime, where date and time correspond to the failed installation (for example, B12161532).
C:\Documents and Settings\current-user\Local Settings\Temp
The log file names start with MSI*.log (usually a text file). The asterisk (*) stands for a random number assigned to each MSI-based setup.
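On UNIX and Linux hosts, the installer logs for one failed run can be picked out by the Bdatetime suffix in their names. This is a sketch only. The function name list_install_logs is hypothetical, and it assumes the log names follow the Java_Enterprise_System*_install.Bdatetime pattern described above.

```shell
#!/bin/sh
# list_install_logs DIR [STAMP]
# List installer logs in DIR whose names match
# Java_Enterprise_System*_install.B<STAMP>*.
# With no STAMP, logs from every installation run are listed.
list_install_logs() {
    dir=$1
    stamp=${2:-}
    find "$dir" -type f -name "Java_Enterprise_System*_install.B${stamp}*" | sort
}
```

For example, list_install_logs /var/sadm/install/logs 12161532 lists only the logs of the run stamped B12161532.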
A process hang occurs when one of the Portal Server processes no longer responds to requests although the process is still running locally. The Portal Server processes are:
appservd: When Portal Server is hosted on Sun Java System Application Server
webservd: When Portal Server is hosted on Sun Java System Web Server
java process: Secure Remote Access (gateway)
Collect the general system information as explained in To Collect Required Debug Data for Any Portal Server Problem.
(Secure Remote Access only) Consult the following information.
http://sunsolve.sun.com/search/document.do?assetkey=1-25-75583-1&searchclause=75583
If the problem persists after using this information, then continue with this procedure to collect the necessary data for the Sun Support Center.
(Secure Remote Access only) Can you connect to the Portal Server host when you bypass the gateway host?
If yes, the gateway java process is hung. Collect the debug data that follows on this process and not on the Portal Server container process.
Get the pid of the Web Server process.
ps -ef | grep uxwdog
The result gives you the PID of the uxwdog daemon, for example, 11449.
ps -ef | grep ns-httpd
C:\windbg-root>tlist.exe
For example, on Solaris OS:
# ptree 11449
11449 ./uxwdog -d /prods/crypto/60SP6/https-sun/config
  11450 ns-httpd -d /prods/crypto/60SP6/https-sun/config
    11451 ns-httpd -d /prods/crypto/60SP6/https-sun/config
You want to gather data on the process with the highest PID, which in this example is 11451. The Web Server process is either ns-httpd or webservd, depending on the Web Server version.
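The highest-PID rule above can be applied mechanically to ptree output. The following sketch assumes that each process line starts with its PID, as in the example. The function name highest_pid is hypothetical.

```shell
#!/bin/sh
# highest_pid
# Read ptree output on stdin and print the largest PID, taken from the
# first whitespace-separated field of each line. Lines whose first field
# is not a number (such as the "# ptree ..." prompt line) are ignored.
highest_pid() {
    awk '$1 ~ /^[0-9]+$/ && $1 + 0 > max { max = $1 + 0 }
         END { if (max) print max }'
}
```

On Solaris OS, ptree 11449 | highest_pid would print 11451 for the process tree shown in the example.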
Note the day and time that the process hang occurred.
Get the output of the following command.
netstat -an | grep web-port (or gateway-port)
netstat -an | find "web-port" (or "gateway-port")
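A plain grep on the port number also matches longer ports that contain it (for example, a search for 80 matches 8080). The sketch below is a stricter filter. It assumes that netstat -an prints addresses that end in .port (as on Solaris OS) or :port (as on Linux and Windows). The function name filter_port is hypothetical.

```shell
#!/bin/sh
# filter_port PORT
# Read `netstat -an` output on stdin and keep only the lines in which an
# address ends in .PORT or :PORT, followed by whitespace or end of line.
filter_port() {
    grep -E "[.:]$1([[:space:]]|\$)"
}
```

For example, netstat -an | filter_port 80 keeps connections on port 80 but drops those on port 8080.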
For Solaris OS systems, the iwshang script gathers all of the following debug data for you, except the output of the pkg_app script.
You must run the pkg_app script as indicated on one of the core files generated by the iwshang script. Be sure to launch the iwshang script on the correct PID. For HP-UX, Linux, and Windows platforms, or if you do not have the iwshang script, continue with the remaining steps. See To Run the iwshang Script for more information.
Run the following commands and save the output.
ps -ef | grep server-root
vmstat 5 5
iostat -x
top
uptime

ps -aux | grep server-root
vmstat 5 5
iostat -x
top
sar

ps -aux | grep server-root
vmstat 5 5
top
uptime
sar
Obtain the WEB process PID: C:\windbg-root>tlist.exe
Obtain process details of the WEB running process PID: C:\windbg-root>tlist.exe web-pid
Get the swap information.
Solaris OS: swap -l
HP-UX: swapinfo
Linux: free
Windows: Already provided in C:\report.txt as described in To Collect Required Debug Data for Any Portal Server Problem.
For UNIX and Linux systems, if you are able to isolate the hanging process, get the following debug data for that process. Otherwise, get the following data for each of the Web Server processes. For Windows systems, get the following data for the webservd.exe or ns-httpd.exe process.
Get the output of the following command.
Solaris OS: truss -ealf -rall -wall -vall -o /tmp/web-pid.truss -p web-pid
HP-UX: tusc -v -fealT -rall -wall -o /tmp/web-pid.tusc -p web-pid
Linux: strace -fv -o /tmp/web-pid.strace -p web-pid
Windows: Use DebugView: http://www.sysinternals.com/Utilities/DebugView.html
Wait one minute after launching the appropriate command (truss, strace, tusc, or DebugView) then stop it by pressing Control-C in the terminal where you launched the command.
Get the Directory Server Access, Errors, and Audit logs used by Portal Server.
server-root/slapd-identifier/logs/access
server-root/slapd-identifier/logs/errors
server-root/slapd-identifier/logs/audit (if enabled)
server-root\slapd-identifier\logs\access
server-root\slapd-identifier\logs\errors
server-root\slapd-identifier\logs\audit (if enabled)
Get core files and the output of the following commands.
In a process hang situation, it is helpful to compare several core files to review the state of the threads over time. To avoid overwriting a core file, copy it to a new name, wait approximately one minute, and then rerun the following commands. Do this three times to obtain three core files.
For HP-UX, you need the following two patches to use the gcore command: PHKL_31876 and PHCO_32173. If you cannot install these patches, use the HP-UX /opt/langtools/bin/gdb command, version 3.2 or later, or the dumpcore command.
cd server-root/bin/https/bin
gcore -o /tmp/web_process-core web-pid
Archive the result of the pkg_app script:
./pkg_app.ksh PID-of-application corefile
The output of the pkg_app script is required to analyze the core files.
Make sure that you have set the size of the core dumps to unlimited by running the ulimit command, and that the user is not nobody. Also, check the coreadm command for additional control. See 1.6 Configuring Solaris OS to Generate Core Files if a core file is not generated.
# gcore -p web-pid
(gdb) attach web-pid
Attaching to process web-pid
No executable file name was specified
(gdb) dumpcore
Dumping core to the core file core.web-pid
(gdb) quit
The program is running. Quit anyway (and detach it)? (y or n) y
Detaching from program: , process web-pid
The file core.web-pid should be generated in the https-instance/config directory.
cd server-root/bin/https/bin
# gdb
(gdb) attach web-pid
Attaching to process web-pid
No executable file name was specified
(gdb) gcore
Saved corefile core.web-pid
(gdb) backtrace
(gdb) quit
Get the WEB process PID:
C:\windbg-root>tlist.exe
Generate a crash dump on the WEB running process PID:
C:\windbg-root>adplus.vbs -hang -p web-pid -o C:\crashdump_dir
For Windows, provide the complete generated folder under C:\crashdump_dir.
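On UNIX and Linux hosts, the three core files taken about one minute apart must not overwrite one another. The sketch below renames each core file rather than copying it, which has the same effect and avoids doubling the disk space used. The function name keep_core and the .snapN suffix are hypothetical. The driver loop is shown only as comments because the gcore output name and the PID depend on your platform.

```shell
#!/bin/sh
# keep_core FILE N
# Rename FILE to FILE.snapN so that the next core dump cannot overwrite it.
keep_core() {
    [ -f "$1" ] || { echo "no core file: $1" >&2; return 1; }
    mv "$1" "$1.snap$2"
}

# Hypothetical driver (assumes gcore -o PREFIX PID writes PREFIX.PID):
#   for n in 1 2 3; do
#       gcore -o /tmp/web_process-core "$web_pid"
#       keep_core "/tmp/web_process-core.$web_pid" "$n"
#       sleep 60
#   done
```

Check the name of the file that your gcore or gdb actually writes before you adapt the loop, because the naming differs between platforms.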
Get the Access Manager configuration file.
/opt/SUNWam/lib/AMConfig.properties
access-manager-server-root\lib\AMConfig.properties
Get the Access Manager log files.
/var/opt/SUNWam/*
access-manager-server-root\debug\*
Get network trace files between the gateway and the portal hosts, and between the client and the portal host.
Make sure that all the data collection is done over the same time frame in which you had the problem. Try to indicate the hung process if possible.
Indicate clearly all IP addresses and host names for each component to correctly read these network traces.
Solaris OS: snoop -V -vvv -d interface -o /tmp/gw-snoop-portal ip-portal-server
HP-UX: tcpdump -i interface -w /tmp/gw-snoop-portal ip-portal-server
The tcpdump command is available here: http://hpux.connect.org.uk
You can also use the native nettl command.
Linux: tethereal -V -F snoop -i interface -w /tmp/gw-snoop-portal ip-portal-server
The tethereal command should already be installed. If not, get it from the following location: http://www.ethereal.com. You can also use the ethereal GUI or the tcpdump command.
Windows: tethereal -vvv -i interface -w /tmp/gw-snoop-portal host ip-portal-server
The tethereal command is available at the following location: http://www.ethereal.com. You can also use the ethereal GUI.
Use this task to collect data when a Portal Server process has stopped (crashed) unexpectedly. Run all the commands on the actual machine where the core file(s) were generated.
Collect the general system information as explained in To Collect Required Debug Data for Any Portal Server Problem.
Get the output of the following commands.
ps -ef | grep server-root
vmstat 5 5
iostat -x
top
uptime

ps -aux | grep server-root
vmstat 5 5
iostat -x
top
sar

ps -aux | grep server-root
vmstat 5 5
top
uptime
sar
Obtain the PROXY process PID: C:\windbg-root>tlist.exe
Obtain process details of the PROXY running process PID: C:\windbg-root>tlist.exe proxy-pid
Get the swap information.
Solaris OS: swap -l
HP-UX: swapinfo
Linux: free
Windows: Already provided in C:\report.txt as described in To Collect Required Debug Data for Any Portal Server Problem.
Get the system logs.
/var/adm/messages
/var/log/syslog
/var/adm/syslog/syslog.log
Event log files: Start -> Settings -> Control Panel -> Event Viewer -> Select Log
Then click Action -> Save log file as
Get the Directory Server Access, Errors, and Audit logs used by Portal Server.
server-root/slapd-identifier/logs/access
server-root/slapd-identifier/logs/errors
server-root/slapd-identifier/logs/audit (if enabled)
server-root\slapd-identifier\logs\access
server-root\slapd-identifier\logs\errors
server-root\slapd-identifier\logs\audit (if enabled)
Get the Access Manager configuration file.
/opt/SUNWam/lib/AMConfig.properties
access-manager-server-root\lib\AMConfig.properties
Get the Access Manager log files.
/var/opt/SUNWam/*
access-manager-server-root\debug\*
Get core files (called “Crash Dumps” by Windows).
See 1.6 Configuring Solaris OS to Generate Core Files if a core file was not generated.
Core dumps are turned off by default in the /etc/profile file. You can make per user changes by editing your ~/.bash_profile file. Look for the following line:
ulimit -S -c 0 > /dev/null 2>&1
You can either comment out the entire line to set no limit on the size of the core files or set your own maximum size.
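Before editing, it can help to locate every active line that sets the core size limit to zero. The following is a sketch under the assumption that the profile uses a ulimit ... -c 0 line like the one shown above. The function name core_limit_lines is hypothetical.

```shell
#!/bin/sh
# core_limit_lines FILE
# Print, with line numbers, every uncommented line in FILE that sets the
# core file size limit to 0 with `ulimit ... -c 0`.
core_limit_lines() {
    grep -nE '^[[:space:]]*ulimit[[:space:]].*-c[[:space:]]+0([[:space:]]|$)' "$1"
}
```

For example, run core_limit_lines /etc/profile and core_limit_lines ~/.bash_profile. No output means that no active line of this form was found in that file.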
Generate a crash dump during a crash of Portal Server by using the following commands:
Get the PORTAL process PID: C:\windbg-root>tlist.exe
Generate a crash dump when the PORTAL process crashes: C:\windbg-root>adplus.vbs -crash -FullOnFirst -p portal-pid -o C:\crashdump_dir
The adplus.vbs command watches portal-pid until the process crashes and then generates the dump (.dmp) file. Provide the complete generated folder under C:\crashdump_dir.
If you didn't install the Debugging Tools for Windows, you can use the drwtsn32.exe -i command to select Dr. Watson as the default debugger. Use the drwtsn32.exe command, check all options, and choose the path for crash dumps. Then provide the dump and the drwtsn32.log files.
(Solaris OS only) For each core file, provide the output of the following commands.
file corefile
pstack corefile
pmap corefile
pflags corefile
(Solaris OS only) Archive the result of the script pkg_app (one core file is sufficient).
./pkg_app.ksh PID-of-application corefile
The Sun Support Center must have the output from the pkg_app script to properly analyze the core file(s).