Sun Gathering Debug Data for Sun Java System Portal Server

To Collect Debug Data on a Hung or Unresponsive Portal Server Process

A process hang occurs when one of the Portal Server processes stops responding to requests while the process is still running locally. The Portal Server processes are:

  1. Collect the general system information as explained in To Collect Debug Data on Portal Server Installation Problems.

  2. (Secure Remote Access only) Consult the following information.


    http://sunsolve.sun.com/search/document.do?assetkey=1-25-75583-1&searchclause=75583

    If the problem persists after using this information, then continue with this procedure to collect the necessary data for the Sun Support Center.

  3. (Secure Remote Access only) Can you connect to the Portal Server host when you bypass the gateway host?

    If yes, the gateway java process is hung. Collect the debug data that follows on this process and not on the Portal Server container process.

  4. Get the pid of the Web Server process.

    Solaris OS and HP-UX

    ps -ef | grep uxwdog

    The result gives you the PID of the uxwdog daemon, for example, 11449.

    Linux

    ps -ef | grep ns-httpd

    Windows

    C:\windbg-root>tlist.exe

    For example, on Solaris OS:


    # ptree 11449
    11449 ./uxwdog -d /prods/crypto/60SP6/https-sun/config
       11450 ns-httpd -d /prods/crypto/60SP6/https-sun/config
         11451 ns-httpd -d /prods/crypto/60SP6/https-sun/config

    Gather data on the process with the highest PID, which in this example is 11451. The Web Server process is either ns-httpd or webservd, depending on the Web Server version.
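
    As a minimal sketch of picking that PID automatically, the function below reads ptree-style output and prints the largest PID it finds. The name highest_pid is hypothetical, and the sketch assumes that the PID is the first field on each line, as in the example above.

```sh
#!/bin/sh
# highest_pid (hypothetical helper): read ptree-style output on stdin
# and print the largest PID.
# Assumes the PID is the first whitespace-separated field on each line.
highest_pid() {
    awk '$1 ~ /^[0-9]+$/ && $1 + 0 > max { max = $1 + 0 }
         END { if (max > 0) print max }'
}

# Example: the ptree output from this step, fed in as sample text
printf '%s\n' \
    '11449 ./uxwdog -d /prods/crypto/60SP6/https-sun/config' \
    '   11450 ns-httpd -d /prods/crypto/60SP6/https-sun/config' \
    '     11451 ns-httpd -d /prods/crypto/60SP6/https-sun/config' |
    highest_pid
```

    On a live Solaris OS system, the equivalent call would be: ptree 11449 | highest_pid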

  5. Note the day and time that the process hang occurred.

  6. Get the output of the following command.

    UNIX and Linux

    netstat -an | grep web-port (or gateway-port)

    Windows

    netstat -an | findstr web-port (or gateway-port)
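
    On UNIX and Linux, the saved netstat output can be long. The sketch below is one way to summarize it by connection state for a given port. The name count_states is hypothetical, and the sketch assumes that the state is the last field on each line and that addresses are written as host.port or host:port. It counts any line that mentions the port, whether as the local or the remote port.

```sh
#!/bin/sh
# count_states PORT (hypothetical helper): read `netstat -an` output on
# stdin and count the lines that mention PORT, grouped by the last field
# (assumed to be the TCP state).
# Assumes addresses are written host.port (Solaris) or host:port (Linux).
count_states() {
    awk -v port="$1" '
        $0 ~ ("[.:]" port "[ \t]") { count[$NF]++ }
        END { for (s in count) print s, count[s] }
    ' | sort
}

# Usage: netstat -an | count_states web-port
```

    Keep the full netstat output as well. The summary is only a reading aid.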

  7. For Solaris OS systems, the iwshang script gathers all the following debug data for you, except the output of the pkg_app script.

    You must run the pkg_app script as indicated on one of the core files generated by the iwshang script. Be sure to run the iwshang script against the correct PID. For HP-UX, Linux, and Windows platforms, or if you do not have the iwshang script, continue with the remaining steps. See To Run the iwshang Script for more information.

  8. Run the following commands and save the output.

    Solaris OS

    ps -ef | grep server-root
    vmstat 5 5
    iostat -x
    top
    uptime

    HP-UX

    ps -aux | grep server-root
    vmstat 5 5
    iostat -x
    top
    sar

    Linux

    ps -aux | grep server-root
    vmstat 5 5
    top
    uptime
    sar

    Windows

    Obtain the WEB process PID: C:\windbg-root>tlist.exe

    Obtain process details of the WEB running process PID: C:\windbg-root>tlist.exe web-pid

  9. Get the swap information.

    Solaris OS

    swap -l

    HP-UX

    swapinfo

    Linux

    free

    Windows

    Already provided in C:\report.txt as described in To Collect Debug Data on Portal Server Installation Problems.

  10. For UNIX and Linux systems, if you are able to isolate the hanging process, get the following debug data for that process. Otherwise, get the following data for each of the Web Server processes. For Windows systems, get the following data for the webservd.exe or ns-httpd.exe process.

    1. For Solaris OS only, using the PID obtained in Step 4, run each of the following commands five times, once every 10 seconds, and save the output of each run:

      pstack web-pid

      pmap -x web-pid

    2. For Solaris OS only, get the output of the following commands:

      prstat -L -p web-pid

      pmap web-pid

      pfiles web-pid
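
    The repeated sampling in sub-step 1 can be scripted. The following is a minimal sketch, not part of the product: the function name sample_cmd and the output file names are hypothetical. It runs a command a fixed number of times at a fixed interval and appends each timestamped result to one file.

```sh
#!/bin/sh
# sample_cmd COUNT INTERVAL OUTFILE COMMAND [ARGS...]  (hypothetical helper)
# Run COMMAND COUNT times, INTERVAL seconds apart, appending each
# timestamped result (stdout and stderr) to OUTFILE.
sample_cmd() {
    count=$1 interval=$2 outfile=$3
    shift 3
    i=1
    while [ "$i" -le "$count" ]; do
        echo "=== sample $i: $(date) ===" >> "$outfile"
        "$@" >> "$outfile" 2>&1
        # Sleep between samples, but not after the last one
        [ "$i" -lt "$count" ] && sleep "$interval"
        i=$((i + 1))
    done
}

# Usage on Solaris OS (web-pid is the PID from Step 4):
#   sample_cmd 5 10 /tmp/pstack.out pstack web-pid
#   sample_cmd 5 10 /tmp/pmap.out pmap -x web-pid
```

    Writing all five samples to one file with a timestamp header keeps the runs in order, which makes it easier to compare thread stacks over time.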

  11. Get the output of the following command.

    Solaris OS

    truss -ealf -rall -wall -vall -o /tmp/web-pid.truss -p web-pid

    HP-UX

    tusc -v -fealT -rall -wall -o /tmp/web-pid.tusc -p web-pid

    Linux

    strace -fv -o /tmp/web-pid.strace -p web-pid

    Windows

    Use DebugView: http://www.sysinternals.com/Utilities/DebugView.html


    Note –

    Wait one minute after launching the appropriate command (truss, strace, tusc, or DebugView) then stop it by pressing Control-C in the terminal where you launched the command.


  12. Get the Directory Server access, errors, and audit logs used by Portal Server.

    UNIX and Linux

    server-root/slapd-identifier/logs/access
    server-root/slapd-identifier/logs/errors
    server-root/slapd-identifier/logs/audit (if enabled)

    Windows

    server-root\slapd-identifier\logs\access
    server-root\slapd-identifier\logs\errors
    server-root\slapd-identifier\logs\audit (if enabled)

  13. Get core files and the output of the following commands.

    In a process hang situation, it is helpful to compare several core files to review the state of the threads over time. To avoid overwriting a core file, copy it to a new name, wait approximately one minute, then rerun the following commands. Do this three times to obtain three core files.


    Note –

    For HP-UX, you need the following two patches to use the gcore command: PHKL_31876 and PHCO_32173. If you cannot install these patches, use the HP-UX /opt/langtools/bin/gdb command, version 3.2 or later, or the dumpcore command.


    Solaris OS

    cd server-root/bin/https/bin
    gcore -o /tmp/web_process-core web-pid

    Archive the result of the pkg_app script:

    ./pkg_app.ksh PID-of-application corefile

    The output of the pkg_app script is required to analyze the core files.


    Note –

    Make sure that you have set the size of the core dumps to unlimited by running the ulimit command, and that the user is not nobody. Also, check the coreadm command for additional control. If a core file is not generated, see 1.6 Configuring Solaris OS to Generate Core Files.
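
    As an alternative to copying each core file by hand on Solaris OS, the three captures can be scripted so that each run writes to its own name. The following is a minimal sketch under stated assumptions: the function name take_cores and the GCORE override are hypothetical, and the override exists only so that the commands can be printed in a dry run before they are run for real.

```sh
#!/bin/sh
# take_cores PID COUNT INTERVAL PREFIX  (hypothetical helper)
# Run gcore COUNT times, INTERVAL seconds apart, giving each run its own
# output name (PREFIX.1, PREFIX.2, ...) so that no core file is overwritten.
# GCORE is a hypothetical override used only to allow a dry run.
take_cores() {
    pid=$1 count=$2 interval=$3 prefix=$4
    n=1
    while [ "$n" -le "$count" ]; do
        ${GCORE:-gcore} -o "$prefix.$n" "$pid" || return 1
        # Wait between captures, but not after the last one
        [ "$n" -lt "$count" ] && sleep "$interval"
        n=$((n + 1))
    done
}

# Dry run: print the gcore commands instead of running them
GCORE="echo gcore" take_cores web-pid 3 0 /tmp/web_process-core
```

    For a real capture, replace web-pid with the PID from Step 4, use an interval of 60, and omit the GCORE setting. Run the pkg_app script on one of the resulting core files as described above.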


    HP-UX

    # gcore -p web-pid
    (gdb) attach web-pid
    Attaching to process web-pid
    No executable file name was specified
    (gdb) dumpcore
    Dumping core to the core file core.web-pid
    (gdb) quit
    The program is running. Quit anyway (and detach it)? (y or n) y
    Detaching from program: , process web-pid
    

    The file core.web-pid should be generated in the https-instance/config directory.

    Linux

    cd server-root/bin/https/bin
    # gdb
    (gdb) attach web-pid
    Attaching to process web-pid
    No executable file name was specified
    (gdb) gcore
    Saved corefile core.web-pid
    
    (gdb) backtrace
    (gdb) quit
    
    Windows

    Get the WEB process PID:

    C:\windbg-root>tlist.exe

    Generate a crash dump on the WEB running process PID:

    C:\windbg-root>adplus.vbs -hang -p web-pid -o C:\crashdump_dir


    Note –

    For Windows, provide the complete generated folder under C:\crashdump_dir.


  14. Get the Access Manager configuration file.

    UNIX and Linux

    /opt/SUNWam/lib/AMConfig.properties

    Windows

    access-manager-server-root\lib\AMConfig.properties

  15. Get the Access Manager log files.

    UNIX and Linux

    /var/opt/SUNWam/*

    Windows

    access-manager-server-root\debug\*

  16. Get network trace files between the gateway and the portal hosts, and between the client and the portal host.

    Make sure that all the data collection is done over the same time frame in which you had the problem. Try to indicate the hung process if possible.


    Note –

    Indicate clearly all IP addresses and host names for each component to correctly read these network traces.


    Solaris OS

    snoop -V -vvv -d interface -o /tmp/gw-snoop-portal ip-portal-server

    HP-UX

    tcpdump -i interface -w /tmp/gw-snoop-portal ip-portal-server

    The tcpdump command is available here: http://hpux.connect.org.uk

    You can use the native nettl command too.

    Linux

    tethereal -V -F snoop -i interface -w /tmp/gw-snoop-portal ip-portal-server


    Note –

    The tethereal command should already be installed. If not, get it from the following location: http://www.ethereal.com. You can also use the ethereal GUI or the tcpdump command.


    Windows

    tethereal -vvv -i interface -w /tmp/gw-snoop-portal host ip-portal-server


    Note –

    The tethereal command is available at the following location: http://www.ethereal.com. You can also use the ethereal GUI.