Sun MPI 4.0 Programming and Reference Guide

Debugging

Debugging parallel programs is notoriously difficult, since you are in effect debugging many distinct programs executing simultaneously. Even if the application is an SPMD one (single program, multiple data), each instance may be executing a different line of code at any instant. Prism eases the debugging process considerably.

Prism is recommended for debugging in the Sun HPC ClusterTools environment. However, if you need to debug multithreaded Sun MPI programs at the thread level, you should see "Debugging With dbx". See also "Debugging With MPE", if you are using the multiprocessing environment (MPE) from Argonne National Laboratory.

Debugging With Prism

This section provides a brief introduction to the Prism development environment. For complete information about Prism, see the Prism 6.0 User's Guide.

Prism can debug only one Sun MPI job at a time. Therefore, if an MPI job spawns or connects to another job (using MPI_Comm_accept and MPI_Comm_connect to implement client/server communication, for example, or MPI_Comm_spawn to spawn jobs), the Prism session nonetheless has control of only the original MPI job to which it is attached. For example, a Prism session debugging a server job cannot also debug the clients of that job.
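
For illustration, the following minimal sketch (the executable name "worker" is hypothetical) uses MPI_Comm_spawn to launch a second job; a Prism session attached to the parent job would not control the spawned processes.

    #include <stdio.h>
    #include "mpi.h"
    
    int
    main( int argc, char **argv )
    {
      MPI_Comm children;      /* intercommunicator to the spawned job */
      int errcodes[2];
    
      MPI_Init(&argc, &argv);
    
      /* Spawn two copies of a separate executable, "worker" (a hypothetical
         name); the spawned processes form a new job with its own job id. */
      MPI_Comm_spawn("worker", MPI_ARGV_NULL, 2, MPI_INFO_NULL, 0,
                     MPI_COMM_WORLD, &children, errcodes);
    
      /* ... communicate with the spawned job through the intercommunicator ... */
    
      MPI_Finalize();
      return 0;
    }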

To use Prism to debug a Sun MPI program, the program must be written in the SPMD (single program, multiple data) style -- that is, all processes that make up a Sun MPI program must be running the same executable.


Note -

MPI_Comm_spawn_multiple can spawn processes running several different executables under a single job id. You cannot use Prism to debug jobs with different executables that have been spawned with this command.


Starting Up Prism


Note -

To debug a Sun MPI program with Prism, you need to have compiled your program using one of the compilers included in either the Sun Performance WorkShop Fortran or Sun Performance WorkShop C++/C suite of tools.
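
For example, a C program could be compiled for debugging along these lines (a sketch that assumes the mpcc compiler wrapper supplied with Sun HPC ClusterTools and the -lmpi link flag; the exact options may differ on your installation):

% mpcc -g -o foo foo.c -lmpi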


To start Prism on a Sun MPI program, use the -n option to specify the number of processes you want to run. For example,

% prism -n 4 foo

launches Prism on executable foo with four processes.

This starts up a graphical version of Prism with your program loaded. You can then debug and visualize data in your Sun MPI program.

You can also attach Prism to running processes. First determine the job id (not the individual process id), or jid, using either bjobs (in LSF) or mpps (in the CRE). (See the LSF Batch User's Guide for further information about bjobs. See the Sun MPI 4.0 User's Guide: With CRE for further information about mpps.) Then specify the jid on the command line along with the -n (or -np, -c, -p) option:

% prism -n 4 foo 12345

This will launch Prism and attach it to the processes running in job 12345.


Note -

To run graphical Prism, you must be running Solaris 2.6 or Solaris 7 with either OpenWindows(TM) or the Common Desktop Environment (CDE), and with your DISPLAY environment variable set correctly. See the Prism 6.0 User's Guide for information.
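
For example, a C-shell user displaying on a workstation named myws (a hypothetical host name) might set the variable as follows:

% setenv DISPLAY myws:0.0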


One important feature of Prism is that it lets you debug the Sun MPI program at any level of detail. You can look at the program as a whole or at subsets of processes within the program (for example, those that have an error condition), or at individual processes, all within the same debugging session. For complete information, see the Prism 6.0 User's Guide.

Debugging With dbx

To debug your multithreaded program at the thread level, you can use dbx. The following example illustrates this method of debugging with LSF Suite.

To Debug Threads With dbx
  1. Add a variable to block the process until you attach with dbx.

    In this sample program, simple-comm, the wait_for_dbx variable is set to 1 to create a wait loop. It is placed before the function or functions to be debugged.


    Example 3-3 Debugging a Multithreaded Sun MPI Program With dbx

    #include <stdio.h>
    #include <stdlib.h>      /* for exit() */
    #include "mpi.h"
    
    int
    main( int argc, char **argv )
    {
      MPI_Comm comm_dup;
      int error;
      volatile int wait_for_dbx = 1;   /* volatile so dbx can clear it */
    
      if ((error = MPI_Init(&argc, &argv)) != MPI_SUCCESS) {
        printf("Bad Init\n");
        exit(-1);
      }
    
      /* Spin here until you attach with dbx and assign wait_for_dbx = 0. */
      while (wait_for_dbx);
    
      error = MPI_Comm_dup(MPI_COMM_WORLD, &comm_dup);
      if (error != MPI_SUCCESS) {
        printf("Bad Dup\n");
        exit(-1);
      }
    
      error = MPI_Comm_free(&comm_dup);
      if (error != MPI_SUCCESS) {
        printf("Bad Comm free\n");
        exit(-1);
      }
    
      MPI_Finalize();
      return 0;
    }

  2. Compile the code, then run it.

    After compiling the program, run it using bsub. (See the LSF Batch User's Guide for more information.)

    % bsub -n 4 -Ip simple-comm
    
  3. Identify the processes to which you want to attach the debugger.

    Use bjobs to obtain information about the processes in the task. (See the LSF Batch User's Guide for more about getting information about processes.)

  4. Attach the debugger to the processes that you would like to debug.

    Attach the debugger to the processes. If you want to debug only a subset of the processes, set up a conditional so that the while statement executes only in the process(es) you intend to debug; a sketch of one approach follows these steps.

    % dbx simple-comm 10838
    Attached to process 10838 with 2 LWPs
    t@1 (l@1) stopped in main at line 18 in file "simple-comm.c"
       18     while (wait_for_dbx);
  5. Set the variable so that the process unblocks.

    At the dbx prompt, use assign to change the value of the variable (here wait_for_dbx) and, hence, unblock the processes.

    (dbx) assign wait_for_dbx = 0
    
  6. Debug the processes.

    After you have attached to the processes and unblocked them, you can debug them as you normally would with dbx.
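
The following is a minimal sketch of the conditional mentioned in step 4. It blocks only rank 0 (an arbitrary choice), so that you need to attach dbx only to that process; the variable name is illustrative.

    #include <stdio.h>
    #include "mpi.h"
    
    int
    main( int argc, char **argv )
    {
      int rank;
      volatile int wait_for_dbx;
    
      MPI_Init(&argc, &argv);
      MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    
      /* Block only the process(es) to be debugged; here, rank 0 only. */
      wait_for_dbx = (rank == 0);
      while (wait_for_dbx);
    
      /* ... code to be debugged ... */
    
      MPI_Finalize();
      return 0;
    }

After attaching to the blocked process and assigning wait_for_dbx = 0 as in step 5, use the dbx cont command to resume execution.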

Debugging With MPE

The multiprocessing environment (MPE) available from Argonne National Laboratory includes a debugger that can also be used for debugging at the thread level. For information about obtaining and building MPE, see "MPE: Extensions to the Library".