CHAPTER 3

Getting Started

This chapter explains how to develop, compile and link, execute, and debug a Sun MPI program. The chapter focuses on what is specific to the Sun MPI implementation and does not repeat information that can be found in related documents. Information about programming with the Sun MPI I/O routines is in Chapter 4.


Header Files

An include directive must appear at the top of any program that calls Sun MPI routines. For C and C++ programs, use the following syntax:

#include <mpi.h>

For Fortran programs, use this syntax:

INCLUDE 'mpif.h'

These lines enable the program to access the Sun MPI version of the mpi header file, which contains the definitions, macros, and function prototypes required when compiling the program. Ensure that you are referencing the Sun MPI include file.

The include files are usually found in /opt/SUNWhpc/include/ or /opt/SUNWhpc/include/v9/ for SPARC-based systems. For x64-based systems, the files reside in /opt/SUNWhpc/include/amd64. If the compiler cannot find them, verify that they exist and are accessible from the machine on which you are compiling your code. The location of the include file is specified by the -I compiler option (see Compiling and Linking).
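
For example, if you invoke the C compiler directly rather than through the mpcc utility described in Compiling and Linking, you might point it at the Sun MPI header explicitly. This is an illustration only; myprog.c is a placeholder file name:

% cc -I/opt/SUNWhpc/include -c myprog.c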

Sample Code

Three simple Sun MPI programs are available in /opt/SUNWhpc/examples/mpi and are included here in their entirety. In the same directory you will find the Readme file, which provides instructions for using the examples, and the make file Makefile.


CODE EXAMPLE 3-1 Simple Sun MPI Program in C: connectivity.c
/*
 * Test the connectivity between all processes.
 */
 
#pragma ident "@(#)connectivity.c 1.1 06/03/02"
 
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <netdb.h>
#include <unistd.h>
 
#include <mpi.h>
 
int
main(int argc, char **argv)
{
    MPI_Status  status;
    int         verbose = 0;
    int         rank;
    int         np;	                 /* number of processes in job */
    int         peer;
    int         i;
    int         j;
 
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &np);
 
    if (argc>1 && strcmp(argv[1], "-v")==0)
        verbose = 1;
 
    for (i=0; i<np; i++) {
        if (rank==i) {
            /* rank i sends to and receives from each higher rank */
            for(j=i+1; j<np; j++) {
                if (verbose)
                    printf("checking connection %4d <-> %-4d\n", i, j);
                MPI_Send(&rank, 1, MPI_INT, j, rank, MPI_COMM_WORLD);
                MPI_Recv(&peer, 1, MPI_INT, j, j, MPI_COMM_WORLD, &status);
            }
        } else if (rank>i) {
            /* receive from and reply to rank i */
            MPI_Recv(&peer, 1, MPI_INT, i, i, MPI_COMM_WORLD, &status);
            MPI_Send(&rank, 1, MPI_INT, i, rank, MPI_COMM_WORLD);
        }
    }
 
    MPI_Barrier(MPI_COMM_WORLD);
    if (rank==0)
        printf("Connectivity test on %d processes PASSED.\n", np);
 
    MPI_Finalize();
    return 0;
}

CODE EXAMPLE 3-2 Simple Sun MPI Program in Fortran: monte.f
!
! Estimate pi via Monte-Carlo method.
! 
! Each process sums how many of samplesize random points generated 
! in the square (-1,-1),(-1,1),(1,1),(1,-1) fall in the circle of 
! radius 1 and center (0,0), and then estimates pi from the formula
! pi = (4 * sum) / samplesize.
! The final estimate of pi is calculated at rank 0 as the average of 
! all the estimates.
!
        program monte
 
        include 'mpif.h'
 
        double precision drand
        external drand
 
        double precision x, y, pi, pisum
        integer*4 ierr, rank, np
        integer*4 incircle, samplesize
 
        parameter(samplesize=2000000)
 
        call MPI_INIT(ierr)
        call MPI_COMM_RANK(MPI_COMM_WORLD, rank, ierr)
        call MPI_COMM_SIZE(MPI_COMM_WORLD, np, ierr)
 
!       seed random number generator
        x = drand(2 + 11*rank)
 
        incircle = 0
        do i = 1, samplesize
           x = drand(0)*2.0d0 - 1.0d0     ! generate a random point
           y = drand(0)*2.0d0 - 1.0d0
 
           if ((x*x + y*y) .lt. 1.0d0) then
              incircle = incircle+1       ! point is in the circle
           endif
        end do
 
        pi = 4.0d0 * DBLE(incircle) / DBLE(samplesize)
 
!       sum estimates at rank 0
        call MPI_REDUCE(pi, pisum, 1, MPI_DOUBLE_PRECISION, MPI_SUM,
     &          0, MPI_COMM_WORLD, ierr)
 
        if (rank .eq. 0) then
!          final estimate is the average
           pi = pisum / DBLE(np)
           print '(A,I4,A,F8.6,A)', 'Monte-Carlo estimate of pi by ',
     &          np, ' processes is ', pi, '.'
        endif
 
        call MPI_FINALIZE(ierr)
        end

CODE EXAMPLE 3-3 Simple Sun MPI Program in C++: prime.cc

/*
 * Copyright 2006 Sun Microsystems, Inc. All rights reserved.
 * Use is subject to license terms.
 *
 * Sun, Sun Microsystems, the Sun logo, Sun HPC ClusterTools, Sun PFS,
 * Sun C++, Sun MPI, Prism, Sun Prism, and all Sun-based trademarks and logos,
 * are trademarks or registered trademarks of Sun Microsystems, Inc. in
 * the United States and other countries.
 */
#pragma ident "@(#)prime.cc	1.2	06/03/06 SMI"
#include <stdio.h>
#include <mpi++.h>
#define BUFF_SIZE 10
#define ROOT 0
/*
 * prototypes
 */
int primeset(int);
/*
 * main
 *
 * Description:	Each non-root rank sends a list of numbers to root to
 * 		be tested if any lie in the set of prime numbers. Report
 *		the results.
 */
int
main(int argc, char **argv)
{
    int         rank, size;
    int         list[BUFF_SIZE];
    int         i, j;
    MPI::Status status;

    MPI::Init(argc, argv);
    rank = MPI::COMM_WORLD.Get_rank();
    size = MPI::COMM_WORLD.Get_size();
    if (rank != ROOT) {
        /* create list to be tested */
        for (i = 0; i < BUFF_SIZE; i++)
            list[i] = rank*10 + i;
        /* send list to ROOT and report those numbers that are in the prime set */
        MPI::COMM_WORLD.Send(list, BUFF_SIZE, MPI::INT, ROOT, 22);
        MPI::COMM_WORLD.Recv(list, BUFF_SIZE, MPI::INT, ROOT, 22, status);
        printf("Rank %d - prime set:: ", rank);
        for (i = 0; i < BUFF_SIZE; i++) {
            if (list[i] > 0)
                printf("%d ", list[i]);
        }
        printf("\n");
    } else {
        /* receive from non-ROOT ranks, test list, and return modified list */
        for (j = 0; j < (size-1); j++) {
            MPI::COMM_WORLD.Recv(list, BUFF_SIZE, MPI::INT, MPI::ANY_SOURCE,
                                 MPI::ANY_TAG, status);
            for (i = 0; i < BUFF_SIZE; i++)
                list[i] = primeset(list[i]);
            MPI::COMM_WORLD.Send(list, BUFF_SIZE, MPI::INT, status.Get_source(),
                                 status.Get_tag());
        }
    }
    MPI::Finalize();
    return 0;
}
/*
 * primeset
 *
 * Description: Returns num if num is prime, and 0 otherwise. The original
 *              listing is truncated at this point, so this helper is a
 *              minimal reconstruction consistent with its use in main.
 */
int
primeset(int num)
{
    int i;

    if (num < 2)
        return 0;
    for (i = 2; i*i <= num; i++) {
        if (num % i == 0)
            return 0;
    }
    return num;
}

Compiling and Linking

Sun MPI programs are compiled with ordinary C, C++, or Fortran compilers, just like any other C, C++, or Fortran program, and linked with the Sun MPI library.

The mpf77, mpf90, mpcc, and mpCC utilities can be used to compile Fortran 77, Fortran 90, C, and C++ programs, respectively. For example, you might use the following entry to compile a Fortran 77 program that uses Sun MPI:


% mpf77 -fast -xarch=v9 -o a.out a.f -lmpi

See the man pages for more information on these utilities.

For performance, the single most important compilation switch is -fast. This is a macro that expands to settings appropriate for high performance for a general set of circumstances. Because its expansion varies from one compiler release to another, you might prefer to specify the underlying switches. To see what -fast expands to, use -v for "verbose" compilation output in Fortran, and -# for C. Also, -fast assumes native compilation, so you should compile on UltraSPARC processors.

The next important compilation switch is -xarch. The Sun Studio Compiler Collection compilers set -xarch by default when you select -fast for native compilations. If you plan to compile on one type of processor and run the program on another type (non-native compilation), be sure to use the -xarch flag. You should also use it to compile in 64-bit mode. To compile in 64-bit mode on UltraSPARC processors, specify -xarch=v9. For AMD Opteron x64 processors, specify -xarch=amd64.
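
For example (an illustration only; substitute your own source and output file names), the following commands compile and link the connectivity.c program shown earlier in 64-bit mode on SPARC and x64 systems, respectively:

% mpcc -fast -xarch=v9 -o connectivity connectivity.c -lmpi
% mpcc -fast -xarch=amd64 -o connectivity connectivity.c -lmpi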

For more information, see the Sun HPC ClusterTools Software Performance Guide and the documents that came with your compiler.

The Sun Studio Compiler Collection software releases 8, 9, 10, and 11 are supported for the Sun HPC ClusterTools 6 suite.

Sun MPI programs compiled using the Sun Studio Compiler Collection Fortran compiler should be compiled with -xalias=actual. The -xalias=actual workaround requires patch 111718-01 (which requires 111714-01).

This recommendation arises because the MPI Fortran binding is inconsistent with the Fortran 90 standard in several respects. These inconsistencies are documented in the MPI-2 standard, which is available on the World Wide Web:

http://www-unix.mcs.anl.gov/mpi/mpi-standard/mpi-report-2.0/node19.htm#Node19

This recommendation applies to the use of high levels of compiler optimization. A highly optimizing Fortran compiler could break MPI codes that use nonblocking operations.

The failure modes are varied and insidious.



Note - For the Fortran interface, the -dalign option is necessary to avoid the possibility of bus errors. (The underlying C or C++ routines in Sun MPI internals assume that parameters and buffer types passed as REALs are double-aligned.)
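
Putting these recommendations together, a Fortran compile-and-link line might look like the following sketch, using the monte.f example shown earlier (-fast typically already includes -dalign, but listing it explicitly does no harm):

% mpf90 -fast -xarch=v9 -dalign -xalias=actual -o monte monte.f -lmpi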





Note - If your program has previously been linked to any static libraries, you must relink it to libmpi.so before executing it.



Choosing a Library Path

The paths for the MPI libraries, which you must specify when you are compiling and linking your program, are listed in TABLE 3-2.


TABLE 3-2   Sun MPI Libraries

Category           Description                              Path: /opt/SUNWhpc/lib/...

32-bit libraries   Default, not thread-safe                 libmpi.so
                   C++ (in addition to libmpi.so)           SCx.0/libmpi++.so
                   Thread-safe                              libmpi_mt.so

64-bit libraries   Default, not thread-safe                 sparcv9/libmpi.so (SPARC)
                                                            amd64/libmpi.so (x64)
                   C++ (in addition to sparcv9/libmpi.so)   sparcv9/SCx.0/libmpi++.so (SPARC)
                                                            amd64/SCx.0/libmpi++.so (x64)
                   Thread-safe                              sparcv9/libmpi_mt.so (SPARC)
                                                            amd64/libmpi_mt.so (x64)


Note that x.0 denotes the version of your compiler.
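
For illustration only (the mp* utilities normally supply the correct paths for you), an explicit 64-bit SPARC compile-and-link line against the thread-safe library might look like the following, where mpijob.c is a placeholder source file:

% cc -fast -xarch=v9 -o mpijob mpijob.c -I/opt/SUNWhpc/include/v9 -L/opt/SUNWhpc/lib/sparcv9 -R/opt/SUNWhpc/lib/sparcv9 -lmpi_mt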

Stubbing Thread Calls

The libthread.so libraries are automatically linked into the respective libmpi.so libraries. This means that any thread-function calls in your program can be resolved by the libthread.so library. Simply omitting libthread.so from the link line does not cause thread calls to be stubbed out; you must remove the thread calls yourself. For more information about the libthread.so library, see its man page. (For the location of Solaris man pages at your site, see your system administrator.)


Profiling With mpprof

If you plan to extract MPI profiling information from the execution of a job, you need to set the MPI_PROFILE environment variable to 1 before you start the job execution.


% setenv MPI_PROFILE 1

If you want to set any other mpprof environment variables, you must also set them before starting the job. See Appendix B for detailed descriptions of the mpprof environment variables.
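
For example, a typical sequence sets the variable and then launches the job with mprun, as described under Basic Job Execution (the process count and program name are placeholders):

% setenv MPI_PROFILE 1
% mprun -np 4 mpijob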


Basic Job Execution

The CRE environment provides close integration with batch-processing systems, also known as resource managers. You can launch parallel jobs from a batch system to control resource allocation, and continue to use the CRE environment to monitor job status. For a list of currently supported resource managers, see TABLE 3-3.


TABLE 3-3   Currently Supported Resource Managers

Resource Manager     Name Used With -x Option to mprun   Version                Man Page

Sun N1 Grid Engine   sge                                 N1GE 6                 sge_cre.1
PBS                  pbs                                 PBS 2.3.16             pbs_cre.1
                     pbs                                 PBS Professional 7.1   pbs_cre.1
LSF                  lsf                                 LSF HPC 6.2            lsf_cre.1


To enable the integration between the CRE environment and the supported resource managers, you must call mprun from a script in the resource manager. Use the -x flag to specify the resource manager, and the -np and -nr flags to specify the resources you need. Instructions and examples for each resource manager are provided in the Sun HPC ClusterTools Software User's Guide.
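
For example, a hypothetical script line for Sun N1 Grid Engine might invoke mprun as follows (a sketch only; see the Sun HPC ClusterTools Software User's Guide for complete, tested examples):

% mprun -x sge -np 4 mpijob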

Before starting your job, you might want to set one or more environment variables, which are also described in Appendix B and in the Sun HPC ClusterTools Software Performance Guide.

Executing With CRE

When using CRE software, you launch parallel jobs with the mprun command. For example, to start a job named mpijob with six processes, use this command:


% mprun -np 6 mpijob

Executing With LSF Suite

Parallel jobs can either be launched by the LSF Parallel Application Manager (PAM) or be submitted to queues configured to run PAM as the parallel job starter. LSF's bsub command launches both parallel interactive and batch jobs. For example, to start a batch job named mpijob on four CPUs, use this command:


% bsub -n 4 pam mpijob

To launch an interactive job, add the -I argument to the command line. For example, to launch an interactive job named earth on a single CPU in the queue named sun, which is configured to launch jobs with PAM, use this command:


% bsub -q sun -Ip -n 1 earth


Debugging

Debugging parallel programs is notoriously difficult, because you are in effect debugging a program potentially made up of many distinct programs executing simultaneously. Even if the application is an SPMD (single-program, multiple-data) application, each instance can be executing a different line of code at any instant.

Debugging with DTrace

The DTrace utility comes as part of the Solaris 10 OS. DTrace is a comprehensive dynamic tracing utility that you can use to monitor the behavior of application programs as well as the operating system itself. You can use DTrace on live production systems to understand their behavior and to track down any problems that might be occurring.
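
For example, assuming you have DTrace privileges on the node where an MPI process is running, you could use the pid provider to count that process's calls to MPI_Send (the process ID 12345 is a placeholder; press Control-C to stop tracing and print the count):

% dtrace -n 'pid$target::MPI_Send:entry { @sends = count(); }' -p 12345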

For more information about the D language and DTrace, refer to the Solaris Dynamic Tracing Guide (Part Number 817-6223). This guide is part of the Solaris 10 OS Software Developer Collection.

Solaris 10 OS documentation can be found on the web at the following location:

http://www.sun.com/documentation

Follow these links to the Solaris Dynamic Tracing Guide:

Solaris Operating Systems -> Solaris 10 -> Solaris 10 Software Developer Collection

Debugging With TotalView

TotalView is a third-party multiprocess debugger from Etnus that runs on many platforms and includes support for debugging Sun MPI applications.

The following sections provide a brief description of how to use the TotalView debugger with Sun MPI applications, including starting a new job, attaching to a running mprun job, and launching Sun MPI batch jobs.

Refer to your TotalView documentation for more information about using TotalView.

Limitations

Related Documentation

For more information, refer to the following related documentation:



Note - The example program connectivity used in this section and other sample programs can be found in /opt/SUNWhpc/examples/mpi.



Starting a New Job Using TotalView

You can start a new job from the TotalView Graphical User Interface (GUI) using either of two methods, or from the TotalView command-line interface (CLI), as described in the following procedures.


To Start a New Job Using GUI Method 1

1. Type:


% totalview mprun [totalview args] -a [mprun args]

For example:


% totalview mprun -bg blue -a -np 4 /opt/SUNWhpc/examples/mpi/connectivity

2. When the GUI appears, type g for go, or click Go in the TotalView window.

TotalView may display a dialog box:


Process mprun is a parallel job. Do you want to stop the job now?

3. Click Yes to open the TotalView debugger window with the Sun MPI source window (if the program was compiled with the -g option) and to leave all processes in a traced state.


To Start a New Job Using GUI Method 2

1. Type:


% totalview

2. Select the menu option File and then New Program.

3. Type mprun as the executable name in the dialog box.

4. Click OK.

TotalView displays the main debug window.

5. Select the menu option Process and then Startup Parameters, and enter the startup parameters (the mprun arguments).


To Start a New Job Using the CLI

1. Type:


% totalviewcli mprun [totalview args] -a [mprun args]

For example:


% totalviewcli mprun -a -np 4 /opt/SUNWhpc/examples/mpi/connectivity

2. When the job starts, type dgo.

TotalView displays this message:


Process mprun is a parallel job. Do you want to stop the job now?

3. Type y to start the MPI job, attach TotalView, and leave all processes in a traced state.

Attaching to an mprun Job

This section describes how to attach to an already running mprun job from both the TotalView GUI and CLI.


To Attach to a Running Job From the GUI

1. Find the host name and process identifier (PID) of the mprun job by typing:


% mpps -b

mpps displays the PID and host name in a manner similar to this example:


JOBNAME   MPRUN_PID   MPRUN_HOST
cre.99    12345       hpc-u2-9
cre.100   12601       hpc-u2-8

For more information, refer to the mpps(1M) man page, option -b.

2. In the TotalView GUI, select File and then New Program.

3. Type the PID in Process ID.

4. Type mprun in the field Executable Name.

5. Do one of the following:

6. Click OK.


To Attach to a Running Job From the CLI

1. Find the process identifier (PID) of the launched job.

See the example under the preceding GUI procedure. For more information, refer to the mpps(1M) man page, option -b.

2. Start totalviewcli by typing:


% totalviewcli

3. Attach the executable program to the mprun PID:


% dattach mprun mprun_pid

For example:


% dattach mprun 12601 

Launching Sun MPI Batch Jobs Using TotalView

This section describes how to launch Sun MPI batch jobs and provides examples of launching them in Sun N1 Grid Engine (SGE). Refer to Chapter 5 of the Sun HPC ClusterTools Software User's Guide for descriptions of launching batch jobs in the Load Sharing Facility (LSF) and the Portable Batch System (PBS).


To Execute Startup in Batch Mode for the TotalView GUI

Executing startup in batch mode for the TotalView CLI is not practical, because there is no controlling terminal for input and output. This procedure describes executing startup in batch mode for the TotalView GUI:

1. Write a batch script, which contains a line similar to the following:


% totalview mprun -a -x sge /opt/SUNWhpc/examples/mpi/connectivity

2. Submit the script to SGE for execution with a command similar to the following:


% qsub -pe cre 4 batch_script

The TotalView GUI appears upon successful allocation of resources and execution of the batch script in SGE.


To Use the Interactive Mode

The interactive mode creates an xterm window for your use, so you can use either the TotalView GUI or the CLI.

1. Run the following, or an equivalent path, to source the SGE environment:


% source <sgeroot>/default/common/settings.csh

2. Submit an interactive mode job to SGE with a command similar to the following:


% qsh -pe cre 4

The system displays an xterm window.

3. Execute a typical totalview or totalviewcli command.

TotalView GUI example:


% totalview mprun -a -x sge /opt/SUNWhpc/examples/mpi/connectivity

TotalView CLI example:


% totalviewcli mprun -a -x sge /opt/SUNWhpc/examples/mpi/connectivity

Debugging With MPE

The multiprocessing environment (MPE) available from Argonne National Laboratory includes a debugger that can also be used for debugging at the thread level. For information about obtaining and building MPE, see MPE: Extensions to the Library.