Sun Studio 12 Update 1: Performance Analyzer

MPI Tracing Data

The Collector can collect data on calls to the Message Passing Interface (MPI) library.

MPI tracing is implemented using the open source VampirTrace 5.5.3 release, and recognizes the following VampirTrace environment variables:

VT_STACKS

Controls whether call stacks are recorded in the data. The default setting is 1. Setting VT_STACKS to 0 disables call stacks.

VT_BUFFER_SIZE

Controls the size of the internal buffer of the MPI API trace collector. The default value is 64M (64 MBytes).

VT_MAX_FLUSHES

Controls the number of times the buffer is flushed before terminating the experiment. The default value is 1. Set VT_MAX_FLUSHES to 0 to allow an unlimited number of flushes.

VT_VERBOSE

Turns on various error and status messages. The default value is 1, which turns on critical error and status messages. Set the variable to 2 if problems arise.

For more information on these variables, see the VampirTrace User Manual on the Technische Universität Dresden web site; an example of setting them is sketched after the discussion below.

MPI events that occur after the buffer limits have been reached are not written into the trace file, resulting in an incomplete trace.

To remove the limit and get a complete trace of an application, set the VT_MAX_FLUSHES environment variable to 0. This setting causes the MPI API trace collector to flush the buffer to disk whenever the buffer is full.

To change the size of the buffer, set the VT_BUFFER_SIZE environment variable. The optimal value depends on the application being traced. A small value increases the memory available to the application but triggers frequent buffer flushes by the MPI API trace collector, and these flushes can significantly change the application's behavior. Conversely, a large value such as 2G minimizes buffer flushes but decreases the memory available to the application. If not enough memory is available to hold both the buffer and the application data, parts of the application might be swapped to disk, leading to a significant change in the application's behavior.
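The following minimal C sketch pulls these settings together. The values shown are placeholders to tune for your application, and the sketch assumes the trace library reads its environment when MPI_Init runs in the process; exporting the variables in the shell before launching the job is the more common, equivalent approach.

    #include <stdlib.h>
    #include <mpi.h>

    int main(int argc, char **argv)
    {
        /* Illustrative values only: a larger buffer trades application
           memory for fewer flushes, and VT_MAX_FLUSHES=0 requests a
           complete trace. Assumes the trace library reads its environment
           when MPI_Init runs; setting the variables in the shell before
           launching the job has the same effect. */
        setenv("VT_BUFFER_SIZE", "256M", 1); /* enlarge the trace buffer */
        setenv("VT_MAX_FLUSHES", "0", 1);    /* flush whenever the buffer fills */
        setenv("VT_STACKS", "1", 1);         /* record call stacks (the default) */
        setenv("VT_VERBOSE", "2", 1);        /* extra diagnostics if problems arise */

        MPI_Init(&argc, &argv);
        /* ... application code ... */
        MPI_Finalize();
        return 0;
    }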

The functions for which data is collected are listed below.

MPI_Abort

MPI_Accumulate

MPI_Address

MPI_Allgather

MPI_Allgatherv

MPI_Allreduce

MPI_Alltoall

MPI_Alltoallv

MPI_Alltoallw

MPI_Attr_delete

MPI_Attr_get

MPI_Attr_put

MPI_Barrier

MPI_Bcast

MPI_Bsend

MPI_Bsend_init

MPI_Buffer_attach

MPI_Buffer_detach

MPI_Cancel

MPI_Cart_coords

MPI_Cart_create

MPI_Cart_get

MPI_Cart_map

MPI_Cart_rank

MPI_Cart_shift

MPI_Cart_sub

MPI_Cartdim_get

MPI_Comm_compare

MPI_Comm_create

MPI_Comm_dup

MPI_Comm_free

MPI_Comm_group

MPI_Comm_rank

MPI_Comm_remote_group

MPI_Comm_remote_size

MPI_Comm_size

MPI_Comm_split

MPI_Comm_test_inter

MPI_Dims_create

MPI_Errhandler_create

MPI_Errhandler_free

MPI_Errhandler_get

MPI_Errhandler_set

MPI_Error_class

MPI_Error_string

MPI_File_close

MPI_File_delete

MPI_File_get_amode

MPI_File_get_atomicity

MPI_File_get_byte_offset

MPI_File_get_group

MPI_File_get_info

MPI_File_get_position

MPI_File_get_position_shared

MPI_File_get_size

MPI_File_get_type_extent

MPI_File_get_view

MPI_File_iread

MPI_File_iread_at

MPI_File_iread_shared

MPI_File_iwrite

MPI_File_iwrite_at

MPI_File_iwrite_shared

MPI_File_open

MPI_File_preallocate

MPI_File_read

MPI_File_read_all

MPI_File_read_all_begin

MPI_File_read_all_end

MPI_File_read_at

MPI_File_read_at_all

MPI_File_read_at_all_begin

MPI_File_read_at_all_end

MPI_File_read_ordered

MPI_File_read_ordered_begin

MPI_File_read_ordered_end

MPI_File_read_shared

MPI_File_seek

MPI_File_seek_shared

MPI_File_set_atomicity

MPI_File_set_info

MPI_File_set_size

MPI_File_set_view

MPI_File_sync

MPI_File_write

MPI_File_write_all

MPI_File_write_all_begin

MPI_File_write_all_end

MPI_File_write_at

MPI_File_write_at_all

MPI_File_write_at_all_begin

MPI_File_write_at_all_end

MPI_File_write_ordered

MPI_File_write_ordered_begin

MPI_File_write_ordered_end

MPI_File_write_shared

MPI_Finalize

MPI_Gather

MPI_Gatherv

MPI_Get

MPI_Get_count

MPI_Get_elements

MPI_Get_processor_name

MPI_Get_version

MPI_Graph_create

MPI_Graph_get

MPI_Graph_map

MPI_Graph_neighbors

MPI_Graph_neighbors_count

MPI_Graphdims_get

MPI_Group_compare

MPI_Group_difference

MPI_Group_excl

MPI_Group_free

MPI_Group_incl

MPI_Group_intersection

MPI_Group_rank

MPI_Group_size

MPI_Group_translate_ranks

MPI_Group_union

MPI_Ibsend

MPI_Init

MPI_Init_thread

MPI_Intercomm_create

MPI_Intercomm_merge

MPI_Irecv

MPI_Irsend

MPI_Isend

MPI_Issend

MPI_Keyval_create

MPI_Keyval_free

MPI_Op_create

MPI_Op_free

MPI_Pack

MPI_Pack_size

MPI_Probe

MPI_Put

MPI_Recv

MPI_Recv_init

MPI_Reduce

MPI_Reduce_scatter

MPI_Request_free

MPI_Rsend

MPI_Rsend_init

MPI_Scan

MPI_Scatter

MPI_Scatterv

MPI_Send

MPI_Send_init

MPI_Sendrecv

MPI_Sendrecv_replace

MPI_Ssend

MPI_Ssend_init

MPI_Start

MPI_Startall

MPI_Test

MPI_Test_cancelled

MPI_Testall

MPI_Testany

MPI_Testsome

MPI_Topo_test

MPI_Type_commit

MPI_Type_contiguous

MPI_Type_extent

MPI_Type_free

MPI_Type_hindexed

MPI_Type_hvector

MPI_Type_indexed

MPI_Type_lb

MPI_Type_size

MPI_Type_struct

MPI_Type_ub

MPI_Type_vector

MPI_Unpack

MPI_Wait

MPI_Waitall

MPI_Waitany

MPI_Waitsome

MPI_Win_complete

MPI_Win_create

MPI_Win_fence

MPI_Win_free

MPI_Win_lock

MPI_Win_post

MPI_Win_start

MPI_Win_test

MPI_Win_unlock


MPI tracing data is converted into the following metrics.

Table 2–4 MPI Tracing Metrics

Metric              Definition

MPI Receives        Number of point-to-point messages received by MPI functions

MPI Bytes Received  Number of bytes in point-to-point messages received by MPI functions

MPI Sends           Number of point-to-point messages sent by MPI functions

MPI Bytes Sent      Number of bytes in point-to-point messages sent by MPI functions

MPI Time            Time spent in all calls to MPI functions

Other MPI Events    Number of calls to MPI functions that neither send nor receive point-to-point messages

MPI Time is the total LWP time spent in the MPI function. If MPI state times are also collected, MPI Work Time plus MPI Wait Time for all MPI functions other than MPI_Init and MPI_Finalize should approximately equal MPI Time. On Linux, MPI Wait and MPI Work are based on user+system CPU time, while MPI Time is based on real time, so the numbers will not match.

MPI byte and message counts are currently collected only for point-to-point messages; they are not recorded for collective communication functions. The MPI Bytes Received metric counts the actual number of bytes received in all messages. MPI Bytes Sent counts the actual number of bytes sent in all messages. MPI Sends counts the number of messages sent, and MPI Receives counts the number of messages received.
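As a minimal sketch of how these counts accrue (the message size, ranks, and tag are arbitrary choices for illustration), consider a two-process exchange. The MPI_Send below contributes one message and 1024 bytes to the MPI Sends and MPI Bytes Sent metrics on rank 0, the matching MPI_Recv contributes one message and 1024 bytes to MPI Receives and MPI Bytes Received on rank 1, and the MPI_Barrier, MPI_Init, and MPI_Finalize calls count only as Other MPI Events:

    #include <mpi.h>

    int main(int argc, char **argv)
    {
        char buf[1024] = { 0 };
        int rank;

        MPI_Init(&argc, &argv);               /* Other MPI Event */
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        if (rank == 0) {
            /* MPI Sends += 1, MPI Bytes Sent += 1024 on this rank */
            MPI_Send(buf, 1024, MPI_CHAR, 1, 0, MPI_COMM_WORLD);
        } else if (rank == 1) {
            /* MPI Receives += 1, MPI Bytes Received += 1024 on this rank */
            MPI_Recv(buf, 1024, MPI_CHAR, 0, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
        }

        /* Collective call: it neither sends nor receives a point-to-point
           message, so it is counted only as an Other MPI Event. */
        MPI_Barrier(MPI_COMM_WORLD);

        MPI_Finalize();                       /* Other MPI Event */
        return 0;
    }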

Collecting MPI tracing data can help you identify places in an MPI program where a performance problem might be caused by MPI calls. Examples of possible performance problems are poor load balancing, synchronization delays, and communications bottlenecks.