MPI Tracing Data
The Collector can collect data
on calls to the Message Passing Interface (MPI) library.
MPI tracing is implemented using the open source VampirTrace 5.5.3 release. It
recognizes the following VampirTrace environment variables:
VT_STACKS
Controls whether call stacks are recorded in the data. The default setting is 1. Setting VT_STACKS to 0 disables call stacks.

VT_BUFFER_SIZE
Controls the size of the internal buffer of the MPI API trace collector. The default value is 64M (64 MBytes).

VT_MAX_FLUSHES
Controls the number of times the buffer is flushed before MPI tracing is terminated. The default value is 0, which allows the buffer to be flushed to disk whenever it is full, without limit. Setting VT_MAX_FLUSHES to a positive number sets a limit on the number of times the buffer is flushed.

VT_VERBOSE
Turns on various error and status messages. The default value is 1, which turns on critical error and status messages. Set the variable to 2 if problems arise.
For more information on these variables, see the VampirTrace User Manual on the Technische Universität Dresden web site.
MPI events that occur after the buffer limits have been reached are not written into the trace file, resulting in an incomplete trace. To remove the limit and get a complete trace of an application, set the VT_MAX_FLUSHES environment variable to 0. This setting causes the MPI API trace collector to flush the buffer to disk whenever the buffer is full.
To change the size of the buffer, set the VT_BUFFER_SIZE environment
variable. The optimal value for this variable depends on the application that is to be traced.
Setting a small value increases the memory available to the application but triggers frequent buffer
flushes by the MPI API trace collector. These buffer flushes can significantly change the behavior
of the application. On the other hand, setting a large value such as 2G minimizes buffer flushes by
the MPI API trace collector but decreases the memory available to the application. If not enough
memory is available to hold the buffer and the application data, parts of the application might be
swapped to disk, leading to a significant change in the behavior of the application.
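As a hedged illustration of this trade-off, the sketch below sets the variables from inside the program with setenv(). In practice the variables are normally exported in the shell before the MPI job is launched; the in-program approach is an assumption that holds only if the MPI API trace collector reads its environment no earlier than MPI_Init, and the 256M buffer size is an example value, not a recommendation.

/*
 * Minimal sketch (not the documented method): configure VampirTrace
 * buffering before MPI_Init.  Assumes the trace collector consults its
 * environment when MPI_Init runs; exporting the variables in the shell
 * is the usual and safer approach.
 */
#include <stdlib.h>
#include <mpi.h>

int main(int argc, char **argv)
{
    /* Request a complete trace: no flush limit, larger internal buffer. */
    setenv("VT_MAX_FLUSHES", "0", 1);    /* flush whenever the buffer fills */
    setenv("VT_BUFFER_SIZE", "256M", 1); /* example size: fewer flushes, less app memory */
    setenv("VT_STACKS", "1", 1);         /* record call stacks (the default) */

    MPI_Init(&argc, &argv);
    /* ... traced application work ... */
    MPI_Finalize();
    return 0;
}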
The following list shows the functions
for which data is collected.
MPI_Abort, MPI_Accumulate, MPI_Address, MPI_Allgather, MPI_Allgatherv,
MPI_Allreduce, MPI_Alltoall, MPI_Alltoallv, MPI_Alltoallw, MPI_Attr_delete,
MPI_Attr_get, MPI_Attr_put, MPI_Barrier, MPI_Bcast, MPI_Bsend,
MPI_Bsend_init, MPI_Buffer_attach, MPI_Buffer_detach, MPI_Cancel,
MPI_Cart_coords, MPI_Cart_create, MPI_Cart_get, MPI_Cart_map, MPI_Cart_rank,
MPI_Cart_shift, MPI_Cart_sub, MPI_Cartdim_get, MPI_Comm_compare,
MPI_Comm_create, MPI_Comm_dup, MPI_Comm_free, MPI_Comm_group, MPI_Comm_rank,
MPI_Comm_remote_group, MPI_Comm_remote_size, MPI_Comm_size, MPI_Comm_split,
MPI_Comm_test_inter, MPI_Dims_create, MPI_Errhandler_create,
MPI_Errhandler_free, MPI_Errhandler_get, MPI_Errhandler_set, MPI_Error_class,
MPI_Error_string, MPI_File_close, MPI_File_delete, MPI_File_get_amode,
MPI_File_get_atomicity, MPI_File_get_byte_offset, MPI_File_get_group,
MPI_File_get_info, MPI_File_get_position, MPI_File_get_position_shared,
MPI_File_get_size, MPI_File_get_type_extent, MPI_File_get_view,
MPI_File_iread, MPI_File_iread_at, MPI_File_iread_shared, MPI_File_iwrite,
MPI_File_iwrite_at, MPI_File_iwrite_shared, MPI_File_open,
MPI_File_preallocate, MPI_File_read, MPI_File_read_all,
MPI_File_read_all_begin, MPI_File_read_all_end, MPI_File_read_at,
MPI_File_read_at_all, MPI_File_read_at_all_begin, MPI_File_read_at_all_end,
MPI_File_read_ordered, MPI_File_read_ordered_begin, MPI_File_read_ordered_end,
MPI_File_read_shared, MPI_File_seek, MPI_File_seek_shared,
MPI_File_set_atomicity, MPI_File_set_info, MPI_File_set_size,
MPI_File_set_view, MPI_File_sync, MPI_File_write, MPI_File_write_all,
MPI_File_write_all_begin, MPI_File_write_all_end, MPI_File_write_at,
MPI_File_write_at_all, MPI_File_write_at_all_begin,
MPI_File_write_at_all_end, MPI_File_write_ordered,
MPI_File_write_ordered_begin, MPI_File_write_ordered_end,
MPI_File_write_shared, MPI_Finalize, MPI_Gather, MPI_Gatherv, MPI_Get,
MPI_Get_count, MPI_Get_elements, MPI_Get_processor_name, MPI_Get_version,
MPI_Graph_create, MPI_Graph_get, MPI_Graph_map, MPI_Graph_neighbors,
MPI_Graph_neighbors_count, MPI_Graphdims_get, MPI_Group_compare,
MPI_Group_difference, MPI_Group_excl, MPI_Group_free, MPI_Group_incl,
MPI_Group_intersection, MPI_Group_rank, MPI_Group_size,
MPI_Group_translate_ranks, MPI_Group_union, MPI_Ibsend, MPI_Init,
MPI_Init_thread, MPI_Intercomm_create, MPI_Intercomm_merge, MPI_Irecv,
MPI_Irsend, MPI_Isend, MPI_Issend, MPI_Keyval_create, MPI_Keyval_free,
MPI_Op_create, MPI_Op_free, MPI_Pack, MPI_Pack_size, MPI_Probe, MPI_Put,
MPI_Recv, MPI_Recv_init, MPI_Reduce, MPI_Reduce_scatter, MPI_Request_free,
MPI_Rsend, MPI_Rsend_init, MPI_Scan, MPI_Scatter, MPI_Scatterv, MPI_Send,
MPI_Send_init, MPI_Sendrecv, MPI_Sendrecv_replace, MPI_Ssend, MPI_Ssend_init,
MPI_Start, MPI_Startall, MPI_Test, MPI_Test_cancelled, MPI_Testall,
MPI_Testany, MPI_Testsome, MPI_Topo_test, MPI_Type_commit,
MPI_Type_contiguous, MPI_Type_extent, MPI_Type_free, MPI_Type_hindexed,
MPI_Type_hvector, MPI_Type_indexed, MPI_Type_lb, MPI_Type_size,
MPI_Type_struct, MPI_Type_ub, MPI_Type_vector, MPI_Unpack, MPI_Wait,
MPI_Waitall, MPI_Waitany, MPI_Waitsome, MPI_Win_complete, MPI_Win_create,
MPI_Win_fence, MPI_Win_free, MPI_Win_lock, MPI_Win_post, MPI_Win_start,
MPI_Win_test, MPI_Win_unlock
MPI tracing data is converted into the following
metrics.
Table 2-5 MPI Tracing Metrics

Metric               Definition
MPI Sends            Number of MPI point-to-point sends started
MPI Bytes Sent       Number of bytes in MPI Sends
MPI Receives         Number of MPI point-to-point receives completed
MPI Bytes Received   Number of bytes in MPI Receives
MPI Time             Time spent in all calls to MPI functions
Other MPI Events     Number of calls to MPI functions that neither send nor receive point-to-point messages
MPI Time is the total thread time spent in MPI functions. If MPI state times are also collected, MPI Work Time plus MPI Wait Time for all MPI functions other than MPI_Init and MPI_Finalize should approximately equal MPI Time. On Linux, MPI Wait and MPI Work are based on user+system CPU time, while MPI Time is based on real time, so the numbers will not match.
MPI byte and message counts are currently collected only for point-to-point messages; they are not recorded for collective communication functions. The MPI Bytes Received metric counts the actual number of bytes received in all messages. MPI Bytes Sent counts the actual number of bytes sent in all messages. MPI Sends counts the number of messages sent, and MPI Receives counts the number of messages received.
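As a hedged illustration of how these metrics map onto source code, the following minimal two-process sketch is an example of my own, not taken from the product documentation; the message size and tags are arbitrary.

/*
 * Run with at least two processes.  The MPI_Send/MPI_Recv pair is a
 * point-to-point exchange: it adds one to MPI Sends and one to MPI
 * Receives, and 8192 bytes each to MPI Bytes Sent and MPI Bytes Received.
 * MPI_Barrier sends no point-to-point message, so it is counted only as
 * an Other MPI Event.  All of the calls contribute to MPI Time.
 */
#include <mpi.h>

int main(int argc, char **argv)
{
    int rank;
    double buf[1024] = {0};          /* 1024 doubles = 8192 bytes of payload */

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0)
        MPI_Send(buf, 1024, MPI_DOUBLE, 1, 0, MPI_COMM_WORLD);  /* one send */
    else if (rank == 1)
        MPI_Recv(buf, 1024, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD,
                 MPI_STATUS_IGNORE);                             /* one receive */

    MPI_Barrier(MPI_COMM_WORLD);     /* collective: an Other MPI Event */

    MPI_Finalize();
    return 0;
}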
Collecting MPI tracing data can help you identify places in an MPI program where a performance problem could be due to MPI calls. Examples of possible performance problems are load imbalance, synchronization delays, and communication bottlenecks.