CHAPTER 7

Using MCA Parameters With mpirun

Open MPI uses Modular Component Architecture (MCA) parameters to provide a way to tune your runtime environment. Each parameter controls a specific function; you change the value of a parameter to change that function. Appendix B contains the complete list of MCA parameters.

Developing an Open MPI application that uses MCA parameters offers a number of advantages. Developers and administrators can customize the Open MPI environment to suit the specific needs of the hardware or the operating environment. For example, a system administrator might use MCA parameters to optimize an Open MPI installation on a network so that users only need to run with the default values to obtain the best performance.

This chapter contains the following topics:

- About the Modular Component Architecture
- Open MPI Frameworks
- The ompi_info Command
- Using MCA Parameters
- Including and Excluding Components
- Processor and Memory Affinity

To understand how MCA parameters fit within Open MPI, you must first understand how the Modular Component Architecture is constructed.


About the Modular Component Architecture

The Modular Component Architecture (MCA) is the backbone for much of Open MPI’s functionality. It is a series of frameworks, components, and modules that are assembled at runtime to create an MPI implementation.

An MCA framework manages a specific Open MPI task (such as process launching for ORTE). Each MCA framework supports a single component type, but can support multiple versions of that type. The framework uses the services from the MCA base functionality to find and/or load components.

An MCA component is an implementation of a framework’s interface. It is a standalone collection of code that can be bundled into a plug-in and inserted into the Open MPI code base, either at runtime or at compile time.

An MCA module is an instance of a component. For example, if a node running an Open MPI application has multiple Ethernet NICs, the Open MPI application will contain one TCP MPI point-to-point component, but two TCP point-to-point modules.

For more information about the Open MPI Modular Component Architecture, see the Open MPI FAQ on runtime tuning at:

http://www.open-mpi.org/faq/?category=tuning


Open MPI Frameworks

There are three types of frameworks in Open MPI:

- OMPI frameworks, in the MPI layer
- ORTE frameworks, in the Open Run-Time Environment
- OPAL frameworks, in the Open Portable Access Layer

You might think of these frameworks as ways to group MCA parameters by function. For example, the OMPI btl framework controls the functions in the byte transfer layer, or BTL (point-to-point byte movement) in the network. All of the MCA parameters that are grouped under btl affect the BTL layer.

In addition to the parameters that are grouped under the individual frameworks, there are top-level MCA parameters that affect the frameworks themselves and specify values for your Open MPI installation.


TABLE 7-1 Top-Level MCA Parameters

Parameter Group   Description
---------------   ---------------------------------------------------
mca               Specify paths or functions for MCA parameters
mpi               Specify MPI behavior at runtime
orte              Specify debugging functions and components for ORTE
opal              Specify stack trace information


To view the available top-level parameters in each group, type the following command:


% ompi_info --param groupname groupname

where groupname stands for the parameter group you want to view. For example, to view the available MPI parameters, you would type:


% ompi_info --param mpi mpi

OMPI Frameworks

The following table lists the frameworks in the MPI layer.


TABLE 7-2 OMPI Frameworks

Framework   Description
---------   ----------------------------------------------------------
allocator   Memory allocator
bml         BTL management layer (managing multiple devices)
btl         Byte transfer layer (point-to-point byte movement)
coll        MPI collective algorithms
io          MPI-2 I/O functionality
mpool       Memory pool management
mtl         Messaging transport layer
osc         One-sided communication
pml         Point-to-point management layer (fragmenting, reassembly,
            top-layer protocols, and so on)
rcache      Memory registration management
topo        MPI topology information


Currently, there is no simple way to get a list of the available components in a framework. You can use the grep command to search for components. For example, the following command searches for a list of components in the btl framework:


% ompi_info | grep btl

ORTE Frameworks

The following table lists the ORTE frameworks.


TABLE 7-3 ORTE Frameworks

Framework   Description
---------   ----------------------------------------------------------
errmgr      Error manager
gpr         General purpose registry
iof         I/O forwarding
ns          Name server
oob         Out-of-band communication
plm         Process launch module (formerly pls)
ras         Resource allocation subsystem
rds         Resource discovery subsystem
rmaps       Resource mapping subsystem
rmgr        Resource manager (upper meta layer for all other resource
            frameworks)
rml         Remote messaging layer (routing of OOB messages)
schema      Name schemas
sds         Startup discovery services
soh         State of health


OPAL Frameworks

The following table lists the OPAL frameworks.


TABLE 7-4 OPAL Frameworks

Framework   Description
---------   -----------------------------------
backtrace   Stack trace framework for debugging
maffinity   Memory affinity
memory      Memory hooks
paffinity   Processor affinity
timer       High-resolution timers


A complete list of MCA parameters, grouped under each of these frameworks, appears in Appendix B.


The ompi_info Command

The ompi_info command returns information about your Sun HPC ClusterTools/Open MPI installation. When you issue the command without any options, ompi_info returns version and build information, the language bindings and compilers used, and the list of installed MCA components, as shown in the sample output later in this section.

Command Options

The ompi_info command has the following options:


TABLE 7-5 Options for the ompi_info Command

-a or --all
    Shows all configuration options and MCA parameters

--arch
    Shows the architecture on which this installation of Open MPI was
    compiled

-c or --config
    Shows configuration options

-gmca or --gmca param-name value
    Passes global MCA parameters that apply to all contexts.
    param-name is the parameter name; value is the value of the
    parameter.

-h or --help
    Shows the ompi_info help message

--hostname
    Shows the name of the host on which Open MPI was configured and
    built

--internal
    Shows internal MCA parameters (not meant to be modified by users)

-mca or --mca param-name value
    Passes context-specific MCA parameters; they are considered global
    if --gmca is not used. param-name is the name of the parameter;
    value is the value for that parameter.

--param arg1 arg2
    Shows MCA parameters. arg1 can be a specific framework name or
    all. arg2 can be a specific parameter name or all.

--parsable or --parseable
    Displays output in parsable format

--path pathname
    Shows the paths with which Open MPI was configured

--pretty
    Displays output in “prettyprint” format (default)

-v or --version arg0 arg1
    Shows the version of Open MPI or a component. arg0 can be the name
    of a specific framework or all. arg1 can be the name of a specific
    component or all.


The output from the ompi_info command appears similar to the following:


% ompi_info
                Open MPI: 1.2r13978-ct7b027r1708
   Open MPI SVN revision: 0
                Open RTE: 1.2r13978-ct7b027r1708
   Open RTE SVN revision: 0
                    OPAL: 1.2r13978-ct7b027r1708
       OPAL SVN revision: 0
                  Prefix: /opt/SUNWhpc/HPC8.0
 Configured architecture: i386-pc-solaris2.10
           Configured by: root
           Configured on: Thu Mar  8 16:47:40 EST 2008
          Configure host: burpen-csx10-0
                Built by: root
                Built on: Thu Mar  8 17:04:51 EST 2008
              Built host: burpen-csx10-0
              C bindings: yes
            C++ bindings: yes
      Fortran77 bindings: yes (all)
      Fortran90 bindings: yes
 Fortran90 bindings size: trivial
              C compiler: cc
     C compiler absolute: /ws/ompi-tools/SUNWspro/SOS12/bin/cc
            C++ compiler: CC
   C++ compiler absolute: /ws/ompi-tools/SUNWspro/SOS12/bin/CC
      Fortran77 compiler: f77
  Fortran77 compiler abs: /ws/ompi-tools/SUNWspro/SOS12/bin/f77
      Fortran90 compiler: f95
  Fortran90 compiler abs: /ws/ompi-tools/SUNWspro/SOS12/bin/f95
             C profiling: yes
           C++ profiling: yes
     Fortran77 profiling: yes
     Fortran90 profiling: yes
          C++ exceptions: yes
          Thread support: no
  Internal debug support: no
     MPI parameter check: runtime
Memory profiling support: no
Memory debugging support: no
         libltdl support: yes
   Heterogeneous support: yes
 mpirun default --prefix: yes
           MCA backtrace: printstack (MCA v1.0, API v1.0, Component v1.2)
           MCA paffinity: solaris (MCA v1.0, API v1.0, Component v1.2)
           MCA maffinity: first_use (MCA v1.0, API v1.0, Component v1.2)
               MCA timer: solaris (MCA v1.0, API v1.0, Component v1.2)
           MCA allocator: basic (MCA v1.0, API v1.0, Component v1.0)
           MCA allocator: bucket (MCA v1.0, API v1.0, Component v1.0)
                MCA coll: basic (MCA v1.0, API v1.0, Component v1.2)
                MCA coll: self (MCA v1.0, API v1.0, Component v1.2)
                MCA coll: sm (MCA v1.0, API v1.0, Component v1.2)
                MCA coll: tuned (MCA v1.0, API v1.0, Component v1.2)
                  MCA io: romio (MCA v1.0, API v1.0, Component v1.2)
               MCA mpool: sm (MCA v1.0, API v1.0, Component v1.2)
               MCA mpool: udapl (MCA v1.0, API v1.0, Component v1.2)
                 MCA pml: cm (MCA v1.0, API v1.0, Component v1.2)
                 MCA pml: ob1 (MCA v1.0, API v1.0, Component v1.2)
                 MCA bml: r2 (MCA v1.0, API v1.0, Component v1.2)
              MCA rcache: rb (MCA v1.0, API v1.0, Component v1.2)
              MCA rcache: vma (MCA v1.0, API v1.0, Component v1.2)
                 MCA btl: self (MCA v1.0, API v1.0.1, Component v1.2)
                 MCA btl: sm (MCA v1.0, API v1.0.1, Component v1.2)
                 MCA btl: tcp (MCA v1.0, API v1.0.1, Component v1.0)
                 MCA btl: udapl (MCA v1.0, API v1.0, Component v1.2)
                MCA topo: unity (MCA v1.0, API v1.0, Component v1.2)
                 MCA osc: pt2pt (MCA v1.0, API v1.0, Component v1.2)
              MCA errmgr: hnp (MCA v1.0, API v1.3, Component v1.2)
              MCA errmgr: orted (MCA v1.0, API v1.3, Component v1.2)
              MCA errmgr: proxy (MCA v1.0, API v1.3, Component v1.2)
                 MCA gpr: null (MCA v1.0, API v1.0, Component v1.2)
                 MCA gpr: proxy (MCA v1.0, API v1.0, Component v1.2)
                 MCA gpr: replica (MCA v1.0, API v1.0, Component v1.2)
                 MCA iof: proxy (MCA v1.0, API v1.0, Component v1.2)
                 MCA iof: svc (MCA v1.0, API v1.0, Component v1.2)
                  MCA ns: proxy (MCA v1.0, API v2.0, Component v1.2)
                  MCA ns: replica (MCA v1.0, API v2.0, Component v1.2)
                 MCA oob: tcp (MCA v1.0, API v1.0, Component v1.0)
                 MCA ras: dash_host (MCA v1.0, API v1.3, Component v1.2)
                 MCA ras: gridengine (MCA v1.0, API v1.3, Component v1.2)
                 MCA ras: localhost (MCA v1.0, API v1.3, Component v1.2)
                 MCA ras: tm (MCA v1.0, API v1.3, Component v1.2)
                 MCA rds: hostfile (MCA v1.0, API v1.3, Component v1.2)
                 MCA rds: proxy (MCA v1.0, API v1.3, Component v1.2)
                 MCA rds: resfile (MCA v1.0, API v1.3, Component v1.2)
               MCA rmaps: round_robin (MCA v1.0, API v1.3, Component v1.2)
                MCA rmgr: proxy (MCA v1.0, API v2.0, Component v1.2)
                MCA rmgr: urm (MCA v1.0, API v2.0, Component v1.2)
                 MCA rml: oob (MCA v1.0, API v1.0, Component v1.2)
                 MCA plm: gridengine (MCA v1.0, API v1.3, Component v1.2)
                 MCA plm: proxy (MCA v1.0, API v1.3, Component v1.2)
                 MCA plm: rsh (MCA v1.0, API v1.3, Component v1.2)
                 MCA plm: tm (MCA v1.0, API v1.3, Component v1.2)
                 MCA sds: env (MCA v1.0, API v1.0, Component v1.2)
                 MCA sds: pipe (MCA v1.0, API v1.0, Component v1.2)
                 MCA sds: seed (MCA v1.0, API v1.0, Component v1.2)
                 MCA sds: singleton (MCA v1.0, API v1.0, Component v1.2)

Using the ompi_info Command With MCA Parameters

The ompi_info command can list the parameters for a given component, all the parameters for a specific framework, or all parameters. The ompi_info output for most parameters contains a description of the parameter. The output for any parameter shows the current value of that parameter.


To List All MCA Parameters

Type the following command at the system prompt:


% ompi_info --param all all

The output from ompi_info lists all of the installed frameworks, their MCA parameters, and their current values. To see the complete list of MCA parameters, see List of Available MCA Parameters.
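When the full listing runs to many screens, you can pipe it through grep to find a particular parameter by name. The following is a sketch only; it assumes ompi_info is on your PATH, and the search fragment eager_limit is just an illustration:

```shell
# Hypothetical search fragment; any part of a parameter name works.
PATTERN=eager_limit
# Guard so the snippet is harmless on systems without Open MPI installed.
if command -v ompi_info >/dev/null 2>&1; then
  ompi_info --param all all | grep "$PATTERN"
fi
```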


To List All MCA Parameters For a Framework

Type the following command at the system prompt:


% ompi_info --param btl all

In this example, the command lists all of the available MCA parameters for the btl framework. The output from ompi_info looks similar to the following:


MCA btl: parameter "btl_base_debug" (current value: "0")
                          If btl_base_debug is 1 standard debug is output, if >
                          1 verbose debug is output
                 MCA btl: parameter "btl" (current value: <none>)
                          Default selection set of components for the btl
                          framework (<none> means "use all components that can
                          be found")
                 MCA btl: parameter "btl_base_verbose" (current value: "0")
                          Verbosity level for the btl framework (0 = no
                          verbosity)
                 MCA btl: parameter "btl_self_free_list_num" (current value:
                          "0")
                          Number of fragments by default
                 MCA btl: parameter "btl_self_free_list_max" (current value:
                          "-1")
                          Maximum number of fragments
                 MCA btl: parameter "btl_self_free_list_inc" (current value:
                          "32")
                          Increment by this number of fragments
                 MCA btl: parameter "btl_self_eager_limit" (current value:
                          "131072")
                          Eager size fragmeng (before the rendez-vous ptotocol)
                 MCA btl: parameter "btl_self_min_send_size" (current value:
                          "262144")
                          Minimum fragment size after the rendez-vous
                 MCA btl: parameter "btl_self_max_send_size" (current value:
                          "262144")
                          Maximum fragment size after the rendez-vous
                 MCA btl: parameter "btl_self_min_rdma_size" (current value:
                          "2147483647")
                          Maximum fragment size for the RDMA transfer
                 MCA btl: parameter "btl_self_max_rdma_size" (current value:
                          "2147483647")
                          Maximum fragment size for the RDMA transfer
                 MCA btl: parameter "btl_self_exclusivity" (current value:
                          "65536")
                          Device exclusivity
                 MCA btl: parameter "btl_self_flags" (current value: "10")
                          Active behavior flags
                 MCA btl: parameter "btl_self_priority" (current value: "0")
                 MCA btl: parameter "btl_sm_free_list_num" (current value: "8")
                 MCA btl: parameter "btl_sm_free_list_max" (current value:
                          "-1")
                 MCA btl: parameter "btl_sm_free_list_inc" (current value:
                          "64")
                 MCA btl: parameter "btl_sm_exclusivity" (current value:
                          "65535")
                 MCA btl: parameter "btl_sm_latency" (current value: "100")
                 MCA btl: parameter "btl_sm_max_procs" (current value: "-1")
                 MCA btl: parameter "btl_sm_sm_extra_procs" (current value:
                          "2")
                 MCA btl: parameter "btl_sm_mpool" (current value: "sm")
                 MCA btl: parameter "btl_sm_eager_limit" (current value:
                          "4096")
                 MCA btl: parameter "btl_sm_max_frag_size" (current value:
                          "32768")
                 MCA btl: parameter "btl_sm_size_of_cb_queue" (current value:
                          "128")
                 MCA btl: parameter "btl_sm_cb_lazy_free_freq" (current value:
                          "120")
                 MCA btl: parameter "btl_sm_priority" (current value: "0")
                 MCA btl: parameter "btl_tcp_if_include" (current value:
                          <none>)
                 MCA btl: parameter "btl_tcp_if_exclude" (current value: "lo")
                 MCA btl: parameter "btl_tcp_free_list_num" (current value:
                          "8")
                 MCA btl: parameter "btl_tcp_free_list_max" (current value:
                          "-1")
                 MCA btl: parameter "btl_tcp_free_list_inc" (current value:
                          "32")
                 MCA btl: parameter "btl_tcp_sndbuf" (current value: "131072")
                 MCA btl: parameter "btl_tcp_rcvbuf" (current value: "131072")
                 MCA btl: parameter "btl_tcp_endpoint_cache" (current value:
                          "30720")
                 MCA btl: parameter "btl_tcp_exclusivity" (current value: "0")
                 MCA btl: parameter "btl_tcp_eager_limit" (current value:
                          "65536")
                 MCA btl: parameter "btl_tcp_min_send_size" (current value:
                          "65536")
                 MCA btl: parameter "btl_tcp_max_send_size" (current value:
                          "131072")
                 MCA btl: parameter "btl_tcp_min_rdma_size" (current value:
                          "131072")
                 MCA btl: parameter "btl_tcp_max_rdma_size" (current value:
                          "2147483647")
                 MCA btl: parameter "btl_tcp_flags" (current value: "122")
                 MCA btl: parameter "btl_tcp_priority" (current value: "0")
                 MCA btl: parameter "btl_udapl_free_list_num" (current value:
                          "8")
                          Initial size of free lists (must be >= 1).
                 MCA btl: parameter "btl_udapl_free_list_max" (current value:
                          "-1")
                          Maximum size of free lists (-1 = infinite, otherwise
                          must be >= 1).
                 MCA btl: parameter "btl_udapl_free_list_inc" (current value:
                          "8")
                          Increment size of free lists (must be >= 1).
                 MCA btl: parameter "btl_udapl_mpool" (current value: "udapl")
                          Name of the memory pool to be used.
                 MCA btl: parameter "btl_udapl_max_modules" (current value:
                          "8")
                          Maximum number of supported HCAs.
                 MCA btl: parameter "btl_udapl_num_recvs" (current value: "8")
                          Total number of receive buffers to keep posted per
                          endpoint (must be >= 1).
                 MCA btl: parameter "btl_udapl_num_sends" (current value: "7")
                          Maximum number of sends to post on an endpoint (must
                          be >= 1).
                 MCA btl: parameter "btl_udapl_sr_win" (current value: "4")
                          Window size at which point an explicit credit message
                          will be generated (must be >= 1).
                 MCA btl: parameter "btl_udapl_eager_rdma_num" (current value:
                          "32")
                          Number of RDMA buffers to allocate for small messages
                          (must be >= 1).
                 MCA btl: parameter "btl_udapl_max_eager_rdma_peers" (current
                          value: "16")
                          Maximum number of peers allowed to use RDMA for short
                          messages (independently RDMA will still be used for
                          large messages, (must be >= 0; if zero then RDMA will
                          not be used for short messages).
                 MCA btl: parameter "btl_udapl_eager_rdma_win" (current value:
                          "28")
                          Window size at which point an explicit credit message
                          will be generated (must be >= 1).
                 MCA btl: parameter "btl_udapl_timeout" (current value:
                          "10000000")
                          Connection timeout, in microseconds.
                 MCA btl: parameter "btl_udapl_conn_priv_data" (current value:
                          "1")
                          Use connect private data to establish connections (not supported by all uDAPL implementations).
                 MCA btl: parameter "btl_udapl_async_events" (current value:
                          "1000000000")
                          The asynchronous event queue will only be checked
                          after entering progress this number of times.
                 MCA btl: parameter "btl_udapl_buffer_alignment" (current
                          value: "256")
                          Preferred communication buffer alignment, in bytes
                          (must be >= 1).
                 MCA btl: parameter "btl_udapl_evd_qlen" (current value: "256")
                          The event dispatcher queue length is a function of
                          the number of connections as well as the maximum
                          number of outstanding data transfer operations.
                 MCA btl: parameter "btl_udapl_max_request_dtos" (current
                          value: "44")
                          Maximum number of outstanding submitted sends and
                          rdma operations per endpoint, (see Section 6.6.6 of
                          uDAPL Spec.).
                 MCA btl: parameter "btl_udapl_max_recv_dtos" (current value:
                          "8")
                          Maximum number of outstanding submitted receive
                          operations per endpoint, (see Section 6.6.6 of uDAPL
                          Spec.).
                 MCA btl: parameter "btl_udapl_exclusivity" (current value:
                          "1014")
                          uDAPL BTL exclusivity (must be >= 0).
                 MCA btl: parameter "btl_udapl_eager_limit" (current value:
                          "8192")
                          Eager send limit, in bytes (must be >= 1).
                 MCA btl: parameter "btl_udapl_min_send_size" (current value:
                          "16384")
                          Minimum send size, in bytes (must be >= 1).
                 MCA btl: parameter "btl_udapl_max_send_size" (current value:
                          "65536")
                          Maximum send size, in bytes (must be >= 1).
                 MCA btl: parameter "btl_udapl_min_rdma_size" (current value:
                          "524288")
                          Minimum RDMA size, in bytes (must be >= 1).
                 MCA btl: parameter "btl_udapl_max_rdma_size" (current value:
                          "131072")
                          Maximum RDMA size, in bytes (must be >= 1).
                 MCA btl: parameter "btl_udapl_flags" (current value: "2")
                          BTL flags, added together: PUT=2 (cannot be 0).
                 MCA btl: parameter "btl_udapl_bandwidth" (current value:
                          "225")
                          Approximate maximum bandwidth of network (must be >=
                          1).
                 MCA btl: parameter "btl_udapl_priority" (current value: "0")
                 MCA btl: parameter "btl_base_include" (current value: <none>)
                 MCA btl: parameter "btl_base_exclude" (current value: <none>)
                 MCA btl: parameter "btl_base_warn_component_unused" (current
                          value: "0")
                          This parameter is used to turn on warning messages
                          when certain NICs are not used



To Display All MCA Parameters For a Selected Component

Type the following command at the system prompt:


% ompi_info --param btl tcp

The ompi_info output looks similar to the following:


MCA btl: parameter "btl_base_debug" (current value: "0")
                          If btl_base_debug is 1 standard debug is output, if >
                          1 verbose debug is output
                 MCA btl: parameter "btl" (current value: <none>)
                          Default selection set of components for the btl
                          framework (<none> means "use all components that can
                          be found")
                 MCA btl: parameter "btl_base_verbose" (current value: "0")
                          Verbosity level for the btl framework (0 = no
                          verbosity)
                 MCA btl: parameter "btl_tcp_if_include" (current value:
                          <none>)
                 MCA btl: parameter "btl_tcp_if_exclude" (current value: "lo")
                 MCA btl: parameter "btl_tcp_free_list_num" (current value:
                          "8")
                 MCA btl: parameter "btl_tcp_free_list_max" (current value:
                          "-1")
                 MCA btl: parameter "btl_tcp_free_list_inc" (current value:
                          "32")
                 MCA btl: parameter "btl_tcp_sndbuf" (current value: "131072")
                 MCA btl: parameter "btl_tcp_rcvbuf" (current value: "131072")
                 MCA btl: parameter "btl_tcp_endpoint_cache" (current value:
                          "30720")
                 MCA btl: parameter "btl_tcp_exclusivity" (current value: "0")
                 MCA btl: parameter "btl_tcp_eager_limit" (current value:
                          "65536")
                 MCA btl: parameter "btl_tcp_min_send_size" (current value:
                          "65536")
                 MCA btl: parameter "btl_tcp_max_send_size" (current value:
                          "131072")
                 MCA btl: parameter "btl_tcp_min_rdma_size" (current value:
                          "131072")
                 MCA btl: parameter "btl_tcp_max_rdma_size" (current value:
                          "2147483647")
                 MCA btl: parameter "btl_tcp_flags" (current value: "122")
                 MCA btl: parameter "btl_tcp_priority" (current value: "0")
                 MCA btl: parameter "btl_base_warn_component_unused" (current
                          value: "0")
                          This parameter is used to turn on warning messages
                          when certain NICs are not used


Using MCA Parameters

There are three ways to use MCA parameters with Open MPI:

1. Setting the parameter from the command line using the --mca option to mpirun. This method takes the highest precedence; values set this way override any other values specified for the same parameter.

2. Setting the parameter as an environment variable. Values set in this fashion take the next highest precedence.

3. Setting the parameter values in a text file. Parameter values specified using this method have the lowest precedence.
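The precedence rules above can be sketched as a shell session. This is illustrative only; it assumes a compiled MPI program named a.out and that mpirun is installed:

```shell
# Suppose two of the three methods set mpi_param_check:
#   environment:  OMPI_MCA_mpi_param_check=0  (middle precedence)
#   command line: --mca mpi_param_check 1     (highest precedence; wins)
OMPI_MCA_mpi_param_check=0
export OMPI_MCA_mpi_param_check

# Guarded so the sketch is harmless where mpirun is not installed.
if command -v mpirun >/dev/null 2>&1; then
  mpirun --mca mpi_param_check 1 -np 4 a.out
fi
```

Here the job runs with mpi_param_check set to 1, because the command-line value overrides the environment variable.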


To Set MCA Parameters From the Command Line

Type the following command at the system prompt:


% mpirun --mca param-name value

In this example, param-name stands for the name of the MCA parameter you want to set, and value stands for the new value you want to specify for the parameter. For example, the following command sets the value of the mpi_show_handle_leaks parameter to 1 for the specified job:


% mpirun --mca mpi_show_handle_leaks 1 -np 4 a.out

This sets the value of MCA parameter mpi_show_handle_leaks to 1 before running the program a.out with four processes.

Using MCA Parameters As Environment Variables

As with other types of environment variables, the syntax for setting MCA parameters as environment variables varies with the type of command shell.


To Set MCA Parameters in the sh Shell

1. Type the following command at the prompt:


% OMPI_MCA_param-name=value

where param-name is the name of the MCA parameter you want to set, and value is the desired value for the parameter. For example, the following command sets the mpi_show_handle_leaks parameter to 1:


% OMPI_MCA_mpi_show_handle_leaks=1

2. Type the following command:


% export OMPI_MCA_param-name

For example, an export command using the parameter used in the previous step would look like this:


% export OMPI_MCA_mpi_show_handle_leaks

3. Issue the mpirun command with the desired options. For example:


% mpirun -np 4 a.out
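In Bourne-compatible shells you can also combine the steps above by prefixing the assignment to the command itself; the variable then applies only to that invocation. A sketch, assuming the same a.out program:

```shell
# One-shot form: the assignment applies only to this mpirun invocation
# and does not persist in the calling shell afterward.
if command -v mpirun >/dev/null 2>&1; then
  OMPI_MCA_mpi_show_handle_leaks=1 mpirun -np 4 a.out
fi

# The variable is still unset in the calling shell:
AFTER="${OMPI_MCA_mpi_show_handle_leaks:-unset}"
```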


To Set MCA Parameters in the C Shell

1. Use the setenv command to set the MCA parameter.


% setenv OMPI_MCA_param-name value

where param-name is the name of the MCA parameter you want to set, and value is the desired value for the parameter. The following example shows how to set the mpi_show_handle_leaks parameter to 1.


% setenv OMPI_MCA_mpi_show_handle_leaks 1

2. Issue the mpirun command for the program (in this example, a.out).


% mpirun -np 4 a.out


To Specify MCA Parameters Using a Text File

1. Create a text file, specifying each parameter/value pair on a separate line. Comments are allowed. For example:


# This is a comment
# Set the same MCA parameter as in previous examples
mpi_show_handle_leaks = 1
 
# Default to rsh always
plm_rsh_agent = rsh
 
mpi_preconnect_all = 1
mpi_param_check = 0
#
# udapl parameters - comment or uncomment as needed
#
#btl = self,tcp,sm
#btl = self,udapl,sm
btl = ^tcp

2. Name the file mca-params.conf and save it.

You can save the file either in your home directory under $HOME/.openmpi/mca-params.conf, where the parameter values in the file affect only your jobs, or in /opt/SUNWhpc/HPC8.0/etc/openmpi-mca-params.conf, where the parameter values in the file affect all users.
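The two steps above can be sketched as a small shell session. This is illustrative only; the parameter is taken from the earlier examples, and you should back up any existing file before trying it:

```shell
# Create the per-user MCA parameter file.
mkdir -p "$HOME/.openmpi"
cat > "$HOME/.openmpi/mca-params.conf" <<'EOF'
# Set the same MCA parameter as in previous examples
mpi_show_handle_leaks = 1
EOF

# Confirm the contents.
grep mpi_show_handle_leaks "$HOME/.openmpi/mca-params.conf"
```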

The following example shows the output from the ompi_info command for mca_param_files.


% ompi_info --param mca mca_param_files
MCA mca: parameter "mca_param_files" (current value:
"/home/joeuser/.openmpi/mca-params.conf:
/opt/SUNWhpc/HPC8.0/etc/openmpi-mca-params.conf")
Path for MCA configuration files containing default parameter values
MCA mca: parameter "mca_component_path" (current value:
"/opt/SUNWhpc/HPC8.0/lib/openmpi:/home/joeuser/.openmpi/components")
Path where to look for Open MPI and ORTE components
MCA mca: parameter "mca_verbose" (current value: <none>)
Top-level verbosity parameter
MCA mca: parameter "mca_component_show_load_errors" (current value: "1")
Whether to show errors for components that failed to load or not
MCA mca: parameter "mca_component_disable_dlopen" (current value: "0")
Whether to attempt to disable opening dynamic components or not

The MCA parameter mca_param_files specifies a colon-delimited path of files to search for MCA parameters. Files to the left of the colon take precedence over files to the right. At runtime, mpirun searches the following two files in order when the mca_param_files parameter is set:

1. $HOME/.openmpi/mca-params.conf: The user-supplied set of values takes the highest precedence.

2. $prefix/etc/openmpi-mca-params.conf: The system-supplied set of values has a lower precedence.

In the above example, Open MPI first searches /home/joeuser/.openmpi/mca-params.conf for MCA parameters, and then searches /opt/SUNWhpc/HPC8.0/etc/openmpi-mca-params.conf. If a parameter appears in both files, the value set in the first file (the user’s file) is used.

Including and Excluding Components

Each MCA framework has a top-level MCA parameter that you can use to select which components are to be used at runtime. In other words, there is an MCA parameter of the same name as each MCA framework (for example, btl) that you can use to include or exclude components from a given run.

You can use top-level parameters in the same way you would use other MCA parameters (for example, you can set them from the command line, as environment variables, or in text files).
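The three mechanisms can be sketched as follows. The OMPI_MCA_ environment-variable prefix and the mca-params.conf file syntax are the standard Open MPI conventions; mpi_show_handle_leaks is used purely as an example parameter name, and a scratch file stands in for the real configuration file:

```shell
# 1. On the mpirun command line (shown as a comment because it requires
#    a running Open MPI installation):
#    mpirun --mca mpi_show_handle_leaks 1 -np 4 a.out

# 2. As an environment variable, using the standard OMPI_MCA_ prefix:
export OMPI_MCA_mpi_show_handle_leaks=1

# 3. In a parameter file read at startup (normally
#    $HOME/.openmpi/mca-params.conf; a scratch file is used here):
conf=$(mktemp)
printf 'mpi_show_handle_leaks = 1\n' > "$conf"
```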

For example, the btl MCA parameter is used to control which byte transfer layer (BTL) components are used with mpirun. The value for the btl parameter is a list of components separated by commas, with the optional prefix ^ (caret symbol).



Note - Do not mix “include” instructions with “exclude” instructions in the same command; otherwise, mpirun returns an error.



To Include and Exclude Components Using the Command Line

Type the appropriate command at the system prompt. To include only specific components for a framework, list them:


% mpirun --mca framework comp1,comp2 ...

To exclude specific components, and implicitly include the rest, prefix the list with the ^ (caret) symbol:


% mpirun --mca framework ^comp3 ...

In the first example, only the components comp1 and comp2 are used for the framework specified by --mca framework. In the second example, component comp3 is excluded and all other components are included, since the list is preceded by the ^ (caret) symbol.

For example, the following command excludes the tcp and openib components from the BTL framework, and implicitly includes all the other components:


% mpirun --mca btl ^tcp,openib ...

The ^ (caret) prefix means “Perform the opposite action with the rest of the components in the framework.” When the mpirun --mca command specifies components to be excluded, the caret implicitly includes all of the remaining components in that framework. Conversely, when the mpirun --mca command explicitly includes components (no caret), all components not listed are excluded. The ellipsis (...) in these examples simply stands for the rest of the mpirun command line.

For example, the following command includes only the self, sm, and gm components of btl and implicitly excludes the rest:


% mpirun --mca btl self,sm,gm ...
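The include/exclude semantics can be modeled as simple set selection. This shell sketch mirrors the behavior of a btl-style component list; the component inventory and helper function are invented for illustration and are not Open MPI internals:

```shell
# Sketch of MCA component selection: a value of "a,b" keeps only a and b,
# while "^a,b" keeps everything except a and b. Component names are
# illustrative, not a real framework inventory.
select_components() {
    spec=$1; all=$2
    case $spec in
        ^*) # Exclusion list: drop the named components, keep the rest.
            excluded=",${spec#^},"
            result=
            for c in $all; do
                case $excluded in
                    *",$c,"*) ;;               # named, so excluded
                    *) result="$result $c" ;;  # not named, so kept
                esac
            done
            echo $result
            ;;
        *)  # Inclusion list: keep only the named components.
            echo "$spec" | tr ',' ' '
            ;;
    esac
}

select_components "self,sm,gm" "self sm gm tcp openib"   # -> self sm gm
select_components "^tcp,openib" "self sm gm tcp openib"  # -> self sm gm
```

Both calls select the same set here: the first names the three components directly, while the second arrives at them by excluding the other two.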

Processor and Memory Affinity

Using Processor Affinity

The term processor affinity refers to binding a process to a specific processor so that the operating system runs the process only on that processor. On multi-processor machines, this can improve performance by preventing the operating system from migrating processes between processors. Eliminating this migration removes the "jitter" from performance characteristics, so that performance is consistent across multiple runs, and can dramatically improve overall performance.



Note - Processor affinity should not be used when a node is over-subscribed (that is, when more processes are launched than there are processors). This can lead to a serious degradation in performance (even more than simply oversubscribing the node). Open MPI usually detects this situation and automatically disables the use of processor affinity (and displays run-time warnings to this effect). For more information about oversubscribing nodes, see Oversubscribing Nodes.


Using Memory Affinity

Memory affinity is relevant only on Non-Uniform Memory Access (NUMA) machines, such as many models of multi-processor Opteron machines. In a NUMA architecture, memory is physically distributed throughout the machine, even though it is presented as a single address space. That is, memory that is physically local to one or more processors is remote to the other processors, which means that some memory can be accessed more quickly by a given process than other memory.

Open MPI supports general and specific memory affinity, which means that it generally tries to allocate all memory local to the processor that asked for it. When shared memory is used for communication, Open MPI uses memory affinity to make certain pages local to specific processes in order to minimize memory network/bus traffic.


To Find Out Whether Memory Affinity Is Supported

Open MPI supports memory affinity on a variety of systems.

To find out whether your system is supported, type the ompi_info command and look for maffinity components in the output. For example:


% ompi_info | grep maffinity
MCA maffinity: first_use (MCA v1.0, API v1.0, Component v1.2)



Note - Memory affinity support is enabled only when processor affinity is enabled. This is because processes might allocate local memory and then move to a different processor, and the second processor might be remote from the memory that the process just allocated. This negates the purpose of specifying memory affinity.


Running MPI Jobs With Processor and Memory Affinity

If your system supports processor and memory affinity as shown using the ompi_info command, you can explicitly tell Open MPI to use affinity when running MPI jobs.



Note - Processor and memory affinity function only on multi-processor machines.


Currently, Open MPI only offers coarse-grained controls for processor affinity. For this reason, you can obtain the best results if processes in an Open MPI job using processor affinity are the only intensive processes running on the nodes being used for the job. Since most schedulers do not provide information on which processors should be used for specific processes, Open MPI assumes that its processes are "alone" on the node. Open MPI then exclusively claims CPUs, starting with the first one.

This means that if two processor-affinity-enabled jobs are running on the same node, they will both attempt to claim the first processor(s) on the node, resulting in CPU thrashing (and severely degraded performance).



Note - When running with processor affinity, all processors must be operational. Otherwise, processor affinity will not function because all the processors must be accessed in sequence.



To Enable Affinity Using the Command Line

To enable processor (and potentially memory) affinity, set the MCA parameter opal_paffinity_alone to 1.

For example, the following command enables processor affinity while running the program a.out on four processors:


% mpirun --mca opal_paffinity_alone 1 -np 4 a.out

The command shown in this example assumes that this job is running on a single 4-processor machine or two 2-processor machines. Setting opal_paffinity_alone tells Open MPI to bind each process to a specific processor. If memory affinity is supported, Open MPI also attempts to use memory affinity for this job.

You set values for opal_paffinity_alone in the same way you set other MCA parameters. For more information about setting MCA parameters, see Using MCA Parameters.
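For example, in a batch script you might enable affinity through the environment rather than on each command line; the OMPI_MCA_ prefix is the standard Open MPI convention for exporting MCA parameters as environment variables:

```shell
# Enable processor affinity for every subsequent mpirun in this shell
# session; equivalent to passing --mca opal_paffinity_alone 1 each time.
export OMPI_MCA_opal_paffinity_alone=1

# The mpirun invocation itself then needs no extra flags, e.g.:
#    mpirun -np 4 a.out
```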



Note - Open MPI automatically disables processor affinity on any node that is oversubscribed (that is, where more Open MPI processes are launched in a single job on a node than it has processors) and returns warning messages. However, you may use processor affinity with degraded performance mode if the nodes are not oversubscribed.


Using MCA Parameters With Sun Grid Engine

The ras_gridengine parameters enable you to control the output from the gridengine component of the Open MPI RAS (Resource Allocation Subsystem). The parameters that control Grid Engine process launching belong to the gridengine component of the PLM (Process Launch Module).

The following example shows the mpirun command with a specified MCA parameter.


% mpirun -np 4 -mca plm_gridengine_debug 100 connectivity.sparc -v

The following table shows the available MCA parameters and their default values.


TABLE 7-6   MCA Parameters For Use With Sun Grid Engine Integration

MCA Parameter               Default Value   Function
ras_gridengine_debug        0               Enable debugging output for the gridengine ras component
ras_gridengine_verbose      0               Enable verbose output for the gridengine ras component
ras_gridengine_show_jobid   0               Show the JOB_ID of the Grid Engine job
ras_gridengine_priority     100             Priority of the gridengine ras component
plm_base_reuse_daemons      0               Specifies whether to reuse daemons to launch dynamically spawned processes
plm_gridengine_debug        0               Enable debugging of the gridengine plm component
plm_gridengine_verbose      0               Enable verbose output of the gridengine qrsh -inherit command
plm_gridengine_priority     100             Priority of the gridengine plm component
plm_gridengine_orted        orted           The command name that the gridengine plm component invokes for the ORTE daemon


To view a list of the RAS parameters from the command line, use the ompi_info command. The following example shows the output from the ompi_info command when the RAS parameters are specified:


% ompi_info -param ras gridengine
                 MCA ras: parameter "ras" (current value: <none>)
                          Default selection set of components for the ras
                          framework (<none> means "use all components that can
                          be found")
                 MCA ras: parameter "ras_gridengine_debug" (current value: "0")
                          Enable debugging output for the gridengine ras
                          component
                 MCA ras: parameter "ras_gridengine_priority" (current value:
                          "100")
                          Priority of the gridengine ras component
                 MCA ras: parameter "ras_gridengine_verbose" (current value:
                          "0")
                          Enable verbose output for the gridengine ras
                          component
                 MCA ras: parameter "ras_gridengine_show_jobid" (current value:
                          "0")
                          Show the JOB_ID of the Grid Engine job

This example shows the output from the ompi_info command when the PLM parameters are specified:


% ompi_info -param plm gridengine
MCA plm: parameter "plm_base_reuse_daemons" (current value:
                          "0")
                          If nonzero, reuse daemons to launch dynamically
                          spawned processes.  If zero, do not reuse daemons
                          (default)
                 MCA plm: parameter "plm" (current value: <none>)
                          Default selection set of components for the plm
                          framework (<none> means "use all components that can
                          be found")
                 MCA plm: parameter "plm_base_verbose" (current value: "0")
                          Verbosity level for the plm framework (0 = no
                          verbosity)
                 MCA plm: parameter "plm_gridengine_debug" (current value: "0")
                          Enable debugging of gridengine plm component
                 MCA plm: parameter "plm_gridengine_verbose" (current value:
                          "0")
                          Enable verbose output of the gridengine qrsh -inherit
                          command
                 MCA plm: parameter "plm_gridengine_priority" (current value:
                          "100")
                          Priority of the gridengine plm component
                 MCA plm: parameter "plm_gridengine_orted" (current value:
                          "orted")
                          The command name that the gridengine plm component
                          will invoke for the ORTE daemon

Changing the Default Values in MCA Parameters



Note - In most cases, you do not need to change the default values in the gridengine MCA parameters. If you encounter a difficulty and want to change the values for debugging purposes, the options are available.


The MCA PLM and RAS components and modules provide options that allow you to change the default values.

For more information about how to change the values of MCA parameters, see the General Run-time Tuning FAQ on the Open MPI Web site at:

http://www.open-mpi.org/faq/?category=tuning#setting-mca-params

 


For More Information

For more information about the Modular Component Architecture and MCA parameters, refer to the following sources: