Sun MPI 4.0 User's Guide: With CRE

Chapter 4 Getting Information

The CRE user interface includes two commands for obtaining information: mpinfo, which reports on the configuration and status of a Sun HPC cluster's partitions and nodes, and mpps, which reports on jobs running on the cluster.

mpps: Finding Out Job Status

The mpps command is comparable to the Solaris ps command. It returns information about jobs and processes currently running on the Sun HPC cluster.

By default mpps shows basic information about the user's jobs currently running in the default partition. For example,

% mpps
   JID   NPROC  UID   STATE  AOUT
   41    3      slu   RUN    AAA
   46    4      slu   EXNG   tmp
   49    1      slu   EXIT   tmp
   99    9      slu   EXNG   uname
   100   9      slu   EXNG   uname

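In the response, each line describes one job. JID is the job ID, NPROC is the number of processes in the job, UID is the job's owner, STATE is the job's state, and AOUT is the name of the program being run.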

Table 4-1 lists the states reported by mpps. Some states refer only to jobs, some only to processes, and some to both. (See "Displaying Process Information".)

Table 4-1 Job and Process States

State      mpps Display  Meaning
CORE       CORE          The job or process exited due to a signal and core was dumped.
COREING    CRNG          The job is exiting due to a signal. The first process to die dumped core.
EXIT       EXIT          The job or process exited normally.
EXITING    EXNG          The job is exiting. At least one process exited normally.
FAIL       FAIL          The job or process failed on startup or was aborted.
FAILING    FLNG          Initialization of the job failed, or a job abort has been signaled.
ORPHAN     ORPHAN        The process has been "orphaned," that is, the node on which it exists has gone offline.
RUNNING    RUN           The job or process is running.
SEXIT      SEXIT         The job or process exited due to a signal.
SEXITING   SEXNG         The job is exiting due to a signal. The first process to die was killed by a signal. At least one of its processes is still in the RUN state.
SPAWNING   SPAWN         The job or process is being spawned.
STOP       STOP          The job or process is stopped.
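If you want to post-process mpps output in a script, the default columns are easy to parse. The following Python sketch is illustrative only and is not part of the CRE. It makes two assumptions: the columns are separated by whitespace, and program names contain no spaces. The second assumption no longer holds once the -f option adds job arguments. The names SAMPLE, STATE_NAMES, and parse_mpps are hypothetical. The sketch expands each STATE abbreviation to the full state name from Table 4-1.

```python
# Output copied from the default mpps example above (hypothetical sample).
SAMPLE = """\
   JID   NPROC  UID   STATE  AOUT
   41    3      slu   RUN    AAA
   46    4      slu   EXNG   tmp
   49    1      slu   EXIT   tmp
   99    9      slu   EXNG   uname
   100   9      slu   EXNG   uname
"""

# mpps display abbreviation -> full state name, taken from Table 4-1.
STATE_NAMES = {
    "CORE": "CORE", "CRNG": "COREING", "EXIT": "EXIT", "EXNG": "EXITING",
    "FAIL": "FAIL", "FLNG": "FAILING", "ORPHAN": "ORPHAN", "RUN": "RUNNING",
    "SEXIT": "SEXIT", "SEXNG": "SEXITING", "SPAWN": "SPAWNING", "STOP": "STOP",
}

def parse_mpps(text):
    """Parse default mpps output (assumed whitespace-separated) into one dict per job."""
    rows = [ln.split() for ln in text.splitlines() if ln.strip()]
    header, data = rows[0], rows[1:]
    jobs = []
    for fields in data:
        job = dict(zip(header, fields))
        job["JID"] = int(job["JID"])
        job["NPROC"] = int(job["NPROC"])
        # Unknown abbreviations are kept unchanged rather than rejected.
        job["STATE_NAME"] = STATE_NAMES.get(job["STATE"], job["STATE"])
        jobs.append(job)
    return jobs

jobs = parse_mpps(SAMPLE)
exiting = [j["JID"] for j in jobs if j["STATE_NAME"] == "EXITING"]
print(exiting)  # [46, 99, 100]
```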

Use the -f option to also display each job's start time and arguments.

Use the -e option to display information on all jobs, not just your jobs.

Specifying the Partition

To show information about jobs running in all partitions, use the -A option.

To show information about jobs running in a specific partition, use the -a option, followed by the name of the partition.
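The options described so far can be combined on one command line. The following Python sketch assembles an mpps argument vector from them, for example to pass to subprocess.run on a node where the CRE is installed. The function name mpps_command is hypothetical. The sketch also assumes that a specific partition (-a) and all partitions (-A) should not be requested together. That restriction is a choice made for the sketch, not documented mpps behavior.

```python
def mpps_command(all_users=False, full=False, processes=False,
                 partition=None, all_partitions=False):
    """Build an mpps argv from the options described in this section."""
    # Assumption of this sketch: -a <partition> and -A are treated as alternatives.
    if partition is not None and all_partitions:
        raise ValueError("use either a specific partition (-a) or all partitions (-A)")
    argv = ["mpps"]
    if all_users:
        argv.append("-e")        # all jobs, not just your own
    if full:
        argv.append("-f")        # add start time and job arguments
    if processes:
        argv.append("-p")        # list each job's processes below it
    if all_partitions:
        argv.append("-A")        # jobs in all partitions
    elif partition is not None:
        argv += ["-a", partition]  # jobs in one named partition
    return argv

print(mpps_command(all_users=True, partition="part0"))
# ['mpps', '-e', '-a', 'part0']
```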

Displaying Process Information

Use the -p option to also view information about the processes that make up the jobs. The process information is listed below each job. For example,

% mpps -p
   JID  NPROC  UID   STATE  AOUT   RANK  PID    STATE  NODE
   2320    4   shaw  RUN    sleep  0     10190  RUN    node6
                                   1      4744  RUN    node7
                                   2     16564  RUN    node4
                                   3      9412  RUN    node5

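In this example, job 2320, owned by shaw, consists of four processes (ranks 0 through 3) running the program sleep. Each process runs on a different node (node6, node7, node4, and node5), and the job and all of its processes are in the RUN state.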
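Only the first line of each job carries the job columns in mpps -p output. The lines that follow it show only the process columns. The following Python sketch groups the rows accordingly. It assumes exactly the layout shown in the example: nine fields on a job line and four on a continuation line. The name parse_mpps_p is hypothetical.

```python
# Output copied from the mpps -p example above (hypothetical sample).
SAMPLE = """\
   JID  NPROC  UID   STATE  AOUT   RANK  PID    STATE  NODE
   2320    4   shaw  RUN    sleep  0     10190  RUN    node6
                                   1      4744  RUN    node7
                                   2     16564  RUN    node4
                                   3      9412  RUN    node5
"""

def parse_mpps_p(text):
    """Group mpps -p rows into jobs, each with a list of its processes."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    jobs = []
    for line in lines[1:]:  # skip the header line
        fields = line.split()
        if len(fields) == 9:
            # Job line: five job columns followed by the job's first process.
            jid, nproc, uid, state, aout = fields[:5]
            jobs.append({"jid": int(jid), "nproc": int(nproc), "uid": uid,
                         "state": state, "aout": aout, "procs": []})
            proc = fields[5:]
        elif len(fields) == 4 and jobs:
            # Continuation line: only RANK, PID, STATE, and NODE are printed.
            proc = fields
        else:
            raise ValueError(f"unrecognized mpps -p line: {line!r}")
        rank, pid, pstate, node = proc
        jobs[-1]["procs"].append({"rank": int(rank), "pid": int(pid),
                                  "state": pstate, "node": node})
    return jobs

job = parse_mpps_p(SAMPLE)[0]
print({p["rank"]: p["node"] for p in job["procs"]})
# {0: 'node6', 1: 'node7', 2: 'node4', 3: 'node5'}
```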

Displaying Specific Process and Job Information

You can also use the -P option to display one or more specific process values and the -J option to display one or more job values. Separate multiple values either with spaces or with commas and no spaces.

Arguments to -P are

You can list these via the -lp option.

Arguments to -J are

mpinfo: Configuration and Status

Use the mpinfo command to display information about the configuration of partitions and nodes, and status information about nodes.

Overview

You can display information on all partitions or nodes, or on any subset of them. You can either list the partitions or nodes, or you can use the -R option, along with a resource requirement specifier (RRS), to have the CRE determine which objects should be displayed. See "Expressing More Complex Resource Requirements" for information on RRSs. If you specify a partition, you must include only partition attributes in the RRS; if you specify a node, you must use only node attributes.

Use the -A option to specify an attribute whose value you want to display. If you want to display more than one attribute, separate them by commas with no spaces. Alternatively, you can issue multiple -A options on the same command line. If you omit -A, mpinfo displays values for a default set of attributes.
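Scripts that call mpinfo have to keep the -A list free of spaces. The following Python sketch builds both documented forms: a single comma-joined -A value, and one -A option per attribute. The function names are hypothetical. Rejecting names that contain spaces or commas is a safety check added by the sketch, not an mpinfo rule beyond the one stated above.

```python
def attribute_option(attrs):
    """Return ["-A", "a,b,c"]: a single -A option, commas, no spaces."""
    names = [a.strip() for a in attrs]
    if not names or any(not a or " " in a or "," in a for a in names):
        raise ValueError("each attribute must be a single non-empty name")
    return ["-A", ",".join(names)]

def attribute_options_repeated(attrs):
    """Return the equivalent form that uses one -A option per attribute."""
    argv = []
    for a in attrs:
        argv += ["-A", a]
    return argv

# Reproduces the command line of the partition example later in this section.
print(["mpinfo"] + attribute_option(["name", "enabled", "nodes"]) + ["-P"])
# ['mpinfo', '-A', 'name,enabled,nodes', '-P']
```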

Use the -v option to display information about all attributes for one or more partitions or nodes. These include attributes defined by the system administrator.

When a Boolean attribute is displayed, yes indicates that the attribute is set, and no indicates that the attribute is not set.

Partitions

Use the -P option to display information for all partitions.

Use the -p option, followed by the name of the partition, to display information about an individual partition. To display information about multiple partitions, list the names, either separating them with commas and no spaces or enclosing the list in quotation marks.

Partition attributes whose settings you can view via mpinfo are shown in Table 4-2; the heading displayed for each attribute is shown in parentheses after its description.

The following summarizes various points discussed earlier.

Table 4-2 Partition attributes available via mpinfo

Attribute (mpadmin form)  Description (mpinfo output heading)
enabled                   Set if the partition is enabled, that is, if it is ready to accept jobs (ENA).
maxt                      Maximum number of simultaneously running processes allowed on each node of the partition (MAXT).
name                      Name of the partition (NAME).
login                     Allow logins. LOG is set when the partition allows logins. Note that this is the inverse of the sense of the corresponding mpadmin attribute (LOG).
mp                        Allow multinode jobs. When no_mp_jobs is unset, MP is set. Note that this is the inverse of the mpadmin meaning (MP).
nodes                     Number of nodes in the partition (NODES).

The following example illustrates the default mpinfo output for partitions:

% mpinfo -P 
  NAME         NODES: Tot(cpu) Enb(cpu) Onl(cpu) ENA LOG MP
  part10                1(  4)   1(  4)   1(  4) no  yes yes
  part11                1(  4)   1(  4)   1(  4) yes yes yes
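The default partition listing combines node counts with Boolean flags, and mpinfo prints those flags as yes or no. The following Python sketch parses this output and lists the enabled partitions. The names parse_partitions and yes_no are hypothetical, and so are the dictionary keys. The keys rest on an assumed reading of each NODES entry such as 1(  4): a count of nodes, with their CPUs in parentheses, for the total (Tot), enabled (Enb), and online (Onl) nodes.

```python
import re

# Output copied from the mpinfo -P example above (hypothetical sample).
SAMPLE = """\
  NAME         NODES: Tot(cpu) Enb(cpu) Onl(cpu) ENA LOG MP
  part10                1(  4)   1(  4)   1(  4) no  yes yes
  part11                1(  4)   1(  4)   1(  4) yes yes yes
"""

def yes_no(value):
    """mpinfo prints Boolean attributes as yes (set) or no (not set)."""
    if value not in ("yes", "no"):
        raise ValueError(f"not a Boolean value: {value!r}")
    return value == "yes"

def parse_partitions(text):
    parts = {}
    lines = [ln for ln in text.splitlines() if ln.strip()]
    for line in lines[1:]:  # skip the header line
        tokens = line.split()
        # Each NODES entry looks like "1(  4)": a node count with a CPU count.
        counts = [(int(n), int(c)) for n, c in re.findall(r"(\d+)\(\s*(\d+)\)", line)]
        (tot, tot_cpu), (enb, enb_cpu), (onl, onl_cpu) = counts
        ena, log, mp = (yes_no(t) for t in tokens[-3:])
        parts[tokens[0]] = {
            "nodes_total": tot, "cpus_total": tot_cpu,
            "nodes_enabled": enb, "cpus_enabled": enb_cpu,
            "nodes_online": onl, "cpus_online": onl_cpu,
            "enabled": ena, "logins": log, "multinode": mp,
        }
    return parts

parts = parse_partitions(SAMPLE)
print([name for name, p in parts.items() if p["enabled"]])  # ['part11']
```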

The following example displays the names, numbers of nodes, and enabled status for all partitions:

% mpinfo -A name,enabled,nodes -P
 NAME         ENA NODES: Tot(cpu) Enb(cpu) Onl(cpu)
 part10       no           1(  4)   1(  4)   1(  4)
 part11       yes          1(  4)   1(  4)   1(  4)

Nodes

Use the -N option to display information about all nodes.

Use the -n option, followed by the names of one or more nodes, to display information about specific nodes. When listing multiple node names, separate them with commas and no spaces.

The following table shows the node attributes that you can display via mpinfo. The heading that is displayed for each attribute is shown in parentheses at the end of each description.

Note these points:

Table 4-3 Node attributes available via mpinfo

Attribute        Short Form  Description (mpinfo output heading)
cpu_idle         idle        Percent of time CPU is idle (IDLE).
cpu_iowait       iowait      Percent of time CPU spends waiting for I/O (IWAIT).
cpu_kernel       kernel      Percent of time CPU spends in kernel (KERNL).
cpu_swap         swap        Percent of time CPU spends waiting for swap (SWAP).
cpu_type         cpu         CPU architecture (CPU).
cpu_user         user        Percent of time CPU spends running user programs (USER).
domain                       DNS domain.
enabled                      If set, the node is available for spawning jobs.
load1                        Load average for the past minute (LOAD1).
load5                        Load average for the past five minutes (LOAD5).
load15                       Load average for the past 15 minutes (LOAD15).
manufacturer     manuf       Hardware manufacturer (MANUFACTURER).
mem_free         memf        Node's available RAM, in Mbytes (FMEM).
mem_total        memr        Node's total physical memory, in Mbytes (MEM).
name                         Name of the node (NAME).
ncpus            ncpu        Number of CPU modules in the node (NCPU).
os_arch_kernel   mach        Node's kernel architecture (MACH).
os_max_proc      maxproc     Maximum number of processes allowed on the node, counting all processes, including cluster daemons (MPROC).
os_name          os          Name of the operating system running on the node (OS).
os_release       osrel       Operating system's release number (OSREL).
os_release_maj   osmaj       Major number of the operating system release number (MAJ).
os_release_min   osmin       Minor number of the operating system release number (MIN).
os_version       osver       Operating system's version (OSVER).
partition                    Partition of which the node is a member (PARTITION).
serial_number    serno       Hardware serial number (SERIAL).
swap_free        swapf       Node's available swap space, in Mbytes (FSWP).
swap_total       swapr       Node's total swap space, in Mbytes (SWAP).

The following is an example of the mpinfo output for nodes:

% mpinfo -N
NAME  UP PARTITION OS    OSREL NCPU FMEM   FSWP    LOAD1 LOAD5 LOAD15
node0 y  p0        SunOS 5.6   1    0.89   158.34  0.09  0.11  0.13
node1 y  p0        SunOS 5.6   1    31.41  276.12  0.00  0.01  0.01
node2 y  p1        SunOS 5.6   1    25.59  279.77  0.00  0.00  0.01
node3 y  p1        SunOS 5.6   1    25.40  279.88  0.00  0.00  0.01
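A script can use the node listing to choose a lightly loaded node. The following Python sketch parses the default mpinfo -N output. It then picks the node with the lowest one-minute load among the nodes that are up and have at least a given amount of free memory. This selection policy is a hypothetical illustration, not how the CRE itself places jobs. The sketch also assumes that the UP column shows y for nodes that are up. The names parse_nodes, pick_node, and min_free_mem are hypothetical.

```python
# Output copied from the mpinfo -N example above (hypothetical sample).
SAMPLE = """\
NAME  UP PARTITION OS    OSREL NCPU FMEM   FSWP    LOAD1 LOAD5 LOAD15
node0 y  p0        SunOS 5.6   1    0.89   158.34  0.09  0.11  0.13
node1 y  p0        SunOS 5.6   1    31.41  276.12  0.00  0.01  0.01
node2 y  p1        SunOS 5.6   1    25.59  279.77  0.00  0.00  0.01
node3 y  p1        SunOS 5.6   1    25.40  279.88  0.00  0.00  0.01
"""

# Columns converted to numbers; FMEM and FSWP are in Mbytes (Table 4-3).
NUMERIC = {"NCPU": int, "FMEM": float, "FSWP": float,
           "LOAD1": float, "LOAD5": float, "LOAD15": float}

def parse_nodes(text):
    rows = [ln.split() for ln in text.splitlines() if ln.strip()]
    header = rows[0]
    nodes = []
    for fields in rows[1:]:
        node = dict(zip(header, fields))
        for col, conv in NUMERIC.items():
            if col in node:
                node[col] = conv(node[col])
        nodes.append(node)
    return nodes

def pick_node(nodes, min_free_mem=10.0):
    """Name of the up node with enough free RAM and the lowest load, or None."""
    candidates = [n for n in nodes if n["UP"] == "y" and n["FMEM"] >= min_free_mem]
    if not candidates:
        return None
    # Ties on LOAD1 are broken by LOAD5, then LOAD15, then listing order.
    best = min(candidates, key=lambda n: (n["LOAD1"], n["LOAD5"], n["LOAD15"]))
    return best["NAME"]

print(pick_node(parse_nodes(SAMPLE)))  # node2
```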

The following example shows only the names of nodes and the partition they're in:

% mpinfo -N -A name,partition
NAME         PARTITION
node0        part0
node1        part0
node2        part1
node3        part1
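The two-column listing produced by -A name,partition is convenient for grouping nodes by partition. The following Python sketch does that. The name nodes_by_partition is hypothetical. Mapping a line with no partition entry to None is an assumption about how mpinfo shows nodes that belong to no partition; it does not come from this guide.

```python
from collections import defaultdict

# Output copied from the mpinfo -N -A name,partition example above (hypothetical sample).
SAMPLE = """\
NAME         PARTITION
node0        part0
node1        part0
node2        part1
node3        part1
"""

def nodes_by_partition(text):
    """Map each partition name to the list of its nodes, in listing order."""
    groups = defaultdict(list)
    rows = [ln.split() for ln in text.splitlines() if ln.strip()]
    for fields in rows[1:]:  # skip the header line
        name = fields[0]
        # Assumption: a missing PARTITION entry means the node is unassigned.
        partition = fields[1] if len(fields) > 1 else None
        groups[partition].append(name)
    return dict(groups)

print(nodes_by_partition(SAMPLE))
# {'part0': ['node0', 'node1'], 'part1': ['node2', 'node3']}
```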

Cluster

Use the -C option to display information about the entire cluster. For example,

% mpinfo -C
NAME   ADMINISTRATOR    DEF_INTER_PART
node0  wmitty           part0

where: