The CRE user interface includes two informational commands: mpinfo, which reports a Sun HPC cluster's configuration, and mpps, which reports on jobs running on the cluster.
The mpps command is comparable to the Solaris ps command. It returns information about jobs and processes currently running on the Sun HPC cluster.
By default mpps shows basic information about the user's jobs currently running in the default partition. For example,
    % mpps
    JID   NPROC  UID  STATE  AOUT
    41    3      slu  RUN    AAA
    46    4      slu  EXNG   tmp
    49    1      slu  EXIT   tmp
    99    9      slu  EXNG   uname
    100   9      slu  EXNG   uname
In the response,
JID is the executing program's job ID.
NPROC is the number of processes in the job.
UID is the user ID of the person who executed the program.
STATE is the execution status of the job's processes. (See Table 4-1 for a list of possible states.)
AOUT is the name of the executable program.
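If the default listing needs to be processed by a script, it can be split into one record per job. The following Python sketch is an illustration only and is not part of the CRE. It assumes the output is whitespace-separated with a single heading row, as in the example above, and that no field contains embedded spaces. The name parse_mpps is hypothetical.

```python
# Sample text copied from the mpps example above.
SAMPLE = """\
JID NPROC UID STATE AOUT
41 3 slu RUN AAA
46 4 slu EXNG tmp
49 1 slu EXIT tmp
99 9 slu EXNG uname
100 9 slu EXNG uname
"""


def parse_mpps(text):
    """Parse default mpps output into a list of dicts keyed by column heading."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    if not lines:
        return []
    header = lines[0].split()
    jobs = []
    for ln in lines[1:]:
        fields = ln.split()
        if len(fields) != len(header):
            continue  # skip lines that do not match the heading layout
        row = dict(zip(header, fields))
        # JID and NPROC are numeric in the example, so convert them.
        row["JID"] = int(row["JID"])
        row["NPROC"] = int(row["NPROC"])
        jobs.append(row)
    return jobs


if __name__ == "__main__":
    for job in parse_mpps(SAMPLE):
        print(job["JID"], job["STATE"], job["AOUT"])
```

In practice the text would come from running mpps itself; the sample string stands in for that output here.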
Table 4-1 lists the states reported by mpps. Some states refer only to jobs, some only to processes, and some to both. (See "Displaying Process Information".)
Table 4-1 Job and Process States
State | mpps Display | Meaning |
---|---|---|
CORE | CORE | The job or process exited due to a signal and core was dumped. |
COREING | CRNG | The job is exiting due to a signal. The first process to die dumped core. |
EXIT | EXIT | The job or process exited normally. |
EXITING | EXNG | The job is exiting. At least one process exited normally. |
FAIL | FAIL | The job or process failed on startup or was aborted. |
FAILING | FLNG | Initialization of the job failed, or a job abort has been signaled. |
ORPHAN | ORPHAN | The process has been "orphaned," that is, the node on which it exists has gone offline. |
RUNNING | RUN | The job or process is running. |
SEXIT | SEXIT | The job or process exited due to a signal. |
SEXITING | SEXNG | The job is exiting due to a signal. The first process to die was killed by a signal. At least one of its processes is still in the RUN state. |
SPAWNING | SPAWN | The job or process is being spawned. |
STOP | STOP | The job or process is stopped. |
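A script that reads mpps output may want to translate the abbreviated display values back to the full state names in Table 4-1. The following Python sketch shows one way to do so. The names MPPS_STATE_NAMES, full_state, FINISHED, and is_finished are hypothetical, and the grouping of CORE, EXIT, FAIL, and SEXIT as "finished" states is this sketch's reading of the Meaning column rather than a CRE definition.

```python
# Abbreviations shown by mpps, mapped to the full state names in Table 4-1.
MPPS_STATE_NAMES = {
    "CORE": "CORE",
    "CRNG": "COREING",
    "EXIT": "EXIT",
    "EXNG": "EXITING",
    "FAIL": "FAIL",
    "FLNG": "FAILING",
    "ORPHAN": "ORPHAN",
    "RUN": "RUNNING",
    "SEXIT": "SEXIT",
    "SEXNG": "SEXITING",
    "SPAWN": "SPAWNING",
    "STOP": "STOP",
}

# Assumption: states whose description says the job or process has already
# exited or failed, as opposed to being in the middle of exiting.
FINISHED = {"CORE", "EXIT", "FAIL", "SEXIT"}


def full_state(display):
    """Return the full state name for an mpps display value.

    Unknown values are returned unchanged.
    """
    return MPPS_STATE_NAMES.get(display, display)


def is_finished(display):
    """Return True if the display value maps to a state in FINISHED."""
    return full_state(display) in FINISHED


if __name__ == "__main__":
    for value in ("RUN", "EXNG", "EXIT"):
        print(value, full_state(value), is_finished(value))
```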
Use the -f option to also display the start time for each job and the job's arguments.
Use the -e option to display information on all jobs, not just your jobs.
To show information about jobs running in all partitions, use the -A option.
To show information about jobs running in a specific partition, use the -a option, followed by the name of the partition.
Use the -p option to also view information about the processes that make up the jobs. The process information is listed below each job. For example,
    % mpps -p
    JID   NPROC  UID   STATE  AOUT
        RANK  PID    STATE  NODE
    2320  4      shaw  RUN    sleep
        0     10190  RUN    node6
        1     4744   RUN    node7
        2     16564  RUN    node4
        3     9412   RUN    node5
In this example,
RANK is the process's rank within the job.
PID is the process's process ID.
STATE is the process's execution status.
NODE is the node on which the process is running.
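The job and process lines of mpps -p output can be regrouped so that each job carries its own list of processes. The following Python sketch is an illustration under stated assumptions, not a CRE facility: it assumes job lines have five whitespace-separated fields, process lines have four, and heading lines begin with JID or RANK. The name parse_mpps_p and the sample layout are hypothetical.

```python
# Sample text based on the mpps -p example above; the exact line layout
# is an assumption.
SAMPLE_P = """\
JID NPROC UID STATE AOUT
    RANK PID STATE NODE
2320 4 shaw RUN sleep
    0 10190 RUN node6
    1 4744 RUN node7
    2 16564 RUN node4
    3 9412 RUN node5
"""

JOB_COLS = ["JID", "NPROC", "UID", "STATE", "AOUT"]
PROC_COLS = ["RANK", "PID", "STATE", "NODE"]


def parse_mpps_p(text):
    """Group process lines under the job line that precedes them."""
    jobs = []
    for ln in text.splitlines():
        fields = ln.split()
        if not fields or fields[0] in ("JID", "RANK"):
            continue  # blank line or heading line
        if len(fields) == len(JOB_COLS):
            job = dict(zip(JOB_COLS, fields))
            job["procs"] = []
            jobs.append(job)
        elif len(fields) == len(PROC_COLS) and jobs:
            # A four-field line belongs to the most recent job.
            jobs[-1]["procs"].append(dict(zip(PROC_COLS, fields)))
    return jobs


if __name__ == "__main__":
    for job in parse_mpps_p(SAMPLE_P):
        nodes = [p["NODE"] for p in job["procs"]]
        print(job["JID"], job["AOUT"], nodes)
```

Distinguishing lines by field count is fragile; it is used here only because the example output offers no other marker.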
You can also use the -P option to display one or more specific process values and the -J option to display one or more job values. Separate multiple values either with spaces or with commas and no spaces.
Arguments to -P are
rank - the rank of the process within the job.
pid - the process's process ID.
state - the current execution state of the process.
iod - the process ID of the I/O daemon for this process.
load - the load on the node on which the process is executing.
node - the name of the node on which the process is executing.
You can list these via the -lp option.
Arguments to -J are
part - the name of the partition in which the job will run.
jid - the job's unique ID, which can be used as an argument to mpkill.
nproc - the number of processes requested (the actual number of processes started may differ if the -W or -S flags were used with mprun).
uid - the user on whose behalf the job will be run (normally the user who submitted the job; see the -U flag to mprun for details).
gid - the group on whose behalf the job will be run (normally the group of the user who submitted the job; see the -G flag to mprun for details).
state - the state of the job. Job states include:
BUILD - The job is being submitted.
WAIT - The job is waiting to run.
SPAWN - The job is preparing to run.
RUN - The job is running.
RSTRT - The job has been killed because one of the nodes on which it was running went down; the job will be restarted.
running - the number of processes actually running for this job. This is not always equal to the number of processes started for this job, since processes that have exited are not counted.
wkdir - the directory in which the job's processes will be (or were) started.
aout - the name of the program to be run.
paout - the full path of the program to be run.
ctime - the job creation time (when mprun was invoked for the job).
args - the command-line arguments for the program to be run.
stime - the time the job was started.
prio - the job priority (higher numbers run first).
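When mpps is invoked from a script, the -J and -P values can be checked against the lists above before the command line is built. The following Python sketch joins the values with commas and no spaces, as described earlier. The name mpps_argv is hypothetical, and whether -J and -P may be combined on one command line is an assumption of this sketch, not something stated here.

```python
# Values accepted by -P and -J, taken from the lists above.
PROC_VALUES = {"rank", "pid", "state", "iod", "load", "node"}
JOB_VALUES = {
    "part", "jid", "nproc", "uid", "gid", "state", "running",
    "wkdir", "aout", "paout", "ctime", "args", "stime", "prio",
}


def mpps_argv(job_values=(), proc_values=()):
    """Build an mpps argument list.

    Raises ValueError if a value is not in the documented list for its flag.
    """
    argv = ["mpps"]
    for flag, values, known in (
        ("-J", job_values, JOB_VALUES),
        ("-P", proc_values, PROC_VALUES),
    ):
        values = list(values)
        unknown = [v for v in values if v not in known]
        if unknown:
            raise ValueError(f"unknown {flag} value(s): {', '.join(unknown)}")
        if values:
            # Multiple values are separated by commas with no spaces.
            argv += [flag, ",".join(values)]
    return argv


if __name__ == "__main__":
    print(" ".join(mpps_argv(job_values=["jid", "state", "aout"])))
```

The resulting list can be passed to a process-launching call such as subprocess.run on a system where mpps is installed.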
Use the mpinfo command to display information about the configuration of partitions and nodes, and status information about nodes.
You can display information on all partitions or nodes, or on any subset of them. You can either list the partitions or nodes, or you can use the -R option, along with a resource requirement specifier (RRS), to have the CRE determine which objects should be displayed. See "Expressing More Complex Resource Requirements" for information on RRSs. If you specify a partition, you must include only partition attributes in the RRS; if you specify a node, you must use only node attributes.
Use the -A option to specify an attribute whose value you want to display. If you want to display more than one attribute, separate them by commas with no spaces. Alternatively, you can issue multiple -A options on the same command line. If you omit -A, mpinfo displays values for a default set of attributes.
Use the -v option to display information about all attributes for one or more partitions or nodes. These include attributes defined by the system administrator.
When a Boolean attribute is displayed, yes indicates that the attribute is set, and no indicates that the attribute is not set.
Use the -P option to display information for all partitions.
Use the -p option, followed by the name of the partition, to display information about an individual partition. To display information about multiple partitions, list the names, either separating them with commas and no spaces or enclosing the list in quotation marks.
Partition attributes whose settings you can view via mpinfo are shown in Table 4-2; the heading displayed for each attribute is shown in parentheses after its description.
Note these points:
You can specify one or more of these attributes via the -A option, or as part of an RRS as an argument to the -R option. You can use either the attribute's real name or, in some cases, a shorter version.
For attributes that are defined as negatives (for example, no_logins), you can specify a positive version (for example, logins) for -A.
You can list the settings of all attributes (including any system administrator-defined attributes) on a per-partition basis via the -v option.
You can list the names and brief descriptions of these attributes via the -lp option.
Attribute (mpadmin form) | Description (mpinfo output heading) |
---|---|
enabled | Set if the partition is enabled, that is, if it is ready to accept jobs (ENA). |
maxt | Maximum number of simultaneously running processes allowed on each node of the partition (MAXT). |
name | Name of the partition (NAME). |
login | Allow logins (LOG). When no_logins is unset, LOG is set. Note that this is the inverse of the mpadmin meaning. |
mp | Allow multinode jobs (MP). When no_mp_jobs is unset, MP is set. Note that this is the inverse of the mpadmin meaning. |
nodes | Number of nodes in the partition (NODES). |
The following example illustrates the default mpinfo output for partitions:
    % mpinfo -P
    NAME     NODES: Tot(cpu)  Enb(cpu)  Onl(cpu)  ENA  LOG  MP
    part10          1( 4)     1( 4)     1( 4)     no   yes  yes
    part11          1( 4)     1( 4)     1( 4)     yes  yes  yes
The following example displays the names, numbers of nodes, and enabled status for all partitions:
    % mpinfo -A name,enabled,nodes -P
    NAME     ENA  NODES: Tot(cpu)  Enb(cpu)  Onl(cpu)
    part10   no          1( 4)     1( 4)     1( 4)
    part11   yes         1( 4)     1( 4)     1( 4)
Use the -N option to display information about all nodes.
Use the -n option, followed by the name(s) of one or more nodes, to display information about specific nodes. When listing multiple node names, separate the names with commas and no spaces.
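The -P, -p, -N, -n, and -A options described so far can be assembled into a command line by a script. The following Python sketch is an illustration only; the name mpinfo_argv and its parameters are hypothetical. It chooses the "all" form (-P or -N) when no names are given and the named form (-p or -n) otherwise, and it joins multiple names or attributes with commas and no spaces.

```python
def mpinfo_argv(target, names=(), attributes=()):
    """Build an mpinfo argument list.

    target     -- "partitions" or "nodes"
    names      -- specific partition or node names; empty means all
    attributes -- attribute names to pass to -A; empty means the default set
    """
    flags = {"partitions": ("-P", "-p"), "nodes": ("-N", "-n")}
    if target not in flags:
        raise ValueError(f"target must be one of {sorted(flags)}")
    all_flag, named_flag = flags[target]

    argv = ["mpinfo"]
    names = list(names)
    if names:
        argv += [named_flag, ",".join(names)]
    else:
        argv.append(all_flag)

    attributes = list(attributes)
    if attributes:
        argv += ["-A", ",".join(attributes)]
    return argv


if __name__ == "__main__":
    print(" ".join(mpinfo_argv("partitions", attributes=["name", "enabled", "nodes"])))
    print(" ".join(mpinfo_argv("nodes", names=["node0", "node1"])))
```

The sketch does not validate attribute names, because system administrators can define additional attributes.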
The following table shows the node attributes that you can display via mpinfo. The heading that is displayed for each attribute is shown in parentheses at the end of each description.
Note these points:
You can specify one or more of these attributes via the -A option, or as part of an RRS as an argument to the -R option. You can use either the attribute's real name or, in some cases, a shorter version.
You can list the settings of all attributes (including any system administrator-defined attributes) on a per-node basis via the -v option.
You can list the names and brief descriptions of these attributes via the -ln option.
Attribute | Short Form | Description (mpinfo output heading) |
---|---|---|
cpu_idle | idle | Percent of time CPU is idle (IDLE). |
cpu_iowait | iowait | Percent of time CPU spends waiting for I/O (IWAIT). |
cpu_kernel | kernel | Percent of time CPU spends in kernel (KERNL). |
cpu_swap | swap | Percent of time CPU spends waiting for swap (SWAP). |
cpu_type | cpu | CPU architecture (CPU). |
cpu_user | user | Percent of time CPU spends running user's program (USER). |
domain | | DNS domain. |
enabled | | If set, node is available for spawning jobs on it. |
load1 | | Load average for the past minute (LOAD1). |
load5 | | Load average for the past five minutes (LOAD5). |
load15 | | Load average for the past 15 minutes (LOAD15). |
manufacturer | manuf | Hardware manufacturer (MANUFACTURER). |
mem_free | memf | Node's available RAM (in Mbytes) (FMEM). |
mem_total | memr | Node's total physical memory (in Mbytes) (MEM). |
name | | Name of the node (NAME). |
ncpus | ncpu | Number of CPU modules in the node (NCPU). |
os_arch_kernel | mach | Node's kernel architecture (MACH). |
os_max_proc | maxproc | Maximum number of processes allowed on the node (note that this is all processes, including cluster daemons) (MPROC). |
os_name | os | Name of the operating system running on the node (OS). |
os_release | osrel | Operating system's release number (OSREL). |
os_release_maj | osmaj | The major number of the operating system release number (MAJ). |
os_release_min | osmin | The minor number of the operating system release number (MIN). |
os_version | osver | Operating system's version (OSVER). |
partition | | The partition of which the node is a member (PARTITION). |
serial_number | serno | Hardware serial number (SERIAL). |
swap_free | swapf | Node's available swap space (in Mbytes) (FSWP). |
swap_total | swapr | Node's total swap space (in Mbytes) (SWAP). |
The following is an example of the mpinfo output for nodes:
    % mpinfo -N
    NAME   UP  PARTITION  OS     OSREL  NCPU  FMEM   FSWP    LOAD1  LOAD5  LOAD15
    node0  y   p0         SunOS  5.6    1     0.89   158.34  0.09   0.11   0.13
    node1  y   p0         SunOS  5.6    1     31.41  276.12  0.00   0.01   0.01
    node2  y   p1         SunOS  5.6    1     25.59  279.77  0.00   0.00   0.01
    node3  y   p1         SunOS  5.6    1     25.40  279.88  0.00   0.00   0.01
The following example shows only the names of nodes and the partition they're in:
    % mpinfo -N -A name,partition
    NAME   PARTITION
    node0  part0
    node1  part0
    node2  part1
    node3  part1
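A two-column listing of this kind is convenient for building a map of which nodes belong to which partition. The following Python sketch is an illustration only. It assumes a single heading row containing NAME and PARTITION and whitespace-separated fields with no embedded spaces. The name nodes_by_partition is hypothetical.

```python
from collections import defaultdict

# Sample text copied from the mpinfo -N -A name,partition example above.
SAMPLE_NODES = """\
NAME PARTITION
node0 part0
node1 part0
node2 part1
node3 part1
"""


def nodes_by_partition(text):
    """Return a dict mapping each partition name to its list of node names."""
    rows = [ln.split() for ln in text.splitlines() if ln.strip()]
    if not rows:
        return {}
    header, body = rows[0], rows[1:]
    # Look the columns up by heading so that column order does not matter.
    name_i = header.index("NAME")
    part_i = header.index("PARTITION")
    groups = defaultdict(list)
    for row in body:
        groups[row[part_i]].append(row[name_i])
    return dict(groups)


if __name__ == "__main__":
    for partition, nodes in nodes_by_partition(SAMPLE_NODES).items():
        print(partition, ",".join(nodes))
```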
Use the -C option to display information about the entire cluster. For example,
    % mpinfo -C
    NAME   ADMINISTRATOR  DEF_INTER_PART
    node0  wmitty         part0
where:
NAME - The name of the cluster
ADMINISTRATOR - The name of its administrator
DEF_INTER_PART - The default interactive partition