The mpps command is comparable to the Solaris ps command. It returns information about jobs and processes currently running on the Sun HPC cluster.
By default, mpps shows basic information about the user's jobs currently running in the default partition. For example,
    % mpps
    JID  NPROC  UID  STATE  AOUT
     41      3  slu  RUN    AAA
     46      4  slu  EXNG   tmp
     49      1  slu  EXIT   tmp
     99      9  slu  EXNG   uname
    100      9  slu  EXNG   uname
In the response,
JID is the executing program's job ID.
NPROC is the number of processes in the job.
UID is the user ID of the person who executed the program.
STATE is the execution status of the job's processes. (Table 4-1 lists the possible states.)
AOUT is the name of the executable program.
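As a rough illustration of how these columns might be consumed by a script, the following Python sketch parses the sample output shown above into records. It assumes the output is whitespace-separated with a single header row and that program names contain no spaces; neither assumption is guaranteed by mpps. The function name parse_jobs is hypothetical.

```python
# Sample text copied from the mpps example above. A real script would
# capture the output of the mpps command instead.
SAMPLE = """\
JID NPROC UID STATE AOUT
41 3 slu RUN AAA
46 4 slu EXNG tmp
49 1 slu EXIT tmp
99 9 slu EXNG uname
100 9 slu EXNG uname
"""


def parse_jobs(text):
    """Turn default mpps output into a list of dicts keyed by column name.

    Assumption: columns are separated by whitespace and the first
    non-blank line is the header row.
    """
    lines = [ln.split() for ln in text.splitlines() if ln.strip()]
    header, rows = lines[0], lines[1:]
    jobs = []
    for row in rows:
        job = dict(zip(header, row))
        job["JID"] = int(job["JID"])
        job["NPROC"] = int(job["NPROC"])
        jobs.append(job)
    return jobs


jobs = parse_jobs(SAMPLE)
# Job IDs whose STATE column reads RUN.
running = [j["JID"] for j in jobs if j["STATE"] == "RUN"]
print(running)  # prints [41]
```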
Table 4-1 lists the states reported by mpps. Some states refer only to jobs, some only to processes, and some to both. (See "Displaying Process Information".)
Table 4-1 Job and Process States
State | mpps Display | Meaning |
---|---|---|
CORE | CORE | The job or process exited due to a signal and core was dumped. |
COREING | CRNG | The job is exiting due to a signal. The first process to die dumped core. |
EXIT | EXIT | The job or process exited normally. |
EXITING | EXNG | The job is exiting. At least one process exited normally. |
FAIL | FAIL | The job or process failed on startup or was aborted. |
FAILING | FLNG | Initialization of the job failed, or a job abort has been signaled. |
ORPHAN | ORPHAN | The process has been "orphaned," that is, the node on which it exists has gone offline. |
RUNNING | RUN | The job or process is running. |
SEXIT | SEXIT | The job or process exited due to a signal. |
SEXITING | SEXNG | The job is exiting due to a signal. The first process to die was killed by a signal. At least one of its processes is still in the RUN state. |
SPAWNING | SPAWN | The job or process is being spawned. |
STOP | STOP | The job or process is stopped. |
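A script that reads mpps output may want to translate the abbreviated display values back to the full state names. The Python sketch below transcribes Table 4-1 into a lookup table. The FINISHED grouping and the helper names expand and is_finished are choices made for this illustration, not categories defined by mpps.

```python
# Display abbreviation -> full state name, transcribed from Table 4-1.
DISPLAY_TO_STATE = {
    "CORE": "CORE",
    "CRNG": "COREING",
    "EXIT": "EXIT",
    "EXNG": "EXITING",
    "FAIL": "FAIL",
    "FLNG": "FAILING",
    "ORPHAN": "ORPHAN",
    "RUN": "RUNNING",
    "SEXIT": "SEXIT",
    "SEXNG": "SEXITING",
    "SPAWN": "SPAWNING",
    "STOP": "STOP",
}

# Assumption for this sketch: these are the states the table describes
# as "exited" or "failed" rather than "exiting" or "running".
FINISHED = {"CORE", "EXIT", "FAIL", "SEXIT"}


def expand(display):
    """Return the full state name, or the input unchanged if unknown."""
    return DISPLAY_TO_STATE.get(display, display)


def is_finished(display):
    """True if the display value denotes a completed job or process."""
    return display in FINISHED


print(expand("EXNG"), is_finished("EXNG"))  # prints EXITING False
```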
Use the -f option to also display each job's start time and arguments.
Use the -e option to display information on all jobs, not just your jobs.
To show information about jobs running in all partitions, use the -A option.
To show information about jobs running in a specific partition, use the -a option, followed by the name of the partition.
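The options above can be assembled programmatically, as in the following Python sketch. It uses only the flags described in this section (-f, -e, -A, and -a). Two things are assumptions rather than documented behavior: that these flags may be combined on one command line, and that -A and -a are mutually exclusive. The function name mpps_args and the partition name part0 are hypothetical.

```python
def mpps_args(full=False, everyone=False, all_partitions=False, partition=None):
    """Build an argument list for mpps from the options described above.

    Assumptions: the flags can be combined, and -A (all partitions)
    should not be given together with -a (one named partition).
    """
    if all_partitions and partition is not None:
        raise ValueError("choose either all partitions (-A) or one partition (-a)")
    args = ["mpps"]
    if full:
        args.append("-f")        # also show start time and arguments
    if everyone:
        args.append("-e")        # all users' jobs, not just your own
    if all_partitions:
        args.append("-A")        # jobs in every partition
    if partition is not None:
        args += ["-a", partition]  # jobs in the named partition
    return args


# "part0" is a made-up partition name used only for illustration.
print(mpps_args(everyone=True, partition="part0"))
# prints ['mpps', '-e', '-a', 'part0']
```

The resulting list could be passed to a process launcher such as Python's subprocess.run on a host where mpps is installed.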
Use the -p option to also view information about the processes that make up the jobs. The process information is listed below each job. For example,
    % mpps -p
    JID   NPROC  UID    STATE  AOUT
          RANK   PID    STATE  NODE
    2320  4      shaw   RUN    sleep
          0      10190  RUN    node6
          1      4744   RUN    node7
          2      16564  RUN    node4
          3      9412   RUN    node5
In this example,
RANK is the process's rank within the job.
PID is the process's process ID.
STATE is the process's execution status.
NODE is the node on which the process is running.
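To show how the per-process rows relate to their job, the Python sketch below groups the sample mpps -p output into one record per job with a nested list of processes. It assumes that job rows have five whitespace-separated fields and process rows have four, and that process rows follow the job they belong to. The exact column layout of real mpps output may differ, and the function name parse_jobs_with_procs is hypothetical.

```python
# Sample text based on the mpps -p example above, one row per line.
SAMPLE = """\
JID NPROC UID STATE AOUT
RANK PID STATE NODE
2320 4 shaw RUN sleep
0 10190 RUN node6
1 4744 RUN node7
2 16564 RUN node4
3 9412 RUN node5
"""

JOB_COLS = ["JID", "NPROC", "UID", "STATE", "AOUT"]
PROC_COLS = ["RANK", "PID", "STATE", "NODE"]


def parse_jobs_with_procs(text):
    """Group process rows under the job row that precedes them.

    Assumption: a row with five fields is a job and a row with four
    fields is a process belonging to the most recent job.
    """
    jobs = []
    for line in text.splitlines():
        fields = line.split()
        if not fields or fields in (JOB_COLS, PROC_COLS):
            continue  # skip blank lines and the two header rows
        if len(fields) == len(JOB_COLS):
            job = dict(zip(JOB_COLS, fields))
            job["procs"] = []
            jobs.append(job)
        elif len(fields) == len(PROC_COLS) and jobs:
            jobs[-1]["procs"].append(dict(zip(PROC_COLS, fields)))
    return jobs


jobs = parse_jobs_with_procs(SAMPLE)
nodes = [p["NODE"] for p in jobs[0]["procs"]]
print(nodes)  # prints ['node6', 'node7', 'node4', 'node5']
```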
You can also use the -P option to display one or more specific process values and the -J option to display one or more job values. Separate multiple values either with spaces or with commas and no spaces.
Arguments to -P are
rank - the rank of the process within the job.
pid - the process's process ID.
state - the current execution state of the process.
iod - the process ID of the I/O daemon for this process.
load - the load on the node on which the process is executing.
node - the name of the node on which the process is executing.
You can list these via the -lp option.
Arguments to -J are
part - the name of the partition in which the job will run.
jid - the job's unique ID, which can be used as an argument to mpkill.
nproc - the number of processes requested (the actual number of processes started may differ if the -W or -S flags were used with mprun).
uid - the user on whose behalf the job will be run (normally the user who submitted the job; see the -U flag to mprun for details).
gid - the group on whose behalf the job will be run (normally the group of the user who submitted the job; see the -G flag to mprun for details).
state - the job's current state. The states listed here are:
BUILD - The job is being submitted.
WAIT - The job is waiting to run.
SPAWN - The job is preparing to run.
RUN - The job is running.
RSTRT - The job has been killed because one of the nodes on which it was running went down; the job will be restarted.
running - the number of processes actually running for this job. This is not always equal to the number of processes started for this job, since processes that have exited are not counted.
wkdir - the directory in which the job's processes will be (or were) started.
aout - the name of the program to be run.
paout - the full path of the program to be run.
ctime - the job creation time (when mprun was invoked for the job).
args - the command-line arguments for the program to be run.
stime - the time the job was started.
prio - the job priority (higher numbers run first).
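The value names accepted by -P and -J can be checked before a command line is built. The Python sketch below transcribes the two lists above and joins the requested values with commas and no spaces, one of the two separator forms described earlier. That -J and -P can appear together in one invocation is an assumption, and the function name field_option is hypothetical.

```python
# Value names transcribed from the -P and -J lists above.
PROCESS_FIELDS = ("rank", "pid", "state", "iod", "load", "node")
JOB_FIELDS = (
    "part", "jid", "nproc", "uid", "gid", "state", "running",
    "wkdir", "aout", "paout", "ctime", "args", "stime", "prio",
)


def field_option(flag, names):
    """Return [flag, "a,b,c"] after checking each name against the lists."""
    allowed = {"-P": PROCESS_FIELDS, "-J": JOB_FIELDS}[flag]
    unknown = [n for n in names if n not in allowed]
    if unknown:
        raise ValueError(f"unknown {flag} value(s): {', '.join(unknown)}")
    # Comma-separated with no spaces, as the text above allows.
    return [flag, ",".join(names)]


# Assumption: -J and -P may be given in the same invocation.
cmd = (
    ["mpps"]
    + field_option("-J", ["jid", "state", "aout"])
    + field_option("-P", ["rank", "node"])
)
print(" ".join(cmd))  # prints mpps -J jid,state,aout -P rank,node
```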