The second type of defunct job consists of jobs that are waiting for signals from processes on nodes that have gone offline. The mpps utility displays such jobs in states such as RUNNING, EXITING, SEXTNG, or CORNG.
If the job-killing option of tm.watchd (-Yk) is enabled, the CRE will handle such situations automatically. This section assumes this option is not enabled.
Kill the job using:
% mpkill jid
There are several variants of the mpkill command, similar to the variants of the Solaris kill command. You may also use:
% mpkill -9 jid
or
% mpkill -I jid
If these do not succeed, execute mpps -pe to display the unresponsive processes. Then execute the Solaris ps command on each of the nodes listed. If those processes still exist on any of the nodes, you can remove them using kill -9 pid.
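The per-node check described above can be tedious when several nodes are involved. The following is a minimal sketch, not part of the CRE tools: the function name print_cleanup_commands and its input format (one "node pid" pair per line, transcribed by hand from the mpps -pe output) are hypothetical, and the use of ssh to reach each node is an assumption (substitute rsh or whatever remote shell your site uses). The function only prints the commands; it executes nothing.

```shell
#!/bin/sh
# Hypothetical helper: reads "node pid" pairs (one per line) on stdin,
# transcribed by hand from the mpps -pe output, and prints the commands
# an administrator could run to inspect and then remove each process.
# Nothing is executed here. ssh is an assumption; substitute rsh if needed.
print_cleanup_commands() {
    while read -r node pid; do
        # Skip blank lines and comment lines.
        case "$node" in
            ''|'#'*) continue ;;
        esac
        # Skip entries whose pid is missing or not purely numeric.
        case "$pid" in
            ''|*[!0-9]*)
                echo "skipping malformed entry: $node $pid" >&2
                continue
                ;;
        esac
        # First confirm the process still exists, then remove it.
        echo "ssh $node ps -p $pid"
        echo "ssh $node kill -9 $pid"
    done
}
```

For example, `printf 'node1 4021\n' | print_cleanup_commands` prints one ps command and one kill -9 command for node1 (the node name and pid here are made up). Review the printed commands and run the kill -9 line only for processes that ps shows still exist.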
Even after you have eliminated all defunct jobs, data about them may remain in the CRE database. As root on the master node, use mpkill -C to remove this residual data.