When a parallel job finishes or is aborted (for example, by qdel), the grid engine system calls a procedure to halt the parallel environment. The definition and semantics of this stop procedure are similar to those described for the startup procedure. Like the startup procedure, the stop procedure is defined in the parallel environment configuration. See, for example, Configuring Parallel Environments With QMON.
If the stop procedure fails to clean up parallel environment processes, the grid engine system might have no information about processes that are still running under parallel environment control. In that case the system cannot clean up these processes itself. The grid engine software does, however, clean up the processes directly associated with the job script that the system launched.
The distribution tree of the grid engine system also contains an example of a stop procedure for the PVM parallel environment. This example resides under sge-root/pvm/stoppvm.sh. It takes the following two arguments:
The path to the host file generated by the grid engine system
The name of the host on which the stop procedure is started
Like the startup procedure, the stop procedure is expected to return a zero exit status on success and a nonzero exit status on failure.
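The calling convention described above (two arguments, zero exit status on success, nonzero on failure) can be sketched as follows. This is a minimal illustration only, not the stoppvm.sh script from the distribution. The function names stop_pe and cleanup_host are hypothetical, and the sketch assumes that the first whitespace-separated field of each host file line is a host name.

```shell
#!/bin/sh
# Hypothetical stop procedure sketch; NOT the stoppvm.sh shipped with the distribution.
# Arguments, as described in the text:
#   $1  path to the host file generated by the grid engine system
#   $2  name of the host on which the stop procedure is started

# Placeholder cleanup (hypothetical); replace the body with the command
# that actually halts your parallel environment on the given host.
cleanup_host() {
    echo "would clean up parallel environment processes on $1"
}

stop_pe() {
    hostfile=$1
    host=$2

    # Reject missing arguments and an unreadable host file with a nonzero status.
    if [ -z "$hostfile" ] || [ -z "$host" ]; then
        echo "usage: stop_pe <hostfile> <host>" >&2
        return 1
    fi
    if [ ! -r "$hostfile" ]; then
        echo "stop_pe: cannot read host file: $hostfile" >&2
        return 1
    fi

    rc=0
    # Assumption: the first field of each line is a host name.
    # The "|| [ -n ... ]" part also handles a last line without a trailing newline.
    while read -r node _rest || [ -n "$node" ]; do
        [ -n "$node" ] || continue
        if ! cleanup_host "$node"; then
            echo "stop_pe: cleanup failed on $node" >&2
            rc=1
        fi
    done < "$hostfile"

    # Zero only if every host was cleaned up.
    return "$rc"
}
```

To use the sketch as a stop procedure script, you would end the file with a call such as stop_pe "$1" "$2" followed by exit $?, so that the exit status of the script reflects the result of the cleanup.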
Test any stop procedure from the command line first, without using the grid engine system. Doing so lets you find and fix errors that would be hard to trace once the procedure is integrated into the grid engine system framework.
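A command-line check of this kind might look like the following sketch. The variable STOP_PROC is hypothetical and defaults to the true command as a stand-in, so point it at your own stop procedure. The hand-built host file assumes one host per line with the host name in the first field.

```shell
#!/bin/sh
# Hypothetical command-line check of a stop procedure, run outside the grid engine system.
# STOP_PROC is a stand-in (defaults to "true"); set it to the path of your own script.
STOP_PROC=${STOP_PROC:-true}

# Build a small host file by hand.
# Assumption: one host per line, host name in the first field.
hostfile=$(mktemp)
printf '%s 1\n' "$(uname -n)" > "$hostfile"

# Call the procedure with the two arguments described in the text:
# the host file path and the name of the host where it is started.
"$STOP_PROC" "$hostfile" "$(uname -n)"
rc=$?

# Report the exit status: zero means success, nonzero means failure.
if [ "$rc" -eq 0 ]; then
    echo "stop procedure succeeded"
else
    echo "stop procedure failed with exit status $rc" >&2
fi

rm -f "$hostfile"
```

Running the check a second time with a missing or empty host file is a quick way to confirm that the procedure returns a nonzero exit status on failure.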