The following table lists the consequences of different job-related error codes or exit codes. These codes are valid for every type of job.
Table 7–1 Job-Related Error or Exit Codes
Script/Method | Exit or Error Code | Consequence |
---|---|---|
Job script | 0 | Success |
Job script | 99 | Requeue |
Job script | Rest | Success: exit code in accounting file |
prolog/epilog | 0 | Success |
prolog/epilog | 99 | Requeue |
prolog/epilog | Rest | Queue error state, job requeued |
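As an illustration of the job script rows in Table 7–1, the following minimal Python sketch picks an exit code for a job script. The function name `choose_exit_code` and its two flags are hypothetical and are not part of Grid Engine. Only the values 0 and 99 and the treatment of all other values come from the table.

```python
# Exit codes with a defined meaning for a job script, per Table 7-1.
EXIT_SUCCESS = 0
EXIT_REQUEUE = 99


def choose_exit_code(work_done: bool, retry_later: bool) -> int:
    """Return the exit code a job script could finish with.

    work_done   -- hypothetical flag: the job's real work completed
    retry_later -- hypothetical flag: a transient problem makes a rerun worthwhile
    """
    if work_done:
        return EXIT_SUCCESS
    if retry_later:
        # 99 asks for the job to be requeued.
        return EXIT_REQUEUE
    # Any other value is treated as success and is recorded as the exit
    # code in the accounting file; 1 is an arbitrary choice for this sketch.
    return 1
```

A job script written in Python could end with `sys.exit(choose_exit_code(work_done, retry_later))` so that the chosen value becomes the exit status of the script.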
The following table lists the consequences of error codes or exit codes of jobs related to parallel environment (PE) configuration.
Table 7–2 Parallel-Environment-Related Error or Exit Codes
Script/Method | Exit or Error Code | Consequence |
---|---|---|
pe_start | 0 | Success |
pe_start | Rest | Queue set to error state, job requeued |
pe_stop | 0 | Success |
pe_stop | Rest | Queue set to error state, job not requeued |
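The rows for prolog, epilog, pe_start, and pe_stop in Tables 7–1 and 7–2 can be condensed into a small lookup. The sketch below is only an illustration of the tables: the names `Outcome` and `hook_outcome` are hypothetical and do not correspond to any Grid Engine interface.

```python
from typing import NamedTuple


class Outcome(NamedTuple):
    queue_error: bool   # queue is set to the error state
    job_requeued: bool  # job is put back into the pending list


def hook_outcome(method: str, code: int) -> Outcome:
    """Map an exit code of a prolog, epilog, pe_start, or pe_stop
    script to the consequence listed in Tables 7-1 and 7-2."""
    if method not in ("prolog", "epilog", "pe_start", "pe_stop"):
        raise ValueError(f"unknown method: {method}")
    if code == 0:
        # Success: nothing happens to the queue or the job.
        return Outcome(queue_error=False, job_requeued=False)
    if method in ("prolog", "epilog") and code == 99:
        # Requeue without a queue error state.
        return Outcome(queue_error=False, job_requeued=True)
    # "Rest": the queue goes to the error state; only a failing
    # pe_stop leaves the job without a requeue.
    return Outcome(queue_error=True, job_requeued=(method != "pe_stop"))
```

For example, `hook_outcome("pe_stop", 1)` reports a queue error state with no requeue, while `hook_outcome("pe_start", 1)` reports a queue error state with a requeue.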
The following table lists the consequences of error codes or exit codes of jobs related to queue configuration. These codes apply only if the corresponding queue methods were overridden.
Table 7–3 Queue-Related Error or Exit Codes
Script/Method | Exit or Error Code | Consequence |
---|---|---|
Job starter | 0 | Success |
Job starter | Rest | Success, no other special meaning |
Suspend | 0 | Success |
Suspend | Rest | Success, no other special meaning |
Resume | 0 | Success |
Resume | Rest | Success, no other special meaning |
Terminate | 0 | Success |
Terminate | Rest | Success, no other special meaning |
The following table lists the consequences of error or exit codes of jobs related to checkpointing.
Table 7–4 Checkpointing-Related Error or Exit Codes
Script/Method | Exit or Error Code | Consequence |
---|---|---|
Checkpoint | 0 | Success |
Checkpoint | Rest | Success. For kernel checkpoint, however, this means that the checkpoint was not successful. |
Migrate | 0 | Success |
Migrate | Rest | Success. For kernel checkpoint, however, this means that the checkpoint was not successful. Migration will occur. |
Restart | 0 | Success |
Restart | Rest | Success, no other special meaning |
Clean | 0 | Success |
Clean | Rest | Success, no other special meaning |
For jobs that run successfully, the qacct -j command output shows a value of 0 in the failed field, and the output shows the exit status of the job in the exit_status field. However, the shepherd might not be able to run a job successfully. For example, the epilog script might fail, or the shepherd might not be able to start the job. In such cases, the failed field displays one of the code values listed in the following table.
Table 7–5 qacct -j failed Field Codes
Code | Description | acctvalid | Meaning for Job |
---|---|---|---|
0 | No failure | t | Job ran, exited normally |
1 | Presumably before job | f | Job could not be started |
3 | Before writing config | f | Job could not be started |
4 | Before writing PID | f | Job could not be started |
5 | On reading config file | f | Job could not be started |
6 | Setting processor set | f | Job could not be started |
7 | Before prolog | f | Job could not be started |
8 | In prolog | f | Job could not be started |
9 | Before pestart | f | Job could not be started |
10 | In pestart | f | Job could not be started |
11 | Before job | f | Job could not be started |
12 | Before pestop | t | Job ran, failed before calling PE stop procedure |
13 | In pestop | t | Job ran, PE stop procedure failed |
14 | Before epilog | t | Job ran, failed before calling epilog script |
15 | In epilog | t | Job ran, failed in epilog script |
16 | Releasing processor set | t | Job ran, processor set could not be released |
24 | Migrating (checkpointing jobs) | t | Job ran, job will be migrated |
25 | Rescheduling | t | Job ran, job will be rescheduled |
26 | Opening output file | f | Job could not be started, stderr/stdout file could not be opened |
27 | Searching requested shell | f | Job could not be started, shell not found |
28 | Changing to working directory | f | Job could not be started, error changing to start directory |
100 | Assumedly after job | t | Job ran, job killed by a signal |
The Code column lists the value of the failed field. The Description column lists the text that appears in the qacct -j output. If acctvalid is set to t, the job accounting values are valid. If acctvalid is set to f, the resource usage values of the accounting record are not valid. The Meaning for Job column indicates whether the job ran or not.
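To show how the failed and exit_status fields might be read together, the following Python sketch parses text in the style of qacct -j output and looks up the failed code in Table 7–5. It rests on several assumptions: the text holds a single accounting record, each field is a line of the form `name value`, and the failed value starts with the numeric code. The names `FAILED_CODES`, `parse_fields`, and `summarize` are hypothetical, and the sample text used with it is invented for illustration.

```python
import re

# failed code -> (description, acctvalid), transcribed from Table 7-5.
FAILED_CODES = {
    0: ("No failure", True),
    1: ("Presumably before job", False),
    3: ("Before writing config", False),
    4: ("Before writing PID", False),
    5: ("On reading config file", False),
    6: ("Setting processor set", False),
    7: ("Before prolog", False),
    8: ("In prolog", False),
    9: ("Before pestart", False),
    10: ("In pestart", False),
    11: ("Before job", False),
    12: ("Before pestop", True),
    13: ("In pestop", True),
    14: ("Before epilog", True),
    15: ("In epilog", True),
    16: ("Releasing processor set", True),
    24: ("Migrating (checkpointing jobs)", True),
    25: ("Rescheduling", True),
    26: ("Opening output file", False),
    27: ("Searching requested shell", False),
    28: ("Changing to working directory", False),
    100: ("Assumedly after job", True),
}


def parse_fields(text: str) -> dict:
    """Split 'name value' lines into a dict; lines with one token are skipped."""
    fields = {}
    for line in text.splitlines():
        parts = line.split(None, 1)
        if len(parts) == 2:
            fields[parts[0]] = parts[1].strip()
    return fields


def summarize(text: str) -> dict:
    """Return the failed code, its Table 7-5 entry, and the exit status."""
    fields = parse_fields(text)
    match = re.match(r"\d+", fields["failed"])
    if match is None:
        raise ValueError("failed field does not start with a number")
    code = int(match.group())
    # Codes missing from Table 7-5 are reported as unknown, not guessed.
    description, acctvalid = FAILED_CODES.get(code, ("Unknown code", None))
    return {
        "failed": code,
        "description": description,
        "acctvalid": acctvalid,
        "exit_status": int(fields["exit_status"]),
    }
```

When `acctvalid` comes back as `False`, the resource usage values of the record should not be trusted, as described above. In practice the input text could be captured from the command with `subprocess.run(["qacct", "-j", job_id], capture_output=True, text=True)`, where `job_id` is a placeholder.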