Understanding Job and Task States
Applies To: Windows HPC Server 2008
In Windows HPC Server 2008, jobs and tasks share the same life cycle states, with one exception: only jobs have the ExternalValidation state. The main life cycle states are Configuring, Queued, Running, Finished, Failed, and Canceled. Jobs and tasks also pass briefly through transitional states such as Submitted, Validating, Finishing, and Canceling.
The HPC Job Scheduler Service queues jobs and tasks, allocates resources, dispatches tasks to the compute nodes, and monitors the status of jobs, tasks, and nodes. For more information, see Configure the HPC Job Scheduler Service. You can also specify custom job activation and submission filter programs. For more information, see Understanding Activation and Submission Filters.
The following table summarizes all life cycle states.
Job and task states
State | Definition |
---|---|
Configuring | The job or task is in the system, but has not been submitted to the queue. |
Submitted | The job or task has been submitted and is awaiting validation before it can be queued. |
ExternalValidation | The job is running through a submission filter application that is defined by the cluster administrator. For more information, see Understanding Activation and Submission Filters. If the job passes external validation, it moves to the Validating state. If the job does not pass external validation, you receive an error message and the job moves to the Failed state. |
Validating | The HPC Job Scheduler Service is validating the job or task. During validation, the HPC Job Scheduler Service confirms permissions, applies default settings for any properties that you have not specified, and validates each property against constraints. Default settings and constraints are defined by the job template. For more information about job templates, see Job Templates. The HPC Job Scheduler Service also confirms that job properties encompass all task properties (for example, no task has a run time that is greater in value than the run time of the job). If the job passes validation, it moves to the Queued state. If the job does not pass validation, you receive an error message and the job moves to the Failed state. |
Queued | The job or task passed validation, and is waiting to be scheduled and activated (run). |
Running | The job or task is running on one or more nodes. |
Finishing | The job or task completed, and job or task clean-up is in progress. |
Finished | The job or task completed successfully. |
Failed | The job or task failed to complete or stopped running. Tasks that return non-zero exit codes are marked as Failed. For more information, see Tasks That Complete Successfully Are Marked As Failed. If a running task is canceled, the task is marked as Failed. Job owners and cluster administrators can manually cancel jobs or tasks. The HPC Job Scheduler Service cancels tasks if they exceed their run time or are preempted. Typically, the HPC Job Scheduler Service automatically requeues preempted jobs. See also Troubleshooting Jobs. |
Canceling | The job or task was canceled and clean-up is in progress. |
Canceled | The job or task was canceled before it started running. If a running task is canceled, the task is marked as Failed. Job owners and cluster administrators can manually cancel jobs or tasks. The HPC Job Scheduler Service cancels tasks if they exceed their run time or are preempted. Typically, the HPC Job Scheduler Service automatically requeues preempted jobs. |
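
The life cycle described in the table can be read as a small state machine. The following Python sketch is an illustration only and is not part of the HPC Job Scheduler Service API. The names `State`, `TRANSITIONS`, `can_transition`, `outcome_after_cancel`, and `outcome_after_exit` are hypothetical. Transitions that the table states directly are marked as such. The others (for example, Submitted moving straight to Validating, canceled work passing through Canceling, or a preempted job returning to Queued) are assumptions made to complete the picture.

```python
from enum import Enum


class State(Enum):
    CONFIGURING = "Configuring"
    SUBMITTED = "Submitted"
    EXTERNAL_VALIDATION = "ExternalValidation"  # jobs only, per the article
    VALIDATING = "Validating"
    QUEUED = "Queued"
    RUNNING = "Running"
    FINISHING = "Finishing"
    FINISHED = "Finished"
    FAILED = "Failed"
    CANCELING = "Canceling"
    CANCELED = "Canceled"


# Hypothetical transition map for illustration.
TRANSITIONS = {
    State.CONFIGURING: {State.SUBMITTED},
    # Assumption: work that skips the submission filter goes straight to Validating.
    State.SUBMITTED: {State.EXTERNAL_VALIDATION, State.VALIDATING},
    # Stated in the table: pass -> Validating, fail -> Failed.
    State.EXTERNAL_VALIDATION: {State.VALIDATING, State.FAILED},
    # Stated in the table: pass -> Queued, fail -> Failed.
    State.VALIDATING: {State.QUEUED, State.FAILED},
    # Assumption: canceling queued work passes through Canceling.
    State.QUEUED: {State.RUNNING, State.CANCELING},
    # Assumption: a preempted job that is requeued returns to Queued.
    State.RUNNING: {State.FINISHING, State.FAILED, State.CANCELING, State.QUEUED},
    State.FINISHING: {State.FINISHED},
    # Stated in the table: canceled before running -> Canceled,
    # canceled while running -> Failed.
    State.CANCELING: {State.CANCELED, State.FAILED},
    # Treated as terminal in this sketch.
    State.FINISHED: set(),
    State.FAILED: set(),
    State.CANCELED: set(),
}


def can_transition(current, target, is_task=False):
    """Return True if this sketch allows moving from current to target."""
    # Tasks do not have the ExternalValidation state.
    if is_task and State.EXTERNAL_VALIDATION in (current, target):
        return False
    return target in TRANSITIONS[current]


def outcome_after_cancel(was_running):
    """A canceled running task is marked Failed; otherwise it is Canceled."""
    return State.FAILED if was_running else State.CANCELED


def outcome_after_exit(exit_code):
    """Tasks that return non-zero exit codes are marked as Failed."""
    return State.FINISHED if exit_code == 0 else State.FAILED
```

For example, `can_transition(State.VALIDATING, State.QUEUED)` is allowed in this sketch, while `can_transition(State.SUBMITTED, State.EXTERNAL_VALIDATION, is_task=True)` is not, because tasks never enter ExternalValidation.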
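
The checks listed for the Validating state (apply job template defaults, test each property against its constraints, and confirm that no task run time exceeds the job run time) can be sketched as follows. This is a minimal illustration under stated assumptions, not the actual validation logic of the HPC Job Scheduler Service. The `TemplateProperty` class, the `validate_job` function, the `EXAMPLE_TEMPLATE` values, and the property names `runtime_minutes` and `priority` are all hypothetical, and the permission check is omitted.

```python
from dataclasses import dataclass


@dataclass
class TemplateProperty:
    """Hypothetical job template entry: a default value and an allowed range."""
    default: int
    minimum: int
    maximum: int


# Hypothetical template; real job templates define their own properties.
EXAMPLE_TEMPLATE = {
    "runtime_minutes": TemplateProperty(default=60, minimum=1, maximum=120),
    "priority": TemplateProperty(default=2, minimum=0, maximum=4),
}


def validate_job(job, tasks, template):
    """Return (next_state, errors, resolved_properties) for a submitted job.

    job and each entry of tasks are plain dicts of property name to value.
    """
    errors = []
    resolved = dict(job)

    for name, rule in template.items():
        # Apply the template default for any property that was not specified.
        value = job.get(name, rule.default)
        resolved[name] = value
        # Validate the property against the template constraints.
        if not rule.minimum <= value <= rule.maximum:
            errors.append(
                f"{name}={value} is outside {rule.minimum}..{rule.maximum}"
            )

    # Job properties must encompass task properties: no task may have a
    # run time greater than the run time of the job.
    job_runtime = resolved.get("runtime_minutes")
    for index, task in enumerate(tasks):
        task_runtime = task.get("runtime_minutes")
        if (
            job_runtime is not None
            and task_runtime is not None
            and task_runtime > job_runtime
        ):
            errors.append(
                f"task {index} run time {task_runtime} exceeds "
                f"job run time {job_runtime}"
            )

    # Pass -> Queued; fail -> Failed, with the error messages returned.
    next_state = "Failed" if errors else "Queued"
    return next_state, errors, resolved
```

With this sketch, a job that specifies a 30-minute run time and contains a 45-minute task comes back as `"Failed"` with one error message, while a job that specifies nothing picks up the template defaults and comes back as `"Queued"`.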