The following sections explain the functions of the most important grid engine system components.
The master host is central to the overall cluster activity. The master host runs the master daemon sge_qmaster and the scheduler daemon sge_schedd. Both daemons control all grid engine system components, such as queues and jobs. The daemons maintain tables that record the status of the components, user access permissions, and similar information.
By default, the master host is also an administration host and a submit host. See the sections that describe those hosts.
Execution hosts are systems that have permission to execute jobs. Therefore, execution hosts have queue instances attached to them. Execution hosts run the execution daemon sge_execd.
Administration hosts are hosts that have permission to carry out any kind of administrative activity for the grid engine system.
Submit hosts allow users to submit and control batch jobs only. In particular, a user who is logged in to a submit host can submit jobs with the qsub command, can monitor the job status with the qstat command, and can use QMON, the grid engine system's OSF/1 Motif graphical user interface, which is described in QMON, the Grid Engine System's Graphical User Interface.
A system can act as more than one type of host.
Three daemons provide the functionality of the grid engine system.
The center of the cluster's management and scheduling activities, sge_qmaster maintains tables about hosts, queues, jobs, system load, and user permissions. sge_qmaster receives scheduling decisions from sge_schedd and requests actions from sge_execd on the appropriate execution hosts.
The scheduling daemon, sge_schedd, maintains an up-to-date view of the cluster's status with the help of sge_qmaster. The scheduling daemon makes the following scheduling decisions:
Which jobs are dispatched to which queues
How to reorder and reprioritize jobs to maintain share, priority, or deadline
The daemon then forwards these decisions to sge_qmaster, which initiates the required actions.
The execution daemon, sge_execd, is responsible for the queue instances on its host and for running jobs in these queue instances. Periodically, the execution daemon forwards information such as job status and the load on its host to sge_qmaster.
A queue is a container for a class of jobs that are allowed to run on one or more hosts concurrently. A queue determines certain job attributes, for example, whether the job can be migrated. Throughout its lifetime, a running job is associated with its queue. Association with a queue affects some of the things that can happen to a job. For example, if a queue is suspended, all jobs associated with that queue are also suspended.
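The effect of suspending a queue on its associated jobs can be pictured with a small model. This is only an illustrative sketch: the Job and Queue classes, the state names, and the queue name example.q are hypothetical and are not part of the grid engine software.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Job:
    job_id: int
    state: str = "running"  # hypothetical states: "running" or "suspended"


@dataclass
class Queue:
    name: str
    jobs: List[Job] = field(default_factory=list)
    suspended: bool = False

    def suspend(self) -> None:
        """Suspend the queue and every job currently associated with it."""
        self.suspended = True
        for job in self.jobs:
            job.state = "suspended"


# Illustrative usage: two running jobs share one queue.
queue = Queue("example.q", jobs=[Job(1), Job(2)])
queue.suspend()
print([(job.job_id, job.state) for job in queue.jobs])
```

The model covers only the suspension case that the text describes. It makes no claim about how jobs are resumed.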
Jobs need not be submitted directly to a queue. You need to specify only the requirement profile of the job. A profile might include requirements such as memory, operating system, available software, and so forth. The grid engine software automatically dispatches the job to a suitable queue and a suitable host with a light execution load. If you submit a job to a specified queue, the job is bound to this queue. As a result, the grid engine system daemons are unable to select a lighter-loaded or better-suited device.
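The difference between submitting a requirement profile and binding a job to a named queue can be sketched as follows. The sketch assumes a much simpler selection rule than the real scheduler uses: filter the queue instances by the profile, then pick the one with the lowest load. All class names, field names, host names, and values are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class QueueInstance:
    queue: str
    host: str
    arch: str           # hypothetical operating system/architecture label
    mem_free_gb: float
    load: float         # lower means more lightly loaded


@dataclass(frozen=True)
class Profile:
    arch: str
    min_mem_gb: float
    queue: Optional[str] = None  # set only when the job is bound to a queue


def dispatch(profile: Profile,
             instances: List[QueueInstance]) -> Optional[QueueInstance]:
    """Return the least-loaded instance that satisfies the profile, if any."""
    candidates = [
        qi for qi in instances
        if qi.arch == profile.arch
        and qi.mem_free_gb >= profile.min_mem_gb
        and (profile.queue is None or qi.queue == profile.queue)
    ]
    return min(candidates, key=lambda qi: qi.load, default=None)


pool = [
    QueueInstance("short.q", "node1", "linux-x64", 8.0, 0.9),
    QueueInstance("long.q", "node2", "linux-x64", 16.0, 0.2),
]
# Profile only: the lightly loaded node2 is eligible and is chosen.
print(dispatch(Profile("linux-x64", 4.0), pool))
# Bound to short.q: only node1 is eligible, despite its higher load.
print(dispatch(Profile("linux-x64", 4.0, queue="short.q"), pool))
```

In the second call the binding removes node2 from consideration, which is the loss of flexibility that the paragraph above describes.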
A queue can reside on a single host, or a queue can extend across multiple hosts. For this reason, grid engine system queues are also referred to as cluster queues. Cluster queues enable users and administrators to work with a cluster of execution hosts by means of a single queue configuration. Each host that is attached to a cluster queue receives its own queue instance from the cluster queue.
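The relationship between a cluster queue and its queue instances can be sketched as one shared configuration that is expanded into one instance per attached host. The classes below are a hypothetical model, not grid engine code. The slots setting stands in for any queue attribute, and the queue@host naming used here is an assumption made for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class QueueInstance:
    queue: str
    host: str
    slots: int

    @property
    def name(self) -> str:
        # Assumed naming scheme: cluster queue name, "@", host name.
        return f"{self.queue}@{self.host}"


@dataclass
class ClusterQueue:
    name: str
    slots: int          # one configuration value shared by all instances
    hosts: List[str]

    def instances(self) -> List[QueueInstance]:
        """Derive one queue instance per attached host."""
        return [QueueInstance(self.name, host, self.slots)
                for host in self.hosts]


cluster_queue = ClusterQueue("example.q", slots=4, hosts=["node1", "node2"])
for instance in cluster_queue.instances():
    print(instance.name, instance.slots)
```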
Submit and delete jobs
Check job status
Suspend or enable queues and jobs
The grid engine system provides the following set of ancillary programs.
qrsh – Can be used for various purposes, such as the following:
To provide remote execution of interactive applications through the grid engine system. qrsh is comparable to the standard UNIX facility rsh.
To allow for the submission of batch jobs that, upon execution, support terminal I/O and terminal control. Terminal I/O includes standard output, standard error, and standard input.
To provide a submission client that remains active until the batch job finishes.
To allow for the grid engine software-controlled remote execution of the tasks of parallel jobs.
qselect – Prints a list of queue names corresponding to specified selection criteria. The output of qselect is usually fed into other grid engine system commands to apply actions on a selected set of queues.
qtcsh – A fully compatible replacement for the widely known and used UNIX C shell (csh) derivative, tcsh. qtcsh provides a command shell with the extension of transparently distributing execution of designated applications to suitable and lightly loaded hosts through grid engine software.
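The way qselect output is fed into another command can be sketched as two small helpers. The sketch runs nothing: it only turns captured output into an argument list. It assumes that qselect prints one queue name per line, and the sample output and queue names are made up. The prefix qmod -d (disable queues) is illustrative, so check the qmod man page on your installation for the queue list syntax that it accepts.

```python
from typing import List


def parse_queue_names(qselect_output: str) -> List[str]:
    """Split captured qselect output into queue names, skipping blank lines."""
    return [line.strip() for line in qselect_output.splitlines()
            if line.strip()]


def build_command(prefix: List[str], queue_names: List[str]) -> List[str]:
    """Append the selected queues to a command prefix.

    Returns an empty list when nothing was selected, so the caller
    does not run the follow-up command without any queue arguments.
    """
    if not queue_names:
        return []
    return [*prefix, *queue_names]


# Made-up sample of what a qselect call might have printed.
sample_output = "example.q@node1\nexample.q@node2\n"
queues = parse_queue_names(sample_output)
print(build_command(["qmod", "-d"], queues))
```

Returning an empty list for an empty selection is a deliberate choice: a follow-up command that is run without queue arguments might otherwise act on something unintended.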