Sun N1 Grid Engine 6.1 User's Guide

Grid Engine System Components

The following sections explain the functions of the most important grid engine system components.

Hosts

Four types of hosts are fundamental to the grid engine system:

Master Host

The master host is central to the overall cluster activity. The master host runs the master daemon sge_qmaster and the scheduler daemon sge_schedd. Both daemons control all grid engine system components, such as queues and jobs. The daemons maintain tables about the status of the components, user access permissions, and the like.

By default, the master host is also an administration host and a submit host.

Execution Hosts

Execution hosts are systems that have permission to execute jobs. Therefore execution hosts have queue instances attached to them. Execution hosts run the execution daemon sge_execd.

Administration Hosts

Administration hosts are hosts that have permission to carry out any kind of administrative activity for the grid engine system.

Submit Hosts

Submit hosts enable users to submit and control batch jobs only. In particular, a user who is logged in to a submit host can submit jobs with the qsub command, can monitor the job status with the qstat command, and can use the grid engine system OSF/1 Motif graphical user interface QMON, which is described in QMON, the Grid Engine System's Graphical User Interface.


Note –

A system can act as more than one type of host.


Daemons

Three daemons provide the functionality of the grid engine system.

sge_qmaster – The Master Daemon

The center of the cluster's management and scheduling activities, sge_qmaster maintains tables about hosts, queues, jobs, system load, and user permissions. sge_qmaster receives scheduling decisions from sge_schedd and requests actions from sge_execd on the appropriate execution hosts.

sge_schedd – The Scheduler Daemon

The scheduling daemon maintains an up-to-date view of the cluster's status with the help of sge_qmaster. The scheduling daemon makes the following scheduling decisions:

The daemon then forwards these decisions to sge_qmaster, which initiates the required actions.

sge_execd – The Execution Daemon

The execution daemon is responsible for the queue instances on its host and for the running of jobs in these queue instances. Periodically, the execution daemon forwards information such as job status or load on its host to sge_qmaster.

Queues

A queue is a container for a class of jobs that are allowed to run on one or more hosts concurrently. A queue determines certain job attributes, for example, whether the job can be migrated. Throughout its lifetime, a running job is associated with its queue. Association with a queue affects some of the things that can happen to a job. For example, if a queue is suspended, all jobs associated with that queue are also suspended.

Jobs need not be submitted directly to a queue. You need to specify only the requirement profile of the job. A profile might include requirements such as memory, operating system, available software, and so forth. The grid engine software automatically dispatches the job to a suitable queue and a suitable host with a light execution load. If you submit a job to a specified queue, the job is bound to this queue. As a result, the grid engine system daemons are unable to select a better-suited device or a device that has a lighter load.

A queue can reside on a single host, or a queue can extend across multiple hosts. For this reason, grid engine system queues are also referred to as cluster queues. Cluster queues enable users and administrators to work with a cluster of execution hosts by means of a single queue configuration. Each host that is attached to a cluster queue receives its own queue instance from the cluster queue.

Client Commands

The command-line user interface is a set of ancillary programs (commands) that enable you to do the following tasks:

The grid engine system provides the following set of ancillary programs.