Sun N1 Grid Engine 6.1 User's Guide

Glossary

access list

A list of users and UNIX groups who are permitted or denied access to a resource such as a queue or a host. Users and groups can belong to multiple access lists, and the same access lists can be used in various contexts.

administration host

Administration hosts are hosts that have permission to carry out administrative activity for the grid engine system.

array job

A job made up of a range of independent identical tasks. Each task is similar to a separate job. Array job tasks differ among themselves only by having unique task identifiers, which are integer numbers.
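
As a rough illustration of how task identifiers are the only thing that distinguishes the tasks of an array job, the following Python sketch expands a task range into its list of identifiers. The `first-last:step` notation is modeled on the range form accepted by `qsub -t`; the function name `expand_task_range` and the validation rules are assumptions made for this example, not part of the grid engine software.

```python
def expand_task_range(spec: str) -> list[int]:
    """Expand 'first[-last[:step]]' into the list of task identifiers."""
    bounds, _, step_text = spec.partition(":")
    first_text, _, last_text = bounds.partition("-")
    first = int(first_text)
    last = int(last_text) if last_text else first
    step = int(step_text) if step_text else 1
    if first < 1 or last < first or step < 1:
        raise ValueError(f"invalid task range: {spec!r}")
    return list(range(first, last + 1, step))


# Each identifier stands for one independent task of the same job.
task_ids = expand_task_range("1-10:3")
```

Every task would run the same script; only the integer identifier it receives differs.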

batch job

A batch job is a UNIX shell script that can be run without user intervention and does not require access to a terminal.

campus grid

A grid that enables multiple projects or departments within an organization to share computing resources.

cell

A separate cluster with a separate configuration and a separate master machine. Cells can be used to loosely couple separate administrative units.

checkpointing

A procedure that saves the execution status of a job into a checkpoint, thereby allowing the job to be aborted and resumed later without losing information or work already completed. The process is called migration if the checkpoint is moved to another host before execution resumes.

checkpointing environment

A grid engine system configuration entity that defines events, interfaces, and actions that are associated with a certain method of checkpointing.

cluster

A collection of machines, called hosts, on which grid engine system functions occur.

cluster grid

The simplest form of a grid, consisting of computer hosts working together to provide a single point of access to users in a single project or department.

cluster queue

A container for a class of jobs that are allowed to run concurrently. A queue determines certain job attributes, for example, whether the job can be migrated. Throughout its lifetime, a running job is associated with its queue. Association with a queue affects some of the things that can happen to a job. For example, if a queue is suspended, all jobs associated with that queue are also suspended.

complex

A set of resource attribute definitions that can be associated with a queue, a host, or the entire cluster.

department

A list of users and groups who are treated alike in the functional and override scheduling policies of the grid engine system. Users and groups can belong to only one department.

entitlement

The same as share. The amount of resources that are planned to be consumed by a certain job, user, user group, or project.

execution host

Systems that have permission to run grid engine system jobs. These systems host queue instances, and run the execution daemon sge_execd.

functional policy

A policy that assigns specific levels of importance to jobs, users, user groups, and projects. For instance, through the functional policy, a high-priority project and all its jobs can receive a higher resource share than a low-priority project.

global grid

A collection of campus grids that cross organizational boundaries to create very large virtual systems.

grid

A collection of computing resources that perform tasks. Users treat the grid as a single computational resource.

group

A UNIX group.

hard resource requirements

The resources that must be allocated before a job can be started. Contrast with soft resource requirements.

host

A system on which grid engine system functions occur.

interactive job

An interactive job is a session started with the commands qrsh, qsh, or qlogin, which open an xterm window for user interaction or provide the equivalent of a remote login session.

job

A request from a user for computational resources from the grid.

job class

A set of jobs that are equivalent in some sense and treated similarly. A job class is defined by the identical requirements of the corresponding jobs and by the characteristics of the queues that are suitable for those jobs.

manager

A user who can manipulate all aspects of the grid engine software. The superusers of the master host and of any other machine that is declared to be an administration host have manager privileges. Manager privileges can be assigned to nonroot user accounts as well.

master host

The master host is central to the overall cluster activity. It runs the master daemon sge_qmaster and the scheduler daemon sge_schedd. By default, the master host is also an administration host and a submit host.

migration

The process of moving a checkpointing job from one host to another before execution of the job resumes.

operator

Users who can run the same commands as managers but cannot change the configuration. Operators are expected to keep the system in operation.

override policy

A policy commonly used to override the automated resource entitlement management of the functional and share-based policies. The cluster administrator can modify the automated policy implementation to assign override tickets to jobs, users, user groups, and projects.

owner

Users who can suspend or resume, and disable or enable, the queues they own. Typically, users are owners of the queue instances that reside on their workstations.

parallel environment

A grid engine system configuration that defines the necessary interfaces for the grid engine software to correctly handle parallel jobs.

parallel job

A job that is made up of more than one closely correlated task. Tasks can be distributed across multiple hosts. Parallel jobs usually use communication tools such as shared memory or message passing (MPI, PVM) to synchronize and correlate tasks.

policy

A set of rules and configurations that the administrator can use to define the behavior of the grid engine system. Policies are implemented automatically by the system.

priority

The relative level of importance of a job compared to others.

project

A grid engine system project.

resource

A computational device consumed or occupied by running jobs. Typical examples are memory, CPU, I/O bandwidth, file space, software licenses, and so forth.

share

The same as entitlement. The amount of resources that are planned to be consumed by a certain job, user, user group, or project.

share-based policy

A policy that allows definition of the entitlements of users and projects, and arbitrary groups thereof, in a hierarchical fashion. An enterprise, for instance, can be subdivided into divisions, departments, projects active in the departments, user groups working on those projects, and users in those user groups. The share-based hierarchy is called a share-tree, and once a share-tree is defined, its entitlement distribution is automatically implemented by the grid engine software.
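
To make the hierarchical distribution concrete, here is a minimal Python sketch that walks a share-tree and computes the fraction of the total entitlement that reaches each leaf. The nested-dictionary layout, the node names, and the rule that siblings split their parent's entitlement in proportion to their shares are assumptions for illustration; the scheduler's actual calculation also takes other factors into account.

```python
from fractions import Fraction


def leaf_entitlements(node, fraction=Fraction(1), path=""):
    """Map each leaf's path to its fraction of the whole entitlement."""
    name = f"{path}/{node['name']}" if path else node["name"]
    children = node.get("children", [])
    if not children:
        return {name: fraction}
    # Siblings divide the parent's fraction in proportion to their shares.
    total = sum(child["shares"] for child in children)
    result = {}
    for child in children:
        result.update(
            leaf_entitlements(child, fraction * child["shares"] / total, name)
        )
    return result


# Hypothetical tree: two divisions, one of which is split between two users.
tree = {
    "name": "enterprise",
    "children": [
        {"name": "division_a", "shares": 60, "children": [
            {"name": "alice", "shares": 1},
            {"name": "bob", "shares": 3},
        ]},
        {"name": "division_b", "shares": 40},
    ],
}
entitlements = leaf_entitlements(tree)
```

Exact fractions are used so that the leaf entitlements always add up to 1 without rounding error.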

share-tree

The hierarchical definition of a share-based policy.

soft resource requirements

Resources that a job needs but that do not have to be allocated before the job can be started. They are allocated to the job on an as-available basis. Contrast with hard resource requirements.
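
The difference between hard and soft requirements can be sketched in Python as a filter followed by a ranking: hard requirements exclude queues, soft requirements only express a preference. The function `pick_queue`, the queue dictionaries, the resource names, and the exact-match rule are all hypothetical simplifications, not the scheduler's real algorithm.

```python
def pick_queue(queues, hard, soft):
    """Return the name of the preferred queue, or None if none qualifies.

    Simplifying assumption: a requirement is met only when the queue offers
    exactly the requested value for that resource.
    """
    def meets(queue, name, wanted):
        return queue["resources"].get(name) == wanted

    # Hard requirements filter: every one of them must be met.
    eligible = [
        q for q in queues
        if all(meets(q, name, wanted) for name, wanted in hard.items())
    ]
    if not eligible:
        return None  # the job cannot be started anywhere yet

    # Soft requirements rank: prefer the queue that meets the most of them.
    def soft_matches(queue):
        return sum(meets(queue, name, wanted) for name, wanted in soft.items())

    return max(eligible, key=soft_matches)["name"]


# Hypothetical queues and resource names, for illustration only.
queues = [
    {"name": "a.q", "resources": {"arch": "arch-a", "scratch": False}},
    {"name": "b.q", "resources": {"arch": "arch-a", "scratch": True}},
    {"name": "c.q", "resources": {"arch": "arch-b", "scratch": True}},
]
chosen = pick_queue(queues, hard={"arch": "arch-a"}, soft={"scratch": True})
```

A job whose soft requirements cannot be met still runs, as long as some queue satisfies all of its hard requirements.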

submit host

Submit hosts can be used only to submit and control batch jobs. In particular, a user who is logged in to a submit host can submit jobs using qsub, can monitor job status using qstat, and can use the grid engine system's OSF/1 Motif graphical user interface QMON.

suspension

The process of holding a running job but keeping it on the execution host (in contrast to checkpointing, where the job is aborted). A suspended job still consumes some resources, such as swap memory or file space.

ticket

A generic unit for resource share definition. The more tickets that a job, user, project, or other component has, the more important it is. If a job has twice as many tickets as another job, for example, that job is entitled to twice the resource consumption.
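
The proportional rule in this definition can be written as a short Python sketch. The function name `entitlement_by_tickets` and the job names are hypothetical, and the sketch assumes that tickets are the only factor in play.

```python
def entitlement_by_tickets(tickets):
    """Convert ticket counts into each holder's fraction of the resources."""
    total = sum(tickets.values())
    if total <= 0:
        raise ValueError("at least one ticket is required")
    return {holder: count / total for holder, count in tickets.items()}


# job_a holds twice as many tickets as job_b, so it is entitled to twice as much.
shares = entitlement_by_tickets({"job_a": 200, "job_b": 100})
```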

usage

Another term for “resources consumed.” Usage is determined by an administrator-configurable weighted sum of CPU time consumed, memory occupied over time, and amount of I/O performed.
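
The weighted sum described here can be sketched in Python as follows. The weight keys, the units, and the example values are assumptions chosen for illustration; in a real cluster the administrator configures the weights.

```python
def usage(cpu_seconds, memory_gb_seconds, io_gb, weights):
    """Combine the three consumption figures into one usage value."""
    return (
        weights["cpu"] * cpu_seconds
        + weights["mem"] * memory_gb_seconds
        + weights["io"] * io_gb
    )


# Hypothetical weights that emphasize CPU time over memory and I/O.
example_weights = {"cpu": 0.5, "mem": 0.25, "io": 0.25}
job_usage = usage(
    cpu_seconds=100, memory_gb_seconds=40, io_gb=8, weights=example_weights
)
```

Changing the weights changes which kind of consumption counts most heavily against a user or project.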

users

People who can submit jobs to the grid and run them if they have a valid login ID on at least one submit host and one execution host.

userset

Either an access list or a department.