N1 Grid Engine 6 Release Notes

N1 Grid Engine 6 Update 4 Software Release Notes

Accessing Documentation

The distribution CD includes full documentation for a networked set of computer hosts that run N1 Grid Engine 6 Update 4 software:

You can access these files directly from the CD, in either PDF or HTML format. The files are in the cdmountpoint/N1_Grid_Engine_6u4/Docs directory.



Contents of This Software Package

The Grid Engine 6 Update 4 software distribution is made up of the following components:

The Grid Engine 6 Update 4 software distribution kit contains the following top-level directory hierarchy:

Installing the N1 Grid Engine 6 Software

If you plan to add a new installation of the N1 Grid Engine 6 software, or if you are adding packages that were not previously installed to an existing N1 Grid Engine cluster (such as Windows support, ARCo, or GEMM), see N1GE6Update4_Installation_Guide.pdf, the N1 Grid Engine 6 Installation Guide that is included on the distribution CD. If you have already installed the N1 Grid Engine 6 software packages, install the patches that are available at http://sunsolve.sun.com. The patch README documents explain how to install the patches.

The patch matrix below lists the N1 Grid Engine 6 patches that are available as of May 2005. Newer revisions of these patches, or additional patches, may become available later. Check http://sunsolve.sun.com for the availability of N1 Grid Engine 6 patches.

Table 1 Patches for Packages in Sun pkgadd Format

Package Name (1)   Operating System         Architecture (2)   Patch-ID
SUNWsgee           Solaris, Sparc, 32bit    sol-sparc          118094-04
SUNWsgeex          Solaris, Sparc, 64bit    sol-sparc64        118130-04
SUNWsgeei          Solaris, x86             sol-x86            118131-04
SUNWsgeec          all                      common             118132-04
SUNWsgeea          all                      arco               118133-04
SUNWsgeed          all                      doc                119846-01

  1. See pkginfo(1)

  2. N1 Grid Engine binary architecture string, or common (architecture-independent packages), arco (Accounting and Reporting Console), or doc (documentation)

Table 2 Patches for Packages in tar.gz Format

Operating System               Architecture   Patch-ID
Solaris, Sparc, 32bit          sol-sparc      118082-04
Solaris, Sparc, 64bit          sol-sparc64    118083-04
Solaris, x86                   sol-x86        118084-04
Linux kernel 2.4/2.6, x86      lx24-x86       118085-04
Linux kernel 2.4/2.6, AMD64    lx24-amd64     118086-04
IBM AIX 4.3                    aix43          118087-04
IBM AIX 5.1                    aix51          118088-04
Apple Mac OS X                 darwin         118089-04
HP-UX 11                       hp11           118090-04
SGI Irix 6.5                   irix65         118091-04
all                            common         118092-04
all                            arco           118093-04
all                            doc            119861-01

Changes in N1 Grid Engine 6 Update 4 Software

Along with many bug fixes, N1 Grid Engine 6 Update 4 includes the following changes.

Support for Microsoft Windows Operating Systems

N1 Grid Engine 6 Update 4 Windows client functionality (submit, administration and execution host) is now available for Microsoft Windows 2000 SP3 (or higher), Windows XP Professional SP1 (or higher) and Windows Server 2003. The N1 Grid Engine command line tools and execution host functionality are almost fully supported on these operating systems.

N1 Grid Engine support for Windows allows users to fully integrate Windows hosts into an existing N1 Grid Engine environment. Users can submit and monitor their jobs through the command line tools. Administrators can have full control over an N1 Grid Engine cluster from a Windows host. The execution host functionality allows you to use Windows desktop machines and dedicated Windows compute servers to run batch workloads and interactive jobs.

Installation of N1 Grid Engine 6 Update 4 requires Microsoft Services for UNIX (SFU) 3.5, which provides tools and libraries to integrate Windows with UNIX. SFU 3.5 is available at no license fee and is supported by Microsoft. See http://www.microsoft.com/windows/sfu/default.asp for information about SFU, its requirements, and how to obtain it.

New Grid Engine Management Module (GEMM) for Sun Control Station

GEMM is a new addition to N1GE6 which provides a web-based interface for deployment, monitoring, and diagnostics of an N1 Grid Engine installation. It operates in the framework provided by Sun Control Station 2.2, a product which must be purchased separately.

Sun Control Station (SCS) 2.2 provides overall life-cycle management of servers, from bare-metal OS provisioning, to software and patch deployment, to basic health, inventory, and hardware monitoring, all in an easy-to-use web interface. GEMM adds to this the following capabilities:

Support for Solaris 10 x64 (64-bit Solaris on Opteron Systems)

Solaris 10 x64 (on AMD Opteron hardware) is now fully supported with this release.

Other Functionality Delivered With This Update Release

This list summarizes new and improved functionality which has been added to the N1 Grid Engine 6 software since it was released in June 2004.

New Features in N1 Grid Engine 6 Software

The original N1 Grid Engine 6 provides the following new features.

Accounting and Reporting Console (ARCo)

The optional ARCo enables you to gather live accounting and reporting data from a grid engine system and store the data in a standard SQL database. ARCo also provides a web-based tool for generating information queries on that database and for retrieving the results in tabular or graphical form. ARCo enables you to store queries for later use, to run predefined queries, and to run queries in batch mode, for example, overnight.

For details, see Chapter 5, Accounting and Reporting, in N1 Grid Engine 6 User’s Guide, and Chapter 8, Installing the Accounting and Reporting Console, in N1 Grid Engine 6 Installation Guide.

Resource Reservation

The grid engine system scheduler supports a highly flexible resource reservation scheme. Jobs can reserve resources depending on criteria such as resource requirements, priority, waiting time, resource sharing entitlements, and so forth. The scheduler enforces reservations in such a way that jobs with the highest urgency receive the earliest possible resource assignment. Resource reservation thereby avoids well-known problems such as job starvation.

With respect to resource requirements, a job's importance can be defined on a per resource basis for arbitrary resources, as well as for administrator-defined resources such as third party licenses or network bandwidth. Reservations can be assigned across the full hierarchy of grid engine system resource containers: global, host, or queue.

For more information, see the sge_priority(5) man page.

Cluster Queues

N1 Grid Engine 6 software provides a new administrative concept for managing queues. It enables easier administration while maintaining the flexibility of the Sun Grid Engine 5.3 queue concept.

A cluster queue can extend across multiple hosts. Those hosts can be specified as a list of individual hosts, as a host group, or as a list of individual hosts and host groups. By adding a host to a cluster queue, the host receives an instance of that cluster queue. A queue instance corresponds to a queue in Sun Grid Engine 5.3.

When you modify a cluster queue, all of its queue instances are modified simultaneously. Even within a single cluster queue, you can specify differences in the configuration of queue instances, depending on individual hosts or host groups. Therefore, a typical N1 Grid Engine 6 software setup will have only a few cluster queues, and the queue instances controlled by those cluster queues remain largely in the background.
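The relationship between a cluster queue, its hosts and host groups, and the resulting queue instances can be sketched in a few lines of Python. This is a toy model for illustration only, not the Grid Engine data structure: the ClusterQueue class, the queue name all.q, the host names, and the @compute host group are all hypothetical. The sketch assumes that a queue instance is addressed as queue@host, and that a per-host setting takes precedence over a per-host-group setting, which in turn takes precedence over the cluster queue default.

```python
from dataclasses import dataclass, field


@dataclass
class ClusterQueue:
    """Toy model of a cluster queue (hypothetical, for illustration only)."""
    name: str
    hosts: list = field(default_factory=list)        # individual hosts
    host_groups: list = field(default_factory=list)  # host group names
    defaults: dict = field(default_factory=dict)     # attribute -> default value
    overrides: dict = field(default_factory=dict)    # attribute -> {host or group: value}

    def member_hosts(self, groups):
        """Expand individual hosts and host groups into one ordered host list."""
        expanded = list(self.hosts)
        for group in self.host_groups:
            expanded.extend(groups[group])
        # Drop duplicates while preserving order.
        return list(dict.fromkeys(expanded))

    def instances(self, groups):
        """Return one queue instance name per member host."""
        return [f"{self.name}@{host}" for host in self.member_hosts(groups)]

    def setting(self, attribute, host, groups):
        """Resolve an attribute: host override, then group override, then default."""
        by_target = self.overrides.get(attribute, {})
        if host in by_target:
            return by_target[host]
        for group in self.host_groups:
            if group in by_target and host in groups[group]:
                return by_target[group]
        return self.defaults[attribute]


# Hypothetical cluster: one desktop host plus a two-node host group.
groups = {"@compute": ["node1", "node2"]}
queue = ClusterQueue(
    name="all.q",
    hosts=["desk1"],
    host_groups=["@compute"],
    defaults={"slots": 1},
    overrides={"slots": {"@compute": 4, "node2": 8}},
)

print(queue.instances(groups))
for host in queue.member_hosts(groups):
    print(host, queue.setting("slots", host, groups))
```

Changing defaults["slots"] in this model changes the value for every instance that has no override, which mirrors the statement above that modifying a cluster queue modifies all of its queue instances at once.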

For further details, see the queue_conf(5) man page.

DRMAA

N1 Grid Engine 6 software includes a standard-compliant implementation of the Distributed Resource Management Application API (DRMAA), version 1.0. DRMAA 1.0 is a draft standard under review at the Global Grid Forum. It provides a standard API for integrating Distributed Resource Management Systems, such as N1 Grid Engine 6 software, with external applications like ISV codes or graphical interfaces. Major functions provided by DRMAA include job submission, job monitoring, and job control. N1 Grid Engine 6 software includes an implementation of the C-language binding of DRMAA. Details are available in the drmaa_*(3) man pages and on the DRMAA home page, http://www.drmaa.org/.

Scalability

N1 Grid Engine 6 software implements a number of architectural changes from previous releases in order to support increased scalability:

Scheduler Enhancements

Different scheduling profiles can be selected for setups ranging from high throughput and low scheduling overhead to full policy control. The setups can be selected during the sge_qmaster installation procedure. In addition, a series of enhancements has improved scheduler performance greatly.

Automated Installation and Backup

The N1 Grid Engine 6 software installation procedure can be completely automated to facilitate installation on large numbers of execution hosts, frequently recurring reinstallation of hosts, or integration of the installation process into system management frameworks. For more information, see the file doc/README-Autoinstall.txt.

N1 Grid Engine 6 software also includes an automatic backup script that backs up all cluster configuration files.

qping Utility

A new qping utility enables you to query the status of the sge_qmaster and sge_execd daemons.
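As a rough illustration, a monitoring script could wrap qping as shown below. This is a sketch under assumptions, not a documented interface: it assumes the argument order host, port, component name, component id described in the qping(1) man page, with the component names qmaster and execd and the id 1, and it assumes that a zero exit status means the daemon answered. The helper names, the host name, and the port number are hypothetical; use the port configured for your cluster.

```python
import subprocess

# Assumed mapping from daemon to the component name that qping expects.
COMPONENT_NAMES = {"sge_qmaster": "qmaster", "sge_execd": "execd"}


def qping_command(host, port, daemon):
    """Build the qping argument list for one daemon on one host."""
    if daemon not in COMPONENT_NAMES:
        raise ValueError(f"unknown daemon: {daemon}")
    # Assumed form: qping -info <host> <port> <name> <id>
    return ["qping", "-info", host, str(port), COMPONENT_NAMES[daemon], "1"]


def daemon_responds(host, port, daemon, timeout=10):
    """Return True if qping exits with status 0 (assumed to mean 'reachable')."""
    try:
        result = subprocess.run(
            qping_command(host, port, daemon),
            capture_output=True,
            text=True,
            timeout=timeout,
        )
    except (FileNotFoundError, subprocess.TimeoutExpired):
        # qping is not on PATH, or the daemon did not answer in time.
        return False
    return result.returncode == 0


# Hypothetical host and port; only the command line is printed here.
print(" ".join(qping_command("master.example.com", 6444, "sge_qmaster")))
```

Keeping the command construction separate from the subprocess call makes it easy to check the exact command line before it is run against a live cluster.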

Starting Binaries Directly

The qsub command now supports the -shell {y | n} option, which is used with the -b y option, to start a submitted binary directly without an intermediate shell.
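For scripts that submit jobs programmatically, the option combination described above can be assembled as follows. This is a minimal sketch: the helper name and the example binary are hypothetical, and the code only builds the argument list from the -b and -shell options named in this section. It does not submit anything by itself.

```python
def binary_submit_command(binary, args=(), use_shell=False):
    """Build a qsub command line that submits a binary instead of a job script.

    With use_shell=False the command contains '-b y -shell n', which asks for
    the binary to be started directly, without an intermediate shell.
    """
    shell_flag = "y" if use_shell else "n"
    return ["qsub", "-b", "y", "-shell", shell_flag, binary, *args]


# Hypothetical example: run /bin/hostname directly on the execution host.
command = binary_submit_command("/bin/hostname")
print(" ".join(command))
```

The resulting list can be passed to subprocess.run() on a submit host where qsub is on the PATH.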

Resource Requests for Individual make Rules

In dynamic allocation mode, the qmake command can now specify resource requests for individual make rules.

Grid Engine System Binary Directory

The environment variable SGE_BINARY_PATH is set in the job environment. This variable points to the directory where the grid engine system binaries are installed.
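A job can use this variable to call grid engine system commands without relying on its own PATH. The sketch below shows one way a Python job script might do that. The helper name and the example directory are hypothetical, and the function returns None when the variable is not set, for example when the script runs outside a job.

```python
import os


def grid_engine_binary(name, environ=None):
    """Return the full path of a grid engine binary, or None outside a job.

    The directory is taken from SGE_BINARY_PATH, which is set in the
    job environment.
    """
    env = os.environ if environ is None else environ
    binary_dir = env.get("SGE_BINARY_PATH")
    if not binary_dir:
        return None
    return os.path.join(binary_dir, name)


# Hypothetical job environment, passed explicitly for demonstration.
example_env = {"SGE_BINARY_PATH": "/opt/n1ge6/bin/sol-sparc"}
print(grid_engine_binary("qstat", example_env))
```

Passing the environment as an argument keeps the function easy to exercise outside a running job. Inside a job, calling grid_engine_binary("qstat") with no second argument reads the real job environment.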

Known Limitations and Workarounds

The following sections contain information about product irregularities discovered during testing, but too late to fix or document.

Known Limitations of N1 Grid Engine 6 Software

This N1 Grid Engine 6 software release has the following limitations:

Known Limitations and Workarounds for the Microsoft Windows Platform

Known Limitations and Workarounds for GEMM