Oracle® Grid Engine Administration Guide
Release 6.2 Update 7
Part Number E21978-01
This chapter describes Grid Engine administration.
As an administrator, you can interact with the Grid Engine system through the command line interface, the graphical user interface (QMON), or the Distributed Resource Management Application API (DRMAA).
The command line interface provides more flexibility than the graphical user interface in configuring, monitoring, and controlling the Grid Engine system. Many experienced administrators find that using files and scripts is a more flexible, quicker, and more powerful way to change settings.
The following commands are central to Grid Engine administration:
qconf - Add, delete, and modify the current Grid Engine configuration. For more information, see Using qconf.
qhost - View current status of the available Grid Engine hosts, the queues, and the jobs associated with the queues.
qsub and qalter - Submit jobs (qsub) and modify the attributes of pending jobs (qalter).
qstat - Show the status of Grid Engine jobs and queues.
qquota - List each resource quota that is being used at least once or that defines a static limit.
You can use QMON, the graphical user interface (GUI) tool, to accomplish most Grid Engine system tasks. Only administrators can configure QMON using the specifically designed resource file. Reasonable defaults are compiled in $SGE_ROOT/qmon/Qmon. This file also includes a sample resource file. Refer to the comment lines in the sample Qmon file for detailed information on the possible customizations.
As the cluster administrator, you can do any of the following:
Install site-specific defaults in standard locations such as /usr/lib/X11/app-defaults/Qmon.
Include QMON-specific resource definitions in the standard .Xdefaults or .Xresources files.
Put a site-specific Qmon file in a location referenced by standard search paths such as XAPPLRESDIR.
You can automate Grid Engine functions by writing scripts that run Grid Engine commands and parse the results. However, for more consistent and efficient results, you can use the Distributed Resource Management Application API (DRMAA).
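As an illustration of the scripting approach, the following minimal Python sketch parses job information from XML text of the kind that qstat -xml prints. The sample document, the element names (job_list, JB_job_number, JB_name, JB_owner, state), and the function name parse_jobs are assumptions made for this example; check them against the output of your own installation before relying on them.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample shaped like `qstat -xml` output.
# The element names are assumed, not taken from this guide.
SAMPLE_XML = """<job_info>
  <queue_info>
    <job_list state="running">
      <JB_job_number>101</JB_job_number>
      <JB_name>render</JB_name>
      <JB_owner>alice</JB_owner>
      <state>r</state>
    </job_list>
  </queue_info>
  <job_info>
    <job_list state="pending">
      <JB_job_number>102</JB_job_number>
      <JB_name>encode</JB_name>
      <JB_owner>bob</JB_owner>
      <state>qw</state>
    </job_list>
  </job_info>
</job_info>"""


def parse_jobs(xml_text):
    """Return one dict per <job_list> element, in document order."""
    root = ET.fromstring(xml_text)
    jobs = []
    for node in root.iter("job_list"):
        jobs.append({
            "id": node.findtext("JB_job_number"),
            "name": node.findtext("JB_name"),
            "owner": node.findtext("JB_owner"),
            "state": node.findtext("state"),
        })
    return jobs


if __name__ == "__main__":
    for job in parse_jobs(SAMPLE_XML):
        print(job["id"], job["owner"], job["state"])
```

In a real script, the XML text would come from running the command, for example with subprocess.run(["qstat", "-xml"], capture_output=True, text=True, check=True).stdout, rather than from the embedded sample.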
The bootstrap file contains parameters that are needed for starting up the Grid Engine components. The bootstrap file is created during the sge_qmaster installation. You cannot modify the bootstrap file in a running system.
Note: Any changes made to the bootstrap file become effective only after restarting the qmaster.
The following section provides a brief description of the individual parameters that compose the bootstrap configuration for a Grid Engine cluster.
The admin_user parameter is the administrative user account used by Grid Engine for all internal file handling operations, such as status spooling and message logging. This parameter can be used in cases where the root user account does not have the corresponding file access permissions, for example, on a shared file system without global root read/write access.
As the admin_user parameter is set at installation time, you cannot change the parameter in a running system. You can manually change the admin_user parameter on a shutdown cluster. However, if access to the Grid Engine spooling area is interrupted, it will result in unpredictable behavior.
The admin_user parameter has no default value. Its value is defined during the master installation procedure.
The default_domain parameter is needed if your Grid Engine cluster covers hosts belonging to more than a single DNS domain. In this case, it can be used if your host name resolution yields both qualified and unqualified host names for the hosts in one of the DNS domains. The value of the default_domain parameter is appended to the unqualified host name to define a fully qualified host name.
The default_domain parameter will have no effect if the ignore_fqdn parameter is set to True.
As the default_domain parameter is set at installation time, you cannot change the parameter in a running system.
The default value for the default_domain parameter is None.
The ignore_fqdn parameter is used to ignore the fully qualified domain name component of host names. This parameter should be set if all hosts belonging to a Grid Engine cluster are part of a single DNS domain. The ignore_fqdn parameter is enabled if it is set to either True or 1. Enabling the ignore_fqdn parameter can solve problems with load reports that are caused by different host name resolutions across the cluster.
As the ignore_fqdn parameter is set at installation time, you cannot change the parameter in a running system.
The default value for the ignore_fqdn parameter is True.
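The combined effect of the ignore_fqdn and default_domain parameters on a host name can be sketched as follows. This is a minimal illustration of the behavior described above, not the Grid Engine implementation; the function name normalize_host and the example host and domain names are hypothetical.

```python
def normalize_host(name, ignore_fqdn=True, default_domain="none"):
    """Illustrate the documented effect of ignore_fqdn and default_domain."""
    name = name.strip()
    if ignore_fqdn:
        # The domain component is ignored, so default_domain has no effect.
        return name.split(".", 1)[0]
    if "." not in name and default_domain.lower() != "none":
        # An unqualified name gets the default domain appended.
        return f"{name}.{default_domain}"
    # Already qualified, or no default domain is configured.
    return name


if __name__ == "__main__":
    print(normalize_host("node1.example.com"))
    print(normalize_host("node1", ignore_fqdn=False, default_domain="example.com"))
```

With ignore_fqdn enabled, both node1 and node1.example.com map to the same name, which is why the setting suits clusters confined to a single DNS domain.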
The spooling_method parameter defines how sge_qmaster writes its configuration and the status information of a running cluster. The available spooling methods are berkeleydb and classic.
The spooling_lib parameter is the name of a shared library that contains the spooling method to be loaded at sge_qmaster initialization time. The extension that characterizes a shared library, such as .so, .sl, or .dylib, is not included in the spooling_lib value.
If the spooling_method parameter is set to berkeleydb during installation, the spooling_lib parameter is set to libspoolb. If the classic option is chosen as spooling_method during installation, the spooling_lib parameter is set to libspoolc.
Not all operating systems allow the dynamic loading of libraries. On such operating systems, one spooling method (berkeleydb by default) is compiled into the binaries and the spooling_lib parameter is ignored.
The spooling_params parameter defines parameters to the chosen spooling method. These parameters are required to initialize the spooling framework. For example, you can define parameters to open database files or to connect to a certain database server.
The spooling parameter value for the berkeleydb spooling method is [rpc_server:]database directory. For example, /sge_local/default/spool/qmaster/spooldb for spooling to a local file system or myhost:sge for spooling over a Berkeley DB RPC server.
The spooling parameter value for the classic spooling method is <common_dir>;<qmaster spool dir>. For example, /sge/default/common;/sge/default/spool/qmaster.
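The two spooling_params formats above can be taken apart with a short Python sketch. The function name parse_spooling_params and the returned dictionary keys are hypothetical. The rule that a value beginning with a slash is a plain directory with no RPC server is an assumption made for this example.

```python
def parse_spooling_params(method, params):
    """Split a spooling_params value according to the spooling method."""
    if method == "berkeleydb":
        # Documented format: [rpc_server:]database directory
        server, sep, directory = params.partition(":")
        # Assumption: an absolute path is never an RPC server name.
        if sep and not server.startswith("/"):
            return {"rpc_server": server, "database_dir": directory}
        return {"rpc_server": None, "database_dir": params}
    if method == "classic":
        # Documented format: <common_dir>;<qmaster spool dir>
        common_dir, sep, spool_dir = params.partition(";")
        if not sep:
            raise ValueError("expected '<common_dir>;<qmaster spool dir>'")
        return {"common_dir": common_dir, "qmaster_spool_dir": spool_dir}
    raise ValueError(f"unknown spooling method: {method}")


if __name__ == "__main__":
    print(parse_spooling_params("berkeleydb", "myhost:sge"))
    print(parse_spooling_params("classic", "/sge/default/common;/sge/default/spool/qmaster"))
```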
The binary_path parameter is the directory path where the Grid Engine binaries reside. It is used within the Grid Engine components to locate and start up other Grid Engine programs.
The path name given here is searched for binaries as well as any directory below with a directory name equal to the current operating system architecture. Therefore, /usr/SGE/bin will work for all architectures, if the corresponding binaries are located in subdirectories named aix43, cray, lx24-x86, hp11, irix65, tru64, sol-sparc, and so on.
The default location for the binary path is <sge_root>/bin.
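The lookup described above, which checks the binary path itself and the subdirectory named after the architecture, can be sketched as follows. The function name candidate_binary_paths is hypothetical, and the order in which the two locations are listed is an assumption. This guide does not state the order in which Grid Engine searches them.

```python
from pathlib import PurePosixPath


def candidate_binary_paths(binary_path, arch, program):
    """List the locations where a program may reside under binary_path."""
    base = PurePosixPath(binary_path)
    return [
        str(base / program),          # directly in binary_path
        str(base / arch / program),   # in the architecture subdirectory
    ]


if __name__ == "__main__":
    for path in candidate_binary_paths("/usr/SGE/bin", "lx24-x86", "sge_execd"):
        print(path)
```

Because the architecture name selects the subdirectory, a single binary_path value on a shared file system can serve hosts of several architectures.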
The qmaster_spool_dir parameter is the location where the master spool directory resides. sge_qmaster(8) and sge_shadowd(8) need to have access to this directory. The master spool directory, in particular the job_scripts directory and the messages log file, may become quite large depending on the size of the cluster and the number of jobs. Ensure that you allocate enough disk space and regularly clean up the log files, for example, with the help of a cron(8) job.
As the qmaster_spool_dir parameter is set at installation time, you cannot change the parameter in a running system.
The default location for the qmaster_spool_dir parameter is <sge_root>/<cell>/spool/qmaster.
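A periodic check of the spool directory size, of the kind a cron job might run, can be sketched in Python as follows. The function names and the idea of a size threshold are assumptions made for this example. The sketch only reports the size and does not delete or rotate any file.

```python
import os


def directory_size_bytes(path):
    """Sum the sizes of all regular files below path, skipping symlinks."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for fname in files:
            fpath = os.path.join(root, fname)
            if not os.path.islink(fpath):
                total += os.path.getsize(fpath)
    return total


def exceeds_limit(path, limit_bytes):
    """Return True if the directory tree uses more than limit_bytes."""
    return directory_size_bytes(path) > limit_bytes


if __name__ == "__main__":
    import sys
    # Usage: script.py <spool directory> <limit in bytes>
    spool_dir, limit = sys.argv[1], int(sys.argv[2])
    if exceeds_limit(spool_dir, limit):
        print(f"{spool_dir} exceeds {limit} bytes")
```

What to do when the limit is exceeded, such as archiving old messages files, is a site-specific decision and is left out of the sketch.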
The security_mode parameter defines the set of security features used by the Grid Engine cluster. The possible security mode settings are None, AFS, DCE, KERBEROS, and CSP.
Note: Modifying the security_mode parameter generally will require you to reinstall the Grid Engine cluster. For more information on editing the security_mode parameter, contact a Grid Engine support specialist.
The listener_threads parameter defines the number of listener threads. The default number is set during installation.
The worker_threads parameter defines the number of worker threads. The default number is set during installation.
The scheduler_threads parameter defines the number of scheduler threads. The allowed values for this parameter are 0 and 1. The default value (1) is set during installation. For more information, refer to the -kt and -at options in qconf(1).
The jvm_threads parameter defines the number of JVM threads. The allowed values are 0 and 1. The default value is set during installation.
Note: The bootstrap file parameters admin_user, default_domain, ignore_fqdn, and binary_path also affect the behavior of the execution daemon execd and require that execd be restarted at the same time as qmaster.
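The parameters described in this section can be read and sanity-checked with a short Python sketch. It assumes that the bootstrap file holds one parameter per line as a name followed by whitespace and a value, with comment lines starting with #. That layout, the sample values, and the function names parse_bootstrap and check_bootstrap are assumptions made for this example. The checks only cover value ranges stated in this section.

```python
# Hypothetical bootstrap content; the layout and values are assumed.
SAMPLE_BOOTSTRAP = """# sample bootstrap file
admin_user        sgeadmin
default_domain    none
ignore_fqdn       true
spooling_method   classic
spooling_lib      libspoolc
spooling_params   /sge/default/common;/sge/default/spool/qmaster
scheduler_threads 1
jvm_threads       0
"""


def parse_bootstrap(text):
    """Return a dict of parameter names to values (both as strings)."""
    params = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        parts = line.split(None, 1)
        params[parts[0]] = parts[1].strip() if len(parts) > 1 else ""
    return params


def check_bootstrap(params):
    """Return a list of problems found in the parsed parameters."""
    problems = []
    for name in ("scheduler_threads", "jvm_threads"):
        if name in params and params[name] not in ("0", "1"):
            problems.append(f"{name} must be 0 or 1, got {params[name]!r}")
    method = params.get("spooling_method")
    if method is not None and method not in ("berkeleydb", "classic"):
        problems.append(f"unknown spooling_method {method!r}")
    return problems


if __name__ == "__main__":
    parsed = parse_bootstrap(SAMPLE_BOOTSTRAP)
    print(parsed["spooling_method"], check_bootstrap(parsed))
```

The sketch reads the file only. Changes to the bootstrap file itself still take effect only after the affected daemons are restarted.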