N1 Grid Engine 6.1 software maintains object lists for all types of hosts except for the master host. The lists of administration host objects and submit host objects indicate whether a host has administrative or submit permission. The execution host objects include other parameters. Among these parameters are the load information that is reported by the sge_execd running on the host, and the load parameter scaling factors that are defined by the administrator.
You can configure host objects with QMON or from the command line.
The QMON Host Configuration dialog box has four tabs:
Administration Host tab. See Figure 1–3.
Submit Host tab. See Figure 1–4.
Host Groups tab. See Figure 1–5.
Execution Host tab. See Figure 1–1.
The qconf command provides the command-line interface for managing host objects.
Before you configure an execution host, you must first install the software on the execution host as described in How to Install Execution Hosts in Sun N1 Grid Engine 6.1 Installation Guide.
To configure execution hosts, on the QMON Main Control window click the Host Configuration button, and then click the Execution Host tab. The Execution Host tab looks like the following figure:
Administrative or submit commands are allowed from execution hosts only if the execution hosts are also declared to be administration or submit hosts. See Configuring Administration Hosts With QMON and Configuring Submit Hosts With QMON.
The Hosts list displays the execution hosts that are already defined.
The Load Scaling list displays the currently configured load-scaling factors for the selected execution host. See Load Parameters for information about load parameters.
The Access Attributes list displays access permissions. See Chapter 4, Managing User Access for information about access permissions.
The Consumables/Fixed Attributes list displays resource availability for consumable and fixed resource attributes associated with the host. See Complex Resource Attributes for information about resource attributes.
The Reporting Variables list displays the variables that are written to the reporting file when a load report is received from an execution host. See Defining Reporting Variables for information about reporting variables.
The Usage Scaling list displays the current scaling factors for the individual usage metrics CPU, memory, and I/O for different machines. Resource usage is reported by sge_execd periodically for each currently running job. The scaling factors indicate the relative cost of resource usage on the particular machine for the user or project running a job. These factors could be used, for instance, to compare the cost of a second of CPU time on a 400 MHz processor to that of a 600 MHz CPU. Metrics that are not displayed in the Usage Scaling window have a scaling factor of 1.
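As a worked illustration of the scaling arithmetic, the following sketch multiplies a raw usage value by a scaling factor. It assumes that scaled usage is simply the raw usage times the factor. The helper name scaled_usage is hypothetical, and the factor 1.5 (the ratio of the 600 MHz clock to the 400 MHz clock) is an illustrative choice, not a recommended setting.

```shell
# Hypothetical helper: scaled usage = raw usage * scaling factor (assumed model).
scaled_usage() {
    awk -v raw="$1" -v factor="$2" 'BEGIN { printf "%g\n", raw * factor }'
}

# 100 CPU seconds on the faster machine, charged with a factor of 1.5.
scaled_usage 100 1.5    # prints 150
```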
To add or modify an execution host, click Add or Modify. The Add/Modify Exec Host dialog box appears.
The Add/Modify Exec Host dialog box enables you to modify all attributes associated with an execution host. The name of an existing execution host is displayed in the Host field.
If you are adding a new execution host, type its name in the Host field.
To define scaling factors, click the Scaling tab.
The Load column of the Load Scaling table lists all available load parameters, and the Scale Factor column lists the corresponding definitions of the scaling. You can edit the Scale Factor column. Valid scaling factors are positive floating-point numbers in fixed-point notation or scientific notation.
The Usage column of the Usage Scaling table lists the current scaling factors for the usage metrics CPU, memory, and I/O. The Scale Factor column lists the corresponding definitions of the scaling. You can edit the Scale Factor column. Valid scaling factors are positive floating-point numbers in fixed-point notation or scientific notation.
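The rule for valid scaling factors can be sketched as a small check that you might run before you type a value into the Scale Factor column. The function name is_valid_scale_factor is hypothetical, and the accepted syntax is an assumption based on the description above: a positive number in fixed-point notation such as 0.5, or in scientific notation such as 2.5e-1.

```shell
# Hypothetical check: succeeds only for positive numbers written in
# fixed-point (0.5, 2, .25) or scientific (1e-3, 2.5E+2) notation.
is_valid_scale_factor() {
    printf '%s\n' "$1" | awk '
        /^[+]?([0-9]+\.?[0-9]*|\.[0-9]+)([eE][+-]?[0-9]+)?$/ {
            # Well-formed number: accept it only if it is greater than zero.
            exit !(($0 + 0) > 0)
        }
        { exit 1 }   # Anything else is rejected.
    '
}

# Example: report whether each candidate value would be accepted.
for value in 1.5 2.5e-1 0 -3 abc; do
    if is_valid_scale_factor "$value"; then
        echo "$value: valid"
    else
        echo "$value: invalid"
    fi
done
```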
To define the resource attributes to associate with the host, click the Consumables/Fixed Attributes tab.
The resource attributes associated with the host are listed in the Consumables/Fixed Attributes table.
Use the Complex Configuration dialog box if you need more information about the current complex configuration, or if you want to modify it. For details about complex resource attributes, see Complex Resource Attributes.
The Consumables/Fixed Attributes table lists all resource attributes for which a value is currently defined. You can enhance the list by clicking either the Name or the Value column name. The Attribute Selection dialog box appears, which includes all resource attributes that are defined in the complex.
To add an attribute to the Consumables/Fixed Attributes table, select the attribute, and then click OK.
To modify an attribute value, double-click a Value field, and then type a value.
To delete an attribute, select the attribute, and then press Control-D or click mouse button 3. Click OK to confirm that you want to delete the attribute.
To define user access permissions to the execution host based on previously configured user access lists, click the User Access tab.
To define project access permissions to the execution host based on previously configured projects, click the Project Access tab.
To define reporting variables, click the Reporting Variables tab.
The Available list displays all the variables that can be written to the reporting file when a load report is received from the execution host.
Select a reporting variable from the Available list, and then click the red right arrow to add the selected variable to the Selected list.
To remove a reporting variable from the Selected list, select the variable, and then click the left red arrow.
To delete an execution host, on the QMON Main Control window click the Host Configuration button, and then click the Execution Host tab.
In the Execution Host dialog box, select the host that you want to delete, and then click Delete.
To shut down an execution host daemon, on the QMON Main Control window click the Host Configuration button, and then click the Execution Host tab.
In the Execution Host dialog box, select a host, and then click Shutdown.
To configure execution hosts from the command line, use the following arguments for the qconf command:
The -ae option (add execution host) displays an editor containing an execution host configuration template. The editor is either the default vi editor or an editor corresponding to the EDITOR environment variable. If you specify exec-host, which is the name of an already configured execution host, the configuration of this execution host is used as a template. The execution host is configured by changing the template and saving to disk. See the host_conf(5) man page for a detailed description of the template entries to be changed.
The -de option (delete execution host) deletes the specified host from the list of execution hosts. All entries in the execution host configuration are lost.
The -me option (modify execution host) displays an editor containing the configuration of the specified execution host as a template. The editor is either the default vi editor or an editor corresponding to the EDITOR environment variable. The execution host configuration is modified by changing the template and saving to disk. See the host_conf(5) man page for a detailed description of the template entries to be changed.
The -Me option (modify execution host) uses the content of filename as execution host configuration template. The configuration in the specified file must refer to an existing execution host. The configuration of this execution host is replaced by the file content. This qconf option is useful for changing the configuration of offline execution hosts, for example, in cron jobs, as the -Me option requires no manual interaction.
The -se option (show execution host) shows the configuration of the specified execution host as defined in host_conf.
The -sel option (show execution host list) displays a list of hosts that are configured as execution hosts.
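The noninteractive -Me workflow can be sketched as follows. The field names are assumed to follow the layout that the host_conf(5) man page describes, so verify them against the output of qconf -se on your system. The host name fangorn, the scaling values, and the function names write_exec_host_conf and apply_exec_host_conf are illustrative.

```shell
# Write an execution host configuration to a file.
# Assumption: field names as in host_conf(5); all values are examples only.
write_exec_host_conf() {
    host=$1
    file=$2
    cat > "$file" <<EOF
hostname              $host
load_scaling          load_avg=0.5
complex_values        NONE
user_lists            NONE
xuser_lists           NONE
projects              NONE
xprojects             NONE
usage_scaling         cpu=1.5
report_variables      NONE
EOF
}

# Replace the configuration of an existing execution host from the file.
# No editor is started, so this call can run from a cron job.
apply_exec_host_conf() {
    qconf -Me "$1"
}

# Typical use (commented out because it changes the cluster configuration):
#   write_exec_host_conf fangorn /tmp/fangorn.conf
#   apply_exec_host_conf /tmp/fangorn.conf
```

Keeping the file generation separate from the qconf call lets you review or version the file before it replaces the live configuration.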
On the QMON Main Control window, click the Host Configuration button. The Host Configuration dialog box appears, displaying the Administration Host tab. The Administration Host tab looks like the following figure:
The Administration Host tab is displayed by default when you click the Host Configuration button for the first time.
Use the Administration Host tab to configure hosts on which administrative commands are allowed. The Host list displays the hosts that already have administrative permission.
To add a new administration host, type its name in the Host field, and then click Add, or press the Return key.
To delete an administration host from the list, select the host, and then click Delete.
To configure administration hosts from the command line, use the following arguments for the qconf command:
The -ah option (add administration host) adds the specified host to the list of administration hosts.
The -dh option (delete administration host) deletes the specified host from the list of administration hosts.
The -sh option (show administration hosts) displays a list of all currently configured administration hosts.
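A minimal sketch of adding several administration hosts in one call follows. It assumes that -ah accepts a comma-separated host list, as the qconf(1) man page describes. The function name add_admin_hosts and the host names are illustrative.

```shell
# Hypothetical wrapper: add every host given as an argument to the
# administration host list with a single qconf call.
add_admin_hosts() {
    [ $# -gt 0 ] || return 1
    # Join the arguments with commas: "fangorn balrog" -> "fangorn,balrog".
    list=$(IFS=,; printf '%s' "$*")
    qconf -ah "$list"
}

# Typical use (commented out because it changes the cluster configuration):
#   add_admin_hosts fangorn balrog
#   qconf -sh      # confirm that both hosts are now listed
```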
No administrative commands are allowed from submit hosts unless the hosts are also declared to be administration hosts. See Configuring Administration Hosts With QMON for more information.
To configure submit hosts, on the QMON Main Control window click the Host Configuration button, and then click the Submit Host tab. The Submit Host tab is shown in the following figure.
Use the Submit Host tab to declare the hosts from which jobs can be submitted, monitored, and controlled. The Host list displays the hosts that already have submit permission.
To add a submit host, type its name in the Host field, and then click Add, or press the Return key.
To delete a submit host, select it, and then click Delete.
To configure submit hosts from the command line, use the following arguments for the qconf command:
The -as option (add submit host) adds the specified host to the list of submit hosts.
The -ds option (delete submit host) deletes the specified host from the list of submit hosts.
The -ss option (show submit hosts) displays a list of the names of all currently configured submit hosts.
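Before you submit jobs from a machine, you might check whether it already has submit permission. The following sketch assumes that qconf -ss prints one host name per line and that the name you pass matches the form in which the host was registered (for example, fully qualified or not). The function name is_submit_host is hypothetical.

```shell
# Hypothetical check: succeeds if the given name appears in the submit host list.
is_submit_host() {
    # -F: fixed string, -x: match the whole line, -q: no output.
    qconf -ss | grep -Fxq -- "$1"
}

# Typical use (commented out because the second command changes the cluster
# configuration):
#   is_submit_host fangorn || qconf -as fangorn
```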
Host groups enable you to use a single name to refer to multiple hosts. You can group similar hosts together in a host group. A host group can include other host groups as well as multiple individual hosts. Host groups that are members of another host group are subgroups of that host group.
For example, you might define a host group called @bigMachines that includes the following members:
@solaris64
@solaris32
fangorn
balrog
The initial @ sign indicates that the name is a host group. The host group @bigMachines includes all hosts that are members of the two subgroups @solaris64 and @solaris32. @bigMachines also includes two individual hosts, fangorn and balrog.
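Expressed as a configuration file, the example group might look like the following sketch. The two field names, group_name and hostlist, are assumed from the hostgroup(5) man page. The function name write_hostgroup_conf and the output path are illustrative.

```shell
# Write the @bigMachines example as a host group configuration file.
# Assumption: hostgroup(5) layout with the fields group_name and hostlist.
write_hostgroup_conf() {
    cat > "$1" <<'EOF'
group_name @bigMachines
hostlist @solaris64 @solaris32 fangorn balrog
EOF
}

# Typical use:
#   write_hostgroup_conf /tmp/bigMachines.hgrp
```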
On the QMON Main Control window, click the Host Configuration button. The Host Configuration dialog box appears.
Click the Host Groups tab. The Host Groups tab looks like the following figure.
Use the Host Groups tab to configure host groups. The Hostgroup list displays the currently configured host groups. The Members list displays all the hosts that are members of the selected host group.
To add a host group, click Add. To modify a host group, click Modify. The Add/Modify Host Group dialog box appears.
If you are adding a new host group, type a host group name in the Hostgroup field. The host group name must begin with an @ sign.
If you are modifying an existing host group, the host group name is provided in the Hostgroup field.
To add a host to the host group that you are configuring, type the host name in the Host field, and then click the red arrow to add the name to the Members list. To add a host group as a subgroup, select a host group name from the Defined Host Groups list, and then click the red arrow to add the name to the Members list.
To remove a host or a host group from the Members list, select it, and then click the trash icon.
Click OK to save your changes and close the dialog box. Click Cancel to close the dialog box without saving your changes.
To delete a host group, select it from the Hostgroup list, and then click Delete.
To configure host groups from the command line, use the following arguments for the qconf command:
qconf -ahgrp [host-group-name]
The -ahgrp option (add host group) adds a new host group to the list of host groups. See the hostgroup(5) man page for a detailed description of the configuration format.
The -Ahgrp option (add host group from file) adds the host group configuration that is defined in filename to the list of host groups. No editor is displayed, so the option is suitable for use in scripts. See the hostgroup(5) man page for a detailed description of the configuration format.
The -dhgrp option (delete host group) deletes the specified host group from the list of host groups. All entries in the host group configuration are lost.
The -mhgrp option (modify host group) displays an editor containing the configuration of the specified host group as a template. The editor is either the default vi editor or an editor corresponding to the EDITOR environment variable. The host group configuration is modified by changing the template and saving to disk.
The -Mhgrp option (modify host group from file) uses the content of filename as host group configuration template. The configuration in the specified file must refer to an existing host group. The configuration of this host group is replaced by the file content.
The -shgrp option (show host group) shows the configuration of the specified host group.
qconf -shgrp_tree host-group-name
The -shgrp_tree option (show host group as tree) shows the configuration of the specified host group and its sub-hostgroups as a tree.
qconf -shgrp_resolved host-group-name
The -shgrp_resolved option (show host group with resolved host list) shows the configuration of the specified host group with a resolved host list.
The -shgrpl option (show host group list) displays a list of all host groups.
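The file-based options can be combined into one scripted step that adds a host group if it does not exist yet and replaces it otherwise. The sketch assumes that qconf -shgrpl prints one group name per line and that -Ahgrp and -Mhgrp read the file without starting an editor, as the qconf(1) man page describes. The function name apply_hostgroup_conf is hypothetical.

```shell
# Hypothetical helper: add the host group from a file, or replace it if a
# group with that name is already configured.
apply_hostgroup_conf() {
    group=$1   # for example @bigMachines
    file=$2    # file in hostgroup(5) format that defines the same group
    if qconf -shgrpl | grep -Fxq -- "$group"; then
        qconf -Mhgrp "$file"
    else
        qconf -Ahgrp "$file"
    fi
}

# Typical use (commented out because it changes the cluster configuration):
#   apply_hostgroup_conf @bigMachines /tmp/bigMachines.hgrp
#   qconf -shgrp_resolved @bigMachines
```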
Use the qhost command to retrieve a quick overview of the execution host status:
% qhost
This command produces output that is similar to the following example:
HOSTNAME   ARCH         NCPU  LOAD   MEMTOT   MEMUSE   SWAPTO   SWAPUS
-------------------------------------------------------------------------------
global     -            -     -      -        -        -        -
arwen      aix43        1     -      -        -        -        -
baumbart   irix65       2     0.00   1.1G     91.5M    128.0M   0.0
boromir    hp11         1     -      128.0M   -        256.0M   -
carc       lx24-amd64   2     0.00   3.8G     989.8M   1.0G     0.0
denethor   aix51        1     4.54G  -        -        -        -
durin      lx24-x86     1     0.37   123.1M   46.5M    213.6M   26.6M
eomer      sol-sparc64  1     0.13   256.0M   248.0M   513.0M   93.0M
lolek      tru64        1     0.02   1.0G     790.0M   1.0G     8.0K
mungo      lx22-alpha   1     1.00   248.9M   78.8M    129.8M   2.5M
nori       sol-x86      2     0.38   1023.0M  372.0M   512.0M   37.0M
pippin     darwin       1     0.00   640.0M   264.0M   0.0      0.0
smeagol    hp11         1     0.35   512.0M   425.0M   1.0G     95.0M
See the qhost(1) man page for a description of the output format and for more options.
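The qhost output is easy to filter in a script. The following sketch lists the hosts whose LOAD column shows a hyphen. Treating a hyphen as "no load value reported" is an assumption about the output format, and the function name hosts_without_load is hypothetical.

```shell
# Hypothetical filter: read qhost output on standard input and print the
# names of hosts that show "-" in the LOAD column (the fourth column).
hosts_without_load() {
    # NR > 2 skips the header line and the dashed separator line.
    # The pseudo host "global" is not a real machine, so it is skipped too.
    awk 'NR > 2 && $1 != "global" && $4 == "-" { print $1 }'
}

# Typical use:
#   qhost | hosts_without_load
```

Applied to the example output above, the filter would print arwen and boromir.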
The following is a list of host names that are invalid, reserved, or otherwise not allowed to be used:
global
template
all
default
unknown
none
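A provisioning script might reject these names before it calls qconf. The following sketch assumes an exact, case-sensitive comparison against the list above. The function name is_reserved_host_name is hypothetical.

```shell
# Hypothetical check: succeeds if the name is one of the reserved host names.
is_reserved_host_name() {
    case $1 in
        global|template|all|default|unknown|none) return 0 ;;
        *) return 1 ;;
    esac
}

# Example: refuse to continue with a reserved name.
name=template
if is_reserved_host_name "$name"; then
    echo "Host name '$name' is reserved and cannot be used."
fi
```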
To kill grid engine system daemons from the command line, use one of the following commands:
% qconf -ke[j] {hostname,... | all}
% qconf -ks
% qconf -km
You must have manager or operator privileges to use these commands. See Chapter 4, Managing User Access for more information about manager and operator privileges.
The qconf -ke command shuts down the execution daemons. However, it does not cancel active jobs. Jobs that finish while no sge_execd is running on a system are not reported to sge_qmaster until sge_execd is restarted. The job reports are not lost, however.
The qconf -kej command kills all currently active jobs and brings down all execution daemons.
Use a comma-separated list of the execution hosts you want to shut down, or specify all to shut down all execution hosts in the cluster.
The qconf -ks command shuts down the scheduler sge_schedd.
The qconf -km command forces the sge_qmaster process to terminate.
If you want to wait for any active jobs to finish before you run the shutdown procedure, use the qmod -dq command for each cluster queue, queue instance, or queue domain before you run the qconf sequence described earlier. For information about cluster queues, queue instances, and queue domains, see Configuring Queues.
% qmod -dq {cluster-queue | queue-instance | queue-domain}
The qmod -dq command prevents new jobs from being scheduled to the disabled queue instances. You should then wait until no jobs are running in the queue instances before you kill the daemons.
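The drain-then-shutdown sequence can be sketched as a script. The wait step is an assumption: it polls qstat -s r -u '*' and treats empty output as "no running jobs", which covers the whole cluster rather than only the disabled queues. The function names drain_and_stop and wait_for_running_jobs, the POLL_SECONDS variable, and the queue name in the usage comment are illustrative.

```shell
# Assumption: empty output from qstat -s r -u '*' means no job is running.
wait_for_running_jobs() {
    while [ -n "$(qstat -s r -u '*')" ]; do
        sleep "${POLL_SECONDS:-30}"
    done
}

# Hypothetical helper: disable the given queues, wait for running jobs to
# finish, and then shut down the execution daemons.
drain_and_stop() {
    hosts=$1   # comma-separated execution hosts, or "all"
    shift      # the remaining arguments are queues to disable
    for queue in "$@"; do
        qmod -dq "$queue" || return 1
    done
    wait_for_running_jobs || return 1
    qconf -ke "$hosts"
}

# Typical use (commented out because it stops daemons):
#   drain_and_stop all all.q
```

Because the queues stay disabled after the daemons stop, you would need to enable them again once the hosts are back in service.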
Log in as root on the machine on which you want to restart grid engine system daemons.
Type the following commands to run the startup scripts:
% sge-root/cell/common/sgemaster
% sge-root/cell/common/sgeexecd
These scripts look for the daemons that normally run on this host and then start the corresponding ones.
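If the SGE_ROOT and SGE_CELL environment variables are set, the script paths can be derived from them. The following sketch assumes that sge-root corresponds to $SGE_ROOT and that cell corresponds to $SGE_CELL, with the cell name default used when SGE_CELL is unset. The function name startup_script is hypothetical.

```shell
# Hypothetical helper: print the full path of a startup script.
# $1 is the script name, either sgemaster or sgeexecd.
startup_script() {
    printf '%s/%s/common/%s\n' \
        "${SGE_ROOT:?SGE_ROOT is not set}" "${SGE_CELL:-default}" "$1"
}

# Typical use, as root on the host whose daemons you want to restart
# (commented out because it starts daemons):
#   "$(startup_script sgemaster)"
#   "$(startup_script sgeexecd)"
```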