Sun N1 Grid Engine 6.1 Administration Guide

Configuring Hosts

N1 Grid Engine 6.1 software maintains object lists for all types of hosts except the master host. The administration host and submit host object lists indicate whether a host has administrative or submit permission. Execution host objects include additional parameters, among them the load information reported by the sge_execd daemon running on the host and the load parameter scaling factors defined by the administrator.

You can configure host objects with QMON or from the command line.

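The QMON Host Configuration dialog box has four tabs:

Administration Host tab. See Configuring Administration Hosts With QMON.

Submit Host tab. See Configuring Submit Hosts With QMON.

Host Groups tab. See Configuring Host Groups With QMON.

Execution Host tab. See Configuring Execution Hosts With QMON.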

The qconf command provides the command-line interface for managing host objects.

Configuring Execution Hosts With QMON

Before you configure an execution host, you must first install the software on the execution host as described in How to Install Execution Hosts in Sun N1 Grid Engine 6.1 Installation Guide.

To configure execution hosts, click the Host Configuration button on the QMON Main Control window, and then click the Execution Host tab. The Execution Host tab looks like the following figure:

Figure 1–1 Execution Host Tab

Dialog box titled Host Configuration. Shows the Execution Host tab with hosts and attributes, and the Add, Modify, Delete, Shutdown, Done, and Help buttons.


Note –

Administrative or submit commands are allowed from execution hosts only if the execution hosts are also declared to be administration or submit hosts. See Configuring Administration Hosts With QMON and Configuring Submit Hosts With QMON.


The Hosts list displays the execution hosts that are already defined.

The Load Scaling list displays the currently configured load-scaling factors for the selected execution host. See Load Parameters for information about load parameters.

The Access Attributes list displays access permissions. See Chapter 4, Managing User Access, for information about access permissions.

The Consumables/Fixed Attributes list displays resource availability for consumable and fixed resource attributes associated with the host. See Complex Resource Attributes for information about resource attributes.

The Reporting Variables list displays the variables that are written to the reporting file when a load report is received from an execution host. See Defining Reporting Variables for information about reporting variables.

The Usage Scaling list displays the current scaling factors for the individual usage metrics CPU, memory, and I/O on the selected execution host. The sge_execd daemon periodically reports resource usage for each currently running job. The scaling factors indicate the relative cost of resource usage on a particular machine for the user or project running a job. For instance, you could use these factors to compare the cost of a second of CPU time on a 400 MHz processor with the cost of a second on a 600 MHz processor. Metrics that are not displayed in the Usage Scaling list have a scaling factor of 1.
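The arithmetic behind usage scaling can be sketched in Python as follows. This is a minimal illustration, not the scheduler's code: it assumes that each factor simply multiplies the raw usage value, uses hypothetical metric keys `cpu`, `mem`, and `io`, and picks 1.5 for a 600 MHz host relative to a 400 MHz reference only because 600/400 = 1.5.

```python
from typing import Dict


def scale_usage(raw_usage: Dict[str, float],
                factors: Dict[str, float]) -> Dict[str, float]:
    """Multiply each usage metric by its host scaling factor.

    A metric with no configured factor is left unscaled (factor 1),
    matching the default described for the Usage Scaling list.
    """
    return {metric: value * factors.get(metric, 1.0)
            for metric, value in raw_usage.items()}


# Hypothetical factors: the 400 MHz host is the reference (all factors 1);
# the 600 MHz host charges CPU time at 600 / 400 = 1.5 times the rate.
reference_host: Dict[str, float] = {}
faster_host = {"cpu": 600 / 400}

job_usage = {"cpu": 100.0, "mem": 20.0, "io": 4.0}
print(scale_usage(job_usage, reference_host))  # {'cpu': 100.0, 'mem': 20.0, 'io': 4.0}
print(scale_usage(job_usage, faster_host))     # {'cpu': 150.0, 'mem': 20.0, 'io': 4.0}
```

Leaving unlisted metrics at 1 keeps the sketch consistent with the rule that metrics not shown in the list are unscaled.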

Adding or Modifying an Execution Host

To add or modify an execution host, click Add or Modify. The Add/Modify Exec Host dialog box appears.

Dialog box titled Add/Modify Exec Host. Shows the Scaling tab with the Load Scaling and Usage Scaling tables, and the Ok and Cancel buttons.

The Add/Modify Exec Host dialog box enables you to modify all attributes associated with an execution host. The name of an existing execution host is displayed in the Host field.

If you are adding a new execution host, type its name in the Host field.

Defining Scaling Factors

To define scaling factors, click the Scaling tab.

The Load column of the Load Scaling table lists all available load parameters, and the Scale Factor column lists the corresponding scaling factors. You can edit the Scale Factor column. Valid scaling factors are positive floating-point numbers in fixed-point notation or scientific notation.

The Usage column of the Usage Scaling table lists the usage metrics CPU, memory, and I/O, and the Scale Factor column lists the corresponding scaling factors. You can edit the Scale Factor column. Valid scaling factors are positive floating-point numbers in fixed-point notation or scientific notation.
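The validity rule for Scale Factor entries, and the effect of a load scaling factor, can be sketched in Python. The function names are hypothetical, the accepted spellings encode only the rule stated above (the product's own parser may accept a slightly different set), and applying a factor by multiplication is an assumption of this sketch.

```python
import math
import re
from typing import Dict

# Fixed-point ("2", "1.5", ".25") or scientific ("1e-3", "2.5E+2").
# No leading sign is accepted, because a factor must be positive.
_FACTOR_PATTERN = re.compile(r"(\d+(\.\d*)?|\.\d+)([eE][+-]?\d+)?")


def parse_scale_factor(text: str) -> float:
    """Parse a Scale Factor entry, raising ValueError if it is not valid."""
    text = text.strip()
    if not _FACTOR_PATTERN.fullmatch(text):
        raise ValueError(f"not a fixed-point or scientific number: {text!r}")
    value = float(text)
    # Reject zero, underflow to zero, and overflow to infinity.
    if value <= 0 or not math.isfinite(value):
        raise ValueError(f"scale factor must be a positive finite number: {text!r}")
    return value


def scale_load(load: Dict[str, float], factors: Dict[str, float]) -> Dict[str, float]:
    """Multiply reported load values by their factors (assumed); others stay unscaled."""
    return {name: value * factors.get(name, 1.0) for name, value in load.items()}
```

For example, `scale_load({"np_load_avg": 0.5}, {"np_load_avg": parse_scale_factor("1.5")})` yields `{"np_load_avg": 0.75}`, so the host appears 50 percent more loaded than it reports.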

Defining Resource Attributes

To define the resource attributes to associate with the host, click the Consumables/Fixed Attributes tab.

Dialog box titled Add/Modify Exec Host. Shows the Consumables/Fixed Attributes tab with a table of attributes, and the Ok and Cancel buttons.

The resource attributes associated with the host are listed in the Consumables/Fixed Attributes table.

Use the Complex Configuration dialog box if you need more information about the current complex configuration, or if you want to modify it. For details about complex resource attributes, see Complex Resource Attributes.

The Consumables/Fixed Attributes table lists all resource attributes for which a value is currently defined. To add attributes to the list, click either the Name or the Value column heading. The Attribute Selection dialog box appears, listing all resource attributes that are defined in the complex.

Figure 1–2 Attribute Selection Dialog Box

Dialog box titled Select an Item. Shows a list of available attributes and a selection text box, and the OK, Cancel, and Help buttons.

To add an attribute to the Consumables/Fixed Attributes table, select the attribute, and then click OK.

To modify an attribute value, double-click a Value field, and then type a value.

To delete an attribute, select the attribute, and then press Control-D or click mouse button 3. Click OK to confirm that you want to delete the attribute.
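The add, modify, and delete operations above amount to editing a name-to-value map whose names must already be defined in the complex. The Python sketch below models that table. The class and method names are hypothetical, and rendering the table as a comma-separated `name=value` list (or `NONE` when empty) is an assumption based on how the complex_values host attribute is usually written in qconf output.

```python
from typing import Dict, Iterable


class HostAttributeTable:
    """Hypothetical model of the Consumables/Fixed Attributes table."""

    def __init__(self, defined_in_complex: Iterable[str]):
        # Only attributes defined in the complex can be added, just as the
        # Attribute Selection dialog box offers only those attributes.
        self._defined = set(defined_in_complex)
        self._values: Dict[str, str] = {}

    def set_value(self, name: str, value: str) -> None:
        """Add an attribute, or modify the value of an existing one."""
        if name not in self._defined:
            raise KeyError(f"{name!r} is not defined in the complex")
        self._values[name] = value

    def delete(self, name: str) -> None:
        """Remove an attribute; removing an absent attribute does nothing."""
        self._values.pop(name, None)

    def render(self) -> str:
        """Assumed complex_values-style rendering: sorted name=value pairs, or NONE."""
        if not self._values:
            return "NONE"
        return ",".join(f"{name}={value}" for name, value in sorted(self._values.items()))


table = HostAttributeTable(["slots", "h_vmem", "arch"])
table.set_value("slots", "4")
table.set_value("h_vmem", "8G")
print(table.render())  # h_vmem=8G,slots=4
```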

Defining Access Permissions

To define user access permissions to the execution host based on previously configured user access lists, click the User Access tab.

Dialog box titled Add/Modify Exec Host. Shows the User Access tab with user access lists, and the Ok and Cancel buttons.

To define project access permissions to the execution host based on previously configured projects, click the Project Access tab.

Dialog box titled Add/Modify Exec Host. Shows the Project Access tab with project access lists, and the Ok and Cancel buttons.

Defining Reporting Variables

To define reporting variables, click the Reporting Variables tab.

Dialog box titled Add/Modify Exec Host. Shows the Reporting Variables tab with variable lists, and the Ok and Cancel buttons.

The Available list displays all the variables that can be written to the reporting file when a load report is received from the execution host.

Select a reporting variable from the Available list, and then click the red right arrow to add the selected variable to the Selected list.

To remove a reporting variable from the Selected list, select the variable, and then click the red left arrow.

Deleting an Execution Host

To delete an execution host, click the Host Configuration button on the QMON Main Control window, and then click the Execution Host tab.

On the Execution Host tab, select the host that you want to delete, and then click Delete.

Shutting Down an Execution Host Daemon

To shut down an execution host daemon, click the Host Configuration button on the QMON Main Control window, and then click the Execution Host tab.

On the Execution Host tab, select a host, and then click Shutdown.

Configuring Execution Hosts From the Command Line

To configure execution hosts from the command line, use the following arguments for the qconf command:

Configuring Administration Hosts With QMON

On the QMON Main Control window, click the Host Configuration button. The Host Configuration dialog box appears, displaying the Administration Host tab. The Administration Host tab looks like the following figure:

Figure 1–3 Administration Host Tab

Dialog box titled Host Configuration. Shows the Administration Host tab with the Hosts list, and the Add, Modify, Delete, Shutdown, Done, and Help buttons.


Note –

The Administration Host tab is displayed by default when you click the Host Configuration button for the first time.


Use the Administration Host tab to configure hosts on which administrative commands are allowed. The Host list displays the hosts that already have administrative permission.

Adding an Administration Host

To add a new administration host, type its name in the Host field, and then click Add, or press the Return key.

Deleting an Administration Host

To delete an administration host from the list, select the host, and then click Delete.

Configuring Administration Hosts From the Command Line

To configure administration hosts from the command line, use the following arguments for the qconf command:

Configuring Submit Hosts With QMON

No administrative commands are allowed from submit hosts unless the hosts are also declared to be administration hosts. See Configuring Administration Hosts With QMON for more information.

To configure submit hosts, click the Host Configuration button on the QMON Main Control window, and then click the Submit Host tab. The Submit Host tab is shown in the following figure.

Figure 1–4 Submit Host Tab

Dialog box titled Host Configuration. Shows the Submit Host tab with the Host list, and the Add, Modify, Delete, Shutdown, Done, and Help buttons.

Use the Submit Host tab to declare the hosts from which jobs can be submitted, monitored, and controlled. The Host list displays the hosts that already have submit permission.

Adding a Submit Host

To add a submit host, type its name in the Host field, and then click Add, or press the Return key.

Deleting a Submit Host

To delete a submit host, select it, and then click Delete.

Configuring Submit Hosts From the Command Line

To configure submit hosts from the command line, use the following arguments for the qconf command:
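In qconf, `-as hostname` adds a submit host, `-ds hostname` deletes one, and `-ss` lists the current submit hosts; the administration host equivalents are `-ah`, `-dh`, and `-sh`. The Python sketch below keeps either list in step with a desired set of hosts by computing the add and delete commands to run. The function and constant names are hypothetical, and the sketch only builds command lines; running them, for example with subprocess.run, is left to the caller.

```python
from typing import Iterable, List, NamedTuple


class HostListOptions(NamedTuple):
    """qconf show/add/delete options for one kind of host list."""
    show: str
    add: str
    delete: str


ADMIN_HOSTS = HostListOptions(show="-sh", add="-ah", delete="-dh")
SUBMIT_HOSTS = HostListOptions(show="-ss", add="-as", delete="-ds")


def reconcile(current: Iterable[str], desired: Iterable[str],
              options: HostListOptions) -> List[List[str]]:
    """Return qconf command lines that turn `current` into `desired`."""
    current_set, desired_set = set(current), set(desired)
    commands = [["qconf", options.add, host]
                for host in sorted(desired_set - current_set)]
    commands += [["qconf", options.delete, host]
                 for host in sorted(current_set - desired_set)]
    return commands


# `current` would normally come from the output of: qconf -ss
print(reconcile(["fangorn", "balrog"], ["balrog", "pippin"], SUBMIT_HOSTS))
# [['qconf', '-as', 'pippin'], ['qconf', '-ds', 'fangorn']]
```

Computing set differences means hosts that are already in the right state produce no commands, so the same desired list can be applied repeatedly.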

Configuring Host Groups With QMON

Host groups enable you to use a single name to refer to multiple hosts. You can group similar hosts together in a host group. A host group can include other host groups as well as multiple individual hosts. Host groups that are members of another host group are subgroups of that host group.

For example, you might define a host group called @bigMachines that includes the following members:

@solaris64

@solaris32

fangorn

balrog

The initial @ sign indicates that the name is a host group. The host group @bigMachines includes all hosts that are members of the two subgroups @solaris64 and @solaris32. @bigMachines also includes two individual hosts, fangorn and balrog.
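The expansion rule described above can be sketched as a recursive walk in Python. The dictionary stands in for the configured host groups; the memberships given to @solaris64 and @solaris32 are hypothetical, and rejecting a group that contains itself is a defensive choice of this sketch rather than documented product behavior.

```python
from typing import Dict, List, Optional, Set


def resolve_hostgroup(group: str, groups: Dict[str, List[str]],
                      _path: Optional[Set[str]] = None) -> Set[str]:
    """Return every individual host that belongs to `group`.

    Members whose names begin with '@' are subgroups and are expanded
    recursively; any other member is an individual host.
    """
    path = set() if _path is None else _path
    if group in path:
        raise ValueError(f"host group {group} contains itself")
    path.add(group)
    hosts: Set[str] = set()
    for member in groups[group]:
        if member.startswith("@"):
            hosts |= resolve_hostgroup(member, groups, path)
        else:
            hosts.add(member)
    path.remove(group)  # allow the same subgroup to be reached by another route
    return hosts


# Hypothetical memberships for the two subgroups of @bigMachines.
groups = {
    "@bigMachines": ["@solaris64", "@solaris32", "fangorn", "balrog"],
    "@solaris64": ["eomer"],
    "@solaris32": ["nori"],
}
print(sorted(resolve_hostgroup("@bigMachines", groups)))
# ['balrog', 'eomer', 'fangorn', 'nori']
```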

On the QMON Main Control window, click the Host Configuration button. The Host Configuration dialog box appears.

Click the Host Groups tab. The Host Groups tab looks like the following figure.

Figure 1–5 Host Groups Tab

Dialog box titled Host Configuration. Shows the Host Groups tab with the Hostgroup and Members lists.

Use the Host Groups tab to configure host groups. The Hostgroup list displays the currently configured host groups. The Members list displays all the hosts that are members of the selected host group.

Adding or Modifying a Host Group

To add a host group, click Add. To modify a host group, click Modify. The Add/Modify Host Group dialog box appears.

Dialog box titled Add/Modify Host Group. Shows fields for defining host groups and their members, and the Ok and Cancel buttons.

If you are adding a new host group, type a host group name in the Hostgroup field. The host group name must begin with an @ sign.

If you are modifying an existing host group, the host group name is provided in the Hostgroup field.

To add a host to the host group that you are configuring, type the host name in the Host field, and then click the red arrow to add the name to the Members list. To add a host group as a subgroup, select a host group name from the Defined Host Groups list, and then click the red arrow to add the name to the Members list.

To remove a host or a host group from the Members list, select it, and then click the trash icon.

Click Ok to save your changes and close the dialog box. Click Cancel to close the dialog box without saving your changes.

Deleting a Host Group

To delete a host group, select it from the Hostgroup list, and then click Delete.

Configuring Host Groups From the Command Line

To configure host groups from the command line, use the following arguments for the qconf command:
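qconf can create a host group from a file with -Ahgrp and replace one with -Mhgrp; -shgrp shows a group and -shgrpl lists all groups. The Python sketch below writes such a file for the @bigMachines example and loads it. It assumes the two-line layout described in the hostgroup(5) man page, a group_name line followed by a hostlist line of blank-separated members (NONE when empty); the helper names are hypothetical.

```python
import subprocess
from typing import List


def render_hostgroup(name: str, members: List[str]) -> str:
    """Return a host group definition (layout assumed from hostgroup(5))."""
    if not name.startswith("@"):
        raise ValueError("a host group name must begin with '@'")
    hostlist = " ".join(members) if members else "NONE"
    return f"group_name {name}\nhostlist {hostlist}\n"


def add_hostgroup(name: str, members: List[str], path: str) -> None:
    """Write the definition to `path`, then load it with qconf -Ahgrp."""
    with open(path, "w", encoding="utf-8") as handle:
        handle.write(render_hostgroup(name, members))
    subprocess.run(["qconf", "-Ahgrp", path], check=True)


print(render_hostgroup("@bigMachines",
                       ["@solaris64", "@solaris32", "fangorn", "balrog"]), end="")
# group_name @bigMachines
# hostlist @solaris64 @solaris32 fangorn balrog
```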

Monitoring Execution Hosts With qhost

Use the qhost command to retrieve a quick overview of the execution host status:


% qhost

This command produces output that is similar to the following example:


Example 1–1 Sample qhost Output


HOSTNAME                ARCH         NCPU  LOAD  MEMTOT  MEMUSE  SWAPTO  SWAPUS
-------------------------------------------------------------------------------
global                  -               -     -       -       -       -       -
arwen                   aix43           1     -       -       -       -       -
baumbart                irix65          2  0.00    1.1G   91.5M  128.0M     0.0
boromir                 hp11            1     -  128.0M       -  256.0M       -
carc                    lx24-amd64      2  0.00    3.8G  989.8M    1.0G     0.0
denethor                aix51           1 4.54G       -       -       -       -
durin                   lx24-x86        1  0.37  123.1M   46.5M  213.6M   26.6M
eomer                   sol-sparc64     1  0.13  256.0M  248.0M  513.0M   93.0M
lolek                   tru64           1  0.02    1.0G  790.0M    1.0G    8.0K
mungo                   lx22-alpha      1  1.00  248.9M   78.8M  129.8M    2.5M
nori                    sol-x86         2  0.38 1023.0M  372.0M  512.0M   37.0M
pippin                  darwin          1  0.00  640.0M  264.0M     0.0     0.0
smeagol                 hp11            1  0.35  512.0M  425.0M    1.0G   95.0M

See the qhost(1) man page for a description of the output format and for more options.
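For scripting, the default qhost output shown in Example 1–1 can be parsed column by column. The Python sketch below is an illustration built from that sample, not a supported interface: it assumes the eight default columns, treats `-` (or any other non-numeric entry) as a value that was not reported, and assumes that the K, M, and G suffixes are powers of 1024. The function names are hypothetical, and options that change the qhost output format would require a different parser.

```python
from typing import Dict, List, Optional

# Assumed binary multipliers for qhost's K, M, and G suffixes.
_SUFFIX = {"K": 1024, "M": 1024 ** 2, "G": 1024 ** 3}


def parse_number(field: str) -> Optional[float]:
    """Return a plain number, or None for '-' or anything non-numeric."""
    try:
        return float(field)
    except ValueError:
        return None


def parse_size(field: str) -> Optional[float]:
    """Convert a size such as '989.8M' to bytes; '-' means not reported."""
    if field and field[-1] in _SUFFIX:
        number = parse_number(field[:-1])
        return None if number is None else number * _SUFFIX[field[-1]]
    return parse_number(field)


def parse_qhost(text: str) -> List[Dict[str, object]]:
    """Turn default qhost output into one dictionary per execution host."""
    hosts = []
    for line in text.splitlines():
        fields = line.split()
        # Skip the header, the dashed rule, and the 'global' pseudo-host.
        if len(fields) != 8 or fields[0] in ("HOSTNAME", "global"):
            continue
        name, arch, ncpu, load, memtot, memuse, swapto, swapus = fields
        hosts.append({
            "host": name,
            "arch": arch,
            "ncpu": parse_number(ncpu),
            "load": parse_number(load),
            "mem_total": parse_size(memtot),
            "mem_used": parse_size(memuse),
            "swap_total": parse_size(swapto),
            "swap_used": parse_size(swapus),
        })
    return hosts


def hosts_without_load(text: str) -> List[str]:
    """Names of hosts whose LOAD column holds no numeric value."""
    return [h["host"] for h in parse_qhost(text) if h["load"] is None]
```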

Invalid Host Names

The following names are invalid, reserved, or otherwise not allowed as host names:

global

template

all

default

unknown

none
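A script that registers hosts can reject these names before calling qconf. The Python sketch below is a minimal pre-check. The function name is hypothetical, and comparing names case-insensitively is a cautious assumption, since the list above gives only lowercase spellings.

```python
# Names that the list above marks as invalid or reserved for hosts.
RESERVED_HOST_NAMES = frozenset(
    {"global", "template", "all", "default", "unknown", "none"})


def is_usable_host_name(name: str) -> bool:
    """Return False for empty names and for the reserved names above."""
    candidate = name.strip()
    # Case-insensitive comparison is an assumption of this sketch.
    return bool(candidate) and candidate.lower() not in RESERVED_HOST_NAMES


print([n for n in ["fangorn", "global", "balrog", "None"] if not is_usable_host_name(n)])
# ['global', 'None']
```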

Killing Daemons From the Command Line

To kill grid engine system daemons from the command line, use one of the following commands:


% qconf -ke[j] {hostname,... | all}
% qconf -ks
% qconf -km

You must have manager or operator privileges to use these commands. See Chapter 4, Managing User Access, for more information about manager and operator privileges.

If you want to wait for any active jobs to finish before you run the shutdown procedure, use the qmod -dq command for each cluster queue, queue instance, or queue domain before you run the qconf sequence described earlier. For information about cluster queues, queue instances, and queue domains, see Configuring Queues.


% qmod -dq {cluster-queue | queue-instance | queue-domain}

The qmod -dq command prevents new jobs from being scheduled to the disabled queue instances. You should then wait until no jobs are running in the queue instances before you kill the daemons.
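The drain-then-shutdown sequence above can be sketched in Python as command builders plus a polling loop. The sketch only assembles argument lists for qmod -dq and for qconf -ke or -kej (the `j` form also kills running jobs); it does not run them. The running_jobs callable is a hypothetical hook that the caller must supply, for example a function that inspects qstat output, and the polling interval and timeout are arbitrary defaults.

```python
import time
from typing import Callable, List, Sequence


def drain_commands(queues: Sequence[str]) -> List[List[str]]:
    """Build one 'qmod -dq' command per queue so no new jobs start there."""
    return [["qmod", "-dq", queue] for queue in queues]


def shutdown_command(targets: Sequence[str], kill_jobs: bool = False) -> List[str]:
    """Build 'qconf -ke' (or '-kej') for a list of hosts, or for ['all']."""
    if not targets:
        raise ValueError("name at least one host, or pass ['all']")
    option = "-kej" if kill_jobs else "-ke"
    return ["qconf", option, ",".join(targets)]


def wait_until_idle(running_jobs: Callable[[], int],
                    poll_seconds: float = 30.0,
                    timeout_seconds: float = 3600.0) -> bool:
    """Poll the caller-supplied running_jobs() until it returns 0.

    Returns True once no jobs are running, or False if the timeout expires.
    """
    deadline = time.monotonic() + timeout_seconds
    while True:
        if running_jobs() == 0:
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_seconds)


# "all.q" is an example cluster queue name.
print(drain_commands(["all.q"]))                # [['qmod', '-dq', 'all.q']]
print(shutdown_command(["fangorn", "balrog"]))  # ['qconf', '-ke', 'fangorn,balrog']
```

Taking running_jobs as a parameter keeps the sketch independent of how the job count is obtained.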

Restarting Daemons From the Command Line

Log in as root on the machine on which you want to restart grid engine system daemons.

Type the following commands to run the startup scripts:


% sge-root/cell/common/sgemaster
% sge-root/cell/common/sgeexecd

These scripts look for the daemons that normally run on this host and then start them.