Sun HPC ClusterTools 3.0 Administrator's Guide: With CRE

Nodes and Network Interfaces

Ordinarily, the only administrative action that you need to take with nodes is to enable them for use or, if you want to make a node temporarily unavailable, to disable it.

Other node-related administrative tasks--such as naming the nodes, identifying the master node, setting memory and process limits, and setting the node's partition attribute--are either handled by the CRE automatically or are controlled via partition-level attributes.

Network interface attributes require no administrative action; they are all controlled by the CRE. The only actions you might want to take with respect to network interfaces are to list them or to display their attribute values.

Node Commands

Table 6-5 lists the mpadmin commands that can be used at the Node level.

Table 6-5 Node-Level mpadmin Commands

Command                 Synopsis
current node            Set the context to the specified node for future commands.
create node             Create a new node with the given name.
delete [node]           Delete a node.
list                    List all the defined nodes.
show [node]             Show a node's attributes.
dump [node]             Show the attributes of the node and its network interfaces.
set attribute[=value]   Set the specified attribute of the current node.
unset attribute         Delete the specified attribute of the current node.
network                 Move to the network interface command level.
up                      Move to the next higher level (Top) command context.
top                     Move to the Top level command context.
echo ...                Print the rest of the line on the standard output.
help [command]          Show information about commands (?).

Node Attributes

Nodes are defined by many attributes, most of which are not accessible to mpadmin commands. Although you are not able to affect these attributes, it can be helpful to know of their existence and meaning; hence, they are listed and briefly described in Table 6-6.

Table 6-7 lists the Node-level attributes that can be set via mpadmin commands. However, enabled and max_total_procs are the only node attributes that you can safely modify. See "enabled" and "max_total_procs" for details.

Table 6-6 Node Attributes That Cannot Be Set by the System Administrator

Attribute         Kind      Description
cpu_idle          Value     Percent of time CPU is idle.
cpu_iowait        Value     Percent of time CPU spent in I/O wait state.
cpu_kernel        Value     Percent of time CPU spent in kernel state.
cpu_swap          Value     Percent of time CPU spent waiting for swap.
cpu_type          Value     Type of CPU, for example, sparc.
cpu_user          Value     Percent of time CPU spends running user's program.
load1             Value     Load average for the past minute.
load5             Value     Load average for the past five minutes.
load15            Value     Load average for the past 15 minutes.
manufacturer      Value     Manufacturer of the node, for example, Sun_Microsystems.
mem_free          Value     Node's available RAM (in Mbytes).
mem_total         Value     Node's total physical memory (in Mbytes).
ncpus             Value     Number of CPUs in the node.
offline           Boolean   Set automatically by the system if the tm.spmd daemon on the node stops running or is unresponsive; if set, prevents jobs from being spawned on the node.
os_arch_kernel    Value     Node's kernel architecture (same as output from arch -k, for example, sun4u).
os_name           Value     Name of the operating system running on the node, for example, SunOS.
os_release        Value     Operating system's release number, for example, 5.5.1.
os_release_maj    Value     Operating system's major release number, for example, 5.
os_release_min    Value     Operating system's minor release number, for example, 5 or 6.
os_version        Value     Operating system's version, for example, GENERIC.
serial_number     Value     Hardware serial number or host id.
swap_free         Value     Node's available swap space (in Mbytes).
swap_total        Value     Node's total swap space (in Mbytes).
update_time       Value     When this information was last updated.

Table 6-7 Node Attributes That Can Be Set by the System Administrator

Attribute         Kind      Description
enabled           Boolean   Set if the node is enabled, that is, if it is ready to accept jobs.
master            Boolean   Specify node on which the master daemons are running as an argument to mprun.
max_locked_mem    Value     Maximum amount of shared memory allowed to be locked down by Sun MPI processes (in Kbytes).
max_total_procs   Value     Maximum number of Sun HPC processes per node.
min_unlocked_mem  Value     Minimum amount of shared memory not to be locked down by Sun MPI processes (in Kbytes).
name              Value     Name of the node; this is predefined and must not be set via mpadmin.
partition         Value     Partition of which node is a member.
shmem_minfree     Value     Fraction of swap space kept free for non-MPI use.

enabled

The enabled attribute is set by default when the CRE daemons start up on a node. Unsetting it prevents new jobs from being spawned on the node.

A partition can list a node that is not enabled as a member. However, jobs will execute on that partition as if that node were not a member.
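The effect described above can be pictured with a small Python sketch. It is only an illustration of the rule, not how the CRE is implemented; the function name usable_nodes and the dictionary layout are hypothetical.

```python
def usable_nodes(partition_members, enabled):
    """Return the members of a partition that jobs can actually run on.

    partition_members: list of node names listed in the partition.
    enabled: dict mapping node name -> value of its enabled attribute.
    """
    # Assumption: a node missing from the dict is treated as not enabled.
    return [node for node in partition_members if enabled.get(node, False)]


members = ["node0", "node1", "node2"]
enabled = {"node0": True, "node1": False, "node2": True}
print(usable_nodes(members, enabled))  # ['node0', 'node2']
```

Here node1 is still listed as a member of the partition, but jobs behave as if the partition contained only node0 and node2.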

master


Note -

You must not change this node attribute. The CRE automatically sets it to the hostname of the node on which the master CRE daemons are running. This happens whenever the CRE daemons start.


max_locked_mem and min_unlocked_mem


Note -

You should not change these node attributes. They are described here so that you will be able to interpret their values when node attributes are displayed via the dump or show commands.


The max_locked_mem and min_unlocked_mem attributes limit the amount of shared memory available to be locked down for use by Sun MPI processes. Locking down shared memory guarantees maximum speed for Sun MPI processes by eliminating delays caused by swapping memory to disk. However, locking physical memory can have undesirable side effects because it prevents that memory from being used by other processes on the node.

Solaris provides two related tunable kernel parameters: tune_t_minasmem and pages_pp_maximum.

The CRE parameters impose limits only on MPI programs, while the kernel parameters limit all processes. Also, the kernel parameter units are pages rather than Kbytes. Refer to your Solaris documentation for more information about tune_t_minasmem and pages_pp_maximum.
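Because the CRE attributes are expressed in Kbytes and the kernel parameters in pages, comparing the two requires a unit conversion. The Python sketch below shows the arithmetic. The 8192-byte default page size is an assumption (typical of sun4u systems); check the actual value on a node with the pagesize command. The function names are hypothetical.

```python
def kbytes_to_pages(kbytes, page_size_bytes=8192):
    """Convert a size in Kbytes to whole pages, rounding up."""
    if kbytes < 0 or page_size_bytes <= 0:
        raise ValueError("kbytes must be >= 0 and page size must be > 0")
    total_bytes = kbytes * 1024
    # Round up so that a partially used page counts as a whole page.
    return (total_bytes + page_size_bytes - 1) // page_size_bytes


def pages_to_kbytes(pages, page_size_bytes=8192):
    """Convert a page count to Kbytes."""
    return pages * page_size_bytes // 1024


print(kbytes_to_pages(65536))  # 8192 (with the assumed 8-Kbyte page size)
print(pages_to_kbytes(8192))   # 65536
```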

max_total_procs

You limit the number of mprun processes allowed to run concurrently on a node by setting this attribute to an integer.

[node0] P(part0):: set max_total_procs=10
[node0] P(part0)::

By default, max_total_procs is unset, in which case the CRE does not impose any limit on the number of processes allowed on a node.
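The limit can be modeled as a simple admission check, sketched here in Python. This is an illustration only: the text does not say exactly how the CRE counts processes or when it applies the check, and the function name may_start is hypothetical. An unset attribute is represented by None.

```python
def may_start(running, requested, max_total_procs=None):
    """Return True if `requested` more processes fit on the node.

    running: processes already counted against the node.
    max_total_procs: the node's limit, or None if the attribute is unset.
    """
    if max_total_procs is None:
        # Attribute unset: no limit is imposed.
        return True
    return running + requested <= max_total_procs


print(may_start(8, 2, max_total_procs=10))  # True
print(may_start(8, 3, max_total_procs=10))  # False
print(may_start(500, 100))                  # True
```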

name

A node's name is predefined by the hpc.conf file. You must not change it by setting this attribute.

partition


Note -

There is no need to set this attribute. The CRE sets it automatically if the node is included in any partition configuration(s). See "Creating Partitions" for additional details.


A node can belong to multiple partitions, but only one of those partitions can be enabled at a time. No matter how many partitions a node belongs to, the partition attribute shows only one partition name--that name is always the name of the enabled partition, if one exists for that node.
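The rule that a node may be in only one enabled partition at a time can be expressed as a consistency check over a set of partition definitions. The Python sketch below is hypothetical: the dictionary layout and the function name are illustrative and do not reflect how the CRE stores or validates partitions.

```python
from collections import defaultdict


def nodes_in_multiple_enabled(partitions):
    """Return {node: [partition names]} for nodes that appear in more
    than one enabled partition, which the rule above does not allow."""
    seen = defaultdict(list)
    for name, info in partitions.items():
        if info["enabled"]:
            for node in info["nodes"]:
                seen[node].append(name)
    return {node: names for node, names in seen.items() if len(names) > 1}


partitions = {
    "part0": {"enabled": True, "nodes": ["node0", "node1"]},
    "part1": {"enabled": False, "nodes": ["node1", "node2"]},
}
print(nodes_in_multiple_enabled(partitions))  # {}

partitions["part1"]["enabled"] = True
print(nodes_in_multiple_enabled(partitions))  # {'node1': ['part0', 'part1']}
```

In the first case node1 belongs to two partitions, but only part0 is enabled, so there is no conflict.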

shmem_minfree


Note -

You should not change this node attribute. It is described here so that you will be able to interpret its value when node attributes are displayed via the dump or show commands.


The shmem_minfree attribute reserves some portion of the /tmp file system for non-MPI use.

For example, if /tmp is 1 Gbyte and shmem_minfree is set to 0.2, any time free space on /tmp drops below 200 Mbytes (1 Gbyte * 0.2), programs using the MPI shared memory protocol will not be allowed to run.

[node0] N(node1):: set shmem_minfree=0.2

shmem_minfree must be set to a value between 0.0 and 1.0. When shmem_minfree is unset, it defaults to 0.1.

This attribute can be set on both nodes and partitions. If both are set to different values, the node attribute overrides the partition attribute.
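The threshold arithmetic and the node-over-partition precedence can be sketched in Python as follows. This is a model of the rules stated above, not the CRE's implementation; the function names are hypothetical, an unset attribute is represented by None, and 1 Gbyte is taken as 1000 Mbytes to match the example figures.

```python
DEFAULT_SHMEM_MINFREE = 0.1  # value used when the attribute is unset


def effective_minfree(node_value=None, partition_value=None):
    """Pick the value in force: node first, then partition, then default."""
    for value in (node_value, partition_value):
        if value is not None:
            if not 0.0 <= value <= 1.0:
                raise ValueError("shmem_minfree must be between 0.0 and 1.0")
            return value
    return DEFAULT_SHMEM_MINFREE


def shared_memory_allowed(tmp_total_mb, tmp_free_mb, minfree):
    """True while free space on /tmp has not dropped below the reserve."""
    return tmp_free_mb >= tmp_total_mb * minfree


minfree = effective_minfree(node_value=0.2, partition_value=0.5)
print(minfree)                                    # 0.2
print(shared_memory_allowed(1000, 250, minfree))  # True
print(shared_memory_allowed(1000, 150, minfree))  # False
```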

Deleting Nodes

If you permanently remove a node from the Sun HPC cluster, you should then delete the corresponding node object from the CRE resource database.

Recommendations

Before deleting a node, you should first

Using the delete Command

To delete a node, use the delete command within the context of the node you want to delete.

[node0] N(node3):: delete
[node0] Node::