Sun HPC ClusterTools 3.0 Administrator's Guide: With CRE

Chapter 6 mpadmin: Detailed Description

This chapter describes the CRE cluster administration interface, mpadmin.

mpadmin Syntax

The mpadmin command has six optional arguments, as follows:

mpadmin [-c command] [-f file-name] [-h] [-q] [-s cluster-name] [-V]

When you invoke mpadmin with the -c, -f, -h, or -V option, mpadmin performs the requested operation and then returns to the shell level. For command arguments, you can specify most of the subcommands that are available within the mpadmin interactive environment. See "Command-Line Options" for descriptions of the mpadmin command-line options.

When you invoke mpadmin with the -q or -s option or no option, mpadmin goes into the interactive mode, displaying the mpadmin prompt. In this mode, you can execute any number of mpadmin subcommands until you quit the interactive session. See "mpadmin Command Overview" for a description of the interactive mpadmin mode.
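The split between the two invocation modes can be sketched as a small Python function. This is only an illustration of the rule stated above, not part of mpadmin; the function name and the handling of mixed options (for example -s together with -c) are assumptions.

```python
# Options after which mpadmin performs one operation and returns to the shell,
# as listed in the text above.
ONE_SHOT_OPTIONS = {"-c", "-f", "-h", "-V"}


def invocation_mode(options):
    """Return "one-shot" or "interactive" for a list of option flags.

    `options` holds only the flags themselves (for example ["-s", "-q"]),
    not their arguments. Treating any one-shot flag as decisive when it is
    mixed with -q or -s is an assumption of this sketch.
    """
    if any(opt in ONE_SHOT_OPTIONS for opt in options):
        return "one-shot"
    # -q, -s, or no option at all leads to the interactive prompt.
    return "interactive"


print(invocation_mode(["-c"]))
print(invocation_mode(["-s", "-q"]))
print(invocation_mode([]))
```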


Note -

For the rest of this discussion, mpadmin subcommands will be referred to as mpadmin commands or simply as commands.


Command-Line Options

Table 6-1 provides summary definitions of the mpadmin command-line options. This section describes their use.

Table 6-1 mpadmin Options

Option            Description
-c command        Execute single specified command.
-f file-name      Take input from specified file.
-h                Display help/usage text.
-q                Suppress the display of a warning message when a non-root user attempts to use restricted command mode.
-s cluster-name   Connect to the specified Sun HPC cluster.
-V                Display mpadmin version information.

-c command - Single Command Option

Use the -c option when you want to execute a single mpadmin command and return automatically to the shell prompt. For example, the following use of mpadmin -c changes the location of the CRE log file to /home/wmitty/cre_messages:

# mpadmin -c set logfile="/home/wmitty/cre_messages"

Note -

Most commands that are available via the interactive interface can be invoked via the -c option. See "mpadmin Command Overview" for an overview of the mpadmin command set and a list of which commands can be used as arguments to the -c option.


-f file-name - Take Input From a File

Use the -f option to supply input to mpadmin from the file specified by the file-name argument.

-h - Display Help

The -h option displays help information about mpadmin.

-q - Suppress Warning Message

Use the -q option to suppress a warning message when a non-root user attempts to invoke a restricted command.

-s cluster-name - Connect to Specified Cluster

Use the -s option to connect to the cluster specified by the cluster-name argument.

-V - Version Display Option

Use the -V option to display the version of mpadmin.

mpadmin Objects, Attributes, and Contexts

Before examining the set of mpadmin commands further, it will be useful to understand three concepts that are central to the mpadmin interface: objects, attributes, and contexts.

mpadmin Objects and Attributes

From the perspective of mpadmin, a Sun HPC cluster consists of a system of objects, which include the cluster itself, nodes, partitions, and network interfaces.

Each type of object has a set of attributes whose values can be operated on via mpadmin commands. These attributes control various aspects of their respective objects, such as: whether a node is enabled or disabled (that is, whether it can be used or not), the names of partitions, and which nodes a partition contains.


Note -

The CRE sets most cluster object attributes to default values each time it boots up. With few exceptions, do not change these system-defined values.


mpadmin Contexts

mpadmin commands are organized into four contexts, which correspond to the four types of mpadmin objects. These contexts are illustrated in Figure 6-1 and summarized below.

Figure 6-1 The mpadmin Contexts


Except for Cluster, each context is nested in a higher context: Node within Cluster, Partition within Cluster, and Network within Node.

The mpadmin prompt uses one or more fields to indicate the current context. Table 6-2 shows the prompt format for each of the possible mpadmin contexts.

Table 6-2 mpadmin Prompt Formats

Prompt Format                                 Context
[cluster-name]::                              Current context = Cluster.
[cluster-name]Node::                          Current context = Node, but not a specific node.
[cluster-name]N(node-name)::                  Current context = a specific node.
[cluster-name]Partition::                     Current context = Partition, but not a specific partition.
[cluster-name]P(partition-name)::             Current context = a specific partition.
[cluster-name]N(node-name) Network::          Current context = Network Interface, but not a specific network interface.
[cluster-name]N(node-name) I(net-if-name)::   Current context = a specific network interface.


Note -

When the prompt indicates a specific network interface, it uses I as the abbreviation for Network Interface to avoid being confused with the Node abbreviation N.
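As a rough illustration of how the prompt encodes the context, the following Python sketch builds a prompt string from a context description. The function name and parameters are hypothetical, and the single space after the cluster name follows the session transcripts in this chapter (for example, [node0] P(part0)::) rather than anything mpadmin itself documents.

```python
def format_prompt(cluster, context, name=None, node=None):
    """Build an mpadmin-style prompt string.

    context: "cluster", "node", "partition", or "network".
    name:    a specific object name, or None for the general context.
    node:    the owning node; required for the network context.
    """
    prefix = f"[{cluster}]"
    if context == "cluster":
        return f"{prefix}::"
    if context == "node":
        return f"{prefix} N({name})::" if name else f"{prefix} Node::"
    if context == "partition":
        return f"{prefix} P({name})::" if name else f"{prefix} Partition::"
    if context == "network":
        if node is None:
            raise ValueError("the network context is nested within a node")
        # I(...) is used for a specific interface to avoid clashing with N.
        tail = f"I({name})" if name else "Network"
        return f"{prefix} N({node}) {tail}::"
    raise ValueError(f"unknown context: {context}")


print(format_prompt("node0", "cluster"))
print(format_prompt("node0", "partition", name="part0"))
print(format_prompt("node0", "network", node="node2"))
```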


mpadmin Command Overview

Types of mpadmin Commands

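mpadmin provides commands for performing the following types of operations: configuration control, attribute control, context navigation, and information retrieval, along with a few miscellaneous commands.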

Configuration Control

A Sun HPC cluster contains one or more named partitions. Each partition contains some number of specific nodes. Likewise, each node includes one or more network interfaces that it uses for internode communication.

The CRE automatically creates the cluster, node, and network interface objects based on the contents of the hpc.conf file. Partitions are the only kind of object that the system administrator is required to create and manage.

Use the delete command to remove partitions, but no other types of cluster objects. You remove nodes and network interfaces from a Sun HPC cluster by editing the hpc.conf file.

create

Usage:

:: create object-name

Available In:

Node, Partition, Network

The create command creates a new object with the name object-name and makes the new object the current context.

Note, partitions can only be created from within the Partition context. The following example creates the partition part0.

[node0] Partition:: create part0
[node0] P(part0)::

As the second line in the example shows, part0 becomes the new context.

delete

Usage:

:: delete [object-name]

Available In:

Node, Partition, Network

The delete command deletes the object specified by the object-name argument. The object being deleted must either be contained in the current context or must be the current context. The first example shows a partition contained in the current context being deleted.

[node0] Partition:: delete part0
[node0] Partition::

If the current context is the object to be deleted, the object-name argument is optional. In this case, the context reverts to the next higher context level.

[node0] P(part0):: delete
[node0] Partition::

Attribute Control

Each mpadmin object has a set of attributes that can be modified. Use the set command to specify a value for a given attribute. Use unset to delete an attribute.


Note -

Although you can use the set and unset commands to change any cluster attribute, the CRE requires most attributes to have their default values. Be certain to limit your attribute changes to those described in this chapter.


set

Usage:

:: set attribute[=value]

Available In:

Cluster, Node, Partition, Network

The set command sets the specified attribute of the current object.

You must be within the context of the target object to set its attributes. For example, to change an attribute of a specific partition, you must be in that partition's context.

To set a literal or numeric attribute, specify the desired value. The following example sets the node attribute for partition part0. Setting a partition's node attribute identifies the set of nodes that are members of that partition.

[node0] P(part0):: set node=node1 node2
[node0] P(part0)::

To change the value of an attribute that has already been set, simply set it again. The following example adds node3 to partition part0.

[node0] P(part0):: set node=+node3
[node0] P(part0)::

As shown by this example, if the value of an attribute is a list, items can be added to or removed from the list using the + and - symbols, without repeating items that are already part of the list.
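The add and remove behavior for list-valued attributes can be modeled with a short Python sketch. It is an illustration only: the function name is hypothetical, and accepting several whitespace-separated items after a single + or - is an assumption, since the text shows only the single-item form +node3.

```python
def apply_list_value(current, value):
    """Return the new list after `set attribute=value` on a list attribute.

    A leading "+" adds items, a leading "-" removes items, and any other
    value replaces the list outright.
    """
    if value.startswith("+"):
        additions = value[1:].split()
        # Items already in the list are not repeated.
        return current + [item for item in additions if item not in current]
    if value.startswith("-"):
        removals = value[1:].split()
        return [item for item in current if item not in removals]
    return value.split()


nodes = apply_list_value([], "node1 node2")
nodes = apply_list_value(nodes, "+node3")
print(nodes)
nodes = apply_list_value(nodes, "-node1")
print(nodes)
```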

To set a Boolean attribute, specify the name of the Boolean attribute to be activated. Do not include =value in the expression. The following example enables partition part0.

[node0] P(part0):: set enabled
[node0] P(part0)::

Note -

If you mistakenly set a Boolean attribute to a value--that is, if you follow a Boolean attribute's name with the =value field, mpadmin will ignore the value assignment and will simply consider the attribute to be active.


unset

Usage:

:: unset attribute

Available In:

Cluster, Node, Partition, Network

The unset command deletes the specified attribute from the current object. You must be within the context of an object to unset any of its attributes.

The following example disables the partition part0 (that is, makes it unavailable for use).

[node0] P(part0):: unset enabled
[node0] P(part0)::


Note -

Remember, you cannot use the set command to set Boolean attributes to the logical 0 (inactive) state. You must use the unset command.


Context Navigation

By default, mpadmin commands affect objects that are in the current context--that is, objects that are in the same context in which the command is invoked. For example, if the command list is invoked in the Node context, mpadmin will list all the nodes in the cluster. If list is invoked in the Partition context, it will list all the partitions in the cluster, as shown below:

[node0] Partition:: list
        part0
        part1
        part2
[node0] Partition::

mpadmin provides several context navigation commands that enable you to operate on objects and attributes outside the current context.

current

Usage:

:: current object-name 

Available In:

Cluster, Node, Partition, Network

The current command changes the current context to the context of the object specified by object-name. The target object must exist. That is, if it is a partition, you must already have used the create command to create it. If the target object is a cluster, node, or network interface, it must have been created by the CRE.

The following example changes the current context from the general Node context to the context of a specific node, node1.

[node0] Node:: current node1
[node0] N(node1)::

If the name of the target object does not conflict with an mpadmin command, you can omit the current command. This is illustrated by the following example, where node1 is the name of the target object.

[node0] Node:: node1
[node0] N(node1)::

This works even when the object is in a different context.

[node0] Partition:: node1
[node0] N(node1)::

Note -

The current command must be used when the name of the object is the same as an mpadmin command. For example, if you have a partition named Partition, its name conflicts with the command Partition. In this case, to make the object Partition the current context, you would need to include the current command to make it clear that the Partition term refers to the object and is not an invocation of the command.


top

Usage:

:: top

Available In:

Node, Partition, Network

The top command moves you to the Cluster context. The following example moves from the Partition context to the Cluster context.

[node0] Partition:: top
[node0]::

up

Usage:

:: up

Available In:

Node, Partition, Network

The up command moves you up one level from the current context. The following example moves from the Network context to the context of node node2.

[node0] N(node2) Network:: up
[node0] N(node2)::

node

Usage:

:: node

Available In:

Cluster

The node command moves you from the Cluster context to the Node context.

[node0]:: node
[node0] Node::

partition

Usage:

:: partition

Available In:

Cluster, Node, Network

The partition command moves you from the Cluster, Node, or Network context to the Partition context.

[node0]:: partition
[node0] Partition::

network

Usage:

:: network

Available In:

Node

The network command moves you from a specific Node context to the Network context associated with that node.

[node0] N(node2):: network
[node0] N(node2) Network::

Information Retrieval

This set of commands displays information about cluster objects and their attributes.

dump

Usage:

:: dump [object-name]

Available In:

Cluster, Node, Partition

The dump command displays the current state of the attributes of the specified object or of the current context. The object can be a cluster, a node, or a partition.

The dump command outputs objects in a specific order that corresponds to the logical order of assignment when a cluster is configured. For example, nodes are output before partitions because, when a cluster is configured, nodes must exist before they can be assigned to a partition.

The dump command executes in this hierarchical manner so it can be used to back up cluster configurations in a format that allows them to be easily restored at a later time.

The following example shows the dump command being used in this way. In this example, it is invoked using the -c option on the mpadmin command line, with the output being directed to a backup file.

# mpadmin -c dump > sunhpc.configuration

Later, when it was time to restore the configuration, mpadmin could read the backup file as input, using the -f option.

# mpadmin -f sunhpc.configuration

If you wanted to modify the configuration, you could edit the backup file before restoring it.

The following example shows the dump command being used to output the attribute states of the partition part0.

[node0] Partition:: dump part0
        set nodes = node1 node2 node3
        set max_total_procs = 4
        set name = part0
        set enabled
        unset no_login
[node0] Partition::

Note -

Each attribute is output in the form of a set or unset command so that the dump output functions as a script.


If you are within the context of the object whose attributes you want to see, you don't have to specify its name.

[node0] P(part0):: dump
        set nodes = node1 node2 node3
        set max_total_procs = 4
        set enabled
        set name = part0
[node0] P(part0)::
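Because dump emits one set or unset command per attribute, its output is also easy to inspect programmatically, for instance before editing a backup file. The following Python sketch parses such lines into a dictionary. The function name is hypothetical, and the parser assumes only the line shapes shown in the examples above (set name = value, set name, unset name).

```python
def parse_dump(text):
    """Parse dump-style lines into a dict of attribute name to value.

    Valued attributes map to their string value, Boolean attributes that
    are set map to True, and unset attributes map to False.
    """
    attributes = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if line.startswith("set "):
            body = line[len("set "):].strip()
            if "=" in body:
                name, value = body.split("=", 1)
                attributes[name.strip()] = value.strip()
            else:
                attributes[body] = True
        elif line.startswith("unset "):
            attributes[line[len("unset "):].strip()] = False
        # Any other line (prompts, blank lines) is ignored.
    return attributes


sample = """
        set nodes = node1 node2 node3
        set max_total_procs = 4
        set name = part0
        set enabled
        unset no_login
"""
parsed = parse_dump(sample)
print(parsed["nodes"].split())
print(parsed["enabled"], parsed["no_login"])
```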

list

Usage:

:: list

Available In:

Cluster, Node, Partition, Network

The list command lists all of the defined objects in the current context. The following example shows that there are three partitions defined in the Partition context.

[node0] Partition:: list
        part0
        part1
        part2
[node0] Partition::

show

Usage:

:: show [object-name]

Available In:

Cluster, Node, Partition, Network

The show command displays the current state of the attributes of the specified object object-name. The following example displays the attributes for the partition part0.

[node0] Partition:: show part0
        set nodes = node0 node1 node2 node3
        set max_total_procs = 4
        set name = part0
        set enabled
        unset no_login
[node0] Partition::

If the object whose attributes you want to see is in the current context, you don't have to specify its name. For example:

[node0] P(part0):: show
        set nodes = node0 node1 node2 node3
        set max_total_procs = 4
        set enabled
        set name = part0
[node0] P(part0)::

Miscellaneous Commands

connect

Usage:

:: connect cluster-name

Available In:

Cluster

In order to access any objects or attributes in a Sun HPC cluster, you must be connected to the cluster.

However, connecting to a cluster ordinarily happens automatically, so you are not likely to ever need to use the connect command.

The environment variable SUNHPC_CLUSTER names a default cluster. If no other action is taken to override this default, any mpadmin session will connect to the cluster named by this environment variable.

If you issue the mpadmin command on a node that is part of a cluster, you are automatically connected to that cluster, regardless of the SUNHPC_CLUSTER setting.

If you are not logged in to the cluster you want to use and you do not want to use the default cluster, you can use the mpadmin -s option, specifying the name of the cluster of interest as an argument to the option. See "-s cluster-name - Connect to Specified Cluster" for a description of the -s option.


Note -

When the CRE creates a cluster, it always names it after the hostname of the cluster's master node--that is, the node on which the master daemons are running. Therefore, whenever you need to specify the name of a cluster, use the hostname of the cluster's master node.


If, for some reason, you want to use the connect command, see the following example. It shows the command being used to connect to a cluster whose master node is node0.

[hpc-demo]:: connect node0
[node0]::

echo

Usage:

:: echo text-message

Available In:

Cluster, Node, Partition, Network

The echo command prints the specified text on the standard output. If you write a script to be run with mpadmin -f, you can include the echo command in the script so that it will print status information as it executes.

[node0]:: echo Enabling part0 and part1
Enabling part0 and part1
[node0]::

help

Usage:

:: help [command]

Available In:

Cluster, Node, Partition, Network

When invoked without a command argument, the help command lists the mpadmin commands that are available within the current context. The following example shows help being invoked at the Cluster level:

[node0]:: help
connect <cluster-name>     connect to a Sun HPC cluster
set <attribute>[=value]    set an attribute in the current context
unset <attribute>          delete an attribute in the current context
show                       show attributes in current context
dump                       show all objects on the cluster
node                       go to the node context
partition                  go to the partition context
echo ...                   print the rest of the line on standard output
quit                       quit mpadmin
help [command]             show information about command command
? [command]                show information about command command
[node0]::

To get a description of a particular command, enter the command name as an argument to help.

If you specify a context command (node, partition, or network), mpadmin lists the commands available within that context. Note that you can specify network as an argument to help only at the node level.

[node0]:: help node
current <node>             set the current node for future commands
create <node>              create a new node with the given name
delete [node]              delete a node
list                       list all the defined nodes
show [node]                show a node's attributes
dump [node]                show attributes for a node and its network interfaces
set <attribute>[=value]    set the current node's attribute
unset <attribute>          delete the current node's attribute
network                    enter the network interface command mode
up                         go up to the Cluster level command prompt
top                        go up to the Cluster level command prompt
echo ...                   print the rest of the line on standard output
help [command]             show information about command command
? [command]                show information about command command
[node0]::

The "?" character is a synonym for help.

quit/exit

Usage:

:: quit
:: exit

Available In:

Cluster, Node, Partition, Network

Entering either quit or exit causes mpadmin to terminate and return you to the shell level.

Example:

[node0]:: quit
#

Example:

[node0] N(node2):: exit
#

Additional mpadmin Functionality

This section describes other functionality provided by mpadmin.

Multiple Commands on a Line

Because mpadmin interprets its input, if you issue more than one command on a line, mpadmin will execute them sequentially in the order they are input.

The following example shows how to display a list of nodes when not in the Node context. The node command switches to the Node context and the list command generates a list for that context.

[node0]:: node list        node0
        node1
        node2
        node3
[node0] Node::

The following example sets the enabled attribute on partition part1. The part1 entry acts as a command that switches the context from part0 to part1 and the set command turns on the enabled attribute.

[node0] P(part0):: part1 set enabled
[node0] P(part1)::

Command Abbreviation

You can abbreviate commands to the shortest string of at least two letters so long as it is still unique within the current context.

[node0] Node:: pa
[node0] Partition:: li
      part0
      part1
      part2
      part3
[node0] Partition:: part2
[node0] P(part2):: sh
      set enabled
      set max_total_procs = 4
      set name = part2
      set nodes = node0 node1
[node0] P(part2)::

Note -

The names of objects cannot be abbreviated.
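The abbreviation rule (at least two letters, unique within the current context) can be sketched in Python as a prefix lookup. This is an illustrative model, not mpadmin's actual parser. The command list combines the Node-level commands from Table 6-5 with partition, quit, and exit, which this chapter also documents as available in the Node context; the function name is hypothetical.

```python
NODE_CONTEXT_COMMANDS = [
    "current", "create", "delete", "list", "show", "dump", "set", "unset",
    "network", "partition", "up", "top", "echo", "help", "quit", "exit",
]


def resolve_abbreviation(token, commands):
    """Return the full command that `token` abbreviates, or None.

    A token resolves if it is an exact command name, or if it is at least
    two letters long and is a prefix of exactly one command.
    """
    if token in commands:
        return token
    if len(token) < 2:
        return None
    matches = [cmd for cmd in commands if cmd.startswith(token)]
    return matches[0] if len(matches) == 1 else None


print(resolve_abbreviation("pa", NODE_CONTEXT_COMMANDS))
print(resolve_abbreviation("li", NODE_CONTEXT_COMMANDS))
print(resolve_abbreviation("s", NODE_CONTEXT_COMMANDS))
```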


Using mpadmin

This section explains how to use mpadmin to perform the principal administrative tasks involved in setting up and maintaining a Sun HPC cluster.

Introductory Notes

This section contains information about various mpadmin topics that you will find useful when reading about cluster administration tasks in later sections.

Naming Partitions and Custom Attributes

You can assign names to partitions and to custom attributes. Custom attributes are attributes that are not part of the default CRE database; they are discussed in "Setting Custom Attributes".

Names must start with a letter and are case sensitive. The following characters can be used:

ABCDEFGHIJKLMNOPQRSTUVWXYZ abcdefghijklmnopqrstuvwxyz 0123456789-_.

The only limit to name length is the limit imposed by Solaris on host names--it is ordinarily set at 256 characters.


Note -

Do not begin an attribute name with the characters mp_. This starting sequence is reserved by the CRE.
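The naming rules above can be checked with a short Python sketch before a name is passed to mpadmin. The function and parameter names are hypothetical, and the 256-character default simply mirrors the "ordinarily" limit mentioned above; the real limit depends on the Solaris host-name setting.

```python
import re

# A letter first, then any mix of letters, digits, hyphen, underscore, period.
NAME_PATTERN = re.compile(r"[A-Za-z][A-Za-z0-9._-]*")


def is_valid_name(name, is_attribute=False, max_length=256):
    """Return True if `name` satisfies the naming rules described above.

    is_attribute: apply the extra rule that attribute names must not begin
                  with the reserved prefix mp_.
    max_length:   assumed length limit (an assumption of this sketch).
    """
    if len(name) > max_length:
        return False
    if is_attribute and name.startswith("mp_"):
        return False
    return NAME_PATTERN.fullmatch(name) is not None


print(is_valid_name("part0"))
print(is_valid_name("0part"))
print(is_valid_name("mp_custom", is_attribute=True))
```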


Separate Name Spaces

Nodes and partitions have separate name spaces. Thus, you can have a partition named Parallel that contains a node named Parallel.

Log In to the Cluster

It is assumed that you are logged in to a node that is part of the cluster you want to set up. If that is not the case, you must be connected to the target cluster through one of the following alternative methods:

If the node you are logged in to is not part of any cluster, set the SUNHPC_CLUSTER environment variable to the name of the target cluster. For example,

# setenv SUNHPC_CLUSTER node0

makes node0 the default cluster. Remember, a cluster's name is the same as the host name of its master node.

Once you are connected to the cluster, you can start using mpadmin to perform the administrative tasks described below.

When you start up an mpadmin interactive session, you begin at the Cluster level. Table 6-3 lists the mpadmin commands that can be used in the Cluster context.

Table 6-3 Cluster-Level mpadmin Commands

Command                 Synopsis
connect cluster-name    Connect to a Sun HPC cluster named cluster-name. You will not need to use this command.
show                    Show cluster attributes.
dump                    Show all objects in the Sun HPC cluster.
set attribute[=value]   Set a cluster-level attribute.
unset attribute         Delete a cluster-level attribute.
node                    Enter the node context.
partition               Enter the partition context.
echo ...                Print the rest of the line on the standard output.
quit / exit             Quit mpadmin.
help [command] / ?      Show information about commands.

Customizing Cluster-Level Attributes

This section describes various Cluster-level attributes that you may want to modify. Table 6-4 lists the attributes that can be changed in the Cluster context.

Table 6-4 Cluster-Level Attributes

Attribute                       Kind    Description
default_interactive_partition   Value   Specifies the default partition.
logfile                         Value   Specifies an optional output file for logging CRE daemon error messages.
administrator                   Value   Specifies an email address for the system administrator(s).
lock_max_age                    Value   Specifies the maximum amount of time a lock can remain set (the value is in seconds).

default_interactive_partition

This attribute specifies the default partition for running MPI jobs. Its value is used by the command mprun, which is described in the Sun HPC Cluster Runtime Environment 1.0 User's Guide.

For example, to make a partition named part0 the default partition, enter the following in the Cluster context:

[node0]:: set default_interactive_partition=part0

When a user executes a program via mprun, the CRE decides where to run the program, based on the following criteria:

  1. Check for the command-line -p option. If a partition is specified, execute the program in that partition. If the specified partition is invalid, the command will fail.

  2. Check to see if the MPRUN_FLAGS environment variable specifies a default partition. If so, execute the program in that partition. If the specified partition is invalid, the command will fail.

  3. Check to see if the SUNHPC_PART environment variable has a value set. If it specifies a default partition, execute the program in that partition. If the specified partition is invalid, then check to see if the user is logged into any partition. If so, execute the program in that partition.

  4. Check to see if the user is logged into a partition. Execute the program in that partition.

  5. If none of these checks yield a partition name, check for the existence of the default_interactive_partition attribute. If it specifies a partition, execute the program in that partition.
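The selection order in the steps above can be sketched as a Python function. This is a model of the documented order only, not the CRE's implementation. All names are hypothetical, and two details are assumptions of the sketch: an invalid SUNHPC_PART value falls through to the login partition and then to default_interactive_partition, and the function returns None when no source yields a partition.

```python
def choose_partition(valid, cli=None, mprun_flags=None, sunhpc_part=None,
                     login=None, default_interactive=None):
    """Pick the partition for an mprun job, following steps 1 through 5.

    valid:               set of partition names that are usable
    cli:                 partition given with the -p option
    mprun_flags:         partition named in MPRUN_FLAGS
    sunhpc_part:         value of SUNHPC_PART
    login:               partition the user is logged in to
    default_interactive: value of default_interactive_partition
    """
    # Steps 1 and 2: an explicit request wins, and an invalid one is fatal.
    for requested in (cli, mprun_flags):
        if requested is not None:
            if requested not in valid:
                raise ValueError(f"invalid partition: {requested}")
            return requested
    # Step 3: SUNHPC_PART is used only if it names a valid partition.
    if sunhpc_part is not None and sunhpc_part in valid:
        return sunhpc_part
    # Step 3 fallback and step 4: the partition the user is logged in to.
    if login is not None:
        return login
    # Step 5: the cluster-level default, if any.
    return default_interactive


partitions = {"part0", "part1"}
print(choose_partition(partitions, cli="part1", sunhpc_part="part0"))
print(choose_partition(partitions, sunhpc_part="missing", login="part0"))
print(choose_partition(partitions, default_interactive="part0"))
```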

The SUNHPC_PART environment variable is described in "CRE Environment Variables". The MPRUN_FLAGS environment variable is described in the Sun MPI 4.0 User's Guide: With CRE.

logfile

The logfile attribute allows you to log CRE messages in a file separate from all other system messages. For example, if you enter

[node0]:: set logfile=/home/wmitty/cre-messages

CRE will output its messages to the file /home/wmitty/cre-messages. If logfile is not set, CRE messages will be passed to syslog, which will store them with other system messages in /var/adm/messages.


Note -

A full path name must be specified when setting the logfile attribute.


administrator

Set the administrator attribute to specify the email address of the system administrator. For example:

[node0]:: set administrator="root@example.com"

Note the use of double quotes.

lock_max_age

The CRE uses locks for internal purposes. The lock_max_age attribute specifies the length of time that the CRE will wait before removing a lock. For example, to set the maximum lock interval to two minutes, enter the following:

[node0]:: set lock_max_age="2 minutes"

The default is 10 minutes.

Nodes and Network Interfaces

Ordinarily, the only administrative action that you need to take with nodes is to enable them for use. Or, if you want to temporarily make a node unavailable for use, disable it.

Other node-related administrative tasks--such as, naming the nodes, identifying the master node, setting memory and process limits, and setting the node's partition attribute--are either handled by the CRE automatically or are controlled via partition-level attributes.

There are no administrative actions required by network interface attributes. They are all controlled by the CRE. The only actions you might want to take with respect to network interfaces are to list them or display their attribute values.

Node Commands

Table 6-5 lists the mpadmin commands that can be used at the Node level.

Table 6-5 Node-Level mpadmin Commands

Command                 Synopsis
current node            Set the context to the specified node for future commands.
create node             Create a new node with the given name.
delete [node]           Delete a node.
list                    List all the defined nodes.
show [node]             Show a node's attributes.
dump [node]             Show the attributes of the node and its network interfaces.
set attribute[=value]   Set the specified attribute of the current node.
unset attribute         Delete the specified attribute of the current node.
network                 Move to the network interface command level.
up                      Move to the next higher level (Top) command context.
top                     Move to the Top level command context.
echo ...                Print the rest of the line on the standard output.
help [command]          Show information about commands (?).

Node Attributes

Nodes are defined by many attributes, most of which are not accessible to mpadmin commands. Although you are not able to affect these attributes, it can be helpful to know of their existence and meaning; hence, they are listed and briefly described in Table 6-6.

Table 6-7 lists the Node-level attributes that can be set via mpadmin commands. However, enabled and max_total_procs are the only node attributes that you can safely modify. See "enabled" and "max_total_procs" for details.

Table 6-6 Node Attributes That Cannot Be Set by the System Administrator

Attribute        Kind      Description
cpu_idle         Value     Percent of time CPU is idle.
cpu_iowait       Value     Percent of time CPU spent in I/O wait state.
cpu_kernel       Value     Percent of time CPU spent in kernel state.
cpu_swap         Value     Percent of time CPU spent waiting for swap.
cpu_type         Value     Type of CPU, for example, sparc.
cpu_user         Value     Percent of time CPU spends running user's program.
load1            Value     Load average for the past minute.
load5            Value     Load average for the past five minutes.
load15           Value     Load average for the past 15 minutes.
manufacturer     Value     Manufacturer of the node, for example, Sun_Microsystems.
mem_free         Value     Node's available RAM (in Mbytes).
mem_total        Value     Node's total physical memory (in Mbytes).
ncpus            Value     Number of CPUs in the node.
offline          Boolean   Set automatically by the system if the tm.spmd daemon on the node stops running or is unresponsive; if set, prevents jobs from being spawned on the node.
os_arch_kernel   Value     Node's kernel architecture (same as output from arch -k, for example, sun4u).
os_name          Value     Name of the operating system running on the node, for example, SunOS.
os_release       Value     Operating system's release number, for example, 5.5.1.
os_release_maj   Value     Operating system's major release number, for example, 5.
os_release_min   Value     Operating system's minor release number, for example, 5 or 6.
os_version       Value     Operating system's version, for example, GENERIC.
serial_number    Value     Hardware serial number or host id.
swap_free        Value     Node's available swap space (in Mbytes).
swap_total       Value     Node's total swap space (in Mbytes).
update_time      Value     When this information was last updated.

Table 6-7 Node Attributes That Can Be Set by the System Administrator

Attribute          Kind      Description
enabled            Boolean   Set if the node is enabled, that is, if it is ready to accept jobs.
master             Boolean   Specify node on which the master daemons are running as an argument to mprun.
max_locked_mem     Value     Maximum amount of shared memory allowed to be locked down by Sun MPI processes (in Kbytes).
max_total_procs    Value     Maximum number of Sun HPC processes per node.
min_unlocked_mem   Value     Minimum amount of shared memory not to be locked down by Sun MPI processes (in Kbytes).
name               Value     Name of the node; this is predefined and must not be set via mpadmin.
partition          Value     Partition of which node is a member.
shmem_minfree      Value     Fraction of swap space kept free for non-MPI use.

enabled

The attribute enabled is set by default when the CRE daemons start up on a node. Unsetting it prevents new jobs from being spawned on the node.

A partition can list a node that is not enabled as a member. However, jobs will execute on that partition as if that node were not a member.

master


Note -

You must not change this node attribute. The CRE automatically sets it to the hostname of the node on which the master CRE daemons are running. This happens whenever the CRE daemons start.


max_locked_mem and min_unlocked_mem


Note -

You should not change these node attributes. They are described here so that you will be able to interpret their values when node attributes are displayed via the dump or show commands.


The max_locked_mem and min_unlocked_mem attributes limit the amount of shared memory available to be locked down for use by Sun MPI processes. Locking down shared memory guarantees maximum speed for Sun MPI processes by eliminating delays caused by swapping memory to disk. However, locking physical memory can have undesirable side effects because it prevents that memory from being used by other processes on the node.

Solaris provides two related tunable kernel parameters, tune_t_minasmem and pages_pp_maximum.

The CRE parameters impose limits only on MPI programs, while the kernel parameters limit all processes. Also, the kernel parameter units are pages rather than Kbytes. Refer to your Solaris documentation for more information about tune_t_minasmem and pages_pp_maximum.
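Because the CRE attributes are expressed in Kbytes and the kernel parameters in pages, comparing the two requires a unit conversion. The following Python sketch shows only that arithmetic; it is not part of the CRE, and the function names are hypothetical. The 8192-byte default page size is an assumption (typical of UltraSPARC systems); check the actual value on each node, for example with the Solaris pagesize command.

```python
def kbytes_to_pages(kbytes: int, page_size_bytes: int = 8192) -> int:
    """Convert a size in Kbytes to a whole number of pages, rounding up.

    page_size_bytes is an assumed default, not a value taken from the CRE.
    """
    if kbytes < 0 or page_size_bytes <= 0:
        raise ValueError("sizes must be non-negative and page size positive")
    total_bytes = kbytes * 1024
    # Ceiling division: a partially used page still counts as one page.
    return -(-total_bytes // page_size_bytes)


def pages_to_kbytes(pages: int, page_size_bytes: int = 8192) -> int:
    """Convert a page count back to Kbytes."""
    if pages < 0 or page_size_bytes <= 0:
        raise ValueError("sizes must be non-negative and page size positive")
    return pages * page_size_bytes // 1024


if __name__ == "__main__":
    # A max_locked_mem of 65536 Kbytes expressed in 8192-byte pages.
    print(kbytes_to_pages(65536))
```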

max_total_procs

To limit the number of mprun processes allowed to run concurrently on a node, set this attribute to an integer.

[node0] P(part0):: set max_total_procs=10
[node0] P(part0)::

By default, max_total_procs is unset, and the CRE does not impose any limit on the number of processes allowed on a node.

name

A node's name is predefined by the hpc.conf file. You must not change it by setting this attribute.

partition


Note -

There is no need to set this attribute. The CRE sets it automatically if the node is included in any partition configuration(s). See "Creating Partitions" for additional details.


A node can belong to multiple partitions, but only one of those partitions can be enabled at a time. No matter how many partitions a node belongs to, the partition attribute shows only one partition name--that name is always the name of the enabled partition, if one exists for that node.

shmem_minfree


Note -

You should not change this node attribute. It is described here so that you will be able to interpret its value when node attributes are displayed via the dump or show commands.


The shmem_minfree attribute reserves some portion of the /tmp file system for non-MPI use.

For example, if /tmp is 1 Gbyte and shmem_minfree is set to 0.2, any time free space on /tmp drops below 200 Mbytes (1 Gbyte * 0.2), programs using the MPI shared memory protocol will not be allowed to run.

[node0] N(node1):: set shmem_minfree=0.2

shmem_minfree must be set to a value between 0.0 and 1.0. When shmem_minfree is unset, it defaults to 0.1.

This attribute can be set on both nodes and partitions. If both are set to different values, the node attribute overrides the partition attribute.
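The threshold arithmetic and the node-over-partition precedence described above can be sketched in Python as follows. This is an illustration of the rules as stated in the text, not CRE code; the function names are hypothetical, and sizes are given in Mbytes with 1 Gbyte taken as 1000 Mbytes to match the example.

```python
DEFAULT_SHMEM_MINFREE = 0.1  # default stated in the text when the attribute is unset


def effective_minfree(node_value=None, partition_value=None):
    """Return the shmem_minfree value that applies to a node.

    The node's value, if set, overrides the partition's value; if neither
    is set, the documented default applies. None stands for "unset".
    """
    for value in (node_value, partition_value):
        if value is not None:
            if not 0.0 <= value <= 1.0:
                raise ValueError("shmem_minfree must be between 0.0 and 1.0")
            return value
    return DEFAULT_SHMEM_MINFREE


def shared_memory_allowed(tmp_size_mbytes, tmp_free_mbytes, minfree):
    """True if free space on /tmp has not dropped below the reserved amount."""
    reserved_mbytes = tmp_size_mbytes * minfree
    return tmp_free_mbytes >= reserved_mbytes


if __name__ == "__main__":
    minfree = effective_minfree(node_value=0.2, partition_value=0.5)
    print(minfree)                                     # node value wins
    print(shared_memory_allowed(1000, 150, minfree))   # below the 200-Mbyte reserve
```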

Deleting Nodes

If you permanently remove a node from the Sun HPC cluster, you should then delete the corresponding node object from the CRE resource database.

Recommendations

Before deleting a node, you should first

Using the delete Command

To delete a node, use the delete command within the context of the node you want to delete.

[node0] N(node3):: delete
[node0] Node::

Partitions

Partitions are logical collections of nodes that work cooperatively to run programs on the Sun HPC cluster. An MPI job can run on a single partition or on the combination of a single partition and one or more nodes that are not members of any partition. MPI jobs cannot run in multiple partitions.

You must create a partition and enable it before you can run MPI programs on your Sun HPC cluster. Once a partition is created, you can configure it to meet the specific needs of your site and enable it for use.

Once a partition is created and enabled, you can run serial or parallel jobs on it. Serial programs run on a single node of a partition. Parallel programs run on any number of nodes of a partition in parallel.

The CRE performs load balancing on shared partitions. When you use mprun to execute a program on a shared partition, the CRE automatically runs it on the least-loaded nodes that satisfy any specified resource requirements.
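The idea of choosing the least-loaded nodes that satisfy a set of requirements can be sketched in Python as follows. This guide does not describe the CRE's actual load metric, tie-breaking, or scheduling algorithm, so everything here is an assumption made for illustration: the function name pick_nodes, the numeric load values, and the representation of each node as a (load, attributes) pair. The has_frame_buffer attribute name is borrowed from the custom-attribute example in "Setting Custom Attributes".

```python
def pick_nodes(nodes, count, required=frozenset()):
    """Pick the `count` least-loaded nodes that have every required attribute.

    nodes    -- dict mapping node name to (load, set of attribute names)
    required -- attribute names that a node must have set to be eligible
    """
    required = set(required)
    eligible = [
        (load, name)
        for name, (load, attrs) in nodes.items()
        if required <= set(attrs)
    ]
    if len(eligible) < count:
        raise ValueError("not enough nodes satisfy the requirements")
    # Sort by load first, then by name so that ties are resolved predictably.
    return [name for load, name in sorted(eligible)[:count]]


if __name__ == "__main__":
    cluster = {
        "node0": (0.9, {"enabled"}),
        "node1": (0.1, {"enabled", "has_frame_buffer"}),
        "node2": (0.4, {"enabled"}),
    }
    print(pick_nodes(cluster, 2))
    print(pick_nodes(cluster, 1, {"has_frame_buffer"}))
```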

Partitions are mutable. That is, after you create and configure a partition, you can change it if your site requirements change. You can add nodes to a partition or remove them. You can change a partition's attributes. Also, since you can enable and disable partitions, you can have many partitions defined and use only a few at a time according to current needs.

There are no restrictions on the number or size of partitions, so long as no node is a member of more than one enabled partition.

Partition Commands

Table 6-8 lists the mpadmin commands that can be used within the partition context.

Table 6-8 Partition-Level mpadmin Commands

Command 

Synopsis 

current partition

Set the context to the specified partition for future commands. 

create partition

Create a new partition with the given name. 

delete [partition]

Delete a partition. 

list

List all the defined partitions. 

show [partition]

Show a partition's attributes. 

dump [partition]

Show the attributes of a partition. 

set attribute[=value]

Set the current partition's attribute. 

unset attribute

Delete the current partition's attribute. 

up

Move up one level in the context hierarchy. 

top

Move to the top level in the context hierarchy. 

echo ...

Print the rest of the line on the standard output. 

help [command]

Show information about the command command.

? [command]

Show information about the command command.

Creating Partitions

Prerequisites

Viewing Existing Partitions

Before creating a new partition, you might want to list the partitions that have already been created. To do this, use the list command from within the Partition context.

[node0] Partition:: list
        part0
        part1
[node0] Partition::

Creating a Partition

To create a partition, use the create command, followed by the name of the new partition. "Naming Partitions and Custom Attributes" discusses the rules for naming partitions.

For example:

[node0] Partition:: create part0
[node0] P(part0)::

The create command automatically changes the context to that of the new partition.

At this point, your partition exists by name but contains no nodes. You must assign nodes to the partition before using it. You can do this by setting the partition's nodes attribute. See "Configuring Partitions" for details.

Configuring Partitions

You can configure partitions by setting and deleting their attributes using the set and unset commands. Table 6-9 shows commonly used partition attributes.

Table 6-9 Common Partitions and Their Attributes

Partition Type 

Relevant Attributes 

Recommended Value 

Login 

no_logins 

not set 

Login 

max_total_procs 

not set or set greater than 

Dedicated 

max_total_procs 

=1 

Dedicated 

no_logins 

set 

Serial 

no_mp_tasks 

set 

Parallel 

no_mp_tasks 

not set 

You can combine the attributes listed in Table 6-10 in any way that makes sense for your site. See "Configuring Partitions" for suggestions about how to configure your partitions.
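One way to read Table 6-9 is as a set of rules that map a partition's attribute settings to the roles it can play. The Python sketch below encodes that reading; it is an interpretation for illustration only, and the function name partition_roles and the dictionary representation of attributes are hypothetical. The table's max_total_procs recommendation for login partitions is not modeled.

```python
def partition_roles(attrs):
    """Return the roles suggested by Table 6-9 for a partition.

    attrs maps attribute names to values; a Boolean attribute that is set
    maps to True, and an unset attribute is simply absent.
    """
    roles = []
    no_logins = bool(attrs.get("no_logins"))
    if not no_logins:
        roles.append("login")
    # Dedicated: logins disallowed and one process per node.
    if no_logins and attrs.get("max_total_procs") == 1:
        roles.append("dedicated")
    # Serial if multiprocess parallel jobs are disallowed, otherwise parallel.
    roles.append("serial" if attrs.get("no_mp_tasks") else "parallel")
    return roles


if __name__ == "__main__":
    print(partition_roles({"no_logins": True, "max_total_procs": 1}))
    print(partition_roles({"no_mp_tasks": True}))
```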

Partitions, once created, can be enabled and disabled. This lets you define many partitions but use just a few at a time. For instance, you might want to define a number of shared partitions for development use and dedicated partitions for executing production jobs, but have only a subset available for use at a given time.

Partition Attributes

Table 6-10 lists the predefined partition attributes. To see their current values, use the mpadmin show command.

Table 6-10 Predefined Partition Attributes

Attribute 

Kind 

Description 

enabled

Boolean 

Set if the partition is enabled, that is, if it is ready to accept logins or jobs. 

max_locked_mem

Value 

Maximum amount of shared memory allowed to be locked down by MPI processes (in Kbytes). 

max_total_procs

Value 

Maximum number of simultaneously running processes allowed on each node in the partition. 

min_unlocked_mem

Value 

Minimum amount of shared memory that may not be locked down by MPI processes (in Kbytes). 

name

Value 

Name of the partition. 

no_logins

Boolean 

Disallow logins. 

no_mp_tasks

Boolean 

Disallow multiprocess parallel jobs. 

nodes

Value 

List of nodes in the partition. 

shmem_minfree

Value 

Fraction of swap space kept free for non-MPI use. 

enabled

Set the enabled attribute to make a partition available for use.

By default, the enabled attribute is not set when a partition is created.

max_locked_mem and min_unlocked_mem


Note -

You should not change these partition attributes. See "max_locked_mem and min_unlocked_mem" for a description of their effects.


max_total_procs

To limit the number of simultaneously running mprun processes allowed on each node in a partition, or in all partitions, set the max_total_procs attribute in that partition's context or in the general Partition context.

[node0] P(part0):: set max_total_procs=10
[node0] P(part0)::

You can set max_total_procs if you want to limit the load on a partition. By default, max_total_procs is unset, and the CRE does not impose any limit on the number of processes allowed on a node.

name

The name attribute is set when a partition is created. To change the name of a partition, set its name attribute to a new name.

[node0] P(part0):: set name=part1
[node0] P(part1)::

See "Naming Partitions and Custom Attributes" for partition naming rules.

no_logins

To prohibit users from logging in to a partition, set the no_logins attribute.

[node0] P(part1):: set no_logins
[node0] P(part1)::

no_mp_tasks

To prohibit multiprocess parallel jobs from running on a partition (that is, to make it a serial partition), set the no_mp_tasks attribute.

[node0] P(part1):: set no_mp_tasks
[node0] P(part1)::

nodes

To specify the nodes that are members of a partition, set the partition's nodes attribute.

[node0] P(part1):: set nodes=node1
[node0] P(part1):: show
        set nodes = node1
        set enabled
[node0] P(part1)::

The value you give the nodes attribute defines the entire list of nodes in the partition. To add a node to an already existing node list without retyping the names of nodes that are already present, use the + (plus) character.

[node0] P(part1):: set nodes=+node2 node3
[node0] P(part1):: show
        set nodes = node0 node1 node2 node3
        set enabled
[node0] P(part1)::

Similarly, you can use the - (minus) character to remove a node from a partition.

To assign a range of nodes to the nodes attribute, use the : (colon) syntax. This example assigns to part1 all nodes whose names are alphabetically greater than or equal to node0 and less than or equal to node3:

[node0] P(part1):: set nodes = node0:node3
[node0] P(part1)::
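The three forms of the nodes value described above (a plain list, a + or - prefix, and a : range) can be modeled with the Python sketch below. It is an interpretation of the text, not mpadmin's parser: the function name apply_nodes_value is hypothetical, a leading + or - is assumed to apply to every name that follows it, and the range is evaluated with ordinary string comparison as a stand-in for "alphabetically".

```python
def apply_nodes_value(current, value, known_nodes):
    """Return the node list that results from `set nodes=<value>`.

    current     -- the partition's node list before the command
    value       -- the text after `nodes=`
    known_nodes -- names of all nodes in the cluster (used for ranges)
    """
    value = value.strip()
    if value.startswith("+"):
        # Add to the existing list without repeating names already present.
        added = value[1:].split()
        return list(current) + [n for n in added if n not in current]
    if value.startswith("-"):
        # Remove the named nodes from the existing list.
        removed = set(value[1:].split())
        return [n for n in current if n not in removed]
    if ":" in value:
        # Range: every known node whose name falls between the two bounds.
        low, high = (part.strip() for part in value.split(":", 1))
        return sorted(n for n in known_nodes if low <= n <= high)
    # Plain list: the value defines the entire node list.
    return value.split()


if __name__ == "__main__":
    all_nodes = ["node0", "node1", "node2", "node3", "node4"]
    print(apply_nodes_value(["node0", "node1"], "+node2 node3", all_nodes))
    print(apply_nodes_value([], "node0:node3", all_nodes))
```

Note that under plain string comparison a name such as node10 falls between node1 and node2, so a range can include more nodes than a numeric reading of the names would suggest.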

Setting the nodes attribute of an enabled partition has the side effect of setting the partition attribute of the corresponding nodes. Continuing the example, setting the nodes attribute of part1 affects the partition attribute of node2:

[node0] P(part1):: node node2
[node0] N(node2):: show
        set partition = part1
[node0] N(node2)::

A node cannot be a member of more than one enabled partition. If you try to add a node that is already in an enabled partition, mpadmin returns an error message.

[node0] P(part1):: show
        set nodes = node0 node1 node2 node3
        set enabled
[node0] P(part1):: current part0
[node0] P(part0):: set enabled
[node0] P(part0):: set nodes=node1
mpadmin: node1 must be removed from part1 before it
can be added to part0

Unsetting the nodes attribute of an enabled partition has the side effect of unsetting the partition attribute of the corresponding nodes.

Unsetting the nodes attribute of a disabled partition removes the nodes from the partition but does not change their partition attributes.

shmem_minfree

Use the shmem_minfree attribute to reserve some portion of the /tmp file system for non-MPI use.

For example, if /tmp is 1 Gbyte and shmem_minfree is set to 0.2, any time free space on /tmp drops below 200 Mbytes (1 Gbyte * 0.2), programs using the MPI shared memory protocol will not be allowed to run.

[node0] P(part1):: set shmem_minfree=0.2

shmem_minfree must be set to a value between 0.0 and 1.0. When shmem_minfree is unset, it defaults to 0.1.

This attribute can be set on both nodes and partitions. If both are set, the node's shmem_minfree attribute overrides the partition's shmem_minfree attribute.

Enabling Partitions

A partition must be enabled before users can run programs on it.

Prerequisite

Before enabling a partition, you must disable any partitions that share nodes with the partition that you are about to enable.

Setting enabled

To enable a partition, set its enabled attribute.

[node0] P(part0):: set enabled
[node0] P(part0)::

Now the partition is ready for use.

Enabling a partition has the side effect of setting the partition attribute of every node in that partition.

If you try to enable a partition that shares a node with another enabled partition, mpadmin prints an error message.

[node0] P(part1):: show
        set nodes = node1 node2 node3
        set enabled
[node0] P(part1):: current part2
[node0] P(part2):: show
        set nodes = node1
[node0] P(part2):: set enabled
mpadmin: part1/node1: partition resource conflict
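The rule that no node can belong to more than one enabled partition, together with the side effect of enabling on each node's partition attribute, can be modeled with the short Python sketch below. It illustrates the behavior described in this section and is not the CRE's implementation; the class and method names are hypothetical. Clearing the node's partition attribute when a partition is disabled is an assumption, based on the statement that the attribute names the enabled partition if one exists.

```python
class PartitionConflict(Exception):
    """Raised when enabling a partition would share a node with another enabled one."""


class Cluster:
    def __init__(self):
        self.partitions = {}      # partition name -> {"nodes": [...], "enabled": bool}
        self.node_partition = {}  # node name -> name of its enabled partition

    def create(self, name, nodes=()):
        # New partitions start out disabled, as described under "enabled".
        self.partitions[name] = {"nodes": list(nodes), "enabled": False}

    def enable(self, name):
        part = self.partitions[name]
        # Check every node before changing anything.
        for node in part["nodes"]:
            owner = self.node_partition.get(node)
            if owner is not None and owner != name:
                raise PartitionConflict(f"{owner}/{node}: partition resource conflict")
        part["enabled"] = True
        for node in part["nodes"]:
            self.node_partition[node] = name

    def disable(self, name):
        part = self.partitions[name]
        part["enabled"] = False
        # Assumption: the node no longer reports this partition once it is disabled.
        for node in part["nodes"]:
            if self.node_partition.get(node) == name:
                del self.node_partition[node]


if __name__ == "__main__":
    cluster = Cluster()
    cluster.create("part1", ["node1", "node2", "node3"])
    cluster.create("part2", ["node1"])
    cluster.enable("part1")
    try:
        cluster.enable("part2")
    except PartitionConflict as err:
        print(f"mpadmin: {err}")
    cluster.disable("part1")   # the prerequisite: disable the overlapping partition
    cluster.enable("part2")
    print(cluster.node_partition)
```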

Disabling Partitions

To disable a partition, unset its enabled attribute.

[node0] P(part0):: unset enabled
[node0] P(part0)::

Now the partition can no longer be used.

Any jobs that are running on a partition when it is disabled will continue to run. After disabling a partition, you should either wait for any running jobs to terminate or stop them using the mpkill command. The mpkill command is described in the Sun HPC Cluster Runtime Environment 1.0 User's Guide.

Deleting Partitions

Delete a partition when you don't plan to use it anymore.


Note -

Although it is possible to delete a partition without first disabling it, you should disable the partition by unsetting its enabled attribute before deleting it.


To delete a partition, use the delete command in the context of the partition you want to delete.

[node0] P(part0):: delete
[node0] Partition::

Setting Custom Attributes

Sun HPC ClusterTools does not limit you to the predefined attributes listed in this chapter. You can define new attributes as desired.

For example, if a node has a special resource that will not be flagged by an existing attribute, you may want to set an attribute that identifies that special characteristic. In the following example, node node3 has a frame buffer attached. This feature is captured by setting the custom attribute has_frame_buffer for that node.

[node0] N(node3):: set has_frame_buffer
[node0] N(node3)::

Users can then use the attribute has_frame_buffer to request a node that has a frame buffer when they execute programs.

See "Naming Partitions and Custom Attributes" for restrictions on attribute names.