This chapter describes the CRE cluster administration interface, mpadmin. Topics covered include:
mpadmin syntax, the subcommands it supports, and other aspects of mpadmin functionality
how to use mpadmin in performing various cluster administration tasks
The mpadmin command has six optional arguments, as follows:
# mpadmin [-c command] [-f filename] [-h] [-q] [-s cluster_name] [-V]
When you invoke mpadmin with the -c, -f, -h, or -V option, mpadmin performs the requested operation and then returns to the shell level. For command arguments, you can specify most of the subcommands that are available within the mpadmin interactive environment. See "Command-Line Options" for descriptions of the mpadmin command-line options.
When you invoke mpadmin with the -q or -s option or no option, mpadmin goes into the interactive mode, displaying the mpadmin prompt. In this mode, you can execute any number of mpadmin subcommands until you quit the interactive session. See "mpadmin Command Overview" for a description of the interactive mpadmin mode.
For the rest of this discussion, mpadmin subcommands will be referred to as mpadmin commands or simply as commands.
Table 6-1 provides summary definitions of the mpadmin command-line options. This section describes their use.
Table 6-1 mpadmin Options
Option | Description |
---|---|
-c command | Execute single specified command. See |
-f file-name | Take input from specified file. |
-h | Display help/usage text. |
-q | Suppress the display of a warning message when a non-root user attempts to use restricted command mode. |
-s cluster-name | Connect to the specified Sun HPC cluster. |
-V | Display mpadmin version information. |
Use the -c option when you want to execute a single mpadmin command and return automatically to the shell prompt. For example, the following use of mpadmin -c changes the location of the CRE log file to /home/wmitty/cre_messages:
# mpadmin -c set logfile="/home/wmitty/cre_messages"
Most commands that are available via the interactive interface can be invoked via the -c option. See "mpadmin Command Overview" for an overview of the mpadmin command set and a list of which commands can be used as arguments to the -c option.
Use the -f option to supply input to mpadmin from the file specified by the file-name argument.
The -h option displays help information about mpadmin.
Use the -q option to suppress a warning message when a non-root user attempts to invoke a restricted command.
Use the -s option to connect to the cluster specified by the cluster-name argument.
Use the -V option to display the version of mpadmin.
Before examining the set of mpadmin commands further, it will be useful to understand three concepts that are central to the mpadmin interface: objects, attributes, and contexts.
From the perspective of mpadmin, a Sun HPC cluster consists of a system of objects, which include
The cluster itself.
Each node contained in the cluster.
Each partition (logical group of nodes) defined in the cluster.
The network interfaces used by the nodes.
Each type of object has a set of attributes whose values can be operated on via mpadmin commands. These attributes control various aspects of their respective objects, such as: whether a node is enabled or disabled (that is, whether it can be used or not), the names of partitions, and which nodes a partition contains.
The CRE sets most cluster object attributes to default values each time it boots up. With few exceptions, do not change these system-defined values.
mpadmin Contexts
mpadmin commands are organized into four contexts, which correspond to the four types of mpadmin objects. These contexts are illustrated in Figure 6-1 and summarized below.
Cluster - These commands affect cluster attributes.
Node - These commands affect node attributes.
Network - These commands affect network interface attributes.
Partition - These commands affect partition attributes.
Except for Cluster, each context is nested in a higher context: Node within Cluster, Partition within Cluster, and Network within Node.
The mpadmin prompt uses one or more fields to indicate the current context. Table 6-2 shows the prompt format for each of the possible mpadmin contexts.
Table 6-2 mpadmin Prompt Formats
Prompt Formats | Context |
---|---|
[cluster-name]:: | Current context = Cluster. |
[cluster-name]Node:: | Current context = Node, but not a specific node. |
[cluster-name]N(node-name):: | Current context = a specific node. |
[cluster-name]Partition:: | Current context = Partition, but not a specific partition. |
[cluster-name]P(partition-name):: | Current context = a specific partition. |
[cluster-name]N(node-name) Network:: | Current context = Network Interface, but not a specific network interface. |
[cluster-name]N(node-name) I(net-if-name):: | Current context = a specific network interface. |
When the prompt indicates a specific network interface, it uses I as the abbreviation for Network Interface to avoid being confused with the Node abbreviation N.
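The prompt formats in Table 6-2 can be expressed as a small function. The following Python sketch is illustrative only and is not part of mpadmin; the function name mpadmin_prompt and its parameters are hypothetical, and the strings follow Table 6-2 literally (the interactive examples in this chapter show a space after the cluster name).

```python
def mpadmin_prompt(cluster, context="cluster", node=None, partition=None, interface=None):
    """Return the prompt string that Table 6-2 gives for a context (hypothetical helper)."""
    if context == "cluster":
        return f"[{cluster}]::"
    if context == "node":
        # A specific node is abbreviated N(...); otherwise the generic Node prompt is used.
        return f"[{cluster}]N({node})::" if node else f"[{cluster}]Node::"
    if context == "partition":
        return f"[{cluster}]P({partition})::" if partition else f"[{cluster}]Partition::"
    if context == "network":
        # Network is nested within a specific node, so a node name is required.
        if node is None:
            raise ValueError("the Network context requires a node name")
        if interface:
            # I(...) is used for interfaces so it is not confused with N(...).
            return f"[{cluster}]N({node}) I({interface})::"
        return f"[{cluster}]N({node}) Network::"
    raise ValueError(f"unknown context: {context}")


print(mpadmin_prompt("node0", "partition", partition="part0"))  # [node0]P(part0)::
```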
mpadmin provides commands for performing the following operations:
Configuration control - These commands are used to create and delete mpadmin objects (nodes, partitions, network interfaces); see "Configuration Control".
Attribute control - These commands are used to set and reset attribute values; see "Attribute Control".
Context navigation - These commands are used to change the current context to a different context; see "Context Navigation".
Information retrieval - These commands are used to display object and attribute information; see "Information Retrieval".
Miscellaneous - See "Miscellaneous Commands".
A Sun HPC cluster contains one or more named partitions. Each partition contains some number of specific nodes. Likewise, each node includes one or more network interfaces that it uses for internode communication.
The CRE automatically creates the cluster, node, and network interface objects based on the contents of the hpc.conf file. Partitions are the only kind of object that the system administrator is required to create and manage.
Use the delete command to remove partitions, but no other types of cluster objects. You remove nodes and network interfaces from a Sun HPC cluster by editing the hpc.conf file.
Usage:
:: create object-name
Available In:
Node, Partition, Network
The create command creates a new object with the name object-name and makes the new object the current context.
Note, partitions can only be created from within the Partition context. The following example creates the partition part0.
[node0] Partition:: create part0
[node0] P(part0)::
As the second line in the example shows, part0 becomes the new context.
Usage:
:: delete [object-name]
Available In:
Node, Partition, Network
The delete command deletes the object specified by the object-name argument. The object being deleted must either be contained in the current context or must be the current context. The first example shows a partition contained in the current context being deleted.
[node0] Partition:: delete part0
[node0] Partition::
If the current context is the object to be deleted, the object-name argument is optional. In this case, the context reverts to the next higher context level.
[node0] P(part0):: delete
[node0] Partition::
Each mpadmin object has a set of attributes that can be modified. Use the set command to specify a value for a given attribute. Use unset to delete an attribute.
Although you can use the set and unset commands to change any cluster attribute, the CRE requires most attributes to have their default values. Be certain to limit your attribute changes to those described in this chapter.
Usage:
:: set attribute[=value]
Available In:
Cluster, Node, Partition, Network
The set command sets the specified attribute of the current object.
You must be within the context of the target object to set its attributes. For example, to change an attribute of a specific partition, you must be in that partition's context.
To set a literal or numeric attribute, specify the desired value. The following example sets the node attribute for partition part0. Setting a partition's node attribute identifies the set of nodes that are members of that partition.
[node0] P(part0):: set node=node1 node2
[node0] P(part0)::
To change the value of an attribute that has already been set, simply set it again. The following example adds node3 to partition part0.
[node0] P(part0):: set node=+node3
[node0] P(part0)::
As shown by this example, if the value of an attribute is a list, items can be added to or removed from the list using the + and - symbols, without repeating items that are already part of the list.
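The list semantics described above can be modeled in a few lines. This Python sketch is an illustration under stated assumptions, not mpadmin's implementation: the function name apply_list_update is hypothetical, a value with no + or - prefixes is assumed to replace the whole list, and an unprefixed item mixed in with prefixed ones is assumed to be added.

```python
def apply_list_update(current, value):
    """Return the new list after a 'set attribute=value' on a list attribute (sketch)."""
    tokens = value.split()
    # No + or - prefixes: assume the value replaces the whole list.
    if not any(token[0] in "+-" for token in tokens):
        return tokens
    result = list(current)
    for token in tokens:
        if token.startswith("-"):
            name = token[1:]
            if name in result:
                result.remove(name)
        else:
            # "+name" adds; an unprefixed name here is also treated as an addition (assumption).
            name = token[1:] if token.startswith("+") else token
            if name not in result:
                result.append(name)
    return result


members = apply_list_update([], "node1 node2")   # set node=node1 node2
members = apply_list_update(members, "+node3")   # set node=+node3
print(members)  # ['node1', 'node2', 'node3']
```

Because names must start with a letter, a leading + or - can be read as an operator without ambiguity.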
To set a Boolean attribute, specify the name of the Boolean attribute to be activated. Do not include =value in the expression. The following example enables partition part0.
[node0] P(part0):: set enabled
[node0] P(part0)::
If you mistakenly set a Boolean attribute to a value--that is, if you follow a Boolean attribute's name with the =value field, mpadmin will ignore the value assignment and will simply consider the attribute to be active.
Usage:
:: unset attribute
Available In:
Cluster, Node, Partition, Network
The unset command deletes the specified attribute from the current object. You must be within the context of an object to unset any of its attributes.
Example:
[node0] P(part0):: unset enabled
[node0] P(part0)::
disables the partition part0 (that is, makes it unavailable for use).
Remember, you cannot use the set command to set Boolean attributes to the logical 0 (inactive) state. You must use the unset command.
By default, mpadmin commands affect objects that are in the current context--that is, objects that are in the same context in which the command is invoked. For example, if the command list is invoked in the Node context, mpadmin will list all the nodes in the cluster. If list is invoked in the Partition context, it will list all the partitions in the cluster, as shown below:
[node0] Partition:: list
part0
part1
part2
[node0] Partition::
mpadmin provides several context navigation commands that enable you to operate on objects and attributes outside the current context.
Usage:
:: current object-name
Available In:
Cluster, Node, Partition, Network
The current command changes the current context to the context of the object specified by object-name. The target object must exist. That is, if it is a partition, you must already have used the create command to create it. If the target object is a cluster, node, or network interface, it must have been created by the CRE.
The following example changes the current context from the general Node context to the context of a specific node, node1.
[node0] Node:: current node1
[node0] N(node1)::
If the name of the target object does not conflict with an mpadmin command, you can omit the current command. This is illustrated by the following example, where node1 is the name of the target object.
[node0] Node:: node1
[node0] N(node1)::
This works even when the object is in a different context.
[node0] Partition:: node1
[node0] N(node1)::
The current command must be used when the name of the object is the same as an mpadmin command. For example, if you have a partition named Partition, its name conflicts with the command Partition. In this case, to make the object Partition the current context, you would need to include the current command to make it clear that the Partition term refers to the object and is not an invocation of the command.
Usage:
:: top
Available In:
Node, Partition, Network
The top command moves you to the Cluster context. The following example moves from the Partition context to the Cluster context.
[node0] Partition:: top
[node0]::
Usage:
:: up
Available In:
Node, Partition, Network
The up command moves you up one level from the current context. The following example moves from the Network context to the context of node node2.
[node0] N(node2) Network:: up
[node0] N(node2)::
Usage:
:: node
Available In:
Cluster
The node command moves you from the Cluster context to the Node context.
[node0]:: node
[node0] Node::
Usage:
:: partition
Available In:
Cluster, Node, Network
The partition command moves you from the Cluster, Node, or Network context to the Partition context.
[node0]:: partition
[node0] Partition::
Usage:
:: network
Available In:
Node
The network command moves you from a specific Node context to the Network context associated with that node.
[node0] N(node2):: network
[node0] N(node2) Network::
This set of commands displays information about
The specified object.
If no object is specified, the current context.
Usage:
:: dump [object-name]
Available In:
Cluster, Node, Partition
The dump command displays the current state of the attributes of the specified object or of the current context. The object can be
The entire cluster.
A specific partition.
All partitions in the cluster.
A specific node.
All nodes in the cluster.
The dump command outputs objects in a specific order that corresponds to the logical order of assignment when a cluster is configured. For example, nodes are output before partitions because, when a cluster is configured, nodes must exist before they can be assigned to a partition.
The dump command executes in this hierarchical manner so it can be used to back up cluster configurations in a format that allows them to be easily restored at a later time.
The following example shows the dump command being used in this way. In this example, it is invoked using the -c option on the mpadmin command line, with the output being directed to a backup file.
# mpadmin -c dump > sunhpc.configuration
Later, when it was time to restore the configuration, mpadmin could read the backup file as input, using the -f option.
# mpadmin -f sunhpc.configuration
If you wanted to modify the configuration, you could edit the backup file before restoring it.
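If you drive this backup and restore cycle from a script, the two command lines shown above can be wrapped as follows. This Python sketch is only an illustration: the helper names are hypothetical, it uses only the -c dump and -f invocations already shown, and it assumes mpadmin is on the PATH of a user allowed to run it.

```python
import subprocess


def dump_argv():
    """Argument list equivalent to: mpadmin -c dump"""
    return ["mpadmin", "-c", "dump"]


def restore_argv(backup_path):
    """Argument list equivalent to: mpadmin -f <backup_path>"""
    return ["mpadmin", "-f", backup_path]


def backup_configuration(backup_path="sunhpc.configuration"):
    # Redirect the dump output to the backup file, like "> sunhpc.configuration".
    with open(backup_path, "w") as out:
        subprocess.run(dump_argv(), stdout=out, check=True)


def restore_configuration(backup_path="sunhpc.configuration"):
    # Feed the saved set/unset commands back to mpadmin as a script.
    subprocess.run(restore_argv(backup_path), check=True)
```

Nothing runs until backup_configuration or restore_configuration is called, and check=True raises an exception if mpadmin exits with a nonzero status.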
The following example shows the dump command being used to output the attribute states of the partition part0.
[node0] Partition:: dump part0
set nodes = node1 node2 node3
set max_total_procs = 4
set name = part0
set enabled
unset no_login
[node0] Partition::
Each attribute is output in the form of a set or unset command so that the dump output functions as a script.
If you are within the context of the object whose attributes you want to see, you don't have to specify its name.
[node0] P(part0):: dump
set nodes = node1 node2 node3
set max_total_procs = 4
set enabled
set name = part0
[node0] P(part0)::
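The idea that dump output doubles as a script can be illustrated by generating the same kind of lines from a description of an object's attributes. The Python sketch below is a model for explanation only; the function name dump_lines and the split between value attributes and Boolean attributes are assumptions, not the CRE's internal representation.

```python
def dump_lines(values, booleans):
    """Render attributes as set/unset commands, in the style of dump output (sketch)."""
    lines = []
    for name, value in values.items():
        # List-valued attributes are written as space-separated items.
        if isinstance(value, (list, tuple)):
            value = " ".join(value)
        lines.append(f"set {name} = {value}")
    for name, active in booleans.items():
        # Active Booleans become "set name"; inactive ones become "unset name".
        lines.append(f"set {name}" if active else f"unset {name}")
    return lines


part0 = dump_lines(
    {"nodes": ["node1", "node2", "node3"], "max_total_procs": 4, "name": "part0"},
    {"enabled": True, "no_login": False},
)
print("\n".join(part0))
```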
Usage:
:: list
Available In:
Cluster, Node, Partition, Network
The list command lists all of the defined objects in the current context. The following example shows that there are three partitions defined in the Partition context.
[node0] Partition:: list
part0
part1
part2
[node0] Partition::
Usage:
:: show [object-name]
Available In:
Cluster, Node, Partition, Network
The show command displays the current state of the attributes of the specified object object-name. The following example displays the attributes for the partition part0.
[node0] Partition:: show part0
set nodes = node0 node1 node2 node3
set max_total_procs = 4
set name = part0
set enabled
unset no_login
[node0] Partition::
If the object whose attributes you want to see is in the current context, you don't have to specify its name. For example:
[node0] P(part0):: show
set nodes = node0 node1 node2 node3
set max_total_procs = 4
set enabled
set name = part0
[node0] P(part0)::
Usage:
:: connect cluster-name
Available In:
Cluster
In order to access any objects or attributes in a Sun HPC cluster, you must be connected to the cluster.
However, connecting to a cluster ordinarily happens automatically, so you are not likely to ever need to use the connect command.
The environment variable SUNHPC_CLUSTER names a default cluster. If no other action is taken to override this default, any mpadmin session will connect to the cluster named by this environment variable.
If you issue the mpadmin command on a node that is part of a cluster, you are automatically connected to that cluster, regardless of the SUNHPC_CLUSTER setting.
If you are not logged in to the cluster you want to use and you do not want to use the default cluster, you can use the mpadmin -s option, specifying the name of the cluster of interest as an argument to the option. See "-s cluster-name - Connect to Specified Cluster" for a description of the -s option.
When the CRE creates a cluster, it always names it after the hostname of the cluster's master node--that is, the node on which the master daemons are running. Therefore, whenever you need to specify the name of a cluster, use the hostname of the cluster's master node.
If, for some reason, you want to use the connect command, see the following example. It shows the command being used to connect to a cluster whose master node is node0.
[hpc-demo]:: connect node0
[node0]::
Usage:
:: echo text-message
Available In:
Cluster, Node, Partition, Network
The echo command prints the specified text on the standard output. If you write a script to be run with mpadmin -f, you can include the echo command in the script so that it will print status information as it executes.
[node0]:: echo Enabling part0 and part1
Enabling part0 and part1
[node0]::
Usage:
:: help [command]
Available In:
Cluster, Node, Partition, Network
When invoked without a command argument, the help command lists the mpadmin commands that are available within the current context. The following example shows help being invoked at the Cluster level.
[node0]:: help
connect <cluster-name>    connect to a Sun HPC cluster
set <attribute>[=value]   set an attribute in the current context
unset <attribute>         delete an attribute in the current context
show                      show attributes in current context
dump                      show all objects on the cluster
node                      go to the node context
partition                 go to the partition context
echo ...                  print the rest of the line on standard output
quit                      quit mpadmin
help [command]            show information about command command
? [command]               show information about command command
[node0]::
To get a description of a particular command, enter the command name as an argument to help.
If you specify a context command (node, partition, or network), mpadmin lists the commands available within that context. Note that you can specify network as an argument to help only at the node level.
[node0]:: help node
current <node>            set the current node for future commands
create <node>             create a new node with the given name
delete [node]             delete a node
list                      list all the defined nodes
show [node]               show a node's attributes
dump [node]               show attributes for a node and its network interfaces
set <attribute>[=value]   set the current node's attribute
unset <attribute>         delete the current node's attribute
network                   enter the network interface command mode
up                        go up to the Cluster level command prompt
top                       go up to the Cluster level command prompt
echo ...                  print the rest of the line on standard output
help [command]            show information about command command
? [command]               show information about command command
[node0]::
The "?" character is a synonym for help.
Usage:
:: quit
:: exit
Available In:
Cluster, Node, Partition, Network
Entering either quit or exit causes mpadmin to terminate and return you to the shell level.
Example:
[node0]:: quit
#
Example:
[node0] N(node2):: exit
#
This section describes other functionality provided by mpadmin.
Because mpadmin interprets its input, if you issue more than one command on a line, mpadmin will execute them sequentially in the order they are input.
The following example shows how to display a list of nodes when not in the Node context. The node command switches to the Node context and the list command generates a list for that context.
[node0]:: node list
node0
node1
node2
node3
[node0] Node::
The following example sets the enabled attribute on partition part1. The part1 entry acts as a command that switches the context from part0 to part1 and the set command turns on the enabled attribute.
[node0] P(part0):: part1 set enabled
[node0] P(part1)::
You can abbreviate commands to the shortest string of at least two letters so long as it is still unique within the current context.
[node0] Node:: pa
[node0] Partition:: li
part0
part1
part2
part3
[node0] Partition:: part2
[node0] P(part2):: sh
set enabled
set max_total_procs = 4
set name = part2
set nodes = node0 node1
[node0] P(part2)::
The names of objects cannot be abbreviated.
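The abbreviation rule can be sketched as a prefix lookup. The following Python is an illustration, not mpadmin's parser: resolve_command and NODE_CONTEXT_COMMANDS are hypothetical names, and the command list is assembled from the Node-level commands described in this chapter.

```python
# Commands this chapter describes as available in the Node context (assembled for illustration).
NODE_CONTEXT_COMMANDS = [
    "current", "create", "delete", "list", "show", "dump", "set", "unset",
    "network", "partition", "up", "top", "echo", "help", "quit", "exit",
]


def resolve_command(word, commands):
    """Return the command that word names or abbreviates, or None if there is no unique match."""
    if word in commands:
        return word  # an exact command name always wins
    if len(word) < 2:
        return None  # abbreviations need at least two letters
    matches = [command for command in commands if command.startswith(word)]
    # The abbreviation is accepted only if it is unique in this context.
    return matches[0] if len(matches) == 1 else None


print(resolve_command("pa", NODE_CONTEXT_COMMANDS))  # partition
```

Object names are deliberately left out of the lookup, since they cannot be abbreviated.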
This section explains how to use mpadmin to perform the principal administrative tasks involved in setting up and maintaining a Sun HPC cluster. It consists of the following sections:
This section contains information about various mpadmin topics that you will find useful when reading about cluster administration tasks in later sections.
You can assign names to partitions and to custom attributes. Custom attributes are attributes that are not part of the default CRE database; they are discussed in "Setting Custom Attributes".
Names must start with a letter and are case sensitive. The following characters can be used:
ABCDEFGHIJKLMNOPQRSTUVWXYZ abcdefghijklmnopqrstuvwxyz 0123456789-_.
The only limit to name length is the limit imposed by Solaris on host names--it is ordinarily set at 256 characters.
Do not begin an attribute name with the characters mp_. This starting sequence is reserved by the CRE.
Nodes and partitions have separate name spaces. Thus, you can have a partition named Parallel that contains a node named Parallel.
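The naming rules above can be checked with a short validator. This Python sketch is illustrative and is not a CRE utility; the function name is_valid_name is hypothetical, and the 256-character default simply mirrors the ordinary limit mentioned above.

```python
import re

# A letter first, then letters, digits, hyphen, underscore, or period.
_NAME_PATTERN = re.compile(r"[A-Za-z][A-Za-z0-9._-]*")


def is_valid_name(name, is_attribute=False, max_length=256):
    """Check a partition or custom attribute name against the rules in this section (sketch)."""
    if len(name) > max_length:
        return False
    if _NAME_PATTERN.fullmatch(name) is None:
        return False
    # The mp_ prefix is reserved by the CRE for attribute names.
    if is_attribute and name.startswith("mp_"):
        return False
    return True


print(is_valid_name("part0"))                         # True
print(is_valid_name("mp_custom", is_attribute=True))  # False
```

Names are compared as written, so Parallel and parallel are treated as different names.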
It is assumed that you are logged in to a node that is part of the cluster you want to set up. If that is not the case, you must be connected to the target cluster through one of the following alternative methods:
If the node you are logged in to is not part of any cluster, set the SUNHPC_CLUSTER environment variable to the name of the target cluster. For example,
# setenv SUNHPC_CLUSTER node0
makes node0 the default cluster. Remember, a cluster's name is the same as the host name of its master node.
Once you are connected to the cluster, you can start using mpadmin to perform the administrative tasks described below.
When you start up an mpadmin interactive session, you begin at the Cluster level. Table 6-3 lists the mpadmin commands that can be used in the Cluster context.
Table 6-3 Cluster-Level mpadmin Commands
Command | Synopsis |
---|---|
connect cluster-name | Connect to a Sun HPC cluster named cluster-name. You will not need to use this command. |
show | Show cluster attributes. |
dump | Show all objects in the Sun HPC cluster. |
set attribute[=value] | Set a cluster-level attribute. |
unset attribute | Delete a cluster-level attribute. |
node | Enter the node context. |
partition | Enter the partition context. |
echo ... | Print the rest of the line on the standard output. |
quit / exit | Quit mpadmin. |
help [command] / ? | Show information about commands. |
This section describes various Cluster-level attributes that you may want to modify. Table 6-4 lists the attributes that can be changed in the Cluster context.
Table 6-4 Cluster-Level Attributes
Attribute | Kind | Description |
---|---|---|
default_interactive_partition | Value | Specifies the default partition. |
logfile | Value | Specifies an optional output file for logging CRE daemon error messages. |
administrator | Value | Specifies an email address for the system administrator(s). |
lock_max_age | Value | Specifies the maximum amount of time a lock can remain set (the value is in seconds). |
This attribute specifies the default partition for running MPI jobs. Its value is used by the command mprun, which is described in the Sun HPC Cluster Runtime Environment 1.0 User's Guide.
For example, to make a partition named part0 the default partition, enter the following in the Cluster context:
[node0]:: set default_interactive_partition=part0
When a user executes a program via mprun, the CRE decides where to run the program, based on the following criteria:
Check for the command-line -p option. If a partition is specified, execute the program in that partition. If the specified partition is invalid, the command will fail.
Check to see if the MPRUN_FLAGS environment variable specifies a default partition. If so, execute the program in that partition. If the specified partition is invalid, the command will fail.
Check to see if the SUNHPC_PART environment variable has a value set. If it specifies a default partition, execute the program in that partition. If the specified partition is invalid, then check to see if the user is logged into any partition. If so, execute the program in that partition.
Check to see if the user is logged into a partition. Execute the program in that partition.
If none of these checks yields a partition name, check for the existence of the default_interactive_partition attribute. If it specifies a partition, execute the program in that partition.
The SUNHPC_PART environment variable is described in "CRE Environment Variables". The MPRUN_FLAGS environment variable is described in the Sun MPI 4.0 User's Guide: With CRE.
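The order of checks listed above can be summarized in code. The following Python sketch models the decision only and is not how mprun is implemented: the function name choose_partition and its parameters are hypothetical, the partition named in MPRUN_FLAGS is assumed to have been extracted already, and an invalid SUNHPC_PART with no login partition is assumed to fall through to the cluster default.

```python
def choose_partition(valid_partitions, cli_partition=None, mprun_flags_partition=None,
                     sunhpc_part=None, login_partition=None,
                     default_interactive_partition=None):
    """Return the partition picked by the documented order of checks, or None (sketch)."""
    # 1. The -p command-line option: an invalid partition makes the command fail.
    if cli_partition is not None:
        if cli_partition not in valid_partitions:
            raise ValueError(f"invalid partition from -p: {cli_partition}")
        return cli_partition
    # 2. A partition named in MPRUN_FLAGS: an invalid partition also fails.
    if mprun_flags_partition is not None:
        if mprun_flags_partition not in valid_partitions:
            raise ValueError(f"invalid partition in MPRUN_FLAGS: {mprun_flags_partition}")
        return mprun_flags_partition
    # 3. SUNHPC_PART: an invalid value does not fail; it falls through to the login check.
    if sunhpc_part is not None and sunhpc_part in valid_partitions:
        return sunhpc_part
    # 4. The partition the user is logged in to, if any.
    if login_partition is not None:
        return login_partition
    # 5. The cluster's default_interactive_partition attribute (may be unset).
    return default_interactive_partition


partitions = {"part0", "part1"}
print(choose_partition(partitions, sunhpc_part="bogus", login_partition="part1"))  # part1
```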
The logfile attribute allows you to log CRE messages in a file separate from all other system messages. For example, if you enter
[node0]:: set logfile=/home/wmitty/cre-messages
CRE will output its messages to the file /home/wmitty/cre-messages. If logfile is not set, CRE messages will be passed to syslog, which will store them with other system messages in /var/adm/messages.
A full path name must be specified when setting the logfile attribute.
Set the administrator attribute to specify the email address of the system administrator. For example:
[node0]:: set administrator="root@example.com"
Note the use of double quotes.
The CRE uses locks for internal purposes. The lock_max_age attribute specifies the length of time that the CRE will wait before removing a lock. For example, to set the maximum lock interval to two minutes, enter the following:
[node0]:: set lock_max_age="2 minutes"
The default is 10 minutes.
Ordinarily, the only administrative action that you need to take with nodes is to enable them for use. Or, if you want to temporarily make a node unavailable for use, disable it.
Other node-related administrative tasks--such as, naming the nodes, identifying the master node, setting memory and process limits, and setting the node's partition attribute--are either handled by the CRE automatically or are controlled via partition-level attributes.
Network interface attributes require no administrative action. They are all controlled by the CRE. The only actions you might want to take with respect to network interfaces are to list them or display their attribute values.
Table 6-5 lists the mpadmin commands that can be used at the Node level.
Table 6-5 Node-Level mpadmin Commands
Command | Synopsis |
---|---|
current node | Set the context to the specified node for future commands. |
create node | Create a new node with the given name. |
delete [node] | Delete a node. |
list | List all the defined nodes. |
show [node] | Show a node's attributes. |
dump [node] | Show the attributes of the node and its network interfaces. |
set attribute[=value] | Set the specified attribute of the current node. |
unset attribute | Delete the specified attribute of the current node. |
network | Move to the network interface command level. |
up | Move to the next higher level (Top) command context. |
top | Move to the Top level command context. |
echo ... | Print the rest of the line on the standard output. |
help [command] | Show information about commands (?). |
Nodes are defined by many attributes, most of which are not accessible to mpadmin commands. Although you are not able to affect these attributes, it can be helpful to know of their existence and meaning; hence, they are listed and briefly described in Table 6-6.
Table 6-7 lists the Node-level attributes that can be set via mpadmin commands. However, the enabled and max_total_procs are the only node attributes that you can safely modify. See "enabled " and "max_total_procs" for details.
Table 6-6 Node Attributes That Cannot Be Set by the System Administrator
Table 6-7 Node Attributes That Can Be Set by the System Administrator
Attribute | Kind | Description |
---|---|---|
enabled | Boolean | Set if the node is enabled, that is, if it is ready to accept jobs. |
 | Boolean | Specify node on which the master daemons are running as an argument to mprun. |
max_locked_mem | Value | Maximum amount of shared memory allowed to be locked down by Sun MPI processes (in Kbytes). |
max_total_procs | Value | Maximum number of Sun HPC processes per node. |
min_unlocked_mem | Value | Minimum amount of shared memory not to be locked down by Sun MPI processes (in Kbytes). |
name | Value | Name of the node; this is predefined and must not be set via mpadmin. |
partition | Value | Partition of which node is a member. |
shmem_minfree | Value | Fraction of swap space kept free for non-MPI use. |
The attribute enabled is set by default when the CRE daemons start up on a node. Unsetting it prevents new jobs from being spawned on the node.
A partition can list a node that is not enabled as a member. However, jobs will execute on that partition as if that node were not a member.
You must not change this node attribute. The CRE automatically sets it to the hostname of the node on which the master CRE daemons are running. This happens whenever the CRE daemons start.
You should not change these node attributes. They are described here so that you will be able to interpret their values when node attributes are displayed via the dump or show commands.
The max_locked_mem and min_unlocked_mem attributes limit the amount of shared memory available to be locked down for use by Sun MPI processes. Locking down shared memory guarantees maximum speed for Sun MPI processes by eliminating delays caused by swapping memory to disk. However, locking physical memory can have undesirable side effects because it prevents that memory from being used by other processes on the node.
Solaris provides two related tunable kernel parameters:
tune_t_minasmem, which is similar to min_unlocked_mem
pages_pp_maximum, which is similar to max_locked_mem.
The CRE parameters impose limits only on MPI programs, while the kernel parameters limit all processes. Also, the kernel parameter units are pages rather than Kbytes. Refer to your Solaris documentation for more information about tune_t_minasmem and pages_pp_maximum.
You limit the number of mprun processes allowed to run concurrently on a node by setting this attribute to an integer.
[node0] P(part0):: set max_total_procs=10
[node0] P(part0)::
By default, max_total_procs is unset. The CRE does not impose any limit on the number of processes allowed on a node.
A node's name is predefined by the hpc.conf file. You must not change it by setting this attribute.
There is no need to set this attribute. The CRE sets it automatically if the node is included in any partition configuration(s). See "Creating Partitions" for additional details.
A node can belong to multiple partitions, but only one of those partitions can be enabled at a time. No matter how many partitions a node belongs to, the partition attribute shows only one partition name--that name is always the name of the enabled partition, if one exists for that node.
You should not change this node attribute. It is described here so that you will be able to interpret its value when node attributes are displayed via the dump or show commands.
The shmem_minfree attribute reserves some portion of the /tmp file system for non-MPI use.
For example, if /tmp is 1 Gbyte and shmem_minfree is set to 0.2, any time free space on /tmp drops below 200 Mbytes (1 Gbyte * 0.2), programs using the MPI shared memory protocol will not be allowed to run.
[node0] N(node1):: set shmem_minfree=0.2
shmem_minfree must be set to a value between 0.0 and 1.0. When shmem_minfree is unset, it defaults to 0.1.
This attribute can be set on both nodes and partitions. If both are set to different values, the node attribute overrides the partition attribute.
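The threshold arithmetic and the node-over-partition precedence can be worked through in a short sketch. The Python below is illustrative only; the function names are hypothetical, sizes are given in Mbytes with 1 Gbyte taken as 1000 Mbytes to match the example above, and the 0.1 default is the unset value stated above.

```python
def effective_shmem_minfree(node_value=None, partition_value=None, default=0.1):
    """Pick the value that applies: node setting first, then partition, then the default."""
    if node_value is not None:
        return node_value
    if partition_value is not None:
        return partition_value
    return default


def mpi_shared_memory_allowed(tmp_size_mbytes, tmp_free_mbytes, shmem_minfree):
    """Return False when free space on /tmp has dropped below the reserved fraction."""
    if not 0.0 <= shmem_minfree <= 1.0:
        raise ValueError("shmem_minfree must be between 0.0 and 1.0")
    reserved_mbytes = tmp_size_mbytes * shmem_minfree  # for example, 1000 * 0.2 = 200
    return tmp_free_mbytes >= reserved_mbytes


fraction = effective_shmem_minfree(node_value=0.2, partition_value=0.5)
print(mpi_shared_memory_allowed(1000, 150, fraction))  # False: 150 Mbytes is below 200
```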
If you permanently remove a node from the Sun HPC cluster, you should then delete the corresponding node object from the CRE resource database.
Before deleting a node, you should first:
Remove it from any enabled partition by unsetting its partition attribute (automatically removing the node from the partition's nodes attribute list), or by removing it from the partition's nodes attribute list. See "Partition Attributes" for details.
Wait for any jobs running on it to terminate, or stop them using the mpkill command, which is described in the Sun HPC Cluster Runtime Environment 1.0 User's Guide.
To delete a node, use the delete command within the context of the node you want to delete.
[node0] N(node3):: delete
[node0] Node::
Partitions are logical collections of nodes that work cooperatively to run programs on the Sun HPC cluster. An MPI job can run on a single partition or on the combination of a single partition and one or more nodes that are not members of any partition. MPI jobs cannot run in multiple partitions.
You must create a partition and enable it before you can run MPI programs on your Sun HPC cluster. Once a partition is created, you can configure it to meet the specific needs of your site and enable it for use.
Once a partition is created and enabled, you can run serial or parallel jobs on it. Serial programs run on a single node of a partition. Parallel programs run on any number of nodes of a partition in parallel.
The CRE performs load balancing on shared partitions. When you use mprun to execute a program on a shared partition, the CRE automatically runs it on the least-loaded nodes that satisfy any specified resource requirements.
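The idea of choosing the least-loaded nodes that satisfy a job's requirements can be illustrated with a small Python sketch. It is only a model of the behavior described above: the load figures, the satisfies predicate, and the function name pick_least_loaded are hypothetical, and this chapter does not describe the metric or algorithm the CRE actually uses.

```python
# Toy model of least-loaded node selection; not the CRE's scheduler.
def pick_least_loaded(nodes, count, satisfies=lambda name: True):
    """Pick `count` node names with the lowest load among eligible nodes.

    nodes     -- dict mapping node name to a load figure (hypothetical metric)
    satisfies -- predicate standing in for the job's resource requirements
    """
    eligible = [name for name in nodes if satisfies(name)]
    # Sort by load, then by name so that ties resolve deterministically.
    eligible.sort(key=lambda name: (nodes[name], name))
    if len(eligible) < count:
        raise ValueError("not enough eligible nodes")
    return eligible[:count]


loads = {"node0": 2.5, "node1": 0.3, "node2": 1.1, "node3": 0.3}
print(pick_least_loaded(loads, 2))
print(pick_least_loaded(loads, 1, satisfies=lambda name: name != "node1"))
```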
Partitions are mutable. That is, after you create and configure a partition, you can change it if your site requirements change. You can add nodes to a partition or remove them. You can change a partition's attributes. Also, since you can enable and disable partitions, you can have many partitions defined and use only a few at a time according to current needs.
There are no restrictions on the number or size of partitions, so long as no node is a member of more than one enabled partition.
Table 6-8 lists the mpadmin commands that can be used within the partition context.
Table 6-8 Partition-Level mpadmin Commands
Command | Synopsis |
---|---|
current partition | Set the context to the specified partition for future commands. |
create partition | Create a new partition with the given name. |
delete [partition] | Delete a partition. |
list | List all the defined partitions. |
show [partition] | Show a partition's attributes. |
dump [partition] | Show the attributes of a partition. |
set attribute[=value] | Set the current partition's attribute. |
unset attribute | Delete the current partition's attribute. |
up | Move up one level in the context hierarchy. |
top | Move to the top level in the context hierarchy. |
echo ... | Print the rest of the line on the standard output. |
help [command] | Show information about the specified command. |
? [command] | Show information about the specified command. |
Nodes have to exist in the CRE database before you can add them to partitions.
A node must be enabled for it to be an active member of a partition. If a node is configured as a partition member, but is not enabled, it will not participate in jobs that run on that partition.
Before creating a new partition, you might want to list the partitions that have already been created. To do this, use the list command from within the Partition context.
[node0] Partition:: list
part0
part1
[node0] Partition::
To create a partition, use the create command, followed by the name of the new partition. "Naming Partitions and Custom Attributes" discusses the rules for naming partitions.
For example:
[node0] Partition:: create part0
[node0] P(part0)::
The create command automatically changes the context to that of the new partition.
At this point, your partition exists by name but contains no nodes. You must assign nodes to the partition before using it. You can do this by setting the partition's nodes attribute. See "Configuring Partitions" for details.
You can configure partitions by setting and deleting their attributes using the set and unset commands. Table 6-9 shows commonly used partition attributes.
Table 6-9 Common Partitions and Their Attributes
Partition Type | Relevant Attributes | Recommended Value |
---|---|---|
Login | no_logins | not set |
Login | max_total_procs | not set or set greater than |
Dedicated | max_total_procs | =1 |
Dedicated | no_logins | set |
Serial | no_mp_tasks | set |
Parallel | no_mp_tasks | not set |
You can combine the attributes listed in Table 6-10 in any way that makes sense for your site. See "Configuring Partitions" for suggestions about how to configure your partitions.
Partitions, once created, can be enabled and disabled. This lets you define many partitions but use just a few at a time. For instance, you might want to define a number of shared partitions for development use and dedicated partitions for executing production jobs, but have only a subset available for use at a given time.
Table 6-10 lists the predefined partition attributes. To see their current values, use the mpadmin show command.
Table 6-10 Predefined Partition Attributes
Attribute | Kind | Description |
---|---|---|
enabled | Boolean | Set if the partition is enabled, that is, if it is ready to accept logins or jobs. |
max_locked_mem | Value | Maximum amount of shared memory allowed to be locked down by MPI processes (in Kbytes). |
max_total_procs | Value | Maximum number of simultaneously running processes allowed on each node in the partition. |
min_unlocked_mem | Value | Minimum amount of shared memory that may not be locked down by MPI processes (in Kbytes). |
name | Value | Name of the partition. |
no_logins | Boolean | Disallow logins. |
no_mp_tasks | Boolean | Disallow multiprocess parallel jobs. |
nodes | Value | List of nodes in the partition. |
shmem_minfree | Value | Fraction of swap space kept free for non-MPI use. |
Set the enabled attribute to make a partition available for use.
By default, the enabled attribute is not set when a partition is created.
You should not change these partition attributes. See "max_locked_mem and min_unlocked_mem" for a description of their effects.
To limit the number of simultaneously running mprun processes allowed on all nodes in a partition or in all partitions, set the max_total_procs attribute in a specific partition context or in the general Partition context.
[node0] P(part0):: set max_total_procs=10
[node0] P(part0)::
You can set max_total_procs if you want to limit the load on a partition. By default, max_total_procs is unset, and the CRE does not impose any limit on the number of processes allowed on a node.
The name attribute is set when a partition is created. To change the name of a partition, set its name attribute to a new name.
[node0] P(part0):: set name=part1
[node0] P(part1)::
See "Naming Partitions and Custom Attributes" for partition naming rules.
To prohibit users from logging in to a partition, set the no_logins attribute.
[node0] P(part1):: set no_logins
[node0] P(part1)::
To prohibit multiprocess parallel jobs from running on a partition (that is, to make it a serial partition), set the no_mp_tasks attribute.
[node0] P(part1):: set no_mp_tasks
[node0] P(part1)::
To specify the nodes that are members of a partition, set the partition's nodes attribute.
[node0] P(part1):: set nodes=node1
[node0] P(part1):: show
set nodes = node1
set enabled
[node0] P(part1)::
The value you give the nodes attribute defines the entire list of nodes in the partition. To add a node to an already existing node list without retyping the names of nodes that are already present, use the + (plus) character.
[node0] P(part1):: set nodes=+node2 node3
[node0] P(part1):: show
set nodes = node0 node1 node2 node3
set enabled
[node0] P(part1)::
Similarly, you can use the - (minus) character to remove a node from a partition.
To assign a range of nodes to the nodes attribute, use the : (colon) syntax. This example assigns to part1 all nodes whose names are alphabetically greater than or equal to node0 and less than or equal to node3:
[node0] P(part1):: set nodes = node0:node3
[node0] P(part1)::
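The three forms of the nodes value (full replacement, + or - prefix, and colon range) can be modeled with the following Python sketch. It is a reading of the rules stated above, not mpadmin's parser. Several details are assumptions: that a leading + or - applies to every name that follows it, that the - form mirrors the + form exactly, that ranges are resolved by plain string comparison against the known node names, and that the result is kept in sorted order. The function name apply_nodes_value is hypothetical.

```python
# Toy model of how a `set nodes=...` value changes a partition's node list.
def apply_nodes_value(current, value, known_nodes):
    """Return the node list that results from setting nodes to `value`."""
    value = value.strip()
    mode = "replace"
    if value.startswith("+"):
        mode, value = "add", value[1:]
    elif value.startswith("-"):
        mode, value = "remove", value[1:]

    named = set()
    for token in value.split():
        if ":" in token:
            # Range: every known node name between the two bounds, inclusive,
            # using alphabetical (string) comparison.
            low, high = token.split(":", 1)
            named.update(n for n in known_nodes if low <= n <= high)
        else:
            named.add(token)

    if mode == "add":
        result = set(current) | named
    elif mode == "remove":
        result = set(current) - named
    else:
        result = named
    return sorted(result)


known = ["node0", "node1", "node2", "node3", "node4"]
print(apply_nodes_value(["node0", "node1"], "+node2 node3", known))
print(apply_nodes_value(["node0", "node1", "node2"], "-node1", known))
print(apply_nodes_value([], "node0:node3", known))
```

Because the comparison in this model is purely alphabetical, a node named node10 would fall inside the range node0:node3. Keep that in mind when choosing node names if the CRE compares names the same way.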
Setting the nodes attribute of an enabled partition has the side effect of setting the partition attribute of the corresponding nodes. Continuing the example, setting the nodes attribute of part1 affects the partition attribute of node2:
[node0] P(part1):: node node2
[node0] N(node2):: show
set partition = part1
[node0] N(node2)::
A node cannot be a member of more than one enabled partition. If you try to add a node that is already in an enabled partition, mpadmin returns an error message.
[node0] P(part1):: show
set nodes = node0 node1 node2 node3
set enabled
[node0] P(part1):: current part0
[node0] P(part0):: set enabled
[node0] P(part0):: set nodes=node1
mpadmin: node1 must be removed from part1 before it can be added to part0
Unsetting the nodes attribute of an enabled partition has the side effect of unsetting the partition attribute of the corresponding node.
Unsetting the nodes attribute of a disabled partition removes the nodes from the partition but does not change their partition attributes.
Use the shmem_minfree attribute to reserve some portion of the /tmp file system for non-MPI use.
For example, if /tmp is 1 Gbyte and shmem_minfree is set to 0.2, any time free space on /tmp drops below 200 Mbytes (1 Gbyte * 0.2), programs using the MPI shared memory protocol will not be allowed to run.
[node0] P(part1):: set shmem_minfree=0.2
shmem_minfree must be set to a value between 0.0 and 1.0. When shmem_minfree is unset, it defaults to 0.1.
This attribute can be set on both nodes and partitions. If both are set, the node's shmem_minfree attribute overrides the partition's shmem_minfree attribute.
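The precedence between the node setting, the partition setting, and the default can be written out as a short Python sketch. This only restates the rule above in code. The function name effective_shmem_minfree and the use of None to represent an unset attribute are assumptions for illustration.

```python
# Toy model of which shmem_minfree value applies to a node; not CRE code.
def effective_shmem_minfree(node_value=None, partition_value=None, default=0.1):
    """Node setting wins, then the partition setting, then the default."""
    if node_value is not None:
        return node_value
    if partition_value is not None:
        return partition_value
    return default


print(effective_shmem_minfree(node_value=0.3, partition_value=0.2))  # 0.3
print(effective_shmem_minfree(partition_value=0.2))                  # 0.2
print(effective_shmem_minfree())                                     # 0.1
```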
A partition must be enabled before users can run programs on it.
Before enabling a partition, you must disable any partitions that share nodes with the partition that you are about to enable.
To enable a partition, set its enabled attribute.
[node0] P(part0):: set enabled
[node0] P(part0)::
Now the partition is ready for use.
Enabling a partition has the side effect of setting the partition attribute of every node in that partition.
If you try to enable a partition that shares a node with another enabled partition, mpadmin prints an error message.
[node0] P(part1):: show
set nodes = node1 node2 node3
set enabled
[node0] P(part1):: current part2
[node0] P(part2):: show
set nodes = node1
[node0] P(part2):: set enabled
mpadmin: part1/node1: partition resource conflict
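The rule that no node may belong to two enabled partitions can be expressed as a simple check, sketched here in Python. The dictionary layout and the function name enable_partition are hypothetical, and the error text is copied from the session above. The sketch does not claim to reproduce how mpadmin detects or reports conflicts internally.

```python
# Toy model of enabling a partition; not the CRE resource database.
def enable_partition(partitions, name):
    """Enable `name` unless it shares a node with another enabled partition.

    Returns the partition attribute that each member node would then show.
    """
    target = partitions[name]
    for other_name, other in partitions.items():
        if other_name == name or not other["enabled"]:
            continue
        shared = sorted(set(target["nodes"]) & set(other["nodes"]))
        if shared:
            raise ValueError(
                f"{other_name}/{shared[0]}: partition resource conflict"
            )
    target["enabled"] = True
    # Side effect described above: member nodes now point at this partition.
    return {node: name for node in target["nodes"]}


partitions = {
    "part1": {"nodes": ["node1", "node2", "node3"], "enabled": True},
    "part2": {"nodes": ["node1"], "enabled": False},
}
try:
    enable_partition(partitions, "part2")
except ValueError as err:
    print(f"mpadmin: {err}")

partitions["part1"]["enabled"] = False  # disable the conflicting partition first
print(enable_partition(partitions, "part2"))
```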
To disable a partition, unset its enabled attribute.
[node0] P(part0):: unset enabled
[node0] P(part0)::
Now the partition can no longer be used.
Any jobs that are running on a partition when it is disabled will continue to run. After disabling a partition, you should either wait for any running jobs to terminate or stop them using the mpkill command, which is described in the Sun HPC Cluster Runtime Environment 1.0 User's Guide.
Delete a partition when you don't plan to use it anymore.
Although it is possible to delete a partition without first disabling it, you should disable the partition by unsetting its enabled attribute before deleting it.
To delete a partition, use the delete command in the context of the partition you want to delete.
[node0] P(part0):: delete
[node0] Partition::
Sun HPC Cluster Tools does not limit you to the attributes listed. You can define new attributes as desired.
For example, if a node has a special resource that will not be flagged by an existing attribute, you may want to set an attribute that identifies that special characteristic. In the following example, node node3 has a frame buffer attached. This feature is captured by setting the custom attribute has_frame_buffer for that node.
[node0] N(node3):: set has_frame_buffer
[node0] N(node3)::
Users can then use the attribute has_frame_buffer to request a node that has a frame buffer when they execute programs.
See "Naming Partitions and Custom Attributes" for restrictions on attribute names.
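To show how a custom attribute can serve as a selection criterion, the following Python sketch models node objects as dictionaries of attributes and picks out the nodes on which a given attribute is set. The data layout and the function name nodes_with_attribute are hypothetical. The sketch does not show how users actually express such a request when they run programs.

```python
# Toy model of node objects and their attributes; not CRE code.
node_attrs = {
    "node1": {"partition": "part1"},
    "node2": {"partition": "part1"},
    # A Boolean attribute such as has_frame_buffer is modeled as simply present.
    "node3": {"partition": "part1", "has_frame_buffer": True},
}


def nodes_with_attribute(node_attrs, attribute):
    """Return the names of the nodes on which `attribute` is set."""
    return sorted(name for name, attrs in node_attrs.items() if attribute in attrs)


print(nodes_with_attribute(node_attrs, "has_frame_buffer"))  # ['node3']
```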