Ordinarily, the only administrative action that you need to take with nodes is to enable them for use or, if you want to make a node temporarily unavailable, to disable it.
Other node-related administrative tasks--such as naming the nodes, identifying the master node, setting memory and process limits, and setting the node's partition attribute--are either handled by the CRE automatically or are controlled via partition-level attributes.
Network interface attributes require no administrative action. They are all controlled by the CRE. The only actions you might want to take with respect to network interfaces are to list them or to display their attribute values.
Table 6-5 lists the mpadmin commands that can be used at the Node level.
Table 6-5 Node-Level mpadmin Commands
Command | Synopsis |
---|---|
current node | Set the context to the specified node for future commands. |
create node | Create a new node with the given name. |
delete [node] | Delete a node. |
list | List all the defined nodes. |
show [node] | Show a node's attributes. |
dump [node] | Show the attributes of the node and its network interfaces. |
set attribute[=value] | Set the specified attribute of the current node. |
unset attribute | Delete the specified attribute of the current node. |
network | Move to the network interface command level. |
up | Move to the next higher level (Top) command context. |
top | Move to the Top level command context. |
echo ... | Print the rest of the line on the standard output. |
help [command] | Show information about commands (?). |
Nodes are defined by many attributes, most of which are not accessible to mpadmin commands. Although you are not able to affect these attributes, it can be helpful to know of their existence and meaning; hence, they are listed and briefly described in Table 6-6.
Table 6-7 lists the Node-level attributes that can be set via mpadmin commands. However, enabled and max_total_procs are the only node attributes that you can safely modify. See "enabled" and "max_total_procs" for details.
Table 6-6 Node Attributes That Cannot Be Set by the System Administrator
Table 6-7 Node Attributes That Can Be Set by the System Administrator
Attribute | Kind | Description |
---|---|---|
enabled | Boolean | Set if the node is enabled, that is, if it is ready to accept jobs. |
master | Boolean | Specify node on which the master daemons are running as an argument to mprun. |
max_locked_mem | Value | Maximum amount of shared memory allowed to be locked down by Sun MPI processes (in Kbytes). |
max_total_procs | Value | Maximum number of Sun HPC processes per node. |
min_unlocked_mem | Value | Minimum amount of shared memory not to be locked down by Sun MPI processes (in Kbytes). |
name | Value | Name of the node; this is predefined and must not be set via mpadmin. |
partition | Value | Partition of which node is a member. |
shmem_minfree | Value | Fraction of swap space kept free for non-MPI use. |
The enabled attribute is set by default when the CRE daemons start up on a node. Unsetting it prevents new jobs from being spawned on the node.
A partition can list a node that is not enabled as a member. However, jobs will execute on that partition as if that node were not a member.
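The effect of a disabled member can be pictured with a small sketch. This is only an illustrative model, not CRE code; the function name and the node names are hypothetical.

```python
def schedulable_nodes(partition_nodes, enabled_nodes):
    """Return the partition members that can accept new jobs.

    Illustrative model only: a member that is not enabled is skipped,
    as if it were not listed in the partition at all.
    """
    enabled = set(enabled_nodes)
    # Preserve the partition's own ordering of its members.
    return [node for node in partition_nodes if node in enabled]


members = ["node0", "node1", "node2"]
# node1 is listed in the partition but is not enabled.
print(schedulable_nodes(members, enabled_nodes={"node0", "node2"}))
```

In this model the partition definition itself is left unchanged; only the set of nodes considered for new jobs shrinks.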
You must not change the master node attribute. The CRE automatically sets it to the hostname of the node on which the master CRE daemons are running. This happens whenever the CRE daemons start.
You should not change the max_locked_mem and min_unlocked_mem node attributes. They are described here so that you will be able to interpret their values when node attributes are displayed via the dump or show commands.
The max_locked_mem and min_unlocked_mem attributes limit the amount of shared memory available to be locked down for use by Sun MPI processes. Locking down shared memory guarantees maximum speed for Sun MPI processes by eliminating delays caused by swapping memory to disk. However, locking physical memory can have undesirable side effects because it prevents that memory from being used by other processes on the node.
Solaris provides two related tunable kernel parameters:
- tune_t_minasmem, which is similar to min_unlocked_mem
- pages_pp_maximum, which is similar to max_locked_mem
The CRE parameters impose limits only on MPI programs, while the kernel parameters limit all processes. Also, the kernel parameter units are pages rather than Kbytes. Refer to your Solaris documentation for more information about tune_t_minasmem and pages_pp_maximum.
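Because the two sets of parameters use different units, comparing them requires a conversion. The following is a minimal sketch of that arithmetic, assuming 1 Kbyte is 1024 bytes; the function name is hypothetical, and the 8192-byte page size is only an example value, since the actual page size depends on the platform.

```python
def kbytes_to_pages(kbytes: int, page_size_bytes: int) -> int:
    """Convert a size in Kbytes to a whole number of pages, rounding up.

    page_size_bytes is an assumption supplied by the caller; on a live
    system it could be read with os.sysconf("SC_PAGE_SIZE").
    """
    if kbytes < 0 or page_size_bytes <= 0:
        raise ValueError("kbytes must be >= 0 and page_size_bytes must be > 0")
    total_bytes = kbytes * 1024
    # Ceiling division: a partly used page still counts as a full page.
    return -(-total_bytes // page_size_bytes)


# 1024 Kbytes expressed in 8192-byte pages (example page size only).
print(kbytes_to_pages(1024, 8192))
```

The conversion only aligns the units. It does not make the CRE and kernel limits equivalent, because the kernel parameters apply to all processes.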
You limit the number of mprun processes allowed to run concurrently on a node by setting the max_total_procs attribute to an integer.
[node0] P(part0):: set max_total_procs=10
[node0] P(part0)::
By default, max_total_procs is unset, in which case the CRE does not impose any limit on the number of processes allowed on a node.
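The difference between a set and an unset limit can be sketched as follows. This is a simplified model, not the CRE's actual admission logic; the function name and the way running and requested processes are counted are assumptions made for illustration.

```python
from typing import Optional


def within_process_limit(running: int, requested: int,
                         max_total_procs: Optional[int] = None) -> bool:
    """Return True if `requested` more processes fit under the node limit.

    None models an unset max_total_procs, which means no limit is imposed.
    """
    if max_total_procs is None:
        return True
    return running + requested <= max_total_procs


print(within_process_limit(8, 2, max_total_procs=10))  # exactly at the limit
print(within_process_limit(8, 3, max_total_procs=10))  # one process too many
print(within_process_limit(500, 500))                  # limit unset
```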
A node's name is predefined by the hpc.conf file. You must not change it by setting the name attribute.
There is no need to set the partition attribute. The CRE sets it automatically if the node is included in any partition configuration. See "Creating Partitions" for additional details.
A node can belong to multiple partitions, but only one of those partitions can be enabled at a time. No matter how many partitions a node belongs to, the partition attribute shows only one partition name--that name is always the name of the enabled partition, if one exists for that node.
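The rule for which partition name is displayed can be modeled in a few lines. This is a hedged sketch with hypothetical names. The text above does not say what is shown when none of the node's partitions is enabled, so the sketch returns None for that case as a placeholder.

```python
from typing import Dict, Optional


def displayed_partition(memberships: Dict[str, bool]) -> Optional[str]:
    """Return the partition name a node's partition attribute would show.

    `memberships` maps each partition the node belongs to onto a flag
    saying whether that partition is enabled (illustrative model only).
    """
    enabled = [name for name, is_enabled in memberships.items() if is_enabled]
    if len(enabled) > 1:
        # Only one of a node's partitions can be enabled at a time.
        raise ValueError("more than one enabled partition: %s" % enabled)
    # Assumption: None stands in for the unspecified no-enabled-partition case.
    return enabled[0] if enabled else None


print(displayed_partition({"part0": False, "part1": True, "part2": False}))
```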
You should not change the shmem_minfree node attribute. It is described here so that you will be able to interpret its value when node attributes are displayed via the dump or show commands.
The shmem_minfree attribute reserves some portion of the /tmp file system for non-MPI use.
For example, if /tmp is 1 Gbyte and shmem_minfree is set to 0.2, any time free space on /tmp drops below 200 Mbytes (1 Gbyte * 0.2), programs using the MPI shared memory protocol will not be allowed to run.
[node0] N(node1):: set shmem_minfree=0.2
shmem_minfree must be set to a value between 0.0 and 1.0. When shmem_minfree is unset, it defaults to 0.1.
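The threshold arithmetic from the example above can be written out as a short sketch. It is an illustration of the calculation only, not of how the CRE performs the check; the function name and the use of Mbytes as the unit are assumptions.

```python
def shared_memory_allowed(tmp_total_mbytes: float, tmp_free_mbytes: float,
                          shmem_minfree: float = 0.1) -> bool:
    """Return True if free space on /tmp is at or above the reserved portion.

    The default of 0.1 mirrors the documented default for an unset
    shmem_minfree; sizes are in Mbytes for readability.
    """
    if not 0.0 <= shmem_minfree <= 1.0:
        raise ValueError("shmem_minfree must be between 0.0 and 1.0")
    reserved = tmp_total_mbytes * shmem_minfree
    return tmp_free_mbytes >= reserved


# /tmp is 1 Gbyte (1000 Mbytes) and shmem_minfree is 0.2: 200 Mbytes reserved.
print(shared_memory_allowed(1000, 250, shmem_minfree=0.2))  # above the reserve
print(shared_memory_allowed(1000, 150, shmem_minfree=0.2))  # below the reserve
```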
This attribute can be set on both nodes and partitions. If both are set to different values, the node attribute overrides the partition attribute.
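The precedence between the two levels can be sketched as a simple lookup. The names are hypothetical. Using the partition value when only the partition sets the attribute is an assumption, because the text above describes only the case where both are set.

```python
from typing import Optional

DEFAULT_SHMEM_MINFREE = 0.1  # documented default when the attribute is unset


def effective_shmem_minfree(node_value: Optional[float] = None,
                            partition_value: Optional[float] = None) -> float:
    """Resolve the value in effect for a node (illustrative model only)."""
    if node_value is not None:
        # The node attribute overrides the partition attribute.
        return node_value
    if partition_value is not None:
        # Assumption: with no node value, the partition value applies.
        return partition_value
    return DEFAULT_SHMEM_MINFREE


print(effective_shmem_minfree(node_value=0.2, partition_value=0.3))
print(effective_shmem_minfree(partition_value=0.3))
print(effective_shmem_minfree())
```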
If you permanently remove a node from the Sun HPC cluster, you should then delete the corresponding node object from the CRE resource database.
Before deleting a node, you should first:
- Remove it from any enabled partition by unsetting its partition attribute (automatically removing the node from the partition's nodes attribute list), or by removing it from the partition's nodes attribute list. See "Partition Attributes" for details.
- Wait for any jobs running on it to terminate, or stop them using the mpkill command, which is described in the Sun HPC Cluster Runtime Environment 1.0 User's Guide.
To delete a node, use the delete command within the context of the node you want to delete.
[node0] N(node3):: delete
[node0] Node::