Sun HPC ClusterTools 3.0 Administrator's Guide: With CRE

Creating and Enabling Partitions

You must create at least one partition and enable it before you can run MPI programs on your Sun HPC cluster. Even if your cluster already has the default partition all in its database, you will probably want to create other partitions with different node configurations to handle particular job requirements.

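There are three essential steps involved in creating and enabling a partition:

1. Use the create command in the Partition context to create and name the partition.
2. Set the partition's nodes attribute to the list of nodes that are to be its members.
3. Set the partition's enabled attribute.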

Once a partition is created and enabled, you can run serial or parallel jobs on it. A serial program will run on a single node of the partition. Parallel programs will be distributed to as many nodes of the partition as the CRE determines to be appropriate for the job. Job placement on a partition's nodes is discussed in the Sun MPI 4.0 User's Guide: With CRE.

Example: Creating a Two-Node Partition

The following example creates and enables a two-node partition named part0. It then lists the member nodes to verify the success of the creation.

node1# mpadmin
[node0]:: partition
[node0] Partition:: create part0
[node0] P[part0]:: set nodes=node0 node1
[node0] P[part0]:: set enabled
[node0] P[part0]:: list
    node0
    node1
[node0] P[part0]::
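
The command sequence in this session follows the same pattern for any partition name and node list. The Python sketch below is illustrative only: the helper name partition_commands is hypothetical, and the function merely builds the text that would be typed at the mpadmin prompts shown above. It does not invoke mpadmin or contact the CRE.

```python
def partition_commands(name, nodes, enable=True):
    """Return, in order, the commands typed in the example session.

    Hypothetical helper: it only assembles strings and does not run mpadmin.
    """
    if not nodes:
        raise ValueError("a partition needs at least one node")
    commands = [
        "partition",                     # enter the Partition context
        f"create {name}",                # create and name the partition
        "set nodes=" + " ".join(nodes),  # assign the member nodes
    ]
    if enable:
        commands.append("set enabled")   # enable the partition
    return commands


if __name__ == "__main__":
    for line in partition_commands("part0", ["node0", "node1"]):
        print(line)
```

Passing enable=False leaves out the final command, which corresponds to creating a partition that is to stay disabled for now.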

Note -

There are no restrictions on the number or size of partitions, so long as no node is a member of more than one enabled partition.


Example: Two Partitions Sharing a Node

The next example shows a second partition, part1, being created. One of its nodes, node1, is also a member of part0.

[node0] P[part0]:: up
[node0] Partition:: create part1
[node0] P[part1]:: set nodes=node1 node2 node3
[node0] P[part1]:: list
    node1
    node2
    node3
[node0] P[part1]::

Because node1 is shared with part0, which is already enabled, part1 is not being enabled at this time. This illustrates the rule that a node can be a member of more than one partition, but only one of those partitions can be enabled at a time.

If both partitions were enabled at the same time and you tried to run a job on either, the CRE would fail and return an error message. When you want to use part1, you will need to disable part0 first.
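
The rule that a node may belong to several partitions but to only one enabled partition can be checked before enabling a partition. The following Python sketch models that check using the node lists from the two examples. It is an assumption-laden illustration, not the CRE's own logic: the function name enable_conflicts and the dictionary layout are hypothetical.

```python
def enable_conflicts(partitions, enabled, candidate):
    """Return {node: other_partition} for each node of `candidate` that
    already belongs to a different enabled partition.

    Hypothetical model of the one-enabled-partition-per-node rule.
    """
    conflicts = {}
    for other in enabled:
        if other == candidate:
            continue
        for node in partitions[candidate]:
            if node in partitions[other]:
                conflicts[node] = other
    return conflicts


# Node membership taken from the two example sessions.
partitions = {
    "part0": ["node0", "node1"],
    "part1": ["node1", "node2", "node3"],
}

# part0 is enabled, so enabling part1 would conflict on node1.
print(enable_conflicts(partitions, enabled={"part0"}, candidate="part1"))

# With part0 disabled, part1 has no conflicts.
print(enable_conflicts(partitions, enabled=set(), candidate="part1"))
```

An empty result means the candidate partition shares no node with any other enabled partition.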

Note the use of the up command. The up command moves the context up one level, in this case, from the context of a particular partition (that is, from part0) to the general Partition context.

Shared vs. Dedicated Partitions

The CRE can configure a partition to allow multiple MPI jobs to be running on it concurrently. Such partitions are referred to as shared partitions. The CRE can also configure a partition to permit only one MPI job to run at a time. These are called dedicated partitions.

In the following example, the partition part0 is configured to be a dedicated partition and part1 is configured to allow shared use by up to four processes.

node1# mpadmin
[node0]:: part0
[node0] P[part0]:: set max_total_procs=1
[node0] P[part0]:: part1
[node0] P[part1]:: set max_total_procs=4
[node0] P[part1]::

The max_total_procs attribute defines how many processes can be active on each node in the partition for which it is being set. In this example, it is set to 1 on part0, which means only one job can be running at a time. It is set to 4 on part1 to allow up to four jobs to be started on that partition.
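
The effect of a per-node process limit can be pictured with a small model. The Python sketch below assumes the per-node reading given above, in which max_total_procs caps the number of active processes on each node of the partition. The function name nodes_with_room and the process counts are hypothetical, and the sketch does not reproduce how the CRE actually places jobs.

```python
def nodes_with_room(nodes, active, max_total_procs):
    """Return the nodes whose active process count is below the limit.

    `active` maps node name to its current number of processes; nodes that
    are missing from the mapping are treated as idle. Hypothetical model.
    """
    return [n for n in nodes if active.get(n, 0) < max_total_procs]


# Illustrative process counts (assumed, not taken from a real cluster).
active = {"node0": 1, "node1": 1, "node2": 4, "node3": 0}

# part0 with max_total_procs=1: both nodes are already at the limit.
print(nodes_with_room(["node0", "node1"], active, 1))

# part1 with max_total_procs=4: node2 is full, node1 and node3 have room.
print(nodes_with_room(["node1", "node2", "node3"], active, 4))
```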

Note again that the context-changing shortcut (introduced in "Enabling Nodes") is used in the second and fourth lines of this example.