The system administrator can configure the nodes in a Sun HPC cluster into one or more logical sets, called partitions.
The CPUs in a Sun HPC 10000 server can be configured into logical nodes, called domains. These domains can be logically grouped to form partitions, which the CRE treats in the same way as partitions containing other types of Sun HPC nodes.
Any job launched on a partition will run on one or more nodes in that partition, but not on nodes in any other partition. Partitioning a cluster allows multiple jobs to be executed on the partitions concurrently, without any risk of jobs on different partitions interfering with each other. This ability to isolate jobs can be beneficial in various ways. For example:
If one job requires exclusive use of a set of nodes, but other jobs also need to execute at the same time, the availability of two partitions in a cluster would allow both needs to be satisfied.
If a cluster contains a mix of nodes whose characteristics differ--such as having different memory sizes, CPU counts, or levels of I/O support--the nodes can be grouped into partitions that have similar resources. This would allow jobs that require particular resources to be run on suitable partitions, while jobs that are less resource-dependent could be relegated to less specialized partitions.
If you want your job to execute on a specific partition, the CRE provides you with the following methods for selecting the partition:
Log in to a node that is a member of the partition.
Set the environment variable SUNHPC_PART to the name of the partition.
Use the -p option to the job-launching command, mprun, to specify the partition.
These methods are listed in order of increasing priority. That is, setting the SUNHPC_PART environment variable overrides the partition of the node you are logged in to. Likewise, specifying the mprun -p option overrides either of the other methods for selecting a partition.
It is possible for cluster nodes to not belong to any partition. If you log in to one of these independent nodes and do not request a particular partition, the CRE will launch your job on the cluster's default partition. This is a partition whose name is specified by the SUNHPC_PART environment variable or is defined by an internal attribute that the system administrator is able to set.
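The priority order described above can be modeled as a small lookup. The following Python sketch is illustrative only and is not part of the CRE: the function name resolve_partition and its parameters are hypothetical, and the only names taken from the text are the SUNHPC_PART variable and the mprun -p option.

```python
from typing import Mapping, Optional


def resolve_partition(
    cli_partition: Optional[str],
    env: Mapping[str, str],
    login_partition: Optional[str],
    default_partition: str,
) -> str:
    """Return the partition a job would target under the stated priority order.

    Hypothetical model, not a CRE interface.
    """
    # Highest priority: a partition named with the mprun -p option.
    if cli_partition:
        return cli_partition
    # Next: the SUNHPC_PART environment variable.
    env_partition = env.get("SUNHPC_PART")
    if env_partition:
        return env_partition
    # Next: the partition that the login node belongs to, if any.
    if login_partition:
        return login_partition
    # Independent node and no request: use the administrator-defined default.
    return default_partition


if __name__ == "__main__":
    # Logged in to a node in part0, but SUNHPC_PART names part1.
    print(resolve_partition(None, {"SUNHPC_PART": "part1"}, "part0", "part_default"))
```

In this model the environment variable wins over the login node's partition, and passing a value for cli_partition would win over both.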
The system administrator can also selectively enable and disable partitions. Jobs can only be executed on enabled partitions. This restriction makes it possible to define many partitions in a cluster, but have only a few active at any one time.
It is also possible for a node to belong to more than one partition, so long as only one is enabled at a time.
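The rule that a node may belong to several partitions only while at most one of them is enabled can be checked mechanically. The Python sketch below is a hypothetical model of that check, not a CRE tool: the function name, the dictionary layout, and the partition and node names are all assumptions made for illustration.

```python
from typing import Dict, List, Set


def overlapping_enabled_nodes(
    partitions: Dict[str, Set[str]], enabled: Set[str]
) -> Dict[str, List[str]]:
    """Return nodes that belong to more than one enabled partition.

    Maps each offending node to the sorted names of the enabled
    partitions that contain it. An empty result means the rule holds.
    """
    owners: Dict[str, List[str]] = {}
    for name in sorted(partitions):
        if name not in enabled:
            continue  # Disabled partitions run no jobs, so they cause no conflict.
        for node in partitions[name]:
            owners.setdefault(node, []).append(name)
    return {node: names for node, names in owners.items() if len(names) > 1}


if __name__ == "__main__":
    # Hypothetical cluster: node1 is a member of both partitions.
    layout = {"dev": {"node0", "node1"}, "batch": {"node1", "node2"}}
    print(overlapping_enabled_nodes(layout, {"dev"}))
    print(overlapping_enabled_nodes(layout, {"dev", "batch"}))
```

With only dev enabled the result is empty. Enabling both partitions reports node1 as shared between batch and dev.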
In addition to enabling and disabling partitions, the system administrator can set and unset other partition attributes that influence various aspects of how the partition functions. For example, if you have an MPI job that requires dedicated use of a set of nodes, you could run it on a partition that the system administrator has configured to accept only one job at a time.
The administrator could configure a different partition to allow multiple jobs to execute concurrently. This shared partition would be used for code development or other jobs that do not require exclusive use of their nodes.
Although a job cannot be run across partition boundaries, it can be run on a partition plus independent nodes.