Solaris Resource Manager 1.3 System Administration Guide

Adding a Computational Batch Application User

This example introduces the following command:

srmkill

Kills all the active processes attached to an lnode

The Finance department owns the database system, but Joe, a user from Engineering, needs to run a computational batch job and would like to use the Finance machine during off hours, when the system is generally idle. The Finance department stipulates that Joe's job is less important than the databases, and agrees to run his work only if it will not interfere with the system's primary workload. To enforce this policy, add a new group (batch) to the lnode database, and add Joe to the new batch group in the server's lnode hierarchy:

# limadm set cpu.shares=20 databases
# limadm set cpu.shares=1 batch
# limadm set cpu.shares=1 joe
# limadm set sgroup=batch joe
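
To confirm that the new settings took effect, the liminfo command shipped with Solaris Resource Manager (in /usr/srm/bin) can be used to display the attributes stored in an lnode; the exact output is release-dependent and is not reproduced here:

# /usr/srm/bin/liminfo joe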

Figure 10-2 Adding a Computational Batch Application

Diagram shows addition of a new group called batch to the lnode database and server hierarchy, and addition of user Joe to the new batch group.

This command sequence changes the allocation of shares so that the databases group has 20 shares while the batch group has just one. As a result, members of the batch group (only Joe) will use at most 1/21 of the machine when the databases group is active. The databases group receives 20/21, or about 95.2 percent, which is more than the 60 percent + 20 percent = 80 percent previously determined to be sufficient for the database work. If the databases are not requesting their full allocation, Joe receives more than his 4.8 percent entitlement; if the databases are completely inactive, his allocation can reach 100 percent.

When the number of outstanding shares allocated to databases is increased from 1 to 20, no changes are needed to the share allocations of db1 and db2. Within the databases group, there are still four shares outstanding, allocated in the 3:1 ratio. Different levels of the scheduling tree are totally independent; what matters is the ratio of shares between peer groups.
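To make the arithmetic concrete, when every lnode is busy, each lnode's effective entitlement is the product of the share ratios along its path from the root:

databases   20/21                  about 95.2 percent
   db1      3/4 of 20/21 = 5/7     about 71.4 percent
   db2      1/4 of 20/21 = 5/21    about 23.8 percent
batch       1/21                   about 4.8 percent
   joe      all of 1/21            about 4.8 percent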

Despite these assurances, the Finance department also wants to ensure that Joe cannot even log in during prime daytime hours. This can be accomplished by placing login controls on the batch group. Because the controls are sensitive to the time of day, schedule commands that permit the batch group to log in only at specific times. For example, this can be implemented with crontab(1) entries such as:

0 6 * * * /usr/srm/bin/limadm set flag.nologin=set batch 
0 18 * * * /usr/srm/bin/limadm set flag.nologin=clear batch

At 6:00 a.m., batch loses permission to log in; at 6:00 p.m., the restriction is removed.
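
Because limadm changes system-wide settings, these entries must run with root privileges and therefore belong in the root crontab. For example, they can be added by editing that crontab directly:

# crontab -e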

An even stricter policy can be implemented by adding another line to the crontab entry:

01 6 * * * /usr/srm/bin/srmkill joe

This uses the srmkill(1MSRM) command to kill any processes attached to the lnode joe at 6:01 a.m. If the only resources the job requires are those controlled by Solaris Resource Manager, this step is not necessary. It is useful if Joe's job could tie up other resources that would interfere with normal work, for example a job that holds a key database lock or dominates an I/O channel.
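
To verify the cleanup, standard Solaris tools are sufficient. For example, listing the processes that still belong to Joe immediately after the kill should produce an empty listing if the lnode is idle:

# ps -u joe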

Joe can now log in and run his job only at night. Because Joe (and the entire batch group) has significantly fewer shares than the other applications, his application runs with less than 5 percent of the machine. In addition, nice(1) can be used to lower the priority of the processes attached to the job, so that it runs at lower priority than other jobs with equal Solaris Resource Manager shares.
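
For example, Joe could start his job at reduced priority himself; nightly_batch here is only a placeholder for his actual program:

$ /usr/bin/nice -n 10 ./nightly_batch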

At this point, the Finance department has ensured that its database applications have sufficient access to this system and will not interfere with each other's work. The department has also accommodated Joe's overnight batch processing loads, while ensuring that his work also will not interfere with the department's mission-critical processing.