Solaris Resource Manager 1.3 System Administration Guide

Chapter 5 Managing Lnodes

The Solaris Resource Manager system is built around a fundamental addition to the kernel: a per-user structure called an lnode. An lnode is essentially a fixed-size place in which many kinds of per-user data can be stored and updated. For every unique UID defined in the password map, there should be a corresponding lnode. (This refers to every unique UID returned by successive getpwent(3C) calls.) An lnode may exist without a corresponding password map entry, but this is not recommended. Lnodes are stored on disk and automatically moved in and out of memory by the kernel. In-memory copies of lnodes that have been changed since they were read from disk are written back as part of the regular system synchronization operations, on demand when the sync command is run, and when necessary to free space in the lnode cache for reading in further lnodes.

Lnodes are maintained as a tree hierarchy, with the central system administrator as the head of the tree, and other users as group headers of smaller groups of users within the tree. The central administrator is the superuser, or root user of the system.

Errors relating to lnodes, such as orphans and group loops, are discussed in Chapter 11, Troubleshooting.

Delegated Administration

The primary responsibility for the administration of lnodes rests with the central administrator. While Solaris Resource Manager introduces several resource controls that may be assigned and managed, it also allows certain administrative privileges to be selectively assigned to non-root users, thereby distributing the task of user administration.

Administrative privileges may be assigned to appropriate users by setting the user's uselimadm or admin flag. A sub-administrator is a user with a set uselimadm flag who has the same limadm program administrative privilege as the superuser. A group header with a set admin flag is called a group administrator, and has privileges (as described below) over users within the same scheduling group.

The central administrator controls the overall division of the system's resources by creating and assigning limits to scheduling groups that have root as their parent. Group administrators typically perform the same types of resource control, but limited to users within their scheduling group. The division of resources by the group administrator is limited to the resources that have been allocated to the group (that is, those allocated to the group header lnode). Note that group administrators may assign an admin flag to any user in their scheduling group, further subdividing the administrative responsibilities.

Group administrators can do the following:

  1. Alter the resource limits of any user within their scheduling group.

    Note that a group administrator can set a user's limit for a resource higher than the limit for the group. However, resources consumed by group members are also counted as consumed by the group header, so the group header's limit is still enforced when a group member attempts to exceed it.

  2. Alter any flag or attribute (except flag.uselimadm and cpu.usage) of any lnode within their scheduling group.

    Flag assignments by group administrators are further constrained in that a user cannot be given a privilege that is not already held by the group administrator. This restriction is applied to prevent a group administrator from circumventing the security within Solaris Resource Manager.

A group administrator's main tools are the limadm(1MSRM) and limreport(1SRM) commands. The limadm program performs operations on the limits, flags, and other Solaris Resource Manager attributes of one or more existing users. Combined with the report generator, limreport, these tools allow a scheduling group to be autonomously self-managed without disturbing the resource allocations or management of other, disjoint scheduling groups.

The superuser is exempt from all resource limits and always has full administrative privileges regardless of its flag settings. The superuser can add, delete, and change user accounts, and can change any usage, limit, or flag value of any lnode by using the limadm command.

Security

Solaris Resource Manager has a wide effect on the administration of a Solaris system, so it is important that it be installed and maintained in a manner that ensures the system is secure.

There are a number of ways in which the system administrator can ensure that the security of the Solaris Resource Manager system is maintained. The most important, as with any Solaris system, is to ensure the privacy of the root password. Anyone who knows the root user password has unrestricted access to the system's resources, the same as the central administrator.

A number of special administrative privileges can be granted to users within Solaris Resource Manager by setting certain system flags within their respective lnodes. These can help increase the security of a system because they allow delegated users to carry out the tasks that are required of them without giving them full superuser privileges.

Some of these privileges should not be granted lightly because they give the recipient user broad-ranging powers. The passwords of users possessing special privileges should be protected diligently, just as the superuser password should be protected.

Solaris Resource Manager takes several security precautions to prevent misuse of the administrative privileges granted to sub-administrators. Refer to A Typical Application Server and Lnode Maintenance Programs.

There are circumstances in which the central administrator can leave the system open to security breaches if not careful with the manipulation of the structure of the scheduling tree. It is important for the central administrator to know how to correctly modify the scheduling tree and how to detect potential problems in the current structure. This is discussed in Scheduling Tree Structure.

Suggested Group Administrator Lnode Structure

A problem that group administrators might face is that they share group limits with their group members. For example, if the group header lnode has a process limit set on it, then that limit controls the number of processes that can be used by the entire group, including the group header. Unless further limited, any user within the scheduling group can prevent the group administrator from being able to create new processes simply by exceeding the process limit.

One way to prevent this is for the group administrator to set individual limits on each of the group members. However, to be effective, these limits might have to be overly restrictive. Also, forcing a group administrator to manage individual limits is at odds with the Solaris Resource Manager goal of hierarchical resource control.

An alternate way of solving this problem is for the central administrator to change the structure of the lnodes within the group. Rather than placing users directly beneath the group administrator's lnode, a "control" lnode is created below the group administrator's lnode as the only child lnode, and then users are made children of the control lnode. This results in the structure shown.

Figure 5-1 Group Administrator Lnode Structure

Diagram shows control lnode created below group administrator's lnode as only child. Users are then made children of the control lnode.

Referring to the previous figure, the UID of the group administrator's account would correspond to that of the lnode labelled "Actual," the parent of the tree. This is the lnode that would have the admin flag set. A dummy account would be created for the "Control" lnode. No login need be permitted on this account. The lnodes labelled "A," "B," and "C" correspond to users under the group administrator's control.

In this case, the process limit for the "Actual" lnode could be 100, while that of the "Control" lnode could be 90, with limits for individual users set to 0 (no individual limit). This setup would ensure that even if users A, B, and C were using a total of 90 processes (all they are allowed), the group administrator can still create 10 processes.

However, it is still possible in this case for users to stop each other from creating processes. The only way to prevent this is to set individual limits on those users. In this example, those limits could be set to 40 each, still allowing flexibility while preventing a single user from completely shutting out the others.

Also note that in this example the central administrator could create extra lnodes for new users as children of the "Control" lnode without having to re-balance limits.
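The arithmetic in this example can be checked with a small model. The following Python sketch is not part of Solaris Resource Manager; the Lnode class and try_fork method are hypothetical names, and the model assumes that a limit of 0 means "no limit" and that each process is charged to the user's lnode and to every ancestor. It uses the limits suggested above: 100 for "Actual", 90 for "Control", and 40 for each user.

```python
class Lnode:
    """Toy model of an lnode that tracks only a process limit and usage."""

    def __init__(self, name, limit=0, parent=None):
        self.name = name
        self.limit = limit    # assumption: 0 is treated as "no limit"
        self.usage = 0        # includes processes charged by descendants
        self.parent = parent

    def _chain(self):
        # Yield this lnode, then each ancestor up to the top of the tree.
        node = self
        while node is not None:
            yield node
            node = node.parent

    def try_fork(self):
        """Charge one process to this lnode and all ancestors, or refuse."""
        chain = list(self._chain())
        if any(n.limit and n.usage + 1 > n.limit for n in chain):
            return False
        for n in chain:
            n.usage += 1
        return True


actual = Lnode("Actual", limit=100)
control = Lnode("Control", limit=90, parent=actual)
users = [Lnode(name, limit=40, parent=control) for name in "ABC"]

# Users A, B, and C create processes in turn until all of them are refused.
started = 0
progress = True
while progress:
    progress = False
    for user in users:
        if user.try_fork():
            started += 1
            progress = True

# The group administrator then creates processes until refused.
admin_started = 0
while actual.try_fork():
    admin_started += 1

print(started, admin_started)  # 90 10
```

In this model the users are stopped by the "Control" limit at 90 processes, and the remaining 10 processes under the "Actual" limit stay available to the group administrator.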

Limits Database

The limits database is the database of user information that the Solaris Resource Manager software uses to perform all resource control. It contains one lnode per UID, which is accessed by using the UID as a direct index into the file. If there is an lnode for a numerically large UID, the limits database will appear to be quite large. However, where the UIDs of users in the system are not sequential, the limits database will have large gaps, or holes, and on a file system type that supports it, may be stored as a sparse file. This means that no disk blocks are actually allocated for storage of the "empty" sections of the file. ufs file systems support sparse files, but tmpfs file systems do not. See Saving and Restoring the Limits Database for the implications of sparse files on saving and restoring the limits database.
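The effect of direct indexing on file size can be illustrated with an ordinary file. The following Python sketch does not read or write a real limits database. RECORD_SIZE is a hypothetical value (the actual lnode size is not stated here), and the helper names are invented for illustration. It writes records for UID 0 and UID 60001 and then compares the apparent file size with the space actually allocated.

```python
import os
import tempfile

RECORD_SIZE = 512  # hypothetical record size, for illustration only


def lnode_offset(uid, record_size=RECORD_SIZE):
    """Byte offset of the record for a UID when the UID is a direct index."""
    return uid * record_size


def write_record(path, uid, payload, record_size=RECORD_SIZE):
    """Write one fixed-size record at the offset implied by the UID."""
    if len(payload) > record_size:
        raise ValueError("payload larger than one record")
    mode = "r+b" if os.path.exists(path) else "w+b"
    with open(path, mode) as f:
        f.seek(lnode_offset(uid, record_size))
        f.write(payload.ljust(record_size, b"\0"))


def read_record(path, uid, record_size=RECORD_SIZE):
    """Read the fixed-size record for a UID."""
    with open(path, "rb") as f:
        f.seek(lnode_offset(uid, record_size))
        return f.read(record_size)


with tempfile.TemporaryDirectory() as d:
    db = os.path.join(d, "limits-demo")
    write_record(db, 0, b"root")
    write_record(db, 60001, b"highuid")

    st = os.stat(db)
    apparent = st.st_size            # size implied by the highest UID
    allocated = st.st_blocks * 512   # st_blocks is in 512-byte units on most UNIX systems
    gap = read_record(db, 30000)     # a hole reads back as zeros
    high = read_record(db, 60001)

print("apparent bytes: ", apparent)
print("allocated bytes:", allocated)
```

On a file system that supports sparse files, the allocated figure is typically far smaller than the apparent size. On one that does not, the two are close.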

Whenever you create a new user, you must create a new lnode.

Creating the Limits Database

The Solaris Resource Manager startup file (/etc/init.d/init.srm) will create an initial limits database when invoked for the first time or at any boot if the file is missing.

The limits database typically resides in the /var/srm directory.

The limits database should be owned by root, group owned by root, and readable only by the owner. Write permission is not required since only kernel code with superuser credentials writes to the file.


Caution -

If a user can write to the Solaris Resource Manager limits database, system security may be compromised.


Saving and Restoring the Limits Database

Because the limits database can be a sparse file, be careful when copying it. The file will most likely consume a lot of disk space if it is written by a utility that does not support sparse files, since the empty regions of the file will read as sequences of zeros and be written back out as real blocks instead of empty regions. This could happen if the file were being copied, backed up, or restored by a utility such as tar(1), cpio(1), or cp(1), although programs such as ufsdump(1M) and ufsrestore(1M) will preserve holes.

You can also back up and restore the limits database by using limreport to generate an ASCII version of the file and using limadm to re-create the original file from that saved ASCII version. For example, the command:

# limreport 'flag.real' - lname preserve > /var/tmp/savelnodes 

will create /var/tmp/savelnodes as an ASCII representation of the lnodes for each user in the password map. Note that this will not save lnodes for which there is no corresponding password map entry. At most, lnodes should exist for the set of all UIDs in the password map.

The command:

# limadm set -f - < /var/tmp/savelnodes

recreates the lnodes for which data was saved. This command will not delete lnodes that were not saved, so these techniques can also be used to save and restore selections of lnodes rather than the whole limits database.

The limreport and limadm Commands describes the use of the limreport and limadm commands in more detail. It is helpful for the administrator to be familiar with using these commands to save and restore lnodes, since it may be necessary to use them when a change to the interpretation of the lnode structure (as defined by the limits database) is made.

Because the contents of the limits database are changing regularly during normal system operation, perform backup operations while the system is quiescent, or in single-user mode. Similarly, restore an entire limits database only when the Solaris Resource Manager is not in use, such as when the system is in single-user mode.

Creating and Deleting Lnodes

Whenever a new user is created, a corresponding lnode should be created and its limits and privileges should be set. When using Solaris Resource Manager, the administrator should maintain the limits database in parallel with the normal Solaris password map. The command:

# limreport \!flag.real - uid lname 

can be used to print a list of the UIDs and login names of any users who do not have corresponding lnodes.

Lnodes are not automatically created and deleted by the system commands used to create and delete accounts. It is up to the administrator to perform these actions. However, lnodes can be automatically created on-demand when the user logs in; see PAM Subsystem for more details.

Similarly, just before a user account is deleted from the password map, the corresponding lnode should be removed from the limits database by using the limadm(1MSRM) command.


Note -

When deleting lnodes, ensure that sub-trees are deleted from the bottom-most lnodes up. If you start at the top of the sub-tree you are deleting, you will lose control of the children of the lnodes deleted because they will become orphaned when their parents are removed.
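A bottom-up deletion order can be worked out from the scheduling group parent of each lnode. The following Python sketch is a planning aid only and does not call any Solaris Resource Manager command. The function name and the sample UIDs are hypothetical, and the sketch assumes that the sub-tree contains no group loops.

```python
def deletion_order(parent_of, top):
    """Return the UIDs in the sub-tree under `top`, children before parents.

    parent_of maps each UID to the UID of its scheduling group parent.
    """
    children = {}
    for uid, parent in parent_of.items():
        children.setdefault(parent, []).append(uid)

    order = []

    def visit(uid):
        # Visit every child first so that no lnode outlives its parent.
        for child in sorted(children.get(uid, [])):
            visit(child)
        order.append(uid)

    visit(top)
    return order


# Hypothetical tree: 100 is a group header under root (0), 101 and 102 are
# its members, and 103 is a member of the group headed by 101.
parent_of = {100: 0, 101: 100, 102: 100, 103: 101}
print(deletion_order(parent_of, 100))  # [103, 101, 102, 100]
```

Deleting the lnodes in the printed order removes every child before its parent, so no lnode is orphaned part way through.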


If the UID of a user is ever changed, the contents of the user's lnode should be copied to a new lnode corresponding to the new UID and the original lnode should be deleted. See Copying and Removing Lnodes.

Any child lnodes should be attached either to the newly created lnode or to some other suitable parent lnode. The command:

# limreport 'sgroup==X' '%u\tsgroup=Y\n' uid | limadm set -u -f - 

can be used to find all lnodes with a scheduling group parent whose UID is X, and make them children of the lnode with a UID of Y.
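The text that flows through this pipeline is one line per child lnode, consisting of the child's UID, a tab, and the new sgroup assignment. The following Python sketch only reproduces that line format for a given list of child UIDs, to show what limadm receives on standard input. The function name and the sample UIDs are hypothetical.

```python
def reparent_lines(child_uids, new_parent_uid):
    """Build lines in the same shape as the '%u\\tsgroup=Y\\n' format."""
    return [f"{uid}\tsgroup={new_parent_uid}" for uid in child_uids]


# Hypothetical children of UID X, moved under UID Y = 200.
for line in reparent_lines([101, 102], 200):
    print(line)
```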

The following steps illustrate how to change the UID of an lnode from X to Y.

  1. Save the state of the lnode in which the UID is to be changed:

    # limreport 'uid==X' - lname preserve > /var/tmp/savelnode.X
    

  2. Change the UID of the password map entry for the user from the old value (X) to that of the new UID (Y).

  3. Create an lnode for the new UID, restoring the state from that which was previously saved:

    # limadm set -f /var/tmp/savelnode.X
    

  4. For all child lnodes of the lnode to be changed (UID X), change their scheduling group to the new lnode (UID Y):

    # limreport 'sgroup==X' '%u\tsgroup=Y\n' uid | limadm set -u -f -  
    

  5. Ensure there are no processes currently attached to the old lnode.

  6. Use the chown(1) command to change the owner of all files owned by the original UID to that of the new UID. For example:

    # find / -user X -print | xargs chown Y
    

  7. Delete the old lnode:

    # limadm delete X
    

Lnode Maintenance Programs

The limadm command is the primary tool available to administrators for maintaining a user's lnode. This command changes Solaris Resource Manager attribute values for a given list of user accounts. If an lnode does not exist for any of the users, then a default-filled blank one is created first. New lnodes are created with the following properties:

The scheduling group of the new lnode is set to user 'other' (srmother) if an lnode for that user account exists, or else to the root lnode.

The limadm invoker needs sufficient administrative privilege to perform the specified changes. The invoker must be the superuser, have a set uselimadm flag, or be a group administrator who is only changing the attributes of members of the scheduling group to which the invoker belongs. Restrictions apply to the use of limadm by group administrators.

The limadm command allows an administrator to remove an lnode without deleting the corresponding user account in the password map. To use limadm, the invoker must be the superuser or have a set uselimadm flag. If the invoker only has a set admin flag, then the invoker can only modify the lnodes of users under scheduling groups for which the invoker is the group header.

Units

Values within Solaris Resource Manager are represented in one of three types of units:

Scaled

The scaled unit is the default. It's an easily readable format used to display and enter values. Scaled units help users avoid making entry errors by reducing the number of digits that need to be entered.

Raw (or unscaled)

The raw unit is the basic unit in which a value is represented. For example, the raw units for virtual memory usage are bytes, and the raw units for virtual memory accrual are byte-seconds. These are mainly employed when billing for usage, when exact quantities are required.

Internal

The internal unit is used by Solaris Resource Manager to store memory attributes in machine-dependent units rather than in bytes.

Conversions

Solaris Resource Manager programs carry out conversions to and from the internal units used to store attribute values, so that the user is always presented with scaled units or raw units. This means that, with few exceptions, the user never need be concerned with the internal units used by Solaris Resource Manager.

The terms exa, peta, tera, giga, mega, and kilo are used within Solaris Resource Manager to represent powers of 2, not powers of 10. For example, a megabyte is 1,048,576 bytes, not 1,000,000 bytes. The powers of 2 for each term are 60 (exa), 50 (peta), 40 (tera), 30 (giga), 20 (mega), and 10 (kilo).

The programs that are the primary interface between users and the Solaris Resource Manager system are limadm, liminfo, and limreport. The conversions and scaling that they carry out are detailed in the following subsections.

The limadm Command

When changing attribute values, limadm allows numbers to be suffixed by scale characters: [EPTGMK][B][.][wdhms]. Uppercase and lowercase are interchangeable.

If the attribute has the dimension of storage (memory attributes) or of storage accrual, then a character from the first group (EPTGMK) is allowed. This multiplies by the number of bytes in 1 exabyte (E), petabyte (P), terabyte (T), gigabyte (G), megabyte (M), or kilobyte (K) as appropriate. The optional B character may be appended for user readability, but it has no effect.

If the attribute has the dimension of time (type date or time), or of storage accrual, then a character from the second group is allowed. This multiplies by the number of seconds in one week (w), day (d), hour (h), minute (m), or second (s) as appropriate.

An optional period may separate the storage and time units (for example, mh, M.h, and MB.h all stand for 'megabyte hours').

Where ambiguity exists in the use of the M suffix, limadm attempts to derive its meaning from the context. If this is not possible, it is assumed to mean mega, not minutes.

When inputting large numbers, these conversion characters are useful to avoid errors in the order of magnitude of the entry, but the quantity is stored in internal units regardless of the method of entry.

A special scale character u can also be used, by itself, but only for memory attribute values. It indicates that the number is in machine-dependent (internal) units instead of bytes.
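The suffix rules described in this section can be summarized in a short model. The following Python sketch is not the limadm parser. The function name and the 'storage', 'time', and 'accrual' dimension labels are hypothetical. It accepts whole numbers only and does not model the u suffix. For accrual attributes it resolves a lone M as mega, following the rule stated above.

```python
import re

# Powers of 2 for storage prefixes, and seconds for time suffixes.
STORAGE = {"E": 2**60, "P": 2**50, "T": 2**40, "G": 2**30, "M": 2**20, "K": 2**10}
TIME = {"W": 7 * 86400, "D": 86400, "H": 3600, "M": 60, "S": 1}

# Which suffix shapes are accepted depends on the attribute's dimension.
SUFFIX = {
    "storage": re.compile(r"([EPTGMK]?)B?()", re.IGNORECASE),
    "time": re.compile(r"()([WDHMS]?)", re.IGNORECASE),
    "accrual": re.compile(r"([EPTGMK]?)B?\.?([WDHMS]?)", re.IGNORECASE),
}


def to_raw(text, dimension):
    """Convert a scaled value such as '102MB' or '1MB.h' to raw units."""
    m = re.fullmatch(r"(\d+)(.*)", text.strip())
    if m is None:
        raise ValueError(f"no leading number in {text!r}")
    number, suffix = int(m.group(1)), m.group(2)

    parts = SUFFIX[dimension].fullmatch(suffix)
    if parts is None:
        raise ValueError(f"suffix {suffix!r} not valid for {dimension}")

    storage = parts.group(1).upper()
    time = parts.group(2).upper()
    return number * STORAGE.get(storage, 1) * TIME.get(time, 1)


print(to_raw("102MB", "storage"))   # 106954752 (bytes)
print(to_raw("5m", "time"))         # 300 (seconds)
print(to_raw("1MB.h", "accrual"))   # 3774873600 (byte-seconds)
```

Passing the dimension explicitly is how this sketch resolves the M ambiguity: for a time attribute, m can only mean minutes, and for a storage attribute it can only mean mega.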

The liminfo Command

The liminfo(1SRM) command uses the same suffixes when reporting as limadm uses for input (see above). Normally, liminfo converts values into appropriate scaled formats to be printed, but the -r option can be used to cause liminfo to print values in their raw (unscaled) form. For example, memory is normally scaled to a suitable unit, such as megabytes (for example, '102 MB'), but specifying the -r option causes it to be printed in bytes (for example, 106954752 bytes).
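The relationship between the two figures in this example can be reproduced with a few lines of arithmetic. The following Python sketch is not the algorithm used by liminfo. The function name is hypothetical, and the choice of unit and the rounding rule are assumptions made only for illustration.

```python
# Units from largest to smallest, as powers of 2.
UNITS = [("EB", 2**60), ("PB", 2**50), ("TB", 2**40),
         ("GB", 2**30), ("MB", 2**20), ("KB", 2**10)]


def scale_bytes(raw):
    """Format a raw byte count using the largest unit that does not exceed it."""
    for name, size in UNITS:
        if raw >= size:
            # Integer arithmetic, rounding halves up (assumed rounding rule).
            return f"{(raw + size // 2) // size} {name}"
    return f"{raw} B"


print(scale_bytes(106954752))  # 102 MB
```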

The limreport Command

The limreport(1SRM) command always reports values in their raw (unscaled) form. If scaled values are required, the conversion must be stated explicitly in the expression used to display the value. For example, to display total virtual memory usage for all users in kilobytes, rounded up to the nearest kilobyte:

# limreport 'flag.real' '%-8.8s %d KB\n' lname '(memory.usage+1k-1)/1k' 

As this example demonstrates, you can use the scaling suffixes on numbers in expressions, which simplifies the conversion of raw units to scaled values.
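The expression (memory.usage+1k-1)/1k is the usual integer idiom for rounding up. The following Python sketch shows the same arithmetic, assuming that limreport expressions use C-style integer division. The function name is hypothetical.

```python
K = 2**10  # 1k, as a power of 2


def usage_in_kb(raw_bytes):
    """Round a byte count up to whole kilobytes using integer division."""
    return (raw_bytes + K - 1) // K


for raw in (0, 1, 1024, 1025):
    print(raw, "bytes ->", usage_in_kb(raw), "KB")
```

Adding 1k-1 before dividing pushes any partial kilobyte over the next boundary, while exact multiples of 1024 are unaffected.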

Note that the internal units for some attributes are not the same as their 'raw' form. Normally, this does not concern the user because all the Solaris Resource Manager programs carry out conversion to scaled units or raw units. However, it does mean that, for example, select-expressions in limreport that specify an exact match on a number of bytes will always fail to match if a number is specified that is not an integral multiple of the relevant internal unit.
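To see why such a match fails, consider a memory attribute whose internal unit is larger than one byte. The following Python sketch assumes a hypothetical internal unit of 8192 bytes, since the real unit is machine-dependent, and checks whether a byte count could ever be matched exactly.

```python
INTERNAL_UNIT = 8192  # hypothetical; the real internal unit is machine-dependent


def exactly_representable(nbytes, unit=INTERNAL_UNIT):
    """True if a byte count is a whole number of internal units."""
    return nbytes % unit == 0


print(exactly_representable(16384))  # True: two whole internal units
print(exactly_representable(10000))  # False: no stored value converts to exactly 10000 bytes
```

A stored value always converts to a multiple of the internal unit, so a select-expression that compares against a byte count for which this check is False cannot match.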

Manipulating Lnodes

The limreport and limadm Commands

The limreport and limadm commands provide the administrator with an easy way to save and restore the contents of lnodes for any number of users. Use the limreport command to select and extract the lnodes that are to be saved, and use limadm to restore them. This combination of commands is most commonly used for copying lnodes and for altering the lnode structure, as discussed in the following sections.

The limreport command also provides a flexible way to select and display users' attributes. It provides two levels of selection: selection of lnodes, and selection of attributes to be displayed for each lnode selected. The lnode selection is achieved by specification of a select-expression, which may be a single condition or a set of conditions joined by logical operators in a C-style syntax. The attribute selection is achieved by listing the attributes' symbolic names. The way in which the attributes will be displayed is specified by a format control string, similar to that of the C function printf, with extensions to handle special Solaris Resource Manager types. If a format control string of '-' is specified, limreport uses default formats for each attribute displayed. Refer to limreport(1SRM) for further details.

Changing the Lnode Structure

The limadm command provides a facility to indivisibly change the contents of attributes within lnodes, given that the invoker has sufficient privilege. Change commands can be specified directly on the command line, or the name of a file containing the change commands can be specified (by using the -f option).

The limreport command is able to generate attribute value assignments using the limadm syntax (refer to the preserve identifier in the limadm syntax), the output of which can be input to limadm using the -f option. This allows the administrator to use the two programs together to selectively save and restore the contents of the limits database.

Copying and Removing Lnodes

The command:

# limreport 'uid==X' - Y preserve | limadm set -u -f -

will copy an lnode from UID X to UID Y. The expression uid==X selects the source lnode. The preserve identifier causes limreport to output all attribute values that are not read-only, in a syntax that is suitable to pass to limadm. Placing the UID Y before the preserve identifier makes it the first item in the data passed to limadm, which selects the target lnode.

If the source lnode is no longer required, it can be removed using limadm.


Note -

Be careful when using a match by UID as the limreport selection expression. If multiple login names share a UID, they will all be matched. In the example above, this would not matter; the same lnode data will be preserved and loaded multiple times. In the Solaris environment, UID 0 has login names of both root and smtp.
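Before selecting by UID, it can help to know which UIDs are shared by more than one login name. The following Python sketch groups password map entries by UID. The function names and the sample entries are hypothetical and serve only to illustrate the check.

```python
from collections import defaultdict


def names_by_uid(entries):
    """Group (login name, UID) pairs by UID, keeping the order seen."""
    grouped = defaultdict(list)
    for name, uid in entries:
        grouped[uid].append(name)
    return dict(grouped)


def shared_uids(entries):
    """Return only the UIDs that have more than one login name."""
    return {uid: names
            for uid, names in names_by_uid(entries).items()
            if len(names) > 1}


# Sample entries for illustration.
sample = [("root", 0), ("smtp", 0), ("daemon", 1), ("alice", 1001)]
print(shared_uids(sample))  # {0: ['root', 'smtp']}
```

On a live system, the entries could instead be taken from Python's standard pwd module, for example ((p.pw_name, p.pw_uid) for p in pwd.getpwall()).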