System Administration Guide: Resource Management and Network Services

Part II Solaris 9 Resource Manager Topics

This section contains the following chapters on resource management in the Solaris operating environment.

Chapter 4, Introduction to Solaris 9 Resource Manager

Provides an overview of resource management and discusses why you would want to use the functionality on your system

Chapter 5, Projects and Tasks

Covers the project and task facilities and describes how they are used to label and separate workloads

Chapter 6, Extended Accounting

Describes the extended accounting functionality that is used to capture detailed resource consumption statistics for capacity planning or billing purposes

Chapter 7, Resource Controls

Discusses resource controls, which are used to place bounds on resource usage by applications that run on your system

Chapter 8, Fair Share Scheduler

Describes the fair share scheduler, which uses shares to specify the amount of CPU time that is allocated to processes that run on your system

Chapter 9, Physical Memory Control Using the Resource Capping Daemon

Describes the resource capping daemon rcapd(1M), which regulates the consumption of physical memory by processes running in projects that have resource caps

Chapter 10, Resource Pools

Describes resource pools, which are used to partition system resources and guarantee that a known amount of resources is always available to a specified workload that runs on your system

Chapter 11, Resource Management Configuration Example

Describes a hypothetical server consolidation project

Chapter 12, Resource Control Functionality in the Solaris Management Console

Describes the resource management functionality available in the Solaris Management Console tool

Chapter 4 Introduction to Solaris 9 Resource Manager

Resource management functionality enables you to control how applications use available system resources.

Overview

Modern computing environments have to provide a flexible response to the varying workloads that are generated by different applications on a system. If resource management features are not used, the Solaris operating environment responds to workload demands by adapting to new application requests dynamically. This default response generally means that all activity on the system is given equal access to resources. Solaris resource management features enable you to treat workloads individually.

The ability to minimize cross-workload performance compromises, along with the facilities that monitor resource usage and utilization, is referred to as resource management. Resource management is implemented through a collection of algorithms. The algorithms handle the series of capability requests that an application presents in the course of its execution.

Resource management facilities permit you to modify the default behavior of the operating system with respect to different workloads. Behavior primarily refers to the set of decisions that are made by operating system algorithms when an application presents one or more resource requests to the system.

A system configuration that uses the resource management facilities can serve several purposes.

When you plan a resource-managed configuration, a key requirement is to identify the cooperating and the conflicting workloads on the system.

After you identify cooperating and conflicting workloads, you can create a resource configuration that presents the least compromise to the service goals of the business, within the limitations of the system's capabilities.

Effective resource management is enabled in the Solaris environment by offering control mechanisms, notification mechanisms, and monitoring mechanisms. Many of these capabilities are provided through enhancements to existing mechanisms such as the proc(4) file system, processor sets, and scheduling classes. Other capabilities are specific to resource management. These capabilities are described in subsequent chapters.

Resource Classifications

A resource is any aspect of the computing system that can be manipulated with the intent to change application behavior. Thus, a resource is a capability that an application implicitly or explicitly requests. If the capability is denied or constrained, the execution of a robustly written application proceeds more slowly.

Resources can be classified, as opposed to merely identified, along a number of axes. For example, a resource can be implicitly requested as opposed to explicitly requested, or time-based, such as CPU time, as opposed to time-independent, such as assigned CPU shares.

Generally, scheduler-based resource management is applied to resources that the application can implicitly request. For example, to continue execution, an application implicitly requests additional CPU time. To write data to a network socket, an application implicitly requests bandwidth. Constraints can be placed on the aggregate total use of an implicitly requested resource.

Additional interfaces can be presented so that bandwidth or CPU service levels can be explicitly negotiated. Resources that are explicitly requested, such as a request for an additional thread, can be managed by constraint.

Resource Management Control Mechanisms

The three types of control mechanisms that are available in the Solaris operating environment are constraints, scheduling, and partitioning.

Constraints

Constraints allow the administrator or application developer to set bounds on the consumption of specific resources for a workload. With known bounds, modeling resource consumption scenarios becomes a simpler process. Bounds can also be used to control ill-behaved applications that would otherwise compromise system performance or availability through unregulated resource requests.

Constraints do present complications for the application. The relationship between the application and the system can be modified to the point that the application is no longer able to function. One approach that can mitigate this risk is to gradually narrow the constraints on applications with unknown resource behavior. The resource controls feature discussed in Chapter 7, Resource Controls provides a constraint mechanism. Newer applications can be written to be aware of their resource constraints, but not all application writers will choose to do this.

Scheduling

Scheduling refers to making a sequence of allocation decisions at specific intervals. The decision that is made is based on a predictable algorithm. An application that does not need its current allocation leaves the resource available for another application's use. Scheduling-based resource management enables full utilization of an undercommitted configuration, while providing controlled allocations in a critically committed or overcommitted scenario. The underlying algorithm defines how the term “controlled” is interpreted. In some instances, the scheduling algorithm might guarantee that all applications have some access to the resource. The fair share scheduler (FSS) described in Chapter 8, Fair Share Scheduler manages application access to CPU resources in a controlled way.

Partitioning

Partitioning is used to bind a workload to a subset of the system's available resources. This binding guarantees that a known amount of resources is always available to the workload. The resource pools functionality that is described in Chapter 10, Resource Pools enables you to limit workloads to specific subsets of the machine. Configurations that use partitioning can avoid system-wide overcommitment. However, in avoiding overcommitment, such configurations are also less likely to achieve high utilization. A reserved group of resources, such as processors, is not available for use by another workload when the workload bound to them is idle.

Resource Management Configuration

Portions of the resource management configuration can be placed in a network name service. This feature allows the administrator to apply resource management constraints across a collection of machines, rather than on an exclusively per-machine basis. Related work can share a common identifier, and the aggregate usage of that work can be tabulated from accounting data.

Resource management configuration and workload-oriented identifiers are described more fully in Chapter 5, Projects and Tasks. The extended accounting facility that links these identifiers with application resource usage is described in Chapter 6, Extended Accounting.

When to Use Resource Management

Use resource management to ensure that your applications have the required response times.

Resource management can also increase resource utilization. By categorizing and prioritizing usage, you can effectively use reserve capacity during off-peak periods, often eliminating the need for additional processing power. You can also ensure that resources are not wasted because of load variability.

Server Consolidation

Resource management is ideal for environments that consolidate a number of applications on a single server.

The cost and complexity of managing numerous machines encourages the consolidation of several applications on larger, more scalable servers. Instead of running each workload on a separate system, with full access to that system's resources, you can use resource management software to segregate workloads within the system. Resource management enables you to lower the total cost of ownership by running and controlling several dissimilar applications on a single Solaris system.

If you are providing Internet and application services, resource management enables you to control how those consolidated services share the system's resources.

Supporting a Large or Varied User Population

Use resource management features in any system that has a large, diverse user base, such as an educational institution. If you have a mix of workloads, the software can be configured to give priority to specific projects.

For example, in large brokerage firms, traders intermittently require fast access to execute a query or to perform a calculation. Other system users, however, have more consistent workloads. If you allocate a proportionately larger amount of processing power to the traders' projects, the traders have the responsiveness that they need.

Resource management is also ideal for supporting thin-client systems. These platforms provide stateless consoles with frame buffers and input devices, such as smart cards. The actual computation is done on a shared server, resulting in a timesharing type of environment. Use resource management features to isolate the users on the server. Then, a user who generates excess load does not monopolize hardware resources and significantly impact others who use the system.

Setting Up Resource Management (Task Map)

The following task map gives a basic overview of the steps that are involved in setting up resource management on your system.

Task: Identify the workloads on your system.
Description: Review project entries in either the /etc/project database file or in the NIS map or LDAP directory service.
For instructions: project Database

Task: Prioritize the workloads on your system.
Description: Determine which applications are critical. These workloads might require preferential access to resources.
For instructions: Refer to your business service goals.

Task: Monitor real-time activity on your system.
Description: Use performance tools to view the current resource consumption of workloads that are running on your system. You can then evaluate whether you must restrict access to a given resource or isolate particular workloads from other workloads.
For instructions: Monitoring by System, cpustat(1M), iostat(1M), mpstat(1M), prstat(1M), sar(1), and vmstat(1M)

Task: Make temporary modifications to the workloads that are running on your system.
Description: To determine which values can be altered, refer to the resource controls that are available in the Solaris environment. You can update the values from the command line while the task or process is running.
For instructions: Available Resource Controls, Actions on Resource Control Values, Temporarily Updating Resource Control Values on a Running System, rctladm(1M), and prctl(1)

Task: Set resource control attributes for every project entry in the project database or name service project table.
Description: Each project entry in the /etc/project database or the name service project table can contain one or more resource controls. The resource controls constrain tasks and processes attached to that project. For each threshold value that is placed on a resource control, you can associate one or more actions to be taken when that value is reached. You can set resource controls by using the command-line interface or the Solaris Management Console. If you are setting configuration parameters across a large number of systems, use the console for this task.
For instructions: project Database, Local project File Format, Available Resource Controls, Actions on Resource Control Values, and Chapter 8, Fair Share Scheduler

Task: Place an upper bound on the consumption of physical memory by processes in projects.
Description: The resource cap enforcement daemon rcapd enforces the physical memory cap that is defined in the /etc/project database with the rcap.max-rss attribute.
For instructions: project Database, Chapter 9, Physical Memory Control Using the Resource Capping Daemon, and rcapd(1M)

Task: Create resource pool configurations.
Description: Resource pools provide a way to partition system resources, such as processors, and maintain those partitions across reboots. You can add a project.pool attribute to each entry in the /etc/project database.
For instructions: Creating Pools Configurations

Task: Make the fair share scheduler (FSS) your default system scheduler.
Description: Ensure that all user processes in either a single CPU system or a processor set belong to the same scheduling class.
For instructions: FSS Configuration Examples and dispadmin(1M)

Task: Activate the extended accounting facility to monitor and record resource consumption on a task or process basis.
Description: Use extended accounting data to assess current resource controls and plan capacity requirements for future workloads. Aggregate usage on a system-wide basis can be tracked. To obtain complete usage statistics for related workloads that span more than one system, the project name can be shared across several machines.
For instructions: How to Activate Extended Accounting for Processes, Tasks, and Flows and acctadm(1M)

Task: (Optional) Make additional adjustments to your configuration.
Description: If you determine that additional adjustments are required, you can continue to alter the values from the command line while the task or process is running. Modifications to existing tasks can be applied on a temporary basis without restarting the project. Tune the values until you are satisfied with the performance. Then update the current values in the /etc/project database or the name service project table.
For instructions: Temporarily Updating Resource Control Values on a Running System, rctladm(1M), and prctl(1)

Task: (Optional) Capture extended accounting data.
Description: Write extended accounting records for active processes and active tasks. The files that are produced can be used for planning, chargeback, and billing purposes.
For instructions: wracct(1M)
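Several of the attributes named in this task map live side by side in a single project entry. As a hedged illustration (the project name, ID, user list, and values here are invented; rcap.max-rss and project.pool are the attribute names referred to above), a consolidated entry might look like this:

```
batch:4001:Nightly batch jobs:ml,mp::rcap.max-rss=10737418240;project.pool=batch_pool
```

Here rcap.max-rss caps the project's physical memory at 10 Gbytes, project.pool binds its processes to a resource pool named batch_pool, and the semicolon separates the two name-value pairs.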

Chapter 5 Projects and Tasks

This chapter discusses the project and task facilities of Solaris resource management. Projects and tasks are used to label workloads and separate them from one another. The project provides a network-wide administrative identifier for related work. The task collects a group of processes into a manageable entity that represents a workload component.

Overview

To optimize workload response, you must first be able to identify the workloads that are running on the system you are analyzing. This information can be difficult to obtain by using either a purely process-oriented or a user-oriented method alone. In the Solaris environment, you have two additional facilities that can be used to separate and identify workloads: the project and the task.

Based on their project or task membership, running processes can be manipulated with standard Solaris commands. The extended accounting facility can report on both process usage and task usage, and tag each record with the governing project identifier. This capability enables offline workload analysis to be correlated with online monitoring. The project identifier can be shared across multiple machines through the project name service database. Thus, the resource consumption of related workloads that run on (or span) multiple machines can ultimately be analyzed across all of the machines.

Projects

The project identifier is an administrative identifier that is used to identify related work. The project identifier can be thought of as a workload tag equivalent to the user and group identifiers. A user or group can belong to one or more projects. These projects can be used to represent the workloads in which the user or group of users is allowed to participate. This membership can then be the basis of chargeback that is based on, for example, usage or initial resource allocations. Although a user must have a default project assigned, the processes that the user launches can be associated with any of the projects of which that user is a member.

Determining a User's Default Project

To log in to the system, a user must be assigned a default project.

Because each process on the system possesses project membership, an algorithm to assign a default project to the login or other initial process is necessary. The algorithm to determine a default project consists of four steps. If no default project is found, the user's login, or request to start a process, is denied.

The system sequentially follows these steps to determine a user's default project:

  1. If the user has an entry with a project attribute defined in the /etc/user_attr extended user attributes database, then the value of the project attribute is the default project (see user_attr(4)).

  2. If a project with the name user.user-id is present in the project(4) database, then that project is the default project.

  3. If a project with the name group.group-name is present in the project database, where group-name is the name of the default group for the user (as specified in passwd(4)), then that project is the default project.

  4. If the special project default is present in the project database, then that project is the default project.

This logic is provided by the getdefaultproj() library function (see getprojent(3PROJECT)).
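The four-step search order can be sketched in shell against sample copies of the relevant databases. This is an illustration of the lookup order only, not the real getdefaultproj() implementation; the file names and sample entries are invented for the example.

```shell
# Illustrative sketch of the default-project search order. Not the real
# getdefaultproj() implementation; file names and entries are samples.
USER_ATTR=user_attr.sample     # stand-in for /etc/user_attr
PROJECT=proj.sample            # stand-in for the project(4) database

cat > "$USER_ATTR" <<'EOF'
mark::::project=booksite
EOF
cat > "$PROJECT" <<'EOF'
booksite:4113:Book Auction Project:mark::
user.jtd:4114::jtd::
group.staff:10::::
default:3::::
EOF

default_project() {            # usage: default_project username groupname
  user=$1 group=$2
  # Step 1: a project= attribute in the extended user attributes database
  p=$(awk -F: -v u="$user" '$1 == u && match($5, /project=[^;]*/) {
        print substr($5, RSTART + 8, RLENGTH - 8) }' "$USER_ATTR")
  [ -n "$p" ] && { echo "$p"; return 0; }
  # Steps 2-4: user.username, then group.groupname, then "default"
  for cand in "user.$user" "group.$group" default; do
    if awk -F: -v c="$cand" '$1 == c { found = 1 } END { exit !found }' "$PROJECT"
    then echo "$cand"; return 0; fi
  done
  return 1                     # no default project: the login is denied
}

default_project mark staff     # step 1 matches: booksite
default_project jtd staff      # step 2 matches: user.jtd
default_project ann staff      # step 3 matches: group.staff
```

Any user who matches none of the first three steps falls through to the special project default; only if that entry were also absent would the login be denied.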

project Database

You can store project data in a local file, in a Network Information Service (NIS) project map, or in a Lightweight Directory Access Protocol (LDAP) directory service. The project database is consulted at login, and by all requests for account management through the pluggable authentication module (PAM), to bind a user to a default project.


Note –

Updates to entries in the project database, whether to the /etc/project file or to a representation of the database in a network name service, are not applied to currently active projects. The updates are applied to new tasks that join the project when login(1) or newtask(1) is used.


PAM Subsystem

Operations that change or set identity include logging in to the system, invoking an rcp or rsh command, using ftp, or using su. When an operation involves changing or setting identity, a set of configurable modules is used to provide authentication, account management, credentials management, and session management.

The account management PAM module for projects is documented in the pam_projects(5) man page. The PAM system is documented in the man pages pam(3PAM), pam.conf(4), and pam_unix(5).

Name Service Configuration

Resource management supports the name service project database. The location where the project database is stored is defined in /etc/nsswitch.conf. By default, files is listed first, but the sources can be listed in any order.


project: files [nis] [ldap]

If more than one source for project information is listed, the nsswitch.conf file directs the routine to start searching for the information in the first source listed. The routine then searches subsequent databases.

For more information on /etc/nsswitch.conf, see “The Name Service Switch (Overview)” in System Administration Guide: Naming and Directory Services (DNS, NIS, and LDAP) and nsswitch.conf(4).

Local project File Format

If you select files as your project database in nsswitch.conf, the login process searches the /etc/project file for project information (see projects(1) and project(4)). The project file contains a one-line entry for each project recognized by the system, of the following form:


projname:projid:comment:user-list:group-list:attributes

The fields are defined as follows.

projname

The name of the project. The name must be a string that consists of alphanumeric characters, the underline (_) character, and the hyphen (-). The name must begin with an alphabetic character. projname cannot contain periods (.), colons (:), or newline characters.
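As a quick illustration of these naming rules (a sketch, not a utility that Solaris provides), the constraints can be expressed as a single pattern:

```shell
# Validate a project name: must begin with a letter, then contain only
# alphanumerics, underscores (_), and hyphens (-). Illustrative sketch only;
# the system's own default projects (user.root, group.staff) use the
# reserved period form and are not covered by this check.
valid_projname() {
  printf '%s' "$1" | grep -Eq '^[A-Za-z][A-Za-z0-9_-]*$'
}

valid_projname booksite     && echo "booksite: ok"
valid_projname book_site-2  && echo "book_site-2: ok"
valid_projname 4113site     || echo "4113site: rejected (must begin with a letter)"
```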

projid

The project's unique numerical ID (PROJID) within the system. The maximum value of the projid field is UID_MAX (2147483647).

comment

The project's description.

user-list

A comma-separated list of users who are allowed in the project.

Wildcards can be used in this field. The asterisk (*) allows all users to join the project. The exclamation point followed by the asterisk (!*) excludes all users from the project. The exclamation mark (!) followed by a user name excludes the specified user from the project.

group-list

A comma-separated list of groups of users who are allowed in the project.

Wildcards can be used in this field. The asterisk (*) allows all groups to join the project. The exclamation point followed by the asterisk (!*) excludes all groups from the project. The exclamation mark (!) followed by a group name excludes the specified group from the project.
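The wildcard rules for the user-list and group-list fields can be sketched as a small shell function. This is an illustration only; it assumes that an explicit exclusion (!name or !*) takes precedence over any grant, which is one plausible reading of the rules above.

```shell
# Sketch of user-list and group-list wildcard evaluation. Illustrative only:
# assumes an exclusion entry (!name or !*) overrides any grant.
name_in_list() {               # usage: name_in_list "ml,!mp,*" name
  list=$1 name=$2 grant=no deny=no
  oldIFS=$IFS; IFS=,
  set -f                       # keep * literal during word splitting
  for entry in $list; do
    case $entry in
      "!$name" | '!*') deny=yes ;;   # explicit or blanket exclusion
      "$name"  | '*')  grant=yes ;;  # explicit or blanket grant
    esac
  done
  set +f; IFS=$oldIFS
  [ "$deny" = no ] && [ "$grant" = yes ]
}

name_in_list 'ml,mp,jtd' ml && echo "ml: allowed"
name_in_list '*,!mp' jtd    && echo "jtd: allowed by *"
name_in_list '*,!mp' mp     || echo "mp: excluded"
```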

attributes

A semicolon-separated list of name-value pairs (see Chapter 7, Resource Controls). name is an arbitrary string that specifies the object-related attribute, and value is the optional value for that attribute.


name[=value]

In the name-value pair, names are restricted to letters, digits, underscores, and the period. The period is conventionally used as a separator between the categories and subcategories of the resource control (rctl). The first character of an attribute name must be a letter. The name is case sensitive.

Values can be structured by using commas and parentheses to establish precedence. The semicolon is used to separate name-value pairs. The semicolon cannot be used in a value definition. The colon is used to separate project fields. The colon cannot be used in a value definition.
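For example, a hypothetical entry could attach a resource control in its attributes field (task.max-lwps is one of the resource controls discussed in Chapter 7; the project name, ID, users, and values here are invented for illustration):

```
booksite:4113:Book Auction Project:ml,mp::task.max-lwps=(privileged,100,deny)
```

The parentheses and commas structure the value (privilege level, threshold, and action), while a semicolon would separate this pair from any additional attributes.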


Note –

Routines that read this file halt when they encounter a malformed entry. Any project assignments that are specified after the incorrect entry are not made.
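Because parsing halts at the first bad entry, checking a file before you install it can save trouble. The following awk sketch (not a tool that ships with the system) flags the first entry whose field count is wrong; since colons cannot appear inside values, a six-field count is a reasonable first-pass check.

```shell
# Sketch: flag the first project entry that does not have exactly six
# colon-separated fields. Illustrative only; real parsing also validates
# names, IDs, and attribute syntax.
check_project_file() {
  awk -F: 'NF != 6 { printf "line %d has %d fields (expected 6)\n", NR, NF
                     exit 1 }' "$1"
}

cat > project.check <<'EOF'
system:0:System:::
broken:4114
booksite:4113:Book Auction Project:ml::
EOF

check_project_file project.check || echo "project.check is malformed"
```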


This example shows the default /etc/project file:


system:0:System:::
user.root:1:Super-User:::
noproject:2:No Project:::
default:3::::
group.staff:10::::

This example shows the default /etc/project file with project entries added at the end:


system:0:System:::
user.root:1:Super-User:::
noproject:2:No Project:::
default:3::::
group.staff:10::::
user.ml:2424:Lyle Personal:::
booksite:4113:Book Auction Project:ml,mp,jtd,kjh::

To add resource controls to the /etc/project file, see Using Resource Controls.

Name Service Configuration for NIS

If you are using NIS, you can specify in the /etc/nsswitch.conf file to search the NIS maps for projects:


project: nis files 

The NIS map, either project.byname or project.bynumber, has the same form as the /etc/project file:


projname:projid:comment:user-list:group-list:attributes

For more information, see System Administration Guide: Naming and Directory Services (DNS, NIS, and LDAP).

Directory Service Configuration for LDAP

If you are using LDAP, you can specify in the /etc/nsswitch.conf file to search the LDAP entries for projects.


project: ldap files

For more information on the schema for project entries in an LDAP database, see “Solaris Schemas” in System Administration Guide: Naming and Directory Services (DNS, NIS, and LDAP).

Tasks

Each successful login to a project creates a new task that contains the login process. The task is a process collective that represents a set of work over time. A task can also be viewed as a workload component.

Each process is a member of one task, and each task is associated with one project.

Figure 5–1 Project and Task Tree

Diagram shows one project with three tasks under it, and two to four processes under each task.

All operations on sessions, such as signal delivery, are also supported on tasks. You can also bind tasks to processor sets and set their scheduling priorities and classes, which modifies all current and subsequent processes in the task.

Tasks are created at login (see login(1)), by cron(1M), by newtask(1), and by setproject(3PROJECT).

The extended accounting facility can provide accounting data for processes that is aggregated at the task level.

Commands Used to Administer Projects and Tasks

Command 

Description 

projects(1)

Prints the project membership of a user. 

newtask(1)

Executes the user's default shell or specified command, placing the execution command in a new task that is owned by the specified project. newtask can also be used to modify the task and the project binding for a running process.

projadd(1M)

Adds a new project entry to the /etc/project file. projadd creates a project entry only on the local system. projadd cannot change information that is supplied by the network name service.

projmod(1M)

Modifies a project's information on the local system. projmod cannot change information that is supplied by the network name service. However, the command does verify the uniqueness of the project name and project ID against the external name service.

projdel(1M)

Deletes a project from the local system. projdel cannot change information that is supplied by the network name service.

Command Options Used With Projects and Tasks

ps

Use ps -o to display task and project IDs. For example, to view the project ID, type the following:


# ps -o user,pid,uid,projid
USER PID   UID  PROJID
jtd  89430 124  4113

id

Use id -p to print the current project ID in addition to the user and group IDs. If the user operand is provided, the project associated with that user's normal login is printed:


$ id -p
uid=124(jtd) gid=10(staff) projid=4113(booksite)

pgrep and pkill

To match only processes with a project ID in a specific list, type the following:


# pgrep -J projidlist
# pkill -J projidlist

To match only processes with a task ID in a specific list, type the following:


# pgrep -T taskidlist
# pkill -T taskidlist

prstat

To display various statistics for processes and projects that are currently running on your system, type the following:


% prstat -J
	  PID USERNAME  SIZE   RSS STATE  PRI NICE      TIME  CPU PROCESS/NLWP
 21634 jtd      5512K 4848K cpu0    44    0   0:00.00 0.3% prstat/1
   324 root       29M   75M sleep   59    0   0:08.27 0.2% Xsun/1
 15497 jtd        48M   41M sleep   49    0   0:08.26 0.1% adeptedit/1
   328 root     2856K 2600K sleep   58    0   0:00.00 0.0% mibiisa/11
  1979 jtd      1568K 1352K sleep   49    0   0:00.00 0.0% csh/1
  1977 jtd      7256K 5512K sleep   49    0   0:00.00 0.0% dtterm/1
   192 root     3680K 2856K sleep   58    0   0:00.36 0.0% automountd/5
  1845 jtd        24M   22M sleep   49    0   0:00.29 0.0% dtmail/11
  1009 jtd      9864K 8384K sleep   49    0   0:00.59 0.0% dtwm/8
   114 root     1640K  704K sleep   58    0   0:01.16 0.0% in.routed/1
   180 daemon   2704K 1944K sleep   58    0   0:00.00 0.0% statd/4
   145 root     2120K 1520K sleep   58    0   0:00.00 0.0% ypbind/1
   181 root     1864K 1336K sleep   51    0   0:00.00 0.0% lockd/1
   173 root     2584K 2136K sleep   58    0   0:00.00 0.0% inetd/1
   135 root     2960K 1424K sleep    0    0   0:00.00 0.0% keyserv/4
PROJID    NPROC  SIZE   RSS MEMORY      TIME  CPU PROJECT
    10       52  400M  271M    68%   0:11.45 0.4% booksite
     0       35  113M  129M    32%   0:10.46 0.2% system

Total: 87 processes, 205 lwps, load averages: 0.05, 0.02, 0.02

To display various statistics for processes and tasks that are currently running on your system, type the following:


% prstat -T
   PID USERNAME  SIZE   RSS STATE  PRI NICE      TIME  CPU PROCESS/NLWP
 23023 root       26M   20M sleep   59    0   0:03:18 0.6% Xsun/1
 23476 jtd        51M   45M sleep   49    0   0:04:31 0.5% adeptedit/1
 23432 jtd      6928K 5064K sleep   59    0   0:00:00 0.1% dtterm/1
 28959 jtd        26M   18M sleep   49    0   0:00:18 0.0% .netscape.bin/1
 23116 jtd      9232K 8104K sleep   59    0   0:00:27 0.0% dtwm/5
 29010 jtd      5144K 4664K cpu0    59    0   0:00:00 0.0% prstat/1
   200 root     3096K 1024K sleep   59    0   0:00:00 0.0% lpsched/1
   161 root     2120K 1600K sleep   59    0   0:00:00 0.0% lockd/2
   170 root     5888K 4248K sleep   59    0   0:03:10 0.0% automountd/3
   132 root     2120K 1408K sleep   59    0   0:00:00 0.0% ypbind/1
   162 daemon   2504K 1936K sleep   59    0   0:00:00 0.0% statd/2
   146 root     2560K 2008K sleep   59    0   0:00:00 0.0% inetd/1
   122 root     2336K 1264K sleep   59    0   0:00:00 0.0% keyserv/2
   119 root     2336K 1496K sleep   59    0   0:00:02 0.0% rpcbind/1
   104 root     1664K  672K sleep   59    0   0:00:03 0.0% in.rdisc/1
TASKID    NPROC  SIZE   RSS MEMORY      TIME  CPU PROJECT                     
   222       30  229M  161M    44%   0:05:54 0.6% group.staff                 
   223        1   26M   20M   5.3%   0:03:18 0.6% group.staff                 
    12        1   61M   33M   8.9%   0:00:31 0.0% group.staff                 
     1       33   85M   53M    14%   0:03:33 0.0% system                      

Total: 65 processes, 154 lwps, load averages: 0.04, 0.05, 0.06	

Note –

The -J and -T options cannot be used together.


Using cron and su With Projects and Tasks

cron

The cron command issues a settaskid() call to ensure that each cron, at, and batch job executes in a separate task, with the appropriate default project for the submitting user. The at and batch commands also capture the current project ID, and they ensure that the project ID is restored when an at job is run.

su

To switch to the user's default project, and thus create a new task as part of simulating a login, type the following:


# su - user

To retain the project ID of the invoker, issue su without the - flag.


# su user

Project Administration Examples

How to Define a Project and View the Current Project

This example shows how to use the projadd and projmod commands.

  1. Become superuser.

  2. View the default /etc/project file on your system.


    # cat /etc/project
    system:0::::
    user.root:1::::
    noproject:2::::
    default:3::::
    group.staff:10::::
  3. Add a project called booksite and assign it to a user named mark with project ID number 4113.


    # projadd -U mark -p 4113 booksite
    
  4. View the /etc/project file again to see the project addition.


    # cat /etc/project
    system:0::::
    user.root:1::::
    noproject:2::::
    default:3::::
    group.staff:10::::
    booksite:4113::mark::
  5. Add a comment that describes the project in the comment field.


    # projmod -c "Book Auction Project" booksite
    
  6. View the changes in the /etc/project file.


    # cat /etc/project
    system:0::::
    user.root:1::::
    noproject:2::::
    default:3::::
    group.staff:10::::
    booksite:4113:Book Auction Project:mark::

How to Delete a Project From the /etc/project File

This example shows how to use the projdel command to delete a project.

  1. Become superuser.

  2. Remove the project booksite by using the projdel command.


    # projdel booksite
    
  3. Display the /etc/project file.


    # cat /etc/project
    system:0::::
    user.root:1::::
    noproject:2::::
    default:3::::
    group.staff:10::::
  4. Log in as user mark and type projects to view the projects assigned.


    # su - mark
    # projects
    default

How to Obtain User and Project Membership Information

Use the id command with the -p flag to view the current project membership of the invoking process.


$ id -p
uid=100(mark) gid=1(other) projid=3(default)

How to Create a New Task

  1. Become superuser.

  2. Create a new task in the booksite project by using the newtask command with the -v (verbose) option to obtain the system task ID.


    # newtask -v -p booksite
    16

    The execution of newtask creates a new task in the specified project, and places the user's default shell in this task.

  3. View the current project membership of the invoking process.


    # id -p
    uid=100(mark) gid=1(other) projid=4113(booksite)

    The process is now a member of the new project.

How to Move a Running Process Into a New Task

This example shows how to associate a running process with a different task and project. To perform this task, you must either be superuser, or be the owner of the process and be a member of the new project.

  1. Become superuser.

  2. Obtain the process ID of the book_catalog process.


    # pgrep book_catalog
    	8100
  3. Associate process 8100 with a new task ID in the booksite project.


    # newtask -v -p booksite -c 8100
    	17

    The -c option specifies that newtask operate on the existing named process.

  4. Confirm the task to process ID mapping.


    # pgrep -T 17
    	8100

Chapter 6 Extended Accounting

By using the project and task facilities that are described in Chapter 5, Projects and Tasks, to label and separate workloads, you can monitor resource consumption by each workload. You can use the extended accounting subsystem to capture a detailed set of resource consumption statistics on both running processes and tasks. The extended accounting subsystem labels the usage records with the project for which the work was done. You can also use extended accounting, in conjunction with the Internet Protocol Quality of Service (IPQoS) flow accounting module described in “Using Flow Accounting and Statistics Gathering (Tasks)” in IPQoS Administration Guide, to capture network flow information on a system.

To begin using extended accounting, see How to Activate Extended Accounting for Processes, Tasks, and Flows.

Overview

Before you can apply resource management mechanisms, you must first be able to characterize the resource consumption demands that various workloads place on a system. The extended accounting facility in the Solaris operating environment provides a flexible way to record system and network resource consumption on a task or process basis, or on the basis of selectors provided by IPQoS (see ipqos(7IPP)). Unlike online monitoring tools, which measure system usage in real time, extended accounting enables you to examine historical usage. You can then make assessments of capacity requirements for future workloads.

With extended accounting data available, you can develop or purchase software for resource chargeback, workload monitoring, or capacity planning.

How Extended Accounting Works

The extended accounting facility in the Solaris environment uses a versioned, extensible file format to contain accounting data. Files that use this data format can be accessed or be created by using the API provided in the included library, libexacct(3LIB). These files can then be analyzed on any platform with extended accounting enabled, and their data can be used for capacity planning and chargeback.

If extended accounting is active, statistics are gathered that can be examined by the libexacct API. libexacct allows examination of the exacct files either forward or backward. The API supports third-party files that are generated by libexacct as well as those files that are created by the kernel. There is a Practical Extraction and Report Language (Perl) interface to libexacct that enables you to develop customized reporting and extraction scripts. See Perl Interface to libexacct.

With extended accounting enabled, the task tracks the aggregate resource usage of its member processes. A task accounting record is written at task completion. Interim records can also be written. For more information on tasks, see Chapter 5, Projects and Tasks.

Figure 6–1 Task Tracking With Extended Accounting Activated

Flow diagram shows how aggregate resource usage of a task's processes is captured in the record that is written at task completion.

Extensible Format

The extended accounting format is substantially more extensible than the SunOSTM legacy system accounting software format (see “What is System Accounting?” in System Administration Guide: Advanced Administration). Extended accounting permits accounting metrics to be added and removed from the system between releases, and even during system operation.


Note –

Both extended accounting and legacy system accounting software can be active on your system at the same time.


exacct Records and Format

Routines that allow exacct records to be created serve two purposes.

The format permits different forms of accounting records to be captured without requiring that every change be an explicit version change. Well-written applications that consume accounting data must ignore records they do not understand.

The libexacct library converts and produces files in the exacct format. This library is the only supported interface to exacct format files.


Note –

The getacct, putacct, and wracct system calls do not apply to flows. The kernel creates flow records and writes them to the file when IPQoS flow accounting is configured.


Extended Accounting Configuration

The /etc/acctadm.conf file contains the current extended accounting configuration. The file is maintained through the acctadm interface and should not be edited manually.

The directory /var/adm/exacct is the standard location for placing extended accounting data. You can use the acctadm(1M) command to specify a different location for the process and task accounting-data files.

Commands Used With Extended Accounting

Command 

Description 

acctadm(1M)

Modifies various attributes of the extended accounting facility, stops and starts extended accounting, and is used to select accounting attributes to track for processes, tasks, and flows. 

wracct(1M)

Writes extended accounting records for active processes and active tasks. 

lastcomm(1)

Displays previously invoked commands. lastcomm can consume either standard accounting-process data or extended-accounting process data.

For information on commands that are associated with tasks and projects, see Commands Used to Administer Projects and Tasks. For information on IPQoS flow accounting, see ipqosconf(1M).

Perl Interface to libexacct

The Perl interface allows you to create Perl scripts that can read the accounting files produced by the exacct framework. You can also create Perl scripts that write exacct files.

The interface is functionally equivalent to the underlying C API. When possible, the data obtained from the underlying C API is presented as Perl data types. This feature makes accessing the data easier and it removes the need for buffer pack and unpack operations. Moreover, all memory management is performed by the Perl library.

The various project, task, and exacct-related functions are separated into groups. Each group of functions is located in a separate Perl module. Each module begins with the standard Sun::Solaris:: Perl package prefix. All of the classes provided by the Perl exacct library are found under the Sun::Solaris::Exacct module.

The underlying libexacct(3LIB) library provides operations on exacct format files, catalog tags, and exacct objects. exacct objects are subdivided into two types: Items, which hold a single data value, and Groups, which are ordered collections of Items.

The following table summarizes each of the modules.

Module 

Description 

For More Information 

Sun::Solaris::Project

This module provides functions to access the project manipulation functions getprojid(2), endprojent(3PROJECT) , fgetprojent(3PROJECT), getdefaultproj(3PROJECT), getprojbyid(3PROJECT), getprojbyname(3PROJECT), getprojent(3PROJECT), getprojidbyname(3PROJECT), inproj(3PROJECT), project_walk(3PROJECT), setproject(3PROJECT), and setprojent(3PROJECT).

Project(3PERL)

Sun::Solaris::Task

This module provides functions to access the task manipulation functions gettaskid(2) and settaskid(2).

Task(3PERL)

Sun::Solaris::Exacct

This module is the top-level exacct module. This module provides functions to access the exacct-related system calls getacct(2), putacct(2), and wracct(2). This module also provides functions to access the libexacct(3LIB) library function ea_error(3EXACCT). Constants for all of the exacct EO_*, EW_*, EXR_*, P_*, and TASK_* macros are also provided in this module.

Exacct(3PERL)

Sun::Solaris::Exacct::Catalog

This module provides object-oriented methods to access the bitfields in an exacct catalog tag. This module also provides access to the constants for the EXT_*, EXC_*, and EXD_* macros.

Exacct::Catalog(3PERL)

Sun::Solaris::Exacct::File

This module provides object-oriented methods to access the libexacct accounting file functions ea_open(3EXACCT), ea_close(3EXACCT), ea_get_creator(3EXACCT), ea_get_hostname(3EXACCT), ea_next_object(3EXACCT), ea_previous_object(3EXACCT), and ea_write_object(3EXACCT).

Exacct::File(3PERL)

Sun::Solaris::Exacct::Object

This module provides object-oriented methods to access an individual exacct accounting file object. An exacct object is represented as an opaque reference blessed into the appropriate Sun::Solaris::Exacct::Object subclass. This module is further subdivided into the object types Item and Group. At this level, there are methods to access the ea_match_object_catalog(3EXACCT) and ea_attach_to_object(3EXACCT) functions.

Exacct::Object(3PERL)

Sun::Solaris::Exacct::Object::Item

This module provides object-oriented methods to access an individual exacct accounting file Item. Objects of this type inherit from Sun::Solaris::Exacct::Object.

Exacct::Object::Item(3PERL)

Sun::Solaris::Exacct::Object::Group

This module provides object-oriented methods to access an individual exacct accounting file Group. Objects of this type inherit from Sun::Solaris::Exacct::Object. These objects provide access to the ea_attach_to_group(3EXACCT) function. The Items contained within the Group are presented as a Perl array.

Exacct::Object::Group(3PERL)

For examples that show how to use the modules described in the previous table, see Using the Perl Interface to libexacct.

Using Extended Accounting Functionality

How to Activate Extended Accounting for Processes, Tasks, and Flows

To activate the extended accounting facility for tasks, processes, and flows, use the acctadm(1M) command. The optional final parameter to acctadm indicates whether the command should act on the process, system task, or flow accounting components of the extended accounting facility.

  1. Become superuser.

  2. Activate extended accounting for processes.


    # acctadm -e extended -f /var/adm/exacct/proc process 
    
  3. Activate extended accounting for tasks.


    # acctadm -e extended,mstate -f /var/adm/exacct/task task
    
  4. Activate extended accounting for flows.


    # acctadm -e extended -f /var/adm/exacct/flow flow
    

How to Activate Extended Accounting With a Startup Script

Activate extended accounting on an ongoing basis by linking the /etc/init.d/acctadm script into /etc/rc2.d.


# ln -s /etc/init.d/acctadm /etc/rc2.d/Snacctadm
# ln -s /etc/init.d/acctadm /etc/rc2.d/Knacctadm

The n variable is replaced by a number.

See Extended Accounting Configuration for information on accounting configuration.

How to Display Extended Accounting Status

Type acctadm without arguments to display the current status of the extended accounting facility.


# acctadm
                 Task accounting: active
            Task accounting file: /var/adm/exacct/task
          Tracked task resources: extended
        Untracked task resources: none
              Process accounting: active
         Process accounting file: /var/adm/exacct/proc
       Tracked process resources: extended
     Untracked process resources: host,mstate
                 Flow accounting: active
            Flow accounting file: /var/adm/exacct/flow
          Tracked flow resources: extended
        Untracked flow resources: none

In the previous example, system task accounting is active in extended mode and mstate mode. Process and flow accounting are active in extended mode.


Note –

In the context of extended accounting, microstate (mstate) refers to the extended data, associated with microstate process transitions, that is available in the process usage file (see proc(4)). This data provides much more detail about the activities of the process than basic or extended records.


How to View Available Accounting Resources

Available resources can vary from system to system, and from platform to platform. Use the -r option to view the available accounting resources on the system.


# acctadm -r
process:
extended pid,uid,gid,cpu,time,command,tty,projid,taskid,ancpid,wait-status,flag
basic    pid,uid,gid,cpu,time,command,tty,flag
task:
extended taskid,projid,cpu,time,host,mstate,anctaskid
basic    taskid,projid,cpu,time
flow:
extended saddr,daddr,sport,dport,proto,dsfield,nbytes,npkts,action,ctime,lseen,projid,uid
basic    saddr,daddr,sport,dport,proto,nbytes,npkts,action
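The acctadm -r listing groups the available resources under process, task, and flow headings, with one comma-separated resource list per tracking mode. A script that checks which resources a system can track might parse that listing as follows; this is a minimal sketch that assumes the layout shown above (a `type:` header followed by `extended`/`basic` lines, with a long resource list possibly wrapped onto its own line):

```python
def parse_acctadm_r(text):
    """Parse `acctadm -r` output into {type: {mode: [resources]}}."""
    result = {}
    current = None        # accounting type currently being read
    pending_mode = None   # set when a resource list wraps to the next line
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.endswith(":"):
            # A header such as "process:" starts a new accounting type.
            current = line[:-1]
            result[current] = {}
        elif line.startswith(("extended", "basic")):
            parts = line.split(None, 1)
            mode = parts[0]
            if len(parts) == 2:
                result[current][mode] = parts[1].split(",")
            else:
                # Resource list continues on the following line.
                pending_mode = mode
                result[current][mode] = []
        elif pending_mode:
            result[current][pending_mode] = line.split(",")
            pending_mode = None
    return result

sample = """process:
extended pid,uid,gid,cpu,time,command,tty,projid,taskid,ancpid,wait-status,flag
basic    pid,uid,gid,cpu,time,command,tty,flag
task:
extended taskid,projid,cpu,time,host,mstate,anctaskid
basic    taskid,projid,cpu,time"""
resources = parse_acctadm_r(sample)
```

With the sample above, resources["task"]["basic"] contains the four basic task resources.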

How to Deactivate Process, Task, and Flow Accounting

To deactivate process, task, and flow accounting, turn off each of them individually.

  1. Become superuser.

  2. Turn off process accounting.


    # acctadm -x process 
    
  3. Turn off task accounting.


    # acctadm -x task
    
  4. Turn off flow accounting.


    # acctadm -x flow
    
  5. Verify that task accounting, process accounting, and flow accounting have been turned off.


    	# acctadm
                Task accounting: inactive
           Task accounting file: none
         Tracked task resources: extended
       Untracked task resources: none
             Process accounting: inactive
        Process accounting file: none
      Tracked process resources: extended
    Untracked process resources: host,mstate
                Flow accounting: inactive
           Flow accounting file: none
         Tracked flow resources: extended
       Untracked flow resources: none

Using the Perl Interface to libexacct

How to Recursively Print the Contents of an exacct Object

Use the following code to recursively print the contents of an exacct object. Note that this capability is provided by the library as the Sun::Solaris::Exacct::Object::dump() function. This capability is also available through the ea_dump_object() convenience function.


sub dump_object
     {
             my ($obj, $indent) = @_;
             my $istr = '  ' x $indent;

             #
             # Retrieve the catalog tag.  Because we are 
             # doing this in an array context, the
             # catalog tag will be returned as a (type, catalog, id) 
             # triplet, where each member of the triplet will behave as 
             # an integer or a string, depending on context.
             # If instead this next line provided a scalar context, e.g.
             #    my $cat  = $obj->catalog()->value();
             # then $cat would be set to the integer value of the 
             # catalog tag.
             #
             my @cat = $obj->catalog()->value();

             #
             # If the object is a plain item
             #
             if ($obj->type() == &EO_ITEM) {
                     #
                     # Note: The '%s' formats provide a string context, so
                     # the components of the catalog tag will be displayed
                     # as the symbolic values. If we changed the '%s'
                     # formats to '%d', the numeric value of the components
                     # would be displayed.
                     #
                     printf("%sITEM\n%s  Catalog = %s|%s|%s\n", 
                        $istr, $istr, @cat);
                     $indent++;

                     #
                     # Retrieve the value of the item.  If the item contains
                     # in turn a nested exacct object (i.e., an item or
                     # group), then the value method will return a reference
                     # to the appropriate sort of perl object
                     # (Exacct::Object::Item or Exacct::Object::Group).
                     # We could of course figure out that the item contained
                     # a nested item or group by examining the catalog tag in
                     # @cat and looking for a type of EXT_EXACCT_OBJECT or
                     # EXT_GROUP.
                     #
                     my $val = $obj->value();
                     if (ref($val)) {
                             # If it is a nested object, recurse to dump it.
                             dump_object($val, $indent);
                     } else {
                             # Otherwise it is just a 'plain' value, so
                             # display it.
                             printf("%s  Value = %s\n", $istr, $val);
                     }

             #
             # Otherwise we know we are dealing with a group.  Groups
             # represent contents as a perl list or array (depending on
             # context), so we can process the contents of the group
             # with a 'foreach' loop, which provides a list context.
             # In a list context the value method returns the content
             # of the group as a perl list, which is the quickest
             # mechanism, but doesn't allow the group to be modified.
             # If we wanted to modify the contents of the group we could
             # do so like this:
             #    my $grp = $obj->value();   # Returns an array reference
             #    $grp->[0] = $newitem;
             # but accessing the group elements this way is much slower.
             #
             } else {
                     printf("%sGROUP\n%s  Catalog = %s|%s|%s\n",
                         $istr, $istr, @cat);
                     $indent++;
                     # 'foreach' provides a list context.
                     foreach my $val ($obj->value()) {
                             dump_object($val, $indent);
                     }
                     printf("%sENDGROUP\n", $istr);
             }
     }

How to Create a New Group Record and Write It to a File

Use this script to create a new group record and write it to a file named /tmp/exacct.


#!/usr/perl5/5.6.1/bin/perl

use strict;
use warnings;
use Sun::Solaris::Exacct qw(:EXACCT_ALL);
# Prototype list of catalog tags and values.
     my @items = (
             [ &EXT_STRING | &EXC_DEFAULT | &EXD_CREATOR      => "me"       ],
             [ &EXT_UINT32 | &EXC_DEFAULT | &EXD_PROC_PID     => $$         ],
             [ &EXT_UINT32 | &EXC_DEFAULT | &EXD_PROC_UID     => $<         ],
             [ &EXT_UINT32 | &EXC_DEFAULT | &EXD_PROC_GID     => $(         ],
             [ &EXT_STRING | &EXC_DEFAULT | &EXD_PROC_COMMAND => "/bin/rec" ],
     );

     # Create a new group catalog object.
     my $cat = ea_new_catalog(&EXT_GROUP | &EXC_DEFAULT | &EXD_NONE);

     # Create a new Group object and retrieve its data array.
     my $group = ea_new_group($cat);
     my $ary = $group->value();

     # Push the new Items onto the Group array.
     foreach my $v (@items) {
             push(@$ary, ea_new_item(ea_new_catalog($v->[0]), $v->[1]));
     }

     # Open the exacct file, write the record & close.
     my $f = ea_new_file('/tmp/exacct', &O_RDWR | &O_CREAT | &O_TRUNC)
        || die("create /tmp/exacct failed: ", ea_error_str(), "\n");
     $f->write($group);
     $f = undef;

How to Print the Contents of an exacct File

Use the following Perl script to print the contents of an exacct file.


#!/usr/perl5/5.6.1/bin/perl

     use strict;
     use warnings;
     use Sun::Solaris::Exacct qw(:EXACCT_ALL);

     die("Usage is dumpexacct <exacct file>\n") unless (@ARGV == 1);

     # Open the exacct file and display the header information.
     my $ef = ea_new_file($ARGV[0], &O_RDONLY) || die(ea_error_str());
     printf("Creator:  %s\n", $ef->creator());
     printf("Hostname: %s\n\n", $ef->hostname());

     # Dump the file contents
     while (my $obj = $ef->get()) {
             ea_dump_object($obj);
     }

     # Report any errors
     if (ea_error() != EXR_OK && ea_error() != EXR_EOF)  {
             printf("\nERROR: %s\n", ea_error_str());
             exit(1);
     }
     exit(0);

Example Output From Sun::Solaris::Exacct::Object->dump()

Here is example output produced by running Sun::Solaris::Exacct::Object->dump() on the file created in How to Create a New Group Record and Write It to a File.


Creator:  root
Hostname: localhost
GROUP
       Catalog = EXT_GROUP|EXC_DEFAULT|EXD_NONE
       ITEM
         Catalog = EXT_STRING|EXC_DEFAULT|EXD_CREATOR
         Value = me
       ITEM
         Catalog = EXT_UINT32|EXC_DEFAULT|EXD_PROC_PID
         Value = 845523
       ITEM
         Catalog = EXT_UINT32|EXC_DEFAULT|EXD_PROC_UID
         Value = 37845
       ITEM
         Catalog = EXT_UINT32|EXC_DEFAULT|EXD_PROC_GID
         Value = 10
       ITEM
         Catalog = EXT_STRING|EXC_DEFAULT|EXD_PROC_COMMAND
         Value = /bin/rec
ENDGROUP

Chapter 7 Resource Controls

After you determine the resource consumption of workloads on your system as described in Chapter 6, Extended Accounting, you can place bounds on resource usage. Bounds prevent workloads from over-consuming resources. The resource controls facility, which extends the UNIX resource limit concept, is one constraint mechanism that is used for this purpose.

Overview

UNIX systems have traditionally provided a resource limits facility (rlimits). The rlimits facility allows administrators to set one or more numerical limits on the amount of resources a process can consume. These limits include per-process CPU time used, per-process core file size, and per-process maximum heap size. Heap size is the amount of memory that is allocated for the process data segment.

In the Solaris operating environment, the concept of a per-process resource limit has been extended to the task and project entities described in Chapter 5, Projects and Tasks. These enhancements are provided by the resource controls (rctls) facility. A resource control is identified by the prefix project, task, or process. Resource controls can be observed on a system-wide basis.

The resource controls facility provides compatibility interfaces for the resource limits facility. Existing applications that use resource limits continue to run unchanged. These applications can be observed in the same way as applications that are modified to take advantage of the resource controls facility.

Resource controls provide a mechanism for constraining system resources. Processes, tasks, and projects can be prevented from consuming specified amounts of system resources. This mechanism leads to a more manageable system by preventing over-consumption of resources.

Constraint mechanisms can be used to support capacity-planning processes. An encountered constraint can provide information about application resource needs without necessarily denying the resource to the application.

Resource controls can also serve as a simple attribute mechanism for resource management facilities. For example, the number of CPU shares made available to a project in the fair share scheduler (FSS) scheduling class is defined by the project.cpu-shares resource control. Because a project is assigned a fixed number of shares by the control, the various actions associated with exceeding a control are not relevant. In this context, the current value for the project.cpu-shares control is considered an attribute on the specified project.

Another type of project attribute is used to regulate the resource consumption of physical memory by collections of processes attached to a project. These attributes have the prefix rcap, for example, rcap.max-rss. Like a resource control, this type of attribute is configured in the project database. However, while resource controls enforce limits from the kernel, rcap project attributes are enforced at the user level by the rcapd(1M) resource cap enforcement daemon. For information on rcapd, see Chapter 9, Physical Memory Control Using the Resource Capping Daemon.

Administering Resource Controls

The resource controls facility is configured through the project database (see Chapter 5, Projects and Tasks). Resource control attributes are set in the final field of the project database entry. The values associated with each resource control are enclosed in parentheses, and appear as plain text separated by commas. The values in parentheses constitute an “action clause.” Each action clause is composed of a privilege level, a threshold value, and an action that is associated with the particular threshold. Each resource control can have multiple action clauses, which are also separated by commas. The following entry defines a per-process address-space limit and a per-task lightweight process limit on a project entity.


development:101:Developers:::task.max-lwps=(privileged,10,deny);
  process.max-address-space=(privileged,209715200,deny)
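As described above, the attribute field carries one or more controls separated by semicolons, and each control's parenthesized action clause is a privilege level, a threshold value, and an action. The following Python sketch decodes such a field; it is an illustration only (the function name is ours, and it assumes the simple three-element clauses shown above, not every form the facility accepts):

```python
import re

def parse_rctl_attributes(attr_field):
    """Decode a project-database attribute field such as
    'task.max-lwps=(privileged,10,deny)' into a mapping from
    control name to a list of (privilege, threshold, action) clauses."""
    controls = {}
    for chunk in attr_field.split(";"):
        chunk = chunk.strip()
        if not chunk:
            continue
        name, _, value = chunk.partition("=")
        clauses = []
        # Each parenthesized group is one action clause.
        for clause in re.findall(r"\(([^)]*)\)", value):
            priv, threshold, action = (p.strip() for p in clause.split(","))
            clauses.append((priv, int(threshold), action))
        controls[name] = clauses
    return controls

controls = parse_rctl_attributes(
    "task.max-lwps=(privileged,10,deny);"
    "process.max-address-space=(privileged,209715200,deny)")
```

Decoding the development project's entry above yields a privileged deny clause at 10 LWPs per task and one at 209715200 bytes of per-process address space.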

The rctladm(1M) command allows you to make runtime interrogations of and modifications to the resource controls facility, with global scope. The prctl(1) command allows you to make runtime interrogations of and modifications to the resource controls facility, with local scope.

Available Resource Controls

A list of the standard resource controls that are available in this release is shown in the following table.

The table describes the resource that is constrained by each control. The table also identifies the default units that are used by the project database for that resource. The default units are of two types: quantities, which represent a limited amount, and indexes, which represent a maximum valid identifier.

Thus, project.cpu-shares specifies the number of shares to which the project is entitled. process.max-file-descriptor specifies the highest file number that can be assigned to a process by the open(2) system call.

Table 7–1 Standard Resource Controls

Control Name 

Description 

Default Unit 

project.cpu-shares

The number of CPU shares that are granted to this project for use with FSS(7)

Quantity (shares) 

task.max-cpu-time

Maximum CPU time that is available to this task's processes 

Time (seconds) 

task.max-lwps

Maximum number of LWPs simultaneously available to this task's processes 

Quantity (LWPs) 

process.max-cpu-time

Maximum CPU time that is available to this process 

Time (seconds) 

process.max-file-descriptor

Maximum file descriptor index that is available to this process 

Index (maximum file descriptor) 

process.max-file-size

Maximum file offset that is available for writing by this process 

Size (bytes) 

process.max-core-size

Maximum size of a core file that is created by this process 

Size (bytes) 

process.max-data-size

Maximum heap memory that is available to this process 

Size (bytes) 

process.max-stack-size

Maximum stack memory segment that is available to this process 

Size (bytes) 

process.max-address-space

Maximum amount of address space, as summed over segment sizes, available to this process 

Size (bytes) 

Resource Control Values and Privilege Levels

A threshold value on a resource control constitutes an enforcement point where local actions can be triggered or global actions, such as logging, can occur.

Each threshold value must be associated with a privilege level, which must be one of the following three types: basic, privileged, or system.

A resource control is guaranteed to have one system value, which is defined by the system, or resource provider. The system value represents how much of the resource the current implementation of the operating system is capable of providing.

Any number of privileged values can be defined, and only one basic value is allowed. Operations that are performed without specifying a privilege value are assigned a basic privilege by default.

The privilege level for a resource control value is defined in the privilege field of the resource control block as RCTL_BASIC, RCTL_PRIVILEGED, or RCTL_SYSTEM. See getrctl(2) for more information. You can use the prctl command to modify values that are associated with basic and privileged levels.

Actions on Resource Control Values

For each threshold value that is placed on a resource control, you can associate one or more actions.

Due to implementation restrictions, the global properties of each control can restrict the set of available actions that can be set on the threshold value. A list of available signal actions is presented in the following table. For additional information on signals, see signal(3HEAD).

Table 7–2 Signals Available to Resource Control Values

Signal 

Notes 

SIGABRT 

 

SIGHUP 

 

SIGTERM 

 

SIGKILL 

 

SIGSTOP 

 

SIGXRES 

 

SIGXFSZ 

Available only to resource controls with the RCTL_GLOBAL_FILE_SIZE property (process.max-file-size). See rctlblk_set_value(3C) for more information.

SIGXCPU 

Available only to resource controls with the RCTL_GLOBAL_CPUTIME property (process.max-cpu-time). See rctlblk_set_value(3C) for more information.

Resource Control Flags and Properties

Each resource control on the system has a certain set of associated properties. This set of properties is defined as a set of global flags, which are associated with all controlled instances of that resource. Global flags cannot be modified, but the flags can be retrieved by using either rctladm or the getrctl system call.

Local flags define the default behavior and configuration for a specific threshold value of that resource control on a specific process or process collective. The local flags for one threshold value do not affect the behavior of other defined threshold values for the same resource control. However, the global flags affect the behavior for every value associated with a particular control. Local flags can be modified, within the constraints supplied by their corresponding global flags, by the prctl command or the setrctl system call (see setrctl(2)).

For the complete list of local flags, global flags, and their definitions, see rctlblk_set_value(3C).

To determine system behavior when a threshold value for a particular resource control is reached, use rctladm to display the global flags for the resource control. For example, to display the values for process.max-cpu-time, type the following:


$ rctladm process.max-cpu-time
	process.max-cpu-time   syslog=off   [ lowerable no-deny cpu-time inf ]

The global flags indicate the following.

lowerable

Superuser privileges are not required to lower the privileged values for this control.

no-deny

Even when threshold values are exceeded, access to the resource is never denied.

cpu-time

SIGXCPU is available to be sent when threshold values of this resource are reached.

inf

Any value with RCTL_LOCAL_MAXIMAL defined actually represents an infinite quantity, and the value is never enforced.
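A monitoring script might extract the syslog state and the bracketed global flags from a line of rctladm output. The following Python sketch does so; it assumes the single-line output format shown above and is not part of the rctladm interface itself:

```python
import re

def parse_rctladm_line(line):
    """Split one rctladm output line, for example
    'process.max-cpu-time   syslog=off   [ lowerable no-deny cpu-time inf ]',
    into the control name, the syslog state, and the list of global flags."""
    m = re.match(r"\s*(\S+)\s+syslog=(\S+)\s+\[\s*(.*?)\s*\]", line)
    if m is None:
        raise ValueError("unrecognized rctladm output: %r" % line)
    name, syslog, flags = m.groups()
    return {"control": name, "syslog": syslog, "flags": flags.split()}

info = parse_rctladm_line(
    "process.max-cpu-time   syslog=off   [ lowerable no-deny cpu-time inf ]")
```

For the process.max-cpu-time example above, the parser reports syslog off and the four global flags just described.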

Use prctl to display local values and actions for the resource control.


$ prctl -n process.max-cpu-time $$
	353939: -ksh
	process.max-cpu-time   [ lowerable no-deny cpu-time inf ]
		18446744073709551615 privileged signal=XCPU   [ max ]
		18446744073709551615 system     deny          [ max ]

The max (RCTL_LOCAL_MAXIMAL) flag is set for both threshold values, and the inf (RCTL_GLOBAL_INFINITE) flag is defined for this resource control. Hence, as configured, both threshold quantities represent infinite values and they will never be exceeded.

Resource Control Enforcement

More than one resource control can exist on a resource. A resource control can exist at each containment level in the process model. If resource controls are active on the same resource at different container levels, the smallest container's control is enforced first. Thus, action is taken on process.max-cpu-time before task.max-cpu-time if both controls are encountered simultaneously.
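The precedence rule above can be sketched as a small model (illustrative Python, not a Solaris interface; the ordering list is an assumption drawn from the containment levels in the process model described here):

```python
# Containment levels in the process model, smallest container first.
ENFORCEMENT_ORDER = ["process", "task", "project"]

def first_enforced(controls):
    """Given resource control names such as 'process.max-cpu-time',
    return the control that is acted on first: the one whose
    containment level is the smallest."""
    return min(controls,
               key=lambda name: ENFORCEMENT_ORDER.index(name.split(".")[0]))

# process.max-cpu-time is acted on before task.max-cpu-time.
winner = first_enforced(["task.max-cpu-time", "process.max-cpu-time"])
```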

Figure 7–1 Process Collectives, Container Relationships, and Their Resource Control Sets

Diagram shows enforcement of each resource control at its containment level.

Global Monitoring of Resource Control Events

Often, the resource consumption of processes is unknown. To get more information, try using the global resource control actions that are available with rctladm(1M). Use rctladm to establish a syslog action on a resource control. Then, if any entity managed by that resource control encounters a threshold value, a system message is logged at the configured logging level.

Configuration

Each resource control listed in Table 7–1 can be assigned to a project at login or when newtask(1) or another project-aware launcher, such as at(1), batch (see at(1)), or cron(1M), is invoked. Each command that is initiated is launched in a separate task with the invoking user's default project.

Updates to entries in the project database, whether to the /etc/project file or to a representation of the database in a network name service, are not applied to currently active projects. The updates are applied when a new task joins the project through login(1) or newtask.

Temporarily Updating Resource Control Values on a Running System

Values changed in the project database only become effective for new tasks that are started in a project. However, you can use the rctladm and prctl commands to update resource controls on a running system.

Updating Logging Status

The rctladm command affects the global logging state of each resource control on a system-wide basis. This command can be used to view the global state and to set up the level of syslog logging when controls are exceeded.

Updating Resource Controls

You can view and temporarily alter resource control values and actions on a per-process, per-task, or per-project basis by using prctl. A project, task, or process ID is given as input, and the command operates on the resource control at the level where it is defined.

Any modifications to values and actions take effect immediately. However, these modifications apply to the current session only. The changes are not recorded in the project database. If the system is restarted, the modifications are lost. Permanent changes to resource controls must be made in the project database.

All resource control settings that can be modified in the project database can also be modified with the prctl command. Both basic and privileged values can be added or deleted, and their actions can be modified. By default, the basic type is assumed for all set operations, but processes and users with superuser privileges can also modify privileged resource controls. System resource controls cannot be altered.

Using Resource Controls

How to Set the Maximum Number of LWPs for Each Task in a Project

Type this entry in the /etc/project database to set the maximum number of LWPs in each task in project x-files to 3.


x-files:100::root::task.max-lwps=(privileged,3,deny)

When superuser creates a new task in project x-files by joining it with newtask, superuser will not be able to create more than three LWPs while running in this task. This is shown in the following annotated sample session.


# newtask -p x-files csh

# prctl -n task.max-lwps $$
688: csh
task.max-lwps
                            3 privileged deny
                   2147483647 system     deny
# id -p
uid=0(root) gid=1(other) projid=100(x-files)

# ps -o project,taskid -p $$
 PROJECT TASKID
 x-files   236

# csh        /* creates second LWP */

# csh        /* creates third LWP */

# csh        /* cannot create more LWPs */
Vfork failed

#

How to Set Multiple Controls on a Project

The /etc/project file can contain settings for multiple resource controls for each project as well as multiple threshold values for each control. Threshold values are defined in action clauses, which are comma-separated for multiple values.

The following line in the file sets a basic control with no action on the maximum LWPs per task for project x-files. The line also sets a privileged deny control on the maximum LWPs per task. This control causes any LWP creation that exceeds the maximum to fail, as shown in the previous example. Finally, the maximum file descriptors per process are limited at the basic level, which forces failure of any open call that exceeds the maximum.


x-files:101::root::task.max-lwps=(basic,10,none),(privileged,500,deny);
    process.max-file-descriptor=(basic,128,deny)

How to Use prctl

As superuser, type prctl to display the maximum file descriptor for the current shell that is running:


# prctl -n process.max-file-descriptor $$
8437:   sh
process.max-file-descriptor              [ lowerable deny ]
                          256 basic      deny
                        65536 privileged deny
                   2147483647 system     deny

Use the prctl command to temporarily add a new privileged value to deny the use of more than three LWPs per task for the x-files project. The result is identical to the result in How to Set the Maximum Number of LWPs for Each Task in a Project, as shown in the following annotated sample session:


# newtask -p x-files

# id -p
uid=0(root) gid=1(other) projid=101(x-files)

# prctl -n task.max-lwps -t privileged -v 3 -e deny -i project x-files

# prctl -n task.max-lwps -i project x-files
670:    sh
task.max-lwps
                            3 privileged deny
                   2147483647 system     deny

You can also use prctl -r to change the lowest value of a resource control.


# prctl -n process.max-file-descriptor -r -v 128 $$

How to Use rctladm

You can use rctladm to enable the global syslog attribute of a resource control. When the control is exceeded, notification is logged at the specified syslog level. Type the following:


# rctladm -e syslog process.max-file-descriptor

Capacity Warnings

A global action on a resource control enables you to receive notice of any entity that trips a resource control value.

For example, assume you want to determine whether a web server possesses sufficient CPUs for its typical workload. You could analyze sar(1) data for idle CPU time and load average. You could also examine extended accounting data to determine the number of simultaneous processes that are running for the web server process.

However, an easier approach is to place the web server in a task. You can then set a global action, using syslog, to notify you whenever a task exceeds a number of LWPs appropriate for the machine's capabilities.

How to Determine Whether a Web Server Is Allocated Enough CPU Capacity

  1. Use the prctl command to place a privileged (superuser-owned) resource control on the tasks that contain an httpd process. Limit each task's total number of LWPs to 40, and disable all local actions.


    # prctl -n task.max-lwps -v 40 -t privileged -d all `pgrep httpd`
    
  2. Enable a system log global action on the task.max-lwps resource control.


    # rctladm -e syslog task.max-lwps
    
  3. Observe whether the workload trips the resource control.

    If it does, messages such as the following appear in /var/adm/messages:


    Jan  8 10:15:15 testmachine unix: [ID 859581 kern.notice] 
    NOTICE: privileged rctl task.max-lwps exceeded by task 19

Chapter 8 Fair Share Scheduler

An analysis of workload data can indicate that a particular workload or group of workloads is monopolizing CPU resources. If these workloads are not violating resource constraints on CPU usage, you can modify the allocation policy for CPU time on the system. The fair share scheduling class described in this chapter enables you to allocate CPU time based on shares instead of the priority scheme of the timesharing (TS) scheduling class.

Overview

A fundamental job of the operating system is to arbitrate which processes get access to the system's resources. The process scheduler, which is also called the dispatcher, is the portion of the kernel that controls allocation of the CPU to processes. The scheduler supports the concept of scheduling classes. Each class defines a scheduling policy that is used to schedule processes within the class. The default scheduler in the Solaris operating environment, the TS scheduler, tries to give every process relatively equal access to the available CPUs. However, you might want to specify that certain processes be given more resources than others.

You can use the fair share scheduler (FSS) to control the allocation of available CPU resources among workloads, based on their importance. This importance is expressed by the number of shares of CPU resources that you assign to each workload.

You give each project CPU shares to control the project's entitlement to CPU resources. The FSS guarantees a fair dispersion of CPU resources among projects that is based on allocated shares, independent of the number of processes that are attached to a project. The FSS achieves fairness by reducing a project's entitlement for heavy CPU usage and increasing its entitlement for light usage, in accordance with other projects.

The FSS consists of a kernel scheduling class module and class-specific versions of the dispadmin(1M) and priocntl(1) commands. Project shares used by the FSS are specified through the project.cpu-shares property in the project(4) database.

CPU Share Definition

The term “share” is used to define a portion of the system's CPU resources that is allocated to a project. If you assign a greater number of CPU shares to a project, relative to other projects, the project receives more CPU resources from the fair share scheduler.

CPU shares are not equivalent to percentages of CPU resources. Shares are used to define the relative importance of workloads in relation to other workloads. When you assign CPU shares to a project, your primary concern is not the number of shares the project has. Knowing how many shares the project has in comparison with other projects is more important. You must also take into account how many of those other projects will be competing with it for CPU resources.


Note –

Processes in projects with zero shares always run at the lowest system priority (0). These processes only run when projects with nonzero shares are not using CPU resources.


CPU Shares and Process State

In the Solaris operating environment, a project workload usually consists of more than one process. From the fair share scheduler perspective, each project workload can be in either an idle state or an active state. A project is considered idle if none of its processes are using any CPU resources. This usually means that such processes are either sleeping (waiting for I/O completion) or stopped. A project is considered active if at least one of its processes is using CPU resources. The sum of shares of all active projects is used in calculating the portion of CPU resources to be assigned to projects.

The following formula shows how the FSS scheduler calculates per-project allocation of CPU resources.

Figure 8–1 FSS Scheduler Share Calculation

allocation (project) = shares (project) / sum of shares of all active projects

When more projects become active, each project's CPU allocation is reduced, but the proportion between the allocations of different projects does not change.
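The calculation above can be sketched in a few lines of Python (an illustrative model only; the kernel scheduler works through priorities and usage decay, not a direct percentage computation):

```python
def fss_allocation(shares, active):
    """Model of the FSS per-project CPU allocation.

    shares: dict mapping project name -> number of CPU shares
    active: set of projects currently using CPU resources
    Returns each active project's fraction of total CPU resources:
    its shares divided by the sum of shares of all active projects.
    """
    total = sum(shares[p] for p in active)
    if total == 0:
        return {p: 0.0 for p in active}
    return {p: shares[p] / total for p in active}

# Two active projects with 1 and 3 shares are allocated 25% and 75%.
alloc = fss_allocation({"A": 1, "B": 3}, {"A", "B"})
```

Note that only active projects enter the denominator, so when a project goes idle, the remaining projects' allocations grow while the ratios between them are preserved.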

CPU Share Versus Utilization

Share allocation is not the same as utilization. A project that is allocated 50 percent of the CPU resources might average only a 20 percent CPU use. Moreover, shares serve to limit CPU usage only when there is competition from other projects. Regardless of how low a project's allocation is, it always receives 100 percent of the processing power if it is running alone on the system. Available CPU cycles are never wasted. They are distributed between projects.

The allocation of a small share to a busy workload might slow its performance. However, the workload is not prevented from completing its work if the system is not overloaded.

CPU Share Examples

Assume you have a system with two CPUs running two parallel CPU-bound workloads called A and B, respectively. Each workload is running as a separate project. The projects have been configured so that project A is assigned SA shares, and project B is assigned SB shares.

On average, under the traditional TS scheduler, each of the workloads that is running on the system would be given the same amount of CPU resources. Each workload would get 50 percent of the system's capacity.

When run under the control of the FSS scheduler with SA=SB, these projects are also given approximately the same amounts of CPU resources. However, if the projects are given different numbers of shares, their CPU resource allocations are different.

The next three examples illustrate how shares work in different configurations. These examples show that shares are only mathematically accurate for representing the usage if demand meets or exceeds available resources.

Example 1: Two CPU-Bound Processes in Each Project

If A and B each have two CPU-bound processes, and SA = 1 and SB = 3, then the total number of shares is 1 + 3 = 4. In this configuration, given sufficient CPU demand, projects A and B are allocated 25 percent and 75 percent of CPU resources, respectively.


Example 2: No Competition Between Projects

If A and B have only one CPU-bound process each, and SA = 1 and SB = 100, then the total number of shares is 101. Each project cannot use more than one CPU because each project has only one running process. Because no competition exists between projects for CPU resources in this configuration, projects A and B are each allocated 50 percent of all CPU resources. In this configuration, CPU share values are irrelevant. The projects' allocations would be the same (50/50), even if both projects were assigned zero shares.


Example 3: One Project Unable to Run

If A and B have two CPU-bound processes each, and project A is given 1 share and project B is given 0 shares, then project B is not allocated any CPU resources and project A is allocated all CPU resources. Processes in B always run at system priority 0, so they will never be able to run because processes in project A always have higher priorities.


FSS Configuration

Projects and Users

Projects are the workload containers in the FSS scheduler. Groups of users who are assigned to a project are treated as single controllable blocks. Note that you can create a project with its own number of shares for an individual user.

Users can be members of multiple projects that have different numbers of shares assigned. By moving processes from one project to another project, processes can be assigned CPU resources in varying amounts.

For more information on the project(4) database and name services, see project Database.

CPU Shares Configuration

The configuration of CPU shares is managed by the name service as a property of the project database.

When the first task (or process) associated with a project is created through the setproject(3PROJECT) library function, the number of CPU shares defined as resource control project.cpu-shares in the project database is passed to the kernel. A project that does not have the project.cpu-shares resource control defined is assigned one share.

In the following example, this entry in the /etc/project file sets the number of shares for project x-files to 5:


x-files:100::::project.cpu-shares=(privileged,5,none)

If you alter the number of CPU shares allocated to a project in the database when processes are already running, the number of shares for that project will not be modified at that point. The project must be restarted for the change to become effective.

If you want to temporarily change the number of shares assigned to a project without altering the project's attributes in the project database, use prctl(1). For example, to change the value of project x-files's project.cpu-shares resource control to 3 while processes associated with that project are running, type the following:


# prctl -r -n project.cpu-shares -v 3 -i project x-files

-r

Replaces the current value for the named resource control.

-n name

Specifies the name of the resource control.

-v val

Specifies the value for the resource control.

-i idtype

Specifies the ID type of the next argument.

x-files

Specifies the object of the change. In this instance, project x-files is the object.

Project system with project ID 0 includes all system daemons that are started by the boot-time initialization scripts. system can be viewed as a project with an unlimited number of shares. This means that system is always scheduled first, regardless of how many shares have been given to other projects. If you do not want the system project to have unlimited shares, you can specify a number of shares for this project in the project database.

As stated previously, processes that belong to projects with zero shares are always given zero system priority. Projects with one or more shares are running with priorities one and higher. Thus, projects with zero shares are only scheduled when CPU resources are available that are not requested by a nonzero share project.

The maximum number of shares that can be assigned to one project is 65535.

FSS and Processor Sets

The FSS can be used in conjunction with processor sets to provide more fine-grained controls over allocations of CPU resources among projects that run on each processor set than would be available with processor sets alone. The FSS scheduler treats processor sets as entirely independent partitions, with each processor set controlled independently with respect to CPU allocations.

The CPU allocations of projects running in one processor set are not affected by the CPU shares or activity of projects running in another processor set because the projects are not competing for the same resources. Projects only compete with each other if they are running within the same processor set.

The number of shares that is allocated to a project is system wide. Regardless of which processor set it is running on, each portion of a project is given the same number of shares.

When processor sets are used, project CPU allocations are calculated for active projects that run within each processor set, as shown in the following figure.

Figure 8–2 FSS Scheduler Share Calculation With Processor Sets

allocation (project, pset) = [shares (project) / sum of shares of active projects in pset] X (CPUs in pset / total CPUs); a project's system-wide allocation is the sum of these terms over the processor sets in which it runs.

Project partitions that run on different processor sets might have different CPU allocations. The CPU allocation for each project partition in a processor set depends only on the allocations of other projects that run on the same processor set.

The performance and availability of applications that run within the boundaries of their processor sets are not affected by the introduction of new processor sets. The applications are also not affected by changes that are made to the share allocations of projects that run on other processor sets.

Empty processor sets (sets without processors in them) or processor sets without processes bound to them do not have any impact on the FSS scheduler behavior.

FSS and Processor Sets Examples

Assume that a server with eight CPUs is running several CPU-bound applications in projects A, B, and C. Project A is allocated one share, project B is allocated two shares, and project C is allocated three shares.

Project A is running only on processor set 1. Project B is running on processor sets 1 and 2. Project C is running on processor sets 1, 2, and 3. Assume that each project has enough processes to utilize all available CPU power. Thus, there is always competition for CPU resources on each processor set.

Diagram shows total system-wide project CPU allocations on a server with eight CPUs that is running several CPU-bound applications in three projects.

The total system-wide project CPU allocations on such a system are shown in the following table.

Project      Allocation
Project A    4% = (1/6 X 2/8)pset1
Project B    28% = (2/6 X 2/8)pset1 + (2/5 X 4/8)pset2
Project C    67% = (3/6 X 2/8)pset1 + (3/5 X 4/8)pset2 + (3/3 X 2/8)pset3

These percentages do not match the corresponding amounts of CPU shares that are given to projects. However, within each processor set, the per-project CPU allocation ratios are proportional to their respective shares.
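The table's arithmetic can be reproduced with a short Python sketch (an illustrative model of the share calculation, not Solaris code):

```python
def system_wide_allocation(shares, psets):
    """Compute each project's system-wide CPU fraction under FSS
    with processor sets.

    shares: dict mapping project name -> system-wide shares
    psets: list of (cpus, projects) tuples, one per processor set,
           listing the active projects running in that set
    Within each set, active projects split that set's CPUs in
    proportion to their shares; each term is scaled by the set's
    fraction of the system's CPUs.
    """
    total_cpus = sum(cpus for cpus, _ in psets)
    alloc = {p: 0.0 for p in shares}
    for cpus, projects in psets:
        set_shares = sum(shares[p] for p in projects)
        for p in projects:
            alloc[p] += (shares[p] / set_shares) * (cpus / total_cpus)
    return alloc

# Eight CPUs: pset1 (2 CPUs: A, B, C), pset2 (4 CPUs: B, C),
# pset3 (2 CPUs: C), with 1, 2, and 3 shares respectively.
alloc = system_wide_allocation(
    {"A": 1, "B": 2, "C": 3},
    [(2, ["A", "B", "C"]), (4, ["B", "C"]), (2, ["C"])],
)
# alloc["A"] ~ 4%, alloc["B"] ~ 28%, alloc["C"] ~ 67%, as in the table.
```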

On the same system without processor sets, the distribution of CPU resources would be different, as shown in the following table.

Project      Allocation
Project A    16.66% = 1/6
Project B    33.33% = 2/6
Project C    50% = 3/6

Combining FSS With Other Scheduling Classes

By default, the FSS scheduling class uses the same range of priorities (0 to 59) as the timesharing (TS), interactive (IA), and fixed priority (FX) scheduling classes. Therefore, you should avoid having processes from these scheduling classes share the same processor set. A mix of processes in the FSS, TS, IA, and FX classes could result in unexpected scheduling behavior.

With the use of processor sets, you can mix TS, IA, and FX with FSS in one system. However, all the processes that run on each processor set must be in one scheduling class, so they do not compete for the same CPUs. The FX scheduler in particular should not be used in conjunction with the FSS scheduling class unless processor sets are used. This action prevents applications in the FX class from using priorities high enough to starve applications in the FSS class.

You can mix processes in the TS and IA classes in the same processor set, or on the same system without processor sets.

The Solaris operating environment also offers a real-time (RT) scheduler to users with superuser privileges. By default, the RT scheduling class uses system priorities in a different range (usually from 100 to 159) than FSS. Because RT and FSS are using disjoint ranges of priorities, FSS can coexist with the RT scheduling class within the same processor set. However, the FSS scheduling class does not have any control over processes that run in the RT class.

For example, on a four-processor system, a single-threaded RT process can consume one entire processor if the process is CPU bound. If the system also runs FSS, regular user processes compete for the three remaining CPUs that are not being used by the RT process. Note that the RT process might not use the CPU continuously. When the RT process is idle, FSS utilizes all four processors.

You can type the following command to find out which scheduling classes the processor sets are running in and ensure that each processor set is configured to run either TS, IA, FX, or FSS processes.


$ ps -ef -o pset,class | grep -v CLS | sort | uniq
1 FSS
1 SYS
2 TS
2 RT
3 FX

To set the default scheduler for the system, see FSS Configuration Examples and dispadmin(1M). To move running processes into a different scheduling class, see FSS Configuration Examples and priocntl(1).

Monitoring the FSS

You can use prstat(1M) to monitor CPU usage by active projects.

You can use the extended accounting data for tasks to obtain per-project statistics on the amount of CPU resources that are consumed over longer periods. See Chapter 6, Extended Accounting for more information.

How to Monitor System CPU Usage by Projects

To monitor the CPU usage of projects that run on the system, type the following:


% prstat -J

How to Monitor CPU Usage by Projects in Processor Sets

To monitor the CPU usage of projects on a list of processor sets, type the following:


% prstat -J -C pset-list

pset-list

List of processor set IDs that are separated by commas

FSS Configuration Examples

As with other scheduling classes in the Solaris environment, commands to set the scheduler class, configure the scheduler's tunable parameters, and configure the properties of individual processes can be used with FSS.

How to Set the Scheduler Class

Use the dispadmin command to set FSS as the default scheduler for the system.


# dispadmin -d FSS

This change takes effect on the next reboot. After reboot, every process on the system runs in the FSS scheduling class.

How to Manually Move Processes From the TS Into the FSS Class

You can manually move processes from the TS scheduling class into the FSS scheduling class without changing the default scheduling class and rebooting.

  1. Become superuser.

  2. Move the init process (pid 1) into the FSS scheduling class.


    # priocntl -s -c FSS -i pid 1
    
  3. Move all processes from the TS scheduling class into the FSS scheduling class.


    # priocntl -s -c FSS -i class TS
    

All processes again run in the TS scheduling class after reboot.

How to Manually Move Processes From all User Classes Into the FSS Class

You might be using a default class other than TS. For example, your system might be running a window environment that uses the IA class by default. You can manually move all processes into the FSS scheduling class without changing the default scheduling class and rebooting.

  1. Become superuser.

  2. Move the init process (pid 1) into the FSS scheduling class.


    # priocntl -s -c FSS -i pid 1
    
  3. Move all processes from their current scheduling classes into the FSS scheduling class.


    # priocntl -s -c FSS -i all
    

All processes again run in the default scheduling class after reboot.

How to Move a Project's Processes Into the FSS Class

You can manually move processes in a particular project from their current scheduling class to the FSS scheduling class.

  1. Become superuser.

  2. Move processes that run in project ID 10 to the FSS scheduling class.


# priocntl -s -c FSS -i projid 10

The project's processes again run in the default scheduling class after reboot.

How to Tune Scheduler Parameters

You can use the dispadmin command to examine and tune the FSS scheduler's time quantum value. Time quantum is the amount of time that a thread is allowed to run before it must relinquish the processor. To display the current time quantum for the FSS scheduler, type the following:


$ dispadmin -c FSS -g
#
# Fair Share Scheduler Configuration
#
RES=1000
#
# Time Quantum
#
QUANTUM=110

When you use the -g option, you can also use the -r option to specify the resolution that is used for printing time quantum values. If no resolution is specified, time quantum values are displayed in milliseconds by default. Type the following:


$ dispadmin -c FSS -g -r 100
#
# Fair Share Scheduler Configuration
#
RES=100
#
# Time Quantum
#
QUANTUM=11
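As the two displays show, QUANTUM is expressed in units of 1/RES seconds, so both outputs describe the same 110-millisecond quantum. The conversion can be sketched as follows (illustrative Python, not part of dispadmin):

```python
def quantum_seconds(quantum, res):
    """Convert a dispadmin QUANTUM value, which is expressed in
    units of 1/RES seconds, into seconds."""
    return quantum / res

# QUANTUM=110 at RES=1000 and QUANTUM=11 at RES=100 are the same
# quantum: 0.11 seconds (110 milliseconds).
q1 = quantum_seconds(110, 1000)
q2 = quantum_seconds(11, 100)
```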

To set scheduling parameters for the FSS scheduling class, use dispadmin -s. The values in file must be in the format output by the -g option. These values overwrite the current values in the kernel. Type the following:


$ dispadmin -c FSS -s file

References

For more information on how to use the FSS scheduler, see priocntl(1), ps(1), dispadmin(1M), and FSS(7).

Chapter 9 Physical Memory Control Using the Resource Capping Daemon

The resource capping daemon rcapd regulates physical memory consumption by processes running in projects that have resource caps defined.

Resource Capping Daemon Overview

A resource cap is an upper bound placed on the consumption of a resource, such as physical memory. Per-project physical memory caps are supported.

The resource capping daemon and its associated utilities provide mechanisms for physical memory resource cap enforcement and administration.

Like resource controls, resource caps can be defined by using attributes of project entries in the project database. However, while resource controls are synchronously enforced by the kernel, resource caps are asynchronously enforced at the user level by the resource capping daemon. With asynchronous enforcement, a small delay occurs as a result of the sampling interval used by the daemon.

For information about rcapd, see the rcapd(1M) man page. For information about projects and the project database, see Chapter 5, Projects and Tasks and the project(4) man page. For information about resource controls, see Chapter 7, Resource Controls.

How Resource Capping Works

The daemon repeatedly samples the resource utilization of projects that have physical memory caps. The sampling interval used by the daemon is specified by the administrator. See Determining Sample Intervals for additional information. When the system's physical memory utilization exceeds the threshold for cap enforcement, and other conditions are met, the daemon takes action to reduce the resource consumption of projects with memory caps to levels at or below the caps.

The virtual memory system divides physical memory into segments known as pages. Pages are the fundamental unit of physical memory in the Solaris memory management subsystem. To read data from a file into memory, the virtual memory system reads in one page at a time, or pages in a file. To reduce resource consumption, the daemon can page out, or relocate, infrequently used pages to a swap device, which is an area outside of physical memory.

The daemon manages physical memory by regulating the size of a project workload's resident set relative to the size of its working set. The resident set is the set of pages that are resident in physical memory. The working set is the set of pages that the workload actively uses during its processing cycle. The working set changes over time, depending on the process's mode of operation and the type of data being processed. Ideally, every workload has access to enough physical memory to enable its working set to remain resident. However, when enough physical memory is not available, secondary disk storage is used to hold the portion of the working set that does not fit.
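The enforcement policy just described can be modeled in a few lines of Python (a simplified sketch of the decision the daemon makes at each sample, not rcapd's implementation; the names below are invented for this example):

```python
def bytes_to_reclaim(projects, utilization, threshold):
    """Simplified model of the rcapd enforcement decision.

    projects: list of (name, rss, cap) tuples, sizes in bytes;
              a cap of None means the project is uncapped.
    utilization: system physical memory utilization, 0.0 - 1.0.
    threshold: memory cap enforcement threshold, 0.0 - 1.0.
    Caps are enforced only while system utilization exceeds the
    threshold; each capped project is then reduced toward its cap.
    """
    if utilization <= threshold:
        return {}  # below the threshold, workloads may exceed their caps
    return {name: rss - cap
            for name, rss, cap in projects
            if cap is not None and rss > cap}

# A 10-GB cap on project "db" that currently holds 12 GB resident;
# "web" has no cap and is left alone.
excess = bytes_to_reclaim([("db", 12 << 30, 10 << 30),
                           ("web", 1 << 30, None)],
                          utilization=0.95, threshold=0.9)
```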

Only one instance of rcapd can run at any given time.

Attribute to Limit Physical Memory Usage

To define a physical memory resource cap for a project, establish a resident set size (RSS) cap by adding this attribute to the project database entry:

rcap.max-rss

The total amount of physical memory, in bytes, that is available to processes in the project.

For example, the following line in the /etc/project database sets an RSS cap of 10 gigabytes for a project named db.


db:100::db,root::rcap.max-rss=10737418240

Note –

The system might round the specified cap value to a page size.


rcapd Configuration

You use the rcapadm command to configure the resource capping daemon. You can perform the following actions:

To configure the daemon, you must have superuser privileges or have the Process Management profile in your list of profiles. The Process Management role and the System Administrator role both include the Process Management profile. See “RBAC Elements: Reference Information” in System Administration Guide: Security Services.

Configuration changes can be incorporated into rcapd according to the configuration interval (see rcapd Operation Intervals) or on demand by sending a SIGHUP (see the kill(1) man page).

If used without arguments, rcapadm displays the current status of the resource capping daemon if it has been configured.

The following subsections discuss cap enforcement, cap values, and rcapd operation intervals.

Memory Cap Enforcement Threshold

The memory cap enforcement threshold is the percentage of physical memory utilization on the system that triggers cap enforcement. When the system exceeds this utilization, caps are enforced. The physical memory used by applications and the kernel is included in this percentage. The percentage of utilization determines the way in which memory caps are enforced.

To enforce caps, memory can be paged out from project workloads.

A workload is permitted to use physical memory up to its cap. A workload can use additional memory as long as the system's memory utilization stays below the memory cap enforcement threshold.

To set the value for cap enforcement, see How to Set the Memory Cap Enforcement Threshold.

Determining Cap Values

If a project cap is set too low, there might not be enough memory for the workload to proceed effectively under normal conditions. The paging that occurs because the workload requires more memory has a negative effect on system performance.

Projects that have caps set too high can consume available physical memory before their caps are exceeded. In this case, physical memory is effectively managed by the kernel and not by rcapd.

In determining caps on projects, consider these factors.

Impact on I/O system

The daemon can attempt to reduce a project workload's physical memory usage whenever the sampled usage exceeds the project's cap. During cap enforcement, the swap devices and other devices that contain files that the workload has mapped are used. The performance of the swap devices is a critical factor in determining the performance of a workload that routinely exceeds its cap. The execution of the workload is similar to running it on a machine with the same amount of physical memory as the workload's cap.

Impact on CPU usage

The daemon's CPU usage varies with the number of processes in the project workloads it is capping and the sizes of the workloads' address spaces.

A small portion of the daemon's CPU time is spent sampling the usage of each workload. Adding processes to workloads increases the time spent sampling usage.

Another portion of the daemon's CPU time is spent enforcing caps when they are exceeded. The time spent is proportional to the amount of virtual memory involved. CPU time spent increases or decreases in response to corresponding changes in the total size of a workload's address space. This information is reported in the vm column of rcapstat output. See Monitoring Resource Utilization With rcapstat and the rcapstat(1) man page for more information.

Reporting on shared memory

The daemon cannot determine which pages of memory are shared with other processes or which are mapped multiple times within the same process. Since rcapd assumes that each page is unique, this results in a discrepancy between the reported (estimated) RSS and the actual RSS.

Certain workloads, such as databases, use shared memory extensively. For these workloads, you can sample a project's regular usage to determine a suitable initial cap value. Use output from the prstat command with the -J option. See the prstat(1M) man page.
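For example, a minimal sketch of this sizing approach, using illustrative RSS values rather than real prstat -J output:

```shell
# Hypothetical peak-RSS figures (in megabytes) sampled from repeated
# `prstat -J` runs during the workload's regular usage -- illustrative only.
samples="48 61 57 66 59"

# Use the largest observed RSS as a starting point for the project's cap.
peak=0
for rss in $samples; do
  if [ "$rss" -gt "$peak" ]; then
    peak=$rss
  fi
done
echo "suggested initial cap: ${peak}M"
# prints: suggested initial cap: 66M
```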

rcapd Operation Intervals

You can tune the intervals for the periodic operations performed by rcapd.

All intervals are specified in seconds. The rcapd operations and their default interval values are described in the following table.

Operation 

Default Interval Value in Seconds 

Description 

scan

15 

Number of seconds between scans for processes that have joined or left a project workload. Minimum value is 1 second. 

sample

5 

Number of seconds between samplings of resident set size and subsequent cap enforcements. Minimum value is 1 second. 

report

5  

Number of seconds between updates to paging statistics. If set to 0, statistics are not updated, and output from rcapstat is not current.

config

60 

Number of seconds between reconfigurations. In a reconfiguration event, rcapd reads its configuration file for updates and scans the project database for new or revised project caps. Sending a SIGHUP to rcapd causes an immediate reconfiguration.

To tune intervals, see How to Set Operation Intervals.

Determining rcapd Scan Intervals

The scan interval controls how often rcapd looks for new processes. On systems with many processes running, the scan through the list takes more time, so it might be preferable to lengthen the interval in order to reduce the overall CPU time spent. However, the scan interval also represents the minimum amount of time that a process must exist to be attributed to a capped workload. If there are workloads that run many short-lived processes, rcapd might not attribute the processes to a workload if the scan interval is lengthened.

Determining Sample Intervals

The sample interval configured with rcapadm is the shortest amount of time rcapd waits between sampling a workload's usage and enforcing the cap if it is exceeded. If you reduce this interval, rcapd will, under most conditions, enforce caps more frequently, possibly resulting in increased I/O due to paging. However, a shorter sample interval can also lessen the impact that a sudden increase in a particular workload's physical memory usage might have on other workloads. The window between samplings, in which the workload can consume memory unhindered and possibly take memory from other capped workloads, is narrowed.

If the sample interval specified to rcapstat is shorter than the interval specified to rcapd with rcapadm, the output for some intervals can be zero. This situation occurs because rcapd does not update statistics more frequently than the interval specified with rcapadm. The interval specified with rcapadm is independent of the sampling interval used by rcapstat.

Monitoring Resource Utilization With rcapstat

Use rcapstat to monitor the resource utilization of capped projects. To view an example rcapstat report, see Producing Reports With rcapstat.

You can set the sampling interval for the report and specify the number of times that statistics are repeated. The command syntax is rcapstat [-g] [interval [count]], where the operands are as follows:

interval

Specifies the sampling interval in seconds. The default interval is 5 seconds.

count

Specifies the number of times that the statistics are repeated. By default, rcapstat reports statistics until a termination signal is received or until the rcapd process exits.

The paging statistics in the first report issued by rcapstat show the activity since the daemon was started. Subsequent reports reflect the activity since the last report was issued.

The following table defines the column headings in an rcapstat report.

rcapstat Column Headings

Description 

id

The project ID of the capped project. 

project

The project name. 

nproc

The number of processes in the project. 

vm

The total virtual memory size used by processes in the project, in kilobytes (K), megabytes (M), or gigabytes (G). 

rss

The estimated total resident set size (RSS) of the processes in the project, in kilobytes (K), megabytes (M), or gigabytes (G), not accounting for pages that are shared between processes. 

cap

The RSS cap defined for the project. See Attribute to Limit Physical Memory Usage or the rcapd(1M) man page for information about how to specify memory caps.

at

The total amount of memory that rcapd attempted to page out since the last rcapstat sample.

avgat

The average amount of memory that rcapd attempted to page out during each sample cycle that occurred since the last rcapstat sample. The rate at which rcapd samples process RSS sizes can be set with rcapadm. See rcapd Operation Intervals.

pg

The total amount of memory that rcapd successfully paged out since the last rcapstat sample.

avgpg

An estimate of the average amount of memory that rcapd successfully paged out during each sample cycle that occurred since the last rcapstat sample. The rate at which rcapd samples process RSS sizes can be set with rcapadm. See rcapd Operation Intervals.

Administering the Resource Capping Daemon With rcapadm

This section contains procedures for configuring the resource capping daemon with rcapadm. See rcapd Configuration and the rcapadm(1M) man page for more information.

If used without arguments, rcapadm displays the current status of the resource capping daemon if it has been configured.

How to Set the Memory Cap Enforcement Threshold

Caps can be configured so that they will not be enforced until the physical memory available to processes is low. See Memory Cap Enforcement Threshold for more information.

The minimum (and default) value is 0, which means that memory caps are always enforced. To set a different minimum, follow this procedure.

  1. Become superuser.

  2. Use the -c option of rcapadm to set a different physical memory utilization value for memory cap enforcement.


    # rcapadm -c percent
    

    percent is in the range 0 to 100. Higher values are less restrictive. A higher value means capped project workloads can execute without having caps enforced until the system's memory utilization exceeds this threshold.

To display the current physical memory utilization and the cap enforcement threshold, see Reporting Memory Utilization and the Memory Cap Enforcement Threshold.

How to Set Operation Intervals

rcapd Operation Intervals contains information about the intervals for the periodic operations performed by rcapd. To set operation intervals using rcapadm, follow this procedure.

  1. Become superuser.

  2. Use the -i option to set interval values.


    # rcapadm -i interval=value,...,interval=value 
    

All interval values are specified in seconds.

How to Enable Resource Capping

There are two ways to enable resource capping on your system.

  1. Become superuser.

  2. Enable the resource capping daemon in one of the following ways:

    • To enable the resource capping daemon so that it will be started now and also be started each time the system is booted, type:


      # rcapadm -E
      
    • To enable the resource capping daemon at boot without starting it now, also specify the -n option:


      # rcapadm -n -E
      

How to Disable Resource Capping

There are two ways to disable resource capping on your system.

  1. Become superuser.

  2. Disable the resource capping daemon in one of the following ways:

    • To disable the resource capping daemon so that it will be stopped now and not be started when the system is booted, type:


      # rcapadm -D
      
    • To disable the resource capping daemon without stopping it, also specify the -n option:


      # rcapadm -n -D
      

Note –

Use rcapadm -D to safely disable rcapd. If the daemon is killed (see the kill(1) man page), processes might be left in a stopped state and need to be restarted manually. To resume a stopped process, use the prun command. See the prun(1) man page for more information.


Producing Reports With rcapstat

Use rcapstat to report resource capping statistics. Monitoring Resource Utilization With rcapstat explains how to use the rcapstat command to generate reports. That section also describes the column headings in the report. The rcapstat(1) man page also contains this information.

The following subsections use examples to illustrate how to produce reports for specific purposes.

Reporting Cap and Project Information

In this example, caps are defined for two projects associated with two users. user1 has a cap of 50 megabytes, and user2 has a cap of 10 megabytes.

The following command produces five reports at 5-second sampling intervals.


user1machine% rcapstat 5 5
    id project  nproc     vm    rss   cap    at avgat    pg avgpg
112270   user1     24   123M    35M   50M   50M    0K 3312K    0K
 78194   user2      1  2368K  1856K   10M    0K    0K    0K    0K
    id project  nproc     vm    rss   cap    at avgat    pg avgpg
112270   user1     24   123M    35M   50M    0K    0K    0K    0K
 78194   user2      1  2368K  1856K   10M    0K    0K    0K    0K
    id project  nproc     vm    rss   cap    at avgat    pg avgpg
112270   user1     24   123M    35M   50M    0K    0K    0K    0K
 78194   user2      1  2368K  1928K   10M    0K    0K    0K    0K
    id project  nproc     vm    rss   cap    at avgat    pg avgpg
112270   user1     24   123M    35M   50M    0K    0K    0K    0K
 78194   user2      1  2368K  1928K   10M    0K    0K    0K    0K
    id project  nproc     vm    rss   cap    at avgat    pg avgpg
112270   user1     24   123M    35M   50M    0K    0K    0K    0K
 78194   user2      1  2368K  1928K   10M    0K    0K    0K    0K 

The first three lines of output constitute the first report, which contains the cap and project information for the two projects and the paging statistics since rcapd was started. The at and pg columns show values greater than zero for user1 and zero for user2, which indicates that at some time in the daemon's history, user1 exceeded its cap but user2 did not.

The subsequent reports show no significant activity.
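If rcapstat output like the report above has been saved to a file, a short awk filter can flag any project whose sampled RSS exceeds its cap. The data below is a hypothetical, simplified extract with the unit suffixes stripped and all values in megabytes:

```shell
# Hypothetical rcapstat extract: id, project, nproc, vm, rss, cap
# (megabyte values with unit suffixes stripped for simplicity).
cat <<'EOF' > /tmp/rcapstat.sample
112270 user1 24  123   35   50
 78194 user2  1    2    1   10
376565 user3  3 6249 6171 6144
EOF

# Print each project whose sampled RSS (column 5) exceeds its cap (column 6).
awk '$5 > $6 { print $2 " exceeds its cap by " ($5 - $6) "M" }' /tmp/rcapstat.sample
# prints: user3 exceeds its cap by 27M
```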

Monitoring the RSS of a Project

The following example shows project user1, which has an RSS in excess of its RSS cap.

The following command produces five reports at 5-second sampling intervals.


user1machine% rcapstat 5 5

    id project  nproc    vm   rss   cap    at avgat     pg  avgpg
376565   user1      3 6249M 6144M 6144M  690M  220M  5528K  2764K
376565   user1      3 6249M 6144M 6144M    0M  131M  4912K  1637K
376565   user1      3 6249M 6171M 6144M   27M  147M  6048K  2016K
376565   user1      3 6249M 6146M 6144M 4872M  174M  4368K  1456K
376565   user1      3 6249M 6156M 6144M   12M  161M  3376K  1125K

The user1 project has three processes that are actively using physical memory. The positive values in the pg column indicate that rcapd is consistently paging out memory as it attempts to meet the cap by lowering the physical memory utilization of the project's processes. However, rcapd does not succeed in keeping the RSS below the cap value. This is indicated by the varying rss values that do not show a corresponding decrease. As soon as memory is paged out, the workload uses it again and the RSS count goes back up. This means that all of the project's resident memory is being actively used and the working set size (WSS) is greater than the cap. Thus, rcapd is forced to page out some of the working set to meet the cap. Under this condition, the system will continue to experience high page fault rates, and the associated I/O, until the workload's working set shrinks to fit under the cap or the cap value is raised.

In this situation, shortening the sample interval might reduce the discrepancy between the RSS value and the cap value by causing rcapd to sample the workload and enforce caps more frequently.


Note –

A page fault occurs when either a new page must be created or the system must copy in a page from a swap device.


Determining the Working Set Size of a Project

The following example is a continuation of the previous example, and it uses the same project.

The previous example shows that the user1 project is using more physical memory than its cap allows. This example shows how much memory the project workload requires.


user1machine% rcapstat 5 5
    id project  nproc    vm   rss   cap    at avgat     pg  avgpg
376565   user1      3 6249M 6144M 6144M  690M    0K   689M     0K
376565   user1      3 6249M 6144M 6144M    0K    0K     0K     0K
376565   user1      3 6249M 6171M 6144M   27M    0K    27M     0K
376565   user1      3 6249M 6146M 6144M 4872K    0K  4816K     0K
376565   user1      3 6249M 6156M 6144M   12M    0K    12M     0K
376565   user1      3 6249M 6150M 6144M 5848K    0K  5816K     0K
376565   user1      3 6249M 6155M 6144M   11M    0K    11M     0K
376565   user1      3 6249M 6150M   10G   32K    0K    32K     0K
376565   user1      3 6249M 6214M   10G    0K    0K     0K     0K
376565   user1      3 6249M 6247M   10G    0K    0K     0K     0K
376565   user1      3 6249M 6247M   10G    0K    0K     0K     0K
376565   user1      3 6249M 6247M   10G    0K    0K     0K     0K
376565   user1      3 6249M 6247M   10G    0K    0K     0K     0K
376565   user1      3 6249M 6247M   10G    0K    0K     0K     0K
376565   user1      3 6249M 6247M   10G    0K    0K     0K     0K

Halfway through the cycle, the cap on the user1 project was increased from 6 gigabytes to 10 gigabytes. This increase stops cap enforcement and allows the resident set size to grow, limited only by other processes and the amount of memory in the machine. The rss column might stabilize to reflect the project working set size (WSS), 6247M in this example. This is the minimum cap value that allows the project's processes to operate without continually incurring page faults.
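The stabilized rss value can be converted to gigabytes to compare it with the cap. For the 6247M figure above:

```shell
# Convert the stabilized RSS of 6247 megabytes to gigabytes.
awk 'BEGIN { printf "%.1f gigabytes\n", 6247 / 1024 }'
# prints: 6.1 gigabytes
```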

The following two figures graphically show the effect rcapd has on user1 while the cap is 6 gigabytes and 10 gigabytes. Every 5 seconds, corresponding to the sample interval, the RSS decreases and I/O increases as rcapd pages out some of the workload's memory. Shortly after the page out completes, the workload, needing those pages, pages them back in as it continues running. This cycle repeats until the cap is raised to 10 gigabytes approximately halfway through the example, and the RSS stabilizes at 6.1 gigabytes. Since the workload's RSS is now below the cap, no more paging occurs. The I/O associated with paging stops as well, as the vmstat (see vmstat(1M)) or iostat (see iostat(1M)) commands would show. Thus, you can infer that the project required 6.1 gigabytes to perform the work it was doing at the time it was being observed.

Figure 9–1 Stabilizing RSS Values After Raising the Cap of user1 Higher Than user1's WSS

Graph shows stabilizing values after cap of user1 is raised higher than user1's WSS.

Figure 9–2 Relationship Between Page Ins and Page Outs, and the Stabilization of I/O After user1's Cap Is Raised

Graph shows relationship between page ins and page outs, and I/O stabilization after user1's cap is raised.

Reporting Memory Utilization and the Memory Cap Enforcement Threshold

You can use the -g option of rcapstat to report the system-wide physical memory utilization and the current memory cap enforcement threshold.

The -g option causes a memory utilization and cap enforcement line to be printed at the end of the report for each interval.


# rcapstat -g
    id project   nproc    vm   rss   cap    at avgat   pg  avgpg
376565    rcap       0    0K    0K   10G    0K    0K   0K     0K
physical memory utilization: 55%   cap enforcement threshold: 0%
    id project   nproc    vm   rss   cap    at avgat   pg  avgpg
376565    rcap       0    0K    0K   10G    0K    0K   0K     0K
physical memory utilization: 55%   cap enforcement threshold: 0%

Chapter 10 Resource Pools

This chapter discusses resource pools, which are used for partitioning machine resources. Resource pools enable you to separate workloads so that workload consumption of certain resources does not overlap. This resource reservation helps to achieve predictable performance on systems with mixed workloads.

Overview

Resource pools provide a persistent configuration mechanism for processor set configuration and, optionally, scheduling class assignment.

Figure 10–1 Resource Pool Framework

Illustration shows that a pool is made up of a processor set and an optional scheduling class.

By grouping multiple partitions, pools provide a handle to associate with labeled workloads. Each project entry in the /etc/project database can have a pool associated with it. New work that is started on a project is bound to the appropriate pool.

The pools mechanism is primarily for use on large machines of more than four CPUs. However, small machines can still benefit from this functionality. On small machines, you can create pools that share noncritical resource partitions. The pools are separated only on the basis of critical resources.

When to Use Pools

Resource pools offer a versatile mechanism that can be applied to many administrative scenarios, as described in the following sections.

Batch Compute Server

Use pools functionality to split a server into two pools.

One pool is used for login sessions and interactive work by timesharing users. The other pool is used for jobs that are submitted through the batch system.

Application or Database Server

Partition the resources for interactive applications in accordance with the applications' requirements.

Turning on Applications in Phases

Set user expectations.

You might initially deploy a machine that is running only a fraction of the services that the machine is ultimately expected to deliver. User difficulties can occur if reservation-based resource management mechanisms are not established when the machine comes online.

For example, the fair share scheduler optimizes CPU utilization. The response times for a machine that is running only one application can be misleadingly fast when compared to the response times users see with multiple applications loaded. By using separate pools for each application, you can ensure that a ceiling on the number of CPUs available to each application is in place before all applications are deployed.

Complex Timesharing Server

Partition a server that supports large user populations.

Server partitioning provides an isolation mechanism that leads to a more predictable per-user response.

By dividing users into groups that bind to separate pools, and using the fair share scheduling (FSS) facility, you can tune CPU allocations to favor sets of users that have priority. This assignment can be based on user role, accounting chargeback, and so forth.

Workloads That Change Seasonally

Use resource pools to adjust to changing demand.

Your site might experience predictable shifts in workload demand over long periods of time, such as monthly, quarterly, or annual cycles. If your site experiences these shifts, you can alternate between multiple pools configurations by invoking pooladm from a cron(1M) job.
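For example, hypothetical crontab entries (the configuration file names are illustrative, not standard paths) could activate an alternate configuration at the start of each quarter and restore the usual one a month later:

```
# Activate a quarter-close configuration on the first day of each quarter,
# and restore the normal configuration a month later (file names are examples).
0 2 1 1,4,7,10 * /usr/sbin/pooladm -c /etc/pooladm.quarter.conf
0 2 1 2,5,8,11 * /usr/sbin/pooladm -c /etc/pooladm.normal.conf
```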

Real-Time Applications

Create a real-time pool by using the RT scheduler and designated processor resources.

Administering Pools

The commands that are shown in the following table provide the primary administrative interface to the pools facility.

Command 

Description 

pooladm(1M)

Activates a particular configuration or deactivates the current configuration. If run without options, pooladm prints out the current running pools configuration.

poolbind(1M)

Enables the manual binding of projects, tasks, and processes to a pool. 

poolcfg(1M)

Creates and modifies pools configuration files. If run with the info subcommand argument to the -c option, poolcfg displays the current configuration.

A library API is provided by libpool(3LIB). The library can be used by programs to manipulate pool configurations.

Pools Framework

The resource pools framework stores its view of the machine in a private configuration file. (The location of the file is private to the implementation of the pools framework.) This configuration file represents the pools framework's view of the machine. The file also contains information about configured pools and the organization of partitionable resources. Each pool can contain a processor set and, optionally, a scheduling class assignment.

Implementing Pools on a System

Pools can be implemented on a system by using one of these methods.

  1. When the Solaris software boots, an init script checks to see if the /etc/pooladm.conf file exists. If this file is found, then pooladm is invoked to make this configuration the active pools configuration. The system creates a private configuration file to reflect the organization that is requested in /etc/pooladm.conf, and the machine's resources are partitioned accordingly.

  2. When the Solaris environment is running, a pools configuration can either be activated if it is not already present, or modified by using the pooladm command. By default, pooladm operates on /etc/pooladm.conf. However, you can optionally specify an alternate location and file name, and use this file to update the pools configuration.

Dynamic Reconfiguration Operations and Resource Pools

Dynamic reconfiguration (DR) enables you to reconfigure hardware while the system is running. Because DR affects available resource amounts, the pools facility must be included in these operations. When a DR operation is initiated, the pools framework acts to validate the configuration.

If the DR operation can proceed without causing the current pools configuration to become invalid, then the private configuration file is updated. An invalid configuration is one that cannot be supported by the available resources.

If the DR operation would cause the pools configuration to be invalid, then the operation fails and you are notified by a message to the message log. If you want to force the configuration to completion, you must use the DR force option. The pools configuration is then modified to comply with the new resource configuration.

Creating Pools Configurations

The configuration file contains a description of the pools to be created on the system. The file describes the entities and resource types that can be manipulated.

Type 

Description 

pset

A processor set resource 

pool

Named collection of resource associations 

system

The machine-level entity 

See poolcfg(1M) for more information on the elements that can be manipulated.

You can create a structured /etc/pooladm.conf file in two ways: by discovering the resources on the current system (see How to Create a Configuration by Discovery), or by creating a new configuration from scratch (see How to Create a New Configuration).

Use poolcfg or libpool to modify the /etc/pooladm.conf file. Do not directly edit this file.

How to Create a Configuration by Discovery

Use the discover subcommand argument to the -c option of /usr/sbin/poolcfg to create the pools configuration file. The resulting file, /etc/pooladm.conf, contains any existing processor sets.

  1. Become superuser.

  2. Type the following:


    # poolcfg -c discover
    

You can also supply a file name to use instead of the default /etc/pooladm.conf. If the file name is supplied, then the poolcfg commands are applied to the contents of the named file.

For example, to place a discovered configuration in /tmp/foo, do the following:

  1. Become superuser.

  2. Type the following:


    # poolcfg -c discover /tmp/foo
    

How to Create a New Configuration

Use the create subcommand argument to the -c option of /usr/sbin/poolcfg to create a simple configuration file for a system named tester. Note that you must quote subcommand arguments that contain white space.

  1. Become superuser.

  2. Type the following:


    # poolcfg -c 'create system tester'
    
  3. View the contents of the configuration file in readable form.


    # poolcfg -c info
    system tester
            int system.version 1
            boolean system.bind-default true
            string system.comment

How to Modify a Configuration

To enhance your simple configuration, create a processor set named batch and a pool named batch. Then join them with an association. Note that you must quote subcommand arguments that contain white space.

  1. Become superuser.

  2. Create processor set batch.


    # poolcfg -c 'create pset batch (uint pset.min = 2; uint pset.max = 10)'
    
  3. Create pool batch.


    # poolcfg -c 'create pool batch'
    
  4. Join with an association.


    # poolcfg -c 'associate pool batch (pset batch)'
    
  5. Display the edited configuration.


    # poolcfg -c info
    system tester
            int system.version 1
            boolean system.bind-default true
            string system.comment
    
            pool batch
                    boolean pool.default false
                    boolean pool.active true
                    int pool.importance 1
                    string pool.comment
                    pset batch
    
            pset batch
                    int pset.sys_id -2
                    string pset.units population
                    boolean pset.default true
                    uint pset.max 10
                    uint pset.min 2
                    string pset.comment
                    boolean pset.escapable false
                    uint pset.load 0
                    uint pset.size 0

How to Associate a Pool With a Scheduling Class

You can associate a pool with a scheduling class so that all processes bound to the pool use this scheduler. To do this, set the pool.scheduler property to the name of the scheduler class. This example shows how to associate the pool batch with the FSS.

  1. Become superuser.

  2. Modify pool batch to be associated with the FSS.


    # poolcfg -c 'modify pool batch (string pool.scheduler="FSS")'
    
  3. Display the edited configuration.


    # poolcfg -c info
    system tester
            int system.version 1
            boolean system.bind-default true
            string system.comment
    
            pool batch
                    boolean pool.default false
                    boolean pool.active true
                    int pool.importance 1
                    string pool.comment
                    string pool.scheduler FSS
                    pset batch
    
            pset batch
                    int pset.sys_id -2
                    string pset.units population
                    boolean pset.default true
                    uint pset.max 10
                    uint pset.min 2
                    string pset.comment
                    boolean pset.escapable false
                    uint pset.load 0
                    uint pset.size 0

How to Use Command Files With poolcfg

poolcfg -f can take input from a text file that contains poolcfg subcommand arguments to the -c option. This technique is appropriate when you want a set of operations to be performed atomically. When processing multiple commands, the configuration is only updated if all of the commands succeed. For large or complex configurations, this technique can be more useful than per-subcommand invocations.

  1. Create the input file.


    $ cat > poolcmds.txt
    create system tester
    create pset batch (uint pset.min = 2; uint pset.max = 10)
    create pool batch
    associate pool batch (pset batch)
    
  2. Become superuser.

  3. Type the following:


    # /usr/sbin/poolcfg -f poolcmds.txt
    

Activating and Deactivating Pools Configurations

Use pooladm(1M) to make a particular pool configuration active or to remove an active pools configuration.

How to Activate a Pools Configuration

To activate the configuration in the default static configuration file, /etc/pooladm.conf, invoke pooladm with the -c option, “commit configuration.”

  1. Become superuser.

  2. Type the following:


    # /usr/sbin/pooladm -c
    

How to Deactivate a Pools Configuration

To remove the running configuration and all associated resources, such as processor sets, use the -x option for “remove configuration.”

  1. Become superuser.

  2. Type the following:


    # /usr/sbin/pooladm -x
    

The -x option to pooladm removes the dynamic private configuration file as well as all resource configurations that are associated with the dynamic configuration. Thus, the -x option provides a mechanism for recovering from a poorly designed pools configuration. After removal, all processes share all of the resources on the machine.


Note –

Mixing scheduling classes within one processor set can lead to unpredictable results. If you use pooladm -x to recover from a bad configuration, you should then use priocntl(1) to move running processes into a different scheduling class.


Binding to a Pool

You can bind a running process to a pool in two ways: you can bind a process directly by its process ID, or you can bind all of the processes in a task or project by using the -i option of poolbind. The following procedures illustrate both methods.

How to Bind Processes to a Pool

The following procedure manually binds a process (for example, the current shell) to a pool named ohare.

  1. Become superuser.

  2. Type the following:


    # poolbind -p ohare $$
    

How to Bind Tasks or Projects to a Pool

To bind tasks or projects to a pool, use poolbind with the -i option. The following example binds all processes in the airmiles project to the laguardia pool.

  1. Become superuser.

  2. Type the following:


    # poolbind -i project -p laguardia airmiles
    

How to Use project Attributes to Bind New Processes to a Pool

To automatically bind new processes in a project to a pool, add the project.pool attribute to each entry in the project database.

For example, assume you have a configuration with two pools that are named studio and backstage. The /etc/project file has the following contents.


user.paul:1024::::project.pool=studio
user.george:1024::::project.pool=studio
user.ringo:1024::::project.pool=backstage
passes:1027::paul::project.pool=backstage

With this configuration, processes that are started by user paul are bound by default to the studio pool.
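Which pool a project's processes will be bound to can be read straight from the attributes field of the project database. The sketch below (POSIX shell and awk; the helper name is hypothetical) extracts the project.pool value from entries like the ones above:

```shell
# Print the project.pool attribute for one project entry.
# Field layout: name:projid:comment:user-list:group-list:attributes
# Usage: project_pool <project-name> <project-file>
project_pool() {
    awk -F: -v proj="$1" '
        $1 == proj {
            # Attributes are comma-separated name=value pairs in field 6.
            n = split($6, attrs, ",")
            for (i = 1; i <= n; i++)
                if (split(attrs[i], kv, "=") == 2 && kv[1] == "project.pool")
                    print kv[2]
        }' "$2"
}
```

With the sample file above, `project_pool user.paul /etc/project` would print `studio`.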

How to Use project Attributes to Bind a Process to a Different Pool

With the previous configuration, user paul can modify the pool binding for processes that he starts. By using newtask to launch work in the passes project, he can bind that work to the backstage pool as well.

  1. Launch a process in the passes project.


    $ newtask -l -p passes
    
  2. Verify the pool binding for the process.


    $ poolbind -q $$
    process id 6384 : pool 'backstage'

Chapter 11 Resource Management Configuration Example

This chapter reviews the resource management framework and describes a hypothetical server consolidation project. In this example, five applications are being consolidated onto a single system. The target applications have resource requirements that vary, different user populations, and different architectures.

Configuration to Be Consolidated

Currently, each application exists on a dedicated server that is designed to meet the requirements of the application. The applications and their characteristics are identified in the following table.

Application Description                                   Characteristics

Application server                                        Exhibits negative scalability beyond 2 CPUs
Database instance for application server                  Heavy transaction processing
Application server in test and development environment    GUI-based, with untested code execution
Transaction processing server                             Primary concern is response time
Standalone database instance                              Processes a large number of transactions and serves multiple time zones

Consolidation Configuration

The following configuration is used to consolidate the applications onto a single system.

Creating the Configuration

Edit the project database file. Add entries to implement the required resource controls and to map users to resource pools, and then view the file.


# cat /etc/project
.
.
.
user.app_server:2001:Production Application Server:::project.pool=appserver_pool
user.app_db:2002:App Server DB:::project.pool=db_pool,project.cpu-shares=(privileged,1,deny)
development:2003:Test and development::staff:project.pool=dev_pool,
  process.max-address-space=(privileged,536870912,deny)
user.tp_engine:2004:Transaction Engine:::project.pool=tp_pool
user.geo_db:2005:EDI DB:::project.pool=db_pool,project.cpu-shares=(privileged,3,deny)
.
.
.

Note –

The development team has to execute tasks in the development project because access for this project is based on a user's group ID (GID).
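A developer in the staff group would therefore start work roughly as follows; this session is illustrative, not captured output:

```shell
# Join the development project; its project.pool attribute
# binds the new task to dev_pool automatically.
newtask -p development

# Confirm the pool binding of the current shell:
poolbind -q $$
```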


Create an input file named pool.host, which will be used to configure the required resource pools. View the file.


# cat pool.host

create system host
create pset default_pset (uint pset.min = 1)
create pset dev_pset (uint pset.max = 2)
create pset tp_pset (uint pset.min = 2)
create pset db_pset (uint pset.min = 4; uint pset.max = 6)
create pset app_pset (uint pset.min = 1; uint pset.max = 2)
create pool default_pool (string pool.scheduler="TS"; boolean pool.default = true)
create pool dev_pool (string pool.scheduler="IA")
create pool appserver_pool (string pool.scheduler="TS")
create pool db_pool (string pool.scheduler="FSS")
create pool tp_pool (string pool.scheduler="TS")
associate pool default_pool (pset default_pset)
associate pool dev_pool (pset dev_pset)
associate pool appserver_pool (pset app_pset)
associate pool db_pool (pset db_pset)
associate pool tp_pool (pset tp_pset)
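Before activating a file like this one, it can be worth sanity-checking that the pset.min values are jointly satisfiable on the target machine, since pooladm -c fails if they exceed the number of available CPUs. A small sketch (POSIX awk; the helper name is hypothetical) totals the minimums:

```shell
# Sum the pset.min values in a poolcfg command file. The total must
# not exceed the machine's CPU count for the configuration to activate.
# Usage: min_cpus_required <command-file>
min_cpus_required() {
    awk '/create pset/ {
        if (match($0, /pset\.min = [0-9]+/))
            # Skip past the "pset.min = " prefix (11 chars) to the digits.
            sum += substr($0, RSTART + 11, RLENGTH - 11)
    } END { print sum + 0 }' "$1"
}
```

For the pool.host file above, the minimums sum to 8 (1 + 2 + 4 + 1; dev_pset declares no minimum).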

Type the following:


# poolcfg -f pool.host

Make the configuration active.


# pooladm -c

The framework is now functional on the system.

Viewing the Configuration

To view the framework configuration, type:


# pooladm
system host
        int system.version 1
        boolean system.bind-default true
        string system.comment

        pool default_pool
                boolean pool.default true
                boolean pool.active true
                int pool.importance 1
                string pool.comment
                string pool.scheduler TS
                pset default_pset

        pool dev_pool
                boolean pool.default false
                boolean pool.active true
                int pool.importance 1
                string pool.comment
                string pool.scheduler IA
                pset dev_pset

        pool appserver_pool
                boolean pool.default false
                boolean pool.active true
                int pool.importance 1
                string pool.comment
                string pool.scheduler TS
                pset app_pset

        pool db_pool
                boolean pool.default false
                boolean pool.active true
                int pool.importance 1
                string pool.comment
                string pool.scheduler FSS
                pset db_pset

        pool tp_pool
                boolean pool.default false
                boolean pool.active true
                int pool.importance 1
                string pool.comment
                string pool.scheduler TS
                pset tp_pset

        pset default_pset
                int pset.sys_id -1
                string pset.units population
                boolean pset.default true
                uint pset.max 4294967295
                uint pset.min 1
                string pset.comment
                boolean pset.escapable false
                uint pset.load 0
                uint pset.size 0

        pset dev_pset
                int pset.sys_id 1
                string pset.units population
                boolean pset.default false
                uint pset.max 2
                uint pset.min 0
                string pset.comment
                boolean pset.escapable false
                uint pset.load 0
                uint pset.size 0

        pset tp_pset
                int pset.sys_id 2
                string pset.units population
                boolean pset.default false
                uint pset.max 4294967295
                uint pset.min 2
                string pset.comment
                boolean pset.escapable false
                uint pset.load 0
                uint pset.size 0

        pset db_pset
                int pset.sys_id 3
                string pset.units population
                boolean pset.default false
                uint pset.max 6
                uint pset.min 4
                string pset.comment
                boolean pset.escapable false
                uint pset.load 0
                uint pset.size 0

        pset app_pset
                int pset.sys_id 4
                string pset.units population
                boolean pset.default false
                uint pset.max 2
                uint pset.min 1
                string pset.comment
                boolean pset.escapable false
                uint pset.load 0
                uint pset.size 0

A graphic representation of the framework follows.

Figure 11–1 Server Consolidation Configuration

Illustration shows the hypothetical server configuration.


Note –

In the db_pool, the standalone database instance (the geo_db project, which holds 3 of the pool's 4 CPU shares) is guaranteed 75 percent of the CPU resource.
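The 75 percent figure follows from the share arithmetic: FSS allocates CPU within a pool in proportion to project.cpu-shares, and the /etc/project entries above give geo_db 3 shares and app_db 1 share in db_pool. A quick check:

```shell
# FSS entitlement = project shares / total shares in the pool.
# Share values come from the project.cpu-shares entries in /etc/project.
geo_db_shares=3
app_db_shares=1
total=$((geo_db_shares + app_db_shares))
geo_db_pct=$((geo_db_shares * 100 / total))
app_db_pct=$((app_db_shares * 100 / total))
echo "geo_db ${geo_db_pct}% app_db ${app_db_pct}%"   # prints: geo_db 75% app_db 25%
```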


Chapter 12 Resource Control Functionality in the Solaris Management Console

This chapter describes the resource control and performance monitoring features in the Solaris Management Console.

You can use the console to monitor system performance and to enter resource control values for projects, tasks, and processes. The console provides a convenient, secure alternative to the command-line interface (CLI) for managing hundreds of configuration parameters that are spread across many systems. Each system is managed individually. The console's graphical interface can be used by administrators of all experience levels.

Using the Console (Task Map)

Task 

Description 

For Instructions 

Use the console 

Start the Solaris Management Console in a local environment or in a name service or directory service environment. Note that the performance tool is not available in a name service environment. 

“Starting the Solaris Management Console” in System Administration Guide: Basic Administration and “Using the Solaris Management Tools in a Name Service Environment (Task Map)” in System Administration Guide: Basic Administration

Monitor system performance 

Access the Performance tool under System Status. 

How to Access the Performance Tool

Add resource controls to projects 

Access the Resource Controls tab under System Configuration. 

How to Access the Resource Controls Tab

Overview

Resource management functionality is a component of the Solaris Management Console. The console is a container for GUI-based administrative tools that are stored in collections called toolboxes. For information on the console and how to use it, see “Working With the Management Console (Tasks)” in System Administration Guide: Basic Administration.

When you use the console and its tools, the main source of documentation is the online help system in the console itself. For a description of the documentation available in the online help, see “Solaris Management Console (Overview)” in System Administration Guide: Basic Administration.

Management Scope

The term management scope refers to the name service environment that you choose to use with the selected management tool. The management scope choices for the resource control and performance tools are the /etc/project local file or NIS.

The management scope that you select during a console session should correspond to the primary name service that is identified in the /etc/nsswitch.conf file.

Performance Tool

The Performance tool is used to monitor resource utilization. Resource utilization can be summarized for the system, viewed by project, or viewed for an individual user.

Figure 12–1 Performance Tool in the Solaris Management Console

Screen capture shows Performance under Management Tools in Navigation pane and summary of system performance Attribute and Value pane.

How to Access the Performance Tool

The Performance tool is located under System Status in the Navigation pane. To access the Performance tool, do the following:

  1. Click the System Status control entity in the Navigation pane.

    The control entity is used to expand menu items in the Navigation pane.

  2. Click the Performance control entity.

  3. Click the System control entity.

  4. Double-click Summary, Projects, or Users.

    Your choice depends on the usage you want to monitor.

Monitoring by System

Values are shown for the following attributes.

Attribute               Description

Active Processes        Number of processes that are active on the system
Physical Memory Used    Amount of system memory that is in use
Physical Memory Free    Amount of system memory that is available
Swap Used               Amount of system swap space that is in use
Swap Free               Amount of free system swap space
Page Rate               Rate of system paging activity
System Calls            Number of system calls per second
Network Packets         Number of network packets transmitted per second
CPU Usage               Percentage of the CPU that is currently in use
Load Average            Number of processes in the system run queue, averaged over the last 1, 5, and 15 minutes

Monitoring by Project or User Name

Values are shown for the following attributes.

Attribute                      Short Name   Description

Input Blocks                   inblk        Number of blocks read
Blocks Written                 oublk        Number of blocks written
Chars Read/Written             ioch         Number of characters read and written
Data Page Fault Sleep Time     dftime       Amount of time spent processing data page faults
Involuntary Context Switches   ictx         Number of involuntary context switches
System Mode Time               stime        Amount of time spent in kernel mode
Major Page Faults              majfl        Number of major page faults
Messages Received              mrcv         Number of messages received
Messages Sent                  msend        Number of messages sent
Minor Page Faults              minf         Number of minor page faults
Num Processes                  nprocs       Number of processes owned by the user or the project
Num LWPs                       count        Number of lightweight processes
Other Sleep Time               slptime      Sleep time other than tftime, dftime, kftime, and ltime
CPU Time                       pctcpu       Percentage of recent CPU time used by the process, the user, or the project
Memory Used                    pctmem       Percentage of system memory used by the process, the user, or the project
Heap Size                      brksize      Amount of memory allocated for the process data segment
Resident Set Size              rsssize      Current amount of memory claimed by the process
Process Image Size             size         Size of the process image in Kbytes
Signals Received               sigs         Number of signals received
Stopped Time                   stoptime     Amount of time spent in the stopped state
Swap Operations                swaps        Number of swap operations in progress
System Calls Made              sysc         Number of system calls made over the last time interval
System Page Fault Sleep Time   kftime       Amount of time spent processing system page faults
System Trap Time               ttime        Amount of time spent processing system traps
Text Page Fault Sleep Time     tftime       Amount of time spent processing text page faults
User Lock Wait Sleep Time      ltime        Amount of time spent waiting for user locks
User Mode Time                 utime        Amount of time spent in user mode
User and System Mode Time      time         Cumulative CPU execution time
Voluntary Context Switches     vctx         Number of voluntary context switches
Wait CPU Time                  wtime        Amount of time spent waiting for CPU (latency)

Resource Controls Tab

Resource controls allow you to associate a project with a set of resource constraints. These constraints determine the allowable resource usage of tasks and processes that run in the context of the project.

Figure 12–2 Resource Controls Tab in the Solaris Management Console

Screen capture shows the Resource Controls tab. Resource controls and their values appear on the tab.

How to Access the Resource Controls Tab

The Resource Controls tab is located under System Configuration in the Navigation pane. To access Resource Controls, do the following:

  1. Click the System Configuration control entity in the Navigation pane.

  2. Double-click Projects.

  3. Click on a project in the console main window to select it.

  4. Select Properties from the Action menu.

  5. Click the Resource Controls tab.

    View, add, edit, or delete resource control values for processes, projects, and tasks.

Resource Controls You Can Set

To view the list of available resource controls, see About Resource Controls in the console or Available Resource Controls.

Setting Values

You can view, add, edit, or delete resource control values for processes, projects, and tasks. These operations are performed through dialog boxes in the console.

Resource controls and values are viewed in tables in the console. The Resource Control column lists the resource controls that can be set. The Value column displays the properties that are associated with each resource control. In the table, these values are enclosed in parentheses, and they appear as plain text separated by commas. The values in parentheses comprise an “action clause.” Each action clause is composed of a threshold, a privilege level, one signal, and one local action that is associated with the particular threshold. Each resource control can have multiple action clauses, which are also separated by commas.
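For example, the development project entry in Chapter 11 carries the action clause (privileged,536870912,deny); the signal component is omitted in that clause. A throwaway parser (POSIX shell; the helper name is hypothetical) shows how such a clause decomposes into privilege level, threshold, and action:

```shell
# Split a resource-control action clause of the form
# (privilege,threshold,action) into labeled parts.
# Usage: parse_clause "(privileged,536870912,deny)"
parse_clause() {
    # Strip the surrounding parentheses, turn commas into spaces,
    # and let word splitting assign the three positional parameters.
    set -- $(printf '%s' "$1" | tr -d '()' | tr ',' ' ')
    echo "privilege=$1 threshold=$2 action=$3"
}
```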


Note –

On a running system, values that are altered in the project database through the console take effect only for new tasks that are started in a project.


References

For information on projects and tasks, see Chapter 5, Projects and Tasks. For information on resource controls, see Chapter 7, Resource Controls. For information on the fair share scheduler (FSS), see Chapter 8, Fair Share Scheduler.