8.2 Subsystems

8.2.1 blkio Parameters
8.2.2 cpu Parameters
8.2.3 cpuacct Parameters
8.2.4 cpuset Parameters
8.2.5 devices Parameters
8.2.6 freezer Parameter
8.2.7 memory Parameters
8.2.8 net_cls Parameter

You control the access that cgroups have to system resources by specifying parameters to various kernel modules known as subsystems (or as resource controllers in some cgroups documentation).

The following table lists the subsystems that are provided with the cgroups package.

Subsystem

Description

blkio

Controls and reports block I/O operations. See Section 8.2.1, “blkio Parameters”.

Note

The blkio subsystem is enabled in the 2.6.39 UEK, but not in the 2.6.32 UEK.

cpu

Controls access to CPU resources. See Section 8.2.2, “cpu Parameters”.

cpuacct

Reports usage of CPU resources. See Section 8.2.3, “cpuacct Parameters”.

cpuset

Controls access to CPU cores and memory nodes (for systems with NUMA architectures). See Section 8.2.4, “cpuset Parameters”.

devices

Controls access to system devices. See Section 8.2.5, “devices Parameters”.

freezer

Suspends or resumes cgroup tasks. See Section 8.2.6, “freezer Parameter”.

memory

Controls access to memory resources, and reports on memory usage. See Section 8.2.7, “memory Parameters”.

net_cls

Tags network packets for use by network traffic control. See Section 8.2.8, “net_cls Parameter”.

The following sections describe the parameters that you can set for each subsystem.

8.2.1 blkio Parameters

The following blkio parameters are defined:

blkio.io_merged

Reports the number of block I/O (BIO) requests that have been merged into async, read, sync, or write I/O operations.

blkio.io_queued

Reports the number of requests for async, read, sync, or write I/O operations.

blkio.io_service_bytes

Reports the number of bytes transferred by async, read, sync, or write I/O operations to or from the devices specified by their major and minor numbers as recorded by the completely fair queueing (CFQ) scheduler, but not updated while it is operating on a request queue.

blkio.io_serviced

Reports the number of async, read, sync, or write I/O operations to or from the devices specified by their major and minor numbers as recorded by the CFQ scheduler, but not updated while it is operating on a request queue.

blkio.io_service_time

Reports the time in nanoseconds taken to complete async, read, sync, or write I/O operations to or from the devices specified by their major and minor numbers.

blkio.io_wait_time

Reports the total time in nanoseconds that a cgroup spent waiting for async, read, sync, or write I/O operations to complete to or from the devices specified by their major and minor numbers.

blkio.reset_stats

Resets the statistics for a cgroup if an integer is written to this parameter.

blkio.sectors

Reports the number of disk sectors written to or read from the devices specified by their major and minor numbers.

blkio.throttle.io_service_bytes

Reports the number of bytes transferred by async, read, sync, or write I/O operations to or from the devices specified by their major and minor numbers even while the CFQ scheduler is operating on a request queue.

blkio.throttle.io_serviced

Reports the number of async, read, sync, or write I/O operations to or from the devices specified by their major and minor numbers even while the CFQ scheduler is operating on a request queue.

blkio.throttle.read_bps_device

Specifies the maximum number of bytes per second that a cgroup may read from a device specified by its major and minor numbers. For example, the setting 8:1 4194304 specifies that a maximum of 4 MB per second may be read from /dev/sda1.

blkio.throttle.read_iops_device

Specifies the maximum number of read operations per second that a cgroup may perform on a device specified by its major and minor numbers. For example, the setting 8:1 100 specifies that a maximum of 100 read operations per second may be performed on /dev/sda1.

blkio.throttle.write_bps_device

Specifies the maximum number of bytes per second that a cgroup may write to a device specified by its major and minor numbers. For example, the setting 8:2 2097152 specifies that a maximum of 2 MB per second may be written to /dev/sda2.

blkio.throttle.write_iops_device

Specifies the maximum number of write operations per second that a cgroup may perform on a device specified by its major and minor numbers. For example, the setting 8:2 50 specifies that a maximum of 50 write operations per second may be performed on /dev/sda2.
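As an illustration of the major:minor value format used by the blkio.throttle.* parameters, the following Python sketch builds a setting string and writes it to a parameter file. The helper names throttle_entry and write_parameter are hypothetical, and the cgroup directory that you pass to write_parameter depends on where the blkio hierarchy is mounted on your system.

```python
import os


def throttle_entry(major: int, minor: int, limit: int) -> str:
    """Build a 'major:minor limit' string for a blkio.throttle.* parameter."""
    if major < 0 or minor < 0 or limit < 0:
        raise ValueError("major, minor, and limit must be non-negative")
    return f"{major}:{minor} {limit}"


def write_parameter(cgroup_dir: str, parameter: str, value: str) -> None:
    """Write a value to a parameter file in a cgroup directory.

    Writing to a real cgroup usually requires root privileges.
    """
    with open(os.path.join(cgroup_dir, parameter), "w") as f:
        f.write(value)


# A read limit of 4 MB per second for device 8:1, as in the example above.
read_limit = throttle_entry(8, 1, 4 * 1024 * 1024)
print(read_limit)  # 8:1 4194304
```

To apply the limit, you would call write_parameter with the directory of the cgroup (a path that is specific to your system), the parameter name blkio.throttle.read_bps_device, and the string returned by throttle_entry.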

blkio.time

Reports the time in milliseconds that I/O access was available to a device specified by its major and minor numbers.

blkio.weight

Specifies a bias value from 100 to 1000 that determines a cgroup's share of access to block I/O. The default value is 1000. The value is overridden by the setting for an individual device (see blkio.weight_device).

blkio.weight_device

Specifies a bias value from 100 to 1000 that determines a cgroup's share of access to block I/O on a device specified by its major and minor numbers. For example, the setting 8:17 100 specifies a bias value of 100 for /dev/sdb1.
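The relationship between blkio.weight and blkio.weight_device can be sketched as a simple lookup: a per-device value, if present, overrides the cgroup-wide value. The function name effective_weight and the dictionary layout are assumptions made for this illustration and are not part of the cgroups interface.

```python
def effective_weight(group_weight: int, device_weights: dict, device: str) -> int:
    """Return the weight that applies to a device for one cgroup.

    group_weight   -- the value of blkio.weight
    device_weights -- mapping of 'major:minor' to blkio.weight_device values
    device         -- the 'major:minor' string to look up
    """
    # A per-device setting overrides the cgroup-wide setting.
    weight = device_weights.get(device, group_weight)
    if not 100 <= weight <= 1000:
        raise ValueError("weight must be in the range 100 to 1000")
    return weight


# /dev/sdb1 (8:17) has its own bias value; other devices use blkio.weight.
per_device = {"8:17": 100}
print(effective_weight(500, per_device, "8:17"))  # 100
print(effective_weight(500, per_device, "8:1"))   # 500
```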

8.2.2 cpu Parameters

The following cpu parameters are defined:

cpu.rt_period_us

Specifies how often, in microseconds, a cgroup's access to a CPU is rescheduled. The default value is 1000000 (1 second).

cpu.rt_runtime_us

Specifies how long, in microseconds, a cgroup has access to a CPU between rescheduling operations. The default value is 950000 (0.95 seconds).

cpu.shares

Specifies the bias value that determines a cgroup's share of CPU time. The default value is 1024.
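The following Python sketch shows the arithmetic behind these parameters. It assumes that, when CPUs are fully contended, sibling cgroups receive CPU time roughly in proportion to their cpu.shares values, and that the ratio of the run time to the period gives the fraction of each period that is available. The function names and the cgroup names web and batch are hypothetical.

```python
from fractions import Fraction


def cpu_share_fractions(shares: dict) -> dict:
    """Return each cgroup's approximate fraction of CPU time under contention."""
    total = sum(shares.values())
    return {name: Fraction(value, total) for name, value in shares.items()}


def rt_fraction(runtime_us: int, period_us: int) -> float:
    """Return the fraction of each period during which CPU access is available."""
    return runtime_us / period_us


# Two sibling cgroups: one with the default 1024 shares, one with 512.
print(cpu_share_fractions({"web": 1024, "batch": 512}))

# The default run time and period give 0.95 of each second.
print(rt_fraction(950000, 1000000))  # 0.95
```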

8.2.3 cpuacct Parameters

The following cpuacct parameters are defined:

cpuacct.stat

Reports the total CPU time spent in user and system mode by all tasks in the cgroup. The values are reported in USER_HZ units (typically hundredths of a second).

cpuacct.usage

Reports the total CPU time in nanoseconds for all tasks in the cgroup. Setting this parameter to 0 resets its value, and also resets the value of cpuacct.usage_percpu.

cpuacct.usage_percpu

Reports the total CPU time in nanoseconds on each CPU core for all tasks in the cgroup.
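As a sketch of how the nanosecond values might be processed, the following Python function converts the contents of cpuacct.usage_percpu to seconds. It assumes that the file contains one whitespace-separated value for each CPU core. The sample string is invented for illustration.

```python
NS_PER_SECOND = 1_000_000_000


def parse_usage_percpu(text: str) -> list:
    """Convert the per-core nanosecond counters to seconds."""
    return [int(value) / NS_PER_SECOND for value in text.split()]


# Invented sample content for a system with two CPU cores.
sample = "1500000000 500000000\n"
per_core = parse_usage_percpu(sample)
print(per_core)       # [1.5, 0.5]
print(sum(per_core))  # 2.0
```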

8.2.4 cpuset Parameters

The following cpuset parameters are defined:

cpuset.cpu_exclusive

Specifies whether the CPUs specified by cpuset.cpus are exclusively allocated to this CPU set and cannot be shared with other CPU sets. The default value of 0 specifies that CPUs are not exclusively allocated. A value of 1 enables exclusive use of the CPUs by a CPU set.

cpuset.cpus

Specifies a list of CPU cores to which a cgroup has access. For example, the setting 0,1,5-8 allows access to cores 0, 1, 5, 6, 7, and 8. The default setting includes all the available CPU cores.

Note

If you associate the cpuset subsystem with a cgroup, you must specify a value for the cpuset.cpus parameter.
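The list format that cpuset.cpus accepts (comma-separated numbers and ranges) can be expanded with a short Python function such as the following sketch. The function name parse_id_list is hypothetical. The same format is used for memory nodes in cpuset.mems.

```python
def parse_id_list(setting: str) -> list:
    """Expand a list such as '0,1,5-8' into a sorted list of integers."""
    ids = set()
    for part in setting.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            # A range such as 5-8 includes both end points.
            start, end = part.split("-", 1)
            low, high = int(start), int(end)
            if low > high:
                raise ValueError(f"invalid range: {part}")
            ids.update(range(low, high + 1))
        else:
            ids.add(int(part))
    return sorted(ids)


print(parse_id_list("0,1,5-8"))  # [0, 1, 5, 6, 7, 8]
```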

cpuset.mem_exclusive

Specifies whether the memory nodes specified by cpuset.mems are exclusively allocated to this CPU set and cannot be shared with other CPU sets. The default value of 0 specifies that memory nodes are not exclusively allocated. A value of 1 enables exclusive use of the memory nodes by a CPU set.

cpuset.mem_hardwall

Specifies whether the pages and buffers that the kernel allocates on the memory nodes specified by cpuset.mems are reserved exclusively for this CPU set and cannot be shared with other CPU sets. The default value of 0 specifies that memory nodes are not exclusively allocated. A value of 1 allows you to separate the memory nodes that are allocated to different cgroups.

cpuset.memory_migrate

Specifies whether memory pages are allowed to migrate between memory nodes if the value of cpuset.mems changes. The default value of 0 specifies that memory pages are not allowed to migrate. A value of 1 allows pages to migrate between memory nodes, maintaining their relative position on the node list where possible.

cpuset.memory_pressure

If cpuset.memory_pressure_enabled has been set to 1, reports the memory pressure, which represents the number of attempts per second by processes to reclaim in-use memory. The reported value scales the actual number of attempts up by a factor of 1000.

cpuset.memory_pressure_enabled

Specifies whether the memory pressure statistic should be gathered. The default value of 0 disables the counter. A value of 1 enables the counter.

cpuset.memory_spread_page

Specifies whether file system buffers are distributed between the allocated memory nodes. The default value of 0 results in the buffers being placed on the same memory node as the process that owns them. A value of 1 allows the buffers to be distributed across the memory nodes of the CPU set.

cpuset.memory_spread_slab

Specifies whether I/O slab caches are distributed between the allocated memory nodes. The default value of 0 results in the caches being placed on the same memory node as the process that owns them. A value of 1 allows the caches to be distributed across the memory nodes of the CPU set.

cpuset.mems

Specifies the memory nodes to which a cgroup has access. For example, the setting 0-2,4 allows access to memory nodes 0, 1, 2, and 4. The default setting includes all available memory nodes. The parameter has a value of 0 on systems that do not have a NUMA architecture.

Note

If you associate the cpuset subsystem with a cgroup, you must specify a value for the cpuset.mems parameter.

cpuset.sched_load_balance

Specifies whether the kernel should attempt to balance CPU load by moving processes between the CPU cores allocated to a CPU set. The default value of 1 turns on load balancing. A value of 0 disables load balancing. Disabling load balancing for a cgroup has no effect if load balancing is enabled in the parent cgroup.

cpuset.sched_relax_domain_level

If cpuset.sched_load_balance is set to 1, specifies one of the following load-balancing schemes.

Setting

Description

-1

Use the system's default load balancing scheme. This is the default behavior.

0

Perform periodic load balancing. Higher numeric values enable immediate load balancing.

1

Perform load balancing for threads running on the same core.

2

Perform load balancing for cores of the same CPU.

3

Perform load balancing for all CPU cores on the same system.

4

Perform load balancing for a subset of CPU cores on a system with a NUMA architecture.

5

Perform load balancing for all CPU cores on a system with a NUMA architecture.

8.2.5 devices Parameters

The following devices parameters are defined:

devices.allow

Specifies a device that a cgroup is allowed to access by its type (a for any, b for block, or c for character), its major and minor numbers, and its access modes (m for create permission, r for read access, and w for write access).

For example, b 8:17 rw would allow read and write access to the block device /dev/sdb1.

You can use the wildcard * to represent any major or minor number. For example, b 8:* rw would allow read and write access to any /dev/sd* block device.

Each device that you specify is added to the list of allowed devices.

devices.deny

Specifies a device that a cgroup is not allowed to access.

Removes each device that you specify from the list of allowed devices.

devices.list

Reports those devices for which access control is set. If no devices are controlled, all devices are reported as being available in all access modes: a *:* rwm.
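The entry format shared by devices.allow, devices.deny, and devices.list (type, major:minor numbers, and access modes) can be checked before you write it. The following Python sketch is one way to do this. The DeviceRule type and the parse_device_rule function are hypothetical names introduced for the example.

```python
from typing import NamedTuple


class DeviceRule(NamedTuple):
    dev_type: str  # a (any), b (block), or c (character)
    major: str     # a number, or * for any major number
    minor: str     # a number, or * for any minor number
    access: str    # any combination of r, w, and m


def parse_device_rule(entry: str) -> DeviceRule:
    """Split and validate an entry such as 'b 8:17 rw'."""
    dev_type, numbers, access = entry.split()
    if dev_type not in ("a", "b", "c"):
        raise ValueError(f"unknown device type: {dev_type}")
    major, minor = numbers.split(":")
    for number in (major, minor):
        if number != "*" and not number.isdigit():
            raise ValueError(f"invalid device number: {number}")
    if not access or set(access) - set("rwm"):
        raise ValueError(f"invalid access modes: {access}")
    return DeviceRule(dev_type, major, minor, access)


print(parse_device_rule("b 8:17 rw"))
print(parse_device_rule("a *:* rwm"))
```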

8.2.6 freezer Parameter

The following freezer parameter is defined:

freezer.state

Specifies one of the following operations.

Setting

Description

FROZEN

Suspends all the tasks in a cgroup. You cannot move a process into a frozen cgroup.

THAWED

Resumes all the tasks in a cgroup.

Note

You cannot set the FREEZING state. If displayed, this state indicates that the system is currently suspending the tasks in the cgroup.

The freezer.state parameter is not available in the root cgroup.
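The following Python sketch shows one way to guard against writing a state that cannot be set. The helper names are hypothetical, and the cgroup directory that you pass to them depends on where the freezer hierarchy is mounted on your system. Writing to a real cgroup usually requires root privileges.

```python
import os

# Only these two states can be written. FREEZING is reported by the
# system while tasks are being suspended and cannot be set.
SETTABLE_STATES = ("FROZEN", "THAWED")


def set_freezer_state(cgroup_dir: str, state: str) -> None:
    """Write FROZEN or THAWED to freezer.state in the given cgroup directory."""
    if state not in SETTABLE_STATES:
        raise ValueError(f"cannot set freezer state: {state}")
    with open(os.path.join(cgroup_dir, "freezer.state"), "w") as f:
        f.write(state)


def get_freezer_state(cgroup_dir: str) -> str:
    """Read the current value of freezer.state."""
    with open(os.path.join(cgroup_dir, "freezer.state")) as f:
        return f.read().strip()
```

After calling set_freezer_state with FROZEN, a caller could poll get_freezer_state until the value is no longer FREEZING before assuming that all tasks are suspended.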

8.2.7 memory Parameters

The following memory parameters are defined:

memory.failcnt

Reports the number of times that the amount of memory used by a cgroup has risen to memory.limit_in_bytes.

memory.force_empty

If a cgroup has no tasks, setting the value to 0 removes all pages from memory that were used by tasks in the cgroup. Setting the parameter in this way prevents a parent cgroup from being assigned the defunct page caches when you remove its child cgroup.

memory.limit_in_bytes

Specifies the maximum usage permitted for user memory including the file cache. The default units are bytes, but you can also specify a k or K, m or M, and g or G suffix for kilobytes, megabytes, and gigabytes respectively. A value of -1 removes the limit.

To avoid an out-of-memory error, set the value of memory.limit_in_bytes lower than memory.memsw.limit_in_bytes, and set memory.memsw.limit_in_bytes lower than the amount of available swap space.
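The suffix notation can be illustrated with a small Python converter. This sketch assumes that the suffixes are interpreted as powers of 1024, and it represents the value -1 (no limit) as None. The function name parse_memory_limit is hypothetical.

```python
# Assumption: k, m, and g are interpreted as powers of 1024.
MULTIPLIERS = {"k": 1024, "m": 1024 ** 2, "g": 1024 ** 3}


def parse_memory_limit(value: str):
    """Convert a setting such as '512M' to bytes, or return None for -1."""
    value = value.strip()
    if not value:
        raise ValueError("empty value")
    if value == "-1":
        return None  # no limit
    suffix = value[-1].lower()
    if suffix in MULTIPLIERS:
        return int(value[:-1]) * MULTIPLIERS[suffix]
    return int(value)


print(parse_memory_limit("512M"))  # 536870912
print(parse_memory_limit("1g"))    # 1073741824
print(parse_memory_limit("-1"))    # None
```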

memory.max_usage_in_bytes

Reports the maximum amount of user memory in bytes used by tasks in the cgroup.

memory.memsw.failcnt

Reports the number of times that the amount of memory and swap space used by a cgroup has risen to memory.memsw.limit_in_bytes.

memory.memsw.limit_in_bytes

Specifies the maximum usage permitted for user memory plus swap space. The default units are bytes, but you can also specify a k or K, m or M, and g or G suffix for kilobytes, megabytes, and gigabytes respectively. A value of -1 removes the limit.

memory.memsw.max_usage_in_bytes

Reports the maximum amount of user memory and swap space in bytes used by tasks in the cgroup.

memory.memsw.usage_in_bytes

Reports the total size in bytes of the memory and swap space used by tasks in the cgroup.

memory.move_charge_at_immigrate

Specifies whether a task's charges are moved when you migrate the task between cgroups. You can specify the following values.

Setting

Description

0

Disables the moving of task charges.

1

Moves charges for an in-use or swapped-out anonymous page exclusively owned by the task.

2

Moves charges for file pages that are memory mapped by the task.

3

Equivalent to specifying both 1 and 2.
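Because the value 3 combines the settings 1 and 2, the values behave like bit flags. The following Python sketch models them with an enumeration. The class and member names are hypothetical.

```python
from enum import IntFlag


class MoveCharge(IntFlag):
    ANONYMOUS = 1  # anonymous pages exclusively owned by the task
    FILE = 2       # file pages that are memory mapped by the task


# Combining both flags gives the value to write to the parameter.
setting = MoveCharge.ANONYMOUS | MoveCharge.FILE
print(int(setting))  # 3
```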

memory.numa_stat

Reports the NUMA memory usage in bytes for each memory node (N0, N1,...) together with the following statistics.

Statistic

Description

anon

The size in bytes of anonymous and swap cache.

file

The size in bytes of file-backed memory.

total

The sum of the anon, file and unevictable values.

unevictable

The size in bytes of unreclaimable memory.

memory.oom_control

Displays the values of the out-of-memory (OOM) notification control feature.

Setting

Description

oom_kill_disable

Whether the OOM killer is enabled (0) or disabled (1).

under_oom

Whether the cgroup is under OOM control (1) allowing tasks to be stopped, or not under OOM control (0).

memory.soft_limit_in_bytes

Specifies a soft, upper limit for user memory including the file cache. The default units are bytes, but you can also specify a k or K, m or M, and g or G suffix for kilobytes, megabytes, and gigabytes respectively. A value of -1 removes the limit.

The soft limit should be lower than the hard-limit value of memory.limit_in_bytes as the hard limit always takes precedence.

memory.stat

Reports the following memory statistics.

Statistic

Description

active_anon

The size in bytes of anonymous and swap cache on the active least-recently-used (LRU) list (includes tmpfs).

active_file

The size in bytes of file-backed memory on the active LRU list.

cache

The size in bytes of page cache (includes tmpfs).

hierarchical_memory_limit

The size in bytes of the limit of memory for the cgroup hierarchy.

hierarchical_memsw_limit

The size in bytes of the limit of memory plus swap for the cgroup hierarchy.

inactive_anon

The size in bytes of anonymous and swap cache on the inactive LRU list (includes tmpfs).

inactive_file

The size in bytes of file-backed memory on the inactive LRU list.

mapped_file

The size in bytes of memory-mapped files (includes tmpfs).

pgfault

The number of page faults, where the kernel has to allocate and initialize physical memory for use in the virtual address space of a process.

pgmajfault

The number of major page faults, where the kernel has to actively free physical memory before allocation and initialization.

pgpgin

The number of paged-in pages of memory.

pgpgout

The number of paged-out pages of memory.

rss

The size in bytes of anonymous and swap cache (does not include tmpfs). The actual resident set size is given by the sum of rss and mapped_file.

swap

The size in bytes of used swap space.

total_*

The value of the appended statistic for the cgroup and all of its children.

unevictable

The size in bytes of memory that is not reclaimable.
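The statistics can be read into a dictionary for further processing. The following Python sketch assumes that memory.stat contains one name and value pair on each line, separated by a space. The sample values are invented for illustration. The resident_set_size function applies the sum of rss and mapped_file that is described in the table.

```python
def parse_memory_stat(text: str) -> dict:
    """Parse 'name value' lines from memory.stat into a dictionary."""
    stats = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        name, value = line.split()
        stats[name] = int(value)
    return stats


def resident_set_size(stats: dict) -> int:
    """Return the resident set size as the sum of rss and mapped_file."""
    return stats["rss"] + stats["mapped_file"]


# Invented sample content with only three of the statistics.
sample = "cache 8192\nrss 4096\nmapped_file 1024\n"
stats = parse_memory_stat(sample)
print(resident_set_size(stats))  # 5120
```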

memory.swappiness

Specifies a bias value for the kernel to swap out memory pages used by processes in the cgroup rather than reclaim pages from the page cache. A value smaller than the default value of 60 reduces the kernel's preference for swapping out. A value greater than 60 increases the preference for swapping out. A value greater than 100 allows the system to swap out pages that fall within the address space of the cgroup's tasks.

memory.usage_in_bytes

Reports the total size in bytes of the memory used by all the tasks in the cgroup.

memory.use_hierarchy

Specifies whether the kernel should attempt to reclaim memory from a cgroup's hierarchy. The default value of 0 prevents memory from being reclaimed from other tasks in the hierarchy. A value of 1 allows memory to be reclaimed from other tasks in the hierarchy.

8.2.8 net_cls Parameter

The following net_cls parameter is defined:

net_cls.classid

Specifies the hexadecimal class identifier that the system uses to tag network packets for use with the Linux traffic controller.
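As an illustration, the following Python sketch builds a class identifier from the major and minor parts of a traffic control handle. It assumes that the identifier has the form 0xAAAABBBB, where AAAA is the major number and BBBB is the minor number. This layout is not stated above, so verify it against the kernel documentation for your release. The function name format_classid is hypothetical.

```python
def format_classid(major: int, minor: int) -> str:
    """Combine a handle's major and minor numbers into a hexadecimal classid.

    Assumption: the major number occupies the upper 16 bits and the
    minor number occupies the lower 16 bits.
    """
    if not (0 <= major <= 0xFFFF and 0 <= minor <= 0xFFFF):
        raise ValueError("major and minor must each fit in 16 bits")
    return hex((major << 16) | minor)


# Handle numbers are hexadecimal, so the handle 10:1 is major 0x10, minor 0x1.
print(format_classid(0x10, 0x1))  # 0x100001
```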