You control the access that cgroups have to system resources by specifying parameters to various kernel modules known as subsystems (or as resource controllers in some cgroups documentation).
The following table lists the subsystems that are provided with
the cgroups package.
| Subsystem | Description |
|---|---|
| blkio | Controls and reports block I/O operations. See Section 8.2.1, “blkio Parameters”. |
| cpu | Controls access to CPU resources. See Section 8.2.2, “cpu Parameters”. |
| cpuacct | Reports usage of CPU resources. See Section 8.2.3, “cpuacct Parameters”. |
| cpuset | Controls access to CPU cores and memory nodes (for systems with NUMA architectures). See Section 8.2.4, “cpuset Parameters”. |
| devices | Controls access to system devices. See Section 8.2.5, “devices Parameters”. |
| freezer | Suspends or resumes cgroup tasks. See Section 8.2.6, “freezer Parameter”. |
| memory | Controls access to memory resources, and reports on memory usage. See Section 8.2.7, “memory Parameters”. |
| net_cls | Tags network packets for use by network traffic control. See Section 8.2.8, “net_cls Parameter”. |
The following sections describe the parameters that you can set for each subsystem.
The following blkio parameters are defined:
blkio.io_merged
Reports the number of block I/O (BIO) requests that have been merged into async, read, sync, or write I/O operations.
blkio.io_queued
Reports the number of requests queued for async, read, sync, or write I/O operations.
blkio.io_service_bytes
Reports the number of bytes transferred by
async, read,
sync, or write I/O
operations to or from the devices specified by their major and
minor numbers as recorded by the completely fair queueing (CFQ)
scheduler, but not updated while it is operating on a request
queue.
blkio.io_serviced
Reports the number of async,
read, sync, or
write I/O operations to or from the devices
specified by their major and minor numbers as recorded by the
CFQ scheduler, but not updated while it is operating on a
request queue.
blkio.io_service_time
Reports the time in nanoseconds taken to complete
async, read,
sync, or write I/O
operations to or from the devices specified by their major and
minor numbers.
blkio.io_wait_time
Reports the total time in nanoseconds that a cgroup spent
waiting for async, read,
sync, or write I/O
operations to complete to or from the devices specified by their
major and minor numbers.
blkio.reset_stats
Resets the statistics for a cgroup if an integer is written to this parameter.
blkio.sectors
Reports the number of disk sectors written to or read from the devices specified by their major and minor numbers.
blkio.throttle.io_service_bytes
Reports the number of bytes transferred by
async, read,
sync, or write I/O
operations to or from the devices specified by their major and
minor numbers even while the CFQ scheduler is operating on a
request queue.
blkio.throttle.io_serviced
Reports the number of async,
read, sync, or
write I/O operations to or from the devices
specified by their major and minor numbers even while the CFQ
scheduler is operating on a request queue.
blkio.throttle.read_bps_device
Specifies the maximum number of bytes per second that a cgroup may read from a device
specified by its major and minor numbers. For example, the setting 8:1
4194304 specifies that a maximum of 4 MB per second may be read from
/dev/sda1.
blkio.throttle.read_iops_device
Specifies the maximum number of read operations per second that
a cgroup may perform on a device specified by its major and
minor numbers. For example, the setting 8:1
100 specifies that a maximum of 100 read operations
per second may be performed on /dev/sda1.
blkio.throttle.write_bps_device
Specifies the maximum number of bytes per second that a cgroup may write to a device
specified by its major and minor numbers. For example, the setting 8:2
2097152 specifies that a maximum of 2 MB per second may be written to
/dev/sda2.
blkio.throttle.write_iops_device
Specifies the maximum number of write operations per second that
a cgroup may perform on a device specified by its major and
minor numbers. For example, the setting 8:2
50 specifies that a maximum of 50 write operations per
second may be performed on /dev/sda2.
blkio.time
Reports the time in milliseconds that I/O access was available to a device specified by its major and minor numbers.
blkio.weight
Specifies a bias value from 100 to 1000 that determines a
cgroup's share of access to block I/O. The default value is
1000. The value is overridden by the setting for an individual
device (see blkio.weight_device).
blkio.weight_device
Specifies a bias value from 100 to 1000 that determines a
cgroup's share of access to block I/O on a device specified by
its major and minor numbers. For example, the setting
8:17 100 specifies a bias value of 100 for
/dev/sdb1.
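The throttle and weight_device parameters above all take a value of the form `major:minor value`. The following Python sketch shows one way to build such strings. The function names `device_setting` and `weight_setting` are hypothetical helpers introduced for illustration and are not part of any cgroups tool. The sketch only formats the strings and does not write them to a parameter file.

```python
MIB = 1024 * 1024


def device_setting(major: int, minor: int, value: int) -> str:
    """Format a per-device setting as 'major:minor value' (hypothetical helper)."""
    if major < 0 or minor < 0 or value < 0:
        raise ValueError("major, minor, and value must be non-negative")
    return f"{major}:{minor} {value}"


def weight_setting(major: int, minor: int, weight: int) -> str:
    """Format a blkio.weight_device entry, enforcing the documented 100-1000 range."""
    if not 100 <= weight <= 1000:
        raise ValueError("weight must be between 100 and 1000")
    return device_setting(major, minor, weight)


# Limit reads from device 8:1 to 4 MB per second (blkio.throttle.read_bps_device).
read_limit = device_setting(8, 1, 4 * MIB)

# Assign a bias value of 100 to device 8:17 (blkio.weight_device).
sdb1_weight = weight_setting(8, 17, 100)

print(read_limit)
print(sdb1_weight)
```

Validating the weight range before writing the value gives a clearer error than a rejected write to the parameter file.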
The following cpu parameters are defined:
cpu.rt_period_us
Specifies the period, in microseconds, after which a cgroup's access to a CPU is rescheduled. The default value is 1000000 (1 second).
cpu.rt_runtime_us
Specifies how long, in microseconds, a cgroup has access to a CPU between rescheduling operations. The default value is 950000 (0.95 seconds).
cpu.shares
Specifies the bias value that determines a cgroup's share of CPU time. The default value is 1024.
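As a rough illustration of how these values relate, the following Python sketch computes the fraction of each period that the real-time runtime setting represents, and the proportion that each cgroup's cpu.shares value contributes. It assumes that shares are compared only among cgroups that compete for the same CPUs. The function names are hypothetical.

```python
from typing import Dict


def rt_cpu_fraction(rt_runtime_us: int, rt_period_us: int) -> float:
    """Return the fraction of each period during which the cgroup has CPU access."""
    if rt_period_us <= 0:
        raise ValueError("rt_period_us must be positive")
    return rt_runtime_us / rt_period_us


def share_fractions(shares: Dict[str, int]) -> Dict[str, float]:
    """Return each cgroup's cpu.shares value as a fraction of the total.

    Assumption: every cgroup in the dictionary competes for the same CPUs.
    """
    total = sum(shares.values())
    if total <= 0:
        raise ValueError("the total of all shares must be positive")
    return {name: value / total for name, value in shares.items()}


# The default settings give a cgroup 0.95 seconds of CPU access per second.
default_fraction = rt_cpu_fraction(950000, 1000000)

# Hypothetical cgroup names; "batch" has twice the default share value.
fractions = share_fractions({"web": 1024, "db": 1024, "batch": 2048})

print(default_fraction)
print(fractions)
```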
The following cpuacct parameters are defined:
cpuacct.stat
Reports the total CPU time, in USER_HZ units, spent in user mode and in system mode by all tasks in the cgroup.
cpuacct.usage
Reports the total CPU time in nanoseconds for all tasks in the
cgroup. Setting this parameter to 0 resets its value, and also
resets the value of cpuacct.usage_percpu.
cpuacct.usage_percpu
Reports the total CPU time in nanoseconds on each CPU core for all tasks in the cgroup.
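The nanosecond counters are easier to read when converted to seconds. The following Python sketch assumes that cpuacct.usage_percpu contains one whitespace-separated counter for each CPU core. The sample text and the function name are illustrative only.

```python
from typing import List

NANOSECONDS_PER_SECOND = 1e9


def parse_usage_percpu(text: str) -> List[float]:
    """Convert whitespace-separated nanosecond counters to seconds, one entry per core."""
    return [int(field) / NANOSECONDS_PER_SECOND for field in text.split()]


# Illustrative content for a system with four CPU cores.
sample = "2500000000 500000000 0 1000000000\n"

per_core_seconds = parse_usage_percpu(sample)
total_seconds = sum(per_core_seconds)

print(per_core_seconds)
print(total_seconds)
```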
The following cpuset parameters are defined:
cpuset.cpu_exclusive
Specifies whether the CPUs specified by cpuset.cpus are exclusively
allocated to this CPU set and cannot be shared with other CPU sets. The default value of 0
specifies that CPUs are not exclusively allocated. A value of 1 enables exclusive use of the
CPUs by a CPU set.
cpuset.cpus
Specifies a list of CPU cores to which a cgroup has access. For
example, the setting 0,1,5-8 allows access to
cores 0, 1, 5, 6, 7, and 8. The default setting includes all the
available CPU cores.
If you associate the cpuset subsystem with
a cgroup, you must specify a value for the
cpuset.cpus parameter.
cpuset.mem_exclusive
Specifies whether the memory nodes specified by cpuset.mems are
exclusively allocated to this CPU set and cannot be shared with other CPU sets. The default
value of 0 specifies that memory nodes are not exclusively allocated. A value of 1 enables
exclusive use of the memory nodes by a CPU set.
cpuset.mem_hardwall
Specifies whether kernel allocations of pages and buffers are restricted to the memory nodes
specified by cpuset.mems for this CPU set, so that they cannot be shared with
other CPU sets. The default value of 0 specifies that memory nodes are not exclusively
allocated. A value of 1 allows you to separate the memory nodes that are allocated to
different cgroups.
cpuset.memory_migrate
Specifies whether memory pages are allowed to migrate between
memory nodes if the value of cpuset.mems
changes. The default value of 0 specifies that pages are
not allowed to migrate. A value of 1 allows pages to migrate
between memory nodes, maintaining their relative position on the
node list where possible.
cpuset.memory_pressure
If cpuset.memory_pressure_enabled has been
set to 1, reports the memory pressure,
which represents the number of attempts per second by processes
to reclaim in-use memory. The reported value scales the actual
number of attempts up by a factor of 1000.
cpuset.memory_pressure_enabled
Specifies whether the memory pressure statistic should be gathered. The default value of 0 disables the counter. A value of 1 enables the counter.
cpuset.memory_spread_page
Specifies whether file system buffers are distributed between the allocated memory nodes. The default value of 0 results in the buffers being placed on the same memory node as the process that owns them. A value of 1 allows the buffers to be distributed across the memory nodes of the CPU set.
cpuset.memory_spread_slab
Specifies whether I/O slab caches are distributed between the allocated memory nodes. The default value of 0 results in the caches being placed on the same memory node as the process that owns them. A value of 1 allows the caches to be distributed across the memory nodes of the CPU set.
cpuset.mems
Specifies the memory nodes to which a cgroup has access. For
example, the setting 0-2,4 allows access to
memory nodes 0, 1, 2, and 4. The default setting includes all
available memory nodes. The parameter has a value of 0 on
systems that do not have a NUMA architecture.
If you associate the cpuset subsystem with
a cgroup, you must specify a value for the
cpuset.mems parameter.
cpuset.sched_load_balance
Specifies whether the kernel should attempt to balance CPU load by moving processes between the CPU cores allocated to a CPU set. The default value of 1 turns on load balancing. A value of 0 disables load balancing. Disabling load balancing for a cgroup has no effect if load balancing is enabled in the parent cgroup.
cpuset.sched_relax_domain_level
If cpuset.sched_load_balance is set to 1, specifies one of the
following load-balancing schemes.
| Setting | Description |
|---|---|
| -1 | Use the system's default load balancing scheme. This is the default behavior. |
| 0 | Perform periodic load balancing. Higher numeric values enable immediate load balancing. |
| 1 | Perform load balancing for threads running on the same core. |
| 2 | Perform load balancing for cores of the same CPU. |
| 3 | Perform load balancing for all CPU cores on the same system. |
| 4 | Perform load balancing for a subset of CPU cores on a system with a NUMA architecture. |
| 5 | Perform load balancing for all CPU cores on a system with a NUMA architecture. |
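The cpuset.cpus and cpuset.mems parameters described above both accept a comma-separated list of numbers and ranges. The following Python sketch shows how such a list expands into individual core or node numbers. The function name `parse_id_list` is hypothetical, and the sketch handles only the plain `N` and `N-M` forms shown in the examples.

```python
from typing import List


def parse_id_list(spec: str) -> List[int]:
    """Expand a list such as '0,1,5-8' into sorted individual numbers."""
    ids = set()
    for part in spec.strip().split(","):
        if not part:
            continue  # ignore empty fields, for example in an empty string
        if "-" in part:
            first, last = (int(bound) for bound in part.split("-", 1))
            if first > last:
                raise ValueError(f"invalid range: {part}")
            ids.update(range(first, last + 1))
        else:
            ids.add(int(part))
    return sorted(ids)


cores = parse_id_list("0,1,5-8")   # the cpuset.cpus example
nodes = parse_id_list("0-2,4")     # the cpuset.mems example

print(cores)
print(nodes)
```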
The following devices parameters are defined:
devices.allow
Specifies a device that a cgroup is allowed to access by its type (a
for any, b for block, or c for character), its major
and minor numbers, and its access modes (m for create permission,
r for read access, and w for write access).
For example, b 8:17 rw would allow read and
write access to the block device /dev/sdb1.
You can use the wildcard * to represent any
major or minor number. For example, b 8:* rw
would allow read and write access to any
/dev/sd* block device.
Each device that you specify is added to the list of allowed devices.
devices.deny
Specifies a device that a cgroup is not allowed to access.
Removes each device that you specify from the list of allowed devices.
devices.list
Reports those devices for which access control is set. If no devices are controlled,
all devices are reported as being available in all access modes: a *:*
rwm.
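Each entry in these parameters has three fields: the device type, the major and minor numbers, and the access modes. The following Python sketch splits an entry into those fields and rejects values that do not match the description above. The `DeviceRule` type and the `parse_device_rule` function are hypothetical names used for illustration.

```python
from typing import NamedTuple


class DeviceRule(NamedTuple):
    dev_type: str  # 'a' (any), 'b' (block), or 'c' (character)
    major: str     # a number or the wildcard '*'
    minor: str     # a number or the wildcard '*'
    access: str    # any combination of 'r', 'w', and 'm'


def parse_device_rule(entry: str) -> DeviceRule:
    """Split an entry such as 'b 8:17 rw' into its fields and validate them."""
    fields = entry.split()
    if len(fields) != 3:
        raise ValueError("expected: type major:minor access")
    dev_type, numbers, access = fields
    if dev_type not in ("a", "b", "c"):
        raise ValueError(f"unknown device type: {dev_type}")
    major, separator, minor = numbers.partition(":")
    if not separator:
        raise ValueError("major and minor numbers must be separated by ':'")
    for number in (major, minor):
        if number != "*" and not number.isdigit():
            raise ValueError(f"invalid device number: {number!r}")
    if not access or set(access) - set("rwm"):
        raise ValueError(f"invalid access modes: {access!r}")
    return DeviceRule(dev_type, major, minor, access)


sdb1_rule = parse_device_rule("b 8:17 rw")
all_devices = parse_device_rule("a *:* rwm")

print(sdb1_rule)
print(all_devices)
```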
The following freezer parameter is defined:
freezer.state
Specifies one of the following operations.
| Setting | Description |
|---|---|
| FROZEN | Suspends all the tasks in a cgroup. You cannot move a process into a frozen cgroup. |
| THAWED | Resumes all the tasks in a cgroup. |
You cannot set the FREEZING state. If
displayed, this state indicates that the system is currently
suspending the tasks in the cgroup.
The freezer.state parameter is not
available in the root cgroup.
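The following Python sketch shows one way a script might guard writes to freezer.state. It assumes that the settable values are FROZEN and THAWED, and that the parameter is exposed as a file named freezer.state in the cgroup's directory. The function names are hypothetical, and the demonstration uses a temporary directory in place of a real cgroup directory.

```python
import os
import tempfile

# Assumption: only these two states can be written; FREEZING is reported, never set.
SETTABLE_STATES = ("FROZEN", "THAWED")


def set_freezer_state(cgroup_dir: str, state: str) -> None:
    """Write a state to freezer.state after checking that it can be set."""
    if state not in SETTABLE_STATES:
        raise ValueError(f"cannot set state {state!r}")
    with open(os.path.join(cgroup_dir, "freezer.state"), "w") as state_file:
        state_file.write(state)


def read_freezer_state(cgroup_dir: str) -> str:
    """Return the current content of freezer.state without surrounding whitespace."""
    with open(os.path.join(cgroup_dir, "freezer.state")) as state_file:
        return state_file.read().strip()


# A temporary directory stands in for a cgroup directory in this demonstration.
with tempfile.TemporaryDirectory() as demo_dir:
    set_freezer_state(demo_dir, "FROZEN")
    demo_state = read_freezer_state(demo_dir)

print(demo_state)
```

On a real system, a script would typically read the state again after writing it, because the cgroup can report FREEZING while its tasks are still being suspended.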
The following memory parameters are defined:
memory.failcnt
Reports the number of times that the amount of memory used by
a cgroup has risen to memory.limit_in_bytes.
memory.force_empty
If a cgroup has no tasks, setting the value to 0 removes all pages from memory that were used by tasks in the cgroup. Setting the parameter in this way prevents a parent cgroup from being assigned the defunct page caches when you remove its child cgroup.
memory.limit_in_bytes
Specifies the maximum usage permitted for user memory including
the file cache. The default units are bytes, but you can also
specify a k or K,
m or M, and
g or G suffix for
kilobytes, megabytes, and gigabytes respectively. A value of -1
removes the limit.
To avoid an out-of-memory error, set the value of
memory.limit_in_bytes lower than
memory.memsw.limit_in_bytes, and set
memory.memsw.limit_in_bytes lower than the
amount of available swap space.
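The following Python sketch converts a limit value with an optional suffix to bytes, and checks the ordering recommended above. It assumes that the k, m, and g suffixes denote multiples of 1024, 1024², and 1024³ bytes. The function names are hypothetical.

```python
# Assumption: suffixes are binary multiples (1 KB = 1024 bytes).
UNIT_MULTIPLIERS = {"k": 1024, "m": 1024 ** 2, "g": 1024 ** 3}


def parse_memory_limit(value: str) -> int:
    """Convert a value such as '512M' or '4096' to bytes; -1 means no limit."""
    text = value.strip()
    if text == "-1":
        return -1
    suffix = text[-1:].lower()
    if suffix in UNIT_MULTIPLIERS:
        return int(text[:-1]) * UNIT_MULTIPLIERS[suffix]
    return int(text)


def limits_are_ordered(limit: str, memsw_limit: str) -> bool:
    """Check that the memory limit is lower than the memory-plus-swap limit."""
    memory_bytes = parse_memory_limit(limit)
    memsw_bytes = parse_memory_limit(memsw_limit)
    if memsw_bytes == -1:
        return True   # no memory-plus-swap limit is set
    if memory_bytes == -1:
        return False  # unlimited memory cannot be lower than a finite limit
    return memory_bytes < memsw_bytes


print(parse_memory_limit("512M"))
print(limits_are_ordered("512M", "1G"))
```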
memory.max_usage_in_bytes
Reports the maximum amount of user memory in bytes used by tasks in the cgroup.
memory.memsw.failcnt
Reports the number of times that the amount of memory and swap
space used by a cgroup has risen to
memory.memsw.limit_in_bytes.
memory.memsw.limit_in_bytes
Specifies the maximum usage permitted for user memory plus swap
space. The default units are bytes, but you can also specify a
k or K,
m or M, and
g or G suffix for
kilobytes, megabytes, and gigabytes respectively. A value of -1
removes the limit.
memory.memsw.max_usage_in_bytes
Reports the maximum amount of user memory and swap space in bytes used by tasks in the cgroup.
memory.memsw.usage_in_bytes
Reports the total size in bytes of the memory and swap space used by tasks in the cgroup.
memory.move_charge_at_immigrate
Specifies whether a task's charges are moved when you migrate the task between cgroups. You can specify the following values.
| Setting | Description |
|---|---|
| 0 | Disable moving task charges. |
| 1 | Moves charges for an in-use or swapped-out anonymous page exclusively owned by the task. |
| 2 | Moves charges for file pages that are memory mapped by the task. |
| 3 | Equivalent to specifying both 1 and 2. |
memory.numa_stat
Reports the NUMA memory usage in bytes for each memory node (N0, N1,...) together with the following statistics.
| Statistic | Description |
|---|---|
| anon | The size in bytes of anonymous and swap cache. |
| file | The size in bytes of file-backed memory. |
| total | The sum of the anon, file, and unevictable values. |
| unevictable | The size in bytes of unreclaimable memory. |
memory.oom_control
Displays the values of the out-of-memory (OOM) notification control feature.
| Setting | Description |
|---|---|
| oom_kill_disable | Whether the OOM killer is enabled (0) or disabled (1). |
| under_oom | Whether the cgroup is under OOM control (1) allowing tasks to be stopped, or not under OOM control (0). |
memory.soft_limit_in_bytes
Specifies a soft, upper limit for user memory including the file
cache. The default units are bytes, but you can also specify a
k or K,
m or M, and
g or G suffix for
kilobytes, megabytes, and gigabytes respectively. A value of -1
removes the limit.
The soft limit should be lower than the hard-limit value of
memory.limit_in_bytes as the hard limit
always takes precedence.
memory.stat
Reports the following memory statistics.
| Statistic | Description |
|---|---|
| active_anon | The size in bytes of anonymous and swap cache on the active least-recently-used (LRU) list (includes tmpfs). |
| active_file | The size in bytes of file-backed memory on the active LRU list. |
| cache | The size in bytes of page cache (includes tmpfs). |
| hierarchical_memory_limit | The size in bytes of the limit of memory for the cgroup hierarchy. |
| hierarchical_memsw_limit | The size in bytes of the limit of memory plus swap for the cgroup hierarchy. |
| inactive_anon | The size in bytes of anonymous and swap cache on the inactive LRU list (includes tmpfs). |
| inactive_file | The size in bytes of file-backed memory on the inactive LRU list. |
| mapped_file | The size in bytes of memory-mapped files (includes tmpfs). |
| pgfault | The number of page faults, where the kernel has to allocate and initialize physical memory for use in the virtual address space of a process. |
| pgmajfault | The number of major page faults, where the kernel has to actively free physical memory before allocation and initialization. |
| pgpgin | The number of paged-in pages of memory. |
| pgpgout | The number of paged-out pages of memory. |
| rss | The size in bytes of anonymous and swap cache (does not include tmpfs). |
| swap | The size in bytes of used swap space. |
| total_* | The value of the appended statistic for the cgroup and all of its children. |
| unevictable | The size in bytes of memory that is not reclaimable. |
memory.swappiness
Specifies a bias value for the kernel to swap out memory pages used by processes in the cgroup rather than reclaim pages from the page cache. A value smaller than the default value of 60 reduces the kernel's preference for swapping out. A value greater than 60 increases the preference for swapping out. A value greater than 100 allows the system to swap out pages that fall within the address space of the cgroup's tasks.
memory.usage_in_bytes
Reports the total size in bytes of the memory used by all the tasks in the cgroup.
memory.use_hierarchy
Specifies whether the kernel should attempt to reclaim memory from a cgroup's hierarchy. The default value of 0 prevents memory from being reclaimed from other tasks in the hierarchy. A value of 1 allows memory to be reclaimed from other tasks in the hierarchy.
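The statistics that memory.stat reports can be processed by a script. The following Python sketch assumes that each line of memory.stat contains a statistic name followed by a numeric value, and that hierarchy-wide statistics carry a `total_` prefix. The sample values and the function names are illustrative only.

```python
from typing import Dict


def parse_memory_stat(text: str) -> Dict[str, int]:
    """Parse 'name value' lines into a dictionary, skipping malformed lines."""
    stats = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 2:
            continue
        name, value = fields
        stats[name] = int(value)
    return stats


def hierarchy_totals(stats: Dict[str, int]) -> Dict[str, int]:
    """Return the total_* statistics, keyed by the name that follows the prefix."""
    prefix = "total_"
    return {
        name[len(prefix):]: value
        for name, value in stats.items()
        if name.startswith(prefix)
    }


# Illustrative values only; a real cgroup reports many more statistics.
sample = """\
cache 4096
rss 8192
swap 0
total_cache 12288
total_rss 16384
"""

stats = parse_memory_stat(sample)
totals = hierarchy_totals(stats)

print(stats["rss"])
print(totals)
```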