5.2 About the /proc Virtual File System

5.2.1 Virtual Files and Directories Under /proc
5.2.2 Changing Kernel Parameters
5.2.3 Parameters that Control System Performance
5.2.4 Parameters that Control Kernel Panics

The files in the /proc directory hierarchy contain information about your system hardware and the processes that are running on the system. You can change the configuration of the kernel by writing to certain files that have write permission.

The name of the proc file system stems from its original purpose on the Oracle Solaris operating system, which was to allow access by debugging tools to the data structures inside running processes. Linux added this interface and extended it to allow access to data structures in the kernel. Over time, /proc became quite disordered and the sysfs file system was created in an attempt to tidy it up. For more information, see Section 5.3, “About the /sys Virtual File System”.

Files under the /proc directory are virtual files that the kernel creates on demand to present a browsable view of the underlying data structures and system information. As such, /proc is an example of a virtual file system. Most virtual files are listed as zero bytes in size, but they contain a large amount of information when viewed.
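
You can confirm the zero-size behavior from a shell. The following is a minimal sketch, assuming a Linux system whose stat command accepts the -c option (as GNU coreutils does); the helper name apparent_vs_actual is hypothetical and used only for illustration:

```shell
#!/bin/sh
# Compare the size that the file system reports for a file with the
# number of bytes that are actually returned when the file is read.
# apparent_vs_actual is a hypothetical helper name.
apparent_vs_actual() {
    file=$1
    apparent=$(stat -c %s "$file")              # size recorded for the file
    # Piping through cat forces a real read instead of a size lookup.
    actual=$(cat "$file" | wc -c | tr -d ' ')
    printf '%s: reported=%s read=%s\n' "$file" "$apparent" "$actual"
}

# /proc/meminfo is typically reported as 0 bytes but yields content.
if [ -r /proc/meminfo ]; then
    apparent_vs_actual /proc/meminfo
fi
```

For an ordinary file the two numbers match. For most files under /proc the reported size is 0 while the number of bytes read is not.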

Virtual files such as /proc/interrupts, /proc/meminfo, /proc/mounts, and /proc/partitions provide a view of the system's hardware. Others, such as /proc/filesystems and the files under /proc/sys, provide information about the system's configuration and allow this configuration to be modified.

Files that contain information about related topics are grouped into virtual directories. For example, a separate directory exists in /proc for each process that is currently running on the system, and the directory's name corresponds to the numeric process ID. /proc/1 corresponds to the init process, which has a PID of 1.
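
As an illustration, a shell can inspect its own entry through its PID. The following is a minimal sketch, assuming a Linux system; the function name proc_summary is hypothetical, and $$ expands to the PID of the current shell:

```shell
#!/bin/sh
# Print a few facts about a process from its /proc/PID directory.
# proc_summary is a hypothetical helper name.
proc_summary() {
    pid=$1
    dir=/proc/$pid
    if [ ! -d "$dir" ]; then
        echo "no such process: $pid" >&2
        return 1
    fi
    # The Name: line of the status file holds the command name.
    name=$(sed -n 's/^Name:[[:space:]]*//p' "$dir/status")
    # cwd is a symbolic link to the current working directory.
    cwd=$(readlink "$dir/cwd")
    printf 'pid=%s name=%s cwd=%s\n' "$pid" "$name" "$cwd"
}

# Describe the shell that is running this script.
if [ -d "/proc/$$" ]; then
    proc_summary "$$"
fi
```

Reading the entries of a process that belongs to another user usually requires root privileges.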

You can use commands such as cat, less, and view to examine virtual files within /proc. For example, /proc/cpuinfo contains information about the system's CPUs:

# cat /proc/cpuinfo
processor         : 0
vendor_id         : GenuineIntel
cpu family        : 6
model             : 42
model name        : Intel(R) Core(TM) i5-2520M CPU @ 2.50GHz
stepping          : 7
cpu MHz           : 2393.714
cache size        : 6144 KB
physical id       : 0
siblings          : 2
core id           : 0
cpu cores         : 2
apicid            : 0
initial apicid    : 0
fpu               : yes
fpu_exception     : yes
cpuid level       : 5
wp                : yes
...
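
Because the output is plain text in key : value form, standard text tools can summarize it. The following is a minimal sketch that assumes the x86 field names shown above (other architectures report different fields); the function names count_processors and model_names are hypothetical, and both read from standard input so that you can try them on saved output as well as on the live file:

```shell
#!/bin/sh
# Count logical processors by counting the "processor : N" lines.
count_processors() {
    grep -c '^processor[[:space:]]*:'
}

# Print the distinct CPU model names, one per line.
model_names() {
    sed -n 's/^model name[[:space:]]*:[[:space:]]*//p' | sort -u
}

if [ -r /proc/cpuinfo ]; then
    count_processors < /proc/cpuinfo
    model_names < /proc/cpuinfo
fi
```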

Certain files under /proc require root privileges for access or contain information that is not human-readable. You can use utilities such as lspci, free, and top to access the information in these files. For example, lspci lists all PCI devices on a system:

# lspci
00:00.0 Host bridge: Intel Corporation 440FX - 82441FX PMC [Natoma] (rev 02)
00:01.0 ISA bridge: Intel Corporation 82371SB PIIX3 ISA [Natoma/Triton II]
00:01.1 IDE interface: Intel Corporation 82371AB/EB/MB PIIX4 IDE (rev 01)
00:02.0 VGA compatible controller: InnoTek Systemberatung GmbH VirtualBox Graphics Adapter
00:03.0 Ethernet controller: Intel Corporation 82540EM Gigabit Ethernet Controller (rev 02)
00:04.0 System peripheral: InnoTek Systemberatung GmbH VirtualBox Guest Service
00:05.0 Multimedia audio controller: Intel Corporation 82801AA AC'97 Audio Controller (rev 01)
00:06.0 USB controller: Apple Inc. KeyLargo/Intrepid USB
00:07.0 Bridge: Intel Corporation 82371AB/EB/MB PIIX4 ACPI (rev 08)
00:0b.0 USB controller: Intel Corporation 82801FB/FBM/FR/FW/FRW (ICH6 Family) USB2 EHCI Controller
00:0d.0 SATA controller: Intel Corporation 82801HM/HEM (ICH8M/ICH8M-E) SATA Controller [AHCI mode] (rev 02)
...

5.2.1 Virtual Files and Directories Under /proc

The following table lists the most useful virtual files and directories under the /proc directory hierarchy.

Table 5.1 Useful Virtual Files and Directories Under /proc

Virtual File or Directory

Description

PID (directory)

Provides information about the process with the process ID (PID). The directory's owner and group are the same as those of the process. Useful files under the directory include:

cmdline

Command path and arguments.

cwd

Symbolic link to the process's current working directory.

environ

Environment variables.

exe

Symbolic link to the command executable.

fd/N

File descriptors.

maps

Memory maps to executable and library files.

root

Symbolic link to the effective root directory for the process.

status

Run state and memory usage.

buddyinfo

Provides information for diagnosing memory fragmentation.

bus (directory)

Contains information about the various buses (such as pci and usb) that are available on the system. You can use commands such as lspci, lspcmcia, and lsusb to display information for such devices.

cmdline

Lists parameters passed to the kernel at boot time.

cpuinfo

Provides information about the system's CPUs.

crypto

Provides information about all installed cryptographic ciphers.

devices

Lists the names and major device numbers of all currently configured character and block devices.

dma

Lists the direct memory access (DMA) channels that are currently in use.

driver (directory)

Contains information about drivers used by the kernel, such as those for non-volatile RAM (nvram), the real-time clock (rtc), and memory allocation for sound (snd-page-alloc).

execdomains

Lists the execution domains for binaries that the Oracle Linux kernel supports.

filesystems

Lists the file system types that the kernel supports. Entries marked with nodev do not require a block device, as is the case for virtual and network file systems.

fs (directory)

Contains information about the file systems that are mounted, organized by file system type.

interrupts

Records the number of interrupts per interrupt request line (IRQ) for each CPU since system startup.

iomem

Lists the system memory map for each physical device.

ioports

Lists the range of I/O port addresses that the kernel uses with devices.

irq (directory)

Contains information about each IRQ. You can configure the affinity between each IRQ and the system CPUs.

kcore

Presents the system's physical memory in core file format that you can examine using a debugger such as crash or gdb. This file is not human-readable.

kmsg

Records kernel-generated messages, which are picked up by programs such as dmesg.

loadavg

Displays the system load averages (number of queued processes) for the past 1, 5, and 15 minutes, the number of running processes, the total number of processes, and the PID of the most recently created process.

locks

Displays information about the file locks that the kernel is currently holding on behalf of processes. The information provided includes:

  • lock class (FLOCK or POSIX)

  • lock type (ADVISORY or MANDATORY)

  • access type (READ or WRITE)

  • process ID

  • major device, minor device, and inode numbers

  • bounds of the locked region

mdstat

Lists information about multiple-disk RAID devices.

meminfo

Reports the system's usage of memory in more detail than is available using the free or top commands.

modules

Displays information about the modules that are currently loaded into the kernel. The lsmod command formats and displays the same information, excluding the kernel memory offset of a module.

mounts

Lists information about all mounted file systems.

net (directory)

Provides information about networking protocols, parameters, and statistics. Each directory and virtual file describes aspects of the configuration of the system's network.

partitions

Lists the major and minor device numbers, number of blocks, and name of each partition that the kernel has detected.

scsi/device_info

Provides information about supported SCSI devices.

scsi/scsi and scsi/sg/*

Provide information about configured SCSI devices, including vendor, model, channel, ID, and LUN data.

self

Symbolic link to the process that is examining /proc.

slabinfo

Provides detailed information about slab memory usage.

softirqs

Displays information about software interrupts (softirqs). A softirq is similar to a hardware interrupt (hardirq) and allows the kernel to perform asynchronous processing that would take too long during a hardware interrupt.

stat

Records information about the system since it was started, including:

cpu

Total CPU time (measured in jiffies) spent in user mode, low-priority user mode, system mode, idle, waiting for I/O, handling hardirq events, and handling softirq events.

cpuN

Times for CPU N.

swaps

Provides information on swap devices. The units of size and usage are kilobytes.

sys (directory)

Provides information about the system and also allows you to enable, disable, or modify kernel features. You can write new settings to any file that has write permission. See Section 5.2.2, “Changing Kernel Parameters”.

The following subdirectory hierarchies of /proc/sys contain virtual files, some of whose values you can usefully alter:

dev

Device parameters.

fs

File system parameters.

kernel

Kernel configuration parameters.

net

Networking parameters.

sysvipc (directory)

Provides information about the usage of System V Interprocess Communication (IPC) resources for messages (msg), semaphores (sem), and shared memory (shm).

tty (directory)

Provides information about the available and currently used terminal devices on the system. The drivers virtual file lists the terminal drivers that are currently configured.

vmstat

Provides information about virtual memory usage.


For more information, see the proc(5) manual page.
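
Many of the files in the table hold a single line of space-separated fields that a shell can split directly. The following is a minimal sketch for /proc/loadavg; the function name parse_loadavg and the output labels are hypothetical. As described in proc(5), the fourth field has the form running/total and the fifth field is the PID of the most recently created process:

```shell
#!/bin/sh
# Split a /proc/loadavg line into labeled fields.
# Expected input format: "0.20 0.18 0.12 1/80 11206"
# parse_loadavg is a hypothetical helper name.
parse_loadavg() {
    read -r one five fifteen runq lastpid
    running=${runq%/*}   # text before the slash
    total=${runq#*/}     # text after the slash
    printf 'load1=%s load5=%s load15=%s running=%s total=%s last_pid=%s\n' \
        "$one" "$five" "$fifteen" "$running" "$total" "$lastpid"
}

if [ -r /proc/loadavg ]; then
    parse_loadavg < /proc/loadavg
fi
```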

5.2.2 Changing Kernel Parameters

Some virtual files under /proc, and under /proc/sys in particular, are writable and you can use them to adjust settings in the kernel. For example, to change the host name, you can write a new value to /proc/sys/kernel/hostname:

# echo www.mydomain.com > /proc/sys/kernel/hostname

Other files take binary or Boolean values. For example, the value of /proc/sys/net/ipv4/ip_forward determines whether the kernel forwards IPv4 network packets.

# cat /proc/sys/net/ipv4/ip_forward
0
# echo 1 > /proc/sys/net/ipv4/ip_forward
# cat /proc/sys/net/ipv4/ip_forward
1

You can use the sysctl command to view or modify values under the /proc/sys directory.

To display all of the current kernel settings:

# sysctl -a
kernel.sched_child_runs_first = 0
kernel.sched_min_granularity_ns = 2000000
kernel.sched_latency_ns = 10000000
kernel.sched_wakeup_granularity_ns = 2000000
kernel.sched_shares_ratelimit = 500000
...

Note

The delimiter character in the name of a setting is a period (.) rather than a slash (/) in a path relative to /proc/sys. For example, net.ipv4.ip_forward represents net/ipv4/ip_forward and kernel.msgmax represents kernel/msgmax.
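
The mapping between the two forms is a simple character substitution. The following is a minimal sketch; the function names sysctl_to_path and path_to_sysctl are hypothetical. It assumes that no path component itself contains a period, which does not hold for some network interface names, so treat it as an illustration rather than a general converter:

```shell
#!/bin/sh
# Convert a sysctl setting name to its path under /proc/sys.
sysctl_to_path() {
    printf '/proc/sys/%s\n' "$(printf '%s' "$1" | tr . /)"
}

# Convert a path under /proc/sys back to a sysctl setting name.
path_to_sysctl() {
    # Strip the /proc/sys/ prefix, then swap slashes for periods.
    printf '%s\n' "${1#/proc/sys/}" | tr / .
}

sysctl_to_path net.ipv4.ip_forward
path_to_sysctl /proc/sys/kernel/msgmax
```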

To display an individual setting, specify its name as the argument to sysctl:

# sysctl net.ipv4.ip_forward
net.ipv4.ip_forward = 0

To change the value of a setting, use the following form of the command:

# sysctl -w net.ipv4.ip_forward=1
net.ipv4.ip_forward = 1

Changes that you make in this way remain in force only until the system is rebooted. To make configuration changes persist after the system is rebooted, you must add them to the /etc/sysctl.conf file. Any changes that you make to this file take effect when the system reboots or if you run the sysctl -p command, for example:

# sed -i '/net.ipv4.ip_forward/s/= 0/= 1/' /etc/sysctl.conf 
# grep ip_forward /etc/sysctl.conf
net.ipv4.ip_forward = 1
# sysctl net.ipv4.ip_forward
net.ipv4.ip_forward = 0
# sysctl -p
net.ipv4.ip_forward = 1
net.ipv4.conf.default.rp_filter = 1
...
kernel.shmall = 4294967296
# sysctl net.ipv4.ip_forward
net.ipv4.ip_forward = 1

For more information, see the sysctl(8) and sysctl.conf(5) manual pages.
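
The sed command shown above changes a setting only if the file already contains an entry with the value 0. The following is a minimal sketch of a helper that replaces an existing entry or appends a new one; the function name set_conf_value is hypothetical, the value 10 for vm.swappiness is an arbitrary illustration, and the example works on a scratch file rather than on /etc/sysctl.conf:

```shell
#!/bin/sh
# Set KEY = VALUE in a sysctl.conf-style file, replacing an existing
# entry for KEY or appending one if none exists. Comment lines are kept.
# set_conf_value is a hypothetical helper name.
set_conf_value() {
    file=$1 key=$2 value=$3
    tmp=$(mktemp) || return 1
    awk -v k="$key" -v v="$value" '
        {
            line = $0
            sub(/^[ \t]+/, "", line)        # ignore leading blanks
            split(line, parts, "=")
            name = parts[1]
            sub(/[ \t]+$/, "", name)        # ignore blanks before "="
            if (name == k) {
                if (!done) print k " = " v  # keep only one entry
                done = 1
                next
            }
            print
        }
        END { if (!done) print k " = " v }
    ' "$file" > "$tmp" && cat "$tmp" > "$file"
    status=$?
    rm -f "$tmp"
    return "$status"
}

# Try the helper on a scratch copy.
conf=$(mktemp)
printf '%s\n' '# example settings' 'net.ipv4.ip_forward = 0' > "$conf"
set_conf_value "$conf" net.ipv4.ip_forward 1    # replaces the entry
set_conf_value "$conf" vm.swappiness 10         # appends a new entry
cat "$conf"
rm -f "$conf"
```

After you edit the real file, run sysctl -p so that the new values take effect without a reboot.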

5.2.3 Parameters that Control System Performance

The following parameters control aspects of system performance:

fs.file-max

Specifies the maximum number of open files for all processes. Increase the value of this parameter if you see messages about running out of file handles.

net.core.netdev_max_backlog

Specifies the size of the receiver backlog queue, which is used if an interface receives packets faster than the kernel can process them. If this queue is too small, packets are lost at the receiver, rather than on the network.

net.core.rmem_max

Specifies the maximum read socket buffer size. To minimize network packet loss, this buffer must be large enough to handle incoming network packets.

net.core.wmem_max

Specifies the maximum write socket buffer size. To minimize network packet loss, this buffer must be large enough to handle outgoing network packets.

net.ipv4.tcp_available_congestion_control

Displays the TCP congestion avoidance algorithms that are available for use. Use the modprobe command if you need to load additional modules such as tcp_htcp to implement the htcp algorithm.

net.ipv4.tcp_congestion_control

Specifies which TCP congestion avoidance algorithm is used.

net.ipv4.tcp_max_syn_backlog

Specifies the number of outstanding SYN requests that are allowed. Increase the value of this parameter if you see synflood warnings in your logs, and investigation shows that they are occurring because the server is overloaded by legitimate connection attempts.

net.ipv4.tcp_rmem

Specifies minimum, default, and maximum receive buffer sizes that are used for a TCP socket. The maximum value cannot be larger than net.core.rmem_max.

net.ipv4.tcp_wmem

Specifies minimum, default, and maximum send buffer sizes that are used for a TCP socket. The maximum value cannot be larger than net.core.wmem_max.

vm.swappiness

Specifies how likely the kernel is to write loaded pages to swap rather than drop pages from the system page cache. When set to 0, swapping only occurs to avoid an out of memory condition. When set to 100, the kernel swaps aggressively. For a desktop system, setting a lower value can improve system responsiveness by decreasing latency. The default value is 60.
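
Before you select a congestion avoidance algorithm, you can check whether it appears in the list of available algorithms. The following is a minimal sketch; the function name congestion_available is hypothetical, and htcp is used only because it is the example named above:

```shell
#!/bin/sh
# Return success if algorithm NAME appears in the space-separated LIST.
# congestion_available is a hypothetical helper name.
congestion_available() {
    list=$1 name=$2
    for algo in $list; do
        [ "$algo" = "$name" ] && return 0
    done
    return 1
}

avail_file=/proc/sys/net/ipv4/tcp_available_congestion_control
if [ -r "$avail_file" ]; then
    avail=$(cat "$avail_file")
    if congestion_available "$avail" htcp; then
        echo "htcp is available"
    else
        echo "htcp is not available; the tcp_htcp module might need to be loaded"
    fi
fi
```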

5.2.4 Parameters that Control Kernel Panics

The following parameters control the circumstances under which a kernel panic can occur:

kernel.hung_task_panic

If set to 1, the kernel panics if any user or kernel thread sleeps in the TASK_UNINTERRUPTIBLE state (D state) for more than kernel.hung_task_timeout_secs seconds. A process remains in D state while waiting for I/O to complete. You cannot kill or interrupt a process in this state.

The default value is 0, which disables the panic.

kernel.hung_task_timeout_secs

Specifies how long a user or kernel thread can remain in D state before a message is generated or the kernel panics (if the value of kernel.hung_task_panic is 1). The default value is 120 seconds.

kernel.panic

Specifies the number of seconds after a panic before a system will automatically reset itself.

If the value is 0, the system hangs, which allows you to collect detailed information about the panic for troubleshooting. This is the default value.

To enable automatic reset, set a non-zero value. If you require a memory image (vmcore), allow enough time for Kdump to create this image. The suggested value is 30 seconds, although large systems will require a longer time.

kernel.panic_on_oops

If set to 0, the system tries to continue operations if the kernel encounters an oops or BUG condition. When set to 1 (default), the system delays a few seconds to give the kernel log daemon, klogd, time to record the oops output before the panic occurs.

In an OCFS2 cluster, set the value to 1 to specify that a system must panic if a kernel oops occurs. If a kernel thread required for cluster operation crashes, the system must reset itself. Otherwise, another node might not be able to tell whether a node is slow to respond or unable to respond, causing cluster operations to hang.

vm.panic_on_oom

If set to 0 (default), the kernel's OOM-killer scans through the entire task list and attempts to kill a memory-hogging process to avoid a panic. When set to 1, the kernel panics but can survive under certain conditions. If a process limits allocations to certain nodes by using memory policies or cpusets, and those nodes reach memory exhaustion status, the OOM-killer can kill one process. No panic occurs in this case because other nodes' memory might be free and the system as a whole might not yet be out of memory. When set to 2, the kernel always panics when an OOM condition occurs. Settings of 1 and 2 are intended for use with clusters, depending on your preferred failover policy.
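
To review these settings together, you can read them directly from /proc/sys. The following is a minimal sketch; the function name show_panic_settings is hypothetical, and its optional argument (the directory to read from) exists mainly so that you can try the function on a copy. Some of the files might be absent, depending on how the kernel was built:

```shell
#!/bin/sh
# Print the current values of the panic-related settings listed above.
# The optional argument is the directory to read; it defaults to /proc/sys.
# show_panic_settings is a hypothetical helper name.
show_panic_settings() {
    root=${1:-/proc/sys}
    for name in kernel.hung_task_panic kernel.hung_task_timeout_secs \
                kernel.panic kernel.panic_on_oops vm.panic_on_oom; do
        # Translate the setting name into a path, for example
        # kernel.panic becomes kernel/panic.
        path=$root/$(printf '%s' "$name" | tr . /)
        if [ -r "$path" ]; then
            printf '%s = %s\n' "$name" "$(cat "$path")"
        else
            printf '%s (not available)\n' "$name"
        fi
    done
}

show_panic_settings
```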