Appendix A Other Changes
The following sections describe other features of Unbreakable Enterprise Kernel Release 3 (UEK R3). The mainline version in which a feature was introduced is noted in parentheses.
A.1 Architecture
vsyscall emulation and vsyscall parameter. (3.1)
INTEL_MID configuration. (3.1)
mrst_pmu driver for Intel Moorestown Power Management Unit. (3.1)
Hardware memory error recovery support for ACPI, APEI, and GHES. (3.1)
printk() support for recoverable errors via NMI for ACPI, APEI, and GHES. (3.1)
A.2 Block Devices
Strict CPU affinity can be enabled by setting the value of /sys/block/blkdev/queue/rq_affinity to 2. Performance on some systems benefits from directing request completions to the strict requester CPU rather than using per-socket steering. (3.1)
CFQ I/O scheduler performance tuning adds a think time check for a group, which makes bandwidth usage more efficient by not leaving queues active when there are no further requests for the group. (3.1)
Flakey target support in the device mapper adds the corrupt_bio_byte parameter to simulate corruption by overwriting a byte at a specified position with a specified value while the device is down. The drop_writes option parameter drops writes silently while the device is down. (3.1)
The device mapper supports MD RAID-1 personality through the dm-raid target. (3.1)
The device mapper supports the ability to parse and use metadata devices with dm-raid. Without the metadata devices, many RAID features would be unavailable. (3.1)
Experimental support for thin provisioning in the device mapper allows the creation of multiple thinly provisioned volumes from a storage pool and recursive snapshots to an arbitrary depth. (3.2)
I/O-less dirty throttling and reduced file-system writeback from page reclamation greatly reduce I/O seeks and CPU contention. (3.2)
The cfq_target_latency parameter under sysfs allows throughput and read latency to be tuned. (3.4)
The device mapper supports adding and removing space at the end of the devices when resizing RAID-10 arrays with near and offset layouts. (3.4)
Thin target in the device mapper supports discards. When non-discard I/O completes and the associated mappings are quiesced, any discards that were deferred (via ds_add_work() in process_discard()) are queued for processing by the worker thread. (3.4)
Thin target in the device mapper provides user-space access to pool metadata. Two new messages can be sent to the thin pool target allowing it to take a snapshot of the metadata. This read-only snapshot can be accessed from user space concurrently with the live target. (3.5)
Thin target in the device mapper uses dedicated slab caches (whose names are prefixed with dm_) rather than relying on kmalloc memory pools backed by generic slab caches. This allows independent accounting of memory usage and any associated memory leakage by thin provisioning. (3.5)
RAID-5 XOR checksumming is optimized by taking advantage of the 256-bit YMM registers introduced by Advanced Vector Extensions (AVX). (3.5)
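RAID-5 parity is the byte-wise XOR of the data blocks in a stripe, so any single lost block can be rebuilt by XOR-ing the parity with the surviving blocks. The following portable C sketch shows only that arithmetic; the AVX-optimized kernel routines perform the same XOR across wider registers, and the function names here are hypothetical rather than kernel interfaces.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Compute RAID-5 style parity: parity[i] = data[0][i] ^ data[1][i] ^ ... */
void raid5_parity(const uint8_t *const *data, size_t ndisks, size_t len, uint8_t *parity)
{
    memset(parity, 0, len);
    for (size_t d = 0; d < ndisks; d++)
        for (size_t i = 0; i < len; i++)
            parity[i] ^= data[d][i];
}

/* Rebuild the block of disk `lost` from the parity and the surviving blocks. */
void raid5_rebuild(const uint8_t *const *data, size_t ndisks, size_t lost,
                   const uint8_t *parity, size_t len, uint8_t *out)
{
    memcpy(out, parity, len);
    for (size_t d = 0; d < ndisks; d++)
        if (d != lost)
            for (size_t i = 0; i < len; i++)
                out[i] ^= data[d][i];
}
```

A vectorized version changes how many bytes each XOR instruction covers, not the result.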
RAID-6 includes Supplemental Streaming SIMD Extensions 3 (SSSE3) optimized recovery functions and a new algorithm for selecting the most appropriate function to use for recovery. (3.5)
MD allows a reshape operation to be reversed by implementing a new reshape_direction attribute that can be set when delta_disks is zero, and which can take one of the values forwards or backwards. (3.5)
A RAID-10 array can be reshaped to a different near or offset layout, a different chunk size, and a different number of devices. The number of copies cannot be changed. (3.5)
An existing partition can be resized, even if currently in use, by using the operation code BLKPG_RESIZE_PARTITION with the BLKPG ioctl(). (3.6)
Add MD support for RAID10 (striped mirrors) and RAID1E (integrated adjacent stripe mirroring). (3.6)
Thin target in the device mapper adds read-only and fail-io modes to thin provisioning. If a transaction commit fails, a pool's metadata device transitions to read-only mode. If a commit fails when the device is in read-only mode, a transition to fail-io mode occurs. In fail-io mode, the pool and all associated thin devices report a status of fail if a commit fails. (3.6)
The persistent data debug space map checker has been removed from the device mapper. The feature consumed a lot of memory and caused other issues when enabled on large pools. (3.6)
RAID-1 in MD now prevents the merging of large requests to enhance the performance of SSD devices that function more efficiently with large request transfers. (3.6)
Add support for the WRITE SAME request, which is implemented on some SCSI devices to allow a block to be efficiently replicated throughout a block range. Only a single logical block needs to be transferred from the host. The storage device writes the same data to all blocks specified by the request. (3.7)
The BLKZEROOUT ioctl() can be used to zero out block ranges via blkdev_issue_zeroout(). (3.7)
Fastmap support provides a method for attaching an unsorted block image (UBI) device in real time. Rather than scanning the entire device, Fastmap locates a checkpoint. (3.7)
MD adds TRIM discard support for linear, RAID-0, RAID-1, RAID-5, and RAID-10. (3.7)
DM adds rebuild capacity and replacement slot validation for RAID-10 arrays. (3.7)
RAID-6 recovery is optimized by taking advantage of the 256-bit YMM registers introduced by Advanced Vector Extensions 2 (AVX2). (3.8)
A.3 Core Kernel Functionality
Add a lock-less NULL-terminated singly linked list. (3.1)
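The idea behind a lock-less, NULL-terminated singly linked list such as the kernel's llist is that producers push nodes with a compare-and-swap on the head pointer, while a consumer detaches the entire list at once with an atomic exchange. The following user-space sketch expresses that idea with C11 atomics; the type and function names are hypothetical and are not the kernel's llist API.

```c
#include <stdatomic.h>
#include <stddef.h>

struct lnode {
    struct lnode *next;
    int value;
};

struct lhead {
    _Atomic(struct lnode *) first; /* NULL-terminated list head */
};

/* Push one node; safe to call concurrently from several producers. */
void lpush(struct lhead *h, struct lnode *n)
{
    struct lnode *old = atomic_load(&h->first);
    do {
        n->next = old; /* link in front of the head we last observed */
    } while (!atomic_compare_exchange_weak(&h->first, &old, n)); /* on failure, old is refreshed */
}

/* Detach and return the whole list (newest node first) in one atomic step. */
struct lnode *lpop_all(struct lhead *h)
{
    return atomic_exchange(&h->first, NULL);
}
```

Because the consumer only ever takes the whole list, no lock is needed between producers and the consumer; items come back in reverse push order.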
Add a library function implementing a crc8 algorithm to support the brcm80211 driver. (3.1)
Make the gen_pool memory allocator lockless. This change makes it safe to use the memory allocator in NMI handlers and other special unblockable contexts where deadlocks might occur. (3.1)
Implement the PTRACE_INTERRUPT, PTRACE_LISTEN, PTRACE_SEIZE, and TRAP_NOTIFY ptrace() requests. (3.1)
Add /sys/module/module_name/uevent files to all module entries to provide a method for managing built-in modules from user space. (3.1)
Add support for SEEK_HOLE and SEEK_DATA in lseek(). (3.1)
Replace the / character with the ! escape character in hostname and comm strings in core dump file names. (3.1)
If the value of the sysctl parameter shm_rmid_forced is set to 1, all shared memory objects are marked for removal with IPC_RMID. As this change breaks POSIX compliance, you need to ensure that no threads are using the orphaned memory. (3.1)
Add support for generic I/O power management domains (v8) by introducing common headers, helper functions, and callbacks to allow platforms to use simple, generic power domains for runtime power management. (3.1)
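The crc8 library function listed above is table driven. As a rough illustration of the technique, not the kernel's own crc8 interface, the following C sketch precomputes a 256-entry table for an MSB-first CRC-8 and then processes one byte per lookup. The polynomial 0x07 and initial value 0x00 used in the check are illustrative assumptions; a driver chooses its own parameters, and the function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Build the lookup table for an MSB-first CRC-8 with the given polynomial. */
void crc8_table_init(uint8_t table[256], uint8_t poly)
{
    for (int i = 0; i < 256; i++) {
        uint8_t crc = (uint8_t)i;
        for (int bit = 0; bit < 8; bit++)
            crc = (crc & 0x80) ? (uint8_t)((crc << 1) ^ poly) : (uint8_t)(crc << 1);
        table[i] = crc;
    }
}

/* Fold `len` bytes into `crc`, one table lookup per byte. */
uint8_t crc8_compute(const uint8_t table[256], const uint8_t *data, size_t len, uint8_t crc)
{
    while (len--)
        crc = table[crc ^ *data++];
    return crc;
}
```

With polynomial 0x07 and initial value 0x00, the nine bytes 123456789 produce 0xF4, the usual check value for that CRC-8 variant.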
Add system-wide power transitions (system suspend and hibernation) support for generic domains (v5). Add suspend, resume, freeze, thaw, poweroff, and restore callbacks that are associated with struct generic_pm_domain objects and have pm_genpd_init() interpret them as appropriate. (3.1)
Add wakeup device support for system-sleep transitions. Introduce a new generic power management domain callback routine, .active_wakeup(). This routine is used during the noirq phase of system suspend and hibernation to decide how to handle wakeup devices. (3.1)
Add the ability to set a maximum limit for allowable CPU bandwidth to the process bandwidth controller. The limit is specified as a quota and a period for a group of processes. (3.2)
To reduce the performance impact from using the i_mutex lock with generic_file_llseek(), an almost lockless generic_file_llseek() is added to VFS that allows the maximum file size of the file system to be passed in, instead of always using maxbytes from the superblock. (3.2)
A boot parameter of the form root=PARTUUID=uuid/PARTNROFF=partition_number_offset extends the root=PARTUUID=uuid syntax to select the root partition by specifying an integer offset from a known, unique partition. (3.2)
Add a fault reporting mechanism to the input/output memory management unit (IOMMU) API. (3.2)
Allow partition creation from user space and add discard support for loop devices. (3.2)
When performing AIO, allocate kiocb structures in batches to reduce the CPU overhead of a process taking and releasing the context lock. (3.2)
Add support for the tagged files ease-of-use feature in sysfs. (3.2)
Add a comm change event to the process connector. (3.2)
Add architecture-independent support for highmem page poisoning and verification to debug-pagealloc. (3.2)
Add support for poll() in sysctl so that user-space applications can be notified of changes to sysctl entries. (3.2)
The x32 kernel ABI (kABI) allows programs to take advantage of x86-64 features such as a larger number of CPU registers, better floating-point performance, faster position-independent code shared libraries, function parameters passed via registers, and faster system-call instructions. The kABI uses 32-bit pointers and avoids the overhead of 64-bit pointers. A program is limited to a 4-GB virtual address space. However, the reduced memory footprint can also allow a program to run faster. (3.4)
The nomodule kernel parameter can be used to disable module loading as an alternative to using sysctl.
The prctl() PR_GET_CHILD_SUBREAPER and PR_SET_CHILD_SUBREAPER options implement simple process supervision of orphaned processes. (3.4)
Thread stacks are now marked correctly for /proc/pid/maps under procfs. (3.4)
Restore the sysctl setting kernel.pty.max as the global limit of pseudo terminals (by default, 4096). (3.4)
Add abilities to turn the reboot notifier on or off, and to enter the debugger and stop kernel execution before rebooting. (3.4)
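The PR_SET_CHILD_SUBREAPER option listed above lets a supervisor act as init for its own descendants: when an intermediate process exits, its orphaned children are re-parented to the nearest subreaper ancestor instead of to init. A minimal C sketch, assuming a Linux system with the glibc prctl() wrapper, sets the flag, reads it back with PR_GET_CHILD_SUBREAPER, and then demonstrates the re-parenting; the helper names are hypothetical.

```c
#define _GNU_SOURCE
#include <sys/prctl.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Mark the calling process as a child subreaper and read the flag back.
 * Returns the value reported by PR_GET_CHILD_SUBREAPER, or -1 on error. */
int become_subreaper(void)
{
    int flag = 0;
    if (prctl(PR_SET_CHILD_SUBREAPER, 1, 0, 0, 0) == -1)
        return -1;
    if (prctl(PR_GET_CHILD_SUBREAPER, &flag, 0, 0, 0) == -1)
        return -1;
    return flag;
}

/* Fork a child that forks a grandchild and exits at once. The orphaned
 * grandchild waits until it has been re-parented, then reports its new
 * parent PID through a pipe. Returns that PID, or -1 on error. */
pid_t orphan_new_parent(void)
{
    int fds[2];
    pid_t reported = -1;
    if (pipe(fds) == -1)
        return -1;
    pid_t child = fork();
    if (child == -1) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    if (child == 0) {
        pid_t me = getpid();
        close(fds[0]);
        if (fork() == 0) {
            while (getppid() == me) /* still attached to the intermediate child */
                usleep(1000);
            pid_t parent = getppid();
            _exit(write(fds[1], &parent, sizeof parent) == (ssize_t)sizeof parent ? 0 : 1);
        }
        _exit(0); /* exit immediately, orphaning the grandchild */
    }
    close(fds[1]);
    waitpid(child, NULL, 0);
    if (read(fds[0], &reported, sizeof reported) != (ssize_t)sizeof reported)
        reported = -1;
    close(fds[0]);
    while (wait(NULL) > 0) /* reap the re-parented grandchild */
        ;
    return reported;
}
```

After become_subreaper() succeeds, orphan_new_parent() is expected to return the caller's own PID, which is what makes the subreaper useful to a service supervisor.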
To improve performance, VFS now uses unsigned long accesses for dcache name comparison and hashing. (3.4)
/proc/pid/task/tid/children entries provide information about task children and can be useful for process checkpoint and restore operations. (3.5)
/proc/pid/pagemap now reports whether file pages are shared-anon or file-page. (3.5)
The skew_tick boot option mitigates xtime_lock contention on larger systems or read-copy-update (RCU) lock contention on all systems when CONFIG_MAXSMP is set. This option increases power consumption and should only be enabled if the system runs jitter-sensitive workloads (typically, HPC or RT). (3.5)
Inode stat information is moved closer together to increase the likelihood of cache hits. (3.5)
The fallocate() file-system operation allows space to be preallocated for a file. (3.5)
Stale power-aware scheduling remnants and dysfunctional knobs have been removed from the process scheduler. (3.5)
The EPOLLWAKEUP flag prevents system suspension while epoll events are ready. (3.5)
ramoops uses the pstore interface instead of /dev/mem. (3.5)
Add ECC support to pstore/ram. (3.5)
The make tools target is now integrated with the kernel build system. (3.5)
The kernel configuration parameter RCU_FANOUT_LEAF can be used to control leaf-level fanout for RCU locking to reduce cache-miss initialization latencies on large systems. (3.5)
RCU locking now uses a direct algorithmic implementation of sleepable RCU (SRCU) to prevent OS jitter and performance degradation. (3.5)
Add rbtree node caching support to IPC mqueue for the case where the queue is empty, improve performance of send/recv, and update maximums for the mqueue subsystem. (3.5)
Add symbolic and hard link restrictions to VFS to address security issues. (3.6)
Improvements to the IOMMU group implementation. (3.6)
Remove the non-working x86 power estimation feature from the process scheduler. (3.6)
Add hysteresis attributes (used by most thermal sensors) on a per-trip-point basis to the thermal framework. (3.6)
Add support for states that affect multiple CPUs. This is potentially useful in implementations where CPUs leverage a shared, coupled power state. (3.6)
The rcutree.rcu_fanout_leaf boot parameter allows the value of RCU_FANOUT_LEAF to be increased but not decreased. (3.6)
Firmware files can be loaded directly from the file system rather than through udev. (3.7)
xattr support in cgroups allows run-time metadata to be attached to cgroups. (3.7)
The disable_nmi command in kdb disables NMI-entry and releases the port. (3.7)
Add a special serial console driver to allow the temporary use of an NMI debugger port as a normal console via the nmi_console command. (3.7)
RCU locking changes:
Control grace period duration from sysfs.
Make rcutree module parameters visible in sysfs.
Allow a CPU to enter an RCU extended quiescent state while it runs in user space.
(3.7)
Add the finit_module() system call, which loads a kernel module from a file descriptor so that a security policy can enforce that kernel modules are loaded only from a read-only, cryptographically verified root file system. (3.8)
Applications can choose between using 1-GB and 2-MB huge pages. Typically, this feature is used in conjunction with a NUMA policy. (3.8)
Add option to allow assignment of a memory node as movable memory, which allows an entire node to be hot-pluggable. (3.8)
Add sysctl variables to tune checkpoint/restart in user space (CRIU), including specifying the ID of the next IPC object to be allocated. (3.8)
Introduce a CRIU message queue copy feature so that all pending IPC messages can be retrieved without deleting them from the queue. (3.8)
Correct the implementation of hierarchy support for the freezer cgroup. If a cgroup is frozen, all its descendants are also frozen. (3.8)
Implement the PTRACE_O_EXITKILL ptrace() option. (3.8)
Add the VmFlags field to /proc/PID/smaps output. Required by CRIU. (3.8)
Add TIOCGPKT, TIOCGPTLCK, and TIOCGEXCL ioctl() calls to obtain the packet mode and locking state of a pseudo terminal, and the exclusive mode state of a tty. (3.8)
Add a module parameter to force the use of expedited RCU primitives, which can benefit some embedded applications. (3.8)
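The new pseudo-terminal query calls can be exercised on a pty master returned by posix_openpt(). In this sketch, which assumes a Linux system with /dev/ptmx available, TIOCGPTLCK reports the slave lock state (a new pty starts locked until unlockpt() is called), and TIOCGPKT reports whether packet mode, enabled here with TIOCPKT, is on. The function names are hypothetical.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdlib.h>
#include <sys/ioctl.h>
#include <unistd.h>

/* Slave lock state of a pty master: 1 = locked, 0 = unlocked, -1 = error. */
int pty_lock_state(int master)
{
    int locked;
    if (ioctl(master, TIOCGPTLCK, &locked) == -1)
        return -1;
    return locked != 0;
}

/* Packet mode of a pty master: 1 = enabled, 0 = disabled, -1 = error. */
int pty_packet_mode(int master)
{
    int pkt;
    if (ioctl(master, TIOCGPKT, &pkt) == -1)
        return -1;
    return pkt != 0;
}

/* Record lock and packet state before and after unlockpt() and TIOCPKT:
 * out[0], out[1] = initial lock/packet state; out[2], out[3] = afterwards.
 * Returns 0 on success, -1 on error. */
int pty_demo(int out[4])
{
    int one = 1;
    int master = posix_openpt(O_RDWR | O_NOCTTY);
    if (master == -1)
        return -1;
    out[0] = pty_lock_state(master);
    out[1] = pty_packet_mode(master);
    if (unlockpt(master) == -1 || ioctl(master, TIOCPKT, &one) == -1) {
        close(master);
        return -1;
    }
    out[2] = pty_lock_state(master);
    out[3] = pty_packet_mode(master);
    close(master);
    return 0;
}
```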
Allow selected CPUs to have RCU callbacks offloaded to kthreads to prevent or minimize OS jitter. (3.8)
Provide support in sysfs to determine the maximum number of virtual functions (VFs) that Single Root I/O Virtualization (SR-IOV) capable PCIe devices support, and to enable and disable VFs on a per-device basis. (3.8)
Add a sysfs node to present the available frequencies for power management. (3.8)
Add the PM_QOS_FLAG_NO_POWER_OFF and PM_QOS_FLAG_REMOTE_WAKEUP power management QoS device flags. (3.8)
Add a sysfs node to present frequency transition information for power management. (3.8)
A.4 Cryptography
Ablkcipher now supports encryption and decryption for AES, DES, and 3DES. (3.1)
Add an eCryptfs mount option to check that the UID of the device being mounted is the same as the expected UID. (3.1)
The encrypted key type has been extended with the introduction of the ecryptfs format, intended for use with the eCryptfs file system. The ecryptfs format stores an authentication token structure inside an encrypted key payload, containing a randomly generated symmetric key. (3.1)
A new user-space configuration API enables the instantiation, removal, and display of cryptographic algorithms from user space. (3.2)
An x86-64 implementation of Blowfish provides two sets of assembler functions:
Regular one-block-at-a-time (1-way) encryption and decryption functions
Four-blocks-at-a-time (4-way) functions that provide improved performance on out-of-order CPUs
On in-order CPUs, the performance of 4-way functions should be equal to that of 1-way functions. (3.2)
An x86-64 assembler implementation of the SHA1 algorithm uses Supplemental Streaming SIMD Extensions 3 (SSSE3) instructions or Advanced Vector Extensions (AVX) if available. Testing with the tcrypt module demonstrates that raw hash performance is up to 2.3 times faster than the C implementation. (3.2)
A 3-way parallel x86-64 assembler implementation of Twofish encrypts data in three-block chunks, which improves cipher performance on out-of-order CPUs. (3.2)
Add support for MD5 algorithms to CAAM. (3.3)
RSA digital-signature verification is implemented using the multiprecision math library from GnuPG, and is used by the IMA/EVM digital signature extension. (3.3)
A 4-way parallel i586/SSE2 assembler implementation of Serpent encrypts data in 4-block chunks. (3.3)
An 8-way parallel x86-64/SSE2 assembler implementation of Serpent encrypts data in 8-block chunks (two 4-block chunk SSE2 operations are performed in parallel to improve performance on out-of-order CPUs). (3.3)
LRW and XTS support added to Serpent-sse2. (3.3)
HMAC algorithms added to Talitos. (3.3)
XTS support added to twofish-x86_64-3way. (3.3)
Add sha224 and sha384 variants to existing AEAD algorithms in CAAM. (3.4)
Add x86-64 assembler implementation of the Camellia block cipher. Two sets of functions are provided:
Regular one-block-at-a-time (1-way) encryption and decryption functions
Two-blocks-at-a-time (2-way) functions that provide improved performance on out-of-order CPUs
On in-order CPUs, the performance of 2-way functions should be equal to that of 1-way functions. (3.4)
Add a Tegra AES hardware driver supporting ecb, cbc, ofb, and ansi_x9.31rng modes, and 128, 192 and 256-bit key sizes. (3.4)
Add a slice-by-8 algorithm to the existing slice-by-4 algorithm in crc32. The BITS size is expanded from 32 to 64, tables are extended from tab[4][256] to tab[8][256], and inner-loop code is added. (3.4)
Improve performance of aesni_intel by using parallel LRW and XTS encryption with AES-NI hardware pipelines. (3.7)
Add IPSec extended sequence number (ESN) support to CAAM and Talitos. (3.7)
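The slice-by-8 crc32 change listed above replaces one table lookup per byte with eight lookups per 8-byte word, using tables extended from tab[4][256] to tab[8][256]. The following portable C sketch shows the technique for the standard reflected CRC-32 polynomial 0xEDB88320. It assembles each word byte by byte so that it does not depend on host byte order, and it is an illustration rather than the kernel's crc32 code; all names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

static uint32_t crc_tab[8][256];
static int crc_tab_ready; /* lazy, single-threaded initialization */

/* tab[0] is the classic byte table; tab[k][i] is tab[k-1][i] advanced by one more byte. */
static void crc32_init_tables(void)
{
    for (uint32_t i = 0; i < 256; i++) {
        uint32_t c = i;
        for (int bit = 0; bit < 8; bit++)
            c = (c & 1) ? (c >> 1) ^ 0xEDB88320u : c >> 1;
        crc_tab[0][i] = c;
    }
    for (int k = 1; k < 8; k++)
        for (int i = 0; i < 256; i++)
            crc_tab[k][i] = (crc_tab[k - 1][i] >> 8) ^ crc_tab[0][crc_tab[k - 1][i] & 0xff];
    crc_tab_ready = 1;
}

/* Reference version: one table lookup per byte. */
uint32_t crc32_bytewise(const uint8_t *p, size_t len)
{
    uint32_t crc = 0xFFFFFFFFu;
    if (!crc_tab_ready)
        crc32_init_tables();
    while (len--)
        crc = (crc >> 8) ^ crc_tab[0][(crc ^ *p++) & 0xff];
    return ~crc;
}

/* Slice-by-8: consume 8 input bytes per iteration with eight independent lookups. */
uint32_t crc32_slice8(const uint8_t *p, size_t len)
{
    uint32_t crc = 0xFFFFFFFFu;
    if (!crc_tab_ready)
        crc32_init_tables();
    while (len >= 8) {
        uint32_t lo = crc ^ ((uint32_t)p[0] | (uint32_t)p[1] << 8 |
                             (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24);
        uint32_t hi = (uint32_t)p[4] | (uint32_t)p[5] << 8 |
                      (uint32_t)p[6] << 16 | (uint32_t)p[7] << 24;
        crc = crc_tab[7][lo & 0xff] ^ crc_tab[6][(lo >> 8) & 0xff] ^
              crc_tab[5][(lo >> 16) & 0xff] ^ crc_tab[4][lo >> 24] ^
              crc_tab[3][hi & 0xff] ^ crc_tab[2][(hi >> 8) & 0xff] ^
              crc_tab[1][(hi >> 16) & 0xff] ^ crc_tab[0][hi >> 24];
        p += 8;
        len -= 8;
    }
    while (len--) /* finish any tail bytes one at a time */
        crc = (crc >> 8) ^ crc_tab[0][(crc ^ *p++) & 0xff];
    return ~crc;
}
```

Both versions return 0xCBF43926 for the nine bytes 123456789, the usual CRC-32 check value. Each extra table costs 1 KB of memory in exchange for fewer dependent operations per input byte.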
An x86-64/AVX assembler implementation of the Cast5 block cipher allows 16 blocks to be processed in parallel. (3.7)
Implement signature verification algorithms for RSA public key cryptography. At present, only the signature verification algorithm is supported (PKCS# | RFC3447). (3.7)
Add a crypto key parser for binary (DER) X.509 certificates, an ASN.1 decoder, and a simple ASN.1 grammar compiler. (3.7)
Add HASH-HMAC with SHA algorithms and MD5 to CAAM. (3.6)
Add hardware random number generator support to CAAM. (3.6)
Add an x86-64/AVX assembler implementation of the Serpent block cipher. (3.6)
Add x86-64/AVX assembler implementation of the Twofish block cipher. (3.6)
Add sha224, sha384, and sha512 to the existing AEAD algorithms in Talitos so that it supports all combinations of CBC (AES, 3DES-EDE) and HMAC (SHA-1, 224, 256, 384, and 512). (3.6)
A.5 Device Mapper
The always writable feature indicates that a target does not support read-only mode. (3.2)
The immutable feature indicates that a target type cannot be mixed with any other target type. Once loaded into a device, it cannot be replaced with a table that contains a different type. (3.2)
Add a singleton table that can contain only one target. (3.2)
Log device dependency allows registration of a log device so that it is included in the list of device dependencies. (3.2)
A verity target allows a device to store cryptographic hashes of file system blocks. The device can be used to check every read of the file system. If the hash of a block does not match the expected hash stored for that block, the read fails. (3.4)
A.6 Driver Support
Broadcom NetXtreme II 10Gbps network adapter driver (bnx2x): Add AutogrEEEn support for BCM84833 and 5418se, and multiple concurrent L2 traffic classes. (3.1)
Broadcom NetXtreme II iSCSI driver (bnx2i): Add support for 57800, 57810, and 57840. (3.1)
Brocade BFA FC SCSI driver (bfa):
FAA support
HBA diagnostic support
CEE information and statistics query
Flash configuration
Collect and reset fcport statistics
Configure LUN masking
Configure QoS and collect statistics
Support for obtaining SFP information
Support for FC-transport based Asynchronous Event Notification
Support for I/O profiling
Collect or reset fabric statistics
Configure and query flash boot partition
Configure trunking on Brocade adapter ports
Store driver configuration in flash memory
Brocade-1860 Fabric Adapter 16Gbps support and flash controller fixes
Brocade-1860 Fabric Adapter Hardware enablement
Brocade-1860 Fabric Adapter vHBA support
Initiator-based LUN masking
(3.1)
Emulex Blade Engine 2 10Gbps adapter driver (be2net): Add support for multiple Tx queues. (3.1)
Emulex FC/FCoE driver (lpfc): Add FCF priority failover functionality. (3.1)
Intel PRO/1000 PCI-Express Gigabit network adapter driver (e1000e): Add Jumbo Frame support for the 82583 Gigabit Ethernet Controller. (3.1)
QLogic 1/10 GbE Converged/Intelligent Ethernet Adapter driver (qlcnic): Add multi-protocol internal loopback support. The driver can now generate loopback traffic, conduct tests, and return the results to an application. (3.1)
coretemp: Add core and package threshold support. The thresholds are configured using the tempX_max and tempX_max_hyst interfaces in sysfs. An interrupt is generated if the CPU temperature reaches or crosses above tempX_max or if it drops below tempX_max_hyst. To allow the hysteresis mechanism to work, the value of tempX_max should be configured to be several degrees higher than the value of tempX_max_hyst. (3.1)
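The recommendation to set tempX_max several degrees above tempX_max_hyst follows from how hysteresis suppresses repeated alarms: once the alarm fires at tempX_max, it clears only after the temperature falls below tempX_max_hyst, so a reading that hovers near the threshold does not raise an interrupt on every sample. The following C sketch models that rule. It is a hypothetical illustration of the behavior described above, not coretemp driver code, and it uses millidegrees Celsius, the unit that hwmon sysfs attributes report.

```c
#include <stdbool.h>

/* Hypothetical model of a threshold with hysteresis (millidegrees Celsius). */
struct temp_alarm {
    int max;      /* plays the role of tempX_max */
    int max_hyst; /* plays the role of tempX_max_hyst; should be below max */
    bool active;  /* true from the time max is reached until temp drops below max_hyst */
};

/* Feed one temperature sample; returns true when an interrupt would be raised. */
bool temp_alarm_update(struct temp_alarm *a, int temp)
{
    if (!a->active && temp >= a->max) {
        a->active = true; /* reached or crossed above tempX_max */
        return true;
    }
    if (a->active && temp < a->max_hyst) {
        a->active = false; /* dropped below tempX_max_hyst */
        return true;
    }
    return false; /* no state change, no interrupt */
}
```

If max and max_hyst were equal, a sensor oscillating by a fraction of a degree around the threshold would toggle the alarm on almost every sample.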
A.7 File Systems
btrfs
Add a DCACHE_NEED_LOOKUP flag to d_flags to improve the performance of ls and readdir(). (3.1)
Switching from tree locks to reader/writer locks improves the performance of read and write-intensive workloads. (3.1)
Performance improvements in several areas, particularly for random write workloads. (3.2)
Allow overcommit of ENOSPC reservations to improve performance. (3.2)
Add automatic backup of superblock information about tree roots for the previous 4 commits. Add the -o recovery mount option to enable use of the root history log if required. (3.2)
Add code to follow back references, replacing the manual process for walking those references, and including more detailed corruption messages. (3.2)
Allow user-space utilities to inspect metadata. (3.2)
Improve performance of checksum verification of read-aheads. (3.2)
Add the nospace_cache mount option to disable cache loading without clearing the cache. (3.2)
Improve performance of committing transactions. (3.2)
When mounting a subvolume, allow a path relative to the tree root to be specified to -o subvol. (3.2)
Rework the logic for cluster allocation. (3.3)
Rewrite the block group trimming code. (3.3)
Increase the size of system chunks. (3.3)
Remove caching code that caused unnecessary fragmentation and complexity. (3.4)
Remove the code that silently switched single chunks to RAID-0 when balancing a file system. The restriper now allows a choice of RAID-0 or concatenation. (3.4)
Support metadata blocks that are larger than 4 KB. (3.5)
The thread_pool size can be changed at remount time. (3.5)
Add the DEVICE_READY ioctl() to be used in conjunction with btrfs device ready device, providing a lightweight method of telling if all the devices required for a file system are currently in the cache. (3.6)
Allow compression to be disabled by specifying the compress=no mount option. (3.6)
Improve multithread buffer reads. (3.6)
Support UUIDs for subvolumes, and introduce ctime, otime, stime, and rtime for subvolumes, including a transid for each time. (3.6)
Rework the DEV_STATS ioctl() to allow it to either get or reset device statistics depending on the argument specified. (3.6)
Make the compress and nodatacow mount options mutually exclusive. To improve O_SYNC performance, asynchronous metadata checksumming is not performed under some circumstances. (3.7)
For more information, see https://btrfs.wiki.kernel.org/index.php/Changelog.
cifs
Add UID/GID to SID mapping. (3.2)
Add backup mount option. (3.2)
Allow larger rsize (up to 16 MB) and change the default to 1 MB. (3.2)
Introduce credit-based flow control. (3.4)
Add the cache=strict|none mount option to specify the cache type instead of the strictcache and forcedirectio options. The legacy options are now mutually exclusive. (3.5)
The vers=2.1 mount option forces an SMB2 mount. By default, vers=1 (CIFS) is used. (3.5)
The vers=2.0 mount option forces an SMB2.02 mount. (3.8)
ext4
Reduce CPU overhead when appending files preallocated using fallocate() with mode FALLOC_FL_KEEP_SIZE via direct I/O. (3.2)
Reduce CPU overhead by optimizing memmove() lengths in extent and index insertions. (3.2)
Support cluster-based allocation (bigalloc) with cluster sizes of up to 1 MB, set using the -C option to mkfs.ext4. This change is not backwards compatible with older kernels. (3.2)
Remove the resize and journal=update mount options. (3.4)
Improve performance of truncate and unlink. (3.7)
Support online resizing of metablock group (META_BG) and 64-bit file systems. (3.7)
Add the max_dir_size_kb mount option to specify a maximum directory size. (3.7)
Re-enable -o discard functionality in no-journal mode. (3.7)
Remove support for disabling extended attributes. (3.8)
Implement support for SEEK_DATA and SEEK_HOLE. (3.8)
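SEEK_DATA and SEEK_HOLE (added to lseek() in 3.1 and implemented for ext4 in 3.8) let a program skip unallocated regions of a sparse file instead of reading zeros. In this sketch, which assumes a writable /tmp, a hypothetical probe_sparse() builds a 1-MB file containing a single written 4096-byte block and asks where data begins and where the following hole starts. File systems with native support report offsets around the written block; a file system without it may treat the whole file as data followed by an implicit hole at end of file, so the checks only assume that the results bracket the written block.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Create a sparse temporary file of `size` bytes with one 4096-byte data block
 * written at `data_off`, then ask lseek() where the first data region starts
 * and where the hole after it begins. Returns 0 on success, -1 on error. */
int probe_sparse(off_t size, off_t data_off, off_t *data, off_t *hole)
{
    char path[] = "/tmp/seekholeXXXXXX";
    char block[4096];
    int fd = mkstemp(path);
    if (fd == -1)
        return -1;
    unlink(path); /* the file disappears once fd is closed */
    memset(block, 'x', sizeof block);
    if (ftruncate(fd, size) == -1 ||  /* extend without writing: a hole */
        pwrite(fd, block, sizeof block, data_off) != (ssize_t)sizeof block) {
        close(fd);
        return -1;
    }
    *data = lseek(fd, 0, SEEK_DATA);  /* first data at or after offset 0 */
    *hole = (*data == -1) ? -1 : lseek(fd, *data, SEEK_HOLE); /* next hole */
    close(fd);
    return (*data == -1 || *hole == -1) ? -1 : 0;
}
```

A backup or copy tool can alternate these two calls to visit only the data regions of a file.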
NFS
Add support for the RAID-5 read-4-write interface. (3.2)
Add v4.0 and v4.1 mount options. (3.4)
The kernel can deduce the value of clientaddr if this mount option is not specified for NFS v4. (3.4)
Add the migration mount option that specifies whether a server supports Transparent State Migration (TSM). (3.7)
Handle IPv6 remote addresses from GETDEVICEINFO (required for pNFS). (3.8)
Remove the deprecated nfsctl() system call and all related code. (3.8)
pstore
Add runtime logging support for kernel messages to allow debugging of hangs caused by hardware issues. (3.6)
Add console message handling. The log size is configurable by using the ramoops.console_size module option, and the log is accessible at pstore-mountpoint/console-ramoops. (3.6)
Add persistent function tracing. The kernel can save the function call chain log to a persistent RAM buffer, which can be decoded and dumped after a reboot. You can use the log to determine the function that was called immediately prior to a reset or panic. (3.6)
tmpfs
Increase the file size limit for tmpfs. (3.1)
Support fallocate() FALLOC_FL_PUNCH_HOLE and preallocation. (3.5)
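On tmpfs, FALLOC_FL_PUNCH_HOLE frees the pages backing a byte range while the file keeps its size, and reads from the punched range return zeros. The following sketch obtains a tmpfs-backed file with memfd_create(), a later addition (Linux 3.17, glibc 2.27 wrapper) that creates a file on an internal shmem mount, so it does not depend on where tmpfs is mounted. The helper name and sizes are hypothetical.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <linux/falloc.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Fill a shmem-backed file with `len` bytes of 'x', punch a hole over the first
 * `hole` bytes, then report the file size and one byte inside and after the
 * hole. Returns 0 on success, -1 on error. */
int punch_demo(off_t len, off_t hole, off_t *size_out, char *in_hole, char *after_hole)
{
    char buf[4096];
    struct stat st;
    int fd = memfd_create("punch-demo", 0);
    if (fd == -1)
        return -1;
    memset(buf, 'x', sizeof buf);
    for (off_t done = 0; done < len; ) {
        size_t chunk = (len - done) < (off_t)sizeof buf ? (size_t)(len - done) : sizeof buf;
        ssize_t n = pwrite(fd, buf, chunk, done);
        if (n <= 0) {
            close(fd);
            return -1;
        }
        done += n;
    }
    /* PUNCH_HOLE must be combined with KEEP_SIZE; the file length stays the same. */
    if (fallocate(fd, FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE, 0, hole) == -1 ||
        fstat(fd, &st) == -1 ||
        pread(fd, in_hole, 1, 0) != 1 ||
        pread(fd, after_hole, 1, hole) != 1) {
        close(fd);
        return -1;
    }
    *size_out = st.st_size;
    close(fd);
    return 0;
}
```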
XFS
Improve performance of the inode cache. (3.1)
Improve scalability of per-file-system quotas. (3.4)
Implement support for SEEK_DATA and SEEK_HOLE. (3.5)
Make the inode32 and inode64 mount options work with remounts. (3.7)
Make inode64 the default allocation mode. (3.7)
Add the XFS_IOC_FREE_EOFBLOCKS ioctl() to enable EOFBLOCKS scanning. (3.8)
A.8 Memory Management
Add memory.vmscan_stat to the memory control group to display the numbers of scanned, rotated, and freed pages, and elapsed times for direct reclaim and soft reclaim. (3.1)
Extend the memory hotplug API to allow memory hotplug in virtual machines. Also required for the Xen balloon driver. (3.1)
Fix significant stalls in the page allocator when copying large amounts of data on NUMA machines. (3.1)
Add the slub_debug method to the slub slab allocator to check if memory is not freed and help diagnose memory usage. (3.1)
Reduce CPU overhead of slub_debug. (3.1)
The cross memory attach feature adds the system calls process_vm_readv() and process_vm_writev(), which allow data to be transferred between the address spaces of two processes without passing through kernel space. (3.2)
Add a block plug for page reclaim to vmscan that reduces CPU overhead by reducing lock contention and merging requests. (3.2)
Implement per-CPU cache in slub for partial pages. (3.2)
Restrict access to slab files under procfs and sysfs, hiding slabinfo and /sys/kernel/slab/*. (3.2)
Add the slab_max_order kernel parameter that determines the maximum allowed order for slabs. High settings can cause OOMs due to memory fragmentation. The default value is 1 for systems with more than 32 MB of RAM. Otherwise, the default value is 0. (3.3)
To increase the probability of detecting memory corruption, change the buddy allocator to retain more free, protected pages and to interlace free, protected pages and allocated pages. (3.3)
Charge the pages dirtied by an exited process to random dirtying tasks. (3.3)
Allow the poll time and call intervals to balance dirty pages to be controlled by the value of the max_pause parameter. (3.3)
Fix dirtied pages accounting on sub-page writes. (3.3)
Introduce the dirty rate limit to compensate for a task's think time when computing the final pause time. (3.3)
Reduce dirty throttling polls and CPU overhead. (3.3)
Avoid tiny dirty poll intervals. (3.3)
Make swap-in read-ahead skip over holes, allowing the system to swap back in at several MB/s, instead of a few hundred kB/s. (3.4)
Introduce bit-optimized iterator and radix tree cleanup in the core page cache. (3.4)
Improve allocation of contiguous memory chunks by adding DMA mapping helper functions. (3.5)
Remove swap token code and lumpy reclaim. (3.5)
Improve throughput and reduce CPU overhead by allowing swap read-ahead to be merged. (3.6)
Add cgroup controller that allows HugeTLB usage per control group to be limited and enforces the limit during page faults. (3.6)
A.9 Networking
Add CPU fanout policies for hashing to the packet interface based on mapping socket buffers to Rx hashes, and a pure round-robin scheme. (3.1)
Improve the client announcement mechanism in the Better Approach To Mobile Adhoc Networking (B.A.T.M.A.N.) routing protocol. The change resolves performance and latency issues with the previous implementation by appending client changes (new client joined or client left) to the OGM. System overhead is reduced by allowing nodes to modify their global tables by means of updates. The new ROAMING_ADVERTISEMENT packet type eliminates latency and packet drop issues seen with OGM broadcasting. (3.1)
Add support for zero-copy socket buffers. Adds user-space buffer support in the socket buffer shared information. (3.1)
Use MD5 to compute protocol sequence numbers and fragment IDs per RFC1948. Update code to take into account current CPU speeds and to use a full 32-bit sequence number. (3.1)
Add a multicast group for DCB to provide a clean method for disseminating kernel DCB link attributes to user space. (3.1)
Add SELinux context support to the AUDIT target of netfilter. (3.1)
Add range support for IPv4 to netfilter. (3.1)
Lower the default initial retransmission timeout (RTO) from 3 seconds to 1 second per RFC2988bis. The RTO falls back to 3 seconds if a SYN or SYN-ACK packet has been retransmitted and the TCP time stamp option is not on. (3.1)
Implement support for Auto-ASCONF (see RFC5061) in the Stream Control Transmission Protocol (SCTP) stack. The change includes features for enabling and configuring settings. (3.1)
Reduce the false sharing effect. (3.1)
Reduce CPU overhead of check_leaf() with the route cache disabled. (3.1)
Add support to the virtio_net driver to obtain Rx and Tx ring parameter information from an Ethernet device. Used by the ethtool -g ethX command. (3.2)
Implement AP isolation on the receiver and sender side for B.A.T.M.A.N. When a node receives a unicast packet, it checks whether the source and destination client can communicate due to the AP isolation. (3.2)
Remove the IPv4 gc_interval from sysctl. (3.2)
Add TPACKET_V3 support including a flexible buffer implementation. (3.2)
Allow forwarding of some link-local frames by network bridges. You can use /sys/class/net/brX/bridge/group_fwd_mask in sysfs to control frame forwarding. (3.2)
Implement TCP proportional rate reduction. (3.2)
Add netlink-based Controller Area Network (CAN) routing. (3.2)
Add support for the socket monitoring interface used by the ss tool. (3.3)
Add support for the SCSI RDMA Protocol (SRP) target driver. The SRP protocol allows an initiator to access a block storage device on another host (target) over a network that supports the RDMA protocol. Currently, the RDMA protocol is supported by InfiniBand. (3.3)
Add unresolved queue limits to neigh. Deprecate /proc/sys/net/ipv4/neigh/default/unres_qlen, and replace it with unres_qlen_bytes. (3.3)
Add CAIF USB support. (3.3)
Add an extended accounting infrastructure for netfilter over nfnetlink, which allows the display of real-time traffic accounting without requiring a complicated and resource-consuming implementation in user space. (3.3)
Add nfacct match to netfilter, which supports extended accounting. (3.3)
Add reverse path filter (rpfilter) to netfilter, which allows matching of packets where replies use the same interface on which the packet arrived. (3.3)
Add adaptive random early detection (RED) active queue management (AQM) to the packet scheduler. (3.3)
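Random early detection keeps an exponentially weighted moving average of the queue length and drops arriving packets with a probability that ramps linearly from 0 at a minimum threshold to max_p at a maximum threshold; the adaptive variant periodically tunes max_p so that the average settles between those thresholds. The following floating-point C sketch models that logic. The adaptation constants (a target band at 40% to 60% of the threshold range, an additive step of min(0.01, max_p/4), and a multiplicative factor of 0.9) follow the published Adaptive RED proposal and are assumptions here; the kernel's fixed-point implementation may differ in detail, and all names are hypothetical.

```c
/* Hypothetical floating-point model of RED with Adaptive-RED style tuning. */
struct red {
    double min_th, max_th; /* thresholds on the average queue length (packets) */
    double w;              /* EWMA weight used to average the queue length */
    double max_p;          /* drop probability when avg reaches max_th */
    double avg;            /* current average queue length */
};

/* Fold one instantaneous queue-length sample into the moving average. */
void red_update_avg(struct red *r, double qlen)
{
    r->avg += r->w * (qlen - r->avg);
}

/* Drop probability for the next packet: 0 below min_th, a linear ramp up to
 * max_p between the thresholds, and 1 at or above max_th. */
double red_drop_prob(const struct red *r)
{
    if (r->avg < r->min_th)
        return 0.0;
    if (r->avg >= r->max_th)
        return 1.0;
    return r->max_p * (r->avg - r->min_th) / (r->max_th - r->min_th);
}

/* Periodic adaptive step: raise max_p additively (only while max_p <= 0.5) when
 * avg is above the target band, and lower it multiplicatively (only while
 * max_p >= 0.01) when avg is below the band. */
void red_adapt(struct red *r)
{
    double range = r->max_th - r->min_th;
    double band_lo = r->min_th + 0.4 * range;
    double band_hi = r->min_th + 0.6 * range;
    if (r->avg > band_hi && r->max_p <= 0.5) {
        double alpha = r->max_p / 4 < 0.01 ? r->max_p / 4 : 0.01;
        r->max_p += alpha;
    } else if (r->avg < band_lo && r->max_p >= 0.01) {
        r->max_p *= 0.9;
    }
}
```

Tuning max_p instead of the thresholds keeps the average queue, and therefore queuing delay, roughly predictable as the offered load changes.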
Enhance stochastic fairness queueing (SFQ) in the packet scheduler with optional RED on each SFQ flow queue, a configurable smaller per-flow limit for in-flight packets, up to 65408 active flows (as compared to 127 previously), and head drops instead of tail drops. (3.3)
Add 802.1q netpoll support to vlan. (3.3)
Add NTF_USE bridge support and other changes that allow the bridge forwarding database to be controlled via netlink. (3.3)
Add a plug queuing discipline that allows a user-space application to plug or unplug a network output queue via netlink. (3.4)
Add the ability to change the B.A.T.M.A.N. routing algorithm at runtime. (3.4)
RCU conversion in TCP allows access to MD5 keys without locking the listener socket. (3.4)
For some workloads, allowing splice() to build full TSO packets can reduce the number of logical packets sent by an order of magnitude, making zero-copy TCP faster than one-copy. (3.4)
Add the SO_PEEK_OFF socket option. (3.4)
Support a peek offset for datagram sockets, seqpacket sockets, and stream sockets. (3.4)
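The peek-offset items above can be exercised from user space with a UNIX stream socket pair; the socket(7) man page describes SO_PEEK_OFF as initially supported only for UNIX domain sockets. This sketch assumes Linux; peek_twice() is a hypothetical helper, and the fallback value 42 is the asm-generic constant, used only if the C library headers do not define SO_PEEK_OFF.

```c
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

#ifndef SO_PEEK_OFF
#define SO_PEEK_OFF 42 /* asm-generic value; only used if headers lack it */
#endif

/*
 * Hypothetical helper: queues "abcdef" on a UNIX stream socket pair, sets a
 * peek offset of 0, and peeks 3 bytes twice. Each chunk is stored
 * NUL-terminated in first/second. Returns 0 on success, -1 on error.
 */
static int peek_twice(char first[4], char second[4])
{
    const char *msg = "abcdef";
    int sv[2];
    int off = 0;
    int rc = -1;

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) != 0)
        return -1;
    if (send(sv[0], msg, strlen(msg), 0) != (ssize_t)strlen(msg))
        goto out;
    if (setsockopt(sv[1], SOL_SOCKET, SO_PEEK_OFF, &off, sizeof(off)) != 0)
        goto out;
    /* With a peek offset set, each MSG_PEEK advances the offset, so the
     * second peek sees the next bytes; the data itself stays queued. */
    if (recv(sv[1], first, 3, MSG_PEEK) != 3 ||
        recv(sv[1], second, 3, MSG_PEEK) != 3)
        goto out;
    first[3] = '\0';
    second[3] = '\0';
    rc = 0;
out:
    close(sv[0]);
    close(sv[1]);
    return rc;
}
```

Without SO_PEEK_OFF (its initial value is -1, which disables the feature), both peeks would return "abc".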
Add MSG_TRUNC support for datagram sockets so that recv() returns the real length of the packet, even if it is longer than the passed buffer. (3.4)
Add the missing SO_NOFCS socket option. (3.4)
Add a timeout extension to netfilter, which allows timeout policies to be attached to a flow via the connection tracking target. Add the cttimeout infrastructure for fine-grained timeout tuning. (3.4)
Add NAT support for expectation classes in netfilter. (3.4)
Add exceptions support to netfilter. (3.4)
Merge ipt_LOG and ip6_LOG into xt_LOG in netfilter. (3.4)
Add a hardware-independent IEEE 802.15.4 networking stack for SoftMAC devices. (3.5)
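To illustrate the 3.4 item that adds MSG_TRUNC support for datagram sockets: the recv(2) man page documents that UNIX datagram sockets honor MSG_TRUNC since Linux 3.4. In this hypothetical helper, a 10-byte datagram is read into a 4-byte buffer, and recv() still reports 10.

```c
#include <sys/socket.h>
#include <unistd.h>

/*
 * Hypothetical helper: sends a 10-byte datagram over a UNIX datagram socket
 * pair and receives it into a 4-byte buffer with MSG_TRUNC. Returns the
 * value recv() reports, or -1 on error.
 */
static ssize_t real_datagram_length(void)
{
    int sv[2];
    char small[4];
    ssize_t n = -1;

    if (socketpair(AF_UNIX, SOCK_DGRAM, 0, sv) != 0)
        return -1;
    if (send(sv[0], "0123456789", 10, 0) == 10) {
        /* Only sizeof(small) bytes are copied and the rest of the datagram
         * is discarded, but MSG_TRUNC makes recv() return the full length. */
        n = recv(sv[1], small, sizeof(small), MSG_TRUNC);
    }
    close(sv[0]);
    close(sv[1]);
    return n;
}
```

A caller can use the returned length to detect truncation and size a buffer for subsequent messages.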
Tune the performance of sk_add_backlog. (3.5)
Add a binary option type, a load-balancer module, a per-port option for enabling or disabling ports, and support for per-port options to the team device. (3.5)
Add the raw packet QP type IB_QPT_RAW_PACKET to the InfiniBand core. This allows applications to build a complete packet, including L2 headers, when sending. On the receive side, the hardware does not strip any headers. This feature is designed for user-space direct access to Ethernet. (3.5)
Treat Neighbor Discovery (ND) option 31 (DNSSL) as a user-space option in IPv6, per RFC 6106. IANA assigned the value 31 as the 8-bit identifier of the DNSSL option type. (3.5)
Replace the basic bridge loop avoidance code in the batman-adv module. (3.5)
Set the traffic class for CAIF packets based on socket priority, CAIF protocol type, or type of message. (3.5)
Add generic PF_BRIDGE:RTM_FDB hooks and two new flags: NTF_MASTER and NTF_SELF. (3.5)
Add Explicit Congestion Notification (ECN) capability to pktsched. Instead of dropping packets, attempt to mark them with ECN. (3.5)
Remove support for token ring. (3.5)
Remove support for the Econet protocol. (3.5)
Add an optional QoS attribute to DCB netlink to allow the setting of a rate limit for an ETS traffic class (TC). (3.5)
Add CEE notify calls when an APP change or setall command is made from user space. (3.5)
Add HMARK target support to netfilter. (3.5)
If net.bridge.bridge-nf-filter-vlan-tagged is enabled in sysctl, bridge netfilter temporarily removes the vlan header and feeds the packet to iptables or ip6tables. Add bridge-nf-pass-vlan-input-device, which, if set to on (the default is off), makes netfilter also set the in interface to the vlan interface if that interface exists. This change allows the iptables REDIRECT target to work with vlan-on-top-of-bridge configurations and allows iptables -i to match the vlan device name. (3.5)
Add a byte-based limit mode to netfilter, which can be used, for example, to support ingress-traffic policing or to detect when a host or port consumes more bandwidth than expected. (3.5)
Add support for sync threads to netfilter. (3.5)
Remove ip_queue support from netfilter. (3.5)
Add support for Layer 2 Tunneling Protocol (L2TP) over UDP in IPv6. (3.5)
Add L2TPv3 IP encapsulation support for IPv6. (3.5)
Add a netlink API for L2TPv3 unmanaged tunnels over IPv6. (3.5)
Remove the IPv4 routing cache, which was vulnerable to denial-of-service attacks. (3.6)
Implement RFC 5961 sections 3.2 and 4.2 (mitigation against blind reset attacks that use the RST bit and the SYN bit). (3.6)
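The RST rules in RFC 5961 section 3.2 and the SYN rule in section 4.2 reduce to a small decision function. This is a sketch of the RFC's rules, not the kernel's TCP input code; the names are hypothetical, and sequence comparisons use a wraparound-safe signed difference.

```c
#include <stdbool.h>
#include <stdint.h>

enum seg_action { SEG_DROP, SEG_CHALLENGE_ACK, SEG_RESET };

/* Wraparound-safe "a comes before b" in 32-bit sequence space. */
static bool seq_before(uint32_t a, uint32_t b)
{
    return (int32_t)(a - b) < 0;
}

/* RFC 5961 section 3.2: RST received in a synchronized state. */
static enum seg_action rst_action(uint32_t seg_seq, uint32_t rcv_nxt,
                                  uint32_t rcv_wnd)
{
    if (seg_seq == rcv_nxt)
        return SEG_RESET;         /* exact match: tear down the connection */
    if (!seq_before(seg_seq, rcv_nxt) &&
        seq_before(seg_seq, rcv_nxt + rcv_wnd))
        return SEG_CHALLENGE_ACK; /* in window but not exact: challenge ACK */
    return SEG_DROP;              /* outside the window: silently drop */
}

/* RFC 5961 section 4.2: any SYN in a synchronized state draws a challenge
 * ACK, whatever its sequence number. */
static enum seg_action syn_action(void)
{
    return SEG_CHALLENGE_ACK;
}
```

The challenge ACK is what defeats a blind attacker: a genuine peer that has lost its connection state answers it with an RST carrying the exact expected sequence number, which an off-path attacker cannot see.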
Add Virtual Tunnel Interface (VTI) support. (3.6)
Add an interface option, route_localnet, that enables the routing of the 127/8 address block and processing of ARP requests on a specific interface (for example, to address a pool of virtual guests behind a load balancer). (3.6)
Add multiqueue and netpoll support to team. (3.6)
Add experimental zero-copy Tx support to tun. (3.6)
Add support for 40GbE. (3.6)
Add fail-open support to netfilter, so that packets are not dropped when the queue is full. (3.6)
Add user-space connection tracking helper infrastructure to netfilter. (3.6)
Extend the ethtool interface to support the Energy-Efficient Ethernet (EEE) commands get_eee and set_eee. (3.6)
Add Generic Routing Encapsulation (GRE) over IPv6, generic segmentation offload (GSO), and generic receive offload (GRO) capability. (3.7)
Set the default MTU for loopback devices to 64 KB. This allows the TCP stack to build large frames and significantly reduces stack overhead. (3.7)
Add an extended attribute to store data for the mapping between inode numbers in sockfs and protocol types for use by lsof. (3.7)
Implement a per-task fragmentation allocator, which can improve TCP stream performance by 20% on loopback devices. (3.7)
Various netfilter changes:
- Add a protocol-independent NAT core.
- Add the IPv6 MASQUERADE target.
- Add the IPv6 NETMAP target.
- Add the IPv6 REDIRECT target.
- Add IPv6 NAT support.
- Support the IPv6 FTP NAT helper.
- Support the IPv6 IRC NAT helper.
- Support the IPv6 SIP NAT helper.
- Support IPv6 in the amanda NAT helper.
- Add a stateless IPv6-to-IPv6 Network Prefix Translation target.
- Remove xt_NOTRACK.
(3.7)
Add a link layer control (LLC) core layer to the NFC HCI, add an SHDLC llc module to the llc core, and add LLCP raw socket support to NFC. (3.7)
Support IPv6 transmit hashing (and TCP or UDP over IPv6) in the bonding driver. (3.7)
Add support for dumping diagnostic core and basic socket information (family, type, and protocol) at socket creation time. (3.7)
Add support to ethtool for setting the MDI/MDI-X state for twisted-pair wiring. (3.7)
Add 64-bit statistics support to PPP, including tx_bytes, rx_bytes, tx_packets, and rx_packets. (3.7)
Add generic netlink support for tcp_metrics that allows unlinking and deletion of entries after a grace period. (3.7)
Add bridge port parameters over netlink to permit dumping, monitoring, and changing the bridge multicast database. (3.8)
Add support for RFC 5961 section 5.2 (blind data injection attack mitigation). (3.8)
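RFC 5961 section 5.2 narrows which ACK values are accepted, so that a blind attacker must also guess an acknowledgment number close to the real one. A sketch of the acceptability test, with hypothetical names and a wraparound-safe sequence comparison:

```c
#include <stdbool.h>
#include <stdint.h>

/* Wraparound-safe "a is at or before b" in 32-bit sequence space. */
static bool seq_before_eq(uint32_t a, uint32_t b)
{
    return (int32_t)(a - b) <= 0;
}

/*
 * RFC 5961 section 5.2: SEG.ACK is acceptable only if
 *     SND.UNA - MAX.SND.WND <= SEG.ACK <= SND.NXT
 * where max_snd_wnd is the largest window ever received from the peer.
 * Segments that fail the check are discarded and an ACK is sent back.
 */
static bool ack_acceptable(uint32_t seg_ack, uint32_t snd_una,
                           uint32_t snd_nxt, uint32_t max_snd_wnd)
{
    return seq_before_eq(snd_una - max_snd_wnd, seg_ack) &&
           seq_before_eq(seg_ack, snd_nxt);
}
```

With SND.UNA = 5000, SND.NXT = 6000, and MAX.SND.WND = 1000, only ACK values from 4000 through 6000 are accepted.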
Change the default TCP hash size, and add support for hardware-offloaded encapsulation and offloading of encapsulated packets for VXLAN and IP GRE. (3.8)
Add vlan tag access to netfilter. (3.8)
Add extensions to VXLAN to support Distributed Overlay Virtual Ethernet (DOVE) networks. (3.8)
Add IPv6 set action functionality to openvswitch. (3.8)
Add GSO support to IPIP tunnels, increasing the performance of a single TCP flow. (3.8)
Implement IPv6 fragment handling for IPVS. (3.8)
Add support in netfilter for querying the destination address of a redirected connection. (3.8)
Restore NOTRACK target support in netfilter. (3.8)
Implement QFQ+ in sched. (3.8)
Add support for RTM_GETNETCONF to routing netlink. (3.8)
Add support for per-association statistics by implementing the SCTP_GET_ASSOC_STATS socket option for the Stream Control Transmission Protocol (SCTP). (3.8)
Add a sysctl that allows the HMAC algorithm used by SCTP to be selected at runtime rather than fixed at build time. (3.8)
Add getsockopt() support for SO_ATTACH_FILTER (also defined as SO_GET_FILTER), which is required to save the full state of a socket. (3.8)
Convert tun/tap into a multiqueue device and expose the queues as file descriptors in user space. (3.8)
A.10 perf Utility
Add the --symfs option to perf annotate. (3.2)
Add the drop monitor script. (3.2)
Add the -o and --append options to perf stat. (3.2)
Add the -M option. (3.2)
Add annotation output controls to all perf tools that have integrated annotation. (3.2)
Include information about the host environment in perf.data:
- HEADER_HOSTNAME: Host name.
- HEADER_OSRELEASE: Kernel release number.
- HEADER_ARCH: Hardware architecture.
- HEADER_CPUDESC: Generic CPU description.
- HEADER_NRCPUS: Number of online, available CPUs.
- HEADER_CMDLINE: perf command line.
- HEADER_VERSION: perf version.
- HEADER_TOPOLOGY: CPU topology.
- HEADER_EVENT_DESC: Full event description (attrs).
- HEADER_CPUID: Easy-to-parse, low-level CPU identification.
(3.2)
Accept FIFOs as input files. (3.3)
Add the -a option for system-wide profiling. (3.3)
Implement printing snapshots to files. (3.6)
Add sort by source line number. (3.6)
Add PMU event alias support. (3.6)
Add support for perf kvm stat to analyze kvm vmexit, mmio, and ioport events. (3.7)
Add union member access. (3.7)
Add the --list-opts option to print long option names for use with bash completion. (3.7)
Add a script browser. (3.8)
Add new display options (-F, -p, and -P) to perf diff. (3.8)
perf inject now supports input from a file. (3.8)
Add the --pre and --post options to perf stat. (3.8)
Add the gtk.command config option to launch the GTK browser. This is equivalent to specifying the --gtk option on the command line. (3.8)
Add new features to perf trace. (3.8)
Expose hardware event translations in sysfs. (3.8)
Add the trace_options boot parameter to set trace options at boot time, such as enabling event stack dumps. (3.8)
A.11 Power Management
Add a generic DVFS framework with device-specific (non-CPU) OPPs. (3.2)
Improve performance of LZO/plain hibernation. (3.2)
Implement per-device power management QoS constraints. (3.2)
A.12 Security
Add /sys/kernel/security/tomoyo/audit_interface, which generates audit logs in the form of domain policy so that the TOMOYO auditing daemon (tomoyo-auditd) can reuse them and append them to the domain_policy interface. TOMOYO is a kernel security module that implements mandatory access control (MAC). (3.1)
Add ACL group support for TOMOYO, which allows permissions to be granted globally. (3.1)
Add policy namespace support for LXC (Linux containers). Each policy namespace has its own set of domain policy, exception policy, and profiles, independent of other namespaces. (3.1)
Add built-in policy support, which is needed to support enforcing mode from early in the boot sequence. (3.1)
Make several TOMOYO options configurable to support activating access controls without calling an external policy loader program. (3.1)
Permit the use of the following properties as conditions with TOMOYO: argv[], envp[], execve(), the executable's real path and symlink target, the owner or group of file objects, and the UID or GID of the current thread. (3.1)
Implement the Extended Verification Module (EVM), which protects a file's security extended attributes (xattrs) against integrity attacks. (3.2)
Implement Smack protections for domain transition: BPRM unsafe flags, secure exec, clear unsafe personality bits, and clear parent death signal. (3.2)
Enhance the performance of Smack rule list lookups. (3.2)
Allow user access to /smack/access, removing the requirement for CAP_MAC_ADMIN. (3.2)
Add environment variable name restriction to TOMOYO. (3.2)
Add socket operation restriction to TOMOYO. (3.2)
Add control for generation of access granted logs in TOMOYO. (3.2)
Allow domain transition without execve() in TOMOYO. (3.2)
Allow audit matching on inode gid. (3.3)
Allow inter-field comparison in audit rules between the gid of a running task and the gid of an inode. (3.3)
Add a new audit filter type, AUDIT_FIELD_COMPARE, to indicate which fields should be compared. (3.3)
Allow system call exit filter matching based on the uid of the owner of an inode used in the call. (3.3)
Add support for digital signature verification in EVM. File metadata can be protected using digital signatures instead of HMAC. (3.3)
Add the Yama Linux security module to collect DAC security improvements. (3.4)
Add AppArmor security module file tracking to securityfs. (3.4)
Add an initial AppArmor security module features directory to securityfs for displaying boolean feature flags and the known capability mask. (3.4)
Add default_type statements to SELinux. (3.5)
Add default source and target selectors for the user, role, and range of new objects in SELinux. (3.5)
Allow seek operations on the file that exposes the policy used by the sesearch SELinux policy query tool. (3.5)
Add auditing of failed attempts to set invalid labels in SELinux. (3.5)
Add checking for the open permission on truncate calls to SELinux. (3.5)
Support long Smack labels. (3.5)
Set the recursive transmute attribute for Smack in all cases. (3.5)
Allow manager programs that do not start with / in TOMOYO to handle differences between distributions. (3.5)
Add two modes to the Yama ptrace restrictions. (3.5)
Add support for invalidating a key. (3.5)
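The Yama ptrace restriction is selected through the kernel.yama.ptrace_scope sysctl. Per the kernel's Yama documentation, values 0 and 1 are the original modes, and 2 (admin-only attach) and 3 (no attach) are the two modes added in 3.5; once set to 3, the value cannot be changed again until reboot. A minimal C sketch that reads the setting, with hypothetical helper names:

```c
#include <stdio.h>

/* Returns kernel.yama.ptrace_scope (0-3), or -1 if it cannot be read,
 * for example because Yama is not built into the running kernel. */
static int yama_ptrace_scope(void)
{
    FILE *f = fopen("/proc/sys/kernel/yama/ptrace_scope", "r");
    int scope = -1;

    if (f == NULL)
        return -1;
    if (fscanf(f, "%d", &scope) != 1)
        scope = -1;
    fclose(f);
    return scope;
}

/* Short descriptions of the documented ptrace_scope values. */
static const char *yama_scope_name(int scope)
{
    switch (scope) {
    case 0: return "classic ptrace permissions";
    case 1: return "restricted: descendants or a declared tracer only";
    case 2: return "admin-only attach (CAP_SYS_PTRACE)";  /* added in 3.5 */
    case 3: return "no attach (cannot be changed back)";  /* added in 3.5 */
    default: return "unknown or unavailable";
    }
}
```

A caller might print yama_scope_name(yama_ptrace_scope()) to report the active mode.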
Implement revoking of all rules for a subject label in Smack. (3.7)
Allow Yama to be unconditionally stacked, regardless of which LSM module is primary. (3.7)
Extend the Integrity Measurement Architecture (IMA) with audit log hashes, digital signature verification, and the integrity appraisal extension. (3.7)
A.13 Storage
Block management in the software RAID MD layer now adds bad blocks to a bad-block list so that the system does not use them. (3.1)
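The bad-block list can be pictured as a sorted table of sector ranges that is consulted before each I/O. The following is a hypothetical model of that lookup, not MD's implementation (which uses its own compact entry encoding): a binary search decides whether a request touches a known bad range.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* One bad range [start, start + len), in 512-byte sectors (hypothetical). */
struct bad_range {
    uint64_t start;
    uint64_t len;
};

/*
 * Returns true if the I/O [sector, sector + nr) overlaps a bad range.
 * list must be sorted by start with no overlapping entries, so the range
 * ends are sorted too and a binary search on them is valid.
 */
static bool overlaps_bad_block(const struct bad_range *list, size_t count,
                               uint64_t sector, uint64_t nr)
{
    size_t lo = 0, hi = count;

    /* Find the first range that ends after the I/O starts. */
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (list[mid].start + list[mid].len <= sector)
            lo = mid + 1;
        else
            hi = mid;
    }
    /* That range overlaps only if it also begins before the I/O ends. */
    return lo < count && list[lo].start < sector + nr;
}
```

Keeping the table sorted makes each check O(log n), which matters because the lookup sits on the I/O path.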
A.14 Virtualization
Add memory hotplug support for the Xen balloon driver. (3.1)
Add Xen PCI backend driver. (3.1)
Implement discard requests and support old-style BARRIER. (3.2)
Increase the recommended maximum number of VCPUs from 64 to 160. (3.4)
Allow host IRQ sharing for assigned PCI 2.3 devices. (3.4)
Add infrastructure for software and hardware-based TSC rate scaling. (3.4)
Move the Hyper-V storage driver out of the staging area. (3.4)
Add support for VLAN trunking to Hyper-V. Linux guests can now configure multiple VLANs using a single synthetic NIC on a Windows 8 Hyper-V host. (3.4)
Support new KVP message types. (3.4)
Support new KVP verbs for Hyper-V in the user level daemon. (3.4)
Implement multiconsole support for Hyper-V. (3.4)
Support enumeration from all available pools for Hyper-V. (3.4)
Update the Xen ACPI processor driver to implement a C-state and P-state driver that uploads ACPI data to the hypervisor. (3.4)
Add netconsole support to Xen. (3.4)
Use the S4 code to provide S3 support for virtio devices. (3.4)
Add a virtio-based remote processor messaging bus to allow message-based communication with the remote processor (if supported by the firmware). (3.4)
Add direct MSI message injection for in-kernel IRQ chips. (3.5)
In the virtio-rng driver, unregister from the hwrng interface and remove the virtio queue before entering the S3 or S4 states. On restore, add the virtio queue and re-register with hwrng. (3.6)
Add mcelog support to Xen. (3.6)
Shorten the I/O path in the guest kernel to achieve high IOPS and lower latency. (3.7)
Add Xen EFI video mode support. (3.7)
Implement backend support for paged-out grant targets (retry loop and hooks). (3.7)
Implement the Xen ACPI processor aggregator driver (pad). (3.8)
Remove support for i386 processors. (3.8)