Appendix A Other Changes
The following sections describe other features of Unbreakable Enterprise Kernel Release 3 (UEK R3). The mainline version in which a feature was introduced is noted in parentheses.
A.1 Architecture
vsyscall emulation and vsyscall parameter. (3.1)
INTEL_MID configuration. (3.1)
mrst_pmu driver for Intel Moorestown Power Management Unit. (3.1)
Hardware memory error recovery support for ACPI, APEI, and GHES. (3.1)
printk() support for recoverable error via NMI for ACPI, APEI, and GHES. (3.1)
A.2 Block Devices
Strict CPU affinity can be enabled by setting the value of /sys/block/blkdev/queue/rq_affinity to 2. Some systems perform better when request completions are directed to the strict requester CPU rather than steered per socket. (3.1)
CFQ I/O scheduler performance tuning adds a think time check for a group, which makes bandwidth usage more efficient by not leaving queues active when there are no further requests for the group. (3.1)
Flakey target support in the device mapper adds the corrupt_bio_byte parameter to simulate corruption by overwriting a byte at a specified position with a specified value while the device is down. The drop_writes option parameter drops writes silently while the device is down. (3.1)
The device mapper supports MD RAID-1 personality through the dm-raid target. (3.1)
The device mapper supports the ability to parse and use metadata devices with dm-raid. Without the metadata devices, many RAID features would be unavailable. (3.1)
Experimental support for thin provisioning in the device mapper allows the creation of multiple thinly provisioned volumes from a storage pool and recursive snapshots to an arbitrary depth. (3.2)
I/O-less dirty throttling and reduced file-system writeback from page reclamation greatly reduces I/O seeks and CPU contention. (3.2)
The cfq_target_latency parameter under sysfs allows throughput and read latency to be tuned. (3.4)
The device mapper supports adding and removing space at the end of the devices when resizing RAID-10 arrays with near and offset layouts. (3.4)
Thin target in the device mapper supports discards. When non-discard I/O completes and the associated mappings are quiesced, any discards that were deferred (via ds_add_work() in process_discard()) are queued for processing by the worker thread. (3.4)
Thin target in the device mapper provides user-space access to pool metadata. Two new messages can be sent to the thin pool target allowing it to take a snapshot of the metadata. This read-only snapshot can be accessed from user space concurrently with the live target. (3.5)
Thin target in the device mapper uses dedicated slab caches (whose names are prefixed with dm_) rather than relying on kmalloc memory pools backed by generic slab caches. This allows independent accounting of memory usage and any associated memory leakage by thin provisioning. (3.5)
RAID-5 XOR checksumming is optimized by taking advantage of the 256-bit YMM registers introduced by Advanced Vector Extensions (AVX). (3.5)
RAID-6 includes Supplemental Streaming SIMD Extensions 3 (SSSE3) optimized recovery functions and a new algorithm for selecting the most appropriate function to use for recovery. (3.5)
MD allows a reshape operation to be reversed by implementing a new reshape_direction attribute that can be set when delta_disks is zero, and which can take one of the values forward or backwards. (3.5)
A RAID-10 array can be reshaped to a different near or offset layout, a different chunk size, and a different number of devices. The number of copies cannot be changed. (3.5)
An existing partition can be resized, even if currently in use, by using the operation code BLKPG_RESIZE_PARTITION with the BLKPG ioctl(). (3.6)
Add MD support for RAID10 (striped mirrors) and RAID1E (integrated adjacent stripe mirroring). (3.6)
Thin target in the device mapper adds read-only and fail-io modes to thin provisioning. If a transaction commit fails, a pool's metadata device transitions to read-only mode. If a commit fails when the device is in read-only mode, a transition to fail-io mode occurs. In fail-io mode, the pool and all associated thin devices report a status of fail if a commit fails. (3.6)
The persistent data debug space map checker has been removed from the device mapper. The feature consumed a lot of memory and caused other issues when enabled on large pools. (3.6)
RAID-1 in MD now prevents the merging of large requests to enhance the performance of SSD devices that function more efficiently with large request transfers. (3.6)
Add support for the WRITE SAME request implemented on some SCSI devices, which allows a block to be efficiently replicated throughout a block range. Only a single logical block need be transferred from the host. The storage device writes the same data to all blocks specified by the request. (3.7)
The BLKZEROOUT ioctl() can be used to zero out block ranges via blkdev_issue_zeroout(). (3.7)
Fastmap support provides a method for attaching an unsorted block image (UBI) device in real-time. Rather than scanning the entire device, Fastmap locates a checkpoint. (3.7)
MD adds TRIM discard support for linear, RAID-0, RAID-1, RAID-5, and RAID-10. (3.7)
DM adds rebuild capacity and replacement slot validation for RAID-10 arrays. (3.7)
RAID-6 recovery is optimized by taking advantage of the 256-bit YMM registers introduced by Advanced Vector Extensions 2 (AVX2). (3.8)
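The RAID-5 XOR checksumming items above concern how fast parity is computed, not what it computes. As a minimal sketch of the underlying principle only (plain Python, no AVX, and not the kernel's code), parity is the byte-wise XOR of the data blocks in a stripe, and any single lost block is the XOR of the survivors and the parity. The function name and the sample blocks are hypothetical.

```python
from functools import reduce


def xor_blocks(blocks):
    """Return the byte-wise XOR of a list of equally sized blocks."""
    if len({len(b) for b in blocks}) != 1:
        raise ValueError("all blocks must have the same length")
    return bytes(reduce(lambda x, y: x ^ y, column) for column in zip(*blocks))


# Three hypothetical data blocks that make up one stripe.
data = [b"\x01\x02\x03\x04", b"\x10\x20\x30\x40", b"\xaa\xbb\xcc\xdd"]
parity = xor_blocks(data)

# Simulate losing the second block and rebuild it from the survivors.
rebuilt = xor_blocks([data[0], data[2], parity])
```

Because XOR is its own inverse, the same routine serves both for generating parity and for recovery.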
A.3 Core Kernel Functionality
Add a lock-less NULL-terminated single list. (3.1)
Add a library function implementing a crc8 algorithm to support the brcm80211 driver. (3.1)
Make the gen_pool memory allocator lockless. This change makes it safe to use the memory allocator in NMI handlers and other special unblockable contexts where deadlocks might occur. (3.1)
Implement the PTRACE_INTERRUPT, PTRACE_LISTEN, PTRACE_SEIZE, and TRAP_NOTIFY ptrace() requests. (3.1)
Add /sys/module/module_name/uevent files to all module entries to provide a method for managing built-in modules from user space. (3.1)
Add support for the implementation of SEEK_HOLE and SEEK_DATA in lseek(). (3.1)
Use ! as the escape character for / in hostname and comm strings in core dumps. (3.1)
If the value of the sysctl parameter shm_rmid_forced is set to 1, all shared memory objects are marked for removal with IPC_RMID. As this change breaks POSIX compliance, you need to ensure that no threads are using the orphaned memory. (3.1)
Add support for generic I/O power management domains (v8) by introducing common headers, helper functions, and callbacks to allow platforms to use simple, generic power domains for runtime power management. (3.1)
Add system-wide power transitions (system suspend and hibernation) support for generic domains (v5). Add suspend, resume, freeze, thaw, poweroff, and restore callbacks that are associated with struct generic_pm_domain objects and have pm_genpd_init() interpret them as appropriate. (3.1)
Add wakeup device support for system-sleep transitions. Introduce a new generic power management domain callback routine, .active_wakeup(). This routine is used during the noirq phase of system suspend and hibernation to decide how to handle wakeup devices. (3.1)
Add the ability to set a maximum limit for allowable CPU bandwidth to the process bandwidth controller. The limit is specified as a quota and a period for a group of processes. (3.2)
To reduce the performance impact from using the i_mutex lock with generic_file_llseek(), an almost lockless generic_file_llseek() is added to VFS that allows the maximum file size of the file system to be passed in, instead of always using maxbytes from the superblock. (3.2)
A boot parameter of the form root=PARTUUID=uuid/PARTNROFF=partition_number_offset extends the root=PARTUUID=uuid syntax to select the root partition by specifying an integer offset from a known, unique partition. (3.2)
Add a fault reporting mechanism to the input/output memory management unit (IOMMU) API. (3.2)
Allow partition creation from user space and add discard support for loop devices. (3.2)
When performing AIO, allocate kiocb structures in batches to reduce the CPU overhead of a process taking and releasing the context lock. (3.2)
Add support for the tagged files ease-of-use feature in sysfs. (3.2)
Add a comm change event to the process connector. (3.2)
Add architecture-independent support for highmem page poisoning and verification to debug-pagealloc. (3.2)
Add support for poll() in sysctl so that user-space applications can be notified of changes to sysctl entries. (3.2)
The x32 kernel ABI (kABI) allows programs to take advantage of x86-64 features such as a larger number of CPU registers, better floating-point performance, faster position-independent code shared libraries, function parameters passed via registers, and faster system-call instructions. The kABI uses 32-bit pointers and avoids the overhead of 64-bit pointers. The program is limited to a 4-GB virtual address space. However, reducing the memory footprint can also allow a program to run faster. (3.4)
The nomodule kernel parameter can be used to disable module loading as an alternative to using sysctl.
The prctl() PR_GET_CHILD_SUBREAPER and PR_SET_CHILD_SUBREAPER options implement simple process supervision of orphaned processes. (3.4)
Thread stacks are now marked correctly for /proc/pid/maps under procfs. (3.4)
Restore the sysctl setting kernel.pty.max as the global limit of pseudo terminals (by default, 4096). (3.4)
Add abilities to turn the reboot notifier on or off, and to enter the debugger and stop kernel execution before rebooting. (3.4)
To improve performance, VFS now uses unsigned long accesses for dcache name comparison and hashing. (3.4)
/proc/pid/task/tid/children entries provide information about task children and can be useful for process checkpoint and restore operations. (3.5)
/proc/pid/pagemap now reports whether file pages are shared-anon or file-page. (3.5)
The skew_tick boot option mitigates xtime_lock contention on larger systems or read-copy-update (RCU) lock contention on all systems when CONFIG_MAXSMP is set. This option increases power consumption and should only be enabled if the system runs jitter-sensitive workloads (typically, HPC or RT). (3.5)
Inode stat information is moved closer together to increase the likelihood of cache hits. (3.5)
The fallocate() file-system operation allows space to be preallocated for a file. (3.5)
Stale power-aware scheduling remnants and dysfunctional knobs have been removed from the process scheduler. (3.5)
The EPOLLWAKEUP flag prevents system suspension while epoll events are ready. (3.5)
ramoops uses the pstore interface instead of /dev/mem. (3.5)
Add ECC support to pstore/ram. (3.5)
make tools is now integrated with the kernel build system. (3.5)
The kernel parameter RCU_FANOUT_LEAF can be used to control leaf-level fanout for RCU locking to reduce cache-miss initialization latencies on large systems. (3.5)
RCU locking now uses a direct algorithmic sleepable RCU (SRCU) implementation to prevent OS jitter and performance degradation. (3.5)
Add rbtree node caching support to IPC mqueue for the case where the queue is empty, improve performance of send/recv, and update maximums for the mqueue subsystem. (3.5)
Add symbolic and hard link restrictions to VFS to address security issues. (3.6)
Improvements to the IOMMU group implementation. (3.6)
Remove the non-working x86 power estimation feature from the process scheduler. (3.6)
Add hysteresis attributes (used by most thermal sensors) on a per-trip-point basis to the thermal framework. (3.6)
Add support for states that affect multiple CPUs. This is potentially useful in implementations where CPUs leverage a shared, coupled power state. (3.6)
The rcutree.rcu_fanout_leaf boot parameter allows the value of RCU_FANOUT_LEAF to be increased but not decreased. (3.6)
Firmware files can be loaded directly from the file system rather than through udev. (3.7)
xattr support in cgroups allows run-time metadata to be attached to cgroups. (3.7)
The disable_nmi command in kdb disables NMI-entry and releases the port. (3.7)
Add a special serial console driver to allow the temporary use of an NMI debugger port as a normal console via the nmi_console command. (3.7)
RCU locking changes:
Control grace period duration from sysfs.
Make rcutree module parameters visible in sysfs.
Allow an RCU lock to be placed in an extended quiescent state when the CPU runs in user space.
(3.7)
Add system call to enforce that kernel modules are loaded only from a read-only cryptographically verified root file system. (3.8)
Applications can choose between using 1-GB and 2-MB huge pages. Typically, this feature is used in conjunction with a NUMA policy. (3.8)
Add option to allow assignment of a memory node as movable memory, which allows an entire node to be hot-pluggable. (3.8)
Add sysctl variables to tune checkpoint/restart in user space (CRIU) including specifying the ID of the next IPC object to be allocated. (3.8)
Introduce CRIU message queue copy feature so that all pending IPC messages can be retrieved without deleting them from the queue. (3.8)
Correct the implementation of hierarchy support for the freezer cgroup. If a cgroup is frozen, all its descendants are also frozen. (3.8)
Implement the PTRACE_O_EXITKILL ptrace() option. (3.8)
Add the VmFlags field to /proc/PID/smaps output. Required by CRIU. (3.8)
Add TIOCGPKT, TIOCGPTLCK, and TIOCGEXCL ioctl() calls to obtain the packet mode and locking state of a pseudo terminal, and to obtain exclusive mode on a tty. (3.8)
Add a module parameter to force the use of expedited RCU primitives, which can benefit some embedded applications. (3.8)
Allow selected CPUs to have RCU callbacks offloaded to kthreads to prevent or minimize OS jitter. (3.8)
Provide support in sysfs to determine the maximum number of virtual functions (VFs) that a Single Root I/O Virtualization (SR-IOV) capable PCIe device supports, and the methods that are available for enabling and disabling VFs on a per-device basis. (3.8)
Add a sysfs node to present the available frequencies for power management. (3.8)
Add the PM_QOS_FLAG_NO_POWER_OFF and PM_QOS_FLAG_REMOTE_WAKEUP power management QoS device flags. (3.8)
Add a sysfs node to present frequency transition information for power management. (3.8)
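The CPU bandwidth limit listed earlier in this section (3.2) is expressed as a quota and a period. The following is a rough sketch of the arithmetic only, assuming the values are given in microseconds as they are in the cgroup files cpu.cfs_quota_us and cpu.cfs_period_us, and that a negative quota means that no limit is set. The function name is hypothetical.

```python
def cpu_limit(quota_us, period_us):
    """Return the CPU share a group may use, where 1.0 equals one full CPU.

    quota_us is the run time allowed within each period of period_us.
    A negative quota is treated as "no limit" (assumption stated above).
    """
    if period_us <= 0:
        raise ValueError("period must be positive")
    if quota_us < 0:
        return float("inf")
    return quota_us / period_us


half_cpu = cpu_limit(50000, 100000)    # 50 ms of run time every 100 ms
two_cpus = cpu_limit(200000, 100000)   # quota larger than the period
```

A quota that exceeds the period lets the group use more than one CPU at a time on a multiprocessor system.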
A.4 Cryptography
Ablkcipher now supports encryption and decryption for AES, DES, and 3DES. (3.1)
Add an eCryptfs mount option to check that the UID of the device being mounted is the same as the expected UID. (3.1)
The encrypted key type has been extended with the introduction of the ecryptfs format, intended for use with the eCryptfs file system. The ecryptfs format stores an authentication token structure inside an encrypted key payload, containing a randomly generated symmetric key. (3.1)
A new user-space configuration API enables the instantiation, removal, and display of cryptographic algorithms from user space. (3.2)
An x86-64 implementation of Blowfish provides two sets of assembler functions:
Regular one-block-at-a-time (1-way) encryption and decryption functions
Four-blocks-at-a-time (4-way) functions that provide improved performance on out-of-order CPUs
On in-order CPUs, the performance of 4-way functions should be equal to that of 1-way functions. (3.2)
An x86-64 assembler implementation of the SHA1 algorithm uses Supplemental Streaming SIMD Extensions 3 (SSSE3) instructions or Advanced Vector Extensions (AVX) if available. Testing with the tcrypt module demonstrates that raw hash performance is up to 2.3 times faster than the C implementation. (3.2)
A 3-way parallel x86-64 assembler implementation of Twofish encrypts data in three-block chunks, which improves cipher performance on out-of-order CPUs. (3.2)
Add support for MD5 algorithms to CAAM. (3.3)
RSA digital-signature verification is implemented using the multiprecision math library from GnuPG, and is used by the IMA/EVM digital signature extension. (3.3)
A 4-way parallel i586/SSE2 assembler implementation of Serpent encrypts data in 4-block chunks. (3.3)
An 8-way parallel x86-64/SSE2 assembler implementation of Serpent encrypts data in 8-block chunks (two 4-block chunk SSE2 operations are performed in parallel to improve performance on out-of-order CPUs). (3.3)
LRW and XTS support added to Serpent-sse2. (3.3)
HMAC algorithms added to Talitos. (3.3)
XTS support added to twofish-x86_64-3way. (3.3)
Add sha224 and sha384 variants to existing AEAD algorithms in CAAM. (3.4)
Add x86-64 assembler implementation of the Camellia block cipher. Two sets of functions are provided:
Regular one-block-at-a-time (1-way) encryption and decryption functions
Two-blocks-at-a-time (2-way) functions that provide improved performance on out-of-order CPUs
On in-order CPUs, the performance of 2-way functions should be equal to that of 1-way functions. (3.4)
Add Tegra AES hardware driver supporting ecb, cbc, ofb, and ansi_x9.31rng modes, and 128, 192 and 256-bit key sizes. (3.4)
Add a slice-by-8 algorithm to the existing slice-by-4 algorithm in crc32. The BITS size is expanded from 32 to 64, tables are extended from tab[4][256] to tab[8][256], and inner-loop code is added. (3.4)
Improve performance of aesni_intel by using parallel LRW and XTS encryption with AES-NI hardware pipelines. (3.7)
Add IPSec extended sequence number (ESN) support to CAAM and Talitos. (3.7)
An x86-64/AVX assembler implementation of the Cast5 block cipher allows 16 blocks to be processed in parallel. (3.7)
Implement signature verification algorithms for RSA public key cryptography. At present, only the signature verification algorithm is supported (PKCS#1, RFC3447). (3.7)
Add a crypto key parser for binary (DER) X.509 certificates, an ASN.1 decoder, and a simple ASN.1 grammar compiler. (3.7)
Add HASH-HMAC with SHA algorithms and MD5 to CAAM. (3.6)
Add hardware random number generator support to CAAM. (3.6)
Add an x86-64/AVX assembler implementation of the Serpent block cipher. (3.6)
Add an x86-64/AVX assembler implementation of the Twofish block cipher. (3.6)
Add sha224, sha384, and sha512 to the existing AEAD algorithms in Talitos so that it supports all combinations of CBC (AES, 3DES-EDE) and HMAC (SHA-1, 224, 256, 384, and 512). (3.6)
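The slice-by-8 crc32 item above (3.4) extends the lookup tables from tab[4][256] to tab[8][256] so that eight input bytes are folded per loop iteration. The following is an illustrative sketch of that technique in Python, not the kernel's code. It assumes the common reflected CRC-32 polynomial 0xEDB88320 and checks itself against zlib.crc32; the names build_tables and crc32_slice8 are hypothetical.

```python
import zlib

POLY = 0xEDB88320  # reflected CRC-32 polynomial (assumed for this sketch)


def build_tables(slices=8):
    """Build tab[slices][256]; tab[0] is the classic byte-at-a-time table."""
    tab = [[0] * 256 for _ in range(slices)]
    for i in range(256):
        crc = i
        for _ in range(8):
            crc = (crc >> 1) ^ POLY if crc & 1 else crc >> 1
        tab[0][i] = crc
    for i in range(256):
        for k in range(1, slices):
            prev = tab[k - 1][i]
            tab[k][i] = (prev >> 8) ^ tab[0][prev & 0xFF]
    return tab


TAB = build_tables()


def crc32_slice8(data, crc=0):
    """Compute CRC-32 of data, consuming eight bytes per main-loop pass."""
    crc ^= 0xFFFFFFFF
    aligned = len(data) - len(data) % 8
    for off in range(0, aligned, 8):
        lo = int.from_bytes(data[off:off + 4], "little") ^ crc
        hi = int.from_bytes(data[off + 4:off + 8], "little")
        crc = (TAB[7][lo & 0xFF] ^ TAB[6][(lo >> 8) & 0xFF] ^
               TAB[5][(lo >> 16) & 0xFF] ^ TAB[4][lo >> 24] ^
               TAB[3][hi & 0xFF] ^ TAB[2][(hi >> 8) & 0xFF] ^
               TAB[1][(hi >> 16) & 0xFF] ^ TAB[0][hi >> 24])
    for byte in data[aligned:]:  # leftover bytes, one at a time
        crc = (crc >> 8) ^ TAB[0][(crc ^ byte) & 0xFF]
    return crc ^ 0xFFFFFFFF


# Cross-check against the reference implementation in zlib.
assert crc32_slice8(b"123456789") == zlib.crc32(b"123456789")
```

In C the benefit comes from performing eight independent table lookups per iteration; in Python the sketch only demonstrates that the table construction and the folding step give the same result as the byte-wise algorithm.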
A.5 Device Mapper
The always writable feature indicates that a target does not support read-only mode. (3.2)
The immutable feature indicates that a target type cannot be mixed with any other target type. Once loaded into a device, it cannot be replaced with a table that contains a different type. (3.2)
Add a singleton table that can contain only one target. (3.2)
Log device dependency allows registration of a log device so that it is included in the list of device dependencies. (3.2)
A verity target allows a device to store cryptographic hashes of file system blocks. The device can be used to check every read of the file system. If the hash of a block that is read does not match the stored hash for that block, the read fails. (3.4)
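The verity check described above can be illustrated with a simplified sketch: hash every block once, then recompute and compare the hash on each read. This is an assumption-laden illustration only. It keeps a flat list of SHA-256 digests in memory, whereas the real target stores its hashes on a device, and the block size and function names are hypothetical.

```python
import hashlib

BLOCK_SIZE = 4096  # hypothetical block size for this sketch


def build_hashes(image):
    """Return one SHA-256 digest per BLOCK_SIZE chunk of the image."""
    return [hashlib.sha256(image[i:i + BLOCK_SIZE]).digest()
            for i in range(0, len(image), BLOCK_SIZE)]


def verified_read(image, hashes, index):
    """Return block `index`, or raise OSError if its hash does not match."""
    block = image[index * BLOCK_SIZE:(index + 1) * BLOCK_SIZE]
    if hashlib.sha256(block).digest() != hashes[index]:
        raise OSError(f"block {index} failed verification")
    return block
```

A caller would build the hash list from a trusted image and then pass every read through verified_read(), so that a block modified after the hashes were taken causes the read to fail instead of returning altered data.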
A.6 Driver Support
Broadcom NetXtreme II 10Gbps network adapter driver (bnx2x): Add AutogrEEEn support for BCM84833 and 5418se, and multiple concurrent L2 traffic classes. (3.1)
Broadcom NetXtreme II iSCSI driver (bnx2i): Add support for 57800, 57810, and 57840. (3.1)
Brocade BFA FC SCSI driver (bfa):
FAA support
HBA diagnostic support
CEE information and statistics query
Flash configuration
Collect and reset fcport statistics
Configure LUN masking
Configure QoS and collect statistics
Support for obtaining SFP information
Support for FC-transport based Asynchronous Event Notification
Support for I/O profiling
Collect or reset fabric statistics
Configure and query flash boot partition
Configure trunking on Brocade adapter ports
Store driver configuration in flash memory
Brocade-1860 Fabric Adapter 16Gbps support and flash controller fixes
Brocade-1860 Fabric Adapter Hardware enablement
Brocade-1860 Fabric Adapter vHBA support
Initiator-based LUN masking
(3.1)
Emulex Blade Engine 2 10Gbps adapter driver (be2net): Add support for multiple Tx queues. (3.1)
Emulex FC/FCoE driver (lpfc): Add FCF priority failover functionality. (3.1)
Intel PRO/1000 PCI-Express Gigabit network adapter driver (e1000e): Add Jumbo Frame support for the 82583 Gigabit Ethernet Controller. (3.1)
QLogic 1/10 GbE Converged/Intelligent Ethernet Adapter driver (qlcnic): Add multi-protocol internal loopback support. Driver can now generate loopback traffic, conduct tests, and return the results to an application. (3.1)
coretemp: Add core and package threshold support. The thresholds are configured using the tempX_max and tempX_max_hyst interfaces in sysfs. An interrupt is generated if the CPU temperature reaches or crosses above tempX_max or if it drops below tempX_max_hyst. To allow the hysteresis mechanism to work, the value of tempX_max should be configured to be several degrees higher than the value of tempX_max_hyst. (3.1)
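The coretemp threshold behavior described above can be modeled as a two-state machine. The following sketch is an interpretation rather than driver code: it assumes that an event fires once when the temperature reaches or exceeds the upper threshold, and once when it later drops below the hysteresis threshold. The function name, event labels, and sample values are hypothetical.

```python
def threshold_events(samples, temp_max, temp_max_hyst):
    """Return the (kind, temperature) events a series of readings would raise."""
    if temp_max <= temp_max_hyst:
        raise ValueError("temp_max must be higher than temp_max_hyst")
    events = []
    above = False  # True once temp_max has been reached
    for temp in samples:
        if not above and temp >= temp_max:
            above = True
            events.append(("max", temp))
        elif above and temp < temp_max_hyst:
            above = False
            events.append(("hyst", temp))
    return events


# Readings in degrees Celsius with thresholds of 85 and 80.
events = threshold_events([70, 86, 84, 87, 79, 90], 85, 80)
```

The reading of 84 raises nothing because it lies between the two thresholds, which is why the gap of several degrees between them prevents a temperature that hovers near the limit from generating a stream of events.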
A.7 File Systems
btrfs
Add a DCACHE_NEED_LOOKUP flag to d_flags to improve the performance of ls and readdir(). (3.1)
Switching from tree locks to reader/writer locks improves the performance of read and write-intensive workloads. (3.1)
Performance improvements in several areas, particularly for random write workloads. (3.2)
Allow overcommit of ENOSPC reservations to improve performance. (3.2)
Add automatic backup of superblock information about tree roots for the previous 4 commits. Add the -o recovery mount option to enable use of the root history log if required. (3.2)
Add code to follow back references, replacing the manual process for walking those references, and including more detailed corruption messages. (3.2)
Allow user-space utilities to inspect metadata. (3.2)
Improve performance of checksum verification of read-aheads. (3.2)
Add the nospace_cache mount option to disable cache loading without clearing the cache. (3.2)
Improve performance of committing transactions. (3.2)
When mounting a subvolume, allow a path relative to the tree root to be specified to -o subvol. (3.2)
Rework the logic for cluster allocation. (3.3)
Rewrite the block group trimming code. (3.3)
Increase the size of system chunks. (3.3)
Remove caching code that caused unnecessary fragmentation and complexity. (3.4)
Remove the code that silently switched single chunks to RAID-0 when balancing a file system. The restriper now allows a choice of RAID-0 or concatenation. (3.4)
Support metadata blocks that are larger than 4 KB. (3.5)
The thread_pool size can be changed at remount time. (3.5)
Add the DEVICE_READY ioctl() to be used in conjunction with btrfs device ready device, providing a lightweight method of telling if all the devices required for a file system are currently in the cache. (3.6)
Allow compression to be disabled by specifying the compress=no mount option. (3.6)
Improve multithread buffer reads. (3.6)
Support UUIDs for subvolumes, and introduce ctime, otime, stime, and rtime for subvolumes, including a transid for each time. (3.6)
Rework the DEV_STATS ioctl() to allow it to either get or reset device statistics depending on the argument specified. (3.6)
Make the compress and nodatacow mount options mutually exclusive. To improve O_SYNC performance, asynchronous metadata checksumming is not performed under some circumstances. (3.7)
For more information, see https://btrfs.wiki.kernel.org/index.php/Changelog.
cifs
Add UID/GID to SID mapping. (3.2)
Add backup mount option. (3.2)
Allow larger rsize (up to 16 MB) and change the default to 1 MB. (3.2)
Introduce credit-based flow control. (3.4)
Add the cache=strict|none mount option to specify the cache type instead of the strictcache and forcedirectio options. The legacy options are now mutually exclusive. (3.5)
The vers=2.1 mount option forces an SMB2 mount. By default, vers=1 (CIFS) is used. (3.5)
The vers=2.0 mount option forces an SMB2.02 mount. (3.8)
ext4
Reduce CPU overhead when appending files preallocated using fallocate() with mode FALLOC_FL_KEEP_SIZE via direct I/O. (3.2)
Reduce CPU overhead by optimizing memmove() lengths in extent and index insertions. (3.2)
Support block sizes of up to 1 MB using the -C option to mkfs.ext4. This change is not backwards compatible with older kernels. (3.2)
Remove the resize and journal=update mount options. (3.4)
Improve performance of truncate and unlink. (3.7)
Support online resizing of metablock group (META_BG) and 64-bit file systems. (3.7)
Add max_dir_size_kb mount option to specify a maximum directory size. (3.7)
Re-enable -o discard functionality in no-journal mode. (3.7)
Remove support for disabling extended attributes. (3.8)
Implement support for SEEK_DATA and SEEK_HOLE. (3.8)
NFS
Add support for the RAID-5 read-4-write interface. (3.2)
Add v4.0 and v4.1 mount options. (3.4)
The kernel can deduce the value of clientaddr if this mount option is not specified for NFS v4. (3.4)
Add the migration mount option that specifies whether a server supports Transparent State Migration (TSM). (3.7)
Handle IPv6 remote addresses from GETDEVICEINFO (required for pNFS). (3.8)
Remove the deprecated nfsctl() system call and all related code. (3.8)
pstore
Add runtime logging support for kernel messages to allow debugging of hangs caused by hardware issues. (3.6)
Add console message handling. The log size is configurable by using the ramoops.console_size module option, and the log is accessible at pstore-mountpoint/console-ramoops. (3.6)
Add persistent function tracing. The kernel can save the function call chain log to a persistent RAM buffer, which can be decoded and dumped after a reboot. You can use the log to determine the function that was called immediately prior to a reset or panic. (3.6)
tmpfs
Increase the file size limit for tmpfs. (3.1)
Support fallocate() FALLOC_FL_PUNCH_HOLE and preallocation. (3.5)
XFS
Improve performance of the inode cache. (3.1)
Improve scalability of per-file-system quotas. (3.4)
Implement support for SEEK_DATA and SEEK_HOLE. (3.5)
Make the inode32 and inode64 mount options work with remounts. (3.7)
Make inode64 the default allocation mode. (3.7)
Add the XFS_IOC_FREE_EOFBLOCKS ioctl() to enable EOFBLOCKS scanning. (3.8)
A.8 Memory Management
Add memory.vmscan_stat to the memory control group to display the numbers of scanned, rotated, and freed pages, and the elapsed times for direct reclaim and soft reclaim. (3.1)
Extend the memory hotplug API to allow memory hotplug in virtual machines. Also required for the Xen balloon driver. (3.1)
Fix significant stalls in the page allocator when copying large amounts of data on NUMA machines. (3.1)
Add slub_debug method to the slub slab allocator to check if memory is not freed and help diagnose memory usage. (3.1)
Reduce CPU overhead of slub_debug. (3.1)
The cross memory attach feature adds the system calls process_vm_readv() and process_vm_writev(), which allow data to be transferred between the address spaces of two processes without passing through kernel space. (3.2)
Add a block plug for page reclaim to vmscan that reduces CPU overhead by reducing lock contention and merging requests. (3.2)
Implement per-CPU cache in slub for partial pages. (3.2)
Restrict access to slab files under procfs and sysfs, hiding slabinfo and /sys/kernel/slab/*. (3.2)
Add the slab_max_order kernel parameter that determines the maximum allowed order for slabs. High settings can cause OOMs due to memory fragmentation. The default value is 1 for systems with more than 32 MB of RAM. Otherwise, the default value is 0. (3.3)
To increase the probability of detecting memory corruption, change the buddy allocator to retain more free, protected pages and to interlace free, protected pages and allocated pages. (3.3)
Charge the pages dirtied by an exited process to random dirtying tasks. (3.3)
Allow the poll time and call intervals to balance dirty pages to be controlled by the value of the max_pause parameter. (3.3)
Fix dirtied pages accounting on sub-page writes. (3.3)
Introduce the dirty rate limit to compensate for a task's think time when computing the final pause time. (3.3)
Reduce dirty throttling polls and CPU overhead. (3.3)
Avoid tiny dirty poll intervals. (3.3)
Make swap-in read-ahead skip over holes, allowing the system to swap back in at several MB/s, instead of a few hundred kB/s. (3.4)
Introduce bit-optimized iterator and radix tree cleanup in the core page cache. (3.4)
Improve allocation of contiguous memory chunks by adding DMA mapping helper functions. (3.5)
Remove swap token code and lumpy reclaim. (3.5)
Improve throughput and reduce CPU overhead by allowing swap read-ahead to be merged. (3.6)
Add cgroup controller that allows HugeTLB usage per control group to be limited and enforces the limit during page faults. (3.6)
A.9 Networking
Add CPU fanout policies for hashing to the packet interface based on mapping socket buffers to Rx hashes, and a pure round-robin scheme. (3.1)
Improve the client announcement mechanism in the Better Approach To Mobile Adhoc Networking (B.A.T.M.A.N.) routing protocol. The change resolves performance and latency issues with the previous implementation by appending client changes (new client joined or client left) to the originator message (OGM). System overhead is reduced by allowing nodes to modify their global tables by means of updates. The new ROAMING_ADVERTISEMENT packet type eliminates latency and packet drop issues seen with OGM broadcasting. (3.1)
Add support for zero-copy socket buffers. Adds user-space buffer support in the socket buffer shared information. (3.1)
Use MD5 to compute protocol sequence numbers and fragment IDs per RFC1948. Update code to take into account current CPU speeds and to use a full 32-bit sequence number. (3.1)
Add a multicast group for DCB to provide a clean method for disseminating kernel DCB link attributes to user space. (3.1)
Add SELinux context support to the AUDIT target of netfilter. (3.1)
Add range support for IPv4 to netfilter. (3.1)
Lower the default init retransmission timeout (RTO) from 3 seconds to 1 second per RFC2988bis. The RTO falls back to 3 seconds if a SYN or SYN-ACK packet has been retransmitted and the TCP time stamp option is not on. (3.1)
Implement support for Auto-ASCONF (see RFC5061) in the Stream Control Transmission Protocol (SCTP) stack. The change includes features for enabling and configuring settings. (3.1)
Reduce the false sharing effect. (3.1)
Reduce CPU overhead of check_leaf() with the route cache disabled. (3.1)
Add support to the virtio_net driver to obtain Rx and Tx ring parameter information from an Ethernet device. Used by the ethtool -g ethX command. (3.2)
Implement AP isolation on the receiver and sender side for B.A.T.M.A.N. When a node receives a unicast packet, it checks whether the source and destination clients are allowed to communicate under AP isolation. (3.2)
Remove the IPv4 gc_interval from sysctl. (3.2)
Add TPACKET_V3 support including a flexible buffer implementation. (3.2)
Allow forwarding of some link-local frames by network bridges. You can use /sys/class/net/brX/bridge/group_fwd_mask in sysfs to control frame forwarding. (3.2)
Implement TCP proportional rate reduction. (3.2)
Add netlink-based Controller Area Network (CAN) routing. (3.2)
Add support for the socket monitoring interface used by the ss tool. (3.3)
Add support for the SCSI RDMA Protocol (SRP) target driver. SRP allows an initiator to access a block storage device on another host (target) over a network that supports the RDMA protocol. Currently, the RDMA protocol is supported by InfiniBand. (3.3)
Add unresolved queue limits to neigh. Deprecate /proc/sys/net/ipv4/neigh/default/unres_qlen, and replace it with unres_qlen_bytes. (3.3)
Add CAIF USB support. (3.3)
Add an extended accounting infrastructure for netfilter over nfnetlink, which allows the display of real-time traffic accounting without requiring a complicated and resource-consuming implementation in user space. (3.3)
Add nfacct match to netfilter, which supports extended accounting. (3.3)
Add reverse path filter (rpfilter) to netfilter, which allows matching of packets where replies use the same interface on which the packet arrived. (3.3)
Add adaptive random early detection (RED) active queue management (AQM) to the packet scheduler. (3.3)
Add an optional RED on top of stochastic fairness queueing (SFQ) to the packet scheduler, enabling SFQ features such as specifying a smaller per flow limit for in-flight packets, up to 65408 active flows (as compared to 127 previously), head drops instead of tail drops, and optional RED on each SFQ flow queue. (3.3)
Add 802.1q netpoll support to vlan. (3.3)
Add NTF_USE bridge support plus other changes to allow the control of the forwarding database via netlink. (3.3)
New plug-queuing discipline allows a user space application to plug or unplug a network output queue via the Netlink interface. (3.4)
Add the ability to change the routing algorithm at runtime to B.A.T.M.A.N. (3.4)
RCU conversion in TCP allows access to MD5 keys without locking the listener socket. (3.4)
For some workloads, allowing splice() to build full TSO packets can reduce the number of logical packets sent by an order of magnitude, making zero-copy TCP faster than one-copy. (3.4)
Add the SO_PEEK_OFF socket option. (3.4)
Support peeking offset for datagram sockets, seqpacket sockets, and stream sockets. (3.4)
Add MSG_TRUNC support for datagram sockets so that recv() returns the real length of the packet, even if it is longer than the passed buffer. (3.4)
Add missing SO_NOFCS socket option. (3.4)
Add timeout extension to netfilter, which allows timeout policies to be attached to the flow via the connection tracking target. Add the cttimeout infrastructure for fine timeout tuning. (3.4)
Add NAT support for expectation classes in netfilter. (3.4)
Add exceptions support to netfilter. (3.4)
Merge ipt_LOG and ip6_LOG into xt_log in netfilter. (3.4)
Add hardware-independent IEEE 802.15.4 networking stack for softMAC devices. (3.5)
Tune performance of sk_add_backlog. (3.5)
Add binary option type, a load-balancer module, a per-port option for enabling or disabling ports, and support for per-port options to the team device. (3.5)
Add raw packet QP type IB_QPT_RAW_PACKET to InfiniBand core. This allows applications to build a complete packet, including L2 headers, when sending. On the receive side, the hardware does not strip any headers. This feature is designed for user-space direct access to Ethernet. (3.5)
Treat ND option 31 as user land (DNSSL support) in IPv6 per RFC 6106. The 8-bit identifier of the DNSSL option type assigned by the IANA has the value 31. (3.5)
Replace basic bridge loop avoidance code in the batman-adv module. (3.5)
Set traffic class for CAIF packets based on socket priority, CAIF protocol type, or type of message. (3.5)
Add generic PF_BRIDGE:RTM_FDB hooks and two new flags: NTF_MASTER and NTF_SELF. (3.5)
Add Explicit Congestion Notification (ECN) capability to pktsched. Instead of dropping packets, attempt to mark them as ECN. (3.5)
Remove support for token ring. (3.5)
Remove support for Econet protocol. (3.5)
Add an optional QoS attribute to DCB netlink to allow the setting of a rate limit for an ETS TC. (3.5)
Add CEE notify calls when an APP change or setall command is made from user space. (3.5)
Add HMARK target support to netfilter. (3.5)
If net.bridge.bridge-nf-filter-vlan-tagged is enabled in sysctl, bridge netfilter removes the vlan header temporarily and feeds the packet to iptables or ip6tables. Add bridge-nf-pass-vlan-input-device. If this option is set to on (the default is off), netfilter also sets the in interface to the vlan interface if this interface exists. This change allows the iptables REDIRECT target to work with vlan-on-top-of-bridge configurations and allows the use of iptables -i to match the vlan device name. (3.5)
Allow byte-based limit mode to be used with netfilter, for example, to support ingress-traffic policing or to detect when a host or port consumes more bandwidth than expected. (3.5)
Add support for sync threads to netfilter. (3.5)
Remove ip_queue support from netfilter. (3.5)
Add support for Layer 2 Tunneling Protocol (L2TP) over UDP in IPv6. (3.5)
Add L2TPv3 IP encapsulation support for IPv6. (3.5)
Add netlink API for L2TPv3 unmanaged tunnels over IPv6. (3.5)
Remove IPv4 routing cache that was vulnerable to denial of service attacks. (3.6)
Implement RFC 5961 sections 3.2 and 4.2 (mitigation against blind reset attacks using the RST bit and the SYN bit). (3.6)
Add VTI support. (3.6)
Add an interface option route_localnet that enables the routing of the 127/8 address block and processing of ARP requests on a specific interface (for example, to address a pool of virtual guests behind a load balancer). (3.6)
Add multiqueue and netpoll support to team. (3.6)
Add experimental zero-copy Tx support to tun. (3.6)
Add support for 40GbE. (3.6)
Add fail-open support to netfilter, where the queue-full condition does not drop packets. (3.6)
Add user-space connection tracking helper infrastructure to netfilter. (3.6)
Extend the ethtool interface to add support for the EEE commands get_eee and set_eee. (3.6)
Add Generic Routing Encapsulation (GRE) over IPv6, generic segmentation offload (GSO), and GRO capability. (3.7)
Set default MTU for loopback devices to 64 KB. This allows TCP stacks to build large frames and significantly reduces stack overhead. (3.7)
Add an extended attribute to store data for the mapping between inode numbers in sockfs and protocol types for use by lsof. (3.7)
Implement a per-task fragmentation allocator, which can improve TCP stream performance by 20% on loopback devices. (3.7)
Various netfilter changes:
- Add a protocol-independent NAT core.
- Add IPv6 MASQUERADE target.
- Add IPv6 NETMAP target.
- Add IPv6 REDIRECT target.
- Add IPv6 NAT support.
- Support IPv6 FTP NAT helper.
- Support IPv6 IRC NAT helper.
- Support IPv6 SIP NAT helper.
- Support IPv6 in the amanda NAT helper.
- Add stateless IPv6-to-IPv6 Network Prefix Translation target.
- Remove xt_NOTRACK.
(3.7)
Add link layer control (LLC) core layer to HCI 2, add an SHDLC llc module to the llc core, and add LLCP raw socket support to NFC. (3.7)
Support IPv6 transmit hashing (and TCP or UDP over IPv6) in the bonding driver. (3.7)
Add support for dumping diagnostic core and basic socket information (family, type and protocol) at socket creation time. (3.7)
Add support to ethtool for setting the MDI/MDI-x state for twisted-pair wiring. (3.7)
Add 64-bit statistics support to PPP, including tx_bytes, rx_bytes, tx_packets, and rx_packets. (3.7)
Add generic netlink support for tcp_metrics that allows unlinking and deletion of entries after a grace period. (3.7)
Add bridge port parameters over netlink to permit dumping, monitoring, and changing the bridge multicast database. (3.8)
Add support for RFC 5961 5.2 Blind Data Injection Attack Mitigation. (3.8)
Change default TCP hash size, and add support for hardware-offloaded encapsulation and offloading of encapsulated packets for VXLAN and IP GRE. (3.8)
Add vlan tag access to netfilter. (3.8)
Add extensions to VXLAN to support Distributed Overlay Virtual Ethernet (DOVE) networks. (3.8)
Add IPv6 set action functionality to openvswitch. (3.8)
Add GSO support to IPIP tunnels, increasing the performance of a single TCP flow. (3.8)
Implement IPv6 fragment handling for IPVS. (3.8)
Add support in netfilter for querying the destination address of a redirected connection. (3.8)
Add NOTRACK target recovery to netfilter. (3.8)
Implement QFQ+ in sched. (3.8)
Add support for RTM_GETNETCONF to routing netlink. (3.8)
Add support for per-association statistics by implementing the SCTP_GET_ASSOC_STATS call for the Stream Control Transmission Protocol (SCTP). (3.8)
Add a sysctl that allows the selection of the HMAC algorithm (static or dynamic) used by SCTP. (3.8)
Add support for SO_ATTACH_FILTER required to save the full state of a socket. (3.8)
Convert tun/tap into a multiqueue device and expose the queues as file descriptors in user space. (3.8)
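The RST handling that RFC 5961 section 3.2 describes (listed above for 3.6) can be sketched as follows. This is an illustrative Python model of the decision only, not the kernel's implementation. The function names and the string results are hypothetical, and the sketch assumes that an exact match on the next expected sequence number is checked first so that a zero receive window still accepts a valid reset.

```python
"""Sketch of the RFC 5961 section 3.2 decision for an incoming RST segment."""

SEQ_MOD = 1 << 32  # TCP sequence numbers are 32 bits wide and wrap around


def in_window(seq: int, rcv_nxt: int, rcv_wnd: int) -> bool:
    """Return True if RCV.NXT <= seq < RCV.NXT + RCV.WND, modulo 2**32."""
    return (seq - rcv_nxt) % SEQ_MOD < rcv_wnd


def classify_rst(seq: int, rcv_nxt: int, rcv_wnd: int) -> str:
    """Decide what to do with an RST segment (hypothetical helper).

    "reset":         the sequence number matches RCV.NXT exactly.
    "challenge_ack": in the window but not an exact match, so answer
                     with an ACK instead of tearing down the connection.
    "discard":       outside the receive window, drop silently.
    """
    if seq % SEQ_MOD == rcv_nxt % SEQ_MOD:
        return "reset"
    if in_window(seq, rcv_nxt, rcv_wnd):
        return "challenge_ack"
    return "discard"


if __name__ == "__main__":
    for seq in (1000, 1200, 1500):
        print(seq, classify_rst(seq, rcv_nxt=1000, rcv_wnd=500))
```

A blind attacker who can only guess a sequence number somewhere inside the window therefore triggers a challenge ACK rather than a reset, which is the point of the mitigation.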
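For the bridge group_fwd_mask setting listed above (3.2), the value to write can be computed from the link-local group addresses that should be forwarded. The following Python sketch assumes that bit N of the mask corresponds to the address 01:80:C2:00:00:0N, which is how the upstream bridge code is understood to interpret it. The helper names and the bridge name br0 are hypothetical, and the kernel may reject some bits.

```python
"""Sketch: compute a bridge group_fwd_mask value from link-local MAC addresses."""

LINK_LOCAL_PREFIX = "01:80:c2:00:00:"


def group_fwd_bit(mac: str) -> int:
    """Return the mask bit for one 01:80:C2:00:00:0X address (assumed mapping)."""
    mac = mac.lower().replace("-", ":")
    if len(mac) != 17 or not mac.startswith(LINK_LOCAL_PREFIX):
        raise ValueError(f"not a bridge link-local group address: {mac}")
    last_octet = int(mac.split(":")[-1], 16)
    if last_octet > 0x0F:
        raise ValueError(f"last octet out of range 00 to 0f: {mac}")
    return 1 << last_octet


def group_fwd_mask(macs) -> int:
    """OR together the bits for every address that should be forwarded."""
    mask = 0
    for mac in macs:
        mask |= group_fwd_bit(mac)
    return mask


def group_fwd_mask_path(bridge: str) -> str:
    """Build the sysfs path for a bridge device name (hypothetical helper)."""
    return f"/sys/class/net/{bridge}/bridge/group_fwd_mask"


if __name__ == "__main__":
    mask = group_fwd_mask(["01:80:C2:00:00:0E"])
    print(f"echo {mask} > {group_fwd_mask_path('br0')}")
```

The sketch only prints the command rather than writing to sysfs, so it can be run without root privileges or an existing bridge.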
A.10 perf Utility
Add the --symfs option to perf annotate. (3.2)
Add the drop monitor script. (3.2)
Add the -o and --append options to perf stat. (3.2)
Add the -M option. (3.2)
Add annotation output controls to all perf tools that have integrated annotation. (3.2)
Include information about the host environment in perf.data:
- HEADER_HOSTNAME: Host name.
- HEADER_OSRELEASE: Kernel release number.
- HEADER_ARCH: Hardware architecture.
- HEADER_CPUDESC: Generic CPU description.
- HEADER_NRCPUS: Number of online, available CPUs.
- HEADER_CMDLINE: perf command line.
- HEADER_VERSION: perf version.
- HEADER_TOPOLOGY: CPU topology.
- HEADER_EVENT_DESC: Full event description (attrs).
- HEADER_CPUID: Easy-to-parse, low-level CPU identification.
(3.2)
Accept FIFOs as input files. (3.3)
Add -a option for system-wide profiling. (3.3)
Implement printing snapshots to files. (3.6)
Add sort by source line number. (3.6)
Add PMU event alias support. (3.6)
Add support for perf kvm stat to analyze kvm vmexit, mmio, and ioport. (3.7)
Add union member access. (3.7)
Add --list-opts option to print long option names for use with bash. (3.7)
Add script browser. (3.8)
Add new display options (-F, -p, and -P) to perf diff. (3.8)
perf inject now supports input from a file. (3.8)
Add --pre and --post options to perf stat. (3.8)
Add gtk.command config option to launch the GTK browser. This is equivalent to specifying the --gtk option on the command line. (3.8)
Add new features to perf trace. (3.8)
Expose hardware events translations in sysfs. (3.8)
Add trace_options boot parameter to set trace options at boot time, such as enabling event stack dumps. (3.8)
A.11 Power Management
Add a generic DVFS framework with device-specific (non-CPU) OPPs. (3.2)
Improve performance of LZO/plain hibernation. (3.2)
Implement per-device power management QoS constraints. (3.2)
A.12 Security
Add /sys/kernel/security/tomoyo/audit_interface, which generates audit logs in the form of domain policy so they can be reused and appended to the domain_policy interface by the TOMOYO auditing daemon (tomoyo-auditd). TOMOYO is a kernel security module which implements mandatory access control (MAC). (3.1)
Add ACL group support for TOMOYO, which allows permissions to be globally granted. (3.1)
Add policy namespace support for LXC (Linux containers). The policy namespace has its own set of domain policy, exception policy and profiles, independent of other namespaces. (3.1)
Add built-in policy support needed to support enforcing mode from early in the boot sequence. (3.1)
Make several TOMOYO options configurable to support activating access controls without calling an external policy loader program. (3.1)
Permit the use of the following properties as conditions with TOMOYO: argv[], envp[], execve(), executable's real path and symlink target, owner or group of file objects, and the UID or GID of the current thread. (3.1)
Implement Extended Verification Module (EVM), which protects a file's security extended attributes (xattrs) against integrity attacks. (3.2)
Implement Smack protections for domain transition: BPRM unsafe flags, secure exec, clear unsafe personality bits, and clear parent death signal. (3.2)
Enhance performance of Smack rule list lookups. (3.2)
Allow user access to /smack/access, removing the requirement for CAP_MAC_ADMIN. (3.2)
Add environment variable name restriction to TOMOYO. (3.2)
Add socket operation restriction to TOMOYO. (3.2)
Add control for generation of access granted logs in TOMOYO. (3.2)
Allow domain transition without execve() in TOMOYO. (3.2)
Allow audit matching on inode gid. (3.3)
Allow inter-field comparison in audit rules between the gid of a running task and the gid of an inode. (3.3)
Add a new audit filter type AUDIT_FIELD_COMPARE to indicate which fields should be compared. (3.3)
Allow system call exit filter matching based on the uid of the owner of an inode used in the call. (3.3)
Add support for digital signature verification in EVM. File metadata can be protected using digital signatures instead of HMAC. (3.3)
Add a Yama Linux security module to collect DAC security improvements. (3.4)
Add AppArmor security module file tracking to securityfs. (3.4)
Add AppArmor security module initial features directory to securityfs for displaying boolean feature flags and the known capability mask. (3.4)
Add default_type statements to SELinux. (3.5)
Add default source and target selectors for the user, role, and range of new objects in SELinux. (3.5)
Allow seek operations on the file that exposes the policy, which is used by the sesearch SELinux policy query tool. (3.5)
Add auditing of failed attempts to set invalid labels in SELinux. (3.5)
Add checking for the open permission on truncate calls to SELinux. (3.5)
Support long Smack labels. (3.5)
Set recursive transmute attribute for Smack in all cases. (3.5)
Allow manager programs which do not start with / in TOMOYO to handle differences between distributions. (3.5)
Add two modes to the Yama ptrace restrictions. (3.5)
Add support for invalidating a key. (3.5)
Implement revoking of all rules for a subject label in Smack. (3.7)
Allow Yama to be unconditionally stacked, regardless of which LSM module is primary. (3.7)
Add the Integrity Measurement Architecture, which supports audit log hashes, digital signature verification, and the integrity appraisal extension. (3.7)
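The Yama ptrace restriction listed above is controlled through a sysctl. As an illustration only, the following Python sketch reads the current setting and describes it. The path /proc/sys/kernel/yama/ptrace_scope and the meanings of the values 0 to 3 are taken from the upstream kernel's Yama documentation rather than from this appendix, and the helper names are hypothetical.

```python
"""Sketch: report the Yama ptrace scope on a system where Yama is enabled."""

from pathlib import Path

# Assumed location of the Yama sysctl (from upstream kernel documentation).
PTRACE_SCOPE_PATH = Path("/proc/sys/kernel/yama/ptrace_scope")

# Assumed meanings of the sysctl values (from upstream kernel documentation).
PTRACE_SCOPES = {
    0: "classic ptrace permissions",
    1: "restricted ptrace (descendants or declared tracers only)",
    2: "admin-only attach (CAP_SYS_PTRACE required)",
    3: "no attach (ptrace attach disabled)",
}


def describe_scope(value: int) -> str:
    """Translate a ptrace_scope value into a short description."""
    return PTRACE_SCOPES.get(value, f"unknown scope {value}")


def current_scope(path: Path = PTRACE_SCOPE_PATH):
    """Return the current scope as an int, or None if it cannot be read."""
    try:
        return int(path.read_text().strip())
    except (OSError, ValueError):
        return None


if __name__ == "__main__":
    scope = current_scope()
    if scope is None:
        print("Yama ptrace_scope is not available on this system")
    else:
        print(f"ptrace_scope = {scope}: {describe_scope(scope)}")
```

On a kernel without Yama the file is absent, so the sketch reports that the setting is unavailable instead of failing.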
A.13 Storage
Block management in the software RAID MD layer now adds bad blocks to a bad-block list so that the system does not use them. (3.1)
A.14 Virtualization
Add memory hotplug support for the Xen balloon driver. (3.1)
Add Xen PCI backend driver. (3.1)
Implement discard requests and support old-style BARRIER. (3.2)
Increase the recommended maximum number of VCPUs from 64 to 160. (3.4)
Allow host IRQ sharing for assigned PCI 2.3 devices. (3.4)
Add infrastructure for software and hardware-based TSC rate. (3.4)
Move the Hyper-V storage driver out of the staging area. (3.4)
Add support for VLAN trunking to Hyper-V. Linux guests can now configure multiple VLANs using a single synthetic NIC on a Windows 8 Hyper-V host. (3.4)
Support new KVP message types. (3.4)
Support new KVP verbs for Hyper-V in the user level daemon. (3.4)
Implement multiconsole support for Hyper-V. (3.4)
Support enumeration from all available pools for Hyper-V. (3.4)
Update Xen ACPI processor to implement C and P state driver that uploads ACPI data to the hypervisor. (3.4)
Add netconsole support to Xen. (3.4)
Use the S4 code to provide S3 support for virtio devices. (3.4)
Add a virtio-based remote processor messaging bus to allow message-based communication with the remote processor (if supported by the firmware). (3.4)
Add direct MSI message injection for in-kernel IRQ chips. (3.5)
Unregister from the hwrng interface and remove the virtio queue before entering the S3 or S4 states. On restore, add the virtio queue and re-register with hwrng. (3.6)
Add mcelog support to Xen. (3.6)
Reduce the I/O path in the guest kernel to achieve high IOPS and lower latency. (3.7)
Add Xen EFI video mode support. (3.7)
Implement backend support for paged out grant targets (retry loop and hooks). (3.7)
Implement Xen ACPI processor aggregator driver (pad). (3.8)
Remove support for i386 processors. (3.8)