Appendix A Other Changes

The following sections describe other features of Unbreakable Enterprise Kernel Release 3 (UEK R3). The mainline version in which a feature was introduced is noted in parentheses.

A.1 Architecture

  • vsyscall emulation and vsyscall parameter. (3.1)

  • INTEL_MID configuration. (3.1)

  • mrst_pmu driver for Intel Moorestown Power Management Unit. (3.1)

  • Hardware memory error recovery support for ACPI, APEI, and GHES. (3.1)

  • printk() support for recoverable error via NMI for ACPI, APEI, and GHES. (3.1)

A.2 Block Devices

  • Strict CPU affinity can be enabled by setting the value of /sys/block/blkdev/queue/rq_affinity to 2. Performance on some systems benefits from being directed to the strict requester CPU rather than using per-socket steering. (3.1)

  • CFQ I/O scheduler performance tuning adds think time check for a group, which makes bandwidth usage more efficient by not leaving queues active when there are no further requests for the group. (3.1)

  • Flakey target support in the device mapper adds the corrupt_bio_byte parameter to simulate corruption by overwriting a byte at a specified position with a specified value while the device is down. The drop_writes option parameter drops writes silently while the device is down. (3.1)

  • The device mapper supports MD RAID-1 personality through the dm-raid target. (3.1)

  • The device mapper supports the ability to parse and use metadata devices with dm-raid. Without the metadata devices, many RAID features would be unavailable. (3.1)

  • Experimental support for thin provisioning in the device mapper allows the creation of multiple thinly provisioned volumes from a storage pool and recursive snapshots to an arbitrary depth. (3.2)

  • I/O-less dirty throttling and reduced file-system writeback from page reclamation greatly reduces I/O seeks and CPU contention. (3.2)

  • The cfq_target_latency parameter under sysfs allows throughput and read latency to be tuned. (3.4)

  • The device mapper supports adding and removing space at the end of the devices when resizing RAID-10 arrays with near and offset layouts. (3.4)

  • Thin target in the device mapper supports discards. When non-discard I/O completes and the associated mappings are quiesced, any discards that were deferred (via ds_add_work() in process_discard()) are queued for processing by the worker thread. (3.4)

  • Thin target in the device mapper provides user-space access to pool metadata. Two new messages can be sent to the thin pool target allowing it to take a snapshot of the metadata. This read-only snapshot can be accessed from user space concurrently with the live target. (3.5)

  • Thin target in the device mapper uses dedicated slab caches (whose names are prefixed with dm_) rather than relying on kmalloc memory pools backed by generic slab caches. This allows independent accounting of memory usage and any associated memory leakage by thin provisioning. (3.5)

  • RAID-5 XOR checksumming is optimized by taking advantage of the 256-bit YMM registers introduced by Advanced Vector Extensions (AVX). (3.5)

  • RAID-6 includes Supplemental Streaming SIMD Extensions 3 (SSSE3) optimized recovery functions and a new algorithm for selecting the most appropriate function to use for recovery. (3.5)

  • MD allows a reshape operation to be reversed by implementing a new reshape_direction attribute that can be set when delta_disks is zero, and which can take one of the values forward or backwards. (3.5)

  • A RAID-10 array can be reshaped to a different near or offset layout, a different chunk size, and a different number of devices. The number of copies cannot be changed. (3.5)

  • An existing partition can be resized, even if currently in use, by using the operation code BLKPG_RESIZE_PARTITION with the BLKPG ioctl(). (3.6)

  • Add MD support for RAID10 (striped mirrors) and RAID1E (integrated adjacent stripe mirroring). (3.6)

  • Thin target in the device mapper adds read-only and fail-io modes to thin provisioning. If a transaction commit fails, a pool's metadata device transitions to read-only mode. If a commit fails when the device is in read-only mode, a transition to fail-io mode occurs. In fail-io mode, the pool and all associated thin devices report a status of fail if a commit fails. (3.6)

  • The persistent data debug space map checker has been removed from the device mapper. The feature consumed a lot of memory and caused other issues when enabled on large pools. (3.6)

  • RAID-1 in MD now prevents the merging of large requests to enhance the performance of SSD devices that function more efficiently with large request transfers. (3.6)

  • Support for the WRITE SAME request implemented on some SCSI devices to allow a block to be efficiently replicated throughout a block range. Only a single logical block need be transferred from the host. The storage device writes the same data to all blocks specified by the request. (3.7)

  • The BLKZEROOUT ioctl() can be used to zero out block ranges via blkdev_issue_zeroout(). (3.7)

  • Fastmap support provides a method for attaching an unsorted block image (UBI) device in real-time. Rather than scanning the entire device, Fastmap locates a checkpoint. (3.7)

  • MD adds TRIM discard support for linear, RAID-0, RAID-1, RAID-5, and RAID-10. (3.7)

  • DM adds rebuild capacity and replacement slot validation for RAID-10 arrays. (3.7)

  • RAID-6 recovery is optimized by taking advantage of the 256-bit YMM registers introduced by Advanced Vector Extensions 2 (AVX2). (3.8)
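
As an illustration of the rq_affinity setting described in this section, the following Python sketch writes a value to a block device's queue attribute. The helper name set_rq_affinity and its sysfs_root parameter are assumptions introduced here so that the function can be tried against a scratch directory. On a real system, writing under /sys requires root privileges.

```python
import os

def set_rq_affinity(device, value=2, sysfs_root="/sys"):
    """Write value to <sysfs_root>/block/<device>/queue/rq_affinity.

    Hypothetical helper, not part of any kernel or distribution tool.
    A value of 2 requests strict completion on the requesting CPU.
    """
    if value not in (0, 1, 2):
        raise ValueError("rq_affinity must be 0, 1, or 2")
    path = os.path.join(sysfs_root, "block", device, "queue", "rq_affinity")
    with open(path, "w") as attr:
        attr.write(f"{value}\n")
    return path
```

For example, set_rq_affinity("sda") would enable strict CPU affinity for a device named sda, assuming such a device exists.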

A.3 Core Kernel Functionality

  • Add a lock-less NULL-terminated single list. (3.1)

  • Add a library function implementing a crc8 algorithm to support the brcm80211 driver. (3.1)

  • Make the gen_pool memory allocator lockless. This change makes it safe to use the memory allocator in NMI handlers and other special unblockable contexts where deadlocks might occur. (3.1)

  • Implement the PTRACE_INTERRUPT, PTRACE_LISTEN, PTRACE_SEIZE, and TRAP_NOTIFY ptrace() requests. (3.1)

  • Add /sys/module/module_name/uevent files to all module entries to provide a method for managing built-in modules from user space. (3.1)

  • Add support for the implementation of SEEK_HOLE and SEEK_DATA in lseek(). (3.1)

  • Replace / with the ! escape character in hostname and comm strings that are used to name core dumps. (3.1)

  • If the value of the sysctl parameter shm_rmid_forced is set to 1, all shared memory objects are marked for removal with IPC_RMID. As this change breaks POSIX compliance, you need to ensure that no threads are using the orphaned memory. (3.1)

  • Add support for generic I/O power management domains (v8) by introducing common headers, helper functions, and callbacks to allow platforms to use simple, generic power domains for runtime power management. (3.1)

  • Add system-wide power transitions (system suspend and hibernation) support for generic domains (v5). Add suspend, resume, freeze, thaw, poweroff, and restore callbacks that are associated with struct generic_pm_domain objects and have pm_genpd_init() interpret them as appropriate. (3.1)

  • Add wakeup device support for system-sleep transitions. Introduce a new generic power management domain callback routine, .active_wakeup(). This routine is used during the noirq phase of system suspend and hibernation to decide how to handle wakeup devices. (3.1)

  • Add the ability to set a maximum limit for allowable CPU bandwidth to the process bandwidth controller. The limit is specified as a quota and a period for a group of processes. (3.2)

  • To reduce the performance impact from using i_mutex lock with generic_file_llseek(), an almost lockless generic_file_llseek() is added to VFS that allows the maximum file size of the file system to be passed in, instead of always using maxbytes from the superblock. (3.2)

  • A boot parameter of the form root=PARTUUID=uuid,PARTNROFF=partition_number_offset extends the root=PARTUUID=uuid syntax to select the root partition by specifying an integer offset from a known, unique partition. (3.2)

  • Add a fault reporting mechanism to the input/output memory management unit (IOMMU) API. (3.2)

  • Allow partition creation from user space and add discard support for loop devices. (3.2)

  • When performing AIO, allocate kiocb structures in batches to reduce the CPU overhead of a process taking and releasing the context lock. (3.2)

  • Add support for the tagged files ease-of-use feature in sysfs. (3.2)

  • Add a comm change event to the process connector. (3.2)

  • Add architecture-independent support for highmem page poisoning and verification to debug-pagealloc. (3.2)

  • Add support for poll() in sysctl so that user-space applications can be notified of changes to sysctl entries. (3.2)

  • The x32 kernel ABI (kABI) allows programs to take advantage of x86-64 features such as a larger number of CPU registers, better floating-point performance, faster position-independent code shared libraries, function parameters passed via registers, and faster system-call instructions. The kABI uses 32-bit pointers and avoids the overhead of 64-bit pointers. The program is limited to a 4-GB virtual address space. However, reducing the memory footprint can also allow a program to run faster. (3.4)

  • The nomodule kernel parameter can be used to disable module loading as an alternative to using sysctl. (3.4)

  • The prctl() PR_GET_CHILD_SUBREAPER and PR_SET_CHILD_SUBREAPER options implement simple process supervision of orphaned processes. (3.4)

  • Thread stacks are now marked correctly for /proc/pid/maps under procfs. (3.4)

  • Restore the sysctl setting kernel.pty.max as the global limit of pseudo terminals (by default, 4096). (3.4)

  • Add abilities to turn the reboot notifier on or off, and to enter the debugger and stop kernel execution before rebooting. (3.4)

  • To improve performance, VFS now uses unsigned long accesses for dcache name comparison and hashing. (3.4)

  • /proc/pid/task/tid/children entries provide information about task children and can be useful for process checkpoint and restore operations. (3.5)

  • /proc/pid/pagemap now reports whether a page is a file page or a shared anonymous page. (3.5)

  • The skew_tick boot option mitigates xtime_lock contention on larger systems or read-copy-update (RCU) lock contention on all systems when CONFIG_MAXSMP is set. This option increases power consumption and should only be enabled if the system runs jitter-sensitive workloads (typically, HPC or RT). (3.5)

  • Inode stat information is moved closer together to increase the likelihood of cache hits. (3.5)

  • The fallocate() file-system operation allows space to be preallocated for a file. (3.5)

  • Stale power-aware scheduling remnants and dysfunctional knobs have been removed from the process scheduler. (3.5)

  • The EPOLLWAKEUP flag prevents system suspension while epoll events are ready. (3.5)

  • ramoops uses the pstore interface instead of /dev/mem. (3.5)

  • Add ECC support to pstore/ram. (3.5)

  • make tools is now integrated with the kernel build system. (3.5)

  • The kernel parameter RCU_FANOUT_LEAF can be used to control leaf-level fanout for RCU locking to reduce cache-miss initialization latencies on large systems. (3.5)

  • RCU locking now implements a direct algorithmic sleepable RCU (SRCU) implementation to prevent OS jitter and performance degradation. (3.5)

  • Add rbtree node caching support to IPC mqueue for the case where the queue is empty, improve performance of send/recv, and update maximums for the mqueue subsystem. (3.5)

  • Add symbolic and hard link restrictions to VFS to address security issues. (3.6)

  • Improvements to the IOMMU group implementation. (3.6)

  • Remove the non-working x86 power estimation feature from the process scheduler. (3.6)

  • Add hysteresis attributes (used by most thermal sensors) on a per-trip-point basis to the thermal framework. (3.6)

  • Add support for states that affect multiple CPUs. This is potentially useful in implementations where CPUs leverage a shared, coupled power state. (3.6)

  • The rcutree.rcu_fanout_leaf boot parameter allows the value of RCU_FANOUT_LEAF to be increased but not decreased. (3.6)

  • Firmware files can be loaded directly from the file system rather than from udev. (3.7)

  • xattr support in cgroups allows run-time metadata to be attached to cgroups. (3.7)

  • The disable_nmi command in kdb disables NMI-entry and releases the port. (3.7)

  • Add a special serial console driver to allow the temporary use of an NMI debugger port as a normal console via the nmi_console command. (3.7)

  • RCU locking changes:

    • Control grace period duration from sysfs.

    • Make rcutree module parameters visible in sysfs.

    • Allow an RCU lock to be placed in an extended quiescent state when the CPU runs in user space.

    (3.7)

  • Add system call to enforce that kernel modules are loaded only from a read-only cryptographically verified root file system. (3.8)

  • Applications can choose between using 1-GB and 2-MB huge pages. Typically, this feature is used in conjunction with a NUMA policy. (3.8)

  • Add option to allow assignment of a memory node as movable memory, which allows an entire node to be hot-pluggable. (3.8)

  • Add sysctl variables to tune checkpoint/restart in user space (CRIU) including specifying the ID of the next IPC object to be allocated. (3.8)

  • Introduce CRIU message queue copy feature so that all pending IPC messages can be retrieved without deleting them from the queue. (3.8)

  • Correct the implementation of hierarchy support for the freezer cgroup. If a cgroup is frozen, all its descendants are also frozen. (3.8)

  • Implement the PTRACE_O_EXITKILL ptrace() request. (3.8)

  • Add the VmFlags field to /proc/PID/smaps output. Required by CRIU. (3.8)

  • Add TIOCGPKT, TIOCGPTLCK, and TIOCGEXCL ioctl() calls to obtain the packet mode and locking state of a pseudo terminal, and to obtain exclusive mode on a tty. (3.8)

  • Add a module parameter to force the use of expedited RCU primitives, which can benefit some embedded applications. (3.8)

  • Allow selected CPUs to have RCU callbacks offloaded to kthreads to prevent or minimize OS jitter. (3.8)

  • Provide support in sysfs to determine the maximum number of virtual functions (VFs) and Single Root I/O Virtualization (SR-IOV) capable PCIe devices that are supported, and the methods that are available for enabling and disabling VFs on a per-device basis. (3.8)

  • Add a sysfs node to present the available frequencies for power management. (3.8)

  • Add the PM_QOS_FLAG_NO_POWER_OFF and PM_QOS_FLAG_REMOTE_WAKEUP power management QoS device flags. (3.8)

  • Add a sysfs node to present frequency transition information for power management. (3.8)
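
The SEEK_HOLE and SEEK_DATA support in lseek() listed in this section can be exercised from user space. The following Python sketch walks a file and reports the byte ranges that contain data. The function name data_segments is an assumption made for illustration. On file systems that do not track holes, the whole file is typically reported as a single data range.

```python
import errno
import os

def data_segments(path):
    """Return a list of (start, end) byte ranges that contain data."""
    segments = []
    fd = os.open(path, os.O_RDONLY)
    try:
        size = os.fstat(fd).st_size
        offset = 0
        while offset < size:
            try:
                # Find the next offset at or after 'offset' that holds data.
                start = os.lseek(fd, offset, os.SEEK_DATA)
            except OSError as err:
                if err.errno == errno.ENXIO:  # only a hole remains
                    break
                raise
            # Find the end of this data run (the next hole, or end of file).
            end = os.lseek(fd, start, os.SEEK_HOLE)
            if end <= start:  # defensive guard against a stalled scan
                break
            segments.append((start, end))
            offset = end
    finally:
        os.close(fd)
    return segments
```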

A.4 Cryptography

  • Ablkcipher now supports encryption and decryption for AES, DES, and 3DES. (3.1)

  • Add an eCryptfs mount option to check that the UID of the device being mounted is the same as the expected UID. (3.1)

  • The encrypted key type has been extended with the introduction of the ecryptfs format, intended for use with the eCryptfs file system. The ecryptfs format stores an authentication token structure inside an encrypted key payload, containing a randomly generated symmetric key. (3.1)

  • A new user-space configuration API enables the instantiation, removal, and display of cryptographic algorithms from user space. (3.2)

  • An x86-64 implementation of Blowfish provides two sets of assembler functions:

    • Regular one-block-at-a-time (1-way) encryption and decryption functions

    • Four-blocks-at-a-time (4-way) functions that provide improved performance on out-of-order CPUs

    On in-order CPUs, the performance of 4-way functions should be equal to that of 1-way functions. (3.2)

  • An x86-64 assembler implementation of the SHA1 algorithm uses Supplemental Streaming SIMD Extensions 3 (SSSE3) instructions or Advanced Vector Extensions (AVX) if available. Testing with the tcrypt module demonstrates that raw hash performance is up to 2.3 times faster than the C implementation. (3.2)

  • A 3-way parallel x86-64 assembler implementation of Twofish encrypts data in three-block chunks, which improves cipher performance on out-of-order CPUs. (3.2)

  • Add support for MD5 algorithms to CAAM. (3.3)

  • RSA digital-signature verification is implemented using the multiprecision math library from GnuPG, and is used by the IMA/EVM digital signature extension. (3.3)

  • A 4-way parallel i586/SSE2 assembler implementation of Serpent encrypts data in 4-block chunks. (3.3)

  • An 8-way parallel x86-64/SSE2 assembler implementation of Serpent encrypts data in 8-block chunks (two 4-block chunk SSE2 operations are performed in parallel to improve performance on out-of-order CPUs). (3.3)

  • LRW and XTS support added to Serpent-sse2. (3.3)

  • HMAC algorithms added to Talitos. (3.3)

  • XTS support added to twofish-x86_64-3way. (3.3)

  • Add sha224 and sha384 variants to existing AEAD algorithms in CAAM. (3.4)

  • Add x86-64 assembler implementation of the Camellia block cipher. Two sets of functions are provided:

    • Regular one-block-at-a-time (1-way) encryption and decryption functions

    • Two-blocks-at-a-time (2-way) functions that provide improved performance on out-of-order CPUs

    On in-order CPUs, the performance of 2-way functions should be equal to that of 1-way functions. (3.4)

  • Add Tegra AES hardware driver supporting ecb, cbc, ofb, and ansi_x9.31rng modes, and 128, 192 and 256-bit key sizes. (3.4)

  • Add a slice-by-8 algorithm to the existing slice-by-4 algorithm in crc32. The BITS size is expanded from 32 to 64, tables are extended from tab[4][256] to tab[8][256], and inner-loop code is added. (3.4)

  • Improve performance of aesni_intel by using parallel LRW and XTS encryption with AES-NI hardware pipelines. (3.7)

  • Add IPSec extended sequence number (ESN) support to CAAM and Talitos. (3.7)

  • An x86-64/AVX assembler implementation of the Cast5 block cipher allows 16 blocks to be processed in parallel. (3.7)

  • Implement signature verification algorithms for RSA public key cryptography. At present, only the signature verification algorithm is supported (PKCS#1, RFC3447). (3.7)

  • Add a crypto key parser for binary (DER) X.509 certificates, an ASN.1 decoder, and a simple ASN.1 grammar compiler. (3.7)

  • Add HASH-HMAC with SHA algorithms and MD5 to CAAM. (3.6)

  • Add hardware random number generator support to CAAM. (3.6)

  • Add an x86-64/AVX assembler implementation of the Serpent block cipher. (3.6)

  • Add an x86-64/AVX assembler implementation of the Twofish block cipher. (3.6)

  • Add sha224, sha384, and sha512 to the existing AEAD algorithms in Talitos so that it supports all combinations of CBC (AES, 3DES-EDE) and HMAC (SHA-1, 224, 256, 384, and 512). (3.6)
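
The slice-by-8 change to crc32 noted in this section extends the lookup tables from tab[4][256] to tab[8][256] so that the inner loop can consume eight input bytes per iteration. The following Python sketch shows the general technique for the reflected CRC-32 polynomial. It is a user-space illustration with names chosen here (build_tables, crc32_slice8), not a transcription of the kernel code.

```python
import zlib

POLY = 0xEDB88320  # reflected CRC-32 polynomial

def build_tables(slices=8):
    """Build tab[slices][256]; tab[0] is the classic byte-at-a-time table."""
    tab = [[0] * 256 for _ in range(slices)]
    for i in range(256):
        crc = i
        for _ in range(8):
            crc = (crc >> 1) ^ POLY if crc & 1 else crc >> 1
        tab[0][i] = crc
    for i in range(256):
        for k in range(1, slices):
            prev = tab[k - 1][i]
            tab[k][i] = (prev >> 8) ^ tab[0][prev & 0xFF]
    return tab

TAB = build_tables()

def crc32_slice8(data):
    """Compute CRC-32 of a bytes object, eight bytes per loop iteration."""
    crc = 0xFFFFFFFF
    pos = 0
    while len(data) - pos >= 8:
        one = int.from_bytes(data[pos:pos + 4], "little") ^ crc
        two = int.from_bytes(data[pos + 4:pos + 8], "little")
        crc = (TAB[7][one & 0xFF] ^ TAB[6][(one >> 8) & 0xFF] ^
               TAB[5][(one >> 16) & 0xFF] ^ TAB[4][one >> 24] ^
               TAB[3][two & 0xFF] ^ TAB[2][(two >> 8) & 0xFF] ^
               TAB[1][(two >> 16) & 0xFF] ^ TAB[0][two >> 24])
        pos += 8
    # Finish any remaining bytes one at a time.
    for byte in data[pos:]:
        crc = (crc >> 8) ^ TAB[0][(crc ^ byte) & 0xFF]
    return crc ^ 0xFFFFFFFF
```

The result can be compared against zlib.crc32() for the same input, which uses the same polynomial.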

A.5 Device Mapper

  • The always writable feature indicates that a target does not support read-only mode. (3.2)

  • The immutable feature indicates that a target type cannot be mixed with any other target type. Once loaded into a device, it cannot be replaced with a table that contains a different type. (3.2)

  • Add a singleton table that can contain only one target. (3.2)

  • Log device dependency allows registration of a log device so that it is included in the list of device dependencies. (3.2)

  • A verity target allows a device to store cryptographic hashes of file system blocks. The device can be used to check every read of the file system. If the hash computed for a block does not match the stored hash for that block, the read fails. (3.4)
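
The read-time check performed by a verity target can be pictured with the following Python sketch. It is deliberately simplified: it keeps a flat list of SHA-256 hashes, one per block, whereas the real target organizes hashes in a tree. The block size, function names, and choice of SHA-256 are assumptions made for illustration.

```python
import hashlib

BLOCK_SIZE = 4096  # illustrative block size

def build_hashes(image):
    """Return one SHA-256 digest per BLOCK_SIZE chunk of image."""
    return [hashlib.sha256(image[off:off + BLOCK_SIZE]).digest()
            for off in range(0, len(image), BLOCK_SIZE)]

def verified_read(image, hashes, block):
    """Return the requested block, or fail if its hash does not match."""
    data = image[block * BLOCK_SIZE:(block + 1) * BLOCK_SIZE]
    if hashlib.sha256(data).digest() != hashes[block]:
        raise IOError(f"block {block}: hash mismatch")
    return data
```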

A.6 Driver Support

  • Broadcom NetXtreme II 10Gbps network adapter driver (bnx2x): Add AutoGrEEEn support for BCM84833 and 5418se, and multiple concurrent L2 traffic classes. (3.1)

  • Broadcom NetXtreme II iSCSI driver (bnx2i): Add support for 57800, 57810, and 57840. (3.1)

  • Brocade BFA FC SCSI driver (bfa):

    • FAA support

    • HBA diagnostic support

    • CEE information and statistics query

    • Flash configuration

    • Collect and reset fcport statistics

    • Configure LUN masking

    • Configure QoS and collect statistics

    • Support for obtaining SFP information

    • Support for FC-transport based Asynchronous Event Notification

    • Support for I/O profiling

    • Collect or reset fabric statistics

    • Configure and query flash boot partition

    • Configure trunking on Brocade adapter ports

    • Store driver configuration in flash memory

    • Brocade-1860 Fabric Adapter 16Gbps support and flash controller fixes

    • Brocade-1860 Fabric Adapter Hardware enablement

    • Brocade-1860 Fabric Adapter vHBA support

    • Initiator-based LUN masking

    (3.1)

  • Emulex Blade Engine 2 10Gbps adapter driver (be2net): Add support for multiple Tx queues. (3.1)

  • Emulex FC/FCoE driver (lpfc): Add FCF priority failover functionality. (3.1)

  • Intel PRO/1000 PCI-Express Gigabit network adapter driver (e1000e): Add Jumbo Frame support for the 82583 Gigabit Ethernet Controller. (3.1)

  • QLogic 1/10 GbE Converged/Intelligent Ethernet Adapter driver (qlcnic): Add multi-protocol internal loopback support. Driver can now generate loopback traffic, conduct tests, and return the results to an application. (3.1)

  • coretemp: Add core and package threshold support. The thresholds are configured using the tempX_max and tempX_max_hyst interfaces in sysfs. An interrupt is generated if the CPU temperature reaches or crosses above tempX_max or if it drops below tempX_max_hyst. To allow the hysteresis mechanism to work, the value of tempX_max should be configured to be several degrees higher than the value of tempX_max_hyst. (3.1)
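
The coretemp threshold behavior described in this section can be modeled in a few lines. The following Python sketch is a simplified simulation, not driver code: given a sequence of temperature samples, it reports where the temperature reaches or crosses above the upper threshold and where it drops below the hysteresis threshold. The function name and the event labels are assumptions made for illustration.

```python
def threshold_events(samples, temp_max, temp_max_hyst):
    """Return (index, label) pairs for threshold crossings in samples."""
    if temp_max <= temp_max_hyst:
        raise ValueError("temp_max should be higher than temp_max_hyst")
    events = []
    previous = None
    for index, temp in enumerate(samples):
        if previous is not None:
            if previous < temp_max <= temp:
                events.append((index, "max"))    # reached or crossed above max
            elif previous >= temp_max_hyst > temp:
                events.append((index, "hyst"))   # dropped below hysteresis
        previous = temp
    return events
```

Keeping temp_max several degrees above temp_max_hyst, as the text recommends, prevents a temperature that hovers near a single threshold from generating a stream of events.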

A.7 File Systems

btrfs

  • Add a DCACHE_NEED_LOOKUP flag to d_flags to improve the performance of ls and readdir(). (3.1)

  • Switching from tree locks to reader/writer locks improves the performance of read and write-intensive workloads. (3.1)

  • Performance improvements in several areas, particularly for random write workloads. (3.2)

  • Allow overcommit of ENOSPC reservations to improve performance. (3.2)

  • Add automatic backup of superblock information about tree roots for the previous 4 commits. Add the -o recovery mount option to enable use of the root history log if required. (3.2)

  • Add code to follow back references, replacing the manual process for walking those references, and including more detailed corruption messages. (3.2)

  • Allow user-space utilities to inspect metadata. (3.2)

  • Improve performance of checksum verification of read-aheads. (3.2)

  • Add the nospace_cache mount option to disable cache loading without clearing the cache. (3.2)

  • Improve performance of committing transactions. (3.2)

  • When mounting a subvolume, allow a path relative to the tree root to be specified to -o subvol. (3.2)

  • Rework the logic for cluster allocation. (3.3)

  • Rewrite the block group trimming code. (3.3)

  • Increase the size of system chunks. (3.3)

  • Remove caching code that caused unnecessary fragmentation and complexity. (3.4)

  • Remove the code that silently switched single chunks to RAID-0 when balancing a file system. The restriper now allows a choice of RAID-0 or concatenation. (3.4)

  • Support metadata blocks that are larger than 4 KB. (3.5)

  • The thread_pool size can be changed at remount time. (3.5)

  • Add the DEVICE_READY ioctl() to be used in conjunction with btrfs device ready device, providing a lightweight method of telling if all the devices required for a file system are currently in the cache. (3.6)

  • Allow compression to be disabled by specifying the compress=no mount option. (3.6)

  • Improve multithread buffer reads. (3.6)

  • Support UUIDs for subvolumes, and introduce ctime, otime, stime, and rtime for subvolumes, including a transid for each time. (3.6)

  • Rework the DEV_STATS ioctl() to allow it to either get or reset device statistics depending on the argument specified. (3.6)

  • Make the compress and nodatacow mount options mutually exclusive. To improve O_SYNC performance, asynchronous metadata checksumming is not performed under some circumstances. (3.7)

For more information, see https://btrfs.wiki.kernel.org/index.php/Changelog.

cifs

  • Add UID/GID to SID mapping. (3.2)

  • Add backup mount option. (3.2)

  • Allow larger rsize (up to 16 MB) and change the default to 1 MB. (3.2)

  • Introduce credit-based flow control. (3.4)

  • Add the cache=strict|none mount option to specify the cache type instead of the strictcache and forcedirectio options. The legacy options are now mutually exclusive. (3.5)

  • The vers=2.1 mount option forces an SMB2 mount. By default, vers=1 (CIFS) is used. (3.5)

  • The vers=2.0 mount option forces an SMB2.02 mount. (3.8)

ext4

  • Reduce CPU overhead when appending files preallocated using fallocate() with mode FALLOC_FL_KEEP_SIZE via direct I/O. (3.2)

  • Reduce CPU overhead by optimizing memmove() lengths in extent and index insertions. (3.2)

  • Support block sizes of up to 1 MB using the -C option to mkfs.ext4. This change is not backwards compatible with older kernels. (3.2)

  • Remove the resize and journal=update mount options. (3.4)

  • Improve performance of truncate and unlink. (3.7)

  • Support online resizing of metablock group (META_BG) and 64-bit file systems. (3.7)

  • Add max_dir_size_kb mount option to specify a maximum directory size. (3.7)

  • Re-enable -o discard functionality in no-journal mode. (3.7)

  • Remove support for disabling extended attributes. (3.8)

  • Implement support for SEEK_DATA and SEEK_HOLE. (3.8)

NFS

  • Add support for the RAID-5 read-4-write interface. (3.2)

  • Add v4.0 and v4.1 mount options. (3.4)

  • The kernel can deduce the value of clientaddr if this mount option is not specified for NFS v4. (3.4)

  • Add the migration mount option that specifies whether a server supports Transparent State Migration (TSM). (3.7)

  • Handle IPv6 remote addresses from GETDEVICEINFO (required for pNFS). (3.8)

  • Remove the deprecated nfsctl() system call and all related code. (3.8)

pstore

  • Add runtime logging support for kernel messages to allow debugging of hangs caused by hardware issues. (3.6)

  • Add console message handling. The log size is configurable by using the ramoops.console_size module option, and the log is accessible at pstore-mountpoint/console-ramoops. (3.6)

  • Add persistent function tracing. The kernel can save the function call chain log to a persistent RAM buffer, which can be decoded and dumped after a reboot. You can use the log to determine the function that was called immediately prior to a reset or panic. (3.6)

tmpfs

  • Increase the file size limit for tmpfs. (3.1)

  • Support fallocate() FALLOC_FL_PUNCH_HOLE and preallocation. (3.5)

XFS

  • Improve performance of the inode cache. (3.1)

  • Improve scalability of per-file-system quotas. (3.4)

  • Implement support for SEEK_DATA and SEEK_HOLE. (3.5)

  • Make the inode32 and inode64 mount options work with remounts. (3.7)

  • Make inode64 the default allocation mode. (3.7)

  • Add the XFS_IOC_FREE_EOFBLOCKS ioctl() to enable EOFBLOCKS scanning. (3.8)

A.8 Memory Management

  • Add the memory.vmscan_stat file to the memory control group. The file displays the numbers of scanned, rotated, and freed pages, and the elapsed times for direct reclaim and soft reclaim. (3.1)

  • Extend the memory hotplug API to allow memory hotplug in virtual machines. Also required for the Xen balloon driver. (3.1)

  • Fix significant stalls in the page allocator when copying large amounts of data on NUMA machines. (3.1)

  • Add slub_debug method to the slub slab allocator to check if memory is not freed and help diagnose memory usage. (3.1)

  • Reduce CPU overhead of slub_debug. (3.1)

  • The cross memory attach feature adds the system calls process_vm_readv() and process_vm_writev(), which allow data to be transferred between the address spaces of two processes without passing through kernel space. (3.2)

  • Add a block plug for page reclaim to vmscan that reduces CPU overhead by reducing lock contention and merging requests. (3.2)

  • Implement per-CPU cache in slub for partial pages. (3.2)

  • Restrict access to slab files under procfs and sysfs, hiding slabinfo and /sys/kernel/slab/*. (3.2)

  • Add the slab_max_order kernel parameter that determines the maximum allowed order for slabs. High settings can cause OOMs due to memory fragmentation. The default value is 1 for systems with more than 32 MB of RAM. Otherwise, the default value is 0. (3.3)

  • To increase the probability of detecting memory corruption, change the buddy allocator to retain more free, protected pages and to interlace free, protected pages and allocated pages. (3.3)

  • Charge the pages dirtied by an exited process to random dirtying tasks. (3.3)

  • Allow the poll time and call intervals to balance dirty pages to be controlled by the value of the max_pause parameter. (3.3)

  • Fix dirtied pages accounting on sub-page writes. (3.3)

  • Introduce the dirty rate limit to compensate a task's think time when computing the final pause time. (3.3)

  • Reduce dirty throttling polls and CPU overhead. (3.3)

  • Avoid tiny dirty poll intervals. (3.3)

  • Make swap-in read-ahead skip over holes, allowing the system to swap back in at several MB/s, instead of a few hundred kB/s. (3.4)

  • Introduce bit-optimized iterator and radix tree cleanup in the core page cache. (3.4)

  • Improve allocation of contiguous memory chunks by adding DMA mapping helper functions. (3.5)

  • Remove swap token code and lumpy reclaim. (3.5)

  • Improve throughput and reduce CPU overhead by allowing swap read-ahead to be merged. (3.6)

  • Add cgroup controller that allows HugeTLB usage per control group to be limited and enforces the limit during page faults. (3.6)

A.9 Networking

  • Add CPU fanout policies for hashing to the packet interface based on mapping socket buffers to Rx hashes, and a pure round-robin scheme. (3.1)

  • Improve the client announcement mechanism in the Better Approach To Mobile Adhoc Networking (B.A.T.M.A.N.) routing protocol. The change resolves performance and latency issues with the previous implementation by appending client changes (new client joined or client left) to the OGM. System overhead is reduced by allowing nodes to modify their global tables by means of updates. The new ROAMING_ADVERTISEMENT packet type eliminates latency and packet drop issues seen with OGM broadcasting. (3.1)

  • Add support for zero-copy socket buffers. Adds user-space buffer support in the socket buffer shared information. (3.1)

  • Use MD5 to compute protocol sequence numbers and fragment IDs per RFC1948. Update code to take into account current CPU speeds and to use a full 32-bit sequence number. (3.1)

  • Add a multicast group for DCB to provide a clean method for disseminating kernel DCB link attributes to user space. (3.1)

  • Add SELinux context support to the AUDIT target of netfilter. (3.1)

  • Add range support for IPv4 to netfilter. (3.1)

  • Lower the default init retransmission timeout (RTO) from 3 seconds to 1 second per RFC2988bis. The RTO falls back to 3 seconds if a SYN or SYN-ACK packet has been retransmitted and the TCP time stamp option is not on. (3.1)

  • Implement support for Auto-ASCONF (see RFC5061) in the Stream Control Transmission Protocol (SCTP) stack. The change includes features for enabling and configuring settings. (3.1)

  • Reduce the false sharing effect. (3.1)

  • Reduce CPU overhead of check_leaf() with the route cache disabled. (3.1)

  • Add support to the virtio_net driver to obtain Rx and Tx ring parameter information from an Ethernet device. Used by the ethtool -g ethX command. (3.2)

  • Implement AP isolation on the receiver and sender side for B.A.T.M.A.N. When a node receives a unicast packet, it checks whether the source and destination client can communicate due to the AP isolation. (3.2)

  • Remove the IPv4 gc_interval from sysctl. (3.2)

  • Add TPACKET_V3 support including a flexible buffer implementation. (3.2)

  • Allow forwarding of some link-local frames by network bridges. You can use /sys/class/net/brX/bridge/group_fwd_mask in sysfs to control frame forwarding. (3.2)

  • Implement TCP proportional rate reduction. (3.2)

  • Add netlink-based Controller Area Network (CAN) routing. (3.2)

  • Add support for the socket monitoring interface used by the ss tool. (3.3)

  • Add support for the SCSI RDMA Protocol (SRP) target driver. The SRP protocol allows an initiator to access a block storage device on another host (target) over a network that supports the RDMA protocol. Currently, the RDMA protocol is supported by InfiniBand. (3.3)

  • Add unresolved queue limits to neigh. Deprecate /proc/sys/net/ipv4/neigh/default/unres_qlen, and replace it with unres_qlen_bytes. (3.3)

  • Add CAIF USB support. (3.3)

  • Add an extended accounting infrastructure for netfilter over nfnetlink, which allows the display of real-time traffic accounting without requiring a complicated and resource-consuming implementation in user space. (3.3)

  • Add nfacct match to netfilter, which supports extended accounting. (3.3)

  • Add reverse path filter (rpfilter) to netfilter, which allows matching of packets where replies use the same interface on which the packet arrived. (3.3)

  • Add adaptive random early detection (RED) active queue management (AQM) to the packet scheduler. (3.3)

  • Add an optional RED on top of stochastic fairness queueing (SFQ) to the packet scheduler, enabling SFQ features such as specifying a smaller per flow limit for in-flight packets, up to 65408 active flows (as compared to 127 previously), head drops instead of tail drops, and optional RED on each SFQ flow queue. (3.3)

  • Add 802.1q netpoll support to vlan. (3.3)

  • Add NTF_USE bridge support plus other changes to allow control of the forwarding database via netlink. (3.3)

  • New plug-queuing discipline allows a user space application to plug or unplug a network output queue via the Netlink interface. (3.4)

  • Add the ability to change the routing algorithm at runtime to B.A.T.M.A.N. (3.4)

  • RCU conversion in TCP allows access to MD5 keys without locking the listener socket. (3.4)

  • For some workloads, allowing splice() to build full TSO packets can reduce the number of logical packets sent by an order of magnitude, making zero-copy TCP faster than one-copy. (3.4)

  • Add the SO_PEEK_OFF socket option. (3.4)

  • Support peeking offset for datagram sockets, seqpacket sockets, and stream sockets. (3.4)

  • Add MSG_TRUNC support for datagram sockets so that recv() returns the real length of the packet, even if it is longer than the passed buffer. (3.4)

  • Add missing SO_NOFCS socket option. (3.4)

  • Add timeout extension to netfilter, which allows timeout policies to be attached to the flow via the connection tracking target. Add the cttimeout infrastructure for fine timeout tuning. (3.4)

  • Add NAT support for expectation classes in netfilter. (3.4)

  • Add exceptions support to netfilter. (3.4)

  • Merge ipt_LOG and ip6t_LOG into xt_LOG in netfilter. (3.4)

  • Add hardware-independent IEEE 802.15.4 networking stack for softMAC devices. (3.5)

  • Tune performance of sk_add_backlog. (3.5)

  • Add binary option type, a load-balancer module, a per-port option for enabling or disabling ports, and support for per-port options to the team device. (3.5)

  • Add raw packet QP type IB_QPT_RAW_PACKET to InfiniBand core. This allows applications to build a complete packet, including L2 headers, when sending. On the receive side, the hardware does not strip any headers. This feature is designed for user-space direct access to Ethernet. (3.5)

  • Treat ND option 31 as user land (DNSSL support) in IPv6 per RFC6106. The 8-bit identifier of the DNSSL option type assigned by the IANA has the value 31. (3.5)

  • Replace basic bridge loop avoidance code in the batman-adv module. (3.5)

  • Set traffic class for CAIF packets based on socket priority, CAIF protocol type, or type of message. (3.5)

  • Add generic PF_BRIDGE:RTM_FDB hooks and two new flags: NTF_MASTER and NTF_SELF. (3.5)

  • Add Explicit Congestion Notification (ECN) capability to pktsched. Instead of dropping packets, attempt to mark them as ECN. (3.5)

  • Remove support for token ring. (3.5)

  • Remove support for Econet protocol. (3.5)

  • Add an optional QoS attribute to DCB netlink to allow the setting of a rate limit for an ETS TC. (3.5)

  • Add CEE notify calls when an APP change or setall command is made from user space. (3.5)

  • Add HMARK target support to netfilter. (3.5)

  • If net.bridge.bridge-nf-filter-vlan-tagged is enabled in sysctl, bridge netfilter removes the vlan header temporarily and feeds the packet to iptables or ip6tables. Add bridge-nf-pass-vlan-input-device, which, if set to on (the default is off), causes netfilter to also set the in interface to the vlan interface if this interface exists. This change allows the iptables REDIRECT target to work with vlan-on-top-of-bridge configurations and allows the use of iptables -i to match the vlan device name. (3.5)

  • Add a byte-based limit mode to netfilter, which can be used, for example, to support ingress-traffic policing or to detect when a host or port consumes more bandwidth than expected. (3.5)

  • Add support for sync threads to netfilter. (3.5)

  • Remove ip_queue support from netfilter. (3.5)

  • Add support for Layer 2 Tunneling Protocol (L2TP) over UDP in IPv6. (3.5)

  • Add L2TPv3 IP encapsulation support for IPv6. (3.5)

  • Add netlink API for L2TPv3 unmanaged tunnels over IPv6. (3.5)

  • Remove IPv4 routing cache that was vulnerable to denial of service attacks. (3.6)

  • Implement RFC 5961 3.2 and RFC 5961 4.2 (mitigation against blind reset attacks that use the RST bit and the SYN bit). (3.6)

  • Add VTI support. (3.6)

  • Add an interface option route_localnet that enables the routing of the 127/8 address block and processing of ARP requests on a specific interface (for example, to address a pool of virtual guests behind a load balancer). (3.6)

  • Add multiqueue and netpoll support to team. (3.6)

  • Add experimental zero-copy Tx support to tun. (3.6)

  • Add support for 40GbE. (3.6)

  • Add fail-open support to netfilter, where the queue-full condition does not drop packets. (3.6)

  • Add user-space connection tracking helper infrastructure to netfilter. (3.6)

  • Extend the ethtool interface to add support for the EEE commands get_eee and set_eee. (3.6)

  • Add Generic Routing Encapsulation (GRE) over IPv6, generic segmentation offload (GSO), and GRO capability. (3.7)

  • Set default MTU for loopback devices to 64 KB. Allows TCP stacks to build large frames and significantly reduces stack overhead. (3.7)

  • Add an extended attribute to store data for the mapping between inode numbers in sockfs and protocol types for use by lsof. (3.7)

  • Implement a per-task fragmentation allocator, which can improve TCP stream performance by 20% on loopback devices. (3.7)

  • Various netfilter changes:

    • Add a protocol-independent NAT core.

    • Add IPv6 MASQUERADE target.

    • Add IPv6 NETMAP target.

    • Add IPv6 REDIRECT target.

    • Add IPv6 NAT support.

    • Support IPv6 FTP NAT helper.

    • Support IPv6 IRC NAT helper.

    • Support IPv6 SIP NAT helper.

    • Support IPv6 in the amanda NAT helper.

    • Add stateless IPv6-to-IPv6 Network Prefix Translation target.

    • Remove xt_NOTRACK.

    (3.7)

  • Add link layer control (LLC) core layer to HCI 2, add an SHDLC llc module to the llc core, and add LLCP raw socket support to NFC. (3.7)

  • Support IPv6 transmit hashing (and TCP or UDP over IPv6) in the bonding driver. (3.7)

  • Add support for dumping diagnostic core and basic socket information (family, type and protocol) at socket creation time. (3.7)

  • Add support to ethtool for setting the MDI/MDI-x state for twisted-pair wiring. (3.7)

  • Add 64-bit statistics support to PPP, including tx_bytes, rx_bytes, tx_packets, and rx_packets. (3.7)

  • Add generic netlink support for tcp_metrics that allows unlinking and deletion of entries after a grace period. (3.7)

  • Add bridge port parameters over netlink to permit dumping, monitoring, and changing the bridge multicast database. (3.8)

  • Add support for RFC 5961 5.2 Blind Data Injection Attack Mitigation. (3.8)

  • Change default TCP hash size, and add support for hardware-offloaded encapsulation and offloading of encapsulated packets for VXLAN and IP GRE. (3.8)

  • Add vlan tag access to netfilter. (3.8)

  • Add extensions to VXLAN to support Distributed Overlay Virtual Ethernet (DOVE) networks. (3.8)

  • Add IPv6 set action functionality to openvswitch. (3.8)

  • Add GSO support to IPIP tunnels, increasing the performance of a single TCP flow. (3.8)

  • Implement IPv6 fragment handling for IPVS. (3.8)

  • Add support in netfilter for querying the destination address of a redirected connection. (3.8)

  • Add NOTRACK target recovery to netfilter. (3.8)

  • Implement QFQ+ in sched. (3.8)

  • Add support for RTM_GETNETCONF to routing netlink. (3.8)

  • Add support for per-association statistics by implementing the SCTP_GET_ASSOC_STATS call for the Stream Control Transmission Protocol (SCTP). (3.8)

  • Add a sysctl that allows the selection of the HMAC algorithm (static or dynamic) used by SCTP. (3.8)

  • Add support for SO_ATTACH_FILTER required to save the full state of a socket. (3.8)

  • Convert tun/tap into a multiqueue device and expose the queues as file descriptors in user space. (3.8)
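
The RFC 5961 blind reset mitigations listed above (sections 3.2 and 4.2 of the RFC) amount to a small decision procedure for incoming RST and SYN segments. The following Python sketch is illustrative only and is not the kernel's implementation. The names Action, in_window, and classify_segment are hypothetical, the connection is assumed to be in a synchronized state, and details such as rate limiting of challenge ACKs are omitted.

```python
from enum import Enum

SEQ_MOD = 1 << 32  # TCP sequence numbers are 32-bit and wrap around


class Action(Enum):
    """Hypothetical labels for what the receiver does with a segment."""
    DROP = "drop silently"
    RESET = "reset connection"
    CHALLENGE_ACK = "send challenge ACK"
    PROCESS = "process normally"


def in_window(seq, rcv_nxt, rcv_wnd):
    """Return True if seq lies in [rcv_nxt, rcv_nxt + rcv_wnd) modulo 2**32."""
    return (seq - rcv_nxt) % SEQ_MOD < rcv_wnd


def classify_segment(seq, rcv_nxt, rcv_wnd, rst=False, syn=False):
    """Classify an incoming segment for a connection in a synchronized state."""
    if rst:
        if seq == rcv_nxt:
            # RFC 5961 3.2: only an exact sequence match resets the connection.
            return Action.RESET
        if in_window(seq, rcv_nxt, rcv_wnd):
            # In the window but not exact: answer with a challenge ACK.
            return Action.CHALLENGE_ACK
        # Outside the receive window: discard without a response.
        return Action.DROP
    if syn:
        # RFC 5961 4.2: a SYN gets a challenge ACK whatever its sequence number.
        return Action.CHALLENGE_ACK
    return Action.PROCESS


if __name__ == "__main__":
    for seq in (1000, 1200, 2000):
        print(seq, classify_segment(seq, rcv_nxt=1000, rcv_wnd=500, rst=True).value)
```

A blind attacker must therefore guess the next expected sequence number exactly, rather than any value inside the receive window, to reset a connection. A legitimate peer that receives a challenge ACK can respond with a RST that carries the correct sequence number.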

A.10 perf Utility

  • Add the --symfs option to perf annotate. (3.2)

  • Add the drop monitor script. (3.2)

  • Add the -o and --append options to perf stat. (3.2)

  • Add the -M option. (3.2)

  • Add annotation output controls to all perf tools that have integrated annotation. (3.2)

  • Include information about the host environment in perf.data:

    HEADER_HOSTNAME

    Host name.

    HEADER_OSRELEASE

    Kernel release number.

    HEADER_ARCH

    Hardware architecture.

    HEADER_CPUDESC

    Generic CPU description.

    HEADER_NRCPUS

    Number of online, available CPUs.

    HEADER_CMDLINE

    perf command line.

    HEADER_VERSION

    perf version.

    HEADER_TOPOLOGY

    CPU topology.

    HEADER_EVENT_DESC

    Full event description (attrs).

    HEADER_CPUID

    Easy-to-parse, low-level CPU identification.

    (3.2)

  • Accept FIFOs as input files. (3.3)

  • Add -a option for system-wide profiling. (3.3)

  • Implement printing snapshots to files. (3.6)

  • Add sort by source line number. (3.6)

  • Add PMU event alias support. (3.6)

  • Add support for perf kvm stat to analyze kvm vmexit, mmio, and ioport. (3.7)

  • Add union member access. (3.7)

  • Add --list-opts option to print long option names for use with bash. (3.7)

  • Add script browser. (3.8)

  • Add new display options (-F, -p, and -P) to perf diff. (3.8)

  • perf inject now supports input from a file. (3.8)

  • Add --pre and --post options to perf stat. (3.8)

  • Add gtk.command config option to launch the GTK browser. This is equivalent to specifying the --gtk option on the command line. (3.8)

  • Add new features to perf trace. (3.8)

  • Expose hardware events translations in sysfs. (3.8)

  • Add trace_options boot parameter to set trace options at boot time, such as enabling event stack dumps. (3.8)

A.11 Power Management

  • Add a generic DVFS framework with device-specific (non-CPU) OPPs. (3.2)

  • Improve performance of LZO/plain hibernation. (3.2)

  • Implement per-device power management QoS constraints. (3.2)

A.12 Security

  • Add /sys/kernel/security/tomoyo/audit_interface, which generates audit logs in the form of domain policy so they can be reused and appended to domain_policy interface by the TOMOYO auditing daemon (tomoyo-auditd). TOMOYO is a kernel security module which implements mandatory access control (MAC). (3.1)

  • Add ACL group support for TOMOYO, which allows permissions to be globally granted. (3.1)

  • Add policy namespace support for LXC (Linux containers). The policy namespace has its own set of domain policy, exception policy and profiles, independent of other namespaces. (3.1)

  • Add built-in policy support needed to support enforcing mode from early in the boot sequence. (3.1)

  • Make several TOMOYO options configurable to support activating access controls without calling an external policy loader program. (3.1)

  • Permit the use of the following properties as conditions with TOMOYO: argv[], envp[], execve(), executable's real path and symlink target, owner or group of file objects, and the UID or GID of the current thread. (3.1)

  • Implement Extended Verification Module (EVM), which protects a file's security extended attributes (xattrs) against integrity attacks. (3.2)

  • Implement Smack protections for domain transition: BPRM unsafe flags, secure exec, clear unsafe personality bits, and clear parent death signal. (3.2)

  • Enhance performance of Smack rule list lookups. (3.2)

  • Allow user access to /smack/access, removing the requirement for CAP_MAC_ADMIN. (3.2)

  • Add environment variable name restriction to TOMOYO. (3.2)

  • Add socket operation restriction to TOMOYO. (3.2)

  • Add control for generation of access granted logs in TOMOYO. (3.2)

  • Allow domain transition without execve() in TOMOYO. (3.2)

  • Allow audit matching on inode gid. (3.3)

  • Allow inter-field comparison in audit rules between the gid of a running task and the gid of an inode. (3.3)

  • Add a new audit filter type AUDIT_FIELD_COMPARE to indicate which fields should be compared. (3.3)

  • Allow system call exit filter matching based on the uid of the owner of an inode used in the call. (3.3)

  • Add support for digital signature verification in EVM. File metadata can be protected using digital signatures instead of HMAC. (3.3)

  • Add a Yama Linux security module to collect DAC security improvements. (3.4)

  • Add AppArmor security module file tracking to securityfs. (3.4)

  • Add AppArmor security module initial features directory to securityfs for displaying boolean feature flags and the known capability mask. (3.4)

  • Add default_type statements to SELinux. (3.5)

  • Add default source and target selectors for the user, role, and range of new objects in SELinux. (3.5)

  • Allow seek operations on the file-exposing policy used by the sesearch SELinux policy query tool. (3.5)

  • Add auditing of failed attempts to set invalid labels in SELinux. (3.5)

  • Add checking for the open permission on truncate calls to SELinux. (3.5)

  • Support long Smack labels. (3.5)

  • Set recursive transmute attribute for Smack in all cases. (3.5)

  • Allow manager programs which do not start with / in TOMOYO to handle differences between distributions. (3.5)

  • Add two modes to the Yama ptrace restrictions. (3.5)

  • Add support for invalidating a key. (3.5)

  • Implement revoking of all rules for a subject label in Smack. (3.7)

  • Allow Yama to be unconditionally stacked, regardless of which LSM module is primary. (3.7)

  • Add the Integrity Measurement Architecture, which supports audit log hashes, digital signature verification, and the integrity appraisal extension. (3.7)

A.13 Storage

Block management in the software RAID MD layer now adds bad blocks to a bad-block list so that the system does not use them. (3.1)

A.14 Virtualization

  • Add memory hotplug support for the Xen balloon driver. (3.1)

  • Add Xen PCI backend driver. (3.1)

  • Implement discard requests and support old-style BARRIER. (3.2)

  • Increase recommended maximum number of VCPUs from 64 to 160. (3.4)

  • Allow host IRQ sharing for assigned PCI 2.3 devices. (3.4)

  • Add infrastructure for software and hardware-based TSC rate. (3.4)

  • Move the Hyper-V storage driver out of the staging area. (3.4)

  • Add support for VLAN trunking to Hyper-V. Linux guests can now configure multiple VLANs using a single synthetic NIC on a Windows 8 Hyper-V host. (3.4)

  • Support new KVP message types. (3.4)

  • Support new KVP verbs for Hyper-V in the user level daemon. (3.4)

  • Implement multiconsole support for Hyper-V. (3.4)

  • Support enumeration from all available pools for Hyper-V. (3.4)

  • Update Xen ACPI processor to implement a C and P state driver that uploads ACPI data to the hypervisor. (3.4)

  • Add netconsole support to Xen. (3.4)

  • Use the S4 code to provide S3 support for virtio devices. (3.4)

  • Add a virtio-based remote processor messaging bus to allow message-based communication with the remote processor (if supported by the firmware). (3.4)

  • Add direct MSI message injection for in-kernel IRQ chips. (3.5)

  • Unregister from the hwrng interface and remove the virtio queue before entering the S3 or S4 states. On restore, add the virtio queue and re-register with hwrng. (3.6)

  • Add mcelog support to Xen. (3.6)

  • Reduce the I/O path in the guest kernel to achieve high IOPS and lower latency. (3.7)

  • Add Xen EFI video mode support. (3.7)

  • Implement backend support for paged out grant targets (retry loop and hooks). (3.7)

  • Implement Xen ACPI processor aggregator driver (pad). (3.8)

  • Remove support for i386 processors. (3.8)