1 New Features and Changes

The Unbreakable Enterprise Kernel Release 4 (UEK R4) is Oracle's fourth major release of its heavily tested and optimized operating system kernel for Oracle Linux 6 Update 7 and later and Oracle Linux 7 Update 1 and later on the x86-64 architecture. It is based on the mainline Linux kernel version 4.1.12. This release also updates drivers and includes bug and security fixes.

Oracle actively monitors upstream checkins and applies critical bug and security fixes to UEK R4.

UEK R4 uses the same versioning model as the mainline Linux kernel. Some applications might not understand the 4.1 versioning scheme. However, regular Linux applications are usually neither aware of nor affected by Linux kernel version numbers.
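A script that does need to test the kernel version is more robust if it compares the numeric fields instead of matching a literal prefix such as 2.6 or 3. The following is a minimal sketch of that approach; kernel_at_least is a hypothetical helper name, not something shipped with Oracle Linux.

```shell
# Sketch: compare a kernel release string against a minimum major.minor
# version numerically. kernel_at_least is a hypothetical helper name.
kernel_at_least() {
    local release=$1 want_major=$2 want_minor=$3
    local major minor
    # "4.1.12-32.el7uek.x86_64" -> major=4, minor=1
    major=${release%%.*}
    minor=${release#*.}
    minor=${minor%%[!0-9]*}
    [ "$major" -gt "$want_major" ] ||
        { [ "$major" -eq "$want_major" ] && [ "$minor" -ge "$want_minor" ]; }
}

if kernel_at_least "$(uname -r)" 4 1; then
    echo "running kernel is 4.1 or later"
else
    echo "running kernel is older than 4.1"
fi
```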

Notable Changes

The following sections describe the major new features of Unbreakable Enterprise Kernel Release 4 (UEK R4) relative to UEK R3.

Containers

The following notable features of containers are implemented in UEK R4:

Core Kernel Functionality

The following notable core kernel features are implemented in UEK R4:

  • The performance of SPECjbb is improved for a system with more than 10 CPUs by removing contention for the global epmutex lock, which is used in EPOLL_CTL_ADD and EPOLL_CTL_DEL operations. For example, in a typical 16-socket run the performance increases from 35k jOPS to 125k jOPS. Benchmarks also exhibit good scaling from 10 sockets to over 40 sockets.

  • The sysctl_numa_balancing_settle_count parameter used by the NUMA scheduler has been removed.

  • The following tracepoints are now provided to monitor NUMA scheduler activity:

    trace_sched_move_numa

    Triggered when a task is moved to a node.

    trace_sched_stick_numa

    Triggered when a NUMA migration fails.

    trace_sched_swap_numa

    Triggered when a task is swapped for another task.

  • The new SCHED_STACK_END_CHECK kernel debugging option can be used to check for a stack overrun on calls to schedule() on a NUMA system. If the stack end location is overwritten, the system panics as the content of the corrupted region cannot be trusted.

  • Sysbench performance has been improved by preventing spurious active NUMA migration.

  • CPU clock frequency scaling for performance management. The possible governor settings as displayed by /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor are:

    ondemand

    Sets the CPU clock frequency between the minimum and maximum possible frequencies, according to the current demand usage. The following sysfs parameters are adjustable:

    ignore_nice_load

    Whether processes with a nice value count (0) or do not count (1) toward CPU usage. The default value is 0.

    powersave_bias

    Amount by which to reduce the target CPU frequency, expressed in tenths of a percent (a value from 0 to 1000). A value of 0 disables this feature.

    sampling_down_factor

    A multiplier that the kernel applies to sampling_rate when the CPU is running at its maximum clock frequency. The default value is 1.

    sampling_rate_min

    Minimum sampling rate.

    sampling_rate

    Interval in microseconds between assessments of whether the kernel needs to change the clock frequency.

    up_threshold

    Threshold of average CPU usage as a percentage for the kernel to increase the clock frequency.

    ondemand is the default governor setting if tuned is not configured.

    This setting is equivalent to powersave for more recent microarchitecture CPUs (for example, Haswell, Broadwell, and later) with which the pstate power scaling driver can interact. For older design architecture CPUs (for example, Ivy Bridge, Sandy Bridge, and earlier), ondemand is equivalent to performance as the cores must be kept in a higher power state to minimize CPU latency.

    performance

    Sets the CPU clock frequency to the maximum possible frequency.

    Note:

    performance is the default governor setting for the tuned throughput-performance profile.

    The performance profile is appropriate for some real-time applications but it might not be appropriate for all workloads. Running a CPU at maximum frequency can prevent turbo mode from being enabled because doing so would exceed the thermal envelope.

    powersave

    Sets the CPU clock frequency to the minimum possible frequency.

    userspace

    Permits a user-space program running as an effective root user to control the CPU clock frequency by creating and using a file named scaling_setspeed in the CPU-device directory under sysfs.

    Oracle recommends that you use tuned-adm to select a tuned performance profile for your system that is based on its hardware and software configuration, for example:

    • If your system has Xeon processors or multiple disks, choose a profile such as latency-performance for a cloud server, throughput-performance for a database server, or virtual-host for a virtual host server.

      Note:

      These profiles set the CPU governor setting to performance, which might not be appropriate for all workloads.

    • For a virtual machine guest, choose the virtual-guest profile.

    • For a laptop, choose a suitable laptop profile such as laptop-ac-powersave or laptop-battery-powersave.

    • For a desktop machine, choose either the desktop or balanced profile.

    You can use the tuned-adm list command to display the available profiles.

    If tuned is not configured, the default CPU governor setting is ondemand, which can cause some bursty, CPU-intensive workloads to run more slowly because of demand hysteresis.

    If necessary, you can create your own performance profiles based on the profiles that are provided in the /etc/tune-profiles directory hierarchy.

    When comparing system performance under different profiles, use benchmarks that simulate your server's typical workload.

    For more information, see the tuned(8) and tuned-adm(1) manual pages, which are available in the tuned package.
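To see which governor each CPU is currently using, you can read the scaling_governor files described earlier in this section. The following is a minimal sketch; list_governors is a hypothetical helper name, and its optional argument (an alternative sysfs root) exists only so that the function can be exercised against a copy of the directory tree.

```shell
# Sketch: print "<cpu> <governor>" for every CPU that exposes cpufreq.
# list_governors is a hypothetical helper; the sysfs root defaults to /sys.
list_governors() {
    local root=${1:-/sys} f cpu
    for f in "$root"/devices/system/cpu/cpu[0-9]*/cpufreq/scaling_governor; do
        [ -r "$f" ] || continue          # CPU has no cpufreq interface
        cpu=${f%/cpufreq/scaling_governor}
        printf '%s %s\n' "${cpu##*/}" "$(cat "$f")"
    done
    return 0
}
```

Calling list_governors with no argument reads the live system. On a system where no frequency scaling driver is loaded, the function prints nothing.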

Cryptography

The following notable cryptographic features are implemented in UEK R4:

  • Accelerated CRC T10 DIF computation with the PCLMULQDQ instruction.

  • LZ4 Cryptographic API.

  • Support for sha256_ssse3, SHA-224, sha512_ssse3, and SHA-384.

  • Support for the AMD cryptographic coprocessor, which can be used to accelerate or offload AES, SHA, and other encryption operations.
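The kernel lists the algorithm implementations that are currently registered in /proc/crypto, so you can check whether an accelerated driver is available for a given algorithm. The following is a minimal sketch; crypto_drivers is a hypothetical helper name, and the driver names that appear depend on the CPU and on which modules are loaded.

```shell
# Sketch: list the driver implementations registered for one algorithm.
# crypto_drivers is a hypothetical helper; the second argument defaults to
# /proc/crypto and exists so that a saved copy can be examined instead.
crypto_drivers() {
    local alg=$1 file=${2:-/proc/crypto}
    # Each record contains "name : <alg>" followed by "driver : <impl>".
    awk -v alg="$alg" '
        $1 == "name"                     { current = $3 }
        $1 == "driver" && current == alg { print $3 }
    ' "$file"
}
```

For example, crypto_drivers sha256 prints one line for each registered SHA-256 implementation.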

File Systems

The following sections detail the most notable features that have been implemented for file systems in UEK R4:

btrfs

  • The skinny-metadata feature is not enabled by default as it is incompatible with UEK R3. (Bug ID 22123918)

  • The btrfs filesystem balance command does not warn that the RAID level can be changed under certain circumstances, and does not provide the choice of cancelling the operation. (Bug ID 16472824)

  • Commands such as du can show inconsistent results for file sizes in a btrfs file system when the number of bytes that is under delayed allocation is changing. (Bug ID 13096268)

  • The copy-on-write nature of btrfs means that every operation on the file system initially requires disk space. It is possible that you cannot execute any operation on a disk that has no space left; even removing a file might not be possible. The workaround is to run sync before retrying the operation. If this does not help, remount the file system with the -o nodatacow option and delete some files to free up space. See https://btrfs.wiki.kernel.org/index.php/ENOSPC.

  • If you run the btrfs quota enable command on a non-empty file system, any existing files do not count toward space usage. Removing these files can cause usage reports to display negative numbers and the file system to be inaccessible. The workaround is to enable quotas immediately after creating the file system. If you have already written data to the file system, it is too late to enable quotas. (Bug ID 16569350)

  • The btrfs quota rescan command is not currently implemented. The command does not perform a rescan and returns without displaying any message. (Bug ID 16569350)

  • When you overwrite data in a file, starting somewhere in the middle of the file, the overwritten space is counted twice in the space usage numbers that btrfs qgroup show displays. (Bug ID 16609467)

  • If you run btrfsck --init-csum-tree on a file system and then run a simple btrfsck on the same file system, the command displays a Backref mismatch error that was not previously present. (Bug ID 16972799)

  • Btrfs tracks the devices on which you create btrfs file systems. If you subsequently reuse these devices in a file system other than btrfs, you might see error messages such as the following when you perform a device scan or create a RAID-1 file system:

    ERROR: device scan failed '/dev/cciss/c0d0p1' - Invalid argument

    You can safely ignore these errors. (Bug ID 17087097)

  • If you use the -s option to specify a sector size to mkfs.btrfs that is different from the page size, the created file system cannot be mounted. By default, the sector size is set to be the same as the page size. (Bug ID 17087232)

  • The btrfs-progs and btrfs-progs-devel packages for use with UEK R4 are made available in the ol6_x86_64_UEKR4 and ol7_x86_64_UEKR4 ULN channels and the ol6_UEKR4 and ol7_UEKR4 Oracle Linux yum server repositories. In UEK R3, these packages were made available in the ol6_x86_64_latest and ol7_x86_64_latest ULN channels and the ol6_latest and ol7_latest Oracle Linux yum server repositories.

efivarfs

The Unified Extensible Firmware Interface (UEFI) variable file system (efivarfs) is enabled on systems that support UEFI. For Oracle Linux 7, systemd automatically mounts efivarfs. For Oracle Linux 6, efivarfs is not mounted by default. If required, you can mount efivarfs, for example:

# mount -t efivarfs efivarfs /sys/firmware/efi/efivars

ext4

The following ext4 features have been implemented:

  • Metadata checksumming can be enabled by specifying the metadata_csum option when making a file system.

  • 64-bit file system support, which allows you to format a file system that is larger than 16 TB, can be enabled by specifying the 64bit option when making a file system.

  • Improved synchronization speed for database workloads.

  • Improved write-back performance if delayed allocation is disabled using the nodelalloc mount option or if ext2 or ext3 compatibility mode is used.

  • Improved extent-tree memory caching.

  • Improved stabilization of hole punching using fallocate().

  • Improved data and hole seeking using lseek().

The following features are considered experimental and are not supported:

  • Big allocation (bigalloc), which does not currently work with fallocate().

  • Inline data, which stores the data for small files in the available space between on-disk inode data structures.

  • File-system image creation from a directory using mke2fs.

  • Specifying an external journal by using the pathname mount option.
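To confirm whether an existing ext4 file system was created with the metadata_csum or 64bit option, you can inspect the feature list that dumpe2fs -h prints. The following is a minimal sketch; ext4_has_feature is a hypothetical helper name, and it assumes the usual "Filesystem features:" line in the dumpe2fs output.

```shell
# Sketch: succeed if the named feature appears in the "Filesystem features:"
# line of "dumpe2fs -h" output supplied on standard input.
# ext4_has_feature is a hypothetical helper name.
ext4_has_feature() {
    local want=$1
    awk -v want="$want" '
        /^Filesystem features:/ {
            for (i = 3; i <= NF; i++) if ($i == want) found = 1
        }
        END { exit !found }
    '
}

# Intended use, as root (the device name is a placeholder):
#   dumpe2fs -h /dev/sdb1 2>/dev/null | ext4_has_feature metadata_csum && echo yes
```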

FUSE

The following FUSE features have been implemented:

  • Asynchronous I/O support.

  • Optimized short direct reads.

  • A writepages callback that improves the writeout of memory-mapped (mmap) files.

Cached writeback is not currently supported by the user-space applications that are provided with Oracle Linux 6 and Oracle Linux 7.

NFS

The following NFS features have been implemented:

Overlayfs

The overlayfs file system is an implementation of a union file system that makes several file systems appear as a single file system when mounted. An overlayfs file system consists of a lower file system and an upper file system which share a single file system namespace. After a file is opened in an overlayfs file system, all operations go directly to the underlying lower or upper file systems, which simplifies the implementation and allows native performance compared to other union file system implementations. A typical use case is to use a read-only OS image as the lower file system and a writeable RAM-backed file system as the upper file system. Modified data is written to the upper file system only and not to the OS image.

Both the upper and lower file systems can be directory trees within the same file system and neither needs to be the root of a file system. The lower file system can be any supported file system, including an overlayfs file system, and does not need to be writable. If the upper file system is writable, as is usually the case, it must support the creation of trusted.* extended attributes and it must provide a valid d_type file type in the dirent structure returned by readdir(). For example, an NFS file system cannot be used for the upper file system.

The overlayfs file system is not available with UEK R3.

For more information, see https://www.kernel.org/doc/Documentation/filesystems/overlayfs.txt.
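The lower and upper directories are passed to mount as options. The following is a minimal sketch that builds the option string; overlay_opts is a hypothetical helper name and all paths are placeholders. Two details are assumptions that are not stated above: that the file system type is registered as overlay, as in mainline 4.1, and that an empty work directory on the same file system as the upper directory is supplied, as mainline overlayfs requires.

```shell
# Sketch: build the option string for an overlayfs mount.
# overlay_opts is a hypothetical helper. The workdir argument is an
# assumption taken from mainline overlayfs, not from the text above.
overlay_opts() {
    local lower=$1 upper=$2 work=$3
    printf 'lowerdir=%s,upperdir=%s,workdir=%s\n' "$lower" "$upper" "$work"
}

# Intended use, as root (all paths are placeholders):
#   mount -t overlay overlay -o "$(overlay_opts /os-image /rw/upper /rw/work)" /merged
```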

XFS

The following XFS features have been implemented:

  • The new directory entry file type improves the performance of directory recursion by not having to access the inode data from disk.

  • Namespace support.

  • Defragmentation support for the new CRC file system format.

  • The XFS v5 disk format provides metadata CRC, object back references, better crash recovery, and improved xfs_repair performance. The metadata CRC feature is experimental and not currently supported.

Memory Management

The following notable memory management features are implemented in UEK R4:

  • The MAP_HUGETLB flag has been implemented in mmap to support huge-page memory mapping with hugetlbfs.

  • Problems have been addressed with kswapd and page reclaim behavior during large copy operations or when memory was low.

  • Improved page table access scalability in threaded huge-page workloads by reducing lock contention in the page table.

    For more information, see https://lwn.net/Articles/568076/.

  • Improve page-fault scalability in hugetlb by handling concurrent page faults. Previously, the kernel could only handle a single hugetlb page fault at a time. Typically, the startup time for a 10-gigabyte Oracle database, which generates approximately 5000 page table faults, decreases to 25.7 seconds from 37.5 seconds. Larger workloads should experience even greater improvements in start-up times.

  • Support gigantic page allocation in hugetlb at runtime in addition to the existing boot-time allocation.

  • The unqueued slab allocator (SLUB) is now the default memory allocator for kernel objects. SLUB eliminates the fragmentation that is caused by memory allocation and deallocation by reusing memory that was previously allocated to a data object of the same type.
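The huge page pools, including any gigantic page pool that is allocated at runtime, are visible under /sys/kernel/mm/hugepages. The following is a minimal sketch that summarizes them; list_hugepage_pools is a hypothetical helper name, and the page sizes that appear depend on the hardware.

```shell
# Sketch: print the total and free page counts for each huge page size.
# list_hugepage_pools is a hypothetical helper; the root defaults to /sys.
list_hugepage_pools() {
    local root=${1:-/sys} d size
    for d in "$root"/kernel/mm/hugepages/hugepages-*kB; do
        [ -r "$d/nr_hugepages" ] || continue
        size=${d##*/hugepages-}
        printf '%s total=%s free=%s\n' "$size" \
            "$(cat "$d/nr_hugepages")" "$(cat "$d/free_hugepages")"
    done
    return 0
}

# Runtime allocation is requested by writing a page count to nr_hugepages in
# the directory for the wanted size, for example (as root, on hardware that
# supports 1 GB pages):
#   echo 2 > /sys/kernel/mm/hugepages/hugepages-1048576kB/nr_hugepages
```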

Networking

The following notable networking features are implemented in UEK R4:

  • The following VXLAN features have been implemented:

    • Layer 2 redirection with layer 3 switching.

    • Setting destination to a unicast address.

    • UDP tunnel segmentation.

    • IPv6 support.

    • Transmit-side VLAN offload for VXLAN devices.

    • Link configuration for transmitting UDPv4 checksums, and transmitting and receiving UDPv6 checksums.

    • Switch the network namespace when a packet is encapsulated or decapsulated.

  • Per-socket network polling is supported with the bnx2x, ixgbe, and mlx4 network card drivers, which reduces the latency inherent in the NAPI periodic polling method.

    For more information, see https://lwn.net/Articles/551284/ and 2012-lpc-Low-Latency-Sockets-slides-brandeburg.pdf.

  • The new PIE (Proportional Integral controller Enhanced) network packet scheduler controls the average queueing latency to overcome buffer bloat, ensure low latency and achieve high link utilization under various congestion scenarios with very small overhead.

    For more information, see https://tools.ietf.org/html/draft-pan-tsvwg-pie-00.

  • Support for configuring the SR-IOV virtual function (VF) minimum and maximum transmission rates by using the ip command.

    For more information, see git commit ed616689a3d95eb6c9bdbb1ef74b0f50cbdf276a.

  • Support for SR-IOV VF link state control by using the ip command. Previously, VF links were always on, regardless of the physical link status, which allows VMs on the same virtual Ethernet bridge to communicate even if the physical function (PF) link state is down. However, if the VFs were bonded in active/standby mode, this configuration prevented failover when the physical link used by a VF went down. You can now use the ip link set command to configure the behavior of a VF link:

    # ip link set device vf number state { auto | enable | disable }

    The possible settings are:

    auto

    The VF link state is determined by the PF link state. This setting is suitable for VFs that are bonded in active/standby mode.

    disable

    The VF link state is permanently down.

    enable

    The VF link state is permanently up. This is the default setting.

  • The following Open vSwitch (OvS) features have been implemented:

    • Generic routing encapsulation (GRE) tunnels.

    • User-space tunneling interface.

    • Stream Control Transmission Protocol (SCTP) support.

    • VXLAN tunneling support.

    • Wild-carded flow implementation.

    • TCP bitwise flag matching.

      For more information, see git commit 5eb26b156e29eadcc21f73fb5d14497f0db24b86.

    • Allow user space to announce ability to accept unaligned Netlink messages.

    • Enable memory-mapped Netlink I/O.

    • Enable tunnel generic segmentation offloading (GSO) for Open vSwitch bridge devices so that Open vSwitch can take advantage of hardware offloading to the underlying devices.

    • Add recirc and hash action to support distributing packets between the ports of bond devices.

    • Add support for generic network virtualization encapsulation (Geneve) tunneling.

  • The nftables framework provides packet filtering and packet classification features as a replacement for the arptables, ebtables, iptables, and ip6tables frameworks. For more information, see https://lwn.net/Articles/564095/.

    The following nftables features have been implemented:

    • Replaced iptables, while providing backwards compatibility.

    • IPv4 and IPv6 masquerading.

    • Pre-routing and post-routing filtering.

    • Extended NFT_MSG_DELTABLE call to support flushing the rule set.

    • Add filter support for skipping accounting objects.

    • Add support for exporting the rule-set generation ID.

    • Add CPU attribute support for matching packets against CPU number.

    • Add support for matching packet types for the inet, ip, and ipv6 table families based on link-layer information. For loopback traffic, the packet type is deduced from the network layer header.

    • Add support for matching the device group of a packet's incoming or outgoing interface.

  • TCP Fast Open optimization is enabled by default in UEK R4 for applications that take advantage of this feature.

  • Generic network virtualization encapsulation (Geneve) provides a tunneling framework for establishing layer 2 networks over layer 3 networks.

    For more information, see http://tools.ietf.org/html/draft-gross-geneve-01 and http://blogs.vmware.com/cto/geneve-vxlan-network-virtualization-encapsulations/.

  • Transmission queue batching defers flushing transmission socket buffers to the network driver to reduce the overall cost of processing the transmission queue and can result in a higher effective packet transmission rate. The i40e, igb, ixgbe, mlx4, and virtio_net drivers support this feature.

    For more information, see https://lwn.net/Articles/615238/ and http://netoptimizer.blogspot.com/2014/10/unlocked-10gbps-tx-wirespeed-smallest.html.
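When VF link states are set from a script, it can help to validate the arguments before invoking ip. The following is a minimal sketch that only prints the ip link set command described earlier in this section and does not run it; vf_link_state_cmd is a hypothetical helper name.

```shell
# Sketch: validate the arguments and print the ip command that would set
# the link state of an SR-IOV virtual function. Nothing is executed.
# vf_link_state_cmd is a hypothetical helper name.
vf_link_state_cmd() {
    local dev=$1 vf=$2 state=$3
    case $state in
        auto|enable|disable) ;;
        *) echo "invalid state: $state" >&2; return 1 ;;
    esac
    case $vf in
        ''|*[!0-9]*) echo "invalid VF number: $vf" >&2; return 1 ;;
    esac
    printf 'ip link set %s vf %s state %s\n' "$dev" "$vf" "$state"
}
```

For VFs that are bonded in active/standby mode, the auto state is the one that lets the bond detect a failed physical link.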

NUMA

Many modern multiprocessors have non-uniform memory access (NUMA) memory designs, where the performance of a process can depend on whether the memory range being accessed is attached to the local CPU or to another CPU. As performance is different depending on memory locality, the operating system should ideally schedule a process to run on the CPU whose memory controller is connected to the memory to be accessed.

The following notable NUMA features are implemented in UEK R4:

  • Support NUMA affinity for unbound workqueues.

  • A new NUMA subsystem provides improved performance for NUMA systems. New NUMA policies attempt to place a process near its memory, can share pages between processes and handle transparent huge pages. (3.8, 3.13)

    The following sysctl parameters allow you to enable, disable and tune NUMA scheduling:

    numa_balancing_scan_delay_ms

    Scan delay in milliseconds used for starting a task when it initially forks.

    numa_balancing_scan_period_max_ms

    Maximum delay in milliseconds between scanning for tasks.

    numa_balancing_scan_period_min_ms

    Minimum delay in milliseconds between scanning for tasks.

    numa_balancing_scan_period_reset

    Resets the scan delay period.

    numa_balancing_scan_size_mb

    Amount of memory, in megabytes, that is scanned in each scan.

    For more information, see https://lwn.net/Articles/568870/.

  • Add the numa_balancing sysctl parameter to enable or disable automatic NUMA memory balancing.

  • Improved algorithm for NUMA migrations that maximizes the performance of workloads that do not fit on one NUMA node.

  • Memory zones are allocated by the page allocator in node order on 64-bit NUMA systems by default.
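The NUMA balancing parameters appear as files under /proc/sys/kernel. The following is a minimal sketch that prints whichever of them the running kernel provides; show_numa_balancing is a hypothetical helper name, and the optional directory argument exists only so that the function can be pointed at a copy of the files.

```shell
# Sketch: print every numa_balancing* sysctl that is present.
# show_numa_balancing is a hypothetical helper name.
show_numa_balancing() {
    local dir=${1:-/proc/sys/kernel} f
    for f in "$dir"/numa_balancing*; do
        [ -r "$f" ] || continue
        printf '%s = %s\n' "${f##*/}" "$(cat "$f")"
    done
    return 0
}

# To turn automatic NUMA balancing off or on (as root):
#   sysctl -w kernel.numa_balancing=0
#   sysctl -w kernel.numa_balancing=1
```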

Real Time

The following notable real-time features are implemented in UEK R4:

  • Dynamic ticks and full CPU time accounting infrastructure.

  • Timerless multitasking support allows the system to run processes without needing to fire up the timer interrupt that is traditionally used to implement multitasking. (3.10, 3.12)

    For more information, see https://lwn.net/Articles/549580/ and https://lwn.net/Articles/558284/.

  • Deadline scheduling provides deadline, period, and runtime parameters for scheduling processes in the SCHED_DEADLINE scheduling class. These processes are guaranteed to receive runtime microseconds of execution time every period microseconds and these runtime microseconds are available within deadline microseconds from the beginning of the period. The task scheduler runs the process with the lowest deadline value.

    For more information, see git documentation 712e5e34aef449ab680b35c0d9016f59b0a4494c and https://lwn.net/Articles/575497/.

Security

The following notable security features are implemented in UEK R4:

  • The physical and virtual address at which the kernel image is decompressed is randomized to deter exploit attempts that rely on knowing the location of the kernel internals.

  • The Kexec feature, which allows faster rebooting or automatically booting a new kernel after a crash, now incorporates support for allowing only signed Kexec kernels for use with UEFI secure booting.

  • The kexec_load_disabled sysctl parameter can be used to disable Kexec, which allows a system to be better protected against privilege escalation.

  • An exe field has been added to the auditing log to record the pathname of executables that produce core dumps.

  • An audit_backlog_wait_time configuration option has been added to the auditing subsystem so that if auditd cannot keep up or is blocked, callers are not blocked.

  • If the value of the audit_backlog_limit parameter is set to zero, the length of the backlog queue is limited only by the amount of system memory.

  • By default, errors on AUDIT_NEVER rules are now logged.

  • The auditing subsystem now logs task information when the state of a feature is changed.

  • A netlink multicast socket has been added so that read-only user-space clients such as systemd can read the audit logs.

  • Secure generation of random numbers with the getrandom system call. Linux systems traditionally obtained their random numbers from /dev/[u]random. This interface is vulnerable to file descriptor exhaustion attacks, where the attacker consumes all available file descriptors, and is also inconvenient for use in containers. The getrandom system call, which is analogous to the getentropy call in OpenBSD, overcomes these problems.

  • SELinux now reports permissive mode in avc: denied messages.
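The state of the kexec_load_disabled parameter can be read from /proc/sys/kernel. The following is a minimal sketch; kexec_load_state is a hypothetical helper name. Note that in the mainline kernel this setting is one-way: after it has been set to 1, it cannot be set back to 0 until the system is rebooted.

```shell
# Sketch: report whether loading a new kernel with Kexec is still permitted.
# kexec_load_state is a hypothetical helper; the file argument defaults to
# the live sysctl file.
kexec_load_state() {
    local file=${1:-/proc/sys/kernel/kexec_load_disabled}
    if [ ! -r "$file" ]; then
        echo unknown
    elif [ "$(cat "$file")" = 1 ]; then
        echo disabled
    else
        echo enabled
    fi
}

# To disable Kexec loading until the next reboot (as root):
#   sysctl -w kernel.kexec_load_disabled=1
```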

Storage

The following notable storage features are implemented in UEK R4:

  • The device mapper dm-cache target allows you to use a fast device such as an SSD as a cache for a slower device such as a rotating disk. You can use various policy plugins to change the selection algorithms for performing actions such as promoting, demoting, and cleaning blocks. dm-cache supports both writeback and write-through modes. This feature is still flagged as experimental and might not be suitable for production systems.

    Updates to dm-cache added support for a passthrough mode when the cache contents might not be consistent with the underlying device, cache block invalidation, and cache shrinking.

    For more information, see https://www.kernel.org/doc/Documentation/device-mapper/cache.txt.

  • Bcache is a block layer cache that allows you to use SSDs to cache slower block devices. Bcache can perform both writeback and write-through caching, has no file-system dependencies, is simple to use, and works well on any setup without requiring any configuration.

    For more information, see https://www.kernel.org/doc/Documentation/bcache.txt, https://bcache.evilpiepirate.org/, and https://lwn.net/Articles/497024/.

  • The new, scalable multiqueue block layer subsystem (blk-mq) for supporting high performance SSD storage implements per-CPU submission queues for receiving I/O requests, which are directed to hardware submission queues. The separate per-CPU submission and hardware submission queues balance the I/O workload across multiple CPU cores and reduce latency. The design supports the interface and features of the traditional block layer, but it is also capable of supporting many millions of I/O operations per second by taking advantage of the capabilities of NVM-Express or high-end PCI-E devices and multicore CPUs.

    For more information, see https://lwn.net/Articles/552904/.

  • The device mapper dm-era target behaves similarly to the linear target with the addition of tracking any blocks that were written within an era, which is a time period that you can define. Typical use cases are tracking the changed blocks in backup software and restoring cache coherency after rolling back a snapshot by partially invalidating the cache contents.

    For more information, see https://www.kernel.org/doc/Documentation/device-mapper/era.txt.
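As a rough check of whether a given block device is driven through blk-mq, you can look for the mq directory that the mainline blk-mq code creates for the device under /sys/block. This directory is an assumption taken from the mainline kernel and is not described above. The following is a minimal sketch; uses_blk_mq is a hypothetical helper name.

```shell
# Sketch: succeed if the block device has an mq directory in sysfs.
# uses_blk_mq is a hypothetical helper; the mq directory is an assumption
# based on mainline blk-mq behavior.
uses_blk_mq() {
    local dev=$1 root=${2:-/sys}
    [ -d "$root/block/$dev/mq" ]
}

# Intended use (the device name is a placeholder):
#   uses_blk_mq nvme0n1 && echo "nvme0n1 uses blk-mq"
```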

OFED Support

The OpenFabrics Enterprise Distribution (OFED) 2.0 stack is integrated with UEK R4, and supports all Oracle branded InfiniBand (IB) hardware, on systems with an x86-64 architecture. This includes:

  • Sun InfiniBand Dual Port 4x QDR Host Channel Adapters M2

  • Oracle Dual Port QDR InfiniBand Adapter M3

  • Oracle Dual Port QDR InfiniBand Adapter M4

  • Oracle Dual Port EDR InfiniBand Adapter

OFED 2.0 supports the following protocols with UEK R4:

  • iSCSI Extensions for remote direct memory access (iSER) provide access to iSCSI storage devices

  • Reliable Datagram Sockets (RDS) is a high-performance, low-latency, reliable connectionless protocol for datagram delivery

  • Sockets Direct Protocol (SDP) supports stream sockets for RDMA network fabrics

  • Ethernet over InfiniBand (EoIB)

  • Internet Protocol over InfiniBand (IPoIB)

Note:

Ethernet tunneling over IPoIB (eIPoIB) is not supported with UEK R4.

OFED 2.0 supports the following RDS features with UEK R4:

  • Async Send (AS)

  • Quality of Service (QoS)

  • Active Bonding (AB)

  • Netfilter (NF)

  • Shared Request Queue (SRQ)

Note:

Automatic Path Migration (APM) is not supported with UEK R4.

Support for IB, OFED, and RDS is integrated into the kernel. The OFED user-space RPMs continue to be provided, but the kernel-ib and ofa-kernel RPMs are not required.

Virtualization

The following notable virtualization features are implemented in UEK R4:

  • Hyper-V support for netpoll allows a network console to be used to debug kernel issues.

  • The following Xen features have been implemented:

    • xen-netback support for changing the MAC address of an interface.

    • ACPI support for CPU and memory hotplug, including a new memory hotplug driver.

    • xen-netback support for gathering zerocopy statistics and TX grant mapping.

    • Support for MSI message groups in Dom0.

    • Substantially improved performance of Xen virtual network interfaces by implementing multiple queue support between xen-netback and xen-netfront.

    • EFI support in Dom0.

    • Xen PVSCSI backend and frontend driver support for high performance passthrough of SCSI devices or LUNs from Dom0 to a Xen PV or HVM guest.

    • Remapping of existing MFNs that were replaced by the identity map to prevent non-contiguous pages occurring in Dom0.

    • Improved PV ticket locks provide more efficient locking of guests for workloads that rely on this mechanism. If a spin lock is not available for more than a brief period, the lock code stops spinning and calls the hypervisor to wait until the lock becomes available again.

    • NUMA topology and I/O exposure to guests.

    • PVH guests now support Paravirtualized Hardware extensions (v3).

Zram

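Zram compresses everything written to specified block devices in RAM, and is typically used for swap devices to improve the responsiveness of systems that have a limited amount of memory. The following example illustrates how to create and enable a zram swap device. A zram device must be given a size before it can be used; the device count and the 512M sizes shown here are illustrative values only:

# modprobe zram num_devices=2
# echo 512M > /sys/block/zram0/disksize
# mkswap /dev/zram0
# swapon /dev/zram0

The next example illustrates how to create a file system on a zram device and then mount this file system:

# echo 512M > /sys/block/zram1/disksize
# mkfs.ext4 /dev/zram1
# mount /dev/zram1 /tmp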

The following notable zram features are implemented in UEK R4:

  • Zram has been moved out of staging to drivers/block/zram.

  • Support for LZ4 compression in addition to LZO.

  • Performance improvements to concurrent compression of multiple compression streams.

  • Support for switching the compression algorithm in /sys/block/zramN/comp_algorithm.

  • Support for limiting the maximum amount of usable memory for a zram device in /sys/block/zramN/mem_limit. You can use memory unit suffixes when setting a value, for example:

    # echo 1G > /sys/block/zram0/mem_limit

    To disable the limit, set the value to 0.

  • Support for displaying the maximum memory that a zram device has consumed in /sys/block/zramN/mem_used_max. Writing 0 to this file resets the counter.
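The comp_algorithm file lists the available compression algorithms. The following is a minimal sketch that extracts the one in use, assuming the mainline convention of marking the selected algorithm with square brackets; that convention is not stated above, and zram_active_algorithm is a hypothetical helper name.

```shell
# Sketch: print the compression algorithm that a zram device is using.
# Assumes the selected algorithm is shown in square brackets, for example
# "lzo [lz4]". zram_active_algorithm is a hypothetical helper name.
zram_active_algorithm() {
    local dev=$1 root=${2:-/sys} line
    line=$(cat "$root/block/$dev/comp_algorithm") || return 1
    line=${line#*\[}       # drop everything up to the opening bracket
    echo "${line%%\]*}"    # drop the closing bracket and what follows
}

# To switch algorithm (as root, before the device is initialized):
#   echo lz4 > /sys/block/zram0/comp_algorithm
```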

Zswap

Zswap is a lightweight, write-behind compressed caching mechanism for swap pages. It attempts to compress pages that are being swapped out and to store them in RAM. A successful compression defers and, in many cases, prevents writeback to the swap device, reducing I/O and increasing the performance of a system that is swapping.

For more information, see https://lwn.net/Articles/537422/.
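The current zswap settings are exposed as module parameters in sysfs. The following is a minimal sketch that prints whichever parameters are present; show_zswap_params is a hypothetical helper name, and the parameter directory and the zswap.enabled boot option mentioned in the comment are taken from the mainline kernel and are not described above.

```shell
# Sketch: print each zswap module parameter as name=value.
# show_zswap_params is a hypothetical helper; the directory defaults to the
# mainline location of the zswap parameters.
show_zswap_params() {
    local dir=${1:-/sys/module/zswap/parameters} f
    for f in "$dir"/*; do
        { [ -f "$f" ] && [ -r "$f" ]; } || continue
        printf '%s=%s\n' "${f##*/}" "$(cat "$f")"
    done
    return 0
}

# In the mainline kernel, zswap is enabled at boot time by adding
# zswap.enabled=1 to the kernel command line.
```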

Driver Updates

The Unbreakable Enterprise Kernel Release 4 supports a large number of hardware and devices. In close cooperation with hardware and storage vendors, Oracle has updated several device drivers from the versions in mainline Linux 4.1.12.

The following table details commonly used drivers that have been updated since UEK R3 quarterly update 7:

Vendor    Driver        Version            Description
--------  ------------  -----------------  -----------
Avago     megaraid_sas  06.807.10.00-rc1   Avago MegaRAID SAS driver
Avago     mpt2sas       20.100.00.00       LSI MPT Fusion SAS 2.0 device driver
Avago     mpt3sas       09.100.00.00       LSI MPT Fusion SAS 3.0 device driver
Chelsio   csiostor      1.0.0              Chelsio FCoE driver (Oracle Linux 7 only)
Chelsio   cxgb4         2.0.0-ko           Chelsio T4 network driver
Chelsio   cxgb4i        0.9.4              Chelsio T4 iSCSI driver
Chelsio   cxgb4vf       2.0.0-ko           Chelsio T4/T5 Virtual Function (VF) network driver
Cisco     enic          2.3.0.12           Cisco VIC Ethernet NIC driver
Cisco     fnic          1.6.0.22           Cisco FCoE HBA driver
Cisco     usnic_verbs   1.0.3              Cisco VIC (usNIC) Verbs driver
Emulex    be2iscsi      10.6.0.1           Emulex OneConnect Open-iSCSI driver
Emulex    be2net        10.6.0.4           Emulex OneConnect 10Gbps NIC driver
Emulex    lpfc          11.0.0.3           Emulex LightPulse Fibre Channel SCSI driver (includes Lancer G6 and multi-queue support)
Emulex    ocrdma        10.6.0.0           Emulex OneConnect RoCE driver
HP        cciss         3.6.26             Driver for HP Smart Array controllers (Oracle Linux 6 only)
Intel     e1000e        3.2.6-k            Intel PRO/1000 network driver
Intel     i40e          1.3.21-k           Driver for Intel Ethernet controller XL710 family
Intel     i40evf        1.3.13             Intel XL710 X710 Virtual Function network driver
Intel     igb           5.3.0-k            Intel Gigabit Ethernet network driver
Intel     igbvf         2.0.2-k            Intel Gigabit Virtual Function driver
Intel     iw_nes        1.5.0.1            NetEffect RNIC low-level iWARP driver
Intel     ixgb          1.0.135-k2-NAPI    Intel PRO/10GbE network driver
Intel     ixgbe         4.2.1-k            Intel 10 Gigabit PCI Express network driver
Intel     ixgbevf       2.12.1-k           Intel 82599 Virtual Function driver
Intel     nvme          1.0                Non-volatile memory (Fultondale) driver
Mellanox  mlx4_core     2.2-1              Mellanox ConnectX HCA low-level driver
Mellanox  mlx4_en       2.2-1              Mellanox ConnectX HCA Ethernet driver
Mellanox  mlx4_ib       2.2-1              Mellanox ConnectX HCA InfiniBand driver
Mellanox  mlx5_ib       2.2-1              Mellanox Connect-IB HCA IB driver
Oracle    hxge          1.3.6              10 Gigabit Network Driver
Oracle    sxge          0.09302015         SXGE Ethernet driver
Oracle    sxgevf        0.09302015         SXGEVF Ethernet driver
Oracle    xscore        6.0.r8008          Core services driver
Oracle    xsvhba        6.0.r8008          VHBA Driver
Oracle    xsvnic        6.0.r8008          XSVNIC network driver
Oracle    xve           6.0.r8008          Virtual Ethernet driver
QLogic    bfa           3.2.23.0           Brocade Fibre Channel HBA driver fcpim (informational only, no significant update)
QLogic    bna           3.2.25.1           QLogic BR-series 10G PCIe Ethernet driver (informational only, no significant update)
QLogic    bnx2          2.2.6              QLogic BCM57xx driver
QLogic    bnx2fc        2.9.6              QLogic BCM57712 FCoE driver (informational only, no significant update)
QLogic    bnx2i         2.7.10.1           QLogic BCM57xx/57xxx iSCSI driver
QLogic    bnx2x         1.712.30-0         QLogic 10G/20G Ethernet driver
QLogic    cnic          2.5.22             QLogic CNIC driver
QLogic    ib_qib        1.11               QLogic IB driver
QLogic    netxen_nic    4.0.82             QLogic/NetXen (1/10) GbE Intelligent Ethernet driver (informational only, no significant update)
QLogic    qla2xxx       8.07.00.26.39.0-k  QLogic Fibre Channel HBA driver (informational only, no significant update)
QLogic    qla4xxx       5.04.00-k6         QLogic iSCSI HBA driver
QLogic    qlcnic        5.3.63             QLogic 1/10 GbE Converged/Intelligent Ethernet driver
QLogic    tg3           3.137              Broadcom Tigon3 Ethernet driver
-         ib_ipoib      1.0.0              IP-over-InfiniBand net driver
-         ib_iser       1.6                iSER (iSCSI Extensions for RDMA) Datamover
-         ib_isert      1.0                iSER-Target for mainline target infrastructure
-         xen-netback   Upstream patches   Xen virtual network device backend
-         xen-netfront  Upstream patches   Xen virtual network device frontend
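To compare the driver versions on an installed system against the versions listed above, you can query each module with the standard modinfo utility. The following is a minimal sketch; the function name driver_versions is hypothetical, and drivers that do not expose a version field (or that are not installed) are reported as unknown.

```shell
#!/bin/sh
# Print "<driver> <version>" for each named driver, using the version
# field reported by modinfo. Drivers without a version field, or that
# modinfo cannot find, are reported as "unknown".
driver_versions() {
    for drv in "$@"; do
        ver=$(modinfo -F version "$drv" 2>/dev/null) || ver=""
        printf '%s %s\n' "$drv" "${ver:-unknown}"
    done
}
```

For example, driver_versions ixgbe igb lpfc prints one line per driver, which you can check against the Version column.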

iSER and iSCSI LIO Support

UEK R4 supports the iSER and iSCSI LIO targets and initiators that are required by the following storage solution components:

  • Emulex Skyhawk OCe14XXX

  • FlashGrid

LSI Logic / Symbios Logic MegaRAID SAS

UEK R4 does not support the LSI Logic / Symbios Logic MegaRAID SAS 1078 controller.

New and Updated Packages

To support the newly added functionality that the Unbreakable Enterprise Kernel Release 4 provides, the following sections list the kernel and user space binary packages that have been added or updated relative to those included in the base distribution. For more information about the ULN channels and Oracle Linux yum server repositories in which these packages are available, see Installation and Availability.

New and Updated Kernel, Driver, and Firmware Packages

  • crash (Oracle Linux 6 only)

  • crash-devel (Oracle Linux 6 only)

  • dtrace-modules

  • dtrace-modules-provider-headers

  • dtrace-modules-shared-headers

  • kernel-uek

  • kernel-uek-debug

  • kernel-uek-debug-devel

  • kernel-uek-devel

  • kernel-uek-doc

  • kernel-uek-firmware

  • libdtrace-ctf

  • libdtrace-ctf-devel

  • linux-firmware (Oracle Linux 6 only)

New and Updated User Space Packages

Unless specified otherwise, the following packages are available only in ULN channels.

  • acpid (Oracle Linux 6 only)

  • btrfs-progs (Available from Oracle Linux yum server and ULN)

  • btrfs-progs-devel (Available from Oracle Linux yum server and ULN)

  • dtrace-utils

  • dtrace-utils-devel

  • ibutils

  • infiniband-diags

  • libibcm

  • libibmad

  • libibmad-devel

  • libibumad

  • libibumad-devel

  • libibverbs

  • libibverbs-devel

  • libibverbs-utils

  • libmlx4

  • librdmacm

  • librdmacm-devel

  • librdmacm-utils

  • libsdp

  • mstflint

  • ofed-docs

  • opensm-libs

  • oracle-ofed-release

  • perftest

  • qperf

  • rdma

  • rds-tools

Oracle Linux 7 Issue Fixes Using UEK R4

The following issues are fixed in Oracle Linux 7 by booting with UEK R4:

Network Teaming

Teaming is supported with UEK R4. Teaming also works with UEK R3 Quarterly Update 7 or later. (Bug ID 19151770)

systemd Fails to Load the autofs4 and ipv6 Modules

When the system is booted with UEK R3, systemd fails to load the autofs4 and ipv6 modules and errors such as the following are logged:

systemd[1]: Failed to insert module 'autofs4'
systemd[1]: Failed to insert module 'ipv6'

The system can load these modules if booted with UEK R4. (Bug ID 18470449)
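The following sketch shows one way to confirm that a system is running UEK R4 and that a module is loaded. The function names are hypothetical, and the pattern assumes that UEK R4 release strings begin with 4.1.12- and contain uek (for example, as reported by uname -r). A driver that is built into the kernel rather than loaded as a module does not appear in /proc/modules.

```shell
#!/bin/sh
# Return success if the given kernel release string looks like UEK R4.
# Assumption: UEK R4 release strings start with "4.1.12-" and contain "uek".
is_uek_r4() {
    case "$1" in
        4.1.12-*uek*) return 0 ;;
        *)            return 1 ;;
    esac
}

# Return success if the named module is listed in /proc/modules.
# The optional second argument is an alternative listing file (for testing).
module_loaded() {
    grep -q "^$1 " "${2:-/proc/modules}"
}

# Example usage on a running system:
#   is_uek_r4 "$(uname -r)" && module_loaded autofs4 && echo "autofs4 loaded"
```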

Technology Preview

The following features included in the Unbreakable Enterprise Kernel Release 4 are still under development, but are made available for testing and evaluation purposes.

  • Ceph File System and Object Gateway Federation

    Ceph presents a uniform view of object and block storage from a cluster of multiple physical and logical commodity-hardware storage devices. Ceph can provide fault tolerance and enhance I/O performance by replicating and striping data across the storage devices in a Storage Cluster. Ceph's monitoring and self-repair features minimize administration overhead. You can configure a Storage Cluster on non-identical hardware from different manufacturers.

    The Ceph File System (CephFS) and Object Gateway Federation features of Ceph are in technology preview.

  • DCTCP (Data Center TCP)

    DCTCP enhances congestion control by making use of the Explicit Congestion Notification (ECN) feature of state-of-the-art network switches. DCTCP reduces buffer occupancy and improves throughput by allowing a system to react more intelligently to congestion than is possible using TCP.

  • DRBD (Distributed Replicated Block Device)

    DRBD is a shared-nothing, synchronously replicated block device (RAID1 over network) that is designed to serve as a building block for high availability (HA) clusters. It requires a cluster manager (for example, Pacemaker) for automatic failover.

  • Kernel module signing facility

    The kernel module signing facility applies cryptographic signature checking to modules on module load, checking the signature against a ring of public keys compiled into the kernel. GPG is used to do the cryptographic work and determines the format of the signature and key data.

  • NFS over RDMA interoperation with ZFS and Oracle Solaris

    NFS over RDMA does not yet fully interoperate with ZFS and Oracle Solaris. NFS over RDMA for NFS versions 3 and 4 is supported for Oracle Linux systems using the Oracle InfiniBand stack and is more efficient than using NFS with TCP over IPoIB. Currently, only the Mellanox ConnectX-2 and ConnectX-3 Host Channel Adapters (HCAs) pass the full Connectathon NFS test suite and are supported.

  • NFS server-side copy offload

    NFS server-side copy offload is an NFS v4.2 feature that reduces the overhead on network and client resources by offloading copy operations to one or more NFS servers rather than involving the client in copying file data over the network.

  • Server-side parallel NFS

    Server-side parallel NFS (pNFS) improves the scalability and performance of an NFS server by making file metadata and data available on separate paths.
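To evaluate the DCTCP preview described in this list, you can first check whether the dctcp congestion control algorithm is registered with the kernel. The following is a minimal sketch; the function name and the TCP_SYSCTL_DIR override are hypothetical. The module name tcp_dctcp and the sysctl names in the comments are those used by the mainline kernel and are assumed to apply unchanged to UEK R4.

```shell
#!/bin/sh
# TCP_SYSCTL_DIR is an assumed override for testing; the real files live
# under /proc/sys/net/ipv4.
TCP_SYSCTL_DIR="${TCP_SYSCTL_DIR:-/proc/sys/net/ipv4}"

# Return success if "dctcp" appears in the space-separated list of
# available congestion control algorithms.
dctcp_available() {
    for algo in $(cat "$TCP_SYSCTL_DIR/tcp_available_congestion_control"); do
        [ "$algo" = "dctcp" ] && return 0
    done
    return 1
}

# Example usage (as root), assuming the mainline module and sysctl names:
#   dctcp_available || modprobe tcp_dctcp
#   sysctl -w net.ipv4.tcp_congestion_control=dctcp
```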

Compatibility

Oracle Linux maintains user-space compatibility with Red Hat Enterprise Linux, which is independent of the kernel version running underneath the operating system. Existing applications in user space will continue to run unmodified on the Unbreakable Enterprise Kernel Release 4, and no recertification is needed for RHEL-certified applications.

To minimize impact on interoperability during releases, the Oracle Linux team works closely with third-party vendors whose hardware and software have dependencies on kernel modules. The kernel ABI for UEK R4 will remain unchanged in all subsequent updates to the initial release. In this release, there are changes to the kernel ABI relative to UEK R3 that require recompilation of third-party kernel modules on the system. Before installing UEK R4, verify its support status with your application vendor.
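As a rough aid for finding third-party modules that were built for an earlier kernel, the following sketch inspects the vermagic field that modinfo reports for a module. The function names are hypothetical, and the check is only a heuristic: it assumes that modules built for UEK R4 carry a vermagic string that begins with 4.1.12-, and it treats a module that modinfo cannot find as needing attention. It does not replace confirmation of support status from the vendor.

```shell
#!/bin/sh
# Print the kernel release that a module was built for, taken from the
# first word of the vermagic field reported by modinfo.
module_kernel() {
    modinfo -F vermagic "$1" 2>/dev/null | awk '{print $1}'
}

# Return success if the module does not appear to be built for a
# 4.1.12-based kernel (heuristic; see the assumptions stated above).
needs_rebuild() {
    case "$(module_kernel "$1")" in
        4.1.12-*) return 1 ;;
        *)        return 0 ;;
    esac
}
```

For example, needs_rebuild mydriver && echo "mydriver: rebuild for UEK R4" flags a hypothetical module named mydriver that was compiled against a UEK R3 kernel.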