Chapter 1 New Features and Changes

The Unbreakable Enterprise Kernel Release 5 (UEK R5) is a heavily tested and optimized operating system kernel for Oracle Linux 7.5 and later on the x86-64 and 64-bit Arm (aarch64) architectures. It is based on the mainline Linux kernel version 4.14.35. This release also updates drivers and includes bug and security fixes.

Oracle actively monitors upstream check-ins and applies critical bug and security fixes to UEK R5.

UEK R5U2 uses the 4.14.35-1902 version and build of the UEK R5 kernel, which includes security and bug fixes, as well as driver updates.

UEK R5 uses the same versioning model as the mainline Linux kernel. Some applications might not understand the 4.14 versioning scheme. However, regular Linux applications are usually neither aware of nor affected by Linux kernel version numbers.
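
Because UEK R5U2 is identified by its build number (4.14.35-1902) rather than by a new upstream version, a script can recognize it by comparing the kernel release string against that prefix. The following is a minimal sketch only: the function name is_uek_r5u2 is hypothetical, and it assumes that the release string reported by uname -r begins with the version and build number given above.

```shell
#!/bin/sh
# Hypothetical helper: succeed if a kernel release string belongs to the
# UEK R5U2 build series (4.14.35-1902). With no argument, the release of
# the running kernel (uname -r) is checked.
is_uek_r5u2() {
    release=${1:-$(uname -r)}
    case "$release" in
        4.14.35-1902|4.14.35-1902.*) return 0 ;;
        *) return 1 ;;
    esac
}

if is_uek_r5u2; then
    echo "Running a UEK R5U2 kernel: $(uname -r)"
else
    echo "Not a UEK R5U2 kernel: $(uname -r)"
fi
```

Matching on the prefix followed by a dot avoids treating an unrelated build number that merely starts with 1902 as UEK R5U2.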

1.1 Notable Features and Changes

The following sections describe the major new features of Unbreakable Enterprise Kernel Release 5 Update 2 (UEK R5U2) relative to UEK R5U1.

1.1.1 64-bit Arm (aarch64) architecture

With Unbreakable Enterprise Kernel Release 5 Update 2, Oracle continues to deliver kernel modifications to enable support for 64-bit Arm (aarch64) architecture. These changes are built and tested against existing Arm hardware and provide support for Oracle Linux for Arm. Features described in this document are available for Arm insofar as the hardware is capable of supporting the feature that is described. Limitations and items beyond the scope of current development work for Arm are described in more detail in Section 3.1, “Unusable or Unavailable Features for Arm”.

Where specific changes have been made for the Arm architecture, these are noted throughout this document.

1.1.2 Core Kernel Functionality

The following notable core kernel features are implemented in UEK R5U2:

  • Pressure Stall Information patchset implemented.  Pressure Stall Information (PSI) is designed to help system administrators better maximize server resources and can be used to pinpoint and troubleshoot resource utilization issues. PSI provides granular metrics and reports them via /proc/pressure. For example:

    # cat /proc/pressure/cpu
    some avg10=94.45 avg60=66.32 avg300=21.06 total=73036145
    # cat /proc/pressure/memory
    some avg10=0.00 avg60=0.00 avg300=0.00 total=0
    full avg10=0.00 avg60=0.00 avg300=0.00 total=0
    # cat /proc/pressure/io
    some avg10=0.00 avg60=1.09 avg300=13.39 total=621424943
    full avg10=0.00 avg60=0.97 avg300=12.73 total=490558883

    PSI is also capable of recording cgroupv2 statistics.

    More information on this feature is described at https://lwn.net/Articles/759781/ and at https://lwn.net/Articles/763629/.

  • Enable cgroup v2 cpuset controller.  This release enables the cpuset controller in the default cgroup v2 hierarchy with a minimal set of features, including cpus, mems, sched.partition, cpus.effective and mems.effective. The new control file, sched.partition, is a tristate flag file that designates whether a cgroup is the root of a new scheduling domain, or partition, with its own set of unique CPUs from a scheduling perspective. A partition root is enabled by writing a value of 1 to the flag and disabled by writing a value of 0. If the partition root is in an error state, the flag is given a value of -1, which changes back automatically if the error state is rectified.

  • Implementation of the ktask framework for parallelizing CPU-intensive work.  This kernel release implements the ktask framework that is used to parallelize CPU-intensive work in the kernel. This enhancement provides a performance feature that helps to harness idle CPUs to complete jobs more quickly. The framework provides an API that facilitates concurrency for tasks such as zeroing a range of pages or evicting a list of inodes.

  • Enabled UV BIOS Time Stamp Counter (TSC) support for HPE Systems.  This kernel update includes upstream patches to enable and take advantage of enhancements to the UV BIOS Time Stamp Counter (TSC) for HPE systems. The patches can take advantage of features within the UV BIOS to achieve better TSC accuracy than generic kernel functions. This is important for applications that read the TSC values directly for accessing databases.

  • Kernel tuning for Arm platforms.  The kernel is tuned for Arm platforms and parameters for unsupported hardware are disabled to improve stability and performance.

  • AMD Rome Features Enabled.  Patches were added to the kernel to enable the use of AMD Rome processors, the second generation of the EPYC processor family (7002 series). Not all AMD Rome features are available at this point in time, but features are added in errata releases as required.

    Currently, the following features are tested and supported:

    • AMD Rome Error Detection/Correction: The latest AMD EDAC driver module is available to allow the collection and reporting of ECC memory events and other errors.

    • x2APIC: Extends the APIC addressability of processors from 8 bits in xAPIC to 32 bits and allows the operating system to recognize up to 128 cores (256 threads) in a two-socket Rome system.

    • CLWB (Cache Line Write Back): New instruction for AMD, already supported for Intel.

    • WBNOINVD (Write Back No Invalidate): Writes all modified cache lines in the internal caches of the processor back to memory and leaves those lines valid in the internal caches, effectively rinsing the cache.

    • AMD Secure Memory Encryption (SME) and Secure Encrypted Virtualization (SEV): Enables encryption of pages for guests running on a hypervisor through the allocation of secure keys to the guest. Some newer SEV features may not be fully functional yet.

    CPUID is tested and returns the following list of available features:

    • CLWB (Cache Line Write Back)

    • PQE (Cache Allocation Technology)

    • PQM (Platform QoS Monitoring)

    • WBNOINVD (Write Back No Invalidate)

    • MBE (Memory Bandwidth Enforcement)

    • EncryptedMcodePatch (Encrypted Mcode Patch)

    • RDPRU (Read Processor Register at User Level)

    • GMET (Guest Mode Execute Trap)

    • LBREXTN (Last Branch Extensions)

    • PerfCtrExtLLC (Last Level Cache Performance Counter Extensions)

    • AdMskExtn (Address Mask Extension for instruction breakpoint)
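
As an illustration of the Pressure Stall Information feature described earlier in this section, the key=value format of the /proc/pressure files is straightforward to consume from a script. The following is a minimal sketch, not part of the kernel or of any Oracle tooling: the function name psi_value and the 10 percent threshold are hypothetical, chosen only for the example.

```shell
#!/bin/sh
# Hypothetical helper: print one field (avg10, avg60, avg300 or total)
# from the "some" or "full" line of a PSI file.
# Usage: psi_value KIND FIELD [FILE]
psi_value() {
    kind=$1
    field=$2
    file=${3:-/proc/pressure/cpu}
    awk -v kind="$kind" -v field="$field" '
        $1 == kind {
            # Fields after the first look like key=value.
            for (i = 2; i <= NF; i++) {
                split($i, kv, "=")
                if (kv[1] == field) { print kv[2]; exit }
            }
        }' "$file"
}

# Example: report when the "some" avg10 value for I/O exceeds 10.
# The threshold is an arbitrary value used for illustration.
if [ -r /proc/pressure/io ]; then
    io=$(psi_value some avg10 /proc/pressure/io)
    if awk -v v="$io" 'BEGIN { exit !(v > 10) }'; then
        echo "I/O pressure is high: some avg10=$io"
    fi
fi
```

Passing the file as an optional argument keeps the helper usable for the cpu, memory and io files alike.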

1.1.3 DTrace

The following notable DTrace features and fixes are implemented in UEK R5U2:

  • Support for libpcap packet capture.  This release includes updates to the kernel and userspace code to support libpcap-based packet capture in DTrace. The addition of a DTRACEACT_PCAP action records capture time, full packet length and as much of the packet as the buffer allows. Capture works for linear and non-linear socket buffers (skbs). In the case of a linear skb, the skb is copied, while non-linear skbs are collected by walking skb page fragments.

    This functionality is accessed in userspace via the pcap(skb,proto) action. The action supports two modes of operation. When a file has been freopen()ed, the action performs a pcap_dump to the specified file. For example:

        ip:::send {
            freopen("/tmp/cap.%s", probename);
            pcap((void *)arg0, PCAP_IP);
            freopen("");
        }

    If a file is not freopen()ed, the capture data is passed to an invocation of tshark via a named pipe and displayed; or if tshark is not available it is displayed in a tracemem-like format. Example usage:

    dtrace -n 'ip:::send { pcap((void *)arg0, PCAP_IP); }'

1.1.4 File Systems

The following sections detail the most notable features that have been implemented for file systems in UEK R5U2:

1.1.4.1 Btrfs

The following Btrfs bug fixes and features have been implemented in this update:

  • Upstream fixes were backported to resolve a regression, introduced by an earlier fix, in which a mount error occurred after an fsync followed by a power failure.

  • A patch was applied to code to perform extra checking to ensure that chunks map to corresponding block groups. This resolves an issue where mounting crafted images that have missing block group items caused unexpected behavior. The issue was filed as CVE-2018-14612 and is resolved in this release.

  • A patch was applied to the code for the aarch64 version of the kernel to resolve an issue where file system trimming was limited to unallocated space and did not trim any space from existing block groups on some Btrfs file systems.

1.1.4.2 CIFS

Several upstream fixes were applied for CIFS to resolve bugs and security issues, including an integer overflow and some wrapping and padding issues.

1.1.4.3 ext4

The following ext4 bug fixes and features have been implemented in this update:

  • A patch was applied to update the i_disksize if a direct write exceeded the i_disksize value but was less than i_size. This resolved an issue that caused a corrupted filesystem after simulated disk failures. The issue caused a problem when attempting to fix inode errors using fsck.

  • A patch was applied to fix an issue that could cause data corruption when zeroing blocks containing unaligned AIO (Asynchronous I/O). The issue resulted when the AIO went beyond the i_size and the i_size was not block-aligned. This fix resolved an issue where an Oracle Database running with the filesystemio_options=SETALL option set was producing corrupt backup data.

1.1.4.4 OCFS2

The following OCFS2 bug fixes and features have been implemented in this update:

  • A patch was applied to resolve an issue that caused a kernel panic while reading a block that is already linked into the journal. Typically this would result from an underlying storage issue that would leave a buffer head validation flag in place.

  • A patch was applied to resolve an issue that caused a kernel panic when direct IO failed and an inode was not cleared. The patch frees up the write context even in the case of IO failure.

  • A patch was applied to resolve an issue that caused a kernel panic when the file system was mounted but had not been cleanly unmounted, resulting in inconsistent metadata. If the file system is not clean, the mount fails and an error is returned to notify the user to run fsck to perform the required fixes and to recover local alloc.

  • A patch was applied to resolve a race condition that resulted in a crash when two sync IO requests for the same block were issued and cleared the block head uptodate flag for a read operation.

1.1.4.5 XFS

The following XFS features have been implemented:

  • A patch was applied to enhance dinode verification to resolve CVE-2018-10322. The issue could result in a denial-of-service attack through the use of a crafted XFS image. The code was updated to perform several more validation checks.

  • A patch was applied so that realtime device statistics are shown on statfs calls if the realtime flags are set for a volume. This solves an issue where applications like df would display incorrect available disk space for a realtime volume.

  • A patch was applied to fix a memory corruption bug in code for listxattr calls used to handle extended attributes within XFS. The issue results if a call runs out of buffer space during an operation where there are multiple invocations of xfs_xattr_put_listent. A subsequent invocation can result in a write over the byte before the buffer, triggering a KASAN (Kernel Address Sanitizer) report.

1.1.5 RDMA

Remote Direct Memory Access (RDMA) is a feature that allows direct memory access between two systems that are connected by a network. RDMA facilitates high-throughput and low-latency networking in clusters.

Unbreakable Enterprise Kernel Release 5 Update 2 includes RDMA features that are provided in the upstream kernel, with the addition of Ksplice and DTrace functionality and Oracle's own RDMA features, which include support for RDS and Shared-PD.

Notable changes to RDMA implementation in UEK R5U2 include:

  • RDMA is merged into the kernel as a set of modules.  RDMA is now available as kernel loadable modules and no longer runs as an independent service. The command systemctl start rdma is deprecated and no longer supported.

  • resilient_rdmaip module fixes and improvements.  Fixes that apply to the resilient_rdmaip module are included. Notably, an issue that resulted in problems accessing InfiniBand interfaces if the resilient_rdmaip module was unloaded has been resolved and, by default, it is not possible to unload the module.
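
Because RDMA is now delivered as loadable kernel modules rather than as a service, one way to confirm that a given module is present is to look for it in /proc/modules. The following is a minimal sketch under that assumption. The function name module_loaded is hypothetical, and resilient_rdmaip is used only because it is the module named in this section.

```shell
#!/bin/sh
# Hypothetical helper: succeed if MODULE appears as the first field of a
# line in a /proc/modules-style file.
# Usage: module_loaded MODULE [FILE]
module_loaded() {
    awk -v m="$1" '$1 == m { found = 1 } END { exit !found }' "${2:-/proc/modules}"
}

if module_loaded resilient_rdmaip; then
    echo "resilient_rdmaip is loaded"
else
    echo "resilient_rdmaip is not loaded"
fi
```

Comparing the whole first field, rather than searching for a substring, avoids false matches on modules whose names merely contain the one requested.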

1.1.6 Storage

The following notable storage features are implemented in Unbreakable Enterprise Kernel Release 5 Update 2:

  • NVMe updates.  A large number of code updates for NVMe were backported from the upstream Linux kernel releases from versions 4.18 to 4.21 to help enable features in the latest updated drivers for Broadcom/Emulex and QLogic devices. These updates include changes to enable FC-NVMe in the transport class within the NVMe core layer code. Updates are also included in the SCSI layer to enable NVMe within the SCSI/FC transport class. The NVMectl and NVMe cli userspace libraries are also updated. Support for NVM functionality is limited to the support provided by your hardware vendor.

1.1.7 Virtualization

The following notable virtualization features are implemented in Unbreakable Enterprise Kernel Release 5 Update 2:

  • KVM, Xen and Hyper-V backported updates from Linux 4.19.  Upstream commits into the Linux 4.19 kernel for KVM, Xen and Hyper-V have all been backported into this kernel update release. Major updates to KVM include PCID emulation, enhancements to KVM support for nested guests using Virtual Machine Extensions (VMX) and additional work on Arm support. There are many bug fixes and code optimizations for KVM in this update. Microsoft Hyper-V updates have also been significant, including security fixes, code enhancements and optimization. This update enables paravirtualized guest Inter-processor Interrupts (IPI) to use hypercalls for enhanced performance.

  • Security fixes for KVM.  Security fixes were applied to KVM code for CVE-2018-19407, to resolve a potential denial-of-service attack that could be achieved through crafted system calls that trigger the scan ioapic logic when the irqchip is not initialized. Similarly, fixes were applied for CVE-2018-19406, where an attack was achieved by triggering the pv_send_ipi interface in the case where the apic map is dereferenced. A check was included to cause the call to bail out immediately if the apic map has a null value. Fixes were also applied for CVE-2019-7221, to unconditionally cancel preemption to resolve an issue where an emulated VMX preemption timer could be used to access free_nested; and CVE-2019-7222, where emulation of several other instructions could incorrectly inject a page fault to leak contents of uninitialized stack memory.

  • Fix for Xen blkfront hotplug issue.  An issue is resolved within the Xen blkfront code that resulted in a kernel panic in the guest when a blkfront hotpluggable device was removed and there had been a memory allocation failure.

  • Fix for the Xen x86 guest clock scheduler.  An issue with the x86 guest clock scheduler that caused clock drift in a guest virtual machine when it was migrated has been resolved to ensure a monotonic clock value when the virtual machine is resumed.

1.2 Driver Updates

The Unbreakable Enterprise Kernel Release 5 supports a large number of hardware and devices. In close cooperation with hardware and storage vendors, Oracle has updated several device drivers from the versions in mainline Linux 4.14.35.

A complete list of the driver modules included in UEK R5U2, along with version information, is provided in Appendix A, Driver Modules in Unbreakable Enterprise Kernel Release 5 Update 2 (x86_64).

1.2.1 Notable Driver Features

The following new features are noted in the drivers shipped with UEK R5U2 as opposed to the original release of UEK R5:

  • Amazon Elastic Network Adapter.  The Amazon Elastic Network Adapter driver, ena, is enabled in this kernel update release and a patch is included to fix how the driver handles page sizes that are 64kB or larger when used on Arm platforms.

  • Broadcom/Emulex OneConnect NIC driver updated to version 12.0.0.0.  The Broadcom/Emulex OneConnect NIC driver, be2net, has been updated to version 12.0.0.0 in this kernel update release. This update includes upstream patches and bug fixes. Notable changes to the driver include a fix to a hardware stall issue and better handling of transmit completion errors.

  • Broadcom/Emulex BCM573xx NIC driver updated.  The Broadcom/Emulex BCM573xx NIC driver, bnxt_en, has been updated in this kernel update release to include many upstream patches and bug fixes. Notable changes to the driver include support for additional hardware, including a 200Gb network interface card and updates to the firmware interface specification. A bug was also fixed to enable hardware GRO (Generic Receive Offload), in the form of the NETIF_F_GRO_HW feature flag, which can be used to independently manage GRO in the hardware driver without affecting GRO at a system level.

  • Broadcom/Emulex LightPulse Fibre Channel SCSI driver updated to 12.0.0.10.  The Broadcom/Emulex LightPulse Fibre Channel SCSI driver, lpfc, has been updated to version 12.0.0.10. Many upstream patches were applied for bug fixes and enhancements. This release also includes improvements to the framework to enable NVMe on Fibre Channel. Note that FC-NVMe in lpfc is available as a technical preview.

  • Broadcom RoCE driver blacklisted.  The Broadcom RoCE driver, bnxt_re, is blacklisted to avoid several bugs that can cause a kernel panic when working with some Broadcom network interface cards, such as the Broadcom BCM57414 NetXtreme-E. These include a bug where a hotplug operation can cause a kernel panic when the bnxt_re driver is loaded for this hardware if virtual functions (VFs) are defined and the RDMA option is enabled for the card. A second bug is triggered when you attempt to create or remove VFs and the RDMA option is enabled in the card. RDMA is not supported on this hardware, so to resolve these issues, the driver module is blacklisted.

  • Intel Ethernet Connection XL710 network driver updated.  The Intel Ethernet Connection XL710 network driver, i40e, is updated to include a large number of upstream fixes and enhancements.

  • Intel Ethernet Connection E800 Series network driver added.  The Intel Ethernet Connection E800 Series network driver, ice, has been added to this kernel update release. This driver enables new hardware available from Intel.

  • Intel Ethernet Adaptive Virtual Function network driver renamed.  The Intel Ethernet Adaptive Virtual Function network driver previously known as i40evf has been renamed to iavf as it takes a more generic function and supports a wider range of hardware. The driver is updated for upstream changes including support for a new range of network interface cards.

  • Intel 2.5G Ethernet network driver added.  The Intel 2.5G Ethernet network driver, igc, has been added in this kernel update release. This driver enables new hardware available from Intel.

  • Avago MegaRAID SAS driver updated.  The Avago MegaRAID SAS driver, megaraid_sas, was updated for upstream patches and enhancements, including support for new MegaRAID Aero controllers.

  • Avago LSI MPT Fusion SAS 3.0 device driver updated to 27.101.00.00.  The Avago LSI MPT Fusion SAS 3.0 device driver, mpt3sas, was updated to 27.101.00.00 to include upstream patches and enhancements.
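
The blacklisting of the bnxt_re driver described in this list can be checked on an installed system by searching the modprobe configuration for a matching blacklist entry. The following is a minimal sketch only. The function name is_blacklisted is hypothetical, and the directories searched are the standard modprobe.d locations; this document does not state which file carries the entry, so that placement is an assumption.

```shell
#!/bin/sh
# Hypothetical helper: succeed if MODULE has a "blacklist" line in any
# *.conf file under the given directories. When no directory is given,
# the standard modprobe.d locations are searched (assumed placement).
# Usage: is_blacklisted MODULE [DIR...]
is_blacklisted() {
    module=$1
    shift
    [ "$#" -gt 0 ] || set -- /etc/modprobe.d /run/modprobe.d /lib/modprobe.d
    for dir in "$@"; do
        for conf in "$dir"/*.conf; do
            [ -f "$conf" ] || continue
            # Match "blacklist <module>" as a whole word at the start of a line.
            if grep -Eq "^[[:space:]]*blacklist[[:space:]]+${module}([[:space:]]|\$)" "$conf"; then
                return 0
            fi
        done
    done
    return 1
}

if is_blacklisted bnxt_re; then
    echo "bnxt_re is blacklisted"
else
    echo "no blacklist entry found for bnxt_re"
fi
```

The sketch matches the module name exactly as written and does not treat hyphens and underscores as equivalent.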

1.3 Compatibility

Oracle Linux maintains full user space compatibility with Red Hat Enterprise Linux, which is independent of the kernel version running underneath the operating system. Existing applications in user space will continue to run unmodified on the Unbreakable Enterprise Kernel Release 5 and no re-certifications are needed for RHEL certified applications.

To minimize impact on interoperability during releases, the Oracle Linux team works closely with third-party vendors whose hardware and software have dependencies on kernel modules. The kernel ABI for UEK R5 will remain unchanged in all subsequent updates to the initial release. In this release, there are changes to the kernel ABI relative to UEK R4 that require recompilation of third-party kernel modules on the system. Before installing UEK R5, verify its support status with your application vendor.

1.4 Certification of UEK R5 for Oracle products

Note that certification of different Oracle products on UEK R5 may not be immediately available at the time of a UEK R5 release. You should always check to ensure that the product you are using is certified for use on UEK R5 before upgrading or installing the kernel. Check certification at https://support.oracle.com/epmos/faces/CertifyHome.

Oracle Automatic Storage Management Cluster File System (Oracle ACFS) certification for different kernel versions is described in Document ID 1369107.1 available at https://support.oracle.com/oip/faces/secure/km/DocumentDisplay.jspx?id=1369107.1.

Oracle Automatic Storage Management Filter Driver (Oracle ASMFD) certification for different kernel versions is described in Document ID 2034681.1 available at https://support.oracle.com/oip/faces/secure/km/DocumentDisplay.jspx?id=2034681.1.