Chapter 3 Known Issues

This chapter describes the known issues for the Unbreakable Enterprise Kernel Release 5.

3.1 Unusable or Unavailable Features for Arm

This section calls out specific features that are known not to work, that remain untested, or that are known to have issues that make them unusable.

  • InfiniBand.  InfiniBand hardware is currently not supported for Arm architecture using UEK R5.

  • FibreChannel.  FibreChannel hardware is currently not supported for Arm architecture using UEK R5.

  • RDMA.  RDMA and any sub-features are not supported for Arm.

  • OCFS2.  The OCFS2 file system and all of the features described in Section 1.1.4.4, “OCFS2” are not supported for Arm.

  • Secure Boot.  The Secure Boot feature is currently not supported or available for Arm.

3.2 [aarch64] IOMMU issues

Performance issues, such as increased boot times, soft lockups, and crashes can occur on 64-bit Arm (aarch64) architecture that is running UEK R5 when the input–output memory management unit (IOMMU) feature is active. These issues have been observed on some Arm hardware using Mellanox CX-3 and CX-4 cards. However, note that similar issues could occur with different drivers on different hardware.

UEK R5 is configured to use swiotlb by default. To enable the use of the IOMMU feature, use iommu.passthrough=0 on the kernel command line. (Bug IDs 27687153, 27759954, 27812727, and 27862655)
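As an illustration, one way to add this option persistently on an Oracle Linux 7 system is with the grubby tool (this is an example approach; adjust it to your boot-loader configuration):

```shell
# Append iommu.passthrough=0 to the command line of every installed kernel
# so that DMA mappings go through the IOMMU rather than swiotlb.
grubby --update-kernel=ALL --args="iommu.passthrough=0"

# Verify that the option has been added to the boot entries.
grubby --info=ALL | grep args
```

A reboot is required for the change to take effect.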

3.3 [aarch64] Kdump issues

Several issues are noted when using Kdump on 64-bit Arm (aarch64) architecture platforms.

  • Kdump fails when using Mellanox ConnectX devices.  On systems with Mellanox hardware devices that use either the mlx4_core or the mlx5_core driver modules, Kexec fails to load the crash kernel and hangs while the mlx4_core or mlx5_core driver is initialized.

    The workaround is to prevent the driver from loading in the crash kernel by adding either rd.driver.blacklist=mlx4_core or rd.driver.blacklist=mlx5_core to the KDUMP_COMMANDLINE_APPEND option in /etc/sysconfig/kdump. Note that this workaround is only possible if Kdump is configured to store the vmcore file locally. If Kdump is configured to save the vmcore to a remote host over the Mellanox device, the workaround fails. (Bug IDs 27915989 and 27916214)

  • Kdump fails when configured to use remote dump target over an ixgbe device.  On systems where Kdump is configured to use a remote dump target over an ixgbe network device, the Kdump Vmcore Save Service is unable to save the vmcore file to the target destination. (Bug ID 27915827)

  • Kdump fails and hangs when configured to use a remote dump target over an igb device.  On systems where Kdump is configured to use a remote dump target over an igb network device, NETDEV WATCHDOG returns a timeout error and the network adapter is continually reset, resulting in a system hang when kexec attempts to load the crash kernel. (Bug ID 27916095)
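The Mellanox workaround described above can be applied by editing /etc/sysconfig/kdump and restarting the service. A sketch, assuming the mlx5_core driver and a locally stored vmcore (existing command-line options are site-specific and shown as "..."):

```shell
# In /etc/sysconfig/kdump, append the blacklist option to the existing
# value of KDUMP_COMMANDLINE_APPEND, for example:
#   KDUMP_COMMANDLINE_APPEND="... rd.driver.blacklist=mlx5_core"
# (use rd.driver.blacklist=mlx4_core instead for mlx4 hardware)

# Restart the kdump service so that the crash kernel is reloaded
# with the updated command line.
systemctl restart kdump
```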

3.4 [aarch64] CPU hotplug functionality not functional in KVM

Although CPU hotplug functionality is available in QEMU, the aarch64 Linux kernel is not yet able to handle the addition of new virtual CPUs to a running virtual machine. When QEMU is used to add a virtual CPU to a running virtual machine in KVM, an error is returned:

kvm_init_vcpu failed: Device or resource busy

CPU hotplug functionality is currently unavailable for UEK R5 on 64-bit Arm platforms. (Bug ID 28140386)

3.5 File System Issues

The following are known issues that are specific to file systems supported with Unbreakable Enterprise Kernel Release 5 Update 2.

3.5.1 ext4: Frequent repeated system shutdowns can cause file system corruption

If a system using ext4 is repeatedly and frequently shut down, the file system may be corrupted. This issue is considered to be a corner-case due to the difficulty required to replicate. The issue exists in upstream code and proposed patches are currently under review. (Bug ID 27547113)

3.5.2 xfs: xfs_repair fails to repair the corrupted link counts

If an xfs file system is repaired by using the xfs_repair command, and there are invalid inode counts, the utility may fail to repair the corrupted link counts and return errors while verifying the link counts. The issue is currently under investigation, but appears to be related to the xfsprogs-4.15-1 package released with UEK R5. The issue may not appear when using the earlier xfsprogs-4.5.0-18.0.1 version of this package available in the ol7_latest yum repository. (Bug ID 28070680)

3.6 RDMA Issues

The following issues are noted for RDMA:

  • ibacm service is disabled by default.  The ibacm service is disabled by default immediately after installation. This means that the ibacm service does not automatically start after a reboot. This is intended behavior. Requirements to use the ibacm service are application-specific. If your application requires this service, you may need to enable the service to start after reboot:

    # systemctl enable ibacm

    (Bug ID 28074471)

  • Error, some other host already uses address xxx.xxx.xxx.xxx.  The following error message might be triggered in certain instances:

    Error, some other host already uses address  xxx.xxx.xxx.xxx

    This issue is typically triggered when active-bonding is enabled, and you run the ifup ib-interface command.

    You can ignore this message, as the InfiniBand interface is brought up successfully. (Bug ID 28097516)

3.7 Docker Issues

The following are known Docker issues:

  • Running yum install within a container on an overlayfs file system can fail with the following error: 

    Rpmdb checksum is invalid: dCDPT(pkg checksums): package_name

    This error can break Dockerfile builds, but it is expected behavior from the kernel and is a known issue upstream (see https://github.com/docker/docker/issues/10180).

    The workaround is to run the following command before installing the package:

    # touch /var/lib/rpm/*

    Note that this issue is fixed in any Oracle Linux images available on the Docker Hub or Oracle Container Registry, but the issue could still be encountered when running any container based on a third-party image. (Bug ID 21804564)

  • Docker can fail when it uses the overlay2 storage driver on XFS-formatted storage.  A kernel patch has been applied to prevent overlay mounts on XFS if the ftype is not set to 1. This fix resolves an issue where XFS did not properly support the whiteout features of an overlay file system if d_type support was not enabled. If the Docker Engine is already using XFS-formatted storage with the overlay2 storage driver, an upgrade of the kernel can cause Docker to fail if the underlying XFS file system was not created with the -n ftype=1 option enabled. The root partition on Oracle Linux 7 is automatically formatted with -n ftype=0 where XFS is selected as the file system. Therefore, if you intend to use the overlay2 storage driver in this environment, you must format a separate device for this purpose. (Bug ID 25995797)
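Before using the overlay2 storage driver, you can check whether an existing XFS file system was created with d_type support by inspecting its ftype setting; a sketch, where the mount point and device name are examples:

```shell
# Report the ftype setting of the file system backing /var/lib/docker.
# ftype=1 means d_type is supported and overlay2 can be used safely.
xfs_info /var/lib/docker | grep ftype

# When formatting a separate device for Docker storage, enable ftype
# explicitly at creation time (device name is an example):
mkfs.xfs -n ftype=1 /dev/sdb1
```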

3.8 IOMMU kernel option enabled by default

Starting with UEK R5U1, IOMMU functionality is enabled by default in the x86_64 kernel. This change better facilitates single root input-output virtualization (SR-IOV) and other virtualization extensions, but it is also known to result in boot failures on certain hardware that cannot complete device discovery when IOMMU is enabled. The status of this feature no longer appears in /proc/cmdline as iommu=on, and the feature may need to be explicitly disabled with a kernel command-line option if a boot failure occurs. As an alternative workaround, you can disable IOMMU or Intel VT-d in your system ROM by following your vendor's instructions.

These boot failure issues have been observed on equipment with certain Broadcom network devices, such as HP Gen8 servers. For more detailed information, see https://support.hpe.com/hpsc/doc/public/display?docId=emr_na-c04565693.
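As an example of the kernel command-line workaround, the IOMMU can be disabled at boot. The exact option is platform-specific and not named in this note; intel_iommu=off is commonly used on Intel hardware and amd_iommu=off on AMD hardware, so confirm the appropriate option for your system:

```shell
# Add intel_iommu=off to every installed kernel entry (Intel platforms;
# use amd_iommu=off on AMD hardware).
grubby --update-kernel=ALL --args="intel_iommu=off"

# After rebooting, /proc/cmdline shows the options the kernel booted with.
cat /proc/cmdline
```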

3.9 LXC Issues

The following are known LXC issues:

  • LXC read-only ip_local_port_range parameter.  With lxc-1.1 or later and UEK R5, ip_local_port_range is a read-writable parameter under /proc/sys/net/ipv4 in an Oracle Linux container rather than being read-only. (Bug ID 21880467)

3.10 NVMe device names change across reboots

Since UEK R5 adds support for NVMe subsystems and multipathing, enumerated device names generated by the kernel are not stable. This is similar to the way that other block devices are handled by the kernel. If you use enumerated kernel instance names to handle mounts in your fstab, the mounts may fail or behave unpredictably.

Never use enumerated kernel instance names when referring to block devices. Instead, use the UUID, partition label or file system label to refer to any block device, including an NVMe device. If you are uncertain of the device UUID or labels, use the blkid command to view this information.

Prior to multipathing, a subsystem number would typically map onto the controller number. Therefore, you could assume that the subsystem at /dev/nvme0n1 was affiliated with the controller at /dev/nvme0. This is no longer the case. With multipathing enabled, a subsystem can have multiple controllers, in which case /dev/nvme0n1 could just as easily be affiliated with the controllers at /dev/nvme1 and /dev/nvme2. There is no specific correlation between the subsystem device name and the controller device name.
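For example, to mount an NVMe namespace by UUID rather than by enumerated name (the device name and UUID shown here are illustrative placeholders):

```shell
# Look up the file-system UUID of the namespace partition.
blkid /dev/nvme0n1p1

# Refer to the device by that UUID in /etc/fstab rather than by its
# enumerated kernel name, for example (values are hypothetical):
#   UUID=3f1b2c4d-example /data xfs defaults 0 0
```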

3.11 NVMe device hotplug unplug procedure change

Since UEK R5 adds support for NVMe subsystems and multipathing, enumerated device names generated by the kernel are not stable. This means that the procedure for identifying and unplugging NVMe devices using hotplug functionality is slightly different from the procedure that you may have followed with other kernel releases. This note describes the steps that you should take to identify, power down, and unplug the appropriate device.

  1. Use the lsblk command to identify the disk that you wish to remove according to its WWN or UUID. For example:

    # lsblk -o +UUID,WWN,MODEL

    Take note of the enumerated kernel instance name that has been assigned to the device. For example, this may be nvme0n1. It is very important to understand that the device name does not necessarily map onto the controller or PCIe bridge that it is attached to. See Section 3.10, “NVMe device names change across reboots” for more information.

  2. Search for the device path to obtain the PCI domain identifier for the device:

    # find /sys/devices -iname nvme0n1
    /sys/devices/pci0000:85/0000:85:01.0/0000:8d:00.0/nvme/nvme1/nvme0n1

    Note that 0000:8d:00.0 in the returned path for the device is the PCI domain identifier for the device. You need this information to proceed.

  3. Obtain the physical slot number for the NVMe drive. Under UEK R5, the slot is bound to the NVMe device directly and not to the PCIe controller. You can find the slot number for the NVMe device by running the lspci command and querying the PCI domain identifier for the device in verbose mode:

    # lspci -s 0000:8d:00.0 -vvv
    8d:00.0 Non-Volatile memory controller: Intel Corporation Express Flash NVMe
    P4500 (prog-if 02 [NVM Express])
            Subsystem: Oracle/SUN Device 4871
            Physical Slot: 104-1
    … 

    Note that the Physical Slot number for the device in this example is 104-1. Take note of this value to proceed.

  4. Use the Physical Slot number for the device to find its bus interface:

    # find /sys -iname "104-1"
    /sys/bus/pci/slots/104-1
  5. Use the returned bus interface path to power off the NVMe drive:

    # echo 0 > /sys/bus/pci/slots/104-1/power

    Depending on your hardware, the blue disk LED on the front panel of the system may light up to indicate that you can safely remove the disk drive.
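The steps above can be collected into a short sketch; the device name, PCI address, and slot number are the examples used in the procedure and will differ on your system:

```shell
# 1. Identify the disk to remove and note its enumerated name.
lsblk -o +UUID,WWN,MODEL

# 2. Find the device path to obtain its PCI domain identifier
#    (example name nvme0n1).
find /sys/devices -iname nvme0n1

# 3. Read the physical slot number from verbose lspci output
#    (example PCI domain identifier 0000:8d:00.0).
lspci -s 0000:8d:00.0 -vvv | grep "Physical Slot"

# 4-5. Power off the drive through its slot entry (example slot 104-1),
#      then unplug it.
echo 0 > /sys/bus/pci/slots/104-1/power
```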

3.12 KVM guest crashes when using memory hotplug operation to shrink available memory

A KVM guest may crash if the guest memory is reduced from 96GB or more to 2GB using a memory hotplug operation. Although this issue is logged for UEK R5, similar issues have been noted for RHCK. The issue is expected behavior and relates to how memory ballooning works. Shrinking guest memory by a large amount can result in Out Of Memory (OOM) conditions, and processes are killed automatically if memory shrinks to below the amount that is in use by the guest operating system at the time. (Bug ID 27968656)

3.13 Kernel warning when allocating memory for Avago MegaRAID SAS 9460-16i controller

An issue that causes a kernel warning when loading the megaraid_sas module for the Avago MegaRAID SAS 9460-16i controller is introduced in this kernel release. The issue occurs when the kernel attempts to allocate memory for the IO request frame pool.

The issue is resolved by setting the contiguous memory allocation (cma) value to 64M at boot. To do so, edit the /etc/default/grub file and update the GRUB_CMDLINE_LINUX line to include the option cma=64M. For example:

GRUB_CMDLINE_LINUX="crashkernel=auto rd.lvm.lv=ol7/root rd.lvm.lv=ol7/swap
rhgb quiet cma=64M"

(Bug ID 29635963, 29618702)
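After editing /etc/default/grub, the GRUB configuration must be regenerated before the cma=64M setting takes effect at the next boot. On a BIOS-based Oracle Linux 7 system this is typically done as follows (the output path differs on UEFI systems):

```shell
# Rebuild the GRUB configuration so that the updated
# GRUB_CMDLINE_LINUX options are applied at the next boot.
grub2-mkconfig -o /boot/grub2/grub.cfg
```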