3 Known Issues

This chapter describes the known issues for the Unbreakable Enterprise Kernel Release 5.

Unusable or Unavailable Features for Arm

This section calls out specific features that are known not to work, remain untested, or have issues that make them unusable.

  • InfiniBand

    InfiniBand hardware is currently not supported for Arm architecture using UEK R5.

  • Fibre Channel

    Fibre Channel hardware is currently not supported for Arm architecture using UEK R5.

  • RDMA

    RDMA and any sub-features are not supported for Arm.

  • OCFS2

    The OCFS2 file system is not supported for Arm.

  • Secure Boot

    The Secure Boot feature is currently not supported or available for Arm.

[aarch64] IOMMU issues

Performance issues, such as increased boot times, soft lockups, and crashes, can occur on the 64-bit Arm (aarch64) architecture running UEK R5 when the input/output memory management unit (IOMMU) feature is active. These issues have been observed on some Arm hardware using Mellanox CX-3 and CX-4 cards; however, similar issues could occur with different drivers on different hardware.

UEK R5 is configured to use swiotlb by default. To enable the use of the IOMMU feature, use iommu.passthrough=0 on the kernel command line. (Bug IDs 27687153, 27759954, 27812727, and 27862655)
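
If you choose to enable IOMMU, a minimal sketch of the change, assuming that you edit the GRUB_CMDLINE_LINUX line in /etc/sysconfig/grub as described later in this chapter, is as follows; the existing options on this line will differ on your system:

GRUB_CMDLINE_LINUX="crashkernel=auto rhgb quiet iommu.passthrough=0"

Then regenerate the GRUB configuration and reboot, for example on a legacy BIOS system:

# grub2-mkconfig -o /boot/grub2/grub.cfg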

[aarch64] Kdump issues

Note the following issues when using Kdump on the 64-bit Arm (aarch64) architecture.

  • Kdump fails when using Mellanox ConnectX devices

    On systems with Mellanox hardware devices that use either the mlx4_core or the mlx5_core driver module, Kexec fails to load the crash kernel and hangs while the mlx4_core or mlx5_core driver is being initialized.

    The workaround is to disable loading the driver in the crash kernel by adding either rd.driver.blacklist=mlx4_core or rd.driver.blacklist=mlx5_core to the KDUMP_COMMANDLINE_APPEND option in /etc/sysconfig/kdump, as shown in the sketch after this list. Note that this workaround is only possible if you have configured Kdump to store the vmcore file locally. If Kdump is configured to save the vmcore to a remote host over the affected device, this workaround fails. (Bug IDs 27915989 and 27916214)

  • Kdump fails and hangs when configured to use a remote dump target over an igb device

    On systems where Kdump is configured to use a remote dump target over an igb network device, NETDEV WATCHDOG returns a timeout error and the network adapter is continually reset, resulting in a system hang when kexec attempts to load the crash kernel. (Bug ID 27916095)
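
For a system that uses the mlx4_core driver, a minimal sketch of the /etc/sysconfig/kdump edit follows; the other options shown on the KDUMP_COMMANDLINE_APPEND line are illustrative defaults and will vary on your system:

KDUMP_COMMANDLINE_APPEND="irqpoll nr_cpus=1 reset_devices rd.driver.blacklist=mlx4_core"

Restart the kdump service so that the change takes effect:

# systemctl restart kdump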

[aarch64] CPU hot plug not functional in KVM

Although CPU hot plug functionality is available in QEMU, the aarch64 Linux kernel is not yet able to handle the addition of new virtual CPUs to a running virtual machine. When QEMU is used to add a virtual CPU to a running virtual machine in KVM, the following error is returned:

kvm_init_vcpu failed: Device or resource busy

CPU hot plug functionality is currently unavailable for UEK R5 on 64-bit Arm platforms. (Bug ID 28140386)

[aarch64] Networking fails for Mellanox ConnectX-3 Pro Ethernet controller

Mellanox networking may fail on Arm platform systems using the Mellanox ConnectX-3 Pro Ethernet controller with certain firmware versions. The issue typically results in dmesg output similar to the following:

...
[   21.605491] mlx4_core 0001:01:00.0: Failed to initialize event queue table, aborting
[   22.660967] mlx4_core: probe of 0001:01:00.0 failed with error -12
[   22.704966] mlx4_en: Mellanox ConnectX HCA Ethernet driver v4.0-0
[   22.711355] mlx4_en 0000:01:00.0: Activating port:1
[   22.742948] mlx4_en: 0000:01:00.0: Port 1: Using 32 TX rings
[   22.748600] mlx4_en: 0000:01:00.0: Port 1: Using 8 RX rings
[   22.754437] mlx4_en: 0000:01:00.0: Port 1: Initializing port
[   22.760602] mlx4_en 0000:01:00.0: registered PHC clock
[   22.766283] mlx4_en 0000:01:00.0: Activating port:2
[   22.766956] mlx4_core 0000:01:00.0 enp1s0: renamed from eth0
[   22.778621] mlx4_en: 0000:01:00.0: Port 2: Failed to allocate NIC resources
[   22.785776] mlx4_en 0000:01:00.0: removed PHC
[   25.488635] mlx4_en: enp1s0: Steering Mode 1
...

This issue can be resolved by using the maxcpus=8 kernel parameter at boot to limit the number of CPUs that are available during the boot process. Once the system has fully booted, systemd enables all available CPUs, so there is no performance impact.

To set this parameter so that it is used for all kernels when the system boots, edit the GRUB configuration. You can do this by editing the GRUB_CMDLINE_LINUX line in /etc/sysconfig/grub in a text editor, for example:

GRUB_CMDLINE_LINUX="crashkernel=auto resume=/dev/mapper/linux1-swap rd.lvm.lv=linux1/root \
  rd.lvm.lv=linux1/swap rhgb quiet maxcpus=8"

If you are booting using legacy BIOS, update your GRUB configuration with the changes so that they are used on the next boot by running:

# grub2-mkconfig -o /boot/grub2/grub.cfg

Alternatively, if you are booting using UEFI, run:

# grub2-mkconfig -o /boot/efi/EFI/redhat/grub.cfg

This issue is only present in later firmware versions for this hardware. The issue is not replicated on cards with the HVE102M-0.2 firmware, but appears when the firmware is upgraded to HVE104N-1.12. The issue can also be avoided by downgrading the card firmware. (Bug ID 30877943)
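
To check the firmware version that is currently installed on a card, you can query the driver information for the network interface; the interface name shown here is taken from the example dmesg output above and will differ on your system. The firmware-version field in the output identifies the installed firmware:

# ethtool -i enp1s0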

File System Issues

The following are known issues that are specific to file systems supported with Unbreakable Enterprise Kernel Release 5 Update 3.

ext4: Frequent repeated system shutdowns can cause file system corruption

If a system using ext4 is repeatedly and frequently shut down, the file system may become corrupted. This issue is considered a corner case because it is difficult to reproduce. The issue exists in upstream code, and proposed patches are currently under review. (Bug ID 27547113)

xfs: xfs_repair fails to repair the corrupted link counts

If an xfs file system with invalid inode counts is repaired by using the xfs_repair command, the utility may fail to repair the corrupted link counts and may return errors while verifying the link counts. The issue is currently under investigation, but appears to be related to the xfsprogs-4.15-1 package that is released with UEK R5. The issue might not appear when using the earlier xfsprogs-4.5.0-18.0.1 package version, which is available in the ol7_latest yum repository. (Bug ID 28070680)
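
As a possible sketch of a workaround, you might downgrade to the earlier package version from the ol7_latest yum repository; verify first that the downgrade does not conflict with other packages on your system:

# yum downgrade xfsprogs-4.5.0-18.0.1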

RDMA Issues

The following issues are noted for RDMA:

  • ibacm service is disabled by default

    The ibacm service is disabled by default immediately after installation, which means that it does not automatically start after a reboot. This behavior is expected. Note that requirements for using the ibacm service are application-specific. If your application requires this service, you may need to enable it so that it starts automatically after a reboot:

    # systemctl enable ibacm

    (Bug ID 28074471)

  • Error: some other host already uses address xxx.xxx.xxx.xxx

    The following error message might be triggered in certain instances:

    Error, some other host already uses address xxx.xxx.xxx.xxx

    This issue is typically triggered when active-bonding is enabled and you then run the ifup command on an InfiniBand interface.

    You can ignore this message, as the InfiniBand interface is brought up successfully. (Bug ID 28097516)

Docker Issues

The following are known Docker issues:

  • yum install can fail within a container on an overlayfs file system

    Running the yum install command within a container on an overlayfs file system can fail with the following error:

    Rpmdb checksum is invalid: dCDPT(pkg checksums): package_name

    This error can break Dockerfile builds, but it is expected behavior from the kernel and a known issue upstream. See https://github.com/docker/docker/issues/10180.

    The workaround is to run touch /var/lib/rpm/* before installing the package.

    Note that this issue is fixed in any Oracle Linux images that are available on the Docker Hub or Oracle Container Registry; however, the issue could still be encountered when running any container that is based on a third-party image. (Bug ID 21804564)

  • Docker can fail where it uses the overlay2 storage driver on XFS-formatted storage

    A kernel patch has been applied to prevent overlay mounts on XFS if the ftype is not set to 1. This fix resolves an issue where XFS did not properly support the whiteout features of an overlay file system if d_type support was not enabled. If the Docker Engine is already using XFS-formatted storage with the overlay2 storage driver, an upgrade of the kernel can cause Docker to fail if the underlying XFS file system is not created with the -n ftype=1 option enabled. The root partition on Oracle Linux 7 is automatically formatted with -n ftype=0 where XFS is selected as the file system. Therefore, if you intend to use the overlay2 storage driver in this environment, you must format a separate device for this purpose. (Bug ID 25995797)
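
    As a sketch, you can check whether an existing XFS file system was created with d_type support, and format a separate device correctly for use with overlay2; the mount point and device name used here are hypothetical examples:

    # xfs_info /var/lib/docker | grep ftype
    # mkfs.xfs -n ftype=1 /dev/sdb1

    In the xfs_info output, ftype=1 indicates that the file system supports the whiteout features that the overlay2 storage driver requires.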

IOMMU kernel option enabled by default

Starting with UEK R5U1, IOMMU functionality is enabled by default in the x86_64 kernel. This change better facilitates single root input/output virtualization (SR-IOV) and other virtualization extensions, but is also known to result in boot failures on certain hardware that cannot complete discovery when IOMMU is enabled. The status of this feature no longer appears as iommu=on in /proc/cmdline reporting, and the feature may need to be explicitly disabled with a kernel command-line option if a boot failure occurs. As an alternative workaround, you can disable IOMMU or Intel VT-d in your system ROM by following your vendor's instructions.

These boot failures have been observed on equipment with certain Broadcom network devices, such as HP Gen8 servers. For more detailed information, see https://support.hpe.com/hpsc/doc/public/display?docId=emr_na-c04565693.
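
As a sketch of the kernel command-line workaround, assuming an Intel platform, you might append intel_iommu=off to the GRUB_CMDLINE_LINUX line in /etc/sysconfig/grub and then regenerate the GRUB configuration as described earlier in this chapter; the exact parameter depends on your platform, and the existing options on this line will differ on your system:

GRUB_CMDLINE_LINUX="crashkernel=auto rhgb quiet intel_iommu=off"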

LXC Issues

The following are known LXC issues:

  • LXC read-only ip_local_port_range parameter

    With lxc-1.1 or later and UEK R5, ip_local_port_range is a read-writable parameter under /proc/sys/net/ipv4 in an Oracle Linux container rather than being read-only. (Bug ID 21880467)
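
    As an illustration, assuming a running Oracle Linux container named ol7 (the container name is a hypothetical example), you can confirm from the host that the parameter is writable inside the container:

    # lxc-attach -n ol7 -- sysctl -w net.ipv4.ip_local_port_range="32768 61000"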

NVMe device names change across reboots

Because UEK R5 adds support for NVMe subsystems and multipathing, enumerated device names that are generated by the kernel are not stable. This behavior is similar to the way that other block devices are handled by the kernel. If you use enumerated kernel instance names to handle mounts in your fstab file, the mounts may fail or behave unpredictably.

Never use enumerated kernel instance names when referring to block devices. Instead, use the UUID, partition label, or file system label to refer to any block device, including an NVMe device. If you are uncertain of the device UUID or labels, use the blkid command to view this information.
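
For example, a sketch of identifying an NVMe device by UUID; the mount point and UUID shown in the fstab entry are hypothetical placeholders for the values on your system:

# blkid /dev/nvme0n1

Use the UUID that is returned in your fstab entry rather than the enumerated name:

UUID=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx  /data  xfs  defaults  0 0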

Prior to multipathing, a subsystem number would typically map to the controller number. Therefore, you could assume that the subsystem at /dev/nvme0n1 was affiliated with the controller at /dev/nvme0. This correlation is no longer the case. With multipathing enabled, a subsystem can have multiple controllers, in which case /dev/nvme0n1 could just as easily be affiliated with the controllers at /dev/nvme1 and /dev/nvme2. Currently, no specific correlation between the subsystem device name and the controller device name exists.
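
If the nvme-cli package is installed, you can display how controllers map to subsystems on your system; this is a sketch, and the output varies depending on your hardware:

# nvme list-subsys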

NVMe device hotplug unplug procedure change

Because UEK R5 adds support for NVMe subsystems and multipathing, enumerated device names that are generated by the kernel are not stable. The result is that the procedure for identifying and unplugging NVMe devices by using hotplug functionality is slightly different than the procedure that you may have followed when using other kernel releases.

Perform the following steps to identify, power down, and unplug the appropriate device:

  1. Identify the disk that you wish to remove, according to its WWN or UUID, by using the lsblk command:

    # lsblk -o +UUID,WWN,MODEL

    Take note of the enumerated kernel instance name that is assigned to the device; for example: nvme0n1.

    Important:

    The device name does not necessarily map to the controller or PCIe bridge to which it is attached. See NVMe device names change across reboots.

  2. Search for the device path to obtain the PCI domain identifier for the device:

    # find /sys/devices -iname nvme0n1
    /sys/devices/pci0000:85/0000:85:01.0/0000:8d:00.0/nvme/nvme1/nvme0n1

    Note that 0000:8d:00.0 in the returned path for the device is the PCI domain identifier for the device. You will need this information to proceed.

  3. Obtain the physical slot number for the NVMe drive. Under UEK R5, the slot is bound to the NVMe device directly, not the PCIe controller.

    You can locate the slot number for the NVMe device by running the lspci command and by querying the PCI domain identifier for the device in verbose mode, for example:

    # lspci -s 0000:8d:00.0 -vvv
    8d:00.0 Non-Volatile memory controller: Intel Corporation Express Flash NVMe P4500 (prog-if 02 [NVM Express])
            Subsystem: Oracle/SUN Device 4871
            Physical Slot: 104-1
    … 

    Note that the Physical Slot number for the device in the previous example is 104-1. This value is required to proceed.

  4. Use the Physical Slot number for the device to find its bus interface:

    # find /sys -iname "104-1"
    /sys/bus/pci/slots/104-1

  5. Use the returned bus interface path to power off the NVMe drive:

    # echo 0 > /sys/bus/pci/slots/104-1/power

    Depending on your hardware, the blue disk LED located on the front panel of the system may light to indicate that you can safely remove the disk drive.
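
If you later insert a replacement drive into the same slot, you can power the slot back on by using the same bus interface path; this sketch uses the slot number from the example above:

# echo 1 > /sys/bus/pci/slots/104-1/power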

Kernel warning when allocating memory for Avago MegaRAID SAS 9460-16i controller

An issue that causes a kernel warning when loading the megaraid_sas module for the Avago MegaRAID SAS 9460-16i controller was introduced in this kernel release. The issue occurs when the kernel attempts to allocate memory for the IO request frame pool.

The issue is resolved by setting the contiguous memory allocation (CMA) value to 64M at boot. To do so, edit the /etc/default/grub file and update the GRUB_CMDLINE_LINUX line to include the option cma=64M. For example:

GRUB_CMDLINE_LINUX="crashkernel=auto rd.lvm.lv=ol7/root rd.lvm.lv=ol7/swap
rhgb quiet cma=64M"
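
After editing the file, regenerate the GRUB configuration so that the change takes effect on the next boot, as described earlier in this chapter; the path shown is for legacy BIOS, and UEFI systems use the path from the earlier example:

# grub2-mkconfig -o /boot/grub2/grub.cfg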

(Bug IDs 29635963 and 29618702)

KVM guest crashes when using memory hotplug operation to shrink available memory

A KVM guest may crash if the guest memory is reduced from 96GB or more down to 2GB by using a memory hotplug operation. Although this issue is logged for UEK R5, similar issues have been noted for RHCK. This behavior is expected and relates to how memory ballooning works. Shrinking guest memory by large amounts can result in out-of-memory (OOM) conditions: processes are killed automatically if memory shrinks to below the amount that is currently in use by the guest operating system. (Bug ID 27968656)