Chapter 3 Known Issues

This chapter describes the known issues for the Unbreakable Enterprise Kernel Release 5.

3.1 Unusable or Unavailable Features for Arm

This section calls out specific features that are known not to work, remain untested, or have issues that make them unusable.

  • InfiniBand.  InfiniBand hardware is currently not supported for Arm architecture using UEK R5.

  • FibreChannel.  FibreChannel hardware is currently not supported for Arm architecture using UEK R5.

  • RDMA.  RDMA and any sub-features that are described in Section 1.1.6, “RDMA” are not supported for Arm.

  • OCFS2.  The OCFS2 file system and all of the features described in Section 1.1.4.3, “OCFS2” are not supported for Arm.

  • Secure Boot.  The Secure Boot feature that is described in Section 1.1.7, “Security” is currently not supported or available for Arm.

3.2 [aarch64] IOMMU issues

Performance issues, such as increased boot times, soft lockups, and crashes, can occur on 64-bit Arm (aarch64) systems running UEK R5 when the input-output memory management unit (IOMMU) feature is active. These issues have been observed on some Arm hardware using Mellanox CX-3 and CX-4 cards. However, note that similar issues could occur with different drivers on different hardware.

UEK R5 is configured to use swiotlb by default. To enable the use of the IOMMU feature, use iommu.passthrough=0 on the kernel command line. (Bug IDs 27687153, 27759954, 27812727, and 27862655)
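
For example, a minimal way to add this option persistently, assuming your system boots with GRUB 2 and has the grubby utility installed, is to append the option to the kernel command line for all installed kernels and then reboot:

# grubby --update-kernel=ALL --args="iommu.passthrough=0"
# reboot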

3.3 [aarch64] Kdump issues

Several issues are noted when using Kdump on 64-bit Arm (aarch64) architecture platforms.

  • Kdump fails when using Mellanox ConnectX devices.  On systems with Mellanox hardware devices that use either the mlx4_core or the mlx5_core driver modules, Kexec fails to load the crash kernel and hangs while the mlx4_core or mlx5_core driver is initialized.

    The workaround is to disable loading the driver in the crash kernel by adding either rd.driver.blacklist=mlx4_core or rd.driver.blacklist=mlx5_core to the KDUMP_COMMANDLINE_APPEND option in /etc/sysconfig/kdump. Note that this solution is only possible if you have configured Kdump to store the vmcore file locally. If Kdump is configured to save the vmcore to a remote host over the affected Mellanox device, this workaround fails. (Bug IDs 27915989 and 27916214)
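
    For example, a minimal sketch of this change, assuming a system that uses the mlx5_core driver (the ellipsis represents the existing options already present in the file), appends the blacklist option and then restarts the kdump service so that the crash kernel configuration is rebuilt:

    # grep KDUMP_COMMANDLINE_APPEND /etc/sysconfig/kdump
    KDUMP_COMMANDLINE_APPEND="... rd.driver.blacklist=mlx5_core"
    # systemctl restart kdump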

  • Kdump fails when configured to use remote dump target over an ixgbe device.  On systems where Kdump is configured to use a remote dump target over an ixgbe network device, the Kdump Vmcore Save Service is unable to save the vmcore file to the target destination. (Bug ID 27915827)

  • Kdump fails and hangs when configured to use a remote dump target over an igb device.  On systems where Kdump is configured to use a remote dump target over an igb network device, NETDEV WATCHDOG returns a timeout error and the network adapter is continually reset, resulting in a system hang when kexec attempts to load the crash kernel. (Bug ID 27916095)

3.4 [aarch64] CPU hotplug functionality not functional in KVM

Although CPU hotplug functionality is available in QEMU, the aarch64 Linux kernel is not yet able to handle the addition of new virtual CPUs to a running virtual machine. When QEMU is used to add a virtual CPU to a running virtual machine in KVM, an error is returned:

kvm_init_vcpu failed: Device or resource busy

CPU hotplug functionality is currently unavailable for UEK R5 on 64-bit Arm platforms. (Bug ID 28140386)

3.5 [aarch64] KVM guests that include a random number generator device are unable to boot

If a virtual machine is created on a 64-bit Arm host system and the virtual machine is configured to include a random number generator device, the guest does not complete boot and returns errors similar to the following:

[2.653535] module virtio_ring: overflow in relocation type 275 val
ffff1f85c31d7178 

The inclusion of a random number generator device is not a default option and can only be triggered by using the --rng random option. Oracle is investigating a fix for this issue. (Bug ID 28954789)

3.6 File System Issues

The following are known issues that are specific to file systems supported with Unbreakable Enterprise Kernel Release 5 Update 1.

3.6.1 ext4: Frequent repeated system shutdowns can cause file system corruption

If a system using ext4 is repeatedly and frequently shut down, the file system may become corrupted. This issue is considered to be a corner case because it is difficult to replicate. The issue exists in upstream code and proposed patches are currently under review. (Bug ID 27547113)

3.6.2 xfs: xfs_repair allows the corruption of a directory to a symbolic link

An issue exists where, if an xfs directory becomes corrupted and appears as a symbolic link, the xfs_repair command does not detect the error and allows the corruption to remain. The issue is under investigation. (Bug ID 28082433)

3.6.3 xfs: xfs_repair fails to repair the corrupted link counts

If an xfs file system is repaired by using the xfs_repair command, and there are invalid inode counts, the utility may fail to repair the corrupted link counts and return errors while verifying the link counts. The issue is currently under investigation, but appears to be related to the xfsprogs-4.15-1 package released with UEK R5. The issue may not appear when using the earlier xfsprogs-4.5.0-18.0.1 version of this package available in the ol7_latest yum repository. (Bug ID 28070680)

3.7 RDMA Issues

The following issues are noted for RDMA:

  • ibacm service is disabled by default.  The ibacm service is disabled by default immediately after installation. This means that the ibacm service does not automatically start after a reboot. This is intended behavior. Requirements to use the ibacm service are application-specific. If your application requires this service, you may need to enable the service to start after reboot:

    # systemctl enable ibacm

    (Bug ID 28074471)

  • Issues when Network Manager control is enabled or disabled.  Several issues have been observed when NM_CONTROLLED=yes is set for an interface that is used for RDMA. For this reason, Oracle recommends setting NM_CONTROLLED=no in the interface configuration. However, when Network Manager control is disabled, the CONNECTED_MODE=yes parameter is ignored for InfiniBand interfaces. You can work around this issue by setting connected mode manually with the following command:

    # echo connected > /sys/class/net/ib0/mode

    where ib0 is the interface that you wish to change mode for. You may need to run this command after a reboot. (Bug ID 28074921)

  • Error, some other host already uses address xxx.xxx.xxx.xxx.  The following error message might be triggered in certain instances:

    Error, some other host already uses address  xxx.xxx.xxx.xxx

    The following are the two instances in which this error message might be triggered:

    • When active-bonding is enabled, and you run the ifup ib-interface command.

    • When you run the systemctl start rdma command.

    You can ignore this message, as in both cases, the InfiniBand interface is brought up successfully. (Bug ID 28097516)

  • Unloading resilient_rdmaip can result in issues accessing InfiniBand interfaces.  Unloading the resilient_rdmaip module can cause issues accessing and configuring InfiniBand interfaces. Attempts to use the ifconfig command on an InfiniBand interface after the module has been unloaded result in a 'Device not found' message, and no IP information for these interfaces is available. The interfaces need to be re-initialized with the ifup command, as shown in the example that follows. The active bonding feature is unaffected by this issue. (Bug ID 28123680)
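
    For example, assuming an InfiniBand interface named ib0, you can re-initialize the interface and confirm that its IP information is available again by running:

    # ifup ib0
    # ip addr show ib0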

3.8 Docker Issues

The following are known Docker issues:

  • Running yum install within a container on an overlayfs file system can fail with the following error: 

    Rpmdb checksum is invalid: dCDPT(pkg checksums): package_name

    This error can break Dockerfile builds but is expected behavior from the kernel and is a known issue upstream (see https://github.com/docker/docker/issues/10180).

    The workaround is to run touch /var/lib/rpm/* before installing the package.
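
    For example, assuming httpd is the package being installed inside the container (an illustrative choice), the workaround can be applied in a single step, either interactively or in a Dockerfile RUN instruction:

    # touch /var/lib/rpm/* && yum -y install httpd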

    Note that this issue is fixed in any Oracle Linux images available on the Docker Hub or Oracle Container Registry, but the issue could still be encountered when running any container based on a third-party image. (Bug ID 21804564)

  • Docker can fail when it uses the overlay2 storage driver on XFS-formatted storage.  A kernel patch has been applied to prevent overlay mounts on XFS if the ftype is not set to 1. This fix resolves an issue where XFS did not properly support the whiteout features of an overlay file system if d_type support was not enabled. If the Docker Engine is already using XFS-formatted storage with the overlay2 storage driver, an upgrade of the kernel can cause Docker to fail if the underlying XFS file system was not created with the -n ftype=1 option enabled. The root partition on Oracle Linux 7 is automatically formatted with -n ftype=0 when XFS is selected as the file system. Therefore, if you intend to use the overlay2 storage driver in this environment, you must format a separate device for this purpose. (Bug ID 25995797)
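
    For example, a minimal sketch of checking whether an existing XFS file system was created with d_type support, and of formatting a separate device for overlay2 use, follows; the mount point /var/lib/docker and the device /dev/sdb1 are illustrative values:

    # xfs_info /var/lib/docker | grep ftype
    naming   =version 2              bsize=4096   ascii-ci=0 ftype=0
    # mkfs.xfs -n ftype=1 /dev/sdb1

    A value of ftype=0 in the xfs_info output indicates that d_type support is not enabled on that file system and that a separately formatted device is required for overlay2 storage.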

3.9 IOMMU kernel option enabled by default

Starting with UEK R5U1, IOMMU functionality is enabled by default in the x86_64 kernel. This change better facilitates single root input-output virtualization (SR-IOV) and other virtualization extensions, but it is also known to result in boot failures on certain hardware that cannot complete discovery when IOMMU is enabled. The status of this feature no longer appears as iommu=on in /proc/cmdline, and the feature may need to be explicitly disabled by using a kernel command-line option if a boot failure occurs. As an alternate workaround, you can disable IOMMU or Intel VT-d in your system ROM by following your vendor instructions.
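
For example, one possible way to disable the feature persistently, assuming an Intel-based system where the intel_iommu=off kernel parameter applies, is to add the option with grubby and reboot; if the system does not boot at all, the same option can be added temporarily by editing the kernel entry from the GRUB menu:

# grubby --update-kernel=ALL --args="intel_iommu=off"
# reboot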

These boot failure issues have been observed on equipment with certain Broadcom network devices, such as HP Gen8 servers. For more detailed information, see https://support.hpe.com/hpsc/doc/public/display?docId=emr_na-c04565693.

3.10 LXC Issues

The following are known LXC issues:

  • LXC read-only ip_local_port_range parameter.  With lxc-1.1 or later and UEK R5, ip_local_port_range is a read-writable parameter under /proc/sys/net/ipv4 in an Oracle Linux container rather than being read-only. (Bug ID 21880467)
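
    For example, from within a running Oracle Linux container, the parameter can be both read and written; the following is a minimal sketch in which the port range values are illustrative:

    # cat /proc/sys/net/ipv4/ip_local_port_range
    32768  60999
    # echo "32768 61000" > /proc/sys/net/ipv4/ip_local_port_range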

3.11 NVMe device names change across reboots

Since UEK R5 adds support for NVMe subsystems and multipathing, enumerated device names generated by the kernel are not stable. This is similar to the way that other block devices are handled by the kernel. If you use enumerated kernel instance names to handle mounts in your fstab, the mounts may fail or behave unpredictably.

Never use enumerated kernel instance names when referring to block devices. Instead, use the UUID, partition label or file system label to refer to any block device, including an NVMe device. If you are uncertain of the device UUID or labels, use the blkid command to view this information.
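
For example, a minimal sketch of identifying an NVMe namespace by UUID and referring to it in /etc/fstab follows; the device name, UUID, and mount point are illustrative values:

# blkid /dev/nvme0n1p1
/dev/nvme0n1p1: UUID="6f06ee47-8b8a-4c64-b1ee-1b5ba809c83d" TYPE="xfs"

The corresponding /etc/fstab entry then refers to the UUID rather than to the enumerated device name:

UUID=6f06ee47-8b8a-4c64-b1ee-1b5ba809c83d  /data  xfs  defaults  0 0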

Prior to multipathing, a subsystem number would typically map onto the controller number. Therefore, you could assume that the subsystem at /dev/nvme0n1 was affiliated with the controller at /dev/nvme0. This is no longer the case. Because multipathing is enabled, a subsystem can have multiple controllers; in this case, /dev/nvme0n1 could just as easily be affiliated with the controllers at /dev/nvme1 and /dev/nvme2. There is no specific correlation between the subsystem device name and the controller device name.
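
For example, assuming the nvme-cli package is installed and provides the list-subsys subcommand, you can display which controllers are associated with each NVMe subsystem by running:

# nvme list-subsys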

3.12 NVMe device hotplug unplug procedure change

Since UEK R5 adds support for NVMe subsystems and multipathing, enumerated device names generated by the kernel are not stable. This means that the procedure for identifying and unplugging NVMe devices using hotplug functionality is slightly different from the procedure that you may have followed with other kernel releases. This note describes the steps that you should take to identify, power down, and unplug the appropriate device.

  1. Use the lsblk command to identify the disk that you wish to remove according to its WWN or UUID. For example:

    # lsblk -o +UUID,WWN,MODEL

    Take note of the enumerated kernel instance name that has been assigned to the device. For example, this may be nvme0n1. It is very important to understand that the device name does not necessarily map onto the controller or PCIe bridge that it is attached to. See Section 3.11, “NVMe device names change across reboots” for more information.

  2. Search for the device path to obtain the PCI domain identifier for the device:

    # find /sys/devices -iname nvme0n1
    /sys/devices/pci0000:85/0000:85:01.0/0000:8d:00.0/nvme/nvme1/nvme0n1

    Note that 0000:8d:00.0 in the returned path for the device is the PCI domain identifier for the device. You need this information to proceed.

  3. Obtain the physical slot number for the NVMe drive. Under UEK R5, the slot is bound to the NVMe device directly and not to the PCIe controller. You can find the slot number for the NVMe device by running the lspci command and querying the PCI domain identifier for the device in verbose mode:

    # lspci -s 0000:8d:00.0 -vvv
    8d:00.0 Non-Volatile memory controller: Intel Corporation Express Flash NVMe
    P4500 (prog-if 02 [NVM Express])
            Subsystem: Oracle/SUN Device 4871
            Physical Slot: 104-1
    … 

    Note that the Physical Slot number for the device in this example is 104-1. Take note of this value to proceed.

  4. Use the Physical Slot number for the device to find its bus interface:

    # find /sys -iname "104-1"
    /sys/bus/pci/slots/104-1

  5. Use the returned bus interface path to power off the NVMe drive:

    # echo 0 > /sys/bus/pci/slots/104-1/power

    Depending on your hardware, the blue disk LED on the front panel of the system may light up to indicate that you can safely remove the disk drive.

3.13 KVM guest crashes when using memory hotplug operation to shrink available memory

A KVM guest may crash if the guest memory is reduced from 96GB or more to 2GB by using a memory hotplug operation. Although this issue is logged for UEK R5, similar issues have been noted for RHCK. (Bug ID 27968656)

3.14 HP DL380 system reboot issue with SD card slot enabled in Legacy mode

A reboot failure can occur on HP DL380 Gen 10 systems where the BIOS is set to Legacy mode and an SD flash device is removed from the on-board SD Reader. A patch is available and being tested but is not included at the time of this release. (Bug ID 28171827)