3 Known Issues
This chapter describes known issues for the Unbreakable Enterprise Kernel Release 5.
Unusable or Unavailable Arm Features
The following features are known not to work, remain untested, or have issues that make them unusable.
-
InfiniBand
InfiniBand hardware is currently not supported for Arm architecture using UEK R5.
-
FibreChannel
FibreChannel hardware is currently not supported for Arm architecture using UEK R5.
-
RDMA
RDMA and all of its sub-features are not supported for Arm.
-
OCFS2
The OCFS2 file system is not supported for Arm.
-
Secure Boot
The Secure Boot feature is currently not supported or available for Arm.
[aarch64] IOMMU Issues
Issues such as increased boot times, soft lockups, and crashes can occur on 64-bit Arm (aarch64) systems that run UEK R5 when the input-output memory management unit (IOMMU) feature is active. These issues have been observed on some Arm hardware using Mellanox CX-3 and CX-4 cards. However, similar issues could occur with other drivers on other hardware.
UEK R5 is configured to use swiotlb by default. To enable the IOMMU feature, add iommu.passthrough=0 to the kernel command line.
(Bug IDs 27687153, 27812727, and 27862655)
[aarch64] Kdump Issues
Note the following issues when using Kdump on the 64-bit Arm (aarch64) architecture.
-
Kdump fails when using Mellanox ConnectX devices with a remote target
On systems with Mellanox hardware devices that use either the mlx4_core or the mlx5_core driver module, kexec fails to load the crash kernel and hangs while the mlx4_core or mlx5_core driver is initialized if a remote dump target is used.
The workaround is either to store the vmcore file locally or to disable loading the driver in the crash kernel. To disable loading, add either rd.driver.blacklist=mlx4_core or rd.driver.blacklist=mlx5_core to the KDUMP_COMMANDLINE_APPEND option in /etc/sysconfig/kdump. (Bug IDs 27915989 and 27916214)
-
Kdump fails and hangs when configured to use a remote dump target over an igb device
On systems where Kdump is configured to use a remote dump target over an igb network device, NETDEV WATCHDOG returns a timeout error and the network adapter is continually reset. This results in a system hang when kexec attempts to load the crash kernel. (Bug ID 27916095)
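The Kdump workaround for Mellanox devices depends on which core driver the system actually loads. The following Python sketch is an illustration only, not a supported tool. It reads lsmod output passed in as text and builds the matching rd.driver.blacklist arguments for KDUMP_COMMANDLINE_APPEND. The helper name blacklist_args_from_lsmod and the sample lsmod output are hypothetical. The driver names come from the issue described above.

```python
# Minimal sketch (hypothetical helper): derive the rd.driver.blacklist
# arguments for the Kdump workaround from `lsmod` output supplied as text.
from __future__ import annotations

MELLANOX_DRIVERS = ("mlx4_core", "mlx5_core")


def blacklist_args_from_lsmod(lsmod_output: str) -> list[str]:
    """Return rd.driver.blacklist=... arguments for loaded Mellanox core drivers."""
    loaded = set()
    for line in lsmod_output.splitlines()[1:]:  # skip the "Module Size Used by" header
        fields = line.split()
        if fields:
            loaded.add(fields[0])  # first column is the module name
    return [f"rd.driver.blacklist={drv}" for drv in MELLANOX_DRIVERS if drv in loaded]


if __name__ == "__main__":
    # Hypothetical lsmod output, for illustration only.
    sample = """Module                  Size  Used by
mlx5_core             983040  1 mlx5_ib
igb                   241664  0
"""
    # The printed value would be appended to KDUMP_COMMANDLINE_APPEND.
    print(" ".join(blacklist_args_from_lsmod(sample)))  # prints: rd.driver.blacklist=mlx5_core
```

The helper only builds the arguments. Editing /etc/sysconfig/kdump and restarting the kdump service are left to the administrator.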
[aarch64] CPU hot plug feature not functional in KVM
Although CPU hot plug functionality is available in QEMU, the aarch64 Linux kernel is not yet able to handle the addition of new virtual CPUs to a running virtual machine. When QEMU is used to add a virtual CPU to a running virtual machine in KVM, the following error is returned:
kvm_init_vcpu failed: Device or resource busy
CPU hot plug functionality is currently unavailable for UEK R5 on 64-bit Arm platforms. (Bug ID 28140386)
[aarch64] Networking fails for Mellanox ConnectX-3 Pro Ethernet controller
Mellanox networking may fail on Arm platform systems using the Mellanox ConnectX-3 Pro Ethernet controller with certain firmware versions. The issue typically results in the following dmesg output:
...
[ 21.605491] mlx4_core 0001:01:00.0: Failed to initialize event queue table, aborting
[ 22.660967] mlx4_core: probe of 0001:01:00.0 failed with error -12
[ 22.704966] mlx4_en: Mellanox ConnectX HCA Ethernet driver v4.0-0
[ 22.711355] mlx4_en 0000:01:00.0: Activating port:1
[ 22.742948] mlx4_en: 0000:01:00.0: Port 1: Using 32 TX rings
[ 22.748600] mlx4_en: 0000:01:00.0: Port 1: Using 8 RX rings
[ 22.754437] mlx4_en: 0000:01:00.0: Port 1: Initializing port
[ 22.760602] mlx4_en 0000:01:00.0: registered PHC clock
[ 22.766283] mlx4_en 0000:01:00.0: Activating port:2
[ 22.766956] mlx4_core 0000:01:00.0 enp1s0: renamed from eth0
[ 22.778621] mlx4_en: 0000:01:00.0: Port 2: Failed to allocate NIC resources
[ 22.785776] mlx4_en 0000:01:00.0: removed PHC
[ 25.488635] mlx4_en: enp1s0: Steering Mode 1
...
You can resolve this issue by setting the maxcpus=8 kernel parameter at boot, which limits the number of CPUs that are available during the boot process. Once the system has fully booted, systemd enables all available CPUs and there is no performance impact.
To set this parameter for all kernels when the system boots, edit the GRUB configuration. To do so, edit the GRUB_CMDLINE_LINUX line in /etc/sysconfig/grub in a text editor, for example:
GRUB_CMDLINE_LINUX="crashkernel=auto resume=/dev/mapper/linux1-swap rd.lvm.lv=linux1/root rd.lvm.lv=linux1/swap rhgb quiet maxcpus=8"
If the system boots using legacy BIOS, regenerate the GRUB configuration with the following command so that the changes are used on the next boot:
# grub2-mkconfig -o /boot/grub2/grub.cfg
Alternatively, if the system boots using UEFI, run the following command:
# grub2-mkconfig -o /boot/efi/EFI/redhat/grub.cfg
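The GRUB edit described above can also be scripted. The following Python sketch is an illustration, not a supported tool. It rewrites the GRUB_CMDLINE_LINUX line in the text of /etc/sysconfig/grub. It assumes that the value is a single double-quoted string on one line, with no backslash continuation. The helper name add_kernel_param is hypothetical.

```python
# Minimal sketch (hypothetical helper): add a kernel parameter such as
# maxcpus=8 to the GRUB_CMDLINE_LINUX line of a GRUB defaults file.
# Assumes the value is one double-quoted string on a single line.
import re

_LINE = re.compile(r'^(GRUB_CMDLINE_LINUX=")(.*)(")\s*$')


def add_kernel_param(config_text: str, param: str) -> str:
    """Append param to GRUB_CMDLINE_LINUX, replacing any value for the same key."""
    key = param.split("=", 1)[0]
    out = []
    for line in config_text.splitlines():
        m = _LINE.match(line)
        if m:
            # Drop existing settings for this key, then append the new one.
            args = [a for a in m.group(2).split() if a.split("=", 1)[0] != key]
            args.append(param)
            line = f'{m.group(1)}{" ".join(args)}{m.group(3)}'
        out.append(line)
    return "\n".join(out) + "\n"
```

For example, add_kernel_param(text, "maxcpus=8") returns the file text with maxcpus=8 at the end of the GRUB_CMDLINE_LINUX value. After writing the result back, regenerate grub.cfg with the appropriate grub2-mkconfig command shown above. Because the helper replaces every occurrence of the key, do not use it for parameters that legitimately repeat, such as rd.lvm.lv.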
This issue is present only in later firmware versions for this hardware. The issue does not occur on cards with the HVE102M-0.2 firmware, but appears when the firmware is upgraded to HVE104N-1.12. You can also avoid the issue by downgrading the card firmware. (Bug ID 30877943)
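To check whether a system is hitting this particular failure rather than another mlx4 problem, you could scan the kernel log for the first two lines of the dmesg output shown above. The following Python sketch assumes that you pass it the dmesg text, for example after saving it to a file. The function name and the choice of signature strings are illustrative assumptions. Other firmware or driver versions may log different messages.

```python
# Minimal sketch: check kernel log text for the mlx4_core failure signature
# shown in the example dmesg output. The substrings are copied from that
# example; other firmware or driver versions may log different messages.

SIGNATURES = (
    "Failed to initialize event queue table",
    "failed with error -12",
)


def has_mlx4_eq_failure(dmesg_text: str) -> bool:
    """Return True if any mlx4_core line contains a known failure signature."""
    for line in dmesg_text.splitlines():
        if "mlx4_core" in line and any(sig in line for sig in SIGNATURES):
            return True
    return False
```

Matching on both the module name and the message text keeps unrelated "error -12" (out of memory) messages from other drivers from producing a false positive.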
File System Issues
The following are known issues that are specific to file systems that are supported on UEK R5U5.
ext4: Frequent repeated system shutdowns can cause file system corruption
If a system that uses ext4 is repeatedly and frequently shut down, the file system may become corrupted. This issue is considered a corner case because it is difficult to replicate. The issue exists in upstream code, and proposed patches are currently under review. (Bug ID 27547113)
xfs: xfs_repair fails to repair the corrupted link counts
If an xfs file system with invalid inode counts is repaired by using the xfs_repair command, the utility may fail to repair the corrupted link counts and may return errors while verifying them. The issue is currently under investigation but appears to be related to the xfsprogs-4.15-1 package that is released with UEK R5. The issue might not appear when using the earlier xfsprogs-4.5.0-18.0.1 package version, which is available in the ol7_latest yum repository. (Bug ID 28070680)
RDMA Issues
The following issues are noted for RDMA:
-
ibacm service disabled by default
The ibacm service is disabled by default immediately after installation, which means that the ibacm service does not automatically start after a reboot. This behavior is expected. Note that requirements for using the ibacm service are application-specific. If your application requires this service, you may need to enable the service so that it starts after a reboot:
# systemctl enable ibacm
(Bug ID 28074471)
-
Error: some other host already uses address xxx.xxx.xxx.xxx
The following error message might be triggered in certain instances:
Error, some other host already uses address xxx.xxx.xxx.xxx
This issue is typically triggered if active-bonding is enabled and you then run the ifup ib-interface command. You can ignore this message, because the InfiniBand interface is brought up successfully. (Bug ID 28097516)
Docker Issues
The following are known Docker issues:
-
yum install command can fail within a container on an overlayfs file system
Running the yum install command within a container on an overlayfs file system can fail with the following error:
Rpmdb checksum is invalid: dCDPT(pkg checksums): package_name
Although this error can break Dockerfile builds, it is expected kernel behavior and a known upstream issue. See https://github.com/moby/moby/issues/10180.
The workaround is to run the touch /var/lib/rpm/* command before installing the package.
Note that this issue is fixed for any Oracle Linux images that are available on the Docker Hub or the Oracle Container Registry; however, the issue could still be encountered when running any container that is based on a third-party image.
(Bug ID 21804564)
-
Docker can fail where it uses the overlay2 storage driver on XFS-formatted storage
A kernel patch has been applied to prevent overlay mounts on XFS if the ftype option is not set to 1. This fix resolves an issue where XFS did not properly support the whiteout features of an overlay file system if d_type support was not enabled. If the Docker Engine already uses XFS-formatted storage with the overlay2 storage driver, a kernel upgrade can cause Docker to fail if the underlying XFS file system was not created with the -n ftype=1 option. When XFS is selected as the file system, the root partition on Oracle Linux 7 is automatically formatted with -n ftype=0. Therefore, if you intend to use the overlay2 storage driver in this environment, you must format a separate device for this purpose. (Bug ID 25995797)
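Before pointing the overlay2 storage driver at an existing XFS file system, you can check its ftype value. The xfs_info command reports it in the naming section of its output. The following Python sketch assumes that you pass it that output as text. It treats a missing ftype field as unsupported. The function name supports_overlay2 is hypothetical, and the sample naming lines in the usage note are illustrative.

```python
# Minimal sketch: decide whether an XFS file system is suitable for the
# overlay2 storage driver by reading the ftype value from `xfs_info` output.
import re

_FTYPE = re.compile(r"\bftype=(\d+)")


def supports_overlay2(xfs_info_output: str) -> bool:
    """Return True only if xfs_info reports ftype=1 (d_type support enabled)."""
    m = _FTYPE.search(xfs_info_output)
    # A missing ftype field is treated as unsupported, to be conservative.
    return m is not None and m.group(1) == "1"
```

For example, a naming line such as "naming =version 2 bsize=4096 ascii-ci=0, ftype=1" yields True, and one ending in ftype=0 yields False. If the result is False, format a separate device with mkfs.xfs -n ftype=1 as described above.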
IOMMU kernel option enabled by default
Starting with UEK R5U1, IOMMU functionality is enabled by default in the x86_64 kernel. This change better facilitates single root input-output virtualization (SR-IOV) and other virtualization extensions. However, it is also known to cause boot failures on certain hardware that cannot complete discovery when IOMMU is enabled. The status of this feature no longer appears in /proc/cmd as iommu=on. If a boot failure occurs, you may need to explicitly disable the feature with a kernel command-line option. Alternatively, you can disable IOMMU or Intel VT-d in your system ROM by following your vendor's instructions.
These boot failures have been observed on equipment with certain Broadcom network devices, such as HP Gen8 servers. For more detailed information, see https://support.hpe.com/hpesc/public/docDisplay?docId=emr_na-c04565693.
LXC Issues
The following are known LXC issues:
-
LXC read-only ip_local_port_range parameter
With lxc-1.1 or later and UEK R5, ip_local_port_range is a read-writable parameter under /proc/sys/net/ipv4 in an Oracle Linux container, rather than being read-only. (Bug ID 21880467)
NVMe device names change across reboots
Because UEK R5 adds support for NVMe subsystems and multipathing, enumerated device names that are generated by the kernel are not stable. This behavior is similar to the way the kernel handles other block devices. If you use enumerated kernel instance names to handle mounts in your fstab file, the mounts might fail or behave unpredictably.
Never use enumerated kernel instance names when referring to block devices. Instead, use the UUID, partition label, or file system label to refer to any block device, including an NVMe device. If you are uncertain of the device UUID or labels, use the blkid command to view this information.
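To follow this advice when writing fstab entries, you can take the UUID and file system type from one line of blkid output instead of the enumerated name. The following Python sketch assumes the usual NAME: KEY="value" layout of blkid output and defaults the mount options to "defaults". The function name fstab_entry and the sample values in the usage note are hypothetical.

```python
# Minimal sketch (hypothetical helper): turn one line of `blkid` output into
# an /etc/fstab entry that refers to the device by UUID instead of by its
# enumerated kernel name (for example /dev/nvme0n1p1).
import re

_PAIR = re.compile(r'(\w+)="([^"]*)"')


def fstab_entry(blkid_line: str, mount_point: str, options: str = "defaults") -> str:
    device, _, attrs = blkid_line.partition(":")
    fields = dict(_PAIR.findall(attrs))  # e.g. {"UUID": ..., "TYPE": ..., "PARTUUID": ...}
    if "UUID" not in fields or "TYPE" not in fields:
        raise ValueError(f"no UUID/TYPE reported for {device}")
    return f"UUID={fields['UUID']} {mount_point} {fields['TYPE']} {options} 0 0"
```

For example, the line '/dev/nvme0n1p1: UUID="1234-abcd" TYPE="xfs"' with mount point /data produces "UUID=1234-abcd /data xfs defaults 0 0". That entry stays valid even if the device is enumerated under a different name on the next boot.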
Before multipathing, a subsystem number typically mapped to the controller number. You could therefore assume that the subsystem at /dev/nvme0n1 was affiliated with the /dev/nvme0 controller. This correlation no longer holds. When multipathing is enabled, a subsystem can have multiple controllers. In this case, /dev/nvme0n1 could just as easily be affiliated with controllers at /dev/nvme1 and /dev/nvme2. Currently, no specific correlation exists between the subsystem device name and the controller device name.
NVMe device hotplug unplug procedure has changed
Because UEK R5 adds support for NVMe subsystems and multipathing, enumerated device names that are generated by the kernel are not stable. The result is that the procedure for identifying and unplugging NVMe devices by using hotplug functionality is slightly different than the procedure that you may have followed when using other kernel releases.
Perform the following steps to identify, power down, and unplug the appropriate device:
-
Identify the disk that you wish to remove, according to its WWN or UUID, by using the lsblk command:
# lsblk -o +UUID,WWN,MODEL
Take note of the enumerated kernel instance name that is assigned to the device; for example, nvme0n1.
Important:
The device name does not necessarily map to the controller or PCIe bridge to which the device is attached. See NVMe device names change across reboots.
-
Search for the device path to obtain the PCI domain identifier for the device:
# find /sys/devices -iname nvme0n1
/sys/devices/pci0000:85/0000:85:01.0/0000:8d:00.0/nvme/nvme1/nvme0n1
Note that 0000:8d:00.0 in the returned path is the PCI domain identifier for the device. You need this information to proceed.
-
Obtain the physical slot number for the NVMe drive. Under UEK R5, the slot is bound to the NVMe device directly, not the PCIe controller.
You can locate the slot number for the NVMe device by running the lspci command and by querying the PCI domain identifier for the device in verbose mode, for example:
# lspci -s 0000:8d:00.0 -vvv
8d:00.0 Non-Volatile memory controller: Intel Corporation Express Flash NVMe P4500 (prog-if 02 [NVM Express])
        Subsystem: Oracle/SUN Device 4871
        Physical Slot: 104-1
…
Note that the Physical Slot number for the device in the previous example is 104-1. This value is required to proceed.
-
Use the Physical Slot number for the device to find its bus interface:
# find /sys -iname "104-1"
/sys/bus/pci/slots/104-1
-
Use the returned bus interface path to power off the NVMe drive:
# echo 0 > /sys/bus/pci/slots/104-1/power
Depending on your hardware, the blue disk LED on the front panel of the system may light up to indicate that you can safely remove the disk drive.
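The lookup steps in this procedure can be sketched as plain text parsing. The following Python helpers, whose names are hypothetical, extract the PCI domain identifier from the sysfs path, read the Physical Slot value from lspci -vvv output, and build the slot power file path. They assume that the slot directory lives under /sys/bus/pci/slots, as in the find output above, and that the PCI identifier is the path component directly above nvme. They deliberately do not write to the power file. Powering off the drive should remain a deliberate manual step.

```python
# Minimal sketch (hypothetical helpers) of the lookup steps above. Assumes the
# PCI identifier is the path component right above "nvme" and that slots
# live under /sys/bus/pci/slots. Nothing here powers off a device.
import re


def pci_address_from_sysfs(path: str) -> str:
    """Return the component directly above 'nvme' in a namespace's sysfs path."""
    parts = path.strip().split("/")
    idx = parts.index("nvme")  # raises ValueError if this is not an NVMe device path
    return parts[idx - 1]


def physical_slot(lspci_output: str) -> str:
    """Return the 'Physical Slot:' value from `lspci -vvv` output."""
    m = re.search(r"Physical Slot:\s*(\S+)", lspci_output)
    if m is None:
        raise ValueError("no Physical Slot line in lspci output")
    return m.group(1)


def slot_power_path(slot: str) -> str:
    return f"/sys/bus/pci/slots/{slot}/power"
```

With the example values from this procedure, the path /sys/devices/pci0000:85/0000:85:01.0/0000:8d:00.0/nvme/nvme1/nvme0n1 yields 0000:8d:00.0, and slot 104-1 maps to /sys/bus/pci/slots/104-1/power.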
Kernel warning when allocating memory for Avago MegaRAID SAS 9460-16i controller
This kernel release introduces an issue that results in a kernel warning when the megaraid_sas module is loaded for the Avago MegaRAID SAS 9460-16i controller. The warning occurs when the kernel attempts to allocate memory for the I/O request frame pool.
To resolve the issue, set the contiguous memory allocation (cma) value to 64M at boot time. To do so, edit the /etc/default/grub file and update the GRUB_CMDLINE_LINUX line to include the cma=64M option, for example:
GRUB_CMDLINE_LINUX="crashkernel=auto rd.lvm.lv=ol7/root rd.lvm.lv=ol7/swap rhgb quiet cma=64M"
(Bug IDs 29635963 and 29618702)
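After you change GRUB_CMDLINE_LINUX, regenerate the GRUB configuration with grub2-mkconfig, as shown for the Mellanox ConnectX-3 Pro issue earlier in this chapter, and reboot. You can then confirm that the running kernel received cma=64M by reading the kernel command line, normally /proc/cmdline. The following Python sketch parses that text into a dictionary. It assumes space-separated parameters without quoted values. If a parameter repeats, only the last occurrence is kept. The function name cmdline_params is hypothetical.

```python
# Minimal sketch (hypothetical helper): parse kernel command-line text, such
# as the contents of /proc/cmdline, into a parameter dictionary.
from __future__ import annotations


def cmdline_params(cmdline: str) -> dict[str, str]:
    """Map each parameter to its value; flags without '=' map to ''."""
    params = {}
    for token in cmdline.split():  # assumes no quoted values containing spaces
        key, _, value = token.partition("=")
        params[key] = value  # a repeated key keeps its last value
    return params
```

For example, cmdline_params(open("/proc/cmdline").read()).get("cma") should return "64M" once the change is in effect.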
KVM guest may crash when using memory hotplug operation to shrink available memory
A KVM guest may crash if the guest memory is reduced from 96 GB or more to 2 GB by using a memory hotplug operation. Although this issue is logged for UEK R5, similar issues have been noted for RHCK. This behavior is expected and relates to how memory ballooning works. Shrinking guest memory by large amounts can result in out-of-memory (OOM) conditions. If the memory shrinks below the amount currently in use by the guest operating system, processes are killed automatically.
(Bug ID 27968656)
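The OOM condition described above arises when the target size falls below what the guest is already using. As a rough pre-check, the following Python sketch approximates memory in use as MemTotal minus MemAvailable from the guest's /proc/meminfo. This approximation is an assumption, not a guarantee that a given shrink will succeed. The function names are hypothetical, and sizes are in kB, as /proc/meminfo reports them.

```python
# Minimal sketch: flag a memory shrink as risky if the target size is below
# the guest's approximate in-use memory (MemTotal - MemAvailable, an
# assumption). Input is the text of the guest's /proc/meminfo.
from __future__ import annotations


def meminfo_kib(meminfo_text: str) -> dict[str, int]:
    values = {}
    for line in meminfo_text.splitlines():
        name, _, rest = line.partition(":")
        fields = rest.split()
        if fields and fields[0].isdigit():
            values[name.strip()] = int(fields[0])  # /proc/meminfo reports kB
    return values


def shrink_is_risky(meminfo_text: str, target_kib: int) -> bool:
    info = meminfo_kib(meminfo_text)
    in_use = info["MemTotal"] - info["MemAvailable"]
    return target_kib < in_use
```

A True result suggests shrinking in smaller steps, or freeing memory in the guest first, rather than reducing memory to the target in one hotplug operation.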
Mellanox ConnectX adapter not detected at boot
On systems that use Mellanox ConnectX adapters, the driver does not load the InfiniBand and RDMA modules at boot time. As a result, RDMA and InfiniBand-related tools, such as the ibstat command, fail to detect the adapter.
Errors similar to the following are typically displayed:
ibpanic: [26013] main: stat of IB device 'mthca0' failed: No such file or directory
This issue occurs because the mlx4_core and mlx5_core drivers are included in the initramfs to facilitate a PXE boot, but the InfiniBand and RDMA modules are not. If you need the driver for a PXE boot, you can reload it manually after booting, which triggers the RDMA hotplug sequence, for example:
# modprobe mlx5_core
If you do not require the mlx4_core or mlx5_core driver for a PXE boot, you can remove these drivers from the initramfs, because they are loaded after boot anyway. The RDMA hotplug sequence is then triggered normally.
To remove the drivers from the initramfs, create the /etc/dracut.conf.d/10-mlx_dracut-denylist.conf file and then add the following line:
omit_drivers+=" mlx4_* mlx5_* mlxfw "
After you have updated the file, rebuild the initramfs by running the following command:
# dracut -f
Reboot the system for the changes to take effect.
(Bug ID 31353413)