3 Known Issues

This chapter describes the known issues in this update.

btrfs, ext4 and xfs: Kernel panic when freeze and unfreeze operations are performed in multiple threads

Freeze and unfreeze operations that are performed across multiple threads on any supported file system can cause the system to hang and the kernel to panic. This problem is the result of a race condition that occurs when the unfreeze operation is triggered before the file system is actually frozen. The resulting unlock operation attempts to write to a non-existent lock, which causes the kernel panic. (Bug ID 25321899)

btrfs

The following are known btrfs issues:

  • Send operation causes soft lockup on large deduped file

    Using btrfs send on a large deduped file results in a soft lockup or out-of-memory issue. This problem occurs because the btrfs send operation cannot handle a large deduped file containing file extents that are all pointing to one extent, as these types of file structures create tremendous pressure for the btrfs send operation.

    To prevent this issue from occurring, do not use btrfs send on systems with less than 4 GB of memory. (Bug ID 25306023)

  • Kernel oops when unmounting during a quota rescan or disable

    Operations that trigger a quota rescan or disable quotas on a mounted file system can cause a kernel oops message when you attempt to unmount the file system. This can cause the system to hang. (Bug ID 22377928)

  • Kernel oops when removing shared extents using qgroup accounting

    The removal of shared extents where quota group (qgroup) accounting is used can result in a kernel oops message. This relates to an issue where inaccurate results are obtained during a back reference walk, due to missing records when adding delayed references. (Bug ID 21554517)

  • No warning when balancing file system on RAID

    The btrfs filesystem balance command does not warn that the RAID level can be changed under certain circumstances, and does not provide the choice of cancelling the operation. (Bug ID 16472824)

  • Disk space requirement to perform all btrfs operations

    The copy-on-write nature of btrfs means that every operation on the file system initially requires disk space. It is possible that you cannot execute any operation on a disk that has no space left, and even removing a file might not be possible.

    If there is no space to store metadata, an ENOSPC error is returned. In this situation, run sync before retrying the operation, as this can clear a background writeback that might be reserving metadata space. Another potential workaround is to add a disk or a file-backed loop device by using the btrfs device add command.

    The mechanism that is used to store data and metadata might lead to some confusion about the information returned by tools like df. Sometimes, metadata might fill all of the disk space allocated for this purpose, even while there is still space available for data. In this case, the file system is unbalanced and the problem can be resolved by performing a btrfs fi balance operation. See https://btrfs.wiki.kernel.org/index.php/Problem_FAQ#I_get_.22No_space_left_on_device.22_errors.2C_but_df_says_I.27ve_got_lots_of_space.

  • Double count of overwritten space in qgroup show

    When you overwrite data in a file, starting somewhere in the middle of the file, the overwritten space is counted twice in the space usage numbers that btrfs qgroup show displays. Running btrfs quota rescan does not correct this issue. (Bug ID 16609467)

  • Sector size should match page size

    If you use the -s option to specify a sector size to mkfs.btrfs that is different from the page size, the created file system cannot be mounted. By default, the sector size is set to be the same as the page size. (Bug ID 17087232)

  • Location of btrfs-progs and btrfs-progs-devel packages

    The btrfs-progs and btrfs-progs-devel packages for use with UEK R4 are made available in the ol6_x86_64_UEKR4 and ol7_x86_64_UEKR4 ULN channels and the ol6_UEKR4 and ol7_UEKR4 channels on the Oracle Linux Yum Server. In UEK R3, these packages were made available in the ol6_x86_64_latest and ol7_x86_64_latest ULN channels and the ol6_latest and ol7_latest channels on the Oracle Linux Yum Server.
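The disk space item above notes that metadata space can fill up while df still reports free data space. As a rough sketch of how you might check for this condition, the following shell function reads the raw-byte output of btrfs filesystem df and prints the percentage of allocated metadata space that is in use. The function name, the /mnt mount point, the 90 percent threshold, and the assumption that your btrfs-progs version accepts the -b option and prints lines of the form "Metadata, DUP: total=N, used=N" are illustrative and are not taken from this document.

```shell
# metadata_pct: read `btrfs filesystem df -b` style output on stdin and
# print the percentage of allocated metadata space that is in use.
# Assumed input line format: "Metadata, DUP: total=1000, used=950"
metadata_pct() {
    awk -F'[=,]' '
        /^Metadata/ {
            # Splitting on "=" and "," gives:
            # $1="Metadata" $2=" DUP: total" $3=total $4=" used" $5=used
            total = $3; used = $5
            if (total > 0) printf "%d\n", (used * 100) / total
            exit
        }'
}

# Example use (requires root; /mnt and the threshold are hypothetical):
#   pct=$(btrfs filesystem df -b /mnt | metadata_pct)
#   if [ "$pct" -ge 90 ]; then
#       sync
#       btrfs filesystem balance /mnt
#   fi
```

A high percentage here, combined with free data space, suggests the unbalanced state that the item describes. A balance can take a long time on a large file system, so treat the threshold as a starting point only.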

ext4

The following are known ext4 issues:

  • System hangs when processing corrupted orphaned inode list

    If the orphaned inode list is corrupted, the inode might be processed repeatedly, resulting in a system hang. For example, if the orphaned inode list contains a reference to the bootloader inode, ext4_iget() returns a bad inode, which can result in a processing loop that hangs the system. (Bug ID 24433290)

  • System hangs on unmount after an append to a file with negative i_size

    While it is invalid for a file system to load an inode with a negative i_size, it is possible to create a file like this and append to it. However, doing so causes an integer overflow in the routines underlying writeback, resulting in the kernel locking up. (Bug ID 25565527)

xfs

The following are known xfs issues:

  • File system corruption occurs after direct I/O writes

    A race condition that results in post-eof blocks being used for direct I/O writes causes a corruption in the file system. If a file release occurs during a file extending direct I/O write, it is possible to mistake the post-eof blocks for speculative preallocation and incorrectly truncate them from the inode. This issue is unlikely to be reproduced in real-world workloads. (Bug ID 26128822)

  • Invalid corrupted file system error resulting from a problem with log recovery on v5 superblocks

    A problem with log recovery on v5 superblocks, which causes the metadata LSN not to be updated for buffers that log recovery writes out, can result in a corruption error similar to the following:

    [1044224.901444] XFS (sdc1): Metadata corruption detected at
    xfs_dir3_block_write_verify+0xfd/0x110 [xfs], block 0x1004e90
    [1044224.901446] XFS (sdc1): Unmount and run xfs_repair
    ...
    [1044224.901460] XFS (sdc1): xfs_do_force_shutdown(0x8) called from line 1249
    of file fs/xfs/xfs_buf.c.  Return address = 0xffffffffa07a8910
    [1044224.901462] XFS (sdc1): Corruption of in-memory data detected.  Shutting
    down filesystem
    [1044224.901463] XFS (sdc1): Please umount the filesystem and rectify the
    problem(s)
    [1044224.904207] XFS (sdc1): log mount/recovery failed: error -117
    [1044224.904456] XFS (sdc1): log mount failed

    This problem is encountered because the log attempts to replay a buffer update that is no longer valid due to subsequent replayed updates. The result is a corruption error, even though the file system itself is not corrupted. (Bug ID 25380003)

  • System hangs on unmount after a buffered append to a file with negative i_size

    While it is invalid for a file system to load an inode with a negative i_size, it is possible to create a file like this. If a buffered append is then made to the file, an integer overflow in the routines underlying writeback results in the kernel locking up. A direct append does not cause this behavior. (Bug ID 25565490)

  • System hangs during xfs_fsr on two-extent files with speculative preallocation

    During an xfs_fsr process on extents that are generated by speculative preallocation, the code that determines whether all of the extents fit inline miscalculates because the di_nextents value that is used does not account for these extents. This results in corruption of the in-memory inode, and ultimately the code attempts to move memory structures using incorrectly calculated ranges. This causes a kernel panic. (Bug ID 25333211)

  • XFS quotas are disabled after a read-only remount on Oracle Linux 6

    Quotas are disabled on XFS if the file system is remounted with read-only permissions on Oracle Linux 6. (Bug ID 22908906)

  • Overlay file system is unable to mount on XFS where there is no d_type support

    Overlay file systems rely on a feature known as d_type support. This feature is a field within a data structure that provides some metadata about files in a directory entry within the base file system. Overlay file systems use this field to track many file operations such as file ownership changes and whiteouts. d_type support can be enabled in XFS when the file system is created, by using the -n ftype=1 option. When d_type support is not enabled, an overlay file system might become corrupt and behave in unexpected ways. For this reason, this update release of UEK R4 prevents the mounting of an overlay file system on an XFS base, where d_type support is not enabled.

    The root partition on Oracle Linux is automatically formatted with -n ftype=0, where XFS is selected as the file system. Thus, for backward compatibility reasons, if you have overlay file systems in place already and these are not hosted on alternate storage, you must migrate them to a file system that is formatted with d_type support enabled.

    To check that the XFS file system is formatted correctly:

    # xfs_info /dev/sdb1 | grep ftype

    Replace /dev/sdb1 with the path to the correct storage device. If the information returned by this command includes ftype=0, you must migrate the overlay data held on this device to storage that is formatted correctly.

    To correctly format a new block device with an XFS file system that supports overlay file systems, run:

    # mkfs -t xfs -n ftype=1 /dev/sdb1

    Replace /dev/sdb1 with the path to the correct storage device. It is essential that you use the -n ftype=1 option when you create the file system.

    If you do not have additional block storage available, it is possible to create an XFS file system image that can be mounted as a loopback device. For example, to create a 5 GB image file in the root directory, you could use the following command:

    # mkfs.xfs -d file=1,name=/OverlayStorage,size=5g -n ftype=1

    To temporarily mount this file, you can enter:

    # mount -o loop -t xfs /OverlayStorage /mnt

    An entry in /etc/fstab that makes the mount for this storage permanent might look similar to the following:

    /OverlayStorage    /mnt        xfs     loop            0 0

    This configuration can help as a temporary solution to solve upgrade issues. However, using a loopback mounted file system image as a form of permanent storage is not recommended for production environments. (Bug ID 26165630)
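The ftype check described in the overlay item above can be scripted. The following is a minimal sketch, assuming that xfs_info prints a field of the form ftype=0 or ftype=1 as the item describes. The function name xfs_ftype and the messages are hypothetical, and /dev/sdb1 is the same placeholder device used in the item.

```shell
# xfs_ftype: read `xfs_info` output on stdin and print the ftype value
# (0 or 1). Prints nothing if the output contains no ftype field.
xfs_ftype() {
    sed -n 's/.*ftype=\([01]\).*/\1/p' | head -n 1
}

# Example use (requires root; replace /dev/sdb1 with your device):
#   if [ "$(xfs_info /dev/sdb1 | xfs_ftype)" = "1" ]; then
#       echo "d_type support is enabled"
#   else
#       echo "ftype is not 1: migrate overlay data before upgrading"
#   fi
```

Treating a missing ftype field the same as ftype=0 is a deliberately cautious choice in this sketch.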

DIF/DIX is not supported for ext file systems

The Data Integrity Field (DIF) and Data Integrity Extension (DIX) features that have been added to the SCSI standard are dependent on a file system that is capable of correctly handling attempts by the memory management system to change data in the buffer while it is queued for a write.

The ext2, ext3 and ext4 file system drivers do not prevent pages from being modified during I/O, which can cause checksum failures and a "Logical block guard check failed" error. Other file systems such as XFS are supported. (Bug ID 24361968)

Console appears to hang when booting

When booting Oracle Linux 6 on hardware with an ASPEED graphics controller, the console might appear to hang during the boot process after starting udev. However, the system does boot properly and is accessible. The workaround is to add nomodeset as a kernel boot parameter in /etc/grub.conf. (Bug ID 22389972)
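One possible way to apply this workaround is sketched below. The function reads a GRUB legacy configuration on standard input and appends nomodeset to each kernel line that does not already contain it. The function name add_nomodeset and the backup file name are hypothetical, and the sketch assumes that kernel entries start with the keyword kernel, optionally preceded by whitespace.

```shell
# add_nomodeset: read a GRUB legacy configuration on stdin and write it to
# stdout with "nomodeset" appended to every kernel line that lacks it.
add_nomodeset() {
    sed -e '/^[[:space:]]*kernel[[:space:]]/{' \
        -e '/nomodeset/!s/$/ nomodeset/' \
        -e '}'
}

# Example use (requires root; keeps a backup copy of the original file):
#   cp /etc/grub.conf /etc/grub.conf.bak
#   add_nomodeset < /etc/grub.conf.bak > /etc/grub.conf
```

Because lines that already contain nomodeset are left unchanged, running the function a second time has no further effect.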

Docker

The following are known Docker issues:

  • Running yum install within a container on an overlayfs file system can fail with the following error:

    Rpmdb checksum is invalid: dCDPT(pkg checksums): package_name

    This error can break Dockerfile builds but is expected behavior from the kernel and is a known issue upstream (see https://github.com/docker/docker/issues/10180).

    The workaround is to run touch /var/lib/rpm/* before installing the package.

    Note that this issue is fixed in any Oracle Linux images available on the Docker Hub or Oracle Container Registry, but the issue could still be encountered when running any container based on a third-party image. (Bug ID 21804564)

  • Docker can fail where it uses the overlay2 storage driver on XFS-formatted storage

    A kernel patch has been applied to prevent overlay mounts on XFS if the ftype is not set to 1. This fix resolves an issue where XFS did not properly support the whiteout features of an overlay filesystem if d_type support was not enabled. If the Docker Engine is already using XFS-formatted storage with the overlay2 storage driver, an upgrade of the kernel can cause Docker to fail if the underlying XFS file system is not created with the -n ftype=1 option enabled. The root partition on Oracle Linux 7 is automatically formatted with -n ftype=0 where XFS is selected as the file system. Therefore, if you intend to use the overlay2 storage driver in this environment, you must format a separate device for this purpose. (Bug ID 25995797)

  • Docker can fail where it uses the overlay2 storage driver and SELinux is enabled

    If the Docker Engine is configured to use the overlay2 storage driver and SELinux is enabled and set to Enforcing mode, Docker containers are unable to function properly and permissions errors are encountered. If you intend to use Docker with the overlay2 storage driver, you must set SELinux to Permissive mode. (Bug ID 25684456)
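The two overlay2 items above depend on three values: the storage driver, the ftype of the backing XFS file system, and the SELinux mode. The following sketch combines them into one check. The function name overlay2_problems and its messages are hypothetical. The example use assumes that your Docker version supports docker info --format and that getenforce is available, neither of which is stated in this document.

```shell
# overlay2_problems DRIVER FTYPE SELINUX_MODE
# Prints one line per known issue that applies to the given combination.
# Pass an empty FTYPE if the backing file system is not XFS.
overlay2_problems() {
    driver=$1 ftype=$2 selinux=$3
    # The issues described above only apply to the overlay2 driver.
    [ "$driver" = "overlay2" ] || return 0
    [ "$ftype" = "0" ] && echo "xfs: backing file system has ftype=0"
    [ "$selinux" = "Enforcing" ] && echo "selinux: Enforcing mode is set"
    return 0
}

# Example use (the ftype value 0 is supplied by hand here):
#   overlay2_problems "$(docker info --format '{{.Driver}}')" 0 "$(getenforce)"
```

No output means that neither of the two overlay2 issues applies to the values supplied.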

DTrace

The following are known DTrace issues:

  • Argument declarations with USDT probe definitions cannot be declared with derived types such as enum, struct, or union.

  • The following compiler warning can be ignored for USDT probe definition arguments of type string (which is a D type but not a C type):

    provider_def.h:line#: warning: parameter names (without types) in function declaration

  • Multi-threaded processes that perform dlopen() in threads other than the first thread might not have accurate symbol resolution under ustack(), usym(), uaddr() and umod() for symbols introduced by dlopen(). (Bug ID 20045149)

Error, some other host already uses address xxx.xxx.xxx.xxx

The following error message might be triggered in certain instances:

Error, some other host already uses address xxx.xxx.xxx.xxx

The following are the two instances in which this error message might be triggered:

  • When active-bonding is enabled, and you run the ifup ib-interface command.

  • When you run the service rdma start command.

You can ignore this message, as in both cases, the InfiniBand interface is brought up successfully. (Bug IDs 21052903, 26639723)

ifup-ib: line 357: /sys/class/net/ib0/acl_enabled: Permission denied error

Running ifup ib-interface or service network restart reports the following error:

/etc/sysconfig/network-scripts/ifup-ib: line 357: /sys/class/net/ib0/acl_enabled: Permission denied

This error is reported, even though the InfiniBand interface is brought up successfully.

The workaround for this issue is to change from the older configuration method, where you manipulate sysfs files, to the newer ibacl tools that are provided. (Bug ID 26197105)

Increased dom0 memory requirement when using Mellanox® HCAs on Oracle VM Server

Oracle VM Servers running UEK R4u2 and later in dom0 require at least 400 MB more memory to use the Mellanox® drivers. This memory requirement arises because later versions of the kernel increase the default SRQ count from 64K to 256K and enable the scale_profile option by default in the mlx4_core module.

If out-of-memory errors are observed in dom0, increase the maximum dom0 memory size. Alternative workarounds might involve manually setting the module parameters for the mlx4_core driver. To set these parameters, edit /etc/modprobe.d/mlx4_core.conf and set scale_profile to 0. Alternatively, set log_num_srq to 16. The preferred resolution to this issue is to increase the memory allocated to dom0 on an Oracle VM Server. (Bug ID 23581534)
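As an illustration of the module parameter workaround, the following sketch prints a line in the standard modprobe.d options syntax for either parameter. The function name mlx4_options is hypothetical. The example use overwrites any existing content in the file, which is an assumption that might not suit a system where the file already holds other settings.

```shell
# mlx4_options PARAMETER VALUE
# Prints a modprobe.d options line for the mlx4_core driver.
mlx4_options() {
    printf 'options mlx4_core %s=%s\n' "$1" "$2"
}

# Example use (requires root); choose one of the two workarounds:
#   mlx4_options scale_profile 0 > /etc/modprobe.d/mlx4_core.conf
#   mlx4_options log_num_srq 16 > /etc/modprobe.d/mlx4_core.conf
```

Options set in /etc/modprobe.d take effect the next time the module is loaded.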

LXC

The following are known LXC issues:

  • The lxc-net service does not always start immediately after installation on Oracle Linux 6

    The lxc-net service does not always start immediately after installation on Oracle Linux 6, even though this action is specified as part of the RPM post-installation script. This can prevent the lxcbr0 interface from coming up. If this interface is not up after installation, you can manually start it by running service lxc-net start. (Bug ID 23177405)

  • LXC read-only ip_local_port_range parameter

    With lxc-1.1 or later and UEK R4, ip_local_port_range is a read-writable parameter under /proc/sys/net/ipv4 in an Oracle Linux container rather than being read-only. (Bug ID 21880467)

MSI-X interrupt allocation fails during maximum number of ixgbe/ixgbevf Virtual Function creation

The Intel ixgbe/ixgbevf and Qlogic qla2xxx drivers compete for MSI-X resources. As a result, if both drivers are used in a system, and an attempt is made to create the maximum number of Virtual Function (VF) devices that are allowed for the ixgbe/ixgbevf driver, an interrupt allocation failure occurs during the creation of the last VF device.

Note that you can create and use up to, but not including, the maximum number of VF devices that are allowed for the ixgbe/ixgbevf driver without encountering this issue. (Bug ID 25952728)
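To stay below the maximum, you could compute the VF count from the value that the device reports. The following is a sketch only: it assumes that VFs are configured through the standard sriov_totalvfs and sriov_numvfs sysfs attributes of the PCI device, which this document does not state, and the function name safe_vf_count and the interface name eth0 are hypothetical.

```shell
# safe_vf_count DEVICE_DIR
# Reads sriov_totalvfs from a PCI device's sysfs directory and prints one
# less than that maximum, the largest count that avoids this issue.
safe_vf_count() {
    total=$(cat "$1/sriov_totalvfs") || return 1
    [ "$total" -gt 0 ] || return 1
    echo $((total - 1))
}

# Example use (requires root; eth0 is a hypothetical ixgbe interface):
#   dev=/sys/class/net/eth0/device
#   n=$(safe_vf_count "$dev") && echo "$n" > "$dev/sriov_numvfs"
```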

NVMe devices not found under the /dev directory after PCI rescan

After removing the PCI bus of NVM Express (NVMe) adapter card devices and running a rescan of the PCI bus, no NVMe adapter card devices are found under the /dev directory.

The workaround for this issue is to also remove the PCI slot that the NVMe adapter card device is plugged into before running a rescan of the PCI bus. (Bug ID 26610285)

OFED iSER target login fails from an initiator on Oracle Linux 6

An Oracle Linux 6 system that has the oracle-ofed-release packages installed and an iSER (iSCSI Extensions for RDMA) target configured fails to log in to the iSER target as an initiator. On the Oracle Linux 6 initiator machine, the following behavior is typical:

# iscsiadm -m node -T iqn.iser-target.t1 -p 10.196.100.134 --login
Logging in to [iface: default, target: iqn.iser-target.t1, portal:
10.196.100.134,3260] (multiple)
iscsiadm: Could not login to [iface: default, target: iqn.iser-target.t1,
portal: 10.196.100.134,3260].
iscsiadm: initiator reported error (8 - connection timed out)
iscsiadm: Could not log into all portals

This is expected behavior resulting from an errata fix for CVE-2016-4564, to protect against a write from an invalid context.

(Bug ID 23615903)

Open File Description (OFD) locks are not supported on NFSv4 mounts

NFS is not designed to handle OFD locking. (Bug ID 22948696)

SDP performance degradation

The Sockets Direct Protocol (SDP), which was designed to provide an RDMA alternative to TCP over InfiniBand networks, is known to suffer from performance degradation on more recent kernels such as UEK R4u2 and later. There is no active development on this protocol.

Although the library for this protocol is still available for this kernel, support is limited. You should consider using TCP on top of IP over InfiniBand as a more stable alternative. (Bug ID 22354885)

Shared Receive Queue (SRQ) is an experimental feature for RDS and is disabled by default

The SRQ function that optimizes resource usage within the rds_rdma module is experimental and is disabled by default. A warning message is displayed when you enable this feature by setting the rds_ib_srq_enabled flag. (Bug ID 23523586)

Unloading or removing the rds_rdma module is unsupported

Once the rds_rdma module has been loaded, you cannot remove the module using either rmmod or modprobe -r. Unloading of the rds_rdma module is unsupported and can trigger a kernel panic. Do not set the module_unload_allowed flag for this module. (Bug ID 23580850)