Chapter 5 Managing the XFS File System

This chapter describes tasks for administering the XFS file system in Oracle Linux 8. For an overview of local file system management, see Chapter 1, About File System Management in Oracle Linux.

5.1 About the XFS File System

XFS is a high-performance journaling file system that was initially created by Silicon Graphics, Inc. for the IRIX operating system and then later ported to Linux. The parallel I/O performance of XFS provides high scalability for I/O threads, file system bandwidth, file and file system size, even when the file system spans many storage devices.

A typical use case for XFS is to implement a several-hundred terabyte file system across multiple storage servers, with each server consisting of multiple FC-connected disk arrays.

XFS is supported for use with the root (/) or boot file systems on Oracle Linux 8.

XFS has a large number of features that make it suitable for deployment in an enterprise-level computing environment that requires the implementation of very large file systems:

  • Implements journaling for metadata operations, which guarantees the consistency of the file system following loss of power or a system crash. XFS records file system updates asynchronously to a circular buffer (the journal) before it can commit the actual data updates to disk. The journal can be located either internally in the data section of the file system, or externally on a separate device to reduce contention for disk access. If the system crashes or loses power, XFS reads the journal when the file system is remounted and replays any pending metadata operations to ensure the consistency of the file system. The speed of this recovery does not depend on the size of the file system.

  • Is internally partitioned into allocation groups, which are virtual storage regions of fixed size. Any files and directories that you create can span multiple allocation groups. Each allocation group manages its own set of inodes and free space independently of other allocation groups to provide both scalability and parallelism of I/O operations. If the file system spans many physical devices, allocation groups can optimize throughput by taking advantage of the underlying separation of channels to the storage components.

  • Is an extent-based file system. To reduce file fragmentation and file scattering, each file's blocks can have variable length extents, where each extent consists of one or more contiguous blocks. XFS's space allocation scheme is designed to efficiently locate free extents that it can use for file system operations. XFS does not allocate storage to the holes in sparse files. If possible, the extent allocation map for a file is stored in its inode. Large allocation maps are stored in a data structure maintained by the allocation group.

  • Supports the reflink and deduplication features, which allow multiple files to share the same data extents. This sharing provides the following benefits:

    • Each copy can have different file metadata (permissions, and so on) because each copy has its own distinct inode. Only the data extents are shared.

    • The file system ensures that any write causes a copy-on-write, without applications having to do anything special.

    • Changing one extent continues to allow all of the other extents to remain shared. In this way, space is saved on a per-extent basis. Note, however, that a change to a hard-linked file does require a new copy of the entire file.

  • To maximize throughput for XFS file systems that you create on an underlying striped software or hardware RAID array, you can use the su and sw arguments to the -d option of the mkfs.xfs command to specify the size of each stripe unit and the number of units per stripe. XFS uses this information to align data, inodes, and the journal appropriately for the storage. On LVM and MD (software RAID) volumes and some hardware RAID configurations, XFS can automatically select the optimal stripe parameters for you.

  • To reduce fragmentation and increase performance, XFS implements delayed allocation: it reserves file system blocks for data in the buffer cache and allocates the blocks only when the operating system flushes that data to disk.

  • XFS supports extended attributes for files, where the size of each attribute's value can be up to 64 KB, and each attribute can be allocated to either a root or a user name space.

  • Direct I/O in XFS provides high-throughput, non-cached I/O by performing DMA directly between an application and a storage device, utilizing the full I/O bandwidth of the device.

  • To support the snapshot facilities that volume managers, hardware subsystems, and databases provide, you can use the xfs_freeze command to suspend and resume I/O for an XFS file system. See Section 5.7, “Freezing and Unfreezing an XFS File System”.

  • To defragment individual files in an active XFS file system, you can use the xfs_fsr command. See Section 5.11, “Defragmenting an XFS File System”.

  • To grow an XFS file system, you can use the xfs_growfs command. See Section 5.5, “Growing an XFS File System”.

  • To back up and restore a live XFS file system, you can use the xfsdump and xfsrestore commands. See Section 5.9, “Backing Up and Restoring an XFS File System”.

  • XFS supports user, group, and project disk quotas on block and inode usage that are initialized when the file system is mounted. Project disk quotas allow you to set limits for individual directory hierarchies within an XFS file system without regard to which user or group has write access to that directory hierarchy.
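To make the reflink feature concrete, the following sketch prints the commands that create an XFS file system with reflink support and then make a reflink copy of a file. The device /dev/vg0/lv2, the mount point /myxfs2, and the file names are hypothetical, and the script only echoes the commands unless you set DRY_RUN=0.

```shell
#!/bin/sh
# Sketch only: DEV, MNT, and the file names are hypothetical placeholders.
DEV=${DEV:-/dev/vg0/lv2}
MNT=${MNT:-/myxfs2}

# Print each command instead of running it unless DRY_RUN=0 is set.
run() {
    if [ "${DRY_RUN:-1}" = 0 ]; then "$@"; else echo "+ $*"; fi
}

# reflink=1 asks mkfs.xfs to enable shared data extents.
run mkfs.xfs -m reflink=1 "$DEV"
run mount "$DEV" "$MNT"

# The clone gets its own inode (and so its own permissions) but shares all
# data extents with base.img. A later write to either file copies only the
# extents that change.
run cp --reflink=always "$MNT/base.img" "$MNT/clone.img"
```

With --reflink=always, cp fails rather than silently making a full copy if the extents cannot be shared; --reflink=auto falls back to an ordinary copy instead.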

For more information about XFS, see https://xfs.wiki.kernel.org/.

5.1.1 About External XFS Journals

The default location for an XFS journal is on the same block device as the data. Because synchronous metadata writes to the journal must complete successfully before any associated data writes can start, such a layout can lead to disk contention for the typical workload pattern on a database server. To overcome this problem, you can place the journal on a separate physical device with a low-latency I/O path. As the journal typically requires very little storage space, such an arrangement can significantly improve the file system's I/O throughput. A suitable host device for the journal is a solid-state drive (SSD) device or a RAID device with a battery-backed, write-back cache.

To reserve an external journal with a specified size when you create an XFS file system, specify the -l logdev=device,size=size option to the mkfs.xfs command. If you omit the size parameter, mkfs.xfs selects a journal size based on the size of the file system. To mount the XFS file system so that it uses the external journal, specify the -o logdev=device option to the mount command.
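The following sketch shows both steps together, plus a matching /etc/fstab line. The device names /dev/sdb1 (data) and /dev/nvme0n1p1 (journal), the 64 MB journal size, and the mount point are assumptions for illustration only; the commands are echoed unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# Sketch: external XFS journal. All device names are hypothetical.
DATA_DEV=${DATA_DEV:-/dev/sdb1}      # data section, e.g. on a disk array
LOG_DEV=${LOG_DEV:-/dev/nvme0n1p1}   # low-latency device for the journal
MNT=${MNT:-/myxfs}

# Print each command instead of running it unless DRY_RUN=0 is set.
run() {
    if [ "${DRY_RUN:-1}" = 0 ]; then "$@"; else echo "+ $*"; fi
}

# Reserve a 64 MB journal on LOG_DEV; drop ",size=64m" to let mkfs.xfs
# choose a size based on the size of the file system.
run mkfs.xfs -l logdev="$LOG_DEV",size=64m "$DATA_DEV"

# Every mount of this file system must name the journal device.
run mount -o logdev="$LOG_DEV" "$DATA_DEV" "$MNT"

# The equivalent /etc/fstab entry, printed for reference:
printf '%s %s xfs logdev=%s 0 0\n' "$DATA_DEV" "$MNT" "$LOG_DEV"
```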

5.1.2 About XFS Write Barriers

A write barrier assures file system consistency on storage hardware that supports flushing of in-memory data to the underlying device. This ability is particularly important for write operations to an XFS journal that is held on a device with a volatile write-back cache.

By default, an XFS file system is mounted with a write barrier. If you create an XFS file system on a LUN that has a battery-backed, non-volatile cache, using a write barrier degrades I/O performance by requiring data to be flushed more often than necessary. In such cases, you can remove the write barrier by mounting the file system with the -o nobarrier option to the mount command.

5.1.3 About Lazy Counters

With lazy counters enabled on an XFS file system, the free-space and inode counters are maintained in parts of the file system other than the superblock. This arrangement can significantly improve I/O performance for application workloads that are metadata intensive.

Lazy counters are enabled by default. However, if required, you can disable them by specifying the -l lazy-count=0 option to the mkfs.xfs command.
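To check whether an existing file system uses lazy counters, you can look for the lazy-count field that xfs_info prints in its log section, in the same format as the mkfs.xfs output in Section 5.3. The following sketch extracts that value; the function name lazy_count and the mount point in the usage comment are hypothetical.

```shell
#!/bin/sh
# Print the lazy-count value (1 = enabled, 0 = disabled) found in
# xfs_info or mkfs.xfs output read from standard input.
lazy_count() {
    sed -n 's/.*lazy-count=\([0-9]*\).*/\1/p'
}

# Usage on a mounted file system (mount point is a placeholder):
#   xfs_info /myxfs | lazy_count
# To create a file system with lazy counters disabled instead:
#   mkfs.xfs -l lazy-count=0 /dev/vg0/lv0
```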

5.2 Installing XFS Packages

Note

You can also obtain the XFS packages from the Oracle Linux yum server.

To install the XFS packages on a system:

  1. Log in to ULN, and subscribe your system to the ol8_x86_64_latest channel.

  2. On your system, install the xfsprogs and xfsdump packages:

    # dnf install xfsprogs xfsdump

  3. If you require the XFS development and QA packages, additionally subscribe your system to the ol8_x86_64_optional channel and install them:

    # dnf install xfsprogs-devel xfsprogs-qa-devel

5.3 Creating an XFS File System

You can use the mkfs.xfs command to create an XFS file system, for example:

# mkfs.xfs /dev/vg0/lv0
meta-data=/dev/vg0/lv0           isize=256    agcount=32, agsize=8473312 blks
         =                       sectsz=512   attr=2, projid32bit=0
data     =                       bsize=4096   blocks=271145984, imaxpct=25
         =                       sunit=0      swidth=0 blks
naming   =version 2              bsize=4096   ascii-ci=0
log      =internal log           bsize=4096   blocks=32768, version=2
         =                       sectsz=512   sunit=0 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0 

The following example shows how to create an XFS file system with a stripe-unit size of 32 KB and 6 units per stripe by specifying the su and sw arguments to the -d option:

# mkfs.xfs -d su=32k,sw=6 /dev/vg0/lv1
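The su value is normally the chunk size of the array, and sw is the number of disks that hold data rather than parity or mirror copies. The following sketch derives both values from a RAID level, a disk count, and a chunk size using those common rules (an assumption; confirm the geometry against your array's documentation). The function name stripe_geometry is hypothetical.

```shell
#!/bin/sh
# Sketch: derive mkfs.xfs su/sw values from RAID geometry.
# su = chunk (stripe unit) size; sw = number of data-bearing disks.
stripe_geometry() {
    level=$1 disks=$2 chunk=$3
    case $level in
        raid0)  data=$disks ;;
        raid5)  data=$((disks - 1)) ;;   # one disk's worth of parity
        raid6)  data=$((disks - 2)) ;;   # two disks' worth of parity
        raid10) data=$((disks / 2)) ;;   # half of the disks hold mirror copies
        *) echo "unsupported level: $level" >&2; return 1 ;;
    esac
    echo "su=$chunk,sw=$data"
}

# An 8-disk RAID 6 array with a 32 KB chunk yields su=32k,sw=6, matching
# the example above:
#   mkfs.xfs -d "$(stripe_geometry raid6 8 32k)" /dev/vg0/lv1
```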

For more information, see the mkfs.xfs(8) manual page.

5.4 Modifying an XFS File System

Note

It is not possible to modify a mounted XFS file system.

You can use the xfs_admin command to modify an unmounted XFS file system. For example, you can enable or disable lazy counters, change the file system UUID, or change the file system label.

To display the existing label for an unmounted XFS file system and then apply a new label, use the following command:

# xfs_admin -l /dev/sdb
label = ""
# xfs_admin -L "VideoRecords" /dev/sdb
writing all SBs
new label = "VideoRecords"

Note

The label can be a maximum of 12 characters in length.
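Because of this limit, a script that sets labels can reject long values before it calls xfs_admin. The following sketch does that; the function name set_xfs_label is hypothetical, the device must be unmounted, and the command is echoed unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# Sketch: check the 12-character limit before labeling an unmounted device.
# Print each command instead of running it unless DRY_RUN=0 is set.
run() {
    if [ "${DRY_RUN:-1}" = 0 ]; then "$@"; else echo "+ $*"; fi
}

# set_xfs_label DEVICE LABEL
set_xfs_label() {
    dev=$1 label=$2
    # ${#label} counts characters (bytes in some shells), so keep labels ASCII.
    if [ "${#label}" -gt 12 ]; then
        echo "label '$label' is longer than 12 characters" >&2
        return 1
    fi
    run xfs_admin -L "$label" "$dev"
}

# Usage:
#   set_xfs_label /dev/sdb VideoRecords
```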

To display the existing UUID and then generate a new UUID, use the following command:

# xfs_admin -u /dev/sdb
UUID = cd4f1cc4-15d8-45f7-afa4-2ae87d1db2ed
# xfs_admin -U generate /dev/sdb
writing all SBs
new UUID = c1b9d5a2-f162-11cf-9ece-0020afc76f16

To clear the UUID altogether:

# xfs_admin -U nil /dev/sdb
Clearing log and setting UUID
writing all SBs
new UUID = 00000000-0000-0000-0000-000000000000

To disable and then re-enable lazy counters:

# xfs_admin -c 0 /dev/sdb
Disabling lazy-counters
# xfs_admin -c 1 /dev/sdb
Enabling lazy-counters

For more information, see the xfs_admin(8) manual page.

5.5 Growing an XFS File System

Note

You cannot grow an XFS file system that is currently unmounted. Also, no command currently exists to shrink an XFS file system.

You can use the xfs_growfs command to increase the size of a mounted XFS file system if there is space on the underlying devices to accommodate the change. The command does not have any effect on the layout or size of the underlying devices. If necessary, use the underlying volume manager to increase the physical storage that is available. For example, you can use the vgextend command to increase the storage that is available to an LVM volume group and lvextend to increase the size of the logical volume that contains the file system.

You cannot use the parted command to resize a partition that contains an XFS file system. Instead, you must recreate the partition with a larger size. If you deleted the original partition to free up disk space, restore its contents from a backup; otherwise, restore them from the contents of the original partition.

For example, you would increase the size of /myxfs1 to 4 TB, assuming a block size of 4 KB, as follows:

# xfs_growfs -D 1073741824 /myxfs1

To increase the size of the file system to the maximum size that the underlying device supports, specify the -d option:

# xfs_growfs -d /myxfs1
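The -D option takes a size in file system blocks, so the value is the target size divided by the block size: 4 TiB / 4 KiB = 2^42 / 2^12 = 2^30 = 1073741824 blocks. The following sketch does that arithmetic and prints the LVM and xfs_growfs steps, growing the volume first. The logical volume /dev/vg0/lv0 holding /myxfs1 and the function name size_to_blocks are hypothetical, and the commands are echoed unless DRY_RUN=0 is set. The actual block size appears as bsize in the data section of xfs_info output.

```shell
#!/bin/sh
# Sketch: grow the LV first, then the mounted XFS file system on it.
LV=${LV:-/dev/vg0/lv0}     # hypothetical logical volume holding /myxfs1
MNT=${MNT:-/myxfs1}

# Print each command instead of running it unless DRY_RUN=0 is set.
run() {
    if [ "${DRY_RUN:-1}" = 0 ]; then "$@"; else echo "+ $*"; fi
}

# size_to_blocks GIB BSIZE: convert a size in GiB to a count of BSIZE-byte blocks.
size_to_blocks() {
    echo $(( $1 * 1024 * 1024 * 1024 / $2 ))
}

run lvextend -L 4T "$LV"                                # set the LV to 4 TiB
run xfs_growfs -D "$(size_to_blocks 4096 4096)" "$MNT"   # 1073741824 blocks
```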

For more information, see the xfs_growfs(8) manual page.

5.7 Freezing and Unfreezing an XFS File System

If you need to take a hardware-based snapshot of an XFS file system, you can temporarily stop write operations to it.

Note

You do not need to explicitly suspend write operations if you use the lvcreate command to take an LVM snapshot.

To freeze and unfreeze an XFS file system, use the -f and -u options with the xfs_freeze command:

# xfs_freeze -f /myxfs
# # ... Take snapshot of file system ...
# xfs_freeze -u /myxfs

Note

You can also use the xfs_freeze command with btrfs, ext3, and ext4 file systems.
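If the snapshot step fails, the file system must still be unfrozen, or writes to it stay blocked. The following sketch wraps the freeze, a caller-supplied snapshot command, and the unfreeze in one function that unfreezes regardless of the snapshot result. The function name and the snapshot tool in the usage comment are hypothetical, the commands are echoed unless DRY_RUN=0 is set, and the sketch does not handle signals (a production script might also trap INT and TERM).

```shell
#!/bin/sh
# Sketch: freeze, snapshot, and always unfreeze.
# Print each command instead of running it unless DRY_RUN=0 is set.
run() {
    if [ "${DRY_RUN:-1}" = 0 ]; then "$@"; else echo "+ $*"; fi
}

# with_frozen_fs MOUNTPOINT COMMAND [ARG...]
with_frozen_fs() {
    mnt=$1; shift
    run xfs_freeze -f "$mnt" || return 1
    "$@"                        # take the snapshot while writes are suspended
    snap_status=$?
    run xfs_freeze -u "$mnt"    # unfreeze whether or not the snapshot succeeded
    return "$snap_status"
}

# Usage (hypothetical array snapshot CLI):
#   with_frozen_fs /myxfs array-snap create --lun 7
```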

For more information, see the xfs_freeze(8) manual page.

5.8 Managing Quotas on an XFS File System

The following information pertains to managing quotas on an XFS file system.

5.8.1 Mount Options for Enabling Quotas

The following table describes the options that you can specify with the mount command to enable quotas on an XFS file system.

Mount Option     Description
gqnoenforce      Enable group quotas. Report usage, but do not enforce usage limits.
gquota           Enable group quotas and enforce usage limits.
pqnoenforce      Enable project quotas. Report usage, but do not enforce usage limits.
pquota           Enable project quotas and enforce usage limits.
uqnoenforce      Enable user quotas. Report usage, but do not enforce usage limits.
uquota           Enable user quotas and enforce usage limits.
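As an illustration of these options, the following sketch mounts a file system with enforced user and group quotas, checks the result with the expert-mode state command of xfs_quota, and prints a matching /etc/fstab line. The device and mount point follow this chapter's examples and are placeholders; the commands are echoed unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# Sketch: mount with enforced user and group quotas, then confirm the state.
DEV=${DEV:-/dev/vg0/lv0}
MNT=${MNT:-/myxfs}

# Print each command instead of running it unless DRY_RUN=0 is set.
run() {
    if [ "${DRY_RUN:-1}" = 0 ]; then "$@"; else echo "+ $*"; fi
}

run mount -o uquota,gquota "$DEV" "$MNT"
# "state" reports whether accounting and enforcement are on for each quota type.
run xfs_quota -x -c state "$MNT"

# A matching /etc/fstab line, printed for reference, so that the options
# apply at every boot:
printf '%s %s xfs defaults,uquota,gquota 0 0\n' "$DEV" "$MNT"
```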

5.8.2 Displaying Block Usage Information

To display the block usage limits and the current usage in the /myxfs file system for all users, use the xfs_quota command, for example:

# xfs_quota -x -c 'report -h' /myxfs
User quota on /myxfs (/dev/vg0/lv0)
                        Blocks              
User ID      Used   Soft   Hard Warn/Grace   
---------- --------------------------------- 
root            0      0      0  00 [------]
guest           0   200M   250M  00 [------]

The following forms of the command display the free and used counts for blocks and inodes respectively in the manner of the df -h command:

# xfs_quota -c 'df -h' /myxfs
Filesystem     Size   Used  Avail Use% Pathname
/dev/vg0/lv0 200.0G  32.2M  20.0G   1% /myxfs

# xfs_quota -c 'df -ih' /myxfs
Filesystem   Inodes   Used   Free Use% Pathname
/dev/vg0/lv0  21.0m      4  21.0m   1% /myxfs

If you specify the -x option to enter expert mode, you can use subcommands such as limit to set soft and hard limits for block and inode usage by an individual user, for example:

# xfs_quota -x -c 'limit bsoft=200m bhard=250m isoft=200 ihard=250 guest' /myxfs 

Note that this command requires that you have mounted the file system with user quotas enabled.

To set limits for a group on an XFS file system that you have mounted with group quotas enabled, specify the -g option to limit:

# xfs_quota -x -c 'limit -g bsoft=5g bhard=6g devgrp' /myxfs
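When several users need the same limits, you can loop over them. The following sketch applies the limits from the earlier user example to each named user and then prints a report. The user names and the function name set_user_limits are hypothetical; the run helper quotes arguments that contain spaces so that the echoed commands can be copied as-is, and nothing is executed unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# Sketch: apply identical limits to several users, then print a report.
# Print each command (quoting arguments that contain spaces) instead of
# running it, unless DRY_RUN=0 is set.
run() {
    if [ "${DRY_RUN:-1}" = 0 ]; then "$@"; return; fi
    printf '+'
    for arg in "$@"; do
        case $arg in
            *' '*) printf " '%s'" "$arg" ;;
            *)     printf ' %s' "$arg" ;;
        esac
    done
    printf '\n'
}

# set_user_limits MOUNTPOINT USER...
set_user_limits() {
    mnt=$1; shift
    for user in "$@"; do
        run xfs_quota -x -c "limit bsoft=200m bhard=250m isoft=200 ihard=250 $user" "$mnt"
    done
    run xfs_quota -x -c 'report -h' "$mnt"
}

# Usage (the file system must be mounted with user quotas enabled):
#   set_user_limits /myxfs guest alice bob
```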

For more information, see the xfs_quota(8) manual page.

5.8.3 Setting Project Quotas

User and group quotas are supported by other file systems, such as ext4. The XFS file system additionally enables you to set quotas on individual directory hierarchies in the file system, which are known as managed trees. Each managed tree is uniquely identified by a project ID and an optional project name. The ability to control the disk usage of a directory hierarchy is useful if you do not want to set quota limits for a privileged user but still need to limit the space used by a directory that it writes to, such as /var/log, or if many users or groups have write access to a directory, such as /var/tmp.

To define a project and set quota limits for it:

  1. Mount the XFS file system with project quotas enabled.

    # mount -o pquota device mountpoint

    For example, to enable project quotas for the /myxfs file system, you would use the following command:

    # mount -o pquota /dev/vg0/lv0 /myxfs

  2. Define a unique project ID for the directory hierarchy in the /etc/projects file.

    # echo project_ID:mountpoint/directory >> /etc/projects

    For example, you would set a project ID of 51 for the directory hierarchy /myxfs/testdir as follows:

    # echo 51:/myxfs/testdir >> /etc/projects

  3. Create an entry in the /etc/projid file that maps a project name to the project ID.

    # echo project_name:project_ID >> /etc/projid

    For example, you would map the project name testproj to the project with ID 51 as follows:

    # echo testproj:51 >> /etc/projid

  4. Use the project subcommand of xfs_quota to define a managed tree in the XFS file system for the project.

    # xfs_quota -x -c 'project -s project_name' mountpoint

    For example, you would define a managed tree in the /myxfs file system for the project testproj, which corresponds to the directory hierarchy /myxfs/testdir, as follows:

    # xfs_quota -x -c 'project -s testproj' /myxfs

  5. Use the limit subcommand to set limits on the disk usage of the project.

    # xfs_quota -x -c 'limit -p arguments project_name' mountpoint

    For example, to set a hard limit of 10 GB of disk space for the project testproj, you would use the following command:

    # xfs_quota -x -c 'limit -p bhard=10g testproj' /myxfs
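Steps 2 through 5 can be combined into one function that is safe to rerun, because it appends to /etc/projects and /etc/projid only when the exact line is not already present. In the following sketch, the function names are hypothetical, and the PROJECTS and PROJID override variables are an assumption added so that the function can be tried against scratch files. Nothing is written or executed unless DRY_RUN=0 is set, and step 1 (the pquota mount) is assumed to be done already.

```shell
#!/bin/sh
PROJECTS=${PROJECTS:-/etc/projects}
PROJID=${PROJID:-/etc/projid}

# Print each command (quoting arguments that contain spaces) instead of
# running it, unless DRY_RUN=0 is set.
run() {
    if [ "${DRY_RUN:-1}" = 0 ]; then "$@"; return; fi
    printf '+'
    for arg in "$@"; do
        case $arg in
            *' '*) printf " '%s'" "$arg" ;;
            *)     printf ' %s' "$arg" ;;
        esac
    done
    printf '\n'
}

# add_line LINE FILE: append LINE to FILE unless an identical line exists.
add_line() {
    if grep -qxF "$1" "$2" 2>/dev/null; then
        return 0
    fi
    if [ "${DRY_RUN:-1}" = 0 ]; then
        printf '%s\n' "$1" >> "$2"
    else
        echo "+ append '$1' to $2"
    fi
}

# setup_project NAME ID DIRECTORY HARD_LIMIT MOUNTPOINT
setup_project() {
    add_line "$2:$3" "$PROJECTS" &&
    add_line "$1:$2" "$PROJID" &&
    run xfs_quota -x -c "project -s $1" "$5" &&
    run xfs_quota -x -c "limit -p bhard=$4 $1" "$5"
}

# Usage with the values from the steps above:
#   setup_project testproj 51 /myxfs/testdir 10g /myxfs
```

To check the result, xfs_quota -x -c 'report -p -h' /myxfs reports usage and limits per project.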

For more information, see the projects(5), projid(5), and xfs_quota(8) manual pages.

5.9 Backing Up and Restoring an XFS File System

The xfsdump package contains the xfsdump and xfsrestore utilities. The xfsdump command examines the files in an XFS file system, determines which files need to be backed up, and copies them to the storage medium. Any backups that you create by using the xfsdump command are portable between systems with different endian architectures. The xfsrestore command restores a full or incremental backup of an XFS file system. You can also restore individual files and directory hierarchies from backups.

Note

Unlike an LVM snapshot, which immediately creates a sparse clone of a volume, xfsdump takes time to make a copy of the file system data.

You can use the xfsdump command to create a backup of an XFS file system on a device such as a tape drive or in a backup file on a different file system. A backup can span multiple physical media that are written on the same device. Additionally, you can write multiple backups to the same medium. Note that you can write only a single backup to a file. The command does not overwrite existing XFS backups that are found on physical media. If you need to overwrite any existing backups, you must use the appropriate command to erase a physical medium.

For example, the following command writes a level 0 (base) backup of the XFS file system (/myxfs) to the device, /dev/st0, and assigns a session label to the backup:

# xfsdump -l 0 -L "Backup level 0 of /myxfs `date`" -f /dev/st0 /myxfs

You can make incremental dumps that are relative to an existing backup by using the same command, for example:

# xfsdump -l level -L "Backup level level of /myxfs `date`" -f /dev/st0 /myxfs

A level 1 backup records only file system changes since the level 0 backup, a level 2 backup records only the changes since the latest level 1 backup, and so on up to level 9.
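One way to use these levels is a weekly schedule with a level 0 dump on Sundays and a level 1 dump on every other day. Because each level 1 dump records the changes since the latest level 0 dump, a restore needs only the most recent level 0 and the most recent level 1. The following sketch implements that schedule, writing each dump to its own file because a file can hold only one backup. The schedule, the /backup directory, the file naming, and the function names are assumptions; -M supplies a media label so that xfsdump does not prompt for one. The command is echoed unless DRY_RUN=0 is set.

```shell
#!/bin/sh
MNT=${MNT:-/myxfs}
BACKUP_DIR=${BACKUP_DIR:-/backup}

# Print each command (quoting arguments that contain spaces) instead of
# running it, unless DRY_RUN=0 is set.
run() {
    if [ "${DRY_RUN:-1}" = 0 ]; then "$@"; return; fi
    printf '+'
    for arg in "$@"; do
        case $arg in
            *' '*) printf " '%s'" "$arg" ;;
            *)     printf ' %s' "$arg" ;;
        esac
    done
    printf '\n'
}

# dump_level DAY: DAY is 1..7 as printed by "date +%u" (7 = Sunday).
dump_level() {
    if [ "$1" -eq 7 ]; then echo 0; else echo 1; fi
}

do_dump() {
    level=$(dump_level "$(date +%u)")
    stamp=$(date +%Y%m%d)
    run xfsdump -l "$level" -L "Backup level $level of $MNT $stamp" \
        -M "myxfs-$stamp" -f "$BACKUP_DIR/myxfs.$stamp.l$level.dump" "$MNT"
}

# Usage, for example from a daily cron job:
#   do_dump
```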

If you interrupt a backup by typing Ctrl-C and you did not specify the -J option (suppress the dump inventory) to xfsdump, you can resume the dump at a later date by specifying the -R option, for example:

# xfsdump -R -l 1 -L "Backup level 1 of /myxfs `date`" -f /dev/st0 /myxfs

In the previous example, the backup session label from the earlier interrupted session is overridden.

Use the xfsrestore command to find out information about the backups you have made of an XFS file system or to restore data from a backup.

The xfsrestore -I command displays information about the available backups, including the session ID and session label. If you want to restore a specific backup session from a backup medium, you can specify either the session ID or the session label.

For example, to restore an XFS file system from a level 0 backup by specifying the session ID, you would use the following command:

# xfsrestore -f /dev/st0 -S c76b3156-c37c-5b6e-7564-a0963ff8ca8f /myxfs

Specify the -r option to cumulatively recover all of the data from a level 0 backup, as well as higher-level backups that are based on that backup:

# xfsrestore -r -f /dev/st0 -v silent /myxfs

This command searches for backups in the archive, based on the level 0 backup, and then prompts you to choose whether you want to restore each backup, in turn. After restoring the selected backup, the command exits. Note that you must run this command multiple times, first selecting to restore the level 0 backup, and then subsequent higher-level backups, up to and including the most recent backup that you require in order to restore the file system data.

Note

After completing a cumulative restoration of an XFS file system, you should delete the housekeeping directory that the xfsrestore command creates in the destination directory.
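When each backup is stored in its own file, you can run the cumulative restore once per file in level order, starting with the level 0 backup, and then delete the housekeeping directory, which xfsrestore names xfsrestorehousekeepingdir. The following sketch does that; the destination, the dump file names, and the function name are hypothetical, and the commands are echoed unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# Print each command instead of running it unless DRY_RUN=0 is set.
run() {
    if [ "${DRY_RUN:-1}" = 0 ]; then "$@"; else echo "+ $*"; fi
}

# cumulative_restore DEST LEVEL0_FILE [HIGHER_LEVEL_FILE...]
cumulative_restore() {
    dest=$1; shift
    for dump in "$@"; do
        run xfsrestore -r -f "$dump" "$dest" || return 1
    done
    # Remove the state directory that xfsrestore -r keeps between runs.
    run rm -rf "$dest/xfsrestorehousekeepingdir"
}

# Usage:
#   cumulative_restore /myxfs /backup/myxfs.l0.dump /backup/myxfs.l1.dump
```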

As shown in the following example, you can recover the contents of a selected file or subdirectory from the backup medium. Running the command recovers the contents of /myxfs/profile/examples to /tmp/profile/examples, from the backup with the specified session label:

# xfsrestore -f /dev/st0 -L "Backup level 0 of /myxfs Sat Mar 2 14:47:59 GMT 2013" \
  -s profile/examples /tmp

Alternatively, you can interactively browse a backup by specifying the -i option, for example:

# xfsrestore -f /dev/st0 -i

The previous form of the command enables you to browse a backup as though it were a file system. You can change directories, list files, mark files for restoration (add) or unmark them (delete), and extract the marked files from the backup.

To copy the entire contents of one XFS file system to another, you can combine the xfsdump and xfsrestore commands by using the -J option to suppress the usual dump inventory housekeeping that the commands perform, for example:

# xfsdump -J - /myxfs | xfsrestore -J - /myxfsclone

For more information, see the xfsdump(8) and xfsrestore(8) manual pages.

5.10 Checking and Repairing an XFS File System

Note

If you have an Oracle Linux Premier Support account and encounter a problem mounting an XFS file system, send a copy of the /var/log/messages file to Oracle Support and wait for advice.

If you cannot mount an XFS file system, you can use the xfs_repair -n command to check its consistency. Usually, you would only run this command on the device file of an unmounted file system that you believe has a problem. The xfs_repair -n command reports the changes that a repair operation would make to the file system, but it does not modify the file system.

If you can mount the file system and you do not have a suitable backup, you can use the xfsdump command to attempt a backup of the existing file system data. However, note that the command might fail if the file system's metadata has become corrupted.

You can use the xfs_repair command to attempt to repair an XFS file system that is specified by its device file. Unless the file system has an inconsistency, you usually do not need to run the following command, because the kernel replays the journal every time that you mount an XFS file system. Note that xfs_repair cannot replay a dirty journal itself: if the journal contains changes that have not been replayed, xfs_repair exits with status 2, and you should mount and then unmount the file system to replay the journal before you run the command again.

# xfs_repair device
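The xfs_repair(8) manual page documents the exit statuses that a script can act on: with -n, 0 means that no corruption was detected and 1 means that corruption was detected; without -n, 0 means that the repair completed, 1 means that a runtime error occurred and the command should be rerun, and 2 means that the log is dirty. The following sketch turns those statuses into messages; the function names and the device in the usage comment are hypothetical.

```shell
#!/bin/sh
# explain_status MODE STATUS, where MODE is "check" (xfs_repair -n)
# or "repair" (xfs_repair without -n).
explain_status() {
    case "$1:$2" in
        check:0)  echo "no corruption detected" ;;
        check:1)  echo "corruption detected; review the output before repairing" ;;
        repair:0) echo "repair completed" ;;
        repair:1) echo "runtime error; run xfs_repair again" ;;
        repair:2) echo "dirty log; mount and unmount to replay it, then rerun" ;;
        *)        echo "unexpected status $2 from $1" ;;
    esac
}

# check_xfs DEVICE: no-modify check of an unmounted file system.
check_xfs() {
    xfs_repair -n "$1"
    st=$?
    explain_status check "$st"
    return "$st"
}

# Usage:
#   check_xfs /dev/sdb
#   xfs_repair /dev/sdb; explain_status repair $?
```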

If the journal log has become corrupted, you can reset the log by specifying the -L option to xfs_repair.

Warning

Resetting the log can leave the file system in an inconsistent state, resulting in data loss and data corruption. Unless you are experienced with debugging and repairing XFS file systems by using the xfs_db command, it is recommended that you instead recreate the file system and restore its contents from a backup.

If you cannot mount the file system or you do not have a suitable backup, running the xfs_repair command is the only viable option, unless you are experienced in using the xfs_db command.

xfs_db provides an internal command set that allows you to debug and repair an XFS file system manually. The commands enable you to perform scans on the file system, as well as navigate and display its data structures. If you specify the -x option to enable expert mode, you can modify the data structures.

# xfs_db [-x] device

For more information, see the xfs_db(8) and xfs_repair(8) manual pages, and run the help command within xfs_db.

5.11 Defragmenting an XFS File System

You can use the xfs_fsr command to defragment whole XFS file systems or individual files within an XFS file system. As XFS is an extent-based file system, it is usually unnecessary to defragment a whole file system, and doing so is not recommended.

To defragment an individual file, use the following command to specify the name of the file as the argument to xfs_fsr:

# xfs_fsr pathname

Running the xfs_fsr command without any options defragments all of the currently mounted and writeable XFS file systems that are listed in /etc/mtab. For a period of two hours, the command passes over each file system, in turn, and attempts to defragment the top ten percent of files with the greatest number of extents. After two hours, the command records its progress in the /var/tmp/.fsrlast_xfs file. If you run the command again, the process is resumed from that point.
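Before defragmenting, it can help to measure fragmentation. The following sketch prints a read-only fragmentation summary with the frag command of xfs_db, lists the extents of one file with xfs_bmap, defragments only that file, and then runs a pass over all mounted XFS file systems limited to one hour with the -t option (the default is 7200 seconds, the two hours described above). The device, mount point, and file name are placeholders, and the commands are echoed unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# Sketch: measure, then defragment. DEV, MNT, and bigfile are placeholders.
DEV=${DEV:-/dev/vg0/lv0}
MNT=${MNT:-/myxfs}

# Print each command instead of running it unless DRY_RUN=0 is set.
run() {
    if [ "${DRY_RUN:-1}" = 0 ]; then "$@"; else echo "+ $*"; fi
}

run xfs_db -r -c frag "$DEV"      # read-only fragmentation summary
run xfs_bmap -v "$MNT/bigfile"    # list the extents of one file
run xfs_fsr -v "$MNT/bigfile"     # defragment only that file
run xfs_fsr -v -t 3600            # all mounted XFS file systems, for one hour
```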

For more information, see the xfs_fsr(8) manual page.