5 Managing the XFS File System

This chapter describes tasks for administering the XFS file system in Oracle Linux.

About the XFS File System

XFS is a high-performance journaling file system that was initially created by Silicon Graphics, Inc. for the IRIX operating system and later ported to Linux. The parallel I/O performance of XFS provides high scalability for I/O threads, file system bandwidth, file and file system size, even when the file system spans many storage devices.

A typical use case for XFS is to implement a several-hundred terabyte file system across multiple storage servers, each server consisting of multiple FC-connected disk arrays.

XFS is supported for use with the root (/) or boot file systems on Oracle Linux 7.

XFS has a large number of features that make it suitable for deployment in an enterprise-level computing environment that requires the implementation of very large file systems:

  • XFS implements journaling for metadata operations, which guarantees the consistency of the file system following loss of power or a system crash. XFS records file system updates asynchronously to a circular buffer (the journal) before it can commit the actual data updates to disk. The journal can be located either internally in the data section of the file system, or externally on a separate device to reduce contention for disk access. If the system crashes or loses power, it reads the journal when the file system is remounted, and replays any pending metadata operations to ensure the consistency of the file system. The speed of this recovery does not depend on the size of the file system.

  • XFS is internally partitioned into allocation groups, which are virtual storage regions of fixed size. Any files and directories that you create can span multiple allocation groups. Each allocation group manages its own set of inodes and free space independently of other allocation groups to provide both scalability and parallelism of I/O operations. If the file system spans many physical devices, allocation groups can optimize throughput by taking advantage of the underlying separation of channels to the storage components.

  • XFS is an extent-based file system. To reduce file fragmentation and file scattering, each file's blocks can have variable length extents, where each extent consists of one or more contiguous blocks. XFS's space allocation scheme is designed to efficiently locate free extents that it can use for file system operations. XFS does not allocate storage to the holes in sparse files. If possible, the extent allocation map for a file is stored in its inode. Large allocation maps are stored in a data structure maintained by the allocation group.

  • To maximize throughput for XFS file systems that you create on an underlying striped, software or hardware-based array, you can use the su and sw arguments to the -d option of the mkfs.xfs command to specify the size of each stripe unit and the number of units per stripe. XFS uses the information to align data, inodes, and journal appropriately for the storage. On LVM and MD volumes and some hardware RAID configurations, XFS can automatically select the optimal stripe parameters for you.

  • To reduce fragmentation and increase performance, XFS implements delayed allocation, reserving file system blocks for data in the buffer cache, and allocating the block when the operating system flushes that data to disk.

  • XFS supports extended attributes for files, where the size of each attribute's value can be up to 64 KB, and each attribute can be allocated to either a root or a user name space.

  • Direct I/O in XFS implements high throughput, non-cached I/O by performing DMA directly between an application and a storage device, utilizing the full I/O bandwidth of the device.

  • To support the snapshot facilities that volume managers, hardware subsystems, and databases provide, you can use the xfs_freeze command to suspend and resume I/O for an XFS file system. See Freezing and Unfreezing an XFS File System.

  • To defragment individual files in an active XFS file system, you can use the xfs_fsr command. See Defragmenting an XFS File System.

  • To grow an XFS file system, you can use the xfs_growfs command. See Growing an XFS File System.

  • To back up and restore a live XFS file system, you can use the xfsdump and xfsrestore commands. See Backing up and Restoring XFS File Systems.

  • XFS supports user, group, and project disk quotas on block and inode usage that are initialized when the file system is mounted. Project disk quotas allow you to set limits for individual directory hierarchies within an XFS file system without regard to which user or group has write access to that directory hierarchy.

You can find more information about XFS at https://xfs.wiki.kernel.org/.

About External XFS Journals

The default location for an XFS journal is on the same block device as the data. As synchronous metadata writes to the journal must complete successfully before any associated data writes can start, such a layout can lead to disk contention for the typical workload pattern on a database server. To overcome this problem, you can place the journal on a separate physical device with a low-latency I/O path. As the journal typically requires very little storage space, such an arrangement can significantly improve the file system's I/O throughput. A suitable host device for the journal is a solid-state drive (SSD) device or a RAID device with a battery-backed write-back cache.

To reserve an external journal with a specified size when you create an XFS file system, specify the -l logdev=device,size=size option to the mkfs.xfs command. If you omit the size parameter, mkfs.xfs selects a journal size based on the size of the file system. To mount the XFS file system so that it uses the external journal, specify the -o logdev=device option to the mount command.
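
As a rough sketch of these two steps, the following script prints the commands for a hypothetical data device (/dev/vg0/lv2), journal device (/dev/sdc1), journal size (512 MB), and mount point (/myxfs2). All four values, the run helper, and the DRY_RUN variable are assumptions chosen for illustration. By default, the script only prints the commands. Setting DRY_RUN=0 executes them, which destroys any existing data on the data device.

```shell
# Hypothetical devices and mount point: adjust for your system.
DATA_DEV=/dev/vg0/lv2
LOG_DEV=/dev/sdc1
LOG_SIZE=512m
MNT=/myxfs2

# Print each command; execute it only when DRY_RUN=0.
run() {
    echo "+ $*"
    if [ "${DRY_RUN:-1}" = 0 ]; then "$@"; fi
}

# Create the file system with its journal on the separate device.
run sudo mkfs.xfs -l "logdev=${LOG_DEV},size=${LOG_SIZE}" "$DATA_DEV"

# Name the same journal device every time you mount the file system.
run sudo mount -o "logdev=${LOG_DEV}" "$DATA_DEV" "$MNT"
```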

About XFS Write Barriers

A write barrier assures file system consistency on storage hardware that supports flushing of in-memory data to the underlying device. This ability is particularly important for write operations to an XFS journal that is held on a device with a volatile write-back cache.

By default, an XFS file system is mounted with a write barrier. If you create an XFS file system on a LUN that has a battery-backed, non-volatile cache, using a write barrier degrades I/O performance by requiring data to be flushed more often than necessary. In such cases, you can remove the write barrier by mounting the file system with the -o nobarrier option to the mount command.

About Lazy Counters

With lazy-counters enabled on an XFS file system, the free-space and inode counters are maintained in parts of the file system other than the superblock. This arrangement can significantly improve I/O performance for application workloads that are metadata intensive.

Lazy counters are enabled by default, but if required, you can disable them by specifying the -l lazy-count=0 option to the mkfs.xfs command.

Installing the XFS Packages

Note:

You can also obtain the XFS packages from the Oracle Linux Yum Server.

To install the XFS packages on a system:

  1. Log in to ULN, and subscribe your system to the ol7_x86_64_latest channel.

  2. On your system, use yum to install the xfsprogs and xfsdump packages:

    sudo yum install xfsprogs xfsdump

  3. If you require the XFS development and QA packages, additionally subscribe your system to the ol7_x86_64_optional channel and use yum to install them:

    sudo yum install xfsprogs-devel xfsprogs-qa-devel

Creating an XFS File System

You can use the mkfs.xfs command to create an XFS file system, for example:

sudo mkfs.xfs /dev/vg0/lv0
meta-data=/dev/vg0/lv0           isize=256    agcount=32, agsize=8473312 blks
         =                       sectsz=512   attr=2, projid32bit=0
data     =                       bsize=4096   blocks=271145984, imaxpct=25
         =                       sunit=0      swidth=0 blks
naming   =version 2              bsize=4096   ascii-ci=0
log      =internal log           bsize=4096   blocks=32768, version=2
         =                       sectsz=512   sunit=0 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0 

To create an XFS file system with a stripe-unit size of 32 KB and 6 units per stripe, you would specify the su and sw arguments to the -d option, for example:

sudo mkfs.xfs -d su=32k,sw=6 /dev/vg0/lv1
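
The su and sw values normally follow from the geometry of the array: su is the chunk size written to each disk, and sw is the number of data-bearing disks (the total number of disks minus those that hold parity). The following minimal sketch derives the -d argument from those figures. The function name stripe_opts and the 8-disk RAID 6 example are assumptions for illustration and are not part of mkfs.xfs.

```shell
# stripe_opts CHUNK_KB TOTAL_DISKS PARITY_DISKS
# Prints a value suitable for the -d option of mkfs.xfs (hypothetical helper).
stripe_opts() {
    local chunk_kb="$1" total="$2" parity="$3"
    local data=$(( total - parity ))
    if [ "$data" -lt 1 ]; then
        echo "stripe_opts: no data disks left" >&2
        return 1
    fi
    echo "su=${chunk_kb}k,sw=${data}"
}

# An 8-disk RAID 6 array (2 parity disks) with a 32 KB chunk size:
stripe_opts 32 8 2    # prints su=32k,sw=6
```

You could then pass the result to mkfs.xfs, for example: sudo mkfs.xfs -d "$(stripe_opts 32 8 2)" /dev/vg0/lv1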

For more information, see the mkfs.xfs(8) manual page.

Modifying an XFS File System

Note:

You cannot modify a mounted XFS file system.

You can use the xfs_admin command to modify an unmounted XFS file system. For example, you can enable or disable lazy counters, change the file system UUID, or change the file system label.

To display the existing label for an unmounted XFS file system and then apply a new label:

sudo xfs_admin -l /dev/sdb
label = ""
sudo xfs_admin -L "VideoRecords" /dev/sdb
writing all SBs
new label = "VideoRecords"

Note:

The label can be a maximum of 12 characters in length.

To display the existing UUID and then generate a new UUID:

sudo xfs_admin -u /dev/sdb
UUID = cd4f1cc4-15d8-45f7-afa4-2ae87d1db2ed
sudo xfs_admin -U generate /dev/sdb
writing all SBs
new UUID = c1b9d5a2-f162-11cf-9ece-0020afc76f16

To clear the UUID altogether:

sudo xfs_admin -U nil /dev/sdb
Clearing log and setting UUID
writing all SBs
new UUID = 00000000-0000-0000-0000-000000000000

To disable and then re-enable lazy counters:

sudo xfs_admin -c 0 /dev/sdb
Disabling lazy-counters
sudo xfs_admin -c 1 /dev/sdb
Enabling lazy-counters

For more information, see the xfs_admin(8) manual page.

Growing an XFS File System

Note:

You cannot grow an XFS file system that is currently unmounted.

There is currently no command to shrink an XFS file system.

You can use the xfs_growfs command to increase the size of a mounted XFS file system if there is space on the underlying devices to accommodate the change. The command does not have any effect on the layout or size of the underlying devices. If necessary, use the underlying volume manager to increase the physical storage that is available. For example, you can use the vgextend command to increase the storage that is available to an LVM volume group and lvextend to increase the size of the logical volume that contains the file system.

You cannot use the parted command to resize a partition that contains an XFS file system. You must instead recreate the partition with a larger size and then restore its contents. If you deleted the original partition to free up disk space, restore the contents from a backup. If you did not delete it, you can copy the contents from the original partition.

For example, to increase the size of /myxfs1 to 4 TB, assuming a block size of 4 KB:

sudo xfs_growfs -D 1073741824 /myxfs1

To increase the size of the file system to the maximum size that the underlying device supports, specify the -d option:

sudo xfs_growfs -d /myxfs1
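
The -D option takes a size in file system blocks, not bytes, so the target size must be divided by the block size. The following sketch shows the arithmetic behind the 4 TB example. The function name blocks_for_size is an assumption for illustration, and the default of 4096 bytes should be replaced with the bsize value that xfs_info reports for your file system.

```shell
# blocks_for_size SIZE_GIB [BLOCK_SIZE_BYTES]
# Prints the number of file system blocks that make up SIZE_GIB gibibytes.
blocks_for_size() {
    local size_gib="$1" bsize="${2:-4096}"
    echo $(( size_gib * 1024 * 1024 * 1024 / bsize ))
}

# 4 TB (4096 GiB) with 4 KB blocks:
blocks_for_size 4096    # prints 1073741824
```

You could then grow the file system with: sudo xfs_growfs -D "$(blocks_for_size 4096)" /myxfs1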

For more information, see the xfs_growfs(8) manual page.

Freezing and Unfreezing an XFS File System

If you need to take a hardware-based snapshot of an XFS file system, you can temporarily stop write operations to it.

Note:

You do not need to explicitly suspend write operations if you use the lvcreate command to take an LVM snapshot.

To freeze and unfreeze an XFS file system, use the -f and -u options with the xfs_freeze command, for example:

sudo xfs_freeze -f /myxfs

You would then take a snapshot of the file system, after which you would type:

sudo xfs_freeze -u /myxfs

Note:

You can also use the xfs_freeze command with btrfs, ext3, and ext4 file systems.
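
If the snapshot step fails while the file system is frozen, writes to it remain blocked until you unfreeze it. The following sketch wraps the freeze, snapshot, and unfreeze steps so that the unfreeze always runs. The run helper, the DRY_RUN variable, and take_array_snapshot (a placeholder for whatever snapshot tool your storage provides) are all assumptions for illustration. By default, the xfs_freeze commands are printed rather than executed.

```shell
# Print each command; execute it only when DRY_RUN=0.
run() {
    echo "+ $*"
    if [ "${DRY_RUN:-1}" = 0 ]; then "$@"; fi
}

# snapshot_while_frozen MOUNTPOINT COMMAND [ARGS...]
# Freezes MOUNTPOINT, runs COMMAND, and unfreezes even if COMMAND fails.
snapshot_while_frozen() {
    local mnt="$1" status=0
    shift
    run sudo xfs_freeze -f "$mnt" || return 1
    "$@" || status=$?
    run sudo xfs_freeze -u "$mnt"
    return "$status"
}

# Hypothetical placeholder for the storage vendor's snapshot command.
take_array_snapshot() { echo "snapshot of $1 requested"; }

snapshot_while_frozen /myxfs take_array_snapshot /myxfs
```

The function returns the exit status of the snapshot command, so a calling script can tell whether the snapshot succeeded.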

For more information, see the xfs_freeze(8) manual page.

Setting Quotas on an XFS File System

The following list shows the mount options that you can specify to enable quotas on an XFS file system:

  • gqnoenforce

    Enable group quotas. Report usage, but do not enforce usage limits.

  • gquota

    Enable group quotas and enforce usage limits.

  • pqnoenforce

    Enable project quotas. Report usage, but do not enforce usage limits.

  • pquota

    Enable project quotas and enforce usage limits.

  • uqnoenforce

    Enable user quotas. Report usage, but do not enforce usage limits.

  • uquota

    Enable user quotas and enforce usage limits.

To show the block usage limits and the current usage in the myxfs file system for all users, use the xfs_quota command:

sudo xfs_quota -x -c 'report -h' /myxfs
User quota on /myxfs (/dev/vg0/lv0)
                        Blocks              
User ID      Used   Soft   Hard Warn/Grace   
---------- --------------------------------- 
root            0      0      0  00 [------]
guest           0   200M   250M  00 [------]

The following forms of the command display the free and used counts for blocks and inodes respectively in the manner of the df -h command:

sudo xfs_quota -c 'df -h' /myxfs
Filesystem     Size   Used  Avail Use% Pathname
/dev/vg0/lv0 200.0G  32.2M  20.0G   1% /myxfs
sudo xfs_quota -c 'df -ih' /myxfs
Filesystem   Inodes   Used   Free Use% Pathname
/dev/vg0/lv0  21.0m      4  21.0m   1% /myxfs

If you specify the -x option to enter expert mode, you can use subcommands such as limit to set soft and hard limits for block and inode usage by an individual user, for example:

sudo xfs_quota -x -c 'limit bsoft=200m bhard=250m isoft=200 ihard=250 guest' /myxfs

This command requires that the file system is mounted with user quotas enabled.

To set limits for a group on an XFS file system that you have mounted with group quotas enabled, specify the -g option to limit, for example:

sudo xfs_quota -x -c 'limit -g bsoft=5g bhard=6g devgrp' /myxfs
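
After you set limits, you might want to list the users who have exceeded their soft block limit. The following sketch filters report output for that purpose. It assumes the column layout shown in the earlier report example (name, used, soft limit, hard limit) with plain numeric values, which means running report without the -h option. The function name over_soft_limit and the sample lines are made up for illustration.

```shell
# over_soft_limit: read report lines on stdin (name, used, soft, hard, ...)
# and print the names whose usage exceeds a non-zero soft limit.
over_soft_limit() {
    awk '$2 ~ /^[0-9]+$/ && $3 ~ /^[0-9]+$/ && $3 > 0 && $2 + 0 > $3 + 0 { print $1 }'
}

# Made-up sample lines in the report layout, for illustration only:
over_soft_limit <<'EOF'
root            0      0      0  00 [------]
guest      210000 204800 256000  00 [6 days]
EOF
```

With the sample input, the function prints guest. On a real system, you could feed it live data with: sudo xfs_quota -x -c 'report -u -b -N' /myxfs | over_soft_limit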

For more information, see the xfs_quota(8) manual page.

Setting Project Quotas

User and group quotas are supported by other file systems, such as ext4. The XFS file system additionally allows you to set quotas on individual directory hierarchies in the file system that are known as managed trees. Each managed tree is uniquely identified by a project ID and an optional project name. Being able to control the disk usage of a directory hierarchy is useful if you do not otherwise want to set quota limits for a privileged user (for example, /var/log) or if many users or groups have write access to a directory (for example, /var/tmp).

To define a project and set quota limits on it:

  1. Mount the XFS file system with project quotas enabled:

    sudo mount -o pquota device mountpoint

    For example, to enable project quotas for the /myxfs file system:

    sudo mount -o pquota /dev/vg0/lv0 /myxfs

  2. Define a unique project ID for the directory hierarchy in the /etc/projects file:

    echo project_ID:mountpoint/directory >> /etc/projects

    For example, to set a project ID of 51 for the directory hierarchy /myxfs/testdir:

    echo 51:/myxfs/testdir >> /etc/projects

  3. Create an entry in the /etc/projid file that maps a project name to the project ID:

    echo project_name:project_ID >> /etc/projid

    For example, to map the project name testproj to the project with ID 51:

    echo testproj:51 >> /etc/projid

  4. Use the project subcommand of xfs_quota to define a managed tree in the XFS file system for the project:

    sudo xfs_quota -x -c 'project -s project_name' mountpoint

    For example, to define a managed tree in the /myxfs file system for the project testproj, which corresponds to the directory hierarchy /myxfs/testdir:

    sudo xfs_quota -x -c 'project -s testproj' /myxfs

  5. Use the limit subcommand to set limits on the disk usage of the project:

    sudo xfs_quota -x -c 'limit -p arguments project_name' mountpoint

    For example, to set a hard limit of 10 GB of disk space for the project testproj:

    sudo xfs_quota -x -c 'limit -p bhard=10g testproj' /myxfs

For more information, see the projects(5), projid(5), and xfs_quota(8) manual pages.

Backing up and Restoring XFS File Systems

The xfsdump package contains the xfsdump and xfsrestore utilities. xfsdump examines the files in an XFS file system, determines which files need to be backed up, and copies them to the storage medium. Any backups that you create using xfsdump are portable between systems with different endian architectures. xfsrestore restores a full or incremental backup of an XFS file system. You can also restore individual files and directory hierarchies from backups.

Note:

Unlike an LVM snapshot, which immediately creates a sparse clone of a volume, xfsdump takes time to make a copy of the file system data.

You can use the xfsdump command to create a backup of an XFS file system on a device such as a tape drive, or in a backup file on a different file system. A backup can span multiple physical media that are written on the same device, and you can write multiple backups to the same medium. You can write only a single backup to a file. The command does not overwrite existing XFS backups that it finds on physical media. You must use the appropriate command to erase a physical medium if you need to overwrite any existing backups.

For example, the following command writes a level 0 (base) backup of the XFS file system /myxfs to the device /dev/st0 and assigns a session label to the backup:

sudo xfsdump -l 0 -L "Backup level 0 of /myxfs `date`" -f /dev/st0 /myxfs

You can make incremental dumps relative to an existing backup by using the command:

sudo xfsdump -l level -L "Backup level level of /myxfs `date`" -f /dev/st0 /myxfs

A level 1 backup records only file system changes since the level 0 backup, a level 2 backup records only the changes since the latest level 1 backup, and so on up to level 9.
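
One possible way to use these levels is a weekly cycle: a level 0 backup on Sunday, followed by a level 1 backup on Monday, a level 2 backup on Tuesday, and so on, so that each daily backup records only the changes since the previous day. The following sketch is an assumption about how such a schedule might be scripted and is not a recommended policy. It takes the level from the day of the week and prints the resulting xfsdump command rather than running it.

```shell
# date +%w prints the day of the week as 0 (Sunday) through 6 (Saturday),
# which this sketch uses directly as the dump level.
level=$(date +%w)
label="Backup level $level of /myxfs $(date)"

# Print the command instead of running it.
echo "sudo xfsdump -l $level -L \"$label\" -f /dev/st0 /myxfs"
```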

If you interrupt a backup by typing Ctrl-C and you did not specify the -J option (suppress the dump inventory) to xfsdump, you can resume the dump at a later date by specifying the -R option:

sudo xfsdump -R -l 1 -L "Backup level 1 of /myxfs `date`" -f /dev/st0 /myxfs

In this example, the backup session label from the earlier, interrupted session is overridden.

You use the xfsrestore command to find out information about the backups you have made of an XFS file system or to restore data from a backup.

The xfsrestore -I command displays information about the available backups, including the session ID and session label. If you want to restore a specific backup session from a backup medium, you can specify either the session ID or the session label.

For example, to restore an XFS file system from a level 0 backup by specifying the session ID:

sudo xfsrestore -f /dev/st0 -S c76b3156-c37c-5b6e-7564-a0963ff8ca8f /myxfs

If you specify the -r option, you can cumulatively recover all data from a level 0 backup and the higher-level backups that are based on that backup:

sudo xfsrestore -r -f /dev/st0 -v silent /myxfs

The command searches the archive looking for backups based on the level 0 backup, and prompts you to choose whether you want to restore each backup in turn. After restoring the backup that you select, the command exits. You must run this command multiple times, first selecting to restore the level 0 backup, and then subsequent higher-level backups up to and including the most recent one that you require to restore the file system data.

Note:

After completing a cumulative restoration of an XFS file system, you should delete the housekeeping directory that xfsrestore creates in the destination directory.

You can recover a selected file or subdirectory contents from the backup medium, as shown in the following example, which recovers the contents of /myxfs/profile/examples to /usr/tmp/profile/examples from the backup with a specified session label:

sudo xfsrestore -f /dev/sr0 -L "Backup level 0 of /myxfs Sat Mar 2 14:47:59 GMT 2013" -s profile/examples /usr/tmp

Alternatively, you can interactively browse a backup by specifying the -i option:

sudo xfsrestore -f /dev/sr0 -i

This form of the command allows you to browse a backup as though it were a file system. You can change directories, list files, add files, delete files, or extract files from a backup.

To copy the entire contents of one XFS file system to another, you can combine xfsdump and xfsrestore, using the -J option to suppress the usual dump inventory housekeeping that the commands perform:

sudo xfsdump -J - /myxfs | sudo xfsrestore -J - /myxfsclone

For more information, see the xfsdump(8) and xfsrestore(8) manual pages.

Defragmenting an XFS File System

You can use the xfs_fsr command to defragment whole XFS file systems or individual files within an XFS file system. As XFS is an extent-based file system, it is usually unnecessary to defragment a whole file system, and doing so is not recommended.

To defragment an individual file, specify the name of the file as the argument to xfs_fsr.

sudo xfs_fsr pathname

If you run the xfs_fsr command without any options, the command defragments all currently mounted, writeable XFS file systems that are listed in /etc/mtab. For a period of two hours, the command passes over each file system in turn, attempting to defragment the top ten percent of files that have the greatest number of extents. After two hours, the command records its progress in the file /var/tmp/.fsrlast_xfs, and it resumes from that point if you run the command again.
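
Before you defragment, you can check how fragmented a file system is by using the frag command of xfs_db in read-only mode (sudo xfs_db -r -c frag device). The following sketch extracts the percentage from that output so that a script can compare it against a threshold. The function name frag_factor is an assumption for illustration, and the values in the sample line are made up.

```shell
# frag_factor: read the output of `xfs_db -r -c frag DEVICE` on stdin and
# print the fragmentation percentage without the trailing % sign.
frag_factor() {
    sed -n 's/.*fragmentation factor \([0-9.]*\)%.*/\1/p'
}

# A line in the format that the frag command prints (values are made up):
echo "actual 1203, ideal 1189, fragmentation factor 1.16%" | frag_factor
```

With the sample line, the function prints 1.16. To limit a run over all mounted file systems to ten minutes instead of the default two hours, you can specify the -t option with a number of seconds, for example: sudo xfs_fsr -v -t 600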

For more information, see the xfs_fsr(8) manual page.

Checking and Repairing an XFS File System

Note:

If you have an Oracle Linux Premier Support account and encounter a problem mounting an XFS file system, send a copy of the /var/log/messages file to Oracle Support and wait for advice.

If you cannot mount an XFS file system, you can use the xfs_repair -n command to check its consistency. Usually, you would only run this command on the device file of an unmounted file system that you believe has a problem. The xfs_repair -n command reports the changes that a repair operation would make to the file system, but it does not modify the file system.
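
In this no-modify mode, xfs_repair exits with status 0 if it detects no corruption and status 1 if it detects corruption, so a script can act on the result. The following sketch wraps the check in a function. The function name check_xfs is an assumption, and the XFS_REPAIR variable is a hypothetical override that exists only so that the sketch can be exercised without a real device.

```shell
# check_xfs DEVICE
# Runs a read-only consistency check on an unmounted file system and
# reports the result. XFS_REPAIR is a hypothetical override for testing.
check_xfs() {
    local dev="$1" status=0
    ${XFS_REPAIR:-sudo xfs_repair} -n "$dev" || status=$?
    case $status in
        0) echo "$dev: no corruption detected" ;;
        1) echo "$dev: corruption detected, or the check hit an error" ;;
        *) echo "$dev: check could not complete (exit status $status)" ;;
    esac
    return "$status"
}
```

For example, check_xfs /dev/vg0/lv0 checks the logical volume that is used elsewhere in this chapter.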

If you can mount the file system and you do not have a suitable backup, you can use xfsdump to attempt to back up the existing file system data. However, the command might fail if the file system's metadata has become too corrupted.

You can use the xfs_repair command to attempt to repair an XFS file system specified by its device file. The command replays the journal log to fix any inconsistencies that might have resulted from the file system not being cleanly unmounted. Unless the file system has an inconsistency, it is usually not necessary to use the command, as the journal is replayed every time that you mount an XFS file system.

sudo xfs_repair device

If the journal log has become corrupted, you can reset the log by specifying the -L option to xfs_repair.

Attention:

Resetting the log can leave the file system in an inconsistent state, resulting in data loss and data corruption. Unless you are experienced in debugging and repairing XFS file systems using xfs_db, it is recommended that you instead recreate the file system and restore its contents from a backup.

If you cannot mount the file system or you do not have a suitable backup, running xfs_repair is the only viable option unless you are experienced in using xfs_db.

xfs_db provides an internal command set that allows you to debug and repair an XFS file system manually. The commands allow you to perform scans on the file system, and to navigate and display its data structures. If you specify the -x option to enable expert mode, you can modify the data structures.

sudo xfs_db [-x] device

For more information, see the xfs_db(8) and xfs_repair(8) manual pages, and the help command within xfs_db.