See "Troubleshooting File System Problems" in System Administration Guide, Volume II for information about fsck error messages.
The UFS file system relies on an internal set of tables to keep track of inodes used and available blocks. When these internal tables are not properly synchronized with data on a disk, inconsistencies result and file systems need to be repaired.
File systems can be damaged or become inconsistent if the operating system terminates abruptly because of:
Power failure
Accidental unplugging of the system
Turning the system off without proper shutdown procedure
A software error in the kernel
File system corruption, while serious, is not common. When a system is booted, a file system consistency check is automatically performed (with the fsck program). Most of the time, this file system check repairs problems it encounters.
This chapter describes what the fsck program checks and repairs, and the options available for fsck. It also describes the following tasks:
How to modify the automatic checking done during booting
How to find out if a file system needs to be checked
How to check and repair a UFS file system interactively
How to restore a bad superblock
How to fix a UFS file system that fsck cannot repair
The fsck error messages are covered in "Troubleshooting File System Problems" in System Administration Guide, Volume II.
The fsck program places files and directories that are allocated but unreferenced in the lost+found directory. The inode number of each file is assigned as the name. If the lost+found directory does not exist, fsck creates it. If there is not enough space in the lost+found directory, fsck increases its size.
The fsck command uses a state flag, which is stored in the superblock, to record the condition of the file system. The fsck command uses this flag to determine whether or not a file system needs to be checked for consistency. The flag is used by the /sbin/rcS script during booting and by the fsck command when it is run from the command line with the -m option. If you ignore the result of fsck -m, you can check all file systems regardless of the setting of the state flag.
The possible state flag values are described in Table 31-1.
Table 31-1 State Flag Values
State Flag Value | Description |
---|---|
FSACTIVE | When a file system is mounted and then modified, the state flag is set to FSACTIVE. The file system may contain inconsistencies. A file system will be marked as FSACTIVE before any modified metadata is written to the disk. When a file system is unmounted gracefully, the state flag is set to FSCLEAN. A file system with the FSACTIVE flag must be checked by fsck because it may be inconsistent. |
FSBAD | If the root (/) file system is mounted when its state is not FSCLEAN or FSSTABLE, the state flag is set to FSBAD. The kernel will not change this file system state to FSCLEAN or FSSTABLE. If a root (/) file system is flagged FSBAD as part of the boot process, it will be mounted read-only. You can run fsck on the raw root device. Then remount the root (/) file system as read/write. |
FSCLEAN | If the file system was unmounted properly, the state flag is set to FSCLEAN. Any file system with an FSCLEAN state flag is not checked when the system is booted. |
FSLOG | If the file system was mounted with UFS logging, the state flag is set to FSLOG. Any file system with an FSLOG state flag is not checked when the system is booted. |
FSSTABLE | The file system is (or was) mounted but has not changed since the last checkpoint (sync or fsflush) which normally occurs every 30 seconds. For example, the kernel periodically checks if a file system is idle and, if so, flushes the information in the superblock back to the disk and marks it FSSTABLE. If the system crashes, the file system structure is stable, but users may lose a small amount of data. File systems that are marked FSSTABLE can skip the checking before mounting. The mount(2) system call will not mount a file system for read/write if the file system state is not FSCLEAN or FSSTABLE. |
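The boot-time decision in Table 31-1 can be summarized as a small lookup. The following Python sketch is illustrative only. The function name needs_check and the use of strings for the flags are assumptions, not the interface of /sbin/rcS or the UFS fsck. The sketch treats FSACTIVE, FSBAD, and any unrecognized value as requiring a check.

```python
# Hypothetical model of the boot-time decision in Table 31-1.
# States whose file systems are not checked when the system boots.
SKIP_AT_BOOT = {"FSCLEAN", "FSSTABLE", "FSLOG"}


def needs_check(state: str) -> bool:
    """Return True if a file system in `state` should be checked by fsck.

    FSACTIVE, FSBAD, and any unrecognized value are treated as possibly
    inconsistent, so they all require a check.
    """
    return state not in SKIP_AT_BOOT


if __name__ == "__main__":
    for flag in ("FSACTIVE", "FSBAD", "FSCLEAN", "FSLOG", "FSSTABLE"):
        print(f"{flag:<9} {'check' if needs_check(flag) else 'skip'}")
```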
Table 31-2 shows how the state flag is modified by the fsck command, based on its initial state.
Table 31-2 How the State Flag is Modified by fsck
Initial State: Before fsck | State After fsck: No Errors | State After fsck: All Errors Corrected | State After fsck: Uncorrected Errors |
---|---|---|---|
unknown | FSSTABLE | FSSTABLE | unknown |
FSACTIVE | FSSTABLE | FSSTABLE | FSACTIVE |
FSSTABLE | FSSTABLE | FSSTABLE | FSACTIVE |
FSCLEAN | FSCLEAN | FSSTABLE | FSACTIVE |
FSBAD | FSSTABLE | FSSTABLE | FSBAD |
FSLOG | FSLOG | FSLOG | FSLOG |
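To make the transitions in Table 31-2 easier to look up, the following sketch encodes the table as a dictionary. The names TRANSITIONS, OUTCOMES, and state_after_fsck are hypothetical. The values are copied directly from the table.

```python
# Table 31-2 as a lookup: initial state -> resulting state for each outcome.
OUTCOMES = ("no_errors", "all_corrected", "uncorrected")

TRANSITIONS = {
    "unknown":  ("FSSTABLE", "FSSTABLE", "unknown"),
    "FSACTIVE": ("FSSTABLE", "FSSTABLE", "FSACTIVE"),
    "FSSTABLE": ("FSSTABLE", "FSSTABLE", "FSACTIVE"),
    "FSCLEAN":  ("FSCLEAN",  "FSSTABLE", "FSACTIVE"),
    "FSBAD":    ("FSSTABLE", "FSSTABLE", "FSBAD"),
    "FSLOG":    ("FSLOG",    "FSLOG",    "FSLOG"),
}


def state_after_fsck(initial: str, outcome: str) -> str:
    """Return the state flag fsck leaves behind, per Table 31-2."""
    return TRANSITIONS[initial][OUTCOMES.index(outcome)]


if __name__ == "__main__":
    print(state_after_fsck("FSCLEAN", "all_corrected"))  # FSSTABLE
```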
This section describes what happens in the normal operation of a file system, what can go wrong, what problems fsck (the checking and repair utility) looks for, and how it corrects the inconsistencies it finds.
Every working day hundreds of files may be created, modified, and removed. Each time a file is modified, the operating system performs a series of file system updates. These updates, when written to the disk reliably, yield a consistent file system.
When a user program does an operation to change the file system, such as a write, the data to be written is first copied into an internal in-core buffer in the kernel. Normally, the disk update is handled asynchronously; the user process is allowed to proceed even though the data write may not happen until long after the write system call has returned. Thus at any given time, the file system, as it resides on the disk, lags behind the state of the file system represented by the in-core information.
The disk information is updated to reflect the in-core information when the buffer is required for another use or when the kernel automatically runs the fsflush daemon (at 30-second intervals). If the system is halted without writing out the in-core information, the file system on the disk will be in an inconsistent state.
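The following toy model shows why the on-disk file system lags behind the in-core state. It is a minimal sketch, not the actual kernel buffer cache. Writes update memory immediately, and only a flush (the role fsflush plays every 30 seconds) copies them to disk. A crash discards whatever has not yet been flushed. The class and method names are hypothetical.

```python
class WriteBackCache:
    """Toy model of in-core buffers in front of a disk (not the real kernel code)."""

    def __init__(self) -> None:
        self.disk: dict[str, str] = {}     # what has actually reached the disk
        self.in_core: dict[str, str] = {}  # modified buffers not yet written

    def write(self, block: str, data: str) -> None:
        # The caller continues at once; the disk is not updated yet.
        self.in_core[block] = data

    def flush(self) -> None:
        # Write all modified buffers to disk, as fsflush does periodically.
        self.disk.update(self.in_core)
        self.in_core.clear()

    def crash(self) -> dict[str, str]:
        # An unclean halt loses the in-core buffers; only the disk survives.
        self.in_core.clear()
        return dict(self.disk)


if __name__ == "__main__":
    cache = WriteBackCache()
    cache.write("A", "inode contents")
    cache.flush()
    cache.write("B", "directory entry for A")
    print(cache.crash())  # {'A': 'inode contents'}
```

Block A reached the disk before the crash, but block B did not. Suppose A is an inode and B is the directory block that names it. The disk then holds an inode with no directory entry, which is one of the inconsistencies fsck resolves by placing the file in lost+found.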
A file system can develop inconsistencies in several ways. The most common causes are operator error and hardware failures.
Problems can result from an unclean halt, which occurs when a system is shut down improperly or when a mounted file system is taken offline improperly. To prevent unclean halts, the current state of the file systems must be written to disk (that is, "synchronized") before halting the CPU, physically taking a disk pack out of a drive, or taking a disk offline.
Inconsistencies can also result from defective hardware. Blocks can become damaged on a disk drive at any time, or a disk controller can stop functioning correctly.
This section describes the consistency checks that fsck applies to these UFS file system components: superblock, cylinder group blocks, inodes, indirect blocks, and data blocks.
The superblock stores summary information, which is the most commonly corrupted item in a UFS file system. Each change to the file system inodes or data blocks also modifies the superblock. If the CPU is halted and the last command is not a sync command, the superblock will almost certainly be corrupted.
The superblock is checked for inconsistencies in:
File system size
Number of inodes
Free-block count
Free-inode count
The file system size must be larger than the number of blocks used by the superblock and the number of blocks used by the list of inodes. The number of inodes must be less than the maximum number allowed for the file system. The file system size and layout information are the most critical pieces of information for fsck. Although there is no way to actually check these sizes, because they are statically determined when the file system is created, fsck can check that the sizes are within reasonable bounds. All other file system checks require that these sizes be correct. If fsck detects corruption in the static parameters of the primary superblock, it requests the operator to specify the location of an alternate superblock.
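The bounds checks described above can be sketched as follows. The Superblock fields are hypothetical simplifications, not the real UFS superblock structure. The sketch also reads "larger than the number of blocks used by the superblock and the number of blocks used by the list of inodes" as "larger than their sum".

```python
from dataclasses import dataclass


@dataclass
class Superblock:
    """Hypothetical subset of superblock fields; names are illustrative."""
    size_blocks: int        # total file system size, in blocks
    superblock_blocks: int  # blocks occupied by the superblock
    inode_list_blocks: int  # blocks occupied by the list of inodes
    ninodes: int            # number of inodes in the file system
    max_inodes: int         # maximum number of inodes allowed


def superblock_sane(sb: Superblock) -> bool:
    """Reasonable-bounds checks on the static superblock parameters."""
    # The file system must be bigger than its own fixed metadata.
    if sb.size_blocks <= sb.superblock_blocks + sb.inode_list_blocks:
        return False
    # The inode count must be less than the maximum allowed.
    if sb.ninodes >= sb.max_inodes:
        return False
    return True
```

If a check like this fails on the primary superblock, fsck asks you for the location of an alternate superblock instead of trusting the damaged values.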
Free blocks are stored in the cylinder group block maps. fsck checks that no block marked as free is claimed by a file. When all the blocks have been accounted for, fsck checks whether the number of free blocks plus the number of blocks claimed by the inodes equals the total number of blocks in the file system. If anything is wrong with the block allocation maps, fsck rebuilds them, leaving out blocks already allocated.
The summary information in the superblock contains a count of the total number of free blocks within the file system. The fsck program compares this count to the number of free blocks it finds within the file system. If the counts do not agree, fsck replaces the count in the superblock with the actual free-block count.
The summary information in the superblock contains a count of the free inodes within the file system. The fsck program compares this count to the number of free inodes it finds within the file system. If the counts do not agree, fsck replaces the count in the superblock with the actual free inode count.
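The following sketch models the free-block checks from the last three paragraphs on a toy file system whose blocks are numbered 0 through total_blocks - 1. The function and field names are hypothetical. Real fsck also accounts for fragments and metadata blocks, which this sketch ignores. The free-inode count is checked the same way, with inodes in place of blocks.

```python
from dataclasses import dataclass


@dataclass
class FreeBlockReport:
    maps_consistent: bool  # the free map matched the blocks claimed by inodes
    free_map: set[int]     # the free map after any rebuild
    free_count: int        # actual number of free blocks
    summary_fixed: bool    # True if the superblock count had to be replaced


def check_free_blocks(total_blocks: int, free_map: set[int],
                      claimed: dict[int, list[int]],
                      superblock_free_count: int) -> FreeBlockReport:
    """Check and repair a toy free-block map and the superblock free count.

    total_blocks: blocks in the toy file system, numbered 0..total_blocks-1
    free_map: block numbers marked free in the cylinder group maps
    claimed: inode number -> block numbers claimed by that inode
    """
    in_use: set[int] = set()
    for blocks in claimed.values():
        in_use.update(blocks)

    # Every block not claimed by an inode should be marked free. For in-range
    # block numbers, this one comparison covers both checks: no claimed block
    # is marked free, and free plus claimed blocks account for every block.
    expected_free = set(range(total_blocks)) - in_use
    maps_ok = free_map == expected_free

    # Rebuild a bad map, leaving out blocks already allocated.
    new_map = set(free_map) if maps_ok else expected_free

    # Replace the summary count if it disagrees with the count actually found.
    actual_free = len(new_map)
    return FreeBlockReport(maps_ok, new_map, actual_free,
                           superblock_free_count != actual_free)
```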
The list of inodes is checked sequentially starting with inode 2 (inode 0 and inode 1 are reserved). Each inode is checked for inconsistencies in:
Format and type
Link count
Duplicate block
Bad block numbers
Inode size
Each inode contains a mode word, which describes the type and state of the inode. Inodes may be one of six types:
Regular
Directory
Block special
Character special
FIFO (named pipe)
Symbolic link
Inodes may be in one of three states:
Allocated
Unallocated
Partially allocated
When the file system is created, a fixed number of inodes are set aside, but they are not allocated until they are needed. An allocated inode is one that points to a file. An unallocated inode does not point to a file and, therefore, should be empty. The partially allocated state means that the inode is incorrectly formatted. An inode can get into this state if, for example, bad data is written into the inode list because of a hardware failure. The only corrective action fsck can take is to clear the inode.
Each inode contains a count of the number of directory entries linked to it. The fsck program verifies the link count of each inode by examining the entire directory structure, starting from the root directory, and calculating an actual link count for each inode.
Discrepancies between the link count stored in the inode and the actual link count as determined by fsck may be of three types:
The stored count is not 0 and the actual count is 0.
This condition can occur if no directory entry exists for the inode. In this case, fsck puts the disconnected file in the lost+found directory.
The stored count is not 0 and the actual count is not 0, but the counts are unequal.
This condition can occur if a directory entry has been added or removed but the inode has not been updated. In this case, fsck replaces the stored link count with the actual link count.
The stored count is 0 and the actual count is not 0.
In this case fsck changes the link count of the inode to the actual count.
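The link-count check can be sketched in two steps. First, walk the directory tree from the root (inode 2 in UFS) and count the entries that name each inode. Then compare each count with the one stored in the inode. The directory representation (a dictionary mapping each directory inode to its {name: inode} entries) and the function names below are hypothetical.

```python
from collections import Counter

ROOT_INO = 2  # UFS root directory; inodes 0 and 1 are reserved


def actual_link_counts(directories: dict[int, dict[str, int]],
                       root: int = ROOT_INO) -> Counter:
    """Count directory entries naming each inode, walking down from the root."""
    counts: Counter = Counter()
    visited: set[int] = set()
    stack = [root]
    while stack:
        d = stack.pop()
        if d in visited:
            continue
        visited.add(d)
        for name, ino in directories[d].items():
            counts[ino] += 1
            # Descend into subdirectories, but not back through "." or "..".
            if ino in directories and name not in (".", ".."):
                stack.append(ino)
    return counts


def link_count_action(stored: int, actual: int) -> str:
    """Map a stored/actual link-count pair to the corrections described above."""
    if stored == actual:
        return "ok"
    if actual == 0:
        return "move to lost+found"  # no directory entry references the inode
    return f"set link count to {actual}"  # counts unequal, or stored count is 0
```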
Each inode contains a list, or pointers to lists (indirect blocks), of all the blocks claimed by the inode. Because indirect blocks are owned by an inode, inconsistencies in indirect blocks directly affect the inode that owns the indirect block.
The fsck program compares each block number claimed by an inode to a list of allocated blocks. If another inode already claims a block number, the block number is put on a list of duplicate blocks. Otherwise, the list of allocated blocks is updated to include the block number.
If there are any duplicate blocks, fsck makes a second pass of the inode list to find the other inode that claims each duplicate block. (A large number of duplicate blocks in an inode may be caused by an indirect block not being written to the file system.) It is not possible to determine with certainty which inode is in error. The fsck program prompts you to choose which inode should be kept and which should be cleared.
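The two passes described above can be sketched like this. The input format, a mapping from inode number to the block numbers that inode claims, is an assumption made for illustration. Like fsck, the sketch does not decide which claimant is wrong. It only reports every inode involved.

```python
def find_duplicate_claims(inode_blocks: dict[int, list[int]]) -> dict[int, list[int]]:
    """Return {duplicate block: [every inode that claims it]}."""
    allocated: set[int] = set()
    duplicates: set[int] = set()
    # Pass 1: the first claim puts a block on the allocated list;
    # any later claim puts it on the duplicate list.
    for blocks in inode_blocks.values():
        for b in blocks:
            if b in allocated:
                duplicates.add(b)
            else:
                allocated.add(b)
    # Pass 2: rescan the inode list to find all claimants of each duplicate,
    # including the inode that claimed it first.
    claimants: dict[int, list[int]] = {b: [] for b in duplicates}
    for ino, blocks in inode_blocks.items():
        for b in blocks:
            if b in claimants and ino not in claimants[b]:
                claimants[b].append(ino)
    return claimants
```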
The fsck program checks each block number claimed by an inode to see that its value is higher than that of the first data block and lower than that of the last data block in the file system. If the block number is outside this range, it is considered a bad block number.
Bad block numbers in an inode may be caused by an indirect block not being written to the file system. The fsck program prompts you to clear the inode.
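A sketch of the range check follows. The parameter names are hypothetical. The sketch counts the first and last data blocks themselves as in range.

```python
def bad_block_numbers(blocks: list[int], first_data_block: int,
                      last_data_block: int) -> list[int]:
    """Return the claimed block numbers that fall outside the data area."""
    return [b for b in blocks if not first_data_block <= b <= last_data_block]


def inodes_with_bad_blocks(inode_blocks: dict[int, list[int]],
                           first_data_block: int, last_data_block: int) -> list[int]:
    """Inodes that claim at least one bad block number, in scan order."""
    return [ino for ino, blocks in inode_blocks.items()
            if bad_block_numbers(blocks, first_data_block, last_data_block)]
```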
Each inode contains a count of the number of data blocks that it references. The number of actual data blocks is the sum of the allocated data blocks and the indirect blocks. fsck computes the number of data blocks and compares that block count against the number of blocks the inode claims. If an inode contains an incorrect count, fsck prompts you to fix it.
Each inode contains a 64-bit size field. This field shows the number of characters (data bytes) in the file associated with the inode. A rough check of the consistency of the size field of an inode is done by using the number of characters shown in the size field to calculate how many blocks should be associated with the inode, and then comparing that to the actual number of blocks claimed by the inode.
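The block-count and size checks reduce to simple arithmetic. In this hypothetical sketch, the size check rounds the size up to whole blocks with integer arithmetic, which stays exact for 64-bit sizes. The sketch deliberately ignores fragments and files with holes, so treat it as the rough check the text describes, not an exact rule.

```python
def blocks_for_size(size_bytes: int, block_size: int) -> int:
    """Blocks needed to hold size_bytes, rounded up (exact integer math)."""
    return -(-size_bytes // block_size)


def block_count_ok(stored_count: int, data_blocks: list[int],
                   indirect_blocks: list[int]) -> bool:
    """The stored count should equal allocated data blocks plus indirect blocks."""
    return stored_count == len(data_blocks) + len(indirect_blocks)


def size_roughly_consistent(size_bytes: int, block_size: int,
                            data_blocks: list[int]) -> bool:
    """Rough check: the size field should imply as many blocks as are claimed.

    Simplification: ignores fragments and files with holes.
    """
    return blocks_for_size(size_bytes, block_size) == len(data_blocks)
```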
Indirect blocks are owned by an inode. Therefore, inconsistencies in an indirect block affect the inode that owns it. Inconsistencies that can be checked are:
Blocks already claimed by another inode
Block numbers outside the range of the file system
The consistency checks are also performed for indirect blocks.
An inode can directly or indirectly reference three kinds of data blocks. All referenced blocks must be of the same kind. The three types of data blocks are:
Plain data blocks
Symbolic-link data blocks
Directory data blocks
Plain data blocks contain the information stored in a file. Symbolic-link data blocks contain the path name stored in a symbolic link. Directory data blocks contain directory entries. fsck can check the validity only of directory data blocks.
Directories are distinguished from regular files by an entry in the mode field of the inode. Data blocks associated with a directory contain the directory entries. Directory data blocks are checked for inconsistencies involving:
Directory inode numbers pointing to unallocated inodes
Directory inode numbers greater than the number of inodes in the file system
Incorrect directory inode numbers for "." and ".." directories
Directories disconnected from the file system
If the inode number in a directory data block points to an unallocated inode, fsck removes the directory entry. This condition can occur if the data blocks containing the directory entries are modified and written out but the inode is not, for example, because the CPU is halted without warning.
If a directory entry inode number points beyond the end of the inode list, fsck removes the directory entry. This condition can occur when bad data is written into a directory data block.
The directory inode number entry for "." must be the first entry in the directory data block. It must reference itself; that is, its value must be equal to the inode number for the directory data block.
The directory inode number entry for ".." must be the second entry in the directory data block. Its value must be equal to the inode number of the parent directory (or the inode number of itself if the directory is the root directory).
If the directory inode numbers for "." and ".." are incorrect, fsck replaces them with the correct values. If there are multiple hard links to a directory, the first one found is considered the real parent to which ".." should point. In this case, fsck recommends that you let it delete the other names.
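The "." and ".." rules can be sketched as follows, assuming a directory's entries are a list of [name, inode] pairs in on-disk order. The names are hypothetical. Unlike fsck, this sketch only reports a misplaced "." or ".." name and does not repair it.

```python
ROOT_INO = 2  # inode number of the UFS root directory


def fix_dot_entries(dir_ino: int, parent_ino: int, entries: list[list]) -> list[tuple]:
    """Check and fix the "." and ".." entries of one directory.

    entries: [name, inode] pairs in on-disk order; corrected in place.
    Returns (name, old inode, new inode) for each entry that was changed.
    """
    if dir_ino == ROOT_INO:
        parent_ino = ROOT_INO  # ".." of the root directory refers to itself
    expected = ((".", dir_ino), ("..", parent_ino))
    fixes = []
    for slot, (want_name, want_ino) in enumerate(expected):
        name, ino = entries[slot]
        if name != want_name:
            # Not repaired here; this sketch only reports a misplaced name.
            raise ValueError(f"entry {slot} is {name!r}, expected {want_name!r}")
        if ino != want_ino:
            entries[slot][1] = want_ino
            fixes.append((want_name, ino, want_ino))
    return fixes
```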
The fsck program checks the general connectivity of the file system. If a directory is found that is not linked to the file system, fsck links the directory to the lost+found directory of the file system. (This condition can occur when inodes are written to the file system but the corresponding directory data blocks are not.)
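A connectivity check is a reachability search from the root directory. In the following sketch, directories are modeled as a dictionary that maps each directory inode to its {name: inode} entries, a hypothetical representation. The sketch links each unreachable directory into lost+found under its inode number. The exact name format that fsck uses in lost+found is an assumption here. Linking one orphan can make its subdirectories reachable, so the sketch recomputes reachability after each link.

```python
ROOT_INO = 2  # inode number of the UFS root directory


def reachable_dirs(directories: dict[int, dict[str, int]],
                   root: int = ROOT_INO) -> set[int]:
    """Return the directory inodes reachable from root, ignoring "." and ".."."""
    seen: set[int] = set()
    stack = [root]
    while stack:
        d = stack.pop()
        if d in seen:
            continue
        seen.add(d)
        for name, ino in directories[d].items():
            if name not in (".", "..") and ino in directories:
                stack.append(ino)
    return seen


def reconnect_orphans(directories: dict[int, dict[str, int]],
                      lost_found_ino: int, root: int = ROOT_INO) -> list[int]:
    """Link unreachable directories into lost+found; return the inodes linked."""
    if lost_found_ino not in reachable_dirs(directories, root):
        raise ValueError("lost+found must itself be reachable from the root")
    linked = []
    while True:
        orphans = sorted(set(directories) - reachable_dirs(directories, root))
        if not orphans:
            return linked
        ino = orphans[0]
        # Name the entry after the inode number (exact format is an assumption).
        directories[lost_found_ino][str(ino)] = ino
        linked.append(ino)
```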
Data blocks associated with a regular file hold the contents of the file. fsck does not attempt to check the validity of the contents of a regular file's data blocks.
During boot up, a preliminary check on each file system to be mounted from a hard disk is run using the boot script /sbin/rcS, which checks the root (/) and /usr file systems. The other rc shell scripts then use the fsck command to check each additional file system sequentially. They do not check file systems in parallel. File systems are checked sequentially during booting even if the fsck pass numbers are greater than one.
When you run the commands for checking and mounting file systems without specifying a file system directly, the commands step through the file system table (/etc/vfstab) using the information specified in the various fields. The fsck pass field specifies information for file system checking. The mount at boot field specifies information for mounting the file system at boot time.
When you create new file systems, add entries to /etc/vfstab indicating whether they are to be checked and mounted at boot time. See Chapter 28, Mounting and Unmounting File Systems (Tasks) for more information about adding entries to the /etc/vfstab file.
Information in the /etc/vfstab file is specific to the slices and file systems on each system. Here is an example of an /etc/vfstab file:
$ more /etc/vfstab
#device                 device              mount     FS      fsck    mount    mount
#to mount               to fsck             point     type    pass    at boot  options
#/dev/dsk/c1d0s2        /dev/rdsk/c1d0s2    /usr      ufs     1       yes      -
/proc                   -                   /proc     proc    -       no       -
fd                      -                   /dev/fd   fd      -       no       -
swap                    -                   /tmp      tmpfs   -       yes      -
/dev/dsk/c0t0d0s0       /dev/rdsk/c0t0d0s0  /         ufs     1       no       -
/dev/dsk/c0t0d0s1       -                   -         swap    -       no       -
/dev/dsk/c0t0d0s6       /dev/rdsk/c0t0d0s6  /usr      ufs     2       no       -
/dev/dsk/c0t0d0s7       /dev/rdsk/c0t0d0s7  /opt      ufs     3       yes      -
pluto:/export/svr4/man  -                   /usr/man  nfs     no      yes      -
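The following sketch illustrates how the generic fsck picks file systems from this file. It parses vfstab text and keeps the UFS entries whose fsck pass is a number greater than 0, so a hyphen or a 0 excludes an entry. The function name is hypothetical. The sketch considers only UFS entries and ignores the mount-at-boot and mount-options fields.

```python
def ufs_to_check(vfstab_text: str) -> list[tuple[str, str, int]]:
    """Return (mount point, device to fsck, fsck pass) for each UFS entry
    whose fsck pass is a number greater than 0."""
    selected = []
    for line in vfstab_text.splitlines():
        fields = line.split()
        # Skip blank lines, comments, and lines that are not 7-field entries.
        if len(fields) != 7 or fields[0].startswith("#"):
            continue
        _special, fsck_dev, mount_point, fs_type, fsck_pass, _at_boot, _opts = fields
        # A "-" (or anything non-numeric) or a 0 in the pass field excludes it.
        if fs_type != "ufs" or not fsck_pass.isdigit() or int(fsck_pass) == 0:
            continue
        selected.append((mount_point, fsck_dev, int(fsck_pass)))
    return selected
```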
Table 31-3 describes the function of the fsck pass field.
Table 31-3 The fsck pass Field
If the fsck pass Field is Set To ... | Then ... | Comments |
---|---|---|
- (hyphen) | The generic fsck command will not check the file system regardless of the state of the file system. | Use a hyphen for read-only file systems, remote file systems, or pseudo file systems, such as /proc, to which checking does not apply. |
0 or greater | The file system-specific fsck command is called. | When the value is 0 for UFS file systems, the file system is not checked. |
1 or greater and fsck -o p is used | The file system-specific fsck automatically checks UFS file systems in parallel. | The value can be any number greater than 1. |
In preen mode, fsck allows only one active file system check per disk, starting a new check only after the previous one is completed. fsck automatically uses the major and minor numbers of the devices on which the file systems reside to determine how to check file systems on different disks at the same time.
When the fsck pass number is 1, file systems are checked sequentially, in the order they appear in the /etc/vfstab file. Usually, the root (/) file system has the fsck pass set to 1.
fsck does not use the fsck pass number to determine the sequence of file system checking.
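The one-check-per-disk rule for preen mode can be sketched as rounds of checks. This is a simplification. Real fsck identifies disks by device major and minor numbers and starts the next check on a disk as soon as the previous one finishes. This hypothetical sketch instead guesses the disk by stripping the slice suffix from the device name, and it proceeds in lockstep rounds.

```python
import re


def disk_of(raw_device: str) -> str:
    """Hypothetical stand-in for the major/minor lookup fsck really uses:
    strip the slice suffix, so /dev/rdsk/c0t0d0s6 -> /dev/rdsk/c0t0d0."""
    return re.sub(r"s\d+$", "", raw_device)


def preen_rounds(raw_devices: list[str]) -> list[list[str]]:
    """Group checks into rounds with at most one file system per disk."""
    queues: dict[str, list[str]] = {}
    for dev in raw_devices:
        queues.setdefault(disk_of(dev), []).append(dev)
    rounds = []
    while any(queues.values()):
        # Take the next pending file system from every disk that has one.
        rounds.append([q.pop(0) for q in queues.values() if q])
    return rounds
```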
Edit /etc/vfstab entries in the fsck pass field, and save the changes.
The next time the system is booted, the new values are used.
You may need to interactively check file systems:
When they cannot be mounted
When they develop problems while in use
When an in-use file system develops inconsistencies, error messages may be displayed in the console window or the system may crash.
Before using fsck, you may want to refer to "Syntax and Options for the fsck Command" and "Troubleshooting File System Problems" in System Administration Guide, Volume II for more information.
Check the file system.
# fsck -m /dev/rdsk/device-name
In this command, the state flag in the superblock of the file system you specify is checked to see whether the file system is clean or requires checking.
If you omit the device argument, all the UFS file systems listed in /etc/vfstab with an fsck pass value greater than 0 are checked.
The following example shows that the file system needs checking.
# fsck -m /dev/rdsk/c0t0d0s6
** /dev/rdsk/c0t0d0s6
ufs fsck: sanity check: /dev/rdsk/c0t0d0s6 needs checking
Become superuser.
Unmount the local file systems except root (/) and /usr.
# umountall -l
Check the file system.
# fsck
All file systems in the /etc/vfstab file with entries in the fsck pass field greater than zero are checked. You can also specify the mount point directory or /dev/rdsk/device-name as arguments to fsck. Any inconsistency messages are displayed. See "Troubleshooting File System Problems" in System Administration Guide, Volume II for information about how to respond to the error message prompts to interactively check one or more UFS file systems.
If you corrected any errors, type fsck and press Return.
fsck may not be able to fix all errors in one execution. If you see the message FILE SYSTEM STATE NOT SET TO OKAY, run the command again. If that does not work, see "How to Fix a UFS File System fsck Cannot Repair".
Rename and move any files put in the lost+found directory.
Individual files put in the lost+found directory by fsck are renamed with their inode numbers. If possible, rename the files and move them where they belong. You may be able to use the grep command to match phrases with individual files and the file command to identify file types. When whole directories are dumped into lost+found, it is easier to figure out where they belong and move them back.
The following example checks /dev/rdsk/c0t0d0s6 and corrects the incorrect block count.
# fsck /dev/rdsk/c0t0d0s6
checkfilesys: /dev/rdsk/c0t0d0s6
** Phase 1 - Check Block and Sizes
INCORRECT BLOCK COUNT I=2529 (6 should be 2)
CORRECT? y
** Phase 2 - Check Pathnames
** Phase 3 - Check Connectivity
** Phase 4 - Check Reference Counts
** Phase 5 - Cylinder Groups
929 files, 8928 used, 2851 free (75 frags, 347 blocks, 0.6% fragmentation)
/dev/rdsk/c0t0d0s6 FILE SYSTEM STATE SET TO OKAY
***** FILE SYSTEM WAS MODIFIED *****
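You can also check the summary line at the end of an fsck run arithmetically. In both examples in this chapter, the free count equals the loose free fragments plus 8 fragments for each free block (75 + 347 × 8 = 2851 and 16 + 9462 × 8 = 75712). This suggests 8 fragments per block on these file systems. The parser below assumes that ratio by default. It is a sketch, and its regular expression matches only the summary format shown in these examples.

```python
import re

SUMMARY_RE = re.compile(
    r"(\d+) files, (\d+) used, (\d+) free "
    r"\((\d+) frags, (\d+) blocks, ([\d.]+)% fragmentation\)"
)


def parse_fsck_summary(line: str, frags_per_block: int = 8) -> dict:
    """Parse an fsck summary line and check that the free count adds up."""
    m = SUMMARY_RE.search(line)
    if m is None:
        raise ValueError(f"not an fsck summary line: {line!r}")
    files, used, free, frags, blocks = (int(g) for g in m.groups()[:5])
    return {
        "files": files,
        "used": used,
        "free": free,
        "free_frags": frags,
        "free_blocks": blocks,
        "fragmentation_pct": float(m.group(6)),
        # "free" counts fragments: loose free fragments plus whole free blocks.
        "free_adds_up": free == frags + blocks * frags_per_block,
    }
```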
The preen option to fsck (fsck -o p) checks UFS file systems and automatically fixes the simple problems that normally result from an unexpected system halt. It exits immediately if it encounters a problem that requires operator intervention. The preen option also permits parallel checking of file systems.
You can run fsck with the -o p option to preen the file systems after an unclean halt. In this mode, fsck does not look at the clean flag and does a full check. These actions are a subset of the actions that fsck takes when it runs interactively.
Unmount the file system.
# umount mount-point
Check a UFS file system with the preen option.
# fsck -o p /dev/rdsk/device-name
You can preen individual file systems by using mount-point or /dev/rdsk/device-name as arguments to fsck.
The following example preens the /usr file system.
# fsck -o p /usr
When the superblock of a file system becomes damaged, you must restore it. fsck tells you when a superblock is bad. Fortunately, redundant copies of the superblock are stored within a file system. You can use fsck -o b to replace the superblock with one of the copies.
Become superuser.
Change to a directory outside the damaged file system.
Unmount the file system.
# umount mount-point
Be sure to use the -N option with newfs in the next step. If you omit the -N option, you will create a new, empty file system.
Display the superblock values with the newfs -N command.
# newfs -N /dev/rdsk/device-name
The output of this command displays the block numbers that were used for the superblock copies when newfs created the file system.
Provide an alternative superblock with the fsck command.
# fsck -F ufs -o b=block-number /dev/rdsk/device-name
fsck uses the alternative superblock you specify to restore the primary superblock. You can always try 32 as an alternative block, or use any of the alternative blocks shown by newfs -N.
The following example restores the superblock copy 5264 for the /files7 file system:
# cd /
# umount /files7
# newfs -N /dev/rdsk/c0t3d0s7
/dev/rdsk/c0t3d0s7: 163944 sectors in 506 cylinders of 9 tracks, 36 sectors
        83.9MB in 32 cyl groups (16 c/g, 2.65MB/g, 1216 i/g)
super-block backups (for fsck -b #) at:
 32, 5264, 10496, 15728, 20960, 26192, 31424, 36656, 41888,
 47120, 52352, 57584, 62816, 68048, 73280, 78512, 82976, 88208,
 93440, 98672, 103904, 109136, 114368, 119600, 124832, 130064, 135296,
 140528, 145760, 150992, 156224, 161456,
# fsck -F ufs -o b=5264 /dev/rdsk/c0t3d0s7
Alternate superblock location: 5264.
** /dev/rdsk/c0t3d0s7
** Last Mounted on
** Phase 1 - Check Blocks and Sizes
** Phase 2 - Check Pathnames
** Phase 3 - Check Connectivity
** Phase 4 - Check Reference Counts
** Phase 5 - Check Cyl groups
36 files, 867 used, 75712 free (16 frags, 9462 blocks, 0.0% fragmentation)
/dev/rdsk/c0t3d0s7 FILE SYSTEM STATE SET TO OKAY
***** FILE SYSTEM WAS MODIFIED *****
#
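You may want the backup locations programmatically, for example to try several in turn. The following sketch extracts them from newfs -N output and builds the fsck -o b= command used in this procedure, without running it. The function names are hypothetical. The parser assumes the backup list is printed on the lines that follow the "super-block backups" header, as in this example.

```python
import re


def backup_superblocks(newfs_output: str) -> list[int]:
    """Return the superblock backup locations listed by `newfs -N`.

    Reads the comma-separated numbers on the lines that follow the
    "super-block backups" header and stops at the first other line.
    """
    locations: list[int] = []
    in_list = False
    for line in newfs_output.splitlines():
        if "super-block backups" in line:
            in_list = True
            continue
        if not in_list:
            continue
        if not re.fullmatch(r"[\d,\s]+", line):
            break  # end of the backup list
        locations.extend(int(n) for n in re.findall(r"\d+", line))
    return locations


def alternate_superblock_cmd(raw_device: str, block: int) -> list[str]:
    """Build (but do not run) the fsck command that uses an alternate superblock."""
    return ["fsck", "-F", "ufs", "-o", f"b={block}", raw_device]
```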
If the superblock in the root (/) file system becomes damaged and you cannot boot the system, reinstall /kernel/unix and rebuild the root (/) file system with newfs. Because a superblock is created by the newfs command, you do not need to restore it.
Sometimes you need to run fsck a few times to fix a file system because problems corrected on one pass may uncover other problems not found in earlier passes. fsck does not keep running until it comes up clean, so you must rerun it manually.
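If you script the reruns, the following sketch runs fsck repeatedly until a pass reports neither FILE SYSTEM WAS MODIFIED nor FILE SYSTEM STATE NOT SET TO OKAY. It uses fsck -y, which answers yes to every repair prompt, so it gives up the control that an interactive check provides. The function names and the max_passes limit are assumptions. The runner parameter lets you substitute a fake fsck for testing.

```python
import subprocess
from typing import Callable, Optional


def run_fsck_once(raw_device: str) -> str:
    """Run `fsck -y` once (unmount the file system first); return its output."""
    proc = subprocess.run(["fsck", "-y", raw_device], capture_output=True, text=True)
    return proc.stdout + proc.stderr


def pass_was_clean(output: str) -> bool:
    # A pass that changed something, or left the state not OKAY, needs a rerun.
    return "FILE SYSTEM WAS MODIFIED" not in output and "NOT SET TO OKAY" not in output


def fsck_until_clean(raw_device: str, max_passes: int = 5,
                     runner: Callable[[str], str] = run_fsck_once) -> Optional[int]:
    """Rerun fsck until a pass is clean; return the number of passes, or None."""
    for n in range(1, max_passes + 1):
        if pass_was_clean(runner(raw_device)):
            return n
    return None
```

A return value of None means the file system still was not clean after max_passes runs. In that case, fall back to the manual steps that follow.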
Pay attention to the information displayed by fsck. It may help you fix the problem. For example, the messages may point to a bad directory. If you delete the directory, you may find that fsck runs cleanly.
If fsck still cannot repair the file system, you can try to use the fsdb, ff, clri, and ncheck commands to figure out and fix what is wrong. See fsdb(1M), ff(1M), clri(1M), and ncheck(1M) for information about how to use these commands. You may, ultimately, need to re-create the file system and restore its contents from backup media. See Chapter 35, Restoring Files and File Systems (Tasks) for information about restoring complete file systems.
If you cannot fully repair a file system but you can mount it read-only, try using cp, tar, or cpio to retrieve all or part of the data from the file system.
If hardware disk errors are causing the problem, you may need to reformat and divide the disk into slices again before re-creating and restoring file systems. Hardware errors usually display the same error again and again across different commands. The format command tries to work around bad blocks on the disk. If the disk is too severely damaged, however, the problems may persist, even after reformatting. See format(1M) for information about using the format command. See Chapter 23, SPARC: Adding a Disk (Tasks) or Chapter 24, x86: Adding a Disk (Tasks) for information about installing a new disk.
The fsck command checks and repairs inconsistencies in file systems. It can run in four modes:
Checks only whether a file system can be mounted (fsck -m)
Interactively asks for confirmation before making repairs (fsck)
Assumes yes or no response for all repairs (fsck -y)
Noninteractively preens the file system, fixing all expected (innocuous) inconsistencies, but exiting when a serious problem is encountered (fsck -o p)
The fsck command has two components: a generic component and a component specific to each type of file system. The generic commands apply to most types of file systems, while the specific commands apply to only one type of file system. You should always use the generic command, which calls the file system-specific command, as needed.
Usually, you must be superuser to run fsck. You can run the fsck command without being superuser, but you must have read permission for the raw device file for the slice (a potential security hole), and you should unmount the file system before making repairs.
The generic fsck command goes through /etc/vfstab to see what file systems to check. It runs the appropriate file system-specific fsck command on each file system listed, except those excluded by an fsck pass number of - or 0 (UFS only).
The generic fsck command has the following syntax:
/usr/sbin/fsck [-F type] [-V] [-m] [special]
/usr/sbin/fsck [-F type] [-V] -[y|Y]|[n|N] [-o specific-options] [special]
Table 31-4 describes the options and arguments to the generic fsck command.
Table 31-4 The fsck Command Options and Arguments