Chapter 4. Filesystem Maintenance

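This chapter provides information on the following topics:

  • Filesystem Reorganization

  • Filesystem Corruption

  • Checking XFS Filesystem Consistency With xfs_check and xfs_repair

  • Repairing XFS Filesystem Problems

  • Mounting A Filesystem Without Log Recovery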

Filesystem Reorganization

Filesystems can become fragmented over time. In a fragmented filesystem, free space is divided into small regions and files are stored in many extents. The xfs_fsr command reorganizes a filesystem to improve the layout of file extents. This generally improves performance, because a file stored in fewer, more contiguous extents needs fewer disk operations to read and write. See the xfs_fsr reference page for details on the xfs_fsr command.
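A minimal sketch of calling xfs_fsr from a script follows. The function names reorganize_fs and run and the DRY_RUN variable are hypothetical, added so the commands can be previewed before they run. The -v (verbose) and -t seconds (time limit when reorganizing all mounted XFS filesystems) options are assumed from common xfs_fsr implementations; check the xfs_fsr reference page on your system before relying on them.

```sh
#!/bin/sh
# Hypothetical wrapper around xfs_fsr. Set DRY_RUN=1 to print commands
# instead of running them.
# ASSUMPTION: xfs_fsr accepts -v (verbose) and -t seconds (time limit used
# when it reorganizes all mounted XFS filesystems); check the reference page.

run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"        # preview only
    else
        "$@"
    fi
}

# reorganize_fs [device-or-file [seconds]]
reorganize_fs() {
    if [ -n "$1" ]; then
        # Reorganize one filesystem (device file) or one file.
        run xfs_fsr -v "$1"
    else
        # Reorganize all mounted XFS filesystems within a time limit;
        # 3600 seconds is an arbitrary example default.
        run xfs_fsr -v -t "${2:-3600}"
    fi
}
```

With DRY_RUN=1, calling reorganize_fs with no arguments prints the command for a one-hour pass over all mounted XFS filesystems without running it.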

Filesystem Corruption

Most often, a filesystem is corrupted because the system experienced a panic or otherwise stopped abruptly. The cause can be a system software failure, a hardware failure, or human error (for example, pulling the plug). Overlapping partitions are another possible source of filesystem corruption.

There is no foolproof way to predict hardware failure. The best way to reduce the risk of hardware failure is to conscientiously follow the recommended diagnostic and maintenance procedures.

Human error is probably the greatest single cause of filesystem corruption. To avoid problems, follow these rules closely:

  • Always shut down the system properly. Do not simply turn off power to the system. Use a standard system shutdown tool, such as the shutdown command.

  • Never physically remove a disk that contains a filesystem without first turning off power.

  • Never physically write-protect a mounted filesystem, unless it is mounted read-only.

  • Do not mount filesystems on dual-hosted disks on two systems simultaneously.

The best way to insure against data loss is to make regular, careful backups.

In some cases, XFS filesystem corruption, even on the root filesystem, can be repaired with the xfs_repair command. For more information about xfs_repair, see “Checking XFS Filesystem Consistency With xfs_check and xfs_repair”.

Checking XFS Filesystem Consistency With xfs_check and xfs_repair

You can check XFS filesystem consistency with the xfs_check command or with the dry-run mode (the -n option) of the xfs_repair command. Run without -n, xfs_repair is sometimes able to repair the filesystem inconsistencies it finds.


Note: If you suspect problems with the root filesystem, run xfs_repair from a different root disk, such as an alternate root disk, because the filesystem being checked must not be mounted.


Checking Filesystem Consistency

The filesystem consistency checking commands for XFS filesystems are xfs_check and xfs_repair -n. Unlike fsck, neither xfs_check nor xfs_repair is invoked automatically at system startup. Use them only if you suspect a filesystem consistency problem.

Before you run xfs_check or xfs_repair -n, the filesystem to be checked must have been unmounted cleanly using normal system administration procedures (the umount command or system shutdown), not as a result of a crash or system reset. If the filesystem was not unmounted cleanly, mount it and then unmount it cleanly before running xfs_check or xfs_repair -n. Mounting the filesystem runs log recovery, which brings the on-disk metadata up to date before the check.
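The mount-then-unmount procedure can be sketched as a small shell function. It is an illustration under assumptions, not a supplied tool: the function name clean_then_check, the run helper, and DRY_RUN are hypothetical, the mount point is one you choose, and it assumes mount can detect the XFS filesystem type without an explicit type option.

```sh
#!/bin/sh
# Hypothetical helper: replay the log of an uncleanly unmounted XFS filesystem
# by mounting it, unmount it cleanly, then run a no-modify consistency check.
# Set DRY_RUN=1 to print the commands instead of running them.

run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

clean_then_check() {
    dev=$1   # device file of the XFS filesystem (must not be in use)
    mnt=$2   # temporary mount point chosen by the administrator
    run mount "$dev" "$mnt" || return 1   # mounting runs log recovery
    run umount "$mnt" || return 1         # clean unmount
    run xfs_repair -n "$dev"              # check only; makes no changes
}
```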

The xfs_repair -n command checks XFS filesystem consistency and performs a more complete check than xfs_check. The command line for xfs_repair -n is:

# xfs_repair -n device

device is the device file for a disk partition or logical volume that contains an XFS filesystem, for example:

# xfs_repair -n /dev/xscsi/pci02.02.0-1/target3/lun0/part1

The following example shows output with no consistency problems found:

Phase 1 - find and verify superblock...
Phase 2 - using internal log
        - scan filesystem freespace and inode maps...
        - found root inode chunk
Phase 3 - for each AG...
        - scan (but don't clear) agi unlinked lists...
        - process known inodes and perform inode discovery...
        - agno = 0
        - agno = 1
        ...
        - process newly discovered inodes...
Phase 4 - check for duplicate blocks...
        - setting up duplicate extent list...
        - check for inodes claiming duplicate blocks...
        - agno = 0
        - agno = 1
        ...
No modify flag set, skipping phase 5
Phase 6 - check inode connectivity...
        - traversing filesystem starting at / ... 
        - traversal finished ... 
        - traversing all unattached subtrees ... 
        - traversals finished ... 
        - moving disconnected inodes to lost+found ... 
Phase 7 - verify link counts...
No modify flag set, skipping filesystem flush and exiting.
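When you check filesystems from a script, a pass/fail result can be more convenient than reading the phase output. The sketch below assumes that xfs_repair -n exits with status 0 when it finds no problems and with a nonzero status otherwise; that behavior is documented for the Linux xfsprogs version of xfs_repair, so confirm it on your system. The function name check_fs is hypothetical.

```sh
#!/bin/sh
# Hypothetical wrapper: run xfs_repair in no-modify mode and summarize.
# ASSUMPTION: xfs_repair -n exits 0 when no problems are found and nonzero
# otherwise; verify this against your xfs_repair reference page.

check_fs() {
    xfs_repair -n "$1"          # phase output is passed through unchanged
    status=$?
    if [ "$status" -eq 0 ]; then
        echo "$1: no consistency problems found"
    else
        echo "$1: problems reported (exit status $status)"
    fi
    return "$status"
}
```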

The xfs_check command also checks XFS filesystem consistency. Use it for filesystems with Extended Attributes (see the attr(1) reference page), because xfs_repair performs only limited checking of Extended Attributes. The command line for xfs_check is:

# xfs_check device

If no consistency problems were found, xfs_check returns without displaying any messages.
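Because xfs_check is silent on a consistent filesystem, a script can treat any output as a sign of trouble. A minimal sketch, using the hypothetical function name xfs_check_clean:

```sh
#!/bin/sh
# Hypothetical helper: xfs_check prints nothing when it finds no problems,
# so any output at all is treated as a reported inconsistency.

xfs_check_clean() {
    out=$(xfs_check "$1" 2>&1)
    if [ -z "$out" ]; then
        echo "$1: clean"
        return 0
    fi
    printf '%s: xfs_check reported:\n%s\n' "$1" "$out"
    return 1
}
```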

Repairing Inconsistent Filesystems

xfs_repair (without the -n option) checks XFS filesystem consistency and, if problems are detected, corrects them if possible. The filesystem to be checked and repaired must have been unmounted cleanly using normal system administration procedures (the umount command or system shutdown), not as a result of a crash or system reset. If the filesystem has not been unmounted cleanly, mount it and unmount it cleanly before running xfs_repair.

To have xfs_repair repair any inconsistencies it finds, use this command line:

# xfs_repair device

device is the device file for a disk partition or logical volume that contains an XFS filesystem. It must not be mounted.
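Since xfs_repair must not be run on a mounted filesystem, a script can refuse to proceed when the device appears in the mount table. This sketch assumes that mount with no arguments prints one line per mounted filesystem with the device file in the first field ("device on directory ..."), as IRIX and Linux do; repair_if_unmounted is a hypothetical name.

```sh
#!/bin/sh
# Hypothetical guard: refuse to run xfs_repair on a device that appears as
# the first field of any line printed by mount(1).
# ASSUMPTION: mount output looks like "device on dir type ...".

repair_if_unmounted() {
    dev=$1
    # awk compares the whole first field, so similar names do not match.
    if mount | awk -v d="$dev" '$1 == d { found = 1 } END { exit !found }'; then
        echo "$dev is mounted; unmount it before running xfs_repair" >&2
        return 1
    fi
    xfs_repair "$dev"
}
```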

The following is an example of the output you see from running xfs_repair on a clean filesystem:

# xfs_repair /dev/xscsi/pci02.02.0-1/target3/lun0/part1
Phase 1 - find and verify superblock...
Phase 2 - using internal log
        - zero log...
        - scan filesystem freespace and inode maps...
        - found root inode chunk
Phase 3 - for each AG...
        - scan and clear agi unlinked lists...
        - process known inodes and perform inode discovery...
        - agno = 0
        - agno = 1
        ...
        - process newly discovered inodes...
Phase 4 - check for duplicate blocks...
        - setting up duplicate extent list...
        - clear lost+found (if it exists) ...
        - check for inodes claiming duplicate blocks...
        - agno = 0
        - agno = 1
        ...
Phase 5 - rebuild AG headers and trees...
        - reset superblock...
Phase 6 - check inode connectivity...
        - resetting contents of realtime bitmap and summary inodes
        - ensuring existence of lost+found directory
        - traversing filesystem starting at / ... 
        - traversal finished ... 
        - traversing all unattached subtrees ... 
        - traversals finished ... 
        - moving disconnected inodes to lost+found ... 
Phase 7 - verify and correct link counts...
done

For information about using xfs_repair on an inconsistent filesystem, see “Repairing XFS Filesystem Problems”.

Repairing XFS Filesystem Problems

The xfs_repair command checks XFS filesystem consistency and sometimes repairs problems that are found. This section describes the messages that you may see from xfs_repair and what to do if xfs_repair is not able to repair a filesystem.

Common Error Messages

Some common error messages from xfs_repair and the repairs that it performs are the following:

disconnected inode 242002, moving to lost+found 

xfs_repair found an inode that is in use but is not connected to the filesystem, and moved it to the filesystem's lost+found directory. Its name in lost+found is its inode number, 242002 in this example. If the disconnected inode is a directory, its subtree is preserved: all of its child inodes move with it, so the entire directory subtree appears under lost+found.

imap claims in-use inode 2444941 is free, correcting imap 

The inode allocation map in the filesystem marks inode 2444941 as free, but the inode itself appears to be in use. xfs_repair corrects the inode map to mark the inode as in use.

entry references free inode 2444940 in shortform directory 2444922 junking entry “fb” in directory inode 2444922  

A directory entry points to an inode that xfs_repair has determined is actually free, so xfs_repair removes (junks) the directory entry. A shortform directory is a small directory whose entries are stored within the directory inode itself. In larger directories, deleting the entry is usually a two-pass process; in that case, the second part of the message reads something like marking bad entry, marking entry to be deleted, or will clear entry.

resetting inode 241996 nlinks from 5 to 3  

xfs_repair detected a mismatch between the number of directory entries that point to the inode (its links) and the link count recorded in the inode, and corrected the recorded count (from 5 to 3 in this case).

cleared inode 2444926  

There was something wrong with the inode that was not correctable, so xfs_repair turned it into a zero-length free inode. This usually happens because the inode claims blocks that are used by something else or the inode itself is badly corrupted. Typically, the cleared inode message is preceded by one or more messages indicating why the inode needs to be cleared.
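If you save xfs_repair output to a file, a short awk script can count the message types described above. The patterns match the message wording shown in this section and are assumptions about your xfs_repair version; the function name summarize_repair_log is hypothetical.

```sh
#!/bin/sh
# Hypothetical summary of saved xfs_repair output, read on standard input.
# Counts the message types described above by matching their wording.

summarize_repair_log() {
    awk '
        /^disconnected inode /        { disconnected++ }
        /^imap claims in-use inode /  { imap++ }
        /junking entry /              { junked++ }
        /^resetting inode .* nlinks / { nlinks++ }
        /^cleared inode /             { cleared++ }
        END {
            # Counters never incremented print as 0.
            printf "disconnected=%d imap=%d junked=%d nlinks=%d cleared=%d\n",
                disconnected, imap, junked, nlinks, cleared
        }'
}
```

For example, capture a run with xfs_repair device 2>&1 | tee repair.log, then run summarize_repair_log < repair.log.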

Error Messages When Files Are in lost+found

If xfs_repair has put files and directories in a filesystem's lost+found directory and you do not remove them, the next time you run xfs_repair it temporarily disconnects the inodes for those files and directories. They are reconnected before xfs_repair terminates. As a result of the disconnected inodes in lost+found, you see output like this:

Phase 1 - find and verify superblock...
Phase 2 - zero log...
        - scan filesystem freespace and inode maps...
        - found root inode chunk
Phase 3 - for each AG...
        - scan and clear agi unlinked lists...
        - process known inodes and perform inode discovery...
        - agno = 0
        - agno = 1
        ...
        - process newly discovered inodes...
Phase 4 - check for duplicate blocks...
        - setting up duplicate extent list...
        - clear lost+found (if it exists) ...
        - clearing existing “lost+found” inode
        - deleting existing “lost+found” entry
        - check for inodes claiming duplicate blocks...
        - agno = 0
imap claims in-use inode 242000 is free, correcting imap
        - agno = 1
        - agno = 2
        ...
Phase 5 - rebuild AG headers and trees...
        - reset superblock counters...
Phase 6 - check inode connectivity...
        - ensuring existence of lost+found directory
        - traversing filesystem starting at / ... 
        - traversal finished ... 
        - traversing all unattached subtrees ... 
        - traversals finished ... 
        - moving disconnected inodes to lost+found ... 
disconnected inode 242000, moving to lost+found	
Phase 7 - verify and correct link counts...
done

In this example, inode 242000 had been moved to lost+found during a previous xfs_repair run, and this run of xfs_repair found that the filesystem is consistent. If the lost+found directory had been empty, phase 4 would have shown only the messages about clearing and deleting the lost+found directory. The left-justified imap claims and disconnected inode messages appear, one pair per inode, when there are inodes in the lost+found directory.
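To keep old entries from being reported again, you can move the contents of lost+found elsewhere once you have examined them. The sketch below assumes the filesystem is mounted at the directory you pass as the first argument and that the holding directory (second argument) is one you choose; the function name empty_lost_found is hypothetical.

```sh
#!/bin/sh
# Hypothetical helper: move every entry out of a mounted filesystem's
# lost+found directory into an administrator-chosen holding directory.
# Entries keep their names, which are inode numbers assigned by xfs_repair.

empty_lost_found() {
    lf="$1/lost+found"
    dest=$2
    mkdir -p "$dest" || return 1
    for entry in "$lf"/*; do
        [ -e "$entry" ] || continue      # glob matched nothing: empty dir
        mv "$entry" "$dest"/ || return 1
        echo "moved $(basename "$entry")"
    done
}
```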

What to Do If xfs_repair Cannot Repair a Filesystem

If xfs_repair fails to repair the filesystem, run the same xfs_repair command up to two more times; xfs_repair may be able to make more repairs on successive runs. If xfs_repair has not fixed the consistency problems after three tries, your next step depends on the phase in which it failed:

  • If xfs_repair failed in phase 1, you must restore lost files from backups.

  • If xfs_repair failed in phase 2 or later, you may be able to recover files by backing up the damaged filesystem and restoring its files onto a newly created filesystem.
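The retry advice can be scripted. This sketch runs xfs_repair up to three times, judges success by a following xfs_repair -n (assuming, as the Linux xfsprogs documentation states, that it exits 0 when no problems remain), and treats the last "Phase N" line of the final run as the phase in which xfs_repair stopped. The function names and the default log path are hypothetical.

```sh
#!/bin/sh
# Hypothetical sketch: run xfs_repair up to three times, then report the last
# phase reached so you can choose a recovery path.
# ASSUMPTION: "xfs_repair -n" exits 0 when it finds no problems.

# Print N from the last line of the form "Phase N - ..." read on stdin.
last_phase() {
    sed -n 's/^Phase \([0-9][0-9]*\) .*/\1/p' | tail -n 1
}

repair_with_retries() {
    dev=$1
    log=${2:-/tmp/xfs_repair.log}   # hypothetical default log location
    try=1
    while [ "$try" -le 3 ]; do
        xfs_repair "$dev" >"$log" 2>&1        # keep only the latest run
        if xfs_repair -n "$dev" >/dev/null 2>&1; then
            echo "repaired on attempt $try"
            return 0
        fi
        try=$((try + 1))
    done
    phase=$(last_phase <"$log")
    echo "still inconsistent after 3 attempts; last phase reached: ${phase:-none}"
    case $phase in
        ''|1) echo "phase 1 (or no phase output): restore lost files from backups" ;;
        *)    echo "phase $phase: try backing up, re-creating, and restoring the filesystem" ;;
    esac
    return 1
}
```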

If xfs_repair failed in phase 2 or later, follow these steps:

  1. Mount the filesystem read-only using mount -r.

  2. Make a filesystem backup with xfsdump.

  3. Unmount the filesystem, then use mkfs to make a new filesystem on the same disk partition or logical volume.

  4. Mount the new filesystem and restore the files from the backup with xfsrestore.

See Chapter 6, “Backup and Recovery Procedures” for information about xfsdump and xfsrestore.
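The recovery steps might be scripted as follows. Because mkfs destroys everything on the device, the sketch only prints the commands unless DRY_RUN=0 is set. The xfsdump options shown (-l 0 for a full dump, -L and -M for session and media labels, -f for the destination file) and the xfsrestore -f option are assumed from common xfsdump and xfsrestore usage; the MKFS variable stands in for whatever command makes an XFS filesystem on your system, and the labels and function names are hypothetical. Keep the dump file on a different filesystem from the one being re-created.

```sh
#!/bin/sh
# Hypothetical sketch of the recovery steps. mkfs DESTROYS all data on the
# device, so commands are only printed unless DRY_RUN=0.
# ASSUMPTIONS: labels, dump path, and MKFS are placeholders; check the
# xfsdump, xfsrestore, and mkfs reference pages before use.

run() {
    if [ "${DRY_RUN:-1}" != 0 ]; then
        echo "$*"
    else
        "$@"
    fi
}

recover_fs() {
    dev=$1 mnt=$2 dumpfile=$3
    # Each step runs only if the previous one succeeded.
    run mount -r "$dev" "$mnt" &&                                   # step 1
    run xfsdump -l 0 -L recover -M recover -f "$dumpfile" "$mnt" && # step 2
    run umount "$mnt" &&
    run "${MKFS:-mkfs}" "$dev" &&                                   # step 3
    run mount "$dev" "$mnt" &&
    run xfsrestore -f "$dumpfile" "$mnt"                            # step 4
}
```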

Mounting A Filesystem Without Log Recovery

If a filesystem is so badly damaged that you cannot mount it in the standard way, you may be able to recover some of its data by mounting it with the -o norecover option of the mount command. This option mounts the filesystem without running log recovery. When you use this option, you must mount the filesystem read-only.
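A minimal sketch of that mount follows. The option is spelled norecover as in this chapter; other XFS implementations may spell it differently (Linux uses norecovery). The -r flag requests the required read-only mount, and the function name mount_norecover, the run helper, and DRY_RUN are hypothetical.

```sh
#!/bin/sh
# Hypothetical sketch: mount a damaged XFS filesystem read-only without log
# recovery so that data can be copied off. Set DRY_RUN=1 to print only.
# ASSUMPTION: the option name "norecover" matches your mount command.

run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

mount_norecover() {
    # $1: device file, $2: mount point chosen by the administrator
    run mount -r -o norecover "$1" "$2"
}
```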