Chapter 2. Planning an XFS Filesystem

The following subsections discuss how to prepare for creating an XFS filesystem and the choices you can make when you create it.

Choosing the Filesystem Block Size

XFS allows you to choose the logical block size for each filesystem by using the -b size= option of the mkfs command. (Physical disk blocks remain 512 bytes.)

For XFS filesystems on disk partitions and logical volumes and for the data subvolume of filesystems on logical volumes, the block size guidelines are as follows:

  • The minimum block size is 512 bytes. Small block sizes increase allocation overhead, which decreases filesystem performance. Even so, 512 bytes is generally the recommended block size for filesystems under 100 MB and for filesystems with many small files. The filesystem block size must be a power of two.

  • The default block size is 4096 bytes (4K). This is the recommended block size for filesystems over 100 MB.

  • The maximum block size is the page size of the kernel, which is 4K on x86 systems and is configurable on IA-64 systems. Because large block sizes can waste space, in general block sizes should not be larger than 4096 bytes (4K).

  • For news servers, it is recommended that you use a filesystem block size of 512 bytes and a directory block size of 4096 bytes.

Block sizes are specified in bytes in decimal (default), octal (prefixed by 0), or hexadecimal (prefixed by 0x or 0X). If the number has the suffix “k,” it is multiplied by 1024.
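
The size notation and the block size limits described above can be sketched in Python. This is an illustration only, not the parser that the mkfs command uses. The function names parse_size and is_valid_block_size are hypothetical, and the 4096-byte default page size is an assumption that matches x86 systems.

```python
def parse_size(text: str) -> int:
    """Convert a size string to bytes using the notation described above."""
    s = text.strip().lower()
    multiplier = 1
    if s.endswith("k"):                      # suffix "k" multiplies by 1024
        multiplier = 1024
        s = s[:-1]
    if s.startswith("0x"):                   # hexadecimal: 0x or 0X prefix
        value = int(s[2:], 16)
    elif len(s) > 1 and s.startswith("0"):   # octal: leading 0
        value = int(s[1:], 8)
    else:                                    # decimal (default)
        value = int(s, 10)
    return value * multiplier


def is_valid_block_size(size: int, page_size: int = 4096) -> bool:
    """Check the guidelines: at least 512 bytes, at most the kernel page
    size (assumed 4096 here), and a power of two."""
    return 512 <= size <= page_size and (size & (size - 1)) == 0


print(parse_size("4k"))      # 4096
print(parse_size("0x200"))   # 512
```

A value such as 768 parses correctly but fails is_valid_block_size because it is not a power of two.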

Choosing the Filesystem Directory Block Size

An XFS filesystem allows you to select a logical block size for the filesystem directory that is greater than the logical block size of the filesystem by using the -n size= option of the mkfs command. This allows you to choose a filesystem block size to match the distribution of data file sizes without adversely affecting the performance of directory operations. Using this option could improve performance for a filesystem with many small files, such as a news or mail filesystem. In this case, the filesystem logical block size could be small (512, 1K, or 2K bytes) and the logical block size for the filesystem directory could be large (4K or 8K bytes). This can improve the performance of directory lookups because the tree storing the index information has larger blocks and less depth.
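
As a sketch of how the two block sizes combine, the following Python helper assembles an mkfs.xfs argument list with a small filesystem block size and a larger directory block size. The helper name build_mkfs_args and the device path /dev/sdX1 are hypothetical. The function only builds the argument list; it does not run the command and does not check whether a particular mkfs version accepts the values.

```python
def build_mkfs_args(device: str, block_size: int, dir_block_size: int) -> list[str]:
    """Build (but do not run) an mkfs.xfs command line.

    Assumption: a directory block size smaller than the filesystem
    block size is rejected, as described in the text above.
    """
    if dir_block_size < block_size:
        raise ValueError("directory block size must not be smaller "
                         "than the filesystem block size")
    return [
        "mkfs.xfs",
        "-b", f"size={block_size}",      # filesystem block size in bytes
        "-n", f"size={dir_block_size}",  # directory block size in bytes
        device,
    ]


# News-server style layout: 512-byte blocks, 4096-byte directory blocks.
# /dev/sdX1 is a placeholder, not a real device.
print(" ".join(build_mkfs_args("/dev/sdX1", 512, 4096)))
```

Keeping the command as a list makes it easy to review before passing it to a process launcher.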

You should consider setting a logical block size for a filesystem directory that is greater than the logical block size for the filesystem if you are supporting an application that reads directories (with the readdir(3C) library routine or the getdents(2) system call) far more often than it creates and removes files. Using a small filesystem block size saves disk space and reduces the I/O needed for the small files.

In a Linux XFS filesystem, the data needed to perform a readdir operation is segregated from the index information. Directory data blocks can be “read-ahead” in a readdir. Performing read-ahead improves the readdir performance dramatically. Because the data needed for a readdir operation and index information are separate in a directory block, the offset in a directory is limited to 32 bits.

Choosing the Log Type and Size

Each XFS filesystem has a log that contains filesystem journaling records. This log requires dedicated disk space. This disk space doesn't show up in listings from the df command, nor can you access it with a filename.

The location of the disk space depends on the type of log you choose. The two types of logs are:

External 

When you specify that log records are maintained in a dedicated log device, the log is called an external log. You use the -l option of the mkfs command to specify this device.

Internal 

When an XFS filesystem is created on a disk partition or logical volume that does not have a log subvolume, log records are put into a dedicated portion of the disk partition (or data subvolume) that contains user files. This type of log is called an internal log.

The guidelines for choosing the log type are as follows:

  • If you want the data and log records to be on different partitions, use an external log.

  • If you want the data and the log subvolume of a logical volume to be on different partitions or to use different subvolume configurations, use an external log.

  • If you want the log subvolume of a logical volume to be striped independently from the data subvolume, you must use an external log.

  • If you are making the XFS filesystem on a logical volume that has a log subvolume, you must specify this log subvolume as the log device with the -l option of the mkfs command.

The amount of disk space that should be allocated for the log is a function of how the filesystem is used. The amount of disk space required for log records is proportional to the transaction rate and the size of transactions on the filesystem, not the size of the filesystem. Larger block sizes result in larger transactions. Transactions from directory updates (for example, the mkdir and rmdir commands and the creat() and unlink() system calls) cause more log data to be generated.

You can choose the amount of disk space to dedicate to the log (called the log size). The minimum log size for a filesystem is enforced by the size of the largest transaction, which depends on the filesystem and directory block sizes. The maximum log size is 64k blocks or 128 MB, whichever is smaller (this will depend on the block size).
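
The dependence of the maximum log size on the block size can be worked out with a short Python sketch. It assumes that “64k blocks” means 65,536 blocks and that 128 MB means 128 × 1024 × 1024 bytes; the function name max_log_size_bytes is hypothetical.

```python
MAX_LOG_BLOCKS = 64 * 1024          # assumption: "64k blocks" = 65,536 blocks
MAX_LOG_BYTES = 128 * 1024 * 1024   # assumption: 128 MB = 134,217,728 bytes


def max_log_size_bytes(block_size: int) -> int:
    """Return the smaller of the two limits for a given block size."""
    return min(MAX_LOG_BLOCKS * block_size, MAX_LOG_BYTES)


for bs in (512, 1024, 2048, 4096):
    print(bs, max_log_size_bytes(bs) // (1024 * 1024), "MB")
```

Under these assumptions the block-count limit applies to 512-byte and 1K blocks (32 MB and 64 MB), and the 128 MB limit applies from 2K blocks upward.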

For internal logs, the size of the log is specified with the -l size= option when you create the filesystem with the mkfs command. The default log size grows with the size of the filesystem up to the maximum log size, 128 megabytes, on a 1 terabyte filesystem. The log size is specified in bytes as described in “Choosing the Filesystem Block Size”, or as a multiple of the filesystem block size by using the suffix “b.”

For a filesystem that is contained in a striped logical volume, the default internal log size is rounded up to a multiple of the stripe unit size. In this case, any size value that you specify must also be a multiple of the stripe unit size.
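
The rounding rule for striped logical volumes is simple arithmetic, sketched here in Python. Both function names are hypothetical, and the sketch says nothing about how mkfs chooses the unrounded default.

```python
def round_up_to_stripe_unit(log_bytes: int, stripe_unit_bytes: int) -> int:
    """Round a log size up to the next multiple of the stripe unit."""
    if stripe_unit_bytes <= 0:
        raise ValueError("stripe unit must be positive")
    units = -(-log_bytes // stripe_unit_bytes)   # ceiling division
    return units * stripe_unit_bytes


def is_valid_user_log_size(log_bytes: int, stripe_unit_bytes: int) -> bool:
    """A user-specified size must already be a stripe unit multiple."""
    return log_bytes > 0 and log_bytes % stripe_unit_bytes == 0


# One byte over 10 MB with a 64 KB stripe unit rounds up to 161 stripe units.
print(round_up_to_stripe_unit(10 * 1024 * 1024 + 1, 64 * 1024))  # 10551296
```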

For external logs, the default size of the log is the same as the size of the log device. You can specify the size of the log with the -l size= option of the mkfs command, but any additional space in the log device cannot be used. You may find that you need to repartition a disk to create a properly sized log subvolume.

For filesystems with very high transaction activity, a large log size is recommended. You should avoid making your log too large, however, since a large log can increase filesystem mount time after a crash.

Choosing Allocation Groups and Stripe Units

The data section of an XFS filesystem is divided into allocation groups. You can select the number of allocation groups when you create an XFS filesystem or, alternatively, you can select the size of an allocation group. The larger the number of allocation groups, the more parallelism can be achieved when allocating blocks and inodes. You should avoid selecting a very large number of allocation groups or an allocation group size that will yield a very large number of allocation groups; a large number of allocation groups causes an unreasonable amount of CPU time to be used when the filesystem is close to full.

The minimum allocation group size is 16 MB; the maximum size is just under 4 GB.

The default number of allocation groups is 8, unless the filesystem is smaller than 128 MB or larger than 8 GB. When the filesystem is smaller than 128 MB, the default number of allocation groups is less than 8, since the minimum allocation group size is 16 MB. In this case, the data section, by default, will be divided into as many allocation groups as possible that are at least 16 MB. When the filesystem is larger than 8 GB, but smaller than 64 GB, the default number of allocation groups is greater than 8, with each allocation group approximately 1 GB in size. When the filesystem is larger than 64 GB, the default number of allocation groups is still greater than 8, but the allocation group size is 4 GB.
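
The default rules above can be approximated with the following Python sketch. It is a reading of the text, not the algorithm that mkfs implements: the function name default_ag_count is hypothetical, “approximately 1 GB” is modeled as exactly 1 GB with rounding up, a filesystem of exactly 64 GB is assumed to fall into the 4 GB case, and the 4 GB group size ignores the fact that the true maximum is just under 4 GB.

```python
MB = 1024 * 1024
GB = 1024 * MB


def default_ag_count(data_bytes: int) -> int:
    """Estimate the default number of allocation groups for a data section."""
    if data_bytes < 128 * MB:
        # As many groups as possible that are at least 16 MB each.
        return max(1, data_bytes // (16 * MB))
    if data_bytes <= 8 * GB:
        return 8
    if data_bytes < 64 * GB:
        # Groups of about 1 GB each (assumption: round up).
        return -(-data_bytes // GB)
    # Groups of 4 GB each (assumption: round up).
    return -(-data_bytes // (4 * GB))


for size in (64 * MB, 1 * GB, 16 * GB, 256 * GB):
    print(size // MB, "MB ->", default_ag_count(size), "allocation groups")
```

Under these assumptions the four sample sizes yield 4, 8, 16, and 64 allocation groups.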

XFS allows you to select the stripe unit for a RAID device or stripe volume. This ensures that data allocations, inode allocations, and the internal log will be aligned along stripe units when the end of file is extended and the file size is larger than 512KB. You specify stripe units in 512-byte block units or in bytes. See the mkfs.xfs(1M) man page for information on specifying stripe units.

When you specify a stripe unit, you also specify a stripe width. You specify a stripe width in 512-byte block units or in bytes. The stripe width must be a multiple of the stripe unit. The stripe width will be the preferred I/O size returned in the stat() system call. See the mkfs.xfs(1M) man page for information on specifying stripe width.
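
The relationship between stripe unit, stripe width, and the two ways of expressing them can be checked with a small Python sketch. The function names are hypothetical, and the 64 KB and 256 KB values are example figures chosen for illustration, not recommendations.

```python
SECTOR_BYTES = 512   # the 512-byte block unit mentioned above


def bytes_to_512_byte_units(n_bytes: int) -> int:
    """Convert a byte count to 512-byte block units."""
    if n_bytes % SECTOR_BYTES != 0:
        raise ValueError("value is not a whole number of 512-byte blocks")
    return n_bytes // SECTOR_BYTES


def is_valid_stripe_width(stripe_unit: int, stripe_width: int) -> bool:
    """The stripe width must be a multiple of the stripe unit."""
    return (stripe_unit > 0 and stripe_width >= stripe_unit
            and stripe_width % stripe_unit == 0)


su, sw = 64 * 1024, 256 * 1024       # example values in bytes
print(is_valid_stripe_width(su, sw))                               # True
print(bytes_to_512_byte_units(su), bytes_to_512_byte_units(sw))    # 128 512
```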

When you also specify the -b (block size) option of the mkfs command, you can use the -d su= and -d sw= options to specify the stripe unit and stripe width in filesystem blocks.

For a RAID device, the default stripe unit is 0, indicating that the feature is disabled. It is prudent for the system administrator to configure the stripe unit and stripe width of RAID devices. Doing so avoids unexpected performance anomalies caused by the filesystem doing non-optimal I/O operations to the RAID unit. For example, if a block write is not aligned on a RAID stripe unit boundary and is not a full stripe unit, the RAID will be forced to do a read/modify/write cycle to write the data. This can have a significant performance impact. When the stripe unit size is set properly, XFS avoids unaligned accesses.
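
The alignment condition in the example above can be expressed as a simple check. The Python sketch below uses a deliberately simplified model: it flags a write if it either starts off a stripe unit boundary or does not cover a whole number of stripe units. Real RAID implementations differ in when they perform a read/modify/write cycle, and the function name needs_read_modify_write is hypothetical.

```python
def needs_read_modify_write(offset: int, length: int, stripe_unit: int) -> bool:
    """Simplified model: True if the write is unaligned or partial."""
    if stripe_unit <= 0 or length <= 0:
        raise ValueError("stripe unit and length must be positive")
    aligned = offset % stripe_unit == 0          # starts on a boundary
    whole_units = length % stripe_unit == 0      # covers full stripe units
    return not (aligned and whole_units)


SU = 64 * 1024   # example stripe unit in bytes
print(needs_read_modify_write(0, SU, SU))        # False: aligned, full unit
print(needs_read_modify_write(4096, SU, SU))     # True: starts mid-unit
print(needs_read_modify_write(0, 4096, SU))      # True: partial unit
```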

For a striped volume, the stripe unit that was specified when the volume was created is provided by default.

Disk Repartitioning

Many system administrators may find that they want or need to repartition disks when they switch to XFS filesystems and/or logical volumes. Some of the reasons to consider repartitioning are:

  • If the system disk has separate partitions for the root, usr, var, and home filesystems, the root filesystem may be running out of space. Repartitioning is a way to increase the space in root (at the expense of the size of usr, var, or home) or to solve the problem by combining root and usr, var, or home into a single partition.

  • If you plan to use logical volumes, you may want to put the XFS log into a small subvolume. This requires disk repartitioning to create a small partition for the log subvolume.

  • If you plan to use logical volumes, you may want to repartition to create disk partitions of equal size that can be striped or plexed.