https://access.redhat.com/articles/22540

Configuring Write Barriers: File System Data Integrity over Power Failures in Red Hat Enterprise Linux

Updated November 26, 2014 19:12

Issue

Data integrity over power failures is one of the critical features for many Red Hat Enterprise Linux (RHEL) users, especially for mission-critical data. Specifically, users expect that file systems can survive a power failure without requiring a long-running file system check on recovery. This article gives a broad overview of what write barriers are in Linux and how to configure and verify them in RHEL.

Environment

Red Hat Enterprise Linux 4

Red Hat Enterprise Linux 5

Resolution

Background

Setting up and verifying data integrity can be a confusing process. Before describing the specifics of how to configure write barriers, it is useful to define some common terms and give an overview of how application data moves from the application's data buffers to persistent storage. A simplified view of the data path is that application data moves from the application data buffer into the kernel's page cache after a "write()" system call. This application data will be referred to as user data in this article. File systems maintain metadata in addition to this user data for their internal bookkeeping, including things like allocation maps, file names, directory entries, file system superblocks and inodes. Think of the metadata as the structure of the file system, which allows the kernel to keep track of where the user data is stored.

File systems take great care to update the metadata in a safe way. Journalling file systems bundle metadata updates into transactions which are sent to persistent storage in an ordered way: first the body of the transaction is sent to the storage device, then a commit block. If the transaction body and its corresponding commit block are both present on disk after a power failure, the file system considers the transaction complete and can replay it safely during recovery.

Things get complicated when storage devices add extra caches for data. For example, hardware RAID controllers often contain internal write caches which can have battery backup. Storage target devices like a local S-ATA or SAS drive also have write caches, which can range up to 32 or 64 MB in size with modern drives. High-end arrays, like those from NetApp, IBM, Hitachi and EMC among others, also have large caches.

The key to file system data integrity is that the IO for any given transaction body and its commit block must not be reordered or lost in a power failure by any of these potential caches.

What are Write Barriers?

Write barriers are a 2.6 kernel mechanism that issues cache flush commands before and after the commit block used by journalling file systems. This is a brute-force way to ensure that the metadata is correctly ordered on persistent storage, and it can have a substantial performance impact for some applications when enabled. Specifically, applications which create and delete lots of small files and applications that are heavy "fsync()" users will often run much slower.

In RHEL 4 and RHEL 5, write barriers for ext3 are not enabled by default. To enable barriers for ext3, use the "-o barrier=1" mount option:

Raw

# mount -o barrier=1 /dev/sda1 /test
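
To make the option persistent across reboots, the same mount option can be added to /etc/fstab. A minimal sketch, assuming a hypothetical ext3 file system on /dev/sda1 mounted at /test:

Raw

/dev/sda1    /test    ext3    defaults,barrier=1    1 2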

Note that barriers are enabled by default for ext4 and XFS. Barriers are enabled by default for GFS2 in RHEL 6 and above, but are not supported by GFS or GFS2 in RHEL 5 or earlier.
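
Conversely, on releases where barriers are on by default, they can be explicitly disabled on storage that is known to be safe (see the configurations discussed below). A sketch, assuming a hypothetical ext4 file system on /dev/sda1 and a hypothetical XFS file system on /dev/sdb1:

Raw

# mount -o barrier=0 /dev/sda1 /test
# mount -o nobarrier /dev/sdb1 /data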

The kernel will automatically disable barriers when it detects devices that advertise themselves as having a write-through cache, or when the system is configured with MD or LVM devices that do not properly handle write barriers. When barriers are disabled, a message is logged to /var/log/messages. The message may look like:

Raw

Jun 23 11:54:06 hostname kernel: JBD: barrier-based sync failed on dm-1-8 - disabling barriers
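
To check whether such a message has been logged on a running system, the logs can be searched directly; a small sketch, assuming the default RHEL log location:

Raw

# grep -i barrier /var/log/messages
# dmesg | grep -i barrier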

Note that write barriers do not help user data survive power failures in most configurations. To achieve this, applications must issue explicit "fsync()" calls. Applications that use direct IO still need to use "fsync()" in order to flush data from the write caches of downstream IO devices.

Does My System Need Write Barrier Support?

Several system configurations do not need to use write barriers.

The first way to avoid data integrity issues is to make sure that there are no write caches that could lose data on power failures. For a simple server or desktop, say with one or more S-ATA drives off a local S-ATA controller like the Intel AHCI part, users can disable the write cache on the target S-ATA drives with the hdparm command:

Raw

# hdparm -W0 /dev/sda
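
The current write cache setting can be queried with the same tool before and after making the change. Note that on many drives this setting does not persist across power cycles, so it may need to be reapplied at boot:

Raw

# hdparm -W /dev/sda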

The second type of system that can avoid using write barriers is one with a hardware RAID controller with a battery-backed write cache. If the system has this kind of hardware RAID card with battery-backed write cache and its component drives have their write caches disabled, the controller will advertise itself as a write-through cache, which indicates that the kernel can trust it to persist data. A specific example would be a controller like the LSI MegaRAID SAS controller. To verify the state of the back-end drives, which are normally hidden by such controllers, users need to use vendor-specific tools to query and manipulate the target drives. For the LSI MegaRAID SAS controller, the tool is the LSI MegaCli command:

Raw

# MegaCli64 -LDGetProp -DskCache -LAll -aALL

The above command shows the write cache state of the back-end drives.

Raw

# MegaCli64 -LDSetProp -DisDskCache -Lall -aALL

The above command disables the write cache for those drives.

As mentioned above, these commands are very vendor-specific (and even HBA-specific).

The cache status of SCSI disk devices can be read from /sys/block/(block device name)/device/scsi_disk/(SCSI address)/cache_type, for example:

Raw

# cat /sys/block/sda/device/scsi_disk/5:0:0:0/cache_type
write back
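
When the SCSI address is not known in advance, a shell glob can be used to read the attribute for a given block device; a small sketch, assuming /dev/sda:

Raw

# cat /sys/block/sda/device/scsi_disk/*/cache_type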

Note that hardware RAID cards recharge their batteries while the system is operational. If a system is powered off for an extended period of time, the batteries will lose their charge and the system will not be protected against a power failure.
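
The health of the controller's battery backup unit can be checked with the vendor tool as well. For the LSI MegaCli example used above (this command, like the others, is vendor-specific):

Raw

# MegaCli64 -AdpBbuCmd -GetBbuStatus -aALL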

The third major class of storage that does not need write barriers is high-end arrays that have various ways of maintaining data across a power failure. There is no need to try to verify the state of the internal drives in external RAID storage.

Note that NFS clients do not need to enable write barriers, since data integrity is handled by the NFS server side. NFS servers should be configured to run on local file systems which do have barriers enabled, as mentioned above.