Linux 6.2 will include improvements to RAID5 and RAID6 in Btrfs

Linux Kernel

It was recently revealed that improvements to Btrfs have been proposed for inclusion in the Linux 6.2 kernel to fix the "write hole" problem in the RAID 5/6 implementation.

The essence of the problem is that if a crash occurs during a write, it is initially impossible to tell which blocks on which RAID devices were written correctly and which writes were not completed.

If you try to rebuild the RAID in this situation, the blocks corresponding to the incompletely written blocks may become corrupted, because the state of the RAID blocks is out of sync. This problem occurs in any RAID1/5/6 array where no special measures are taken to counter this effect.

In a RAID implementation such as RAID1 in Btrfs, this problem is solved by using checksums on both copies: if there is a mismatch, the data is simply restored from the second copy. This approach also works if a device starts returning bad data instead of failing completely.

However, in the case of RAID5/6, the file system does not store checksums for parity blocks. In a normal situation, the correctness of the data blocks is verified by the checksum each of them carries, and the parity block can be recreated from the data. After a partial write, however, this approach may not work in certain situations, and when the array is restored, the blocks caught in the incomplete write may be restored incorrectly.
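To make the "write hole" concrete, here is a minimal Python sketch of a three-device RAID5-style stripe. It is only an illustration under simplifying assumptions: the four-byte blocks, the variable names and the use of `zlib.crc32` as a stand-in for the file system's checksum are all hypothetical and are not taken from the Btrfs code.

```python
import zlib
from functools import reduce


def xor_blocks(*blocks: bytes) -> bytes:
    """XOR equally sized blocks byte by byte (RAID5-style parity)."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# A stripe with two data blocks and one parity block (toy sizes).
d0, d1 = b"AAAA", b"BBBB"
parity = xor_blocks(d0, d1)
csum_d1 = zlib.crc32(d1)  # checksum the file system keeps for d1 (stand-in)

# Interrupted write: the new d0 reaches its device, the parity update does not.
d0 = b"CCCC"

# The device holding d1 then fails; d1 is rebuilt from d0 and the stale parity.
rebuilt_d1 = xor_blocks(d0, parity)

write_hole_hit = rebuilt_d1 != d1                      # rebuilt block is wrong
detected_by_checksum = zlib.crc32(rebuilt_d1) != csum_d1
print(write_hole_hit, detected_by_checksum)
```

In this toy case the block that was never touched by the write (`d1`) is the one that gets rebuilt incorrectly, and only the data checksum reveals it, since the parity block itself carries no checksum.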

In the case of Btrfs, this problem is most relevant when a write is smaller than the stripe. In this case, the file system must perform a read-modify-write (RMW) operation.

If the operation encounters blocks left by an incomplete write, RMW can cause corruption that goes undetected regardless of checksums. The developers have made changes so that the RMW operation verifies the checksums of the blocks before proceeding and, if data recovery is needed, the checksum is also verified after the write.
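The idea of checking checksums before a read-modify-write can be sketched roughly as follows. This is a simplified Python model, not the kernel's implementation: the function `verified_rmw`, the in-memory lists and `zlib.crc32` as the checksum are assumptions made for illustration only.

```python
import zlib
from functools import reduce


def xor_blocks(*blocks: bytes) -> bytes:
    """XOR equally sized blocks byte by byte (RAID5-style parity)."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def verified_rmw(data, parity, csums, index, new_block):
    """Sub-stripe write to data[index] with checksum verification first.

    Blocks that are not being overwritten are checked against their stored
    checksums. A block that fails is rebuilt from parity and the other
    blocks, and the rebuilt block is checked again before it is trusted.
    """
    data = list(data)
    for i, block in enumerate(data):
        if i == index or zlib.crc32(block) == csums[i]:
            continue
        others = [b for j, b in enumerate(data) if j != i]
        rebuilt = xor_blocks(parity, *others)
        if zlib.crc32(rebuilt) != csums[i]:
            raise ValueError(f"block {i} cannot be recovered")
        data[i] = rebuilt
    data[index] = new_block
    csums = list(csums)
    csums[index] = zlib.crc32(new_block)
    # Parity is recomputed only from blocks that passed verification.
    return data, xor_blocks(*data), csums


# Hypothetical stripe in which one device silently returns bad data.
data = [b"AAAA", b"BBBB"]
csums = [zlib.crc32(b) for b in data]
parity = xor_blocks(*data)
data[1] = b"BBxB"

new_data, new_parity, new_csums = verified_rmw(data, parity, csums, 0, b"CCCC")
print(new_data, new_parity)
```

Without the verification step, the new parity would be computed from the corrupted `b"BBxB"` block, and the original contents could no longer be rebuilt from the stripe. The price is the extra checksum work on every sub-stripe write.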

Unfortunately, when an incomplete stripe is written (RMW), this adds overhead for computing the checksums, but it significantly increases reliability. For RAID6, this logic is not ready yet.

It is also worth noting the developers' recommendations on the use of RAID5/6, which rest on the fact that in Btrfs the profiles for storing metadata and data may differ. You can therefore use the RAID1 (mirror) or even RAID1C3 (three copies) profile for metadata, and RAID5 or RAID6 for data.

This ensures reliable metadata protection and the absence of a "write hole" for metadata on the one hand, and the more efficient use of space typical of RAID5/6 on the other. Metadata corruption is prevented, and data corruption can be corrected.
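As a rough illustration of such a mixed-profile layout, the Python sketch below only assembles a `mkfs.btrfs` command line, using its `-d` (data profile) and `-m` (metadata profile) options, and prints it without running anything. The helper name `mkfs_btrfs_args` and the device paths are hypothetical placeholders; check the command against your own system before using it, since creating a file system destroys existing data.

```python
import shlex

# Minimum device counts for the mirror profiles (one device per copy).
MIN_DEVICES = {"raid1": 2, "raid1c3": 3}


def mkfs_btrfs_args(devices, data_profile="raid5", metadata_profile="raid1c3"):
    """Build (but do not run) a mkfs.btrfs command with separate profiles."""
    needed = MIN_DEVICES.get(metadata_profile, 1)
    if len(devices) < needed:
        raise ValueError(f"{metadata_profile} needs at least {needed} devices")
    return ["mkfs.btrfs", "-d", data_profile, "-m", metadata_profile, *devices]


# Placeholder device names, for illustration only.
command = mkfs_btrfs_args(["/dev/sdb", "/dev/sdc", "/dev/sdd"])
print(shlex.join(command))
```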

It can also be noted that in kernel 6.2, asynchronous execution of the "discard" operation (marking freed blocks that no longer need to be physically stored) will be enabled by default for SSDs on Btrfs.

The advantage of this mode is high performance: discard operations are grouped efficiently in a queue that is then processed by a background handler, so normal file system operations are not slowed down, as they are with synchronous "discard", which is issued as blocks are freed, and the SSD can make better decisions. In addition, you will no longer need utilities such as fstrim, since all available blocks will be discarded by the file system without additional scanning and without slowing down operations.
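The grouping of discard requests can be pictured with a small Python model. It is a conceptual sketch only and does not reflect how the kernel implements the queue: the class `AsyncDiscardQueue`, its methods and the byte ranges are invented for illustration.

```python
def coalesce(ranges):
    """Merge overlapping or adjacent (start, length) ranges."""
    merged = []
    for start, length in sorted(ranges):
        end = start + length
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [(s, e - s) for s, e in merged]


class AsyncDiscardQueue:
    """Freed ranges are only recorded; a background step discards them later."""

    def __init__(self):
        self.pending = []

    def free(self, start, length):
        # Cheap bookkeeping on the hot path, no device command is issued here.
        self.pending.append((start, length))

    def flush(self):
        # Background step: hand over fewer, larger ranges in one batch.
        batch, self.pending = coalesce(self.pending), []
        return batch


queue = AsyncDiscardQueue()
queue.free(0, 4096)
queue.free(4096, 4096)
queue.free(16384, 4096)
batch = queue.flush()
print(batch)  # [(0, 8192), (16384, 4096)]
```

Three separate frees turn into two discard ranges here, which is the kind of batching that a synchronous discard on every free cannot do.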

Finally, if you are interested in learning more about it, you can consult the details in the following link.

