Several months ago we covered Reiser5 here on the blog, a filesystem maintained by Edward Shishkin that stands out for an innovation in parallel scaling, which is carried out not at the block level but by the filesystem itself.
Reiser5 is a substantially revised version of the ReiserFS file system that implements support for parallel-scalable logical volumes, allowing data to be distributed efficiently across a logical volume.
Now, in more recent news, Edward Shishkin has announced new features that are being developed as part of the Reiser5 project.
Among the recent innovations, the user can now add a small high-performance block device (for example, NVRAM), called a proxy disk, to a relatively large logical volume made up of low-cost disks. This gives the impression that the entire volume is made up of devices as fast as the proxy disk.
The method is based on a simple observation: in practice, writes to disk are not constant, and the I/O load curve consists of peaks. In the interval between such "spikes", there is always an opportunity to flush data from the proxy disk by rewriting all of it (or just part of it) to the "slow" main storage in the background. As a result, the proxy disk is always ready to receive a new portion of data.
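To make the idea concrete, here is a toy simulation of a spiky write load hitting a small fast buffer that drains in the background. It is only a sketch: the function name `simulate`, the block units, and the fixed `drain_rate` are assumptions made for illustration and have nothing to do with Reiser5's actual code.

```python
def simulate(load, proxy_capacity, drain_rate):
    """Toy model: return the proxy disk occupancy after each time step.

    load           -- blocks written by applications in each time step
    proxy_capacity -- blocks the proxy disk can hold (hypothetical unit)
    drain_rate     -- blocks flushed to main storage per step in the background
    """
    occupancy = 0
    history = []
    for incoming in load:
        # Background flush: move up to drain_rate blocks to slow storage.
        occupancy = max(0, occupancy - drain_rate)
        # Absorb the new writes; overflow beyond the capacity is simply
        # clamped here (in this toy model it would bypass the proxy disk).
        occupancy = min(proxy_capacity, occupancy + incoming)
        history.append(occupancy)
    return history


# Two write bursts separated by idle intervals.
spiky_load = [8, 0, 0, 0, 8, 0, 0, 0]
print(simulate(spiky_load, proxy_capacity=10, drain_rate=3))
# prints [8, 5, 2, 0, 8, 5, 2, 0]
```

In this run the buffer is empty again before the second burst arrives, which is the property the technique relies on.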
This technique (known as Burst Buffers) originated in the field of high-performance computing (HPC). But it turned out to be in demand for ordinary applications as well, especially those with strict data integrity requirements (typically various kinds of databases). Such applications make any change to any file atomically, namely:
- First, a new file is created containing the modified data;
- Then this new file is written to disk using fsync(2);
- After that, the new file is renamed over the old one, which automatically frees the blocks occupied by the old data.
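The three steps above can be sketched in Python roughly as follows. The helper name `atomic_write` is hypothetical, and this is a minimal illustration of the general pattern, not code taken from any particular database or from Reiser5.

```python
import os
import tempfile


def atomic_write(path, data: bytes):
    """Replace the contents of `path` with `data` using write, fsync, rename."""
    directory = os.path.dirname(os.path.abspath(path))
    # Step 1: create a new file with the modified data in the same directory,
    # so that the final rename stays within one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()
            # Step 2: force the new file's contents to disk.
            os.fsync(tmp.fileno())
        # Step 3: rename the new file over the old one.
        os.replace(tmp_path, path)
    except BaseException:
        # Remove the temporary file if anything failed before the rename.
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise


if __name__ == "__main__":
    workdir = tempfile.mkdtemp()
    target = os.path.join(workdir, "settings.conf")
    atomic_write(target, b"version=1\n")
    atomic_write(target, b"version=2\n")
    with open(target, "rb") as f:
        print(f.read())
```

Readers see either the old contents or the new ones, never a half-written file. Careful implementations on POSIX systems often also call fsync on the containing directory after the rename; that step is omitted here for brevity.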
All of these steps, to one degree or another, cause a significant drop in performance on any file system. The situation improves if the new file is first written to a dedicated high-performance device, which is exactly what happens in a file system with Burst Buffers support.
In Reiser5, the plan is to optionally send to the proxy disk not only a file's new logical blocks but all dirty pages in general, and not only pages with data but also pages with metadata, which are written in steps (2) and (3).
Proxy disks are supported within the framework of regular operations on Reiser5 logical volumes, announced earlier in the year. That is, the combined "proxy disk - primary storage" system is an ordinary logical volume, the only difference being that the proxy disk takes priority over the other components of the volume in the disk addressing policy.
Adding a proxy disk to a logical volume is not accompanied by any data rebalancing, and its removal occurs in the same way as removing a normal disk. All proxy disk operations are atomic.
After adding a proxy disk, the total capacity of the logical volume increases by the capacity of this disk.
The proxy disk should be cleaned periodically, that is, its data should be flushed to main storage. Once Reiser5 reaches beta stability, the plan is to make cleaning automatic (it will be handled by a special kernel thread). At this stage, the responsibility for cleaning rests with the user.
If there is no free space on the proxy disk, all data is automatically written to main storage. In that case, the overall performance of the filesystem drops (by default, because the procedure that commits all existing transactions is invoked constantly).
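The behavior described in the last few paragraphs (priority for the proxy disk, capacity that adds up, manual cleaning, and fallback to main storage) can be pictured with a small model. Everything in it is hypothetical: the class `ToyVolume`, its methods, and the block-count units are invented for illustration and do not correspond to any Reiser5 interface or tool. Splitting a single write between the two devices is also an assumption of this sketch.

```python
class ToyVolume:
    """Hypothetical model of a 'proxy disk + main storage' logical volume."""

    def __init__(self, proxy_capacity, main_capacity):
        self.proxy_capacity = proxy_capacity
        self.main_capacity = main_capacity
        self.proxy_used = 0
        self.main_used = 0

    @property
    def total_capacity(self):
        # The proxy disk contributes its own capacity to the volume.
        return self.proxy_capacity + self.main_capacity

    def write(self, blocks):
        """Place blocks, preferring the proxy disk; return (to_proxy, to_main)."""
        free = self.total_capacity - self.proxy_used - self.main_used
        if blocks > free:
            raise OSError("no space left on volume")
        # Addressing policy: the proxy disk takes priority while it has room.
        to_proxy = min(blocks, self.proxy_capacity - self.proxy_used)
        # Whatever does not fit goes straight to the slow main storage.
        to_main = blocks - to_proxy
        self.proxy_used += to_proxy
        self.main_used += to_main
        return to_proxy, to_main

    def flush(self):
        """Manual cleaning: move proxy data to main storage; return blocks moved."""
        moved = min(self.proxy_used, self.main_capacity - self.main_used)
        self.proxy_used -= moved
        self.main_used += moved
        return moved


vol = ToyVolume(proxy_capacity=4, main_capacity=100)
print(vol.total_capacity)  # 104
print(vol.write(3))        # (3, 0): the proxy disk absorbs the write
print(vol.write(3))        # (1, 2): the proxy disk fills up, the rest goes to main
print(vol.write(2))        # (0, 2): proxy disk full, everything goes to main
print(vol.flush())         # 4: cleaning frees the proxy disk
print(vol.write(2))        # (2, 0): the proxy disk is used again
```

The model ignores performance entirely; it only shows where the data lands before and after cleaning.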