Advantages of using ZFS with rdiff-backup

Discover a file system that is perfectly suited for backups

In recent years, Linux has welcomed a newcomer to the file system arena. With ext4 still the de facto choice for desktops and most server installations, and with Btrfs failing to deliver on the promise of a next-generation file system, ZFS is the one fulfilling that long-awaited promise.

ZFS is not a “new” file system in the UNIX world; it has a relatively long history. It was first introduced in 2005 within the OpenSolaris operating system, ported to FreeBSD in 2008, and finally ported to Linux in 2013. Since then, it has gained popularity in the Linux world, making it into the Ubuntu distribution in 2016 as a viable option for a default file system.

Advantages

ZFS has many advantages over other file systems. Below, we focus on the benefits for rdiff-backup, Rdiffweb and Minarca.

Data integrity

Probably the most critical feature for a file system storing backups is data integrity! We cannot afford to back up data to a file system and be unsure whether we can retrieve the same data years later. What sets ZFS apart from other file systems is its ability to validate data integrity using checksums for both data and metadata. ZFS has multiple mechanisms not only to validate data integrity but also to repair it. For example, if data is found to be corrupted on one disk, ZFS tries to retrieve a good copy from the other disks when you are using a raidz or mirrored layout.
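
As a minimal sketch of how this verification can be triggered by hand, the commands below start a scrub (a full read of the pool that checks every block against its checksum) and then report what was found. The pool name `data` is an assumption, and `run` is a hypothetical helper that only prints each command unless `RUN=1` is set, since the real commands need root and an existing pool.

```shell
# Assumption: the pool is named "data"; adjust to your own pool name.
POOL="data"

# Hypothetical helper: print the command by default, run it only when RUN=1.
run() {
    if [ "${RUN:-0}" = "1" ]; then
        "$@"
    else
        printf '%s\n' "$*"
    fi
}

# Read every block in the pool and verify it against its checksum.
# With a mirror or raidz layout, damaged blocks are repaired from redundancy.
scrub_pool() {
    run zpool scrub "$1"
}

# Show pool health, scrub progress, and any files with unrecoverable errors.
pool_status() {
    run zpool status -v "$1"
}

scrub_pool "$POOL"
pool_status "$POOL"
```

Running a scrub on a regular schedule, for example from cron, is a common way to catch silent corruption before a restore is needed.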

Simplifies administration

Perhaps the biggest distinction is that ZFS unifies the volume manager and the file system under a single layer. By doing so, ZFS has full knowledge of the physical and logical arrangement of the data. This simplifies storage administration as follows: you provide the disks to ZFS as a pool of devices, then you can start creating datasets directly! No need to create partitions, format the disks, create file systems, create volumes, etc.

This also simplifies maintenance. If you need to expand the storage, you can add more disks to the same pool, and the extra space can then be used by existing datasets. You can also expand the storage by replacing each disk with a larger one.

For rdiff-backup, this means you don’t have to worry about storage scalability. Furthermore, you don’t need to worry about maintenance downtime, as these maintenance operations can be done online.
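
The workflow described above could look roughly like the sketch below. The pool name `data`, the dataset `data/backups`, and all `/dev/sd*` device names are placeholders chosen for illustration, and `run` is a hypothetical helper that prints each command instead of executing it unless `RUN=1` is set.

```shell
# Assumptions: pool name and device names are placeholders; use your own.
POOL="data"

# Hypothetical helper: print the command by default, run it only when RUN=1.
run() {
    if [ "${RUN:-0}" = "1" ]; then
        "$@"
    else
        printf '%s\n' "$*"
    fi
}

# Create a mirrored pool from two whole disks, then a dataset for the backups.
# No partitioning or mkfs step is involved.
create_backup_pool() {
    run zpool create "$POOL" mirror /dev/sdb /dev/sdc
    run zfs create "$POOL/backups"
}

# Grow the pool online by adding a second mirrored pair of disks.
add_disks() {
    run zpool add "$POOL" mirror /dev/sdd /dev/sde
}

# Alternative: swap a disk for a larger one. With autoexpand=on, the extra
# space becomes usable once every disk in the mirror has been replaced.
replace_with_larger_disk() {
    run zpool set autoexpand=on "$POOL"
    run zpool replace "$POOL" /dev/sdb /dev/sdf
}

create_backup_pool
add_disks
replace_with_larger_disk
```

A mirror is used here only to keep the example short; a raidz layout follows the same pattern with a different keyword in `zpool create`.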

Quick snapshots

ZFS snapshots are so cheap to put in place that using them is a no-brainer: you must take advantage of them! Take a look at zfs-auto-snapshot. It creates daily, weekly, and monthly snapshots of your data with practically no performance degradation.

For rdiff-backup, snapshots give you the chance to roll back to a previous state of the repository. This is very useful if you perform manual operations on the rdiff-backup repository, such as using the rdiff-backup-delete script to remove files and folders from the history. With a snapshot in place, you can always roll back to a safe point.
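
A manual snapshot taken just before such an operation might be sketched as follows. The dataset `data/backups` and the snapshot name `before-delete` are assumptions made for the example, and `run` is a hypothetical helper that prints each command unless `RUN=1` is set.

```shell
# Assumptions: dataset and snapshot names are placeholders.
DATASET="data/backups"
SNAP="before-delete"

# Hypothetical helper: print the command by default, run it only when RUN=1.
run() {
    if [ "${RUN:-0}" = "1" ]; then
        "$@"
    else
        printf '%s\n' "$*"
    fi
}

# Take a snapshot before a risky manual operation on the repository.
take_snapshot() {
    run zfs snapshot "$DATASET@$SNAP"
}

# List the snapshots that exist for the dataset.
list_snapshots() {
    run zfs list -t snapshot -r "$DATASET"
}

# Revert the dataset to the snapshot, discarding every change made since.
# zfs rollback only accepts the most recent snapshot unless -r is given,
# and -r destroys any snapshots taken after the target.
roll_back() {
    run zfs rollback "$DATASET@$SNAP"
}

take_snapshot
list_snapshots
```

If the manual operation goes wrong, calling `roll_back` returns the repository to the state captured by `take_snapshot`.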

Offsite replications

Depending on your backup strategy, you may want to have an offsite backup replica. While this can be achieved rather easily with rsync or a similar synchronization tool, those tools are not optimized for this type of job. Rsync has to compare files on both sides to work out the deltas to send to the remote system, a process that may require a lot of CPU and network bandwidth.

With ZFS, you can optimize offsite replication by sending only the delta between snapshots. With a tool like syncoid, it might be as simple as:

syncoid data/backups root@remotehost:data/backups

Online compression

Another benefit of ZFS is its ability to compress data on disk. For a tool like rdiff-backup, which already uses gzip compression for file deltas, the benefit might not look appealing. On the contrary: the gzip compression used by rdiff-backup uses more CPU than ZFS’s default lz4 compression (1). Moreover, ZFS compresses both the delta files and the current mirror data! When you enable ZFS compression, you also benefit from a side feature that automatically detects all-zero blocks and discards them to free disk space (2).

Enabling ZFS compression is an absolute must!
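
As a hedged example, enabling compression on a backup dataset and checking its effect might look like the sketch below. The dataset name `data/backups` is an assumption, and `run` is a hypothetical helper that prints each command unless `RUN=1` is set.

```shell
# Assumption: the backups live in the dataset "data/backups".
DATASET="data/backups"

# Hypothetical helper: print the command by default, run it only when RUN=1.
run() {
    if [ "${RUN:-0}" = "1" ]; then
        "$@"
    else
        printf '%s\n' "$*"
    fi
}

# Turn on lz4 compression. It applies to data written from now on;
# blocks already on disk are not rewritten.
enable_compression() {
    run zfs set compression=lz4 "$DATASET"
}

# Report the current setting and the compression ratio achieved so far.
show_compression() {
    run zfs get compression,compressratio "$DATASET"
}

enable_compression
show_compression
```

Because existing blocks are left untouched, it is simplest to set the property right after creating the dataset, before the first backup runs.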

Conclusion

ZFS has many more advantages than those covered here. To learn more about ZFS and how to configure it for your Linux distribution, read the OpenZFS FAQ.

If you need help configuring your ZFS environment for rdiff-backup, Minarca or Rdiffweb, don't hesitate to contact us.

1. The Case For Using ZFS Compression 

2. ZFS quietly discards all-zero blocks
