Modern file compression
Unknown to most users, file compression silently works behind the scenes. Operating system updates, for example, are compressed. That happens automatically, and the user doesn't even need to know about it.
But sometimes we have a choice. In Arch Linux, for example, we can set the compression we'd like to use for packages created by makepkg (such as those installed from the AUR), but how do we choose between gz, bz2, xz, lrz, lzo, and z? And some backup software adds further options: Borg, for example, offers zlib, lzma, lz4, and zstd.
Most surprisingly, some of these algorithms have been developed only very recently: zstd comes from Facebook (2016), and there's brotli from Google (2015) and lzfse from Apple (2015). Why do these multi-billion-dollar companies develop compression algorithms? Because of the multi-billion dollars.
Instead of testing each of these algorithms yourself, you can use lzbench. It benchmarks open-source compressors of the LZ family, here on the de facto standard test set in the compression business, the Silesia corpus.
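For a rough feel of what such a benchmark measures, here is a minimal sketch that uses only Python's standard library. It is not lzbench: it covers only the zlib (gz), bz2, and lzma (xz) codecs, the function name `benchmark` and the sample data are illustrative assumptions, and speeds measured through Python will not match lzbench's figures.

```python
import bz2
import lzma
import time
import zlib

# Codecs shipped with the Python standard library, as (compress, decompress) pairs.
CODECS = {
    "zlib -6": (lambda d: zlib.compress(d, 6), zlib.decompress),
    "bz2 -9": (lambda d: bz2.compress(d, 9), bz2.decompress),
    "xz -6": (lambda d: lzma.compress(d, preset=6), lzma.decompress),
}


def benchmark(data):
    """Compress and decompress `data` once per codec; report size and speed."""
    results = []
    for name, (compress, decompress) in CODECS.items():
        t0 = time.perf_counter()
        packed = compress(data)
        t1 = time.perf_counter()
        unpacked = decompress(packed)
        t2 = time.perf_counter()
        if unpacked != data:
            raise ValueError(f"{name}: round trip failed")
        results.append({
            "name": name,
            "size": len(packed),
            # Compressed size in percent of the original (smaller is better).
            "ratio": 100.0 * len(packed) / len(data),
            # Speeds relative to the uncompressed size, assuming 1 MB = 10**6 bytes.
            "comp_mbs": len(data) / 1e6 / max(t1 - t0, 1e-9),
            "decomp_mbs": len(data) / 1e6 / max(t2 - t1, 1e-9),
        })
    return results


if __name__ == "__main__":
    # Highly repetitive sample text; real files compress far less well.
    sample = b"The quick brown fox jumps over the lazy dog. " * 20000
    for r in benchmark(sample):
        print(f"{r['name']:<8} {r['comp_mbs']:9.1f} MB/s {r['decomp_mbs']:9.1f} MB/s "
              f"{r['size']:>10} {r['ratio']:6.2f}")
```

A single timed run like this is noisy; repeating each measurement several times and keeping the fastest run gives steadier numbers.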
Here are three examples geared toward high compression ratio, high speed compression, and high speed decompression:
High compression ratio (<25%)
➜ lzbench -c -ebrotli,11/xz,6,9/zstd,22 silesia.tar
lzbench 1.7.3 (64-bit Linux)   Assembled by P.Skibinski
Compressor name         Compress. Decompress. Compr. size  Ratio
memcpy                  9814 MB/s   9852 MB/s   211947520 100.00
brotli 2017-12-12 -11   0.48 MB/s    385 MB/s    51136654  24.13
xz 5.2.3 -6             2.30 MB/s     74 MB/s    48745306  23.00
zstd 1.3.3 -22          2.30 MB/s    600 MB/s    52845025  24.93
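These are single-core values. xz compression (but not decompression) profits from multithreading. The brotli and zstd runs here do not, although zstd builds with multithreading support can also compress in parallel (option -T).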
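The Ratio column in the table above is the compressed size as a percentage of the original size, so smaller is better. The short sketch below recomputes it from the Compr. size column; the helper name `ratio_percent` is made up for this illustration.

```python
ORIGINAL_SIZE = 211947520  # bytes: the memcpy row, i.e. the uncompressed silesia.tar

# Compressed sizes in bytes, copied from the table above.
COMPRESSED_SIZES = {
    "brotli -11": 51136654,
    "xz -6": 48745306,
    "zstd -22": 52845025,
}


def ratio_percent(compressed, original=ORIGINAL_SIZE):
    """Compressed size in percent of the original size, rounded to two decimals."""
    return round(100.0 * compressed / original, 2)


for name, size in COMPRESSED_SIZES.items():
    print(f"{name:<12} {ratio_percent(size):6.2f}")
```

All three results fall below 25, which is what the "<25%" in the heading of this group refers to.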
High speed compression (for compression ratios <50%)
➜ lzbench -c -elz4/lzo1x silesia.tar
Compressor name         Compress. Decompress. Compr. size  Ratio
memcpy                  9861 MB/s   9768 MB/s   211947520 100.00
lz4 1.8.0                524 MB/s   2403 MB/s   100880800  47.60
lzo1x 2.09 -12           521 MB/s    738 MB/s   103238859  48.71
High speed decompression (> 2000 MB/s)
➜ lzbench -c -elz4/lizard,10/lzsse8,6 silesia.tar
Compressor name         Compress. Decompress. Compr. size  Ratio
memcpy                  9579 MB/s  10185 MB/s   211947520 100.00
lz4 1.8.0                525 MB/s   2421 MB/s   100880800  47.60
lizard 1.0 -10           421 MB/s   2115 MB/s   103402971  48.79
lzsse8 2016-05-14 -6    8.25 MB/s   3359 MB/s    75469717  35.61
What do we learn from these benchmarks?
If we want high compression reasonably fast, nothing beats xz. It's just perfect for what some (all?) Linux distributions actually use it for: distributing updates with acceptable computational resources over a channel with very limited bandwidth.
If the distributor commands virtually unlimited resources, so that compression speed is not an issue, brotli and zstd are clearly superior to all other choices: they come close to xz in size but decompress five to eight times faster. That's how we would like to have our updates: small and fast to decompress.
If size is not of primary importance, but compression speed is, lz4 and lzo are the champions.
If decompression speed is essential, lzsse8 wins. This is a lesser-known member of the LZ family and not widely available, in contrast to lz4, which thus scores again.
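The trade-off between size and decompression speed can be put into numbers with a simple model: the time until the receiver has the unpacked data is the download time plus the decompression time. The sketch below applies this to the figures measured above. The bandwidth values, the function names, and the assumption that 1 MB equals 10**6 bytes are illustrative choices, not part of the benchmark, and compression time on the sender's side is ignored entirely.

```python
ORIGINAL_SIZE = 211947520  # bytes, uncompressed silesia.tar

# (compressed size in bytes, decompression speed in MB/s), copied from the tables above.
# For lz4, the value from the high-speed-compression run is used.
MEASURED = {
    "brotli -11": (51136654, 385),
    "xz -6": (48745306, 74),
    "zstd -22": (52845025, 600),
    "lz4": (100880800, 2403),
    "lzsse8 -6": (75469717, 3359),
}


def delivery_seconds(compressed_size, decomp_mbs, bandwidth_mbs):
    """Download time plus decompression time, assuming 1 MB = 10**6 bytes."""
    download = compressed_size / 1e6 / bandwidth_mbs
    decompress = ORIGINAL_SIZE / 1e6 / decomp_mbs
    return download + decompress


def fastest(bandwidth_mbs):
    """Name of the compressor with the shortest delivery time at this bandwidth."""
    return min(MEASURED, key=lambda n: delivery_seconds(*MEASURED[n], bandwidth_mbs))


if __name__ == "__main__":
    for bandwidth in (1, 10, 1000):  # hypothetical channel speeds in MB/s
        print(f"{bandwidth:>5} MB/s: {fastest(bandwidth)}")
```

Under these assumptions, the smallest file wins on a slow channel, while on a very fast channel the fastest decompressor wins.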