Adam Leventhal's blog

a small ZFS hack

January 29, 2007

I’ve been dabbling a bit in ZFS recently, and what’s amazing is not just how well it solved the well-understood filesystem problem, but how its design opens the door to novel ways to manage data. Compression is a great example. An almost accidental by-product of the design is that your data can be stored compressed on disk. This is especially interesting in an era when we have CPU cycles to spare, far too few IOPS, and disk latencies that you can measure with a stopwatch (well, not really, but you get the idea). With ZFS you can trade some of those spare CPU cycles for IOPS by turning on compression, and the additional latency introduced by decompression is dwarfed by the time we spend twiddling our thumbs waiting for the platter to complete another revolution.
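
To make that trade a little more concrete, here is a back-of-envelope throughput model in Python. It is a sketch under simple assumptions, not a measurement of ZFS: the disk streams compressed bytes at a fixed rate, each physical byte expands to `ratio` logical bytes, and the CPU can produce at most `decompress_mbps` of output, so whichever stage is slower sets the pace. The function name and the 60 MB/s and 500 MB/s figures are hypothetical; the ratios 1.89 and 3.27 are the compressratio values reported further down. The model also ignores seek and rotational latency, which compression does not remove from each I/O.

```python
def effective_read_mbps(disk_mbps: float, ratio: float, decompress_mbps: float) -> float:
    """Return the logical (uncompressed) MB/s a sequential reader sees.

    disk_mbps       -- MB/s of compressed data the disk can stream
    ratio           -- compression ratio, logical bytes / physical bytes
    decompress_mbps -- MB/s of logical output the CPU can decompress
    """
    if ratio < 1.0:
        raise ValueError("a ratio below 1.0 would mean the data grew")
    # Each physical MB read expands to `ratio` logical MB, but the CPU
    # cannot produce output faster than decompress_mbps.
    return min(disk_mbps * ratio, decompress_mbps)


if __name__ == "__main__":
    # Hypothetical hardware figures; the ratios are the ones from the post.
    for ratio in (1.0, 1.89, 3.27):
        mbps = effective_read_mbps(disk_mbps=60.0, ratio=ratio, decompress_mbps=500.0)
        print(f"ratio {ratio:4.2f}x -> {mbps:6.1f} MB/s logical")
    # A slow decompressor caps the benefit no matter how good the ratio is.
    print(effective_read_mbps(disk_mbps=60.0, ratio=3.27, decompress_mbps=100.0))
```

While decompression keeps up, logical throughput scales with the compression ratio; once the CPU becomes the bottleneck, a better ratio buys nothing more.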

smaller and smaller

Turning on compression in ZFS (zfs set compression=on <dataset>) enables the so-called LZJB compression algorithm, a variation on Lempel-Ziv tagged by its humble author. LZJB is fast, reasonably effective, and quite simple (compress and decompress are implemented in about a hundred lines of code). But the ZFS architecture can support many compression algorithms. Just as users can choose from several different checksum algorithms (fletcher2, fletcher4, or sha256), ZFS lets you pick your compression routine; it’s just that there’s only the one so far.
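
The sketch below is not LZJB and does not reproduce its encoding. It is a generic, greedy LZ77 in Python that illustrates the Lempel-Ziv idea LZJB varies on: at each position, look back through a window for the longest earlier match and emit either a `(distance, length)` back-reference or a literal byte. The window size, match limits, and token representation are arbitrary choices made for readability, and the brute-force search is far slower than any real implementation.

```python
from typing import List, Tuple, Union

# A token is either a literal byte (int) or a (distance, length) back-reference.
Token = Union[int, Tuple[int, int]]

WINDOW = 1024   # how far back to search for a match (arbitrary)
MIN_MATCH = 3   # shorter matches are not worth a back-reference
MAX_MATCH = 66  # arbitrary cap on match length


def lz_compress(data: bytes) -> List[Token]:
    """Greedy LZ77: emit the longest earlier match at each position, else a literal."""
    tokens: List[Token] = []
    i = 0
    while i < len(data):
        best_len, best_dist = 0, 0
        for j in range(max(0, i - WINDOW), i):
            length = 0
            # A match may run past position i (overlap), e.g. a run of one byte.
            while (length < MAX_MATCH and i + length < len(data)
                   and data[j + length] == data[i + length]):
                length += 1
            if length > best_len:
                best_len, best_dist = length, i - j
        if best_len >= MIN_MATCH:
            tokens.append((best_dist, best_len))
            i += best_len
        else:
            tokens.append(data[i])
            i += 1
    return tokens


def lz_decompress(tokens: List[Token]) -> bytes:
    """Rebuild the original bytes from literals and back-references."""
    out = bytearray()
    for tok in tokens:
        if isinstance(tok, int):
            out.append(tok)
        else:
            dist, length = tok
            # Copy one byte at a time so overlapping references work.
            for _ in range(length):
                out.append(out[-dist])
    return bytes(out)


if __name__ == "__main__":
    sample = b"abcabcabcabc hello hello hello"
    tokens = lz_compress(sample)
    print(tokens)  # [97, 98, 99, (3, 9), 32, 104, 101, 108, 108, 111, (6, 12)]
    assert lz_decompress(tokens) == sample
```

Thirty input bytes become eleven tokens; a real format then packs those tokens into as few bits as possible, which is where LZ variants mostly differ.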

putting the z(lib) in ZFS

I thought it might be interesting to add a gzip compression algorithm based on zlib. I was able to hack this up pretty quickly because the Solaris kernel already contains a complete copy of zlib (albeit scattered around a little) for decompressing CTF data for DTrace, and apparently for some sort of compressed PPP streams module (or whatever… I don’t care). Here’s what the ZFS/zlib mash-up looks like (for the curious, this is with the default compression level: 6 on a scale from 1 to 9):
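
The pluggable-algorithm idea can be sketched as a lookup from the `compression` property value to a (compress, decompress) pair. This is a hypothetical Python model, not ZFS’s kernel table: `lookup_codec` and the `gzip-N` spelling are illustrative (the latter is the form one of the comments below suggests), and Python’s `zlib` module stands in for the kernel’s copy of zlib. Note that `zlib.compress` produces the same deflate algorithm gzip(1) uses, but wrapped in the zlib container rather than the gzip file format.

```python
import zlib
from typing import Callable, Tuple

# (compress, decompress) pair for one algorithm.
Codec = Tuple[Callable[[bytes], bytes], Callable[[bytes], bytes]]


def _gzip_codec(level: int) -> Codec:
    """Wrap zlib at a fixed level 1..9 as a codec."""
    if not 1 <= level <= 9:
        raise ValueError(f"gzip level must be 1..9, got {level}")
    return (lambda data: zlib.compress(data, level), zlib.decompress)


def lookup_codec(prop: str) -> Codec:
    """Map a compression property value to a codec (hypothetical names).

    'off'    -> store data unchanged
    'gzip'   -> zlib at the default level 6
    'gzip-N' -> zlib at level N, 1 <= N <= 9
    """
    if prop == "off":
        return (lambda data: data, lambda data: data)
    if prop == "gzip":
        return _gzip_codec(6)
    if prop.startswith("gzip-") and prop[5:].isdigit():
        return _gzip_codec(int(prop[5:]))
    raise ValueError(f"unknown compression property: {prop!r}")


if __name__ == "__main__":
    data = b"hello zfs " * 1000
    for prop in ("off", "gzip", "gzip-1", "gzip-9"):
        compress, decompress = lookup_codec(prop)
        packed = compress(data)
        assert decompress(packed) == data  # every codec must round-trip
        print(f"{prop:7s} {len(data)} -> {len(packed)} bytes")
```

Adding an algorithm then means adding one entry; callers never need to know which routine sits behind the property.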

# zfs create pool/gzip
# zfs set compression=gzip pool/gzip
# cp -r /pool/lzjb/* /pool/gzip
# zfs list
NAME        USED  AVAIL  REFER  MOUNTPOINT
pool/gzip  64.9M  33.2G  64.9M  /pool/gzip
pool/lzjb   128M  33.2G   128M  /pool/lzjb

That’s with a 1.2G crash dump (pretty much the most compressible file imaginable). Here are the compression ratios with a pile of ELF binaries (/usr/bin and /usr/lib):

# zfs get compressratio
NAME       PROPERTY       VALUE      SOURCE
pool/gzip  compressratio  3.27x      -
pool/lzjb  compressratio  1.89x      -
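
`compressratio` is logical (uncompressed) bytes divided by the bytes actually stored. The Python sketch below models that per record. It assumes the default 128K recordsize, and it assumes a record is kept compressed only if that saves at least one eighth (12.5%), otherwise it is written as-is. It ignores sector rounding and metadata, so it only approximates what `zfs get` reports. zlib at level 6 stands in for the gzip setting, and `stored_size` and `compressratio` are hypothetical helper names.

```python
import os
import zlib

RECORD_SIZE = 128 * 1024  # assumed recordsize (the ZFS default)


def stored_size(record: bytes, level: int = 6) -> int:
    """Bytes kept for one record under the assumed 12.5% savings rule."""
    compressed = zlib.compress(record, level)
    # Assumption: keep the compressed copy only if it saves at least 1/8.
    if len(compressed) <= len(record) - (len(record) >> 3):
        return len(compressed)
    return len(record)


def compressratio(data: bytes, level: int = 6) -> float:
    """Logical bytes / physical bytes, summed record by record."""
    logical = len(data)
    physical = sum(stored_size(data[off:off + RECORD_SIZE], level)
                   for off in range(0, logical, RECORD_SIZE))
    return logical / physical if physical else 1.0


if __name__ == "__main__":
    text = b"the quick brown fox jumps over the lazy dog\n" * 20000
    print(f"repetitive text: {compressratio(text):.2f}x")
    # Random bytes never save 1/8, so every record is stored raw.
    print(f"random bytes:    {compressratio(os.urandom(4 * RECORD_SIZE)):.2f}x")  # 1.00x
```

By the same definition, and treating the USED column as a rough stand-in for physical size, the 1.2G crash dump works out to about 1228.8M / 64.9M ≈ 18.9x with gzip versus 1228.8M / 128M = 9.6x with LZJB.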

Pretty cool. Actually compressing these files with gzip(1) yields a slightly smaller result, but it’s very close, and the convenience of getting the same compression transparently from the filesystem is awfully compelling. It’s just a prototype at the moment. I have no idea how well it will perform in terms of speed, but early testing suggests that it will be lousy compared to LZJB. I’d be very interested in any feedback: Would this be a useful feature? Is there an ideal trade-off between CPU time and compression ratio? I’d like to see if this is worth integrating into OpenSolaris.
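
One plausible contributor to gzip(1) coming out slightly smaller: gzip compresses a file as one continuous stream, whereas ZFS compresses each record independently, so every record starts with an empty history and carries its own header and checksum. The Python sketch below compares the two on the same buffer, assuming 128K records and zlib at level 6. The input is 16K of random bytes repeated 16 times, so all of its redundancy lies between copies. Allocation rounding to whole sectors, which the sketch ignores, would likely add a little more on the filesystem side.

```python
import os
import zlib

RECORD_SIZE = 128 * 1024  # assumed recordsize (the ZFS default)


def whole_stream_size(data: bytes, level: int = 6) -> int:
    """Compressed size when the buffer is one deflate stream, as with gzip(1)."""
    return len(zlib.compress(data, level))


def per_record_size(data: bytes, level: int = 6) -> int:
    """Compressed size when each record is compressed on its own.

    Every record starts with an empty history and pays its own zlib
    header and checksum, as independent filesystem records would.
    """
    return sum(len(zlib.compress(data[off:off + RECORD_SIZE], level))
               for off in range(0, len(data), RECORD_SIZE))


if __name__ == "__main__":
    # 16K of random bytes repeated 16 times (256K, two records): the
    # redundancy is entirely between copies, within deflate's 32K window.
    data = os.urandom(16 * 1024) * 16
    print("whole stream:", whole_stream_size(data))
    print("per record:  ", per_record_size(data))
```

Here the second record must re-encode the random 16K from scratch, so the per-record total is much larger. Because deflate only looks back 32K, the penalty is concentrated near record boundaries, and for ordinary files the gap is usually small.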

10 Responses

  1. Useful? Yes
    The trouble with any compression in the file system (and this one makes it even clearer) is that you would want to be able to get at both the compressed and the uncompressed data.
    Consider an FTP server: it would be good if it could offer compressed data without having the file system decompress it, only for the FTP server to compress it again.
    Then there is NFS…

  2. Useful? Yes
    It’s very interesting that a compression algorithm could be added to ZFS so easily. Is this hack available as source code somewhere? 🙂
    Thanks and best regards,
    Ivan

  3. Of course it would be useful and should be integrated! I was playing with the same idea some time ago, and adding another compression algorithm to ZFS is easy; the hard part is doing compression/decompression in the kernel. If you’ve got gzip working, it should be integrated ASAP. There’s one thing which can limit performance: there’s an open bug that causes all compression/decompression in ZFS to be run by a single thread, so only one CPU is utilized. Anyway, a lot of people, especially with SATA disks, are using ZFS for long-term storage and don’t necessarily need a lot of I/O. A simple way of specifying the compression level would also be useful, maybe in the form compression=gzip-N, where N is the compression level. Without -N (just compression=gzip), the default level would be used. Hope to see it integrated in hours… ok, in days :))) Great job!
    If you can provide your code changes privately right now, that would be great.

  4. One side effect of the compression feature is that it skews CPU utilization. I’ve been using compression on a 3TB filesystem with excellent results. The one issue I notice is that when there’s a decent amount of I/O against the filesystem, my CPU spends most of its time in ‘sys’; >40% is not abnormal on my E2900 (24 x 96GB).

  5. Hi,
    I was just wondering: if gzip compression has been enabled, will it cause problems when a ZFS volume is created on an x86 system and afterwards imported on a Sun SPARC?
    Best regards,
    Ivan

  6. Adam, this is positively and without a doubt some really great stuff! One could choose between lzjb for day-to-day use, or bzip2 for heavily compressed, “archival” file systems (as we all know, bzip2 beats the living daylights out of gzip in terms of compression about 95-98% of the time).

    Historical tidbit:

    ZFS finally implemented the per-filesystem (one could say “per-directory”) compression that AmigaOS had with the XFH: pseudo-drive implemented with xpkmaster.library (http://www.dstoecker.eu/xpkmaster.html).

  7. Are there any documents somewhere explaining the hooks of zfs and how to add features like this to zfs? Would be useful for developers who want to add features like filesystem-based encryption to it.
    Thanks for your great work!

  8. UX-admin, how can you claim that bzip2 “beats the living daylights out of gzip”? I haven’t seen it compress files significantly better than gzip, and it uses considerably more CPU time to do so.
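
The bzip2-versus-gzip question is easy to check empirically. This Python sketch runs zlib at levels 1, 6 and 9 (standing in for gzip) and bz2 at level 9 on the same buffer and reports compressed size and wall-clock time for each, which also speaks to the CPU-time versus ratio trade-off raised in the post. The synthetic sample is only a placeholder; results depend heavily on the data, so substitute real files (for example, binaries from /usr/bin) before drawing conclusions. The labels in `CODECS` are just display names.

```python
import bz2
import time
import zlib
from typing import Callable, Dict, Tuple

CODECS: Dict[str, Callable[[bytes], bytes]] = {
    "gzip-1": lambda d: zlib.compress(d, 1),
    "gzip-6": lambda d: zlib.compress(d, 6),
    "gzip-9": lambda d: zlib.compress(d, 9),
    "bzip2-9": lambda d: bz2.compress(d, 9),
}


def compare(data: bytes) -> Dict[str, Tuple[int, float]]:
    """Return {codec name: (compressed bytes, seconds)} for each codec."""
    results: Dict[str, Tuple[int, float]] = {}
    for name, compress in CODECS.items():
        start = time.perf_counter()
        size = len(compress(data))
        results[name] = (size, time.perf_counter() - start)
    return results


if __name__ == "__main__":
    # Placeholder input; replace with the contents of real files.
    data = b"".join(b"record %d: status=ok\n" % i for i in range(50000))
    for name, (size, secs) in compare(data).items():
        print(f"{name:8s} {size:9d} bytes  {len(data) / size:6.2f}x  {secs * 1000:8.1f} ms")
```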
