Adam Leventhal's blog


Month: January 2007

The other day I posted about a prototype I had created that adds a gzip compression algorithm to ZFS. ZFS already allows administrators to choose to compress filesystems using the LZJB compression algorithm. This prototype introduced a more effective — albeit more computationally expensive — alternative based on zlib.

As an arbitrary measure, I used tar(1) to create and expand archives of an ON (Solaris kernel) source tree on ZFS filesystems compressed with the lzjb and gzip algorithms, as well as on an uncompressed ZFS filesystem for reference.
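For the curious, here’s roughly how that comparison can be reproduced (the pool layout and the path to the source tarball are invented for illustration):

# zfs create pool/none && zfs set compression=off pool/none
# zfs create pool/lzjb && zfs set compression=lzjb pool/lzjb
# zfs create pool/gzip && zfs set compression=gzip pool/gzip
# for fs in none lzjb gzip
> do
>     ( cd /pool/$fs && ptime tar xf /var/tmp/on-src.tar )  # expand the archive (write/compress path)
>     ptime tar cf /dev/null /pool/$fs                      # create an archive (read/decompress path)
> done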

Thanks for the feedback. I was curious if people would find this interesting, and they do. As a result, I’ve decided to polish this wad up and integrate it into Solaris. I like Robert Milkowski’s recommendation of options for different gzip levels, so I’ll be implementing that. I’ll also upgrade the kernel’s version of zlib from 1.1.4 to 1.2.3 (the latest) for some compression performance improvements. I’ve decided (with some hand-wringing) to succumb to the requests to make these code modifications available. This is not production quality. If anything goes wrong, it’s completely your problem/fault — don’t make me regret this. Without further disclaimer: pdf patch
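For reference, the per-level options take the form of gzip-N property values; something like this (the dataset names here are hypothetical):

# zfs set compression=gzip pool/home       # default level (6)
# zfs set compression=gzip-1 pool/scratch  # fastest, lightest compression
# zfs set compression=gzip-9 pool/archive  # slowest, tightest compression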

In reply to some of the comments:

UX-admin One could choose between lzjb for day-to-day use, or bzip2 for heavily compressed, “archival” file systems (as we all know, bzip2 beats the living daylights out of gzip in terms of compression about 95-98% of the time).

It may be that bzip2 is a better algorithm, but we already have (and need) zlib in the kernel, and I’m loath to add another algorithm.

ivanvdb25 Hi, I was just wondering: if gzip compression has been enabled, does it give problems when a ZFS volume is created on an x86 system and afterwards imported on a Sun SPARC?

That isn’t a problem. Data can be moved from one architecture to another (and I’ll be verifying that before I putback).
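It’s the usual export/import sequence; ZFS writes blocks in the writer’s native byte order and byte-swaps on read when necessary, so endianness isn’t an issue. A sketch, with a hypothetical pool name:

# zpool export tank    # on the x86 system
# zpool import tank    # on the SPARC system, once the disks are attached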

dennis Are there any documents somewhere explaining the hooks of zfs and how to add features like this to zfs? Would be useful for developers who want to add features like filesystem-based encryption to it. Thanks for your great work!

There aren’t any documents exactly like that, but there’s plenty of documentation in the code itself — that’s how I figured it out, and it wasn’t too bad. The ZFS source tour will probably be helpful for figuring out the big picture.

Update 3/22/2007: This work was integrated into build 62 of onnv.



I’ve been dabbling a bit in ZFS recently, and what’s amazing is not just how well it solves the well-understood filesystem problems, but how its design opens the door to novel ways to manage data. Compression is a great example. An almost accidental by-product of the design is that your data can be stored compressed on disk. This is especially interesting in an era when we have CPU cycles to spare, far too few available IOPs, and disk latencies you can measure with a stopwatch (well, not really, but you get the idea). With ZFS you can trade some of those spare CPU cycles for IOPs by turning on compression, and the additional latency introduced by decompression is dwarfed by the time we spend twiddling our thumbs waiting for the platter to complete another revolution.

smaller and smaller

Turning on compression in ZFS (zfs set compression=on <dataset>) enables the so-called LZJB compression algorithm — a variation on Lempel-Ziv tagged with the initials of its humble author. LZJB is fast, reasonably effective, and quite simple (compress and decompress are implemented in about a hundred lines of code). But the ZFS architecture can support many compression algorithms. Just as users can choose from several different checksum algorithms (fletcher2, fletcher4, or sha256), ZFS lets you pick your compression routine — it’s just that there’s only the one so far.
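Both knobs are ordinary per-dataset properties; a quick sketch (the dataset name is invented):

# zfs set checksum=sha256 pool/src    # stronger checksum, more CPU per block
# zfs set compression=on pool/src     # "on" means lzjb today
# zfs get checksum,compression pool/src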

putting the z(lib) in ZFS

I thought it might be interesting to add a gzip compression algorithm based on zlib. I was able to hack this up pretty quickly because the Solaris kernel already contains a complete copy of zlib (albeit scattered around a little) for decompressing CTF data for DTrace, and apparently for some sort of compressed PPP streams module (or whatever… I don’t care). Here’s what the ZFS/zlib mash-up looks like (for the curious, this is with the default compression level — 6 on a scale from 1 to 9):

# zfs create pool/gzip
# zfs set compression=gzip pool/gzip
# cp -r /pool/lzjb/* /pool/gzip
# zfs list
NAME        USED  AVAIL  REFER  MOUNTPOINT
pool/gzip  64.9M  33.2G  64.9M  /pool/gzip
pool/lzjb   128M  33.2G   128M  /pool/lzjb

That’s with a 1.2G crash dump (pretty much the most compressible file imaginable). Here are the compression ratios with a pile of ELF binaries (/usr/bin and /usr/lib):

# zfs get compressratio
NAME       PROPERTY       VALUE      SOURCE
pool/gzip  compressratio  3.27x      -
pool/lzjb  compressratio  1.89x      -

Pretty cool. Actually compressing these files with gzip(1) yields a slightly smaller result, but it’s very close, and the convenience of getting the same compression transparently from the filesystem is awfully compelling. It’s just a prototype at the moment. I have no idea how well it will perform in terms of speed, but early testing suggests that it will be lousy compared to LZJB. I’d be very interested in any feedback: Would this be a useful feature? Is there an ideal trade-off between CPU time and compression ratio? I’d like to see if this is worth integrating into OpenSolaris.
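For a rough userland feel for that CPU-versus-ratio trade-off, you can compare gzip(1) at its two extremes on the same data (the file path is just an example):

# ptime gzip -1 -c /var/tmp/vmcore.0 | wc -c    # fast, larger output
# ptime gzip -9 -c /var/tmp/vmcore.0 | wc -c    # slow, smaller output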


