I gave a presentation today on the methods and reasons of blogging for Delphix Engineering.
One of my points was that presentations make for simple blog posts–practice what you preach!
I just discovered your blog today. I’m in the process of upgrading my home network ZFS box. It’s a little overdue, presently with 680 days of unbroken uptime (who’s complaining?).
I found your past articles on write throttling extremely illuminating.
This is a bit off topic for your most recent blog post, but I'd like to bring up an older one I read today, which is no longer accepting comments.
In a post from 2009 concerning metaslabs, you imply that metaslab size is fixed at the time of pool creation, and that forever afterwards, upgrading to larger devices just keeps stamping out more metaslabs of the original size. To my way of thinking, this potentially leads to a pool with a kind of growth-ring artifact that can violate the principle of least surprise, since it behaves quite differently than a virgin pool built from exactly the same devices.
What I don't understand is why a different common sense did not prevail: pretend that the larger-capacity device is the virgin device, allocate metaslabs based on this fiction, then mask the computed metaslabs so as not to stomp over data already laid out on disk.
Let's say you start with a 1 TB device. You get 200 metaslabs, per the arbitrary source-code constant you documented in your 2009 post. Then you replace it with a 2 TB device. In your depiction, you get another 200 metaslabs of the original size. In my version, it assigns 200 metaslabs to the entire 2 TB device, then masks out the 100 metaslabs that would clobber the pre-existing 1 TB, giving you a net increase of 100 metaslabs.
If you then repeat this by upgrading the device to 4 TB, you get another 100 metaslabs occupying half your disk. There would still be the 100 historical metaslabs occupying a quarter of the disk, and the original 200 metaslabs from pool creation, long forgotten, occupying the remaining quarter.
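To make the comparison concrete, here's a toy model of both heuristics in Python. The names are my own, and the flat size-divided-by-200 metaslab size is a simplification (real ZFS rounds the metaslab size, so the actual numbers would differ slightly); device sizes are in abstract units where 1 TB equals 200 units.

```python
TARGET = 200  # the "arbitrary constant": metaslabs per virgin device

def current_heuristic(sizes):
    """Current behavior: metaslab size is fixed at pool creation;
    each expansion just stamps out more metaslabs of that size."""
    ms_size = sizes[0] // TARGET
    steps, prev_total = [], 0
    for size in sizes:
        total = size // ms_size
        steps.append((ms_size, total - prev_total))  # (slab size, slabs added)
        prev_total = total
    return steps

def proposed_heuristic(sizes):
    """Proposed: pretend each new capacity is a virgin device, then
    mask out the metaslabs overlapping space already laid out."""
    steps, covered = [], 0
    for size in sizes:
        ms_size = size // TARGET
        masked = -(-covered // ms_size)  # ceil: slabs overlapping old space
        steps.append((ms_size, size // ms_size - masked))
        covered = size
    return steps

# 1 TB -> 2 TB -> 4 TB, in units where 1 TB == 200
print(current_heuristic([200, 400, 800]))   # [(1, 200), (1, 200), (1, 400)]
print(proposed_heuristic([200, 400, 800]))  # [(1, 200), (2, 100), (4, 100)]
```

The outputs mirror the walkthrough above: the current scheme keeps adding same-size metaslabs (400 more at the 4 TB step), while the masking scheme adds 100 progressively larger ones at each step, exactly the growth rings described.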
Surely this heuristic is more "obvious" per your own logic as you applied it in the write-throttling analysis, if you aren't overly concerned about the metaslabs all being the same size.
It's my perspective that if you're trying to design a system to minimize path dependence (i.e., dependence on the history of pool management), the heuristic should begin at each step by asking "what would I do if there were _no_ history?" and then modify that decision as little as possible to respect constraints already on the ground.
I'm not arguing that the current metaslab design has actually led to any real-world problems. I'm raising a philosophical point about engineering smoothness, where it struck me that two of your previous posts seemed to land on opposite sides of the fence.