Adam Leventhal's blog

A bit over two years ago, I started work on typify, a library to generate Rust types from JSON Schema. It took me a while to figure out it was a compiler, but I’ll call it that now: it’s a compiler! It started life as a necessary component of an OpenAPI SDK generator—a pretty important building block for the control plane services at Oxide. Evolving the compiler has become somewhere between a hobby and an obsession, trying to generate increasingly idiomatic Rust from increasingly esoteric schemas.

Get in, loser; we’re writing a compiler

Why did I start building this? I came to Oxide with a certain amount of OpenAPI optimism (from my previous company), optimism that was in some cases well-founded (it has earned its place as the de facto standard for describing HTTP-based APIs) and in other cases profoundly misplaced (the ecosystem was less mature than expected). On the back of that optimism, Dave and I (but mostly Dave) built a server framework, dropshot, that emits OpenAPI from the code. We gave a pretty good talk in 2020 about using the code as the source of truth for interface specification.

As we built out the services of the control plane, we wanted service-specific clients. Ideally these would be derived from the OpenAPI documents emitted by dropshot. We couldn't find what we wanted in the ecosystem (read: we tried them and they didn't work) so we built our own. Before we could invoke APIs and understand their responses, we needed to generate types. Since OpenAPI uses JSON Schema to define types**, I started there.

**: sort of; and it’s actually quite annoying but I’ll save my grousing for later.

Sum types

Pretty uncontroversial take for Rust programmers: sum types are great. We use enums a bunch in the API types because they let us express precise constraints. (They do make for tricky SDK generation in languages that don't support sum types, but that's not important here.) How are enums represented in JSON serialization or JSON Schema? The answer, with some irony, is "variously". Serde, the ubiquitous Rust de/serialization framework, gives four different choices (that I'll show below).
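For reference, here's what those four choices look like on the Rust side; the schema shapes they produce appear below. (The enum, variant, and tag names here are illustrative.)

use serde::{Deserialize, Serialize};

// Externally tagged (serde's default): {"Variant": { ... }}
#[derive(Serialize, Deserialize)]
enum External {
    Variant { value: u32 },
}

// Internally tagged: {"type": "Variant", ... }
#[derive(Serialize, Deserialize)]
#[serde(tag = "type")]
enum Internal {
    Variant { value: u32 },
}

// Adjacently tagged: {"type": "Variant", "content": { ... }}
#[derive(Serialize, Deserialize)]
#[serde(tag = "type", content = "content")]
enum Adjacent {
    Variant { value: u32 },
}

// Untagged: just the variant's data, with no indication of the variant name
#[derive(Serialize, Deserialize)]
#[serde(untagged)]
enum Untagged {
    Variant { value: u32 },
}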

My woodworking mentor as a kid observed that I start projects in the middle. That’s exactly what happened here. Reconstructing enums from their generated schemas seemed tricky and interesting, so that’s where I started. Generally, an enum turns into a oneOf construction (“data must conform to exactly one of the given subschemas”). I try to apply heuristics that correspond to each of the serde enum formats:

let ty = self
    .maybe_option(type_name.clone(), metadata, subschemas)
    .or_else(|| self.maybe_externally_tagged_enum(type_name.clone(), metadata, subschemas))
    .or_else(|| self.maybe_adjacently_tagged_enum(type_name.clone(), metadata, subschemas))
    .or_else(|| self.maybe_internally_tagged_enum(type_name.clone(), metadata, subschemas))
    .or_else(|| self.maybe_singleton_subschema(type_name.clone(), subschemas))
    .map_or_else(|| self.untagged_enum(type_name, metadata, subschemas), Ok)?;

Externally tagged enums have this basic shape:

{
  "<variant-name>": { .. }
}

Internally tagged enums look like this:

{
  "<tag-name>": { "const": ["<variant-name>"] },
  … other properties …
}

Adjacently tagged enums look like this:

{
  "<tag-name>": { "const": ["<variant-name>"] },
  "<content-name>": { .. }
}

Unlike the other formats, the final format, "untagged", doesn't include any indication of the variant name—it just dumps the raw type data (so one needs to be careful that the subschemas are mutually exclusive).

Seeing enums traverse JSON Schema and turn back into the same Rust code was very satisfying. While I basically got enum generation right, there are a couple of JSON Schema constructs that I really screwed up.

allOf

In JSON Schema an “allOf” indicates that a data value needs to conform to all subschemas… to no one’s surprise. So you see things like this:

{
  "title": "Doodad",
  "allOf": [
    { "$ref": "#/$defs/Thingamajig" },
    { "$ref": "#/$defs/Whosiewhatsit" }
  ]
}

Serde has a #[serde(flatten)] annotation that takes the contents of a struct and, effectively, dumps it into the container struct. This seemed to match the allOf construct perfectly; the above schema would become:

// ⬇️ This is wrong; don’t do this ⬇️
struct Doodad {
    #[serde(flatten)]
    thingamajig: Thingamajig,
    #[serde(flatten)]
    whosiewhatsit: Whosiewhatsit,
}

This is wrong! Very very wrong. So wrong that it often results in structs for which no data results in valid deserialization, or serializations that don't match the given schema. In particular, imagine if both Thingamajig and Whosiewhatsit have fields of the same name with incompatible types.

Perhaps more precisely, the code above is only right under the narrow condition that the subschemas are fully orthogonal. In the wild (as we JSON Schema wranglers refer to its practical application), allOf is most commonly used to apply constraints to existing types.

Here’s an example from a github-related schema I found:

"allOf": [
  { "$ref": "#/definitions/issue" },
  {
    "type": "object",
    "required": ["state", "closed_at"],
    "properties": {
      "state": { "type": "string", "enum": ["closed"] },
      "closed_at": { "type": "string" }
    }
  }
]

The "issue" type is an object with properties like:

{
  "state": {
    "type": "string",
    "enum": ["open", "closed"],
    "description": "State of the issue; either 'open' or 'closed'"
  },
  "closed_at": { "type": ["string", "null"], "format": "date-time" },
}

The result of this allOf is a type where state is required and must have the value “closed” and “closed_at” must be a date-time string (and not null). (closed_at was already required by the base type, so I’m not sure why the allOf felt the need to reassert that constraint.)

This is very very different from what #[serde(flatten)] gives us. Originally I was generating a broken type like this:

struct ClosedIssue {
    #[serde(flatten)]
    type_1: Issue,
    #[serde(flatten)]
    type_2: ClosedIssueType2,
}

struct ClosedIssueType2 {
    state: ClosedIssueType2State, // enum { Closed }
    closed_at: String,
}

Wrong and not actually useful. More recently I've applied merging logic to these kinds of constructions, but it's tricky and opens the door to infinite recursion (one of the many sins the JSON Schema spec condemns, albeit with merely its second-sternest form of rebuke).
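Done right, merging collapses the constraints into the base type rather than gluing two structs together. Here's a sketch of the kind of type that merging should produce for this example (remaining fields elided; the names are illustrative):

struct ClosedIssue {
    // "state" may only take the single value "closed".
    state: ClosedIssueState, // enum ClosedIssueState { Closed }
    // "closed_at" is narrowed from a nullable string to a required string.
    closed_at: String,
    // ... the rest of the issue properties, unchanged ...
}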

anyOf

I got allOf wrong. I got anyOf much wronger. AnyOf says that a valid value must conform to at least one of the given subschemas. So if an allOf is just a struct with a bunch of flattened members, then it would make sense that an anyOf is a struct with a bunch of optional, flattened members. It makes sense, especially if you don't think about it.

// ⬇️ This is wrong; don’t do this ⬇️
struct Doodad {
    #[serde(flatten)]
    thingamajig: Option<Thingamajig>,
    #[serde(flatten)]
    whosiewhatsit: Option<Whosiewhatsit>,
}

But if you do think about it even briefly, you realize that a type like this carries only the most superficial relationship to the JSON Schema. For example, at least one of the subschemas needs to be valid, yet this type would be fine with an empty object ({}) turning into a bunch of Nones.

So what's a valid representation of anyOf as a Rust type? In a way I'm glad I went with this quick, clever, and very very wrong approach, because a robust approach is a huge pain in the neck! Consider an anyOf like this:

{
  "title": "Something",
  "anyOf": [
    { "$ref": "#/$defs/A" },
    { "$ref": "#/$defs/B" },
    { "$ref": "#/$defs/C" }
  ]
}

Bear in mind, my goal is to allow only valid states to be represented by the generated types. That is, I want type-checking at build time rather than, say, validation by a runtime builder. Consequently, I think we need a Rust type that's effectively:

enum Something {
    A,
    B,
    C,
    A ∪ B,
    A ∪ C,
    B ∪ C,
    A ∪ B ∪ C,
}

You need the power set of all subtypes. Sort of. Some of those combinations are going to be unsatisfiable (e.g. if the types are orthogonal); we'd ideally exclude those. And we need to come up with reasonable names for the enum variants (AAndBAndC?). Ugh. It's awful. While I've cleaned up allOf, typify's anyOf implementation is still based on that original, wrong insight.
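Spelled out as (hypothetical) Rust, with each combined variant holding a struct that merges its constituents' properties, it might look like this:

enum Something {
    A(A),
    B(B),
    C(C),
    AAndB(AAndB), // a struct merging A's and B's properties
    AAndC(AAndC),
    BAndC(BAndC),
    AAndBAndC(AAndBAndC),
}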

JSON kvetching

I used to abstractly dislike JSON Schema. My dislike has become much more concrete. With a big caveat: I’m considering only the use cases I care about, which assuredly bear little-to-no resemblance to the use cases envisioned by the good folks who designed and evolve the standard. By way of a terrible analogy here’s the crux of the issue: I think about product concept documents (PCDs) and product requirement documents (PRDs) which are vaguely common product management terms (that I’ll now interpret for my convenience). A PCD tells you about the thing. What is it? How’s it work? How might you build it? A PRD provides criteria for completion. Can it do this? Can it do that? JSON Schema is much better at telling you if the thing you’ve built is valid than it is at telling you how to build the intended values.

What I want is a schema definition for affirmative construction, describing the shape of types. What are the properties? What are the variants? What are the constraints? JSON Schema seems to have a greater emphasis on validation: does this value conform?

As an example of this, consider JSON Schema’s if/then/else construction.

{
  "if": { "$ref": "#/$defs/A" },
  "then": { "$ref": "#/$defs/B" },
  "else": { "$ref": "#/$defs/C" }
}

If the value conforms to a schema, then it must conform to another schema… otherwise it must conform to a third schema. Why does JSON Schema even support this? I think (but am deeply unsure) that this is equivalent to:

{
  "oneOf": [
    {
      "allOf": [
        { "$ref": "#/$defs/A" },
        { "$ref": "#/$defs/B" }
      ]
    },
    {
      "allOf": [
        { "not": { "$ref": "#/$defs/A" } },
        { "$ref": "#/$defs/C" }
      ]
    }
  ]
}

In other words, { A ∪ B, ¬A ∪ C }. Perhaps it’s a purely academic concern: I haven’t encountered if/then/else in an actual schema.

More generally: there are often many ways to express equivalent constructions. This is, again, likely a case of my wanting JSON Schema to be something it isn’t. There’s an emphasis on simplicity for human, hand-crafted authorship (e.g. if/then/else) whereas I might prefer a format authored and consumed by machines. The consequence is a spec that’s broad, easy to misimplement or misinterpret, and prone to subtle incompatibilities from version to version.

Typify to the future

As much as it's been a pain in the neck, this JSON Schema compiler has also been a captivating puzzle, reminiscent of the annual untangling of Christmas tree lights (weirdly enjoyable… just me?): how to translate these complex, intersecting, (at times) convoluted schemas into neat, precise, idiomatic Rust types. I feel like I kick over some new part of the spec every time I stare at it (dependentRequired? Who knew!). There are plenty of puzzles left: schemas with no simple Rust representation, unanticipated constructions, weirdo anchor and reference syntax, and—to support OpenAPI 3.1—a new (subtly incompatible) JSON Schema revision to untangle.

In 2021 Twitter rolled out Spaces. Listen to people you follow! Talk to followers! It sounded awful–the neologism "webinar" felt uncomfortably close. So when Bryan asked if I'd hang out with him to give it a shot, I set aside my skepticism–at the very least it would break up the mid-pandemic monotony!

And it was great! The next week illumos dropped support for SPARC, so we had Tom Lyon join us (now a frequent OxF contributor) and jury-rigged a recording system for a great discussion (… and eulogy: SPARC had been pretty important to us, particularly early in our careers). We wanted to share the recording in the lowest-effort (cheapest) way possible, so we made a YouTube channel.

The (mostly) weekly show became a thing we do. We were learning and having fun… and so were listeners. Social audio turned out to be a new (low-effort) way to share technical experience, wisdom, and stories. Bryan gave a great talk about it last year:

We only sporadically introduce guests and topics. We've evolved running jokes ("for the light-cone!", "millennial podcaster audio quality") with pretty narrow appeal. The Simpsons is referenced like Bible verses. Arguably, we laugh at our own jokes too much? But bear with us, because the technical conversations are always pretty interesting.

Oxide and Friends is now a podcast–we just posted our 100th episode (Predictions 2024!). I imagine we’ll keep going as long as we’re having fun.

DTrace’s User-land Statically-Defined Tracing (USDT) was… kind of an accident. Bryan has (kindly) retconned the genesis of USDT as a way to understand dynamic languages[citation needed]. Indeed, it’s been essential for that, but its origins were much less ambitious or prescient.

Way back in the 1990s(!), Jeff Bonwick created a program called lockstat(1M) for live instrumentation of kernel locking primitives ("live" as distinct from "dynamic" in that while the instrumentation could be turned on and off, the data payloads were pretty much static). This was incredibly useful! What locks were hot? Where were they contended? New observability led to new performance fixes. When we built DTrace, we incorporated its instrumentation as the lockstat provider. After building user-land tracing with the pid provider, it seemed like an obvious step to build a plockstat(1M) command to understand user-land locking primitives. So I built it. And, my goodness, was the first iteration of that a disaster. A total mess. Special cases on special cases with unholy knowledge sprinkled everywhere. We yanked that out of the first integration of DTrace and went back to the drawing board. What we came up with was USDT, the first provider of which was plockstat, whose probes are consumed by the eponymous command.

Bryan and I touched on this somewhere in the 2+ hours of DTrace history we recorded back in September:

Years passed as they do, and USDT turned out to be very very very useful. At Oxide, we wanted that usefulness in the Rust code we're writing, so my colleague Ben Naecker and I built a usdt crate. While we have probes in lots of places, knowledge of them had been passed along more or less by word of mouth. Shocking! To remedy this, at the last Oxide all-company event, Ben and I put together a little slide deck on inserting USDT probes in Rust code, using those probes (spoiler: here's a new cause for frustration with async Rust), and an exercise to try it out. Enjoy!

(tl;dr add USDT probes where you have log statements and you’ll probably thank yourself later.)
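To give a flavor of what that looks like, here's a minimal sketch in the style of the usdt crate's documented usage (the provider and probe names are made up):

use usdt::register_probes;

// A provider is declared as a module of probe signatures; the macro
// generates a `probe_name!` macro for each probe.
#[usdt::provider]
mod my_provider {
    fn request_start(id: u64, path: &str) {}
    fn request_done(id: u64) {}
}

fn main() {
    // Probes must be registered with the kernel before they can fire.
    register_probes().unwrap();

    // Arguments are wrapped in a closure so they're only evaluated
    // when the probe is actually enabled.
    my_provider::request_start!(|| (1_u64, "/example"));
    my_provider::request_done!(|| 1_u64);
}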

When Apple announced their new file system, APFS, in June, I hustled to be in the front row of the WWDC presentation, the questions with the presenters, and then the open Q&A session. I took a week to write up my notes, which turned into a 12-page behemoth of a blog post — longer than my college thesis. Despite reassurances from the tweeps, I was sure that the blog post was an order of magnitude longer than the modern attention span. I was wrong; so wrong that Ars Technica wanted to republish the blog post. Never underestimate the interest in all things Apple.

In that piece I left one big thread dangling. Apple shipped APFS as a technology preview, but they left out access to one of the biggest new features: snapshots. Digging around I noticed that there was a curiously named new system call, “fs_snapshot”, but explicitly didn’t investigate: the post was already too long (I thought), I had spent enough time on it, and someone else (surely! surely?) would want to pull on that thread.

Slow News Day

Every so often I’d poke around for APFS news, but there was very little new. Last month folks discovered that APFS was coming to iOS sooner rather than later. But there wasn’t anything new to play with or any revelations on how APFS would work.

I would search for “APFS snapshots”, “fs_snapshot”, anything I could think of to see if anyone had figured out how to make snapshots work on APFS. Nothing.

A few weekends ago, I decided to yank on that thread myself.

Prometheus

I started from the system call, wandered through Apple’s open source kernel, leaned heavily on DTrace, and eventually figured it out. Apple had shipped snapshots in APFS, they just hadn’t made it easy to get there. The folks at Ars were excited for a follow-up, and my investigation turned into this: “Testing out snapshots in Apple’s next-generation APFS file system.”

Snapshots were there; the APIs were laid bare; I was going to bring fire to all the Mac fans; John Siracusa and Andy Ihnatko would carry me on their shoulders down the streets of the Internet.

Sisyphus

On the eve that this new piece was about to run, I was nervously scrolling through Twitter as I took the bus home from work. Now that I had invested the time to research and write up APFS snapshots, I didn't want someone else beating me to the punch.

Then I found this and my heart dropped:

Skim past the craziness of MFS and the hairball of HFS, and start digging through the APFS section. Slide 49, "APFS Snapshots", and there it is: "apfs_snapshot" — not a tool that anyone laboriously reverse engineered, deciphering system calls and semi-published APIs — a tool shipped from Apple and included in macOS by default. F — .

Apple had secreted this utility away (along with some others) in /System/Library/Filesystems/apfs.fs/Contents/Resources/.

What To Do?

The article that was initially about a glorious act of discovery had become an article about the reinvention of the wheel. Then again, vanishingly few people would recognize it as rediscovery, since the apfs_snapshot tool was so obscure (9 hits on Google!).

We toned down the already modest chest-thumping and published the article this morning to a pretty nice response so far. I might have been happier as a FAKE NEWS Prometheus, blissfully unaware of the pre-existence of fire, but I would have been mortified when the inevitable commenter, one of the few who had used apfs_snapshot, crushed me with my own boulder.

I had been procrastinating making the family holiday card. It was a combination of having a lot on my plate and dreading the formulation of our annual note recapping the year; there were some great moments, but I’m glad I don’t have to do 2016 again. It was just before midnight and either I’d make the card that night or leave an empty space on our friends’ refrigerators. Adobe Illustrator had other ideas:

Unable to set maximum number of files to be opened.

I'm not the first person to hit this. The problem seems to have existed since CS6 was released back in 2012. None of the solutions was working for me, and — inspired by Sara Mauskopf's excellent post — I was rapidly running out of time for the project. Enough; I'd just DTrace it.

A colleague scoffed the other day, "I mean, how often do you actually use DTrace?" In his mind DTrace was for big systems, critical systems, when dollars and lives were at stake. My reply: I use DTrace every day. I can't imagine developing software without it, and I use it when my laptop (not infrequently) does something inexplicable (I'm forever grateful to the Apple team that ported it to Mac OS X).

First I wanted to make sure I had the name of the Illustrator process right:

$ sudo dtrace -n 'syscall:::entry{ @[execname] = count(); }'
dtrace: description 'syscall:::entry' matched 500 probes
^C
pboard 1
watchdogd 2
awdd 3
...
com.apple.WebKit 7065
Google Chrome He 7128
Google Chrome 8099
Adobe Illustrato 36674

Glad I checked: "Adobe Illustrato" (the process name gets truncated in the kernel). Now we can be pretty sure that Illustrator is failing on setrlimit(2) (the system call that sets resource limits, including the maximum number of open files) and blowing up as a result. Let's confirm that it is in fact returning -1:

$ sudo dtrace -n 'syscall::setrlimit:return/execname == "Adobe Illustrato"/{ printf("%d %d", arg1, errno); }'
dtrace: description 'syscall::setrlimit:return' matched 1 probe
CPU     ID                    FUNCTION:NAME
  0    532                 setrlimit:return -1 1

There it is. And setrlimit(2) is failing with errno 1, which is EPERM (the value was too high for a non-root user). I had already tuned up the files limit pretty high. Let's confirm that it is in fact setting the files limit, and check the value to which it's being set. To write this script I looked at the documentation for setrlimit(2) (hooray for man pages!) to determine the position of the resource parameter (arg0) and the type of the value parameter (struct rlimit). I needed the DTrace copyin() subroutine to grab the structure from the process's address space:

$ sudo dtrace -n 'syscall::setrlimit:entry/execname == "Adobe Illustrato"/{ this->r = *(struct rlimit *)copyin(arg1, sizeof (struct rlimit)); printf("%x %x %x", arg0, this->r.rlim_cur, this->r.rlim_max);  }'
dtrace: description 'syscall::setrlimit:entry' matched 1 probe
CPU     ID                    FUNCTION:NAME
  0    531                 setrlimit:entry 1008 2800 7fffffffffffffff

Looking through /usr/include/sys/resource.h we can see that 1008 corresponds to the number of files (RLIMIT_NOFILE | _RLIMIT_POSIX_FLAG). Illustrator is trying to set that value to 0x7fffffffffffffff or 2⁶³-1. Apparently too big; I filed any latent curiosity for another day.

The quickest solution was to use DTrace again to whack a smaller number into that struct rlimit. Easy:

$ sudo dtrace -w -n 'syscall::setrlimit:entry/execname == "Adobe Illustrato"/{ this->i = (rlim_t *)alloca(sizeof (rlim_t)); *this->i = 10000; copyout(this->i, arg1 + sizeof (rlim_t), sizeof (rlim_t)); }'
dtrace: description 'syscall::setrlimit:entry' matched 1 probe
dtrace: could not enable tracing: Permission denied

Oh right. Thank you, SIP. This is a new laptop (at least a new motherboard due to some bizarre issue), which probably contributed to Illustrator not working when once it did. Because it's new, I haven't yet disabled the part of SIP that prevents you from using DTrace on the kernel or in destructive mode (e.g. copyout()). It's easy enough to disable, but I'm reboot-phobic — I hate having to restart my terminals — so I went to plan B: lldb.

First I used DTrace to find the code that was calling setrlimit(2), using some knowledge of the x86 ISA/ABI:

$ sudo dtrace -n 'syscall::setrlimit:return/execname == "Adobe Illustrato" && arg1 == -1/{ printf("%x", *(uintptr_t *)copyin(uregs[R_RSP], sizeof (uintptr_t)) - 5); }'
dtrace: description 'syscall::setrlimit:return' matched 1 probe
CPU     ID                    FUNCTION:NAME
  0    532                 setrlimit:return 1006e5b72
  0    532                 setrlimit:return 1006e5b72

I ran it a few times to confirm the address of the call instruction and to make sure the location wasn't being randomized. If I hadn't been in a rush I might have patched the binary, but Apple's Mach-O object format always confuses me. Instead I used lldb to replace the call with a store of 0 to %eax (to evince a successful return value) and some nops as padding (hex values I remember due to personal deficiencies):

(lldb) break set -n _init
Breakpoint 1: 47 locations.
(lldb) run
...
(lldb) di -s 0x1006e5b72 -c 1
0x1006e5b72: callq  0x1011628e0     ; symbol stub for: setrlimit
(lldb) memory write 0x1006e5b72 0x31 0xc0 0x90 0x90 0x90
(lldb) di -s 0x1006e5b72 -c 4
0x1006e5b72: xorl   %eax, %eax
0x1006e5b74: nop
0x1006e5b75: nop
0x1006e5b76: nop

Then I just ran process detach and got on with making that holiday card…

DTrace Every Day

DTrace was designed for solving hard problems on critical systems, but the need to understand how systems behave exists in development and on consumer systems. Just because you didn’t write a program doesn’t mean you can’t fix it.

Since Noms dropped last week, the dev community has seemed into it. "Git for data" — it simultaneously evokes something very familiar and yet unconstrained. Something that hasn't been well-noted is how much care the team has taken to make Noms fun to build with, and it is.

Noms is a content-addressable, decentralized, append-only database. It borrows concepts from a variety of interesting data systems. Obviously databases are represented: Noms is a persistent, transactional data repository. You can also see the fundamentals of git and other decentralized source code management tools. Noms builds up a chain of commits; those chains can be extended, forked, and shared, while historical data are preserved. Noms shares much in common with modern filesystems such as ZFS, btrfs, or Apple's forthcoming APFS. Like those filesystems, Noms uses copy-on-write, never modifying data in situ; it forms a self-validating hierarchy of data; and it (intrinsically) supports snapshots and efficient movement of data between snapshots.

After learning a bit about Noms I thought it would be interesting to use it as the foundation for a filesystem. I wanted to learn about Noms, and contribute a different sort of example that might push the project in new and valuable ways. The Noms founders, Aaron and Raf, were fired up so I dove in…

What’s Modern?

When people talk about a filesystem being “modern” there’s a list of features that they often have in mind. Let’s look at how the Noms database stacks up:

Snapshots

A filesystem snapshot preserves the state of the filesystem for some future use — typically data recovery or fast cloning. Since Noms is append-only, every version is preserved. Snapshots are, therefore, a natural side effect. You can make a Noms “snapshot” — any commit in a dataset’s history — writeable by syncing it to a new dataset. Easy.

Dedup

The essence of dedup is that unique data should be stored exactly once. If you duplicate a file, a folder, or an entire filesystem, the additional storage consumption should be close to zero. Noms is content-addressable, so unique data is only ever stored once. Every Noms dataset intrinsically removes duplicated data.

Consistency

A feature of a filesystem — arguably the feature of a filesystem — is that it shouldn't ever lose or corrupt your data. One common technique to ensure consistency is to write new data to a new location rather than overwriting old data — so-called copy-on-write (COW). Noms is append-only: it doesn't throw out (or overwrite) old data, and copies of modified data are explicit and required. Noms also recursively checksums all data — a feature of ZFS and btrfs, notably absent from APFS.

Backup

The ability to back up your data from a filesystem is almost as important as keeping it intact in the first place. ZFS, for example, lets you efficiently serialize and send the latest changes between systems. When pulling or pushing changes, git also efficiently serializes just the changed data. Noms does something similar with its structured data. Data differences are efficiently computed to optimize for minimal data transfer.

Noms has all the core components of a modern filesystem. My goal was to write the translation layer to expose filesystem semantics on top of the Noms interfaces.

Designing a Schema

Initially, Badly

It's in the name: Noms eats all the data. Feed it whatever data you like, and let Noms infer a schema as you go. For a filesystem, though, I wanted to define a fixed structure. I started with a schema modeled on a simplified ZFS. Filesystems keep track of files and directories with a structure called an "inode", each of which has a unique integer identifier, the "inode number". ZFS keeps track of files and directories with DMU objects named by their integer ID. The schema would use a Map<Number, Inode> to serve the same function (spoiler: read on and don't copy this!):

struct Filesystem {
     inodes: Map<Number, struct Inode {
          attr: struct Attr { /* e.g. permissions, modification time, etc. */ }
          contents: Union {
               struct File { data: Ref /* Noms pile of bytes */ } |
               struct Directory { contents: Map<String, Number> }
          }
     }>
     rootInode: Number
     maxInode: Number
}

Nice and simple. Files are just Noms Blobs represented by a sequence of bytes. Directories are a Map of strings (the name of the directory entry) to the inode number; the inode number can be used to find the actual content by looking in the top-level map.

Schema philosophy

This made sense for a filesystem. Did it make sense for Noms? I wasn't trying to put the APFS team out of work; rather, I was creating a portal from the shell or Finder into Noms. To evaluate the schema, I had the benefit of direct access to the Noms team (and so can all developers at http://slack.noms.io/). I learned two guiding principles for data in Noms:

Noms data should be self-validating. As much as possible, the application should rely on Noms to ensure consistency. My schema failed this test because the relationship between inode numbers contained in directories and the entries of the inodes map was something my code alone could maintain and validate.

Noms data should be deterministic. A given collection of data should have a single representation; the Noms structures should be path and history independent. Two apparently identical datasets should be identical in the eyes of Noms to support efficient storage and transfer, and identical data should produce an identical hash at the root of the dataset. Here, again, my schema fell short because the inode number assigned to a given file or directory depended on how other objects were created. Two different users with two identical filesystems wouldn’t be able to simply sync the way they would with two identical git branches.

A Better Schema

My first try made for a fine filesystem, just not a Noms filesystem. With a better understanding of the principles, and with help from the Noms team, I built this schema:

struct Filesystem {
     root: struct Inode {
          attr: struct Attr { /* e.g. permissions, modification time, etc. */ }
          contents: Union {
               struct File { data: Ref<Blob> /* Noms pile of bytes */ } |
               struct Directory { contents: Map<String, Cycle<1>> }
          }
     }
}

Obviously simpler; the thing that bears explanation is the use of so-called “Cycle” types. A Cycle is a means of expressing a recursive relationship within Noms types. The integer parameter specifies the ancestor struct to which the cycle refers. Consider a linked list type:

struct LinkedList {
    data: Blob
    next: Cycle<0>
}

The "next" field refers to the immediately containing struct, LinkedList. In our filesystem schema, a Directory's contents are represented by a map of strings (directory entry names) to Inodes, with Cycle<1> referring to the struct "above" or "containing" the Directory type. (Read on for a visualization of this.)

Writing It

To build the filesystem I picked a FUSE binding for Go, dug into the Noms APIs, and wrestled my way through some Go heartache.

Working with Noms requires a slightly different mindset than other data stores. Recall in particular that Noms data is immutable. Adding a new entry to a Map creates a new Map. Setting a member of a Struct creates a new Struct. Changing nested structures such as the one defined by our schema requires unzipping it and then zipping it back together. Creating a new directory, for example, means building the new Inode, inserting it into the parent Directory's contents Map (yielding a new Map), and then rebuilding each enclosing Struct up to the Filesystem root.
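As a purely illustrative sketch of that unzip-and-rezip pattern — in Rust with ordinary collections, not the actual Noms Go API — creating a new directory might look like this:

use std::collections::BTreeMap;

// Simplified stand-ins for the schema's types.
#[derive(Clone)]
struct Inode {
    contents: Contents, // attributes omitted for brevity
}

#[derive(Clone)]
#[allow(dead_code)]
enum Contents {
    File(Vec<u8>),
    Directory(BTreeMap<String, Inode>),
}

// "Unzip, modify, zip": nothing is mutated in place; we build a new
// entries map and a new root that incorporates it.
fn mkdir(root: &Inode, name: &str) -> Inode {
    let Contents::Directory(entries) = &root.contents else {
        panic!("root must be a directory");
    };
    let mut new_entries = entries.clone();
    new_entries.insert(
        name.to_string(),
        Inode { contents: Contents::Directory(BTreeMap::new()) },
    );
    Inode { contents: Contents::Directory(new_entries) }
}

fn main() {
    let root = Inode { contents: Contents::Directory(BTreeMap::new()) };
    let new_root = mkdir(&root, "photos");
    // `root` is unchanged; `new_root` is a new value that shares
    // whatever substructure didn't change.
    let _ = new_root;
}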

Demo

Showing it off has all the normal glory of a systems demo! Check out the documentation for requirements.

Create and mount a filesystem from a new local Noms dataset:

$ go build
$ mkdir /var/tmp/mnt
$ go run nomsfs.go /var/tmp/nomsfs::fs /var/tmp/mnt
running...

You can open the folder and drop data into it.

Your database fell into my filesystem!

Now let’s take a look at the underlying Noms dataset:

$ noms show http://demo.noms.io/ahl_blog::fs
struct Commit {
  meta: struct {},
  parents: Set<Ref<Cycle<0>>>,
  value: struct Filesystem {
    root: struct Inode {
      attr: struct Attr {
        ctime: Number,
        gid: Number,
        mode: Number,
        mtime: Number,
        uid: Number,
        xattr: Map<String, Blob>,
      },
      contents: struct Directory {
        entries: Map<String, Cycle<1>>,
      } | struct Symlink {
        targetPath: String,
      } | struct File {
        data: Ref<Blob>,
      },
    },
  },
}({
  meta:  {},
  parents: {
    5v82rie0be68915n1q7pmcdi54i9tmgs,
  },
  value: Filesystem {
    root: Inode {
      attr: Attr {
        ctime: 1.4705227450393803e+09,
        gid: 502,
        mode: 511,
        mtime: 1.4705227450393803e+09,
        uid: 110853,
        xattr: {},
      },
      contents: Directory {
        entries: {
          "usenix_winter91_faulkner.pdf": Inode {
            attr: Attr {
              ctime: 1.4705228859273868e+09,
              gid: 502,
              mode: 420,
              mtime: 1.468425783e+09,
              uid: 110853,
              xattr: {
                "com.apple.FinderInfo": 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  // 32 B
                00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00,
                "com.apple.quarantine": 30 30 30 31 3b 35 37 38 36 36 36 33 37 3b 53 61  // 21 B
                66 61 72 69 3b,
              },
            },
            contents: File {
              data: dmc45152ie46mn3ls92vvhnm41ianehn,
            },
          },
        },
      },
    },
  },
})

You can see the type at the top and then the actual filesystem contents. Let's look at a more complicated example where I've taken part of the Noms source tree and copied it to nomsfs. You can navigate around its structure courtesy of the Splore utility (take particular note of nested directories that show the recursive data definition described above):

Embedded ‘Splore! http://splore.noms.io/?db=https://demo.noms.io/ahl_blog&hash=2nhi5utm4s38hka22vt9ilv5i3l8r2ol

You can see all of the various states that the filesystem has been through — each state change — using noms log http://demo.noms.io/ahl_blog::fs. You can sync it to your local computer with noms sync http://demo.noms.io/ahl_blog::fs /var/tmp/fs, or check out some previous state from the log (just like a filesystem snapshot). Diff two states from the log, or make your own changes and diff them with the original using noms diff.

Nom Nom Nom

It took less than 1000 lines of Go code to make Noms appear as a window in the Finder, eating data as quickly as I could drag and drop (try it!). Imagine what Noms might look like behind other known data interfaces; it could bring git semantics to existing islands of data. Noms could form the basis of a new type of data lake — maybe one that's simple and powerful enough to bring real results. Beyond the marquee features, Noms is fun to build with. I'm already working on my next Noms application.
