Manta: From revelation to product

If you haven’t seen it, you should read Mark’s blog entry on Manta, the revolutionary new object storage service we announced today that features in situ compute. The idea is beautifully simple: couple a ZFS-based distributed object storage system with the OS-level virtualization in SmartOS to deliver a system that allows arbitrary compute to be spun up where objects live. That is, not only can you store and retrieve objects (as you can with any internet-facing object store), you can also specify compute jobs that operate on those objects in place, without requiring data motion. (If you still need to be convinced that this represents a new paradigm of object storage, check out this screencast.)

Sometimes simple ideas can seem obvious in hindsight — especially as the nuance of historical context is lost to time — so for the record let me be clear on this point: this idea wasn’t obvious. I say this unequivocally because I myself was trying to think about how we could use the technological differentiators in SmartOS to yield a better object store — and it was very hard not to seek that differentiator in ZFS. As myopic as it may seem now in retrospect, I simply couldn’t look beyond ZFS — it just had to hold the riddle to a next generation object store!

But while ZFS is essential to Manta, it is ultimately an implementation detail; the real enabler is the OS-level virtualization provided by Zones, which allow us to easily, quickly and securely spin up on-the-metal, multi-tenant compute on storage nodes without fear of compromising the integrity of the system. Zones hit the sweet spot: hardware virtualization (e.g. KVM) is at too low a level of abstraction to allow this efficiently, and higher levels of virtualization (e.g. PaaS offerings) sacrifice expressive power and introduce significant multi-tenancy complexity and risk.

Of course, all of this was obvious once Mark had the initial insight to build on Zones; what we needed to build was instantly self-evident. This flash of total clarity is rare in a career; I have only felt it a handful of times and it’s such an intense moment that it becomes locked in memory. I remember exactly where I was when Bonwick described to me the first ideas for what became ZFS (my dimly lit office in MPK17, circa 2000) — and I remember exactly where I was when I described to Bonwick my first ideas for what became DTrace (in Bart’s old blue van on Willow Road crossing the 101, February 1996). Given this, there was one thing about Manta that troubled me: I couldn’t remember where I was when Mark described the idea to me. I knew that it came from Mark, and I knew that it was sometime in the fall of 2011, but I couldn’t remember the details of the conversation. When I asked Mark about it, he couldn’t remember either, so I decided to go through old IM logs to pin down when we first started talking about it.

And in going through my logs, it became clear why I couldn’t remember that initial conversation — because there wasn’t one, at least not in the traditional sense: it happened over IM. This (accidentally) captured for posterity a moment of which one has so few: having one’s brain blasted by enlightenment. (I know that it will disappoint my mother that I dealt with this essentially by swearing, so let me pause to explain that this isn’t her fault; as she herself points out from time to time, I wasn’t raised this way.) So here is my initial conversation with Mark, with some sensitive details redacted:

Of course, a flash of insight is a long way from a product — and that conversation over a year and a half ago was just the beginning of a long journey. As Mark mentioned, shortly after this conversation, he was joined by Dave, who led the charge on Manta computation. To this duo, we added Yunong, Nate and Fred. Beyond that core Manta team, Keith developed the hardware architecture for Manta, Jerry developed the first actual Manta code with hyprlofs, and Bill and Matt wrote a deployment management system for Manta — and Matt further developed the DNS service that is at the heart of the system.

As Mark mentioned, Manta is built on top of SmartDataCanter, and as such the engineering work behind it was crucial to Manta: Orlando developed the compute node service that is involved with the provision of every Manta service, Pedro built the workflow engine that actually implements provisioning and Kevin developed the operator console that you’ll have to just trust me is awesome. John developed the auth mechanism that many first-time users will use to create their Joyent accounts today, and Andrés developed the name service replication that assures that those new users will be able to store to Manta.

In terms of SDKs, Marsell developed the ruby-manta SDK, and Trent developed both the Python SDK (including mantash!) and Bunyan — the node.js logging library that has proven indispensable time and time again when debugging Manta issues. Speaking of node.js, no one will be surprised to learn that Manta is largely implemented in node — and that TJ and Isaac were both clutch in helping us debug some nasty issues down the stretch, reinforcing our conviction that the entire stack should be under one (virtual) roof!

Josh Wilsdon developed the vmadm that lies at the heart of SmartOS provisioning — and deserves special mention for the particularly heavy hand he applied to Manta in some of our final load testing; any system that can survive Josh is production-ready! Robert and Rob both jumped in on countless networking issues — they were both critical for implementing some necessary but complicated changes to Manta’s physical networking topology. Brendan provided the performance analysis that is his hallmark, joining up with Robert to form Team Tunable — from whom no TCP tunable is safe!

Jonathan developed the multi-architecture package support that became necessary for the Manta implementation, and Filip made sure all sorts of arcane software ran on SmartOS in the Manta compute zone. (When you find that what you’re looking for is already installed in a Manta compute zone, you have Filip to thank!) Finally, Josh Clulow developed mlogin — which really must be tried to be believed. If you’re trying to understand the Manta compute model or if you just want to play around, give mlogin a whirl!

From Mark’s initial flash of insight (and my barrage of swearing) to finished product, it has been a long road, but we are proud to deliver it to you today; welcome to Manta — and dare we say to the next generation of object storage!

Posted on June 25, 2013 at 6:16 am by bmc

One Response


  1. Written by Stephen Green
    on June 26, 2013 at 8:23 am

    This looks very interesting. The Project Caroline folks at Sun had a very similar approach: using Zones to provide lightweight virtualization for Java VMs (although other VM types were in the pipeline to be supported).

    They provided a lot of useful networking infrastructure: each VM got its own IP in a private subnet so that you could easily build distributed systems (which we did!)

    They provided ZFS over NFS, but you could create, snapshot, and rollback filesystems from their API. No object store, your VM just got filesystems that it could read/write.

    The guys who built the Caroline infrastructure were serious engineers. It was bulletproof.

    Caroline was way ahead of its time, too far ahead for Sun unfortunately. The project pages have been scrubbed, but there are mentions here and there on the Web:

    http://perspectives.mvdirona.com/2008/02/20/SunCarolineToCompeteWithAmazonAWS.aspx
