Systems We Love

One of the exciting trends of the past few years is the emergence of Papers We Love. I have long been an advocate of journal clubs, but I have also found that discussion groups can stagnate when confined to a fixed group or a single domain; by broadening the participants and encouraging presenters to select papers that appeal to the heart as well as the head, Papers We Love has developed a singular verve. Speaking personally, I have enjoyed the meetups that I have attended — and I was honored to be given the opportunity to present on Jails and Zones at Papers We Love NYC (for which, it must be said, I was flattered by the best introduction of all time). I found the crowd that gathered to be engaged and invigorating — and thought-provoking conversation went well into the night.

The energy felt at Papers We Love is in stark contrast to the academic venues in which computer science papers are traditionally presented, a contrast that I accentuated in a candid keynote at the USENIX Annual Technical Conference, pointing to PWL as a model that is much more amenable to cross-pollination of ideas between academics and practitioners. My keynote was fiery, and it may have landed on dry tinder: if Rik Farrow’s excellent summary of my talk is any indicator, the time is right for a broader conversation about how we publish rigorous work.

But for us practitioners, however well it is discussed, academic work remains somewhat ancillary: while papers are invaluable as a mechanism for the rigorous presentation of thinking, it is ultimately the artifacts that we develop — the systems themselves — that represent the tangible embodiment of our ideas. And for the systems that I am personally engaged in, I have found that getting together to discuss them is inspiring and fruitful, e.g. at the quadrennial dtrace.conf or the more regular OpenZFS developer summit. My experiences with Papers We Love and with these system-specific meetings caused me to ask on a whim if there would be interest in a one-day, one-track conference that tried to capture the PWL zeitgeist but for systems — a “Systems We Love.”

While I had thrown this idea out somewhat casually, the response was too clear to ignore: there was most definitely interest — to the point of expectation that it would happen! And here at Joyent, a company for which love of systems is practically an organizing principle, the interest quickly boiled into a frothy fervor; we couldn’t not do this!

It took a little while to get the logistics down, but I’m very happy to report that Systems We Love is on: December 13th in San Francisco! To determine the program, I am honored to be joined by an extraordinary program committee hailing from a wide range of backgrounds, experience levels, and interests — and united by a shared love of systems. So: the call for proposals is open — and if you have a love of systems, we hope that you will consider submitting a proposal and/or joining us on December 13th!

Posted on September 26, 2016 at 2:55 pm by bmc

Hacked by a bug?

Early this afternoon, I had just recorded a wide-ranging episode of Arrested DevOps with the incomparable Bridget Kromhout and noticed that I had a flurry of Twitter mentions, all in reaction to this tweet of mine. There was just one problem: I didn’t tweet it. With my account obviously hacked, I went into fight-or-flight mode and (thanks in no small part to Bridget’s calm presence) did the obvious things: I changed my Twitter password, revoked the privileges of all applications, and tried to assess the damage…

Other than the tweet, I (thankfully!) didn’t see any obvious additional damage: no crazy DMs or random follows or unfollows. In terms of figuring out where the malicious tweet had come from, the source of the tweet was “Twitter for Android” — but according to my login history, the last Twitter for Android login was from me during my morning commute about two-and-a-half hours before the tweet. (And according to Twitter, I have only used the one device to access my account.) The only intervening logins were two from Quora about an hour prior to the tweet. (Aside: WTF, Quora?! Revoked!)

Then there was the oddity of the tweet itself. There was no caption — just the two images from what I gathered to be Germany. Looking at the raw tweet, however, cleared up its source:

{
  "created_at": "Mon Sep 12 17:56:31 +0000 2016",
  "id": 775392664602554400,
  "id_str": "775392664602554369",
  "text": "https://t.co/pYKRhaAdvC",
  "truncated": false,
  "entities": {
    "hashtags": [],
    "symbols": [],
    "user_mentions": [],
    "urls": [],
    "media": [
      {
        "id": 775378240244449300,
        "id_str": "775378240244449280",
        "indices": [
          0,
          23
        ],
        "media_url": "http://pbs.twimg.com/media/CsKyZsBWgAAHgVq.jpg",
        "media_url_https": "https://pbs.twimg.com/media/CsKyZsBWgAAHgVq.jpg",
        "url": "https://t.co/pYKRhaAdvC",
        "display_url": "pic.twitter.com/pYKRhaAdvC",
        "expanded_url": "https://twitter.com/MattAndersonBBC/status/775378264772775936/photo/1",
        "type": "photo",
        "sizes": {
          "medium": {
            "w": 1200,
            "h": 1200,
            "resize": "fit"
          },
          "large": {
            "w": 2048,
            "h": 2048,
            "resize": "fit"
          },
          "thumb": {
            "w": 150,
            "h": 150,
            "resize": "crop"
          },
          "small": {
            "w": 680,
            "h": 680,
            "resize": "fit"
          }
        },
        "source_status_id": 775378264772776000,
        "source_status_id_str": "775378264772775936",
        "source_user_id": 1193503572,
        "source_user_id_str": "1193503572"
      }
    ]
  },
  "extended_entities": {
    "media": [
      {
        "id": 775378240244449300,
        "id_str": "775378240244449280",
        "indices": [
          0,
          23
        ],
        "media_url": "http://pbs.twimg.com/media/CsKyZsBWgAAHgVq.jpg",
        "media_url_https": "https://pbs.twimg.com/media/CsKyZsBWgAAHgVq.jpg",
        "url": "https://t.co/pYKRhaAdvC",
        "display_url": "pic.twitter.com/pYKRhaAdvC",
        "expanded_url": "https://twitter.com/MattAndersonBBC/status/775378264772775936/photo/1",
        "type": "photo",
        "sizes": {
          "medium": {
            "w": 1200,
            "h": 1200,
            "resize": "fit"
          },
          "large": {
            "w": 2048,
            "h": 2048,
            "resize": "fit"
          },
          "thumb": {
            "w": 150,
            "h": 150,
            "resize": "crop"
          },
          "small": {
            "w": 680,
            "h": 680,
            "resize": "fit"
          }
        },
        "source_status_id": 775378264772776000,
        "source_status_id_str": "775378264772775936",
        "source_user_id": 1193503572,
        "source_user_id_str": "1193503572"
      },
      {
        "id": 775378240248614900,
        "id_str": "775378240248614912",
        "indices": [
          0,
          23
        ],
        "media_url": "http://pbs.twimg.com/media/CsKyZsCWEAA4oOp.jpg",
        "media_url_https": "https://pbs.twimg.com/media/CsKyZsCWEAA4oOp.jpg",
        "url": "https://t.co/pYKRhaAdvC",
        "display_url": "pic.twitter.com/pYKRhaAdvC",
        "expanded_url": "https://twitter.com/MattAndersonBBC/status/775378264772775936/photo/1",
        "type": "photo",
        "sizes": {
          "small": {
            "w": 680,
            "h": 680,
            "resize": "fit"
          },
          "thumb": {
            "w": 150,
            "h": 150,
            "resize": "crop"
          },
          "medium": {
            "w": 1200,
            "h": 1200,
            "resize": "fit"
          },
          "large": {
            "w": 2048,
            "h": 2048,
            "resize": "fit"
          }
        },
        "source_status_id": 775378264772776000,
        "source_status_id_str": "775378264772775936",
        "source_user_id": 1193503572,
        "source_user_id_str": "1193503572"
      }
    ]
  },
  "source": "Twitter for Android",
  "in_reply_to_status_id": null,
  "in_reply_to_status_id_str": null,
  "in_reply_to_user_id": null,
  "in_reply_to_user_id_str": null,
  "in_reply_to_screen_name": null,
  "user": {
    "id": 173630577,
    "id_str": "173630577",
    "name": "Bryan Cantrill",
    "screen_name": "bcantrill",
    "location": "",
    "description": "Nom de guerre: Colonel Data Corruption",
    "url": "http://t.co/VyAyIJP8vR",
    "entities": {
      "url": {
        "urls": [
          {
            "url": "http://t.co/VyAyIJP8vR",
            "expanded_url": "http://dtrace.org/blogs/bmc",
            "display_url": "dtrace.org/blogs/bmc",
            "indices": [
              0,
              22
            ]
          }
        ]
      },
      "description": {
        "urls": []
      }
    },
    "protected": false,
    "followers_count": 10407,
    "friends_count": 1557,
    "listed_count": 434,
    "created_at": "Sun Aug 01 23:51:44 +0000 2010",
    "favourites_count": 2431,
    "utc_offset": -25200,
    "time_zone": "Pacific Time (US & Canada)",
    "geo_enabled": true,
    "verified": false,
    "statuses_count": 4808,
    "lang": "en",
    "contributors_enabled": false,
    "is_translator": false,
    "is_translation_enabled": false,
    "profile_background_color": "C0DEED",
    "profile_background_image_url": "http://abs.twimg.com/images/themes/theme1/bg.png",
    "profile_background_image_url_https": "https://abs.twimg.com/images/themes/theme1/bg.png",
    "profile_background_tile": false,
    "profile_image_url": "http://pbs.twimg.com/profile_images/618537697670397952/gW9iQsvF_normal.jpg",
    "profile_image_url_https": "https://pbs.twimg.com/profile_images/618537697670397952/gW9iQsvF_normal.jpg",
    "profile_link_color": "0084B4",
    "profile_sidebar_border_color": "C0DEED",
    "profile_sidebar_fill_color": "DDEEF6",
    "profile_text_color": "333333",
    "profile_use_background_image": true,
    "has_extended_profile": false,
    "default_profile": true,
    "default_profile_image": false,
    "following": false,
    "follow_request_sent": false,
    "notifications": false
  },
  "geo": null,
  "coordinates": null,
  "place": {
    "id": "5a110d312052166f",
    "url": "https://api.twitter.com/1.1/geo/id/5a110d312052166f.json",
    "place_type": "city",
    "name": "San Francisco",
    "full_name": "San Francisco, CA",
    "country_code": "US",
    "country": "United States",
    "contained_within": [],
    "bounding_box": {
      "type": "Polygon",
      "coordinates": [
        [
          [
            -122.514926,
            37.708075
          ],
          [
            -122.357031,
            37.708075
          ],
          [
            -122.357031,
            37.833238
          ],
          [
            -122.514926,
            37.833238
          ]
        ]
      ]
    },
    "attributes": {}
  },
  "contributors": null,
  "is_quote_status": false,
  "retweet_count": 2,
  "favorite_count": 9,
  "favorited": false,
  "retweeted": false,
  "possibly_sensitive": false,
  "possibly_sensitive_appealable": false,
  "lang": "und"
}

Note in particular that the media has a source_status_id_str of 775378264772775936: the images come from this tweet, posted roughly an hour before mine by Matt Anderson, the BBC Culture editor who (I gather) is Berlin-based.

Why would someone who had just hacked my account burn it by tweeting an innocuous (if idiosyncratic) photo of campaign posters on the streets of Berlin?! Suddenly this is feeling less like I’ve been hacked, and more like I’m the victim of data corruption.

Some questions I have that I don’t know enough about the Twitter API to answer: first, how are tweets created that refer to media entities from other tweets? i.e., is there something about that tweet that can give a better clue as to how it was generated? Does the fact that it’s geolocated to San Francisco (albeit with the broadest possible coordinates) indicate that it might have come from the Twitter client misbehaving on my phone? (I didn’t follow Matthew Anderson and my phone was on my desk when this was tweeted — so this would be the app going seriously loco.) And what I’m most dying to know: what other tweets refer to the photos from Matthew’s tweet? (I gather that DataSift can answer this question, but I’m not a DataSift customer and they don’t appear to have a free tier.) If there’s a server-side bug afoot here, it wouldn’t be surprising if I’m not the only one affected.

I’m not sure I’m ever going to know the answers to these questions, but I’m leaving the tweet up there in hopes that it will provide some clues — and with the belief that the villain in the story, if ever brought to justice, will be a member of the shadowy cabal that I have fought my entire career: busted software.

Posted on September 13, 2016 at 12:11 am by bmc

dtrace.conf(16) wrap-up

Something that got a little lost in the excitement of Samsung’s recent acquisition of Joyent was dtrace.conf(16), our quadrennial (!) unconference on DTrace. The videos are up, and in the spirit of Adam Leventhal‘s excellent wrap-ups from dtrace.conf(08) and dtrace.conf(12), I wanted to provide a survey of the one-day conference and its content.

Once again, it was an eclectic mix of technologists — and once again, the day got kicked off with me providing an introduction to dtrace.conf and its history. (Just to save you the time filling out your Cantrill Presentation Bingo Card: you can find me punching myself at 16:19, me offering unsolicited personal medical history at 20:11, and me getting trolled by unikernels at 38:25.)

Some at the conference (okay, one) had never seen or used DTrace before, so to get brains warmed up, Max Bruning gave a quick introduction to DTrace — featuring a real problem. (The problem that Max examined was a process running in an LX-branded zone on SmartOS.)

Next up was Graeme Jenkinson from the University of Cambridge on distributed tracing featuring CADETS, a system with a DTrace-inspired query language called Event Query.

We then started a troika of presentations on core DTrace improvements, starting with George Neville-Neil on core D language improvements, some of which have been prototyped and others of which represent a wish list. This led into Matt Ahrens (of ZFS fame) on D syntactic sugar: language extensions that Matt implemented for D to make it easier to write more complicated (and more maintainable) scripts. A singular point of pride for me personally is how much DTrace was used to implement and debug ZFS: Matt has used DTrace as much as anyone, and anything that he feels he needs to make his life easier is something that will almost certainly benefit all DTrace users. First among these: support for “if”/”else”, a change that since dtrace.conf has gone through code review and is poised to integrate into DTrace! The final presentation in this segment on core improvements was Joyent engineer Robert Mustacchi on CTF everywhere, outlining work that Robert and Adam did in 2013 to bring user-level CTF understanding to DTrace.
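
To make the appeal of the sugar concrete, here is a minimal sketch of what it buys you; the probes and the predicate are illustrative rather than drawn from Matt’s talk, and the if/else form shown is the one described above:

/*
 * Without if/else, expressing two alternatives means duplicating the
 * clause with complementary predicates:
 */
syscall::read:entry
/execname == "postgres"/
{
        self->ts = timestamp;
}

syscall::read:entry
/execname != "postgres"/
{
        @other = count();
}

/*
 * With the if/else sugar, the same logic collapses into a single,
 * more maintainable clause:
 */
syscall::read:entry
{
        if (execname == "postgres") {
                self->ts = timestamp;
        } else {
                @other = count();
        }
}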

The next group of presentations focused more on how DTrace is used, kicking off with Eric Saxby on DTracing applications, with a particular focus on instrumenting Ruby programs using Chris Andrews‘ excellent libusdt. When instrumenting upstack, we found that it’s useful for DTrace to pull apart JSON — and Joyent engineer Joshua Clulow presented next on the DTrace JSON subroutine that he implemented a few years ago. (And because I know you’re dying to know: Josh’s presentation is on vtmc, terminal-based presentation software unsurprisingly of his own creation.) Wrapping up this section was Dan McDonald talking about the challenges of DTrace-based dynamic asserts: because of the ubiquity of asserts, we really need to add is-enabled probes to the kernel to support dynamic asserts — an improvement that is long overdue, and that we will hopefully implement before 2020!
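
For a sense of how the json() subroutine gets used, here is a minimal sketch; the myapp provider and its request-start probe are hypothetical, standing in for any USDT probe that hands DTrace a JSON payload as a string argument, and json() takes that string plus a selector and returns the matching element as a string:

/*
 * Copy in the JSON payload and let json() pluck out a single field in
 * probe context, with no userland postprocessing required.
 */
myapp*:::request-start
{
        this->payload = copyinstr(arg0);
        @requests[json(this->payload, "method")] = count();
}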

In the penultimate group of presentations, we got into some system-specific instrumentation and challenges, starting with James Nugent on DTrace and Go. The problem he refers to as preventing “Go and DTrace from working very well together” is the fact that Go doesn’t preserve frame pointers on x86 — but the good news is that this has changed and frame pointers will be preserved starting in 1.7, making DTrace on Go much more useful! After James, Joyent engineer Dave Pacheco described his experiences of using DTrace and Postgres. For our Manta object storage system, Postgres is a critical component and understanding it dynamically and in production has proved essential for us. George Neville-Neil then took the stage again to discuss performance improvements with always-on instrumentation. (Notable quote: “this is being recorded, but I’ll say it anyway.”) Gordon Marler from Bloomberg discussed the challenges of instrumenting massive binaries with thousands of shared objects, consisting of multiple languages (C, C++ and — pause for effect — Fortran 77) and millions of symbols (!!) — which necessitated DTrace ustack performance improvements via some custom (and optimized) postprocessing.
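
To illustrate what frame pointer preservation buys, here is a sketch of a garden-variety profiling script that leans on user-level stack walking; without frame pointers (as with Go before 1.7 on x86), ustack() cannot unwind past the topmost frame:

/*
 * Sample the target process at 97Hz and aggregate its user-level
 * stacks; run with something like: dtrace -s profile.d -p <pid>
 */
profile-97
/pid == $target/
{
        @stacks[ustack()] = count();
}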

The final group of presentations kicked off with Joyent engineer Alex Wilson and me presenting DTrace in the zone and the DTrace privilege model, which is an important (if ominous) precursor for a very interesting presentation: security researcher Ben Murphy describing his diabolically clever work on DTrace exploitation.

As the last presentation of the day (as we felt it would be a good segue to drinks), George Neville-Neil led a brief discussion on what he calls OpenDTrace — which is really about sharing the DTrace code more effectively across systems. (DTrace itself is entirely open source, so “OpenDTrace” is something of a redundancy.) George kicked off an OpenDTrace organization on GitHub that currently holds scripts and the DTrace toolkit, with the aspiration of potentially mirroring the OpenZFS effort to encourage cross-platform development and collaboration.

After George wrapped up, we celebrated the passing of another quadrennial in traditional DTrace fashion: with cans of Tecate and exhilarating rounds of Fishpong. And for you, dear reader, we have a bonus for managing to read this far: if you weren’t able to make it to the conference, we have a few extra dtrace.conf(16) t-shirts. To get one of these, e-mail us your size, address, and maybe a sentence or two on how you use or have used DTrace. Supplies are (obviously) limited; if you miss out, you’ll have to wait until the next dtrace.conf in 2020!

Posted on July 29, 2016 at 12:00 pm by bmc

Unikernels are unfit for production

Recently, I made the mistake of rhetorically asking if I needed to spell out why unikernels are unfit for production. The response was overwhelming: whether people feel that unikernels are wrong-headed and are looking for supporting detail or are unikernel proponents and want to know what the counter-arguments could possibly be, there is clearly a desire to hear the arguments against running unikernels in production.

So, what’s the problem with unikernels? Let’s get a definition first: a unikernel is an application that runs entirely in the microprocessor’s privileged mode. (The exact nomenclature varies; on x86 this would be running at Ring 0.) That is, in a unikernel there is no application at all in a traditional sense; instead, application functionality has been pulled into the operating system kernel. (The idea that there is “no OS” serves to mislead; it is not that there isn’t an operating system but rather that the application has taken on the hardware-interfacing responsibilities of the operating system — it is “all OS”, if a crude and anemic one.) Before we discuss the challenges with this, it’s worth first exploring the motivations for unikernels — if only because they are so thin…

The primary reason to implement functionality in the operating system kernel is for performance: by avoiding a context switch across the user-kernel boundary, operations that rely upon transit across that boundary can be made faster. In the case of unikernels, these arguments are specious on their face: between the complexity of modern platform runtimes and the performance of modern microprocessors, one does not typically find that applications are limited by user-kernel context switches. And as shaky as they may be, these arguments are further undermined by the fact that unikernels very much rely on hardware virtualization to achieve any multi-tenancy whatsoever. As I have expanded on in the past, virtualizing at the hardware layer carries with it an inexorable performance tax: by having the system that can actually see the hardware (namely, the hypervisor) isolated from the system that can actually see the app (the guest operating system) efficiencies are lost with respect to hardware utilization (e.g., of DRAM, NICs, CPUs, I/O) that no amount of willpower and brute force can make up. But it’s not worth dwelling on performance too much; let’s just say that the performance arguments to be made in favor of unikernels have some well-grounded counter-arguments and move on.

The other reason given by unikernel proponents is that unikernels are “more secure”, but it’s unclear what the intellectual foundation for this argument actually is. Yes, unikernels often run less software (and thus may have less attack surface) — but there is nothing about unikernels in principle that leads to less software. And yes, unikernels often run new or different software (and are therefore not vulnerable to the OpenSSL vuln-of-the-week) but this security-through-obscurity argument could be made for running any new, abstruse system. The security arguments also seem to whistle past the protection boundary that unikernels very much depend on: the protection boundary between guest OS’s afforded by the underlying hypervisor. Hypervisor vulnerabilities emphatically exist; one cannot play up Linux kernel vulnerabilities as a silent menace while simultaneously dismissing hypervisor vulnerabilities as imaginary. To the contrary, by depriving application developers of the tools of a user protection boundary, the principle of least privilege is violated: any vulnerability in an application tautologically roots the unikernel. In the world of container-based deployment, this takes a thorny problem — secret management — and makes it much nastier (and with much higher stakes). At best, unikernels amount to security theater, and at worst, a security nightmare.

The final reason often given by proponents of unikernels is that they are small — but again, there is nothing tautologically small about unikernels! Speaking personally, I have done kernel implementation on small kernels and big ones; you can certainly have lean systems without resorting to the equivalent of a gastric bypass with a unikernel! (I am personally a huge fan of Alpine Linux as a very lean user-land substrate for Linux apps and/or Docker containers.) And to the degree that unikernels don’t contain much code, it seems more by infancy (and, for the moment, irrelevancy) than by design. But it would be a mistake to measure the size of a unikernel only in terms of its code, and here too unikernel proponents ignore the details of the larger system: because a unikernel runs as a guest operating system, the DRAM allocated by the hypervisor for that guest is consumed in its entirety — even if the app itself isn’t making use of it. Because running out of memory remains one of the most pernicious of application failure modes (especially in dynamic environments), memory sizing tends to be overengineered in that requirements are often blindly doubled or otherwise slopped up. In the unikernel model, any such slop is lost — nothing else can use it because the hypervisor doesn’t know that it isn’t, in fact, in use. (This is in stark contrast to containers in which memory that isn’t used by applications is available to be used by other containers, or by the system itself.) So here again, the argument for unikernels becomes much more nuanced (if not rejected entirely) when the entire system is considered.

So those are the reasons for unikernels: perhaps performance, a little security theater, and a software crash diet. As tepid as they are, these reasons constitute the end of the good news from unikernels. Everything else from here on out is bad news: costs that must be borne to get to those advantages, however flimsy.

The disadvantages of unikernels start with the mechanics of an application itself. When the operating system boundary is obliterated, one may have eliminated the interface for an application to interact with the real world of the network or persistent storage — but one certainly hasn’t forsaken the need for such an interface! Some unikernels (like OSv and Rumprun) take the approach of implementing a “POSIX-like” interface to minimize disruption to applications. Good news: apps kinda work! Bad news: did we mention that they need to be ported? And here’s hoping that your app’s “POSIX-likeness” doesn’t extend to fusty old notions like creating a process: there are no processes in unikernels, so if your app depends on this (ubiquitous, four-decades-old) construct, you’re basically hosed. (Or worse than hosed.)

If this approach seems fringe, things get much further afield with language-specific unikernels like MirageOS that deeply embed a particular language runtime. On the one hand, allowing implementation only in a type-safe language allows for some of the acute reliability problems of unikernels to be circumvented. On the other hand, hope everything you need is in OCaml!

So there are some issues getting your app to work, but let’s say you’re past all this: either the POSIX surface exposed by your unikernel of choice is sufficient for your app (or platform), or it’s already written in OCaml or Erlang or Haskell or whatever. Should you have apps that can be unikernel-borne, you arrive at the most profound reason that unikernels are unfit for production — and the reason that (to me, anyway) strikes unikernels through the heart when it comes to deploying anything real in production: Unikernels are entirely undebuggable. There are no processes, so of course there is no ps, no htop, no strace — but there is also no netstat, no tcpdump, no ping! And these are just the crude, decades-old tools. There is certainly nothing modern like DTrace or MDB. From a debugging perspective, to say this is primitive understates it: this isn’t paleolithic — it is precambrian. As one who has spent my career developing production systems and the tooling to debug them, I find the implicit denial of debugging production systems to be galling, and symptomatic of a deeper malaise among unikernel proponents: total lack of operational empathy. Production problems are simply hand-waved away — services are just to be restarted when they misbehave. This attitude — even when merely implied — is infuriating to anyone who has ever been responsible for operating a system. (And lest you think I’m an outlier on this issue, listen to the applause in my DockerCon 2015 talk after I emphasized the need to debug systems rather than restart them.) And if it needs to be said, this attitude is angering because it is wrong: if a production app starts to misbehave because of a non-fatal condition like (say) listen drops, restarting the app is inducing disruption at the worst possible time (namely, when under high load) and doesn’t drive at all towards the root cause of the problem (an insufficient backlog).
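
To make the listen-drop example concrete, here is a minimal sketch of the kind of in-situ observation a debuggable system affords; it assumes an illumos-derived system, where the mib provider fires a probe as the tcpListenDrop counter is bumped:

/*
 * Note each listen drop as it happens: direct evidence that a
 * listening socket's backlog is full, i.e., the actual root cause
 * rather than a symptom to be restarted away.
 */
mib:::tcpListenDrop
{
        printf("%Y: connection dropped, listen backlog full\n", walltimestamp);
}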

Now, could one implement production debugging tooling in unikernels? In a word, no: debugging tooling very often crosses the user-kernel boundary, and is most effective when leveraging the ad hoc queries that the command line provides. The organs that provide this kind of functionality have been deliberately removed from unikernels in the name of weight loss; any unikernel that provides sufficiently sophisticated debugging tooling to be used in production would be violating its own dogma. Unikernels are unfit for production not merely as implemented but as conceived: they cannot be understood when they misbehave in production — and by their own assertions, they never will be able to be.

All of this said, I do find some common ground with proponents of unikernels: I agree that the container revolution demands a much leaner, more secure and more efficient run-time than a shared Linux guest OS running on virtual hardware — and at Joyent, our focus over the past few years has been delivering exactly that with SmartOS and Triton. While we see a similar problem as unikernel proponents, our approach is fundamentally different: instead of giving up on the notion of secure containers running on a multi-tenant substrate, we took the already-secure substrate of zones and added to it the ability to natively execute Linux binaries. That is, we chose to leverage advances in operating systems rather than deny their existence, bringing to Linux and Docker not only secure on-the-metal containers, but also critical advances like ZFS, Crossbow and (yes) DTrace. This merits a final reemphasis: our focus on production systems is reflected in everything we do, but most especially in our extensive tooling for debugging production systems — and by bringing this tooling to the larger world of Linux containers, Triton has already allowed for production debugging that we never before would have thought possible!

In the fullness of time, I think that unikernels will be most productive as a negative result: they will primarily serve to demonstrate the impracticality of their approach for production systems. As such, they will join transactional memory and the M-to-N scheduling model as breathless systems software fads that fell victim to the merciless details of reality. But you needn’t take my word for it: as I intimated in my tweet, undebuggable production systems are their own punishment — just kindly inflict them upon yourself and not the rest of us!

Posted on January 22, 2016 at 8:48 am by bmc

Bringing clarity to containers

At the beginning of the year, I laid down a few predictions. While I refuse on principle to engage in Stephen O’Grady-style self-flagellation, I do think it’s worth revisiting the headliner prediction, namely that 2015 is the year of the container. I said at the time that it wasn’t particularly controversial, and I don’t think it’s controversial now: 2015 was the year of the container, and one need look no further than the explosion of container conferences with container camps and container summits and container cons.

My second prediction was marginally more subtle: that the impedance mismatch between containers in development and containers in production would be a wellspring of innovation. If anything, this understated the case: the wellspring turned out to be more like an open sluice, and 2015 saw the world flooded with multiple ways of doing seemingly everything when it comes to containers. That all of these technologies and frameworks are open source has served to accelerate them all, and mutations abound (Hypernetes, anyone?).

On the one hand this is great, as we all benefit from so many people exploring so many different ideas. But on the other hand, the flurry of choice can become a blizzard of confusion — especially where there is seeming overlap between two technologies. (Or worse, when two overlapping and opinionated technologies disagree ardently on those opinions!) This slide from Karl Isenberg of Mesosphere at KubeCon last month captured it; the point is neither about the specific technologies (as Karl noted, plenty are missing) nor about the specific layers (many would likely quibble with some of the details of Karl’s taxonomy), but rather about the explosion of abstraction (and concomitant confusion) in this domain.

One of the biggest challenges that we have in containers heading into 2016 is that this confusion now presents significant head winds for early-adopters and second-movers alike. This has become so acute that I posed a question to KubeCon attendees: are we at or near Peak Confusion in the container space? The conclusion among everyone I spoke with (vendors, developers, operators and others) was that we’re nowhere near Peak Confusion — with many even saying that confusion is still accelerating. (!) Even for those of us who have been in containers for years, this has been a little terrifying — and I can imagine for those entirely new to containers, it’s downright paralyzing.

So, what’s to be done? I think much of the responsibility lies with the industry: instead of viewing containers as new territory for conquest, we must take it upon ourselves to assure for users an interoperable and composable future — one in which technologies can differentiate themselves based on the qualities of their implementation rather than the voraciousness of their appetite. Lest this sound utopian, it is this same ethos that underlies our modern internet, as facilitated by the essential work of the Internet Engineering Task Force (IETF). Thanks to the IETF and its ethos of “rough consensus and running code” we ended up with the interoperable internet. (Indeed, this text itself was brought to you by RFC 791, RFC 793, RFC 1034, and RFC 2616 — among many, many others.)

As for an entity that can potentially serve an IETF-like role for container-based computing, I look with guarded optimism to today’s launch of the Cloud-native Computing Foundation. Joyent has been involved with the CNCF since its inception, and based on what we’ve seen so far, we see great promise for it in 2016 and beyond. We believe that by elucidating component boundaries and by fostering open source projects that share the values of interoperability and composability, the CNCF can combine the best attributes of both the IETF and the Apache Foundation: rough consensus and running, open source software that allows elastic, container-deployed, service-oriented infrastructure. If the CNCF can do this, it will (we believe) serve a vital mission for practitioners: displace confusion with clarity — and therefore accelerate our collective cloud-native future!

Posted on December 17, 2015 at 3:21 pm by bmc

Requests for discussion

One of the exciting challenges of being an all open source company is figuring out how to get design conversations out of the lunch time discussion and the private IRC/Jabber/Slack channels and into the broader community. There are many different approaches to this, and the most obvious one is to simply use whatever is used for issue tracking. Issue trackers don’t really fit the job, however: they don’t allow for threading; they don’t really allow for holistic discussion; they’re not easily connected with a single artifact in the repository, etc. In short, even on projects with modest activity, using issue tracking for design discussions causes the design discussions to be drowned out by the defects of the day — and on projects with more intense activity, it’s total mayhem.

So if issue tracking doesn’t fit, what’s the right way to have an open source design discussion? Back in the day at Sun, we had the Software Development Framework (SDF), which was a decidedly mixed bag. While it was putatively shrink-to-fit, in practice it felt too much like a bureaucratic hurdle with concomitant committees and votes and so on — and it rarely yielded productive design discussion. That said, we did like the artifacts that it produced, and even today in the illumos community we find that we go back to the Platform Software Architecture Review Committee (PSARC) archives to understand why things were done a particular way. (If you’re looking for some PSARC greatest hits, check out PSARC 2002/174 on zones, PSARC 2002/188 on least privilege or PSARC 2005/471 on branded zones.)

In my experience, the best part of the SDF was also the most elemental: it forced things to be written down in a forum reserved for architectural discussions, which alone forced some basic clarity on what was being built and why. At Joyent, we have wanted to capture this best element of the SDF without crippling ourselves with process — and in particular, we have wanted to allow engineers to write down their thinking while it is still nascent, such that it can be discussed when there is still time to meaningfully change it! This thinking, as it turns out, is remarkably close to the original design intent of the IETF’s Request for Comments, as expressed in RFC 3:

The content of a note may be any thought, suggestion, etc. related to the software or other aspect of the network. Notes are encouraged to be timely rather than polished. Philosophical positions without examples or other specifics, specific suggestions or implementation techniques without introductory or background explication, and explicit questions without any attempted answers are all acceptable. The minimum length for a note is one sentence.

These standards (or lack of them) are stated explicitly for two reasons. First, there is a tendency to view a written statement as ipso facto authoritative, and we hope to promote the exchange and discussion of considerably less than authoritative ideas. Second, there is a natural hesitancy to publish something unpolished, and we hope to ease this inhibition.

We aren’t the only ones to be inspired by the IETF’s venerable RFCs, and the language communities in particular seem to be good at this: Java has Java Specification Requests, Python has Python Enhancement Proposals, Perl has the (oddly named) Perl 6 apocalypses, and Rust has Rust RFCs. But the other systems software communities have been nowhere near as structured about their design discussions, and you are hard-pressed to find similar constructs for operating systems, databases, container management systems, etc.

Encouraged by what we’ve seen from the language communities, we wanted to introduce RFCs for the open source system software that we lead — but because we deal so frequently with RFCs in the IETF context, we wanted to avoid the term “RFC” itself: IETF RFCs tend to be much more formalized than the original spirit, and tend to describe an agreed-upon protocol rather than nascent ideas. So to avoid confusion with RFCs while still capturing some of what they were trying to solve, we have started a Requests for Discussion (RFD) repository for the open source projects that we lead. We will announce an RFD on the mailing list that serves the community (e.g., sdc-discuss) to host the actual discussion, with a link to the corresponding directory in the repo that will host artifacts from the discussion. We intend to kick off RFDs for the obvious things like adding new endpoints, adding new commands, adding new services, changing the behavior of endpoints and commands, etc. — but also for the less well-defined stuff that captures earlier thinking.

Finally, for the RFD that finally got us off the mark on doing this, see RFD 1: Triton Container Naming Service. Discussion very much welcome!

Posted on September 16, 2015 at 12:07 pm by bmc

Software: Immaculate, fetid and grimy

Once, long ago, there was an engineer who broke the operating system particularly badly. Now, if you’ve implemented important software for any serious length of time, you’ve seriously screwed up at least once — but this was notable for a few reasons. First, the change that the engineer committed was egregiously broken: the machine that served as our building’s central NFS server wasn’t even up for 24 hours running the change before the operating system crashed — an outcome so bad that the commit was unceremoniously reverted (which we called a “backout”). Second, this wasn’t the first time that the engineer had been backed out; being backed out was serious, and that this had happened before was disconcerting. But most notable of all: instead of taking personal responsibility for it, the engineer had the audacity to blame the subsystem that had been the subject of the change. Now on the one hand, this wasn’t entirely wrong: the change had been complicated and the subsystem that was being modified was a bit of a mess — and it was arguably a preexisting issue that had merely been exposed by the change. But on the other hand, it was the change that exposed it: the subsystem might have been brittle with respect to such changes, but it had at least worked correctly prior to it. My conclusion was that the problem wasn’t the change per se, but rather the engineer’s decided lack of caution when modifying such a fragile subsystem. While the recklessness had become a troubling pattern for this particular engineer, it seemed that there was a more abstract issue: how does one safely make changes to a large, complicated, mature software system?

Hoping to channel my frustration into something positive, I wrote up an essay on the challenges of developing Solaris, and sent it out to everyone doing work on the operating system. The taxonomy it proposed turned out to be useful and embedded itself in our engineering culture — but the essay itself remained private (it pre-dated blogs.sun.com by several years). When we opened the operating system some years later, the essay was featured on opensolaris.org. But as that’s obviously been ripped down, and because the taxonomy seems to hold as much as ever, I think it’s worth reiterating; what follows is a polished (and lightly updated) version of the original essay.

In my experience, large software systems — be they proprietary or open source — have a complete range of software quality within their many subsystems.

Immaculate

Some subsystems you find are beautiful works of engineering — they are squeaky clean, well-designed and well-crafted. These subsystems are a joy to work in but (and here’s the catch) by virtue of being well-designed and well-implemented, they generally don’t need a whole lot of work. So you’ll get to use them, appreciate them, and be inspired by them — but you probably won’t spend much time modifying them. (And because these subsystems are such a pleasure to work in, you may find that the engineer who originally did the work is still active in some capacity — or that there is otherwise a long line of engineers eager to do any necessary work in such a rewarding environment.)

Fetid

Other subsystems are cobbled-together piles of junk — reeking garbage barges that have been around longer than anyone remembers, floating from one release to the next. These subsystems have little-to-no comments (or what comments they have are clearly wrong), are poorly designed, needlessly complex, badly implemented and virtually undebuggable. There are often parts that work by accident, and unused or little-used parts that simply never worked at all. They manage to survive for one or more of the following reasons:

If you find yourself having to do work in one of these subsystems, you must exercise extreme caution: you will need to write as many test cases as you can think of to beat the snot out of your modification, and you will need to perform extensive self-review. You can try asking around for assistance, but you’ll quickly discover that no one is around who understands the subsystem. Your code reviewers probably won’t be able to help much either — maybe you’ll find one or two people that have had the same misfortune that you find yourself experiencing, but it’s more likely that you will have to explain most aspects of the subsystem to your reviewers. You may discover as you work in the subsystem that maintaining it is simply untenable — and it may be time to consider rewriting the subsystem from scratch. (After all, most of the subsystems that are in the first category replaced subsystems that were in the second.) One should not come to this decision too quickly — rewriting a subsystem from scratch is enormously difficult and time-consuming. Still, don’t rule it out a priori.

Even if you decide not to rewrite such a subsystem, you should improve it while you’re there in manners that don’t introduce excessive risk. For example, if something took you a while to figure out, don’t hesitate to add a block comment to explain your discoveries. And if it was a pain in the ass to debug, you should add the debugging support that you found lacking. This will make it slightly easier on the next engineer — and it will make it easier on you when you need to debug your own modifications.

Grimy

Most subsystems, however, don’t actually fall neatly into either of these categories — they are somewhere in the middle. That is, they have parts that are well thought-out, or design elements that are sound, but they are also littered with implicit intradependencies within the subsystem or implicit interdependencies with other subsystems. They may have debugging support, but perhaps it is incomplete or out of date. Perhaps the subsystem effectively met its original design goals, but it has been extended to solve a new problem in a way that has left it brittle or overly complex. Many of these subsystems have been fixed to the point that they work reliably — but they are delicate and they must be modified with care.

The majority of work that you will do on existing code will be to subsystems in this last category. You must be very cautious when making changes to these subsystems. Sometimes these subsystems have local experts, but many changes will go beyond their expertise. (After all, part of the problem with these subsystems is that they often weren’t designed to accommodate the kind of change you might want to make.) You must extensively test your change to the subsystem. Run your change in every environment you can get your hands on, and don’t be content that the software seems to basically work — you must beat the hell out of it. Obviously, you should run any tests that might apply to the subsystem, but you must go further. Sometimes there is a stress test available that you may run, but this is not a substitute for writing your own tests. You should review your own changes extensively. If it’s multithreaded, are you obeying all of the locking rules? (What are the locking rules, anyway?) Are you building implicit new dependencies into the subsystem? Are you using interfaces in a new way that may present some new risk? Are the interfaces that the subsystem exports being changed in a way that violates an implicit assumption that one of the consumers was making? These are not questions with easy answers, and you’ll find that it will often be grueling work just to gain confidence that you are not breaking or being broken by anything else.

If you think you’re done, review your changes again. Then, print your changes out, take them to a place where you can concentrate, and review them yet again. And when you review your own code, review it not as someone who believes that the code is right, but as someone who is certain that the code is wrong: review the code as if written by an archrival who has dared you to find anything wrong with it. As you perform your self-review, look for novel angles from which to test your code. Then test and test and test.

It can all be summed up by asking yourself one question: have you reviewed and tested your change every way that you know how? You should not even contemplate pushing until your answer to this is an unequivocal YES. Remember: you are (or should be!) always empowered as an engineer to take more time to test your work. This is true of every engineering team that I have ever worked on or would ever work on, and it’s what makes companies worth working for: engineers that are empowered to do the Right Thing.

Production quality all the time

You should assume that once you push, the rest of the world will be running your code in production. If the software that you’re developing matters, downtime induced by it will be painful and expensive. But if the software matters so much, who would be so far out of their mind as to run your changes so shortly after they integrate? We would, and deliberately so: software isn’t (or shouldn’t be) fruit that needs to ripen as it makes its way to market — it should be correct when it’s integrated. And if we don’t demand production quality all the time, we are concerned that we will be gripped by the Quality Death Spiral. The Quality Death Spiral is much more expensive than a handful of outages, so running fresh changes in production is worth the risk — but you must do your part by delivering production quality all the time.

Does this mean that you should contemplate ritual suicide if you introduce a serious bug? Of course not — everyone who has made enough modifications to delicate, critical subsystems has introduced a change that has induced expensive downtime somewhere. We know that this will be so because writing system software is just so damned tricky and hard. Indeed, it is because of this truism that you must demand of yourself that you not integrate a change until you are out of ideas of how to test it. Because you will one day introduce a bug of such subtlety that it will seem that no one could have caught it.

And what do you do when that awful, black day arrives? Here’s a quick coping manual from those of us who have been there:

But most importantly, you must ask yourself: what could I have done differently? If you honestly don’t know, ask a fellow engineer to help you. We’ve all been there, and we want to make sure that you are able to learn from it. Once you have an answer, take solace in it; no matter how bad you feel for having introduced a problem, you can know that the experience has improved you as an engineer — and that’s the most anyone can ask for.

Posted on September 3, 2015 at 4:42 pm by bmc

The foundation of cloud-native computing

The older I get, the more engineering values matter to me — and the more I seek out shared values in those with whom I endeavor to build things. For us at Joyent, those engineering values reflect that we operate the software we make: we believe that foundational systems must be designed to be robust and high-performing — and when they fail in this regard, it is incumbent upon the system itself to provide the tooling to diagnose the errant behavior. These values are not new (indeed, they are some of the oldest in computing), but there are times when they can feel endangered. It is our belief that the rise of cloud computing has — if anything — made the traditional values of systems software robustness more important. Recently, I’ve had the opportunity to get to know some of the Google engineers involved in the Kubernetes effort, and I have found that they broadly share Joyent’s engineering values — that they too seek to build a robust software substrate, as informed by their (substantial) experience operating systems at scale. Given our shared values, I was particularly pleased to learn of Google’s desire to create a new kind of foundation with their formation of the Cloud-native Computing Foundation. Today, I am excited to announce that Joyent is a charter member of the Cloud-native Computing Foundation, as it represents the values we sought to embody in the Triton stack — and I am honored to have been personally asked to serve on the foundation’s technical steering committee. We believe that we haven’t just joined a(nother) foundation, we have joined with those who share the mission that we have always had for ourselves: to help effect the next revolution in computing.

That I could possibly be so enthusiastic for a foundation merits further explanation, as I have historically been very forthright with my skepticism about foundations with respect to open source: three years ago, in a presentation on Corporate Open Source Anti-patterns (video), I described the insistence on giving newly open-sourced code to a foundation as an anti-pattern, noting that giving up ownership also eschews leadership. I further cautioned that many underestimate the complexity and constraints of a 501(c)(3) — while overestimating the need for an explicitly non-profit organization’s involvement in a company’s open source efforts. While these statements about foundations were unequivocal, I also ended that presentation by saying that my observations shouldn’t be perceived as hard rules — and implied that the thinking may change over time as we continue to learn from our own experiences.

Three years after that presentation, I still broadly stand by my claims — but (as my enthusiasm for the Cloud Native Computing Foundation indicates) foundations are one area where my thinking has definitely shifted. In particular, in those rare instances when an open source technology reaches a level of ubiquity such as to sediment into collective bedrock, I believe that it actually does belong in a foundation. How do you know if your open source project is in this category? If multiple companies are betting their future on your open source project, congratulate yourself for laying down the bedrock upon which others are building — and then get it into a foundation to assure its future. This can be hard to internalize (after all, you have almost certainly put more resources into it than anyone else; why should you be expected to simply give that away?!), but the reality is that the commercial pressures that are now being exerted on your (incredibly popular!) technology will rip it apart if you don’t preserve its fate. This can be doubly frustrating when you feel you are acting in the community’s best interests, but as soon as that community includes rival commercial interests, only a foundation can provide the necessary (but not sufficient!) neutrality to assure the community that the technology’s future transcends the fate of any one company. Certainly, we learned all this the hard way with node.js — but the problem is in no way unique to node.js or to Joyent. Indeed, with open source now essentially a constraint on new infrastructure software, we can expect this transition (from corporate-owned open source to foundation-owned open source) will happen with increasing frequency. (Should you find yourself at OSCON this week, this trend and its ramifications is the subject of my talk on Thursday.)

In this regard, the Docker world has been particularly interesting of late: the domain is entirely open source, with many companies (including Joyent!) betting their futures not just on Docker, but on the many other technologies in the ecosystem. With so much bedrock suddenly forming, foundations were practically preordained — so it was no surprise to see the announcement of the Open Container Project at DockerCon just a few weeks ago. We at Joyent applaud these developments (and we are a charter member of the OCP), but I confess that the sprouting of foundations has left me feeling somewhat underwhelmed: are we really to have a foundation for every GitHub repo that reaches a certain level of popularity? To be clear, I don’t object to the foundations in the abstract so much as the cacophony of their putative missions: having the mission of a foundation be merely to promote a particular technology feels like aiming a bit low in Maslow’s hierarchy of needs. Now, one can certainly collect open source software into a foundation like the Apache Foundation — but as we move to a world where an increasing amount of software is open source, what becomes of its mission? Foundations that are amalgamations of otherwise unrelated software seem to me to run the risk of becoming open source orphanages: providing shelter and a modicum of structure, perhaps, but lacking a sense of collective purpose.

The promise of the Cloud-native Computing Foundation is that it offers a potential third model: while the foundation will serve as the new home for Kubernetes, it’s not limited to Kubernetes — nor is it an open source dumping ground. Rather, this foundation is dedicated to a particular ethos: the creation of the new kinds of application and (especially) service stacks that represent modern, server-side computing. That is, it is a foundation with a true mission: to advance key open source technologies that constitute modern, elastic computing. As such, it seeks to transcend any single technology — it has a raison d’être that runs deeper than mere self-preservation. I would like to think that this third path can serve as a model in the new, all-open world: foundations as entities that don’t let their corporate neutrality prevent them from being opinionated as to their mission, their constituent technologies or — importantly — their engineering values!

Posted on July 21, 2015 at 9:56 am by bmc · Permalink · Comments Closed
In: Uncategorized

Triton: Docker and the “best of all worlds”

When Docker first rocketed into the nerdosphere in 2013, some wondered how we at Joyent felt about its popularity. Having run OS containers in multi-tenant production for nearly a decade (and being one of the most vocal proponents of OS-based virtualization), did we somehow resent the relatively young Docker? Some were surprised to learn that (to the contrary!) we have been elated to see the rise of Docker: we share with Docker a vision for a containerized future, and we love that Docker has brought the technology to a much broader audience — and via an entirely different vector (namely, emphasizing developer agility instead of merely operational efficiency). Given our enthusiasm, you can imagine the question we posed to ourselves over a year ago: could we somehow combine the operational strength of SmartOS containers with the engaging developer experience of Docker? Importantly, we had no desire to develop a “better” Docker — we merely wanted to use SmartOS and SmartDataCenter as a substrate upon which to deploy Docker containers directly onto the metal. Doing this would leverage over a decade of deep operating systems engineering with technologies like Crossbow, ZFS, DTrace and (of course) Zones — and would deliver all of the operational advantages of pure OS-based virtualization to Docker containers: performance, elasticity, security and density.

That said, there was an obvious hurdle: while designed to be cross-platform, Docker is a Linux-borne technology — and the repository of Docker images is today a collection of Linux binaries. While SmartOS is Unix, it (somewhat infamously) isn’t Linux: applications need to be at least recompiled (if not ported) to work on SmartOS. Into this gap came a fortuitous accident: David Mackay, a member of the illumos community, attempted to revive LX-branded zones, an old Sun project that provided Linux emulation in a zone. While this project had been very promising when it was first done years ago, it had also been restricted to emulating a 2.4 Linux kernel for 32-bit binaries — and it was clear at the time that modernizing it was going to be significant work. As a result, the work sat unattended in the system for a while before being unceremoniously ripped out in 2010. It seemed clear that with the passage of time, this work would hardly be revivable: it had been so long that any resurrection was going to be tantamount to a rewrite.

But fortunately, David didn’t ask for our opinion before he attempted to revive it — he just did it. (As an aside: a tremendous advantage of open source is that the community can perform experiments that you might deem too risky or too expensive in terms of opportunity cost!) When David reported his results, we were taken aback: yes, this had the same limitations that it had always had (namely, 32-bit and lacking many modern Linux facilities), but given how many modern binaries still worked, it was also clear that this was a more viable path than we had thought. Energized by David’s results, Joyent’s Jerry Jelinek picked it up from there, reintegrating the Linux brand into SmartOS in March of last year. There was still much to do of course, but Jerry’s work was a start — and reflected the constraints we imposed on ourselves: do it all in the open; do it all on SmartOS master; develop general-purpose illumos facilities wherever possible; and aim to upstream it all when we were done.

Around this time, I met with Docker CTO Solomon Hykes to share our (new) vision. Honestly, I didn’t know what his reaction would be; I had great respect for what Docker had done and was doing, but didn’t know how he would react to a system bold enough to go its own way at such a fundamental level. Somewhat to my surprise, Solomon was incredibly supportive: not only was he aware of SmartOS, but he was also intimately familiar with zones — and he didn’t need to be convinced of the merits of our approach. Better, he asked a question near and dear to my heart: “Does this mean that I’ll be able to DTrace my Linux apps in a Docker container?” When I indicated that yes, that’s exactly what it would mean, he responded: “It will be the best of all worlds!” That Solomon (and by extension, Docker) was not merely willing but actually eager to see Docker on SmartOS was hugely inspirational to us, and we redoubled our efforts.

Back at Joyent, we worked assiduously under Jerry’s leadership over the spring and summer, and by the fall, we were ready for an attempt on the summit: 64-bit. Like other bringup work we’ve done, this work was terrifying in that we had very little forward visibility, and little ability to parallelize. As if he were Obi-Wan Kenobi meeting Darth Vader in the Death Star, Jerry had to face 64-bit — alone. Fortunately, Jerry didn’t suffer Ben Kenobi’s fate; by late October, he had 64-bit working! With the project significantly de-risked, everything kicked into high gear: Josh Wilsdon, Trent Mick and their team went to work understanding how to integrate SmartDataCenter with Docker; Josh Clulow, Patrick Mooney and I attacked some of the nasty LX-branded zone issues that remained; and Robert Mustacchi and Rob Gulewich worked towards completing their vision for network virtualization. Knowing what we were going to do — and how important open source is to modern infrastructure software in general and Docker in particular — we also took an important preparatory step: we open sourced SmartDataCenter and Manta.

Charged by having all of our work in the open and with a clear line of sight on what we wanted to deliver, we made rapid progress. One major question: where to run the Docker daemon? In digging into Docker, we saw that much of what the actual daemon did would need to be significantly retooled to be phrased in terms of not only SmartOS but also SmartDataCenter. However, our excavations also unearthed a gem: the Docker Remote API. Discovering a robust API was a pleasant surprise, and it allowed us to take a different angle: instead of running a (heavily modified) Docker daemon, we could implement a new SDC service to provide a Docker Remote API endpoint. To Docker users, this would look and feel like Docker — and it would give us a foundation that we knew we could develop. At this point, we’re pretty good at developing SDC-based services (microservices FTW!), and progress on the service was quick. Yes, there were some thorny issues to resolve (and definitely note the differences between our behavior and the stock Docker behavior!), but broadly speaking we have been able to get it to work without violating the principle of least surprise. And from a Docker developer perspective, having a Docker host that represents an entire datacenter — that is, a (seemingly) galactic Docker host — feels like an important step forward. (Many are as excited by this work as we are, but I think my favorite reaction is the back-handed compliment from Jeff Waugh of Canonical fame; somehow a compliment that is tied to an insult feels indisputably earnest.)
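To make the “look and feel like Docker” point concrete, here is a minimal sketch (mine, not the actual sdc-docker implementation) of what talking to such an endpoint amounts to: the stock Docker Remote API is just HTTP and JSON, so any client can ask a datacenter-wide endpoint to describe itself. The hostname and port below are hypothetical placeholders; a real deployment would sit behind TLS and authentication.

    # A minimal sketch: querying a Docker Remote API endpoint with nothing but
    # the Python standard library. "docker.example.com" and port 2375 are
    # hypothetical placeholders for wherever an sdc-docker-style endpoint
    # listens; a production endpoint would use TLS (conventionally port 2376).
    import http.client
    import json

    conn = http.client.HTTPConnection("docker.example.com", 2375)

    # GET /info is part of the stock Docker Remote API; against an endpoint
    # that fronts an entire datacenter, the answer describes the whole
    # (seemingly galactic) Docker host rather than a single machine.
    conn.request("GET", "/info")
    info = json.loads(conn.getresponse().read().decode("utf-8"))
    print(info.get("Containers"), "containers visible from this endpoint")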

With everything coming together, and with new hardware being stood up for the new service, there was one important task left: we needed to name this thing. (Somehow, “SmartOS + LX-branded zones + SmartDataCenter + sdc-portolan + sdc-docker” was a bit of a mouthful.) As we thought about names, I turned back to Solomon’s words a year ago: if this represented the best of two different worlds, what mythical creatures were combinations of different animals? While this search yielded many fantastic concoctions (a favorite being Manticore — and definitely don’t mess with Typhon!), there was one that stood out: Triton, son of Poseidon. As half-human and half-fish and a god of the deep, Triton represents the combination of two similar but different worlds — and as a bonus, the name rolls off the tongue and fits nicely with the marine metaphor that Docker has pioneered.

So it gives me great pleasure to introduce Triton to the world — a piece of (open source!) engineering brought to you by a cast of thousands, over the course of decades. In a sentence (albeit a wordy one), Triton lets you run secure Linux containers directly on bare metal via an elastic Docker host that offers tightly integrated software-defined networking. The service is live, so if you want to check it out, sign up! If you’re looking for more technical details, check out both Casey’s blog entry and my Future of Docker in Production presentation. If you’d like it on-prem, get in touch. And if you’d prefer to DIY, start with sdc-docker. Finally, forgive me one shameless plug: if you happen to be in the New York City area in early April, be sure to join us at the Container Summit, where we’ll hear perspectives from analysts like Gartner, enterprise users of containers like Lucera and Walmart, and key Docker community members like Tutum, Shopify, and Docker themselves. Should make for an interesting afternoon!
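For the curious, here is a rough sketch of what a single “docker run” boils down to against such an endpoint: a create call followed by a start call, both plain HTTP against the Docker Remote API. The endpoint name is again a hypothetical placeholder, and a real Triton endpoint would require TLS client certificates; this is only meant to illustrate the shape of the interaction, not our exact implementation.

    # A rough sketch of the two Remote API calls behind "docker run". The
    # endpoint is a hypothetical placeholder; the sketch assumes the image has
    # already been pulled and that the endpoint speaks plain HTTP.
    import http.client
    import json

    conn = http.client.HTTPConnection("docker.example.com", 2375)

    # POST /containers/create with a minimal container configuration.
    body = json.dumps({"Image": "ubuntu", "Cmd": ["echo", "hello from a container"]})
    conn.request("POST", "/containers/create", body,
                 {"Content-Type": "application/json"})
    container_id = json.loads(conn.getresponse().read().decode("utf-8"))["Id"]

    # POST /containers/(id)/start launches it; on Triton, that means a Linux
    # container running directly on the metal rather than inside a hardware VM.
    conn.request("POST", "/containers/" + container_id + "/start", "{}",
                 {"Content-Type": "application/json"})
    print("started", container_id, conn.getresponse().status)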

Welcome to Triton — and to the best of all worlds!

Posted on March 24, 2015 at 10:06 am by bmc · Permalink · One Comment
In: Uncategorized

SmartDataCenter and the merits of being opinionated

Recently, Randy Bias of EMC (formerly of CloudScaling) wrote an excellent piece on Why “Vanilla OpenStack” doesn’t exist and never will. If you haven’t read it and you are anywhere near a private cloud effort, you should consider it a must-read: Randy debunks the myth of a vanilla OpenStack in great detail. And it apparently does need debunking; as Randy outlines, those who are deploying an on-premises cloud expect:

  • A uniform, monolithic cloud operating system (like Linux)
  • A set of well-integrated and interoperable components
  • Interoperability with their own vendors of choice in hardware, software, and public cloud

We at Joyent can vouch for these expectations, because years ago we had the same aspirations for our own public cloud. Though perhaps unlike others, we have also believed in the operating system as a differentiator — and specifically, that OS containers are the foundation of elastic infrastructure — so we didn’t wait for a system to emerge, but rather endeavored to write our own. That is, given the foundation of our own container-based operating system — SmartOS — we set out to build exactly what Randy describes: a set of well-integrated, interoperable components on top of a uniform, monolithic cloud operating system that would allow us to leverage the economics of commodity hardware. This became SmartDataCenter, a container-centric distributed system upon which we built our own cloud and which we open sourced this past November.

The difference between SmartDataCenter and OpenStack mirrors the difference between the expectations for OpenStack and the reality that Randy outlines: where OpenStack is accommodating of many different visions for the cloud, SmartDataCenter is deliberately opinionated. In SmartDataCenter you don’t pick the storage substrate (it’s ZFS) or the hypervisor (it’s SmartOS) or the network virtualization (it’s Crossbow). While OpenStack deliberately accommodates swapping in different architectural models, SmartDataCenter deliberately rejects it: we designed it for commodity storage (shared-nothing — for good reason), commodity network equipment (no proprietary SDN) and (certainly) commodity compute. So while we’re agnostic with respect to hardware (as long as it’s x86-based and Ethernet-based), we are prescriptivist with respect to the software foundation that runs upon it. The upshot is that the integrator/operator retains control over hardware (and the different economic tradeoffs that that control allows), but needn’t otherwise design the system themselves — which we know from experience can greatly reduce deployment time. (Indeed, one of the great prides of SmartDataCenter is our ease of install: provided you’re racked, stacked and cabled, you can get a cloud stood up in a matter of hours rather than days, weeks or longer.)

So in apparent contrast to OpenStack, SmartDataCenter only comes in “vanilla” (in Randy’s parlance). This is not to say that SmartDataCenter is in any way plain; to the contrary, by having such a deliberately designed foundation, we can unlock rapid innovation, viz. our emerging Docker integration with SmartDataCenter that allows for Docker containers to be deployed securely and directly on the metal. We are very excited about the prospects of Docker on SmartDataCenter, and so are other people. So inasmuch as SmartDataCenter is vanilla, it definitely comes with whipped cream and a cherry on top!

Posted on February 5, 2015 at 5:45 pm by bmc · Permalink · Comments Closed
In: Uncategorized