The DIRT on JSConf.eu and Surge

I just got back from an exciting (if exhausting) conference double-header: JSConf.eu and Surge. These conferences made for an interesting pairing, together encompassing much of the enthusiasm and challenges in today’s systems. It started in Berlin with JSConf.eu, a conference that reflects both JavaScript’s energy and diversity. Ryan introduced Node.js to the world at JSConf.eu a year ago, and enthusiasm for Node at the conference was running high; I would estimate that at least half of the attendees were implementing server-side JavaScript in one fashion or another. Presentations from the server side were epitomized by Felix and Philip, each of whom presented real things done with Node — expanding on the successes, limitations and futures of what they had developed. The surge of interest in the server side is of course changing the complexion not only of JSConf.eu but of JavaScript itself, a theme that Chris Williams hit with fervor in his closing plenary, in which he cautioned of the dangers for a technology and a community upon which a wave of popularity is breaking. Having been near the epicenter of Java’s irrational exuberance in the mid-1990s (UltraJava, anyone?), I certainly share some of Chris’s concerns — and I think that Ryan’s talk on the challenges in Node reflects a humility that will become essential in the coming year. That said, I am also looking forward to seeing the JavaScript and Node communities expand to include more of the systems vets that have differentiated Node to date.

Speaking of systems vets, a highlight of JSConf.eu was hanging out with Erik Corry from the V8 team. JavaScript and Node owe a tremendous debt of gratitude to V8, without which we would have neither the high-performance VM itself nor the broader attention to JavaScript performance so essential to the success of JavaScript on the server side. Erik and I brainstormed ideas for deeper support of DTrace within V8, which is, of course, an excruciatingly hard problem: as I have observed in the past, magic does not layer well — and DTrace and a sophisticated VM like V8 are both conjuring deep magic that frequently operates at cross purposes. Still, it was an exciting series of conversations, and we at Joyent certainly look forward to taking a swing at this problem.

After JSConf.eu wrapped up, I spent a few days brainstorming and exploring Berlin (and its Trabants and Veyrons) with Mark, Pedro and Filip. I then headed to the inaugural Surge Conference in Baltimore, and given my experience there, I would put the call out to the systems tribe: Surge promises to be the premier conference for systems practitioners. We systems practitioners have suffered acutely at the hands of academic computer science’s foolish obsession with conferences as a publishing vehicle, having seen conference after conference of ours become casualties on the bloody path to tenure. Surge, by contrast, was everything a systems conference should be: exciting talks on real systems by those who designed, built and debugged them — and a terrific hallway track besides.

To wit: I was joined at Surge by Joyent VP of Operations Ryan Nelson; for us, Surge more than paid for itself during Rod Cope’s presentation on his ten lessons for Hadoop deployment. Rod’s lessons were of the ilk that only come the hard way; his was a presentation fired in a kiln of unspeakable pain. For example, he noted that his cluster had been plagued by a wide variety of seemingly random low-level problems: lost interrupts, networking errors, I/O errors, etc. Apparently acting on a tip in a Broadcom forum, Rod disabled C-states on all of his Dell R710s and 2970s — and the problems all lifted (and have been absent for four months). Given that we have recently lived through the same (and came to the same conclusion, albeit more recently and therefore much more tentatively), I tweeted what Rod had just reported, knowing that Ryan was in John Allspaw’s concurrent session and would likely see the tweet. Sure enough, I saw him retweet it minutes later — and he practically ran in after the end of Rod’s talk to see if I was kidding. (When it comes to firmware- and BIOS-level issues, I assure you: I never kid.) Not only was Rod’s confirmation of our experience tremendously valuable, it also goes to the kind of technologist at Surge: this is a conference with practitioners who are comfortable not only at lofty architectural heights, but also at the grittiest of systems’ depths.

Beyond the many thought-provoking sessions, I had great conversations with many people, including Paul, Rod, Justin, Chris, Geir, Stephen, Wez, Mark, Joe, Kate, Tom, Theo (natch) — and a reunion (if sodden) with surprise stowaway Ted Leung.

As for my formal role at Surge, I had the honor of giving one of the opening keynotes. In preparing for it, I tried to put it all together: what larger trends are responsible for the tremendous interest in Node? What are the implications of these trends for other elements of the systems we build? From Node Knockout, it is clear that a major use case for Node is web-based real-time applications. At the moment these are primarily games — but as practitioners like Matt Ranney can attest, games are by no means the only real-time domain in which Node is finding traction. In part based on my experience developing the backend for the Node Knockout leaderboard, the argument that I made in the keynote is that these web-based real-time applications have a kind of orthogonal data consistency: instead of being on the ACID/BASE axis, the “consistency” of the data is its temporality. (That is, if the data is late, it’s effectively inconsistent.) This is not necessarily new (market-facing trading systems have long had these properties), but I believe we’re about to see these systems brought to a much larger audience — and developed by a broader swath of practitioners. And of course — like AJAX before it — this trend was begging for an acronym. Struggling to come up with something, I went back to what I was trying to describe: these are real-time systems — but they are notable because they are data-intensive. Aha: “data-intensive real-time” — DIRT! I dig DIRT! I don’t know if the acronym will stick or not, but I did note that it was mentioned in each of Theo’s and Justin’s talks — and this was just minutes after it was coined. (Video of this keynote will be available soon; slides are here.)

Beyond the keynote, I also gave a talk on the perils of building enterprise systems from commodity components. This blog entry has gone on long enough, so I won’t go into depth on this topic here, saying only this: fear the octopus!

Thanks to the organizers of both JSConf.eu and Surge for two great conferences. I arrive home thoroughly exhausted (JSConf parties longer, but Surge parties harder; pick your poison), but also exhilarated and excited about the future. Thanks again — and see you next year!

Posted on October 2, 2010 at 3:33 pm by bmc

A physician’s son

My father is an emergency medical physician, a fact that has had a subtle but discernible influence on my career as a software engineer. Emergency medicine and software engineering are of course very different problems, and even though there are times when a major and cascading software systems failure can make a datacenter feel like an urban emergency room on a busy Saturday night, the reality is that if we software engineers screw up, people don’t generally die. (And while code and machine can both be metaphorically uncooperative, we don’t really have occasion to make a paramedic sandwich.) Not only are the consequences less dire, but the underlying systems themselves are at opposite ends of a kind of spectrum: one is deterministic yet brittle, the other sloppy yet robust. Despite these differences, I believe that medicine has much to teach us in software engineering; two specific (if small) artifacts of medicine that I have incorporated into my career are M&M and Journal Club.

M&M and Journal Club are two ways that medicine has influenced software engineering (or mine, anyway); what of the influences of information systems on the way medicine is practiced? To explore this, we at ACM Queue sought to put together an issue on computing in healthcare. This is a brutally tough subject for us to tackle — what in the confluence of healthcare and computing is interesting for the software practitioner? — but we felt it to be a national priority and too important to ignore. It should be said that I was originally introduced to computing through medicine: in addition to being an attending physician, my father — who started writing code as an undergraduate in the 1960s on an IBM System/360 Model 50 — also developed the software system that ran his emergency department; the first computer we owned (a very early IBM PC-XT) was bought to allow him to develop software at home (Microsoft Pascal 1.0 FTW!). I had intended to keep my personal history out of the ACM discussion, but at some point, someone on the Board said that what we really needed was the physician’s perspective — someone who could speak to the practical difficulties of integrating modern information systems into our healthcare system. At that, I couldn’t hold my tongue — I obviously knew of someone who could write such an article…

Dad graciously agreed, and the resulting article, Computers in Patient Care: the Promise and the Challenge appears as the cover story on this month’s Communications of the ACM. I hope you enjoy this article as much as I did — and who knows: if you are developing software that aspires to be used in healthcare, perhaps it could be the subject of your own Journal Club. (Just be sure to slip the kids a soda or two!)

Posted on September 24, 2010 at 2:23 am by bmc

DTrace, node.js and the Robinson Projection

When I joined Joyent, I mentioned that I was seeking to apply DTrace to the cloud, and that I was particularly excited about the development of node.js — leaving it implicit that the intersection of the two technologies would be naturally interesting. As it turns out, we have had an early opportunity to show the potential here: as you might have seen, the Node Knockout programming contest was held over the weekend; when I first joined Joyent (but four weeks ago!), Ryan was very interested in potentially using DTrace to provide a leaderboard for the competition. I got to work, adding USDT probes to node.js. To be fair, this still has some disabled overhead (namely, getting into and out of the node addon that has the true USDT probe), but it’s sufficiently modest to deploy DTrace-enabled nodes in production.
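(The probes themselves live in the addon’s C++ shim, so there is no JavaScript to show for them here — but as a purely illustrative sketch, using the dtrace-provider addon rather than the mechanism actually used in node, the USDT pattern from JavaScript looks something like this. The trick for keeping disabled overhead low is that probe arguments are built in a callback that runs only when the probe is actually enabled:)

var d = require('dtrace-provider');

// Create a USDT provider with a single probe whose two arguments
// are the HTTP method and URL.
var dtp = d.createDTraceProvider('nodeapp');
dtp.addProbe('http-server-request', 'char *', 'char *');
dtp.enable();

function onRequest(req) {
        // The callback fires only if the probe is enabled, so a
        // disabled probe costs little more than a conditional.
        dtp.fire('http-server-request', function () {
                return ([ req.method, req.url ]);
        });
}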

And thanks to incredibly strong work by Joyent engineers, we were able to stand up a new node.js service that allocated a container per user. This service allowed us to make a DTrace-enabled node available to contestants — and then observe all of it from the global zone.

As an example of the DTrace provider for node.js, here’s a simple enabling that prints HTTP requests as zones handle them (running on one of the Node Knockout machines):

# dtrace -n 'node*:::http-server-request{printf("%s: %s of %s\n", \
    zonename, args[0]->method, args[0]->url)}' -q
nodelay: GET of /poll6759.479651377309
nodelay: GET of /poll6148.392275444794
nodebodies: GET of /latest/
nodebodies: GET of /latest/
nodebodies: GET of /count/
nodebodies: GET of /count/
nodelay: GET of /poll8973.863890386003
nodelay: GET of /poll2097.9667574643568
awesometown: GET of /graphs/4c7a650eba12e9c41d000005.js
awesometown: POST of /graphs/4c7a650eba12e9c41d000005/appendValue
awesometown: GET of /graphs/4c7acd5ca121636840000002.js
awesometown: GET of /graphs/4c7a650eba12e9c41d000005.js
awesometown: GET of /graphs/4c7a650eba12e9c41d000005.js
awesometown: GET of /graphs/4c7a650eba12e9c41d000005.js
awesometown: GET of /graphs/4c7b2408546a64b81f000001.js
awesometown: POST of /faye
awesometown: POST of /faye
...

I added probes around both HTTP request and HTTP response; treating the file descriptor as a token that uniquely describes a request while it is pending (an assumption that would only be invalid in the presence of HTTP pipelining) allows one to determine the latency for requests:

# cat http.d
#pragma D option quiet

node*:::http-server-request
{
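        /* the fd serves as a token for the pending request */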
        ts[this->fd = args[1]->fd] = timestamp;
        vts[this->fd] = vtimestamp;
}

node*:::http-server-response
/this->ts = ts[this->fd = args[0]->fd]/
{
        @t[zonename] = quantize(timestamp - this->ts);
        @v[zonename] = quantize(vtimestamp - vts[this->fd]);
        ts[this->fd] = 0;
        vts[this->fd] = 0;
}

tick-1sec
{
        printf("Wall time:\n");
        printa(@t);

        printf("CPU time:\n");
        printa(@v);
}

This script makes the distinction between wall time and CPU time; for wall time, you can see the effect of long-polling, e.g. (the values are nanoseconds):

    nodelay
           value  ------------- Distribution ------------- count
           32768 |                                         0
           65536 |                                         4
          131072 |@@@@@                                    52
          262144 |@@@@@@@@@@@@@@@@@@                       183
          524288 |@@@@@                                    55
         1048576 |@@@                                      27
         2097152 |@                                        9
         4194304 |                                         5
         8388608 |@                                        8
        16777216 |@                                        6
        33554432 |@                                        9
        67108864 |@                                        7
       134217728 |@                                        12
       268435456 |@                                        11
       536870912 |                                         1
      1073741824 |                                         4
      2147483648 |                                         1
      4294967296 |                                         5
      8589934592 |                                         0
     17179869184 |                                         1
     34359738368 |                                         1
     68719476736 |                                         0

You can also look at the CPU time to see those that are doing more actual work. For example, here is one zone with interesting CPU time outliers:

  nodebodies
           value  ------------- Distribution ------------- count
         4194304 |                                         0
         8388608 |@@@@@@@@@@@@                             57
        16777216 |@@@@                                     21
        33554432 |@@@@                                     18
        67108864 |@@@@@@@                                  34
       134217728 |@@@@@@@@@@@                              54
       268435456 |                                         0
       536870912 |                                         0
      1073741824 |                                         0
      2147483648 |                                         0
      4294967296 |@                                        3
      8589934592 |@                                        4
     17179869184 |                                         0

Note that because node does all of its processing on a single thread, we cannot assume that the requests themselves are inducing the work — only that CPU work was done between request and response. Still, this data would probably be interesting to the nodebodies team…

I also added probes around connection establishment; so here’s a simple way of looking at new connections by zone:

# dtrace -n 'node*:::net-server-connection{@[zonename] = count()}'
dtrace: description 'node*:::net-server-connection' matched 44 probes
^C

  explorer-sox                                                      1
  nodebodies                                                        1
  anansi                                                           69
  nodelay                                                         102
  awesometown                                                     146

Or if we wanted to see which IP addresses were connecting to, say, our good friends at awesometown (with actual addresses in the output elided):

# dtrace -n 'node*:::net-server-connection \
    /zonename == "awesometown"/{@[args[0]->remoteAddress] = count()}'
dtrace: description 'node*:::net-server-connection' matched 44 probes
  XXX.XXX.XXX.XXX                                                   1
  XX.XXX.XX.XXX                                                     1
  XX.XXX.XXX.XXX                                                    1
  XX.XXX.XXX.XX                                                     1
  XXX.XXX.XX.XXX                                                    1
  XXX.XXX.XX.XX                                                     2
  XXX.XXX.XXX.XX                                                    8

Ryan saw the DTrace support I had added, and had a great idea: what if we took the IPs of incoming connections and geolocated them, throwing them on a world map and coloring them by team name? This was an idea that was just too exciting not to take a swing at, so we got to work. For the backend, the machinery was begging to be written in node itself, so I wrote a libdtrace addon for node and started building a scalable backend for processing the DTrace data from the different Node Knockout machines. Meanwhile, Joni came up with some mockups that had everyone drooling, and Mark contacted Brian from Nitobi about working on the front-end. Brian and crew were as excited about it as we were, and they put front-end engineer extraordinaire Yohei on the case — who worked with Rob on the Joyent side to pull it all together. Among Rob’s other feats, he managed to implement in JavaScript the logic for plotting longitude and latitude in the beautiful Robinson projection — which is a brutally complicated transformation. It was an incredible team, and we pulled it off in such a short period of time and with such a firm deadline that we often felt like contestants ourselves!
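(Rob’s actual code isn’t reproduced here, but to give a flavor of why the Robinson projection is such a slog: it is defined not by a closed-form formula but by a table of coefficients at every five degrees of latitude, between which one must interpolate. Here is a rough sketch — with simple linear interpolation standing in for the smoother interpolation the canonical definition calls for, and with names of my own invention:)

// Robinson projection coefficients at 5-degree latitude intervals:
// X is the length of the parallel relative to the equator; Y is the
// distance of the parallel from the equator (1.0 at the poles).
var X = [ 1.0000, 0.9986, 0.9954, 0.9900, 0.9822, 0.9730, 0.9600,
    0.9427, 0.9216, 0.8962, 0.8679, 0.8350, 0.7986, 0.7597, 0.7186,
    0.6732, 0.6213, 0.5722, 0.5322 ];
var Y = [ 0.0000, 0.0620, 0.1240, 0.1860, 0.2480, 0.3100, 0.3720,
    0.4340, 0.4958, 0.5571, 0.6176, 0.6769, 0.7346, 0.7903, 0.8435,
    0.8936, 0.9394, 0.9761, 1.0000 ];

// Map a latitude/longitude (in degrees) to x/y coordinates on a map
// of the given width in pixels, origin at the upper left.
function robinson(lat, lon, width)
{
        var abslat = Math.abs(lat);
        var i = Math.min(Math.floor(abslat / 5), 17);
        var frac = (abslat - i * 5) / 5;
        var xl = X[i] + (X[i + 1] - X[i]) * frac;
        var yl = Y[i] + (Y[i + 1] - Y[i]) * frac;
        var R = width / (0.8487 * 2 * Math.PI);

        return ({
                x: width / 2 + 0.8487 * R * xl * (lon * Math.PI / 180),
                y: 1.3523 * R * (1 - (lat < 0 ? -yl : yl))
        });
}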

The result — which, it must be said, works best in Safari and Chrome — is at http://leaderboard.no.de. In keeping with both the spirit of node and DTrace, the leaderboard is updated in real-time; from the time you connect to one of the Joyent-hosted (no.de) contestants, you should see yourself show up in the map in no more than 700 milliseconds (plus your network latency). For crowded areas like the Bay Area, it can be hard to see yourself — but try moving to Cameroon for best results. It’s fun to watch as certain contestants go viral (try both hovering over a particular data point and clicking on the team name in the leaderboard) — and you can know which continent you’re cursing at in http://saber-tooth-moose-lion.no.de (now known to the world as Swarmation).

Enjoy both the leaderboard and the terrific Node Knockout entries (be sure to vote for your favorites!) — and know that we’ve only scratched the surface of what DTrace and node.js can do together!

Posted on August 30, 2010 at 2:55 am by bmc

The liberation of OpenSolaris

As many have seen, Oracle has elected to stop contributing to OpenSolaris. This decision is, to put it bluntly, stupid. Indeed, I would (and did) liken it to L. Paul Bremer’s decision to disband the Iraqi military after the fall of Saddam Hussein: beyond merely a foolish decision borne out of a distorted worldview, it has created combatants unnecessarily. As with Bremer’s infamous decision, the bitter irony is that the new combatants were formerly the strongest potential allies — and in Oracle’s case, it is the community itself.

As it apparently needs to be said, one cannot close an open source project — one can only fork it. So contrary to some reports, Oracle has not decided to close OpenSolaris; they have actually decided to fork it. That is, they have (apparently) decided that it is more in their interest to compete with the community than to cooperate with it — that they can in fact out-innovate the community. This confidence is surprising (and ironic) given that it comes exactly at the moment that the historic monopoly on Solaris talent has been indisputably and irrevocably broken — as most recently demonstrated by the departure of my former colleague, Adam Leventhal.

Adam’s case is instructive: Adam is a brilliantly creative engineer — one with whom it was my pleasure to work closely over nearly a decade. Time and time again, I saw Adam not only come up with innovative solutions to tough problems, but run those innovations through the punishing gauntlet that separates idea from product. One does not replace an engineer like Adam; one can only hope to grow another. And given his nine years of experience at the company and in the guts of the system, one cannot expect to grow a replacement quickly — if at all. Oracle’s loss, however, is the community’s gain; I hope I’m not tipping his hand too much to say that Adam will continue to be deeply engaged in the system, leading a new generation of engineers — but this time within a larger community that spans multiple companies and interests.

And in this way, odd as it may be, Oracle’s decision to fork is actually a relief to those of us whose businesses depend on OpenSolaris: instead of waiting for Oracle to engage the community, we can be secure in the knowledge that no engagement is forthcoming — and we can invest and plan accordingly. So instead of waiting for Oracle to fix a nagging driver bug or address a critical request for enhancement (a wait that has more often than not ended in disappointment anyway), we can tap our collective expertise as a community. And where that expertise doesn’t exist or is otherwise unavailable, those of us who are invested in the system can explicitly invest in building it — and then use it to give back to the community and contribute.

Speaking for Joyent, all of this has been tangibly liberating: just the knowledge that we are going to be cranking our own builds has allowed us to start thinking along new dimensions of innovation, giving us a renewed sense of control over our stack and our fate. I have already seen this shift in our engineers, who have begun to conceive of ideas that might not have been thought practical in a world in which Oracle’s engagement was so uncertain. Yes, hard problems lie ahead — but ideas are flowing, and the future feels alive with possibility; in short, innovation is afoot!

Posted on August 19, 2010 at 9:49 pm by bmc

The node.js demographic

I went to the node.js meetup last night in Palo Alto, and it was an interesting affair on several levels. First (and least surprisingly), it was packed, with the Sencha folks joking that they would need to move to a bigger space just to be able to host the event. Second, the technical content itself was intriguing, with fellow Joyeur (and node BDFL) Ryan on dealing with flow control in node; Jed on (fab); future fellow Joyeur Isaac on npm; and Tim demo’ing some Connect-based apps, including a simple web-based shared world app in which the room could (and did) participate. Not surprisingly, the performance of this last demo was snappy under load — so much so that it merits repeating an observation that many are currently making: it is increasingly clear that an early space — if not the first — in which we are going to see broad deployment of node-based apps is online social gaming, a space in which node represents a decisive competitive advantage by offering the potential for much more interactive (and more social) gameplay, and one in which there is substantial code churn to begin with. (And of course, speaking from Joyent’s perspective, this is a fortunate confluence: online gaming is also a space that sorely needs the elasticity that the cloud alone can provide.)

So the attendance and content were certainly notable, but most interesting of all to me was the demographic: given that node has become something of the latest hotness (and especially given that its being in JavaScript gives it a pretty wide net), one might expect node’s enthusiasts to be amateurs or novices. That this was emphatically not the case was clear to me shortly after arriving, when I had the unexpected pleasure of reuniting with fellow CS169 head TA Peter Griess. Not to be overly chummy or clubby, but walking into a meetup and seeing one of the tribe tells you something immediate about not just the room, but the technology itself: that it is not mere syntactic sugar or iconoclasm for its own sake, but rather a true revolution in the way certain classes of systems are designed and built. And indeed, over the course of the evening, it became clear that within the room there was an impressive amount of actual experience deploying real systems, with seasoned technologists like Matt Ranney who aren’t merely writing new apps in node; they are rewriting old apps in node. This is a key point, and it goes to the fact that node is not just an easier way of doing things (though that too, certainly), but rather that it offers such a vastly improved runtime that it merits reevaluation of systems that one has already built and deployed.

To me, the systems experience in the room offered an implicit rebuttal to some of the inane criticism of node — criticism that essentially amounts to discrediting node merely because of its newness or its popularity. (And even more enlightened criticism ultimately disappoints with what essentially amounts to an attack on the basis of style, not substance.) To be sure, node is still a young technology, and there is much engineering work still to be done. (For a concrete example of this, see Paul’s description of the SSL problem.) But with so much deep systems experience in the community — and with the healthy, collaborative vibe that was on display last night — it’s hard to be anything but optimistic!

Posted on August 11, 2010 at 10:16 am by bmc

OpenSolaris and the power to fork

Back when Solaris was initially open sourced, there was a conscious effort to be mindful of the experiences of other projects. In particular — even though it was somewhat of a paradox — it was understood how important it was for the community to have the power to fork the operating system. As I wrote in January, 2005:

If there’s one thing we’ve learned from watching Linux, it’s to not become forkophobic. Paradoxically, in an environment where forks are actively encouraged (e.g. Linux) forking seems to be less of a problem than in environments where forking is viewed as apostasy (e.g. BSD).

Unfortunately — and now in hindsight — we know that OpenSolaris didn’t go far enough: even though the right to fork was understood, there was not enough attention paid to the power to fork. As a result, the operating system never quite got to being 100% open: there remained some annoying (but essential) little bits that could not be opened for one historical (i.e., legal) reason or another. When coupled with the fact that Sun historically had a monopoly or near-monopoly on Solaris engineering talent, the community was entirely deprived of the oxygen that it would have needed to exercise its right to fork.

But change is afoot: over the last six months, the monopoly over Solaris engineering talent has been broken. And now today, we as a community have turned an important corner with the announcement of the Illumos project. Thanks to the hard work of Garrett D’Amore and his band of co-conspirators, we have the beginning of open sourced variants of those final bits that will allow for not just the right but the power to fork. Not that anyone wants to set out to fork the system, of course, but that power is absolutely essential for the vitality of any open source community — and so will be for ours. Kudos to Garrett and crew; on behalf of all of us in the community, thank you!

Posted on August 3, 2010 at 10:34 am by bmc

Hello Joyent!

As I mentioned in my farewell to Sun, I am excited by the future; as you may have seen, that future is joining Joyent as their VP of Engineering.

So, why Joyent? I have known Joyeurs like Jason, Dave, Mark and Ben since back when the “cloud” was still just something that you drew up on a whiteboard as a placeholder for the in-between crap that someone else was going to build and operate. But what Joyent was doing was very much what we now call cloud computing — it was just that in describing Joyent in those pre-cloud days, I found it difficult to convey exactly why what they were doing was exciting (even though to me it clearly was). I found that my conversations with others about Joyent always ended up in the ditch of “virtual hosting”, a label that grossly diminished the level of innovation that I saw in Joyent. Fortunately for my ability to explain the company, “cloud” became infused with much deeper meaning — one that matched Joyent’s vision for itself.

So Joyent was cloud before there was cloud, but so what? When I started to consider what was next for me, one of the problems that I kept coming back to was DTrace for the cloud. What does dynamic instrumentation look like in the cloud? How do you make data aggregation and retention scale across many nodes? How do you support the ad hoc capabilities that make DTrace so powerful? And how do you visualize the data in a way that allows those ad hoc queries to be visually phrased? To me, these are very interesting questions — but looking around the industry, it didn’t seem that too many of the cloud providers were really interested in tackling these problems. However, in a conversation at my younger son’s third birthday party with Joyeur (and friend) Rod Boothby, it became clear that Joyent very much shared my enthusiasm for this problem — and more importantly, that they had made the right architectural decisions to allow for solving it.

My conversation with Rod kicked off more conversations, and I quickly learned that this was not the Joyent that I had known — that the company was going through a very important inflection point whereby they sought a leadership position in innovating in the cloud. To match this lofty rhetoric, the company has a very important proof point: the hiring of Ryan Dahl, inventor and author of node.js.

Before getting into the details of node.js, one should know that I am a JavaScript lover. (If you didn’t already know this about me, you might be somewhat surprised by this — and indeed, there was a time when such a confession would have to be whispered, if it could be said at all — but times have changed, and I’m loud and proud!) My affection for the language came about over a number of years, and crescendoed at Fishworks when I realized that I needed to rewrite our CLI in JavaScript. And while I’m not sure if I’m the only person or even the first to write JavaScript that was designed to be executed over a 9600 baud TTY, it sure as hell felt like I was a pioneer in some perverse way…

Given my history, I clearly have a natural predisposition towards server-side JavaScript — but node.js is much more than that: its event-driven model coupled with the implicit single-threadedness of JavaScript constrains the programmer into a model that allows for highly scalable control logic, but with only sequential code. (For more on this, see Ryan’s recent Google tech talk — though I have no idea what was meant when Ryan was introduced as “controversial”.) This idea — that one can (and should!) build a concurrent system out of sequential components — is one that Jeff and I discussed in our ACM Queue article on real-world concurrency:

To make this concrete, in a typical MVC (model-view-controller) application, the view (typically implemented in environments such as JavaScript, PHP, or Flash) and the controller (typically implemented in environments such as J2EE or Ruby on Rails) can consist purely of sequential logic and still achieve high levels of concurrency, provided that the model (typically implemented in terms of a database) allows for parallelism. Given that most don’t write their own databases (and virtually no one writes their own operating systems), it is possible to build (and indeed, many have built) highly concurrent, highly scalable MVC systems without explicitly creating a single thread or acquiring a single lock; it is concurrency by architecture instead of by implementation.

But Ryan says all that much more concisely at 21:40 in the talk: “there’s this great thing in Unix called ‘processes.’” Amen! So node.js to me represents a confluence of many important ideas — and it’s clean, well-implemented, and just plain fun to work with.
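To make the model concrete, consider the canonical node HTTP server of the era — a minimal sketch, not anything in particular from the Joyent stack:

var http = require('http');

// The handler is plain sequential code; concurrency across requests
// comes from the event loop, not from threads or locks.
http.createServer(function (req, res) {
        res.writeHead(200, { 'Content-Type': 'text/plain' });
        res.end('hello world\n');
}).listen(8124);

Every request runs through this same sequential logic on a single thread — and scaling beyond one core is then a matter of running more processes, which is exactly Ryan’s point.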

While I am excited about node.js, it’s more than just a great idea that’s well executed — it also represents Joyent’s vision for itself as an innovator up and down the stack. One can view node.js as being to Joyent what Java was to Sun: transforming the company from one confined to a certain layer into a true systems company that innovates up and down the stack. Heady enough, but if anything this analogy understates the case: Joyent’s development of node.js is not merely an outgrowth of an innovative culture, but also a reflection of a singular focus to deliver on the economies of scale that are the great promise of cloud computing.

Add it all up — the history in the cloud space, the disposition to solve the tough cloud problems that I want to solve, like instrumentation and observability, and the exciting development of node.js — and you have a company in Joyent that I believe could be the next great systems company, and I’m deeply honored (and incredibly excited) to be a part of it!

Posted on July 30, 2010 at 1:46 pm by bmc

Good-bye, Sun

In February 1996, I came out to Sun Microsystems to interview for a job knowing only two things: that I wanted to do operating systems kernel development — and that I didn’t particularly want to work for Sun. I was right on the first count, but knew I was wrong on the second just moments into my first conversation with Jeff. He was emphatic that I should join him in forging the future, sharing both my enthusiasm for what was possible and my disdain for the broken, busted and boogered-up. Fourteen years later, I don’t for a moment regret my decision to join Jeff and Sun: we fostered an environment where the OS was viewed not as a regrettable drag on progress, but rather as a nexus of innovation — incubating technologies that today make a real difference in people’s lives.

In 2006, itching to try something new, Mike and I talked the company into taking the risk of allowing several of us to start Fishworks. That Sun supported our endeavor so enthusiastically was the company at its finest: empowering engineers to tackle hard problems, and inspiring them to bring innovative solutions to market. And with the budding success of the 7000 Series, I would like to believe that we made good on the company’s faith in us — and more generally on its belief in innovation as differentiator.

Now the time has come for me to venture again into something new — but this time it is to be beyond the company’s walls. This is obviously with mixed emotion; while I am excited about the future, it is very difficult for me personally to leave a company in which I have had such close relationships with so many. One of Sun’s greatest strengths was that we technologists were never discouraged from interacting directly and candidly with our customers and users, and many of our most important innovations came from these relationships. This symbiosis was critically important at several junctures of my own career, and I owe many of you a profound debt of gratitude — both for your counsel over the years, and for your willingness to bet your own business and livelihood on the technologies that I helped develop. You, like us, are innovators who love nothing more than great technology, and your steadfast faith in us means more to me than I can express; thank you.

As for my virtual address, it too is changing. This post will be my last at blogs.sun.com; in the future, you can find my blog at its new (permanent) home: http://dtrace.org/blogs/bmc (where comments on this entry will be open). As for e-mail, you can find me at the first letter of my first name concatenated with my last name at acm.org.

Thank you again for everything; take care — and stay in touch!

Posted on July 25, 2010 at 5:17 pm by bmc

Turning the corner

It’s a little hard to believe that it’s been only fifteen months since we shipped our first product. It’s been a hell of a ride; there is nothing as exhilarating nor as exhausting as having a newly developed product that is both intricate and wildly popular. Especially in the domain of enterprise storage — where perfection is not just the standard but (entirely reasonably) the expectation — this makes for some seriously spiked punch.

For my own part, I have had my head down for the last six months as the Technical Lead for our latest software release, 2010.Q1, which is publicly available as of today. In my experience, I have found that in software (if not in life), one may only ever pick two of quality, features and schedule — and for 2010.Q1, we very much picked quality and features. (As for schedule, let it be only said that this release was once known as “2009.Q4”…)

2010.Q1 Quality

You don’t often see enterprise storage vendors touting quality improvements for a very simple reason: if the product was perfect when you sold it to me, why are you talking about how much you’ve improved it? So I’m going to break a little bit with established tradition and acknowledge that the product has not been perfect, though not without good reason. With our initial development of the product, we were pushing many new technologies very aggressively: not only did we seek to build enterprise-grade storage on commodity components (a deceptively daunting challenge in its own right), we were also building on entirely new elements like flash — and then topped it all off with an ambitious, from-scratch management stack. What were we possibly thinking by making so many bets at once? We made these bets not out of recklessness, but rather because they were essential elements of our Big Bet: that customers were sick of paying monopoly rents for enterprise storage, and that we could deliver a quantum leap in price-performance. (And if nothing else, let it be said that we got that one very, very right — seemingly too right, at times.) As for the specific technology bets, some have proven to be unblemished winners, while others have been more of a struggle. Sometimes the struggle was because the problem was hard, sometimes it was because the software was immature, and sometimes it was because a component that was assumed to have known failure modes had several (or many) unanticipated (or byzantine) failure modes. And in the worst cases, of course, it was all three…

I’m pleased to report that in 2010.Q1, we turned the corner on all fronts: in addition to just fixing a boatload of bugs in key areas like clustering and networking, we engaged in fundamental work like Dave’s rearchitecture of remote replication, adapted to new device failure modes as with Greg’s rearchitecture around resilience to HBA logic failure, and — perhaps most importantly — integrated critical firmware upgrades to each of the essential components of the I/O path (HBAs, SIM cards and disks). Also in 2010.Q1, we changed the way that we run the evaluation of the software, opening the door to many in our rapidly growing customer base. As a result, this release is already running on more customer production systems than any of its predecessors were at the time that they shipped — and on many more eval and production machines within our own walls.

2010.Q1 Features

But as important as quality is to this release, it’s not the full story: the release is also packed with major features like deduplication, iSER/SRP support, Kerberized NFS support and Fibre Channel support. Of these, the last is of particular interest to me because, in addition to my role as the Technical Lead for 2010.Q1, I was also responsible for the integration of FC support into the product. There was a lot of hard work here, but much of it was borne by John Forte and his COMSTAR team, who did a terrific job not only on the SCSI Target Management facility (STMF) but also on the base ALUA support necessary to allow proper FC operation in a cluster. As for my role, it was fun to cut the code to make all of this stuff work. Thanks to some great design work by Todd Patrick, along with some helpful feedback from field-facing colleagues like Ryan Matthews, I think we came up with a clean, functional interface. And working closely with both John and our test team, we have developed a rock-solid FC product. But of course (and as one might imagine), for me personally, the really gratifying bit was adding FC support to analytics. With just a pinch of DTrace and a bit of glue code, we now have visibility into FC operations by LUN, by project, by target, by initiator, by operation, by SCSI command, by size, by offset and by latency — and by any combination thereof.

As I was developing FC analytics, I would use as my source of load a silly disk benchmark I wrote back in the day when Adam and I were evaluating SSDs. Here, for example, is that benchmark running against a LUN that I named “thicktail-bench”:

The initiator here is the machine “thicktail”; it’s interesting to break down by initiator and see the paths by which thicktail is accessing the LUN:

(These names are human readable because I have added aliases for each of thicktail’s two HBA ports. Had I not added those aliases, we would see WWNs here.) The above shows us that thicktail is accessing the LUN through both of its paths, which is what we would expect (but good to visually confirm). Let’s see how it’s accessing the LUN in terms of operations:

Nothing too surprising here — this is the write phase of the benchmark and we have no log devices on this system, so we fully expect this. But let’s break down by offset:

The first time I saw this, I was surprised. Not because of what it shows — I wrote this benchmark, and I know what it does — but rather because it was so eye-popping to really see its behavior for the first time. In particular, this captures an odd phase I added to this benchmark: it does random writes across an increasingly large range. I did this because we had discovered that some SSDs did fine when the writes were confined to a small logical region, but broke down — badly — when the writes were over a larger region. And no, I don’t know why this was the case (presumably the firmware was in fragmented/wear-leveling/cache-busting hell); all I know is that we rejected any further exploration once the writes to the SSD were of a higher latency than that of my first hard drive: the IBM PC XT’s 10 MB ST-412, which had roughly 95 ms writes! (We felt that expecting an SSD to have better write latency than a hard drive from the first Reagan Administration was tough but fair…)

What now?

As part of our ongoing maturity as a product, we have developed a new role here at Fishworks: starting in 2010.Q1, the Technical Lead for the release will, as the release ships, transition to become the full-time Support Lead for that release in the field. This means many things for the way we support the product, but for our customers, it means that if and when you do have an issue on 2010.Q1, you should know that the buck on your support call will ultimately stop with me. We are establishing an unprecedented level of engineering integration with our support teams, and we believe that it will show in the support experience. So welcome to 2010.Q1 — and happy upgrading!

Posted on March 10, 2010 at 2:24 pm by admin

John Birrell

It is with a heavy heart that I announce that we in the DTrace community have lost one of our own: the indomitable John Birrell, who ported DTrace to FreeBSD, suffered a stroke and passed away on Friday, November 20, 2009.

We on Team DTrace knew John to be a remarkably talented and determined software engineer. As those who have attempted ports can attest, DTrace passes through rough country, and a port to a foreign system is a significant undertaking that requires mastery of both DTrace and (particularly) the target system. And in being the first to attempt a port, John’s challenge was that much greater — and his success in the endeavor a tribute to both his ability and (especially) his tenacity. For example, in performing the port, John decided that DTrace’s dependency on the cyclic subsystem was such that it, too, needed to be ported. He didn’t need to do this (and indeed, other ports have decided that an arbitrary resolution profile provider is not worth the significant trouble), but that he undertook this additional technical challenge anyway — even when any victory would remain hidden to all but the most expert eye — says a lot about John as both an engineer and a man. Later, when the port ran into some frustrating licensing issues, John once again did not give up. Rather, he backed up, and found a path forward that would satisfy all parties — even though it required significant technical reworking on his part. I have long believed that the mark of a great engineer is not how frequently they get knocked down, but rather how quickly they get back up — and in this regard, John was indisputably a giant.

John, you will be missed — not only by the FreeBSD community upon which you made an indelible mark, but by those of us in the DTrace community who only had the opportunity to work with you more recently. And while your legacy might remain anonymous to the future generations that will benefit from the fruits of your long labor, we will always know that it never would have happened without you. Thank you, and farewell.

(Those who wish to memorialize John may want to do as I did and make a donation in his memory to the FreeBSD Foundation.)

Posted on November 26, 2009 at 12:36 pm by bmc