KVM on illumos
A little over a year ago, I came to Joyent because of a shared belief that systems software innovation matters — that there is no level of the software stack that should be considered off-limits to innovation. Specifically, we ardently believe in innovating in the operating system itself — that extending and leveraging technologies like OS virtualization, DTrace and ZFS can allow one to build unique, differentiated value further up the software stack. My arrival at Joyent also coincided with the birth of illumos, an effort that Joyent was thrilled to support, as we saw that rephrasing our SmartOS as an illumos derivative would allow us to engage in a truly open community with whom we could collaborate and collectively innovate.
In terms of the OS technology itself, as powerful as OS virtualization, DTrace and ZFS are, there was clearly a missing piece to the puzzle: hardware virtualization. In particular, with advances in microprocessor technology over the last five years (and specifically with the introduction of technologies like Intel’s VT-x), one could (in principle, anyway) much more easily build highly performing hardware virtualization. The technology was out there: Fabrice Bellard‘s cross-platform QEMU had been extended to utilize the new microprocessor support by Avi Kivity and his team with their Linux kernel virtual machine (KVM).
In the fall of last year, the imperative became clear: we needed to port KVM to SmartOS. This notion is almost an oxymoron, as KVM is not “portable” in any traditional sense: it interfaces between QEMU, Linux and the hardware with (rightful) disregard for other systems. And as KVM makes heavy use of Linux-specific facilities that don’t necessarily have clean analogs in other systems, the only way to fully scope the work was to start it in earnest. However, we knew that just getting it to compile in any meaningful form would take significant effort, and with our team still ramping up and with lots of work to do elsewhere, the reality was that this would have to start as a small project. Fortunately, Joyent kernel internals veteran Max Bruning was up to the task, and late last year, he copied the KVM bits from the stable Linux 2.6.34 source, took out a sharpened machete, and started whacking back jungle…
Through the winter holidays and early spring (and when he wasn’t dispatched on other urgent tasks), Max made steady progress — but it was clear that the problem was even more grueling than we had anticipated: because KVM is so tightly integrated into the Linux kernel, it was difficult to determine dependencies — and hard to know when to pull in the Linux implementation of a particular facility or function versus writing our own or making a call to the illumos equivalent (if any). By the middle of March, we were able to execute three instructions in a virtual machine before taking an unresolved EPT violation. To the uninitiated, this might sound pathetic — but it meant that a tremendous amount of software and hardware were working properly. Shortly thereafter, with the EPT problem fixed, the virtual machine would happily run indefinitely — but clearly not making any progress booting. But with this milestone, Max had gotten to the point where the work could be more realistically parallelized, and we had gotten to the point as a team and a product where we could dedicate more engineers to this work. In short, it was time to add to the KVM effort — and Robert Mustacchi and I roped up and joined Max in the depths of KVM.
With three of us on the project and with the project now more parallelizable, the pace accelerated, and in particular we could focus on tooling: we added an mdb dmod to help us debug, 16-bit disassembler support that was essential for us to crack the case on the looping in real-mode, and DTrace support for reading the VMCS. (And we even learned some surprising things about GNU C along the way!) On April 18th, we finally broke through into daylight: we were able to boot into KMDB running on a SmartOS guest! The pace quickened again, as we now had the ability to interactively ascertain state as seen by the guest and much more quickly debug issues. Over the next month (and after many intensive debugging sessions), we were able to boot Windows, Linux, FreeBSD, Haiku, QNX, Plan 9 and every other OS we could get our hands on; by May 13th, we were ready to send the company-wide “hello world” mail that has been the hallmark of bringup teams since time immemorial.
The wind was now firmly at our backs, but there was still much work to be done: we needed to get configurations working with large numbers of VCPUs, with many VMs, over long periods of time and significant stress and so on. Over the course of the next few months, the bugs became rarer and rarer (if nastier!), and we were able to validate that performance was on par with Linux KVM — to the point that today we feel confident releasing illumos KVM to the greater community!
In terms of starting points, if you just want to take it for a spin, grab a SmartOS live image ISO. If you’re interested in the source, check out the illumos-kvm github repo for the KVM driver itself, the illumos-kvm-cmd github repo for our QEMU 0.14.1 tree, and illumos itself for some of our KVM-specific tooling.
One final point of validation: when we went to integrate these changes into illumos, we wanted to be sure that they built cleanly in a non-SmartOS (that is, stock illumos) environment. However, we couldn’t dedicate a machine for the activity, so we (of course) installed OpenIndiana running as a KVM guest on SmartOS and cranked the build to verify our KVM deltas! As engineers, we all endeavor to build useful things, and with KVM on illumos, we can assert with absolute confidence that we’ve hit that mark!