Adam Leventhal's blog


Tag: OpenSolaris

Back in October I was pleased to attend — and my employer, Delphix, was pleased to sponsor — illumos day and ZFS day, run concurrently with Oracle Open World. Inspired by the success of dtrace.conf(12) in the Spring, the goal was to assemble developers, practitioners, and users of ZFS and illumos-derived distributions to educate, share information, and discuss the future.

illumos day

The week started with the developer-centric illumos day. While illumos picked up the torch when Oracle re-closed OpenSolaris, each project began with a very different focus. Sun and the OpenSolaris community obsessed over inclusion and developer adoption — often counterproductively. The illumos community is led by those building products based on the unique technologies in illumos — DTrace, ZFS, Zones, COMSTAR, etc. While all are welcome, it’s those who contribute the most whose voices are most relevant.

I was asked to give a talk about technologies unique to illumos that are unlikely to appear in Oracle Solaris. It was only when I started to prepare the talk that the differing priorities of illumos and Oracle Solaris fell into sharp focus. In the illumos community, we’ve advanced technologies such as ZFS in ways that would benefit Oracle Solaris greatly, but Oracle has made it clear that open source is anathema for its version of Solaris. For example, at Delphix we’ve recently been fixing bugs and asking ourselves, “how has Oracle never seen this?”

Yet the differences between illumos and Oracle Solaris are far deeper. In illumos we’re building products that rely on innovation and differentiation in the operating system, and it’s those higher-level products that our various customers use. At Oracle, the priorities are more traditional: support for proprietary SPARC platforms, packaging and updating for administrators, and ease-of-use. In my talk, rather than focusing on the sundry contributions to illumos, I picked a few of my favorites. The slides are more or less incomprehensible on their own; many thanks to Deirdre Straughan for posting the video (and for putting together the event!) — check out 40:30 for a photo of Jean-Luc Picard attending the DTrace talk at OOW.

[youtube_sc url="https://www.youtube.com/watch?v=7YN6_eRIWWc&t=0m19s"]

ZFS day

While illumos day was for developers, ZFS day was for users of ZFS to learn from each others’ experiences and hear from ZFS developers. I had the ignominious task of presenting an update on the Hybrid Storage Pool (HSP). We developed the HSP at Fishworks as the first enterprise storage system to add flash memory into the storage hierarchy to accelerate reads and writes. Since then, economics and physics have thrown up some obstacles: DRAM has gotten cheaper, and flash memory is getting harder and harder to turn into a viable enterprise solution. In addition, the L2ARC, which adds flash as a ZFS read cache, has languished; it has serious problems that no one has been motivated or proficient enough to address.
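
As a refresher, here’s a hypothetical HSP-style pool layout (the device names are illustrative): data lives on spinning disks, a mirrored flash log device absorbs synchronous writes, and a flash cache device extends the read cache beyond DRAM via the L2ARC.

    # Data on a raidz2 stripe of hard drives
    zpool create tank raidz2 c0t0d0 c0t1d0 c0t2d0 c0t3d0
    # Mirrored flash log (slog) to accelerate synchronous writes
    zpool add tank log mirror c1t0d0 c1t1d0
    # Flash read cache (L2ARC)
    zpool add tank cache c2t0d0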

I’ll warn you that after the explanation of the HSP, it’s mostly doom and gloom (also I was sick as a dog when I prepared and gave the talk), but check out the slides and video for more on the promise and shortcomings of the HSP.

[youtube_sc url="http://www.youtube.com/watch?v=P77HEEgdnqE&feature=youtu.be"]

Community

For both illumos day and ZFS day, it was a mostly full house. Reuniting with the folks I already knew was fun as always, but even better was to meet so many who I had no idea were building on illumos or using ZFS. The events highlighted that we need to facilitate more collaboration — especially around ZFS — between the illumos distros, FreeBSD, and Linux — hell, even Oracle Solaris would be welcome.

Yesterday (October 4, 2011) Oracle made the surprising announcement that they would be porting some key Solaris features, DTrace and Zones, to Oracle Enterprise Linux. As one of the original authors, the news about DTrace was particularly interesting to me, so I started digging.

I should note that this isn’t the first time I’ve written about DTrace for Linux. Back in 2005, I worked on Linux-branded Zones, Solaris Containers that present a Linux user environment. I wrote a coyly-titled blog post about examining Linux applications using DTrace. The subject was honest — we used precisely the same techniques to bring the benefits of DTrace to Linux applications — but the title wasn’t completely accurate. That wasn’t exactly “DTrace for Linux”; it was more precisely “the Linux user-land for Solaris where users can reap the benefits of DTrace”; I chose the snappier title.

I also wrote about DTrace knockoffs in 2007 to examine the Linux counter-effort. While that project is still in development, it hasn’t achieved the functionality or traction of DTrace. Suggesting that Linux was inferior brought out the usual NIH reactions, which led me to write a subsequent blog post about a theoretical port of DTrace to Linux. While Paul Fox started exactly such a port a year later, my assumption at the time was that the primary copyright holder of DTrace wouldn’t be the one porting DTrace to Linux. Now that Oracle is claiming a port, the calculus may change a bit.

What is Oracle doing?

Even among Oracle employees, there’s uncertainty about what was announced. Ed Screven gave us just a couple of bullet points in his keynote; Sergio Leunissen, the product manager for OEL, didn’t have further details in his OpenWorld talk beyond it being a beta of limited functionality; and the entire Solaris team seemed completely taken by surprise.

What is in the port?

Leunissen stated that only the kernel components of DTrace are part of the port. It’s unclear whether that means just fbt or includes sdt and the related providers. It sounds certain, though, that it won’t pass the DTrace test suite, which is the deciding criterion between a DTrace port and some sort of work in progress.

What is the license?

While I abhor GPL v. CDDL discussions, this is a pretty interesting case. According to the release manager for OEL, some small kernel components and header files will be dual-licensed, while the bulk of DTrace — the kernel modules, libraries, and commands — will use the CDDL as they had under (the now defunct) OpenSolaris (and to the consternation of Linux die-hards, I’m sure). Oracle already faces an interesting conundrum with their CDDL-licensed files: they can’t take the fixes that others have made to, for example, ZFS without needing to release their own fixes. The DTrace port to Linux is interesting in that Oracle apparently thinks that the CDDL license will make DTrace too toxic for other Linux vendors to touch.

Conclusion

Regardless of how Oracle brings DTrace to Linux, it will be good for DTrace and good for its users — and perhaps best of all for the author of the DTrace book. I’m cautiously optimistic about what this means for the DTrace development community if Oracle does, in fact, release DTrace under the CDDL. While this won’t mean much for the broader Linux community, we in the illumos community will happily accept anything of value Oracle adds. The Solaris lover in me was worried when it appeared that OEL was raiding the Solaris pantry, but if this is Oracle’s model for porting, then I — and the entire illumos community I’m sure — hope that more and more of Solaris is open sourced under the aegis of OEL differentiation.

10/10/2011 follow-up post, Oracle’s port: this is not DTrace.

In 2005, Sun released the source code to Solaris, described then as the company’s crown jewel. Why do this? The simplest answer is that Solaris had been losing ground to an open source competitor in Linux. Losing ground was a symptom of the underlying economics. Students who had once been raised on Solaris were being inculcated with Linux knowledge. The combination of Linux and x86 was good enough and significantly cheaper; new companies for whom the default had once been Sun/Solaris/SPARC were instead building on x86/Linux. OpenSolaris, along with x86 support, was specifically intended to address this trend. Indeed, the codename for OpenSolaris was “tonic” — the tonic for Solaris’ problems.

To that end, OpenSolaris was on reasonably stable footing: open source had become expected for an operating system, source code availability was a benefit to traditional enterprise users (especially with the advent of DTrace), and the community would attract new users. But then Solaris lost the plot. Users chose Solaris because it is a — or perhaps the — enterprise operating system. OpenSolaris was intended to broaden the appeal, but that notion was taken to such extremes as to lose sight of the traditional customers of Solaris, and, indeed, the focus that makes Solaris both unique and great.

OpenSolaris (June 14, 2005 – August 13, 2010)

We launched Solaris 10 in 2004 with an impressive list of features — ZFS, DTrace, Zones, SMF, FMA, Fire Engine — all highly relevant for enterprise users. You can find a company that has bet its business on the success of each of those features. In the wake of OpenSolaris, the decision was made (and here I can no longer use the active voice because by then I had left to start Fishworks elsewhere at Sun) to have an explicit focus on building an operating system for developers — which is to say, for their laptops. This was an error, but a predictable one. Once Solaris was free to download and use, revenue recognition for the Solaris organization, which had always been difficult to measure, became even more indirect. The metrics were changed: the targets for management bonuses became not revenue or enterprise users, but downloads. Directly or indirectly, much of the focus for the Solaris organization shifted to address that straightforward goal. The mistake was that OpenSolaris didn’t need to find users; they found Solaris. In trying to build a community, the new direction for OpenSolaris weakened the very principles upon which a thriving community would have been based.

The very name “OpenSolaris” got confused, diluted, and polluted. OpenSolaris was a source repository, a community, and a distro (although purists still insist that Indiana is the appropriate name for that part) intended to “close the familiarity gap” with Linux. Moreover, new projects that shifted efforts away from enterprise uses (read: paying customers) to focus on the laptop also rallied under the banner of “OpenSolaris”. In a way, Oracle’s acquisition of Sun saved Solaris from itself; the marching orders became much clearer: address enterprise users, ship Solaris 11 (something that should not have taken 6 years). As for OpenSolaris, that decision too was likely simple for Oracle, never an overt fan of open source. Had “OpenSolaris” simply meant a code base and user community, I think there’s a good chance it would have been allowed to live. Burdened, however, with the baggage of the Indiana distro and sundry projects incomprehensible to Oracle management, OpenSolaris was in a politically untenable position. Mike’s “Friday the 13th memo” merely made it official — Solaris was to be closed source once more.

Sun’s efforts with OpenSolaris were, at best, a mixed success. Quietly, however, an ecosystem of companies grew out of the technologies in OpenSolaris. Notably, Joyent uses Zones and DTrace as significant differentiators; Nexenta builds very heavily on ZFS; and, as I’ve mentioned, Delphix, my new employer, builds on OpenSolaris as well. There are many more that I know about, and still more that I don’t. These companies chose OpenSolaris so they could use the innovative technologies that simply aren’t available anywhere else. And they did so in spite of a common trend towards Linux, with its familiarity and broad compatibility — the innovation in Solaris was more valuable and, in some cases, enabling for the company’s business.

illumos (August 3, 2010 – )

The danger for those companies has long been that Oracle would pull the rug out from under them; only the foolish had no contingency plan. The options were to give up on Solaris or maintain a fork. Happily illumos has stepped in to offer a third path. Garrett D’Amore and Nexenta graciously started the illumos project to carry the OpenSolaris torch. It is an ostensible fork of OpenSolaris (can you fork a dead project?), but more importantly a mechanism by which companies building on those component technologies can pool their resources, amortize their costs, and build a community by and for the downstream users who are investing in those same technologies. Rather than being operated by a single corporate interest, its steward will be a 501(c)(3) non-profit in the model of the Mozilla Foundation.

I was pleased to announce at tonight’s SVOSUG meeting that I’ll be joining the illumos developer council; I was delighted to accept when Garrett offered me the position. My bias for illumos is that the main repository should focus on reliability, performance, and compatibility while taking a conservative approach to new features and functionality. As much as possible, I’d like the downstream users — the distributions, appliances, and platforms — to make the decisions appropriate to their uses and only adopt large-scale changes into the trunk when there’s broad consensus among them. The goal must be to build a project that is readily useful to everyone and to allow our collective efforts to be shared as easily as possible.

What’s the future of Solaris? For many it will be Solaris 11 in late 2011. But for others, it will be illumos, whether as the firmware for an appliance (not unlike what we built at Fishworks in the 7000 series), as the platform for your web applications, or as a general-purpose operating system. The innovation in Solaris has always flowed from the creative individuals working on the project. Keep your eyes on illumos; Oracle ending OpenSolaris is no surprise, but in doing so they have broken their own monopoly on Solaris and Solaris talent.

As I wrote about last time, I’ve left Oracle. What I was looking for in my next gig was technology that excites me, excellent management, and a chance to build something significant and successful. I’m confident that I’ve found those things with Delphix.

In the established database market, Delphix creates a virtualization layer that simplifies the management of data and reduces duplication and waste. Why’s that interesting? The most important data is in databases, so building a layer between data and storage is incredibly powerful. The software to achieve that can then grow in a variety of directions: from data analysis and tuning, to optimization at the level of the operating system and file system, to integration up the stack. The notion of storage virtualization is a popular albeit vague one; Delphix brings a concrete definition and real value, as well as a unique, application-centric focus.

Delphix builds on top of OpenSolaris which was, of course, another compelling reason for me to join. The Solaris group constructed a platform unique in its facilities for developers and in its comprehensive manageability. As I looked at various prospective employers I came to an even better appreciation of how tough it would be to work in an environment without DTrace, and mdb, and pstack, and libumem, and SMF, and FMA, etc. etc. Of course now Oracle has withdrawn support for OpenSolaris, but we won’t be going it alone (stay tuned for more on that).

It’s that combination of technology that’s interesting both at a high level and in the details, a management team that’s experienced and hungry, innovation in a market where we can have a lasting impact, and an initial product that proves the potential yet with many hard problems still left to solve. But it’s the people who build a company; Delphix has both a great team and a commitment to assembling talent second to none. I’m excited to get started (… after a couple of weeks of much needed decompression).

I’ve been expecting this automated mail for a while now, but it was disheartening nonetheless:

List:       dtrace-discuss
Member:     bryan.cantrill@eng.sun.com
Action:     Subscription disabled.
Reason:     Excessive or fatal bounces.
[Photo: Bryan Cantrill, VP of Engineering at Joyent, earning $15.]

As one of the moderators of the DTrace discussion list, I see people subscribe and unsubscribe. Bryan has, of course, left Oracle and joined Joyent to be their VP of engineering.

Bryan is a terrific engineer, and I count myself lucky to have worked with him for the past nine years, first on DTrace and then on Fishworks. He taught me many things, but perhaps most important was his holistic view of engineering, one that encompasses all aspects of making a product successful, including docs, pricing, talks, papers, and, of course, excellent code. Now Bryan is off to cut through the layers of software that make up the cloud. Far from leaving the DTrace community, he’s going to take DTrace to new places, and I look forward to seeing the fruits of his labor as he sinks his teeth into a new onion of abstractions.

… and, Robin, Bryan’s certainly a smart guy, but “the smart guy behind Dtrace [sic]”?? Just don’t refer to me and Mike as “the dumb guys behind DTrace” okay?

Double-parity RAID, or RAID-6, is the de facto industry standard for storage; when I started talking about triple-parity RAID for ZFS earlier this year, the need wasn’t always immediately obvious. Double-parity RAID, of course, provides protection from up to two failures (data corruption or the whole drive) within a RAID stripe. The necessity of triple-parity RAID arises from the observation that while hard drive capacity has roughly followed Kryder’s law, doubling annually, hard drive throughput has improved far more modestly. Accordingly, the time to populate a replacement drive in a RAID stripe is increasing rapidly. Today, a 1TB SAS drive takes about 4 hours to fill at its theoretical peak throughput; in a real-world environment that number can easily double, and the 2TB and 3TB drives expected this year and next won’t move data much faster. Those long periods spent in a degraded state increase the exposure to the bit errors and other drive failures that would in turn lead to data loss. The industry moved to double-parity RAID because one parity disk was insufficient; longer resilver times mean that we’re spending more and more time back at single-parity. From that it was obvious that double-parity would soon become insufficient. (I’m working on an article that examines these phenomena quantitatively so stay tuned… update Dec 21, 2009: you can find the article here)
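
To make the 4-hour figure concrete, here’s the back-of-the-envelope arithmetic as a tiny Python sketch (the throughput number is an assumed, optimistic figure for illustration, not a measurement):

    # Rough resilver-time estimate for a replacement drive.
    capacity_bytes = 1e12                    # 1TB drive
    throughput_bytes_per_sec = 70e6          # assumed ~70MB/s sustained throughput
    hours = capacity_bytes / throughput_bytes_per_sec / 3600
    print(f"~{hours:.1f} hours to fill")     # ~4 hours; real-world can easily double

At half that throughput, which is hardly unusual on a busy system, the window stretches to 8 hours, and it grows linearly with capacity from there.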

Last week I integrated triple-parity RAID into ZFS. You can take a look at the implementation and the details of the algorithm here, but rather than describing the specifics, I wanted to describe its genesis. For double-parity RAID-Z, we drew on the work of Peter Anvin, which was also the basis of RAID-6 in Linux. This work was more or less a tutorial for systems programmers, simplifying some of the more subtle underlying mathematics with an eye towards optimization. While a systems programmer by trade, I have a background in mathematics, so I was interested to understand the foundational work. James S. Plank’s paper A Tutorial on Reed-Solomon Coding for Fault-Tolerance in RAID-like Systems describes a technique for generalized N+M RAID. Not only was it simple to implement, but it could easily be made to perform well. I struggled for far too long trying to make the code work before discovering trivial flaws with the math itself. A bit more digging revealed that the author himself had published Note: Correction to the 1997 Tutorial on Reed-Solomon Coding 8 years later, addressing those same flaws.

Predictably, the mathematically accurate version was far harder to optimize, stifling my enthusiasm for the generalized case. My more serious concern was that the double-parity RAID-Z code suffered from some similar systemic flaw. This fear was quickly assuaged as I verified that the RAID-6 algorithm was sound. Further, from this investigation I was able to find a related method for doing triple-parity RAID-Z that was nearly as simple as its double-parity cousin. The math is a bit dense, but the key observation was that, given that 3 is the smallest factor of 255 (the largest value representable by an unsigned byte), it was possible to find exactly 3 different seed or generator values, after which there were collections of failures that formed uncorrectable singularities. Using that technique I was able to implement a triple-parity RAID-Z scheme that performed nearly as well as the double-parity version.
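
To give a flavor of the arithmetic, here’s a minimal Python sketch of generating three parity columns over GF(2^8); the generator values (1, 2, and 4) and the overall structure are illustrative of this style of scheme, not a transcription of the ZFS implementation:

    def gf_mul2(x):
        """Multiply by 2 in GF(2^8) with reducing polynomial 0x11d
        (x^8 + x^4 + x^3 + x^2 + 1), the field RAID-6-style codes use."""
        x <<= 1
        if x & 0x100:
            x ^= 0x11d
        return x & 0xff

    def triple_parity(disks):
        """Compute P, Q, R parity columns for a stripe.

        disks: equal-length byte sequences, one per data disk.
        P = sum d_i, Q = sum 2^i * d_i, R = sum 4^i * d_i (all in GF(2^8));
        Horner's rule applies the generator once per data disk.
        """
        n = len(disks[0])
        p, q, r = bytearray(n), bytearray(n), bytearray(n)
        for col in range(n):
            pv = qv = rv = 0
            for disk in reversed(disks):       # highest-index disk first (Horner)
                b = disk[col]
                pv ^= b                        # generator 1: plain XOR
                qv = gf_mul2(qv) ^ b           # generator 2
                rv = gf_mul2(gf_mul2(rv)) ^ b  # generator 4
            p[col], q[col], r[col] = pv, qv, rv
        return p, q, r

Three independent syndromes are what make three failures recoverable; as noted above, the subtlety is entirely in choosing generator values for which no combination of failures produces an uncorrectable singularity.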

As far as generic N-way RAID-Z goes, it’s still something I’d like to add to ZFS. Triple-parity will suffice for quite a while, but we may want more parity sooner for a variety of reasons. Plank’s revised algorithm is an excellent start. The test will be whether it can be made to perform well enough or whether some new, clever algorithm will need to be devised. Now, as for what to call these additional RAID levels, I’m not sure. RAID-7 or RAID-8 seem a bit ridiculous, and RAID-TP and RAID-QP aren’t any better. Fortunately, in ZFS triple-parity RAID is just raidz3.
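
And using it is just a matter of naming the vdev type; a hypothetical example (device names are illustrative):

    # Create a triple-parity pool; each raidz3 stripe tolerates three failures.
    zpool create tank raidz3 c0t0d0 c0t1d0 c0t2d0 c0t3d0 c0t4d0 c0t5d0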

A little over three years ago, I integrated double-parity RAID-Z into ZFS, a feature expected of enterprise-class storage. This was in the early days of Fishworks, when much of our focus was on addressing functional gaps. The move to triple-parity RAID-Z comes in the wake of a number of our unique advancements to the state of the art, such as DTrace-powered Analytics and the Hybrid Storage Pool, as the Sun Storage 7000 series products meet and exceed the standards set by the industry. Triple-parity RAID-Z will, of course, be a feature included in the next major software update for the 7000 series (2009.Q3).
