The Observation Deck


Whither systems research?

July 13, 2004

Ted Leung noted the discussion that Werner and I have been having, and observed that we should consider Rob Pike’s (in)famous polemic, “Systems Software Research is Irrelevant.” I should say that I broadly agree with most of Pike’s conclusions — and academic systems software research has seemed increasingly irrelevant in the last five years. That said, I think that what Pike characterizes as “systems research” is far too skewed toward the interface to the system — which (tautologically) is but the periphery of the larger system. In my opinion, “systems research” should focus not on the interface of the system, but rather on its guts: those hidden Rube Goldberg-esque innards that are rife with old assumptions and unintended consequences. Pike would perhaps dismiss the study of these innards as “phenomenology”, but I would counter that understanding phenomena is a prerequisite to understanding larger systemic truths. Of course, the problem to date has been that much systems research has not been able to completely understand phenomena — the research has often consisted merely of characterizing them.

As evidence that systems research has become irrelevant, Pike points to the fact that SOSP has had markedly fewer papers presenting new operating systems, observing that “a new language or OS can make the machine feel different, give excitement, novelty.” While I agree with the sentiment that innovation is the source of excitement (and that such exciting innovation has been woefully lacking from academic systems research), I disagree with the implication that systems innovation is restricted to a new language or OS; a new file system, a new debugger, or a new approach to virtualization can be just as exciting. So the good news is that work need not be a new system to be important systems work, but the bad news is that while none of these is as large as a new OS, they’re still huge projects — far more work than a graduate student (or even a lab of graduate students) can be expected to complete in a reasonable amount of time.

So if even these problems are too big for academia, what’s to become of academic systems research? For starters, if it’s to be done by graduate students, it will have to be content with smaller innovation. This doesn’t mean that it need be any less innovative — just that the scope of innovation will be naturally narrower. As an extreme example, take the new nohup -p in Solaris 9, which applies nohup retroactively to an already running process. While this is a very small body of work, it is exciting and innovative. And yet, most academics would probably dismiss this work as completely uninteresting — even though most could probably not describe the mechanism by which it works. Is this a dissertation? Certainly not — and it’s not even clear how such a small body of work could be integrated into a larger thesis. But it’s original, novel work, and it solves an important and hard (if small) problem. Note, too, that this work is interesting because of the phenomenon that prohibited a naive implementation: any solution that doesn’t address the deadlock inherent in the problem isn’t actually an acceptable solution. This is an extreme example, but it should make the point that smaller work can be interesting — as long as it’s innovative, robust and thorough.

But if the problems that academic systems researchers work on are going to become smaller, the researchers must have the right foundation upon which to build their work: small work is necessarily more specific, and work is markedly less relevant if it’s based on an obsolete system. And (believe it or not) this actually brings us to one of our primary motivations for open sourcing Solaris: we wish to provide complete access to a best-of-breed system that allows researchers to solve new problems instead of revisiting old ones. Will an open source Solaris single-handedly make systems research relevant? Certainly not — but it should make for one less excuse…

14 Responses

  1. Perhaps I fail to see why nohup -p is considered innovative. Using the /proc file system to manipulate running processes has been around for a long time now. By applying that concept to a single command, it should be considered innovative? I think not.

  2. Anonymous: I think you’re proving my point. You dismiss nohup -p as “just using the /proc file system to manipulate processes”, but did you really know (for example) about the agent LWP? How would you implement it on a system that does not have the agent LWP? (A facility which is unique to Solaris, by the way.) nohup -p is a very small innovation, but it _is_ innovation and it’s certainly relevant. Actually, let me phrase this in a much more controversial way: nohup -p — as tiny as it is — represents more relevant innovation than all of academic systems research has provided in the last five years. Counter-examples?
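    To make this concrete, here is a rough sketch of the shape of the mechanism: grab (and stop) the target, create the agent LWP, and have the agent execute system calls in the target’s own context. This is a sketch only, not the actual Solaris implementation; the libproc-style names (Pgrab, Pcreate_agent, pr_open, pr_fcntl, pr_sigaction and friends) are assumed here for illustration.

        /*
         * Sketch only, not the actual nohup -p code.  The pr_*()
         * routines are assumed libproc wrappers that have the agent
         * LWP perform the corresponding system call on the target's
         * behalf.  The agent is the crux: injecting arbitrary code
         * into a stopped process instead can deadlock on a lock
         * (say, in malloc) that a stopped LWP already holds.
         */
        #include <sys/types.h>
        #include <fcntl.h>
        #include <signal.h>
        #include <string.h>
        #include <libproc.h>

        static int
        nohup_pid(pid_t pid)
        {
            struct ps_prochandle *P;
            struct sigaction ign;
            int err, fd;

            if ((P = Pgrab(pid, 0, &err)) == NULL)   /* stops all LWPs */
                return (-1);

            if (Pcreate_agent(P) != 0) {             /* add the agent LWP */
                Prelease(P, 0);
                return (-1);
            }

            /* Have the target ignore SIGHUP and SIGQUIT ... */
            (void) memset(&ign, 0, sizeof (ign));
            ign.sa_handler = SIG_IGN;
            (void) pr_sigaction(P, SIGHUP, &ign, NULL);
            (void) pr_sigaction(P, SIGQUIT, &ign, NULL);

            /* ... and redirect its stdout/stderr to nohup.out. */
            fd = pr_open(P, "nohup.out",
                O_WRONLY | O_CREAT | O_APPEND, 0600);
            if (fd >= 0) {
                (void) pr_fcntl(P, fd, F_DUP2FD, (void *)1);
                (void) pr_fcntl(P, fd, F_DUP2FD, (void *)2);
                (void) pr_close(P, fd);
            }

            Pdestroy_agent(P);
            Prelease(P, 0);                  /* set it running again */
            return (0);
        }

    Every pr_*() call above runs in the target’s context while its other LWPs are halted, which is exactly the property that makes the operation safe.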

  3. I think you’re falling into the same trap that you accuse academics of falling into, the distinction being that you describe academic research as “irrelevant” while academics describe much industrial work as “uninteresting.” I’d have to agree with anonymous (sorry, Adam); nohup is not really significant research, from my perspective. It’s just a tiny example, something which I read about and said “Hunh, a neat trick.” However, I found the agent LWP itself to be *very* interesting; after reading about it, I sat for about ten minutes thinking, “Wow, cool, you could really do some crazy stuff with it…” With a small amount of OS kernel support (e.g., it can only run when all other LWPs are halted), you have a mechanism to deal with a lot of problems and add a lot of functionality.

  4. Phil, I didn’t say that nohup was “significant research” — I said it was relevant innovation. Part of my point is that academic systems research just _isn’t_ going to be significant — truly significant systems research takes much more time, money, focus and expertise than academia can summon. And yes, the agent LWP is very interesting — and perhaps that scale of work is closer to something that academia would deem sufficiently “significant” while still being relevant. (The nohup example is meant to be an extreme example, after all.) Of course, the agent LWP wasn’t invented by a graduate student — it was invented by one of the co-inventors of /proc and only after many years of trying to make the alternatives work. But at least with the source to Solaris, academia will have the opportunity to undertake this kind of work…

  5. I think this is largely a philosophical question. The old-school definition of academically “significant” systems research is that it will allow *someone else* to build a better system that real people will use 5-10 years after the academic research is done. So it’s work that’s at the very earliest stages of the technology transfer cycle. That’s why the premier academic operating systems conference is called the Symposium on Operating Systems Principles — the work is supposed to represent research on general principles of how to build systems, that applies across a wide range of actual operating systems or hardware platforms or whatever. Often a prototype of the idea will be produced, but it’s more to prove the usefulness/practicality/applicability/benefit of the principles and (dare I say it) interfaces that are the real “meat” of the research. Because the artifacts are small, they can be produced by one or two graduate students even though they are designed to showcase a “big idea” with broad applicability. On the other hand, because the research is supposed to be general and 5-10 years out, it can appear irrelevant at the time it is done.
    Now, that’s the theory. The reality is that a lot of (most?) systems research done in academia today is the same kind of work that could be (and often is) going on in industrial research labs and product groups. There are lots of forums for publishing this kind of work, and that is certainly the kind of work that could benefit from open-source Solaris, data (on workloads, failures, etc.) from companies, and so on. I think the problem is that a lot of academic researchers today are doing very practically-oriented research without any real concern for its impact on real existing systems. And certainly I agree with you that if you are going to be doing research that you claim helps existing systems, you need to be using realistic existing systems and be able to make a good argument as to how the proposed ideas could be integrated into existing systems. I think the problem arises when academic researchers try to do short-term oriented research yet argue that the work doesn’t have to solve a real problem using state-of-the-art systems, simply because they’re academics.
    Nonetheless, when trying to understand the thought process that academic researchers use when they pick problems, choose solutions to those problems, implement prototypes of the solutions, and write up their results, it’s important to keep in mind that the kind of research I described in the first paragraph *is* considered the holy grail/gold standard. The closer your research, as an academic, is to that standard, the more highly valued it will be by the academic research community. Whether this is a good or bad thing, whether those ideas are ever applied to real systems, whether this *should* be the goal of academic research, etc., is yet another interesting topic in and of itself — and, unfortunately, one that many academics seem to consider to be too “practical” to be “interesting”!

  6. David: First, apologies for the lack of HTML formatting in comments. This is apparently due to some site-wide configuration problem, but I have raised the issue again with the b.s.c support staff; hopefully we’ll see a resolution soon. Now, to address the meat of your comments: there is very little in a software system that truly depends on the technology curve, so there are very few ideas that exist “5-10 years out” — ideas are either practical today, or not at all. Or, to flip this around, what is an example of an idea that was proposed (sans practical implementation) 5-10 years before its adoption by industry? It seems to me that the good ideas from academia (RAID, RISC, log-structured filesystems, etc.) have all been adopted more or less immediately. The only way there would be an exception to this would be for a software system to be proposed around hardware that doesn’t yet exist. For example, work on what a system would look like with carbon nanotube memory would likely yield ideas that are “5-10 years out” (and would probably be damn interesting, actually), and the SMT work done in the early to mid 1990s would actually be an example of this kind of hardware-dependent work. But ironically, I think that work done in this vein would likely be deemed too “impractical” by the research community — I absolutely agree with your assessment that there is far too much “practically-oriented research without any real concern for its impact on existing systems.”

  7. I’m not sure I agree with the 5-10 years assertion, and would be really hard pressed to name any research that emerged as important in that time window. However, in a 1-3 year window, there are a lot of examples: Disco (VMware) is one that immediately comes to mind. In the end, a lot of systems research is a crapshoot on what people think may be problems coming over the horizon, many of which turn out to not be that big a deal. Usually, the problems that academics look at are not identical to those industry faces, but there are enough similarities that ideas or basic approaches can be borrowed; divining exactly what future problems will be is hard. That being said, and I get the feeling this is a major component of Bryan’s concern, academia has really moved away from hardcore kernel research: it’s become systems research, rather than operating systems research. There are still always a few papers at each conference (e.g., Nooks at the recent SOSP, superpages at OSDI 2002, anticipatory scheduling at SOSP 2001), but they’re greatly outnumbered by papers about service composition, availability for distributed services, etc.

  8. Bryan’s comments that open sourcing Solaris could help stimulate systems research among academics are interesting. I certainly think it has potential — after all, Solaris is (arguably) the best general purpose OS around, and certainly has a lot of interesting and unique features as standard. So for research it makes a very solid and capable foundation. (The only thing I can think of where Solaris is behind is in NUMA optimisation — as of Solaris 9 this is quite limited AFAIK. I haven’t seen any details on the NUMA optimisations coming in Solaris 10, though I’ve heard a number of times that there are some. I also wonder what happened to Sun’s sCOMA research.)
    Open sourcing Solaris would give researchers a lot of interesting “kernel middleware” to use. I certainly think that any web-site for “OpenSolaris” should have a few short introductory pages on the interesting and unique features Solaris has, with links to more in-depth details, to encourage such development.
    Another thing I think that would be useful would be similar to the bug and RFE (request for enhancement) databases at java.sun.com, which also has voting and top 25s of most requested bug fixes and most wanted enhancements. Doing something similar (and public / semi-public) for Solaris would be quite helpful in a number of ways I think. Maybe the RFEs should be split into sub-groups like “incremental improvements”, “major new development”, “hardware specific” (drivers, CPU specific, chipset specific etc) and “non kernel”.
    I think OpenSolaris would also make it easier for Sun Labs to work in co-operation with outside groups on OS research.

  9. I can think of a few that were around the 5 year mark from the inception of the research project to the product becoming widespread (I admit that it is very difficult to think of successful technologies that took 10 years to make the transition). The first one that came to mind was the one that Phil mentioned, namely VMware’s roots in Disco, which was part of the FLASH project at Stanford. Another is the architecture and compiler research that went into the Itanium, particularly the VLIW research from the 1980s. For that matter I think a fair amount of the academic compiler research slowly made its way into compilers by SGI and others. As for the SMT work, which you cite and which is another good example, I never got the impression that academics felt it was too impractical, and certainly the latency from the work at UW and elsewhere to Intel’s hyperthreading and companies like Tera was in this general timeframe. As for today, two of the hot research areas in academia today are sensor networks and distributed hash tables. Neither are widespread system building blocks today, but sensor networks seem likely to be widespread by 2005-2010, which is 5-10 years after the sensor research project began here at Berkeley, IIRC. (And if you talk to the DHT folks, they will tell you that their work will transform the Internet from a network that does address-based routing to one that does content-based routing, but that’s definitely not going to happen within 5 years of ~2001.) Presumably the research going on in these areas in academia today is developing the principles and architectures that will be needed if/when these systems become deployed widely. Certainly there is lots of research going on in academia today in other areas as well — I was just expressing my opinion as to why much academic research may seem “impractical” and why many academics might not be looking at projects that are more immediately “practical.” This is definitely not a comment on the value of one kind of project over another, and many projects with shorter-term horizons have had more impact than those with longer-term horizons. Indeed, as you correctly point out, RISC and RAID did not take 5 years to become widespread (OTOH I think there were some market forces that led those technologies to be exceptions (in particular IIRC RAID was concurrently under development at IBM and Berkeley, and Hennessy left Stanford for several years to start MIPS specifically to commercialize RISC)).
    To tie in to Phil’s comment, I agree that there is not a lot of core OS research going on anymore, but I think that’s due to a perception that OSes for the masses can’t be substantially “improved” beyond their current state through more academic research (though I think the open sourcing of Solaris may help to change that, and is definitely a great thing). Areas like service composition and wide-area distributed services are still in their formative stages, and hence, I think, are perceived (perhaps incorrectly) as more subject to influence by academics than is core OS, which is viewed (again, perhaps incorrectly) both as an “old” field with no “low-hanging” fruit left, and as being controlled by the folks up in Redmond. OTOH, if you can come up with a great idea to improve operating systems, then your impact is likely to be greater, at least over the short term, than is research in distributed services or even sensor networks. Just my two cents on all of this; if I were truly perceptive and wise, I’d have my Ph.D. by now. 🙂

  10. I think it’s a bit dodgy for academia to claim too much credit for Disco/VMware, given that this arguably proves the point that the scope of such an undertaking requires the prolonged focus that only industry can provide. But I believe that the observation that there is no core OS research because of the perception that there is “no low-hanging fruit left” is absolutely correct, and gets to the core of my argument: “no low-hanging fruit” does not mean “no unsolved problems”; if academia restricts itself to low-hanging fruit, it will become increasingly less relevant. My whole point is that academic systems research needs to be willing to solve harder, smaller problems than are being solved today — and the problems need to be solved more thoroughly than they’ve been solved in the past. DHTs are actually a decent example of this, as they are (in my opinion) a smaller and harder problem than the average academic systems problem…
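    (For those who haven’t bumped into DHTs: the core idea is consistent hashing, in which keys and nodes hash onto the same identifier ring, and a key is owned by the first node at or after its hash. Below is a minimal, single-process sketch of just that idea; the FNV-1a hash and the node names are purely illustrative choices, and a real DHT adds routing state so that a lookup takes O(log n) messages across many machines.)

        /*
         * Illustrative sketch only: consistent hashing, the kernel
         * of a DHT.  Keys and nodes share a 32-bit ring; a key
         * belongs to the node with the smallest hash >= the key's
         * hash, wrapping around the ring.
         */
        #include <stdio.h>
        #include <stdint.h>

        static uint32_t
        hash32(const char *s)
        {
            uint32_t h = 2166136261u;        /* FNV-1a, 32-bit */

            while (*s != '\0') {
                h ^= (uint8_t)*s++;
                h *= 16777619u;
            }
            return (h);
        }

        #define NNODES 4
        static const char *nodes[NNODES] = {
            "node-a", "node-b", "node-c", "node-d"
        };

        static const char *
        lookup(const char *key)
        {
            uint32_t kh = hash32(key), bh = 0, mh = 0;
            const char *best = NULL, *min = NULL;
            int i;

            for (i = 0; i < NNODES; i++) {
                uint32_t nh = hash32(nodes[i]);

                if (min == NULL || nh < mh) {        /* ring minimum */
                    min = nodes[i];
                    mh = nh;
                }
                if (nh >= kh && (best == NULL || nh < bh)) {
                    best = nodes[i];         /* successor of the key */
                    bh = nh;
                }
            }
            return (best != NULL ? best : min);      /* wrap around */
        }

        int
        main(void)
        {
            const char *keys[] = { "dtrace", "zones", "nohup" };
            int i;

            for (i = 0; i < 3; i++)
                printf("%-8s -> %s\n", keys[i], lookup(keys[i]));
            return (0);
        }

    The hard parts, of course, are everything this sketch elides: performing that lookup across thousands of unreliable machines, none of which sees the whole ring.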

  11. I think that each field that evolves as much as ours has a number of early pioneers who feel that the art is gone, where the art is associated with the simplicity of the early days. I have to admit that I also sometimes dream of the excitement I experienced when getting the pdp-11 to jump through hoops. I do think the comments to Bryan’s posting show exactly what Rob Pike means: if our only measure is how quickly research results get adopted, we get away from true innovation, because we start doing market-oriented research, which can be valid research but always tends to be less spectacular than the totally funky, far-out principled work. The kind of work that, when you read about it, oozes something like ‘these guys really believe in this and they really pushed the limits’. Remember that Rob’s measurement for value is not industry adoption (otherwise he should consider his own work somewhat irrelevant). I don’t completely agree with this conclusion, but I can see why there is less excitement than in the pioneering days. It will be fun to see what he is going to produce at Google. I have a less pessimistic view: I actually think a lot of good research is happening; it is just that SOSP is no longer the only measurement to use for validating results (easy for me to say, I already scored there :-)). My measurement is whether there is a paper in a conference where I think, wow, I wish I had done that… I still have that at times, so some good stuff must be happening. Now whether I felt that about the DTrace paper is a whole different matter…

  12. I’m not an early pioneer by any stretch, but for me personally, this is an incredibly exciting time for operating systems: after a long period of consolidation (where the innovation was interesting but pretty small), we began developing several radical ideas that we had been thinking about for quite some time. After many years of development, these ideas have become real technologies, and they have finally become publicly available over the last year. This process isn’t done yet: there are several major technologies still in development that will be publicly available shortly. Unfortunately, the very last stage of productization for us is the writing of the academic paper, so the only one of these technologies that’s been fully described to the academy is DTrace. Anyway, stay tuned — and in particular check out the Zones paper at LISA ’04 in Atlanta. You probably won’t think “wow I wish I had done that”, but you may well think “I can’t believe they did all that” or “I’m glad someone finally did that!” For us, at least, the excitement is very much back in operating systems work… (Of course, for us the excitement never really left — but that’s a longer story.)

  13. An example comes to mind of a concept that was developed more than 10 years before its practical implementation in our science: hardware-accelerated radiosity. “Global” illumination had been used in satellite heat-sink design for about 20 years when ideas about how to make it a broadly applicable technology started coming out. It took another 10-15 years before these ideas could be tested. The concepts that go from idea to implementation in a few years come out with a bang and really grab our attention. We’re hearing fewer and fewer bangs in the last few years, and they’re taking more and more effort to produce as time goes by.

    As a science, we have enjoyed an explosive beginning. We’ve been spoiled. Small ideas were grandly innovative and we’ve spent our time being easily occupied — but I think this is changing. If you compare our world to one of the other hard sciences, we see a picture of where we may be going: how long does it take for new discoveries in physics and math to become of economic interest? How much dedication to pure research does it take to get to the point where you’re contributing at that level? How long did it take for Maxwell’s, Einstein’s, Watson/Crick’s work to go from the realm of research to the engineering of products?

    If this becomes the case for software, then industry will let academics get on with their pie-in-the-sky projects while industry continues with its “nohup -p”s. Some of what academia produces will be of use to engineers and some of it won’t. As the science becomes more complex, the distinction between research and engineering becomes very important. Bryan makes the point that nohup -p would not have been a likely product of academics, but neither would the proof of Fermat’s Last Theorem be of interest to industry. Both sides are a necessary part of the whole.

