The Observation Deck

Search
Close this search box.

Some prehistory

June 14, 2004

I am an engineer in Solaris Kernel Development here at Sun. I have spent many (well, eight) years at Sun, and I’ve done work in quite a few different kernel subsystems. Most recently, I — along with Mike Shapiro and Adam Leventhal1 — designed, implemented and shipped DTrace, a new facility for dynamic instrumentation of production systems. As DTrace has been a nearly decade-long dream for me personally, much of this blog will focus on DTrace — its history, its architecture, its use, its budding community, its ongoing development and its futures.

But first, some prehistory to let you know where I’m coming from. (And apologies in advance for the length.)

Above all else, I believe that software should:

  • Work correctly
  • Work efficiently

One would think that it would be very straightforward to achieve these ends, and yet (historically, at least) it’s been damned near impossible — even for very bright people with high levels of expertise. (It may help to know that my definition of “works correctly” is not “works once”, “seems to work” or “works for me” but rather “rock-solid, never-breaks, lean-my-business-on-it and trust-my-life-to-it”.) So why is there so much software out there that’s slow and broken? The fundamental difficulty is that we cannot see software. And why can we not see it? Because software has the peculiar property that to see it, you must make it slower. That is, the act of indicating that a particular instruction is being executed will slow down the execution of that instruction. Worse, software has the even stranger property that if you ever want to even be able to see it, you must make it slower. That is, let’s say you want to optionally be able to trace the entry to a function foo(). You might have something like this:

foo(int arg)
{
if (tracing_enabled)
trace(FOO_ENTRY, arg);
...

This boils down to instructions that look something like this (using a RISC-ish proto instruction set):

set   tracing_enabled, %o0
ld    [%o0], %o1
cmp   %o0, 0
bne   go_around
set   FOO_ENTRY, %o0
mov   %i0, %o0
call  trace
...
go_around:
!
! The rest of the function foo()...
!
...

That is, it boils down to a load, a compare and a branch. This slows down execution (loads hurt — especially if they’re not in the cache), causes a branch-taken in the common path, increases instruction cache footprint, etc. Not a big deal if foo() is the only function in which you do this to, but start putting this in every function, and you’ll find that you have a system that is too slow to ship — it has suffered the infamous “death of a thousand cuts.”

(Yes, if you’re lucky or if the compiler supports a hint to indicate that trace() isn’t called in the common case, the sense of the branch may change such that the branch will be not-taken in the common case — which is better, but this still hurts too much to do it everywhere.)

So we can’t leave this kind of code in our optimized, production code, so what do we do? Many do something like this:

#ifdef DEBUG
#define TRACE(tok, arg) if (tracing_enabled) trace((tok), (arg))
#else
#define TRACE(tok, arg)
#endif

This is better — at least the production version isn’t hindered and we still have debug support in a DEBUG version. But now we have a new problem. We now have essentially two versions of our software: the slow one that we can see, and the fast one that we can’t. So what do we do when we see a performance problem in production? Well, we might try to run the DEBUG version (or worse, one with custom instrumentation) in production. But that requires downtime, and additional risk — and usually doesn’t fly. So what do we do? We try to reproduce this in development on the DEBUG version that we can see. (This is not “we” as in Sun, by the way, this is “we” as in humanity — you and me and everyone.) And reproducing perfomance problems is bad, bad news: when you’re reproducing performance problems, you’re reproducing symptoms. (Naturally, because the symptom are all you’ve got; if you knew the root-cause you would be watching Knight Rider reruns instead of horsing around at work.) And why is reproducing symptoms such bad news? Because disjoint problems can manifest the same symptoms.

To borrow a medical analogy: let’s say that you discover that you’re running a fever in production. So you take your development or test environment, and you try to make it look closer and closer to your production environment until you have a fever. Maybe you add more artificial load, more hardware, more users, whatever. Finally, you see the fever in your development environment. You get all of the developers in the room, and they start throwing instrumented binaries on the development machine. Maybe you think you’ve got an OS issue, so you have Solaris engineers throwing on new kernels — or maybe you have your ISVs giving you instrumented binaries of their products. Finally, after a huge amount of time and escalation and more time and frustration, you discover the problem: the fever is due to influenza. Okay, this isn’t the end of the world: if the production environment stays off its feet, drinks fluids and gets some rest for the next few weeks, it should be fine.

But here’s the problem: it was influenza in the development environment — that much was correct. But it’s not influenza in the production environment. In the production environment, the problem was cerebral malaria. No amount of rest is going to help — our diagnosis is completely wrong.2 It may strike you as a glib analogy, but it’s an accurate one for the experiences of many. Just think: how many times have you found “a” problem without finding “the” problem?

Okay, so we’re clearly down a blind alley of sorts. We actually need to start over with a clean sheet of paper — we need to change the model. We need to be able to ship an optimized kernel and optimized apps, and when we want to be able to see the software, we need to be able to dynamically instrument it to answer our question. And when we’re done answering our question, we want the system to go back to be completely optimized. And we want to do all this in production environments. This means that is must be absolutely safe — there must be no way to crash the system through user error.

And this, in essense, is what we’ve done with DTrace. DTrace is a new facility in Solaris 10 for the dynamic instrumentation of production systems.
DTrace is available today via Solaris Express. It has been available since November, and many people have already used it to diagnose real problems. You can read some of their thoughts in the DTrace feature story that ran on sun.com late last month.


1
I would have hyperlinked to Mike and Adam’s blogs, but they don’t (yet) exist. I would expect Adam to have a blog shortly, but given that Mike doesn’t yet have a cell phone, it might be a longer wait. Then again, Mike bought a TiVo the first weekend they were on sale at the Palo Alto Fry’s back in 1998 — so you never know when he’s going to adopt a technology.

2
Lest you think medical science has figured this one out: I encourage you to contract cerebral malaria and present at your local emergency department — and observe just how many weeks you spend bouncing around the health care system before some clever doc finally cracks the case…

One Response

  1. On the malaria comparison: I had cerebral malaria in Africa, in 1989. It took about a week to get a positive diagnosis, even though the disease isn’t uncommon in Africa. The doctors kept trying to find a reason to think it was AIDS, to my considerable subsequent puzzlement. The crisis was over by the time the diagnosis was made, though without a good intensive care department I’d have been dead.
    More to the point, the observations on tracing are exactly right. I’ve more than once had the experience that the act of tracing the application has slowed it down enough so the problem no longer occurs. Then you’re down to intuition and the SWAG. DTrace is a wonderful idea, I hope it spreads. Pity that most of our customers aren’t on Solaris..

Leave a Reply

Recent Posts

November 18, 2023
November 27, 2022
October 11, 2020
July 31, 2019
December 16, 2018
September 18, 2018
December 21, 2016
September 30, 2016
September 26, 2016
September 13, 2016
July 29, 2016
December 17, 2015
September 16, 2015
January 6, 2015
November 10, 2013
September 3, 2013
June 7, 2012
September 15, 2011
August 15, 2011
March 9, 2011
September 24, 2010
August 11, 2010
July 30, 2010
July 25, 2010
March 10, 2010
November 26, 2009
February 19, 2009
February 2, 2009
November 10, 2008
November 3, 2008
September 3, 2008
July 18, 2008
June 30, 2008
May 31, 2008
March 16, 2008
December 18, 2007
December 5, 2007
November 11, 2007
November 8, 2007
September 6, 2007
August 21, 2007
August 2, 2007
July 11, 2007
May 20, 2007
March 19, 2007
October 12, 2006
August 17, 2006
August 7, 2006
May 1, 2006
December 13, 2005
November 16, 2005
September 13, 2005
September 9, 2005
August 21, 2005
August 16, 2005

Archives