DTrace for Linux

Yesterday (October 4, 2011) Oracle made the surprising announcement that they would be porting some key Solaris features, DTrace and Zones, to Oracle Enterprise Linux. As one of the original authors, the news about DTrace was particularly interesting to me, so I started digging.

I should note that this isn’t the first time I’ve written about DTrace for Linux. Back in 2005, I worked on Linux-branded Zones, Solaris containers that contained a Linux user environment. I wrote a coyly-titled blog post about examining Linux applications using DTrace. The subject was honest — we used precisely the same techniques to bring the benefits of DTrace to Linux applications — but the title wasn’t completely accurate. That wasn’t exactly “DTrace for Linux”, it was more precisely “The Linux user-land for Solaris where users can reap the benefits of DTrace”; I chose the snappier title.

I also wrote about DTrace knockoffs in 2007 to examine the Linux counter-effort. While the project is still in development, it hasn’t achieved the functionality or traction of DTrace. Suggesting that Linux was inferior brought out the usual NIH reactions which led me to write a subsequent blog post about a theoretical port of DTrace to Linux. While a year later Paul Fox started exactly such a port, my assumption at the time was that the primary copyright holder of DTrace wouldn’t be the one porting DTrace to Linux. Now that Oracle is claiming a port, the calculus may change a bit.

What is Oracle doing?

Even among Oracle employees, there’s uncertainty about what was announced. Ed Screven gave us just a couple of bullet points in his keynote; Sergio Leunissen, the product manager for OEL, didn’t have further details in his OpenWorld talk beyond it being a beta of limited functionality; and the entire Solaris team seemed completely taken by surprise.

What is in the port?

Leunissen stated that only the kernel components of DTrace are part of the port. It’s unclear whether that means just fbt or includes sdt and the related providers. It sounds certain, though, that it won’t pass the DTrace test suite which is the deciding criterion between a DTrace port and some sort of work in progress.

What is the license?

While I abhor GPL v. CDDL discussions, this is a pretty interesting case. According to the release manager for OEL, some small kernel components and header files will be dual-licensed while the bulk of DTrace (the kernel modules, libraries, and commands) will use the CDDL, as they had under the now-defunct OpenSolaris (and to the consternation of Linux die-hards, I’m sure). Oracle already faces an interesting conundrum with their CDDL-licensed files: they can’t take the fixes that others have made to, for example, ZFS without needing to release their own fixes. The DTrace port to Linux is interesting in that Oracle apparently thinks that the CDDL license will make DTrace too toxic for other Linux vendors to touch.

Conclusion

Regardless of how Oracle brings DTrace to Linux, it will be good for DTrace and good for its users — and perhaps best of all for the author of the DTrace book. I’m cautiously optimistic about what this means for the DTrace development community if Oracle does, in fact, release DTrace under the CDDL. While this won’t mean much for the broader Linux community, we in the illumos community will happily accept anything of value Oracle adds. The Solaris lover in me was worried when it appeared that OEL was raiding the Solaris pantry, but if this is Oracle’s model for porting, then I — and the entire illumos community I’m sure — hope that more and more of Solaris is open sourced under the aegis of OEL differentiation.

10/10/2011 follow-up post, Oracle’s port: this is not DTrace.

Posted on October 5, 2011 at 9:51 pm by ahl
In: DTrace

32 Responses


  1. Written by Derek
    on October 6, 2011 at 2:46 pm

    Awesome news. :-) Now all we need are SMF and ZFS… or I could just stick to Solaris. ;)

  2. Written by ahl
    on October 6, 2011 at 3:12 pm

    Or Oracle could rebrand Solaris as the new OEL and we could all call it a day…

  3. Written by Chris Samuel
    on October 6, 2011 at 8:12 pm

    @Derek: ZFS for Linux is already on the way; LLNL require it for the 50PB Lustre filesystem backends for their planned 10 Petaflop BlueGene/Q supercomputer Sequoia.

    http://zfsonlinux.org/

  4. Written by ahl
    on October 6, 2011 at 9:24 pm

    @Chris I wasn’t aware of that project, and while it’s good that there are some ports of the illumos ZFS source base to Linux, what would really be interesting is if Oracle decided to expose the last year’s worth of development for a Linux port.

  5. Written by bdha
    on October 6, 2011 at 9:34 pm

    I’ve used the most recent RC from zfsonlinux.org under a decent amount of load on EC2 for the last several months. It’s worked quite well.

    DTrace on Linux is extremely exciting.

  6. Written by fashion jewelry
    on October 7, 2011 at 1:05 am

    Awesome news. Now all we need are SMF and ZFS… or I could just stick to Solaris.

  7. Written by Sum Yung Gai
    on October 7, 2011 at 5:10 pm

    “Oracle apparently thinks that the CDDL license will make DTrace too toxic for other Linux vendors to touch”

    This was the original reason for Sun writing and using the CDDL, according to Danese Cooper, who worked at Sun at the time this was done. It was specifically to prevent use of CDDL code in Linux or other GPL’d code, which Sun execs viewed as a direct threat to Slowaris. Oracle execs don’t seem to view Linux as an enemy, but rather Red Hat, which they’re scared of. They don’t want Red Hat to be able to include DTrace.

    Problem for Oracle: to comply with the GPL v2, they essentially *have* to allow others to distribute any part of the code that links with Linux itself (the kernel). Thus, Red Hat can do it with RHEL, and thus presumably CentOS and others (Slackware, et al.?) could include it as well. It’d be the userland stuff that might need to be rewritten, depending on how Oracle execs decide to license the userland code.

    • Written by ahl
      on October 8, 2011 at 12:58 am

      It is so disappointing that these conversations inevitably circle the drain around licensing rather than focusing on technology. Danese Cooper did work at Sun at the time. I did as well. I’m fairly certain her statements are not definitive.

      As I wrote above, Oracle will be using the CDDL for the kernel modules, libraries, and commands; they will use the GPL for a few header files and some kernel-level hooks. If you find that Oracle is out of compliance with the GPL then I suggest you find someone with standing who can demonstrate damages, and have that person take action. Let me know when that happens.

      • Written by David Gerard
        on October 8, 2011 at 9:10 am

        Danese was largely responsible for the CDDL, wasn’t she? She’d be as close to definitive as is available.

      • Written by David Gerard
        on October 8, 2011 at 9:11 am

        Responsible for implementation, I mean, not the original initiative.

        • Written by ahl
          on October 8, 2011 at 3:29 pm

          In WWII Ronald Reagan was responsible for reviewing film of the liberation of the concentration camps. He later claimed on multiple occasions to have been there in person. This wasn’t mendacity or dementia, just the nature of human memory.

  8. Written by Shane
    on October 7, 2011 at 6:59 pm

    ZFS on Linux won’t be necessary with BTRFS on the way.

    • Written by ahl
      on October 8, 2011 at 1:00 am

      … in the same way that DTrace isn’t necessary because of SystemTap is my guess. It’s taken 10 years to get ZFS where it is today; many may prefer not to wait for BTRFS to be completed or reach stability.

  9. Written by Keith Wesolowski
    on October 8, 2011 at 3:13 am

    This sounds like typical Oracle:

    1. Some VP puts some collection of spit and baling wire in front of Larry because he thinks he’ll get a big bonus.

    2. Larry likes the idea, even though he’s really too dumb to understand it, and tells the VP, most of whose engineers have never heard of it, that he’s announcing it as a product next week.

    3. Months pass, during which analysts completely forget to ever ask about it again, even though the initial ship date has long since passed.

    4. Something using the same set of keywords but bearing little or no technological resemblance to the product that was initially announced is quietly shipped to three customers.

    I’m embarrassed for the people who still work for this worthless company. And they ought to be ashamed of themselves for lying there and taking it. I’m also ashamed for you, who ought to be indignant that Larry is using your work to promote a piece of garbage he’ll probably never ship while doing everything he can to make sure it remains of absolutely no benefit to anyone except himself.

    DTrace is dead, just like Solaris. It’s sad, but move on. It’s not like you did anything to prevent it.

  10. Written by Han Solo
    on October 8, 2011 at 4:48 am

    You can keep it.

    Linux already has SystemTap.

    Thanks but no thanks.

    • Written by Brendan Gregg
      on October 8, 2011 at 5:19 am

      Have you used them both?

    • Written by Keith Wesolowski
      on October 8, 2011 at 7:02 am

      I really hope you’re a troll, because SystemTap is complete garbage. The point of all this, and the real tragedy of all of it, is that DTrace isn’t coming to Linux at all. Something Oracle will call DTrace, which may or may not end up resembling DTrace itself, is coming to *Oracle Linux*, which is like saying the Fed is going to lower interest rates: they don’t mean *your* interest rates, silly! The only people who will be using it are people who’ve paid Larry for the privilege of using a tool created by our esteemed author, his colleague Mr. Cantrill, and the man who shall not be named because he deserves whatever Satan has in store for people who throw their friends to the wolves. Also unclear is just who is doing this “porting” and who will be planning to support, maintain, and extend the result. It’s not like Oracle’s Linux division is known for having a bunch of superstars on staff; Oracle is the land of the cog and the home of the interchangeable part.

      SystemTap is not a substitute for DTrace; it’s the cheapest of cheap knockoffs. Trust me, I’ve tried to use it and ended up stripping the whole thing down to the exception handler in an effort to build atop something that’s not hopelessly broken. Let’s see, imagine a stone cylinder, attached to another stone cylinder by some sort of rod… what a bloody waste of time. Quite honestly, Linux’s BUG() mechanism is a better starting point for a tracing framework than SystemTap.

      No, if SystemTap were any good I wouldn’t care. But anyone who’s used it knows it’s not. My money says you’ll never find a conforming DTrace implementation in any supported GNU/Linux distribution but Oracle’s (part of me wants to go ahead and discard the qualifiers because I don’t think I’m giving up much), but I guess we can give it 6 months and see what happens.

      • Written by ahl
        on October 8, 2011 at 3:20 pm

        … I’m just relieved that you hold me in ostensible esteem.

      • Written by Terry Lambert
        on October 9, 2011 at 4:37 am

        I have to agree with Keith. We recently had an issue where gettimeofday() was being called on Chrome OS an absurd number of times a second. On Mac OS X, which has dtrace, we were able to do a speculative trace and get the call stack up into user space to pinpoint the responsible call site.

        Linux’s systemtap operates at the boundary, and so is useless for something like this, where you have a system call, and you want a backtrace into a plugin loaded into a binary in user space that’s calling a particular system call.
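
        The kind of question described above (which user-level call sites are responsible for a flood of one system call) can be sketched in D roughly as follows. This is an illustrative sketch only, not the script that was actually used: the process name "chrome" and the ten-second window are assumptions, and it uses a plain aggregation on user stacks rather than a speculation.

        ```d
        /* Count gettimeofday() calls by user-level stack.
         * "chrome" is a placeholder process name (an assumption). */
        syscall::gettimeofday:entry
        /execname == "chrome"/
        {
                @callers[ustack(20)] = count();
        }

        /* After ten seconds, keep only the five hottest stacks and stop;
         * the remaining aggregation is printed when the script exits. */
        tick-10s
        {
                trunc(@callers, 5);
                exit(0);
        }
        ```

        Saved to a file (say, a hypothetical gtod.d), it would be run with dtrace -s gtod.d.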

        As another example, it’s possible to identify call sites for system calls resulting in an error. Definitionally, these are places where you are doing (potentially) a metric buttload of work on something you could a priori know was going to fail. The fastest work you can do is the work you don’t do because it would result in failure.

        We identified that the majority of failures was a case of something resulting in an fd being created by open/socket/etc…. some 900 times, but close being called some 70,000 times. Using a speculative trace, it was possible to pinpoint the place in Chrome where the close call on an already closed fd was being called (speculate, and only trace stacks for closes that result in errno=EBADF). This is a basically impossible task on a Linux system without a DTrace analogue.
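
        A minimal sketch of the speculative trace described above, in D. It is not the original script: it assumes the target process is supplied with -p or -c (so that $target is set), and it uses the literal 9 for EBADF, which is that error's value on Solaris, Mac OS X, and Linux.

        ```d
        /* On every close() in the target process, open a speculation and
         * record the user stack into it. Nothing is reported yet. */
        syscall::close:entry
        /pid == $target/
        {
                self->spec = speculation();
                speculate(self->spec);
                ustack();
        }

        /* close() failed with EBADF (9): commit, so the stack is reported. */
        syscall::close:return
        /self->spec && errno == 9/
        {
                commit(self->spec);
                self->spec = 0;
        }

        /* Any other outcome: throw the speculated stack away. */
        syscall::close:return
        /self->spec && errno != 9/
        {
                discard(self->spec);
                self->spec = 0;
        }
        ```

        Run as, for example, dtrace -s badclose.d -p PID (the file name is hypothetical), only the stacks of closes that failed with EBADF appear in the output.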

        The systemtap facility has utility, it’s better to have it than not, and it’s orders of magnitude better than what existed before (nothing), but it falls orders of magnitude again below the utility of DTrace, even for just watching something run and deciding to reorder elements in a PATH environment variable to reduce path lookup misses.

        – Terry

  11. Written by ahl
    on October 8, 2011 at 5:19 am

    Welcome, slashdotters. If you want to wax vitriolic about licensing, do it somewhere else. If you want to talk about various technologies, great. If you want to share your opinions of Oracle, great. Ad hominem attacks against your author are a privilege allotted exclusively to former colleagues.

  12. Written by Jonathan Wilson
    on October 8, 2011 at 5:53 am

    Because the CDDL is an OSI-approved open source license, anyone is free to take any and all code Oracle releases (GPL or CDDL) and use/modify/distribute it.

    If Oracle thought they would be sued by Linux kernel developers for mixing CDDL and GPL code, they would not be doing it. So the only thing preventing RedHat or SUSE from using whatever Oracle releases is whether their own legal teams disagree with the Oracle legal assessment of the risks of being sued for mixing GPL and CDDL code.

    • Written by Terry Lambert
      on October 9, 2011 at 2:17 am

      The licensing issue is the same one we faced when we (when “we” meant “Apple”) ported DTrace to Mac OS X. It’s ultimately the issue that killed ZFS on Mac OS X.

      There are parts of the kernel that are severable, and parts that are not, for performance/operational reasons. The biggest is the trap handler for the FBT and SDT2.0 (cost-free SDT) interposition, which relies on an interrupt frequently shared with debuggers. Even if you could make it hookable without a performance penalty, on something like Mac OS X, it’s going to be shared by chud/shark developer tools, and it’s got to at least be possible to runtime select one or the other.

      The end run around binary-only modules on the Linux side of things has been to declare kernel internal interfaces to be exported GPL’ed, which is one of the reasons there is no reasonable binary nVidia driver for Linux systems, and therefore no nVidia Wayland support (Wayland is one of the more promising technologies that could potentially supplant the aging X11).

      If they do release in CDDL, and hook deeply — I was very involved in the kernel part of the Mac OS X port, and they will in fact need to do so — those deep hooks can be declared GPL’ed or refactored to be DTrace-unfriendly, should they not simply dual-license the code.

      There is no discussion of technology in the Linux community that does not involve a discussion of license. In this particular case, I come down on the side of the community.

      If Oracle dual-licenses the code rather than trying to carry around CDDL’ed modules, they can expect to have a great deal of help from the Linux community (which includes me – I work at Google these days).

      — Terry

  13. Written by Dimo
    on October 8, 2011 at 7:54 pm

    Glad to know fish folk are as eloquent and active as ever :)

  14. Written by UX-admin
    on October 9, 2011 at 3:01 pm

    …And indeed, the freetard outrage on Slashdot was very great; I think it is cheaper for us who are on Solaris to just stay on Solaris and keep enjoying the inherent luxuries of DTrace and ZFS without “waiting for Godot”.

    My take on the whole debacle is that Oracle is no longer relevant in either Linux or Solaris space: thanks to the efforts of you and Bryan, and Eric and Garrett and other Sun engineers who left, the rest of us have become inspired to roll out our own code and our own versions of the SunOS operating system. The point is that we are no longer dependent upon Oracle, and Oracle has made themselves irrelevant to those of us who would really use Solaris and push SunOS to the limit.

    And if I have seen further and have been able to achieve more, it is because I am standing on your collective shoulders… the shoulders of giants.

    Your work and your return to the Bell Labs’ culture and mentality are a tremendous inspiration. Before you guys decided to split off and do your own OS, maintaining one’s own kernel code, packaging subsystem and installer apart from the direction Oracle is taking was deemed largely impractical; you guys showed that it can be done and that it works. And for leading the way, those of us that are hardcore about Solaris owe you a debt of gratitude.

    • Written by Keith Wesolowski
      on October 10, 2011 at 3:41 am

      What a lovely fantasy land you live in; perhaps you’d consider a merger with Crazytown?

      Meanwhile, in the real world, you can rest assured that if Joyent, Delphix, or anyone else becomes a serious competitor to Oracle in a market or technology area it cares about, they will disappear in a flash of cash and anyone who wants to continue work on a SunOS-based operating platform will have to start all over… again.

      There are only two kinds of good things in the world: those that have been destroyed by Oracle, and those that will be.

  15. Written by UX-admin
    on October 10, 2011 at 9:32 pm

    Dear Keith, that is a lot of vitriol from a former Solaris kernel engineer. I would almost pay to sit down with you and hear what caused all that vitriol, although I have a pretty good idea.

    Now, about Oracle: it appears to me you misunderstand our motives: we are not going to compete with Oracle. Oracle simply does not matter, because Solaris 11 is unusable with IPS in it.

    This is not about competing with Oracle. This is about rolling out one’s own version of Solaris, complete with SVR4 pkgadd(1M), JumpStart and Flash. Oracle can keep their “Oracle Solaris”. We do not need them. We can roll out our own Solaris, and fix our own kernel code. We can develop our own software for it.

    Oracle can stick their “Solaris 11” and IPS where the Sun does not shine.

    • Written by Keith Wesolowski
      on October 11, 2011 at 6:59 pm

      If you’re ever in San Francisco, I’ll join you for a beer and tell you all about it. I hope you’re right about the future of non-Oracle Solaris; it will be a challenge to create a self-sustaining ecosystem that will remain innovative and relevant for an extended time. Whether it’s legal threats from Oracle or diversion of talent to Linux or simply lack of capital to build big new things, there are a lot of bombs to dodge on the way to that world.

      • Written by UX-admin
        on October 12, 2011 at 6:57 pm

        You’ll have to leave me your new e-mail address. Adam has mine, so he can supply it to you. Next time I’m in California, I will be looking forward to “storytime”. And beer.

        Regarding Oracle, I do not concern myself with them any more.
        The only thing that matters is code, and that we can automate everything that can be automated with a computer.

  16. Written by ahl
    on October 10, 2011 at 10:50 pm

    After trying out DTrace on OEL, I wrote this follow-up called Oracle’s port: this is not DTrace.
    http://dtrace.org/blogs/ahl/2011/10/10/oel-this-is-not-dtrace/

  17. Written by Seth
    on October 10, 2011 at 11:16 pm

    This is fantastic entertainment. Thanks, guys.

  18. Written by Ramon F. Herrera
    on November 10, 2011 at 6:03 pm

    What needs to happen is the following: IBM, HP, RedHat and some big-leaguers like that port DTrace to Linux, AND give a lot of public crap to Oracle: “freeloaders”, “disrespect the OS community”, etc.
