Eric Schrock's Blog

Category: Software

So the first day of FISL has come to a close. I have to say it went better than expected, based on the quality of questions posed by the audience and visitors to the Sun booth. If today is any indication, my voice is going to be completely gone by the end of the conference. I started off the day with a technical overview of Solaris 10/OpenSolaris. You can find the slides for this presentation here. Before taking too much credit myself, I should note that the content of these slides is largely based on Dan’s USENIX presentation (thanks Dan!). This is a whirlwind tour of Solaris features – three slides per topic is nowhere near enough. Each of the major topics has been presented many times as a standalone 2-hour presentation, so you can imagine the corners I have to cut to cover them all.

My presentation was followed by a great OpenSolaris overview from Tom Goguen. His summary of the CDDL was one of the best I’ve ever seen – it was the first time I’ve seen an OpenSolaris presentation without a dozen questions about GPL, CDDL, and everybody’s favorite pet license. Dave followed up with a detailed description of how Solaris is developed today and where we see OpenSolaris development heading in the future. All in all, we managed to cram 10+ hours of presentations into a measly 3 1/2 hours. For those of you who still have lingering questions, please stop by the Sun booth and chat with us about anything and everything. We’ll be here all week.

After retiring to the booth, we had several great discussions with some of the attendees. The highlight of the day was when Dave was talking to an attendee about SMF (and the cool GUI he’s working on) and I was feeling particularly bored. Since my laptop was hooked up to the monitor in the “community theater”, I decided to play around with some DTrace scripts to come up with a cool demo. Within three minutes I had 4 or 5 people watching what I was doing, so I decided to start talking about all the wonders of DTrace. The 4 or 5 people quickly turned into 10 or 12, and pretty soon I found myself in the middle of a 3 hour mammoth DTrace demo, from which my voice is still recovering. This brings us to the major thing I learned today:

“If you DTrace it, they will come”


In the last few weeks, I’ve been completely re-designing the ZFS commands from the ground up1. When I stood back and looked at the current state of the utilities, several glaring deficiencies jumped out at me2. I thought I’d use this blog entry to focus on one that is near and dear to me. Having spent a great deal of time with the debugging and observability tools, I’ve invariably focused on answering the question “How do I diagnose and fix a problem when something goes wrong?”. When it comes to command line utilities, the core of this problem lies in well-designed error messages. To wit, running the following (former) ZFS command demonstrates the number one mistake when reporting error messages:

# zfs create -c pool/foo pool/bar
zfs: Can't create pool/bar: Invalid argument
#

The words “Invalid argument” should never appear as an error message. This means that at some point in the software stack, you were able to determine there was a specific problem with an argument. But in the course of passing that error up the stack, any semantic information about the exact nature of the problem has been reduced to simply EINVAL. In the above case, all we know is that one of the two arguments was invalid for some unknown reason, and we have no way of knowing how to fix it. When choosing to display an error message, you should always take the following into account:

An error message must clearly identify the source of the problem in a way that the user can understand.

An error message must suggest what the user can do to fix the problem.

If you print an error message that the administrator can’t understand or doesn’t suggest what to do, then you have failed and your design is fundamentally broken. All too often, error semantics are given a back seat during the design process. When approaching the ZFS user interface, I made sure that error semantics were a fundamental part of the design document. Every command has complete usage documentation, examples, and every possible error message that can be emitted. By making this part of the design process, I was forced to examine every possible error scenario from the perspective of an administrator.

A grand vision of proper failure analysis can be seen in the Fault Management Architecture in Solaris 10, part of Predictive Self Healing. A complete explanation of FMA and its ramifications is beyond the scope of a single blog entry, but the basic premise is to move from a series of unrelated error messages to a unified framework of fault diagnosis. Historically, when hardware errors would occur, an arbitrary error message may or may not have been sent to the system log. The error may have been transient (such as an isolated memory CE), or the result of some other fault. Administrators were forced to make costly decisions based on a vague understanding of our hardware failure semantics. When error messages did succeed in describing the problem sufficiently, they invariably failed in suggesting how to fix the problem. With FMA, the sequence of errors is instead fed to a diagnosis engine that is intimately familiar with the characteristics of the hardware, and is able to produce a fault message that both adequately describes the real problem, as well as how to fix it (when it cannot be automatically repaired by FMA).

Such a wide-ranging problem doesn’t necessarily compare to a simple set of command line utilities. A smaller scale example can be seen with the Solaris Management Facility. When SMF first integrated, it was incredibly difficult to diagnose problems when they occurred3. The result, after a few weeks of struggle, was one of the best tools to come out of SMF, svcs -x. If you haven’t tried this command on your Solaris 10 box, you should give it a shot. It does automated gathering of error information and combines it into output that is specific, intelligible, and repair-focused. During development of the final ZFS command line interface, I’ve taken a great deal of inspiration from both svcs -x and FMA. I hope that this is reflected in the final product.

So what does this mean for you? First of all, if there’s any Solaris error message that is unclear or uninformative, that is a bug. There are some rare cases when we have no other choice (because we’re relying on an arbitrary subsystem that can only communicate via errno values), but 90% of the time it’s because the system hasn’t been sufficiently designed with failure in mind.

I’ll also leave you with a few cardinal4 rules of proper error design beyond the two principles above:

  1. Never distill multiple faults into a single error code. Any error that gets passed between functions or subsystems must be traceable back to a single specific failure.
  2. Stay away from strerror(3C) at all costs. Unless you are truly interfacing with an arbitrary UNIX system, the errno values are rarely sufficient.
  3. Design your error reporting at the same time you design the interface. Put all possible error messages in a single document and make sure they are both consistent and effective.
  4. When possible, perform automated diagnosis to reduce the amount of unimportant data or give the user more specific data to work with.
  5. Distance yourself from the implementation and make sure that any error message makes sense to the average user.

1 No, I cannot tell you when ZFS will integrate, or when it will be available. Sorry.

2 This is not intended as a jab at the ZFS team. They have been working full steam on the (significantly more complicated) implementation. The commands have grown organically over time, and are beginning to show their age.

3 Again, this is not meant to disparage the SMF team. There were many more factors here, and all the problems have since been fixed.

4 “cardinal” might be a stretch here. A better phrase is probably “random list of rules I came up with on the spot”.

This little gem came up in conversation last night, and it was suggested that it would make a rather amusing blog entry. A Solaris project had a command line utility with the following, unspeakably horrid, piece of code:

/*
 * Use the dynamic linker to look up the function we should
 * call.
 */
(void) snprintf(func_name, sizeof (func_name), "do_%s", cmd);
func_ptr = (int (*)(int, char **))
    dlsym(RTLD_DEFAULT, func_name);
if (func_ptr == NULL) {
	fprintf(stderr, "Unrecognized command %s", cmd);
	usage();
}
return ((*func_ptr)(argc, argv));

So when you type “a.out foo”, the command would snprintf into a buffer to make “do_foo”, and rely on the dynamic linker to find the appropriate function to call. Before I get a stream of comments decrying the idiocy of Solaris programmers: the code will never reach the Solaris codebase, and the responsible party no longer works at Sun. The participants at the dinner table were equally disgusted that this piece of code came out of our organization. Suffice it to say that this is much better served by a table:

for (i = 0; i < sizeof (func_table) / sizeof (func_table[0]); i++) {
	if (strcmp(func_table[i].name, cmd) == 0)
		return (func_table[i].func(argc, argv));
}
fprintf(stderr, "Unrecognized command %s", cmd);
usage();

I still can’t imagine the original motivation for this code. It is more code, harder to understand, and likely slower (depending on the number of commands and how much you trust the dynamic linker’s hash table). We continually preach software observability and transparency – but I never thought I’d see obfuscation of this magnitude within a 500-line command line utility. This prevents us from even searching for callers of do_foo() using cscope.

This serves as a good reminder that the most clever way of doing something is usually not the right answer. Unless you have a really good reason (such as performance), being overly clever will only make your code more difficult to maintain and more prone to error.

Update – Since some people seem a little confused, I thought I’d elaborate on two points. First off, there is no loadable library. This is a library linked directly to the application. There is no need to asynchronously update the commands. Second, the proposed function table does not have to live separately from the code. It would be quite simple to put the function table in the same file with the function definitions, which would improve maintainability and understandability by an order of magnitude.

So my last few posts have sparked quite a bit of discussion out there, appearing on slashdot as well as OSNews. It’s been quite an interesting experience, though it’s had a significant effect on my work productivity today 🙂 While I’m not responding to every post, I promise that I’m reading them (and thanks to those of you sending private mail, I promise to respond soon).

I have to say that I’ve been reasonably impressed with the discussion so far. Slashdot, as usual, leaves something to be desired (even reading at +5), but the comments in my blog and in my email have been for the most part very reasonable. There is a certain amount of typical fanboy drivel (more so on the pro-Linux side, but only because Solaris doesn’t have many fanboys). But there’s also a reasonable contingent on Slashdot fighting down the baseless arguments of the zealots. In the past, the debate has been rather one-sided. Solaris is usually dismissed as an OS for big computers for people with lots of money. Sun has traditionally let our marketing department do all the talking, which works really well for CEOs and CTOs (our paying customers), but not as well for spreading detailed technical knowledge to the developer community. We’re changing our business model – encouraging blogs, releasing Solaris Express, hosting discussions with actual kernel engineers, and eventually open sourcing Solaris – to encourage direct connections with the community at large.

We’ve been listening to the (often one-sided) discussion for a long time now, and it shows in Solaris. Solaris 10 has killer performance, even on single- and dual-processor x86 machines. Hardware support has been greatly improved (S10 installed on my Toshiba laptop without a hitch). We’re focusing on the desktop again, with X.Org integration, Gnome 2.6, Mozilla 1.7, and better open source packages all around. Sure, we’re still playing catchup in a lot of ways, but we’re learning. I only hope the Linux community can learn from Solaris’s strengths, and dismiss many of the Solaris stereotypes that have been implanted (not always without merit) over the course of history. Healthy competition is good, and can only benefit the customer.

As much as I would like to continue this debate forever, I think it’s time I get back to doing what I really love – making Solaris into the best OS it can be. I’ll probably be focusing on more technical posts for a while, but I’ll revive the discussion again at a future point. Until then, feel free to continue posting comments or sending me mail. I do read them, even if I don’t respond publicly.

So it seems my previous entry has finally started to stir up some controversy. I’ll address some of the technical issues raised here shortly. But first I thought I’d clarify my view of the GPL, with the help of an analogy:

Let’s say that I manufacture wooden two by fours, and that I want to make
them freely available under an “open source” license. There are several options
out there:

  1. You have the right to use and modify my 2x4s to your heart’s content.

    This is the basis for open source software. It protects the rights of the
    consumer, but imparts few rights to the developer.

  2. You have the right to use my 2x4s however you please, but if you modify
    one, then you have to make that modification freely available to the public in
    the same fashion as the original.

    This gives the developer a few more guarantees about what can and cannot
    be done with his or her contributions. It protects the developer’s rights
    without infringing on the rights of the consumer.

  3. You have the right to use my 2x4 as-is, but if you decide to build a
    house with it, then your house must be as freely available as my 2x4.

    This is the provision of the GPL that I don’t agree with, and neither do
    customers that we’ve talked to. It protects my rights as a developer, but
    severely limits the rights of the consumer in what can and cannot be done with my public
    donation.

This analogy has some obvious flaws. Open source software is neither excludable nor rival, unlike the house I just built. There is also a tenuous
line between derived works and fair use. In my example, I wouldn’t have the
right to the furniture put into your house. But I feel like it’s a reasonable
simplification of my earlier point.

As an open source advocate, I would argue that #1 is the “most free”. This
is why, in many ways, the BSD license is the “most open” of all the main
licenses. As a developer, I would argue that #2 is the best solution. My
contribution is protected – no one can make changes without giving it back to me (and the community at large). But my code is essentially a service, and I feel
everyone should have a right to that service, even if they go off and make money
from it.

The problems arise when we get to #3, which is the essential controversy of
the GPL. To me, this is a personal choice, which is why GPL advocacy often
turns into pseudo-religious fanaticism. In many ways, arguing with a GPL zealot
is like an atheist arguing with a religious fundamentalist. In the end, they
agree on nothing. The atheist leaves understanding the fundamentalist’s beliefs
and respects his or her right to have them. The fundamentalist leaves believing
that the atheist will burn in hell for not accepting the one true religion.

This would be fine, except that GPL advocates often blur the line between #2
and #3, and make it seem like the protections of #2 can only be had if you fully
embrace the GPL in all its glory. I support the rights provided by #2. You
can scream and shout about the benefits of #3 and how it’s an inalienable right
of all people, but in the end I just don’t agree. Don’t equate the GPL with
open source – if you do want to argue the GPL, make it very clear which points you are arguing for.

One final comment about GPL advocacy. Time and again I see people talk about
easing migration, avoiding vendor lock-in, and the necessity of consumer choice.
But in the same breath they turn around and scream that you must accept the GPL,
and any other license would be pure evil (at best, a slow and painful death). Why is it that we have the right to choose
everything except our choice of license? I like Linux. I like the GPL. The
GPL is not evil. There are a lot of great projects that benefit from the GPL. But it isn’t everything to all people, and in my opinion it’s not
what’s best for OpenSolaris.

[ UPDATE ]

As has been enumerated in the comments on this post, the original intent of the analogy is to show the definition of derived works. As mentioned in the comments:

Say I post an example of a function foo() to my website. Oracle goes and uses that function in their software. They make no changes to it whatsoever, and are willing to distribute that function in source code form with their product. If it was GPL, they would have to now release all of Oracle under the GPL, even though my code has not been altered. The consumer’s rights are preserved – they still have the same rights to my code as before it was put into Oracle. I just don’t see why they have a right to code that’s not mine.

Though I didn’t explain it well enough, the analogy was never intended to encompass right to use, ownership, distribution, or any of the other qualities of the GPL. It is a specific issue with one part of the GPL, and the analogy is intentionally simplistic in order to demonstrate this fact.

So it’s been a while since my KMDB post, but I promised I would do some investigation into kernel debugging on the Linux side. Keep in mind that I have no Linux kernel experience. While I will try to be thorough in my research, there may be things I miss simply from lack of experience or a good test system. Feel free to comment on any errors or omissions.

We’ll try to solve the same problem that I approached with KMDB in the last post: a deadlock involving reader-writer locks. Linux has a choice of two debuggers, kdb and kgdb (though User Mode Linux presents interesting possibilities). In this post I’ll be taking a look at KDB.

Fire up KDB

Chances are you’re not running a Linux kernel with KDB installed. Some distros (like Debian) make it easier to download and apply the patch, but none seems to include it by default (admittedly, I didn’t do a very thorough search). This means you’ll have to go download the patch, apply it, tweak some kernel config variables (CONFIG_KDB and CONFIG_FRAME_POINTER), recompile/reinstall your kernel, and reboot. Hopefully you’ve done all this beforehand, because as soon as you reboot you’ve lost your bug (possibly forever – race conditions are fickle creatures). Assuming you were running a kdb-enabled kernel when you hit this bug, you then run:

# echo "1" > /proc/sys/kernel/kdb

And then press the ‘pause’ key on your keyboard. Alternatively, you can hook up a serial console, but I’ll opt for the easy way out.

Find our troubled thread

First, we need to find the pid of our offending process. The only way to do this is to use the 'ps' command to display all processes on the system, and then pick out (visually) which pid belongs to our ‘ps’ process. Once we have this information, we can then use 'btp <pid>' to get a stack trace.

Get the address of the rwlock

This step is very similar to the one we took when using kmdb. The stack trace produced by 'btp' includes frame pointers like kmdb’s $C. Looking back over my kmdb post, it wasn’t immediately clear where I got that magic starting number – it came from the frame pointer in the (verbose) stack trace. In any case, we use 'id <addr>' to disassemble the code around our call site. We then use 'mdr <addr+offset>' to examine the memory where the original value is saved. This gets much more interesting (painful) on amd64, where arguments are passed in registers and may not get pushed on the stack until several frames later.

Without a paddle?

At this point, the next step should be “Find who owns the reader lock.” But I can’t find any commands in the kdb manpages that would help us determine this. Without kmdb’s ::kgrep, we’re stuck searching for a needle in a haystack. Somewhere on this system, one or more threads have referenced this rwlock in the past. Our only course of action is to try 'bta', which will give us a stack trace of every single process on the system. With a deep understanding of the code, a great deal of persistence, and a little bit of luck, we may be able to pick out the offending stack just by sight. This quickly becomes impractical on large systems, not to mention difficult to verify and prone to error.

With KDB we can do some basic debugging tasks, but it still relies on giant “leaps of faith” to correlate two pieces of seemingly disjoint data (two threads involved in a deadlock, for example). As a point of comparison, KDB provides 40 different commands, while KMDB provides 771 (356 dcmds and 415 walkers on my current desktop). Next week I’ll look at kgdb and see if it fills in any of these gaps.
