Solaris 11 DTrace syscall Provider Changes

Oracle Solaris 11 dropped many commonly used probes from the DTrace syscall provider, a disappointing side effect of some code refactoring in the system call trap table (PSARC 2010/441 “delete obsolete system call traps”). This breaks a lot of scripts and one-liners, including many that are used to teach DTrace to beginners. The functionality is still (I think) possible, albeit by learning the trap table mappings and tracing those. Given how commonly used and taught the syscall provider is, this is not a minor bug or nit (as other providers have had); rather, it’s the biggest regression in DTrace’s history.

In this post I’ll explain the changes by showing what happened to the syscall::open:entry probe. For a summary of the affected probes and necessary changes, see the New System Calls and Deleted System Calls lists. This only affects Oracle Solaris 11; all other related operating systems (including Solaris 10, Illumos and SmartOS) remain as before.

Solaris 10

This one-liner traces open() syscalls in Solaris 10, showing process and file names:

# dtrace -n 'syscall::open:entry { printf("%s %s", execname, copyinstr(arg0)); }'

It follows the open() syscall as defined by the POSIX standard and the open(2) man page:

     int open(const char *path, int oflag, /* mode_t mode */);

Mapping this to the one-liner is straightforward and intuitive. It’s commonly introduced to beginners learning DTrace.

Here’s another example. This is the openat() syscall, which was standardized in POSIX.1-2008:

# dtrace -n 'syscall::openat:entry { printf("%s %s", execname, copyinstr(arg1)); }'

It follows the synopsis:

     int openat(int fildes, const char *path, int oflag, /* mode_t mode */);

This time arg1, not arg0, refers to the pathname.

While this is still straightforward, there were some syscall provider probes that were not, and were documented in the DTrace guide. The syscall provider isn’t actually a stable interface, and didn’t match exactly the POSIX syscall names. While this was a minor nuisance at times, the syscall provider generally did work as expected: tracing syscalls.

Oracle Solaris 11

Tracing only open() in Oracle Solaris 11 may not be possible. You are supposed to use this instead:

# dtrace -n 'syscall::openat:entry { printf("%s %s", execname, copyinstr(arg1)); }'

On Oracle Solaris 11, this traces both the open() and openat() syscalls. On Solaris 10, this traces just openat(). And if you try to trace open() on Oracle Solaris 11, you get an error:

# dtrace -n 'syscall::open:entry { printf("%s %s",execname,copyinstr(arg0)); }'
dtrace: invalid probe specifier syscall::open:entry { printf("%s %s",execname,copyinstr(arg0)); }:
probe description syscall::open:entry does not match any probes

While open() is still a supported syscall in Oracle Solaris 11 (it needs to be for POSIX), it’s no longer present in the DTrace syscall provider, making the provider not work as one may expect.

The syscall provider isn’t showing the POSIX defined syscall interface, it’s exposing the function names in the syscall trap table, as defined in uts/common/os/sysent.c. In fact, it always has.

The syscall trap table was an attractive location to instrument, as all syscalls could be caught from one place and with similar context. The downside was that the trap table names didn’t match the POSIX syscall names exactly. Oracle Solaris 11 makes the difference much wider and much more noticeable. What was a minor interface bug is now an eyesore, and what was once the basis for learning DTrace becomes a pitfall into Solaris internals.

It’s not all bad news. The advantage for the DTrace user is that the above one-liner is more powerful: you won’t miss an openat() if you just trace open(). However, you may still miss the 64-bit file offset transitional interface calls: open64() and openat64(), which are now both traced using the single syscall::openat64:entry probe (syscall::open64:entry is gone too).

Making the change

The Oracle Solaris 11 syscall provider changes are listed here in the New and Deleted System Call sections. Other affected syscalls include chmod(), creat(), mkdir(), readlink(), rename(), rmdir(), stat(), symlink() and unlink(). To see which syscall probes are present, either cat /etc/name_to_sysnum or run dtrace -ln 'syscall:::entry'.

Fortunately, it should only take minutes to update scripts and one-liners to match Solaris 11. The DTraceToolkit has already been updated (by Oracle), and will be shipped in /usr/dtrace/DTT (thanks!). Jim and I are working on the errata for the DTrace book. I assume all other sources of DTrace documentation will be noting the Oracle Solaris 11 changes as well.

Who was born on 3/04/1965?

I mentioned earlier that it may not be possible to achieve the exact same functionality as Solaris 10, namely, using the syscall provider to trace just the open() syscall and not both open() and openat(). Here’s my attempt:

# dtrace -n 'syscall::openat:entry /(int)arg0 == -3041965/ { printf("%s %s", execname,
    copyinstr(arg1)); }'

This traces both open() and openat() as before, but filters (hopefully) just the open() calls by matching the first argument (arg0) against the value of AT_FDCWD, which is defined as a so-called “magic” number: 0xffd19553, or -3041965 as a signed 32-bit integer. I’m assuming the latter interpretation, as other OSes implement AT_FDCWD in the invalid negative range for file descriptors (Linux uses -100, for example). -3041965 appears, to me, to be a date (3/04/1965), possibly the birthday of the engineer who wrote the code. Such easter eggs date back to the Unix File System (if not FFS or earlier), where the on-disk magic number was the birthday of one of the engineers, either Marshall Kirk McKusick or Bill Joy (incidentally, I wrote a program years ago to search for this on disks: findbill.pl).

I expect open() to map to openat(AT_FDCWD, …), which will be matched by this one-liner, in effect tracing open() but not openat() (with a valid file descriptor). But what happens if openat() is explicitly called with AT_FDCWD? The one-liner can’t tell that call apart from open(), so this approach probably won’t work, and it may not be possible with the syscall provider alone. Fortunately, it may not really matter: the syscall provider can identify that some type of open() syscall happened, which would be sufficient in most cases; when it isn’t, you can use the DTrace pid provider to see which call was actually made.

Instead of hardcoding -3041965 in the one-liner, it would be better to type “AT_FDCWD”, but this constant is unknown to DTrace.

So, G’Day -3041965, whoever you are!

Why are we even here?

This change wasn’t implemented for the possible DTrace advantage mentioned above: that was a side effect. This was housekeeping to eliminate some duplicate code, as stated in the case title: PSARC 2010/441 “delete obsolete system call traps”.

Since open() and openat() have duplicated functionality, you may assume that this eliminates hundreds of duplicate lines. This is not the case. From uts/common/syscall/open.c:

int
open(char *path, int fmode, int cmode)
{
        return (openat(AT_FDCWD, path, fmode, cmode));
}

Any duplication here was already eliminated long ago, with open() a wrapper to openat().

I think this is just about duplication in the syscall trap table. (I can’t check myself; the Oracle Solaris 11 code has not yet been released.) This is how the code looked before (uts/common/os/sysent.c):

struct sysent sysent32[NSYSCALL] =
[...]
        /*  5 */ SYSENT_CI("open",              open32,         3),
[...]
        /* 68 */ SYSENT_CI("openat",            openat32,       4),

By reducing duplicates, the “open” and “openat” lines can become just one line, “openat”.

Given the desire to do this, and the risk to the syscall provider, there were at least four options:

Oracle picked (C). It wasn’t presented for discussion with the DTrace community (dtrace-discuss), and the PSARC case was private (Oracle has shut the doors on these since 2010/329). There could be additional reasons for this choice that have yet to be made public.

truss -ftopen

While DTrace’s syscall::open has become syscall::openat, the truss(1) utility has been changed to stick to the POSIX interface despite the Solaris 11 changes. It will report openat(AT_FDCWD, …) as open(), for example. To see what’s really happening, you can use the -x option; from the man page:

Displays the arguments to the specified system calls (if traced by -t) in raw form, usually hexadecimal, rather than symbolically. This is for unredeemed hackers who must see the raw bits to be happy. Default is -x!all.

Also, “truss -ftopen command” still works (a commonly used one-liner for debugging file opens).

This puts truss(1) in a better position for the Solaris 11 changes than DTrace. Indeed, if truss can be modified to match the changes, could DTrace be too?

Summary

DTrace scripts and one-liners that use the syscall provider may need updating for Oracle Solaris 11. Many syscall probes were deleted, and grouped into others of a similar type. In this post I discussed what happened to the syscall::open:entry probe, which in Oracle Solaris 11 is now part of syscall::openat:entry.

While this change breaks many existing DTrace scripts, documentation and tutorials (on Oracle Solaris 11 only), it doesn’t break DTrace itself. It’s just one provider out of many (albeit the first you usually use), and the rest of DTrace still provides enormous value.

There is some very good news for DTrace on Oracle Solaris 11: the ip, tcp and udp providers (which I created and originally developed) have been integrated (thanks Alan Maguire and all who helped!). The iscsi, cpc and kerberos providers are there too.

Posted on November 9, 2011 at 3:51 pm by Brendan Gregg
In: Solaris

4 Responses

  1. Written by Chris
    on November 10, 2011 at 12:00 am

    I’ve been out of proper sysadmin-ing too long, as I’m not sure I even noticed the openat() call. Looking at it from this point of view, one could argue that the change is actually helpful. I can imagine tracing open() calls when some application is doing openat() and wondering what the heck is going on!

    But then I understand you as saying that truss _will_ differentiate, so I guess I’ll still be confused.

    In the end, it just helps to know how and why stuff works.

    • Written by Brendan Gregg
      on November 10, 2011 at 11:59 am

      G’Day Chris. Yes, I understand this line of argument, and in that case it will help (it doesn’t prevent you from missing the 64-bit versions though, so you are still at risk of missing calls).

      If this change were to force the user to learn POSIX.1-2008 and then use that as the interface, that would be one thing. This isn’t, it’s forcing you to learn POSIX.1-2008, and then learn something else – Oracle Solaris 11 trap table implementation – and use that instead. Tracing openat() to trace both open() and openat() doesn’t come from POSIX.1-2008, that comes from Oracle Solaris 11 only. And for users who do know POSIX.1-2008, you’ll still be confused: tracing syscall::open:entry should work, but doesn’t.

      And, for users who know POSIX.1-2008 and the Oracle Solaris 11 trap table: once you start tracing openat() only, you are at risk of missing open()s on Solaris 10 (and everywhere else).

      In many more ways this change will hurt, rather than help. It would have been great to discuss this among the community (dtrace-discuss) before making the change. Now that it’s been done, the syscall provider on Oracle Solaris 11 will be the anomaly: where basic calls including open() don’t work as expected.

  2. Written by Andrew
    on November 10, 2011 at 12:37 pm

    Does POSIX actually specify a system call implementation and naming convention? I thought the stable representation in Solaris was the libc interfaces (unlike Linux), and any correlation between the two was purely an implementation artifact.

    Not that it doesn’t suck to change the syscall provider, I just don’t see how it’s a violation of POSIX in any way.

    • Written by Brendan Gregg
      on November 10, 2011 at 12:55 pm

      G’Day Andrew. Yes, you are right: Solaris 11 remains POSIX compliant because the syscalls are there, regardless of how they are implemented (trap table or libc).

      The problem is a user who knows or learns POSIX who then tries to use the syscall provider. POSIX.1-2008 has open(), but in Oracle Solaris 11 there is no open() in the syscall provider, only openat(). It’s not affecting operating system POSIX compliance, rather it’s affecting DTrace usability.
