Tweaking memory on the fly

DTrace can modify process address space programmatically based on events, such as user-level functions or syscalls. This could be handy if you have a known software issue, and overwriting some bytes is a convenient workaround until the real software fix is available. When DTrace was first released, there was a popular demo of this making the rounds, which spoofed uname(1).

There are two functions which can do this, and they are two of the most dangerous functions in DTrace: copyout() and copyoutstr(). As a safety measure they are only available when you are using DTrace with the “destructive” option (either via a #pragma D option, or -w). The danger lies in using them incorrectly and accidentally writing the wrong data, or writing to the wrong location. This could quickly cause the process to fault and core dump, or worse, it could cause silent data corruption.

I was just testing this for a suggestion regarding a bug with some commercial software in Solaris zones. The software checks root-level inodes for verification, and doesn’t like that these are different in a zone. This is how it looks on a normal (global) system:

system$ ls -lai /
total 746
         2 drwxr-xr-x  44 root     root        1536 Mar 30 18:03 .
         2 drwxr-xr-x  44 root     root        1536 Mar 30 18:03 ..
[...]

and here is a zone (this is a Joyent SmartMachine):

myzone$ ls -lai /
total 5110
         4 drwxr-xr-x  18 root     root          20 Dec  1  2010 .
         3 drwxr-xr-x  18 root     root          20 Dec  1  2010 ..
[...]

The first column is the inode number, and in a zone environment the inodes for the “.” and “..” directories are different (4, 3), betraying the fact that this isn’t the real root directory (it’s virtualized). For some reason the commercial software doesn’t like that, and one suggestion was to use DTrace to tweak getdents(), in lieu of a fixed version.

To try this out, I’ll use DTrace to tweak the behavior of the ls(1) command. We’ll know it works if running the “ls -lai /” command above shows an inode number of 3 instead of 4 for the first entry.

copyin()

I’ll start by checking what ls(1) is actually using to read the directory:

myzone# dtrace -n 'syscall:::entry /execname == "ls"/ { @[probefunc] = count(); }'
dtrace: description 'syscall:::entry ' matched 236 probes
^C

  fsat                                                              1
  getpid                                                            1
  getrlimit                                                         1
  open64                                                            1
  read                                                              1
  readlink                                                          1
  rexit                                                             1
  sysi86                                                            1
  fcntl                                                             2
  getdents64                                                        2  <-- found it
  ioctl                                                             2
  setcontext                                                        2
  sysconfig                                                         2
  mmapobj                                                           3
  fstat64                                                           4
  memcntl                                                           5
  resolvepath                                                       5
  close                                                             6
  open                                                              6
  brk                                                               8
  doorfs                                                            8
  getuid                                                            9
  mmap                                                              9
  pathconf                                                         20
  lstat64                                                          21
  write                                                            21
  stat64                                                           25
  acl                                                              40
  gtime                                                            64

With the DTrace one-liner running, I executed "ls -lai /" in another window, then hit Ctrl-C. This shows that it's using getdents64(), the 64-bit version of the get-directory-entry call.

Its prototype, from the man page, is:

     int getdents(int fildes, struct dirent *buf, size_t nbyte);

It populates the buffer that buf points to with multiple directory entries, and returns the number of bytes written.

DTrace can examine the state of this buffer when the function returns. Since the buf pointer is pointing to a user-land address, and DTrace is running in kernel-land, in order for DTrace to inspect the data we must use copyin():

myzone# dtrace -n 'syscall::getdents64:entry /execname == "ls"/ { self->p = arg1; }
    syscall::getdents64:return /self->p/ { tracemem(copyin(self->p, arg1), 100); }'
dtrace: description 'syscall::getdents64:entry ' matched 2 probes
CPU     ID                    FUNCTION:NAME
  4    571                getdents64:return
             0  1  2  3  4  5  6  7  8  9  a  b  c  d  e  f  0123456789abcdef
         0: 04 00 00 00 00 00 00 00 01 00 00 00 00 00 00 00  ................
        10: 18 00 2e 00 00 00 00 00 03 00 00 00 00 00 00 00  ................
        20: 02 00 00 00 00 00 00 00 18 00 2e 2e 00 00 00 00  ................
        30: 0e 00 00 00 00 00 00 00 37 2b fb 11 00 00 00 00  ........7+......
        40: 18 00 62 69 6e 00 00 00 0d 00 00 00 00 00 00 00  ..bin...........
        50: fd 00 fd 11 00 00 00 00 18 00 75 73 72 00 00 00  ..........usr...
        60: 0f 00 00 00                                      ....

  4    571                getdents64:return
             0  1  2  3  4  5  6  7  8  9  a  b  c  d  e  f  0123456789abcdef
         0: b7 c6 82 13 00 00 00 00 18 00 65 74 63 00 00 00  ..........etc...
        10: 53 00 02 00 00 00 00 00 83 d7 18 15 00 00 00 00  S...............
        20: 18 00 63 6f 72 65 00 00 3b 1f 00 00 00 00 00 00  ..core..;.......
        30: 3e 46 b1 16 00 00 00 00 18 00 72 6f 6f 74 00 00  >F........root..
        40: f6 01 00 00 00 00 00 00 4e 4e d8 16 00 00 00 00  ........NN......
        50: 18 00 70 72 6f 63 00 00 0a 00 00 00 00 00 00 00  ..proc..........
        60: 53 c1 b7 17                                      S...

Inode numbers and directory names are visible in the buffer. I printed it using tracemem() which does the neat hex-dumps.

The very first byte in the first return is the one we want to change - from 4 to 3.
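
To make the layout in that dump concrete, here is a minimal offline sketch in Python (not D, and not something I ran as part of the workflow above) that decodes the first 0x60 bytes of the first dump. It assumes the record layout that the dump itself suggests: an 8-byte d_ino, an 8-byte d_off, a 2-byte d_reclen, then the NUL-terminated name, all little-endian. The names DUMP, HEADER and parse_dirents are made up for this illustration.

```python
import struct

# First 0x60 bytes of the first tracemem() dump above (little-endian x86).
DUMP = bytes.fromhex(
    "04 00 00 00 00 00 00 00 01 00 00 00 00 00 00 00 "
    "18 00 2e 00 00 00 00 00 03 00 00 00 00 00 00 00 "
    "02 00 00 00 00 00 00 00 18 00 2e 2e 00 00 00 00 "
    "0e 00 00 00 00 00 00 00 37 2b fb 11 00 00 00 00 "
    "18 00 62 69 6e 00 00 00 0d 00 00 00 00 00 00 00 "
    "fd 00 fd 11 00 00 00 00 18 00 75 73 72 00 00 00 "
)

# Assumed record header: d_ino (8 bytes), d_off (8 bytes), d_reclen (2 bytes).
HEADER = struct.Struct("<QqH")


def parse_dirents(buf):
    """Walk the buffer record by record, returning (d_ino, name) pairs."""
    entries = []
    pos = 0
    while pos + HEADER.size <= len(buf):
        d_ino, _d_off, d_reclen = HEADER.unpack_from(buf, pos)
        # Stop on a malformed or truncated record rather than guess.
        if d_reclen < HEADER.size or pos + d_reclen > len(buf):
            break
        raw_name = buf[pos + HEADER.size:pos + d_reclen]
        name = raw_name.split(b"\0", 1)[0].decode("ascii")
        entries.append((d_ino, name))
        pos += d_reclen
    return entries


for d_ino, name in parse_dirents(DUMP):
    print(d_ino, name)
# prints:
# 4 .
# 3 ..
# 14 bin
# 13 usr
```

The first record's d_ino occupies bytes 0 to 7 of the buffer, which is why overwriting the start of the buffer is enough to change the inode that ls(1) prints for ".".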

copyout()

While copyout() can write bytes back to user-land, there is a problem to start with: this is an array of directory entries, of which we only want to modify one. DTrace does not currently have loops, so stepping over this array and searching for the "." entry is difficult (one option is unrolled loops).

To keep this simple, I'll assume that the first entry is always the "." entry. That seems to be the case whenever I've tried.

# cat -n getdents.d
     1  #!/usr/sbin/dtrace -Cs
     2
     3  #pragma D option destructive
     4
     5  #include <dirent.h>
     6
     7  syscall::getdents*:entry
     8  /zonename == "myzone" && execname == "ls"/
     9  {
    10          self->buf = arg1;
    11  }
    12
    13  syscall::getdents*:return
    14  /self->buf && arg1 > 0/
    15  {
    16          /* modify first entry of ls(1) getdents() */
    17          this->dep = (struct dirent *)copyin(self->buf, sizeof (struct dirent));
    18          this->dep->d_ino = 3;
    19          copyout(this->dep, self->buf, sizeof (struct dirent));
    20          exit(0);
    21  }
    22
    23  syscall::getdents*:return
    24  /self->buf/
    25  {
    26          self->buf = 0;
    27  }

Note the destructive pragma on line 3, which is needed to allow the copyout() on line 19.

This script also uses the C preprocessor by adding the -C option on line 1, allowing the #include on line 5, which defines the "struct dirent" for lines 17 and 18.

Lines 23-27 aren't really necessary, as we should have exited on line 20.

Does it work?

global# ./getdents.d
dtrace: script './getdents.d' matched 4 probes
dtrace: allowing destructive actions
CPU     ID                    FUNCTION:NAME
  3    571                getdents64:return 

myzone# ls -lai /
total 5110
         3 drwxr-xr-x  18 root     root          20 Dec  1  2010 .
         3 drwxr-xr-x  18 root     root          20 Dec  1  2010 ..

Yes!

But I'm running this from the global zone (hence the check on line 8 for the zonename). It would probably make more sense to run this within the zone, provided the zone can use DTrace in the first place (for example, having limitpriv="default,dtrace_proc,dtrace_user" in /etc/zones/myzone.xml):

myzone# ./getdents.d 
dtrace: script './getdents.d' matched 4 probes
dtrace: allowing destructive actions
dtrace: error on enabled probe ID 3 (ID 571: syscall::getdents64:return): invalid user access in action #3 at DIF offset 52
dtrace: error on enabled probe ID 3 (ID 571: syscall::getdents64:return): invalid user access in action #3 at DIF offset 52

This doesn't work. I'm running the ls(1) command as the user "brendan", and dtrace(1M) is running as root. If I run the ls(1) command as root - it works fine.

It looks like, when in a zone, DTrace can only copyout() to processes owned by the same user as the dtrace(1M) process. To have this DTrace script write to a brendan-owned ls(1) command, I had to run the DTrace script as user brendan (which I did by giving a brendan-owned shell DTrace privileges: "ppriv -s A+dtrace_user,dtrace_proc PID"). This looks like a bug in how the kernel privilege checks work for copyout() in zones.

I hit a third issue as well. I originally wrote the program to trace "getdents64", instead of the wildcard "getdents*" seen in the above script. But that didn't work:

myzone# grep syscall getdents64.d 
syscall::getdents64:entry
syscall::getdents64:return
myzone# ./getdents64.d -c 'ls -lai /'
dtrace: script './getdents64.d' matched 2 probes
dtrace: allowing destructive actions
total 5110
         4 drwxr-xr-x  18 root     root          20 Dec  1  2010 .
         3 drwxr-xr-x  18 root     root          20 Dec  1  2010 ..
[...]

Now it doesn't even see the events.

Fortunately I don't think this is a DTrace bug, but rather something unintended from #including the dirent.h file and using the C preprocessor. I changed the script to avoid that, and to just treat the first bytes as an int:

# cat -n getdents64_int.d
     1  #!/usr/sbin/dtrace -s
     2
     3  #pragma D option destructive
     4
     5  syscall::getdents64:entry
     6  /zonename == "myzone" && execname == "ls"/
     7  {
     8          self->buf = arg1;
     9  }
    10
    11  syscall::getdents64:return
    12  /self->buf && arg1 > 0/
    13  {
    14          /* modify first entry of ls(1) getdents() */
    15          this->dep = (int *)alloca(sizeof (int));
    16          this->dep[0] = 3;
    17          copyout(this->dep, self->buf, sizeof (int));
    18          exit(0);
    19  }
    20
    21  syscall::getdents64:return
    22  /self->buf/
    23  {
    24          self->buf = 0;
    25  }

Since this isn't modifying the d_ino member of struct dirent, there seemed little point doing the copyin(), so I've used alloca() on line 15 to create a buffer instead.

Putting this to the test:

myzone# ./getdents64_int.d -c 'ls -lai /'
dtrace: script './getdents64_int.d' matched 2 probes
dtrace: allowing destructive actions
total 5110
         3 drwxr-xr-x  18 root     root          20 Dec  1  2010 .
         3 drwxr-xr-x  18 root     root          20 Dec  1  2010 ..
[...]

That's better. Simple works.
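
One caveat with writing a 4-byte int over what is really an 8-byte inode field: the result depends on byte order. The following Python sketch (an offline illustration, not D; the function name is made up) mimics copying sizeof (int) bytes onto the start of a 64-bit d_ino, using the little-endian value from this post and the big-endian (SPARC) values reported in the first comment below.

```python
import struct


def write_int_over_ino(d_ino, value, endian):
    """Overwrite the first 4 bytes of an 8-byte d_ino with a 4-byte int.

    endian is "<" for little-endian (x86) or ">" for big-endian (SPARC).
    """
    buf = bytearray(struct.pack(endian + "Q", d_ino))
    struct.pack_into(endian + "i", buf, 0, value)
    return struct.unpack(endian + "Q", buf)[0]


# Little-endian: the first 4 bytes are the low half, so 4 becomes 3.
print(write_int_over_ino(4, 3, "<"))        # prints 3

# Big-endian: the first 4 bytes are the high half, so the old inode
# survives in the low half and the new value lands above it.
print(write_int_over_ino(397727, 2, ">"))   # prints 8590332319
```

Even on little-endian systems this only gives a clean result while the upper four bytes of the original inode are zero; writing the full 8-byte value would avoid both problems.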

DTrace often works smoothly, but sometimes (as with all software) there can be nits to work around. We can get these fixed. I hope this quick post is useful for anyone else trying this capability.

Posted on June 30, 2011 at 6:13 pm by Brendan Gregg · Permalink
In: DTrace

4 Responses

  1. Written by kevincreason
    on July 1, 2011 at 11:28 am

    Thanks Brendan!
    The final 25-line example didn’t work on my zone exactly as intended. I’m trying to figure it out, but I’m not there yet.
    I don’t think it’s a typo; I’ve checked and rechecked. Are you using little endian or big endian? My system is big endian.
    I’m also using a zpool, not ufs. My system is a Solaris 5.10 Generic_144488-06 sun4v sparc SUNW,T5140.

    I’m about six zones into my zpool on this lab box, so my inodes for the test system are six digits long (397727 for . and 415469 for ..).
    Instead of replacing the inode, it prefixes the hex bytes of the inode with the new number, pushing it way up to 8590332319. I ran a hex conversion on the original and new inode, and it appears that it just updated the first part to my replacement value (2), giving me 0x20006119f compared to the original 0x06119f.
    Interesting, but over my immediate knowledge.
    So I went back to your beginning and began applying it to my specific case: lmgrd and the sub-process that handles the license. I determined that it is the subprocess that does this ridiculous inode check and fails. Working through your other steps, I figured out that I can use the 27-line example (next to last) to do what I want. I modified the SMF start script to run this as the license user first, sleep a few seconds, and then start the license manager.
    Presto chango! It worked! Both license processes are running, the dtrace pre-process exits cleanly, and licenses are being issued by the daemon.
    Thanks to you, I will have a solution to virtualize this ancient old system that needs to keep ticking another decade, as long as Station keeps flying… and I’m going to play with this a bit longer to see if there is a more elegant solution.

  2. Written by Dtrace Saves the Day « coffeeortech
    on July 5, 2011 at 6:59 pm

    [...] out to Brendan through twitter and managed to pique his curiosity and in the space of a day had a blog post leading me to a [...]

  3. Written by Rennie Allen
    on July 7, 2011 at 8:31 pm

    I assume that copyout() doesn’t operate with an address in .text for the destination? Too bad… that would be really handy…

  4. Written by Sam Horrocks
    on August 19, 2011 at 11:57 am

    This is a good example of how to use dtrace copyout(), but it doesn’t seem like the best way to fix this particular problem. Is hardcoding “3” into your dtrace code going to work for other zones, or if the zone is re-installed, etc.? Also, instead of just targeting the broken program, every single program that calls getdents is trapped to check whether it’s the program you want to fix.

    Have you considered an interposer instead? This can be done by writing a new getdents function in C that changes the inode number however you want, and then loading it only for the broken program using the LD_PRELOAD environment variable. That way you can write a function that works in all circumstances (you don’t have to hardcode “3”), and the replacement code is only used by that single broken program.

    http://developers.sun.com/solaris/articles/lib_interposers.html