Eric Schrock's Blog

Shameless self-promotion

June 18, 2004

One of the most visible features that I have integrated into Solaris 10 is the ability to store pathnames with each open file[1]. This opens up avenues of observability that were previously inaccessible. First off, every open file now shows up as a symbolic link in /proc/<pid>/path:

$ ls -l /proc/`pgrep Firebird`/path | cut -b 55-
0 -> /devices/pseudo/mm@0:null
1 -> /home/eschrock/.dt/sessionlogs/machine_DISPLAY=:0
10 -> /usr/local/MozillaFirebird/chrome/comm.jar
11 -> /usr/local/MozillaFirebird/chrome/en-US.jar
12 -> /usr/local/MozillaFirebird/chrome/embed-sample.jar
13 -> /usr/local/MozillaFirebird/chrome/pipnss.jar
14 -> /usr/local/MozillaFirebird/chrome/pippki.jar
15 -> /usr/local/MozillaFirebird/chrome/US.jar
16 -> /usr/local/MozillaFirebird/chrome/en-unix.jar
17 -> /usr/local/MozillaFirebird/chrome/classic.jar
18 -> /usr/local/MozillaFirebird/chrome/toolkit.jar
19 -> /usr/local/MozillaFirebird/chrome/browser.jar
2 -> /home/eschrock/.dt/sessionlogs/machine_DISPLAY=:0
20
21
22 -> /home/eschrock/.phoenix/default/7pkwqbju.slt/Cache/_CACHE_MAP_
23 -> /home/eschrock/.phoenix/default/7pkwqbju.slt/Cache/_CACHE_001_
24 -> /home/eschrock/.phoenix/default/7pkwqbju.slt/Cache/_CACHE_002_
25 -> /home/eschrock/.phoenix/default/7pkwqbju.slt/Cache/_CACHE_003_
26
27 -> /home/eschrock/.phoenix/default/7pkwqbju.slt/formhistory.dat
28 -> /home/eschrock/.phoenix/default/7pkwqbju.slt/history.dat
29 -> /home/eschrock/.phoenix/default/7pkwqbju.slt/cert8.db
3
30 -> /home/eschrock/.phoenix/default/7pkwqbju.slt/key3.db
4 -> /var/run/name_service_door
5 -> /home/eschrock/.phoenix/default/7pkwqbju.slt/XUL.mfasl
6
7
8
9
a.out -> /usr/local/MozillaFirebird/MozillaFirebird-bin
cwd -> /home/eschrock
root -> /
ufs.102.0.11082 -> /usr/lib/iconv/646%UTF-16LE.so
ufs.102.0.11521 -> /usr/lib/iconv/UTF-16LE%646.so
[ ... output elided ... ]
$
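If you want to script against this directory rather than eyeball it, something like the following sketch may help. It walks a directory of symbolic links and prints each entry with its target, flagging the entries that have none (like 20 and 21 above). The function name list_paths is made up for this example, and it assumes a POSIX shell such as ksh or /usr/xpg4/bin/sh rather than the classic Bourne shell. It parses ls output instead of calling readlink(1), since that command is not guaranteed to be installed.

```shell
#!/bin/sh
# list_paths DIR  (hypothetical helper, not part of Solaris)
# Print "name -> target" for every entry in DIR that ls reports as a
# symbolic link with a target, and "name (no path)" for everything else.
# DIR is expected to be something like /proc/<pid>/path.
list_paths() {
    dir=$1
    for entry in "$dir"/*; do
        # Skip the unexpanded pattern when the directory is empty.
        [ -e "$entry" ] || [ -L "$entry" ] || continue
        name=${entry##*/}
        line=$(ls -ld "$entry" 2>/dev/null)
        case $line in
            # Keep only the text after the last " -> " separator.
            *' -> '*) printf '%s -> %s\n' "$name" "${line##* -> }" ;;
            *)        printf '%s (no path)\n' "$name" ;;
        esac
    done
}
```

On a Solaris 10 box you would call it as list_paths /proc/`pgrep Firebird`/path. A file name that itself contains " -> " would confuse the parsing, so treat this as a convenience, not a robust tool.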

As usual, Mozilla Firebird has lots of interesting stuff open. You may notice that some of the file descriptors have no path information. This is most likely because they refer to a socket or FIFO (there is a small chance they refer to a file that has since been moved). The pfiles(1) command has been modified to use this information, so you can now see the path along with the rest of the goodies:

$ pfiles `pgrep Firebird`
286670: /usr/local/MozillaFirebird/MozillaFirebird-bin
Current rlimit: 512 file descriptors
0: S_IFCHR mode:0666 dev:200,0 ino:6815752 uid:0 gid:3 rdev:13,2
O_RDONLY|O_LARGEFILE
/devices/pseudo/mm@0:null
1: S_IFREG mode:0644 dev:210,1281 ino:346 uid:138660 gid:10 size:4164
O_WRONLY|O_APPEND|O_CREAT|O_LARGEFILE
/home/eschrock/.dt/sessionlogs/machine_DISPLAY=:0
2: S_IFREG mode:0644 dev:210,1281 ino:346 uid:138660 gid:10 size:4164
O_WRONLY|O_APPEND|O_CREAT|O_LARGEFILE
/home/eschrock/.dt/sessionlogs/machine_DISPLAY=:0
3: S_IFIFO mode:0666 dev:209,0 ino:9 uid:0 gid:1 size:0
O_RDWR|O_NONBLOCK FD_CLOEXEC
4: S_IFDOOR mode:0444 dev:209,0 ino:52 uid:0 gid:0 size:0
O_RDONLY|O_LARGEFILE FD_CLOEXEC  door to nscd[100253]
/var/run/name_service_door
5: S_IFREG mode:0644 dev:210,1281 ino:744 uid:138660 gid:10 size:747398
O_RDONLY|O_LARGEFILE
/home/eschrock/.phoenix/default/7pkwqbju.slt/XUL.mfasl
6: S_IFIFO mode:0000 dev:203,0 ino:119094 uid:138660 gid:10 size:0
O_RDWR|O_NONBLOCK
7: S_IFIFO mode:0000 dev:203,0 ino:119094 uid:138660 gid:10 size:0
O_RDWR|O_NONBLOCK
[ ... output elided ... ]
$
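To check the claim that the pathless descriptors are mostly FIFOs and sockets, you can filter the pfiles output. The sketch below reads pfiles(1) output on standard input and prints the descriptor number and file type of every descriptor that is not followed by a path line. The function name no_path_fds is invented for this example, and the parsing assumes the output layout shown above (a "N: S_IF..." header per descriptor, with the path on a line of its own starting with "/").

```shell
#!/bin/sh
# no_path_fds  (hypothetical helper, not part of Solaris)
# Read pfiles(1) output on stdin; print "fd type" for each descriptor
# that has no path line.
no_path_fds() {
    awk '
        # Descriptor header, e.g. "3: S_IFIFO mode:0666 ...".
        $1 ~ /^[0-9]+:$/ && $2 ~ /^S_IF/ {
            # Report the previous descriptor if it never got a path.
            if (fd != "" && !seen) print fd, type
            fd = $1; sub(/:$/, "", fd)
            type = $2
            seen = 0
            next
        }
        # A line whose first field starts with "/" is the path.
        $1 ~ /^\// { seen = 1 }
        # Do not forget the last descriptor in the input.
        END { if (fd != "" && !seen) print fd, type }
    '
}
```

Run against the output above, as in pfiles `pgrep Firebird` | no_path_fds, the visible portion would report descriptors 3, 6, and 7, all of type S_IFIFO.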

This should be enough to get most savvy sysadmins drooling. But wait, there’s more! This feature allowed the new DTrace io provider (integrated into build 60, aka Beta 5, aka SX 07/04) to get path name information for arbitrary files in the system. This allows you to do neat stuff like:

# cat iohog.d
#!/usr/sbin/dtrace -s
io:::start
{
@[execname, args[2]->fi_pathname] = sum(args[0]->b_bcount);
}
# ./iohog.d
^C
sched           /home/eschrock/.dt/sessionlogs/machine_DISPLAY=:0      4096
xlp             /var/adm/utmpx                                         4096
fsflush         /export/iso/solaris_4.iso                              73728
sched           <none>                                                 82432
cp              <none>                                                 114688
fsflush         <none>                                                 177152
cp              /export/iso/solaris_4.iso                              238936064
cp              /export/iso/solaris_1.iso                              239910912
#
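If all you want is a per-process total, the simplest route is probably to drop the path from the aggregation key in the D script. When you only have saved output to work with, a small post-processing step does the same job. The sketch below sums the byte counts per executable from output shaped like the above. The function name bytes_by_exec is made up for this example, and it assumes the executable name is the first field, the byte count is the last, and executable names contain no spaces.

```shell
#!/bin/sh
# bytes_by_exec  (hypothetical helper, not part of DTrace)
# Read "execname path bytes" lines on stdin and print the total bytes
# per executable, smallest first.
bytes_by_exec() {
    awk '
        # Only consider lines that end in a byte count; this skips
        # stray lines such as the "^C" echoed by the terminal.
        NF >= 3 && $NF ~ /^[0-9]+$/ { total[$1] += $NF }
        # %.0f avoids exponent notation for large totals.
        END { for (e in total) printf "%s %.0f\n", e, total[e] }
    ' | sort -k2,2n
}
```

You would feed it a saved run, for example bytes_by_exec < iohog.out, where iohog.out is whatever file you captured the output in.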

For years we’ve had the iostat(1M) utility. It’s great to know that someone is hammering away on sd0, but that’s not really the question you want answered. What you really want to know is who is hammering away on your disks. With the DTrace io provider, we’ve taken it one step further by giving you the means to answer why someone is hammering away on your disks. All of a sudden one of the most opaque problems is now completely transparent. So head on over and check it out (while the io provider is not available in Solaris Express quite yet, the documentation for it is available on the DTrace page).


[1] For the curious: Solaris implements a Virtual File System (VFS) layer, which includes the notion of a vnode to represent an arbitrary file. The filesystem-dependent part is stored in a format private to the filesystem implementation (think of it in terms of inheritance if it helps). To illustrate with crude ASCII art:

USERLAND        KERNEL VFS                         KERNEL FS
  fd ----+----> file_t -----+----> vnode_t ------> inode_t /
         |                  |                      prnode_t /
  fd ----+                  |                      etc
                            |
  fd ---------> file_t -----+
We store a (char *) pointer at the end of the vnode_t when we go to look up the file, and now we have path information for all the open files in the kernel (even those implicitly mapped into process address space, without an associated file_t). There are some subtleties with hard links and moving files around, but it works perfectly 99% of the time, which is all we can hope for in this case.

3 Responses

  1. This is EXTREMELY kool! Previously my only hope was to truss the process and watch for the open() syscalls and then trace the FD thru all the accesses, which not only sucked, but wasn’t possible if the FD had already been opened and wasn’t being closed/opened frequently… all you knew was that FD 4 was concerning, but who knows what that is. The /proc updates are kool, the pfiles update is even kooler, but the DTrace updates for it are just pure gold! This should be especially helpful for tuning/monitoring large allocations such as Oracle datafiles at the file level instead of per filesystem.
    I humbly bow to you sir. Glad you blogged this… this is exactly the sort of stuff that just gets buried under all the other larger features in such a massive release as Solaris 10.
    benr

  2. Really nice feature. One that, when you read about it, immediately makes you feel how much you missed it.
