Debugging dynamic library dependencies on illumos

In this short follow-up to my post on illumos process tools, I’ll expand a bit on ldd and pldd, which print the dynamic linking dependencies of binaries and processes, respectively, and crle, which prints out the runtime linker configuration. These tools are available in most illumos distributions including SmartOS.

Understanding builds (and broken builds in particular) can be especially difficult. I hate running into issues like this one:

$ ffmpeg
ld.so.1: ffmpeg: fatal: libavdevice.so.53: open failed: No such file or directory
Killed

You can use ldd to see the dynamic library dependencies of a binary:

$ ldd $(which ffmpeg)
        libavdevice.so.53 =>     (file not found)
        libavfilter.so.2 =>      (file not found)
        libavformat.so.53 =>     (file not found)
        libavcodec.so.53 =>      (file not found)
        libswresample.so.0 =>    (file not found)
        libswscale.so.2 =>       (file not found)
        libavutil.so.51 =>       (file not found)
        libsocket.so.1 =>        /lib/libsocket.so.1
        libnsl.so.1 =>   /lib/libnsl.so.1
        libvpx.so.0 =>   /opt/local/lib/libvpx.so.0
        libm.so.2 =>     /lib/libm.so.2
        libbz2.so.0 =>   /opt/local/lib/libbz2.so.0
        libz.so.1 =>     /lib/libz.so.1
        libc.so.1 =>     /lib/libc.so.1
        libmp.so.2 =>    /lib/libmp.so.2
        libmd.so.1 =>    /lib/libmd.so.1
        libpthread.so.1 =>       /lib/libpthread.so.1
        librt.so.1 =>    /lib/librt.so.1
        libgcc_s.so.1 =>         /opt/local/lib/libgcc_s.so.1
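When the dependency list is long, it can help to pull out just the unresolved entries. Here’s a small Python sketch that assumes each line has the “soname => path” form shown above, with “(file not found)” marking a library ldd couldn’t locate. The sample is a subset of that output, and the helper names parse_ldd and missing are hypothetical.

```python
import re

# A subset of the ldd output above, copied verbatim.
LDD_OUTPUT = """\
        libavdevice.so.53 =>     (file not found)
        libavfilter.so.2 =>      (file not found)
        libavformat.so.53 =>     (file not found)
        libavcodec.so.53 =>      (file not found)
        libswresample.so.0 =>    (file not found)
        libswscale.so.2 =>       (file not found)
        libavutil.so.51 =>       (file not found)
        libsocket.so.1 =>        /lib/libsocket.so.1
        libvpx.so.0 =>   /opt/local/lib/libvpx.so.0
        libc.so.1 =>     /lib/libc.so.1
"""

# "<soname> => <path or (file not found)>", with arbitrary surrounding whitespace.
LINE_RE = re.compile(r"^\s*(\S+)\s+=>\s+(.*?)\s*$")


def parse_ldd(text):
    """Map each soname to its resolved path, or to None if ldd couldn't find it."""
    deps = {}
    for line in text.splitlines():
        m = LINE_RE.match(line)
        if m is None:
            continue  # not a dependency line
        soname, target = m.groups()
        deps[soname] = None if target == "(file not found)" else target
    return deps


def missing(deps):
    """Return the unresolved sonames, in the order ldd printed them."""
    return [name for name, path in deps.items() if path is None]


if __name__ == "__main__":
    for name in missing(parse_ldd(LDD_OUTPUT)):
        print("unresolved:", name)
```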

In this case, the problem is that I installed ffmpeg into /usr/local, but the ffmpeg build appears not to have used the -R linker flag, which records a runpath in the binary telling the runtime linker where to look for dynamic libraries when the program is loaded. As a result, ffmpeg doesn’t know where to find its own libraries. If I set LD_LIBRARY_PATH, I can confirm that it would work:

$ LD_LIBRARY_PATH=/usr/local/lib ldd $(which ffmpeg)
        libavdevice.so.53 =>     /usr/local/lib/libavdevice.so.53
        libavfilter.so.2 =>      /usr/local/lib/libavfilter.so.2
        libavformat.so.53 =>     /usr/local/lib/libavformat.so.53
        libavcodec.so.53 =>      /usr/local/lib/libavcodec.so.53
        libswresample.so.0 =>    /usr/local/lib/libswresample.so.0
        libswscale.so.2 =>       /usr/local/lib/libswscale.so.2
        libavutil.so.51 =>       /usr/local/lib/libavutil.so.51
        ...

I resolved this by rebuilding ffmpeg with LDFLAGS += -R/usr/local/lib, so that the runpath points at the directory that actually contains its libraries.
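To see why the runpath matters, here’s a deliberately simplified Python model of how the runtime linker picks a directory for each dependency. It assumes the search order is LD_LIBRARY_PATH, then the runpath recorded by -R, then the default path (taken from the crle output later in this post), and it ignores $ORIGIN, secure directories, 64-bit paths, and crle alternate objects; ld.so.1(1) describes the real rules. The file set and the helper names are hypothetical.

```python
import posixpath

# Default search path, as reported by crle on this system.
DEFAULT_PATH = ["/lib", "/usr/lib", "/opt/local/lib"]


def search_dirs(ld_library_path=None, runpath=None):
    """Simplified order (an assumption): LD_LIBRARY_PATH, then runpath (-R), then defaults."""
    dirs = []
    for spec in (ld_library_path, runpath):
        if spec:
            dirs.extend(d for d in spec.split(":") if d)
    return dirs + DEFAULT_PATH


def resolve(soname, existing_files, ld_library_path=None, runpath=None):
    """Return the first matching path, or None (ldd's "file not found")."""
    for d in search_dirs(ld_library_path, runpath):
        candidate = posixpath.join(d, soname)
        if candidate in existing_files:
            return candidate
    return None


# Hypothetical file system: ffmpeg's own libraries were installed in /usr/local/lib.
FILES = {"/usr/local/lib/libavdevice.so.53", "/lib/libc.so.1"}

print(resolve("libavdevice.so.53", FILES))                                    # None
print(resolve("libavdevice.so.53", FILES, ld_library_path="/usr/local/lib"))  # /usr/local/lib/libavdevice.so.53
print(resolve("libavdevice.so.53", FILES, runpath="/usr/local/lib"))          # /usr/local/lib/libavdevice.so.53
```

Both LD_LIBRARY_PATH and a runpath naming the right directory fix the lookup, but the runpath is stored in the binary itself, so nobody running ffmpeg needs to set anything.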

ldd examines only the binary on disk, so it can print only the dependencies recorded in it. Some programs use dlopen to open libraries whose names aren’t known until runtime; Node.js add-ons and Apache modules are two common examples. You can see these with pldd, which prints the dynamic libraries loaded in a running process. Here’s the output for a Node program using the node-dtrace-provider add-on:

$ pfexec pldd $(pgrep -x node)
32113:  /usr/local/bin/node /home/snpp/current/js/snpp.js -l 80 -d
/usr/local/bin/node
/lib/libz.so.1
/lib/librt.so.1
/lib/libssl.so.0.9.8
/lib/libcrypto.so.0.9.8
/lib/libdl.so.1
/lib/libsocket.so.1
/lib/libnsl.so.1
/lib/libkstat.so.1
/opt/local/lib/libstdc++.so.6.0.16
/lib/libm.so.2
/opt/local/lib/libgcc_s.so.1
/lib/libc.so.1
/usr/sfw/lib/libgcc_s.so.1
/home/dap/node_modules/dtrace-provider/build/Release/DTraceProviderBindings.node
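As a minimal illustration of why ldd can’t see such libraries, here’s a Python sketch: ctypes.CDLL calls dlopen() with a library name that exists only as a string at runtime, so nothing about it is recorded in the interpreter binary that ldd would read. Loading the math library and calling cos is an arbitrary choice for the example, and load_cos is a hypothetical helper.

```python
import ctypes
import ctypes.util

# The library name is just a runtime string; the link editor never sees it.
LIB_NAME = "m"


def load_cos(name=LIB_NAME):
    """dlopen() the named library via ctypes and return its cos() function."""
    path = ctypes.util.find_library(name)  # the returned form varies by platform
    if path is None:
        raise OSError(f"could not locate lib{name}")
    lib = ctypes.CDLL(path)  # on POSIX systems, this calls dlopen() under the hood
    cos = lib.cos
    cos.restype = ctypes.c_double
    cos.argtypes = [ctypes.c_double]
    return cos


if __name__ == "__main__":
    print(load_cos()(0.0))
```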

If you want to see where the system looks for dynamic libraries, use crle, which prints or edits the runtime linker configuration:

$ crle
Configuration file [version 4]: /var/ld/ld.config
  Platform:     32-bit LSB 80386
  Default Library Path (ELF):   /lib:/usr/lib:/opt/local/lib
  Trusted Directories (ELF):    /lib/secure:/usr/lib/secure  (system default)
Command line:
  crle -c /var/ld/ld.config -l /lib:/usr/lib:/opt/local/lib

Of course, for more information on any of these tools, check out their man pages. They’re well-documented. If you find yourself debugging build problems, you’ll probably also want to know about nm, objdump, and elfdump, which are available on many systems and well documented elsewhere.

Posted on September 18, 2012 at 9:29 am by dap
In: SmartOS

illumos tools for observing processes

illumos, with Solaris before it, has a history of delivering rich tools for understanding the system, but discovering these tools can be difficult for new users. Sometimes, tools just have different names than people are used to. In many cases, users don’t even know such tools might exist.

In this post I’ll describe some tools I find most useful, both as a developer and an administrator. This is not intended to be a comprehensive reference, but rather part of an orientation for users who are new to illumos (and SmartOS in particular) but already familiar with other Unix systems. This post will likely be review for veteran illumos and Solaris users.

The proc tools (ptools)

The ptools are a family of tools that observe processes running on the system. The most useful of these are pgrep, pstack, pfiles, and ptree.

pgrep searches for processes, returning a list of process ids. Here are some common example invocations:

$ pgrep mysql         # print all processes with "mysql" in the name
                      # (e.g., "mysql" and "mysqld")
$ pgrep -x mysql      # print all processes whose name is exactly "mysql"
                      # (i.e., not "mysqld")
$ pgrep -ox mysql     # print the oldest mysql process
$ pgrep -nx mysql     # print the newest mysql process
$ pgrep -f mysql      # print processes matching "mysql" anywhere in the name
                      # or arguments (e.g., "vim mysql.conf")
$ pgrep -u dap        # print all of user dap's processes

These options let you match processes very precisely and make scripts much more robust than “ps -A | grep foo”.
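The comments above can be summarized as a rough Python model of pgrep’s matching rules. It assumes the pattern is a regular expression searched within the process name by default, matched against the whole name with -x, and searched within the full argument string with -f, and that -o/-n keep the process with the earliest/latest start time. Python’s re module stands in for POSIX extended regular expressions (they agree for simple patterns like these), and the process table is hypothetical.

```python
import re
from dataclasses import dataclass


@dataclass
class Proc:
    pid: int
    start: int  # hypothetical start time, used by -o/-n
    name: str   # executable name
    args: str   # full argument string


# Hypothetical process table based on the examples in the comments above.
PROCS = [
    Proc(101, 10, "mysqld", "/opt/local/sbin/mysqld"),
    Proc(102, 20, "mysql", "mysql -u root"),
    Proc(103, 30, "vim", "vim mysql.conf"),
    Proc(104, 40, "mysql", "mysql -u dap"),
]


def pgrep(pattern, procs, exact=False, full=False, oldest=False, newest=False):
    """Model of pgrep [-x] [-f] [-o|-n] pattern; returns matching pids."""
    rx = re.compile(pattern)
    match = rx.fullmatch if exact else rx.search  # -x: the whole string must match
    hits = [p for p in procs if match(p.args if full else p.name)]  # -f: match args
    if hits and oldest:
        hits = [min(hits, key=lambda p: p.start)]
    elif hits and newest:
        hits = [max(hits, key=lambda p: p.start)]
    return [p.pid for p in hits]


print(pgrep("mysql", PROCS))                           # [101, 102, 104]
print(pgrep("mysql", PROCS, exact=True))               # [102, 104]
print(pgrep("mysql", PROCS, exact=True, oldest=True))  # [102]
print(pgrep("mysql", PROCS, full=True))                # [101, 102, 103, 104]
```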

I often combine pgrep with ps. For example, to see the memory usage of all of my node processes, I use:

$ ps -opid,rss,vsz,args -p "$(pgrep -x node)"
  PID  RSS  VSZ COMMAND
 4914 94380 98036 /usr/local/bin/node demo.js -p 8080
32113 92616 95964 /usr/local/bin/node demo.js -p 80
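Output in this form is easy to post-process. For example, this Python sketch totals the RSS column, assuming (as on illumos) that ps reports rss in kilobytes; total_rss_kb is a hypothetical helper.

```python
# The ps output above, copied verbatim.
PS_OUTPUT = """\
  PID  RSS  VSZ COMMAND
 4914 94380 98036 /usr/local/bin/node demo.js -p 8080
32113 92616 95964 /usr/local/bin/node demo.js -p 80
"""


def total_rss_kb(text):
    """Sum the RSS column (in KB), skipping the header line."""
    total = 0
    for line in text.splitlines()[1:]:
        fields = line.split(None, 3)  # PID, RSS, VSZ, then COMMAND with its spaces intact
        if len(fields) == 4:
            total += int(fields[1])
    return total


print(total_rss_kb(PS_OUTPUT))  # 186996
```

Keep in mind that RSS counts shared pages, such as those of libc, once per process, so the sum overstates the memory the group actually uses.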

pkill is just like pgrep, but sends a signal to the matching processes.

pstack shows you thread stack traces for the processes you give it:

$ pstack 51862
51862:      find /
 fedd6955 getdents64 (fecb0200, 808ef87, 804728c, fedabd84, 808ef88, 804728c) + 15
 0805ee9c xsavedir (808ef87, 0, 8089a90, 1000000, 0, fee30000) + 7c
 080582dc process_path (808e818, 0, 8089a90, 1000000, 0, fee30000) + 33c
 080583ee process_path (808e410, 0, 8089a90, 1000000, 0, fee30000) + 44e
 080583ee process_path (808e008, 0, 8089a90, 0, 0, fecb2a40) + 44e
 080583ee process_path (8047cbd, 0, 8089a90, 0, fef40c20, fedc78b6) + 44e
 080583ee process_path (8075cd0, 0, 2f, fed59274, 8047b48, 8047cbd) + 44e
 08058931 do_process_top_dir (8047cbd, 8047cbd, 0, 0, 0, 0) + 21
 08057c5e at_top   (8058910, 2f, 8047bb0, 8089a90, 28, 80571f0) + 9e
 08072eda main     (2, 8047bcc, 8047bd8, 80729d0, 0, 0) + 4ea
 08057093 _start   (2, 8047cb8, 8047cbd, 0, 8047cbf, 8047cd3) + 83
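Because each frame is on its own line, pstack output is also easy to script against. This Python sketch assumes the “address function (args) + offset” frame format shown above; it extracts the function names (innermost first) and folds them into a single root-first, semicolon-separated line, similar to the folded stacks consumed by flame graph tools. The helper names are hypothetical.

```python
import re

# The pstack output above, copied verbatim.
PSTACK_OUTPUT = """\
51862:      find /
 fedd6955 getdents64 (fecb0200, 808ef87, 804728c, fedabd84, 808ef88, 804728c) + 15
 0805ee9c xsavedir (808ef87, 0, 8089a90, 1000000, 0, fee30000) + 7c
 080582dc process_path (808e818, 0, 8089a90, 1000000, 0, fee30000) + 33c
 080583ee process_path (808e410, 0, 8089a90, 1000000, 0, fee30000) + 44e
 080583ee process_path (808e008, 0, 8089a90, 0, 0, fecb2a40) + 44e
 080583ee process_path (8047cbd, 0, 8089a90, 0, fef40c20, fedc78b6) + 44e
 080583ee process_path (8075cd0, 0, 2f, fed59274, 8047b48, 8047cbd) + 44e
 08058931 do_process_top_dir (8047cbd, 8047cbd, 0, 0, 0, 0) + 21
 08057c5e at_top   (8058910, 2f, 8047bb0, 8089a90, 28, 80571f0) + 9e
 08072eda main     (2, 8047bcc, 8047bd8, 80729d0, 0, 0) + 4ea
 08057093 _start   (2, 8047cb8, 8047cbd, 0, 8047cbf, 8047cd3) + 83
"""

# A frame line: hex address, function name, then the argument list in parentheses.
FRAME_RE = re.compile(r"^\s*([0-9a-f]+)\s+(\S+)\s*\(")


def frames(text):
    """Return function names, innermost frame first; the 'pid: command' header is skipped."""
    return [m.group(2) for m in map(FRAME_RE.match, text.splitlines()) if m]


def fold(funcs):
    """Join frames root-first with ';'."""
    return ";".join(reversed(funcs))


print(fold(frames(PSTACK_OUTPUT)))
```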

This is incredibly useful as a first step for figuring out what a program is doing when it’s slow or not responsive.

pfiles shows you what file descriptors a process has open, similar to “lsof” on Linux systems, but for a specific process:

$ pfiles 32113
32113:      /usr/local/bin/node /home/snpp/current/js/snpp.js -l 80 -d
  Current rlimit: 1024 file descriptors
   0: S_IFCHR mode:0666 dev:527,6 ino:2848424755 uid:0 gid:3 rdev:38,2
      O_RDONLY|O_LARGEFILE
      /dev/null
      offset:0
   1: S_IFREG mode:0644 dev:90,65565 ino:38817 uid:0 gid:0 size:793928
      O_WRONLY|O_APPEND|O_CREAT|O_LARGEFILE
      /var/svc/log/application-snpp:default.log
      offset:793928
   2: S_IFREG mode:0644 dev:90,65565 ino:38817 uid:0 gid:0 size:793928
      O_WRONLY|O_APPEND|O_CREAT|O_LARGEFILE
      /var/svc/log/application-snpp:default.log
      offset:793928
   3: S_IFPORT mode:0000 dev:537,0 uid:0 gid:0 size:0
   4: S_IFIFO mode:0000 dev:524,0 ino:6257976 uid:0 gid:0 size:0
      O_RDWR|O_NONBLOCK FD_CLOEXEC
   5: S_IFIFO mode:0000 dev:524,0 ino:6257976 uid:0 gid:0 size:0
      O_RDWR|O_NONBLOCK FD_CLOEXEC
   6: S_IFSOCK mode:0666 dev:534,0 ino:23280 uid:0 gid:0 size:0
      O_RDWR|O_NONBLOCK FD_CLOEXEC
    SOCK_STREAM
    SO_REUSEADDR,SO_SNDBUF(49152),SO_RCVBUF(128000)
    sockname: AF_INET 0.0.0.0  port: 80
   7: S_IFREG mode:0644 dev:90,65565 ino:91494 uid:0 gid:0 size:6999682
      O_RDONLY|O_LARGEFILE
      /home/snpp/data/0f0f2418d7967332caf0425cc5f31867.webm
      offset:2334720

This includes details on files (including offset, which is great for checking on programs that scan through large files) and sockets.
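To turn those offsets into a rough progress estimate, you can divide each regular file’s offset by its size. Here’s a Python sketch that assumes the pfiles layout shown above (a numbered “fd: S_IF...” line carrying size:, followed by flag, path, and offset: lines); the sample is an excerpt of that output and the helpers are hypothetical.

```python
import re

# An excerpt of the pfiles output above (file descriptors 1 and 7).
PFILES_OUTPUT = """\
   1: S_IFREG mode:0644 dev:90,65565 ino:38817 uid:0 gid:0 size:793928
      O_WRONLY|O_APPEND|O_CREAT|O_LARGEFILE
      /var/svc/log/application-snpp:default.log
      offset:793928
   7: S_IFREG mode:0644 dev:90,65565 ino:91494 uid:0 gid:0 size:6999682
      O_RDONLY|O_LARGEFILE
      /home/snpp/data/0f0f2418d7967332caf0425cc5f31867.webm
      offset:2334720
"""

FD_RE = re.compile(r"^\s*(\d+): (S_IF\w+)")
SIZE_RE = re.compile(r"\bsize:(\d+)")
OFFSET_RE = re.compile(r"^\s*offset:(\d+)")


def parse_pfiles(text):
    """Map fd -> {"type", "size", "path", "offset"} (missing fields stay None)."""
    fds, cur = {}, None
    for line in text.splitlines():
        m = FD_RE.match(line)
        if m:  # start of a new descriptor entry
            size = SIZE_RE.search(line)
            cur = {"type": m.group(2), "size": int(size.group(1)) if size else None,
                   "path": None, "offset": None}
            fds[int(m.group(1))] = cur
        elif cur is not None:
            off = OFFSET_RE.match(line)
            if off:
                cur["offset"] = int(off.group(1))
            elif line.strip().startswith("/"):
                cur["path"] = line.strip()
    return fds


def progress(fds):
    """Yield (fd, path, percent of the file before the offset) for regular files."""
    for fd, info in sorted(fds.items()):
        if info["type"] == "S_IFREG" and info["size"] and info["offset"] is not None:
            yield fd, info["path"], 100.0 * info["offset"] / info["size"]


for fd, path, pct in progress(parse_pfiles(PFILES_OUTPUT)):
    print(f"{fd:>3} {pct:5.1f}% {path}")
```

For a file opened with O_APPEND, like the log on descriptor 1, the offset simply tracks the end of the file, so 100% there says nothing about progress.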

ptree shows you a process tree for the whole system or for a given process or user. This is great for programs that use lots of processes (like a build):

$ ptree $(pgrep -ox make)
4599  zsched
  6720  /usr/lib/ssh/sshd
    45902 /usr/lib/ssh/sshd
      45903 /usr/lib/ssh/sshd
        45906 -bash
          54464 make -j4
            54528 make -C out BUILDTYPE=Release
              55718 cc -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64 -DL_ENDIAN -DOPENS
                55719 /opt/local/libexec/gcc/i386-pc-solaris2.11/4.6.2/cc1 -quiet -I
              55757 cc -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64 -DL_ENDIAN -DOPENS
                55758 /opt/local/libexec/gcc/i386-pc-solaris2.11/4.6.2/cc1 -quiet -I
              55769 sed -e s|^bf_null.o|/home/dap/node/out/Release/obj.target/openss
              55771 /bin/sh -c sed -e "s|^bf_nbio.o|/home/dap/node/out/Release/obj.t
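If you want to answer questions like “which make spawned this compiler?” from a script, the parent links can be rebuilt from ptree’s indentation. This Python sketch assumes each process is indented further than its parent, as in the output above (copied verbatim, including ptree’s truncation); in practice, ps -e -o pid,ppid,args gives the same parent links more directly. The helper names are hypothetical.

```python
# The ptree output above, copied verbatim (ptree truncates long command lines).
PTREE_OUTPUT = """\
4599  zsched
  6720  /usr/lib/ssh/sshd
    45902 /usr/lib/ssh/sshd
      45903 /usr/lib/ssh/sshd
        45906 -bash
          54464 make -j4
            54528 make -C out BUILDTYPE=Release
              55718 cc -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64 -DL_ENDIAN -DOPENS
                55719 /opt/local/libexec/gcc/i386-pc-solaris2.11/4.6.2/cc1 -quiet -I
              55757 cc -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64 -DL_ENDIAN -DOPENS
                55758 /opt/local/libexec/gcc/i386-pc-solaris2.11/4.6.2/cc1 -quiet -I
              55769 sed -e s|^bf_null.o|/home/dap/node/out/Release/obj.target/openss
              55771 /bin/sh -c sed -e "s|^bf_nbio.o|/home/dap/node/out/Release/obj.t
"""


def parse_ptree(text):
    """Map pid -> (parent pid or None, command), using indentation to find parents."""
    procs, stack = {}, []  # stack holds (indent, pid) for the current ancestor chain
    for line in text.splitlines():
        if not line.strip():
            continue
        indent = len(line) - len(line.lstrip(" "))
        pid_str, _, cmd = line.strip().partition(" ")
        while stack and stack[-1][0] >= indent:
            stack.pop()  # drop siblings and their descendants
        pid = int(pid_str)
        procs[pid] = (stack[-1][1] if stack else None, cmd.strip())
        stack.append((indent, pid))
    return procs


def ancestors(pid, procs):
    """Return the chain of pids from the root down to pid."""
    chain = []
    while pid is not None:
        chain.append(pid)
        pid = procs[pid][0]
    return chain[::-1]


def children(pid, procs):
    """Return the direct children of pid, in ptree order."""
    return [p for p, (parent, _) in procs.items() if parent == pid]


procs = parse_ptree(PTREE_OUTPUT)
print(ancestors(55719, procs))  # [4599, 6720, 45902, 45903, 45906, 54464, 54528, 55718, 55719]
print(children(54528, procs))   # [55718, 55757, 55769, 55771]
```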

Here’s a summary of these and several other useful ptools:

Some of these tools (including pfiles and pstack) will briefly pause the process to gather their data. For example, “pfiles” can take several seconds if there are many file descriptors open.

For details on these and a few others, check their man pages; most of them are documented in proc(1).

Core files

Many of the proc tools operate on core files just as well as on live processes. Core files are created when a process exits abnormally, such as via abort(3C) or a SIGSEGV, but you can also create one on demand with gcore:

$ gcore 45906
gcore: core.45906 dumped

$ pstack core.45906
core 'core.45906' of 45906:     -bash
 fee647f5 waitid   (7, 0, 8047760, f)
 fee00045 waitpid  (ffffffff, 8047838, c, 108a7, 3, 8047850) + 65
 0808f4c3 waitchld (0, 0, 0, 0, 20000, 0) + 87
 0808ffc6 wait_for (108a7, 0, 813c128, 3e, 330000, 78) + 2ce
 08082ee8 execute_command_internal (813b348, 0, ffffffff, ffffffff, 813c128) + 1758
 08083d3d execute_command (813b348, 1, 8047b58, 8071a7d, 0, 0) + 45
 08071c18 reader_loop (fed90b2c, 80663dd, 8047c34, fed90dc8, 8069380, 0) + 240
 080708e3 main     (1, 8047dfc, 8047e04, 80eb9f0, 0, 0) + aff
 0806f32b _start   (1, 8047ea4, 0, 8047eaa, 8047eb3, 8047ebf) + 83

Lazy tracing of system calls

DTrace can trace system calls across the system with minimal impact, but for cases where the overhead is not important and you only care about one process, truss can be a convenient tool because it decodes arguments and return values for you:

$ truss -p 3135
sysconfig(_CONFIG_PAGESIZE)                     = 4096
ioctl(1, TCGETA, 0x080479F0)                    = 0
ioctl(1, TIOCGWINSZ, 0x08047B88)                = 0
brk(0x08086CA8)                                 = 0
brk(0x0808ACA8)                                 = 0
open(".", O_RDONLY|O_NDELAY|O_LARGEFILE)        = 3
fcntl(3, F_SETFD, 0x00000001)                   = 0
fstat64(3, 0x08047940)                          = 0
getdents64(3, 0xFEC84000, 8192)                 = 720
getdents64(3, 0xFEC84000, 8192)                 = 0

When debugging path-related issues (like why Node.js can’t find the module you’re requiring), it’s often useful to trace just calls to “open” and “stat” with “truss -topen,stat”. This is also good for watching commands that traverse a directory tree, like “tar” or “find”.
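A saved truss log (truss -o writes the trace to a file) can also be filtered after the fact. This Python sketch assumes successful calls look like “name(args) = result”, as above, and that failed calls end in an “Err#N ERRNAME” field instead; the final stat64 line is a hypothetical failed lookup added for illustration, and the helpers are hypothetical too. Calls in 32-bit processes can carry a 64 suffix, as fstat64 and getdents64 do above, so the filter lists both forms.

```python
import re

# The truss output above, plus one hypothetical failed call at the end.
TRUSS_OUTPUT = """\
sysconfig(_CONFIG_PAGESIZE)                     = 4096
ioctl(1, TCGETA, 0x080479F0)                    = 0
ioctl(1, TIOCGWINSZ, 0x08047B88)                = 0
brk(0x08086CA8)                                 = 0
brk(0x0808ACA8)                                 = 0
open(".", O_RDONLY|O_NDELAY|O_LARGEFILE)        = 3
fcntl(3, F_SETFD, 0x00000001)                   = 0
fstat64(3, 0x08047940)                          = 0
getdents64(3, 0xFEC84000, 8192)                 = 720
getdents64(3, 0xFEC84000, 8192)                 = 0
stat64("/home/dap/node_modules/foo", 0x08047A10) Err#2 ENOENT
"""

# "name(args) = result" for successes, "name(args) Err#N NAME" for failures.
CALL_RE = re.compile(r"^(\w+)\((.*)\)\s+(?:=\s+(\S+)|(Err#\d+(?:\s+\w+)?))\s*$")


def parse_truss(text):
    """Return (syscall, args, result) tuples for each recognized line."""
    calls = []
    for line in text.splitlines():
        m = CALL_RE.match(line)
        if m:
            calls.append((m.group(1), m.group(2), m.group(3) or m.group(4)))
    return calls


def only(calls, names):
    """Keep calls whose syscall name is in names, similar in spirit to truss -t."""
    return [c for c in calls if c[0] in names]


for name, args, result in only(parse_truss(TRUSS_OUTPUT), {"open", "open64", "stat", "stat64"}):
    print(f"{name}({args}) -> {result}")
```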

DTrace and MDB

I mention DTrace and MDB last, but they’re the most comprehensive, most powerful tools in the system for understanding program behavior. The tools described above are simpler and present the most commonly useful information (e.g., process arguments or open file descriptors), but when you need to get arbitrary information about the system, these two are the tools to use.

DTrace is a comprehensive tracing framework for both the kernel and userland apps. It’s designed to be safe, to have zero overhead when not enabled, and to minimize overhead when enabled. DTrace has hundreds of thousands of probes at the kernel level, including system calls (system-wide), the scheduler, the I/O subsystem, ZFS, process execution, signals, and most function entry/exit points in the kernel. In userland, DTrace instruments function entry and exit points, individual instructions, and arbitrary probes added by application developers. At each of these instrumentation points, you can gather information like the currently running process, a kernel or userland stack backtrace, function arguments, or anything else in memory. To get started, I’d recommend Adam Leventhal’s DTrace boot camp slides. (The context and instructions for setup are a little dated, but the bulk of the content is still accurate.)

MDB is the modular debugger. Like GDB on other platforms, it’s most useful for deep inspection of a snapshot of program state. That can be a userland program or the kernel itself, and in both cases you can open a core dump (a crash dump, for the kernel) or attach to the running program (or the running kernel). As you’d expect, MDB lets you examine the stack, global variables, threads, and so on. The syntax is a little arcane, but the model is Unixy, allowing debugger commands to be strung together much like a shell pipeline. Eric Schrock has two excellent posts for people moving from GDB to MDB.

Let me know if I’ve missed any of the big ones. I’ll be writing a few more posts on tools in other areas of the system.

Posted on August 4, 2012 at 12:16 pm by dap
In: SmartOS

OSCON Slides

Thanks to all who attended my talk at OSCON on Node.js in production: postmortem debugging and performance analysis. Just a reminder: all of the technology I described is open source, most of it part of illumos. For more info, check out the links in slide 22.

For the curious, there are also some slides on implementation details I didn’t have time to cover.

Posted on July 19, 2012 at 12:32 pm by dap
In: Joyent, Node.js, SmartOS

NodeConf slides

NodeConf was a great success this year. Thanks to @mikeal for organizing and everyone who spoke and attended. The slides from my talk on DTrace, Node.js, and Flame Graphs are here.

Posted on July 5, 2012 at 12:05 pm by dap
In: DTrace, Node.js

ACM Turing Centenary Celebration

This past weekend, I was very fortunate to have a chance to attend the ACM’s Turing Centenary Celebration in honor of the 100th anniversary of the birth of Alan Turing. The event brought together nearly every living Turing Award winner for a series of talks and panel discussions on subjects like AI, theory of computation, computer architecture, and the role of computation in other fields. A webcast of the entire event is already available online. Below are some of my own notes on the conference.

For many of us in the audience, this weekend was about seeing in person the people we’d learned so much from both academically and professionally. This awe-inspiring feeling was best articulated by Vint Cerf in closing yesterday: “it’s like the history books opened up and the people walked out”. For me, by far, the highlight was the panel on Computer Architecture, moderated by David Patterson (yes, that Patterson) and featuring Frederick Brooks (of OS/360 and Mythical Man-Month fame), Ivan Sutherland, and Chuck Thacker. More than in the other panels, I found all of the speakers’ prepared remarks accessible (perhaps because my work and interests most closely align with theirs) and at the same time very instructive. Sutherland began with the “Tyranny of the Clock”, an eloquent articulation of an important barrier in modern chip design and a call to action for designers and researchers. Then, in a sort of reverential but thoughtful engineering-style postmortem, Brooks discussed why the machine that Turing actually built, unlike so much of his other work, was not very influential. Thacker discussed the nature of computer architecture research and the modern developments that have made it more accessible for students today. In the subsequent discussion, Patterson referenced a prophetic quote by Maurice Wilkes at the dawn of modern computing (that Bryan also cited in his QCon talk last year) in which Wilkes suddenly “realized that a large part of my life from then on was going to be spent in finding mistakes in my own programs”.

Complexity

Complexity was a prominent theme in several sessions. Ken Thompson expressed his disappointment at the complexity of modern Linux, pointing to its 1000+ syscalls compared to the original Unix’s 200. In his prepared remarks, he also shared some grim reflections on how Turing would feel about the state of computing today: Windows, phishing, botnets, C++, and so on. He compared his feeling to that of an early television pioneer visiting our time and seeing people watching Maury Povich. On a more positive note in the same session, Fernando Corbato (leader of the Multics project) gave a brief but fascinating account of what it was like to work on computers in the early days. He actually called Unix one of the greatest results of Multics, Unix being a “distillation” of the better ideas of Multics without all the complexity. (It’s well worth watching Fernando’s and Ken’s remarks from the “Systems Architecture, Design, Engineering, and Verification” session.) Later, Thacker too called complexity “the enemy”, suggesting that we seriously reconsider many of the held-over assumptions in today’s systems that are costing us enormously. (I’m sure that’s a good idea, and I’d have loved to hear some examples of what he had in mind.)

In the Programming Languages panel, Barbara Liskov lamented that the languages people use today for production systems are pretty complex for introducing new students to computer programming, but also admitted that building languages simple enough to be thoroughly understood in an introductory course and rich enough to support what professional engineers want is a serious challenge. She suggested starting from scratch with only the essential language features of modularity and encapsulation. In the same session, Niklaus Wirth (in an entertaining, light presentation) explained how he sought to design languages that were both simpler and more powerful than their contemporaries: these are not opposing goals. All of the participants agreed that in practice, most popular languages seem to accrete lots of cruft from small changes that seem good at the time but together contribute to an overly complex system.

Lucky or good?

Another theme that came up quite a bit was the role of luck in the speakers’ success. Many of them attributed their success to luck, leaving it there, but I liked Dahlia Malkhi’s reference to a golfer who hit a hole-in-one. He was asked if that was a result of luck or training, and replied that he was lucky, but he had to train a lot in order to get that lucky.

Beware elegance?

Several speakers (notably Butler Lampson and Joseph Sifakis) mentioned that they tend to be suspicious of clean, elegant solutions to problems, because such solutions often don’t work well in the real world. I’d never heard it put so generally, especially by leaders in the field, as that goes against a common intuition among mathy people that’s usually nurtured as part of our education. (That intuition is still a good thing; as Einstein famously said, it’s important to strive for simplicity, but be careful not to go too far.) In fact, Sifakis attributed the lack of serious work in rigorous system design to researchers’ preference for nice theories, even if those theories don’t match reality. (While probably a factor, this explanation seems to leave out the economic cost of such rigor as an important reason why many systems today aren’t built the way he suggests.)

System verification

In the Systems Architecture and Verification session, Vint Cerf noted that automatic verifiers don’t seem to work well for many types of systems we build and asked Sifakis and E. Allen Emerson whether there existed interactive tools that would help programmers test assertions about their systems, rather than automatically trying to verify the whole thing. Emerson pointed out that this is called semi-automatic verification, but still seemed more interested in the fully-automatic kind. Vint’s idea made me think of a sort of extensible lint, since lint is already an admittedly limited tool for checking a fixed set of assertions about a program. But despite its limits, lint is incredibly useful (at least in languages like C and JavaScript) for rooting out large classes of bugs, and it would be interesting to think about a more interactive workflow that would free the tool from having to report only things it knows are problems. (People generally won’t use an error-checking tool that reports many false positives, but they might use a tool that can evaluate static assertions about their code in a less rigid context.)

“What”-based programming

Alan Kay and others talked about the idea of “what”-based programming, rather than the “how”-based approaches we typically use today. The idea is that humans tell the computer what to do, and some engine under the hood figures out how to do it. Kay demonstrated a graphical environment based on this idea, and then wondered why we couldn’t build more complex systems (including operating systems) that way. Bill, Robert, and I tried for a while to imagine what this would look like. On the one hand, many classes of device drivers are similar enough that you could imagine modeling some pieces of them with a declarative “what”-based description, but interaction with physical devices often requires precise sequences of register reads and writes, and it’s hard to imagine encoding that without essentially describing “how”. Achieving good performance may also be challenging, since humans writing code for such systems today necessarily describe how to organize them to be fast. And if you could solve this for something as constrained as device drivers, how could you generalize it to the scheduler or VM system without encoding detailed knowledge into the engine that actually translates the “what” to the “how”? You could also imagine that debugging such systems would be very difficult. Still, I found the idea compelling, because there are many cases where we do build “what”-based descriptions, and the result is that it’s much easier to verify both that the description does what the human wants it to do and that the “what”-to-“how” system properly translates it (e.g., Meta-D, a Cloud Analytics module that describes a family of DTrace scripts declaratively, or even the D language itself). It would be interesting to hear from Alan Kay what he was thinking in posing this question.

Computation in biology and physics

I was especially intrigued by Leonard Adleman’s remarks during the “Algorithmic View of the Universe” panel, in which he talked about vastly different notions of computation and how results based on the Turing model, plus the Church-Turing thesis, can inform physics and biology. He discussed protein folding in the cell as a form of computation, and what implications that has for biologists trying to understand cellular processes. Later he wondered what implications the proven constraints of the Turing model, taken as physical laws, would have for quantum mechanics (e.g., that certain types of time travel allowed by QM must actually be impossible).

These were just a few of the bits I found most interesting, but the whole weekend was a humbling experience. Besides being able to see so many important figures in the field, it was a good opportunity to step outside the confines of day-to-day engineering, which for me tends toward a time horizon of a few years. And most of the talks provoked interesting discussions. So thanks to all the speakers, and to the ACM for putting together the event and making the video available.

Posted on June 17, 2012 at 3:04 pm by dap
In: Uncategorized