In this short follow-up to my post on illumos process tools, I’ll expand a bit on ldd and pldd, which print the dynamic linking dependencies of binaries and processes, respectively, and crle, which prints out the runtime linker configuration. These tools are available in most illumos distributions including SmartOS.
Understanding builds (and broken builds in particular) can be especially difficult. I hate running into issues like this one:
$ ffmpeg
ld.so.1: ffmpeg: fatal: libavdevice.so.53: open failed: No such file or directory
Killed
You can use ldd to see the dynamic library dependencies of a binary:
$ ldd $(which ffmpeg)
        libavdevice.so.53 =>     (file not found)
        libavfilter.so.2 =>      (file not found)
        libavformat.so.53 =>     (file not found)
        libavcodec.so.53 =>      (file not found)
        libswresample.so.0 =>    (file not found)
        libswscale.so.2 =>       (file not found)
        libavutil.so.51 =>       (file not found)
        libsocket.so.1 =>        /lib/libsocket.so.1
        libnsl.so.1 =>           /lib/libnsl.so.1
        libvpx.so.0 =>           /opt/local/lib/libvpx.so.0
        libm.so.2 =>             /lib/libm.so.2
        libbz2.so.0 =>           /opt/local/lib/libbz2.so.0
        libz.so.1 =>             /lib/libz.so.1
        libc.so.1 =>             /lib/libc.so.1
        libmp.so.2 =>            /lib/libmp.so.2
        libmd.so.1 =>            /lib/libmd.so.1
        libpthread.so.1 =>       /lib/libpthread.so.1
        librt.so.1 =>            /lib/librt.so.1
        libgcc_s.so.1 =>         /opt/local/lib/libgcc_s.so.1
In this case, the problem is that I installed ffmpeg into /usr/local, but the ffmpeg build appears not to have used the -R linker flag, which tells the runtime linker where to look for dynamic libraries when the program is loaded. As a result, ffmpeg doesn’t know where to find its own libraries. If I set LD_LIBRARY_PATH, I can see that it will work:
$ LD_LIBRARY_PATH=/usr/local/lib ldd $(which ffmpeg)
        libavdevice.so.53 =>     /usr/local/lib/libavdevice.so.53
        libavfilter.so.2 =>      /usr/local/lib/libavfilter.so.2
        libavformat.so.53 =>     /usr/local/lib/libavformat.so.53
        libavcodec.so.53 =>      /usr/local/lib/libavcodec.so.53
        libswresample.so.0 =>    /usr/local/lib/libswresample.so.0
        libswscale.so.2 =>       /usr/local/lib/libswscale.so.2
        libavutil.so.51 =>       /usr/local/lib/libavutil.so.51
        ...
I resolved this by rebuilding ffmpeg explicitly with:

LDFLAGS += -R/usr/local
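If you want to double-check that the runpath actually made it into the rebuilt binary, one way (a sketch on my part, not part of the original build) is to dump the dynamic section with elfdump and look for an RPATH/RUNPATH entry naming the library directory:

$ elfdump -d $(which ffmpeg) | grep PATH    # expect RPATH/RUNPATH entries pointing at the library directory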
ldd only examines binaries, and so can only print out dependencies built into the binary. Some programs use dlopen to open libraries whose names aren’t known until runtime. Node.js add-ons and Apache modules are two common examples. You can view these with pldd, which prints the dynamic libraries loaded in a running process. Here’s the output on a Node program with the node-dtrace-provider add-on:
$ pfexec pldd $(pgrep -x node)
32113:  /usr/local/bin/node /home/snpp/current/js/snpp.js -l 80 -d
/usr/local/bin/node
/lib/libz.so.1
/lib/librt.so.1
/lib/libssl.so.0.9.8
/lib/libcrypto.so.0.9.8
/lib/libdl.so.1
/lib/libsocket.so.1
/lib/libnsl.so.1
/lib/libkstat.so.1
/opt/local/lib/libstdc++.so.6.0.16
/lib/libm.so.2
/opt/local/lib/libgcc_s.so.1
/lib/libc.so.1
/usr/sfw/lib/libgcc_s.so.1
/home/dap/node_modules/dtrace-provider/build/Release/DTraceProviderBindings.node
If you want to see where the system looks for dynamic libraries, use crle, which prints or edits the runtime linker configuration:
$ crle

Configuration file [version 4]: /var/ld/ld.config
  Platform:     32-bit LSB 80386
  Default Library Path (ELF):   /lib:/usr/lib:/opt/local/lib
  Trusted Directories (ELF):    /lib/secure:/usr/lib/secure  (system default)

Command line:
  crle -c /var/ld/ld.config -l /lib:/usr/lib:/opt/local/lib
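crle can also update that configuration. As a rough sketch (treat the exact flags as my assumption and check crle(1) before running it, since it rewrites /var/ld/ld.config), appending a directory to the default search path looks something like this:

$ pfexec crle -u -l /usr/local/lib    # append /usr/local/lib to the default ELF library path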
Of course, for more information on any of these tools, check out their man pages. They’re well-documented. If you find yourself debugging build problems, you’ll probably also want to know about nm, objdump, and elfdump, which are available on many systems and well documented elsewhere.
illumos, with Solaris before it, has a history of delivering rich tools for understanding the system, but discovering these tools can be difficult for new users. Sometimes, tools just have different names than people are used to. In many cases, users don’t even know such tools might exist.
In this post I’ll describe some tools I find most useful, both as a developer and an administrator. This is not intended to be a comprehensive reference, but more like part of an orientation for users new to illumos (and SmartOS in particular) but already familiar with other Unix systems. This post will likely be review for veteran illumos and Solaris users.
The proc tools (ptools)
The ptools are a family of tools that observe processes running on the system. The most useful of these are pgrep, pstack, pfiles, and ptree.
pgrep searches for processes, returning a list of process ids. Here are some common example invocations:
$ pgrep mysql       # print all processes with "mysql" in the name
                    # (e.g., "mysql" and "mysqld")
$ pgrep -x mysql    # print all processes whose name is exactly "mysql"
                    # (i.e., not "mysqld")
$ pgrep -ox mysql   # print the oldest mysql process
$ pgrep -nx mysql   # print the newest mysql process
$ pgrep -f mysql    # print processes matching "mysql" anywhere in the name
                    # or arguments (e.g., "vim mysql.conf")
$ pgrep -u dap      # print all of user dap's processes
These options let you match processes very precisely, and they allow scripts to be much more robust than “ps -A | grep foo”.
I often combine pgrep with ps. For example, to see the memory usage of all of my node processes, I use:
$ ps -opid,rss,vsz,args -p "$(pgrep -x node)"
  PID   RSS   VSZ COMMAND
 4914 94380 98036 /usr/local/bin/node demo.js -p 8080
32113 92616 95964 /usr/local/bin/node demo.js -p 80
pkill is just like pgrep, but sends a signal to the matching processes.
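For example, to send SIGHUP to a daemon by its exact name (the daemon name here is just a placeholder):

$ pkill -HUP -x mydaemon    # signal only processes named exactly "mydaemon"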
pstack shows you thread stack traces for the processes you give it:
$ pstack 51862
51862:  find /
 fedd6955 getdents64 (fecb0200, 808ef87, 804728c, fedabd84, 808ef88, 804728c) + 15
 0805ee9c xsavedir (808ef87, 0, 8089a90, 1000000, 0, fee30000) + 7c
 080582dc process_path (808e818, 0, 8089a90, 1000000, 0, fee30000) + 33c
 080583ee process_path (808e410, 0, 8089a90, 1000000, 0, fee30000) + 44e
 080583ee process_path (808e008, 0, 8089a90, 0, 0, fecb2a40) + 44e
 080583ee process_path (8047cbd, 0, 8089a90, 0, fef40c20, fedc78b6) + 44e
 080583ee process_path (8075cd0, 0, 2f, fed59274, 8047b48, 8047cbd) + 44e
 08058931 do_process_top_dir (8047cbd, 8047cbd, 0, 0, 0, 0) + 21
 08057c5e at_top   (8058910, 2f, 8047bb0, 8089a90, 28, 80571f0) + 9e
 08072eda main     (2, 8047bcc, 8047bd8, 80729d0, 0, 0) + 4ea
 08057093 _start   (2, 8047cb8, 8047cbd, 0, 8047cbf, 8047cd3) + 83
This is incredibly useful as a first step for figuring out what a program is doing when it’s slow or not responsive.
pfiles shows you what file descriptors a process has open, similar to “lsof” on Linux systems, but for a specific process:
$ pfiles 32113
32113:  /usr/local/bin/node /home/snpp/current/js/snpp.js -l 80 -d
  Current rlimit: 1024 file descriptors
   0: S_IFCHR mode:0666 dev:527,6 ino:2848424755 uid:0 gid:3 rdev:38,2
      O_RDONLY|O_LARGEFILE
      /dev/null
      offset:0
   1: S_IFREG mode:0644 dev:90,65565 ino:38817 uid:0 gid:0 size:793928
      O_WRONLY|O_APPEND|O_CREAT|O_LARGEFILE
      /var/svc/log/application-snpp:default.log
      offset:793928
   2: S_IFREG mode:0644 dev:90,65565 ino:38817 uid:0 gid:0 size:793928
      O_WRONLY|O_APPEND|O_CREAT|O_LARGEFILE
      /var/svc/log/application-snpp:default.log
      offset:793928
   3: S_IFPORT mode:0000 dev:537,0 uid:0 gid:0 size:0
   4: S_IFIFO mode:0000 dev:524,0 ino:6257976 uid:0 gid:0 size:0
      O_RDWR|O_NONBLOCK FD_CLOEXEC
   5: S_IFIFO mode:0000 dev:524,0 ino:6257976 uid:0 gid:0 size:0
      O_RDWR|O_NONBLOCK FD_CLOEXEC
   6: S_IFSOCK mode:0666 dev:534,0 ino:23280 uid:0 gid:0 size:0
      O_RDWR|O_NONBLOCK FD_CLOEXEC
        SOCK_STREAM
        SO_REUSEADDR,SO_SNDBUF(49152),SO_RCVBUF(128000)
        sockname: AF_INET 0.0.0.0  port: 80
   7: S_IFREG mode:0644 dev:90,65565 ino:91494 uid:0 gid:0 size:6999682
      O_RDONLY|O_LARGEFILE
      /home/snpp/data/0f0f2418d7967332caf0425cc5f31867.webm
      offset:2334720
This includes details on files (including offset, which is great for checking on programs that scan through large files) and sockets.
ptree shows you a process tree for the whole system or for a given process or user. This is great for programs that use lots of processes (like a build):
$ ptree $(pgrep -ox make)
4599  zsched
  6720  /usr/lib/ssh/sshd
    45902 /usr/lib/ssh/sshd
      45903 /usr/lib/ssh/sshd
        45906 -bash
          54464 make -j4
            54528 make -C out BUILDTYPE=Release
              55718 cc -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64 -DL_ENDIAN -DOPENS
                55719 /opt/local/libexec/gcc/i386-pc-solaris2.11/4.6.2/cc1 -quiet -I
              55757 cc -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64 -DL_ENDIAN -DOPENS
                55758 /opt/local/libexec/gcc/i386-pc-solaris2.11/4.6.2/cc1 -quiet -I
              55769 sed -e s|^bf_null.o|/home/dap/node/out/Release/obj.target/openss
              55771 /bin/sh -c sed -e "s|^bf_nbio.o|/home/dap/node/out/Release/obj.t
Here’s a summary of these and several other useful ptools:
- pgrep/pkill: search processes (and signal them)
- pstack: print thread stack traces
- ptree: print process tree
- pargs [-e]: print process arguments (and environment variables)
- pmap: print process virtual address mappings
- pwdx: print a process’s working directory
- pstop: stop a process (as a debugger would — useful for testing what happens when a process hangs or otherwise gets delayed)
- prun: run a stopped process
- plockstat: print lock statistics for a process
- psig: print a process’s signal dispositions
- pwait: wait for a process to terminate
- ptime: print detailed timing stats for a process
- pldd: print dynamic libraries for a process
- fuser: show which processes have a given file open (not technically a ptool, but useful nonetheless)
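A quick sketch of a few of these in action (the pid is the node process from earlier; output omitted):

$ pargs -e 32113            # arguments plus environment variables
$ pmap -x 32113             # address space mappings with an RSS breakdown
$ pwdx 32113                # current working directory
$ ptime sort /etc/passwd    # timing statistics for a one-off command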
Some of these tools (including pfiles and pstack) will briefly pause the process to gather their data. For example, “pfiles” can take several seconds if there are many file descriptors open.
For details on these and a few others, check their man pages, most of which are in proc(1).
Many of the proc tools operate on core files just as well as live processes. Core files are created when a process exits abnormally, as via abort(3C) or a SIGSEGV. But you can also create one on-demand with gcore:
$ gcore 45906
gcore: core.45906 dumped

$ pstack core.45906
core 'core.45906' of 45906:     -bash
 fee647f5 waitid   (7, 0, 8047760, f)
 fee00045 waitpid  (ffffffff, 8047838, c, 108a7, 3, 8047850) + 65
 0808f4c3 waitchld (0, 0, 0, 0, 20000, 0) + 87
 0808ffc6 wait_for (108a7, 0, 813c128, 3e, 330000, 78) + 2ce
 08082ee8 execute_command_internal (813b348, 0, ffffffff, ffffffff, 813c128) + 1758
 08083d3d execute_command (813b348, 1, 8047b58, 8071a7d, 0, 0) + 45
 08071c18 reader_loop (fed90b2c, 80663dd, 8047c34, fed90dc8, 8069380, 0) + 240
 080708e3 main     (1, 8047dfc, 8047e04, 80eb9f0, 0, 0) + aff
 0806f32b _start   (1, 8047ea4, 0, 8047eaa, 8047eb3, 8047ebf) + 83
Lazy tracing of system calls
DTrace can trace system calls across the system with minimal impact, but for cases where the overhead is not important and you only care about one process, truss can be a convenient tool because it decodes arguments and return values for you:
$ truss -p 3135
sysconfig(_CONFIG_PAGESIZE)                     = 4096
ioctl(1, TCGETA, 0x080479F0)                    = 0
ioctl(1, TIOCGWINSZ, 0x08047B88)                = 0
brk(0x08086CA8)                                 = 0
brk(0x0808ACA8)                                 = 0
open(".", O_RDONLY|O_NDELAY|O_LARGEFILE)        = 3
fcntl(3, F_SETFD, 0x00000001)                   = 0
fstat64(3, 0x08047940)                          = 0
getdents64(3, 0xFEC84000, 8192)                 = 720
getdents64(3, 0xFEC84000, 8192)                 = 0
When debugging path-related issues (like why Node.js can’t find the module you’re requiring), it’s often useful to trace just calls to “open” and “stat” with “truss -topen,stat”. This is also good for watching commands that traverse a directory tree, like “tar” or “find”.
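A sketch of what that looks like for a hypothetical Node invocation (app.js is just a placeholder):

$ truss -f -topen,stat node app.js    # trace only open and stat calls, following child processes (-f)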
DTrace and MDB
I mention DTrace and MDB last, but they’re the most comprehensive, most powerful tools in the system for understanding program behavior. The tools described above are simpler and present the most commonly useful information (e.g., process arguments or open file descriptors), but when you need to get arbitrary information about the system, these two are the tools to use.
DTrace is a comprehensive tracing framework for both the kernel and userland apps. It’s designed to be safe, to have zero overhead when not enabled, and to minimize overhead when enabled. DTrace has hundreds of thousands of probes at the kernel level, including system calls (system-wide), the scheduler, the I/O subsystem, ZFS, process execution, signals, and most function entry/exit points in the kernel. In userland, DTrace instruments function entry and exit points, individual instructions, and arbitrary probes added by application developers. At each of these instrumentation points, you can gather information like the currently running process, a kernel or userland stack backtrace, function arguments, or anything else in memory. To get started, I’d recommend Adam Leventhal’s DTrace boot camp slides. (The context and instructions for setup are a little dated, but the bulk of the content is still accurate.)
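For a quick taste of the syntax, here’s a classic one-liner (my example, not from the slides) that counts system calls by process name until you interrupt it:

$ pfexec dtrace -n 'syscall:::entry { @[execname] = count(); }'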
MDB is the modular debugger. Like GDB on other platforms, it’s most useful for deep inspection of a snapshot of program state. That can be a userland program or the kernel itself, and in both cases you can open a core dump (a crash dump, in the kernel case) or attach to the running program or kernel. As you’d expect, MDB lets you examine the stack, global variables, threads, and so on. The syntax is a little arcane, but the model is Unixy, allowing debugger commands to be strung together much like a shell pipeline. Eric Schrock has two excellent posts for people moving from GDB to MDB.
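As a minimal sketch using the core file generated earlier, you could open it in MDB, print a status summary, and get a stack trace:

$ mdb core.45906
> ::status    # summary of the process the core came from
> $C          # stack backtrace with frame pointers
> ::quit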
Let me know if I’ve missed any of the big ones. I’ll be writing a few more posts on tools in other areas of the system.
Thanks to all who attended my talk at OSCON on Node.js in production: postmortem debugging and performance analysis. Just a reminder: all of the technology I described is open source, most of it part of illumos. For more info, check out the links in slide 22.
For the curious, there are also some slides on implementation details I didn’t have time to cover.
NodeConf was a great success this year. Thanks to @mikeal for organizing and everyone who spoke and attended. The slides from my talk on DTrace, Node.js, and Flame Graphs are here.
This past weekend, I was very fortunate to have a chance to attend the ACM’s Turing Centenary Celebration in honor of the 100th anniversary of the birth of Alan Turing. The event brought together nearly every living Turing Award winner for a series of talks and panel discussions on subjects like AI, theory of computation, computer architecture, and the role of computation in other fields. A webcast of the entire event is already available online. Below are some of my own notes on the conference.
For many of us in the audience, this weekend was about seeing in person the people we’d learned so much from both academically and professionally. This awe-inspiring feeling was best articulated by Vint Cerf in closing yesterday: “it’s like the history books opened up and the people walked out”. For me, by far, the highlight was the panel on Computer Architecture, moderated by David Patterson (yes, that Patterson) and featuring Frederick Brooks (of OS/360 and Mythical Man-Month fame), Ivan Sutherland, and Chuck Thacker. More than the other panels, I found all of the speakers’ prepared remarks accessible (perhaps because my work and interests most closely align with theirs), but at the same time very instructional. Sutherland began with the “Tyranny of the Clock”, an eloquent articulation of an important barrier in modern chip design and a call to action for designers and researchers. Then, in a sort of reverential but thoughtful engineering-style postmortem, Brooks discussed why the machine that Turing actually built, unlike so much of his other work, was not very influential. Thacker discussed the nature of computer architecture research and the modern developments that have made it more accessible for students today. In the subsequent discussion, Patterson referenced a prophetic quote by Maurice Wilkes at the dawn of modern computing (that Bryan also cited in his QCon talk last year) in which Wilkes suddenly “realized that a large part of my life from then on was going to be spent in finding mistakes in my own programs”.
Complexity was a prominent theme in several sessions. Ken Thompson expressed his disappointment at the complexity of modern Linux, pointing to its 1000+ syscalls compared to the original Unix’s 200. In his prepared remarks, he also shared some grim reflections on how Turing would feel about the state of computing today: Windows, phishing, botnets, C++, and so on. He compared his feeling to that of an early television pioneer visiting our time and seeing people watching Maury Povich. On a more positive note in the same session, Fernando Corbato (leader on Multics) gave a brief but fascinating account of what it was like to work on computers in the early days. He actually called Unix one of the greatest results of Multics, Unix being a “distillation” of the better ideas of Multics without all the complexity. (It’s well worth watching Fernando’s and Ken’s remarks from the “Systems Architecture, Design, Engineering, and Verification” session.) Later, Thacker too called complexity “the enemy”, suggesting that we seriously reconsider many of the held-over assumptions in today’s systems that are costing us enormously. (I’m sure that’s a good idea, and I’d have loved to hear some examples of these things that he had in mind.)
In the Programming Languages panel, Barbara Liskov lamented that the languages people use today for production systems are pretty complex for introducing new students to computer programming, but also admitted that building languages simple enough to be thoroughly understood in an introductory course and rich enough to support what professional engineers want is a serious challenge. She suggested starting from scratch with only the essential language features of modularity and encapsulation. In the same session, Niklaus Wirth (in an entertaining, light presentation) explained how he sought to design languages that were both simpler and more powerful than their contemporaries — that these are not opposing goals. All of the participants agreed that in practice, most popular languages seem to accrete lots of cruft from small changes that seem good at the time, but contribute to an overly complex system.
Lucky or good?
Another theme that came up quite a bit was the role of luck in the speakers’ success. Many of them attributed their success to luck and left it at that, but I liked Dahlia Malkhi’s reference to a golfer who hit a hole-in-one. He was asked whether that was a result of luck or training, and replied that he was lucky, but he had to train a lot in order to get that lucky.
Several speakers (notably Butler Lampson and Joseph Sifakis) mentioned that they tend to be suspicious of clean, elegant solutions to problems, because such solutions often don’t work well in the real world. I’d never heard it put so generally, especially by leaders in the field, as that goes against a common intuition among mathy people that’s usually nurtured as part of our education. (That’s still a good thing — as Einstein famously said, it’s important to strive for simplicity, but be careful not to go too far.) In fact, Sifakis attributed the lack of serious work in rigorous system design to researchers’ preference for nice theories, even if those theories don’t match reality. (While probably a factor, this explanation seems to leave out the economic cost of such rigor as an important reason why many systems today aren’t built the way he suggests.)
Alan Kay and others talked about the idea of “what”-based programming, rather than the “how”-based approaches we typically use today. The idea is that humans tell the computer what to do, and some engine under the hood figures out how to do it. Kay demonstrated a graphical environment based on this idea, and then wondered why we couldn’t build more complex systems (including operating systems) that way. Bill, Robert, and I tried for a while to imagine what this would look like. On the one hand, many classes of device drivers are similar enough that you could imagine modeling some pieces of them with a declarative “what”-based description, but interaction with physical devices often requires precise sequences of register reads and writes and it’s hard to imagine encoding that without essentially describing “how”. Achieving good performance may be challenging, since humans writing code for such systems today necessarily describe how to organize them to be fast. And if you could solve this for something as constrained as device drivers, how could you generalize it for the schedulers or VM system, without encoding detailed knowledge into the engine that actually translates the “what” to the “how”? You could also imagine that debugging such systems would be very difficult. Still, I found the idea compelling, because there are many cases where we do build “what”-based descriptions and the result is that it’s much easier to verify both that the description does what the human wants it to do and that the “what”-to-”how” system properly translates it (e.g., Meta-D, a Cloud Analytics module that describes a family of DTrace scripts declaratively, or even the D language itself). It would be interesting to hear from Alan Kay what he was thinking in posing this question.
Computation in biology and physics
I was especially intrigued by Leonard Adleman’s remarks during the “Algorithmic View of the Universe” panel, in which he talked about vastly different notions of computation and how results based on the Turing model, plus the Church-Turing thesis, can inform physics and biology. He discussed protein folding in the cell as a form of computation, and what implications that has for biologists trying to understand cellular processes. Later he wondered what implications the proven constraints of the Turing model, taken as physical laws, would have on quantum mechanics (e.g., that certain types of time travel allowed by QM must actually be impossible).
These were just a few of the bits I found most interesting, but the whole weekend was a humbling experience. Besides being able to see so many important figures in the field, it was a good opportunity to step outside the confines of day-to-day engineering, which for me tends toward a time horizon of a few years. And most of the talks provoked interesting discussions. So thanks to all the speakers, and to the ACM for putting together the event and making the video available.