Adam Leventhal's blog


Month: July 2004

go to the Solaris 10 top 11-20 list for more

Bart Smaalders has written some great stuff about event ports including an extensive coding example. Event ports provide a single API for tying together disparate sources of events. We had baby steps in the past with poll(2) and select(3C), but event ports let you monitor file descriptors and timers as well as deal with asynchronous I/O and your own custom events.

Corporate shill that I am, there’s now a little article I wrote on the meet the architects page. The DTrace team already has a column as a group, but in this one I focus on application tracing which was my primary contribution to DTrace.


Here’s a little secret about software development: different groups usually aren’t that good at working with one another. That’s probably not such a shocker for most of you, but the effects can be seen everywhere, and that’s why tight integration can be such a distinguishing feature for a collection of software.

About a year and a half ago, we had the DTrace prototype working on much of the system: from kernel functions, through system calls, to every user-land function and instruction. But we were focused completely on C and C++ based applications and this java thing seemed to be catching on. In a radical move, we worked with some of the java guys to take the first baby step in making DTrace and Solaris’s other observability tools begin to work with java.

ustack() action for java

One of the most powerful features of DTrace is its ability to correlate low-level events in the kernel (disk I/O, scheduler events, networking, etc.) with user-land activity. What application is generating all this I/O to this disk? DTrace makes answering that a snap. But what about when you want to dive deeper? What is that application actually doing to generate all that kernel activity? The ustack() action records the user-land stack backtrace, so even in that prototype over a year ago, you could home in on the problem.

Java, however, was still a mystery. Stacks in C and C++ are fairly easy to record, but in java, some methods are interpreted and just-in-time (JIT) compilation means that other methods can move around in the java virtual machine’s (JVM) address space. DTrace needed help from the JVM. Working with the java guys, we built a facility where the JVM actually contains a little bit of D (DTrace’s C-like language) machinery that knows how to interpret java stacks. We enhanced the ustack() action to take an optional second argument for the number of bytes to record, so when we use the ustack() action in the kernel on a thread in the JVM, that embedded machinery takes over and fills in those bytes with the symbolic interpretation for those methods. (We’ve also recently added the jstack() action; see the DTrace Solaris Express Schedule for when it will be available.) Either Bryan or I will give a more complete (and comprehensible) description in the future, but an example should speak volumes:

# dtrace -n profile-100'/execname == "java"/{ @[ustack(50, 512)] = count() }'
...
java/security/AccessController.doPrivileged
java/net/URLClassLoader.findClass
java/lang/ClassLoader.loadClass
sun/misc/Launcher$AppClassLoader.loadClass
java/lang/ClassLoader.loadClass
java/lang/ClassLoader.loadClassInternal
StubRoutines (1)
...

It seems simple, but there’s a lot of machinery behind this simple view, and this is actually an incredibly powerful and unique view of the system. Maybe you’ve had a java application that generated a lot of I/O or had some unexpected latency — using DTrace and its java-enabled ustack() action, you can finally track the problem down.

pstack(1) for java

While we had the java guys in the room, we couldn’t pass up the opportunity to collaborate on getting stacks working in another observability tool: pstack(1). The pstack(1) utility can print out the stack traces of all the threads in a live process or a core file. We implemented it slightly differently than DTrace’s ustack() action, but pstack(1) now works on java processes and java core files.

Collaboration is a great thing, and I hope you find the fruits of collaborative effort useful. These are just the first steps — we have much more planned for integrating Solaris and DTrace with java.


Eric Schrock has an overview of watchpoints as well as a discussion of the cool improvements he’s made to watchpoints in Solaris 10. Watchpoints have, in the past, been a bit dodgy — they were only vaguely compatible with C++, multi-threaded code, and x86 stacks. Now they’re way more robust and much faster.


pmap(1)

For the uninitiated, pmap(1) is a tool that lets you observe the mappings in a process. Here’s some typical output:

311981: /usr/bin/sh
08046000       8K rw---    [ stack ]
08050000      80K r-x--  /sbin/sh
08074000       4K rwx--  /sbin/sh
08075000      16K rwx--    [ heap ]
C2AB0000      64K rwx--    [ anon ]
C2AD0000     752K r-x--  /lib/libc.so.1
C2B9C000      28K rwx--  /lib/libc.so.1
C2BA3000      16K rwx--  /lib/libc.so.1
C2BB1000       4K rwxs-    [ anon ]
C2BC0000     132K r-x--  /lib/ld.so.1
C2BF1000       4K rwx--  /lib/ld.so.1
C2BF2000       8K rwx--  /lib/ld.so.1
total      1116K

You can use this to understand various addresses you might see from a debugger, or you can use other modes of pmap(1) to see the page sizes being used for various mappings, how much of the mappings have actually been faulted in, the attached ISM, DISM or System V shared memory segments, etc. In Solaris 10, pmap(1) has some cool new features. After a little more thought, I’m not sure that this really belongs on the top 11-20 list, but this is a very cool tool and gets some pretty slick new features; anyways the web affords me the chance for some revisionist history if I feel like updating the list…
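
To make the debugger use case concrete, here is a minimal sketch in C of matching an address against one line of default pmap(1) output. The names parse_pmap_line() and mapping_contains() are made up for this illustration and are not part of any Solaris library, and the sketch assumes the column layout shown in the output above.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* One mapping as it appears in default pmap(1) output. */
struct mapping {
    unsigned long base;     /* starting address */
    unsigned long size_kb;  /* size in kilobytes */
    char perms[8];          /* e.g. "r-x--" */
    char name[128];         /* file name or "[ heap ]"; may be empty */
};

/*
 * Parse a line such as "C2AD0000     752K r-x--  /lib/libc.so.1".
 * Returns 1 if the line describes a mapping, 0 otherwise (for example
 * the header line or the "total" line).
 */
int
parse_pmap_line(const char *line, struct mapping *m)
{
    int n;

    assert(line != NULL && m != NULL);
    memset(m, 0, sizeof (*m));
    n = sscanf(line, "%lx %luK %7s %127[^\n]",
        &m->base, &m->size_kb, m->perms, m->name);
    return (n >= 3);    /* the name column is optional */
}

/* Returns nonzero if addr falls inside the mapping. */
int
mapping_contains(const struct mapping *m, unsigned long addr)
{
    return (addr >= m->base && addr - m->base < m->size_kb * 1024UL);
}
```

Fed the 752K libc line from the output above, the sketch places an address like 0xC2AD1234 inside /lib/libc.so.1, since that mapping runs from C2AD0000 up to (but not including) C2B8C000.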

thread and signal stacks

When a process creates a new thread, that thread needs a stack. By default, that stack comes from an anonymous mapping. Before Solaris 10, those mappings just appeared as [ anon ] — undifferentiated from other anonymous mappings; now we label them as thread stacks:

311992: ./mtpause.x86 2
08046000       8K rwx--    [ stack ]
08050000       4K r-x--  /home/ahl/src/tests/mtpause/mtpause.x86
08060000       4K rwx--  /home/ahl/src/tests/mtpause/mtpause.x86
C294D000       4K rwx-R    [ stack tid=3 ]
C2951000       4K rwxs-    [ anon ]
C2A5D000       4K rwx-R    [ stack tid=2 ]
...

That can be pretty useful if you’re trying to figure out what some address means in a debugger; before you could tell that it was from some anonymous mapping, but what the heck was that mapping all about? Now you can tell at a glance that it’s the stack for a particular thread.

Another kind of stack is the alternate signal stack. Alternate signal stacks let threads handle signals like SIGSEGV which might arise due to a stack overflow of the main stack (leaving no room on that stack for the signal handler). You can establish an alternate signal stack using the sigaltstack(2) interface. If you allocate the stack by creating an anonymous mapping using mmap(2), pmap(1) can now identify the per-thread alternate signal stacks:

...
FEBFA000       8K rwx-R    [ stack tid=8 ]
FEFFA000       8K rwx-R    [ stack tid=4 ]
FF200000      64K rw---    [ altstack tid=8 ]
FF220000      64K rw---    [ altstack tid=4 ]
...

core file content

Core files have always contained a partial snapshot of a process’s memory mappings. Now that you can manually adjust the content of a core file (see my previous entry), some ptools will give you warnings like this:
pargs: core 'core' has insufficient content
So what’s in that core file? pmap(1) now lets you see that easily; mappings whose data is missing from the core file are marked with a *:

$ coreadm -P heap+stack+data+anon
$ cat
^\Quit - core dumped
$ pmap core
core 'core' of 312077:  cat
08046000       8K rw---    [ stack ]
08050000       8K r-x--* /usr/bin/cat
08062000       4K rwx--  /usr/bin/cat
08063000      40K rwx--    [ heap ]
C2AB0000      64K rwx--
C2AD0000     752K r-x--* /lib/libc.so.1
C2B9C000      28K rwx--  /lib/libc.so.1
C2BA3000      16K rwx--  /lib/libc.so.1
C2BC0000     132K r-x--* /lib/ld.so.1
C2BF1000       4K rwx--  /lib/ld.so.1
C2BF2000       8K rwx--  /lib/ld.so.1
total      1064K

If you’re looking at a core file from an earlier release or from a customer in the field, you can quickly tell if you’re going to be able to get the data you need out of the core file or if the core file can only be interpreted on the original machine or whatever.
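
If you want to check for missing content from a tool rather than by eye, the * marker is easy to test for. The sketch below is a hypothetical helper in C, not part of the ptools, and it assumes the marker is always the last character of the permissions column, as in the output above.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Classify one line of "pmap core" output.
 * Returns  1 if the mapping's data is missing from the core file,
 *          0 if the mapping's data is present,
 *         -1 if the line is not a mapping line (header, total, ...).
 */
int
mapping_missing_from_core(const char *line)
{
    unsigned long base, size_kb;
    char perms[16];
    size_t len;

    assert(line != NULL);
    if (sscanf(line, "%lx %luK %15s", &base, &size_kb, perms) != 3)
        return (-1);

    len = strlen(perms);
    return (len > 0 && perms[len - 1] == '*');
}
```

Run over the listing above, this flags the three read-only text mappings (cat, libc.so.1 and ld.so.1), which is what you would expect since the content was set to heap+stack+data+anon.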


core files

Core files are snapshots of a process’s state. They contain some of the memory segments (e.g. the stack and heap) as well as some of the in-kernel state associated with the process (e.g. the signal masks and register values). When a process gets certain signals, the kernel, by default, kills the process and produces a core file. You can also create core files from running processes, without altering the process, using Solaris’s gcore(1) utility.

So when your application crashed in the field, you could just take the core file and debug it, right? Well, not exactly. Core files contained a partial snapshot of the process’s memory mappings; in particular they omitted the read-only segments which contained the program text (instructions). As a result you would have to exactly recreate the environment from the machine where the core file was produced: identical versions of the libraries, application binary and loadable modules. Consequently, core files were mostly useful for developers in development (and even then, an old core file could be useless after a recompilation). And this isn’t just Solaris: every OS I’ve ever worked with has omitted program text from core files, making those core files of marginal utility once they’ve left the machine that produced them.

coreadm(1M)

In Solaris 7 we introduced coreadm(1M) to let users and system administrators control the location and name of core files. Previously, core files had always been named “core” and resided in the current working directory of the process that dumped the core. With coreadm(1M) you can name core files whatever you want including meta characters that expand when the core is created; for example, “core.%f.%n” would expand to “core.staroffice.dels” if staroffice were to dump core on my desktop (named dels). System administrators can also set up a global repository for all cores produced on the system to keep an eye on programs unexpectedly dumping core (naturally in Solaris 10, zone administrators can set up per-zone core file repositories).
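
To illustrate how that kind of pattern expands, here is a toy expansion routine in C. It is not the kernel’s implementation; expand_core_pattern() and its arguments are invented for this sketch, and it handles only %f (executable name), %n (node name), %p (process ID) and %% out of the tokens coreadm(1M) documents.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Expand a coreadm-style pattern into out (outlen bytes).
 * Returns 0 on success, or -1 on an unsupported token or if the result
 * does not fit. Individual expansions longer than 255 bytes are
 * truncated in this sketch.
 */
int
expand_core_pattern(const char *pattern, const char *fname,
    const char *node, long pid, char *out, size_t outlen)
{
    size_t used = 0;

    assert(pattern != NULL && out != NULL && outlen > 0);
    out[0] = '\0';

    for (const char *p = pattern; *p != '\0'; p++) {
        char piece[256];
        size_t len;

        if (*p != '%') {
            /* ordinary character: copy through unchanged */
            snprintf(piece, sizeof (piece), "%c", *p);
        } else {
            p++;    /* look at the character after '%' */
            switch (*p) {
            case 'f':
                snprintf(piece, sizeof (piece), "%s", fname);
                break;
            case 'n':
                snprintf(piece, sizeof (piece), "%s", node);
                break;
            case 'p':
                snprintf(piece, sizeof (piece), "%ld", pid);
                break;
            case '%':
                snprintf(piece, sizeof (piece), "%%");
                break;
            default:
                /* unknown token, or a '%' at the very end */
                return (-1);
            }
        }

        len = strlen(piece);
        if (used + len >= outlen)
            return (-1);    /* no room for piece plus the NUL */
        memcpy(out + used, piece, len + 1);
        used += len;
    }
    return (0);
}
```

Called with the pattern "core.%f.%n", the name "staroffice" and the node "dels", the sketch produces "core.staroffice.dels", matching the example above.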

In Solaris 10, coreadm(1M) becomes an even more powerful tool. Now you can specify which parts of the process’s image go into the core file. Program text is there by default, and you can also choose to omit or include the stack, heap, anonymous data, mapped files, System V shared memory segments, ISM, DISM, etc. Let’s say you’ve got some multi-processed database that contains a big DISM segment; rather than having each process include the shared segment in its core file, you can set up just one of the processes (or none of them) to include the segment in the core file.

debugging core files from the field

Now that program text is included by default, core files from failures in the field can be useful without the incredibly arduous task of exactly replicating the original environment. The program text also includes a partial symbol table (the dynsym), so you can get accurate stack backtraces and correctly disassemble functions in your favorite post-mortem debugger. If the dynsym doesn’t cut it, you can use coreadm(1M) to configure your process to include the full symbol table in its core dumps as well, so don’t strip those binaries!

Also new to Solaris 10, we’ve started building many libraries with embedded type information in a compressed format. This is more of a teaser, since we’re not quite ready to ship the tools to generate that type information, but that type information is included in core files by default. So now not only can we in Solaris actually make headway on core files we get from customers, but we can make progress much more quickly.

If you’ve installed Solaris Express, go check out the man page for coreadm(1M) and figure out how to get the right content in your core files. Once you get your first core file from a Solaris 10 machine in the field, I hope you’ll appreciate how much easier it is to debug.
