Node.js/V8 postmortem debugging

Update 2014-01-27: The commands and output described in this post have changed since it was first posted. For updated details, see the related post on the Joyent Dev Center.


Photo by Auntie P (flickr)

I recently wrote an article for ACM Queue discussing postmortem debugging for dynamic environments. I argued that while native environments for years have provided rich tools for understanding software failures postmortem, most popular dynamic environments have not developed similar facilities, and this gap is becoming more important as these environments grow popular for building complex core components of large distributed systems.

In building Cloud Analytics, we knew we needed a robust facility for postmortem debugging. The CA service comprises several components: a few global services that manage state, an agent on each server in the datacenter that collects performance data, and a cluster of data aggregators. Debugging distributed systems is always hard, and it’s even harder when the software is running on other people’s production systems.

We built a module called node-panic that dumps predefined program state in the event of a crash. It also has a tool for manually triggering a crash in the event of a hard-hang (e.g., infinite loop). These tools have been extremely valuable in helping us root-cause issues in both production and development environments. But they’re really just a first step, and as I explain in the article, this approach has several important limitations, mostly owing to the fact that node-panic is pure JavaScript and so isn’t baked into the VM itself and doesn’t have access to a lot of important program state.

A next step

More recently I’ve been playing around with building an MDB module for inspecting V8 state. (Recall that V8 is the blazingly fast JavaScript virtual machine in both Node.js and Chrome, and MDB is the illumos modular debugger, an extensible tool for analyzing both live programs and process core dumps.) With this approach, you can attach MDB to either a running Node program, a core file manually created using gcore(1), or a core file generated as a result of a Node crash (e.g., segfault).  Importantly, you don’t need a debug build of node for this so you can use it in production.

Here’s a very simple example program:

dap@devel ~ $ cat -n node-stack/examples/loop-anon.js
     1	/*
     2	 * This script demonstrates a simple stack trace involving an anonymous
     3	 * function, then goes into an infinite loop so you can catch it with the
     4	 * debugger.  One caveat: we grab a stack trace at the start to force V8 to
     5	 * compute line number information for this script.  If we didn't have this, the
     6	 * debugger would only print out position-in-file information.
     7	 */
     8	new Error().stack;
     9
    10	function main() { func1(); }
    11	function func1() { func2(); }
    12
    13	function func2()
    14	{
    15		(function () {
    16			for (;;)
    17				;
    18		})();
    19	}
    20
    21	main();

Note the comment at the top of the script. V8 computes line number information for a given script lazily. If it has never done this for a particular script, the debugger module won’t be able to print out the line number.  This example grabs a stacktrace at the start of the script to make sure V8 has computed line numbers for it.  (There’s an example of the other case below.)

Now I’ll run this script with a stock node build and attach MDB:

dap@devel ~ $ node node-stack/examples/loop-anon.js &
[1] 400524

dap@devel ~ $ mdb -p 400524
Loading modules: [ ld.so.1 libc.so.1 ]
> ::status
debugging PID 400524 (32-bit)
file: /usr/bin/node
threading model: native threads
status: stopped by debugger

Now I’ll grab a JavaScript stack trace:

> ::load mdb_v8.so
> ::jsstack
804738c <anonymous> (as <anon>) at /home/dap/node-stack/examples/loop-anon.js line 15
80473a4 func2 at /home/dap/node-stack/examples/loop-anon.js line 13
80473bc func1 at /home/dap/node-stack/examples/loop-anon.js line 1180473d4 main at /home/dap/node-stack/examples/loop-anon.js line 10
80473f0 <anonymous> (as <anon>) at /home/dap/node-stack/examples/loop-anon.js line 1
8047424 <InternalFrame>
804745c <anonymous> (as Module._compile) at module.js line 340
804747c <anonymous> (as Module._extensions) at module.js line 406
80474a0 <anonymous> (as Module.load) at module.js line 324
80474cc <anonymous> (as Module._load) at module.js line 257
80474ec <anonymous> (as Module.runMain) at module.js line 419
8047518 <anonymous> (as startup.processNextTick.process._tickCallback) at node.js line 120
8047534 <InternalFrame>
8047568 <EntryFrame>
fe84e240 <None>

If V8 hadn’t computed the line number information for our script, the stack trace would look more like this:

> ::jsstack
804738c <anonymous> (as <anon>) at /home/dap/node-stack/examples/loop-anon-noinfo.js
position 814
80473a4 func2 at /home/dap/node-stack/examples/loop-anon-noinfo.js position 772
80473bc func1 at /home/dap/node-stack/examples/loop-anon-noinfo.js position 710
80473d4 main at /home/dap/node-stack/examples/loop-anon-noinfo.js position 652
80473f0 <anonymous> (as <anon>) at /home/dap/node-stack/examples/loop-anon-noinfo.js
position 2
8047424 <InternalFrame>
804745c <anonymous> (as Module._compile) at module.js position 19476
804747c <anonymous> (as Module._extensions) at module.js position 23260
80474a0 <anonymous> (as Module.load) at module.js position 18526
80474cc <anonymous> (as Module._load) at module.js position 15200
80474ec <anonymous> (as Module.runMain) at module.js position 23916
8047518 <anonymous> (as startup.processNextTick.process._tickCallback) at node.js
position 7224
8047534 <InternalFrame>
8047568 <EntryFrame>
fe84e240 <None>

Besides getting a stack trace, you can examine individual frames in more detail using the frame pointer:

> 80473a4::jsframe -v
frame pointer: 80473a4
    function: fd862b8d (JSFunction)
        shared function info: fc615229 (SharedFunctionInfo)
            function name: fc6941a9 (ascii SeqString: "func2")
            function token position: SMI: value = 515
            script: fc614f4d (Script)
                script name: fd85ec05 (ascii ConsString: "
                /home/dap/node-stack/examples/loop-anon.js")
                line endings: fd863605 (FixedArray)
            function token at: line 13

and you can inspect individual heap objects as well. Given a pointer to any JavaScript heap object, “::jsprint” will figure out what type it is and dump a representation of the corresponding C++ class inside the V8 VM:

> fd862b8d::jstype
0xfd862b8d: JSFunction

> fd862b8d::jsprint
fd862b8d JSFunction {
    fd862b8d JSObject {
        fd862b8d HeapObject < Object < MaybeObject  {
            fd862b8c map = fe82df41 (Map)
        }
        fd862b90 properties = fe86e021 (FixedArray)
        fd862b94 elements = fe86e021 (FixedArray)
    }
    fd862b9c prototype_or_initial_map = fe86e0cd (Oddball)
    fd862ba0 shared = fc615229 (SharedFunctionInfo)
    fd862ba8 literals = fe86e021 (FixedArray)
    fd862bac next_function_link = fe86e05d (Oddball "undefined")
}

This is still very much a prototype, but we think this could be very powerful. In the meantime, there are some issues to be solved with the existing implementation and several more problems to address.

Maintainability

The heap inspector requires pretty deep knowledge of the internal memory layout of the V8 heap. How can this be maintained in a reasonable way as V8 itself evolves?

For C programs shipped with illumos, we use CTF at build-time to encode type definitions in the binary itself. CTF is compact enough to be shipped in production binaries, so MDB understands all in-memory structures for these programs on all illumos systems. Application-specific debugger commands can be implemented in terms of CTF so that they don’t have to be built specifically for a particular application binary.

This approach is tougher for C++ programs, which have additional information like class hierarchies that can’t be expressed with CTF. But while V8 uses C++ classes (and an inheritance hierarchy) to represent different heap object types, it doesn’t use standard C++ member fields. Instead, each class defines static constants describing the offsets of its member fields, and macros are used to define accessors in terms of these constants. This actually makes it easier to build and maintain a C debugger module because it’s much easier to export these constants as part of the build process than to reverse-engineer how a C++ structure gets laid out in memory. If this proves to be the right approach, then we could save these constants into the binary using CTF and then build the debugger module independent of the V8 binary.

Completeness

Today, this module can dump a simple stack trace and provide basic inspection of heap objects. But there’s a lot more that would be useful:

  • global variables
  • arguments and local variables for each function on the stack
  • list of events pending on the event queue
  • list of function contexts to be invoked in the future (e.g., when pending I/O completes)

I’ve also only tested it on pretty simple examples so far. My examples have included JavaScript frames for named and anonymous functions, constructor frames, argument adaptor frames, and simple entry frames. I haven’t yet tried optimized frames, exit frames, or an exit/entry sequence.

This module has already helped us understand enough to write a basic DTrace ustack helper for Node programs, which allows similar stack traces to be gathered via DTrace while the program is running. In future blog posts I’ll talk more about the DTrace helper as well as what I’ve learned about V8, which I hope will be useful for others hacking around with it.

4 thoughts on “Node.js/V8 postmortem debugging

  1. Any intent of using code similar to that under ::jsstack to implement a *_db module that pstack may use? (does it make sense in that context? I’m not sure how the JS stack and the C stack relate to eachother, in v8)

    I suspect that the libpythonX.Y_db.so johansen wrote is the only current implementation of a module to extend/annotate pstack output, but it’s proven very useful.

  2. Rich,

    I didn’t realize it was possible to extend pstack in that way, but that sounds great! I’ll definitely look into it, as that too would be very valuable. Is there documentation on it? Where does the python one live?

    – Dave

  3. Dave, Schrock built this support for the Java folks back in the day when we were creating the jstack() action. Unfortunately, it seems that rather than subsequently make this facility generic, the python version is more or less a copy. I also believe that mdb(1) doesn’t use either of these mechanisms. But now we’re talking about non-trivial work :-)

Comments are closed.