Profiling Node.js

(For returning readers, this is basically a “tl;dr” version of my previous post on Node.js performance. The post below also appears on the Node.js blog.)

It’s incredibly easy to visualize where your Node program spends its time using DTrace and node-stackvis (a Node port of Brendan Gregg’s FlameGraph tool):

  1. Run your Node.js program as usual.
  2. In another terminal, run:
    $ dtrace -n 'profile-97/execname == "node" && arg1/{
        @[jstack(100, 8000)] = count(); } tick-60s { exit(0); }' > stacks.out

    This will sample about 100 times per second for 60 seconds and emit results to stacks.out. Note that this will sample all running programs called “node”. If you want a specific process, replace execname == "node" with pid == 12345 (the process id).

  3. Use the “stackvis” tool to transform this directly into a flame graph. First, install it:
    $ npm install -g stackvis

    then use stackvis to convert the DTrace output to a flamegraph:

    $ stackvis dtrace flamegraph-svg < stacks.out > stacks.svg
  4. Open stacks.svg in your favorite browser.

You’ll be looking at something like this:

This is a visualization of all of the profiled call stacks. This example is from the “hello world” HTTP server on the Node.js home page under load. Start at the bottom, where you have “main”, which is present in most Node stacks because Node spends most on-CPU time in the main thread. Above each row, you have the functions called by the frame beneath it. As you move up, you’ll see actual JavaScript function names. The boxes in each row are not in chronological order, but their width indicates how much time was spent there. When you hover over each box, you can see exactly what percentage of time is spent in each function. This lets you see at a glance where your program spends its time.

That’s the summary. There are a few prerequisites:

  • You must gather data on a system that supports DTrace with the Node.js ustack helper. For now, this pretty much means illumos-based systems like SmartOS, including the Joyent Cloud. MacOS users: OS X supports DTrace, but not ustack helpers. The way to get this changed is to contact your Apple developer liason (if you’re lucky enough to have one) or file a bug report at I’d suggest referencing existing bugs 5273057 and 11206497. More bugs filed (even if closed as dups) show more interest and make it more likely Apple will choose to fix this.
  • You must be on 32-bit Node.js 0.6.7 or later, built --with-dtrace. The helper doesn’t work with 64-bit Node yet. On illumos (including SmartOS), development releases (the 0.7.x train) include DTrace support by default.

There are a few other notes:

  • You can absolutely profile apps in production, not just development, since compiling with DTrace support has very minimal overhead. You can start and stop profiling without restarting your program.
  • You may want to run the stacks.out output through c++filt to demangle C++ symbols. Be sure to use the c++filt that came with the compiler you used to build Node. For example:
    c++filt < stacks.out > demangled.out

    then you can use demangled.out to create the flamegraph.

  • If you want, you can filter stacks containing a particular function. The best way to do this is to first collapse the original DTrace output, then grep out what you want:
    $ stackvis dtrace collapsed < stacks.out | grep SomeFunction > collapsed.out
    $ stackvis collapsed flamegraph-svg < collapsed.out > stacks.svg
  • If you’ve used Brendan’s FlameGraph tools, you’ll notice the coloring is a little different in the above flamegraph. I ported his tools to Node first so I could incorporate it more easily into other Node programs, but I’ve also been playing with different coloring options. The current default uses hue to denote stack depth and saturation to indicate time spent. (These are also indicated by position and size.) Other ideas include coloring by module (so V8, JavaScript, libc, etc. show up as different colors.)

For more on the underlying pieces, see my previous post on Node.js profiling and Brendan’s post on Flame Graphs.


Playing with Node/V8 postmortem debugging

2016 Update: The commands and output in this post are pretty dated now, but you can find an up-to-date docs (including a tutorial) in the mdb_v8 user guide.

“Post Mortem” by C. MacLaurin

Several weeks ago I posted about postmortem debugging for Node.js, a critical technique for understanding fatal software failure (and thereby keeping up software quality). Now that the underlying pieces are freely available[1], you can use the documentation below to start debugging your own Node programs. With these tools you can inspect the call stack and arbitrary JavaScript heap objects for both live programs and core dumps. Documentation is often dry reading, but I’ve worked to keep this document concise and relevant. You can read this start-to-finish in half an hour and you’ll learn a bunch about V8 internals by exploring your own node program. If you don’t care about this level of detail, just read my older post to get a sense of what this tool does. Shortly I’ll post an example of using this tool to debug a problem we weren’t able to debug with existing tools.

This tool works out of the box for Node.js on the latest SmartOS release. So if you’ve got a SmartOS live CD and any release of Node.js since 0.4.8, these tools will just work. If you’re running an older version of SmartOS, another illumos-based system, or a Joyent SmartMachine that hasn’t yet been upgraded to the latest SmartOS, you can just download the binaries that you need. The instructions below cover both of these cases.


The V8 mdb dmod facilitates inspection of V8-level and JS-level state from MDB, the illumos modular debugger. If you’re not familiar with MDB, you may want to skim the MDB man page. The syntax may take some getting used to, but you can probably pick up enough to get started from the examples below.

The various dcmds and walkers are prefixed with “v8″ or “js” depending on which level they operate on.

The JavaScript-level dcmds are:

  walk jsframe              - walk V8 JavaScript stack frames
  dcmd jsframe              - summarize a JavaScript stack frame
  dcmd jsstack              - print a JavaScript stacktrace
  dcmd jsprint              - print a JavaScript object

The V8-level dcmds are:

  dcmd v8array              - print elements of a V8 FixedArray
  dcmd v8classes            - list known V8 heap object C++ classes
  dcmd v8load               - load canned config for a specific V8 version
  dcmd v8print              - print a V8 heap object
  dcmd v8str                - print the contents of a V8 string
  dcmd v8type               - print the type of a V8 heap object
  dcmd v8types              - list known V8 heap object types

Getting started

Step 1: Run MDB

First, run your program. Then attach with MDB (using mdb -p <pid>) or take a core dump with gcore(1M) and run MDB on the core file.

Step 2: Load the dmod

The dmod is not loaded by default. You must explicitly load it with:

> ::load v8

If this fails, you’re running an older version of SmartOS and you’ll need to download the binary and use ::load /path/to/ (You have to use ::load ./ even if the file is in the same directory so that MDB knows you’re giving it a relative path; by default, MDB searches the dynamic library path, which probably doesn’t include “.”.)

Step 3: Configure the dmod

The dmod requires knowledge of internal V8 structures that change between versions of V8. Some builds of V8 include the required information directly in the binary, in which case the dmod will auto-configure itself and you’ll see this when you initially load the dmod:

> ::load v8
Loaded V8 support.

If you don’t see that, your version of Node doesn’t have support built-in, but you can load a pre-canned configuration using ::v8load, which supports arguments for “node-0.4″ (for v0.4.8 and later) and “node-0.6.5″ (which works on v0.6.5 and v0.6.6, and possibly later versions).


To summarize how you’d start debugging a core file from node 0.6.5:

$ mdb core.node.jpc
Loading modules: [ ]
> ::load v8
> ::v8load node-0.6.5
V8 dmod configured based on node-0.6.5

or, if you downloaded the binaries separately above:

$ ./mdb core.node.jpc
Loading modules: [ ]
> ::load ./
> ::v8load node-0.6.5
V8 dmod configured based on node-0.6.5

Inspecting JavaScript-level state

::walk jsframe

Iterates stack frames and prints out the frame pointers. Starts with %ebp, and then follows frame pointers until it finds something that doesn’t look like a V8 JavaScript stack frame. This is very much like MDB’s $C macro but only prints the frame pointers. To get more details, pipe this to ::jsframe.

Note: libv8 is typically compiled with -fomit-frame-pointer, so if mdb catches “node” while it’s somewhere inside of V8, you may not be able to get a stacktrace at all.

::jsframe [-v]

Given a frame pointer, prints out a summary of the stack frame. With no arguments, prints a one-line summary for use in a stacktrace that includes the frame pointer, the name of the function (which may be inferred), the script where it was defined, and either the line number or character position in the file where the function was defined. The line number information will only be available if V8 has computed that information, which it does on a per-script basis as needed.

Example 1: anonymous function, invoked with a name, with line number information present:

> 8046df0::jsframe
8046df0 <anonymous> (as EventEmitter.emit) at events.js line 38

Example 2: named function with no line number information:

> 8046af8::jsframe
8046af8 caAggrValueHeatmapImage at
/opt/local/lib/node/.npm/cabase/1.0.3vrelease-20110901-20110901-3-gc693a90-dirty/package/lib/ca/ca-agg.js position 48960

Example 3: anonymous function invoked with no name and with no line number information:

> 8047500::jsframe
8047500 <anonymous> (as <anon>) at /home/dap/node-stack/examples/loop-anon.js line 15

With -v, prints out a best guess at function arguments. This uses the “length” property of the Function object to determine how many arguments the function consumes and then prints them from the stack. For example:

> 8046a9c::jsframe -v
8046a9c <anonymous> (as exports.bucketize)
    func: fc435fcd
    posn: position 7838
    arg1: fc070709 (JSArray)
    arg2: fc070719 (JSObject)

You can use ::jsprint to print these objects.

Note: there are several types of non-JavaScript frames that ::jsframe recognizes:

  • InternalFrame: represents a C++ frame
  • EntryFrame: represents a transition from C++ to JavaScript
  • ExitFrame: represents a transition from JavaScript to C++
  • ArgumentsAdaptorFrame: used when a function is invoked with fewer (or more?) arguments than it’s defined to consume.
  • ConstructorFrame: indicates the next frame’s function is being invoked as a constructor

::jsstack [-v]

Equivalent to ::walk jsframe | ::jsframe [-v]. This iterates the known JS stack frames and prints out the summary or detailed view.

::jsprint [-a] [-d depth]

Given a JavaScript object of class Object, Array, Function, Number, Boolean, or special value (like null or undefined), print it out. Except for functions, the result should closely resemble the JavaScript syntax you would use to define the object inline. For example:

> fc070719::jsprint
    base: 1320886447,
    weighbyrange: false,
    height: 281,
    width: 624,
    nbuckets: 50,
    max: 11538462,
    nsamples: 30,
    hue: [ 21 ],
    linear: false,
    min: 11538462,

Note: Not all types of objects are supported, but we haven’t yet encountered those that aren’t, and the dmod should behave sanely if they are encountered.

With -a, prints the address of the V8 heap object representing each JavaScript object so you can inspect it individually. For example:

> fc070719::jsprint -a
fc070719: {
    base: fbf4ec4d: 1320886447,
    weighbyrange: fe88e0ad: false,
    height: 232: 281,
    width: 4e0: 624,
    nbuckets: 64: 50,
    max: 160203c: 11538462,
    nsamples: 3c: 30,
    hue: fc043179: [ 2a: 21 ],
    linear: fe88e0ad: false,
    min: 160203c: 11538462,

Since many objects are circular (or just large), the -d option is provided to control how deep to recurse. The default recursion depth is 2.

Note: You may see expected special values like “undefined”, “null”, “true”, and “false”. There’s an additional special value called “hole” which represents holes in an array. For example:

> fc070709::jsprint
    [],    [],

indicates that the array’s length has somehow been extended beyond the last actual element so the remaining elements are holes.

Inspecting V8-level state

Since not all information is contained directly in JavaScript state, it may be useful to inspect V8-level heap objects directly. To make sense of this, you must understand a few basics about the V8 heap. The details contained here are obviously subject to change at any time as V8 itself evolves.

Pointers in the heap are word-aligned, but the low-order bits are used to encode some information about the item. Specifically, most heap objects belong to one of two classes: SMI (small integer) values and HeapObjects.

  • If the low-order bit is 0, then the “pointer” actually represents an SMI whose integer value is represented using the upper 31 bits. For example, the “pointer” value 0x36de actually denotes the integer 7023 (0x36de>>1). There’s no object in memory corresponding to this number; the pointer word represents the value directly.
  • If the low-order bit is 1, then the pointer refers to a HeapObject. This is an actual memory object allocated on the JS heap. The actual object begins at the address with the low-order 1 cleared, but for convenience the dcmds take care of this so you do not need to subtract 1 explicitly. For example, 0xfe88e05d represents an object that actually starts at fe88e05c, but you’d still use fe88e05d as the argument to the various dcmds that inspect V8 state.

Heap Object type information

Various C++ classes are used to represent objects of the heap, including JSFunction, JSObject, JSArray, Oddball (for undefined, null, true, and false), HeapNumber (for non-SMI numbers), and various types of String. You can list all of the ones that the dmod knows about with ::v8classes.

There’s a related but separate set of type values (a C++ enum): FUNCTION_TYPE, JSOBJECT_TYPE, JSARRAY_TYPE, ODDBALL_TYPE, etc. For strings, the type field also encodes whether the string is encoded in ASCII or two-byte encoding and whether it’s represented as a sequence of bytes or a ConsString. You can list the type values that the dmod knows about with ::v8types.

All of the various heap object classes are subclasses of HeapObject. HeapObject defines a field called “map” (of class Map, itself a HeapObject) which describes the layout of the object in detail. Most importantly, the Map includes the type of the object.

Using the type information in a HeapObject’s map, and a separate built-in mapping of type values to C++ classes, the dmod can take an arbitrary pointer and tell you what C++ class it must be. This is what the ::v8type dcmd does:

> fc070719::v8type
0xfc070719: JSObject

> fbf4ec4d::v8type
0xfbf4ec4d: HeapNumber

> fc043179::v8type
0xfc043179: JSArray

> fe88e0ad::v8type
0xfe88e0ad: Oddball: "false"

> 160203c::v8type
0x160203c: SMI: value = 11538462

(The above examples are taken from the JSObject printed with ::jsprint -a in the example above.)

The dmod also knows the layout of these C++ classes. You can use ::v8print to print out an object as an instance of its C++ class (which is automatically inferred as described above):

> fc070719::v8print
fc070719 JSObject {
    fc070719 HeapObject < Object < MaybeObject  {
        fc070718 map = fe85e7e1 (Map)
    fc07071c properties = fc07b9c9 (FixedArray)
    fc070720 elements = fe88e021 (FixedArray)

> fc043179::v8print
fc043179 JSArray {
    fc043179 JSObject {
        fc043179 HeapObject < Object < MaybeObject  {
            fc043178 map = fe84caa1 (Map)
        fc04317c properties = fe88e021 (FixedArray)
        fc043180 elements = fc0480f1 (FixedArray)
    fc043184 length = 2 (SMI: value = 1)

A few notes about this output:

  • The output is designed to look like MDB’s “::print -at”.
  • The type printed for each field is the dynamically-detected type of the object pointed to by the field, not the type declared in the source for that field. This is useful because many fields are declared with type “Object” in the source but are actually a more specific type.
  • ::v8print prints out the fields from base classes as though these were objects contained within the subclass object (which is essentially how it actually works). In this case, a JSArray is a JSObject, which is a HeapObject. ::v8print prints the base class fields first (omitting the empty struct for classes with no fields, as in Object and MaybeObject above).
  • Note that for the JSArray object at address fc043179, the base class JSObject is necessarily at the same address, and so on until we get to the first field, which is actually at offset fc043178, 1 less than the object itself for the reasons described above.

Note: Some fields may be missing because the dmod does not know about every field in every class.

Type-specific dcmds

There are two additional dcmds which are useful for printing out instances of certain specific V8 classes. You can always use ::v8print, but these commands give additional information.

::v8str [-v]

Prints the contents of any ASCII V8 string, regardless of representation. There are basically three representations:

  • SeqString: characters stored sequentially in memory (like a conventional C string)
  • ConsString: stores two pointers to Strings that are semantically concatenated
  • ExternalString: stores a pointer to a string outside the V8 heap. These are used very rarely. The representation depends on the type of the pointed-to value. The dmod assumes these are Node strings.

With -v, this dcmd shows what it did. For example:

> fc3e2081::v8str

> fc3e2081::v8str -v
    ptr1: fe8a7ba1
    ptr2: fc417a75
        length: 7, will read: 7
        value: "exports"
        length: 10, will read: 10
        value: ".bucketize"

In this case, the original pointer is a ConsString. Each pointer in this top-level ConsString is itself a SeqString. In general, these pointers may themselves be ConsStrings that must be followed recursively.


Prints the contents of a V8 array. This is mostly useful for exploring and debugging the dmod. For example, here’s how you can get the contents of a JSArray:

> fc043179::v8print
fc043179 JSArray {
    fc043179 JSObject {
        fc043179 HeapObject < Object < MaybeObject  {
            fc043178 map = fe84caa1 (Map)
        fc04317c properties = fe88e021 (FixedArray)
        fc043180 elements = fc0480f1 (FixedArray)
    fc043184 length = 2 (SMI: value = 1)

We see that this is a JSArray with length 1 (since “length” is stored as an SMI with value 2). We can use ::v8array to print out the elements of the “elements” FixedArray:

> fc0480f1::v8array

and then we can print out the one element’s value:

> 2a::v8print
SMI: value = 21

We can check our work with ::jsprint, which does all this automatically:

> fc043179::jsprint
[ 21 ]

Putting it all together

To reinforce all of this, we’ll get the details about a particular stack frame manually. First, walk the stack:

> ::walk jsframe

We’ll take the top frame. Use ::jsframe -v to get a pointer to the frame’s “func” object:

> 8046a9c::jsframe -v
8046a9c <anonymous> (as exports.bucketize)
    func: fc435fcd
    file: /opt/local/lib/node/.npm/cabase/1.0.3vrelease-20110901-20110901-3-gc693a90-dirty/package/node_modules/.npm/heatmap/0.0.1/package/lib/heatmap.js
    posn: position 7838
    arg1: fc070709 (JSArray)
    arg2: fc070719 (JSObject)

We’ve cheated because we have the function and file names already, but we’ll show how to get this from just the function pointer, which is a fixed offset of -0×8 from the frame pointer. We only used ::jsframe to get the function pointer.

Print out the function:

> fc435fcd::v8print
fc435fcd JSFunction {
    fc435fcd JSObject {
        fc435fcd HeapObject < Object < MaybeObject  {
            fc435fcc map = fe84d8a1 (Map)
        fc435fd0 properties = fe88e021 (FixedArray)
        fc435fd4 elements = fe88e021 (FixedArray)
    fc435fdc prototype_or_initial_map = fe88e0cd (Oddball: "hole")
    fc435fe0 shared = fc4348c1 (SharedFunctionInfo)
    fc435fe8 literals = fc436021 (FixedArray)
    fc435fec next_function_link = fe88e05d (Oddball: "undefined")

The interesting details are all in the associated SharedFunctionInfo:

> fc4348c1::v8print
fc4348c1 SharedFunctionInfo {
    fc4348c1 HeapObject < Object < MaybeObject  {
        fc4348c0 map = fe84c681 (Map)
    fc4348c4 name = fe88e2f1 (SeqAsciiString: "")
    fc4348d0 construct_stub = fe86ea21 (Code)
    fc4348d4 instance_class_name = fe88e199 (SeqAsciiString: "Object")
    fc4348d8 function_data = fe88e05d (Oddball: "undefined")
    fc4348dc script = fc434581 (Script)
    fc4348e0 debug_info = fe88e05d (Oddball: "undefined")
    fc4348e4 inferred_name = fc3e2081 (ConsString: "exports.bucketize")
    fc4348e8 initial_map = fe88e05d (Oddball: "undefined")
    fc4348ec this_property_assignments = fe88e021 (FixedArray)
    fc4348f4 length = 4 (SMI: value = 2)
    fc4348f8 formal_parameter_count = 4 (SMI: value = 2)
    fc4348fc expected_nof_properties = 14 (SMI: value = 10)
    fc434900 num_literals = 8 (SMI: value = 4)
    fc434904 start_position_and_type = 7ac0 (SMI: value = 15712)
    fc434908 end_position = 3968 (SMI: value = 7348)
    fc43490c function_token_position = 1e9e (SMI: value = 3919)
    fc434910 compiler_hints = 0 (SMI: value = 0)
    fc434914 this_property_assignments_count = 0 (SMI: value = 0)
    fc434918 opt_count = 0 (SMI: value = 0)

We can already see that the name is “”, indicating an anonymous function, and the inferred name is “exports.bucketize”. If ::v8print hadn’t printed the function name, we could get it with ::v8str:

> fc3e2081::v8str

To get the script name, follow the “script” field:

> fc434581::v8print
fc434581 Script {
    fc434581 Struct {
        fc434581 HeapObject < Object < MaybeObject  {
            fc434580 map = fe84c581 (Map)
    fc434584 source = fc130021 (
    SeqAsciiString: "(function (exports, require, module, __filename, __dirname) { /*
     * Copyright 2010 Joyent, Inc.  All rights reserved.
     * Use is subject to license terms.

    var assert = require('assert');
    var libpng = require('png');
    var sys = require()
    fc434588 name = fc4120bd (
    SeqAsciiString: "/opt/local/lib/node/.npm/cabase/1.0.3vrelease-20110901-20110901-3-gc693a90-dirty/package/node_modules/.npm/heatmap/0.0.1/package/lib/heatmap.js")
    fc43458c line_offset = 0 (SMI: value = 0)
    fc434590 column_offset = 0 (SMI: value = 0)
    fc434594 data = fe88e05d (Oddball: "undefined")
    fc434598 context_data = fe88e05d (Oddball: "undefined")
    fc43459c wrapper = fc400bf1 (<unknown>)
    fc4345a0 type = 4 (SMI: value = 2)
    fc4345a4 compilation_type = 0 (SMI: value = 0)
    fc4345a8 line_ends = fe88e05d (Oddball: "undefined")
    fc4345ac id = 96 (SMI: value = 75)
    fc4345b0 eval_from_shared = fe88e05d (Oddball: "undefined")
    fc4345b4 eval_from_instructions_offset = 0 (SMI: value = 0)

See the “name” field. (Interestingly, the “Script” object includes as a member the source code of the script itself, but we only print out the first 255 characters or so.)

You can now see how the JavaScript-level dcmds work based on the lower-level V8 facilities.

Finally, we can look back at the ::jsframe output to get the argument pointers and then print out the corresponding values. We saw that arg2 was “fc070719″, a JSObject, and we can print it out with ::jsprint:

> fc070719::jsprint
    base: 1320886447,
    weighbyrange: false,
    height: 281,
    width: 624,
    nbuckets: 50,
    max: 11538462,
    nsamples: 30,
    hue: [ 21 ],
    linear: false,
    min: 11538462,

The way the dmod extracts the object’s properties is surprisingly complex and beyond the scope of this document.

I hope this documentation is helpful for people looking to debug Node programs. Expect more information and examples in the coming weeks.

[1] For the curious, the postmortem support for V8 is contained in two illumos commits (1, 2) plus a V8 patch that has not yet landed. The illumos commits include support for the versions of V8 from Node.js 0.4.8 through 0.6.6 (and possibly later) so that you don’t actually need the V8 patch yet.

Where does your Node program spend its time?

Photo by Julian Lim (flickr)

Performance analysis is one of the most difficult challenges in building production software. If a slow application isn’t spending much time on CPU, it could be waiting on filesystem (disk) I/O, network traffic, garbage collection, or many other things. We built the Cloud Analytics tool to help administrators and developers quickly identify these sources of latency in production software. When the app is actually spending its time on CPU, the next step is figuring out what it’s doing: what functions it’s executing and who’s calling them.

One approach is to sample the program at say, 1000 times per second, and examine the 5-10 most frequently seen stacktraces. DTrace lets you do this with the profile provider, whose output might include something like this for a C program:`mutex_trylock_adaptive+0x137              ← stacktrace`mutex_lock_impl+0xdc`mutex_lock+0x10
                7                                       ← count

... (more stacktraces and associated counts)

Brendan has a great example of using this to find a 2000x win in GNU grep. Knowing that many samples shared a particular stacktrace like this one often points you right to the performance problem. But for dynamic environments like Python or Node.js, this information is very hard to obtain. Although the V8 VM supports profiling, it has to be enabled when you start the program, and this adds overhead to the entire execution of the program. That’s a non-starter for most production deployments. And although you can use DTrace to profile such programs, by default it sees only the native stack frames. In environments like Python, this corresponds to interpreter frames, which are fairly useless:`PyObject_Free+0xe`PyObject_Realloc+0xcb`_PyString_Resize+0x7f`get_line+0x333`file_readline+0xa8`PyCFunction_Call+0x63`PyEval_EvalFrameEx+0x4099`PyEval_EvalCodeEx+0x7d8`PyEval_EvalCode+0x2f`run_mod+0x41`PyRun_FileExFlags+0x7e`PyRun_SimpleFileExFlags+0x32c`PyRun_AnyFileExFlags+0x69`Py_Main+0xa70

You can’t tell what Python-level functions are being called here. In Node.js things are even worse because the VM compiles JavaScript functions and invokes them directly on the heap, so you often don’t see any symbols at all:


However, through a mechanism called a ustack helper, DTrace can be taught to understand V8 stack frames, giving you a stacktrace that looks more like this (colors and labels added for clarity):`gettimeofday+0x7                                  ←  libc gettimeofday call
    Date at  position                                           ←  JavaScript "Date" function
    << adaptor >>
    << constructor >>                                           ← (invoked as a constructor)
    (anon) as at timers.js position 7590
    (anon) as Socket._write at net.js position 21336
    (anon) as Socket.write at net.js position 19714
    << adaptor >>
    (anon) as OutgoingMessage._writeRaw at http.js position 21526
    (anon) as OutgoingMessage._send at http.js position 20434
    << adaptor >>
    (anon) as OutgoingMessage.end at http.js position 35062     ← re-entering node library
    << adaptor >>
    (anon) as a at /home/dap/work-server.js line 25
    handle at /home/dap/work-server.js line 13                  ← invoking our request handler
    (anon) as EventEmitter.emit at events.js position 3532
    << adaptor >>
    (anon) as parser.onIncoming at http.js position 80542
    (anon) as parser.onHeadersComplete at http.js position 4904 ← parsing HTTP headers
    << internal >>
    << entry >>                                                 ← entering JavaScript

The helper is built into the “node” binary and translates each stack frame that it understands into a text description that typically includes the function name, the inferred name (for anonymous functions), and the file and position where the function is defined. Unrecognized stack frames are left alone, so you can see native C/C++ frames mixed with JavaScript frames. If line numbers are available, the position includes the line number where the function was defined; if not, only the character position in the file is printed. V8 only has line number information available for files that have previously been used in an exception stack trace.

Reading from the bottom up, we see the initial entry into JavaScript from node, then some frames in the node library for processing HTTP parser events, then the call into our program to handle the HTTP request, a call back to the node library to finish sending the response, then native code invoked by the Date constructor invoked as part of sending the response.

There are a few frames that you might be surprised to see if you’re not familiar with V8 internals:

  • entry: frame transitioning from C++ to JavaScript
  • exit: frame transitioning from JavaScript to C++
  • construct: the above frame’s function is being invoked as a constructor
  • adaptor: adaptor used when a function is invoked with fewer arguments than it expects
  • internal: internal V8 frame (does not correspond to a JavaScript frame)

Now that we can have DTrace output JavaScript-friendly stacktraces, we can use the profile provider as described above to identify common stacktraces and thus find out where any Node program is spending its time.


The DTrace ustack helper landed in the upcoming Node v0.6.7 and the tools for generating a flame graph are available on github. With this, you’ve got all the pieces to try this out on your own programs, though there are a few caveats to be aware of for now.

  1. First, you must be running on an Illumos- (or Solaris-)based system. ustack helpers don’t work at all on Mac OS and appear to be broken on FreeBSD. If you don’t have such a system and want to play around, try a SmartOS live CD (it’s really easy).
  2. You must also have access to modify kernel tunables, as the default limits are too low to support the V8 helper. (We’ll be rolling these tunable changes across the Joyent cloud so that SmartMachine customers can skip this step, but that hasn’t been completed yet.) Use MDB or /etc/system to set dtrace_helper_actions_max from 32 (the default) to at least 128:
    # echo "dtrace_helper_actions_max/W 0t128" | mdb -kw
    dtrace_helper_actions_max:      0x20            =       0x80

    (update: modern versions of SmartOS need no tuning so you can skip this step.)

  3. Build Node with ustack helper support. Run ./configure --with-dtrace and you’ll get support for the ustack helper as well as additional DTrace probes.
  4. Run your program as normal. I’ll use this sample program:
     * This script demonstrates a simple stack trace involving an anonymous
     * function, then goes into an infinite loop so you can more easily catch it
     * with DTrace.  One caveat: we grab a stack trace at the start to force V8 to
     * compute line number information for this script.  If we didn't have this,
     * DTrace would print out position-in-file information instead of line numbers.
    new Error().stack;
    function main() { func1(); }
    function func1() { func2(); }
    function func2()
    	(function () {
    		for (;;)

    which I run like this:

    $ node loop-anon.js &
    [1] 1791
  5. Now we’ll use this DTrace invocation to profile our program for 60 seconds and aggregate on stacktrace:
    # dtrace -n 'profile-997/execname == "node" && arg1/ {
        @[jstack(40, 2000)] = count(); } tick-60s { exit(0); }' > out.stacks

    The script automatically stops after 60 seconds. At this point, the output file is already pretty useful, showing JavaScript stacktraces sorted by frequency count. We can also use this file to generate flame graphs (below).

    If you’re not familiar with DTrace, this invocation has a few important pieces:

    • profile-997: Samples on all CPUs at 997Hz. We use 997 instead of the more obvious 1000 to avoid oversampling periodic activity.
    • execname == "node": the probe’s body only fires when the currently running program is “node” to avoid sampling other processes on the system. You could omit this and sample everything.
    • arg1: arg1 for profile probes is the current user-level instruction pointer, so testing against zero here ensures that we only fire the probe body for samples that interrupt userland code.
    • @[jstack(40, 2000)] = count();: the body of the probe populates an aggregation that counts the number of times each userland stacktrace has been seen. We use jstack to collect the userland stack trace. We gather 40 frames and give the ustack helper 2000 bytes of buffer for translating the stack frames.
    • tick-60s { exit(0); }: causes the DTrace invocation to exit after 60 seconds
    • > out.stacks: saves the output to out.stacks rather than stdout.

If you run through these steps with the above example program, you should see all samples with the same or nearly identical stacktraces, since the program quickly enters a tight loop. For a more interesting example, I ran this on one of our Cloud Analytics services. From the output you can see that not all of the stacktraces are complete; see below for details on that.

Scaling up

We believe this DTrace-based profiling support for Node.js will be a huge leap for understanding the performance of individual Node programs in production. We’re also exploring ways of scaling this up to many instances of the same program, or cases where there’s no single hot stacktrace but rather one long-running function that calls many different functions during its execution. For such cases, we’ve been looking at flame graphs to visualize overlapping stacktraces. Here’s a flame graph based on the above output that shows where the Cloud Analytics aggregator (itself a Node.js program) spends its time:

Try clicking that graph for an interactive SVG where you can inspect individual frames.

From here we can see that most time is spent generating heatmaps (not surprisingly, if you saw our OSCON or Surge presentations that talk about our architecture). We can also tell how much time is spent bucketizing the data in preparation for rendering the heatmap vs. generating the PNG image itself.

You can easily generate flame graphs for your own programs. Collect data as shown above, then use the stackcollapse and flamegraph tools to generate the graph.


With the V8 ustack helper, you can profile a running Node program to see what functions it’s spending time in. Being DTrace-based, you can switch this on and off at will and do it safely in production with literally zero overhead when not enabled and minimal overhead when enabled. (There are also ustack helpers for Python and Java.)

There are still some cases that the ustack helper doesn’t handle well that lead to some artifacts in the output. For example, in the above aggregator flame graph, three of the frames called from caAggrValueHeatmapImage are actually the same function, but because the helper doesn’t resolve them, they appear as distinct hex addresses and so show up as three silos instead of one.

Another artifact appears to result from the fact that internal V8 stack frames don’t use frame pointers. This is a common optimization that destroys observability because you can’t collect a stacktrace without frame pointers. More investigation is needed to determine whether this is actually the problem and whether it would be a reasonable tradeoff to disable this optimization.

Nevertheless, for us this tool has been a floodlight in an otherwise pitch-black room. It’s unfortunate that ustack helpers don’t work on Mac OS or FreeBSD, but perhaps with the increased attention on performance analysis in dynamic environments like Node, those systems will address the underlying issues. At Joyent, we’ll be updating both the kernel and the node binaries that we ship to fully support DTrace-based profiling out of the box, and we’re exploring ways of incorporating this data into Cloud Analytics.

USDT Providers Redux

In this post I’m going to review DTrace USDT providers and show a complete working example that I hope will be a useful reference for people interested in building providers for their own applications.

First, the prerequisites:

  • DTrace is the comprehensive dynamic tracing framework available on Illumos-based, BSD, and MacOS systems. If you’ve never used DTrace, check out; this post assumes you’ve at least played around with it.
  • USDT (Userland Statically Defined Tracing) is the mechanism by which application developers embed DTrace probes directly into an application. This allows users to trace semantically meaningful operations like “request-start”, rather than having to know which function implements the operation. More importantly, since USDT probes are part of the source code, scripts that use them continue working even as the underlying software evolves and the implementing functions are renamed and deleted.[1]

Tektronix MSO 4034 (photo by Rostislav Lisovy (flickr))

Back in 2009 I posted about how we put together an Apache USDT provider for the Sun Storage appliances. Although I wasn’t able to include any of the actual source, this post became the de-facto documentation on implementing a USDT provider with translated arguments. Two years later, as part of our work on Cloud Analytics we’ve written (from scratch) a new Apache provider called mod_usdt, and this time the complete source is available on github.

I hope this provider can be a useful reference for others building USDT providers. There are plenty of other providers out there already, but many of them don’t use translated arguments (see below), and most of them are (naturally) embedded inside a complex application and build system, so it’s hard to pick out what’s actually relevant. Since this module is built separately from Apache, nearly 100% of the code in mod_usdt relates directly to the provider.

The main files are these:

  • src/httpd.d: defines the structure translators. Delivered into /usr/lib/dtrace and loaded by dtrace(1M) at run-time.
  • src/httpd_provider.d: defines the provider probes and arguments. Linked into the module binary at link-time.
  • src/httpd_provider_impl.h: defines the structure passed from Apache to DTrace at run-time.
  • src/usdt.c: implements an Apache module using hooks at the beginning and end of each request to fire the corresponding DTrace probes

There are also some example DTrace scripts in the repo for tracing request latency; see the README for details.  Between these various pieces, you have everything you need to define and use the provider.

At this point, if you want to understand the pieces of the implementation, your best bet is to go read all the code in mod_usdt. It’s not actually that much code, and it’s well documented. The rest of this post describes how the surrounding mechanism works.

USDT probes

The provider implementation uses macros in the source to define probe points:

    HTTPD_REQUEST_START(request->method, request->uri);

This example (not actually from mod_usdt) includes two probe arguments, which can be accessed from D scripts to filter or aggregate on this information (the HTTP method and URI in this example). Passing simple arguments works well for simple probes, but starts getting unwieldy as you add a bunch more arguments. So DTrace also allows you to define C-style structs that are passed as arguments to the probes. To do this, you must define translators that take the actual arguments from the application and populate a struct for use in the probe. (Don’t worry if you didn’t catch all that. This will become clearer below.)

Building the shared object

We start with the provider definition file, httpd_provider.d, which defines probes like this:

probe request__start(dthttpd_t *p) : (conninfo_t *p, httpd_rqinfo_t *p);

This defines a probe called “request-start”. The application will pass a pointer to dthttpd_t, and the actual D probes will get a conninfo_t and a httpd_rqinfo_t.

When you run “make”, the first step is to take this provider definition generate a header file that defines the macros that the application uses to fire the probes:

dtrace -xnolibs -h -o build/httpd_provider.h -s src/httpd_provider.d

The header file has definitions like this:

#define HTTPD_REQUEST_START(arg0) \

Conceptually, the application uses this macro to fire the probe. (As we’ll see, there’s some magic involved.) When you compile the file:

gcc -Wall -Werror -fPIC  -Ibuild -I/opt/local/include/apr-1 -I/opt/local/include -I/opt/local/include/db4 -D_REENTRANT -I/usr/include -DLDAP_DEPRECATED -I/opt/local/include/httpd -c -o build/usdt.o src/usdt.c

The resulting object file indeed includes a call to the special __dtrace function, but there’s another important step next:

dtrace -xnolibs -G -o build/httpd_provider.o -s src/httpd_provider.d build/usdt.o

The “dtrace -G” pass iterates each of the objects (just usdt.o in this case), identifies these function calls, and for each one records which probe is being fired and the location in the program text[2]. It then replaces the function calls with “nop” instructions, which are ignored by the CPU. A new object file (httpd_provider.o) is generated that includes the probe and location information in a special SUNW_dof section. This object also contains an “_init” function — we’ll get to this shortly. To wrap up the build, we link these objects into the final shared library:

gcc -shared -fPIC  -o build/ build/usdt.o build/httpd_provider.o

Loading the module

When the library is actually loaded, the _init function transmits the information in the SUNW_dof section down to the DTrace kernel module. See the source for details. With this, DTrace knows which probes are available for this process and their locations in memory.

Enable the probes

When a user runs dtrace(1M) to instrument a USDT probe, DTrace replaces the “nop” instructions at all of that probe’s call sites with “int 3″ (0xcc) instructions, which will cause a fast trap into the kernel. The DTrace kernel module can tell (from the location of the trapping instruction) that this corresponds to an enabled probe, so it fires the probe and returns back to userland.

When the probes are disabled again (i.e. when you CTRL-C the dtrace(1M) process), the “int 3″ instructions are changed back to nops.

The upshot of all this is that when DTrace is not enabled, the application runs essentially the same as it did before USDT support was added at all. There’s no overhead to just having DTrace support.

Translated arguments

The files in /usr/lib/dtrace define structures and translators that take arguments passed into probes from the application and convert them to semantically meaningful structures. For details, see src/httpd.d in mod_usdt.

As described above, you can get away without translated arguments by passing primitives (ints and pointers, with which you can access strings too) directly to the probes. That way, you don’t have to deliver files into /usr/lib/dtrace.

And a few more things

Believe it or not, all of the above is a gross simplification of what actually goes on. To see all the details of implementing a provider, you’ll have to read all of mod_usdt. For a taste of how complex the mechanism for firing userland probes really is, check out Bryan’s post on what happens when magic collides.

Also, I explained above that there’s no overhead for disabled probes. That’s almost true, as long as there’s no code required to set up the arguments for the DTrace probe. But often times, you do want to include more complex arguments. To minimize the overhead of setting them up when DTrace is not enabled, you can use ISENABLED macros to tell at run-time whether the probe is enabled. mod_usdt uses these to avoid setting up the structure if the probes are not enabled.


USDT is extremely powerful for tracing semantically meaningful events in userland applications. While implementing a provider doesn’t require a lot of work per se, the steps required are not very obvious. I hope that mod_usdt will serve as a useful reference for those looking to add USDT to their own application, besides being useful in its own right for those using Apache.

[1] This contrasts with the pid provider, which allows users to trace almost any instruction, function entry, and function exit in an application. This flexibility makes the pid provider extremely valuable for ad-hoc investigation, but the resulting dependence on application implementation details makes it unsuitable for stable tools.

[2] The “dtrace -G” pass emits relocations, since the final locations of these instructions won’t be known until the run-time linker runs.

Node.js/V8 postmortem debugging

2016 Update: The commands and output in this post are pretty dated now, but you can find an up-to-date docs (including a tutorial) in the mdb_v8 user guide.

Photo by Auntie P (flickr)

I recently wrote an article for ACM Queue discussing postmortem debugging for dynamic environments. I argued that while native environments for years have provided rich tools for understanding software failures postmortem, most popular dynamic environments have not developed similar facilities, and this gap is becoming more important as these environments grow popular for building complex core components of large distributed systems.

In building Cloud Analytics, we knew we needed a robust facility for postmortem debugging. The CA service comprises several components: a few global services that manage state, an agent on each server in the datacenter that collects performance data, and a cluster of data aggregators. Debugging distributed systems is always hard, and it’s even harder when the software is running on other people’s production systems.

We built a module called node-panic that dumps predefined program state in the event of a crash. It also has a tool for manually triggering a crash in the event of a hard-hang (e.g., infinite loop). These tools have been extremely valuable in helping us root-cause issues in both production and development environments. But they’re really just a first step, and as I explain in the article, this approach has several important limitations, mostly owing to the fact that node-panic is pure JavaScript and so isn’t baked into the VM itself and doesn’t have access to a lot of important program state.

A next step

More recently I’ve been playing around with building an MDB module for inspecting V8 state. (Recall that V8 is the blazingly fast JavaScript virtual machine in both Node.js and Chrome, and MDB is the illumos modular debugger, an extensible tool for analyzing both live programs and process core dumps.) With this approach, you can attach MDB to either a running Node program, a core file manually created using gcore(1), or a core file generated as a result of a Node crash (e.g., segfault).  Importantly, you don’t need a debug build of node for this so you can use it in production.

Here’s a very simple example program:

dap@devel ~ $ cat -n node-stack/examples/loop-anon.js
     1	/*
     2	 * This script demonstrates a simple stack trace involving an anonymous
     3	 * function, then goes into an infinite loop so you can catch it with the
     4	 * debugger.  One caveat: we grab a stack trace at the start to force V8 to
     5	 * compute line number information for this script.  If we didn't have this, the
     6	 * debugger would only print out position-in-file information.
     7	 */
     8	new Error().stack;
    10	function main() { func1(); }
    11	function func1() { func2(); }
    13	function func2()
    14	{
    15		(function () {
    16			for (;;)
    17				;
    18		})();
    19	}
    21	main();

Note the comment at the top of the script. V8 computes line number information for a given script lazily. If it has never done this for a particular script, the debugger module won’t be able to print out the line number.  This example grabs a stacktrace at the start of the script to make sure V8 has computed line numbers for it.  (There’s an example of the other case below.)

Now I’ll run this script with a stock node build and attach MDB:

dap@devel ~ $ node node-stack/examples/loop-anon.js &
[1] 400524

dap@devel ~ $ mdb -p 400524
Loading modules: [ ]
> ::status
debugging PID 400524 (32-bit)
file: /usr/bin/node
threading model: native threads
status: stopped by debugger

Now I’ll grab a JavaScript stack trace:

> ::load
> ::jsstack
804738c <anonymous> (as <anon>) at /home/dap/node-stack/examples/loop-anon.js line 15
80473a4 func2 at /home/dap/node-stack/examples/loop-anon.js line 13
80473bc func1 at /home/dap/node-stack/examples/loop-anon.js line 1180473d4 main at /home/dap/node-stack/examples/loop-anon.js line 10
80473f0 <anonymous> (as <anon>) at /home/dap/node-stack/examples/loop-anon.js line 1
8047424 <InternalFrame>
804745c <anonymous> (as Module._compile) at module.js line 340
804747c <anonymous> (as Module._extensions) at module.js line 406
80474a0 <anonymous> (as Module.load) at module.js line 324
80474cc <anonymous> (as Module._load) at module.js line 257
80474ec <anonymous> (as Module.runMain) at module.js line 419
8047518 <anonymous> (as startup.processNextTick.process._tickCallback) at node.js line 120
8047534 <InternalFrame>
8047568 <EntryFrame>
fe84e240 <None>

If V8 hadn’t computed the line number information for our script, the stack trace would look more like this:

> ::jsstack
804738c <anonymous> (as <anon>) at /home/dap/node-stack/examples/loop-anon-noinfo.js
position 814
80473a4 func2 at /home/dap/node-stack/examples/loop-anon-noinfo.js position 772
80473bc func1 at /home/dap/node-stack/examples/loop-anon-noinfo.js position 710
80473d4 main at /home/dap/node-stack/examples/loop-anon-noinfo.js position 652
80473f0 <anonymous> (as <anon>) at /home/dap/node-stack/examples/loop-anon-noinfo.js
position 2
8047424 <InternalFrame>
804745c <anonymous> (as Module._compile) at module.js position 19476
804747c <anonymous> (as Module._extensions) at module.js position 23260
80474a0 <anonymous> (as Module.load) at module.js position 18526
80474cc <anonymous> (as Module._load) at module.js position 15200
80474ec <anonymous> (as Module.runMain) at module.js position 23916
8047518 <anonymous> (as startup.processNextTick.process._tickCallback) at node.js
position 7224
8047534 <InternalFrame>
8047568 <EntryFrame>
fe84e240 <None>

Besides getting a stack trace, you can examine individual frames in more detail using the frame pointer:

> 80473a4::jsframe -v
frame pointer: 80473a4
    function: fd862b8d (JSFunction)
        shared function info: fc615229 (SharedFunctionInfo)
            function name: fc6941a9 (ascii SeqString: "func2")
            function token position: SMI: value = 515
            script: fc614f4d (Script)
                script name: fd85ec05 (ascii ConsString: "
                line endings: fd863605 (FixedArray)
            function token at: line 13

and you can inspect individual heap objects as well. Given a pointer to any JavaScript heap object, “::jsprint” will figure out what type it is and dump a representation of the corresponding C++ class inside the V8 VM:

> fd862b8d::jstype
0xfd862b8d: JSFunction

> fd862b8d::jsprint
fd862b8d JSFunction {
    fd862b8d JSObject {
        fd862b8d HeapObject < Object < MaybeObject  {
            fd862b8c map = fe82df41 (Map)
        fd862b90 properties = fe86e021 (FixedArray)
        fd862b94 elements = fe86e021 (FixedArray)
    fd862b9c prototype_or_initial_map = fe86e0cd (Oddball)
    fd862ba0 shared = fc615229 (SharedFunctionInfo)
    fd862ba8 literals = fe86e021 (FixedArray)
    fd862bac next_function_link = fe86e05d (Oddball "undefined")

This is still very much a prototype, but we think this could be very powerful. In the meantime, there are some issues to be solved with the existing implementation and several more problems to address.


The heap inspector requires pretty deep knowledge of the internal memory layout of the V8 heap. How can this be maintained in a reasonable way as V8 itself evolves?

For C programs shipped with illumos, we use CTF at build-time to encode type definitions in the binary itself. CTF is compact enough to be shipped in production binaries, so MDB understands all in-memory structures for these programs on all illumos systems. Application-specific debugger commands can be implemented in terms of CTF so that they don’t have to be built specifically for a particular application binary.

This approach is tougher for C++ programs, which have additional information like class hierarchies that can’t be expressed with CTF. But while V8 uses C++ classes (and an inheritance hierarchy) to represent different heap object types, it doesn’t use standard C++ member fields. Instead, each class defines static constants describing the offsets of its member fields, and macros are used to define accessors in terms of these constants. This actually makes it easier to build and maintain a C debugger module because it’s much easier to export these constants as part of the build process than to reverse-engineer how a C++ structure gets laid out in memory. If this proves to be the right approach, then we could save these constants into the binary using CTF and then build the debugger module independent of the V8 binary.


Today, this module can dump a simple stack trace and provide basic inspection of heap objects. But there’s a lot more that would be useful:

  • global variables
  • arguments and local variables for each function on the stack
  • list of events pending on the event queue
  • list of function contexts to be invoked in the future (e.g., when pending I/O completes)

I’ve also only tested it on pretty simple examples so far. My examples have included JavaScript frames for named and anonymous functions, constructor frames, argument adaptor frames, and simple entry frames. I haven’t yet tried optimized frames, exit frames, or an exit/entry sequence.

This module has already helped us understand enough to write a basic DTrace ustack helper for Node programs, which allows similar stack traces to be gathered via DTrace while the program is running. In future blog posts I’ll talk more about the DTrace helper as well as what I’ve learned about V8, which I hope will be useful for others hacking around with it.