Debugging Node.js in Production (Fluent slides)

Thanks to all who attended my talk at Fluent today on “Debugging Node.js in Production”. The slides are available here. There’s some extra content there that I didn’t have time to cover in just 40 minutes, most notably some implementation notes about mdb_v8 and the DTrace ustack helper. Please leave questions, comments, or feedback below (or @dapsays)!

Posted on May 31, 2012 at 4:43 pm by dap · Permalink · 2 Comments
In: Node.js

Debugging RangeError from a core dump

Last week, I tweeted:

I had just run into this nasty Node.js error:

$ node foo.js 

timers.js:96
            if (!process.listeners('uncaughtException').length) throw e;
                                                                      ^
RangeError: Maximum call stack size exceeded

What went wrong? It was reasonably obvious from the error message that the program blew its stack, which I assumed was likely the result of some errant recursive function, which was surprising, because I didn’t know I was using any recursive functions. But given that the problem is too many function invocations on the stack, the obvious question is: what’s on the stack? But the RangeError exception doesn’t include a stack trace. Now what do I do?

I’d previously been playing around with the idea of having Node dump core via abort(3C) when an exception is thrown that is not caught. The idea is that with a core dump, we could use mdb_v8 to examine the stack at the time that the program crashed, including function arguments and any other heap objects we can find. This would be much richer than just the stack trace that most fatal failures leave behind.

I ran my program with this modified Node, it dumped core as expected, and I opened the core file. As expected, the stack was huge, but it was pretty clear what the pattern was:

> ::jsstack
...
80469c4 0x2bf66683 <anonymous> (as ReadStream._emitData) (3a84f56d)
80469dc 0x2bf66b7b <anonymous> (as ReadStream.resume) (3a84f5d9)
8046a14 0x2bf73766 emitNextRecord (3a889829)
8046a44 0x2bfaa133 gotRecord (3a8897f5)
8046a74 0x2bf74c18 handleLogLine (3a8898f9)
8046aa8 0x2bf7703f <anonymous> (as stream.on.leftover) (3a897151)
8046ae0 0x2bf2d2bc <anonymous> (as EventEmitter.emit) (3a83a579)
8046af8 0x2bf0db41 <ArgumentsAdaptorFrame>

8046b18 0x2bf66683 <anonymous> (as ReadStream._emitData) (3a84f56d)
8046b30 0x2bf66b7b <anonymous> (as ReadStream.resume) (3a84f5d9)
8046b68 0x2bf73766 emitNextRecord (3a889829)
8046b98 0x2bfaa133 gotRecord (3a8897f5)
8046920 0x2bf74c18 handleLogLine (3a8898f9)
8046bfc 0x2bf7703f <anonymous> (as stream.on.leftover) (3a897151)
8046c34 0x2bf2d2bc <anonymous> (as EventEmitter.emit) (3a83a579)
8046c4c 0x2bf0db41 <ArgumentsAdaptorFrame>

8046c6c 0x2bf66683 <anonymous> (as ReadStream._emitData) (3a84f56d)
...

Starting from the bottom, we see that Node is emitting a “data” event on a ReadStream object, which invokes my “leftover” callback, which calls a couple of my program’s internal functions, one of which calls resume() on the same read stream. Then we do it all over again.

We can inspect the arguments to see what data is being emitted. The program in question was the bunyan log reader, so the data turned out to be random log contents, but I was able to verify that multiple calls to my “data” callback were getting the exact same data. This took me to the resume() function in Node.js, where I found the root cause:

ReadStream.prototype.resume = function() {
  this.paused = false;

  if (this.buffer) {
    this._emitData(this.buffer);
    this.buffer = null;
  }

On resume, we emit any buffered data, and then remove it. But if emitting the data causes us to call resume() again (as it did here), then we emit the same data again and end up in this infinite loop until we run out of stack space and crash. I reported the issue on Friday, and koichik immediately fixed the bug. (Thanks!)

This turned out to be a very minor bug in Node core, but the consequences for my program were pretty serious: fatal failure, with almost no information left behind to debug it. If I hadn’t had this experimental V8 around, I probably would have resorted to commenting out half my code at a time, binary searching until I found the bad code. I could also have tried the debugger, but I’d still have to have some idea where to set breakpoints, which boils down to the same search problem. (Stepping through doesn’t work, since the problem is triggered by an asynchronous event.) This makes me wonder: how do people debug RangeErrors today?

While the V8 change to dump core on an uncaught exception is actually quite simple, I’ve been putting it off while I consider the best way to expose it without breaking other V8 users. After this experience, I’m thinking it’s worth trying to do sooner rather than later!

Posted on May 14, 2012 at 2:43 pm by dap · Permalink · 2 Comments
In: Joyent, Node.js

Profiling Node.js

(For returning readers, this is basically a “tl;dr” version of my previous post on Node.js performance. The post below also appears on the Node.js blog.)

It’s incredibly easy to visualize where your Node program spends its time using DTrace and node-stackvis (a Node port of Brendan Gregg’s FlameGraph tool):

  1. Run your Node.js program as usual.
  2. In another terminal, run:
    $ dtrace -n 'profile-97/execname == "node" && arg1/{
        @[jstack(100, 8000)] = count(); } tick-60s { exit(0); }' > stacks.out

    This will sample about 100 times per second for 60 seconds and emit results to stacks.out. Note that this will sample all running programs called “node”. If you want a specific process, replace execname == "node" with pid == 12345 (the process id).

  3. Use the “stackvis” tool to transform this directly into a flame graph. First, install it:
    $ npm install -g stackvis

    then use stackvis to convert the DTrace output to a flamegraph:

    $ stackvis dtrace flamegraph-svg < stacks.out > stacks.svg
  4. Open stacks.svg in your favorite browser.

You’ll be looking at something like this:

This is a visualization of all of the profiled call stacks. This example is from the “hello world” HTTP server on the Node.js home page under load. Start at the bottom, where you have “main”, which is present in most Node stacks because Node spends most on-CPU time in the main thread. Above each row, you have the functions called by the frame beneath it. As you move up, you’ll see actual JavaScript function names. The boxes in each row are not in chronological order, but their width indicates how much time was spent there. When you hover over each box, you can see exactly what percentage of time is spent in each function. This lets you see at a glance where your program spends its time.

That’s the summary. There are a few prerequisites:

There are a few other notes:

For more on the underlying pieces, see my previous post on Node.js profiling and Brendan’s post on Flame Graphs.

.

Posted on April 25, 2012 at 10:21 am by dap · Permalink · 4 Comments
In: DTrace, Joyent, Node.js, SmartOS · Tagged with: , , ,

Managing Node.js dependencies with shrinkwrap

This post also appears on the Node.js blog.


Photo by Luc Viatour (flickr)

Managing dependencies is a fundamental problem in building complex software. The terrific success of github and npm have made code reuse especially easy in the Node world, where packages don’t exist in isolation but rather as nodes in a large graph. The software is constantly changing (releasing new versions), and each package has its own constraints about what other packages it requires to run (dependencies). npm keeps track of these constraints, and authors express what kind of changes are compatible using semantic versioning, allowing authors to specify that their package will work with even future versions of its dependencies as long as the semantic versions are assigned properly.

This does mean that when you “npm install” a package with dependencies, there’s no guarantee that you’ll get the same set of code now that you would have gotten an hour ago, or that you would get if you were to run it again an hour later. You may get a bunch of bug fixes now that weren’t available an hour ago. This is great during development, where you want to keep up with changes upstream. It’s not necessarily what you want for deployment, though, where you want to validate whatever bits you’re actually shipping.

Put differently, it’s understood that all software changes incur some risk, and it’s critical to be able to manage this risk on your own terms. Taking that risk in development is good because by definition that’s when you’re incorporating and testing software changes. On the other hand, if you’re shipping production software, you probably don’t want to take this risk when cutting a release candidate (i.e. build time) or when you actually ship (i.e. deploy time) because you want to validate whatever you ship.

You can address a simple case of this problem by only depending on specific versions of packages, allowing no semver flexibility at all, but this falls apart when you depend on packages that don’t also adopt the same principle. Many of us at Joyent started wondering: can we generalize this approach?

Shrinkwrapping packages

That brings us to npm shrinkwrap[1]:

NAME
       npm-shrinkwrap -- Lock down dependency versions

SYNOPSIS
       npm shrinkwrap

DESCRIPTION
       This  command  locks down the versions of a package's dependencies so
       that you can control exactly which versions of each  dependency  will
       be used when your package is installed.

Let’s consider package A:

{
    "name": "A",
    "version": "0.1.0",
    "dependencies": {
        "B": "<0.1.0"
    }
}

package B:

{
    "name": "B",
    "version": "0.0.1",
    "dependencies": {
        "C": "<0.1.0"
    }
}

and package C:

{
    "name": "C,
    "version": "0.0.1"
}

If these are the only versions of A, B, and C available in the registry, then a normal “npm install A” will install:

A@0.1.0
└─┬ B@0.0.1
  └── C@0.0.1

Then if B@0.0.2 is published, then a fresh “npm install A” will install:

A@0.1.0
└─┬ B@0.0.2
  └── C@0.0.1

assuming the new version did not modify B’s dependencies. Of course, the new version of B could include a new version of C and any number of new dependencies. As we said before, if A’s author doesn’t want that, she could specify a dependency on B@0.0.1. But if A’s author and B’s author are not the same person, there’s no way for A’s author to say that she does not want to pull in newly published versions of C when B hasn’t changed at all.

In this case, A’s author can use

# npm shrinkwrap

This generates npm-shrinkwrap.json, which will look something like this:

{
    "name": "A",
    "dependencies": {
        "B": {
            "version": "0.0.1",
            "dependencies": {
                "C": {  "version": "0.1.0" }
            }
        }
    }
}

The shrinkwrap command has locked down the dependencies based on what’s currently installed in node_modules. When “npm install” installs a package with a npm-shrinkwrap.json file in the package root, the shrinkwrap file (rather than package.json files) completely drives the installation of that package and all of its dependencies (recursively). So now the author publishes A@0.1.0, and subsequent installs of this package will use B@0.0.1 and C@0.1.0, regardless the dependencies and versions listed in A’s, B’s, and C’s package.json files. If the authors of B and C publish new versions, they won’t be used to install A because the shrinkwrap refers to older versions. Even if you generate a new shrinkwrap, it will still reference the older versions, since “npm shrinkwrap” uses what’s installed locally rather than what’s available in the registry.

Using shrinkwrapped packages

Using a shrinkwrapped package is no different than using any other package: you can “npm install” it by hand, or add a dependency to your package.json file and “npm install” it.

Building shrinkwrapped packages

To shrinkwrap an existing package:

  1. Run “npm install” in the package root to install the current versions of all dependencies.
  2. Validate that the package works as expected with these versions.
  3. Run “npm shrinkwrap”, add npm-shrinkwrap.json to git, and publish your package.

To add or update a dependency in a shrinkwrapped package:

  1. Run “npm install” in the package root to install the current versions of all dependencies.
  2. Add or update dependencies. “npm install” each new or updated package individually and then update package.json.
  3. Validate that the package works as expected with the new dependencies.
  4. Run “npm shrinkwrap”, commit the new npm-shrinkwrap.json, and publish your package.

You can still use npm outdated(1) to view which dependencies have newer versions available.

For more details, check out the full docs on npm shrinkwrap, from which much of the above is taken.

Why not just check node_modules into git?

One previously proposed solution is to “npm install” your dependencies during development and commit the results into source control. Then you deploy your app from a specific git SHA knowing you’ve got exactly the same bits that you tested in development. This does address the problem, but it has its own issues: for one, binaries are tricky because you need to “npm install” them to get their sources, but this builds the [system-dependent] binary too. You can avoid checking in the binaries and use “npm rebuild” at build time, but we’ve had a lot of difficulty trying to do this.[2] At best, this is second-class treatment for binary modules, which are critical for many important types of Node applications.[3]

Besides the issues with binary modules, this approach just felt wrong to many of us. There’s a reason we don’t check binaries into source control, and it’s not just because they’re platform-dependent. (After all, we could build and check in binaries for all supported platforms and operating systems.) It’s because that approach is error-prone and redundant: error-prone because it introduces a new human failure mode where someone checks in a source change but doesn’t regenerate all the binaries, and redundant because the binaries can always be built from the sources alone. An important principle of software version control is that you don’t check in files derived directly from other files by a simple transformation.[4] Instead, you check in the original sources and automate the transformations via the build process.

Dependencies are just like binaries in this regard: they’re files derived from a simple transformation of something else that is (or could easily be) already available: the name and version of the dependency. Checking them in has all the same problems as checking in binaries: people could update package.json without updating the checked-in module (or vice versa). Besides that, adding new dependencies has to be done by hand, introducing more opportunities for error (checking in the wrong files, not checking in certain files, inadvertently changing files, and so on). Our feeling was: why check in this whole dependency tree (and create a mess for binary add-ons) when we could just check in the package name and version and have the build process do the rest?

Finally, the approach of checking in node_modules doesn’t really scale for us. We’ve got at least a dozen repos that will use restify, and it doesn’t make sense to check that in everywhere when we could instead just specify which version each one is using. There’s another principle at work here, which is separation of concerns: each repo specifies what it needs, while the build process figures out where to get it.

What if an author republishes an existing version of a package?

We’re not suggesting deploying a shrinkwrapped package directly and running “npm install” to install from shrinkwrap in production. We already have a build process to deal with binary modules and other automateable tasks. That’s where we do the “npm install”. We tar up the result and distribute the tarball. Since we test each build before shipping, we won’t deploy something we didn’t test.

It’s still possible to pick up newly published versions of existing packages at build time. We assume force publish is not that common in the first place, let alone force publish that breaks compatibility. If you’re worried about this, you can use git SHAs in the shrinkwrap or even consider maintaining a mirror of the part of the npm registry that you use and require human confirmation before mirroring unpublishes.

Final thoughts

Of course, the details of each use case matter a lot, and the world doesn’t have to pick just one solution. If you like checking in node_modules, you should keep doing that. We’ve chosen the shrinkwrap route because that works better for us.

It’s not exactly news that Joyent is heavy on Node. Node is the heart of our SmartDataCenter (SDC) product, whose public-facing web portal, public API, Cloud Analytics, provisioning, billing, heartbeating, and other services are all implemented in Node. That’s why it’s so important to us to have robust components (like logging and REST) and tools for understanding production failures post mortem, profile Node apps in production, and now managing Node dependencies. Again, we’re interested to hear feedback from others using these tools.


[1] Much of this section is taken directly from the “npm shrinkwrap” documentation.

[2] We’ve had a lot of trouble with checking in node_modules with binary dependencies. The first problem is figuring out exactly which files not to check in (*.o, *.node, *.dynlib, *.so, *.a, …). When Mark went to apply this to one of our internal services, the “npm rebuild” step blew away half of the dependency tree because it ran “make clean”, which in dependency ldapjs brings the repo to a clean slate by blowing away its dependencies. Later, a new (but highly experienced) engineer on our team was tasked with fixing a bug in our Node-based DHCP server. To fix the bug, we went with a new dependency. He tried checking in node_modules, which added 190,000 lines of code (to this repo that was previously a few hundred LOC). And despite doing everything he could think of to do this correctly and test it properly, the change broke the build because of the binary modules. So having tried this approach a few times now, it appears quite difficult to get right, and as I pointed out above, the lack of actual documentation and real world examples suggests others either aren’t using binary modules (which we know isn’t true) or haven’t had much better luck with this approach.

[3] Like a good Node-based distributed system, our architecture uses lots of small HTTP servers. Each of these serves a REST API using restify. restify uses the binary module node-dtrace-provider, which gives each of our services deep DTrace-based observability for free. So literally almost all of our components are or will soon be depending on a binary add-on. Additionally, the foundation of Cloud Analytics are a pair of binary modules that extract data from DTrace and kstat. So this isn’t a corner case for us, and we don’t believe we’re exceptional in this regard. The popular hiredis package for interfacing with redis from Node is also a binary module.

[4] Note that I said this is an important principle for software version control, not using git in general. People use git for lots of things where checking in binaries and other derived files is probably fine. Also, I’m not interested in proselytizing; if you want to do this for software version control too, go ahead. But don’t do it out of ignorance of existing successful software engineering practices.

Posted on February 27, 2012 at 11:54 am by dap · Permalink · Comments Closed
In: Joyent, Node.js · Tagged with: ,

Playing with Node/V8 postmortem debugging

“Post Mortem” by C. MacLaurin

Several weeks ago I posted about postmortem debugging for Node.js, a critical technique for understanding fatal software failure (and thereby keeping up software quality). Now that the underlying pieces are freely available[1], you can use the documentation below to start debugging your own Node programs. With these tools you can inspect the call stack and arbitrary JavaScript heap objects for both live programs and core dumps. Documentation is often dry reading, but I’ve worked to keep this document concise and relevant. You can read this start-to-finish in half an hour and you’ll learn a bunch about V8 internals by exploring your own node program. If you don’t care about this level of detail, just read my older post to get a sense of what this tool does. Shortly I’ll post an example of using this tool to debug a problem we weren’t able to debug with existing tools.

This tool works out of the box for Node.js on the latest SmartOS release. So if you’ve got a SmartOS live CD and any release of Node.js since 0.4.8, these tools will just work. If you’re running an older version of SmartOS, another illumos-based system, or a Joyent SmartMachine that hasn’t yet been upgraded to the latest SmartOS, you can just download the binaries that you need. The instructions below cover both of these cases.


Introduction

The V8 mdb dmod facilitates inspection of V8-level and JS-level state from MDB, the illumos modular debugger. If you’re not familiar with MDB, you may want to skim the MDB man page. The syntax may take some getting used to, but you can probably pick up enough to get started from the examples below.

The various dcmds and walkers are prefixed with “v8″ or “js” depending on which level they operate on.

The JavaScript-level dcmds are:

  walk jsframe              - walk V8 JavaScript stack frames
  dcmd jsframe              - summarize a JavaScript stack frame
  dcmd jsstack              - print a JavaScript stacktrace
  dcmd jsprint              - print a JavaScript object

The V8-level dcmds are:

  dcmd v8array              - print elements of a V8 FixedArray
  dcmd v8classes            - list known V8 heap object C++ classes
  dcmd v8load               - load canned config for a specific V8 version
  dcmd v8print              - print a V8 heap object
  dcmd v8str                - print the contents of a V8 string
  dcmd v8type               - print the type of a V8 heap object
  dcmd v8types              - list known V8 heap object types

Getting started

Step 1: Run MDB

First, run your program. Then attach with MDB (using mdb -p <pid>) or take a core dump with gcore(1M) and run MDB on the core file.

Step 2: Load the dmod

The dmod is not loaded by default. You must explicitly load it with:

> ::load v8

If this fails, you’re running an older version of SmartOS and you’ll need to download the v8.so binary and use ::load /path/to/v8.so. (You have to use ::load ./v8.so even if the file is in the same directory so that MDB knows you’re giving it a relative path; by default, MDB searches the dynamic library path, which probably doesn’t include “.”.)

Step 3: Configure the dmod

The dmod requires knowledge of internal V8 structures that change between versions of V8. Some builds of V8 include the required information directly in the binary, in which case the dmod will auto-configure itself and you’ll see this when you initially load the dmod:

> ::load v8
Loaded V8 support.

If you don’t see that, your version of Node doesn’t have support built-in, but you can load a pre-canned configuration using ::v8load, which supports arguments for “node-0.4″ (for v0.4.8 and later) and “node-0.6.5″ (which works on v0.6.5 and v0.6.6, and possibly later versions).

Summary

To summarize how you’d start debugging a core file from node 0.6.5:

$ mdb core.node.jpc
Loading modules: [ libc.so.1 ld.so.1 ]
> ::load v8
> ::v8load node-0.6.5
V8 dmod configured based on node-0.6.5

or, if you downloaded the binaries separately above:

$ ./mdb core.node.jpc
Loading modules: [ libc.so.1 ld.so.1 ]
> ::load ./v8.so
> ::v8load node-0.6.5
V8 dmod configured based on node-0.6.5

Inspecting JavaScript-level state

::walk jsframe

Iterates stack frames and prints out the frame pointers. Starts with %ebp, and then follows frame pointers until it finds something that doesn’t look like a V8 JavaScript stack frame. This is very much like MDB’s $C macro but only prints the frame pointers. To get more details, pipe this to ::jsframe.

Note: libv8 is typically compiled with -fomit-frame-pointer, so if mdb catches “node” while it’s somewhere inside of V8, you may not be able to get a stacktrace at all.

::jsframe [-v]

Given a frame pointer, prints out a summary of the stack frame. With no arguments, prints a one-line summary for use in a stacktrace that includes the frame pointer, the name of the function (which may be inferred), the script where it was defined, and either the line number or character position in the file where the function was defined. The line number information will only be available if V8 has computed that information, which it does on a per-script basis as needed.

Example 1: anonymous function, invoked with a name, with line number information present:

> 8046df0::jsframe
8046df0 <anonymous> (as EventEmitter.emit) at events.js line 38

Example 2: named function with no line number information:

> 8046af8::jsframe
8046af8 caAggrValueHeatmapImage at
/opt/local/lib/node/.npm/cabase/1.0.3vrelease-20110901-20110901-3-gc693a90-dirty/package/lib/ca/ca-agg.js position 48960

Example 3: anonymous function invoked with no name and with no line number information:

> 8047500::jsframe
8047500 <anonymous> (as <anon>) at /home/dap/node-stack/examples/loop-anon.js line 15

With -v, prints out a best guess at function arguments. This uses the “length” property of the Function object to determine how many arguments the function consumes and then prints them from the stack. For example:

> 8046a9c::jsframe -v
8046a9c <anonymous> (as exports.bucketize)
    func: fc435fcd
    file:
/opt/local/lib/node/.npm/cabase/1.0.3vrelease-20110901-20110901-3-gc693a90-dirty/package/node_modules/.npm/heatmap/0.0.1/package/lib/heatmap.js
    posn: position 7838
    arg1: fc070709 (JSArray)
    arg2: fc070719 (JSObject)

You can use ::jsprint to print these objects.

Note: there are several types of non-JavaScript frames that ::jsframe recognizes:

::jsstack [-v]

Equivalent to ::walk jsframe | ::jsframe [-v]. This iterates the known JS stack frames and prints out the summary or detailed view.

::jsprint [-a] [-d depth]

Given a JavaScript object of class Object, Array, Function, Number, Boolean, or special value (like null or undefined), print it out. Except for functions, the result should closely resemble the JavaScript syntax you would use to define the object inline. For example:

> fc070719::jsprint
{
    base: 1320886447,
    weighbyrange: false,
    height: 281,
    width: 624,
    nbuckets: 50,
    max: 11538462,
    nsamples: 30,
    hue: [ 21 ],
    linear: false,
    min: 11538462,
}

Note: Not all types of objects are supported, but we haven’t yet encountered those that aren’t, and the dmod should behave sanely if they are encountered.

With -a, prints the address of the V8 heap object representing each JavaScript object so you can inspect it individually. For example:

> fc070719::jsprint -a
fc070719: {
    base: fbf4ec4d: 1320886447,
    weighbyrange: fe88e0ad: false,
    height: 232: 281,
    width: 4e0: 624,
    nbuckets: 64: 50,
    max: 160203c: 11538462,
    nsamples: 3c: 30,
    hue: fc043179: [ 2a: 21 ],
    linear: fe88e0ad: false,
    min: 160203c: 11538462,
}

Since many objects are circular (or just large), the -d option is provided to control how deep to recurse. The default recursion depth is 2.

Note: You may see expected special values like “undefined”, “null”, “true”, and “false”. There’s an additional special value called “hole” which represents holes in an array. For example:

> fc070709::jsprint
[
    [],    [],
...
    hole,
    hole,
    hole,
]

indicates that the array’s length has somehow been extended beyond the last actual element so the remaining elements are holes.

Inspecting V8-level state

Since not all information is contained directly in JavaScript state, it may be useful to inspect V8-level heap objects directly. To make sense of this, you must understand a few basics about the V8 heap. The details contained here are obviously subject to change at any time as V8 itself evolves.

Pointers in the heap are word-aligned, but the low-order bits are used to encode some information about the item. Specifically, most heap objects belong to one of two classes: SMI (small integer) values and HeapObjects.

Heap Object type information

Various C++ classes are used to represent objects of the heap, including JSFunction, JSObject, JSArray, Oddball (for undefined, null, true, and false), HeapNumber (for non-SMI numbers), and various types of String. You can list all of the ones that the dmod knows about with ::v8classes.

There’s a related but separate set of type values (a C++ enum): FUNCTION_TYPE, JSOBJECT_TYPE, JSARRAY_TYPE, ODDBALL_TYPE, etc. For strings, the type field also encodes whether the string is encoded in ASCII or two-byte encoding and whether it’s represented as a sequence of bytes or a ConsString. You can list the type values that the dmod knows about with ::v8types.

All of the various heap object classes are subclasses of HeapObject. HeapObject defines a field called “map” (of class Map, itself a HeapObject) which describes the layout of the object in detail. Most importantly, the Map includes the type of the object.

Using the type information in a HeapObject’s map, and a separate built-in mapping of type values to C++ classes, the dmod can take an arbitrary pointer and tell you what C++ class it must be. This is what the ::v8type dcmd does:

> fc070719::v8type
0xfc070719: JSObject

> fbf4ec4d::v8type
0xfbf4ec4d: HeapNumber

> fc043179::v8type
0xfc043179: JSArray

> fe88e0ad::v8type
0xfe88e0ad: Oddball: "false"

> 160203c::v8type
0x160203c: SMI: value = 11538462

(The above examples are taken from the JSObject printed with ::jsprint -a in the example above.)

The dmod also knows the layout of these C++ classes. You can use ::v8print to print out an object as an instance of its C++ class (which is automatically inferred as described above):

> fc070719::v8print
fc070719 JSObject {
    fc070719 HeapObject < Object < MaybeObject  {
        fc070718 map = fe85e7e1 (Map)
    }
    fc07071c properties = fc07b9c9 (FixedArray)
    fc070720 elements = fe88e021 (FixedArray)
}

> fc043179::v8print
fc043179 JSArray {
    fc043179 JSObject {
        fc043179 HeapObject < Object < MaybeObject  {
            fc043178 map = fe84caa1 (Map)
        }
        fc04317c properties = fe88e021 (FixedArray)
        fc043180 elements = fc0480f1 (FixedArray)
    }
    fc043184 length = 2 (SMI: value = 1)
}

A few notes about this output:

Note: Some fields may be missing because the dmod does not know about every field in every class.

Type-specific dcmds

There are two additional dcmds which are useful for printing out instances of certain specific V8 classes. You can always use ::v8print, but these commands give additional information.

::v8str [-v]

Prints the contents of any ASCII V8 string, regardless of representation. There are basically three representations:

With -v, this dcmd shows what it did. For example:

> fc3e2081::v8str
exports.bucketize

> fc3e2081::v8str -v
ConsString
    ptr1: fe8a7ba1
    ptr2: fc417a75
    SeqAsciiString
        length: 7, will read: 7
        value: "exports"
    SeqAsciiString
        length: 10, will read: 10
        value: ".bucketize"
exports.bucketize

In this case, the original pointer is a ConsString. Each pointer in this top-level ConsString is itself a SeqString. In general, these pointers may themselves be ConsStrings that must be followed recursively.

::v8array

Prints the contents of a V8 array. This is mostly useful for exploring and debugging the dmod. For example, here’s how you can get the contents of a JSArray:

> fc043179::v8print
fc043179 JSArray {
    fc043179 JSObject {
        fc043179 HeapObject < Object < MaybeObject  {
            fc043178 map = fe84caa1 (Map)
        }
        fc04317c properties = fe88e021 (FixedArray)
        fc043180 elements = fc0480f1 (FixedArray)
    }
    fc043184 length = 2 (SMI: value = 1)
}

We see that this is a JSArray with length 1 (since “length” is stored as an SMI with value 2). We can use ::v8array to print out the elements of the “elements” FixedArray:

> fc0480f1::v8array
2a

and then we can print out the one element’s value:

> 2a::v8print
SMI: value = 21

We can check our work with ::jsprint, which does all this automatically:

> fc043179::jsprint
[ 21 ]

Putting it all together

To reinforce all of this, we’ll get the details about a particular stack frame manually. First, walk the stack:

> ::walk jsframe
8046a9c
8046af8
...

We’ll take the top frame. Use ::jsframe -v to get a pointer to the frame’s “func” object:

> 8046a9c::jsframe -v
8046a9c <anonymous> (as exports.bucketize)
    func: fc435fcd
    file: /opt/local/lib/node/.npm/cabase/1.0.3vrelease-20110901-20110901-3-gc693a90-dirty/package/node_modules/.npm/heatmap/0.0.1/package/lib/heatmap.js
    posn: position 7838
    arg1: fc070709 (JSArray)
    arg2: fc070719 (JSObject)

We’ve cheated because we have the function and file names already, but we’ll show how to get this from just the function pointer, which is a fixed offset of -0×8 from the frame pointer. We only used ::jsframe to get the function pointer.

Print out the function:

> fc435fcd::v8print
fc435fcd JSFunction {
    fc435fcd JSObject {
        fc435fcd HeapObject < Object < MaybeObject  {
            fc435fcc map = fe84d8a1 (Map)
        }
        fc435fd0 properties = fe88e021 (FixedArray)
        fc435fd4 elements = fe88e021 (FixedArray)
    }
    fc435fdc prototype_or_initial_map = fe88e0cd (Oddball: "hole")
    fc435fe0 shared = fc4348c1 (SharedFunctionInfo)
    fc435fe8 literals = fc436021 (FixedArray)
    fc435fec next_function_link = fe88e05d (Oddball: "undefined")
}

The interesting details are all in the associated SharedFunctionInfo:

> fc4348c1::v8print
fc4348c1 SharedFunctionInfo {
    fc4348c1 HeapObject < Object < MaybeObject  {
        fc4348c0 map = fe84c681 (Map)
    }
    fc4348c4 name = fe88e2f1 (SeqAsciiString: "")
    fc4348d0 construct_stub = fe86ea21 (Code)
    fc4348d4 instance_class_name = fe88e199 (SeqAsciiString: "Object")
    fc4348d8 function_data = fe88e05d (Oddball: "undefined")
    fc4348dc script = fc434581 (Script)
    fc4348e0 debug_info = fe88e05d (Oddball: "undefined")
    fc4348e4 inferred_name = fc3e2081 (ConsString: "exports.bucketize")
    fc4348e8 initial_map = fe88e05d (Oddball: "undefined")
    fc4348ec this_property_assignments = fe88e021 (FixedArray)
    fc4348f4 length = 4 (SMI: value = 2)
    fc4348f8 formal_parameter_count = 4 (SMI: value = 2)
    fc4348fc expected_nof_properties = 14 (SMI: value = 10)
    fc434900 num_literals = 8 (SMI: value = 4)
    fc434904 start_position_and_type = 7ac0 (SMI: value = 15712)
    fc434908 end_position = 3968 (SMI: value = 7348)
    fc43490c function_token_position = 1e9e (SMI: value = 3919)
    fc434910 compiler_hints = 0 (SMI: value = 0)
    fc434914 this_property_assignments_count = 0 (SMI: value = 0)
    fc434918 opt_count = 0 (SMI: value = 0)
}

We can already see that the name is “”, indicating an anonymous function, and the inferred name is “exports.bucketize”. If ::v8print hadn’t printed the function name, we could get it with ::v8str:

> fc3e2081::v8str
exports.bucketize

To get the script name, follow the “script” field:

> fc434581::v8print
fc434581 Script {
    fc434581 Struct {
        fc434581 HeapObject < Object < MaybeObject  {
            fc434580 map = fe84c581 (Map)
        }
    }
    fc434584 source = fc130021 (
    SeqAsciiString: "(function (exports, require, module, __filename, __dirname) { /*
     * Copyright 2010 Joyent, Inc.  All rights reserved.
     * Use is subject to license terms.
     */

    var assert = require('assert');
    var libpng = require('png');
    var sys = require()
    fc434588 name = fc4120bd (
    SeqAsciiString: "/opt/local/lib/node/.npm/cabase/1.0.3vrelease-20110901-20110901-3-gc693a90-dirty/package/node_modules/.npm/heatmap/0.0.1/package/lib/heatmap.js")
    fc43458c line_offset = 0 (SMI: value = 0)
    fc434590 column_offset = 0 (SMI: value = 0)
    fc434594 data = fe88e05d (Oddball: "undefined")
    fc434598 context_data = fe88e05d (Oddball: "undefined")
    fc43459c wrapper = fc400bf1 (<unknown>)
    fc4345a0 type = 4 (SMI: value = 2)
    fc4345a4 compilation_type = 0 (SMI: value = 0)
    fc4345a8 line_ends = fe88e05d (Oddball: "undefined")
    fc4345ac id = 96 (SMI: value = 75)
    fc4345b0 eval_from_shared = fe88e05d (Oddball: "undefined")
    fc4345b4 eval_from_instructions_offset = 0 (SMI: value = 0)
}

See the “name” field. (Interestingly, the “Script” object includes as a member the source code of the script itself, but we only print out the first 255 characters or so.)

You can now see how the JavaScript-level dcmds work based on the lower-level V8 facilities.

Finally, we can look back at the ::jsframe output to get the argument pointers and then print out the corresponding values. We saw that arg2 was “fc070719″, a JSObject, and we can print it out with ::jsprint:

> fc070719::jsprint
{
    base: 1320886447,
    weighbyrange: false,
    height: 281,
    width: 624,
    nbuckets: 50,
    max: 11538462,
    nsamples: 30,
    hue: [ 21 ],
    linear: false,
    min: 11538462,
}

The way the dmod extracts the object’s properties is surprisingly complex and beyond the scope of this document.


I hope this documentation is helpful for people looking to debug Node programs. Expect more information and examples in the coming weeks.

[1] For the curious, the postmortem support for V8 is contained in two illumos commits (1, 2) plus a V8 patch that has not yet landed. The illumos commits include support for the versions of V8 from Node.js 0.4.8 through 0.6.6 (and possibly later) so that you don’t actually need the V8 patch yet.

Posted on January 13, 2012 at 11:42 am by dap · Permalink · Comments Closed
In: Joyent, Node.js, SmartOS · Tagged with: , ,