Tracing Node.js add-on latency

Node.js has some built-in DTrace probes for looking at HTTP request start/done and GC start/done, and I wrote nhttpsnoop to report latency based on these probes. There’s also a future project that will make it easy to trace any asynchronous operation. But in the meantime, there’s one category of latency these don’t cover: asynchronous operations kicked off by add-ons.
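
If you want to see which built-in probes your Node binary provides (this requires a build with DTrace support, as on SmartOS or OS X), you can list them with dtrace. Treat this as a sketch: the probe names here are from memory and may differ across Node versions.

$ dtrace -ln 'node*:::'        # list probes registered by running Node processes
$ dtrace -qn 'node*:::http-server-request { @[pid] = count(); }'
                               # count incoming HTTP requests per process; CTRL-C to print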

As an example, I’ll use node-sqlite3. Below, I have sqlite-demo.js, which is a version of the demo program that’s been modified to run in a loop where it opens the database, inserts some records, selects some records, and closes the database. As this program runs, the add-on kicks off async operations to open the database, run queries, and close the database. We’re going to trace those operations to see how long each type of operation takes.

Below that, I have trace-addon-latency.d, a D script that traces the libuv uv_queue_work function as well as all functions in all add-ons in the target Node process. This is a little heavy-handed for what we’re trying to do, and can have a noticeable impact on program performance, depending on the add-on. For many of the small add-ons I use, the impact is negligible.

By running the program and the D script for a few minutes, then stopping the D script, you get output that looks like this:

  node_sqlite3.node`node_sqlite3::Database::Work_BeginExec(node_sqlite3::Database::Baton*)
           value  ------------- Distribution ------------- count
            2048 |                                         0
            4096 |@@@@@@@@@@@@@@@@@                        6018
            8192 |@@@@@                                    1590
           16384 |@                                        302
           32768 |@@@@@@@@@@@@@@@@@                        5722
           65536 |                                         156
          131072 |                                         0

There’s a histogram like this for each “kind” of asynchronous operation that each add-on kicks off. By “kind”, I’m referring to the C++ function that kicks off the operation. The “value” column is in nanoseconds, since the D script quantizes deltas of DTrace’s nanosecond timestamp variable. In this case, Work_BeginExec kicked off 6018 operations that took between 4us and 8us.

This is a quick way to get a look at what async operations your Node program is doing (from add-ons) and how long they’re taking. This works on OS X and SmartOS, and should work on BSD and any other system with DTrace.

Full steps to reproduce

$ mkdir demo
$ cd demo
$ npm install sqlite3
$ curl -O https://gist.github.com/davepacheco/104e29e6aa2a5a3c1dca/raw/bbcd859c7469ed20b7126c56f3a5c45dc43986e1/sqlite-demo.js
$ curl -O https://gist.github.com/davepacheco/104e29e6aa2a5a3c1dca/raw/45c7e0da1b5aeb0cced2163e525e7caffb1a6ebe/trace-addon-latency.d
$ node sqlite-demo.js &
$ dtrace -s trace-addon-latency.d -p $!     # you may need "sudo" here, particularly on OS X

Let that run for a little while, then kill either the “node” or the “dtrace” process. (Killing the “node” process also causes “dtrace” to exit, since it’s attached to it.) When “dtrace” exits, it prints the results.

sqlite-demo.js

var sqlite3 = require('sqlite3').verbose();
go();

/*
 * Open an in-memory database, insert some records, select them back, then
 * close the database and start over.
 */
function go()
{
    var db = new sqlite3.Database(':memory:');
    db.serialize(function () {
            db.run("CREATE TABLE lorem (info TEXT)");
            var stmt = db.prepare("INSERT INTO lorem VALUES (?)");
            for (var i = 0; i < 10; i++) {
                    stmt.run("Ipsum " + i);
            }
            stmt.finalize();
            db.exec("SELECT rowid AS id, info FROM lorem", function () {
                    db.close();
                    go();
            });
    });
}

trace-addon-latency.d

#!/usr/sbin/dtrace -s

/*
 * trace-addon-latency.d: Run this script as:
 *
 *     dtrace -s trace-addon-latency.d -p PID
 *
 * to trace the latency of asynchronous operations kicked off by all add-ons in
 * the node process PID.  NOTE: this will instrument all native functions in all
 * add-ons, which can have a noticeable performance impact, depending on the
 * add-on.
 */

/*
 * uv_queue_work(loop, req, work_cb, after_work_cb): key everything off the
 * uv_work_t pointer (arg1), recording the user-level function that called
 * uv_queue_work, the completion callback (arg3), and the start time.
 */
pid$target::uv_queue_work:entry
{
    callers[arg1] = ucaller;
    donefunc[arg1] = arg3;
    starttime[arg1] = timestamp;
}

/*
 * When any function in an add-on (*.node) is entered with one of the recorded
 * uv_work_t pointers as its first argument, record the elapsed time,
 * attributed to the function that kicked off the operation, and clear the
 * saved state.
 */
pid$target:*.node::entry
/donefunc[arg0]/
{
    @[ufunc(callers[arg0])] = quantize(timestamp - starttime[arg0]);
    callers[arg0] = 0;
    starttime[arg0] = 0;
    donefunc[arg0] = 0;
}

trace-addon-latency.out

  node_sqlite3.node`node_sqlite3::Database::Work_BeginClose(node_sqlite3::Database::Baton*)
           value  ------------- Distribution ------------- count
            2048 |                                         0
            4096 |                                         167
            8192 |@@@@@@@@@@@@@@@@@@@@@@@@@                8529
           16384 |@@@@@@@                                  2326
           32768 |@@@@@@@@                                 2740
           65536 |                                         26
          131072 |                                         0

  node_sqlite3.node`node_sqlite3::Database::Work_BeginExec(node_sqlite3::Database::Baton*)
           value  ------------- Distribution ------------- count
            2048 |                                         0
            4096 |@@@@@@@@@@@@@@@@@                        6018
            8192 |@@@@@                                    1590
           16384 |@                                        302
           32768 |@@@@@@@@@@@@@@@@@                        5722
           65536 |                                         156
          131072 |                                         0

  node_sqlite3.node`node_sqlite3::Database::Work_BeginOpen(node_sqlite3::Database::Baton*)
           value  ------------- Distribution ------------- count
            2048 |                                         0
            4096 |@                                        286
            8192 |@@@@@@@@@@@@@@@@@@                       6064
           16384 |@@                                       642
           32768 |@@@@@@@@@@@@@@@@@@@                      6701
           65536 |                                         94
          131072 |                                         1
          262144 |                                         0

  node_sqlite3.node`node_sqlite3::Statement::Work_BeginPrepare(node_sqlite3::Database::Baton*)
           value  ------------- Distribution ------------- count
            2048 |                                         0
            4096 |@@@@@                                    3604
            8192 |@@@@@@@@@@@@                             8376
           16384 |@@                                       1341
           32768 |@@@@@@@@@@@@@@@@                         10831
           65536 |@@@@@                                    3423
          131072 |                                         2
          262144 |                                         0

  node_sqlite3.node`node_sqlite3::Statement::Work_BeginRun(node_sqlite3::Statement::Baton*)
           value  ------------- Distribution ------------- count
            2048 |                                         0
            4096 |@@@@@@@@@@@@@@@@@@@@@@@                  87884
            8192 |@@@@@                                    19478
           16384 |@@                                       8692
           32768 |@@@@@@@@                                 32160
           65536 |@                                        3455
          131072 |                                         0

Stopping a broken program in its tracks

Last week I debugged a Node issue where, under some conditions, fork() failure could result in some file descriptors being closed, leading shortly to bedlam. In the best cases, the program would immediately crash, but often it would just do the wrong thing instead. That’s common, of course, and in many ways it’s worse than a crash, because you get neither a core dump nor even a stack trace to go on.

In my write-up of the bug, I glossed over a key step to debugging the problem:

"Using DTrace and reproducing on SmartOS, I stopped the
process at the moment close() returned EBADF, and caught
this stack trace ..."

I want to explain this technique in more detail because it’s very useful for getting a stack trace or core dump when a program starts going south, even if the program isn’t actually going to crash. (This technique isn’t specific to Node.js or SmartOS, either, though it’s easier with Node.js and on SmartOS.)

In this case, I was debugging a production problem that I reproduced in development, which made things much easier. I was free to kill the process to create a core dump without worrying about disrupting any users. Since the Node program was blowing up on a “close, EBADF” error, I used this DTrace one-liner to make the program dump core precisely when that happened (instead of barreling on with the wrong file descriptor):

# dtrace -w -n 'syscall::close:return
  /execname == "node" && arg1 == -1 && errno == EBADF/
  { trace(pid); raise(SIGABRT); exit(0); }'

Taking apart the script:

  • dtrace -w: The “-w” flag to “dtrace” enables destructive actions. Without “-w”, DTrace won’t let you do anything that can significantly change the behavior of running programs (like raising a signal, in this case).
  • syscall::close:return: We’re tracing returns from the “close” system call. We’re tracing the kernel, not a particular process, so this will catch all closes in all processes.
  • /execname == "node" && arg1 == -1 && errno == EBADF/: We’re only going to take action for processes called “node”, where the return value from “close” is -1, and the kernel errno is EBADF.
  • { trace(pid); raise(SIGABRT); exit(0); }: The operative piece here is raise(SIGABRT), which immediately sends SIGABRT to the current process. (The reason for trace(pid) is so that you know what process was killed, and the reason for exit(0) is to end the D script as soon as this happens. If you don’t exit and you get the predicate wrong, you can accidentally do a lot of damage!)
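
On that last point: if you want to be extra careful about the predicate, you can scope the whole thing to a single process by attaching dtrace with “-p” and matching on $target instead of the process name. This variant isn’t from the original write-up, but I’d expect it to behave the same way (PID is a placeholder for the process of interest):

# dtrace -w -n 'syscall::close:return
  /pid == $target && arg1 == -1 && errno == EBADF/
  { trace(pid); raise(SIGABRT); exit(0); }' -p PID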

The default behavior on receiving SIGABRT is to dump core immediately (see signal.h). It’s possible to alter this behavior, but very few programs do, so this technique almost always works.

Once I had a core file, I fired up MDB to get a JavaScript stack trace, and from there it was easy to figure out what had gone wrong. For details, see the Node issue.
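
For reference, the MDB side of that looks roughly like this on SmartOS with the mdb_v8 dmod installed (paths and dcmd availability depend on your setup):

# mdb /path/to/node/core     # open the core dump
> ::load v8                  # load the Node/V8 support module
> ::jsstack                  # print the JavaScript-level stack trace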

What about production?

I mentioned that in this case, I’d reproduced the problem in development, so I was free to kill the process. What if (as so often happens) the first time we saw this problem was in production, and we wanted to get a core dump at exactly the right time, but without causing the process to crash? That’s not actually much harder, at least on systems with a program like gcore(1):

# dtrace -w -n 'syscall::close:return
  /execname == "node" && arg1 == -1 && errno == EBADF/
  { trace(pid); stop(); system("gcore %d; prun %d", pid, pid); exit(0); }'

This one is very similar to the previous script, except that instead of raising SIGABRT, we use stop() to stop the process in its tracks, and asynchronously run gcore PID (to save the core dump) followed by prun PID (to start the process running again). This is still technically destructive, because we’re stopping the process for however long it takes to save the core dump, but in most cases this isn’t too long and the disruption is insignificant.

To recap: you can use DTrace for more than just data collection, including debugging non-fatal buggy program behavior. DTrace can instrument very specific events on the system (like “the close syscall in a Node process returned -1 with EBADF” or any application-level event), and you can use that to take action like stopping the process to take a core dump and then resuming it again. Putting these pieces together makes it possible to gather all kinds of debugging information from a program that’s behaving badly.

Node.js in production: runtime log snooping

This post also appeared on the Joyeur blog.


This post is one of several about how we operate Node.js in production at Joyent. Most of my experience comes from working on our Manta Storage Service, which currently comprises a few thousand Node processes across three datacenters.

Logging is one of the most primitive but most valuable forms of postmortem debugging. Logs let you figure out why a program crashed (or did the wrong thing) after it’s already done so. In this post I’ll describe a critical feature of the Bunyan logging library that we use daily to help debug failures in production: runtime log snooping.

The problem

In many production systems, there’s a nasty tradeoff: the more verbose your logs, the more information you have to debug issues, but the greater the overhead of constructing, recording, transmitting, storing, and analyzing them. Log frameworks typically manage this tradeoff with log levels: you use “debug” in development, which spews all kinds of information about what the program is doing, but you use “info” or some higher level in production, which emits only the “important” messages.

But it’s precisely in production that you really need those debug messages, even though they’re too expensive to emit all the time. It’s when you’ve been paged in the middle of the night because some service’s error count has exceeded some threshold. If you’re lucky, the failure is simple enough that the program can tell you exactly what happened, but more often, particularly in distributed systems where failure modes aren’t so crisp, the error will be somewhat vague. You’ll need more information about what the program was doing in order to figure out what went wrong.

In many of these cases, restarting the service will bring it back to life, but it will also eliminate any hope of root causing the problem. You can already see yourself getting paged again a few nights later for the same issue…

Bunyan

Trent Mick at Joyent wrote a Node module called Bunyan that we use in all of our services to log everything from free-form text intended for humans to programmatically consumable events that trip alarms or feed into our billing system. Bunyan logs are just newline-separated JSON records. A few fields have special meaning, like “time” and “msg”, but programs can log whatever other JSON structures they want for interpretation by humans or other programs. (In Manta, we use Bunyan log records for billing. These log entries include structured payloads that describe the billable operation, the user, the result, and so on.1) The “bunyan” command-line tool facilitates easy filtering, colorizing output, and so on, while programmatic consumers can parse each line as a JSON object.
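
To make that concrete, here’s roughly what logging both kinds of records looks like with Bunyan (the field names in the structured record are made up for illustration):

var bunyan = require('bunyan');
var log = bunyan.createLogger({ 'name': 'myservice', 'level': 'info' });

/* Free-form text intended for humans: */
log.info('server listening on port 8080');

/* A structured record intended for other programs (e.g., billing): */
log.info({ 'operation': 'putobject', 'account': 'dap', 'latencyMs': 12 },
    'request completed');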

There are lots of logging libraries out there, but Bunyan’s killer feature is the ability to snoop debug-level logs in production, without restarting (or otherwise impacting) the service. When an application calls log.debug("something went wrong"), and the log level is “info” (i.e., higher than “debug”), the bunyan library drops the message, as you’d expect. When you want to get at these debug messages, you fire up “bunyan -p *”, which is just sugar for:

$ dtrace -Z -x strsize=4k -x switchrate=10hz -q \
      -n 'bunyan*:::log-*{ printf("%s\n", copyinstr(arg0)); }' | bunyan

When you run this, you’re enabling DTrace probes built into Bunyan. Enabling these probes actually modifies your Node program’s behavior to fire the probe just before dropping the log message. Firing the probe traps into the kernel, where the log message is buffered and eventually funneled up to the “dtrace” process and printed out to your terminal.

This is a subtle point, but it’s very important: enabling these probes doesn’t cause the debug messages to appear in the regular log, but rather it causes these messages to be collected, buffered by the kernel, and then made available to your “dtrace” process. That means your service is still running at near full speed, not going through the code paths that write entries to the log. Importantly, if the volume of messages gets too high, DTrace will drop them rather than slow down the process. (You can tune the “switchrate” and buffer sizes to control that.)
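
For example, if you do see drops, you might rerun the underlying dtrace invocation with a higher switch rate and larger principal buffers. The particular values below are arbitrary examples:

$ dtrace -Z -x strsize=4k -x switchrate=20hz -x bufsize=16m -q \
      -n 'bunyan*:::log-*{ printf("%s\n", copyinstr(arg0)); }' | bunyan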

When you don’t need the extra logging any more, CTRL-C the “bunyan” (or “dtrace”) process. The program modifications are undone, and your program goes on dropping debug log entries.

Voilà! By using Bunyan as a logging library, you get the ability to walk up to a production service and enable debug logging without restarting it. This has proven critical in a number of cases where it was too expensive to log debug messages all the time, but we had a live service behaving badly, and restarting the service would likely have cleared the issue without giving us a chance to root-cause it.

Try it yourself

I ran these examples on OS X because it’s convenient, but all of our production services run on SmartOS (an illumos distribution). This facility works on any system that supports DTrace USDT.

If you’ve got Node installed already, it’s ridiculously easy to try this out. First, create a temporary directory and install bunyan locally:

dap@sharptooth $ npm install bunyan

then start a Node REPL, load up bunyan, and create a logger at level “info”:

dap@sharptooth $ node
> var r = require('bunyan')
undefined
> var l = new r({ 'name': 'test', 'level': 'info' })
undefined

At this point, you can log “info” messages, but “debug” messages will be hidden:

> l.info('test info message')
{"name":"test","hostname":"sharptooth.local","pid":1076,"level":30,"msg":"test info message","time":"2013-09-27T18:13:00.820Z","v":0}
undefined
> l.debug('something went wrong')
undefined

Now in another shell, use “bunyan -p”:

# In another shell:
dap@sharptooth $ sudo bunyan -p '*'   # sudo required for DTrace

That will block waiting for something to happen. Back in the first shell, log that debug message again:

> l.debug('something went wrong')
undefined

and over in the second shell, you should see something like this:

[2013-09-27T18:16:19.679Z] DEBUG: test/1076 on sharptooth.local: something went wrong

Success! Because we used bunyan -p '*', we’ll see log messages from all Node processes on the system. You can also specify a specific pid instead of ‘*’ to see only messages from that process.

Dynamic probes for your application

Bunyan’s dynamic probes are implemented using Chris Andrews’s node-dtrace-provider module, which makes it very easy to add custom probes to any application. (The whole Bunyan integration with DTrace is about 20 lines of code.)
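
Here’s roughly what defining and firing a custom probe looks like with node-dtrace-provider; this sketch follows the module’s documented usage, and the provider and probe names are arbitrary:

var d = require('dtrace-provider');

/* Declare a provider with one probe carrying two integer arguments. */
var dtp = d.createDTraceProvider('myapp');
dtp.addProbe('request-done', 'int', 'int');
dtp.enable();

/* The callback runs only while the probe is actually enabled by a dtrace
 * consumer, so disabled probes cost almost nothing. */
dtp.fire('request-done', function () {
        return [200, 12];       /* e.g., HTTP status and latency in ms */
});

You could then watch it with something like dtrace -n 'myapp*:::request-done { trace(arg0); trace(arg1); }'.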

Another great example that we also use heavily is node-restify, which creates DTrace probes for the beginning and end of each HTTP request, as well as each of the phases of request processing (auth, parsing, logging, and so on). You can enable these probes in production to get a fine-grained look at request latency, including what parts of each request are taking a long time.


In this post, I described Bunyan, the Node.js logging library we use at Joyent. I explained how we use DTrace probes built into Bunyan to extract more debug information from a production service than we can normally extract without disrupting the service itself. We focus on tools for production because the hardest issues often show up only in production deployments, and the production environment is far more constrained than development. By building tools designed for production use, we’re able to debug failures both in the field and in development, in many cases root-causing them the first time we see them.


1 Manta service logs are a good example of both the value of structured logs and Manta’s ability to process them. Manta service logs include both debug messages and more structured messages used to generate users’ usage reports and bills. These logs are uploaded periodically to Manta. To generate each user’s report, we run a map-reduce Manta job over these logs that parses each record and tallies up usage per user. Similarly, when we want to see how often some particular error message has been seen, we run a Manta job to grep over the logs (literally just “grep PATTERN”). For more on using Manta for log analysis, check out Wanelo’s post on business data analytics using Manta.