Tracing Node.js add-on latency

Node.js has some built-in DTrace probes for looking at HTTP request start/done and GC start/done, and I wrote nhttpsnoop to report latency based on these probes. There’s also a future project that will make it easy to trace any asynchronous operations. But in the meantime, there’s one category of latency that’s missing, which is latency of asynchronous operations kicked off by add-ons.

As an example, I’ll use node-sqlite3. Below, I have sqlite-demo.js, which is a version of the demo program that’s been modified to run in a loop where it opens the database, inserts some records, selects some records, and closes the database. As this program runs, the add-on kicks off async operations to open the database, run queries, and close the database. We’re going to trace those operations to see how long each type of operation takes.

Below that, I have trace-addon-latency.d, a D script that traces the libuv uv_queue_work function as well as all functions in all add-ons in the target Node process. This is a little heavy-handed for what we’re trying to do, and can have a noticeable impact on program performance, depending on the add-on. For many of the small add-ons I use, the impact is negligible.

By running the program and the D script for a few minutes, then stopping the D script, you get output that looks like this:

  node_sqlite3.node`node_sqlite3::Database::Work_BeginExec(node_sqlite3::Database::Baton*)
           value  ------------- Distribution ------------- count
            2048 |                                         0
            4096 |@@@@@@@@@@@@@@@@@                        6018
            8192 |@@@@@                                    1590
           16384 |@                                        302
           32768 |@@@@@@@@@@@@@@@@@                        5722
           65536 |                                         156
          131072 |                                         0

There’s a histogram like this for each “kind” of asynchronous operation that each add-on kicks off. By “kind”, I’m referring to the C++ function that kicks off the operation. In this case, Work_BeginExec kicked off 6018 operations that took between 4us and 8us.

This is a quick way to get a look at what async operations your Node program is doing (from add-ons) and how long they’re taking. This works on OS X and SmartOS, and should work on BSD and any other system with DTrace.

Full steps to reproduce

$ mkdir demo
$ cd demo
$ npm install sqlite3
$ curl -O https://gist.github.com/davepacheco/104e29e6aa2a5a3c1dca/raw/bbcd859c7469ed20b7126c56f3a5c45dc43986e1/sqlite-demo.js
$ curl -O https://gist.github.com/davepacheco/104e29e6aa2a5a3c1dca/raw/45c7e0da1b5aeb0cced2163e525e7caffb1a6ebe/trace-addon-latency.d
$ node sqlite-demo.js &
$ dtrace -s trace-addon-latency.d -p $!     # you may need "sudo" here, particularly on OS X

Let that run for a little while, then kill either one of the “node” or “dtrace” processes. (Killing the “node” one causes “dtrace” to exit, since it was attached to it.) That will cause “dtrace” to exit and print the results.

sqlite-demo.js

var sqlite3 = require('sqlite3').verbose();
go();

function go()
{
    var db = new sqlite3.Database(':memory:');
    db.serialize(function () {
            db.run("CREATE TABLE lorem (info TEXT)");
            var stmt = db.prepare("INSERT INTO lorem VALUES (?)");
            for (var i = 0; i < 10; i++) {
                    stmt.run("Ipsum " + i);
            }
            stmt.finalize();
            db.exec("SELECT rowid AS id, info FROM lorem", function () {
                    db.close();
                    go();
            });
    });
}

trace-addon-latency.d

#!/usr/sbin/dtrace -s

/*
 * trace-addon-latency.d: Run this script as:
 *
 *     dtrace -s trace-addon-latency.d -p PID
 *
 * to trace the latency of asynchronous operations kicked off by all add-ons in
 * the node process PID.  NOTE: this will instrument all native functions in all
 * add-ons, which can have a noticeable performance impact, depending on the
 * add-on.
 */

pid$target::uv_queue_work:entry
{
    callers[arg1] = ucaller;
    donefunc[arg1] = arg3;
    starttime[arg1] = timestamp;
}

pid$target:*.node::entry
/donefunc[arg0]/
{
    @[ufunc(callers[arg0])] = quantize(timestamp - starttime[arg0]);
    callers[arg0] = 0;
    starttime[arg0] = 0;
    donefunc[arg0] = 0;
}

trace-addon-latency.out

  node_sqlite3.node`node_sqlite3::Database::Work_BeginClose(node_sqlite3::Database::Baton*)
           value  ------------- Distribution ------------- count
            2048 |                                         0
            4096 |                                         167
            8192 |@@@@@@@@@@@@@@@@@@@@@@@@@                8529
           16384 |@@@@@@@                                  2326
           32768 |@@@@@@@@                                 2740
           65536 |                                         26
          131072 |                                         0

  node_sqlite3.node`node_sqlite3::Database::Work_BeginExec(node_sqlite3::Database::Baton*)
           value  ------------- Distribution ------------- count
            2048 |                                         0
            4096 |@@@@@@@@@@@@@@@@@                        6018
            8192 |@@@@@                                    1590
           16384 |@                                        302
           32768 |@@@@@@@@@@@@@@@@@                        5722
           65536 |                                         156
          131072 |                                         0

  node_sqlite3.node`node_sqlite3::Database::Work_BeginOpen(node_sqlite3::Database::Baton*)
           value  ------------- Distribution ------------- count
            2048 |                                         0
            4096 |@                                        286
            8192 |@@@@@@@@@@@@@@@@@@                       6064
           16384 |@@                                       642
           32768 |@@@@@@@@@@@@@@@@@@@                      6701
           65536 |                                         94
          131072 |                                         1
          262144 |                                         0

  node_sqlite3.node`node_sqlite3::Statement::Work_BeginPrepare(node_sqlite3::Database::Baton*)
           value  ------------- Distribution ------------- count
            2048 |                                         0
            4096 |@@@@@                                    3604
            8192 |@@@@@@@@@@@@                             8376
           16384 |@@                                       1341
           32768 |@@@@@@@@@@@@@@@@                         10831
           65536 |@@@@@                                    3423
          131072 |                                         2
          262144 |                                         0

  node_sqlite3.node`node_sqlite3::Statement::Work_BeginRun(node_sqlite3::Statement::Baton*)
           value  ------------- Distribution ------------- count
            2048 |                                         0
            4096 |@@@@@@@@@@@@@@@@@@@@@@@                  87884
            8192 |@@@@@                                    19478
           16384 |@@                                       8692
           32768 |@@@@@@@@                                 32160
           65536 |@                                        3455
          131072 |                                         0