New metrics on no.de

You may have noticed that the no.de service got a big facelift last week. The new version of the software has a lot of new features, among them some pretty substantial improvements to Analytics. Recall that we already had metrics for Node.js HTTP server and client operations, garbage collection, socket ops, filesystem ops, and CPU executions. We’ve now got a slew more.

First we have system calls. Whether a program is writing to disk, using the network, or talking to other processes on the same system, it’s making system calls. This metric shows you who’s making which syscalls and how long they’re taking (as a heatmap).

Then we have basic resource usage: CPU, memory, and network:

CPU: aggregated CPU usage shows what percent of total CPU your instance is using. This number’s a bit tricky to interpret: our compute nodes have more than 1 CPU, so your app can be using more than 100% of CPU within a given second. But your app can be compute-bound even if it’s only at 100% if you only have 1 main thread (as Node.js does) and it’s saturating 1 CPU.

CPU: aggregated wait time measures how much time threads in your instance spend ready to run but waiting for a CPU. Some amount of friction is expected, but if this number gets high it likely means the system is under unusually high load.
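To make the arithmetic behind the aggregated CPU usage number concrete, here's a minimal sketch. The helper name, the CPU and thread counts, and the 95% cutoff are all assumptions for illustration; the metric itself only reports the percentage.

```javascript
// Hypothetical helper: interpret an aggregated CPU usage percentage.
// `ncpus` and `nthreads` are supplied by the caller (assumptions); they are
// not reported by the metric itself.
function interpretCpuUsage(percent, ncpus, nthreads) {
    // A single thread can keep at most one CPU busy, i.e. 100%.
    var maxPercent = 100 * Math.min(ncpus, nthreads);

    return {
        maxPercent: maxPercent,
        fractionOfMax: percent / maxPercent,
        // 95% of the ceiling is an arbitrary cutoff chosen for this sketch.
        computeBound: percent >= 0.95 * maxPercent
    };
}

// One main thread (as in Node.js) at 98%: effectively saturating its CPU.
console.log(interpretCpuUsage(98, 16, 1).computeBound);   // true

// Four busy threads at 98% total: plenty of headroom left.
console.log(interpretCpuUsage(98, 16, 4).computeBound);   // false
```

The point is just that the ceiling depends on how many threads can actually run, not on how many CPUs the compute node has.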

Memory: resident set size shows how much DRAM your instance is using. There’s a separate metric for maximum resident set size which is generally constant and shows how much memory your instance is allowed to use. When your instance exceeds its max RSS, you will see non-zero excess memory reclaimed, indicating that the system is paging out some of your instance’s memory. Your app won’t experience performance problems from this unless you also see non-zero pages paged in. When you see these, your app will be slow because it has to wait for memory to be brought in from disk.
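The way these memory metrics fit together can be sketched as a small decision function. The sample object and its field names are hypothetical (they are not the names Analytics uses); only the logic follows the description above.

```javascript
// Hypothetical sample shape: these field names are made up for this sketch.
//   excessReclaimed: amount of excess memory reclaimed in the interval
//   pagesPagedIn:    number of pages paged in during the interval
function memoryStatus(sample) {
    // Page-ins mean the app is waiting for memory to come back from disk.
    if (sample.pagesPagedIn > 0) {
        return 'paging-in';
    }

    // Reclaimed excess memory means the instance went over its max RSS and
    // the system is paging some of it out, which by itself isn't slow.
    if (sample.excessReclaimed > 0) {
        return 'paging-out';
    }

    return 'ok';
}

console.log(memoryStatus({ excessReclaimed: 0, pagesPagedIn: 0 }));     // ok
console.log(memoryStatus({ excessReclaimed: 4096, pagesPagedIn: 0 }));  // paging-out
console.log(memoryStatus({ excessReclaimed: 4096, pagesPagedIn: 12 })); // paging-in
```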

Network: bytes and packets sent/received are pretty self-explanatory. These measure network throughput.

For reference, there are a number of more arcane (but sometimes very useful) new metrics as well:

  • Memory: (maximum) virtual memory reserved: shows how much anonymous memory your instance is using (or allowed to use)
  • Filesystem: logical read/write operations and logical bytes read/written: like the existing filesystem logical operations metrics, but can be much lighter weight.
  • ZFS: disk space used, unused, and quota.

Most importantly, we now support predicating and changing decompositions.  So while looking at system calls decomposed by application name, right-click on a particular application you’re interested in and you can select only the system calls from that application and then decompose by something else, like system call name.  This is an incredibly powerful feature for iterating on a performance investigation, but the details will have to wait for another post.
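Until that post, here's a rough sketch of the idea in plain JavaScript. This is not how Analytics is implemented; the record shape, the field names (execname, syscall, count) and the numbers are invented purely to show what "predicate, then decompose by something else" means.

```javascript
// Invented sample data: one record per (application, syscall) observation.
var samples = [
    { execname: 'node',  syscall: 'read',   count: 120 },
    { execname: 'node',  syscall: 'write',  count: 80 },
    { execname: 'nginx', syscall: 'read',   count: 40 },
    { execname: 'node',  syscall: 'read',   count: 30 },
    { execname: 'nginx', syscall: 'accept', count: 10 }
];

// Predicate: keep only the records whose field matches the given value.
function predicate(records, field, value) {
    return records.filter(function (rec) {
        return rec[field] === value;
    });
}

// Decompose: total up the counts for each distinct value of a field.
function decompose(records, field) {
    var totals = {};
    records.forEach(function (rec) {
        totals[rec[field]] = (totals[rec[field]] || 0) + rec.count;
    });
    return totals;
}

// Step 1: system calls decomposed by application name.
console.log(decompose(samples, 'execname'));
// { node: 230, nginx: 50 }

// Step 2: select only "node", then decompose by system call name instead.
console.log(decompose(predicate(samples, 'execname', 'node'), 'syscall'));
// { read: 150, write: 80 }
```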

Thanks to everyone at Joyent for the hard work on the new no.de standup. We hope you all find the new metrics useful and we look forward to getting your feedback!

JavaScript Lint on SmartOS


Photo by Sam Fraser-Smith

Back at Fishworks, we used a tool called JavaScript Lint (JSL) for static code analysis. You may know that lint was originally written to identify potential semantic problems in C code, like use of uninitialized variables or blocks of unreachable code. Lint warnings are usually static checks that could reasonably have been compiler warnings, but for whatever reasons those checks didn’t make it into the compiler.

JSL helps us catch similar errors in JavaScript: undeclared variables, variables hiding other variables in the same scope, etc. There are several JavaScript linters out there, including Crockford’s JSLint and Google’s Closure Linter, but two properties set JSL apart:

  • It does not conflate style with lint. Style refers to arbitrary code formatting rules (like leading whitespace rules). Lint refers to actual program correctness issues (like missing “break” statements inside a switch). The line is certainly fuzzy, as in the case of JavaScript semicolon style, but that’s why:
  • It’s highly configurable. Each individual warning can be turned on or off, and warnings can be overridden for individual lines of code. This is essential for cases where potentially dangerous behavior is being deliberately used (carefully, of course).
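To illustrate both points, here's a small made-up example (the function names and values are hypothetical). The first function has a genuine lint problem: a missing "break" that changes the result. The second falls through on purpose and says so with a control comment. As far as I know, /*jsl:fallthru*/ is the comment JSL recognizes for this, but check the documentation for your version before relying on it.

```javascript
// Lint, not style: the 'warn' case is missing its break, so it falls
// through into 'error' and the function returns the wrong value.
function severityBuggy(level) {
    var n = 0;
    switch (level) {
    case 'warn':
        n = 1;
    case 'error':
        n = 2;
        break;
    default:
        break;
    }
    return n;
}

// Deliberate fall-through: an error does everything a warning does, plus
// one more thing. The control comment marks this as intentional.
function actionsFor(level) {
    var actions = [];
    switch (level) {
    case 'error':
        actions.push('page-oncall');
        /*jsl:fallthru*/
    case 'warn':
        actions.push('log');
        break;
    default:
        break;
    }
    return actions;
}

console.log(severityBuggy('warn'));   // 2 (the bug: 1 was intended)
console.log(actionsFor('error'));     // [ 'page-oncall', 'log' ]
console.log(actionsFor('warn'));      // [ 'log' ]
```

Both functions are valid JavaScript and differ only in intent, which is why being able to override a warning on an individual line matters.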

We’ve been using JSL on Cloud Analytics since day 1, but until recently we were using a hacked-up build I created back in November just to make forward progress. As we started using it more, it became clear that we needed to be able to build JSL reliably, which was not trivial on SmartOS because the old version of SpiderMonkey that JSL bundles doesn’t build on Solaris 10 or later out of the box. I worked out how to build it (see below), but in doing so I decided it wasn’t worth maintaining the complexity of the existing build system. JSL hasn’t changed much in many months, so I created a github fork of JSL where I removed everything that clearly wasn’t necessary for JSL and replaced the whole build system with a couple of Makefiles. The result is much less portable, but it builds on SmartOS and I expect it can be made to build on Mac OS X (note: see update below) and Linux with few modifications. If you want to build the existing JavaScript Lint subversion tree on Illumos, here’s what you have to do:

  • Since Sun Studio is no longer available, you’ll want to build with gcc.
  • Remove the file spidermonkey/src/lock_SunOS.s.  According to the comments, this file is only needed with Sun Studio, and it won’t build with the GNU assembler.  If you don’t remove the file, it will be picked up by a Makefile wildcard (not the line in Makefile.ref that appears to refer to it!).
  • Copy the file below into spidermonkey/src/config/SunOS5.11_i86pc.mk.
  • In the root, run “python setup.py build”.
  • This should work except for the very last step in the build.  The build system runs something like this:
    gcc -shared build/temp.solaris-2.11-i86pc-2.4/javascriptlint/pyspidermonkey/pyspidermonkey.o build/temp.solaris-2.11-i86pc-2.4/javascriptlint/pyspidermonkey/nodepos.o -Lbuild/spidermonkey -ljs -o build/lib.solaris-2.11-i86pc-2.4/javascriptlint/pyspidermonkey.so

    On my system, this produces hundreds of linker errors because gcc is trying to use the GNU linker instead of the OS linker. If you replace the “gcc” in that line with “ld” and run it by hand, it should work.

Here are the contents of SunOS5.11_i86pc.mk:

#
# Config stuff for SunOS5.11
#

AS = as
CC = gcc
CCC = g++
CFLAGS +=  -Wall -Wno-format
RANLIB = echo
OS_CFLAGS = -DXP_UNIX -DSVR4 -DSYSV -DSOLARIS -DHAVE_LOCALTIME_R
OS_LIBS = -lsocket -lnsl -ldl
HAVE_PURIFY = 1
MKSHLIB = $(LD) -G
# Use the editline library to provide line-editing support.
JS_EDITLINE = 1

Of course, these instructions are very specific to the build environment, so YMMV. It took me a while to figure out the right settings in SunOS5.11_i86pc.mk, so I wanted to make that available in case anyone else is trying to build JSL on SmartOS, Illumos or other Solaris-based systems.  That said, if you don’t care about remaining close to the original source, you may as well just use my fork on github. Even if it doesn’t build out of the box in your environment, the Makefiles should be far easier to understand and modify.

Update: The github fork now builds on Mac OS X, too, though you need to install Python first because the one shipped with OS X doesn’t include headers.

Distributed Web Architectures @ SF Node.js Meetup

At the Node Meetup here at Joyent’s offices a few weeks ago I gave a brief talk about Cloud Analytics as a distributed web architecture. Matt Ranney of Voxer and Curtis Chambers of Uber also spoke about their companies’ web architectures. Thanks to Jackson for putting the videos together. All around it was a great event!