Modifying USDT providers with translated arguments

I originally wrote this post in 2012 but didn’t get around to publishing it until now. It’s a pretty technical post about an arcane bug in an uncommonly modified component. If you’re not already pretty familiar with translators for USDT providers, this post is probably not useful for you.

Several years ago, a colleague here at Joyent was trying to use the Node HTTP server probes and saw this:

dtrace: invalid probe specifier node57257:::http-client-request { trace(args[0]->url); }: in action list: failed to resolve native type for args[0]

It was the start of a nightmare: this bug was introduced by code I committed back in June that year on a project I knew at the time was not a lot of code, but extremely tricky to get right. I obviously hadn’t gotten it quite right. Over the next two days, we iterated on several possible solutions. As we fleshed out each one, we discovered something new that exposed an important case that the solution didn’t handle. I’m documenting what we learned here in case anyone runs into similar issues in the future.

Updating USDT providers

An important thing to keep in mind when designing a USDT provider is future extensibility: you may want to add new telemetry later, so you’ll want to be able to extend the native and translated structures in a way that allows the translators to work on both old and new binaries. Be sure to leave zeroed padding in the native (application) structs so that you can easily extend them later. If you do that, you can just add new fields to the native type and translated type and have the translator check for a NULL value. End of story.

Adding fields to existing providers

If you find yourself wanting to add a field to an existing provider that doesn’t have padding (and want to preserve compatibility with older binaries), one approach is to:

  • Modify the structure to lead with a fixed-width sentinel or version field.
  • Then have the translator check the value of this field to determine whether it’s looking at an old- or new-format structure and translate appropriately.
  • You’ll save yourself a lot of pain by ensuring that the sentinel field is the same on both 32-bit and 64-bit processes. If the old structure started with a 64-bit pointer on 64-bit systems, the sentinel should probably be 64-bit as well so you can be more sure that it doesn’t accidentally look like part of a 64-bit pointer. (This is a problem for the node provider, which treats anything less than 4096 in the first 32-bit field as the sentinel. Unfortunately, these values could be contained inside a 64-bit pointer, and the translator would guess wrong.)

Making deeper changes to existing providers

Sometimes you need to make deeper changes, like renaming structures used as probe arguments (perhaps because you need to break a single common structure into two, as was done with the node provider). In that case, remember that:

  • The structures defined in the provider must also be defined in the library file. If you fail to do this, DTrace will barf trying to instrument a particular process because it fails to resolve the native type. However, it may seem to work because if you instrument all processes, it may find one of the old binaries that doesn’t reference the unresolved type. Most confusingly, it will also successfully instrument new binaries and use one of the translators, even though the source type doesn’t match.
  • Be sure to test by using the translator automatically rather than using the “xlate” operator. If you use “xlate”, you won’t see the above problem because you end up casting the argument to the correct type, so DTrace never has to resolve the (wrong) native type. That hides the problem.
  • Beware of this DTrace compiler bug.

You can see the solution we came up with in all its glory.

Summary

If you make changes to an existing USDT provider with translated arguments, be sure to consider and test:

  • new binaries (i.e. using the new provider file) on systems with an updated library file
  • old binaries (i.e. using the old provider file) on systems with an updated library file
  • existing DTrace scripts (important for stable tools like Cloud Analytics). Remember, the point of USDT is stability!
  • with “xlate” as well as with automatic translators
  • instrumenting individual old and new processes, not just “app*:::”

You may be able to ignore new binaries on an older system because users can probably use “dtrace -L” to get the new file.

I had thought I knew a lot about how USDT providers with translated arguments worked, but debugging this issue reminded me how complex this problem is intrinsically. Despite its warts, it works very well.