dtrace.conf 2012!
April 3rd, San Francisco: information and registration
Visualizing Process Execution
In Visualizing Process Snapshots I showed processes and their parent-child hierarchy over time, using snapshots of process information. This approach misses short-lived processes that occur between the snapshots. Here I’ll fill in the gaps using system tracing, visualizing all processes that occurred.
Snapshots vs Tracing
The left animation visualizes a server using process snapshots, similar to what I’ve done before, but this time by taking a snapshot every second instead of every minute, and displaying it at a normal rate instead of sped up. On the right is the same server during the same time, visualizing the processes using system wide tracing. This is a two second loop (click for larger). Tracing has captured many short-lived processes that were missed by process snapshots.
In the following sections I’ll explore visualizing trace data using mostly images so that they can be studied more easily.
Process Snapshot
I’ll begin with a process snapshot (ps -eo zone,ppid,pid,pcpu,comm) and render it as before:
This is a cloud application server that includes a web server, a database, and a ruby application.
Process Tracing
Now using system wide tracing to capture every new process for five minutes, adding them with transparent colors, dashed lines and the run time in milliseconds (click for larger):
While tracing the number of httpd and ruby18 processes were static. Short-lived processes where launched for munin (monitoring software) including munin-cron by the system scheduler (cron), and a number of “ps” processes were launched from PassengerHelperAgent.
Thermal Cam
The previous images indicate process type using color hue. The following uses only the red hue and adjusts saturation based on the run time for the short-lived processes (redder == longer). The persistent server processes have been colored flat gray (non-transparent).
Now the longer runnning short-lived processes stand out along with their children, allowing the critical path of latency to be followed. The longest running was munin-cron, launched from a cron shell (sh) taking 815 ms. Two of its children, munin-graph and fgrep, both took 311 ms, which suggests they ran together via a Unix pipe. This highlights that run time – the time from a process birth to its exit – is inclusive of wait time, in this case, the piped fgrep was likely waiting on munin-graph. A different type of time could be measured and visualized – CPU time – to reflect only the time that processes spent running on a processor.
Watching Cron
Here’s another example. This server was traced for five minutes:
This shows a cron scheduled task (status.sh) running every minute.
Highlighting run time:
The run time for wget is likely to be waiting on a HTTP request to complete. The nawk run time is the same as it is waiting for wget to complete, via a Unix pipe. The number of spokes from cron reflects the duration of tracing: five instances of a per-minute cron were traced.
Shell Hell
Here’s another system, which from a process snapshot looks like an ordinary Apache web server. Donning the thermal goggles for five minutes (click for larger):
Woah – the web server is dwarfed by a cloud of short-lived processes in the bottom left.
This cloud of processes was called from nrpe → sh → check_apache2. These are from the nagios monitoring software: nrpe is the nagios remote plugin executor, and check_apache2 is a plugin to check the health of the Apache web server. Zooming in:
The run time for check_apache2 is over 2.4 seconds, which is spent calling 500 processes. Some quick digging revealed the shell code responsible:
cpu_load="$(cpu_load=0; ps -Ao pcpu,args | grep "$path_binary/apache2" \
| awk '{print $1}' | while read line
do
cpu_load=`echo "scale=3; $cpu_load + $line" | bc -l`
[...]
!
This was being executed every 10 minutes, on every node in a large scale cloud environment. Fortunately the 2.4 second run time wasn’t all burning CPU time, as it includes a one second call to “sleep” (look closely). It was also one of the worst cases seen.
In the full image was another hot path on the right, which shows a check_zone_cpu process taking over one second. Its children include prstat, ghead, ggrep and gawk. These all had a similar run time, as the GNU shell commands waited on a slow prstat to complete (a “prstat -Z”).
Two Seconds
I had seen another dense cloud of short lived processes from nagios, which I turned into the two second animation at the start of this post. As a thermal image, those two seconds look like this:
It looks even worse across a five minute span.
This is using the neato placement algorithm from graphvis; it’s possible that other algorithms could do a slightly better job with this, however, when there are more nodes than can be placed on a canvas, some overlap is inevitable. This presents a limit to this type of visualization. A workaround was to reduce the time span, like the two second image shown above.
Application Servers
Apart from monitoring services like munin and nagios, application servers can also call short lived processes as part of their normal operation.
Ruby on Rails
Processes named mongrel_rails can be seen in this five minute image, with child processes running between 3 and 4 seconds.
Sphinx
The default behavior of the sphinx search server forks a process on every request, which led to the most processes I’ve seen during five minutes (~17,000). It’s another case where five minutes spans too many processes, and the graphing algorithm can’t place them in a legible way. Here is 1/5th of a second of the same trace data as an animation, played at 1/10th normal speed (slow motion).
Server
As an example of scaling this visualization, here is an entire physical server that is running many small OS virtualized instances:
The large zone in the middle includes calls to jinf (Joyent server info command), which itself makes use of child processes. We’re planning to rewrite that and reduce the process count.
Tracing Technique
These images and animations were achieved by beginning with a process snapshot:
# ps -eo zone,ppid,pid,pcpu,comm
ZONE PPID PID %CPU COMMAND
zone-00001 0 0 0.0 sched
zone-00001 0 1 0.0 /sbin/init
zone-00001 0 2 0.0 pageout
[...]
Then adding details using system tracing – specifically a DTrace script – to add process event details in a similar format to the “ps” output, but with extra columns including nanosecond timestamps:
# ./processtrace.d zone-00001 25822 25824 0 cpu fork - - 33352623644379691 zone-00001 25822 25824 0 awk exec - - 33352623645616308 zone-00001 25822 25824 0 awk exit 42 ms 33352623686801252 [...]
System wide tracing can log all process events, capturing information for all short-lived processes. A problem can be the data collection overhead – some tracing tools I wouldn’t dare run in production as it would severely slow the target. DTrace has been designed to be production safe, and handles high rates of events with minimal overhead by using per-CPU buffers and asynchronous buffer draining. The rate of process creation – which I’d usually expect to be less than one thousand per second – should be handled by DTrace with negligible overhead.
Image Notes
These images were created using some awk scripts to convert the process and trace data into graphviz format, and then rendered. The thermal images increased color saturation (red) and reduced transparency for the longer running processes. Each image had the color scale adjusted to make best use of the range, with between 1 and 4 seconds set to be the reddest color, based on the longest run time captured in the image. For the server image, processes beyond 10 seconds run time were not colored (at that point, it’s not really a short-lived process).
Tracing for long periods created a new problem that I have side stepped for now: the PID numbers can wrap. I hit this by accident a couple of times, which created bizarre connections across zones as graphviz created more than one PPID → PID arrow.
Animation Notes
The snapshot animation was just that – imaging the processes that were running at each one second instant. Each frame of the tracing animations includes all process data from the previous to the current frame, so that no information was lost. The tracing data itself is a series of process events with nanosecond timestamps, which were turned into frame data by some Perl, which loaded all the events and then stepped through time generating each frame. Because of this, the number of frames and the frame rate can be whatever is desired. All of this post-processing was done remotely – minimum overhead was induced on the production server to collect the raw data.
As mentioned in my previous post, getting graphviz to generate animations is a little tricky, and involves rendering all elements all the time, with the non-visible ones marked with the “invis” style.
Conclusion
In my first post on Visualizing the Cloud, I showed how capturing process snapshots at different scales was interesting and useful. Then in Visualizing Process Snapshots, I included multiple snapshots over time to see how processes change. However, I knew that this was only part of the story — we were missing the shorter-lived processes that occurred between snapshots. This post showed how to complete the picture by using tracing.
The actual data behind these visualizations is still straightforward: process status (ps) plus system-wide tracing (DTrace). Apart from the virtues of DTrace (and graphviz), the only other special technology at play here is OS virtualization, which facilitates capturing process snapshots and trace data across the cloud. The physical server image above was captured using one ps(1) and one dtrace(1M) command, which could see across all thirty active OS instances on a server running SmartOS.
These new visualizations allow us to analyze the use of short-lived processes – a software inefficiency that can cause latency and poor performance. We are also able to identify other unexpected processes that aren’t visible from process snapshots taken with, eg, top.
These visualizations are still experimental, intended to explore new potential tools for cloud computing analysis. With DTrace, the possibilities are vast: these visualizations were built upon only three DTrace probes (proc:::create/exec-success/exit), while the total number of probes available on these servers is over seventy thousand.
In: Performance · Tagged with: cloud, dtrace, experimental, performance, visualizations
Visualizing Process Snapshots
In Visualizing the Cloud I showed processes and their parent-child hierarchy, across a cloud environment, exploring patterns at different scales. Here I’ll take this a little further and look at processes over time. These are experimental visualizations shared to explore ideas.
It’s Alive!

Many who have seen this have described it as “creepy” — looking like a beating heart or some microscopic life form.
It’s showing a busy Apache web server, with frames (process snapshots) taken at 60 second intervals. The above animation spans 20 minutes, shown as a 2 second loop, creating an effect similar to time lapse photography. The size of the processes varies based on recent CPU usage, with larger meaning busier, and the color shows the type of process based on its name (process names that I didn’t assign a type for default to gray with a black border). The processes grow and shrink as their recent CPU usage varies over time, which looks like it happens across all of the httpds suggesting Apache is balancing work evenly.
Application Server
While the previous animation was somewhat interesting, many servers look more like this:

It’s common in the cloud environment that a single OS instance runs a single application, in this case a web server. Not much changes over time, as seen by taking snapshots each minute.
Multi Application Server
This is an OS instance running multiple applications (click for higher res):

These include a web server and a mail server. This animation is much busier, as processes are created to dispatch work and then terminate. These include php5-cgi processes, which shows the more expensive method of serving PHP, instead of using mod_php as an Apache module. (The earlier beating heart may have been a PHP site using mod_php – we can’t tell from the process name alone.)

A problem with animations (as I’ve mentioned before) is that identifying details relies on memory and patience. One frame in particular shows short-lived processes appear in the lower left of the animation, launched by an imapd. These are shown on the right: a bash process launches two greps, a sort and an awk — typical shell scripting commands (they may have been launched together via pipes). It can be important to be aware of short-lived processes like these when investigating server performance or troubleshooting odd behavior.
Time as Transparency
By merging together the frame details, we can see all the processes at once:
The first frame is shown as before, with subsequent frame processes added with transparent colors and dashed lines. This way, the original layout is included in opaque colors, and additional information from the time domain is represented as transparent elements. The length of each arrow has been stretched a little further to better separate process names (click for high res).
This image includes 45 process snapshots at one minute intervals (the animation only included 20 of those snapshots). The density and patterns created are related to the activity and how many snapshots are included. Consider how this might look if it included a web server restart. In that case, two clouds of httpd processes would be visible, the second launched from a transparent master.
The use of transparency reveals more detail in areas of dense processes, as process names can be seen through overlapped processes. These overlaps also increase the color saturation, which provides another visual clue to the density.
Long Exposure
Another approach is to show opaque colors for the processes that were present in all frames, and transparent colors for processes that were in a partial set. The degree of transparency is weighted based on the number of frames that include that process – fewer frames meaning more transparent, and less time that the process was running during the sampling period.
This creates an effect similar to long exposure photography.
Instead of using the first frame to size processes, the size here reflects the average CPU usage across all samples.
Ordinary Example
Not all merged process visualizations are as pretty as the ones above; here is the single application server from earlier, adding time as transparent elements:

And as a long exposure:

The difference between the types can be compared. An advantage of the first (time as transparency) is that a complete process snapshot can be seen in opaque, which can be studied and understood before considering the transparent time domain. For example, it shows that the “master” process was running one “pickup” process only, for the first snapshot. Comprehending the process tree in the long exposure version is less clear. It provides a better sense of how long the shorter-lived processes were running, based on their color saturation, but it’s less clear what a single valid snapshot would look like (eg, “master” may have run two “pickup” processes half the time, and none the other half).
Server
As an example of scaling this visualization, here is an entire physical server that is running many small OS virtualized instances (mostly web servers):
This is the time-as-transparency version. Here the snapshots have picked up some shorter-lived processes launched from cron.
Snapshot Limitations
Taking process snapshots every 60 seconds – or even every second – may miss many short-lived processes altogether. Processes can execute and complete in less than ten milliseconds, as is especially the case with shell scripting built upon small process-based commands.
Process snapshots only provide a coarse idea of what has been running.
Any tool that takes a process snapshot suffers this problem. Taking snapshots more frequently increases the collection overhead from reading /proc often. This may be more noticeable in an OS virtualized cloud environment, where /proc is larger than usual as it contains processes from multiple tenants.
Comments
I showed three different experimental ways to visualize process snapshots: as a time lapse animation, as an image by adding time as transparent elements, and as a long exposure image. The data being visualized is basic: process snapshots (“ps -eo zone,ppid,pid,pcpu,comm”).
The images above were generated using graphviz, with some awk to convert the ps output into neato graphs. The most tricky part was building the animations: graphviz initially did layout for each frame completely differently, often rotating the entire scene by 90 degrees or more. The workaround was to render all frame elements in every frame, and applying an “invis” style to the non-present elements for each frame. (It’s as fun as it sounds.)
For cloud environments this type of visibility has been made easy for me at Joyent as we are using OS virtualization as our primary cloud architecture. This allows ps from the global zone on each physical server to inspect details from all OS instances directly, without needing to connect to agents for that information. To visualize the entire cloud, I need to connect to the physical servers only.
In my next post, I’ll fill in the gaps that are missed by the snapshot approach.
In: Performance · Tagged with: cloud, experimental, performance, visualizations
Performance Analysis talk at SCALE10x
Last week I gave a talk at the Southern California Linux Expo (SCALE) titled “Performance Analysis: New Tools and Concepts from The Cloud”. There was a great turnout for my talk, which was videoed by Deirdré (video below). The conference was a lot of fun too.
In my talk I presented six problems of performance analysis and introduced new tools that were developed to solve these. This allowed many concepts to be covered, which I summarized at the end, including how cloud computing is presenting new problems for performance observability. DTrace on SmartOS was used as an example way to provide observability, although the concepts and ideas presented are applicable to many environments (this was a Linux conference, too).
4-part playlist, 60 minutes (YouTube):
Here is the slide deck.
This talk referenced some work that I’ve previously blogged about, including the CPU utilization heatmaps, Flame Graphs and File System Latency analysis.
UPDATE: Viewers have noticed that YouTube’s experimental audio transcription doesn’t handle my accent very well. See the humorous full transcript from when the videos were first posted.
Thanks to the SCALE organizers for a great conference, and all who attended!
In: Performance · Tagged with: performance, slides, video

















