Visualizing the Cloud
I’ve worked on visualizations for a while, most recently with heatmaps for Joyent’s Cloud Analytics. While we’re using and enhancing these right now, we are also in a great position to continue developing new visualizations for cloud computing, given:
- Easy observability into all nodes via zones, and deeper analysis using DTrace.
- Large datacenters running interesting cloud computing workloads from a large variety of customers.
Before I get into deeper analysis with DTrace, I’ll show something simple that has proven interesting so far.
The goal was to visualize the entire cloud to get a sense of what is running. Basic process details were collected: zone, PID, PPID, process name, and recent %CPU (fetched using “ps -o zone,pid,ppid,pcpu,comm”). This was then graphed (using Graphviz for now).
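As a rough sketch of the collection step, the Python below parses that ps output into records. It assumes the first line is a column header and that comm is the last column (so it may contain spaces). The sample text and the Proc and parse_ps names are hypothetical, not the tooling actually used for these images.

```python
from typing import List, NamedTuple


class Proc(NamedTuple):
    zone: str
    pid: int
    ppid: int
    pcpu: float
    comm: str


def parse_ps(text: str) -> List[Proc]:
    """Parse `ps -o zone,pid,ppid,pcpu,comm` output into Proc records."""
    procs = []
    lines = text.strip().splitlines()
    for line in lines[1:]:  # skip the column header line
        # comm is the last column, so cap the split at 5 fields
        fields = line.split(None, 4)
        if len(fields) < 5:
            continue  # ignore blank or malformed lines
        zone, pid, ppid, pcpu, comm = fields
        procs.append(Proc(zone, int(pid), int(ppid), float(pcpu), comm))
    return procs


# Hypothetical, abbreviated sample of the command's output.
SAMPLE = """\
    ZONE   PID  PPID %CPU COMMAND
  global     1     0  0.0 /sbin/init
   web01  4390  4001  0.0 /opt/local/sbin/httpd
   web01  4400  4390 12.5 /opt/local/sbin/httpd
"""

for p in parse_ps(SAMPLE):
    print(p.zone, p.pid, p.ppid, p.pcpu, p.comm)
```

On a live system the text would come from running the command itself, for example `subprocess.run(["ps", "-e", "-o", "zone,pid,ppid,pcpu,comm"], capture_output=True, text=True).stdout`, where `-e` (an addition here) selects every process rather than only those attached to the current terminal.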
Examining just a few processes to begin with:
Parent-child relationships are shown with arrows. The size of each process reflects recent CPU usage: bigger means busier. The color identifies the type of process: system processes are shown in light blue. These details can be adjusted – the process size could show memory footprint, for example.
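A minimal sketch of turning such data into Graphviz input, assuming each process is a (pid, ppid, pcpu, comm) tuple: parents point to children, node width grows linearly with %CPU, and a pluggable color_of function picks the fill color. The width formula, its constants, and the function names are illustrative assumptions, not the mapping behind the images here.

```python
from typing import Callable, Iterable, Tuple

# One row per process: (pid, ppid, pcpu, comm)
ProcRow = Tuple[int, int, float, str]


def node_width(pcpu: float, base: float = 0.3, scale: float = 0.05) -> float:
    """Map recent %CPU to a node diameter in inches: bigger means busier."""
    return base + scale * pcpu


def to_dot(procs: Iterable[ProcRow],
           color_of: Callable[[str], str] = lambda comm: "lightblue") -> str:
    """Render a parent -> child process graph in Graphviz DOT syntax."""
    procs = list(procs)
    pids = {pid for pid, _, _, _ in procs}
    out = ["digraph procs {",
           '  node [shape=circle, style=filled, fixedsize=true, label=""];']
    for pid, _, pcpu, comm in procs:
        name = comm.replace('"', '\\"')  # keep the DOT string literal valid
        out.append(f'  p{pid} [width={node_width(pcpu):.2f}, '
                   f'fillcolor="{color_of(comm)}", tooltip="{name}"];')
    for pid, ppid, _, _ in procs:
        # Draw an arrow only when the parent is also in the snapshot.
        if ppid in pids and ppid != pid:
            out.append(f"  p{ppid} -> p{pid};")
    out.append("}")
    return "\n".join(out)


print(to_dot([(1, 0, 0.0, "/sbin/init"), (100, 1, 25.0, "httpd")]))
```

Writing the string to a file such as procs.dot and running `dot -Tsvg procs.dot -o procs.svg` (or a radial layout engine such as twopi) produces an image; the post does not say which engine was used. Having node_width take resident memory instead of %CPU would give the memory-footprint variant mentioned above.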
This is what a typical cloud computing node looks like (also known as a “zone”, or Joyent “SmartMachine”), in this case a web server:
The master process for the web server can be seen surrounded by its worker processes, all shown in red. The worker processes are drawn larger, since they are busier on CPU doing work to respond to web requests. In the middle is a gray oval representing the “init” process of the zone (the real customer zone name has been scrubbed here). The full set of system processes that make up the zone can also be seen, along with their relationships.
Now scaling to show an entire physical server, which is running nine zones (plus one “global” zone):
Green is for language-related processes, such as php, python, java, etc. Pink shows database processes, including MySQL, memcached, Riak, etc. The green/red zone is a Ruby/Apache server, and the top left zone has both mysqld and memcached. The largest pink process at the top is a busy MySQL server.
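The coloring by process type could be approximated with a simple lookup on the command's base name. The prefix lists below are hypothetical and deliberately short: only the language and database names come from the text, treating httpd and nginx as web servers is an assumption, and the gray coloring of each zone's init is not modeled.

```python
import os

# Fill colors by category, following the scheme described above.
CATEGORY_COLORS = {
    "webserver": "red",
    "language": "green",
    "database": "pink",
    "system": "lightblue",
}

# Hypothetical name prefixes per category; a real list would be much longer.
PREFIXES = {
    "webserver": ("httpd", "nginx"),
    "language": ("php", "python", "java", "perl", "ruby"),
    "database": ("mysqld", "memcached", "riak"),
}


def classify(comm: str) -> str:
    """Return a category for a command, defaulting to 'system'."""
    name = os.path.basename(comm)
    for category, prefixes in PREFIXES.items():
        if name.startswith(prefixes):  # str.startswith accepts a tuple
            return category
    return "system"


def color_for(comm: str) -> str:
    """Return a Graphviz color name for a command."""
    return CATEGORY_COLORS[classify(comm)]


print(color_for("/opt/local/bin/php-cgi"), color_for("/usr/sbin/cron"))
```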
Previously we could look at lists of processes using ps or ptree to see the same data, but getting a quick sense of what’s running – and what’s busy – from pages of text output took a lot more time. Consider examining the same data on a rack of servers – this could become hundreds of pages of text output.
Visualizing all the zones in a rack:
More zone types pop out and can be identified quickly. The chain of five green circles is a Perl server, with five busy perl processes.
5. Availability Zone
Now for an entire Joyent “availability zone”, which consists of a fleet of racks in a datacenter:
It’s the first time we’ve seen every process that’s running on a single page. This includes over 300 servers and over 3500 zones. I could zoom this out a bit further to span an entire datacenter, and further still to span the entire company. However, the full version of the above image is already far too big to share on this blog!
This image can be generated automatically to look for anomalies and changes in the cloud. We’ve made many discoveries so far, and the graphs are often beautiful and unexpected.
One of the discoveries can be seen in the middle of the graph above: six large zones that appear as concentric circles. Here’s how they look zoomed in:
Our jaws dropped when we first saw this. What’s happening is that this zone runs a shell program via cron (the system scheduler) that processes the output of getent. The getent process is stuck on an LDAP lookup that never completes, so all of its related processes are stuck too. Cron kept launching new copies mindlessly, until the zone hit its process limit.
Fortunately these were old Joyent test zones that were not being used by a customer.
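A check for this kind of runaway could also be automated. The sketch below assumes a list of (zone, comm) pairs, one per process, and a caller-supplied per-zone process limit; it flags zones whose process count reaches a fraction of that limit and reports their most common command. The threshold and function name are hypothetical, and this is not a check described in the post.

```python
from collections import Counter, defaultdict
from typing import Iterable, List, Tuple


def suspicious_zones(procs: Iterable[Tuple[str, str]], limit: int,
                     threshold: float = 0.8) -> List[Tuple[str, int, str, int]]:
    """Flag zones using at least `threshold` of `limit` processes.

    Returns (zone, total, top_command, top_count) tuples, sorted by zone.
    """
    by_zone = defaultdict(Counter)
    for zone, comm in procs:
        by_zone[zone][comm] += 1
    flagged = []
    for zone, counts in by_zone.items():
        total = sum(counts.values())
        if total >= threshold * limit:
            # The dominant command hints at what is piling up (e.g. getent).
            top, n = counts.most_common(1)[0]
            flagged.append((zone, total, top, n))
    return sorted(flagged)


sample = [("test1", "getent")] * 500 + [("test1", "sh")] * 350 + [("web01", "httpd")] * 20
print(suspicious_zones(sample, limit=1000))
```

A fuller check might also walk the PPID chain to confirm that the piled-up processes share a cron ancestor; that is left out here.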
This is just the beginning: the data above is very simple, just process details from “ps”. We’ve also been using DTrace to add detail to these process maps, which I hope to blog about when time permits.
These are not yet part of the Joyent Cloud Analytics product; whether they will be depends on proving their usefulness in solving real problems. So far it’s looking promising: we are finding useful information quickly with these experimental visualizations.