USENIX LISA 2012: Performance Analysis Methodology

At USENIX LISA 2012, I gave a talk titled “Performance Analysis Methodology”. It covered ten performance analysis anti-methodologies and methodologies, including the USE Method. I wrote about these in the ACMQ article “Thinking Methodically about Performance”, which is worth reading for more detail. I’ve also posted USE Method-derived checklists for Solaris- and Linux-based systems.

The video of the talk is on the LISA site, and the slides are below, also available as a PDF.

I’ve summarized the methodologies in the talk below.

Methodology Summaries

Blame-Someone-Else Anti-Method:

  1. Find a system or environment component you are not responsible for
  2. Hypothesize that the issue is with that component
  3. Redirect the issue to the responsible team
  4. When proven wrong, go to 1

Streetlight Anti-Method:

  1. Pick observability tools that are
    • familiar
    • found on the Internet
    • found at random
  2. Run tools
  3. Look for obvious issues

Ad Hoc Checklist Method:

  1..N. Run A, if B, do C
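
The "Run A, if B, do C" pattern can be sketched as data plus a small loop. This is only an illustration: the `CheckItem` type, the checklist entries, the thresholds, and the canned measurements are all hypothetical, and a real checklist would run actual tools in place of the lambdas.

```python
from typing import Callable, List, NamedTuple, Tuple


class CheckItem(NamedTuple):
    name: str
    run: Callable[[], float]             # "Run A": collect a measurement
    is_problem: Callable[[float], bool]  # "if B": test the result
    action: str                          # "do C": the prescribed next step


def run_checklist(items: List[CheckItem]) -> List[Tuple[str, float, str]]:
    """Run every item and return (name, value, action) for items that trip."""
    findings = []
    for item in items:
        value = item.run()
        if item.is_problem(value):
            findings.append((item.name, value, item.action))
    return findings


# Hypothetical checklist with canned measurements instead of real tools.
EXAMPLE_CHECKLIST = [
    CheckItem("swap in use (MB)", lambda: 512.0, lambda v: v > 0,
              "check memory usage per process"),
    CheckItem("disk busy (%)", lambda: 35.0, lambda v: v > 60,
              "inspect disk I/O latency"),
]

for name, value, action in run_checklist(EXAMPLE_CHECKLIST):
    print(f"{name} = {value}: {action}")
```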

Problem Statement Method:

  1. What makes you think there is a performance problem?
  2. Has this system ever performed well?
  3. What has changed recently? (Software? Hardware? Load?)
  4. Can the performance degradation be expressed in terms of latency or run time?
  5. Does the problem affect other people or applications (or is it just you)?
  6. What is the environment? What software and hardware is used? Versions? Configuration?

Scientific Method:

  1. Question
  2. Hypothesis
  3. Prediction
  4. Test
  5. Analysis

Workload Characterization Method:

  1. Who is causing the load? PID, UID, IP addr, …
  2. Why is the load called? code path
  3. What is the load? IOPS, tput, type
  4. How is the load changing over time?
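
As a minimal sketch of the who, what, and over-time questions, the snippet below summarizes a list of I/O events. The record layout and the sample events are assumptions made for illustration. It does not answer the "why" question, which would need code-path (stack) information that these records do not carry.

```python
from collections import Counter, defaultdict

# Hypothetical trace records: (timestamp_seconds, pid, op_type, size_bytes)
EVENTS = [
    (0.1, 101, "read", 4096),
    (0.4, 101, "read", 4096),
    (0.7, 202, "write", 8192),
    (1.2, 101, "read", 4096),
    (1.5, 202, "write", 8192),
    (1.9, 202, "write", 8192),
]


def characterize(events):
    """Summarize load by process (who), by type (what), and per second (when)."""
    by_pid = Counter()
    by_type = Counter()
    bytes_per_second = defaultdict(int)
    for ts, pid, op, size in events:
        by_pid[pid] += 1                      # who is causing the load
        by_type[op] += 1                      # what the load is
        bytes_per_second[int(ts)] += size     # how it changes over time
    return {
        "who": dict(by_pid),
        "what": dict(by_type),
        "over_time": dict(bytes_per_second),
    }


summary = characterize(EVENTS)
print(summary)
```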

Drill-Down Analysis Method:

  1. Start at highest level
  2. Examine next-level details
  3. Pick most interesting breakdown
  4. If problem unsolved, go to 2
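
The loop in steps 2 to 4 can be sketched over a nested time breakdown. Two assumptions here: the breakdown is already available as a tree (the names and millisecond values are made up), and "most interesting" is taken to mean "largest time", which is a simplification of the judgment an analyst would apply.

```python
# Hypothetical breakdown of one request; times are in milliseconds.
BREAKDOWN = {
    "name": "request", "ms": 100, "children": [
        {"name": "app", "ms": 20},
        {"name": "database", "ms": 75, "children": [
            {"name": "query parse", "ms": 5},
            {"name": "disk I/O", "ms": 60},
        ]},
        {"name": "network", "ms": 5},
    ],
}


def drill_down(node):
    """Follow the largest child at each level until there is no further detail."""
    path = [node["name"]]
    while node.get("children"):
        # "Most interesting" is assumed to be the child with the most time.
        node = max(node["children"], key=lambda child: child["ms"])
        path.append(node["name"])
    return path


print(" -> ".join(drill_down(BREAKDOWN)))
```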

Latency Analysis Method:

  1. Measure operation time (latency)
  2. Divide into logical synchronous components
  3. Continue division until latency origin is identified
  4. Quantify: estimate speedup if problem fixed
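
The quantify step is simple arithmetic once the synchronous components have been measured. The sketch below assumes the components run serially, so their times add up to the total; the function name and the numbers are hypothetical.

```python
def estimated_speedup(total_ms, component_ms, factor):
    """Overall speedup if one serial component becomes `factor` times faster."""
    if total_ms <= 0 or not 0 <= component_ms <= total_ms:
        raise ValueError("component time must lie within a positive total")
    if factor <= 0:
        raise ValueError("factor must be positive")
    # Time outside the component is unchanged; the component shrinks by `factor`.
    new_total_ms = (total_ms - component_ms) + component_ms / factor
    return total_ms / new_total_ms


# Hypothetical 100 ms operation: 75 ms in the database, 5 ms on the network.
print(estimated_speedup(100, 75, 3))  # tripling database speed
print(estimated_speedup(100, 5, 3))   # tripling network speed
```

With these made-up numbers, the database fix halves the total operation time, while the same improvement to the network component barely moves it. That comparison is the point of quantifying before fixing.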

USE Method:

For every resource, check:

  1. Utilization
  2. Saturation
  3. Errors
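
A minimal sketch of iterating the three checks over a resource list follows. The resource names, metric values, and the 70% utilization threshold are assumptions for illustration; collecting the real metrics is the OS-specific part that the checklists mentioned earlier cover.

```python
# Hypothetical snapshot: utilization in percent, saturation as queue length,
# errors as an event count.
RESOURCES = {
    "CPU":     {"utilization": 95.0, "saturation": 4, "errors": 0},
    "disk":    {"utilization": 30.0, "saturation": 0, "errors": 2},
    "network": {"utilization": 10.0, "saturation": 0, "errors": 0},
}


def use_check(resources, util_threshold=70.0):
    """Return (resource, metric) pairs that deserve a closer look."""
    findings = []
    for name, metrics in resources.items():
        if metrics["errors"] > 0:
            findings.append((name, "errors"))
        if metrics["saturation"] > 0:
            findings.append((name, "saturation"))
        if metrics["utilization"] >= util_threshold:  # threshold is an assumption
            findings.append((name, "utilization"))
    return findings


for resource, metric in use_check(RESOURCES):
    print(f"{resource}: investigate {metric}")
```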

Stack Profile Method:

  1. Profile thread stack traces (on- and off-CPU)
  2. Coalesce
  3. Study stacks bottom-up
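
The coalesce step amounts to counting identical stacks. The sketch below assumes the samples were already collected by a profiler and flattened into root-to-leaf strings joined by semicolons; that format and the sample data are illustrative assumptions, not the output of any particular tool.

```python
from collections import Counter

# Hypothetical sampled stacks, written root-first with ";" between frames.
SAMPLES = [
    "main;handle_request;parse",
    "main;handle_request;parse",
    "main;handle_request;read",
    "main;idle",
    "main;handle_request;parse",
]


def coalesce(samples):
    """Collapse identical stacks into (stack, count) pairs, most frequent first."""
    return Counter(samples).most_common()


for stack, count in coalesce(SAMPLES):
    print(f"{count:3d}  {stack}")
```

The most frequent stacks are a reasonable place to begin studying where threads spend their time.
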
Posted on December 13, 2012 at 2:51 pm by Brendan Gregg
In: Performance

One Response

  1. Written by Greg on December 20, 2012 at 3:45 am

    Hi Brendan,

    I thought I would stop by and say a big, overdue thank you for all your inspirational work over the years. So much so that when I’m looking at a system performance problem on Solaris, I am thinking: what would Brendan do? The wonders of DTrace, demonstrated many a time in your in-depth, informative blogs, drove me to get the DTrace book, which is now my first port of call when I need to extract something meaningful from the system or application processes. You are the leading light for some of us who have been puzzling over performance problems on Solaris for so long. So much work on performance has been done since the first edition of the Solaris Internals book in 2000! Please carry on blogging and making videos; we need a shepherd to lead the way :-)

    Greg