load average video

Here’s another short video about a performance tool: uptime(1M), specifically the load average numbers that it prints:

# uptime
  7:49pm  up 203 day(s), 23:39,  1 user,  load average: 4.43, 4.48, 4.54

These are the same load averages printed by top(1), prstat(1) and w(1). I’ll discuss load averages for Solaris-based operating systems.

uptime: Load Averages

The load average is a measure of how many threads are running or wanting to run on CPU, averaged over time intervals.

The three numbers given are called the 1, 5 and 15 minute load averages, which by comparison can show if CPU load is getting better or worse. If the load average is higher than the CPU count, then there aren’t enough CPUs to service the threads, and some will be waiting (what we call “CPU dispatcher queue latency”). If the load average is lower than the CPU count, it (probably) means that threads could run on-CPU when they wanted.

It’s a rough measure of CPU load; consider it briefly before moving on to other tools, such as mpstat(1M) for examining per-CPU activity.

A system running with a higher load average than its CPU count is evidence of a performance issue. Threads will be waiting for their turn on-CPU, causing latency.
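To make that comparison concrete, here is a minimal Python sketch that pulls the three numbers out of an uptime line and checks them against a CPU count. The function names and the `ncpus` values are made up for illustration; the CPU count of the server in the example output isn't stated here, so it is passed in as an assumption.

```python
import re

def parse_load_averages(uptime_line):
    """Extract the 1, 5 and 15 minute load averages from an uptime-style line."""
    match = re.search(
        r"load averages?:\s*([\d.]+),?\s+([\d.]+),?\s+([\d.]+)", uptime_line
    )
    if match is None:
        raise ValueError("no load averages found in: %r" % uptime_line)
    return tuple(float(value) for value in match.groups())

def describe(loads, ncpus):
    """Return (trend, saturated) for the given load averages.

    trend compares the 1 minute figure with the 15 minute figure;
    saturated is True when the 1 minute figure exceeds the CPU count.
    ncpus is supplied by the caller (hypothetical here).
    """
    one, _five, fifteen = loads
    if one > fifteen:
        trend = "rising"
    elif one < fifteen:
        trend = "falling"
    else:
        trend = "steady"
    return trend, one > ncpus

line = "  7:49pm  up 203 day(s), 23:39,  1 user,  load average: 4.43, 4.48, 4.54"
loads = parse_load_averages(line)
print(loads, describe(loads, ncpus=4), describe(loads, ncpus=8))
```

With an assumed 4 CPUs the example line would count as saturated; with an assumed 8 it would not. Either way, treat the result as a hint to go and look with other tools, not as a diagnosis.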

The video runs 22 minutes because I explain how the numbers are actually calculated, including some surprising details:

“1 minute load average” isn’t really 1 minute, nor really an average.

To confirm how these worked, I set up an experiment that ran a single busy thread on an idle server, and watched the load averages creep up over time. They should settle to a value of 1.0, and you might expect the three numbers to reach 1.0 after 1, 5 and 15 minutes respectively. But that’s not what actually happens; here’s a plot:

I added vertical gray lines at the 1 and 5 minute points, which show that the 1 and 5 minute averages had each only reached about 0.61 by their namesake times. The average is actually based on an exponential decay function, and it takes much, much longer than 1, 5 and 15 minutes to settle.
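The shape of that curve can be reproduced with a small simulation. This is an idealized sketch of an exponentially damped average, not the kernel's actual code: the 5 second update interval and the function name are assumptions, and a real implementation may use fixed-point arithmetic and a different sampling interval.

```python
import math

def simulate(period_s, duration_s, runnable=1.0, step_s=5):
    """Idealized exponentially damped load average.

    Starts from an idle system (load 0.0) and applies a constant number
    of runnable threads for duration_s seconds, updating every step_s
    seconds (an assumed interval, not taken from any kernel source).
    """
    load = 0.0
    decay = math.exp(-step_s / period_s)
    for _ in range(duration_s // step_s):
        # Each update keeps a decayed share of the old value and blends in
        # the current runnable count for the remainder.
        load = load * decay + runnable * (1.0 - decay)
    return load

for label, period in (("1 min", 60), ("5 min", 300), ("15 min", 900)):
    at_one_period = simulate(period, period)
    at_five_periods = simulate(period, 5 * period)
    print(label, round(at_one_period, 2), round(at_five_periods, 2))
```

In this model each average reaches 1 - 1/e, or about 0.63, of its final value after one nominal period, which is close to (though not identical with) the roughly 0.61 seen in the plot. Even after five periods it is still just short of 1.0.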

For more details on load average, see the video above, the source code behind the metrics, and the book Solaris Performance and Tools.

Posted on June 24, 2011 at 8:50 am by Brendan Gregg
In: Performance

One Response

  1. Written by Baron Schwartz
    on July 10, 2011 at 7:19 pm

    On Linux, the numbers are computed slightly differently than on other OSes; the load average includes processes that are blocked on I/O. (I’m not aware of any other OS that accounts for load this way.) But it’s the same idea: the average only becomes an average as time goes to infinity, and it’s really exponentially decaying. I think that part of the algorithm is pretty much the same across OSes.