Eric Schrock's Blog

Month: October 2004

There’s a nice little review of the w2100z over at AnandTech. Primarily a hardware review, it also contains loads of benchmarks between several different operating systems, including the Java Desktop System. It was nice to see Solaris 10 thrown into the mix, but it only showed up in a few benchmarks, as they were primarily geared towards the 64-bit OSes. The good news is that amd64 support will hit the streets in a month or two, so maybe we can force our way into a few more benchmarks – although the amount of amd64-specific performance tuning we’ve done is next to nil.

It was also refreshing to see Solaris presented as a desktop OS. I’m certainly not used to seeing Solaris showing up in mp3 encoding benchmarks. Hopefully this will become more common in the future – we’ve put a lot of effort into making Solaris run well on small systems and commodity hardware. Java Desktop System version 3 is now part of Solaris, and will show up in the next Solaris Express release.

One more thing:

86 years

Enough said.

With the tidal wave of features that is Solaris 10, it’s all too easy to miss the less visible enhancements. One of these “lost features” is microstate accounting, which is now turned on by default for all processes. Process accounting keeps track of the time each thread spends in various states – idle, in system mode, in user mode, etc. This is used for calculating the usage percentages you see in prstat(1) as well as time(1) and ptime(1). Historically there have been two ways of doing process accounting:

  1. Clock-based accounting

    Virtually every operating system in existence has a clock function – a periodic interrupt used to perform regular system activity [1]. On Solaris, the clock function does a number of simple tasks, mostly having to do with process and CPU accounting. Clock-based accounting examines each CPU, and increments per-process statistics depending on whether the current thread is in user mode, system mode, or idle. This uses statistical sampling (taking a snapshot every 10 milliseconds) to come up with a picture of system resource usage.

  2. Microstate accounting

    With microstate accounting, the counters are updated at each state transition, as it occurs in real time. This results in much more accurate, if more expensive, accounting information. Previously, you could turn this on through the /proc interfaces (proc(4)) using the PR_MSACCT control. This has since become a no-op; you can’t disable microstate accounting in Solaris 10.

The clock-based sampling method has several drawbacks. One is that it is only a guess for rapidly transitioning threads. If you’re going between system and user mode faster than once every 10 milliseconds, the clock will only account for a single state during this time period. Even worse is the fact that it can miss threads entirely if it catches a CPU in an idle state. If you have a thread that wakes up every 10 milliseconds, just after the clock fires, and does 9 milliseconds of work before going back to sleep, you will never account for the fact that it is using 90% of the CPU [2].
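To make the 90% example concrete, here is a small simulation sketch. It is not Solaris code: the function names and the 0.5 ms wake-up offset are assumptions made for illustration, while the 10 ms tick and 9 ms of work come from the paragraph above.

```python
# Toy simulation of a thread that runs in lockstep with the clock.
# All names here are hypothetical; nothing below is a Solaris interface.

TICK_US = 10_000    # clock interrupt period: 10 ms, as in the text
WORK_US = 9_000     # the thread works 9 ms out of every period
OFFSET_US = 500     # assumption: the thread wakes 0.5 ms after each tick
PERIODS = 1_000     # number of clock periods to simulate


def thread_running(t_us):
    """Return True if the simulated thread is on CPU at time t_us."""
    phase = t_us % TICK_US
    return OFFSET_US <= phase < OFFSET_US + WORK_US


def sampled_usage():
    """Clock-based accounting: inspect the CPU once per tick."""
    hits = sum(thread_running(n * TICK_US) for n in range(PERIODS))
    return hits / PERIODS


def microstate_usage():
    """Microstate accounting: charge elapsed time at every transition."""
    busy_us = 0
    for n in range(PERIODS):
        start = n * TICK_US + OFFSET_US   # idle -> running transition
        stop = start + WORK_US            # running -> idle transition
        busy_us += stop - start
    return busy_us / (PERIODS * TICK_US)


print(f"sampled:    {sampled_usage():.0%}")     # prints "sampled:    0%"
print(f"microstate: {microstate_usage():.0%}")  # prints "microstate: 90%"
```

Because the thread is always asleep at the instant the clock samples the CPU, the sampled figure stays at zero no matter how long the simulation runs, while the transition-based figure reports the real 90%.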

These ‘phantom’ threads can hide from traditional process tools. The now-infamous gtik2_applet2 outlined in the DTrace USENIX paper is a good example of this. These inaccuracies also affect the fair share scheduler (FSS(7)), which must make resource allocation decisions based on the accuracy of these accounting statistics.

With microstate accounting, all of these artifacts disappear. Since we do the calculations in situ, it’s impossible to “miss” a transition. The reason we haven’t always relied on microstate accounting is that it used to be a fairly expensive process. Threads transition between states with high frequency, and with each transition we need to get a timestamp to determine how long we spent in the last state.

The key observation that made this practical is that the expensive part is not reading the hardware timestamp. The expensive part is the conversion to nanoseconds – clock ticks are useless when doing any sort of real accounting. The solution was to use clock ticks for all the internal accounting, and only do the conversion to nanoseconds when actually requested. With a few other tricks, we were able to reduce the impact to virtually nil. You can see the difference in micro-benchmark performance of short system calls (such as getpid(2)), but it’s unnoticeable in any normal system call (like read(2)), and nonexistent in any macro benchmark.
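The idea of accounting in raw ticks and converting only on demand can be sketched roughly as follows. This is an illustration of the general technique, not the kernel's implementation: the class, its method names, and the 2.4 GHz counter frequency are all made up for the example.

```python
# Sketch of deferred unit conversion. Hypothetical names throughout;
# TICK_HZ is an assumed counter frequency, not a real hardware value.

TICK_HZ = 2_400_000_000
NANOSEC = 1_000_000_000


class MicrostateClock:
    """Accumulate per-state time in raw ticks; convert only when read."""

    def __init__(self, initial_state, now_ticks):
        self.state = initial_state
        self.since = now_ticks   # tick count at the last transition
        self.ticks = {}          # state name -> accumulated raw ticks

    def transition(self, new_state, now_ticks):
        # Hot path: a subtraction and an addition, no unit conversion.
        elapsed = now_ticks - self.since
        self.ticks[self.state] = self.ticks.get(self.state, 0) + elapsed
        self.state = new_state
        self.since = now_ticks

    def nanoseconds(self, state):
        # Cold path: the multiply and divide happen only on request.
        return self.ticks.get(state, 0) * NANOSEC // TICK_HZ


acct = MicrostateClock("user", now_ticks=0)
acct.transition("system", now_ticks=2_400)  # 2400 ticks in user mode
acct.transition("user", now_ticks=3_600)    # 1200 ticks in system mode
acct.transition("sleep", now_ticks=6_000)   # 2400 more ticks in user mode

print(acct.nanoseconds("user"))    # prints 2000
print(acct.nanoseconds("system"))  # prints 500
```

The transition path runs on every state change, so keeping it down to integer arithmetic on ticks is where the savings come from; a reader such as a prstat-style tool pays the conversion cost only when it asks for the numbers.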

All the great benefits of microstate accounting at a fraction of the price.


[1] The clock function is actually a consumer of the cyclic subsystem – a nice callout mechanism written by Bryan back in Solaris 8. Certainly worth a post of its own in the future.

[2] This is especially true on x86, because historically the hardware clock has been the source of all timer activity. This means every thread ran lockstep with the clock with respect to timers and wakeups. This has been addressed in Solaris 10 with the introduction of arbitrary precision timer interrupts, which makes the cyclic subsystem far more interesting on x86 machines (Solaris SPARC has had this functionality for a while).

Just wanted to welcome fellow kernel engineers Jonathan Adams (smf guru and debugger extraordinaire) and Joe Bonasera (amd64 guru and x86 VM master) to the madness that is blogs.sun.com. Blog Away!

Those students in the class paying attention may have noticed that Solaris on amd64 is quietly coming to life. While it’s been thriving in development for a while now, it’s finally flexing its muscles publicly. A few sightings at blogs.sun.com:

As for myself, I have been pitching in with amd64 development here and there; mostly getting the userland debugging tools functional. Recently, I ported ZFS to amd64 – which was really only a bunch of Makefile and compiler changes (the joys of porting a well-designed 64-bit capable subsystem). There are, of course, others doing most of the work – Bryan, Bart, and the many anonymous team members too busy doing the heavy lifting to maintain an external blog.

For you Solaris Express junkies out there, amd64 support will probably not be available in the next release. Look for it in a later release or S10 FCS.

All told, amd64 support is pretty exciting. The hardware is blazingly fast (and cheap!), and we finally have an OS that can really take advantage of all that it has to give. With our resident hardware genius behind the wheel of our amd64 platforms, we’re going to be coming out with some absolutely killer hardware. 2005 will be an interesting year…
