DTracing in Anger

My Macbook has becomeso sluggish that it feels like I’m typing ove a 9600 baud modem aagn. Or 2400. It’s alo droping keystokes – which is irritatng as hll – so please forgive theapparent tyos and mistakes. It comes and goes each minute, so thiswhole post isn’t too bad.

Usually I cn see what’s wrng using my Top 10 DTrace scripts for Mac OS X, which have a focus on disk and file system issues. That’s not the casehere.

Since this isinterferig wth localapplications, specifically, my keystokes as I type in vim, my suspects are CPU related. top(1) and DTrace profiling tells me this running:

# dtrace -n 'profile-99 { @[execname] = count(); } tick-1s { trunc(@, 5); printa(@); trunc(@); }'
dtrace: description 'profile-99 ' matched 2 probes
CPU     ID                    FUNCTION:NAME
  4  16339                         :tick-1s
  configd                                                          39
  firefox                                                          46
  Google Chrome H                                                  55
  SystemUIServer                                                   96  ← who r u??
  kernel_task                                                     444

Ok, so lets start with SystemUIServer, although I have no idea what it is, and there is no man page. A quick Internet search tells me it manages menu bar popups, and device removal (CDROMs, DVDs, USB). Suggestions include:

That doesn’t sound scientific. The only suggested root-cause I saw wasthat it had a “memory leak”.

Lets look using DTrace.

First is to profile the on-CPU usage:

# dtrace -n 'profile-97 /execname == "SystemUIServer"/ { @[ustack()] = count(); }'
dtrace: description 'profile-97 ' matched 1 probe
dtrace: error on enabled probe ID 1 (ID 28: profile:::profile-97): invalid address (0x118ca2f00) in action #2
dtrace: error on enabled probe ID 1 (ID 28: profile:::profile-97): invalid address (0x7fff50b4b1f0) in action #2
dtrace: error on enabled probe ID 1 (ID 28: profile:::profile-97): invalid address (0x7fff50b4a130) in action #2
dtrace: error on enabled probe ID 1 (ID 28: profile:::profile-97): invalid address (0x7fff50b4afc0) in action #2
[...]

Ow. This was supposed to sample on-CPU user-level stacks, but I’ve hit the profiling stack-trace bug currently present on Mac OS X. Once this is fixed, this will be a lot easier to debug, as I can quickly roll Flame Graphs.

Now you get to see how I work around this on Mac OS X.

One level of stack works:

# dtrace -n 'profile-97 /execname == "SystemUIServer"/ { @[ustack(1)] = count(); }'
dtrace: description 'profile-97 ' matched 1 probe
^C
[...]
              CoreFoundation`CFRelease+0x46
                9

              libsystem_kernel.dylib`__workq_kernreturn+0xa
               10

              libobjc.A.dylib`objc::DenseMap, objc::DenseMapInfo >::LookupBucketFor(objc_object* const&, std::__1::pair*&) const+0x3a
               11

              libsystem_c.dylib`OSAtomicCompareAndSwap64Barrier$VARIANT$mp+0x8
               12

              libsystem_kernel.dylib`kevent+0xa
               26

              libsystem_kernel.dylib`mach_msg_trap+0xa
              151

This tells me mach_msg_trap() was on-CPU more than anyone else. Really wish I could see the full sttack trace, but can’t with profiling due to that bug.

I can with the pid provider. This will begin to slow the target due to the ovrheads of pid tracing and user-level stack collection, but given how woefully slow this laptop is already, I don’t mind.

# dtrace -n 'pid$target::mach_msg_trap:entry { @[ustack()] = count(); }' -p 177
dtrace: description 'pid$target::mach_msg_trap:entry ' matched 2 probes
^C
[...]
              libsystem_kernel.dylib`mach_msg_trap
              libsystem_kernel.dylib`mach_msg+0x46
              IOKit`io_iterator_next+0x4c
              IOKit`IOIteratorNext+0x11
              SystemConfiguration`findMatchingInterfaces+0xa4
              SystemConfiguration`_SCNetworkInterfaceCreateWithEntity+0x261
              SystemConfiguration`SCNetworkServiceGetInterface+0xab
              CoreWLAN`copyActiveWLANNetworkServices+0x115
              CoreWLAN`copyPrimaryActiveWLANNetworkService+0xc
              CoreWLAN`-[CWIPMonitor networkServiceID]+0x64
              CoreWLAN`-[CWIPMonitor ipv4StateConfig]+0x4d
              CoreWLAN`-[CWIPMonitor ipv4Routable]+0x21
              AirPort`0x000000011287e25f+0x30b
              Foundation`__NSThreadPerformPerform+0xe1
              CoreFoundation`__CFRUNLOOP_IS_CALLING_OUT_TO_A_SOURCE0_PERFORM_FUNCTION__+0x11
              CoreFoundation`__CFRunLoopDoSources0+0xf5
              CoreFoundation`__CFRunLoopRun+0x315
              CoreFoundation`CFRunLoopRunSpecific+0x122
              HIToolbox`RunCurrentEventLoopInMode+0xd1
              HIToolbox`ReceiveNextEventCommon+0x164
              676

Many stacks like the above were printed. They look truncated – they should go all the way back to thread creation.

Increasing stack levels:

# dtrace -x ustackframes=100 -n 'pid$target::mach_msg_trap:entry { @[ustack()] = count(); }' -p 177
dtrace: description 'pid$target::mach_msg_trap:entry ' matched 2 probes
[...]
              libsystem_kernel.dylib`mach_msg_trap
              libsystem_kernel.dylib`mach_msg+0x46
              IOKit`io_iterator_next+0x4c
              IOKit`IOIteratorNext+0x11
              SystemConfiguration`findMatchingInterfaces+0xa4
              SystemConfiguration`_SCNetworkInterfaceCreateWithEntity+0x261
              SystemConfiguration`SCNetworkServiceGetInterface+0xab
              CoreWLAN`copyActiveWLANNetworkServices+0x115
              CoreWLAN`copyPrimaryActiveWLANNetworkService+0xc
              CoreWLAN`-[CWIPMonitor networkServiceID]+0x64
              CoreWLAN`-[CWIPMonitor ipv4StateConfig]+0x4d
              CoreWLAN`-[CWIPMonitor ipv4Routable]+0x21
              AirPort`0x000000011287e25f+0x30b
              Foundation`__NSThreadPerformPerform+0xe1
              CoreFoundation`__CFRUNLOOP_IS_CALLING_OUT_TO_A_SOURCE0_PERFORM_FUNCTION__+0x11
              CoreFoundation`__CFRunLoopDoSources0+0xf5
              CoreFoundation`__CFRunLoopRun+0x315
              CoreFoundation`CFRunLoopRunSpecific+0x122
              HIToolbox`RunCurrentEventLoopInMode+0xd1
              HIToolbox`ReceiveNextEventCommon+0xa6
              HIToolbox`BlockUntilNextEventMatchingListInMode+0x3e
              AppKit`_DPSNextEvent+0x2ad
              AppKit`-[NSApplication nextEventMatchingMask:untilDate:inMode:dequeue:]+0x80
              AppKit`-[NSApplication run]+0x205
              SystemUIServer`0x000000010f0c747c+0x76
              libdyld.dylib`start
              SystemUIServer`0x1
             3010

That’s better.

Airport? CoreWLAN? ipv4Routable? Is SystemUIServer doing something dumb with my WiFi?

Since that’s just one stack out of hundreds, it’s hard to know if this is the problem or a problem, without reading all the other stacks.

Fortunately there is a quick way to processall those stacks: Flame Graphs – not for thread samples, but for these function tracing stacks. Should still work.

# dtrace -x ustackframes=100 -n 'pid$target::mach_msg_trap:entry { @[ustack()] = count(); } tick-30s { exit(0); }' -p 177 -o out.SystemUIServer_stacks
dtrace: description 'pid$target::mach_msg_trap:entry ' matched 3 probes
# ~/Git/FlameGraph/stackcollapse.pl out.SystemUIServer_stacks2 | ~/Git/FlameGraph/flamegraph.pl > out.SystemUIServer_stacks.svg

The resulting SVG is:

Click for the interactive version. This confirmed that the mach_msg_trap()s were mostly AirPort.

I can dig around te stack more using dynamic tracing; taknig a quick look for strings to see if it sheds any light:

# dtrace -n 'pid$target::io_service_get_matching_services:entry { @[copyinstr(arg1)] = count(); }' -p 177
dtrace: description 'pid$target::io_service_get_matching_services:entry ' matched 1 probe
^C
  <dict><key>BSD Name</key><string>en1</string></dict>               51
  <dict><key>BSD Name</key><string>en2</string></dict>               51
  <dict><key>IOProviderClass</key><string>IOSerialBSDClient</string><key>IOTTYBaseName</key><string>Bluetooth-Modem</string></dict>               51
  <dict><key>IOProviderClass</key><string>IOSerialBSDClient</string><key>IOTTYDevice</key><string>Bluetooth-Modem</string></dict>               51
  <dict><key>BSD Name</key><string>en0</string></dict>               52
  <dict><key>BSD Name</key><string>fw0</string></dict>               52
  <dict><key>IOProviderClass</key><string>IOBluetoothHCIController</string></dict>              102
  <dict><key>IOProviderClass</key><string>IO80211Interface</string></dict>
       1917

Given that I’m not really doing anything with the WiFi, that it’s so busy managing interfaces is starting to look like a bug. Possibly a memory leak too, as SystemUIServer is at 600 Mbytes RSS.

Opening the WiFi menu results in muchspinning beachball. As anexerient, I’l tun off WiFi.

Hooray! I can type again!

Turning WiFi back on returned the issue soon after. I did resort to the “killall SystemUIServer”, which appears to have returned the laptop back to normal, while on WiFi.

I can keeping digging further in the stack when this issue returns, and browse the parts of this that are open source on Apple’s website, and probably put my finger on the root cause. But this has been a distraction – I have real work to get back to doing.

And hey WindowServer – don’t make me sic the pony on you, too.

Print Friendly
Posted on November 14, 2012 at 10:04 pm by Brendan Gregg · Permalink
In: DTrace · Tagged with: , ,

2 Responses

Subscribe to comments via RSS

  1. Written by Pierre Lebeaupin
    on November 15, 2012 at 1:29 am
    Permalink

    Sometimes I try to figure out what’s wrong when Mac OS X slows down, using DTrace, but I’m not very good at it. For people who might not be so inclined as to run DTrace commands and interpret the results themselves, Apple has added a command called sysdiagnose (to call as root) that will run a number of collection commands (including a few DTrace scripts, I believe) and package the result so that you can send it to Apple. It can run without a parameter, but also take a param giving the pid or process name that you are, ahem, suspecting; for some reason, since Safari 5 I have kept a Terminal window open with a sysdiagnose for “WebProcess” ready to be run at a moment’s notice…

    (notice there was no real need to increase the stack levels, everything below and including Foundation`__NSThreadPerformPerform+0xe1 is boring)

  2. Written by Nikolam
    on December 6, 2012 at 7:06 am
    Permalink

    This is quite the same thing I sometimes get into (slow unresponsive keystrokes or even repeating keystrokes, ususlly solved by restarting).
    I suspect is might be something with maybe motherboard overheating or keyboard chip or something elese, like you found have something to do with wifi driver.

    But I run Illumos based Openindiana 151a7 on Dell Latitude D620, Core2Duo.
    So it is also intel machine, maybe similar hardware to that Mac,
    wifi is Intel 3945ABG (Golan) and it Is On during those incidents.

Subscribe to comments via RSS