Eric Schrock's Blog

Month: March 2007

As I continue down the path of improving various aspects of ZFS and Solaris platform integration, I recently found myself in the thumper (x4500) fmd platform module. This module represents the latest attempt at Solaris platform integration, and an indication of where we are headed in the future.

When I say “platform integration”, I mean something more involved than the platform support most people typically think of. The platform teams make sure that the system boots and that all the hardware is supported properly by Solaris (drivers, etc.). Thanks to the FMA effort, platform teams must also deliver an FMA portfolio covering all the hardware, along with a unified serviceability plan. Unfortunately, there is still more work to be done beyond this, the most important of which is interacting with hardware in response to OS-visible events. This includes the ability to light LEDs in response to faults and device hotplug, as well as monitoring the service processor and keeping external FRU information up to date.

The sfx4500-disk module is the latest attempt at providing this functionality. It does the job, but is afflicted by the same problems that often plague platform integration attempts. It’s overcomplicated, monolithic, and much of what it does should be generic Solaris functionality. Among the things this module does:

  • Reads SMART data from disks and creates ereports
  • Diagnoses ereports into corresponding disk faults
  • Implements an IPMI interface directly on top of /dev/bmc
  • Responds to disk faults by turning on the appropriate ‘fault’ disk LED
  • Listens for hotplug and DR events, updating the ‘ok2rm’ and ‘present’ LEDs
  • Updates SP-controlled FRU information
  • Monitors the service processor for resets and resyncs necessary information

Needless to say, every single item on the above list is applicable to a wide variety of Sun platforms, not just the x4500, and it certainly doesn’t need to be in a single monolithic module. This is not meant to be a slight against the authors of the module. As with most platform integration activities, this effort wasn’t communicated by the hardware team until far too late, resulting in an unrealistic schedule with millions of dollars of revenue behind it. It doesn’t help that all these features need to be supported on Solaris 10, making the schedule pressure all the more acute, since the code must soak in Nevada and then be backported in time for the product release. In these environments even the most fervent pleas for architectural purity tend to fall on deaf ears, and the engineers doing the work quickly find themselves between a rock and a hard place.

As I was wandering through this code and thinking about how this would interact with ZFS and future Sun products, it became clear that it needed a massive overhaul. More specifically, it needed to be burned to the ground and rebuilt as a set of distinct, general purpose, components. Since refactoring 12,000 lines of code with such a variety of different functions is non-trivial and difficult to test, I began by factoring out different pieces individually, redesigning the interfaces and re-integrating them into Solaris on a piece-by-piece basis.

Of all the functionality provided by the module, the easiest thing to separate was the IPMI logic. The Intelligent Platform Management Interface is a specification for communicating with service processors to discover and control available hardware. Sadly, it’s anything but “intelligent”. If you had asked me a year ago what I’d be doing at the beginning of this year, I’m pretty sure that reading the IPMI specification would have been at the bottom of my list (right below driving stakes through my eyeballs). Thankfully, the IPMI functionality we needed was very small, and the best choice was a minimally functional private library, designed solely for the purpose of communicating with the service processor on supported Sun platforms. Existing libraries such as OpenIPMI were too complicated, and in their efforts to present a generic abstracted interface, didn’t provide what we really needed. The design goals are different, and the ON-private IPMI library and OpenIPMI will continue to develop and serve different purposes in the future.
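
To give a sense of what “minimally functional” means here, below is a rough sketch of the shape such an interface might take. To be clear, the names and signatures are purely illustrative (don’t mistake them for the actual ON-private libipmi API): an opaque handle, a raw request/response pair, and not much else.

/*
 * Hypothetical sketch of a minimal IPMI interface.  These names and
 * signatures are illustrative only, not the actual ON-private libipmi API.
 */
#include <stdint.h>
#include <stdlib.h>

typedef struct ipmi_handle ipmi_handle_t;       /* opaque connection handle */

/*
 * A raw IPMI command: network function, command code, and payload.  The
 * library interprets no more of the payload than the caller asks it to.
 */
typedef struct ipmi_cmd {
        uint8_t  ic_netfn;      /* IPMI network function code */
        uint8_t  ic_cmd;        /* command within that network function */
        void     *ic_data;      /* payload */
        size_t   ic_dlen;       /* payload length */
} ipmi_cmd_t;

extern ipmi_handle_t *ipmi_open(void);          /* attach to /dev/bmc */
extern ipmi_cmd_t *ipmi_send(ipmi_handle_t *, ipmi_cmd_t *);
extern void ipmi_close(ipmi_handle_t *);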

Last week I finally integrated libipmi. In the process, I eliminated 2,000 lines of platform-specific code and created a common interface that can be leveraged by other FMA efforts and future projects. It is provided for both x86 and SPARC, even though there are currently no supported SPARC machines with an IPMI-capable service processor (this is being worked on). This library is private and evolving quite rapidly, so don’t use it in any non-ON software unless you’re prepared to keep up with a changing API.
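
As an example of the kind of consumer code this enables (still using the hypothetical interface sketched above, not the real thing), sending the standard IPMI Get Device ID request (network function 0x06, command 0x01, straight from the IPMI specification) boils down to a few lines:

/*
 * Send an IPMI "Get Device ID" request (NetFn App = 0x06, command 0x01)
 * using the hypothetical interface sketched earlier.
 */
#include <stdio.h>

int
main(void)
{
        ipmi_handle_t *ihp;
        ipmi_cmd_t cmd = { 0x06, 0x01, NULL, 0 };
        ipmi_cmd_t *rsp;

        if ((ihp = ipmi_open()) == NULL)
                return (1);

        if ((rsp = ipmi_send(ihp, &cmd)) != NULL && rsp->ic_dlen > 0)
                (void) printf("device id: 0x%02x\n",
                    ((uint8_t *)rsp->ic_data)[0]);

        ipmi_close(ihp);
        return (0);
}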

As part of this work, I also created a common fmd module, sp-monitor, that monitors the service processor, if present, and generates a new ESC_PLATFORM_RESET sysevent to notify consumers when the service processor is reset. The existing sfx4500-disk module then consumes this sysevent instead of monitoring the service processor directly.
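
For code outside of fmd, consuming this event looks like any other legacy sysevent subscription through libsysevent(3LIB). Here’s a minimal sketch; note that the class and subclass strings are assumptions for illustration, so check the actual event definitions before relying on them:

/*
 * Minimal sketch of a libsysevent(3LIB) consumer for the service
 * processor reset sysevent.  The class/subclass strings below are
 * assumptions for illustration, not the real definitions.
 */
#include <libsysevent.h>
#include <stdio.h>
#include <unistd.h>

#define EC_PLATFORM             "EC_platform"           /* assumed class */
#define ESC_PLATFORM_RESET      "ESC_platform_reset"    /* assumed subclass */

static void
reset_handler(sysevent_t *ev)
{
        (void) printf("SP reset: %s/%s\n",
            sysevent_get_class_name(ev),
            sysevent_get_subclass_name(ev));
}

int
main(void)
{
        const char *subclasses[] = { ESC_PLATFORM_RESET };
        sysevent_handle_t *shp;

        if ((shp = sysevent_bind_handle(reset_handler)) == NULL)
                return (1);

        if (sysevent_subscribe_event(shp, EC_PLATFORM, subclasses, 1) != 0) {
                sysevent_unbind_handle(shp);
                return (1);
        }

        (void) pause();         /* events are delivered to the handler */
        return (0);
}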

This is the first of many steps towards eliminating this module in its current form, as well as laying groundwork for future platform integration work. I’ll post updates to this blog with information about generic disk monitoring, libtopo indicators, and generic hotplug management as I add this functionality. The eventual goal is to reduce the platform-specific portion of this module to a single .xml file delivered via libtopo that all these generic consumers will use to provide the same functionality that’s present on the x4500 today. Only at this point can we start looking towards future applications, some of which I will describe in upcoming posts.

I’ve been heads down for a long time on a new project, but occasionally I do put something back to ON worth blogging about. Recently I’ve been working on some problems which leverage sysevents (libsysevent(3LIB)) as a common transport mechanism. While trying to understand exactly what sysevents were being generated from where, I found the lack of observability astounding. After poking around with DTrace, I found that tracking down the exact semantics was anything but straightforward. First of all, we have two orthogonal sysevent mechanisms: the original legacy syseventd mechanism, and the more recent general purpose event channel (GPEC) mechanism used by FMA. On top of this, the sysevent_impl_t structure is painful to decode by hand, because all the data is packed together in a single block of memory. Knowing that this would be important for my upcoming work, I decided that adding a stable DTrace sysevent provider would be useful.

The provider has a single probe, sysevent:::post, which fires whenever a sysevent post attempt is made. It doesn’t necessarily indicate that the sysevent was successfully queued or received. The probe has the following semantics:

# dtrace -lvP sysevent
   ID   PROVIDER            MODULE                          FUNCTION NAME
44528   sysevent            genunix                   queue_sysevent post

        Probe Description Attributes
                Identifier Names: Private
                Data Semantics:   Private
                Dependency Class: Unknown

        Argument Attributes
                Identifier Names: Evolving
                Data Semantics:   Evolving
                Dependency Class: ISA

        Argument Types
                args[0]: syseventchaninfo_t *
                args[1]: syseventinfo_t *

The ‘syseventchaninfo_t’ translator has a single member, ‘ec_name’, which is the name of the event channel. If the event is being posted via the legacy sysevent mechanism, this member will be NULL. The ‘syseventinfo_t’ translator has three members, ‘se_publisher’, ‘se_class’, and ‘se_subclass’. These mirror the arguments to sysevent_post(). The following script will dump all sysevents posted to syseventd(1M):

#!/usr/sbin/dtrace -s

#pragma D option quiet

BEGIN
{
        printf("%-30s  %-20s  %s\n", "PUBLISHER", "CLASS",
            "SUBCLASS");
}

sysevent:::post
/args[0]->ec_name == NULL/
{
        printf("%-30s  %-20s  %s\n", args[1]->se_publisher,
            args[1]->se_class, args[1]->se_subclass);
}

And the output during a cfgadm -c unconfigure:

PUBLISHER                       CLASS                 SUBCLASS
SUNW:usr:devfsadmd:100237       EC_dev_remove         disk
SUNW:usr:devfsadmd:100237       EC_dev_branch         ESC_dev_branch_remove
SUNW:kern:ddi                   EC_devfs              ESC_devfs_devi_remove

This has already proven quite useful in my ongoing work, and hopefully some other developers out there will also find it useful.
