Eric Schrock's Blog

Month: July 2008

Over the past few years, I’ve been working on various parts of Solaris platform integration, with an emphasis on disk monitoring. While the majority of my time has been focused on fishworks, I have managed to implement a few more pieces of the original design.

About two months ago, I integrated the libscsi and libses libraries into Solaris Nevada. These libraries, originally written by Keith Wesolowski, form an abstraction layer upon which higher level software can be built. The modular nature of libses makes it easy to extend with vendor-specific support libraries in order to provide additional information and functionality not present in the SES standard, something difficult to do with the kernel-based ses(7d) driver. And since it is written in userland, it is easy to port to other operating systems. This library is used as part of the fwflash firmware upgrade tool, and will be used in future Sun storage management products.

While libses itself is an interesting platform, its true raison d’être is to serve as the basis for enumeration of external enclosures as part of libtopo. Enumeration of components in a physically meaningful manner is a key component of the FMA strategy. These components form FMRIs (fault managed resource identifiers) that are the target of diagnoses. These FMRIs provide a way of not just identifying that “disk c1t0d0 is broken”, but that this device is actually in bay 17 of the storage enclosure whose chassis serial number is “2029QTF0809QCK012”. In order to do that effectively, we need a way to discover the physical topology of the enclosures connected to the system (chassis and bays) and correlate it with the in-band I/O view of the devices (SAS addresses). This is where SES (SCSI Enclosure Services) comes into play. SES processes show up as targets in the SAS fabric, and by using the additional element status descriptors, we can correlate physical bays with the attached devices under Solaris. We can also enumerate components not directly visible to Solaris, such as fans and power supplies.
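Conceptually, the correlation described above is a join on SAS address. The following Python sketch is only an illustration of that idea, not how libses or libtopo actually implement it: the function name, the dictionary layout, and the SAS addresses are all made up for the example.

```python
def correlate_bays(bay_to_sas, sas_to_device):
    """Map each physical bay to the OS device name attached to it.

    bay_to_sas:    {bay number: SAS address reported for that bay, or None if empty}
    sas_to_device: {SAS address: device name as seen by the operating system}
    Both inputs are hypothetical stand-ins for the SES view and the
    device-tree view respectively.
    """
    result = {}
    for bay, sas_addr in sorted(bay_to_sas.items()):
        # An empty bay (None) or an address the OS does not know maps to None.
        result[bay] = sas_to_device.get(sas_addr)
    return result


# Made-up data: three bays, one of them empty.
bay_to_sas = {0: "5000c5000000000a", 1: None, 17: "5000c5000000000b"}
sas_to_device = {"5000c5000000000a": "c1t1d0", "5000c5000000000b": "c1t0d0"}

for bay, device in correlate_bays(bay_to_sas, sas_to_device).items():
    print("bay %d: %s" % (bay, device or "(empty)"))
```

With this made-up data, bay 17 resolves to c1t0d0, which is the kind of statement the FMRI is meant to capture.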

The SES enumerator was integrated in build 93 of Nevada, and all of these components now show up in the libtopo hardware topology (commonly referred to as the “hc scheme”). To do this, we walk over all the SES targets visible to the system, grouping targets into logical chassis (something that is not as straightforward as it should be). We use this list of targets and a snapshot of the Solaris device tree to fill in which devices are present on the system. You can see the result by running fmtopo on a build 93 or later Solaris machine:

# /usr/lib/fm/fmd/fmtopo
...
hc://:product-id=SUN-Storage-J4400:chassis-id=2029QTF0809QCK012:serial=2029QTF0000000002:part=Storage-J4400:revision=3R13/ses-enclosure=0
hc://:product-id=SUN-Storage-J4400:chassis-id=2029QTF0809QCK012:server-id=:part=123-4567-01/ses-enclosure=0/psu=0
hc://:product-id=SUN-Storage-J4400:chassis-id=2029QTF0809QCK012:server-id=:part=123-4567-01/ses-enclosure=0/psu=1
hc://:product-id=SUN-Storage-J4400:chassis-id=2029QTF0809QCK012:server-id=/ses-enclosure=0/fan=0
hc://:product-id=SUN-Storage-J4400:chassis-id=2029QTF0809QCK012:server-id=/ses-enclosure=0/fan=1
hc://:product-id=SUN-Storage-J4400:chassis-id=2029QTF0809QCK012:server-id=/ses-enclosure=0/fan=2
hc://:product-id=SUN-Storage-J4400:chassis-id=2029QTF0809QCK012:server-id=/ses-enclosure=0/fan=3
hc://:product-id=SUN-Storage-J4400:chassis-id=2029QTF0809QCK012:server-id=:serial=2029QTF0811RM0386:part=375-3584-01/ses-enclosure=0/controller=0
hc://:product-id=SUN-Storage-J4400:chassis-id=2029QTF0809QCK012:server-id=:serial=2029QTF0811RM0074:part=375-3584-01/ses-enclosure=0/controller=1
hc://:product-id=SUN-Storage-J4400:chassis-id=2029QTF0809QCK012:server-id=/ses-enclosure=0/bay=0
hc://:product-id=SUN-Storage-J4400:chassis-id=2029QTF0809QCK012:server-id=:serial=5QD0PC3X:part=SEAGATE-ST37500NSSUN750G-0720A0PC3X:revision=3.AZK/ses-enclosure=0/bay=0/disk=0
hc://:product-id=SUN-Storage-J4400:chassis-id=2029QTF0809QCK012:server-id=/ses-enclosure=0/bay=1
...
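If you want to post-process these strings, an hc-scheme FMRI as printed above splits naturally into an authority section (the colon-separated name=value pairs before the first slash) and a path of name=instance components. Here is a minimal Python sketch based only on the shape of the output shown here; parse_hc_fmri is a hypothetical helper, not part of libtopo, and it assumes authority values never contain ':' or '/'.

```python
def parse_hc_fmri(fmri):
    """Split an hc-scheme FMRI string into (authority dict, path list).

    Assumes the textual layout seen in the fmtopo output above:
    hc://:name=value:name=value/node=instance/node=instance
    """
    prefix = "hc://"
    if not fmri.startswith(prefix):
        raise ValueError("not an hc-scheme FMRI: %r" % fmri)

    # Everything up to the first '/' is the authority; the rest is the path.
    authority_part, _, path_part = fmri[len(prefix):].partition("/")

    authority = {}
    for field in authority_part.split(":"):
        if not field:
            continue  # skip the empty field produced by the leading ':'
        name, _, value = field.partition("=")
        authority[name] = value  # value may be empty, e.g. "server-id="

    path = []
    for component in path_part.split("/"):
        if not component:
            continue
        name, _, instance = component.partition("=")
        path.append((name, int(instance)))

    return authority, path


authority, path = parse_hc_fmri(
    "hc://:product-id=SUN-Storage-J4400:chassis-id=2029QTF0809QCK012"
    ":server-id=:serial=5QD0PC3X:part=SEAGATE-ST37500NSSUN750G-0720A0PC3X"
    ":revision=3.AZK/ses-enclosure=0/bay=0/disk=0"
)
print(authority["chassis-id"])  # 2029QTF0809QCK012
print(path)                     # [('ses-enclosure', 0), ('bay', 0), ('disk', 0)]
```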

To really get all the details, you can use the ‘-V’ option to fmtopo to dump all available properties:

# fmtopo -V '*/ses-enclosure=0/bay=0/disk=0'
TIME                 UUID
Jul 14 03:54:23 3e95d95f-ce49-4a1b-a8be-b8d94a805ec8
hc://:product-id=SUN-Storage-J4400:chassis-id=2029QTF0809QCK012:server-id=:serial=5QD0PC3X:part=SEAGATE-ST37500NSSUN750G-0720A0PC3X:revision=3.AZK/ses-enclosure=0/bay=0/disk=0
  group: protocol                       version: 1   stability: Private/Private
    resource          fmri      hc://:product-id=SUN-Storage-J4400:chassis-id=2029QTF0809QCK012:server-id=:serial=5QD0PC3X:part=SEAGATE-ST37500NSSUN750G-0720A0PC3X:revision=3.AZK/ses-enclosure=0/bay=0/disk=0
    ASRU              fmri      dev:///:devid=id1,sd@TATA_____SEAGATE_ST37500NSSUN750G_0720A0PC3X_____5QD0PC3X____________//scsi_vhci/disk@gATASEAGATEST37500NSSUN750G0720A0PC3X5QD0PC3X
    label             string    SCSI Device  0
    FRU               fmri      hc://:product-id=SUN-Storage-J4400:chassis-id=2029QTF0809QCK012:server-id=:serial=5QD0PC3X:part=SEAGATE-ST37500NSSUN750G-0720A0PC3X:revision=3.AZK/ses-enclosure=0/bay=0/disk=0
  group: authority                      version: 1   stability: Private/Private
    product-id        string    SUN-Storage-J4400
    chassis-id        string    2029QTF0809QCK012
    server-id         string
  group: io                             version: 1   stability: Private/Private
    devfs-path        string    /scsi_vhci/disk@gATASEAGATEST37500NSSUN750G0720A0PC3X5QD0PC3X
    devid             string    id1,sd@TATA_____SEAGATE_ST37500NSSUN750G_0720A0PC3X_____5QD0PC3X____________
    phys-path         string[]  [ /pci@0,0/pci10de,377@a/pci1000,3150@0/disk@1c,0 /pci@0,0/pci10de,375@f/pci1000,3150@0/disk@1c,0 ]
  group: storage                        version: 1   stability: Private/Private
    logical-disk      string    c0tATASEAGATEST37500NSSUN750G0720A0PC3X5QD0PC3Xd0
    manufacturer      string    SEAGATE
    model             string    ST37500NSSUN750G 0720A0PC3X
    serial-number     string    5QD0PC3X
    firmware-revision string       3.AZK
    capacity-in-bytes string    750156374016
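The verbose output is regular enough to scrape in a pinch. As a rough sketch, assuming the layout shown above (a "group:" line followed by "name type value" property lines), the following Python collects the properties by group. parse_fmtopo_properties is a hypothetical helper written for this post's sample output; the text format is not a stable interface, so treat it as a convenience rather than something to build on.

```python
def parse_fmtopo_properties(text):
    """Collect properties from `fmtopo -V` style text, keyed by group name.

    Returns {group: {property: (type, value)}}. Lines before the first
    "group:" line (the header, timestamp, and node FMRI) are ignored.
    """
    groups = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("group:"):
            # e.g. "group: storage   version: 1   stability: Private/Private"
            current = groups.setdefault(line.split()[1], {})
        elif current is not None and line:
            # Split into name, type, and the rest; the value may be missing
            # (as with server-id) or contain spaces (as with label).
            fields = line.split(None, 2)
            if len(fields) < 2:
                continue
            value = fields[2] if len(fields) > 2 else ""
            current[fields[0]] = (fields[1], value)
    return groups


SAMPLE = """\
group: authority                      version: 1   stability: Private/Private
product-id        string    SUN-Storage-J4400
chassis-id        string    2029QTF0809QCK012
server-id         string
group: storage                        version: 1   stability: Private/Private
manufacturer      string    SEAGATE
serial-number     string    5QD0PC3X
capacity-in-bytes string    750156374016
"""

props = parse_fmtopo_properties(SAMPLE)
print(props["storage"]["serial-number"])  # ('string', '5QD0PC3X')
```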

So what does this mean, other than providing a way for you to finally figure out where disk ‘c3t0d6’ is really located? Currently, it allows the disks to be monitored by the disk-transport fmd module to generate faults based on predictive failure, over-temperature, and self-test failure. The really interesting part is where we go from here. In the near future, thanks to work by Rob Johnston on the sensor framework, we’ll be able to manage LEDs for disks that are part of external enclosures, diagnose failures of power supplies and fans, and read sensor data (such as fan speeds and temperature) as part of a unified framework.

I often like to joke about the amount of time that I have spent just getting a single LED to light. At first glance, it seems like a pretty simple task. But to do it in a generic fashion that can be generalized across a wide variety of platforms, correlated with physically meaningful labels, and integrated with a diverse set of diagnoses (ZFS, SCSI, HBA, etc.) requires an awful lot of work. Once it’s all said and done, however, future platforms will require little to no integration work, and you’ll be able to see a bad drive generate checksum errors in ZFS, which results in an FMA diagnosis indicating the faulty drive, activates a hot spare, and lights the fault LED on the drive bay (wherever it may be). Only then will we have accomplished our goal of an end-to-end storage strategy for Solaris, and hopefully someone besides me will know what it has taken to get that little LED to light.
