Logzillas: to mirror or stripe?

The Hybrid Storage Pool integrates flash into the storage hierarchy in two specific ways: as a massive read cache and as fast log devices. For read cache devices, Readzillas, there's no need for redundant configurations: the cache is clean, so the data necessarily also resides on disk. For log devices, Logzillas, redundancy is essential, but how that translates into their configuration can be complicated. How do you decide whether to stripe or mirror?

ZFS intent log devices

Logzillas are used as ZFS intent log devices (slogs in ZFS jargon). For certain synchronous write operations, data is written to the Logzilla so the operation can be acknowledged to the client quickly, before the data is later streamed out to disk. Rather than the milliseconds of latency typical of disks, Logzillas respond in about 100μs. If there's a power failure or system crash before the data can be written to disk, the log is replayed when the system comes back up; this is the only scenario in which Logzillas are read. Under normal operation they are effectively write-only. Unlike Readzillas, Logzillas are integral to data integrity: they are relied upon to preserve data in the case of a system failure.
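At the raw ZFS level, attaching a slog to a pool looks like this (a sketch with hypothetical pool and device names; on the 7000 series appliances this is all driven through the management UI rather than the command line):

```shell
# Attach a single intent-log device ("slog") to an existing pool.
# "tank" and c4t0d0 are hypothetical pool/device names.
zpool add tank log c4t0d0

# The log device appears under its own "logs" section of the pool.
zpool status tank
```

After a crash, any outstanding log records are replayed automatically when the pool is next imported; no administrative action is needed.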

A common misconception is that a non-redundant Logzilla configuration introduces a single point of failure into the system. This is not the case, since the data contained on the log devices is also held in system memory. Though that memory is indeed volatile, data loss could only occur if both the Logzilla and the system failed within a fairly small time window.

Logzilla configuration

While a Logzilla doesn't represent a single point of failure, redundant configurations are still desirable in many situations. The Sun Storage 7000 series implements the Hybrid Storage Pool and offers several different redundant disk configurations. Some of those configurations add a single level of redundancy: mirroring and single-parity RAID. Others provide additional redundancy: triple-mirroring, double-parity RAID, and triple-parity RAID. For disk configurations that provide double-disk redundancy or better, the best practice is to mirror Logzillas to achieve a similar level of reliability. For singly redundant disk configurations, non-redundant Logzillas might suffice, but there are conditions, such as a critically damaged JBOD, that could affect both Logzilla and controller more or less simultaneously. Mirrored Logzillas add protection against such scenarios.
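In raw ZFS terms, the striped and mirrored choices look like this (again a sketch with hypothetical names; the appliance configuration screen makes the same choice behind the scenes):

```shell
# Striped Logzillas: each device is an independent top-level log vdev,
# so writes fan out across both, but neither device is redundant.
zpool add tank log c4t0d0 c4t1d0

# Mirrored Logzillas: half the raw log capacity, but the intent log
# survives the failure of either device.
zpool add tank log mirror c4t0d0 c4t1d0
```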

Note that the Logzilla configuration screen (pictured) includes a column for No Single Point of Failure (NSPF). As previously discussed, Logzillas are never truly a single point of failure; instead, this column refers to the arrangement of Logzillas in JBODs. A value of true indicates that the configuration is resilient against JBOD failure.

The most important factor to consider when deciding between mirrored and striped Logzillas is the consequence of potential data loss. If both Logzillas and controller fail, data will not be corrupted, but the last 5-30 seconds' worth of transactions could be lost. For example, while it typically makes sense to mirror Logzillas in triple-parity RAID configurations, it may be that the data stored is less important and the implications of data loss are not worth the cost of another Logzilla device. Conversely, while a mirrored or single-parity RAID disk configuration provides only a single level of redundancy, the implications of data loss might be such that the redundancy of volatile system memory is insufficient. Just as it's important to choose the appropriate disk configuration for the right balance of performance, capacity, and reliability, it's at least as important to take care and gather data to make an informed decision about Logzilla configurations.
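To make that decision concrete, here is a back-of-envelope sketch (not from the post; every failure rate below is an invented assumption) of the quantity at stake: the chance that a non-redundant Logzilla and the controller both fail within the same transaction window, the only case in which those last seconds of transactions are lost.

```python
# Back-of-envelope estimate of coincident Logzilla + controller failure.
# All MTBF figures are assumptions for illustration, not product specs.

HOURS_PER_YEAR = 24 * 365

def coincident_failure_prob(logzilla_mtbf_hours,
                            controller_mtbf_hours,
                            window_seconds,
                            period_hours=HOURS_PER_YEAR):
    """Approximate probability, over one period, that the controller
    fails within window_seconds of a Logzilla failure (or vice versa),
    assuming independent failures at constant rates."""
    lam_log = 1.0 / logzilla_mtbf_hours      # Logzilla failures per hour
    lam_ctl = 1.0 / controller_mtbf_hours    # controller failures per hour
    window_hours = window_seconds / 3600.0
    # Expected Logzilla failures in the period, times the chance the
    # controller also fails inside the 2*window around each of them.
    return (lam_log * period_hours) * (lam_ctl * 2 * window_hours)

# Assumed numbers: 2M-hour MTBF SSD, 200k-hour MTBF controller,
# 30-second transaction window.
p = coincident_failure_prob(2_000_000, 200_000, 30)
print(f"annual coincident-failure probability: {p:.2e}")
```

Even with these rough assumptions the annual probability comes out vanishingly small, which is exactly why the answer turns on the cost of losing those transactions rather than on the raw likelihood.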

Posted on December 9, 2009 at 11:31 am by ahl · Permalink
In: Fishworks

4 Responses


  1. Written by Marcelo Leal
    on December 9, 2009 at 2:53 pm

    Hello Adam!
    If I understood correctly, you are drawing a parallel between the probability of two disk failures (in the same mirror) and two failures in sequence: slog and controller (7410 head). Is that right? If so, whoever wants a triple-mirror configuration should also mirror the slog devices.
    So, my question is: does the 7410 storage have an SLA like 99.995%? Does that calculation account for the probability of the first scenario (the two disk failures)? It seems to me that two disk failures are more likely than a controller failure right after a slog failure. I think the probability of the latter is so small that it's better to have the performance rather than the redundancy.

  2. Written by Christopher George
    on December 13, 2009 at 10:33 am

    Thank you for answering in detail a question we get asked
    quite frequently. As you said, Logzillas are integral to data
    integrity, so defining a ZFS Intent Log (ZIL) best practice is
    critical. For the high-end segment, we take this one step
    further and recommend that the ZIL not be hard disk based. This
    avoids not only the well-known reliability issues of spinning
    disk but also the complexity of the RAID-based solutions
    required to compensate. Although we do believe NAND (flash)
    based SSDs offer very tangible benefits over disk, the ultimate
    matching of technology to purpose leads to a mirrored NVRAM
    based solution. By nature of the self-contained PCIe card
    design, each offers a completely independent controller in
    addition to storage.
    Christopher George
    DDRdrive LLC

  3. Written by Adam Leventhal
    on December 13, 2009 at 10:04 pm

    @Marcelo A weakness of this post was that it was too qualitative, but the quantitative data was not immediately available to me and I wanted to make some information available. Once the data has been collected and analyzed, I’ll post another update that examines those issues in more depth.
    @Chris Agreed! The requirements for a Logzilla — small capacity, low latency — match very well with NV-DRAM. Indeed, the SSD that we use in the Sun Storage 7000 series combines a fair bit of NV-DRAM with flash in order to achieve much lower latency than other SSDs. PCIe doesn't work for the 7310 and 7410 because clustering requires Logzillas to be dual-attached so that both heads can see them. That's why we've chosen SAS as the interconnect for Logzillas today, even though it doesn't offer the performance of PCIe.

  4. Written by Christopher George
    on December 14, 2009 at 1:50 am

    Excellent point. A PCIe based SSD (DDRdrive X1) is not a solution for
    dual-attached 7310/7410 clustering. With the inherent benefits of a
    native PCIe implementation and recent standout products such as the
    Sun F20 PCIe card, I wonder if such a limitation could be addressed
    in the future?
    Christopher George
    DDRdrive LLC
