The Observation Deck


Catching disk latency in the act

December 31, 2008

Today, Brendan made a very interesting discovery about the potential sources of disk latency in the datacenter. Here’s a video we made of Brendan explaining (and demonstrating) his discovery:



This may seem silly, but it’s not farfetched: Brendan actually made this discovery while exploring drive latency that he had seen in a lab machine due to a missing screw on a drive bracket. (!) Brendan has more details on the discovery, demonstrating how he used the Fishworks analytics to understand and visualize it.

If this has piqued your curiosity about the nature of disk mechanics, I encourage you to read Jon Elerath’s excellent ACM Queue article, Hard disk drives: the good, the bad and the ugly! As Jon notes, noise is a known cause of what is called a non-repeatable runout (NRRO) — though it’s unclear if Brendan’s shouting is exactly the kind of noise-induced NRRO that Jon had in mind…

19 Responses

  1. Well, it was bound to happen sometime: Bryan’s incredible energy and hyper-speed spoken output has finally driven his co-workers crazy. 😉
    That’s interesting–exactly how much performance degradation were you seeing? And I mean in transactions or megabytes per second, not time lost due to co-workers rolling around the floor laughing…

  2. Derek,
    Check out Brendan’s blog entry where he has the graphs posted; the hit to throughput is tremendous. (Throughput drops from ~1 GB/sec to practically nothing while Brendan is shouting.) The next experiment is obviously to take the biggest amp we can find and see if sustained loud noise will induce sustained high latency and low bandwidth. Another question we don’t know the answer to: is this due to frequency or volume or some combination? Science demands answers! I’m only half kidding, as there is one question which is legitimately on my mind: can the high noise levels found in most data centers be potentially responsible for NRROs? And can high noise levels shorten drive life? If so, are there ways to configure a datacenter such that this issue is either exacerbated or eliminated? Or is there just something magical about Brendan’s primal scream?

  3. Interesting… I wonder if shouting at a single disk would result in as dramatic a drop in performance? (Now I’m going to have to try that.) Also, it looks like Brendan actually touches the drive brackets when he’s shouting. There’s another whole branch of research right there!
    >>can the high noise levels found in most data centers be potentially responsible for NRROs?
    There’s a thought, although I’d guess (as a complete failure at high school physics) that Brendan’s screaming is more intense and focused than the overall hum in a datacenter. The drives are already stabilized to some degree by way of the bracket and the chassis, so how much further can one stabilize a drive without burying it in bricks?
    Do you folks ever sleep? 😉 Keep us posted if you discover anything and have a great (if hoarse) new year.

  4. So does this mean I should go out and buy some noise cancellation headphones for our storage?
    A more serious question: What would be the effect of vibrations of a datacentre next to a major highway or railway when traffic shakes the datacentre/racks? Or when a bus, truck or car hits a manhole cover next to the datacentre.

  5. Wow, I think why this amazes me so much is that it makes sense when you think about it, but to actually have the instruments to measure it… wow. Nice work guys.

  6. Karl … it depends on the ground underneath … there is a reason why chip fabs can’t be built everywhere … there are examples of defect counts correlating with the time of day because of an urban train line near a fab.
    I would assume that you could measure the effect in hard disks as well. But Brendan’s scream seems to be very effective.
    Maybe three hard disks make a formidable seismometer ;)

  7. Now that you have found this on a JBOD, have you tested it on other types of arrays? If so, which ones?
    Just a suggestion:
    1. Use a dB meter to measure your scream.
    2. Use a dB meter to measure the sound of your systems.
    Does the location of the storage device matter, for example whether it sits between other servers or between other storage devices?
    Does placing storage near other noisy datacenter infrastructure, such as diesel generators, affect it the way you have shown here?
    It sounds to me like we need to keep storage devices away from exterior vibrations that could cause data loss. So the question is: should we be thinking about how we lay out our datacenters, given that exterior noise can disrupt data?

  8. > A more serious question: What would be the effect of vibrations of a datacentre next to a
    > major highway or railway when traffic shakes the datacentre/racks? Or when a bus, truck or
    > car hits a manhole cover next to the datacentre.
    I don’t think it’s a concern for two reasons: the vibration won’t be delivered along a focused axis as it was in this case, but more importantly the energy level of the vibration experienced by the drive would be a *lot* less, particularly at the frequencies that are likely to cause problems.
    I’ll let you in on a secret – the Fishworks lab is directly on the corner of two busy streets, with bus stops below. It also has a large bus terminus on the opposite side of the street! It’s not an issue.
    > Sounds like to me for sure we need to keep storage devices away from exterior vibrations
    > that could impact data lost. So question should we be thinking about how we layout our
    > datacenters when exterior noise can cause data disruption?
    Keep in perspective the amount of energy and frequency Brendan was directing (with cupped hands touching the drive) versus the energy level experienced in any typical building due to vibration – it’s not a problem unless your disk drive is in front of the speaker stack at a Van Halen concert.

  9. Yes, a Van Halen or any hard rock concert certainly could pose a similar problem.
    Even Jimi Hendrix’s "Purple Haze" will send some good vibrations. 🙂
    Military environments would certainly need to understand those kinds of impact.
    Besides, datacenters are noisy enough, and the video is proof of that even without the extra screaming at your disks. 🙂
    Also, if you do download that video you need a VLC-compliant player. You can get one from http://download.videolan.org/pub/videolan/vlc/0.9.8a/vlc-0.9.8a.tar.bz2
    Then you can use the YouTube Video Download Tool to get the video.
    http://www.downloadyoutubevideos.com/
    Phillip

  10. Derek, there is a throughput drop for one second – but that’s for the disk subsystem from ZFS, not the delivered performance over NFS. Since this is a heavy streaming write test, ZFS is asynchronously flushing data from DRAM to disk, but the clients don’t wait for that to complete. So whether that takes longer may not affect the client application performance at all (it can a little in this case, as it is a constant streaming write). As for synchronous writes – the 7000 series supports Logzilla, which is an SSD and should be immune to vibration (I assume – I’ve never shouted at an SSD to find out. 🙂)
    It’s also worth noting that we believe that disks are more vulnerable to vibration during writes than reads, since for writes the disk must write the data properly – for reads the data must just pass the sector CRC.
    We doubt that data center noise can cause this – the video is shot in a very noisy data center such that I needed to shout the entire time! And we never notice the tell-tale outlier disk latency caused by vibration just from our data center alone (even when the blade server in the neighboring rack is doing POST, which sounds like a jet aircraft.) We only think this happens if you cup your hands to disks and shout very loudly, as they are doing a heavy write workload.
    Still, I’d rather have Analytics to confirm if vibration is an issue or not – which is what the video is about. People may have extreme circumstances where vibration is an issue, but lack the tools to identify it.
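The "tell-tale outlier disk latency" described in the comment above can be illustrated with a small sketch. This is not how Fishworks Analytics is implemented; it is a hypothetical helper that assumes you already have per-I/O samples as `(timestamp_seconds, latency_ms)` pairs from some tracing tool, and it flags any one-second interval whose worst latency is far above the overall median. The function name `outlier_seconds` and the default `factor` of 10 are assumptions chosen for illustration.

```python
from collections import defaultdict
from statistics import median


def outlier_seconds(samples, factor=10.0):
    """Return {second: worst_latency_ms} for each one-second interval whose
    worst I/O latency exceeds `factor` times the median latency of all samples.

    `samples` is an iterable of (timestamp_seconds, latency_ms) pairs.
    """
    samples = list(samples)  # allow generators; we iterate more than once
    if not samples:
        return {}

    # Group latencies by the whole second in which the I/O was observed.
    per_second = defaultdict(list)
    for timestamp, latency_ms in samples:
        per_second[int(timestamp)].append(latency_ms)

    # Use the median of all samples as the "normal" latency baseline.
    baseline = median(latency for _, latency in samples)
    threshold = factor * baseline

    # Keep only the seconds whose worst-case latency stands out.
    return {
        second: max(latencies)
        for second, latencies in sorted(per_second.items())
        if max(latencies) > threshold
    }
```

A per-second maximum is used rather than an average because, as the comment notes, the signature of vibration is a handful of very slow I/Os, which an average would mostly hide.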

  11. Brendan,
    I would think SSD would never have this issue because it has no moving parts. Still, there are cases where I/O is so heavy that caching no longer makes sense; I have seen folks turn caching OFF simply because of this.
    So with caching turned on, I would think you get the vibration issue regardless, versus a drive that is ENTIRELY SSD, where vibration problems should never occur.
    The only thing you have to worry about with vibration is how well the memory is secured in the drive unit itself. Why? It depends on the position of the memory SIMMs in the drive. Example: memory placed flat, as on motherboards, versus placed vertically in a daughterboard configuration. Or perhaps having no sockets at all, with everything soldered to the board, would be a better solution. Most SSDs are still using 200- to 240-pin DIMM sockets. Simply put, if a tech doesn’t lock the modules down securely, that can be a reason why memory errors occur, even in NON-datacenter environments.
    I tend to agree with you about the testing. I don’t know what tool you’re using to generate the I/O (it could be dd, bonnie, Medusa tools, iometer, vdbench or others), but a different tool might test the drives better and show whether you get the same results.

  12. Try generating a simple sine wave into a .wav file and play that out your laptop into an iPod boombox… it will save Brendan’s voice, and permit more reproducible experiments. I’m curious as to which frequencies cause the problem; the acceleration due to sound waves is clearly preventing the heads from settling…
    – Bart
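Bart's suggestion of a sine wave in a .wav file can be sketched with nothing more than the Python standard library. This is a minimal sketch, not something used in the original experiment: the function name `write_sine_wav`, the 44.1 kHz sample rate, and the default amplitude are all assumptions chosen for illustration.

```python
import math
import struct
import wave


def write_sine_wav(path, freq_hz, duration_s=5.0, sample_rate=44100, amplitude=0.8):
    """Write a mono, 16-bit PCM WAV file containing a pure sine tone.

    `amplitude` is a fraction of full scale (0.0 to 1.0).
    Returns the number of audio frames written.
    """
    n_frames = int(duration_s * sample_rate)
    peak = int(amplitude * 32767)  # 32767 is the largest signed 16-bit sample

    frames = bytearray()
    for i in range(n_frames):
        sample = int(peak * math.sin(2.0 * math.pi * freq_hz * i / sample_rate))
        frames += struct.pack("<h", sample)  # little-endian signed 16-bit

    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)       # mono
        wav.setsampwidth(2)       # 2 bytes per sample = 16-bit
        wav.setframerate(sample_rate)
        wav.writeframes(bytes(frames))

    return n_frames
```

Calling it once per frequency, for example `write_sine_wav("tone_440.wav", 440.0)`, gives a set of files that can be played back one at a time while watching disk latency, which would help separate the effect of frequency from the effect of volume.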

  13. I’ve seen prototype disk arrays where the disks next to the fan had worse performance than the other disks due to vibration issues, too. Took a while to figure out the details there, a pity we didn’t have Fishworks then.
