My Sun Storage 7310 perf limits

As part of my role in Fishworks, I push systems to their limits to investigate and solve bottlenecks. Limits are useful to consider as a possible upper bound of performance – they show what the target can do. I previously posted my results for the Sun Storage 7410, which is our current top performing product. Today the Sun Storage 7310 has been launched, an entry level offering that can be clustered for high availability environments like the 7410.

The following summarises the performance limits I found for a single 7310 head node, along with the current 7410 results for comparison:

As shown above, the 7310 has very reasonable performance in comparison to the high end 7410. (As of this blog post, I’ve yet to update my 7410 results for the 2009.Q2 software update, which gave about a 5% performance boost).

Next I’ll show the source of these results – screenshots taken from the 7310 using Analytics, to measure how the target NFS server actually performed. I’ll finish by describing the 7310 and clients used to perform these tests.

NFSv3 streaming read from DRAM

To find the fastest read throughput possible, I used 10 Gbytes of files as the working set and had 20 clients, each running two threads, read through them repeatedly with 1 Mbyte I/Os:

Over this 10 minute interval, we’ve averaged 1.08 Gbytes/sec. This includes both inbound NFS requests and network protocol headers, so the actual data transferred will be a little less. Still, breaking 1 Gbyte/sec of delivered NFS for an entry level server is a great result.
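
To make the workload concrete, here is a minimal sketch in Python of what each client was doing. It is not the actual test harness – the mount point, file layout and details are assumptions – just an illustration of two threads per client streaming through the NFS-mounted working set repeatedly in 1 Mbyte reads.

    #!/usr/bin/env python3
    # Hypothetical sketch of the per-client streaming read workload.
    # Assumes the working set is a flat directory of files on an NFS mount.
    import os
    import threading

    MOUNT = "/net/filer/export/ws"   # assumed NFS mount of the working set
    FILES = [os.path.join(MOUNT, f) for f in os.listdir(MOUNT)]
    IO_SIZE = 1024 * 1024            # 1 Mbyte reads
    THREADS_PER_CLIENT = 2

    def stream_reads():
        # Read through the entire working set repeatedly, 1 Mbyte at a time.
        while True:
            for path in FILES:
                with open(path, "rb", buffering=0) as f:
                    while f.read(IO_SIZE):
                        pass

    threads = [threading.Thread(target=stream_reads, daemon=True)
               for _ in range(THREADS_PER_CLIENT)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()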

NFSv3 streaming read from disk

This time 400 Gbytes of files were used as the working set, to minimize caching in the 7310’s 16 Gbytes of DRAM. As before, 20 clients and 2 threads per client read through the working set repeatedly:

This screenshot from Analytics includes the disk bytes/sec, to confirm that this workload really did read from disk. It averaged 780 Mbytes/sec – a solid result. The 792 Mbytes/sec average on the network interfaces includes the NFS requests and network protocol headers.

NFSv3 streaming write to disk

To test streaming writes, 20 clients each ran 5 threads performing writes with a 32 Kbyte I/O size:

While there were 477 Mbytes/sec on the network interfaces (which includes network protocol headers and ACKs), at the disk level the 7310 has averaged 1.09 Gbytes/sec. This is due to software mirroring, which has doubled the data sent to the storage devices (plus ZFS metadata). The actual data bytes written will be a little less than 477 Mbytes/sec.
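
As an illustration of the write side, here is a similar hedged sketch in Python – again not the actual harness, and the paths and per-pass file size are assumptions: five threads per client, each streaming 32 Kbyte writes to its own file on the NFS share.

    #!/usr/bin/env python3
    # Hypothetical sketch of the per-client streaming write workload.
    import os
    import socket
    import threading

    MOUNT = "/net/filer/export/ws"        # assumed NFS mount
    IO_SIZE = 32 * 1024                   # 32 Kbyte writes
    THREADS_PER_CLIENT = 5
    FILE_SIZE = 10 * 1024 ** 3            # assumed 10 Gbytes written per pass

    def stream_writes(thread_id):
        buf = b"\0" * IO_SIZE
        name = "%s-%d.dat" % (socket.gethostname(), thread_id)
        path = os.path.join(MOUNT, name)
        # Rewrite the file repeatedly so the write stream never stops.
        while True:
            with open(path, "wb", buffering=0) as f:
                written = 0
                while written < FILE_SIZE:
                    f.write(buf)
                    written += IO_SIZE

    threads = [threading.Thread(target=stream_writes, args=(i,), daemon=True)
               for i in range(THREADS_PER_CLIENT)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()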

NFSv3 max read IOPS

I tested this with the 7410 as well, more as an experiment than for practical value. This is the maximum rate of 1 byte reads that NFS can deliver to the clients, with the 10 Gbyte working set entirely cached in DRAM. While 1 byte I/O isn’t expected in practice, that doesn’t render this test useless – it gives the absolute upper bound for IOPS:

The 7310 reached 182,282 reads/sec, and averaged around 180K.

NFSv3 4 Kbyte read IOPS from DRAM

For a more realistic test of read IOPS, the following shows an I/O size of 4 Kbytes, and a working set of 10 Gbytes cached in the 7310’s DRAM. Each client runs an additional thread every minute (stepping up the workload), from one to ten threads:

This screenshot shows %CPU utilization, to check that ramping up the workload is pushing some limit on the 7310 (in this case, what is measured as CPU utilization). With 10 threads running per client, the 7310 has served a whopping 110,724 x 4 Kbyte read ops/sec, and still has a little CPU headroom for more. Network bytes/sec was also included in this screenshot to double check the result, which should be at least 432 Mbytes/sec (110,724 x 4 Kbytes) – and it is.
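
For the stepped workload, a minimal Python sketch of how each client could ramp up is shown below – one thread issuing random 4 Kbyte reads across the cached working set, with another thread added every minute up to ten. The mount point, file layout and step interval are assumptions, not the actual harness.

    #!/usr/bin/env python3
    # Hypothetical sketch of the stepped 4 Kbyte read IOPS workload.
    import os
    import random
    import threading
    import time

    MOUNT = "/net/filer/export/ws"   # assumed NFS mount of the 10 Gbyte working set
    FILES = [os.path.join(MOUNT, f) for f in os.listdir(MOUNT)]
    IO_SIZE = 4096                   # 4 Kbyte reads
    MAX_THREADS = 10
    STEP_SECS = 60                   # add another thread every minute

    def random_reads():
        fds = [os.open(p, os.O_RDONLY) for p in FILES]
        sizes = [os.fstat(fd).st_size for fd in fds]
        # Issue random 4 Kbyte reads across the working set forever.
        while True:
            i = random.randrange(len(fds))
            off = random.randrange(max(sizes[i] - IO_SIZE, 1))
            os.pread(fds[i], IO_SIZE, off)

    for n in range(MAX_THREADS):
        threading.Thread(target=random_reads, daemon=True).start()
        time.sleep(STEP_SECS)
    time.sleep(STEP_SECS)            # hold the final ten-thread load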

Configuration

As a filer, I used a single Sun Storage 7310 with the following config:

It’s not a max config system – the 7310 can currently scale to 4 JBODs and 2 sockets of quad-core Opteron.

The clients were 20 blades, each:

These are great, apart from the CPU clock speed – which at 1600 MHz is a little low.

The network consists of multiple 10 GbE switches to connect the client 1 GbE ports to the filer 10 GbE ports.

Conclusion

We’ve spent years working on maximizing performance of our highest end offering, the Sun Storage 7410. New family members of the Sun Storage 7000 series inherit most of this work, and really hit the ground running. When Adam Leventhal designed the 7310 product, we weren’t thinking of a system capable of 1 Gbyte/sec – but as the pieces came together we realized the performance could in fact be very good, and it was. And it’s worth emphasizing – the results above are all from a single socket 7310; the 7310 can be configured with two sockets of quad-core Opteron!

Posted on May 27, 2009 at 6:00 am by Brendan Gregg
In: Fishworks

10 Responses


  1. Written by Dale Sides
    on May 27, 2009 at 1:08 pm
    Permalink

    Are there any performance tests using the 7110? All the performance information I am able to find so far is based on the 7310 and the 7410. I am interested in using the 7110 in a VMware ESX environment and I would like to see some performance numbers of how a small SQL Server Database running in ESX may perform using the 7110 as its primary storage.

  2. Written by William Usher
    on May 28, 2009 at 5:05 am
    Permalink

    I would like to second the request for these benchmarks with the 7110, we like to use them for small VMware projects and want to see how far they will scale up.
    Thanks.

  3. Written by bruce modell
    on May 29, 2009 at 7:47 am
    Permalink

Brendan, were the 7310 tests run on a single socket machine, or was the 2nd processor installed? How much different would performance be with a 2nd CPU? Also, why NFSv3 tests versus NFSv4?

  4. Written by Rich McClellan
    on May 31, 2009 at 12:59 pm
    Permalink

    The performance and Analytics are fantastic. It would be interesting to see how the 7310 and 7410 perform when doing lots of small writes over NFS.

  5. Written by Brendan Gregg
    on May 31, 2009 at 2:51 pm
    Permalink

    @Dale, I’ll see if I can get the basics for the 7110 posted; iSCSI for any platform is also on my todo list.
@Bruce, yes, this was a single socket machine without the 2nd processor. If it had a 2nd processor and I wanted to do single socket testing, I’d have yanked it out (psradm to offline it isn’t sufficient; it could still serve memory I/O.) The second socket will help most (up to 2x) with CPU intensive workloads, such as small I/O, CIFS/iSCSI, compression on shares, and when enabling heavyweight Analytics. There should be some improvement to throughput bound workloads (like the ones I demo’d above) but I doubt it’d be near 2x; those workloads benefit more from improvements to the bus architecture than from extra CPU cycles.
    @Rich, the 7310/7410 should be great with small writes: either they are asynchronous and ZFS aggregates them into transaction groups, or they are synchronous (O_DSYNC) and Logzilla devices can be used as an SSD based intent log. We are working on software to test our products with a matrix of performance metrics, which should answer questions like this (and 7110 numbers, etc.)

  6. Written by Brendan Gregg
    on May 31, 2009 at 3:00 pm
    Permalink

@Bruce; sorry, and to answer why NFSv3 instead of NFSv4: I believe NFSv3 is more commonly used, and happens to be slightly faster with the workloads I test – but that doesn’t mean it is faster for all workloads, or that it will remain this way (protocol performance is constantly improving as we tweak the code.) I posted an example of the current difference here: http://blogs.sun.com/brendan/entry/a_quarter_million_nfs_iops

  7. Written by Jason Ozolins
    on June 4, 2009 at 8:44 pm
    Permalink

    Brendan, I would be very interested to see latency numbers as the IOPS are scaled (i.e. for small # of clients, going out to the full set) for these operations, also for iSCSI to a similar cached target. I’m downloading the VM image to check out the analytics to see if you can produce a useful correlated measurement of both those quantities.
    We have use cases for multiple clients actively banging on the same block devices (QFS data volumes) and 1 well-spec’d active client seeing a block device (QFS metadata partitions), so IOPS and rates for both those iSCSI use cases are interesting to us.
    PS: are you going to the kernel.conf.au at University of Queensland in July? Some web pages say that you’ve been invited to speak but not confirmed, but the conference agenda suggests that you’re not presenting.

  8. Written by Chris Richardson
    on August 12, 2009 at 11:14 am
    Permalink

    Any news on a performance benchmark for the 7110, particularly interested on stats/config over NFS? Thanks

  9. Written by Jim Curran
    on August 21, 2009 at 6:51 am
    Permalink

    Ditto – need some info on iSCSI and NFS performance with the 7110.
