ZFS L2ARC
An exciting new ZFS feature has now become publicly known: the second level ARC, or L2ARC. I’ve been busy with its development for over a year, and this is my first chance to post about it. This post shows a quick example and answers some basic questions.
Background in a nutshell
The “ARC” is the ZFS main memory cache (in DRAM), which can be accessed with sub-microsecond latency. An ARC read miss normally reads from disk, at millisecond latency (especially for random reads). The L2ARC sits in-between, extending the main memory cache using fast storage devices – such as flash memory based SSDs (solid state disks).
Some example sizes to put this into perspective, from a lab machine named “walu”:
Layer          Medium      Total Capacity
ARC            DRAM        128 Gbytes
L2ARC          6 x SSDs    550 Gbytes
Storage Pool   44 Disks    17.44 Tbytes (mirrored)
For this server, the L2ARC allows around 650 Gbytes to be stored in the total ZFS cache (ARC + L2ARC), rather than just DRAM with about 120 Gbytes.
A previous ZFS feature (the ZIL) allowed you to add SSD disks as log devices to improve write performance. This means ZFS provides two dimensions for adding flash memory to the file system stack: the L2ARC for random reads, and the ZIL for writes.
Adam has been the mastermind behind our flash memory efforts, and has written an excellent article in Communications of the ACM about flash memory based storage in ZFS; for more background, check it out.
L2ARC Example
To illustrate the L2ARC with an example, I’ll use walu – a medium sized server in our test lab, which was briefly described above. Its ZFS pool of 44 x 7200 RPM disks is configured as a 2-way mirror, to provide both good reliability and performance. It also has 6 SSDs, which I’ll add to the ZFS pool as L2ARC devices (or “cache devices”).
I should note – this is an example of L2ARC operation, not a demonstration of the maximum performance that we can achieve (the SSDs I’m using here aren’t the fastest I’ve ever used, nor the largest.)
20 clients access walu over NFSv3, and execute a random read workload with an 8 Kbyte record size across 500 Gbytes of files (which is also its working set).
1) disks only
Since the 500 Gbytes of working set is larger than walu’s 128 Gbytes of DRAM, the disks must service many requests. One way to grasp how this workload is performing is to examine the IOPS that the ZFS pool delivers:
walu# zpool iostat pool_0 30
               capacity     operations    bandwidth
pool         used  avail   read  write   read  write
----------  -----  -----  -----  -----  -----  -----
pool_0      8.38T  9.06T     95      4   762K  29.1K
pool_0      8.38T  9.06T  1.87K     15  15.0M  30.3K
pool_0      8.38T  9.06T  1.88K      3  15.1M  20.4K
pool_0      8.38T  9.06T  1.89K     16  15.1M  39.3K
pool_0      8.38T  9.06T  1.89K      4  15.1M  23.8K
[...]
The pool is pulling about 1.89K ops/sec, which would require about 42 ops per disk of this pool. To examine how this is delivered by the disks, we can either use zpool iostat or the original iostat:
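As a quick sanity check, the per-disk rate follows from simple division (numbers taken from the zpool iostat output above and the pool description):

```python
# Rough per-disk IOPS estimate for the disk-only run.
pool_read_ops = 1890   # ~1.89K reads/sec reported by zpool iostat
disks = 44             # 44 x 7200 RPM disks in the mirrored pool

ops_per_disk = pool_read_ops / disks
print(ops_per_disk)    # roughly 42-43 random reads/sec per disk
```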
walu# iostat -xnz 10
[...trimmed first output...]
                    extended device statistics
    r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
   43.9    0.0  351.5    0.0  0.0  0.4    0.0   10.0   0  34 c0t5000CCA215C46459d0
   47.6    0.0  381.1    0.0  0.0  0.5    0.0    9.8   0  36 c0t5000CCA215C4521Dd0
   42.7    0.0  349.9    0.0  0.0  0.4    0.0   10.1   0  35 c0t5000CCA215C45F89d0
   41.4    0.0  331.5    0.0  0.0  0.4    0.0    9.6   0  32 c0t5000CCA215C42A4Cd0
   45.6    0.0  365.1    0.0  0.0  0.4    0.0    9.2   0  34 c0t5000CCA215C45541d0
   45.0    0.0  360.3    0.0  0.0  0.4    0.0    9.4   0  34 c0t5000CCA215C458F1d0
   42.9    0.0  343.5    0.0  0.0  0.4    0.0    9.9   0  33 c0t5000CCA215C450E3d0
   44.9    0.0  359.5    0.0  0.0  0.4    0.0    9.3   0  35 c0t5000CCA215C45323d0
   45.9    0.0  367.5    0.0  0.0  0.5    0.0   10.1   0  37 c0t5000CCA215C4505Dd0
[...etc...]
iostat is interesting as it lists the service times: wsvc_t + asvc_t. These I/Os are taking on average between 9 and 10 milliseconds to complete, which the client application will usually suffer as latency. This time will be due to the random read nature of this workload – each I/O must wait as the disk heads seek and the disk platter rotates.
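A rough model of where those milliseconds come from: an average rotational delay of half a revolution, plus a seek. The 5 ms seek figure below is an assumed typical value, not a measurement from this system:

```python
# Where ~9-10 ms of service time can come from on a 7200 RPM disk.
rpm = 7200
full_rotation_ms = 60_000 / rpm         # 8.33 ms per revolution
avg_rotation_ms = full_rotation_ms / 2  # ~4.2 ms average rotational delay
avg_seek_ms = 5.0                       # assumed typical seek time

print(avg_rotation_ms + avg_seek_ms)    # lands in the 9-10 ms range seen by iostat
```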
Another way to understand this performance is to examine the total NFSv3 ops delivered by this system (these days I use a GUI to monitor NFSv3 ops, but for this blog post I’ll hammer nfsstat into printing something concise):
walu# nfsstat -v 3 1 | sed '/^Server NFSv3/,/^[0-9]/!d'
[...]
Server NFSv3:
calls     badcalls
2260      0
Server NFSv3:
calls     badcalls
2306      0
Server NFSv3:
calls     badcalls
2239      0
[...]
That’s about 2.27K ops/sec for NFSv3; I’d expect 1.89K of that to be what our pool was delivering, and the rest are cache hits out of DRAM, which is warm at this point.
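The arithmetic behind that estimate:

```python
# NFSv3 ops served vs. ops reaching the pool (values from the outputs above).
nfs_ops = 2270    # ~2.27K NFSv3 ops/sec from nfsstat
pool_ops = 1890   # ~1.89K ops/sec from zpool iostat

dram_hits = nfs_ops - pool_ops
print(dram_hits)  # ~380 ops/sec served from the DRAM-based ARC
```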
2) L2ARC devices
Now the 6 SSDs are added as L2ARC cache devices:
walu# zpool add pool_0 cache c7t0d0 c7t1d0 c8t0d0 c8t1d0 c9t0d0 c9t1d0
And we wait until the L2ARC is warm.
Time passes …
Several hours later the cache devices have warmed up enough to satisfy most I/Os which miss main memory. The combined ‘capacity/used’ column for the cache devices shows that our 500 Gbytes of working set now exists on those 6 SSDs:
walu# zpool iostat -v pool_0 30
[...]
                              capacity     operations    bandwidth
pool                        used  avail   read  write   read  write
-------------------------  -----  -----  -----  -----  -----  -----
pool_0                     8.38T  9.06T     30     14   245K  31.9K
  mirror                    421G   507G      1      0  9.44K      0
    c0t5000CCA216CCB905d0      -      -      0      0  4.08K      0
    c0t5000CCA216CCB74Cd0      -      -      0      0  5.36K      0
  mirror                    416G   512G      0      0  7.66K      0
    c0t5000CCA216CCB919d0      -      -      0      0  4.34K      0
    c0t5000CCA216CCB763d0      -      -      0      0  3.32K      0
[... 40 disks truncated ...]
cache                          -      -      -      -      -      -
  c7t0d0                   84.5G  8.63G  2.63K      0  21.1M  11.4K
  c7t1d0                   84.7G  8.43G  2.62K      0  21.0M      0
  c8t0d0                   84.5G  8.68G  2.61K      0  20.9M      0
  c8t1d0                   84.8G  8.34G  2.64K      0  21.1M      0
  c9t0d0                   84.3G  8.81G  2.63K      0  21.0M      0
  c9t1d0                   84.2G  8.91G  2.63K      0  21.0M  1.53K
-------------------------  -----  -----  -----  -----  -----  -----
The pool_0 disks are still serving some requests (in this output 30 ops/sec) but the bulk of the reads are being serviced by the L2ARC cache devices – each providing around 2.6K ops/sec. The total delivered by this ZFS pool is 15.8K ops/sec (pool disks + L2ARC devices), about 8.4x faster than with disks alone.
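Summing the per-device rates from the zpool iostat -v output above confirms the totals:

```python
# Total delivered reads with the L2ARC warm: pool disks + 6 cache devices.
cache_dev_ops = [2630, 2620, 2610, 2640, 2630, 2630]  # ~2.6K ops/sec each
pool_disk_ops = 30

total = pool_disk_ops + sum(cache_dev_ops)
print(total)           # ~15.8K ops/sec in total
print(total / 1890)    # ~8.4x the disk-only 1.89K ops/sec
```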
This is confirmed by the delivered NFSv3 ops:
walu# nfsstat -v 3 1 | sed '/^Server NFSv3/,/^[0-9]/!d'
[...]
Server NFSv3:
calls     badcalls
18729     0
Server NFSv3:
calls     badcalls
18762     0
Server NFSv3:
calls     badcalls
19000     0
[...]
walu is now delivering 18.7K ops/sec, which is 8.3x faster than without the L2ARC.
However, the real win for the client applications is read latency: the disk-only iostat output showed an average between 9 and 10 milliseconds, while the L2ARC cache devices are delivering the following:
walu# iostat -xnz 10
                    extended device statistics
    r/s    w/s    kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
[...]
 2665.0    0.4 21317.2    0.0  0.7  0.7    0.2    0.2  39  67 c9t0d0
 2668.1    0.5 21342.0    3.2  0.6  0.7    0.2    0.2  38  66 c9t1d0
 2665.4    0.4 21320.4    0.0  0.7  0.7    0.3    0.3  42  69 c8t0d0
 2683.6    0.4 21465.9    0.0  0.7  0.7    0.3    0.3  41  68 c8t1d0
 2660.7    0.6 21295.6    3.2  0.6  0.6    0.2    0.2  36  65 c7t1d0
 2650.7    0.4 21202.8    0.0  0.6  0.6    0.2    0.2  36  64 c7t0d0
Our average service time is between 0.4 and 0.6 ms (wsvc_t + asvc_t columns), which is about 20x faster than what the disks were delivering.
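The speedup figure is just the ratio of the two average service times:

```python
# Latency improvement: disk-only vs. L2ARC-backed reads (from iostat above).
disk_latency_ms = (9 + 10) / 2     # disk-only: between 9 and 10 ms
ssd_latency_ms = (0.4 + 0.6) / 2   # L2ARC devices: between 0.4 and 0.6 ms

print(disk_latency_ms / ssd_latency_ms)  # ~19x, i.e. "about 20x"
```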
What this means …
An 8.3x improvement for 8 Kbyte random IOPS across a 500 Gbyte working set is impressive, as is improving storage I/O latency by 20x.
But this isn’t really about the numbers, which will become dated (these SSDs were manufactured in July 2008, by a supplier who is providing us with bigger and faster SSDs every month).
What’s important is that ZFS can make intelligent use of fast storage technology, in different roles, to maximize its benefit. When you hear of new SSDs with incredible read ops/sec, picture them as your L2ARC; when you hear of great write throughput, picture them as your ZIL.
The example above was to show that the L2ARC can deliver, over NFS, whatever these SSDs could do. And these SSDs are being used as a second level cache, in-between main memory and disk, to achieve the best price/performance.
Questions
I recently spoke to a customer about the L2ARC and they asked a few questions which may be useful to repeat here:
What is L2ARC?
- The L2ARC is best pictured as a cache layer in-between main memory and disk, using flash memory based SSDs or other fast devices as storage. It holds non-dirty ZFS data, and is currently intended to improve the performance of random read workloads.
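That picture can be sketched as a read checking each layer in turn, promoting back into main memory on an L2ARC hit. This is a hypothetical model for illustration, not the actual arc_read() code:

```python
# Minimal sketch of the two-level cache read path (hypothetical structure;
# the real logic lives in arc_read() in arc.c).
def cached_read(key, arc, l2arc, read_from_disk):
    if key in arc:                  # 1) DRAM hit: sub-microsecond
        return arc[key]
    if key in l2arc:                # 2) SSD hit: sub-millisecond
        arc[key] = l2arc[key]       # promote back into the ARC
        return arc[key]
    data = read_from_disk(key)      # 3) miss both: millisecond disk read
    arc[key] = data
    return data
```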
Isn’t flash memory unreliable? What have you done about that?
- It’s getting much better, but we have designed the L2ARC to handle errors safely. The data stored on the L2ARC is checksummed, and if the checksum is wrong or the SSD reports an error, we defer that read to the original pool of disks. Enough errors and the L2ARC device will offline itself. I’ve even yanked out busy L2ARC devices on live systems as part of testing, and everything continues to run.
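A minimal sketch of that verify-or-fall-back behaviour. This is a hypothetical helper using CRC32 for brevity; real ZFS uses its own block checksums:

```python
import zlib

# Serve from the L2ARC only if the checksum verifies; otherwise defer the
# read to the original pool of disks (hypothetical model, not arc.c code).
def l2arc_read(key, l2arc, read_from_pool):
    entry = l2arc.get(key)
    if entry is not None:
        data, checksum = entry
        if zlib.crc32(data) == checksum:   # checksum good: serve from SSD
            return data
    return read_from_pool(key)             # bad checksum, error, or miss
```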
Aren’t SSDs really expensive?
- They used to be, but their price/performance has now reached the point where it makes sense to start using them in the coming months. See Adam’s ACM article for more details about price/performance.
What about writes – isn’t flash memory slow to write to?
- The L2ARC is coded to write to the cache devices asynchronously, so write latency doesn’t affect system performance. This allows us to use “read-bias” SSDs for the L2ARC, which have the best read latency (and slow write latency).
What’s bad about the L2ARC?
- It was designed to either improve performance or do nothing, so there isn’t anything that should be bad. To explain what I mean by do nothing – if you use the L2ARC for a streaming or sequential workload, then the L2ARC will mostly ignore it and not cache it. This is because the default L2ARC settings assume you are using current SSD devices, where caching random read workloads is most favourable; with future SSDs (or other storage technology), we can use the L2ARC for streaming workloads as well.
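The eligibility idea can be sketched as follows. This is a hypothetical model of the l2arc_noprefetch behaviour, not the actual arc.c check:

```python
# Buffers that arrived via prefetch (streaming/sequential reads) are not
# cached when l2arc_noprefetch is set; dirty buffers are never cached.
def l2arc_eligible(buf, l2arc_noprefetch=True):
    if l2arc_noprefetch and buf.get("prefetched"):
        return False                # leave sequential data on disk
    return not buf.get("dirty")     # the L2ARC never holds dirty data
```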
Internals
If anyone is interested, I wrote a summary of L2ARC internals as a block comment in usr/src/uts/common/fs/zfs/arc.c, alongside the actual implementation code. The block comment is below (see the source for the latest version), and is an excellent reference for how it really works:
/*
 * Level 2 ARC
 *
 * The level 2 ARC (L2ARC) is a cache layer in-between main memory and disk.
 * It uses dedicated storage devices to hold cached data, which are populated
 * using large infrequent writes.  The main role of this cache is to boost
 * the performance of random read workloads.  The intended L2ARC devices
 * include short-stroked disks, solid state disks, and other media with
 * substantially faster read latency than disk.
 *
 *                 +-----------------------+
 *                 |         ARC           |
 *                 +-----------------------+
 *                    |         ^     ^
 *                    |         |     |
 *      l2arc_feed_thread()    arc_read()
 *                    |         |     |
 *                    |  l2arc read   |
 *                    V         |     |
 *               +---------------+    |
 *               |     L2ARC     |    |
 *               +---------------+    |
 *                   |    ^           |
 *          l2arc_write() |           |
 *                   |    |           |
 *                   V    |           |
 *                 +-------+      +-------+
 *                 | vdev  |      | vdev  |
 *                 | cache |      | cache |
 *                 +-------+      +-------+
 *                 +=========+     .-----.
 *                 :  L2ARC  :    |-_____-|
 *                 : devices :    | Disks |
 *                 +=========+    `-_____-'
 *
 * Read requests are satisfied from the following sources, in order:
 *
 *	1) ARC
 *	2) vdev cache of L2ARC devices
 *	3) L2ARC devices
 *	4) vdev cache of disks
 *	5) disks
 *
 * Some L2ARC device types exhibit extremely slow write performance.
 * To accommodate for this there are some significant differences between
 * the L2ARC and traditional cache design:
 *
 * 1. There is no eviction path from the ARC to the L2ARC.  Evictions from
 * the ARC behave as usual, freeing buffers and placing headers on ghost
 * lists.  The ARC does not send buffers to the L2ARC during eviction as
 * this would add inflated write latencies for all ARC memory pressure.
 *
 * 2. The L2ARC attempts to cache data from the ARC before it is evicted.
 * It does this by periodically scanning buffers from the eviction-end of
 * the MFU and MRU ARC lists, copying them to the L2ARC devices if they are
 * not already there.  It scans until a headroom of buffers is satisfied,
 * which itself is a buffer for ARC eviction.  The thread that does this is
 * l2arc_feed_thread(), illustrated below; example sizes are included to
 * provide a better sense of ratio than this diagram:
 *
 *	       head -->                        tail
 *	        +---------------------+----------+
 *	ARC_mfu |:::::#:::::::::::::::|o#o###o###|-->.   # already on L2ARC
 *	        +---------------------+----------+   |   o L2ARC eligible
 *	ARC_mru |:#:::::::::::::::::::|#o#ooo####|-->|   : ARC buffer
 *	        +---------------------+----------+   |
 *	             15.9 Gbytes      ^ 32 Mbytes    |
 *	                           headroom          |
 *	                                      l2arc_feed_thread()
 *	                                             |
 *	                 l2arc write hand <--[oooo]--'
 *	                         |           8 Mbyte
 *	                         |          write max
 *	                         V
 *		  +==============================+
 *	L2ARC dev |####|#|###|###|    |####| ... |
 *	          +==============================+
 *	                     32 Gbytes
 *
 * 3. If an ARC buffer is copied to the L2ARC but then hit instead of
 * evicted, then the L2ARC has cached a buffer much sooner than it probably
 * needed to, potentially wasting L2ARC device bandwidth and storage.  It is
 * safe to say that this is an uncommon case, since buffers at the end of
 * the ARC lists have moved there due to inactivity.
 *
 * 4. If the ARC evicts faster than the L2ARC can maintain a headroom,
 * then the L2ARC simply misses copying some buffers.  This serves as a
 * pressure valve to prevent heavy read workloads from both stalling the ARC
 * with waits and clogging the L2ARC with writes.  This also helps prevent
 * the potential for the L2ARC to churn if it attempts to cache content too
 * quickly, such as during backups of the entire pool.
 *
 * 5. After system boot and before the ARC has filled main memory, there are
 * no evictions from the ARC and so the tails of the ARC_mfu and ARC_mru
 * lists can remain mostly static.  Instead of searching from tail of these
 * lists as pictured, the l2arc_feed_thread() will search from the list heads
 * for eligible buffers, greatly increasing its chance of finding them.
 *
 * The L2ARC device write speed is also boosted during this time so that
 * the L2ARC warms up faster.  Since there have been no ARC evictions yet,
 * there are no L2ARC reads, and no fear of degrading read performance
 * through increased writes.
 *
 * 6. Writes to the L2ARC devices are grouped and sent in-sequence, so that
 * the vdev queue can aggregate them into larger and fewer writes.  Each
 * device is written to in a rotor fashion, sweeping writes through
 * available space then repeating.
 *
 * 7. The L2ARC does not store dirty content.  It never needs to flush
 * write buffers back to disk based storage.
 *
 * 8. If an ARC buffer is written (and dirtied) which also exists in the
 * L2ARC, the now stale L2ARC buffer is immediately dropped.
 *
 * The performance of the L2ARC can be tweaked by a number of tunables, which
 * may be necessary for different workloads:
 *
 *	l2arc_write_max		max write bytes per interval
 *	l2arc_write_boost	extra write bytes during device warmup
 *	l2arc_noprefetch	skip caching prefetched buffers
 *	l2arc_headroom		number of max device writes to precache
 *	l2arc_feed_secs		seconds between L2ARC writing
 *
 * Tunables may be removed or added as future performance improvements are
 * integrated, and also may become zpool properties.
 */
Jonathan recently linked to this block comment in a blog entry about flash memory, to show that ZFS can incorporate flash into the storage hierarchy, and here is the actual implementation.
In: Performance · Tagged with: L2ARC, latency, performance, SSD, ZFS
on July 22, 2008 at 10:23 pm
Are you testing any of this with your NBC Olympics web site in August? That could be a great way to prove the benefits? Just an idea.
on July 23, 2008 at 12:42 am
Hello Brendan,
Really fantastic in-depth article there – really enjoyed it!
jason.
on July 23, 2008 at 3:42 am
[Trackback] Brendan Gregg wrote a good piece about the performance of L2ARC in ZFS L2ARC:The pool_0 disks are still serving some requests (in this output 30 ops/sec) but the bulk of the reads are being serviced by the L2ARC cache devices – each providing around 2….
on July 23, 2008 at 4:41 am
Hi Brendan, now I really understand how SSDs can make ZFS faster for writing AND reading. Thanks!
on July 23, 2008 at 5:26 am
Brendan:
Awesome work! This was really an enjoyable read. I really appreciate how clear and concise the explanation of the new L2ARC implementation was. Now the rest of us can’t wait to get our hands on some new SSD systems, which should start hitting the enterprise in the coming months!
on July 23, 2008 at 7:52 pm
Very interesting stuff. I have been doing a lot of experiments with SSDs and other flash devices lately. Could you possibly repeat your experiment using a much larger dataset? It’s not very informative that the L2ARC works well when the working set fits entirely in the cache, and is entirely read-only.
For instance, a 10TB working set with 100GB of flash on the front end would be quite informative.
on July 23, 2008 at 9:49 pm
This is SO COOL, you just made my year. I understand the limitations of SSD, and this is about the best that anyone can ask for. Full use of the SSD, data can be written off to disk, solve the write and random write latency problem…
I also would like to see what the results are for a larger working set size. Also, what would be the effect in a DSS system with a mixed workload… perhaps sequentials get left on spinning disk and randoms in the cache? That would be awesome.
on July 24, 2008 at 5:34 am
Thanks for the positive feedback; here are some individual replies:
Kevin – I don’t know of a plan, but you are right, getting some customer case studies published would really help promote the benefits (I’m sure we will in the coming months.)
Jeffrey – ideally the system will be configured to have enough SSDs to cover the working set, which is why I demo’d that case – it’s what we are aiming for. With today’s SSDs, if your working set is less than 550 Gbytes, then a server such as what I demo’d would be ideal; and this capacity is only getting larger.
Are you sure this is a 10 Tbyte working set – ie, hot data – and not the total database size? 10 Tbytes of random read working set is enormous; and is this a real production server (google cache?). Just curious (yes, I’ve heard of working set possibly getting this large, but it hasn’t been common.)
If my walu server tackled a 10 Tbyte working set, then 550 Gbytes would be cached leaving 9.46 Tbytes uncached. If the workload was uniformly distributed across the working set – which is the worst case – then we’ve just made about 5% of our I/O run much faster, which would be around the expected performance improvement (which, for the cost of SSDs, may be a good deal.) If the workload wasn’t so uniform, then the improvement value can get higher.
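For reference, the arithmetic behind that estimate for a uniformly distributed working set:

```python
# Fraction of a uniformly-accessed 10 Tbyte working set that walu's
# 550 Gbytes of L2ARC could cover.
working_set_gb = 10 * 1024   # 10 Tbytes
cache_gb = 550

hit_ratio = cache_gb / working_set_gb
print(round(hit_ratio * 100, 1))           # ~5% of I/O made much faster
print(round(10 - cache_gb / 1024, 2))      # ~9.46 Tbytes left uncached
```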
So yes, it’s very important to consider working set size. While your database may be dozens of Tbytes, your working set may only be 10s or 100s of Gbytes – and the L2ARC with current SSDs can work very well. But if your working set is much larger somehow, you should try some calculations to estimate what that means.
If I can get the time for a larger than L2ARC run, I’ll post how it looks. I won’t be posting "best possible" results – there are groups at Sun to handle this (and official benchmarks), who will make sure that all tunables are set correctly for maximum performance.
Ken – sequential data (which ZFS will prefetch) is already skipped by the L2ARC and left on disk (it’s the l2arc_noprefetch tunable), leaving random data for the L2ARC. So this should already work.
on July 24, 2008 at 11:06 am
[Trackback] Your story was featured in BeleniX! Here is the link to vote it up and promote it: http://belenix.org/node/178
on August 10, 2008 at 10:48 am
Brendan, thanks for the info. Cache is and will remain a useful tool in the need for speed. I’ve been looking at DRAM and NAND flash SSDs for a while. The capacities and speeds are truly jaw dropping. Using your model, I’d project that the bottom layer will disappear in the near future: NAND flash will be the primary storage, and DRAM SSDs will serve as L2ARC. NAND flash is pushing 100K IOPS; DRAM is over 10X that. Expensive? Yes. But it wasn’t that long ago that disk storage was over $1000/Gbyte. The move to very high speed mass storage is here now. I’d love to see your model run on a 10 Tbyte NAND flash SSD array with 1 Tbyte of DRAM for cache!!! I’d expect around 600K IOPS with today’s parts. That brings up a new set of problems dealing with systems software designed around I/O latency, file system layouts, etc. associated with rotating media, and even process scheduling as I/O times approach context switch times.
on August 14, 2008 at 2:09 am
Hello Brendan,
Really fantastic, now I really understand it
on August 14, 2008 at 2:14 am
[Trackback] fantastic post on ZFS Second Level ARC – L2ARC – Testing Show 8x More Throughput ( Brendan Gregg ).
must read!
on August 15, 2008 at 8:30 am
Which GUI do you use to monitor NFS?
on September 20, 2008 at 12:17 pm
Brendan: Any new data or thoughts on cases where datasets don’t fit into the L2ARC devices? Also, is compression supported while destaging to the devices? Any thoughts on whether that is a good idea?
on October 6, 2008 at 8:40 pm
Brendan,
Jignesh K. Shah thought of another way to use storage tiers:
http://blogs.sun.com/jkshah/entry/zfs_with_cloud_storage_and
He puts the ZFS log and cache on local drives and the next tier bi-coastal using iSCSI…
This ZFS feature is a great tool for distributed computing too… I hope.
on October 17, 2008 at 1:00 pm
[Trackback] EMC and Compellent have both announced support for this type of drive. EMC calls them EFD (Enterprise Flash Drives… because that way they can charge more for them, but they’re still the same ones ;-) ). For EMC, the SSD implementation is like that of any…
on November 14, 2008 at 1:50 pm
Brendan:
thanks for all the info. I have two questions about l2arc:
1)When choosing buffers to write to l2, do you prefer MFU buffers over MRU buffers?
2)After a system restart, does l2arc have a mechanism to reuse the buffers in l2 or are they discarded?
thanks.
on January 2, 2009 at 8:00 am
Would there be any combination (SSD/ZFS/Zones/Xvm/) here that would increase performance for virtual machines?
on February 26, 2009 at 10:11 pm
Excellent post. I particularly liked the Question segment too as the answers are spot on easy to understand.
on April 29, 2009 at 3:39 am
hi Brendan, or others,
When sizing an active/active or active/passive 7410 cluster, are there any thoughts on the number of readzilla’s ?
Our national Sun storage tech mentions that we should size at least 2 readzillas per pool, as he claims Solaris needs to be able to balance reads and writes over two devices. I have not been able to find any docs on this matter yet, and interestingly enough there is a cluster bundle that has only 1 readzilla per head, which seems to indicate that he may be mistaken.
In case of failover in an active/active cluster, does the failover head always need a number of "idle" readzillas equal to the "active" readzillas in the primary head? Or can the failover unit function on the pool of the primary unit without an SSD L2ARC?
Looking forward to your response.
Frans
on June 9, 2009 at 1:48 pm
Tested this "cache" on FreeBSD-8.0-CURRENT:
I created a couple-GB RAM disk and added it to a pool containing data.
It works weirdly: it doesn’t seem to have any visible positive effect with either large files (1 GB) or small files (a 200 MB set in total).
I then nuked the data from the pool.
Unpacking a dataset of small files (200 MB, average size 1-2 KB): yes, there is an effect (several times faster), but the drives get trashed anyway, despite 2 GB being definitely bigger than the 200 MB dataset. But without writing the data to get it cached, nothing good happens at all.
It’s not very tunable either.
on July 3, 2009 at 9:30 am
Thanks for this fantastic article Brendan. Anyone know why on Solaris 10 05/09 i’m getting the error "Operation not supported on this type of pool". I was aware of a bug in an earlier version of Solaris, but understood this to be fixed? zpool upgrade shows I’m running zfs 10, which should support L2ARC.
on July 10, 2009 at 3:41 pm
Re: L2ARC not supported in Solaris 10 05/09. http://opensolaris.org/os/community/zfs/version/10/ mentions it doesn’t work for S10U6, so apparently it’s still not fixed in S10U7 even though both releases support ZFS version 10. This person’s asking for it too, so maybe someone will answer: http://opensolaris.org/jive/thread.jspa?threadID=106865&tstart=0
on July 22, 2009 at 3:11 pm
@FreeBSD/ZFS: Why would you expect better performance from a RAM disk, when you would normally use that RAM in the ARC layer? Essentially you are caching your cache. I can’t make sense of your approach.
on August 21, 2009 at 9:28 pm
I was aware of a bug in an earlier version of Solaris, but understood this to be fixed? zpool upgrade shows I’m running zfs 10, which should support L2ARC.
on August 24, 2009 at 3:14 am
Does the failover head always need a number of "idle" readzillas equal to the "active" readzillas in the primary head?
on September 23, 2009 at 4:31 pm
L2ARC is supported in OpenSolaris 2009.06 and will be supported in Solaris 10 Update 8 (supposedly shipping in Oct or Nov). L2ARC is not natively supported under Solaris 10 Update 6 or Update 7. I haven’t heard whether a future ZFS patch might enable it there.