Adam Leventhal's blog


Number 11 of 20: libumem

July 13, 2004

go to the Solaris 10 top 11-20 list for more

libumem

In Solaris 2.4 we replaced the old buddy allocator[1] with the slab allocator[2] invented by Jeff Bonwick. The slab allocator is covered in pretty much every operating systems textbook — and that’s because most operating systems are now using it. In Solaris 10[3], Jonathan Adams brought the slab allocator to user-land in the form of libumem[4].

Getting started with libumem is easy; just do the linker trick of setting LD_PRELOAD to “libumem.so” and any program you execute will use libumem’s malloc(3C) and free(3C) (or new and delete if you’re into that sort of thing). Alternatively, if you like what you see, you can start linking your programs against libumem by passing -lumem to your compiler or linker. But I’m getting ahead of myself; why is libumem so great?

Scalability

The slab allocator is designed for systems with many threads and many CPUs. Memory allocation with naive allocators can be a serious bottleneck (in fact, we recently used DTrace to find such a bottleneck; switching to libumem got us a 50% improvement). There are other highly scalable allocators out there, but libumem matches or beats them on performance, has compelling debugging features, and is free and fully supported by Sun.

Debugging

The scalability and performance are impressive, but not unique to libumem; where libumem really sets itself apart is in debugging. If you’ve ever spent more than 20 seconds debugging heap corruption or chasing down a memory leak, you need libumem. Once you’ve used libumem it’s hard to imagine debugging this sort of problem without it.

You can use libumem to find double-frees, use-after-free, and many other problems, but my favorite is memory leaks. Memory leaks can really be a pain especially in large systems; libumem makes leaks easy to detect, and easy to diagnose. Here’s a simple example:

$ LD_PRELOAD=libumem.so
$ export LD_PRELOAD
$ UMEM_DEBUG=default
$ export UMEM_DEBUG
$ /usr/bin/mdb ./my_leaky_program
> ::sysbp _exit
> ::run
mdb: stop on entry to _exit
mdb: target stopped at:
libc.so.1`exit+0x14:    ta        8
mdb: You've got symbols!
mdb: You've got symbols!
Loading modules: [ ld.so.1 libumem.so.1 libc.so.1 ]
> ::findleaks
CACHE     LEAKED   BUFCTL CALLER
0002c508       1 00040000 main+4
----------------------------------------------------------------------
Total       1 buffer, 24 bytes
> 00040000::bufctl_audit
ADDR  BUFADDR    TIMESTAMP THR  LASTLOG CONTENTS    CACHE     SLAB     NEXT
DEPTH
00040000 00039fc0 3e34b337e08ef   1 00000000 00000000 0002c508 0003bfb0 00000000
5
libumem.so.1`umem_cache_alloc+0x13c
libumem.so.1`umem_alloc+0x60
libumem.so.1`malloc+0x28
main+4
_start+0x108

Obviously, this is a toy leak, but you get the idea, and it’s really that simple to find memory leaks. Other utilities exist for debugging memory leaks, but they dramatically impact performance (to the point where it’s difficult to actually run the thing you’re trying to debug), and can omit or incorrectly identify leaks. Do you have a memory leak today? Go download Solaris Express, slap your app on it and run it under libumem. I’m sure it will be well worth the time spent.
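
For the curious, a toy program that produces a report like the one above can be as small as this (a hypothetical example, not the actual program from the session; the 24-byte malloc matches the leaked buffer reported above):

#include <stdlib.h>

int
main(void)
{
        /*
         * A hypothetical leaky program: allocate a small buffer and
         * never free it. Run under LD_PRELOAD=libumem.so with
         * UMEM_DEBUG=default, and ::findleaks will report the buffer
         * along with its allocation stack (malloc -> umem_alloc -> ...).
         */
        void *leak = malloc(24);        /* 24 bytes, like the report above */
        (void) leak;
        return (0);                     /* exit without freeing */
}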

You can use other mdb dcmds like ::umem_verify to look for corruption. The kernel versions of these dcmds are described in the Solaris Modular Debugger Guide today; we’ll be updating the documentation for Solaris 10 to describe all the libumem debugging commands.

Programmatic Interface

In addition to offering the well-known malloc() and free(), libumem also has a programmatic interface for creating your own object caches backed by the heap, memory-mapped files, or whatever you like. This offers additional flexibility and precision and allows you to further optimize your application around libumem. Check out the man pages for umem_alloc() and umem_cache_alloc() for all the details.
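
To give a flavor of the interface, here’s a minimal sketch (the request_t type, the constructor, and the cache name are invented for illustration; see umem_cache_create(3MALLOC) and umem_alloc(3MALLOC) for the full contract):

#include <umem.h>
#include <string.h>

/* A made-up object type for illustration. */
typedef struct request {
        int     r_id;
        char    r_buf[64];
} request_t;

/* Optional constructor: invoked when the cache grabs fresh memory. */
static int
request_construct(void *buf, void *ignored, int flags)
{
        memset(buf, 0, sizeof (request_t));
        return (0);
}

int
main(void)
{
        /* Build with -lumem, as described above; the cache name is made up. */
        umem_cache_t *rcache = umem_cache_create((char *)"request_cache",
            sizeof (request_t), 0, request_construct, NULL, NULL, NULL,
            NULL, 0);

        request_t *r = (request_t *)umem_cache_alloc(rcache, UMEM_DEFAULT);
        if (r != NULL) {
                r->r_id = 1;
                umem_cache_free(rcache, r);
        }

        /* Plain heap allocation; note that umem_free() takes the size. */
        void *p = umem_alloc(128, UMEM_DEFAULT);
        if (p != NULL)
                umem_free(p, 128);

        umem_cache_destroy(rcache);
        return (0);
}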

Summary

Libumem is a hugely important feature in Solaris 10 that just slipped off the top 10 list, but I doubt there’s a Solaris user (or soon-to-be Solaris user) who won’t fall in love with it. I’ve only just touched on what you can do with libumem, but Jonathan Adams (libumem’s author) will soon be joining the ranks of blogs.sun.com to tell you more. Libumem is fast, it makes debugging a snap, it’s easy to use, and you can get down and dirty with its expanded API — what else could anyone ask for in an allocator?

1. Jeff’s USENIX paper is definitely worth a read
2. For more about Solaris history and the internals of the slab allocator, check out Solaris Internals
3. Actually, Jonathan slipped libumem into Solaris 9 Update 3 so you might have had libumem all this time and not known…
4. Jeff and Jonathan wrote a USENIX paper about some additions to the allocator and its extension to user-land in the form of libumem

7 Responses

  1. libumem is OK as far as it goes, but there is a whole class of errors that it can’t catch. Compared to tools like purify, valgrind or the tools included in the Forte compilers it’s pretty much a toy. Sorry.

  2. The big difference is that you can run libumem in production with acceptable overhead. I haven’t played with purify much, but I’ve been told by several customers that it can be _painfully_ slow and can actually miss real leaks and incorrectly flag non-leaks. Libumem is something you can use in production — not just in your development environment.

  3. I love comments like this – “…there is a whole class of errors that it can’t catch.” Please give an example. I have used libumem on a number of occasions with code that has been “purified”. It amazes me to find the leaks that were flagged only as possible leaks (and therefore usually ignored). I have never had Purify identify anything after using libumem on an application – and this is based on numerous engagements.
    As Adam says, the probe effect of Purify is so high (not to mention you have to run an altered version of your app) that you cannot get quality information. I have seen the performance impact of purify be more than 10X for applications with high allocation rates.
    Libumem is obviously far more than a toy.

  4. I’m also going to chime in on this — purify and valgrind are fine tools, but they are in a very different design space than libumem. libumem is designed to be a low-overhead, highly scalable allocator with memory debugging features which are usable in a production environment.
    Neither purify nor valgrind (nor the Forte tools, for that matter) have low enough overhead (in time *or* space) to run on a production system, nor were they initially designed with scalability in mind — as Jarod points out, Purify’s overhead can be so ruinous that the system cannot even reach the state where the problems occur.

  5. I can see both sides here; I’d reinforce Jonathan’s comment about purify/valgrind being a very different kind of beast.

    Both purify and valgrind are excellent pieces of technology, and they can both catch a wide family of problems. But, as others have noted, the intrusive nature makes running them on a big application a trial of patience – I’d say Jarod’s metric of 10x worse performance is a fairly typical figure! Running them in production is out of the question, but for the developer, purify in particular presents what you want to know in a very nice format.

    Perhaps more subtly, the intrusiveness makes both tools very fragile – new compiler or linker or C++ runtime patches from Sun will invariably break purify. And I can’t get valgrind working properly with NPTL on RHEL3.

    Having said that, there are some things which a malloc debug library (and I know I’m belittling libumem by calling it that) isn’t going to catch, by definition – overruns in buffers which are on the stack, for instance.

    libumem sounds fantastic, particularly if the performance is better than watchmalloc. Slab allocation is also something which can be a big boost in terms of runtime performance and reduced memory usage, especially in a C++ program where you’re using lots of small objects. Maybe someone should put together a FAQ showing how to overload new and delete so that a class uses a libumem cache for its allocation? (A rough sketch along those lines follows after this comment.)

    Presumably for heap-based buffer overruns, libumem uses watchpoints, just like watchmalloc, so the performance would be about the same?
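
Picking up on that last suggestion, here is a minimal, hypothetical sketch of a class whose operator new and operator delete draw from a umem object cache (the Packet class, its cache name, and the init_cache() helper are all invented for illustration, and the cache setup isn’t shown as thread-safe):

#include <umem.h>
#include <new>

class Packet {
public:
        /* Create the cache once, before any Packet is allocated. */
        static void init_cache() {
                cache_ = umem_cache_create((char *)"Packet_cache",
                    sizeof (Packet), 0, NULL, NULL, NULL, NULL, NULL, 0);
        }

        static void *operator new(size_t size) {
                /* Fall back to the global heap for derived classes. */
                if (size != sizeof (Packet))
                        return ::operator new(size);
                void *p = umem_cache_alloc(cache_, UMEM_DEFAULT);
                if (p == NULL)
                        throw std::bad_alloc();
                return p;
        }

        static void operator delete(void *p, size_t size) {
                if (p == NULL)
                        return;
                if (size != sizeof (Packet))
                        ::operator delete(p);
                else
                        umem_cache_free(cache_, p);
        }

private:
        static umem_cache_t *cache_;
        int data_[16];          /* made-up payload */
};

umem_cache_t *Packet::cache_ = NULL;

int
main()
{
        Packet::init_cache();
        Packet *p = new Packet;         /* allocated from the umem cache */
        delete p;                       /* returned to the cache */
        return (0);
}

A real FAQ entry would also need to cover array new/delete and cache teardown, but this is the basic shape.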
