libsunw_ssl, or, How SmartOS Avoids Sadness

The recent hubbub around CVE-2014-0160 (aka “heartbleed”) has led to a few questions about SmartOS’s use of OpenSSL in the platform image.  This is actually a very interesting side trip that has nothing to do with cryptography and very little to do with security; instead, it affords an opportunity to talk about ensuring correctness in dynamically-linked library references.  Just to get this out of the way, I’ll note that one of the two versions of OpenSSL delivered by the platform prior to Robert Mustacchi’s change was in fact vulnerable to this attack, but that library is not used to provide any TLS services so there is no way to exploit it.  As we will see, that library is not usable at all by any software other than the platform itself, greatly limiting the potential scope of the problem.  If you’re using SmartOS, you don’t need to worry about the OpenSSL in the platform image; OpenSSL in zones or KVM instances is another matter, and very likely does need attention.

Of greater interest to me is the presence of these files in the platform image:

/lib/64/libsunw_crypto.so.1.0.0
/lib/64/libsunw_ssl.so.1.0.0
/lib/libsunw_crypto.so.1.0.0
/lib/libsunw_ssl.so.1.0.0

You’ll note that there are no compilation symlinks (e.g., libsunw_ssl.so), nor are there any OpenSSL headers in /usr/include; together, that makes it very difficult for anyone to compile software that consumes the interfaces these libraries provide.  We have however gone two steps beyond even that in our efforts to prevent customer software from using these libraries, and our reasons for doing so stem from many years of miserable experience delivering and using third-party libraries in Solaris and more recently illumos.

A Brief History of Sadness

In the very distant past, the Unix operating system was entirely self-contained: everything it relied upon was part of itself, and things that were not part of Unix were simply third-party software that operators could install or not as they saw fit.  This was mostly fine, except that inevitably Bell Labs would not have a monopoly on the creation of software useful as part of an operating system.  In time, the broad spread of innovation combined with changing expectations about what an operating system ought to provide led to the incorporation of various previously third-party software into many common Unix distributions.  To the extent that Unix took over this software as repository of record, this was goodness: Unix grew new capabilities by adding the best software developed by others.  Often, however, that was not the case, and the seeds of sadness were sown.

Fast-forward to the Solaris 10 era.  By then, Solaris was delivering a fairly broad range of software for which the repository of record was outside Sun.  There were two basic reasons for this:

With the retirement of static linking with Solaris 9, these upstream libraries were being delivered as dynamic libraries just like the rest.  Many of them, in order to accommodate the first use case in our list above, were delivered with headers and compilation symlinks as well.  And with those headers and symlinks, the great sadness burst forth and thrived.

Architecturally, these libraries (and other software, though we’re concerned primarily with libraries here) provided interfaces that were not under Sun’s control.  PSARC made some effort to communicate this to customers by requiring that these third-party interfaces generally be classified as External, later amended to Volatile before being collapsed back to Uncommitted.  The gist of this, regardless of the precise terminology, was that customers consuming these interfaces were being told that they could not rely on them to remain compatible across minor releases (or even patches and updates).  In a world in which the would-be consumers of these interfaces were mainly customers writing their own software from scratch, as was often the case with respect to other Solaris libraries in the past, that was usually adequate to address the problem.  But the world had changed.  Most of the software consuming these interfaces now is other third-party software, most of it originally developed on GNU/Linux and often built and tested nowhere else.  Customers, somewhat understandably, wanted to be able to build and use (if not simply install via the OS’s packaging tools) those consumers — and have them work.  Warning people that the interfaces were not to be relied upon was of little help; most customers were not directly aware that they were consuming them at all.  People who were used to building software on a Solaris 2.4 system and running their binaries on everything from 2.4 through 10 were in for a rude surprise when patch releases broke their third-party software.  Worse still, others were frustrated by the lag between injection of a piece of third-party software into the OS and its delivery; the versions of this software included with the operating system were invariably months or even years older than those currently available on the Internet.  
As a result, many pieces of third-party software that consumed these interfaces (but were not delivered with the OS) would not build or work correctly.  The sadness raged throughout the land, ultimately contributing significantly to Solaris’s demise.  Presumably Solaris still has this problem today.

Inside the Sadness

It should be apparent that this architectural model is untenable.  An all-inclusive OS, by contrast, delivers, or attempts to deliver, almost every conceivable library as well as every remotely popular consumer.  Since every piece of software is built consistently and packaged afresh for each release of the distribution, incompatible change in these third-party libraries (or even the OS itself) is of little importance to most customers.  The primary measure of value is how many packages the distribution incorporates, not how well software built from third-party sources works across OS upgrades.  While there are serious problems with this model, customers who stick with the packages provided by their distributor and upgrade frequently will at least in principle get the best of both worlds: recent software and a hassle-free upgrade path.

Similarly, an OS that is entirely self-contained and delivers no extra software (that is, software for which the repository of record is not the OS distributor’s own) is also safe.  While users will be forced to obtain and build whatever software they desire, there is no architectural conflict: the interfaces provided by the OS remain compatible over time, so that the third-party software built by customers continues to work.  This is the historical Unix model; it works well technically but often fails to meet customer expectations.  Specifically, third-party software tends to be of exceptionally bad quality and is often developed by people with no understanding of portability whatsoever (or worse, an active disdain for it); as a result, getting it to build and work correctly is often a massive chore.  Not surprisingly, customers aren’t enthusiastic about doing that themselves.

The origin of the sadness lies in seeking the middle ground: delivering a small subset of the extra software that customers would like to use.  This “third way” is an architectural disaster, especially when combined with dynamic linking.  Early Windows users may recall this class of problems as “DLL Hell”.  Instead of offering the best of both worlds (plenty of extra software packages, low maintenance burden for the vendor, flexibility for the operator), it not only delivers the worst of both but introduces additional problems all its own:

At the core of the sadness are two fundamental problems:

The first objective is relatively easy to achieve.  If there is only one version of a library on the system, it’s simply a matter of ensuring that some combination of the consumer’s DT_RPATH and the system’s crle(1) configuration contains the path to the library, that the library’s filename matches the DT_SONAME of the library against which the consumer was built (because the consumer’s DT_NEEDED entry is recorded from it), and that the library is actually present on the system.  It is easy to satisfy these constraints for all OS-delivered software; illumos’s build system generally does all of this, as does SmartOS’s.  All that’s left for the end user to do is make sure LD_LIBRARY_PATH is not set, which sadly seems to be more difficult than one would expect (this is further complicated by third-party software that delivers shell scripts that explicitly set this environment variable even though it’s almost never necessary or appropriate).
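To make those constraints concrete, here is a rough Python model of how a single DT_NEEDED entry gets resolved.  It is only a sketch: the search order (LD_LIBRARY_PATH, then the consumer's recorded runpath, then the default directories that crle(1) can reconfigure) is a simplification of what the runtime linker really does, and the function name, the installed set, and the /opt/vendor/lib path are all made up for illustration.

```python
import posixpath


def resolve_needed(needed, runpath, default_dirs, exists, ld_library_path=None):
    """Return the first path at which `needed` is found, or None.

    Simplified search order (an assumption of this sketch):
    LD_LIBRARY_PATH, then the consumer's recorded runpath, then the
    system default directories.
    """
    search = []
    if ld_library_path:
        search.extend(ld_library_path.split(":"))
    search.extend(runpath)
    search.extend(default_dirs)
    for directory in search:
        candidate = posixpath.join(directory, needed)
        if exists(candidate):
            return candidate
    return None


# Hypothetical set of files present on the system.
installed = {"/lib/libB.so.1", "/opt/vendor/lib/libB.so.1"}
on_system = installed.__contains__

# The filename matches the recorded DT_NEEDED and a searched directory has it.
clean = resolve_needed("libB.so.1", ["/usr/local/lib"], ["/lib", "/usr/lib"], on_system)

# A stray LD_LIBRARY_PATH silently redirects the same lookup.
hijacked = resolve_needed("libB.so.1", ["/usr/local/lib"], ["/lib", "/usr/lib"],
                          on_system, ld_library_path="/opt/vendor/lib")

# A filename that does not match any installed library is simply not found.
missing = resolve_needed("libB.so.2", ["/usr/local/lib"], ["/lib", "/usr/lib"], on_system)
```

In this model `clean` is /lib/libB.so.1, `hijacked` is /opt/vendor/lib/libB.so.1, and `missing` is None, which is the whole argument for leaving LD_LIBRARY_PATH unset.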

The second objective is much more problematic.  Consider the following library dependencies:

fooprog (RPATH: /usr/local/lib)
 |
 + DT_NEEDED: libA.so.1 => /usr/local/lib/libA.so.1
 |
 + DT_NEEDED: libB.so.1 => /lib/libB.so.1 (RPATH: /lib:/usr/lib)
                            |
                            + DT_NEEDED: libA.so.1 => /lib/libA.so.1

In our example, libB is an illumos-specific library, while libA is an upstream library.  The copy in /lib is delivered with the operating system, while the one in /usr/local/lib has been built by the customer (perhaps because fooprog requires a different version of it from the one delivered by the OS in /lib).  It is very easy to end up in a situation in which both copies of libA will occupy this fooprog process’s address space.  Chaos will ensue; initialisation code may reference the wrong static data, functions with incompatible signatures may be called, and a generally difficult to debug core dump is the likely eventual outcome.  Direct binding can alleviate some of these effects, but few customers build third-party software using the correct options.  In general, much stronger medicine is required.
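The following Python sketch models why this goes wrong.  It assumes the dependency graph shown above, a breadth-first load order, and default symbol lookup that takes the first definition found in load order; the symbol names (A_init, B_connect) are invented, and the `direct` flag is only a crude stand-in for direct binding, which in reality is recorded at link time.

```python
from collections import deque

# Dependency graph from the example above, keyed by resolved path.
DEPS = {
    "fooprog": ["/usr/local/lib/libA.so.1", "/lib/libB.so.1"],
    "/usr/local/lib/libA.so.1": [],
    "/lib/libB.so.1": ["/lib/libA.so.1"],
    "/lib/libA.so.1": [],
}

# Hypothetical global symbols defined by each object.
DEFINES = {
    "fooprog": {"main"},
    "/usr/local/lib/libA.so.1": {"A_init"},
    "/lib/libB.so.1": {"B_connect"},
    "/lib/libA.so.1": {"A_init"},
}


def load_order(root):
    """Breadth-first load order starting at the executable."""
    order, queue = [], deque([root])
    while queue:
        obj = queue.popleft()
        if obj in order:
            continue
        order.append(obj)
        queue.extend(DEPS[obj])
    return order


def bind(caller, symbol, direct=False):
    """Return the object whose definition of `symbol` the caller ends up using."""
    # Direct binding (simplified): look only in the caller's own dependencies.
    # Default binding: first definition anywhere in the process, in load order.
    scope = DEPS[caller] if direct else load_order("fooprog")
    for obj in scope:
        if symbol in DEFINES[obj]:
            return obj
    return None


copies_of_libA = [o for o in load_order("fooprog") if o.endswith("/libA.so.1")]
default_binding = bind("/lib/libB.so.1", "A_init")
direct_binding = bind("/lib/libB.so.1", "A_init", direct=True)
```

Here `copies_of_libA` has two entries, and `default_binding` is /usr/local/lib/libA.so.1: libB, which was built against the OS copy, ends up calling into the customer's copy.  With `direct=True` it binds to /lib/libA.so.1 instead, but both copies are still mapped into the process.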

Avoiding the Sadness with OpenSSL

In our example above, libA.so.1 is an OpenSSL library.  There are two incompatible versions of OpenSSL in general circulation: 0.9.8 and 1.0.1.  Historically, illumos used 0.9.8, and most distributions delivered that version along with compilation symlinks and headers.  Because SmartOS did so as well prior to May 2012, simply removing OpenSSL 0.9.8 from the platform was not an option; doing so could easily break customer binaries accidentally built against it in the past.  While we will eventually remove this library, it will be some time before we can safely assume that no one is still using binaries built prior to the removal of its compilation symlinks.  This, then, is why the OpenSSL 0.9.8 libraries are still delivered by SmartOS.  The platform software itself does not use this version, however.

Instead, the platform uses the new libsunw_ssl.so and libsunw_crypto.so.  In order to avoid the problem described above, these libraries are protected in four ways:

All of these changes are needed both to avoid accidental use of these Private libraries by customer software and to allow safe coexistence with customer- or pkgsrc-delivered OpenSSL libraries in a customer process’s address space.  As a user of a SmartOS instance (whether in the Joyent Public Cloud, your own private cloud based on SmartDataCenter, or on your SmartOS system at home), you don’t have to worry about the operating system’s copy of OpenSSL.  The platform software linked with our Private copy will always use that copy; your software will always use the copy provided by pkgsrc or your own application deployment package.  And if they do end up in the same address space, their names and symbols will not conflict.

If you’re curious how this is achieved, take a look at our upstream software build system.  Because the rest of the platform software is built against a set of headers that include our special sunw_prefix.h, there is normally no change required to consumers.  A few upstream consumers relying on GNU autoconf or similar mechanisms bypass headers in their attempts to detect the presence or version of OpenSSL; in these cases, a few modest changes are required.  All told, the maintenance burden associated with this approach has been very modest; my colleague Robert Mustacchi was able to upgrade our Private OpenSSL from 1.0.1d+ to 1.0.1g with less than a few hours of work and a very simple set of changes.
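For a feel of what a prefixing header might look like, here is a small Python sketch that generates one.  The contents of the real sunw_prefix.h are not reproduced in this post, so treat everything here as an assumption: that the renaming is done with preprocessor #defines, that the prefix is `sunw_`, and that the include guard is named as shown.  The three symbols are ordinary OpenSSL function names chosen as examples.

```python
def prefix_header(symbols, prefix="sunw_", guard="_SUNW_PREFIX_H"):
    """Return the text of a C header that renames each symbol with `prefix`.

    The prefix and guard name are assumptions made for this sketch.
    """
    lines = [f"#ifndef {guard}", f"#define {guard}", ""]
    for name in sorted(set(symbols)):
        # Every reference to `name` in code that includes this header
        # is compiled as a reference to the prefixed symbol instead.
        lines.append(f"#define {name} {prefix}{name}")
    lines += ["", f"#endif /* {guard} */", ""]
    return "\n".join(lines)


header_text = prefix_header(["SSL_new", "SSL_free", "SSL_CTX_new"])
print(header_text)
```

Because the library and its consumers are both compiled with the same renaming in effect, consumers that go through the headers need no source changes, while anything that probes for an unprefixed symbol directly (as some autoconf checks do) will not find it.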

Other Software

There are a few other pieces of upstream software in SmartOS that will eventually require similar treatment.  For now, because we have not modified the versions of libxml2, libexpat, and other such software in the platform from the last revision that was delivered with compilation symlinks, the existing libraries are doing double duty: they provide both backward-compatibility for customer software and important functionality consumed by the platform.  As these are upgraded, we will take the same approach: the existing version will continue to be delivered for compatibility, while the new version will have its name and globally-visible symbols mangled.  In all such cases, we already do not deliver compilation symlinks and headers.

As a user of a SmartOS instance, all you need to know is that software you build yourself should always depend on pkgsrc-delivered libraries, never those in the platform.  It is of course safe to rely on platform libraries for which SmartOS or illumos is the repository of record, such as libc; this discussion is relevant only to software that is delivered by the platform but is also available from third parties.  We’ll do the rest.

Posted on April 10, 2014 at 17:47 by wesolows