libsunw_ssl, or, How SmartOS Avoids Sadness
The recent hubbub around CVE-2014-0160 (aka “heartbleed”) has led to a few questions about SmartOS’s use of OpenSSL in the platform image. This is actually a very interesting side trip that has nothing to do with cryptography and very little to do with security; instead, it affords an opportunity to talk about ensuring correctness in dynamically-linked library references. Just to get this out of the way, I’ll note that one of the two versions of OpenSSL delivered by the platform prior to Robert Mustacchi’s change was in fact vulnerable to this attack, but that library is not used to provide any TLS services, so there is no way to exploit it. As we will see, that library is not usable at all by any software other than the platform itself, greatly limiting the potential scope of the problem. If you’re using SmartOS, you don’t need to worry about the OpenSSL in the platform image; OpenSSL in zones or KVM instances is another matter, and very likely does need attention.
Of greater interest to me is the presence of these files in the platform image:
    /lib/64/libsunw_crypto.so.1.0.0
    /lib/64/libsunw_ssl.so.1.0.0
    /lib/libsunw_crypto.so.1.0.0
    /lib/libsunw_ssl.so.1.0.0
You’ll note that there are no compilation symlinks (e.g., libsunw_ssl.so), nor are there any OpenSSL headers in /usr/include; together, these omissions make it very difficult for anyone to compile software that consumes the interfaces these libraries provide. We have, however, gone two steps beyond even that in our efforts to prevent customer software from using these libraries, and our reasons for doing so stem from many years of miserable experience delivering and using third-party libraries in Solaris and, more recently, illumos.
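To make the effect of the missing compilation symlinks concrete, here is a simplified model of how a link editor resolves an option such as -lsunw_ssl. It assumes the usual dynamic-mode rule of trying lib<name>.so and then lib<name>.a in each search directory, where the directory list stands for any -L directories followed by the linker's defaults. The function names and the in-memory "filesystem" are hypothetical, chosen so the logic can be exercised without a real /lib.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Predicate standing in for "does this file exist?", so the search can be
 * exercised without touching a real filesystem. */
typedef bool (*exists_fn)(const char *path);

/* Hypothetical filesystem: the four versioned platform files, nothing else. */
static const char *const platform_files[] = {
    "/lib/64/libsunw_crypto.so.1.0.0", "/lib/64/libsunw_ssl.so.1.0.0",
    "/lib/libsunw_crypto.so.1.0.0",    "/lib/libsunw_ssl.so.1.0.0",
};

static bool platform_has(const char *path)
{
    for (size_t i = 0; i < sizeof platform_files / sizeof platform_files[0]; i++)
        if (strcmp(path, platform_files[i]) == 0)
            return true;
    return false;
}

/* The same files plus a hypothetical compilation symlink, to show what
 * -lsunw_ssl would pick up if one were shipped. */
static bool platform_with_symlink_has(const char *path)
{
    return strcmp(path, "/lib/libsunw_ssl.so") == 0 || platform_has(path);
}

/*
 * Simplified model of resolving "-l<name>" for a dynamic link: in each
 * directory, in order, try lib<name>.so and then lib<name>.a. Only the
 * unversioned name is ever tried, so a file such as libsunw_ssl.so.1.0.0
 * is invisible here. On success, writes the path to `out` and returns true.
 */
static bool find_link_input(const char *name, const char *const *dirs,
                            size_t ndirs, exists_fn exists,
                            char *out, size_t outlen)
{
    static const char *const suffixes[] = { ".so", ".a" };

    assert(name != NULL && dirs != NULL && exists != NULL && out != NULL);
    for (size_t d = 0; d < ndirs; d++) {
        for (size_t s = 0; s < sizeof suffixes / sizeof suffixes[0]; s++) {
            int n = snprintf(out, outlen, "%s/lib%s%s", dirs[d], name, suffixes[s]);
            if (n > 0 && (size_t)n < outlen && exists(out))
                return true;
        }
    }
    return false;
}
```

With only the four versioned files present, find_link_input("sunw_ssl", ...) fails, which corresponds to the build-time error a would-be consumer sees; adding the hypothetical symlink makes the same call succeed.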
A Brief History of Sadness
In the very distant past, the Unix operating system was entirely self-contained: everything it relied upon was part of itself, and things that were not part of Unix were simply third-party software that operators could install or not as they saw fit. This was mostly fine, except that inevitably Bell Labs would not have a monopoly on the creation of software useful as part of an operating system. In time, the broad spread of innovation combined with changing expectations about what an operating system ought to provide led to the incorporation of various previously third-party software into many common Unix distributions. To the extent that Unix became the repository of record for this software, this was goodness: Unix grew new capabilities by adding the best software developed by others. Often, however, that was not the case, and the seeds of sadness were sown.
Fast-forward to the Solaris 10 era. By then, Solaris was delivering a fairly broad range of software for which the repository of record was outside Sun. There were two basic reasons for this:
- The architecture of a few popular GNU/Linux distributions had changed customer expectations; the definition of “an operating system” had expanded to include a huge range of random third-party software packages that are not needed to use or manage the system itself but could be installed using OS tooling for the customer’s use.
- Several third-party software packages, such as OpenSSL and libxml2, were being consumed directly by system software.
With the retirement of static linking with Solaris 9, these upstream libraries were being delivered as dynamic libraries just like the rest. Many of them, in order to accommodate the first use case in our list above, were delivered with headers and compilation symlinks as well. And with those headers and symlinks, the great sadness burst forth and thrived.
Architecturally, these libraries (and other software, though we’re concerned primarily with libraries here) provided interfaces that were not under Sun’s control. PSARC made some effort to communicate this to customers by requiring that these third-party interfaces generally be classified as External, later amended to Volatile before being collapsed back to Uncommitted. The gist of this, regardless of the precise terminology, was that customers consuming these interfaces were being told that they could not rely on them to remain compatible across minor releases (or even patches and updates). In a world in which the would-be consumers of these interfaces were mainly customers writing their own software from scratch, as was often the case with respect to other Solaris libraries in the past, that was usually adequate to address the problem. But the world had changed. Most of the software consuming these interfaces now is other third-party software, most of it originally developed on GNU/Linux and often built and tested nowhere else. Customers, somewhat understandably, wanted to be able to build and use (if not simply install via the OS’s packaging tools) those consumers — and have them work. Warning people that the interfaces were not to be relied upon was of little help; most customers were not directly aware that they were consuming them at all. People who were used to building software on a Solaris 2.4 system and running their binaries on everything from 2.4 through 10 were in for a rude surprise when patch releases broke their third-party software. Worse still, others were frustrated by the lag between injection of a piece of third-party software into the OS and its delivery; the versions of this software included with the operating system were invariably months or even years older than those currently available on the Internet. 
As a result, many pieces of third-party software that consumed these interfaces (but were not delivered with the OS) would not build or work correctly. The sadness raged throughout the land, ultimately contributing significantly to Solaris’s demise. Presumably Solaris still has this problem today.
Inside the Sadness
It should be apparent that this architectural model is untenable. By contrast, an all-inclusive OS delivers, or attempts to deliver, almost every conceivable library as well as every remotely popular consumer. Since every piece of software is built consistently and packaged afresh for each release of the distribution, incompatible change in these third-party libraries (or even the OS itself) is of little importance to most customers. The primary measure of value is how many packages the distribution incorporates, not how well software built from third-party sources works across OS upgrades. While there are serious problems with this model, customers who stick with the packages provided by their distributor and upgrade frequently will at least in principle get the best of both worlds: recent software and a hassle-free upgrade path.
Similarly, an OS that is entirely self-contained and delivers no extra software (that is, software for which the repository of record is not the OS distributor’s own) is also safe. While users will be forced to obtain and build whatever software they desire, there is no architectural conflict: the interfaces provided by the OS remain compatible over time, so that the third-party software built by customers continues to work. This is the historical Unix model; it works well technically but often fails to meet customer expectations. Specifically, third-party software tends to be of exceptionally bad quality and is often developed by people with no understanding of portability whatsoever (or worse, an active disdain for it); as a result, getting it to build and work correctly is often a massive chore. Not surprisingly, customers aren’t enthusiastic about doing that themselves.
The origin of the sadness lies in seeking the middle ground: delivering a small subset of the extra software that customers would like to use. This “third way” is an architectural disaster, especially when combined with dynamic linking. Early Windows users may recall this class of problems as “DLL Hell”. Instead of offering the best of both worlds (plenty of extra software packages, low maintenance burden for the vendor, flexibility for the operator), it not only delivers the worst of both but introduces additional problems all its own:
- Assuming that upstream library developers do not offer a usable backward-compatibility guarantee (or do, but new major releases are still being made and consumed by other software customers want to build and use), there is an unresolvable tension between providing customers the latest and greatest for their own use and avoiding breakage across patch or minor releases.
- The vendor takes on a significant maintenance burden; if an upstream library consumed by the OS itself changes incompatibly, the vendor must recognise this and adapt the OS consumers accordingly. Otherwise, the only option is to fix the version of that library the OS delivers forever — which falls afoul of customer expectations.
- Because the libraries delivered with the OS are often too old to be consumed by the latest revisions of third-party software, it’s common for the customer to build their own copies of the upstream library. There are other reasons for this practice as well, ranging from customisation to self-contained application deployment models. While the other aspects of the sadness are problematic from a business perspective, this problem, as we will see, is a clear and present danger to correct operation — of both the OS itself and customer-built software.
- In Solaris, most of this upstream software was delivered in /usr/sfw, and software that consumed it had /usr/sfw/lib added to its DT_RPATH. In order to make the delivered gcc work properly, gcc itself then added this entry as well. In some releases. Meanwhile, GNU autoconf scripts throughout the world were chaotically updated to add it themselves (and/or to look in /usr/sfw/lib for libraries, whether the person building the software wanted it to or not). The end result was that reliably finding libraries at runtime became hit or miss.
At the core of the sadness are two fundamental problems:
- Ensuring that the linker finds the library version against which the consumer was built, when loaded at runtime.
- Preventing two incompatible library versions from occupying the same address space.
The first objective is relatively easy to achieve. If there is only one version of a library on the system, it’s simply a matter of ensuring that some combination of the consumer’s DT_RPATH and the system’s crle(1) configuration contains the path to the library, that the library’s filename matches the DT_SONAME of the library against which the consumer was built (because the consumer’s DT_NEEDED entry is recorded from it), and that the library is actually present on the system. It is easy to satisfy these constraints for all OS-delivered software; illumos’s build system generally does all of this, as does SmartOS’s. All that’s left for the end user to do is make sure LD_LIBRARY_PATH is not set, which sadly seems to be more difficult than one would expect (this is further complicated by third-party software that delivers shell scripts that explicitly set this environment variable even though it’s almost never necessary or appropriate).
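The search described above can be sketched as follows. This is a simplified model rather than the actual ld.so.1 logic: it assumes the order LD_LIBRARY_PATH, then the requesting object's own DT_RPATH, then the system default directories, and it uses a hypothetical in-memory filesystem. Two things fall out of it: the same DT_NEEDED name can resolve to different files depending on whose RPATH is consulted, and a stray LD_LIBRARY_PATH overrides every object's RPATH.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

typedef bool (*exists_fn)(const char *path);

/* Hypothetical filesystem: an OS-delivered libA in /lib, a customer-built
 * copy in /usr/local/lib, and an OS-delivered libB in /lib. */
static bool example_has(const char *path)
{
    static const char *const files[] = {
        "/lib/libA.so.1", "/lib/libB.so.1", "/usr/local/lib/libA.so.1",
    };
    for (size_t i = 0; i < sizeof files / sizeof files[0]; i++)
        if (strcmp(path, files[i]) == 0)
            return true;
    return false;
}

/* Try `needed` in each directory of a colon-separated list. This sketch
 * simply skips empty components. */
static bool search_list(const char *list, const char *needed, exists_fn exists,
                        char *out, size_t outlen)
{
    if (list == NULL)
        return false;
    for (const char *p = list;;) {
        size_t len = strcspn(p, ":");
        if (len > 0) {
            int n = snprintf(out, outlen, "%.*s/%s", (int)len, p, needed);
            if (n > 0 && (size_t)n < outlen && exists(out))
                return true;
        }
        if (p[len] == '\0')
            return false;
        p += len + 1;
    }
}

/*
 * Resolve a DT_NEEDED name on behalf of one requesting object. Order, as
 * modelled here: LD_LIBRARY_PATH, then that object's DT_RPATH, then the
 * system defaults (configured with crle(1) on illumos). Any list may be NULL.
 */
static bool resolve_needed(const char *needed, const char *ld_library_path,
                           const char *rpath, const char *defaults,
                           exists_fn exists, char *out, size_t outlen)
{
    const char *const lists[] = { ld_library_path, rpath, defaults };

    assert(needed != NULL && exists != NULL && out != NULL);
    for (size_t i = 0; i < sizeof lists / sizeof lists[0]; i++)
        if (search_list(lists[i], needed, exists, out, outlen))
            return true;
    return false;
}
```

For instance, resolving libA.so.1 with RPATH /usr/local/lib yields the customer's copy, resolving it with RPATH /lib:/usr/lib yields the OS copy, and setting LD_LIBRARY_PATH to /lib forces the OS copy regardless of RPATH.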
The second objective is much more problematic. Consider the following library dependencies:
    fooprog (RPATH: /usr/local/lib)
      |
      + DT_NEEDED: libA.so.1 => /usr/local/lib/libA.so.1
      |
      + DT_NEEDED: libB.so.1 => /lib/libB.so.1 (RPATH: /lib:/usr/lib)
          |
          + DT_NEEDED: libA.so.1 => /lib/libA.so.1
In our example, libB is an illumos-specific library, while libA is an upstream library. The copy in /lib is delivered with the operating system, while the one in /usr/local/lib has been built by the customer (perhaps because fooprog requires a different version of it from the one delivered by the OS in /lib). It is very easy to end up in a situation in which both copies of libA will occupy this fooprog process’s address space. Chaos will ensue; initialisation code may reference the wrong static data, functions with incompatible signatures may be called, and a difficult-to-debug core dump is the likely eventual outcome. Direct binding can alleviate some of these effects, but few customers build third-party software using the correct options. In general, much stronger medicine is required.
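One way to look for this situation in a running process is to ask the runtime linker which objects it has loaded. The sketch below uses dl_iterate_phdr(), which glibc provides in <link.h> and which, to our understanding, illumos's libc provides as well; the helper names are hypothetical. It flags only objects that share a filename but came from different directories, so it would not notice two differently-named versions of the same library, and whether a second copy is mapped at all depends on the runtime linker's rules for matching already-loaded objects. Treat a result of zero as absence of evidence, not proof.

```c
#define _GNU_SOURCE /* exposes dl_iterate_phdr() in glibc's <link.h> */
#include <assert.h>
#include <link.h>
#include <stdio.h>
#include <string.h>

#define MAX_OBJECTS 512

struct object_list {
    const char *paths[MAX_OBJECTS];
    size_t count;
};

/* dl_iterate_phdr() callback: record the pathname of each loaded object. */
static int record_object(struct dl_phdr_info *info, size_t size, void *data)
{
    struct object_list *list = data;

    (void)size;
    /* The main program usually reports an empty name; skip it. */
    if (info->dlpi_name != NULL && info->dlpi_name[0] != '\0' &&
        list->count < MAX_OBJECTS)
        list->paths[list->count++] = info->dlpi_name;
    return 0; /* zero means "keep iterating" */
}

/* Return the filename component of a path. */
static const char *file_part(const char *path)
{
    assert(path != NULL);
    const char *slash = strrchr(path, '/');
    return slash != NULL ? slash + 1 : path;
}

/*
 * Count pairs of loaded objects that share a filename (e.g. libA.so.1) but
 * were loaded from different directories, printing each pair to `report`
 * if it is non-NULL. The name strings belong to the runtime linker and stay
 * valid only while the objects remain loaded, so they are used immediately.
 */
static int count_duplicate_objects(FILE *report)
{
    struct object_list list = { .count = 0 };
    int dups = 0;

    dl_iterate_phdr(record_object, &list);
    for (size_t i = 0; i < list.count; i++) {
        for (size_t j = i + 1; j < list.count; j++) {
            const char *a = list.paths[i], *b = list.paths[j];
            if (strcmp(file_part(a), file_part(b)) == 0 && strcmp(a, b) != 0) {
                dups++;
                if (report != NULL)
                    fprintf(report, "two copies of %s: %s and %s\n",
                            file_part(a), a, b);
            }
        }
    }
    return dups;
}
```

Calling count_duplicate_objects(stderr) early in main(), or from a debugging hook, prints one line per suspicious pair.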
Avoiding the Sadness with OpenSSL
In our example above, libA.so.1 is an OpenSSL library. There are two incompatible versions of OpenSSL in general circulation: 0.9.8 and 1.0.1. Historically, illumos used 0.9.8, and most distributions delivered that version along with compilation symlinks and headers. Because SmartOS did so as well prior to May 2012, simply removing OpenSSL 0.9.8 from the platform was not an option; doing so could easily break customer binaries accidentally built against it in the past. While we will eventually remove this library, it will be some time before we can safely assume that no one is still using binaries built prior to the removal of its compilation symlinks. This, then, is why the OpenSSL 0.9.8 libraries are still delivered by SmartOS. The platform software itself does not use this version, however.
Instead, the platform uses the new libsunw_ssl.so and libsunw_crypto.so. In order to avoid the problem described above, these libraries are protected in four ways:
- They have no compilation symlinks, so that linking in ‘-lsunw_crypto’ or similar will fail at build time.
- There are no associated headers, making it impossible to accidentally build third-party software using the provided interfaces.
- They have different names from those expected of OpenSSL libraries by third-party software (the “sunw_” prefix).
- The globally-visible symbols within the library are different from those delivered by normal OpenSSL libraries; they too are prefixed with “sunw_”.
All of these changes are needed both to avoid accidental use of these Private libraries by customer software and to allow safe coexistence with customer- or pkgsrc-delivered OpenSSL libraries in a customer process’s address space. As a user of a SmartOS instance (whether in the Joyent Public Cloud, your own private cloud based on SmartDataCenter, or on your SmartOS system at home), you don’t have to worry about the operating system’s copy of OpenSSL. The platform software linked with our Private copy will always use that copy; your software will always use the copy provided by pkgsrc or your own application deployment package. And if they do end up in the same address space, their names and symbols will not conflict.
If you’re curious how this is achieved, take a look at our upstream software build system. Because the rest of the platform software is built against a set of headers that include our special sunw_prefix.h, there is normally no change required to consumers. A few upstream consumers relying on GNU autoconf or similar mechanisms bypass headers in their attempts to detect the presence or version of OpenSSL; in these cases, a few modest changes are required. All told, the maintenance burden associated with this approach has been very modest; my colleague Robert Mustacchi was able to upgrade our Private OpenSSL from 1.0.1d+ to 1.0.1g in just a few hours of work with a very simple set of changes.
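The renaming technique can be sketched in a single C file. The #define below is a hypothetical stand-in for a prefix header in the spirit of sunw_prefix.h (its real contents are not reproduced here), and toy_version is an invented function. Consumers built with the header keep calling the unprefixed name in their source, while the object code references the sunw_-prefixed symbol, so a customer's unprefixed copy can share the address space without a clash.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-in for a prefix header: rename each global symbol. */
#define toy_version sunw_toy_version

/* "Platform" copy: the source says toy_version, but because of the macro
 * the object file actually defines the global symbol sunw_toy_version. */
const char *toy_version(void)
{
    return "platform private copy";
}

/* Platform consumers include the same header, so this call site is
 * unchanged in the source yet references sunw_toy_version. */
static const char *platform_consumer(void)
{
    return toy_version();
}

#undef toy_version /* everything below is built without the prefix header */

/* "Customer" copy, e.g. from pkgsrc: an ordinary build defines the plain
 * global symbol toy_version. The two copies coexist because their global
 * names differ. */
const char *toy_version(void)
{
    return "customer copy";
}

/*
 * A configure-style probe that declares the function itself, such as
 * `char toy_version();`, never sees the prefix header, so it looks for the
 * plain name and cannot find the platform's renamed copy.
 */
static const char *customer_consumer(void)
{
    return toy_version();
}

/* Both copies are callable side by side in one process. */
static void check_coexistence(void)
{
    assert(strcmp(platform_consumer(), customer_consumer()) != 0);
}
```

A header-bypassing probe of that kind is exactly the sort of consumer that needs the modest hand changes mentioned above.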
There are a few other pieces of upstream software in SmartOS that will eventually require similar treatment. For now, because we have not modified the versions of libxml2, libexpat, and other such software in the platform from the last revision that was delivered with compilation symlinks, the existing libraries are doing double duty: they provide both backward-compatibility for customer software and important functionality consumed by the platform. As these are upgraded, we will take the same approach: the existing version will continue to be delivered for compatibility, while the new version will have its name and globally-visible symbols mangled. In all such cases, we already do not deliver compilation symlinks and headers.
As a user of a SmartOS instance, all you need to know is that software you build yourself should always depend on pkgsrc-delivered libraries, never those in the platform. It is of course safe to rely on platform libraries for which SmartOS or illumos is the repository of record, such as libc; this discussion is relevant only to software that is delivered by the platform but is also available from third parties. We’ll do the rest.