Containers in SX build 56

February 1, 2007

The many Resource Management (RM) features in Solaris
have been developed and evolved over the course of years and several releases.
We have resource controls, resource pools, resource capping
and the Fair Share Scheduler (FSS). We have rctls, projects, tasks,
cpu-shares, processor sets and the rcapd(1M). All of these features
have different commands and syntax to configure the
feature. In some cases, particularly with resource pools, the
syntax is quite complex and long sequences of commands are needed
to configure a pool. When you first look at RM it is not immediately
clear when to use one feature vs. another or if some combination
of these features is needed to achieve the RM objectives.

In Solaris 10 we introduced Zones, a lightweight system virtualization
capability. Marketing
coined the term ‘containers’ to refer to a combination of
Zones and RM within Solaris. However, the integration
between the two was fairly weak. Within Zones we had the ‘rctl’
configuration option, which you could use to set a couple of zone specific
resource controls, and we had the ‘pool’ property which could
be used to bind the zone to an existing resource pool, but that was it.
Just setting the ‘zone.cpu-shares’ rctl wouldn’t actually
give you the right cpu shares unless you also configured the
system to use FSS. But, that was
a separate step and easily overlooked. Without the correct
configuration of these various, disparate components even a simple
test, such as a fork bomb within a zone, could disrupt the
entire system.

As users started experimenting with Zones we found that
many of them were not leveraging the RM capabilities provided
by the system. We would get dinged in

evaluations
because Zones, without a correct RM configuration, didn’t provide all
of the containment users needed.
We always expected Zones and RM to be used together, but due the
the complexity of the RM features and the loose integration between
the two, we were seeing that few Zones users actually had a proper RM
configuration. In addition, our RM for memory control
was limited to rcapd running within a zone and capping RSS on projects.
This wasn’t really adequate.

About 9 months ago the Zones engineering team started a project to
try to improve this situation. We didn’t want to just paper over
the complexity with things like a GUI or wizards, so it took
us quite a bit of design before we felt like we hit upon
some key abstractions that we could use to truly simplify the
interaction between the two components. Eventually we settled upon
the idea of organizing the RM features into ‘dedicated’ and ‘capped’
configurations for the zone. We enhanced resource pools to add
the idea of a ‘temporary pool’ which we could dynamically instantiate
when a zone boots. We enhanced rcapd(1M) so that we could do physical
memory capping from the global zone. Steve Lawrence did a lot
of work to improve resident set size (RSS) accounting as well
as adding new rctls for maximum swap and locked memory.
These new features significantly improve RM of memory for Zones.
We then enhanced the Zones infrastructure to automatically do
the work to set up the various RM features that were configured
for the zone. Although the project made many smaller
improvements, the key ideas are the two new configuration options
in zonecfg(1M). When configuring a zone you can now configure
‘dedicated-cpu’ and ‘capped-memory’. Going forward, as additional
RM features are added, we anticipate this idea will evolve gracefully
to add ‘dedicated-memory’ and ‘capped-cpu’ configuration. We also
think this concept can be easily extended to support RM features for other
key parts of the system such as the network or storage subsystem.

Here is our simple diagram of how we eventually unified the RM
view within Zones.

| dedicated  |  capped
---------------------------------
cpu    | temporary  | cpu-cap
| processor  | rctl*
| set        |
---------------------------------
memory | temporary  | rcapd, swap
| memory     | and locked
| set*       | rctl

*

memory sets
and

cpu caps
are under development but are not yet part of Solaris.

With these enhancements, it is now almost
trivial to configure RM for a zone. For example, to configure
a resource pool with a set of up to four cpu’s, all you do in zonecfg is:

zonecfg:my-zone> add dedicated-cpu
zonecfg:my-zone:dedicated-cpu> set ncpus=1-4
zonecfg:my-zone:dedicated-cpu> set importance=10
zonecfg:my-zone:dedicated-cpu> end

To configure memory caps, you would do:

zonecfg:my-zone> add capped-memory
zonecfg:my-zone:capped-memory> set physical=50m
zonecfg:my-zone:capped-memory> set swap=128m
zonecfg:my-zone:capped-memory> set locked=10m
zonecfg:my-zone:capped-memory> end

All of the complexity of configuring the associated RM capabilities
is then handled behind the scenes when the zone boots. Likewise,
when you migrate a zone to a new host, these RM settings migrate too.

Over the course of the project we

discussed
these ideas within the
opensolaris Zones community where we benefited from much good
input which we used in the final design and implementation.
The full details of the project are available

here
and

here.

This work is available in

Solaris Express
build 56 which was just
posted. Hopefully folks using Zones will get a chance to try
out the new features and let us know what they think. All of
the core engineering team actively participates in the

zones discuss
list and we’re happy to try to answer any questions or just hear
your thoughts.

2 Responses

savagex says:

February 1, 2007 at 1:18 pm

Very interesting. Zones with memory and cpu restrictions are very power tool for virtualization.
Pingback: Getting Your Virtualization Priorities Straight « Joyeur

Jerry Jelinek's blog

Containers in SX build 56

2 Responses

Recent Posts

Joyent. Wow!

An End and a Beginning

solaris10 branded zones on OpenSolaris

Community One Slides

Free Community One Deep Dive

Running Solaris 10 on OpenSolaris

Archives

Archives