Racing in the depths of SMF

I recently found myself having to dive into the depths of SMF, the SunOS (illumos / Solaris) Service Management Facility, to debug a nasty race condition between svccfg import and svcadm enable -s. Understanding what happened sent me chasing after a cheerful cast of characters that you might or might not expect: svc.configd, svc.startd, the EMI (early manifest import) service, and the ON build process. I did a lot of digging and reading to understand how all these different pieces work together and communicate, which made me realize that writing it down would be incredibly useful for the next person (really, me once I forget) who has to make another trip back into this important yet quite complicated subsystem.

The Problem

We had a heavily loaded system that was booting and initializing lots of zones. This was running on VMware Fusion, which, while great for development, is understandably not a performance king. During this process we have lots of scripts that do something similar to the following shell snippet:

# svccfg import service.xml
# svcadm enable -s service
svcadm: svc:/SERVICE/:default is misconfigured (lacks "restarter" property group)

Well, that’s a problem. Now, you might say that obviously our manifest is misconfigured, but that actually isn’t the case. Manifests may optionally specify a restarter property group. If they don’t, svc.startd takes control of restarting the instance. This is what the majority of services want, so the problem here isn’t that we didn’t specify the restarter group; for some reason it’s missing after we imported! Before we can explain what actually happened and how to fix it, we need to explain a bit about how SMF works and communicates. Keep in mind I didn’t write SMF, so there may be one or two oversights.

Rough SMF Architecture

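There are a few different components that make up SMF, each responsible for a different piece of functionality: svc.configd, which fronts the repository; svc.startd, which starts and restarts services; libscf, the library used to talk to configd; and the svccfg, svcadm, and svcs commands.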

How all of these work together is far from simple; in fact, it can be quite confusing. Here’s a block diagram I put together that helps explain everything and how they all communicate:

/*
 * The SMF Block Diagram
 *                                                       Repository
 *   This attempts to show       ___________             __________
 *   the relations between       |         |     SQL     |        |
 *   the different pieces        | configd |<----------->| SQLite |
 *   that make SMF work and      |         | Transaction |        |
 *   users/administrators        -----------             ----------
 *   call into.                  /|\    /|\
 *                                |      |
 *                   door_call(3C)|      | door_call(3C)
 *                                |      |
 *                               \|/    \|/
 *      ____________     __________      __________      ____________
 *      |          |     |        |      |        |      |  svccfg  |
 *      |  startd  |<--->| libscf |      | libscf |<---->|  svcadm  |
 *      |          |     | (3LIB) |      | (3LIB) |      |   svcs   |
 *      ------------     ----------      ----------      ------------
 *       /|\    /|\
 *        |      | fork(2)/exec(2)
 *        |      | libcontract(3LIB)
 *       \|/    \|/                          Various System/User services
 *       ---------------------------------------------------------------------
 *       | system/filesystem/local:default      system/coreadm:default       |
 *       | network/loopback:default             system/zones:default         |
 *       | network/ntp:default                  system/cron:default          |
 *       | smartdc/agent/ca/cainstsvc:default   network/ssh:default          |
 *       | appliance/kit/akd:default            system/svc/restarter:default |
 *       ---------------------------------------------------------------------
 */

Chatting with configd and sharing repository information

As you run svcs, svccfg, and svcadm, each of them creates a libscf handle to communicate with configd. Calls made via libscf ultimately go and talk to configd to get information. However, how we actually talk to configd is not as straightforward as it appears.

When configd starts up, it creates a door located at /etc/svc/volatile/repository_door. This door runs the routine called main_switcher() from usr/src/cmd/svc/configd/maindoor.c. When you first invoke svc(cfg|s|adm), one of the first things that happens is that it creates a scf_handle_t and binds it to configd by calling scf_handle_bind(). This function makes a door call to configd and gets back a new file descriptor. This file descriptor is itself another door, one that calls into configd’s client_switcher(). This is the door that is actually used for fetching properties and many other useful things.

svc.startd needs a way to notice the changes that occur to the repository. For example, if you enable a service that was not previously running, it’s up to startd to notice that this has happened, check dependencies, and eventually start up the service. The way it gets these notifications is via a thread whose sole purpose in life is to call _scf_notify_wait(). This function acts like poll(2), but for changes that occur in the repository. Once this thread gets the event, it dispatches it so that the event is handled appropriately.
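As a rough mental model of that notification loop, here is a small Python sketch. It is not startd's code: FakeRepository, notify_wait(), and repository_event_thread() are hypothetical names invented for illustration, a queue.Queue stands in for the blocking _scf_notify_wait() call, and the "start"/"other" dispatch is deliberately simplistic.

```python
import queue
import threading


class FakeRepository:
    """Stand-in for configd: records changes and wakes up any waiter."""

    def __init__(self):
        self._changes = queue.Queue()

    def set_property(self, fmri, pg, prop, value):
        # A real client would go through libscf; here we only record the change.
        self._changes.put((fmri, pg, prop, value))

    def notify_wait(self):
        # Blocks until something changes, much like poll(2) on a descriptor.
        return self._changes.get()

    def shutdown(self):
        # Sentinel that exists only so this sketch can terminate.
        self._changes.put(None)


def repository_event_thread(repo, handled):
    """Loop forever: wait for a change, then dispatch it."""
    while True:
        change = repo.notify_wait()
        if change is None:
            return
        fmri, pg, prop, value = change
        if (pg, prop) == ("general", "enabled") and value:
            handled.append(("start", fmri))
        else:
            handled.append(("other", fmri))


repo = FakeRepository()
handled = []
worker = threading.Thread(target=repository_event_thread, args=(repo, handled))
worker.start()
repo.set_property("svc:/network/ssh:default", "general", "enabled", True)
repo.shutdown()
worker.join()
print(handled)
```

The point of the sketch is only the shape: the client that flips the property returns immediately, and a separate thread finds out about it later.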

The Events of svc.startd

svc.startd has to handle a lot of complexity. Understanding how you go from getting the notification that a service was enabled to actually enabling it is not obvious from a cursory glance. The first thing to keep in mind is that startd maintains a graph of all the related services and instances so it can keep track of what is enabled, what dependencies exist, etc. all so that it can answer the question of what is affected by a change. Internally there are a lot of different queues for events, threads to process these queues, and different paths to have events enter these queues. What follows is a diagram that attempts to explain some of those paths, though it’s important to note that for some of these pieces, such as the graph and vertex events, there are many additional ways and code paths these threads and functions can take. And yes, restarter_event_enqueue() is not the same thing as restarter_queue_event().

/*
 *   Threads/Functions                 Queues                  Threads/Functions
 *
 * called by various
 *     ------------------             ---------                  ---------------
 * --->| graph_protocol | graph_event | graph |   graph_event_   | graph_event |
 * --->| _send_event()  |------------>| event |----------------->| _thread     |
 *     ------------------ _enqueue()  | queue |   dequeue()      ---------------
 *                                    ---------                         |
 *  _scf_notify_wait()                               vertex_send_event()|
 *  |                                                                  \|/
 *  |  --------------------                              ----------------------
 *  |->| repository_event | vertex_send_event()          | restarter_protocol |
 *     | _thread          |----------------------------->| _send_event()      |
 *     --------------------                              ----------------------
 *                                                          |    | out to other
 *                restarter_                     restarter_ |    | restarters
 *                event_dequeue() -------------  event_     |    | not startd
 *               |----------------| restarter |<------------|    |------------->
 *              \|/               |   event   |  enqueue()
 *      -------------------       |   queue   |             |------------------>
 *      | restarter_event |       -------------             ||----------------->
 *      | _thread         |                                 |||---------------->
 *      -------------------                                 ||| start/stop inst
 *               |               ----------------       ----------------------
 *               |               |   instance   |       | restarter_process_ |
 *               |-------------->|    event     |------>| events             |
 *                restarter_     |    queue     |       | per-instance lwp   |
 *                queue_event()  ----------------       ----------------------
 *                                                          ||| various funcs
 *                                                          ||| controlling
 *                                                          ||| instance state
 *                                                          |||--------------->
 *                                                          ||---------------->
 *                                                          |----------------->
 */

What’s important to take away is that there is a queue for each instance on the system that handles the events dealing directly with that instance, and that events can be added to it because of property changes made in configd, which startd then acts upon asynchronously.
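To make the "one queue per instance" idea concrete, here is a minimal single-threaded Python sketch. The names (enqueue, route_pending, drain) and the event strings are made up for illustration and do not correspond to startd's functions; in startd the routing and draining happen on separate threads.

```python
import queue
from collections import defaultdict

restarter_events = queue.Queue()            # the single restarter event queue
instance_queues = defaultdict(queue.Queue)  # one queue per instance FMRI


def enqueue(fmri, event):
    """Producer side: something changed in the repository for `fmri`."""
    restarter_events.put((fmri, event))


def route_pending():
    """Move every pending event onto the queue of the instance it names."""
    while not restarter_events.empty():
        fmri, event = restarter_events.get()
        instance_queues[fmri].put(event)


def drain(fmri):
    """Per-instance worker: handle that instance's events in arrival order."""
    seen = []
    q = instance_queues[fmri]
    while not q.empty():
        seen.append(q.get())
    return seen


enqueue("svc:/network/ssh:default", "add_instance")
enqueue("svc:/system/cron:default", "add_instance")
enqueue("svc:/network/ssh:default", "enable")
route_pending()
ssh_events = drain("svc:/network/ssh:default")
cron_events = drain("svc:/system/cron:default")
print(ssh_events, cron_events)
```

Events for one instance stay ordered relative to each other, while nothing ties their processing to the moment the repository was changed.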

How does the restarter property group show up

The last thing that we wanted to answer was where the restarter property group actually gets set if it is not specified. While looking around the source code, I finally came across an interesting function: libscf_inst_get_or_add_pg. This function is called in a few places with the restarter property group as its argument. However, none of this is done in configd or svccfg when you import the manifest. Rather, it is all taken care of asynchronously by startd.

To test that this was getting called when you imported a service for the first time and verify that this was getting called by startd, I used the following DTrace snippet that utilizes the pid provider. For more on how to use it, consult Brendan’s blog articles on the pid provider.

[root@headnode (coal:0) ~]# dtrace -n 'pid8::libscf_inst_get_or_add_pg:entry{
printf("%s", copyinstr(arg1)); ustack(); }'
dtrace: description 'pid8::libscf_inst_get_or_add_pg:entry' matched 1 probe
CPU     ID                    FUNCTION:NAME
  0  82690  libscf_inst_get_or_add_pg:entry restarter
              svc.startd`libscf_inst_get_or_add_pg
              svc.startd`libscf_note_method_log+0x6c
              svc.startd`method_run+0x132
              svc.startd`method_thread+0x184
              libc.so.1`_thrp_setup+0x9b
              libc.so.1`_lwp_start

  0  82690  libscf_inst_get_or_add_pg:entry restarter
              svc.startd`libscf_inst_get_or_add_pg
              svc.startd`libscf_note_method_log+0x6c
              svc.startd`method_run+0x132
              svc.startd`method_thread+0x184
              libc.so.1`_thrp_setup+0x9b
              libc.so.1`_lwp_start

  0  82690  libscf_inst_get_or_add_pg:entry restarter
              svc.startd`libscf_inst_get_or_add_pg
              svc.startd`libscf_write_start_pid+0x6e
              svc.startd`method_run+0x43a
              svc.startd`method_thread+0x184
              libc.so.1`_thrp_setup+0x9b
              libc.so.1`_lwp_start

  1  82690  libscf_inst_get_or_add_pg:entry restarter
              svc.startd`libscf_inst_get_or_add_pg
              svc.startd`libscf_write_method_status+0xbc
              svc.startd`write_status+0x2f
              svc.startd`method_run+0x616
              svc.startd`method_thread+0x184
              libc.so.1`_thrp_setup+0x9b
              libc.so.1`_lwp_start

From this, we see that the restarter property group is written out as part of getting ready to actually run the specified instance. Thus svccfg should not return until this property group has been added by startd; otherwise we will see an invalid state that causes tools like svcs and svcadm to complain.
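The race can be reproduced in miniature. The following Python sketch is a toy model under stated assumptions, not SMF code: Repo, enable_sync(), and slow_startd() are hypothetical, the FMRI is an example, and a threading.Event is used to force the "startd is late" ordering deterministically.

```python
import threading


class Repo:
    """Tiny stand-in for the repository: FMRI -> set of property group names."""

    def __init__(self):
        self.lock = threading.Lock()
        self.instances = {}

    def import_instance(self, fmri):
        with self.lock:
            # The import writes what the manifest specifies, nothing more.
            self.instances[fmri] = {"general"}

    def add_pg(self, fmri, pg):
        with self.lock:
            self.instances[fmri].add(pg)

    def has_pg(self, fmri, pg):
        with self.lock:
            return pg in self.instances.get(fmri, set())


def enable_sync(repo, fmri):
    """Mimics the check that made `svcadm enable -s` complain."""
    if not repo.has_pg(fmri, "restarter"):
        return 'misconfigured (lacks "restarter" property group)'
    return "ok"


repo = Repo()
fmri = "svc:/example/service:default"
startd_may_run = threading.Event()


def slow_startd():
    startd_may_run.wait()           # a loaded system: startd gets to it late
    repo.add_pg(fmri, "restarter")  # the asynchronous step


startd = threading.Thread(target=slow_startd)
startd.start()
repo.import_instance(fmri)          # the import returns immediately
too_early = enable_sync(repo, fmri)
startd_may_run.set()
startd.join()
later = enable_sync(repo, fmri)
print(too_early, "|", later)
```

The first check fails and the second succeeds, even though nothing about the manifest changed in between.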

The fix and some gotchas

So, the fix here is actually pretty straightforward. After we have imported all of the services and instances associated with a given manifest, we want to verify that every service and instance has a restarter property group. They will have this property group regardless of whether the instance is enabled, disabled, in maintenance, or can’t start due to missing dependencies. The logic is very simple: iterate over each service and instance specified in the manifest and don’t move on until we can retrieve that property group. Once we can, move on to the next instance. There are, however, two situations where this logic surprisingly breaks, which we have to watch out for and special-case.
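The shape of that loop can be sketched in a few lines of Python. This is only an illustration of the logic described above, not the change made to svccfg: wait_for_restarter_pg() and has_pg() are hypothetical names, and polling with a sleep is an assumption about how "don't move on until we can retrieve it" might be realized.

```python
import time


def wait_for_restarter_pg(has_pg, fmris, poll_interval=0.01, sleep=time.sleep):
    """Block until every FMRI in `fmris` has a restarter property group.

    `has_pg(fmri)` is a caller-supplied callable that returns True once the
    property group can be retrieved. Returns the number of checks per FMRI.
    """
    checks = {}
    for fmri in fmris:
        count = 1
        while not has_pg(fmri):
            sleep(poll_interval)
            count += 1
        checks[fmri] = count
    return checks


calls = {}


def fake_has_pg(fmri):
    # Pretend startd needs a little while: the third check is the first to pass.
    calls[fmri] = calls.get(fmri, 0) + 1
    return calls[fmri] >= 3


result = wait_for_restarter_pg(
    fake_has_pg,
    ["svc:/example/a:default", "svc:/example/b:default"],
    sleep=lambda _seconds: None,  # skip real sleeping in this demonstration
)
print(result)
```

Note that the loop has no timeout, which is exactly why the two special cases below matter: if nothing is ever going to create the property group, it never returns.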

Native Build

I discovered that as a part of the build process for ON, there is a phase where it builds a version of svc.configd and svccfg which it calls svc.configd-native and svccfg-native. These create the initial repositories for the system. However, they are designed to run separately from the normal configd and startd that are on the system. In fact, there is no native startd while the native configd and svccfg are running. If we did this check there, the restarter property groups would never be created and the build would spin forever. The only solution is to not do the check. There are a few other places throughout configd and svccfg that already have to deal with the fact that we’re using the same source base and running it in two very different environments. We can work around it by using the preprocessor directive NATIVE_BUILD and a few #ifdefs. I did not introduce that directive; it was already being used liberally in configd and in a few places in svccfg.

Early Manifest Import

PSARC 2010/013 SMF Early Manifest Import introduced a substantial change in when various manifests are imported into the repository during boot. In this case svc.startd purposefully does not listen for notifications from configd while it is running EMI. This has two important ramifications:

To deal with this, we check the state of the EMI service. If the instance is online, that means that EMI has successfully finished and will not run again until the next time the system boots. This is how svc.startd makes sure not to run it twice in case startd restarts. In our case, we do not try to verify that the instance has a restarter property group unless svc:/system/early-manifest-import:default is online.
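Putting the two special cases together, the decision of whether to run the check at all might look like the following Python sketch. The function name and the get_state() lookup are hypothetical; the native_build flag simply mirrors the NATIVE_BUILD compile-time switch, which in the real code is an #ifdef rather than a runtime argument.

```python
EMI_FMRI = "svc:/system/early-manifest-import:default"


def should_verify_restarter_pg(native_build, get_state):
    """Decide whether the post-import check is safe to run.

    `native_build` mirrors the NATIVE_BUILD compile-time switch and
    `get_state(fmri)` is a hypothetical lookup returning a state string.
    """
    if native_build:
        return False  # no startd exists to create the property group
    # During EMI startd is not listening, so only check once EMI is online.
    return get_state(EMI_FMRI) == "online"


states = {EMI_FMRI: "offline"}
during_emi = should_verify_restarter_pg(False, states.get)
states[EMI_FMRI] = "online"
after_emi = should_verify_restarter_pg(False, states.get)
native = should_verify_restarter_pg(True, states.get)
print(during_emi, after_emi, native)
```

Only the middle case, a normal build with EMI already online, goes on to wait for the property group.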

The race condition is very unlikely to occur after EMI starts, because most start methods do not call svcadm enable -s on some other service that was imported via EMI. That does not mean it cannot happen, though, and it is worth keeping in mind if you are writing the manifest for such a service.

Takeaways

Hopefully the block diagrams here help someone who is making future dives into the depths of SMF. If you do, here are a couple things to keep in mind:

Posted on April 4, 2011 at 11:15 am by rm