Eric Schrock's Blog

Lessons in broken interfaces

July 24, 2004

In build 60 (Beta 5 or SX 7/04), I fixed a long-standing Solaris bug: the paths of mounted filesystems could not contain spaces. We would happily mount the filesystem, but then all consumers of /etc/mnttab would fail. This resulted in sad situations like:

# df -h
Filesystem             size   used  avail capacity  Mounted on
/dev/dsk/c0d0s0         36G    13G    22G    38%    /
/devices                 0K     0K     0K     0%    /devices
/dev/dsk/c0d0p0:boot    11M   2.3M   8.4M    22%    /boot
/proc                    0K     0K     0K     0%    /proc
mnttab                   0K     0K     0K     0%    /etc/mnttab
fd                       0K     0K     0K     0%    /dev/fd
swap                  1002M    24K  1002M     1%    /var/run
swap                  1003M   1.3M  1002M     1%    /tmp
# mount -F lofs /export/space\ dir /mnt/space\ mnt
/export/space dir       /mnt/space mnt  lofs    dev=1980000     1090718041
# df -h
df: a line in /etc/mnttab has too many fields
#

Luckily you could unmount the filesystem, but it was quite annoying to say the least. The resulting fix was really an exploration into bad interface design.

/etc/mnttab

This file has been around since the early days of Unix (at least as far back as SVR3). Each line is a whitespace-delimited set of fields, including special device, mount point, filesystem type, mount options, and mount time (see mnttab(4) for more information). Historically, this was a plain text file. This meant that the user programs mount(1M) and umount(1M) were responsible for making sure its contents were kept up to date. This could be very problematic: imagine what would happen if the program died partway through adding an entry, or root accidentally removed an entry without actually unmounting the filesystem. Once the contents were corrupted, the admin usually had to resort to rebooting rather than trying to guess what the proper contents should be. Not to mention that it made mounting filesystems from within the kernel unnecessarily complicated.

In Solaris 8, we solved part of the problem by creating the mntfs pseudo filesystem. From this point onward, /etc/mnttab was no longer a regular text file, but a mounted filesystem. The contents are generated on-the-fly from the kernel data structures. This means that the contents are always in sync with the kernel[1], and that the user can’t accidentally change the contents. However, we still had the problem that the mount points could not contain spaces, because space was a delimiter with special meaning.
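
The delimiter problem is easy to reproduce outside the kernel. What follows is a minimal sketch of the sort of tokenizing a naive consumer might do. The helper count_fields() is hypothetical, written only for illustration; it is not the code df actually uses.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/*
 * Count the whitespace-delimited fields in one mnttab-style line.
 * Hypothetical helper for illustration only.
 */
size_t
count_fields(const char *line)
{
	size_t n = 0;
	int in_field = 0;

	assert(line != NULL);
	for (; *line != '\0'; line++) {
		if (isspace((unsigned char)*line)) {
			/* Any run of whitespace ends the current field. */
			in_field = 0;
		} else if (!in_field) {
			/* First non-space character starts a new field. */
			in_field = 1;
			n++;
		}
	}
	return (n);
}
```

Fed the lofs line from the mount output above, this counts seven fields where a consumer expects five, because each embedded space splits a path in two. A parser working at this level has no way to tell a space inside a path from a space between fields.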

getmntent() and friends

On top of this broken interface, a C API was developed that had even worse problems. Consider getmntent(3C):

int getmntent(FILE *fp, struct mnttab *mp);

There are several problems with this interface:

  1. The user is responsible for opening and closing the file

    There is only one mount state for the kernel; why should the user have to know that /etc/mnttab is the place where the entries are stored?

  2. The first parameter is a FILE *

    If you’re developing a system interface, you should not force the use of the C stdio library. Every other system API takes a normal file descriptor instead.

  3. The memory is allocated by the function on demand

    This causes all sorts of problems, including making multithreaded programming difficult, and preventing the user from controlling the size of the buffer used to read in the data.

  4. There is no relationship between the memory and the open file

    Because of this, a lazy programmer can close the file after the last call to getmntent() while still using the memory, so the memory must be kept around indefinitely.

By now, it should be obvious that this was an ill-conceived API built on top of a broken interface. Off the top of my head, if I were to re-design these interfaces I would come up with something more like:

mnttab_t *mnttab_init(void);
int mnttab_get(mnttab_t *mnttab, struct mntent *ent, void *scratch, size_t scratchlen);
void mnttab_fini(mnttab_t *mnttab);
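
To show how those three calls might fit together, here is a rough sketch built on a fake in-memory table. Everything beyond the three prototypes is an assumption made for illustration: the fields of struct mntent, the contents of the opaque handle, the return convention (0 for an entry, 1 at end of table, -1 if the scratch buffer is too small), and the fake_mounts array standing in for real kernel state.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical entry type; not the Solaris struct mnttab. */
struct mntent {
	const char *mnt_special;
	const char *mnt_mountp;
	const char *mnt_fstype;
};

/* Opaque handle: the caller never learns where entries come from. */
typedef struct mnttab_handle {
	size_t next;	/* index of the next entry to hand out */
} mnttab_t;

/* Stand-in for kernel state; a real version would ask the kernel. */
static const char *const fake_mounts[][3] = {
	{ "/dev/dsk/c0d0s0", "/", "ufs" },
	{ "/export/space dir", "/mnt/space mnt", "lofs" },
};
#define	NMOUNTS	(sizeof (fake_mounts) / sizeof (fake_mounts[0]))

mnttab_t *
mnttab_init(void)
{
	mnttab_t *mt = malloc(sizeof (*mt));

	if (mt != NULL)
		mt->next = 0;
	return (mt);
}

/* Copy s (with its terminator) into the scratch space, or return NULL. */
static const char *
stash(const char *s, char **bufp, size_t *leftp)
{
	size_t len = strlen(s) + 1;
	char *dst = *bufp;

	if (len > *leftp)
		return (NULL);
	memcpy(dst, s, len);
	*bufp += len;
	*leftp -= len;
	return (dst);
}

/* Assumed convention: 0 = entry returned, 1 = end, -1 = scratch too small. */
int
mnttab_get(mnttab_t *mnttab, struct mntent *ent, void *scratch,
    size_t scratchlen)
{
	char *buf = scratch;
	const char *const *m;

	assert(mnttab != NULL && ent != NULL && scratch != NULL);
	if (mnttab->next >= NMOUNTS)
		return (1);
	m = fake_mounts[mnttab->next];
	if ((ent->mnt_special = stash(m[0], &buf, &scratchlen)) == NULL ||
	    (ent->mnt_mountp = stash(m[1], &buf, &scratchlen)) == NULL ||
	    (ent->mnt_fstype = stash(m[2], &buf, &scratchlen)) == NULL)
		return (-1);	/* not advanced; caller may retry */
	mnttab->next++;
	return (0);
}

void
mnttab_fini(mnttab_t *mnttab)
{
	free(mnttab);
}
```

A caller would loop on mnttab_get() until it returns nonzero, then call mnttab_fini(). In this sketch the strings in each entry live in the caller's scratch buffer, so their lifetime is explicit, each thread can use its own buffer, and nothing depends on a FILE * or on knowing that /etc/mnttab exists.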

The solution

Once /etc/mnttab became a filesystem, we could add ioctl(2) calls to do whatever we wanted. Once we’re in the kernel, we know exactly how long each field of the structure is. We create a set of NUL-terminated strings directly in user space, and simply return pointers to them. This was more complicated than it sounds for the reasons outlined above. We also had to maintain the ability to read the file directly. With this fix, all C consumers “just work”. Scripted programs will still choke on a mnttab entry with spaces, but these are a minority by far.

Note that the files /etc/vfstab and /etc/dfs/sharetab still suffer from this problem. There has been some discussion about how to resolve these issues, with the new Service Management Facility being touted as a possible solution. And ZFS (Sun’s next generation filesystem) is avoiding /etc/vfstab altogether.


[1] There is always the possibility that the mounted filesystems change between the time the file is opened and the data is read.
