What is Shadow Migration?
In the Sun Storage 7000 2009.Q3 software release, one of the major new features I worked on was the addition of what we termed “shadow migration.” When we launched the product, there was no integrated way to migrate data from existing systems to the new systems. This resulted in customers rolling it by hand (rsync, tar, etc), or paying for professional services to do the work for them. We felt that we could present a superior model that would provide for a more integrated experience as well as let the customer leverage the investment in the system even before the migration was complete.
The idea in and of itself is not new, and various prototypes of this have been kicking around inside of Sun under various monikers (“brain slug”, “hoover”, etc) without ever becoming a complete product. When Adam and myself sat down shortly before the initial launch of the product, we decide we could do this without too much work by integrating the functionality directly in the kernel. The basic design requirements we had were:
- We must be able to migrate over standard data protocols (NFS) from arbitrary data sources without the need to have special software running on the source system.
- Migrated data must be available before the entire migration is complete, and must be accessible with native performance.
- All the data to migrate the filesystem must be stored within the filesystem itself, and must not rely on an external database to ensure consistency.
With these requirements in hand, our key insight was that we could create a “shadow” filesystem that could pull data from the original source if necessary, but then fall through to the native filesystem for reads and writes one the file has been migrated. What’s more, we could leverage the NFS client on Solaris and do this entirely at the VFS (virtual filesystem) layer, allowing us to migrate data between shares locally or (eventually) over other protocols as well without changing the interpositioning layer. The other nice thing about this architecture is that the kernel remains ignorant of the larger migration process. Both synchronous requests (from clients) and background requests (from the management daemon) appear the same. This allows us to control policy within the userland software stack, without pushing that complexity into the kernel. It also allows us to write a very comprehensive automated test suite that runs entirely on local filesystems without need a complex multi-system environment.
So what’s better (and worse) about shadow migration compared to other migration strategies? For that, I’ll defer to the documentation, which I’ve reproduced here for those who don’t have a (virtual) system available to run the 2009.Q3 release:
Migration via synchronization
This method works by taking an active host X and migrating data to the new host Y while X remains active. Clients still read and write to the original host while this migration is underway. Once the data is initially migrated, incremental changes are repeatedly sent until the delta is small enough to be sent within a single downtime window. At this point the original share is made read-only, the final delta is sent to the new host, and all clients are updated to point to the new location. The most common way of accomplishing this is through the rsync tool, though other integrated tools exist. This mechanism has several drawbacks:
- The anticipated downtime, while small, is not easily quantified. If a user commits a large amount of change immediately before the scheduled downtime, this can increase the downtime window.
- During migration, the new server is idle. Since new servers typically come with new features or performance improvements, this represents a waste of resources during a potentially long migration period.
- Coordinating across multiple filesystems is burdensome. When migrating dozens or hundreds of filesystems, each migration will take a different amount of time, and downtime will have to be scheduled across the union of all filesystems.
Migration via external interposition
This method works by taking an active host X and inserting a new appliance M that migrates data to a new host Y. All clients are updated at once to point to M, and data is automatically migrated in the background. This provides more flexibility in migration options (for example, being able to migrate to a new server in the future without downtime), and leverages the new server for already migrated data, but also has significant drawbacks:
- The migration appliance represents a new physical machine, with associated costs (initial investment, support costs, power and cooling) and additional management overhead.
- The migration appliance represents a new point of failure within the system.
- The migration appliance interposes on already migrated data, incurring extra latency, often permanently. These appliances are typically left in place, though it would be possible to schedule another downtime window and decommission the migration appliance.
Shadow migration uses interposition, but is integrated into the appliance and doesn’t require a separate physical machine. When shares are created, they can optionally “shadow” an existing directory, either locally (see below) or over NFS. In this scenario, downtime is scheduled once where the source appliance X is placed into read-only mode, a share is created with the shadow property set, and clients are updated to point to the new share on the Sun Storage 7000 appliance. Clients can then access the appliance in read-write mode.
Once the shadow property is set, data is transparently migrated in the background from the source appliance locally. If a request comes from a client for a file that has not yet been migrated, the appliance will automatically migrate this file to the local server before responding to the request. This may incur some initial latency for some client requests, but once a file has been migrated all accesses are local to the appliance and have native performance. It is often the case that the current working set for a filesystem is much smaller than the total size, so once this working set has been migrated, regardless of the total native size on the source, there will be no perceived impact on performance.
The downside to shadow migration is that it requires a commitment before the data has finished migrating, though this is the case with any interposition method. During the migration, portions of the data exists in two locations, which means that backups are more complicated, and snapshots may be incomplete and/or exist only on one host. Because of this, it is extremely important that any migration between two hosts first be tested thoroughly to make sure that identity management and access controls are setup correctly. This need not test the entire data migration, but it should be verified that files or directories that are not world readable are migrated correctly, ACLs (if any) are preserved, and identities are properly represented on the new system.
Shadow migration implemented using on-disk data within the filesystem, so there is no external database and no data stored locally outside the storage pool. If a pool is failed over in a cluster, or both system disks fail and a new head node is required, all data necessary to continue shadow migration without interruption will be kept with the storage pool.
In a subsequent post, I’ll discuss some of the thorny implementation detail we had to solve, as well as provide some screenshots of migration in progress. In the meantime, I suggest folks download the simulator and upgrade to the latest software to give it a try.