illumos Overlay Networks Development Preview 02

September 23, 2014

I’m happy to announce the second development preview of my network vitalization or if you like to use buzzwords, software defined networking, for illumos. Like the previous entry, the goal of this is to give folks something to play around with and get a sense of what this looks like for a user and administrator.

The dev-overlay branch of illumos-joyent has all the source code and has been merged up to illumos and illumos-joyent from September 22nd.

This is a development preview, so it’s using a debug build of illumos. This is not suitable for production use. There are bugs; expect panics.

How we got here

It’s worth taking a little bit of time to understand the class of problems that we’re trying to solve. At the core of this work is a desire to have multiple logical layer two networks be able to all use one physical, or underlay, network. This means, that you can run multiple virtual machines that each have their own independent set of VLANs and private address space, so both Alice and Bob can have their own VMs using the same private IP addresses, like 10.1.2.3/24 and be confident that they will not see each others traffic.

What’s in this Release

This release builds on from the last release which had simple point to point tunnels. This release adds support for the following:

snoop support for decoding VXLAN frames
Kernel overlay driver support for dynamic plugins
A new files backend for varpd that supports proxy arp, ndp, and dhcpv4

This release also has a similar set of known issues:

All overlay devices are temporary
Overlay device deletion still isn’t 100% there
Overlay devices only work in the global zone
It is missing manual pages

Dynamic Plugins

In the first release, overlay devices only supported the direct plugin which always sent all traffic to a single destination. While useful, it meant that a given tunnel was limited to being point to point. The notion of a dynamic plugin changes this entirely. In this world, traffic can be encapsulated and sent to different hosts based on its destination mac address. Instead of getting a single destination from userland at device creation, the kernel goes and asks userland to supply it with the destination on demand.

Allowing an answer to be supplied this way makes it much easier to write different ways of answering the question in userland. As individuals and organizations figure out their own strategy here, it makes it much easier to interface with existing centralized databases or extant distributed systems.

In addition, as part of writing a simple files backend, I wrote several routines that can be used to inject proxy ARP, proxy NDP, and proxy DHCPv4 requests. Having these primitives in the common library makes it much easier for different backends which don’t support multicast or broadcast traffic to have something to use.

The files plugin format

In the next section we’ll show an example of getting started and having three different VMs use the same file for understanding our virtual network’s layout. Here’s a copy of the file /var/tmp/hosts.json that I’ve been using:

# cat /var/tmp/hosts.json
{

        "de:ad:be:ef:00:00": {
                "arp": "10.55.55.2",
                "ip": "10.88.88.69",
                "ndp": "fe80::3",
                "port": 4789
        }, "de:ad:be:ef:00:01": {
                "arp": "10.55.55.3",
                "dhcp-proxy": "de:ad:be:ef:00:00",
                "ip": "10.88.88.70",
                "ndp": "fe80::4",
                "port": 4789
        }, "de:ad:be:ef:00:02": {
                "arp": "10.55.55.4",
                "ip": "10.88.88.71",
                "ndp": "fe80::5",
                "port": 4789
        }
}

In this JSON blob, the key is the MAC address of a VNIC. With each key, there must be a member entitled ip and port. These are used by the plugin to answer the question of, where should a packet with this mac address be sent? The ip member may be either an IPv4 or IPv6 address.

Machines send packets to a specific MAC address. They look up the mappings between a MAC address and an IP address using different mechanisms for IPv4 and IPv6. IPv4 uses ARP to get this information which devolves into using broadcast frames, while NDP is built into IPv6 and uses ICMPv6. Those messages are generally sent using specific multicast addresses. However, because this backend does not support broadcast or multicast traffic, we need to do something a little different.

When the kernel encounters a destination MAC address that it doesn’t recognize, it asks userland where it should send it. Userland in turn looks at the layer 2 header and determines what it should do. When it sees something that gives the sign that it might be an ARP or NDP packet, it pulls down a copy of the entire packet and if it confirms that it is in fact an ARP or NDP packet, it will generate a response on its own using information encoded in the JSON file above and that will be injected into the overlay device for delivery.

The system determines the mapping between an IPv4 address and its MAC address by supplying an IP address in the arp field. It determines the mapping between an IPv6 address and its mac address by using the ndp field.

Finally, to better explore this prototype, I implemented a DHCP proxy capability. While DHCP has a defined system of relaying, the relay expects to be able to receive layer 2 broadcast packets. Instead, if we see a UDP broadcast packet that’s doing a DHCP query, we rewrite the frame to send it explicitly to the destination MAC address listed in the dhcp-proxy member. In this case, if I run a DHCPv4 server on the host listed on the first entry, it will properly serve a DHCP address to the mac address that has the dhcp-proxy entry. However, an important thing to remember with this, is that even though DHCP was able to assign an address, one still needs to be able to perform ARP and therefore if the address doesn’t match the one in the files entry, it will not work. To be able to do that properly, you need to write a plugin that’s a bit more complicated than the files backend.

Getting Started

This development release of SmartOS comes in three flavors:

Once you boot this version of SmartOS, you should be good to go. As an example, I’ll show how I set up three individual hosts, which we’ll call zlan, zlan2, and zlan3. I put the JSON file shown above, as the file /var/tmp/hosts.json. On the host zlan I ran the following
commands:

# dladm create-overlay -v 23 -e vxlan -s files -p vxlan/listen_ip=10.88.88.69 -p files/config=/var/tmp/hosts.json overlay0
# dladm create-vnic -m de:ad:be:ef:00:00 -l overlay0 foo0
# ifconfig foo0 plumb up 10.55.55.2/24
# ifconfig foo0 inet6 plumb up
# ifconfig foo0 inet6 fe80::3

On the host zlan2 I ran the following:

# dladm create-overlay -v 23 -e vxlan -s files -p vxlan/listen_ip=10.88.88.70 -p files/config=/var/tmp/hosts.json overlay0
# dladm create-vnic -m de:ad:be:ef:00:01 -l overlay0 foo0
# ifconfig bar0 plumb up 10.55.55.3/24
# ifconfig bar0 inet6 plumb up
# ifconfig bar0 inet6 fe80::4

And finally on the host zlan3, I ran the following:

# dladm create-overlay -v 23 -e vxlan -s files -p vxlan/listen_ip=10.88.88.71 -p files/config=/var/tmp/hosts.json overlay0
# dladm create-vnic -m de:ad:be:ef:00:02 -l overlay0 baz0
# ifconfig baz0 plumb up 10.55.55.4/24
# ifconfig baz0 inet6 plumb up
# ifconfig baz0 inet6 fe80::5

With all that done, all three hosts could ping and access network services on each other.

Concluding

The dynamic plugins allow us to start building and experimenting with something a bit more and interesting than the point to point tunnel. From here, there isn’t as much core functionality that’s necessary to add, but there’s a lot more stability and improvements throughout the stack. In addition, from here, I’ll be experimenting with some more distributed systems to make the next dynamic plugin, much more dynamic.

If you have any feedback, suggestions, or anything else, please let me know. You can find me on IRC (rmustacc in #smartos and #illumos on irc.freenode.net) or on the smartos-discuss mailing list. If you’d like to work on support for other encapsulation methods such as NVGRE or want to see how implementing a dynamic mapping service might be, reach out to me.

Tales from a Core File

illumos Overlay Networks Development Preview 02

How we got here

What’s in this Release

Dynamic Plugins

The files plugin format

Getting Started

Concluding

Recent Posts

USB Topology

Transceivers: The Device Between the NIC and the Network

A Tale of Two LEDs

CPU and PCH Temperature Sensors in illumos

Turtles on the Wire: Understanding how the OS uses the Modern NIC

illumos day 2014

Archives

Archives