TCP puzzlers

This post was cross-posted to the Joyent blog.

It’s been said that we don’t really understand a system until we understand how it fails. Despite having written a (toy) TCP implementation in college and then working for several years in industry, I’m continuing to learn more deeply how TCP works — and how it fails. What’s been most surprising is how basic some of these failures are. They’re not at all obscure. I’m presenting them here as puzzlers, in the fashion of Car Talk and the old Java puzzlers. Like the best of those puzzlers, these are questions that are very simple to articulate, but the solutions are often surprising. And rather than focusing on arcane details, they hopefully elucidate some deep principles about how TCP works.

Prerequisites

These puzzlers assume some basic knowledge about working with TCP on Unix-like systems, but you don’t have to have mastered any of this before diving in. As a refresher:

  • TCP states, the three-way handshake used to establish a connection, and the way connections are terminated are described pretty concisely on the TCP Wikipedia page.
  • Programs typically interact with sockets using read, write, connect, bind, listen, and accept. There’s also send and recv, but for our purposes, these work the same way as read and write. (A minimal sketch of the server-side calls appears right after this list.)
  • I’ll be talking about programs that use poll. Although most systems use something more efficient like kqueue, event ports, or epoll, these are all equivalent for our purposes. As for applications that use blocking operations instead of any of these mechanisms: once you understand how TCP failure modes affect poll, it’s pretty easy to understand how it affects blocking operations as well.

You can try all of these examples yourself. I used two virtual machines running under VMware Fusion. The results match my experiences in our production systems. I’m testing using the nc(1) tool on SmartOS, and I don’t believe any of the behavior shown here is OS-specific. I’m using the illumos-specific truss(1) tool to trace system calls and to get some coarse timing information. You may be able to get similar information using dtruss(1m) on OS X or strace(1) on GNU/Linux.

nc(1) is a pretty simple tool. We’ll use it in two modes:

  • As a server. In this mode, nc will set up a listening socket, call accept, and block until a connection is received.
  • As a client. In this mode, nc will create a socket and establish a connection to a remote server.

In both modes, once connected, each side uses poll to wait for either stdin or the connected socket to have data ready to be read. Incoming data is printed to the terminal. Data you type into the terminal is sent over the socket. Upon CTRL-C, the socket is closed and the process exits.
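
To make that loop concrete, here’s a hedged sketch in C of the connected-phase behavior described above: poll on stdin and the socket, and relay data in both directions. It illustrates the pattern, not nc’s actual implementation.

/* Sketch of an nc-like connected-phase loop: wait for stdin or the
 * socket to become readable, then shuttle data across.  Illustrative
 * only; nc's real implementation differs. */
#include <poll.h>
#include <stdio.h>
#include <unistd.h>

void
relay_loop(int sockfd)          /* sockfd: an already-connected TCP socket */
{
    char buf[1024];
    ssize_t n;
    struct pollfd fds[2];

    fds[0].fd = STDIN_FILENO;   /* data typed at the terminal */
    fds[0].events = POLLIN;
    fds[1].fd = sockfd;         /* data arriving from the peer */
    fds[1].events = POLLIN;

    for (;;) {
        if (poll(fds, 2, -1) < 0) {
            perror("poll");
            return;
        }

        if (fds[0].revents & POLLIN) {
            /* Terminal input: send it to the peer. */
            if ((n = read(STDIN_FILENO, buf, sizeof (buf))) <= 0)
                return;
            if (write(sockfd, buf, n) < 0) {
                perror("write");
                return;
            }
        }

        if (fds[1].revents & (POLLIN | POLLHUP | POLLERR)) {
            /* Socket readable (or in error): print whatever arrives. */
            if ((n = read(sockfd, buf, sizeof (buf))) < 0) {
                perror("read");         /* e.g., ECONNRESET, ETIMEDOUT */
                return;
            }
            if (n == 0)                 /* peer sent a FIN */
                return;                 /* nc closes the socket and exits */
            (void) write(STDOUT_FILENO, buf, n);
        }
    }
}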

In these examples, my client is called kang and my server is called kodos.

Warmup: Normal TCP teardown

This one demonstrates a very basic case just to get the ball rolling. Suppose we set up a server on kodos:

Server

[root@kodos ~]# truss -d -t bind,listen,accept,poll,read,write nc -l -p 8080
Base time stamp:  1464310423.7650  [ Fri May 27 00:53:43 UTC 2016 ]
 0.0027 bind(3, 0x08065790, 32, SOV_SOCKBSD)            = 0
 0.0028 listen(3, 1, SOV_DEFAULT)                       = 0
accept(3, 0x08047B3C, 0x08047C3C, SOV_DEFAULT, 0) (sleeping...)

(Remember, in these examples, I’m using truss to print out the system calls that nc makes. The -d flag prints a relative timestamp and the -t flag selects which system calls we want to see.)

Now on kang, I establish a connection:

Client

[root@kang ~]# truss -d -t connect,pollsys,read,write,close nc 10.88.88.140 8080
Base time stamp:  1464310447.6295  [ Fri May 27 00:54:07 UTC 2016 ]
...
 0.0062 connect(3, 0x08066DD8, 16, SOV_DEFAULT)         = 0
pollsys(0x08045670, 2, 0x00000000, 0x00000000) (sleeping...)

Over on kodos, we see:

Server

23.8934 accept(3, 0x08047B3C, 0x08047C3C, SOV_DEFAULT, 0) = 4
pollsys(0x08045680, 2, 0x00000000, 0x00000000) (sleeping...)

At this point, the TCP connections are in state ESTABLISHED and both processes are in poll. We can see this using the netstat tool on each system:

Server

[root@kodos ~]# netstat -f inet -P tcp -n

TCP: IPv4
   Local Address        Remote Address    Swind Send-Q Rwind Recv-Q    State
-------------------- -------------------- ----- ------ ----- ------ -----------
10.88.88.140.8080    10.88.88.139.33226   1049792      0 1049800      0 ESTABLISHED
...
Client

[root@kang ~]# netstat -f inet -P tcp -n

TCP: IPv4
   Local Address        Remote Address    Swind Send-Q Rwind Recv-Q    State
-------------------- -------------------- ----- ------ ----- ------ -----------
10.88.88.139.33226   10.88.88.140.8080    32806      0 1049800      0 ESTABLISHED
...

The question is, when we shut down one of these processes, what happens in the other process? Does it find out? How? Try to predict the specific syscall behavior and explain why each system call does what it does.

Let’s try it. I’ll CTRL-C the server on kodos:

Server

pollsys(0x08045680, 2, 0x00000000, 0x00000000) (sleeping...)
^C127.6307          Received signal #2, SIGINT, in pollsys() [default]

Here’s what we see on kang:

Client

pollsys(0x08045670, 2, 0x00000000, 0x00000000) (sleeping...)
126.1771        pollsys(0x08045670, 2, 0x00000000, 0x00000000)  = 1
126.1774        read(3, 0x08043670, 1024)                       = 0
126.1776        close(3)                                        = 0
[root@kang ~]#

What happened here? Let’s be specific:

  1. We sent SIGINT to the server, causing the process to exit. Upon exit, file descriptors are closed.
  2. When the last file descriptor for an ESTABLISHED socket is closed, the TCP stack on kodos sends a FIN over the connection and enters the FIN_WAIT_1 state.
  3. The TCP stack on kang receives the FIN packet, transitions its own connection to CLOSE_WAIT, and sends an ACK back. Since the nc client is blocked on this socket being ready to read, the kernel wakes this thread with the event POLLIN.
  4. The nc client sees POLLIN for the socket and calls read, which returns 0 immediately. This indicates end-of-stream. nc presumes that we’re done with this socket and closes it.
  5. Meanwhile, the TCP stack on kodos receives the ACK and enters FIN_WAIT_2.
  6. Since the nc client on kang closed its socket, the TCP stack on kang sent a FIN to kodos. The connection on kang enters LAST_ACK.
  7. The TCP stack on kodos receives the FIN, the connection enters TIME_WAIT, and the stack on kodos acknowledges the FIN.
  8. The TCP stack on kang receives the ACK for the FIN and removes the connection entirely.
  9. Two minutes later, the TCP connection on kodos is closed and the stack removes the connection entirely.

(It’s possible for steps to be slightly reordered and for kodos to transition through CLOSING instead of FIN_WAIT_2.)
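
Step 4 deserves emphasis: a return value of 0 from read is the only in-band signal that the peer has finished sending. Here’s a small sketch (not nc’s code) of how an application distinguishes data, end-of-stream, and error on a connected socket:

/* Sketch: interpreting read(2)'s return value on a connected socket.
 * Not nc's code; just the three cases an application must handle. */
#include <stdio.h>
#include <unistd.h>

void
handle_readable(int sockfd)
{
    char buf[1024];
    ssize_t n = read(sockfd, buf, sizeof (buf));

    if (n > 0) {
        /* Ordinary data from the peer; nc just prints it. */
        (void) write(STDOUT_FILENO, buf, n);
    } else if (n == 0) {
        /* End-of-stream: the peer's TCP stack sent a FIN (step 4
         * above).  nc takes this to mean "we're done" and closes
         * the socket, which triggers steps 6-9. */
        (void) close(sockfd);
    } else {
        /* An error on the socket itself, e.g. ECONNRESET or
         * ETIMEDOUT; see the puzzlers below. */
        perror("read");
        (void) close(sockfd);
    }
}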

Here’s the final state according to netstat:

Server

[root@kodos ~]# netstat -f inet -P tcp -n

TCP: IPv4
   Local Address        Remote Address    Swind Send-Q Rwind Recv-Q    State
-------------------- -------------------- ----- ------ ----- ------ -----------
10.88.88.140.8080    10.88.88.139.33226   1049792      0 1049800      0 TIME_WAIT

There’s no output for this connection at all on kang.

The intermediate states transition very quickly, but you could see them using the DTrace TCP provider. You can see the packet flow using snoop(1m) or tcpdump(1).

Conclusions: We’ve seen the normal path of system calls for connection establishment and teardown. Note that kang immediately found out when kodos’s connection was closed — it was woken up out of poll and read returned 0 to indicate end-of-stream. At that point, kang elected to close its socket, which cleaned up the connection state on kodos. We’ll revisit this later to see how the results can differ if kang doesn’t close its socket here.

Puzzler 1: Power cycling

What happens to an established, idle TCP connection if one system is power cycled?

Since most processes go through normal exit as part of a graceful reboot (e.g., if you use the “reboot” command), you get basically the same result if you type “reboot” on kodos instead of killing the server with CTRL-C. But what would happen if we’d power-cycled kodos in the previous example? Surely kang will eventually find out, right?

Let’s try it. Set up the connection:

Server

[root@kodos ~]# truss -d -t bind,listen,accept,poll,read,write nc -l -p 8080
Base time stamp:  1464312528.4308  [ Fri May 27 01:28:48 UTC 2016 ]
 0.0036 bind(3, 0x08065790, 32, SOV_SOCKBSD)            = 0
 0.0036 listen(3, 1, SOV_DEFAULT)                       = 0
 0.2518 accept(3, 0x08047B3C, 0x08047C3C, SOV_DEFAULT, 0) = 4
pollsys(0x08045680, 2, 0x00000000, 0x00000000) (sleeping...)
Client

[root@kang ~]# truss -d -t open,connect,pollsys,read,write,close nc 10.88.88.140 8080
Base time stamp:  1464312535.7634  [ Fri May 27 01:28:55 UTC 2016 ]
...
 0.0055 connect(3, 0x08066DD8, 16, SOV_DEFAULT)         = 0
pollsys(0x08045670, 2, 0x00000000, 0x00000000) (sleeping...)

Now I’ll use VMware’s “reboot” function to power-cycle the system. It’s important for this test that this be an actual power cycle (or OS panic) — anything that causes a graceful shut down will look more like the first case above.

20 minutes later, kang is still sitting in the same place:

Client

pollsys(0x08045670, 2, 0x00000000, 0x00000000) (sleeping...)

We tend to believe that TCP’s job is to maintain a consistent abstraction (namely, the TCP connection) across multiple systems, so it’s surprising to discover cases where the abstraction is broken like this. And lest you think this is some nc(1) issue, it’s not. “netstat” on kodos shows no connection to kang, but kang shows a perfectly healthy connection to kodos:

Client

[root@kang ~]# netstat -f inet -P tcp -n

TCP: IPv4
   Local Address        Remote Address    Swind Send-Q Rwind Recv-Q    State
-------------------- -------------------- ----- ------ ----- ------ -----------
10.88.88.139.50277   10.88.88.140.8080    32806      0 1049800      0 ESTABLISHED
...

We can leave this forever without touching it and kang will never figure out that kodos rebooted.

Now, suppose at this point, kang tries to send data to kodos. What happens?

Let’s try it:

Client

pollsys(0x08045670, 2, 0x00000000, 0x00000000) (sleeping...)
kodos, are you there?
3872.6918       pollsys(0x08045670, 2, 0x00000000, 0x00000000)  = 1
3872.6920       read(0, " k o d o s ,   a r e   y".., 1024)     = 22
3872.6924       write(3, " k o d o s ,   a r e   y".., 22)      = 22
3872.6932       pollsys(0x08045670, 2, 0x00000000, 0x00000000)  = 1
3872.6932       read(3, 0x08043670, 1024)                       Err#131 ECONNRESET
3872.6933       close(3)                                        = 0
[root@kang ~]#

When I type the message and hit enter, nc on kang gets woken up, reads the message from stdin, and sends it over the socket. The write call completes successfully! nc goes back to poll to wait for another event, eventually finds out that the socket can be read without blocking, and then calls read. This time, read fails with ECONNRESET. What does this mean? POSIX’s definition of read(2) says that this error means:

[ECONNRESET]
A read was attempted on a socket and the connection was forcibly closed by its peer.

The illumos read(2) man page provides a little more context:

   ECONNRESET
              The filedes argument refers to a connection oriented
              socket and the connection was forcibly closed by the peer
              and is no longer valid.  I/O can no longer be performed to
              filedes.

This error doesn’t mean that something specific to the read call went wrong, but rather that the socket itself is disconnected. Most other socket operations would fail the same way at that point.

So what happened here? At the point when nc on kang tried to send data, the TCP stack still didn’t know the connection was dead. kang sent a data packet over to kodos, which responded with an RST because it didn’t know anything about the connection. kang saw the RST and tore down its connection. The kernel cannot close the socket file descriptor on the application’s behalf — that’s not how file descriptors work — but subsequent operations on it fail with ECONNRESET until nc closes the fd itself.
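
Here’s a hedged sketch, from the application’s point of view, of why the write appears to succeed while the failure only shows up on the next operation. The function name and structure are mine, not nc’s.

/* Sketch: the write() succeeds because the local stack doesn't yet know
 * that the peer has no record of the connection; only after the RST
 * arrives does the next operation fail. */
#include <errno.h>
#include <stdio.h>
#include <unistd.h>

void
send_and_check(int sockfd)
{
    const char msg[] = "kodos, are you there?\n";
    char buf[1024];
    ssize_t n;

    /* Succeeds: the data is queued and transmitted even though the
     * remote system knows nothing about this connection. */
    if (write(sockfd, msg, sizeof (msg) - 1) < 0) {
        perror("write");
        return;
    }

    /* The peer answers the data segment with an RST, the local stack
     * tears down the connection, and the next read() fails.  (In nc,
     * poll() wakes up first and then read() fails the same way.) */
    n = read(sockfd, buf, sizeof (buf));
    if (n < 0 && errno == ECONNRESET)
        (void) fprintf(stderr, "connection reset by peer\n");
    (void) close(sockfd);
}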

Conclusions:

  1. A hard power-cycle is very different from a graceful shutdown. When testing distributed systems, this case needs to be tested specifically. You can’t just kill a process and expect it to test the same thing.
  2. There are situations when one side believes a TCP connection is established, but the other side does not, and where this situation will never be automatically resolved. It’s possible to manage this problem using TCP or application-level keep-alive. (A sketch of enabling TCP keep-alive appears just after this list.)
  3. The only reason that kang eventually found out that the remote side had disappeared was that it took action: it sent data and received a response indicating the connection was gone.
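
As mentioned in conclusion 2, TCP keep-alive is one way to manage this. Below is a hedged sketch of enabling it with setsockopt. SO_KEEPALIVE is standard; the finer-grained knobs (TCP_KEEPIDLE and friends) and the values shown vary by platform and workload, so treat them as assumptions, not recommendations.

/* Sketch: enabling TCP keep-alive so that an idle connection to a dead
 * peer eventually fails instead of sitting in ESTABLISHED forever.
 * SO_KEEPALIVE is standard; the TCP_KEEP* options and the values below
 * are platform-specific examples. */
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <stdio.h>
#include <sys/socket.h>

int
enable_keepalive(int sockfd)
{
    int on = 1;

    if (setsockopt(sockfd, SOL_SOCKET, SO_KEEPALIVE, &on, sizeof (on)) != 0) {
        perror("setsockopt SO_KEEPALIVE");
        return (-1);
    }

#ifdef TCP_KEEPIDLE
    /* Start probing after 60s of idle time, probe every 10s, and give
     * up (the connection fails with ETIMEDOUT) after 5 unanswered probes. */
    int idle = 60, intvl = 10, cnt = 5;
    (void) setsockopt(sockfd, IPPROTO_TCP, TCP_KEEPIDLE, &idle, sizeof (idle));
    (void) setsockopt(sockfd, IPPROTO_TCP, TCP_KEEPINTVL, &intvl, sizeof (intvl));
    (void) setsockopt(sockfd, IPPROTO_TCP, TCP_KEEPCNT, &cnt, sizeof (cnt));
#endif
    return (0);
}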

That raises the question: what if kodos had not responded to the data message for some reason?

Puzzler 2: Power off

What happens if one endpoint of a TCP connection is powered off for a while? Does the other endpoint ever discover this? If so, how? And when?

Once again, I’ll set up the connection with nc:

Server

[root@kodos ~]# truss -d -t bind,listen,accept,poll,read,write nc -l -p 8080
Base time stamp:  1464385399.1661  [ Fri May 27 21:43:19 UTC 2016 ]
 0.0030 bind(3, 0x08065790, 32, SOV_SOCKBSD)        = 0
 0.0031 listen(3, 1, SOV_DEFAULT)           = 0
accept(3, 0x08047B3C, 0x08047C3C, SOV_DEFAULT, 0) (sleeping...)
 6.5491 accept(3, 0x08047B3C, 0x08047C3C, SOV_DEFAULT, 0) = 4
pollsys(0x08045680, 2, 0x00000000, 0x00000000) (sleeping...)
Client

[root@kang ~]# truss -d -t open,connect,pollsys,read,write,close nc 10.88.88.140 8080
Base time stamp:  1464330881.0984  [ Fri May 27 06:34:41 UTC 2016 ]
...
 0.0057 connect(3, 0x08066DD8, 16, SOV_DEFAULT)         = 0
pollsys(0x08045670, 2, 0x00000000, 0x00000000) (sleeping...)

Now, I’ll cut power to kodos abruptly and attempt to send data from kang:

Client

pollsys(0x08045670, 2, 0x00000000, 0x00000000) (sleeping...)

114.4971        pollsys(0x08045670, 2, 0x00000000, 0x00000000)  = 1
114.4974        read(0, "\n", 1024)                             = 1
114.4975        write(3, "\n", 1)                               = 1
pollsys(0x08045670, 2, 0x00000000, 0x00000000) (sleeping...)

The write completes normally, and I don’t see anything for quite a while. Just over 5 minutes later, I see this:

Client

pollsys(0x08045670, 2, 0x00000000, 0x00000000) (sleeping...)
425.5664        pollsys(0x08045670, 2, 0x00000000, 0x00000000)  = 1
425.5665        read(3, 0x08043670, 1024)                       Err#145 ETIMEDOUT
425.5666        close(3)                                        = 0

This looks similar to the case where the system was power-cycled instead of powered off, except for two things: it took 5 minutes for the system to notice, and the error reported was ETIMEDOUT. Note that again, it’s not that the read timed out per se. We would have seen the same error from other socket operations, and subsequent socket operations would likely fail immediately with the same ETIMEDOUT error. That’s because it’s the socket that has entered a state where the underlying connection has timed out. The specific reason for this is that the remote side failed to acknowledge a data packet for too long — 5 minutes, as configured on this system.

Conclusions:

  1. When the remote system is powered off instead of power cycled, again, the first system only finds out when it tries to send data. If it never sends packets, it will never find out about the terminated connection.
  2. When the system has been attempting to send data for long enough without receiving acknowledgments from the remote side, the TCP connection will be terminated and subsequent socket operations on it will fail with ETIMEDOUT.

Puzzler 3: Broken connection with no failure

This time, instead of giving you a specific case and asking what happens, I’ll flip it around: here’s an observation, and see if you can figure out how it could happen. We’ve discussed several cases where kang can believe it has a connection to kodos, but kodos doesn’t know about it. Is it possible for kang to have a connection to kodos without kodos knowing about it — indefinitely (i.e., where this will not resolve itself), even though there’s been no power off, no power cycle, no failure of the operating system on kodos, and no failure of networking equipment?

Here’s a hint: consider the case above where a connection is stuck in ESTABLISHED. In that case, it’s fair to say that the application has the responsibility to deal with the problem, since by definition it still has the socket open and could find out by sending data, at which point the connection will eventually be terminated. But what if the application didn’t have the socket open any more?

In the Warmup, we looked at the case where kodos’s nc closed its socket, and we said that kang’s nc read 0 (indicating end-of-stream) and then closed its socket. What if it didn’t close the socket, but kept it open? Obviously, it wouldn’t be able to read any more data from it. But there’s nothing about TCP that says you can’t send more data to a peer that has sent you a FIN. FIN only means end-of-data-stream in the direction that the FIN was sent.

In order to demonstrate this, we can’t use nc on kang because it automatically closes its socket when it reads 0. So I’ve written a demo version of nc called dnc, which simply skips this behavior. It also prints out explicitly which system calls it’s making. This will give us a chance to observe the TCP states.
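
I won’t reproduce dnc’s source here, but a sketch of the behavioral difference looks roughly like this: on end-of-stream, keep the socket open and simply stop asking poll about reads, rather than closing the fd as nc does. This is my illustration of the idea, not dnc’s actual code.

/* Sketch of a dnc-like loop: on end-of-stream, keep the socket open and
 * stop asking for POLLIN instead of closing it.  Illustrative only. */
#include <poll.h>
#include <stdio.h>
#include <unistd.h>

void
dnc_like_loop(int sockfd)
{
    char buf[1024];
    ssize_t n;
    struct pollfd fds[2];

    fds[0].fd = STDIN_FILENO;
    fds[0].events = POLLIN;
    fds[1].fd = sockfd;
    fds[1].events = POLLIN;

    for (;;) {
        if (poll(fds, 2, -1) < 0) {
            perror("poll");
            return;
        }

        if (fds[0].revents & POLLIN) {
            if ((n = read(STDIN_FILENO, buf, sizeof (buf))) <= 0)
                return;
            /* Still legal: our side hasn't sent a FIN, so we can keep
             * sending data even after the peer has closed its end. */
            if (write(sockfd, buf, n) < 0) {
                perror("write");
                return;
            }
        }

        if (fds[1].revents & (POLLIN | POLLHUP | POLLERR)) {
            n = read(sockfd, buf, sizeof (buf));
            if (n < 0) {
                perror("read");          /* e.g., ECONNRESET after an RST */
                return;
            }
            if (n == 0) {
                /* Peer sent a FIN.  Unlike nc, don't close the fd; just
                 * stop polling for reads.  POLLHUP/POLLERR will still be
                 * reported if the connection later dies. */
                fds[1].events = 0;
                continue;
            }
            (void) write(STDOUT_FILENO, buf, n);
        }
    }
}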

First, we’ll set up the connections as usual:

Server

[root@kodos ~]# truss -d -t bind,listen,accept,poll,read,write nc -l -p 8080
Base time stamp:  1464392924.7841  [ Fri May 27 23:48:44 UTC 2016 ]
 0.0028 bind(3, 0x08065790, 32, SOV_SOCKBSD)            = 0
 0.0028 listen(3, 1, SOV_DEFAULT)                       = 0
accept(3, 0x08047B2C, 0x08047C2C, SOV_DEFAULT, 0) (sleeping...)
 1.9356 accept(3, 0x08047B2C, 0x08047C2C, SOV_DEFAULT, 0) = 4
pollsys(0x08045670, 2, 0x00000000, 0x00000000) (sleeping...)
Client

[root@kang ~]# dnc 10.88.88.140 8080
2016-05-27T08:40:02Z: establishing connection
2016-05-27T08:40:02Z: connected
2016-05-27T08:40:02Z: entering poll()

Now let’s verify the connections are in the ESTABLISHED state we expect on both sides:

Server

[root@kodos ~]# netstat -f inet -P tcp -n

TCP: IPv4
   Local Address        Remote Address    Swind Send-Q Rwind Recv-Q    State
-------------------- -------------------- ----- ------ ----- ------ -----------
10.88.88.140.8080    10.88.88.139.37259   1049792      0 1049800      0 ESTABLISHED
Client

[root@kang ~]# netstat -f inet -P tcp -n

TCP: IPv4
   Local Address        Remote Address    Swind Send-Q Rwind Recv-Q    State
-------------------- -------------------- ----- ------ ----- ------ -----------
10.88.88.139.37259   10.88.88.140.8080    32806      0 1049800      0 ESTABLISHED

Now, let’s CTRL-C the nc process on kodos:

Server

pollsys(0x08045670, 2, 0x00000000, 0x00000000) (sleeping...)
^C[root@kodos ~]#

We immediately see this on kang:

Client

2016-05-27T08:40:12Z: poll returned events 0x0/0x1
2016-05-27T08:40:12Z: reading from socket
2016-05-27T08:40:12Z: read end-of-stream from socket
2016-05-27T08:40:12Z: read 0 bytes from socket
2016-05-27T08:40:12Z: entering poll()

Now let’s look at the TCP connection states:

Server

[root@kodos ~]# netstat -f inet -P tcp -n

TCP: IPv4
   Local Address        Remote Address    Swind Send-Q Rwind Recv-Q    State
-------------------- -------------------- ----- ------ ----- ------ -----------
10.88.88.140.8080    10.88.88.139.37259   1049792      0 1049800      0 FIN_WAIT_2
Client

[root@kang ~]# netstat -f inet -P tcp -n

TCP: IPv4
   Local Address        Remote Address    Swind Send-Q Rwind Recv-Q    State
-------------------- -------------------- ----- ------ ----- ------ -----------
10.88.88.139.37259   10.88.88.140.8080    1049792      0 1049800      0 CLOSE_WAIT

This makes sense: kodos sent a FIN to kang. FIN_WAIT_2 indicates that kodos received an ACK from kang for the FIN that it sent, and CLOSE_WAIT indicates that kang received the FIN but has not sent a FIN back. This is a perfectly valid state for a TCP connection for an indefinite period of time. Imagine kodos had sent a request to kang and was not planning to send any more requests; kang could happily send data in response for hours, and this would work. In our case, though, kodos has actually closed the socket.

Let’s wait a minute and check the TCP connection states again. A minute later, the connection is completely missing from kodos, but it’s still present on kang:

Client

[root@kang ~]# netstat -f inet -P tcp -n

TCP: IPv4
   Local Address        Remote Address    Swind Send-Q Rwind Recv-Q    State
-------------------- -------------------- ----- ------ ----- ------ -----------
10.88.88.139.37259   10.88.88.140.8080    1049792      0 1049800      0 CLOSE_WAIT

What happened here? We hit a lesser-known special case in the TCP stack: when an application has closed a socket, the TCP stack has sent a FIN, and the remote TCP stack has acknowledged the FIN, the local TCP stack waits a fixed period of time and then closes the connection. The reason? In case the remote side has rebooted. This case is actually an analog to the case above where one side has an ESTABLISHED connection and the other doesn’t know about it. The difference is that in this specific case, the application has closed the socket, so there’s no other component that could deal with the problem. As a result, the TCP stack waits a fixed period of time and then tears down the connection (without sending anything to the other side).

Follow-up question: what happens if kang sends data to kodos at this point? Remember, kang still thinks the connection is open, but it’s been torn down on kodos.

Client

2016-05-27T08:40:12Z: entering poll()
kodos, are you there?
2016-05-27T08:41:34Z: poll returned events 0x1/0x0
2016-05-27T08:41:34Z: reading from stdin
2016-05-27T08:41:34Z: writing 22 bytes read from stdin to socket
2016-05-27T08:41:34Z: entering poll()
2016-05-27T08:41:34Z: poll returned events 0x0/0x10
2016-05-27T08:41:34Z: reading from socket
dnc: read: Connection reset by peer

This is the same as what we saw in Puzzler 1: the write() actually succeeds, since the TCP stack doesn’t know that the connection is closed yet. But the write elicits an RST, which wakes up the thread in poll(), and the subsequent read() fails with ECONNRESET.

Conclusions:

  • It’s possible for two sides to disagree about the state of the connection even when there’s been no OS, network, or hardware failure at all.
  • In a case like the one above, it’s not possible for kang to distinguish between the case where kodos is attentively waiting to receive data and the case where kodos has closed the socket and isn’t listening (at least, not without sending a packet). For this reason, maybe it’s not a great idea to design a system that uses sockets in these half-open states for an extended period of time under normal operation.

Conclusions

TCP is typically presented as a protocol for maintaining a consistent abstraction — the “TCP connection” — across two systems. We know that in the face of certain types of network and software problems, connections will fail, but it’s not always obvious that there are cases where the abstraction fails, in that the two systems disagree about the state of the connection. Specifically:

  • It’s possible for one system to think it has a working, established TCP connection to a remote system, while the remote system knows nothing about that connection.
  • This is possible even when there has been no failure of the network, hardware, or operating system.

These behaviors do not reflect deficiencies in TCP. Quite the contrary, in all of these cases, the TCP implementation appears to behave as reasonably as it could given the situation. If we were tempted to implement our own communication mechanism instead of using TCP, the presence of these cases might well remind us how complex the underlying problems are. These are intrinsic problems with distributed systems, and a TCP connection is fundamentally a distributed system.

That said, the single most important lesson in all of this is that the notion of a TCP connection that spans multiple systems is a convenient fiction. When it goes wrong, it’s critical to think about it as two separate state machines that operate simultaneously to try to maintain a consistent view of the connection. It’s the responsibility of the application to handle the cases where these differ (often using a keep-alive mechanism).

Furthermore, there’s a disconnect between the application’s file descriptor and the underlying TCP connection. TCP connections exist (in various states related to closing) even after an application has closed the file descriptor, and a file descriptor can be open when the underlying TCP connection has been closed as a result of a failure.

Other lessons to keep in mind:

  • Ungraceful reboot of a system (as happens when the OS crashes) is not the same as a userland process exiting or closing its socket. It’s important to test this case specifically. Reboot, when the remote system comes back online before the connection times out, is also different from power-off.
  • There’s no proactive notification from the kernel when a TCP socket is torn down. You only find this out when you call read(), write(), or some other socket operation on the socket file descriptor. If your program doesn’t do this for some reason, you’ll never find out about the connection failure.

Some related notes that I’ve found aren’t so commonly known:

  • ECONNRESET is a socket error you can see from read(), write(), and other operations that indicates that the remote peer has sent a RST.
  • ETIMEDOUT is a socket error you can see from read(), write(), and other operations that indicates that some timeout associated with the connection has elapsed. The cases I’ve seen most are when the remote side did not acknowledge some packet for too long. These are usually either data packets, a FIN packet, or a KeepAlive probe.

Importantly, neither of these errors means that there was anything wrong with the read or write operation that you tried to do — just that the socket itself is disconnected.
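
One way an application might fold these cases together is a small helper that classifies errno values from socket operations: anything indicating that the connection itself is gone gets the same treatment (close the fd, then reconnect or report the failure). A hedged sketch:

/* Sketch: classifying socket errors.  ECONNRESET and ETIMEDOUT both mean
 * the connection itself is gone, not that this particular read or write
 * was malformed. */
#include <errno.h>
#include <stdbool.h>

static bool
connection_is_dead(int err)
{
    switch (err) {
    case ECONNRESET:    /* the peer sent an RST */
    case ETIMEDOUT:     /* the peer stopped acknowledging our packets */
        return (true);
    default:
        return (false);
    }
}

/* Typical use after a failed socket operation:
 *
 *     if (n < 0 && connection_is_dead(errno)) {
 *         (void) close(sockfd);
 *         ... reconnect, or report the failure to the caller ...
 *     }
 */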

ACM Turing Centenary Celebration

This past weekend, I was very fortunate to have a chance to attend the ACM’s Turing Centenary Celebration in honor of the 100th anniversary of the birth of Alan Turing. The event brought together nearly every living Turing Award winner for a series of talks and panel discussions on subjects like AI, theory of computation, computer architecture, and the role of computation in other fields. A webcast of the entire event is already available online. Below are some of my own notes on the conference.

For many of us in the audience, this weekend was about seeing in person the people we’d learned so much from both academically and professionally. This awe-inspiring feeling was best articulated by Vint Cerf in closing yesterday: “it’s like the history books opened up and the people walked out”. For me, by far, the highlight was the panel on Computer Architecture, moderated by David Patterson (yes, that Patterson) and featuring Frederick Brooks (of OS/360 and Mythical Man-Month fame), Ivan Sutherland, and Chuck Thacker. More than the other panels, I found all of the speakers’ prepared remarks accessible (perhaps because my work and interests most closely align with theirs), but at the same time very instructional. Sutherland began with the “Tyranny of the Clock”, an eloquent articulation of an important barrier in modern chip design and a call to action for designers and researchers. Then, in a sort of reverential but thoughtful engineering-style postmortem, Brooks discussed why the machine that Turing actually built, unlike so much of his other work, was not very influential. Thacker discussed the nature of computer architecture research and the modern developments that have made it more accessible for students today. In the subsequent discussion, Patterson referenced a prophetic quote by Maurice Wilkes at the dawn of modern computing (that Bryan also cited in his QCon talk last year) in which Wilkes suddenly “realized that a large part of my life from then on was going to be spent in finding mistakes in my own programs”.

Complexity

Complexity was a prominent theme in several sessions. Ken Thompson expressed his disappointment at the complexity of modern Linux, pointing to its 1000+ syscalls compared to the original Unix’s 200. In his prepared remarks, he also shared some grim reflections on how Turing would feel about the state of computing today: Windows, phishing, botnets, C++, and so on. He compared his feeling to that of an early television pioneer visiting our time and seeing people watching Maury Povich. On a more positive note in the same session, Fernando Corbato (leader on Multics) gave a brief but fascinating account of what it was like to work on computers in the early days. He actually called Unix one of the greatest results of Multics, Unix being a “distillation” of the better ideas of Multics without all the complexity. (It’s well worth watching Fernando’s and Ken’s remarks from the “Systems Architecture, Design, Engineering, and Verification” session.) Later, Thacker too called complexity “the enemy”, suggesting that we seriously reconsider many of the held-over assumptions in today’s systems that are costing us enormously. (I’m sure that’s a good idea, and I’d have loved to hear some examples of these things that he had in mind.)

In the Programming Languages panel, Barbara Liskov lamented that the languages people use today for production systems are pretty complex for introducing new students to computer programming, but also admitted that building languages simple enough to be thoroughly understood in an introductory course and rich enough to support what professional engineers want is a serious challenge. She suggested starting from scratch with only the essential language features of modularity and encapsulation. In the same session, Niklaus Wirth (in an entertaining, light presentation) explained how he sought to design languages that were both simpler and more powerful than their contemporaries — that these are not opposing goals. All of the participants agreed that in practice, most popular languages seem to accrete lots of cruft from small changes that seem good at the time, but contribute to an overly complex system.

Lucky or good?

Another theme that came up quite a bit was the role of luck in the speakers’ success. Many of them attributed their success to luck, leaving it there, but I liked Dahlia Malkhi’s reference to a golfer who hit a hole-in-one. He was asked if that was a result of luck or training, and replied that he was lucky, but he had to train a lot in order to get that lucky.

Beware elegance?

Several speakers (notably Butler Lampson and Joseph Sifakis) mentioned that they tend to be suspicious of clean, elegant solutions to problems, because such solutions often don’t work well in the real world. I’d never heard it put so generally, especially by leaders in the field, as that goes against a common intuition among mathy people that’s usually nurtured as part of our education. (That’s still a good thing — as Einstein famously said, it’s important to strive for simplicity, but be careful not to go too far.) In fact, Sifakis attributed the lack of serious work in rigorous system design to researchers’ preference for nice theories, even if those theories don’t match reality. (While probably a factor, this explanation seems to leave out the economic cost of such rigor as an important reason why many systems today aren’t built the way he suggests.)

System verification

In the Systems Architecture and Verification session, Vint Cerf noted that automatic verifiers don’t seem to work well for many types of systems we build and asked Sifakis and E. Allen Emerson whether there existed interactive tools that would help programmers test assertions about their systems, rather than automatically trying to verify the whole thing. Emerson pointed out that this is called semi-automatic verification, but still seemed more interested in the fully-automatic kind. Vint’s idea made me think of a sort of extensible lint, since lint is already an admittedly limited tool for checking a fixed set of assertions about a program. But despite its limits, lint is incredibly useful (at least in languages like C and JavaScript) for rooting out large classes of bugs, and it would be interesting to think about a more interactive workflow that would free the tool from having to report only things it knows are problems. (People generally won’t use an error-checking tool that reports many false positives, but they might use a tool that can evaluate static assertions about their code in a less rigid context.)

“What”-based programming

Alan Kay and others talked about the idea of “what”-based programming, rather than the “how”-based approaches we typically use today. The idea is that humans tell the computer what to do, and some engine under the hood figures out how to do it. Kay demonstrated a graphical environment based on this idea, and then wondered why we couldn’t build more complex systems (including operating systems) that way. Bill, Robert, and I tried for a while to imagine what this would look like. On the one hand, many classes of device drivers are similar enough that you could imagine modeling some pieces of them with a declarative “what”-based description, but interaction with physical devices often requires precise sequences of register reads and writes and it’s hard to imagine encoding that without essentially describing “how”. Achieving good performance may be challenging, since humans writing code for such systems today necessarily describe how to organize them to be fast. And if you could solve this for something as constrained as device drivers, how could you generalize it for the schedulers or VM system, without encoding detailed knowledge into the engine that actually translates the “what” to the “how”? You could also imagine that debugging such systems would be very difficult. Still, I found the idea compelling, because there are many cases where we do build “what”-based descriptions and the result is that it’s much easier to verify both that the description does what the human wants it to do and that the “what”-to-”how” system properly translates it (e.g., Meta-D, a Cloud Analytics module that describes a family of DTrace scripts declaratively, or even the D language itself). It would be interesting to hear from Alan Kay what he was thinking in posing this question.

Computation in biology and physics

I was especially intrigued by Leonard Adleman’s remarks during the “Algorithmic View of the Universe” panel, in which he talked about vastly different notions of computation and how results based on the Turing model, plus the Church-Turing thesis, can inform physics and biology. He discussed protein folding in the cell as a form of computation, and what implications that has for biologists trying to understand cellular processes. Later he wondered what implications the proven constraints of the Turing model, taken as physical laws, would have on quantum mechanics (e.g., that certain types of time travel allowed by QM must actually be impossible).


These were just a few of the bits I found most interesting, but the whole weekend was a humbling experience. Besides being able to see so many important figures in the field, it was a good opportunity to step outside the confines of day-to-day engineering, which for me tends toward a time horizon of a few years. And most of the talks provoked interesting discussions. So thanks to all the speakers, and to the ACM for putting together the event and making the video available.