
2012-05-08

TCP doesn't suck, and all the proposed bufferbloat fixes are identical

Background: a not very clear article in ACM Queue led to a post by Bram Cohen claiming TCP sucks.

The first article is long and seems technically correct, although in my opinion it over-emphasizes unnecessary details and under-emphasizes some really key points. The second article then proceeds to misunderstand many of those key points and draw invalid conclusions, while attempting to argue in favour of a solution (uTP) that is actually a good idea. So I'm writing this, I suppose, to refute the second article in order to better support its thesis. That makes sense, right? No? Well, I'm doing it anyway.

First of all, the main problem we're talking about here, "bufferbloat," is actually two problems that we'd better separate. To oversimplify only a little, problem #1 is oversized upstream queues in your cable modem or DSL router. Problem #2 is oversized queues everywhere else on the Internet.

The truth is, for almost everyone reading this, you don't care even a little bit about problem #2. It isn't what makes your Internet slow. If you're running an Internet backbone, maybe you care because you can save money, in which case, go hire a consultant to figure out how to fine-tune your overpriced core routers. Jim Gettys and others are on a crusade to get more ISPs to do this, which I applaud, but that doesn't change the fact that it's irrelevant to me and you because it isn't causing our actual problem. (Van Jacobson points this out a couple of times in the course of the first article, but one gets the impression nobody is listening. I guess "the Internet might collapse" makes a more exciting article.)

What I want to concentrate on is problem #1, which actually affects you and which you have some control over. The second article, although it doesn't say so, is also focused on that. The reason we care about that problem is that it's the one that makes your Internet slow when you're uploading stuff. For example, when you're running (non-uTP) BitTorrent.

This is where I have to eviscerate the second article (which happens to be by the original BitTorrent guy) a little. I'll start by agreeing with his main point: uTP, used by modern BitTorrent implementations, really is a very good, very pragmatic, very functional, already-works-right-now way to work around those oversized buffers in your DSL/cable modem. If all your uploads use uTP, it doesn't matter how oversized the buffers are in your modem, because they won't fill up, and life will be shiny.

The problem is, uTP is a point solution that only solves one problem, namely creating a low-priority uplink suitable for bulk, non-time-sensitive uploads that intentionally give way to higher-priority stuff. If I'm videoconferencing, I sure do want my BitTorrent to scale itself back, even down to zero, in favour of my video and audio. If I'm waiting for my browser to upload a file attachment to Gmail, I want that to win too, because I'm waiting for it synchronously before I can get any more work done. In fact, if my next-door neighbour and I are sharing part of the same Internet link, I want my BitTorrent to scale itself back even to help out his Gmail upload, in the hope that he'll do the same for me (automatically of course) when the time comes. uTP does all that. But for exactly that reason, it's no good for my Gmail upload or my ssh sessions or my random web browsing. If I used uTP for all those things, then they'd all have the same priority as BitTorrent, which would defeat the purpose.
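To make that concrete: uTP's congestion controller (later standardized as LEDBAT) is essentially a delay-targeting loop. Here's a minimal sketch of the idea; the constants and names are illustrative, and the real protocol adds timestamps, clock-drift correction, and plenty more.

    # Sketch of LEDBAT-style delay-based control, as used by uTP.
    # Illustrative only: real uTP tuning and framing differ in detail.

    TARGET = 0.100  # target queuing delay, seconds
    GAIN = 1.0      # window adjustment per RTT, in packets

    class LedbatSketch:
        def __init__(self):
            self.base_delay = float('inf')  # lowest one-way delay ever seen
            self.cwnd = 2.0                 # congestion window, in packets

        def on_delay_sample(self, one_way_delay):
            # The minimum delay approximates the path with an empty queue;
            # anything above it is queuing delay that we helped create.
            self.base_delay = min(self.base_delay, one_way_delay)
            queuing_delay = one_way_delay - self.base_delay

            # Nudge the window toward the target delay. Under target: grow.
            # Over target: shrink. This is why uTP yields to TCP: TCP keeps
            # pushing until the queue overflows, while uTP backs off as soon
            # as the delay starts climbing.
            off_target = (TARGET - queuing_delay) / TARGET
            self.cwnd = max(2.0, self.cwnd + GAIN * off_target / self.cwnd)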

That gives us a clue to the problem in Cohen's article: he's really disregarding how different protocols interoperate on the Internet. (He discounts this as "But game theory!" as if using sarcasm quotes would make game theory stop predicting inconvenient truths.) uTP was designed to interact well with TCP. It was also designed for a world with oversized buffers. TCP, of course, also interacts well with TCP, but it never considered bufferbloat, which didn't exist at the time. Our bufferbloat problems - at least, the thing that turns bufferbloat from an observation into a problem - come down to just that: they couldn't design for it, because it didn't exist.

Oddly enough, fixing TCP to work around bufferbloat is pretty easy. The solution is "latency-based TCP congestion control," the most famous implementation of which is TCP Vegas. Sadly, when you run it or one of its even better successors, you soon find out that old-style TCP always wins, just like it always wins over uTP, and for exactly the same reason. That means, essentially, that if anyone on the Internet is sharing bandwidth with you (they are), and they're running traditional-style TCP (virtually everyone is), then TCP Vegas and its friends make you a sucker with low speeds. Nobody wants to be a sucker. (This is the game theory part.) So you don't want to run latency-based TCP unless everyone else does first.
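For the curious, the core of the Vegas idea fits in a few lines: compare the throughput you'd expect with an empty queue against what the current RTT actually delivers, and interpret the gap as your packets sitting in a queue somewhere. A sketch, using commonly cited alpha/beta thresholds; real implementations are fussier about when and how they sample:

    ALPHA, BETA = 2.0, 4.0  # Vegas thresholds, in packets

    class VegasSketch:
        def __init__(self):
            self.base_rtt = float('inf')  # lowest RTT ever observed
            self.cwnd = 2.0               # congestion window, in packets

        def on_rtt_sample(self, rtt):
            self.base_rtt = min(self.base_rtt, rtt)

            # expected = throughput with an empty queue; actual = throughput
            # given the current, possibly queue-inflated, RTT. The gap,
            # scaled back to packets, estimates how much of our data is
            # sitting in queues along the path.
            expected = self.cwnd / self.base_rtt
            actual = self.cwnd / rtt
            queued = (expected - actual) * self.base_rtt

            if queued < ALPHA:
                self.cwnd += 1.0 / self.cwnd  # queue is short: grow gently
            elif queued > BETA:
                self.cwnd -= 1.0 / self.cwnd  # queue is building: back off
            # That politeness is the problem: traditional TCP only backs
            # off on loss, long after Vegas has already yielded.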

If you're Bram Cohen, you decide this state of affairs "sucks" and try to single-handedly convince everyone on the Internet to simultaneously upgrade their TCP stack (or replace it with uTP; same undeniable improvement, same difficulty). If you co-invented the Internet, you probably gave up on that idea in the 1970s or so, and are thinking a little more pragmatically. That's where RED (and its punny successors like BLUE) comes in.

Now RED, as originally described, is supposed to run on the actual routers with the actual queues. As long as you know the uplink bandwidth (which your modem does know, outside annoyingly variable things like wireless), you can fairly easily tune the RED algorithm to an appropriate goal queue length and off you go.
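If you haven't seen RED before, here's the whole trick in miniature: keep a smoothed average of the queue length, and as that average climbs between two thresholds, drop (or, as described below, ECN-mark) incoming packets with steadily rising probability. The thresholds here are made-up stand-ins for values you'd derive from the known uplink bandwidth:

    import random

    W = 0.002               # EWMA weight, per the original RED paper
    MIN_TH, MAX_TH = 5, 15  # queue-length thresholds, packets (illustrative)
    MAX_P = 0.1             # drop probability as avg reaches MAX_TH

    avg = 0.0

    def red_should_drop(queue_len):
        """Decide, at enqueue time, whether to drop (or ECN-mark)."""
        global avg
        avg = (1 - W) * avg + W * queue_len
        if avg < MIN_TH:
            return False  # queue is comfortably short: do nothing
        if avg >= MAX_TH:
            return True   # queue is persistently long: shed load
        # In between, drop with probability rising linearly toward MAX_P,
        # so senders get early, randomized hints to slow down before the
        # queue ever actually fills.
        p = MAX_P * (avg - MIN_TH) / (MAX_TH - MIN_TH)
        return random.random() < p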

By the way, a common misconception about RED, one which VJ briefly tried to dispel in the first article ("mark or drop it") but which is again misconstrued in Cohen's article, is that if you use traditional TCP plus RED queuing, you will still necessarily have packet loss. Not true. The clever thing about RED is you start managing your queue before it's full, which means you don't have to drop packets at all - you can just add a note before forwarding that says, "If I weren't being so nice to you right now, I would have dropped this," which tells the associated TCP session to slow down, just like a dropped packet would have, without the inconvenience of actually dropping the packet. This technique is called ECN (explicit congestion notification), and it's incidentally disabled on most computers right now because of a tiny minority of servers/routers that still explode when you try to use it. That sucks, for sure, but it's not because of TCP, it's because of poorly-written software. That software will be replaced eventually. I assure you, fixing ECN is a subset of replacing the TCP implementation for every host on the Internet, so I know which one will happen sooner.
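In code terms, ECN just changes what you do at the moment RED says "drop": if the packet's flow negotiated ECN, you set the Congestion Experienced bit and forward the packet anyway. The field names below are illustrative, not a real packet API, and red_should_drop is the sketch from above:

    def handle_packet(pkt, queue):
        if red_should_drop(len(queue)):
            if pkt.ecn_capable:      # sender and receiver negotiated ECN
                pkt.ce_mark = True   # "I would have dropped this"
                queue.append(pkt)    # ...but forward it anyway
            else:
                return               # no ECN: dropping is the only signal
        else:
            queue.append(pkt)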

(By the way, complaints about packet dropping are almost always a red herring. The whole internet depends on packet dropping, and it always has, and it works fine. The only time it's a problem is with super-low-speed interactive connections like ssh, where the wrong pattern of dropped packets can cause ugly multi-second delays even on otherwise low-latency links. ECN solves that, but most people don't use ssh, so they don't care, so ECN ends up being a low priority. If you're using ssh on a lossy link, though, try enabling ECN.)
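On Linux, for example, ECN is controlled by the net.ipv4.tcp_ecn sysctl: 0 is off, 1 requests ECN on outgoing connections too, and 2 accepts it only when the peer asks. Checking and flipping it looks like this (writing requires root):

    ECN_SYSCTL = "/proc/sys/net/ipv4/tcp_ecn"

    def ecn_status():
        with open(ECN_SYSCTL) as f:
            return int(f.read().strip())

    def enable_ecn():
        with open(ECN_SYSCTL, "w") as f:
            f.write("1\n")  # request ECN on incoming and outgoing connections

    print("tcp_ecn =", ecn_status())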

The other interesting thing about RED, somewhat glossed over in the first article, is VJ's apology for mis-identifying the best way to tune it. ("...it turns out there's nothing that can be learned from the average queue size.") His new recommendation is to "look at the time you've had a queue above threshold," where the threshold is defined as the long-term observed minimum delay. That sounds a little complicated, but let me paraphrase: if the delay goes up, you want to shrink the queue. Obviously.

To shrink the queue, you "mark or drop" packets using RED (or some improved variant).

When you mark or drop packets, TCP slows down, reducing the queue size.

In other words, you just implemented latency-based TCP. Or uTP, which is really just the same thing again, at the application layer.

There's a subtle difference though. With this kind of latency-self-tuning RED, you can implement it at the bottleneck and it turns all TCP into latency-sensitive TCP. You no longer depend on everyone on the Internet upgrading at once; they can all keep using traditional TCP, but if they're going through a bottleneck with this modern form of RED, that bottleneck will magically keep its latencies low and sharing fair.
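Here's that self-tuning idea boiled down to a sketch: remember the lowest delay you've ever seen through the queue, and if packets keep taking noticeably longer than that, start marking or dropping until things drain. The numbers are illustrative; the point is that the bottleneck queue, not the endpoints, enforces low latency.

    import time

    SLACK = 0.005  # tolerate ~5ms above the observed minimum delay
    GRACE = 0.100  # act only if delay stays high this long, seconds

    class DelayAqmSketch:
        def __init__(self):
            self.min_delay = float('inf')
            self.above_since = None

        def should_mark_or_drop(self, enqueue_time):
            # Called at dequeue time; sojourn is how long the packet queued.
            now = time.monotonic()
            sojourn = now - enqueue_time
            self.min_delay = min(self.min_delay, sojourn)

            if sojourn <= self.min_delay + SLACK:
                self.above_since = None  # queue is draining fine
                return False
            if self.above_since is None:
                self.above_since = now   # delay just crossed the threshold
                return False
            # Delay has stayed above the threshold: mark or drop, which
            # makes ordinary TCP senders slow down as if they were running
            # latency-based TCP, with no changes on their end.
            return now - self.above_since > GRACE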

Phew. Okay, in summary:

  • If you can convince everybody on the internet to upgrade, use latency-sensitive TCP. (Bram Cohen)
  • Else If you can control your router firmware, use RED or BLUE. (Jim Gettys and Van Jacobson)
  • Else If you can control your app, use uTP for bulk uploads. (Bram Cohen)
  • Else You have irreconcilable bufferbloat.

All of the above are the same solution, implemented at different levels. Doing any of them solves your problem. Doing more than one is perfectly fine. Feel happy that multiple Internet superheroes are solving the problem from multiple angles.

Or, tl;dr: Yes. If you use BitTorrent, enable uTP and mostly you'll be fine.

Update 2012/05/09: Paddy Ganti sent me a link to Controlling Queue Delay, May 6, 2012, a much more detailed and interesting ACM Queue article talking about a new CoDel algorithm as an alternative to RED. It's by Kathleen Nichols and Van Jacobson and uses a target queue latency of 5ms on variable-speed links. Seems like pretty exciting stuff.
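For reference, CoDel's control law is pleasingly small. A simplified sketch based on the article's description (the real algorithm is more careful about state transitions and empty queues): if every packet over the last interval sat in the queue longer than the target, drop one, then keep dropping at intervals that shrink with the square root of the drop count until the delay recovers.

    import math

    TARGET = 0.005    # 5ms target queue delay
    INTERVAL = 0.100  # 100ms: a conservative worst-case RTT

    class CoDelSketch:
        def __init__(self):
            self.first_above = None  # deadline, set when delay first exceeds TARGET
            self.dropping = False
            self.drop_next = 0.0
            self.count = 0

        def should_drop(self, sojourn, now):
            if sojourn < TARGET:
                self.first_above = None  # delay is fine again
                self.dropping = False
                return False
            if self.first_above is None:
                self.first_above = now + INTERVAL  # start the clock
                return False
            if not self.dropping:
                if now >= self.first_above:  # delay stayed high a whole interval
                    self.dropping = True
                    self.count = 1
                    self.drop_next = now + INTERVAL
                    return True
                return False
            if now >= self.drop_next:
                self.count += 1
                # Drop at shrinking intervals until delay falls below TARGET.
                self.drop_next = now + INTERVAL / math.sqrt(self.count)
                return True
            return False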
