Tasty, nutritious

...part of this complete breakfast
Everything here is my opinion. I do not speak for your employer.
April 2010
July 2010

2010-05-02 »

sshuttle: a new kind of userspace VPN

I just spent an afternoon working on a new kind of VPN. You can get the first release, sshuttle 0.10, on github.

As far as I know, sshuttle is the only program that solves the following common case:

  • Your client machine (or router) is Linux.
  • You have access to a remote network via ssh.
  • You don't necessarily have admin access on the remote network.
  • The remote network has no VPN, or only stupid/complex VPN protocols (IPsec, PPTP, etc). Or maybe you are the admin and you just got frustrated with the awful state of VPN tools.
  • You don't want to create an ssh port forward for every single host/port on the remote network.
  • You hate openssh's port forwarding because it's randomly slow and/or stupid.
  • You can't use openssh's PermitTunnel feature because it's disabled by default on openssh servers; plus it does TCP-over-TCP, which has terrible performance (see below).

This is how you use it:

  • git clone git://github.com/apenwarr/sshuttle
    on your client and server machines. The server can be any ssh server with python available; the client must be Linux with iptables, and you'll need root or sudo access.
  • ./sshuttle -r username@sshserver 0.0.0.0/0 -vv

That's it! Now your local machine can access the remote network as if you were right there! And if your "client" machine is a router, everyone on your local network can make connections to your remote network.

This creates a transparent proxy server on your local machine for all IP addresses that match 0.0.0.0/0. (You can use more specific IP addresses if you want; use any number of IP addresses or subnets to change which addresses get proxied. Using 0.0.0.0/0 proxies everything, which is interesting if you don't trust the people on your local network.)
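To make the "which addresses get proxied" rule concrete, here is a minimal sketch of the subnet test. It only illustrates the semantics; on a real client the matching happens in firewall rules, not in Python. The `is_proxied` function is a hypothetical name used for illustration, and the code is Python 3.

```python
import ipaddress

def is_proxied(dest_ip, subnets):
    """Return True if dest_ip falls inside any of the given CIDR subnets."""
    addr = ipaddress.ip_address(dest_ip)
    return any(addr in ipaddress.ip_network(s) for s in subnets)

# A specific subnet only captures addresses inside it...
print(is_proxied("10.1.2.3", ["10.0.0.0/8"]))                    # True
print(is_proxied("8.8.8.8", ["10.0.0.0/8", "192.168.0.0/16"]))   # False
# ...while 0.0.0.0/0 matches every IPv4 address.
print(is_proxied("8.8.8.8", ["0.0.0.0/0"]))                      # True
```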

Any TCP session you initiate to one of the proxied IP addresses will be captured by sshuttle and sent over an ssh session to the remote copy of sshuttle, which will then regenerate the connection on that end, and funnel the data back and forth through ssh.

Fun, right? A poor man's instant VPN, and you don't even have to have admin access on the server.

Theory of Operation

sshuttle is not exactly a VPN, and not exactly port forwarding. It's kind of both, and kind of neither.

It's like a VPN, since it can forward every port on an entire network, not just ports you specify. Conveniently, it lets you use the "real" IP addresses of each host rather than faking port numbers on localhost.

On the other hand, the way it works is more like ssh port forwarding than a VPN. Normally, a VPN forwards your data one packet at a time, and doesn't care about individual connections; i.e. it's "stateless" with respect to the traffic. sshuttle is the opposite of stateless; it tracks every single connection.

You could compare sshuttle to something like the old Slirp program, which was a userspace TCP/IP implementation that did something similar. But it operated on a packet-by-packet basis on the client side, reassembling the packets on the server side. That worked okay back in the "real live serial port" days, because serial ports had predictable latency and buffering.

But you can't safely just forward TCP packets over a TCP session (like ssh), because TCP's performance depends fundamentally on packet loss; it must experience packet loss in order to know when to slow down! At the same time, the outer TCP session (ssh, in this case) is a reliable transport, which means that what you forward through the tunnel never experiences packet loss. The ssh session itself experiences packet loss, of course, but TCP fixes it up and ssh (and thus you) never know the difference. But neither does your inner TCP session, and extremely screwy performance ensues.

sshuttle assembles the TCP stream locally, multiplexes it statefully over an ssh session, and disassembles it back into packets at the other end. So it never ends up doing TCP-over-TCP. It's just data-over-TCP, which is safe.
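For the curious, here is roughly what "multiplexing statefully" can look like: each chunk of stream data gets a small header saying which connection it belongs to and how long it is, and the far end splits the byte stream back into per-connection chunks. This is a sketch under assumptions, not sshuttle's actual wire format; the header layout (`!HI`: 2-byte channel id, 4-byte length) and the function names are made up for illustration. Python 3.

```python
import struct

# Hypothetical frame header: channel id (uint16) + payload length (uint32),
# network byte order. Not sshuttle's real protocol.
HEADER = struct.Struct("!HI")

def pack_frame(channel, data):
    """Prefix a chunk of stream data with its channel id and length."""
    return HEADER.pack(channel, len(data)) + data

def unpack_frames(buf):
    """Split a byte buffer into (channel, data) frames.

    Returns the complete frames plus any leftover bytes that belong to a
    frame which has not fully arrived yet.
    """
    frames = []
    offset = 0
    while len(buf) - offset >= HEADER.size:
        channel, length = HEADER.unpack_from(buf, offset)
        end = offset + HEADER.size + length
        if end > len(buf):
            break  # incomplete frame; wait for more data
        frames.append((channel, buf[offset + HEADER.size:end]))
        offset = end
    return frames, buf[offset:]

# Two independent connections share one reliable byte stream.
wire = pack_frame(1, b"GET / HTTP/1.0\r\n") + pack_frame(2, b"EHLO example\r\n")
frames, rest = unpack_frames(wire + b"\x00")  # trailing byte: start of a partial frame
print(frames)
print(rest)
```

Note that only application data crosses the tunnel. No inner TCP headers, sequence numbers, or retransmit timers are involved, which is the whole point.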

Useless Trivia

Back in 1998 (12 years ago! Yikes!), I released the first version of Tunnel Vision, a semi-intelligent VPN client for Linux. Unfortunately, I made two big mistakes: I implemented the key exchange myself (oops), and I ended up doing TCP-over-TCP (double oops). The resulting program worked okay - and people used it for years - but the performance was always a bit funny. And nobody ever found any security flaws in my key exchange, either, but that doesn't mean anything. :)

The same year, dcoombs and I also released Fast Forward, a proxy server supporting transparent proxying. Among other things, we used it for automatically splitting traffic across more than one Internet connection (a tool we called "Double Vision").

I was still in university at the time. A couple years after that, one of my professors was working with some graduate students on the technology that would eventually become Slipstream Internet Acceleration. He asked me to do a contract for him to build an initial prototype of a transparent proxy server for mobile networks. The idea was similar to sshuttle: if you reassemble and then disassemble the TCP packets, you can reduce latency and improve performance vs. just forwarding the packets over a plain VPN or mobile network. (It's unlikely that any of my code has persisted in the Slipstream product today, but the concept is still pretty cool. I'm still horrified that people use plain TCP on complex mobile networks with crazily variable latency, for which it was never really intended.)

That project I did for Slipstream was what first gave me the idea to merge the concepts of Fast Forward, Double Vision, and Tunnel Vision into a single program that was the best of all worlds. And here we are, at last, 10 years later. You're welcome.

Update 2010/05/02: Oops, maybe it works a little too well. If you're one of the people who was surprised to see eqldata.com where apenwarr.ca should have been this morning, that's because I left my sshuttle proxy running - connected to the "real" server on eqldata.com - as a stress test. Seems that even my DynDNS provider thought my unreliable home PC was part of the eqldata.com network :) (Also, it failed the stress test: some sort of file descriptor leak after a few hours. Will fix.)

2010-05-05 »

Uploading yourself for fun and profit (plus: sshuttle 0.20 "almost" works on MacOS)

After trying out the initial version of sshuttle that I produced this weekend, a few people asked me whether it would be possible to make it work without installing a sshuttle server on the server machine. Can it work with just an sshd, they wondered?

Good question.

Some people pointed out ssh's -D option (dynamic port forwarding using a SOCKS proxy). If we just used that (i.e. the sshuttle client transproxies stuff into ssh's SOCKS server), then there wouldn't need to be a server side for sshuttle, and that would solve the problem. But sadly, sshd's latency management is pretty thoroughly awful - among other things, it sets its SO_SNDBUF way too high - so if you have a few connections going at once, performance takes a dump. sshuttle has some clever stuff to make sure that doesn't happen even if you've got giant ISOs downloading over your VPN link. I'd like to keep that.
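If you haven't met SO_SNDBUF before: it's a per-socket option that controls how much outgoing data the kernel will queue. A big buffer means a bulk transfer can stack up a lot of bytes ahead of anything latency-sensitive. The sketch below only shows how to inspect and shrink it; it is not a description of how sshuttle manages latency, and the 16384 value is an arbitrary number picked for illustration. Python 3.

```python
import socket

s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)

# Whatever the OS hands out by default.
default = s.getsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF)

# Ask for a smaller send buffer. The kernel is free to round or adjust
# the value, so read it back rather than assuming you got what you asked for.
s.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, 16384)
small = s.getsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF)

s.close()
print("default SO_SNDBUF:", default)
print("after setsockopt: ", small)
```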

So then I said to myself, hey, self, what if we just uploaded our source code to the remote server and executed it automatically? It works for viruses (technically worms), after all.

You know it's never a good thing when I start talking to myself. And yet the result is surprisingly simple and elegant. Here's a simplified version of what the "stage 1 reassembler" looks like:

   ssh hostname python -c '
    import sys;
    exec compile(sys.stdin.read(%d), "assembler.py", "exec")'

Where "%d" is substituted with the length of assembler.py. Assembler.py, by the way, is the "stage 2 reassembler," which looks like this:

    import sys, zlib

    z = zlib.decompressobj()
    mainmod = sys.modules[__name__]
    while 1:
        name = sys.stdin.readline().strip()
        if name:
            nbytes = int(sys.stdin.readline())
            if verbosity >= 2:
                sys.stderr.write('remote assembling %r (%d bytes)\n'
                                 % (name, nbytes))
            content = z.decompress(sys.stdin.read(nbytes))
            exec compile(content, name, "exec")
            # FIXME: this crushes everything into a single module namespace,
            # then makes each of the module names point at this one. Gross.
            assert(name.endswith('.py'))
            modname = name[:-3]
            mainmod.__dict__[modname] = mainmod
        else:
            break
    main()

Yeah, that's right, I gzipped it.

You know the best part? When the server throws an exception, it still gives the right filenames and line numbers in the backtrace, because we assemble each file separately.
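The reason this works is that the second argument to compile() is the filename that gets recorded in the code object, and the traceback machinery reports whatever name is stored there. A minimal sketch, in Python 3 syntax (where exec is a function rather than a statement); `remote_module.py` and `boom` are made-up names:

```python
import traceback

# Pretend this source arrived over stdin from the client.
source = "def boom():\n    raise ValueError('oops')\n"

namespace = {}
# The filename passed to compile() is what shows up in tracebacks,
# even though no such file exists on this machine.
exec(compile(source, "remote_module.py", "exec"), namespace)

try:
    namespace["boom"]()
except ValueError:
    tb_text = traceback.format_exc()

print(tb_text)
```

The printed traceback names remote_module.py and line 2, although the source line itself can't be shown because there is no file on disk to read it from.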

If anybody knows the right python incantation to make it import each of the modules as a separate actual module object (rather than just dumping it all into the global namespace, as the comment indicates) please send a patch.
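One possible approach, sketched here in Python 3 and not tested against sshuttle itself: create an empty module object with types.ModuleType, register it in sys.modules, and exec the compiled source into that module's own namespace. After that, a plain import statement finds it. The `load_module` helper and the demo module names are hypothetical.

```python
import sys
import types

def load_module(name, source):
    """Compile source into its own module object and register it for import."""
    assert name.endswith(".py")
    modname = name[:-3]
    module = types.ModuleType(modname)
    module.__file__ = name
    # Register before executing, so modules that import each other can resolve.
    sys.modules[modname] = module
    exec(compile(source, name, "exec"), module.__dict__)
    return module

# Modules need to arrive in dependency order: helpers first, then its user.
load_module("helpers_demo.py", "def greet():\n    return 'hi'\n")
load_module("client_demo.py", "import helpers_demo\nresult = helpers_demo.greet()\n")

import client_demo
print(client_demo.result)  # hi
```

Each file keeps its own namespace this way, and tracebacks still show the right filenames because compile() is still given the per-file name.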

Grab the latest version of sshuttle VPN on GitHub.

"Almost" works on MacOS

Responding to popular request, I thought I would try to get the sshuttle client working on MacOS. (The sshuttle server already works - or at least it should work - on just about any platform with an sshd.)

MacOS, being based on BSD, uses the same ipfw stuff as FreeBSD seems to use. So it "should" be just a matter of having it auto-detect whether the current system uses iptables or ipfw, then run the right commands, right?

Well, almost. I did all that stuff, and I've almost got the rules working, but I just can't make it work right. I'm using MacOS X Snow Leopard on my laptop. I checked it in and pushed it anyway in case anybody wants to take a look; the final fix is probably a one liner.
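The auto-detection part is the easy bit. A minimal sketch, assuming it's good enough to check which tool is on the PATH (how sshuttle actually decides isn't covered here, and `detect_firewall_tool` is a hypothetical name). Python 3:

```python
import shutil

def detect_firewall_tool(which=shutil.which):
    """Return 'iptables' or 'ipfw', whichever is found first on PATH, else None.

    The `which` parameter exists only so the lookup can be swapped out in tests.
    """
    for tool in ("iptables", "ipfw"):
        if which(tool):
            return tool
    return None

print(detect_firewall_tool())
```

One caveat: these tools usually live in /sbin or /usr/sbin, which may not be on a non-root user's PATH, so a real implementation would probably need to look there too.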

For more information on my conundrum, see my (as yet unanswered) question on ServerFault. If you can contribute an answer, you'll forever be my hero. Even if you don't know anything about ipfw, if you could run through the steps on your version of MacOS or BSD and tell me what happens, it could help narrow things down.

Enjoy!

Update 2010/05/05: Thanks to Ed Maste and drheld for helping confirm. It seems that FreeBSD and MacOS 10.5 Leopard work fine - which means sshuttle 0.20 should work for you, yay! - while Snow Leopard consistently does not.

Really, that restores a bit of balance to the universe; since everyone I know who's upgraded to Snow Leopard went through huge pain and suffering, I was a little shocked to find that my own upgrade had been harmless, and one friend's upgrade really did make their computer go much faster (boggle). Now that Snow Leopard has finally screwed me after all, I once again feel like things are as they should be.

2010-05-08 »

Gunless

Movie recommendation. There's not much to say about it that the reviews haven't already. But if you're looking for something with a little more intelligence than a Hollywood Extravaganza, but not so intelligent that it bores me to tears like most independent films, this one works.

    The idea that the Wild West of the United States didn't have any law is completely bogus. There was law. They were settled by laws like we were. And the idea that we had nothing but law and had no weaponry is also ludicrous. Of course we did.

    -- Paul Gross on the Film's Historical Accuracy

So it's also useful if you want to learn totally wrong but funny things about the Canadian "wild west."

2010-05-10 »

sshuttle 0.30: automatic route and hostname discovery

My fancypants new sshuttle transproxy VPN could already work even if all you had was a ssh session to the other side. And it avoided the TCP-over-TCP trap. And sure, I even made it upload itself to the other end automatically so you wouldn't have to. And it apparently works on MacOS clients now, except Snow Leopard which is not-so-shockingly buggy, and maybe even on FreeBSD. And it manages latency, even under heavy use, so performance doesn't start sucking when you transfer a big file.

So in all those ways, it was already much better than the old Tunnel Vision, which among other things, you had to install by hand on both ends of the connection, and after that the performance was a bit random.

But Tunnel Vision still had a few tricks that sshuttle missed. The first one is automatic route guessing. When TV connected to the other end, the server would tell the client what subnets it was able to reach, and then the client would automatically set up routes for those subnets to go through the tunnel. Neat, right? But with sshuttle, you had to tell the client what to route by hand. No more:

     sshuttle -N -r username@servername

The new -N option enables automatic network determination. You can still add additional subnets (like 0/0 for people who want to route "everything") if you want.
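As a rough illustration of the idea (not of how -N is actually implemented, which this post doesn't describe): if the server can list its routes as destination/netmask pairs, turning them into subnets for the client is straightforward. The input format and the `routes_to_subnets` name are assumptions for the sake of the example. Python 3:

```python
import ipaddress

def routes_to_subnets(routes):
    """Convert (destination, netmask) pairs into CIDR strings.

    Skips the default route and loopback, since "everything" and "myself"
    are not useful things to guess automatically.
    """
    subnets = []
    for dest, mask in routes:
        net = ipaddress.ip_network(f"{dest}/{mask}", strict=False)
        if net.prefixlen == 0 or net.is_loopback:
            continue
        subnets.append(str(net))
    return subnets

# Hypothetical routing table entries as the server might report them.
sample_routes = [
    ("0.0.0.0", "0.0.0.0"),
    ("192.168.1.0", "255.255.255.0"),
    ("10.0.0.0", "255.0.0.0"),
    ("127.0.0.0", "255.0.0.0"),
]
print(routes_to_subnets(sample_routes))  # ['192.168.1.0/24', '10.0.0.0/8']
```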

Another fun feature of Tunnel Vision was automatic hostname mapping. You know what sucks about connecting to a remote VPN? You probably don't, so I'll tell you. What sucks is DNS. Your local DNS server doesn't know anything about the hostnames on the other end, and of course they're private so they're not in the public DNS either. So when you try "ssh internalserver", and "internalserver" is some server on the remote internal network, you get an error.

This one is a lot trickier to solve. After all, there's no good way to get a list of hostnames for you to replicate. And once you do, there's also no good way to add them to the local DNS. But does that stop us? Certainly not. It merely confuses us.

     sshuttle -H -r username@servername

The new -H option tells the remote sshuttle instance to start prodding around wherever it can (currently, that means at least the local /etc/hosts file, samba nameservers and browse masters, and a bit of DNS) to try to find good hostnames and their matching IP addresses. As it finds them, it beams them back to your client, which adds them temporarily to your local /etc/hosts file. Gross? Oh boy, is it ever! But it works. More or less.
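The /etc/hosts part of the prodding is the simplest source to picture. Here's a minimal sketch of pulling (ip, hostname) pairs out of hosts-file text; it is not the code from hostwatch.py, and `parse_hosts` is a hypothetical name. Python 3:

```python
def parse_hosts(text):
    """Extract (ip, hostname) pairs from hosts-file text.

    Comments (anything after '#') and blank lines are ignored. A line with
    several names yields one pair per name.
    """
    pairs = []
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()
        if not line:
            continue
        fields = line.split()
        ip, names = fields[0], fields[1:]
        for name in names:
            pairs.append((ip, name))
    return pairs

sample = """
127.0.0.1   localhost
# internal machines
10.0.0.5    internalserver internalserver.example.lan  # file server
"""
for ip, name in parse_hosts(sample):
    print(ip, name)
```

Going the other way (adding the discovered pairs to the client's /etc/hosts and cleaning them up afterwards) is where the "gross" part comes in.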

It would be kind of neat to have it get browse lists from things like mdns (aka "zeroconf" aka "bonjour") but I have no idea how to do that.

The old Tunnel Vision sort of had this feature, but it didn't have sshuttle's amazing Name Prodding Technology(tm). You had to configure the names yourself. As it happened, our proprietary Nitix servers had some very scary code to automatically track local hostnames and configure Tunnel Vision appropriately, so the name mapping worked pretty well there. And Nitix servers were usually acting as your DNS, so they could set that up nicely too. Sadly, Nitix's old name prodding is mostly obsolete due to the way modern networks are run (mdns, domain controllers, names-by-dhcp, switched ethernet, and so on). But life marches on. And we all still want the same things.

Anyway, anybody who knows how to get a good list of hostname/ip pairs out of mdns, ideally in a portable fashion, send me an email :) You might also want to look at hostwatch.py and see if you can think of any other interesting sources to scan for names.

2010-05-11 »

Mailing lists are cheap...

...but I still didn't think I'd bother with one for sshuttle, which was just intended to be a weekend toy project. Seems people are actually using it, though, and it's picked up quite a few github followers already. (62 followers isn't that much, but the thing is only 10 days old.)

So okay, here you go: the sshuttle mailing list.

By the way, it seems to not be common knowledge that you can subscribe to googlegroups mailing lists without having a Google account or ever using their web interface. The secret is to send an email to "groupname+subscribe@googlegroups.com", where groupname is the name of the group, in this case, sshuttle. Note that plus sign. It's not a minus sign.

2010-05-18 »

Tell me what surprised you: iPad Edition

If someone is about to tell you a long story about a trip they were on, you should make just one request: "Tell me what surprised you." That simple query changes the whole nature of the conversation.

For example, we all know the basic stuff about Paris. It has French people. The food is good. It's pretty. But what surprised you about Paris? Now there's something we can talk about. [1]

So. Yes, I got an iPad. And I'll do you a favour: I'll tell you my surprise.

What surprised me was iBooks.

No, no, iBooks looks and works exactly like in the pictures and ads. It really is just like that, for better and for worse. That's not the surprise.

The surprise was that it wasn't installed by default.

Think about that. I had to go to the app store, painfully convince it I was a U.S. resident, search for "iBooks" ("books" is definitely not good enough), and download it, all just to get started.

Meanwhile, I downloaded a bunch of other apps. Some of them had ads. Many of those ads were for the Amazon Kindle app, which is also in the app store, and also free, and doesn't require me to be American. And I could click on any of those ads and get straight to the app store. Two more taps, and I'm done.

There weren't any ads for the iBooks app. Anywhere. Thus it was harder to find out about iBooks, and as hard or harder to download it, than the Kindle app.

I've been in the computer world for a long time. I've observed Microsoft and how they do things. Heck, I've observed Apple and how they do things. And one thing I've seen for sure: bundling and cross-selling work. If this were Microsoft, they wouldn't have hesitated for a second to give iBooks a boost by including it with the OS.

But Apple deliberately left it out. iBooks has to compete with Kindle in the very same app store, with no free publicity (other than being a "featured" app in some iPad ads and PR).

I can imagine the iBooks team being told that this is it, yes, you can do your bookstore however you want, but we're not going to make it any easier on you. You have to be the best bookstore in the world all by yourself, not just because you tagged along with something that was already great without you.

Now that is surprising.

For the record, iBooks is doing pretty well so far: it absolutely beats the snot out of Kindle for the iPhone/iPad in pretty much every way (except book prices, which are much higher than Amazon's). [2]

Also interesting to consider is why they allowed this competition with books, but not with music, movies, and phone calls. Have they had a change of heart? A secret contractual obligation? Does Steve Jobs really just not care about books, as he previously claimed?

You might also ask why their Pages, Numbers, and Presentations (or whatever it's called) apps aren't bundled or cross-sold either; anybody making a word processor is on equal footing with Apple's iWork team. And there's no Weather, Stocks, Voice Memos, Clock, or Calculator app included on an iPad either, even though they were all on the iPhone. The iPad has less bundled stuff than ever before - the diametric opposite of what Microsoft has done in any version of Windows, ever.

The rest of the iPad? It's pretty much as expected. I'll spare you.

Footnotes

1 What surprised me about Paris was that, at their fruit stands, every fruit is arranged with absolute care and precision. Compare to a typical grocery store in Canada, where fruit is typically dumped into a bin so you can sort through it yourself. When I think about how much more time it must take to do it the hard way, yes, it surprises me. How can they afford to do that? It's magic. (I also had other related observations at the time.)

2 I won't bother describing the Kindle app's failings in detail. To get you started, I have just two words for you: page numbers. Compare them in Kindle vs. iBooks. Someone at Amazon needs to be shot.

Update 2010/05/18: Hmm, jordanlev wrote to tell me that on his iPad, it popped up a message right away asking whether he wanted to download iBooks. So maybe they're not playing all that nice after all. He also linked to an interesting article about the ebook market by Charlie Stross.

2010-05-20 »

At last, the circle is complete

My twitter search RSS feed (yes, I have one, so shoot me) for "apenwarr" returned a hit today in which the only usage of the word "apenwarr" was a URL hidden behind a bit.ly link.

Oh yes. That means twitter search is now decoding bit.ly URLs as part of the indexing process, but of course it still serves you the original stupid bit.ly links.

Thank you, oh great technology gods, for inventing new uses for excess CPU that I never could have imagined.

In other news, SEOs can now increase the keyword relevance of their twitter links; just have bit.ly resolve to something like http://whatever.com/stuff?magic-keywords=fuzzy-wuzzy-chickens-multiplied-by-gargantuan-apple-google-flash-ipad-porn-naked

2010-05-22 »

A Programmer's Code of Ethics

  1. My programs encode the rules of modern society. I will take full responsibility for the programs I write.
  2. I will not write a program that intentionally fails to operate.
  3. I will not write a program that refuses to do tomorrow what it was able to do yesterday.
  4. I will not create a single point of failure, whether technical or political.
  5. I will not encode foolish rules just because someone paid me to do it.
  6. I will not give people what they want if what they want is not good enough.
  7. I will not stop people from taking my program's ideas and making them better.
  8. I will write programs to help each person produce their best, not to help the masses produce mediocrity.
  9. I will correct those who believe my program's failure is anyone's fault but mine.
  10. I will write programs to benefit even the people who don't deserve it.

 

Condensed "New Testament" Version

    Don't write for others a program you wouldn't want written for you.

 

Commentary

Do I always follow all the above rules perfectly? Certainly not. In fact, I think I've broken every single one of them.

But thinking over all those situations and knowing what I know now, I'm pretty sure that in every case, it would have been better if I'd done the right thing. The exceptions don't feel like the right move; they just feel dirty.

That's how I know I'm on the right track.

Update 2010/05/22: Based on a suggestion from Chris Frey, slightly rephrased point #3.

I'm CEO at Tailscale, where we make network problems disappear.

Why would you follow me on twitter? Use RSS.

apenwarr on gmail.com