Misadventures in process containment
I've been working on a series of tutorials using redo for various use cases. (One of the most common user requests is more examples of how to solve real-world problems with redo. The problem with extremely flexible tools is that it can be hard for people to figure out how to start.)
The most recent in the series is a tutorial on building docker and kvm containers from scratch using redo. I think it turned out pretty well. Maybe the tutorial deserves a disclaimer, though: is this really something you should do?
Good question. I don't know.
You see, there are a few "standard" ways to build containers nowadays, most commonly using Dockerfiles and the docker build command. It's not at all obvious that we need another way to do it. Then again, it's not obvious that we don't.
Wait, let's back up a bit. Where are we and how did we get here?
The idea of isolated chroot-based containers has been around for a very long time. In my first startup, we even had a commercial version of the concept as early as 2005, first implemented by Patrick Patterson, called Nitix Virtual Server (NVS). The original idea was to take our very-locked-down Linux-based server appliance and let you install apps on it, without removing the (very useful for security and reliability) locked-down-ness of the operating system. Nitix was a very stripped down, minimal Linux install, but NVS was a full install of a CentOS-based system in userspace, so you could do basically anything that Linux could do. (You might recognize the same concepts in ChromeOS's Crostini.) We could start, stop, and snapshot the "inner" NVS operating system and make backups. And most importantly, if you were an app developer, you could take one of these pre-installed and customized snapshots, package it up, and sell it to your customers. Hopefully with one of our appliances!
Eventually one of these packaged apps became appealing enough that the app maker, who was much larger than us, decided to acquire our company, then (as often happens) accidentally killed it with love, and that was sadly the end of that branch of the container evolutionary tree.
Our containers were installed on physical appliance hardware - another branch of evolution that seems to have died. Nobody believes anymore that you can have a zero-maintenance, self configuring, virus free appliance server that runs on your office network without needing a professional sysadmin. Come to think of it, most people didn't believe us back then either, at least not until after seeing a demo. But whatever, the product doesn't exist anymore, so nowadays they're right.
In any case, the modern solution to this is for everybody to host everything in the Cloud. The Cloud has its own problems, but at least those problems are fairly well understood, and most importantly, you can pay by the minute for a crack team of experts, the ones who own the servers, to fix problems for you. For most people, this works pretty well.
But back to containers. The way we made them, long ago, was a bit ad-hoc: a person installed a fresh NVS, then installed the app, then wrote a few scripts to take system configuration data (user accounts, remember when we used those? and hostnames, IP addresses, and so on) and put them in the right app-specific config files. Then they'd grab a snapshot of the whole NVS and distribute it. Making new versions of an app container involved either making additional tweaks to the live image (a little risky) and re-snapshotting, or having a human start over from scratch and re-run all the manual steps.
Those were simpler times.
Nowadays, people care a lot more about automated builds and automated testing than they did back in 2005, and this is a big improvement. They also collaborate a lot more. Docker containers share almost the same basic concepts: take a base filesystem, do some stuff to it, take another snapshot, share the snapshot. But it's more automated, and there are better ways to automate the "do some stuff" part. And each step is a "layer", and you can share your layers, so that one person can publish a base OS install, another person can install the Go compiler, another person can build and install their app, and another person can customize the app configuration, and all those people can work at different companies or live in different countries.
As a sign of how out of touch I am with the young'uns, I would never have thought you could trust a random unauthenticated person on the Internet to provide a big binary image for the OS platform your company uses to distribute its app. And maybe you can't. But people do, and surprisingly it almost never results in a horrible, widespread security exploit. I guess most people are surprisingly non-evil.
Anyway, one thing that bothered me a lot, both in the old NVS days and with today's Dockerfiles, was all the extra crud that ends up in your image when you install it this way. It's a whole operating system! For example, I looked at what might be the official Dockerfile for building a MySQL server image (although I'm not sure how one defines "official") and it involves installing a whole C++ compiler toolchain, then copying in the source code and building and installing the binary. The resulting end-user container still has all that stuff in it, soaking up disk space and download time and potentially adding security holes and definitely adding a complete lack of auditability.
I realize nobody cares. I care, though, because I'm weird and I care about boring things, and then I write about them.
Anyway, there are at least two other ways to do it. One way endorsed by Dockerfiles is to "skip" intermediate layers: after building your package, uninstall the compiler and extra crap, and then don't package the "install compiler" and "install source code" layers at all. Just package one layer for the basic operating system, and one more for all the diffs between the operating system and your final product. I call this the "blacklist" approach: you're explicitly excluding things you don't want from your image. Dockerfiles make this approach relatively easy, once you get the hang of it.
A more obsessive approach is a "whitelist": only include the exact files you want to include. The trick here is to first construct the things you want in your final container, and then at the end, to copy only the interesting things into a new, fresh, empty container. Docker doesn't really make this harder than anything else, but Docker doesn't really help here, either. The problem is that Docker fundamentally runs on the concept of "dive into the container, execute some commands, make a snapshot" and we don't even have a container to start with.
So that's the direction I went with my redo tutorial; I built some scripts that actually construct a complete, multi-layered container image without using Docker at all. (To Docker's credit, this is pretty easy, because their container format is simple and pretty well-defined.) Using those scripts, it's easy to just copy some files into a subdirectory, poke around to add in the right supporting files (like libc), and then package it up into a file that can be loaded into docker and executed. As a bonus, we can do all this without being the 'root' user, having any docker permissions, or worrying about cluttering the local docker container cache.
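To make the whitelist idea concrete, here is a minimal sketch in Python (not the redo scripts from the tutorial) that packs an explicit list of files into a single layer tarball and computes its sha256, which is roughly how image formats identify a layer. The function names and the example file list are made up for illustration, and the manifest and config JSON that a loadable image also needs are left out.

```python
import hashlib
import io
import tarfile


def build_layer(files):
    """Pack a whitelist of {container_path: bytes} into an uncompressed tar.

    Only the files named in `files` end up in the layer; nothing is
    inherited from a base image.
    """
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for path in sorted(files):          # stable order, reproducible output
            data = files[path]
            info = tarfile.TarInfo(name=path.lstrip("/"))
            info.size = len(data)
            info.mode = 0o755
            info.mtime = 0                  # fixed timestamp: rebuilds are byte-identical
            info.uid = info.gid = 0         # recorded as root-owned; no root needed to build
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


def layer_digest(layer):
    """Content address of the layer tarball."""
    return "sha256:" + hashlib.sha256(layer).hexdigest()


if __name__ == "__main__":
    # Hypothetical whitelist: two tiny files instead of a whole OS.
    whitelist = {
        "/bin/hello": b"#!/bin/sh\necho hello\n",
        "/etc/motd": b"hi\n",
    }
    layer = build_layer(whitelist)
    print(layer_digest(layer), len(layer), "bytes")
```

Because the tarball is assembled in memory from plain bytes, nothing here needs root or a running docker daemon, which is the property the paragraph above is after.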
I like it, because I'm that kind of person. And it was a fun exercise. But I'm probably living in the past; nobody cares anymore if it takes a few gigabytes to distribute an app that should be a few megabytes at most. Gigabytes are cheap.
Side note: incremental image downloads
While we're here, I would like to complain about how people distribute incremental changes to containers. Basically, the idea is to build your containers in layers, so that most of the time, you're only replacing the topmost layers (ie. your app binaries) and not the bottommost layers (ie. the OS). And you can share, say, the OS layer across multiple containers, so that if you're deploying many containers to a single machine, it only has to download the OS once.
This is generally okay, but I'm a bit offended1 that if I rebuild the OS with only a few changes - eg. a couple of Debian packages updated to fix a security hole - then it has to re-download the whole container. First of all, appropriate use of rsync could make this go a lot more smoothly.
But secondly, I already invented a solution, bupdate, eight years ago, and open sourced it, and then promptly failed to document or advertise it so that (of course) nobody knows it exists. Oops.
Unlike bup, the bupdate client works with any dumb (static files only) http server. bupdate takes any group of files - in this case, tarballs, .iso images, or VM disk images - runs the bupsplit algorithm to divide them into chunks, and writes their file offsets to files ending in .fidx (file index, similar to git's indexes and bup's .midx multi-pack indexes), which you then publish along with the original files. The client downloads the .fidx files, generates its own index of all the local files it already has lying around (eg. old containers, in this case), and constructs exact replicas of the new files out of the old chunks and any necessary newly-downloaded chunks. It requests the new chunks using a series of simple HTTP byterange requests from the image files sitting on the server.
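The flow described above can be sketched in a few lines of Python. This is a toy, not bupdate: the rolling sum is a simplified stand-in for the real bupsplit algorithm, the index is a plain list rather than the actual .fidx format, and `fetch` is a hypothetical callback standing in for an HTTP byterange request.

```python
import hashlib

WINDOW = 16   # bytes covered by the rolling sum (toy value)
MASK = 0x3F   # cut when the low 6 bits are all ones: ~64-byte average chunks


def split_chunks(data):
    """Cut `data` wherever a rolling sum of the last WINDOW bytes matches MASK.

    Cut points depend only on nearby content, so an insertion early in the
    file shifts later chunks without changing what they contain.
    Returns a list of (offset, length).
    """
    chunks, start, rolling = [], 0, 0
    for i, byte in enumerate(data):
        rolling += byte
        if i >= WINDOW:
            rolling -= data[i - WINDOW]
        if (rolling & MASK) == MASK:
            chunks.append((start, i + 1 - start))
            start = i + 1
    if start < len(data):
        chunks.append((start, len(data) - start))
    return chunks


def make_index(data):
    """Toy file index: (offset, length, sha256) for every chunk."""
    return [(off, ln, hashlib.sha256(data[off:off + ln]).hexdigest())
            for off, ln in split_chunks(data)]


def plan_download(new_index, old_data):
    """Inclusive byte ranges of the new file that no local chunk can supply."""
    have = {digest for _, _, digest in make_index(old_data)}
    return [(off, off + ln - 1) for off, ln, digest in new_index
            if digest not in have]


def rebuild(new_index, old_data, fetch):
    """Reassemble the new file from local chunks plus fetched byte ranges."""
    local = {digest: old_data[off:off + ln]
             for off, ln, digest in make_index(old_data)}
    out = bytearray()
    for off, ln, digest in new_index:
        out += local[digest] if digest in local else fetch(off, ln)
    return bytes(out)
```

The server only ever publishes the index and the original file; all the matching work happens on the client, which is why any static file server is good enough.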
It's pretty neat. There's even an NSIS plugin so that you can have NSIS do the download and reassembly for you when installing a big blob on Windows (which I implemented for one of our clients at one point), like for updating big video game WAD files.
(By the way, this same technique would help a lot with, say, apt-get update's process for retrieving its Packages files. All we'd need to do is publish a Packages.fidx alongside the Packages file itself, and a new client which understood bupdate could use that to retrieve only the parts of the Packages file that have changed since last time. This could reduce incremental Packages downloads from several megabytes to tens of kilobytes. Old clients would ignore the .fidx and just download the whole file as before.)
bupdate is pretty old (8 years now!), and relies on an even older C++ library that probably doesn't work with modern compilers, but it wouldn't be too hard to rejuvenate. Somebody really ought to start using it for updating container layers or frequently-updated large lists. Contact me if you think this might be useful to you, and maybe I'll find time to bring bupdate back to life.
On that note, if you haven't heard of it already, you really should know about gzip --rsyncable.
It's widely known that gzip'd files don't work well with rsync, because if even one byte changes near the beginning of the file, that'll change the compression for the entire rest of the file, so you have to re-download the whole thing. And bupdate, which is, at its core, really just a one-sided rsync, suffers from the same problem.
But gzip with --rsyncable is different. It carefully changes the compression rules so the dictionary is flushed periodically, such that if you change a few bytes early on, it'll only disrupt the next few kilobytes rather than the entire rest of the file. If you compress your files with --rsyncable, then bupdate will work a lot better.
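Here's a rough illustration of the periodic-flush idea using Python's zlib module. It is a sketch under simplifying assumptions, not what gzip does internally: it resets the compressor every fixed `BLOCK` bytes of input (an invented constant), whereas the real option picks its reset points from the content so that insertions and deletions also resynchronize.

```python
import zlib

BLOCK = 4096  # assumption: a fixed resync interval, chosen only for illustration


def compress_with_resync(data, block=BLOCK):
    """Compress `data`, resetting the compressor after every input block.

    Returns the stream as a list of segments (one per block, plus a final
    trailer) so the effect of a change is easy to inspect. Joining the
    segments gives one ordinary zlib stream.
    """
    comp = zlib.compressobj()
    segments = []
    for i in range(0, len(data), block):
        # Z_FULL_FLUSH emits everything so far and forgets the history, so
        # the next block compresses the same way whatever came before it.
        segments.append(comp.compress(data[i:i + block])
                        + comp.flush(zlib.Z_FULL_FLUSH))
    segments.append(comp.flush())  # end of stream plus checksum
    return segments


if __name__ == "__main__":
    original = b"".join(b"line %d of some log file\n" % n for n in range(5000))
    edited = bytearray(original)
    edited[10] ^= 1  # flip one bit near the start
    a = compress_with_resync(original)
    b = compress_with_resync(bytes(edited))
    changed = sum(1 for x, y in zip(a, b) if x != y)
    print(changed, "of", len(a), "segments differ")
```

The cost is a slightly worse compression ratio, since each block starts with an empty dictionary.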
Alternatively, if you're using a web server that supports on-the-fly compression, you can serve an uncompressed file and let the web server compress the blocks you're requesting. This will be more byte-efficient than gzip --rsyncable (since you don't have to download the entire block, up to the next resync point), but costs more CPU time on the server. Nowadays, CPU time is pretty cheap and gzip is pretty fast, so that might be a good tradeoff.
1 When I say I'm offended by the process used to update containers, it's not so much that I'm offended by people failing to adopt my idea - which, to be fair, I neglected to tell anyone about. Mostly I'm offended that nobody else managed to invent a better idea than bupdate, or even a comparably good idea2,3, in the intervening 8 years. Truly, there is no point worrying about people stealing my ideas. Rather the opposite.
2 Edit 2019-01-13: Eric Anderson pointed me to casync, which is something like bup and bupdate, and references bup as one of its influences. So I guess someone did invent at least a "comparably good idea." I think the bupdate file format is slightly cuter, since it sits alongside and reuses the original static files, which allows for cheap backward compatibility with plain downloads or rsyncs. But I'm biased.
3 Edit 2019-01-13: JNRowe points out a program called zsync, which sounds very similar to bupdate and shares the same goal of not disturbing your original file set. In fact, it's even more clever, because you can publish a zsync index on one web site that refers to chunks on another web site, allowing you to encourage zsync use even if the upstream maintainer doesn't play along. And it can look inside .gz files even if you don't use --rsyncable! Maybe use that instead of bupdate. (Disclaimer: I haven't tried it yet.)
Factors in authentication
Multi-factor authentication remains hard-to-use, hard-to-secure, and error-prone. I've been studying authentication lately to see if it might be possible to adapt some security practices, especially phishing prevention, from big companies to small companies and consumers.
Here's what I have so far.
(Sorry: this is another long one. Security is complicated!)
The three factors aren't independent
The traditional definition of multi-factor authentication (although I couldn't find a reference to where it originated) is that you should have two or more of:
- Something you know (eg. a password)
- Something you have (eg. a card)
- Something you are (eg. a fingerprint or retinal scan).
This definition is actually not bad, and has done its job for years. But it tends to lead people into some dead ends that, years later, we now know are wrong.
First of all, let's talk about #3: something you are. The obvious implementation of this is to scan your fingerprint or retina at a door or computer screen. The computer looks it up in a central database somewhere and allows or denies entry. Right?
Well, it turns out, not so much. First of all, there are huge privacy implications to having your biometric data in a central server. I have a Nexus card, issued under a programme created jointly by the Canadian and U.S. governments to allow theoretically slightly faster border crossing between the two countries. In reality, it is almost certainly a ploy to collect more biometric data. For some reason, Canada insisted on collecting my retinal scans, and the U.S. insisted on collecting my fingerprints, and it seems a bit unlikely to me that the information from one of those will never leak to the other. So now two of my biometrics are in two different databases, someday to be (if not already) merged into one. Whether or not this is legal is probably of only academic interest to the spy agencies doing it.
Now that's for government use, which is bad enough. But I don't need every bank, employer, or apartment building to have my biometrics, for all sorts of reasons. It's a privacy invasion. It can be used to track me well beyond the intended use case. It can be copied, by a sufficiently skilled attacker, and used to masquerade as me. And when that happens, it's not something I can replace like I can replace a stolen password. The dark joke among security engineers is at least I have ten fingers, so I get ten tries at keeping it safe.
Yeah, yeah, this is old news, right? But actually, biometrics have an even bigger problem that people don't talk about enough: false positives. If you have a giant database of fingerprints or retinas or DNA samples or voiceprints or faces, and you have an algorithm for searching the database, and you have one sample you want to look up in the database to find the right match... it turns out to be a hard problem. ML/AI people hide this problem from you all the time. There's a fun toy that tells you which celebrity you look most similar to, and it's eerily accurate, right? Sure. But the toy works because of false positives. It's easy, relatively, to find matches that look similar to you. It's tough to figure out which of those matches, if any, really are you. Which is fine for a toy, but bad for authentication.
(iPhotos and Facebook's face tagging are all susceptible to the same thing. It's pretty easy to see if a given photo matches someone already on your friend list, but very hard, maybe impossible, to match accurately against a single worldwide global database. That's why it only ever suggests tagging a face as one of your friends, even if it's wrong. This whole thing also creates serious obstacles to those dystopian "we'll assign a social score to every citizen and track them all everywhere through public security cameras" plans.)
The fix for false positives is surprisingly easy, and lets you fix a bunch of the other problems with biometrics at the same time: combine it with factor #2, ie. something you have, like a card or a phone.
Instead of storing your fingerprints or retinal scan or facial structure in a big central database, just put it on a card or a phone, in a secure element. Then, when you want to authenticate, the scanner makes a connection to your device, sends the current biometric scan, and the card compares it to its single target (you), and chooses whether or not to unlock itself in response. If it does, it tells the scanner (usually with a cryptographic signature) that all's well.
This is much better: your search algorithm can be much worse. Even a 1 in 1000 false positive rate will be undetectable by random end users, even if it would be utterly hopeless in a million-person database. You don't need to put your biometrics in a central database, which is a big improvement when that database inevitably gets stolen. And best of all, a skilled attacker can't get in just by cloning your biometrics, because they won't have the key from your device. No more showing up at the secret facility with some synthetic rubber fingers and waltzing right in. That other factor (something you have... and will notice when it's stolen) goes a long way.
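The arithmetic behind the "1 in 1000" remark is worth spelling out. A quick sketch, assuming each comparison is an independent trial with the same false positive rate (real matchers are messier than that), with function names invented here:

```python
def expected_false_matches(rate, candidates):
    """Average number of wrong people who match a single scan."""
    return rate * candidates


def p_any_false_match(rate, candidates):
    """Chance that at least one wrong candidate matches a single scan."""
    return 1.0 - (1.0 - rate) ** candidates


if __name__ == "__main__":
    rate = 1 / 1000
    # One candidate: the template stored on your own card or phone.
    # A million candidates: a central database.
    for candidates in (1, 1_000_000):
        print(candidates,
              expected_false_matches(rate, candidates),
              p_any_false_match(rate, candidates))
```

Against a single stored template, a stranger's scan is accepted about once in a thousand tries. Against a million-person database, the same matcher returns on the order of a thousand wrong candidates for every lookup.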
But what exactly is that second factor?
"Something you have" is not quite right
So much for biometrics, factor #3. But what about #2? "Something you have" is pretty straightforward, right? A card, a phone, an OTP fob, a U2F key, whatever. If you have it, that's an authentication factor.
Well, almost. We're going to need to be a little more specific in order to explain exactly why things work the way they do, and why some attacks are possible when you might think they're impossible.
First, let's be clear. "Something you have" is always an encryption key, a long sequence of bits, a large number. It's not a screwdriver, or an 1800s-style physical iron key (physical analogies for two-factor authentication all eventually turn out badly), or an ear infection. It's an encryption key. Let's call it that.
Secondly, it's not just any encryption key. Anybody can generate an encryption key if they have a big enough random number generator. Your encryption key is special. It's a key that you have previously agreed upon ("enrolled") with an authentication provider.
So let's change the definition. Factor #2 isn't something you have; it's just a particular very large number. Maybe you keep it on a device. Maybe you memorize it (if you can memorize 256 digits of pi, I guess you can memorize a 2048-bit RSA key, but I don't envy your key rotation strategy), and then it's something you "know." Maybe you get a QR code tattoo, and then it's something you "are." But it's a big number, and it's valuable only because the other end already knows, by prior agreement, that they will trust someone who has that number.
(Side note: instead of saving each enrolled key in a database, the authentication provider might sign your (public) key using a Certificate Authority (CA). The net result is the same: in order to get a signature from the CA, you had to enroll with it at some point in the past. The difference only affects the format of the backend authentication database on the server, which is opaque to us.)
From now on, factor #2 is "your previously-enrolled private key."
All factors are not created equal
An incorrect conclusion from the "something you know, something you have, something you are" definition is that you can pick any two of these and have a system that's about equally secure as any other combination.
In fact, factor #2 - the previously-enrolled private key - is by far the safest of the three. This is why banks give you bank cards but are willing to "authenticate" those cards with (in the U.S.) a useless pen-and-paper signature, or (elsewhere) an easily-stolen-or-guessed 4-digit PIN.
Your factor #2 key can be easily replaced (and re-enrolled). You can have as many as you want, corresponding to as many pseudonymous identities as you want. It's a truly random stream of bits, not a weak one you composed because it would be easy to remember. It's easily and perfectly distinguished from everyone else's key. It can't be used to track you between use cases (unless your device is designed to leak it intentionally, sigh). With software configured correctly, it can't be phished. It can be stolen - but (if the hardware is designed right) your attacker has to steal it in the physical world, which eliminates Internet-based attackers, which is most of them. And if they steal your physical key card, you'll probably notice pretty fast.
Even ancient magstripe credit cards share most of these advantages. Their main weakness is that a physical "attacker" can clone rather than steal your card, which is much harder to detect. That's the only real advantage of "chip" cards: they can't be trivially copied. (The switch from signatures to PINs is mostly a distraction and adds little.) We make fun of old-fashioned banks for "still" accepting magstripe cards and being trapped in the past, but those cards are vastly more secure than almost anything else your non-bank service providers use.
(Let's skip over Internet shopping, in which you just type in the not-so-secret magic number from the back of the credit card and combine it with your not-so-secret postal code. Oh well. Sometimes convenience trumps security, even for a bank. But they still keep issuing those physical cards.)
Your phone is probably doing this right
By the way, this whole post is just repeating what experts already know. Your phone, especially if it's an iPhone, already benefits from people who know all this stuff.
Your iPhone contains a previously-enrolled private key, in a secure element, that can uniquely identify it to Apple. You can generate more of them, like the key used to encrypt your local storage. Actions on the key can be restricted to require a fingerprint first, and even the fingerprint reader uses its own previously-enrolled private key link to the secure enclave, preventing attackers from tearing apart your phone and feeding it digital copies of your fingerprint, bypassing the sensor. Plus, after a power cycle, the iPhone secure element refuses to unlock at all without first seeing your PIN, which an attacker can't lift from your hotel room or jail cell like they can lift your fingerprints. And it only lets an attacker try a few guesses at the PIN before it locks itself permanently.
FaceID is similar. It doesn't really matter if the fingerprint or FaceID algorithms have a fairly high false positive rate, because of all the other protections, especially that previously-enrolled private key. Somehow, an attacker either needs your phone, or needs to hack the software on your phone and wait for you to scan a biometric.
Awesome! Let's use our phone to authenticate everything!
Yeah, hold on, we're getting there.
That pesky "previous" enrollment
So here's the catch. The whole multi-factor authentication thing is almost completely solved at this point. Virtually everybody has a phone already (anyway, more people have phones than computers), and any phone can store a secret key - it's just a number, after all - even if it doesn't have secure element hardware. (The secure element helps against certain kinds of malware attacks, but factor #2 authentication is still a huge benefit even with no secure element.)
The secret key on your phone can be protected with a PIN, or biometric, or both, so even if someone steals your phone, they can't immediately pretend to be you.
And, assuming your phone was not a victim of a supply chain attack, you have a safe and reliable way to tell your phone not to authorize anybody unless they have your PIN or biometric: you just need to be the person who initially configures the phone. Nice! Passwords are obsolete! Your phone is all three authentication factors in one!
But... how does a random Internet service know your phone's key is the key that identifies you? Who are you, anyway?
The thing about a previously-enrolled private key is you have to... previously... enroll it... of course. Which is a really effective way of triggering Inception memes. Just log into the web site, and tell it to trust... oh, rats.
Authentication is easy. Enrollment is hard.
Which leads us to a surprising conclusion: security research has actually come really far. We now have good technology to prevent phishing while decreasing the chance of forgotten passwords (PINs are easier than passwords!) and avoiding all the privacy problems with biometrics. Cool! Why isn't it deployed everywhere?
Because enrollment remains unsolved.
Here are some ways we do enrollment today:
- Pick a password at account creation time and hope you weren't being phished at that time. (Phishable. Incidentally, "password manager" apps are a variant of this.)
- Get a security token (OTP, U2F, whatever) issued by your employer in person, after some kind of identity check. (Very secure, but tedious, which is why only corporations bother with it.)
- Mail a token to your physical mailbox and hope nobody else gets there first. Banks do this with your credit card. (Subject to (eventually detectable) mail theft by criminals and roommates, but very effective, albeit slow.)
- Log in once with a password, then associate a security token for subsequent logins. (Password logins are phishable.)
- Your physical device (eg. iPhone) arrives with a key that the vendor (eg. Apple) already trusts, so at least it knows it's talking to a real iPhone. (Great, but it doesn't prove who you are.)
- Send a confirmation message to your email address or phone number, thus ensuring you own it. (Inception again: this just verifies that you can log in somewhere else, with an unknown level of security. Email addresses and phone numbers are stolen all the time.)
- Ask for personal information, eg. "security questions", your government SIN, your physical address, your credit card number. (These answers aren't as secret as you'd like to think.)
All the above methods have different sorts of flaws or limitations.
That said, we should draw attention to some special kinds of "enrollment" that are actually nearly perfect:
- Setting a PIN on my iPhone, assuming the iPhone's supply chain was secure, is reliable and easy because of a physical link between it and me.
- Setting fingerprint or face unlock on my iPhone is reliable and easy, again because of a physical (sensor) link between it and me.
- My corporate-issued U2F token can issue signatures to my web browser because they are physically linked, and it can assume I plugged it into whatever computer on purpose.
- Bluetooth devices make me do an explicit enrollment ("pairing") phase on both devices, which prevents attacks other than by people who are physically nearby at the time of initial pairing. (When available, a pairing PIN adds a small amount of security on top, but is usually more work than it's worth unless the NSA is parked outside.)
- Registering a new iCloud account on my iPhone can be safe because the iPhone's built-in certificate can be used to mutually validate between Apple's auth server and the phone. (This is only safe if you create your account from a suitably secure iPhone in the first place. Securing a non-secured account later is subject to the usual enrollment problems.)
- Registering additional Apple devices to your existing iCloud account, where Apple pops up a confirmation on your already-registered Apple devices and then makes you type an OTP into the new device. (The OTP in this case might seem extraneous, but it ensures that someone can't just trigger a request from their own device at the same moment as you trigger a request from yours, and have them both go through because you click confirm twice. It's also cool that they show the GPS location of the requester in the confirmation dialog, although that seems forgeable.)
...okay, I didn't expect this article to turn out as an iPhone ad, but you know what? They've really thought through this consumer authentication stuff. (To their credit, although Google is behind on consumer authentication, their work on U2F for corporate authentication is really great. It relies on the tedious-but-highly-effective human enrollment process.)
It seems to me that the above successful enrollment patterns all use one or more of the following techniques:
- A human authenticates you and issues you a token (usually in person).
- A short-distance, physical link (proximity-based authentication) like a biometric sensor, or USB or bluetooth connection.
- Delegation to an existing authenticator (like a popular web service, eg. via oauth2, or by charging a credit card) to confirm that it's you. Once you have even a single chicken, you no longer have a chicken-or-egg problem. After delegating once, you can enroll a new token and avoid future delegations to that service if you want. (That might or might not be desirable.)
That's all I've got. In this article, I'm just writing about what other people do. I don't have any magic solutions to enrollment. So let's leave enrollment for now as an only-partially-solved, rather awkward problem.
(By the way, the reason gpg is so complicated and unusable is that the "web of trust" is all enrollment, all the time.)
But now that we understand about enrollment, let's talk about...
Why is U2F so effective?
Other than enrollment, using U2F (Universal Second Factor) is pretty simple. You get a physical key, plug it into a USB port (or nowadays sometimes a bluetooth link), and you press it when prompted.
When you press it, the device signs a request from any requesting web site, via the web browser, thus proving that you are you. (Whoever "you" are was determined during enrollment.)
The non-obvious best feature is that the signature only applies to a particular web domain that made the request, and the web browser enforces that the domain name is "real" by using HTTPS certificates. (Inception again! How do we know we can trust the web site? Because their HTTPS cert was signed by a global CA. How do we know which CAs to trust? A list of them came with our browser or our OS. How can we trust the browser or OS? Uh... supply chain?)
Anyway, put it all together, and this domain validation is like magic; it completely defeats phishing.
A good mental model for phishing is to imagine some person sitting there with their own web browser, but also operating a web site that proxies your requests straight through to the "real" web site, while capturing the traffic. So you accidentally visit gmail.example.com, which proxies to gmail.com. Somehow you are fooled into thinking you need to log in as if it's gmail. You type your username and password, which are logged by the proxy and sent along to gmail.com. Our hacker friend reads the username and password from their trace, and logs into gmail.com on their web browser. You've been phished!
Now let's add U2F. After you type your password, the site asks you to tap your U2F. You tap it, and generate a valid signature directed to... gmail.example.com, the site your browser confirmed as having made the request. They proxy that valid signature along to gmail.com, but it's useless: gmail.com doesn't trust signed authenticators directed at gmail.example.com. The phishing has been blocked completely.
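The proxy scenario is easy to simulate. The sketch below is a toy model, not the U2F wire protocol: real tokens sign with a per-site public/private key pair and the site only ever holds the public half, while this version uses an HMAC over a shared secret purely so it runs with nothing but the standard library. The class and function names are invented for the example.

```python
import hashlib
import hmac
import os


class ToyToken:
    """Stand-in for a U2F key: holds a secret and signs (origin, challenge)."""

    def __init__(self):
        self._key = os.urandom(32)

    def enroll(self):
        # Assumption for the toy: enrollment hands the site the same secret.
        # A real token would hand over a public key instead.
        return self._key

    def sign(self, origin, challenge):
        msg = origin.encode() + b"\0" + challenge
        return hmac.new(self._key, msg, hashlib.sha256).digest()


class ToySite:
    """Stand-in for the real web site, which knows its own origin."""

    def __init__(self, origin, enrolled_key):
        self.origin = origin
        self._key = enrolled_key

    def new_challenge(self):
        return os.urandom(16)

    def verify(self, challenge, signature):
        msg = self.origin.encode() + b"\0" + challenge
        expected = hmac.new(self._key, msg, hashlib.sha256).digest()
        return hmac.compare_digest(expected, signature)


def browser_sign(token, origin_in_address_bar, challenge):
    """The browser, not the page, decides which origin gets signed."""
    return token.sign(origin_in_address_bar, challenge)


if __name__ == "__main__":
    token = ToyToken()
    gmail = ToySite("https://gmail.com", token.enroll())
    challenge = gmail.new_challenge()

    # The phishing proxy relays the real challenge, but the victim's browser
    # is actually sitting on gmail.example.com, so that origin gets signed.
    relayed = browser_sign(token, "https://gmail.example.com", challenge)
    print("phished login accepted:", gmail.verify(challenge, relayed))

    direct = browser_sign(token, "https://gmail.com", challenge)
    print("direct login accepted:", gmail.verify(challenge, direct))
```

The proxy can relay every byte faithfully and still lose, because the one thing it cannot change is the origin that the victim's browser feeds into the signature.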
(By the way, OTP - any device that generates a number that you have to type in - doesn't prevent phishing at all. At least, not if the attacker can do man-in-the-middle attacks, which they probably can. In our example, you'd happily type the OTP into gmail.example.com, for the same reason you typed your password, and it would proxy it to gmail.com, and bam, you lose.)
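For contrast, here's a small sketch of why a typed-in code can be relayed. It assumes an HOTP-style counter-based code (the RFC 4226 construction); the article doesn't say which OTP scheme any particular device uses, and the `real_site_accepts` helper is hypothetical. Notice that nothing about the requesting web site goes into the code.

```python
import hashlib
import hmac
import struct


def otp_code(secret: bytes, counter: int) -> str:
    """HOTP-style six-digit code: depends only on the secret and a counter."""
    digest = hmac.new(secret, struct.pack(">Q", counter), hashlib.sha1).digest()
    offset = digest[-1] & 0x0F
    value = struct.unpack(">I", digest[offset:offset + 4])[0] & 0x7FFFFFFF
    return f"{value % 1_000_000:06d}"


def real_site_accepts(secret: bytes, counter: int, submitted: str) -> bool:
    """Hypothetical check on gmail.com's side: compare against the expected code."""
    return hmac.compare_digest(otp_code(secret, counter), submitted)


secret = b"12345678901234567890"  # shared between the OTP device and the site
counter = 0

# The victim reads the code off the device and types it into gmail.example.com.
typed_by_victim = otp_code(secret, counter)

# The proxy forwards exactly what was typed. No origin is baked into the
# code, so the real site has no way to tell it came through a middleman.
relayed_by_proxy = typed_by_victim
relay_accepted = real_site_accepts(secret, counter, relayed_by_proxy)
print(relay_accepted)
```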
(I mentioned Apple's authenticator above, which requires you to type an OTP. It doesn't have this weakness, because they are also authenticating using a U2F-like mechanism behind the scenes. Forcing you to also type the OTP is an extra layer of security, part of a very subtle enrollment process that defends against even rarer types of attack.)
Note also that you can unplug your U2F token and plug it into any other computer and web browser instance. Without any special pairing process, it can trust the new web browser and the web browser can trust the U2F. Why? Because of that physical link again. Somebody plugged this U2F key into this browser. That means they intended for them to trust each other. It's axiomatic; that's the design.
For this reason, pure U2F doesn't work well without a second factor, such as a password. Otherwise, if someone steals your U2F key, they could impersonate you by plugging it into their own computer, unless there's something else proving it's you.
These two factors are so effective because of the threat model. Basically, there are physical attackers (e.g. purse snatchers) and there are Internet attackers (e.g. spammers). Physical attackers can steal your U2F device, but they also need to see you type your password, and the U2F is traceable when used, and the attack is unscalable (you need a physical human to be near you to steal it). On the other hand, Internet attackers can easily phish your password, but can't steal your U2F.
Skilled, well-funded attackers with physical presence can do both. Sorry. But most of us aren't important enough to worry about that.
Can I use my phone as a U2F?
Learning about the above led me to ask a question many people have asked: why do I need the stupid physical security token, which is only plugged into one computer at a time and which I'm guaranteed to misplace? Can't my phone be the security token?
This is a good question. Aside from enrollment being generally hard, the stupid, expensive physical security token is, almost certainly, the thing preventing widespread adoption of U2F. I doubt it'll ever catch on, other than for employees of corporations (who pay you for the inconvenience and have good ways to enroll a new token when you inevitably lose yours).
It ought to be easy, right? A U2F is really just a secure element containing a previously-enrolled private key. As established earlier, you can put a private key anywhere you want. Heck, your iPhone has one already, protected by a PIN or fingerprint or face. We're almost there!
Almost. The thing is, when I'm browsing on my computer with my phone in my pocket, it's not my phone that needs to authenticate with the web site: it's my computer. So my phone and computer are going to have to talk to each other, just like a USB U2F and a computer talk to each other.
Both my web browser and my phone are connected to the Internet, so that's the most obvious way for them to talk. (If my computer weren't on the Internet, it presumably wouldn't need to authenticate with a web site. And my phone probably has a cellular data plan, so situations where it's on the Internet are "mostly" a superset of situations where my computer is.)
So all I need to do is generate an encrypted push request from my browser, through the Internet, to my phone, saying "please sign an authorization for gmail.com." Then I tap my phone to approve, and it sends the answer back, and the web browser proceeds, exactly as it would with a USB-connected U2F token. Right?
Almost. There's just one catch. How does my phone know the request is coming from a trusted web browser, and not that browser the phishing attacker, up above, was operating, or from some other random browser on the Internet? How can my phone trust my browser?
You have to... previously enroll... the browser's secret key... with your phone's U2F app.
Okay, okay, we can do this. Recall that there are three ways to do a safe enrollment: 1) a physical person gives you a key; 2) a direct physical link; 3) delegation from an existing authenticator. Traditional USB U2F uses method #2. When you physically plug a U2F into a new web browser, it starts trusting that web browser. Easy and safe.
But your computer and your phone aren't linked (if they are, then problem solved, but they usually aren't). Enrollment between your browser and your phone can be done in various ways: you could make a Bluetooth link, or generate a key on one and type it into the other, or scan a QR code from one to the other. Let's assume you picked some way to do this, and your browser is enrolled with your phone.
Now what? Okay, now when I want to log in, I must know my password and must physically possess my computer (containing the enrolled browser), and my phone with its U2F app. I type my password, which triggers a U2F request from the web site to the browser, which signs and pushes a signing request to my phone, which pops up an acknowledgement, which I tap, and then it signs and sends an answer back to my browser, which sends it back to the web site, and success!
Except... something's fishy there. That was more complicated than it needs to be. Notice that:
The browser is trusted by the phone U2F app because of a previous enrollment.
The phone U2F app is trusted by the web site because of a (different) previous enrollment.
My screen tap (or fingerprint scan) is trusted by the U2F app because of my physical presence (or another previous enrollment).
Why do I even need my phone in this sequence? I've already enrolled my browser; it's trusted. During enrollment, we could have given it a key that's just as useful as a U2F device, and we could have done that precisely as safely as enrolling a new U2F device. (It's nice if my computer has a secure element in which to store the key, but as established above, that only matters for security if my computer gets hit by malware.)
We could just do this instead: I type my password. Web site generates U2F request. Web browser signs the request using its previously-enrolled private key, enrolled earlier using my phone. Login completes, and phishing is prevented! Plus we don't rely on my phone or its flaky Internet connection.
(Note that while this does indeed prevent phishing - the biggest threat for most organizations - it doesn't prevent your roommate, spouse, or 6-year-old child from watching you type your password, then sometime later unlocking your computer and using its already-enrolled web browser to pretend to be you. By leaving your 30-pound desktop computer just lying around, you've let your second factor get stolen. In that way, phones are better second factors than computers, because we're so addicted to them that we rarely let them out of our sight. That doesn't stop your children from borrowing your fingerprints while you nap, however.)
There is, of course, still a catch. If you do this, you have to enroll every browser you might use, separately, using your phone, with one of these tedious methods (Bluetooth, typing a key, or QR code scan). That's kind of a hassle. With a USB U2F key, you can just carry it around and plug it into any computer. Even better, when you stop trusting that computer, you just unplug the token, and it cancels the enrollment. With a physical device, it's super easy to understand exactly what you've enrolled and what you haven't. (With a software key, the server needs a good UI to let you un-enroll computers you don't need anymore, and users will surely screw that up.)
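Here's a rough sketch of what per-browser software keys might look like from the server's side, including the un-enroll step. Everything in it is an assumption for illustration: the `Account` class, its method names, and the browser labels are hypothetical, and HMAC with a shared key again stands in for a real public-key signature so the sketch needs only the standard library.

```python
import hashlib
import hmac
import os

# ASSUMPTION: HMAC stands in for a public-key signature; all names are hypothetical.


def sign(key: bytes, origin: str, challenge: bytes) -> bytes:
    """Browser side: sign the challenge together with the requesting origin."""
    return hmac.new(key, origin.encode() + b"\0" + challenge, hashlib.sha256).digest()


class Account:
    """Server side: one entry per enrolled browser, so each can be revoked alone."""

    def __init__(self) -> None:
        self.enrolled = {}  # browser label -> key material

    def enroll(self, label: str) -> bytes:
        """Record a new browser; in practice this is the hard, phone-assisted step."""
        key = os.urandom(32)
        self.enrolled[label] = key
        return key

    def unenroll(self, label: str) -> None:
        """The software equivalent of unplugging the token from that computer."""
        self.enrolled.pop(label, None)

    def verify(self, label: str, origin: str, challenge: bytes,
               signature: bytes) -> bool:
        key = self.enrolled.get(label)
        if key is None:
            return False  # never enrolled, or already revoked
        return hmac.compare_digest(sign(key, origin, challenge), signature)


account = Account()
laptop_key = account.enroll("laptop")

challenge = os.urandom(16)
signature = sign(laptop_key, "gmail.com", challenge)

before_revoke = account.verify("laptop", "gmail.com", challenge, signature)
account.unenroll("laptop")
after_revoke = account.verify("laptop", "gmail.com", challenge, signature)
print(before_revoke, after_revoke)
```

The code is the easy part. The hard part is the one mentioned above: getting users to notice and remove the entries they no longer need.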
What people tend to miss, though, is that this enrollment is necessary whether or not you send a push notification to the phone during login. The push notification is only secure if this specific browser instance is enrolled; but if this browser instance is enrolled, and my computer is not easier to steal than my phone, then the push notification adds no extra security. The enrollment was the security.
And that's where I'm stuck. I've more or less convinced myself that phone-based OTP (prone to phishing) or phone-push-based U2F (not useful after initial enrollment) add no interesting security but do make things harder for end users. I guess they call that "security theatre." Meanwhile, physical U2F tokens are unlikely to become popular with consumers because they're inconvenient.
At least that conclusion is consistent with the state of the world as it exists today, and what the market leaders are currently doing. Which means maybe I've at least explained it correctly.
(Special thanks to Tavis Ormandy, Reilly Grant, and Perry Lorier for answering some of my questions about this on Twitter. But any mistakes in this article are my fault, not theirs.)