Content-Centric Networking (CCN) as an alternative to IP
I've been meaning to read the CCN
article from Van Jacobson and friends for months now, and I finally got
around to it on this lazy Sunday. It starts out as mostly a synthesis of other
research in the area of content-named delivery, but has a few major
innovations that make it very interesting. In particular, it is designed
around the same core philosophy as IP: mainly statelessness, automatic
congestion control, and routing tables based on prefixes (though in this
case the prefixes are names rather than addresses). The resemblance goes
so far that the authors even suggest using (unmodified!) OSPF for discovering
topology. (I'm not convinced that part would actually scale, but maybe.)
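To make the "prefixes are names" idea concrete, here's a minimal sketch of longest-prefix matching on names instead of addresses. The table layout, the face names, and the example.com names are all made up for illustration; the paper doesn't prescribe this representation.

```python
def longest_prefix_match(fib, name):
    """Return the outgoing faces for the longest FIB prefix matching `name`.

    `fib` maps tuples of name components to lists of faces (hypothetical layout).
    """
    components = tuple(name.strip("/").split("/"))
    # Try the full name first, then progressively shorter prefixes.
    for length in range(len(components), -1, -1):
        faces = fib.get(components[:length])
        if faces is not None:
            return faces
    return []  # no matching prefix and no default route


fib = {
    ("example.com",): ["face-isp"],
    ("example.com", "videos"): ["face-cdn", "face-isp"],
}

print(longest_prefix_match(fib, "/example.com/videos/cat.mpg/s3"))
print(longest_prefix_match(fib, "/example.com/index.html"))
print(longest_prefix_match(fib, "/unknown.org/x"))
```

It's the same trick as IP's longest-prefix match, except the "bits" are path components. An empty tuple as a key would act as a default route.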
The basic architecture is that a client sends out an "Interest" packet to
register its interest in a particular named bit of data. Your (content)
router receives that and forwards it to one or more onward routers, based on
its (content) routing table (aka "Forwarding Information Base"), and so on,
recursively. Finally the Interest reaches the end provider of the data, who
sends back a response. So far, so simple. But the key innovation is what
happens when packets are dropped or duplicated. If you get the same
Interest from more than one source, you only forward it the first time; if
you get the Data response more than once, you only forward it once (well, to
each existing Interest) and then throw away the Interest.
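Here's a rough sketch of that forwarding rule as I understand it. The class, the table of pending Interests, and the `sent` log (standing in for real network I/O) are my own invention, and the prefix match is a plain string match to keep it short.

```python
class ContentRouter:
    def __init__(self, fib):
        self.fib = fib      # name prefix (str) -> list of upstream faces
        self.pending = {}   # name -> set of faces that asked for it
        self.sent = []      # log of (face, kind, name); stands in for real I/O

    def _upstream_faces(self, name):
        # Longest matching prefix wins (simplified: plain string prefixes).
        best = max((p for p in self.fib if name.startswith(p)), key=len, default=None)
        return self.fib.get(best, [])

    def on_interest(self, name, from_face):
        if name in self.pending:
            # Duplicate Interest: remember who asked, but don't forward again.
            self.pending[name].add(from_face)
            return
        self.pending[name] = {from_face}
        for face in self._upstream_faces(name):
            if face != from_face:
                self.sent.append((face, "interest", name))

    def on_data(self, name, from_face):
        requesters = self.pending.pop(name, None)  # the Interest is consumed here
        if requesters is None:
            return  # duplicate or unsolicited Data: just drop it
        for face in sorted(requesters):
            self.sent.append((face, "data", name))


router = ContentRouter({"/example.com/": ["Y", "Z"]})
router.on_interest("/example.com/a", "A")
router.on_interest("/example.com/a", "B")   # same Interest, second source
router.on_data("/example.com/a", "Y")
router.on_data("/example.com/a", "Z")       # second copy of the Data is ignored
print(router.sent)
```

Note that the one Data packet fans out to both A and B, which is where the multicast-like behaviour comes from.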
That sounds simple for the same reason that IP itself is deceptively simple.
But as with IP, the end result of this simplicity seems like magic. First
of all, it eliminates the problem of "routing loops"; if router X is
configured to send Interests upstream to Y and Z, and it receives the same
Interest from both A and B, then it will forward that Interest upstream
exactly once (to Y, or Z, or both, depending how its routing table is set up). So
in that example, if A, B, Y, and Z are actually peers, you don't have to
worry about Y and Z looping Interests back to A and B. That is, they *will*
forward to A and B, but since A and B have already seen that Interest, the
buck stops there.
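You can convince yourself of this with a toy simulation. This is a deliberately pessimistic sketch in which every node forwards a new Interest to all of its neighbours except the one it came from (real routers would consult their routing table instead), and the topology is just my guess at the A/B/X/Y/Z example above.

```python
from collections import deque


def flood_interest(links, start):
    """Flood one Interest from `start`; return (transmissions, nodes reached).

    `links` maps each node to its list of neighbours. A node forwards the
    Interest only the first time it sees it.
    """
    seen = {start}
    queue = deque((start, nbr) for nbr in links[start])
    transmissions = 0
    while queue:
        sender, node = queue.popleft()
        transmissions += 1
        if node in seen:
            continue  # already has this Interest pending: the buck stops here
        seen.add(node)
        queue.extend((node, nbr) for nbr in links[node] if nbr != sender)
    return transmissions, seen


# A and B talk to X; X talks to Y and Z; Y and Z are also peers of A and B.
links = {
    "A": ["X", "Y", "Z"],
    "B": ["X", "Y", "Z"],
    "X": ["A", "B", "Y", "Z"],
    "Y": ["X", "A", "B"],
    "Z": ["X", "A", "B"],
}
transmissions, reached = flood_interest(links, "A")
print(transmissions, sorted(reached))
```

Even though the graph is full of cycles, the flood dies out after a finite number of transmissions, because no node ever forwards the same Interest twice.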
Secondly, if nobody on the whole network provides a given bit of data, it's
no big deal; nobody will answer it. Routers will keep the Interest in their
tables for a while (so they know who to forward the Data response back to if
it shows up), but they won't send a response, because there is no response.
As in IP, you can't automatically know the difference between "nobody is
there" and "packet got dropped."
This, in turn, is how flow control / congestion control works. Routers
don't ever resend Interest packets on their own; that's the job of the
client endpoint. (Retransmits include a "nonce" - just a random number - to
ensure that they don't get eaten by the anti-routing-loop logic above.) So
as with IP, an overloaded router can simply drop excess Interest packets to
slow down network activity. And just like with TCP, a client endpoint can
create a "sliding window" implemented by having multiple outstanding
Interest packets at once. How many at a time? That's your window size, and
it depends on observed retransmit characteristics.
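As a sketch of what that client side might look like: the function below keeps at most `window` Interests outstanding and retries anything that times out with a fresh nonce. It works in rounds rather than being truly asynchronous, the window is fixed rather than adaptive, and `send_interest` is a hypothetical transport hook of my own, so treat it as an illustration rather than how the prototype does it.

```python
import random


def fetch_segments(send_interest, total, window=4, max_rounds=100):
    """Fetch segments 0..total-1 with at most `window` Interests per round.

    `send_interest(name, nonce)` returns the Data payload, or None on timeout.
    """
    received = {}
    pending = list(range(total))
    for _ in range(max_rounds):
        if not pending:
            break
        batch, pending = pending[:window], pending[window:]
        for seg in batch:
            # Fresh nonce per attempt, so a retransmit isn't treated as a duplicate.
            nonce = random.getrandbits(32)
            data = send_interest(f"/example.com/file/s{seg}", nonce)
            if data is None:
                pending.append(seg)  # the client, not the router, retries
            else:
                received[seg] = data
    return [received.get(i) for i in range(total)]


attempts = {}


def flaky_send(name, nonce):
    """Fake network: drops the first Interest for every odd-numbered segment."""
    attempts[name] = attempts.get(name, 0) + 1
    seg = int(name.rsplit("/s", 1)[1])
    if seg % 2 == 1 and attempts[name] == 1:
        return None
    return f"data-{seg}".encode()


segments = fetch_segments(flaky_send, total=5, window=2)
print(segments)
```

A real client would presumably grow and shrink `window` based on how many retransmits it observes, the way TCP does.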
Apparently it's actually better than TCP's retransmit characteristics in the
sense that the flow control is on a link-by-link basis instead of
end-to-end. I don't really know how or why that's better; there's a note in
the doc that says "We will cover this topic in detail in a future paper,"
which I hope is not equivalent to a "remarkable proof of this theorem which
this margin is too small to contain." I haven't looked - maybe they
published it already. For now, I'll take their word for it.
The design also mostly-transparently deals with support for multiple network
interfaces - like a wired, wifi, and 3G network all at once - by allowing
you to just forward all your Interest packets on all the interfaces if you
want. Then, using techniques similar to ethernet bridging, you adjust your
routing table priorities based on latency/speed/loss of responses received.
You occasionally run an experiment by sending a request out a supposedly
non-optimal interface, just in case things have changed. And if you don't get
an answer back on the supposedly optimal interface, maybe it's dead, so you
can fail over right away.
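Here's one way that strategy could look, as a sketch. The smoothed round-trip estimate, the probe probability, and the class itself are my assumptions; the paper leaves the actual strategy open, and this version picks a single interface per request rather than sending on all of them.

```python
import random


class FaceSelector:
    def __init__(self, faces, probe_probability=0.1, smoothing=0.25, seed=None):
        self.rtt = {face: None for face in faces}  # None means "never tried"
        self.probe_probability = probe_probability
        self.smoothing = smoothing
        self.rng = random.Random(seed)

    def choose(self):
        untried = [f for f, r in self.rtt.items() if r is None]
        if untried:
            return untried[0]
        # Occasionally experiment with any face, in case things have changed.
        if self.rng.random() < self.probe_probability:
            return self.rng.choice(list(self.rtt))
        return min(self.rtt, key=self.rtt.get)

    def record_response(self, face, rtt):
        old = self.rtt[face]
        if old is None or old == float("inf"):
            self.rtt[face] = rtt  # first sample, or the face came back to life
        else:
            a = self.smoothing
            self.rtt[face] = (1 - a) * old + a * rtt

    def record_timeout(self, face):
        self.rtt[face] = float("inf")  # treat as dead until it answers again


selector = FaceSelector(["wired", "wifi", "3g"], probe_probability=0.0)
selector.record_response("wired", 0.005)
selector.record_response("wifi", 0.020)
selector.record_response("3g", 0.150)
print(selector.choose())          # fastest known face
selector.record_timeout("wired")
print(selector.choose())          # fails over to the next best
```

With `probe_probability` above zero, a face marked dead gets retried now and then, and a single successful response puts it back in the running.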
Now, we all know failover is nice for content-named networks (along with
caching, that's basically the whole point). But this is where things get
really fun. You can carry a content-named network over IP network sessions;
in fact, that's how their prototype was built. But what if you could carry
IP network sessions over a content-named network?
Well, according to a
later paper by the same team about VoCCN (Voice Over CCN), you can! And
the way it works falls out naturally from the design. You just send out a
*window* of Interest packets for stuff that doesn't yet exist. Initially,
you get no response. But the unanswered Interest packets are remembered by
the router nodes for a short time, so as the live data is created by the
server endpoint, it just responds to the outstanding Interests and the data
gets distributed back to the original client(s).
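A tiny sketch of the server end of that, assuming frames are named by a sequence number under some agreed prefix. The naming scheme and the class are mine, and I've left out the expiry of old Interests.

```python
class LiveSource:
    def __init__(self, prefix):
        self.prefix = prefix
        self.pending = set()   # names somebody has already asked for
        self.next_seq = 0

    def on_interest(self, name):
        self.pending.add(name)

    def produce(self, payload):
        """Name the next frame; answer the outstanding Interest if there is one."""
        name = f"{self.prefix}/{self.next_seq}"
        self.next_seq += 1
        if name in self.pending:
            self.pending.remove(name)
            return (name, payload)
        return None  # nobody asked yet; a real server would buffer or drop it


source = LiveSource("/example.com/call42/audio")
for seq in range(3):   # the client's window: Interests for frames that don't exist yet
    source.on_interest(f"/example.com/call42/audio/{seq}")

sent = [source.produce(frame) for frame in (b"f0", b"f1", b"f2", b"f3")]
print(sent)
```

The fourth frame goes nowhere because the client's window only covered three; in practice the client would send a new Interest each time one is answered, to keep the window full.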
The reverse path, data sent from the client to the server, is encoded by
registering Interests in which the last segment of the path is the actual
data you want to send. The server can then send a small Data
(acknowledgement) packet to state that it was received and doesn't need to
be retransmitted.
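For illustration, here's how stuffing the payload into the last name component could work. The URL-safe base64 encoding and the prefix/sequence/payload layout are my assumptions, not the encoding from the VoCCN paper.

```python
import base64


def encode_upstream(prefix, seq, payload):
    """Build an Interest name whose last component carries `payload`."""
    # URL-safe base64 never contains "/", so it can't be confused with a separator.
    token = base64.urlsafe_b64encode(payload).decode("ascii")
    return f"{prefix}/{seq}/{token}"


def decode_upstream(name):
    """Server side: recover (prefix, seq, payload) from the Interest name."""
    prefix, seq, token = name.rsplit("/", 2)
    return prefix, int(seq), base64.urlsafe_b64decode(token)


name = encode_upstream("/example.com/call42/up", 7, b"hello")
print(name)
print(decode_upstream(name))
```

The server would then answer that Interest with a tiny Data packet as the acknowledgement.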
They point out that one neat side effect of all this is that if your client is
multi-homed and one of its network links drops out or moves - say if you
move from one wifi network to another, or from wifi to wired - then you can
still recover, because the intermediate content routers have actually cached
the data you would have lost by hopping networks, and you don't have to
"reconnect" because your connection was always about the content, not the
endpoint.
Internet-wide multicast-like behaviour - with scalable retransmits!! - is an
inherent part of this design. Want to send something to more than one
client with minimal load? Don't do anything special. Just respond to
Interests. If there's more than one Interest for a given bit of data, any
content router along the way can receive just one copy and fan it out to all
the other recipients.
Symmetrically (but separately), you can use multicast or broadcast on a LAN
to send out Interests, so if someone nearby has already seen what you're
asking about, they can send it to you. Your local content router(s) could
also choose to multicast Data response packets on the LAN if more than one
local machine has expressed Interest in it, using whatever heuristics or
conventions you want to use. Clients and routers that receive unsolicited
Data packets just ignore them.
Finally, note that, unlike many recent trends in content storage, content is
not named directly after its hash. As they point out, the problem with that
method is you need a separate name-to-hash conversion layer, and that a
client can't request data which doesn't exist yet - which prevents web-style
dynamic page generation (based on query parameters) as well as disallowing
any kind of two-way communication channel like they created in VoCCN.
Hash-based naming also ends up necessitating things like DHTs for routing,
instead of allowing the prefix-based routing they recommend. I have to say,
prefix-based routing does have a lot of appeal to me, after having
considered hash tables and DHTs pretty extensively. The problem with hash
tables is you lose all locality of reference, so you end up with related
data (for example, consecutive blocks of a big file, or several files in one
directory) scattered all over the place instead of (if your cache is written
carefully) stored consecutively on disk.
On the other hand, naming your content after its hash makes verification
super easy; if you request block 234283123, and the hash of the received
block is not 234283123, then you reject it. CCN, by comparison, if I
understand their paper correctly, has on the order of 325 bytes of security
crap for every 1024 bytes of content. (32% overhead?!) Maybe I'm reading it
wrong, but I suspect not; the security section of their paper seems to be a
complicated multi-layer abomination, leading me to suspect it was either an
afterthought, or designed by someone totally different from the people who
designed the core of CCN.
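To show what I mean by "super easy", here's a sketch of hash-named verification, plus the overhead arithmetic. SHA-256 and the dict standing in for the network are my choices for the example; the 325 and 1024 figures are just my reading of the paper, as above.

```python
import hashlib


def fetch_verified(store, digest_hex):
    """Fetch a block by its hash and reject it if the content doesn't match."""
    block = store[digest_hex]
    if hashlib.sha256(block).hexdigest() != digest_hex:
        raise ValueError("hash mismatch: rejecting block")
    return block


block = b"some block of content"
digest = hashlib.sha256(block).hexdigest()
store = {digest: block}            # stands in for "the network"
print(fetch_verified(store, digest) == block)

# Overhead, relative to the content and relative to the whole packet.
security_bytes, content_bytes = 325, 1024
print(round(100 * security_bytes / content_bytes))                      # about 32%
print(round(100 * security_bytes / (security_bytes + content_bytes)))   # about 24%
```

So it's about 32% on top of the content, or about 24% of each packet, depending on how you count. Either way it's a lot compared to a check that costs you zero extra bytes.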
So the security part is definitely going to need more work, but I think it's
not unresolvable, for the same reason that security over IP (itself not
secure) is resolvable (by adding SSL). For example, with something like
VoCCN, you could just rely on the datastream itself being encrypted, using
(literally) SSL, and leave it out of the transport layer altogether. That
leaves you with a problem where an attacker could insert fake data into your
SSL stream, which would be rejected by SSL, probably aborting the connection
and leading to a denial of service. That's already possible with TCP, and
hasn't been a huge problem, although it would be even easier to create this
kind of attack in the case of a network that has lots of untrusted caches.
(That is, you wouldn't need to be a man in the middle. Or if you prefer,
there are more men in the middle.)
Anyway, there are lots of really great ideas in these two papers. The best
part is the core concepts - datagram-based Interests and Data packets - that
can be applied separately from the other parts (routing protocols, security
protocols, tunneling protocols). Again, just like with IP.
I think something like CCN really may be the future of networking. It will
take a bit of work first though. :)
Addendum: Replace IP? Really?
Yes. This is not as crazy as it sounds, although of course it would take a
long time to complete. You might recall that I think IPv6 replacing IPv4 as
the core of the Internet is rather unlikely, but that's because of chicken-and-egg
problems. IPv6 won't provide any utility to the Internet until *all* the
clients and *all* the servers have switched; until then, it's just twice as
much work for everyone. Yes, you can tunnel IPv6 in IPv4, and vice versa,
but if your personal workstation *and* the server aren't both speaking IPv6,
you don't benefit from IPv6.
CCN (and variants of it) are different, because they're designed from the
start to be carryable over IP (IPv4 or IPv6). That means they cooperate
with the existing system. Moreover, you can tunnel IP (IPv4 or IPv6) over
CCN, so if you did replace all your IP routers with CCN routers, your
computer could still talk IP if it wanted, and still reach everyone on an IP
network. (This is very different from other content-based networking
proposals, which lack that property and thus could never totally replace
IP.) You could even host IP-based services and advertise them using a
subdomain prefix on the CCN network (using OSPF or whatever they settle on),
so incoming connections would work.
Most importantly, though, CCN can be implemented entirely in userspace.
Your browser could support it even if your OS kernel doesn't. That's unlike
IPv6, where you could theoretically have an "IPv6 proxy server" accessible
from a user space process over IPv4 (or vice versa), but you wouldn't do it
that way because if you're using a proxy server anyhow, you'd probably use a
session-level (ie. HTTP) proxy server, skipping the IPv6 layer altogether.
CCN is different in this case because it provides actual value above and
beyond what IP provides. If a CCN server exists for certain kinds of
things - video files, for instance, like Youtube or iTunes - then your life,
the life of your ISP, and the lives of the people serving the files will
get incrementally better as people add more CCN servers, routers, and
clients. That encourages incremental adoption because there's an actual
reward for each increment. With incremental deployment of IPv6, your only
reward is the knowledge that you made the world just a little bit more
complicated today. Yay you.
November 14, 2012 06:30