<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>apenwarr - Business is Programming</title>
    <description>apenwarr - Business is Programming - NITLog</description>
    <link>http://apenwarr.ca/log/</link>
    <language>en-ca</language>
    <generator>NITLog</generator>
    <docs>http://blogs.law.harvard.edu/tech/rss</docs>
    <item>
      <title>Cheap Thrills</title>
      <pubDate>Wed, 12 Jun 2013 07:35:30 +0000</pubDate>
      <link>http://apenwarr.ca/log/?m=201306#12</link>
      <guid isPermaLink="true">http://apenwarr.ca/log/?m=201306#12</guid>
      <description>&lt;p&gt;
I've heard it said that you can just alternate between two UI
themes once a week, and every time you switch, the new one will feel
prettier, newer, and more exciting than the old one.
&lt;p&gt;
This is a natural tendency.  The human mind is intrigued by change.  That's
where fashion comes from, and fads.  It gives you a little burst of some
chemical, maybe adrenaline (fear of the unknown?), or endorphins
(appreciation of the unexpected?), or perhaps some other kind of juice I
heard of somewhere but I don't really know what it does.
&lt;p&gt;
In tech, this kind of unlimited attraction to the unexpected is the main
characteristic of the first phase of the &lt;a
href=&quot;http://en.wikipedia.org/wiki/Technology_adoption_lifecycle&quot;&gt;Technology
Adoption Lifecycle&lt;/a&gt;, the so-called &quot;Innovators.&quot;
&lt;p&gt;
&lt;div style=&quot;float: right; text-align: center&quot;&gt;&lt;img
src=&quot;http://upload.wikimedia.org/wikipedia/commons/thumb/d/d3/Technology-Adoption-Lifecycle.png/320px-Technology-Adoption-Lifecycle.png&quot;&gt;&lt;br&gt;&lt;i
style=&quot;font-size: 60%&quot;&gt;Source: &lt;a
href=&quot;http://commons.wikimedia.org/wiki/File:Technology-Adoption-Lifecycle.png&quot;&gt;Wikimedia
Commons&lt;/a&gt;&lt;/i&gt;&lt;/div&gt;
&lt;p&gt;
Perhaps people are happy to be included in the Innovator category.  But
Innovation isn't just doing something different for the sake of being
different.  Real innovation is the *willingness* to take the *risk* to do
something different, because you know that difference is expensive, but that
it will pay off in some way that more conservative sorts will fail to
recognize until later.
&lt;p&gt;
In fashion, the end goal is to catch people's attention; if you do that, you
are innovative.  That's why fashion repeats itself every few years: because
you can be innovative over and over again with the same ideas, rehashed
forever.
&lt;p&gt;
In technology, we can hold you to a higher standard.  Innovation requires
difference, but it also requires a vision of usefulness.  Change is
expensive.  Staying the same is cheap.  Make it worth my while.  Or if I'm
an Innovator, or even an Early Adopter, at least give me a hint about how
it's worth my while so I can exploit it while others are too afraid.
&lt;p&gt;
Every needless change creates expensive fragmentation.  Microsoft ruled
their market by being change averse.  So did IBM.  So did Intel.  Even
Apple.  Whenever they forgot this, they stumbled.
&lt;p&gt;
Change aversion works because what makes a platform successful isn't so much
the platform as the complementary products.  For a phone, that means
third-party power adapters, car chargers, headphones with integrated volume
controls, alarm clocks with a connector to charge your phone *and* play your
music at the same time.  For a PC, it could be something as simple as
maintaining the same power supply connector across many years' worth of
models, so that anyone who standardizes on your brand will have an
ever-growing investment in leftover power supplies plugged in wherever they
might want them.  For an operating system, it means keeping the same
approximate style of UI for a long time, so that apps can learn to optimize
for it, and a really great app made two years ago can keep on selling well,
perhaps with bugfixes and new features but no need for rewrites, because it
still looks like it's perfectly integrated into your OS experience.  That
sort of consistency allows developers to focus on quality instead of
flavour, and produces an overall feeling of well-integratedness.  It makes
people feel like when they buy your thing, they're paying for quality.  And
yes, people - moving beyond the innovators into the more profitable market
segments of the curve - will definitely pay for quality.
&lt;p&gt;
Real design genius lies in the ability to make something look pretty, and
with gentle updates to keep it modern looking, without causing huge
disruption to your whole ecosystem every couple of years.  Following fashion
trends, while not caring about disruption, does not require genius at all. 
All it requires is a factory in a third-world country and some photos of
what you want to copy.
&lt;p&gt;
Ironically, even app developers mostly fail to recognize just how bad it is
for them when a platform changes out from under them unnecessarily. 
Instead, they get excited by it.  Finally, I get to rewrite that UI code I
really hated, and while I'm there, I can fix all those interaction bugs I
knew we had but could never justify repairing!  Because now I *have* to
rewrite it!
&lt;p&gt;
Redesigning things to match a moving target of a platform is really
comforting, because it's a ready-made strategy for your company.  The truth
is, you don't have to think about what customers want, or how to make the
workflow smoother, or how to eliminate one more click from that common
operation, or how to fix that really annoying network bug that only happens
1 in 1000 times.  Those bugs are hard; this feels like freedom.  We'll just
dedicate our team to &quot;refreshing&quot; the UI, again, for another few months, and
nobody can complain because it's obviously necessary.  And it is,
obviously, necessary.  Because your platform has screwed you.  Your platform
changed for no reason, and that's why your users can't have what they really
need.  They'll get a UI refresh instead.
&lt;p&gt;
And although they are less productive, they will love it.  Because of
endorphins, or sodium, or whatever.
&lt;p&gt;
And so you will feel good about yourself in the morning.
      </description>
    </item>
    <item>
      <title>2013-06-09</title>
      <pubDate>Sun, 09 Jun 2013 16:08:21 +0000</pubDate>
      <link>http://apenwarr.ca/log/?m=201306#09</link>
      <guid isPermaLink="true">http://apenwarr.ca/log/?m=201306#09</guid>
      <description>You might not realize it, but there's an imminent phone number shortage. 
It's been building up for a while, but the problem has been mitigated by
people using &quot;PBXes&quot;, which basically add a 4-5 digit extension to the end
of your phone number to expand the available range.  The problem with PBXes
is they don't work right with caller id (it makes it look like a bunch of
people near each other all have the same phone number) and you can't easily
direct-dial PBX extensions from a phone's integrated address book, unless
your phone has some kind of special &quot;PBX penetration&quot; technology.  (PBX
penetration is pretty well-understood, but not implemented widely.)
&lt;p&gt;
Even worse: it's no longer possible to route phone calls hierarchically by
the first few digits.  Nowadays any 10-digit U.S. phone number could be
registered anywhere in the U.S. and area codes change all the time.
&lt;p&gt;
So here's my proposal.  Let's fix this once and for all!  We'll double the
number of digits in a Canada/U.S. phone number from 10 to 20.  No, wait,
that might not be enough to do fully hierarchy-based call routing, let's
make it 40 digits.  But that could be too much typing, so instead of using
decimal, we can add a few digits to your phone dialpad and let you use
hexadecimal instead.  Then it should only be 33 digits or so, with the same
numbering capacity as 40 decimal digits! Awesome!
&lt;p&gt;
It'll still be kind of a pain to remember numbers that long, but don't worry
about it, nobody actually dials directly by number anymore.  We have phone
directories for that.  And modern smartphones can just autodial from
hyperlinks on the web or in email.  Or you can send vcards around with NFC
or infrared or QR codes or something.  Okay, those technologies aren't
really perfect and there are a few remaining situations where people
actually rely on the ability to remember and dial phone numbers by hand, but
it really shouldn't be a problem most of the time and I'm sure phone
directory technology will mature, because after all, it has to for my scheme
to work.
&lt;p&gt;
Now, as to deployment.  For a while, we're going to need to run two parallel
phone networks, because old phones won't be able to support the new
numbering scheme, and vice versa.  There's an awful lot of phone software
out there hardcoded to assume its local phone number will be a small number
of digits that are decimal and not hex.  Plus caller ID displays have a
limited number of physical digits they can show.  So at first, every new
phone will be assigned both a short old-style phone number and a longer
new-style phone number.  Eventually all the old phones will be shut down and
we can switch entirely to the new system.  Until then, we'll have to
maintain the old-style phone number compatibility on all devices because
obviously a phone network doesn't make any sense if everybody can't dial
everybody else.
&lt;p&gt;
Actually you only need to keep an old-style number if you want to receive
*incoming* calls.  As you know, not everybody really needs this, so it
shouldn't be a big barrier to adoption.  (Of course, now that I think of it,
if that's true, maybe we can conserve numbers in the existing system by just
not assigning a distinct number to phones that don't care to receive calls. 
And maybe charge extra if you want to be assigned a number.  As a bonus,
people without a routable phone number won't ever have to receive annoying
unsolicited sales calls!)
&lt;p&gt;
For outgoing calls, we can have a &quot;carrier-grade PBX&quot; sort of system that
basically maps from one numbering scheme to the other.  Basically we'll
reserve a special prefix in the new-style number space that you'd dial when
you want to connect to an old-style phone.  And then your new phone won't
need to support the old system, even if not everyone has transitioned yet! 
I mean, unless you want to receive incoming calls.
&lt;p&gt;
...
&lt;p&gt;
Or, you know.  We could just automate connecting through a PBX.
      </description>
    </item>
    <item>
      <title>blip: a tool for seeing your Internet latency</title>
      <pubDate>Fri, 26 Apr 2013 18:20:21 +0000</pubDate>
      <link>http://apenwarr.ca/log/?m=201304#26</link>
      <guid isPermaLink="true">http://apenwarr.ca/log/?m=201304#26</guid>
      <description>&lt;p&gt;
Why is your Internet slow?  It's probably not bandwidth.  Here's a graph of
*your* internet performance, right now:
&lt;p&gt;
&lt;iframe width=600 height=400 style=&quot;border:0&quot; src=&quot;http://gfblip.appspot.com/&quot;&gt;&lt;/iframe&gt;
&lt;p&gt;
(If you're reading this through RSS and your reader doesn't support iframes,
you can visit the app at &lt;a
href=&quot;http://gfblip.appspot.com/&quot;&gt;gfblip.appspot.com&lt;/a&gt;.  Also try it on
your phone or tablet.)
&lt;p&gt;
This real-time, latency-based measurement is way more accurate than
speedtest.net at predicting your real web browsing performance,
although the results may be a bit harder to interpret.
&lt;p&gt;
For more information, motivation, philosophy, and ranting, read the &lt;a
href=&quot;https://github.com/apenwarr/blip#readme&quot;&gt;README&lt;/a&gt;.
&lt;p&gt;
And it's open source.  Have a nice day.
      </description>
    </item>
    <item>
      <title>In which the White House outshines Canadian Politics</title>
      <pubDate>Mon, 14 Jan 2013 02:15:06 +0000</pubDate>
      <link>http://apenwarr.ca/log/?m=201301#13</link>
      <guid isPermaLink="true">http://apenwarr.ca/log/?m=201301#13</guid>
      <description>&lt;p&gt;
People who are aware of &lt;a href=&quot;http://apenwarr.ca/log/?m=201102#03&quot;&gt;my
political view template&lt;/a&gt; know that I try to follow a simple process, which is to
try to reject low-quality arguments that resort to rhetoric and personal
attacks. The result is I sometimes sound like I'm in favour of some policy
or motion that I actually disagree with (or vice versa) because I tend to
end up arguing about the presentation, and noting the complexity of the
problem, rather than just choosing a side and joining the fray.  Since I
complain that you're being stupid, you assume that I think the opposite
point of view is less stupid, but that's missing the point.
&lt;p&gt;
In short, I want to see politicians (and politically interested citizens)
raising the level of discourse.  Having written off American politics long
ago, I'm still disappointed when Canadians resort to &lt;a
href=&quot;http://apenwarr.ca/log/?m=201001&quot;&gt;meaningless sludge&lt;/a&gt; instead of
stopping to understand what's going on.
&lt;p&gt;
So imagine my surprise when I discovered an actual U.S. political web site
with actual facts and opinions and policy statements from an actual
political party, responding to questions from actual citizens in the hope of
raising the level of discourse.
&lt;p&gt;
The web site I'm referring to is the &lt;a
href=&quot;https://petitions.whitehouse.gov/petitions&quot;&gt;whitehouse.gov online
petition system&lt;/a&gt;.  In short, they promise to have some senior policymaker
respond to your petition, no matter how stupid, if you can get at least
25,000 people to online-sign it.  (25,000 is roughly 0.008% of the
population of the United States, so that seems reasonable to me to get the
attention of a high-level executive.)
&lt;p&gt;
Note what they promise: not that they'll change anything, or that the
President himself will read your message, or that the response will be
&lt;i&gt;useful&lt;/i&gt;.  Just that they'll respond, and the response will come from
some actual person that matters.  The content of the response, well, you'll
have to judge that for yourself.
&lt;p&gt;
(This reminds me of the rules for &lt;a
href=&quot;http://www.parl.gc.ca/MarleauMontpetit/DocumentViewer.aspx?Sec=Ch22&amp;Seq=3&amp;Language=E&quot;&gt;petitioning
the Government of Canada&lt;/a&gt;, except doing that only needs 25 signatures
instead of 25,000.  On the other hand, you're only guaranteed your petition
will be &lt;i&gt;read&lt;/i&gt; in parliament, and you probably won't get any response
at all, other than the hope they might be thinking about it.)
&lt;p&gt;
So, how does it turn out?  Well, I read through a few of the responses.
Apparently there are 96 existing responses, which seems like a good number
to me: it means the filter is blocking out the idiotic petitions (and oh
boy, idiotic ones exist) but not just silencing everybody (the total number
of responses is bigger than I want to read).  Moreover, they sometimes
combine multiple related petitions into one response (even if each one has
fewer than 25,000 votes) and sometimes respond to petitions with fewer than
25,000 even though they didn't promise to do so.  That tells me real people
are actually reading &lt;i&gt;all&lt;/i&gt; the petitions and looking for input, even
though they don't have to.  What's more, there are fewer than 40 petitions open
right now with more than 25,000 votes and no responses.  Since that's less
than half the total responses, that suggests to me that there's simply a
time delay to answer them (which I'd expect), not that they don't take it
seriously.  And I doubt they're just deleting petitions they don't like,
since anything that managed to get 25,000 signatures would obviously
generate a major internet fuss if the signees found it missing.
&lt;p&gt;
So yes, the 25,000 signature threshold works, the accountability works, the
promises are being kept, and there are actual answers up there.
&lt;p&gt;
Are the answers partisan?  Of course, they're written by a political party.
Are they all satisfying?  No, sometimes they just avoid the question and
don't bother to back up their claims, like the &lt;a
href=&quot;https://petitions.whitehouse.gov/response/response-we-people-petition-abolishment-transportation-security-administration&quot;&gt;Transportation
Security Administration&lt;/a&gt; one.  (On the other hand, the petition itself
wasn't so hot either.)
&lt;p&gt;
But what I &lt;i&gt;do&lt;/i&gt; see is a real effort to respond in a way that really
represents what the administration believes.  You might not like the TSA
response, but after reading it, you know exactly what their policy is about
it.  There are also things like the several &lt;a
href=&quot;https://petitions.whitehouse.gov/response/removing-bottlenecks-visa-process&quot;&gt;immigration
reform responses&lt;/a&gt; that are ultra-clear about the policy and beliefs -
while admitting that, well, you kinda came to the wrong place, because the
President isn't the one who sets the immigration policy.
&lt;p&gt;
Even the ones with a &quot;blame the republicans&quot; section, like the &lt;a
href=&quot;https://petitions.whitehouse.gov/response/doubling-and-tripling-what-we-can-accomplish-space&quot;&gt;NASA
funding response&lt;/a&gt;, do it pretty respectfully.  They say &quot;unfortunately,
not everyone is supportive&quot; and explain some problems of the alternative
policy, but they do it in a tone that encourages you to think about,
and maybe talk to, your representatives to see if you can change their minds.
They &lt;i&gt;don't&lt;/i&gt; start from the assumption that the alternative viewpoint
is idiotic and the only solution is to vote them the hell out.  I can
respect that.
&lt;p&gt;
Canada should have this (maybe with a different threshold).  The U.S. House
and Senate should have this, or at least the Democrats and the Republicans. 
You know what would be cool?  If every party, not just the one in power,
submitted a response to every petition that got 25,000 votes, to make their
position clear, and we could read them side by side and decide what we
believe.  And if they could refrain from personal attacks and stick to the
issues, like the current site does, and campaigns and TV debates generally
don't.
&lt;p&gt;
That would be progress.      </description>
    </item>
    <item>
      <title>3D Printing</title>
      <pubDate>Sat, 29 Dec 2012 11:29:01 +0000</pubDate>
      <link>http://apenwarr.ca/log/?m=201212#29</link>
      <guid isPermaLink="true">http://apenwarr.ca/log/?m=201212#29</guid>
      <description>&lt;p&gt;
My first 3D-printed creation (and my &lt;a href=&quot;http://www.3dtin.com/r12w&quot;&gt;3D
model&lt;/a&gt; that I printed it from).  The photo below is three printings of
the same design at different sizes:
&lt;p&gt;
&lt;img src=&quot;http://apenwarr.ca/diary/cars.jpg&quot;&gt;
&lt;p&gt;
The entire car prints, bottom to top, as a single run, and yes, those wheels
actually turn.  Each two wheel + axle combination is a single solid object,
and the frame between the two actually has closed loops around the axles. 
So we have the magical-seeming trick of passing the solid axles through the
loops - without needing any welding or gluing after the fact.  Print it,
unstick it, and roll it off the platform.
&lt;p&gt;
Playing with this has really connected a few physical-world concepts that
didn't click for me before.  For example, measurement tolerances are
absolutes, not percentages.  The printer I used is accurate to about 0.1mm. 
Previously, I had never cared about any distances less than a millimeter
(&quot;If it ain't on my ruler, it don't exist&quot;) but at this scale (the smallest
car is only 1cm tall), it matters.  To make the loops that wrap around the
axle so they don't blur into the axle itself, I had to leave quite a bit of
extra space.  This produced a tight fit for the smallest car, but when we
naively scale up the design linearly (the big one is about 3cm tall) that
excess space leaves the axles pretty floppy.  (Or we can call it 4-wheel
steering, and then it's a feature.)
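&lt;p&gt;
To make the absolute-versus-percentage point concrete, here's a small
JavaScript sketch.  The 0.4mm clearance is a made-up number for
illustration (I'm assuming it's a few multiples of the printer's 0.1mm
error); the point is only that scaling the whole model scales the gap
too, while the printer's error stays put:&lt;pre&gt;
  // Assumed numbers: the ~0.1mm printer error is from the text; the
  // 0.4mm clearance around each axle is hypothetical.
  const PRINTER_ERROR_MM = 0.1;
  const CLEARANCE_MM = 0.4;

  // Naive linear scaling multiplies every dimension, including the gap.
  function naiveGap(scale) {
    return CLEARANCE_MM * scale;
  }

  // Scaling the parts but not the gap keeps the fit equally tight.
  function fixedGap(_scale) {
    return CLEARANCE_MM;
  }

  // How many multiples of the printer's error a given gap amounts to.
  function gapInPrinterErrors(gapMm) {
    return gapMm / PRINTER_ERROR_MM;
  }

  console.log(gapInPrinterErrors(fixedGap(3)));  // about 4: still snug
  console.log(gapInPrinterErrors(naiveGap(3)));  // about 12: floppy
&lt;/pre&gt;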
&lt;p&gt;
The other scaling-related lesson comes back to that &lt;a
href=&quot;http://online.wsj.com/article/SB10001424052970204552304577112522982505222.html&quot;
&gt;old interview question&lt;/a&gt;: if somehow you were shrunk down to the size of
a coin and put inside a blender, how would you escape?
&lt;p&gt;
The answer is that you'd jump out.  Why?  Because your body's mass scales
down with n^3, but the strength in your muscles scales down much more
slowly - let's say n^2. Relative to your size, you'd have super strength. That's
why grasshoppers can easily leap to 100x their height, but nothing the size
of a human can do so.
&lt;p&gt;
For the same reason, when you drop my big car on the floor, a wheel tends to
break off.  When you drop the little one on the floor, it stays intact. 
Why?  Because the mass (and thus the force with which it hits the floor) has
scaled up by 3^3 = 27, but the strength of the joint, which depends on its
cross-sectional area, is only about 3^2 = 9x higher, so the joint takes about
3x as much stress.  That's enough to break the rather weak connection
between axle and wheel.
That could presumably be fixed by better engineering, but I'm not really
That Kind of Engineer.
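&lt;p&gt;
The same arithmetic as a tiny JavaScript function, following the n^3
(mass) and n^2 (strength) assumptions from the blender question above.
It's a back-of-the-envelope model, not real engineering:&lt;pre&gt;
  // Scale every length of an object by a factor n.  Mass, and so the
  // force of hitting the floor, grows with volume (n^3); the strength of
  // a joint grows with its cross-sectional area (n^2).
  function scaleEffects(n) {
    const mass = n ** 3;
    const strength = n ** 2;
    // Stress on the joint: force per unit of strength, which grows as n.
    return { mass, strength, stress: mass / strength };
  }

  console.log(scaleEffects(3));    // { mass: 27, strength: 9, stress: 3 }
  console.log(scaleEffects(0.1));  // stress about 0.1: ten times less strained
&lt;/pre&gt;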
&lt;p&gt;
In short, 3D printing is fun.  But really it just makes me that much more
impatient for nanobots.  Ooh, micron-scale accuracy and toy cars made of
diamonds!
      </description>
    </item>
    <item>
      <title>Programming inside the URL string</title>
      <pubDate>Sun, 16 Dec 2012 04:46:12 +0000</pubDate>
      <link>http://apenwarr.ca/log/?m=201212#18</link>
      <guid isPermaLink="true">http://apenwarr.ca/log/?m=201212#18</guid>
      <description>&lt;p&gt;
&lt;a href=&quot;http://afterquery.appspot.com/help&quot;&gt;Afterquery&lt;/a&gt; is hard to
explain to people, possibly because it actually combines several pretty
unusual concepts.  A single unusual concept is bad enough, but several
at once is likely to just leave you scratching your head.  With that in
mind, here's just one unusual concept: a programming language designed
for URLs.  Not a language for manipulating URLs; the language *is* the
URL.
&lt;p&gt;
In afterquery, if I write this:&lt;pre&gt;
  http://afterquery.appspot.com?url=example.json&amp;group=a,b,c&amp;group=a,b
&lt;/pre&gt;
&lt;p&gt;
Then (assuming a..d are strings and e is numeric) it's about
equivalent to this SQL:&lt;pre&gt;
  select a, b, count(c) as c, sum(d) as d, sum(e) as e
  from (
    select a, b, c, count(d) as d, sum(e) as e
    from example
    group by a, b, c
  )
  group by a, b
&lt;/pre&gt;
&lt;p&gt;
This gives, for each combination of a and b, the number of distinct
values of c, the total number of (non-null) d values, and the sum
of e - each a slightly different useful aggregate.
&lt;p&gt;
Some people have used SQL for so long now that they don't remember
anymore exactly how redundant the language is.  The above SQL mentions
'e' 4 times!  The afterquery code doesn't mention it at all; if you say
you want to group by a, b, and c, then the assumption is that e (one of
the columns in the initial dataset) is an aggregate value field and
doesn't need to be mentioned unless you want an unusual aggregate. 
(The default aggregate is count() for non-numeric fields, and sum() for
numeric fields.)
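&lt;p&gt;
Here's a minimal JavaScript sketch of a groupby with those default
aggregates.  It is not afterquery's actual code; the groupby name and the
array-of-row-objects grid format are assumptions for illustration.
Applying it twice reproduces the nested SQL query above:&lt;pre&gt;
  // A grid is an array of row objects, e.g. { a: 'x', b: 'y', e: 5 }.
  // groupby() keeps one row per distinct combination of the key columns
  // and aggregates every other column: sum() if its value in the group's
  // first row is a number, count() of non-null values otherwise.
  function groupby(grid, keys) {
    const groups = new Map();  // preserves first-seen order of groups
    for (const row of grid) {
      const id = JSON.stringify(keys.map(k => row[k]));
      if (!groups.has(id)) groups.set(id, []);
      groups.get(id).push(row);
    }
    const out = [];
    for (const rows of groups.values()) {
      const result = {};
      for (const col of Object.keys(rows[0])) {
        if (keys.includes(col)) {
          result[col] = rows[0][col];
        } else if (typeof rows[0][col] === 'number') {
          result[col] = rows.reduce((acc, r) => acc + r[col], 0);  // sum()
        } else {
          result[col] = rows.filter(r => r[col] != null).length;    // count()
        }
      }
      out.push(result);
    }
    return out;
  }

  const example = [
    { a: 'x', b: 'y', c: 'p', d: 'u', e: 1 },
    { a: 'x', b: 'y', c: 'p', d: 'v', e: 2 },
    { a: 'x', b: 'y', c: 'q', d: 'u', e: 3 },
  ];
  console.log(groupby(groupby(example, ['a', 'b', 'c']), ['a', 'b']));
  // [ { a: 'x', b: 'y', c: 2, d: 3, e: 6 } ]
&lt;/pre&gt;
&lt;p&gt;
Note that the second pass sums d, because the first pass turned it from
a string column into a numeric count, just as the outer SQL query does.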
&lt;p&gt;
The sequence of clauses in SQL is also problematic because it's both
arbitrary and restrictive.  It doesn't reflect the order of operations;
in reality, among other things, &quot;from&quot; obviously comes before &quot;select.&quot;
(Incidentally, LINQ in C# puts &quot;from&quot; first, for that reason.) Worse,
&quot;group by&quot; is highly related to the select operation - whether or not
something must be aggregated depends on whether it's in the &quot;group by&quot;
clause or not, and yet &quot;group by&quot; is way down in the query while
&quot;select&quot; is at the top.  And worst of all, the entire &quot;group by&quot; clause
is actually redundant: you could calculate the &quot;group by&quot;
clause entirely by looking at which fields in the select contain
aggregation functions and which don't.  You know this is true, because
if your select clause doesn't use aggregation functions for exactly the
right fields (no more, and no less), then your query will abort with an
error.
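&lt;p&gt;
To make the redundancy argument concrete, here's a toy JavaScript sketch
that derives the &quot;group by&quot; list from a select list.  It only checks
whether each expression's outermost call is an aggregate, and the list of
aggregate names is an assumed subset; real SQL has many more cases:&lt;pre&gt;
  // Assumed subset of SQL aggregate function names.
  const AGGREGATES = ['count', 'sum', 'avg', 'min', 'max'];

  // Every selected expression that isn't wrapped in an aggregate must
  // be a grouping key.
  function deriveGroupBy(selectList) {
    return selectList.filter(expr => {
      const m = /^\s*(\w+)\s*\(/.exec(expr);  // leading function name, if any
      const fn = m ? m[1].toLowerCase() : null;
      return !AGGREGATES.includes(fn);
    });
  }

  // The outer select list from the SQL example above:
  console.log(deriveGroupBy(['a', 'b', 'count(c) as c', 'sum(d) as d', 'sum(e) as e']));
  // [ 'a', 'b' ]
&lt;/pre&gt;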
&lt;p&gt;
There's a lot of danger in trying to make a programming language that reads
too much like English.  Maybe it can be done, but you have to be tasteful
about it.  SQL is not tasteful; it's designed with the same mindset that
produced COBOL.  (The joke is that an object oriented COBOL would be called
&quot;ADD 1 TO COBOL GIVING COBOL&quot;, which is the COBOL equivalent of C++ or [incr
tcl].  That line is the actual code for incrementing a number in COBOL. 
Compare with &quot;SELECT sql+1 FROM sql WHERE sql=0 GROUP BY sql&quot;.)
&lt;p&gt;
Still though, despite SQL's insulting repetitiveness, it has one even
more depressing &lt;i&gt;advantage:&lt;/i&gt; it's still the most concise way to
express a database query of moderate complexity.  C# LINQ is the
closest thing we have to a competitor - it was specifically designed to
try to replace SQL because coding database queries in an imperative
language is too messy - and our above nested grouping looks something
like this (based on their &lt;a
href=&quot;http://code.msdn.microsoft.com/101-LINQ-Samples-3fb9811b&quot;&gt;sample
code&lt;/a&gt; - I haven't used LINQ in a while and haven't tested it):&lt;pre&gt;
  var tmp =
    from x in example
    // first pass: one row per distinct (a, b, c)
    group x by new { x.a, x.b, x.c } into g
    select new {
      a = g.Key.a, b = g.Key.b, c = g.Key.c,
      d = g.Count(p =&gt; p.d != null),
      e = g.Sum(p =&gt; p.e)
    };
  var result =
    from r in tmp
    // second pass: one row per distinct (a, b)
    group r by new { r.a, r.b } into g
    select new {
      a = g.Key.a, b = g.Key.b,
      c = g.Count(p =&gt; p.c != null),
      d = g.Sum(p =&gt; p.d),
      e = g.Sum(p =&gt; p.e)
    };
&lt;/pre&gt;
&lt;p&gt;
That mentions 'e' 4 times, like the SQL does, but also introduces new
temporary variables x, g, r, and p.  g is itself a magical complex
object that includes a &quot;Key&quot; member we have to use, among other things. 
Now, LINQ is also much more powerful than SQL (you can use it for
things other than databases) and in many cases it can translate itself
into SQL (so it can query databases efficiently even though you wrote
it imperatively), so it has redeeming features.  But it definitely
hasn't shortened our basic SQL query.
&lt;p&gt;
There are also ORMs (Object-Relational Mappers) out there, like
ActiveRecord for example, which can be more concise than plain SQL. 
But they can't represent complicated concepts all the way through to
the database.  Generally you end up downloading either all the data and
filtering it client-side, or one record at a time, leading to high
latency, or splitting into two operations, one to get a list of keys,
and another to fetch a bunch of keys in parallel.  A proper &quot;query
language&quot; like SQL doesn't require that kind of hand optimizing.
&lt;p&gt;
Somehow SQL has held on, since 1974, as still the best way to do that
particular thing it does.  You've got to give them some credit for that.
&lt;p&gt;
&lt;b&gt;Imperative or not?&lt;/b&gt;
&lt;p&gt;
SQL is a programming language, although a restricted one.  In my mind,
I like to designate programming languages as one of three types:
imperative, functional, or declarative.  If you read the official
definitions, functional languages are technically a subset of
declarative ones, but I usually find those definitions to be more
misleading than useful.  HTML is a declarative language; LISP is a
functional language.  They're different, even if they do share
underlying mathematical concepts.
&lt;p&gt;
Other declarative non-functional languages include CSS, XML, XQuery,
JSON, regular expressions, Prolog, Excel formulas, and SQL.  We can
observe that declarative languages are a pretty weird bunch.  They tend
to share a few attributes: that it's hard to predict what the CPU will
actually do (so performance depends on external knowledge of how a
particular interpreter works), that expressing data works well and
expressing commands works badly (I'm talking to you, ANT and MSBuild),
and that explicit conditionals always look funny if they're supported
at all.
&lt;p&gt;
These problems are very clear in the case of SQL, after using it for
only a little while.  The hardest part of SQL to understand is its
interaction with the so-called &quot;query optimizer,&quot; which decides which
database indexes to use, and most importantly, when to use an index and
when to use a full table scan.  The person writing a SQL query will
generally have a really clear idea when a table scan is appropriate
(that is, usually for small amounts of data) and when it isn't, but
there's no way in SQL to express that; you just have to trust the
optimizer.  It'll generally work, but sometimes it'll go crazy, and
you'll have no idea what just happened to make your query go 100x
slower.  SQL suffers from a definition of &quot;correctness&quot; that doesn't
include performance.
&lt;p&gt;
Declarative (and functional) languages also share a major advantage
over imperative languages, which is that it's easy to manipulate and
rearrange the program without losing its meaning, exactly because the
implementation is left unspecified.  For example, you should be able to
convert an arbitrary SQL query to a map/reduce operation.  Declarative
and functional languages make things like parallelism easier to
implement and reason about.  (The &quot;map&quot; and &quot;reduce&quot; operations in
map/reduce come from functional programming, of course.)
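&lt;p&gt;
As a rough illustration of that last point, here's a single-process
JavaScript sketch of a group-and-sum done map/reduce style.  A real
map/reduce system would run the map calls and the per-key reduce calls
on many machines; the mapReduce function here is a hypothetical stand-in
that just does the same steps in order:&lt;pre&gt;
  // map: row -> list of [key, value] pairs.
  // shuffle: collect all values that share a key.
  // reduce: fold each key's values into a single result.
  function mapReduce(rows, mapFn, reduceFn) {
    const shuffled = new Map();
    for (const row of rows) {
      for (const [key, value] of mapFn(row)) {
        if (!shuffled.has(key)) shuffled.set(key, []);
        shuffled.get(key).push(value);
      }
    }
    const out = new Map();
    for (const [key, values] of shuffled) {
      out.set(key, values.reduce(reduceFn));
    }
    return out;
  }

  // Equivalent of: select a, sum(e) from example group by a
  const rows = [{ a: 'x', e: 1 }, { a: 'y', e: 2 }, { a: 'x', e: 3 }];
  const sums = mapReduce(rows, r => [[r.a, r.e]], (x, y) => x + y);
  console.log(sums);  // Map(2) { 'x' => 4, 'y' => 2 }
&lt;/pre&gt;
&lt;p&gt;
Each map call looks at one row and each reduce call looks at one key,
which is exactly what makes the work easy to split up.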
&lt;p&gt;
Let's look at afterquery again.  The above afterquery, which matches
the functionality of the above SQL query, can be broken down like
this:&lt;pre&gt;
  url=example.json
  group=a,b,c
  group=a,b
&lt;/pre&gt;
&lt;p&gt;
Is it imperative, declarative, or functional?  I've been thinking about
it for a couple of weeks, and I don't really know.  The fact that it
can be mapped directly to a (declarative) SQL query suggests its
declarative nature.  But having written the implementation, I also know
that how it works is very clearly imperative.  It ends up translating
the query to almost exactly this in javascript:&lt;pre&gt;
  grid = fetchurl('example.json');
  grid = groupby(grid, ['a', 'b', 'c']);
  grid = groupby(grid, ['a', 'b']);
&lt;/pre&gt;
&lt;p&gt;
That is, it's applying a series of changes to a single data grid
(global shared state - the only state) in memory.  You might notice
that all the above imperative commands, though, have the same
structure, so you could write the same thing functionally:&lt;pre&gt;
  groupby(
    groupby(
      fetchurl('example.json'),
      ['a', 'b', 'c']),
    ['a', 'b'])
&lt;/pre&gt;
&lt;p&gt;
Most imperative programs cannot be so easily mapped directly onto pure
functional notation.  So that leads me to my current theory, which is
that afterquery really is an imperative language, but it happens to be
one so weak and helpless that it can't express complex concepts (like
loops) that would make it incompatible with functional/declarative
interpretation.  It's not Turing-complete.
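&lt;p&gt;
Because every step has the same shape (grid in, grid out), the whole
program amounts to a left fold over the list of steps.  A minimal
JavaScript sketch; runSteps, select, and limit are hypothetical helpers,
not part of afterquery:&lt;pre&gt;
  // Run an ordered list of [stepFunction, argument] pairs.  Applying
  // them one at a time (the imperative reading) and nesting the calls
  // (the functional reading) give the same result.
  function runSteps(grid, steps) {
    return steps.reduce((g, [fn, arg]) => fn(g, arg), grid);
  }

  // Two toy grid-to-grid steps.
  const select = (grid, cols) =>
    grid.map(row => Object.fromEntries(cols.map(c => [c, row[c]])));
  const limit = (grid, n) => grid.slice(0, n);

  const grid = [{ a: 1, b: 2 }, { a: 3, b: 4 }];
  console.log(runSteps(grid, [[select, ['a']], [limit, 1]]));  // [ { a: 1 } ]
  console.log(limit(select(grid, ['a']), 1));                   // [ { a: 1 } ]
&lt;/pre&gt;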
&lt;p&gt;
Nevertheless, the imperative-&lt;i&gt;looking&lt;/i&gt; representation makes it easier
to write and debug queries, and to estimate their performance, than
declarative-looking SQL.  In theory, a sequence of groups and pivots
could be rearranged by a query optimizer to run faster or in parallel,
but in practice, afterquery's goal of working on small amounts of data
(unlike SQL, which is intended to run on large amounts) makes an
optimizer pretty unnecessary, so we can just execute the program as a
series of transformations in the order they're given.
&lt;p&gt;
&lt;b&gt;An imperative language in the URL string&lt;/b&gt;
&lt;p&gt;
Traditionally, URLs are about as declarative as things come.  At one
level, they are just opaque string parameters to one of a very few
functions (GET, POST, PUT, etc), some of which take an even bigger
opaque string (the POST data) as a second parameter.
&lt;p&gt;
One level deeper, we know that URL strings contain certain well-known
traditional components: protocol, hostname, path, query string
(?whatever), anchor string (#whatever).  Inside the query string (and
sometimes the anchor string), we have key=value pairs separated by
&amp;amp;, with special characters in the values traditionally encoded in a
particular way (%-encoding).  The URL syntax specification defines the
components and the %-encoding itself, but it doesn't say anything about the
structure of the query string: the key=value pairs and the &amp;amp; signs
are just a convention, popularized by HTML forms.
&lt;p&gt;
Afterquery uses the same key=value pairs as any query string, but while most
apps treat them as a declarative dictionary of essentially unordered
key=value pairs - with the only ordering being multiple instances of the
same key - afterquery also depends on the ordering of keys. 
&lt;tt&gt;&amp;amp;group=a,b,c&amp;amp;pivot=a,b;c&lt;/tt&gt; is a totally different command
from the other way around.
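&lt;p&gt;
The difference is easy to see in code.  A minimal sketch, using the
standard URLSearchParams class (built into browsers and Node.js), that keeps
the parameters as an ordered list of [key, value] steps instead of
collapsing them into a dictionary; the two helper functions are made up for
illustration:

```javascript
// The ampersand that separates key=value pairs, written as an escape.
const AMPERSAND = '\u0026';

// Hypothetical helper: build a query string from key=value parts.
function joinQuery(parts) {
  return parts.join(AMPERSAND);
}

// Parse a query string into an ordered list of [key, value] steps.
// URLSearchParams keeps parameters in their original order (repeated keys
// included) and undoes the %-encoding of each value.
function parseSteps(query) {
  return [...new URLSearchParams(query)];
}

const q1 = joinQuery(['group=a,b,c', 'pivot=a,b;c']);
const q2 = joinQuery(['pivot=a,b;c', 'group=a,b,c']);

console.log(JSON.stringify(parseSteps(q1))); // [["group","a,b,c"],["pivot","a,b;c"]]
console.log(JSON.stringify(parseSteps(q2))); // [["pivot","a,b;c"],["group","a,b,c"]]
```

An app that copies these pairs into a dictionary sees the same two keys
either way; an interpreter that walks the list runs them top to bottom.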
&lt;p&gt;
Another huge constraint on a URL is its length: there is no predefined
maximum length, but browsers and servers impose their own limits (Internet
Explorer, for example, stops at 2083 characters).
Thus, if you want to keep the program stateless (no state stored between
executions, and no programs stored on the server), it's important to keep
things concise, so you can say what you need to say inside a single URL.
&lt;p&gt;
Luckily, sometimes the best art comes from constraints!  Although Perl-style
punctuation-happy languages with lots of implicit arguments are nowadays
unfashionable, we have no choice but to adopt one anyway if we want things
to fit inside a URL.  Our example afterquery is actually equivalent to:&lt;pre&gt;
  url=http://afterquery.appspot.com/example.json
  group=a,b,c;count(d),sum(e)
  group=a,b;count(c),sum(d),sum(e)
&lt;/pre&gt;
&lt;p&gt;
The URL has a default path based on the script location (as URLs always
do), and the semicolon in group= statements separates the key columns from
the aggregated values.  That leads to a distinction, very confusing at
first but very convenient thereafter, between&lt;pre&gt;
  group=a,b,c
&lt;/pre&gt;
&lt;p&gt;
and&lt;pre&gt;
  group=a,b,c;
&lt;/pre&gt;
&lt;p&gt;
These mean very different things: the first means &quot;keep all the other
columns, and guess how to aggregate their values,&quot; while the second
means &quot;throw away all the other columns.&quot;
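&lt;p&gt;
Here is one way that parsing rule might look.  This is a hypothetical
sketch, not afterquery's actual parser: everything before the semicolon is
the list of key columns, and everything after it is the list of aggregate
expressions.  The detail that matters is that a missing semicolon (null) and
an empty list after the semicolon are different results:

```javascript
// Hypothetical parser for the value of a group= statement.
// aggregates is null when there was no semicolon ('keep the other columns
// and guess') and an array otherwise, possibly empty ('throw them away').
// Naive: assumes no commas inside an aggregate's parentheses.
function parseGroup(value) {
  const splitList = (s) => (s === '' ? [] : s.split(','));
  const semi = value.indexOf(';');
  if (semi === -1) {
    return { keys: splitList(value), aggregates: null };
  }
  return {
    keys: splitList(value.slice(0, semi)),
    aggregates: splitList(value.slice(semi + 1)),
  };
}

console.log(JSON.stringify(parseGroup('a,b,c')));
// {"keys":["a","b","c"],"aggregates":null}
console.log(JSON.stringify(parseGroup('a,b,c;')));
// {"keys":["a","b","c"],"aggregates":[]}
console.log(JSON.stringify(parseGroup('a,b;count(c),sum(d)')));
// {"keys":["a","b"],"aggregates":["count(c)","sum(d)"]}
```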
&lt;p&gt;
Like a regex, it's totally unfriendly to an initial observer, whereas an
SQL statement (or COBOL program) might make a beginner feel comfortable
that they can read what's going on.  But my theory is: once you've got
the idea, SQL is just tedious, but afterquery is more fun.
&lt;p&gt;
And unlike SQL, a nontrivial program will fit in a URL string.
      </description>
    </item>
    <item>
      <title>Let's think about *small* data for a change</title>
      <pubDate>Tue, 11 Dec 2012 01:54:25 +0000</pubDate>
      <link>http://apenwarr.ca/log/?m=201212#11</link>
      <guid isPermaLink="true">http://apenwarr.ca/log/?m=201212#11</guid>
      <description>&lt;p&gt;
There's a lot of buzz lately about &quot;big data&quot; - huge, Internet scale
databases that take a long time to query, but are awesome, I guess,
because of how much testosterone you need to have in order to possess
one.
&lt;p&gt;
Let's leave aside the question of exactly how big big data needs to be. 
I've heard of people talking about databases as small as a gigabyte as
being &quot;big,&quot; I guess because downloading it would take a few minutes,
and maybe 'grep' isn't the best way to query it.  Other people would
say a terabyte is big, or a petabyte.
&lt;p&gt;
I don't really care.  All I know is, beyond a certain threshold that
depends on the current state of computer technology, as soon as your
data is &quot;big,&quot; queries are slow.  Maybe a few seconds, which isn't so
bad, but maybe a few *minutes*.  And when things get slow, you start
having to set up separate &quot;data warehouse&quot; servers so that
your big analytical queries don't bring down your whole database.
And managing it all becomes a full-time job for someone, or many
people, or a whole company.
&lt;p&gt;
I happen to work for an employer that does that sort of thing a lot. 
And to be honest, I find it pretty boring.  Perhaps I have a
testosterone deficiency.  It's not so much the bigness that bothers me:
it's the waiting.  I like my compile-test-debug cycle to be on the
order of two seconds, but when SQL or mapreduce gets involved, it's more
like two minutes.&lt;sup&gt;1&lt;/sup&gt;
&lt;p&gt;
I know, cry for me, right?  Two minutes of my life, all gone.  But
seriously, when you're trying to find trends and aren't quite sure what
you're looking for and it takes a dozen tries, those two minutes can add
up rapidly, especially when added to the 10 minutes of web browsing or
email that inevitably ensues once I get bored waiting for the two
minutes.
&lt;p&gt;
After worrying about this problem for a long time (years now, I guess),
I think I've come up with a decent workaround.  The trick is to divide
your queries into multiple stages.  At each stage, we reduce the total
amount of data by a few orders of magnitude, and thus greatly decrease
the cost of debugging a complex query.
&lt;p&gt;
Originally, I might have tried to make a single SQL query that goes
from, say, a terabyte of data down to 10 rows, and then make a bar chart.
Which 10 rows?  Well, it takes a few tries to figure that out, or maybe
a few hundred tries.  But it's much faster if we set the initial goal
instead to, say, only pull out the gigabyte I need from that terabyte. 
And then I can query a gigabyte instead.  From there, I can reduce it
to a megabyte, perhaps, which is easy enough to process in RAM without
any kind of index or complexity or optimization.
&lt;p&gt;
That last part is what I want to talk to you about, because I've been
working on a tool to do it.  I call it afterquery, tagline: &quot;The real
fun starts after the Serious Analysts have gone home.&quot;  Serious people
write mapreduces and SQL queries.  Serious people hire statisticians. 
Serious people have so much data that asking questions about the data
requires a coffee break, but they get paid so much they don't have to
care.
&lt;p&gt;
Afterquery is the opposite of Serious.  It downloads the whole dataset
into RAM on your web browser and processes it in javascript.
&lt;p&gt;
But what it lacks in seriousness, it makes up for in quick turnaround. 
Here's what it produces from 1582 rows of data I got from Statistics
Canada:&lt;sup&gt;2&lt;/sup&gt;
&lt;p&gt;
&lt;iframe style=&quot;width:600px; height:300px; border:0px;&quot;
src=&quot;http://afterquery.appspot.com/?url=http://apenwarr.ca/diary/canada-naics.json&amp;filter=Value!=x&amp;filter=Value%3E0&amp;filter=NAICS2%3E&amp;group=NAICS1;sum(Value)&amp;order=-Value&amp;chart=pie,title=Canada%202011%20GDP&quot;
&gt;&lt;/iframe&gt;
&lt;p&gt;
Those 1582 rows are the breakdown of Canada's 2011 GDP by province and
&lt;a href=&quot;http://www.census.gov/eos/www/naics/&quot;&gt;NAICS&lt;/a&gt; (industry
type) code.  I used Afterquery to sum up the values across all
provinces and make a pie chart by NAICS category group.  Could I have
just queried Statscan and gotten those summarized results in, say, 15
rows instead of 1582?  Sure.  But their web site is a bit clunky, and I
have to redownload a CSV file every time, and it doesn't produce pretty
graphs.  1582 rows is no problem to grunge through in javascript
nowadays.  It even works fine on my phone.
&lt;p&gt;
The real power, though, is that you can take the same data and
present it in many ways.  Here's another chart using exactly the same
input data:
&lt;p&gt;
&lt;iframe style=&quot;width:600px; height:300px; border:0px;&quot;
src=&quot;http://afterquery.appspot.com/?url=http://apenwarr.ca/diary/canada-naics.json&amp;filter=Value!=x&amp;filter=Value%3E0&amp;filter=NAICS2%3E&amp;filter=NAICS1=Health%20care%20and%20social%20assistance&amp;pivot=GEO;NAICS2;sum(Value)&amp;chart=column,title=Healthcare%20Spending&quot;
&gt;&lt;/iframe&gt;
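&lt;p&gt;
That chart uses pivot=GEO;NAICS2;sum(Value): each distinct NAICS2 value
becomes its own column, with one row per GEO.  A hypothetical, simplified
sketch of the operation, assuming a single row key, a single column key,
and a sum (the sample rows are made up, not StatsCan's numbers):

```javascript
// Simplified pivot: one row per distinct rowKey value, one column per distinct
// colKey value, cells holding the sum of valueKey (0 where there was no data).
function pivot(rows, rowKey, colKey, valueKey) {
  const colNames = [];
  const table = new Map(); // row key value -> { column name -> running sum }
  for (const r of rows) {
    if (!colNames.includes(r[colKey])) colNames.push(r[colKey]);
    if (!table.has(r[rowKey])) table.set(r[rowKey], {});
    const cells = table.get(r[rowKey]);
    cells[r[colKey]] = (cells[r[colKey]] || 0) + r[valueKey];
  }
  const header = [rowKey, ...colNames];
  const body = [...table].map(([key, cells]) => [key, ...colNames.map((c) => cells[c] || 0)]);
  return [header, ...body];
}

// Made-up sample rows in the same shape as the input data.
const rows = [
  { GEO: 'Ontario', NAICS2: 'Hospitals', Value: 10 },
  { GEO: 'Ontario', NAICS2: 'Social assistance', Value: 4 },
  { GEO: 'Quebec', NAICS2: 'Hospitals', Value: 7 },
  { GEO: 'Ontario', NAICS2: 'Hospitals', Value: 1 },
];

console.log(JSON.stringify(pivot(rows, 'GEO', 'NAICS2', 'Value')));
// [["GEO","Hospitals","Social assistance"],["Ontario",11,4],["Quebec",7,0]]
```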
&lt;p&gt;
And here's a treemap chart.  Treemaps are generally pretty hard to
produce, because of the weird way you have to format the data.  But
afterquery helps reformat &quot;normal&quot; data into helpful treemap-compatible
data.  I really like this visualization; see if you can find who spends
the most on mining, or who spends proportionally more on agriculture.
&lt;p&gt;
&lt;iframe style=&quot;width:800px; height:500px; border:0px;&quot;
src=&quot;http://afterquery.appspot.com/?url=http://apenwarr.ca/diary/canada-naics.json&amp;filter=Value!=x&amp;filter=Value%3E0&amp;filter=NAICS2%3E&amp;treegroup=GEO,NAICS1,NAICS2;sum(Value),color(NAICS1)&amp;chart=tree,maxDepth=2,maxPostDepth=0&quot;
&gt;&lt;/iframe&gt;
&lt;p&gt;
(Tip: click on one of the boxes in the treemap for a more detailed
view.  You can use that to see the specific *kinds* of mining.
Or click 'Edit' in the top-right of the chart box to customize the
chart.  For example, remove &quot;GEO,&quot; from the treegroup to see overall
results for Canada rather than by province.)
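&lt;p&gt;
For gviz treemaps, the awkward format is a list of (node, parent, size)
rows in which every node, including each intermediate group and a single
root, appears exactly once.  A hypothetical sketch of what a
treegroup-style transform has to do; it is not afterquery's code, and the
root name and the path-style node ids are assumptions:

```javascript
// Convert flat rows into [id, parentId, size] rows for a treemap.
// Each intermediate node's size is the sum of the rows beneath it.
function treeRows(rows, levels, valueKey, rootName) {
  const sizes = new Map([[rootName, 0]]); // node id -> total size
  const parents = new Map([[rootName, null]]); // node id -> parent id
  for (const r of rows) {
    const v = r[valueKey];
    sizes.set(rootName, sizes.get(rootName) + v);
    let parent = rootName;
    for (const level of levels) {
      // Ids are paths, so 'Mining' under two provinces stays two nodes.
      const id = parent === rootName ? String(r[level]) : parent + '/' + r[level];
      if (!sizes.has(id)) {
        sizes.set(id, 0);
        parents.set(id, parent);
      }
      sizes.set(id, sizes.get(id) + v);
      parent = id;
    }
  }
  return [...sizes].map(([id, size]) => [id, parents.get(id), size]);
}

// Made-up sample rows.
const data = [
  { GEO: 'Alberta', NAICS1: 'Mining', Value: 5 },
  { GEO: 'Alberta', NAICS1: 'Farming', Value: 2 },
  { GEO: 'Ontario', NAICS1: 'Mining', Value: 1 },
];
for (const row of treeRows(data, ['GEO', 'NAICS1'], 'Value', 'Canada')) {
  console.log(JSON.stringify(row));
}
// ["Canada",null,8]
// ["Alberta","Canada",7]
// ["Alberta/Mining","Alberta",5]
// ["Alberta/Farming","Alberta",2]
// ["Ontario","Canada",1]
// ["Ontario/Mining","Ontario",1]
```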
&lt;p&gt;
&lt;b&gt;Some neat things about afterquery&lt;/b&gt;
&lt;p&gt;
&lt;ul&gt;
&lt;p&gt;
&lt;li&gt;It's completely stateless, configured entirely using URL parameters.
&lt;p&gt;
&lt;li&gt;It just grabs a file using jsonp and applies the given transforms.
&lt;p&gt;
&lt;li&gt;It can show either data tables or charts, including &lt;a href=&quot;https://developers.google.com/chart/interactive/docs/gallery&quot;&gt;gviz charts&lt;/a&gt;, timelines, and &lt;a href=&quot;http://dygraphs.com/&quot;&gt;dygraphs&lt;/a&gt;.
&lt;p&gt;
&lt;li&gt;You can do multiple layers of &quot;group by&quot; operations very quickly. 
How many times have you wanted to, say, produce two columns: the total
number of purchases, and the number of customers making purchases?  To
do that in SQL you'd use subqueries, perhaps, like this:&lt;pre&gt;
   select region, count(customer), sum(nsales), sum(value)
     from (
       select region, customer, count(*) as nsales, sum(value) as value
         from sales
         group by region, customer
     ) as per_customer
     group by region
&lt;/pre&gt;
&lt;p&gt;
With afterquery you just write two successive &quot;group&quot; transformations:&lt;pre&gt;
   &amp;amp;group=region,customer;count(*),sum(value)
   &amp;amp;group=region
&lt;/pre&gt;
and it does the rest.
&lt;p&gt;
&lt;li&gt;Anything that can be &quot;group by&quot; can be treegrouped, allowing you to
easily draw a treemap chart.
&lt;p&gt;
&lt;li&gt;Because we don't care about performance optimization - optimization
is for Serious Analysts!  - we can make tradeoffs for consistency
instead.  So you can apply transformations, filters, limits, and
groupings in whatever order you want (instead of SQL's really strict
sequencing requirements).  And 'order by' operations aren't scrambled
by doing 'group by', and we can have handy aggregate functions like
first(x) (the first matching x in each group), last(x) (last matching x
in each group), and cat(x) (concatenation of all the x in each group). 
You can imagine that 'order by x' followed by 'first(x)' could be a
very useful combination.
&lt;p&gt;
&lt;li&gt;A simple &quot;&lt;a
href=&quot;http://en.wikipedia.org/wiki/Pivot_table&quot;&gt;pivot&lt;/a&gt;&quot;
operation that lets you turn multi-level groups
into column headers, allowing you to easily produce interesting
multi-line or multi-bar charts, or seriously dense (but useful) grids
of numbers.
&lt;p&gt;
&lt;li&gt;Because it's stateless, every afterquery chart is user-editable; just
click the 'Edit' link in the upper-right of any chart or table, adjust
the transformations, and see the new result immediately.  Then
cut-and-paste the URL to someone in IM or email, or drop it in an
iframe on a web page, and you're done.
&lt;p&gt;
&lt;li&gt;Because it runs in your web browser and uses jsonp, it uses your
existing browser login credentials and cookies to get access to your
data.&lt;sup&gt;3&lt;/sup&gt;  And because it's purely client-side javascript, your data never
gets uploaded to a server.
&lt;/ul&gt;
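&lt;p&gt;
Here is what the two successive groupings compute, as a plain javascript
sketch over a made-up sales table.  The column names follow the SQL example
above; the group() helper is hypothetical, and the second step's aggregates
(count the customers, sum the per-customer totals) are spelled out
explicitly, whereas afterquery guesses them:

```javascript
// Group rows by the key columns, then compute each named aggregate.
// aggs maps an output column name to a function of the group's rows.
function group(rows, keys, aggs) {
  const groups = new Map();
  for (const row of rows) {
    const id = JSON.stringify(keys.map((k) => row[k]));
    if (!groups.has(id)) groups.set(id, []);
    groups.get(id).push(row);
  }
  return [...groups.values()].map((members) => {
    const out = {};
    for (const k of keys) out[k] = members[0][k];
    for (const [name, fn] of Object.entries(aggs)) out[name] = fn(members);
    return out;
  });
}

const sum = (col) => (rs) => rs.reduce((t, r) => t + r[col], 0);
const count = () => (rs) => rs.length;

const sales = [
  { region: 'east', customer: 'ann', value: 10 },
  { region: 'east', customer: 'ann', value: 5 },
  { region: 'east', customer: 'bob', value: 20 },
  { region: 'west', customer: 'cy', value: 7 },
];

// Step 1: like group=region,customer;count(*),sum(value)
const perCustomer = group(sales, ['region', 'customer'], { nsales: count(), value: sum('value') });
// Step 2: like group=region, with the aggregates written out
const perRegion = group(perCustomer, ['region'], {
  customers: count(),
  nsales: sum('nsales'),
  value: sum('value'),
});

console.log(JSON.stringify(perRegion));
// [{"region":"east","customers":2,"nsales":3,"value":35},{"region":"west","customers":1,"nsales":1,"value":7}]
```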
&lt;p&gt;
Are we having fun yet?  You can find afterquery &lt;a
href=&quot;https://github.com/apenwarr/afterquery&quot;&gt;on github&lt;/a&gt; or &lt;a
href=&quot;http://code.google.com/p/afterquery&quot;&gt;on Google Code&lt;/a&gt;.  For
documentation, look at its &lt;a
href=&quot;http://afterquery.appspot.com/help&quot;&gt;built-in help page&lt;/a&gt;.
&lt;p&gt;
Contributions or suggestions are welcome.  You can join the brand new
mailing list: afterquery on googlegroups.com.
&lt;p&gt;
&lt;b&gt;Footnotes&lt;/b&gt;
&lt;p&gt;
&lt;sup&gt;1&lt;/sup&gt; Of course if you're really processing a lot of data, a
data warehouse query can be much more than two minutes, but generally
it's possible to find a subset to debug your query with that takes less
time.  Once the debugging is done, you can just run the big query and
it doesn't really matter how long it takes.
&lt;p&gt;
&lt;sup&gt;2&lt;/sup&gt; Those 1582 rows are the output of Serious Analysts down at
Statistics Canada, who have processed all the numbers from all the tax
returns from all the companies in Canada, every year for decades. 
That's Big Data, all right.
&lt;p&gt;
&lt;sup&gt;3&lt;/sup&gt; Of course, using jsonp creates security concerns of its own.  If
I have time, I plan to write about those in another post because I believe
it's possible to work around those problems and have the best of all
worlds... but it's a little tricky.  Meanwhile, grep the afterquery source
for &quot;oauth&quot; :)
      </description>
    </item>
    <item>
      <title>Preview: Chicken Inflation</title>
      <pubDate>Mon, 10 Dec 2012 12:04:12 +0000</pubDate>
      <link>http://apenwarr.ca/log/?m=201212#10</link>
      <guid isPermaLink="true">http://apenwarr.ca/log/?m=201212#10</guid>
      <description>&lt;p&gt;
&lt;iframe style=&quot;width:500px; height:300px; border:0px;&quot;
 src=&quot;http://afterquery.appspot.com/?url=http://apenwarr.ca/diary/canada-cpi.json&amp;filter=Ref_Date%3E=1950-01&amp;filter=GEO=Canada&amp;filter=COMM=All-items%20CPI,Fresh%20or%20frozen%20chicken,Other%20dairy%20products&amp;editlink=0&amp;pivot=Ref_Date;COMM;Value&amp;order=Ref_Date&amp;chart=line&quot;
 &gt;&lt;/iframe&gt;
&lt;p&gt;
&lt;sup&gt;*&lt;/sup&gt; Source: &lt;a
href=&quot;http://www.statcan.gc.ca/tables-tableaux/sum-som/l01/cst01/econ09a-eng.htm&quot;&gt;Statistics
Canada&lt;/a&gt;
      </description>
    </item>
    <item>
      <title>Content-Centric Networking (CCN) as an alternative to IP</title>
      <pubDate>Wed, 14 Nov 2012 06:30:25 +0000</pubDate>
      <link>http://apenwarr.ca/log/?m=201211#11</link>
      <guid isPermaLink="true">http://apenwarr.ca/log/?m=201211#11</guid>
      <description>&lt;p&gt;
I've been meaning to read the &lt;a
href=&quot;http://conferences.sigcomm.org/co-next/2009/papers/Jacobson.pdf&quot;&gt;CCN
article from Van Jacobson and friends&lt;/a&gt; for months now, but I finally got
around to it on this lazy Sunday.  It starts mostly as a synthesis of other
research in the area of content-named delivery, but has a few major
innovations that make it very interesting.  In particular, it is designed
around the same core philosophy as IP: mainly statelessness, automatic
congestion control, and routing tables based on prefixes (though in this
case the prefixes are names rather than addresses).  The routing tables go
so far that they even suggest using (unmodified!) OSPF for discovering
topology.  (I'm not convinced that part would actually scale, but maybe.)
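&lt;p&gt;
Prefix routing on names works like longest-prefix match on addresses,
except that the unit of comparison is a whole name component rather than a
bit.  A minimal sketch, assuming slash-separated names and a FIB held as a
plain Map from prefix to one outgoing interface (the paper's FIB can list
several faces per prefix; the names and interface labels here are made up):

```javascript
// Split '/ca/apenwarr/log/2012' into ['ca', 'apenwarr', 'log', '2012'].
function components(name) {
  return name.split('/').filter((c) => c !== '');
}

// Longest-prefix match by name component: try the full name, then drop one
// component at a time until some prefix (possibly '/') is in the table.
function lookup(fib, name) {
  const parts = components(name);
  for (let n = parts.length; n >= 0; n--) {
    const prefix = '/' + parts.slice(0, n).join('/');
    if (fib.has(prefix)) return fib.get(prefix);
  }
  return null;
}

const fib = new Map([
  ['/', 'default-uplink'],
  ['/ca/apenwarr', 'wifi0'],
  ['/ca/apenwarr/videos', 'eth0'],
]);

console.log(lookup(fib, '/ca/apenwarr/videos/cat.mp4')); // eth0
console.log(lookup(fib, '/ca/apenwarr/log/2012')); // wifi0
console.log(lookup(fib, '/ca/apenwarrX/log')); // default-uplink
```

The last lookup shows why matching happens per component: the string
'/ca/apenwarrX' starts with '/ca/apenwarr', but it is a different name.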
&lt;p&gt;
The basic architecture is that a client sends out an &quot;Interest&quot; packet to
register its interest in a particular named bit of data.  Your (content)
router receives that and forwards it to one or more onward routers, based on
its (content) routing table (aka &quot;Forwarding Information Base&quot;), and so on,
recursively.  Finally the Interest reaches the end provider of the data, who
sends back a response.  So far, so simple.  But the key innovation is what
happens when packets are dropped or duplicated.  If you get the same
Interest from more than one source, you only forward it the first time; if
you get the Data response more than once, you only forward it once (well, to
each existing Interest) and then throw away the Interest.
&lt;p&gt;
That sounds simple for the same reason that IP itself is deceptively simple. 
But as with IP, the end result of this simplicity seems like magic.  First
of all, it eliminates the problem of &quot;routing loops&quot;; if router X is
configured to send Interests upstream to Y and Z, and it receives the same
Interest from both A and B, then it will only upstream that Interest exactly
once (to Y, or Z, or both, depending on how its routing table is set up).  So
in that example, if A, B, Y, and Z are actually peers, you don't have to
worry about Y and Z looping Interests back to A and B.  That is, they *will*
forward to A and B, but since A and B have already seen that Interest, the
buck stops there.
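&lt;p&gt;
The bookkeeping behind both rules is small.  Here is a minimal sketch of
one router's table of pending Interests, assuming each Interest carries a
name and a random nonce (the paper's Interests do); the class and method
names are made up, choosing the upstream interfaces is left to the caller,
and entry lifetimes are reduced to an explicit expire() call:

```javascript
// Toy table of pending Interests for a single content router.
class PendingInterests {
  constructor() {
    this.pending = new Map(); // name -> Set of faces waiting for the Data
    this.seenNonces = new Set(); // 'name|nonce' already handled (never pruned here)
  }

  // Returns true if this Interest should be forwarded upstream.
  onInterest(name, nonce, face) {
    const key = name + '|' + nonce;
    if (this.seenNonces.has(key)) return false; // looped back to us: drop it
    this.seenNonces.add(key);
    if (this.pending.has(name)) {
      this.pending.get(name).add(face); // already asked upstream; just remember
      return false;
    }
    this.pending.set(name, new Set([face]));
    return true;
  }

  // Returns the faces to send the Data to, and forgets the Interest.
  // Unsolicited or duplicate Data gets an empty list, i.e. it is dropped.
  onData(name) {
    const faces = this.pending.get(name);
    this.pending.delete(name);
    return faces ? [...faces] : [];
  }

  // Called when a pending entry's lifetime runs out (timer not shown).
  // A later retransmission, with a fresh nonce, will then go upstream again.
  expire(name) {
    this.pending.delete(name);
  }
}

const pit = new PendingInterests();
console.log(pit.onInterest('/video/1', 111, 'A')); // true: first request, forward it
console.log(pit.onInterest('/video/1', 222, 'B')); // false: aggregated with A's
console.log(pit.onInterest('/video/1', 111, 'Y')); // false: A's Interest looped back
console.log(pit.onData('/video/1')); // [ 'A', 'B' ]
console.log(pit.onData('/video/1')); // []
```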
&lt;p&gt;
Secondly, if nobody on the whole network provides a given bit of data, it's
no big deal; nobody will answer it.  Routers will keep the Interest in their
tables for a while (so they know who to forward the Data response back to if
it shows up), but they won't send a response, because there is no response. 
As in IP, you can't automatically know the difference between &quot;nobody is
there&quot; and &quot;packet got dropped.&quot;
&lt;p&gt;
This, in turn, is how flow control / congestion control works.  Routers
don't ever resend Interest packets on their own; that's the job of the
client endpoint.  (Retransmits include a &quot;nonce&quot; - just a random number - to
ensure that they don't get eaten by the anti-routing-loop logic above.) So
as with IP, an overloaded router can simply drop excess Interest packets to
slow down network activity.  And just like with TCP, a client endpoint can
create a &quot;sliding window&quot; implemented by having multiple outstanding
Interest packets at once.  How many at a time?  That's your window size, and
it depends on observed retransmit characteristics.
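&lt;p&gt;
A client-side window might be sketched as below.  The paper leaves the
window policy open, so the additive-increase, multiplicative-decrease rule
here is borrowed from TCP as an assumption, and the send callback and
timeout wiring are hypothetical.  Every send or resend gets a fresh random
nonce so the routers' loop check won't discard a retransmission:

```javascript
// Toy Interest window: at most floor(window) Interests outstanding at once.
class InterestWindow {
  constructor(names, send) {
    this.queue = [...names]; // names not yet requested
    this.outstanding = new Map(); // name -> nonce of the latest send
    this.window = 1;
    this.send = send; // send(name, nonce): hands the Interest to the network
  }

  newNonce() {
    return Math.floor(Math.random() * 2 ** 32);
  }

  // Send queued Interests until the window is full.
  fill() {
    while (this.queue.length > 0) {
      if (this.outstanding.size >= Math.floor(this.window)) break;
      const name = this.queue.shift();
      const nonce = this.newNonce();
      this.outstanding.set(name, nonce);
      this.send(name, nonce);
    }
  }

  // Data arrived: additive increase (roughly +1 per window's worth of Data).
  onData(name) {
    if (!this.outstanding.delete(name)) return; // duplicate or unsolicited
    this.window += 1 / this.window;
    this.fill();
  }

  // No Data in time: multiplicative decrease, then resend with a new nonce.
  onTimeout(name) {
    if (!this.outstanding.has(name)) return;
    this.window = Math.max(1, this.window / 2);
    const nonce = this.newNonce();
    this.outstanding.set(name, nonce);
    this.send(name, nonce);
  }
}

const sent = [];
const w = new InterestWindow(['/v/0', '/v/1', '/v/2'], (name, nonce) => sent.push([name, nonce]));
w.fill(); // window is 1, so only '/v/0' goes out
w.onData('/v/0'); // window grows to 2, so '/v/1' and '/v/2' go out
console.log(sent.map(([name]) => name)); // [ '/v/0', '/v/1', '/v/2' ]
```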
&lt;p&gt;
Apparently it's actually better than TCP's retransmit characteristics in the
sense that the flow control is on a link-by-link basis instead of
end-to-end.  I don't really know how or why that's better; there's a note in
the doc that says &quot;We will cover this topic in detail in a future paper,&quot;
which I hope is not equivalent to a &quot;remarkable proof of this theorem which
this margin is too small to contain.&quot;  I haven't looked - maybe they
published it already.  For now, I'll take their word for it.
&lt;p&gt;
The design also mostly-transparently deals with support for multiple network
interfaces - like a wired, wifi, and 3G network all at once - by allowing
you to just forward all your Interest packets on all the interfaces if you
want.  Then, using techniques similar to ethernet bridging, you adjust your
routing table priorities based on latency/speed/loss of responses received.
You occasionally do an experiment by requesting an answer from a supposedly
non-optimal interface just in case things have changed; if you don't get an
answer back on the supposedly optimal interface, maybe it's dead, so you can
fail over right away.
&lt;p&gt;
Now, we all know failover is nice for content-named networks (along with
caching, that's basically the whole point).  But this is where things get
really fun.  You can carry a content-named network over IP network sessions;
in fact, that's how their prototype was built.  But what if you could carry
IP network sessions over a content-named network?
&lt;p&gt;
Well, according to &lt;a
href=&quot;http://www.parc.com/content/attachments/voccn-voice-over-content.pdf&quot;&gt;a
later paper by the same team about VoCCN (Voice Over CCN)&lt;/a&gt;, you can!  And
the way it works falls out naturally from the design.  You just send out a
*window* of Interest packets for stuff that doesn't yet exist.  Initially,
you get no response.  But the unanswered Interest packets are remembered by
the router nodes for a short time, so as the live data is created by the
server endpoint, it just responds to the outstanding Interests and the data
gets distributed back to the original client(s).
&lt;p&gt;
The reverse path, data sent from the client to the server, is encoded by
registering Interests in which the last segment of the path is the actual
data you want to send.  The server can then send a small Data
(acknowledgement) packet to state that it was received and doesn't need to
be retransmitted.
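&lt;p&gt;
In terms of names, a call might look something like the sketch below.  The
layout (/call/ID/down/SEQ and /call/ID/up/SEQ/PAYLOAD) is invented for
illustration and is not taken from the VoCCN paper; the point is only that
downstream Interests name sequence numbers that don't exist yet, while
upstream Interests carry their payload in the last name component:

```javascript
// Interests for the next `count` downstream packets, which may not exist yet.
function downstreamInterests(callId, nextSeq, count) {
  return Array.from({ length: count }, (_, i) => `/call/${callId}/down/${nextSeq + i}`);
}

// An upstream packet: the payload rides in the last name component.
// encodeURIComponent escapes '/', so the payload can't split the name.
function upstreamInterest(callId, seq, payload) {
  return `/call/${callId}/up/${seq}/${encodeURIComponent(payload)}`;
}

// The server side: recover the payload from the name.
function decodeUpstream(name) {
  return decodeURIComponent(name.split('/').pop());
}

console.log(downstreamInterests('42', 100, 3).join(' '));
// /call/42/down/100 /call/42/down/101 /call/42/down/102
const up = upstreamInterest('42', 7, 'hello/world');
console.log(up); // /call/42/up/7/hello%2Fworld
console.log(decodeUpstream(up)); // hello/world
```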
&lt;p&gt;
They point out that one neat side effect of all this is if your client is
multi-homed and one of its network links drops out or moves - say if you
move from one wifi network to another, or from wifi to wired - then you can
still recover, because the intermediate content routers have actually cached
the data you would have lost by hopping networks, and you don't have to
&quot;reconnect&quot; because your connection was always about the content, not the
endpoint.
&lt;p&gt;
Internet-wide multicast-like behaviour - with scalable retransmits!! - is an
inherent part of this design.  Want to send something to more than one
client with minimal load?  Don't do anything special.  Just respond to
Interests.  If there's more than one Interest for a given bit of data, any
content router along the way can receive just one copy and fan it out to all
the other recipients.
&lt;p&gt;
Symmetrically (but separately), you can use multicast or broadcast on a LAN
to send out Interests, so if someone nearby has already seen what you're
asking about, they can send it to you.  Your local content router(s) could
also choose to multicast Data response packets on the LAN if more than one
local machine has expressed Interest in it, using whatever heuristics or
conventions you want to use.  Clients and routers that receive unsolicited
Data packets just ignore them.
&lt;p&gt;
Finally, note that, unlike many recent trends in content storage, content is
not named directly after its hash.  As they point out, the problem with that
method is you need a separate name-to-hash conversion layer, and that a
client can't request data which doesn't exist yet - which prevents web-style
dynamic page generation (based on query parameters) as well as disallowing
any kind of two-way communication channel like they created in VoCCN.
Hash-based naming also ends up necessitating things like DHTs for routing,
instead of allowing the prefix-based routing they recommend.  I have to say,
prefix-based routing does have a lot of appeal to me, after having
considered hash tables and DHTs pretty extensively.  The problem with hash
tables is you lose all locality of reference, so you end up with related
data (for example, consecutive blocks of a big file, or several files in one
directory) scattered all over the place instead of (if your cache is written
carefully) stored consecutively on disk.
&lt;p&gt;
On the other hand, naming your content after its hash makes verification
super easy; if you request block 234283123, and the hash of the received
block is not 234283123, then you reject it.  CCN, by comparison, if I
understand their paper correctly, has on the order of 325 bytes of security
crap for every 1024 bytes of content.  (32% overhead?!) Maybe I'm reading it
wrong, but I suspect not; the security section of their paper seems to be a
complicated multi-layer abomination, leading me to suspect it was either an
afterthought, or designed by someone totally different from the people who
designed the core of CCN.
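&lt;p&gt;
For comparison, hash-named verification really is a one-liner.  A minimal
sketch using Node.js's built-in crypto module, assuming blocks are named by
the hex SHA-256 digest of their contents (a stand-in for the short block
number in the example above):

```javascript
const crypto = require('crypto');

// Name a block by the hex SHA-256 digest of its contents.
function blockName(data) {
  return crypto.createHash('sha256').update(data).digest('hex');
}

// Accept a received block only if it hashes to the name we asked for.
function verify(requestedName, data) {
  return blockName(data) === requestedName;
}

const block = Buffer.from('some content');
const blockId = blockName(block);
console.log(verify(blockId, block)); // true
console.log(verify(blockId, Buffer.from('tampered content'))); // false
```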
&lt;p&gt;
So the security part is definitely going to need more work, but I think it's
not unresolvable, for the same reason that security over IP (itself not
secure) is resolvable (by adding SSL).  For example, with something like
VoCCN, you could just rely on the datastream itself being encrypted, using
(literally) SSL, and leave it out of the transport layer altogether.  That
leaves you with a problem where an attacker could insert fake data into your
SSL stream, which would be rejected by SSL, probably aborting the connection
and leading to a denial of service.  That's already possible with TCP, and
hasn't been a huge problem, although it would be even easier to create this
kind of attack in the case of a network that has lots of untrusted caches. 
(That is, you wouldn't need to be a man in the middle.  Or if you prefer,
there are more men in the middle.)
&lt;p&gt;
Anyway, there are lots of really great ideas in these two papers.  The best
part is the core concepts - datagram-based Interests and Data packets - that
can be applied separately from the other parts (routing protocols, security
protocols, tunneling protocols).  Again, just like with IP.
&lt;p&gt;
I think something like CCN really may be the future of networking.  It will
take a bit of work first though. :)
&lt;p&gt;
&lt;b&gt;Addendum: Replace IP? Really?&lt;/b&gt;
&lt;p&gt;
Yes.  This is not as crazy as it sounds, although of course it would take a
long time to complete.  You might recall that I think &lt;a
href=&quot;http://apenwarr.ca/log/?m=201103#28&quot;&gt;IPv6 replacing IP as the core of
the Internet is rather unlikely&lt;/a&gt;, but that's because of chicken-and-egg
problems.  IPv6 won't provide any utility to the Internet until *all* the
clients and *all* the servers have switched; until then, it's just twice as
much work for everyone.  Yes, you can tunnel IPv6 in IPv4, and vice versa,
but if your personal workstation *and* the server aren't both speaking IPv6,
you don't benefit from IPv6.
&lt;p&gt;
CCN (and variants of it) are different, because they're designed from the
start to be carryable over IP (IPv4 or IPv6).  That means they cooperate
with the existing system.  Moreover, you can tunnel IP (IPv4 or IPv6) over
CCN, so if you did replace all your IP routers with CCN routers, your
computer could still talk IP if it wanted, and still reach everyone on an IP
network.  (This is very different from other content-based networking
proposals, which lack that property and thus could never totally replace
IP.) You could even host IP-based services and advertise them using a
subdomain prefix on the CCN network (using OSPF or whatever they settle on),
so incoming connections would work.
&lt;p&gt;
Most importantly, though, CCN can be implemented entirely in userspace. 
Your browser could support it even if your OS kernel doesn't.  That's unlike
IPv6, where you could theoretically have an &quot;IPv6 proxy server&quot; accessible
from a user space process over IPv4 (or vice versa), but you wouldn't do it
that way because if you're using a proxy server anyhow, you'd probably use a
session-level (i.e., HTTP) proxy server, skipping the IPv6 layer altogether.
CCN is different in this case because it provides actual value above and
beyond what IP provides.  If a CCN server exists for certain kinds of
things - video files, for instance, like YouTube or iTunes - then your life,
the life of your ISP, and the lives of the people serving the files will
get incrementally better as people add more CCN servers, routers, and
clients.  That encourages incremental adoption because there's an actual
reward for each increment.  With incremental deployment of IPv6, your only
reward is the knowledge that you made the world just a little bit more
complicated today.  Yay you.
      </description>
    </item>
    <item>
      <title>An Unwise Commentary on Wisdom</title>
      <pubDate>Tue, 18 Sep 2012 03:57:50 +0000</pubDate>
      <link>http://apenwarr.ca/log/?m=201209#15</link>
      <guid isPermaLink="true">http://apenwarr.ca/log/?m=201209#15</guid>
      <description>&lt;p&gt;
I've read a few articles about ageism and wisdom lately.
It's disappointing because people always say the same thing, and they never
get anywhere.  Young people say &quot;Wisdom comes with experience, not
age!&quot; and old people say &quot;You'll understand when you grow up!&quot; and the cycle
repeats, forever.
&lt;p&gt;
I'm in my thirties now (which makes me &quot;old&quot;, ha ha, at least by the
definition programmers use).  The theme of this diary is &quot;things I recently
learned that I wish someone had told me sooner,&quot; and so with that in mind,
here are a few things I understood when I grew up, around age 30 or so.
&lt;p&gt;
First of all, you have to actually define wisdom.  Wisdom is not
productivity.  It's not being smart.  It's not being successful, or even a
proxy for being successful.  Wisdom is not the same as insight, although
that's getting closer.  Wisdom is not the same as mere experience.  The
Hollywood symbol for wisdom is a homeless, disabled, ancient, wrinkled,
Chinese guy in a martial arts movie, sitting on a street corner muttering
aphorisms without using the words &quot;the&quot; or &quot;a&quot;.  And that, I think, is
really the closest to what we mean by wisdom.
&lt;p&gt;
Wisdom is knowing what the movie will be about, and how it will probably
end, five minutes in, before the plot has even started.  And then
wisdom is the self control to only tell the hero exactly the part he needs
to hear.
&lt;p&gt;
Let's pull this out of Hollywood and back to programming.  I'm going to
dumb that down a bit and put it like this: wisdom is the ability to predict
the future.
&lt;p&gt;
You might have heard that &quot;&lt;a
href=&quot;http://en.wikiquote.org/wiki/Alan_Kay&quot;&gt;The best way to predict the
future is to invent it&lt;/a&gt;,&quot; a famous statement by famous old wise guy Alan
Kay.  But according to that link, he said it when he was 29 or 30.  That
quote is partly true (what you can control, you can predict), but it's the
perfect wishful thinking of a young person trying to rationalize away the
need for wisdom.
&lt;p&gt;
As you gain wisdom, you can begin to predict even the things you
&lt;i&gt;can't&lt;/i&gt; control.  You might think you can accomplish the same thing
with facts, logic, and a really big search engine, but you can't.  You can
predict some things that way, but not most things.  Maybe someday, after all
the rest of Artificial Intelligence is finished, we can have Artificial
Wisdom.  Wisdom is the thing your computer &lt;i&gt;doesn't&lt;/i&gt; do.  To become
wise, you have to train your intuition using your physical senses.  That
takes time, and it takes actually, physically being there to know
what it feels like.  People who think they can be wise without
&lt;i&gt;feeling&lt;/i&gt; it are idiots.  You can be lots of things without feeling the
world, including successful, rich, famous, productive, and smart.  But you
can't be wise.
&lt;p&gt;
So far, this is just a hopeless rant about how you're just too young to
understand.  Let's take it past that.  I can't make you wise, but maybe I
can help you spot wisdom when you see it.  For me, learning to spot it was
the first step in learning to get it.
&lt;p&gt;
I co-founded my first company when I was 19 or 20 (depending how you define
&quot;founded&quot;) and surely lacking in wisdom, because wise people don't start
tech companies.  A couple of years later, after my technical co-founder
and I had built and sold the first version of the product (to some
small profit, comparable to taking a paid internship instead), we found some
experienced businesspeople who joined in order to handle the business side. 
This was a smart (not to say wise) choice on our part.  The new people were
a few years older than us and lacked wisdom too, but had experience.  They
got us angel funding, then venture capital, and ramped our sales into the
millions of dollars per year.
&lt;p&gt;
Shortly after our first meeting with those new businesspeople, one of them
presented the rest of us with a simple one-page &quot;getting on the same page&quot; memo
of understanding.  (Not a complicated MOU like lawyers draw up, just a
simple letter in his own words.) Not knowing anything about business, I
found a businessperson I knew (friend of a friend) to show the letter to. 
He was an older guy.  His 30-second review was, &quot;Stay away from this guy. 
He isn't the kind of person you want to be dealing with.&quot; I ignored him,
because that's what young people do with advice they don't like.
&lt;p&gt;
Roughly 8 years later, we sold our company to IBM for untold (I can't tell
you) zillions (not a real word) of dollars.  The venture capitalists, who
everyone teaches you to fear and distrust, were respectful, ethical, and
fair during the entire time we spent working with them.  (In particular
Desjardins-Innovatech were great.) This business guy, however, the one who
wrote the memo, turned out to be a slimeball.  To this day, he is still the
only person on my &quot;do not treat as human&quot; blacklist.  I had to create my
blacklist for this purpose.  But that's another story for another day.
&lt;p&gt;
The point is, 5 minutes into my story, someone older, who turned out to be
wise, had this guy pegged after reading half a page of text, without even
meeting him.  He predicted the future, accurately, instantly, and without
any real facts.  That's either luck or wisdom.
&lt;p&gt;
And so we come to our next problem.  I hope what you take away from this
article won't be, &quot;Listen to old people, they know stuff,&quot; because that
would be stupid.  Most old people, like most young people, are dumb, and so
taking their advice is dumb.  Numerous people told me the product we were
building was physically impossible and to maybe try something that made
sense and that people wanted, and they were all 100% wrong, and I was right
to ignore them.  (Maybe less right to tell them so to their face, but oh
well, something something wisdom etc.)
&lt;p&gt;
No, we're not done yet.  All that was just to say, yes, wisdom exists, and
no, you probably don't have it.  But I want you to know that you can, at
least, make a series of observations in order to hypothesize about its
nature.  You can't see protons either, but you know they're there.
&lt;p&gt;
There have been a few very memorable moments of my life when I have acquired
a few Real Actual Nuggets&lt;sup&gt;1&lt;/sup&gt; of Wisdom.  I know these moments,
because they were so astonishingly blatant.  I guess there were probably
other, subtler ones, but let's ignore those, because I can offer no advice
on how to detect them.  I think the big ones are enough.  With those ones,
looking back on my life, I can see the before-wisdom-nugget version of me,
and the after-wisdom-nugget version of me.  The after-nugget version is
dramatically better at predicting the future.
&lt;p&gt;
Perhaps my biggest, most multi-faceted wisdom acquisition event was reading
and understanding &lt;a
href=&quot;http://en.wikipedia.org/wiki/Crossing_the_Chasm&quot;&gt;Crossing the
Chasm&lt;/a&gt;, a book from 1991 about companies from before 1991.  Yeah,
sure, it taught me all sorts of stuff about why my company wasn't growing
exponentially, which was great to know, and explained in retrospect how much
of our time we'd been wasting on stupid initiatives, which was embarrassing
but also great to know.  But as part of discovering those things - and
probably in a moment of weakness caused by it - I learned something else.
&lt;p&gt;
All those problems we were having?  They'd been had by people for decades. 
And people already knew how to solve them.
&lt;p&gt;
When we did finally sell our company, IBM bought it because of about 5% of
the stuff it contained.  The other 95% was great stuff, but it wasn't what
IBM wanted.  In Crossing the Chasm terms, we had finally, 9 years in,
created a &quot;Whole Product&quot; for a &quot;Target Market.&quot; We could have accomplished
the same thing, I think, with only 5% of the work, if only we had known
which 5%.
&lt;p&gt;
You will try to tell me that there was no way to know which 5%.  That the
other 95% of wasted effort was necessary as part of the experiment.  I would
have told you that, too, back then.  Back before I knew it was
false.&lt;sup&gt;2&lt;/sup&gt;
&lt;p&gt;
That was the big lesson about wisdom for me, the one that has put all these
discussions about young vs old and energetic vs experienced into
perspective.  In 1999, there was nobody whose experience would tell you how
to build a Linux-based server appliance that would sell like hotcakes.  But
there &lt;i&gt;were&lt;/i&gt; people who could tell you we were doing it wrong, and
explain exactly why and how, in step-by-step detail, including the totally
predictable consequences of our mistakes (correct) and instructions about
how to do it right.  They published a book about it in 1991, before Linux
even existed.  It said, &quot;You are doing this.  You should do that instead.&quot;
And they were exactly right, on both counts.
&lt;p&gt;
Young people have energy.  We had a lot of energy, and produced a lot of
super crazy amazing stuff that I'm still very proud of today.  But we lacked
wisdom, so 95% of it was wasted.  My goal is no longer to code 10x as fast
as the average programmer; my goal is to not have 19/20 of my production be
useless.
&lt;p&gt;
So here is my first nugget of wisdom, purified and cleaned up and presented
with that huge preface.  It was the one that got me on the long, slow,
painful path to maybe learning other ones someday.  Maybe it will help you
too.
&lt;p&gt;
&lt;ul&gt;&lt;b&gt;&lt;i&gt;If you think nobody has ever done this before, you are almost
certainly wrong.  Many people have done it before.  Most of them have done
it wrong.  Find the ones who did it right, and find out how they knew it was
right.  If you can learn to find wisdom in others, someday you can find
wisdom in yourself.&lt;/i&gt;&lt;/b&gt;&lt;/ul&gt;
&lt;p&gt;
I know what it feels like to be 20 and running a startup.  You have faith in
yourself, and you feel like your world is unique, and nobody else has the
same problems you do.  You certainly feel like experiences from 20 years ago
can't possibly be relevant.  Once you've learned otherwise, then, laddie,
then maybe we can talk.
&lt;p&gt;
&lt;b&gt;Footnotes&lt;/b&gt;
&lt;p&gt;
1. Whenever I try to type &quot;nuggets&quot; I keep typing &quot;nuggles.&quot;  I hope this
footnote has been educational for you.
&lt;p&gt;
2. I'm not trying to be a &quot;Customer Development denier&quot; here.  There will
always be inefficiencies caused by searching for a business model.  But if
you spend 9 years and 95% of your work is wasted, you're doing it wrong.
      </description>
    </item>
  </channel>
</rss>
