apenwarr

Every layer of review makes you 10x slower

Tue, 17 Mar 2026 02:54:43 +0000

We’ve all heard of those network effect laws: the value of a network goes up with the square of the number of members. Or the cost of communication goes up with the square of the number of members, or maybe it was n log n, or something like that, depending how you arrange the members. Anyway doubling a team doesn't double its speed; there’s coordination overhead. Exactly how much overhead depends on how badly you botch the org design.

But there’s one rule of thumb that someone showed me decades ago, that has stuck with me ever since, because of how annoyingly true it is. The rule is annoying because it doesn’t seem like it should be true. There’s no theoretical basis for this claim that I’ve ever heard. And yet, every time I look for it, there it is.

Here we go:

Every layer of approval makes a process 10x slower

I know what you're thinking. Come on, 10x? That’s a lot. It’s unfathomable. Surely we’re exaggerating.

Nope.

Just to be clear, we're counting “wall clock time” here rather than effort. Almost all the extra time is spent sitting and waiting.

Look:

Code a simple bug fix: 30 minutes
Get it code reviewed by the peer next to you: 300 minutes → 5 hours → half a day
Get a design doc approved by your architects team first: 50 hours → about a week
Get it on some other team’s calendar to do all that (for example, if a customer requests a feature): 500 hours → 12 weeks → one fiscal quarter
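
As a rough sketch of the arithmetic in the list above, assuming a flat 10x multiplier per layer and the 30-minute baseline (the function name `wall_clock_minutes` is made up for illustration, not part of any real tool):

```python
# Rule-of-thumb model: each approval layer multiplies wall-clock time by 10.
# The 30-minute baseline and the 10x factor come from the list above;
# the function name is hypothetical.

def wall_clock_minutes(layers: int, base_minutes: float = 30, factor: float = 10) -> float:
    """Estimated wall-clock minutes for a fix that passes through `layers` approvals."""
    return base_minutes * factor ** layers

for layers in range(5):
    minutes = wall_clock_minutes(layers)
    print(f"{layers} layer(s): {minutes:>9,.0f} min = {minutes / 60:>7,.1f} hours")
```

Three layers comes out to 500 hours, matching the list. The conversions to "about a week" and "12 weeks" only line up if you count roughly 40 working hours per week, which seems to be the intent.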

I wish I could tell you that the next step up — 10 quarters or about 2.5 years — was too crazy to contemplate, but no. That’s the life of an executive sitting above a medium-sized team; I bump into it all the time even at a relatively small company like Tailscale if I want to change product direction. (And execs sitting above large teams can’t actually do work of their own at all. That's another story.)

AI can’t fix this

First of all, this isn’t a post about AI, because AI’s direct impact on this problem is minimal. Okay, so Claude can code it in 3 minutes instead of 30? That’s super, Claude, great work.

Now you either get to spend 27 minutes reviewing the code yourself in a back-and-forth loop with the AI (this is actually kinda fun); or you save 27 minutes and submit unverified code to the code reviewer, who will still take 5 hours like before, but who will now be mad that you’re making them read the slop that you were too lazy to read yourself. Little of value was gained.
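
Here is that second option as a back-of-the-envelope calculation. It assumes the review wait is a fixed 300 minutes no matter who wrote the code, which is a simplification of the numbers above; the names are invented for the sketch.

```python
# Assumed numbers, loosely taken from the example above: the review wait is
# about 300 minutes either way, while coding drops from 30 minutes to 3.

REVIEW_WAIT_MIN = 300  # assumption: waiting dominates and does not shrink

def total_minutes(coding_min: float, review_wait_min: float = REVIEW_WAIT_MIN) -> float:
    """Wall-clock time from starting the fix to getting it approved."""
    return coding_min + review_wait_min

before = total_minutes(30)  # a human writes the fix
after = total_minutes(3)    # the AI writes the fix and nobody re-reads it
speedup = before / after

print(f"before: {before} min, after: {after} min, speedup: {speedup:.2f}x")
```

A 10x faster first step buys roughly a 9% faster pipeline under these assumptions, because the waiting was never the part that got faster.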

Now now, you say, that’s not the value of agentic coding. You don’t use an agent on a 30-minute fix. You use it on a monstrosity week-long project that you and Claude can now do in a couple of hours! Now we’re talking. Except no, because the monstrosity is so big that your reviewer will be extra mad that you didn’t read it yourself, and it’s too big to review in one chunk so you have to slice it into new bite-sized chunks, each with a 5-hour review cycle. And there’s no design doc so there’s no intentional architecture, so eventually someone’s going to push back on that and here we go with the design doc review meeting, and now your monstrosity week-long project that you did in two hours is... oh. A week, again.

I guess I could have called this post Systems Design 4 (or 5, or whatever I’m up to now, who knows, I’m writing this on a plane with no wifi) because yeah, you guessed it. It's Systems Design time again.

The only way to sustainably go faster is fewer reviews

It’s funny, everyone has been predicting the Singularity for decades now. The premise is we build systems that are so smart that they themselves can build the next system that is even smarter, that builds the next smarter one, and so on, and once we get that started, if they keep getting smarter fast enough, then the incremental time (t) to achieve a unit (u) of improvement goes to zero, so (u/t) goes to infinity and foom.

Anyway, I have never believed in this theory for the simple reason we outlined above: the majority of time needed to get anything done is not actually the time doing it. It’s wall clock time. Waiting. Latency.

And you can’t overcome latency with brute force.

I know you want to. I know many of you now work at companies where the business model kinda depends on doing exactly that.

Sorry.

But you can’t just not review things!

Ah, well, no, actually yeah. You really can’t.

There are now many people who have seen the symptom: the start of the pipeline (AI generated code) is so much faster, but all the subsequent stages (reviews) are too slow! And so they intuit the obvious solution: stop reviewing then!

The result might be slop, but if the slop is 100x cheaper, then it only needs to deliver 1% of the value per unit and it's still a fair trade. And if your value per unit is even a mere 2% of what it used to be, you’ve doubled your returns! Amazing.

There are some pretty dumb assumptions underlying that theory; you can imagine them for yourself. Suffice it to say that this produces what I will call the AI Developer’s Descent Into Madness:

1. Whoa, I produced this prototype so fast! I have super powers!
2. This prototype is getting buggy. I’ll tell the AI to fix the bugs.
3. Hmm, every change now causes as many new bugs as it fixes.
4. Aha! But if I have an AI agent also review the code, it can find its own bugs!
5. Wait, why am I personally passing data back and forth between agents?
6. I need an agent framework
7. I can have my agent write an agent framework!
8. Return to step 1

It’s actually alarming how many friends and respected peers I’ve lost to this cycle already. Claude Code only got good maybe a few months ago, so this only recently started happening, so I assume they will emerge from the spiral eventually. I mean, I hope they will. We have no way of knowing.

Why we review

Anyway we know our symptom: the pipeline gets jammed up because of too much new code spewed into it at step 1. But what's the root cause of the clog? Why doesn’t the pipeline go faster?

I said above that this isn’t an article about AI. Clearly I’m failing at that so far, but let’s bring it back to humans. It goes back to the annoyingly true observation I started with: every layer of review is 10x slower. As a society, we know this. Maybe you haven't seen it before now. But trust me: people who do org design for a living know that layers are expensive... and they still do it.

As companies grow, they all end up with more and more layers of collaboration, review, and management. Why? Because otherwise mistakes get made, and mistakes are increasingly expensive at scale. The average value added by a new feature eventually becomes lower than the average value lost through the new bugs it causes. So, lacking a way to make features produce more value (wouldn't that be nice!), we try to at least reduce the damage.

The more checks and controls we put in place, the slower we go, but the more monotonically the quality increases. And isn’t that the basis of continuous improvement?

Well, sort of. Monotonically increasing quality is on the right track. But “more checks and controls” went off the rails. That’s only one way to improve quality, and it's a fraught one.

“Quality Assurance” reduces quality

I wrote a few years ago about W. E. Deming and the "new" philosophy around quality that he popularized in Japanese auto manufacturing. (Eventually U.S. auto manufacturers more or less got the idea. So far the software industry hasn’t.)

One of the effects he highlighted was the problem of a “QA” pass in a factory: build widgets, have an inspection/QA phase, reject widgets that fail QA. Of course, your inspectors probably miss some of the failures, so when in doubt, add a second QA phase after the first to catch the remaining ones, and so on.

In a simplistic mathematical model this seems to make sense. (For example, if every QA pass catches 90% of defects, then after two QA passes you’ve reduced the number of defects by 100x. How awesome is that?)
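
That simplistic model is easy to write down. This sketch assumes every pass is independent and equally diligent, which is the assumption about to fall apart; the function name is made up.

```python
# Simplistic model from the paragraph above: every QA pass independently
# catches the same fraction of whatever defects reach it.

def escaped_defects(initial: float, catch_rate: float, passes: int) -> float:
    """Defects still present after `passes` independent QA passes."""
    return initial * (1 - catch_rate) ** passes

for passes in range(4):
    remaining = escaped_defects(1000, 0.9, passes)
    print(f"{passes} pass(es): {remaining:.1f} defects left out of 1000")
```

Two passes at 90% take 1000 defects down to about 10, which is the 100x from the text. The math is fine. The independence assumption is the problem.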

But in the reality of agentic humans, it’s not so simple. First of all, the incentives get weird. The second QA team basically serves to evaluate how well the first QA team is doing; if the first QA team keeps missing defects, fire them. Now, that second QA team has little incentive to produce that outcome for their friends. So maybe they don’t look too hard; after all, the first QA team missed the defect, it’s not unreasonable that we might miss it too.

Furthermore, the first QA team knows there is a second QA team to catch any defects; if I don’t work too hard today, surely the second team will pick up the slack. That's why they're there!

Also, the team making the widgets in the first place doesn’t check their work too carefully; that’s what the QA team is for! Why would I slow down the production of every widget by being careful, at a cost of say 20% more time, when there are only 10 defects in 100 and I can just eliminate them at the next step for only a 10% waste overhead? It only makes sense. Plus they'll fire me if I go 20% slower.

To say nothing of a whole engineering redesign to improve quality; that would be super expensive, and we could be designing all new widgets instead.

Sound like any engineering departments you know?

Well, this isn’t the right time to rehash Deming, but suffice it to say, he was on to something. And his techniques worked. You get things like the famous Toyota Production System where they eliminated the QA phase entirely, but gave everybody an “oh crap, stop the line, I found a defect!” button.

Famously, US auto manufacturers tried to adopt the same system by installing the same “stop the line” buttons. Of course, nobody pushed those buttons. They were afraid of getting fired.

Trust

The basis of the Japanese system that worked, and the missing part of the American system that didn’t, is trust. Trust among individuals that your boss Really Truly Actually wants to know about every defect, and wants you to stop the line when you find one. Trust among managers that executives were serious about quality. Trust among executives that individuals, given a system that can work and has the right incentives, will produce quality work and spot their own defects, and push the stop button when they need to push it.

But, one more thing: trust that the system actually does work. So first you need a system that will work.

Fallibility

AI coders are fallible; they write bad code, often. In this way, they are just like human programmers.

Deming’s approach to manufacturing didn’t have any magic bullets. Alas, you can’t just follow his ten-step process and immediately get higher quality engineering. The secret is, you have to get your engineers to engineer higher quality into the whole system, from top to bottom, repeatedly. Continuously.

Every time something goes wrong, you have to ask, “How did this happen?” and then do a whole post-mortem and the Five Whys (or however many Whys are in fashion nowadays) and fix the underlying Root Causes so that it doesn’t happen again. “The coder did it wrong” is never a root cause, only a symptom. Why was it possible for the coder to get it wrong?

The job of a code reviewer isn't to review code. It's to figure out how to obsolete their code review comment, that whole class of comment, in all future cases, until you don't need their reviews at all anymore.

(Think of the people who first created "go fmt" and how many stupid code review comments about whitespace are gone forever. Now that's engineering.)

By the time your review catches a mistake, the mistake has already been made. The root cause happened already. You're too late.

Modularity

I wish I could tell you I had all the answers. Actually I don’t have much. If I did, I’d be first in line for the Singularity because it sounds kind of awesome.

I think we’re going to be stuck with these systems pipeline problems for a long time. Review pipelines — layers of QA — don’t work. Instead, they make you slower while hiding root causes. Hiding causes makes them harder to fix.

But, the call of AI coding is strong. That first, fast step in the pipeline is so fast! It really does feel like having super powers. I want more super powers. What are we going to do about it?

Maybe we finally have a compelling enough excuse to fix the 20 years of problems hidden by code review culture, and replace it with a real culture of quality.

I think the optimists have half of the right idea. Reducing review stages, even to an uncomfortable degree, is going to be needed. But you can’t just reduce review stages without something to replace them. That way lies the Ford Pinto or any recent Boeing aircraft.

The complete package, the table flip, was what Deming brought to manufacturing. You can’t half-adopt a “total quality” system. You need to eliminate the reviews and obsolete them, in one step.

How? You can fully adopt the new system, in small bites. What if some components of your system can be built the new way? Imagine an old-school U.S. auto manufacturer buying parts from Japanese suppliers; wow, these parts are so well made! Now I can start removing QA steps elsewhere because I can just assume the parts are going to work, and my job of "assemble a bigger widget from the parts" has a ton of its complexity removed.

I like this view. I’ve always liked small beautiful things, that’s my own bias. But, you can assemble big beautiful things from small beautiful things.

It’s a lot easier to build those individual beautiful things in small teams that trust each other, that know what quality looks like to them. They deliver their things to customer teams who can clearly explain what quality looks like to them. And on we go. Quality starts bottom-up, and spreads.

I think small startups are going to do really well in this new world, probably better than ever. Startups already have fewer layers of review just because they have fewer people. Some startups will figure out how to produce high quality components quickly; others won't and will fail. Quality by natural selection?

Bigger companies are gonna have a harder time, because their slow review systems are baked in, and deleting them would cause complete chaos.

But, it’s not just about company size. I think engineering teams at any company can get smaller, and have better defined interfaces between them.

Maybe you could have multiple teams inside a company competing to deliver the same component. Each one is just a few people and a few coding bots. Try it 100 ways and see who comes up with the best one. Again, quality by evolution. Code is cheap but good ideas are not. But now you can try out new ideas faster than ever.

Maybe we’ll see a new optimal point on the monoliths-microservices continuum. Microservices got a bad name because they were too micro; in the original terminology, a “micro” service was exactly the right size for a “two pizza team” to build and operate on their own. With AI, maybe it's one pizza and some tokens.

What’s fun is you can also use this new, faster coding to experiment with different module boundaries faster. Features are still hard for lots of reasons, but refactoring and automated integration testing are things the AIs excel at. Try splitting out a module you were afraid to split out before. Maybe it'll add some lines of code. But suddenly lines of code are cheap, compared to the coordination overhead of a bigger team maintaining both parts.

Every team has some monoliths that are a little too big, and too many layers of reviews. Maybe we won't get all the way to Singularity. But, we can engineer a much better world. Our problems are solvable.

It just takes trust.

Systems design 3: LLMs and the semantic revolution

Thu, 20 Nov 2025 14:19:14 +0000

Long ago in the 1990s when I was in high school, my chemistry+physics teacher pulled me aside. "Avery, you know how the Internet works, right? I have a question."

I now know the correct response to that was, "Does anyone really know how the Internet works?" But as a naive young high schooler I did not have that level of self-awareness. (Decades later, as a CEO, that's my answer to almost everything.)

Anyway, he asked his question, and it was simple but deep. How do they make all the computers connect?

We can't even get the world to agree on 60 Hz vs 50 Hz, 120V vs 240V, or which kind of physical power plug to use. Communications equipment uses way more frequencies, way more voltages, way more plug types. Phone companies managed to federate with each other, eventually, barely, but the ring tones were different everywhere, there was pulse dialing and tone dialing, and some of them still charge $3/minute for international long distance, and connections take a long time to establish and humans seem to be involved in suspiciously many places when things get messy, and every country has a different long-distance dialing standard and phone number format.

So Avery, he said, now they're telling me every computer in the world can connect to every other computer, in milliseconds, for free, between Canada and France and China and Russia. And they all use a single standardized address format, and then you just log in and transfer files and stuff? How? How did they make the whole world cooperate? And who?

When he asked that question, it was a formative moment in my life that I'll never forget, because as an early member of what would be the first Internet generation… I Had Simply Never Thought of That.

I mean, I had to stop and think for a second. Wait, is protocol standardization even a hard problem? Of course it is. Humans can't agree on anything. We can't agree on a unit of length or the size of a pint, or which side of the road to drive on. Humans in two regions of Europe no farther apart than Thunder Bay and Toronto can't understand each other's speech. But this Internet thing just, kinda, worked.

"There's… a layer on top," I uttered, unsatisfyingly. Nobody had taught me yet that the OSI stack model existed, let alone that it was at best a weak explanation of reality.

"When something doesn't talk to something else, someone makes an adapter. Uh, and some of the adapters are just programs rather than physical things. It's not like everyone in the world agrees. But as soon as one person makes an adapter, the two things come together."

I don't think he was impressed with my answer. Why would he be? Surely nothing so comprehensively connected could be engineered with no central architecture, by a loosely-knit cult of mostly-volunteers building an endless series of whimsical half-considered "adapters" in their basements and cramped university tech labs. Such a creation would be a monstrosity, just as likely to topple over as to barely function.

I didn't try to convince him, because honestly, how could I know? But the question has dominated my life ever since.

When things don't connect, why don't they connect? When they do, why? How? …and who?

Postel's Law

The closest clue I've found is this thing called Postel's Law, one of the foundational principles of the Internet. It was best stated by one of the founders of the Internet, Jon Postel. "Be conservative in what you send, and liberal in what you accept."

What it means to me is, if there's a standard, do your best to follow it, when you're sending. And when you're receiving, uh, assume the best intentions of your counterparty and do your best and if that doesn't work, guess.

A rephrasing I use sometimes is, "It takes two to miscommunicate." Communication works best and most smoothly if you have a good listener and a clear speaker, sharing a language and context. But it can still bumble along successfully if you have a poor speaker with a great listener, or even a great speaker with a mediocre listener. Sometimes you have to say the same thing five ways before it gets across (wifi packet retransmits), or ask way too many clarifying questions, but if one side or the other is diligent enough, you can almost always make it work.

This asymmetry is key to all high-level communication. It makes network bugs much less severe. Without Postel's Law, triggering a bug in the sender would break the connection; so would triggering a bug in the receiver. With Postel's Law, we acknowledge from the start that there are always bugs and we have twice as many chances to work around them. Only if you trigger both sets of bugs at once is the flaw fatal.
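
A toy probability model makes the "twice as many chances" point concrete. It assumes sender and receiver bugs trigger independently and uses a made-up 1% rate for each; none of this is measured data.

```python
# Toy model, not a measurement: each exchange independently triggers a sender
# bug with probability p_sender and a receiver bug with probability p_receiver.

def failure_without_postel(p_sender: float, p_receiver: float) -> float:
    """Strict world: either bug alone breaks the connection."""
    return 1 - (1 - p_sender) * (1 - p_receiver)

def failure_with_postel(p_sender: float, p_receiver: float) -> float:
    """Tolerant world: the connection only breaks if both bugs trigger at once."""
    return p_sender * p_receiver

p = 0.01  # assumed 1% bug rate on each side, purely for illustration
print(f"strict:   {failure_without_postel(p, p):.4%}")
print(f"tolerant: {failure_with_postel(p, p):.4%}")
```

With these numbers the strict world fails about 2% of the time and the tolerant world about 0.01% of the time, a difference of roughly 200x.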

…So okay, if you've used the Internet, you've probably observed that fatal connection errors are nevertheless pretty common. But that misses how incredibly much more common they would be in a non-Postel world. That world would be the one my physics teacher imagined, where nothing ever works and it all topples over.

And we know that's true because we've tried it. Science! Let us digress.

XML

We had the Internet ("OSI Layer 3") mostly figured out by the time my era began in the late 1900s, but higher layers of the stack still had work to do. It was the early days of the web. We had these newfangled hypertext ("HTML") browsers that would connect to a server, download some stuff, and then try their best to render it.

Web browsers are and have always been an epic instantiation of Postel's Law. From the very beginning, they assumed that the server (content author) had absolutely no clue what they were doing and did their best to apply some kind of meaning on top, despite every indication that this was a lost cause. List items that never end? Sure. Tags you've never heard of? Whatever. Forgot some semicolons in your javascript? I'll interpolate some. Partially overlapping italics and bold? Leave it to me. No indication what language or encoding the page is in? I'll just guess.
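
To get a feel for that attitude, here is a small sketch using Python's standard-library `html.parser`. It is only a tokenizer, not a browser engine, and the `TagLogger` class and the messy input are invented for illustration. The point is simply that nothing here raises an error.

```python
from html.parser import HTMLParser

class TagLogger(HTMLParser):
    """Hypothetical helper: records whatever the tolerant tokenizer reports."""

    def __init__(self):
        super().__init__()
        self.events = []

    def handle_starttag(self, tag, attrs):
        self.events.append(("start", tag))

    def handle_endtag(self, tag):
        self.events.append(("end", tag))

    def handle_data(self, data):
        if data.strip():
            self.events.append(("text", data.strip()))

# An unclosed list item, a tag the parser has never heard of, and
# partially overlapping bold and italics.
messy = "<ul><li>never closed<blink>unknown tag<b>bold <i>both</b> italic</i>"

parser = TagLogger()
parser.feed(messy)
parser.close()

for event in parser.events:
    print(event)
```

A real browser goes much further and builds a repaired document tree out of this; the sketch only shows the "accept it and keep going" half.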

The evolution of browsers gives us some insight into why Postel's Law is a law and not just, you know, Postel's Advice. The answer is: competition. It works like this. If your browser interprets someone's mishmash subjectively better than another browser, your browser wins.

I think economists call this an iterated prisoner's dilemma. Over and over, people write web pages (defect) and browsers try to render them (defect) and absolutely nobody actually cares what the HTML standard says (stays loyal). Because if there's a popular page that's wrong and you render it "right" and it doesn't work? Straight to jail.

(By now almost all the evolutionary lines of browsers have been sent to jail, one by one, and the HTML standard is effectively whatever Chromium and Safari say it is. Sorry.)

This law offends engineers to the deepness of their soul. We went through a period where loyalists would run their pages through "validators" and proudly add a logo to the bottom of their page saying how valid their HTML was. Browsers, of course, didn't care and continued to try their best.

Another valiant effort was the definition of "quirks mode": a legacy rendering mode meant to document, normalize, and push aside all the legacy wonko interpretations of old web pages. It was paired with a new, standards-compliant rendering mode that everyone was supposed to agree on, starting from scratch with an actual written spec and tests this time, and public shaming if you made a browser that did it wrong. Of course, outside of browser academia, nobody cares about the public shaming and everyone cares if your browser can render the popular web sites, so there are still plenty of quirks outside quirks mode. It's better and it was well worth the effort, but it's not all the way there. It never can be.

We can be sure it's not all the way there because there was another exciting development, HTML Strict (and its fancier twin, XHTML), which was meant to be the same thing, but with a special feature. Instead of sending browsers to jail for rendering wrong pages wrong, we'd send page authors to jail for writing wrong pages!

To mark your web page as HTML Strict was a vote against the iterated prisoner's dilemma and Postel's Law. No, your vote said. No more. We cannot accept this madness. We are going to be Correct. I certify this page is correct. If it is not correct, you must sacrifice me, not all of society. My honour demands it.

Anyway, many page authors were thus sacrificed and now nobody uses HTML Strict. Nobody wants to do tech support for a web page that asks browsers to crash when parsing it, when you can just… not do that.

Excuse me, the above XML section didn't have any XML

Yes, I'm getting to that. (And you're soon going to appreciate that meta joke about schemas.)

In parallel with that dead branch of HTML, a bunch of people had realized that, more generally, HTML-like languages (technically SGML-like languages) had turned out to be a surprisingly effective way to build interconnected data systems.

In retrospect we now know that the reason for HTML's resilience is Postel's Law. It's simply easier to fudge your way through parsing incorrect hypertext, than to fudge your way through parsing a Microsoft Word or Excel file's hairball of binary OLE streams, which famously even Microsoft at one point lost the knowledge of how to parse. But, that Postel's Law connection wasn't really understood at the time.

Instead we had a different hypothesis: "separation of structure and content." Syntax and semantics. Writing software to deal with structure is repetitive overhead, and content is where the money is. Let's automate away the structure so you can spend your time on the content: semantics.

We can standardize the syntax with a single Extensible Markup Language (XML). Write your content, then "mark it up" by adding structure right in the doc, just like we did with plaintext human documents. Data, plus self-describing metadata, all in one place. Never write a parser again!

Of course, with 20/20 hindsight (or now 2025 hindsight), this is laughable. Yes, we now have XML parser libraries. If you've ever tried to use one, you will find they indeed produce parse trees automatically… if you're lucky. If you're not lucky, they produce a stream of "tokens" and leave it to you to figure out how to arrange it in a tree, for reasons involving streaming, performance, memory efficiency, and so on. Basically, if you use XML you now have to deeply care about structure, perhaps more than ever, but you also have to include some giant external parsing library that, left in its normal mode, might spontaneously start making a lot of uncached HTTP requests that can also exploit remote code execution vulnerabilities haha oops.
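
For a concrete look at the lucky and unlucky cases, here is a sketch using Python's standard-library `xml.etree.ElementTree`, which offers both styles. The sample document is made up.

```python
import io
import xml.etree.ElementTree as ET

doc = "<order id='7'><item>widget</item><item>gadget</item></order>"

# Lucky case: the library hands back a whole parse tree.
root = ET.fromstring(doc)
items = [item.text for item in root.findall("item")]
print("tree:", root.tag, root.attrib, items)

# Less lucky case: a stream of start/end events, and arranging them
# into whatever structure you need is your problem.
stream = io.BytesIO(doc.encode("utf-8"))
events = [(event, elem.tag) for event, elem in ET.iterparse(stream, events=("start", "end"))]
print("events:", events)
```

Either way, the library has only told you where the tags are. What an `order` or an `item` means is still entirely up to your code.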

If you've ever taken a parser class, or even if you've just barely tried to write a parser, you'll know the truth: the value added by outsourcing parsing (or in some cases only tokenization) is not a lot. This is because almost all the trouble of document processing (or compiling) is the semantic layer, the part where you make sense of the parse tree. The part where you just read a stream of characters into a data structure is the trivial, well-understood first step.

Now, semantics is where it gets interesting. XML was all about separating syntax from semantics. And they did some pretty neat stuff with that separation, in a computer science sense. XML is neat because it's such a regular and strict language that you can completely validate the syntax (text and tags) without knowing what any of the tags mean or which tags are intended to be valid at all.

…aha! Did someone say validate?! Like those old HTML validators we talked about? Oh yes. Yes! And this time the validation will be completely strict and baked into every implementation from day 1. And, the language syntax itself will be so easy and consistent to validate (unlike SGML and HTML, which are, in all fairness, bananas) that nobody can possibly screw it up.
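
The syntax-only check is easy to demonstrate. This sketch leans on Python's standard-library `xml.etree.ElementTree`, which raises `ParseError` on anything that isn't well-formed; the helper name and the sample tags are invented.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text: str) -> bool:
    """True if `text` parses as XML, without knowing what any tag means."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

# Tags nobody has ever heard of are fine, as long as they nest properly.
print(is_well_formed("<zorp><flib/>made-up tags are fine</zorp>"))

# The overlapping bold/italics that browsers shrug off is rejected outright.
print(is_well_formed("<b>bold <i>both</b> italic</i>"))
```

Note that this only covers well-formedness. Checking which tags are allowed where is the separate schema step.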

A layer on top of this basic, highly validatable XML, was a thing called XML Schemas. These were documents (in the original DTD flavour, mysteriously not written in XML) that described which tags were allowed in which places in a certain kind of document. Not only could you parse and validate the basic XML syntax, you could also then validate its XML schema as a separate step, to be totally sure that every tag in the document was allowed where it was used, and present if it was required. And if not? Well, straight to jail. We all agreed on this, everyone. Day one. No exceptions. Every document validates. Straight to jail.

Anyway XML schema validation became an absolute farce. Just parsing or understanding, let alone writing, the awful schema file format is an unpleasant ordeal. To say nothing of complying with the schema, or (heaven forbid) obtaining a copy of someone's custom schema and loading it into the validator at the right time.

The core XML syntax validation was easy enough to do while parsing. Unfortunately, in a second violation of Postel's Law, almost no software that outputs XML runs it through a validator before sending. I mean, why would they, the language is highly regular and easy to generate and thus the output is already perfect. …Yeah, sure.

Anyway we all use JSON now.

JSON

Whoa, wait! I wasn't done!

This is the part where I note, for posterity's sake, that XML became a decade-long fad in the early 2000s that justified billions of dollars of software investment. None of XML's technical promises played out; it is a stain on the history of the computer industry. But, a lot of legacy software got un-stuck because of those billions of dollars, and so we did make progress.

What was that progress? Interconnection.

Before the Internet, we kinda didn't really need to interconnect software together. I mean, we sort of did, like cut-and-pasting between apps on Windows or macOS or X11, all of which were surprisingly difficult little mini-Postel's Law protocol adventures in their own right and remain quite useful when they work (except "paste formatted text," wtf are you people thinking). What makes cut-and-paste possible is top-down standards imposed by each operating system vendor.

If you want the same kind of thing on the open Internet, ie. the ability to "copy" information out of one server and "paste" it into another, you need some kind of standard. XML was a valiant effort to create one. It didn't work, but it was valiant.

Whereas all that money investment did work. Companies spent billions of dollars to update their servers to publish APIs that could serve not just human-formatted HTML, but also something machine-readable. The great innovation was not XML per se, it was serving data over HTTP that wasn't always HTML. That was a big step, and didn't become obvious until afterward.

The most common clients of HTTP were web browsers, and web browsers only knew how to parse two things: HTML and javascript. To a first approximation, valid XML is "valid" (please don't ask the validator) HTML, so we could do that at first, and there were some Microsoft extensions. Later, after a few billions of dollars, true standardized XML parsing arrived in browsers. Similarly, to a first approximation, valid JSON is valid javascript, which woo hoo, that's a story in itself (you could parse it with eval(), tee hee) but that's why we got here.

JSON (minus the rest of javascript) is a vastly simpler language than XML. It's easy to consistently parse (other than that pesky trailing comma); browsers already did. It represents only (a subset of) the data types normal programming languages already have, unlike XML's weird mishmash of single attributes, multiply occurring attributes, text content, and CDATA. It's obviously a tree and everyone knows how that tree will map into their favourite programming language. It inherently works with unicode and only unicode. You don't need cumbersome and duplicative "closing tags" that double the size of every node. And best of all, no guilt about skipping that overcomplicated and impossible-to-get-right schema validator, because, well, nobody liked schemas anyway so nobody added them to JSON (almost).
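
A short sketch of that mapping, using Python's standard-library `json` module (the payload is made up):

```python
import json

payload = """
{"name": "widget", "tags": ["small", "beautiful"], "count": 3,
 "price": 9.5, "discontinued": false, "notes": null}
"""

# Every JSON type lands on a type the language already has.
data = json.loads(payload)
for key, value in data.items():
    print(f"{key!r}: {type(value).__name__}")

# The pesky trailing comma: legal in a javascript object literal,
# rejected by a strict JSON parser.
try:
    json.loads('{"count": 3,}')
    trailing_comma_rejected = False
except json.JSONDecodeError:
    trailing_comma_rejected = True

print("trailing comma rejected:", trailing_comma_rejected)
```

No parser configuration, no schema, and the result is an ordinary dict of ordinary values.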

Today, if you look at APIs you need to call, you can tell which ones were a result of the $billions invested in the 2000s, because it's all XML. And you can tell which came in the 2010s and later after learning some hard lessons, because it's all JSON. But either way, the big achievement is you can call them all from javascript. That's pretty good.

(Google is an interesting exception: they invented and used protobuf during the same time period because they disliked XML's inefficiency, they did like schemas, and they had the automated infrastructure to make schemas actually work (mostly, after more hard lessons). But it mostly didn't spread beyond Google… maybe because it's hard to do from javascript.)

Blockchain

The 2010s were another decade of massive multi-billion dollar tech investment. Once again it was triggered by an overwrought boondoggle technology, and once again we benefited from systems finally getting updated that really needed to be updated.

Let's leave aside cryptocurrencies (which although used primarily for crime, at least demonstrably have a functioning use case, ie. crime) and look at the more general form of the technology.

Blockchains in general make the promise of a "distributed ledger" which allows everyone the ability to make claims and then later validate other people's claims. The claims that "real" companies invested in were meant to be about manufacturing, shipping, assembly, purchases, invoices, receipts, ownership, and so on. What's the pattern? That's the stuff of businesses doing business with other businesses. In other words, data exchange. Data exchange is exactly what XML didn't really solve (although progress was made by virtue of the dollars invested) in the previous decade.

Blockchain tech was a more spectacular boondoggle than XML for a few reasons. First, it didn't even have a purpose you could explain. Why do we even need a purely distributed system for this? Why can't we just trust a third party auditor? Who even wants their entire supply chain (including number of widgets produced and where each one is right now) to be visible to the whole world? What is the problem we're trying to solve with that?

…and you know there really was no purpose, because after all the huge investment to rewrite all that stuff, which was itself valuable work, we simply dropped the useless blockchain part and then we were fine. I don't think even the people working on it felt like they needed a real distributed ledger. They just needed an updated ledger and a budget to create one. If you make the "ledger" module pluggable in your big fancy supply chain system, you can later drop out the useless "distributed" ledger and use a regular old ledger. The protocols, the partnerships, the databases, the supply chain, and all the rest can stay the same.
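The pluggable ledger idea can be sketched roughly like this. All of the class names are hypothetical, and `ReplicatedLedger` is only a toy stand-in: it has no consensus, hashing, or networking, it just marks where that cost would sit.

```python
from typing import Protocol


class Ledger(Protocol):
    """Anything that can record a claim and list the claims so far."""

    def append(self, claim: str) -> None: ...
    def entries(self) -> list[str]: ...


class PlainLedger:
    """A regular old ledger: one list, kept by one trusted party."""

    def __init__(self) -> None:
        self._claims: list[str] = []

    def append(self, claim: str) -> None:
        self._claims.append(claim)

    def entries(self) -> list[str]:
        return list(self._claims)


class ReplicatedLedger:
    """Toy 'distributed' ledger: every claim is copied to each replica."""

    def __init__(self, replicas: int = 3) -> None:
        self._replicas: list[list[str]] = [[] for _ in range(replicas)]

    def append(self, claim: str) -> None:
        for replica in self._replicas:
            replica.append(claim)

    def entries(self) -> list[str]:
        return list(self._replicas[0])


class SupplyChain:
    """The rest of the system only ever talks to the Ledger interface."""

    def __init__(self, ledger: Ledger) -> None:
        self.ledger = ledger

    def ship(self, item: str, destination: str) -> None:
        self.ledger.append(f"shipped {item} to {destination}")


# Same supply chain code, either ledger underneath.
for ledger in (ReplicatedLedger(), PlainLedger()):
    chain = SupplyChain(ledger)
    chain.ship("widget-42", "Montreal")
    print(type(ledger).__name__, chain.ledger.entries())
```

Nothing in `SupplyChain` changes when the ledger is swapped, which is the whole trick.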

In XML's defense, at least it was not worth the effort to rip out once the world came to its senses.

Another interesting similarity between XML and blockchains was the computer science appeal. A particular kind of person gets very excited about validation and verifiability. Both times, the whole computer industry followed those people down into the pits of despair and when we finally emerged… still no validation, still no verifiability, still didn't matter. Just some computers communicating with each other a little better than they did before.

LLMs

In the 2020s, our industry fad is LLMs. I'm going to draw some comparisons here to the last two fads, but there are some big differences too.

One similarity is the computer science appeal: so much math! Just the matrix sizes alone are a technological marvel the likes of which we have never seen. Beautiful. Colossal. Monumental. An inspiration to nerds everywhere.

But a big difference is verification and validation. If there is one thing LLMs absolutely are not, it's verifiable. LLMs are the flakiest thing the computer industry has ever produced! So far. And remember, this is the industry that brought you HTML rendering.

LLMs are an almost cartoonishly amplified realization of Postel's Law. They write human grammar perfectly, or almost perfectly, or when they're not perfect it's a bug and we train them harder. And, they can receive just about any kind of gibberish and turn it into a data structure. In other words, they're conservative in what they send and liberal in what they accept.

LLMs also solve the syntax problem, in the sense that they can figure out how to transliterate (convert) basically any file syntax into any other. Modulo flakiness. But if you need a CSV in the form of a limerick or a quarterly financial report formatted as a mysql dump, sure, no problem, make it so.

In theory we already had syntax solved though. XML and JSON did that already. We were even making progress interconnecting old school company supply chain stuff the hard way, thanks to our nominally XML and blockchain investment decades. We had to do every interconnection by hand – by writing an adapter – but we could do it.
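A hand-written adapter of that kind might look like the sketch below. The supplier format, the field names (`cust_nm`, `amt_cents`, `dt`) and the target shape are all invented for illustration.

```python
import json
from datetime import date


def adapt_order(source_json: str) -> dict:
    """Translate one record from a hypothetical supplier format to a hypothetical internal one."""
    src = json.loads(source_json)  # the syntax part: one library call

    # Everything below is knowledge a person had to dig up and type in:
    # which field means what, which units it uses, how names are cased.
    return {
        "customer_name": src["cust_nm"].strip().title(),
        "amount_dollars": src["amt_cents"] / 100,
        "order_date": date.fromisoformat(src["dt"]),
    }


record = '{"cust_nm": "  ACME WIDGETS ", "amt_cents": 1250, "dt": "2025-07-11"}'
print(adapt_order(record))
```

Parsing takes one line. The field mapping is the part that has to be rewritten for every new pair of systems.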

What's really new is that LLMs address semantics. Semantics are the biggest remaining challenge in connecting one system to another. If XML solved syntax, that was the first 10%. Semantics are the last 90%. When I want to copy from one database to another, how do I map the fields? When I want to scrape a series of uncooperative web pages and turn it into a table of products and prices, how do I turn that HTML into something structured? (Predictably microformats, aka schemas, did not work out.) If I want to query a database (or join a few disparate databases!) using some language that isn't SQL, what options do I have?

LLMs can do it all.

Listen, we can argue forever about whether LLMs "understand" things, or will achieve anything we might call intelligence, or will take over the world and eradicate all humans, or are useful assistants, or just produce lots of text sludge that will certainly clog up the web and social media, or will also be able to filter the sludge, or what it means for capitalism that we willingly invented a machine we pay to produce sludge that we also pay to remove the sludge.

But what we can't argue about is that LLMs interconnect things. Anything. To anything. Whether you like it or not. Whether it's bug free or not (spoiler: it's not). Whether it gets the right answer or not (spoiler: erm…).

This is the thing we have gone through at least two decades of hype cycles desperately chasing. (Three, if you count java "write once run anywhere" in the 1990s.) It's application-layer interconnection, the holy grail of the Internet.

And this time, it actually works! (mostly)

The curse of success

LLMs aren't going away. Really we should coin a term for this use case, call it "b2b AI" or something. For this use case, LLMs work. And they're still getting better and the precision will improve with practice. For example, imagine asking an LLM to write a data translator in some conventional programming language, instead of asking it to directly translate a dataset on its own. We're still at the beginning.

But, this use case, which I predict is the big one, isn't what we expected. We expected LLMs to write poetry or give strategic advice or whatever. We didn't expect them to call APIs and immediately turn around and use what they learned to call other APIs.

After 30 years of trying and failing to connect one system to another, we now have a literal universal translator. Plug it into any two things and it'll just go, for better or worse, no matter how confused it becomes. And everyone is doing it, fast, often with a corporate mandate to do it even faster.

This kind of scale and speed of (successful!) rollout is unprecedented, even by the Internet itself, and especially in the glacially slow world of enterprise system interconnections, where progress grinds to a halt once a decade only to be finally dislodged by the next misguided technology wave. Nobody was prepared for it, so nobody was prepared for the consequences.

One of the odd features of Postel's Law is it's irresistible. Big Central Infrastructure projects rise and fall with funding, but Postel's Law projects are powered by love. A little here, a little there, over time. One more person plugging one more thing into one more other thing. We did it once with the Internet, overcoming all the incompatibilities at OSI layers 1 and 2. It subsumed, it is still subsuming, everything.

Now we're doing it again at the application layer, the information layer. And just like we found out when we connected all the computers together the first time, naively hyperconnected networks make it easy for bad actors to spread and disrupt at superhuman speeds. We had to invent firewalls, NATs, TLS, authentication systems, two-factor authentication systems, phishing-resistant two-factor authentication systems, methodical software patching, CVE tracking, sandboxing, antivirus systems, EDR systems, DLP systems, everything. We'll have to do it all again, but faster and different.

Because this time, it's all software.

Billionaire math

Fri, 11 Jul 2025 16:18:52 +0000

I have a friend who exited his startup a few years ago and is now rich. How rich is unclear. One day, we were discussing ways to expedite the delivery of his superyacht and I suggested paying extra. His response, as to so many of my suggestions, was, “Avery, I’m not that rich.”

Everyone has their limit.

I, too, am not that rich. I have shares in a startup that has not exited, and they seem to be gracefully ticking up in value as the years pass. But I have to come to work each day, and if I make a few wrong medium-quality choices (not even bad ones!), it could all be vaporized in an instant. Meanwhile, I can’t spend it. So what I have is my accumulated savings from a long career of writing software and modest tastes (I like hot dogs).

Those accumulated savings and modest tastes are enough to retire indefinitely. Is that bragging? It was true even before I started my startup. Back in 2018, I calculated my “personal runway” to see how long I could last if I started a company and we didn’t get funded, before I had to go back to work. My conclusion was I should move from New York City back to Montreal and then stop worrying about it forever.

Of course, being in that position means I’m lucky and special. But I’m not that lucky and special. My numbers aren’t that different from the average Canadian or (especially) American software developer nowadays. We all talk a lot about how the “top 1%” are screwing up society, but software developers nowadays fall mostly in the top 1-2%[1] of income earners in the US or Canada. It doesn’t feel like we’re that rich, because we’re surrounded by people who are about equally rich. And we occasionally bump into a few who are much more rich, who in turn surround themselves with people who are about equally rich, so they don’t feel that rich either.

But, we’re rich.

Based on my readership demographics, if you’re reading this, you’re probably a software developer. Do you feel rich?

It’s all your fault

So let’s trace this through. By the numbers, you’re probably a software developer. So you’re probably in the top 1-2% of wage earners in your country, and even better globally. So you’re one of those 1%ers ruining society.

I’m not the first person to notice this. When I read other posts about it, they usually stop at this point and say, ha ha. Okay, obviously that’s not what we meant. Most 1%ers are nice people who pay their taxes. Actually it’s the top 0.1% screwing up society!

No.

I’m not letting us off that easily. Okay, the 0.1%ers are probably worse (with apologies to my friend and his chronically delayed superyacht). But, there aren’t that many of them[2] which means they aren’t as powerful as they think. No one person has very much capacity to do bad things. They only have the capacity to pay other people to do bad things.

Some people have no choice but to take that money and do some bad things so they can feed their families or whatever. But that’s not you. That’s not us. We’re rich. If we do bad things, that’s entirely on us, no matter who’s paying our bills.

What does the top 1% spend their money on?

Mostly real estate, food, and junk. If they have kids, maybe they spend a few hundred $k on overpriced university education (which in sensible countries is free or cheap).

What they don’t spend their money on is making the world a better place. Because they are convinced they are not that rich and the world’s problems are caused by somebody else.

When I worked at a megacorp, I spoke to highly paid software engineers who were torn up about their declined promotion to L4 or L5 or L6, because they needed to earn more money, because without more money they wouldn’t be able to afford the mortgage payments on an overpriced $1M+ run-down Bay Area townhome which is a prerequisite to starting a family and thus living a meaningful life. This treadmill started the day after graduation.[3]

I tried to tell some of these L3 and L4 engineers that they were already in the top 5%, probably top 2% of wage earners, and their earning potential was only going up. They didn’t believe me until I showed them the arithmetic and the economic stats. And even then, facts didn’t help, because it didn’t make their fears about money go away. They needed more money before they could feel safe, and in the meantime, they had no disposable income. Sort of. Well, for the sort of definition of disposable income that rich people use.[4]

Anyway there are psychology studies about this phenomenon. “What people consider rich is about three times what they currently make.” No matter what they make. So, I’ll forgive you for falling into this trap. I’ll even forgive me for falling into this trap.

But it’s time to fall out of it.

The meaning of life

My rich friend is a fountain of wisdom. Part of this wisdom came from the shock effect of going from normal-software-developer rich to founder-successful-exit rich, all at once. He described his existential crisis: “Maybe you do find something you want to spend your money on. But, I'd bet you never will. It’s a rare problem. Money, which is the driver for everyone, is no longer a thing in my life.”

Growing up, I really liked the saying, “Money is just a way of keeping score.” I think that metaphor goes deeper than most people give it credit for. Remember old Super Mario Brothers, which had a vestigial score counter? Do you know anybody who rated their Super Mario Brothers performance based on the score? I don’t. I’m sure those people exist. They probably have Twitch channels and are probably competitive to the point of being annoying. Most normal people get some other enjoyment out of Mario that is not from the score. Eventually, Nintendo stopped including a score system in Mario games altogether. Most people have never noticed. The games are still fun.

Back in the world of capitalism, we’re still keeping score, and we’re still weirdly competitive about it. We programmers, we 1%ers, are in the top percentile of capitalism high scores in the entire world - that’s the literal definition - but we keep fighting with each other to get closer to top place. Why?

Because we forgot there’s anything else. Because someone convinced us that the score even matters.

The saying isn’t, “Money is the way of keeping score.” Money is just one way of keeping score.

It’s mostly a pretty good way. Capitalism, for all its flaws, mostly aligns incentives so we’re motivated to work together and produce more stuff, and more valuable stuff, than otherwise. Then it automatically gives more power to people who empirically[5] seem to be good at organizing others to make money. Rinse and repeat. Number goes up.

But there are limits. And in the ever-accelerating feedback loop of modern capitalism, more people reach those limits faster than ever. They might realize, like my friend, that money is no longer a thing in their life. You might realize that. We might.

There’s nothing more dangerous than a powerful person with nothing to prove

Billionaires run into this existential crisis, that they obviously have to have something to live for, and money just isn’t it. Once you can buy anything you want, you quickly realize that what you want was not very expensive all along. And then what?

Some people, the less dangerous ones, retire to their superyacht (if it ever finally gets delivered, come on already). The dangerous ones pick ever loftier goals (colonize Mars) and then bet everything on it. Everything. Their time, their reputation, their relationships, their fortune, their companies, their morals, everything they’ve ever built. Because if there’s nothing on the line, there’s no reason to wake up in the morning. And they really need to want to wake up in the morning. Even if the reason to wake up is to deal with today’s unnecessary emergency. As long as, you know, the emergency requires them to do something.

Dear reader, statistically speaking, you are not a billionaire. But you have this problem.

So what then

Good question. We live at a moment in history when society is richer and more productive than it has ever been, with opportunities for even more of us to become even more rich and productive even more quickly than ever. And yet, we live in existential fear: the fear that nothing we do matters.[6][7]

I have bad news for you. This blog post is not going to solve that.

I have worse news. 98% of society gets to wake up each day and go to work because they have no choice, so at worst, for them this is a background philosophical question, like the trolley problem.

Not you.

For you this unsolved philosophy problem is urgent right now. There are people tied to the tracks. You’re driving the metaphorical trolley. Maybe nobody told you you’re driving the trolley. Maybe they lied to you and said someone else is driving. Maybe you have no idea there are people on the tracks. Maybe you do know, but you’ll get promoted to L6 if you pull the right lever. Maybe you’re blind. Maybe you’re asleep. Maybe there are no people on the tracks after all and you’re just destined to go around and around in circles, forever.

But whatever happens next: you chose it.

We chose it.

Footnotes

[1] Beware of estimates of the “average income of the top 1%.” That average includes all the richest people in the world. You only need to earn the very bottom of the 1% bucket in order to be in the top 1%.

[2] If the population of the US is 340 million, there are actually 340,000 people in the top 0.1%.

[3] I’m Canadian so I’m disconnected from this phenomenon, but if TV and movies are to be believed, in America the treadmill starts all the way back in high school where you stress over getting into an elite university so that you can land the megacorp job after graduation so that you can stress about getting promoted. If that’s so, I send my sympathies. That’s not how it was where I grew up.

[4] Rich people like us methodically put money into savings accounts, investments, life insurance, home equity, and so on, and only what’s left counts as “disposable income.” This is not the definition normal people use.

[5] Such an interesting double entendre.

[6] This is what AI doomerism is about. A few people have worked themselves into a terror that if AI becomes too smart, it will realize that humans are not actually that useful, and eliminate us in the name of efficiency. That’s not a story about AI. It’s a story about what we already worry is true.

[7] I’m in favour of Universal Basic Income (UBI), but it has a big problem: it reduces your need to wake up in the morning. If the alternative is bullshit jobs or suffering then yeah, UBI is obviously better. And the people who think that if you don’t work hard, you don’t deserve to live, are nuts. But it’s horribly dystopian to imagine a society where lots of people wake up and have nothing that motivates them. The utopian version is to wake up and be able to spend all your time doing what gives your life meaning. Alas, so far science has produced no evidence that anything gives your life meaning.

The evasive evitability of enshittification

Sun, 15 Jun 2025 02:52:58 +0000

Our company recently announced a fundraise. We were grateful for all the community support, but the Internet also raised a few of its collective eyebrows, wondering whether this meant the dreaded “enshittification” was coming next.

That word describes a very real pattern we’ve all seen before: products start great, grow fast, and then slowly become worse as the people running them trade user love for short-term revenue.

It’s a topic I find genuinely fascinating, and I've seen the downward spiral firsthand at companies I once admired. So I want to talk about why this happens, and more importantly, why it won't happen to us. That's big talk, I know. But it's a promise I'm happy for people to hold us to.

What is enshittification?

The term "enshittification" was first popularized in a blog post by Cory Doctorow, who put a catchy name to an effect we've all experienced. Software starts off good, then goes bad. How? Why?

Enshittification proposes not just a name, but a mechanism. First, a product is well loved and gains in popularity, market share, and revenue. In fact, it gets so popular that it starts to defeat competitors. Eventually, it's the primary product in the space: a monopoly, or as close as you can get. And then, suddenly, the owners, who are Capitalists, have their evil nature finally revealed and they exploit that monopoly to raise prices and make the product worse, so the captive customers all have to pay more. Quality doesn't matter anymore, only exploitation.

I agree with most of that thesis. I think Doctorow has that mechanism mostly right. But, there's one thing that doesn't add up for me:

Enshittification is not a success mechanism.

I can't think of any examples of companies that, in real life, enshittified because they were successful. What I've seen is companies that made their product worse because they were... scared.

A company that's growing fast can afford to be optimistic. They create a positive feedback loop: more user love, more word of mouth, more users, more money, more product improvements, more user love, and so on. Everyone in the company can align around that positive feedback loop. It's a beautiful thing. It's also fragile: miss a beat and it flattens out, and soon it's a downward spiral instead of an upward one.

So, if I were, hypothetically, running a company, I think I would be pretty hesitant to deliberately sacrifice any part of that positive feedback loop, the loop I and the whole company spent so much time and energy building, to see if I can grow faster. User love? Nah, I'm sure we'll be fine, look how much money and how many users we have! Time to switch strategies!

Why would I do that? Switching strategies is always a tremendous risk. When you switch strategies, it's triggered by passing a threshold, where something fundamental changes, and your old strategy becomes wrong.

Threshold moments and control

In Saint John, New Brunswick, there's a river that flows one direction at high tide, and the other way at low tide. Four times a day, gravity equalizes, then crosses a threshold to gently start pulling the other way, then accelerates. What doesn't happen is a rapidly flowing river in one direction "suddenly" shifts to rapidly flowing the other way. Yes, there's an instant where the limit from the left is positive and the limit from the right is negative. But you can see that threshold coming. It's predictable.

In my experience, for a company or a product, there are two kinds of thresholds like this, that build up slowly and then when crossed, create a sudden flow change.

The first one is control: if the visionaries in charge lose control, chances are high that their replacements won't "get it."

The new people didn't build the underlying feedback loop, and so they don't realize how fragile it is. There are lots of reasons for a change in control: financial mismanagement, boards of directors, hostile takeovers.

The worst one is temptation. Being a founder is, well, it actually sucks. It's oddly like being repeatedly punched in the face. When I look back at my career, I guess I'm surprised by how few times per day it feels like I was punched in the face. But, the constant face punching gets to you after a while. Once you've established a great product, and amazing customer love, and lots of money, and an upward spiral, isn't your creation strong enough yet? Can't you step back and let the professionals just run it, confident that they won't kill the golden goose?

Empirically, mostly no, you can't. Actually the success rate of control changes, for well loved products, is abysmal.

The saturation trap

The second trigger of a flow change comes from outside: saturation. Every successful product, at some point, reaches approximately all the users it's ever going to reach. Before that, you can watch its exponential growth rate slow down: the infamous S-curve of product adoption.
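One common way to model that S-curve is a logistic function. The sketch below assumes that model and uses made-up numbers for market size, steepness, and midpoint; real adoption data is noisier and rarely this symmetric.

```python
import math


def logistic_users(t: float, market: float, rate: float, midpoint: float) -> float:
    """Users at time t on a logistic S-curve that saturates at `market`."""
    return market / (1.0 + math.exp(-rate * (t - midpoint)))


# Hypothetical numbers, chosen only to make the shape visible.
MARKET = 1_000_000   # every user the product will ever reach
RATE = 1.0           # steepness of the curve, per year
MIDPOINT = 8.0       # year in which half the market is reached

users = [logistic_users(year, MARKET, RATE, MIDPOINT) for year in range(0, 16)]

# Year-over-year growth factor: users this year divided by users last year.
growth = [users[i] / users[i - 1] for i in range(1, len(users))]

for year, factor in enumerate(growth, start=1):
    share = users[year] / MARKET
    print(f"year {year:2d}: {share:6.1%} of market, grew {factor:.2f}x")
```

In this model the growth factor shrinks every single year, including the early years when the product holds only a few percent of its market. The slowdown is measurable long before the curve visibly flattens.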

Saturation can lead us back to control change: the founders get frustrated and back out, or the board ousts them and puts in "real business people" who know how to get growth going again. Generally that doesn't work. Modern VCs consider founder replacement a truly desperate move. Maybe a last-ditch effort to boost short term numbers in preparation for an acquisition, if you're lucky.

But sometimes the leaders stay on despite saturation, and they try on their own to make things better. Sometimes that does work. Actually, it's kind of amazing how often it seems to work. Among successful companies, it's rare to find one that sustained hypergrowth, nonstop, without suffering through one of these dangerous periods.

(That's called survivorship bias. All companies have dangerous periods. The successful ones survived them. But of those survivors, suspiciously few are ones that replaced their founders.)

If you saturate and can't recover - either by growing more in a big-enough current market, or by finding new markets to expand into - then the best you can hope for is for your upward spiral to mature gently into decelerating growth. If so, and you're a buddhist, then you hire less, you optimize margins a bit, you resign yourself to being About This Rich And I Guess That's All But It's Not So Bad.

The devil's bargain

Alas, very few people reach that state of zen. Especially the kind of ambitious people who were able to get that far in the first place. If you can't accept saturation and you can't beat saturation, then you're down to two choices: step away and let the new owners enshittify it, hopefully slowly. Or take the devil's bargain: enshittify it yourself.

I would not recommend the latter. If you're a founder and you find yourself in that position, honestly, you won't enjoy doing it and you probably aren't even good at it and it's getting enshittified either way. Let someone else do the job.

Defenses against enshittification

Okay, maybe that section was not as uplifting as we might have hoped. I've gotta be honest with you here. Doctorow is, after all, mostly right. This does happen all the time.

Most founders aren't perfect for every stage of growth. Most product owners stumble. Most markets saturate. Most VCs get board control pretty early on and want hypergrowth or bust. In tech, a lot of the time, if you're choosing a product or company to join, that kind of company is all you can get.

As a founder, maybe you're okay with growing slowly. Then some copycat shows up, steals your idea, grows super fast, squeezes you out along with your moral high ground, and then runs headlong into all the same saturation problems as everyone else. Tech incentives are awful.

But, it's not a lost cause. There are companies (and open source projects) that keep a good thing going, for decades or more. What do they have in common?

An expansive vision that's not about money, and which opens you up to lots of users. A big addressable market means you don't have to worry about saturation for a long time, even at hypergrowth speeds. Google certainly never had an incentive to make Google Search worse.

(Update 2025-06-14: A few people disputed that last bit. Okay. Perhaps Google has occasionally responded to what they thought were incentives to make search worse -- I wasn't there, I don't know -- but it seems clear in retrospect that when search gets worse, Google does worse. So I'll stick to my claim that their true incentives are to keep improving.)

Keep control. It's easy to lose control of a project or company at any point. If you stumble, and you don't have a backup plan, and there's someone waiting to jump on your mistake, then it's over. Too many companies "bet it all" on nonstop hypergrowth and ~~don't have any way back~~ have no room in the budget, if results slow down even temporarily.

Stories abound of companies that scraped close to bankruptcy before finally pulling through. But far more companies scraped close to bankruptcy and then went bankrupt. Those companies are forgotten. Avoid it.

Track your data. Part of control is predictability. If you know how big your market is, and you monitor your growth carefully, you can detect incoming saturation years before it happens. Knowing the telltale shape of each part of that S-curve is a superpower. If you can see the future, you can prevent your own future mistakes.

Believe in competition. Google used to have this saying they lived by: "the competition is only a click away." That was excellent framing, because it was true, and it will remain true even if Google captures 99% of the search market. The key is to cultivate a healthy fear of competing products, not of your investors or the end of hypergrowth. Enshittification helps your competitors. That would be dumb.

(And don't cheat by using lock-in to make competitors not, anymore, "only a click away." That's missing the whole point!)

Inoculate yourself. If you have to, create your own competition. Linus Torvalds, the creator of the Linux kernel, famously also created Git, the greatest tool for forking (and maybe merging) open source projects that has ever existed. And then he said, this is my fork, the Linus fork; use it if you want; use someone else's if you want; and now if I want to win, I have to make mine the best. Git was created back in 2005, twenty years ago. To this day, Linus's fork is still the central one.

If you combine these defenses, you can be safe from the decline that others tell you is inevitable. If you look around for examples, you'll find that this does actually work. You won't be the first. You'll just be rare.

Side note: Things that aren't enshittification

I often see people worry about things that look like enshittification but aren't. Those things might be good or bad, wise or unwise, but that's a different topic. Tools aren't inherently good or evil. They're just tools.

"Helpfulness." There's a fine line between "telling users about this cool new feature we built" in the spirit of helping them, and "pestering users about this cool new feature we built" (typically a misguided AI implementation) to improve some quarterly KPI. Sometimes it's hard to see where that line is. But when you've crossed it, you know.

Are you trying to help a user do what they want to do, or are you trying to get them to do what you want them to do?

Look into your heart. Avoid the second one. I know you know how. Or you knew how, once. Remember what that feels like.

Charging money for your product. Charging money is okay. Get serious. Companies have to stay in business.

That said, I personally really revile the "we'll make it free for now and we'll start charging for the exact same thing later" strategy. Keep your promises.

I'm pretty sure nobody but drug dealers breaks those promises on purpose. But, again, desperation is a powerful motivator. Growth slowing down? Costs way higher than expected? Time to capture some of that value we were giving away for free!

In retrospect, that's a bait-and-switch, but most founders never planned it that way. They just didn't do the math up front, or they were too naive to know they would have to. And then they had to.

Famously, Dropbox had a "free forever" plan that provided a certain amount of free storage. What they didn't count on was abandoned accounts, accumulating every year, with stored stuff they could never delete. Even if a very good fixed fraction of users each year upgraded to a paid plan, all the ones that didn't, kept piling up... year after year... after year... until they had to start deleting old free accounts and the data in them. A similar story happened with Docker, which used to host unlimited container downloads for free. In hindsight that was mathematically unsustainable. Success guaranteed failure.

Do the math up front. If you're not sure, find someone who can.
Value pricing. (i.e., charging different prices to different people.) It's okay to charge money. It's even okay to charge money to some kinds of people (say, corporate users) and not others. It's also okay to charge money for an almost-the-same-but-slightly-better product. It's okay to charge money for support for your open source tool (though I stay away from that; it incentivizes you to make the product worse).

It's even okay to charge immense amounts of money for a commercial product that's barely better than your open source one! Or for a part of your product that costs you almost nothing.

But, you have to do the rest of the work. Make sure the reason your users don't switch away is that you're the best, not that you have the best lock-in. Yeah, I'm talking to you, cloud egress fees.
Copying competitors. It's okay to copy features from competitors. It's okay to position yourself against competitors. It's okay to win customers away from competitors. But it's not okay to lie.
Bugs. It's okay to fix bugs. It's okay to decide not to fix bugs; you'll have to sometimes, anyway. It's okay to take out technical debt. It's okay to pay off technical debt. It's okay to let technical debt languish forever.
Backward incompatible changes. It's dumb to release a new version that breaks backward compatibility with your old version. It's tempting. It annoys your users. But it's not enshittification for the simple reason that it's phenomenally ineffective at maintaining or exploiting a monopoly, which is what enshittification is supposed to be about. You know who's good at monopolies? Intel and Microsoft. They don't break old versions.

Enshittification is real, and tragic. But let's protect a useful term and its definition! Those things aren't it.

Epilogue: a special note to founders

If you're a founder or a product owner, I hope all this helps. I'm sad to say, you have a lot of potential pitfalls in your future. But, remember that they're only potential pitfalls. Not everyone falls into them.

Plan ahead. Remember where you came from. Keep your integrity. Do your best.

I will too.

NPS, the good parts

Tue, 05 Dec 2023 05:01:12 +0000

The Net Promoter Score (NPS) is a statistically questionable way to turn a set of 10-point ratings into a single number you can compare with other NPSes. That's not the good part.

Humans

To understand the good parts, first we have to start with humans. Humans have emotions, and those emotions are what they mostly use when asked to rate things on a 10-point scale.

Almost exactly twenty years ago, I wrote about sitting on a plane next to a musician who told me about music album reviews. The worst rating an artist can receive, he said, is a lukewarm one. If people think your music is neutral, it means you didn't make them feel anything at all. You failed. Someone might buy music that reviewers hate, or buy music that people love, but they aren't really that interested in music that is just kinda meh. They listen to music because they want to feel something.

(At the time I contrasted that with tech reviews in computer magazines (remember those?), and how negative ratings were the worst thing for a tech product, so magazines never produced them, lest they get fewer free samples. All these years later, journalism is dead but we're still debating the ethics of game companies sponsoring Twitch streams. You can bet there's no sponsored game that gets an actively negative review during 5+ hours of gameplay and still gets more money from that sponsor. If artists just want you to feel something, but no vendor will pay for a game review that says it sucks, I wonder what that says about video game companies and art?)

Anyway, when you ask regular humans, who are not being sponsored, to rate things on a 10-point scale, they will rate based on their emotions. Most of the ratings will be just kinda meh, because most products are, if we're honest, just kinda meh. I go through most of my days using a variety of products and services that do not, on any more than the rarest basis, elicit any emotion at all. Mostly I don't notice those. I notice when I have experiences that are surprisingly good, or (less surprisingly but still notably) bad. Or, I notice when one of the services in any of those three categories asks me to rate them on a 10-point scale.

The moment

The moment when they ask me is important. Many products and services are just kinda invisibly meh, most of the time, so perhaps I'd give them a meh rating. But if my bluetooth headphones are currently failing to connect, or I just had to use an airline's online international check-in system and it once again rejected my passport for no reason, then maybe my score will be extra low. Or if Apple releases a new laptop that finally brings back a non-sucky keyboard after making laptops with sucky keyboards for literally years because of some obscure internal political battle, maybe I'll give a high rating for a while.

If you're a person who likes manipulating ratings, you'll figure out what moments are best for asking for the rating you want. But let's assume you're above that sort of thing, because that's not one of the good parts.

The calibration

Just now I said that if I'm using an invisible meh product or service, I would rate it with a meh rating. But that's not true in real life, because even though I was having no emotion about, say, Google Meet during a call, perhaps when they ask me (after every...single...call) how it was, that makes me feel an emotion after all. Maybe that emotion is "leave me alone, you ask me this way too often." Or maybe I've learned that if I pick anything other than five stars, I get a clicky multi-tab questionnaire that I don't have time to answer, so I almost always pick five stars unless the experience was so bad that I feel it's worth an extra minute because I simply need to tell the unresponsive and uncaring machine how I really feel.

Google Meet never gets a meh rating. It's designed not to. In Google Meet, meh gets five stars.

Or maybe I bought something from Amazon and it came with a thank-you card begging for a 5-star rating (this happens). Or a restaurant offers free stuff if I leave a 5-star rating and prove it (this happens). Or I ride in an Uber and there's a sign on the back seat talking about how they really need a 5-star rating because this job is essential so they can support their family and too many 4-star ratings get them disqualified (this happens, though apparently not at UberEats). Okay. As one of my high school teachers, Physics I think, once said, "A's don't cost me anything. What grade do you want?" (He was that kind of teacher. I learned a lot.)

I'm not a professional reviewer. Almost nobody you ask is a professional reviewer. Most people don't actually care; they have no basis for comparison; just about anything will influence their score. They will not feel bad about this. They're just trying to exit your stupid popup interruption as quickly as possible, and half the time they would have mashed the X button instead but you hid it, so they mashed this one instead. People's answers will be... untrustworthy at best.

That's not the good part.

And yet

And yet. As in so many things, randomness tends to average out, probably into a Gaussian distribution, says the Central Limit Theorem.

The Central Limit Theorem is the fun-destroying reason that you can't just average 10-point ratings or star ratings and get something useful: most scores are meh, a few are extra bad, a few are extra good, and the next thing you know, every Uber driver is a 4.997. Or you can ship a bobcat one in 30 times and still get 97% positive feedback.
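
To make the bobcat arithmetic concrete, here is a minimal sketch with made-up numbers. The 30 orders and the star values are assumptions invented for illustration, not real data. One disastrous delivery barely dents either aggregate:

```python
from statistics import mean

# Hypothetical data: 30 orders, 29 fine and one bobcat.
# The star values are invented for illustration.
ratings = [5] * 29 + [1]

average_stars = mean(ratings)
# Count 4 and 5 stars as "positive" (an assumed cutoff).
positive_share = sum(1 for r in ratings if r >= 4) / len(ratings)

print(f"average: {average_stars:.2f} stars")  # average: 4.87 stars
print(f"positive: {positive_share:.0%}")      # positive: 97%
```

The person who got the bobcat is in there somewhere, but you would never find them by looking at 4.87.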

There's some deep truth hidden in NPS calculations: that meh ratings mean nothing, that the frequency of strong emotions matters a lot, and that deliriously happy moments don't average out disastrous ones.

Deming might call this the continuous region and the "special causes" (outliers). NPS is all about counting outliers, and averages don't work on outliers.

The degrees of meh

Just kidding, there are no degrees of meh. If you're not feeling anything, you're just not. You're not feeling more nothing, or less nothing.

One of my friends used to say, on a scale of 6 to 9, how good is this? It was a joke about how nobody ever gives a score less than 6 out of 10, and nothing ever deserves a 10. It was one of those jokes that was never funny because they always had to explain it. But they seemed to enjoy explaining it, and after hearing the explanation the first several times, that part was kinda funny. Anyway, if you took the 6-to-9 instructions seriously, you'd end up rating almost everything between 7 and 8, just to save room for something unimaginably bad or unimaginably good, just like you did with 1-to-10, so it didn't help at all.

And so, the NPS people say, rather than changing the scale, let's just define meaningful regions in the existing scale. Only very angry people use scores like 1-6. Only very happy people use scores like 9 or 10. And if you're not one of those you're meh. It doesn't matter how meh. And in fact, it doesn't matter much whether you're "5 angry" or "1 angry"; that says more about your internal rating system than about the degree of what you experienced. Similarly with 9 vs 10; it seems like you're quite happy. Let's not split hairs.

So with NPS we take a 10-point scale and turn it into a 3-point scale. The exact opposite of my old friend: you know people misuse the 10-point scale, but instead of giving them a new 3-point scale to misuse, you just postprocess the 10-point scale to clean it up. And now we have a 3-point scale with 3 meaningful points. That's a good part.
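
As a rough sketch, the postprocessing step might look like this. The cutoffs (1-6 angry, 7-8 meh, 9-10 happy) are the ones described above; the function name `bucket` and the sample responses are hypothetical, and the right cutoffs for any particular survey may well differ.

```python
from collections import Counter

def bucket(score: int) -> int:
    """Collapse a 1-10 rating into 1 (angry), 2 (meh), or 3 (happy)."""
    if not 1 <= score <= 10:
        raise ValueError(f"score out of range: {score}")
    if score <= 6:
        return 1  # angry; how angry doesn't matter
    if score <= 8:
        return 2  # meh
    return 3      # happy; 9 vs 10 is splitting hairs

# Hypothetical survey responses, invented for illustration.
responses = [7, 8, 8, 7, 3, 10, 8, 9, 7, 1]
counts = Counter(bucket(s) for s in responses)
print(counts[1], counts[2], counts[3])  # 2 6 2
```

Note that the respondent still sees the familiar 10-point scale; only the analysis changes.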

Evangelism

So then what? Average out the measurements on the newly calibrated 1-2-3 scale, right?

Still no. It turns out there are three kinds of people: the ones so mad they will tell everyone how mad they are about your thing; the ones who don't care and will never think about you again if they can avoid it; and the ones who had such an over-the-top amazing experience that they will tell everyone how happy they are about your thing.

NPS says, you really care about the 1s and the 3s, but averaging them makes no sense. And the 2s have no effect on anything, so you can just leave them out.

Cool, right?

Pretty cool. Unfortunately, that's still two valuable numbers but we promised you one single score. So NPS says, let's subtract them! Yay! Okay, no. That's not the good part.
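
Here is a small sketch of what the subtraction throws away. It assumes ratings already collapsed to the 1-2-3 scale; the two products and their responses are invented for illustration, and the helper names `shares` and `nps` are hypothetical.

```python
def shares(buckets):
    """Return (share of 1s, share of 2s, share of 3s) for bucketed ratings."""
    n = len(buckets)
    return tuple(sum(1 for b in buckets if b == k) / n for k in (1, 2, 3))

def nps(buckets):
    """Percent of 3s minus percent of 1s: the single-number score."""
    ones, _, threes = shares(buckets)
    return round(100 * (threes - ones))

# Two hypothetical products, 10 responses each.
polarizing = [1] * 4 + [2] * 1 + [3] * 5   # many fans, many furious people
forgettable = [2] * 9 + [3] * 1            # almost nobody feels anything

print(nps(polarizing), shares(polarizing))
print(nps(forgettable), shares(forgettable))
```

Both products score 10, yet one has 40% of its users telling everyone how mad they are and the other has none. The single number can't tell them apart; the three shares can.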

The threefold path

I like to look at it this way instead. First of all, we have computers now, we're not tracking ratings on one of those 1980s desktop bookkeeping printer-calculators, you don't have to make every analysis into one single all-encompassing number.

Postprocessing a 10-point scale into a 3-point one, that seems pretty smart. But you have to stop there. Maybe you now have three separate aggregate numbers. That's tough, I'm sorry. Here's a nickel, kid, go sell your personal information in exchange for a spreadsheet app. (I don't know what you'll do with the nickel. Anyway I don't need it. Here. Go.)

Each of those three rating types gives you something different you can do in response:

The ones had a very bad experience, which is hopefully an outlier, unless you're Comcast or the New York Times subscription department. Normally you want to get rid of every bad experience. The absence of awful isn't greatness, it's just meh, but meh is infinitely better than awful. Eliminating negative outliers is a whole job. It's a job filled with Deming's special causes. It's hard, and it requires creativity, but it really matters.
The twos had a meh experience. This is, most commonly, the majority. But perhaps they could have had a better experience. Perhaps even a great one? Deming would say you can and should work to improve the average experience and reduce the standard deviation. That's the dream; heck, what if the average experience could be an amazing one? That's rarely achieved, but a few products achieve it, especially luxury brands. And maybe that Broadway show, Hamilton? I don't know, I couldn't get tickets, because everyone said it was great so it was always sold out and I guess that's my point.

If getting the average up to three is too hard or will take too long (and it will take a long time!), you could still try to at least randomly turn a few of them into threes. For example, they say users who have a great customer support experience often rate a product more highly than the ones who never needed to contact support at all, because the support interaction made the company feel more personal. Maybe you can't afford to interact with everyone, but if you have to interact anyway, perhaps you can use that chance to make it great instead of meh.
The threes already had an amazing experience. Nothing to do, right? No! These are the people who are, or who can become, your superfan evangelists. Sometimes that happens on its own, but often people don't know where to put that excess positive energy. You can help them. Pop stars and fashion brands know all about this; get some true believers really excited about your product, and the impact is huge. This is a completely different job than turning ones into twos, or twos into threes.

What not to do

Those are all good parts. Let's ignore that unfortunately they aren't part of NPS at all and we've strayed way off topic.

From here, there are several additional things you can do, but it turns out you shouldn't.

Don't compare scores with other products. I guarantee you, your methodology isn't the same as theirs. The slightest change in timing or presentation will change the score in incomparable ways. You just can't. I'm sorry.

Don't reward your team based on aggregate ratings. They will find a way to change the ratings. Trust me, it's too easy.

Don't average or difference the bad with the great. The two groups have nothing to do with each other, require completely different responses (usually from different teams), and are often very small. They're outliers after all. They're by definition not the mainstream. Outlier data is very noisy and each terrible experience is different from the others; each deliriously happy experience is special. As the famous writer said, all meh families are alike.

Don't fret about which "standard" rating ranges translate to bad-meh-good. Your particular survey or product will have the bad outliers, the big centre, and the great outliers. Run your survey enough and you'll be able to find them.

Don't call it NPS. NPS nowadays has a bad reputation. Nobody can really explain the bad reputation; I've asked. But they've all heard it's bad and wrong and misguided and unscientific and "not real statistics" and gives wrong answers and leads to bad incentives. You don't want that stigma attached to your survey mechanic. But if you call it a satisfaction survey on a 10-point or 5-point scale, tada, clear skies and lush green fields ahead.

Bonus advice

Perhaps the neatest thing about NPS is how much information you can get from just one simple question that can be answered with the same effort it takes to dismiss a popup.

I joked about Google Meet earlier, but I wasn't really kidding; after having a few meetings, if I had learned that I could just rank from 1 to 5 stars and then not get guilted for giving anything other than 5, I would do it. It would be great science and pretty unobtrusive. As it is, I lie instead. (I don't even skip, because it's faster to get back to the menu by lying than by skipping.)

While we're here, only the weirdest people want to answer a survey that says it will take "just 5 minutes" or "just 30 seconds." I don't have 30 seconds, I'm busy being mad/meh/excited about your product, I have other things to do! But I can click just one single star rating, as long as I'm 100% confident that the survey will go the heck away after that. (And don't even get me started about the extra layer in "Can we ask you a few simple questions about our website? Yes or no")

Also, don't be the survey that promises one question and then asks "just one more question." Be the survey that gets a reputation for really truly asking that one question. Then ask it, optionally, in more places and more often. A good role model is those knowledgebases where every article offers just thumbs up or thumbs down (or the default of no click, which means meh). That way you can legitimately look at aggregates or even the same person's answers over time, at different points in the app, after they have different parts of the experience. And you can compare scores at the same point after you update the experience.

But for heaven's sake, not by just averaging them.
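
One way to do that comparison without averaging is to keep separate counts per answer, per place in the app, per version. This is only a sketch: the event log, the version labels, and the "checkout" location are all made up for illustration.

```python
from collections import Counter, defaultdict

# Hypothetical event log: (app version, place in the app, answer).
# "up"/"down" are clicks; "none" means the user ignored the widget (meh).
events = [
    ("v1", "checkout", "down"), ("v1", "checkout", "none"),
    ("v1", "checkout", "none"), ("v1", "checkout", "down"),
    ("v2", "checkout", "none"), ("v2", "checkout", "up"),
    ("v2", "checkout", "none"), ("v2", "checkout", "down"),
]

# Tally each answer separately instead of blending them into one score.
tallies = defaultdict(Counter)
for version, place, answer in events:
    tallies[(version, place)][answer] += 1

for key in sorted(tallies):
    c = tallies[key]
    print(key, "down:", c["down"], "none:", c["none"], "up:", c["up"])
```

With the counts side by side, you can see whether an update reduced the thumbs-downs, produced new thumbs-ups, or both, which are different outcomes that call for different follow-up.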

Interesting

Fri, 06 Oct 2023 20:59:31 +0000

A few conversations last week made me realize I use the word “interesting” in an unusual way.

I rely heavily on mental models. Of course, everyone relies on mental models. But I do it intentionally and I push it extra hard.

What I mean by that is, when I’m making predictions about what will happen next, I mostly don’t look around me and make a judgement based on my immediate surroundings. Instead, I look at what I see, try to match it to something inside my mental model, and then let the mental model extrapolate what “should” happen from there.

If this sounds predictably error prone: yes. It is.

But it’s also powerful, when used the right way, which I try to do. Here’s my system.

Confirmation bias

First of all, let’s acknowledge the problem with mental models: confirmation bias. Confirmation bias is the tendency of all people, including me and you, to consciously or subconsciously look for evidence to support what we already believe to be true, and try to ignore or reject evidence that disagrees with our beliefs.

This is just something your brain does. If you believe you’re exempt from this, you’re wrong, and dangerously so. Confirmation bias gives you more certainty where certainty is not necessarily warranted, and we all act on that unwarranted certainty sometimes.

On the one hand, we would all collapse from stress and probably die from bear attacks if we didn’t maintain some amount of certainty, even if it’s certainty about wrong things. But on the other hand, certainty about wrong things is pretty inefficient.

There’s a word for the feeling of stress when your brain is working hard to ignore or reject evidence against your beliefs: cognitive dissonance. Certain Internet Dingbats have recently made entire careers talking about how to build and exploit cognitive dissonance, so I’ll try to change the subject quickly, but I’ll say this: cognitive dissonance is bad… if you don’t realize you’re having it.

But your own cognitive dissonance is amazingly useful if you notice the feeling and use it as a tool.

The search for dissonance

Whether you like it or not, your brain is going to be working full time, on automatic pilot, in the background, looking for evidence to support your beliefs. But you know that; at least, you know it now because I just told you. You can be aware of this effect, but you can’t prevent it, which is annoying.

But you can try to compensate for it. What that means is using the part of your brain you have control over — the supposedly rational part — to look for the opposite: things that don’t match what you believe.

To take a slight detour, what’s the relationship between your beliefs and your mental model? For the purposes of this discussion, I’m going to say that mental models are a system for generating beliefs. Beliefs are the output of mental models. And there’s a feedback loop: beliefs are also the things you generalize in order to produce your mental model. (Self-proclaimed “Bayesians” will know what I’m talking about here.)

So let’s put it this way: your mental model, combined with current observations, produces your set of beliefs about the world and about what will happen next.

Now, what happens if what you expected to happen next, doesn’t happen? Or something happens that was entirely unexpected? Or even, what if someone tells you you’re wrong and they expect something else to happen?

Those situations are some of the most useful ones in the world. They’re what I mean by interesting.

The “aha” moment

The most exciting phrase to hear in science, the one that heralds new discoveries, is not “Eureka!” (I found it!) but “That’s funny…”

possibly

When you encounter evidence that your mental model mismatches someone else’s model, that’s an exciting opportunity to compare and figure out which one of you is wrong (or both). Not everybody is super excited about doing that with you, so you have to be respectful. But the most important people to surround yourself with, at least for mental model purposes, are the ones who will talk it through with you.

Or, if you get really lucky, your predictions turn out to be demonstrably concretely wrong. That’s an even bigger opportunity, because now you get to figure out what part of your mental model is mistaken, and you don’t have to negotiate with a possibly-unwilling partner in order to do it. It’s you against reality. It’s science: you had a hypothesis, you did an experiment, your hypothesis was proven wrong. Neat! Now we’re getting somewhere.

What follows is then the often-tedious process of figuring out what actual thing was wrong with your model, updating the model, generating new outputs that presumably match your current observations, and then generating new hypotheses that you can try out to see if the new model works better more generally.

For physicists, this whole process can sometimes take decades and require building multiple supercolliders. For most of us, it often takes less time than that, so we should count ourselves fortunate even if sometimes we get frustrated.

The reason we update our model, of course, is that most of the time, the update changes a lot more predictions than just the one you’re working with right now. Turning observations back into generalizable mental models allows you to learn things you’ve never been taught; perhaps things nobody has ever learned before. That’s a superpower.

Proceeding under uncertainty

But we still have a problem: that pesky slowness. Observing outcomes, updating models, generating new hypotheses, and repeating the loop, although productive, can be very time consuming. My guess is that’s why we didn’t evolve to do that loop most of the time. Analysis paralysis is no good when a tiger is chasing you and you’re worried your preconceived notion that it wants to eat you may or may not be correct.

Let’s tie this back to business for a moment.

You have evidence that your mental model about your business is not correct. For example, let’s say you have two teams of people, both very smart and well-informed, who believe conflicting things about what you should do next. That’s interesting, because first of all, your mental model is that these two groups of people are very smart and make right decisions almost all the time, or you wouldn’t have hired them. How can two conflicting things be the right decision? They probably can’t. That means we have a few possibilities:

The first group is right
The second group is right
Both groups are wrong
The appearance of conflict is actually not correct, because you missed something critical

There is also often a fifth possibility:

Okay, it’s probably one of the first four but I don’t have time to figure that out right now

In that case, there’s various wisdom out there involving one- vs two-way doors, and oxen pulling in different directions, and so on. But it comes down to this: almost always, it’s better to get everyone aligned to the same direction, even if it’s a somewhat wrong direction, than to have different people going in different directions.

To be honest, I quite dislike it when that’s necessary. But sometimes it is, and you might as well accept it in the short term.

The way I make myself feel better about it is to choose the path that will allow us to learn as much as possible, as quickly as possible, in order to update our mental models as quickly as possible (without doing too much damage) so we have fewer of these situations in the future. In other words, yes, we “bias toward action” — but maybe more of a “bias toward learning.” And even after the action has started, we don’t stop trying to figure out the truth.

Being wrong

Leaving aside many philosophers’ objections to the idea that “the truth” exists, I think we can all agree that being wrong is pretty uncomfortable. Partly that’s cognitive dissonance again, and partly it’s just being embarrassed in front of your peers. But for me, what matters more is the objective operational expense of the bad decisions we make by being wrong.

You know what’s even worse (and more embarrassing, and more expensive) than being wrong? Being wrong for even longer because we ignored the evidence in front of our eyes.

You might have to talk yourself into this point of view. For many of us, admitting wrongness hurts more than continuing wrongness. But if you can pull off that change in perspective, you’ll be able to do things few other people can.

Bonus: Strong opinions held weakly

Like many young naive nerds, when I first heard of the idea of “strong opinions held weakly,” I thought it was a pretty good idea. At least, clearly more productive than weak opinions held weakly (which are fine if you want to keep your job), or weak opinions held strongly (which usually keep you out of the spotlight).

The real competitor to strong opinions held weakly is, of course, strong opinions held strongly. We’ve all met those people. They are supremely confident and inspiring, until they inspire everyone to jump off a cliff with them.

Strong opinions held weakly, on the other hand, is really an invitation to debate. If you disagree with me, why not try to convince me otherwise? Let the best idea win.

After some decades of experience with this approach, however, I eventually learned that the problem with this framing is the word “debate.” Everyone has a mental model, but not everyone wants to debate it. And if you’re really good at debating — the thing they teach you to be, in debate club or whatever — then you learn how to “win” debates without uncovering actual truth.

Some days it feels like most of the Internet today is people “debating” their weakly-held strong beliefs and pulling out every rhetorical trick they can find, in order to “win” some kind of low-stakes war of opinion where there was no right answer in the first place.

Anyway, I don’t recommend it, it’s kind of a waste of time. The people who want to hang out with you at the debate club are the people who already, secretly, have the same mental models as you in all the ways that matter.

What’s really useful, and way harder, is to find the people who are not interested in debating you at all, and figure out why.