Stuffing the stuff

without getting stuffy
Everything here is my opinion. I do not speak for your employer.
January 2006
February 2006

2006-01-16 »

Historical vs. Model-based Predictions

We've had some discussions at work lately about time predictions, and how we never seem to learn from our prediction mistakes. We have two major examples of this: first, our second-digit (4.x.0) releases tend to be about 4.5 months apart. And second, our manual QA regression tests (most tests are automated nowadays) take about 2 weeks to complete each cycle, with several cycles per release.

Those two numbers - 4.5 months and 2 weeks - are what I call historical predictions, that is, you simply measure how long an operation has taken in the past, and then predict that it will take that long in the future. As long as you aren't making major changes to your processes (eg. doing more/less beta testing, doubling the number of tests in a test cycle, etc), this is an extremely accurate method for prediction. Just do the same thing every time, and you'll predict the right timeline for next time. Since accurate predictions are extremely valuable for things like synchronizing advance marketing and sales/support training with your software release, it's a big benefit being able to predict so accurately.

But historical predictions have a big problem, which is that they're empty. All I can say is it'll take about 4.5 months to make the release - I can't say why it'll take so long, or what we'll be doing during that time. So it's very easy for people to negotiate you down in your predictions. "Come on! Can't you do it in only 4.0 months instead of 4.5? That's not much difference, and 4.5 months is too long!"

To have a serious discussion like this about reducing your timelines - which of course, is also beneficial, for lots of reasons - you can't use historical prediction. Trying to bargain with broad statistics is just expecting that wishing will make your dreams come true. I'd like it to take less than 4.5 months to make a release; that won't make it happen.

To make it actually happen, you have to use a totally different method, which I'll call model-based prediction. As you might guess from the name, the model-based system requires you to construct a detailed model of how your process works. Once you have that model, you can try changing parts of the model that take a lot of time, and see the difference it makes to the model timeline. If it makes the prediction look better, you can try that change and see if it works in real life.

Of course, successfully using that technique requires that your model actually be correct. Here's the fundamental problem: it's almost never correct. Worse, the model almost always includes a subset of what happens in real life, which means your model-based predictions will always predict less time than a historical prediction.

Now back to the discussion where someone is trying to bargain you down. Your model predicts it'll take 2.5 months to make a release. History says it'll take 4.5 months. You did change some stuff since last time; after all, you're constantly improving your release process. So when someone comes and wants you to promise 4.0 months instead of 4.5 months, you might plausibly believe that you really maybe did trim 2 weeks off the schedule; after all, the right answer must be between 2.5 months and 4.5 months somewhere. And you give in.

We make this mistake a lot. Now, I'm not going to feel too guilty about saying so, since I think pretty much every software company makes that mistake a lot. But that doesn't mean we shouldn't try to do something about it. Here's what I think is a better way:

  • Remember, your model-based prediction is worthless (ie. definitely wrong and horribly misleading) unless it predicts the same answer as your historical prediction. Tweak it until it does.

  • Each time you run the process, if your model-based prediction was wrong, look back at real life and note what parts you were wrong about, then update the model for next time. This might require taking more careful notes when you do things... maybe using some sort of... uh, schedulation system.

  • If you change your processes, start with your previous model (which must match with history!), then adjust it to match your new plan. Use the prediction from that. Remember: almost all changes to the model will not significantly affect the timeline. Most actions are not on the critical path. So you know your model is wrong if almost every change you make has some kind of significant effect. You can suspect your model is getting close if most changes don't have an effect.

  • If, after changing your processes, your model-based prediction was wrong (most common failure: it took just as long as history said it always does), then your model is wrong. Compare the new and old models, and figure out why your change was not actually on the critical path even though you thought it was. Try again.

I'm sure lots of people have written entire books about this sort of thing, but neither I, nor most programmers, and certainly not most managers, have read them. What they do instead is just ignore statistics completely and try random things, leading to a horrible pattern: "Darn it, you guys keep releasing stuff and it takes too long! We'd better drop all the weird parts of your process and do something more traditional." Then that doesn't improve the process, and it might even make things worse. But nobody will ever say, "Darn it, we're doing it just like everyone else, but it takes too long! We'd better try something randomly different!"

If you don't understand your underlying model and do what you do for a clear reason, eventually someone will talk you into doing things the formal, boring, and possibly painful and inefficient way, whether it actually helps or not. And you won't have any firepower to talk them out of it.

And that's why you can't just use historical predictions, no matter how accurate they might be. Of course, you can't just use model-based predictions either, because we know they're inaccurate. Instead, the one true answer is the one that has both models correct at the same time.

I'm CEO at Tailscale, where we make network problems disappear.

Why would you follow me on twitter? Use RSS.

apenwarr on gmail.com