2012-06-28 »
Last week:
1. Fix a race condition that wasn't very important.
2. As part of #1, introduce a new, more subtle, much more important race condition that breaks existing functionality.
This week:
1. Find out the stresstest script I wrote has been silently not testing network load due to a bug introduced by another change (if you must know, we dropped busybox and the replacement "grep" we used doesn't parse command line args like -q). My fault for making the script insufficiently resilient to catch it.
2. Find out someone in the hardware team has noticed and fixed the bug (thanks!)
3. Find out that with network stress testing actually working, the box fails hardware tests in scary ways.
4. Find out that our friend in the hardware team, noticing point 3 while making changes, also reduced the load created by the network stress test tool so that the tests would pass.
5. Tonight they decided to not run the stress test tool at all, because after a few minutes of running the stress test tool, the system crashes, and it's interrupting the hardware verification.
Sure, you might innocently wonder if software that crashes only during a hardware stress test might, oddly enough, be revealing some kind of hardware problem. But you'd be... well, nobody knows yet :)
Why would you follow me on twitter? Use RSS.