Chegg 0/1 Bug Priority

I’ve previously written about how to get rid of your bug backlog. I used that at SkyPort Systems to radically trim the backlog, and surface and prioritize code infrastructure that needed to be upgraded. It was very useful for justifying code infrastructure improvements to say “we have 5 software issues that can only be fixed if we do X”.

As an aside on that post, I threw off the idea at the end that you only needed 3 bug priorities:

  • Holy Shit! Nobody goes home until this is fixed
  • FIFO: do bugs in the order received
  • No, we’re not going to fix this

Now, I’ve always believed that the severity of a bug does not equal its priority. A misspelling on the home page may be cosmetic, but it’s actually super high-priority. At Pace, I had to promise internally that I would read every bug, but in return I wanted people to stop setting every bug to Priority 1 just so someone would read it. Priority 1 was off limits; only I, as the engineering manager, could set it. Similarly, there was a bottom status, Duly Noted, which meant we’re not going to fix this. Again, it was my job to set that, and I was being honest. The purpose of your bug tracking system is to provide a structured way for the rest of the company to communicate with Software Engineering. Engineers don’t fix bugs; they fix bug reports. If one side is lying, communication has broken down. Things got better, quickly.

The engineering management at Chegg came up with something even better. For a long time at Chegg, bugs had been lasting forever. Then, for the 4th book return cycle in a row, the email with the link to print your shipping label had the same typo. That typo broke the shipping label URL, which meant nearly every customer had to call in at $30 a pop. It hadn’t gotten fixed because it was “cosmetic”. Once Rush was over, it wasn’t important to fix it… The CIO and CTO blew their stack that 4th time.

So they came up with the 3 states above: 0, 1, and “Closed”. After a couple of months, they implemented a Service Level Agreement that all bugs had to be fixed or closed within 3 months. Reducing the priorities to just 2 turned out to be genius. A 0 meant the site was broken; a 1 was everything else, handled first in, first out.
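For the curious, here is a minimal sketch of how that scheme might look in code. It is not Chegg’s actual tracker: the `Bug` fields, the function names, the sample bugs, and the choice to treat “3 months” as 90 days are all my own assumptions for illustration.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from enum import Enum


class Status(Enum):
    P0 = 0      # the site is broken; fix it now
    P1 = 1      # everything else; fixed in the order received
    CLOSED = 2  # we're not going to fix this


# Assumption: "3 months" approximated as 90 days.
SLA = timedelta(days=90)


@dataclass
class Bug:
    id: int
    title: str
    status: Status
    opened: date


def work_queue(bugs):
    """Open bugs: 0s first, then oldest first (FIFO) within each status."""
    open_bugs = [b for b in bugs if b.status is not Status.CLOSED]
    return sorted(open_bugs, key=lambda b: (b.status.value, b.opened, b.id))


def past_sla(bugs, today):
    """Open bugs that have sat in the database longer than the SLA allows."""
    return [
        b for b in bugs
        if b.status is not Status.CLOSED and today - b.opened > SLA
    ]


# Hypothetical sample data, purely for illustration.
bugs = [
    Bug(1, "Typo in shipping label email", Status.P1, date(2024, 1, 5)),
    Bug(2, "Checkout page returns 500", Status.P0, date(2024, 3, 1)),
    Bug(3, "Logo slightly off-center", Status.CLOSED, date(2024, 1, 2)),
    Bug(4, "Search ignores ISBN dashes", Status.P1, date(2024, 2, 10)),
]
today = date(2024, 4, 10)

queue_ids = [b.id for b in work_queue(bugs)]
overdue_ids = [b.id for b in past_sla(bugs, today)]
print(queue_ids)    # [2, 1, 4]
print(overdue_ids)  # [1]
```

Notice there is nothing to tune: the only ordering decision anyone makes is “is the site broken?”, and everything else falls out of the date the bug was filed.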

Humans have leaky brains. What did you have for lunch? For breakfast? Lunch a week ago? The longer a bug sits in the database, the harder it is to fix. If you push the code to the repo, and get a build, and QA finds a bug in 5 minutes, it often takes 1 minute or less to fix it, because the code is still in your head, and most bugs are stupid. Wait a week? Harder. Wait 3 months? What was I doing again? Where was that? Oh, 20 people have stepped on that file since, I have no idea.

As a company, we were spending a lot of time massaging the bugs instead of fixing them. The intuition that it was important to set the priority was just wrong. 90% of the bugs are just fine with First In First Out. Productivity, measured by bug-fixing speed, jumped; the software managers and product managers had more time; and no one had to go to one of those awful “bug backlog review” meetings.

So it was a total win.
