Rule #1: Don’t release code on Friday

We have a rule at iotum that we don’t deploy upgrades on Friday. It’s a simple recognition that folks would like to go into the weekend without the spectre of a major outage haunting them. Much as the folks at Facebook describe their experience with deploying server upgrades, our philosophy is to deploy, listen and fix rather than polish apples. It works well for us, and it lets us crank out updates to Calliflower on a two- or three-week cycle, which means that we can be much more responsive to customers’ feature requests than a traditional software house can. It also means that releasing on Friday is a bad idea, because we often find small gotchas after any release.

We should have followed our own rule last Friday. Instead, on Friday afternoon Noam (our Director of Products and Engineering) and I decided to deploy a “critical” fix that expanded our PIN range from 4 digits to 5 digits. We had a customer who needed to send invitations for a public call to more than 10,000 people, and a 4-digit range tops out at 10,000 PINs. Noam proposed we make the upgrade and, before I could object, assured me that he would personally babysit the servers over the weekend. I don’t know about you, but I like that kind of dedication, so I agreed and said “Yes, let’s break the rule”.

Murphy struck. The ramifications of that change flowed through the Calliflower system like a bad smell through an air conditioning duct. By about 8 pm, complaints from upset customers started to flow in, and we had a small-scale emergency on our hands. Changing the code wasn’t enough: voice prompts had to be re-recorded, switches restarted, and more. By 1:43 in the morning everything was in good shape again, after an evening of yeoman’s work by Noam and senior developer Rob.

Thank goodness it was the Friday evening of a holiday weekend. Not too many people were affected.

To those Calliflower users who were affected by these problems, please accept our humble apologies.

And for the rest of you reading this… well, that’s why we don’t release code on Fridays.

  • Michael Graves September 1, 2008, 4:43 pm

    Good rule. We have a series of similar rules where I work.

    #2 never install a patch/upgrade unless you have 24 hours on-site available to monitor its behavior.

    #3 never trust development to give you the real skinny on the behavior of their latest patch. Ask someone in support for an impartial opinion.

    #4 don’t be afraid to revert to known-good production code if you can’t get 24 hours of clean operation before you have to leave the customer’s site.