We have a rule at iotum that we don’t deploy upgrades on Friday. It’s a simple recognition that folks would like to go into the weekend without the spectre of a major outage haunting them. Much as the folks at Facebook describe their experience with deploying server upgrades, our philosophy is to deploy, listen and fix rather than polish apples. It works well for us, and lets us crank out updates to Calliflower on a two- or three-week cycle, which means that we can be much more responsive to customers’ feature requests than a traditional software house can. It also means that releasing on Friday is a bad idea, because we often find small gotchas after any release.
We should have followed our own rules last Friday. Instead, on Friday afternoon, Noam (our Director of Products and Engineering) and I decided to deploy a “critical” fix that upgraded our PIN range from 4 digits to 5 digits. We had a customer who needed to send invitations to a public call to more than 10,000 people, and 4 digits allow only 10,000 distinct PINs. Noam proposed we make the upgrade, and before I could object, assured me that he would babysit the servers over the weekend personally. I don’t know about you, but I like that kind of dedication, so I agreed and said “Yes, let’s break the rule.”
Murphy struck. The ramifications of that change flowed through the Calliflower system like a bad smell through an air conditioning duct. By about 8 pm, complaints from upset customers had started to come in and we had a small-scale emergency on our hands. It wasn’t enough to change code. Voice prompts had to be re-recorded, switches restarted and more. At 1:43 in the morning everything was in good shape again, after an evening of yeoman’s effort by Noam and senior dev Rob.
Thank goodness it was Friday evening of a holiday weekend. Not too many people were affected.
For those Calliflower users who were affected by these problems, please accept our humble apologies.
And for the rest of you reading this… well, that’s why we don’t release code on Fridays.