Saturday, April 30, 2011

Oops: Amazon explains crash that 'broke the internet' was down to engineers botching a routine system upgrade - 30th Apr 2011

In a detailed letter, Amazon finally explained today why its much-vaunted EC2 cloud computing network crashed last week.

The grovelling explanation - described by one customer as being like a 'Catholic penance' in its length - told of how a routine server upgrade gone wrong caused a cascade of further problems that took down thousands of websites in a 'perfect storm' last Thursday.

In its confessional letter, Amazon promised to learn lessons from the crash and offered customers affected 10 free days of storage to compensate them for their loss.

Speaking to The Register, Justin Santa Barbara - whose company FathomDB was affected by the outage - said: 'Judging by the length [of the apology], we can understand what took them so long. I am sure everyone would have appreciated more details during the outage itself, so that we could make an informed restore vs. ride it out decision, rather than continually being told "just a few more minutes" until we lose faith.'