Gmail outage reminds us that IT automation is not enough

Updated on June 28, 2018 by UpGuard


Gmail is amazing, but it isn't perfect. In both 2014 and 2016, the popular service suffered severe outages.

By now, your technology team has invested in automating its processes using configuration management tools. Maybe the team is also manually testing that automation. It is monitoring your critical systems. You are trying to attract and retain top-notch engineers. You're doing enough then, right? Nothing could go wrong?

I mean, you may not be Google, but... oh, hmm, yeah.

A reminder that configuration management is hard and error-prone, and that those errors can have a massive impact, comes from none other than the auteurs of automation, the commanders of complexity themselves: Google.

Yep, in 2014, thanks to a bug in "an internal system that generates configuration", Gmail was down for at least 20 (and up to 50) minutes.

"The incorrect configuration was sent to live services..., caused users’ requests for their data to be ignored, and those services, in turn, generated errors," said VP of Engineering Ben Treynor.

I don't need to explain the impact of up to 50 minutes of Gmail downtime. We should remember the event as a wake-up call (one that followed Dropbox's the previous week). Configurations matter. You need to monitor your configurations for drift. You should control them with automated policies, for compliance as well as functionality.
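To make drift monitoring and automated policies concrete, here is a minimal Python sketch. It is an illustration only: the function names (detect_drift, check_policies), the MISSING marker, and the sample settings are all hypothetical, and it does not represent how Google or any particular product implements these checks. It assumes a configuration can be read into a flat dictionary of settings.

```python
from typing import Any, Callable, Dict, List, Tuple

Config = Dict[str, Any]

# Hypothetical marker for a setting that is absent on one side.
MISSING = "<missing>"


def detect_drift(baseline: Config, current: Config) -> Dict[str, Tuple[Any, Any]]:
    """Return {setting: (expected, actual)} for every setting that differs."""
    drift = {}
    for key in sorted(baseline.keys() | current.keys()):
        expected = baseline.get(key, MISSING)
        actual = current.get(key, MISSING)
        if expected != actual:
            drift[key] = (expected, actual)
    return drift


def check_policies(
    config: Config, policies: Dict[str, Callable[[Config], bool]]
) -> List[str]:
    """Return the names of all policies that the configuration violates."""
    return [name for name, rule in policies.items() if not rule(config)]


if __name__ == "__main__":
    # Illustrative settings only; not taken from any real system.
    baseline = {"max_connections": 100, "tls_enabled": True, "log_level": "info"}
    current = {"max_connections": 100, "tls_enabled": False, "debug_port": 9000}

    for setting, (expected, actual) in detect_drift(baseline, current).items():
        print(f"DRIFT {setting}: expected {expected!r}, found {actual!r}")

    policies = {
        "TLS must be enabled": lambda c: c.get("tls_enabled") is True,
        "No debug port may be open": lambda c: "debug_port" not in c,
    }
    for violation in check_policies(current, policies):
        print(f"POLICY VIOLATION: {violation}")
```

Running a check like this before a generated configuration reaches live services, and again on a schedule afterwards, is one way to catch both a bad change and a slow divergence from the known-good state.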

You can have the best systems in the world, and the smartest people to run them, but it won't mean a thing if your configuration isn't solid.

So what can you do if even Google hasn't solved this? UpGuard Core automates your configuration testing, helping your systems stay up and remain resilient against hackers.

Request a Free Demo