Gmail outage reminds us that IT automation is not enough
Updated on July 12, 2016
So you do a bit of IT automation. Maybe you throw in some functional testing for that IT automation too. You have monitoring. You have a top notch engineering team. You're doing enough then, right? Nothing could go wrong?
I mean, you may not be Google, but... oh, hmm, yeah.
Our reminder of the week that configuration management is hard, that it is prone to errors, and that those errors can have a massive impact comes from none other than the auteurs of automation, the commanders of complexity themselves, Google.
"The incorrect configuration was sent to live services..., caused users’ requests for their data to be ignored, and those services, in turn, generated errors," said VP of Engineering Ben Treynor.
I don't need to explain what up to 50 minutes of Gmail downtime represents. Consider the event a community service announcement though (one that follows Dropbox's from the previous week). Configurations matter. You need to monitor your configurations for drift. You should control them through executable policies, for compliance as well as functionality.
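To make "drift" and "executable policies" a little more concrete, here is a minimal sketch in Python. It is not how Google or UpGuard does it. The config keys (`tls_enabled`, `request_timeout_s`, `debug`), the two policies, and the function names are all made up for illustration. The idea is simply to compare a live configuration against a known-good baseline, then run the live configuration through a list of checks written as code.

```python
ABSENT = "<absent>"  # placeholder for a key that exists on only one side


def find_drift(baseline: dict, live: dict) -> dict:
    """Map each key whose value differs to a (baseline, live) pair."""
    drift = {}
    for key in sorted(baseline.keys() | live.keys()):
        expected = baseline.get(key, ABSENT)
        actual = live.get(key, ABSENT)
        if expected != actual:
            drift[key] = (expected, actual)
    return drift


# Hypothetical policies: a description plus a check that returns True on pass.
POLICIES = [
    ("TLS must be enabled",
     lambda c: c.get("tls_enabled") is True),
    ("request timeout must be between 1 and 30 seconds",
     lambda c: 1 <= c.get("request_timeout_s", 0) <= 30),
]


def violated_policies(config: dict) -> list:
    """Return the description of every policy the config fails."""
    return [desc for desc, check in POLICIES if not check(config)]


if __name__ == "__main__":
    # Illustrative data only: a known-good baseline and a drifted live config.
    baseline = {"tls_enabled": True, "request_timeout_s": 10}
    live = {"tls_enabled": False, "request_timeout_s": 10, "debug": True}

    print(find_drift(baseline, live))
    print(violated_policies(live))
```

Run as a script, this reports that `tls_enabled` changed and `debug` appeared, and that the TLS policy fails. The useful part is running checks like these before a configuration reaches live services, not after.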
You can have the best systems in the world, and the smartest people to run them, but it won't mean a thing if your configuration isn't solid.
The good news? There's at least one thing Google didn't have that you can have today: UpGuard (shameless plug). UpGuard simplifies configuration management and helps you activate DevOps.
Misconfigurations are an internal problem, one that emanates from within the IT infrastructure of any enterprise; no hacker is necessary for massive damage to occur to digital systems and stored data. And the problem is pervasive, with Gartner estimating that anywhere from 70% to 99% of data breaches result not from external, concerted attacks, but from internal misconfiguration of the affected IT systems.