At UpGuard we've got many decades of experience in large enterprises and are very familiar with the sorts of problems that arise in those environments. Even for those of us who have lived through them, though, they can be hard to explain to people who haven't. That's why we require all our new employees to read The Phoenix Project: A Novel About IT, DevOps, and Helping Your Business Win by Gene Kim, Kevin Behr and George Spafford. It does a great - and surprisingly entertaining - job of describing these issues. It also explains how the lessons learnt from years of Lean Manufacturing apply directly to IT. We know that no tool is a silver bullet, but if the employees at Parts Unlimited had had UpGuard, it might have been an entirely different story. I've chosen some key excerpts from the book to show how things could have gone differently.
1. Audit Requirements
“‘Issue 127. Insecure Windows operating system MAX_SYN_COOKIE setting’? Is this a joke? In case you haven’t heard, we’ve got a real business to run. Sorry if that interferes with this full-time audit employment racket you’ve got going on here.”
This quote comes from Wes, the Director of Distributed Technology Operations, after being given a giant pile of audit findings that he will need to address. With UpGuard he would get one of his team members to create an Environment Variable test for that setting along with the others required by audit and then schedule it to run regularly against every node in the system. Within seconds they could identify any failing machines. Once they were fixed he could show the auditors the big green tick on the UpGuard report. End of story. That would've made the book a bit short though - perhaps they could have made it into an eBook, like all the other cool kids are doing.
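As a rough illustration of the idea (this is not UpGuard's API; the `SettingCheck` class, the `failing_nodes` function, the node names and the expected value "1" are all assumptions made up for this sketch), a check like the one audit asked for boils down to comparing one setting against an expected value on every node:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SettingCheck:
    """One audit requirement: a setting name and the value audit expects."""
    name: str
    expected: str


def failing_nodes(check: SettingCheck, node_settings: dict[str, dict[str, str]]) -> list[str]:
    """Return the nodes where the setting is missing or has the wrong value."""
    return sorted(
        node
        for node, settings in node_settings.items()
        if settings.get(check.name) != check.expected
    )


# Hypothetical scan results: node name -> settings collected from that node.
scanned = {
    "web01": {"MAX_SYN_COOKIE": "1"},
    "web02": {"MAX_SYN_COOKIE": "0"},
    "db01": {},  # the setting is not present at all
}

# The expected value is an assumption; the book only names the setting.
check = SettingCheck(name="MAX_SYN_COOKIE", expected="1")
print(failing_nodes(check, scanned))
```

Treating a missing setting as a failure is a deliberate choice here: an auditor usually cares that the value is provably right, not just that it isn't provably wrong.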
2. What Version?
“Yes, we copied the file that you gave us… Yes, it’s version 1.0.13… What do you mean it’s the wrong version… What? When did you change that?… Copy it now and try again… Okay, look, but I’m telling you this isn’t going to work… I think it’s a networking problem… What do you mean we need to open up a firewall port? Why the hell didn’t you tell us this two hours ago? Goddamnit!”
Another successful deployment by Parts Unlimited. Barely tested code is released into Production to meet a project deadline and all hell breaks loose. If these configurations were captured within UpGuard, issues like this would be far less likely to arise. The configuration requirements of everyone (developers, ops and security) would be verifiable at the push of a button. An incorrect component version like the one above would be found in seconds. Better still, with these tests being run through all environments, and the ability to run them on a schedule, it's unlikely this would ever have made it to Production.
Finding incorrect versions is also a trivial affair with UpGuard. With the platform's powerful configuration search engine, queries for version numbers and/or package names can be performed easily across environments.
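To make the version hunt concrete, here is a minimal sketch of a cross-environment version query. It is not how UpGuard's search is implemented; the `find_version_mismatches` function, the inventory layout, the `orders-app` package name and the expected version 1.0.14 are hypothetical (only 1.0.13 comes from the quote above):

```python
def find_version_mismatches(inventory, package, expected_version):
    """Return (environment, node, found_version) for every node not on the expected version.

    found_version is None when the package is not installed on that node.
    """
    mismatches = []
    for env, nodes in inventory.items():
        for node, packages in nodes.items():
            found = packages.get(package)
            if found != expected_version:
                mismatches.append((env, node, found))
    # Sort by environment and node only, so a None version never gets compared.
    return sorted(mismatches, key=lambda m: (m[0], m[1]))


# Hypothetical inventory: environment -> node -> installed package versions.
inventory = {
    "dev": {"dev01": {"orders-app": "1.0.14"}},
    "qa": {"qa01": {"orders-app": "1.0.14"}},
    "prod": {
        "prod01": {"orders-app": "1.0.13"},  # the stale copy from the quote
        "prod02": {},                         # never deployed
    },
}

for env, node, found in find_version_mismatches(inventory, "orders-app", "1.0.14"):
    print(f"{env}/{node}: found {found}, expected 1.0.14")
```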
3. Unauthorized Change Tracking
"In order to prevent this from happening again, we’re putting together a project to monitor our critical systems for unauthorized changes."
Our hero Bill Palmer, the newly appointed VP of IT Operations, is explaining one of his initiatives to try to stop any further outages. With configuration settings captured within UpGuard and tested on a regular basis, Parts Unlimited would not need a special project to monitor for unauthorized changes; UpGuard would have it covered. As soon as something changed unexpectedly, the appropriate teams would be notified.
Additionally, UpGuard's intuitive interface makes it easy to scan and identify changes across your environment visually.
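Conceptually, spotting an unexpected change means comparing the latest scan of a node against a known-good baseline. The sketch below shows that comparison in its simplest form; the `diff_snapshots` function and the configuration keys are invented for illustration and say nothing about how UpGuard stores or compares scans:

```python
def diff_snapshots(baseline, current):
    """Compare two {item: value} snapshots and report what was added, removed or changed."""
    added = {k: current[k] for k in current.keys() - baseline.keys()}
    removed = {k: baseline[k] for k in baseline.keys() - current.keys()}
    changed = {
        k: (baseline[k], current[k])
        for k in baseline.keys() & current.keys()
        if baseline[k] != current[k]
    }
    return {"added": added, "removed": removed, "changed": changed}


# Hypothetical snapshots of one server, taken a day apart.
baseline = {"port.443": "open", "pkg.openssl": "1.1.1", "user.deploy": "present"}
current = {"port.443": "open", "pkg.openssl": "1.0.2", "port.8080": "open"}

report = diff_snapshots(baseline, current)
if any(report.values()):
    # In practice this is where the owning team would be notified.
    print("Unexpected change detected:", report)
```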
4. Key Resource Dependencies
"No one knows what’s in Brent’s head. This is the classic IT knowledge problem. System information is stuck in people’s heads and when things go wrong, you have to rely on those few people."
This is Erik Reid, the Yoda-like mentor and board candidate with a manufacturing background who helps Bill along the way. Here he's pointing out the problem with having critical IT knowledge trapped in the head of one key resource. UpGuard is the perfect platform for storing this knowledge. Not only is it quick and easy to capture there, but once in UpGuard it becomes executable, which is the only way to guarantee it is kept up to date.
5. Preparing for Automation
"You’re generating documentation that will enable you to automate some of them.”
Automation is a great goal to have, but until you know what you've got - and are able to validate it - getting to automation is going to be an expensive, bug-riddled process. And once you are ready to automate, UpGuard simplifies your automation workflow by natively outputting to the tool of your choice, be it Puppet, Chef, or Ansible.
6. Errors - One Time is Enough
"A couple of the early ones we delivered had a few configuration errors or were missing something. We’ve corrected it in the work instructions."
UpGuard supports a test-driven approach: when a configuration error is found, you write a test to check for it, make sure the test picks up the failure, then fix the problem and rerun the test to make sure it passes. Make sure the Remediation field is filled in with information on how to fix the problem. In this way you are protected against recurring issues by having a documented test - and solution - for every problem you've ever come up against.
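The fail-fix-pass loop described above can be sketched in a few lines. Everything here is an assumption for illustration (the `ConfigTest` class, the `run` function, the `ntp_server` setting and its value); it simply shows a test that carries its own remediation note, first failing and then passing once the fix is applied:

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class ConfigTest:
    """A named check plus the documented fix to apply when it fails."""
    name: str
    passes: Callable[[dict], bool]
    remediation: str


def run(test: ConfigTest, config: dict) -> str:
    """Return a one-line result; failures include the remediation note."""
    if test.passes(config):
        return f"PASS {test.name}"
    return f"FAIL {test.name}: {test.remediation}"


# Hypothetical configuration error found on a newly delivered server.
config = {"ntp_server": ""}

ntp_test = ConfigTest(
    name="ntp_server is set",
    passes=lambda c: bool(c.get("ntp_server")),
    remediation="Set ntp_server in the node's time configuration.",
)

before = run(ntp_test, config)                 # 1. confirm the test catches the error
config["ntp_server"] = "ntp.example.internal"  # 2. apply the fix
after = run(ntp_test, config)                  # 3. rerun to confirm it now passes
print(before)
print(after)
```

Seeing the test fail first matters: a test that has never failed gives no evidence that it can detect the problem it was written for.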
7. So Many Requirements
"...there’s over twenty steps involving at least six different teams! You need the OS and all the software packages, license keys, dedicated IP address, special user accounts set up, mount points configured, and then you need the IP addresses to be added to an ACL list on some file server. In this particular case, the requirements say that we need a physical server, so we also need a router port, cabling, and a server rack where we have enough space.”
Patty, the Director of IT Service Support, is not happy that star engineer Brent underestimated a task: without coordination with the other teams, his estimate covered only part of the job. Collaboration is highly encouraged within UpGuard. Packages can be shared so that each team can add its relevant steps, so that the big picture is captured and is visible to everyone involved.
8. Configuration Drift
“There should be absolutely no way that the Dev and QA environments don’t match the production environment.”
Bill, coming up against a very common problem. A bug is found and fixed in Production but doesn't flow back down to the Dev and QA environments. UpGuard protects against manual changes by detecting server drift when tests are run regularly. It can also detect differences between servers that should be the same and visually highlight differences. Furthermore, UpGuard's advanced capabilities like group differencing and variance analysis are essential to combating drift across complex environments.
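One simple way to picture group differencing is to take servers that should be identical, group them by the value each one reports for a setting, and flag any setting that produces more than one group. The sketch below does exactly that; the `find_drift` function, the environment names and the settings are hypothetical, and it is only an illustration of the idea rather than UpGuard's implementation:

```python
from collections import defaultdict


def find_drift(nodes):
    """Return {setting: {value: [nodes]}} for settings that differ between nodes.

    A node that lacks a setting is grouped under the value None.
    """
    keys = set().union(*(settings.keys() for settings in nodes.values()))
    drift = {}
    for key in sorted(keys):
        groups = defaultdict(list)
        for node in sorted(nodes):
            groups[nodes[node].get(key)].append(node)
        if len(groups) > 1:  # more than one distinct value means drift
            drift[key] = dict(groups)
    return drift


# Hypothetical case from the text: a fix applied in Production never
# flowed back to Dev and QA.
environments = {
    "dev": {"app": "2.3.1", "tls": "1.2"},
    "qa": {"app": "2.3.1", "tls": "1.2"},
    "prod": {"app": "2.3.2", "tls": "1.2"},
}

for setting, groups in find_drift(environments).items():
    print(setting, groups)
```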
9. Nailed It
“Good... because you’re finally getting those steps documented, you’re able to enforce some level of consistency and quality, as well.”
Thanks Erik, I couldn't have said it better myself.
As you can see, if Parts Unlimited had been using UpGuard in conjunction with the other techniques used in the book, then it would have been quite a different story. If you haven't read it yet, go and get yourself a copy, and if you'd like to know more about UpGuard, check out our Product page.