UpGuard Blog

Group Differencing: How We Designed Our Variance Report

Written by Greg Pollock | Aug 18, 2015 8:31:00 PM

UpGuard is built to answer the fundamental questions of configuration management: how are my systems configured, are they configured correctly, what's changed since yesterday, and what's for lunch? The stuff you absolutely need to know. In its first release, UpGuard satisfied the first three by scanning and recording configuration state, continuously testing it against policies, and giving users the ability to diff configuration state over time or between nodes. But one thing was missing: the ability to diff a group of nodes all at once.

The Challenge

Defining what "group differencing" even meant was the first challenge. In a one-to-one comparison each item is in one of a known number of states: it is only on one node, only on the other, on both but different, or identical on both. A group difference expands the number of possible states as a function of the number of nodes, turning a two-dimensional problem into an n-dimensional one. Even though all of that information might be useful, we needed to simplify it into terms that humans could understand.

Make it Useful

Next we turned to math for help. There are formulas and algorithms for calculating variance but they wouldn't solve the real problem at hand. The goal of the group diff user interface isn't to measure variance; the goal is to provide the information needed to reduce it. (The goal of the CSTAR risk score, on the other hand, is to measure variance as a reporting mechanism abstracted from the gritty details of what varies.) Focusing on the rarity of a piece of information is cool if you're studying the fish in Lake Springfield, but the interests of a system administrator are more practical: you want to find and resolve problems. The output of a group diff shouldn't be a number telling you exactly how deep you're in trouble. It should be a map telling you how to get out of it.

The Solution

After a week of prototyping we arrived at a solution that made sense. We would assemble a superset of all configuration items (CIs) in the group, grade each CI on how many nodes agreed it was the right configuration, and present the findings as a heat map. 

Measuring "how different" an item is across multiple nodes has its own challenges. Each configuration item can have multiple attributes, which means that a set of several nodes can disagree about a configuration item in several different ways. One might lack the CI entirely, while three others have different versions. The more attributes, the more permutations of difference per CI.
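To make those "several different ways" concrete, here is a minimal sketch of classifying how two nodes' copies of a single CI differ. The dict-of-attributes representation is a hypothetical one chosen for illustration, not UpGuard's actual data model.

```python
def ci_difference(a, b):
    """Classify how two nodes' copies of one CI differ.

    a and b are dicts of attribute name -> value (hypothetical CI
    representation); either may be None if the CI is absent on that node.
    """
    if a is None and b is None:
        return "absent on both"
    if a is None or b is None:
        return "present on only one node"
    # Any attribute present or valued differently on one side counts.
    changed = sorted(k for k in set(a) | set(b) if a.get(k) != b.get(k))
    return "identical" if not changed else f"differs in: {changed}"
```

With more attributes per CI, the number of distinct `changed` sets (and hence the number of ways a group can disagree) grows quickly, which is why the group view needs a simpler summary than a raw pairwise comparison.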

Given that we're more interested in discovering points of interest than in getting precise but useless calculations, we created an algorithm inspired by the Raft election metaphor. Each node votes for what it thinks is the correct version of each configuration item. Where the vote is unanimous, all is well. Where there is disagreement, we count how many nodes voted for the most popular version of the item and compare that to the total number of nodes. This provides a continuum from items with high consensus (many nodes voted for the same version) to items with low consensus (each node voted for a different version).
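The voting scheme above can be sketched in a few lines. This is an illustrative reconstruction, not UpGuard's actual implementation: node and CI names are made up, and treating a missing CI as a vote for "absent" is an assumption.

```python
from collections import Counter

def consensus_scores(nodes):
    """Score each CI by the share of nodes agreeing on its most popular
    version. nodes maps node name -> {ci_name: version}."""
    # Superset of all CIs seen on any node in the group.
    all_cis = set()
    for cis in nodes.values():
        all_cis.update(cis)

    total = len(nodes)
    scores = {}
    for ci in all_cis:
        # Each node "votes" for the version it holds; a node missing
        # the CI entirely votes for absence (an assumption here).
        votes = Counter(cis.get(ci, "<absent>") for cis in nodes.values())
        most_popular_count = votes.most_common(1)[0][1]
        scores[ci] = most_popular_count / total  # 1.0 = unanimous
    return scores

# Hypothetical group of six nodes, one of which is misconfigured.
group = {
    "web1": {"nginx": "1.18"}, "web2": {"nginx": "1.18"},
    "web3": {"nginx": "1.18"}, "web4": {"nginx": "1.18"},
    "web5": {"nginx": "1.18"}, "web6": {"nginx": "1.16"},
}
```

Scores near 1.0 indicate high consensus; low scores are the hot spots a heat map would surface first.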

In this example we have six nodes that are supposed to be configured the same, and for the most part they are. As it turns out, only one node is misconfigured. Using a one-to-one comparison method, it would take several attempts and a good bit of luck to find the misconfigured node, and even then we wouldn't have any guarantee that there weren't other misconfigurations hiding in the group. Here, in one view, we see all the configuration items, whether or not they are identical across the nodes, and how they differ when they do.

What's Next

The variance report contains a lot of information and can be overwhelming, especially in environments with many configuration inconsistencies. We are looking for new ways to simplify the presentation without losing the depth of information. If you want to try it for yourself and see how well your environments match up to your expectations, you can request a demo of UpGuard by clicking the button below.
