When we set out to create a cloud-based tool for configuration monitoring, we used the tools we knew and wrote UpGuard using JRuby. For our application, JRuby had many good qualities: getting started only required a one-line install, the agent only needed to talk out on port 443, and it was platform agnostic. Using JRuby we demonstrated the value of system visibility, attracted our first cohort of customers, and raised the funds to expand UpGuard. Now we're not only scrapping that agent, we're moving away from an agent-based architecture altogether. Here's why.
As we've learned more about configuration monitoring and discovery, both from our own R&D and from working with our customers, we've realized that an agentless connection manager is better than an agent installed on every host. Given our investment in JRuby, our first attempt was to use the JRuby agent as a connection manager. As we expanded to deployments of tens of thousands of nodes, the performance and memory requirements of JRuby quickly became problematic. (If you want to argue about JRuby, here's the post where you should do it.) Given the position of the company at the time, we had the rare luxury of choosing between optimizing the 2-3 year old code base to meet the needs of our customers and rewriting it altogether. Rather than accepting the limits of JRuby, we rewrote our application in Go to achieve the goal of an agentless connection manager with the best technology available.
We are proud to announce two great things simultaneously: the launch of our fully featured agentless connection manager, and a tech refresh that improves UpGuard's benchmarks across the board.
The Argument for Agentless
1) Maintenance costs. Far and away, the biggest argument for an agentless design is to slash maintenance costs. UpGuard is intended to make large infrastructures easier to manage and to increase the server/admin ratio. An installed agent means additional costs for users whenever updates are rolled out. There is absolutely no way around that problem with installed agents. More subtly, since those updates will have to make it through a change management process, it also means that (potentially many) different versions of UpGuard will be deployed at any given time. More supported versions = higher likelihood that something will break and need fixing. Supporting fewer versions also means we can provide a higher level of support for the current connection manager and deliver requested features more quickly.
2) Better performance. This is an effect of Go rather than of the agentless architecture, but it's still a good thing. In performance tests, the connection manager has doubled in speed while using at least an order of magnitude less memory and (obviously) incurring no installation footprint.
3) On-premises deployments. Packaging UpGuard as a virtual appliance for on-premises deployments is a popular option and we wanted to improve the quality of that offering. As described in our post on using Golang, one of the initial reasons for the rewrite was to get away from the unnecessarily large footprint of the JVM required for on-premises installations. The connection manager is now easily packaged for deployment behind the firewall and will self-update with new versions.
OK, so those are the arguments for an agentless connection manager. Lowered maintenance costs = greater return on investment and fewer headaches. Plus, it's faster and lighter because it uses technologies like Golang that were not as well developed two years ago.
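To make the agentless idea concrete, here is a minimal sketch in Go of one central process scanning many nodes with a fixed number of workers, so nothing is installed on the nodes and memory use stays bounded. This is an illustration only, not UpGuard's implementation: `ScanFunc`, `Result`, `ScanAll`, and `fakeScan` are hypothetical names, and the stub scanner stands in for whatever remote protocol a real connection manager would use.

```go
package main

import (
	"fmt"
	"sort"
	"sync"
)

// ScanFunc is a hypothetical stand-in for the remote call a connection
// manager would make to read configuration state from a node.
// It is an assumption for this sketch, not UpGuard's API.
type ScanFunc func(host string) (map[string]string, error)

// Result pairs a host with the configuration items read from it.
type Result struct {
	Host  string
	Items map[string]string
	Err   error
}

// ScanAll scans every host from one central process, running at most
// `workers` scans at a time. Results are returned sorted by host name.
func ScanAll(hosts []string, workers int, scan ScanFunc) []Result {
	if workers < 1 {
		workers = 1
	}
	jobs := make(chan string)
	results := make(chan Result)

	var wg sync.WaitGroup
	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for h := range jobs {
				items, err := scan(h)
				results <- Result{Host: h, Items: items, Err: err}
			}
		}()
	}

	// Feed the workers, then signal that no more hosts are coming.
	go func() {
		for _, h := range hosts {
			jobs <- h
		}
		close(jobs)
	}()

	// Close the results channel once every worker has finished.
	go func() {
		wg.Wait()
		close(results)
	}()

	out := make([]Result, 0, len(hosts))
	for r := range results {
		out = append(out, r)
	}
	sort.Slice(out, func(i, j int) bool { return out[i].Host < out[j].Host })
	return out
}

// fakeScan is a stub used only for demonstration; it never touches the network.
func fakeScan(host string) (map[string]string, error) {
	if host == "" {
		return nil, fmt.Errorf("empty host name")
	}
	return map[string]string{"hostname": host}, nil
}

func main() {
	for _, r := range ScanAll([]string{"web-02", "web-01", "db-01"}, 2, fakeScan) {
		fmt.Println(r.Host, r.Items, r.Err)
	}
}
```

In a design like this, rolling out a new version means updating the one process that runs `ScanAll`, rather than pushing an update through change management on every node.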
The Argument for Agents
We also looked at the arguments for agents and believe that the connection manager addresses all those points. Datadog wrote a post some time ago explaining why they use an agent. Their arguments might apply to performance monitoring, but for inspecting configurations and building a system of record they don't make as much sense.
1) Sufficient API access for cloud apps. Not a problem here. If anything, more configuration items are available through the API, as in the case of CloudFlare, and there's no reason we would bump against the rate limit for API requests.
2) Sampling rate. Performance monitoring needs one-second intervals. Configuration monitoring does not. We know you're a continuous delivery rockstar, but you are not pushing changes out every second. And if you are, UpGuard provides targeted testing that can be executed much more quickly than the complete state scan.
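One way to picture why a coarse interval is enough: configuration monitoring cares about what differs between two scans, not about a continuous stream of samples. The sketch below, in Go, compares two snapshots and reports the drift. It is an assumption-laden illustration, not how UpGuard stores or compares state: `Snapshot` and `Diff` are hypothetical names, and real configuration items are richer than flat key/value pairs.

```go
package main

import (
	"fmt"
	"sort"
)

// Snapshot is a hypothetical flat view of one node's configuration
// at the time of a scan (item name -> value).
type Snapshot map[string]string

// Diff returns a sorted list of human-readable changes between two scans.
func Diff(prev, curr Snapshot) []string {
	var changes []string
	for k, old := range prev {
		now, ok := curr[k]
		switch {
		case !ok:
			changes = append(changes, fmt.Sprintf("removed %s (was %q)", k, old))
		case now != old:
			changes = append(changes, fmt.Sprintf("changed %s: %q -> %q", k, old, now))
		}
	}
	for k, now := range curr {
		if _, ok := prev[k]; !ok {
			changes = append(changes, fmt.Sprintf("added %s = %q", k, now))
		}
	}
	sort.Strings(changes)
	return changes
}

func main() {
	// Two scans of the same node; the values are made up for the example.
	monday := Snapshot{"openssl": "1.0.1f", "sshd.PermitRootLogin": "no", "ntp": "4.2.6"}
	tuesday := Snapshot{"openssl": "1.0.1g", "sshd.PermitRootLogin": "no", "nginx": "1.6.0"}

	for _, c := range Diff(monday, tuesday) {
		fmt.Println(c)
	}
}
```

However many times an item changed between the two scans, the comparison still shows that the end state differs, which is the question a system of record needs answered.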
We also considered the virtues of the old UpGuard agent.
3) Effort to install. We are working to make installation of the connection manager as easy as possible, but even with slightly more work up front, the long-term savings on maintenance very quickly outweigh the initial benefits of the agent's one-line install. For example, one way we'll be improving UpGuard distribution is to package it with Docker, offering an even simpler, trusted install.
4) Platform support. The connection manager supports every OS that the agent did and, because it is much lighter, is also suitable for deployment on systems with tighter memory and disk constraints. For example, UpGuard will soon be shipped on lightweight components like enterprise IoT devices.
All things considered, our new connection manager is a huge step forward for UpGuard. Just as Ansible has made the case for agentless configuration management, we see an agentless connection manager as the future of system state monitoring.