Updated on April 30, 2018 by UpGuard
In our first article on cloud leaks, we took a look at what they are and why they should be classified separately from other cyber incidents. To understand how cloud leaks happen and why they are so common, we need to step back and first look at the way that leaked information is generated, manipulated, and used. It’s almost taken for granted that these huge sets of sensitive data exist and that companies are doing something with them, but when you examine the practice of information handling, it becomes clear that organizing a resilient process is quite difficult at scale. Operational gaps and process errors lead to vulnerable assets, which in turn lead to cloud leaks.
One of the great things about digital data is that it can be reproduced cheaply and without degradation. Organizations typically have several copies of production data aside from the one being used in their business processes. Backups, warehousing, disaster recovery, development and testing environments, in-house analytics, outsourced analytics, a laptop someone copied the data to so they could work from home– the list goes on. The point is that data is at once many and one, like how the word data itself is confused between plural and singular. Many copies of a dataset can exist, but it only takes one exposed copy to constitute a breach. The more copies of a dataset that exist, the higher the risk that one of them will be exposed.
Think of the data as passing through a chain of custody. Every copy of the data lives on a server or disk array, passes over network devices and through firewalls, and occasionally ends up on a workstation or laptop. Often it ends up in the cloud, in an internet-hosted storage instance, and, like any other link in the chain, it is at risk of exposure if not properly configured.
These practices occur in nearly every industry, increasing with scale. It’s not just political campaign analytics, or telecommunications, or defense contractors– it’s everyone. Digitization is a fundamental change in the abstract concept of business itself, and its repercussions affect companies of all stripes. As such, an entire industry dedicated solely to data processing has sprung up to support the informational needs of other companies. Most companies don’t deal in data directly. They deal with cars, or finances, or healthcare, or gadgets. So, like many tertiary business requirements, they outsource data processing to a third party who ostensibly has far more capability to manage it.
But the ease of data replication and the large amount of value companies can extract from data have sped along digitization, increased behavioral reporting and other types of metrics, and pushed infrastructure to faster speeds and larger capacities whenever possible. Not as much attention was given to the business risk posed by these critical, centralized data stores, easily copied and distributed.
In a way, there’s a parallel between analytics vendors and cloud computing– one outsources the labor (and risk) of information, while the other does so for technology. The advantages of the cloud are well-known: the overall infrastructure is superior to a small data center; large quantities of servers can be created quickly and programmatically, allowing for better process automation; cloud computing power is elastic and can be scoped closely to need and altered without much hassle. But as processes speed up and the scope of management increases, operational gaps open in which small but critical misconfigurations leave production data exposed to the public.
Cloud leaks are possible in any cloud environment– Amazon Web Services, Microsoft Azure, IBM Bluemix– and any internet-hosted infrastructure where sensitive enterprise information can be made public through accident, oversight, or error. This point is critical: almost all cloud storage services, including those listed above, are private by default. This means that at some point, the default permissions were altered to allow public access. Cloud leaks do not result from a platform-specific software vulnerability, but from processes that lack the necessary controls to guarantee a secure result.
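To make the "altered default permissions" point concrete, here is a minimal sketch of what a public-access check might look like. It assumes ACL data in the shape returned by AWS's S3 API (as exposed by boto3's get_bucket_acl); the helper name and sample data are illustrative, not a description of any particular vendor's tooling.

```python
# Sketch: flag ACL grants that expose a bucket to the public.
# The ACL dict shape mirrors what S3's GetBucketAcl returns; the
# function and sample data below are hypothetical illustrations.

PUBLIC_GROUPS = {
    "http://acs.amazonaws.com/groups/global/AllUsers",
    "http://acs.amazonaws.com/groups/global/AuthenticatedUsers",
}

def public_grants(acl: dict) -> list:
    """Return the permissions granted to any public group in an ACL."""
    exposed = []
    for grant in acl.get("Grants", []):
        grantee = grant.get("Grantee", {})
        if grantee.get("Type") == "Group" and grantee.get("URI") in PUBLIC_GROUPS:
            exposed.append(grant["Permission"])
    return exposed

# Example: an ACL where the owner has full control, but READ has
# also been granted to everyone on the internet.
acl = {
    "Grants": [
        {"Grantee": {"Type": "CanonicalUser", "ID": "owner"},
         "Permission": "FULL_CONTROL"},
        {"Grantee": {"Type": "Group",
                     "URI": "http://acs.amazonaws.com/groups/global/AllUsers"},
         "Permission": "READ"},
    ]
}
print(public_grants(acl))  # ['READ']
```

A freshly created bucket would yield an empty list here; any non-empty result means the default-private posture was changed at some point, deliberately or not.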
The way in which information is handled differs from place to place, and often from person to person. There are general guidelines– regulated industries, for instance, must follow standards such as PCI DSS, HIPAA, or FERPA. But the everyday work that ultimately determines whether a cloud leak will occur varies greatly between and within organizations.
Consider this scenario: a sysadmin is managing storage for an analytics company using Amazon’s S3 cloud service. The sysadmin needs to move some files around, so production data is temporarily copied to an unused S3 bucket while the other buckets are modified. Once the sysadmin has finished reconfiguring the buckets, the production data is in place and everything is ready to go. Except the sysadmin forgot to delete the copy of the data from the temporary bucket, and that bucket happens to be configured with full public access.
This might seem like human error. Someone simply forgot to do something. Happens to us all, right? Not exactly. The real problem here isn’t that a person made a mistake– that’s a guaranteed eventuality– the problem is that nothing was in place structurally to prevent the mistake, or at least to catch it immediately so it could be fixed.
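One way to build that structural catch is an automated policy check that runs after every storage change and fails loudly on violations. The sketch below is a simplified illustration under assumed conditions: the config field names (public_access, temporary, and so on) are hypothetical, standing in for whatever inventory a real audit would pull from the cloud provider's API.

```python
# Sketch of a structural guardrail: validate every bucket's configuration
# against a baseline policy after each change, so a forgotten cleanup or a
# public ACL is flagged immediately rather than lingering for months.
# All field names here are hypothetical placeholders.

class ConfigViolation(Exception):
    """Raised when a bucket configuration breaks baseline policy."""

def validate_bucket(config: dict) -> None:
    """Raise ConfigViolation if a bucket config violates policy."""
    if config.get("public_access", False):
        raise ConfigViolation(f"{config['name']}: public access is enabled")
    if config.get("temporary") and config.get("contains_production_data"):
        raise ConfigViolation(f"{config['name']}: production data left in a temporary bucket")

# Inventory as it might look after the scenario above: the temporary
# scratch bucket was left world-readable.
buckets = [
    {"name": "prod-data", "public_access": False},
    {"name": "scratch-01", "public_access": True, "temporary": True},
]

for b in buckets:
    try:
        validate_bucket(b)
        print(f"{b['name']}: OK")
    except ConfigViolation as err:
        print(f"ALERT: {err}")
```

The point is not the specific checks but where they sit: in the process itself, running unconditionally, so the mistake surfaces the moment it is made rather than whenever someone stumbles across the open bucket.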
Cloud leaks are not the result of hackers, and they aren’t the fault of individual IT employees. They are the result of fragile business processes incapable of handling the complexity and scale of cloud operations and relying on luck to make up the difference. Security through obscurity is dead. For example, the idea that a cloud instance is secure because nobody else knows the URL is both untrue and a product of wrong thinking. When information is as valuable as it is today and global enterprise operations are fragile enough to simply open the data to anonymous internet browsers, techniques have been, and will continue to be, developed to exploit the situation. This is why resilience must be built into the procedural work that creates, manages, and maintains information technology.
Misconfigurations are an internal problem, emanating from within the IT infrastructure of any enterprise; no hacker is necessary for massive damage to occur to digital systems and stored data. And the problem is pervasive: Gartner estimates that anywhere from 70% to 99% of data breaches result not from external, concerted attacks but from internal misconfiguration of the affected IT systems.