Updated on April 19, 2018 by UpGuard
It seems like every day there’s a new incident of customer data exposure. Credit card and bank account numbers; medical records; personally identifiable information (PII) such as address, phone number, or SSN: just about every aspect of social interaction has an informational counterpart, and the social access this information provides to third parties gives many people the feeling that their privacy has been severely violated when it’s exposed. There are several avenues by which this exposure occurs, but the names used to describe them are often interchanged arbitrarily, muddying the waters of a large, but nuanced, problem. Breaches, hacks, leaks, attacks: these are just a few of the terms used to describe data exposure incidents. But what do they mean?
There is something of a spectrum here. For example, an internet-accessible database server with a default administrative password is almost an open door, but it still requires an informed attempt to be used. Of these four terms, leak stands apart as the one that is not initiated by a third party. And there’s a specific kind of leak that accounts for many of the largest and most dangerous data exposures to date. We call these “cloud leaks,” because they occur when cloud storage is not appropriately partitioned from the internet at large.
A cloud leak is when sensitive business data stored in a private cloud instance is accidentally exposed to the internet. The cloud is part of the internet; the difference is that “the cloud” offers pockets of privatized space that can be used to carry out enterprise-scale IT operations. Enterprise data sets, frequently handled by third-party information analytics companies, are often stored unencrypted in the cloud, with the expectation that the data lives within one of these private pockets.
But cloud storage options like Amazon’s S3 allow users to open their storage to the internet at large. It should be stressed here that S3 buckets are private by default. This means that in every cloud leak involving an Amazon S3 bucket, the permissions were altered at some point by an admin handling the data. When anonymous public permissions are allowed, the boundary between “the cloud” and the internet dissolves. The data then becomes accessible to anyone, the same as your favorite website.
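To make the idea of "anonymous public permissions" concrete, here is a minimal sketch, not a definitive audit tool. It assumes an access control list in the dictionary shape that S3 ACL queries return (a `Grants` list whose entries carry a `Grantee` and a `Permission`) and flags grants made to S3's predefined public groups. The function name `public_grants` and the sample ACLs are hypothetical and exist only for illustration. ACLs are also only one route to public access; a bucket policy can open a bucket as well, and this sketch does not inspect policies.

```python
# URIs that Amazon S3 uses for its predefined "everyone" grantee groups.
ALL_USERS = "http://acs.amazonaws.com/groups/global/AllUsers"
AUTHENTICATED_USERS = "http://acs.amazonaws.com/groups/global/AuthenticatedUsers"
PUBLIC_GROUPS = {ALL_USERS, AUTHENTICATED_USERS}


def public_grants(acl):
    """Return (group URI, permission) pairs that open a bucket beyond its owner.

    `acl` is assumed to be a dict shaped like an S3 ACL response:
    {"Grants": [{"Grantee": {...}, "Permission": "..."}]}.
    """
    found = []
    for grant in acl.get("Grants", []):
        grantee = grant.get("Grantee", {})
        # Only group grantees can represent "anyone"; individual
        # accounts appear with Type "CanonicalUser".
        if grantee.get("Type") == "Group" and grantee.get("URI") in PUBLIC_GROUPS:
            found.append((grantee["URI"], grant.get("Permission")))
    return found


# Hypothetical sample data: a default-style private ACL, and the same ACL
# after someone has added a public READ grant.
private_acl = {
    "Grants": [
        {"Grantee": {"Type": "CanonicalUser", "ID": "example-owner-id"},
         "Permission": "FULL_CONTROL"},
    ]
}
leaky_acl = {
    "Grants": private_acl["Grants"] + [
        {"Grantee": {"Type": "Group", "URI": ALL_USERS}, "Permission": "READ"},
    ]
}

print(public_grants(private_acl))  # no public grants on the private ACL
print(public_grants(leaky_acl))    # one public READ grant on the altered ACL
```

Note that the "authenticated users" group means anyone with an AWS account, not just members of the bucket owner's organization, so the sketch treats it as public too.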
But whether it’s an Amazon S3 bucket, an Azure file share, a misconfigured GitHub repository, or a vulnerable server set up in the cloud, the failure to guarantee the privacy of the cloud instance puts the data at risk. Once the error is made, it becomes very difficult for organizations to prove that the data was not accessed at some point. Most people wouldn’t know where to start looking, and people with enough technical knowledge might casually browse out of curiosity. But two main groups of people actively scan for cloud leaks: security researchers, and people seeking to exploit the information found for leverage, personal gain, or power.
In one incident involving Verizon customers, the information leaked from an exposed Amazon S3 bucket included customer details such as name, address, and phone number, as well as some account PINs used to verify identity with Verizon. Aside from the obvious privacy concerns, the exposed data could have been used maliciously to impersonate customers and even to bypass two-factor authentication.
One of the largest leaks of all time was discovered when an exposed cloud system was found containing both collected and modeled voter data from Deep Root Analytics, a firm contracted by the RNC for data-driven political strategy. Over 198 million unique individuals were represented in the data set, with personal details, voter information, and modeled attributes including probable race and religion.
Another exposed cloud storage instance revealed data from government contractor Booz Allen Hamilton. Beyond the exposed data itself, the bucket contained encryption keys for a BAH engineer and “credentials granting administrative access to at least one data center’s operating system.” In this case, not only was information compromised, but further systems could have been as well, allowing malicious actors to hop from a publicly exposed cloud server to an authorized server on the network.
Cloud leaks are a unique risk for businesses. The simplicity of the error that causes them stands in stark contrast to the magnitude of the consequences that can result from it. Taking advantage of the cloud, or employing vendors who do, offers much in the way of functional value. But without accounting for the potential problems native to cloud technology, that functionality will be undermined by an inability to trust it, which in turn will lead to an inability to trust the companies that employ it. Cloud leaks are an operational problem, and to be effectively mitigated they must be addressed within the IT processes that govern data handling in the cloud.
Misconfigurations are an internal problem that emanates from within the IT infrastructure of any enterprise; no hacker is necessary for massive damage to occur to digital systems and stored data. And the problem is pervasive, with Gartner estimating that anywhere from 70% to 99% of data breaches result not from external, concerted attacks, but from internal misconfiguration of the affected IT systems.