How to Secure Data Storage With UpGuard

Last updated by UpGuard on October 16, 2019

Despite spending billions on cybersecurity solutions, private industry and government alike still struggle to prevent data breaches. Cybersecurity solutions have not mitigated this problem because the overwhelming majority of data exposure incidents are caused by misconfigurations, typically by way of third-party vendors, not by cutting-edge cyber attacks. These misconfigurations result from process errors during data handling, and they often leave massive datasets completely exposed to the internet for anyone to stumble across.

UpGuard has been at the vanguard of data breach discovery, finding and analyzing exposed assets across the internet, helping organizations secure them, and informing the public. However, UpGuard does more than just hunt for breaches. We build tools to prevent data breaches from happening in the first place. Using the firsthand knowledge gained by researching data breaches, UpGuard’s cyber resilience platform protects important breach vectors, from the common assets of an enterprise web presence, like servers and network devices, to cloud resources like Amazon S3 and GitHub. We’ll take a look at three breach incidents, how they happened, and how UpGuard can be used to help prevent them.

Amazon S3 Cloud Storage

198 Million Voter Records Exposed

The Cyber Risk research team at UpGuard has discovered many misconfigured Amazon S3 buckets containing highly sensitive information, including classified data, corporate infrastructure blueprints, and resumes and contact details for mercenaries. Perhaps none was as significant as the June 2017 discovery of highly detailed voter information for 198 million Americans, collected by Deep Root Analytics on behalf of the Republican National Committee.

With records on nearly every registered American voter, this breach highlighted the risk that poor data handling poses to every person. The Amazon S3 bucket was completely open to the public, requiring neither an AWS login nor a password of any kind. We should be careful to point out that this was not human error (an individual admin forgetting the bucket was public, or not realizing the ramifications of it being public). It was process error: controls were not in place to catch the mistake and surface it for attention.

Securing S3 Storage With UpGuard

Two groups grant public access to data in Amazon S3: AllUsers and AuthenticatedUsers. The first means anyone at all on the internet, as in the case of the RNC breach, while the second means anyone with a free AWS account, which is hardly better. UpGuard can check any number of S3 buckets and objects for these permissions and alert you if either group has been granted access. This continuous validation ensures privacy when a bucket is deployed and throughout its lifespan.
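As a rough illustration of the check described above (not UpGuard's implementation), the sketch below scans an S3 access control list for grants to either public group. It assumes the ACL is a dict shaped like the response of boto3's `get_bucket_acl` call; the function name `public_grants` and the example ACL are hypothetical.

```python
# Group URIs that S3 ACLs use for the two public-facing grantee groups.
ALL_USERS = "http://acs.amazonaws.com/groups/global/AllUsers"
AUTHENTICATED_USERS = "http://acs.amazonaws.com/groups/global/AuthenticatedUsers"
PUBLIC_GROUPS = {ALL_USERS: "AllUsers", AUTHENTICATED_USERS: "AuthenticatedUsers"}


def public_grants(acl):
    """Return (group, permission) pairs for every grant to a public group.

    `acl` is assumed to look like the dict returned by boto3's
    s3.get_bucket_acl(Bucket=...), i.e. it has a "Grants" list.
    """
    findings = []
    for grant in acl.get("Grants", []):
        grantee = grant.get("Grantee", {})
        group = PUBLIC_GROUPS.get(grantee.get("URI"))
        if grantee.get("Type") == "Group" and group:
            findings.append((group, grant.get("Permission")))
    return findings


# Hypothetical ACL for illustration only; not data from any real bucket.
example_acl = {
    "Grants": [
        {"Grantee": {"Type": "CanonicalUser", "ID": "owner-id"},
         "Permission": "FULL_CONTROL"},
        {"Grantee": {"Type": "Group", "URI": ALL_USERS},
         "Permission": "READ"},
    ]
}

for group, permission in public_grants(example_acl):
    print(f"ALERT: {group} has {permission}")
```

Running such a check on a schedule, rather than once at deployment, is what turns it into continuous validation. Note that this sketch only inspects ACL grants; a bucket policy can also make data public and would need a separate check.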

UpGuard provides process controls: the checks and balances necessary to mitigate cyber risk. Amazon S3 buckets are private by default, so a publicly exposed bucket means a change was made at some point that was either unintentional or misunderstood, and never caught by anyone who knew how valuable the data stored there actually was. UpGuard finds and visualizes critical misconfigurations automatically, so there’s no more guesswork about whether important data has been secured. Learn more about what UpGuard can do for AWS and S3 here.


Electrical Engineering Contractor Leaks Schematics, Passwords

Cloud storage isn’t the only way that extremely sensitive data leaks onto the internet. In July 2017, UpGuard’s research team discovered a data repository originating from Power Quality Engineering (PQE), a Texas-based electrical engineering operator. Among the data were potential weak points and trouble spots in customer electrical systems, publicly downloadable schematics revealing the specific locations and configurations of government-operated top-secret intelligence transmission zones within at least one Dell facility, and a plain text file of internal PQE passwords, potentially enabling further access to PQE company systems. Affected customers included Dell, Oracle, the City of Austin, and Texas Instruments. The data was being served by a poorly configured rsync server that sent it to anyone who asked.

Securing Rsync With UpGuard

Rsync is a relatively simple utility with a lot of power. One of its configuration directives is auth users. If this directive is absent, anyone who can reach the rsync service can receive the data, as was the case with PQE. UpGuard analyzes rsync configurations and checks for important directives, including auth users, as well as which users have access and what permissions they have. Once you establish your desired configuration, any deviations will be automatically detected and reported, ensuring that rsync servers synchronizing sensitive information are never exposed to the internet. See our full guide for secure rsync deployment in the enterprise to learn more.
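To make the auth users check concrete, here is a minimal sketch (not UpGuard's implementation) that reads the text of an rsyncd.conf file and lists the modules with no effective auth users setting. The function name `modules_without_auth` and the sample configuration are hypothetical. The parser is deliberately simplified: it assumes one parameter per line and ignores line continuations and include directives.

```python
def modules_without_auth(conf_text):
    """Return names of rsyncd.conf modules with no effective `auth users`."""
    global_auth = False   # True if `auth users` is set before any module
    modules = {}          # module name -> True if `auth users` applies
    current = None        # None while still in the global section
    for raw in conf_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue      # skip blank lines and comments
        if line.startswith("[") and line.endswith("]"):
            current = line[1:-1].strip()
            # A module starts out with whatever the global section set.
            modules[current] = global_auth
            continue
        name, sep, value = line.partition("=")
        if not sep:
            continue      # not a `name = value` line
        # Compare parameter names ignoring whitespace and case.
        if "".join(name.split()).lower() == "authusers":
            has_users = bool(value.strip())
            if current is None:
                global_auth = has_users
            else:
                modules[current] = has_users
    return [name for name, ok in modules.items() if not ok]


# Hypothetical configuration for illustration only.
example_conf = """
uid = nobody

[public_docs]
path = /srv/docs
read only = yes

[backups]
path = /srv/backups
auth users = backup_admin
secrets file = /etc/rsyncd.secrets
"""

print(modules_without_auth(example_conf))
```

Any module this returns would be served to unauthenticated clients, so a non-empty result on a server holding sensitive data is worth an alert.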


Data of 57 Million Uber Users Compromised

Although news of Uber’s attempted cover-up has perhaps outpaced news about the data exposure itself, the origin of the massive breach affecting 57 million people was sensitive information stored in a publicly accessible GitHub repository, the second such incident at Uber. GitHub is a cloud-based code repository used by developers and IT teams to store versioned code and other files collaboratively. In Uber’s case, AWS credentials stored in a public GitHub repo were found by malicious actors, who in turn used them to compromise Uber’s primary data stores. In fact, no “hack” really took place. Credentials were left exposed in GitHub for anyone to find.
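Credentials of this kind are easy to find because they follow a recognizable pattern. As a hedged sketch of how a team might look for them before code is pushed, the example below searches text for strings shaped like long-term AWS access key IDs, which are 20 characters long and begin with "AKIA". The function name `find_access_key_ids` is hypothetical, the character class is an assumption about the key format, and the sample key is the placeholder used in AWS documentation, not a real credential.

```python
import re

# Assumed shape of a long-term AWS access key ID: "AKIA" followed by
# 16 uppercase letters or digits (20 characters in total).
ACCESS_KEY_ID = re.compile(r"\b(AKIA[0-9A-Z]{16})\b")


def find_access_key_ids(text):
    """Return (line_number, key_id) pairs for every match in `text`."""
    findings = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for match in ACCESS_KEY_ID.finditer(line):
            findings.append((lineno, match.group(1)))
    return findings


# Placeholder key from AWS documentation; illustration only.
sample = 'region = "us-east-1"\naws_access_key_id = "AKIAIOSFODNN7EXAMPLE"\n'

for lineno, key_id in find_access_key_ids(sample):
    print(f"line {lineno}: possible AWS access key ID {key_id}")
```

A pattern match like this only flags candidates for review. It will miss other kinds of secrets and can produce false positives.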

Securing GitHub With UpGuard

Two problems lead to GitHub data exposures. The first is organizations allowing (or not expressly forbidding) developers to store sensitive information in a personal or ad-hoc GitHub organization. If the data is outside the scope of the organization’s control, it is always subject to exposure. The second is that official GitHub resources are not audited regularly for misconfigurations. UpGuard supports GitHub at the organizational and repository level so that the privacy of data stored there can be ensured. UpGuard will send proactive notifications if a GitHub resource falls out of line with expected configurations: for example, a repository goes from private to public, a member is given new permissions, or an external collaborator account is added to the org. See our post about securing GitHub for more information.
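The three example deviations above amount to comparing an expected configuration against what is currently observed. The sketch below illustrates that comparison under stated assumptions: the snapshot layout (`private`, `permissions`, `outside_collaborators`) and the function name `detect_drift` are hypothetical and are not UpGuard's or GitHub's data model. In practice the observed snapshot would be built from the GitHub API.

```python
def detect_drift(expected, observed):
    """Describe how `observed` deviates from the `expected` snapshot."""
    changes = []
    # Visibility: only a private -> public change is flagged here.
    if expected["private"] and not observed["private"]:
        changes.append("repository changed from private to public")
    # Permissions: flag new members and changed permission levels.
    for login in sorted(observed["permissions"]):
        old = expected["permissions"].get(login)  # None for a new member
        new = observed["permissions"][login]
        if old != new:
            changes.append(f"{login} permission changed from {old} to {new}")
    # Outside collaborators: flag any account not in the baseline.
    added = observed["outside_collaborators"] - expected["outside_collaborators"]
    for login in sorted(added):
        changes.append(f"outside collaborator added: {login}")
    return changes


# Hypothetical snapshots for illustration only.
expected = {
    "private": True,
    "permissions": {"alice": "admin", "bob": "read"},
    "outside_collaborators": set(),
}
observed = {
    "private": False,
    "permissions": {"alice": "admin", "bob": "write"},
    "outside_collaborators": {"contractor1"},
}

for change in detect_drift(expected, observed):
    print("NOTIFY:", change)
```

Keeping the expected snapshot under review makes every permitted change explicit, so anything the comparison reports is by definition unplanned.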


As businesses and governments adopt modern internet infrastructure, either firsthand or through vendors, they increase the risk of leaking their data onto the internet. When you look at data breaches from the perspective of misconfigured assets, it becomes clear that process controls are the single most important way to prevent them. Many of the most common data exposure vectors, including ransomware like WannaCry, can be managed simply with visibility into resources and controls around when they change.

Finding breaches and informing people about the dangers they pose isn’t enough. UpGuard has built solutions to help prevent these breaches from happening in the first place. This is why we support resources outside the normal scope of monitoring, like S3, rsync, and GitHub: anywhere sensitive data is stored should be protected. The continued digitization of society requires a stable, trustworthy foundation to succeed. Without process controls to mitigate operational errors and oversights, data breaches will continue unabated. But by building safeguards into everyday operations, we can build a resilient and sustainable digital ecosystem.
