A data leak is when sensitive data is accidentally exposed, whether physically, on the Internet or in any other form, including lost hard drives or laptops, allowing cybercriminals to gain unauthorized access to the sensitive data with little effort. When sensitive data is posted on the dark web following a cyberattack, these events are also classified as data leaks because they help expedite data breaches.

The terms data breach and data leak are often used interchangeably, but that's incorrect, as they're two separate categories of data compromise.

  1. A data breach is when sensitive data is accessed and compromised in a successful attack.
  2. A data leak is the exposure of sensitive data that could be used to make future data breaches happen faster. For example, stolen data posted on ransomware blogs is classified as a data leak because it could be used to compromise IT networks with less effort. Poor data security practices, such as software misconfigurations, also cause data leaks.

If a cybercriminal identifies a data leak, the exposed data could be used to strategize a successful cyberattack. By detecting and remediating data leaks before cybercriminals discover them, you significantly reduce the risk of data breaches.

A common form of data leakage is called a cloud leak. A cloud leak is when a cloud data storage service, like Amazon Web Services' S3, exposes a user's sensitive data to the Internet. While AWS does secure S3 buckets by default, we believe S3 security is flawed and most people need to check their S3 permissions.

S3 is not the only culprit. Azure file shares and GitHub repositories can also have poor data protection if configured poorly, causing unintended data leakage.

The worst part is that once data has been exposed, it is extremely difficult to know whether it was accessed. This means that your confidential data, trade secrets, source code, customer data, personal data and anything else stored on information systems could be exposed or used for corporate espionage.

Data leaks are often caused by simple errors, but the people whose data is exposed don't care how it was exposed, only that it was. The breach notification requirements for data leaks are the same as for data breaches, as is the potential for reputational, financial, legal and regulatory damage.

Cloud services offer great advantages over on-premises infrastructure, but they bring new risks that could result in security breaches via data leaks.

What Do Cyber Criminals Look for in Data Leaks? 

The key thing that cybercriminals look for is personally identifiable information (PII). PII includes Social Security numbers, credit card numbers and any other personal details that could result in identity theft. Note that not all PII is what you would traditionally think of as confidential information. Simple data such as a name or a mother's maiden name is a target too.
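To make this concrete, here is a minimal sketch of how a scanner might flag two of the PII types mentioned above in leaked text: US Social Security numbers (matched by pattern) and credit card numbers (matched by pattern plus the Luhn checksum that payment card numbers are designed to pass). The regular expressions and the `scan_for_pii` helper are our own illustrative assumptions, not a standard rule set; real data loss prevention tools add context checks to reduce false positives.

```python
import re

# Illustrative patterns only (assumptions, not a standard rule set).
# SSNs are never issued with area 000, 666 or 900-999, group 00, or serial 0000.
SSN_PATTERN = re.compile(r"\b(?!000|666|9\d\d)\d{3}-(?!00)\d{2}-(?!0000)\d{4}\b")
# 13 to 19 digits, optionally separated by single spaces or hyphens.
CARD_CANDIDATE = re.compile(r"\b\d(?:[ -]?\d){12,18}\b")


def luhn_valid(number: str) -> bool:
    """Return True if a string of digits passes the Luhn checksum."""
    total = 0
    for i, ch in enumerate(reversed(number)):
        digit = int(ch)
        if i % 2 == 1:  # double every second digit, counting from the right
            digit *= 2
            if digit > 9:
                digit -= 9
        total += digit
    return total % 10 == 0


def scan_for_pii(text: str) -> dict[str, list[str]]:
    """Hypothetical helper: return SSN-like strings and Luhn-valid card-like numbers."""
    findings: dict[str, list[str]] = {"ssn": SSN_PATTERN.findall(text), "card": []}
    for match in CARD_CANDIDATE.finditer(text):
        digits = re.sub(r"[ -]", "", match.group())
        if luhn_valid(digits):
            findings["card"].append(digits)
    return findings


sample = "Jane Doe, SSN 123-45-6789, card 4111 1111 1111 1111, ref 4111111111111112"
print(scan_for_pii(sample))
# → {'ssn': ['123-45-6789'], 'card': ['4111111111111111']}
```

The Luhn step matters: the reference number at the end of the sample has the right length but fails the checksum, so it is not reported as a card number.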

Another common target is medical or protected health information (PHI), defined under the US HIPAA rules as "information that is created by a health care provider [and] relates to the past, present, or future physical or mental health or condition of any individual."

Customer Information

This data differs from company to company, but there are usually some common factors involved:

  • Identity information: name, address, phone number, email address, username, password
  • Activity information: order and payment history, browsing habits, usage details
  • Credit card information: card numbers, CVV codes, expiration dates, billing zip codes

Information that is specific to the company can also be exposed. This can be financials for banks and investment groups, medical records for hospitals and insurers or sensitive documents and forms for government entities.

Company Information

Customer information isn't the only thing at risk. Corporate information can also be leaked, including:

  • Internal communications: memos, emails, and documents detailing company operations
  • Metrics: performance statistics, projections, and other collected data about the company
  • Strategy: messaging details, roadmaps, rolodexes and other critical business information

The exposure of this type of information can hamstring company projects, give competitors insight into business operations, and reveal internal culture and personalities. The bigger the company, the more interest there is in this type of data.

Trade Secrets

Trade secrets are the most dangerous type of information to expose in a data leak, because they are critical to your business and its ability to compete. Trade secrets include:

  • Plans, formulas, designs: Information about existing or upcoming products and services
  • Code and software: Proprietary technology the business sells or built for in-house use
  • Commercial methods: Market strategies and contacts

Exposure of this type of data can devalue the products and services your business provides and undo years of research.

Analytics

Analytics rely on large data sets containing multiple information sources that reveal big-picture trends, patterns and trajectories. As important as analytics are for many businesses, the data needed to perform them can be a vector of attack if not properly secured. Analytics data includes:

  • Psychographic data: Preferences, personality attributes, demographics, messaging
  • Behavioral data: Detailed information about how someone uses a website, for example
  • Modeled data: Predicted attributes based on other information gathered

Analytics gives you a way to understand individuals as a set of data points and predict their next actions with a high degree of accuracy. This may sound abstract, but this type of data can be used to sway voters and change the tide of elections by persuading at scale. Look at the Cambridge Analytica, AggregateIQ and Facebook scandal if you don't think this information can cause reputational damage.

Why Do Data Leaks Happen?

To understand why data leaks happen, we need to step back and understand how information is generated, manipulated and used. These days it's almost a foregone conclusion that huge sets of sensitive data exist and companies are using them. 

When we examine information security, it becomes clear that building a resilient process is difficult at scale. Operational gaps, process errors and poor cybersecurity awareness can leave assets vulnerable, which leads to data leaks.

The benefits and risks of digital data are the same: digital data can be reproduced cheaply and without degradation. Organizations have many copies of production data that include customer data, trade secrets and other sensitive information. Data loss prevention (DLP) tools, warehousing, disaster recovery, development and testing environments, analytics services and the laptops your employees take home could all house copies of your own and your customers' most sensitive data.

The point is that many copies of data exist, and the more copies there are, the higher the chance that something or someone will accidentally expose one of them.
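Simple arithmetic shows why the number of copies matters. Assume, purely for illustration, that each copy independently has the same small probability p of being exposed over some period. The chance that at least one of n copies is exposed is then 1 - (1 - p)^n, which climbs quickly as n grows. The 2% figure below is a hypothetical input, not a measured rate.

```python
def exposure_probability(p_per_copy: float, copies: int) -> float:
    """Chance that at least one of `copies` independent copies is exposed,
    assuming every copy has the same exposure probability `p_per_copy`."""
    if not 0.0 <= p_per_copy <= 1.0:
        raise ValueError("p_per_copy must be between 0 and 1")
    if copies < 0:
        raise ValueError("copies must be non-negative")
    # Complement of the event "no copy is exposed".
    return 1.0 - (1.0 - p_per_copy) ** copies


# Hypothetical 2% exposure chance per copy.
for n in (1, 5, 10, 20):
    print(f"{n:>2} copies: {exposure_probability(0.02, n):.1%}")
```

With the assumed 2% per copy, ten copies already carry roughly an 18% chance that at least one is exposed, and twenty copies roughly 33%. Real copies are neither independent nor equally protected, so treat this as intuition rather than a risk model.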

Application Security and the Data Custody Chain

When you process data, it's effectively flowing through a chain of custody. It could be as simple as your head to your computer or as complex as flowing through multiple cloud services across multiple geographies. 

The key thing to understand is that poor application security and cybersecurity measures in any part of the chain of custody can cause a data leak. This is why third-party risk management and vendor risk management are fundamental to any business. It's no longer just defense contractors and financial services companies who need to worry about data security. It's everyone.

Digitization is fundamentally changing business and the repercussions are affecting small businesses and large multinationals alike. While you may not be in the business of data, you still generate a lot of it. Even if you're selling physical goods like cars or providing a service like healthcare, chances are you are generating, processing and even outsourcing data somewhere. 

And while your business may have security tools and malware protection, if the third parties processing your data don't, your data could still be exposed.

How Can Data Leaks Be Exploited?

Four common ways that data leaks are exploited are:

1. Social engineering

The most effective social engineering operations are phishing attacks. This is when a cybercriminal sends a targeted fake email, based on known information, to better impersonate an authority figure or executive. Information exposed in data leaks, especially psychographic and behavioral data, is exactly the type of data needed to sharpen social engineering attacks and give cybercriminals the ability to use information about a target that they wouldn't usually know.

2. Doxxing

Personally identifiable information (PII) can be used for more than credit card fraud. Doxxing is the practice of acquiring and publishing a person's information against their will. Doxxing is performed for a variety of reasons. In cases of political extremism, vendettas, harassment or stalking, exposed PII can cause real harm to people.

3. Surveillance and Intelligence

Psychographic data has many uses. Its very purpose is to predict and shape opinions. Political campaigns use it to win votes and businesses use it to win customers.

4. Disruption

Data leaks can be used to slow or stop business operations or to expose sensitive information to the public. Information exposed in a data leak can have drastic consequences for governments, businesses and individuals.

Why Do Data Leaks Matter?

Consider this scenario:

Your marketing team needs to move your email list from one email service provider to another, and they store the data in an unused S3 bucket while they decide on the new tool. Once the tool has been chosen, the contacts are uploaded to it and all is well. Except the marketing team forgot to clear the S3 bucket, and it happens to be configured for full public access.

This might seem like human error, and it is. The problem isn't that someone made a mistake; it's that nothing was in place to prevent the mistake, or at least to detect it so it could be fixed immediately.
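As an illustration of the kind of detection that was missing in this scenario, here is a minimal sketch that audits a bucket inventory and flags any publicly accessible bucket that still holds objects, marking those untouched for longer than a threshold as likely forgotten. The `BucketRecord` fields, the seven-day threshold and the allowlist are hypothetical; in practice the inventory would come from your cloud provider's APIs or an asset management tool.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class BucketRecord:
    # Hypothetical inventory record, not a real cloud provider API type.
    name: str
    is_public: bool
    object_count: int
    last_modified: datetime


def audit_buckets(inventory, allowed_public=frozenset(), max_idle_days=7, now=None):
    """Return (bucket name, reason) for each unexpected public bucket that holds data."""
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(days=max_idle_days)
    findings = []
    for bucket in inventory:
        if not bucket.is_public or bucket.object_count == 0:
            continue
        if bucket.name in allowed_public:  # intentionally public, e.g. website assets
            continue
        reason = "public and idle" if bucket.last_modified < cutoff else "public"
        findings.append((bucket.name, reason))
    return findings


now = datetime(2024, 1, 31, tzinfo=timezone.utc)
inventory = [
    BucketRecord("email-list-migration", True, 1, datetime(2024, 1, 2, tzinfo=timezone.utc)),
    BucketRecord("website-assets", True, 250, datetime(2024, 1, 30, tzinfo=timezone.utc)),
    BucketRecord("nightly-backups", False, 40, datetime(2024, 1, 31, tzinfo=timezone.utc)),
]
print(audit_buckets(inventory, allowed_public={"website-assets"}, now=now))
# → [('email-list-migration', 'public and idle')]
```

Run on a schedule, a check like this would have surfaced the forgotten email list shortly after the migration, rather than relying on someone remembering to clean up.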

You might think this isn't a big deal because it's just email addresses, but what if it were your customer list or, worse, your customers' personally identifiable information (PII)? Even email addresses are a big deal and can result in irreparable reputational damage. Resilience must be built into the procedural work that is carried out day in and day out.

The key thing to understand is that data leaks, like data breaches, can be exploited. In addition to the methods above, here are four common ways leaked data is exploited:

  1. Credit card fraud: Cyber criminals can exploit leaked credit card information to commit credit card fraud. 
  2. Black market sales: Once the data is exposed, it can be auctioned off on the dark web. Many cybercriminals specialize in finding unsecured cloud instances and vulnerable databases that contain credit card numbers, Social Security numbers and other personally identifiable information (PII) to sell for use in identity fraud, spam or phishing operations. Finding them can be as simple as using search queries in Google.
  3. Extortion: Sometimes information is held over a company's head for ransom or to cause reputational damage.
  4. Degrading competitive advantages: Competitors may take advantage of data leaks. Everything from your customer lists to your trade secrets gives competitors access to your resources and strategy. This could be as simple as what your marketing team is working on or as complex as your logistical operations.

How Can Data Leaks Be Prevented?

The way information is handled will differ from industry to industry, company to company and even person to person. If you operate in a regulated industry, there are also specific standards and regulations you must follow, such as PCI DSS, HIPAA or FERPA.

That said, it will ultimately be up to your organization and its employees to follow prevention and protection standards on a day-to-day basis. To put it simply, most data leaks are operational problems, not traditional cybersecurity problems. Data leaks usually aren't caused by cybercriminals, but they can be exploited by them.

Three common ways to prevent data leaks are as follows:

1. Validate Cloud Storage Configurations

As cloud storage becomes more common, the amount of data being moved in and out of it is growing rapidly. Without a proper process, sensitive data can be exposed in an unsecured bucket. This is why cloud storage configurations must be validated at deployment and for as long as they host sensitive data. Continuous validation minimizes the risk that data will be exposed and can even notify you proactively if public access occurs.
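For Amazon S3 specifically, one way to sketch such a validation is to check three signals S3 reports about public access: the bucket's Block Public Access settings, its access control list (ACL) and its bucket policy status. The function below takes dictionaries shaped like the responses of boto3's `get_public_access_block`, `get_bucket_acl` and `get_bucket_policy_status` calls, so it can be exercised without AWS access. The function name and the wording of the findings are our own assumptions, and a fuller check would also need to consider account-level Block Public Access settings and access points.

```python
# Group grantees that make an S3 ACL grant public.
PUBLIC_GROUP_URIS = {
    "http://acs.amazonaws.com/groups/global/AllUsers",
    "http://acs.amazonaws.com/groups/global/AuthenticatedUsers",
}
BLOCK_SETTINGS = (
    "BlockPublicAcls",
    "IgnorePublicAcls",
    "BlockPublicPolicy",
    "RestrictPublicBuckets",
)


def public_exposure_findings(public_access_block: dict, acl: dict, policy_status: dict) -> list[str]:
    """Return human-readable reasons a bucket may be publicly accessible.

    Pass {} when the corresponding boto3 call raised an error because nothing
    is configured (no Block Public Access configuration or no bucket policy).
    """
    findings = []
    config = public_access_block.get("PublicAccessBlockConfiguration", {})
    for setting in BLOCK_SETTINGS:
        if not config.get(setting, False):  # missing counts as disabled
            findings.append(f"{setting} is disabled")
    # Public ACL grants are reported even if IgnorePublicAcls would neutralize
    # them, to err on the side of caution.
    for grant in acl.get("Grants", []):
        uri = grant.get("Grantee", {}).get("URI")
        if uri in PUBLIC_GROUP_URIS:
            findings.append(f"ACL grants {grant['Permission']} to {uri.rsplit('/', 1)[-1]}")
    if policy_status.get("PolicyStatus", {}).get("IsPublic", False):
        findings.append("bucket policy is public")
    return findings


# Live usage (requires boto3 and AWS credentials; not run here):
#   s3 = boto3.client("s3")
#   findings = public_exposure_findings(
#       s3.get_public_access_block(Bucket=name),
#       s3.get_bucket_acl(Bucket=name),
#       s3.get_bucket_policy_status(Bucket=name),
#   )

locked_down = {"PublicAccessBlockConfiguration": {s: True for s in BLOCK_SETTINGS}}
public_read_acl = {"Grants": [{
    "Grantee": {"Type": "Group", "URI": "http://acs.amazonaws.com/groups/global/AllUsers"},
    "Permission": "READ",
}]}
print(public_exposure_findings(locked_down, {"Grants": []}, {"PolicyStatus": {"IsPublic": False}}))
# → []
print(public_exposure_findings({}, public_read_acl, {}))
```

Running the same function on every bucket on a schedule, rather than once at deployment, is what turns it into the continuous validation described above.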

2. Automate Process Controls

At a large enough scale, validation becomes difficult to police manually. Computers are far better at maintaining uniformity than people. Automated process controls should act as executable documentation that ensures all cloud storage is secured and stays secure.
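One way to read "executable documentation" is policy as code: each control is written once as a plain-language description paired with a check a machine can run against every storage resource. The sketch below is a minimal, hypothetical version; the resource fields (`public`, `encrypted`, `owner`), the resource names and the three rules are illustrative assumptions rather than any particular tool's schema, and missing fields are treated as non-compliant so that gaps fail closed.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Rule:
    description: str               # the human-readable "documentation"
    check: Callable[[dict], bool]  # the "executable" part: True means compliant


# Hypothetical baseline; missing fields count as non-compliant (fail closed).
BASELINE = (
    Rule("Storage must not be publicly accessible", lambda r: r.get("public") is False),
    Rule("Storage must be encrypted at rest", lambda r: r.get("encrypted") is True),
    Rule("Storage must have a named owner", lambda r: bool(r.get("owner"))),
)


def evaluate(resources: list[dict], rules: tuple[Rule, ...] = BASELINE) -> list[tuple[str, str]]:
    """Return (resource name, violated rule) pairs for every failed check."""
    return [
        (resource["name"], rule.description)
        for resource in resources
        for rule in rules
        if not rule.check(resource)
    ]


# Hypothetical inventory spanning different storage services.
inventory = [
    {"name": "customer-exports", "public": False, "encrypted": True, "owner": "data-team"},
    {"name": "shared-files", "public": True, "encrypted": True, "owner": ""},
]
for name, violation in evaluate(inventory):
    print(f"{name}: {violation}")
```

Because the rule descriptions and the checks live side by side, the same list doubles as the written standard and as the automated control, which is harder to let drift apart than a policy document and a separate script.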

3. Monitor Third-Party Risk

Your vendors can expose your information as easily as you can. Even if you don't expose your customers' data yourself, you will still be held accountable for the data leak in the eyes of your customers and often your regulators. This makes third-party and fourth-party risk assessments as important as in-house cybersecurity and information risk management.
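Fourth-party risk comes from your vendors' own vendors. As a minimal sketch, assuming you keep a hypothetical map of which parties pass your data on to which others, a breadth-first walk finds every party that can reach your data along with its tier (1 for a direct vendor, 2 for a fourth party, and so on), so you can see which ones have never been assessed. The party names and the `assessed` set below are illustrative.

```python
from collections import deque


def parties_with_access(data_flows: dict[str, list[str]], start: str) -> dict[str, int]:
    """Map each party reachable from `start` to its tier (1 = third party, 2 = fourth party, ...)."""
    tiers: dict[str, int] = {}
    queue = deque((party, 1) for party in data_flows.get(start, []))
    while queue:
        party, tier = queue.popleft()
        if party == start or party in tiers:
            continue  # breadth-first order means the first visit has the lowest tier
        tiers[party] = tier
        for downstream in data_flows.get(party, []):
            queue.append((downstream, tier + 1))
    return tiers


# Hypothetical data flows: who shares your data with whom.
data_flows = {
    "your-company": ["email-provider", "crm-vendor"],
    "email-provider": ["cloud-host", "analytics-firm"],
    "crm-vendor": ["cloud-host"],
}
assessed = {"email-provider", "crm-vendor"}

tiers = parties_with_access(data_flows, "your-company")
unassessed = {party: tier for party, tier in tiers.items() if party not in assessed}
print(unassessed)
# → {'cloud-host': 2, 'analytics-firm': 2}
```

In this example both direct vendors have been assessed, yet two fourth parties that can reach your data have not, which is exactly the gap a third-party-only review would miss.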

Examples of Data Leaks

Three examples of data leaks that caused massive damage:

  1. Approximately 6.2 million email addresses were exposed by the Democratic Senatorial Campaign Committee (DSCC) in a misconfigured Amazon S3 storage bucket. The comma-separated list of addresses was uploaded to the bucket in 2010 by a DSCC employee. The bucket and file name both reference “Clinton,” presumably having to do with one of Hillary Clinton’s earlier runs for Senator from New York. The list contained email addresses from major email providers, along with universities, government agencies, and the military.
  2. An UpGuard researcher discovered three publicly accessible Amazon S3 buckets related to Attunity. Of those, one contained a large collection of internal business documents. The total size is uncertain, but the researcher downloaded a sample of about a terabyte in size, including 750 gigabytes of compressed email backups. Backups of employees’ OneDrive accounts were also present and spanned the wide range of information that employees need to perform their jobs: email correspondence, system passwords, sales and marketing contact information, project specifications, and more.
  3. A cloud storage repository containing information belonging to LocalBlox, a personal and business data search service, was left publicly accessible, exposing 48 million records of detailed personal information on tens of millions of individuals, gathered and scraped from multiple sources.
