In February 2021, UpGuard researchers discovered that 51% of analyzed Fortune 500 companies were leaking information in the metadata of public documents hosted on their websites. This discovery is a window into a broader, often overlooked category of cyber threat that increases the risk of data breaches in the tech industry: data leaks.
Data leaks (often confused with data breaches) help hackers compress the data breach attack pathway, increasing the speed, severity, and frequency of these events.
To learn some of the common causes of data leaks in the tech sector and how to address them, read on.
Since data leaks are caused by overlooked exposures, technically, each event in this list sits within the broader risk category of human error. At a high level, some examples of data leaks with a distinct human error attribution include:
A cloud storage misconfiguration is an overlooked error in the setup of a cloud service that leads to unauthorized exposure of highly sensitive data, such as social security numbers, financial data, and other personally identifiable information (PII). This threat has long been recognized as a critical security risk, reflected in the regular inclusion of security misconfiguration in the Top 10 list of the Open Web Application Security Project (OWASP).
These exposures are not caused by security vulnerabilities but rather by human error. The detrimental consequences of specific configuration settings are often not realized until these systems are connected to the internet and tested in the wild.
While an exposure causing a data leak could be classified as a vulnerability, it's technically incorrect to conflate the two events. The process of exploiting a software vulnerability is completely distinct from public exposures of sensitive information.
Many prestigious businesses have fallen victim to data leakage resulting from such a seemingly amateur oversight.
The potential large-scale impact of the Thomson Reuters misconfiguration highlights the significant danger of these events. Had this misconfiguration remained unaddressed, cybercriminals could have used the exposed passwords to access systems used by businesses working with Thomson Reuters, establishing the necessary foothold for a supply chain attack.
Every cloud service is vulnerable to data leak-inducing misconfigurations. Some examples of such events for popular cloud solutions are outlined below.
Setting the public access level for an Azure Storage Blob to “Container” or “Blob” allows anyone with the URL to access the contents of the Blob or Container without authentication, creating a potentially exploitable pathway to any stored sensitive data. To prevent this, always set access levels to “Private” and manage all data access with Shared Access Signatures (SAS) and Azure Active Directory.
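The check described above can be sketched in a few lines. This is a minimal illustration, not Azure's API: the function name `find_public_containers` and the input mapping are hypothetical, and the sketch assumes each container's public access level has already been exported (for example, from the Azure portal or CLI) as one of the strings "Container", "Blob", or "Private", with a missing value (`None`) also treated as private.

```python
from typing import List, Mapping, Optional

# Access levels that allow anonymous (unauthenticated) reads.
PUBLIC_LEVELS = {"container", "blob"}


def find_public_containers(levels: Mapping[str, Optional[str]]) -> List[str]:
    """Return the names of containers whose access level permits anonymous reads.

    `levels` maps a container name to its exported public access level.
    This input format is an assumption made for the sketch.
    """
    return sorted(
        name
        for name, level in levels.items()
        if level is not None and level.lower() in PUBLIC_LEVELS
    )


if __name__ == "__main__":
    # Hypothetical inventory of containers and their access levels.
    inventory = {
        "backups": "Container",
        "logs": None,
        "assets": "Blob",
        "hr-records": "Private",
    }
    for name in find_public_containers(inventory):
        print(f"Review public access on container: {name}")
```

Any container the function returns is a candidate for switching to "Private" and moving to SAS or directory-based access.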
Setting an object’s Access Control List (ACL) to “public-read” allows anyone with the URL to the object to access its contents. Setting the ACL to “public-read-write” offers the additional privilege of modifying the contents of an object. If such a URL is exposed to the public, sensitive data stored inside an object is vulnerable to compromise.
To prevent such a data leak, always set the ACL to “Private” and manage object access with Google Cloud Identity and Access Management (IAM) policies. Besides ensuring only authorized users have access to sensitive information stored in Google Cloud Storage, IAM allows you to control the level of authorized access to each specific object.
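A public ACL can be spotted by looking for grants to everyone rather than to named identities. The sketch below is a minimal example under stated assumptions: it expects ACL entries shaped like `{"entity": ..., "role": ...}`, where the special entities `allUsers` and `allAuthenticatedUsers` represent public or near-public access. The helper name `public_acl_entries` and the sample data are hypothetical, and fetching the ACL from Google Cloud Storage is left out.

```python
from typing import Dict, Iterable, List

# Entities that grant access to everyone, or to anyone with a Google account.
PUBLIC_ENTITIES = {"allUsers", "allAuthenticatedUsers"}


def public_acl_entries(acl: Iterable[Dict[str, str]]) -> List[Dict[str, str]]:
    """Return the ACL entries that grant access to a public entity.

    Each entry is assumed to be a dict with "entity" and "role" keys.
    """
    return [entry for entry in acl if entry.get("entity") in PUBLIC_ENTITIES]


if __name__ == "__main__":
    # Hypothetical ACL for a single object.
    sample_acl = [
        {"entity": "user-alice@example.com", "role": "OWNER"},
        {"entity": "allUsers", "role": "READER"},
    ]
    for entry in public_acl_entries(sample_acl):
        print(f"Public grant found: {entry['entity']} has role {entry['role']}")
```

An empty result suggests no public grant exists in the ACL itself. IAM policies on the bucket still need to be reviewed separately.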
Just like cloud storage services, cloud software is also highly vulnerable to misconfigurations leading to data leakage. The best-known example of this risk is the Microsoft Power Apps data leak of 2021. UpGuard researchers discovered that Microsoft Power Apps had an overlooked exposure to a private database via a poorly configured API - a data leak exposing 38 million sensitive records to the public.
Some examples of misconfigured office network services that could result in data leaks include:
File Transfer Protocol (FTP) is a commonly used protocol for transferring large files between remote computers and servers over a network. Many remote setups use FTP as a backup service, which could involve sensitive company information.
When an FTP service is misconfigured, any sensitive data stored on the computer running the service is accessible to unauthorized users.
An example of misconfigurations that can lead to a vulnerable FTP service is not disabling anonymous access. This could allow anyone to access an FTP service without authentication, potentially exposing sensitive data to unauthorized users.
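One way to catch this setting before a server goes live is to inspect its configuration file. The sketch below assumes a vsftpd-style configuration, where anonymous access is controlled by the `anonymous_enable` option. The function name `anonymous_enable_setting` is hypothetical, and treating the last occurrence of the option as the effective one is an assumption of this sketch. When the option is absent, the function returns `None` instead of guessing the server's built-in default.

```python
from typing import Optional


def anonymous_enable_setting(config_text: str) -> Optional[bool]:
    """Return True/False if `anonymous_enable` is explicitly set, else None.

    Lines starting with "#" are treated as comments and ignored.
    """
    value: Optional[bool] = None
    for raw_line in config_text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue
        key, separator, setting = line.partition("=")
        if separator and key.strip() == "anonymous_enable":
            # Assumption: the last explicit occurrence is the effective one.
            value = setting.strip().upper() == "YES"
    return value


if __name__ == "__main__":
    # Hypothetical configuration contents.
    sample = "# Example config\nanonymous_enable=YES\nlocal_enable=YES\n"
    if anonymous_enable_setting(sample):
        print("Anonymous FTP access is explicitly enabled; consider disabling it.")
```

A `None` result means the option was not set explicitly, so the server's documented default should be checked.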
Rsync allows Unix-like systems, including Linux, to transfer files between local and remote systems. When an Rsync service is misconfigured, any sensitive data stored on a remote endpoint is vulnerable to unauthorized access.
Examples of misconfigurations that can lead to a vulnerable Rsync service include:
A misconfigured Git service creates a series of vulnerabilities, offering hackers a smorgasbord of potential cyberattacks to choose from, including:
Examples of misconfigurations that can lead to a vulnerable Git service include:
GitHub, the most popular code hosting platform for developers, software engineers, and even cybersecurity experts, is commonly a source of data leaks resulting from misconfigurations - either within the GitHub product or its integrated services.
Some examples of events leading to GitHub-related data leaks include:
When sensitive data and intellectual property stolen in cyberattacks are published on the dark web, these events are classified as data leaks. A data leak is usually the final stage of the attack lifecycle. Following a successful breach, hackers either freely post stolen data on dark web forums - as an extortion tactic in a ransomware attack - or publish it for sale in a cybercriminal marketplace.
Given the high worth of sensitive data in a cybercriminal economy, it's safe to assume that all breach data will eventually be leaked on the dark web.
The scope of data leaks extends beyond your IT borders and into your entire third-party vendor network. Because organizations and their third-party providers are now more connected than ever, each vendor is a potential attack vector to your sensitive data if they are vulnerable to data leaks.
Vendor-related data leaks are caused by the following:
An effective strategy for detecting data leaks must be multifaceted to account for the limitations of each individual solution. A suggested approach is composed of four components:
Scanning all internet-connected devices in your ecosystem for security vulnerabilities will uncover the potential data leaks these vulnerabilities create. For example, a scanning solution like Shodan can discover publicly accessible servers vulnerable to compromise through reported exposures.
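For a sense of what the simplest form of such scanning looks like, the sketch below checks whether a host accepts TCP connections on a given set of ports, using only Python's standard library. It is a minimal illustration and not how Shodan or any commercial scanner works. The function names and the example port list (21 for FTP, 873 for Rsync, 9418 for the Git protocol) are assumptions, and it should only be run against hosts you own or are authorized to test.

```python
import socket
from typing import Iterable, List


def is_port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


def open_ports(host: str, ports: Iterable[int], timeout: float = 1.0) -> List[int]:
    """Return the subset of `ports` that accept TCP connections on `host`."""
    return [port for port in ports if is_port_open(host, port, timeout)]


if __name__ == "__main__":
    # Example ports for the services discussed earlier: FTP, Rsync, Git.
    for port in open_ports("127.0.0.1", [21, 873, 9418]):
        print(f"Port {port} is reachable; confirm the service is meant to be exposed.")
```

An open port is not a data leak in itself. It only identifies services whose configuration deserves a closer look.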
A more scalable alternative not requiring manual management is an automated attack surface scanning solution with real-time vendor security posture tracking. Such a combination allows third-party vendors with failing security performance to be readily assessed for security risks potentially leading to your exposed data.
Most misconfigurations causing data leaks are difficult to detect with scanning solutions alone. For example, leaky storage buckets exposing sensitive data to the public are not discoverable with attack scanning methods. These hidden regions of the attack surface are best discovered through penetration testing.
A regular penetration testing schedule could help you discover and address hidden exposures before they're exploited by hackers. A thorough pen test could have potentially discovered the unsecured API that led to Optus' enormous data breach in 2022.
Security assessments (also known as security questionnaires) could reveal internal and third-party vulnerabilities linked to data leaks by analyzing your threat landscape against popular cybersecurity standards.
Since data leaks could originate from a broad range of vulnerabilities, an ideal security assessment should force an organization to consider each aspect of its security posture by asking questions about:
An example of a security questionnaire covering such a broad range of controls is the CyberRisk Questionnaire available on the UpGuard platform.
A data leak detection solution is one of the best security measures for preventing ongoing compromise following a data breach event. Such a solution continuously scans common data leak hosts on the dark web, including ransomware blogs, where attackers publish increasing portions of stolen data during the extortion phase of a ransomware attack.
A data leak detection solution alone, however, could become more of an administrative headache than a valuable security control. This is because entirely automated data leak software often fails to consider the broader context of a leak, leading to false positive notifications. An ideal data leak detection program should combine an automated component, to ensure complete coverage of common data leak hosts, with a human component, to filter out false positives based on an expert understanding of each data leak context.
With the exception of insider threats, which are a rarity, employees are not purposefully choosing behaviors that expose sensitive company information. The good news is that because data leaks caused by human error are not motivated by malicious intentions, they can be easily addressed with cybersecurity awareness training. Not only is this one of the easiest and most impactful methods of increasing your data breach resilience, it will also make the lives of your security teams much easier!
Human error is the primary factor in most successful data breaches. If you can teach your staff how to correctly identify cyber threats, you could protect your business from a majority of potential data breach events.
Examples of poor employee habits that lead to data leaks include:
A cybersecurity training program focused on mitigating the causes of data leaks should cover the following essential topics. Many of the listed items are supported with free resources that can be used for training content inspiration.
UpGuard’s data leak detection solution helps tech companies rapidly detect and shut down leaks across common hosts on the dark web, including ransomware blogs. With the addition of cybersecurity experts contextualizing each discovery to remove false positives, UpGuard empowers the technology industry with an accurate, efficient, and scalable data leak prevention program to complement existing cybersecurity efforts.