The UpGuard Cyber Risk Team has discovered and secured a data exposure of documents appearing to describe GoDaddy infrastructure running in the Amazon AWS cloud, preventing any future exploitation of this information. The documents were left exposed in a publicly accessible Amazon S3 bucket which, according to a statement from Amazon, "was created by an AWS salesperson." GoDaddy is “the world’s largest domain name registrar,” one of the largest SSL certificate providers, and as of 2018, the largest web host by market share. The exposed documents include high-level configuration information for tens of thousands of systems and pricing options for running those systems in Amazon AWS, including the discounts offered under different scenarios.
The exposed configuration information included fields for hostname, operating system, “workload” (what the system was used for), AWS region, memory and CPU specs, and more. Essentially, this data mapped a very large scale AWS cloud infrastructure deployment, with 41 different columns on individual systems, as well as summarized and modeled data on totals, averages, and other calculated fields. Also included were what appear to be GoDaddy’s discounts from Amazon AWS, usually restricted information for both parties, who must negotiate for rates, as must GoDaddy’s competitors.
With 17.5M customers and 76M domain names, GoDaddy is a critical part of internet infrastructure, and its cloud utilization operates at one of the largest scales in existence. At the time of discovery, GoDaddy’s CSTAR risk score was 752 out of 950, while Amazon scored 793. The UpGuard Cyber Risk Team was able to notify GoDaddy, who got the exposure closed, preventing any potential future malicious use of the exposed data.
On June 19th, 2018, an UpGuard Cyber Risk analyst discovered a publicly readable Amazon S3 bucket named abbottgodaddy. Inside were several iterations of a spreadsheet, the latest version of which was named “GDDY_cloud_master_data_1205 (AWS r10).xlsx,” a 17MB Microsoft Excel file with multiple sheets and tens of thousands of rows. After determining the nature of the data, UpGuard began notification efforts on June 20th, 2018. GoDaddy responded by email on July 26th, and the UpGuard research team confirmed that the exposure had been closed on the same day.
About S3 Buckets
Amazon’s S3 storage buckets are private by default, meaning only designated users can access them. However, through misunderstanding or misconfiguration, these permissions are sometimes altered to allow public access, which means that anyone who visits the URL of the storage bucket can anonymously view any contents that aren’t explicitly locked down, with no password needed. We have outlined some examples of how S3 permissions can be misconfigured to expose data, but put simply, there are two groups that must be used with extreme care:
- All Users (Everyone) - Public anonymous access. Anyone with the name can open the bucket.
- Authenticated Users (All AWS Users) - Anyone with a (free) AWS account can access the bucket. Exposures of this kind should still be considered public exposures as obtaining an AWS account is very easy.
Whether rolling out dozens of buckets for an enterprise or establishing personal cloud storage, understanding how these public permissions work and how they are set for your resources at any given time is crucial to preventing data exposure through this vector.
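To make the two groups above concrete, here is a minimal sketch of how a bucket's access control list could be checked for grants to either of them. It assumes the ACL has already been fetched and is a dictionary shaped like an S3 GetBucketAcl response; the function name `find_public_grants` and the sample ACL are hypothetical and purely illustrative, not taken from the exposed bucket.

```python
# URIs of the two predefined S3 groups that make a grant effectively public.
ALL_USERS = "http://acs.amazonaws.com/groups/global/AllUsers"
AUTHENTICATED_USERS = "http://acs.amazonaws.com/groups/global/AuthenticatedUsers"

PUBLIC_GROUPS = {
    ALL_USERS: "All Users (Everyone)",
    AUTHENTICATED_USERS: "Authenticated Users (All AWS Users)",
}


def find_public_grants(acl):
    """Return (group label, permission) pairs for grants made to a public group.

    `acl` is assumed to be a dict shaped like an S3 GetBucketAcl response:
    {"Grants": [{"Grantee": {"Type": ..., "URI": ...}, "Permission": ...}]}
    """
    findings = []
    for grant in acl.get("Grants", []):
        grantee = grant.get("Grantee", {})
        # Only group grantees can refer to the two public groups.
        if grantee.get("Type") != "Group":
            continue
        label = PUBLIC_GROUPS.get(grantee.get("URI"))
        if label is not None:
            findings.append((label, grant.get("Permission")))
    return findings


# Hypothetical ACL for illustration only.
example_acl = {
    "Grants": [
        {
            "Grantee": {"Type": "CanonicalUser", "ID": "owner-id-placeholder"},
            "Permission": "FULL_CONTROL",
        },
        {
            "Grantee": {"Type": "Group", "URI": ALL_USERS},
            "Permission": "READ",
        },
    ]
}

for label, permission in find_public_grants(example_acl):
    print(f"Public grant: {label} has {permission}")
```

In practice the dictionary could come from a call such as boto3's `get_bucket_acl(Bucket=...)`, which returns a response of this shape. A check like this covers ACL grants only; bucket policies can also allow public access and would need to be reviewed separately.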
Although there were several spreadsheet files in the bucket, they were in fact multiple revisions of the same sheet, with “R10” being the last revision. The most recent spreadsheet contained eight tabs.
Each sheet contained some data used for modeling and analyzing large scale infrastructure running in the Amazon cloud.
The largest sheet, named “GDDY Machine Raw Data,” lists forty-one data points for over 24,000 unique hostnames. These include information positioning each machine within the business, like Hostname, Geo Unit, Business Unit, Workload, and Data Center, as well as information describing the configuration of the machine, like "Total vCPU (AWS)", "Total Memory (AWS)", "# CPUs (Provisioned)", "# Cores (Provisioned)", "Total vCPU (Provisioned)", " vCPU (Required per Instance)", " vCPU (Total Required)", AVG CPU utilization (%), Peak CPU utilization (%), "Memory (GB) (Provsioned)", "Memory (GB) (Required per instance)", "Memory (GB) (Total Required)", AVG memory utilization (%), Peak memory utilization (%), and "Storage (GB)". In addition to the thousands of rows with unique hostnames, a small number of other rows appear to summarize those same data points for collections of multiple machines.
The other sheets broke that information out for legibility and consumption.
Other sheets then translated the technical usage into financials.
Others provided top level summaries.
There are two main vectors by which this data could have been exploited: using the configuration data of the GoDaddy servers as a “map” which would allow malicious actors to select targets based on their role, probable data, size, and region, and using the business data as a competitive advantage for cloud hosting strategy and pricing.
System Configuration Information
The system configuration data offers a potential attacker information about GoDaddy operations. Similar “casing” info is often sought through social engineering and internet research to make other attacks as effective and efficient as possible, and every data point helps to achieve that goal. The “workload” column in particular would help point attackers in the right direction, highlighting which systems serve more important functions and likely house important data.
While not directly providing credentials or exposing sensitive information stored on these servers, exposures of configuration details for digital infrastructure can provide a stepping stone to attacks that do access such information.
But hackers aren’t the only ones in the market for this kind of information. Competitors, vendors, cloud providers, and others would all be interested to know how the largest domain host in the world handles its cloud expenditures. At the scale of Amazon AWS and GoDaddy, negotiations over a percentage point or two are critical, as they can mean a difference of millions of dollars a year. Knowing the details of GoDaddy’s AWS discounts could therefore give others a negotiation advantage and a price point that would otherwise be unknown. Furthermore, the way in which GoDaddy allocates its cloud spend is also strategic: how much compute, how much storage, split over how many regions, in how many environments. This is a blueprint for running cloud infrastructure at the largest scales.
Large Scale Consequences
While this kind of structural data would be significant for any company, the size and importance of GoDaddy make it even more so. One could arguably say that GoDaddy hosts a fifth of the internet. Amazon AWS is the leader in its space, claiming approximately 40% of the market for infrastructure as a service. Although the exposed information by itself could not facilitate a planned attack on their systems, such an attack could potentially disrupt global internet traffic. If the Dyn DNS attack was any indication, large scale internet attacks are not only possible, but extremely effective, as certain organizations have essentially become critical points of failure for the system as a whole.
The internet is taken for granted these days as a ubiquitous service that “just works.” But just like the companies that rely on the internet for business, the companies responsible for the infrastructure that makes the internet work are subject to the risks of their technology. Although organizations of all sizes must consider data exposure in their business risk assessment, at the largest scales misconfigurations can be both harder to find and far more severe in their consequences.
Although the potential threats to exploit this kind of data require intentional malicious actors, the exposure of that data through misconfigured storage does not. From operations as large as GoDaddy and Amazon, to small and medium organizations, anyone who uses cloud technology is subject to the risk of unintentional exposure, if the operational awareness and processes aren’t there to catch and fix misconfigurations when they occur. Whether an asset in the main data center, or hosted on a third party’s system, all links in the digital supply chain must be resilient to protect the data.
N.B. This piece was updated on August 9 to include a statement from Amazon AWS clarifying who was responsible for exposing the data.