Learning Curve: How Personal Data for One Million Individuals Was Exposed

Updated on April 19, 2018 by UpGuard

In an incident that calls to mind multiple data breaches in the analytics and influencing industries, the UpGuard Cyber Risk Team can now report that data relating to a number of subsidiaries of Kansas City holding company Blue Chair LLC, such as lead generation company Target Direct Marketing, was left exposed online, revealing personally identifiable information for over one million individuals seeking further information about higher education. Revealed in the repository are personal details for these million individuals, including their names, email addresses, phone numbers, and, in some cases, information such as the person’s high school graduation year and area of study. Also exposed in this leak are what appear to be backups of a set of server configurations for a large network of feeder websites designed to draw consumers toward the for-profit education application process.

This exposure highlights how easily cyber risk can be accumulated by enterprises operating today. With personal information gathered about individuals for years, at scale, data leaks increasingly have the power to expose huge numbers of consumers in one fell swoop. This exposure of consumer information gathered as long ago as 2005 from a publicly available server speaks to how simple misconfigurations and errors can result in a larger leak. Revealed in this incident are the inner workings of a sophisticated digital marketing operation aimed at consumers, including potential college applicants. If found by any malicious actors, this exposed data could have provided ample ammunition for social engineering and phishing operations.

The Discovery

On February 26th, 2018, UpGuard Director of Cyber Risk Research Chris Vickery notified Gragg Advertising that he had discovered a data leak that referenced the company. While the breach notification process can sometimes be difficult, Gragg's reaction was laudable: the company responded swiftly, and the publicly accessible service was apparently secured within an hour.

The exposed rsync utility was publicly accessible through port 873, the default port for the service. Rsync, or “remote synchronization,” is used to back up data remotely, enabling users to easily copy information from one machine to another. While rsync ports can be easily secured to prevent public access to sensitive, privileged data using rsync’s “hosts allow/deny” functions, such measures can often be missed, causing a type of leak the Cyber Risk Team has seen before.
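
As a rough illustration of the kind of check that could catch this misconfiguration, the following Python sketch scans the text of an rsync daemon configuration and lists the modules that have no “hosts allow” restriction in effect. The function name, the sample configuration, and its paths and address range are hypothetical and are not taken from the exposed server; only the two module names echo the areas described in this report. It is a simplified heuristic and does not account for other controls such as “auth users” or firewall rules.

```python
def modules_without_hosts_allow(conf_text):
    """Return names of rsync modules with no effective 'hosts allow' setting.

    Simplified heuristic: a module is flagged when neither the module itself
    nor the global section (parameters before the first [module]) sets
    'hosts allow'. Line continuations and include directives are ignored.
    """
    global_params = {}
    modules = {}              # module name -> dict of its parameters
    current = global_params   # parameters before any [module] are global
    for raw in conf_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue          # skip blank lines and comments
        if line.startswith("[") and line.endswith("]"):
            current = modules.setdefault(line[1:-1].strip(), {})
            continue
        if "=" in line:
            key, value = line.split("=", 1)
            # Normalise case and internal whitespace in the parameter name.
            current[" ".join(key.lower().split())] = value.strip()
    return [
        name
        for name, params in modules.items()
        if not (params.get("hosts allow") or global_params.get("hosts allow"))
    ]


# Hypothetical configuration for illustration only.
SAMPLE_CONF = """
uid = nobody

[mysqlbak]
    path = /srv/backups/mysql
    read only = yes

[www]
    path = /srv/www
    hosts allow = 10.0.0.0/8
    hosts deny = *
"""

print(modules_without_hosts_allow(SAMPLE_CONF))  # ['mysqlbak']
```

In this sample, the “www” module is restricted to a private address range while “mysqlbak” accepts connections from any host, so only the latter is reported.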

The publicly available repository revealed here contains two partitioned areas, titled “mysqlbak” and “www.” Each contains revealing information. The data stored under “mysqlbak” consists of folders created on sequential days between January 31st, 2018 and February 19th, 2018. These incremental folders each contain many MySQL database backups, with a total of 131 tables stored in each. Decompressed, the total size of this backup database is around 5 GB, and contains a vast amount of information within a database schema titled “Graggnet.”

The exposed tables contain personally identifiable information about just over a million individuals. The table “peg_historical” lists these consumers, alongside their names, home addresses, home and mobile phone numbers, email addresses, and, tellingly, some information about their educational backgrounds. Details such as the consumer’s high school graduation year, education level, status as a marketing lead, area of major studies, and how they were referred are completed where the information is available.


The database tables, with "peg_historical" highlighted, under the name "Graggnet" at the top.


The "peg_historical" table's contents, revealing individual data.

The other area of the exposed server, named “www,” contains forty folders and applications named for a number of websites, such as “bullseyeleadsystem.com” (cyber risk score 171/950), “education-for-careers.com” (cyber risk score 171/950), and “collegesearchengine.net” (cyber risk score 114/950), along with a number of differently formatted variations on these addresses. As indicated by their names, these websites appear designed to appeal to individual consumers browsing the internet for resources about higher education.


The list of folders in the "www" section.

Gregory Gragg, the CEO of Blue Chair LLC, “a holding company that focuses on mergers and acquisitions,” has stated that this leak originated from Target Direct Marketing, which is listed as a Blue Chair subsidiary on its official website. He has also described the data as consisting of “old leads.”

There are indeed some indications of Target Direct Marketing being exposed in this incident. Thermo Interactive (cyber risk score 352/950), an online lead generation firm mentioned multiple times in the “www” configurations area, is also the name of a 2014 acquisition of Target Direct Marketing, which then became Target Direct Marketing, LLC.

Some of the exposed data suggests other properties belonging to Blue Chair LLC had information leaked in this incident as well. There are numerous indications that some of the websites mentioned in the exposed “www” dataset are operated as arms of Blue Chair subsidiaries like Gragg Advertising. For example, while the website “collegesearchengine.net” is “an affiliate marketing tool established in fall 2010 used to generate inquiries for colleges and schools...created by Gragg Advertising,” public domain ownership queries reveal both bullseyeleadsystem.com and education-for-careers.com are registered by Gragg Advertising.

The Significance

Personally identifiable information can be easily accumulated at scale using modern technology - and with one misconfiguration, it can all spill out onto the public internet. As seen in previous UpGuard Cyber Risk Team Reports on French influencer marketing firm Octoly, California analytics company Alteryx, and numerous data firms contracted by the Republican National Committee, the level of individual detail acquired about people in modern marketing operations means that breaches can have a far-reaching and invasive impact. The continued storage of personal information originating from as long ago as 2005 speaks to the importance of data retention and deletion policies in the modern data center, lest an exposure reveal information that should not have even been kept anyway.

Across most of the digital landscape today, consumers cannot truly “opt in” to where their personal information is stored - a dilemma which emerging standards, such as those of the European Union’s forthcoming General Data Protection Regulation, attempt to explicitly address. In a world in which third-party vendors are contracted for everything from payment processing to lead generation, data is shared and shifted around rapidly, with consumers having little say over where their trusted information is stored or how it is secured. In this case, the presence of dozens of websites provides some indication of how third-party vendor risk can implicate many more links in the business toolchain. If an enterprise does not know where its privileged information resides, it will be unable to secure it.

In the hands of malicious actors, this exposed data could easily have been used for social engineering or phishing attacks, in which bad guys pose as educational companies soliciting further information from exposed consumers, either via phone or email. Enterprises must ensure they are adhering to internal processes capable of avoiding such damaging gaps in their armor. Cyber risk scoring of Gragg Advertising’s aforementioned lead generation websites, such as collegesearchengine.net and education-for-careers.com, reveals poor scores of 114 and 171, respectively, out of a maximum of 950. These low scores are indicative of a heightened susceptibility to cyber incidents. In order to effect real cyber resilience, firms must learn from these errors or face exposures like this one.