Updated on April 30, 2018 by UpGuard
The UpGuard Cyber Risk Team’s latest discovery of a data leak, involving the exposed IT assets of a data analytics firm based in British Columbia, Canada, presents significant questions for society about how technology can be used. In this first installment of a multipart series titled “The AIQ Files,” we begin to explain the importance of the data revealed from a publicly exposed AggregateIQ repository, and how it relates to recent US political history.
Coming amidst a firestorm of scrutiny about how political operations can use and harvest consumer information, including from social media networks like Facebook, the UpGuard Cyber Risk Team can now reveal that a large code repository originating from AggregateIQ, a Canadian political data firm active in the 2016 US presidential race, was left publicly downloadable online. Revealed within this repository is a set of sophisticated applications, data management programs, advertising trackers, and information databases that collectively could be used to target and influence individuals through a variety of methods, including automated phone calls, emails, political websites, volunteer canvassing, and Facebook ads. Also exposed among these tools are numerous credentials, keys, hashes, usernames, and passwords to access other AIQ assets, including databases, social media accounts, and Amazon Web Services repositories, raising the possibility of attacks by any malicious actors encountering the exposure.
This exposure reveals how applications and web assets apparently developed by AggregateIQ (AIQ), a small firm of twenty employees based in Victoria, British Columbia, were customized for the failed 2016 presidential campaign of Senator Ted Cruz (R-TX), as well as for Texas Republican Governor Greg Abbott and a number of foreign political parties and figures. The Cruz connection raises obvious questions about the relationship of AggregateIQ to Cambridge Analytica (CA), a controversial London-based data analytics firm which was paid $5.8 million by the Cruz campaign for services rendered in his unsuccessful bid for the GOP presidential nomination.
Cambridge Analytica, which is now being investigated for collecting the Facebook profiles of over fifty million users without their permission, has been reported to work closely with AggregateIQ in their efforts on behalf of clients, as will be further explained. In this first installment of “The AIQ Files,” we take a closer look at the suite of political data and microtargeting tools possessed by AggregateIQ and exposed in this data repository - in turn revealing the inner workings of the kind of influencing prowess in which Cambridge Analytica claimed expertise, to the campaigns of customers like Ted Cruz, Ben Carson, and Donald Trump.
On the night of March 20th, 2018, UpGuard Director of Cyber Risk Research Chris Vickery discovered a large data warehouse hosted on a subdomain of AIQ and using a custom version of the popular code repository Gitlab, located at the web address gitlab.aggregateiq.com. When the URL is entered, Gitlab prompts the user to register to see the contents - a free process which simply requires supplying an email address. Once a user has registered, the contents of the dozens of separate code repositories operated on the AggregateIQ Gitlab subdomain are entirely downloadable. Within these repositories appear to be nothing less than mechanisms capable of organizing vast quantities of data about individuals, measuring how they are being influenced or reached by advertising, and even tracking their internet browsing behavior.
Some of the projects contained in the development repository.
A partial explanation of the workings of some of the tools contained in the repository.
If the potential power of the tools exposed in this incident seems extraordinary, the actual occurrence of data exposures as a result of potential misconfiguration is all too common. The simple matter of fixing a permission setting to exclude public registrants from viewing this development repository would have been the difference between whether the code was exposed or secured. As it was left publicly downloadable, many sets of internal credentials that could have been used to launch damaging attacks were left out in the open.
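To make the point about a single permission setting concrete, the following is a minimal sketch of the kind of check that could flag such an exposure. It is not AggregateIQ's configuration, and the exposed repository does not tell us which setting was at fault. The function name `audit_settings` is hypothetical, and the two setting names are modeled on those used in GitLab's application settings; treat them as assumptions.

```python
# Hypothetical audit of a self-hosted code server's settings.
# Assumption: the keys below are modeled on GitLab's application settings
# ("signup_enabled", "default_project_visibility"); the article does not
# say which setting was actually misconfigured.

def audit_settings(settings):
    """Return a list of human-readable warnings for risky settings."""
    findings = []

    # Open registration lets anyone with an email address create an account.
    if settings.get("signup_enabled", False):
        findings.append("open registration is enabled")

    # "internal" projects are visible to every registered user,
    # "public" projects are visible to everyone.
    visibility = settings.get("default_project_visibility", "private")
    if visibility in ("internal", "public"):
        findings.append(f"new projects default to '{visibility}' visibility")

    return findings


exposed = {"signup_enabled": True, "default_project_visibility": "internal"}
locked_down = {"signup_enabled": False, "default_project_visibility": "private"}

for warning in audit_settings(exposed):
    print("WARNING:", warning)
```

In this sketch, open registration combined with repositories visible to any registered user reproduces the situation described above: a free sign-up is all that stands between the public and the code.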
Further installments of this series will detail the technical workings revealed in this leak, starting with a report detailing the presence of assets apparently designed for use by the presidential campaign of Senator Cruz. The presence of this kind of data raises an obvious question, however: what is the relationship between the failed presidential campaign of Ted Cruz, and a small data firm based in British Columbia? The answer appears to be a complicated one, but involves Cambridge Analytica and a number of tense political situations around the world.
There have already been a number of major journalistic revelations about the ties between Cambridge Analytica and AggregateIQ. But as recounted to Observer journalist Carole Cadwalladr by former CA employee and current whistleblower Christopher Wylie, who claimed “AIQ wouldn’t exist without me,” the relationship is reported to have started with a recruitment effort:
“‘When I became research director for SCL [the parent company of Cambridge Analytica] we needed to rapidly expand our technical capacity and I reached out to a lot of people I had worked with in the past.’ That included Jeff Silvester, his former boss, who lived in Wylie’s home town…[Silvester] then set up AIQ with his business partner, Zack Massingham, to work on SCL and later Cambridge Analytica projects. ‘Essentially it was set up as a Canadian entity for people who wanted to work on SCL projects who didn’t want to move to London. That’s how AIQ got started: originally to service SCL and Cambridge Analytica projects,’ said Wylie. Last March, when the Observer started asking questions about the connection between Cambridge Analytica and AIQ, the former removed ‘SCL Canada’ and Massingham’s phone number from its website and said that AIQ was a ‘former IT contractor’.”
The additional information reported by Cadwalladr that SCL Elections, the parent company of Cambridge Analytica, owns AggregateIQ’s intellectual property “in perpetuity” helps to further make clear the significance of the Gitlab subdomain’s contents. While Cambridge Analytica may operate as an independent firm, distinct from AggregateIQ, the working relationship appears to be much closer - as evidenced in the tale of an app contained in the exposed repository.
Ripon, Wisconsin is the site of the schoolhouse where, in 1854, the Republican Party was founded. It is for this historical footnote that, as Mother Jones reports, Cambridge Analytica named its “all-in-one tool that let a campaign manage its voter database, microtargeting efforts, door-to-door canvassing, low-dollar fundraising, and surveys,” as successfully pitched to the campaign of Senator Cruz in 2015.
Two projects in AggregateIQ’s Gitlab repository, “Ripon_canvas” and “Ripon_dialer,” bear this same name.
The “libs” folder within “Ripon_canvas” begins to illustrate how this was actually meant to function. Contained within these libraries are configuration files for using Ripon in a number of crucial primary and caucus states - Alaska, Alabama, Arkansas, Colorado, Georgia, Massachusetts, Minnesota, North Carolina, New Mexico, Nevada, South Carolina, Texas, Vermont, and Wyoming. Among the most interesting is the configuration file titled “config.ia.php,” likely signifying Iowa, the crucial first caucus state Ted Cruz won in 2016 while using Cambridge Analytica’s data. This configuration file, like all the others, contains an exposed Facebook app ID and secret key, as well as credentials for accessing Twilio, an SMS messaging service.
A script contained within "Ripon_canvas," with text of the kind that might be used by the Cruz campaign.
Notes on updates within "Ripon_dialer" indicating phonebanking utility.
The hint in the repository titles that this particular application could be for canvassing voters appears to be borne out by the technical contents. Users of “Ripon_canvas” can be assigned to one of three classes: “administrator,” “campaign manager,” or “volunteer,” pointing to standard field operations for a political campaign.
Other data reveals that any time a new configuration is created for a new state, four user accounts are automatically added as the new file is seeded with starter data. The last two of these four accounts - System, Admin, SCL, and AIQ - are the most intriguing. While AggregateIQ’s acronym is obvious, SCL likely refers to Cambridge Analytica’s parent company, SCL Elections.
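For readers less familiar with how such seeding works, here is a minimal illustrative sketch, not the code found in the repository. The three role names and the four account names come from the reporting above; the class names, the `seed_state_config` function, and the role assigned to each seeded account are all hypothetical.

```python
from dataclasses import dataclass, field

# Role names as described in the article.
ROLES = ("administrator", "campaign manager", "volunteer")

# Account names as described in the article.
SEED_ACCOUNTS = ("System", "Admin", "SCL", "AIQ")


@dataclass
class User:
    name: str
    role: str

    def __post_init__(self):
        # Reject any role outside the three known classes.
        if self.role not in ROLES:
            raise ValueError(f"unknown role: {self.role!r}")


@dataclass
class StateConfig:
    state_code: str
    users: list = field(default_factory=list)


def seed_state_config(state_code):
    """Create a per-state configuration pre-populated with starter accounts."""
    config = StateConfig(state_code=state_code.lower())
    for name in SEED_ACCOUNTS:
        # Assumption: seeded accounts receive the administrator role.
        # The article does not say which role each account was given.
        config.users.append(User(name=name, role="administrator"))
    return config


iowa = seed_state_config("IA")
print([user.name for user in iowa.users])
```

The point of the sketch is that accounts created at seeding time exist in every state's deployment from the moment it is set up, which is why the names chosen for them are telling.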
Taken in full, it remains unclear why what resembles a version of the app Cambridge Analytica promised would be “revolutionary” for the Cruz campaign would be found in the development repository of AggregateIQ.
The story of the Ripon application, and its intersection with the presidential aspirations of Ted Cruz, Cambridge Analytica, and AggregateIQ, is but one element in a greater story. In this one data exposure, we can see how small process errors can potentially reveal vast systems of information - juggernauts, often too powerful to be managed and secured by even their gatekeepers.
The emerging societal concern for how enterprises treat our personal information - whether it is a gigantic, blue-chip tech power like Facebook, or a little-known data operation operating in small-town British Columbia - makes this an all too relevant story to recount. The revelation that one inadvertent leak can reveal implements designed to potentially influence entire electorates, and perhaps expose millions of people to the invasion of their privacy and the possibility of harm by malicious actors, tells us that the stakes are too high to get this wrong.
Misconfigurations are an internal problem that emanates from within the IT infrastructure of any enterprise; no hacker is necessary for massive damage to occur to digital systems and stored data. And the problem is pervasive, with Gartner estimating that anywhere from 70% to 99% of data breaches result not from external, concerted attacks, but from internal misconfiguration of the affected IT systems.