Home Economics: How Life in 123 Million American Households Was Exposed Online

Updated on December 20, 2017 by Dan O'Sullivan

 

In another blow to consumer privacy, the UpGuard Cyber Risk Team can now reveal that a cloud-based data repository containing data from Alteryx, a California-based data analytics firm, was left publicly exposed, revealing massive amounts of sensitive personal information for 123 million American households. Exposed within the repository are massive data sets belonging to two Alteryx partners: Experian, the consumer credit reporting agency, and the US Census Bureau, the latter in the form of 2010 US Census results. While the Census data consists entirely of publicly accessible statistics and information, Experian’s ConsumerView marketing database, a product sold to other enterprises, contains a mix of public details and more sensitive data. Taken together, the exposed data reveals billions of personally identifying details and data points about virtually every American household.

From home addresses and contact information, to mortgage ownership and financial histories, to very specific analysis of purchasing behavior, the exposed data constitutes a remarkably invasive glimpse into the lives of American consumers. While, in the words of Experian, “protecting consumers is our top priority,” the accumulation of this data in “compliance with legal guidelines,” only to then see it left downloadable on the public internet, exposes affected consumers to large-scale misuse of their information - whether through spamming and unwanted direct marketing, organized fraud techniques like “phantom debt collection,” or through the use of personal details for identity theft and security verification.

While many consumers will likely be troubled by the ability of private corporations to legally collect and sell this data, ranging from publicly available information to sensitive financial details, this exposure highlights a number of growing forms of cyber risk with systemic implications. The continuing concentration of data by a number of large enterprises, now wielding powerful technology of the sort provided by Alteryx, has not been accompanied by the greater prudence and process improvement necessary to ensure that the data will remain securely stored. The result is that, in the same way warming waters increase the power of hurricanes, a single error in a data store such as this is now capable of exposing the vast majority of American households to compromise.

Finally, this incident reveals just how thoroughly third-party vendor risk is corroding the integrity of any public and private functions relying upon information technology. The exposure of massive amounts of data about many millions of American households gathered by a credit reporting agency reveals how the consequences of cyber insecurity can, in an increasingly interdependent technological environment, quickly afflict partners and expose their data as well.

The Discovery

On October 6, 2017, UpGuard Director of Cyber Risk Research Chris Vickery discovered an Amazon Web Services S3 cloud storage bucket located at the subdomain “alteryxdownload” containing sensitive consumer information. While the default security setting for S3 buckets would allow only specifically authorized users to access the contents, this bucket was configured via permission settings to allow any member of the AWS “Authenticated Users” group to download its stored data. In practical terms, an AWS “authenticated user” is “any user that has an Amazon AWS account,” a base that already numbers over a million users; registration for such an account is free. Simply put, one dummy sign-up for an AWS account, using a freshly created email address, is all that was necessary to gain access to this bucket’s contents.
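To make the misconfiguration concrete, here is a minimal sketch of how such a grant might be flagged in a bucket's access control list. It assumes the ACL has already been retrieved as a dictionary shaped like the S3 GetBucketAcl response (a `Grants` list of `Grantee` and `Permission` entries). The function name `risky_grants` and the sample ACL are hypothetical illustrations, not the method UpGuard used or the actual contents of the Alteryx bucket's ACL.

```python
# Group URIs that S3 uses for its predefined grantee groups.
AUTHENTICATED_USERS = "http://acs.amazonaws.com/groups/global/AuthenticatedUsers"
ALL_USERS = "http://acs.amazonaws.com/groups/global/AllUsers"
BROAD_GROUPS = {AUTHENTICATED_USERS: "any AWS account", ALL_USERS: "anyone"}


def risky_grants(acl):
    """Return (audience, permission) pairs for grants made to broad groups."""
    findings = []
    for grant in acl.get("Grants", []):
        grantee = grant.get("Grantee", {})
        uri = grantee.get("URI")
        if grantee.get("Type") == "Group" and uri in BROAD_GROUPS:
            findings.append((BROAD_GROUPS[uri], grant.get("Permission")))
    return findings


# Hypothetical ACL resembling the configuration described above:
# the owner has full control, and every AWS account may read.
sample_acl = {
    "Grants": [
        {"Grantee": {"Type": "CanonicalUser", "ID": "owner-id"},
         "Permission": "FULL_CONTROL"},
        {"Grantee": {"Type": "Group", "URI": AUTHENTICATED_USERS},
         "Permission": "READ"},
    ]
}

print(risky_grants(sample_acl))
```

A non-empty result means the bucket is readable by far more people than its owner likely intended, since the “Authenticated Users” group covers every AWS account holder rather than only the owner's own users.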

Image (newt2.png): The main file repository's contents; note the many Alteryx release versions.

Befitting the subdomain name, the bucket contains a large number of Alteryx software releases and development files for applications produced by the data firm for its analytics customers; Alteryx would later confirm ownership of the bucket after securing it. Of greater significance are two files within the repository appearing to originate from beyond Alteryx.

Image (newt1.png): The "ConsumerView" file in question.

The first, a 36 GB data file titled “ConsumerView_10_2013,” is stored with the extension .yxdb. This extension, an Alteryx database file format used for large data set analytics, had been seen before in a previous data exposure discovered by UpGuard: that of the personal details of 198 million American voters, compiled in a data set by a data firm used by the Republican National Committee. The “ConsumerView” file would contain a similarly vast amount of data compiled about Americans; the file contains over 123 million rows, each one signifying a different American household - a number close to competing estimates of the total number of households at the time of the file’s likely creation in 2013.

While each of the more than 123 million rows represents a different US household, the 248 cross-indexed columns compile each household’s known or modeled personal details, preferences, and behavior across a wide array of categories. With a total of over 30 billion fields (123 million rows across 248 columns) to be filled with such data points, the index’s incredibly detailed level of insight is, ultimately, precisely what Experian claims to offer with its ConsumerView product, as described in a 2016 marketing brochure:

“ConsumerView℠ is the largest and most comprehensive resource for traditional and digital marketing campaigns. With thousands of attributes on more than 300 million consumers and 126 million households, ConsumerView data provides a deeper understanding of your customers, resulting in more actionable insights across channels…”
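For a rough sense of scale, the size of the exposed table follows from multiplying its rows by its columns. The sketch below assumes exactly 123 million rows (the file holds "over" that number) and that every row carries all 248 columns. The result counts cells in the grid, not fields that are actually populated.

```python
ROWS = 123_000_000   # one row per household; the file holds "over 123 million"
COLUMNS = 248        # attribute columns recorded for each household

cells = ROWS * COLUMNS
print(f"{cells:,} cells, about {cells / 1e9:.1f} billion")
```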

While the spreadsheet uses anonymized record IDs to identify households, the other information in the fields - as well as another spreadsheet in the bucket, to be discussed shortly - is sufficiently detailed as to be not merely identifying in many cases, but identifying with a high degree of specificity. The “deeper understanding” advertised by Experian is evident from the 248 aforementioned category types:

HH_ZeroBasedRecordID

ADDRESS ID

FIPS STATE CODE

STATE ABBREVIATION

FIPS ZIP CODE

ZIP+4

DELIVERY POINT CODE

CARRIER ROUTE

SHORT CITY NAME

CITY NAME

HOUSE NUMBER

PRE DIRECTION

STREET NAME

STREET SUFFIX

POST DIRECTION

UNIT DESIGNATOR

UNIT DESIGNATOR NUMBER

PRIMARY ADDRESS

SECONDARY ADDRESS

ADDRESS QUALITY INDICATOR

FIPS COUNTY CODE

COUNTY NAME

LATITUDE

LONGITUDE

MATCH LEVEL FOR GEO DATA

TIME ZONE

LIVING UNIT ID

PHONE: SPECIAL USAGE PHONE

PHONE: NUMBER 2

DWELLING UNIT SIZE

DWELLING TYPE

HOMEOWNER: PROBABILITY MODEL

HOMEOWNER: COMBINED HOMEOWNER-RENTER

EST HOUSEHOLD INCOME V5

INCOME MODEL:ESTIMATED HH CODE V4

INCOME MODEL:ENHANCED EST HH CODE (SCS) V4

NCOA MOVE UPDATE CODE

NCOA MOVE UPDATE DATE

RECIPIENT RELIABILITY CODE

MAIL RESPONDER

HOME BUSINESS

LENGTH OF RESIDENCE

NUMBER OF PERSONS IN LIVING UNIT

NUMBER OF ADULTS IN LIVING UNIT

RURAL URBAN COUNTY SIZE CODE

ACTIVITY DATE

NUMBER OF CHILDREN IN LIVING UNIT

MOR BANK: UPSCALE MERCHANDISE BUYER

MOR BANK: MALE MERCHANDISE BUYER

MOR BANK: FEMALE MERCHANDISE BUYER

MOR BANK: CRAFTS-HOBBY MERCHANDISE BUYER

MOR BANK: GARDENING-FARMING BUYER

MOR BANK: BOOK BUYER

MOR BANK: COLLECT-SPECIAL FOODS BUYER

MOR BANK: GIFTS AND GADGETS BUYER

MOR BANK: GENERAL MERCHANDISE BUYER

MOR BANK: FAMILY AND GENERAL MAGAZINE

MOR BANK: FEMALE ORIENTED MAGAZINE

MOR BANK: MALE SPORTS MAGAZINE

MOR BANK: RELIGIOUS MAGAZINE

MOR BANK: GARDENING-FARMING MAGAZINE

MOR BANK: CULINARY INTERESTS MAGAZINE

MOR BANK: HEALTH AND FITNESS MAGAZINE

MOR BANK: DO-IT-YOURSELFERS

MOR BANK: NEWS AND FINANCIAL

MOR BANK: PHOTOGRAPHY

MOR BANK: OPPORTUNITY SEEKERS AND CE

MOR BANK: RELIGIOUS CONTRIBUTOR

MOR BANK: POLITICAL CONTRIBUTOR

MOR BANK: HEALTH AND INSTITUTION CONTRIBUTOR

MOR BANK: GENERAL CONTRIBUTOR

MOR BANK: MISCELLANEOUS

MOR BANK: ODDS AND ENDS

MOR BANK: DEDUPED CATEGORY HIT COUNT

MOR BANK: NON-DEDUPED CATEGORY HIT COUNT

MORTGAGE-HOME PURCHASE: HOME PURCHASE PRICE

MORTGAGE-HOME PURCHASE: HOME PURCHASE DATE

PROPERTY-REALTY: HOME YEAR BUILT

PROPERTY-REALTY: HOME LAND VALUE

PROPERTY-REALTY: PROPERTY INDICATOR

ESTIMATED CURRENT HOME VALUE

PROPERTY-REALTY: YEAR BUILT CONFIDENCE

PROPERTY-REALTY: YEAR BUILT

AUTO IN THE MARKET: NEW

AUTO IN THE MARKET: USED

AUTO IN THE MARKET: USED 0-5 VEHICLE

AUTO IN THE MARKET: USED 6-10 VEHICLE

AUTO IN THE MARKET: USED 11+ VEHICLE

BEHAVIORBANK: DONATES TO ENVIRONMENTAL CAUSES

BEHAVIORBANK: CONTRIBUTES TO CHARITIES

BEHAVIORBANK: PRESENCE OF CREDIT CARD

BEHAVIORBANK: PRESENCE OF PREMIUM CREDIT CARD

BEHAVIORBANK: INTEREST IN READING

BEHAVIORBANK: COMPUTERS-PERIPHERALS

MOSAIC HOUSEHOLD

MOSAIC ZIP4

MOSAIC GLOBAL HOUSEHOLD

MOSAIC GLOBAL ZIP4

HOUSEHOLD COMPOSITION

CORE BASED STATISTICAL AREAS (CBSA)

CORE BASED STATISTICAL AREA TYPE

PROPERTY-REALTY: HOME TOTAL VALUE

PROPERTY-REALTY: TOTAL TAX

PROPERTY-REALTY: HOME IMPROVEMENT VALUE

PROPERTY-REALTY: HOME LAND SQUARE FOOTAGE

PROPERTY-REALTY: HOME LAND FRONT FOOTAGE

PROPERTY-REALTY: HOME LAND DEPTH FOOTAGE

PROPERTY-REALTY: HOME STORIES

PROPERTY-REALTY: HOME SWIMMING POOL INDICATOR

PROPERTY-REALTY: HOME TOTAL ROOMS

PROPERTY-REALTY: HOME BUILDING SQUARE FOOTAGE

PROPERTY-REALTY: HOME BEDROOMS

PROPERTY-REALTY: HOME BATH

PROPERTY-REALTY: HOME BASE SQUARE FOOTAGE

PROPERTY-REALTY: HOME FIREPLACES

PROPERTY-REALTY: HOME FLOOR COVER INDICATOR

PROPERTY-REALTY: HOME HEAT INDICATOR

PROPERTY-REALTY: HOME AIR CONDITIONING

PROPERTY-REALTY: HOME EXTERIOR WALL TYPE

PROPERTY-REALTY: BUILDING CONSTRUCTION INDICATOR

BEHAVIORBANK: INTEREST IN GARDENING

BEHAVIORBANK: INTEREST IN AUTOMOTIVE

BEHAVIORBANK: INTEREST IN GOURMET COOKING

BEHAVIORBANK: HOME DECORATING-FURNISHING

BEHAVIORBANK: DOG ENTHUSIASTS

BEHAVIORBANK: CAT ENTHUSIASTS

BEHAVIORBANK: PET ENTHUSIAST

BEHAVIORBANK: INTEREST IN TRAVEL

BEHAVIORBANK: INTEREST IN FITNESS

BEHAVIORBANK: INTEREST IN THE OUTDOORS

BEHAVIORBANK: INTEREST IN SPORTS

BEHAVIORBANK: INVESTORS

BEHAVIORBANK: PURCHASED THROUGH THE MAIL

BEHAVIORBANK: CRUISE ENTHUSIASTS

BEHAVIORBANK: INVESTS IN MUTUAL FUNDS-ANNUITIES

BEHAVIORBANK: PURCHASE VIA PHONE

BEHAVIORBANK: INTERNET-ONLINE SUBSCRIBER

BEHAVIORBANK: PURCHASE VIA ONLINE

BEHAVIORBANK: DOMESTIC TRAVELER

BEHAVIORBANK: FOREIGN TRAVELER

MORTGAGE-HOME PURCHASE: TYPE OF PURCHASE

MORTGAGE-HOME PURCHASE: MORTGAGE AMOUNT

MORTGAGE-HOME PURCHASE: MORTGAGE LENDER NAME V2

MORTGAGE-HOME PURCHASE: MORTGAGE RATE TYPE

MORTGAGE-HOME PURCHASE: MORTGAGE TERM (IN MONTHS)

MORTGAGE-HOME PURCHASE: MORTGAGE LOAN TYPE

MORTGAGE-HOME PURCHASE: DOWN PAYMENT %

MORTGAGE-HOME PURCHASE: DEED DATE OF EQUITY LOAN

MORTGAGE-HOME PURCHASE: EQUITY AMOUNT IN THOUSANDS

MORTGAGE-HOME PURCHASE: EQUITY LENDER NAME V2

MORTGAGE-HOME PURCHASE: EQUITY RATE TYPE

MORTGAGE-HOME PURCHASE: EQUITY TERM

MORTGAGE-HOME PURCHASE: EQUITY LOAN TYPE

MORTGAGE-HOME PURCHASE: REFINANCE DEED DATE

MORTGAGE-HOME PURCHASE: REFINANCE AMOUNT

MORTGAGE-HOME PURCHASE: REFINANCE LENDER NAME V2

MORTGAGE-HOME PURCHASE: REFINANCE RATE TYPE

MORTGAGE-HOME PURCHASE: REFINANCE TERM

MORTGAGE-HOME PURCHASE: REFINANCE LOAN TYPE

INVESTMENT PROPERTY: ZIP CODE

INVESTMENT PROPERTY: FIPS 2000 STATE CODE

INVESTMENT PROPERTY: PRIMARY (HOUSE) NUMBER

INVESTMENT PROPERTY: STREET PRE-DIRECTIONAL

INVESTMENT PROPERTY: STREET NAME

INVESTMENT PROPERTY: STREET SUFFIX

INVESTMENT PROPERTY: STREET POST-DIRECTIONAL

INVESTMENT PROPERTY: SECONDARY NUMBER (E.G. APT #)

INVESTMENT PROPERTY: SECONDARY UNIT DESIGNATOR

INVESTMENT PROPERTY: CITY NAME

INVESTMENT PROPERTY: TYPE OF INVESTMENT

INVESTMENT PROPERTY: DATE OF WARRANTY DEED

INVESTMENT PROPERTY: PURCHASE AMOUNT

INVESTMENT PROPERTY: TYPE OF PURCHASE

INVESTMENT PROPERTY: MORTGAGE AMOUNT

INVESTMENT PROPERTY: MORTGAGE LENDER NAME V2

INVESTMENT PROPERTY: MORTGAGE RATE TYPE

INVESTMENT PROPERTY: MORTGAGE TERM

INVESTMENT PROPERTY: MORTGAGE LOAN TYPE

INVESTMENT PROPERTY: DEED DATE OF EQUITY LOAN

INVESTMENT PROPERTY: EQUITY AMOUNT

INVESTMENT PROPERTY: EQUITY LENDER NAME V2

INVESTMENT PROPERTY: EQUITY RATE TYPE

INVESTMENT PROPERTY: EQUITY TERM

INVESTMENT PROPERTY: EQUITY LOAN TYPE

INVESTMENT PROPERTY: DEED DATE OF REFINANCE LOAN

INVESTMENT PROPERTY: REFINANCE AMOUNT IN THOUSANDS

INVESTMENT PROPERTY: REFINANCE LENDER NAME V2

INVESTMENT PROPERTY: REFINANCE RATE TYPE

INVESTMENT PROPERTY: REFINANCE TERM

INVESTMENT PROPERTY: REFINANCE LOAN TYPE

INVESTMENT PROPERTY: ADDITIONAL INVESTMENT FLAG

REALTY MODEL: EST CUR MORTGAGE AMOUNT CONFIDENCE CODE-V2

REALTY MODEL: EST CUR MORTGAGE AMOUNT-CODE V2

REALTY MODEL: EST CUR MORTGAGE AMOUNT-AMT V2

REALTY MODEL: EST CUR MONTHLY MORTGAGE PMT CONFIDENCE CODE-V2

REALTY MODEL: EST CUR MONTHLY MORTGAGE PMT-CD V2

REALTY MODEL: EST CUR MONTHLY MORTGAGE PMT-AMT V2

REALTY MODEL: EST CUR LOAN-TO-VALUE CONFIDENCE CODE-V2

REALTY MODEL: EST CUR LOAN-TO-VALUE RATIO-V2

REALTY MODEL: EST AVAILABLE EQUITY CONFIDENCE CODE-V2

REALTY MODEL: EST AVAILABLE EQUITY-CODE V2

REALTY MODEL: EST AVAILABLE EQUITY-AMT V2

CHILDREN: AGE 0-18 VERSION 3

CHILDREN: AGE 0-3 VERSION 3

CHILDREN: AGE 0-3 SCORE V3

CHILDREN: AGE 0-3 GENDER

CHILDREN: AGE 4-6 VERSION 3

CHILDREN: AGE 4-6 SCORE V3

CHILDREN: AGE 4-6 GENDER

CHILDREN: AGE 7-9 VERSION 3

CHILDREN: AGE 7-9 SCORE V3

CHILDREN: AGE 7-9 GENDER

CHILDREN: AGE 10-12 VERSION 3

CHILDREN: AGE 10-12 SCORE V3

CHILDREN: AGE 10-12 GENDER

CHILDREN: AGE 13-15 VERSION 3

CHILDREN: AGE 13-15 SCORE V3

CHILDREN: AGE 13-15 GENDER

CHILDREN: AGE 16-18 VERSION 3

CHILDREN: AGE 16-18 SCORE V3

CHILDREN: AGE 16-18 GENDER

PHONE: ACTIVITY DATE

CENSUS 2010: TRACT AND BLOCK GROUP

CAPE: AGE: POP: MEDIAN AGE

CAPE: AGE: POP: % 0-17

CAPE: AGE: POP: % 18-99+

CAPE: AGE: POP: % 65-99+

CAPE: ETHNIC: POP: % WHITE ONLY

CAPE: ETHNIC: POP: % BLACK ONLY

CAPE: ETHNIC: POP: % ASIAN ONLY

CAPE: ETHNIC: POP: % HISPANIC

CAPE: DENSITY: PERSONS PER HH FOR POP IN HH

CAPE: HHSIZE: HH: AVERAGE HOUSEHOLD SIZE

CAPE: TYP: HH: % MARRIED COUPLE FAMILY

CAPE: CHILD: HH: % WITH PERSONS LT18

CAPE: CHILD: HH: % MARR COUPLE FAMW- PERSONS LT18

CAPE: CHILD: HH: % MARR COUPLE FAMW-O PERSONS LT18

CAPE: LANG: HH: % SPANISH SPEAKING

CAPE: EDUC: POP25+: MEDIAN EDUCATION ATTAINED

CAPE: HOMVAL: OOHU: MEDIAN HOME VALUE

CAPE: HUSTR: HU: % MOBILE HOME

CAPE: BUILT: HU: MEDIAN HOUSING UNIT AGE

CAPE: TENANCY: OCCHU: % OWNER OCCUPIED

CAPE: TENANCY: OCCHU: % RENTER OCCUPIED

CAPE: EDUC: ISPSA

CAPE: EDUC: ISPSA DECILE

CAPE: INC: FAMILY INC STATE DECILE

 

CAPE: INC: HH: MEDIAN FAMILY HOUSEHOLD INCOME
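The address-related columns near the top of this list show why an anonymized record ID offers little protection on its own: the components can simply be joined back into a mailing address. The following is a minimal sketch using a subset of the column names above and an entirely fictitious row. The helper `mailing_address`, the joining order, and the assumption that the ZIP+4 column holds a complete code are all illustrative guesses, not Experian's or Alteryx's actual logic or data format.

```python
# Street-level columns, in the order they would appear in a mailing address.
STREET_COLUMNS = [
    "HOUSE NUMBER", "PRE DIRECTION", "STREET NAME", "STREET SUFFIX",
    "POST DIRECTION", "UNIT DESIGNATOR", "UNIT DESIGNATOR NUMBER",
]


def mailing_address(row):
    """Join the non-empty address components of one household row."""
    street = " ".join(row[c] for c in STREET_COLUMNS if row.get(c))
    return f"{street}, {row['CITY NAME']}, {row['STATE ABBREVIATION']} {row['ZIP+4']}"


# Entirely fictitious household; the record ID reveals nothing by itself.
fictitious_row = {
    "HH_ZeroBasedRecordID": "0000042",
    "HOUSE NUMBER": "100", "PRE DIRECTION": "N", "STREET NAME": "EXAMPLE",
    "STREET SUFFIX": "ST", "POST DIRECTION": "", "UNIT DESIGNATOR": "APT",
    "UNIT DESIGNATOR NUMBER": "2",
    "CITY NAME": "SPRINGFIELD", "STATE ABBREVIATION": "XX", "ZIP+4": "00000-0000",
}

print(mailing_address(fictitious_row))
```

Once a row resolves to a specific dwelling, every other column in that row (income estimates, mortgage details, children's age ranges) attaches to that dwelling as well.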

 

As is evident, this data spans a wide variety of specific personal information, starting with what Experian calls “the bread and butter of marketing data,” demographics. Beyond analyzing household occupants “in terms of age, gender, education, occupation and marital status,” Experian’s promotional copy also highlights its use of mortgage and financial information, “lifestyle and interest data” “from consumers who have completed self-reported surveys,” and “financial indicators, including card usage and creditworthiness.”

Image (newt-7.png): A number of data fields listing specific gathered data for each household, with personally identifying information redacted.

As confirmed in Experian marketing material, as well as in the exposed column names, this research delves deeper into household finances, analyzing investment behavior, car buying, and even retail purchasing histories, segmented into categories like “Book Buyer” and “Cat Enthusiast.” Census Area Projection Estimate (CAPE) data, drawn from the US Census, is also employed to “help marketers understand everything from consumer spending habits on hundreds of products to commuter and daytime populations,” while Mosaic, “a household segmentation system that classifies U.S. consumers into 19 overarching groups and 71 underlying types,” is used for a number of the categories applied to the listed households.

The use of “household” as the primary unit of measurement may seem odd, but this is in keeping with the methods used by the US Census Bureau. The Bureau’s 2010 census results are also revealed in the bucket, contained in a self-extracting .exe file. However, unlike the information contained in the Experian ConsumerView data set, the Census information available here is entirely publicly available - statistics that can be found and viewed by any interested person on the Bureau's website.

Finally, as confirmed through further research, Alteryx is a partner of both Experian and the US Census Bureau, highlighting the dangers presented by third-party vendor risk. While Experian marketing copy highlights their work “combining the data blending and advanced analytics of Alteryx with the demographic and behavioral data from Experian,” providing detailed data at the household and individual level about millions of Americans, Alteryx’s “Designer with Data” license offering comes packaged with “analytics-ready demographic, segmentation, and firmographic data from Experian, D&B, the US Census Bureau, and more.”

Alteryx’s 2012 advertisement as “the sole provider of software and analytic content used by the U.S. Census Bureau” for over a decade, “including more than 3,000 population characteristics, such as racial and ethnic information as well as family, household, and housing unit details,” further illustrates the close business relationships between all three of these exposed enterprises. Fortunately, no non-public data from the Census Bureau was exposed in this bucket.

The Significance

Taken together, this exposed data provides a highly detailed database of tens of millions of Americans’ personal, financial, and private lives. While Experian argues they “[provide] consumers with notice and choice when it comes to how their data is being used,” using “careful consideration of consumer privacy” and “values-based practices that govern the acquisition, compilation and sale of our consumer data,” these efforts are for naught if the same data is left exposed on the public-facing internet.

This exposure is a prime example of the way in which third-party vendor risk can result in sensitive data leaking from multiple entities. Given the close partnerships of Alteryx, Experian, and, to a lesser degree, the US Census Bureau, and the intermingling of data from all three across multiple internal platforms, it would only follow that the three entities would need to share large amounts of data with one another. While the Census Bureau's data is publicly available, Experian's ConsumerView information is proprietary, sold only to other enterprises; how do you ensure that an external partner or vendor to whom you are entrusting your data in this way keeps it secure? While Experian rates 728 and the US Census Bureau 872 on the CSTAR cyber risk score, out of a maximum of 950, Alteryx, which owned the bucket, had a lower score of 692 - showing perhaps how a weaker link can be fatal throughout the chain.

This is an enormous problem facing the IT landscape today. As has been seen in many previous data exposures, most enterprises lack the ability to even assess the security postures of external vendors. Even if the primary enterprise maintains high standards of change validation and management, it is inviting risk if it cannot be sure of similarly stringent maintenance within the operations of partners handling its data. In the case of Experian in particular, this is but the latest case of a credit reporting agency finding its data exposed in a cloud leak. With the disaster of the Equifax breach still fresh for many, it is a reminder of how integral credit reporting is to the wider financial system, and how if exposed it can, like a tracing thread, reveal the entire outline of an individual or household’s financial and personal details.

Finally, the concentration of publicly and commercially-gleaned data about tens of millions of American households, and the exposure of this data to anyone with a free AWS account entering a URL, shows just how devastating an exposure can be at an enormous scale. The data exposed in this bucket would be invaluable for unscrupulous marketers, spammers, and identity thieves, for whom this data would be largely reliable and, more importantly, varied. With a large database of potential victims to survey - with such details as “mortgage ownership” revealed, a common security verification question - the price could be far higher than merely bad publicity.
