Block Buster: How A Private Intelligence Platform Leaked 48 Million Personal Data Records

The UpGuard Cyber Risk Team can now confirm that a cloud storage repository containing information belonging to LocalBlox, a personal and business data search service, was left publicly accessible, exposing 48 million records of detailed personal information on tens of millions of individuals, gathered and scraped from multiple sources.

This data includes names, physical addresses, dates of birth, scraped data from LinkedIn and Facebook, Twitter handles, and more. Ashfaq Rahman, co-founder of LocalBlox, a company that bills itself as the “World's Most Comprehensive Cross Device Identity Graph on Businesses, Consumers and Geo Audiences,” has confirmed to UpGuard that the exposed information belongs to them.

In the wake of the Facebook/Cambridge Analytica debacle, the importance of massive sets of psychographic data is becoming more and more apparent. The exposed LocalBlox dataset combines standard personal information, such as name and address, with data about the person’s internet usage, such as their LinkedIn histories and Twitter feeds. This combination begins to build a three-dimensional picture of every individual affected (who they are, what they talk about, what they like, even what they do for a living), in essence a blueprint from which to create targeted persuasive content, like advertising or political campaigning. If the legitimate uses of the data aren’t enough to give pause, the illegitimate uses range from traditional identity theft, to fraud, to ammunition for social engineering scams such as phishing.

The Discovery

On February 18th, 2018, the UpGuard Cyber Risk Team discovered an Amazon Web Services S3 bucket located at the subdomain “lbdumps” that was configured for access via the internet, with its contents publicly downloadable. The bucket contained one 151.3 GB compressed file, which, when decompressed, revealed a 1.2 TB ndjson (newline-delimited JSON) file. Metadata in a header file pointed to LocalBlox as the owner. After downloading and beginning to analyze this extremely large data file, the UpGuard Cyber Risk Team notified LocalBlox of the exposure on February 28th; the bucket was secured later that day.
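
A file of this size cannot practically be loaded into memory at once, but the ndjson layout means it can be read one record at a time. The following is a minimal sketch of that approach, not a description of the tooling UpGuard used. It assumes the archive is gzip-compressed (the compression format is not stated above), and the function names and sample field names are hypothetical.

```python
import gzip
import io
import json
from collections import Counter
from typing import Iterable, Tuple


def summarize_ndjson(lines: Iterable[str]) -> Tuple[int, Counter]:
    """Count records and tally top-level field names, one line at a time.

    Assumes every non-blank line is a single JSON object.
    """
    record_count = 0
    field_counts: Counter = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines between records
        record = json.loads(line)
        record_count += 1
        field_counts.update(record.keys())
    return record_count, field_counts


def summarize_gzip_ndjson(path: str) -> Tuple[int, Counter]:
    """Stream a gzip-compressed ndjson file without unpacking it to disk.

    Assumption: gzip compression and UTF-8 text.
    """
    with gzip.open(path, mode="rt", encoding="utf-8") as handle:
        return summarize_ndjson(handle)


if __name__ == "__main__":
    # Hypothetical records; these are not taken from the exposed dataset.
    sample = io.StringIO(
        '{"name": "A. Example", "city": "Springfield"}\n'
        '{"name": "B. Example", "twitter": "@example"}\n'
    )
    count, fields = summarize_ndjson(sample)
    print(count, dict(fields))
```

Because each line is parsed and discarded before the next is read, memory use stays roughly constant regardless of how large the file is.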

The file name provides some indication of the contents: “final_people_data_2017_5_26_48m.json.” As hinted, the massive file contains 48 million records, each in JSON format and separated by new lines. This master list combines and cross-references information gathered about individuals from a variety of sources. The exposed data includes individuals’ names, physical addresses, dates of birth, scraped LinkedIn job histories, public Facebook data, and Twitter handles. In addition, the prominent real estate site Zillow appears to be used in the process as well, with information from the service's listings blended into the larger data pool by means that are not evident from the data. The database appears to work by tracking an IP address, matching collected data to that IP address when able, and thus providing a clearer image of the behavior and background of the user at that IP address.

Also of interest are exposed source fields, providing some indication of where the scraps of data were collected from. Some are fairly unambiguous, pointing to aggregated content, purchased marketing databases, or even information caches sold by payday loan operators to businesses seeking marketing data. Other fields are more mysterious, such as a source field labeled “ex.”

Included among the data are several Facebook data points, filled from queries like this one present in the dataset. In those instances the <query> and <email> fields were populated with the person's name and email address:

"term":"[name:>http://www.facebook.com/search.php?q=<query>,, email:>http://www.facebook.com/search.php?init=s:email&q=<email>&type=users]

Some of the data points associated with these queries include pictures, skills, lastUpdated, companies, currentJob, familyAdditionalDetails, Favorites, mergedIdentities, and a field labeled allSentences, which includes other text from the search results. That text includes results suggesting that this information was scraped from the Facebook HTML rather than gathered through the API. For example, this text from one record appears to come from the Facebook page footer in 2016:

English (US) , EspaÃ±ol , FranÃ§ais (France) , ä¸.æ–‡(ç®€ä½“) , Ø§Ù„Ø¹Ø±Ø¨ÙŠØ© , PortuguÃªs (Brasil) , Italiano , í•œêµ.ì–´ , Deutsch , à¤¹à¤¿à¤¨à¥.à¤¦à¥€ , æ—¥æœ¬èªž , , ","Sign UpLog InMessengerFacebook LiteMobileFind FriendsPeoplePagesPlacesGamesLocations ","CelebritiesGroupsMomentsInstagramAboutCreate AdCreate PageDevelopersCareersPrivacyCookies ","Ad ChoicesTermsHelpSettingsActivity Log ","Facebook Â© 2016 "

This data highlights the ease with which Facebook data can be scraped, and the ubiquity of Facebook information in psychographic datasets. According to their website, “LocalBlox is the First Global Customer Intelligence Platform to search, combine and validate deep business and people profiles – at scale.” The exposed data wasn’t just a customer list, but the very product LocalBlox offers. Their value statements about the power of their data provide some insight into exactly why exposing such data is extremely dangerous. According to the LocalBlox website, “The need for deeper, more accurate data about individual businesses and consumers is becoming more urgent to compete.” This data is valuable because it can be used effectively, and this efficacy can become dangerous if put to malicious use.

The Significance

Social awareness of data exposure and its consequences has grown in parallel with the scope of datasets being aggregated, stored, shipped, and copied by numerous organizations around the world. The LocalBlox dataset, 1.2 terabytes in size, contained 48 million records covering a similar or somewhat smaller number of individuals, since one person may appear in more than one record. The presence of scraped data from social media sites like Facebook also highlights an important fact: all too often, data held by widely used websites can be targeted by unknown third parties seeking to monetize this information. In such cases, both a targeted website like Facebook and any affected users are being victimized, as personal information entrusted to the social network is snatched up for the benefit of a platform of which no one is aware.

More importantly, the data gathered on these people connected their identity with their online behaviors and activity, all in the context of targeted marketing, i.e. how best to persuade them. It is exactly this persuasive factor that lies at the heart of discussions about how data is gathered and sold: when aggregated together at scale, your psychographic data can be used to influence you. It is what makes exposures of this nature so dangerous, and also what drives the business model not only of LocalBlox, but of the entire data analytics industry. As it says on the LocalBlox website, the “Data and Analytics Market is Booming,” and this is reflected in the advertising copy the site employs.

[Screenshot: *The LocalBlox website.*]

With this kind of business interest in data harvesting, processing, and resale, it should be no wonder that so many massive and intrusive data sets exist in the world, providing companies and political parties with detailed blueprints on how to influence people.

What should be a wonder is that these datasets aren’t better secured and administered. This exposure was not the result of a clever hack or a well-planned scheme, but of a simple misconfiguration of an enterprise asset, an S3 storage bucket, which left the data open to the entire internet. The profit derived from data must come with the responsibility of protecting its integrity and privacy. Cloud storage itself provides functionality and speed at a reasonable cost, but cloud assets require careful configuration: the thin line between private and public can be erased with the flip of a single switch. The lack of controls around common IT processes is what allows critical errors like this to slip into production, eroding the privacy of millions of people.
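
One example of the kind of control that can catch such an error before it reaches production is an automated check of bucket access control lists (ACLs). The sketch below is illustrative only. It assumes an ACL document shaped like the response of the S3 GetBucketAcl operation, and the function name is hypothetical. The article does not say whether the LocalBlox bucket was exposed through an ACL or a bucket policy, and this check covers ACL grants only.

```python
from typing import Any, Dict, List

# Predefined S3 groups: "AllUsers" means anyone on the internet, and
# "AuthenticatedUsers" means anyone with any AWS account.
PUBLIC_GROUP_URIS = {
    "http://acs.amazonaws.com/groups/global/AllUsers",
    "http://acs.amazonaws.com/groups/global/AuthenticatedUsers",
}


def find_public_grants(acl: Dict[str, Any]) -> List[Dict[str, str]]:
    """Return the grants in an ACL document that apply to a public group.

    Assumption: `acl` has a "Grants" list whose items carry a "Grantee"
    mapping and a "Permission" string, as in a GetBucketAcl response.
    Bucket policies are not examined here.
    """
    public = []
    for grant in acl.get("Grants", []):
        grantee = grant.get("Grantee", {})
        if grantee.get("Type") == "Group" and grantee.get("URI") in PUBLIC_GROUP_URIS:
            public.append(
                {"uri": grantee["URI"], "permission": grant.get("Permission", "")}
            )
    return public


if __name__ == "__main__":
    # Hypothetical ACL for illustration; the owner ID is a placeholder.
    example_acl = {
        "Grants": [
            {
                "Grantee": {"Type": "CanonicalUser", "ID": "owner-id-placeholder"},
                "Permission": "FULL_CONTROL",
            },
            {
                "Grantee": {
                    "Type": "Group",
                    "URI": "http://acs.amazonaws.com/groups/global/AllUsers",
                },
                "Permission": "READ",
            },
        ]
    }
    for finding in find_public_grants(example_acl):
        print("public grant:", finding["permission"], "to", finding["uri"])
```

A check like this could run in a deployment pipeline or on a schedule, failing loudly whenever a bucket meant to be private carries a public grant.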

