On March 4th, 2026, the UpGuard Research Team discovered a publicly accessible Elastic database containing nearly a terabyte of threat monitoring intelligence gathered primarily from the dark web and Telegram. In many ways this data collection was similar to the kind of intelligence used by companies like UpGuard to track actors on the dark web, with one important twist. The database was located in China, with content annotations significant to China’s interests like “China-related,” “counter-revolutionary speech,” and “ethnic Chinese / overseas Chinese.”

This exposure of dark web monitoring data illuminates an interesting slice of the cyber threat landscape. On one hand, Chinese state-affiliated hackers are responsible for some of the most concerning campaigns in recent history, compromising critical telecommunications infrastructure in the U.S. and elsewhere. On the other, China’s offensive capabilities do not exempt it from defending against the wide world of ungoverned cyber criminals. Despite obvious differences between international rivals, all sides seem to have converged on fundamentally similar technology for addressing the problem of the dark web.

State cyber activity

China often makes American headlines for offensive cyberwarfare, showcasing a cutting-edge cyberattack capability. Over the last year, two key changes have altered the traditional methods of state surveillance and espionage, particularly with regard to China and the United States. First, while data exfiltration still serves a useful purpose for intelligence gathering, a new primary motivation for state cyberattacks has arisen: pre-positioning for potential future conflicts. By setting up dormant systems that can be brought online to disrupt telecommunications or other critical infrastructure, nations seek to give themselves an advantage in any future scenario. Second, AI-orchestrated attacks have now been confirmed and are swiftly evolving. The capabilities and speed of AI agents are likely to overwhelm any defenses that have not been equally modernized.

Two campaigns revealed the extent of China’s operations against the U.S.: Salt Typhoon and Volt Typhoon. Salt Typhoon infiltrated major U.S. and global telecom providers, gathering metadata on calls and texts from political figures, allegedly including staff members of the Kamala Harris campaign and the phones of Donald Trump and JD Vance. Meanwhile, Volt Typhoon dug its way into energy, transportation, and water infrastructure, with the aim of disrupting these services should war break out.

China’s legal system shifted as well. China declared the right to penalize overseas individuals for cyber crime and significantly enlarged fines for data breaches or failure to defend critical systems. It also instituted a nationwide mandate that all software vulnerabilities discovered by Chinese researchers be reported to the state within 48 hours. The last of these not only allows China to quickly protect itself against these vulnerabilities, but also to attack its adversaries by using them.

Despite increased exploitation of web-based vulnerabilities and the ability of off-the-shelf AI agents to find them faster than ever, cyber defense programs still need to defend against other attack vectors like the abuse of compromised account credentials. To do so, they need to go where the passwords are being sold: the confederation of groups meeting on the deep and dark web.

How to build a threat intel database

The schema of the exposed Elastic database provides a useful guide for the basics of cyber threat intelligence. First, you identify the sources of data you want to monitor, such as Telegram, TOR, clear web hacker forums, and social media sites. Second, you develop a list of specific channels, sites, and groups on each of those platforms. Third, by scraping data from those sources, you recursively discover new Telegram channels, TOR sites, and users to monitor. Finally, you collect the illicit data being traded. In the end, you get something like this:

*List of indices, with the names and number of records highlighted for those discussed below.*
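The third step, recursive discovery, amounts to a bounded graph traversal over sources that mention one another. The following is a minimal sketch only, assuming a hypothetical in-memory `MENTIONS` map in place of real scraped content. The source names, the `discover_sources` function, and the `max_depth` limit are illustrative assumptions and are not taken from the exposed database.

```python
from collections import deque

# Hypothetical stand-in for scraped content: each source maps to the other
# sources mentioned in its posts. A real collector would parse fetched
# pages or messages to build these links.
MENTIONS = {
    "telegram:seed_channel": ["telegram:channel_b", "tor:forum_x"],
    "telegram:channel_b": ["telegram:seed_channel", "telegram:channel_c"],
    "tor:forum_x": ["telegram:channel_c"],
    "telegram:channel_c": [],
}


def discover_sources(seeds, mentions, max_depth=2):
    """Breadth-first expansion of a seed list, bounded by max_depth hops."""
    seen = set(seeds)
    queue = deque((seed, 0) for seed in seeds)
    while queue:
        source, depth = queue.popleft()
        if depth >= max_depth:
            continue  # do not expand sources at the depth limit
        for linked in mentions.get(source, []):
            if linked not in seen:
                seen.add(linked)
                queue.append((linked, depth + 1))
    return seen


watchlist = discover_sources(["telegram:seed_channel"], MENTIONS)
print(sorted(watchlist))
```

Bounding the depth keeps the watchlist from growing without limit, since every newly found source can itself mention further sources.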

Clear web

The clear web, the internet as we normally interact with it, provides certain kinds of intelligence useful for understanding the data leak economy. This includes news articles about known leaks or data brokers, posts on the many hacker-related forums, public code repositories, and social media monitoring.

In this database, 319 individuals in the key_targets index had first and last names rather than account usernames. Those 319 people were largely mainstream journalists reporting on cybersecurity and data breaches, with about half at the New York Times alone. These targets for collecting intelligence also included right-wing conspiracy site NaturalNews, whose contributors were tagged with an assortment of labels including “China-related content,” “politically-related,” “counter-revolutionary speech,” and “political rumors/disinformation.”

Social media / Facebook

Popular social media sites can be useful for threat intelligence, though open illegal activity is unlikely. Companies typically monitor social media for insider threats, where employees may express intent to commit illicit activity, and for brand impersonation. However, social media can also provide an entry point for groups to advertise their existence in a place that is easy to find, then divert customers to Telegram or other private channels.

This database included indices fb_groups and fb_posts, tracking groups on Facebook and scraping their posts to index the content for search.

*Example of a Facebook group identified in the data set*

These Facebook groups do not advertise obviously illegal goods. But they can advertise Telegram channels that progressively disclose, to those who are interested, where such goods can be found.

Telegram

Telegram has become a unique locus of dark web related activity. In August of 2024, Telegram’s owner, Pavel Durov, was arrested in France and faced 12 criminal counts, including “complicity in managing an online platform for illegal transactions, refusal to cooperate with law enforcement, and providing unauthorized cryptology services.”

The TOR network has a technical barrier to entry that prevents the marketplaces there from having a wide reach. Telegram, on the other hand, allows easy semi-anonymous access to dark web resources, notably infostealer logs, for people who otherwise would not have it. With Telegram, stolen data can be advertised to tens of thousands of people in a single channel. More importantly, AI agents and other bots can automate crypto transactions, removing the need for human interaction in the Telegram channel for the transaction to take place.

The exposed threat intel dataset corroborates this process. The tg_groups index has about 160,000 entries for tracking Telegram groups, which yielded about 732,000 objects in tg_users and 68 million in tg_posts.

The Onion Router (TOR)

Websites on TOR are not indexed by Google and do not have user-friendly URLs, making discovery difficult for non-technical users. However, once site addresses are known, these sites can be scraped and the content indexed for searching. These pages include markets dedicated to illegal or “gray area” transactions, and forums, where both transactions and discussions about them occur.

In this case, thousands of TOR websites were being monitored, with most results coming from incarnations of the infamous Breach Forums. From those sites and the aforementioned Telegram channels, the threat intel engine scraped records of users sharing data dumps, which were then parsed into the one billion individual records in the data_breach_target_info index.

Analysis of leaked data

After collecting all that data, a threat intelligence platform will enrich it with relevant classifications. In this case, those additional annotations are where this generically available data is tailored for the particular use of China’s cyber defense.

Key targets

An index named key_targets contained information about 37,982 unique sources of intelligence, most of them users on the deep and dark web. Each entry contained the following information about the target itself:

Entity Profile - The username of the target
User ID - A unique identifier for this target
Network - Location of actor, commonly “Tor (Dark Web)”
Status - Active or Inactive, describing whether the target is currently operating
Crawl time - Timestamp of when the information for this user was last gathered
Language - The primary language of this target, often Chinese
User Type - Describes the relationship the target has to China. Due to being key targets, all records in the table contained the values: “China-related Account (涉华账号), China-related Seller (涉华卖家)”
Business Type - Describes the general niche of the data broker, for example “US-related (涉美), Leaked Data (泄露数据), Politically Sensitive (涉政)”
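To make the shape of such an entry concrete, the fields above can be modeled as a small record type. This is a sketch under assumptions: the attribute names, the `KeyTarget` class, and the sample values (including the username, ID, and timestamp) are hypothetical, and the actual field names and encodings in the exposed index are not reproduced here.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import List


@dataclass
class KeyTarget:
    """Illustrative model of one key_targets entry (names are assumed)."""
    entity_profile: str            # username of the target
    user_id: str                   # unique identifier for the target
    network: str                   # e.g. "Tor (Dark Web)"
    status: str                    # "Active" or "Inactive"
    crawl_time: datetime           # when the target was last crawled
    language: str                  # primary language of the target
    user_types: List[str] = field(default_factory=list)
    business_types: List[str] = field(default_factory=list)

    def is_active(self) -> bool:
        return self.status.strip().lower() == "active"


# Hypothetical sample record; only the label strings come from the article.
target = KeyTarget(
    entity_profile="example_seller",
    user_id="u-0001",
    network="Tor (Dark Web)",
    status="Active",
    crawl_time=datetime(2026, 1, 1, tzinfo=timezone.utc),
    language="Chinese",
    user_types=["China-related Account (涉华账号)", "China-related Seller (涉华卖家)"],
    business_types=["US-related (涉美)", "Leaked Data (泄露数据)"],
)
print(target.entity_profile, target.is_active())
```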

Additional fields categorized the types of personal information leaked.

Category	Data Type (Chinese)	Translation
Basic Attributes 基础属性	邮箱	Email Address
	电话	Phone Number
	工作经历	Work History
	身份信息（华裔、华侨）	Identity Information (Ethnic Chinese / Overseas Chinese)
	车产	Vehicle Ownership
	银行卡号	Bank Account / Card Numbers
	关联社交账号	Linked Social Media Accounts
	教育经历	Education History
	性取向	Sexual Orientation
	房产	Real Estate / Property
	地理空间	Geospatial Data
Behavioral Activity 行为活动	投资理财	Investment / Financial Activity
	网络活动	Online Activity
	贷款记录	Loan Records
	住宿	Accommodation / Hotel Records
	出行交通	Travel and Transportation Records
	支付	Payment Records
	快递	Delivery Records
	征信	Credit Report
	污点劣迹（黄赌毒）	Vice Records (Prostitution, Gambling, Drugs)
	保险信息	Insurance Information
	餐饮	Dining / Restaurant Records
Social Relationships 社会关系	兴趣爱好	Interests and Hobbies
	通讯录	Contacts / Address Book
	通话关系	Call Records / Communication Relationships
	同企业关系	Co-worker Relationships
	同学校关系	Schoolmate Relationships
	同家庭关系	Family Relationships
Data Type 数据类型	邮件服务	Email Service Records
	论坛	Forum Posts
	自营商店	Self-operated Shop / Store
	个人博客	Personal Blog

Another set of fields classified the content by industry and content type.

Field	Label (Chinese)	Translation
Account Type	涉华账号	China-Related Account
Account Type	涉华卖家	China-Related Seller
Content Classification	泄露数据	Leaked Data
	涉我/涉华	China-Related Content
	涉政	Government / Political
	涉美	US-Related
	涉军	Military-Related
	黑产数据	Underground / Black Market Data
	涉宗教	Religion-Related
	其他	Other
	涉恐	Terrorism-Related
	政治谣言	Political Rumors / Disinformation
	反动言论	Counter-Revolutionary Speech
Sector	信息和通信	Information and Communication
	金融和保险	Finance and Insurance
	批发和零售业；汽车和摩托车修理	Wholesale and Retail; Motor Vehicle Repair
	艺术、娱乐和文娱	Arts, Entertainment and Recreation
	公共管理与国防；强制性社会保障	Public Administration and Defense; Social Security
	教育	Education
	制造业	Manufacturing
	运输与存储	Transportation and Storage
	食宿服务	Accommodation and Food Services
	人体健康和社会工作	Human Health and Social Work
	房地产	Real Estate
	专业、科学和技术	Professional, Scientific and Technical
	建筑业	Construction
	电、煤气、蒸汽和空调供应	Electricity, Gas, Steam and Air Conditioning Supply
	农林牧渔业	Agriculture, Forestry, Animal Husbandry and Fishery
	采矿和采石	Mining and Quarrying
	供水；污水处理、废物管理和补救	Water Supply; Sewage, Waste Management and Remediation
	行政和辅助	Administrative and Support Services
	家庭作为雇主的；家庭自用、未加区分的物品生产和服务	Households as Employers; Undifferentiated Goods and Services Production
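An analyst reading records like these would typically need to map the Chinese labels to translations. The sketch below does that for the content classification labels listed above. It assumes, hypothetically, that a record stores its labels as a single comma-separated string; the `translate_labels` helper and that storage format are illustrative assumptions, not details confirmed by the dataset.

```python
# Content classification labels and translations, as listed in the table above.
CONTENT_LABELS = {
    "泄露数据": "Leaked Data",
    "涉我/涉华": "China-Related Content",
    "涉政": "Government / Political",
    "涉美": "US-Related",
    "涉军": "Military-Related",
    "黑产数据": "Underground / Black Market Data",
    "涉宗教": "Religion-Related",
    "其他": "Other",
    "涉恐": "Terrorism-Related",
    "政治谣言": "Political Rumors / Disinformation",
    "反动言论": "Counter-Revolutionary Speech",
}


def translate_labels(raw, sep=","):
    """Split a label string and translate each known label.

    Unknown labels are passed through unchanged rather than dropped,
    so nothing is silently lost.
    """
    parts = (part.strip() for part in raw.split(sep))
    return [CONTENT_LABELS.get(part, part) for part in parts if part]


print(translate_labels("涉美, 泄露数据, 涉政"))
```

Passing unknown labels through unchanged is a deliberate choice here: a taxonomy like this tends to grow, and a missing translation is easier to spot than a missing label.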

Data breach events and target info

The two largest indices on the server were titled data_breach_target_info (241GB, 1 billion records) and data_breach_target_info_recent (21GB, 122 million records). These indices contained records for individuals involved in data breach incidents. Fields in the records included:

Target Breach - The name of the breach event in which the account was compromised, for example the “ShareThis” breach of 2019.
File Name - The filename of the leaked dataset, for example ShareThis_BF.7z.
Original Date of Event - The date of the data breach.
Email Address - The email address of the compromised account.
Source - Where the information was obtained. In addition to records collected from the dark web, many of the records appeared to be aggregated from Have I Been Pwned, a well-known resource for checking compromised email addresses.

*Example record indicating haveibeenpwned.com as source.*
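Records of this shape are usually most useful when grouped by account, so that one lookup shows every breach an email address has appeared in. The following is a minimal sketch of that grouping. The `BreachRecord` type, its attribute names, the date format, and every sample value except the ShareThis breach name, its filename, and the haveibeenpwned.com source are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class BreachRecord:
    """Illustrative model of one breach record (attribute names assumed)."""
    target_breach: str   # name of the breach event
    file_name: str       # filename of the leaked dataset
    event_date: str      # date of the breach; the format here is assumed
    email: str           # compromised account
    source: str          # where the record was obtained


def breaches_by_email(records):
    """Group breach names by normalized (trimmed, lowercased) email."""
    grouped = defaultdict(set)
    for record in records:
        grouped[record.email.strip().lower()].add(record.target_breach)
    return grouped


# Hypothetical sample data using reserved example.com addresses.
records = [
    BreachRecord("ShareThis", "ShareThis_BF.7z", "2019", "Alice@example.com", "haveibeenpwned.com"),
    BreachRecord("ExampleBreach", "example_dump.7z", "2021", "alice@example.com", "telegram"),
    BreachRecord("ShareThis", "ShareThis_BF.7z", "2019", "bob@example.com", "tor"),
]

exposure = breaches_by_email(records)
print(sorted(exposure["alice@example.com"]))
```

Normalizing the address before grouping matters because the same account often appears with different capitalization across dumps.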

Another index titled data_breach_events contained almost a million and a half records (2.6GB) about reported data breach events, with fields indicating where the data came from (e.g., TOR, the clear web, Telegram) and details about the post and data broker, linking to the relevant data in other indices on the server.

Goods

The goods index (600MB, 1 million records) tracked active listings for data on the dark web. This allowed monitoring of what was actually being bought and sold in TOR marketplaces. Each record contains information about the marketplace in which the transaction took place. Examples include FreeCitY Market, a known Chinese-language dark web marketplace, and Teahorse (茶马古道), another prominent Chinese dark web market often used for trading data and services.

The records then describe the transaction by categorizing the data into website passwords, regional data leaks, and “grey services” (such as the sale of social media accounts that appear realistic enough to bypass fraud protection measures). The price of the data being sold and any relevant cryptocurrency addresses are also present, as is the seller’s contact information.
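With marketplace and category recorded for each listing, a monitoring team can summarize what is being offered where. The sketch below counts listings per marketplace and category. The `Listing` type, its attribute names, the price unit, and the sample listings are hypothetical; only the marketplace names and category labels come from the description above.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Listing:
    """Illustrative model of one goods entry (attribute names assumed)."""
    marketplace: str
    category: str
    price: float          # unit is an assumption; the source does not state it
    seller_contact: str


def listings_per_market_category(listings):
    """Count listings for each (marketplace, category) pair."""
    return Counter((item.marketplace, item.category) for item in listings)


# Hypothetical sample listings.
listings = [
    Listing("FreeCitY Market", "website passwords", 50.0, "seller_a"),
    Listing("FreeCitY Market", "regional data leaks", 200.0, "seller_b"),
    Listing("Teahorse", "grey services", 15.0, "seller_c"),
    Listing("Teahorse", "grey services", 20.0, "seller_d"),
]

counts = listings_per_market_category(listings)
for (marketplace, category), total in sorted(counts.items()):
    print(marketplace, category, total)
```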

A separate index, goods_useful (only 975 records), separates a few significant records from this database according to criteria that flag them as especially relevant. These records contain the same data fields, but appear to be actionable by the owners of this intel dataset.

Files

Finally, an index titled files (12GB, 12 million records) serves a crucial purpose: it links binary data (the actual files) to the contextual metadata (who posted it, where, and when). The majority of the records appear to have originated in Telegram groups. When a new file is posted, this system attempts to download it and records its presence. The type of file, such as movie, audio or image is listed in the record, as well as its file size and a brief description. It also contains several metadata fields to determine whether the referenced file was successfully downloaded during the scrape.
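The linking role described here is essentially a join between file records and the posts they came from, plus a flag for whether the download succeeded. The following is a sketch only: the `post_id` key, the dictionary layout, and all sample values are hypothetical assumptions, since the actual field names are not given above.

```python
# Hypothetical post metadata keyed by an assumed post identifier.
posts = {
    "p1": {"group": "example_group", "author": "user_a", "posted_at": "2026-01-01T00:00:00Z"},
    "p2": {"group": "example_group", "author": "user_b", "posted_at": "2026-01-02T00:00:00Z"},
}

# Hypothetical file records referencing those posts.
files = [
    {"post_id": "p1", "file_type": "image", "size_bytes": 2048, "downloaded": True},
    {"post_id": "p2", "file_type": "audio", "size_bytes": 4096, "downloaded": False},
]


def attach_context(file_records, post_index):
    """Merge each file record with the metadata of the post it came from."""
    return [{**record, **post_index.get(record["post_id"], {})} for record in file_records]


def pending_downloads(file_records):
    """Return the records whose file was not successfully downloaded."""
    return [record for record in file_records if not record["downloaded"]]


linked = attach_context(files, posts)
print(linked[0]["author"], linked[0]["file_type"])
print(len(pending_downloads(files)))
```

Keeping the download status on the record lets a collector retry failed downloads later without losing the context of who posted the file, where, and when.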

Conclusion

Offensive cyberwarfare entails utilizing the latest exploits and well-funded sophistication. Defense measures are largely reactive and can only be as effective as the understanding of what kind of offense must be protected against. These defense measures involve collecting a large amount of data, from which insights can be gleaned through machine analysis. These threat intelligence data collections themselves, like all domestic surveillance data sets, can fall into “the wrong hands.” Weaponized information, like weaponized machinery, can be pointed at anyone, depending on who is behind the turret.

In this case, the same class of threats that defenders around the world must contend with is also a danger to China’s state interests and citizens. Leaks of personal data through Telegram and TOR sites do not discriminate based on national origin. Even as China and other nations jostle for position with elbows out, they all share an interest in protecting their citizens, including the right to privacy directly threatened by cyber criminals. When it comes to defending against the threats of the dark web, all parties seem to share a common technical solution.

