Shared Enemy: Inside a Chinese Dark Web Monitoring Database

Greg Pollock
Published Apr 01, 2026

On March 4, 2026, the UpGuard Research Team discovered a publicly accessible Elastic database containing nearly a terabyte of threat monitoring intelligence gathered primarily from the dark web and Telegram. In many ways this data collection resembled the kind of intelligence that companies like UpGuard use to track actors on the dark web, with one important twist: the database was located in China, and its content annotations reflected China’s interests, with labels like “China-related,” “counter-revolutionary speech,” and “ethnic Chinese / overseas Chinese.”

This exposure of dark web monitoring data illuminates an interesting slice of the cyber threat landscape. On one hand, Chinese state-affiliated hackers are responsible for some of the most concerning campaigns in recent history, including compromises of critical telecommunications infrastructure in the U.S. and elsewhere. On the other hand, China’s offensive capabilities do not exempt it from defending against the wide world of ungoverned cyber criminals. Despite obvious differences between international rivals, all sides seem to have converged on fundamentally similar technology for addressing the problem of the dark web.

State cyber activity

China often makes American headlines for offensive cyberwarfare, showcasing a cutting-edge cyberattack capability. Over the last year, two key changes have altered the traditional methods of state surveillance and espionage, particularly with regard to China and the United States. First, while data exfiltration still serves a useful purpose for intelligence gathering, a new primary motivation for state cyberattacks has arisen: pre-positioning for potential future conflicts. By setting up dormant access that can be brought online to disrupt telecommunications or other critical infrastructure, nations seek to give themselves an advantage in any future conflict. Second, AI-orchestrated attacks have now been confirmed and are evolving rapidly. The capabilities and speed of AI agents are likely to overwhelm any defenses that have not been modernized to match.

Two campaigns, SaltTyphoon and VoltTyphoon, revealed the extent of China’s operations against the U.S. SaltTyphoon infiltrated major U.S. and global telecom providers and gathered metadata on calls and texts from political figures. Its reported targets allegedly included staff members of the Kamala Harris campaign and the phones of Donald Trump and JD Vance. Meanwhile, VoltTyphoon burrowed into energy, transportation, and water infrastructure, with the aim of disrupting these services should war break out.

China’s legal system shifted as well. China declared the right to penalize overseas individuals for cybercrime, and it significantly increased fines for data breaches and for failing to defend critical systems. It also instituted a nationwide mandate that all software vulnerabilities discovered by Chinese researchers be reported to the state within 48 hours. This last measure allows China not only to protect itself quickly against these vulnerabilities, but also to attack its adversaries by exploiting them.

Despite increased exploitation of web-based vulnerabilities and the ability of off-the-shelf AI agents to find them faster than ever, cyber defense programs still need to defend against other attack vectors like the abuse of compromised account credentials. To do so, they need to go where the passwords are being sold: the confederation of groups meeting on the deep and dark web. 

How to build a threat intel database

The schema of the exposed Elastic database provides a useful guide to the basics of cyber threat intelligence. First, you identify the sources of data you want to monitor, such as Telegram, TOR, clear web hacker forums, and social media sites. Second, you develop a list of specific channels, sites, and groups on each of those platforms. Third, by scraping data from those sources, you recursively discover new Telegram channels, TOR sites, and users to monitor. Finally, you collect the illicit data being traded. In the end, you get something like this:

List of indices, with the names and number of records highlighted for those discussed below.
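The recursive discovery step described above amounts to a breadth-first crawl. The sketch below is a minimal version under stated assumptions. `fetch_links` is a hypothetical callback standing in for whatever scraper reads one source and returns the other sources it mentions. The `tg:`/`tor:` prefixes in the usage example are made-up identifiers, not values from the exposed database.

```python
from collections import deque
from typing import Callable, Iterable


def discover_sources(
    seeds: Iterable[str],
    fetch_links: Callable[[str], list[str]],
    max_sources: int = 10_000,
) -> list[str]:
    """Breadth-first discovery of sources to monitor.

    `fetch_links` is a hypothetical scraper callback: given one source
    (a Telegram channel, TOR site, forum, ...) it returns the other
    sources that source mentions.
    """
    seen: set[str] = set()
    order: list[str] = []
    frontier = deque(seeds)
    while frontier and len(order) < max_sources:
        source = frontier.popleft()
        if source in seen:
            continue
        seen.add(source)
        order.append(source)
        # Newly mentioned sources join the back of the queue.
        for link in fetch_links(source):
            if link not in seen:
                frontier.append(link)
    return order


# Usage with a made-up link graph in place of a real scraper.
graph = {
    "tg:seed_channel": ["tg:leaks_backup", "tor:forumexample"],
    "tg:leaks_backup": ["tg:seed_channel"],
    "tor:forumexample": ["tg:vendor_shop"],
    "tg:vendor_shop": [],
}
print(discover_sources(["tg:seed_channel"], lambda s: graph.get(s, [])))
# prints ['tg:seed_channel', 'tg:leaks_backup', 'tor:forumexample', 'tg:vendor_shop']
```

The `max_sources` cap matters in practice because every scraped channel tends to mention more channels, so the frontier can grow without bound.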

Clear web

The clear web, the internet as we normally interact with it, provides certain kinds of intelligence useful for understanding the data leak economy. This includes news articles about known leaks or data brokers, posts on the many hacker-related forums, public code repositories, and social media.

In this database, 319 individuals in the key_targets index had first and last names rather than account usernames. Those 319 people were largely mainstream journalists reporting on cybersecurity and data breaches, about half of them at the New York Times alone. These intelligence targets also included contributors to the right-wing conspiracy site NaturalNews, who were tagged with an assortment of labels including “China-related content,” “politically-related,” “counter-revolutionary speech,” and “political rumors/disinformation.”

Social media / Facebook

Popular social media sites can be useful for threat intelligence, though open illegal activity there is unlikely. Companies typically monitor social media for brand impersonation and for insider threats, where employees may express an intent to engage in illicit activity. However, social media can also give groups an easy-to-find place to advertise their existence before diverting customers to Telegram or other private channels.

This database included indices fb_groups and fb_posts, tracking groups on Facebook and scraping their posts to index the content for search. 
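Indexing scraped posts "for search" means building an inverted index from terms to documents, which Elasticsearch maintains at scale. The toy in-memory analogue below only shows the idea. The `PostIndex` class and its methods are hypothetical and do not describe how Elasticsearch is implemented. Its `\w+` tokenizer treats an unbroken run of Chinese characters as a single token, so real Chinese-language content would need a proper language analyzer.

```python
import re
from collections import defaultdict

TOKEN = re.compile(r"\w+")


class PostIndex:
    """Toy inverted index over scraped posts (hypothetical, for illustration)."""

    def __init__(self) -> None:
        self.postings: dict[str, set[str]] = defaultdict(set)  # term -> post ids
        self.posts: dict[str, str] = {}  # post id -> original text

    def add(self, post_id: str, text: str) -> None:
        self.posts[post_id] = text
        for token in TOKEN.findall(text.lower()):
            self.postings[token].add(post_id)

    def search(self, query: str) -> set[str]:
        """Return ids of posts containing every query term (AND semantics)."""
        terms = TOKEN.findall(query.lower())
        if not terms:
            return set()
        # .get() avoids inserting empty entries into the defaultdict.
        result = set(self.postings.get(terms[0], set()))
        for term in terms[1:]:
            result &= self.postings.get(term, set())
        return result


idx = PostIndex()
idx.add("p1", "Fresh combo list, DM on Telegram")
idx.add("p2", "Telegram channel for gaming")
print(sorted(idx.search("telegram")))  # prints ['p1', 'p2']
```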

Example of a Facebook group identified in the data set

These Facebook groups do not advertise obviously illegal goods. Instead, they can advertise Telegram channels that progressively disclose where to find such goods to those who are interested.
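Following such advertisements means pulling Telegram links out of scraped posts so that the channels they point to can be queued for monitoring. The sketch below assumes the two usual link shapes: public usernames (`t.me/<name>`, 5 to 32 letters, digits, or underscores) and private invites (`t.me/+<hash>` or `t.me/joinchat/<hash>`). The function name and the `@`/`invite:` output convention are illustrative choices.

```python
import re

# Public usernames: 5-32 letters, digits, or underscores.
# Private invites: t.me/+<hash> or t.me/joinchat/<hash>.
# The lookbehind stops matches inside other domains such as "art.me".
TG_LINK = re.compile(
    r"(?<![\w.])(?:https?://)?(?:t|telegram)\.me/"
    r"(?:(?:\+|joinchat/)(?P<invite>[\w-]+)|(?P<user>[a-z0-9_]{5,32})(?!\w))",
    re.IGNORECASE,
)


def extract_telegram_targets(text: str) -> list[str]:
    """Return unique Telegram targets in order of appearance (hypothetical helper)."""
    found: list[str] = []
    for m in TG_LINK.finditer(text):
        if m.group("invite"):
            target = "invite:" + m.group("invite")  # invite hashes are case-sensitive
        else:
            target = "@" + m.group("user").lower()  # usernames are case-insensitive
        if target not in found:
            found.append(target)
    return found


print(extract_telegram_targets("Join https://t.me/leak_market_01 or t.me/+AbCdEf123_x"))
# prints ['@leak_market_01', 'invite:AbCdEf123_x']
```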

Telegram

Telegram has become a unique locus of dark web related activity. In August of 2024, Telegram’s owner, Pavel Durov, was arrested in France and faced 12 criminal counts, including “complicity in managing an online platform for illegal transactions, refusal to cooperate with law enforcement, and providing unauthorized cryptology services.” 

The TOR network has a technical barrier to entry that keeps its marketplaces from reaching a wide audience. Telegram, on the other hand, offers easy, semi-anonymous access to dark web resources, notably infostealer logs, for people who otherwise could not reach them. With Telegram, stolen data can be advertised to tens of thousands of people in a single channel. More importantly, AI agents and other bots can automate crypto transactions, so no human needs to take part in the Telegram channel for a sale to go through.

The exposed threat intel dataset corroborates this picture. The tg_groups index has 160k entries for tracking Telegram groups, which in turn yielded 732k objects in tg_users and 68 million in tg_posts.
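To put these counts in proportion, the short calculation below divides them by the number of tracked groups. The counts are the rounded figures quoted above, so the ratios are rough. Users can belong to many groups, so the per-group user figure is an average number of records, not a membership count.

```python
# Approximate index sizes as reported above.
tg_groups = 160_000
tg_users = 732_000
tg_posts = 68_000_000

posts_per_group = tg_posts / tg_groups
users_per_group = tg_users / tg_groups

print(f"~{posts_per_group:.0f} posts and ~{users_per_group:.1f} user records per tracked group")
# prints "~425 posts and ~4.6 user records per tracked group"
```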

The Onion Router (TOR)

Websites on TOR are not indexed by Google and do not have user-friendly URLs, making discovery difficult for non-technical users. However, once site addresses are known, these sites can be scraped and the content indexed for searching. These pages include markets dedicated to illegal or “gray area” transactions, and forums, where both transactions and discussions about them occur.
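Once onion addresses turn up in scraped text, a crawler can validate them before queuing them. Current (v3) addresses are self-checking. Per the Tor rendezvous v3 specification, the 56-character base32 label encodes a 32-byte ed25519 public key, a 2-byte checksum (the first two bytes of SHA3-256 over ".onion checksum", the key, and the version byte), and the version byte 0x03. The sketch below implements only that check. The helper names are hypothetical, and it does not verify that the key is a valid curve point.

```python
import base64
import hashlib
import re

# v3 onion labels are 56 base32 characters (a-z, 2-7) followed by ".onion".
ONION_V3 = re.compile(r"\b[a-z2-7]{56}\.onion\b", re.IGNORECASE)
VERSION = b"\x03"


def _checksum(pubkey: bytes) -> bytes:
    # CHECKSUM = SHA3-256(".onion checksum" | PUBKEY | VERSION)[:2]
    return hashlib.sha3_256(b".onion checksum" + pubkey + VERSION).digest()[:2]


def onion_from_pubkey(pubkey: bytes) -> str:
    """Build a v3 address from a 32-byte ed25519 public key."""
    if len(pubkey) != 32:
        raise ValueError("v3 onion public keys are 32 bytes")
    raw = pubkey + _checksum(pubkey) + VERSION
    return base64.b32encode(raw).decode("ascii").lower() + ".onion"


def is_valid_v3_onion(address: str) -> bool:
    address = address.lower()
    if not address.endswith(".onion"):
        return False
    label = address[: -len(".onion")]
    if len(label) != 56:
        return False
    try:
        raw = base64.b32decode(label.upper())  # 56 chars -> 35 bytes, no padding
    except ValueError:  # binascii.Error subclasses ValueError
        return False
    pubkey, checksum, version = raw[:32], raw[32:34], raw[34:]
    return version == VERSION and checksum == _checksum(pubkey)


def find_onions(text: str) -> list[str]:
    """Return unique, checksum-valid v3 onion addresses in order of appearance."""
    seen: list[str] = []
    for match in ONION_V3.finditer(text):
        addr = match.group(0).lower()
        if addr not in seen and is_valid_v3_onion(addr):
            seen.append(addr)
    return seen
```

Filtering on the checksum discards strings that only look like onion addresses, such as typos or truncated copies that would otherwise waste crawler time.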

In this case, thousands of TOR websites were being monitored, with most results coming from incarnations of the infamous Breach Forums. From those sites and the Telegram channels described above, the threat intel engine scraped records of users sharing data dumps, which were then parsed into the one billion individual records in the data_breach_target_info index.
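Parsing a shared dump into individual records can be sketched as below. The sketch assumes a common "combo list" layout, where each line starts with an email address followed by a separator and other fields. Real dumps vary widely (CSV, SQL, JSON) and would need per-file parsers. The `BreachRecord` fields follow the data_breach_target_info fields described later in this post, but the names and the parsing rule are hypothetical.

```python
import re
from dataclasses import dataclass
from typing import Iterable, Iterator

EMAIL = re.compile(r"^[^@\s:;|]+@[^@\s:;|]+\.[A-Za-z]{2,}$")
SEPARATORS = re.compile(r"[:;|,\t]")


@dataclass(frozen=True)
class BreachRecord:
    email: str
    breach: str
    file_name: str


def parse_dump(lines: Iterable[str], breach: str, file_name: str) -> Iterator[BreachRecord]:
    """Yield one record per line whose first field looks like an email address.

    Assumes an 'email:password' style layout (hypothetical); other fields
    such as passwords are deliberately not kept.
    """
    for line in lines:
        first = SEPARATORS.split(line.strip(), maxsplit=1)[0]
        if EMAIL.match(first):
            yield BreachRecord(first.lower(), breach, file_name)


records = list(parse_dump(
    ["Alice@Example.com:hunter2", "garbage line", "", "bob@test.org;pw"],
    breach="ShareThis",
    file_name="ShareThis_BF.7z",
))
print([r.email for r in records])  # prints ['alice@example.com', 'bob@test.org']
```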

Analysis of leaked data

After collecting all that data, a threat intelligence platform will enrich it with relevant classifications. In this case, those additional annotations are where this generically available data is tailored for the particular use of China’s cyber defense. 
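The simplest form of such enrichment is rule-based tagging, where each label is attached when any of its patterns matches the text. The labels below are taken from the classification tables later in this post, but the keyword rules are invented for illustration. The real platform's criteria are unknown and may well involve machine classification instead.

```python
import re

# Hypothetical keyword rules; labels come from the tables below, patterns do not.
RULES = {
    "Leaked Data (泄露数据)": [r"\bdump\b", r"\bleak(ed)?\b", r"数据库"],
    "US-Related (涉美)": [r"\busa\b", r"\bamerican\b", r"\bu\.s\."],
    "China-Related Content (涉我/涉华)": [r"\bchina\b", r"\bchinese\b", r"中国"],
}
COMPILED = {
    label: [re.compile(p, re.IGNORECASE) for p in patterns]
    for label, patterns in RULES.items()
}


def classify(text: str) -> list[str]:
    """Return every label with at least one matching pattern, in rule order."""
    return [label for label, pats in COMPILED.items() if any(p.search(text) for p in pats)]


print(classify("Selling leaked American insurance dump"))
# prints ['Leaked Data (泄露数据)', 'US-Related (涉美)']
```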

Key targets

An index named key_targets contained information about 37,982 unique sources of intelligence, most of them users on the deep and dark web. Each entry contained the following information about the target itself:

  • Entity Profile - The username of the target
  • User ID - A unique identifier for this target
  • Network - Location of actor, commonly “Tor (Dark Web)”
  • Status - Active or Inactive, describing whether the target is currently operating
  • Crawl time - Timestamp of when the information for this user was last gathered
  • Language - The primary language of this target, often Chinese
  • User Type - Describes the target’s relationship to China. Because these were key targets, every record in the index contained the values: “China-related Account (涉华账号), China-related Seller (涉华卖家)”
  • Business Type - Describes the general niche of the data broker, for example “US-related (涉美), Leaked Data (泄露数据), Politically Sensitive (涉政)”
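A key_targets entry can be modeled as a typed record, with the comma-separated bilingual labels split into (English, Chinese) pairs. The sketch below uses snake_case field names that mirror the list above. These names, the `is_active` helper, and the sample values in the usage are hypothetical, not the database's actual keys or contents.

```python
import re
from dataclasses import dataclass, field
from datetime import datetime

# Matches "English label (中文标签)" items inside a comma-separated string.
LABEL = re.compile(r"\s*([^,(]+?)\s*\(([^)]+)\)")


def parse_labels(value: str) -> list[tuple[str, str]]:
    """Split 'US-related (涉美), Leaked Data (泄露数据)' into (english, chinese) pairs."""
    return [(m.group(1), m.group(2)) for m in LABEL.finditer(value)]


@dataclass
class KeyTarget:
    entity_profile: str  # username of the target
    user_id: str
    network: str  # e.g. "Tor (Dark Web)"
    status: str  # "Active" or "Inactive"
    crawl_time: datetime  # when this target was last crawled
    language: str
    user_type: list[tuple[str, str]] = field(default_factory=list)
    business_type: list[tuple[str, str]] = field(default_factory=list)

    @property
    def is_active(self) -> bool:
        return self.status.lower() == "active"


target = KeyTarget(
    entity_profile="example_user",
    user_id="u-1",
    network="Tor (Dark Web)",
    status="Active",
    crawl_time=datetime(2026, 3, 1),
    language="Chinese",
    user_type=parse_labels("China-related Account (涉华账号), China-related Seller (涉华卖家)"),
    business_type=parse_labels("US-related (涉美), Leaked Data (泄露数据), Politically Sensitive (涉政)"),
)
print(target.business_type[0])  # prints ('US-related', '涉美')
```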

Additional fields categorized the types of personal information leaked.

Basic Attributes (基础属性)
  • 邮箱 - Email Address
  • 电话 - Phone Number
  • 工作经历 - Work History
  • 身份信息(华裔、华侨) - Identity Information (Ethnic Chinese / Overseas Chinese)
  • 车产 - Vehicle Ownership
  • 银行卡号 - Bank Account / Card Numbers
  • 关联社交账号 - Linked Social Media Accounts
  • 教育经历 - Education History
  • 性取向 - Sexual Orientation
  • 房产 - Real Estate / Property
  • 地理空间 - Geospatial Data

Behavioral Activity (行为活动)
  • 投资理财 - Investment / Financial Activity
  • 网络活动 - Online Activity
  • 贷款记录 - Loan Records
  • 住宿 - Accommodation / Hotel Records
  • 出行交通 - Travel and Transportation Records
  • 支付 - Payment Records
  • 快递 - Delivery Records
  • 征信 - Credit Report
  • 污点劣迹(黄赌毒) - Vice Records (Prostitution, Gambling, Drugs)
  • 保险信息 - Insurance Information
  • 餐饮 - Dining / Restaurant Records

Social Relationships (社会关系)
  • 兴趣爱好 - Interests and Hobbies
  • 通讯录 - Contacts / Address Book
  • 通话关系 - Call Records / Communication Relationships
  • 同企业关系 - Co-worker Relationships
  • 同学校关系 - Schoolmate Relationships
  • 同家庭关系 - Family Relationships

Data Type (数据类型)
  • 邮件服务 - Email Service Records
  • 论坛 - Forum Posts
  • 自营商店 - Self-operated Shop / Store
  • 个人博客 - Personal Blog

Another set of fields classified the content by industry and content type.

Account Type
  • 涉华账号 - China-Related Account
  • 涉华卖家 - China-Related Seller

Content Classification
  • 泄露数据 - Leaked Data
  • 涉我/涉华 - China-Related Content
  • 涉政 - Government / Political
  • 涉美 - US-Related
  • 涉军 - Military-Related
  • 黑产数据 - Underground / Black Market Data
  • 涉宗教 - Religion-Related
  • 其他 - Other
  • 涉恐 - Terrorism-Related
  • 政治谣言 - Political Rumors / Disinformation
  • 反动言论 - Counter-Revolutionary Speech

Sector
  • 信息和通信 - Information and Communication
  • 金融和保险 - Finance and Insurance
  • 批发和零售业;汽车和摩托车修理 - Wholesale and Retail Trade; Repair of Motor Vehicles and Motorcycles
  • 艺术、娱乐和文娱 - Arts, Entertainment and Recreation
  • 公共管理与国防;强制性社会保障 - Public Administration and Defense; Compulsory Social Security
  • 教育 - Education
  • 制造业 - Manufacturing
  • 运输与存储 - Transportation and Storage
  • 食宿服务 - Accommodation and Food Services
  • 人体健康和社会工作 - Human Health and Social Work
  • 房地产 - Real Estate
  • 专业、科学和技术 - Professional, Scientific and Technical
  • 建筑业 - Construction
  • 电、煤气、蒸汽和空调供应 - Electricity, Gas, Steam and Air Conditioning Supply
  • 农林牧渔业 - Agriculture, Forestry, Animal Husbandry and Fishery
  • 采矿和采石 - Mining and Quarrying
  • 供水;污水处理、废物管理和补救 - Water Supply; Sewage, Waste Management and Remediation
  • 行政和辅助 - Administrative and Support Services
  • 家庭作为雇主的;家庭自用、未加区分的物品生产和服务 - Households as Employers; Undifferentiated Goods- and Services-Producing Activities of Households for Own Use

Data breach events and target info

The two largest indices on the server were titled data_breach_target_info (241GB, 1 billion records) and data_breach_target_info_recent (21GB, 122 million records). These indices contained records for individuals involved in data breach incidents. Fields in the records included:

  • Target Breach - The name of the breach event in which the account was compromised, for example the “ShareThis” breach of 2019.
  • File Name - The filename of the leaked dataset, for example ShareThis_BF.7z.
  • Original Date of Event - The date of the data breach.
  • Email Address - The email address of the compromised account.
  • Source - Where the information was obtained. In addition to records collected from the dark web, many of the records appeared to be aggregated from Have I Been Pwned, a well-known resource for checking compromised email addresses.
Example record indicating haveibeenpwned.com as source.
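Because the same compromised account can arrive from several places (a TOR dump, a Telegram post, Have I Been Pwned), an aggregator has to collapse duplicates while keeping track of who reported them. The sketch below keys records on a normalized email address plus the breach name and keeps the set of sources. The `TargetInfo` field names mirror the list above but are hypothetical. The merge rule is one reasonable choice, not the platform's confirmed logic.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class TargetInfo:
    target_breach: str  # e.g. "ShareThis"
    file_name: str  # e.g. "ShareThis_BF.7z"
    event_date: str  # original date of the breach event
    email: str
    source: str  # e.g. "haveibeenpwned.com", or a TOR/Telegram origin


def merge_sources(records: Iterable[TargetInfo]) -> dict[tuple[str, str], set[str]]:
    """Collapse records for the same (email, breach) pair, keeping every reporting source."""
    merged: dict[tuple[str, str], set[str]] = {}
    for r in records:
        key = (r.email.strip().lower(), r.target_breach)
        merged.setdefault(key, set()).add(r.source)
    return merged


merged = merge_sources([
    TargetInfo("ShareThis", "ShareThis_BF.7z", "2019", "A@x.com", "haveibeenpwned.com"),
    TargetInfo("ShareThis", "ShareThis_BF.7z", "2019", "a@x.com ", "tor"),
    TargetInfo("OtherBreach", "other.7z", "2020", "a@x.com", "telegram"),
])
print(len(merged))  # prints 2
```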

Another index, titled data_breach_events, contained almost 1.5 million records (2.6GB) about reported data breach events. Its fields indicated where the data came from (e.g., TOR, the clear web, Telegram) and gave details about the post and the data broker, linking to relevant data in the other indices on the server.

Goods

The goods index (600MB, 1 million records) tracked active listings for data on the dark web, which allowed its owners to monitor what was actually being bought and sold in TOR marketplaces. Each record identifies the marketplace where the transaction took place. Examples include FreeCitY Market, a known Chinese-language dark web marketplace, and Teahorse (茶马古道), another prominent Chinese dark web market often used for trading data and services.

The records then describe the transaction by categorizing the data as website passwords, regional data leaks, or “grey services” (such as the sale of social media accounts realistic enough to bypass fraud protection measures). The price of the data being sold and any relevant cryptocurrency addresses are also recorded, as is the seller’s contact information.

A separate index, goods_useful (only 975 records), sets apart a small number of listings that met some criteria flagging them as especially relevant. These records contain the same fields, but they appear to be the ones the owners of this intel dataset considered actionable.
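The criteria behind goods_useful are not visible in the data, so the sketch below invents a plausible rule purely for illustration. A listing counts as actionable if it carries a watched tag and gives some way to follow up, either a seller contact or a cryptocurrency address. The `Listing` fields loosely follow the goods fields described above. The tag names, the rule, and the sample values are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Listing:
    market: str  # e.g. "FreeCitY Market"
    category: str  # e.g. "website passwords", "regional data leaks", "grey services"
    price_usd: Optional[float]
    crypto_addresses: list[str] = field(default_factory=list)
    seller_contact: str = ""
    tags: set[str] = field(default_factory=set)


WATCHED_TAGS = frozenset({"China-related"})  # hypothetical watch list


def is_actionable(listing: Listing, watched: frozenset = WATCHED_TAGS) -> bool:
    """Hypothetical rule: a watched tag plus some way to follow up on the seller."""
    has_lead = bool(listing.seller_contact or listing.crypto_addresses)
    return bool(listing.tags & watched) and has_lead


def select_useful(listings: list[Listing]) -> list[Listing]:
    return [item for item in listings if is_actionable(item)]


a = Listing("FreeCitY Market", "regional data leaks", 500.0, ["example-address"], "", {"China-related"})
b = Listing("Teahorse", "grey services", 20.0, [], "", {"China-related"})  # no contact info
c = Listing("ExampleMarket", "website passwords", None, [], "@seller", set())  # not tagged
print([item.market for item in select_useful([a, b, c])])  # prints ['FreeCitY Market']
```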

Files

Finally, an index titled files (12GB, 12 million records) serves a crucial purpose: it links binary data (the actual files) to contextual metadata (who posted it, where, and when). Most of the records appear to have originated in Telegram groups. When a new file is posted, the system attempts to download it and records its presence. Each record lists the file type (such as movie, audio, or image), the file size, and a brief description. It also includes several metadata fields indicating whether the file was successfully downloaded during the scrape.
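The link between metadata and binaries, plus the download-status fields, can be sketched as a record that a download attempt fills in. Storing a content hash is an assumption on our part: it is a common way to tie a record to the stored binary and to spot the same file reposted across channels. `FileRecord`, its field names, and the `fetch` callable are hypothetical, and `fetch` stands in for a real Telegram download.

```python
import hashlib
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class FileRecord:
    source_channel: str  # where the file was posted
    message_id: int
    file_type: str  # e.g. "movie", "audio", "image"
    file_size: int
    description: str
    downloaded: bool = False
    sha256: Optional[str] = None  # content hash once downloaded
    error: Optional[str] = None  # why the last attempt failed, if it did


def try_download(record: FileRecord, fetch: Callable[[], bytes]) -> FileRecord:
    """Attempt to fetch the binary and record success with a content hash, or the failure reason."""
    try:
        data = fetch()
    except Exception as exc:  # network errors, deleted messages, size limits, ...
        record.downloaded = False
        record.error = type(exc).__name__
        return record
    record.downloaded = True
    record.sha256 = hashlib.sha256(data).hexdigest()
    record.error = None
    return record


ok = try_download(FileRecord("example_channel", 1, "image", 3, "sample"), lambda: b"abc")
failed = try_download(FileRecord("example_channel", 2, "movie", 10, "sample"), lambda: 1 / 0)
print(ok.downloaded, failed.error)  # prints True ZeroDivisionError
```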

Conclusion

Offensive cyberwarfare relies on the latest exploits and well-funded sophistication. Defense is largely reactive and can only be as effective as defenders’ understanding of the offense they must protect against. Defense also involves collecting large amounts of data, from which machine analysis can extract insights. These threat intelligence collections, like all domestic surveillance data sets, can themselves fall into “the wrong hands.” Weaponized information, like weaponized machinery, can be pointed at anyone, depending on who is behind the turret.

In this case, the same class of threats that defenders around the world must contend with also endangers China’s state interests and citizens. Leaks of personal data through Telegram and TOR sites do not discriminate based on national origin. Even as China and other nations jostle for position with elbows out, they all share an interest in protecting their citizens, including their right to privacy, which cyber criminals directly threaten. When it comes to defending against the threats of the dark web, all parties seem to have arrived at a common technical solution.
