On March 4th, 2026, the UpGuard Research Team discovered a publicly accessible Elastic database containing nearly a terabyte of threat monitoring intelligence gathered primarily from the dark web and Telegram. In many ways this data collection was similar to the kind of intelligence used by companies like UpGuard to track actors on the dark web, with one important twist. The database was located in China, with content annotations significant to China’s interests like “China-related,” “counter-revolutionary speech,” and “ethnic Chinese / overseas Chinese.”
This exposure of dark web monitoring data illuminates an interesting slice of the cyber threat landscape. On one hand, Chinese state-affiliated hackers are responsible for some of the most concerning campaigns in recent history, compromising critical telecommunications infrastructure in the U.S. and elsewhere. On the other, China’s offensive capabilities do not exempt it from defending against the wide world of ungoverned cyber criminals. Despite obvious differences between international rivals, all sides seem to have converged on fundamentally similar technology for addressing the problem of the dark web.
State cyber activity
China often makes American headlines for offensive cyberwarfare, showcasing a cutting-edge cyberattack capability. Over the last year, two key changes have altered the traditional methods of state surveillance and espionage, particularly with regard to China and the United States. First, while data exfiltration still serves a useful purpose for intelligence, a new primary motivation for state cyberattacks has arisen: pre-positioning for potential future conflicts. By setting up dormant systems that can be brought online to disrupt telecommunications or other critical infrastructure, nations seek to give themselves an advantage in any future scenario. Second, AI-orchestrated attacks have now been confirmed and are swiftly evolving. The capabilities and speed of AI agents are likely to overwhelm any defenses that have not been equally modernized.
Two campaigns revealed the extent of China’s operations against the U.S.: Salt Typhoon and Volt Typhoon. Salt Typhoon infiltrated major U.S. and global telecom providers, gathering metadata on calls and texts from political figures, allegedly including staff members of the Kamala Harris campaign and the phones of Donald Trump and JD Vance. Meanwhile, Volt Typhoon dug its way into energy, transportation, and water infrastructure, with the aim of disrupting these services should war break out.
China’s legal system shifted as well, with China declaring the right to penalize overseas individuals for cyber crime and significantly enlarging fines for data breaches or failure to defend critical systems. It also instituted a nationwide mandate that all software vulnerabilities discovered by Chinese researchers be reported to the state within 48 hours. The last of these not only allows China to quickly protect itself against these vulnerabilities, but also to attack its adversaries by using them.
Despite increased exploitation of web-based vulnerabilities and the ability of off-the-shelf AI agents to find them faster than ever, cyber defense programs still need to defend against other attack vectors like the abuse of compromised account credentials. To do so, they need to go where the passwords are being sold: the confederation of groups meeting on the deep and dark web.
How to build a threat intel database
The schema of the exposed Elastic database provides a useful guide to the basics of cyber threat intelligence. First, you identify the sources of data you want to monitor, such as Telegram, TOR, clear web hacker forums, and social media sites. Second, you develop a list of specific channels, sites, and groups on each of those platforms. Third, by scraping data from those sources, you recursively discover new Telegram channels, TOR sites, and users to monitor. Finally, you collect the illicit data being traded. In the end, you get something like this:

Clear web
The clear web (the internet as we normally interact with it) provides certain kinds of intelligence useful for understanding the data leak economy. This includes news articles about known leaks or data brokers, posts on the many hacker-related forums, public code repositories, and social media monitoring.
In this database, 319 individuals in the key_targets index had first and last names rather than account usernames. Those 319 people were largely mainstream journalists reporting on cybersecurity and data breaches, with about half at the New York Times alone. These targets for collecting intelligence also included right-wing conspiracy site NaturalNews, whose contributors were tagged with an assortment of labels including “China-related content,” “politically-related,” “counter-revolutionary speech,” and “political rumors/disinformation.”
Social media / Facebook
Popular social media sites can be useful for threat intelligence, though open illegal activity is unlikely. Companies typically monitor social media for insider threats, where employees may express intent of illicit activity, and for brand impersonation. However, social media can also provide an entry point for groups to advertise their existence in a place that is easy to find, then divert customers to Telegram or other private channels.
This database included indices fb_groups and fb_posts, tracking groups on Facebook and scraping their posts to index the content for search.

These Facebook groups do not advertise obviously illegal goods. They can, however, advertise Telegram channels that progressively disclose where to find such goods to those who are interested.
Telegram
Telegram has become a unique locus of dark web related activity. In August of 2024, Telegram’s owner, Pavel Durov, was arrested in France and faced 12 criminal counts, including “complicity in managing an online platform for illegal transactions, refusal to cooperate with law enforcement, and providing unauthorized cryptology services.”
The TOR network has a technical barrier to entry that prevents the marketplaces there from having a wide reach. Telegram, on the other hand, allows easy semi-anonymous access to dark web resources, notably infostealer logs, for people who otherwise would not have them. With Telegram, stolen data can be advertised to tens of thousands of people in a single channel. More importantly, AI agents and other bots can automate crypto transactions, removing the need for human interaction in the Telegram channel for the transaction to take place.
The exposed threat intel dataset corroborates this process. The index tg_groups has 160,000 entries for tracking Telegram groups, resulting in 732,000 objects in tg_users and 68 million in tg_posts.
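The recursive discovery step described earlier, where scraped posts reveal new channels to monitor, can be illustrated with a minimal sketch. It assumes posts are available as plain text and that new channels are found by extracting public `t.me` links. The function names, the `fetch_posts` callback, and the sample data are all hypothetical, and nothing here is taken from the exposed system.

```python
import re
from collections import deque

# Public Telegram links look like t.me/<name>. The length limits below are an
# assumption about username rules; invite links (t.me/+...) are not matched.
TME_LINK = re.compile(r"(?:https?://)?t\.me/([A-Za-z][A-Za-z0-9_]{4,31})\b")


def extract_channels(text):
    """Return the set of channel names linked in one post, lowercased."""
    return {name.lower() for name in TME_LINK.findall(text)}


def discover(seed_channels, fetch_posts, max_channels=1000):
    """Breadth-first discovery: scrape known channels, queue any new ones found.

    fetch_posts is a caller-supplied function mapping a channel name to a list
    of post texts; max_channels caps the crawl so it always terminates.
    """
    seen = {c.lower() for c in seed_channels}
    queue = deque(sorted(seen))
    while queue:
        channel = queue.popleft()
        for post in fetch_posts(channel):
            for found in sorted(extract_channels(post)):
                if found not in seen and len(seen) < max_channels:
                    seen.add(found)
                    queue.append(found)
    return seen


# Invented sample data standing in for a real scraper.
SAMPLE_POSTS = {
    "seed_channel": ["new drop at https://t.me/leaks_mirror and t.me/seed_channel"],
    "leaks_mirror": ["backup: t.me/Leaks_Archive/42"],
    "leaks_archive": [],
}
found = discover(["seed_channel"], lambda name: SAMPLE_POSTS.get(name, []))
```

Starting from a single seed, each scraped channel can contribute further channels. The ratio of groups to users and posts reported above is consistent with this kind of fan-out.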
The Onion Router (TOR)
Websites on TOR are not indexed by Google and do not have user-friendly URLs, making discovery difficult for non-technical users. However, once site addresses are known, these sites can be scraped and the content indexed for searching. These pages include markets dedicated to illegal or “gray area” transactions, and forums, where both transactions and discussions about them occur.
In this case, thousands of TOR websites were being monitored, with most results coming from incarnations of the infamous Breach Forums. From those sites and the aforementioned Telegram channels, the threat intel engine scraped records of users sharing data dumps, which were then parsed into the one billion individual records in the data_breach_target_info index.
Analysis of leaked data
After collecting all that data, a threat intelligence platform will enrich it with relevant classifications. In this case, those additional annotations are where this generically available data is tailored for the particular use of China’s cyber defense.
Key targets
An index named key_targets contained information about 37,982 unique sources of intelligence, most of them users on the deep and dark web. Each entry contained the following information about the target itself:
- Entity Profile - The username of the target
- User ID - A unique identifier for this target
- Network - Location of actor, commonly “Tor (Dark Web)”
- Status - Active or Inactive, describing whether the target is currently operating
- Crawl time - Timestamp of when the information for this user was last gathered
- Language - The primary language of this target, often Chinese
- User Type - Describes the relationship the target has to China. Due to being key targets, all records in the table contained the values: “China-related Account (涉华账号), China-related Seller (涉华卖家)”
- Business Type - Describes the general niche of the data broker, for example “US-related (涉美), Leaked Data (泄露数据), Politically Sensitive (涉政)”
Additional fields categorized the types of personal information leaked.
Another set of fields classified the content by industry and content type.
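The User Type and Business Type values quoted above are comma-separated labels, each with the original Chinese in parentheses. As a minimal sketch, assuming values always follow the format of those examples, such a field could be split into (English, Chinese) pairs like this. The function name is hypothetical and the real storage format of the index is not known.

```python
import re

# One label: English text followed by the Chinese original in ASCII parentheses.
# Full-width parentheses are not handled in this sketch.
LABEL = re.compile(r"\s*([^,()]+?)\s*\(([^()]+)\)\s*")


def parse_labels(value):
    """Split 'A (甲), B (乙)' into [('A', '甲'), ('B', '乙')].

    Labels without a parenthesized part are kept with None as the second item.
    """
    pairs = []
    for part in value.split(","):
        match = LABEL.fullmatch(part)
        if match:
            pairs.append((match.group(1), match.group(2)))
        elif part.strip():
            pairs.append((part.strip(), None))
    return pairs


business_type = "US-related (涉美), Leaked Data (泄露数据), Politically Sensitive (涉政)"
labels = parse_labels(business_type)
```

Keeping both halves of each label lets an analyst search by either language without losing the original annotation.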
Data breach events and target info
The two largest indices on the server were titled data_breach_target_info (241GB, 1 billion records) and data_breach_target_info_recent (21GB, 122 million records). These indices contained records for individuals involved in data breach incidents. Fields in the records included:
- Target Breach - The name of the breach event in which the account was compromised, for example the “ShareThis” breach of 2019.
- File Name - The filename of the leaked dataset, for example ShareThis_BF.7z.
- Original Date of Event - The date of the data breach.
- Email Address - The email address of the compromised account.
- Source - Where the information was obtained. In addition to records collected from the dark web, many of the records appeared to be aggregated from Have I Been Pwned, a well-known resource for checking compromised email addresses.
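One way records with these fields become useful is by grouping them per email address, to see which breaches a given account appears in. The sketch below works on in-memory dictionaries. The key names (`email`, `breach`, `file_name`, `source`) are hypothetical stand-ins for the fields listed above, and the sample records are invented apart from the ShareThis names quoted in the list.

```python
from collections import defaultdict


def breaches_by_email(records):
    """Map each normalized email address to the sorted breach names it appears in."""
    index = defaultdict(set)
    for rec in records:
        # Normalize so 'Alice@Example.com ' and 'alice@example.com' group together.
        email = rec.get("email", "").strip().lower()
        if email:
            index[email].add(rec["breach"])
    return {email: sorted(names) for email, names in index.items()}


# Invented sample records for illustration only.
sample_records = [
    {"email": "Alice@Example.com ", "breach": "ShareThis",
     "file_name": "ShareThis_BF.7z", "source": "dark web"},
    {"email": "alice@example.com", "breach": "ExampleBreach",
     "file_name": "example.zip", "source": "aggregated"},
    {"email": "", "breach": "ShareThis",
     "file_name": "ShareThis_BF.7z", "source": "dark web"},
]
exposure = breaches_by_email(sample_records)
```

At the scale of a billion records this grouping would be done by the search engine rather than in application memory, but the logic is the same.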

Another index titled data_breach_events contained almost a million and a half records (2.6GB) about reported data breach events, with fields indicating where the data came from (e.g., TOR, the clear web, Telegram) and details about the post and data broker, linking to related data in other indices on the server.
Goods
The goods index (600MB, 1 million records) tracked active listings for data on the dark web. This allowed monitoring of what was actually being bought and sold in TOR marketplaces. Each record contains information about the marketplace in which the transaction took place, for example FreeCitY Market, a known Chinese-language dark web marketplace, and Teahorse (茶马古道), another prominent Chinese dark web market often used for trading data and services.
The records then describe the transaction by categorizing the data into website passwords, regional data leaks, and “grey services” (such as the sale of social media accounts that appear realistic enough to bypass fraud protection measures). The price of the data being sold and any relevant cryptocurrency addresses are also present, as is the seller’s contact information.
A separate index, goods_useful (only 975 records), separates out a few significant records from this database according to criteria that flag them as especially relevant. They contain the same data fields, but appear to be actionable by the owners of this intel dataset.
Files
Finally, an index titled files (12GB, 12 million records) serves a crucial purpose: it links binary data (the actual files) to the contextual metadata (who posted it, where, and when). The majority of the records appear to have originated in Telegram groups. When a new file is posted, this system attempts to download it and records its presence. The type of file, such as movie, audio or image is listed in the record, as well as its file size and a brief description. It also contains several metadata fields to determine whether the referenced file was successfully downloaded during the scrape.
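The behavior described above, attempting a download and recording whether it succeeded alongside the posting context, can be sketched as follows. All field names are hypothetical. Storing a SHA-256 digest of the downloaded bytes is an assumption added for illustration and is not something the exposed index is known to do.

```python
import hashlib
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class FileRecord:
    # Contextual metadata: who posted the file, where, and when.
    channel: str
    poster: str
    posted_at: str
    # Description of the file itself.
    file_type: str
    size_bytes: int
    description: str
    # Download bookkeeping (hypothetical fields).
    downloaded: bool = False
    sha256: Optional[str] = None
    error: Optional[str] = None


def record_file(meta: dict, download: Callable[[], bytes]) -> FileRecord:
    """Create a record for a posted file and note whether the download worked."""
    rec = FileRecord(**meta)
    try:
        data = download()
    except Exception as exc:  # keep the record even when the download fails
        rec.error = str(exc)
        return rec
    rec.downloaded = True
    rec.sha256 = hashlib.sha256(data).hexdigest()
    return rec


def failing_download() -> bytes:
    raise TimeoutError("download timed out")


# Invented sample metadata for illustration only.
meta = {"channel": "example_channel", "poster": "example_user",
        "posted_at": "2026-01-01T00:00:00Z", "file_type": "image",
        "size_bytes": 5, "description": "sample"}
ok = record_file(meta, lambda: b"hello")
failed = record_file(meta, failing_download)
```

Keeping a record even when the download fails preserves the fact that the file was posted, which is itself a piece of intelligence.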
Conclusion
Offensive cyberwarfare entails using the latest exploits with well-funded sophistication. Defense measures are largely reactive and can only be as effective as the understanding of what kind of offense must be protected against. These defense measures involve collecting a large amount of data, from which insights can be gleaned through machine analysis. These threat intelligence data collections themselves, like all domestic surveillance data sets, can fall into “the wrong hands.” Weaponized information, like weaponized machinery, can be pointed at anyone, depending on who is behind the turret.
In this case, the same class of threats that defenders around the world must contend with are also dangers to China’s state interests and citizens. Leaks of personal data through Telegram and TOR sites do not discriminate based on national origin. Even as China and other nations jostle for position with elbows out, they all share an interest in protecting their citizens, including the right to privacy directly threatened by cyber criminals. When it comes to defending against the threats of the dark web, all parties seem to share a common technical solution.