Open source intelligence (OSINT) is the process of identifying, harvesting, processing, analyzing, and reporting data obtained from publicly available sources for intelligence purposes.
Open source intelligence analysts use specialized methods to explore the diverse landscape of open source intelligence and pinpoint any data that meets their objectives. OSINT analysts regularly discover information that is not broadly known to be accessible to the public.
OSINT includes any offline or online information that is publicly available, whether free of cost, purchasable or obtainable by request.
Below are some examples of offline and online information used for open source intelligence.
- Diplomatic: Government, law enforcement and courts, NGOs, international agencies
- Academic: Academic research, journals, dissertations
- Corporate: Annual reports, conference proceedings, press releases, employee profiles, résumés
- Mass media: Television, radio, newspapers, magazines
- Internet Search/Database: Google, Bing, Yahoo, Wayback Machine, Whois
- Social Media Platforms: Facebook, Twitter, LinkedIn, Instagram
- Sharing & Publishing: YouTube, Flickr, Pinterest, Dailymotion
- Blogging, Forums, and Online Communities: WordPress, Medium, Reddit, 4chan
- Deep web: The deep web consists of any non-indexed web pages (sites that are not reachable by internet search engines).
- Dark web: The dark web is only accessible through darknets. Darknets can be small peer-to-peer or friend-to-friend networks, as well as large networks like Tor and I2P. Many sites on the dark web host illegal content.
History of Open Source Intelligence
The origins of OSINT extend much further back than the introduction of digital technologies and the Internet.
OSINT became a leading intelligence discipline during the Cold War, especially for gathering intelligence on the Soviet Union and China.
Following the Cold War, significant global technological, commercial, and political developments further increased the capabilities and scope of OSINT.
Notably, the broadening distribution of media publications, the invention of the television, and the advent of the Internet have all enhanced and enriched the intelligence community's access to open sources.
Source: Mercado, S., 2004. Sailing the Sea of OSINT in the Information Age. Studies in Intelligence, [online] 48(3), pp.44-55.
Open Source Intelligence Uses
Information security teams use OSINT for two main reasons:
Discovering Public-Facing Internal Assets
OSINT analysts use penetration testing to discover an organization's publicly available assets. Also known as ethical hacking, penetration testing involves testing a computer system, network, or web application's cybersecurity to find exploitable security vulnerabilities.
Relevant intelligence that security teams can uncover through penetration testing includes:
- Data leaks, e.g., the Microsoft PowerApps data leaks, which occurred through portals configured to allow public access to personally identifiable information (PII).
- Unpatched software, including software affected by zero-day vulnerabilities
- Open ports and unsecured devices
- Exposed assets, such as IP addresses, networks, device names, and software versions.
Identifying External Information
Organizations must also consider external cyber threats when assessing their attack surfaces. Assessing external threats is particularly important for an organization's third-party risk management program, as third parties are increasingly common attack vectors.
Content on social media, including professional social networks, could appear benign on its own. Still, threat actors can launch cyber attacks by leveraging information disclosed by employees and suppliers in combination with existing vulnerabilities.
While even a simple internet search can reveal an organization's vulnerabilities, security teams also look into deeper layers of the Internet to identify external threats. For example, open source intelligence analysts access the deep and dark web to gather further intelligence, like data leaks.
For these reasons, OSINT is vital in optimizing Operations Security (OPSEC). OPSEC is the process of identifying friendly actions that could be useful for a potential attacker if properly analyzed and grouped with other data to reveal critical information or sensitive data.
OSINT reconnaissance (recon) techniques fall into one of two main categories: passive and active.
Passive recon involves gathering information about a target network or device without directly engaging with the system. OSINT analysts rely on third-party information and passive recon tools, such as Wireshark, which analyzes network traffic in real time on Windows, macOS, Unix, and Linux systems. They piece together these different OSINT data points to find and map patterns.
Active recon directly engages with the target system, offering more accurate and timely information. OSINT analysts use active recon tools like Nmap, a network discovery tool that provides a granular view of a network's security.
Targets are more likely to notice active scanning as intrusion detection systems (IDS) or intrusion prevention systems (IPS) can detect attempts to access open ports and scan for vulnerabilities.
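To make the idea of active recon concrete, here is a minimal sketch of a TCP connect check, the simplest form of port probing. It is not how Nmap works internally, only an illustration of directly engaging a target; the function names `is_port_open` and `scan_ports` are hypothetical. Every connection attempt is visible to the target and its IDS/IPS, so run it only against systems you are authorized to test.

```python
import socket


def is_port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        # A full TCP handshake is attempted, which the target can log.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


def scan_ports(host: str, ports: list[int], timeout: float = 1.0) -> list[int]:
    """Return the subset of ports that accepted a TCP connection."""
    return [port for port in ports if is_port_open(host, port, timeout)]
```

For example, `scan_ports("127.0.0.1", [22, 80, 443])` returns whichever of those ports are listening on the local machine.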
While information security teams need to adopt unique OSINT techniques specific to their organizational needs, following a general process helps lay the foundations for effective intelligence gathering.
The Open Web Application Security Project (OWASP) outlines a 5-step OSINT process:
1. Determine where to find the information for the specific intelligence requirement.
2. Gather relevant information from the identified source.
3. Process the identified source's data and extract meaningful insights.
4. Combine the processed data from multiple sources.
5. Create a final report on findings.
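The five steps above can be sketched as a small pipeline. This is an illustration under stated assumptions, not an OWASP reference implementation: the two "sources" are hard-coded sample strings, the intelligence requirement is "find hostnames under example.com", and every function name is hypothetical.

```python
import re
from collections import defaultdict

# Hypothetical, hard-coded "sources" standing in for real collectors.
SOURCES = {
    "press_releases": [
        "Example Corp launches portal at portal.example.com",
        "Example Corp opens a new office",
    ],
    "code_search": [
        "base_url = 'https://portal.example.com/api'",
        "smtp_host = 'mail.example.com'",
    ],
}

# Matches subdomains of example.com (the assumed target domain).
HOSTNAME = re.compile(r"\b(?:[a-z0-9-]+\.)+example\.com\b", re.IGNORECASE)


def identify_sources(requirement: str) -> list[str]:
    """Step 1: decide which sources are relevant (here: all of them)."""
    return sorted(SOURCES)


def gather(source: str) -> list[str]:
    """Step 2: collect raw records from one source."""
    return SOURCES[source]


def process(records: list[str]) -> set[str]:
    """Step 3: extract the data points of interest (hostnames)."""
    return {m.lower() for record in records for m in HOSTNAME.findall(record)}


def combine(per_source: dict[str, set[str]]) -> dict[str, list[str]]:
    """Step 4: merge findings, noting which sources mention each hostname."""
    merged = defaultdict(list)
    for source, hostnames in per_source.items():
        for hostname in hostnames:
            merged[hostname].append(source)
    return {h: sorted(s) for h, s in sorted(merged.items())}


def report(findings: dict[str, list[str]]) -> str:
    """Step 5: render a plain-text report."""
    return "\n".join(f"{h}: seen in {', '.join(s)}" for h, s in findings.items())


def run(requirement: str) -> str:
    per_source = {s: process(gather(s)) for s in identify_sources(requirement)}
    return report(combine(per_source))
```

Calling `run("hostnames under example.com")` produces one line per hostname with the sources that mention it. A hostname that shows up in more than one source is usually the stronger lead, which is the point of the combine step.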
There are many free and paid open source intelligence tools available for a variety of purposes, such as:
- Searching metadata and code
- Researching phone numbers
- Investigating people and identities
- Verifying email addresses
- Analyzing images
- Detecting wireless networks and analyzing packets.
Listed below are some useful open source intelligence tools.
Babel X is a multilingual Internet search tool that finds publicly available information from sources like social media, forums, news sites, and blogs across 200 different languages. It filters relevant information into different categories for OSINT analysis.
BuiltWith is a website profiling tool that shows current and historical information about a website's technology usage, technology versions, and hosting.
Creepy is an open source intelligence gathering tool that collects geolocation information through social networking platforms.
DarkSearch is a dark web search engine that allows organizations to research and access sites directly through Tor2Web.
GHunt is an OSINT tool used to find data associated with Google accounts, including account owner name, Google ID, YouTube, and other services like Photos and Maps.
Google Dorking involves using advanced search queries, known as Google dorks, to find security and configuration information about websites.
Grep.app is a search engine that searches code from public repositories on GitHub.
Intel Owl is an OSINT tool that gathers threat intelligence data about a specific file, an IP, or a domain through a single API request.
Intelligence X is a search engine and data archive that searches Tor, I2P, data leaks, and the public web by email, domain, IP, CIDR, Bitcoin address, and more.
Maltego is an OSINT and graphical link analysis tool for gathering and connecting information for investigative tasks.
O365 Squatting is a Python tool used to check inputted domains against O365 infrastructure to identify typo-squatted domains that do not appear in DNS requests.
The OSINT Framework is an online directory that lists open source tools for OSINT gathering, sorted by source type.
reNgine is an automated reconnaissance framework used for OSINT gathering that streamlines the recon process.
Recon-ng is an open source intelligence gathering tool used to conduct web-based reconnaissance.
Searchcode is a source code search engine that indexes API documentation, code snippets, and open source (free software) repositories.
Shodan is a search engine used for gathering intelligence information from a variety of IoT devices like webcams, routers, and servers.
Social Mapper is an OSINT tool that uses facial recognition to correlate social media profiles across different sites on a large scale.
SpiderFoot is a reconnaissance tool that automatically queries over 100 public data sources (OSINT) to gather intelligence on IP addresses, domain names, email addresses, names, and more.
Sublist3r is a Python tool designed to enumerate subdomains of websites, using search engines such as Google, Yahoo, Bing, Baidu, and Ask.
theHarvester is a penetration testing tool used to gather information about emails, subdomains, hosts, employee names, open ports, and banners from different public sources like search engines, PGP key servers, and the Shodan computer database.
TinEye is a reverse image search engine and image recognition tool.
ZMap is a network tool used for Internet-wide network surveys.
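Of the tools listed above, Google Dorking is the easiest to illustrate, because it needs nothing beyond a search box. The sketch below composes a query from three widely used search operators (`site:`, `filetype:`, and `intitle:`) and builds a search URL without fetching anything. The helper names `build_dork` and `search_url` are hypothetical, and `example.com` is a placeholder domain.

```python
from typing import Optional
from urllib.parse import urlencode


def build_dork(site: str, filetype: Optional[str] = None,
               intitle: Optional[str] = None, terms: str = "") -> str:
    """Compose a search query from common advanced operators."""
    parts = [f"site:{site}"]
    if filetype:
        parts.append(f"filetype:{filetype}")
    if intitle:
        # Quote the title so multi-word phrases are matched as a unit.
        parts.append(f'intitle:"{intitle}"')
    if terms:
        parts.append(terms)
    return " ".join(parts)


def search_url(query: str) -> str:
    """Build a search URL for the query; nothing is requested here."""
    return "https://www.google.com/search?" + urlencode({"q": query})
```

For instance, `build_dork("example.com", filetype="pdf")` yields `site:example.com filetype:pdf`, a query a security team might run against its own domain to see which documents are indexed publicly.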
Is OSINT Legal?
The US Code defines the legal use of open source intelligence as "... intelligence that is produced from publicly available information and is collected, exploited, and disseminated in a timely manner to an appropriate audience for the purpose of addressing a specific intelligence requirement."
OSINT analysts use specialized recon tools to harvest relevant data. These tools and techniques are legal as they aid in data collection, analysis, and processing from publicly available information.
It's important to note that while OSINT deals with information that anyone on the Internet can find, it often uncovers information that most people do not know is public.
This lack of knowledge is where the 'grey area' exists for OSINT. The legality and ethics of OSINT come down to how vulnerabilities are managed.
For example, suppose an organization has accidentally leaked employee credentials in a publicly accessible Amazon S3 storage bucket, and the leak is discovered using a code search engine.
A threat actor could discover this leak and exploit it for social engineering or other cyber attacks.
An OSINT analyst could alert the organization accordingly to ensure fast remediation.
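On the analyst's side, spotting a leak like this often comes down to scanning public text for credential-shaped strings. The following is a rough sketch of that idea, assuming just two illustrative patterns: the `AKIA` prefix used by long-term AWS access key IDs, and a generic `password = ...` assignment. Real scanners use far larger rule sets, and the function name `find_credentials` is hypothetical. The sample key in the usage note is the placeholder from AWS's own documentation, not a real credential.

```python
import re

# Illustrative patterns only; not an exhaustive or authoritative rule set.
PATTERNS = {
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "password_assignment": re.compile(r"(?i)\bpassword\s*[:=]\s*\S+"),
}


def find_credentials(text: str) -> list[tuple[int, str]]:
    """Return (line_number, pattern_name) for each line matching a pattern."""
    hits = []
    for line_number, line in enumerate(text.splitlines(), start=1):
        for name, pattern in PATTERNS.items():
            if pattern.search(line):
                hits.append((line_number, name))
    return hits
```

Running `find_credentials` on a file containing `aws_key = AKIAIOSFODNN7EXAMPLE` flags that line as `aws_access_key_id`. The function reports only the line number and pattern name, not the matched value, so the findings can be shared without spreading the secret further.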
Given the prevalence of scenarios such as the above, organizations must develop clear frameworks for OSINT to ensure analysts are following correct procedures. Strict regulatory and compliance requirements, such as GDPR, further highlight the need for concrete ethical guidelines.
The Dangers of OSINT
The accessibility of OSINT appeals to both resourceful security teams looking to improve their cybersecurity and cyber attackers with malicious intent.
For example, OSINT analysts often leverage OSINT tools to perform network scanning during a network security assessment. Threat actors can use these same tools to identify network vulnerabilities and exploit them.
They can also gather intelligence to carry out other cyber attacks, such as:
- Social engineering, e.g., phishing and email spoofing
- Creating botnets to launch Distributed Denial of Service (DDoS) attacks
- Brute force attacks
- Malware injections, e.g., spyware, ransomware, and other types of malware
Security teams should have effective information risk management practices in place to account for abuses of OSINT.
UpGuard's attack surface management platform identifies public data leaks and software vulnerabilities affecting organizations and vendors in real time.
UpGuard uses real-time data and remediation workflows to help organizations secure assets before attackers can exploit them.