Apache Hadoop vs MongoDB: Which Is More Secure?

Posted by UpGuard

Either you’re reading this because the question has been puzzling you secretly, or you’ve arrived to protest this admittedly incongruous comparison. Fortunately, both sides of the fence are covered here.

In this article we’ll compare and contrast their features and benefits, but not before clearing up some popular misconceptions about the two big data platforms. We’ll then delve into each platform’s attack surface and vulnerabilities and evaluate both from a security angle.

A quick look at popular database-ranking site DB-Engines illustrates Big Data and NoSQL technologies’ encroachment into traditional RDBMS territory. MongoDB is the leading Big Data solution among the top-ranked DBMS solutions (#4), with Cassandra (#8), Redis (#10), and HBase (#15) trailing behind. As you can see, Hadoop is noticeably absent from the list.

Top ranking DBMS platforms. Source: DB-Engines.com

The reason is that it’s not a DBMS, per se. Instead, Hadoop is an open-source software framework that enables massively scalable storage and batch data processing. And while HBase is the actual database component that runs on top of the Hadoop Distributed File System (HDFS), implementing Hadoop to augment existing database platforms is not uncommon. Our other subject of comparison—MongoDB—even lays out some representative use cases for Hadoop/MongoDB integration.

Apache Hadoop

Hadoop consists of the following core components:

  • Hadoop Common – a collection of libraries and utilities that are used by other Hadoop modules

  • Hadoop Distributed File System (HDFS) – a highly fault-tolerant and scalable Java-based file system designed for high throughput access to large application data sets

  • MapReduce – a software development paradigm/framework for processing large sets of data in parallel

  • YARN (Yet Another Resource Negotiator) – a framework for scheduling/handling resource requests from distributed applications

Other Hadoop-related projects rounding out the platform include the aforementioned HBase distributed database, as well as popular Hadoop tools such as Pig, Hive, Ambari, and ZooKeeper, among others.

The Hadoop platform architecture. Source: HortonWorks.

Many prominent vendors have created their own distributions and specialized offerings around the open source Hadoop platform. For example, Cloudera’s CDH (Cloudera Distribution Including Apache Hadoop) targets enterprise-class use cases, while MapR uses its own proprietary file system, MapR-FS, in place of HDFS. The Hortonworks Data Platform (HDP) provides pure-play enterprise Hadoop solutions and is focused on increasing interoperability between vendors and solutions.

Hadoop’s merits are consistently lauded by its adopters across the board. Affordable (free), highly scalable, and fault tolerant, it is usually the go-to solution for big data needs. From a security perspective, however, Hadoop is a minefield of vulnerabilities. Prominent Gartner analyst Merv Adrian recently commented publicly on the dangerous security shortcomings of Hadoop:

"The nearly non-existent response to the security issue is shocking. Can it be that people believe Hadoop is secure? Because it certainly is not. At every layer of the stack, vulnerabilities exist, and at the level of the data itself there are numerous concerns. These include the use of external unveiled data and of data in file systems that lack any protection, and the separation of Hadoop initiatives in most organizations from IT governance. Add to that the kinds of use cases Hadoop is being pointed at: sensitive healthcare information personal data in retail systems; telephone usage; social media connection and sentiment analytics - all of them give us pause."

Most of Hadoop’s security shortcomings revolve around the central drawback of the platform: complexity. Because the platform is an intricate array of interworking components that is difficult to configure and manage, attack vectors are often left exposed by less-experienced Hadoop architects. And because Hadoop was not initially designed for security (initial uses of it were restricted to private clusters in trusted environments), incorporating security into the framework can be a challenge. Initial versions didn’t even authenticate users or services, enforce data privacy controls, or encrypt data at the storage or network level. Hadoop now comes with basic security mechanisms for things like authentication and authorization, but they are turned off by default. Additionally, as a Java-based technology, Hadoop is subject to many of the exploits that target the Java platform.
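To make the "off by default" point concrete, here is a minimal sketch of a check that reads a core-site.xml file and reports whether the two core security switches have been flipped. The property names hadoop.security.authentication and hadoop.security.authorization are real Hadoop settings; the function names, the sample configuration, and the hostname are assumptions made up for this illustration, and a real audit would cover far more than these two properties.

```python
import xml.etree.ElementTree as ET

# Values Hadoop falls back to when a property is absent from core-site.xml.
HADOOP_DEFAULTS = {
    "hadoop.security.authentication": "simple",  # "simple" trusts the client-supplied user name
    "hadoop.security.authorization": "false",    # service-level authorization is off
}


def read_properties(core_site_xml):
    """Parse the <property><name>/<value> pairs of a Hadoop *-site.xml string."""
    root = ET.fromstring(core_site_xml)
    props = {}
    for prop in root.findall("property"):
        name = prop.findtext("name")
        value = prop.findtext("value")
        if name is not None and value is not None:
            props[name.strip()] = value.strip()
    return props


def audit_core_site(core_site_xml):
    """Return a list of findings; an empty list means both checks passed."""
    props = {**HADOOP_DEFAULTS, **read_properties(core_site_xml)}
    findings = []
    if props["hadoop.security.authentication"].lower() != "kerberos":
        findings.append("authentication is not Kerberos (hadoop.security.authentication)")
    if props["hadoop.security.authorization"].lower() != "true":
        findings.append("service-level authorization is off (hadoop.security.authorization)")
    return findings


# Hypothetical configuration that sets nothing security-related.
SAMPLE = """<configuration>
  <property><name>fs.defaultFS</name><value>hdfs://namenode.example:8020</value></property>
</configuration>"""

for finding in audit_core_site(SAMPLE):
    print("WARNING:", finding)
```

Because the sample leaves both properties unset, the defaults apply and both warnings are reported, which is exactly the state of a freshly installed cluster.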

The Common Vulnerabilities and Exposures (CVE) database is a consolidated, up-to-date repository of publicly known information-security vulnerabilities and exposures. Hadoop has four documented vulnerabilities in the CVE database, and has a weighted average Common Vulnerability Scoring System (CVSS) score of 6.3, putting it in the medium risk category. Several projects exist to bolster Hadoop’s security posture, including Apache projects like Knox Gateway, Sentry, and Ranger, while corporate-led initiatives like Intel’s GitHub-hosted Project Rhino aim to develop all the security features that are missing from Hadoop like encryption and key management, a common authorization framework, single sign-on/token-based authentication, and others.
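For readers unfamiliar with how a score like 6.3 ends up labeled "medium", the sketch below computes a weighted average and maps it to a CVSS v2 severity band (0.0 to 3.9 low, 4.0 to 6.9 medium, 7.0 to 10.0 high). The weighting scheme behind the figures quoted in this article is not specified here, so weighting each score by the number of entries that carry it is an assumption, and the example scores are invented rather than taken from the CVE database.

```python
def severity_band(score):
    """Map a CVSS v2 base score (0.0-10.0) to its qualitative band."""
    if not 0.0 <= score <= 10.0:
        raise ValueError("CVSS scores range from 0.0 to 10.0")
    if score < 4.0:
        return "low"
    if score < 7.0:
        return "medium"
    return "high"


def weighted_average(scores_with_weights):
    """Weighted mean of (score, weight) pairs, rounded to one decimal place."""
    total_weight = sum(weight for _, weight in scores_with_weights)
    if total_weight == 0:
        raise ValueError("at least one non-zero weight is required")
    total = sum(score * weight for score, weight in scores_with_weights)
    return round(total / total_weight, 1)


# Hypothetical data: (score, number of CVE entries with that score).
example = [(5.0, 2), (7.5, 1), (4.3, 1)]
average = weighted_average(example)
print(average, severity_band(average))
```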


MongoDB

To be fair, all Big Data solutions, be it Hadoop or MongoDB, are fledgling technologies when compared to mature RDBMS platforms like Oracle, PostgreSQL, or MySQL. And despite their widespread adoption, most tools are just coming into their own. This is especially true of MongoDB: with its 3.0 release back in February, the popular NoSQL database now offers features expected of a standard RDBMS, such as document-level locking, along with a pluggable storage engine API.

In contrast to the wide column store offered by Hadoop’s HBase, MongoDB uses a schema-less document store model: instead of employing relational structures such as tables and columns, it stores its records as documents. And while data in the Hadoop ecosystem is typically queried through Hive’s SQL-like language, HiveQL, MongoDB queries are themselves expressed as JSON-like documents, and records are stored in a binary encoding of JSON called BSON. This makes it especially popular with full stack platforms built heavily around JavaScript, such as the MEAN stack.
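The document model is easier to picture with an example. The following sketch uses plain Python dictionaries to show one record stored as a self-contained document, plus a query that is itself a document. It does not use a MongoDB driver: the matches helper is a toy written for this illustration that handles only simple equality on nested fields, and every field name and value is invented.

```python
import json

# One record stored as a single document instead of rows spread across tables.
# All field names and values are invented for the illustration.
order = {
    "_id": 1001,
    "customer": {"name": "Ada", "city": "London"},
    "items": [
        {"sku": "A-1", "qty": 2},
        {"sku": "B-7", "qty": 1},
    ],
    "status": "shipped",
}


def get_path(document, dotted_path):
    """Follow a dotted path such as "customer.city" through nested dicts."""
    current = document
    for key in dotted_path.split("."):
        if not isinstance(current, dict) or key not in current:
            return None
        current = current[key]
    return current


def matches(document, query):
    """True if every field named in the query equals the same field in the document.

    Covers plain equality only; real MongoDB queries also support
    operators, array matching, and indexes.
    """
    return all(get_path(document, path) == expected for path, expected in query.items())


# The query is a document too, which is why it feels natural from JavaScript.
query = {"status": "shipped", "customer.city": "London"}
print(json.dumps(query))
print(matches(order, query))
```

Note that the customer and line items live inside the order itself, so no join is needed to read the whole record back.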


The platform has a reputation for fault tolerance, high availability, and strong horizontal scaling, enabled primarily through a mechanism called sharding, which distributes data records across multiple machines. Furthermore, MongoDB does not require traditional database constructs like joins or data normalization, allowing full stack developers to build highly scalable applications quickly and efficiently.
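As a rough illustration of the idea behind sharding, the sketch below routes each document to one of several shards by hashing a shard key. This is a deliberate simplification: MongoDB assigns ranges of key (or hashed key) values to chunks and balances those chunks across shards, rather than taking a hash modulo the shard count. The function names, the choice of SHA-256, and the sample documents are all assumptions made for the example.

```python
import hashlib


def shard_for(shard_key_value, shard_count):
    """Pick a shard index by hashing the shard key value (simplified routing)."""
    digest = hashlib.sha256(str(shard_key_value).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % shard_count


def distribute(documents, shard_key, shard_count):
    """Place each document on the shard chosen by its shard key."""
    shards = [[] for _ in range(shard_count)]
    for doc in documents:
        shards[shard_for(doc[shard_key], shard_count)].append(doc)
    return shards


# Hypothetical documents keyed by user_id.
docs = [{"user_id": n, "score": n * 10} for n in range(12)]
for index, shard in enumerate(distribute(docs, "user_id", 3)):
    print("shard", index, "holds user_ids", [d["user_id"] for d in shard])
```

The same key always lands on the same shard, which is what lets a query router such as mongos send a lookup to a single machine instead of broadcasting it to all of them.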

MongoDB consists of the following core components:

  • mongod: a core database process

  • mongos: a controller and query router for sharded clusters

  • mongo: an interactive MongoDB Shell

MongoDB has also made available an assortment of utilities for tasks such as data importing/exporting and diagnostics, as well as drivers for various languages such as PHP, C/C++, Python, and Scala, among others.

MongoDB architecture and sharding. Source: MongoDB.

Though perhaps not as widely publicized as Hadoop’s shortcomings, MongoDB also harbors many critical vulnerabilities. Like Hadoop, MongoDB (and indeed most Big Data technologies) carries some baggage due to its origins in the private data center. These powerful data crunching platforms have been accelerated by the advent of the cloud, but have also gained a plethora of attack vectors as a result. And, again like Hadoop, MongoDB ships with access control and encryption turned off by default.
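A simple way to catch these defaults is to audit the mongod configuration before deployment. The sketch below assumes the YAML configuration file has already been loaded into a nested dictionary by a YAML parser of your choice. The option names security.authorization, net.bindIp, and net.ssl.mode are real mongod settings from the 3.0 era; the audit function, its wording, and the sample configurations are hypothetical.

```python
def audit_mongod_config(config):
    """Return a list of findings for a mongod config given as a nested dict."""
    findings = []
    security = config.get("security") or {}
    net = config.get("net") or {}

    # Access control is only enforced when authorization is explicitly enabled.
    if security.get("authorization") != "enabled":
        findings.append("access control is off (security.authorization is not 'enabled')")

    # In the 3.0-era releases discussed here, an unset bindIp meant listening
    # on all interfaces; later releases bind to localhost by default.
    addresses = [a.strip() for a in str(net.get("bindIp", "")).split(",") if a.strip()]
    if not addresses or "0.0.0.0" in addresses:
        findings.append("mongod listens on all interfaces (net.bindIp)")

    # Transport encryption is disabled unless an SSL mode is configured.
    if (net.get("ssl") or {}).get("mode", "disabled") == "disabled":
        findings.append("transport encryption is disabled (net.ssl.mode)")

    return findings


# Hypothetical out-of-the-box configuration: nothing security-related is set.
default_like = {"storage": {"dbPath": "/var/lib/mongodb"}, "net": {"port": 27017}}
for finding in audit_mongod_config(default_like):
    print("WARNING:", finding)
```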

MongoDB has seven documented vulnerabilities in the CVE database, and has a weighted average CVSS score of 6, putting it in the medium risk category.

In short, perhaps it’s not a question of which is more secure, as both are equally insecure. Hadoop can be more of a challenge to tighten security-wise due to its preponderance of components; that said, MongoDB seems to be having a bad run as of late: first, the discovery in February of 40,000 vulnerable MongoDB databases exposed on the internet, followed by a zero-day vulnerability discovered in phpMoAdmin, a PHP-based MongoDB administration tool. The following table is a side-by-side highlight of each platform’s features as well as respective security vulnerabilities and CVE scores.
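Exposed databases of this kind are typically found simply because their port answers from the outside. As a minimal sketch, the check below tests whether a TCP connection to MongoDB's default port (27017) succeeds. It only shows that something is listening, not that the database accepts unauthenticated queries, and the function name is an assumption for this example. Run it only against hosts you are authorized to test.

```python
import socket

MONGODB_DEFAULT_PORT = 27017  # default port of a standalone mongod


def is_port_reachable(host, port=MONGODB_DEFAULT_PORT, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unresolvable hosts.
        return False


if __name__ == "__main__":
    # Checks the local machine only.
    print("mongod port reachable on localhost:", is_port_reachable("127.0.0.1"))
```

Running the same check from a machine outside your network against your own database host is a quick way to confirm that a firewall or bindIp setting is doing its job.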


| | Apache Hadoop | MongoDB |
|---|---|---|
| Ease-of-use & Configuration | Intermediate to difficult. Provides tools like Pig and Hive to aid development and management | Intermediate, but requires some effort to configure sharding. Manipulating documents with BSON may be difficult for those unfamiliar with JSON |
| Database Type | Wide column store (HBase) | Document store |
| Scalability | Excels in fully distributed, multi-threaded execution | Designed for easy horizontal scaling |
| Support | Commercial implementations and support available through vendors such as Hortonworks and Cloudera. An abundance of community support is available online | Commercial support available from MongoDB. Ample community support also available online |
| Ideal Use Case(s) | Data processing intensive applications | Applications requiring fast data storage and retrieval capabilities |
| # CVE Database Entries | 4 | 7 |
| Average CVSS Score | 6.3 (medium) | 6 (medium) |

The fact is that most big data web applications built with Hadoop or MongoDB live in some mode of cloud deployment, yet both technologies ship with security turned off by default. This of course enables bad security practices, but it does not shift complete responsibility to the vendor. Ultimately, an application’s security posture is in the hands of the developer, and security should be baked in during all phases of development, especially when deploying big data platforms. Regardless of which database technology is being used, UpGuard can provide the proper control mechanisms for identifying and remediating critical vulnerabilities at any point in the software delivery pipeline, allowing for continuous security monitoring and assessment for all endpoints: databases, servers, network devices, and more.









