This content is part of the Essential Guide: Where does the enterprise stand with open source cloud computing?

Hadoop users see bumps in the open source road

Hadoop's popularity has grown exponentially, and market forecasts project an even greater explosion in the future -- but there's a flip side to every coin.

Interest in the open source software framework Hadoop is rising. Huge expansions in raw structured and unstructured data, coupled with a demand for big data analytics, are major drivers fueling Hadoop's market success.

Market research firm Research Beam forecasts that the global Hadoop market will skyrocket from $1.5 billion in 2012 to $50 billion in 2020. However, while the platform offers enterprises a compelling case for storing unstructured data, it's not all sunshine and rainbows for Hadoop users.

Hadoop is designed for massive workloads. It scales horizontally and is generally faster and more cost-effective for large data sets than conventional tools, such as relational database management systems (RDBMSes). Consequently, Hadoop can scale to meet the exponential increases in data volumes generated by mobile, social media and Internet of Things technologies. The framework can also be used as an alternative or extension to existing proprietary data warehouses when companies have trouble keeping up with rising volumes.
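Hadoop's core programming model is MapReduce, which splits work into a map phase that emits key/value pairs and a reduce phase that aggregates them. The pattern itself can be sketched in plain Python -- a toy illustration of the model, not Hadoop's actual Java API:

```python
from collections import defaultdict

def map_phase(records):
    """Map step: emit (key, value) pairs -- here, (word, 1) per word."""
    for line in records:
        for word in line.split():
            yield word.lower(), 1

def reduce_phase(pairs):
    """Reduce step: aggregate all values that share a key."""
    groups = defaultdict(int)
    for key, value in pairs:
        groups[key] += value
    return dict(groups)

lines = ["big data big analytics", "data at scale"]
counts = reduce_phase(map_phase(lines))
```

In a real cluster, the map tasks run in parallel across many nodes and a shuffle phase routes each key to the right reducer; that parallelism is what lets the model handle the massive workloads described above.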

Hadoop is flexible; it works with both unstructured data and structured information. Traditional database management systems were geared toward storing only structured data.
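That flexibility comes from a "schema on read" approach: the store keeps raw records, and structure is applied only when the data is read. A hypothetical Python sketch of the idea -- the record formats and the `interpret` helper are invented for illustration:

```python
import json

# A Hadoop-style store keeps raw bytes; one data set can mix
# structured, semi-structured and free-text records.
raw_lines = [
    '{"user": "alice", "clicks": 3}',           # JSON event
    'bob,7',                                     # simple CSV row
    'free-text log entry from a legacy system',  # unstructured
]

def interpret(line):
    """Apply whatever schema fits each record at read time."""
    try:
        return json.loads(line)                  # structured JSON?
    except ValueError:
        pass
    if ',' in line and line.rsplit(',', 1)[1].isdigit():
        user, clicks = line.rsplit(',', 1)       # CSV-like row?
        return {"user": user, "clicks": int(clicks)}
    return {"raw": line}                         # keep as-is

records = [interpret(line) for line in raw_lines]
```

An RDBMS, by contrast, enforces a schema on write: rows that do not fit the table definition are rejected at load time.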

The flip side of the Hadoop coin

As Hadoop becomes more of a force in enterprise cloud applications, it shows its weaknesses as much as its strengths. Hadoop does a good job of processing high volumes of data; however, the programming environment is immature, so building applications requires significant time, effort and investment.


Hadoop does not work well with time-sensitive information; quick response times are not its strength. So, it is a poor fit for transactional systems or ad hoc queries. For example, Hadoop would struggle to find a single record among 300 million in 30 milliseconds or less.
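A rough, hypothetical illustration of why: a batch engine answers a query by scanning the data, while millisecond lookups require an index. Even at a small in-memory scale, the gap between the two access patterns is visible:

```python
import time

# Invented example: (id, value) records, far smaller than 300 million,
# but large enough to show scan vs. indexed-lookup behavior.
records = [(i, i * 2) for i in range(300_000)]
index = dict(records)   # the kind of keyed access an RDBMS index provides

target = 299_999

t0 = time.perf_counter()
scan_hit = next(v for i, v in records if i == target)   # full scan
scan_s = time.perf_counter() - t0

t0 = time.perf_counter()
index_hit = index[target]                               # keyed lookup
index_s = time.perf_counter() - t0
```

The scan touches every record before finding the last one; the indexed lookup is near-instant. Batch frameworks like classic MapReduce behave like the scan, which is why low-latency serving is usually left to other systems.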

While Hadoop's roots go back to the turn of the millennium, the market only caught up to the new architecture a few years ago. Consequently, its programming ecosystem is not as well developed as that of RDBMSes, which have been in use for decades.

In addition, Hadoop application development work is tedious. Enterprises using Hadoop to pre-process large volumes of raw data must first move it to a different system. Programming tools and links to daily business applications are often hard to find and usually are not very robust. Therefore, it requires a lot of time and effort for enterprises to build applications on the open source system.
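One of the simpler entry points that does exist is Hadoop Streaming, which lets any executable that reads lines on stdin and writes tab-separated key/value pairs to stdout act as a mapper or reducer. A minimal mapper sketch -- the word-count task and tokenization are illustrative, not from the article:

```python
import sys

def mapper(lines):
    """Hadoop Streaming-style mapper: consume raw input lines and
    emit tab-separated key/value pairs for the shuffle phase."""
    for line in lines:
        for word in line.split():
            yield f"{word.lower()}\t1"

if __name__ == "__main__":
    for pair in mapper(sys.stdin):
        print(pair)
```

Even so, wiring such scripts into job configuration, cluster tuning and downstream business applications is where much of the development effort described above goes.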

Cost is another potential source of Hadoop dissatisfaction. The belief that Hadoop is cheap or even free due to its open source nature does not hold water. Typically, businesses need help with their first few applications and hire a Hadoop vendor or third-party specialist, which becomes expensive. On that note, finding qualified Hadoop programmers and analysts is difficult; job boards list more than 2,000 open Hadoop positions. Due to the high demand for their services, Hadoop developers earn $75,000 to $125,000 base salaries plus various bonuses. So, development costs can be higher than for other systems.

About the author:
Paul Korzeniowski is a freelance writer who specializes in cloud computing issues. He has been covering technology issues for more than two decades and is based in Sudbury, Mass.

Next Steps

Database technology evolution to impact Hadoop?

OpenStack dominating open source conversation


Join the conversation

1 comment



Paul, very informative article. As an alternative to Hadoop, LexisNexis has open sourced the HPCC Systems platform, a complete enterprise-ready solution. Designed by data scientists, it provides a single architecture, a consistent data-centric programming language (ECL), and two data processing clusters. Its built-in analytics libraries for machine learning and BI integration provide a complete integrated solution from data ingestion and data processing to data delivery. This all-in-one platform means only one thing to support, with a significantly lower number of resources required. For a recent blog covering the main differences navigate to