Interest in the open source software framework Hadoop is rising. Rapid growth in raw structured and unstructured data, coupled with demand for big data analytics, is the major driver fueling Hadoop's market success.
Market research firm Research Beam forecasts that the global Hadoop market will skyrocket from $1.5 billion in 2012 to $50 billion in 2020. However, while the platform offers enterprises a compelling case for storing unstructured data, it's not all sunshine and rainbows for Hadoop users.
Hadoop is designed for massive workloads. It scales out across commodity hardware and is generally faster and more cost-effective than conventional tools, such as relational database management systems (RDBMSes). Consequently, Hadoop can keep pace with the exponential increases in data volumes generated by mobile, social media and Internet of Things technologies. The platform can also serve as an alternative or extension to existing proprietary data warehouses when companies have trouble keeping up with rising volumes.
Hadoop is also flexible: it works with both unstructured data and structured information, whereas traditional database management systems were geared toward storing structured data only.
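That flexibility stems from Hadoop's MapReduce model, which imposes no fixed schema on the input: raw records go in, and the developer's map and reduce functions decide what structure to extract. The sketch below is not Hadoop code; it is a minimal pure-Python illustration of the map/shuffle/reduce phases, counting words in free-form log lines.

```python
from collections import defaultdict

# Illustration of the MapReduce model (not Hadoop itself):
# a word count over schema-less text records.

def map_phase(records):
    """Map: emit (key, value) pairs from each raw record."""
    for record in records:
        for word in record.lower().split():
            yield word, 1

def shuffle_phase(pairs):
    """Shuffle: group all values by key, as Hadoop does between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    """Reduce: aggregate each key's values into a final result."""
    return {key: sum(values) for key, values in groups.items()}

if __name__ == "__main__":
    logs = ["error disk full", "warning disk slow", "error network down"]
    counts = reduce_phase(shuffle_phase(map_phase(logs)))
    print(counts["error"], counts["disk"])  # prints: 2 2
```

In a real Hadoop job the map and reduce functions run in parallel across a cluster, and the framework handles the shuffle; the programming model, however, is the same.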
The flip side of the Hadoop coin
As Hadoop becomes more of a force in enterprise cloud applications, it shows its weaknesses as much as its strengths. Hadoop does a good job of processing high volumes of data; however, the programming environment is immature, so building applications requires significant time, effort and investment.
Hadoop does not work well with time-sensitive information; quick response times are not its strength, so it is a poor fit for transactional systems and ad hoc queries. For example, Hadoop would struggle to find a single record in a database of 300 million in 30 milliseconds or less.
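The underlying reason is access pattern: batch engines like Hadoop scan data sequentially, while transactional systems answer point queries through an index. This toy Python sketch (again, not Hadoop code) contrasts the two patterns on a million in-memory records.

```python
import time

# Illustrative sketch: full scan vs. indexed lookup for a point query.
# Batch systems scan; transactional systems use indexes.

records = [{"id": i, "value": i * 2} for i in range(1_000_000)]
index = {r["id"]: r for r in records}  # hash index, akin in spirit to an RDBMS index

def full_scan(target_id):
    """O(n): examine every record -- the batch-processing access pattern."""
    for r in records:
        if r["id"] == target_id:
            return r
    return None

def indexed_lookup(target_id):
    """O(1): jump straight to the record via the index."""
    return index.get(target_id)

start = time.perf_counter()
scanned = full_scan(999_999)
scan_ms = (time.perf_counter() - start) * 1000

start = time.perf_counter()
looked_up = indexed_lookup(999_999)
lookup_ms = (time.perf_counter() - start) * 1000

assert scanned == looked_up
print(f"scan: {scan_ms:.2f} ms, index: {lookup_ms:.4f} ms")
```

On typical hardware the scan takes tens of milliseconds even for this small dataset, while the indexed lookup finishes in microseconds; scale the records to 300 million and the gap makes Hadoop's batch model unsuitable for millisecond-level queries.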
While Hadoop's roots go back to the turn of the millennium, the market only caught up to the new architecture a few years ago. Consequently, its programming ecosystem is not as well developed as that of RDBMSes, which have been in use for decades.
In addition, Hadoop application development work is tedious. Enterprises that use Hadoop to pre-process large volumes of raw data typically must then move the results to a different system for downstream use. Programming tools and links to daily business applications are often hard to find and usually are not very robust. As a result, building applications on the open source system requires significant time and effort.
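The pre-process-then-export pattern described above can be sketched in a few lines. The snippet below is a hedged, hypothetical example (the log format and field names are invented for illustration): a batch step filters raw log lines down to what matters, then serializes the structured result as CSV for loading into a separate downstream system.

```python
import csv
import io

# Hypothetical pre-process-then-export pipeline: a batch step cleans raw
# logs, and the structured output is handed off to another system as CSV.
# The log format below is invented for illustration.

RAW_LOGS = """\
2020-01-01 ERROR payment timeout
2020-01-01 DEBUG cache hit
2020-01-02 ERROR payment declined
"""

def preprocess(raw):
    """Filter and structure raw, unstructured log lines."""
    rows = []
    for line in raw.splitlines():
        date, level, *message = line.split()
        if level == "ERROR":  # keep only what the downstream system needs
            rows.append((date, " ".join(message)))
    return rows

def export_csv(rows):
    """Serialize structured rows for loading into a separate system."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["date", "message"])
    writer.writerows(rows)
    return buf.getvalue()

print(export_csv(preprocess(RAW_LOGS)))
```

In a real deployment the filtering would run as a distributed Hadoop job over terabytes of input, and the export would feed a data warehouse or reporting database; the extra hand-off step is part of what makes Hadoop development tedious.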
Cost is another potential source of Hadoop dissatisfaction. The belief that Hadoop is cheap or even free due to its open source nature does not hold water. Typically, businesses need help with their first few applications and hire a Hadoop vendor or third-party specialist, which becomes expensive. On that note, finding qualified Hadoop programmers and analysts is difficult -- Dice.com lists more than 2,000 open Hadoop positions. Due to the high demand for their services, Hadoop developers earn base salaries of $75,000 to $125,000, plus various bonuses. So, development costs can run higher than for other systems.
About the author:
Paul Korzeniowski is a freelance writer who specializes in cloud computing issues. He has been covering technology issues for more than two decades; is based in Sudbury, Mass.; and can be reached at firstname.lastname@example.org.