Hadoop Summit: Where is the value? Where are the apps?

Jonathan Gray, Founder & CEO of Cask, is an entrepreneur and software engineer with a background in open source and data. Prior to Cask, he was at Facebook working on projects like Facebook Messages. At startup Streamy, Jonathan was an early adopter of Hadoop and HBase committer.

Coming out of Hadoop Summit, one thing is clear to me – while there has been significant growth and success of the ecosystem, it is still early days and Hadoop is still exceptionally hard to consume for most organizations. As a result of this persistent issue, there weren’t many major announcements, nothing exceptionally new or different released, and the buzz remained largely centered on YARN and Spark, both of which are several years old. While we saw reports of early adopting companies seeing real value created with Hadoop, the focus was more technical this year than I anticipated—from the keynotes to the breakout sessions to the show floor, this year’s summit seemed more about the endless variety of different technologies than use cases and actual return on investment realized. A brief overview of a few other trends we observed is below:

Hadoop is not quite enterprise ready…yet

Hadoop Summit generated significant discussion about whether Hadoop is truly ready for real, production enterprise use. Of particular concern is security and related issues of privacy and data policies needed for companies, especially those dealing with customer or financial information. Recent acquisitions of Hadoop security upstarts by the major Hadoop distributions indicate that this will continue to be an important area of focus in the near term.

Hadoop vs. The EDW: To Replace or To Augment

Another hot topic is whether Hadoop is a replacement for the traditional EDW or if it is only to augment and offload certain workloads. In years past, this has been much more of a debate; however this year it seems clear that most have accepted a symbiotic relationship for the time being. While I do expect this to change, it is evident today that there is a significant gap in the capabilities of the Hadoop stack compared to proprietary EDW technologies.

Hadoop is becoming more fragmented

This year it became apparent that the Hadoop ecosystem is splintering into multiple and often competing projects. Competing vendors are establishing parallel but increasingly separate stacks while differentiated vendors are marketing overlapped messages. There has been an explosion in the variety of ways to work with Hadoop and in the number of companies trying to make Hadoop consumable, and it’s becoming even more confusing to choose which path is best to follow. This is true not only for business leaders who are making decisions about Big Data projects in their company but even for knowledgeable developers.

Hadoop (still) needs to be simplified

This mass confusion in the market is undercutting companies’ ability to achieve value and realize what they want from their Big Data initiatives. A lot of attention is still being paid to the infrastructure rather than the applications, so although the disruptive value of Big Data should be at the forefront, it remains elusive for most. The Big Data Application revolution is still forthcoming. It is still early days, Hadoop is still very difficult, and very few people understand how to work with it. That’s why we are building a platform that focuses on making Hadoop easier for developers, allowing anyone to build applications (today in Java) without worrying about the low-level infrastructure. Rather than grapple with myriad technology options, they are free to focus on what matters – turning their brilliant ideas for data into apps that solve real problems. This is where Hadoop can produce desired outcomes – in data applications that quickly provide measurable value.

Adding Jet Fuel to the Fire

Not to be left out of the new choices in the Hadoop menagerie, in case you missed it, we announced a project in collaboration with AT&T Labs: a distributed framework for real-time data processing and analytics applications, codenamed jetStream. Available in open source in Q3 2014, you can find more information about this effort in our recent blog post and at jetStream.io.
Leave a comment