Strata + Hadoop World NYC 2014 Recap: Four Trends in Hadoop

Jonathan Gray, Founder & CEO of Cask, is an entrepreneur and software engineer with a background in open source and data. Prior to Cask, he was at Facebook, where he worked on projects like Facebook Messages. At the startup Streamy, Jonathan was an early adopter of Hadoop and an HBase committer.

The Cask team had a great and productive time at Strata + Hadoop World earlier this month in New York City! We are very optimistic about the robust growth in Hadoop adoption, increased participation from a broad range of developers and companies across many industries, and the continued maturation of this still-young technology. As I mentioned during my talk at the conference, we believe strongly that Hadoop’s underlying infrastructure will continue to recede as the industry’s focus, while data applications that directly solve business problems come to the forefront. Below is a quick overview of four key trends we observed:

Hadoop is becoming an operating system

In his keynote address, Cloudera Chief Strategy Officer Mike Olson proclaimed that we’re going to see Hadoop disappear this year. I agree with Mike’s prediction and wholeheartedly believe that Hadoop is becoming a platform or OS upon which we need to build data applications. While Hadoop is a fundamental layer of a new modern, distributed data infrastructure, it has little value for an organization without apps that turn data into insights and action.

But we’re still talking about the infrastructure

As I’ve said before, a lot of attention is still being paid to the infrastructure rather than the applications, so although the disruptive value of Big Data should be at the forefront, it unfortunately remains elusive for most. This year at Hadoop World was no exception, as the number of companies developing databases and other types of infrastructure continues to increase.

While nearly every Hadoop vendor claims to make Hadoop easy, the ecosystem continues to become more fragmented as new infrastructure systems, frameworks, tools, and projects proliferate. As a community, we are actually making an already complex technology stack more confusing, and we need to do a better job of helping companies and developers wade through all of the nonsense being thrown at them.

Developers are critical to Hadoop’s success 

It is clear that if we want to accelerate the adoption of Hadoop and enable the coming wave of data applications, we must empower the developers who will lead the way. In spite of this, I noticed that developers were largely overlooked at Hadoop World this year. Most of the attention is still on low-level infrastructure and operations or on high-level BI tools and data science. While these are important parts of the equation, they alone will not enable Hadoop to deliver on the application platform vision and address the truly disruptive Big Data use cases. This is why Cask remains singularly focused on developers, enabling them to build the next wave of data apps.

It’s all about open source 

Today’s large enterprises are using Big Data and Hadoop as an opportunity to build a new house within their IT infrastructure. They strive for a Big Data reference architecture that is based on modern, distributed, open source software designed for commodity hardware rather than traditional proprietary technologies. And based on what we saw at the conference, the momentum of pure open source technologies and projects continues to grow, and the feedback from attendees on our move to open source was overwhelmingly positive.

We are working toward universal access to Hadoop for developers because we believe there is so much value to be realized in big data despite the difficulties of working with the underlying technology. This will happen as an open source community and not through a single company.

One more thing: clusters with a click (in the cloud)

In case you missed it among the flurry of vendor announcements at Hadoop World, we recently released Coopr Cloud—a free, cloud-based service that makes it easier than ever for developers to get a complex, distributed Hadoop cluster up and running on their favorite cloud. It also makes it simple to spin up a distributed open source Cask Data Application Platform (CDAP) cloud instance and run the powerful data applications you build—without having to download, install, or configure any software. You can find more info about Coopr Cloud in our recent blog post, or provision a cluster for free today at coo.pr.
