Hadoop Components Versions in Distros Matrix

The Apache Hadoop ecosystem is always evolving, with the major distributions constantly upgrading their included core Hadoop components. This can present a challenge when building any application that runs on top of Hadoop. When developing our open-source application framework, CDAP, we strive to maintain compatibility with all major Hadoop distributions. Building on our previous reference …




Tephra: A Transaction engine for HBase – moves to Apache Incubation!

Cask Data Application Platform (CDAP) simplifies Big Data application development by abstracting many of Hadoop’s complexities and enabling developers to use familiar skills. We found that one of the best ways to simplify distributed programs is to have exactly-once processing semantics. Having exactly-once processing makes it easy to reason about the state of the system …



Powering BI with ODBC Connectors for CDAP

Bhooshan Mogal

Open Database Connectivity (ODBC) is the de facto standard API for accessing data stored in relational databases. ODBC drivers allow applications across a variety of platforms (especially non-Java ones) to access relational databases in a manner independent of the implementation and the operating system. In this blog post we will discuss the integration between CDAP Datasets and Tableau …


Pachyderm and TubeMogul Share Their Big Data Application Platforms and Experience

Russ Savage

It was so great to see everyone at the Big Data Applications Meetup last week! The meetup was sponsored by Cask, the company making big data applications easy, and by Ampool, and we would like to thank Milind Bhandarkar, the Founder and CEO of Ampool, for supporting this event. For those who couldn’t join us, we …


Announcing CDAP Release 3.4: Introducing Tracker, Next-gen Hydrator, Enhanced Spark support, and much more!

Rohit Sinha

I am very happy to announce the general availability of our flagship product, the Cask Data Application Platform (CDAP), version 3.4. This release introduces a fresh new look for Cask Hydrator, along with improvements that extend it beyond data ingestion use cases, such as building aggregations and performing data science on the ingested data. The …




The Next Wave in Big Data – Data Hulk, Introducing Cask Hulk Hydrator

Big Data, Mobile and Cloud are the mega-trends of this decade. In order to realize the ultimate value of data, we at Cask have been conceptualizing and building a unique system that brings all three of these mega-trends together. In the ecosystem of ever-changing technologies of Mobile & Hadoop, you need an internet-grade pipeline management system. At …