Cask Blog

Integrating CDAP with Microsoft Azure HDInsight

We recently announced the integration of CDAP with the Microsoft Azure HDInsight platform. This post will give a behind-the-scenes look at this integration. First, a bit about the integration itself. Azure HDInsight is an Apache Hadoop and Spark distribution powered by the cloud. This means that it handles any amount of data, scaling from terabytes … Read more


Actions in Cask Hydrator

Chris Lu

This summer as an intern at Cask, I had the opportunity to work on Cask Hydrator. Since its launch in 2015, Cask Hydrator has been a broadly used and important application on CDAP to help users easily build and run big data pipelines. I helped evolve Hydrator further by adding the Action function to it. … Read more



Securely storing sensitive information and using it in CDAP

Nishith Nand

The need for Secure Store: CDAP and Hadoop cluster admins often need to store sensitive information, such as passwords, access tokens, and private keys, which is used to access resources during the course of cluster operation. This data needs to be stored in a way that it is protected from unauthorized access but is still available to … Read more


Cask Tracker Enhanced: Metadata Taxonomy and Data Usage Analytics in CDAP 3.5

Yue Gao and Riwaz Poudyal

Cask Tracker is a self-service CDAP Extension that automatically captures rich metadata and provides users with visibility into how data is flowing into, out of, and within a Data Lake. Tracker was first introduced in CDAP v3.4. Tracker v0.2 has just been released along with CDAP 3.5 and packs a ton of new features. Dataset … Read more


CDAP 3.5 – Enterprise Security, Drag-and-Drop Spark Streaming, and much more!

Sagar Kapare

I am very excited to announce the release of Cask Data Application Platform (CDAP) version 3.5. The focus for CDAP 3.5 is security, with a number of significant new capabilities added to the platform, in addition to major improvements to the Extensions, Cask Hydrator and Cask Tracker. CDAP 3.5 introduces authorization to the platform with … Read more





Tephra: A Transaction engine for HBase – moves to Apache Incubation!

Cask Data Application Platform (CDAP) simplifies Big Data application development by abstracting many of Hadoop’s complexities and enabling developers to use familiar skills. We found that one of the best ways to simplify distributed programs is to have exactly-once processing semantics. Having exactly-once processing makes it easy to reason about the state of the system … Read more