Cask Blog


Securely storing sensitive information and using it in CDAP

Nishith Nand

The need for Secure Store: CDAP and Hadoop cluster admins often need to store sensitive information, such as passwords, access tokens, and private keys, that is used to access resources while the cluster is operating. This data must be stored so that it is protected from unauthorized access but is still available to …


Cask Tracker Enhanced: Metadata Taxonomy and Data Usage Analytics in CDAP 3.5

Yue Gao and Riwaz Poudyal

Cask Tracker is a self-service CDAP Extension that automatically captures rich metadata and gives users visibility into how data flows into, out of, and within a Data Lake. Tracker was first introduced in CDAP v3.4. Tracker v0.2 has just been released alongside CDAP 3.5 and packs a ton of new features. Dataset …


CDAP 3.5 – Enterprise Security, Drag-and-Drop Spark Streaming, and much more!

Sagar Kapare

I am very excited to announce the release of Cask Data Application Platform (CDAP) version 3.5. The focus of CDAP 3.5 is security, with a number of significant new capabilities added to the platform, along with major improvements to the CDAP Extensions, Cask Hydrator and Cask Tracker. CDAP 3.5 introduces authorization to the platform with …


Gamifying product testing – learn how Cask did it

Last week, Cask was especially abuzz with activity. Why, you ask? Over the last couple of months, the engineering team at Cask had been focused on building a brand new version of our flagship product, the Cask Data Application Platform (CDAP) version 3.5, which will include many engaging and useful new features like enterprise-grade …


CDAP – Taking Spark Apps from Prototype to Production

Apache Spark™ is a general-purpose data processing framework that is gaining popularity because of its fast data model and its more flexible execution engine compared to MapReduce. In fact, Spark is becoming an essential technology for data analytics, as is evident from the fact that all of the top three Hadoop distributions are …


Tephra: A Transaction engine for HBase – moves to the Apache Incubator!

Cask Data Application Platform (CDAP) simplifies Big Data application development by abstracting many of Hadoop’s complexities and enabling developers to use familiar skills. We found that one of the best ways to simplify distributed programs is to have exactly-once processing semantics. Having exactly-once processing makes it easy to reason about the state of the system …


Powering BI with ODBC Connectors for CDAP

Bhooshan Mogal

Open Database Connectivity (ODBC) is the de facto standard API for accessing data stored in relational databases. ODBC drivers allow applications on a variety of platforms (especially non-Java ones) to access relational databases independently of the database implementation and the operating system. In this blog post, we discuss the integration between CDAP Datasets and Tableau …