Cask Blog

Cask Tracker Enhanced: Metadata Taxonomy and Data Usage Analytics in CDAP 3.5

Yue Gao and Riwaz Poudyal

Cask Tracker is a self-service CDAP Extension that automatically captures rich metadata and provides users with visibility into how data is flowing into, out of, and within a Data Lake. Tracker was first introduced in CDAP v3.4. Tracker v0.2 has just been released along with CDAP 3.5 and packs a ton of new features. Dataset …



The Next Wave in Big Data – Data Hulk, Introducing Cask Hulk Hydrator

Big Data, Mobile and Cloud are the mega-trends of this decade. To realize the ultimate value of data, we at Cask have been conceptualizing and building a unique system that brings all three of these mega-trends together. In the ever-changing technology ecosystem of Mobile and Hadoop, you need an internet-grade pipeline management system. At …



Running Legacy MapReduce Jobs in CDAP

Rohit Sinha

The Cask Data Application Platform is an integrated developer platform for the Hadoop ecosystem. With CDAP, developers can address a broader set of batch and real-time use cases with easy-to-use abstractions. Developers can write MapReduce programs using CDAP and easily deploy them as CDAP applications, as explained in this guide. Running MapReduce programs inside CDAP has …


CDAP Services for Apache Ambari

Cask is excited to announce easy CDAP integration for Apache Ambari users. Previously, we introduced CDAP's integration with Cloudera Manager. This post walks you through integrating CDAP with Apache Ambari, the open source provisioning system for HDP (Hortonworks Data Platform).

Adding the CDAP service to Ambari

To install CDAP on a cluster managed by …


CDAP Workflows: A Closer Look

Sagar Kapare

The Cask Data Application Platform (CDAP) is an open-source platform to build and deploy data applications on Apache Hadoop™. In a previous blog post, we introduced Workflows, a core component of CDAP, in comparison with Apache Oozie. In this post, we will discuss the CDAP Workflow engine in greater detail. CDAP Workflows are used to …


Announcing CDAP 3.2 – Hydrator and much more!

Bhooshan Mogal

We are excited to announce the Cask Data Application Platform (CDAP) 3.2 release. This release brings many enhancements to existing CDAP features and lays the foundation for upcoming advanced features, all designed to further simplify data application development.

Cask Hydrator

CDAP 3.2 introduces Cask Hydrator, a highly functional framework and UI to support self-service batch …


CDAP Workflows: In Comparison with Apache Oozie

Bhooshan Mogal

Apache Oozie is a workflow scheduler system to manage Apache Hadoop™ jobs. It is one of the most popular open-source workflow scheduler systems for Hadoop. The Cask Data Application Platform (CDAP) is an open-source platform to build and deploy data applications on Hadoop. CDAP provides abstractions on top of Hadoop that enable developers to rapidly build, …


AeroCask – Real-time Flight Data Analytics using CDAP

One of the many things I love about Cask is the hackathon we hold before every release. It is not only a way for us to dogfood new features in the CDAP platform, but also an opportunity to let our imagination run loose and implement an integration with another system, or develop an interesting …