Cask Blog

CDAP Workflows: A closer look

Sagar Kapare

The Cask Data Application Platform (CDAP) is an open-source platform to build and deploy data applications on Apache Hadoop™. In a previous blog post, we introduced Workflows, a core component of CDAP, in comparison with Apache Oozie. In this post, we will discuss the CDAP Workflow engine in greater detail. CDAP Workflows are used to …



A Data Quality Application Template for CDAP

Shilpa Subrahmanyam

One of Cask’s core goals is making a reasonably experienced Java developer’s life much easier when building Hadoop applications. My summer project was aligned with the company’s effort to take this to the next level by lowering the barrier to entry for using Hadoop even further: Java proficiency not required. I spent my summer writing …


Caskalytics: Multi-log Analytics Application

Derek Tzeng & Jay Jin

This summer, we joined Cask as interns to work on the Cask Data Application Platform (CDAP). Our project, internally codenamed Caskalytics, was creating an internal Operational Data Lake. In reality, a data lake is just a concept focused on storing data from disparate sources (real-time or batch, structured, unstructured or semi-structured) in a single big data …


CDAP Workflows: In Comparison with Apache Oozie

Bhooshan Mogal

Apache Oozie is a workflow scheduler system to manage Apache Hadoop™ jobs. It is one of the most popular open-source workflow scheduler systems for Hadoop. Cask Data Application Platform (CDAP) is an open-source platform to build and deploy data applications on Hadoop. CDAP provides abstractions on top of Hadoop that enable developers to rapidly build, …