Cask Blog

Combining Hadoop and Spark in a Data Processing Pipeline

Tony Duarte

CDAP includes an Application Development Framework so that developers can build entire applications with existing Big Data technologies such as Apache Hadoop, Apache Spark, Apache HBase, Apache Hive, and more. CDAP has been used by Fortune 50 customers to help them do data ingestion and data egress from their data lakes and to help them … Read more


Monitoring Key Hadoop Operational Statistics using CDAP

Bhooshan Mogal

The Cask Data Application Platform (CDAP) is the first Unified Integration Platform for Big Data. It provides users with higher-level abstractions and APIs over complex, low-level systems for building Big Data applications. It does the heavy lifting involved in integrating various platforms in the Apache Hadoop ecosystem to provide a single end-to-end platform. To … Read more


Cask Tracker Enhanced: Metadata Taxonomy and Data Usage Analytics in CDAP 3.5

Yue Gao and Riwaz Poudyal

Cask Tracker is a self-service CDAP Extension that automatically captures rich metadata and provides users with visibility into how data is flowing into, out of, and within a Data Lake. Tracker was first introduced in CDAP v3.4. Tracker v0.2 has just been released along with CDAP 3.5 and packs a ton of new features. Dataset … Read more



The Next Wave in Big Data – Data Hulk, Introducing Cask Hydrator

Big Data, Mobile and Cloud are the mega-trends of this decade. In order to realize the ultimate value of data, we at Cask have been conceptualizing and building a unique system that brings all three of these mega-trends together. In the ecosystem of ever-changing technologies of Mobile & Hadoop, you need an internet-grade pipeline management system. At … Read more



Running Legacy MapReduce Jobs in CDAP

Rohit Sinha

The Cask Data Application Platform is an integrated developer platform for the Hadoop ecosystem. With CDAP, developers can address a broader set of batch and real-time use-cases with easy-to-use abstractions. Developers can write MapReduce programs using CDAP and deploy them as CDAP applications easily, as explained in this guide. Running MapReduce programs inside CDAP has … Read more


CDAP Services for Apache Ambari

Cask is excited to announce easy CDAP integration for Apache Ambari users. Previously, we introduced you to integration with Cloudera Manager. This post will familiarize you with integration with Apache Ambari, the open source provisioning system for HDP (Hortonworks Data Platform). Adding the CDAP service to Ambari: To install CDAP on a cluster managed by … Read more


CDAP Workflows: A closer look

Sagar Kapare

The Cask Data Application Platform (CDAP) is an open-source platform to build and deploy data applications on Apache Hadoop™. In a previous blog post we introduced Workflows, a core component of CDAP, in comparison with Apache Oozie. In this post we will discuss the CDAP Workflow engine in greater detail. CDAP Workflows are used to … Read more


Announcing CDAP 3.2 – Hydrator and much more!

Bhooshan Mogal

We are excited to announce the Cask Data Application Platform (CDAP) 3.2 release. This release brings many enhancements to existing CDAP features and lays the foundation for upcoming, advanced features, all designed to further simplify data application development. Cask Hydrator: CDAP 3.2 introduces Cask Hydrator, a highly functional framework and UI to support self-service batch … Read more