Cask Blog

Combining Hadoop and Spark in a Data Processing Pipeline

Tony Duarte

CDAP includes an Application Development Framework so that developers can build entire applications with existing Big Data technologies such as Apache Hadoop, Apache Spark, Apache HBase, Apache Hive and more. CDAP has been used by Fortune 50 customers to help them do Data Ingestion and Data Egress from their data lakes and to help them … Read more


CDAP 4.1 – More Enterprise-Grade Hardening, Pre-Built Solutions and Enhanced UX

Nishith Nand

We are happy to announce the release of Cask Data Application Platform (CDAP) version 4.1. This new release brings with it some major enhancements and significant new capabilities in the platform, as well as new, ready-to-use solutions offered via Cask Market. CDAP 4.1 improves security by allowing fine-grained secure impersonation. It introduces replication so … Read more



Monitoring Key Hadoop Operational Statistics using CDAP

Bhooshan Mogal

The Cask Data Application Platform (CDAP) is the first Unified Integration Platform for Big Data. It provides users with higher-level abstractions and APIs over complex, low-level systems for building Big Data applications. It does the heavy lifting involved in integrating various platforms in the Apache Hadoop ecosystem to provide a single end-to-end platform. To … Read more


CDAP 4 – Introducing Cask’s Big Data App Store, Cask Market, plus Cask Wrangler, a new UI and more

Vinisha Vyasa

We are very happy to introduce the general availability of the 4th generation of Cask’s flagship product – CDAP 4. This release builds on what we learned over the past few years from our users and the community. This post summarizes the major enhancements in CDAP 4, namely, New & Revamped User Experience, Cask’s “Big … Read more


Integrating CDAP with Microsoft Azure HDInsight

We recently announced the integration of CDAP with the Microsoft Azure HDInsight platform. This post will give a behind-the-scenes look at this integration. First, a bit about the integration itself. Azure HDInsight is an Apache Hadoop and Spark distribution powered by the cloud. This means that it handles any amount of data, scaling from terabytes … Read more


Actions in Cask Hydrator

Chris Lu

This summer, as an intern at Cask, I had the opportunity to work on Cask Hydrator. Since its launch in 2015, Cask Hydrator has been a widely used and important application on CDAP that helps users easily build and run big data pipelines. I helped evolve Hydrator further by adding the Action function to it. … Read more



Securely storing sensitive information and using it in CDAP

Nishith Nand

The need for Secure Store: CDAP and Hadoop cluster admins often need to store sensitive information such as passwords, access tokens, and private keys, which is used to access resources during the course of cluster operation. This data needs to be stored in a way that it is protected from unauthorized access but is still available to … Read more


Cask Tracker Enhanced: Metadata Taxonomy and Data Usage Analytics in CDAP 3.5

Yue Gao and Riwaz Poudyal

Cask Tracker is a self-service CDAP Extension that automatically captures rich metadata and provides users with visibility into how data is flowing into, out of, and within a Data Lake. Tracker was first introduced in CDAP v3.4. Tracker v0.2 has just been released along with CDAP 3.5 and packs a ton of new features. Dataset … Read more