Cask Blog

Stream Views in CDAP

Alvin Wang

In a previous blog post, we outlined how schema-on-read works with streams. Schema-on-read allows users to decouple data ingestion from exploration. In this post, we will see how users can attach multiple views to the same stream using a feature called stream views. Stream views provide a way to read from the same stream …


CDAP Services for Apache Ambari

Cask is excited to announce easy CDAP integration for Apache Ambari users. Previously, we introduced you to integration with Cloudera Manager. This post will familiarize you with integration with Apache Ambari, the open-source provisioning system for HDP (Hortonworks Data Platform). Adding the CDAP service to Ambari: To install CDAP on a cluster managed by …


Multiple Outputs in CDAP

Ali Anwar

In CDAP, a MapReduce program can interact with a CDAP dataset by using it as an input or an output. Before CDAP 3.2.0, users could have only a single dataset as the output of a MapReduce job. We wanted to extend this capability and allow writing to multiple datasets from MapReduce jobs to support the following …


CDAP Workflows: A closer look

Sagar Kapare

The Cask Data Application Platform (CDAP) is an open-source platform to build and deploy data applications on Apache Hadoop™. In a previous blog post, we introduced Workflows, a core component of CDAP, in comparison with Apache Oozie. In this post, we will discuss the CDAP Workflow engine in greater detail. CDAP Workflows are used to …


Fluentd & Docker, Turbocharging CDAP Apps, and Building a Data Science Platform — all of this next week!

After two very successful Big Data Apps meetups, Cask will be hosting the third Big Data Application meetup on October 14, 2015 at Cask HQ in Palo Alto. This meetup includes an exciting lineup of talks: The first talk will be about Integrating Fluentd and Docker. John Hammink from Treasure Data will introduce us to …