A Hydrator Python Transform for Python nerds like you and me!

John Jackson

Before every CDAP release, we at Cask conduct an internal hackathon to use CDAP and work on interesting features. A few Cask engineers got together and, wanting to open up the capabilities of Cask Hydrator beyond Java developers, decided to build a transformation that uses user-written Python. Beginning with CDAP release 3.2, the CDAP UI … Read more

Running Legacy MapReduce Jobs in CDAP

Rohit Sinha

The Cask Data Application Platform is an integrated developer platform for the Hadoop ecosystem. With CDAP, developers can address a broader set of batch and real-time use-cases with easy-to-use abstractions. Developers can write MapReduce programs using CDAP and deploy them as CDAP applications easily, as explained in this guide. Running MapReduce programs inside CDAP has … Read more

Stream Views in CDAP

Alvin Wang

In a previous blog post, we outlined how schema-on-read works with streams. Schema-on-read allows users to decouple data ingestion from exploration. In this post, we will see how users can attach multiple views to the same stream using a feature called stream views. Stream views provide a way to read from the same stream … Read more

CDAP Services for Apache Ambari

Cask is excited to announce easy CDAP integration for Apache Ambari users. Previously, we introduced you to integration with Cloudera Manager. This post will familiarize you with integration with Apache Ambari, the open source provisioning system for HDP (Hortonworks Data Platform). Adding the CDAP service to Ambari: To install CDAP on a cluster managed by … Read more

Multiple Outputs in CDAP

Ali Anwar

In CDAP, a MapReduce program can interact with a CDAP dataset by using it as an input or an output. Before CDAP 3.2.0, users could have only a single dataset as the output of a MapReduce job. We wanted to extend this capability and allow writing to multiple datasets from MapReduce jobs to support the following … Read more

Fluentd & Docker, Turbocharging CDAP Apps, and Building a Data Science Platform — all of this next week!

After two very successful Big Data Apps meetups, Cask will be hosting the third Big Data Application meetup on October 14, 2015 at Cask HQ in Palo Alto. This meetup includes an exciting lineup of talks: The first talk will be about Integrating Fluentd and Docker. John Hammink from Treasure Data will introduce us to … Read more

SockJS + $resource = Awesomeness!

Ajai Narayanan

The Cask Data Application Platform (CDAP) is an open-source platform to build and deploy data applications on Apache Hadoop™. As of version 3.0, it includes a slick new user interface to help users deploy, manage and monitor their data applications. This UI provides real-time updates from the CDAP backend. Problem Statement: Initiating too many HTTP … Read more