SockJS + $resource = Awesomeness!

Ajai Narayanan

The Cask Data Application Platform (CDAP) is an open-source platform to build and deploy data applications on Apache Hadoop™. As of version 3.0, it includes a slick new user interface to help users deploy, manage and monitor their data applications. This UI provides real-time updates from the CDAP backend. Problem Statement: Initiating too many HTTP … Read more

Announcing CDAP 3.2 – Hydrator and much more!

Bhooshan Mogal

We are excited to announce the Cask Data Application Platform (CDAP) 3.2 release. This release brings many enhancements to existing CDAP features and lays the foundation for upcoming advanced features, all designed to further simplify data application development. Cask Hydrator: CDAP 3.2 introduces Cask Hydrator, a highly functional framework and UI to support self-service batch … Read more

A Data Quality Application Template for CDAP

Shilpa Subrahmanyam

One of Cask’s core goals is making a reasonably experienced Java developer’s life much easier when building Hadoop applications. My summer project was aligned with the company’s effort to take this to the next level by lowering the barrier to entry for using Hadoop even further: Java proficiency not required. I spent my summer writing … Read more

Learning CDAP with Elasticsearch

Ashley Taylor

Elasticsearch is a popular search engine based on Apache Lucene™. Unlike relational databases, Elasticsearch stores information in documents; each document has a type (with a mapping) that gives information about its schema, and similar documents are stored together in an index. Elasticsearch even allows time-based indices, so documents can be stored with other records created … Read more

Caskalytics: Multi-log Analytics Application

This summer, we joined Cask as interns to work on the Cask Data Application Platform (CDAP). Our project, internally codenamed Caskalytics, was to create an internal Operational Data Lake. In reality, a data lake is just a concept focused on storing data from disparate sources (real-time or batch, structured, unstructured or semi-structured) in a single big data … Read more