Learning CDAP with Elasticsearch

Ashley Taylor

Elasticsearch is a popular search engine based on Apache Lucene™. Unlike relational databases, Elasticsearch stores information in documents; each document has a type (with a mapping) that gives information about its schema, and similar documents are stored together in an index. Elasticsearch even allows time-based indices, so documents can be stored with other records created … Read more



Caskalytics: Multi-log Analytics Application

This summer, we joined Cask as interns to work on the Cask Data Application Platform (CDAP). Our project, internally codenamed Caskalytics, was to create an internal Operational Data Lake. In reality, a data lake is just a concept focused on storing data from disparate sources (real-time or batch; structured, unstructured, or semi-structured) in a single big data … Read more


Join us for the 2nd Big Data Application Meetup

Henry Saputra

Cask is proud to host the second Big Data Application Meetup on August 19, 2015 at Cask HQ in Palo Alto. By sponsoring and promoting knowledge-sharing and community-building through the Big Data Application Meetup, Cask continues to take the lead in promoting technologies and best practices used to build big data applications. For the second meetup, we have … Read more



Java Class Loading and Distributed Data Processing Frameworks

Java class loading is one of the most fundamental and powerful concepts provided by the Java Platform. Understanding the class loading mechanism helps you when designing and building extensible application frameworks. You can also avoid spending many hours debugging exceptions such as ClassCastException and ClassNotFoundException, among others. In this post, we will talk about … Read more


AeroCask – Real-time Flight Data Analytics using CDAP

One of the many things that I love about Cask is the hackathons before every release. It is not only a way for us to dog-food new features in the CDAP platform, but also an opportunity to let your imagination run loose and implement an integration with another system; or develop an interesting … Read more


CDAP 3.1 adds MapR support, Spark integration, enhanced Datasets and much more!

Shankar Selvam

We are excited to announce the release of the Cask Data Application Platform (CDAP) v3.1.0. In this release, we have added support for MapR, which gives users more distro choice when using CDAP. Furthermore, this release expands our footprint to support CDH 5.4, HDP 2.2 and Apache Hadoop with HBase 1.0 and Hive 1.1. … Read more



A Look at Automating Cluster Creation in the Cloud with Coopr

David Bajot

Coopr is a cluster provisioning system designed to fully facilitate cluster lifecycle management in public and private clouds. In this blog, we will take an inside look at what happens when Coopr provisions a cluster. Deploying clusters can be time-consuming. For many system deployments, this work can be accomplished with a configuration management tool such … Read more