Cask Blog

Cask Tracker Enhanced: Metadata Taxonomy and Data Usage Analytics in CDAP 3.5

Yue Gao and Riwaz Poudyal

Cask Tracker is a self-service CDAP Extension that automatically captures rich metadata and provides users with visibility into how data is flowing into, out of, and within a Data Lake. Tracker was first introduced in CDAP v3.4. Tracker v0.2 has just been released along with CDAP 3.5 and packs a ton of new features. Dataset … Read more


Pachyderm and TubeMogul Share Their Big Data Application Platforms and Experience

Russ Savage

It was so great to see everyone at the Big Data Applications Meetup last week! The meetup was sponsored by Cask, the company making big data applications easy, and by Ampool, and we would like to thank Milind Bhandarkar, the Founder and CEO of Ampool, for supporting this event. For those who couldn’t join us, we … Read more




Fluentd & Docker, Turbocharging CDAP Apps, and Building a Data Science Platform — all of this next week!

After two very successful Big Data Apps meetups, Cask will be hosting the third Big Data Application meetup on October 14, 2015 at Cask HQ in Palo Alto. This meetup includes an exciting lineup of talks: The first talk will be about Integrating Fluentd and Docker. John Hammink from Treasure Data will introduce us to … Read more


Learning CDAP with Elasticsearch

Ashley Taylor

Elasticsearch is a popular search engine based on Apache Lucene™. Unlike relational databases, Elasticsearch stores information in documents; each document has a type (with a mapping) that gives information about its schema, and similar documents are stored together in an index. Elasticsearch even allows time-based indices, so documents can be stored with other records created … Read more


Join us for the 2nd Big Data Application Meetup

Henry Saputra

Cask is proud to host the second Big Data Application Meetup on August 19, 2015 at Cask HQ in Palo Alto. By sponsoring and promoting knowledge-sharing and community-building through the Big Data Application Meetup, Cask continues to take the lead in promoting technologies and best practices used to build big data applications. For the second meetup, we have … Read more


A Look at Automating Cluster Creation in the Cloud with Coopr

David Bajot

Coopr is a cluster provisioning system designed to fully facilitate cluster lifecycle management in public and private clouds. In this blog, we will take an inside look at what happens when Coopr provisions a cluster. Deploying clusters can be time-consuming. For many system deployments, this work can be accomplished with a configuration management tool such … Read more


Join us for the new Big Data Application Meetup

Cask is determined to help big data application developers on their journey of building and deploying Hadoop solutions. We’re happy to announce a new meetup for the developer community—the Big Data Application Meetup—a group for everyone interested in building applications using Apache Hadoop™ and other open-source, big data technologies. Meetup topics will be focused on … Read more


Deploying CDAP packages from source via Coopr

Developing features for CDAP follows a workflow similar to that of many other projects. Developers have their local checkout of the source, make modifications in a feature branch, build and test locally on their development machines, push their branch, and submit a pull request for code review. During this process, developers build CDAP clusters (for testing) … Read more