Cask Blog

Cask Tracker Enhanced: Metadata Taxonomy and Data Usage Analytics in CDAP 3.5

Yue Gao and Riwaz Poudyal

Cask Tracker is a self-service CDAP Extension that automatically captures rich metadata and provides users with visibility into how data is flowing into, out of, and within a Data Lake. Tracker was first introduced in CDAP v3.4. Tracker v0.2 has just been released along with CDAP 3.5 and packs a ton of new features. Dataset …


A Look at Automating Cluster Creation in the Cloud with Coopr

David Bajot

Coopr is a cluster provisioning system designed to fully facilitate cluster lifecycle management in public and private clouds. In this blog post, we will take an inside look at what happens when Coopr provisions a cluster. Deploying clusters can be time-consuming. For many system deployments, this work can be accomplished with a configuration management tool such …


Hadoop Components Versions in Distros Matrix

The Apache Hadoop ecosystem is always evolving, with the major distributions constantly upgrading their included core Hadoop components. This can present a challenge when building any application that runs on top of Hadoop. When developing our open-source application framework, CDAP, we strive to maintain compatibility with all major Hadoop distributions. Building on our previous reference …


Efficient Use of Hadoop Cluster with YARN Capacity Scheduler

As organizations see an increase in Hadoop adoption, there is a spike in both the number of jobs that are run on a Hadoop cluster and the number of tenants utilizing the cluster. Effectively utilizing a Hadoop cluster becomes important from an administration perspective. Consolidating data and allowing multiple tenants to share a …


Continuuity Loom 0.9.7: Extensible cluster management

Please note: Continuuity is now known as Cask, and Continuuity Loom is now known as Coopr. In March, we open sourced Continuuity Loom, a system for templatizing and materializing complex multi-tiered application reference architectures in public or private clouds. It is designed bottom-up to support different facets of your organization – from developers, operations …


Running Presto over Apache Twill

Alvin Wang

Please note: Continuuity is now known as Cask, and Continuuity Reactor is now known as the Cask Data Application Platform (CDAP). We open-sourced Apache Twill with the goal of enabling developers to easily harness the power of YARN using a simple programming framework and reusable components for building distributed applications. Twill hides the complexity of …


What do you do at Continuuity, again? Part 2

Please note: Continuuity is now known as Cask, and Continuuity Reactor is now known as the Cask Data Application Platform (CDAP). In our previous post, we introduced the basics of our Continuuity platform by using a simple example of a real-time processing application. In this post, we’ll take a step forward and introduce you to …