Cask Blog

Multitenancy for Hadoop: Namespaces – Part II

Bhooshan Mogal

In a previous blog post, we introduced the concept of namespaces and how they help bring multitenancy to Apache Hadoop. We also briefly introduced the use of namespaces in CDAP, leaving out the implementation details. In this post we’ll discuss some of the requirements that influenced the design of namespaces in CDAP, as well as … Read more


Hadoop Components Versions in Distros Matrix

The Apache Hadoop ecosystem is always evolving, with the major distributions constantly upgrading their included core Hadoop components. This can present a challenge when building any application which runs on top of Hadoop. When developing our open-source application framework, CDAP, we strive to maintain compatibility with all major Hadoop distributions. Building on our previous reference … Read more


CDAP 3.0 – From Zero to App in 5 minutes

The Cask Data Application Platform (CDAP) was created with the intent of empowering all developers to build data applications. It was, is, and always will be a developer platform: a platform with the mission of giving developers simple access to powerful technology. CDAP has proven to significantly lower the barriers to building Hadoop … Read more



Countdown to HBaseCon 2015

Conference season is in full swing, and we at Cask could not be more excited about the upcoming HBaseCon 2015 on May 7th in San Francisco! This is the fourth annual Apache HBase™ community conference, and a chance for us to share the latest developments from our open source Cask Data Application Platform (CDAP). While … Read more


Efficient Use of Hadoop Cluster with YARN Capacity Scheduler

As organizations see an increase in Hadoop adoption, there is a spike in both the number of jobs run on a Hadoop cluster and the number of tenants using it. Using a Hadoop cluster effectively becomes important from an administration perspective. Consolidating data and allowing multiple tenants to share a … Read more


Deploying CDAP packages from source via Coopr

Developing features for CDAP follows a workflow similar to that of many other projects. Developers keep a local checkout of the source, make modifications in a feature branch, build and test locally on their development machines, push their branch, and submit a pull request for code review. During this process, developers build CDAP clusters (for testing) … Read more



Hadoop Vendor OS Support Matrix

Developing our open source data application platform, CDAP, which runs on top of Apache™ Hadoop®, can be a challenging task. It requires testing many different configurations, across multiple Hadoop vendors and many different distributions of Linux. Setting up and testing all of these configurations can be extremely difficult without a simple reference of supported Linux distributions … Read more


Multitenancy for Hadoop: Namespaces

Bhooshan Mogal

Hadoop’s popularity as a data processing platform is often attributed to its cost-effectiveness, which derives equally from the use of commodity hardware and from the ability to co-locate work on shared compute and storage resources. Sharing resources allows organizations to maximize the throughput and utilization of a small number of large clusters instead of managing a large … Read more