Cask Blog

Deploying CDAP packages from source via Coopr

chrisg

Developing features for CDAP follows a workflow similar to that of many other projects. Developers have a local checkout of the source, make modifications in a feature branch, build and test locally on their development machines, push their branch, and submit a pull request for code review. During this process, developers build CDAP clusters (for testing) … Read more



Hadoop Vendor OS Support Matrix

chrisg

Developing our open source data application platform, CDAP, which runs on top of Apache™ Hadoop®, can be a challenging task. It requires testing many different configurations, on multiple Hadoop vendors and on many different Linux distributions. Setting up and testing all of these configurations can be extremely difficult without a simple reference of supported Linux distributions … Read more


Multitenancy for Hadoop: Namespaces

bhooshan

As a data processing platform, Hadoop's popularity today is often attributed to its cost-effectiveness, derived equally from the use of commodity hardware and from the ability to co-locate work on shared compute and storage resources. Sharing resources allows organizations to maximize the throughput and utilization of a small number of large clusters instead of managing a large … Read more


Data-driven job scheduling in Hadoop

Julien Guery

Triggering the processing of data in Hadoop as soon as enough new data is available helps optimize many incremental data processing use cases, but it is not trivial to implement. The ability to schedule a job (such as MapReduce or Spark) to run as soon as there’s a certain amount of unprocessed data available—for instance, in a set of … Read more
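
To make the idea concrete, here is a minimal, hypothetical sketch (not CDAP's actual scheduler API) of a data-driven trigger: a poller checks how much unprocessed data has accumulated and launches a job once a size threshold is crossed. The UnprocessedDataSource and JobLauncher interfaces, the polling interval, and the threshold are assumptions for illustration.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

/** Hypothetical sketch of data-driven scheduling: run a job once enough new data has arrived. */
public class DataTriggeredScheduler {

  /** Reports how many bytes have arrived since the last processed offset (assumed interface). */
  public interface UnprocessedDataSource {
    long unprocessedBytes();
    void markProcessed();
  }

  /** Launches the actual MapReduce/Spark job (assumed interface). */
  public interface JobLauncher {
    void launch();
  }

  private final ScheduledExecutorService poller = Executors.newSingleThreadScheduledExecutor();

  public void start(UnprocessedDataSource source, JobLauncher launcher, long thresholdBytes) {
    // Poll periodically instead of running on a fixed time schedule; trigger the job
    // only when the amount of unprocessed data crosses the threshold.
    poller.scheduleAtFixedRate(() -> {
      if (source.unprocessedBytes() >= thresholdBytes) {
        launcher.launch();
        source.markProcessed();
      }
    }, 0, 30, TimeUnit.SECONDS);
  }

  public void stop() {
    poller.shutdownNow();
  }
}
```

Polling is the simplest form of such a trigger; a push-based notification from the storage layer avoids the polling interval entirely.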



How we built it: designing a globally consistent transaction engine

alexb

At Cask, we are committed to contributing back to the open source community. One of our latest open-sourced projects is Tephra, a system that adds complete transaction support to Apache HBase™. As an XA-style transaction system, Tephra is designed to be agnostic to the underlying data stores, so its usage is not limited to HBase. … Read more
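
For a flavor of what client code looks like, here is a minimal sketch of a transactional HBase write along the lines of Tephra's documented usage; exact package and class names vary by Tephra and HBase version, and obtaining the TransactionSystemClient (normally wired up via Tephra's Guice modules) is assumed to happen elsewhere.

```java
// Package names are version-specific assumptions (Cask-era Tephra used co.cask.tephra;
// the HBase compatibility module depends on the HBase release in use).
import co.cask.tephra.TransactionContext;
import co.cask.tephra.TransactionFailureException;
import co.cask.tephra.TransactionSystemClient;
import co.cask.tephra.hbase.TransactionAwareHTable;
import org.apache.hadoop.hbase.client.HTableInterface;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

public class TransactionalWrite {

  /**
   * Wraps an HBase table so the put only becomes visible if the transaction commits.
   * The txClient is assumed to be provided by the caller.
   */
  public static void writeTransactionally(TransactionSystemClient txClient, HTableInterface hTable)
      throws Exception {
    TransactionAwareHTable txTable = new TransactionAwareHTable(hTable);
    TransactionContext txContext = new TransactionContext(txClient, txTable);

    txContext.start();
    try {
      Put put = new Put(Bytes.toBytes("row1"));
      put.add(Bytes.toBytes("f"), Bytes.toBytes("col"), Bytes.toBytes("value"));
      txTable.put(put);
      // Commits the transaction; fails (and rolls back) if a conflict is detected.
      txContext.finish();
    } catch (TransactionFailureException e) {
      txContext.abort();
      throw e;
    }
  }
}
```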