Introducing Application Templates for CDAP

Application Templates are the major new feature added in CDAP 3.0. In this blog post we introduce what they are and the problems they solve. While building applications, we noticed that CDAP users would sometimes end up deploying multiple applications that all solved the same type of problem. Their code was mostly the same; …

Metrics System for Data Application Platform

Collecting metrics and providing access to them is a must-have for any application platform, and it is even more important for distributed systems. In this post we will examine some aspects of designing metrics systems for a distributed application platform and take a brief look at one built for the open …

Efficient Use of Hadoop Cluster with YARN Capacity Scheduler

As organizations see an increase in Hadoop adoption, there is a spike in both the number of jobs run on a Hadoop cluster and the number of tenants utilizing the cluster. Effectively utilizing a Hadoop cluster becomes important from an administration perspective. Consolidating data and allowing multiple tenants to share a …

Deploying CDAP packages from source via Coopr

Developing features for CDAP follows a workflow similar to that of many projects. Developers have their local checkout of the source, make modifications in a feature branch, build and test locally on their development machines, push their branch, and submit a pull request for code review. During this process, developers build CDAP clusters (for testing) …

Weblog Analytics on Apache Hadoop™

Hadoop provides specialized tools and technologies that can be used for transporting and processing huge amounts of weblog data. In this blog post, we’ll explore the end-to-end process of aggregating logs, processing them, and generating analytics on Hadoop to gain insights about how users interact with your website. With the digitization of the world, generating knowledge …