Introducing Application Templates for CDAP

Albert Shau is a software engineer at Cask, where he is working to simplify data application development. Prior to Cask, he developed recommendation systems at Yahoo! and search systems at Box.

Application Templates are the major new feature in CDAP 3.0. This post introduces what they are and the problems they solve.

While building applications, we noticed that CDAP users would sometimes end up deploying multiple applications that all solved the same type of problem. Their code was mostly the same; it was their configuration that differed. They might read from different data sources and write to different destinations, but the underlying logic was identical. Rather than deploying multiple copies of similar code, we wanted to be able to deploy code once, then push a different configuration for each new use case. This is why we introduced Application Templates in CDAP 3.0. An Application Template is application code that is reusable through configuration. It lets you support new use cases by pushing configuration instead of pushing code.

Let’s walk through an example to get a better idea of what this looks like. Suppose we want to write an application that reads news feeds, analyzes articles, and attaches a category (sports, finance, etc.) to each article. We decide to write a Worker program to do this:

[Figure: Worker Program]

The Worker periodically polls a news feed as input, analyzes and categorizes each article, then writes the article category to a table. Our Worker also supports some configuration options, such as which news feed to read, which analyzers to use, and which categorization algorithm to use. We identify ten news feeds we want to process and want to create one Worker for each feed. Depending on the input feed, we want to adjust the analyzers and categorizer settings to achieve the most accurate results. Prior to CDAP 3.0, we would have had to deploy an Application for each feed. With Application Templates, we can deploy the code once, and then supply ten different configurations to create ten Adapters. An Adapter is an instantiation of an Application Template created through configuration.
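
To make the template and Adapter relationship concrete, here is a minimal sketch in plain Java. It does not use the CDAP API: `NewsTemplate`, `AdapterConfig`, `createAdapter`, and the example feed URLs are all hypothetical names invented for illustration. The point is only that the template code exists once, while each Adapter is nothing more than a named configuration.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class AdapterSketch {

    /** Hypothetical per-adapter configuration: data only, no code. */
    static final class AdapterConfig {
        final String feedUrl;
        final List<String> analyzers;
        final String categorizer;

        AdapterConfig(String feedUrl, List<String> analyzers, String categorizer) {
            this.feedUrl = feedUrl;
            this.analyzers = List.copyOf(analyzers);
            this.categorizer = categorizer;
        }
    }

    /** Hypothetical template: its code is deployed once and shared by all adapters. */
    static final class NewsTemplate {
        private final Map<String, AdapterConfig> adapters = new LinkedHashMap<>();

        /** Creating an adapter registers configuration only; no code is redeployed. */
        void createAdapter(String name, AdapterConfig config) {
            if (adapters.putIfAbsent(name, config) != null) {
                throw new IllegalArgumentException("Adapter already exists: " + name);
            }
        }

        int adapterCount() {
            return adapters.size();
        }

        /** The same template logic serves every adapter, parameterized by its config. */
        String describe(String name) {
            AdapterConfig c = adapters.get(name);
            if (c == null) {
                throw new IllegalArgumentException("Unknown adapter: " + name);
            }
            return name + " polls " + c.feedUrl + " using " + c.analyzers + " and " + c.categorizer;
        }
    }

    public static void main(String[] args) {
        NewsTemplate template = new NewsTemplate();
        // Ten feeds, ten configurations, one copy of the code.
        for (int i = 1; i <= 10; i++) {
            String categorizer = (i % 2 == 0) ? "svm" : "naive bayes";
            template.createAdapter("feed-" + i,
                new AdapterConfig("http://example.com/feed/" + i, List.of("tokenizer"), categorizer));
        }
        System.out.println(template.adapterCount() + " adapters from one template");
        System.out.println(template.describe("feed-1"));
    }
}
```

In this sketch, supporting an eleventh feed means one more `createAdapter` call with a new configuration, which mirrors the idea of pushing configuration instead of pushing code.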

[Figure: Adapter]

Application Templates are also extensible through plugins. A plugin is an extension of a template that implements an interface expected by the template. Suppose our Application Template defines a pluggable “categorizer” interface. We then implement “svm”, “naive bayes”, and “decision tree” categorizers. The implementation used by an Adapter can then be determined according to its configuration.
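
The pluggable "categorizer" idea can be sketched in plain Java as follows. This is an illustration under stated assumptions, not the CDAP plugin API: the `Categorizer` interface, the `fromConfig` method, and the keyword rules are all hypothetical, and the three "algorithms" are toy stand-ins rather than real models. What it shows is an interface the template depends on, with the concrete implementation chosen by a name taken from configuration.

```java
import java.util.Locale;

public class CategorizerSketch {

    /** Hypothetical interface the template expects every categorizer plugin to implement. */
    interface Categorizer {
        String categorize(String article);
    }

    /**
     * Placeholder implementation: a trivial keyword rule instead of a trained model.
     * A real plugin would apply an actual SVM, naive Bayes, or decision tree classifier.
     */
    static Categorizer keywordRule(String financeKeyword) {
        return article ->
            article.toLowerCase(Locale.ROOT).contains(financeKeyword) ? "finance" : "sports";
    }

    /** Picks the implementation named in an adapter's configuration. */
    static Categorizer fromConfig(String name) {
        switch (name) {
            case "svm":
                return keywordRule("stock");
            case "naive bayes":
                return keywordRule("market");
            case "decision tree":
                return keywordRule("earnings");
            default:
                throw new IllegalArgumentException("Unknown categorizer: " + name);
        }
    }

    public static void main(String[] args) {
        // The calling code never names a concrete class, only the configured value.
        Categorizer categorizer = fromConfig("svm");
        System.out.println(categorizer.categorize("Stock prices rallied today"));
    }
}
```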


[Figure: Implementation]

Plugin code is deployed separately from the Application Template rather than being bundled with it. This means you can update your plugins and add new plugins without updating your Application Template. If we implement a fourth categorizer, we can deploy the plugin as its own entity without touching the template code. In this way, the plugin framework allows easy extension of Application Templates. It lets you divide your code into modular, functional pieces that can be developed, managed, and deployed separately.
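
One way to picture this separation is a registry that the template consults by name at run time. The sketch below is plain Java with hypothetical names (`PluginRegistry`, `Template`, `register`, `lookup`); it is not how CDAP actually stores or loads plugins, and the lambdas are placeholders for real categorizers. It only demonstrates that a new plugin can appear after the template is in place, with no change to the template class.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

public class PluginRegistrySketch {

    /** Hypothetical registry standing in for wherever separately deployed plugins live. */
    static final class PluginRegistry {
        private final Map<String, Function<String, String>> categorizers = new HashMap<>();

        void register(String name, Function<String, String> categorizer) {
            categorizers.put(name, categorizer);
        }

        Function<String, String> lookup(String name) {
            Function<String, String> plugin = categorizers.get(name);
            if (plugin == null) {
                throw new IllegalStateException("No plugin deployed under: " + name);
            }
            return plugin;
        }
    }

    /** Hypothetical template code: it knows the registry, never a concrete plugin. */
    static final class Template {
        private final PluginRegistry registry;

        Template(PluginRegistry registry) {
            this.registry = registry;
        }

        String run(String categorizerName, String article) {
            return registry.lookup(categorizerName).apply(article);
        }
    }

    public static void main(String[] args) {
        PluginRegistry registry = new PluginRegistry();
        Template template = new Template(registry); // "deployed" once

        registry.register("svm", article -> "sports"); // placeholder plugin

        // Later, a fourth categorizer is added; the Template class is untouched.
        registry.register("fourth", article -> "finance"); // placeholder plugin
        System.out.println(template.run("fourth", "Quarterly results beat forecasts"));
    }
}
```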

Included in the CDAP 3.0 release are ETL (Extract, Transform, Load) Application Templates that let you easily ingest data into CDAP without writing any code. We will describe them in more detail in a follow-up blog post. Application Templates are still a new feature with a lot more room to grow. Try them out and let us know what you think!
