CDAP 3.1 adds MapR support, Spark integration, enhanced Datasets and much more!

Shankar Selvam

Shankar Selvam is a software engineer at Cask where he is building software to enable the next generation of data applications. Prior to Cask, he worked on Hadoop and HBase performance evaluation/analysis at Intel.

We are excited to announce the release of the Cask Data Application Platform (CDAP) v3.1.0. This release adds support for MapR, giving users more choice of Hadoop distribution when running CDAP. It also extends our footprint to support CDH 5.4, HDP 2.2, and Apache Hadoop with HBase 1.0 and Hive 1.1.

In a previous release of CDAP, we introduced Spark integration as an experimental feature, with Spark programs running in standalone mode only. We are now proud to support Spark 1.2 and 1.3 in distributed CDAP. This gives CDAP users a wider choice of processing paradigms, with the ability to run MapReduce, real-time, and Spark programs for production use cases.

In addition, we made a number of improvements to CDAP in v3.1, including:

  • Enabling Workflow token persistence
  • Custom and system metadata for fileset partitions
  • Incremental processing in workflows for partitioned filesets
  • Ability to consume existing files in HDFS as CDAP datasets
  • A quick and easy way to create real-time and batch ETL pipelines via the UI

A complete list of the new features, improvements, and bug fixes in this release can be found in the Release Notes.

CDAP v3.1 also lets you create real-time and batch ETL pipelines via the UI, which makes them easy to set up and configure.

Check out CDAP 3.1 (download here), give it a whirl, and let us know your feedback. Help us make CDAP better by sending your questions or suggestions to the CDAP user group.
