The Cask team is here in New York in full force this week, for our biggest conference of the year: Strata + Hadoop World 2016.
But there is more happening in New York this week. IBM today made several important announcements:
- their Open Platform initiative with Apache Hadoop
- an early access view of the new IBM Project DataWorks
- an introduction to DataFirst Services from IBM
IBM’s Project DataWorks is the industry’s first cloud-based data and analytics platform that integrates all types of data to enable AI-powered decision making. With it, companies can realize the full promise of data: data professionals collaborate and build cognitive solutions by combining IBM data and analytics services with a growing ecosystem of data and analytics partners, all delivered on Apache Spark. Project DataWorks is designed to allow for faster development and deployment of data and analytics solutions, with self-service user experiences that help accelerate business value.
In conjunction with IBM’s announcements, Cask is announcing that we are joining IBM’s Project DataWorks. This announcement follows the certification of Cask’s open source Cask Data Application Platform (CDAP) on the IBM Open Platform (IOP) and on Apache Spark. CDAP is now available on the IBM Marketplace, so IBM customers can accelerate time to value by using CDAP and its self-service extensions, Cask Hydrator and Cask Tracker, to rapidly build big data solutions.
“IBM is continuing to embrace a community of open source minded partners who deliver unique capabilities in extending the value of Project DataWorks,” said Ritika Gunnar, Vice President of Offering Management, IBM Analytics. “Cask represents this profile of partner who is able to help customers embracing Spark and Apache Hadoop to leverage CDAP as their unified integration platform for big data.”
CDAP is a 100% open source framework for rapidly delivering solutions on Hadoop and Spark. CDAP integrates and standardizes the underlying open source technologies from the Hadoop ecosystem to provide a simple and consistent platform for building and running data lakes, as well as full-fledged production data applications, in the cloud or on-premises. With the upcoming release of CDAP 4, the first unified integration platform for big data, which we announced last week, we will introduce Cask Market, a “big data app store” that will simplify the deployment of big data solutions even further through pre-built Hadoop solutions, pipelines, plugins, and reusable templates.
Don’t miss our demo tonight at the DataFirst Launch Event, where we will show CDAP on IOP using a five-node IOP cluster set up by IBM.
The use case we will be showing is Customer 360 as part of an Enterprise Data Lake architecture, a common use case we see within the Cask customer base. We plan to have two demos for this use case, featuring an e-commerce company running on IOP and using existing IBM products for their website.
In the first demo, we will show a simple end-to-end Data Lake where we will upload sample data, run a SQL query, configure and run a Spark pipeline to aggregate data, view aggregated results, explore metadata, and look at data lineage.
The second demo will be an IBM on Spark demo, where we will upload a Netezza driver and a DB2 driver, create a pipeline that reads web logs and joins them with a DB2 customer database, perform aggregations in Spark, and finally write the results back to Netezza.
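To make the shape of that second pipeline a little more concrete, here is a minimal sketch of its join-then-aggregate logic in plain Python. It is an illustration only: it does not use CDAP, Cask Hydrator, Spark, DB2, or Netezza APIs, and the sample records, field names (`customer_id`, `region`, `bytes`), and function names are all hypothetical stand-ins we made up for this example.

```python
from collections import defaultdict

# Hypothetical sample data; in the demo these would come from web logs
# and a DB2 customer database. Field names are assumptions.
web_logs = [
    {"customer_id": 1, "url": "/cart", "bytes": 512},
    {"customer_id": 2, "url": "/home", "bytes": 128},
    {"customer_id": 1, "url": "/checkout", "bytes": 256},
    {"customer_id": 3, "url": "/home", "bytes": 64},  # no matching customer
]
customers = [
    {"customer_id": 1, "name": "Ada", "region": "East"},
    {"customer_id": 2, "name": "Lin", "region": "West"},
]


def join_logs_with_customers(logs, customer_rows):
    """Inner-join log records to customer records on customer_id."""
    by_id = {c["customer_id"]: c for c in customer_rows}
    return [
        {**by_id[log["customer_id"]], **log}
        for log in logs
        if log["customer_id"] in by_id
    ]


def aggregate_by_region(joined_rows):
    """Count hits and sum bytes per customer region."""
    totals = defaultdict(lambda: {"hits": 0, "bytes": 0})
    for row in joined_rows:
        region_totals = totals[row["region"]]
        region_totals["hits"] += 1
        region_totals["bytes"] += row["bytes"]
    return dict(totals)


joined = join_logs_with_customers(web_logs, customers)
summary = aggregate_by_region(joined)
print(summary)
```

In the demo itself, the equivalent steps are configured as pipeline stages rather than hand-written, and the aggregated result is written to Netezza instead of printed.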
So, if you are in town, join us for the IBM DataFirst Launch Event tonight and check out these Cask demos on top of IBM’s Project DataWorks. You can experience first-hand how much faster and simpler building modern data lakes and data apps on IOP and Spark can be with CDAP. And if you are also attending Strata this week, please meet the Cask team at booth #201. We look forward to seeing you!