Data-driven job scheduling in Hadoop

Julien Guery

Triggering the processing of data in Hadoop as soon as enough new data is available helps optimize many incremental data processing use cases, but it is not trivial to implement. The ability to schedule a job (such as MapReduce or Spark) to run as soon as there's a certain amount of unprocessed data available, for instance, in a set of

CDAP v2.8.0 is out in the wild

I am very happy to announce that the latest release of our flagship product – the Cask Data Application Platform (CDAP) – v2.8.0 is now available for everyone to download. This release has a bunch of cool features that our customers, partners and the community want: Namespaces (provides application and data isolation that enables multi-tenancy)