Cask Blog

Julien Guery

Julien Guery was a software engineer at Cask where he built software fueling the next generation of Big Data applications. Prior to Cask, Julien was a graduate student at Telecom Bretagne in France where he studied information technologies.

Data-driven job scheduling in Hadoop

Julien Guery

Triggering the processing of data in Hadoop as soon as enough new data is available helps optimize many incremental data processing use cases, but is not trivial to implement. The ability to schedule a job (such as MapReduce or Spark) to run as soon as there’s a certain amount of unprocessed data available, for instance, in a set of …