Cask Blog




Weblog Analytics on Apache Hadoop™

Hadoop provides specialized tools and technologies that can be used for transporting and processing huge amounts of weblog data. In this post, we’ll explore the end-to-end process of aggregating logs, processing them, and generating analytics on Hadoop to gain insights about how users interact with your website. With the digitization of the world, generating knowledge … Read more


Multitenancy for Hadoop: Namespaces

Bhooshan Mogal

As a data processing platform, Hadoop’s popularity today is often attributed to its cost-effectiveness, derived equally from the usage of commodity hardware and from the ability to co-locate work on shared compute and storage resources. Sharing resources allows organizations to maximize the throughput and utilization of a small number of large clusters instead of managing a large … Read more


Data-driven job scheduling in Hadoop

Julien Guery

Triggering the processing of data in Hadoop as soon as enough new data is available helps optimize many incremental data processing use cases, but is not trivial to implement. The ability to schedule a job (such as MapReduce or Spark) to run as soon as there’s a certain amount of unprocessed data available, for instance, in a set of … Read more


CDAP v2.8.0 is out in the wild

I am very happy to announce that the latest release of our flagship product – the Cask Data Application Platform (CDAP) – v2.8.0 is now available for everyone to download. This release has a bunch of cool features that our customers, partners and the community want: Namespaces (provides application and data isolation that enables multi-tenancy) … Read more


How we built it: designing a globally consistent transaction engine

At Cask, we are committed to contributing back to the open source community. One of our latest open-sourced projects is Tephra, a system that adds complete transaction support to Apache HBase™. As an XA-style transaction system, Tephra is designed to be agnostic to the underlying data stores, so its usage is not limited to HBase. … Read more


Strata + Hadoop World NYC 2014 Recap: Four Trends in Hadoop

The Cask team had a great and productive time at Strata + Hadoop World earlier this month in New York City! We are very optimistic about the robust growth in Hadoop adoption, increased participation from a broad range of developers and companies in many industries, and continued maturation in the early days of this technology. As … Read more


Introducing Tigon: Real-time streaming for the real world

In collaboration with AT&T Labs, today we are releasing version 0.2.0 of the open source Tigon project, a real-time streaming analytics framework for Hadoop based on technology contributed by both companies. By combining AT&T’s low-latency and declarative language support with our durable, high-throughput computing capabilities and procedural language support, Tigon provides developers with a new … Read more