Introducing Tigon: Real-time streaming for the real world

Gokul Gunaskeran is a software engineer at Cask where he is building software to enable the next generation of data applications. Prior to Cask, he worked on Architecture Performance and Workload Analysis at Oracle.

In collaboration with AT&T Labs, today we are releasing version 0.2.0 of the open source Tigon project, a real-time streaming analytics framework for Hadoop based on technology contributed by both companies. By combining AT&T’s low-latency and declarative language support with our durable, high-throughput computing capabilities and procedural language support, Tigon provides developers with a new way to ingest, process and store vast quantities of data, build massively scalable applications, and update applications in real time as new data arrives.


What is Tigon?

Tigon is a native YARN application that tightly integrates with HDFS and HBase for persistence and utilizes Tephra for transactions. It has an exactly-once processing guarantee so your logic can safely perform non-idempotent operations. Tigon offers a simple, efficient and cost-effective way for developers to create a diverse range of apps that address a broad set of use cases such as network intrusion detection and analytics, social media market analysis, location analytics, and real-time recommendation engines that match relevant content to the right users at the right time.
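
To make the exactly-once point concrete, here is a minimal Java sketch of why redelivery matters for a non-idempotent operation such as incrementing a counter. It is an illustration only: the class and method names are hypothetical, it does not use the Tigon or Tephra APIs, and the in-memory set of applied event IDs is just a stand-in for whatever the framework does transactionally.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Illustration only: none of these names come from the Tigon API.
public class ExactlyOnceSketch {

    // A non-idempotent operation: applying the same event twice changes the result.
    static void increment(Map<String, Integer> counts, String key) {
        counts.merge(key, 1, Integer::sum);
    }

    // At-least-once: every delivery is applied, so a redelivered event is counted again.
    // Each delivery is a pair {eventId, key}.
    static Map<String, Integer> countAtLeastOnce(List<String[]> deliveries) {
        Map<String, Integer> counts = new HashMap<>();
        for (String[] delivery : deliveries) {
            increment(counts, delivery[1]);
        }
        return counts;
    }

    // Exactly-once effect: each event ID is applied at most once, even if delivered again.
    static Map<String, Integer> countExactlyOnce(List<String[]> deliveries) {
        Map<String, Integer> counts = new HashMap<>();
        Set<String> applied = new HashSet<>();
        for (String[] delivery : deliveries) {
            if (applied.add(delivery[0])) { // add() returns false for an ID already seen
                increment(counts, delivery[1]);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String[]> deliveries = Arrays.asList(
            new String[] {"e1", "login"},
            new String[] {"e2", "login"},
            new String[] {"e2", "login"}); // e2 redelivered after a simulated failure
        System.out.println("at-least-once: " + countAtLeastOnce(deliveries).get("login"));
        System.out.println("exactly-once:  " + countExactlyOnce(deliveries).get("login"));
    }
}
```

With the three deliveries in `main`, the at-least-once count for "login" is 3 while the exactly-once count is 2, which is the number of distinct events.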


Tigon Architecture

[Figure: Tigon architecture diagram]



Blended real-time & batch processing

When you consider the wealth of data being generated and processed by organizations today and the opportunity within that data, giving more developers and companies the ability to make informed, real-time decisions with data is critical. Tigon combines the performance and flexibility of traditional SQL-based in-memory complex event processing (CEP) with distributed Java-based persistent and transactional event processing. It blends ad hoc batch analysis with a real-time model in which applications continuously serve relevant data to business users and consumers, so real-time recommendations can be guided by historical insights, creating more business value.


A better developer experience

Tigon supports CEP-like continuous query semantics in a SQL-like declarative language as well as model-based, discrete event semantics in an imperative Java API. Here’s how this works:

  1. Tigon applications are referred to as Flows. A Flow can be logically represented as a directed acyclic graph (DAG) in which each node is a processing unit. These processing units are referred to as Flowlets. Data flows between Flowlets through Queues.
  2. Tigon launches Flows as YARN applications using Twill, which makes Flows elastically scalable at runtime. Additional instances of Flowlet containers can be spun up with simple CLI commands even while the Flow is running.
  3. Flowlets can store data in HBase with ACID properties using Tephra.
  4. Tigon SQL, an in-memory stream processing library, ships with the Tigon project. Users can leverage this library to ingest massive data streams into their Hadoop/HBase cluster and run CEP-like continuous queries written in a SQL-like declarative language.
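
The Flow, Flowlet and Queue model described in the list above can be sketched as a toy, single-process Java program. This is an illustration under stated assumptions, not the Tigon API: the `Flowlet` class, `connectTo`, `run` and `demo` are hypothetical names, events are plain strings, and there is no YARN, HBase or transaction layer involved.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Queue;
import java.util.function.Function;

// Illustration only: hypothetical types, not the Tigon API.
public class FlowSketch {

    // A processing unit: takes one event from its input queue and emits zero or more events.
    static final class Flowlet {
        final String name;
        final Function<String, List<String>> logic;
        final Queue<String> input = new ArrayDeque<>();     // queue feeding this flowlet
        final List<Flowlet> downstream = new ArrayList<>(); // outgoing edges of the DAG

        Flowlet(String name, Function<String, List<String>> logic) {
            this.name = name;
            this.logic = logic;
        }

        void connectTo(Flowlet next) {
            downstream.add(next);
        }
    }

    // Processes flowlets in topological order. Because the graph is acyclic, a flowlet's
    // input queue is complete by the time it runs, so one pass drains every queue.
    // Events emitted by flowlets with no downstream edges are collected as the result.
    static List<String> run(List<Flowlet> topologicalOrder, Flowlet source, List<String> events) {
        source.input.addAll(events);
        List<String> sinkOutput = new ArrayList<>();
        for (Flowlet flowlet : topologicalOrder) {
            while (!flowlet.input.isEmpty()) {
                for (String out : flowlet.logic.apply(flowlet.input.poll())) {
                    if (flowlet.downstream.isEmpty()) {
                        sinkOutput.add(out);
                    } else {
                        for (Flowlet next : flowlet.downstream) {
                            next.input.add(out);
                        }
                    }
                }
            }
        }
        return sinkOutput;
    }

    // A two-node flow: split lines into words, then keep only hashtags.
    static List<String> demo(List<String> lines) {
        Flowlet splitter = new Flowlet("splitter", line -> Arrays.asList(line.split(" ")));
        Flowlet hashtags = new Flowlet("hashtags", word -> word.startsWith("#")
            ? Collections.singletonList(word)
            : Collections.<String>emptyList());
        splitter.connectTo(hashtags);
        return run(Arrays.asList(splitter, hashtags), splitter, lines);
    }

    public static void main(String[] args) {
        System.out.println(demo(Arrays.asList("hello #tigon", "#yarn app")));
    }
}
```

In a real deployment each Flowlet would run in its own containers and the queues would be durable rather than in-memory; the sketch only shows the shape of the graph and how events move along its edges.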


With the release of Tigon, we believe we can fill critical gaps in the Hadoop ecosystem.

Traditional, proprietary CEP technology is costly, difficult to operate, and often provides a subpar developer experience. On the other hand, existing open source alternatives often lack the framework-level correctness, fault tolerance guarantees, and application logic scalability that reduce friction, errors, and bugs during development.

We are very excited to release Tigon as the first technology of its kind to be open sourced. We hope that you will join and contribute to the Tigon community. If you are interested, check out our open source community site at

Finally, in case you missed it, today we also released Coopr Cloud, a hosted version of our open source cluster management software that provisions, manages and scales clusters on public clouds and private clouds. You can read more about it in our blog post.
