Continuuity & AT&T Labs to Open Source Real-Time Data Processing Framework

Nitin Motgi is Founder and CTO of Cask, where he is responsible for developing the company's long-term technology, driving company engineering initiatives and collaboration. 

Prior to Cask, Nitin was at Yahoo! working on a large-scale content optimization system externally known as C.O.R.E.

Why are we combining our technologies?Today we announced an exciting collaborative effort with AT&T Labs that will facilitate the integration of Continuuity BigFlow, our distributed framework for building durable high-throughput real-time data processing applications, with AT&T’s streaming analytics tool, an extremely fast, low-latency streaming analytics database originally built out of the necessity for managing its network at scale. The outcome of this joint endeavor is to make a new project, codenamed jetStream, available to the market as an Apache-licensed open source project with general availability in the third quarter of 2014.

We decided to bring together the complementary functionality of BigFlow and AT&T’s streaming analytics tool to create a unified real-time framework that combines in-memory stream processing with model-based event processing including direct integration for a variety of existing data systems like Apache HBase™ and HDFS. By combining AT&T’s low-latency and declarative language support with BigFlow’s durable, high-throughput computing capabilities and procedural language support, jetStream provides developers with a new way to take in and store vast quantities of data, build massively scalable applications, and update your applications in real-time as new data is ingested.

Moving to real-time data applications

When you look at the wealth of data being generated and processed, and the opportunity within that data, giving more organizations the ability to make informed, real-time decisions with data is critical. We believe that the next commercial opportunity in big data is moving beyond ad-hoc, batch analysis to a real-time model where applications serve relevant data continuously to business users and consumers.

Open sourcing jetStream and making it available within Continuuity Reactor will enable enterprises and developers to create a wide range of big data analytics and streaming applications that address a broad set of business use cases. Examples of these include network intrusion detection and network analytics, real-time analysis for spam filtering, social media market analysis, location analytics, and real-time recommendation engines that match relevant content to the right users at the right time.

New developer features

By using jetStream, developers will be able to do the following:

  • Direct integration of real-time data ingestion and processing applications with Hadoop and HBase and utilization of YARN for deployment and resource management
  • Framework-level correctness, fault tolerance guarantees, and application logic scalability that reduces friction, errors, and bugs during development
  • A transaction engine that provides delivery, isolation and consistency guarantees that enable exactly-once processing semantics
  • Scalability without increased operational cost of building and maintaining applications
  • Develop pipelines that combine in-memory continuous query semantics with persistent, procedural event processing with simple Java APIs

For more information, please visit jetStream.io.