What do you do at Continuuity, again?

Alex Baranau is a software engineer at Cask, where he is responsible for designing and building software fueling the next generation of Big Data applications. Alex is a contributor to HBase and Flume, and has created several open-source projects. He also writes frequently about Big Data technologies.

As Continuuity gets more traction, my friends ask me what I do at Continuuity. The short answer: we’ve created a platform that makes building Big Data applications easier. Let me give you more detail with a short example. Imagine you need to implement an app.

The app

This example app is very simple, and I won’t argue why one would want to build it. The point of the example is to walk you through the developer experience of implementing a Big Data app using Continuuity Reactor. The functionality of the example app:
  • There’s a website.
  • The website will run ads for active users.
Additional feature request:
  • Detect active users and show them content while they are still browsing the website.
So, the example that we will go through in this post (“the app”) should do the following:
  • Detect active users in real-time.
  • Add detected users to another system that we will use to generate content for web pages.

Step 1: This is easy!

The app seems very simple, so it should be easy to implement. Here’s one way to do it in Java (it takes less than a minute to code):
import java.util.HashMap;
import java.util.Map;

public class ActiveUsersTracker {
  // In-memory page view counts per user (lives in a single JVM).
  private static Map<String, Integer> userActivity = new HashMap<String, Integer>();

  public static void processPageView(String user) {
    if (isNewActiveUser(user)) {
      ContentSystem.addNewActiveUser(user);
    }
  }

  // Returns true exactly once per user: on the page view that makes them "active" (the 100th).
  private static boolean isNewActiveUser(String user) {
    Integer pageViews = userActivity.get(user);
    if (pageViews == null) {
      userActivity.put(user, 1);
      return false;
    }

    if (pageViews >= 100) {
      return false;
    }

    pageViews++;
    userActivity.put(user, pageViews);

    return pageViews == 100;
  }
}
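Just to show how it would be used and how it behaves, here is a quick, hypothetical driver that simulates page views for one user. The class name and the "alice" user are made up for illustration, and it assumes a ContentSystem stub is on the classpath, since that system is external to the example.
public class ActiveUsersTrackerDemo {
  public static void main(String[] args) {
    // Simulate 120 page views by the same user; ContentSystem.addNewActiveUser
    // is called exactly once, on the 100th view.
    for (int i = 0; i < 120; i++) {
      ActiveUsersTracker.processPageView("alice");
    }
  }
}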
There’s one tiny problem, though: it’s not production-ready, since it cannot handle large volumes of data. To be production-ready:
  • It should be horizontally scalable and should handle large volumes of streaming data.
  • It should be fault-tolerant.
There are many other requirements, but I will ignore them for now. At the very least, we need to solve these problems:
  • How do you run and manage it?
  • How do you get data in?
  • How do you scale computations?
  • Where and how do you persist the data of “billions” of users?
These are not easy problems to solve unless you are using the right tool. Let’s try to convert the app I just wrote into a Continuuity Reactor app to meet the above requirements (and many others as we will see).

Step 2: Make It Real!

Here is the application code:
public class ActiveUsersApp implements Application {
  @Override
  public ApplicationSpecification configure() {
    // Declares the pieces of the app: an input stream, a dataset, and a flow.
    return ApplicationSpecification.Builder.with()
      .setName("ActiveUsersApp").setDescription("Tracks active users")
      .withStreams().add(new Stream("pageViews"))
      .withDataSets().add(new Table("userActivity"))
      .withFlows().add(new ActiveUsersFlow())
      .noProcedure().noMapReduce().noWorkflow().build();
  }
}

class ActiveUsersFlow implements Flow {
  @Override
  public FlowSpecification configure() {
    // One flowlet ("tracker"), run as 4 instances, fed by the pageViews stream.
    return FlowSpecification.Builder.with()
      .setName("ActiveUsersFlow").setDescription("Tracks active users by page views")
      .withFlowlets().add("tracker", new ActiveUsersTracker(), 4)
      .connect().fromStream("pageViews").to("tracker").build();
  }
}

class ActiveUsersTracker extends AbstractFlowlet {
  // Injected by the Reactor; backed by HBase in a distributed deployment.
  @UseDataSet("userActivity")
  private Table userActivity;

  @ProcessInput
  public void processPageView(StreamEvent event) {
    String user = event.getHeaders().get("user");
    if (isNewActiveUser(user)) {
      ContentSystem.addNewActiveUser(user);
    }
  }

  private boolean isNewActiveUser(String user) {
    int pageViews = userActivity.get(new Get(user)).getInt("pageViews", 0);

    if (pageViews >= 100) {
      return false;
    }

    pageViews++;
    userActivity.put(new Put(user, "pageViews", pageViews));

    return pageViews == 100;
  }
}
This code doesn’t look very different from what I wrote earlier. ActiveUsersTracker is almost identical to the first version. ActiveUsersApp defines the components of our app, and ActiveUsersFlow defines our real-time computation component (a flow), which in this case has a single processing unit (a flowlet). Note: this is not pseudo-code. This is the real, complete code for the app. Nothing is hidden, not even a small configuration file (apart from ContentSystem, which is external to the app). Let’s take a closer look at the code to see how it meets our requirements:

Get data in

In the FlowSpecification, we connected the ActiveUsersTracker flowlet to the pageViews stream. Based on this, the Continuuity Reactor creates an HTTP endpoint to which you can send data. Whatever you POST to it is reliably transferred to the ActiveUsersTracker flowlet, and for every event the processPageView method is invoked.
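For a feel of what a client looks like, here is a minimal sketch that POSTs one page-view event to the stream. The gateway host, port, endpoint path, and the header name used to carry the user are assumptions made for illustration; check the Reactor REST documentation for the exact stream API.
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;

public class PageViewSender {
  public static void main(String[] args) throws Exception {
    // ASSUMPTION: the endpoint path and header naming below are illustrative only.
    URL url = new URL("http://localhost:10000/v2/streams/pageViews");
    HttpURLConnection conn = (HttpURLConnection) url.openConnection();
    conn.setRequestMethod("POST");
    conn.setDoOutput(true);
    // The flowlet reads the user from the event headers (event.getHeaders().get("user")).
    conn.setRequestProperty("pageViews.user", "alice");
    try (OutputStream out = conn.getOutputStream()) {
      out.write(new byte[0]); // this app does not use the event body
    }
    System.out.println("HTTP " + conn.getResponseCode());
    conn.disconnect();
  }
}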

Store high volumes of data

ActiveUsersTracker has a field called userActivity of type Table (instead of the HashMap we used previously). Table is a storage abstraction available to your Continuuity Reactor app; it is injected automatically when the flowlet is started. In a distributed environment, the data is persisted in HBase, a distributed storage system that lets you store many terabytes of data with fast random access. Accessing HBase through a Table in your app code is as simple as dealing with a HashMap.
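Side by side, the change is essentially swapping the map calls for the Table calls you already saw in the flowlet above (a fragment, not a standalone program; the Table, Get, and Put classes come from the Reactor SDK):
// Step 1 used a java.util.HashMap: in-memory, single JVM, lost on restart.
// Here the same read-modify-write goes through the injected Table, durable and distributed:
int pageViews = userActivity.get(new Get(user)).getInt("pageViews", 0);   // 0 if the user is new
userActivity.put(new Put(user, "pageViews", pageViews + 1));              // persisted in HBase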

Horizontal scaling

The FlowSpecification tells the Reactor to start four instances of the ActiveUsersTracker flowlet. In a distributed environment this means starting four containers on your YARN cluster. Each container is a separate process with dedicated resources assigned to it. The Continuuity Reactor handles all of the logistics, including data transfer between containers. You can change the number of running instances on the fly, without restarting the app, using the UI, CLI, or REST API that comes with the Continuuity Reactor.

Handling failures

Every invocation of the processPageView method happens inside a transaction. This means you get ACID guarantees when writing to the userActivity table. It also means that if processPageView fails for some reason (e.g., throws an exception), the processing is retried automatically. Continuuity Reactor manages the flowlet instances to ensure they are up and doing their job. If the process fails or the server goes down, the Reactor restarts the flowlet without any data loss and without leaving the data in an inconsistent state. Finally, the data is stored in distributed storage (HBase, HDFS), which is designed to be resilient to many kinds of failures.
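To make the guarantee concrete, here is the flowlet’s processing method again, annotated with what the Reactor does around it. The comments simply restate the behavior described above; they are not extra code you would have to write.
@ProcessInput
public void processPageView(StreamEvent event) {
  // A transaction is started before this method is invoked.
  String user = event.getHeaders().get("user");
  if (isNewActiveUser(user)) {   // reads and writes the userActivity Table inside the transaction
    ContentSystem.addNewActiveUser(user);
  }
  // On success the transaction commits and the event is acknowledged.
  // If the method throws, the Table writes are rolled back and the event is retried,
  // so a failure cannot leave userActivity in an inconsistent state.
}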

Running and managing the app

Continuuity Reactor is an application server. There are numerous ways to deploy an app: dragging and dropping it via the UI, invoking a CLI command, or using the REST API. The same interfaces are available for managing the app. In the interest of keeping this post brief, I will stop here and leave you with this summary: Continuuity Reactor gives you all the tools and APIs you need to deploy, run, operationalize, and debug your application.

Step 3: Make It Awesome!

I’ll skip this section for now. If I’ve piqued your interest and you want to know more about what we are doing, stay tuned for the next post in this series.

The Promise

Let me use this opportunity to answer some other (more important) questions. Why do I think the product I am working on is important? How does it make the world better? I strongly believe that making well-thought-out decisions is essential for the world to become a better place. Analyzing a lot of data helps people make such decisions. The more people who can analyze data, the more well-thought-out decisions will be made. Continuuity Reactor allows any Java developer (anyone who knows how to use a HashMap) to work with large volumes of data. And that is a lot of people! :)