Programming with Weave, Part I

Terence Yim is a Software Engineer at Cask, responsible for designing and building realtime processing systems on Hadoop/HBase. Prior to Cask, Terence spent over a year at LinkedIn Inc. and seven years at Yahoo!, building high performance large scale distributed systems.

Read first blog post of the Weave series

In this second blog post of the Weave series, we would like to show you how writing a distributed application can be as simple as writing a standalone Java application using Weave.

Writing applications to run on YARN

Before we dive into the details about Weave, let’s talk briefly about what a developer has to do in order to write a YARN application, other than standard MapReduce. A YARN application always consists of three parts:

  1. Application Client
    • Application Client is responsible for submitting the application to YARN.
    • It can query the application status after the application is submitted.
  2. Application Master (AM)
    • The AM is the entry point for the application to negotiate with
      YARN Resource Manager (RM) for Containers on the cluster.

      • A YARN Container is an abstract entity representing resources such as memory,
        CPU cores and network bandwidth available to the process running inside.
    • When a Container is assigned to an application, the AM can run commands inside the Container,
      e.g. start a Java application.
    • It knows when a Container is completed or killed and determines what actions need to be taken when that happens.
    • The AM is responsible for sending heartbeats to RM (Resource Manager)
      so that YARN can be up-to-date on the application status.
  3. Application
    • This is where you put the actual application logic.
      The Application Master is the one which will launch this application,
      potentially multiple instances of them, in the YARN Containers it acquired.

You can find more details here: Writing YARN applications.

Analogy to Standalone Java Application

It is not difficult to see that a YARN application is very similar to what many Java developers are familiar with: Standalone Java Application.

YARN Standalone Java App
Application Client java command that provides options and arguments to the application.
Application Master main() method preparing threads for the application.
Container Application Runnable implementation, where each runs in its own thread.

This close resemblance between the two made us wonder – wouldn’t it be great if you can write a distributed application just like you write a standalone java application? It inspired us to develop Weave with the goal of enabling everyone to write distributed applications and make it as easy as writing threads.

Programming in Weave

So, how simple is it to develop a distributed application on Weave? Let’s take a classic Hello World example:

public class HelloWorld {    
  public static Logger LOG = LoggerFactory.getLogger(HelloWorld.class);

  public static class HelloWorldRunnable extends AbstractWeaveRunnable {
    @Override
    public void run() {
      LOG.info("Hello World. My first distributed application.");
    }
  }

  public static void main(String[] args) throws Exception {
    WeaveRunnerService weaveRunner = 
      new YarnWeaveRunnerService(new YarnConfiguration(), "localhost:2181");
    weaveRunner.startAndWait();

    WeaveController controller = 
      weaveRunner.prepare(new HelloWorldRunnable())
                 .addLogHandler(
                   new PrinterLogHandler(new PrintWriter(System.out, true)))
                 .start();

    Services.getCompletionFuture(controller).get();
  }        
}

As you can see from the example, the effort to make your application run on a YARN cluster is minimal. There are four steps:

  1. Write your application by extending from AbstractWeaveRunnable and implementing the run() method.
  2. Create and start a WeaveRunnerService.
  3. Submit your application through the WeaveRunnerService.
  4. Control your application using WeaveController, for example, wait for completion.

Weave Architecture

If you try to write the HelloWorld application above using YARN API directly, it will require hundreds, if not thousands, of lines of code to do the same thing. We found that many YARN applications have very similar needs:

  • Start, stop, monitor and restart Containers.
  • Collect logs and metrics from live Containers.
  • Elastic scaling by adding or removing Container instances.
  • Distribute commands to live Containers to adjust runtime behavior.
  • Service registry for announcement and discovery.
  • Delegation token generation and renewal when running in a secure Hadoop cluster.

Instead of coding it again and again for every application you write, Weave provides a simple and intuitive API for setting up your application, and a generic Application Master to handle interactions with YARN. This is how the high-level architecture of Weave looks like:

There are several things to highlight in the architecture:

  • An application can contain one or more WeaveRunnable. Each WeaveRunnable can have multiple live instances.
    • Each instance of WeaveRunnable will be running in a separate YARN Container, which has dedicated system resources assigned to it.
    • In the HelloWorld example above, there is only one runnable, but the Weave API allows you to have more.
  • All logs that are emitted from WeaveRunnable through SLF4J API are sent to a Kafka that runs inside the Application Master.
    • In the client API, you can attach a LogHandler to receive logs in real time for further processing.
      In the HelloWorld example, they just get printed to standard output.
  • Zookeeper is used for states, messages and service discovery.
    • When the client application is down and comes back up, it will rediscover all live application states from Zookeeper.
    • Services announcement and discovery are done by ephemeral nodes creation and detection.
    • Any custom command messages can be sent to the application through the WeaveController, which bounces through Zookeeper.

I hope you enjoyed this quick introduction to Weave. In the next blog post, we’ll dive deeper into the Weave API to explore more capabilities of Weave.

Leave a comment