Programming with Apache Twill*, Part II

Terence Yim is a Software Engineer at Cask, responsible for designing and building realtime processing systems on Hadoop/HBase. Prior to Cask, Terence worked at both LinkedIn and Yahoo!, building high-performance, large-scale distributed systems.

Please note: Continuuity is now known as Cask, and Continuuity Reactor is now known as the Cask Data Application Platform (CDAP).

In the Programming with Weave (now Apache Twill), Part I blog post, we introduced the basics of writing a distributed application on Hadoop YARN using Twill. In this post, we are going to highlight some of the important features in Twill.

Resources specification

There are situations where you will need more than one instance of your application, for example when you use Twill to run a cluster of Jetty web servers. Different applications also have different requirements for system resources such as CPU and memory. By default, Twill starts one container per TwillRunnable with 1 virtual core and 512MB of memory. You can, however, customize this when starting your application through TwillRunner. For example, you can specify 5 instances, each with 2 virtual cores and 1GB of memory, like this:

TwillRunner twillRunner = ...
twillRunner.prepare(new JettyServerTwillRunnable(),
                    ResourceSpecification.Builder.with()
                        .setVirtualCores(2).setMemory(1, SizeUnit.GIGA).setInstances(5)
                        .build())
           .start();

Notice that this specifies virtual cores and not actual CPU cores. The mapping is defined in the YARN configuration and the allowable virtual core values are governed by yarn.scheduler.minimum-allocation-vcores and yarn.scheduler.maximum-allocation-vcores.
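
To get a feel for what such a request asks of the cluster, the arithmetic can be sketched in plain Java. The ResourceFootprint class below is hypothetical and not part of Twill; it only multiplies out the numbers from the example (5 instances, 2 virtual cores and 1GB each) and offers a simplified range check against limits that the caller supplies.

```java
// Hypothetical helper, not part of Twill: multiplies out a per-container request.
final class ResourceFootprint {
  private final int instances;
  private final int virtualCores; // per container
  private final int memoryMb;     // per container

  ResourceFootprint(int instances, int virtualCores, int memoryMb) {
    if (instances < 1 || virtualCores < 1 || memoryMb < 1) {
      throw new IllegalArgumentException("all values must be positive");
    }
    this.instances = instances;
    this.virtualCores = virtualCores;
    this.memoryMb = memoryMb;
  }

  int totalVirtualCores() {
    return instances * virtualCores;
  }

  int totalMemoryMb() {
    return instances * memoryMb;
  }

  // Simplified range check against cluster limits supplied by the caller.
  // What YARN does with out-of-range requests is not modeled here.
  boolean vcoresWithin(int minVcores, int maxVcores) {
    return virtualCores >= minVcores && virtualCores <= maxVcores;
  }

  public static void main(String[] args) {
    ResourceFootprint jetty = new ResourceFootprint(5, 2, 1024);
    System.out.println(jetty.totalVirtualCores() + " vcores, " + jetty.totalMemoryMb() + " MB");
  }
}
```

For the Jetty example this prints "10 vcores, 5120 MB". The totals cover only the runnable containers and leave out whatever the application master itself needs.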

(Read this post to learn how to enable virtual core support in YARN.)

Multiple runnables

Just as you can have multiple threads doing different things, you can have multiple TwillRunnables in your application. All you need to do is implement the TwillApplication interface and specify the runnables that constitute your application. If your application contains a Jetty server and a log processing daemon, your TwillApplication will look something like this:

public class MyTwillApplication implements TwillApplication {

  public TwillSpecification configure() {
    return TwillSpecification.Builder.with()
        .setName("MyApplication")  // illustrative name
        .withRunnable()
        .add("jetty", new JettyServerTwillRunnable()).noLocalFiles()
        .add("logdaemon", new LogProcessorTwillRunnable()).noLocalFiles()
        .anyOrder()
        .build();
  }
}

Notice that the call to anyOrder() specifies that the TwillRunnables in this application can be started in any order. If there are dependencies between runnables, you can specify the ordering like this:

// To have the log processing daemon start before the Jetty server
  .add("jetty", new JettyServerTwillRunnable()).noLocalFiles()
  .add("logdaemon", new LogProcessorTwillRunnable()).noLocalFiles()
  .withOrder()
    .begin("logdaemon")
    .nextWhenStarted("jetty")
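
The idea of starting one component only after another has started can be illustrated with plain Java threads. This is a sketch of the semantics only, using a CountDownLatch; it is not how Twill implements ordering, and the StartOrderDemo class is hypothetical.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.CountDownLatch;

// Illustration of "start jetty only after logdaemon has started". Not Twill code.
final class StartOrderDemo {

  // Returns the names in the order in which the two workers actually started.
  static List<String> startInOrder() throws InterruptedException {
    List<String> started = new CopyOnWriteArrayList<>();
    CountDownLatch logDaemonStarted = new CountDownLatch(1);

    Thread jetty = new Thread(() -> {
      try {
        logDaemonStarted.await();      // wait for the dependency to come up
        started.add("jetty");
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
      }
    });
    Thread logDaemon = new Thread(() -> {
      started.add("logdaemon");
      logDaemonStarted.countDown();    // signal that the dependency is up
    });

    jetty.start();                     // launched first on purpose; it still waits
    logDaemon.start();
    jetty.join();
    logDaemon.join();
    return started;
  }

  public static void main(String[] args) throws InterruptedException {
    System.out.println(startInOrder());
  }
}
```

Even though the jetty thread is launched first, the recorded order is always logdaemon followed by jetty, because jetty blocks on the latch until the log daemon has signalled.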

File localization

One nice feature in YARN is that it can copy HDFS files to a container’s working directory on local disk, which is an efficient way to distribute files needed by containers across the cluster. Here is an example of how to do so in Twill:

  .add("jetty", new JettyServerTwillRunnable())
    .withLocalFiles()
      // Distribute local file "index.html" to the container running the Jetty server
      .add("index.html", new File("index.html"))

      // Distribute and expand the contents of local archive "images.tgz"
      // into the container's "images" directory
      .add("images", new File("images.tgz"), true)

      // Distribute HDFS file "site-script.js" to a container file named "script.js".
      // "fs" is the Hadoop FileSystem object
      .add("script.js", fs.resolvePath(new Path("site-script.js")).toUri())
    .apply()

In Twill, a file that needs to be localized doesn’t have to be on HDFS. It can come from a local file, or even an external URL. Twill also supports archive auto-expansion and file renaming. If no file needs to be localized, simply call noLocalFiles() when adding the TwillRunnable.


Arguments

Just like a standalone application, you may want to pass arguments to alter the behavior of your application. In Twill, you can pass arguments to the individual TwillRunnable as well as to the whole TwillApplication. Arguments are passed when launching the application through TwillRunner:

TwillRunner twillRunner = ...
twillRunner.prepare(new MyTwillApplication())
           // Application arguments will be visible to all runnables
           .withApplicationArguments("--debug")  // illustrative value

           // Arguments only visible to instances of a given runnable
           .withArguments("jetty", "--threads", "100")
           .withArguments("logdaemon", "--retain-logs", "5")
           .start();

The arguments can be accessed using the TwillContext object in TwillRunnable. Application arguments are retrieved by calling TwillContext.getApplicationArguments(), while runnable arguments are available through the TwillContext.getArguments() call.
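
Once a runnable has its arguments, it still has to interpret them. Here is a minimal sketch of one way to do that, assuming the arguments arrive as a String array of "--name value" pairs like the ones passed above. RunnableArgs is a hypothetical helper, not part of Twill, and a real application may prefer a command-line parsing library.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of Twill: turns {"--threads", "100"} into {threads=100}.
final class RunnableArgs {

  static Map<String, String> parse(String[] args) {
    Map<String, String> options = new LinkedHashMap<>();
    for (int i = 0; i < args.length; i++) {
      String arg = args[i];
      if (!arg.startsWith("--")) {
        throw new IllegalArgumentException("Expected an option starting with --, got: " + arg);
      }
      String key = arg.substring(2);
      // An option followed by another option (or by nothing) is treated as a boolean flag.
      boolean hasValue = i + 1 < args.length && !args[i + 1].startsWith("--");
      options.put(key, hasValue ? args[++i] : "true");
    }
    return options;
  }

  public static void main(String[] unused) {
    // In a runnable, this array would come from the TwillContext instead.
    Map<String, String> options = parse(new String[] {"--threads", "100"});
    int threads = Integer.parseInt(options.getOrDefault("threads", "10"));
    System.out.println("threads = " + threads);
  }
}
```

With the Jetty arguments from the example, this prints "threads = 100". The fallback of 10 threads is an arbitrary default chosen for the sketch.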

Service discovery

When launching your application in YARN, you don’t know where your containers will be running, and the hosts can change over time due to container or machine failure. Twill has built-in service discovery support – you can announce a named service from runnables and later on discover their locations. For example, you can start the Jetty server instances on a random port and announce the address and port of the service.

class JettyServerTwillRunnable extends AbstractTwillRunnable {
  @Override
  public void initialize(TwillContext context) {
    super.initialize(context);
    // Starts Jetty on random port
    int port = startJetty();
    context.announce("jetty", port);
  }
  // run() and startJetty() omitted
}

You can then build a router layer that routes client requests to the Jetty instances in the cluster. The router will look something like this:

TwillController controller = ...
ServiceDiscovered jettyService = controller.discoverService("jetty");   

// The ServiceDiscovered maintains a live list of service endpoints.
// Every time .iterator() is invoked, it gives the latest list of endpoints.
Iterator<Discoverable> itor = jettyService.iterator();
// Pick an endpoint from the list of endpoints.
// ...
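
How to pick an endpoint is left open above. One simple policy is round-robin, sketched here in plain Java. RoundRobinPicker is hypothetical and not part of Twill; it works on any Iterable of InetSocketAddress, on the assumption that each discovered endpoint can be mapped to its socket address before being handed to the picker.

```java
import java.net.InetSocketAddress;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical router helper, not part of Twill. The endpoints are re-read on every
// pick, so a live view that changes over time is always consulted afresh.
final class RoundRobinPicker {
  private final Iterable<InetSocketAddress> endpoints;
  private final AtomicInteger counter = new AtomicInteger();

  RoundRobinPicker(Iterable<InetSocketAddress> endpoints) {
    this.endpoints = endpoints;
  }

  // Returns the next endpoint in rotation, or null when nothing is currently announced.
  InetSocketAddress pick() {
    List<InetSocketAddress> snapshot = new ArrayList<>();
    for (InetSocketAddress address : endpoints) {
      snapshot.add(address);
    }
    if (snapshot.isEmpty()) {
      return null;
    }
    int index = Math.floorMod(counter.getAndIncrement(), snapshot.size());
    return snapshot.get(index);
  }

  public static void main(String[] args) {
    List<InetSocketAddress> live = new ArrayList<>();
    live.add(InetSocketAddress.createUnresolved("host-a", 8080));
    live.add(InetSocketAddress.createUnresolved("host-b", 9090));
    RoundRobinPicker picker = new RoundRobinPicker(live);
    for (int i = 0; i < 3; i++) {
      System.out.println(picker.pick().getPort());
    }
  }
}
```

The host names and ports are made up for the demonstration. Because the snapshot is rebuilt on each call, instances that disappear or are added between requests are picked up without restarting the router.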

Controlling live applications

As mentioned in Programming with Weave, Part I, you can control a running application using TwillController. You can change the number of instances of a runnable by simply doing this:

TwillController controller = ...
ListenableFuture<Integer> changeComplete = controller.changeInstances("jetty", 10);

You can then either block until the change is completed or observe the completion asynchronously by listening on the future.
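
The two styles can be sketched with the JDK alone. In the example below, CompletableFuture stands in for the future returned by changeInstances(), and fakeChangeInstances is a hypothetical placeholder that completes with the requested count; nothing here calls Twill.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// CompletableFuture is a stand-in for the future returned by the controller.
final class InstanceChangeDemo {

  // Hypothetical placeholder for a call such as controller.changeInstances("jetty", count).
  static CompletableFuture<Integer> fakeChangeInstances(int count) {
    return CompletableFuture.supplyAsync(() -> count);
  }

  // Style 1: block the calling thread until the change completes or the timeout expires.
  static int awaitChange(CompletableFuture<Integer> change, long timeoutSeconds) throws Exception {
    return change.get(timeoutSeconds, TimeUnit.SECONDS);
  }

  // Style 2: register a callback and return immediately.
  static CompletableFuture<Void> onChange(CompletableFuture<Integer> change, AtomicInteger sink) {
    return change.thenAccept(sink::set);
  }

  public static void main(String[] args) throws Exception {
    System.out.println("blocking result: " + awaitChange(fakeChangeInstances(10), 5));

    AtomicInteger observed = new AtomicInteger();
    // The extra get() only keeps the demo alive until the callback has run.
    onChange(fakeChangeInstances(10), observed).get(5, TimeUnit.SECONDS);
    System.out.println("async result: " + observed.get());
  }
}
```

With a Guava ListenableFuture, the callback would be attached through addListener or Futures.addCallback rather than thenAccept. In either style, a timeout on the blocking call is worth having so that a stalled change does not hang the caller forever.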


We hope you enjoyed the deep dive into some of the capabilities of Twill. In the next post, we will use an application to illustrate the features highlighted in this post.

*Apache Twill is currently undergoing incubation at the Apache Software Foundation. Help us make it better by becoming a contributor.

  • ripley

    I didn’t find a way to distribute files to the Twill controller node. Say there’s an integrated Kafka client in the Twill controller: is there any way to bind a conf file?

    • Terence Yim

      Hi, you can use the TwillPreparer.withResources() method to add additional files to the container.

      • ripley

        Thanks for the reply, Terence. I’ve already tried that method and saw the file was distributed to the runnables’ nodes, but I’m in a situation where a configuration file has to be deployed to the node the ApplicationMaster is running on (we overrode it to do some special work), and the TwillPreparer.withResources() approach doesn’t seem to work. Is it still possible for that purpose?

        • Terence Yim

          Currently Twill doesn’t allow localizing file to AM directly, since the Twill AM is not part of user code. Can you tell me a bit more of your use case? For example, suppose the configuration file is localized to the AM container, how is it being used?

          • ripley

            It’s from legacy code where some logic was inserted into a subclass of ApplicationMaster, which works together with some other runnables as the central node and also handles command distribution and some life cycle management. We’re seeking to replace this architecture. I guess this won’t be a problem when that’s done 🙂

          • Terence Yim

            Currently the Twill AM doesn’t support execution of custom code. For your use case, it seems like you need to modify Twill itself so that you can subclass the AM. In that case, you can also modify the Twill code to ship any file you want to the AM container. Please feel free to create a Twill JIRA to have support added.

  • han

    Hi Terence

    If runnable has finished execution could it be restarted with same instance id?
    I have some runnables which completed execution while they should not, and I tried to start them again with TwillController’s restartInstances method. The call didn’t work since it tried to verify with an absent instance id and failed with exception.

    Is TwillRunner capable of doing that?

    • Terence Yim


      If a runnable completed successfully, the same instance id won’t get reused and cannot be restarted while the application is running. However, if a runnable terminated with an exception or the JVM process that the runnable is running in was terminated with an error (non-zero exit code), then the Twill AM will restart the runnable process with the same instance id automatically.

