A Look at Automating Cluster Creation in the Cloud with Coopr

David Bajot

David Bajot was a DevOps Engineer at Cask, automating everything that isn’t tied down, including tools to manage the next generation of Big Data applications and anything else that makes developers’ lives easier. Previously, he was the OpenStack Cloud Architect and Senior Linux Engineer at Samsung Research America.

Coopr is a cluster provisioning system designed to fully facilitate cluster lifecycle management in public and private clouds. In this blog, we will take an inside look at what happens when Coopr provisions a cluster.

Deploying clusters can be time-consuming. For many system deployments, this work can be accomplished with a configuration management tool such as Chef or Puppet. Cluster deployments include at least two additional steps: initialization and startup. These two are trickier because they require orchestration between machines. Most configuration management tools are oriented towards managing a single node, while cluster builds require determining and applying many steps in a specific order to meet dependencies. Coopr helps solve this problem by combining predefined cluster templates with user-provided properties to determine a requisite cluster layout and orchestrate its deployment.

A Word about Coopr

Coopr allows users to deploy clusters to selected cloud providers while orchestrating the installation, configuration, initialization, and startup of software and services. In this example, we will use the fog library for Ruby to communicate with a cloud API service while leveraging Chef to deploy and configure services. While our example happens to use the bundled fog provider and chef-solo automator, Coopr is fully pluggable.
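
As a rough illustration of the building blocks involved, here is a minimal fog sketch for requesting a machine from a cloud API. This is not Coopr’s plugin code; the provider, image, and flavor values are placeholders:

require 'fog'

# Connect to a cloud provider's API through fog (credentials and region are placeholders).
compute = Fog::Compute.new(
  provider:              'AWS',
  aws_access_key_id:     ENV['AWS_ACCESS_KEY_ID'],
  aws_secret_access_key: ENV['AWS_SECRET_ACCESS_KEY'],
  region:                'us-east-1'
)

# Request a machine and wait until the cloud reports it ready.
server = compute.servers.create(image_id: 'ami-12345678', flavor_id: 'm3.medium')
server.wait_for { ready? }
puts "Provisioned #{server.id} at #{server.public_ip_address}"

Coopr’s bundled fog provider handles this kind of interaction on the user’s behalf, and the chef-solo automator takes over once the machine is reachable.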

Coopr provides the user with a cluster-level interface and translates cluster operations into individual machine operations by configuring and executing individual chef-solo runs. This frees the user from having to specify an order of execution and offers more flexibility in the automation and orchestration of tasks, giving users the freedom to deploy systems without depending on an operations team for every task. Learn how Coopr works by reading the Coopr documentation overview, Under the Hood.

The Fun Starts Here

Let’s walk through a cluster deployment in Coopr with a real-life example: a full install of Hadoop on a single-node, pseudo-distributed cluster. One of the great things about Coopr is that you can follow the orchestration process in the logs and the UI.

The Orchestration

In the UI, clicking on ‘Show Actions’ in the lower right-hand corner shows the steps Coopr is running. Here is just a sample (the most recent ones are at the top):

[Screenshot: the task list in the Coopr UI]

  • CREATE, CONFIRM and BOOTSTRAP are responsible for setting up the node
  • INSTALL runs package installation steps
  • CONFIGURE does typical configuration for all the components
  • INITIALIZE and START run all the application initialization and startup steps one would normally run manually

The INITIALIZE and START tasks will be our main focus here.

a. Task Order Determination

Task order is a critical part of orchestration, and this is where Coopr shines, setting it apart from manual installations. Let’s use the Hadoop YARN ResourceManager as an example.

Initializing YARN requires HDFS to be up. In order for HDFS to be up, the Namenode must be formatted, and both the Namenode and Datanodes must be started. A simple user-provided constraint set in the YARN ResourceManager service configuration makes Coopr aware of these rules. Coopr even ensures these rules are met across all machines in the cluster.

This is what these constraints look like logically:

[Diagram: the task order constraints, shown logically]

The constraints can be seen in the Coopr UI, under Services => hadoop-yarn-resourcemanager:

[Screenshot: the hadoop-yarn-resourcemanager service constraints in the Coopr UI]

Coopr’s provisioner solves the constraints into the following task order:

[Diagram: the resolved task order]

Without Coopr, the user would run these steps and all other initialization and startup steps manually, while keeping track of the state of the various services, making sure each service was up prior to manually initializing or starting up the next.
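
To make the ordering concrete, here is a minimal sketch, in Ruby, of how constraints like these can be resolved into stages of tasks whose prerequisites are already satisfied. The task names and the dependency map are illustrative only and do not reflect Coopr’s actual service schema or planner:

# Illustrative only: each task maps to the tasks that must complete before it can run.
dependencies = {
  'format hdfs-namenode'            => [],
  'start hdfs-datanode'             => [],
  'start hdfs-namenode'             => ['format hdfs-namenode'],
  'initialize yarn-resourcemanager' => ['start hdfs-namenode', 'start hdfs-datanode'],
  'start yarn-resourcemanager'      => ['initialize yarn-resourcemanager']
}

# Group tasks into stages: a task is ready once all of its prerequisites
# have completed in an earlier stage.
done  = []
stage = 1
until done.size == dependencies.size
  ready = dependencies.keys.select { |t| !done.include?(t) && (dependencies[t] - done).empty? }
  raise 'unsatisfiable constraints' if ready.empty?
  puts "Stage #{stage}: #{ready.join(', ')}"
  done += ready
  stage += 1
end

Tasks that land in the same stage have no outstanding dependencies on one another, which is what allows Coopr to run them in parallel across all the machines in the cluster.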

b. A Deeper Look Inside

For a better understanding of the inner workings of Coopr, take a look at the Coopr architecture.

Choosing a template during cluster creation tells Coopr to apply a set of services and constraints. This is a simple but powerful concept, as Coopr takes those services and constraints and solves them into the stages of task execution. All tasks within these stages can be run in parallel across a cluster. You can see all the steps in the system creation and cluster orchestration in the Coopr Provisioner’s coopr-provisioner.log file. Each task performed has entries in the log file that will look like this:

2015-06-18 14:01:38 -0700 david-demo-happy-coopr.net.11447 DEBUG: Received task from server <{"taskId":"00000015-044-275","jobId":"00000001-002","clusterId":"00000001","taskName":"INITIALIZE","nodeId":"242945c9-576d-4c98-8c1d-6759faa6c58a","config" [...]

Every task has a specific task ID. Let’s make a note of this one: 00000015-044-275.

In our example, this task (INITIALIZE hadoop-hdfs-namenode) is responsible for formatting the namenode. The Coopr chef-solo automator stores each task’s JSON configuration in /var/cache/coopr on your instance.

Each of these files contains all the information that was given to the provisioner to run its task: heap size, log directory, DFS replication, block size, YARN configuration, etc.
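
If you want to peek inside one of these files, any JSON viewer will do. As a small example in Ruby, using the task ID from the log entry above:

require 'json'

# Load the task configuration the provisioner handed to chef-solo and list its top-level keys.
task = JSON.parse(File.read('/var/cache/coopr/00000015-044-275.json'))
puts task.keys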

Note: These JSON files are an important troubleshooting tool when creating a node or cluster, as they help determine the last tasks performed by a provisioner.

To further help troubleshoot provisioning tasks, one can search for a task number in the coopr-provisioner.log file and find all log entries for that task. Here is an example, searching for task ID 00000015-044-275:

2015-06-18 14:02:52 -0700 david-demo-happy-coopr.net.11447 DEBUG: ---ssh-exec command: sudo chef-solo -j /var/cache/coopr/00000015-044-275.json -o 'recipe[hadoop_wrapper::hadoop_hdfs_namenode_init]'

Looking at coopr-provisioner.log again, a few more lines indicate the task’s progress. The last two confirm that the task completed successfully:

2015-06-18 14:03:02 -0700 [...] DEBUG: Chef-solo run completed successfully for task 00000015-044-275: {"status"=>0}
2015-06-18 14:03:02 -0700 [...] DEBUG: Task <00000015-044-275> completed, updating results <{"status"=>0, "workerId"=>"[...]", "taskId"=>"00000015-044-275", "provisionerId"=>"my-demo-id", "tenantId"=>"superadmin"}>

This was Coopr in action, handling the most tedious parts of setting up a Hadoop cluster.

Where Do You Go From Here?

If you’d like to try Coopr for yourself, install it using the quickstart guide, making sure to configure a provider.

You can also explore the available templates in Coopr, such as CDAP, LAMP stack, or MEAN stack. Once you have tried any of the templates included with Coopr, you can create your own template (e.g. start with the base template and build on top of that). Coopr is open source software and we encourage you to share your templates with the community.
