Monitoring Key Hadoop Operational Statistics using CDAP

Bhooshan Mogal

Bhooshan Mogal is a Software Engineer at Cask, where he is working on making data application development fun and simple. Before Cask, he worked on a unified storage abstraction for Hadoop at Pivotal and personalization systems at Yahoo.

The Cask Data Application Platform (CDAP) is the first Unified Integration Platform for Big Data. It gives users higher-level abstractions and APIs over complex, low-level systems for building Big Data applications, and it does the heavy lifting of integrating the various platforms in the Apache Hadoop ecosystem into a single end-to-end platform. To achieve this, CDAP depends on the services provided by underlying components such as HDFS, Yarn, Apache HBase, Apache Hive, and Apache Spark. The health of CDAP as a platform therefore depends on these services being healthy and available to serve requests, so monitoring them is important to ensure that CDAP and its applications run smoothly. To make this easier, CDAP 4.0 introduced a new feature called Operational Stats, which lets users monitor all of these services from one central location in the CDAP UI.

Let’s go over a few key requirements for the Operational Stats feature:

  1. The system needs to be highly extensible. Different deployments of CDAP may run different flavors of Hadoop, and they may also have other systems, such as reporting tools, working in the same environment as CDAP. It should be possible to monitor systems other than the ones that CDAP supports out of the box, without requiring changes to the CDAP platform.
  2. Re-use existing, mainstream monitoring technologies and protocols, so that users can choose to use third-party tools for monitoring and debugging.
  3. Avoid overloading underlying systems. The operational stats system may get a high volume of requests, but it should serve them efficiently and not overload the underlying systems while doing so.
  4. The system should be able to serve stats for a given service name, grouped by stat type: for example, stats of type storage for the HDFS service, or stats of type resources for the Yarn service.

Ultimately, with these requirements, the system should be able to serve a user interface like:

[Figure: three screenshots of the Operational Stats view in the CDAP UI]

Design Principles

To fulfill the requirements above, we architected Operational Stats around the following key design tenets:

Extensibility

To address extensibility, the Operational Stats feature re-uses the Java ServiceLoader architecture. OperationalStats is defined as a service that is implemented by various service providers, such as HDFS, Yarn, HBase, and CDAP itself. In addition, to make it possible to group stats by category, a separate service provider is created for each category (type) of stats for a given component. For example, HDFS has one service provider each for stats of type info, storage, and nodes.
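The discovery and grouping described above can be sketched with the standard java.util.ServiceLoader API. This is a minimal illustration, not CDAP's actual code: StatsProvider is a hypothetical stand-in for the OperationalStats service, and StatsRegistry is an invented helper name.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.ServiceLoader;
import java.util.TreeMap;

// Hypothetical stand-in for the OperationalStats service interface; only the
// two identifying methods are modeled here.
interface StatsProvider {
  String getServiceName();
  String getStatType();
}

final class StatsRegistry {
  private StatsRegistry() { }

  // Groups providers by service name, e.g. HDFS -> [info, storage, nodes].
  static Map<String, List<String>> index(Iterable<? extends StatsProvider> providers) {
    Map<String, List<String>> statTypesByService = new TreeMap<>();
    for (StatsProvider provider : providers) {
      statTypesByService
          .computeIfAbsent(provider.getServiceName(), name -> new ArrayList<>())
          .add(provider.getStatType());
    }
    return statTypesByService;
  }

  // ServiceLoader is an Iterable that lazily instantiates every implementation
  // listed in META-INF/services/<fully-qualified interface name> on the classpath.
  static Map<String, List<String>> discover() {
    return index(ServiceLoader.load(StatsProvider.class));
  }
}
```

Because each provider covers one (service, stat type) pair, adding a new category of stats means adding one more provider class rather than changing an existing one.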

Reusability

Operational Stats are reported as JMX metrics. Hence, even though Operational Stats are displayed in the CDAP 4.0 UI, users can also plug them into their existing monitoring infrastructure, such as Nagios or Ganglia, simply by connecting to the CDAP Master process’ MBeanServer. Developers can also connect JConsole to the CDAP Master process for debugging.
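To give a feel for what a monitoring tool does once it is connected, here is a small sketch that reads every readable attribute of the MBeans matching an ObjectName pattern. The ObjectName domain that CDAP uses for Operational Stats is not covered in this post, so the example runs against the JVM's built-in java.lang:type=Runtime bean instead; MBeanDump is an invented class name.

```java
import java.lang.management.ManagementFactory;
import java.util.Map;
import java.util.TreeMap;
import javax.management.MBeanAttributeInfo;
import javax.management.MBeanServerConnection;
import javax.management.ObjectName;

final class MBeanDump {
  private MBeanDump() { }

  // Reads every readable attribute of every MBean whose name matches the pattern.
  static Map<String, Object> read(MBeanServerConnection server, String pattern) throws Exception {
    Map<String, Object> values = new TreeMap<>();
    for (ObjectName name : server.queryNames(new ObjectName(pattern), null)) {
      for (MBeanAttributeInfo attribute : server.getMBeanInfo(name).getAttributes()) {
        if (!attribute.isReadable()) {
          continue;
        }
        try {
          values.put(name + "#" + attribute.getName(),
                     server.getAttribute(name, attribute.getName()));
        } catch (Exception e) {
          // Some attributes are unsupported on some JVMs; skip them.
        }
      }
    }
    return values;
  }

  public static void main(String[] args) throws Exception {
    // Local MBeanServer of this JVM; a remote tool would obtain an
    // MBeanServerConnection from a JMXConnector instead.
    MBeanServerConnection server = ManagementFactory.getPlatformMBeanServer();
    read(server, "java.lang:type=Runtime")
        .forEach((key, value) -> System.out.println(key + " = " + value));
  }
}
```

The method takes an MBeanServerConnection rather than an MBeanServer, so the same code should work unchanged against a remote process, assuming that process has remote JMX access enabled.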

Efficiency

As described above, another key requirement of the Operational Stats system is to serve requests efficiently without overloading the underlying systems. To achieve this, Operational Stats are cached, and Collectors run asynchronously at configurable intervals to refresh them. This way, serving a request does not require an RPC call to the underlying systems.
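The cache-plus-collector pattern can be sketched roughly as follows. This is an illustration of the idea only, under the assumption of a single value per stat; CachedStat and its methods are invented names, not CDAP classes.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Supplier;

// Serves the most recently collected value; only collect() touches the source.
final class CachedStat<T> implements AutoCloseable {
  private final Supplier<T> source;  // the expensive call, e.g. an RPC
  private final AtomicReference<T> latest = new AtomicReference<>();
  private final ScheduledExecutorService scheduler =
      Executors.newSingleThreadScheduledExecutor(runnable -> {
        Thread thread = new Thread(runnable, "stat-collector");
        thread.setDaemon(true);
        return thread;
      });

  CachedStat(Supplier<T> source) {
    this.source = source;
  }

  // Refreshes the cached value from the source.
  void collect() {
    latest.set(source.get());
  }

  // Runs collect() in the background, waiting `period` between runs.
  void start(long period, TimeUnit unit) {
    scheduler.scheduleWithFixedDelay(() -> {
      try {
        collect();
      } catch (RuntimeException e) {
        // Keep the last good value; an uncaught exception would cancel the schedule.
      }
    }, 0, period, unit);
  }

  // Cheap read path: never calls the source. Returns null before the first collect().
  T get() {
    return latest.get();
  }

  @Override
  public void close() {
    scheduler.shutdownNow();
  }
}
```

With this shape, the load on the underlying system is set by the collection interval and is independent of how many requests read the stat.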

Out-of-box Operational Stats

The following figure represents the Operational Stats that are available out of the box with CDAP 4.0:

[Figure: Operational Stats available out of the box with CDAP 4.0]

Developing custom Operational Stats Extensions

As described before, the Operational Stats system uses JMX for reporting and retrieving stats. To write a new operational stats extension, users can follow these steps:

1. Define the MXBean for your extension

As with any other JMX metric, start by defining the MXBean interface for your operational stats extension.

2. Implement the MXBean

This step implements the MXBean defined in the previous step. Additionally, for the CDAP Master to recognize this Operational Stat extension, this class should either implement the OperationalStats interface or extend the AbstractOperationalStats class. The main methods to implement are described below:

| Method | Description |
| --- | --- |
| String getServiceName() | Defines the name of the service for which stats are being reported, e.g. “HDFS” for all types of HDFS stats. Stats for a given service can be retrieved via RESTful API. |
| String getStatType() | Defines the type of stat being reported for the service specified by getServiceName, e.g. “Storage” for storage stats. Stats for a given service, grouped by the stat type, can be retrieved via RESTful API. The service name and stat type together uniquely identify an Operational Stat. |
| void collect() | Collects operational stats for a given service name and stat type. This method is executed asynchronously at configurable intervals of time, and is expected to refresh stats. |
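Steps 1 and 2 together might look like the sketch below. Everything specific in it is an assumption made for illustration: the JobQueue service, the queue stat type, the PendingJobs attribute, and the example.operations ObjectName are invented, and the OperationalStats interface shown is a local stand-in containing only the three methods described above so that the example compiles on its own. A real extension would build against the CDAP artifact, whose interface may declare more methods.

```java
import java.lang.management.ManagementFactory;
import java.util.function.LongSupplier;
import javax.management.MBeanServer;
import javax.management.ObjectName;

// Local stand-in (an assumption) for co.cask.cdap.operations.OperationalStats,
// limited to the three methods described above.
interface OperationalStats {
  String getServiceName();
  String getStatType();
  void collect();
}

final class JobQueueExtension {
  private JobQueueExtension() { }

  // Step 1: the MXBean interface. JMX requires it to be public; it is nested
  // here only so that the sketch fits in one file.
  public interface JobQueueStatsMXBean {
    long getPendingJobs();  // exposed as the JMX attribute "PendingJobs"
  }

  // Step 2: the implementation. ServiceLoader needs a public no-arg constructor.
  public static final class JobQueueStats implements JobQueueStatsMXBean, OperationalStats {
    private final LongSupplier backend;  // hypothetical call into the monitored system
    private volatile long pendingJobs;   // cached value served to JMX readers

    public JobQueueStats() {
      this(() -> 0L);
    }

    JobQueueStats(LongSupplier backend) {
      this.backend = backend;
    }

    @Override
    public String getServiceName() {
      return "JobQueue";
    }

    @Override
    public String getStatType() {
      return "queue";
    }

    // Called periodically; the only place that talks to the backend.
    @Override
    public void collect() {
      pendingJobs = backend.getAsLong();
    }

    @Override
    public long getPendingJobs() {
      return pendingJobs;
    }
  }

  // Local smoke test: register the bean, refresh it once, read it back over JMX.
  public static void main(String[] args) throws Exception {
    JobQueueStats stats = new JobQueueStats(() -> 42L);
    MBeanServer server = ManagementFactory.getPlatformMBeanServer();
    ObjectName name = new ObjectName("example.operations:name=JobQueue,type=queue");
    server.registerMBean(stats, name);
    stats.collect();
    System.out.println(server.getAttribute(name, "PendingJobs"));
  }
}
```

The getter only returns the cached field, so JMX reads stay cheap no matter how slow the backend call inside collect() is. The main method is just for trying the bean locally; in a deployed extension the CDAP Master is expected to handle registration and scheduling.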

3. Define the ServiceLoader configuration file

Create a file named co.cask.cdap.operations.OperationalStats in the META-INF/services directory of your project. For Maven projects, this directory should be created under src/main/resources. The file follows the standard Java ServiceLoader configuration file structure: it should contain the fully-qualified class name of each extension, one extension per line.
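As an illustration, for a Maven project with two extension classes the file could look like this. The com.example.operations class names are hypothetical placeholders for your own classes; lines starting with # are comments in the ServiceLoader file format.

```
# src/main/resources/META-INF/services/co.cask.cdap.operations.OperationalStats
com.example.operations.JobQueueStats
com.example.operations.JobQueueNodeStats
```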

4. Package and install

Build a JAR file for your project and place it under [CDAP_Master_Installation_Directory]/ext/operations/[Your_Extension_Name]/ on the CDAP Master host.
After installing the operational stats extension, the CDAP Master should be restarted.

RESTful APIs

The Operational Stats system also provides RESTful APIs that allow users to query statistics:

| API | Description |
| --- | --- |
| /v3/system/serviceproviders | Returns all the service providers for which operational stats are emitted by the CDAP Master. The output includes the stats of type info for each service provider. |
| /v3/system/serviceproviders/{service-provider}/stats | Returns all the stats for the specified service provider, grouped by the stat type. |
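A minimal client for these two endpoints could look like the sketch below, using only the JDK. The base URL http://localhost:11015 and the provider name hdfs are assumptions for illustration; substitute the host and port of your CDAP router and a provider name returned by the first endpoint. OperationalStatsClient is an invented class name, and the sketch does not handle authentication, which a secured cluster would require.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URI;

final class OperationalStatsClient {
  private final URI base;

  OperationalStatsClient(String baseUrl) {
    // A trailing slash makes URI.resolve append to the base instead of replacing its last segment.
    this.base = URI.create(baseUrl.endsWith("/") ? baseUrl : baseUrl + "/");
  }

  // Endpoint that lists all service providers.
  URI serviceProviders() {
    return base.resolve("v3/system/serviceproviders");
  }

  // Endpoint that returns all stats for one service provider.
  URI stats(String serviceProvider) {
    return base.resolve("v3/system/serviceproviders/" + serviceProvider + "/stats");
  }

  // Issues a GET request and returns the response body as text.
  static String get(URI uri) throws IOException {
    HttpURLConnection connection = (HttpURLConnection) uri.toURL().openConnection();
    connection.setRequestMethod("GET");
    connection.setConnectTimeout(5000);
    connection.setReadTimeout(5000);
    try (InputStream in = connection.getInputStream()) {
      ByteArrayOutputStream body = new ByteArrayOutputStream();
      byte[] buffer = new byte[8192];
      for (int n = in.read(buffer); n != -1; n = in.read(buffer)) {
        body.write(buffer, 0, n);
      }
      return body.toString("UTF-8");
    } finally {
      connection.disconnect();
    }
  }

  public static void main(String[] args) throws IOException {
    // Assumed default; pass your own base URL as the first argument.
    String baseUrl = args.length > 0 ? args[0] : "http://localhost:11015";
    OperationalStatsClient client = new OperationalStatsClient(baseUrl);
    System.out.println(get(client.serviceProviders()));
    System.out.println(get(client.stats("hdfs")));
  }
}
```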

Conclusion

We hope you enjoyed this blog post on getting end-to-end operational insights for your Hadoop clusters using CDAP. For more information on operational stats and other new features in CDAP 4.0, including Cask Market, be sure to check out the CDAP 4.0 release blog, and take CDAP 4.0 for a spin. We’d love to hear any feedback about this and other features of CDAP.
