Securely storing sensitive information and using it in CDAP

Nishith Nand

Nishith Nand is a software engineer at Cask, where he builds the platform for the next generation of data applications. Prior to Cask, he built high-performance, large-scale distributed systems at Pepperdata and Hedvig.

The need for Secure Store

CDAP and Hadoop cluster admins often need to store sensitive information, such as passwords, access tokens, and private keys, that is used to access resources during cluster operation. This data must be protected from unauthorized access while remaining available to authorized entities during program execution.

To fulfill this requirement, CDAP 3.5 introduced the CDAP Secure Store, which supports safe storage of, and authorized access to, sensitive data. CDAP programs and Hydrator pipelines can retrieve the passwords, access tokens, and other entries they are authorized to read from the store whenever they need them at runtime.

Backend storage

CDAP secure store supports two backing stores to safely store the data.

In the InMemory and Standalone modes, the Java JCEKS file based keystore is used. The path to the keystore file and the name of the file are defined in the default configuration and can be changed from there. The password to protect the file needs to be set in cdap-security.xml.
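To make the file-based option concrete, here is a minimal sketch of storing and reading a secret in a JCEKS keystore using only the standard java.security.KeyStore API. It is not CDAP's implementation: the class name JceksSketch, the temporary file, and the choice to wrap the secret bytes in a SecretKeySpec labeled "AES" are assumptions made for illustration.

```java
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.KeyStore;
import javax.crypto.SecretKey;
import javax.crypto.spec.SecretKeySpec;

// Illustrative only: a hypothetical helper, not part of CDAP.
class JceksSketch {

  // Writes `secret` under `alias` into a new JCEKS file protected by `password`.
  static void store(Path file, char[] password, String alias, String secret) throws Exception {
    KeyStore ks = KeyStore.getInstance("JCEKS");
    ks.load(null, password); // a null stream starts from an empty keystore
    // Wrap the raw bytes as a SecretKey; the "AES" label is an assumption of this sketch.
    SecretKey key = new SecretKeySpec(secret.getBytes(StandardCharsets.UTF_8), "AES");
    ks.setEntry(alias, new KeyStore.SecretKeyEntry(key), new KeyStore.PasswordProtection(password));
    try (OutputStream out = Files.newOutputStream(file)) {
      ks.store(out, password);
    }
  }

  // Loads the keystore file and returns the secret stored under `alias`.
  static String read(Path file, char[] password, String alias) throws Exception {
    KeyStore ks = KeyStore.getInstance("JCEKS");
    try (InputStream in = Files.newInputStream(file)) {
      ks.load(in, password);
    }
    KeyStore.SecretKeyEntry entry =
        (KeyStore.SecretKeyEntry) ks.getEntry(alias, new KeyStore.PasswordProtection(password));
    return new String(entry.getSecretKey().getEncoded(), StandardCharsets.UTF_8);
  }

  // Stores and reads back a secret through a temporary keystore file.
  static String roundTrip(String alias, String secret, char[] password) {
    try {
      Path file = Files.createTempFile("securestore", ".jceks");
      try {
        store(file, password, alias, secret);
        return read(file, password, alias);
      } finally {
        Files.deleteIfExists(file);
      }
    } catch (Exception e) {
      throw new IllegalStateException("Keystore round trip failed", e);
    }
  }
}
```

Calling JceksSketch.roundTrip("accessid", "my-secret", "changeit".toCharArray()) writes the entry to a temporary keystore file and reads it back. The same password protects both the file and the entry here, which keeps the sketch short but is a simplification.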

In distributed mode, CDAP secure store uses Hadoop KMS to store the data. Hadoop KMS is a cryptographic key management server based on Hadoop’s KeyProvider API. It provides client and server components that communicate over HTTP using a RESTful API. The client is a KeyProvider implementation that interacts with the KMS through that API.

Hadoop KMS is a proxy that interfaces with a backing key store on behalf of HDFS daemons and clients. Both the backing key store and the KMS implement the Hadoop KeyProvider API. A default Java keystore is provided for testing but is not recommended for production use. Cloudera provides Navigator Key Trustee for production clusters. Hortonworks recommends using Ranger KMS.

To use Hadoop KMS as the store, set the “security.store.provider” property to “kms”; the secure store then picks up the KMS provider path from the Hadoop configuration. KMS requires Apache Hadoop 2.6.0 or higher, or a distribution based on Apache Hadoop 2.6.0 or higher.

Accessing the Store in CDAP programs

The secure store exposes both RESTful and programmatic APIs. Data can be added to the store with a RESTful PUT call:

PUT /v3/namespaces/<namespace-id>/securekeys/<secure-key-id>

With a JSON formatted body:

{
  "description": "Example Secure Key",
  "data": <secure-contents>,
  "properties": {
    "<property-key>": "<property-value>"
  }
}
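As a rough illustration of the PUT call, the sketch below builds the request with the JDK's java.net.http API (Java 11 or later) without sending it. The base URL http://localhost:11015, the class name SecureKeyPutSketch, and the decision to send the data as a JSON string with an empty properties map are assumptions; adjust them to your CDAP router address and payload.

```java
import java.net.URI;
import java.net.http.HttpRequest;

// Illustrative only: a hypothetical helper, not part of the CDAP client libraries.
class SecureKeyPutSketch {

  // Placeholder router address (an assumption); replace with your CDAP host and port.
  static final String BASE = "http://localhost:11015";

  // Escapes backslashes and double quotes so the value is safe inside a JSON string.
  static String escape(String s) {
    return s.replace("\\", "\\\\").replace("\"", "\\\"");
  }

  // Builds the JSON body shown above, with an empty properties map.
  static String body(String description, String data) {
    return "{\"description\": \"" + escape(description) + "\", "
        + "\"data\": \"" + escape(data) + "\", "
        + "\"properties\": {}}";
  }

  // Builds (but does not send) the PUT request for the given namespace and key.
  static HttpRequest putRequest(String namespace, String key, String description, String data) {
    URI uri = URI.create(BASE + "/v3/namespaces/" + namespace + "/securekeys/" + key);
    return HttpRequest.newBuilder(uri)
        .header("Content-Type", "application/json")
        .PUT(HttpRequest.BodyPublishers.ofString(body(description, data)))
        .build();
  }
}
```

The request could then be sent with HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString()). On a cluster with security enabled, an access token header would also be needed, which this sketch leaves out.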

Each entry in the store comprises the following:

  • Name – This is an identifier for the stored data
  • Description – A description of the stored key
  • Value – The sensitive data that needs to be stored securely
  • Metadata – Information about the item stored, e.g. creation time
  • Properties – Key value map of properties of the stored data

Entries in the store can be created through the RESTful and the programmatic APIs.

Accessing store using the programmatic APIs

Getting the data

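// get() returns the stored value as bytes, so decode it explicitly
String value = new String(getContext().getSecureData(namespace, "accessid").get(), StandardCharsets.UTF_8);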

Listing all secure keys

getContext().listSecureData(namespace);

Secure keys can be managed in programs using the Admin object returned by getAdmin():

Admin admin = getContext().getAdmin();

Adding a new key

getContext().getAdmin().putSecureData(namespace, "accessid", new String(value), "S3 access key", new HashMap<String, String>());

Removing a key

getContext().getAdmin().deleteSecureData(namespace, "accessid");

Accessing Secure store from Hydrator pipelines

Hydrator pipelines can read sensitive information from the secure store through macros. For example, the accessId field of a plugin can be populated from the secure store by setting its value to ${secure(accessid)}.

${} is used to denote a macro, the keyword “secure” specifies that we want to access the secure store, and the string inside the parentheses is the key whose value we want to retrieve from the store. The key will be searched for under the namespace the app is deployed in.
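The macro syntax described above can be sketched as a simple substitution. This is not Hydrator's macro evaluator; it is a minimal stand-in that uses a regular expression and an in-memory map (both assumptions) in place of the real secure store lookup for the app's namespace. The class name SecureMacroSketch is hypothetical.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative only: mimics ${secure(key)} substitution against an in-memory map.
class SecureMacroSketch {

  // Matches ${secure(<key>)} and captures the key between the parentheses.
  private static final Pattern SECURE = Pattern.compile("\\$\\{secure\\(([^)]+)\\)\\}");

  // Replaces every secure macro in `raw` with the value stored under its key.
  static String resolve(String raw, Map<String, String> store) {
    Matcher m = SECURE.matcher(raw);
    StringBuilder out = new StringBuilder();
    while (m.find()) {
      String key = m.group(1).trim();
      String value = store.get(key);
      if (value == null) {
        throw new IllegalArgumentException("No secure key named " + key);
      }
      // quoteReplacement keeps '$' and '\' in the secret from being treated as group references
      m.appendReplacement(out, Matcher.quoteReplacement(value));
    }
    m.appendTail(out);
    return out.toString();
  }
}
```

With a map containing accessid mapped to some value, resolve("${secure(accessid)}", map) returns that value, and text outside the macro is left unchanged. A missing key raises an error rather than silently producing an empty field.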

Accessing Secure store through RESTful API

Retrieve a Secure Key

GET /v3/namespaces/<namespace-id>/securekeys/<secure-key-id>

Retrieve the Metadata for a Secure Key

GET /v3/namespaces/<namespace-id>/securekeys/<secure-key-id>/metadata

List all Secure Keys

GET /v3/namespaces/<namespace-id>/securekeys

Remove a Secure Key

DELETE /v3/namespaces/<namespace-id>/securekeys/<secure-key-id>
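The four endpoints above can be exercised from any HTTP client. As a hedged sketch, the helper below builds the corresponding requests with java.net.http (Java 11 or later) without sending them. The base URL http://localhost:11015 and the class name SecureKeyRequests are assumptions, and authentication headers are omitted.

```java
import java.net.URI;
import java.net.http.HttpRequest;

// Illustrative only: a hypothetical request builder for the secure key endpoints.
class SecureKeyRequests {

  // Placeholder router address (an assumption); replace with your CDAP host and port.
  static final String BASE = "http://localhost:11015";

  // URI of the secure key collection in a namespace.
  static URI keysUri(String namespace) {
    return URI.create(BASE + "/v3/namespaces/" + namespace + "/securekeys");
  }

  // URI of a single secure key.
  static URI keyUri(String namespace, String key) {
    return URI.create(keysUri(namespace) + "/" + key);
  }

  // Retrieve a secure key.
  static HttpRequest get(String namespace, String key) {
    return HttpRequest.newBuilder(keyUri(namespace, key)).GET().build();
  }

  // Retrieve the metadata for a secure key.
  static HttpRequest metadata(String namespace, String key) {
    return HttpRequest.newBuilder(URI.create(keyUri(namespace, key) + "/metadata")).GET().build();
  }

  // List all secure keys in a namespace.
  static HttpRequest list(String namespace) {
    return HttpRequest.newBuilder(keysUri(namespace)).GET().build();
  }

  // Remove a secure key.
  static HttpRequest delete(String namespace, String key) {
    return HttpRequest.newBuilder(keyUri(namespace, key)).DELETE().build();
  }
}
```

Each request can be sent with HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString()).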

Authorization

Access to the data stored in the secure store is controlled by CDAP’s fine-grained authorization policies. Entries in the secure store are scoped by namespace, so different namespaces can contain entries with the same key name. To create a new key in a namespace, the user needs write permission on the namespace. To read an entry, the user needs read permission on that entry.

Find out more about how to set up and use CDAP secure store: “Using secure store”.

In this post, we described how to store sensitive data securely and how to use it in CDAP.
I encourage you to download the latest CDAP, and give these new features a spin. We look forward to any questions and comments you may have!