Use HDFS Transparent Encryption

HDFS Transparent Encryption protects Hadoop data at rest on disk. When encryption is enabled for a cluster, data written to and read from encryption zones (HDFS directories) is automatically encrypted and decrypted. The process is “transparent” because it’s invisible to the application working with the data. HDFS Transparent Encryption does not affect user access to Hadoop data, although it can have a minor impact on performance.

Prerequisite

The cluster where you want to use HDFS Transparent Encryption must have Kerberos enabled.

Important:

Security Setup must be enabled when creating the cluster. The person creating the cluster must choose the Security Setup: Enabled option on the Security page of the Create Cluster wizard, as described in Create a Cluster. You can’t enable Kerberos for a cluster after it’s been created.

When you create a cluster with Security Setup enabled, the following takes place:

  • HDFS Transparent Encryption is enabled on the cluster. You can verify this by entering the following at the command line:

    bdacli getinfo cluster_hdfs_transparent_encryption_enabled

  • MIT Kerberos, Sentry, Network Firewall, Network Encryption, and Auditing are also enabled on the cluster.

  • Two principals are created as part of the Kerberos configuration:

    • hdfs/clustername@BDACLOUDSERVICE.ORACLE.COM — The password for authenticating this principal is your Cloudera admin password.

    • oracle/clustername@BDACLOUDSERVICE.ORACLE.COM — The password for authenticating this principal is your Oracle operating system password.

    In both cases, clustername is the name of your cluster and BDACLOUDSERVICE.ORACLE.COM is the Kerberos realm for Oracle Big Data Cloud Service.

  • A Key Trustee Server is installed and configured on the cluster. This server is used for managing keys and certificates for HDFS Transparent Encryption. See Cloudera Navigator Key Trustee Server for more information about this server. (You should back up Key Trustee Server databases and configuration files on a regular schedule. See the Cloudera documentation topic, Backing Up and Restoring Key Trustee Server.)

Creating Encryption Zones on HDFS

An encryption zone is an HDFS directory in which the contents are encrypted on a write operation and decrypted on a read operation.

See Also:

Cloudera documentation Managing Encryption Keys and Zones.

Prerequisites:

  1. Make sure all services are healthy in Cloudera Manager, in particular the Key Trustee service.

  2. Make sure the two KMS hosts are in sync.

    On each KMS host, run the commands below as the root user. The output should be the same on each host. If the output differs, the two Key Management Servers are not synchronized; open a service request (SR) with Oracle Support.

    # ls -l /var/lib/kms-keytrustee/keytrustee/.keytrustee 
    # cksum /var/lib/kms-keytrustee/keytrustee/.keytrustee/* 
    # gpg --homedir /var/lib/kms-keytrustee/keytrustee/.keytrustee --fingerprint
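
The checksum comparison in prerequisite 2 can also be scripted from a single node. The following is a minimal sketch, not an Oracle-supplied tool: it assumes passwordless SSH as root to both KMS hosts, the host names in the usage comment are hypothetical, and it compares only the cksum output (not the ls or gpg output).

```shell
#!/bin/sh
# Key Trustee client directory, as used in the manual commands above.
KT_DIR=/var/lib/kms-keytrustee/keytrustee/.keytrustee

# Print the checksums of the Key Trustee files on one host.
# Assumption: root can SSH to the host without a password prompt.
kms_state() {
    ssh "root@$1" "cksum $KT_DIR/*"
}

# Return 0 when both hosts report identical checksums,
# 1 when they differ, and 2 when a host could not be queried.
kms_in_sync() {
    first=$(kms_state "$1") || return 2
    second=$(kms_state "$2") || return 2
    [ "$first" = "$second" ]
}

# Usage (hypothetical host names):
#   kms_in_sync kms1.example.com kms2.example.com && echo "KMS hosts are in sync"
```

A nonzero return status from kms_in_sync corresponds to the case where the manual output differs between hosts.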

Perform the following steps on any node of the cluster as the root user, unless otherwise specified.

To create an encryption zone:
  1. Create an encryption key for the zone:
    1. Authenticate the hdfs/clustername@BDACLOUDSERVICE.ORACLE.COM principal using your Cloudera password, for example:
      # kinit -p hdfs@BDACLOUDSERVICE.ORACLE.COM
      Password for hdfs@BDACLOUDSERVICE.ORACLE.COM: ****
    2. Create the encryption key, using the following command:
      hadoop key create keyname

      For example:

      # hadoop key create bdakey
      bdakey has been successfully created with options Options{cipher='AES/CTR/NoPadding', bitLength=128, description='null', 
      attributes=null}. 
      org.apache.hadoop.crypto.key.kms.LoadBalancingKMSClientProvider@4145bad8 has been updated.
  2. Create a new empty directory and make it an encryption zone using the key generated above with the following two commands:
    # hadoop fs -mkdir path
    # hdfs crypto -createZone -keyName keyname -path path 

    For example:

    # hadoop fs -mkdir /zone 
    # hdfs crypto -createZone -keyName bdakey -path /zone
    Added encryption zone /zone

    Note:

    Encryption zones must be created as the superuser, but after that, access to encrypted file data and metadata is controlled by normal HDFS file system permissions.
  3. Verify creation of the new encryption zone by running the hdfs crypto -listZones command; for example:
    # hdfs crypto -listZones  
    /zone bdakey  
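
The key creation, directory creation, zone creation, and verification steps above can be combined into one shell function. This is a sketch under stated assumptions: the function name create_zone is hypothetical, a valid Kerberos ticket for the HDFS superuser has already been obtained with kinit, and the zone path contains no spaces.

```shell
#!/bin/sh
# create_zone KEYNAME PATH
# Creates a key, an empty directory, and an encryption zone, then
# checks that the zone appears in the -listZones output.
create_zone() {
    key=$1
    zone=$2
    if [ -z "$key" ] || [ -z "$zone" ]; then
        echo "usage: create_zone KEYNAME PATH" >&2
        return 2
    fi

    # Stop at the first failing step.
    hadoop key create "$key" || return 1
    hadoop fs -mkdir "$zone" || return 1
    hdfs crypto -createZone -keyName "$key" -path "$zone" || return 1

    # Succeed only if a line "<zone> <key>" is listed.
    hdfs crypto -listZones |
        awk -v z="$zone" -v k="$key" \
            '$1 == z && $2 == k { found = 1 } END { exit !found }'
}

# Usage, matching the example above:
#   create_zone bdakey /zone
```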

Adding Files to Encryption Zones

Use the hadoop fs -put command to add files to the encryption zone.

For example:

# hadoop fs -put helloWorld /zone
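
Because decryption is transparent, a file read back from the zone with hadoop fs -cat should match the local original byte for byte. The following sketch shows one way to check this after an upload; the function name put_and_verify is hypothetical, and it assumes the calling user has permission to write to and read from the zone.

```shell
#!/bin/sh
# put_and_verify LOCAL_FILE ZONE_DIR
# Uploads a local file into an encryption zone, then reads it back
# through HDFS and compares it with the local copy.
put_and_verify() {
    src=$1
    zone=$2
    hadoop fs -put "$src" "$zone" || return 1

    # cmp -s is silent; its exit status is 0 only when the
    # decrypted stream and the local file are identical.
    hadoop fs -cat "$zone/$(basename "$src")" | cmp -s - "$src"
}

# Usage, matching the example above:
#   put_and_verify helloWorld /zone && echo "round trip OK"
```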

Viewing Keys in Encryption Zones

Use the hadoop key list command to list the encryption keys held by the key provider, including the keys used by encryption zones.

For example:

# hadoop key list
Listing keys for KeyProvider: org.apache.hadoop.crypto.key.kms.LoadBalancingKMSClientProvider@xxxxxx 
MYKEY1 
MYKEY2
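
When scripting, it can be useful to test whether a key already exists before creating it. The sketch below parses the hadoop key list output shown above; the function name key_exists is hypothetical, and it assumes each key name is printed on its own line, possibly followed by trailing spaces.

```shell
#!/bin/sh
# key_exists KEYNAME
# Returns 0 if KEYNAME appears as a whole line in "hadoop key list".
key_exists() {
    hadoop key list |
        sed 's/[[:space:]]*$//' |   # strip trailing whitespace
        grep -Fxq -- "$1"           # exact, whole-line, fixed-string match
}

# Usage:
#   key_exists bdakey || hadoop key create bdakey
```

The whole-line match keeps the "Listing keys for KeyProvider" header line from being mistaken for a key name.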