Kerberos

The Kerberos network authentication protocol enables client and server applications to prove their identities to one another securely, even when they communicate over an unsecured network.

In Kerberos terminology, individual applications are called principals. Each principal has a keytab file, which stores its secret key (a key derived from the principal's password, not the password itself). Keytab files enable principals to authenticate automatically, without human interaction. When one principal wants to communicate with another, it uses its keytab file to obtain a ticket from the Kerberos Key Distribution Center (KDC). It then presents that ticket to gain access to the other principal.
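The keytab-to-ticket step can be sketched with the MIT Kerberos `kinit` client, whose `-k` option authenticates with a key from a keytab instead of prompting for a password and whose `-t` option names the keytab file. The sketch below only builds the command lines; the helper names are hypothetical, and whether BDD invokes `kinit` exactly this way is an assumption.

```python
from __future__ import annotations

from pathlib import Path


def kinit_command(principal: str, keytab: Path) -> list[str]:
    """Build a command that obtains a ticket for `principal` from a keytab.

    Assumes the MIT Kerberos client: -k means "use a key from a keytab"
    and -t names the keytab file, so no password prompt is needed.
    """
    return ["kinit", "-k", "-t", str(keytab), principal]


def kdestroy_command() -> list[str]:
    """Build a command that discards the current user's cached tickets."""
    return ["kdestroy"]
```

To run a command, pass the list to `subprocess.run(..., check=True)`, for example with the hypothetical principal `bdd@EXAMPLE.COM` and keytab `/tmp/bdd.keytab`. Building an argument list rather than a shell string avoids quoting problems with unusual paths.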

Because Kerberos authentication uses strong encryption and never sends passwords over the network, it can work safely over unsecured networks. Additionally, tickets expire after a configurable period of time, which limits the risk if a ticket is compromised.
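Ticket expiration comes down to comparing the current time against a ticket's validity window. The model below is a hypothetical, simplified `Ticket` type; real tickets also carry renewal limits and live in a credential cache (which `klist` can display). The 10-hour lifetime in the usage note is an arbitrary example, not a BDD or Kerberos default.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class Ticket:
    """Hypothetical, simplified view of a ticket's validity window."""

    principal: str
    start: datetime
    lifetime: timedelta

    @property
    def end(self) -> datetime:
        # The ticket stops being accepted at start + lifetime.
        return self.start + self.lifetime

    def is_valid(self, now: datetime) -> bool:
        # Usable from its start time up to, but not including, its end time.
        return self.start <= now < self.end
```

For a `Ticket` starting at 2024-01-01 00:00 UTC with `lifetime=timedelta(hours=10)`, `is_valid` returns `True` at 09:59 and `False` from 10:00 onward; after that, the principal must obtain a new ticket.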

You can configure BDD to use Kerberos authentication for its communications with Hadoop. This is required if Kerberos is already enabled in your Hadoop cluster, and is strongly recommended for all production environments. BDD supports integration with Kerberos 5 and later.

This procedure assumes you already have Kerberos installed on your system and configured for your Hadoop cluster.

To enable Kerberos:

  1. Create the following directories in HDFS:
    • /user/<bdd user>, where <bdd user> is the name of the bdd user.
    • /user/<HDFS_DP_USER_DIR>, where <HDFS_DP_USER_DIR> is the value of HDFS_DP_USER_DIR in BDD's configuration file.
    The owner of both directories must be the bdd user. Their group must be the HDFS superuser group, which is defined by the dfs.permissions.superusergroup configuration parameter (dfs.permissions.supergroup in older Hadoop releases). The default value is supergroup.
  2. Add the bdd user to the hive group.
  3. Add the bdd user to the hdfs group on all BDD nodes.
  4. Create a BDD principal.
    The primary component of the principal name (the part before any / or @) must be the name of the bdd user. The realm must be your default realm.
  5. Generate a keytab file for the BDD principal and copy it to the install machine.
    The name and location of this file are arbitrary. The installer renames it to bdd.keytab and copies it to all BDD nodes.
  6. Copy the krb5.conf file from one of your Hadoop nodes to the install machine.
    Its location on the install machine is arbitrary. The installer copies it to /etc on all BDD nodes.
  7. Install the kinit and kdestroy utilities on all BDD nodes.
    These are required to enable ticket expiration.
  8. If your cluster runs Hortonworks Data Platform (HDP), set the hadoop.proxyuser.hive.groups property in core-site.xml to *.
    You can do this in Ambari.
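Step 1 can be scripted. The sketch below builds the `hdfs dfs -mkdir -p` and `hdfs dfs -chown` commands for both directories; the function name and the example values are hypothetical, and the `supergroup` default assumes your cluster has not changed the superuser group. HDFS allows only the superuser to change a file's owner, so run the resulting commands as the HDFS superuser (on a Kerberized cluster, after obtaining a ticket for an HDFS administrator principal).

```python
from __future__ import annotations


def hdfs_dir_commands(
    bdd_user: str, dp_user_dir: str, supergroup: str = "supergroup"
) -> list[list[str]]:
    """Build commands that create /user/<bdd user> and /user/<HDFS_DP_USER_DIR>.

    Each directory is created (mkdir -p does not fail if it already exists)
    and then assigned to the bdd user and the HDFS superuser group.
    """
    commands: list[list[str]] = []
    # dict.fromkeys drops a duplicate if both names are the same,
    # while keeping the original order.
    for name in dict.fromkeys((bdd_user, dp_user_dir)):
        path = f"/user/{name}"
        commands.append(["hdfs", "dfs", "-mkdir", "-p", path])
        commands.append(["hdfs", "dfs", "-chown", f"{bdd_user}:{supergroup}", path])
    return commands
```

Returning argument lists keeps the sketch side-effect free; each list can be passed to `subprocess.run` once you have reviewed it.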
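Steps 4 and 6 are linked: the default realm that the BDD principal must use is the `default_realm` setting in the `[libdefaults]` section of `krb5.conf`. The hypothetical helpers below read that value from the file's text and form a principal of the form `<bdd user>@<REALM>`. They assume a principal with no instance component and handle only the simple `key = value` lines of `[libdefaults]`, not every krb5.conf feature.

```python
from __future__ import annotations

import re


def default_realm(krb5_conf_text: str) -> str | None:
    """Return default_realm from the [libdefaults] section, or None if absent."""
    section = None
    for raw in krb5_conf_text.splitlines():
        line = raw.strip()
        # Skip blank lines and whole-line comments (# or ;).
        if not line or line[0] in "#;":
            continue
        header = re.fullmatch(r"\[(\w+)\]", line)
        if header:
            section = header.group(1)
            continue
        if section == "libdefaults":
            key, sep, value = line.partition("=")
            if sep and key.strip() == "default_realm":
                return value.strip()
    return None


def bdd_principal(bdd_user: str, realm: str | None) -> str:
    """Form <bdd user>@<REALM>; the primary must be exactly the bdd user name."""
    if not bdd_user or "/" in bdd_user or "@" in bdd_user:
        raise ValueError(f"not a plain user name: {bdd_user!r}")
    if not realm:
        raise ValueError("no default realm found")
    return f"{bdd_user}@{realm}"
```

For example, `bdd_principal("bdd", default_realm(text))` yields `bdd@EXAMPLE.COM` when the copied krb5.conf sets `default_realm = EXAMPLE.COM`.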
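For step 8, one way to confirm the setting after Ambari pushes the configuration is to read `core-site.xml` on a cluster node. The sketch below uses Python's standard `xml.etree.ElementTree` to look up a property in Hadoop's layout of `<property>` elements with `<name>` and `<value>` children; the helper names are hypothetical, and if a property is listed more than once this sketch keeps the last value.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET


def hadoop_property(core_site_xml: str, name: str) -> str | None:
    """Return the value of property `name`, or None if it is not defined."""
    root = ET.fromstring(core_site_xml)
    found = None
    for prop in root.iter("property"):
        if (prop.findtext("name") or "").strip() == name:
            # Keep overwriting so the last matching entry wins.
            found = (prop.findtext("value") or "").strip()
    return found


def hive_proxyuser_groups_ok(core_site_xml: str) -> bool:
    """True if hadoop.proxyuser.hive.groups is set to * as step 8 requires."""
    return hadoop_property(core_site_xml, "hadoop.proxyuser.hive.groups") == "*"
```

Pass it the file's contents, for example `hive_proxyuser_groups_ok(Path(path).read_text())`, where `path` is wherever your distribution keeps `core-site.xml`.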

You must also set the Kerberos-related properties in BDD's configuration file. For more information, see Configuring BDD.