Kerberos

The Kerberos network authentication protocol enables client/server applications to identify one another in a secure manner, even when communicating over an unsecured network.

In Kerberos terminology, each user or service that takes part in authentication is called a principal. A service principal typically has a keytab file, which stores the principal's name and its long-term secret key (the equivalent of a password). When one principal wants to communicate with another, it uses this key to authenticate to the Kerberos Key Distribution Center (KDC) and obtain a ticket, and it is granted access to the other principal only if its name and key are recognized. Because the key itself is never sent over the network and tickets are encrypted, this process remains secure over unsecured networks.
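
As a rough illustration of keytab-based authentication, the following sketch obtains a ticket from a keytab, lists it, and then discards it using the standard MIT Kerberos client utilities. The keytab path and principal name are placeholders rather than values BDD defines, and the script only prints the commands unless DRY_RUN is set to 0.

```shell
#!/bin/sh
# Placeholder values (assumptions): replace with your own keytab and principal.
KEYTAB="${KEYTAB:-/tmp/bdd.keytab}"
PRINCIPAL="${PRINCIPAL:-bdd@EXAMPLE.COM}"
DRY_RUN="${DRY_RUN:-1}"

# Print the command in dry-run mode; otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

run kinit -kt "$KEYTAB" "$PRINCIPAL"   # request a ticket using the key in the keytab
run klist                              # show the tickets in the credential cache
run kdestroy                           # remove the cached tickets
```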

You can configure BDD to use Kerberos authentication for its communications with Hadoop. This is required if Kerberos is already enabled in your Hadoop cluster, and strongly recommended for production environments in general. BDD supports integration with Kerberos 5+.
Note: This procedure assumes you already have Kerberos enabled in your Hadoop cluster.

To enable Kerberos:

  1. Create the following directories in HDFS:
    • /user/<bdd user>, where <bdd user> is the name of the bdd user.
    • /user/<HDFS_DP_USER_DIR>, where <HDFS_DP_USER_DIR> is the value of HDFS_DP_USER_DIR in BDD's configuration file.
    Both directories must be owned by the bdd user. Their group must be the HDFS superuser group, which is defined by the dfs.permissions.supergroup configuration parameter. The default value of this parameter is supergroup.
  2. Add the bdd user to the hive group.
  3. Add the bdd user to the hdfs group on all BDD nodes.
  4. Create a BDD principal.
    The primary component must be the name of the bdd user. The realm must be your default realm.
  5. Generate a keytab file for the BDD principal and copy it to the install machine.
    The name and location of this file are arbitrary. The installer will rename it to bdd.keytab and copy it to all BDD nodes.
  6. Copy the krb5.conf file from one of your Hadoop nodes to the install machine.
    You can put it in any location. The installer will copy it to /etc on all BDD nodes.
  7. Install the kinit and kdestroy utilities on all BDD nodes.
  8. If you have HDP, set the hadoop.proxyuser.hive.groups property in core-site.xml to *.
    You can do this in Ambari.
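
Steps 1 through 5 above might be scripted roughly as follows. This is only a sketch under several assumptions: the KDC is MIT Kerberos (so kadmin with addprinc and ktadd is available), the HDFS commands are run by a user with HDFS superuser rights, and the usermod commands are run as root on each relevant node. The user name, directory name, realm, and keytab path are placeholders, not BDD defaults. The script prints each command instead of running it unless DRY_RUN is set to 0.

```shell
#!/bin/sh
# Placeholder values (assumptions): adjust all of these for your cluster.
BDD_USER="${BDD_USER:-bdd}"
HDFS_DP_USER_DIR="${HDFS_DP_USER_DIR:-bdd_dp}"   # use the value from BDD's configuration file
SUPERGROUP="${SUPERGROUP:-supergroup}"           # value of dfs.permissions.supergroup
REALM="${REALM:-EXAMPLE.COM}"                    # your default realm
KEYTAB="${KEYTAB:-/tmp/bdd.keytab}"
DRY_RUN="${DRY_RUN:-1}"

# Print the command in dry-run mode; otherwise execute it.
# Note: the printed form drops shell quoting around arguments.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Step 1: create both HDFS directories and set their owner and group.
for dir in "/user/$BDD_USER" "/user/$HDFS_DP_USER_DIR"; do
  run hdfs dfs -mkdir -p "$dir"
  run hdfs dfs -chown "$BDD_USER:$SUPERGROUP" "$dir"
done

# Steps 2 and 3: add the bdd user to the hive and hdfs groups.
run usermod -a -G hive "$BDD_USER"
run usermod -a -G hdfs "$BDD_USER"   # repeat on every BDD node

# Step 4: create the BDD principal (requires Kerberos admin credentials).
run kadmin -q "addprinc -randkey $BDD_USER@$REALM"

# Step 5: export the principal's key to a keytab file.
run kadmin -q "ktadd -k $KEYTAB $BDD_USER@$REALM"
```

Review the printed commands first, then rerun with DRY_RUN=0 on the appropriate hosts.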

You also need to manually configure Kerberos for the Transform Service after installing BDD. For instructions, see Enabling Kerberos for the Transform Service.