Enabling Kerberos

BDD supports Kerberos 5 and later for authenticating its communications with Hadoop. Enabling Kerberos for BDD improves the security of your cluster and data.

Before you can configure Kerberos for BDD, you must install Kerberos on your Hadoop cluster. If your Hadoop cluster already uses Kerberos, you must also enable Kerberos for BDD so that BDD can access the Hive tables it requires.

To enable Kerberos:

  1. Install the kinit and kdestroy utilities on all BDD nodes.
  2. Create the following directories in HDFS:
    • /user/<bdd>, where <bdd> is the name of the bdd user.
    • /user/<HDFS_DP_USER_DIR>, where <HDFS_DP_USER_DIR> is the value of HDFS_DP_USER_DIR defined in bdd.conf.
    The owner of both directories must be the bdd user, and their group must be the HDFS superuser group, which is defined by the dfs.permissions.supergroup configuration parameter. The default value of this parameter is supergroup.
  3. Add the bdd user to the hdfs and hive groups on all BDD nodes.
  4. If you use HDP, add the bdd user's group to the hadoop.proxyuser.hive.groups property in core-site.xml.
    You can do this in Ambari.
  5. Create a principal for BDD.
    The primary component must be the name of the bdd user and the realm must be your default realm.
  6. Generate a keytab file for the BDD principal and move it to the Admin Server.
    You can choose any name and location for this file, because you pass its path to the bdd-admin script at runtime.
  7. Copy your krb5.conf file to the same location on all BDD nodes.
    The location is arbitrary, but the default is /etc.
  8. If your Dgraph databases are stored on HDFS, you must also enable Kerberos for the Dgraph. On the Admin Server, make a copy of bdd.conf and edit the following properties in the copy:
    • KERBEROS_TICKET_REFRESH_INTERVAL: The interval, in minutes, at which the Dgraph's Kerberos ticket is refreshed. For example, a value of 60 refreshes the ticket every 60 minutes (every hour).
    • KERBEROS_TICKET_LIFETIME: The amount of time the Dgraph's Kerberos ticket is valid. Specify a number followed by a supported unit of time: s, m, h, or d. For example, 10h (10 hours) or 10m (10 minutes).
    Then go to $BDD_HOME/BDD_manager/bin and run:
    ./bdd-admin.sh publish-config <path>
    Where <path> is the absolute path to the modified copy of bdd.conf.
  9. Go to $BDD_HOME/BDD_manager/bin and run:
    ./bdd-admin.sh publish-config kerberos on -k <krb5> -t <keytab> -p <principal>
    Where:
    • <krb5> is the absolute path to krb5.conf on all BDD nodes
    • <keytab> is the absolute path to the BDD keytab file on the Admin Server
    • <principal> is the BDD principal
    The script updates BDD's configuration files with the name of the principal and the location of the krb5.conf file. It also renames the keytab file to bdd.keytab and distributes it to $BDD_HOME/common/kerberos on all BDD nodes.
  10. If you use HDP, publish the change you made to core-site.xml:
    ./bdd-admin.sh publish-config hadoop
  11. Restart your cluster for the changes to take effect:
    ./bdd-admin.sh restart [-t <minutes>]
  12. To enable Kerberos for the Transform Service:
    1. Copy k5start from $BDD_HOME/dgraph/bin/ on one of your Dgraph nodes to $BDD_HOME/transformservice/ on each of your Transform Service nodes.
    2. On each Transform Service node, start k5start by running the following command from $BDD_HOME/transformservice/:
      ./k5start -f $KERBEROS_KEYTAB_PATH -K <ticket_refresh> -l <ticket_lifetime> $KERBEROS_PRINCIPAL -b > <logfile> 2>&1
      Where:
      • $KERBEROS_KEYTAB_PATH and $KERBEROS_PRINCIPAL are the values of those properties defined in bdd.conf.
      • <ticket_refresh> is the rate at which the Transform Service's Kerberos ticket is refreshed, in minutes. For example, a value of 60 would set its ticket to be refreshed every 60 minutes, or every hour. You can optionally use the value for KERBEROS_TICKET_REFRESH_INTERVAL in bdd.conf.
      • <ticket_lifetime> is the amount of time the Transform Service's Kerberos ticket is valid for. This should be given as a number followed by a supported unit of time: s, m, h, or d. For example, 10h (10 hours) or 10m (10 minutes). You can optionally use the value for KERBEROS_TICKET_LIFETIME in bdd.conf.
      • <logfile> is the absolute path to the log file you want k5start to write to.
    3. Optionally, configure k5start to run as a service on all Transform Service nodes.
      This will enable it to start automatically after a node reboot. Otherwise, you'll have to rerun the above command each time a Transform Service node is rebooted.
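
Step 1 can be verified on each BDD node with a short shell check. This is a minimal sketch: the function name check_kerberos_utils is hypothetical, and it only confirms that kinit and kdestroy can be found on the PATH. On Oracle Linux and RHEL, both utilities typically come from the krb5-workstation package.

```shell
# Hypothetical helper (not part of BDD): report whether the Kerberos client
# utilities required by step 1 are on this node's PATH.
check_kerberos_utils() {
  missing=""
  for util in kinit kdestroy; do
    # command -v succeeds only if the utility is found on the PATH.
    command -v "$util" >/dev/null 2>&1 || missing="$missing $util"
  done
  if [ -n "$missing" ]; then
    echo "missing Kerberos utilities:$missing" >&2
    return 1
  fi
  echo "kinit and kdestroy found"
}
```

Run it on every BDD node; a non-zero exit status means at least one utility still has to be installed there.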
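
The commands for steps 2 and 3 can be generated for review with a function like the following. It is a sketch under stated assumptions: the function name and its arguments are hypothetical, the HDFS commands are meant to be run as an HDFS superuser (with a valid ticket if the cluster already uses Kerberos), and usermod assumes that hdfs and hive are local groups on each BDD node.

```shell
# Hypothetical helper (not part of BDD): print the commands for steps 2 and 3
# so they can be reviewed before running them.
#   $1  name of the bdd user
#   $2  value of HDFS_DP_USER_DIR from bdd.conf
#   $3  HDFS superuser group (optional; "supergroup" is the default value of
#       dfs.permissions.supergroup)
bdd_hdfs_setup_cmds() {
  bdd_user="$1"
  dp_dir="$2"
  supergroup="${3:-supergroup}"
  for dir in "/user/$bdd_user" "/user/$dp_dir"; do
    # Step 2: create the directory, then make the bdd user its owner and the
    # HDFS superuser group its group. Intended to run as an HDFS superuser.
    echo "hdfs dfs -mkdir -p $dir"
    echo "hdfs dfs -chown $bdd_user:$supergroup $dir"
  done
  # Step 3: add the bdd user to the hdfs and hive groups (as root, on every BDD node).
  echo "usermod -a -G hdfs,hive $bdd_user"
}
```

For example, bdd_hdfs_setup_cmds bdd <HDFS_DP_USER_DIR value> prints five commands: a mkdir and chown pair for each directory, followed by the usermod command.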
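
For step 4, the value of hadoop.proxyuser.hive.groups is a comma-separated list of groups, so adding the bdd user's group amounts to appending it unless the list already covers it. The sketch below assumes the relevant group is the bdd user's primary group (as reported by id -gn) and that the list contains no spaces around the commas; the function name is hypothetical. In Ambari, you would paste the resulting value into the property.

```shell
# Hypothetical helper (not part of BDD or Ambari): add a group to a
# comma-separated value such as hadoop.proxyuser.hive.groups.
# The value is printed unchanged if it is "*" (all groups) or already lists
# the group. Assumes no spaces around the commas.
add_proxyuser_group() {
  current="$1"
  group="$2"
  case ",$current," in
    *",*,"*|*",$group,"*) printf '%s\n' "$current" ;;  # already covered
    ",,") printf '%s\n' "$group" ;;                    # empty value
    *) printf '%s\n' "$current,$group" ;;              # append the group
  esac
}
```

For example, add_proxyuser_group "hadoop,users" "$(id -gn bdd)" prints hadoop,users followed by the bdd user's primary group.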
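
Steps 5 and 6 depend on your KDC. Assuming an MIT Kerberos KDC with kadmin.local available (Active Directory and other KDCs use different tools), the sketch below reads the default realm from krb5.conf and prints the commands that create the principal and export its keytab. The function names are hypothetical, and the keytab is written on the KDC host, so it still has to be moved to the Admin Server.

```shell
# Hypothetical helpers (not part of BDD), assuming an MIT Kerberos KDC where
# kadmin.local is available.

# Print default_realm from the [libdefaults] section of a krb5.conf file.
krb5_default_realm() {
  awk '
    /^[ \t]*\[/ { in_lib = ($0 ~ /\[libdefaults\]/) }
    in_lib && /^[ \t]*default_realm[ \t]*=/ {
      sub(/^[^=]*=[ \t]*/, "")
      sub(/[ \t]+$/, "")
      print
      exit
    }
  ' "$1"
}

# Print the kadmin.local commands for steps 5 and 6: create the principal
# <bdd user>@<default realm> and export its keys to a keytab file.
#   $1  name of the bdd user
#   $2  path to krb5.conf
#   $3  keytab file to create
bdd_principal_cmds() {
  bdd_user="$1"
  realm="$(krb5_default_realm "$2")"
  keytab="$3"
  if [ -z "$realm" ]; then
    echo "no default_realm found in $2" >&2
    return 1
  fi
  principal="$bdd_user@$realm"
  echo "kadmin.local -q \"addprinc -randkey $principal\""
  # ktadd generates new random keys, which invalidates any existing keytab
  # for this principal.
  echo "kadmin.local -q \"ktadd -k $keytab $principal\""
}
```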
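
Step 7 can be scripted with scp. The helper below is a hypothetical sketch that prints one copy command per node; the host names in the test data are placeholders, and writing to /etc usually requires root on the target node.

```shell
# Hypothetical helper (not part of BDD): print scp commands that copy
# krb5.conf to the same directory on every BDD node (step 7).
#   $1        local krb5.conf to copy
#   $2        destination directory on each node (for example, /etc)
#   $3 and up host names of the BDD nodes
krb5_copy_cmds() {
  src="$1"
  dest_dir="$2"
  shift 2
  for host in "$@"; do
    printf 'scp %s %s:%s/krb5.conf\n' "$src" "$host" "$dest_dir"
  done
}
```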
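
The bdd.conf edits in step 8 can be made on a copy with a small script. This sketch assumes bdd.conf stores each property as a KEY=value line, and the function names are hypothetical. It validates the two values against the formats described in step 8 (whole minutes for the refresh interval, a number followed by s, m, h, or d for the lifetime) and leaves the original bdd.conf untouched.

```shell
# Hypothetical helpers (not part of BDD), assuming bdd.conf stores each
# property on its own KEY=value line.

# Set KEY=value in a file, replacing an existing KEY= line or appending one.
# The value goes into a sed replacement, so it must not contain |, & or \.
set_conf_property() {
  file="$1"; key="$2"; value="$3"
  if grep -q "^$key=" "$file"; then
    sed "s|^$key=.*|$key=$value|" "$file" > "$file.tmp.$$" && mv "$file.tmp.$$" "$file"
  else
    printf '%s=%s\n' "$key" "$value" >> "$file"
  fi
}

# Copy bdd.conf and set the Dgraph ticket properties from step 8 in the copy.
#   $1 original bdd.conf              $2 path of the copy
#   $3 refresh interval in minutes    $4 ticket lifetime, for example 10h
prepare_dgraph_kerberos_conf() {
  src="$1"; copy="$2"; refresh="$3"; lifetime="$4"
  # The refresh interval is a whole number of minutes.
  printf '%s\n' "$refresh" | grep -Eq '^[1-9][0-9]*$' ||
    { echo "invalid refresh interval: $refresh" >&2; return 1; }
  # The lifetime is a number followed by s, m, h, or d.
  printf '%s\n' "$lifetime" | grep -Eq '^[1-9][0-9]*[smhd]$' ||
    { echo "invalid ticket lifetime: $lifetime" >&2; return 1; }
  cp "$src" "$copy" || return 1
  set_conf_property "$copy" KERBEROS_TICKET_REFRESH_INTERVAL "$refresh"
  set_conf_property "$copy" KERBEROS_TICKET_LIFETIME "$lifetime"
}
```

You would then pass the absolute path of the copy to ./bdd-admin.sh publish-config, as described in step 8.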
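
Before running step 9, a wrapper like the following can catch common mistakes: relative paths, an unreadable keytab, or a principal whose primary component is not the bdd user (step 5). It is a hypothetical sketch that prints the bdd-admin command instead of running it, and it assumes krb5.conf is readable on the Admin Server at the same path as on the other BDD nodes.

```shell
# Hypothetical helper (not part of BDD): check the step 9 arguments and print
# the bdd-admin command for review instead of running it.
#   $1 krb5.conf path     $2 keytab path on the Admin Server
#   $3 BDD principal      $4 name of the bdd user
publish_kerberos_cmd() {
  krb5="$1"; keytab="$2"; principal="$3"; bdd_user="$4"
  for p in "$krb5" "$keytab"; do
    case "$p" in
      /*) ;;  # absolute path, as step 9 requires
      *) echo "not an absolute path: $p" >&2; return 1 ;;
    esac
    [ -r "$p" ] || { echo "cannot read: $p" >&2; return 1; }
  done
  # Step 5: the primary component (text before any / or @) must be the bdd user.
  if [ "${principal%%[/@]*}" != "$bdd_user" ]; then
    echo "principal $principal does not start with $bdd_user" >&2
    return 1
  fi
  echo "./bdd-admin.sh publish-config kerberos on -k $krb5 -t $keytab -p $principal"
}
```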
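
The k5start command in step 12 can be assembled from bdd.conf with a helper like this one. It is a sketch, assuming bdd.conf stores KERBEROS_KEYTAB_PATH and KERBEROS_PRINCIPAL as unquoted KEY=value lines; the function names are hypothetical. It checks <ticket_refresh>, <ticket_lifetime>, and <logfile> against the formats described in step 12, then prints the command to run from $BDD_HOME/transformservice/ on each Transform Service node.

```shell
# Hypothetical helpers (not part of BDD), assuming bdd.conf stores each
# property on its own unquoted KEY=value line.

# Print the value of a property from bdd.conf (the last occurrence wins).
bdd_conf_get() {
  sed -n "s/^$2=//p" "$1" | tail -n 1
}

# Print the k5start command from step 12 with its placeholders filled in.
#   $1 bdd.conf                          $2 <ticket_refresh> in minutes
#   $3 <ticket_lifetime>, for example 10h   $4 <logfile> (absolute path)
k5start_cmd() {
  conf="$1"; refresh="$2"; lifetime="$3"; logfile="$4"
  keytab="$(bdd_conf_get "$conf" KERBEROS_KEYTAB_PATH)"
  principal="$(bdd_conf_get "$conf" KERBEROS_PRINCIPAL)"
  if [ -z "$keytab" ] || [ -z "$principal" ]; then
    echo "KERBEROS_KEYTAB_PATH or KERBEROS_PRINCIPAL is not set in $conf" >&2
    return 1
  fi
  printf '%s\n' "$refresh" | grep -Eq '^[1-9][0-9]*$' ||
    { echo "refresh interval must be whole minutes: $refresh" >&2; return 1; }
  printf '%s\n' "$lifetime" | grep -Eq '^[1-9][0-9]*[smhd]$' ||
    { echo "lifetime must be a number followed by s, m, h, or d: $lifetime" >&2; return 1; }
  case "$logfile" in
    /*) ;;
    *) echo "log file must be an absolute path: $logfile" >&2; return 1 ;;
  esac
  printf './k5start -f %s -K %s -l %s %s -b > %s 2>&1\n' \
    "$keytab" "$refresh" "$lifetime" "$principal" "$logfile"
}
```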
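
For the optional service in step 12, one approach on Transform Service nodes that use systemd is a unit file. The sketch below prints one; the function name, the unit contents, and the choice to run k5start as the bdd user (assuming the Transform Service runs as that user and reads its default ticket cache) are assumptions, not part of the BDD documentation. The unit omits -b so that systemd supervises k5start in the foreground; nodes without systemd would need an init script instead.

```shell
# Hypothetical helper (not part of BDD): print a systemd unit that keeps
# k5start running on a Transform Service node.
#   $1 BDD_HOME          $2 bdd user (assumed to run the Transform Service)
#   $3 keytab path       $4 refresh interval in minutes
#   $5 ticket lifetime   $6 BDD principal
k5start_unit() {
  bdd_home="$1"; bdd_user="$2"; keytab="$3"; refresh="$4"; lifetime="$5"; principal="$6"
  cat <<EOF
[Unit]
Description=k5start ticket renewal for the BDD Transform Service
Wants=network-online.target
After=network-online.target

[Service]
Type=simple
User=$bdd_user
WorkingDirectory=$bdd_home/transformservice
# No -b: systemd supervises k5start in the foreground and captures its output.
ExecStart=$bdd_home/transformservice/k5start -f $keytab -K $refresh -l $lifetime $principal
Restart=on-failure

[Install]
WantedBy=multi-user.target
EOF
}
```

For example, you could save the output as /etc/systemd/system/k5start-bdd.service (a hypothetical name), then run systemctl daemon-reload and systemctl enable --now k5start-bdd.service.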

Once Kerberos is enabled, you can use the bdd-admin script to update its configuration as needed. For more information, see the kerberos section of the bdd-admin script documentation.