DP CLI permissions and logging

This topic provides a brief overview of DP CLI permissions and logging.

DP CLI permissions

The DP CLI script is installed with the user who ran the installer as its owner. The owner can change the script's permissions to allow other users to run it.
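
As a minimal sketch of how the owner might open up the script, the helper below adds read and execute permission for all users. The function name allow_everyone_to_run is hypothetical, and the script name and location are taken from the examples elsewhere in this topic; adjust both to your installation.

```shell
# Hypothetical helper: let any local user read and execute a script.
# Must be run by the file's owner (or by root).
allow_everyone_to_run() {
  # a+rx adds read and execute for user, group, and others;
  # a shell script needs both bits to be runnable.
  chmod a+rx "$1"
}

# Example call (path is an assumption based on this topic):
#   allow_everyone_to_run "$BDD_HOME/dataprocessing/edp_cli/data_processing_CLI"
```

Other users also need execute (search) permission on every directory in the path to the script, and this change alone does not grant them any access to HDFS.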

In addition, if the DP CLI is to be run by a non-install user, that user needs additional permissions, which can be provided in one of two ways: by broadening permissions on HDFS or by impersonating the install user on HDFS.
  • To broaden permissions for a given edpDataDir=/user/bdd/edp/data, you can grant all users read, write, and execute access (mode 777):
    hadoop fs -chmod -R 777 /user/bdd/edp/data
  • Impersonating the install user does not require any permission changes and can be done by multiple users. For example, while logged in as local user j_jones, you can use HDFS's impersonation feature to act as HDFS user bdd:
    j_jones@node1:data-processing/edp_cli $ HADOOP_USER_NAME=bdd ./data_processing_CLI -t myTable
    The same mechanism works on the hadoop fs command line:
    HADOOP_USER_NAME=hdfs hadoop fs -chmod -R 777 /user/bdd/edp/data
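
The impersonation examples above can be wrapped in a small helper so that the variable is set only for a single command. This is a sketch, not part of the DP CLI: the function name run_as_hdfs_user is hypothetical, and it assumes the cluster uses simple (non-Kerberos) authentication, which is the case in which Hadoop clients honor HADOOP_USER_NAME.

```shell
# Hypothetical helper: run one command with HADOOP_USER_NAME set,
# without exporting the variable into the calling shell.
# Usage: run_as_hdfs_user <hdfs_user> <command> [args...]
run_as_hdfs_user() {
  hdfs_user="$1"
  shift
  # env sets the variable only in the environment of the child process.
  env HADOOP_USER_NAME="$hdfs_user" "$@"
}

# Example calls mirroring the commands in this topic:
#   run_as_hdfs_user bdd ./data_processing_CLI -t myTable
#   run_as_hdfs_user hdfs hadoop fs -chmod -R 777 /user/bdd/edp/data
```

Scoping the variable to one command avoids accidentally running later hadoop commands as the impersonated user.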

DP CLI logging

The DP CLI logs detailed information about its workflow into the log file defined in the log4j.properties file. This file is located in the $BDD_HOME/dataprocessing/edp_cli/config directory.
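
To see which log file is configured, you can look for file appender entries in log4j.properties. The sketch below assumes the file uses the standard log4j 1.x convention of log4j.appender.<name>.File=<path>; the function name log4j_log_files is hypothetical, and the actual appender names in your file may differ.

```shell
# Hypothetical helper: print the path of every file appender
# defined in a log4j 1.x properties file.
log4j_log_files() {
  # Match lines of the form log4j.appender.<name>.File=<path>
  # and print only the <path> part.
  sed -n 's/^[[:space:]]*log4j\.appender\.[^.=]*\.[Ff]ile[[:space:]]*=[[:space:]]*//p' "$1"
}

# Example call, using the location given in this topic:
#   log4j_log_files "$BDD_HOME/dataprocessing/edp_cli/config/log4j.properties"
```

If the command prints nothing, the configuration may log only to the console or may define its log file in another way.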

The BDD Hive Table Detector is implemented on top of the DP CLI, so it uses the same logging properties as the DP CLI script. It also writes verbose output (for some classes) to stdout/stderr.