Setting up Apache Hue to Use Different Hadoop Components

  1. In the Hue UI, create an admin user with the password that you used while Creating a Cluster.
  2. Navigate to Administer users and do the following:
    1. Select the Hue user.
    2. Update the password.
    3. Allow the user to sign in as a Hue user for subsequent logins.
    You can create more users as needed.
    Note

    For secure clusters, ensure that you select the Create home directory check box and create the user on the utility node using the sudo useradd <new_user> command so that you can manage access policies through Ranger.
  3. To run DistCp and MapReduce applications, add the MapReduce library to the YARN classpath. To add the library, follow these steps:
    1. On the Ambari UI, under YARN, click Configs.
    2. Search for yarn.application.classpath.
    3. Copy the following configuration and paste it as the value of yarn.application.classpath:
      $HADOOP_CONF_DIR,/usr/lib/hadoop-mapreduce/*,/usr/lib/hadoop/*,/usr/lib/hadoop/lib/*,/usr/lib/hadoop-hdfs/*,/usr/lib/hadoop-hdfs/lib/*,/usr/lib/hadoop-yarn/*,/usr/lib/hadoop-yarn/lib/*
    4. Save and restart the YARN, Oozie, and MapReduce services.
  4. Configure Sqoop. To do so, follow these steps:
    1. Copy the MySQL connector JAR to the Oozie classpath:
      sudo cp /usr/lib/oozie/embedded-oozie-server/webapp/WEB-INF/lib/mysql-connector-java.jar /usr/lib/oozie/share/lib/sqoop/
      sudo su oozie -c "hdfs dfs -put /usr/lib/oozie/share/lib/sqoop/mysql-connector-java.jar /user/oozie/share/lib/sqoop"
    2. Restart the Oozie service through Ambari.
    3. The Sqoop jobs run from worker nodes.

      In Big Data Service clusters with version 3.0.7 or later, access to the hue mysql user is available from all worker nodes.

      For clusters with prior versions, run the grant statements as described in Step 2 of the secure and nonsecure cluster sections of Configuring Apache Hue. While running the commands, replace localhost with the worker node host name, and repeat this for each worker node in the cluster. For example:

      grant all privileges on *.* to 'hue'@'wn_host_name';
      grant all on hue.* to 'hue'@'wn_host_name';
      alter user 'hue'@'wn_host_name' identified by 'secretpassword';
      flush privileges;
  5. On the master or utility node, take the Spark-related JARs from the following location and add them as dependencies to your Spark project. For example, copy the spark-core and spark-sql JARs into lib/ for an sbt project.
    /usr/lib/oozie/share/lib/spark/spark-sql_2.12-3.0.2.odh.1.0.ce4f70b73b6.jar
    /usr/lib/oozie/share/lib/spark/spark-core_2.12-3.0.2.odh.1.0.ce4f70b73b6.jar

    Here's a word count example. Assemble the code into a JAR:

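    The following is a minimal Java sketch of such a word count job; the package name, class name, and the HDFS input and output paths passed as arguments are illustrative, and the code is assumed to be compiled against the spark-core and spark-sql JARs listed above.

      package com.example.hue;

      import java.util.Arrays;

      import org.apache.spark.api.java.JavaPairRDD;
      import org.apache.spark.api.java.JavaRDD;
      import org.apache.spark.api.java.JavaSparkContext;
      import org.apache.spark.sql.SparkSession;

      import scala.Tuple2;

      public class WordCount {
          public static void main(String[] args) {
              // args[0] is the HDFS input path; args[1] is the HDFS output path.
              SparkSession spark = SparkSession.builder().appName("HueWordCount").getOrCreate();
              JavaSparkContext jsc = new JavaSparkContext(spark.sparkContext());

              // Split each line into words, pair each word with 1, and sum the counts.
              JavaRDD<String> lines = jsc.textFile(args[0]);
              JavaPairRDD<String, Integer> counts = lines
                      .flatMap(line -> Arrays.asList(line.split("\\s+")).iterator())
                      .mapToPair(word -> new Tuple2<>(word, 1))
                      .reduceByKey(Integer::sum);

              counts.saveAsTextFile(args[1]);
              spark.stop();
          }
      }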

    On the Hue Spark interface, use the relevant jars to run the Spark job.

  6. To run MapReduce through Oozie, do the following:
    1. Add oozie-sharelib-oozie-5.2.0.jar (which contains the OozieActionConfigurator class) to your sample project.
    2. Define the mapper and reducer classes as given in any standard word count MapReduce example.
    3. Create another class that implements OozieActionConfigurator, as shown in the following example:

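      The following is a minimal Java sketch of such a configurator class; the package name, class names, and HDFS paths are illustrative. To keep the sketch self-contained, the word count mapper and reducer (which would normally be the classes you defined in the previous step) are included as nested classes that use the org.apache.hadoop.mapred API.

        package com.example.hue;

        import java.io.IOException;
        import java.util.Iterator;
        import java.util.StringTokenizer;

        import org.apache.hadoop.fs.Path;
        import org.apache.hadoop.io.IntWritable;
        import org.apache.hadoop.io.LongWritable;
        import org.apache.hadoop.io.Text;
        import org.apache.hadoop.mapred.FileInputFormat;
        import org.apache.hadoop.mapred.FileOutputFormat;
        import org.apache.hadoop.mapred.JobConf;
        import org.apache.hadoop.mapred.MapReduceBase;
        import org.apache.hadoop.mapred.Mapper;
        import org.apache.hadoop.mapred.OutputCollector;
        import org.apache.hadoop.mapred.Reducer;
        import org.apache.hadoop.mapred.Reporter;
        import org.apache.oozie.action.hadoop.OozieActionConfigurator;
        import org.apache.oozie.action.hadoop.OozieActionConfiguratorException;

        public class WordCountActionConfigurator implements OozieActionConfigurator {

            // Standard word count mapper (old mapred API); in practice, use the mapper
            // class you defined in the previous step.
            public static class WordCountMapper extends MapReduceBase
                    implements Mapper<LongWritable, Text, Text, IntWritable> {
                private static final IntWritable ONE = new IntWritable(1);
                private final Text word = new Text();

                @Override
                public void map(LongWritable key, Text value,
                        OutputCollector<Text, IntWritable> output, Reporter reporter) throws IOException {
                    StringTokenizer tokens = new StringTokenizer(value.toString());
                    while (tokens.hasMoreTokens()) {
                        word.set(tokens.nextToken());
                        output.collect(word, ONE);
                    }
                }
            }

            // Standard word count reducer (old mapred API); in practice, use the reducer
            // class you defined in the previous step.
            public static class WordCountReducer extends MapReduceBase
                    implements Reducer<Text, IntWritable, Text, IntWritable> {
                @Override
                public void reduce(Text key, Iterator<IntWritable> values,
                        OutputCollector<Text, IntWritable> output, Reporter reporter) throws IOException {
                    int sum = 0;
                    while (values.hasNext()) {
                        sum += values.next().get();
                    }
                    output.collect(key, new IntWritable(sum));
                }
            }

            // Called by the Oozie MapReduce action to build the job configuration.
            @Override
            public void configure(JobConf actionConf) throws OozieActionConfiguratorException {
                if (actionConf.getUser() == null) {
                    throw new OozieActionConfiguratorException("No user set");
                }
                actionConf.setMapperClass(WordCountMapper.class);
                actionConf.setReducerClass(WordCountReducer.class);
                actionConf.setOutputKeyClass(Text.class);
                actionConf.setOutputValueClass(IntWritable.class);
                // Illustrative HDFS input and output paths.
                FileInputFormat.setInputPaths(actionConf, new Path("/user/" + actionConf.getUser() + "/input-data"));
                FileOutputFormat.setOutputPath(actionConf, new Path("/user/" + actionConf.getUser() + "/output"));
            }
        }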

      Package the code, as shown in the previous example, along with the mapper and reducer classes into a JAR, and then do the following:
      1. Upload the JAR to HDFS through the Hue file browser.
      2. When you run the MapReduce program, set the oozie.action.config.class property to the fully qualified name of the configurator class shown in the previous example.

  7. Configure HBase.

    In Big Data Service clusters with version 3.0.7 or later, you must enable the Hue HBase Module using Apache Ambari.

    Hue interacts with the HBase Thrift server. Therefore, to access HBase, you must start the Thrift server. Follow these steps:

    1. After you add the HBase service on the Ambari page, navigate to Custom Hbase-Site.xml (from HBase, go to Configs, and under Advanced, click Custom Hbase-Site.xml).
    2. Add the following parameters by substituting the keytab or principal.
      hbase.thrift.support.proxyuser=true
      hbase.regionserver.thrift.http=true
      ##Skip the below configs if this is a non-secure cluster
      hbase.thrift.security.qop=auth
      hbase.thrift.keytab.file=/etc/security/keytabs/hbase.service.keytab
      hbase.thrift.kerberos.principal=hbase/_HOST@BDSCLOUDSERVICE.ORACLE.COM
      hbase.security.authentication.spnego.kerberos.keytab=/etc/security/keytabs/spnego.service.keytab
      hbase.security.authentication.spnego.kerberos.principal=HTTP/_HOST@BDSCLOUDSERVICE.ORACLE.COM
    3. Run the following commands in the master node terminal:
      
      # sudo su hbase
      //skip kinit command if this is a non-secure cluster
      # kinit -kt /etc/security/keytabs/hbase.service.keytab  hbase/<master_node_host>@BDSCLOUDSERVICE.ORACLE.COM
      # hbase thrift start
    4. Sign in to the utility node where Hue is installed.
    5. Open the /etc/hue/conf/pseudo-distributed.ini file (for example, using sudo vim /etc/hue/conf/pseudo-distributed.ini) and remove hbase from app_blacklist.
      # Comma separated list of apps to not load at startup.
      # e.g.: pig, zookeeper
      app_blacklist=search, security, impala, hbase, pig
    6. Restart Hue from Ambari.
    7. Ranger governs access to the HBase service. Therefore, to use Hue and access HBase tables on a secure cluster, you must have access to the HBase service from Ranger.
  8. Configure the script action workflow:
    1. Sign in to Hue.
    2. Create a script file and upload it to Hue.
    3. In the leftmost navigation menu, click Scheduler.
    4. Click Workflow, and then click My Workflow to create a workflow.
    5. Drag the shell icon (the script action) to the Drop your action here area.
    6. Select the script from the Shell command dropdown.
    7. Select the workflow from the FILES dropdown.
    8. Click the save icon.
    9. Select the workflow from the folder structure, and then click the submit icon.
      Note

      While executing any shell action in a Hue workflow, if the job gets stuck or fails with errors such as Permission Denied or Exit code[1], complete the following steps to resolve the issue.
      1. Ensure that all the required files (the script file and any other related files) are available at the location specified in the workflow, with the required permissions for the workflow execution user (the Hue logged-in user).
      2. If you submit a Spark job (for example, through spark-submit) without specifying a user, the job runs by default as the container process owner (yarn). In this case, be sure that the yarn user has all the required permissions to run the job.

        Example:

        // cat spark.sh
        /usr/odh/current/spark3-client/bin/spark-submit --master yarn --deploy-mode client  --queue default --class org.apache.spark.examples.SparkPi spark-examples_2.12-3.2.1.jar 
        
        // Application throws exception if yarn user doesn't have read permission to access spark-examples_2.12-3.2.1.jar.
        
        org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.AccessControlException): Permission denied: user=yarn, access=READ, inode="/workflow/lib/spark-examples_2.12-3.2.1.jar":hue:hdfs:---------x 
        In this case, the yarn user must have the required permissions on spark-examples_2.12-3.2.1.jar to include it in the Spark job.
      3. If you submit a Spark job as a specific user (--proxy-user spark), be sure that the yarn user can impersonate that user. If the yarn user can't impersonate the specified user and you receive errors such as User: yarn is not allowed to impersonate spark, add the following configurations.
        // cat spark.sh
        /usr/odh/current/spark3-client/bin/spark-submit --master yarn --proxy-user spark --deploy-mode client --queue default --class org.apache.spark.examples.SparkPi spark-examples_2.12-3.2.1.jar

        In this case, the job runs as the spark user. The spark user must have access to all the related files (such as spark-examples_2.12-3.2.1.jar). Also, be sure that the yarn user can impersonate the spark user. Add the following configurations so that the yarn user can impersonate other users:

        1. Sign in to Ambari.
        2. From the side toolbar, under Services, click HDFS.
        3. Click the Advanced tab and add the following parameters under Custom core-site.
          • hadoop.proxyuser.yarn.groups = *
          • hadoop.proxyuser.yarn.hosts = *
        4. Click Save and restart all the required services.
  9. Run the Hive workflow from Oozie.
    1. Sign in to Hue.
    2. Create a Hive query script file and upload it to Hue.
    3. In the leftmost navigation menu, click Scheduler.
    4. Click Workflow.
    5. Drag the HiveServer2 icon (the third icon) to the Drop your action here area.
    6. To select the Hive query script from HDFS, click the script menu. The query script is stored in an HDFS path that's accessible to the user that's signed in.
    7. To save the workflow, click the save icon.
    8. Click the run icon.