D Determining the Correct Software Version and Composing the Download Paths for Hadoop Clients

To configure bds-database-create-bundle.sh to download the Hadoop, Hive, and HBase tarballs, you must supply a URL for each of these parameters:
--hive-client-ws
--hadoop-client-ws
--hbase-client-ws

To get the information needed to provide the correct URLs, first check the cluster management service (Cloudera Manager or Ambari) and find the versions of the Hadoop, Hive, and HBase services running on the Hadoop cluster. The compatible clients have the same versions. In each case, the client tarball filename includes a version string segment that matches the version of the service installed on the cluster. In the case of CDH, you can then browse the public repository and find the URL of the client that matches the service version. For the HDP repository, this would require a tool that can browse Amazon S3 storage. However, you can also compose the correct URL using the known URL pattern along with information that you can acquire from Ambari, as described in this section.

For CDH (Both Oracle Big Data Appliance and Commodity CDH Systems):

  1. Log on to Cloudera Manager and go to the Hosts menu. Select All Hosts, then Inspect All Hosts.

  2. When the inspection is finished, select either Show Inspector Results (on the screen) or Download Result Data (to a JSON file).

  3. In either case, scan the result set and find the service versions.

    In the JSON version of the inspector results, there is a componentInfo section for each cluster that shows the versions of software installed on that cluster. For example:

        "componentInfo": [
            ...
            {
                "cdhVersion": "CDH5",
                "componentRelease": "1.cdh5.11.1.p0.6",
                "componentVersion": "2.6.0+cdh5.11.1+2400",
                "name": "hadoop"
            },
            ...
  4. Go to https://archive.cloudera.com/cdh5/cdh/5.

    Note:

    Since February 2021, all Cloudera repositories require password authentication. You must supply your Cloudera credentials to access and download either the client JARs for CDH5 or the client RPMs for CDH6. If you are running Oracle Big Data Appliance, contact Oracle Support to request a patch with the specific clients you need.

    Look in the “hadoop,” “hive,” and “hbase” subdirectories of the CDH5 section of the archive. In the listings, you should find the client tarball packages for the versions of the services installed on the cluster, such as the following:

    https://archive.cloudera.com/cdh5/cdh/5/hadoop-2.6.0-cdh5.12.1.tar.gz
    https://archive.cloudera.com/cdh5/cdh/5/hbase-1.2.0-cdh5.12.1.tar.gz
    https://archive.cloudera.com/cdh5/cdh/5/hive-1.1.0-cdh5.12.1.tar.gz
  5. Copy the URLs and use them as the parameter values supplied to bds-database-create-bundle.sh. For example:

    https://archive.cloudera.com/cdh5/cdh/5/hadoop-2.6.0-cdh5.12.1.tar.gz
    https://archive.cloudera.com/cdh5/cdh/5/hbase-1.2.0-cdh5.12.1.tar.gz
    https://archive.cloudera.com/cdh5/cdh/5/hive-1.1.0-cdh5.12.1.tar.gz
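
The CDH steps above can be sketched as a small shell helper that turns a componentVersion string from the Host Inspector results into a candidate tarball URL. This is only an illustration under stated assumptions: the helper name cdh_client_url is hypothetical, the componentVersion is assumed to have the form <base version>+<CDH release>+<build>, the tarball is assumed to be named <component>-<base version>-<CDH release>.tar.gz (as in the listings above), and the = separator between a parameter and its value is an assumption about the script's syntax. Always confirm that the resulting file actually appears in the archive listing.

```shell
#!/usr/bin/env bash
# Sketch only: derive a candidate CDH5 client tarball URL from a
# Host Inspector componentVersion string.
# Assumptions (not confirmed by the guide): componentVersion looks like
# <base>+<cdh release>+<build>, and the tarball is named
# <component>-<base>-<cdh release>.tar.gz.

CDH5_ARCHIVE="https://archive.cloudera.com/cdh5/cdh/5"

# cdh_client_url <component> <componentVersion>   (hypothetical helper)
cdh_client_url() {
  local component="$1" version="$2"
  local base="${version%%+*}"   # text before the first '+', e.g. 2.6.0
  local rest="${version#*+}"    # text after the first '+', e.g. cdh5.11.1+2400
  local release="${rest%%+*}"   # drop the trailing build number, e.g. cdh5.11.1
  printf '%s/%s-%s-%s.tar.gz\n' "$CDH5_ARCHIVE" "$component" "$base" "$release"
}

# Example using the componentVersion shown in the inspector output above.
hadoop_url="$(cdh_client_url hadoop '2.6.0+cdh5.11.1+2400')"

# Print the parameter with its value; the '=' separator is an assumption.
echo "--hadoop-client-ws=${hadoop_url}"
```

The same helper can be called with hive or hbase and the corresponding componentVersion to produce values for --hive-client-ws and --hbase-client-ws.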

See Also:

Search for “Host Inspector” on the Cloudera website if you need more help using this tool to determine installed software versions.

For HDP:

  1. Log on to Ambari. Go to Admin, then Stack and Versions. On the Stack tab, locate the entries for the HDFS, Hive, and HBase services and note down the version number of each as the “service version.”

  2. Click the Versions tab. Note down the version of HDP that is running on the cluster as the “HDP version base.”

  3. Click Show Details to display a pop-up window that shows the full version string for the installed HDP release. Note this down as the “HDP full version.”

  4. The last piece of information needed is the Linux version (“centos5,” “centos6,” or “centos7”). Note this down as “OS version.”

To search through the HDP repository in Amazon S3 storage for the correct client URLs using the information acquired in these steps, you would need an S3 browser, browser extension, or command-line tool. As an alternative, you can piece together the correct URLs from these strings.

For HDP releases earlier than 2.5, the URL pattern is as follows.

http://public-repo-1.hortonworks.com/HDP/<OS version>/2.x/updates/<HDP version base>/tars/{hadoop|apache-hive|hbase}-<service version>.<HDP full version>.tar.gz

Here are some examples. Note that the pattern of the gzip filename is slightly different for Hive. There is an extra “-bin” segment in the name.

http://public-repo-1.hortonworks.com/HDP/centos6/2.x/updates/2.3.2.0/tars/hadoop-2.7.1.2.3.2.0-2950.tar.gz
http://public-repo-1.hortonworks.com/HDP/centos6/2.x/updates/2.3.2.0/tars/apache-hive-1.2.1.2.3.2.0-2950-bin.tar.gz
http://public-repo-1.hortonworks.com/HDP/centos6/2.x/updates/2.3.2.0/tars/hbase-1.1.2.2.3.2.0-2950.tar.gz

For HDP 2.5 and later releases, the pattern is almost the same, except that there is an additional hadoop, hive, or hbase directory under the tars directory:

http://public-repo-1.hortonworks.com/HDP/centos6/2.x/updates/2.5.6.0/tars/hadoop/hadoop-2.7.3.2.5.6.0-40.tar.gz
http://public-repo-1.hortonworks.com/HDP/centos6/2.x/updates/2.5.6.0/tars/hive/apache-hive-1.2.1000.2.5.6.0-40-bin.tar.gz
http://public-repo-1.hortonworks.com/HDP/centos6/2.x/updates/2.5.6.0/tars/hbase/hbase-1.1.2.2.5.6.0-40.tar.gz 
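
The two URL patterns above can be combined into one shell function that composes a URL from the four strings noted in Ambari. This is a minimal sketch, not a supported tool: the function name hdp_client_url and its argument order are hypothetical, it covers only the HDP 2.x patterns described in this section, and it does not check that the composed URL actually exists in the repository.

```shell
#!/usr/bin/env bash
# Sketch only: compose an HDP 2.x client tarball URL from the strings
# collected in Ambari. Covers only the two patterns described above.

HDP_REPO="http://public-repo-1.hortonworks.com/HDP"

# hdp_client_url <hadoop|hive|hbase> <OS version> <HDP version base> \
#                <service version> <HDP full version>   (hypothetical helper)
hdp_client_url() {
  local component="$1" os="$2" base="$3" service="$4" full="$5"
  local name suffix="" subdir=""

  case "$component" in
    hadoop|hbase) name="$component" ;;
    hive)         name="apache-hive"; suffix="-bin" ;;  # Hive adds "-bin"
    *) echo "unknown component: $component" >&2; return 1 ;;
  esac

  # Split the version base (for example 2.5.6.0) into major and minor parts.
  local major="${base%%.*}"
  local rest="${base#*.}"
  local minor="${rest%%.*}"
  if [ "$major" != "2" ]; then
    echo "only HDP 2.x patterns are covered: $base" >&2
    return 1
  fi

  # HDP 2.5 and later add a per-component directory under tars/.
  if [ "$minor" -ge 5 ]; then
    subdir="${component}/"
  fi

  printf '%s/%s/2.x/updates/%s/tars/%s%s-%s.%s%s.tar.gz\n' \
    "$HDP_REPO" "$os" "$base" "$subdir" "$name" "$service" "$full" "$suffix"
}

# Examples built from the values used in this section.
hdp_client_url hadoop centos6 2.3.2.0 2.7.1    2.3.2.0-2950
hdp_client_url hive   centos6 2.5.6.0 1.2.1000 2.5.6.0-40
```

The two calls at the end reproduce the Hadoop example for HDP 2.3.2.0 and the Hive example for HDP 2.5.6.0 shown above.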

Alternative Method for HDP:

You can get the required software versions from the command line instead of using Ambari.

  • # hdp-select versions

    Copy and save the numbers to the left of the dash as the “HDP version base”.

  • # hadoop version 
    # beeline --version 
    # hbase version
    Use the output from these commands to formulate the <service version>.<HDP full version> segment for each URL.
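
As an illustration of the command-line method, the following sketch extracts the two strings from sample command output. The sample values are illustrative only and were chosen to match the HDP 2.5.6.0 URLs in this section; the helper names are hypothetical, and the sketch assumes that hdp-select versions prints a string such as 2.5.6.0-40 and that the first line of hadoop version has the form "Hadoop <version>". The output of beeline --version and hbase version is formatted differently and is not parsed here.

```shell
#!/usr/bin/env bash
# Sketch only: pull version strings out of command output.
# Assumed output formats (verify on your own cluster):
#   hdp-select versions        ->  2.5.6.0-40
#   hadoop version (line 1)    ->  Hadoop 2.7.3.2.5.6.0-40

# hdp_version_base <hdp-select output line>   (hypothetical helper)
# Keeps the numbers to the left of the dash.
hdp_version_base() {
  printf '%s\n' "${1%%-*}"
}

# hadoop_version_segment <first line of "hadoop version">   (hypothetical helper)
# Strips the leading "Hadoop " label, leaving <service version>.<HDP full version>.
hadoop_version_segment() {
  printf '%s\n' "${1#Hadoop }"
}

# Illustrative sample values. On a cluster node you might instead use:
#   hdp_line="$(hdp-select versions | head -n 1)"
#   hadoop_line="$(hadoop version | head -n 1)"
hdp_line="2.5.6.0-40"
hadoop_line="Hadoop 2.7.3.2.5.6.0-40"

echo "HDP version base: $(hdp_version_base "$hdp_line")"
echo "Hadoop URL segment: $(hadoop_version_segment "$hadoop_line")"
```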