Processing Hive tables with Snappy compression

This topic explains how to set up the Snappy libraries so that the DP CLI can process Hive tables with Snappy compression.

By default, the DP CLI cannot successfully process Hive tables that use Snappy compression, because the required Hadoop native libraries are not on the library path of the JVM. To fix this, you must copy the Hadoop native libraries from their source location in the Hadoop installation into the appropriate BDD directory.

To set up the Snappy libraries:

  1. Locate the source directory for the Hadoop native libraries in your Hadoop installation.
    The typical location on CDH is:
    /opt/cloudera/parcels/CDH/lib/hadoop/lib/native/
    
  2. Copy the Hadoop native libraries to this BDD directory:
    $BDD_HOME/common/edp/olt/bin
    

    The copy operation must be performed on all BDD nodes.
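The copy step above can be sketched as a small shell helper. This is an illustrative sketch, not part of the product: the function name `copy_native_libs` is hypothetical, and the `$BDD_HOME` destination and CDH source paths are the ones named in the steps above; adjust them for your installation. Run it on every BDD node.

```shell
#!/bin/sh
set -e

# copy_native_libs SRC DEST
# Copy the Hadoop native libraries (libsnappy, libhadoop, ...) from the
# Hadoop installation's native-library directory into the DP CLI's
# library directory, preserving symlinks and permissions.
copy_native_libs() {
    src="$1"
    dest="$2"
    mkdir -p "$dest"
    # -a preserves symlinks, permissions, and timestamps; the trailing
    # "/." copies the directory's contents rather than the directory.
    cp -a "$src"/. "$dest"/
    # Sanity check: confirm the Snappy library actually arrived.
    ls "$dest" | grep -qi snappy
}

# Typical invocation on a CDH node, using the paths from the steps above:
#   copy_native_libs /opt/cloudera/parcels/CDH/lib/hadoop/lib/native \
#                    "$BDD_HOME/common/edp/olt/bin"
```

Repeat the invocation on each BDD node (for example, via ssh in a loop over your node list), since the libraries must be present locally on every node.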

After the libraries are copied, all subsequent DP workflows can process Hive tables that use Snappy compression.

Note that if you later add a new Data Processing node, you must manually copy the Hadoop native libraries to that node as well.