Apache Hive Issues

Troubleshoot Apache Hive issues for Big Data Service clusters.

The Hive Query Fails With a RegexSerDe Class Not Found Exception

Troubleshoot a Hive query that fails with a RegexSerDe class not found exception.

The hive-contrib jar, which includes the {{org.apache.hadoop.hive.contrib.serde2.RegexSerDe}} class, isn't sent to the MapReduce/Tez jobs by default.

Complete one of the following:

  1. Add the hive-contrib jar to the distributed cache using the ADD JAR command.
  2. Use the {{org.apache.hadoop.hive.serde2.RegexSerDe}} class that is available in the hive-serde jar. The hive-serde jar includes the RegexSerDe class along with other commonly used SerDe classes, and is sent to MapReduce/Tez jobs by default. Therefore, you don't need to run the ADD JAR command explicitly.

    We recommend this option.
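As an illustration of the recommended option, the following minimal Python sketch builds a HiveQL CREATE TABLE statement that names the hive-serde class rather than the hive-contrib one. The helper name {{regex_serde_ddl}}, the table name, the column names, and the sample regular expression are hypothetical and chosen only for this example; adapt them to your own data before running the generated statement in Hive.

```python
# Class name taken from the text above (hive-serde jar, shipped to jobs by default).
SERDE_CLASS = "org.apache.hadoop.hive.serde2.RegexSerDe"


def regex_serde_ddl(table, columns, input_regex):
    """Build a CREATE TABLE statement that uses the hive-serde RegexSerDe.

    Hypothetical helper for illustration only. All columns are declared
    as STRING to keep the sketch simple.
    """
    # Keep the sketch simple: reject characters that would need escaping
    # inside a double-quoted HiveQL string literal.
    if '"' in input_regex or "\\" in input_regex:
        raise ValueError("this sketch does not escape quotes or backslashes")
    cols = ", ".join(f"{name} STRING" for name in columns)
    return (
        f"CREATE TABLE {table} ({cols})\n"
        f"ROW FORMAT SERDE '{SERDE_CLASS}'\n"
        f'WITH SERDEPROPERTIES ("input.regex" = "{input_regex}")\n'
        "STORED AS TEXTFILE"
    )


# Hypothetical table with two space-separated fields, one capture group per column.
ddl = regex_serde_ddl("access_log_sample", ["host", "identity"], "([^ ]*) ([^ ]*)")
print(ddl)
```

Because the statement references the class from the hive-serde jar, no ADD JAR step is needed before running it.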

Querying Nested Data Within Object Storage

Troubleshoot external table jobs that fail.

When you use Hive to create an external table over data that is stored in Object Storage in a nested folder structure, some jobs fail.

Note

The data stored in Object Storage is in Parquet format.

To read data from Object Storage without job failures, set the following configuration in Spark:

  1. Access Apache Ambari.
  2. From the side toolbar, under Services select Spark3.
  3. Select Configs.
  4. Expand the Custom spark3-defaults section.
  5. Set {{spark.sql.hive.convertMetastoreParquet}} to False.
  6. Expand the Custom spark3-hive-site-override section.
  7. Set {{mapred.input.dir.recursive}} to True.
  8. Select Restart.
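To try the same two properties on a single job before changing the cluster-wide configuration in Ambari, they might be passed as {{--conf}} options to spark-submit. The Python sketch below only assembles such a command line. It rests on two assumptions that aren't stated in the steps above: that the {{spark.hadoop.}} prefix, which Spark uses to forward a property to the Hadoop configuration, has the same effect for {{mapred.input.dir.recursive}} as the hive-site override, and that the job script is called {{your_job.py}} (a placeholder).

```python
# Property names come from the Ambari steps above.
# Assumption: prefixing the Hadoop property with "spark.hadoop." is treated
# here as a per-job stand-in for the spark3-hive-site-override setting.
SPARK_OVERRIDES = {
    "spark.sql.hive.convertMetastoreParquet": "false",
    "spark.hadoop.mapred.input.dir.recursive": "true",
}


def conf_args(overrides):
    """Turn a property mapping into a flat list of spark-submit --conf arguments."""
    args = []
    for key, value in overrides.items():
        args += ["--conf", f"{key}={value}"]
    return args


# "your_job.py" is a placeholder for the application to submit.
command = ["spark-submit", *conf_args(SPARK_OVERRIDES), "your_job.py"]
print(" ".join(command))
```

If the job reads the nested Parquet data successfully with these options, apply the settings in Ambari as described so that they persist for all Spark3 jobs.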