Using Apache Hive

Apache Hive is a data warehouse system built on top of Apache Hadoop and facilitates reading, writing, and managing large datasets residing in distributed storage using SQL.

Performance Tuning

This is a general guideline when running queries against smaller datasets. The FetchTask feature provides the following benefits in the mentioned scenario/usecases.

When working with smaller datasets, you benefit from the FetchTask feature in specific scenarios or use cases when executing queries. The feature can be controlled using the "hive.fetch.task.conversion" parameter, which is set to "more" by default. FetchTask is an efficient task that directly accesses the file to provide results, without starting a MapReduce job for the incoming query. The feature is applied when selecting specific columns with a limit, subject to additional requirements such as a single data source (one table or partition), no subquery, no aggregation or distinct, and not applying to views or joins. Therefore, running queries such as "select col1 as alias1, col2 from table where partitionkey='somePartitionValue'", "select id from t limit 1;", or "select col1 as alias1, col2 from table where partitionkey='somePartitionValue'", FetchTask is leveraged. However, if you set "hive.fetch.task.conversion" to minimal, FetchTask will be applied when executing "select *" with a limit.

When dealing with larger datasets and enabling FetchTask, tune the server and client heap size based on the dataset size when firing select queries from BI tools, Beeline, Hue, and so on. Alternatively, to eliminate memory bottlenecks from the HiveServer, set "hive.fetch.task.conversion" to None to disable the FetchTask feature and run MapReduce jobs.

Hive Delete Table Query Failure

The Hive delete table query fails with error DELETE from table1.

DELETE from table1;

FAILED: SemanticException [Error 10294]: Attempt to do update or delete using transaction manager that does not support these operations.

Delete support isn't available for non-transactional or non-acid tables in Hive. To delete a table, you must create a transactional table.

Example:

CREATE TABLE emp12(id int,age int)TBLPROPERTIES ('transactional'='true');