Configurations for File Formats for Hive Infodom
A Hive file format defines how records are stored in a file. The supported
file formats are Text, Sequence, RC, Avro, Parquet, and ORC. The Model Upload
component accepts the Input File Format and Output File Format at three
levels:
- Configuration table entries: This is the OFSAA instance-level configuration. It is applicable to all Information Domains in the instance. Configuration table entries are:
  - HIVE_INPUT_FILE_FORMAT – Default value is seeded as org.apache.hadoop.mapred.TextInputFormat.
  - HIVE_OUTPUT_FILE_FORMAT – Default value is seeded as org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat.
- Model-level properties (Model UDP): You can define Model UDPs to hold the input and output file formats. These are applied to all tables in the model. UDP names are the same as the configuration parameters (HIVE_INPUT_FILE_FORMAT and HIVE_OUTPUT_FILE_FORMAT).
- Table-level properties (Table UDP): File formats can be applied to an individual table through table-level UDPs. UDP names are the same as the configuration parameters (HIVE_INPUT_FILE_FORMAT and HIVE_OUTPUT_FILE_FORMAT).
Note:
- Configuration Table data are overridden by Model UDPs, which in turn will be overridden by Table UDPs.
- Hive file formats are supported only for creating new tables.
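The override order described in the note can be illustrated with a short sketch. This is not OFSAA code: the function name `resolve_file_formats` and the dictionary-based inputs are hypothetical, and the sketch assumes that the input and output formats are resolved independently of each other, which the guide does not state explicitly.

```python
# Illustrative sketch only; resolve_file_formats is a hypothetical helper,
# not part of OFSAA or the Model Upload component.

FORMAT_KEYS = ("HIVE_INPUT_FILE_FORMAT", "HIVE_OUTPUT_FILE_FORMAT")

# Seeded instance-level defaults, as listed under the configuration table entries.
SEEDED_DEFAULTS = {
    "HIVE_INPUT_FILE_FORMAT": "org.apache.hadoop.mapred.TextInputFormat",
    "HIVE_OUTPUT_FILE_FORMAT": "org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat",
}


def resolve_file_formats(config=None, model_udps=None, table_udps=None):
    """Return the effective formats for one table.

    Precedence, highest first: Table UDP, Model UDP, configuration table.
    Assumption: each key is resolved on its own, so a table may override
    only one of the two formats.
    """
    config = SEEDED_DEFAULTS if config is None else config
    resolved = {}
    for key in FORMAT_KEYS:
        # Walk the levels from most specific to least specific.
        for level in (table_udps, model_udps, config):
            value = (level or {}).get(key)
            if value:
                resolved[key] = value
                break
    return resolved


if __name__ == "__main__":
    model = {
        "HIVE_INPUT_FILE_FORMAT": "org.apache.hadoop.hive.ql.io.orc.OrcInputFormat",
        "HIVE_OUTPUT_FILE_FORMAT": "org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat",
    }
    print(resolve_file_formats(model_udps=model))
```

In this sketch, a table with no UDPs in a model with no UDPs falls back to the seeded Text formats, while a Table UDP takes effect even when a Model UDP is also defined.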
The supported file formats are listed in the following table:

Table 6-4 Supported File Formats

| Type | Input File Format | Output File Format |
|---|---|---|
| Text File | org.apache.hadoop.mapred.TextInputFormat | org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat |
| Sequence File | org.apache.hadoop.mapred.SequenceFileInputFormat | org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat |
| RC File | org.apache.hadoop.hive.ql.io.RCFileInputFormat | org.apache.hadoop.hive.ql.io.RCFileOutputFormat |
| Avro File | org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat | org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat |
| ORC File | org.apache.hadoop.hive.ql.io.orc.OrcInputFormat | org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat |
| Parquet File | parquet.hive.DeprecatedParquetInputFormat | parquet.hive.DeprecatedParquetOutputFormat |
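To show how a pair of class names from the table might appear in Hive DDL, the following sketch builds a `STORED AS INPUTFORMAT ... OUTPUTFORMAT ...` clause, which is standard Hive `CREATE TABLE` syntax. The guide does not describe the DDL that Model Upload actually generates, so this is an assumption for illustration only. The names `SUPPORTED_FILE_FORMATS` and `stored_as_clause` are hypothetical, and the sketch omits any `ROW FORMAT SERDE` clause that a given format may also need.

```python
# Illustrative sketch only; not the DDL generator used by Model Upload.

# Input/output class pairs copied from the supported file formats table.
SUPPORTED_FILE_FORMATS = {
    "Text": (
        "org.apache.hadoop.mapred.TextInputFormat",
        "org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat",
    ),
    "Sequence": (
        "org.apache.hadoop.mapred.SequenceFileInputFormat",
        "org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat",
    ),
    "RC": (
        "org.apache.hadoop.hive.ql.io.RCFileInputFormat",
        "org.apache.hadoop.hive.ql.io.RCFileOutputFormat",
    ),
    "Avro": (
        "org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat",
        "org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat",
    ),
    "ORC": (
        "org.apache.hadoop.hive.ql.io.orc.OrcInputFormat",
        "org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat",
    ),
    "Parquet": (
        "parquet.hive.DeprecatedParquetInputFormat",
        "parquet.hive.DeprecatedParquetOutputFormat",
    ),
}


def stored_as_clause(file_type):
    """Build a Hive STORED AS clause for one of the supported file types."""
    try:
        input_format, output_format = SUPPORTED_FILE_FORMATS[file_type]
    except KeyError:
        raise ValueError(f"Unsupported file type: {file_type!r}") from None
    return f"STORED AS INPUTFORMAT '{input_format}' OUTPUTFORMAT '{output_format}'"


if __name__ == "__main__":
    print(stored_as_clause("ORC"))
```

Keeping the input and output classes paired in one mapping avoids mixing, for example, an ORC input format with a Parquet output format.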