Configurations for File Formats for Hive Infodom

A Hive file format defines how records are stored in a file. The supported file formats are Text, Sequence, RC, Avro, Parquet, and ORC. The Model Upload component accepts the Input File Format and Output File Format as inputs at three levels:
  1. Configuration table entries.
    This is the OFSAA instance-level configuration and applies to all Information Domains in the instance. The configuration table entries are:
    • HIVE_INPUT_FILE_FORMAT – Default value is seeded as org.apache.hadoop.mapred.TextInputFormat.
    • HIVE_OUTPUT_FILE_FORMAT – Default value is seeded as org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat.
  2. Model-level properties (Model UDP)
    You can define Model UDPs to hold the input and output file formats. These apply to all tables in the model. The UDP names are the same as the configuration parameters (HIVE_INPUT_FILE_FORMAT and HIVE_OUTPUT_FILE_FORMAT).
  3. Table-level properties (Table UDP)
    File formats can be applied to an individual table through table-level UDPs. The UDP names are the same as the configuration parameters (HIVE_INPUT_FILE_FORMAT and HIVE_OUTPUT_FILE_FORMAT).

    Note:

    • Configuration table values are overridden by Model UDPs, which in turn are overridden by Table UDPs.
    • Hive file formats are supported only when creating new tables.
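
The override order in the note can be illustrated with a short Python sketch. This is not product code: the function name resolve_file_formats and the dictionary inputs are hypothetical, and the sketch assumes each of the two parameters is resolved independently across the three levels, which the text above does not state explicitly.

```python
# Illustrative sketch only; names and dict-based inputs are hypothetical,
# not part of the OFSAA product.
FORMAT_KEYS = ("HIVE_INPUT_FILE_FORMAT", "HIVE_OUTPUT_FILE_FORMAT")

# Seeded instance-level defaults, as listed for the configuration table.
SEEDED_CONFIG = {
    "HIVE_INPUT_FILE_FORMAT": "org.apache.hadoop.mapred.TextInputFormat",
    "HIVE_OUTPUT_FILE_FORMAT": "org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat",
}


def resolve_file_formats(config=None, model_udps=None, table_udps=None):
    """Return the effective input/output formats for one table.

    Levels are applied from lowest to highest precedence:
    configuration table, then Model UDPs, then Table UDPs.
    Assumption: each parameter is resolved on its own.
    """
    levels = (
        config if config is not None else SEEDED_CONFIG,
        model_udps or {},
        table_udps or {},
    )
    resolved = {}
    for key in FORMAT_KEYS:
        for level in levels:
            value = level.get(key)
            if value:  # a later, non-empty value overrides an earlier one
                resolved[key] = value
    return resolved


# Example: the model sets ORC for both parameters, one table overrides the output.
effective = resolve_file_formats(
    model_udps={
        "HIVE_INPUT_FILE_FORMAT": "org.apache.hadoop.hive.ql.io.orc.OrcInputFormat",
        "HIVE_OUTPUT_FILE_FORMAT": "org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat",
    },
    table_udps={
        "HIVE_OUTPUT_FILE_FORMAT": "org.apache.hadoop.hive.ql.io.RCFileOutputFormat",
    },
)
print(effective)
```

In practice the input and output formats for a table would normally be set as a matching pair at the same level; the mixed result in the example only demonstrates the override order.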
    The supported File Formats are listed in the following table:

    Table 6-4 Supported File Formats

    Type           Input File Format                                           Output File Format
    Text File      org.apache.hadoop.mapred.TextInputFormat                    org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
    Sequence File  org.apache.hadoop.mapred.SequenceFileInputFormat            org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
    RC File        org.apache.hadoop.hive.ql.io.RCFileInputFormat              org.apache.hadoop.hive.ql.io.RCFileOutputFormat
    Avro File      org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat  org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat
    ORC File       org.apache.hadoop.hive.ql.io.orc.OrcInputFormat             org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat
    Parquet File   parquet.hive.DeprecatedParquetInputFormat                   parquet.hive.DeprecatedParquetOutputFormat
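
To show how a pair of class names from the table relates to Hive DDL, the following Python sketch builds the standard Hive clause STORED AS INPUTFORMAT ... OUTPUTFORMAT ... for a given file type. The dictionary and the function name stored_as_clause are hypothetical and for illustration only. How Model Upload generates its CREATE TABLE statements is not described here, and some formats may also need a matching SerDe (ROW FORMAT SERDE), which this sketch leaves out.

```python
# Illustrative sketch only; the mapping mirrors the supported file formats
# table, and the helper name is hypothetical.
SUPPORTED_FILE_FORMATS = {
    "Text File": (
        "org.apache.hadoop.mapred.TextInputFormat",
        "org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat",
    ),
    "Sequence File": (
        "org.apache.hadoop.mapred.SequenceFileInputFormat",
        "org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat",
    ),
    "RC File": (
        "org.apache.hadoop.hive.ql.io.RCFileInputFormat",
        "org.apache.hadoop.hive.ql.io.RCFileOutputFormat",
    ),
    "Avro File": (
        "org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat",
        "org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat",
    ),
    "ORC File": (
        "org.apache.hadoop.hive.ql.io.orc.OrcInputFormat",
        "org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat",
    ),
    "Parquet File": (
        "parquet.hive.DeprecatedParquetInputFormat",
        "parquet.hive.DeprecatedParquetOutputFormat",
    ),
}


def stored_as_clause(file_type):
    """Build the Hive STORED AS clause for one supported file type."""
    # Raises KeyError for a file type that is not in the table.
    input_format, output_format = SUPPORTED_FILE_FORMATS[file_type]
    return (
        f"STORED AS INPUTFORMAT '{input_format}' "
        f"OUTPUTFORMAT '{output_format}'"
    )


print(stored_as_clause("ORC File"))
```

The values of the two tuple elements are what you would supply for HIVE_INPUT_FILE_FORMAT and HIVE_OUTPUT_FILE_FORMAT respectively.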