Data type conversions

When you apply your transformation script to the project data set (or click Create a new data set from within Transform), the Data Processing component converts most of the Hive data types to their corresponding Dgraph data types. However, this can result in some of the original data types being changed or omitted. This topic discusses these data type conversions in detail.

For information on complex types in Hive tables, see https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Types. The types that are present in your source Hive tables depend on the Hadoop environment you use.

For information on which data types are supported by Big Data Discovery, see the Data Processing Guide.

The following table describes how different Hive data types are affected by transformation scripts. The table lists the data types the source Hive table can contain and shows the data types in the Dgraph (mdex:<type>) to which they are converted.

Source Hive table data type (before the transformation script is applied) | Dgraph data type | Target Hive table data type (after the transformation script is applied)
BOOLEAN | mdex:boolean | BOOLEAN
TINYINT | mdex:long | BIGINT; this type is converted to Long during ingest.
SMALLINT | mdex:long | BIGINT; this type is converted to Long during ingest.
INT | mdex:long | BIGINT; this type is converted to Long during ingest.
BIGINT | mdex:long | BIGINT
FLOAT | mdex:double | DOUBLE
DOUBLE | mdex:double | DOUBLE
DECIMAL | mdex:double | DOUBLE; this may result in loss of precision.
DATE | mdex:dateTime | TIMESTAMP
TIMESTAMP | mdex:dateTime | TIMESTAMP
STRING | Discovered mdex:<type> | STRING (or other primitive types)
CHAR | Discovered mdex:<type> | STRING (or other primitive types)
VARCHAR | Discovered mdex:<type> | STRING (or other primitive types)
ARRAY (complex) | Multi-assign of the ARRAY type. For example, for an ARRAY of decimals, it becomes a multi-assign attribute of mdex:double. | ARRAY (complex) of the types obtained from the Dgraph type.
STRUCT (complex) | None | Multiple fields of this format: struct_(structName)_(fieldName)
BINARY | None | Unsupported; the entire field or column is omitted.
MAP (complex) | None | Unsupported; the entire field or column is omitted.
UNION (complex) | None | Unsupported; the entire field or column is omitted.
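
The fixed conversions in the table can be written as a small lookup, which may help when you want to predict the schema of a target Hive table before applying a script. The following Python sketch is illustrative only and is not part of Big Data Discovery: the names CONVERSIONS, UNSUPPORTED, target_hive_type, and loses_precision are hypothetical. It covers only the rows with a fixed outcome. STRING, CHAR, VARCHAR, ARRAY, and STRUCT are left out because their results depend on type discovery or on the nested types. The precision check uses a Python float as a stand-in for DOUBLE, which is an assumption about how the DECIMAL value is stored.

```python
from decimal import Decimal

# Hypothetical lookup transcribed from the table above:
# source Hive type -> (Dgraph type, target Hive type).
CONVERSIONS = {
    "BOOLEAN": ("mdex:boolean", "BOOLEAN"),
    "TINYINT": ("mdex:long", "BIGINT"),
    "SMALLINT": ("mdex:long", "BIGINT"),
    "INT": ("mdex:long", "BIGINT"),
    "BIGINT": ("mdex:long", "BIGINT"),
    "FLOAT": ("mdex:double", "DOUBLE"),
    "DOUBLE": ("mdex:double", "DOUBLE"),
    "DECIMAL": ("mdex:double", "DOUBLE"),
    "DATE": ("mdex:dateTime", "TIMESTAMP"),
    "TIMESTAMP": ("mdex:dateTime", "TIMESTAMP"),
}

# Types whose entire field or column is omitted from the target table.
UNSUPPORTED = {"BINARY", "MAP", "UNION"}


def target_hive_type(source_type):
    """Return the target Hive type, or None if the column is omitted."""
    name = source_type.strip().upper()
    if name in UNSUPPORTED:
        return None
    if name in CONVERSIONS:
        return CONVERSIONS[name][1]
    # STRING, CHAR, VARCHAR, ARRAY and STRUCT have no fixed outcome.
    raise ValueError(f"no fixed conversion for {source_type!r}")


def loses_precision(value):
    """Report whether a DECIMAL value changes when stored as a double."""
    return Decimal(float(value)) != value


if __name__ == "__main__":
    for hive_type in ("INT", "DECIMAL", "DATE", "MAP"):
        print(hive_type, "->", target_hive_type(hive_type))
    print(loses_precision(Decimal("0.1234567890123456789012")))
```

A check such as loses_precision can flag DECIMAL columns, for example currency amounts with many significant digits, that are worth reviewing before you rely on the converted DOUBLE values.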