When you apply your transformation script to the project data set or to the source Hive table (when you create a new data set from within Transform), data processing in Big Data Discovery converts most Hive data types to their corresponding Dgraph data types. As a result, some of the original data types can be changed or omitted. This topic discusses these data type conversions in detail.
For information on complex types in Hive tables, see https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Types#LanguageManualTypes-ComplexTypes. The types that are present in your source Hive tables depend on the Hadoop environment you use.
For information on which data types are supported by Big Data Discovery, see the Data Processing Guide.
The following table describes how different Hive data types are affected by transformation scripts. For each data type that the source Hive table can contain, the table shows the Dgraph data type (mdex:<type>) to which it is converted and the resulting data type in the target Hive table.
Source Hive table data type (before the transformation script is applied) | Dgraph data type | Target Hive table data type (after the transformation script is applied) |
---|---|---|
BOOLEAN | mdex:boolean | BOOLEAN |
TINYINT | mdex:long | BIGINT; this type is converted to Long during ingest. |
SMALLINT | mdex:long | BIGINT; this type is converted to Long during ingest. |
INT | mdex:long | BIGINT; this type is converted to Long during ingest. |
BIGINT | mdex:long | BIGINT |
FLOAT | mdex:double | DOUBLE |
DOUBLE | mdex:double | DOUBLE |
DECIMAL | mdex:double | DOUBLE; this may result in loss of precision. |
DATE | mdex:dateTime | TIMESTAMP |
TIMESTAMP | mdex:dateTime | TIMESTAMP |
STRING | Discovered mdex:<type> | STRING (or other primitive types) |
CHAR | Discovered mdex:<type> | STRING (or other primitive types) |
VARCHAR | Discovered mdex:<type> | STRING (or other primitive types) |
ARRAY (complex) | Multi-assign attribute of the ARRAY element type. For example, an ARRAY of decimals becomes a multi-assign attribute of mdex:double. | ARRAY (complex) of the types obtained from the Dgraph type. |
STRUCT (complex) | None | Multiple fields of this format: struct_(structName)_(fieldName) |
BINARY | None | Unsupported; the entire field or column is omitted. |
MAP (complex) | None | Unsupported; the entire field or column is omitted. |
UNION (complex) | None | Unsupported; the entire field or column is omitted. |
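
The conversions in the table can be sketched as a small lookup, which may help when checking what a target Hive table should look like. This is an illustrative sketch only, not part of Big Data Discovery: the names `CONVERSIONS`, `convert`, `struct_columns`, and the `"discovered"` placeholder are hypothetical, and the sketch assumes that the parentheses in `struct_(structName)_(fieldName)` mark placeholders rather than literal characters. ARRAY and STRUCT columns are not covered by the lookup itself.

```python
from decimal import Decimal

# Hive type -> (Dgraph type, target Hive type), transcribed from the table above.
CONVERSIONS = {
    "BOOLEAN": ("mdex:boolean", "BOOLEAN"),
    "TINYINT": ("mdex:long", "BIGINT"),
    "SMALLINT": ("mdex:long", "BIGINT"),
    "INT": ("mdex:long", "BIGINT"),
    "BIGINT": ("mdex:long", "BIGINT"),
    "FLOAT": ("mdex:double", "DOUBLE"),
    "DOUBLE": ("mdex:double", "DOUBLE"),
    "DECIMAL": ("mdex:double", "DOUBLE"),
    "DATE": ("mdex:dateTime", "TIMESTAMP"),
    "TIMESTAMP": ("mdex:dateTime", "TIMESTAMP"),
}

# String-like types: the Dgraph type is discovered from the data.
# "discovered" is a placeholder label used only in this sketch.
DISCOVERED_TYPES = {"STRING", "CHAR", "VARCHAR"}

# Types for which the entire field or column is omitted.
UNSUPPORTED_TYPES = {"BINARY", "MAP", "UNION"}


def convert(hive_type):
    """Return (dgraph_type, target_hive_type) for a Hive type name.

    Type parameters such as DECIMAL(10,2) or VARCHAR(50) are ignored.
    Returns (None, None) when the column is omitted.
    Raises KeyError for types this sketch does not cover (ARRAY, STRUCT).
    """
    name = hive_type.split("(")[0].strip().upper()
    if name in UNSUPPORTED_TYPES:
        return (None, None)
    if name in DISCOVERED_TYPES:
        return ("discovered", "STRING")
    return CONVERSIONS[name]


def struct_columns(struct_name, field_names):
    """Build target column names for a STRUCT, one per field (assumed naming)."""
    return [f"struct_{struct_name}_{field}" for field in field_names]


# DECIMAL values that exceed double precision do not survive the conversion.
original = Decimal("0.1234567890123456789")
round_tripped = Decimal(float(original))
precision_lost = round_tripped != original
```

For example, `convert("smallint")` yields `("mdex:long", "BIGINT")`, and `struct_columns("address", ["city", "zip"])` yields `["struct_address_city", "struct_address_zip"]` under the naming assumption stated above.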