Note: Because BDD converts Hive source table data types to its own data
types, applying your script to the source table may result in some omitted
or changed data types. For example, complex Hive data types that have no
matching Dgraph data type are omitted. For more information, see
Data type conversion.
To create a new data set:
If the script is successful, the new Hive table is added to the index and
the new data set appears in Catalog.
If you do not see the new data set in Catalog, then the script failed. To
learn why it failed, check the Data Processing logs. For more information,
see Transform logging.
When you apply your transformation script to the source Hive table,
data processing in Big Data Discovery does the following:
- Obtains the transformation script from Studio.
- Retrieves the schema of the transformed project data set from the Dgraph.
- Creates a new Hive table (called HT2 in this example), using the project
  data set's schema.
- Loads the data row by row from the original source Hive table (called HT1
  in this example) into HT2, running the transformation script on each row
  as it is loaded, so that HT2 holds the transformed data.
- Samples the HT2 Hive table (the new Hive table with the transformed data)
  and adds the resulting data set to the Catalog.
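
The steps above can be illustrated with a small in-memory sketch. This is not BDD code and does not call any BDD, Hive, or Dgraph API. The function name `apply_transform_to_source`, the `script` function, the column names, and the use of plain Python lists to stand in for the HT1 and HT2 tables are all assumptions made for illustration. It only shows the general shape of the flow: transform each source row, keep the columns defined by the project data set's schema, then sample the result.

```python
import random
from typing import Callable, Dict, Iterable, List, Tuple

Row = Dict[str, object]


def apply_transform_to_source(
    source_rows: Iterable[Row],        # stands in for the source table HT1
    transform: Callable[[Row], Row],   # stands in for the transformation script
    schema: List[str],                 # stands in for the project data set's schema
    sample_size: int,
    seed: int = 0,
) -> Tuple[List[Row], List[Row]]:
    """Hypothetical sketch: build HT2 from HT1 row by row, then sample HT2."""
    ht2: List[Row] = []
    for row in source_rows:
        transformed = transform(row)
        # Keep only the columns named in the schema; extra columns are dropped
        # and missing ones become None.
        ht2.append({col: transformed.get(col) for col in schema})

    # Take a random sample of HT2 (the whole table if it is small enough).
    rng = random.Random(seed)
    sample = list(ht2) if len(ht2) <= sample_size else rng.sample(ht2, sample_size)
    return ht2, sample


# Illustrative data and script (both hypothetical).
ht1 = [
    {"name": " alice ", "age": "30"},
    {"name": "BOB", "age": "41"},
    {"name": "carol", "age": "27"},
]


def script(row: Row) -> Row:
    # Trim and normalize the name, convert age to an integer, and produce a
    # scratch column that is not part of the schema.
    return {
        "name": str(row["name"]).strip().title(),
        "age": int(str(row["age"])),
        "scratch": "unused",
    }


ht2, sample = apply_transform_to_source(ht1, script, ["name", "age"], sample_size=2)
print(ht2)
print(len(sample))
```

In this sketch the full transformed table (`ht2`) and the sampled data set (`sample`) are separate results, mirroring the distinction between the new Hive table and the data set that appears in the Catalog.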