Applying transformation scripts to project data sets

You can apply your transformation script to your project data set at any point. When the script finishes running, users working with your project can view, search, use guided navigation on, and otherwise interact with the transformed data in the Transform, Explore, and Discover areas of Studio.

The transformed data set is only available within your project. The Commit operation does not add a new data set to the Catalog, nor does it modify the source data in Hive.

Note: Because of how BDD converts Hive source table data types to its own data types, applying your script to the project's data set may cause some data types to be omitted. For example, some complex Hive data types that have no matching Dgraph data type are omitted. For more information, see Data type conversion.
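To make the note above concrete, here is a minimal, hypothetical sketch of how columns with complex Hive types could drop out of a schema during conversion. The type names ARRAY, MAP, STRUCT, and UNIONTYPE are Hive's complex types; the rule "omit every column whose base type is complex" and the function name `split_columns` are assumptions for illustration only, not BDD's actual conversion table. Consult Data type conversion for the real mapping.

```python
# Hypothetical illustration only: treating every complex Hive type as
# "omitted" is an assumption for this sketch, not BDD's real conversion rules.
HIVE_COMPLEX_TYPES = ("ARRAY", "MAP", "STRUCT", "UNIONTYPE")


def split_columns(hive_schema):
    """Split a {column: hive_type} mapping into kept and omitted columns."""
    kept, omitted = {}, {}
    for name, hive_type in hive_schema.items():
        # Compare only the base type name, e.g. "ARRAY<STRING>" -> "ARRAY".
        base = hive_type.split("<", 1)[0].strip().upper()
        if base in HIVE_COMPLEX_TYPES:
            omitted[name] = hive_type
        else:
            kept[name] = hive_type
    return kept, omitted


if __name__ == "__main__":
    schema = {
        "id": "BIGINT",
        "name": "STRING",
        "tags": "ARRAY<STRING>",
        "attrs": "MAP<STRING,INT>",
    }
    kept, omitted = split_columns(schema)
    print(sorted(kept))     # ['id', 'name']
    print(sorted(omitted))  # ['attrs', 'tags']
```

In this sketch, the simple columns survive and the complex ones are reported separately, which mirrors the kind of gap you may see after applying a script.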

To commit your script:

In the Transformation Editor, click Commit at the bottom of the transformation script.
Transform becomes locked, and a message appears stating that the operation may take several minutes to complete. Do not leave or refresh the page until the script finishes running.

When the script finishes running, Transform displays a message indicating whether it succeeded or failed. If it succeeds, you can refresh Transform to view the transformed data set.

When you commit your transformation script, the data processing component in Big Data Discovery does the following:
  1. Obtains the schema for the transformed data set from the Dgraph.
  2. Transforms the data using the transformation script.
  3. Creates a new project data set based on the schema and metadata and populates it with the transformed data.
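The three steps above can be modeled as a small pipeline. This is a hypothetical sketch, not BDD's data processing code: the `ProjectDataSet` class, the `commit` function, and the idea of representing the transformation script as a per-record Python function are all assumptions (BDD transformation scripts are not written in Python). Step 1 is simulated by passing in the transformed schema, since the real component obtains it from the Dgraph.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

Row = Dict[str, object]


@dataclass
class ProjectDataSet:
    """Hypothetical stand-in for a project data set: a schema plus rows."""
    schema: Dict[str, str]
    rows: List[Row] = field(default_factory=list)


def commit(current: ProjectDataSet,
           transformed_schema: Dict[str, str],
           script: Callable[[Row], Row]) -> ProjectDataSet:
    # 1. Obtain the schema for the transformed data set. In BDD this comes
    #    from the Dgraph; here it is simply passed in (assumption).
    schema = dict(transformed_schema)
    # 2. Transform the data by running the script on a copy of each record,
    #    so the input data set is never modified.
    transformed = [script(dict(row)) for row in current.rows]
    # 3. Create a new project data set from the schema and populate it with
    #    the transformed records, keeping only the columns in the schema.
    new_rows = [{col: row.get(col) for col in schema} for row in transformed]
    return ProjectDataSet(schema=schema, rows=new_rows)


if __name__ == "__main__":
    def upper_name(row: Row) -> Row:
        row["name"] = str(row["name"]).upper()
        return row

    src = ProjectDataSet({"id": "long", "name": "string"},
                         [{"id": 1, "name": "ada"}])
    out = commit(src, {"id": "long", "name": "string"}, upper_name)
    print(out.rows)  # [{'id': 1, 'name': 'ADA'}]
    print(src.rows)  # [{'id': 1, 'name': 'ada'}]
```

Because `commit` returns a new data set and leaves its input untouched, you can call it again with an edited script, which loosely parallels reapplying a script in Transform.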

After you apply your script, you can continue to work on it and reapply it to the project data set as many times as you like. (Recall that a project data set in Big Data Discovery is a sample of your source Hive table.)