Running the transformation script against a project data set

A project's full data set is not affected by the transformations until you run the transformation script against it. Until then, Studio applies the transformations only to the data sample that is currently associated with the project. If you later change the sample size, the transformations that you have already applied are automatically applied to the new sample.

The transformed data set is available only within your project. The Commit to Project operation does not add a new data set to the Catalog, nor does it modify the source data in Hive.

Note: Because of the way BDD converts Hive source table data types to its own data types, applying your script to the project's data set may cause data of some types to be omitted. For example, some complex Hive data types that have no matching Dgraph data type are omitted. For more information, see Data type conversion.

To run the transformation script against the project data set:

  1. In the Catalog, select a project.
  2. Select Transform.
  3. Expand the Transformation Script panel by clicking the grey bubble, which indicates that the project contains transformations that have not yet been committed.
  4. Click Commit to Project.
Commit to Project runs the transformations against the project data set, but it does not modify the underlying data set that other Studio users see in the Catalog. Other Studio users can therefore explore the untransformed data set in the Catalog, while project users explore the transformed data set.

To make the transformed data available to other Studio users, commit the transformations to the project and then create a new data set. The new data set contains the transformation changes and is available in the Catalog. See Creating a new data set from the transformed data.