About transformations and transformation scripts

Transformations are changes you can make to your project data set after the source data has been processed and loaded into Studio. They can be thought of as a substitute for an ETL process that cleans your data. A transformation can overwrite an existing attribute, modify attributes, or create new attributes.

For example, you can do any of the following transformations:

Most transformations are available directly as specific options in the Transform page of Studio.

You can use the Groovy scripting language, together with a list of custom, predefined Groovy-based transform functions available in Big Data Discovery, to create a transformation script. A transformation script is a collection of transformations and can contain any of the transform functions.

You can also write your own transformations from scratch using Groovy, within the same Transform page of Studio, using the Transformation Editor.
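As a rough illustration of what a hand-written transformation might look like, the sketch below uses only standard Groovy string and comparison operations. The attribute names `customer_name` and `tip_amount` are hypothetical, and they are defined here as local variables so the snippet runs on its own; it does not use any of the predefined Big Data Discovery transform functions, and the way attributes are exposed inside the Transformation Editor may differ.

```groovy
// Hypothetical stand-ins for one record's attribute values. In the
// Transformation Editor these would come from the data set, not from literals.
def customer_name = '  jane DOE '
def tip_amount = 2.5

// Clean a string attribute: trim whitespace, lower-case it, then
// capitalize each word.
def cleanName = customer_name.trim()
        .toLowerCase()
        .split(/\s+/)
        .collect { it.capitalize() }
        .join(' ')

// Derive a new categorical value from a numeric attribute.
def tipFlag = (tip_amount != null && tip_amount > 0) ? 'Tip' : 'No Tip'

println "${cleanName}: ${tipFlag}"
```

The first expression corresponds to modifying an existing attribute, and the second to creating a new attribute from an existing one.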

When you commit a transformation script to a project, the script runs against the data sample but does not affect the data set in the Catalog. You can either apply the transformation script to your current project, or create a new data set using the transformation script: