When you run a transformation script against a project data set and commit it, Studio modifies the project data set (the private copy of the data set) but does not modify the underlying data set as it appears in the Catalog to other Studio users (the public copy of the data set).
As shown in this workflow diagram, you create a project based on a data set and transform the private copy of the data set in your project by committing the transformation script.
As you change the transformation script, you can commit it again as often as necessary to continue modifying the project data set. The transformations apply to your version of the data set in the project (the private copy); they do not affect the public version of this data set in the Catalog.
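The private/public split described above can be modeled as a copy-on-create: the project takes its own copy of the Catalog rows, and each commit runs the script over that copy only. The following is a minimal sketch, assuming rows are plain dictionaries and a transformation script is a list of row-to-row functions; the names CatalogDataSet, Project, and commit are hypothetical and are not a Studio API.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

Row = Dict[str, object]
Transform = Callable[[Row], Row]


@dataclass
class CatalogDataSet:
    """Hypothetical model of the public copy of a data set in the Catalog."""
    name: str
    rows: List[Row]


@dataclass
class Project:
    """Hypothetical model of a Studio project that holds a private copy."""
    source: CatalogDataSet
    private_rows: List[Row] = field(init=False)

    def __post_init__(self) -> None:
        # Creating the project copies the public rows into a private copy.
        self.private_rows = [dict(r) for r in self.source.rows]

    def commit(self, script: List[Transform]) -> None:
        # Committing runs each step over the private copy only;
        # the Catalog rows are never touched.
        for step in script:
            self.private_rows = [step(dict(r)) for r in self.private_rows]
```

For example, committing a script that upper-cases a column changes project.private_rows while catalog.rows keeps the original values, so other Studio users still see the unmodified public data set.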
Interaction of transformations with updates
Here is a summary of how changes you make in Transform interact with updates:
You can apply a transformation to the private copy of this data set in your project.
When a reloaded version of this data set appears in the Catalog, the private copy in your project is not updated automatically. Instead, Studio prompts you to "Accept data updates". If you accept them, Studio creates a new private copy of the reloaded public data set and applies your transformations to that new private copy.
If you have not transformed this data set in your project, you can update it with DP CLI.
Once you transform the DP CLI-loaded data set in your project, you can still update it with newer or additional data using DP CLI, but only if the data set has already been loaded in full or has been configured for updates with Configure for Updates.
If the data set has been neither loaded in full nor configured for updates and you apply transformations, they are applied only to the private copy of this data set in your project, and you can no longer update it with DP CLI. This means that if you want to run transformations on a data set that you plan to update continuously with DP CLI, the data set must be loaded in full or configured for updates.
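The update rules in this summary reduce to a small decision function, plus a rebuild step for accepted Catalog reloads. The following is a minimal sketch that only encodes the rules stated above; the flag names, can_update_with_dp_cli, and accept_data_updates are hypothetical and do not correspond to any Studio or DP CLI API or command-line option.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Row = Dict[str, object]
Transform = Callable[[Row], Row]


@dataclass
class ProjectDataSet:
    """Hypothetical flags describing a DP CLI-loaded data set in a project."""
    transformed: bool
    loaded_in_full: bool
    configured_for_updates: bool


def can_update_with_dp_cli(ds: ProjectDataSet) -> bool:
    """Encodes the DP CLI update rules summarized above (illustrative only)."""
    if not ds.transformed:
        # An untransformed project data set can be updated with DP CLI.
        return True
    # After transformation, DP CLI updates require a full load
    # or a data set configured for updates.
    return ds.loaded_in_full or ds.configured_for_updates


def accept_data_updates(
    reloaded_public_rows: List[Row], script: List[Transform]
) -> List[Row]:
    """Models accepting a Catalog reload: build a new private copy
    from the reloaded public rows, then re-apply the saved script."""
    private_rows = [dict(r) for r in reloaded_public_rows]
    for step in script:
        private_rows = [step(r) for r in private_rows]
    return private_rows
```

In this model, a transformed data set that was sampled rather than loaded in full, and not configured for updates, returns False from can_update_with_dp_cli, which mirrors the requirement in the last paragraph.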