About updating a data set in the Catalog

In Studio, you can update a data set that has a source type of Excel, delimited file, or JDBC by using Reload Data set. A reload operation updates the data set to reflect all records and attributes that have been added, modified, or deleted since the last reload or, if the data set has never been reloaded, since it was created.

Studio does not automatically reload a data set after you add it to a project. Once a data set has been transformed or loaded within a project, it becomes a private copy.

After you reload a data set in the Catalog, Studio automatically reloads the data set into a project if the project is using the public copy.

However, once a data set has been transformed in a project, it becomes a copy that is specific to that project.

If the public version of this data set is reloaded in the Catalog (that is, a newer version of the public data set appears in the Catalog), then the next time you enter the project, Studio detects that you are using a private copy of a recently updated public data set and prompts you to "Accept data updates". Accepting the updates re-commits the transform script to the new version of the data set: the commit creates a new private copy of the reloaded public data set, and the transform script is applied to that private copy.
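The reload behavior described above can be summarized as a small decision model. The following Python sketch is illustrative only and is not Studio code or a Studio API: the class `ProjectDataSet`, the functions `on_catalog_reload` and `accept_data_updates`, and the integer version numbers are all hypothetical names introduced here to show the two paths (public copy versus private copy).

```python
from dataclasses import dataclass


@dataclass
class ProjectDataSet:
    """Hypothetical model of a data set as seen from one project."""
    name: str
    public_version: int    # version of the Catalog copy this project is based on
    is_private_copy: bool  # True once the data set has been transformed in the project


def on_catalog_reload(ds: ProjectDataSet, new_public_version: int) -> str:
    """Return what happens in the project after the Catalog copy is reloaded."""
    if new_public_version <= ds.public_version:
        # No newer public version exists, so nothing changes.
        return "up to date"
    if not ds.is_private_copy:
        # The project uses the public copy: the reload is picked up automatically.
        ds.public_version = new_public_version
        return "reloaded automatically"
    # The project uses a private copy: the user is prompted on next entry.
    return "prompt: Accept data updates"


def accept_data_updates(ds: ProjectDataSet, new_public_version: int) -> ProjectDataSet:
    """Model re-committing the transform script to the reloaded public data set.

    A new private copy is created from the reloaded public version; the
    transform script would then be applied to that copy.
    """
    return ProjectDataSet(ds.name, new_public_version, is_private_copy=True)
```

In this model, a project that still uses the public copy moves to the new version without user action, while a project with a private copy stays on its old version until the updates are accepted.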

Note:

To update data sets that were created directly in Hive, use the Data Processing CLI. For details, see the Data Processing Guide.