Data sets in Catalog vs data sets in projects

It is important to distinguish between data sets in the Catalog and data sets you add to a project, for the following reasons:
  • The Catalog contains only data sets that are discovered in a data lake, that you add to BDD from a personal file, or that you import from a JDBC source. A data set in a project represents your own modified version of that data set. Moving a data set to a project is similar to creating a personal branch of it.
  • You can edit data set metadata and some attribute metadata from the Catalog or the Explore view. However, operations that alter the data set, such as transformations, enrichments, and other attribute metadata changes, require you to edit the data set in a project.
  • You can perform some updates of data sets from the Catalog, such as reloading a newer version of a data set. However, to run scheduled updates with the DP CLI, the data set must be part of a project. This is useful if you want to run updates periodically and automate them by adding them to scripts that use the DP CLI.