Data Set Manager

In Studio, the Data Set Manager lists information about all data sets in a project. To open the Data Set Manager, go into a project and select it from Project Settings.

Here is an example of a data set's details in the Data Set Manager:

screenshot of a Data Set Manager with a data set name, data set logical name, and other characteristics of the data set.

You can see that the project includes one data set. You can also see the number of records and attributes in it.

The Data Source Type shows the type of the source file, such as Excel or CSV, or shows "Hive" if the data originated from a Hive table. In this example, the Data Source Type is Excel. This means the data set originated from a spreadsheet that was loaded in Studio. You can tell that it was subsequently reloaded by comparing its creation and update dates.

The Data Source field shows the name of the source file. In this example, it is WarrantyClaims.xls.

Notice the Data Set Logical Name. Every data set has a logical name, whether it exists in Catalog or belongs to a project. A data set in Catalog has one logical name; when you move it into a project, its logical name changes. To run scripted updates with DP CLI, note the logical name of the exact data set you intend to update, so that the update targets the right copy. For information on scripted updates, see the Data Processing Guide.
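
As a rough illustration of why the logical name matters for scripting, the following Python sketch assembles a DP CLI command line from a logical name copied out of the Data Set Manager. Treat it as a sketch under stated assumptions, not as the documented syntax: the executable path `./data_processing_CLI`, the `--refreshData` flag, and the sample logical name `10128:WarrantyClaims` are assumptions made for this example, and the helper `build_refresh_command` is hypothetical. Confirm the actual command syntax in the Data Processing Guide before running anything.

```python
import shlex


def build_refresh_command(logical_name, cli_path="./data_processing_CLI"):
    """Return an argument list for a DP CLI refresh of one data set.

    ASSUMPTION: the CLI path and the --refreshData flag are illustrative;
    verify both against the Data Processing Guide.
    """
    # Remove whitespace that may have been picked up when copying the
    # name from the Data Set Manager.
    name = logical_name.strip()
    if not name:
        raise ValueError("data set logical name must not be empty")
    return [cli_path, "--refreshData", name]


if __name__ == "__main__":
    # Made-up logical name of a project data set, for illustration only.
    command = build_refresh_command("10128:WarrantyClaims")
    # Print the command instead of executing it, so it can be reviewed first.
    print(shlex.join(command))
```

Because the logical name of a data set in Catalog differs from the logical name of the same data once it is in a project, passing the wrong name would point the update at a different data set. Printing the command rather than running it leaves room to check the name against the Data Set Manager first.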