About Data Preparation

You can transform and enrich the data that you're preparing for analysis.

When you create a project and add a data set to it, the data undergoes column-level profiling that runs on a representative sample of the data. After the data is profiled, you can implement the transformation and enrichment recommendations provided for the recognizable columns in the data set. The following types of recommendations let you perform single-click transforms and enrichments on the data:

  • Global positioning system enrichments such as latitude and longitude for cities or zip codes.
  • Reference-based enrichments, for example, adding gender using the person’s first name as the attribute for making the gender decision.
  • Column concatenations, for example, adding a column with the person’s first and last name.
  • Part extractions, for example, separating out the house number from the street name in an address.
  • Semantic extractions, for example, separating out information from a recognized semantic type such as domain from an email address.
  • Date part extractions, for example, separating out the day of week from a date that uses a month, day, year format to make the data more useful in the visualizations.
  • Full and partial obfuscation or masking of detected sensitive fields.
  • Recommendations to delete columns containing detected sensitive fields.
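To make a few of these recommendation types concrete, the following is a minimal Python sketch of what a column concatenation, a part extraction, a semantic extraction, a date part extraction, and a masking step might compute for a single value. It is only an illustration: the function names, the MM/DD/YYYY date format, and the masking rule are assumptions made for this example, not the product's actual implementation.

```python
from datetime import datetime

# All names below are hypothetical helpers written for illustration only.
DAY_NAMES = ["Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday"]

def concat_name(first, last):
    """Column concatenation: combine first and last name into one value."""
    return f"{first} {last}"

def split_house_number(address):
    """Part extraction: separate a leading house number from the street name."""
    number, _, street = address.partition(" ")
    if number.isdigit():
        return number, street
    return "", address  # no leading number found; leave the address intact

def email_domain(email):
    """Semantic extraction: return the domain part of an email address."""
    _, separator, domain = email.rpartition("@")
    return domain if separator else ""

def day_of_week(date_text):
    """Date part extraction: day name from a date assumed to be MM/DD/YYYY."""
    parsed = datetime.strptime(date_text, "%m/%d/%Y")
    return DAY_NAMES[parsed.weekday()]  # weekday() returns 0 for Monday

def mask(value, visible=4, fill="*"):
    """Obfuscation: hide all but the last `visible` characters."""
    if visible <= 0:
        return fill * len(value)  # full masking
    return fill * max(len(value) - visible, 0) + value[-visible:]
```

For example, `email_domain("ana@example.com")` returns `"example.com"`, and `mask("123-45-6789")` keeps only the last four characters visible.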

You can use and configure a wide range of data transformations from the column’s Options menu. See Transform Data Using Column Menu Options.

When you transform data, a step is automatically added to the Preparation Script pane. A blue dot indicates that Apply Script hasn't been executed. After applying the script, you can make additional changes to the data set, create a project, or click Visualize to begin your analysis.

As each transformation and enrichment change is applied to the data, you can review the changes. You can also compare the changed data with the original source data to verify that the changes are correct.

The data transformation and enrichment changes that you apply to a data set affect all projects and data flows that use the same data set. When you open a project that shares the data set, a message appears indicating that the project uses updated data. To work with data that doesn’t contain the data preparation changes, you can create a data set from the original source. When you refresh the data in a data set, the preparation script changes are automatically applied to the refreshed data.
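One way to picture why the changes carry over to refreshed data is to think of the preparation script as an ordered list of steps that is replayed against whatever rows the source currently holds. The Python sketch below illustrates that idea under stated assumptions: the step functions, the column names, and `apply_script` are all hypothetical and do not represent the product's internals.

```python
# Hypothetical steps; each takes one row (a dict) and returns the changed row.
def add_full_name(row):
    row["full_name"] = f"{row['first_name']} {row['last_name']}"
    return row

def drop_ssn(row):
    row.pop("ssn", None)  # delete a column flagged as sensitive
    return row

# The "script" is just the ordered list of steps (an assumed representation).
PREPARATION_STEPS = [add_full_name, drop_ssn]

def apply_script(rows, steps):
    """Replay every step, in order, on a copy of each source row."""
    prepared = []
    for row in rows:
        current = dict(row)  # copy so the original source row is untouched
        for step in steps:
            current = step(current)
        prepared.append(current)
    return prepared
```

In this sketch, calling `apply_script` again after new rows arrive yields the same transformations on the refreshed data, while the source rows themselves stay unchanged and remain available for comparison.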