About Similarity Discovery

The similarity discovery feature is a web service that compares two transform files and predicts how similar the two datasets are. The comparison uses the profile metadata of the two transform files, not the underlying data itself.
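To illustrate why comparing profile metadata is cheaper than comparing the data, here is a minimal Python sketch. It is not the service's actual algorithm, which is not described here. The profile layout (per-column summary statistics), the scoring rule, and the names `column_similarity` and `profile_similarity` are all assumptions made for illustration.

```python
from typing import Dict

# Hypothetical profile metadata: column name -> summary statistics.
# The real profile format used by the service is not documented here.
Profile = Dict[str, Dict[str, float]]


def column_similarity(a: Dict[str, float], b: Dict[str, float]) -> float:
    """Score two column profiles in [0, 1] using the statistics they share."""
    keys = set(a) & set(b)
    if not keys:
        return 0.0
    total = 0.0
    for k in keys:
        hi = max(abs(a[k]), abs(b[k]))
        if hi == 0:
            total += 1.0  # both values are zero, so they match exactly
        else:
            # Relative difference, clamped so opposite signs cannot go negative.
            total += max(0.0, 1.0 - abs(a[k] - b[k]) / hi)
    return total / len(keys)


def profile_similarity(p: Profile, q: Profile) -> float:
    """Average column similarity over all columns seen in either profile."""
    all_cols = set(p) | set(q)
    if not all_cols:
        return 0.0
    shared = set(p) & set(q)
    return sum(column_similarity(p[c], q[c]) for c in shared) / len(all_cols)


if __name__ == "__main__":
    old = {"age": {"mean": 40.0}}
    new = {"age": {"mean": 30.0}}
    print(profile_similarity(old, new))
```

The sketch only reads a handful of numbers per column, so its cost depends on the number of columns rather than the number of rows in the datasets.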

You can use this feature to determine whether one transform file is similar to another. Because the prediction works from profile metadata, it takes less time and fewer resources than comparing the full datasets. Other uses of this feature include:

  • Identifying potentially duplicate datasets

  • Analyzing drift in a new dataset compared with previous datasets from the same data source

  • Identifying similar columns for blending into useful datasets

The prediction results are displayed in JSON format directly in the web page. You can use a browser plugin such as JSONView to automatically display the JSON results in a readable format.
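If you save or copy the raw JSON instead of viewing it in a browser, you can also format it with Python's standard `json` module. The sketch below assumes the response is available as a string. The field names in the sample (`similarity`, `columns`, `left`, `right`) are hypothetical placeholders, not the service's documented response schema.

```python
import json


def format_prediction(raw: str) -> str:
    """Return the JSON text re-indented with sorted keys for easier reading."""
    return json.dumps(json.loads(raw), indent=2, sort_keys=True)


if __name__ == "__main__":
    # Hypothetical sample; the real response fields may differ.
    raw = '{"similarity": 0.87, "columns": [{"left": "cust_id", "right": "customer_id"}]}'
    print(format_prediction(raw))
```

`json.loads` raises `json.JSONDecodeError` if the text is not valid JSON, which is a quick way to confirm that the full response was copied.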