Extracting key phrases from attribute values

The Extract key phrases transform extracts key phrases from a String attribute and stores them as a list of phrases in a new multi-assign attribute. The transform calculates key phrases using the TF/IDF (term frequency/inverse document frequency) algorithm. This algorithm counts how often each term appears within the String value, then offsets that count by how often the term appears across a larger body of work. As a result, terms that are frequent in the value but rare elsewhere score highest.
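To make the TF/IDF weighting concrete, here is a minimal sketch. It is not Studio's implementation, which this section does not describe. Several simplifying assumptions apply: it scores single words rather than multi-word phrases, tokenizes with a plain regular expression, and uses a smoothed IDF, log((1 + N) / (1 + df)) + 1, so that terms absent from the reference corpus do not cause a division by zero. The function names `tfidf_scores` and `top_terms` are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Simplifying assumption: lower-case the text and keep runs of letters/digits.
    return re.findall(r"[a-z0-9']+", text.lower())


def tfidf_scores(document: str, corpus: list[str]) -> dict[str, float]:
    """Hypothetical helper: score each term in `document` by TF-IDF against `corpus`."""
    terms = tokenize(document)
    counts = Counter(terms)
    n_docs = len(corpus)
    corpus_terms = [set(tokenize(doc)) for doc in corpus]
    scores: dict[str, float] = {}
    for term, count in counts.items():
        # Term frequency: share of the document's tokens that are this term.
        tf = count / len(terms)
        # Document frequency: how many corpus documents contain the term.
        df = sum(1 for doc in corpus_terms if term in doc)
        # Smoothed inverse document frequency (an assumption, see lead-in).
        idf = math.log((1 + n_docs) / (1 + df)) + 1
        scores[term] = tf * idf
    return scores


def top_terms(document: str, corpus: list[str], k: int = 3) -> list[str]:
    """Return the k highest-scoring terms; ties are broken alphabetically."""
    scores = tfidf_scores(document, corpus)
    return sorted(scores, key=lambda t: (-scores[t], t))[:k]


corpus = ["the battery life is great", "the screen is great", "the price is fair"]
review = "the battery drains fast, the battery gets hot"
print(top_terms(review, corpus))  # ['battery', 'drains', 'fast']
```

In this example, "the" appears twice in the review, but it also appears in every corpus document, so its IDF drops to 1 and it falls out of the top results. "battery" is both repeated in the review and rare in the corpus, so it ranks first.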

To extract key phrases from an attribute's value:

  1. In the Catalog, select a project.
  2. Select Transform.
  3. Locate the attribute you want to extract key phrases from and select its column.
  4. From the transform menu, select Advanced > Extract key phrases.
  5. In Input language, specify the language of the attribute value.
    This improves the accuracy of key phrase identification by applying a language-specific identification model.
  6. Select Use smart casing for input text to better handle documents that are predominantly in title case or upper case.
    Smart casing helps the transform identify key phrases by first converting upper-case text to lower case and then running phrase extraction on the lower-case text.
  7. In New Attribute Name, specify the name of the attribute you want to create. By default, Studio creates a multi-assign attribute to store key phrases.
  8. Either click Preview to see the results of running the transformation, or click Add to Script to save the transformation step to the script.
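The smart casing option in step 6 can be approximated with a rough sketch. It assumes that "predominantly" means at least a hypothetical threshold (60% here) of the words containing letters are in title case or all upper case. The real threshold and detection logic are not documented in this section, and the function name `smart_case` is hypothetical.

```python
def smart_case(text: str, threshold: float = 0.6) -> str:
    """Hypothetical helper: lower-case text that is mostly title case or upper case."""
    # Only consider words that contain at least one letter.
    words = [w for w in text.split() if any(c.isalpha() for c in w)]
    if not words:
        return text
    # Count words written in title case ("Battery") or all caps ("BATTERY").
    cased = sum(1 for w in words if w.isupper() or w.istitle())
    if cased / len(words) >= threshold:
        # Mostly title/upper case: lower-case before phrase extraction.
        return text.lower()
    # Ordinary sentence casing is left unchanged.
    return text


print(smart_case("BATTERY LIFE IS POOR"))      # battery life is poor
print(smart_case("The Battery Life Is Poor"))  # the battery life is poor
print(smart_case("The battery life is poor"))  # The battery life is poor
```

Normally cased text is passed through untouched, so capitalization that may carry meaning in ordinary sentences is preserved.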

If you are done making changes to the project data set, you can commit the changes. See Running the transformation script against a project data set.