Extracting names of people, places, or organizations from attribute values

The Entity Extraction transformation searches attributes for names of people, places, or organizations. The resulting attribute contains a delimited list of the found entities. If the transformation does not find any entities, the resulting attribute is empty.

For example, consider the following text: "In New York, the Metropolitan Museum of Art is the largest museum. Other popular museums include the Guggenheim (designed by Frank Lloyd Wright) and the Museum of Modern Art."
  • If you run Entity Extraction to extract places, the resulting value would be something like "New York, Metropolitan Museum of Art, Guggenheim, Museum of Modern Art".
  • If you run Entity Extraction to extract people, the resulting value would be "Frank Lloyd Wright".
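If you work with the extracted values outside Studio, you may need to split the delimited list back into individual names. The following is a minimal sketch, not part of Studio. It assumes a comma delimiter, as in the example above, although the delimiter Studio actually uses may differ. The function name split_entities is hypothetical. An empty attribute is treated as "no entities found".

```python
def split_entities(value, delimiter=","):
    """Split a delimited entity-list value into individual entity names.

    Hypothetical helper: assumes a comma delimiter, as in the example
    values above; the delimiter Studio uses may differ.
    """
    # An empty (or missing) attribute means no entities were found.
    if not value:
        return []
    # Strip surrounding whitespace from each name and drop empty pieces.
    return [part.strip() for part in value.split(delimiter) if part.strip()]


places = split_entities(
    "New York, Metropolitan Museum of Art, Guggenheim, Museum of Modern Art"
)
print(places)
# ['New York', 'Metropolitan Museum of Art', 'Guggenheim', 'Museum of Modern Art']
print(split_entities(""))
# []
```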

To extract entity names from an attribute's values:

  1. In the Catalog, select a project.
  2. Select Transform.
  3. Locate a String attribute that contains the entity information you want to extract, and select its column.
  4. From the transform menu, select Advanced > Extract entities.
  5. Select the type of entities you want to extract (people, places, or organizations). You can select one entity type or all of them.
  6. Specify a prefix for the new attribute.
    Studio creates a new attribute for each selected entity type, named by combining the prefix with a suffix: <prefix>_person (for People), <prefix>_location (for Places), and <prefix>_organization (for Organizations).
  7. Click Preview to see the results of running the transformation, or click Add to Script to save the transformation step to the script.
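The naming rule from step 6 can be sketched as follows, which may help when you plan downstream scripts that refer to the new attributes. This is an illustration only, not Studio code. It assumes one new attribute is created per selected entity type. The function entity_attribute_names and the example prefix "desc" are hypothetical.

```python
# Suffix Studio appends to the prefix for each entity type (from step 6).
ENTITY_SUFFIXES = {
    "People": "person",
    "Places": "location",
    "Organizations": "organization",
}


def entity_attribute_names(prefix, entity_types):
    """Return the new attribute name for each selected entity type.

    Hypothetical helper: assumes one attribute per selected type,
    named <prefix>_<suffix>.
    """
    names = {}
    for entity_type in entity_types:
        if entity_type not in ENTITY_SUFFIXES:
            raise ValueError(f"unknown entity type: {entity_type}")
        names[entity_type] = f"{prefix}_{ENTITY_SUFFIXES[entity_type]}"
    return names


print(entity_attribute_names("desc", ["People", "Places"]))
# {'People': 'desc_person', 'Places': 'desc_location'}
```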

If you are done making changes to the project data set, you can commit the changes. See Running the transformation script against a project data set.