Entity extractor

The Entity extractor module extracts the names of people, companies, and places from the input text of records in the source data.

The Entity extractor locates individual elements in the text and classifies them into three predefined categories: PERSON, ORGANIZATION, and LOCATION.

The Entity extractor supports only English input text.

Configuration options

This module does not automatically run during the sampling phase of a Data Processing workflow, but you can launch it from Transform in Studio.

Output

For each predefined category, the output is a list of names that is ingested into the Dgraph as a multi-assign string attribute. The names of the output attributes are:
  • <attribute>_entity_person
  • <attribute>_entity_loc
  • <attribute>_entity_org
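
The naming convention above can be sketched in Python as follows. This is an illustration only, not part of the product: the function name output_attribute_names and the attribute name "comments" are hypothetical, and the sketch assumes that <attribute> stands for the name of the source attribute that holds the input text.

```python
# Suffixes taken from the list of output attribute names above.
CATEGORY_SUFFIXES = {
    "PERSON": "entity_person",
    "LOCATION": "entity_loc",
    "ORGANIZATION": "entity_org",
}


def output_attribute_names(source_attribute):
    """Map each category to its output attribute name.

    Assumption: <attribute> is replaced by the source attribute name.
    """
    return {
        category: f"{source_attribute}_{suffix}"
        for category, suffix in CATEGORY_SUFFIXES.items()
    }


# "comments" is a hypothetical source attribute name.
for category, name in output_attribute_names("comments").items():
    print(category, "->", name)
```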

In addition, the Transform API has a getEntities function that wraps the Entity extractor to return single values from the input text.

Example

Assume the following input text:
While in New York City, Jim Davis bought 300 shares of Acme Corporation in 2012.

The output would be:

location: New York City
organization: Acme Corporation
person: Jim Davis
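
As a rough sketch of how this result might look once stored as multi-assign string attributes, the Python snippet below places the names from the example into one list per output attribute. It does not perform any extraction: the entity values are copied by hand from the example, and the helper to_output_attributes and the source attribute name "description" are hypothetical. Representing a category with no matches as an empty list is also an assumption.

```python
# Suffixes follow the output attribute names documented for this module.
SUFFIXES = {
    "PERSON": "entity_person",
    "LOCATION": "entity_loc",
    "ORGANIZATION": "entity_org",
}


def to_output_attributes(source_attribute, entities):
    """Build one multi-valued (list) attribute per category.

    Assumption: a category with no extracted names yields an empty list.
    """
    return {
        f"{source_attribute}_{suffix}": list(entities.get(category, []))
        for category, suffix in SUFFIXES.items()
    }


# Values copied by hand from the example output; nothing is extracted here.
example_entities = {
    "LOCATION": ["New York City"],
    "ORGANIZATION": ["Acme Corporation"],
    "PERSON": ["Jim Davis"],
}

# "description" is a hypothetical source attribute name.
record = to_output_attributes("description", example_entities)
for name, values in record.items():
    print(name, values)
```

Each attribute is a list because a single input text can mention several names in the same category.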