Entity extractor

The Entity extractor module extracts the names of people, companies, and places from the text of records in the source data.

The Entity extractor locates individual elements in the text and classifies them into three predefined categories: PERSON, ORGANIZATION, and LOCATION.

The Entity extractor supports only English input text.

Configuration options

This module does not automatically run during the sampling phase of a Data Processing workflow, but you can launch it from Transform in Studio.

Output

For each predefined category, the output is a list of names that is ingested into the Dgraph as a multi-assign string attribute. The names of the output attributes are:
  • <colname>_entity_person
  • <colname>_entity_loc
  • <colname>_entity_org
In addition, the Transform API has the following functions, which are wrappers around the Entity extractor, to return single values from the input text:
  • getPersonEntities returns the name of each person identified in the input.
  • getOrganizationEntities returns the name of each organization identified in the input.
  • getLocationEntities returns the name of each location identified in the input.

Example

Assume the following input text:
While in New York City, Jim Davis bought 300 shares of Acme Corporation in 2012.

The output might be:

ext_entity_loc: New York City
ext_entity_org: Acme Corporation
ext_entity_person: Jim Davis
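
The naming pattern for the output attributes can be illustrated with a short Python sketch. It is an illustration only, not part of Data Processing or the Transform API: the helper output_attributes and the SUFFIXES table are hypothetical names, the source column is assumed to be called ext, and the entities are supplied by hand rather than extracted from the text.

```python
# Hypothetical illustration of the <colname>_entity_* naming pattern.
# Nothing here calls the Entity extractor; the entity lists are hand-written.

# Suffix used for each predefined category (taken from the Output section).
SUFFIXES = {
    "PERSON": "entity_person",
    "LOCATION": "entity_loc",
    "ORGANIZATION": "entity_org",
}


def output_attributes(colname, entities):
    """Map {category: [names]} to {output attribute name: [names]}."""
    return {
        f"{colname}_{SUFFIXES[category]}": list(names)
        for category, names in entities.items()
    }


# Entities from the example sentence, assuming a source column named "ext".
example = output_attributes(
    "ext",
    {
        "LOCATION": ["New York City"],
        "ORGANIZATION": ["Acme Corporation"],
        "PERSON": ["Jim Davis"],
    },
)

for attribute, values in example.items():
    # Each attribute is multi-assign, so it can hold more than one name.
    print(f"{attribute}: {', '.join(values)}")
```

Because each output attribute is multi-assign, the sketch keeps a list of names per attribute, even though the example sentence yields only one name in each category.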