The Entity extractor module extracts the names of people,
companies and places from the input text inside records in source data.
The Entity extractor locates individual elements in the text and
classifies each one into one of three predefined categories: PERSON,
ORGANIZATION, or LOCATION.
The Entity extractor supports only English input text.
Configuration options
This module does not automatically run during the sampling phase of a
Data Processing workflow, but you can launch it from
Transform in Studio.
Output
For each predefined category, the output is a list of names that is
ingested into the Dgraph as a multi-assign string attribute. The names
of the output attributes are:
- <colname>_entity_person
- <colname>_entity_loc
- <colname>_entity_org
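The naming pattern above can be sketched in a few lines of Python. This is only an illustration of the suffix convention: the helper name entity_attribute_names and the column name ext are hypothetical and are not part of the product.

```python
# Illustrative only: mirrors the documented <colname>_entity_* naming pattern.
# entity_attribute_names is a hypothetical helper, not a product API.
ENTITY_SUFFIXES = {
    "PERSON": "entity_person",
    "LOCATION": "entity_loc",
    "ORGANIZATION": "entity_org",
}


def entity_attribute_names(colname):
    """Map each entity category to the name of its output attribute."""
    return {
        category: f"{colname}_{suffix}"
        for category, suffix in ENTITY_SUFFIXES.items()
    }


# "ext" is an assumed source column name used only for this sketch.
names = entity_attribute_names("ext")
for category, attribute in names.items():
    print(category, attribute)
```

Keeping the suffixes in one mapping makes the relationship between a category and its attribute name explicit.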
In addition, the Transform API has the following functions that are
wrappers around the Entity extractor to return single values from the
input text:
- getPersonEntities returns the name of each person identified in the input.
- getOrganizationEntities returns the name of each organization identified in the input.
- getLocationEntities returns the name of each location identified in the input.
Example
Assume the following input text:
While in New York City, Jim Davis bought 300 shares of Acme Corporation in 2012.
The output might be:
ext_entity_loc: New York City
ext_entity_org: Acme Corporation
ext_entity_person: Jim Davis
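Because each output attribute is multi-assign, a record can hold more than one value per category. The following Python sketch models the example record as a mapping from attribute name to a list of strings. The values are copied by hand from the example, not produced by running the Entity extractor, and the variable names are hypothetical.

```python
# Hand-written model of the example output; not generated by the Entity extractor.
# Each key follows the <colname>_entity_* pattern, with "ext" as the column name.
record = {
    "ext_entity_loc": ["New York City"],
    "ext_entity_org": ["Acme Corporation"],
    "ext_entity_person": ["Jim Davis"],
}

# A multi-assign attribute holds a list of string values for each record,
# so every value here is a list even when only one name was found.
for attribute, values in sorted(record.items()):
    print(f"{attribute}: {', '.join(values)}")
```

If the input text named a second person, that name would be a second element in the ext_entity_person list rather than a new attribute.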