The Data Enrichment modules increase the usability of your data by discovering value in its content.
The Data Enrichment package bundles a collection of modules together with the logic that associates each module with a column of data (for example, an address column can be detected and associated with a GeoTagger module).
During the sampling phase of the Data Processing workflow, some of the Data Enrichment modules run automatically while others do not. If you run a workflow with the DP CLI, you can use the --excludePlugins flag to specify which modules should not be run.
After a data set has been created, you can run any module from Studio's Transform page.
When Data Processing is running against a Hive table, the Data Enrichment modules that run automatically obtain their input pre-screened by the sampling stage. For example, only an IP address is ever passed to the IP Address GeoTagger module.
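The pre-screening idea can be illustrated with a minimal sketch. It is not the Data Processing implementation; the function names and sample values below are hypothetical, and the validity check simply relies on Python's standard `ipaddress` module to decide whether a sampled value is an IP address before it would be handed to an IP-based module.

```python
import ipaddress


def looks_like_ip(value):
    """Return True if value parses as an IPv4 or IPv6 address."""
    try:
        ipaddress.ip_address(value.strip())
        return True
    except ValueError:
        return False


def prescreen_for_ip_geotagger(sample_values):
    """Keep only the sampled string values that are valid IP addresses.

    Hypothetical stand-in for the screening done by the sampling stage.
    """
    return [v for v in sample_values
            if isinstance(v, str) and looks_like_ip(v)]


# Illustrative sample: only the two well-formed addresses survive screening.
sample = ["148.86.25.54", "not an address", "", "2001:db8::1", "999.1.1.1"]
screened = prescreen_for_ip_geotagger(sample)
print(screened)
```

In this sketch, the module itself never needs to handle malformed input, because anything that is not an IP address is dropped before the module is called.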
All Data Enrichment modules ignore both the primary-key attribute of a record and any attribute whose data type is inappropriate for that module. For example, the Entity extractor works only on string attributes, so numeric attributes are ignored. In addition, auto-enrichment ignores multi-assign attributes.
Note, however, that after the Data Processing workflow finishes, you can manually run any of these modules from the Transform page in Studio.
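The attribute rules for auto-enrichment (skip the primary key, skip attributes of an unsuitable data type, skip multi-assign attributes) can be sketched as a simple eligibility check. This is an illustration only: the `Attribute` class, the `ACCEPTED_TYPES` table, and the module key `entity_extractor` are assumptions made for the example and are not names from the product.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Attribute:
    """Hypothetical description of one attribute (column) in a data set."""
    name: str
    data_type: str
    is_primary_key: bool = False
    is_multi_assign: bool = False


# Assumed mapping of module name to the data types it accepts.
# Only the string-only rule for the Entity extractor comes from the text.
ACCEPTED_TYPES = {"entity_extractor": {"string"}}


def eligible_for_auto_enrichment(attr, module):
    """Apply the three exclusion rules for automatic enrichment."""
    if attr.is_primary_key:
        return False  # the primary-key attribute is always ignored
    if attr.is_multi_assign:
        return False  # multi-assign attributes are ignored for auto-enrichment
    # ignore attributes whose data type the module cannot process
    return attr.data_type in ACCEPTED_TYPES.get(module, set())


attributes = [
    Attribute("id", "string", is_primary_key=True),
    Attribute("comments", "string"),
    Attribute("price", "double"),
    Attribute("tags", "string", is_multi_assign=True),
]
eligible = [a.name for a in attributes
            if eligible_for_auto_enrichment(a, "entity_extractor")]
print(eligible)
```

Of the four example attributes, only `comments` passes all three checks.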
The supported languages are specific to each module. For details, see the topic for the module.
The types and names of output attributes are specific to each module. For details on output attributes, see the topic for the module.