Enrichment functions are based on Data Enrichment modules used as
part of data processing in Big Data Discovery. You can use these functions to
extract meaningful information from your data and modify attributes to make
them more useful for analysis.
The same functions are described in the
Transform API Reference (Groovydoc).
More information on the Data Enrichment
modules is available in the
Data Processing Guide.
Transform supports the following enrichment
functions:
detectLanguage
Finds the language of a given document and returns an Oracle language
code (for example,
es for Spanish). For accurate results, the
text should contain at least ten words.
detectLanguage accepts the following parameter:
- text. The String on which to perform language detection.
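The ten-word guideline above can be checked before detectLanguage is called. The following is a minimal sketch of such a pre-check. The helper name hasEnoughWords is hypothetical and is not part of the Transform API, and counting whitespace-separated tokens is an assumption that does not suit languages written without spaces.

```groovy
// Hypothetical helper (not part of the Transform API): reports whether a
// String contains at least the ten words suggested for accurate detection.
boolean hasEnoughWords(String text, int minWords = 10) {
    if (text == null || text.trim().isEmpty()) {
        return false
    }
    // Split on runs of whitespace and count the resulting tokens.
    return text.trim().split(/\s+/).length >= minWords
}

println hasEnoughWords('Este texto es corto')
println hasEnoughWords('one two three four five six seven eight nine ten')
```

A transformation could use a check like this to skip language detection for very short values instead of storing a low-confidence result.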
geotagAddress*
A set of the following functions:
- geotagAddressGetCity
- geotagAddressGetCountry
- geotagAddressGetGeocode
- geotagAddressGetPostcode
- geotagAddressGetRegion
- geotagAddressGetSubRegion
- geotagAddressGetRegionID
- geotagAddressGetSubRegionID
Each function converts a valid address String to a Geocode and returns the
corresponding field: city, country, geocode, postcode, region, subregion, or
the region or subregion ID. These are wrapper functions for the Address
Geotagger data enrichment module. Each function adds a multi-assign attribute
(column) to your data set that contains the following fields:
- city
- country
- geocode
(the address's latitude and longitude coordinates)
- latitude
- longitude
- population
- postal_code
- region
- sub_region
- Geoname ID for the
region or
sub_region
geotagAddress* accepts the following parameters:
- arg1 (address). The address String to process. This must be less than or
equal to 350 characters.
- Map. This is a map of
advanced options:
- PREFERRED_LEVEL. An
optional parameter in type String that specifies an administrative division to
improve accuracy. This can be set to only one of the following values
(case-insensitive):
- CITY. Target for a
city match.
- COUNTRY. Target for a
country match.
- REGION. Target for a
region match, such as "state" in the United States.
- SUB_REGION. Target
for a subregion match, such as "county".
- NONE. If this value
is used, the function returns the most populous location that most closely
matches the address String. This is the default value.
Note: Administrative divisions vary depending on the country, so
the returned values may be different than expected. Also, if your input value
is not in the acceptable list, an exception is thrown.
- STRICT_MODE. An
optional Boolean parameter that specifies how the function should handle
ambiguous or improperly-formatted addresses, such as one that contains an
incorrect postal code. This can be set to one of the following:
- true. If the address
is invalid, the function returns
null.
- false. If the address
is invalid, the function returns the closest match. This is the default.
The following example shows how to specify these options in a map for the
geotagAddressGetSubRegion function:
geotagAddressGetSubRegion('1 Main Street Cambridge', ['PREFERRED_LEVEL':'CITY', 'STRICT_MODE':true])
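The parameter rules described above (350-character limit, case-insensitive PREFERRED_LEVEL with a default of NONE, STRICT_MODE defaulting to false) can be sketched as a pre-check that runs before the call. The helper normalizeGeotagOptions is hypothetical and is not part of the Transform API. Throwing an exception for an over-long address is a choice made for this sketch; the text above only states that an unacceptable PREFERRED_LEVEL value throws.

```groovy
// Hypothetical pre-check helper; it mirrors the documented rules but is not
// part of the Transform API.
Map normalizeGeotagOptions(String address, Map options = [:]) {
    List<String> levels = ['CITY', 'COUNTRY', 'REGION', 'SUB_REGION', 'NONE']
    // Assumption of this sketch: reject addresses longer than 350 characters.
    if (address == null || address.length() > 350) {
        throw new IllegalArgumentException('address must be a String of at most 350 characters')
    }
    // PREFERRED_LEVEL is case-insensitive and defaults to NONE.
    String level = (options.PREFERRED_LEVEL ?: 'NONE').toString().toUpperCase()
    if (!(level in levels)) {
        throw new IllegalArgumentException('Unsupported PREFERRED_LEVEL: ' + level)
    }
    // STRICT_MODE defaults to false, which means the closest match is returned.
    boolean strict = (options.STRICT_MODE == true)
    return [PREFERRED_LEVEL: level, STRICT_MODE: strict]
}

def opts = normalizeGeotagOptions('1 Main Street Cambridge', ['PREFERRED_LEVEL': 'city'])
println opts
```

The normalized map could then be passed as the second argument of a geotagAddress* function, so that an invalid option is caught before any rows are processed.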
geotagIPAddressGetCity
Converts an IP address to a Geocode and returns its
city field as an Object. This is a wrapper function
for the IP Address Geotagger data enrichment module that returns a single
value.
geotagIPAddressGetCity accepts the following
parameters:
- IPAddress. The IP address to process, in type String.
- language. An optional String parameter that specifies the output
language. The default value is
null, which sets the language to English.
geotagIPAddressGetCountry
Converts an IP address to a Geocode and returns its
country field as an Object. This is a wrapper function
for the IP Address Geotagger data enrichment module that returns a single
entity type.
geotagIPAddressGetCountry accepts the following
parameters:
- IPAddress. The IP
address to process, in type String.
- language.
An optional String parameter that specifies the output language. The default
value is
null, which sets the language to English.
geotagIPAddressGetGeocode
Converts an IP address to a Geocode and returns its
geocode field as an Object. This is a wrapper function
for the IP Address Geotagger data enrichment module that returns a single
entity type.
geotagIPAddressGetGeocode accepts the following
parameters:
- IPAddress. The IP
address to process, in type String.
- language.
An optional String parameter that specifies the output language. The default
value is
null, which sets the language to English.
geotagIPAddressGetPostCode
Converts an IP address to a Geocode and returns its
postal_code field as an Object. This is a wrapper
function for the IP Address Geotagger data enrichment module that returns a
single entity type.
geotagIPAddressGetPostCode accepts the following
parameters:
- IPAddress. The IP
address to process, in type String.
- language.
An optional String parameter that specifies the output language. The default
value is
null, which sets the language to English.
geotagIPAddressGetRegion
Converts an IP address to a Geocode and returns its
region field as an Object. This is a wrapper function
for the IP Address Geotagger data enrichment module that returns a single
entity type.
geotagIPAddressGetRegion accepts the following
parameters:
- IPAddress. The IP
address to process, in type String.
- language.
An optional String parameter that specifies the output language. The default
value is
null, which sets the language to English.
geotagIPAddressGetRegionID
Converts an IP address to a Geocode and returns its Geoname ID for the
region field as an Object. This is a wrapper function
for the IP Address Geotagger data enrichment module that returns a single
entity type.
geotagIPAddressGetRegionID accepts the following
parameters:
- IPAddress. The IP
address to process, in type String.
- language.
An optional String parameter that specifies the output language. The default
value is
null, which sets the language to English.
geotagIPAddressGetSubRegion
Converts an IP address to a Geocode and returns its
sub_region field as an Object. This is a wrapper
function for the IP Address Geotagger data enrichment module that returns a
single entity type.
geotagIPAddressGetSubRegion accepts the following
parameters:
- IPAddress. The IP
address to process, in type String.
- language.
An optional String parameter that specifies the output language. The default
value is
null, which sets the language to English.
geotagIPAddressGetSubRegionID
Converts an IP address to a Geocode and returns its Geoname ID for the
sub_region field as an Object. This is a wrapper
function for the IP Address Geotagger data enrichment module that returns a
single entity type.
geotagIPAddressGetSubRegionID accepts the following
parameters:
- IPAddress. The IP
address to process, in type String.
- language.
An optional String parameter that specifies the output language. The default
value is
null, which sets the language to English.
getLocationEntities
Returns all location entities within a String as an Object. Location
entities are names of places, such as "Boston" or "Canada". This function
creates a new multi-assign column in your data set. This is a wrapper function
for the Name Entity extractor data enrichment module that returns a single
entity type.
getLocationEntities accepts the following parameter:
- text. The
String to process.
getNegativeLocationEntitySentiment
Locates passages within a String that contain location entities and
returns the negative sentiment of those passages as an Object.
getNegativeLocationEntitySentiment accepts the
following parameters:
- text. The
String to process.
- language. An optional parameter that specifies the language in type
String to improve accuracy. If set to
null (which is the default value), the language is
automatically detected. Supported language is English only.
getNegativeNounGroupsSentiment
Locates passages within a String that contain noun groups and returns
the negative sentiment of those passages as an Object.
getNegativeNounGroupsSentiment accepts the following
parameters:
- text. The
String to process.
- language. An optional parameter that specifies the language in type
String to improve accuracy. If set to
null (which is the default value), the language is
automatically detected. Supported languages are English (UK/US), Portuguese
(Brazilian), Spanish, French, German and Italian.
getNegativeOrganizationEntitySentiment
Locates passages within a String that contain organization entities
and returns the negative sentiment of those passages as an Object.
getNegativeOrganizationEntitySentiment accepts the
following parameters:
- arg1. The
String to process.
- language. An optional parameter that specifies the String's language
to improve accuracy. If set to
null (which is the default value), the language is
automatically detected. Supported language is English only.
getNegativePersonEntitySentiment
Locates passages within a String that contain person entities and
returns the negative sentiment of those passages as an Object.
getNegativePersonEntitySentiment accepts the
following parameters:
- arg1. The
String to process.
- language.
An optional parameter that specifies the String's language to improve accuracy.
If set to
null (which is the default value), the language is
automatically detected. Supported language is English only.
getNegativeTFIDFSentiment
Extracts key phrases in sentences that have a negative sentiment.
getNegativeTFIDFSentiment accepts the following
parameters:
- arg1. The
String to process.
- language.
An optional parameter that specifies the String's language to improve accuracy.
If set to
null (which is the default value), the language is
automatically detected. Supported languages are English (UK/US), Portuguese
(Brazilian), Spanish, French, German and Italian.
getOrganizationEntities
Returns an Object containing the organization entities found within a
String. This is a wrapper function for the Name Entity extractor data
enrichment module that returns a single entity type.
Note: This function creates a new multi-assign column in your data set.
getOrganizationEntities accepts the following
parameter:
- arg1. The
String to process.
getPersonEntities
Returns an Object containing the person entities found within a
String. This is a wrapper function for the Name Entity extractor data
enrichment module that returns a single entity type.
Note: This function creates a new multi-assign column in your data set.
getPersonEntities accepts the following parameter:
- arg1. The
String to process.
getPositiveLocationEntitySentiment
Locates passages within a String that contain location entities and
returns the positive sentiment of those passages as an Object.
getPositiveLocationEntitySentiment accepts the
following parameters:
- arg1. The
String to process.
- language. An optional parameter that specifies the String's language
to improve accuracy. If set to
null (which is the default value), the language is
automatically detected. Supported language is English only.
getPositiveNounGroupsSentiment
Locates passages within a String that contain noun groups and returns
the positive sentiment of those passages as an Object.
getPositiveNounGroupsSentiment accepts the following
parameters:
- arg1. The
String to process.
- language. An optional parameter that specifies the String's language
to improve accuracy. If set to
null (which is the default value), the language is
automatically detected. Supported language is English only.
getPositivePersonEntitySentiment
Locates passages within a String that contain person entities and
returns the positive sentiment of those passages as an Object.
getPositivePersonEntitySentiment accepts the
following parameters:
- arg1. The
String to process.
- language. An optional parameter that specifies the String's language
to improve accuracy. If set to
null (which is the default value), the language is
automatically detected. Supported language is English only.
getPositiveOrganizationEntitySentiment
Locates passages within a String that contain organization entities
and returns the positive sentiment of those passages as an Object.
getPositiveOrganizationEntitySentiment accepts the
following parameters:
- arg1. The
String to process.
- language. An optional parameter that specifies the String's language
to improve accuracy. If set to
null (which is the default value), the language is
automatically detected. Supported language is English only.
getPositiveTFIDFSentiment
Extracts key phrases in sentences that have a positive sentiment.
getPositiveTFIDFSentiment accepts the following
parameters:
- arg1. The
String to process.
- language.
An optional parameter that specifies the String's language to improve accuracy.
If set to
null (which is the default value), the language is
automatically detected. Supported languages are English (UK/US), Portuguese
(Brazilian), Spanish, French, German, and Italian.
getSentiment
Returns an Object containing the overall sentiment of a String. This
is a wrapper function for the Sentiment Analysis (document level) data
enrichment module. The String's sentiment can be one of the following:
getSentiment accepts the following parameters:
- arg1. The
String to process.
- language.
An optional parameter that specifies the String's language to improve accuracy.
Supported languages are English (UK/US), Portuguese (Brazilian), Spanish,
French, German, and Italian. If set to
null (which is the default value), the language is
automatically detected.
reverseGeotagGetCity
Returns the
city field from a Geocode as an Object. Searches for
cities within the specified radius from the entered Geocode. This is a wrapper
function for the Reverse Geotagger data enrichment module that returns a single
value.
reverseGeotagGetCity accepts the following parameters:
- geo. The
Geocode to process.
- language. An optional parameter that specifies the output language.
The default value is
null, which sets the output language to English.
- proximityThreshold.
An optional parameter that specifies the maximum distance in miles allowed
between the input geocode and the output geographic location. If this parameter
is not specified, the default of 100 miles is used. If the distance exceeds the
threshold, null is returned.
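The effect of proximityThreshold can be illustrated with a small distance check. This is a sketch under stated assumptions, not the Reverse Geotagger's implementation: it assumes the distance is a great-circle distance (haversine formula, approximate Earth radius of 3958.8 miles), and the helpers distanceInMiles and withinThreshold, along with the map keys name, lat, and lon, are hypothetical.

```groovy
// Great-circle distance in miles between two latitude/longitude points,
// using the haversine formula with an approximate Earth radius.
double distanceInMiles(double lat1, double lon1, double lat2, double lon2) {
    double earthRadiusMiles = 3958.8d
    double sinHalfLat = Math.sin(Math.toRadians(lat2 - lat1) / 2)
    double sinHalfLon = Math.sin(Math.toRadians(lon2 - lon1) / 2)
    double a = sinHalfLat * sinHalfLat +
        Math.cos(Math.toRadians(lat1)) * Math.cos(Math.toRadians(lat2)) * sinHalfLon * sinHalfLon
    return 2 * earthRadiusMiles * Math.asin(Math.sqrt(a))
}

// Hypothetical helper: returns the candidate's name, or null when the
// candidate lies farther from the input point than the threshold.
String withinThreshold(Map input, Map candidate, double proximityThreshold = 100.0d) {
    double miles = distanceInMiles(input.lat as double, input.lon as double,
                                   candidate.lat as double, candidate.lon as double)
    return miles <= proximityThreshold ? candidate.name : null
}

def boston = [name: 'Boston', lat: 42.3601, lon: -71.0589]
def cambridge = [name: 'Cambridge', lat: 42.3736, lon: -71.1097]
def newYork = [name: 'New York', lat: 40.7128, lon: -74.0060]

println withinThreshold(boston, cambridge)      // a few miles away, within the default
println withinThreshold(boston, newYork)        // farther than 100 miles, so null
println withinThreshold(boston, newYork, 250)   // a larger threshold admits the match
```

Raising the threshold widens the search, at the cost of accepting matches that are farther from the input geocode.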
reverseGeotagGetCountry
Returns the
country field from a Geocode as an Object. Searches
for countries within the specified radius from the entered Geocode. This is a
wrapper function for the Reverse Geotagger data enrichment module that returns
a single value.
reverseGeotagGetCountry accepts the following
parameters:
- geo. The
Geocode to process.
- language.
An optional parameter that specifies the output language. The default value is
null, which sets the output language to English.
- proximityThreshold.
An optional parameter that specifies the maximum distance in miles allowed
between the input geocode and the output geographic location. If this parameter
is not specified, the default of 100 miles is used. If the distance exceeds the
threshold, null is returned.
reverseGeotagGetPostCode
Returns the
postal_code field from a Geocode as an Object.
Searches for post codes within the specified radius from the entered Geocode.
This is a wrapper function for the Reverse Geotagger data enrichment module
that returns a single value.
reverseGeotagGetPostCode accepts the following
parameters:
- geo. The
Geocode to process.
- language. An optional parameter that specifies the output language.
The default value is
null, which sets the output language to English.
- proximityThreshold.
An optional parameter that specifies the maximum distance in miles allowed
between the input geocode and the output geographic location. If this parameter
is not specified, the default of 100 miles is used. If the distance exceeds the
threshold, null is returned.
reverseGeotagGetRegion
Returns the
region field from a Geocode as an Object. Searches for
regions within the specified radius from the entered Geocode. This is a wrapper
function for the Reverse Geotagger data enrichment module that returns a single
value.
reverseGeotagGetRegion accepts the following
parameters:
- geo. The
Geocode to process.
- language. An optional parameter that specifies the output language.
The default value is
null, which sets the output language to English.
- proximityThreshold.
An optional parameter that specifies the maximum distance in miles allowed
between the input geocode and the output geographic location. If this parameter
is not specified, the default of 100 miles is used. If the distance exceeds the
threshold, null is returned.
reverseGeotagGetRegionID
Returns the Geoname ID of the
region field from a Geocode as an Object. Searches for regions within
the specified radius from the entered Geocode. This is a wrapper function for
the Reverse Geotagger data enrichment module that returns a single value.
reverseGeotagGetRegionID accepts the following
parameters:
- geo. The
Geocode to process.
- language. An optional parameter that specifies the output language.
The default value is
null, which sets the output language to English.
- proximityThreshold.
An optional parameter that specifies the maximum distance in miles allowed
between the input geocode and the output geographic location. If this parameter
is not specified, the default of 100 miles is used. If the distance exceeds the
threshold, null is returned.
reverseGeotagGetSubRegion
Returns the
sub_region field from a Geocode as an Object. Searches
for sub-regions within the specified radius from the entered Geocode. This is a
wrapper function for the Reverse Geotagger data enrichment module that returns
a single value.
reverseGeotagGetSubRegion accepts the following
parameters:
- geo. The
Geocode to process.
- language. An optional parameter that specifies the output language.
The default value is
null, which sets the output language to English.
- proximityThreshold.
An optional parameter that specifies the maximum distance in miles allowed
between the input geocode and the output geographic location. If this parameter
is not specified, the default of 100 miles is used. If the distance exceeds the
threshold, null is returned.
reverseGeotagGetSubRegionID
Returns the Geoname ID of the
sub_region field from a Geocode as an Object. Searches for
sub-regions within the specified radius from the entered Geocode. This is a
wrapper function for the Reverse Geotagger data enrichment module that returns
a single value.
reverseGeotagGetSubRegionID accepts the following
parameters:
- geo. The
Geocode to process.
- language. An optional parameter that specifies the output language.
The default value is
null, which sets the output language to English.
- proximityThreshold.
An optional parameter that specifies the maximum distance in miles allowed
between the input geocode and the output geographic location. If this parameter
is not specified, the default of 100 miles is used. If the distance exceeds the
threshold, null is returned.
runExternalPlugin
Runs the external Groovy script defined in the file named by
pluginName and returns the result of the script.
runExternalPlugin accepts the following parameters:
- pluginName. The name of the external plugin.
- arg1. An
argument passed to the external plugin.
toPhoneticHash
Produces a String hash of the input text (English only) that
represents the phonetics of the text.
A word's phonetic hash is based on its pronunciation, rather than its
spelling. One application for phonetic hashes is search engines. If a search
term does not return any results, the search engine can compare the term's
phonetic hash to the hashes of other terms, and return results for the term
that is the best fit. For example, "purple" and "pruple" have the same phonetic
hash (PRPL), so a search for the misspelled term "pruple" would still yield
results for "purple".
toPhoneticHash accepts the following parameter:
- arg1. The
String to process.
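The idea that differently spelled words can share one phonetic hash can be illustrated with a deliberately simplified sketch. This is not the algorithm behind toPhoneticHash, which this section does not specify. The helper simplePhoneticHash is hypothetical: it keeps the first letter, drops vowels (and Y, H, W) from the rest, and collapses repeated letters, which happens to reproduce the PRPL value from the example above.

```groovy
// Simplified consonant-skeleton hash, for illustration only. It is NOT the
// algorithm used by toPhoneticHash.
String simplePhoneticHash(String word) {
    // Keep ASCII letters only, in upper case.
    String letters = word.toUpperCase().replaceAll(/[^A-Z]/, '')
    if (letters.isEmpty()) {
        return ''
    }
    // Keep the first letter, then drop vowels (and Y, H, W) from the rest.
    String head = letters.substring(0, 1)
    String tail = letters.substring(1).replaceAll(/[AEIOUYHW]/, '')
    // Collapse runs of the same letter, for example "LL" becomes "L".
    return (head + tail).replaceAll(/(.)\1+/, '$1')
}

println simplePhoneticHash('purple')   // PRPL
println simplePhoneticHash('pruple')   // PRPL
```

Because both spellings reduce to the same key, a lookup table keyed by the hash would map the misspelled term to the entries stored for the correct one.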