List of enrichment functions
Type | Name and description |
---|---|
java.lang.String | detectLanguage(java.lang.String attribute) Find the language of a given document. |
java.util.Collection<java.lang.String> | extractKeyPhrases(java.lang.String attribute, java.lang.String languageCode = null /* null is auto-detect */, java.lang.Boolean smartCasing = true) Extract Key Phrases using TF.IDF. |
java.util.Collection<java.lang.String> | extractNounGroups(java.lang.String attribute, java.lang.String languageCode = null /* null is auto-detect */) Extract Noun Groups (or Noun Phrases). |
java.util.Collection<java.lang.String> | extractWhitelistTags(java.lang.String attribute, java.lang.String whitelist, java.lang.String languageCode = LANGUAGE_DEFAULT, boolean caseSensitive = false, boolean matchWholeWords = true, java.lang.String delimiter = "\t") Use a dictionary-matching algorithm that locates elements of a finite set of strings (the "whitelist") within input text. |
java.lang.String | geotagIPAddress(java.lang.String IPAddress, java.lang.String adminLevel) Geotag an IP address. |
com.oracle.endeca.pdi.concepts.Geocode | geotagIPAddressGetGeocode(java.lang.String IPAddress) Geotag an IP address. |
java.lang.Object | geotagStructuredAddress(java.lang.String country, java.lang.String region, java.lang.String subregion, java.lang.String city, java.lang.String postcode, boolean returnByPopulation = false, java.lang.String adminLevel = UserFunctions.ADMIN_LEVEL_GEOCODE) Geotag an address using structured fields. |
java.lang.String | geotagUnstructuredAddress(java.lang.String addressText, java.lang.String adminLevel, java.lang.String addressGrain = null, java.lang.Boolean validateAddress = null) Geotag an address. |
com.oracle.endeca.pdi.concepts.Geocode | geotagUnstructuredAddressGetGeocode(java.lang.String addressText, java.lang.String addressGrain = null, java.lang.Boolean validateAddress = null) Geotag an address. |
java.util.Collection<java.lang.String> | getEntities(java.lang.String attribute, java.lang.String entityType) Named Entity Recognition (NER) - extract entities of the given type. |
java.lang.String | getSentiment(java.lang.String attribute, java.lang.String languageCode = null /* null is auto-detect */) Extract the two-class sentiment for a given piece of text. |
java.util.Collection<java.lang.String> | getTermSentiment(java.lang.String textAttribute, java.lang.String termsAttribute, java.lang.String sentimentCategory, java.lang.String languageCode = null /* null is auto-detect */) Extract phrases that are found in sentences that exhibit POSITIVE/NEGATIVE sentiment. |
java.lang.String | reverseGeotag(com.oracle.endeca.pdi.concepts.Geocode attribute, java.lang.String adminLevel, java.lang.Double proximityThreshold = null) Reverse geotagging. |
java.lang.Object | runExternalPlugin(java.lang.String pluginName, java.lang.String attribute, java.util.Map options = [:]) Run the external Groovy script defined in the file named by pluginName, and return the result of the script. The script should contain a method that implements pluginExec(Object[] args). |
java.lang.String | stripTagsFromHTML(java.lang.String attribute) Detect and remove the surplus "clutter" (boilerplate, templates) around the main textual content of a web page. |
java.lang.String | toPhoneticHash(java.lang.String attribute) Produce a String hash of the input text (English only) that represents the phonetics of the text. |
Methods inherited from class | Name |
---|---|
class groovy.lang.Script | println(), println(java.lang.Object), run(), run(java.io.File, java.lang.String[]), setProperty(java.lang.String, java.lang.Object), getProperty(java.lang.String), print(java.lang.Object), printf(java.lang.String, java.lang.Object), printf(java.lang.String, java.lang.Object[]), invokeMethod(java.lang.String, java.lang.Object), evaluate(java.lang.String), evaluate(java.io.File), getBinding(), setBinding(groovy.lang.Binding), getMetaClass(), setMetaClass(groovy.lang.MetaClass), wait(long, int), wait(long), wait(), equals(java.lang.Object), toString(), hashCode(), getClass(), notify(), notifyAll() |
class groovy.lang.GroovyObjectSupport | setProperty(java.lang.String, java.lang.Object), getProperty(java.lang.String), invokeMethod(java.lang.String, java.lang.Object), getMetaClass(), setMetaClass(groovy.lang.MetaClass), wait(long, int), wait(long), wait(), equals(java.lang.Object), toString(), hashCode(), getClass(), notify(), notifyAll() |

detectLanguage
Find the language of a given document. For accurate results, the text should contain at least ten words.
attribute
- Text

extractKeyPhrases
Extract Key Phrases using TF.IDF. Supported languages are English (UK/US), Portuguese (Brazilian), Spanish, French, German, and Italian.
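The description above says only that key phrases are extracted using TF.IDF; it does not say how candidate phrases are formed or which weighting is applied. As a rough illustration of the general technique, the following sketch scores single whitespace-separated terms with a smoothed IDF. The helper name `tfIdf`, the tokenisation, the smoothing formula, and the sample corpus are all assumptions made for this example, not details of the function itself.

```groovy
// Hypothetical helper: score each term of one document against a small corpus.
// TF  = term count / document length
// IDF = ln(N / (1 + df)) + 1   (one common smoothed variant, chosen for this sketch)
Map<String, Double> tfIdf(List<String> document, List<List<String>> corpus) {
    Map<String, Double> scores = [:]
    document.countBy { it }.each { term, count ->
        double tf = 1.0d * count / document.size()
        int docsWithTerm = corpus.count { it.contains(term) } as int
        double idf = Math.log(1.0d * corpus.size() / (1 + docsWithTerm)) + 1.0d
        scores[term] = tf * idf
    }
    return scores
}

// Illustrative corpus of three tokenised documents.
List<List<String>> corpus = [
    "radon is a noble gas".split(" ") as List,
    "helium is a noble gas".split(" ") as List,
    "jupiter is a gas giant".split(" ") as List
]
Map<String, Double> scores = tfIdf(corpus[0], corpus)

// Rank the terms of the first document, highest score first.
List<String> ranked = scores.sort { -it.value }.keySet() as List
println ranked.take(2)
```

Terms that occur in few documents receive a higher weight than terms shared by every document, which is why a term-frequency count alone would not single out distinctive phrases.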
attribute
- Text
languageCode
- OLT language name or code (for example "en", "English", "German"). The language parameter is optional. When specified, it forces the function to use a model specific to that language. When not specified, or when passed as null, the function automatically detects the language model.
smartCasing
- Automatically handle documents that are predominantly in either title or upper case.

extractNounGroups
Extract Noun Groups (or Noun Phrases).
attribute
- Text
languageCode
- OLT language name or code (for example "en", "English", "German"). The language parameter is optional. When specified, it forces the function to use a model specific to that language. When not specified, or when passed as null, the function automatically detects the language model. Supported languages are English (UK/US), Portuguese (Brazilian), Spanish, French, German, and Italian.

extractWhitelistTags
Use a dictionary-matching algorithm that locates elements of a finite set of strings (the "whitelist") within input text. The function finds all occurrences of any whitelist terms. Whitelist entries are always newline-delimited. Each line may be either a comment (indicated with a # as the first character) or a matching directive consisting of one or two values (separated by the delimiter character). The optional second value is used to rewrite the match output.
A very simple example whitelist:
We could rewrite this whitelist like this: (note
When this whitelist is run on the text "Radon is a chemical element with symbol Rn and atomic number 86.", it produces the output list ['Rn'].
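The whitelist rules described above (newline-delimited entries, # comments, an optional second value that rewrites the output, case-insensitive and whole-word matching by default) can be sketched in plain Groovy. This is an illustration only, not the function's implementation. The helper names `parseWhitelist` and `matchWhitelist`, the use of regex word boundaries for whole-word matching, and the sample whitelist entries are assumptions made for this example. Language handling is omitted.

```groovy
import java.util.regex.Pattern

// Hypothetical helper: parse newline-delimited whitelist text into term -> output.
Map<String, String> parseWhitelist(String whitelist, String delimiter = "\t") {
    Map<String, String> entries = [:]
    whitelist.split("\n").each { String line ->
        if (line.trim().isEmpty() || line.startsWith("#")) {
            return // skip blank lines and comment lines
        }
        String[] parts = line.split(Pattern.quote(delimiter), 2)
        // The optional second value rewrites the match output.
        entries[parts[0]] = parts.length > 1 ? parts[1] : parts[0]
    }
    return entries
}

// Hypothetical helper: return one output value per occurrence of any whitelist term.
List<String> matchWhitelist(String text, Map<String, String> entries,
                            boolean caseSensitive = false, boolean matchWholeWords = true) {
    List<String> tags = []
    entries.each { String term, String output ->
        String quoted = Pattern.quote(term)
        String regex = matchWholeWords ? "\\b" + quoted + "\\b" : quoted
        int flags = caseSensitive ? 0 : (Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE)
        def matcher = Pattern.compile(regex, flags).matcher(text)
        while (matcher.find()) {
            tags << output
        }
    }
    return tags
}

// Illustrative whitelist: one comment line and two tab-delimited directives.
String whitelist = "# chemical elements (illustrative entries)\nradon\tRn\nxenon\tXe"
String text = "Radon is a chemical element with symbol Rn and atomic number 86."
println matchWhitelist(text, parseWhitelist(whitelist))
```

In this sketch the term radon matches "Radon" because matching is case-insensitive by default, and the second value on that line rewrites the output to Rn.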
attribute
- Text, the input document
languageCode
- (whitespace-delimited languages only)
caseSensitive
- Whether matching is case sensitive (default: false, meaning case-insensitive, so 'FOO' matches 'foo')
matchWholeWords
- Whether to match whole words only (default: true, so 'red' does not match 'reduce')
delimiter
- (default: "\t")

geotagIPAddress
Geotag an IP address.
IPAddress
- IP address
adminLevel
- The desired output admin level. Expected values are: 'City', 'Country', 'Postcode', 'Region', 'SubRegion', 'RegionID', 'SubRegionID'.

geotagIPAddressGetGeocode
Geotag an IP address.
IPAddress
- IP address

geotagStructuredAddress
Geotag an address using structured fields.
country
- Country of the address; use null when unknown.
region
- Region of the address; use null when unknown.
subregion
- Subregion of the address; use null when unknown.
city
- City of the address; use null when unknown.
postcode
- Postcode of the address; use null when unknown.
returnByPopulation
- If set to true, returns the location with the largest population. Optional parameter; defaults to false.
adminLevel
- The desired output admin level. Expected values are: 'City', 'Country', 'Postcode', 'Geocode', 'Region', 'SubRegion', 'RegionID', 'SubRegionID'. Optional parameter; defaults to 'Geocode'.

geotagUnstructuredAddress
Geotag an address.
addressText
- Text, an address
adminLevel
- The desired output admin level. Expected values are: 'City', 'Country', 'Postcode', 'Region', 'SubRegion', 'RegionID', 'SubRegionID'.
addressGrain
- Helps the geotagger find the most likely match for a given level. Expected values are: 'None', 'City', 'Country', 'Region', 'SubRegion'. Optional parameter; defaults to 'None'.
validateAddress
- Whether the geotagger should validate the address. Optional parameter; defaults to false.

geotagUnstructuredAddressGetGeocode
Geotag an address.
addressText
- Text, an address
addressGrain
- Helps the geotagger find the most likely match for a given level. Expected values are: 'None', 'City', 'Country', 'Region', 'SubRegion'. Optional parameter; defaults to 'None'.
validateAddress
- Whether the geotagger should validate the address. Optional parameter; defaults to false.

getEntities
Named Entity Recognition (NER) - extract entities of the given type.
attribute
- Text, the input document
entityType
- Which type of entity to extract

getSentiment
Extract the two-class sentiment for a given piece of text.
attribute
- Text
languageCode
- OLT language name or code (for example "en", "English", "German"). The language parameter is optional. When specified, it forces the function to use a model specific to that language. When not specified, or when passed as null, the function automatically detects the language model. Supported languages are English (UK/US), Spanish, French, German, and Italian.

getTermSentiment
Extract phrases that are found in sentences that exhibit POSITIVE/NEGATIVE sentiment.
textAttribute
- Text
termsAttribute
- Type of the terms
sentimentCategory
- Positive terms or negative terms
languageCode
- OLT language name or code (for example "en", "English", "German"). The language parameter is optional. When specified, it forces the function to use a model specific to that language. When not specified, or when passed as null, the function automatically detects the language model. Supported languages for Key Phrases and Noun Groups are English (UK/US), Spanish, French, German, and Italian. Location, Person, and Organization entities are English only.

reverseGeotag
Reverse geotagging.
attribute
- Geocode
adminLevel
- The desired output admin level. Expected values are: 'City', 'Country', 'Postcode', 'Region', 'SubRegion', 'RegionID', 'SubRegionID'.
proximityThreshold
- The maximum distance, in miles, allowed between the input geocode and the output geo location. If the distance exceeds the threshold, null is returned. Optional parameter; defaults to 100.0.

runExternalPlugin
Run the external Groovy script defined in the file named by pluginName, and return the result of the script. The script should contain a method that implements:
    def pluginExec(Object[] args) {
        def attribute = args[0]  // input string to the script
        def options = args[1]    // options Map, empty by default
        ...
    }
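As a fuller illustration of the pluginExec contract shown above, the following sketch fills in a body that truncates the input text. The truncation behaviour, the option name `maxLength`, and its fallback value of 20 are hypothetical choices for this example. Only the method name and the layout of args (attribute first, options Map second) come from the description.

```groovy
// Hypothetical contents of an external plugin script such as MyPlugin.groovy.
def pluginExec(Object[] args) {
    String attribute = args[0] as String
    // Fall back to an empty Map when no options are supplied.
    Map options = (args.length > 1 && args[1] != null) ? (args[1] as Map) : [:]
    if (attribute == null) {
        return null
    }
    // "maxLength" is an illustrative option name, not one defined by the product.
    int maxLength = (options.maxLength ?: 20) as int
    return attribute.length() > maxLength ? attribute.substring(0, maxLength) : attribute
}

// Local check of the method, calling it the way the args array is laid out.
println pluginExec(["Radon is a chemical element", [maxLength: 5]] as Object[])
```

Calling the method directly like this is a convenient way to exercise plugin logic outside the product before referencing the script file by name.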
pluginName
- Base name and extension of the Groovy script file (for example, MyPlugin.groovy)
attribute
- Input string to the script
options
- Options Map containing any options to be used by the Groovy script; defaults to empty.

stripTagsFromHTML
Detect and remove the surplus "clutter" (boilerplate, templates) around the main textual content of a web page.
attribute
- Text

toPhoneticHash
Produce a String hash of the input text (English only) that represents the phonetics of the text. The algorithm is designed for names: two similar-sounding names, like 'Daren' and 'Darren', will have the same hash.
attribute
- Text (works best on names)
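
The description of toPhoneticHash does not name the algorithm it uses. To illustrate how a phonetic hash can give similar-sounding names the same value, the sketch below implements classic American Soundex. This is a swapped-in, well-known technique used purely as an example, so its hash values should not be expected to match those of toPhoneticHash. The helper name `soundex` is hypothetical.

```groovy
// Hypothetical helper: classic American Soundex (first letter + three digits).
String soundex(String name) {
    String letters = name.toUpperCase(Locale.ROOT).replaceAll("[^A-Z]", "")
    if (letters.isEmpty()) {
        return ""
    }
    Map<String, String> codes = [
        B: "1", F: "1", P: "1", V: "1",
        C: "2", G: "2", J: "2", K: "2", Q: "2", S: "2", X: "2", Z: "2",
        D: "3", T: "3",
        L: "4",
        M: "5", N: "5",
        R: "6"
    ]
    StringBuilder hash = new StringBuilder(letters.substring(0, 1))
    String previous = codes[letters.substring(0, 1)] ?: ""
    for (int i = 1; i < letters.length() && hash.length() < 4; i++) {
        String ch = letters.substring(i, i + 1)
        String code = codes[ch] ?: ""
        // Append a digit only when it differs from the preceding letter's code.
        if (code && code != previous) {
            hash.append(code)
        }
        // H and W do not separate letters with the same code; vowels do.
        if (ch != "H" && ch != "W") {
            previous = code
        }
    }
    // Pad short hashes with zeros to a fixed length of four.
    while (hash.length() < 4) {
        hash.append("0")
    }
    return hash.toString()
}

println soundex("Daren")
println soundex("Darren")
```

Both names reduce to the same code here because the doubled R collapses to a single digit, which is the kind of spelling variation a phonetic hash is meant to absorb.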