Noun Group extractor

This plugin extracts noun groups from the input text.

The Noun Group extractor retrieves noun groups from a string attribute in each of the supported languages. The extracted noun groups are sorted by C-value and (optionally) truncated to a useful number, which is driven by the size of the original document and how many groups are extracted. One use of this plugin is in tag cloud visualization to find the commonly occurring themes in the data.
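The ranking step can be sketched in Python. This is a minimal illustration of the C-value measure as defined by Frantzi, Ananiadou and Mima, not the plugin's own code. The plugin's exact C-value variant and its truncation rule are not documented here. The sketch makes three assumptions. First, `freq` maps each candidate noun group to its total frequency in the document, including occurrences nested inside longer candidates. Second, it uses log2(|a| + 1) in place of log2|a|, so that single-word groups do not always score zero. Third, the names `rank_by_c_value`, `is_nested`, and the `top_n` cut-off are hypothetical. `top_n` stands in for the size-driven truncation described above.

```python
import math


def is_nested(shorter: tuple, longer: tuple) -> bool:
    """True if the word sequence `shorter` appears contiguously inside `longer`."""
    m, n = len(shorter), len(longer)
    return m < n and any(longer[i:i + m] == shorter for i in range(n - m + 1))


def rank_by_c_value(freq: dict, top_n=None) -> list:
    """Sort candidate noun groups by C-value, highest first.

    `freq` maps each candidate phrase to its total frequency in the document,
    counting occurrences nested inside longer candidates as well.
    `top_n` is a hypothetical cut-off standing in for the plugin's truncation.
    """
    terms = {tuple(p.split()): f for p, f in freq.items()}
    scores = {}
    for term, f in terms.items():
        # Length weight; the +1 (an assumed variant) keeps single words above zero.
        weight = math.log2(len(term) + 1)
        # Frequencies of the longer candidates that contain this term.
        containers = [g for t, g in terms.items() if is_nested(term, t)]
        if containers:
            # Discount the occurrences already explained by longer candidates.
            f -= sum(containers) / len(containers)
        scores[" ".join(term)] = weight * f
    # Highest score first; ties broken alphabetically for a stable order.
    ranked = sorted(scores, key=lambda p: (-scores[p], p))
    return ranked if top_n is None else ranked[:top_n]


print(rank_by_c_value({"the lazy dog": 2, "lazy dog": 3, "quick brown fox": 1}))
# → ['the lazy dog', 'quick brown fox', 'lazy dog']
```

Here "lazy dog" drops to last. Two of its three occurrences are explained by the longer group "the lazy dog", so only one occurrence counts toward its score.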

A typical noun group consists of a determiner, a noun (the head of the phrase), and zero or more dependents of various types. Some of these dependents are:
  • noun adjuncts
  • attributive adjectives
  • adjective phrases
  • participial phrases
  • prepositional phrases
  • relative clauses
  • infinitive phrases

Which of these elements are allowed, and their form and position, depend on the syntax of the language.

Design

This plugin works by applying language-specific phrase grouping rules to the input text. A phrase grouping rule pairs a sequence of lexical tests, which are applied to the tokens in a sentence, with a grouping action. The action of a grouping rule is a single part of speech (POS) with a weight value, which can be a negative or positive integer, followed by optional component labels and positions. For noun groups, the action uses the noun POS. Each component label must be either head or mod, and each position is a zero-based index into the pattern, excluding the left and right context (if any).
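A grouping rule of the shape described above might be modelled as follows. This is a hypothetical Python sketch, since the plugin's rule syntax is not shown here. The class and field names are assumptions. So are the Universal Dependencies-style tags (DET, ADJ, NOUN, ADP) and the representation of lexical tests as predicates over (word, tag) pairs. The sketch enforces the two stated constraints: labels must be head or mod, and positions index the pattern without its context. It then tries one rule at one token position.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

Token = Tuple[str, str]          # (word, POS tag); the tag set is an assumption
Test = Callable[[Token], bool]   # one lexical test applied to one token


def tag_is(tag: str) -> Test:
    """Hypothetical helper: build a lexical test that checks a token's tag."""
    return lambda tok: tok[1] == tag


@dataclass
class GroupingRule:
    pattern: List[Test]          # tests for the tokens that form the group
    pos: str                     # action POS; the noun POS for noun groups
    weight: int                  # may be negative or positive
    components: List[Tuple[str, int]] = field(default_factory=list)
    left_context: List[Test] = field(default_factory=list)
    right_context: List[Test] = field(default_factory=list)

    def __post_init__(self):
        for label, position in self.components:
            if label not in ("head", "mod"):
                raise ValueError(f"component label must be head or mod, got {label!r}")
            # Positions are zero-based and count only the pattern, not the context.
            if not 0 <= position < len(self.pattern):
                raise ValueError(f"position {position} is outside the pattern")

    def match_at(self, tokens: List[Token], start: int):
        """Return (group tokens, labelled components) if the rule matches at `start`, else None."""
        left, right = len(self.left_context), len(self.right_context)
        end = start + len(self.pattern)
        if start - left < 0 or end + right > len(tokens):
            return None
        tests = self.left_context + self.pattern + self.right_context
        window = tokens[start - left:end + right]
        if not all(test(tok) for test, tok in zip(tests, window)):
            return None
        group = tokens[start:end]  # context tokens are checked but not grouped
        labelled = [(label, group[pos][0]) for label, pos in self.components]
        return group, labelled


rule = GroupingRule(
    pattern=[tag_is("DET"), tag_is("ADJ"), tag_is("NOUN")],
    pos="NOUN", weight=1,
    components=[("mod", 1), ("head", 2)],
    left_context=[tag_is("ADP")],
)
sentence = [("over", "ADP"), ("the", "DET"), ("lazy", "ADJ"), ("dog", "NOUN")]
print(rule.match_at(sentence, 1))
# → ([('the', 'DET'), ('lazy', 'ADJ'), ('dog', 'NOUN')], [('mod', 'lazy'), ('head', 'dog')])
```

In this hypothetical rule, the left-context test (ADP) must match the token before the group. That token is not part of the returned group and does not count toward component positions.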

Configuration options

There are no configuration options.

Note that this plugin is not run automatically during the Data Processing sampling phase (i.e., when a new or modified Hive table is sampled).

Output

The output of this plugin is an ordered list of phrases (single- or multi-word) that are ingested into the Dgraph as a multi-assign string attribute.

The name of the output attribute is <colname>_noun_groups.
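The naming convention and the multi-assign output can be pictured with a small Python sketch. It models a Dgraph record as a Python dict and a multi-assign attribute as a list of values. Both are simplifications for illustration. `review`, `noun_group_attribute`, and `add_noun_groups` are hypothetical names. The sketch only shows that the attribute name is derived from the source column and that each phrase becomes a separate value, in ranked order.

```python
from typing import Dict, List


def noun_group_attribute(colname: str) -> str:
    """Name of the output attribute: <colname>_noun_groups."""
    return f"{colname}_noun_groups"


def add_noun_groups(record: Dict[str, object], colname: str, phrases: List[str]) -> Dict[str, object]:
    """Return a copy of `record` with the ordered phrases stored as one multi-assign attribute."""
    out = dict(record)
    # One value per phrase; the list keeps the ranked order.
    out[noun_group_attribute(colname)] = list(phrases)
    return out


row = {"review": "The quick brown fox jumped over the lazy dog."}
print(add_noun_groups(row, "review", ["The quick brown fox", "the lazy dog"]))
```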

In addition, the Transform API provides the extractNounGroups function, a wrapper around the Noun Group extractor that returns the noun groups found in the input text as single values.

Example

The following sentence provides a high-level illustration of noun grouping:
The quick brown fox jumped over the lazy dog.
From this sentence, the extractor would return two noun groups:
  • The quick brown fox
  • the lazy dog
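The extractor's tagger and rule set are not described here, so the following Python sketch hard-codes part-of-speech tags for the example sentence (an assumption). It applies a single simplified pattern, chosen for illustration rather than taken from the plugin: an optional determiner, any number of adjectives, then one or more nouns. This is enough to recover the two groups above.

```python
from typing import List, Tuple

# Hand-assigned tags for the example sentence (assumed; a real tagger would supply these).
TAGGED: List[Tuple[str, str]] = [
    ("The", "DET"), ("quick", "ADJ"), ("brown", "ADJ"), ("fox", "NOUN"),
    ("jumped", "VERB"), ("over", "ADP"), ("the", "DET"), ("lazy", "ADJ"),
    ("dog", "NOUN"), (".", "PUNCT"),
]


def chunk_noun_groups(tokens: List[Tuple[str, str]]) -> List[str]:
    """Greedily match DET? ADJ* NOUN+ left to right and return each match as a phrase."""
    groups, i = [], 0
    while i < len(tokens):
        j = i
        if tokens[j][1] == "DET":  # optional determiner
            j += 1
        while j < len(tokens) and tokens[j][1] == "ADJ":  # any adjectives
            j += 1
        k = j
        while k < len(tokens) and tokens[k][1] == "NOUN":  # one or more nouns
            k += 1
        if k > j:  # at least one noun: tokens[i:k] form a noun group
            groups.append(" ".join(word for word, _ in tokens[i:k]))
            i = k
        else:
            i += 1
    return groups


print(chunk_noun_groups(TAGGED))  # → ['The quick brown fox', 'the lazy dog']
```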

Each noun group would be ingested into the Dgraph as a multi-assign string attribute.