Through the CAS manipulator configuration, you can choose between custom noun phrase grouping and OLT noun phrase grouping.
With either option, OLT performs tokenization, part-of-speech tagging, dynamic stemming, and sentence recognition.
For the Use OLT for NounGrouping setting:
A false value means that custom logic is used for noun phrase grouping. This model produces behavior similar to Forge-based term extraction.
Note
This model supports only English and French.
A true value means that OLT grammatical noun phrase grouping is used. This model supports more than fifty languages, including English and French. See Supported languages for the full list.
This section compares the output of a false value (Forge-based term extraction) with the output of a true value (OLT-based term extraction in CAS). OLT-based term extraction forms noun groups that keep more of their meaning and context, such as determiners, modifiers, and prepositional phrases. The following example runs the same text through both Forge-based term extraction and OLT-based term extraction.
Here is the text:
Sachin Ramesh Tendulkar is a former Indian cricketer widely
acknowledged as the greatest batsmen of all time, popularly holding the title
"God of Cricket" among his fans. He is the only player to have scored one
hundred international centuries.
The following table shows the noun groups that were extracted:
Forge-based term extraction | OLT-based term extraction using CAS |
---|---|
Sachin Ramesh Tendulkar | Sachin Ramesh Tendulkar |
cricketer | a former Indian cricketer |
greatest batsmen | the greatest batsmen of all time |
title | the title |
God | God of Cricket |
fans | his fans |
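
To make the difference in the table concrete, the following Python sketch takes the noun groups listed above and reports which words the OLT-based group adds before and after the corresponding Forge-based term. It does not call OLT, Forge, or CAS; the pairs are copied by hand from the table, and the helper name `added_context` is hypothetical and used only for illustration.

```python
# Noun groups copied by hand from the comparison table:
# (Forge-based term, OLT-based term). Nothing here calls OLT or CAS.
NOUN_GROUP_PAIRS = [
    ("Sachin Ramesh Tendulkar", "Sachin Ramesh Tendulkar"),
    ("cricketer", "a former Indian cricketer"),
    ("greatest batsmen", "the greatest batsmen of all time"),
    ("title", "the title"),
    ("God", "God of Cricket"),
    ("fans", "his fans"),
]


def added_context(forge_term: str, olt_term: str) -> tuple[str, str]:
    """Return the words the OLT group adds before and after the Forge term.

    Hypothetical helper for illustration. Raises ValueError if the Forge
    term does not occur inside the OLT term.
    """
    start = olt_term.index(forge_term)  # ValueError if not contained
    before = olt_term[:start].strip()
    after = olt_term[start + len(forge_term):].strip()
    return before, after


if __name__ == "__main__":
    for forge_term, olt_term in NOUN_GROUP_PAIRS:
        before, after = added_context(forge_term, olt_term)
        print(f"{forge_term!r}: before={before!r}, after={after!r}")
```

In every row, the Forge-based term appears inside the OLT-based group, so the comparison reduces to the surrounding words that the OLT grouping retains.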