Two pass-through parameters set the language ID of input records on a global and per-record basis.
You can use the LANG and LANG_PROP_NAME pass-through parameters to
specify the global language ID and the per-record language ID of the input
records. The language ID is not case sensitive for either pass-through
parameter. For example, you can specify EN or en for English (American).
Note that the LANG_PROP_NAME value takes precedence; if it is not present, the value of LANG is used as the language of the record.
Both can be specified in the CAS manipulator.
The LANG_PROP_NAME pass-through specifies the name of the record
property that contains the language ID for that record. If you do not specify
this pass-through, the language ID for each record defaults to the value of
the LANG pass-through. For example, if the value for LANG is en-GB, then the
term extractor assumes that all the records are in English - UK.
If you do specify the LANG_PROP_NAME pass-through, the term extractor evaluates each record as follows:
If the value of the LANG_PROP_NAME property matches the LANG setting, then terms are extracted from the record in that language.
If the value of the LANG_PROP_NAME property does not match the LANG setting, then terms are not extracted from the record. That is, the record is ignored for purposes of term extraction, but the record is otherwise processed by CAS. For example, if the value of LANG is fr (French - European) and the value of the LANG_PROP_NAME property is en-GB (English - UK) for a given record, the term extractor ignores that record.
If the value of the LANG_PROP_NAME property is null or the record does not contain the LANG_PROP_NAME property, the term extractor assumes that the language ID of the record is the same as the LANG setting and therefore attempts to extract terms from the record in that language.
If you have documents in multiple languages, the LANG_PROP_NAME pass-through is useful to ensure that only records in the desired language (the LANG setting) are processed by the term extractor.
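The per-record evaluation described above can be sketched in a few lines of Python. This is only an illustration of the decision rules, not the term extractor's actual implementation: the function name should_extract_terms, the dictionary representation of a record, and the property name Language are all hypothetical. The sketch also assumes that a "match" means the two language IDs are equal once case is ignored, so en and en-GB are treated as different languages.

```python
from typing import Mapping, Optional


def should_extract_terms(
    record: Mapping[str, Optional[str]],
    lang: str,
    lang_prop_name: Optional[str] = None,
) -> bool:
    """Return True if terms would be extracted from the record (illustrative only)."""
    # LANG_PROP_NAME not specified: every record defaults to the LANG value.
    if lang_prop_name is None:
        return True

    record_lang = record.get(lang_prop_name)
    # Property missing or null: the record is assumed to be in the LANG language.
    if record_lang is None:
        return True

    # Language IDs are not case sensitive, so compare them case-insensitively.
    # Assumption: a match requires the full IDs to be equal ("en" != "en-GB").
    return record_lang.strip().lower() == lang.strip().lower()
```

For example, with LANG set to fr and a hypothetical Language property named by LANG_PROP_NAME, a record whose Language value is en-GB is skipped for term extraction, while a record with the value FR, or with no Language property at all, is processed.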