1.3.11.41 Strip Words

The Strip Words transformation processor removes from attribute values any words that match an entry in a Reference Data list.

Strip Words can be used to remove extraneous words from attributes, often with a view to creating values for matching. For example, when matching companies using a Company Name field, it may be useful to remove less significant words that occur in various forms, or which may occur in some values and not others, such as LTD, LIMITED, UK, PLC and so on.
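The idea of building a matching value can be illustrated with a minimal Python sketch. It is not EDQ's implementation: the noise-word set, the helper name `match_key` and the sample company names are hypothetical, and it assumes words are separated by whitespace and compared case-insensitively.

```python
# Hypothetical reference list of low-significance company-name words.
NOISE_WORDS = {"ltd", "limited", "uk", "plc"}


def match_key(name: str) -> str:
    """Drop noise words (case-insensitively) and rejoin the rest with spaces."""
    return " ".join(w for w in name.split() if w.casefold() not in NOISE_WORDS)


a = "Acme Widgets Ltd"
b = "ACME WIDGETS LIMITED UK"
# The raw values differ, but the stripped values agree when compared without case.
print(match_key(a), "|", match_key(b))
print(match_key(a).casefold() == match_key(b).casefold())
```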

The following table describes the configuration options:

Configuration Description

Inputs

Specify any String or String Array type attributes from which you want to strip words. Number and Date attributes are not valid inputs.

Note that if you input an Array attribute, the transformation will apply to all array elements, and an Array attribute will be output.
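The Array behaviour can be pictured with the following sketch, assuming a Python list stands in for a String Array attribute. The helper names are hypothetical, and the per-value logic is simplified to single-space delimiters with case ignored (the default for Ignore case?).

```python
from typing import Iterable, List, Set, Union


def strip_value(value: str, stop: Set[str]) -> str:
    # Simplified: split on single spaces and drop case-insensitive matches.
    return " ".join(w for w in value.split(" ") if w.casefold() not in stop)


def strip_attribute(value: Union[str, List[str]],
                    words: Iterable[str]) -> Union[str, List[str]]:
    stop = {w.casefold() for w in words}
    if isinstance(value, list):
        # Array input: every element is transformed and an array is returned.
        return [strip_value(v, stop) for v in value]
    # String input: a single string is returned.
    return strip_value(value, stop)


print(strip_attribute(["Kamke & Ellis Ltd.", "Milbourne Associates"],
                      ["Ltd.", "Associates"]))
```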

Options

Specify the following options:

  • Reference data: the list of words that you want to strip from attribute values. Specified as Reference Data. Default value: None.

  • Delimiters: provides a way of specifying a standard, reusable set of delimiter characters for breaking up values into words, and allows you to use control characters as delimiters. Note that only single characters (not strings of characters) can be used as delimiters. Multi-character delimiters will be ignored. Specified as Reference Data. Default value: *Delimiters.

  • Delimiters list: allows you to specify delimiters to use without having to create reference data for simple delimiters such as space or comma. Note that if these are used in addition to a reference list, all delimiters from both options will be used to break up the data. Specified as a free text entry. Default value: Space.

  • Ignore case?: determines whether or not to ignore case when matching the list of words to strip. Specified as Yes/No. Default value: Yes.
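The way the two delimiter options combine can be sketched as follows. This is an illustration under stated assumptions, not EDQ's code: the function names are hypothetical, reference-data entries are modelled as a list of strings, and each character typed into the free-text Delimiters list is assumed to be a separate delimiter.

```python
import re
from typing import Iterable, List, Set


def effective_delimiters(reference_entries: Iterable[str],
                         delimiters_list: str = " ") -> Set[str]:
    # Only single-character reference entries count; longer ones are ignored.
    from_reference = {e for e in reference_entries if len(e) == 1}
    # Assumption: every character of the free-text list is its own delimiter.
    # Delimiters from both options are used together.
    return from_reference | set(delimiters_list)


def split_words(value: str, delimiters: Set[str]) -> List[str]:
    if not delimiters:
        return [value] if value else []
    # A character class of escaped delimiters; control characters such as tab work too.
    pattern = "[" + "".join(re.escape(d) for d in sorted(delimiters)) + "]+"
    return [w for w in re.split(pattern, value) if w]


delims = effective_delimiters([",", ";", "--"], " ")
print(sorted(delims))
print(split_words("A,B;C D--E", delims))
```

Because "--" is ignored and "-" is not itself a delimiter here, "D--E" stays a single word.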

Outputs

Describes any data attribute or flag attribute outputs.

Data Attributes

The following data attributes are output:

  • StrippedWords: a new attribute derived from the original attribute value, with any words that matched your reference list stripped out. The original delimiters used in the input value will be preserved.
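One way to strip words while keeping the original delimiters is sketched below. The documentation does not say which delimiter survives when a word between two delimiters is removed, so this sketch makes an assumption: it keeps the delimiter run nearest the next surviving word and drops delimiters at the very start and end of the value. The function and parameter names are hypothetical.

```python
import re
from typing import Iterable


def strip_words(value: str, words: Iterable[str],
                delimiters: str = " ", ignore_case: bool = True) -> str:
    """Remove words found in `words`, reusing original delimiters between survivors."""
    key = str.casefold if ignore_case else (lambda w: w)
    stop = {key(w) for w in words}
    if not delimiters:
        return "" if key(value) in stop else value
    # The capture group makes re.split return delimiter runs at the odd indices.
    pieces = re.split("([" + re.escape(delimiters) + "]+)", value)
    out = []
    pending = ""  # most recent delimiter run since the last surviving word
    for i, piece in enumerate(pieces):
        if i % 2:
            pending = piece
        elif piece and key(piece) not in stop:
            if out:
                out.append(pending)  # reuse an original delimiter run
            out.append(piece)
            pending = ""
    return "".join(out)


print(strip_words("Kamke & Ellis Ltd.", ["ltd."]))
print(strip_words("A,B C", [], delimiters=", "))
```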

Flags

None.

The Strip Words transformer presents no summary statistics on its processing.

In the Data view, each input attribute is shown with its new derived attribute, with the matched words stripped, to its right.

Output Filters

None.

Example

In this example, Strip Words is used to remove less significant words such as 'Limited', 'Ltd.', 'Services' and 'Associates' from a field containing Company Names:

BUSINESS                              Business.StrippedWords
Kamke & Ellis Ltd.                    Kamke & Ellis
Sanford Electrical Co                 Sanford Electrical
C T V Services                        C T V
W F Electrical Contractors Limited    W F Electrical Contractors
Eco-Systems Group                     Eco-Systems
Milbourne Associates                  Milbourne
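The example can be reproduced approximately with the short sketch below. The reference list is hypothetical: it is inferred from the table (it also contains 'Co' and 'Group', since those are removed in the output) and is not an actual EDQ reference data set. The sketch assumes a single space as the only delimiter and case-insensitive matching.

```python
# Hypothetical reference list inferred from the example table.
REFERENCE_WORDS = ["Limited", "Ltd.", "Services", "Associates", "Co", "Group"]


def strip_words(value, words):
    # Space-delimited, case-insensitive stripping.
    stop = {w.casefold() for w in words}
    return " ".join(t for t in value.split(" ") if t.casefold() not in stop)


EXAMPLE = [
    ("Kamke & Ellis Ltd.", "Kamke & Ellis"),
    ("Sanford Electrical Co", "Sanford Electrical"),
    ("C T V Services", "C T V"),
    ("W F Electrical Contractors Limited", "W F Electrical Contractors"),
    ("Eco-Systems Group", "Eco-Systems"),
    ("Milbourne Associates", "Milbourne"),
]

for business, expected in EXAMPLE:
    result = strip_words(business, REFERENCE_WORDS)
    print(f"{business:<38}{result}")
```

Note that 'Co' does not affect "Eco-Systems", because matching is done on whole words rather than substrings.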