1.3.11.19 Generate Initials

The Generate Initials processor transforms values into their initials; for example, "Bayerische Motoren Werke" becomes "BMW".

The Generate Initials transformation is most commonly used to match data (or cluster records for matching) where both the abbreviated and non-abbreviated forms of names (or other terms) are used. It is useful for finding matches such as "International Business Machines" and "IBM", which are hard to match automatically unless each value is first converted to initials. An option is included to ensure that short 'words' such as "IBM" are not initialized to "I".

The following table describes the configuration options:

Configuration Description

Inputs

Specify any String or String Array type attributes that you want to convert to initials. Number and Date attributes are not valid inputs.

Note that if you input an Array attribute, the transformation will apply to all array elements, and an Array attribute will be output.

Options

Specify the following options:

  • Delimiters Reference Data: allows the use of a standard set of characters that are used to split up words before generating initials. Specified as Reference Data. Default value: *Delimiters.

  • Delimiter characters: specifies an additional set of characters that are used to split up words before generating initials. Specified as free text. Default value: Space.

  • Ignore upper case single words of length: allows the Generate Initials processor to leave alone any single word values (that is, where no word splits occurred) of up to a number of characters in length, and which are all upper case (for example, 'IBM').

    Specified as an integer. Default value: 4.

Outputs

Describes any data attribute or flag attribute outputs.

Data Attributes

The following data attributes are output:

  • [Attribute Name].initials: a new attribute with the initialized values. Value is derived from the original attribute value, converted to initials.

Flags

None.

Normally, the Generate Initials transformation simply ignores the case of the original value, and generates upper case initials for each separate word it finds, as separated by the specified delimiters. For example, the values "A j Smith", "ALAN JOHN SMITH" and "Alan john smith" are all initialized as "AJS". However, there may be some values which are already initialized, for example, "PWC", "IBM", "BT", which should not be further initialized to "P", "I" and "B" respectively.

These can be distinguished by the fact that they are:

  • single word values,

  • already in upper case, and

  • only a few characters in length.

The Ignore upper case single words of length option allows you to specify a maximum word length (in characters). Single upper case word values of that length or shorter are not initialized.

For example, if set to 4, the values "PWC", "BT", "RSPB" and "IBM" would be ignored during the initialization process as they are 4 characters or fewer in length, are single word values, and are already upper case. By contrast, "IAN JOHN SMITH" would still be initialized to "IJS": although the word "IAN" is fewer than 4 characters in length and is already upper case, it is not a single word value. Also, "RSPCA" would be initialized to "R" as it is over 4 characters in length.
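The rules described above can be approximated in a few lines of Python. This is only an illustrative sketch, not the processor's actual implementation: the function name `generate_initials` and its parameters `delimiters` and `ignore_upper_len` are hypothetical, and the handling of empty values and surrounding whitespace is an assumption.

```python
import re


def generate_initials(value, delimiters=" ", ignore_upper_len=4):
    """Approximate the Generate Initials rules (hypothetical helper)."""
    if delimiters:
        # Split on runs of any delimiter character, dropping empty pieces.
        pattern = "[" + re.escape(delimiters) + "]+"
        words = [w for w in re.split(pattern, value) if w]
    else:
        # Assumption: with no delimiters, the whole value is one word.
        words = [value] if value else []

    # A single, all upper case word at or below the length limit is
    # assumed to be initials already and is left alone.
    if (len(words) == 1 and words[0].isupper()
            and len(words[0]) <= ignore_upper_len):
        return words[0]

    # Otherwise ignore the original case and take the upper-cased
    # first character of each word.
    return "".join(w[0].upper() for w in words)


for sample in ["A j Smith", "ALAN JOHN SMITH", "IBM", "RSPB", "RSPCA"]:
    print(sample, "->", generate_initials(sample))
```

With the limit set to 4, the sketch leaves "IBM" and "RSPB" unchanged, reduces "RSPCA" to "R", and converts both "A j Smith" and "ALAN JOHN SMITH" to "AJS", matching the behavior described in the text.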

The Generate Initials processor presents no summary statistics on its processing. In the Data view, each input attribute is shown with its new derived initialized attribute to the right.

Output Filters

None.

Example

In this example, the Generate Initials transformation is used to transform company names into their initialized values, using the default configuration, that is:

  • Delimiters Reference Data: not used

  • Delimiter characters: space

  • Ignore upper case single words of length: 4

Note that 'BMW' is not initialized to 'B' as it is a single upper case word with only 3 characters, so is assumed to represent initials already.

BusName.Parse                 BusName.Initials (asc)

BMW                           BMW

Bayerische Motorren Werke     BMW

Bayerishe Motorren Werke      BMW

Broad Oak Woodcraft           BOW

Brunswick Properties          BP

Body Perfect                  BP

Byron Pawnbrokers             BP
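
To illustrate how the derived initials might serve as a cluster key for matching, the following sketch groups the example company names by their initials. It is an assumption-laden illustration rather than part of the product: the helper `initials_key` is hypothetical, splits on whitespace only, and uses the default length limit of 4.

```python
from collections import defaultdict


def initials_key(name, max_len=4):
    """Hypothetical cluster key: initials of a whitespace-separated name."""
    words = name.split()
    # Short, single, all upper case words are treated as initials already.
    if len(words) == 1 and words[0].isupper() and len(words[0]) <= max_len:
        return words[0]
    return "".join(w[0].upper() for w in words)


names = [
    "BMW",
    "Bayerische Motorren Werke",
    "Bayerishe Motorren Werke",
    "Broad Oak Woodcraft",
    "Brunswick Properties",
    "Body Perfect",
    "Byron Pawnbrokers",
]

# Group records that share the same initials into candidate clusters.
clusters = defaultdict(list)
for name in names:
    clusters[initials_key(name)].append(name)

for key in sorted(clusters):
    print(key, clusters[key])
```

Records that share a key, such as the three "BMW" variants, become candidates for closer comparison. Unrelated names can share a key too (for example, the three "BP" values), so the initials are a way of narrowing down candidates, not a match decision in themselves.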