1.3.11.34 Replace

The Replace processor uses a Reference Data map to transform data - for example in order to standardize it. The first column of the map is used to match values against, and the second column is used to control the replacement.

The replacement performed may be a simple whole value replacement - for example to replace the value 'Oracle Ltd' with 'Oracle Limited', or it may be a replacement of a part of the input value - for example to replace 'ltd' with 'limited' if it is found at the end of a CompanyName attribute, or to replace the String 'decsd' with 'deceased' wherever it is found. The way the Reference Data is matched, and thus the data is replaced, is controlled using one of the following options:

  • Whole value

  • Contains

  • Starts with

  • Ends with

  • Delimiter match

The matching against the Reference Data may also be case sensitive or case insensitive.

Note that when using the Contains, Starts with, or Ends with options, there may be multiple matches against the lookup column of the reference data. In this case, Replace always makes one, and only one, replacement. So, for example when performing a 'Contains' replacement where the value 'PT' is replaced by 'PINT', the value '10PT - APTITUDE BITTER' would be transformed into '10PINT - APTITUDE BITTER' and not '10PINT APINTITUDE BITTER'.

If you choose to use the Delimiter match option, and split up the data before matching using delimiters, any of the split values that match the lookup column of the replacement map will be replaced, even if there are many matches in the input value.

The way the Replace processor decides how to make its replacement where there are multiple matches can be controlled using a configuration option.

By default, the map is simply checked in order, and the first match against the map from the input data is used for the replacement. So, for example, if your replacement map contains the values 'Lyn' and 'Lynda', where 'Lyn' appears first in the list, the input value 'Lynda' would undergo the replacement using the lookup value 'Lyn' in the map.

However, you can control this using the 'Match longest value' option. If you select this option, each matched reference entry will be assessed for length, and the longest match used. So, in the example above, the replacement using the lookup value 'Lynda' in the map would be performed.

Use the Replace processor for standardization - for example to standardize all CompanyName values so that different suffixes that mean the same thing are represented in a standard way (for example, Ltd/Limited, Assoc/Assc, Cncl/Council etc.)

Replacing Dates

It is possible to use Replace to replace Date values. However, for this to work, the date values in the Reference Data map must be in the standard ISO format; that is, either YYYY-MM-DD (for example, 1900-01-01), or YYYY-MM-DD HH:mm:ss (for example, 1900-01-01 00:00:00). Note that it is possible to replace a Date with a Null value - for example to remove invalid dates.

The following table describes the configuration options:

Configuration Description

Inputs

Specify a single attribute from which you want to replace values using a reference data map. The attribute may be a String, or a String Array.

If an array is input, the replacements will be made at the array element level, and an array (with the data after the replacements have been performed) will be output.

Options

Specify the following options:

  • Replacements: matches the attribute values against the lookup column in the map. Where there is a match, the matching value is replaced by the value in the right-hand column. Specified as a Reference Data. Default value: None.

  • Match longest value?: controls which replacement to perform where there are multiple matches against the map, in Starts With, Ends With, or Contains replacement. Specified as Yes/No. Default value: No.

  • Ignore case?: determines whether or not to ignore case when matching the lookup column of the map. Specified as Yes/No. Default value: Yes.

  • Match list by: drives how to match the map, and therefore which part of the original value to replace. Specified as a Selection (Whole Value/Starts With/Ends With/Contains/Delimiter Match). Default value: Whole value.

  • Delimiters: when matching values to the map by splitting the data using delimiters, this allows you to specify the delimiter characters to use. Specified as a free text entry. Default value: Space.

Outputs

Describes any data attribute or flag attribute outputs.

Data Attributes

The following data attributes are output:

  • Replaced: a new String or Array attribute derived from the replaced value(s). Note that where there was no match from the input attribute value to the map, the original attribute value is carried forward into the new attribute.

Flags

The following flags are output:

  • ReplaceSuccess: indicates whether the RegEx Replace was successful, unsuccessful or invalid. Possible values are Y/N/-.

The following table describes the statistics produced by the profiler:

Statistic Description

Transformed

The number of records where a replacement was performed. Drill down on the number to see the records.

Untransformed

The number of records where a replacement was not performed.

Invalid

The number of records where the replacement failed as the replacement value was invalid for the input data type.

Note:

It is possible to use the Replace processor with attributes of any data type - Strings, Arrays, Numbers, or Dates. However, as Replace always uses the data type of the input attribute for the output attribute, there are some transformations you can choose to make that will mean the replaced value is invalid for the data type of the output attribute. For example, if you attempt to replace the Date value '2006-04-14' with 'Bad date' using a map, the value 'Bad date' is not a valid Date, and so the replacement fails. If you have any invalid replacements, you may need to convert the original attribute to a different data type before performing the replacements, or you may need to modify your Reference Data map to remove any invalid replacements.

Output Filters

The following output filters are available:

  • Records with transformed values

  • Records with untransformed values

  • Records with an invalid replacement

Example

In this example, the Replace processor is used to standardize English Counties and other similar data in attribute Address3 from the Customers table. The output attribute has been named Address3.stand.

In this case a Whole Value replacement was used. The following is an excerpt from the drill-down view of transformed records:

ADDRESS3 ADDRESS3.stand CU_NO

Lancs

Lancashire

13841

Cambs

Cambridgeshire

14053

OXON

Oxfordshire

14068

Leics

Leicestershire

14130

Linc

Lincolnshire

14207

Beds

Bedfordshure

14506