Transform Data Using Replace

Transform data in a column using a simple pattern match or a regular (regex) expression. For example, in a State column, you might change CA to California.

You can use any Perl Compatible Regular Expression (PCRE), which are also referred to as regex expressions. Transforms are applied to every applicable row in a data set.
  1. In your project (on the Prepare pane) or data set editor, select the text column that you want to transform.
  2. Click Options, and then click Replace.
  3. In the Replace step editor, specify how you'd like to match values.
    • Click Use regular expression to match using a complex regular expression (regex).
    • Click Match entire values only to change "male" to "M" but not "female" to "feM".
  4. In the String to replace field and New string field, specify the literal text or regular expressions to search and replace (case-sensitive).
    Examples:
    • To change "2553 Bolman Court" to "2553 #Bolman# #Court#", select Use regular expression, enter ([A-Za-z]+) in the String to replace field, and enter #$1# in the New String field.
    • To change "male" with "M", select Match entire values only, enter male in the String to replace field, and enter M in the New String field. . This changes "male" to "M" but won't change "female" to "feM".

    When you complete both fields, you'll see a preview of the transformed data in the preview pane. Adjust the replace options until you get the transformation you want in the preview pane.
  5. Click Add Step to save the details and add the transform step to the Preparation Script pane.
    You apply the transformation when you click Apply Script.

Example Replace Transforms Using Regular Expressions

Here're just a few examples of using regular (regex) expressions to transform data.

In these examples, the Search Expression column shows what you'd enter in the String to replace field, and the Replace Value shows what you'd enter in the New string field.

Table 4-1 Example replace transforms

Search Expression Replace Value Original Text Replaced Text Notes
@([a-z]+)(?=\.[a-z]{3}) @example MichelePFalk@yahoo.com MichelePFalk@example.com This example replaces domain details in email addresses.
^Gray|Grey$ Silver

Grey

Gray

Graystone

Silver

Silver

Graystone

The ^$ characters mean only look for entire string matches. The vertical bar is the regular expression for OR, so in this case the regular expression looks for either "Gray" or "Grey" and replace with "Silver".

The string Graystone isn't transformed because the regular expression is looking for entire value matches only.

\d+ 9999

8398 Park Street

123 Oracle Parkway

9999 Park Street

9999 Oracle Parkway

This regular expression looks for one" \d" or more "+" digits and replaces them with "9999". The replace also works when the original text has only three digits.
([A-z]+)(\d+) $2

UA101654

US829383

101654

829383

This regular expression is looking for one "[A-z]" or more "+" consecutive letters followed by one" \d" or more "+" digits - each of the two expressions is bounded by parentheses, which captures two groups - the first one ([A-z]) and the second one (\d+) - these groups are numbered automatically and can then be used in the replacement by using the dollar sign for the second group, for example, "$2".
([A-z]+)(\d+) Postal Code: $2

UA101654

US829383

Postal Code: 101654

Postal Code: 829383

This example uses the same match expression as the previous example, except that it shows how to insert your own replacement text in addition to a group. Text can be inserted before and after a recalled match group.