Transform data in a column using a simple pattern match or a regular (regex) expression. For example, in a State column, you might change CA to California.
- In your project (on the Prepare pane) or data set editor, select the text column that you want to transform.
- Click Options, and then click Replace.
- In the Replace step editor, specify how you'd like to match values.
- Click Use regular expression to match using a complex regular expression (regex).
- Click Match entire values only to change "male" to "M" but not "female" to "feM".
- In the String to replace field and New
string field, specify the literal text or regular expressions to
search and replace (case-sensitive).Examples:
When you complete both fields, you'll see a preview of the transformed data in the preview pane. Adjust the replace options until you get the transformation you want in the preview pane.
- To change "2553 Bolman Court" to "2553 #Bolman# #Court#", select
Use regular expression, enter
([A-Za-z]+)in the String to replace field, and enter
#$1#in the New String field.
To change "male" with "M", select Match entire values only, enter
malein the String to replace field, and enter
Min the New String field. . This changes "male" to "M" but won't change "female" to "feM".
- To change "2553 Bolman Court" to "2553 #Bolman# #Court#", select Use regular expression, enter
- Click Add Step to save the details and add the transform
step to the Preparation Script pane.You apply the transformation when you click Apply Script.
Example Replace Transforms Using Regular Expressions
Here're just a few examples of using regular (regex) expressions to transform data.
In these examples, the Search Expression column shows what you'd enter in the String to replace field, and the Replace Value shows what you'd enter in the New string field.
Table 4-1 Example replace transforms
|Search Expression||Replace Value||Original Text||Replaced Text||Notes|
||@example||MichelePFalk@yahoo.com||MichelePFalk@example.com||This example replaces domain details in email addresses.|
|The ^$ characters mean only look for entire string
matches. The vertical bar is the regular expression for OR, so in
this case the regular expression looks for either "Gray" or "Grey"
and replace with "Silver".
The string Graystone isn't transformed because the regular expression is looking for entire value matches only.
8398 Park Street
123 Oracle Parkway
9999 Park Street
9999 Oracle Parkway
|This regular expression looks for one" \d" or more "+" digits and replaces them with "9999". The replace also works when the original text has only three digits.|
|This regular expression is looking for one "[A-z]" or more "+" consecutive letters followed by one" \d" or more "+" digits - each of the two expressions is bounded by parentheses, which captures two groups - the first one ([A-z]) and the second one (\d+) - these groups are numbered automatically and can then be used in the replacement by using the dollar sign for the second group, for example, "$2".|
||Postal Code: $2||
Postal Code: 101654
Postal Code: 829383
|This example uses the same match expression as the previous example, except that it shows how to insert your own replacement text in addition to a group. Text can be inserted before and after a recalled match group.|