1.3.11.33 RegEx Split

The RegEx Split processor provides a way to split up the data in an attribute into an array, using a regular expression to define where the splits should occur.

Use RegEx Split when splitting on fixed delimiters is not flexible enough. For example, you may want to split the data wherever any one of a set of characters occurs, or wherever a variable-length run of those characters occurs.
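
The two cases can be illustrated with a minimal sketch in Python's standard re module, which stands in for the processor here. The sample strings and the delimiter set (semicolon, comma, pipe) are made up for illustration.

```python
import re

# Hypothetical sample data; the delimiter characters are illustrative only.

# Case 1: split wherever any one of a set of characters occurs.
one_of_set = re.split(r"[;,|]", "red;green,blue|black")

# Case 2: split on a variable-length run of those characters.
run_of_set = re.split(r"[;,|]+", "red;;green,,,blue|black")

# Without the "+" quantifier, a run of delimiters leaves empty elements.
with_gaps = re.split(r"[;,|]", "red;;green")

print(one_of_set)
print(run_of_set)
print(with_gaps)
```

The quantifier in the second pattern is what treats a run of delimiter characters as a single split point.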

Regular Expressions

Regular expressions are a standard technique for expressing patterns and manipulating strings, and are very powerful once mastered.

Tutorials and reference material about regular expressions are available on the Internet, and in books, including: Mastering Regular Expressions by Jeffrey E. F. Friedl published by O'Reilly UK; ISBN: 0-596-00289-0.

There are also software packages available to help you master regular expressions, such as RegexBuddy, and online libraries of useful regular expressions, such as RegExLib.

The following table describes the configuration options:

Configuration Description

Inputs

Specify one or more String or String Array attributes.

Options

Specify the following options:

  • Regular expression: the regular expression to be used as a delimiter to split the data. Specified as a regular expression. Default value: None.

Outputs

Describes any data attribute or flag attribute outputs.

Data Attributes

The following data attributes are output:

  • RegExSplit: a new Array attribute holding the result of the RegEx split. Note that the data that matched the regular expression itself acts as a delimiter, and so does not appear in the array.

Flags

The following flags are output:

  • RegExSplitSuccess: indicates whether the RegEx Split was successful or not. Possible values are Y/N.

The following table describes the statistics produced by the processor:

Statistic    Description
Success      The number of records which were split using the regular expression.
Failure      The number of records which were not split using the regular expression.

Output Filters

The following output filters are available:

  • Records with a successful split

  • Records with an unsuccessful split
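
The flag, statistics, and output filters can be pictured with a short Python sketch. It is not the processor's implementation. The regex_split helper is hypothetical, and it assumes that a split counts as successful when the regular expression matches at least once in the value, which the description above does not spell out.

```python
import re

def regex_split(value, pattern):
    """Hypothetical helper: return (array, flag) for one input value.

    Assumption: the split is 'successful' when the pattern matches
    at least once in the value.
    """
    regex = re.compile(pattern)
    success = "Y" if regex.search(value) else "N"
    return regex.split(value), success

# Illustrative records; the second contains no match for the pattern.
records = ["started 14/10/1995 JBM ref557", "no initials here"]
results = [(r, *regex_split(r, r"[A-Z]{2,3}")) for r in records]

# Statistics: counts of successful and unsuccessful splits.
stats = {
    "Success": sum(1 for _, _, flag in results if flag == "Y"),
    "Failure": sum(1 for _, _, flag in results if flag == "N"),
}

# Output filters: partition the records by the flag.
successful = [r for r, _, flag in results if flag == "Y"]
unsuccessful = [r for r, _, flag in results if flag == "N"]

print(stats)
```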

Example

In this example, RegEx Split is used to split the data in a Notes attribute of an Employees table on either side of a person's initials (a sequence of two or three uppercase characters).

  • Regular expression: ([A-Z]{2,3})

  • Results (successful splits):

Notes                             RegExSplit
started 14/10/1995 JBM ref557     {started 14/10/1995 }{ ref557}
started 15/5/95 JBM ref557        {started 15/5/95 }{ ref557}
start date 15/6/1998 HM etn247    {start date 15/6/1998 }{ etn247}
started 2/1/2004 RLJ ref-1842     {started 2/1/2004 }{ ref-1842}
started 8/10/2000 JBM ref557      {started 8/10/2000 }{ ref557}
started 10/6/2001 JBM ref557      {started 10/6/2001 }{ ref557}
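
The example can be approximated in Python as a rough cross-check. This is a sketch with the standard re module, not the processor's own engine. One difference matters: Python's re.split includes the text of capturing groups in its result, so the sketch drops the parentheses from the expression to keep the initials out of the array.

```python
import re

# Same pattern as the example, without the capturing parentheses,
# because Python's re.split would otherwise return the matched initials too.
INITIALS = re.compile(r"[A-Z]{2,3}")

notes = [
    "started 14/10/1995 JBM ref557",
    "start date 15/6/1998 HM etn247",
    "started 2/1/2004 RLJ ref-1842",
]

split_notes = [INITIALS.split(note) for note in notes]

# For comparison: the parenthesised form keeps the delimiter text in Python.
with_group = re.split(r"([A-Z]{2,3})", notes[0])

for note, parts in zip(notes, split_notes):
    # Render each array in the {..}{..} style used in the results table.
    print(note, "->", "".join("{" + part + "}" for part in parts))
```

The spaces on either side of the initials stay in the array elements, as in the results shown, because only the text matched by the expression is removed.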