1.3.11.39 Split Records from Array

The Split Records from Array processor allows you to create many records from a single record, by splitting out new records for each element in an input array.

Use Split Records from Array where data that should be represented in many records has been wrongly captured in a single record; that is, to normalize data that has been wrongly denormalized.

Often, the denormalized data that you need to split out needs to be pre-processed, before using this processor. For example, in the following Orders table, multiple orders (with multiple order numbers and product descriptions) have been wrongly entered into a single record, using free text fields:

Order_ID Order_Number Product_Desc

O574112

2788143 / 2788144

Home PC Package / Color Printer

In this case, the Order_Number and Product_Desc attributes both need simple pre-processing using the Make Array from String processor to create arrays using the / character as a separator. The arrays can then be input into Split Records from Array to split out the records as follows:

Order_ID Order_Number.normalized Product_Desc.normalized

O574112

2788143

Home PC Package

O574112

2788144

Color Printer

Note above that many array attributes may be input to this processor. In this case, the number of output records for each input record will correspond to the number of elements in the array attribute with the largest number of elements. Data from each attribute that is not input is simply copied to all of the output records created from each input record. For example, if you split the following record, inputting Title.array and FirstName.array:

Cust_ID Title.array FirstName.array Surname

13451

{Mr}{Mrs}

{John}{Dorothy}{James}

Smith

The output records will be as follows:

Cust_ID Title.array FirstName.array Surname

13451

Mr

John

Smith

13451

Mrs

Dorothy

Smith

13451

James

Smith

Note that Title.array.normalized is Null for the last record, as there is no array element in Title.array that corresponds with the third element in FirstName.array.

The following table describes the configuration options:

Configuration Description

Inputs

Specify one or more Array attributes that you want to use to split records. Each element of the input array will be output as a single value.

Options

None.

Outputs

Describes any data attribute or flag attribute outputs.

Data Attributes

The following data attributes are output:

  • Split: holds the normalized attribute value for each input array attribute after record splitting.

Flags

The following flags are output:

  • SplitFlag: indicates whether the split from array was successful. Possible values: Y/N.

The following table describes the statistics produced by the profiler:

Statistic Description

Input records

The number of records that were input (that is, before splitting).

Output records

The number of records that were output (that is, after splitting).

Drill-down to see all output records.

Split %

The percentage of input records that were split into multiple output records.

Drill-down to see the output records where a split occurred (that is, records where one of the input array attributes contained more than one element).

Output Filters

None.

Example

In this example, a data set of People's Names is being prepared for matching. The data contains a number of aliases and alternative spellings for people's names. These are pre-processed into a single Aliases.Array attribute, and then split out using Split Records from Array so that each name can be matched separately.

Aliases.Array Aliases.Array.Split

{Jose Angel Veron}{Jose Veron}

Jose Veron

{Jose Angel Veron}{Jose Veron}

Jose Angel Veron

{Namik Zouahi}{Namiq Zouahi}{Namig Zouahi}

Namik Zouahi

{Namik Zouahi}{Namiq Zouahi}{Namig Zouahi}

Namiq Zouahi

{Namik Zouahi}{Namiq Zouahi}{Namig Zouahi}

Namig Zouahi

{Christine Moss}{Christine Lee}{Christine Graham}

Christine Moss

{Christine Moss}{Christine Lee}{Christine Graham}

Christine Lee

{Christine Moss}{Christine Lee}{Christine Graham}

Christine Graham