1.3.6.1 Process Product Data

The Process Product Data processor connects to an instance of Oracle Enterprise Data Quality for Product Data (EDQ-P) version 5.6.2 through version 11g and uses a production Data Service Application (DSA) to process product data using semantic rules; for example, to enhance and add structure to unstructured product data.

Note:

The processor appears only if the EDQ server is configured to connect to an EDQ-P instance using an edqp.properties file. This file must be created in the oedq_local_home/edqp folder with the following settings:

  • server = [name or IP address of the EDQ-P server]

  • port = [the HTTP port of the EDQ-P server; this is 2229 in a default installation]

  • batchsize = [number of records to submit to EDQ-P at a time – defaults to 1000]

    A batchsize greater than 1000 may cause an Out of Memory error.
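The settings above can be illustrated with a short sketch. The following Python is not part of EDQ; it only shows what a file with these three keys might look like and how the stated constraints could be checked before deployment. The host name `edqp.example.com` and the helper names `parse_edqp_properties` and `load_settings` are hypothetical, and treating `server` and `port` as mandatory is an assumption made for this example.

```python
# Illustrative only: a minimal check of edqp.properties-style settings.
# The sample host name and both function names are hypothetical.

def parse_edqp_properties(text):
    """Parse simple 'key = value' lines, skipping blanks and comments."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", "!")):
            continue
        key, sep, value = line.partition("=")
        if not sep:
            continue  # ignore lines without a key/value separator
        props[key.strip()] = value.strip()
    return props


def load_settings(text):
    """Return (settings, warnings) for the three documented keys."""
    props = parse_edqp_properties(text)
    # Assumption for this sketch: server and port must be present.
    for key in ("server", "port"):
        if key not in props:
            raise ValueError("missing required setting: " + key)
    settings = {
        "server": props["server"],
        "port": int(props["port"]),
        # batchsize defaults to 1000 when it is not set.
        "batchsize": int(props.get("batchsize", 1000)),
    }
    warnings = []
    if settings["batchsize"] > 1000:
        warnings.append("batchsize above 1000 may cause an Out of Memory error")
    return settings, warnings


SAMPLE = """\
server = edqp.example.com
port = 2229
batchsize = 500
"""

settings, warnings = load_settings(SAMPLE)
```

The `SAMPLE` text shows the expected shape of the file: one setting per line, with the default port and a batch size below the documented limit.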

The Process Product Data processor allows EDQ-P to be used within an EDQ process to parse and match product data with a DSA.

The following table describes the configuration options:

Note:

This processor always appears with a re-run marker, indicating that it is completely re-executed each time the process is run, regardless of whether its configuration has changed. As a result, any processors downstream of it must also be rerun. This is because changes made outside of the EDQ application could lead to different results on subsequent executions.

Configuration Description

Inputs

The inputs to the processor should correspond to the expected inputs of the selected DSA.

Options

Specify the following options:

  • DSA Name (selection): the name of a deployed (in production) DSA on the configured server.

  • Output Name (selection): the name of an output step in the selected DSA. This is used to drive the output attributes from the processor. The processor will return the record set and attributes as configured in the selected DSA and Output step.

Outputs

The output attributes from the processor are determined by the DSA and Output step selected in the Options tab. The set of attributes corresponds to the configuration of the output step of the DSA in EDQ-P.

Flags

The following flag is output:

edqp.success (Y/N)

  • Y - The record was returned by the EDQ-P DSA.

  • N - The record was not returned by the EDQ-P DSA.

Note:

The processor is suitable for record-by-record processing through EDQ-P; for example, for parsing product descriptions using a DSA. For EDQ-P operations that need to work across a record set, such as matching, Oracle recommends calling an EDQ-P job using an EDQ External Task, and sharing data using either files or a staged data area in a database. As EDQ is by its nature multi-threaded, the processor assumes that the DSA it uses can scale horizontally by calling multiple instances of an EDQ-P job (one per thread).

The Process Product Data processor presents no summary statistics on its processing.

In the Data view, each input attribute is shown with the output attributes to the right.

Output Filters

The following are output filters:

  • Returned – records that were returned from the selected DSA and output step.

  • Not Returned – records that were input to, but not returned from, the selected DSA and output step.
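As a rough illustration of how the edqp.success flag relates to these two filters, the following Python sketch splits a list of records by that flag. It is not EDQ code: the function name `split_by_success`, the dictionary representation of a record, and the flag values assigned to the sample records are assumptions made for the example.

```python
# Illustrative only: partition records the way the Returned and
# Not Returned filters do, based on the edqp.success flag.
# split_by_success and the record layout are hypothetical.

def split_by_success(records):
    """Return (returned, not_returned) lists keyed on edqp.success."""
    returned = [r for r in records if r.get("edqp.success") == "Y"]
    not_returned = [r for r in records if r.get("edqp.success") != "Y"]
    return returned, not_returned


# Sample records with assumed flag values.
records = [
    {
        "id": 5001,
        "description": "RESP ARY 5% 16 PIN 10OHM",
        "edqp.success": "Y",
        "edqp.Description": "Resistor 10 Ohm 5% 16 Pin Array",
    },
    {
        "id": 5002,
        "description": "!gz9m;;) v!#Q 8jmASKqtfA7",
        "edqp.success": "N",
    },
]

returned, not_returned = split_by_success(records)
```

In this sketch, a record that the DSA could not process carries no output attributes and ends up in the second list, which mirrors the Not Returned filter.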

Example

In this example, an EDQ-P DSA is used to parse and enhance unstructured product descriptions relating to electrical resistors. Record 5002 could not be parsed, so it has no output attributes.

id   | description                   | edqp.Id | edqp.Description
5001 | RESP ARY 5% 16 PIN 10OHM      | 5001    | Resistor 10 Ohm 5% 16 Pin Array
5002 | !gz9m;;) v!#Q 8jmASKqtfA7     |         |
5003 | mfax 75 ohm 1/4 w resp 20%    | 5003    | Resistor 75 Ohm 20% 0.25 Watt Array
5004 | array 16 pin 85 ohm 5% resp   | 5004    | Resistor 85 Ohm 5% 16 Pin Array
5005 | array 16 pin 62 Ohm 5% RESP   | 5005    | Resistor 62 Ohm 5% 16 Pin Array
5006 | array 16 pin 62 Ohm 5% RESP   | 5006    | Resistor 62 Ohm 5% 16 Pin Array
5007 | 1% 1/10 W THN CH2.21 OHM R... | 5007    | Resistor 2.21 Ohm 1% 0.1 Watt T...