1.3.8.2 Reader

A Reader is a special type of processor that is used to read data at the beginning of a process. Readers may connect to any of the following sources of data:

  • Staged data (that is, a snapshot of data, whether or not it is present in the repository, or output data that has been written by another process);

  • A Data Interface (which can then be redirected to different sources of data using Mappings);

  • A set of Reference Data;

  • A real time provider of messages (for example, the inbound interface of a Web Service).

A process must contain at least one Reader, and may contain several Readers when it matches data from multiple sources.

Readers are used at the beginning of processes to select the sources of data that you intend to work with, and to choose and reorder the attributes from each source that are relevant to the process you are creating. For example, for a specific process you may want to select only the name and address fields from a data source, and reorder them so that they are displayed in that order throughout the process.
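Conceptually, attribute selection in a Reader behaves like a projection: each record keeps only the chosen attributes, in the chosen order. The following Python sketch models that behavior as an illustration only; `select_attributes` and the dictionary-based `Record` type are hypothetical and are not part of EDQ. The sample values are taken from the Customers example at the end of this section.

```python
from typing import Dict, List, Sequence

# Hypothetical model: a record maps attribute names to values (not an EDQ type).
Record = Dict[str, str]


def select_attributes(records: Sequence[Record], selected: Sequence[str]) -> List[Record]:
    """Return each record reduced to the selected attributes, in the selected order.

    Raises KeyError if a selected attribute is missing from a record.
    """
    projected = []
    for record in records:
        # Building a new dict in `selected` order relies on dicts preserving
        # insertion order (guaranteed since Python 3.7).
        projected.append({name: record[name] for name in selected})
    return projected


if __name__ == "__main__":
    customers = [
        {"CU_NO": "13810", "CU_ACCOUNT": "00-23603-JD", "TITLE": "Ms",
         "NAME": "Lynda BAINBRIDGE", "GENDER": "F", "BUSINESS": "Filling Station"},
        {"CU_NO": "13841", "CU_ACCOUNT": "00-23642-SH", "TITLE": "Mr",
         "NAME": "Colin WILLIAMS", "GENDER": "M", "BUSINESS": "Sanford Electical Co"},
    ]
    # Keep only NAME and CU_NO, with NAME displayed first.
    for row in select_attributes(customers, ["NAME", "CU_NO"]):
        print(row)
    # prints:
    # {'NAME': 'Lynda BAINBRIDGE', 'CU_NO': '13810'}
    # {'NAME': 'Colin WILLIAMS', 'CU_NO': '13841'}
```

Attributes that are not selected simply do not appear in the output records, which mirrors how excluded attributes are not displayed in later processors.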

A Reader is automatically added to a process for you, since a process must always have at least one Reader.

Reader Source

Select the Type of data that you want to read from the following options:

  • Staged data - that is, a snapshot of data, or the named output of another process, in the EDQ repository

    Note:

    The snapshot does not have to exist in the repository. For example, you may intend to run the process in streaming mode, in which case the source data is not copied into the repository.

  • Data Interface - that is, a configured source-independent interface of a set of data attributes

  • Reference Data - that is, a set of reference data that exists in the EDQ repository

  • Real time provider - that is, a direct connection to a real time source of messages

Select the Source of data from the available sources of the selected type.

All the available attributes in the data appear in the left-hand pane. Use the arrow buttons to select the attributes that you want to work with in the process, and to de-select any that you do not need:

Arrow Button               Description
Single right arrow button  Selects the attributes highlighted in the left-hand pane as inputs to the process.
Double right arrow button  Selects all available attributes as inputs.
Single left arrow button   De-selects the inputs highlighted in the right-hand pane.
Double left arrow button   De-selects all inputs.

In the right-hand pane, the attributes that you have chosen to work with may be re-ordered by drag-and-drop.

The order that you specify in the Reader will be used to display results throughout the process.
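The arrow buttons and drag-and-drop reordering described above can be modeled as operations on an ordered list of selected attributes. This Python sketch is a hypothetical illustration, not an EDQ API: the `AttributeSelection` class and its method names are invented here, and it assumes that newly selected attributes are appended in source order and that selecting an attribute twice has no effect.

```python
from typing import List, Sequence


class AttributeSelection:
    """Hypothetical model of the Reader's attribute panes (not an EDQ API).

    Assumptions: newly selected attributes are appended to the end of the
    selection in source order; selecting an attribute twice has no effect.
    """

    def __init__(self, available: Sequence[str]) -> None:
        self.available: List[str] = list(available)  # left-hand pane, source order
        self.selected: List[str] = []                # right-hand pane, display order

    def select(self, highlighted: Sequence[str]) -> None:
        """Single right arrow: add the highlighted attributes."""
        for name in self.available:
            if name in highlighted and name not in self.selected:
                self.selected.append(name)

    def select_all(self) -> None:
        """Double right arrow: add every available attribute."""
        self.select(self.available)

    def deselect(self, highlighted: Sequence[str]) -> None:
        """Single left arrow: remove the highlighted inputs."""
        self.selected = [n for n in self.selected if n not in highlighted]

    def deselect_all(self) -> None:
        """Double left arrow: remove all inputs."""
        self.selected = []

    def move(self, name: str, new_index: int) -> None:
        """Drag-and-drop: move one selected attribute to a new position."""
        self.selected.remove(name)
        self.selected.insert(new_index, name)


if __name__ == "__main__":
    columns = ["CU_NO", "CU_ACCOUNT", "TITLE", "NAME", "GENDER", "BUSINESS"]
    sel = AttributeSelection(columns)
    sel.select(["NAME", "CU_NO", "BUSINESS"])
    print(sel.selected)  # ['CU_NO', 'NAME', 'BUSINESS']
    sel.move("NAME", 0)
    print(sel.selected)  # ['NAME', 'CU_NO', 'BUSINESS']
    sel.deselect(["BUSINESS"])
    print(sel.selected)  # ['NAME', 'CU_NO']
```

Keeping the selection as its own ordered list is what allows the display order used throughout the process to differ from the order of attributes in the source.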

Note:

If you know that you will not work with all the attributes of a given set of data, it is a good idea to exclude the unneeded attributes in the Reader. This makes configuring your processors and browsing your results much more straightforward, as only the attributes you are interested in are displayed.

Options

None

Execution

The Reader is a necessary part of every process, whatever that process is for. However, some processors are not suitable for certain types of execution. For example, it is not possible to match and consolidate data from numerous sources in a real time response process. Even so, selecting a Real time provider as the Reader source (see Reader Source above) places no restrictions on the processors that are available for configuration, because the way a process executes is driven by how its Reader(s) and Writer(s) are configured.

In general, EDQ is designed for three modes of execution:

  • Batch execution, where a set of records in one or more data sources is processed in batch.

  • Real time monitoring execution, where EDQ acts as a data quality probe for a data source, monitoring incoming records for quality as they are created, but where no real time response to each record is expected.

  • Real time response execution, where EDQ processes records and passes them back, along with extra data, through a real time response interface.

Each processor in the library is listed with the execution modes in which it can sensibly be used.

Results Browsing

The results browser for a Reader displays all the records present in the underlying data store once a process has been run.

Output Filters

The Reader does not provide any output filters. All records are read from the specified source and made available to the remainder of the process.

Example

The following example shows the records that are read from the Customers table.

In this case, the Reader was configured to read all the data attributes from the source, without changing their order. No further processing has yet been defined:

CU_NO  CU_ACCOUNT   TITLE  NAME              GENDER  BUSINESS
13810  00-23603-JD  Ms     Lynda BAINBRIDGE  F       Filling Station
13815  00-23615-PB         William BENDALL   M       Edge Kamke & Ellis Ltd
13833  00-23624-PB  Ms     Karen SMITH       F
13840  00-23631-JD  Miss   Patricia VINER            Catchpole Engineering Products
13841  00-23642-SH  Mr     Colin WILLIAMS    M       Sanford Electical Co