1.2.1.13 System-level Reference Data Library

The following sets of System-level Reference Data are provided with EDQ.

Note:

Reference Data lists and maps that are shipped with EDQ have an asterisk (*) at the start of their names, to distinguish them from Reference Data that you create.

Many of these lists and maps are used by processors by default. You can change them by using them in a project, modifying them, and copying them back to the System library. However, it is advisable to create new lists and maps with different names for your own needs, so that your changes are not overwritten when EDQ is upgraded.

Oracle also provides packs of Reference Data for specific types of data and for solving specific problems; for example, lists of known telephone number prefixes, name and address lists, and regular expressions for checking structured data such as URLs. These packs are available as EDQ extension packs.

Each entry below gives the name of a Reference Data set, followed by its purpose.

*Base Tokenization Map

A Reference Data set used to tokenize data in the Parse processor, covering only a limited set of characters. Retained for backward compatibility; the default is *Unicode Base Tokenization Map.

*Character Pattern Map

A Reference Data set used to generate patterns in the Pattern processors, covering only a limited set of characters. Retained for backward compatibility; the default is *Unicode Character Pattern Map.

*Date Formats

A list of standard formats for recognizing dates.

*Delimiters

A list of commonly used delimiters.

*Email Regex

A default regular expression used to check email addresses syntactically.

*No Data Handling

The standard EDQ set of No Data characters.

*Noise Characters

A list of common 'noise' characters.

*Number Bands

An example set of Number Bands for use with the Number Profiler.

*Number Formats

A list of standard formats for recognizing numbers.

*Standardize Accented Characters

A character map used to standardize accented characters to their unaccented equivalents.

*UK Postcode Regex

A default regular expression used to check UK postcodes syntactically.

*Unicode Base Tokenization Map

The default Reference Data set used to tokenize data in the Parse processor, covering the entire Unicode range.

*Unicode Character Pattern Map

The default Reference Data set used to generate patterns in the Pattern processors, covering the entire Unicode range.
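Several of the entries above (*Character Pattern Map, *Unicode Character Pattern Map and *Standardize Accented Characters) are character maps: each input character is looked up and replaced by a mapped value. The following Python sketch illustrates that idea only. The pattern symbols (A for uppercase letters, a for lowercase letters, N for digits) and the small accent map are hypothetical examples, not the actual contents of the shipped Reference Data, and `generate_pattern` and `standardize_accents` are illustrative names rather than EDQ functions. For brevity, the pattern map is expressed as a rule over Unicode character properties instead of an explicit table.

```python
# Hypothetical pattern symbols for illustration; the shipped
# *Unicode Character Pattern Map may use different symbols and categories.
def pattern_symbol(ch: str) -> str:
    """Map a single character to an illustrative pattern symbol."""
    if ch.isdigit():
        return "N"
    if ch.isalpha():
        return "A" if ch.isupper() else "a"
    return ch  # spaces and punctuation are kept unchanged


def generate_pattern(value: str) -> str:
    """Build a pattern by mapping every character of the value."""
    return "".join(pattern_symbol(ch) for ch in value)


# A small, hypothetical accent map; the shipped
# *Standardize Accented Characters map is far more complete.
ACCENT_MAP = {
    "é": "e", "è": "e", "ê": "e", "É": "E",
    "à": "a", "â": "a", "ç": "c", "ñ": "n",
    "ö": "o", "ü": "u", "û": "u",
}
ACCENT_TABLE = str.maketrans(ACCENT_MAP)


def standardize_accents(value: str) -> str:
    """Replace each mapped accented character with its unaccented form."""
    return value.translate(ACCENT_TABLE)
```

With these assumptions, `generate_pattern("SW1A 1AA")` returns `"AANA NAA"`, and `standardize_accents("Café Müller")` returns `"Cafe Muller"`. Characters that are not in the map pass through unchanged, which is why a map covering the entire Unicode range is needed for consistent results on international data.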
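*Email Regex and *UK Postcode Regex check values syntactically: a value passes if it matches the expression, which says nothing about whether the address or postcode actually exists. The Python sketch below shows such a check. Both expressions are simplified, hypothetical examples and are not the expressions shipped with EDQ; `is_valid` is an illustrative helper name.

```python
import re

# Simplified, hypothetical patterns for illustration only. They are NOT
# the expressions shipped as *Email Regex or *UK Postcode Regex.
EMAIL_PATTERN = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
UK_POSTCODE_PATTERN = re.compile(r"[A-Z]{1,2}[0-9][A-Z0-9]? ?[0-9][A-Z]{2}")


def is_valid(value: str, pattern: re.Pattern) -> bool:
    """Syntactic check only: does the whole (trimmed) value match?"""
    return pattern.fullmatch(value.strip()) is not None
```

For example, `is_valid("SW1A 1AA", UK_POSTCODE_PATTERN)` is `True` and `is_valid("not-an-email", EMAIL_PATTERN)` is `False`. Using `fullmatch` rather than `search` ensures that the entire value, not just a fragment of it, must conform to the expression.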