Reference Data

EDQ-PDS provides the following pre-installed reference data sets.

Name Description

Key Generation - Blank Replacements for Abbreviate

Blank reference data that supplies the reference data input required by the Abbreviate processor when it is used in the Key Generation processor.

Key Generation - Product Description Strip Tokens

Tokens that are stripped from a product description attribute before key values are generated; for example, single initials such as 'S', which may be too common to form useful keys but may still be significant in matching.

Note that these are tokens you specifically want to remove from key values but not from matching. See Match Preparation - Product Description Strip Words for the tokens that are stripped from both.

Key Generation - Product Description Token Delimiters

Characters used to delimit tokens in key generation. The default is spaces.

Key Generation - Product Description Token Standardization

Token replacements to perform on product description tokens within key generation. These are applied in addition to the standardization and abbreviation performed in the Standardization process.

Match Preparation - Product Description Character strip/standardize

Individual characters to strip or standardize in the product description in preparation for key generation and matching.

Match Preparation - Product Description Strip Words

Tokens to strip from product descriptions for both key generation and matching, for example very common non-identifying words such as 'and' and 'the'.

Match Preparation - Product Description Standardize

Tokens to standardize in the product description prior to matching.

Match Preparation - Product Name Character strip/standardize

Individual characters to strip from, or standardize in, the product name during match preparation.

Match Preparation - Product Name Standardize

Tokens to standardize in the product name prior to matching.

Match Preparation - Strip Vowels

Vowels to strip from the shortened product description.

Normalize Text- Remove Diacritics

Diacritic characters to remove from the product description and product name.

Normalize Text- Standardize Accented Characters

Standardization for accented characters.

Date - Formats

Valid formats for custom dates.

Profile - Product Data - Colors

Colors for standardization and extraction.

Profile - Product Data - Companies

Common retail companies for standardization and extraction.

Profile - Product Data - English Dictionary

Dictionary of English words used for profiling.

Profile - Product Data - Materials

Common materials for standardization and extraction.

Profile - Product Data - Number Bands

Number bands used for price profiling.

Profile - Product Data - Sizes

Sizes for standardization and extraction.

Profile - Product Data - Units of Measure Regex

Regular expressions for standardization and extraction of quantified units of measure.

Profile - Strip Letters

Vowels to strip when profiling.
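To illustrate how the key generation sets interact with Match Preparation - Product Description Strip Words, here is a minimal Python sketch. It is not how EDQ-PDS implements key generation (EDQ drives this through its processors and the reference data sets above). The sample set contents, the order of operations, and the key format (first three letters of each token) are all hypothetical. It shows why a token such as 'S' in Key Generation - Product Description Strip Tokens disappears from keys but remains available for matching, whereas strip words are removed from both.

```python
import re

# Hypothetical sample contents standing in for the EDQ-PDS reference data sets.
TOKEN_DELIMITERS = " "                        # Key Generation - Product Description Token Delimiters
STRIP_WORDS = {"AND", "THE"}                  # Match Preparation - Product Description Strip Words
KEY_STRIP_TOKENS = {"S"}                      # Key Generation - Product Description Strip Tokens
KEY_TOKEN_STANDARDIZATION = {"BLK": "BLACK"}  # Key Generation - Product Description Token Standardization


def tokenize(description: str, delimiters: str = TOKEN_DELIMITERS) -> list[str]:
    """Upper-case the description and split it on any of the delimiter characters."""
    pattern = "[" + re.escape(delimiters) + "]+"
    return [token for token in re.split(pattern, description.upper()) if token]


def match_tokens(description: str) -> list[str]:
    """Tokens kept for matching: strip words removed, key-only strip tokens kept."""
    return [t for t in tokenize(description) if t not in STRIP_WORDS]


def key_tokens(description: str) -> list[str]:
    """Tokens used for keys: key-only strip tokens also removed, then standardized."""
    return [
        KEY_TOKEN_STANDARDIZATION.get(t, t)
        for t in match_tokens(description)
        if t not in KEY_STRIP_TOKENS
    ]


def generate_key(description: str) -> str:
    """Hypothetical key format: the first three letters of each key token."""
    return "".join(t[:3] for t in key_tokens(description))


if __name__ == "__main__":
    desc = "The S Blk Cotton Shirt and Tie"
    print(match_tokens(desc))  # ['S', 'BLK', 'COTTON', 'SHIRT', 'TIE']
    print(generate_key(desc))  # BLACOTSHITIE
```

Adding characters to the delimiter string changes tokenization: with `" /"` as delimiters, "Cotton/Silk" yields the two tokens `COTTON` and `SILK`.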
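The character-level sets (Normalize Text- Remove Diacritics, Normalize Text- Standardize Accented Characters, the Character strip/standardize sets, and Match Preparation - Strip Vowels) can be pictured as simple lookup tables applied character by character. The Python sketch below assumes that model; the sample contents, the convention that an empty replacement means "strip", and the order of the steps are hypothetical and are not taken from EDQ-PDS.

```python
# Hypothetical sample contents; the shipped reference data sets are larger.
# Normalize Text- Remove Diacritics: combining marks to drop.
DIACRITICS = {"\u0301", "\u0308"}  # combining acute accent, combining diaeresis

# Normalize Text- Standardize Accented Characters: precomposed letter -> plain letter.
ACCENT_STANDARDIZATION = {
    "\u00e9": "e",  # é
    "\u00e8": "e",  # è
    "\u00c9": "E",  # É
    "\u00fc": "u",  # ü
    "\u00f1": "n",  # ñ
}

# Match Preparation - Product Description Character strip/standardize:
# an empty replacement strips the character, anything else standardizes it.
CHARACTER_RULES = {",": "", "/": " ", "-": " "}

# Match Preparation - Strip Vowels.
VOWELS = set("AEIOU")


def normalize_text(text: str) -> str:
    """Drop combining diacritics, then map precomposed accented letters."""
    return "".join(ACCENT_STANDARDIZATION.get(c, c) for c in text if c not in DIACRITICS)


def prepare_description(text: str) -> str:
    """Normalize, apply character rules, upper-case, and collapse whitespace."""
    replaced = "".join(CHARACTER_RULES.get(c, c) for c in normalize_text(text))
    return " ".join(replaced.upper().split())


def strip_vowels(text: str) -> str:
    """Remove vowels, for example from a shortened description."""
    return "".join(c for c in text if c not in VOWELS)


if __name__ == "__main__":
    prepared = prepare_description("Caf\u00e9 Cr\u00e8me/Noir, 250g")
    print(prepared)                # CAFE CREME NOIR 250G
    print(strip_vowels(prepared))  # CF CRM NR 250G
```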
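Profile - Product Data - Units of Measure Regex holds regular expressions for quantified units of measure. The actual expressions are not reproduced here; the Python sketch below uses a single hypothetical pattern (a number, optional whitespace, and one of a few units) purely to show the kind of extraction and standardization such a set supports.

```python
import re

# Hypothetical pattern, not the shipped reference data. The trailing \b ensures
# the unit ends at a word boundary, so "12 liters" does not yield a bogus "12 L".
UNIT_OF_MEASURE = re.compile(r"(\d+(?:\.\d+)?)\s*(kg|g|ml|l|oz)\b", re.IGNORECASE)


def extract_quantities(text: str) -> list[tuple[float, str]]:
    """Return (quantity, standardized unit) pairs found in a product description."""
    return [(float(m.group(1)), m.group(2).upper()) for m in UNIT_OF_MEASURE.finditer(text)]


if __name__ == "__main__":
    print(extract_quantities("Cola 330ml can, 6 x 330 ML"))  # [(330.0, 'ML'), (330.0, 'ML')]
    print(extract_quantities("Sugar 2.5 kg bag"))            # [(2.5, 'KG')]
```

Matching case-insensitively and upper-casing the captured unit means that "330ml" and "330 ML" produce the same standardized pair.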