Structure of EDQ-PDS Key Generation Services

The key generation process takes the productdescriptionabbr attribute as produced by the preceding standardization process. It applies an additional Character Replace processor to this attribute, using the Key Generation - Product Description Character Strip/Standardize reference data. By default, this strips out, or replaces with spaces, any punctuation characters that remain after the standardization process.
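A minimal sketch of the character replace step, assuming the reference data behaves as a simple character-to-replacement map. The entries in CHAR_REPLACEMENTS and the function name character_replace are hypothetical; the real contents of the Key Generation - Product Description Character Strip/Standardize reference data are not listed here.

```python
# HYPOTHETICAL stand-in for the "Key Generation - Product Description Character
# Strip/Standardize" reference data; its real entries are not given in this
# document. A value of " " replaces the character with a space; "" strips it.
CHAR_REPLACEMENTS = {
    ",": " ",
    ";": " ",
    "(": " ",
    ")": " ",
    "'": "",
}


def character_replace(value: str, replacements: dict[str, str]) -> str:
    """Replace each character that appears in the reference map; keep the rest."""
    return "".join(replacements.get(ch, ch) for ch in value)


# Leftover punctuation becomes spaces (or disappears), ready for tokenization.
print(repr(character_replace("STL BOLT,HEX (M8)", CHAR_REPLACEMENTS)))
```

Consecutive spaces produced by this step are harmless as long as the following tokenization discards empty tokens.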

It then splits the result into tokens, using the Key Generation - Product Description Token Delimiters reference data (which by default contains only a space), and standardizes and strips these tokens using the Key Generation - Product Description Token Standardization and Key Generation - Product Description Strip Tokens reference data, respectively.
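The tokenize, standardize and strip sequence can be sketched as follows. The reference data contents are hypothetical placeholders, and the sketch assumes standardization is applied before stripping, following the order in which the two reference data sets are named above.

```python
import re

# HYPOTHETICAL stand-ins for the three reference data sets named above.
TOKEN_DELIMITERS = [" "]                                     # default: a single space
TOKEN_STANDARDIZATION = {"STAINLESS": "SS", "STEEL": "STL"}  # token -> standard form
STRIP_TOKENS = {"AND", "FOR", "WITH"}                        # tokens to discard


def tokenize(value: str, delimiters: list[str]) -> list[str]:
    """Split on any delimiter, dropping empty tokens left by repeated delimiters."""
    pattern = "|".join(re.escape(d) for d in delimiters)
    return [tok for tok in re.split(pattern, value) if tok]


def standardize_and_strip(tokens: list[str], standardization: dict[str, str],
                          strip: set[str]) -> list[str]:
    """Standardize each token first, then drop any token on the strip list."""
    standardized = [standardization.get(tok, tok) for tok in tokens]
    return [tok for tok in standardized if tok not in strip]


tokens = tokenize("STAINLESS STEEL BOLT  FOR M8 ", TOKEN_DELIMITERS)
print(standardize_and_strip(tokens, TOKEN_STANDARDIZATION, STRIP_TOKENS))
# prints ['SS', 'STL', 'BOLT', 'M8']
```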

The tokens are then reassembled into a single string, and the Abbreviate processor (with default settings) is applied to produce a string that forms a single, tight product description cluster. By default this string is trimmed to 10 characters; this length will be labeled for modification. Separately, each product description token that contains no numeric characters is converted to its metaphone (by default metaphone 4, a setting that will also be labeled for modification). Tokens that contain numeric characters are left unmodified, except that they are trimmed to a maximum length of 12 characters (also labeled for modification).
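A sketch of the two outputs of this step, the trimmed cluster string and the per-token encoding. Both placeholder functions are hypothetical: placeholder_abbreviate does not reproduce the behaviour of EDQ's Abbreviate processor, and placeholder_phonetic is not Metaphone. They are passed in as callables so that real implementations could be substituted; only the trimming lengths and the digit/no-digit routing come from the description above.

```python
from typing import Callable

CLUSTER_KEY_LENGTH = 10    # default trim length of the cluster string (modifiable)
NUMERIC_TOKEN_LENGTH = 12  # default max length of tokens containing digits (modifiable)


def placeholder_abbreviate(text: str) -> str:
    """HYPOTHETICAL stand-in for the Abbreviate processor: joins the words and
    drops every vowel except a word's first letter. Expects upper-case input."""
    return "".join(
        word[0] + "".join(ch for ch in word[1:] if ch not in "AEIOU")
        for word in text.split()
    )


def placeholder_phonetic(token: str) -> str:
    """HYPOTHETICAL stand-in for metaphone. This is NOT Metaphone; it keeps the
    first letter plus any later letters other than A, E, I, O, U, H, W, Y."""
    return token[:1] + "".join(ch for ch in token[1:] if ch not in "AEIOUHWY")


def cluster_string(tokens: list[str],
                   abbreviate: Callable[[str], str] = placeholder_abbreviate) -> str:
    """Rebuild the description, abbreviate it and trim to CLUSTER_KEY_LENGTH."""
    return abbreviate(" ".join(tokens))[:CLUSTER_KEY_LENGTH]


def encode_tokens(tokens: list[str],
                  phonetic: Callable[[str], str] = placeholder_phonetic) -> list[str]:
    """Phonetically encode tokens without digits; only trim tokens with digits."""
    encoded = []
    for tok in tokens:
        if any(ch.isdigit() for ch in tok):
            encoded.append(tok[:NUMERIC_TOKEN_LENGTH])
        else:
            encoded.append(phonetic(tok))
    return encoded


tokens = ["SS", "STL", "BOLT", "HEXAGONAL", "M8"]
print(cluster_string(tokens))   # SSSTLBLTHX
print(encode_tokens(tokens))    # ['SS', 'STL', 'BLT', 'HXGNL', 'M8']
```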

Two key methods are available for these processed tokens: one produces one key per token, and the other produces one key per unique pair of tokens. By default, the key-per-token method produces at most 10 keys per record, and at most 6 tokens are used to create token pairs (giving at most 15 pairs, since 6 × 5 / 2 = 15). Both maximum values will be labeled for modification.
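The two key methods can be sketched with itertools.combinations. Several details here are assumptions, not taken from the description: repeated tokens are removed first, the caps keep the first tokens in order, each pair is sorted so that token order within a record does not matter, and the hypothetical "_" separator joins a pair.

```python
from itertools import combinations

MAX_TOKEN_KEYS = 10   # default cap on keys from the key-per-token method (modifiable)
MAX_PAIR_TOKENS = 6   # default cap on tokens used to form pairs (modifiable)


def unique_in_order(tokens: list[str]) -> list[str]:
    """Drop repeated tokens while keeping first-seen order."""
    return list(dict.fromkeys(tokens))


def keys_per_token(tokens: list[str]) -> list[str]:
    """One key per distinct token, keeping at most MAX_TOKEN_KEYS of them."""
    return unique_in_order(tokens)[:MAX_TOKEN_KEYS]


def keys_per_token_pair(tokens: list[str]) -> list[str]:
    """One key per unordered pair from the first MAX_PAIR_TOKENS distinct tokens.
    Each pair is sorted so "A B" and "B A" produce the same key."""
    pool = unique_in_order(tokens)[:MAX_PAIR_TOKENS]
    return ["_".join(sorted(pair)) for pair in combinations(pool, 2)]


tokens = ["SS", "STL", "BLT", "HXGNL", "M8", "ZNC", "PLTD", "NT"]
print(len(keys_per_token(tokens)))       # 8
print(len(keys_per_token_pair(tokens)))  # 15, i.e. 6 * 5 / 2
```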

In addition to the product description key methods, key methods will also be provided on the following attributes:
  • productnamestandardized

  • modelnumber (whitespace trimmed)

  • uid1,2,3standardized (whitespace trimmed)

  • manufacturepartnumber (whitespace trimmed)
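A sketch of keys on these additional attributes. It assumes "whitespace trimmed" means removing leading and trailing whitespace only (if internal whitespace should also be removed, "".join(value.split()) would do that), and that an empty or missing value yields no key. Expanding "uid1,2,3standardized" into three separately named attributes is also an assumption.

```python
from typing import Optional

# Attribute names follow the list above; splitting "uid1,2,3standardized"
# into three separate attributes is an assumption.
TRIMMED_ATTRIBUTES = ["modelnumber", "uid1standardized", "uid2standardized",
                      "uid3standardized", "manufacturepartnumber"]


def attribute_keys(record: dict[str, Optional[str]]) -> dict[str, str]:
    """Build one key per attribute; empty or missing values yield no key."""
    keys = {}
    name = record.get("productnamestandardized")
    if name:                                        # used as-is, not trimmed
        keys["productnamestandardized"] = name
    for attr in TRIMMED_ATTRIBUTES:
        value = (record.get(attr) or "").strip()    # leading/trailing whitespace only
        if value:
            keys[attr] = value
    return keys


record = {"productnamestandardized": "HEX BOLT", "modelnumber": "  HB-208 ",
          "uid1standardized": "   ", "manufacturepartnumber": None}
print(attribute_keys(record))
# {'productnamestandardized': 'HEX BOLT', 'modelnumber': 'HB-208'}
```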

Each key will have a unique prefix, and each key will be mapped to exactly one of the keyvalue array attributes in the internal interface (keyvalue1, keyvalue2, and so on).
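A sketch of prefixing keys and assigning each one to its own keyvalueN attribute. The prefix strings and method names are hypothetical, since the document requires only that prefixes be unique. The sketch does not model any fixed limit on the number of keyvalue attributes the internal interface may have.

```python
# HYPOTHETICAL prefixes: the document requires a unique prefix per key but does
# not say what the prefixes are.
KEY_PREFIXES = {
    "cluster": "PDC:",
    "token": "PDT:",
    "pair": "PDP:",
    "modelnumber": "MN:",
}


def assign_keyvalues(keys_by_method: dict[str, list[str]],
                     prefixes: dict[str, str]) -> dict[str, str]:
    """Prefix every key and give each its own keyvalueN attribute (1-based)."""
    if len(set(prefixes.values())) != len(prefixes):
        raise ValueError("key prefixes must be unique")
    prefixed = [prefixes[method] + key
                for method, keys in keys_by_method.items()
                for key in keys]
    return {f"keyvalue{i}": key for i, key in enumerate(prefixed, start=1)}


print(assign_keyvalues({"token": ["BLT", "M8"], "modelnumber": ["HB-208"]},
                       KEY_PREFIXES))
# {'keyvalue1': 'PDT:BLT', 'keyvalue2': 'PDT:M8', 'keyvalue3': 'MN:HB-208'}
```

Presumably the unique prefix keeps keys of different types apart during matching: here a description token "M8" becomes "PDT:M8" and could not collide with a model number "M8", which would become "MN:M8".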