About integrating Stratify taxonomies

Stratify taxonomies are used to classify unstructured data, that is, information that does not exist in a database, character-delimited file, or other easily classifiable format.

Unstructured source data requires different processing than structured source data. Structured data, such as databases, CSV files, character-delimited files, and fixed-width files, has name/value pairs that Endeca can translate into dimensions and Endeca properties.

Unstructured data, on the other hand, is not composed of name/value pairs that Endeca can translate into dimensions and Endeca properties. For unstructured data, you have to use tools like the Stratify Discovery System™ to evaluate the content of an unstructured document and assign the document a topic based on classification logic that you configure. In an Endeca pipeline, this topic becomes a property that can be used like any other property associated with an Endeca record; for example, it can be manipulated and mapped to dimensions or Endeca properties.
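As a rough illustration of this idea, the following Python sketch shows a document with no name/value pairs receiving a topic that is then stored as an ordinary property on its record. It is not Stratify or Endeca code: the keyword-based classify function, the TOPIC_KEYWORDS table, and the "Text" and "Topic" property names are all hypothetical stand-ins for the classification logic you would configure in Stratify.

```python
# Hypothetical illustration only: none of these names come from Stratify or Endeca.

# Toy "classification logic": each topic is suggested by a set of keywords.
TOPIC_KEYWORDS = {
    "Finance": {"invoice", "revenue", "budget"},
    "Legal": {"contract", "liability", "clause"},
}

def classify(text):
    """Return the topic whose keywords occur most often in text, or None."""
    words = [w.strip(".,;:") for w in text.lower().split()]
    best_topic, best_hits = None, 0
    for topic, keywords in TOPIC_KEYWORDS.items():
        hits = sum(1 for w in words if w in keywords)
        if hits > best_hits:
            best_topic, best_hits = topic, hits
    return best_topic

def tag_record(record):
    """Return a copy of the record with a 'Topic' property if one was found."""
    tagged = dict(record)
    topic = classify(record.get("Text", ""))
    if topic is not None:
        tagged["Topic"] = topic
    return tagged

# An unstructured document represented as a record with only a path and raw text.
record = {"Path": "/docs/q3.txt", "Text": "The budget covers revenue and one contract."}
tagged = tag_record(record)
```

Once the topic is on the record, it is just another name/value pair, which is why downstream components can treat it like any other property.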

In the Stratify Discovery System, you use the Stratify Taxonomy Manager™ to build a taxonomy to organize your source data, and you use the Stratify Classification Server™ to classify unstructured source data against that taxonomy. Endeca Developer Studio lets you include a Stratify taxonomy in your project and transform it into an Endeca dimension. Endeca uses the results of document classification performed in Stratify to tag Endeca records (that is, your unstructured documents) with classification properties. After the records contain classification properties, you can map the properties to dimension values.
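The relationship between a taxonomy, a classification property, and dimension values can be pictured with a small Python sketch. Everything here is an assumption made for illustration: the TAXONOMY table, the "TopicId" property, and the "Topic dimension" key are hypothetical and do not reflect the actual Stratify export format or Endeca property mapping.

```python
# Hypothetical illustration only: not the Stratify taxonomy format or an Endeca API.

# Toy taxonomy: topic id -> (topic name, parent topic id or None for the root).
TAXONOMY = {
    1: ("Business", None),
    2: ("Finance", 1),
    3: ("Legal", 1),
}

def dimension_path(topic_id):
    """Walk from a topic up to the root and return the names from root to topic."""
    path = []
    current = topic_id
    while current is not None:
        name, parent = TAXONOMY[current]
        path.append(name)
        current = parent
    return list(reversed(path))

def map_record(record, property_name="TopicId"):
    """Return a copy of the record with its classification property mapped
    to a dimension value path, when the property names a known topic."""
    mapped = dict(record)
    topic_id = mapped.get(property_name)
    if topic_id in TAXONOMY:
        mapped["Topic dimension"] = dimension_path(topic_id)
    return mapped

mapped = map_record({"Path": "/docs/q3.txt", "TopicId": 2})
```

Records without a classification property pass through unchanged in this sketch, which mirrors the idea that mapping happens only after classification has tagged the record.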

You integrate Stratify into your pipeline by adding a dimension adapter to transform the Stratify taxonomy, an Endeca Crawler to crawl unstructured documents, and a record manipulator with a STRATIFY expression to access the Stratify Classification Server. Before integrating Stratify into your project, you will find it helpful to read the Endeca Forge Guide.