Oracle® Enterprise Data Quality for Product Data Endeca Connector Installation and Configuration Guide
Release 11g R1 (220.127.116.11)
Part Number E29135-02
Poor data undermines website usability by degrading the user experience, which leads to fewer sales. In contrast, good data supports the guided navigation that is important for website usability. The Oracle Enterprise Data Quality for Product Data (EDQP) Endeca Connector enables you to dramatically improve the guided navigation of your website, and it requires good data to operate efficiently. With good data, it is possible to experience the full richness of search using guided navigation, including dimension search with drill-down. Good data also eliminates large unsearchable result sets and redundant navigation options. The EDQP Endeca Connector allows you to create good data to support your Endeca implementation.
The Endeca Connector automates data standardization, reducing manual effort and increasing quality, reliability, and scalability. The business maintains the rules independently of the Information Technology organization, so it can make and deploy new changes very quickly. The business has increased flexibility to meet merchandising requirements, including more complete categories with better quality. As a result, more dimensions are available to improve the search experience. The Endeca Connector optimizes navigation and reduces cost, risk, and time.
In production, Endeca resides on a server and responds to calls from web pages, providing guided navigation services. EDQP is used to extract and standardize the product data that Endeca uses. These two systems are connected by the Endeca Connector that is used by an Endeca pipeline at data-load time. This Endeca Connector adapter allows Endeca to call an EDQP Data Service Application (DSA) to process individual lines of data. This DSA uses EDQP elements to extract and return attributes (called dimensions by Endeca) based on the attributes defined in item definitions in one or more specific data lenses.
There are two components to the Oracle Enterprise Data Quality for Product Data (EDQP) Endeca Connector as follows:
PDQ-Endeca Connector Attribute Discovery: This initialization component updates the Endeca Developer Studio project with Dimensions, Properties, and Precedence Rules directly from the DataLens Server.
PDQ-Endeca Connector Adapter: This run-time component is integrated with Endeca Forge processing and is used when processing the input data.
The PDQ-Endeca Connector program is run to define the Endeca Dimensions and to create the input data mapping for the dimensions in the pipeline configuration file. There are additional configuration options for setting the dimension properties for all the EDQP-generated dimensions. The PDQ-Endeca Connector Attribute Discovery configures the Endeca project directly from an Oracle DataLens Administration Server.
This component is implemented as a DSA Add-In Transformation and is completely integrated with the EDQP system.
The PDQ-Endeca Connector program should only be run when there have been changes to the data lenses to add or modify the item definitions. It should also be run if new data lenses are added to the main Process Map. Simply re-load the Endeca Project in the Developer Studio to see the changes made by the PDQ-Endeca Connector program.
The Endeca Connector Adapter is a Java class that implements the Endeca application programming interface (API) for adding attributes into the data flow during the Endeca baseline update processing. The PDQ-Endeca Connector Adapter transforms data directly from a Production Oracle DataLens Server.
The PDQ-Endeca Connector Adapter is integrated into the Endeca pipeline process during the configuration, and does not need to be modified after installation. The PDQ-Endeca Connector Adapter runs whenever the Endeca baseline update is performed.
The following libraries comprise the PDQ-Endeca Connector Adapter:
||The library for the Oracle DataLens Server API.|
||The library for the Oracle DataLens Server core and utility classes used by the API.|
||The library for third party components.|
||The library for the PDQ-Endeca Connector DSA Transform Add-In and the PDQ-Endeca Connector Adapter.|
The initial part of the integration happens with the Endeca Loader process. The loader process is a DSA that is configured to extract the relevant attributes from the data lenses and insert them into the Endeca pipeline as dimensions. Its configuration is composed of two main pieces: dimension discovery and precedence discovery.
Dimension discovery requires the following parameters: project name, project (file) location, and standardization name. Using the project location, the DSA reads the pipeline.epx file and looks for the PdqAdapter Java Manipulator. From this Java Manipulator, it retrieves the name of the parser DSA from the value of the DSA_MAP pass-through variable. It then looks at the configuration of the parser DSA and identifies all of the data lenses used for data processing in the DSA. The 'dlsapp_parser' DSA is included as an example in the connector package. The 'dlsapp_parser' DSA package includes the 'Writing_Instruments' data lens. Each data lens is then opened and its attributes are scanned. The discovery looks for the standardization defined in the loader parameters, then reads all the attributes used in that standardization (equivalent to what is displayed on the Order Attributes sub-tab in the Knowledge Studio).
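The lookup of the parser DSA name can be sketched as follows. This is a minimal illustration, not the connector's actual code: the element and attribute names (JAVA_MANIPULATOR, PASS_THROUGH, NAME) and the sample XML are assumptions made for the example, and a real pipeline.epx may be laid out differently.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily simplified stand-in for a pipeline.epx file.
# The element and attribute names are assumptions for illustration only.
SAMPLE_EPX = """
<PIPELINE>
  <JAVA_MANIPULATOR NAME="PdqAdapter">
    <PASS_THROUGH NAME="DSA_MAP">dlsapp_parser</PASS_THROUGH>
  </JAVA_MANIPULATOR>
</PIPELINE>
"""


def find_parser_dsa(epx_text, manipulator="PdqAdapter", variable="DSA_MAP"):
    """Return the pass-through value on the named manipulator, or None if absent."""
    root = ET.fromstring(epx_text)
    for manip in root.iter("JAVA_MANIPULATOR"):
        if manip.get("NAME") != manipulator:
            continue
        for pass_through in manip.iter("PASS_THROUGH"):
            if pass_through.get("NAME") == variable:
                return (pass_through.text or "").strip()
    return None


parser_dsa = find_parser_dsa(SAMPLE_EPX)
print(parser_dsa)
```

Returning None when the manipulator or variable is missing lets the caller decide whether an unconfigured project is an error.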
After the dimension discovery has identified the attributes in the data lenses, they must be added to the pipeline. The discovery process directly modifies the Endeca XML files (and the pipeline.epx, which is really an XML file). Both are added to the dimension rule, as well as a mapping rule for properties of the same name. The loader is typically aware of what must be modified and attempts not to override anything that has been manually set. The loader can only scan one external dimensions file, so if you have multiple x-defs, those dimensions can end up being duplicated. Because the loader tries not to override anything that has been manually configured, adding a property or dimension directly into the pipeline prevents the loader from creating conflicting external dimensions (and the parser then uses the existing dimension).
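The "do not override manual settings" behavior can be pictured with a small sketch. It is only an illustration of the rule described above, not the loader's implementation; the function name and the attribute names are hypothetical.

```python
def dimensions_to_add(existing, discovered):
    """Return discovered attribute names that are not already defined in the pipeline.

    `existing` holds dimensions or properties that were configured manually;
    they are never touched. Duplicates within `discovered` are added only once.
    """
    seen = set(existing)
    to_add = []
    for name in discovered:
        if name not in seen:
            to_add.append(name)
            seen.add(name)
    return to_add


# Hypothetical names: "Color" was set up by hand, so it is left alone.
manual = ["Brand", "Color"]
from_lens = ["Color", "Ink_Type", "Tip_Size", "Ink_Type"]
print(dimensions_to_add(manual, from_lens))  # ['Ink_Type', 'Tip_Size']
```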
The precedence discovery has a simpler configuration and process. Like the dimension discovery, it needs to know the project name, the project location, and the standardization to use. It also needs the dimension to which the attributes are tied in the precedence hierarchy. This must be a pipeline dimension to avoid process errors. The precedence discovery looks through the data lenses in the same manner as the dimension discovery to identify the attributes, and then builds a precedence rule assigning each sub-dimension to the parent.
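As a rough sketch of that last step, the following builds one rule per discovered attribute and rejects a parent that is not a pipeline dimension. The record layout ("trigger" and "target" keys), the function name, and the sample names are assumptions for illustration; the connector writes its rules into the Endeca project files rather than returning Python objects.

```python
def build_precedence_rules(parent, attributes, pipeline_dimensions):
    """Build one illustrative precedence rule per attribute, tied to `parent`."""
    if parent not in pipeline_dimensions:
        # Mirrors the requirement that the parent must be a pipeline dimension.
        raise ValueError(f"{parent!r} is not a pipeline dimension")
    return [
        {"trigger": parent, "target": attribute}
        for attribute in attributes
        if attribute != parent  # a dimension cannot be its own sub-dimension
    ]


# Hypothetical names for illustration.
rules = build_precedence_rules(
    parent="Category",
    attributes=["Ink_Type", "Tip_Size"],
    pipeline_dimensions={"Category", "Brand"},
)
for rule in rules:
    print(rule)
```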
The loader process prepares the pipeline for the actual baseline run. The pipeline adapter and the parser DSA then work in conjunction to provide the values for those attributes to the pipeline. These two pieces are incorporated and run as part of the Endeca baseline process. The baseline process is started through your normal mechanism. The pipeline is typically modified to add the PdqAdapter after a cache, and the data is then forked into two routes. A record manipulator strips all the unused properties from the Oracle side of the stream. The PdqAdapter Java manipulator is then called, and the standardized and attributed data is joined back into the main stream.
The PdqAdapter calls out to an external Java process that manages the flow of information to the Oracle DataLens Server and adds the returned data back into the Endeca pipeline. The data in the pipeline passes through the adapter and is sent to the Oracle DataLens Server as configured. On the Oracle DataLens Server, the data is processed by the parser DSA. The data is standardized and cleansed according to the rules defined in the data lenses. At the end of the DSA, there must be a single text output returned; there can be additional outputs, as long as the Do NOT return results to caller option is selected on the Output Information tab of the output step. The PdqAdapter then receives the results from the Oracle DataLens Server and returns them as properties into the pipeline. These properties are turned into dimensions by the PropMapper (the normal Endeca methodology).
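The run-time flow (fork, strip unused properties, call the parser DSA, join the results back) can be sketched in a few lines. This is a conceptual model only: records are plain dictionaries, `fake_parser_dsa` is a hypothetical stand-in for the round trip to the Oracle DataLens Server, and none of the names below are part of the Endeca or EDQP APIs.

```python
def strip_unused(record, keep):
    """Keep only the properties that the Oracle side of the stream needs."""
    return {name: value for name, value in record.items() if name in keep}


def fake_parser_dsa(record):
    """Hypothetical stand-in for the parser DSA; returns extracted attributes."""
    attributes = {}
    if "blue" in record.get("description", "").lower():
        attributes["Color"] = "Blue"
    return attributes


def run_adapter_branch(records, key, keep, call_dsa):
    """Oracle side of the fork: slim each record, call the DSA, index results by key."""
    wanted = set(keep) | {key}
    return {rec[key]: call_dsa(strip_unused(rec, wanted)) for rec in records}


def join_back(records, key, attributed):
    """Merge the returned properties into the untouched main stream."""
    return [{**rec, **attributed.get(rec[key], {})} for rec in records]


# Hypothetical sample data.
cached = [
    {"sku": "1001", "description": "Ballpoint pen, BLUE ink", "price": "1.99"},
    {"sku": "1002", "description": "Mechanical pencil", "price": "3.49"},
]
attributed = run_adapter_branch(cached, "sku", {"description"}, fake_parser_dsa)
for record in join_back(cached, "sku", attributed):
    print(record)
```

Keeping the main stream untouched and joining on a record key means that properties stripped for the Oracle side, such as `price` here, are still present after the join.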