Oracle® Enterprise Data Quality for Product Data Endeca Connector Installation and User's Guide
Release 11g R1 (11.1.1.6)

E29135-03

3 Setting Up and Configuring

This chapter describes how to set up and configure the Forge process to use the Endeca Connector jar files.

Setting Up the Endeca Connector Adapter in the Endeca Development Studio

This section explains the following:

Adding the Endeca Connector Adapter

The Endeca Connector Adapter is the part of the Endeca pipeline that dynamically calls EDQP to retrieve the data lens attributes for each line of input data. These attributes are then mapped to dimensions in Endeca. The Endeca Connector Adapter must be added to your pre-existing pipeline process.

Add the Endeca Connector Adapter to the pipeline:

  1. Right-click on the "Pipeline Diagram" and select the Java Manipulator.

    Surrounding text describes image010.png.
  2. Name the adapter PdqAdapter.

    This name is important because the Endeca Connector attribute discovery relies on it when verifying servers, DSAs, and standardizations.

  3. Add the Class as:

    oracle.pdq.dlfoundry.adapter.PdqAdapter

  4. Add the pathnames to the Endeca Connector (PdqAdapter) libraries as in the following example.

    /Endeca/edqp/lib/opdq-api.jar
    /Endeca/edqp/lib/jdom-1.0.jar
    /Endeca/edqp/lib/opdq-core.jar
    /Endeca/edqp/opdq-connector-endeca.jar
    

    The pathnames must match the installed location you created in "Installing the Endeca Connector". These directories may be different than the example if you installed the Endeca Connector files into a different directory.

    Note:

    In a separate server installation, where the Oracle DataLens Server is installed on a different machine from the Endeca IAP server, the files in the $MW_HOME\edqp_template1\opdq-connector-endeca directory must be moved to the Endeca IAP server machine.

    Note:

    A Windows installation requires a semicolon (;) as the separator between the paths for each jar file; a Linux installation requires a colon (:). For example, the Windows path may be:

    /Endeca/edqp/lib/opdq-api.jar;/Endeca/edqp/lib/jdom-1.0.jar;/Endeca/edqp/lib/opdq-core.jar;/Endeca/edqp/opdq-connector-endeca.jar

    As in the following example:

    Surrounding text describes image011.png.
  5. Click the Pass Throughs tab and add the following:

    • PDQ_SERVER_1 - This is the name or IP address of the Oracle DataLens Production server and the port (server:2229).

    • PDQ_SERVER_2 - This is an optional server for use in high availability and load balancing. For more information, see Appendix C, "Endeca Connector Robustness."

    • PDQ_SERVER_3 - This is an optional server for use in high availability and load balancing. There is no limit to the number of PDQ_SERVER_* entries.

    • DSA_MAP - This is the top-level DSA to call on the Oracle DataLens Server.

    • DSA_OUTPUT_STEP - This is an optional step name if the DSA has multiple outputs.

    • LOCALE - This is the input locale of the data.

    • BATCH_SIZE - The number of records to process in a single chunk.

    • PROPERTY_ID - The name of the ID field in the input data.

    • PROPERTY_ROUTE_INFO - The "hint" used to efficiently route the data.

    • PROPERTY_DESC1 - The name of the first description field in the input data.

    • PROPERTY_DESC2 - The name of the second description field in the input data.

    • PROPERTY_ALT1 - 1st alternate data field (mfgName).

    • PROPERTY_ALT2 - 2nd alternate data field (mfgPartNo).

    • PROPERTY_ALT3 - 3rd alternate data field (user-defined).

    • RETURN_VAL1 - 1st return value from the DSA (after the ID).

    • RETURN_VAL2 - 2nd return value from the DSA.

    • RETURN_VAL3 - 3rd return value from the DSA.

    • REPLACE_UNDERSCORES_ONLY - A value of true does not proper case the attribute names; a value of false proper cases them. In either case, underscores are replaced with spaces.

      Note:

      This parameter is used by both the Endeca Connector Discovery processes and the PdqAdapter.

    • USE_PDQ_PREFIX - A value of true puts a "PDQ" prefix on all discovered attributes; a value of false does not.

      Note:

      This parameter is used by both the Endeca Connector Discovery processes and the PdqAdapter.

    • DEBUG - Set this to true or false to turn debug tracing to a log file on or off. Turn it on only when debugging, because it slows down the Endeca Connector Adapter. The following log file is created in the Endeca project directory:

      Edf.Pipeline.RecordPipeline.JavaManipulator.PdqAdapter.log

      Surrounding text describes image012.png.
  6. Click OK.

  7. Insert the PdqAdapter into the pipeline flow between the load data nodes and the Property Mapper.

  8. Place the PdqAdapter directly below the LoadData step in the Endeca Pipeline flow. This ensures that the Endeca Connector Adapter calls the Oracle DataLens Server only once for each line of data to be processed, as in the following:

    Surrounding text describes image013.png.
  9. Save your project.
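
The separator rule from step 4 can be sketched in a few lines of Python. This is only an illustration and not part of the Endeca Connector: the function name build_library_path and its windows flag are invented for this example, and the jar paths are the sample locations from step 4, which may differ from your installation.

```python
# Sample jar locations from step 4; adjust them to your installed location.
JARS = [
    "/Endeca/edqp/lib/opdq-api.jar",
    "/Endeca/edqp/lib/jdom-1.0.jar",
    "/Endeca/edqp/lib/opdq-core.jar",
    "/Endeca/edqp/opdq-connector-endeca.jar",
]


def build_library_path(jars, windows):
    """Join jar paths with ';' on Windows and ':' on Linux."""
    separator = ";" if windows else ":"
    return separator.join(jars)


if __name__ == "__main__":
    # Hypothetical helper usage: print both variants of the path string.
    print(build_library_path(JARS, windows=True))
    print(build_library_path(JARS, windows=False))
```

The resulting string is what would be entered as the library pathnames for the PdqAdapter on the corresponding platform.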

Shared Parameters with the Discovery DSA

Several of the Endeca Connector Discovery Add-In Functions share some of the pass through parameters from the PdqAdapter. The Add-In Functions are:

  • Endeca.DiscoverDimensions

  • Endeca.DiscoverProperties

  • Endeca.DiscoverPrecedence

This simplifies the configuration and keeps the PdqAdapter in sync with the DSA.

Surrounding text describes shared_param.png.

The following PdqAdapter pass through parameters are those that are shared:

  • PDQ_SERVER_1

  • PDQ_SERVER_2 (optional)

  • PDQ_SERVER_3 (optional)

  • DSA_MAP

  • REPLACE_UNDERSCORES_ONLY

  • USE_PDQ_PREFIX
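
Because these values must agree between the PdqAdapter and the discovery DSA, a quick consistency check can help when both are maintained by hand. The following Python sketch is an assumption-laden illustration: it supposes that both sets of parameters have been copied into plain dictionaries keyed by the pass through names, and the function names, the PDQ_SERVER_n pattern, and the sample values are all hypothetical.

```python
import re

# Pass throughs shared with the Discovery Add-In Functions. PDQ_SERVER_n is
# matched by pattern (an assumption) because the number of entries is open.
SHARED_NAMES = {"DSA_MAP", "REPLACE_UNDERSCORES_ONLY", "USE_PDQ_PREFIX"}
SERVER_PATTERN = re.compile(r"^PDQ_SERVER_\d+$")


def shared_subset(params):
    """Return only the parameters that the adapter and the DSA share."""
    return {
        name: value
        for name, value in params.items()
        if name in SHARED_NAMES or SERVER_PATTERN.match(name)
    }


def find_mismatches(adapter_params, dsa_params):
    """List shared parameter names whose values differ or are missing."""
    adapter_shared = shared_subset(adapter_params)
    dsa_shared = shared_subset(dsa_params)
    names = sorted(set(adapter_shared) | set(dsa_shared))
    return [n for n in names if adapter_shared.get(n) != dsa_shared.get(n)]


if __name__ == "__main__":
    # Hypothetical values for illustration only.
    adapter = {
        "PDQ_SERVER_1": "endeca01:2229",
        "DSA_MAP": "EXAMPLE_DSA",
        "REPLACE_UNDERSCORES_ONLY": "false",
        "USE_PDQ_PREFIX": "true",
        "BATCH_SIZE": "100",  # not shared, so the check ignores it
    }
    dsa = dict(adapter, USE_PDQ_PREFIX="false")
    print(find_mismatches(adapter, dsa))
```

An empty result means the shared values agree; any name in the list points at a setting to reconcile before running discovery.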

Configuring the Endeca Connector Attribute Discovery with Application Studio

The Endeca Connector examines the DSA that is run as part of the Forge processing and determines exactly which data lenses are called, which standardizations are run, and which attributes are extracted. This unique list of attributes is then used to update the internal Endeca Dimensions and Properties by updating the configuration files used in the Endeca Project.

The following sections are progressive; together they create the DSA that the Forge processing requires.

Configuring the Dimension Discovery

  1. Start your Oracle DataLens Server.

  2. Open one of the supported Web browsers for your environment. See the Oracle Enterprise Data Quality for Product Data Certification Matrix at

    http://www.oracle.com/technetwork/middleware/ias/downloads/fusion-certification-100350.html

    Locate Oracle Enterprise Data Quality in the Product Area column and then click the System Requirements and Supported Platforms for Oracle Enterprise Data Quality for Product Data 11gR1 (11.1.1.x) Certification Matrix (xls) link.

  3. Enter the following URL:

    http://hostname:port/datalens

    where hostname is the DNS name or IP address of the Administration Server and port is the listen port on which the Administration Server is listening for requests (port 2229 by default).

    If you configured the Administration Server to use Secure Sockets Layer (SSL), you must add s after http as follows:

    https://hostname:port/datalens

  4. When the login page appears, enter a user name and the password. Typically, this is the user name and password you specified during the installation process.

    The Oracle DataLens Server Web pages are displayed and default to the Welcome tab.

    Surrounding text describes applaunch.png.
  5. Click the Application Studio button.

  6. Create a new project, add an input step and a core step. In this example, the core step is named DiscoverDimensions.

    Surrounding text describes image014.png.
  7. Create the new Transform Map by double-clicking the DiscoverDimensions step and selecting the default, Tab-separated Input.

    Surrounding text describes image015.png.
  8. Add an input column and an output column as follows:

    Surrounding text describes image016.png.
  9. Expand the Add-In Functions, drag Endeca.DiscoverDimensions into the Transform Map to add it as a transformation, and connect the Transform Map steps as follows:

    Surrounding text describes image017.png.
  10. Configure the Endeca.DiscoverDimensions Add-In adapter by double-clicking the Endeca.DiscoverDimensions step and using the Parameters tab.

Any of the values in the Value column may be edited except the StartDimRange and EndDimRange values.
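
The URL rule from step 3 above (http by default, https when SSL is configured, port 2229 unless changed) can be sketched as follows. The function name datalens_url and its arguments are hypothetical and serve only to make the rule concrete.

```python
def datalens_url(hostname, port=2229, use_ssl=False):
    """Build the Oracle DataLens Server URL described in step 3."""
    # SSL only changes the scheme; host, port, and path stay the same.
    scheme = "https" if use_ssl else "http"
    return f"{scheme}://{hostname}:{port}/datalens"


if __name__ == "__main__":
    # "endeca01" is a placeholder host name.
    print(datalens_url("endeca01"))
    print(datalens_url("endeca01", use_ssl=True))
```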

Endeca Project Parameters

Surrounding text describes image018.png.

For a DSA that runs on a different server machine from the Endeca Server, a UNC pathname must be used as in the preceding example, or a mounted file system in a Linux environment.

On the Endeca Server, share the project pipeline directory for access by the Endeca Connector as follows:

Surrounding text describes image019.png.

Be sure to set the permissions so that the Endeca Connector process can access this directory and has access rights to the files as well. In other words, the path to the project from the Endeca output adapters is:

\\endeca51Server\pipeline
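
As a rough aid for checking a project path before entering it, the following Python sketch tests whether a string has the UNC shape of the example above (two leading backslashes, then a server name and a share name). The function name is_unc_path is made up for this illustration; the check is purely textual, does not verify that the share exists or is accessible, and does not cover the Linux mounted file system case.

```python
def is_unc_path(path):
    r"""Return True for paths shaped like \\server\share or \\server\share\dir."""
    if not path.startswith("\\\\"):
        return False
    # Split what follows the leading backslashes and drop empty pieces.
    parts = [p for p in path[2:].split("\\") if p]
    # A UNC path needs at least a server name and a share name.
    return len(parts) >= 2


if __name__ == "__main__":
    print(is_unc_path(r"\\endeca51Server\pipeline"))
    print(is_unc_path(r"C:\Endeca\pipeline"))
```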

DSA Project Parameters

Surrounding text describes image020.png.

Endeca Dimension Creation Parameters

Surrounding text describes image021.png.

Save and close the Transform Map.

Add an output step to the DSA.

Surrounding text describes image022.png.

Save the DSA.

Check in the DSA to the Oracle DataLens Server to make it available for processing.

Surrounding text describes image023.png.

Configuring the Property Discovery

Create a new core step in your DSA called DiscoverProperties.

Add a new Endeca.DiscoverProperties Transform Add-In to the Transform Map by dragging the Add-In into your Transform Map and configuring it, as previously described.

Surrounding text describes image024.png.

Endeca Property Creation Parameters

Now configure the Endeca.DiscoverProperties Transform Add-In using the Parameters tab.

Note:

The Endeca.* Parameters are the same as for Endeca.DiscoverDimensions.

Any of the values in the Value column may be edited.

Surrounding text describes image025.png.

Configuring Precedence Rules for the Entire Project

Create a new core step in your DSA called DiscoverPrecedence.

Add a new Endeca.DiscoverPrecedence Transform Add-In to the Transform Map by dragging the Add-In into your Transform Map and configuring, as previously described.

Surrounding text describes image026.png.

Endeca Precedence Rule Creation Parameters

Now configure the Endeca.DiscoverPrecedence Transform Add-In using the Parameters tab.

Any of the values in the Value column may be edited.

Surrounding text describes image027.png.

Determining the Parent (Source Dimension)

The first step is to determine which Dimension you want to use as the source, or Parent, Dimension. This is the Dimension that must be selected to activate the other PDQ-generated Dimensions in your Endeca-powered web site.

First, run the PDQ-Endeca Connector program to create the Dimensions.

Second, review the output from the Endeca Connector program and select the Dimension that you would like to use for the source, for example "Product Category", as in the following example:

Note:

You could look at the dimensions in the Endeca Development Studio to obtain a source dimension that may not have been created.

PDQ-Endeca Connector Version 11.1.1.6.0, Build 20120804 Copyright (c) 2008, 2012,
 Oracle and/or its affiliates. All rights reserved.
Running on DataLens Administration server endeca01:2229
Extracted 197 distinct attributes from the PDQ-Endeca Connector.
Added 197 new Dimensions.
Accessory Component Quantity with Id of: 100001
…
Product Category with Id of: 100134
…
Wood Species with Id of: 100197
Wood Species with Id of: 100197
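
When the output lists many dimensions, looking up the Id of the chosen source dimension can be scripted. The Python sketch below assumes the output has been captured as text and that each dimension line has the form shown above ("name with Id of: number"); the function name parse_dimension_ids and the regular expression are illustrative and not part of the Endeca Connector.

```python
import re

# Matches lines such as "Product Category with Id of: 100134".
DIMENSION_LINE = re.compile(r"^(?P<name>.+?) with Id of: (?P<id>\d+)$")


def parse_dimension_ids(output):
    """Map each dimension name in the connector output to its numeric Id."""
    ids = {}
    for line in output.splitlines():
        match = DIMENSION_LINE.match(line.strip())
        if match:
            # A repeated line simply overwrites the same entry.
            ids[match.group("name")] = int(match.group("id"))
    return ids


if __name__ == "__main__":
    sample = "\n".join([
        "Added 197 new Dimensions.",
        "Accessory Component Quantity with Id of: 100001",
        "Product Category with Id of: 100134",
        "Wood Species with Id of: 100197",
    ])
    print(parse_dimension_ids(sample)["Product Category"])
```

Lines that do not match the pattern, such as the version banner or the summary counts, are ignored.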

Save the DSA.

Check in the DSA to the Oracle DataLens Server to make it available for processing.

Creating Separate Precedence Rules for Each Data Lens

This works the same way as creating precedence rules for the entire project, as previously described.

The difference is that the PDQ.dataLens value is set to the name of a valid data lens used in the main DSA. You then need a separate step in the DSA for each Precedence mapping. For example, if the DSA uses 12 data lenses, you need 12 DiscoverPrecedence Transform Map steps to process all the rules.

Surrounding text describes image028.png.

In addition, another parameter, PDQ.GenerateRulesForAllAttributes, is used when creating separate precedence rules for each individual data lens. It toggles on the creation of precedence rules for non-PDQ generated Dimensions if they are found in the particular data lens.

Following is an example DSA with precedence rule discovery being done for each individual data lens:

Surrounding text describes image029.png.

Note:

If you toggle on .VersionEndecaFiles, do so only for the first of the multiple precedence rule steps to avoid creating unnecessary backup files.
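
The one-step-per-data-lens layout, with versioning enabled only on the first step, can be planned on paper before building the DSA. The following Python sketch produces such a plan as a list of dictionaries. Everything about the representation is hypothetical: the step names, the dictionary layout, the lens names, and the version_files key (a stand-in for the versioning flag mentioned in the note) are inventions for this example, and the steps themselves are still created in Application Studio.

```python
def plan_precedence_steps(data_lenses, generate_for_all=False):
    """Build one hypothetical step description per data lens."""
    steps = []
    for index, lens in enumerate(data_lenses):
        steps.append({
            "step": f"DiscoverPrecedence_{index + 1}",
            "PDQ.dataLens": lens,
            "PDQ.GenerateRulesForAllAttributes": generate_for_all,
            # Version the project files only in the first step so that
            # later steps do not create unnecessary backup files.
            "version_files": index == 0,
        })
    return steps


if __name__ == "__main__":
    # Placeholder data lens names.
    for step in plan_precedence_steps(["LensA", "LensB", "LensC"]):
        print(step)
```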

Adding a Cleanup Step

This step frees the dimension information from the DSA and from the Endeca Project, including Dimensions from the Endeca External Dimensions file. This data is cached in the Oracle DataLens Server to speed up processing of the multiple steps in the Endeca Connector Discovery jobs. The Endeca.CleanupDimensions step removes the data structures used by the Endeca Connector so that the dimensions and properties used in this discovery run are not reused in subsequent discovery runs, which may have added or deleted attributes in data lenses. This also frees the cached data so that more memory is available for the server to use for processing.

Surrounding text describes image030.png.

Now, the top-level DSA should look like the following:

Surrounding text describes image031.png.

Adding a Project Versioning Step

This step versions all of the Endeca project files that are updated by the Endeca Connector, so that the project can easily be restored to the state it was in before the Endeca Connector discovery DSA was run.

The following example shows a typical DSA Endeca Connector discovery job with the Endeca.VersionProject step added.

Surrounding text describes image032.png.

Following is the Endeca.VersionProject XFM map that is associated with the Version Project DSA Step.

Surrounding text describes image033.png.

There are only three parameters that are needed for the Endeca.VersionProject Transform Map Add-In step.

Surrounding text describes image034.png.

After running this step, you will see the following message in the Oracle DataLens Server Log file showing the file suffix that was appended to a copy of the project files.

PDQ-Endeca Connector  Versioned the Endeca project files with a suffix of 20121005104908
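
The suffix in this message looks like a timestamp in year, month, day, hour, minute, second order, although this guide does not state the format. Under that assumption, the Python sketch below converts a suffix into a date and time, which can help when choosing which versioned copy to restore. The function name parse_version_suffix is hypothetical.

```python
from datetime import datetime


def parse_version_suffix(suffix):
    """Interpret a 14-digit suffix as year, month, day, hour, minute, second.

    The format is an assumption based on the sample log message.
    """
    return datetime.strptime(suffix, "%Y%m%d%H%M%S")


if __name__ == "__main__":
    print(parse_version_suffix("20121005104908"))
```

A suffix that does not fit the assumed format raises a ValueError rather than returning a wrong date.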

Removing Generated Values

Any of the Dimensions, Properties, or Precedence Rules can be deleted with a Deletion DSA as follows:

Surrounding text describes image035.png.

Note:

The Precedence Rules should be deleted before the Dimensions. This is because the Precedence Rules are built on the Dimensions that were created by the Endeca Connector.

The Properties have no dependencies and can be deleted at any step in the DSA.
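
The ordering constraint can be expressed as a simple sort. In the Python sketch below, the labels precedence_rules, dimensions, and properties are hypothetical names for the three kinds of delete step; placing properties last is an arbitrary choice, since they have no dependencies.

```python
# Lower numbers are deleted first. Precedence rules are built on the
# dimensions, so they must go first; properties have no dependencies.
DELETE_PRIORITY = {"precedence_rules": 0, "dimensions": 1, "properties": 2}


def order_delete_steps(kinds):
    """Sort requested deletions so precedence rules precede dimensions."""
    return sorted(kinds, key=DELETE_PRIORITY.__getitem__)


if __name__ == "__main__":
    print(order_delete_steps(["dimensions", "properties", "precedence_rules"]))
```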

The parameters for the delete functions still require the Endeca project information, plus only the flags for tracing and versioning. The start and end ranges cannot be changed.

Surrounding text describes image036.png.

Changing the Dimension Id Range Values Used by the Endeca Connector

The range values are set in the DSA as a hard-coded range from 100,000 to 200,000. Change these values only if they conflict with values already being used by Endeca. These values cannot be changed in the Endeca Connector DSA Add-Ins (the values in the DSA are read-only).

The two values that set the range are as follows:

  • Endeca.StartPdqDimRange

  • Endeca.EndPdqDimRange

Surrounding text describes start_end.png.

Note:

In the previous example, the values have been changed to 1,700,000 and 1,800,000.
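
Whether a change is needed at all depends on whether existing Endeca dimension Ids fall inside the connector's range. The Python sketch below illustrates that check. It assumes the existing Ids are available as a list of integers and treats both ends of the range as inclusive, which this guide does not specify; the function name conflicting_ids is hypothetical.

```python
# Default hard-coded range used by the Endeca Connector.
DEFAULT_START = 100_000
DEFAULT_END = 200_000


def conflicting_ids(existing_ids, start=DEFAULT_START, end=DEFAULT_END):
    """Return existing dimension Ids that fall inside the connector range.

    Both bounds are treated as inclusive (an assumption).
    """
    return sorted(i for i in existing_ids if start <= i <= end)


if __name__ == "__main__":
    # Hypothetical Ids already used in an Endeca project.
    print(conflicting_ids([5, 100_000, 150_000, 200_001]))
    print(conflicting_ids([150_000], start=1_700_000, end=1_800_000))
```

An empty result suggests the range can be left alone; otherwise, pick a new range that yields no conflicts before following the procedure below.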

To change these values, do the following:

  1. Delete all the PDQ-generated precedence rules and PDQ-generated dimensions from the Endeca project using a delete DSA.

  2. Stop the Oracle DataLens Server.

  3. Edit the AddInTransformParameters.xml configuration file.

    This file resides in $EDQP_HOME\config, where $EDQP_HOME is the directory in which you installed the EDQP product.

  4. Change the default values for the Endeca.StartPdqDimRange and the Endeca.EndPdqDimRange in the four places each appears.

  5. Start the Oracle DataLens Server.

  6. Recreate the DSAs used by the Endeca Connector attribute discovery and attribute delete Transform Add-Ins.

  7. Save and check in each new DSA.

  8. Run the Discover attribute DSA on the Endeca project.