Oracle® Enterprise Data Quality for Product Data Endeca Connector Installation and User's Guide
Release 11g R1 (220.127.116.11)
This chapter describes how to set up and configure the Forge process to use the Endeca Connector jar files.
The Endeca Connector Adapter is the part of the Endeca pipeline that dynamically calls EDQP to retrieve the data lens attributes for each line of input data. These attributes are then mapped to dimensions in Endeca. The Endeca Connector Adapter must be added to your pre-existing pipeline process.
Add the Endeca Connector Adapter to the pipeline:
Right-click the "Pipeline Diagram" and select the Java Manipulator.
Name the adapter
This name is important because it provides validation with the Endeca Connector attribute discovery to verify servers, DSAs, and standardizations.
Add the Class as:
Add the pathnames to the Endeca Connector (PdqAdapter) libraries, as in the following example.
/Endeca/edqp/lib/opdq-api.jar
/Endeca/edqp/lib/jdom-1.0.jar
/Endeca/edqp/lib/opdq-core.jar
/Endeca/edqp/opdq-connector-endeca.jar
The pathnames must match the installed location you created in "Installing the Endeca Connector". These directories may differ from the example if you installed the Endeca Connector files into a different directory.
If the Oracle DataLens Server is installed on a different machine from the Endeca IAP server, the files in the $MW_HOME\edqp_template1\opdq-connector-endeca directory must be moved to the Endeca IAP server machine.
A Windows installation requires the use of a semi-colon ";" as the separator between the paths for each jar file; however, a Linux installation requires the use of a colon ":" as the separator. For example, the Windows path may be,
As in the following example:
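The separator rule can be sketched in Python as follows. This is only an illustration of how the path string is assembled; the function name `build_classpath` and the `windows` flag are hypothetical, and the jar paths are the example locations from this chapter, which may differ from your installation.

```python
def build_classpath(jar_paths, windows):
    """Join jar paths with ';' on Windows or ':' on Linux."""
    separator = ";" if windows else ":"
    return separator.join(jar_paths)


# Example jar locations from this chapter; adjust to your installed directory.
JARS = [
    "/Endeca/edqp/lib/opdq-api.jar",
    "/Endeca/edqp/lib/jdom-1.0.jar",
    "/Endeca/edqp/lib/opdq-core.jar",
    "/Endeca/edqp/opdq-connector-endeca.jar",
]

linux_classpath = build_classpath(JARS, windows=False)
print(linux_classpath)
```

Passing `windows=True` yields the same list joined with semicolons instead.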
Click the Pass Throughs tab and add the following:
PDQ_SERVER_1 - This is the name or IP address of the Oracle DataLens Production server and the port (
PDQ_SERVER_2 - This is an optional server for use in high availability and load balancing. For more information, see Appendix C, "Endeca Connector Robustness."
PDQ_SERVER_3 - This is an optional server for use in high availability and load balancing (note that there is no limit to the number of
DSA_MAP - This is the top-level DSA to call on the Oracle DataLens Server.
DSA_OUTPUT_STEP - This is an optional step name if the DSA has multiple outputs.
LOCALE - This is the input locale of the data.
BATCH_SIZE - The number of records to process in a single chunk.
PROPERTY_ID - The name of the ID field in the input data.
PROPERTY_ROUTE_INFO - The "hint" used to efficiently route the data.
PROPERTY_DESC1 - The name of the first description field in the input data.
PROPERTY_DESC2 - The name of the second description field in the input data.
PROPERTY_ALT1 - 1st alternate data field (user-defined).
PROPERTY_ALT2 - 2nd alternate data field (user-defined).
PROPERTY_ALT3 - 3rd alternate data field (user-defined).
RETURN_VAL1 - 1st return value from the DSA (after the ID).
RETURN_VAL2 - 2nd return value from the DSA.
RETURN_VAL3 - 3rd return value from the DSA.
REPLACE_UNDERSCORES_ONLY - A value of true leaves the case of the attribute names unchanged; a value of false converts the attribute names to proper case. In either case, the underscores are replaced with spaces.
This parameter is used by both the Endeca Connector Discovery processes and the
USE__PREFIX - A value of true will put a "PDQ" prefix on all the attributes discovered; false will not.
This parameter is used by both the Endeca Connector Discovery processes and the
DEBUG - This turns debug tracing to a log file on or off (true or false). Turn it on only when debugging, because it slows down the Endeca Connector Adapter. The following log file is created in the Endeca project directory:
Insert the PdqAdapter into the pipeline flow between the load data nodes and the Property Mapper.
Place the PdqAdapter directly below the LoadData step in the Endeca Pipeline flow to ensure that the Endeca Connector Adapter calls the Oracle DataLens Server only once for each line of data to be processed, as in the following:
Save your project.
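To make the role of the numbered PDQ_SERVER entries concrete, the following Python sketch collects them from a pass-through map in numeric order, which is one plausible way to build a list of servers to try in turn. It is not the Endeca Connector's implementation. The host names, the `host:port` value format, and the DSA name are hypothetical values chosen for illustration; only the key names come from the list above.

```python
import re


def ordered_servers(pass_throughs):
    """Return PDQ_SERVER_n values sorted by their numeric suffix."""
    found = []
    for key, value in pass_throughs.items():
        match = re.fullmatch(r"PDQ_SERVER_(\d+)", key)
        if match:
            found.append((int(match.group(1)), value))
    return [value for _, value in sorted(found)]


# Hypothetical pass-through values; only the key names come from this chapter.
pass_throughs = {
    "PDQ_SERVER_2": "edqp-backup:2229",
    "PDQ_SERVER_1": "edqp-prod:2229",
    "DSA_MAP": "ExampleTopLevelDsa",
    "BATCH_SIZE": "100",
    "DEBUG": "false",
}

servers = ordered_servers(pass_throughs)
print(servers)
```

Sorting on the numeric suffix rather than on the key text keeps the order correct if more than nine servers are configured.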
Several of the Endeca Connector Discovery Add-In Functions share some of the pass through parameters from the PdqAdapter. The Add-In Functions are:
This simplifies the configuration and keeps the PdqAdapter in sync with the DSA.
PdqAdapter pass through parameters are those that are shared:
The Endeca Connector looks at the DSA that is being run as part of the Forge processing, and then determines exactly which data lenses are being called and exactly which standardizations are being run and exactly which attributes are being extracted. This unique list of attributes is then used to update the internal Endeca Dimensions and Properties by updating the configuration files used in the Endeca Project.
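The effect of collecting a unique attribute list and then applying the two naming parameters described earlier in this chapter can be approximated with the following Python sketch. It is an illustration only: the function names, the sample attribute names, and the exact form of the prefix (`PDQ` followed by a space) are assumptions, and proper casing is approximated with Python's `str.title()`.

```python
def display_name(attribute, replace_underscores_only, use_prefix):
    """Approximate how an attribute name might be shown as a Dimension."""
    name = attribute.replace("_", " ")  # underscores always become spaces
    if not replace_underscores_only:
        name = name.title()  # assumed stand-in for "proper case"
    if use_prefix:
        name = "PDQ " + name  # assumed prefix format
    return name


def unique_attributes(lens_outputs):
    """Merge attributes from several data lenses, keeping first-seen order."""
    seen = {}
    for attributes in lens_outputs:
        for attribute in attributes:
            seen.setdefault(attribute, None)
    return list(seen)


# Hypothetical attributes extracted by two data lenses.
lenses = [["product_category", "wood_species"], ["wood_species", "item_name"]]
names = [display_name(a, False, True) for a in unique_attributes(lenses)]
print(names)
```

An attribute that appears in more than one data lens, such as `wood_species` here, ends up in the list only once.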
The following sections are progressive: each builds on the previous one to create the DSA that is necessary for the Forge processing.
Start your Oracle DataLens Server.
Open one of the supported Web browsers for your environment. See the Oracle Enterprise Data Quality for Product Data Certification Matrix at
Locate Oracle Enterprise Data Quality in the Product Area column and then click the System Requirements and Supported Platforms for Oracle Enterprise Data Quality for Product Data 11gR1 (11.1.1.x) Certification Matrix (xls) link.
Enter the following URL:
where hostname is the DNS name or IP address of the Administration Server and port is the listen port on which the Administration Server is listening for requests (port 2229 by default).
If you configured the Administration Server to use Secure Socket Layer (SSL) you must add s after http as follows:
When the login page appears, enter a user name and the password. Typically, this is the user name and password you specified during the installation process.
The Oracle DataLens Server Web pages are displayed and default to the Welcome tab.
Click the Application Studio button.
Create a new project, and add an input step and a core step. In this example, the core step is named DiscoverDimensions.
Create the new Transform Map by double-clicking the DiscoverDimensions step and selecting the default, Tab-separated Input.
Add an input column and an output column as follows:
Expand the Add-In Functions, drag Endeca.DiscoverDimensions into the Transform Map to add it as a transformation, and connect the Transform Map steps as follows:
Configure the Endeca.DiscoverDimensions Add-In adapter by double-clicking the Endeca.DiscoverDimensions step and using the Parameters tab.
Any of the values in the Value column may be edited except the
For a DSA that is running on a separate server machine from the Endeca Server, a UNC pathname must be used as in the preceding example, or a mounted file system in a Linux environment.
On the Endeca Server, share the project pipeline directory for access by the Endeca Connector as follows:
Be sure to set the permissions so that the Endeca Connector process can access this directory and has access rights to the files as well. In other words, the path to the project from the Endeca output adapters is:
Create a new core step in your DSA called DiscoverProperties.
Add a new Endeca.DiscoverProperties Transform Add-In to the Transform Map by dragging the Add-In into your Transform Map and configuring it, as previously described.
Create a new core step in your DSA called DiscoverPrecedence.
Add a new Endeca.DiscoverPrecedence Transform Add-In to the Transform Map by dragging the Add-In into your Transform Map and configuring it, as previously described.
Configure the Endeca.DiscoverPrecedence Transform Add-In using the Parameters tab.
Any of the values in the Value column may be edited.
The first step is to determine which Dimension you would like as the source or Parent Dimension. This is the Dimension that must be selected to activate the other PDQ-generated Dimension in your Endeca-powered web site.
First, run the PDQ-Endeca Connector program to create the Dimensions.
Second, review the output from the Endeca Connector program and select the Dimension that you would like to use for the source, for example "Product Category", as in the following output:
You could look at the dimensions in the Endeca Development Studio to obtain a source dimension that may not have been created.
PDQ-Endeca Connector Version 18.104.22.168.0, Build 20120804
Copyright (c) 2008, 2012, Oracle and/or its affiliates. All rights reserved.
Running on DataLens Administration server endeca01:2229
Extracted 197 distinct attributes from the PDQ-Endeca Connector.
Added 197 new Dimensions.
Accessory Component Quantity with Id of: 100001
…
Product Category with Id of: 100134
…
Wood Species with Id of: 100197
Wood Species with Id of: 100197
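When the output is long, the Id of the chosen Parent Dimension can be picked out with a short script. The following Python sketch assumes one entry per line in the `<name> with Id of: <number>` form seen in the output above; the function name `dimension_ids` is hypothetical and the sample text is abbreviated from that output.

```python
import re

LINE_PATTERN = re.compile(r"^(?P<name>.+?) with Id of: (?P<id>\d+)$")


def dimension_ids(output_text):
    """Map each Dimension name in the connector output to its numeric Id."""
    ids = {}
    for line in output_text.splitlines():
        match = LINE_PATTERN.match(line.strip())
        if match:
            ids[match.group("name")] = int(match.group("id"))
    return ids


sample = """Added 197 new Dimensions.
Accessory Component Quantity with Id of: 100001
Product Category with Id of: 100134
Wood Species with Id of: 100197
"""

ids = dimension_ids(sample)
print(ids["Product Category"])
```

Lines that do not match the pattern, such as the summary line, are simply skipped.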
Save the DSA.
Check in the DSA to the Oracle DataLens Server to make it available for processing.
This works the same way as creating precedence rules, as previously described.
The difference is that the PDQ.dataLens value is set to the name of a valid data lens used in the main DSA. You then need to create a separate step in the DSA for each Precedence mapping. Therefore, if 12 data lenses are used in the DSA, you need 12 DiscoverPrecedence Transform Map steps to process all the rules.
In addition, another parameter is used when creating separate precedence rules for each individual data lens: PDQ.GenerateRulesForAllAttributes. This parameter turns on the creation of precedence rules for non-PDQ-generated Dimensions if they are found in the particular data lens.
Following is an example DSA with precedence rule discovery being done for each individual data lens:
If you toggle on .VersionEndecaFiles, do so only for the first of the multiple precedence rule steps to avoid creating unnecessary backup files.
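The one-step-per-data-lens layout can be sketched as follows. This Python snippet only generates an illustrative list of step settings; the data lens names, the step names, and the `version_files` key are hypothetical (the latter stands in for the versioning toggle), while `PDQ.dataLens` and `PDQ.GenerateRulesForAllAttributes` are the parameter names given in this section.

```python
def precedence_steps(data_lenses, all_attributes=False):
    """Build one DiscoverPrecedence step description per data lens."""
    steps = []
    for index, lens in enumerate(data_lenses):
        steps.append({
            "step": f"DiscoverPrecedence{index + 1}",  # hypothetical naming
            "PDQ.dataLens": lens,
            "PDQ.GenerateRulesForAllAttributes": all_attributes,
            # Version the Endeca files on the first step only,
            # so later steps do not create extra backup files.
            "version_files": index == 0,
        })
    return steps


# Hypothetical data lens names.
steps = precedence_steps(["Lumber", "Fasteners", "Tools"])
for step in steps:
    print(step["step"], step["PDQ.dataLens"], step["version_files"])
```

With 12 data lenses the same function would return 12 step descriptions, only the first of which has versioning turned on.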
This step frees the dimension information from the DSA and the dimension information from the Endeca Project, including Dimensions from the Endeca External Dimensions file. This data is cached in the Oracle DataLens Server for speed in processing the multiple steps in the Endeca Connector Discovery jobs. The Endeca.CleanupDimensions step removes the data structures used by the Endeca Connector so that the dimensions and properties used in this discovery run are not used in subsequent discovery runs that may have added or deleted attributes in data lenses. This also frees the cached data so that more memory is available for the server to use for processing.
Now, the top-level DSA should look like the following:
This step versions all of the Endeca project files that are updated by the Endeca Connector, so that the project can easily be restored to its state before the Endeca Connector discovery DSA was run.
The following example shows a typical DSA Endeca Connector discovery job with the Endeca.VersionProject step added.
Following is the Endeca.VersionProject XFM map that is associated with the Version Project DSA Step.
There are only three parameters that are needed for the Endeca.VersionProject Transform Map Add-In step.
After running this step, you will see the following message in the Oracle DataLens Server Log file showing the file suffix that was appended to a copy of the project files.
PDQ-Endeca Connector Versioned the Endeca project files with a suffix of 20121005104908
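The suffix in this message looks like a timestamp. Assuming it uses a year, month, day, hour, minute, second layout (an inference from the sample value, not something this guide states), it can be extracted and decoded with a Python sketch such as the following; the function name `version_timestamp` is hypothetical.

```python
import re
from datetime import datetime


def version_timestamp(log_line):
    """Extract the version suffix from a log line and parse it as a timestamp."""
    match = re.search(r"with a suffix of (\d{14})", log_line)
    if match is None:
        return None
    # Assumed layout: year, month, day, hour, minute, second.
    return datetime.strptime(match.group(1), "%Y%m%d%H%M%S")


line = ("PDQ-Endeca Connector Versioned the Endeca project files "
        "with a suffix of 20121005104908")
stamp = version_timestamp(line)
print(stamp)
```

Decoding the suffix this way can help match a set of versioned project files to the discovery run that produced them.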
Any of the Dimensions, Properties, or Precedence Rules can be deleted with a Deletion DSA as follows:
The Precedence Rules should be deleted before the Dimensions. This is because the Precedence Rules are built on the Dimensions that were created by the Endeca Connector.
The Properties have no dependencies and can be deleted at any step in the DSA.
The delete functions still require the Endeca project information, along with the flags for tracing and versioning. The start and end ranges cannot be changed.
The range values are set in the DSA as a hard-coded range from 100,000 to 200,000. These values should never be changed unless they conflict with values already being used by Endeca. These values cannot be changed in the Endeca Connector DSA Add-Ins (the values in the DSA are read-only).
The two values that set the range are as follows:
In the previous example, the values have been changed to 1,700,000 and 1,800,000.
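Whether the default range conflicts with IDs already used by Endeca can be checked before changing anything. The following Python sketch treats the range bounds as inclusive, which is an assumption, and uses made-up existing IDs; the function name `conflicting_ids` is hypothetical.

```python
DEFAULT_START = 100_000  # default start of the range described above
DEFAULT_END = 200_000    # default end of the range described above


def conflicting_ids(existing_ids, start=DEFAULT_START, end=DEFAULT_END):
    """Return the existing Endeca IDs that fall inside the reserved range."""
    # Bounds are treated as inclusive here, which is an assumption.
    return sorted(i for i in existing_ids if start <= i <= end)


# Hypothetical IDs already present in an Endeca project.
existing = [10, 4_500, 150_000, 250_000]

print(conflicting_ids(existing))                        # default range
print(conflicting_ids(existing, 1_700_000, 1_800_000))  # relocated range
```

An empty result for a candidate range suggests that it is safe to use for the generated dimensions.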
To change these values do the following:
Delete all the PDQ-generated precedence rules and PDQ-generated dimensions from the Endeca project using a delete DSA.
Stop the Oracle DataLens Server.
Edit the AddInTransformParameters.xml configuration file.
This file resides in
$EDQP_HOME is the directory in which you installed the EDQP product.
Change the default values for the Endeca.StartPdqDimRange and the Endeca.EndPdqDimRange in the four places each appears.
Start the Oracle DataLens Server.
Recreate the DSAs used by the Endeca Connector attribute discovery and attribute delete Transform Add-Ins.
Save and check in each new DSA.
Run the Discover attribute DSA on the Endeca project.