Oracle® Enterprise Data Quality for Product Data AutoBuild Reference Guide
Part Number E23608-01
Oracle DataLens Server is built on industry-leading DataLens™ Technology to standardize, match, enrich, and correct product data from different sources and systems. The core DataLens Technology uses patented semantic technology designed from the ground up to tackle the extreme variability typical of product data.
Oracle Enterprise Data Quality for Product Data, formerly known as Oracle Product Data Quality, uses three core DataLens Technology modules: Governance Studio, Knowledge Studio, and Application Studio. The following figure illustrates the process flow of these modules.
The AutoBuild application is included in the Services for Excel add-in product, which adds a custom toolbar to Excel. For more information about this product, see Oracle Enterprise Data Quality for Product Data Services for Excel Reference Guide.
AutoBuild can rapidly leverage your existing product information, using Enterprise DQ for Product (EDQP) Smart Glossaries, to create initial data lenses specific to your enterprise content. For example, a company may already know that its inventory includes pens, pencils, magic markers, and highlighters, and may already have structured example content that describes each of these products. The AutoBuild application can translate this knowledge from an Excel worksheet into a data lens, which can save a significant amount of effort and cost.
AutoBuild constructs the initial data lens by examining the product data examples. Given sufficient information, the AutoBuild application can accomplish the following:
Construct a full Item Definition hierarchy, complete with required, scoring, and optional attributes.
Construct rich term and phrase recognition rules.
Provide an initial set of classification rules.
The AutoBuild application significantly reduces the level of effort required to:
Create an initial data lens.
Align initial data lenses to any business domain.
Leverage the data lenses to automatically meet enterprise data requirements across a variety of internal and external data sources.
AutoBuild offers a familiar, easy-to-use graphical wizard interface to step you through the process from start to finish.
Enterprise DQ for Product Smart Glossaries are data lenses designed to be applied to a broad range of data domains. The Smart Glossaries delivered with the product provide general-purpose recognition of, for example, materials, colors, and units of measure. Each Smart Glossary can be imported into an existing lens or used as the basis for creating a new lens, and Smart Glossaries can be combined to provide the recognition your data domains require. Smart Glossaries can be created and edited using Knowledge Studio. For more information about Smart Glossaries, see Oracle Enterprise Data Quality for Product Data Knowledge Studio Reference Guide.
When AutoBuild creates a new data lens from your structured item content, a combination of Smart Glossaries is used as the basis for the generated lens.
The default Smart Glossary used by AutoBuild is named DLS_Import_Template. This Smart Glossary is installed along with the other pre-packaged Smart Glossaries on the server. This Smart Glossary is a composite of glossaries from units of measure, counts, colors, materials and finishes, and product packaging.
Each time you create a data lens using AutoBuild, you can select a different Smart Glossary to use in the AutoBuild process. You can also create and configure your own combination of Smart Glossaries that best fits your domain data for use by AutoBuild. Although only one Smart Glossary can be applied when creating a data lens with AutoBuild, additional Smart Glossaries can be imported after the lens has been built and opened.
The Smart Glossary used by AutoBuild to generate a data lens is also known as a data lens template.
Note: The default Smart Glossary, DLS_Import_Template, is automatically available. Any other Smart Glossary that you want to use as a template for generating your data lens must first be checked out from your Oracle DataLens Server. For information on Smart Glossaries, see Oracle Enterprise Data Quality for Product Data Knowledge Studio Reference Guide.
The Smart Glossary used by AutoBuild is the foundation of the generated data lens: it provides all of the lens's initial settings. The standard data lens options set in Knowledge Studio are copied from the selected Smart Glossary to the new data lens, as are its transformation types (classification types, standardization types, and match types) and its unit conversion rules. The generated data lens also contains all of the phrases and terminology defined in the Smart Glossary.
AutoBuild merges all of the generated item definitions, attributes, and associated phrases and terms from the selected Smart Glossary with those generated from your meta data. If there is overlap between the Smart Glossary phrases and terms and the phrases and terms generated from the supplied structured item information, AutoBuild keeps the Smart Glossary phrases and terms and ignores the generated phrases and terms unless valuesets are being used. This avoids creating ambiguous terms and phrases in the data lens that can cause errors at run time. Additionally, AutoBuild prunes unnecessary rules so that there are no unintended collisions between rules.
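The overlap rule described above can be pictured with a small model. The following Python sketch is purely illustrative and is not AutoBuild's implementation: it assumes a hypothetical representation in which each lexicon maps a term to the attribute it is recognized as, compares terms case-insensitively (an assumption), and keeps the Smart Glossary entry whenever a worksheet-generated term collides with it. The valueset exception is deliberately not modeled.

```python
from typing import Dict, List, Tuple

# Hypothetical model: each lexicon maps a phrase or term to the attribute it
# is recognized as, e.g. {"Stainless Steel": "Material"}. This is NOT the
# data lens format; it only illustrates the "glossary wins" overlap rule.
Lexicon = Dict[str, str]


def merge_lexicons(glossary: Lexicon, generated: Lexicon) -> Tuple[Lexicon, List[str]]:
    """Merge worksheet-generated terms into the Smart Glossary terms.

    When a generated term overlaps a glossary term (compared
    case-insensitively, an assumption of this sketch), the glossary entry is
    kept and the generated term is reported as ignored. The valueset
    exception is not modeled here.
    """
    merged: Lexicon = {term.lower(): attr for term, attr in glossary.items()}
    ignored: List[str] = []
    for term, attr in generated.items():
        key = term.lower()
        if key in merged:
            ignored.append(term)  # glossary wins, avoiding an ambiguous term
        else:
            merged[key] = attr
    return merged, ignored


glossary = {"Red": "Color", "Steel": "Material"}
generated = {"red": "Finish", "Ballpoint": "Tip Type"}
merged, ignored = merge_lexicons(glossary, generated)
print(merged)   # {'red': 'Color', 'steel': 'Material', 'ballpoint': 'Tip Type'}
print(ignored)  # ['red']
```

Because the colliding generated entry ("red" as a Finish) is dropped rather than added alongside the glossary entry, the merged lexicon never maps the same term to two attributes, which is the kind of ambiguity the rule is meant to prevent.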
AutoBuild can drill down into the structure of the generated phrases and replace any unit-of-measure and count-type phrase productions discovered in the structured item information with the Enterprise DQ for Product standard phrase structures from the Smart Glossary. The benefit of using the Smart Glossary version of these phrase structures is that they have already been standardized. In most cases, the unit conversion rules have already been applied, and the quality of the Smart Glossaries has been verified.
To use the AutoBuild application successfully, the following prerequisites must be met:
To make efficient use of the AutoBuild application, you must work with your meta data in the form of Excel worksheets that contain some or all of the following information about your company's products:
The categories that describe the item data you want to process. The categories can be grouped hierarchically, although this is not required.
The attributes that describe the items in each category. Examples of these attributes include color, weight, size, material, packaging, and so on.
Example attribute values that are valid for your data.
You can expect maximum benefit from the AutoBuild application when your structured item examples contain attribute values expressed in full, unabbreviated form. Enterprise DQ for Product can leverage these full-form examples to automatically recognize a broad range of term abbreviations and phrasing variations.
The main sources of structured item information that can be used with AutoBuild are the following:
Item names and types
Item brand information
The following figure provides an example of each of the above types of structured item information from a sample data file:
Given the structure of the information in the preceding example, AutoBuild can generate a data lens similar to the following:
The source of structured item information could be any of the following:
Existing electronic product catalog information
Item master information
Product information management (PIM) structures
Category management or product marketing worksheets
Existing eCommerce site information
Structured examples used to define your current product categories
AutoBuild can use as much or as little structured information as you have available. If you only have a list of categories, AutoBuild creates a base data lens that contains a set of conforming Item Definitions from the provided categories. If you have attribute names associated with the categories, AutoBuild can add attributes to the data lens Item Definitions. If you have attribute values associated with the attributes, AutoBuild can create phrase and terminology (term) rules to recognize the attribute information and automatically associate the phrases with the correct attributes in the Item Definitions. AutoBuild makes the best use of any structured product information you provide.
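The "as much or as little" behavior can be illustrated by grouping worksheet rows by category. The following Python sketch is a hypothetical model, not AutoBuild's internal logic: the row shape `(category, attribute, value)` and the nested-dictionary result are assumptions made for illustration. A category with no attributes yields an empty definition, an attribute name with no values yields an empty value set, and supplied values are collected as candidates for term and phrase rules.

```python
from typing import Dict, Iterable, Optional, Set, Tuple

# Hypothetical row shape: (category, attribute name, attribute value). The
# attribute name and value may be missing when the worksheet lacks them.
Row = Tuple[str, Optional[str], Optional[str]]
ItemDefinitions = Dict[str, Dict[str, Set[str]]]


def build_item_definitions(rows: Iterable[Row]) -> ItemDefinitions:
    """Group rows into category -> attribute -> set of example values."""
    definitions: ItemDefinitions = {}
    for category, attribute, value in rows:
        # A category on its own still produces an (empty) item definition.
        attributes = definitions.setdefault(category, {})
        if attribute:
            values = attributes.setdefault(attribute, set())
            if value:
                # Example values are the raw material for term/phrase rules.
                values.add(value)
    return definitions


rows = [
    ("Pens", "Ink Color", "Blue"),
    ("Pens", "Ink Color", "Black"),
    ("Pens", "Ink Color", "Blue"),  # repeated rows collapse into one value
    ("Pens", "Tip Size", None),     # attribute name without a value
    ("Highlighters", None, None),   # category only
]
defs = build_item_definitions(rows)
```

Here `defs["Pens"]` contains an "Ink Color" attribute with the values Blue and Black and a "Tip Size" attribute with no values, while `defs["Highlighters"]` is an empty definition. Using sets means that category and attribute information repeated across rows or worksheets is merged rather than duplicated.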
AutoBuild works best when you provide a few example structured items per category. These example items should represent the product categories and attributes you want to capture in your enterprise data.
Product information exports can vary greatly between systems and database schemas. As a result, AutoBuild is designed to support a wide variety of product information export formats, including the following:
Multiple column category names (each category column represents one level in a classification hierarchy).
Multiple column category code/name pairs with the category columns grouped in pairs. The first column in the pair is a category code and the second column in the pair is a category name; each column pair represents one level in a classification hierarchy.
Single column category names (single level classification hierarchy).
Single column category names that are character separated (multiple level classification hierarchy with a character string separating each category name).
Single column UNSPSC Category Codes.
Attribute names listed in the same row with the category information.
Attribute name/value pairs listed in the same row with the category information. A third column can be used to identify a valueset so that only the values present in the meta data worksheet are used.
Attribute name/value/unit of measure (UOM) triplets listed in the same row with the category information.
Attributes in the same file as the category information; attribute names listed as the column headers.
Attributes in a separate file from the category information; attribute names listed as the column headers.
Multiple categories in a single worksheet, with each category grouped as a distinct set of rows separated from the next category group by a blank line. Attributes are in the same file as the category information, with attribute names listed as column headers.
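To make a few of these layouts concrete, the following Python sketch shows how cells from one worksheet row might be read as a category path or as attribute triplets. It is a hypothetical illustration, not how AutoBuild parses worksheets: the function names, the default `>` separator, and the sample codes are all assumptions. It covers character-separated category names in a single column, category code/name column pairs, and attribute name/value/UOM triplets.

```python
from typing import List, Sequence, Tuple


def path_from_separated(cell: str, separator: str = ">") -> List[str]:
    """Single column, character-separated names -> one entry per level.

    The ">" default is an assumption; use whatever string your export uses.
    """
    return [part.strip() for part in cell.split(separator) if part.strip()]


def path_from_code_name_pairs(cells: Sequence[str]) -> List[Tuple[str, str]]:
    """Columns grouped as (code, name) pairs; each pair is one level."""
    if len(cells) % 2:
        raise ValueError("expected an even number of code/name columns")
    levels = []
    for i in range(0, len(cells), 2):
        code, name = cells[i].strip(), cells[i + 1].strip()
        if code or name:  # skip empty levels
            levels.append((code, name))
    return levels


def attribute_triplets(cells: Sequence[str]) -> List[Tuple[str, str, str]]:
    """Columns grouped as (name, value, UOM) triplets on a category row."""
    if len(cells) % 3:
        raise ValueError("expected name/value/UOM columns in groups of three")
    triplets = []
    for i in range(0, len(cells), 3):
        name, value, uom = (c.strip() for c in cells[i:i + 3])
        if name:  # ignore unused triplet columns
            triplets.append((name, value, uom))
    return triplets


print(path_from_separated("Office Supplies > Writing Instruments > Pens"))
# ['Office Supplies', 'Writing Instruments', 'Pens']
print(path_from_code_name_pairs(["10", "Office Supplies", "1020", "Pens", "", ""]))
# [('10', 'Office Supplies'), ('1020', 'Pens')]
print(attribute_triplets(["Length", "5.5", "in", "Weight", "10", "g"]))
# [('Length', '5.5', 'in'), ('Weight', '10', 'g')]
```

In both the separated and the paired layouts, the result is an ordered list with one entry per level of the classification hierarchy, so either layout describes the same hierarchy.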
Category and attribute information can repeat across multiple rows within the same worksheet. In addition, category and attribute information can be listed across multiple worksheets in a workbook.