As mentioned above, the loader component initiates the data loading process, but the actual processing of the data is performed by a processor pipeline. The pipeline is invoked by a pipeline driver component to which the loader passes batches of log entries. The pipeline driver calls a pipeline manager component, which controls the pipeline. The processors in the pipeline perform such tasks as looking up dimensional data in the warehouse; looking up profile, catalog, and order data in repositories on the production site; and writing data about each item in an order to the warehouse.

The pipeline driver components, like the data loader components, are located in the /atg/reporting/datawarehouse/loaders/ Nucleus folder. Each loader component has a pipelineDriver property that points to the pipeline driver it invokes. For example, the OrderSubmitLoader.properties file includes:

pipelineDriver=OrderSubmitPipelineDriver

All of the pipeline drivers point to the same pipeline manager component, /atg/reporting/datawarehouse/process/PipelineManager. This pipeline manager’s definition file, /atg/reporting/datawarehouse/process/pipeline.xml, defines several data loading processor chains. The specific chain invoked by an individual pipeline driver is specified by its pipelineChainId property. For example, the OrderSubmitPipelineDriver.properties file includes:

pipelineManager=../process/PipelineManager
pipelineChainId=submitOrder
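
The relationship among the loader, the pipeline driver, and the pipeline manager can be pictured with a small sketch. The Python below is illustrative only and is not ATG code. The class names mirror the component roles described here, but every method name (register_chain, run_chain, process_batch) and the mark_seen processor are hypothetical.

```python
class PipelineManager:
    """Holds named processor chains, loosely like the chains in pipeline.xml."""

    def __init__(self):
        self._chains = {}

    def register_chain(self, chain_id, processors):
        self._chains[chain_id] = list(processors)

    def run_chain(self, chain_id, data):
        # Run every processor of the named chain against one unit of data.
        for processor in self._chains[chain_id]:
            processor(data)
        return data


class PipelineDriver:
    """Receives batches from a loader and hands each entry to one chain."""

    def __init__(self, manager, chain_id):
        self.manager = manager
        self.chain_id = chain_id  # plays the role of pipelineChainId

    def process_batch(self, log_entries):
        return [self.manager.run_chain(self.chain_id, {"entry": e})
                for e in log_entries]


def mark_seen(data):
    # Hypothetical processor: records that the entry passed through the chain.
    data["seen"] = True


manager = PipelineManager()
manager.register_chain("submitOrder", [mark_seen])
driver = PipelineDriver(manager, "submitOrder")
results = driver.process_batch(["order-1", "order-2"])
```

In this sketch the driver knows only a chain ID and a manager reference, which is roughly the split that the two properties above express.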

The following table summarizes the data loaders, their pipeline drivers, and the processor chains they invoke:

Data Loader            Pipeline Driver             Processor Chain
---------------------  --------------------------  --------------------------------------
OrderSubmitLoader      OrderSubmitPipelineDriver   submitOrder (which also runs lineItem)
ProductCatalogLoader   DimensionPipelineDriver     dimensionUpdate
SegmentLoader          SegmentPipelineDriver       segmentUpdate
SiteVisitLoader        SiteVisitPipelineDriver     siteVisit
UserUpdateLoader       DimensionPipelineDriver     dimensionUpdate
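
The same routing can be written as plain data, which makes it easy to see that two loaders share one driver and one chain. This is an illustrative Python sketch, not ATG code. The component and chain names come from the table, while LOADER_ROUTING and loaders_for_chain are hypothetical names.

```python
# Loader -> (pipeline driver, processor chain), copied from the table.
LOADER_ROUTING = {
    "OrderSubmitLoader": ("OrderSubmitPipelineDriver", "submitOrder"),
    "ProductCatalogLoader": ("DimensionPipelineDriver", "dimensionUpdate"),
    "SegmentLoader": ("SegmentPipelineDriver", "segmentUpdate"),
    "SiteVisitLoader": ("SiteVisitPipelineDriver", "siteVisit"),
    "UserUpdateLoader": ("DimensionPipelineDriver", "dimensionUpdate"),
}


def loaders_for_chain(chain_id):
    """Return the sorted names of loaders whose driver invokes chain_id."""
    return sorted(
        loader
        for loader, (_driver, chain) in LOADER_ROUTING.items()
        if chain == chain_id
    )
```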

DimensionUpdate Pipeline

The dimensionUpdate pipeline chain is triggered by the ProductCatalogLoader and UserUpdateLoader components. The following diagrams show the processors in the chain for insert, update, and delete events.

To ensure that locks are acquired correctly for all processed items, this pipeline makes two passes. The first pass identifies which items require locks. The second pass acquires the locks and then makes all inserts and updates to the repository. When the second pipeline pass is finished, all locks are released.
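
The two-pass idea can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the ATG implementation: LockRegistry, run_two_pass, and the event dictionary layout are hypothetical, the repository is a plain dictionary, and acquiring locks in sorted order is an assumption added here as a common way to avoid deadlocks.

```python
import threading


class LockRegistry:
    """Hands out one lock per item ID (hypothetical helper)."""

    def __init__(self):
        self._locks = {}
        self._guard = threading.Lock()

    def lock_for(self, item_id):
        with self._guard:
            return self._locks.setdefault(item_id, threading.Lock())


def run_two_pass(events, registry, repository):
    # Pass 1: only work out which items need a lock; change nothing yet.
    needed = sorted({event["id"] for event in events})

    # Pass 2: acquire every lock, apply the changes, then release all locks.
    held = []
    try:
        for item_id in needed:
            lock = registry.lock_for(item_id)
            lock.acquire()
            held.append(lock)
        for event in events:
            # setdefault + update covers both insert and update.
            repository.setdefault(event["id"], {}).update(event["props"])
    finally:
        for lock in reversed(held):
            lock.release()
    return needed


registry = LockRegistry()
repository = {"p1": {"name": "old"}}
needed = run_two_pass(
    [
        {"id": "p2", "props": {"name": "new"}},
        {"id": "p1", "props": {"name": "x"}},
        {"id": "p1", "props": {"price": 5}},
    ],
    registry,
    repository,
)
```

Separating the passes means that no write happens until every required lock is held, and the finally block releases the locks even if a write fails.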

The pipeline uses a map to pass data through the pipeline to each processor. Each processor can get items it needs from the map, add items it is responsible for creating to the map, and update items in the map if needed.
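
As a rough illustration of this map-passing style, the Python sketch below runs two processors over one shared dictionary. It is not ATG code. The processor names (load_item, normalize_name) and the keys in the map are hypothetical.

```python
def load_item(ctx):
    # Gets what it needs from the map and adds the item it creates.
    ctx["item"] = {"id": ctx["itemId"], "name": " Blue Shirt "}


def normalize_name(ctx):
    # Updates an entry that an earlier processor put in the map.
    ctx["item"]["name"] = ctx["item"]["name"].strip()


def run(processors, ctx):
    """Pass the same map to each processor in turn."""
    for processor in processors:
        processor(ctx)
    return ctx


ctx = run([load_item, normalize_name], {"itemId": "prod100"})
```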

The processors in this pipeline are:

Mapping Production Properties to Data Warehouse Properties

Individual product catalogs and user profile repositories can vary widely from one customer site to another. However, the Data Warehouse is a fixed structure; therefore, a map is required. The mapping is done using an XML file with the structure shown below, which allows you to map between any ATG repository and the Data Warehouse for data loading purposes.

For each item and property in your product catalog that you want to report on, you must define a mapping to the Data Warehouse.

<data-warehouse-dimension-loader-definition>
  <data-warehouse-repository repository="path_to_warehouse_repository">
    <data-warehouse-repository-item item="item_name">
      <production-repository repository="path_to_production_repository">
          <production-repository-item item="item_name" natural-key="key"/>
      </production-repository>
      <property-mappings>
        <warehouse-property name="name" default-value="value">
         <production-property name="name"/>
        </warehouse-property>
      </property-mappings>
    </data-warehouse-repository-item>
  </data-warehouse-repository>
</data-warehouse-dimension-loader-definition>

The data-warehouse-repository names the destination repository and contains the destination repository items. The production-repository names the source repository and items.

<data-warehouse-dimension-loader-definition>
  <data-warehouse-repository
      repository="/atg/reporting/datawarehouse/WarehouseRepository">
    <data-warehouse-repository-item item="category"
        natural-key="categoryId">
      <production-repository
          repository="/atg/commerce/catalog/ProductCatalog"
          nickname="catalog">
        <production-repository-item item="category"/>
      </production-repository>

The property-mappings element contains the individual warehouse-properties to be mapped for a specific warehouse item.

In cases where there is a simple one-to-one mapping between the repository item property and the Data Warehouse item property, the production-property element identifies the repository item property which is mapped to the corresponding warehouse-property element. This example uses the user’s first name from the profile repository:

<property-mappings>
 <warehouse-property name="firstName">
  <production-property name="ProfileRepository.user.firstName"/>
 </warehouse-property>
</property-mappings>
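
To make the one-to-one case concrete, the following Python sketch reads a property-mappings fragment and applies it to a production-side record. It is an illustration only, not the loader's actual logic. The lastName mapping is a hypothetical addition, the function name map_properties is invented, and two behaviors are assumptions: that the last dot-separated segment of the production-property name is the property to read, and that default-value is used when the production value is missing.

```python
import xml.etree.ElementTree as ET

MAPPING_XML = """
<property-mappings>
  <warehouse-property name="firstName">
    <production-property name="ProfileRepository.user.firstName"/>
  </warehouse-property>
  <warehouse-property name="lastName" default-value="Unknown">
    <production-property name="ProfileRepository.user.lastName"/>
  </warehouse-property>
</property-mappings>
"""


def map_properties(mapping_xml, production_item):
    """Build a warehouse-side dict from a production-side dict."""
    warehouse_item = {}
    for wp in ET.fromstring(mapping_xml).findall("warehouse-property"):
        source = wp.find("production-property").get("name")
        prop = source.rsplit(".", 1)[-1]  # assumption: last segment is the property
        value = production_item.get(prop)
        if value is None:
            value = wp.get("default-value")  # assumption: fallback when missing
        warehouse_item[wp.get("name")] = value
    return warehouse_item


row = map_properties(MAPPING_XML, {"firstName": "Ada"})
```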

In some cases, there is not a one-to-one mapping, so a converter component is used. Converters perform tasks such as:

For example, the AddressToRegionItemConverter component combines the user’s state and country into the region used by the Data Warehouse.

<warehouse-property name="homeRegion"
    conversion-component="/atg/reporting/datawarehouse/process/converter/AddressToRegionItemConverter">
  <production-property name="ProfileRepository.user.homeAddress.state"
      conversion-context-name="state"/>
  <production-property name="ProfileRepository.user.homeAddress.country"
      conversion-context-name="country"/>
</warehouse-property>
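
The role of the conversion context names can be sketched as follows. This Python is illustrative only and does not reproduce AddressToRegionItemConverter: the function names, the flat dictionary used for the production item, and the "country-state" format of the resulting region are all assumptions made for the example.

```python
def address_to_region(context):
    """Combine the 'state' and 'country' context values into one region key."""
    parts = [context.get("country"), context.get("state")]
    return "-".join(part for part in parts if part)


def convert(mappings, production_item, converter):
    # mappings: (production property, conversion-context-name) pairs.
    context = {name: production_item.get(prop) for prop, name in mappings}
    return converter(context)


home_region = convert(
    [("homeAddress.state", "state"), ("homeAddress.country", "country")],
    {"homeAddress.state": "MA", "homeAddress.country": "US"},
    address_to_region,
)
```

The point of the context names is that the converter receives its inputs under stable keys ("state", "country") regardless of where those values live in the production repository.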

SiteVisit Pipeline

The siteVisit chain is triggered by the SiteVisitLoader component. This pipeline has no branches. Each processor, if successful, starts the one that follows.

Each processor uses a passed-in parameter retrieved from the log file to look up items in the Data Warehouse. For example, the lookupVisitor processor uses the profileId from the log file to look up the visitor in the ARF_USER table, using an RQL query, and returns the visitor's ID. If the visitor cannot be found, the processor first attempts to load the visitor into the ARF_USER table and then returns the ARF_USER.ID. If this fails, the processor returns the “Unspecified” visitor. The other lookup processors follow a similar pattern, although the algorithm varies.
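
The lookup-then-load-then-fallback pattern can be sketched in Python. This is an illustration under stated assumptions, not ATG code: a dictionary stands in for the ARF_USER table and the RQL query, load_from_production is a hypothetical callback, and the ID value used for the "Unspecified" visitor is invented.

```python
UNSPECIFIED_ID = 0  # hypothetical ID of the "Unspecified" visitor row


def lookup_visitor(profile_id, arf_user, load_from_production):
    """Return the warehouse ID for a visitor, following the fallback order."""
    # 1. Try the warehouse first (stands in for the RQL query on ARF_USER).
    if profile_id in arf_user:
        return arf_user[profile_id]
    # 2. Not found: try to load the visitor into the warehouse, then return it.
    try:
        arf_user[profile_id] = load_from_production(profile_id)
        return arf_user[profile_id]
    except LookupError:
        # 3. Loading failed: fall back to the "Unspecified" visitor.
        return UNSPECIFIED_ID


def load_from_production(profile_id):
    # Hypothetical loader: only profile "u2" exists on the production side.
    if profile_id == "u2":
        return 42
    raise KeyError(profile_id)


arf_user = {"u1": 7}
found = lookup_visitor("u1", arf_user, load_from_production)
loaded = lookup_visitor("u2", arf_user, load_from_production)
fallback = lookup_visitor("u3", arf_user, load_from_production)
```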

The processors are:

SubmitOrder Pipeline

The submitOrder chain is triggered by the OrderSubmitLoader component. This pipeline has no branches. When it starts, the only information available to the pipeline is the order ID. Each processor in the pipeline, if successful, adds information to the data being processed and starts the next processor.
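
A branch-free chain of this kind can be sketched as a loop that stops at the first processor that does not succeed. The Python below is illustrative only. The processor names (load_order, lookup_customer, log_order) are hypothetical and are not the processors of the submitOrder chain, and stopping on failure is an interpretation of "if successful, starts the next processor".

```python
def run_linear_chain(processors, order_id):
    """Run processors in order; stop at the first one that reports failure."""
    data = {"orderId": order_id}  # the only information available at the start
    completed = []
    for processor in processors:
        if not processor(data):
            break
        completed.append(processor.__name__)
    return data, completed


def load_order(data):
    data["order"] = {"id": data["orderId"], "total": 30}
    return True


def lookup_customer(data):
    return False  # simulate a step that does not succeed


def log_order(data):
    data["logged"] = True
    return True


data, completed = run_linear_chain(
    [load_order, lookup_customer, log_order], "o100"
)
```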

The processors are:

Note: The allocateTax, allocateShipping, and allocateOrderDiscount processors can be replaced with processors that use a uniform rather than a weighted allocation strategy. ATG provides a sample component for this purpose (the Nucleus location is /atg/reporting/datawarehouse/process/allocators/UniformLineItemAllocator), or you can write your own processor that implements the atg.reporting.datawarehouse.commerce.LineItemAllocator interface. See the Processor Chains and the Pipeline Manager chapter in this guide for information on editing pipeline chains.
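
The difference between the two strategies can be shown with a short Python sketch. It is not the ATG allocator code and does not implement the LineItemAllocator interface. The function names are hypothetical, weighting by line item amount is an assumption about what "weighted" means here, and putting the rounding remainder on the last line item is a choice made only for this example. The weighted version assumes that the line amounts do not sum to zero.

```python
from decimal import Decimal, ROUND_HALF_UP

CENT = Decimal("0.01")


def _fix_rounding(shares, total):
    # Put any rounding remainder on the last line so the shares sum to total.
    shares[-1] += total - sum(shares)
    return shares


def allocate_weighted(total, line_amounts):
    """Split total in proportion to each line item's amount."""
    base = sum(line_amounts)
    shares = [
        (total * amount / base).quantize(CENT, ROUND_HALF_UP)
        for amount in line_amounts
    ]
    return _fix_rounding(shares, total)


def allocate_uniform(total, line_count):
    """Split total evenly across line items."""
    share = (total / line_count).quantize(CENT, ROUND_HALF_UP)
    return _fix_rounding([share] * line_count, total)


weighted = allocate_weighted(Decimal("10.00"), [Decimal("30"), Decimal("10")])
uniform = allocate_uniform(Decimal("10.00"), 3)
```

With a 10.00 order-level amount, the weighted split gives the larger line item the larger share, while the uniform split ignores line item amounts entirely.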

The runLineItem pipeline includes the following processors:

  • lookupProduct—Uses the product ID in the order to look up the product in the ARF_PRODUCT table via the NPRODUCT_ID column.

  • lookupSku—Uses the SKU ID associated with the order to look up the SKU in the ARF_SKU table, using the NSKU_ID column.

  • lookupCategory—Uses the PARENT_CAT_ID of the product to find the category.

  • listOrderStimuli—Retrieves a list of stimuli from markers in the order.

  • lookupStimulusGroup—Uses all stimuli in the pipeline to look up the stimulus group in the ARF_STIMGRP table. Computes a hash code for all stimuli and uses the HASH_VALUE column to look up the group.

  • lookupShippingRegion—Uses the order’s shipping address to find the region to which the line item is shipped.

  • lookupQuestion—If the DCS.DW.Search module is running, runs an additional pipeline chain that determines what search fact, if any, was associated with this line item, and links the two in the ARF_LINE_ITEM_QUERY table. If DCS.DW.Search is not running, the question is “unspecified.”

  • logLineItem—Writes the line item to the Data Warehouse ARF_LINE_ITEM table.

  • tailLineItemProcessor—If the DCS.DW.Search module is running, starts the LineItemQuery pipeline and its logLineItemQuery processor, which logs Commerce Search data for each line item in the ARF_LINE_ITEM_QUERY table. If DCS.DW.Search is not running, does nothing.
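
The hash-based lookup described for lookupStimulusGroup can be sketched as follows. This Python is an illustration only. The actual hash algorithm is not described here, so SHA-256 over the sorted, de-duplicated stimulus IDs is an assumption, as are the function names and the dictionary that stands in for a query on the HASH_VALUE column.

```python
import hashlib


def stimulus_hash(stimulus_ids):
    """Hash a set of stimuli so the same set always yields the same value."""
    canonical = ",".join(sorted(set(stimulus_ids)))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def lookup_stimulus_group(stimulus_ids, groups_by_hash):
    # groups_by_hash stands in for a query on the HASH_VALUE column.
    return groups_by_hash.get(stimulus_hash(stimulus_ids))


groups = {stimulus_hash(["promo1", "email7"]): "group-1"}
match = lookup_stimulus_group(["email7", "promo1"], groups)
miss = lookup_stimulus_group(["other"], groups)
```

Hashing the whole set lets a single indexed column identify a group without comparing its members one by one.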

SegmentUpdate Pipeline

The segmentUpdate chain is triggered by the SegmentLoader component. This pipeline chain consists of a single processor, /atg/reporting/datawarehouse/process/SegmentLoadProcessor.

This processor handles insert, update, and delete operations from the segment log files. Unlike the SiteVisit and SubmitOrder pipelines, it does not perform any lookups on the production system.

Each time the SegmentLoadProcessor component is invoked, it performs the following steps:

 