Pipeline Drivers and Processors

As mentioned above, the loader component initiates the data loading process, but the actual processing of the data is performed by a processor pipeline. The pipeline is invoked by a pipeline driver component to which the loader passes batches of log entries. The pipeline driver calls a pipeline manager component, which controls the pipeline. The processors in the pipeline perform such tasks as looking up dimensional data in the warehouse; looking up profile, catalog, and order data in repositories on the production site; and writing data about each item in an order to the warehouse.

The pipeline driver components, like the data loader components, are located in the /atg/reporting/datawarehouse/loaders/ Nucleus folder. Each loader component has a pipelineDriver property that points to the pipeline driver it invokes. For example, the OrderSubmitLoader.properties file includes:

pipelineDriver=OrderSubmitPipelineDriver

All of the pipeline drivers point to the same pipeline manager component, /atg/reporting/datawarehouse/process/PipelineManager. This pipeline manager’s definition file, /atg/reporting/datawarehouse/process/pipeline.xml, defines several data loading processor chains. The specific chain invoked by an individual pipeline driver is specified by its pipelineChainId property. For example, the OrderSubmitPipelineDriver.properties file includes:

pipelineManager=../process/pipelineManager
pipelineChainId=submitOrder

The following table summarizes the data loaders, their pipeline drivers, and the processor chains they invoke:

Data Loader	Pipeline Driver	Processor Chain
`OrderSubmitLoader`	`OrderSubmitPipelineDriver`	`submitOrder` (which also runs `lineItem`)
`ProductCatalogLoader`	`DimensionPipelineDriver`	`dimensionUpdate`
`SegmentLoader`	`SegmentPipelineDriver`	`segmentUpdate`
`SiteVisitLoader`	`SiteVisitPipelineDriver`	`siteVisit`
`UserUpdateLoader`	`DimensionPipelineDriver`	`dimensionUpdate`

DimensionUpdate Pipeline

The dimensionUpdate pipeline chain is triggered by the ProductCatalogLoader and UserUpdateLoader components. The following diagrams show the processors in the chain for insert, update, and delete events.

Pipeline Processor Chain for an Insert Event

Pipeline Processor Chain for Delete Events

Pipeline Processor Chain for Update Events

To ensure that locks are acquired correctly for all processed items, this pipeline makes two passes. The first pass identifies which items require locks. The second pass acquires the locks and then makes all inserts and updates to the repository. When the second pipeline pass is finished, all locks are released.

The pipeline uses a map to pass data through the pipeline to each processor. Each processor can get items it needs from the map, add items it is responsible for creating to the map, and update items in the map if needed.

The processors in this pipeline are:

AcquireLocksProcessor—Runs only on the second pipeline pass, when it makes a call on the lock manager to get the locks identified as necessary during the first pass of the pipeline.
WarehouseItemLookupProcessor—In both pipeline passes, looks up the most recent item in the Data Warehouse, using the itemId from the map.
ChangeTypeProcessor—In both pipeline passes, determines the type of record change indicated by the log record. If the change type is insert or delete, the next processor that executes is the WarehouseItemExistsProcessor. If the change type is update, the next processor is ProductionItemLookupProcessor.
WarehouseItemExistsProcessor—In both pipeline passes, determines whether the item being processed already exists in the warehouse. For delete actions, if the item exists, the next processor that executes is the UpdateDeletedItemProcessor; if not, it is InsertDummyItemProcessor. For insert actions, if the item does not exist, the next processor is the ProductionItemLookupProcessor.
InsertDummyItemProcessor—In the first pass, requests a lock for the item to be inserted. In the second pass, creates a dummy item and adds it to the warehouse repository.
UpdateDeletedItemProcessor—In the first pass, requests a lock for the item to be updated. In the second pass, for delete actions, this processor sets the endDate and lastUpdateDate to the event date, and deleted to true. This is the end of the processor pipeline for delete events.
ProductionItemLookupProcessor—In the first processor pass for updates, looks up the item in the production repository. If the item exists, the next processor called is the ItemConversionProcessor. If it does not, the chain ends. In the second pass, this processor only does a lookup if the first lookup was unsuccessful.
ItemConversionProcessor—Converts production items into warehouse items, using the mapping values described in Mapping Production Properties to Data Warehouse Properties. If the conversion finds a reference to a non-existent warehouse item, this processor creates that item by calling the pipeline chain recursively.
InsertNewItemProcessor—In the first pass requests a lock for the item to be updated. In the second pass, creates the new warehouse item.
ConvertUpdateToInsertProcessor—Takes an update and makes changes necessary to allow an insert into the Data Warehouse. Sets the eventDate to null, since it is not known when the item was actually inserted into the product catalog. The change type is left as update. This processor performs the conversion only on the second pass, since the item may be added by another loader between the first and second passes.
CompareEventAndLastUpdatedDatesProcessor—In both pipeline passes, this processor looks at the time of an update and the last update time of the Data Warehouse item and compares them, to see if any changes that caused the event have already been put into the Data Warehouse record.
DiffProcessor—Compares the values in the production item to the values in the Data Warehouse. This processor only does its work in the second pass.
UpdateItemProcessor—In the first pass this processor requests a lock for the item to be updated. On the second, it updates the Data Warehouse with changes.

Mapping Production Properties to Data Warehouse Properties

Individual product catalogs and user profile repositories can vary widely from one customer site to another. However, the Data Warehouse is a fixed structure; therefore, a map is required. The mapping is done using an XML file with the structure shown below, which allows you to map between any ATG repository and the Data Warehouse for data loading purposes.

You must map each item and property in your product catalog that you want to report on to the Data Warehouse.

<data-warehouse-dimension-loader-definition>
  <data-warehouse-repository repository="path_to_warehouse_repository">
    <data-warehouse-repository-item item="item_name">
      <production-repository repository="path_to_production_repository">
          <production-repository-item item="item_name" natural-key="key"/>
      </production-repository>
      <property-mappings>
        <warehouse-property name="name" default-value="value">
         <production-property name="name"/>
        </warehouse-property>
      </property-mappings>
    </data-warehouse-repository-item>
  </data-warehouse-repository>
</data-warehouse-dimension-loader-definition>

The data-warehouse-repository names the destination repository and contains the destination repository items. The production-repository names the source repository and items.

<data-warehouse-dimension-loader-definition>
  <data-warehouse-repository
 repository="/atg/reporting/datawarehouse/WarehouseRepository">
    <data-warehouse-repository-item item="category"
          natural-key="categoryId">
      <production-repository
          repository="/atg/commerce/catalog/ProductCatalog"
          nickname="catalog">
        <production-repository-item item="category"/>
      </production-repository>

The property-mappings element contains the individual warehouse-properties to be mapped for a specific warehouse item.

In cases where there is a simple one-to-one mapping between the repository item property and the Data Warehouse item property, the production-property element identifies the repository item property which is mapped to the corresponding warehouse-property element. This example uses the user’s first name from the profile repository:

<property-mappings>
 <warehouse-property name="firstName">
  <production-property name="ProfileRepository.user.firstName"/>
 </warehouse-property>
</property-mappings>

In some cases, there is not a one-to-one mapping, so a converter component is used. Converters perform tasks such as:

Combining multiple repository item properties into a single warehouse item property
Converting the data of the property before placing it in the Data Warehouse
Looking up parent objects for SKUs, products, and categories, where there may be multiple values to select from

For example, the AddressToRegionItemConverter component combines the user’s state and country into the region used by the Data Warehouse.

<warehouse-property name="homeRegion" conversion-
component="/atg/reporting/datawarehouse/process/converter/
   AddressToRegionItemConverter">
  <production-property name="ProfileRepository.user.homeAddress.state" conversion-
context-name="state"/>
  <production-proprerty name="ProfileRepository.user.homeAddress.country"
conversion-context-name="country"/>
</warehouse-property>

SiteVisit Pipeline

The siteVisit chain is triggered by the SiteVisitLoader component. This pipeline has no branches. Each processor, if successful, starts the one that follows.

Each processor uses a passed-in parameter retrieved from the log file to look up items in the Data Warehouse. For example, the lookupVisitor processor uses the profileId from the log file to look up the visitor in the ARF_USER table and return its ID, using an RQL query. If the visitor cannot be found, the processor attempts to load the visitor into ARF_USER table first, and then return the ARF_USER.ID. If this fails, the processor returns the “Unspecified” visitor. Similar patterns apply to the other lookup processors, although the algorithm varies.

The processors are:

lookupSiteVisitDay—Uses the session start timestamp in the log file to determine the starting day of the visit.
lookupSiteVisitTime—Uses the session start timestamp in the log file to determine the starting time of the visit.
lookupSiteVisitEndDay—Uses the session end timestamp in the log file to determine the ending day of the visit.
lookupSiteVisitEndTimelookupSiteVisitEndDay—Uses the session end timestamp in the log file to determine the ending time of the visit.
lookupVisitor—Uses the profile ID in the log file to look up the visitor in the ARF_USER table.
lookupSiteVisitStimulusGroup—Reserved for future use. Currently returns the “Unspecified” stimulus group.
lookupDemographic—Examines the visitor’s date-of-birth, gender, marital status and home region as defined in the Data Warehouse. It uses this information to look up the record in the ARF_DEMOGRAPHIC table that classifies this user.
logSiteVisit—Writes the row to the ARF_SITE_VISIT table in the Data Warehouse.

SubmitOrder Pipeline

The submitOrder chain is triggered by the OrderSubmitLoader component. This pipeline has no branches. When it starts, the only information available to the pipeline is the order ID. Each processor in the pipeline, if successful, adds information to the data being processed and starts the next processor.

The processors are:

fetchOrder—Uses the order ID to look up the order in the order repository.
lookupOrder—If line items exist in the data warehouse for the current order ID, this processor fetches those line items and creates a parameter map entry for them.
checkOrderExists—Acts as a switch in the pipeline. If the warehouseItemPropertyNamevalue pair exists in the parameter map, then the line items for the order already exist, and the current log record does not need to be processed. If the map entry does not exist then the pipeline processes the log entry.
createOrderId—Generates a surrogate key for the order.
createLineItems—Breaks the order down into line items. If a single line item is shipped to multiple addresses, each address is considered a separate line item.
calculateLineItemsGrossRevenue—The quantity of that particular line item, times the unit price of the item.
allocateTax—Allocates the order’s tax amount to the individual line items, using a weighted calculation. If rounding results in the sum of the line item allocations being more or less than the original total, the tax on the final line item is used to make any adjustments. The calculation is:
(line item gross revenue/total order gross revenue) * total tax
allocateShipping—Allocates the order’s shipping amount to the individual line items, using a weighted calculation. If rounding results in the sum of the line item allocations being more or less than the original total, the shipping on the final line item is used to make any adjustments. The calculation is:
(line item gross revenue/total order gross revenue) * total shipping
allocateOrderDiscount—Allocates order-level discounts (such as free shipping) to individual line items using a weighted calculation. If rounding results in the sum of the line item allocations being more or less than the original total, the shipping on the final line item is used to make any adjustments. The calculation is:
(line item gross revenue/total order gross revenue) * total discount
calculateManualAdjustments—Calculates the manual adjustment credits and debits for the entire order. This processor is used only for ATG Commerce Service Center.
allocateManualAdjustmentsDebit—Allocates manual adjustment debits to line items. This processor is used only for ATG Commerce Service Center.
allocateManualAdjustmentsCredit—Allocates manual adjustment credits to line items. This processor is used only for ATG Commerce Service Center.
calculateLineItemsDiscountAmount—Totals any discounts applied to the line items in the order.
calculateLineItemsMarkdownDiscountAmount—Calculates the markdown discount. That discount is basically any discounts that apply before promotionally discounts are applied. For example, if someone buys a package of hot dogs on sale (no promotions) for $4.50 when the list price for the package of hot dogs is $5.00 then the markdown discount is $.50.
calculateLocalOrderDiscountAmountTotal—Calculates the total order discount amount (order discount amount plus manual adjustment credits) in the local currency.
calculateLineItemsPriceOverride—Captures price override amounts. This processor comes into use only if you are using ATG Commerce Service Center.
calculateLineItemNetRevenue—Calculates net revenue for individual line items using the following formula:
Gross revenue + tax allocation + shipping allocation – line item discount - order discount allocation
calculateOrderNetRevenue—Sums the net revenue of all line items. This information is stored with the line item in the Data Warehouse, but is the same for all items in an order.
calcluateLocalOrderNetRevenueTotal—Sum of the order net revenue plus manual adjustment debits in the local currency.
convertCurrency—For all of the amounts in the pipeline to this point (gross net revenue, line item shipping allocation, etc.), converts the local currency to the standard currency. The standardCurrencyCode property of the processor points to the /atg/reporting/datawarehouse/CommerceWarehouseConfiguration.standardCurrencyCode, which identifies the standard currency; the default is USD.
The actual conversion is done by the CurrencyTools component. CurrencyTools uses the ARF_CURRENCY_CONV table in the Data Warehouse to look up the conversation rate for the day on which the order was placed, if one is available.
Note: ATG does not populate the ARF_CURRENCY_CONV table. Customers are responsible for acquiring and maintaining currency conversion rate data.
lookupBillingRegion—Uses the billing address in the order to determine the billing region.
lookupSiteVisit—Looks up visit information in the ARF_SITE_VISIT table using the SESSION_ID, START_DATE, and END_DATE columns.
lookupInternalUser—Identifies the agent involved with the order. If not using ATG Commerce Service Center, this value is set to Unspecified.
lookupOriginOfOrder—Identifies the order’s place of creation. Orders can originate either from Web or Contact Center. If the order is created from a scheduled order template and the template is created by the Commerce site, the origin is Web. If the template is for the Commerce Service Center, the origin is Contact Center.
lookupSalesChannel—Identifies the sales channel through which the order was submitted. The sales channel for an order can be either Web or Contact Center. If the order is submitted from the Commerce site, the sales channel is Web. If the order is submitted from Commerce Service Center, the origin is Contact Center.
lookupDay—Uses the timestamp of the order to look up the day in the ARF_TIME_DAY table.
lookupTime—Uses the timestamp of the order to look up the time in the ARF_TIME_TOD table.
lookupLocalCurrency—Looks ups the currency used in the transaction in the ARF_CURRENCY, using the ISO4217_ALPHA3 code for the currency.
lookupCustomer—Uses the profile ID in the order to look up the customer in the ARF_USER table.
lookupCustomerDemographic—Examines the visitor’s date-of-birth, gender, marital status and home region as defined in the Data Warehouse. It uses this information to look up the record in the ARF_DEMOGRAPHIC table that classifies this user.
lookupPromotionGroup—Examines the order for any promotions that were used and uses this information to look up a promotion group.
lookupSegmentCluster— Examines the order for segment clusters and looks up any associated with the order.
runLineItems—Runs the runLineItem pipeline chain for each line item in the order.

Note: The allocateTax, allocateShipping, and allocateOrderDiscount processors can be replaced with processors that use a uniform rather than a weighted allocation strategy. ATG provides a sample component for this purpose (the Nucleus location is /atg/reporting/datawarehouse/process/allocators/UniformLineItemAllocator), or you can write your own processor that implements the atg.reporting.datawarehouse.commerce.LineItemAllocator interface. See the Processor Chains and the Pipeline Manager chapter in this guide for information on editing pipeline chains.

The runLineItem pipeline includes the following processors:

lookupProduct—Uses the product ID in the order to look up the product in the ARF_PRODUCT table via the NPRODUCT_ID column.
lookupSku—Uses the SKU ID associated with the order to look up the SKU in the ARF_SKU table, using the NSKU_ID column.
lookupCategory—Uses the PARENT_CAT_ID of the product to find the category.
listOrderStimuli—Retrieves a list of stimuli from markers in the order.
lookupStimulusGroup—Uses all stimuli in the pipeline to look the stimulus group in the ARF_STIMGRP table. Computes a hash code for all stimuli and uses the HASH_VALUE column to look up the group.
lookupShippingRegion—Uses the order’s shipping address to find the region to which the line item is shipped.
lookupQuestion—If the DCS.DW.Search module is running, runs an additional pipeline chain that determines what search fact, if any, was associated with this line item, and links the two in the ARF_LINE_ITEM_QUERY table. If DCS.DW.Search is not running, the question is “unspecified.”
logLineItem—Writes the line item to the Data Warehouse ARF_LINE_ITEM table.
tailLineItemProcessor—If the DCS.DW.Search module is running, starts the LineItemQuery pipeline and its logLineItemQuery processor,, which logs Commerce Search data for each line item in the ARF_LINE_ITEM_QUERY table. If DCS.DW.Search is not running, does nothing.

SegmentUpdate Pipeline

The segmentUpdate chain is triggered by the SegmentLoader component. This pipeline chain consists of a single processor, /atg/reporting/datawarehouse/process/SegmentLoadProcessor.

This processor handles insert, update and delete operations from the segment log files. Unlike the SiteVisit and SubmitOrder pipelines, it does not perform any lookups on the production system.

Each time the SegmentLoadProcessor component is invoked, it performs the following steps:

Gets a reference to the most recent Data Warehouse entry for the segment referred to in the log entry.
Determines what sort of change is to be applied (insert, update, or delete).
If an insert action and the item is not already in the database, the processor creates a new item. If the item already exists, the processor checks to see if a start date is specified. If not, which will typically be the case if an update or delete action is processed before an insert action, the start date of the existing item is set. If the start date has already been set, meaning there has been a prior insert action for the segment, then the current item is marked as expired and a new item is created.
If an update action, if no matching item is found, creates a new item.
If a delete action, if no matching item is found, a new item is created and marked as deleted.
Applies changes in the item to the repository.

ATG Commerce Programming Guide

Pipeline Drivers and Processors

DimensionUpdate Pipeline

Mapping Production Properties to Data Warehouse Properties

SiteVisit Pipeline

SubmitOrder Pipeline

SegmentUpdate Pipeline