| Oracle® Retail Advanced Science Cloud Services Implementation Guide Release 17.0 E95345-03 | 
 | 
|  Previous |  Next | 
This chapter describes the Affinity Analysis application.
Affinity Analysis (AA) is used to gain insights into customer shopping patterns. A key component of AA is the process of Association Rule Mining (ARM). This process examines sales transaction data and identifies associations between types of products. Such information can help a retailer understand that promoting one product is sufficient to help drive sales of another product, given the sales associations they exhibit. The processing of these algorithms occurs each week as part of the weekly batch execution, and a set of output files are provided to expose the association rules that have been discovered by the process.
AA relies on the following data elements. These must be provided via text files, which are then loaded.
Table 12-1 Data Elements
| Object | Notes | Required/Optional | 
|---|---|---|
| Product Hierarchy | The ARM processing mainly operates at Sub Class, but it can be configured to different levels. | Required | 
| Location Hierarchy | Required | |
| Fiscal Calendar | Required | |
| Sales Transactions | Must contain transaction IDs as part of the data. If the transactions include Customer ID, then customer segment results are possible. | Required | 
| Customer Segments | Customer IDs and their association to a segment allows customer segment-specific results. | Optional | 
| MBA_ARM_SRVC_LOC_STG | Can be used to limit the scope of locations processed, or to specify a set of locations to exclude from processing. | Optional | 
In order to calculate association rules, it is necessary to receive sales transaction data that include a transaction ID. This is used to identify which products were purchased by a customer as part of a single transaction. If the customer transactions also include a customer ID to identify the customer who purchased the transaction, and a customer segment dimension is provided that links customer IDs to customer segments, then it is possible to provide some results for each customer segment.
When specifying which locations to process or which locations to not process, the MBA_ARM_SRVC_LOC_STG interface can be used to limit the scope of locations to be processed. The data in this interface can be at any level of the location hierarchy. A customer may want to limit the scope of locations for the following reasons.
Improve performance by only sampling some locations.
Exclude locations that contain many wholesale transactions, where the transactions contain data for more than a single customer.
Exclude locations that are experiencing a significant interruption to their normal sales pattern (for example, when undergoing a large scale renovation).
Exclude locations that normally do not include customer-linked transactions from the ARM_PH_CS implementation, since suitable data to include for processing will not be available.
The SRVC_NAME column of this interface allows the specification of the service that must be filtered. If, however, all executions must have the same set of locations, then this column can be provided as a NULL value. The effect will be to use the same dataset for all the services. If, however, it is necessary to have some services use a different set of locations, then it is possible to provide the data specific to the different services. If data is provided for a SRVC_NAME, then the data must be provided with a SRVC_NAME specified. The valid SRVC_NAME values that can be provided are: ARM_PH (Product Hierarchy results), ARM_PH_PROMO (Product Hierarchy with Promotions results), and ARM_PH_CS (Product Hierarchy and Customer Segment, with Promotions results).
This section describes the science algorithms and services.
This implementation calculates association rules for a configurable set of product hierarchy levels. It supports the creation of association rules for Sub Classes, Classes, and Department, which can be controlled by a system configuration. All system configurations that affect this algorithm exist in the RSE_CONFIG configuration table, and are named with "ARM_PH_" as the prefix. Because this implementation supports being run for multiple hierarchy levels, if there is a need to set a configuration uniquely for a specific hierarchy level, this can be accomplished via the RSE_CONFIG_CODE table using the hierarchy level name as the PARAM_CODE value. If no such row exists in RSE_CONFIG_CODE, then the configuration will be taken from the corresponding RSE_CONFIG row.
This implementation calculates association rules at the Sub Class level of the hierarchy and is restricted to only rules where the IF side of the rule is promoted and the THEN side of the rule is not promoted. In order to be able to execute this and have results for this implementation, it is necessary to provide promotion details with the sales transaction data. All system configurations that affect this algorithm exist in the RSE_CONFIG configuration table and are named with "ARM_PROMO_" as the prefix.
This implementation is used to focus on how products are associated when that promotion is in effect. This data can help a retailer understand the sales patterns that exist when promotions are involved, which can help the retailer avoid promoting too many items in an effort to help improve profit.
This implementation calculates association rules at the Sub Class level of the hierarchy and is restricted to rules where the IF side of the rule is promoted and the THEN side of the rule is not promoted. In order to be able to execute this and have results for this implementation, it is necessary to provide promotion details with the sales transaction data. All system configurations that affect this algorithm exist in the RSE_CONFIG configuration table and are named with "ARM_CS_" as a prefix.
This implementation provides the same type of information as the ARM_PH_PROMO implementation; however, it provides results that are specific to a customer segment. Therefore, this implementation requires the receipt of transactions that include the customer ID of the customer who purchased the transaction and the customer segment dimension, along with the association of the customers to each customer segment.
There is a consistent pattern in the naming of the configurations for the AA implementation. As described above, each implementation has a specific naming prefix. The suffixes are also similar across the implementations. These suffixes are described in more detail in Table 12-2.
Table 12-2 Implementation Suffixes
| Suffix | Example | Description | 
|---|---|---|
| HIER_LEVEL | SBC | Indicates the name of the hierarchy level that the process is to be executed for. The values here are the same values as provided as LEVEL_NAME values in the W_PROD_CAT_DH interface. Not applicable to ARM_PH. | 
| TOP_LEVEL | SBC | Indicates the highest level of the product hierarchy for which processing should be executed. Can contain SBC, CLS, or DEPT. Only applicable to ARM_PH. | 
| MIN_SUPPORT | .001 | Expresses the minimum percentage of transactions that are required to have the set of items in the same transaction. | 
| MIN_SUPPORT_TXN_CNT | 1000 | In the event that sales volume is low, this is another way to express the minimum number of sales transactions that are required for the set of items to be sold together. The implementation uses the greater of the two values. | 
| MIN_CONFIDENCE | .05 | The minimum confidence value as calculated by the rule mining algorithm for an association rule. | 
| MIN_REV_CONFIDENCE | .05 | The minimum confidence as calculated by reversing the placement of the numbers in the calculation. Setting this value higher can help prevent redundancy in the rule expressions where the IF and THEN items are transposed. | 
| MIN_LIFT | .05 | The minimum lift as calculated by the rule mining algorithm for an association rule. | 
| MAX_LIFT | 100 | The maximum lift as calculated by the rule mining algorithm for an association rule. | 
| MAX_SET_SIZE | 2 | The maximum number of hierarchy members to include in the resulting rules. The set size includes the count of both the IF and THEN components. The maximum allowed is four, although it can be an expensive to calculate that many components. | 
| MAX_RULE_COUNT | 9999 | The maximum number of rules that are retained per execution of the algorithm, per set size. This allows for the reduction of results to eliminate less important results. | 
| WEEK_CNT | 1 | The number of weeks that are processed when the execution runs. Care should be taken when changing this to more than one week, as this can negatively affect performance. | 
The results of the association rule mining can be obtained from two export interface files. One export interface contains summary information (mba_arm_run_exp) about the execution, along with various metrics that explain what the results are for. The second export interface file (mba_arm_result_exp) contains the details for each execution of the process. It is possible that a run may not contain any results to be exported. The data between the two interfaces can be joined to each other by the first column in each interface file (the RUN_ID).
In addition to metrics that quantify the rule (its frequency, its confidence, and its lift), the results also include sales values for the different components of the association rule. These sales values can help quantity the involved sales volume that is involved in the association rule.
Even if the weekly process that runs is executed for a single week each time, it is still possible to estimate the effects of the rule across multiple weeks by aggregating data across the weeks. The process for doing this requires locating the same product set across the different weeks within the same execution type. This means to join data in the mba_arm_result_exp interface by if_prod_ext_key1, if_prod_ext_key2, if_prod_ext_key3, if_promo_flg1, if_promo_flg2, if_promo_flg3, then_prod_ext_key, then_promo_flg, and the data in the mba_arm_run_exp interface by run_type, if_hier_level, then_hier_level, loc_ext_key, custseg_ext_key. The data between the mba_arm_run_exp and mba_arm_result_exp files are joined by the run_id.
Once this appropriate data has been gathered, the various sales metrics can be aggregated as needed. In order to calculate a new set of Frequency, Confidence, Lift, or Reverse Confidence values for a rule, it is possible to recalculate the values, as shown below. Note that in the these calculations, the following abbreviations are used: run = mba_arm_run_ext, result = mba_arm_result_exp.
Frequency = SUM(result.rule_txn_count) / SUM(run. tot_txn_cnt)
Confidence = SUM(result.rule_txn_cnt) / SUM(result.if_tot_txn_count)
Reverse Confidence = SUM(result.rule_txn_cnt) / SUM(result.then_tot_txn_count)
Lift = SUM(result.rule_txn_cnt) * SUM(run.tot_txn_cnt ) / SUM(result.if_tot_txn_count) / SUM(result.then_tot_txn_count