B Configuring the Cluster Procedure

Clustering may be used to provide insight into how various parts of a retailer's operations can be grouped together. Typically, a retailer clusters stores over item sales to create logical groupings of stores based on sales of particular products. This provides increased visibility into where products are selling, and it allows the retailer to make more accurate merchandising decisions. Beyond this traditional use of clusters, the Cluster procedure is flexible enough to cluster any business measure based on products, locations, time, promotions, customers, or any hierarchy configured in the solution.

Note:

The syntax is slightly different from that of the standard RPAS functions and procedures described in the "Rule Functions Reference Guide" section of the Oracle Retail Predictive Application Server Configuration Tools User Guide.

Two approaches are available for clustering: BreakPoint, and Cluster (the BaNG approach). See the Oracle Retail Grade User Guide for details on these two approaches. The following sections explain the specifics for configuring clustering.

Cluster Requirements

The following libraries must be registered in any domains that will use the Cluster solution extension:

AppFunctions
ClusterEngine

Using the Cluster Procedure

The following notes are intended to serve as a guide for configuring the Cluster procedure within the RPAS Configuration Tools:

See the section, "Syntax Conventions", for the appropriate syntax for calling this procedure. Parameter labels must always be used.
If the ClusterEngine is not registered with the Configuration Tools, this rule remains red, which marks it as invalid, because the RPAS JNI cannot validate it. Because the rule is not validated, refer to the Grade documentation for the appropriate input parameters and output measures. Register the ClusterEngine with the Configuration Tools, preferably when creating the domain, to avoid potential issues.
Make sure that the resultant measures are at the right intersection levels by using the information based on the input and output parameters.
The Cluster procedure is a multi-result procedure, which means that it can return multiple results with one procedure call within a rule. To get multiple results, the resultant measures must exist, and the specific measure label must be used on the left-hand side (LHS) of the procedure call. The resultant measure parameters must be comma-separated in the procedure call.
You must configure/register all required input measures.
Be sure to create load and commit rules for the input measures. The RPAS JNI cannot validate the Cluster procedure call, so all input measures must exist within other rules in the rule set in order for them to be available for selection in the Workbook Tool.
You must use the latest version of RPAS to build the domain. You will get the following message in the log because the Cluster function is not validated:

Warning: unable to parse new expression (Unknown special expression: Cluster)

This message can be ignored.

Registering ClusterEngine prevents this warning from occurring.
After the domain build, use the regfunction RPAS utility to register the Grade library. The library, which is located in the $RPAS_HOME/applib directory, is libClusterEngine.so. Do not include the lib prefix or the .so file extension in the function name passed to the regfunction utility.

Example B-1 ClusterEngine

regfunction -d /domains/D01 -l ClusterEngine
Use the Mace command to run the Cluster rule with the rule group (for instance, grade_batch).
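The naming rule for regfunction (drop the lib prefix and the .so extension) can be illustrated with a small Python sketch. The helper names regfunction_name and regfunction_command are hypothetical and not part of RPAS; the sketch only assembles the command shown in Example B-1, using the -d and -l options that appear there.

```python
from pathlib import PurePosixPath


def regfunction_name(library_file):
    """Derive the name passed to regfunction -l from a shared-library file name.

    Hypothetical helper: strips the directory, the "lib" prefix, and the
    ".so" extension, as the note above requires.
    """
    name = PurePosixPath(library_file).name
    if name.startswith("lib"):
        name = name[len("lib"):]
    if name.endswith(".so"):
        name = name[:-len(".so")]
    return name


def regfunction_command(domain_path, library_file):
    """Assemble the argument list for the regfunction call in Example B-1."""
    return ["regfunction", "-d", domain_path, "-l", regfunction_name(library_file)]


command = regfunction_command("/domains/D01", "$RPAS_HOME/applib/libClusterEngine.so")
print(" ".join(command))  # regfunction -d /domains/D01 -l ClusterEngine
```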

Syntax Conventions

The following table displays the syntax conventions used in this document.

Indicator	Definition
[…]	All options listed in brackets are optional.
{…\|…}	Options listed in "{}" with "\|" separators are mutually exclusive (either/or).
{…,…}	Options listed in "{}" with "," separators are a complete set.
Bold	Labels.
Italics	Italics indicate a temporary placeholder for a constant or a measure.
Italics/meas	This indicates that the placeholder can be either a constant or a measure.
BoldItalics	This indicates a numeric placeholder for the dynamic portion of a label. Usually a number from 1 to N.
Normal	Normal text signifies required information.
Underlined	This convention is used to identify the function name.

Cluster Syntax

The syntax for using the Cluster or BaNG algorithm is shown in the following examples. The input and output parameter tables explain the specific usage of the parameter names used in the procedure.

Example B-2 Generic Example

POINTMEMBERSHIP: MEMBERMEAS, CENTROID: CENTROIDMEAS [, DISTFROMCENTROID: DISTCENTDMEAS, COHESION: COHESIONMEAS, CLUSTERPORTION: CLPORTMEAS, CENTROIDTOAVG: C2AVGMEAS, CLOSESTCLUSTER: CLOSCLUSTMEAS, CLOSESTCLUSTERDIST: CLOSCLUSTDISTMEAS] <-Cluster(MEASURE: MEASMEAS, METHOD: METHOD, NUMCLUSTERS: NUMCLUST, CLUSTERHIER: CLUSTHIER, CLUSTEROVERHIER: CLUSTOVERHIER [, BYGROUPDIMS: BYGROUPDIM, AGGMETHOD: AGGTYPES])

Example B-3 Sample - Cluster with Minimum Information:

POINTMEMBERSHIP:MEMB, CENTROID:CENT<-Cluster(MEASURE:RSAL, METHOD:"BANG", NUMCLUSTERS:5, CLUSTERHIER:"PROD", CLUSTEROVERHIER:"LOC")
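When rules are generated by script rather than typed by hand, the labeled, comma-separated form above is easy to get wrong. The following Python sketch assembles the rule text and checks that the labels marked as required for Cluster in the parameter tables are present. The function build_cluster_rule and the helper quoted are hypothetical; they only format text and do not call any RPAS API.

```python
REQUIRED_OUTPUTS = {"POINTMEMBERSHIP", "CENTROID"}
REQUIRED_INPUTS = {"MEASURE", "METHOD", "NUMCLUSTERS", "CLUSTERHIER", "CLUSTEROVERHIER"}


def quoted(text):
    """Wrap a string constant in double quotes, as in the samples."""
    return f'"{text}"'


def build_cluster_rule(outputs, inputs):
    """Format a Cluster rule from ordered (label, value) pairs.

    Raises ValueError if a label marked as required in the parameter
    tables is missing.
    """
    missing = (REQUIRED_OUTPUTS - {label for label, _ in outputs}) | (
        REQUIRED_INPUTS - {label for label, _ in inputs}
    )
    if missing:
        raise ValueError(f"missing required labels: {sorted(missing)}")
    lhs = ", ".join(f"{label}:{meas}" for label, meas in outputs)
    rhs = ", ".join(f"{label}:{value}" for label, value in inputs)
    return f"{lhs}<-Cluster({rhs})"


rule = build_cluster_rule(
    outputs=[("POINTMEMBERSHIP", "MEMB"), ("CENTROID", "CENT")],
    inputs=[
        ("MEASURE", "RSAL"),
        ("METHOD", quoted("BANG")),
        ("NUMCLUSTERS", 5),
        ("CLUSTERHIER", quoted("PROD")),
        ("CLUSTEROVERHIER", quoted("LOC")),
    ],
)
print(rule)
```

With these arguments the printed text matches Example B-3. Optional labels such as DISTFROMCENTROID or BYGROUPDIMS can be appended to the respective list.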

Syntax for Calculate Cluster Statistics (CalculateClusterStatistics)

Example B-4 Generic Example

CENTROID: CENTROIDMEAS [, DISTFROMCENTROID: DISTCENTDMEAS, COHESION: COHESIONMEAS, CLUSTERPORTION: CLPORTMEAS, CENTROIDTOAVG: C2AVGMEAS, CLOSESTCLUSTER: CLOSCLUSTMEAS, CLOSESTCLUSTERDIST: CLOSCLUSTDISTMEAS] <-CalculateClusterStatistics(MEASURE: MEASMEAS, POINTMEMBERSHIP: MEMBERMEAS, CLUSTERHIER: CLUSTHIER, CLUSTEROVERHIER: CLUSTOVERHIER [, BYGROUPDIMS: BYGROUPDIM, AGGMETHOD: AGGTYPE])

Example B-5 Sample - Cluster Statistics with Minimum Information

CENTROID:CENT<-CalculateClusterStatistics(MEASURE:RSAL, POINTMEMBERSHIP:MEMB, CLUSTERHIER:"PROD", CLUSTEROVERHIER:"LOC")

Configuration Parameters and Rules

Input Parameters

The following table provides the input parameters for the Cluster procedure and special expressions.

Parameter Name	Description
POINTMEMBERSHIP	This is an input parameter for CalculateClusterStatistics and bpstatistics. Its intersection should be the dimension being clustered and all by group dimensions from other hierarchies. The values state which positions are assigned to which cluster index. Data Type: Integer Required: Yes
MEASURE	The measure you are trying to cluster. It must have at least two dimensions. Data Type: Real Required: Yes
METHOD	Determines which clustering algorithm to use. Valid values are BANG (preferred) or KMEANS. Data Type: String Required: Yes - for Cluster.
NUMCLUSTERS	For each by group partition, the maximum number of clusters. Data Type: Integer Required: Yes - for Cluster.
CLUSTERHIER	The hierarchy that contains the dimension to cluster. The results will give you clusters of positions in this dimension. Data Type: String Required: Yes
CLUSTEROVERHIER	The hierarchy that contains the dimension to cluster over. The algorithm uses the positions in this dimension as the co-ordinates when clustering. Data Type: String Required: Yes
BYGROUPDIMS	The algorithm generates clusters one by group combination at a time. Provide the by group intersection. Data Type: String Required: No
AGGMETHOD	The algorithm aggregates the measure data up to the appropriate level. If AGGMETHOD is specified, it will use it; otherwise, it will use whatever is defined on the measure. Data Type: String Required: No
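The roles of CLUSTERHIER, CLUSTEROVERHIER, and BYGROUPDIMS can be pictured with a small Python sketch: each position in the clustered dimension becomes one point, each position in the cluster-over dimension becomes one coordinate of that point, and each by group value gets its own independent set of points. The function build_points, the position names, and the rule that missing cells count as 0.0 are assumptions made for illustration; they do not describe how RPAS stores or aggregates measure data.

```python
from collections import defaultdict


def build_points(cells, over_positions):
    """Arrange measure cells into coordinate vectors, one partition per by group value.

    cells maps (by_group, cluster_position, over_position) -> value.
    Returns {by_group: {cluster_position: [one value per over_position]}}.
    Cells that are absent are treated as 0.0 (an assumption of this sketch).
    """
    column = {pos: i for i, pos in enumerate(over_positions)}
    partitions = defaultdict(dict)
    for (by_group, cluster_pos, over_pos), value in cells.items():
        vector = partitions[by_group].setdefault(
            cluster_pos, [0.0] * len(over_positions)
        )
        vector[column[over_pos]] = float(value)
    return dict(partitions)


# Hypothetical sales data: products are clustered over stores, by season.
cells = {
    ("spring", "sku1", "storeA"): 10,
    ("spring", "sku1", "storeB"): 4,
    ("spring", "sku2", "storeA"): 9,
    ("fall", "sku1", "storeB"): 7,
}
points = build_points(cells, over_positions=["storeA", "storeB"])
print(points["spring"])
```

Each partition (here "spring" and "fall") would then be clustered separately, with at most NUMCLUSTERS clusters per partition.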

Output Parameters

The following table provides the output parameters for the Cluster procedure.

Parameter Name	Description
POINTMEMBERSHIP	Its intersection should be the dimension being clustered and all by group dimensions from other hierarchies. The values state which positions are assigned to which cluster index. Data Type: Integer Required: Yes - output for Cluster
CENTROID	Its intersection should be the cluster dimension, the dimension being clustered over and all by group dimensions from other hierarchies. The values are the average of all points in the cluster. Data Type: Real Required: Yes
DISTFROMCENTROID	Its intersection should be the dimension being clustered and all by group dimensions from other hierarchies. The values are the squared Euclidean distance from that point to its centroid. Data Type: Real Required: No
COHESION	Its intersection should be the cluster dimension and all by group dimensions. The value is the ratio of points in this cluster versus all clusters. Data Type: Real Required: No
CLUSTERPORTION	Its intersection should be the cluster dimension and all by group dimensions. The value is the ratio of points in this cluster versus all clusters. Data Type: Real Required: No
CENTROIDTOAVG	Its intersection should be the cluster dimension, the dimension being clustered over and all by group dimensions from other hierarchies. The values are the ratio of the centroid to the average of all points. Data Type: Real Required: No
CLOSESTCLUSTER	Its intersection should be the cluster dimension and all by group dimensions. The value is the nearest cluster index. Data Type: Integer Required: No
CLOSESTCLUSTERDIST	Its intersection should be the cluster dimension and all by group dimensions. The values are the squared Euclidean distance from the centroid of the cluster to the centroid of the closest cluster. Data Type: Real Required: No
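The arithmetic behind several of these outputs can be sketched in Python for a single by group partition. The sketch follows the descriptions in the table: the centroid is the average of the points in a cluster, distances are squared Euclidean distances, and the cluster portion is the share of points in a cluster. The function names and sample data are hypothetical, this is not the ClusterEngine implementation, and COHESION is left out because the table does not give it a formula distinct from CLUSTERPORTION. The sketch also assumes that the overall average is nonzero in every coordinate.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two coordinate vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(vectors):
    """Coordinate-wise average of a non-empty list of equal-length vectors."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def cluster_statistics(points, membership):
    """points: {position: [coords]}, membership: {position: cluster index}."""
    clusters = {}
    for pos, idx in membership.items():
        clusters.setdefault(idx, []).append(points[pos])
    centroid = {idx: mean(vectors) for idx, vectors in clusters.items()}
    overall = mean([points[pos] for pos in membership])
    closest, closest_dist = {}, {}
    for idx in centroid:
        others = [
            (squared_distance(centroid[idx], centroid[other]), other)
            for other in centroid
            if other != idx
        ]
        if others:  # a single cluster has no neighbor
            closest_dist[idx], closest[idx] = min(others)
    return {
        "CENTROID": centroid,
        "DISTFROMCENTROID": {
            pos: squared_distance(points[pos], centroid[idx])
            for pos, idx in membership.items()
        },
        "CLUSTERPORTION": {
            idx: len(vectors) / len(membership) for idx, vectors in clusters.items()
        },
        "CENTROIDTOAVG": {
            idx: [c / a for c, a in zip(centroid[idx], overall)] for idx in centroid
        },
        "CLOSESTCLUSTER": closest,
        "CLOSESTCLUSTERDIST": closest_dist,
    }


# Hypothetical points (two coordinates each) and a given cluster assignment.
points = {"A": [1.0, 1.0], "B": [3.0, 1.0], "C": [10.0, 4.0]}
membership = {"A": 1, "B": 1, "C": 2}
stats = cluster_statistics(points, membership)
print(stats["CENTROID"])            # {1: [2.0, 1.0], 2: [10.0, 4.0]}
print(stats["CLOSESTCLUSTERDIST"])  # {1: 73.0, 2: 73.0}
```

Note that the per-point outputs are keyed by position and the per-cluster outputs by cluster index, which mirrors the different intersections listed in the table.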

Syntax for Break Point Cluster (bpcluster)

Example B-6 Generic Example

POINTMEMBERSHIP <- bpcluster(SOURCEMEASNAME, CONFIGURATIONMEASNAME, CONFIGNAME [, GROUPBYINT] )

Example B-7 Sample - Break Point Cluster with Minimum Information

MEMB<-bpcluster(RSAL, GCFG, "GCFG01", "CHN_PGRP")

Syntax of Break Point Cluster Statistics (bpstatistics)

Example B-8 Generic Example

CENTROID, DISTANCE <- bpstatistics(POINTMEMBERSHIP, SOURCEMEASNAME [, GROUPBYINT] )

Example B-9 Sample - Break Point Cluster Statistics with Minimum Information

CENTROID, DISTANCE <- bpstatistics(MEMB, RSAL, "CHN_PGRP" )

Configuration Parameters and Rules

Input Parameters

The following table provides the input parameters for the bpcluster and bpstatistics procedures and special expressions.

Parameter Name	Description
SOURCEMEASNAME	The measure you are trying to cluster. Data Type: Real Required: Yes
CONFIGURATIONMEASNAME	Measure defined at Cluster/Configuration intersection. It contains the thresholds for the breakpoint calculation. Data Type: Real Required: Yes - for bpcluster.
CONFIGNAME	Breakpoint Configuration that will be used to produce the grades. (Refer to the Oracle Retail Grade User Guide for details on Breakpoint configuration and administration.) Data Type: String Required: Yes
GROUPBYINT	The algorithm generates clusters by group combination one at a time. Provide the by group intersection. Data Type: String Required: No
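As a rough picture of what a breakpoint assignment does, the following Python sketch maps each position to a cluster index by comparing one value per position against a list of thresholds. This is an illustration under stated assumptions only: the function breakpoint_membership is hypothetical, the thresholds are treated as ascending lower bounds on a single aggregated value, and cluster indexes start at 1. How RPAS actually interprets the thresholds held in the configuration measure is defined in the Oracle Retail Grade User Guide, not here.

```python
from bisect import bisect_right


def breakpoint_membership(values, thresholds):
    """Assign each position a 1-based cluster index from ascending thresholds.

    A value below the first threshold falls in cluster 1; a value at or
    above the k-th threshold (and below the next) falls in cluster k + 1.
    This ordering convention is an assumption of the sketch.
    """
    ordered = sorted(thresholds)
    return {pos: bisect_right(ordered, value) + 1 for pos, value in values.items()}


# Hypothetical aggregated sales per store and two breakpoints.
sales = {"store1": 50.0, "store2": 100.0, "store3": 720.0}
membership = breakpoint_membership(sales, thresholds=[100.0, 500.0])
print(membership)  # {'store1': 1, 'store2': 2, 'store3': 3}
```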

Output Parameters

The following table provides the output parameters for the bpcluster and bpstatistics procedures and special expressions.

Parameter Name	Description
POINTMEMBERSHIP	Its intersection should be the dimension being clustered and all by group dimensions from other hierarchies. The values state which positions are assigned to which cluster index. Data Type: Integer Required: Yes - output for bpcluster
CENTROID	Measure defined at Cluster/Configuration intersection. It contains the thresholds for the breakpoint calculation. Data Type: Real Required: Yes - for bpcluster.
DISTANCE	This is the point distance of member from centroid. Data Type: Real Required: Yes