Clustering can provide insight into how various parts of a retailer's operations can be grouped together. Typically, a retailer clusters stores over item sales to create logical groupings of stores based on sales of particular products. This provides increased visibility into where products are selling and allows the retailer to make more accurate merchandising decisions. Beyond this traditional use of clusters, the Cluster solution extension is flexible enough to cluster any business measure based on products, locations, time, promotions, customers, or any hierarchy configured in the solution.
Note: The syntax is slightly different from that of the standard RPAS functions and procedures described in the "Rule Functions Reference Guide" section of the Oracle Retail Predictive Application Server Configuration Tools User Guide.
The two approaches available for clustering are BreakPoint and Cluster (also referred to as the BaNG approach). See the Oracle Retail Grade User Guide for details on these two approaches. The following sections explain the specifics for configuring clustering.
The following libraries must be registered in any domains that will use the Cluster solution extension:
AppFunctions
ClusterEngine
The following notes are intended to serve as a guide for configuring the Cluster procedure within the RPAS Configuration Tools:
See the section, "Syntax Conventions", for the appropriate syntax for calling this procedure. Parameter labels must always be used.
If the ClusterEngine is not registered with the Configuration Tools, this rule remains red, which marks it as invalid: the RPAS JNI cannot validate the rule, so no validation is performed on it. Refer to the Grade documentation for the appropriate input parameters and output measures. Register the ClusterEngine with the Configuration Tools, preferably when creating the domain, to avoid potential issues.
Use the input and output parameter descriptions to make sure that the resultant measures are at the correct intersection levels.
The Cluster procedure is a multi-result procedure, which means that it can return multiple results with one procedure call within a rule. To get multiple results, the resultant measures must exist, and the specific measure label must be used on the left-hand side (LHS) of the procedure call. The resultant measure parameters must be comma-separated in the procedure call.
You must configure/register all required input measures.
Be sure to create load and commit rules for the input measures. The RPAS JNI cannot validate the Cluster procedure call, so all input measures must exist within other rules in the rule set in order for them to be available for selection in the Workbook Tool.
You must use the latest version of RPAS to build the domain. You will get the following message in the log because the Cluster function is not validated:
Warning: unable to parse new expression (Unknown special expression: Cluster)
This message can be safely ignored.
Registering the ClusterEngine prevents this warning from occurring.
After the domain build, use the regfunction RPAS utility to register the Grade library. The library, which is located in the $RPAS_HOME/applib directory, is libClusterEngine.so. Do not specify the lib prefix or the .so file extension in the function name that you pass to the regfunction utility.
Use the mace command to run the rule group that contains the Cluster rule (for instance, grade_batch).
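The naming rule for regfunction can be illustrated with a small helper. This is only a sketch of the string handling described above; `regfunction_name` is a hypothetical helper, not part of RPAS, and it does not call the regfunction utility itself.

```python
def regfunction_name(library_file: str) -> str:
    """Return the name to register: the file name without 'lib' and '.so'."""
    name = library_file
    if name.startswith("lib"):
        name = name[len("lib"):]      # drop the 'lib' prefix
    if name.endswith(".so"):
        name = name[: -len(".so")]    # drop the '.so' extension
    return name


print(regfunction_name("libClusterEngine.so"))  # ClusterEngine
```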
The following table displays the syntax conventions used in this document.
Indicator | Definition |
---|---|
[…] | All options listed in brackets are optional. |
{…\|…} | Options listed in "{}" with "\|" separators are mutually exclusive (either/or). |
{…,…} | Options listed in "{}" with "," separators are a complete set. |
Bold | Labels. |
Italics | Italics indicate a temporary placeholder for a constant or a measure. |
Italics/meas | This indicates that the placeholder can be either a constant or a measure. |
BoldItalics | This indicates a numeric placeholder for the dynamic portion of a label. Usually a number from 1 to N. |
Normal | Normal text signifies required information. |
Underlined | This convention is used to identify the function name. |
The syntax for using the Cluster or BaNG algorithm is shown in the following examples. The input and output parameter tables explain the specific usage of the parameter names used in the procedure.
Example B-2 Generic Example
POINTMEMBERSHIP: MEMBERMEAS, CENTROID: CENTROIDMEAS [, DISTFROMCENTROID: DISTCENTDMEAS, COHESION: COHESIONMEAS, CLUSTERPORTION: CLPORTMEAS, CENTROIDTOAVG: C2AVGMEAS, CLOSESTCLUSTER: CLOSCLUSTMEAS, CLOSESTCLUSTERDIST: CLOSCLUSTDISTMEAS] <-Cluster(MEASURE: MEASMEAS, METHOD: METHOD, NUMCLUSTERS: NUMCLUST, CLUSTERHIER: CLUSTHIER, CLUSTEROVERHIER: CLUSTOVERHIER [, BYGROUPDIMS: BYGROUPDIM, AGGMETHOD: AGGTYPES])
Example B-4 Generic Example
CENTROID: CENTROIDMEAS [, DISTFROMCENTROID: DISTCENTDMEAS, COHESION: COHESIONMEAS, CLUSTERPORTION: CLPORTMEAS, CENTROIDTOAVG: C2AVGMEAS, CLOSESTCLUSTER: CLOSCLUSTMEAS, CLOSESTCLUSTERDIST: CLOSCLUSTDISTMEAS] <-CalculateClusterStatistics(MEASURE: MEASMEAS, POINTMEMBERSHIP: MEMBERMEAS, CLUSTERHIER: CLUSTHIER, CLUSTEROVERHIER: CLUSTOVERHIER [, BYGROUPDIMS: BYGROUPDIM, AGGMETHOD: AGGTYPE])
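As an illustration of the labeled, comma-separated form above, the following sketch assembles a Cluster call as text from label/value pairs. The helper `build_cluster_rule`, the measure names (`clmember`, `clcentroid`, `slsunits`), and the hierarchy names (`loc`, `prod`) are hypothetical. The sketch also assumes values are written unquoted, which this section does not specify.

```python
from typing import Dict

# Required labels, as listed in the parameter tables of this section.
REQUIRED_OUTPUTS = ("POINTMEMBERSHIP", "CENTROID")
REQUIRED_INPUTS = ("MEASURE", "METHOD", "NUMCLUSTERS",
                   "CLUSTERHIER", "CLUSTEROVERHIER")


def build_cluster_rule(outputs: Dict[str, str], inputs: Dict[str, str]) -> str:
    """Assemble 'LABEL: meas, ... <-Cluster(LABEL: value, ...)' as a string."""
    for label in REQUIRED_OUTPUTS:
        if label not in outputs:
            raise ValueError(f"missing required output label {label}")
    for label in REQUIRED_INPUTS:
        if label not in inputs:
            raise ValueError(f"missing required input label {label}")
    # Every parameter carries its label; parameters are comma-separated.
    lhs = ", ".join(f"{label}: {meas}" for label, meas in outputs.items())
    rhs = ", ".join(f"{label}: {value}" for label, value in inputs.items())
    return f"{lhs} <-Cluster({rhs})"


rule = build_cluster_rule(
    outputs={"POINTMEMBERSHIP": "clmember", "CENTROID": "clcentroid"},
    inputs={
        "MEASURE": "slsunits",       # hypothetical measure name
        "METHOD": "BANG",
        "NUMCLUSTERS": "5",
        "CLUSTERHIER": "loc",        # hypothetical hierarchy name
        "CLUSTEROVERHIER": "prod",   # hypothetical hierarchy name
    },
)
print(rule)
```

Optional outputs such as DISTFROMCENTROID can be added to the `outputs` dictionary in the same way.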
The following table provides the input parameters for the Cluster procedure and special expressions.
Parameter Name | Description |
---|---|
POINTMEMBERSHIP | This is an input parameter for CalculateClusterStatistics and bpstatistics. Its intersection should be the dimension being clustered and all by group dimensions from other hierarchies. The values state which positions are assigned to which cluster index. Data Type: Integer. Required: Yes |
MEASURE | The measure you are trying to cluster. It must have at least two dimensions. Data Type: Real. Required: Yes |
METHOD | Determines which clustering algorithm to use. Valid values are BANG (preferred) or KMEANS. Data Type: Real. Required: Yes - for Cluster. |
NUMCLUSTERS | For each by group partition, the maximum number of clusters. Data Type: Integer. Required: Yes - for Cluster. |
CLUSTERHIER | The hierarchy that contains the dimension to cluster. The results will give you clusters of positions in this dimension. Data Type: String. Required: Yes |
CLUSTEROVERHIER | The hierarchy that contains the dimension to cluster over. The algorithm uses the positions in this dimension as the coordinates when clustering. Data Type: String. Required: Yes |
BYGROUPDIMS | The algorithm generates clusters one by group combination at a time. Provide the by group intersection. Data Type: String. Required: No |
AGGMETHOD | The algorithm aggregates the measure data up to the appropriate level. If AGGMETHOD is specified, it will use it; otherwise, it will use whatever is defined on the measure. Data Type: String. Required: No |
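To make the roles of BYGROUPDIMS and NUMCLUSTERS concrete, the sketch below splits the data into by group partitions and clusters each partition on its own, with NUMCLUSTERS as an upper bound. It uses a plain Lloyd's k-means as a stand-in; it is not the BaNG algorithm or the ClusterEngine implementation, and the store and region names are made up for the example.

```python
from collections import defaultdict


def sq_dist(a, b):
    """Squared Euclidean distance between two coordinate vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, max_clusters, iterations=20):
    """Plain Lloyd's k-means; returns one cluster index per point."""
    centroids = []
    for p in points:  # deterministic seeding: first distinct points
        if list(p) not in centroids:
            centroids.append(list(p))
        if len(centroids) == max_clusters:
            break
    assignment = [0] * len(points)
    for _ in range(iterations):
        assignment = [
            min(range(len(centroids)), key=lambda c: sq_dist(p, centroids[c]))
            for p in points
        ]
        for c in range(len(centroids)):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:  # centroid = average of the points in the cluster
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assignment


def cluster_by_group(data, max_clusters):
    """data maps (by_group, position) -> vector over the cluster-over dimension."""
    partitions = defaultdict(dict)
    for (group, position), vector in data.items():
        partitions[group][position] = vector
    membership = {}
    for group, rows in partitions.items():  # one by group combination at a time
        positions = sorted(rows)
        labels = kmeans([rows[p] for p in positions], max_clusters)
        for position, label in zip(positions, labels):
            membership[(group, position)] = label
    return membership


# Hypothetical data: stores (clustered) over two items, grouped by region.
sales = {
    ("north", "s1"): [10.0, 1.0],
    ("north", "s2"): [11.0, 1.5],
    ("north", "s3"): [1.0, 9.0],
    ("south", "s4"): [5.0, 5.0],
    ("south", "s5"): [50.0, 40.0],
}
membership = cluster_by_group(sales, max_clusters=2)
print(membership)
```

In this sketch the cluster indexes restart in every by group partition, so index 0 in one region is unrelated to index 0 in another.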
The following table provides the output parameters for the Cluster procedure.
Parameter Name | Description |
---|---|
POINTMEMBERSHIP | Its intersection should be the dimension being clustered and all by group dimensions from other hierarchies. The values state which positions are assigned to which cluster index. Data Type: Integer. Required: Yes - output for bpcluster |
CENTROID | Its intersection should be the cluster dimension, the dimension being clustered over and all by group dimensions from other hierarchies. The values are the average of all points in the cluster. Data Type: Real. Required: Yes |
DISTFROMCENTROID | Its intersection should be the dimension being clustered and all by group dimensions from other hierarchies. The values are the squared Euclidean distance from that point to its centroid. Data Type: Real. Required: No |
COHESION | Its intersection should be the cluster dimension and all by group dimensions. The value is the ratio of points in this cluster versus all clusters. Data Type: Real. Required: No |
CLUSTERPORTION | Its intersection should be the cluster dimension and all by group dimensions. The value is the ratio of points in this cluster versus all clusters. Data Type: Real. Required: No |
CENTROIDTOAVG | Its intersection should be the cluster dimension, the dimension being clustered over and all by group dimensions from other hierarchies. The values are the ratio of the centroid to the average of all points. Data Type: Real. Required: No |
CLOSESTCLUSTER | Its intersection should be the cluster dimension and all by group dimensions. The value is the nearest cluster index. Data Type: Integer. Required: No |
CLOSESTCLUSTERDIST | Its intersection should be the cluster dimension and all by group dimensions. The values are the squared Euclidean distance from the centroid of the cluster to the centroid of the closest cluster. Data Type: Real. Required: No |
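The output definitions in the table can be worked through on a small example. The sketch below computes CENTROID, DISTFROMCENTROID, CLUSTERPORTION, CENTROIDTOAVG, CLOSESTCLUSTER, and CLOSESTCLUSTERDIST for a single by group partition, following the wording of the table only. It is an illustration, not the ClusterEngine code; the store names are made up, and it assumes the overall average is nonzero in every coordinate.

```python
def cluster_statistics(points, membership):
    """points: position -> vector; membership: position -> cluster index."""
    clusters = sorted(set(membership.values()))
    dims = range(len(next(iter(points.values()))))

    # CENTROID: average of all points in the cluster.
    centroid = {}
    for c in clusters:
        members = [points[p] for p in points if membership[p] == c]
        centroid[c] = [sum(v[d] for v in members) / len(members) for d in dims]

    # Average of all points, used by CENTROIDTOAVG (assumed nonzero).
    overall = [sum(v[d] for v in points.values()) / len(points) for d in dims]

    # DISTFROMCENTROID: squared Euclidean distance of a point to its centroid.
    dist = {
        p: sum((points[p][d] - centroid[membership[p]][d]) ** 2 for d in dims)
        for p in points
    }
    # CLUSTERPORTION: share of all points that fall in this cluster.
    portion = {
        c: sum(1 for p in points if membership[p] == c) / len(points)
        for c in clusters
    }
    # CENTROIDTOAVG: centroid divided by the overall average, per coordinate.
    to_avg = {c: [centroid[c][d] / overall[d] for d in dims] for c in clusters}

    # CLOSESTCLUSTER / CLOSESTCLUSTERDIST: nearest other centroid.
    closest, closest_dist = {}, {}
    for c in clusters:
        others = {
            o: sum((centroid[c][d] - centroid[o][d]) ** 2 for d in dims)
            for o in clusters if o != c
        }
        if others:
            closest[c] = min(others, key=others.get)
            closest_dist[c] = others[closest[c]]

    return {
        "CENTROID": centroid, "DISTFROMCENTROID": dist,
        "CLUSTERPORTION": portion, "CENTROIDTOAVG": to_avg,
        "CLOSESTCLUSTER": closest, "CLOSESTCLUSTERDIST": closest_dist,
    }


# Hypothetical stores clustered over two items.
points = {"s1": [2.0, 4.0], "s2": [4.0, 4.0], "s3": [12.0, 16.0]}
membership = {"s1": 0, "s2": 0, "s3": 1}
stats = cluster_statistics(points, membership)
print(stats["CENTROID"])            # {0: [3.0, 4.0], 1: [12.0, 16.0]}
print(stats["CLOSESTCLUSTERDIST"])  # {0: 225.0, 1: 225.0}
```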
The following table provides the input parameters for the bpcluster and bpstatistics procedures and special expressions.
Parameter Name | Description |
---|---|
SOURCEMEASNAME | The measure you are trying to cluster. Data Type: Real. Required: Yes |
CONFIGURATIONMEASNAME | Measure defined at Cluster/Configuration intersection. It contains the thresholds for the breakpoint calculation. Data Type: Real. Required: Yes - for bpcluster. |
CONFIGNAME | Breakpoint Configuration that will be used to produce the grades. (Refer to the Oracle Retail Grade User Guide for details on Breakpoint configuration and administration.) Data Type: String. Required: Yes |
GROUPBYINT | The algorithm generates clusters one by group combination at a time. Provide the by group intersection. Data Type: String. Required: No |
The following table provides the output parameters for the bpcluster and bpstatistics procedures and special expressions.
Parameter Name | Description |
---|---|
POINTMEMBERSHIP | Its intersection should be the dimension being clustered and all by group dimensions from other hierarchies. The values state which positions are assigned to which cluster index. Data Type: Integer. Required: Yes - output for bpcluster |
CENTROID | Measure defined at Cluster/Configuration intersection. It contains the thresholds for the breakpoint calculation. Data Type: Real. Required: Yes - for bpcluster. |
DISTANCE | This is the point distance of member from centroid. Data Type: Real. Required: Yes |
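The general idea of grading by thresholds can be sketched as follows. The exact breakpoint semantics are defined in the Oracle Retail Grade User Guide and are not reproduced here. This sketch assumes that each threshold is an absolute lower bound and that the cluster index is the number of thresholds a value has reached. The function name, the threshold values, and the store names are hypothetical.

```python
from bisect import bisect_right


def breakpoint_membership(values, thresholds):
    """Assign each position the count of thresholds its value has reached."""
    bounds = sorted(thresholds)
    # bisect_right counts the thresholds that are <= the value.
    return {pos: bisect_right(bounds, v) for pos, v in values.items()}


# Hypothetical source measure values per store and two breakpoints.
grades = breakpoint_membership(
    {"s1": 50.0, "s2": 100.0, "s3": 750.0},
    thresholds=[100.0, 500.0],
)
print(grades)  # {'s1': 0, 's2': 1, 's3': 2}
```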