About aggregating attribute values

You aggregate attribute values by applying an operation (sum, average, min/max, etc.) to one or more attributes and then grouping the derived result by an attribute.

For example, you could aggregate the average of a Standard Cost attribute and group the result by a Product Category. In that case, you get the average cost of each item in a particular product category.

Or, you could aggregate the minimum and maximum values of a Standard Cost attribute and group the results by a Product Category. In that case, you get the lowest cost of each item in a category and the highest cost of each item in a category.

Aggregating attribute values provides a way to consolidate data into a new derived attribute for further examination. And in some cases, the new derived attribute values are often used before you perform additional operations such joining one data set with another data set.

Some attributes have a data type that make them unavailable for aggregation. Specifically, multi-assign and geocode attributes display in the Aggregate editor as unavailable.

Some operators can only transform attributes of a particular data type. The average, sum, standard dev and variance operators can only used with numeric attributes. The min and max operators can be used with numeric attributes and date time attributes.

Aggregation operators

The supported aggregation operators are sum, average, min, max, , records with values, and unique values. In addition to these operators, you might also have set, variance, and standard dev if the df.advancedSparkAggregationsEnabled property is set to true on the Control Panel > Studio Settings page.

Table 19-1 Aggregation operators

Operator Description
sum Finds the sum of all values in the attribute.
average Finds the average of all values in the attribute.
min Finds the minimum value of all values in the attribute.
max Finds the maximum value of all values in the attribute.
records with values Finds the number of records with values in the attribute.
unique values Finds the number of unique values in the attribute.
set Optionally enabled. Finds the set of matching values in the attribute.
variance Optionally enabled. Finds the variance between a set of values.
standard dev Optionally enabled. Finds the standard deviation for the attribute values.

Interaction with data set updates

Aggregated attributes are calculated based on values from one point in time. If you run a transformation script with an aggregation and then reload or update a data set, you have to re-run the transform script with the aggregation using the updated data. In other words, the values in an aggregated attribute have been collapsed during the aggregation, and you need the un-aggregated values to run subsequent updates.