APPROXCOUNTDISTINCT function

The APPROXCOUNTDISTINCT function counts the number of distinct values for an attribute.

APPROXCOUNTDISTINCT is similar to the COUNTDISTINCT function except that it is allowed to produce an approximation for the number of distinct values in certain circumstances. The APPROXCOUNTDISTINCT function returns the number of unique values in a field for each GROUP BY result. APPROXCOUNTDISTINCT can be used for both single-assign and multi-assigned attributes.

The APPROXCOUNTDISTINCT function uses the HyperLogLog algorithm to calculate the a set of refinements. If the number of distinct values is low, then the results will be accurate; if the number of distinct values is high, the results will be an approximation.

APPROXCOUNTDISTINCT will also evaluate a record with an empty set (that is, an empty set is returned for any record that does not have an assignment for the specified multi-assign attribute).

APPROXCOUNTDISTINCT syntax

The syntax of the APPROXCOUNTDISTINCT function is:
APPROXCOUNTDISTINCT(<attribute>)
where attribute is either a multi-assign or single-assign attribute.

APPROXCOUNTDISTINCT example

Assume the following nine records that are of WineType=Red (where WineType is a single-assign attribute). Each record includes one or two assignments for the multi-assign Body attribute:
Body                WineID
--------------------------
{ Silky, Tannins }    3
{ Robust, Tannins }   4
{ Silky, Tannins }    5
{ Robust }            6
{ Robust }            8
{ Silky, Tannins }    9
{ Silky, Tannins }   12
{ Silky, Tannins }   16
{ Silky, Tannins }   18
--------------------------
The following statement returns the number of different values for the Body attribute in the WineType=Red records:
RETURN Result AS
SELECT APPROXCOUNTDISTINCT (Body) AS Total
FROM WineState
WHERE WineType = 'Red'
GROUP BY WineType
The statement result is:
Total=3, WineType=Red
For this group, the value of Total is 3 because there are three non-empty sets with unique values for the Body attribute:
  • One set for Records 3, 5, 9, 12, 16, and 18, each of which has the "Silky" and "Tannins" assignments for Body.
  • One set for Records 6 and 8, each of which has the "Robust" assignment for Body.
  • One set for Record 4, which has the "Robust" and "Tannins" assignments for Body.

Thus, there are three sets of distinct values for the Body attribute, when grouped by the WineType attribute.