COUNT_APPROX

COUNT_APPROX returns the most frequent refinements.

COUNT_APPROX is similar to the COUNT function except that it is allowed to produce imprecise results in certain circumstances. Like COUNT, the COUNT_APPROX function counts the number of records that have non-NULL values in a field for each GROUP BY result.

The syntax of the COUNT_APPROX function is:

COUNT_APPROX(<attribute>)

where attribute is either a multi-assign or single-assign attribute. You can also use the COUNT_APPROX(1) format.

The COUNT_APPROX function uses a FrequentK pattern-matching algorithm that calculates a set of refinements. Specifically, it reports the most frequent values. By using the PAGE function, you can indicate the frequency range (as illustrated in the example below).

COUNT_APPROX works best when the distribution is skewed so that a small set of values appear very frequently. However, if the FrequentK pattern matching fails to produce any results, then COUNT_APPROX falls back to using the same implementation as the COUNT function (which does not use the FrequentK algorithm). When running in the FrequentK pattern-matching mode, COUNT_APPROX may return imprecise results; however, its accuracy is precise if it falls back to COUNT mode.

COUNT_APPROX example

In this example, COUNTRY is a single-assign attribute containing country names:

RETURN Results AS
  SELECT
    COUNT_APPROX(COUNTRY) AS Approx
    FROM SalesData
    WHERE COUNTRY IS NOT NULL
    GROUP BY COUNTRY
    ORDER BY Approx DESC
    PAGE(0, 10)

The result of this statement might be:

Approx    COUNTRY
--------------------
| 81970 | USA      |
|  1590 | GERMANY  |
|  1353 | JAPAN    |
|   667 | KOREA    |
|   598 | ENGLAND  |
|   585 | ITALY    |
|   546 | CANADA   |
|   242 | GUAM     |
|   203 | COLOMBIA |
|   176 | SPAIN    |
--------------------