Filters/WHERE, HAVING

The Analytics API supports a general set of filtering operations. As in SQL, these can be used to filter the input records considered by a given statement (WHERE) and the output records returned by that statement (HAVING).

You can also filter input records for each expression to compute analytic expressions for a subset of the result buckets.

A variety of filter operations are supported, such as numeric and string value comparison functions (such as equality, inequality, greater than, less than, between, and so on), and Boolean operators (AND, OR, or NOT) for creating complex filter constraints.

For example, we could limit the sales reps to those who generated at least $10,000:

RETURN Reps AS
SELECT SUM(Amount) AS SalesTotal
GROUP BY SalesRep
HAVING SalesTotal > 10000

You could further restrict this analytics sub-query to only the sales in the Western region:

RETURN Reps AS
SELECT SUM(Amount) AS SalesTotal
WHERE Region = 'West'
GROUP BY SalesRep
HAVING SalesTotal > 10000

Alternatively, you could use the WHERE filter with an ID of the dimension value based on which we are restricting the results. For example, if the ID of the “West” dimension is “1234”, the WHERE filter looks like this:

RETURN Reps AS
SELECT SUM(Amount) AS SalesTotal
WHERE Dval(1234)
GROUP BY SalesRep
HAVING SalesTotal > 10000

Note: Ensure that you specify a valid attribute to the WHERE filter (or to any other filter clause).

The example with the name of the dimension “West”, as expressed using the programmatic API, is:

Statement reps = new Statement();
reps.setName("Reps");
reps.setWhereFilter(
  new FilterCompare("Region", FilterCompare.EQ, "West"));
reps.setHavingFilter(
  new FilterCompare("SalesTotal", FilterCompare.GT, "10000"));
GroupByList groupBys = new GroupByList();
groupBys.add(new GroupBy("SalesRep"));
reps.setGroupByList(groupBys);
Expr e = new ExprAggregate(ExprAggregate.SUM,
  new ExprKey("Amount"));
SelectList selects = new SelectList();
selects.add(new Select("SalesTotal",e));
reps.setSelectList(selects);

As mentioned above, filters can be specified for each expression. For example, a single query can include two expressions, where one expression computes the total for the entire aggregate while another expression computes the total for a particular sales representative:

RETURN QuarterTotals AS SELECT
  SUM(Amount) AS SalesTotal,
  SUM(Amount) WHERE SalesRep = 'John Smith' AS JohnTotal
GROUP BY Quarter

This would give us both the total overall sales and the total sales for John Smith in each quarter. The same example in the Java API is:

Statement stmnt = new Statement();
stmnt.setName("QuarterTotals");
GroupByList groupBys = new GroupByList();
groupBys.add(new GroupBy("Quarter"));
stmnt.setGroupByList(groupBys);

SelectList selects = new SelectList();
ExprAggregate e = new ExprAggregate(ExprAggregate.SUM,
  new ExprKey("Amount"));
selects.add(new Select("SalesTotal",e));

e = new ExprAggregate(ExprAggregate.SUM, new ExprKey("Amount"));
e.setFilter(
  new FilterCompare("SalesRep", FilterCompare.EQ, 
   John Smith"));
selects.add(new Select("JohnTotal",e));

stmnt.setSelectList(selects);

When the HAVING clause is applied

In a query such as:

SELECT a AS b GROUP BY c HAVING d>7

the HAVING clause applies after the "a AS b" clause, so that it filters out the records returned by the SELECT, not the records on which the SELECT operates. If we were to write the query as a set of function calls, it would look like this:

HAVING(d>7,SELECT(a AS b,GROUP(BY c)))

That is, the SELECT gets data from the GROUP BY, does its calculations, and passes its results on to the HAVING to be filtered. If you are familiar with SQL, this may be a surprise, because, in SQL, HAVING has the opposite effect: it filters the results of the GROUP BY, not the results of the SELECT.