Bookshelf v7.8.2: Rules for Queries with Aggregate Functions

Siebel Business Analytics Server Administration Guide > SQL Reference > SQL Syntax and Semantics >

Rules for Queries with Aggregate Functions

The Analytics Server simplifies the SQL needed to craft aggregate queries. This section outlines the rules that the Analytics Server follows with respect to whether or not a query contains a GROUP BY clause and, if a GROUP BY clause is specified, what results you should expect from the query. The rules outlined in this section apply to all aggregates used in SQL statements (SUM, AVG, MIN, MAX, COUNT(*), and COUNT).

Computing Aggregates of Baseline Columns

A baseline column is a column that has no aggregation rule defined in the Aggregation tab of the Logical Column dialog in the repository. Baseline columns map to nonaggregated data at the level of granularity of the logical table to which they belong. If you perform aggregation (SUM, AVG, MIN, MAX, or COUNT) on a baseline column through a SQL request, the Analytics Server calculates the aggregation at the level based on the following rules:

If there is no GROUP BY clause specified, the level of aggregation is grouped by all of the nonaggregate columns in the SELECT list.
If there is a GROUP BY clause specified, the level of aggregation is based on the columns specified in the GROUP BY clause.

For example, consider the following query, where the column revenue is defined in the repository as a baseline column (no aggregation rules specified in the Logical Column > Aggregation tab):

select year, product, sum(revenue)

from time, products, facts

YEAR PRODUCT SUM(REVENUE)

1998 Coke 500

1998 Pepsi 600

1999 Coke 600

1999 Pepsi 550

2000 Coke 800

2000 Pepsi 600

This query returns results grouped by year and product; that is, it returns one row for each product and year combination. The sum calculated for each row is the sum of all the sales for that product in that year. It is logically the same query as the following:

select year, product, sum(revenue)

from time, products, facts

group by year, product

If you change the GROUP BY clause to only group by year, then the sum calculated is the sum of all products for the year, as follows:

select year, product, sum(revenue)

from time, products, facts

group by year

YEAR    PRODUCT     SUM(REVENUE)
1998    Coke         1100
1998    Pepsi        1100
1999    Coke         1150
1999    Pepsi        1150
2000    Coke         1400
2000    Pepsi        1400

In this query result set, the sum of revenue is the same for each row corresponding to a given year, and that sum represents the total sales for that year. In this case, it is the sales of Coke plus the sales of Pepsi.

If you add a column to the query requesting the COUNT of revenue, the Analytics Server calculates the number of records used to calculate the results for each group. In this case, it is a year, as shown in the following example:

select year, product, sum(revenue), count(revenue)
from time, products, facts
group by year

YEAR    PRODUCT     SUM(REVENUE)  COUNT(REVENUE)
1998    Coke         1100         6000
1998    Pepsi        1100         6000
1999    Coke         1150         6500
1999    Pepsi        1150         6500
2000    Coke         1400         8000
2000    Pepsi        1400         8000

Computing Aggregates of Measure Columns

A measure column is a column that has a default aggregation rule defined in the Aggregation tab of the Logical Column dialog in the repository. Measure columns always calculate the aggregation with which they are defined. If you perform explicit aggregation (SUM, AVG, MIN, MAX, or COUNT) on a measure column through a SQL request, you are actually asking for an aggregate of an aggregate. For these nested aggregates, the Analytics Server calculates the aggregation based on the following rules:

A request for a measure column without an aggregate function defined in a SQL statement is always grouped at the level of the nonaggregate columns in the SELECT list, regardless of whether the query specifies a GROUP BY clause.
If there is no GROUP BY clause specified, the nested aggregate is a grand total of each group determined by all of the nonaggregate columns in the SELECT list.
If there is a GROUP BY clause specified, the nested aggregation calculates the total for each group as specified in the GROUP BY clause.

For example, consider the following query, where the column SumOfRevenue is defined in the repository as a measure column with a default aggregation rule of SUM (SUM aggregation rule specified in the Aggregation tab of the Logical Column dialog):

select year, product, SumOfRevenue, sum(SumOfRevenue)
from time, products, facts

YEAR    PRODUCT     SUMofREVENUE   SUM(SUMofREVENUE)
1998    Coke         500            3650
1998    Pepsi        600            3650
1999    Coke         600            3650
1999    Pepsi        550            3650
2000    Coke         800            3650
2000    Pepsi        600            3650

This query returns results grouped by year and product; that is, it returns one row for each product and year combination. The sum calculated for each row in the SumOfRevenue column is the sum of all the sales for that product in that year because the measure column is always at the level defined by the nonaggregation columns in the query. It is logically the same query as the following:

select year, product, SumOfRevenue, sum(SumOfRevenue)
from time, products, facts
group by year, product

If you change the GROUP BY clause to only group by year, then the sum calculated in the SumOfRevenue column is the sum of each product for the year, and the sum calculated in the SUM(SumOfRevenue) column is total sales of all products for the given year, as follows:

select year, product, SumOfRevenue, sum(SumOfRevenue)
from time, products, facts
group by year

YEAR    PRODUCT     SUMofREVENUE   SUM(SUMofREVENUE)
1998    Coke         500            1100
1998    Pepsi        600            1100
1999    Coke         600            1150
1999    Pepsi        550            1150
2000    Coke         800            1400
2000    Pepsi        600            1400

In this result set, the sum calculated for each row in the SumOfRevenue column is the sum of all the sales for that product in that year because the measure column is always at the level defined by the nonaggregation columns in the query. The SUM(SumOfRevenue) is the same for each row corresponding to a given year, and that sum represents the total sales for that year. In this case, it is the sales of Coke plus the sales of Pepsi.

Display Function Reset Behavior

A display function is a function that operates on the result set of a query. The display functions the Analytics Server supports (RANK, TOPn, BOTTOMn, PERCENTILE, NTILE, MAVG, MEDIAN, and varieties of standard deviation) are specified in the SELECT list of a SQL query. Queries that use display functions conform to the following rules:

If no GROUP BY clause is specified, the display function operates across the entire result set; that is, the grouping level for the display function is the same as for the query.
If there is a GROUP BY clause specified, the display function resets its values for each group as specified in the GROUP BY clause.

For example, consider the following query, where SumOfRevenue is defined as a measure column with the default aggregation rule of SUM:

select year, product, SumOfRevenue, rank(SumOfRevenue)
from time, products, facts

YEAR    PRODUCT     SUMOFREVENUE RANK(SUMOFREVENUE)
1998    Coke         500          6
1998    Pepsi        600          2
1999    Coke         600          2
1999    Pepsi        550          5
2000    Coke         800          1
2000    Pepsi        600          2

In this query result set, there is no GROUP BY clause specified, so the rank is calculated across the entire result set. The query is logically the same query as the following:

select year, product, SumOfRevenue, rank(SumOfRevenue))
from time, products, facts
group by year, product

If you change the GROUP BY clause to only group by year, then the rank is reset for each year, as follows:

select year, product, sum(revenue), rank(sum(revenue))
from time, products, facts
group by year

YEAR    PRODUCT     SUM(REVENUE) RANK(SUM(REVENUE))
1998    Coke         500          2
1998    Pepsi        600          1
1999    Coke         600          1
1999    Pepsi        550          2
2000    Coke         800          1
2000    Pepsi        600          2

In this result set, the rank is reset each time the year changes, and because there are two rows for each year, the value of the rank is always either
1 or 2.

Alternative Syntax

When using an aggregate function, you can calculate a specified level of aggregation using BY within the aggregate function. If you do this, you do not need a GROUP BY clause.

For example, the query:

select year, product, revenue, sum(revenue by year) as year_revenue from softdrinks

will return the column year_revenue that displays revenue aggregated by year.

The same syntax can be used with display functions. The query:

select year, product, revenue, rank(revenue), rank(revenue by year) from softdrinks order by 1, 5

will calculate overall rank of revenue for each product for each year (each row in the entire result set) and also the rank of each product's revenue within each year.

Siebel Business Analytics Server Administration Guide