Bookshelf v10.1.3.2: Rules for Queries with Aggregate Functions

Oracle^®Business Intelligence Server Administration Guide > Oracle BI Server SQL Reference > SQL Syntax and Semantics >

Rules for Queries with Aggregate Functions

The Oracle BI Server simplifies the SQL needed to craft aggregate queries. This section outlines the rules that the Oracle BI Server follows with respect to whether or not a query contains a GROUP BY clause and, if a GROUP BY clause is specified, what results you should expect from the query. The rules outlined in this section apply to all aggregates used in SQL statements (SUM, AVG, MIN, MAX, COUNT(*), and COUNT).

Computing Aggregates of Baseline Columns

A baseline column is a column that has no aggregation rule defined in the Aggregation tab of the Logical Column dialog in the repository. Baseline columns map to nonaggregated data at the level of granularity of the logical table to which they belong. If you perform aggregation (SUM, AVG, MIN, MAX, or COUNT) on a baseline column through a SQL request, the Oracle BI Server calculates the aggregation at the level based on the following rules:

If there is no GROUP BY clause specified, the level of aggregation is grouped by all of the nonaggregate columns in the SELECT list.
If there is a GROUP BY clause specified, the level of aggregation is based on the columns specified in the GROUP BY clause.

For example, consider the following query, where the column revenue is defined in the repository as a baseline column (no aggregation rules specified in the Logical Column > Aggregation tab):

select year, product, sum(revenue)

from time, products, facts

YEAR PRODUCT SUM(REVENUE)

1998 Coke 500

1998 Pepsi 600

1999 Coke 600

1999 Pepsi 550

2000 Coke 800

2000 Pepsi 600

This query returns results grouped by year and product; that is, it returns one row for each product and year combination. The sum calculated for each row is the sum of all the sales for that product in that year. It is logically the same query as the following:

select year, product, sum(revenue)

from time, products, facts

group by year, product

If you change the GROUP BY clause to only group by year, then the sum calculated is the sum of all products for the year, as follows:

select year, product, sum(revenue)

from time, products, facts

group by year

YEAR    PRODUCT     SUM(REVENUE)
1998    Coke         1100
1998    Pepsi        1100
1999    Coke         1150
1999    Pepsi        1150
2000    Coke         1400
2000    Pepsi        1400

In this query result set, the sum of revenue is the same for each row corresponding to a given year, and that sum represents the total sales for that year. In this case, it is the sales of Coke plus the sales of Pepsi.

If you add a column to the query requesting the COUNT of revenue, the Oracle BI Server calculates the number of records used to calculate the results for each group. In this case, it is a year, as shown in the following example:

select year, product, sum(revenue), count(revenue)
from time, products, facts
group by year

YEAR    PRODUCT     SUM(REVENUE)  COUNT(REVENUE)
1998    Coke         1100         6000
1998    Pepsi        1100         6000
1999    Coke         1150         6500
1999    Pepsi        1150         6500
2000    Coke         1400         8000
2000    Pepsi        1400         8000

Computing Aggregates of Measure Columns

A measure column is a column that has a default aggregation rule defined in the Aggregation tab of the Logical Column dialog in the repository. Measure columns always calculate the aggregation with which they are defined. If you perform explicit aggregation (SUM, AVG, MIN, MAX, or COUNT) on a measure column through a SQL request, you are actually asking for an aggregate of an aggregate. For these nested aggregates, the Oracle BI Server calculates the aggregation based on the following rules:

A request for a measure column without an aggregate function defined in a SQL statement is always grouped at the level of the nonaggregate columns in the SELECT list, regardless of whether the query specifies a GROUP BY clause.
If there is no GROUP BY clause specified, the nested aggregate is a grand total of each group determined by all of the nonaggregate columns in the SELECT list.
If there is a GROUP BY clause specified, the nested aggregation calculates the total for each group as specified in the GROUP BY clause.

For example, consider the following query, where the column SumOfRevenue is defined in the repository as a measure column with a default aggregation rule of SUM (SUM aggregation rule specified in the Aggregation tab of the Logical Column dialog):

select year, product, SumOfRevenue, sum(SumOfRevenue)
from time, products, facts

YEAR    PRODUCT     SUMofREVENUE   SUM(SUMofREVENUE)
1998    Coke         500            3650
1998    Pepsi        600            3650
1999    Coke         600            3650
1999    Pepsi        550            3650
2000    Coke         800            3650
2000    Pepsi        600            3650

This query returns results grouped by year and product; that is, it returns one row for each product and year combination. The sum calculated for each row in the SumOfRevenue column is the sum of all the sales for that product in that year because the measure column is always at the level defined by the nonaggregation columns in the query. It is logically the same query as the following:

select year, product, SumOfRevenue, sum(SumOfRevenue)
from time, products, facts
group by year, product

If you change the GROUP BY clause to only group by year, then the sum calculated in the SumOfRevenue column is the sum of each product for the year, and the sum calculated in the SUM(SumOfRevenue) column is total sales of all products for the given year, as follows:

select year, product, SumOfRevenue, sum(SumOfRevenue)
from time, products, facts
group by year

YEAR    PRODUCT     SUMofREVENUE   SUM(SUMofREVENUE)
1998    Coke         500            1100
1998    Pepsi        600            1100
1999    Coke         600            1150
1999    Pepsi        550            1150
2000    Coke         800            1400
2000    Pepsi        600            1400

In this result set, the sum calculated for each row in the SumOfRevenue column is the sum of all the sales for that product in that year because the measure column is always at the level defined by the nonaggregation columns in the query. The SUM(SumOfRevenue) is the same for each row corresponding to a given year, and that sum represents the total sales for that year. In this case, it is the sales of Coke plus the sales of Pepsi.

Display Function Reset Behavior

A display function is a function that operates on the result set of a query. The display functions the Oracle BI Server supports (RANK, TOPn, BOTTOMn, PERCENTILE, NTILE, MAVG, MEDIAN, and varieties of standard deviation) are specified in the SELECT list of a SQL query. Queries that use display functions conform to the following rules:

If no GROUP BY clause is specified, the display function operates across the entire result set; that is, the grouping level for the display function is the same as for the query.
If there is a GROUP BY clause specified, the display function resets its values for each group as specified in the GROUP BY clause.

For example, in the following query, SumOfRevenue is defined as a measure column with the default aggregation rule of SUM:

select year, product, SumOfRevenue, rank(SumOfRevenue)
from time, products, facts

YEAR    PRODUCT     SUMOFREVENUE RANK(SUMOFREVENUE)
1998    Coke         500          6
1998    Pepsi        600          2
1999    Coke         600          2
1999    Pepsi        550          5
2000    Coke         800          1
2000    Pepsi        600          2

In this query result set, there is no GROUP BY clause specified, so the rank is calculated across the entire result set. The query is logically the same query as the following:

select year, product, SumOfRevenue, rank(SumOfRevenue))
from time, products, facts
group by year, product

If you change the GROUP BY clause to only group by year, then the rank is reset for each year, as follows:

select year, product, sum(revenue), rank(sum(revenue))
from time, products, facts
group by year

YEAR    PRODUCT     SUM(REVENUE) RANK(SUM(REVENUE))
1998    Coke         500          2
1998    Pepsi        600          1
1999    Coke         600          1
1999    Pepsi        550          2
2000    Coke         800          1
2000    Pepsi        600          2

In this result set, the rank is reset each time the year changes, and because there are two rows for each year, the value of the rank is always either
1 or 2.

Alternative Syntax

When using an aggregate function, you can calculate a specified level of aggregation using BY within the aggregate function. If you do this, you do not need a GROUP BY clause.

For example, the query:

select year, product, revenue, sum(revenue by year) as year_revenue from softdrinks

will return the column year_revenue that displays revenue aggregated by year.

The same syntax can be used with display functions. The query:

select year, product, revenue, rank(revenue), rank(revenue by year) from softdrinks order by 1, 5

will calculate overall rank of revenue for each product for each year (each row in the entire result set) and also the rank of each product's revenue within each year.

Using FILTER to Compute a Conditional Aggregate

In SQL query language, traditional aggregates, such as SUM, COUNT, MIN, and MAX are evaluated on a group of tuples (an ordered list of objects, each of a specified type), determined by the GROUP BY clause. All the aggregates specified in the SELECT clause of a query are evaluated over the same subset of tuples. Conditional aggregates extend SQL by restricting their input using a predicate.

FILTER is an operator that restricts the set of rows used to compute its aggregate argument to rows that satisfy the USING condition. The FILTER operator is a logical SQL construct. It may be used in logical queries referring to the metadata, or in logical columns that use existing logical columns as the source.

Syntax

Conditional aggregates are only notational concepts and they do not represent executable operators. Conditional aggregates are expressed in the form of a function as shown in the following statement:

FILTER(<measure_expression> USING <boolean_expression>)

Where:

<measure_expression> is an expression that contains at least one measure. The following is a list of examples:
- The expression Sales + 1 is allowed if Sales is a measure.
- The expression productid is not allowed if productid is a scalar attribute.
<boolean_expression> is a boolean expression (evaluates to TRUE or FALSE) that does not contain any measures. This expression may not contain any nested queries.

Example of the FILTER Function

The following is a simple example of the FILTER function:

SELECT year,

FILTER(sales USING product = 'coke'),

FILTER(sales USING product = 'pepsi')

FROM logBeverages

After navigation, this query is executed as follows:

SELECT year,

SUM(CASE WHEN product = 'coke' THEN sales),

SUM(CASE WHEN product = 'pepsi' THEN sales)

FROM physBeverages

WHERE product = 'coke' OR product = 'pepsi'

GROUP BY year

Error handling

In the example FILTER(X USING Y), error messages will be returned in the following situations:

The Y expression is not a boolean expression.
The Y expression contains measures.
FILTER is used in outer query block.
Explicit aggregates are used in the X (measure) expression. For example, FILTER(COUNT(product), C).

Oracle® Business Intelligence Server Administration Guide		Copyright © 2007, Oracle. All rights reserved.