Bookshelf Home | Contents | Index | Search | PDF |
Siebel Analytics Server Administration Guide > SQL Reference > SQL Syntax and Semantics >
Rules for Queries with Aggregate Functions
The Siebel Analytics Server simplifies the SQL needed to craft aggregate queries. This section outlines the rules that the Siebel Analytics Server follows with respect to whether or not a query contains a GROUP BY clause and, if a GROUP BY clause is specified, what results you should expect from the query. The rules outlined in this section apply to all aggregates used in SQL statements (SUM, AVG, MIN, MAX, COUNT(*), and COUNT).
Computing Aggregates of Baseline Columns
A baseline column is a column that has no aggregation rule defined in the Aggregation tab of the Logical Column dialog in the repository. Baseline columns map to nonaggregated data at the level of granularity of the logical table to which they belong. If you perform aggregation (SUM, AVG, MIN, MAX, or COUNT) on a baseline column through a SQL request, the Siebel Analytics Server calculates the aggregation at the level based on the following rules:
- If there is no GROUP BY clause specified, the level of aggregation is grouped by all of the nonaggregate columns in the SELECT list.
- If there is a GROUP BY clause specified, the level of aggregation is based on the columns specified in the GROUP BY clause.
For example, consider the following query, where the column revenue is defined in the repository as a baseline column (no aggregation rules specified in the Logical Column > Aggregation tab):
select year, product, sum(revenue)
from time, products, facts
YEAR PRODUCT SUM(REVENUE)
1998 Coke 500
1998 Pepsi 600
1999 Coke 600
1999 Pepsi 550
2000 Coke 800
2000 Pepsi 600
This query returns results grouped by year and product; that is, it returns one row for each product and year combination. The sum calculated for each row is the sum of all the sales for that product in that year. It is logically the same query as the following:
select year, product, sum(revenue)
from time, products, facts
group by year, product
If you change the GROUP BY clause to only group by year, then the sum calculated is the sum of all products for the year, as follows:
select year, product, sum(revenue)
from time, products, facts
group by year
YEAR PRODUCT SUM(REVENUE)
1998 Coke 1100
1998 Pepsi 1100
1999 Coke 1150
1999 Pepsi 1150
2000 Coke 1400
2000 Pepsi 1400In this query result set, the sum of revenue is the same for each row corresponding to a given year, and that sum represents the total sales for that year, which in this case is the sales of Coke plus the sales of Pepsi.
If you add a column to the query requesting the COUNT of revenue, the Siebel Analytics Server calculates the number of records used to calculate the results for each group, which is a year in this case, as follows:
select year, product, sum(revenue), count(revenue)
from time, products, facts
group by year
YEAR PRODUCT SUM(REVENUE) COUNT(REVENUE)
1998 Coke 1100 6000
1998 Pepsi 1100 6000
1999 Coke 1150 6500
1999 Pepsi 1150 6500
2000 Coke 1400 8000
2000 Pepsi 1400 8000Computing Aggregates of Measure Columns
A measure column is a column that has a default aggregation rule defined in the Aggregation tab of the Logical Column dialog in the repository. Measure columns always calculate the aggregation with which they are defined. If you perform explicit aggregation (SUM, AVG, MIN, MAX, or COUNT) on a measure column through a SQL request, you are actually asking for an aggregate of an aggregate. For these nested aggregates, the Siebel Analytics Server calculates the aggregation based on the following rules:
- A request for a measure column without an aggregate function defined in a SQL statement is always grouped at the level of the nonaggregate columns in the SELECT list, regardless of whether the query specifies a GROUP BY clause.
- If there is no GROUP BY clause specified, the nested aggregate is a grand total of each group determined by all of the nonaggregate columns in the SELECT list.
- If there is a GROUP BY clause specified, the nested aggregation calculates the total for each group as specified in the GROUP BY clause.
For example, consider the following query, where the column SumOfRevenue is defined in the repository as a measure column with a default aggregation rule of SUM (SUM aggregation rule specified in the Aggregation tab of the Logical Column dialog):
select year, product, SumOfRevenue, sum(SumOfRevenue)
from time, products, facts
YEAR PRODUCT SUMofREVENUE SUM(SUMofREVENUE)
1998 Coke 500 3650
1998 Pepsi 600 3650
1999 Coke 600 3650
1999 Pepsi 550 3650
2000 Coke 800 3650
2000 Pepsi 600 3650This query returns results grouped by year and product; that is, it returns one row for each product and year combination. The sum calculated for each row in the SumOfRevenue column is the sum of all the sales for that product in that year because the measure column is always at the level defined by the nonaggregation columns in the query. It is logically the same query as the following:
select year, product, SumOfRevenue, sum(SumOfRevenue)
from time, products, facts
group by year, productIf you change the GROUP BY clause to only group by year, then the sum calculated in the SumOfRevenue column is the sum of each product for the year, and the sum calculated in the SUM(SumOfRevenue) column is total sales of all products for the given year, as follows:
select year, product, SumOfRevenue, sum(SumOfRevenue)
from time, products, facts
group by year
YEAR PRODUCT SUMofREVENUE SUM(SUMofREVENUE)
1998 Coke 500 1100
1998 Pepsi 600 1100
1999 Coke 600 1150
1999 Pepsi 550 1150
2000 Coke 800 1400
2000 Pepsi 600 1400In this result set, the sum calculated for each row in the SumOfRevenue column is the sum of all the sales for that product in that year because the measure column is always at the level defined by the nonaggregation columns in the query. The SUM(SumOfRevenue) is the same for each row corresponding to a given year, and that sum represents the total sales for that year, which in this case is the sales of Coke plus the sales of Pepsi.
Display Function Reset Behavior
A display function is a function that operates on the result set of a query. The display functions the Siebel Analytics Server supports (RANK, TOPn, BOTTOMn, PERCENTILE, NTILE, MAVG, MEDIAN, and varieties of standard deviation) are specified in the SELECT list of a SQL query. Queries that use display functions conform to the following rules:
- If no GROUP BY clause is specified, the display function operates across the entire result set; that is, the grouping level for the display function is the same as for the query.
- If there is a GROUP BY clause specified, the display function resets its values for each group as specified in the GROUP BY clause.
For example, consider the following query, where SumOfRevenue is defined as a measure column with the default aggregation rule of SUM:
select year, product, SumOfRevenue, rank(SumOfRevenue)
from time, products, facts
YEAR PRODUCT SUMOFREVENUE RANK(SUMOFREVENUE)
1998 Coke 500 6
1998 Pepsi 600 2
1999 Coke 600 2
1999 Pepsi 550 5
2000 Coke 800 1
2000 Pepsi 600 2In this query result set, there is no GROUP BY clause specified, so the rank is calculated across the entire result set. The query is logically the same query as the following:
select year, product, SumOfRevenue, rank(SumOfRevenue))
from time, products, facts
group by year, productIf you change the GROUP BY clause to only group by year, then the rank is reset for each year, as follows:
select year, product, sum(revenue), rank(sum(revenue))
from time, products, facts
group by year
YEAR PRODUCT SUM(REVENUE) RANK(SUM(REVENUE))
1998 Coke 500 2
1998 Pepsi 600 1
1999 Coke 600 1
1999 Pepsi 550 2
2000 Coke 800 1
2000 Pepsi 600 2In this result set, the rank is reset each time the year changes, and because there are two rows for each year, the value of the rank is always either
1 or 2.Alternative Syntax
When using an aggregate function, you can calculate a specified level of aggregation using BY within the aggregate function. If you do this, you do not need a GROUP BY clause.
For example, the query:
select year, product, revenue, sum(revenue by year) as year_revenue from softdrinks
will return the column year_revenue that displays revenue aggregated by year.
The same syntax can be used with display functions. The query:
select year, product, revenue, rank(revenue), rank(revenue by year) from softdrinks order by 1, 5
will calculate overall rank of revenue for each product for each year (each row in the entire result set) and also the rank of each product's revenue within each year.
Bookshelf Home | Contents | Index | Search | PDF |
Siebel Analytics Server Administration Guide Published: 23 June 2003 |