Oracle® Business Intelligence Discoverer Plus User's Guide
10g Release 2 (10.1.2.0.0) Part No. B13915-01 |
|
![]() Previous |
![]() Next |
The examples in this section show you how to use the Oracle8i analytic functions with Discoverer to perform detailed data analysis.
This section contains the following topics:
Notes
OracleBI Discoverer supports all functions that are supported by the version of the Oracle database being used. For example, analytic functions are supported by Oracle 8.1.7 (or later) Enterprise Edition databases.
Examples in this section use a selection of commonly used functions. For a complete list of functions and their full syntax, refer to Oracle8i SQL Reference and Oracle8i Data Warehousing Guide.
For more information on how to create calculations, see "What are analytic functions?".
Discoverer provides easy-to-use templates for the most popular analytic functions (for more information, see "What analytic function templates are available in Discoverer?").
Analytic functions are classified in the following categories:
Ranking - Address business questions like: 'What are the top 10 and bottom 10 salespeople per region?'.
Banding - Address business questions like 'What brands make up 25% of sales?'.
Windowing - Address business questions like 'What is the 13-week moving average of a stock price?' or 'What is the cumulative sum of sales per region?'.
Reporting Aggregates - After a query has been processed, aggregate values like the number of resulting rows, or the sum of a column in a set of rows. Address questions like 'What are each product's Sales as a percentage of Sales for its product group?'.
Lag/Lead - Address business questions like 'What was the value of sales for the same period one year ago?'.
Statistics - Perform statistical analysis with Business Intelligence OLAP/spreadsheet applications. For example, covariance and linear regression functions.
When you use analytic functions, note that they have a precise definition which might not change as you drill, pivot, or sort the result set. For example, if you use the Rank function to assign ranks to sales figures partitioned by quarter, if you drill down to the month level, the rank still only applies to the quarter level.
Discoverer Plus Relational provides easy-to-use templates for the most popular analytic functions (for more information, see "What analytic function templates are available in Discoverer?").
If you want to create an analytic function that does not have a template, you can either type or paste it directly into the Calculation field on the "New Calculation dialog", or you can select it from the function list.
If you select analytic functions from the function list on the "New Calculation dialog", Discoverer copies a blank analytic function into the Calculation field. The blank analytic function contains expr prompts for missing values that tell you what information you might need to provide. The expr prompts are designed to cover most types of usage, and should be used only as a guide. In other words, you will not always need to use every expr prompt to define an analytic function.
For example, when you select the RANK analytic function in the "New Calculation dialog", Discoverer types the following text into the Calculation field:
RANK() OVER (PARTITION BY expr1 ORDER BY expr2)
Although you can define a complex function using both expressions (expr1 and expr2), you can often define a simple function using only the ORDER BY expression; for example:
RANK() OVER(ORDER BY 'Sales')
This example ranks sales figures (defined in the 'Sales' item).
Notes
By default, results data is sorted in ascending order (ASC), nulls first (NULLS FIRST).
For more information about entering analytic functions, see "More about analytic functions expressions".
This section contains examples of rank functions.
Ranking functions compute the ranked list position of an item with respect to other items in an ordered list.
Examples:
Hint: You can also use analytic function templates to quickly and easily create calculations based on ranking functions (for more information, see "How to create a new calculation using an analytic function template".
This example calculates the ranked list position of a set of sales figures.
Worksheet options | Set to: |
---|---|
Items | Video Analysis Information: Calendar Year, Region, City, Sales SUM |
Sort Order | Rank Sales, Year, Region |
Conditions | Department = Video Sale OR Department = Video Rental
Calendar Year = 2000 |
Calculation Name | Rank Sales |
Calculation | RANK() OVER(ORDER BY Sales SUM DESC) |
Figure A-4 Worksheet containing the Rank Sales calculation
The worksheet shows the ranked list position of sales figures for cities in the year 2000.
Notes
By default, ranked results data is sorted in ascending order (ASC), nulls first (NULLS FIRST). The additional DESC parameter sorts the results in descending order, which ranks the highest value with the Rank 1.
This example calculates the ranked list position of a set of sales figures within each region.
Worksheet options | Set to: |
---|---|
Items | Video Analysis Information: Calendar Year, Region, City, Sales SUM |
Sort Order | Year, Region |
Conditions | Department = Video Sale OR Department = Video Rental
Calendar Year = 2000 |
Calculation Name | Rank sales within Region |
Calculation | RANK() OVER(PARTITION BY Year, Region ORDER BY Sales SUM DESC) |
Figure A-5 Worksheet containing the Rank sales within Region calculation
The worksheet shows the ranked list position of sales figures for cities grouped by region within year.
Notes
By default, ranked results data is sorted in ascending order (ASC), nulls first (NULLS FIRST). The additional DESC parameter sorts the results in descending order, which ranks the highest value with the Rank 1.
This example calculates the ranked list position of a set of sales figures and displays the top three selling cities.
Worksheet options | Set to: |
---|---|
Items | Video Analysis Information: Calendar Year, Region, City, Sales SUM |
Sort Order | Year, Region |
Conditions | Department = Video Sale OR Department = Video Rental
Calendar Year = 2000 Rank Top <= 3 |
Calculation Name | Rank Top |
Calculation | RANK() OVER(PARTITION BY Year, Region ORDER BY Sales SUM DESC) |
Figure A-6 Worksheet containing the Rank Top calculation used in a condition
The worksheet shows a ranked list of the top three highest sales figures for each region within year.
Notes
By default, ranked results data is sorted in ascending order (ASC), nulls first (NULLS FIRST). The additional DESC parameter sorts the results in descending order, which ranks the highest value with the Rank 1.
Hint: To quickly filter the list to the first, second, or third ranked cities, pivot the Rank Top item to the page axis.
This example calculates the ranked list position of a set of sales figures and displays the top three and bottom three performing cities per region.
Worksheet options | Set to: |
---|---|
Items | Video Analysis Information: Calendar Year, Region, City, Sales SUM |
Sort Order | Year, Region, Rank Top |
Conditions | Department = Video Sale OR Department = Video Rental
Calendar Year = 2000 Rank Top <= 3 OR Rank Bottom <= 3 |
Calculation Name | Rank Top |
Calculation | RANK() OVER(PARTITION BY Year, Region ORDER BY Sales SUM DESC) |
Additional Calculations Required | Rank Bottom = RANK() OVER(PARTITION BY Year, Region ORDER BY Sales SUM ASC) |
Figure A-7 Worksheet containing the Rank Top calculation used in a condition
The worksheet shows a ranked list of the three highest and three lowest sales figures for each region within year.
Notes
This analysis involves three steps:
Assign ranks to Cities on Sales SUM in descending order, as Rank Top.
Assign ranks to Cities on Sales SUM in ascending order, as Rank Bottom.
Displaying only Rank Top, filter the data using a Condition to return only the top three and bottom three ranked Cities.
In the example, in the 'Central' Region, the top three cities are ranked 1, 2, and 3; the bottom three cities are ranked 5, 6, and 7. In the 'East' Region, the top three cities are ranked 1, 2, and 3; the bottom three cities are ranked 6, 7, and 8, and so on.
This section contains examples of banding functions.
Banding is a type of ranking that divides a list of values in a partition into a specified number of groups called bands (also known as buckets) and assigns each value to a band.
Examples:
Hint: You can also use analytic function templates to quickly and easily create calculations based on banding functions (for more information, see "How to create a new calculation using an analytic function template".
Two common types of banding are:
Banding by value - this divides values into groups according to their value (also known as equi-width bands). This type of analysis is also known as frequency distribution.
Here, the function typically takes the largest value minus the lowest value, and divides the result by the number of bands required. This value defines the range of each Band.
Values are then assigned to bands according to which range they fall into. Therefore, the number of values in each Band might differ. For example, if we have 100 values and divide them into four equi-width bands, each band might contain different numbers of values.
Use the GREATEST function or the CASE function to produce equi-width bands based on value.
Note: If that database version that you are using does not support IF or CASE statements, you can use DECODE instead. Alternatively, you can create a PL/SQL function in the database.
Hint: If you are using an Oracle9i database, use the WIDTH_BUCKET function to produce equi-width bands (see "Example: Producing equi-width bands using WIDTH_BUCKET").
Banding by rank - this divides values into groups according to their rank (also known as equi-height bands). This type of analysis is also known as percentile analysis (e.g. 4 bands give quartiles).
Here, the function divides the number of values in the partition by the number of bands, which gives the number of values in each band. An equal number of values are then placed in each band. For example, if we have 100 values and divide them into four equi-height bands, each band contains 25 values.
Use the NTILE function to produce equi-height bands based on rank.
This example divides sales figures into bands according to their value (e.g. for frequency analysis). For more information, see "Example: Banding by value (2)".
Worksheet options | Set to: |
---|---|
Items | Video Analysis Information: Calendar Year, Region, City, Sales SUM |
Sort Order | Year, Region |
Conditions | Department = Video Sale OR Department = Video Rental
Calendar Year = 2000 Region = Central |
Calculation Name | Sales Bands |
Calculation | GREATEST(1,4-FLOOR((Sales SUM-Min Sales for Region)/GREATEST(1,FLOOR((Max Sales for Region-Min Sales for Region+1)/4)))) |
Additional Calculations Required | Max Sales for Region = MAX(Sales SUM) OVER(PARTITION BY Region,Year)
Min Sales for Region = MIN(Sales SUM) OVER(PARTITION BY Region,Year) |
Figure A-10 Worksheet containing the Sales Bands calculation
The worksheet shows equi-width bands for sales figures for cities in the central region within year.
Notes
Using the Central Region and Year 2000 as an example, this function takes the largest value (45,758) minus the smallest value (7,749) and divides it by four ((45,758-7,749)/4), giving four equal Bands of 9,502.25. This gives four bands with the following ranges:
Band 1 - 36,255.75 to 45,758
Band 2 - 26,753.5 to 36,255.75
Band 3 - 17,251.25 to 26,753.5
Band 4 - 7,749 to 17,251.25
Each value is placed in one of the four Bands depending on which range the Sales SUM value falls into.
The FLOOR function returns the largest integer equal to or less than n. For example, in Dallas, the expression FLOOR(Sales SUM-Min Sales for Region) returns the smallest integer value from 7,749 minus 7,749, which returns 0. When used in conjunction with the GREATEST function (see calculation above), the expression GREATEST(1,4-FLOOR((Sales SUM-Min Sales for Region) returns the largest value from either 1 or, 4 minus the smallest integer value from 7,749 minus 7,749 (4 minus 0 equals 4). In other words, the expression returns the value 4.
If you are using an Oracle9i database, use the WIDTH_BUCKET function to produce equi-width bands (see "Example: Producing equi-width bands using WIDTH_BUCKET").
This example divides sales figures into bands according to their value. The example creates the same results as the example in "Example: Banding by value (1)", except that it uses a CASE statement rather than the GREATEST function (e.g. for frequency analysis).
Worksheet options | Set to: |
---|---|
Items | Video Analysis Information: Calendar Year, Region, City, Sales SUM |
Sort Order | Year, Region |
Conditions | Department = Video Sale OR Department = Video Rental
Calendar Year = 2000 Region = Central |
Calculation Name | Sales Bands 2 |
Calculation | CASE WHEN Sales SUM < Q1 THEN 4 WHEN Sales SUM < Q2 THEN 3 WHEN Sales SUM < Q3 THEN 2 WHEN Sales SUM >= Q3 THEN 1 END |
Additional Calculations Required | MAX Sales = MAX(Sales SUM) OVER(PARTITION BY Year)
MIN Sales = MIN(Sales SUM) OVER(PARTITION BY Year) Range = (MAX Sales - MIN Sales)/4 Q1 = MIN Sales + Range Q2 = MIN Sales + (Range*2) Q3 = MAX Sales - Range |
Figure A-11 Worksheet containing the Sales Bands 2 calculation
The worksheet shows equi-width bands for sales figures for cities in the central region within year.
Notes
This function uses a series of IF statements in the form of a CASE function to assign sales figures into bands (see Band ranges below).
If you are using an Oracle9i database, use the WIDTH_BUCKET function to produce equi-width bands (see "Example: Producing equi-width bands using WIDTH_BUCKET").
This example divides sales figures into two bands according to their rank (e.g. for percentile analysis).
Worksheet options | Set to: |
---|---|
Items | Video Analysis Information: Calendar Year, Region, City, Sales SUM |
Sort Order | Year, Region |
Conditions | Department = Video Sale OR Department = Video Rental
Calendar Year = 2000 Region = Central |
Calculation Name | Sales Bands 3 |
Calculation | NTILE(2) OVER(PARTITION BY Year, Region ORDER BY Sales SUM DESC) |
Figure A-12 Worksheet containing the Sales Bands 3 calculation
The worksheet shows equi-height bands for sales figures for cities in the central region within year.
Notes
Using the Central Region and Year 2000 as an example, this function takes the number of values (which is six) and divides it by two, giving three values per Band. It then takes the list of values ordered by Sales SUM and places values one, two, and three in band 1, values four, five, and six in band 2.
This section contains examples of windowing functions.
Windowing functions are used to compute aggregates using values from other rows. For example, cumulative, moving, and centered aggregates.
Note: For more information about using row-based and time-based intervals, see "Examples of using row-based and time-based intervals".
Examples:
Hint: You can also use analytic function templates to quickly and easily create calculations based on windowing functions (for more information, see "How to create a new calculation using an analytic function template".
Two common types of windowing are:
Windowing with time-based intervals - here, a time-based interval is based on a value relative to an existing value. For example, three months preceding a date value.
Note: Time-based intervals are also known as logical offsets.
Time-based intervals are useful when data has missing rows. For example, you might not have seven rows for sales figures for seven days in a week, but you still want to find a weekly average.
For example, if we have a list of monthly sales figures, a logical window might compute a moving average of the previous three months (inclusive of the current month). When calculating the average, the calculation assumes a NULL value for months missing from the list. In the example below, the three-month moving average for November assumes NULL values for the missing months September and October.
Windowing with row-based intervals - here, a row-based interval is based on a value that is a specified number of rows from an existing value. For example, three rows from the current item.
Note: Time-based intervals are also known as physical offsets.
Row-based intervals are useful when data does not have missing rows. For example, if you know that you have seven rows for sales figures for seven days in a week, you can calculate a weekly average by starting the interval six rows before the current row (inclusive).
For example, if we have a list of monthly sales figures, a physical window might compute a moving average of the previous three rows. When calculating the average, the calculation ignores months missing from the list. In the example below, the three-month moving average for November uses June, July, and November in the calculation.
Note: For more information about using row-based and time-based intervals, see "Examples of using row-based and time-based intervals".
This example uses a time-based interval to calculate a moving three month sales average.
Note: Moving averages are also known as rolling averages.
Worksheet options | Set to: |
---|---|
Items | Video Analysis Information: Calendar Year, Calendar Month, Sales SUM |
Sort Order | Year, Month |
Conditions | Department = Video Sale OR Department = Video Rental
Calendar Year = 2000 Region = Central |
Calculation Name | Moving Average |
Calculation | AVG(Sales SUM) OVER(ORDER BY "Calendar Month" RANGE INTERVAL '2' MONTH PRECEDING) |
Figure A-15 Worksheet containing the Moving Average calculation
The worksheet shows a moving three month average for sales figures for months in the year 2000.
Notes
Note that you define the RANGE INTERVAL as '2', not '3', even though you want a three month window. This is because the window expression implicitly includes the current row. Therefore, in this example, the INTERVAL '2' plus the current row gives a total of three months (2 + current row = 3).
This example uses a row-based interval to calculate the cumulative value of sales.
Worksheet options | Set to: |
---|---|
Items | Video Analysis Information: Calendar Year, Region, City, Sales SUM |
Sort Order | Year, Region |
Conditions | Department = Video Sale OR Department = Video Rental
Calendar Year = 2000 Region = Central |
Calculation Name | Cumulative Total |
Calculation | SUM(Sales SUM) OVER(PARTITION BY "Calendar Year", Region ORDER BY Sales SUM ROWS UNBOUNDED PRECEDING) |
Figure A-16 Worksheet containing the Cumulative Total calculation
The worksheet shows a cumulative total for sales figures for cities in the central region.
This example uses a time-based interval to calculate sales figures for previous years. This enables you to compare sales figures over different years, or compare previous years' sales figures with other values, such as spending in previous years.
Worksheet options | Set to: |
---|---|
Items | Video Analysis Information: Calendar Year, Sales SUM |
Sort Order | N/A |
Conditions | N/A |
Calculation Name | Sales Last Year |
Calculation | SUM(Sales SUM) OVER(ORDER BY "Calendar Year" RANGE BETWEEN INTERVAL '1' YEAR PRECEDING AND INTERVAL '1' YEAR PRECEDING) |
Figure A-17 Worksheet containing the Sales Last Year calculation
For each row, the worksheet shows the sales total for the previous year.
Notes
In the example above, the Sales Last Year value for 1998 is NULL because the database does not contain information for 1997.
You can also use LAG/LEAD functions to compare values across time (see "LAG/LEAD function examples".
This section contains examples of reporting functions.
Reporting functions are used to compute aggregates.
Examples:
Hint: You can also use analytic function templates to quickly and easily create calculations based on reporting functions (for more information, see "How to create a new calculation using an analytic function template".
This example calculates annual sales.
Worksheet options | Set to: |
---|---|
Items | Video Analysis Information: Calendar Year, Region, City, Sales SUM |
Sort Order | Year, Region |
Conditions | Department = Video Sale OR Department = Video Rental
Calendar Year = 2000 |
Calculation Name | Annual Sales |
Calculation | SUM(Sales SUM) OVER() |
Figure A-18 Worksheet containing the Annual Sales calculation
The worksheet shows the annual sales value for cities in the year 2000.
Notes
On table worksheets, the calculation displays the returned value for each row in the worksheet. To return a single value, move the calculation into the Page Items area (for more details see Additional Notes in "Example: Find the largest sales transactions in area with most sales trans'".
This example calculates the total annual sales by region.
Worksheet options | Set to: |
---|---|
Items | Video Analysis Information: Calendar Year, Region, City, Sales SUM |
Sort Order | Year, Region |
Conditions | Department = Video Sale OR Department = Video Rental
Calendar Year = 2000 |
Calculation Name | Annual Sales by Region |
Calculation | SUM(Sales SUM) OVER(PARTITION BY Year, Region ORDER BY Year, Region) |
.
Figure A-19 Worksheet containing the Annual Sales by Region calculation
The worksheet shows the annual sales total for cities, grouped by region within year.
This example calculates the percentage of annual sales per region for each city in each year.
Worksheet options | Set to: |
---|---|
Items | Video Analysis Information: Calendar Year, Region, City, Sales SUM |
Sort Order | Year, Region, % of Annual Sales |
Conditions | Department = Video Sale OR Department = Video Rental
Calendar Year = 2000 |
Calculation Name | % of Annual Sales |
Calculation | Sales SUM*100/Annual Sales by Region |
Additional Calculations Required | Annual Sales by Region = SUM(Sales SUM) OVER(PARTITION BY Year, Region ORDER BY Year, Region) |
Figure A-20 Worksheet containing the % of Annual Sales calculation
The worksheet shows sales as a percentage of annual sales, grouped by region within year.
This example calculates city sales as a percentage of total sales.
Worksheet options | Set to: |
---|---|
Items | Video Analysis Information: Calendar Year, Region, City, Sales SUM |
Sort Order | Year, Region, % of Annual Sales |
Conditions | Department = Video Sale OR Department = Video Rental
Calendar Year = 2000 Region = Central |
Calculation Name | % of total Sales |
Calculation | RATIO_TO_REPORT(Sales SUM) OVER()*100 |
Figure A-21 Worksheet containing the % of Annual Sales calculation
The worksheet shows the sales value for cities as a percentage of total sales.
Notes
The function RATIO_TO_REPORT computes the ratio of a value to the sum of a set of values.
This section contains examples of LAG and LEAD functions.
LAG and LEAD functions are typically used to compare values in different time periods. For example, compare sales figures in 2000 with sales figures in 2001.
LAG - provides access to more than one row of a table at the same time without a self-join.
LEAD - provides access to a row at a given offset after the current position.
You can also use windowing functions to compare values over time (see "Example: Compare sales figures across time using windowing".
Hint: You can also use analytic function templates to quickly and easily create calculations based on LAG/LEAD functions (for more information, see "How to create a new calculation using an analytic function template".
Examples:
In this example, you want to compare monthly sales figures with sales figures for the same month in the previous year. For example, to look at how January 1999 sales compare with January 1998 sales.
Worksheet options | Set to: |
---|---|
Items | Video Analysis Information: Calendar Year, Calendar Quarter, Calendar Month, Sales SUM |
Sort Order | Calendar Year, Calendar Quarter, Calendar Month |
Conditions | Department = Video Sale OR Department = Video Rental
|
Calculation Name | Previous Year |
Calculation | LAG(Sales SUM,1) OVER(PARTITION BY "Calendar Month" ORDER BY "Calendar Year") |
Figure A-22 Worksheet containing the Previous Year calculation partitioned by Calendar Month
The worksheet shows contains the calculation Previous Year, which shows for each Sales SUM amount the sales amount for one year previously. For example, the Previous Year value for January 1999 is $50889, which is the Sales SUM value for January 1998.
Notes
Because there are no comparative figures for 1998, the Previous Year values for 1998 are blank.
Notice that the value '1' in the LAG(Sales SUM,1) clause calculates the value from one year previously. For example, if you changed this value to '2', you would calculate the value from two years previously.
Notice that the calculation includes the clause 'PARTITION BY Calendar Month', which gives you a value for each combination of Calendar Year (in the ORDER BY clause) and Calendar Month (in the PARTITION BY clause). In other words, the Previous Year value for February 1999 is the Sales SUM value for February 1998. If you removed this clause, you would calculate the value for the previous month (see example below). In other words, the Previous Year value for February 1999 would be the Sales SUM value for January 1999.
Figure A-23 Worksheet containing the Previous Year calculation with the partition removed
In this example, you want to calculate the percentage growth of sales across years by comparing the sales figures with sales figures for the same month in the previous year. You will do this using the comparative sales figures from example "Example: Compare sales figures across time using LAG/LEAD".
Worksheet option | Set to: |
---|---|
Items | Video Analysis Information: Calendar Year, Calendar Quarter, Calendar Month, Sales SUM |
Sort Order | Calendar Year, Calendar Quarter, Calendar Month |
Conditions | Department = Video Sale OR Department = Video Rental
|
Calculation Name | Growth % |
Calculation | (Sales SUM-"Previous year")*100/"Previous year" |
Additional Calculations Required | LAG(Sales SUM,1) OVER(PARTITION BY "Calendar Month" ORDER BY "Calendar Year") |
Figure A-24 Worksheet containing the Growth calculation
The worksheet shows contains the calculation Growth %, which shows for each month the percentage increase in sales since the previous year. For example, the Growth % value for January 1999 is 30.40% (i.e. from $50889 to $67887).
Notes
Because there are no comparative figures for 1998, the Growth values for 1998 are blank.
The calculation subtracts the Previous Year value from the Sales SUM value, then multiplies the result by the Sales SUM value divided by the Previous Year value. For example, if sales have risen from 75 to 100, the calculation becomes 25 * 1.33, giving 33.33% increase.
For more information about the calculation Previous Year, see "Example: Compare sales figures across time using LAG/LEAD".
In this example, you want to create a ranked list of sales growth, to show which months show the highest year on year increase in sales.
You will do this using the comparative sales figures and growth figures from examples "Example: Compare sales figures across time using LAG/LEAD" and "Example: Calculate sales growth across time", and a RANK function.
Worksheet options | Set to: |
---|---|
Items | Video Analysis Information: Calendar Year, Calendar Quarter, Calendar Month, Sales SUM |
Sort Order | Calendar Year, Calendar Quarter, Calendar Month |
Conditions | Department = Video Sale OR Department = Video Rental
|
Calculation Name | Rank Growth |
Calculation | RANK() OVER(PARTITION BY "Calendar Year" ORDER BY "Growth %" DESC) |
Additional Calculations Required | Previous Year = LAG(Sales SUM,1) OVER(PARTITION BY "Calendar Month" ORDER BY "Calendar Year")
Growth % = (Sales SUM-"Previous year")*100/"Previous year" |
Figure A-25 Worksheet containing the Rank Growth calculation
The worksheet shows the ranked list position of sales growth. For example, the Rank Growth value for January 1999 is 3, which means that January was the third best performing month (i.e. the sales growth for the month of January between 1998 and 1999 was the third highest in the ranked list).
Notes
Because there are no comparative figures for 1998, the Rank Growth values for 1998 are 1.
Looking at the example above, you can see that sales growth was highest between October 1998 and October 1999 (40.02%)
For more information about the calculation Previous Year, see "Example: Compare sales figures across time using LAG/LEAD".
For more information about the calculation Growth, see "Example: Calculate sales growth across time".
This section contains examples of statistical functions.
Statistics functions are used to compute covariance, correlation, and linear regression statistics. Each function operates on an unordered set. They also can be used as windowing and reporting functions.
Examples:
This example computes an ordinary least-squares regression line that expresses the Profit SUM per month as a linear function of its Sales SUM. The following functions are used:
SLOPE - slope of determination of the regression line
INTERCEPT - intercept of determination of the regression line
REGR_R2 - coefficient of determination of the regression line
REGR_COUNT - number of items
REGR_AVGX - average sales
REGR_AVGY - average profit
Worksheet options | Set to: |
---|---|
Items | Video Analysis Information: Calendar Year, Calendar Month, Sales SUM, Profit SUM |
Sort Order | Calendar Year |
Conditions | Department = Video Sale OR Department = Video Rental
Calendar Year = 2000 |
Calculations | Slope = REGR_SLOPE(Profit SUM,Sales SUM) OVER(PARTITION BY Calendar Year ORDER BY Profit SUM)
Intercept = REGR_INTERCEPT(Profit SUM,Sales SUM) OVER(PARTITION BY Calendar Year ORDER BY Profit SUM) Coefficient = REGR_R2(Profit SUM,Sales SUM) OVER(PARTITION BY Calendar Year ORDER BY Profit SUM) Count = REGR_COUNT(Profit SUM,Sales SUM) OVER(PARTITION BY Calendar Year ORDER BY Profit SUM) Average = REGR_AVGX(Profit SUM,Sales SUM) OVER(PARTITION BY Calendar Year ORDER BY Profit SUM) Average 2 = REGR_AVGY(Profit SUM,Sales SUM) OVER(PARTITION BY Calendar Year ORDER BY Profit SUM) |
Figure A-26 Worksheet containing the statistical calculations
The worksheet shows for each month the slope, intercept, coefficient, count, and average values.
Notes
For more information about regression analysis, refer to Oracle8i SQL Reference and Oracle8i Data Warehousing Guide.
When you select an analytic function in the "New Calculation dialog", Discoverer types generic text into the Calculation field to help you define the function. This generic text includes:
OVER (PARTITION BY expr1 ORDER BY expr2)
The expressions are used as follows.
OVER - indicates that the function operates on a query result set, after the other query clauses have been applied (e.g. FROM, WHERE, HAVING).
PARTITION BY - partition (or group) the query results set (e.g. PARTITION BY 'Region').
ORDER BY - specify how the results set is logically ordered (e.g. ORDER BY 'Sales SUM').
For more information about Oracle expressions, see "How to find more information about Oracle analytic functions".
When you use analytic functions in conditions, the way that you combine them with non-analytic functions affects the Discoverer data returned by the query. The following sequencing rules apply (for more information, see "Examples of sequencing"):
Where conditions contain only non-analytic functions, these are applied before conditions that contain analytic functions. In the example below, the 'Region = 'Central' condition is applied first, then the rank is computed, then the 'Rank <= 3' condition is applied (which contains an analytic function).
Where conditions contain a combination of non-analytic functions and analytic functions, the analytic functions are applied before the non-analytic functions. In the example below, the rank is evaluated, then the 'Rank <= 3' condition is applied, then the 'Region = 'Central' condition is applied.
To illustrate how sequencing affects the Discoverer data returned by a query, consider the following two examples:
In the first scenario, we apply two single conditions: Region = 'Central', and Rank <= 3 (where Rank is an analytic function).
The Region = 'Central' condition is applied first, then Rank <= 3.
Therefore, only sales figures for the Central region that have a ranking of three or less are included in the Results Set.
In the second scenario, we apply a multiple condition: Region = 'Central' AND Rank <= 3 (where Rank is an analytic function).
The Rank <= 3 condition is applied first, then the Region = 'Central' condition.
Therefore, only figures in the Central region that have an overall ranking of three or less are included in the Results Set.