REGR_ (Linear Regression) Functions

The linear regression functions are:

• REGR_SLOPE
• REGR_INTERCEPT
• REGR_COUNT
• REGR_R2
• REGR_AVGX
• REGR_AVGY
• REGR_SXX
• REGR_SYY
• REGR_SXY

Syntax

linear_regr::= Text description of linear_regr

 See Also: "Analytic Functions" for information on syntax, semantics, and restrictions

Purpose

The linear regression functions fit an ordinary-least-squares regression line to a set of number pairs. You can use them as both aggregate and analytic functions.

 See Also: "Aggregate Functions" "About SQL Expressions" for information on valid forms of expr

Oracle applies the function to the set of (expr1, expr2) pairs after eliminating all pairs for which either expr1 or expr2 is null. Oracle computes all the regression functions simultaneously during a single pass through the data.

expr1 is interpreted as a value of the dependent variable (a "y value"), and expr2 is interpreted as a value of the independent variable (an "x value"). Both expressions must be numbers.

• REGR_SLOPE returns the slope of the line. The return value is a number and can be null. After the elimination of null (expr1, expr2) pairs, it makes the following computation:
COVAR_POP(expr1, expr2) / VAR_POP(expr2)

• REGR_INTERCEPT returns the y-intercept of the regression line. The return value is a number and can be null. After the elimination of null (expr1, expr2) pairs, it makes the following computation:
AVG(expr1) - REGR_SLOPE(expr1, expr2) * AVG(expr2)

• REGR_COUNT returns an integer that is the number of non-null number pairs used to fit the regression line.
• REGR_R2 returns the coefficient of determination (also called "R-squared" or "goodness of fit") for the regression. The return value is a number and can be null. VAR_POP(expr1) and VAR_POP(expr2) are evaluated after the elimination of null pairs. The return values are:
NULL if VAR_POP(expr2)  = 0

1 if VAR_POP(expr1)  = 0 and
VAR_POP(expr2) != 0

POWER(CORR(expr1,expr),2) if VAR_POP(expr1)  > 0 and
VAR_POP(expr2  != 0

All of the remaining regression functions return a number and can be null:

• REGR_AVGX evaluates the average of the independent variable (expr2) of the regression line. It makes the following computation after the elimination of null (expr1, expr2) pairs:
AVG(expr2)

• REGR_AVGY evaluates the average of the dependent variable (expr1) of the regression line. It makes the following computation after the elimination of null (expr1, expr2) pairs:
AVG(expr1)

REGR_SXY, REGR_SXX, REGR_SYY are auxiliary functions that are used to compute various diagnostic statistics.

• REGR_SXX makes the following computation after the elimination of null (expr1, expr2) pairs:
REGR_COUNT(expr1, expr2) * VAR_POP(expr2)

• REGR_SYY makes the following computation after the elimination of null (expr1, expr2) pairs:
REGR_COUNT(expr1, expr2) * VAR_POP(expr1)

• REGR_SXY makes the following computation after the elimination of null (expr1, expr2) pairs:
REGR_COUNT(expr1, expr2) * COVAR_POP(expr1, expr2)

The following examples are based on the sample tables sh.sales and sh.products.

General Linear Regression Example

The following example provides a comparison of the various linear regression functions:

SELECT
s.channel_id,
REGR_SLOPE(s.quantity_sold, p.prod_list_price)  SLOPE ,
REGR_INTERCEPT(s.quantity_sold, p.prod_list_price)  INTCPT ,
REGR_R2(s.quantity_sold, p.prod_list_price)  RSQR ,
REGR_COUNT(s.quantity_sold, p.prod_list_price)  COUNT ,
REGR_AVGX(s.quantity_sold, p.prod_list_price)  AVGLISTP ,
REGR_AVGY(s.quantity_sold, p.prod_list_price)  AVGQSOLD
FROM  sales s, products p
WHERE s.prod_id=p.prod_id AND
p.prod_category='Men'  AND
s.time_id=to_DATE('10-OCT-2000')
GROUP BY s.channel_id
;

C      SLOPE     INTCPT       RSQR      COUNT   AVGLISTP   AVGQSOLD
- ---------- ---------- ---------- ---------- ---------- ----------
C -.03529838 16.4548382 .217277422         17 87.8764706 13.3529412
I  -.0108044 13.3082392 .028398018         43  116.77907 12.0465116
P -.01729665 11.3634927 .026191191         33 80.5818182 9.96969697
S -.01277499  13.488506 .000473089         71  52.571831 12.8169014
T -.01026734 5.01019929 .064283727         21       75.2 4.23809524

REGR_SLOPE and REGR_INTERCEPT Examples

The following example determines the slope and intercept of the regression line for the amount of sales and sale profits for each fiscal year:

SELECT t.fiscal_year,
REGR_SLOPE(s.amount_sold, s.quantity_sold) "Slope",
REGR_INTERCEPT(s.amount_sold, s.quantity_sold) "Intercept"
FROM sales s, times t
WHERE s.time_id = t.time_id
GROUP BY t.fiscal_year;

FISCAL_YEAR      Slope  Intercept
----------- ---------- ----------
1998 49.3934247 71.6015479
1999 49.3443482 70.1502601
2000 49.2262135 75.0287476

The following example determines the cumulative slope and cumulative intercept of the regression line for the amount of and quantity of sales for two products (270 and 260) for weekend transactions (day_number_in_week = 6 or 7) during the last three weeks (fiscal_week_number of 50, 51, or 52) of 1998:

SELECT t.fiscal_month_number "Month", t.day_number_in_month "Day",
REGR_SLOPE(s.amount_sold, s.quantity_sold)
OVER (ORDER BY t.fiscal_month_desc, t.day_number_in_month) AS CUM_SLOPE,
REGR_INTERCEPT(s.amount_sold, s.quantity_sold)
OVER (ORDER BY t.fiscal_month_desc, t.day_number_in_month) AS CUM_ICPT
FROM sales s, times t
WHERE s.time_id = t.time_id
AND s.prod_id IN (270, 260)
AND t.fiscal_year=1998
AND t.fiscal_week_number IN (50, 51, 52)
AND t.day_number_in_week IN (6,7)
ORDER BY t.fiscal_month_desc, t.day_number_in_month;

Month        Day  CUM_SLOPE   CUM_ICPT
---------- ---------- ---------- ----------
12         12        -68       1872
12         12        -68       1872
12         13 -20.244898 1254.36735
12         13 -20.244898 1254.36735
12         19 -18.826087       1287
12         20 62.4561404  125.28655
12         20 62.4561404  125.28655
12         20 62.4561404  125.28655
12         20 62.4561404  125.28655
12         26 67.2658228 58.9712313
12         26 67.2658228 58.9712313
12         27 37.5245541 284.958221
12         27 37.5245541 284.958221
12         27 37.5245541 284.958221

REGR_COUNT Examples

The following example returns the number of customers in the customers table (out of a total of 319) who have account managers.

SELECT REGR_COUNT(customer_id, account_mgr_id) FROM customers;

REGR_COUNT(CUSTOMER_ID,ACCOUNT_MGR_ID)
--------------------------------------
231

The following example computes the cumulative number of transactions for each day in April of 1998:

SELECT UNIQUE t.day_number_in_month,
REGR_COUNT(s.amount_sold, s.quantity_sold)
OVER (PARTITION BY t.fiscal_month_number
ORDER BY t.day_number_in_month) "Regr_Count"
FROM sales s, times t
WHERE s.time_id = t.time_id
AND t.fiscal_year = 1998 AND t.fiscal_month_number = 4;

DAY_NUMBER_IN_MONTH Regr_Count
------------------- ----------
1        825
2       1650
3       2475
4       3300
.
.
.
26      21450
30      22200

REGR_R2 Examples

The following example computes the coefficient of determination of the regression line for amount of sales greater than 5000 and quantity sold:

SELECT REGR_R2(amount_sold, quantity_sold) FROM sales
WHERE amount_sold > 5000;

REGR_R2(AMOUNT_SOLD,QUANTITY_SOLD)
----------------------------------
.024087453

The following example computes the cumulative coefficient of determination of the regression line for monthly sales amounts and quantities for each month during 1998:

SELECT t.fiscal_month_number,
REGR_R2(SUM(s.amount_sold), SUM(s.quantity_sold))
OVER (ORDER BY t.fiscal_month_number) "Regr_R2"
FROM sales s, times t
WHERE s.time_id = t.time_id
AND t.fiscal_year = 1998
GROUP BY t.fiscal_month_number
ORDER BY t.fiscal_month_number;

FISCAL_MONTH_NUMBER    Regr_R2
------------------- ----------
1
2          1
3 .927372984
4 .807019972
5 .932745567
6  .94682861
7 .965342011
8 .955768075
9 .959542618
10 .938618575
11 .880931415
12 .882769189

REGR_AVGY and REGR_AVGX Examples

The following example calculates the regression average for the amount and quantity of sales for each year:

SELECT t.fiscal_year,
REGR_AVGY(s.amount_sold, s.quantity_sold) "Regr_AvgY",
REGR_AVGX(s.amount_sold, s.quantity_sold) "Regr_AvgX"
FROM sales s, times t
WHERE s.time_id = t.time_id
GROUP BY t.fiscal_year;

FISCAL_YEAR  Regr_AvgY  Regr_AvgX
----------- ---------- ----------
1998 716.602044 13.0584283
1999 714.910831 13.0665536
2000 717.331304 13.0479781

The following example calculates the cumulative averages for the amount and quantity of sales profits for product 260 during the last two weeks of December 1998:

SELECT t.day_number_in_month,
REGR_AVGY(s.amount_sold, s.quantity_sold)
OVER (ORDER BY t.fiscal_month_desc, t.day_number_in_month)
"Regr_AvgY",
REGR_AVGX(s.amount_sold, s.quantity_sold)
OVER (ORDER BY t.fiscal_month_desc, t.day_number_in_month)
"Regr_AvgX"
FROM sales s, times t
WHERE s.time_id = t.time_id
AND s.prod_id = 260
AND t.fiscal_month_desc = '1998-12'
AND t.fiscal_week_number IN (51, 52)
ORDER BY t.day_number_in_month;

DAY_NUMBER_IN_MONTH  Regr_AvgY  Regr_AvgX
------------------- ---------- ----------
14        882       24.5
14        882       24.5
15        801      22.25
15        801      22.25
16      777.6       21.6
18 642.857143 17.8571429
18 642.857143 17.8571429
20      589.5     16.375
21        544 15.1111111
22 592.363636 16.4545455
22 592.363636 16.4545455
24 553.846154 15.3846154
24 553.846154 15.3846154
26        522       14.5
27      578.4 16.0666667

REGR_SXY, REGR_SXX, and REGR_SYY Examples

The following example computes the REGR_SXY, REGR_SXX, and REGR_SYY values for the regression analysis of amount and quantity of sales for each year in the sample sh.sales table:

SELECT t.fiscal_year,
REGR_SXY(s.amount_sold, s.quantity_sold) "Regr_sxy",
REGR_SYY(s.amount_sold, s.quantity_sold) "Regr_syy",
REGR_SXX(s.amount_sold, s.quantity_sold) "Regr_sxx"
FROM sales s, times t
WHERE s.time_id = t.time_id
GROUP BY t.fiscal_year;

FISCAL_YEAR   Regr_sxy   Regr_syy   Regr_sxx
----------- ---------- ---------- ----------
1998 1620591607 2.3328E+11 32809865.2
1999 1955866724 2.7695E+11 39637097.2
2000 2127877398 3.0630E+11 43226509.7

The following example computes the cumulative REGR_SXY, REGR_SXX, and REGR_SYY statistics for amount and quantity of weekend sales for products 270 and 260 for each year-month value in 1998:

SELECT t.day_number_in_month,
REGR_SXY(s.amount_sold, s.quantity_sold)
OVER (ORDER BY t.fiscal_year, t.fiscal_month_desc) "Regr_sxy",
REGR_SYY(s.amount_sold, s.quantity_sold)
OVER (ORDER BY t.fiscal_year, t.fiscal_month_desc) "Regr_syy",
REGR_SXX(s.amount_sold, s.quantity_sold)
OVER (ORDER BY t.fiscal_year, t.fiscal_month_desc) "Regr_sxx"
FROM sales s, times t
WHERE s.time_id = t.time_id
AND prod_id IN (270, 260)
AND t.fiscal_month_desc = '1998-02'
AND t.day_number_in_week IN (6,7)
ORDER BY t.day_number_in_month;

DAY_NUMBER_IN_MONTH   Regr_sxy   Regr_syy   Regr_sxx
------------------- ---------- ---------- ----------
1  130973783 1.8916E+10 2577797.94
.
.
.
30  130973783 1.8916E+10 2577797.94