Functions, 98 of 166

## REGR_ (linear regression) functions

The linear regression functions are:

• `REGR_SLOPE`

• `REGR_INTERCEPT`

• `REGR_COUNT`

• `REGR_R2`

• `REGR_AVGX`

• `REGR_AVGY`

• `REGR_SXX`

• `REGR_SYY`

• `REGR_SXY`

#### Syntax

linear_regr::= Text description of `linear_regr`

#### Purpose

The linear regression functions fit an ordinary-least-squares regression line to a set of number pairs. You can use them as both aggregate and analytic functions.

 See Also: "Aggregate Functions" "About SQL Expressions" for information on valid forms of `expr`

Oracle applies the function to the set of (expr1, expr2) pairs after eliminating all pairs for which either expr1 or expr2 is null. Oracle computes all the regression functions simultaneously during a single pass through the data.

expr1 is interpreted as a value of the dependent variable (a "y value"), and expr2 is interpreted as a value of the independent variable (an "x value"). Both expressions must be numbers.

• `REGR_SLOPE` returns the slope of the line. The return value is a number and can be null. After the elimination of null (expr1, expr2) pairs, it makes the following computation:

```COVAR_POP(expr1, expr2) / VAR_POP(expr2)

```
• `REGR_INTERCEPT` returns the y-intercept of the regression line. The return value is a number and can be null. After the elimination of null (expr1, expr2) pairs, it makes the following computation:

```AVG(expr1) - REGR_SLOPE(expr1, expr2) * AVG(expr2)

```
• `REGR_COUNT` returns an integer that is the number of non-null number pairs used to fit the regression line.

• `REGR_R2` returns the coefficient of determination (also called "R-squared" or "goodness of fit") for the regression. The return value is a number and can be null. `VAR_POP`(expr1) and `VAR_POP`(expr2) are evaluated after the elimination of null pairs. The return values are:

```                     NULL if VAR_POP(expr2)  = 0

1 if VAR_POP(expr1)  = 0 and
VAR_POP(expr2) != 0

POWER(CORR(expr1,expr),2) if VAR_POP(expr1)  > 0 and
VAR_POP(expr2  != 0

```

All of the remaining regression functions return a number and can be null:

• `REGR_AVGX` evaluates the average of the independent variable (expr2) of the regression line. It makes the following computation after the elimination of null (expr1, expr2) pairs:

```AVG(expr2)

```
• `REGR_AVGY` evaluates the average of the dependent variable (expr1) of the regression line. It makes the following computation after the elimination of null (expr1, expr2) pairs:

```AVG(expr1)

```

`REGR_SXY`, `REGR_SXX`, `REGR_SYY` are auxiliary functions that are used to compute various diagnostic statistics.

• `REGR_SXX` makes the following computation after the elimination of null (expr1, expr2) pairs:

```REGR_COUNT(expr1, expr2) * VAR_POP(expr2)

```
• `REGR_SYY` makes the following computation after the elimination of null (expr1, expr2) pairs:

```REGR_COUNT(expr1, expr2) * VAR_POP(expr1)

```
• `REGR_SXY` makes the following computation after the elimination of null (expr1, expr2) pairs:

```REGR_COUNT(expr1, expr2) * COVAR_POP(expr1, expr2)

```

The following examples are based on the sample tables `sh.sales` and `sh.products`.

#### General Linear Regression Example

The following example provides a comparison of the various linear regression functions:

```SELECT
s.channel_id,
REGR_SLOPE(s.quantity_sold, p.prod_list_price)  SLOPE ,
REGR_INTERCEPT(s.quantity_sold, p.prod_list_price)  INTCPT ,
REGR_R2(s.quantity_sold, p.prod_list_price)  RSQR ,
REGR_COUNT(s.quantity_sold, p.prod_list_price)  COUNT ,
REGR_AVGX(s.quantity_sold, p.prod_list_price)  AVGLISTP ,
REGR_AVGY(s.quantity_sold, p.prod_list_price)  AVGQSOLD
FROM  sales s, products p
WHERE s.prod_id=p.prod_id AND
p.prod_category='Men'  AND
s.time_id=to_DATE('10-OCT-2000')
GROUP BY s.channel_id
;

C      SLOPE     INTCPT       RSQR      COUNT   AVGLISTP   AVGQSOLD
- ---------- ---------- ---------- ---------- ---------- ----------
C -.03529838 16.4548382 .217277422         17 87.8764706 13.3529412
I  -.0108044 13.3082392 .028398018         43  116.77907 12.0465116
P -.01729665 11.3634927 .026191191         33 80.5818182 9.96969697
S -.01277499  13.488506 .000473089         71  52.571831 12.8169014
T -.01026734 5.01019929 .064283727         21       75.2 4.23809524
```

#### REGR_SLOPE and REGR_INTERCEPT Examples

The following example determines the slope and intercept of the regression line for the amount of sales and sale profits for each fiscal year.

```SELECT t.fiscal_year,
REGR_SLOPE(s.amount_sold, s.quantity_sold) "Slope",
REGR_INTERCEPT(s.amount_sold, s.quantity_sold) "Intercept"
FROM sales s, times t
WHERE s.time_id = t.time_id
GROUP BY t.fiscal_year;

FISCAL_YEAR      Slope  Intercept
----------- ---------- ----------
1998 54.7377214 45.3884971
1999 54.4868592 44.3616117
2000 55.4035957  44.717026

```

The following example determines the cumulative slope and cumulative intercept of the regression line for the amount of and quantity of sales for the fourth quarter of 1998:

```SELECT t.fiscal_month_number "Month", t.day_number_in_month "Day",
REGR_SLOPE(s.amount_sold, s.quantity_sold)
OVER (ORDER BY t.fiscal_month_desc, t.day_number_in_month) AS CUM_SLOPE,
REGR_INTERCEPT(s.amount_sold, s.quantity_sold)
OVER (ORDER BY t.fiscal_month_desc, t.day_number_in_month) AS CUM_ICPT
FROM sales s, times t
WHERE s.time_id = t.time_id AND t.fiscal_year=1998
AND t.fiscal_quarter_number = 4
ORDER BY t.fiscal_month_desc, t.day_number_in_month;
```
```
Month        Day  CUM_SLOPE   CUM_ICPT
---------- ---------- ---------- ----------
...
11         18  47.775583  40.028992
11         18  47.775583  40.028992
11         18  47.775583  40.028992
11         18  47.775583  40.028992
11         18  47.775583  40.028992
11         18  47.775583  40.028992
11         18  47.775583  40.028992
11         18  47.775583  40.028992
11         19 47.6878438 40.6296492
11         19 47.6878438 40.6296492
...
```

#### REGR_COUNT Examples

The following example returns the number of customers in the `customers` table (out of a total of 319) who have account managers.

```SELECT REGR_COUNT(customer_id, account_mgr_id) FROM customers;

REGR_COUNT(CUSTOMER_ID,ACCOUNT_MGR_ID)
--------------------------------------
231

```

The following example computes the cumulative number of transactions for each day in April of 1998:

```SELECT UNIQUE t.day_number_in_month,
REGR_COUNT(s.amount_sold, s.quantity_sold)
OVER (PARTITION BY t.fiscal_month_number
ORDER BY t.day_number_in_month) "Regr_Count"
FROM sales s, times t
WHERE s.time_id = t.time_id
AND t.fiscal_year = 1998 AND t.fiscal_month_number = 4;

DAY_NUMBER_IN_MONTH Regr_Count
------------------- ----------
1        825
2       1650
3       2475
4       3300
.
.
.
26      21450
30      22200
```

#### REGR_R2 Examples

The following example computes the coefficient of determination of the regression line for amount of sales greater than 5000 and quantity sold:

```SELECT REGR_R2(amount_sold, quantity_sold) FROM sales
WHERE amount_sold > 5000;

REGR_R2(AMOUNT_SOLD,QUANTITY_SOLD)
----------------------------------
.005208421

```

The following example computes the cumulative coefficient of determination of the regression line for monthly sales amounts and quantities for each month during 1998:

```SELECT t.fiscal_month_number,
REGR_R2(SUM(s.amount_sold), SUM(s.quantity_sold))
OVER (ORDER BY t.fiscal_month_number) "Regr_R2"
FROM sales s, times t
WHERE s.time_id = t.time_id
AND t.fiscal_year = 1998
GROUP BY t.fiscal_month_number
ORDER BY t.fiscal_month_number;

FISCAL_MONTH_NUMBER    Regr_R2
------------------- ----------
1
2          1
3 .763816809
4 .581171805
5 .854723188
6 .877870333
7 .907073344
8 .905223336
9 .912142295
10 .858149007
11  .74838262
12 .738707443
```

#### REGR_AVGY and REGR_AVGX Examples

The following example calculates the regression average for the amount and quantity of sales for each year:

```SELECT t.fiscal_year,
REGR_AVGY(s.amount_sold, s.quantity_sold) "Regr_AvgY",
REGR_AVGX(s.amount_sold, s.quantity_sold) "Regr_AvgX"
FROM sales s, times t
WHERE s.time_id = t.time_id
GROUP BY t.fiscal_year;

FISCAL_YEAR  Regr_AvgY  Regr_AvgX
----------- ---------- ----------
1998 745.788191 12.7955581
1999 741.839772 12.8008509
2000 752.701384 12.7786717

```

The following example calculates the cumulative averages for the amount and quantity of sales profits in December of 1998:

```SELECT t.day_number_in_month,
REGR_AVGY(s.amount_sold, s.quantity_sold)
OVER (ORDER BY t.fiscal_month_desc, t.day_number_in_month)
"Regr_AvgY",
REGR_AVGX(s.amount_sold, s.quantity_sold)
OVER (ORDER BY t.fiscal_month_desc, t.day_number_in_month)
"Regr_AvgX"
FROM sales s, times t
WHERE s.time_id = t.time_id AND t.fiscal_month_desc = '1998-12'
ORDER BY t.day_number_in_month;

DAY_NUMBER_IN_MONTH  Regr_AvgY  Regr_AvgX
------------------- ---------- ----------
1 695.028571       12.9
1 695.028571       12.9
1 695.028571       12.9
.
.
.
27 692.061411 12.9648677
27 692.061411 12.9648677
27 692.061411 12.9648677
```

#### REGR_SXY, REGR_SXX, and REGR_SYY Examples

The following example computes the `REGR_SXY`, `REGR_SXX`, and `REGR_SYY` values for the regression analysis of amount and quantity of sales for each year in the sample `sh.sales` table:

```SELECT t.fiscal_year,
REGR_SXY(s.amount_sold, s.quantity_sold) "Regr_sxy",
REGR_SYY(s.amount_sold, s.quantity_sold) "Regr_syy",
REGR_SXX(s.amount_sold, s.quantity_sold) "Regr_sxx"
FROM sales s, times t
WHERE s.time_id = t.time_id
GROUP BY t.fiscal_year;

FISCAL_YEAR   Regr_sxy   Regr_syy   Regr_sxx
----------- ---------- ---------- ----------
1998 1757092061 2.5677E+11 32100204.7
1999 2112447869 3.0619E+11 38769859.3
2000 2338925878 3.4321E+11 42216138.7

```

The following example computes the cumulative `REGR_SXY`, `REGR_SXX`, and `REGR_SYY` statistics for amount and quantity of sales for each year-month value in 1998:

```SELECT t.day_number_in_month,
REGR_SXY(s.amount_sold, s.quantity_sold)
OVER (ORDER BY t.fiscal_year, t.fiscal_month_desc) "Regr_sxy",
REGR_SYY(s.amount_sold, s.quantity_sold)
OVER (ORDER BY t.fiscal_year, t.fiscal_month_desc) "Regr_syy",
REGR_SXX(s.amount_sold, s.quantity_sold)
OVER (ORDER BY t.fiscal_year, t.fiscal_month_desc) "Regr_sxx"
FROM sales s, times t
WHERE s.time_id = t.time_id AND t.fiscal_month_desc = '1998-02'
ORDER BY t.day_number_in_month;

DAY_NUMBER_IN_MONTH   Regr_sxy   Regr_syy   Regr_sxx
------------------- ---------- ---------- ----------
1  144226271 2.1996E+10 2497704.77
.
.
.
30  144226271 2.1996E+10 2497704.77
```