Oracle8i SQL ReferenceRelease 2 (8.1.6)A76989-01

Functions, 75 of 121

## REGR_ (linear regression) functions

The linear regression functions are:

• `REGR_SLOPE`

• `REGR_INTERCEPT`

• `REGR_COUNT`

• `REGR_R2`

• `REGR_AVGX`

• `REGR_AVGY`

• `REGR_SXX`

• `REGR_SYY`

• `REGR_SXY`

#### Syntax

For information on syntax and semantics, see "Analytic Functions".

#### Purpose

The linear regression functions fit an ordinary-least-squares regression line to a set of number pairs. You can use them as both aggregate and analytic functions.

Oracle applies the function to the set of (expr1, expr2) pairs after eliminating all pairs for which either expr1 or expr2 is null. Oracle computes all the regression functions simultaneously during a single pass through the data.

expr1 is interpreted as a value of the dependent variable (a "y value"), and expr2 is interpreted as a value of the independent variable (an "x value"). Both expressions must be numbers.

• `REGR_SLOPE` returns the slope of the line. The return value is a number and can be null. After the elimination of null (expr1, expr2) pairs, it makes the following computation:

```COVAR_POP(expr1, expr2) / VAR_POP(expr2)

```
• `REGR_INTERCEPT` returns the y-intercept of the regression line. The return value is a number and can be null. After the elimination of null (expr1, expr2) pairs, it makes the following computation:

```AVG(expr1) - REGR_SLOPE(expr1, expr2) * AVG(expr2)

```
• `REGR_COUNT` returns an integer that is the number of non-null number pairs used to fit the regression line.

• `REGR_R2` returns the coefficient of determination (also called "R-squared" or "goodness of fit") for the regression. The return value is a number and can be null. `VAR_POP`(expr1) and `VAR_POP`(expr2) are evaluated after the elimination of null pairs. The return values are:

```                     NULL if VAR_POP(expr2)  = 0

1 if VAR_POP(expr1)  = 0 and
VAR_POP(expr2) != 0

POWER(CORR(expr1,expr),2) if VAR_POP(expr1)  > 0 and
VAR_POP(expr2  != 0

```

All of the remaining regression functions return a number and can be null:

• `REGR_AVGX` evaluates the average of the independent variable (expr2) of the regression line. It makes the following computation after the elimination of null (expr1, expr2) pairs:

```AVG(expr2)

```
• `REGR_AVGY` evaluates the average of the dependent variable (expr1) of the regression line. It makes the following computation after the elimination of null (expr1, expr2) pairs:

```AVG(expr1)

```

`REGR_SXY`, `REGR_SXX`, `REGR_SYY` are auxiliary functions that are used to compute various diagnostic statistics.

• `REGR_SXX` makes the following computation after the elimination of null (expr1, expr2) pairs:

```REGR_COUNT(expr1, expr2) * VAR_POP(expr2)

```
• `REGR_SYY` makes the following computation after the elimination of null (expr1, expr2) pairs:

```REGR_COUNT(expr1, expr2) * VAR_POP(expr1)

```
• `REGR_SXY` makes the following computation after the elimination of null (expr1, expr2) pairs:

```REGR_COUNT(expr1, expr2) * COVAR_POP(expr1, expr2)

```

The following examples are based on the SALES table, described in "COVAR_POP".

#### REGR_SLOPE and REGR_INTERCEPT Examples

The following example determines the slope and intercept of the regression line for the amount of sales and sale profits for each year.

```SELECT s_year,
REGR_SLOPE(s_amount, s_profit),
REGR_INTERCEPT(s_amount, s_profit)
FROM sales GROUP BY s_year;

S_YEAR     REGR_SLOPE REGR_INTER
---------- ---------- ----------
1998 128.401558 -2277.5684
1999  55.618655 226.855296
```

The following example determines the cumulative slope and cumulative intercept of the regression line for the amount of sales and sale profits for each day in 1998:

```SELECT s_year, s_month, s_day,
REGR_SLOPE(s_amount, s_profit)
OVER (ORDER BY s_month, s_day) AS CUM_SLOPE,
REGR_INTERCEPT(s_amount, s_profit)
OVER (ORDER BY s_month, s_day) AS CUM_ICPT
FROM sales
WHERE s_year=1998
ORDER BY s_month, s_day;

S_YEAR     S_MONTH    S_DAY      CUM_SLOPE  CUM_ICPT
---------- ---------- ---------- ---------- ----------
1998          6          5
1998          6          9 132.093066 401.884833
1998          6          9 132.093066 401.884833
1998          6         10 131.829612  450.65349
1998          8         21 132.963737  -153.5413
1998          8         25 130.681718 -451.47349
1998          8         25 130.681718 -451.47349
1998          8         26  128.76502 -236.50096
1998         11          9 131.499934 -1806.7535
1998         11          9 131.499934 -1806.7535
1998         11         10 130.190972 -2323.3056
1998         11         10 130.190972 -2323.3056
1998         11         11 128.401558 -2277.5684
```

#### REGR_COUNT Examples

The following example returns the number of sales transactions in the `SALES` table that resulted in a profit. (None of the rows for containing a sales amount have a null in the `S_PROFIT` column, so the function returns the total number of rows in the `SALES` table.)

```SELECT REGR_COUNT(s_amount, s_profit) FROM sales;

REGR_COUNT
----------
23

```

The following example computes, for each day, the cumulative number of transactions within each month for the year 1998:

```SELECT s_month, s_day,
REGR_COUNT(s_amount,s_profit)
OVER (PARTITION BY s_month ORDER BY s_day)
FROM SALES
WHERE S_YEAR=1998
ORDER BY S_MONTH;

S_MONTH    S_DAY      REGR_COUNT
---------- ---------- ----------
6          5          1
6          9          3
6          9          3
6         10          4
8         21          1
8         25          3
8         25          3
8         26          4
11          9          2
11          9          2
11         10          4
11         10          4
11         11          5
```

#### REGR_R2 Examples

The following example computes the coefficient of determination of the regression line for amount of sales and sale profits:

```SELECT REGR_R2(s_amount, s_profit) FROM sales;

REGR_R2(S_
----------
.942435028

```

The following example computes the cumulative coefficient of determination of the regression line for monthly sales and monthly profits for each month in 1998:

```SELECT s_month,
REGR_R2(SUM(s_amount), SUM(s_profit))
OVER (ORDER BY s_month)
FROM SALES
WHERE s_year=1998
GROUP BY s_month
ORDER BY s_month;

S_MONTH    REGR_R2(SU
---------- ----------
6
8          1
11 .740553632
```

#### REGR_AVGY and REGR_AVGX Examples

The following example calculates the regression average for the amount of sales and sale profits for each year:

```SELECT s_year,
REGR_AVGY(s_amount, s_profit),
REGR_AVGX(s_amount, s_profit)
FROM sales GROUP BY s_year;

S_YEAR     REGR_AVGY( REGR_AVGX(
---------- ---------- ----------
1998 41227.5462 338.820769
1999   7330.748    127.725
```

The following example calculates the cumulative averages for the amount of sales and sale profits in 1998:

```SELECT s_year, s_month, s_day,
REGR_AVGY(s_amount, s_profit)
OVER (ORDER BY s_month, s_day) AS CUM_AMOUNT,
REGR_AVGX(s_amount, s_profit)
OVER (ORDER BY s_month, s_day) AS CUM_PROFIT
FROM sales
WHERE s_year=1998
ORDER BY s_month, s_day;

S_YEAR     S_MONTH    S_DAY      CUM_AMOUNT CUM_PROFIT
---------- ---------- ---------- ---------- ----------
1998          6          5      16068      118.2
1998          6          9 44375.6667      332.9
1998          6          9 44375.6667      332.9
1998          6         10   52678.25    396.175
1998          8         21   44721.72      337.5
1998          8         25    45333.8 350.357143
1998          8         25    45333.8 350.357143
1998          8         26    47430.7   370.1875
1998         11          9   41892.91    332.317
1998         11          9   41892.91    332.317
1998         11         10  40777.175 331.055833
1998         11         10  40777.175 331.055833
1998         11         11 41227.5462 338.820769
```

#### REGR_SXY, REGR_SXX, and REGR_SYY Examples

The following example computes the `REGR_SXY`, `REGR_SXX`, and `REGR_SYY` values for the regression analysis of amount of sales and sale profits for each year:

```SELECT s_year,
REGR_SXY(s_amount, s_profit),
REGR_SYY(s_amount, s_profit),
REGR_SXX(s_amount, s_profit)
FROM sales GROUP BY s_year;

S_YEAR     REGR_SXY(S REGR_SYY(S REGR_SXX(S
---------- ---------- ---------- ----------
1998 48723551.8 6423698688 379462.311
1999 3605361.62  200525751 64822.8841

```

The following example computes the cumulative `REGR_SXY`, `REGR_SXX`, and `REGR_SYY` statistics for amount of sales and sale profits for each month-day value in 1998:

```SELECT s_year, s_month, s_day,
REGR_SXY(s_amount, s_profit)
OVER (ORDER BY s_month, s_day) AS CUM_SXY,
REGR_SYY(s_amount, s_profit)
OVER (ORDER BY s_month, s_day) AS CUM_SXY,
REGR_SXX(s_amount, s_profit)
OVER (ORDER BY s_month, s_day) AS CUM_SXX
FROM sales
WHERE s_year=1998
ORDER BY s_month, s_day;

S_YEAR     S_MONTH    S_DAY      CUM_SXY    CUM_SXY    CUM_SXX
---------- ---------- ---------- ---------- ---------- ----------
1998          6          5          0          0          0
1998          6          9 14822857.8 1958007601  112215.26
1998          6          9 14822857.8 1958007601  112215.26
1998          6         10 21127009.3 2785202281 160259.968
1998          8         21 30463997.3 4051329674  229115.08
1998          8         25 34567985.3 4541739739 264520.437
1998          8         25 34567985.3 4541739739 264520.437
1998          8         26 36896592.7 4787971157 286542.049
1998         11          9 45567995.3 6045196901 346524.854
1998         11          9 45567995.3 6045196901 346524.854
1998         11         10 48178003.8 6392056557 370056.411
1998         11         10 48178003.8 6392056557 370056.411
1998         11         11 48723551.8 6423698688 379462.311
```