## Practice: Measuring Tailedness of Data with the KURTOSIS Functions

### Overview

This practice shows how to use the `KURTOSIS_POP` and `KURTOSIS_SAMP` aggregate functions to measure the tailedness of data. A higher kurtosis means that more of the variance comes from infrequent extreme deviations rather than from frequent, modestly sized deviations. The values are reported as excess kurtosis, so a normal distribution has a kurtosis of zero.
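To make the definition concrete, here is a minimal Python sketch of population excess kurtosis (fourth central moment divided by the squared variance, minus 3). It assumes this conventional definition; the helper name `kurtosis_pop` and both data sets are made up for illustration and are not taken from the lab tables.

```python
def kurtosis_pop(values):
    """Population excess kurtosis: m4 / m2**2 - 3 (0 for a normal distribution)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n  # population variance
    m4 = sum((x - mean) ** 4 for x in values) / n  # fourth central moment
    return m4 / m2 ** 2 - 3


# Made-up data: both sets are centred on 0 but spread their variance differently.
modest = [-1, 1, -1, 1, -1, 1, -1, 1]   # frequent, modestly sized deviations
extreme = [0, 0, 0, 0, 0, 0, -4, 4]     # infrequent, extreme deviations

print(kurtosis_pop(modest))   # -2.0
print(kurtosis_pop(extreme))  # 1.0
```

The set dominated by rare, large deviations gets the higher value, which is the pattern the practice asks you to look for in the price columns.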

Before starting any new practice, refer to the Practices Environment recommendations.

### Step 1 : Set up the environment

• Connect to `PDB21` as `HR` and execute the `/home/oracle/labs/M104784GC10/Houses_Prices.sql` SQL script to create and populate a table.

### Step 2 : Examine the kurtosis of the distribution

• Display the table rows. The `HOUSE` column identifies the type of house and is used to categorize the data so that the statistics for each type can be compared with each other.

• Display the result of population kurtosis (`KURTOSIS_POP`) and sample kurtosis (`KURTOSIS_SAMP`) for the three types of houses.

`PRICE_SMALL_CITY` has a higher kurtosis than `PRICE_BIG_CITY`. For each of the two columns, observe whether more of the data lies in the tails or around the peak.

### Step 3 : Examine the kurtosis of the distribution after data evolution

• Insert more rows into the table.

As you can see, as the number of values in the data set increases, the difference between the computed values of `KURTOSIS_SAMP` and `KURTOSIS_POP` decreases.
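The shrinking gap can be sketched in Python. The sample formula below is the usual bias-corrected excess kurtosis; whether `KURTOSIS_SAMP` uses exactly this correction is an assumption that should be checked against the SQL reference. The helper names and the data are hypothetical.

```python
def kurtosis_pop(values):
    """Population excess kurtosis: m4 / m2**2 - 3."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n
    m4 = sum((x - mean) ** 4 for x in values) / n
    return m4 / m2 ** 2 - 3


def kurtosis_samp(values):
    """Bias-corrected sample excess kurtosis (assumed formula; needs n >= 4)."""
    n = len(values)
    g2 = kurtosis_pop(values)
    return (n - 1) / ((n - 2) * (n - 3)) * ((n + 1) * g2 + 6)


base = [0, 0, 0, 0, 0, 0, -4, 4]  # made-up values, not the lab data
gaps = []
for copies in (1, 10, 100):
    data = base * copies  # same shape of distribution, more rows
    pop, samp = kurtosis_pop(data), kurtosis_samp(data)
    gaps.append(abs(samp - pop))
    print(f"n={len(data):4d}  pop={pop:.4f}  samp={samp:.4f}")
```

Under these assumptions the population value stays fixed while the sample value moves toward it as the row count grows, because the correction factors approach 1.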

• Determine the kurtosis of distinct values in `PRICE_SMALL_CITY` and `PRICE_BIG_CITY`.

Is the result much different if the query does not evaluate the distinct values in `PRICE_BIG_CITY` and `PRICE_SMALL_CITY`?

The population kurtosis is not different, because exactly the same rows were inserted.
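A small Python sketch shows why re-inserting identical rows leaves the population value unchanged and why evaluating only distinct values gives the same answer in that situation. It assumes the conventional population formula; `kurtosis_pop` and the prices are hypothetical and not taken from the lab table.

```python
def kurtosis_pop(values):
    """Population excess kurtosis: m4 / m2**2 - 3."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n
    m4 = sum((x - mean) ** 4 for x in values) / n
    return m4 / m2 ** 2 - 3


original = [100, 120, 130, 150, 400]   # made-up prices, all different
after_insert = original + original     # the same rows inserted a second time

all_rows = kurtosis_pop(after_insert)                    # like evaluating ALL values
distinct_only = kurtosis_pop(sorted(set(after_insert)))  # like evaluating DISTINCT values

print(all_rows, distinct_only, kurtosis_pop(original))
```

Every value occurs the same number of times after the insert, so the moments, and therefore the population kurtosis, are the same with or without duplicates. If some values repeated more often than others, the two results would generally differ.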

• Insert a larger set of rows into the table for `HOUSE` number 1.

Now the kurtosis becomes positive for house number 1, which means that the distribution has heavier tails than a normal distribution. Kurtosis measures tailedness, so a positive value by itself says nothing about whether the data is skewed to the left or to the right. `PRICE_SMALL_CITY` has a much higher kurtosis than `PRICE_BIG_CITY`. This implies that in `PRICE_SMALL_CITY` more of the variance comes from infrequent extreme deviations, whereas in `PRICE_BIG_CITY` the variance comes mostly from frequent, modestly sized deviations.