## Practice: Measuring Asymmetry in Data with the SKEWNESS Functions

### Overview

This practice shows how to use the `SKEWNESS_POP` and `SKEWNESS_SAMP` aggregate functions to measure asymmetry in data. For a given set of values, both population skewness (`SKEWNESS_POP`) and sample skewness (`SKEWNESS_SAMP`) are deterministic: the same input always produces the same result.

Before starting any new practice, refer to the Practices Environment recommendations.

### Step 1 : Set up the environment

• Connect to `PDB21` as `HR` and execute the `/home/oracle/labs/M104784GC10/Houses_Prices.sql` SQL script to create a table with skewed data.
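The setup step could look like the following in a SQL*Plus session. This is a minimal sketch: it assumes that `PDB21` resolves as a net service name in your environment and that you are prompted for the `HR` password; adjust the connect string to match your Practices Environment.

```sql
-- Assumption: PDB21 is reachable as a net service name; SQL*Plus prompts for the HR password.
CONNECT hr@PDB21

-- Run the script named in this step to create and populate the table.
@/home/oracle/labs/M104784GC10/Houses_Prices.sql
```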

### Step 2 : Examine skewed data

• Display the table rows. The `HOUSE` column identifies the type of house; it is used to group the data so that each type can be analyzed statistically and compared with the others. Skewness measures how asymmetric a distribution is: a positive value means the longer tail is on the right (most values sit toward the left), a negative value means the longer tail is on the left, and a value of 0 indicates a symmetric distribution, such as the normal distribution.

• Display the population skewness (`SKEWNESS_POP`) and sample skewness (`SKEWNESS_SAMP`) of the prices for the three houses in the table.
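The two display steps in this section might be written as follows. The table name `HOUSES` is an assumption, because the practice text does not name the table that `Houses_Prices.sql` creates; the column aliases are also illustrative. Only the `HOUSE`, `PRICE_BIG_CITY`, and `PRICE_SMALL_CITY` columns come from the practice itself.

```sql
-- Assumption: the setup script created a table named HOUSES; substitute the actual name.

-- Display the table rows.
SELECT * FROM houses ORDER BY house;

-- Population and sample skewness of both price columns, one result row per house.
SELECT house,
       SKEWNESS_POP(price_big_city)    AS pop_big_city,
       SKEWNESS_SAMP(price_big_city)   AS samp_big_city,
       SKEWNESS_POP(price_small_city)  AS pop_small_city,
       SKEWNESS_SAMP(price_small_city) AS samp_small_city
FROM   houses
GROUP  BY house
ORDER  BY house;
```

Grouping by `HOUSE` keeps the skewness of each house type separate, so the values can be compared side by side.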

Skewness matters when, for example, `PRICE_BIG_CITY` and `PRICE_SMALL_CITY` hold the purchase prices of houses and you want to determine whether the outliers lie toward the left end or the right end of the distribution, that is, whether more values fall to the left of the mean than to the right.

### Step 3 : Examine skewed data after data evolution

• Insert more rows into the table.

As the number of values in the data set increases, the difference between the computed values of `SKEWNESS_SAMP` and `SKEWNESS_POP` decreases.
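The convergence can be seen from the standard textbook definitions of the two measures, which are assumed here to match what the Oracle functions compute. For $n$ values $x_1, \dots, x_n$ with mean $\bar{x}$:

$$
g_1 = \frac{\frac{1}{n}\sum_{i=1}^{n}(x_i-\bar{x})^3}{\left(\frac{1}{n}\sum_{i=1}^{n}(x_i-\bar{x})^2\right)^{3/2}},
\qquad
G_1 = \frac{\sqrt{n(n-1)}}{n-2}\, g_1
$$

Here $g_1$ is the population skewness and $G_1$ is the sample skewness. The correction factor $\sqrt{n(n-1)}/(n-2)$ is about 2.449 for $n = 3$, about 1.186 for $n = 10$, and about 1.015 for $n = 100$, so it approaches 1 as $n$ grows and the two results move closer together.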

• Determine the skewness of distinct values in the `PRICE_BIG_CITY` and `PRICE_SMALL_CITY` columns.

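Is the result much different if the query evaluates all values in `PRICE_BIG_CITY` and `PRICE_SMALL_CITY` instead of only the distinct values?

The population skewness value does not change, because the inserted rows are exact copies of the existing rows. Duplicating every row leaves the mean and the relative frequency of each value unchanged, so the population skewness is the same.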
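One way to check this is to compute the population skewness over distinct values and over all values in the same query. This sketch assumes that the table is named `HOUSES` (the practice text does not name it) and that the skewness functions accept the `DISTINCT` keyword, as the step above implies; the aliases are illustrative.

```sql
-- Assumption: table name HOUSES; DISTINCT restricts each aggregate to unique prices.
SELECT house,
       SKEWNESS_POP(DISTINCT price_big_city)   AS pop_big_distinct,
       SKEWNESS_POP(price_big_city)            AS pop_big_all,
       SKEWNESS_POP(DISTINCT price_small_city) AS pop_small_distinct,
       SKEWNESS_POP(price_small_city)          AS pop_small_all
FROM   houses
GROUP  BY house
ORDER  BY house;
```

Replacing `SKEWNESS_POP` with `SKEWNESS_SAMP` in the same query shows how the sample measure responds, since it also depends on the number of values evaluated.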

• Insert a large set of additional rows into the table for `HOUSE` number 1.