Practice: Measuring Asymmetry in Data with the SKEWNESS Functions

Overview

This practice shows how to use the SKEWNESS_POP and SKEWNESS_SAMP aggregate functions to measure asymmetry in data. For a given set of values, both population skewness (SKEWNESS_POP) and sample skewness (SKEWNESS_SAMP) return deterministic results.

Before starting any new practice, refer to the Practices Environment recommendations.

Step 1 : Set up the environment

Connect to PDB21 as HR and execute the /home/oracle/labs/M104784GC10/Houses_Prices.sql SQL script to create a table with skewed data.

Step 2 : Examine skewed data

Display the table rows. The HOUSE column identifies the type of house and is used to categorize the data that you analyze statistically and compare. Skewness measures whether the tail of the distribution extends further to the left (negative skewness) or to the right (positive skewness), or how close the data is to a symmetric distribution such as the normal distribution (skewness = 0).
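To make the sign convention concrete, here is a minimal Python sketch of the textbook definitions of population and sample skewness. It assumes the conventional formulas (third central moment divided by the 1.5 power of the second, plus the usual small-sample correction). This practice does not state how the database computes its results, so the helper names skewness_pop and skewness_samp are illustrative Python functions, and the prices are made up rather than taken from Houses_Prices.sql.

```python
import math

def skewness_pop(values):
    """Population skewness: m3 / m2**1.5 (undefined when all values are equal)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n  # second central moment
    m3 = sum((v - mean) ** 3 for v in values) / n  # third central moment
    return m3 / m2 ** 1.5

def skewness_samp(values):
    """Sample skewness: population skewness with the small-sample correction (needs n >= 3)."""
    n = len(values)
    return skewness_pop(values) * math.sqrt(n * (n - 1)) / (n - 2)

# Made-up prices for illustration only.
right_tailed = [100, 110, 120, 130, 500]  # one large outlier on the right
left_tailed = [100, 470, 480, 490, 500]   # one small outlier on the left
symmetric = [100, 200, 300, 400, 500]     # evenly spread around the mean

for name, data in [("right", right_tailed), ("left", left_tailed), ("symmetric", symmetric)]:
    print(name, round(skewness_pop(data), 4), round(skewness_samp(data), 4))
```

With these values the right-tailed list yields a positive skewness, the left-tailed list a negative one, and the symmetric list a value of zero.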

Display the population skewness (SKEWNESS_POP) and sample skewness (SKEWNESS_SAMP) of the prices for the three houses in the table.

Skewness is useful when, for example, PRICE_BIG_CITY and PRICE_SMALL_CITY represent the purchase prices of houses and you want to determine whether the outliers in the data are biased towards the left end or the right end of the distribution, that is, whether there are more values to the left of the mean than to the right of it.
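The relationship between outliers, the mean, and the sign of the skewness can be checked per category with a short Python sketch. The (house, price) rows below are hypothetical and only mimic grouping by the HOUSE column; skewness_pop is an illustrative helper using the textbook population formula, which is an assumption about what the database computes.

```python
from collections import defaultdict

def skewness_pop(values):
    """Textbook population skewness: m3 / m2**1.5."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5

# Hypothetical (house, price) rows, not the contents of the practice table.
rows = [
    (1, 100), (1, 110), (1, 120), (1, 130), (1, 500),  # outlier at the high end
    (2, 100), (2, 470), (2, 480), (2, 490), (2, 500),  # outlier at the low end
]

# Group prices by house, similar in spirit to GROUP BY on the HOUSE column.
prices_by_house = defaultdict(list)
for house, price in rows:
    prices_by_house[house].append(price)

summary = {}
for house, prices in sorted(prices_by_house.items()):
    mean = sum(prices) / len(prices)
    below = sum(1 for p in prices if p < mean)  # values left of the mean
    above = sum(1 for p in prices if p > mean)  # values right of the mean
    summary[house] = (below, above, skewness_pop(prices))
    print(house, below, above, round(summary[house][2], 3))
```

In this made-up data, house 1 has four values below the mean and one above it and shows positive skewness, while house 2 shows the mirror image. The count of values on each side of the mean is a useful rule of thumb here, but it is not guaranteed to match the sign of the skewness for every data set.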

Step 3 : Examine skewed data after data evolution

Insert more rows into the table.

As the number of values in the data set increases, the difference between the computed values of SKEWNESS_SAMP and SKEWNESS_POP decreases.
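The shrinking gap can be illustrated with a small Python sketch. It assumes the conventional definitions, under which sample skewness equals population skewness multiplied by sqrt(n(n-1))/(n-2), a factor that approaches 1 as n grows. The helper names and the generated data (squares of 1..n) are illustrative and unrelated to the practice table.

```python
import math

def skewness_pop(values):
    """Textbook population skewness: m3 / m2**1.5."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5

def skewness_samp(values):
    """Textbook sample skewness (needs at least three values)."""
    n = len(values)
    return skewness_pop(values) * math.sqrt(n * (n - 1)) / (n - 2)

# Right-skewed synthetic data of increasing size: 1, 4, 9, ..., n*n.
gaps = []
for n in (5, 50, 500):
    data = [i * i for i in range(1, n + 1)]
    gap = abs(skewness_samp(data) - skewness_pop(data))
    gaps.append(gap)
    print(n, round(skewness_pop(data), 4), round(skewness_samp(data), 4), round(gap, 4))
```

Each tenfold increase in the number of values reduces the gap between the two measures, which is the behavior described above.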

Determine the skewness of distinct values in the PRICE_BIG_CITY and PRICE_SMALL_CITY columns.

Is the result much different if the query does not evaluate the distinct values in PRICE_BIG_CITY and PRICE_SMALL_CITY?

The population skewness value does not change because the rows that were inserted are exact copies of existing rows.
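The effect of duplicated rows can be reproduced in a few lines of Python. This sketch assumes the textbook formulas and uses made-up prices. The set() call stands in for evaluating only distinct values, and the helper names are illustrative. The equality shown holds when every value is duplicated the same number of times, which leaves the relative frequency of each value unchanged.

```python
import math

def skewness_pop(values):
    """Textbook population skewness: m3 / m2**1.5."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5

def skewness_samp(values):
    """Textbook sample skewness (needs at least three values)."""
    n = len(values)
    return skewness_pop(values) * math.sqrt(n * (n - 1)) / (n - 2)

prices = [100, 110, 120, 130, 500]  # made-up prices
all_rows = prices * 2               # every row inserted a second time
distinct = sorted(set(all_rows))    # distinct values only

pop_all, pop_distinct = skewness_pop(all_rows), skewness_pop(distinct)
samp_all, samp_distinct = skewness_samp(all_rows), skewness_samp(distinct)

print(round(pop_all, 4), round(pop_distinct, 4))    # population: same value
print(round(samp_all, 4), round(samp_distinct, 4))  # sample: depends on the row count
```

Population skewness depends only on the relative frequencies, so it is identical in both cases. Sample skewness also depends on the number of values, so it differs between the duplicated and the distinct sets.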

Insert a large set of additional rows for HOUSE number 1 into the table.