演習: SKEWNESS関数を使用したデータの非対称性の測定

この演習では、SKEWNESS_POPおよびSKEWNESS_SAMP集計関数を使用してデータの非対称性を測定する方法を示します。特定の値セットについて、母集団の歪度(SKEWNESS_POP)および標本の歪度(SKEWNESS_SAMP)の結果は常に決定的です。

  1. 新しい演習を開始する前に、演習の環境の推奨事項を参照してください。
  2. HRとしてPDB20に接続し、/home/oracle/labs/M104784GC10/Houses_Prices.sql SQLスクリプトを実行します。このSQLスクリプトでは、スキューされたデータを含む表を作成します。
    $ cd /home/oracle/labs/M104784GC10
    $ sqlplus hr@PDB20
    
    SQL*Plus: Release 20.0.0.0.0 - Production on Mon Mar 16 09:27:03 2020
    Version 20.2.0.0.0
    
    Copyright (c) 1982, 2020, Oracle.  All rights reserved.
    
    Enter password: password
    Last Successful login time: Mon Mar 16 2020 08:49:41 +00:00
    
    Connected to:
    Oracle Database 20c Enterprise Edition Release 20.0.0.0.0 - Production
    Version 20.2.0.0.0
    
    SQL> @/home/oracle/labs/M104784GC10/Houses_Prices.sql
    SQL> SET ECHO ON
    SQL>SQL> DROP TABLE houses;
    DROP TABLE houses
               *
    ERROR at line 1:
    ORA-00942: table or view does not exist
    
    SQL> CREATE TABLE houses (house NUMBER, price_big_city NUMBER, price_small_city NUMBER, price_date DATE);
    
    Table created.
    
    SQL> INSERT INTO houses VALUES (1,100000,10000, sysdate);
    
    1 row created.
    
    SQL> INSERT INTO houses VALUES (1,200000,15000, sysdate+1);
    
    1 row created.
    
    SQL> INSERT INTO houses VALUES (1,300000,25000, sysdate+1);
    
    1 row created.
    
    SQL> INSERT INTO houses VALUES (1,400000,28000, sysdate+2);
    
    1 row created.
    
    SQL> INSERT INTO houses VALUES (1,500000,30000, sysdate+3);
    
    1 row created.
    
    SQL> INSERT INTO houses VALUES (1,600000,32000, sysdate+3);
    
    1 row created.
    
    SQL> INSERT INTO houses VALUES (1,700000,35000, sysdate+4);
    
    1 row created.
    
    SQL> INSERT INTO houses VALUES (1,800000,38000, sysdate+4);
    
    1 row created.
    
    SQL> INSERT INTO houses VALUES (1,900000,40000, sysdate+5);
    
    1 row created.
    
    SQL>
    SQL> INSERT INTO houses VALUES (2,2000000,1000000, sysdate+6);
    
    1 row created.
    
    SQL> INSERT INTO houses VALUES (2,200000,20000, sysdate);
    
    1 row created.
    
    SQL> INSERT INTO houses VALUES (2,400000,35000, sysdate+1);
    
    1 row created.
    
    SQL> INSERT INTO houses VALUES (2,600000,55000, sysdate+1);
    
    1 row created.
    
    SQL> INSERT INTO houses VALUES (2,800000,48000, sysdate+2);
    
    1 row created.
    
    SQL>
    SQL> INSERT INTO houses VALUES (3,400000,40000, sysdate+3);
    
    1 row created.
    
    SQL> INSERT INTO houses VALUES (3,500000,42000, sysdate+3);
    
    1 row created.
    
    SQL> INSERT INTO houses VALUES (3,600000,45000, sysdate+4);
    
    1 row created.
    
    SQL> INSERT INTO houses VALUES (3,700000,48000, sysdate+4);
    
    1 row created.
    
    SQL> INSERT INTO houses VALUES (3,800000,49000, sysdate+5);
    
    1 row created.
    
    SQL> COMMIT;
    
    Commit complete.
    
    SQL>
  3. 表の行を表示します。HOUSE列の値は、調べる住宅のタイプを指し、統計的に調べて相互に比較するデータを分類します。歪度を使用して、尾の左端または右端のどちらに向かってより多くのデータがあるか(正/負)、または正規分布(歪度 = 0)にどの程度近いかを測定できます。
    SQL> SET PAGES 100
    SQL> SELECT * FROM houses;
    
         HOUSE PRICE_BIG_CITY PRICE_SMALL_CITY PRICE_DAT
    ---------- -------------- ---------------- ---------
             1         100000            10000 05-FEB-20
             1         200000            15000 06-FEB-20
             1         300000            25000 06-FEB-20
             1         400000            28000 07-FEB-20
             1         500000            30000 08-FEB-20
             1         600000            32000 08-FEB-20
             1         700000            35000 09-FEB-20
             1         800000            38000 09-FEB-20
             1         900000            40000 10-FEB-20
             2        2000000          1000000 11-FEB-20
             2         200000            20000 05-FEB-20
             2         400000            35000 06-FEB-20
             2         600000            55000 06-FEB-20
             2         800000            48000 07-FEB-20
             3         400000            40000 08-FEB-20
             3         500000            42000 08-FEB-20
             3         600000            45000 09-FEB-20
             3         700000            48000 09-FEB-20
             3         800000            49000 10-FEB-20
    
    19 rows selected.
    
    SQL>
  4. 表内の3つの住宅について、母集団の歪度価格(SKEWNESS_POP)および標本の歪度価格(SKEWNESS_SAMP)の結果を表示します。
    SQL> SELECT house, count(house) FROM houses GROUP BY house ORDER BY 1;
    
         HOUSE COUNT(HOUSE)
    ---------- ------------
             1            9
             2            5
             3            5
    
    SQL> SELECT house, SKEWNESS_POP(price_big_city), SKEWNESS_POP(price_small_city) FROM houses
            GROUP BY house;
    
         HOUSE SKEWNESS_POP(PRICE_BIG_CITY) SKEWNESS_POP(PRICE_SMALL_CITY)
    ---------- ---------------------------- ------------------------------
             1                            0                     -.66864012
             2                   1.13841996                     1.49637083
             3                            0                     -.12735442
    
    SQL> SELECT house, SKEWNESS_SAMP(price_big_city), SKEWNESS_SAMP(price_small_city) FROM houses
            GROUP BY house;
    
         HOUSE SKEWNESS_SAMP(PRICE_BIG_CITY) SKEWNESS_SAMP(PRICE_SMALL_CITY)
    ---------- ----------------------------- -------------------------------
             1                             0                      -.81051422
             2                    1.69705627                      2.23065793
             3                             0                      -.18984876
    
    SQL>

    PRICE_BIG_CITYおよびPRICE_SMALL_CITYが購入する住宅の価格を表し、データの外れ値が分布の左端に向かってバイアスされるか右端に向かってバイアスされるか、つまり平均の右側の値の数と比較して平均の左側により多くの値があるかどうかを判断する状況では、歪度が重要です。

  5. 表に行をさらに挿入します。
    SQL> INSERT INTO houses SELECT * FROM houses;
    
    19 rows created.
    
    SQL> /
    
    38 rows created.
    
    SQL> /
    
    76 rows created.
    
    SQL> /
    
    152 rows created.
    
    SQL> COMMIT;
    
    Commit complete.
    
    SQL> SELECT house, SKEWNESS_POP(price_big_city), SKEWNESS_POP(price_small_city) FROM houses
         GROUP BY house ORDER BY 1;
    
         HOUSE SKEWNESS_POP(PRICE_BIG_CITY) SKEWNESS_POP(PRICE_SMALL_CITY)
    ---------- ---------------------------- ------------------------------
             1                            0                     -.66864012
             2                   1.13841996                     1.49637083
             3                            0                     -.12735442
    
    SQL> SELECT house, SKEWNESS_SAMP(price_big_city), SKEWNESS_SAMP(price_small_city) FROM houses
         GROUP BY house ORDER BY 1;
    
         HOUSE SKEWNESS_SAMP(PRICE_BIG_CITY) SKEWNESS_SAMP(PRICE_SMALL_CITY)
    ---------- ----------------------------- -------------------------------
             1                             0                      -.67569912
             2                     1.1602897                      1.52511703
             3                             0                      -.12980098
    
    SQL>

    ご覧のとおり、データセット内の値の数が増加すると、SKEWNESS_SAMPおよびSKEWNESS_POPの計算値の差は減少します。

  6. PRICE_BIG_CITYおよびPRICE_SMALL_CITYの個別値の歪度を確認します。
    SQL> SELECT house, 
                         SKEWNESS_POP(DISTINCT price_big_city) pop_big_city, 
                         SKEWNESS_SAMP(DISTINCT price_big_city) samp_big_city,
                         SKEWNESS_POP(DISTINCT price_small_city) pop_small_city, 
                         SKEWNESS_SAMP(DISTINCT price_small_city) samp_small_city  
                  FROM houses
                  GROUP BY house;
    
         HOUSE POP_BIG_CITY SAMP_BIG_CITY POP_SMALL_CITY SAMP_SMALL_CITY
    ---------- ------------ ------------- -------------- ---------------
             1            0             0     -.66864012      -.81051422
             2   1.13841996    1.69705627     1.49637083      2.23065793
             3            0             0     -.12735442      -.18984876
    
    SQL>

    問合せで列PRICE_BIG_CITYおよびPRICE_SMALL_CITYの個別値が評価されない場合、結果は大きく異なりますか。

    SQL> SELECT house, 
                         SKEWNESS_POP(price_big_city) pop_big_city, 
                         SKEWNESS_SAMP(price_big_city) samp_big_city,
                         SKEWNESS_POP(price_small_city) pop_small_city, 
                         SKEWNESS_SAMP(price_small_city) samp_small_city  
                  FROM houses
                  GROUP BY house;
    
         HOUSE POP_BIG_CITY SAMP_BIG_CITY POP_SMALL_CITY SAMP_SMALL_CITY
    ---------- ------------ ------------- -------------- ---------------
             1            0             0     -.66864012      -.67569912
             2   1.13841996     1.1602897     1.49637083      1.52511703
             3            0             0     -.12735442      -.12980098
    
    SQL>

    まったく同じ行が挿入されたため、母集団の歪度値は異なりません。

  7. HOUSE番号1のビッグ・データ・セットが含まれる表にさらに行を挿入します。
    SQL> INSERT INTO houses (house, price_big_city, price_small_city)
                  SELECT house, price_big_city*0.5, price_small_city*0.1 
                  FROM houses WHERE house=1;
    
    144 rows created.
    
    SQL> /
    
    288 rows created.
    
    SQL> /
    
    576 rows created.
    
    SQL> /
    
    1152 rows created.
    
    SQL> /
    
    2304 rows created.
    
    SQL> COMMIT;
    
    Commit complete.
    
    SQL> SELECT house, count(house) FROM houses GROUP BY house ORDER BY 1;
    
          HOUSE COUNT(HOUSE)
    ---------- ------------
             1         4608
             2           80
             3           80
    		 
    SQL> SELECT house, 
                         SKEWNESS_POP(price_big_city) pop_big_city, 
                         SKEWNESS_SAMP(price_big_city) samp_big_city,
                         SKEWNESS_POP(price_small_city) pop_small_city, 
                         SKEWNESS_SAMP(price_small_city) samp_small_city  
                  FROM houses
                  GROUP BY house;
    
         HOUSE POP_BIG_CITY SAMP_BIG_CITY POP_SMALL_CITY SAMP_SMALL_CITY
    ---------- ------------ ------------- -------------- ---------------
             1   2.57050631    2.57134341      5.7418481      5.74371797
             2   1.13841996     1.1602897     1.49637083      1.52511703
             3            0             0     -.12735442      -.12980098
    
    SQL>EXIT
    $

    これで、住宅番号1の歪度が正になり、データが右に偏ります。