Non-Negative Matrix Factorization

9.19 Non-Negative Matrix Factorization

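The oml.nmf class creates a Non-Negative Matrix Factorization (NMF) model for feature extraction.

Each feature extracted by NMF is a linear combination of the original attribute set. Each feature has a set of non-negative coefficients, which measure the weight of each attribute on the feature. If the argument allow.negative.scores is TRUE, negative coefficients are allowed.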
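To make the idea of features as non-negative linear combinations concrete, here is a minimal pure-Python sketch of the textbook multiplicative-update rule for NMF. It is an illustration only and is not claimed to be the algorithm the database uses. The function names, the names `scores` and `coeffs`, and the four sample rows are assumptions introduced for this sketch; `coeffs` is not meant to correspond to the H or W tables that the model prints.

```python
import random

def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def transpose(m):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*m)]

def nmf(v, k, iterations=200, seed=0, eps=1e-9):
    """Factor a non-negative matrix v (n x m) into scores (n x k) and
    coeffs (k x m) so that v is approximately scores * coeffs."""
    rng = random.Random(seed)
    n, m = len(v), len(v[0])
    scores = [[rng.random() + 0.1 for _ in range(k)] for _ in range(n)]
    coeffs = [[rng.random() + 0.1 for _ in range(m)] for _ in range(k)]
    for _ in range(iterations):
        # coeffs <- coeffs * (scores^T v) / (scores^T scores coeffs)
        st = transpose(scores)
        num = matmul(st, v)
        den = matmul(matmul(st, scores), coeffs)
        coeffs = [[coeffs[i][j] * num[i][j] / (den[i][j] + eps)
                   for j in range(m)] for i in range(k)]
        # scores <- scores * (v coeffs^T) / (scores coeffs coeffs^T)
        ct = transpose(coeffs)
        num = matmul(v, ct)
        den = matmul(scores, matmul(coeffs, ct))
        scores = [[scores[i][j] * num[i][j] / (den[i][j] + eps)
                   for j in range(k)] for i in range(n)]
    return scores, coeffs

def frobenius_error(v, scores, coeffs):
    """Frobenius norm of v minus its reconstruction."""
    approx = matmul(scores, coeffs)
    return sum((a - b) ** 2
               for ra, rb in zip(v, approx)
               for a, b in zip(ra, rb)) ** 0.5

# Four illustrative rows with four non-negative attributes each.
v = [[5.1, 3.5, 1.4, 0.2],
     [4.9, 3.0, 1.4, 0.2],
     [6.3, 3.3, 6.0, 2.5],
     [5.8, 2.7, 5.1, 1.9]]

scores, coeffs = nmf(v, k=2)
scores_1, coeffs_1 = nmf(v, k=2, iterations=1)
print("error after 1 iteration:   ", frobenius_error(v, scores_1, coeffs_1))
print("error after 200 iterations:", frobenius_error(v, scores, coeffs))
```

Because every update multiplies a non-negative value by a non-negative ratio, both factors stay non-negative throughout, which is the property the paragraph above describes.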

Settings for a Non-Negative Matrix Factorization Model

The following table lists the settings that apply to Non-Negative Matrix Factorization models.

Table 9-17 Non-Negative Matrix Factorization Model Settings

Setting Name	Setting Value	Description
`NMFS_CONV_TOLERANCE`	`(0 < numeric_expr <= 0.5)`	Convergence tolerance for the NMF algorithm. The default is `0.05`.
`NMFS_NONNEGATIVE_SCORING`	`NMFS_NONNEG_SCORING_ENABLE` `NMFS_NONNEG_SCORING_DISABLE`	Whether negative numbers are allowed in scoring results. When set to `NMFS_NONNEG_SCORING_ENABLE`, negative feature values are replaced with 0 (zero). When set to `NMFS_NONNEG_SCORING_DISABLE`, negative feature values are allowed. The default is `NMFS_NONNEG_SCORING_ENABLE`.
`NMFS_NUM_ITERATIONS`	`(1 <= numeric_expr <= 500)`	Number of iterations for the NMF algorithm. The default is `50`.
`NMFS_RANDOM_SEED`	`(numeric_expr)`	Random seed for the NMF algorithm. The default is `-1`.
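The ranges in Table 9-17 can be checked on the client before a settings dictionary is handed to the model. The following is a minimal sketch of such a check; `check_nmf_settings` and `is_number` are hypothetical helpers written for this illustration and are not part of the oml package. Setting names are upper-cased first because the example in this section passes a lowercase name to set_params.

```python
# Hypothetical helpers, not part of the oml package: they only mirror Table 9-17.
NONNEG_VALUES = {"NMFS_NONNEG_SCORING_ENABLE", "NMFS_NONNEG_SCORING_DISABLE"}

def is_number(value):
    """True for int or float values, excluding booleans."""
    return isinstance(value, (int, float)) and not isinstance(value, bool)

def check_nmf_settings(settings):
    """Return a list of messages for values outside the documented ranges.

    Settings that are not listed in Table 9-17 are left unchecked.
    """
    problems = []
    for name, value in settings.items():
        key = name.upper()
        if key == "NMFS_CONV_TOLERANCE":
            if not (is_number(value) and 0 < value <= 0.5):
                problems.append(f"{key} must satisfy 0 < value <= 0.5, got {value!r}")
        elif key == "NMFS_NUM_ITERATIONS":
            if not (is_number(value) and 1 <= value <= 500):
                problems.append(f"{key} must satisfy 1 <= value <= 500, got {value!r}")
        elif key == "NMFS_NONNEGATIVE_SCORING":
            if value not in NONNEG_VALUES:
                problems.append(f"{key} must be one of {sorted(NONNEG_VALUES)}, got {value!r}")
        elif key == "NMFS_RANDOM_SEED":
            if not is_number(value):
                problems.append(f"{key} must be numeric, got {value!r}")
    return problems

print(check_nmf_settings({"nmfs_conv_tolerance": 0.05}))
print(check_nmf_settings({"NMFS_NUM_ITERATIONS": 0}))
```

An empty list means that every checked value lies inside its documented range. The database remains the authority on what it accepts.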

Example 9-19 Using the oml.nmf Class

This example creates an NMF model and uses some of the methods of the oml.nmf class.


import oml
import pandas as pd
from sklearn import datasets

#For an on-premises database, use the following command to connect to the database.
oml.connect("<username>","<password>", dsn="<dsn>")

iris = datasets.load_iris()
x = pd.DataFrame(iris.data, columns = ['Sepal_Length', 'Sepal_Width', 'Petal_Length', 'Petal_Width'])
x.insert(0, "ID", range(1, len(x) + 1))
y = pd.DataFrame(list(map(lambda x: {0: 'setosa', 1: 'versicolor', 2:'virginica'}[x], iris.target)), columns = ['Species'])

z = oml.create(pd.concat([x, y], axis=1), table = 'IRIS')

#Create training and test data sets.

train_dat, test_dat = oml.sync(table = "IRIS").split()

#Create a Non-Negative Matrix Factorization model using oml.nmf.

nmf_mod = oml.nmf()

#Fit the model to the training data.

nmf_mod = nmf_mod.fit(train_dat)

#Show the model details.

nmf_mod

#Use the model to make predictions on the test data, returning the Sepal_Length, Sepal_Width, Petal_Length, and Species columns in the result.

nmf_mod.predict(test_dat, supplemental_cols = test_dat[:, ['Sepal_Length', 'Sepal_Width', 'Petal_Length', 'Species']])

#Transform

nmf_mod.transform(test_dat, supplemental_cols = test_dat[:, ['Sepal_Length']], topN = 2).sort_values(by = ['Sepal_Length', 'TOP_1', 'TOP_1_VAL'])

#Feature comparison

nmf_mod.feature_compare(test_dat, compare_cols = ["Sepal_Length", "Petal_Length"], supplemental_cols = ["Species"]) 

#Set new parameters and refit the model to produce U matrix output.

new_setting = {'nmfs_conv_tolerance':0.05}
nmf_mod2 = nmf_mod.set_params(**new_setting).fit(train_dat, case_id = "ID")
nmf_mod2
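The transform call above asks for topN = 2, and the listing for this example shows the result as TOP_1, TOP_1_VAL, TOP_2, and TOP_2_VAL columns. The sketch below shows one way to read those columns. It assumes that TOP_1 is the feature with the largest value for the row, which is consistent with the listing but is not stated in this section. `top_n_features` is a hypothetical helper, not an oml function.

```python
def top_n_features(feature_values, top_n):
    """Rank features by value, largest first, and flatten the result into
    TOP_i / TOP_i_VAL entries (hypothetical helper for illustration)."""
    ranked = sorted(feature_values.items(), key=lambda item: item[1], reverse=True)
    row = {}
    for rank, (feature_id, value) in enumerate(ranked[:top_n], start=1):
        row[f"TOP_{rank}"] = feature_id
        row[f"TOP_{rank}_VAL"] = value
    return row

# Per-feature values taken from the last row of the transform output in the listing.
last_row = top_n_features({1: 1.048287, 2: 0.947744}, top_n=2)
print(last_row)
```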

Listing for This Example


>>> import oml
>>> import pandas as pd
>>> from sklearn import datasets

>>> #For an on-premises database, use the following command to connect to the database.
>>> oml.connect("<username>","<password>", dsn="<dsn>")

>>> iris = datasets.load_iris()
>>> x = pd.DataFrame(iris.data, columns = ['Sepal_Length', 'Sepal_Width', 'Petal_Length', 'Petal_Width'])
>>> x.insert(0, "ID", range(1, len(x) + 1))
>>> y = pd.DataFrame(list(map(lambda x: {0: 'setosa', 1: 'versicolor', 2:'virginica'}[x], iris.target)), columns = ['Species'])

>>> z = oml.create(pd.concat([x, y], axis=1), table = 'IRIS')

#Create training and test data sets.

>>> dat = oml.sync(table = "IRIS").split()
>>> train_dat = dat[0]
>>> test_dat = dat[1]

#Create a Non-Negative Matrix Factorization model using oml.nmf.

>>> nmf_mod = oml.nmf()

#Fit the model to the training data.

>>> nmf_mod = nmf_mod.fit(train_dat)

#Show the model details.

>>> nmf_mod

Algorithm Name: Non-Negative Matrix Factorization

Mining Function: FEATURE_EXTRACTION

Settings:
                   setting name                   setting value
0                     ALGO_NAME  ALGO_NONNEGATIVE_MATRIX_FACTOR
1           NMFS_CONV_TOLERANCE                             .05
2      NMFS_NONNEGATIVE_SCORING      NMFS_NONNEG_SCORING_ENABLE
3           NMFS_NUM_ITERATIONS                              50
4              NMFS_RANDOM_SEED                              -1
5                  ODMS_DETAILS                     ODMS_ENABLE
6  ODMS_MISSING_VALUE_TREATMENT         ODMS_MISSING_VALUE_AUTO
7                 ODMS_SAMPLING           ODMS_SAMPLING_DISABLE
8                     PREP_AUTO                              ON

Computed Settings:
              setting name setting value
0        FEAT_NUM_FEATURES             2
1      NMFS_NUM_ITERATIONS             2
2  ODMS_EXPLOSION_MIN_SUPP             1

Global Statistics:
  attribute name attribute value
0      CONVERGED             YES
1     CONV_ERROR       0.0444448
2     ITERATIONS               2
3       NUM_ROWS             111
4    SAMPLE_SIZE             111

Attributes:
ID
Petal_Length
Petal_Width
Sepal_Length
Sepal_Width
Species

Partition: NO

H:

    FEATURE_ID  FEATURE_NAME ATTRIBUTE_NAME ATTRIBUTE_VALUE  COEFFICIENT
0            1             1             ID            None     0.581551
1            1             1   Petal_Length            None     0.355323
2            1             1    Petal_Width            None     0.158492
3            1             1   Sepal_Length            None     0.656558
4            1             1    Sepal_Width            None     0.424101
5            1             1        Species          setosa     0.089560
6            1             1        Species      versicolor     0.534806
7            1             1        Species       virginica     0.539590
8            2             2             ID            None     0.344647
9            2             2   Petal_Length            None     0.506623
10           2             2    Petal_Width            None     0.650077
11           2             2   Sepal_Length            None     0.170237
12           2             2    Sepal_Width            None     0.248640
13           2             2        Species          setosa     0.249221
14           2             2        Species      versicolor     0.042316
15           2             2        Species       virginica     0.093861

W:

    FEATURE_ID  FEATURE_NAME ATTRIBUTE_NAME ATTRIBUTE_VALUE  COEFFICIENT
0            1             1             ID            None     0.288559
1            1             1   Petal_Length            None    -0.062579
2            1             1    Petal_Width            None    -0.370128
3            1             1   Sepal_Length            None     0.502382
4            1             1    Sepal_Width            None     0.212611
5            1             1        Species      versicolor     0.486970
6            1             1        Species          setosa    -0.113835
7            1             1        Species       virginica     0.450038
8            2             2             ID            None     0.119462
9            2             2   Petal_Length            None     0.578697
10           2             2    Petal_Width            None     0.982575
11           2             2   Sepal_Length            None    -0.238993
12           2             2    Sepal_Width            None     0.082511
13           2             2        Species          setosa     0.353453
14           2             2        Species      versicolor    -0.359264
15           2             2        Species       virginica    -0.275074



#Use the model to make predictions on the test data, returning the Sepal_Length, Sepal_Width, Petal_Length, and Species columns in the result.

>>> nmf_mod.predict(test_dat, supplemental_cols = test_dat[:, ['Sepal_Length', 'Sepal_Width', 'Petal_Length', 'Species']]) 
     Sepal_Length  Sepal_Width  Petal_Length     Species  FEATURE_ID
 0            5.0          3.6           1.4      setosa           2
 1            5.0          3.4           1.5      setosa           2
 2            4.4          2.9           1.4      setosa           2
 3            4.9          3.1           1.5      setosa           2
...           ...          ...           ...         ...         ...
 35           6.9          3.1           5.4   virginica           2
 36           5.8          2.7           5.1   virginica           2
 37           6.2          3.4           5.4   virginica           2
 38           5.9          3.0           5.1   virginica           2
 
#Transform

>>> nmf_mod.transform(test_dat, supplemental_cols = test_dat[:, ['Sepal_Length']], topN = 2).sort_values(by = ['Sepal_Length', 'TOP_1', 'TOP_1_VAL']) 
     Sepal_Length  TOP_1  TOP_1_VAL  TOP_2  TOP_2_VAL
 0            4.4      2   0.464041      1   0.000000
 1            4.4      2   0.482051      1   0.045518
 2            4.8      2   0.475169      1   0.083874
 3            4.8      2   0.510372      1   0.101880
...           ...    ...        ...    ...        ...
 35           7.2      1   0.915012      2   0.850330
 36           7.2      1   0.938112      2   0.745207
 37           7.6      2   0.980757      1   0.864508
 38           7.9      1   1.048287      2   0.947744
 
#Feature comparison

>>> nmf_mod.feature_compare(test_dat, compare_cols = ["Sepal_Length", "Petal_Length"], supplemental_cols = ["Species"]) 
      Species_A  Species_B  SIMILARITY
 0       setosa     setosa    0.990134
 1       setosa     setosa    0.929516
 2       setosa     setosa    0.976885
 3       setosa     setosa    0.953770
...         ...        ...         ...
 737  virginica  virginica    0.849758
 738  virginica  virginica    0.944063
 739  virginica  virginica    0.983637
 740  virginica  virginica    0.958018

[741 rows x 3 columns]

#Set new parameters and refit the model to produce U matrix output.

>>> new_setting = {'nmfs_conv_tolerance':0.05}
>>> nmf_mod2 = nmf_mod.set_params(**new_setting).fit(train_dat, case_id = "ID")
>>> nmf_mod2

Algorithm Name: Non-Negative Matrix Factorization

Mining Function: FEATURE_EXTRACTION

Settings:
                   setting name                   setting value
0                     ALGO_NAME  ALGO_NONNEGATIVE_MATRIX_FACTOR
1           NMFS_CONV_TOLERANCE                            0.05
2      NMFS_NONNEGATIVE_SCORING      NMFS_NONNEG_SCORING_ENABLE
3           NMFS_NUM_ITERATIONS                              50
4              NMFS_RANDOM_SEED                              -1
5                  ODMS_DETAILS                     ODMS_ENABLE
6  ODMS_MISSING_VALUE_TREATMENT         ODMS_MISSING_VALUE_AUTO
7                 ODMS_SAMPLING           ODMS_SAMPLING_DISABLE
8                     PREP_AUTO                              ON

Computed Settings:
              setting name setting value
0        FEAT_NUM_FEATURES             2
1      NMFS_NUM_ITERATIONS             8
2  ODMS_EXPLOSION_MIN_SUPP             1

Global Statistics:
  attribute name attribute value
0      CONVERGED             YES
1     CONV_ERROR       0.0277253
2     ITERATIONS               8
3       NUM_ROWS             111
4    SAMPLE_SIZE             111

Attributes:
Petal_Length
Petal_Width
Sepal_Length
Sepal_Width
Species

Partition: NO

H:

    FEATURE_ID  FEATURE_NAME ATTRIBUTE_NAME ATTRIBUTE_VALUE   COEFFICIENT
0            1             1   Petal_Length            None  9.889792e-02
1            1             1    Petal_Width            None  1.060984e-01
2            1             1   Sepal_Length            None  1.947197e-01
3            1             1    Sepal_Width            None  5.099539e-01
4            1             1        Species          setosa  7.507257e-01
5            1             1        Species      versicolor  5.773815e-03
6            1             1        Species       virginica  8.136382e-02
7            2             2   Petal_Length            None  6.652922e-01
8            2             2    Petal_Width            None  6.571416e-01
9            2             2   Sepal_Length            None  5.702848e-01
10           2             2    Sepal_Width            None  2.420062e-01
11           2             2        Species          setosa  1.643131e-08
12           2             2        Species      versicolor  5.158020e-01
13           2             2        Species       virginica  4.948837e-01

W:

    FEATURE_ID  FEATURE_NAME ATTRIBUTE_NAME ATTRIBUTE_VALUE  COEFFICIENT
0            1             1   Petal_Length            None    -0.071259
1            1             1    Petal_Width            None    -0.059774
2            1             1   Sepal_Length            None     0.077608
3            1             1    Sepal_Width            None     0.571981
4            1             1        Species      versicolor    -0.144686
5            1             1        Species          setosa     0.947005
6            1             1        Species       virginica    -0.043170
7            2             2   Petal_Length            None     0.392684
8            2             2    Petal_Width            None     0.385395
9            2             2   Sepal_Length            None     0.304214
10           2             2    Sepal_Width            None     0.003195
11           2             2        Species          setosa    -0.221185
12           2             2        Species      versicolor     0.325338
13           2             2        Species       virginica     0.289804
